Jiawei (Joe) Zhou

What Do We Know about Vision-Language Models?

Understanding

Attention Sink
Logit Lens (e.g., for unsupervised segmentation)
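Both Understanding probes can be sketched on toy data. The snippet below is a minimal pure-Python illustration under assumed shapes, not the method of any particular paper. `attention_sink_scores` (a hypothetical helper) averages the attention each key position receives from the causal queries that can see it, so a sink shows up as a position with outsized mass. `logit_lens_labels` (also hypothetical) decodes intermediate hidden states, such as one per image patch, through an unembedding matrix and keeps the top-1 vocabulary token, which turns the patch grid into a coarse label map. Logit-lens analyses typically apply the model's final normalization layer before unembedding; this sketch assumes that step away.

```python
def attention_sink_scores(attn):
    """Mean attention each key position receives from the queries that can see it.

    attn is a square causal attention matrix given as a list of rows:
    attn[q][k] is the weight query q puts on key k, and each row sums to 1.
    """
    n = len(attn)
    scores = []
    for k in range(n):
        # Under a causal mask only queries q >= k can attend to key k.
        received = [attn[q][k] for q in range(k, n)]
        scores.append(sum(received) / len(received))
    return scores


def logit_lens_labels(hidden_states, unembed, vocab):
    """Decode each intermediate hidden state to its top-1 vocabulary token.

    hidden_states: list of d-dim vectors (e.g. one per image patch).
    unembed: list of |V| d-dim vectors, one row per vocabulary entry.
    Assumption: no final layer norm is applied before unembedding.
    """
    labels = []
    for h in hidden_states:
        logits = [sum(w_i * h_i for w_i, h_i in zip(row, h)) for row in unembed]
        labels.append(vocab[logits.index(max(logits))])
    return labels


def to_label_map(labels, width):
    """Reshape a flat list of per-patch labels into rows of a patch grid."""
    return [labels[i:i + width] for i in range(0, len(labels), width)]


# Toy causal attention: most of every query's mass goes to position 0.
attn = [
    [1.0, 0.0, 0.0, 0.0],
    [0.8, 0.2, 0.0, 0.0],
    [0.7, 0.1, 0.2, 0.0],
    [0.7, 0.1, 0.1, 0.1],
]
print(attention_sink_scores(attn))

# Toy 2x2 patch grid decoded against a hypothetical three-word vocabulary.
vocab = ["cat", "grass", "sky"]
unembed = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
patches = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.8], [0.2, 0.7, 0.1], [0.8, 0.0, 0.3]]
print(to_label_map(logit_lens_labels(patches, unembed, vocab), width=2))
# → [['cat', 'sky'], ['grass', 'cat']]
```

Position 0 receives the largest average attention here, matching the intuition of a sink token. In a real VLM the attention rows would come from a chosen layer and head, and the hidden states from the image-token positions.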

Efficiency

Token Pruning/Merging
Connection to Video
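Token pruning and token merging both shrink the visual token sequence, and the sketch below contrasts them on plain Python lists. `prune_tokens` keeps the top-scoring tokens under some importance score (where that score comes from, e.g. attention from text tokens, is left as an assumption). `bipartite_merge` follows the bipartite soft matching idea from Token Merging (ToMe): tokens are split alternately into two sets, each token in the first set is paired with its most similar partner in the second by cosine similarity, and only the `r` most similar pairs are averaged together. Unlike ToMe, this simplified version uses an unweighted mean and does not track how many original tokens each merged token represents. All function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def prune_tokens(tokens, scores, keep):
    """Keep the `keep` highest-scoring tokens, preserving their original order."""
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:keep]
    return [tokens[i] for i in sorted(top)]


def bipartite_merge(tokens, r):
    """Remove r tokens by merging, in the spirit of ToMe's bipartite soft matching.

    Set A holds even-indexed tokens, set B odd-indexed ones. Requires at least
    two tokens and r <= len(A). Returns the unmerged A tokens followed by the
    (possibly merged) B tokens.
    """
    a_idx = list(range(0, len(tokens), 2))
    b_idx = list(range(1, len(tokens), 2))
    # For every A token, record (similarity, A index, best B partner).
    edges = []
    for i in a_idx:
        j = max(b_idx, key=lambda b: cosine(tokens[i], tokens[b]))
        edges.append((cosine(tokens[i], tokens[j]), i, j))
    # Merge only the r most similar pairs.
    edges.sort(reverse=True)
    merged = {i: j for _, i, j in edges[:r]}
    groups = {j: [tokens[j]] for j in b_idx}
    for i, j in merged.items():
        groups[j].append(tokens[i])
    out = [tokens[i] for i in a_idx if i not in merged]
    for j in b_idx:
        # Unweighted mean of the B token and every A token merged into it.
        out.append([sum(col) / len(groups[j]) for col in zip(*groups[j])])
    return out


tokens = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.0, 1.1]]
print(prune_tokens(tokens, scores=[0.1, 0.9, 0.5, 0.3], keep=2))
# → [[1.0, 0.1], [0.0, 1.0]]
print(bipartite_merge(tokens, r=1))
```

Pruning discards information outright, whereas merging folds similar tokens together. Because consecutive video frames are often highly redundant, the same merging routine could plausibly be applied to tokens pooled across neighbouring frames.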

Capabilities

Visual Task Ablation (different types of images, e.g., from the Berkeley work)
Dense Images: Ineffectiveness of the CLIP Encoder Compared with the DINO Encoder
Platonic Representation Hypothesis

© 2026 Jiawei Zhou. This work is licensed under CC BY NC ND 4.0