[ICLR 2026] An Empirical Study of Attention and Diversity for Adaptive Visual Token Pruning in Large Vision-Language Models

Changwoo Baek, Jouwon Song, Sohyeon Kim*, Kyeongbo Kong†

*Equal contribution, †Corresponding author

🌐 Project Page | 📄 Paper (Coming Soon)

🎉 News

  • [2026/01] 🔥 Our paper has been accepted to ICLR 2026! 🎊
  • [2026/02] 🚀 Project page is now live!

📖 Overview

Large Vision-Language Models (LVLMs) have adopted visual token pruning strategies to mitigate the substantial computational overhead incurred by long visual token sequences. While prior work has focused primarily on either attention-based or diversity-based pruning, an in-depth analysis of the characteristics and limitations of these approaches remains largely missing.

In this work, we conduct a thorough empirical analysis, using effective rank (erank) as a measure of feature diversity and attention-score entropy as a measure of attention concentration, to investigate how LVLMs process visual tokens and to analyze the strengths and weaknesses of each approach.
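To make these diagnostics concrete, below is a minimal sketch of both quantities, assuming visual token features of shape `(num_tokens, hidden_dim)` and a single attention distribution over the visual tokens. The helper names and the choice of layers/heads to aggregate over are our assumptions, not fixed by the paper.

```python
import torch


def effective_rank(features: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """erank(F) = exp(H(p)), where p are the normalized singular values of F."""
    sigma = torch.linalg.svdvals(features)       # singular values of the feature matrix
    p = sigma / (sigma.sum() + eps)              # normalize into a distribution
    entropy = -(p * torch.log(p + eps)).sum()    # Shannon entropy of p
    return torch.exp(entropy)


def attention_entropy(attn: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Shannon entropy of an attention distribution over visual tokens."""
    attn = attn / (attn.sum() + eps)             # ensure a proper distribution
    return -(attn * torch.log(attn + eps)).sum()
```

A high erank indicates diverse, well-spread token features, while a low attention entropy indicates that attention, and hence visual evidence, is concentrated on a few tokens.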

πŸ” Key Findings

Our analysis reveals two key insights:

  1. Diversity-aware hybrid pruning methods preserve less feature diversity than intended, and the diversity they do retain is closely associated with more frequent hallucinations than under attention-based pruning.

  2. Attention-based approaches are more effective on simple images where visual evidence is concentrated, while diversity-based methods better handle complex images with distributed features.

Building on these empirical insights, we show that incorporating image-aware adjustments into existing hybrid pruning strategies consistently improves their performance. We also provide a minimal instantiation of our findings through a simple adaptive pruning mechanism, sketched below.
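As one illustration (not necessarily the paper's exact mechanism), the sketch below blends attention and diversity scores with a per-image mixing weight derived from attention entropy: concentrated attention shifts the weight toward attention-based scoring, while diffuse attention shifts it toward diversity. The scoring functions and the entropy-based weight are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def adaptive_prune(features: torch.Tensor, attn: torch.Tensor, keep: int) -> torch.Tensor:
    """Return indices of `keep` visual tokens, selected by an image-aware
    blend of attention-based and diversity-based scores (illustrative only)."""
    eps = 1e-12
    num_tokens = features.shape[0]

    # Attention score: how much attention each visual token receives.
    attn_score = attn / (attn.sum() + eps)

    # Diversity score: distance from the normalized mean feature, so that
    # redundant tokens near the centroid score low.
    normed = F.normalize(features, dim=-1)
    centroid = F.normalize(features.mean(dim=0), dim=0)
    div_score = 1.0 - normed @ centroid
    div_score = div_score / (div_score.sum() + eps)

    # Image-aware mixing weight from attention entropy: low entropy
    # (concentrated attention, a "simple" image) leans on attention;
    # high entropy (diffuse attention, a "complex" image) leans on diversity.
    entropy = -(attn_score * torch.log(attn_score + eps)).sum()
    alpha = 1.0 - entropy / torch.log(torch.tensor(float(num_tokens)))

    score = alpha * attn_score + (1.0 - alpha) * div_score
    return score.topk(keep).indices
```

The key design choice, mirroring finding 2, is that the mixing weight is computed per image rather than fixed, so simple and complex images are pruned by different criteria.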

💻 Code

Detailed implementation code is coming soon. 🚧

Stay tuned for updates! ⏳

📧 Contact

For questions or collaborations, please contact the authors.

πŸ™ Acknowledgements

We thank the authors of LLaVA and FasterVLM for their excellent work and open-source contributions.

📜 License

This project is licensed under the Apache License 2.0.