Keywords: Vision-Language Models (VLMs); inference-time visual token reduction; token reduction; token pruning; token clustering and token merging; FlashAttention-compatible token reduction; positional-bias mitigation in token pruning
*(Figure: PACT vs other methods on LLaVA-OneVision-7B | PACT vs other methods on Qwen2-VL-7B)*

*(Figure: DBDPC vs other clustering algorithms on LLaVA-OneVision-7B | PACT vs other methods on LLaVA-1.6-Mistral-7B)*
```
conda env create -f environment.yml
conda activate pactenv
pip install flash-attn==2.6.3
```

Our repo lets you test PACT alongside other visual token reduction methods such as FastV, Visual Token Withdrawal, and ToMe, as well as four clustering algorithms: agglomerative clustering, k-means, Density Peaks Clustering, and DBSCAN.
You can find scripts in the scripts folder to reproduce results from the paper.
For example, to test PACT on LLaVA-OneVision-7B, you can run:
```
cd PACT/scripts
bash pact_llava-onevision7b.sh
```

This script demonstrates how to test all the methods supported by our repository. Each method is defined by a config file; the available config files are in the configs folder. For documentation on config file parameters, refer to this file.
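For orientation, a minimal config might look like the sketch below. The key names shown here are purely hypothetical placeholders, not the actual parameters; refer to the parameter documentation linked above for the real ones:

```json
{
  "reduction_method": "pact",
  "reduction_ratio": 0.5
}
```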
You can also test a custom pruning or clustering-based reduction method or combine both by using:
- custom_clustering.json for custom clustering-based methods,
- custom_pruning.json for custom pruning methods,
- custom_combined.json for combining pruning with clustering-based merging.
In addition to using the correct config file, you need to implement your reduction logic by modifying the custom_pruning function (which computes scores for token pruning) and/or the custom_token_reduction function (which typically defines a clustering-then-merging method) in utils.py. Please refer to the documentation of these functions for more details. Once implemented, you can test your custom pruning method, your custom clustering-based reduction method, or a combination of both by running:
```
cd PACT/scripts
bash test_custom.sh
```

The visual token reduction is implemented by modifying llava_arch.py and modeling_qwen2.py for LLaVA-OneVision, and modeling_qwen2_vl.py for Qwen2-VL. The modifications are based on functions defined in utils.py.
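As a rough illustration of the two customization points, the sketch below mimics their roles with a NumPy stand-in. This is not the actual utils.py code or its signatures (the real functions operate on PyTorch tensors inside the model); the scoring and clustering rules here are toy choices for demonstration only: L2-norm importance scores for pruning, and nearest-seed assignment followed by mean-merging for clustering-based reduction.

```python
import numpy as np

def custom_pruning_scores(tokens: np.ndarray) -> np.ndarray:
    """Toy importance score: the L2 norm of each token embedding.

    tokens: (num_tokens, dim) array of visual token embeddings.
    Returns a (num_tokens,) array; higher means more important.
    """
    return np.linalg.norm(tokens, axis=-1)

def custom_token_reduction(tokens: np.ndarray, num_clusters: int) -> np.ndarray:
    """Toy clustering-then-merging: assign each token to the nearest of
    `num_clusters` seed tokens, then merge each cluster by averaging."""
    seeds = tokens[:num_clusters]  # use the first tokens as cluster seeds
    # Distance of every token to every seed: shape (num_tokens, num_clusters).
    dists = np.linalg.norm(tokens[:, None, :] - seeds[None, :, :], axis=-1)
    assign = dists.argmin(axis=1)  # each seed is distance 0 to itself, so no cluster is empty
    return np.stack([tokens[assign == c].mean(axis=0) for c in range(num_clusters)])

# Example: score 16 tokens, keep the top half, then merge them into 4 tokens.
rng = np.random.default_rng(0)
toks = rng.standard_normal((16, 8))
scores = custom_pruning_scores(toks)
keep = np.argsort(scores)[-8:]  # indices of the 8 highest-scoring tokens
reduced = custom_token_reduction(toks[keep], num_clusters=4)
print(reduced.shape)  # (4, 8)
```

A combined method, as in custom_combined.json, would chain the two steps exactly like the example above: prune by score first, then cluster and merge the survivors.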
If you find our work useful, please consider citing our paper:
```
@InProceedings{Dhouib_2025_CVPR,
    author    = {Dhouib, Mohamed and Buscaldi, Davide and Vanier, Sonia and Shabou, Aymen},
    title     = {PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {14582-14592},
    doi       = {10.1109/CVPR52734.2025.01359}
}
```

This work received financial support from Crédit Agricole S.A. through the research chair with Ecole Polytechnique on Trustworthy and Responsible AI.



