
PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models, CVPR 2025


Keywords: Vision-Language Models (VLMs); inference-time visual token reduction; token reduction; token pruning; token clustering and token merging; FlashAttention-compatible token reduction; positional-bias mitigation in token pruning

PACT Performance

[Figure] PACT vs other methods on LLaVA-OneVision-7B
[Figure] PACT vs other methods on Qwen2-VL-7B
[Figure] DBDPC vs other clustering algorithms on LLaVA-OneVision-7B
[Figure] PACT vs other methods on LLaVA-1.6-Mistral-7B

Setup

conda env create -f environment.yml
conda activate pactenv
pip install flash-attn==2.6.3

Usage

1. Testing Existing Methods

Our repository lets you test PACT alongside other visual token reduction methods such as FastV, Visual Token Withdrawal, and ToMe, as well as four clustering algorithms: agglomerative clustering, k-means, Density Peaks Clustering, and DBSCAN.

You can find scripts in the scripts folder to reproduce results from the paper. For example, to test PACT on LLaVA-OneVision-7B, you can run:

cd PACT/scripts
bash pact_llava-onevision7b.sh

This script file demonstrates how to test all the methods supported by our repository. Each method is defined by a config file, with different config files available in the configs folder. For documentation on config file parameters, refer to this file.

2. Testing Custom Reduction Methods

You can also test a custom pruning method, a custom clustering-based reduction method, or a combination of both.

In addition to using the appropriate config file, you need to implement your reduction logic by modifying the custom_pruning function (which computes scores for token pruning) and/or the custom_token_reduction function (which typically defines a clustering-then-merging method) in utils.py. Please refer to the documentation of these functions for more details. Once implemented, you can test your custom method by running:

cd PACT/scripts
bash test_custom.sh
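
For illustration, here is a minimal, self-contained PyTorch sketch of the two kinds of logic these hooks expect: a scoring function for pruning and a clustering-then-merging step. The tensor shapes, function names, and keep_ratio parameter below are assumptions made for the example; the actual signatures of custom_pruning and custom_token_reduction are documented in utils.py.

import torch
import torch.nn.functional as F

def norm_scores(visual_tokens: torch.Tensor) -> torch.Tensor:
    # Pruning scores: here simply the L2 norm of each token's hidden state.
    # visual_tokens: (num_tokens, hidden_dim); returns (num_tokens,),
    # where a higher score means the token is more likely to be kept.
    return visual_tokens.norm(dim=-1)

def cluster_then_merge(visual_tokens: torch.Tensor, keep_ratio: float = 0.25) -> torch.Tensor:
    # Toy clustering-then-merging step: keep the top-scoring tokens as cluster
    # centers and average every token into its most similar center.
    num_keep = max(1, int(keep_ratio * visual_tokens.size(0)))
    centers_idx = norm_scores(visual_tokens).topk(num_keep).indices
    centers = visual_tokens[centers_idx]                    # (num_keep, hidden_dim)

    # Assign every token to its nearest center by cosine similarity.
    sim = F.normalize(visual_tokens, dim=-1) @ F.normalize(centers, dim=-1).T
    assign = sim.argmax(dim=-1)                             # (num_tokens,)

    # Merge: average the tokens assigned to each center.
    merged = torch.zeros_like(centers)
    counts = torch.zeros(num_keep, dtype=visual_tokens.dtype, device=visual_tokens.device)
    merged.index_add_(0, assign, visual_tokens)
    counts.index_add_(0, assign, torch.ones_like(assign, dtype=visual_tokens.dtype))
    return merged / counts.clamp(min=1).unsqueeze(-1)

Either piece can be adapted into the body of custom_pruning or custom_token_reduction, keeping whatever signature utils.py actually expects.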

Implementation Details

The visual token reduction is implemented by modifying llava_arch.py and modeling_qwen2.py for LLaVA-OneVision, and modeling_qwen2_vl.py for Qwen2-VL. These modifications rely on functions defined in utils.py.
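
As a rough mental model (a conceptual sketch only, with hypothetical names that do not match the repository's code), the pattern in these modified files is: take the hidden states at the visual token positions at a chosen point of the forward pass, reduce them, and let only the shortened sequence continue through the remaining layers.

import torch

def apply_reduction(hidden_states: torch.Tensor,
                    visual_mask: torch.Tensor,
                    reduce_fn) -> torch.Tensor:
    # hidden_states: (seq_len, hidden_dim); visual_mask: (seq_len,) bool marking
    # the visual token positions; reduce_fn maps (n_visual, d) -> (n_reduced, d).
    visual = hidden_states[visual_mask]        # hidden states of the visual tokens
    reduced = reduce_fn(visual)                # e.g. a pruning and/or clustering step
    text = hidden_states[~visual_mask]         # non-visual tokens are left untouched
    # Token ordering is simplified here; the shortened sequence is what the
    # remaining layers (and the attention mask / position ids) then operate on.
    return torch.cat([reduced, text], dim=0)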

Citation

If you find our work useful, please consider citing our paper:

@InProceedings{Dhouib_2025_CVPR,
    author    = {Dhouib, Mohamed and Buscaldi, Davide and Vanier, Sonia and Shabou, Aymen},
    title     = {PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {14582-14592},
    doi       = {10.1109/CVPR52734.2025.01359}
}

Acknowledgments

This work received financial support from Crédit Agricole S.A. through the research chair with Ecole Polytechnique on Trustworthy and Responsible AI.
