- pytorch>=2.0.1
- torchvision>=0.15.2
- munkres>=1.1.4
- scikit-learn>=1.2.2
- clip>=1.0
- timm>=0.9.2
- faiss-gpu>=1.7.4
CIFAR-10 will be automatically downloaded by PyTorch (via torchvision).
To improve the readability and extensibility of the code, we split the steps of our method into separate .py files. Below is a step-by-step tutorial. Note that intermediate results are saved to the ./data folder.
We first need to compute the image embedding with the CLIP model by running
```bash
python image_embedding.py
```
and the embedding of WordNet nouns (provided in the ./data folder) for text space construction by running
```bash
python text_embedding.py
```
Next, we aim to find discriminative nouns that describe the image semantic centers. Motivated by the zero-shot classification paradigm of CLIP, we reversely classify all nouns into the image semantic centers and filter out the indiscriminative ones by running
```bash
python filter_nouns.py
```
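Conceptually, this reverse classification treats the image semantic centers as "classes" and assigns each candidate noun to its nearest center by cosine similarity, discarding nouns that no center matches confidently. The following is a minimal pure-Python sketch of that idea; the function name, threshold, and toy vectors are illustrative, not the actual implementation:

```python
import math

def cosine(u, v):
    # cosine similarity between two vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def filter_nouns(noun_embs, centers, threshold=0.5):
    """Reversely classify each noun embedding into its nearest image
    semantic center; keep nouns whose best similarity passes the threshold.
    (Toy sketch -- the threshold and selection rule are assumptions.)"""
    selected = []
    for noun, emb in noun_embs.items():
        sims = [cosine(emb, c) for c in centers]
        best = max(sims)
        if best >= threshold:
            selected.append((noun, sims.index(best)))
    return selected

# toy example: two image centers, three candidate nouns;
# the generic noun "entity" matches no center confidently and is dropped
centers = [[1.0, 0.0], [0.0, 1.0]]
nouns = {"airplane": [0.9, 0.1], "truck": [0.1, 0.95], "entity": [0.5, 0.5]}
selected = filter_nouns(nouns, centers, threshold=0.8)
```

In the actual pipeline, the noun embeddings come from the text encoder of CLIP and the image semantic centers are presumably obtained by clustering the image embeddings computed in the previous step.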
The selected nouns compose a text space tailored to the input images. Then, we retrieve nouns for each image to compute its counterpart in the text modality by running
```bash
python retrieve_text.py
```
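The retrieval step can be pictured as a nearest-neighbor search from each image into the selected noun set, with the retrieved noun embeddings aggregated into a text-modality counterpart of the image. A toy sketch (the top-k value and the simple averaging are illustrative assumptions):

```python
import math

def cosine(u, v):
    # cosine similarity between two vectors
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def retrieve_text(image_emb, noun_embs, k=2):
    """Rank nouns by similarity to the image and average the top-k
    noun embeddings to form the image's text-modality counterpart."""
    ranked = sorted(noun_embs, key=lambda e: cosine(image_emb, e), reverse=True)
    top = ranked[:k]
    dim = len(image_emb)
    return [sum(e[i] for e in top) / k for i in range(dim)]

# toy: an image near the first axis retrieves the two closest nouns
text_feat = retrieve_text([1.0, 0.1],
                          [[1.0, 0.0], [0.9, 0.2], [0.0, 1.0]], k=2)
```

In practice, the search runs over CLIP embeddings (the faiss-gpu dependency suggests an approximate nearest-neighbor index rather than the brute-force loop shown here).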
For better collaboration between image and text features, we train additional cluster heads to further improve the clustering performance by running
```bash
python train_head.py
```
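A cluster head can be thought of as a small classifier on top of the frozen features: image and text features are combined and mapped to soft cluster assignments. A pure-Python forward-pass sketch (the concatenation and the linear-plus-softmax head are illustrative assumptions; the actual head is trained, which is why the step involves a training script):

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of logits
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cluster_assign(image_feat, text_feat, weights):
    """Toy cluster head: concatenate image and text features, apply a
    linear layer (one weight row per cluster), and softmax the logits."""
    feat = image_feat + text_feat  # list concatenation = feature concat
    logits = [sum(w * f for w, f in zip(row, feat)) for row in weights]
    return softmax(logits)

# toy: 2-D image feature + 2-D text feature, two clusters
probs = cluster_assign([1.0, 0.0], [0.0, 1.0],
                       weights=[[2.0, 0.0, 0.0, 2.0],
                                [0.0, 2.0, 2.0, 0.0]])
```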
The training is extremely efficient, taking only about one minute on the CIFAR-10 dataset.
Our implementation uses the codebase of TAC.