LEAD++: Unsupervised Fine-grained Visual Recognition with Multi-context Enhanced Entropy-based Adaptive Distillation
In our experiments, we use the following publicly available fine-grained datasets. All datasets can be downloaded by clicking the corresponding links below.
| Dataset | Download Link |
|---|---|
| CUB-200-2011 | Download |
| Stanford Cars | Download |
| FGVC Aircraft | Download |
| Stanford Dogs | Download |
All datasets are expected to be processed and organized in a unified ImageFolder format. Please download the datasets and arrange them following the structure below. For the CUB-200-2011 and FGVC Aircraft datasets, you can use the following commands to convert them into the desired ImageFolder format:
python aircraft_organize.py --ds /path/to/fgvc-aircraft-2013b --out /path/to/aircraft --link none
python bird_organize.py --cub_root /path/to/CUB_200_2011 --output_root /path/to/bird_imagefolder
All datasets are expected to be processed into the following structure:
LEAD++
├── bird/🐦
│   ├── train/
│   │   ├── 001.Black_footed_Albatross/
│   │   ├── 002.Laysan_Albatross/
│   │   └── ……
│   └── test/
├── car/🚙
│   ├── train/
│   │   ├── Acura Integra Type R 2001/
│   │   ├── Acura RL Sedan 2012/
│   │   └── ……
│   └── test/
├── aircraft/✈️
│   ├── train/
│   │   ├── 707-320/
│   │   ├── 727-200/
│   │   └── ……
│   └── test/
└── ……
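As a quick sanity check after conversion, the layout above can be validated with a short script. This is a convenience sketch we include for illustration (the script name and usage are our own, not part of the repository):

```python
import os
import sys

def check_imagefolder(root):
    """Verify that root/ contains train/ and test/ with matching class subfolders."""
    splits = {}
    for split in ("train", "test"):
        split_dir = os.path.join(root, split)
        if not os.path.isdir(split_dir):
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        # Each class is one subdirectory, e.g. 001.Black_footed_Albatross/
        splits[split] = sorted(
            d for d in os.listdir(split_dir)
            if os.path.isdir(os.path.join(split_dir, d))
        )
    if splits["train"] != splits["test"]:
        raise ValueError("train/ and test/ class folders do not match")
    return len(splits["train"])

if __name__ == "__main__":
    n = check_imagefolder(sys.argv[1])  # e.g. python check_layout.py bird/
    print(f"OK: {n} classes")
```

For CUB-200-2011 this should report 200 classes, for Stanford Cars 196, and for FGVC Aircraft 100.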
- Ubuntu 22.04
- CUDA 12.4
Use the following commands to create the corresponding conda environment. In addition, you should download the ResNet-50 pre-trained model by clicking here and save it in this folder.
conda create --name LEAD++ python=3.9.1
conda activate LEAD++
pip install -r requirements.txt
Before training, we need to generate cropped images by following the steps below. Please make sure you are in the LEAD++ root directory before running the commands.
cd DDT
chmod +x ./run_ddt.sh
./run_ddt.sh $task $dataset $pretrained $cuda_device
$task is the task name (bird, car, aircraft, etc.).
$dataset is the dataset path for unsupervised pre-training.
$pretrained indicates whether to use pretrained weights for the model. In our implementation, we use pretrained ResNet-50 weights.
$cuda_device is the ID of the GPU to use.
After that, we will obtain the cropped version of the dataset.
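Conceptually, this cropping step turns a per-image foreground localization map into a bounding box and crops the image to it. A minimal numpy sketch of that idea follows (an illustration with made-up names, not the repository's DDT implementation):

```python
import numpy as np

def crop_to_foreground(image, mask):
    """Crop an HxWxC image to the bounding box of the positive region in mask (HxW)."""
    ys, xs = np.nonzero(mask > 0)
    if ys.size == 0:
        return image  # no foreground detected: keep the full image
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    return image[y0:y1, x0:x1]

# Toy example: a 6x6 image whose foreground is the central 2x2 block
img = np.arange(6 * 6 * 3).reshape(6, 6, 3)
mask = np.zeros((6, 6))
mask[2:4, 2:4] = 1
print(crop_to_foreground(img, mask).shape)  # (2, 2, 3)
```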
- For ease of use, we have pre-converted the text descriptions generated by the LLM into tensor format and placed them in the `text_description_tensor` folder. The original descriptions and the descriptions of random categories generated by the LLM are all in the `text_description` folder.
- Run the following scripts for pre-training and downstream linear probing and image retrieval.
chmod +x ./run_*.sh
./run_train_test.sh $task $dataset $llm_description $train_ckpt_name $num_classes $cuda_device $linear_name
$task is the task name (bird or car or aircraft).
$dataset is the dataset path for unsupervised pre-training.
$llm_description is the text description address generated by LLM.
$train_ckpt_name is the name of the folder where the checkpoints are saved.
$num_classes is the number of classes: bird 200, car 196, aircraft 100.
$cuda_device is the ID of the GPU to use.
$linear_name is the name of the folder where the linear probing checkpoints are saved.
- An example of pretraining on CUB_200_2011.
./run_train_test.sh bird bird/ text_description_tensor/bird_text_tensor.pt result_bird 200 0,1 linear_bird
- For ease of use, we have pre-converted the text descriptions generated by the LLM into tensor format and placed them in the `text_description_tensor` folder. The original descriptions and the descriptions of random categories generated by the LLM are all in the `text_description` folder.
- Run the following script for pretraining. It will save the checkpoints to `./checkpoints/$checkpoints_name/`.
chmod +x ./run_train.sh
./run_train.sh $task $dataset $llm_description $checkpoints_name $num_classes $cuda_device
$task is the task name (bird or car or aircraft).
$dataset is the dataset path for unsupervised pre-training.
$llm_description is the text description address generated by LLM.
$checkpoints_name is the name of the folder where the checkpoints are saved.
$num_classes is the number of classes: bird 200, car 196, aircraft 100.
$cuda_device is the ID of the GPU to use.
- An example of pretraining on CUB_200_2011.
./run_train.sh bird bird/ text_description_tensor/bird_text_tensor.pt result_bird 200 0,1
- Run the following script for linear probing. We use a single machine with a single GPU for linear probing. It will save the checkpoints to `./checkpoints_linear/$checkpoints_name/`.
chmod +x ./run_linear.sh
./run_linear.sh $task $pretrained $checkpoints_name $num_classes $cuda_device
$task is the task name (bird or car or aircraft).
$pretrained is the name of the folder where the training checkpoints are saved.
$checkpoints_name is the name of the folder where the linear probing checkpoints are saved.
$num_classes is the number of classes: bird 200, car 196, aircraft 100.
$cuda_device is the ID of the GPU to use.
- An example of linear probing on CUB_200_2011.
./run_linear.sh bird result_bird linear_bird 200 0
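Linear probing trains only a linear classifier on top of the frozen pretrained features. A minimal numpy sketch of that idea on synthetic features (an illustration, not the repository's training loop, which trains the probe on real encoder features):

```python
import numpy as np

def linear_probe(train_feats, train_labels, test_feats, num_classes):
    """Fit a linear classifier on frozen features via least squares; return predictions."""
    # One-hot targets for a simple least-squares linear classifier
    targets = np.eye(num_classes)[train_labels]
    # Append a column of ones so the classifier learns a bias term
    X = np.hstack([train_feats, np.ones((len(train_feats), 1))])
    W, *_ = np.linalg.lstsq(X, targets, rcond=None)
    Xt = np.hstack([test_feats, np.ones((len(test_feats), 1))])
    return (Xt @ W).argmax(axis=1)

# Toy example: two well-separated Gaussian classes in an 8-d feature space
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0.0, 0.1, (20, 8)), rng.normal(1.0, 0.1, (20, 8))])
labels = np.array([0] * 20 + [1] * 20)
preds = linear_probe(feats, labels, feats, num_classes=2)
print((preds == labels).mean())  # accuracy should be close to 1.0
```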
- Run the following script for image retrieval. We use a single machine with a single GPU for image retrieval.
chmod +x ./run_retrieval.sh
./run_retrieval.sh $task $dataset $pretrained $cuda_device
$task is the task name (bird or car or aircraft).
$dataset is the path to the cropped dataset generated in the Mutual-information-based Localization Preprocessing step for unsupervised pre-training.
$pretrained is the name of the folder where the training checkpoints are saved.
$cuda_device is the ID of the GPU to use.
- An example of image retrieval on CUB_200_2011.
./run_retrieval.sh bird bird/ result_bird 0
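Under the hood, retrieval reduces to ranking gallery features by cosine similarity to each query feature. A minimal numpy sketch with synthetic 2-d features (the repository's script computes the features with the pretrained encoder instead):

```python
import numpy as np

def retrieve(query, gallery, topk=5):
    """Return indices of the topk gallery features most cosine-similar to query."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ q  # cosine similarity of each gallery item to the query
    return np.argsort(-sims)[:topk]

# Toy example: gallery items 0 and 2 point in (almost) the query's direction
gallery = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.1], [-1.0, 0.0]])
query = np.array([1.0, 0.0])
print(retrieve(query, gallery, topk=2))  # [0 2]
```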
