LEAD++: Unsupervised Fine-grained Visual Recognition with Multi-context Enhanced Entropy-based Adaptive Distillation
In our experiments, we use the following publicly available fine-grained datasets. All datasets can be downloaded by clicking the corresponding links below.
| Dataset | Download Link |
|---|---|
| CUB-200-2011 | Download |
| Stanford Cars | Download |
| FGVC Aircraft | Download |
| Stanford Dogs | Download |
All datasets are expected to be processed and organized in a unified ImageFolder format. Please download the datasets and arrange them following the structure below. For the CUB-200-2011 and FGVC Aircraft datasets, you can use the following commands to convert them into the desired ImageFolder format:
python aircraft_organize.py --ds /path/to/fgvc-aircraft-2013b --out /path/to/aircraft --link none
python bird_organize.py --cub_root /path/to/CUB_200_2011 --output_root /path/to/bird_imagefolder
All datasets are expected to be processed into the following structure:
LEAD++
├── bird/🐦
│   ├── train/
│   │   ├── 001.Black_footed_Albatross/
│   │   ├── 002.Laysan_Albatross/
│   │   └── ……
│   └── test/
├── car/🚙
│   ├── train/
│   │   ├── Acura Integra Type R 2001/
│   │   ├── Acura RL Sedan 2012/
│   │   └── ……
│   └── test/
├── aircraft/✈️
│   ├── train/
│   │   ├── 707-320/
│   │   ├── 727-200/
│   │   └── ……
│   └── test/
└── ……
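As a quick sanity check after conversion, the layout above can be validated with a short script. This is a convenience sketch we include for illustration (the script name and usage are our own, not part of the repository):

```python
import os
import sys

def check_imagefolder(root):
    """Verify that root/ contains train/ and test/ with matching class subfolders."""
    splits = {}
    for split in ("train", "test"):
        split_dir = os.path.join(root, split)
        if not os.path.isdir(split_dir):
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        # Each class is one subdirectory, e.g. 001.Black_footed_Albatross/
        splits[split] = sorted(
            d for d in os.listdir(split_dir)
            if os.path.isdir(os.path.join(split_dir, d))
        )
    if splits["train"] != splits["test"]:
        raise ValueError("train/ and test/ class folders do not match")
    return len(splits["train"])

if __name__ == "__main__":
    n = check_imagefolder(sys.argv[1])  # e.g. python check_layout.py bird/
    print(f"OK: {n} classes")
```

For CUB-200-2011 this should report 200 classes, for Stanford Cars 196, and for FGVC Aircraft 100.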
- Ubuntu 22.04
- CUDA 12.4
Use the following commands to create the corresponding conda environment. In addition, you should download the ResNet-50 pre-trained model by clicking here and save it in this folder.
conda create --name LEAD++ python=3.9.1
conda activate LEAD++
pip install -r requirements.txt
Before training, we need to generate cropped images by following the steps below. Please make sure you are in the LEAD++ root directory before running the commands.
cd DDT
chmod +x ./run_ddt.sh
./run_ddt.sh $task $dataset $pretrained $cuda_device
$task is the task name (bird, car, aircraft, etc.).
$dataset is the dataset path for unsupervised pre-training.
$pretrained indicates whether to use pretrained weights for the model. In our implementation, we use pretrained ResNet-50 weights.
$cuda_device is the ID of the GPU to use.
After that, we will obtain the cropped version of the dataset.
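Conceptually, this cropping step turns a per-image foreground localization map into a bounding box and crops the image to it. A minimal numpy sketch of that idea follows (an illustration with made-up names, not the repository's DDT implementation):

```python
import numpy as np

def crop_to_foreground(image, mask):
    """Crop an HxWxC image to the bounding box of the positive region in mask (HxW)."""
    ys, xs = np.nonzero(mask > 0)
    if ys.size == 0:
        return image  # no foreground detected: keep the full image
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    return image[y0:y1, x0:x1]

# Toy example: a 6x6 image whose foreground is the central 2x2 block
img = np.arange(6 * 6 * 3).reshape(6, 6, 3)
mask = np.zeros((6, 6))
mask[2:4, 2:4] = 1
print(crop_to_foreground(img, mask).shape)  # (2, 2, 3)
```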
- For ease of use, we have pre-converted the text descriptions generated by the LLM into tensor format and placed them in the `text_description_tensor` folder. The original descriptions and the descriptions of random categories generated by the LLM are all in the `text_description` folder.
- Run the following scripts for pre-training and downstream linear probing and image retrieval.
chmod +x ./run_*.sh
./run_train_test.sh $task $dataset $llm_description $train_ckpt_name $num_classes $cuda_device $linear_name
$task is the task name (bird or car or aircraft).
$dataset is the dataset path for unsupervised pre-training.
$llm_description is the text description address generated by LLM.
$train_ckpt_name is the name of the folder where the checkpoints are saved.
$num_classes is the number of classes: bird 200, car 196, aircraft 100.
$cuda_device is the ID of the GPU to use.
$linear_name is the name of the folder where the linear probing checkpoints are saved.
- An example of pretraining on CUB_200_2011.
./run_train_test.sh bird bird/ text_description_tensor/bird_text_tensor.pt result_bird 200 0,1 linear_bird
- For ease of use, we have pre-converted the text descriptions generated by the LLM into tensor format and placed them in the `text_description_tensor` folder. The original descriptions and the descriptions of random categories generated by the LLM are all in the `text_description` folder.
- Run the following script for pretraining. It will save the checkpoints to `./checkpoints/$checkpoints_name/`.
chmod +x ./run_train.sh
./run_train.sh $task $dataset $llm_description $checkpoints_name $num_classes $cuda_device
$task is the task name (bird or car or aircraft).
$dataset is the dataset path for unsupervised pre-training.
$llm_description is the text description address generated by LLM.
$checkpoints_name is the name of the folder where the checkpoints are saved.
$num_classes is the number of classes: bird 200, car 196, aircraft 100.
$cuda_device is the ID of the GPU to use.
- An example of pretraining on CUB_200_2011.
./run_train.sh bird bird/ text_description_tensor/bird_text_tensor.pt result_bird 200 0,1
- Run the following script for linear probing. We use a single machine with a single GPU for linear probing. It will save the checkpoints to `./checkpoints_linear/$checkpoints_name/`.
chmod +x ./run_linear.sh
./run_linear.sh $task $pretrained $checkpoints_name $num_classes $cuda_device
$task is the task name (bird or car or aircraft).
$pretrained is the name of the folder where the training checkpoints are saved.
$checkpoints_name is the name of the folder where the linear probing checkpoints are saved.
$num_classes is the number of classes: bird 200, car 196, aircraft 100.
$cuda_device is the ID of the GPU to use.
- An example of linear probing on CUB_200_2011.
./run_linear.sh bird result_bird linear_bird 200 0
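Linear probing trains only a linear classifier on top of the frozen pretrained features. A minimal numpy sketch of that idea on synthetic features (an illustration, not the repository's training loop, which trains the probe on real encoder features):

```python
import numpy as np

def linear_probe(train_feats, train_labels, test_feats, num_classes):
    """Fit a linear classifier on frozen features via least squares; return predictions."""
    # One-hot targets for a simple least-squares linear classifier
    targets = np.eye(num_classes)[train_labels]
    # Append a column of ones so the classifier learns a bias term
    X = np.hstack([train_feats, np.ones((len(train_feats), 1))])
    W, *_ = np.linalg.lstsq(X, targets, rcond=None)
    Xt = np.hstack([test_feats, np.ones((len(test_feats), 1))])
    return (Xt @ W).argmax(axis=1)

# Toy example: two well-separated Gaussian classes in an 8-d feature space
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0.0, 0.1, (20, 8)), rng.normal(1.0, 0.1, (20, 8))])
labels = np.array([0] * 20 + [1] * 20)
preds = linear_probe(feats, labels, feats, num_classes=2)
print((preds == labels).mean())  # accuracy should be close to 1.0
```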
- Run the following script for image retrieval. We use a single machine with a single GPU for image retrieval.
chmod +x ./run_retrieval.sh
./run_retrieval.sh $task $dataset $pretrained $cuda_device
$task is the task name (bird or car or aircraft).
$dataset is the path to the cropped dataset generated in the Mutual-information-based Localization Preprocessing step for unsupervised pre-training.
$pretrained is the name of the folder where the training checkpoints are saved.
$cuda_device is the ID of the GPU to use.
- An example of image retrieval on CUB_200_2011.
./run_retrieval.sh bird bird/ result_bird 0
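Under the hood, retrieval reduces to ranking gallery features by cosine similarity to each query feature. A minimal numpy sketch with synthetic 2-d features (the repository's script computes the features with the pretrained encoder instead):

```python
import numpy as np

def retrieve(query, gallery, topk=5):
    """Return indices of the topk gallery features most cosine-similar to query."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ q  # cosine similarity of each gallery item to the query
    return np.argsort(-sims)[:topk]

# Toy example: gallery items 0 and 2 point in (almost) the query's direction
gallery = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.1], [-1.0, 0.0]])
query = np.array([1.0, 0.0])
print(retrieve(query, gallery, topk=2))  # [0 2]
```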
