Official implementation of the paper Exploring Cross-Modal Flows for Few-Shot Learning.
- Python 3.8.20
- Pytorch 2.3.0
- CUDA 12.1
git clone https://github.com/HKUST-LongGroup/FMA.git
cd FMA
conda create -n fma python=3.8.20
conda activate fma
conda install pytorch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install git+https://github.com/openai/CLIP.git
pip install scipy==1.10.1
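Optionally, verify the environment before preparing the data. This is a minimal sanity check, assuming the conda environment above is active; it only uses the ViT-B/16 backbone named later in this README and is not part of the repository:

# Minimal environment check (not part of the repo): confirms PyTorch sees the
# GPU and that the CLIP package can load ViT-B/16.
import torch
import clip

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/16", device=device)
print("CLIP ViT-B/16 loaded on", device)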
- Create a data directory:
mkdir -p data
- Follow the detailed instructions in DATASETS.md to download and organize each dataset.
The expected directory structure:
data/
├── oxford_pets/
├── eurosat/
├── ucf101/
├── sun397/
├── caltech-101/
├── dtd/
├── fgvc_aircraft/
├── food-101/
├── oxford_flowers/
├── stanford_cars/
└── imagenet/
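As an optional sanity check (not part of the repo), you can confirm that the layout matches the tree above; the folder names below mirror that tree:

# Optional layout check: verifies that each dataset folder from the tree above
# exists under data/. Adjust the list if you only prepared a subset.
import os

expected = ["oxford_pets", "eurosat", "ucf101", "sun397", "caltech-101", "dtd",
            "fgvc_aircraft", "food-101", "oxford_flowers", "stanford_cars", "imagenet"]
missing = [name for name in expected if not os.path.isdir(os.path.join("data", name))]
print("Missing dataset folders:", missing if missing else "none")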
- For feature extractors such as coop, cocoop, adapter, and lora, first download the corresponding pre-trained checkpoints:
wget https://github.com/HKUST-LongGroup/FMA/releases/download/FMA_PEFT/FMA_PEFT.tar.gz
Then unpack it with tar -xzvf FMA_PEFT.tar.gz. The expected directory structure:
checkpoints/
├── eurosat/
│   ├── coop_vit_b16_16s.pth
│   ├── cocoop_vit_b16_16s.pth
│   ├── adapter_vit_b16_16s.pth
│   └── lora_vit_b16_16s.pth
├── oxford_pets/
└── ...
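The naming scheme above maps each dataset and feature extractor to one .pth file. The helper below is only an illustration of that mapping (the function name is hypothetical; the real loading code lives in the *_extractor.py modules):

# Illustration only: builds the checkpoint path implied by the layout above.
import os

def peft_checkpoint_path(dataset: str, extractor: str, root: str = "checkpoints") -> str:
    # e.g. checkpoints/eurosat/coop_vit_b16_16s.pth
    return os.path.join(root, dataset, f"{extractor}_vit_b16_16s.pth")

print(peft_checkpoint_path("eurosat", "coop"))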
Train a model with default configuration (EuroSAT, 16-shot, CLIP ViT-B/16):
python train.py
You can also modify the default configuration in config.py:
class DefaultConfig:
    def __init__(self):
        self.epochs = 200
        self.batch_size = 32
        self.lr = 2e-4
        self.clip_type = 'ViT-B/16'  # CLIP backbone
        self.dataset = 'EuroSAT'
        self.num_shots = 16
        self.blocks = 12  # Number of residual blocks
        # ... other parameters
In addition to editing config.py, you can also customize training by specifying command-line arguments:
# Train on a specific dataset
python train.py --dataset OxfordPets
# Specify number of shots
python train.py --dataset EuroSAT --num_shots 8
# Choose feature extractor
python train.py --feature_extractor coop
# Combine multiple options
python train.py --dataset UCF101 --num_shots 4 --feature_extractor clip
# other options, like seed, epochs, batch size, blocks, etc.
python train.py --dataset UCF101 --num_shots 4 --feature_extractor lora --seed 1 --epochs 300 --batch_size 64 --blocks 5
These arguments override the default settings in DefaultConfig.
- --dataset: Dataset name (default: EuroSAT). Options: OxfordPets, EuroSAT, UCF101, SUN397, Caltech101, DescribableTextures, FGVCAircraft, Food101, OxfordFlowers, StanfordCars, ImageNet
- --num_shots: Number of shots for few-shot learning (default: 16). Options: 1, 2, 4, 8, 16
- --feature_extractor: Feature extractor type (default: coop). Options: clip, coop, cocoop, adapter, lora
- --seed: Random seed for reproducibility (default: 1)
- --gamma: Stochastic noise level for feature interpolation (default: 0)
- --epochs: Number of training epochs (default: 200)
- --batch_size: Batch size for training (default: 32)
- --blocks: Number of residual blocks in velocity network (default: 12)
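For reference, these options correspond to an argparse-style interface; the sketch below mirrors the documented defaults and choices, but the authoritative parser is defined in the repository (train.py / config.py), so treat this as an approximation:

# Approximate argparse setup mirroring the documented options above.
import argparse

parser = argparse.ArgumentParser(description="FMA training options (sketch)")
parser.add_argument("--dataset", default="EuroSAT")
parser.add_argument("--num_shots", type=int, default=16, choices=[1, 2, 4, 8, 16])
parser.add_argument("--feature_extractor", default="coop",
                    choices=["clip", "coop", "cocoop", "adapter", "lora"])
parser.add_argument("--seed", type=int, default=1)
parser.add_argument("--gamma", type=float, default=0.0)
parser.add_argument("--epochs", type=int, default=200)
parser.add_argument("--batch_size", type=int, default=32)
parser.add_argument("--blocks", type=int, default=12)
args = parser.parse_args()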
Training creates a timestamped checkpoint directory:
checkpoints/
└── 010857/
├── config.json # Training configuration
├── model.pth # Trained model weights
└── log.txt # Training logs
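A saved run can be inspected directly. This is a minimal sketch assuming model.pth stores a state dict; the exact format is determined by train.py:

# Minimal sketch for inspecting a saved run; assumes model.pth is a state dict.
import json
import torch

run_dir = "checkpoints/010857"
with open(f"{run_dir}/config.json") as f:
    cfg = json.load(f)
state = torch.load(f"{run_dir}/model.pth", map_location="cpu")
print("config keys:", list(cfg)[:5], "| saved tensors:", len(state))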
To evaluate a trained model, use the timestamp of the checkpoint:
python test.py <timestamp>
Example:
python test.py 010857
This will:
- Load the saved configuration and model weights
- Evaluate the model on the test set with different inference steps (0-10)
- Report accuracy for each number of steps
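For intuition, multi-step inference with a flow-matching model typically integrates a learned velocity field over a fixed number of Euler steps. The sketch below is schematic only: velocity_net is a stand-in for the trained network, and the actual logic lives in models/fm.py and test.py.

# Schematic Euler integration of a learned velocity field; not the repo's code.
import torch

def flow_inference(velocity_net, image_features, num_steps):
    """Push image features along the learned flow for num_steps Euler steps."""
    if num_steps == 0:
        return image_features  # 0 steps: use the raw features
    x = image_features
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((x.shape[0],), i * dt, device=x.device)
        x = x + dt * velocity_net(x, t)
    return x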
.
├── config.py # Configuration and hyperparameters
├── train.py # Training script
├── test.py # Evaluation script
├── models/
│ ├── fm.py # Flow matching network
│   ├── utils.py # Utility functions and classes
│ ├── feature_extractor.py # Feature extraction interface
│ ├── coop_extractor.py # CoOp extraction interface
│ ├── cocoop_extractor.py # CoCoOp extraction interface
│ ├── lora_extractor.py # CLIP-LoRA extraction interface
│ ├── adapter_extractor.py # CLIP-Adapter extraction interface
│ └── clip_extractor.py # CLIP feature extractor
├── datasets/
│ ├── __init__.py # Dataset registry
│ ├── eurosat.py # EuroSAT dataset
│ ├── oxford_pets.py # Oxford Pets dataset
│ └── ... # Other datasets
├── checkpoints/ # Saved models
├── data/ # Datasets
├── DATASETS.md # Dataset preparation guide
└── README.md # This file
@misc{jiang2025exploringcrossmodalflowsfewshot,
title={Exploring Cross-Modal Flows for Few-Shot Learning},
author={Ziqi Jiang and Yanghao Wang and Long Chen},
year={2025},
eprint={2510.14543},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2510.14543},
}