Official implementation of the paper Exploring Cross-Modal Flows for Few-Shot Learning.
- Python 3.8.20
- Pytorch 2.3.0
- CUDA 12.1
git clone https://github.com/HKUST-LongGroup/FMA.git
cd FMA
conda create -n fma python=3.8.20
conda activate fma
conda install pytorch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install git+https://github.com/openai/CLIP.git
pip install scipy==1.10.1
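Optionally, verify the environment before preparing the data. This is a minimal sanity check, assuming the conda environment above is active; it only uses the ViT-B/16 backbone named later in this README and is not part of the repository:

# Minimal environment check (not part of the repo): confirms PyTorch sees the
# GPU and that the CLIP package can load ViT-B/16.
import torch
import clip

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/16", device=device)
print("CLIP ViT-B/16 loaded on", device)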
- Create a data directory:
mkdir -p data
- Follow the detailed instructions in DATASETS.md to download and organize each dataset.
The expected directory structure:
data/
├── oxford_pets/
├── eurosat/
├── ucf101/
├── sun397/
├── caltech-101/
├── dtd/
├── fgvc_aircraft/
├── food-101/
├── oxford_flowers/
├── stanford_cars/
└── imagenet/
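As an optional sanity check (not part of the repo), you can confirm that the layout matches the tree above; the folder names below mirror that tree:

# Optional layout check: verifies that each dataset folder from the tree above
# exists under data/. Adjust the list if you only prepared a subset.
import os

expected = ["oxford_pets", "eurosat", "ucf101", "sun397", "caltech-101", "dtd",
            "fgvc_aircraft", "food-101", "oxford_flowers", "stanford_cars", "imagenet"]
missing = [name for name in expected if not os.path.isdir(os.path.join("data", name))]
print("Missing dataset folders:", missing if missing else "none")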
- For feature extractors such as coop, cocoop, adapter, and lora, first download the corresponding pre-trained checkpoints:
wget https://github.com/HKUST-LongGroup/FMA/releases/download/FMA_PEFT/FMA_PEFT.tar.gz
Then unpack it with tar -xzvf FMA_PEFT.tar.gz. The expected directory structure:
checkpoints/
├── eurosat/
│   ├── coop_vit_b16_16s.pth
│   ├── cocoop_vit_b16_16s.pth
│   ├── adapter_vit_b16_16s.pth
│   └── lora_vit_b16_16s.pth
├── oxford_pets/
└── ...
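The naming scheme above maps each dataset and feature extractor to one .pth file. The helper below is only an illustration of that mapping (the function name is hypothetical; the real loading code lives in the *_extractor.py modules):

# Illustration only: builds the checkpoint path implied by the layout above.
import os

def peft_checkpoint_path(dataset: str, extractor: str, root: str = "checkpoints") -> str:
    # e.g. checkpoints/eurosat/coop_vit_b16_16s.pth
    return os.path.join(root, dataset, f"{extractor}_vit_b16_16s.pth")

print(peft_checkpoint_path("eurosat", "coop"))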
Train a model with default configuration (EuroSAT, 16-shot, CLIP ViT-B/16):
python train.py
You can also modify the default configuration in config.py:
class DefaultConfig:
    def __init__(self):
        self.epochs = 200
        self.batch_size = 32
        self.lr = 2e-4
        self.clip_type = 'ViT-B/16'  # CLIP backbone
        self.dataset = 'EuroSAT'
        self.num_shots = 16
        self.blocks = 12  # Number of residual blocks
        # ... other parameters
In addition to editing config.py, you can also customize training by specifying command-line arguments:
# Train on a specific dataset
python train.py --dataset OxfordPets
# Specify number of shots
python train.py --dataset EuroSAT --num_shots 8
# Choose feature extractor
python train.py --feature_extractor coop
# Combine multiple options
python train.py --dataset UCF101 --num_shots 4 --feature_extractor clip
# other options, like seed, epochs, batch size, blocks, etc.
python train.py --dataset UCF101 --num_shots 4 --feature_extractor lora --seed 1 --epochs 300 --batch_size 64 --blocks 5
These arguments override the default settings in DefaultConfig.
- --dataset: Dataset name (default: EuroSAT). Options: OxfordPets, EuroSAT, UCF101, SUN397, Caltech101, DescribableTextures, FGVCAircraft, Food101, OxfordFlowers, StanfordCars, ImageNet
- --num_shots: Number of shots for few-shot learning (default: 16). Options: 1, 2, 4, 8, 16
- --feature_extractor: Feature extractor type (default: coop). Options: clip, coop, cocoop, adapter, lora
- --seed: Random seed for reproducibility (default: 1)
- --gamma: Stochastic noise level for feature interpolation (default: 0)
- --epochs: Number of training epochs (default: 200)
- --batch_size: Batch size for training (default: 32)
- --blocks: Number of residual blocks in velocity network (default: 12)
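For reference, these options correspond to an argparse-style interface; the sketch below mirrors the documented defaults and choices, but the authoritative parser is defined in the repository (train.py / config.py), so treat this as an approximation:

# Approximate argparse setup mirroring the documented options above.
import argparse

parser = argparse.ArgumentParser(description="FMA training options (sketch)")
parser.add_argument("--dataset", default="EuroSAT")
parser.add_argument("--num_shots", type=int, default=16, choices=[1, 2, 4, 8, 16])
parser.add_argument("--feature_extractor", default="coop",
                    choices=["clip", "coop", "cocoop", "adapter", "lora"])
parser.add_argument("--seed", type=int, default=1)
parser.add_argument("--gamma", type=float, default=0.0)
parser.add_argument("--epochs", type=int, default=200)
parser.add_argument("--batch_size", type=int, default=32)
parser.add_argument("--blocks", type=int, default=12)
args = parser.parse_args()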
Training creates a timestamped checkpoint directory:
checkpoints/
└── 010857/
├── config.json # Training configuration
├── model.pth # Trained model weights
└── log.txt # Training logs
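A saved run can be inspected directly. This is a minimal sketch assuming model.pth stores a state dict; the exact format is determined by train.py:

# Minimal sketch for inspecting a saved run; assumes model.pth is a state dict.
import json
import torch

run_dir = "checkpoints/010857"
with open(f"{run_dir}/config.json") as f:
    cfg = json.load(f)
state = torch.load(f"{run_dir}/model.pth", map_location="cpu")
print("config keys:", list(cfg)[:5], "| saved tensors:", len(state))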
To evaluate a trained model, use the timestamp of the checkpoint:
python test.py <timestamp>
Example:
python test.py 010857
This will:
- Load the saved configuration and model weights
- Evaluate the model on the test set with different inference steps (0-10)
- Report accuracy for each number of steps
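For intuition, multi-step inference with a flow-matching model typically integrates a learned velocity field over a fixed number of Euler steps. The sketch below is schematic only: velocity_net is a stand-in for the trained network, and the actual logic lives in models/fm.py and test.py.

# Schematic Euler integration of a learned velocity field; not the repo's code.
import torch

def flow_inference(velocity_net, image_features, num_steps):
    """Push image features along the learned flow for num_steps Euler steps."""
    if num_steps == 0:
        return image_features  # 0 steps: use the raw features
    x = image_features
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((x.shape[0],), i * dt, device=x.device)
        x = x + dt * velocity_net(x, t)
    return x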
.
├── config.py # Configuration and hyperparameters
├── train.py # Training script
├── test.py # Evaluation script
├── models/
│ ├── fm.py # Flow matching network
│   ├── utils.py # Utility functions and classes
│ ├── feature_extractor.py # Feature extraction interface
│ ├── coop_extractor.py # CoOp extraction interface
│ ├── cocoop_extractor.py # CoCoOp extraction interface
│ ├── lora_extractor.py # CLIP-LoRA extraction interface
│ ├── adapter_extractor.py # CLIP-Adapter extraction interface
│ └── clip_extractor.py # CLIP feature extractor
├── datasets/
│ ├── __init__.py # Dataset registry
│ ├── eurosat.py # EuroSAT dataset
│ ├── oxford_pets.py # Oxford Pets dataset
│ └── ... # Other datasets
├── checkpoints/ # Saved models
├── data/ # Datasets
├── DATASETS.md # Dataset preparation guide
└── README.md # This file
@misc{jiang2025exploringcrossmodalflowsfewshot,
title={Exploring Cross-Modal Flows for Few-Shot Learning},
author={Ziqi Jiang and Yanghao Wang and Long Chen},
year={2025},
eprint={2510.14543},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2510.14543},
}