- Python 3.8+
- Datasets: PIE, InfraParis
- Clone the repository:

```bash
git clone --recursive https://github.com/your-repo/embeddings_renault.git
git lfs pull
```

- Create a virtual environment and install the dependencies:

```bash
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
pip install -r requirements.txt
```

Each notebook generates and analyzes embeddings for one specific encoder model.
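The core operation behind this kind of embedding analysis is cosine similarity between embedding vectors. The sketch below is illustrative only and is not taken from the notebooks:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (illustrative helper)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

v = np.array([1.0, 2.0, 3.0])
print(cosine_similarity(v, v))   # 1.0 (same direction)
print(cosine_similarity(v, -v))  # -1.0 (opposite direction)
```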
To fine-tune UniTok on the PIE or InfraParis dataset, use the following command. Make sure to replace the paths with your actual dataset and model paths; pass either `pie` or `infraparis` to `--dataset-type`.

```bash
python UniTok_finetuning/finetune_unitok.py \
    --checkpoint /path/to/pretrained_unitok.pth \
    --dataset-type pie|infraparis \
    --pie-root /path/to/pie_dataset_root \
    --infraparis-root /path/to/infraparis_dataset_root \
    --captions-file /path/to/captions.json \
    --output-dir ./finetune_output \
    --batch-size 32 \
    --epochs 20 \
    --lr 5e-5
```
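The `pie|infraparis` notation means the flag accepts exactly one of those two values. A minimal sketch of how such a flag is typically declared with `argparse`; the option names mirror the command above, but the actual declarations inside `finetune_unitok.py` are assumed, not verified:

```python
import argparse

# Hypothetical reconstruction of two of the CLI flags shown above;
# the real finetune_unitok.py may declare them differently.
parser = argparse.ArgumentParser(description="Fine-tune UniTok (sketch)")
parser.add_argument("--dataset-type", choices=["pie", "infraparis"],
                    required=True, help="Which dataset to fine-tune on")
parser.add_argument("--lr", type=float, default=5e-5,
                    help="Learning rate")

args = parser.parse_args(["--dataset-type", "pie"])
print(args.dataset_type, args.lr)
```

With `choices=[...]`, argparse rejects any value other than `pie` or `infraparis` with a usage error, which is exactly what the `pie|infraparis` shorthand documents.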
To generate embeddings for images and text, use the following script:

```bash
python generation_embeddings/UniTok.py \
    --checkpoint /path/to/unitok_checkpoint.pth \
    --pie-dir /path/to/PIE \
    --sets set01 set02 \
    --output-dir /path/to/output_embeddings \
    --batch-size 64 \
    --subset 1000 \
    --crop-top \
    --sample-rate 100
```
Parameter explanations:

- `--checkpoint`: Path to the UniTok model checkpoint (`.pth` file, standard or fine-tuned).
- `--pie-dir`: Root directory of the PIE dataset.
- `--sets`: (Optional) List of PIE sets to process (e.g., `set01 set02`). If omitted, all sets are processed.
- `--output-dir`: Directory where the output embeddings file will be saved.
- `--batch-size`: (Optional) Number of images processed at once (default: 64).
- `--subset`: (Optional) If set, only the first N images are processed (useful for quick tests).
- `--crop-top`: (Optional) If set, crops the top half of each image before embedding (useful for street-level images).
- `--sample-rate`: (Optional) Only every N-th image is processed (default: 100).
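The on-disk format of the saved embeddings is not specified here. Assuming the script writes a NumPy `.npz` archive with an `image_embeddings` array (an assumption; check the script's save call and key names), a quick sanity check of the output might look like:

```python
import numpy as np

# Assumption: embeddings are stored as an .npz archive holding an
# "image_embeddings" array of shape (num_images, embed_dim).
# A small dummy file is created here so the snippet is self-contained.
np.savez("output_embeddings.npz",
         image_embeddings=np.random.randn(10, 512).astype(np.float32))

data = np.load("output_embeddings.npz")
emb = data["image_embeddings"]
print(emb.shape)  # (10, 512)
print(emb.dtype)  # float32

# Embeddings are typically L2-normalized before cosine-similarity comparisons.
emb_normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
```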
To convert the PIE embeddings to a contrastive format, use the following script:

```bash
python UniTok_finetuning/pie_contrastive/contrastive_conversion.py
```
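The exact output of the conversion script is not documented here, but contrastive training generally pairs each image embedding with its matching text embedding (row *i* with row *i*) and treats all other rows in the batch as negatives. A NumPy sketch of the standard symmetric InfoNCE objective over such paired embeddings; this is illustrative only, not the script's actual implementation:

```python
import numpy as np

def infonce_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    img_emb, txt_emb: arrays of shape (batch, dim); row i of each is a
    matching image/text pair, all other rows act as negatives.
    """
    # L2-normalize so the dot product is cosine similarity.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (batch, batch); positives on diagonal
    n = len(logits)

    def cross_entropy_diagonal(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    # Average the image->text and text->image directions.
    return (cross_entropy_diagonal(logits)
            + cross_entropy_diagonal(logits.T)) / 2

rng = np.random.default_rng(0)
img = rng.normal(size=(8, 512))
print(infonce_loss(img, img))  # near zero: every pair matches perfectly
```

When the paired embeddings agree, the diagonal dominates and the loss approaches zero; mismatched pairs drive it toward `log(batch_size)`.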