This repository contains the official code for TransVOS: Video Object Segmentation with Transformers.
- torch >= 1.6.0
- torchvision >= 0.7.0
- ...
To install the requirements, run:
```
conda env update -n TransVOS --file requirements.yaml
```
We follow AFB-URR to convert static image datasets (MSRA10K, ECSSD, PASCAL-S, PASCAL VOC2012, COCO) into a uniform format that follows the DAVIS layout.
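For reference, the target layout treats each static image as a one-frame video sequence. The sketch below only illustrates that layout; the `STATIC_ROOT` structure (`images/` and `masks/` subfolders), file names, and output root are assumptions, not the actual AFB-URR preprocessing script.
```python
import os
from shutil import copyfile

# Hypothetical paths -- adjust to wherever the raw images and masks actually live.
STATIC_ROOT = "/path/to/MSRA10K"            # assumed: images/<name>.jpg with matching masks/<name>.png
OUT_ROOT = "/path/to/static_davis_format"   # target root in DAVIS-style layout

for name in sorted(os.listdir(os.path.join(STATIC_ROOT, "images"))):
    stem = os.path.splitext(name)[0]
    # Each static image becomes a one-frame "video" sequence.
    img_dir = os.path.join(OUT_ROOT, "JPEGImages", stem)
    ann_dir = os.path.join(OUT_ROOT, "Annotations", stem)
    os.makedirs(img_dir, exist_ok=True)
    os.makedirs(ann_dir, exist_ok=True)
    copyfile(os.path.join(STATIC_ROOT, "images", name), os.path.join(img_dir, "00000.jpg"))
    copyfile(os.path.join(STATIC_ROOT, "masks", stem + ".png"), os.path.join(ann_dir, "00000.png"))
```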
Download the YouTube-VOS dataset, then organize the data in the following format:
```
YTBVOS
|----train
|    |-----JPEGImages
|    |-----Annotations
|    |-----meta.json
|----valid
|    |-----JPEGImages
|    |-----Annotations
|    |-----meta.json
```
Here, JPEGImages and Annotations contain the frames and the annotation masks of each video, respectively.
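As a quick sanity check of the layout, a short sketch like the one below can list the videos and objects recorded in meta.json (the `videos`/`objects` keys follow the standard YouTube-VOS meta.json schema; the root path is a placeholder):
```python
import json
import os

YTBVOS_ROOT = "/path/to/YTBVOS"  # placeholder

with open(os.path.join(YTBVOS_ROOT, "train", "meta.json")) as f:
    meta = json.load(f)

videos = meta["videos"]  # standard YouTube-VOS schema: video id -> per-object annotations
print(f"{len(videos)} training videos")
for vid, info in list(videos.items())[:3]:
    frames = sorted(os.listdir(os.path.join(YTBVOS_ROOT, "train", "JPEGImages", vid)))
    print(vid, "objects:", list(info["objects"].keys()), "frames:", len(frames))
```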
Download the DAVIS17 dataset, then organize the data in the following format:
```
DAVIS
|----JPEGImages
|    |-----480p
|----Annotations
|    |-----480p (annotations for DAVIS 2017)
|----ImageSets
|    |-----2016
|    |-----2017
|----DAVIS-test-dev (data for DAVIS 2017 test-dev)
```
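Likewise, the split files under ImageSets can be used to enumerate sequences. A minimal sketch, assuming the standard DAVIS `val.txt` naming and a placeholder root path:
```python
import os

DAVIS_ROOT = "/path/to/DAVIS"  # placeholder

# ImageSets/2017/val.txt lists one sequence name per line (standard DAVIS layout).
with open(os.path.join(DAVIS_ROOT, "ImageSets", "2017", "val.txt")) as f:
    sequences = [line.strip() for line in f if line.strip()]

for seq in sequences[:3]:
    frames = sorted(os.listdir(os.path.join(DAVIS_ROOT, "JPEGImages", "480p", seq)))
    masks = sorted(os.listdir(os.path.join(DAVIS_ROOT, "Annotations", "480p", seq)))
    print(seq, len(frames), "frames,", len(masks), "masks")
```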
To pretrain the TransVOS network on static images, modify the dataset root ($cfg.DATA.PRETRAIN_ROOT) in config.py, then run the following command:
```
python train.py --gpu ${GPU-IDS} --exp_name ${experiment} --pretrain
```
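For reference, the dataset roots used in this step and the next live in config.py; the sketch below shows roughly what to set. The attribute names follow the $cfg.* variables in this README, but the actual config.py may be structured differently, and the paths are placeholders.
```python
# Minimal sketch of the relevant fields (not the actual config.py).
from types import SimpleNamespace

cfg = SimpleNamespace(DATA=SimpleNamespace(
    PRETRAIN_ROOT="/path/to/static_davis_format",  # static images converted to the DAVIS layout
    DAVIS_ROOT="/path/to/DAVIS",                   # DAVIS 2016/2017 root
    YTBVOS_ROOT="/path/to/YTBVOS",                 # YouTube-VOS root
))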
To train the TransVOS network on DAVIS & YouTube-VOS, modify the dataset roots ($cfg.DATA.DAVIS_ROOT, $cfg.DATA.YTBVOS_ROOT) in config.py, then run the following command:
```
python train.py --gpu ${GPU-IDS} --exp_name ${experiment} --initial ${./checkpoints/*.pth.tar}
```
Download the pretrained DAVIS17 checkpoint and YouTube-VOS checkpoint.
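Before evaluation, a downloaded (or self-trained) checkpoint can be inspected; the sketch below simply loads the .pth.tar file with torch.load and prints its top-level keys. The file name is a placeholder, and the key names depend on how train.py saves checkpoints.
```python
import torch

# Placeholder path to a downloaded or trained checkpoint.
ckpt = torch.load("./checkpoints/davis17_checkpoint.pth.tar", map_location="cpu")
print(type(ckpt))
# Checkpoints saved as dicts typically expose keys such as 'state_dict' or 'model';
# the exact keys depend on how train.py saved them.
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))
```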
To evaluate the TransVOS network on DAVIS16/17, modify $cfg.DATA.VAL.DATASET_NAME, then run the following command:
```
python eval.py --checkpoint ${./checkpoints/*.pth.tar}
```
To test the TransVOS network on DAVIS17 test-dev / YouTube-VOS, modify $cfg.DATA.TEST.DATASET_NAME, then run the following command:
```
python test.py --checkpoint ${./checkpoints/*.pth.tar}
```
The test results will be saved as indexed PNG files under ${results}/.
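Each saved mask is an indexed (palette-mode) PNG whose pixel values are object IDs, with 0 as background. A minimal sketch for reading one back (the result path is a placeholder):
```python
import numpy as np
from PIL import Image

# Placeholder path to one predicted mask.
mask = Image.open("./results/bike-packing/00000.png")  # 'P' (palette / indexed) mode
ids = np.unique(np.array(mask))
print("object ids present:", ids)  # 0 is background; other values index the objects
```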
Additionally, you can modify other parameters in config.py to change the configuration.
This codebase is built upon the official AFB-URR repository and the official DETR repository.
```
@article{mei2021transvos,
  title={TransVOS: Video Object Segmentation with Transformers},
  author={Mei, Jianbiao and Wang, Mengmeng and Lin, Yeneng and Liu, Yong},
  journal={arXiv preprint arXiv:2106.00588},
  year={2021}
}
```