AirObject is a CVPR 2022 research project. In this work, we present AirObject, a temporal class-agnostic object encoding method to obtain global keypoint graph-based embeddings of objects. Specifically, the global 3D object embeddings are generated using a temporal convolutional network across structural information of multiple frames obtained from a graph attention-based encoding method. This repo contains the official code to run AirObject and other Baselines presented in the paper for Video Object Identification. We also provide code for training and global object tracking.
Matching Temporally Evolving Representations using AirObject
For more details, please see:
-
Full paper PDF: AirObject: A Temporally Evolving Graph Embedding for Object Identification.
-
Authors: Nikhil Varma Keetha, Chen Wang, Yuheng Qiu, Kuan Xu, Sebastian Scherer
Simply run the following commands:
conda create --channel conda-forge --name airobj --file ./AirObject/conda_requirements.txt
conda activate airobj
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=10.2 -c pytorch
pip install pyyaml opencv-python scipy tqdm pycocotools
cd ./AirObject/cocoapi/PythonAPI
python setup.py build
python setup.py installFor Data Loading, we use dataloaders present in the datasets folder.
Please download train.zip and annotations_train.json from the OVIS Dataset for Inference:
For inference, please download the models.zip file:
We first start by pre-extracting SuperPoint features for all the images. Please modify the superpoint_extraction.yaml config file to extract SuperPoint Features for different datasets:
python './AirObject/superpoint_extraction.py' -c './AirObject/config/superpoint_extraction.yaml' -g 1Please modify the eval.yaml config file to test for different methods and datasets.
On the first run, we save the video object dictionary to the save_dir to avoid redundant computation. Subsequently, to use the video object dictionary, set resume to True.
python './AirObject/eval.py' -c './AirObject/config/eval.yaml' -g 1PR-AUC(%) results on YT-VIS Test Split:
| 2D Baseline | 3D Baseline | NetVLAD | SeqNet | AirObject |
|---|---|---|---|---|
| 81.00 | 80.01 | 75.43 | 89.42 | 91.20 |
PR-AUC(%) results on OVIS:
| 2D Baseline | 3D Baseline | NetVLAD | SeqNet | AirObject |
|---|---|---|---|---|
| 40.11 | 43.50 | 40.54 | 62.60 | 63.07 |
Please modify the eval_seq.yaml config file to test SeqNet & AirObject for different sequence lengths.
Use the saved video object dictionary obtained from eval.py.
python './AirObject/eval_seq.py' -c './AirObject/config/eval_seq.yaml' -g 1Expected directory structure for track.py:
Base Directory/
├── image_0
├── semantic
└── sp_0MaskRCNN Inference:
python './tracking/maskrcnn_inference.py' -c './config/tracking/maskrcnn_inference.yaml' -g 1SuperPoint Inference:
python './tracking/superpoint_inference.py' -c './config/tracking/superpoint_inference.yaml' -g 1Please modify track.yaml to run global object tracking with different methods and parameters.
Global Object Tracking:
python './tracking/track.py' -c './config/tracking/track.yaml' -g 1To train Graph Attention Encoder: (Please refer to train_gcn.yaml)
python './train/train_gcn.py' -c './config/train_gcn.yaml' -g 1To train NetVLAD: (Please refer to train_netvlad.yaml)
python './train/train_netvlad.py' -c './config/train_netvlad.yaml' -g 1To train SeqNet: (Please refer to train_seqnet.yaml)
python './train/train_seqnet.py' -c './config/train_seqnet.yaml' -g 1To train AirObject: (Please refer to train_airobj.yaml)
python './train/train_airobj.py' -c './config/train_airobj.yaml' -g 1If any ideas from the paper or code from this repo are used, please consider citing:
@inproceedings{keetha2022airobject,
title = {AirObject: A Temporally Evolving Graph Embedding for Object Identification},
author = {Keetha, Nikhil Varma and Wang, Chen and Qiu, Yuheng and Xu, Kuan and Scherer, Sebastian},
booktitle = {CVPR},
year = {2022},
url = {https://arxiv.org/abs/2111.15150}}The code is licensed under the BSD 3-Clause License.
The authors acknowledge the support from the AirLab and Robotics Institute, Carnegie Mellon University. This work was supported by ONR Grant N0014-19-1-2266 and ARL DCIST CRA award W911NF-17-2-0181.
We would also like to acknowledge the AirCode and YT-VOS API code repos.
Please check out the Air Series Articles.
