Toward Ambulatory Vision: Learning Visually-Grounded Active View Selection

Juil Koo* · Daehyeon Choi* · Sangwoo Youn* · Phillip Y. Lee · Minhyuk Sung

(* Equal Contribution)

KAIST

arXiv 2025

Paper PDF · Project Page · AVS-Dataset

[VG-AVS teaser figure]

TL;DR

We introduce the Visually Grounded Active View Selection (VG-AVS) framework, which enables embodied agents to actively adjust their viewpoint for better visual question answering using only current visual cues, achieving state-of-the-art performance on synthetic and real-world benchmarks.

Release Checklist

🚧 Pretrained model checkpoints (SFT, SFT+GRPO). (Expected: early January)

✅ AVS-ProcTHOR & AVS-HM3D datasets, training/inference/evaluation code. (12.24)

Code

1. Environment Setup

We tested our code with CUDA 12.8 on NVIDIA H200 GPUs, but it should also work with other CUDA versions and GPU devices.

Conda Environment

Clone this repository:

git clone https://github.com/KAIST-Visual-AI-Group/VG-AVS.git
cd VG-AVS
# Initialize a virtual environment (we use conda).
conda create --name avs python=3.11 -y 
conda activate avs

# First, install a PyTorch build that matches your GPU. We used 2.8.0+cu128.
pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu128 

# Install the remaining dependencies.
bash setup.sh 
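
As a quick, optional sanity check (not part of the official setup), you can verify that the installed PyTorch build detects CUDA before proceeding:

# Optional: confirm that PyTorch was installed with CUDA support
python -c "import torch; print(torch.__version__); print('CUDA available:', torch.cuda.is_available())"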

2. Download data

ProcTHOR

We release the ProcTHOR training and evaluation data on Hugging Face; please download the archives and move them into your project folder.
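
One convenient way to fetch the archives is the Hugging Face CLI. The sketch below is illustrative; <dataset-repo-id> is a placeholder for the actual AVS-Dataset repository linked above.

# Illustrative download via the Hugging Face CLI (replace <dataset-repo-id> with the actual dataset repo)
pip install -U "huggingface_hub[cli]"
huggingface-cli download <dataset-repo-id> --repo-type dataset --local-dir .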

# Move the downloaded archives into the 'data' folder
mv avs_procthor_train.tar.gz avs_procthor_existence.tar.gz avs_procthor_counting.tar.gz avs_procthor_state.tar.gz ./data/

# Extract the archives inside the 'data' folder
cd data
tar -xvf avs_procthor_train.tar.gz
tar -xvf avs_procthor_existence.tar.gz
tar -xvf avs_procthor_counting.tar.gz
tar -xvf avs_procthor_state.tar.gz
cd ..

HM3D

For the HM3D dataset, please first download the data by following the official instructions (Habitat-Matterport3D).

# After authorizing access, download the 'v0.2/val' splits and move them into the data folder.
mkdir -p ./data/hm3d/val
mv hm3d-val-semantic-configs-v0.2.tar hm3d-val-semantic-annots-v0.2.tar hm3d-val-habitat-v0.2.tar hm3d-val-glb-v0.2.tar ./data/hm3d/val/

# Extract the archives
cd ./data/hm3d/val 
tar -xvf hm3d-val-semantic-configs-v0.2.tar
tar -xvf hm3d-val-semantic-annots-v0.2.tar
tar -xvf hm3d-val-habitat-v0.2.tar
tar -xvf hm3d-val-glb-v0.2.tar
cd ../../..  # return to the project root

Then, additionally download our data snapshot from Hugging Face and move it into the 'data' folder.

# Move the snapshot into the 'data' folder and extract it
mv avs_hm3d.tar.gz ./data/
tar -xvf ./data/avs_hm3d.tar.gz -C ./data/

Finally, the folder structure should look like this:

data/
├── hm3d/
│   └── val/
│       ├── 00800-TEEsavR23oF/
│       └── 00YYY-zzzzzzzzzzz/
├── avs_procthor_train/
├── avs_procthor_existence/
├── avs_procthor_counting/
├── avs_procthor_state/
├── avs_hm3d/
├── avs_hm3d_overall.jsonl
└──...
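
Optionally, a quick shell check (ours, not part of the repository) can confirm that the expected folders are in place:

# Optional: verify that the expected data folders exist
for d in hm3d/val avs_procthor_train avs_procthor_existence avs_procthor_counting avs_procthor_state avs_hm3d; do
  [ -d "data/$d" ] && echo "ok:      data/$d" || echo "missing: data/$d"
done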

3. Setting Simulation Environment

Habitat-Sim

export CMAKE_POLICY_VERSION_MINIMUM=3.5

# Build habitat-sim from source; this may take several minutes.
git clone --branch stable https://github.com/facebookresearch/habitat-sim.git
cd habitat-sim
pip install . -v
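
To quickly confirm that the build succeeded, you can try importing the package (an informal check, not part of the official instructions):

# Optional: confirm habitat-sim is importable after the build
python -c "import habitat_sim; print('habitat-sim imported successfully')"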

Sanity Check

Our framework uses two different simulation environments (AI2-THOR and HM3D), so before running the code, please verify that each environment works properly in your setup.

Please follow notebook/environment_check.ipynb.
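
If you want a one-line smoke test before opening the notebook, something like the following may work for AI2-THOR (an informal check; on headless machines a virtual display such as Xvfb may be required):

# Optional: minimal AI2-THOR smoke test (the notebook above remains the authoritative check)
python -c "from ai2thor.controller import Controller; c = Controller(scene='FloorPlan1'); print(c.last_event.metadata['sceneName']); c.stop()"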

4. Download Pretrained model

We will release our pretrained models soon. Stay tuned. :)

5. Run

Before running training or evaluation scripts, you need to configure the following paths and API keys.

Configuration

Required Paths

| Variable | Description | Example |
| --- | --- | --- |
| PROJECT_ROOT | Root directory of the project | /home/user/VG-AVS |
| DATA_JSONL | Path to the training/evaluation JSONL file | /path/to/data/avs_procthor_train.jsonl |
| IMG_ROOT | Root directory containing images | ${PROJECT_ROOT}/data |
| MODEL_PATH | Path to the trained model (for evaluation) | ${PROJECT_ROOT}/src/open-r1-multimodal/output/grpo-procthor |
API Keys (for Evaluation)

The evaluation scripts use LLM APIs for the verifier model. Set these environment variables:

export GEMINI_API_KEY="your_gemini_api_key"   # Required for Gemini verifier
export OPENAI_API_KEY="your_openai_api_key"   # Optional, for GPT verifier
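
For reference, a typical configuration might look like the sketch below. Whether each run script reads these as environment variables or expects you to edit them inside the script depends on the script itself, so treat the paths as illustrative placeholders.

# Illustrative configuration (adjust paths to your setup)
export PROJECT_ROOT=/home/user/VG-AVS
export DATA_JSONL=${PROJECT_ROOT}/data/avs_procthor_train.jsonl
export IMG_ROOT=${PROJECT_ROOT}/data
export MODEL_PATH=${PROJECT_ROOT}/src/open-r1-multimodal/output/grpo-procthor
export GEMINI_API_KEY="your_gemini_api_key"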

Tutorial

You can easily test our framework in the ProcTHOR environment.

bash src/open-r1-multimodal/run_scripts/test_procthor_single_sample.sh

Training

SFT Training (Supervised Fine-tuning):

bash src/open-r1-multimodal/run_scripts/run_sft_procthor_active_qa.sh

GRPO Training (Reinforcement Learning):

# Set required paths
bash src/open-r1-multimodal/run_scripts/run_grpo_procthor_active_qa.sh 

Evaluation

ProcTHOR Evaluation:

# Set required paths and API keys
bash src/open-r1-multimodal/run_scripts/test_procthor_action_accuracy.sh

HM3D Evaluation:

bash src/open-r1-multimodal/run_scripts/test_hm3d_action_accuracy.sh

Acknowledgement

Our implementation is built upon amazing projects, including Qwen2.5-VL, VLM-R1, AI2-THOR, and Habitat-Sim. We sincerely thank all the authors and contributors for open-sourcing their code and model checkpoints.
