VCA: Vision-Click-Action framework for precise manipulation of segmented objects

This repository is the official implementation of VCA.

Repo Structure

train.py: Train VCA via DDP (multiple GPUs)
train_one.py: Train VCA on one GPU
policy.py: An adaptor for the ACT policy
detr: Model definitions of ACT, modified from DETR
robot/constants.py: Constants shared across files
robot/record_episode_w_mask.py: Collect the robot's state-action data for training
utils.py: Utilities such as data loading and helper functions

Installation

conda create -n act python=3.8.10
conda activate act
pip install -e .

Also install the realtime_sam2 submodule and download the checkpoints:

git submodule update --init --recursive --remote
cd external/tamapp/checkpoints
./download_checkpoints.sh

Example Usages

First, create the ROS 1 package doosan_robot_server, available here: https://github.com/robrosinc/doosan_robot_serverConnect

Data collection:

roscore
roslaunch ros_tcp_endpoint teleop_camera_rviz.launch
python mask_server.py --use_masks
python robot/record_episode_w_mask.py --use_masks --task_name blocksort_mask

Training: train.py uses DDP on multiple GPUs (a DistributedSampler must first be added to the data loading in utils.py):

PYTHONWARNINGS=ignore torchrun --nproc_per_node=2 --master_port=12355 train.py --ckpt_dir /root/checkpoints --policy_class ACT --task_name blocksort_mask --use_masks --save_every 20000 --batch_size 64 --seed 10 --num_steps 200100 --img_obs_size 1 --wandb --lr 1e-4
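
The note above about adding a distributed sampler refers to splitting the dataset so that each DDP process trains on a different shard. In PyTorch this is normally done by passing a torch.utils.data.distributed.DistributedSampler to the DataLoader. The standard-library sketch below only illustrates the sharding idea and is not the code in utils.py; the function name shard_indices and its parameters are hypothetical.

```python
import random


def shard_indices(dataset_len, num_replicas, rank, seed=0, epoch=0):
    """Return the dataset indices one DDP process would iterate over.

    Hypothetical helper: shuffle with a seed shared by all ranks, pad so the
    total divides evenly, then give each rank every num_replicas-th index.
    """
    indices = list(range(dataset_len))
    # Same seed on every rank, so all processes agree on the permutation.
    random.Random(seed + epoch).shuffle(indices)
    # Pad by repeating indices so that every rank gets the same count.
    remainder = len(indices) % num_replicas
    if remainder:
        pad = num_replicas - remainder
        indices += (indices * pad)[:pad]
    # Interleaved split: rank r takes positions r, r + num_replicas, ...
    return indices[rank::num_replicas]


if __name__ == "__main__":
    # Two processes, matching --nproc_per_node=2 in the command above.
    for rank in range(2):
        print(rank, shard_indices(10, num_replicas=2, rank=rank, seed=10))
```

With the real DistributedSampler, the DataLoader should not also be given shuffle=True, and set_epoch is typically called at the start of each epoch so that the shuffle order changes between epochs.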

Use train_one.py for a single GPU:

python train_one.py --ckpt_dir /root/checkpoints --policy_class ACT --task_name blocksort_mask --use_masks --save_every 20000 --batch_size 128 --seed 10 --num_steps 200100 --img_obs_size 1 --wandb --lr 1e-4
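
The two training commands use different --batch_size values (64 on two GPUs, 128 on one). The sketch below shows the arithmetic under two assumptions that are not confirmed by this README: that --batch_size is counted per process under DDP, and that a checkpoint is written at every multiple of --save_every up to --num_steps. Both function names are hypothetical.

```python
def effective_batch_size(per_process_batch, num_processes):
    """Total samples per optimizer step, assuming a per-process batch size."""
    return per_process_batch * num_processes


def checkpoint_steps(num_steps, save_every):
    """Steps at which a checkpoint would be saved, assuming one per multiple."""
    return list(range(save_every, num_steps + 1, save_every))


if __name__ == "__main__":
    # DDP command: 2 processes x 64 samples each.
    print(effective_batch_size(64, 2))
    # Single-GPU command: 1 process x 128 samples.
    print(effective_batch_size(128, 1))
    # Both commands: --num_steps 200100 --save_every 20000.
    print(checkpoint_steps(200100, 20000))
```

Under these assumptions both commands train with the same total batch size, and the last checkpoint falls at step 200000.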

Inference:

roscore
rosrun doosan_robot_server doosan_robot_server_node
roslaunch ros_tcp_endpoint teleop_camera_rviz.launch
python mask_server.py --use_masks
python mask_client_cpp.py --use_masks --ckpt_dir checkpoint/blocksort
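
The inference commands above are long-running processes, so they are presumably started in separate terminals in the order listed. As a convenience, they could also be started from one script. The sketch below is a minimal, untested-on-hardware illustration: the function names and the fixed start-up delay are assumptions, and a real launcher would wait for each service to become ready instead of sleeping.

```python
import shlex
import subprocess
import time


def inference_commands(ckpt_dir="checkpoint/blocksort"):
    """Return the inference commands from this README as argv lists, in order."""
    return [
        shlex.split("roscore"),
        shlex.split("rosrun doosan_robot_server doosan_robot_server_node"),
        shlex.split("roslaunch ros_tcp_endpoint teleop_camera_rviz.launch"),
        shlex.split("python mask_server.py --use_masks"),
        ["python", "mask_client_cpp.py", "--use_masks", "--ckpt_dir", ckpt_dir],
    ]


def launch_in_order(commands, delay_s=3.0):
    """Start each command in the background, pausing between starts.

    The fixed delay is a crude stand-in for a proper readiness check.
    """
    procs = []
    for argv in commands:
        procs.append(subprocess.Popen(argv))
        time.sleep(delay_s)
    return procs


if __name__ == "__main__":
    # Dry run: print the commands instead of starting them.
    for argv in inference_commands():
        print(shlex.join(argv))
```

Calling launch_in_order(inference_commands()) would start the processes; the returned Popen objects can then be terminated in reverse order when the session ends.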

License

This repository was initially forked from the ACT repo (MIT License).

This project includes realtimeSAM2 as a submodule, which was originally forked from https://github.com/Gy920/segment-anything-2-real-time (Apache License 2.0). Modifications were made.
