This repository contains the code accompanying the paper "Enhancing Video-Based Robot Failure Detection Using Task Knowledge" which was presented at the European Conference on Mobile Robots 2025. We propose to make use of task knowledge to improve performance of video-based robot failure detection. We use the temporal boundaries of the robot’s actions and the location of task-relevant objects to guide frame selection and pre-processing before using a video-classification model for failure detection.
Install the requirements using `pip install -r requirements.txt`.
The MViT checkpoint (`Kinetics/MVIT_B_32x3_CONV_K600`) can be found here.
We train the baseline for 5 epochs, load that checkpoint via the `--partial_ckpt` argument, and then train all models (including the baseline) for another 10 epochs. Alternatively, you can simply train your model for 15 epochs to obtain similar results.
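Conceptually, loading a partial checkpoint amounts to restoring only the saved weights that still match the current model. A minimal, framework-agnostic sketch of that idea (the function name and dict-based states are illustrative, not the repository's actual implementation):

```python
def merge_partial_checkpoint(model_state: dict, ckpt_state: dict) -> dict:
    """Overwrite model entries with checkpoint entries that share a key;
    parameters missing from the checkpoint keep their current (e.g. freshly
    initialised) values, and extra checkpoint keys are ignored."""
    compatible = {k: v for k, v in ckpt_state.items() if k in model_state}
    return {**model_state, **compatible}

# toy example: the head is new, so only the backbone weight is restored
model = {"backbone.w": 0.0, "head.w": 0.5}
ckpt = {"backbone.w": 1.7, "old_head.w": 9.9}
print(merge_partial_checkpoint(model, ckpt))  # {'backbone.w': 1.7, 'head.w': 0.5}
```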
The following command will run the training and test script for the baseline model:
```bash
python main.py \
  --data_root=/path/to/armbench-defects-video-0.1/ \
  --dataset=armbench_video \
  --training_trials=/path/to/armbench-defects-video-0.1/train.json \
  --val_trials=/path/to/armbench-defects-video-0.1/test_subset.json \
  --test_trials=/path/to/armbench-defects-video-0.1/test.json \
  --mvit_config=configs/MVIT_B_32x3_CONV.yaml \
  --mvit_ckpt=/path/to/k600.pyth \
  --batch_size=1 \
  --accumulate_grad_batches=16 \
  --n_threads=16 \
  --learning_rate=0.00003 \
  --max_epochs=10 \
  --enable_progress_bar \
  --partial_ckpt=/path/to/previous_ckpt/last.ckpt \
  --log_dir=logs
```
Add/replace the following options for the different variants mentioned in the paper:
- Action Subset: `--action_subset_frame_selection`
- Action-based Crop: `--action_crop`
- Action-aligned FPS augmentation: `--action_aligned_fps_aug`
- Random FPS augmentation: `--non_action_aligned_fps_aug`
- Treat each action as a separate sample: `--actions_separately`
- Image Pair Model: `--dataset=armbench_img_pair`
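As a rough illustration of what action-based frame selection does, the sketch below picks evenly spaced frame indices restricted to one action's temporal boundaries, so the clip fed to the model covers only the selected action (a simplification of the repository's actual sampling; names are illustrative):

```python
def action_subset_indices(action_start: int, action_end: int, n_frames: int) -> list:
    """Evenly spaced frame indices inside [action_start, action_end),
    guided by the action's temporal boundaries rather than the full video."""
    span = action_end - action_start
    if n_frames == 1:
        return [action_start]
    step = (span - 1) / (n_frames - 1)  # stride between sampled frames
    return [action_start + round(i * step) for i in range(n_frames)]

# e.g. 8 frames from an action spanning frames 120..240
idx = action_subset_indices(120, 240, 8)  # [120, 137, 154, 171, 188, 205, 222, 239]
```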
The following command will run the training and test script for the FAILURE (FINO-Net) dataset:

```bash
python main.py \
  --data_root=/path/to/finonet/ \
  --dataset=failure_video \
  --num_classes=2 \
  --mvit_config=configs/MVIT_B_32x3_CONV.yaml \
  --mvit_ckpt=/path/to/k600.pyth \
  --batch_size=1 \
  --accumulate_grad_batches=16 \
  --n_threads=16 \
  --learning_rate=0.0001 \
  --max_epochs=50 \
  --enable_progress_bar \
  --log_dir=logs
```
- Action-aligned FPS augmentation: `--action_aligned_fps_aug`
- Image Pair Model: `--dataset=failure_img_pair`
The following command will run the training and test script for the Imperfect Pour dataset:

```bash
python main.py \
  --data_root=/path/to/ImperfectPour/ \
  --dataset=imperfect_pour \
  --num_classes=2 \
  --mvit_config=configs/MVIT_B_32x3_CONV.yaml \
  --mvit_ckpt=/path/to/k600.pyth \
  --batch_size=1 \
  --accumulate_grad_batches=16 \
  --n_threads=16 \
  --learning_rate=0.0001 \
  --max_epochs=50 \
  --enable_progress_bar \
  --log_dir=logs
```
- Action-aligned FPS augmentation: `--action_aligned_fps_aug`
- Image Pair Model: `--dataset=imperfect_pour_img_pair`
Please cite the paper as follows:
```bibtex
@inproceedings{thoduka2025enhancing,
  title={{Enhancing Video-Based Robot Failure Detection Using Task Knowledge}},
  author={Thoduka, Santosh and Houben, Sebastian and Gall, Juergen and Pl{\"o}ger, Paul G.},
  booktitle={2025 European Conference on Mobile Robots (ECMR)},
  year={2025},
  pages={1-6},
  doi={10.1109/ECMR65884.2025.11162998},
  organization={IEEE}
}
```