"SAM-DAQ: Segment Anything Model with Depth-guided Adaptive Queries for RGB-D Video Salient Object Detection"
by Jia Lin, Xiaofei Zhou, Jiyuan Liu, Runmin Cong, Guodao Zhang, Zhi Liu and Jiyong Zhang
Accepted at AAAI Conference on Artificial Intelligence (AAAI 2026), Poster Track
📑 Paper (arXiv) | 🌐 Project Page
We propose SAM-DAQ, which adapts SAM for fully automatic segmentation by seamlessly integrating depth and temporal cues within a unified framework.
Key Highlights:
- 💡 Depth-Guided Adaptive Adapter: enables prompt-free fine-tuning with minimal memory consumption while facilitating effective RGB-D fusion (a minimal sketch of the idea follows this list).
- 🧩 Query-Based Memory: provides efficient online temporal modeling.
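The snippet below is a minimal PyTorch sketch of the adapter idea in the first highlight: a small bottleneck module, conditioned on depth tokens, residually updates the RGB tokens of a frozen SAM encoder block so that only the adapter parameters need to be trained. All names and shapes here (DepthGuidedAdapter, dim, reduction) are illustrative assumptions, not the repository's actual implementation.

```python
# Minimal sketch of a depth-guided adapter (illustrative only; names and
# shapes are assumptions, not the actual SAM-DAQ implementation).
import torch
import torch.nn as nn


class DepthGuidedAdapter(nn.Module):
    """Lightweight bottleneck adapter that modulates frozen RGB features
    with depth features, so only the adapter parameters are fine-tuned."""

    def __init__(self, dim: int, reduction: int = 4):
        super().__init__()
        hidden = dim // reduction
        self.down = nn.Linear(dim, hidden)        # bottleneck projection of RGB tokens
        self.depth_proj = nn.Linear(dim, hidden)  # projection of depth tokens
        self.act = nn.GELU()
        self.up = nn.Linear(hidden, dim)          # back to the encoder width

    def forward(self, rgb_tokens: torch.Tensor, depth_tokens: torch.Tensor) -> torch.Tensor:
        # rgb_tokens, depth_tokens: (B, N, dim) token sequences
        h = self.act(self.down(rgb_tokens) + self.depth_proj(depth_tokens))
        # residual update keeps the frozen encoder path intact
        return rgb_tokens + self.up(h)


if __name__ == "__main__":
    adapter = DepthGuidedAdapter(dim=256)
    rgb = torch.randn(2, 1024, 256)    # e.g. tokens from a frozen SAM encoder block
    depth = torch.randn(2, 1024, 256)  # tokens from a depth branch
    print(adapter(rgb, depth).shape)   # torch.Size([2, 1024, 256])
```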
- Required pretrained model: sam2.1_hiera_large.pt (from facebookresearch/sam2)
- Models we provide: Quark Drive (夸克网盘), Google Drive
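As a quick sanity check that the SAM2 checkpoint is in place, the backbone can be built with the sam2 package as shown in the facebookresearch/sam2 README; the paths below are assumptions and should be adapted to wherever this repository expects the checkpoint.

```python
# Sanity check that the SAM2 checkpoint loads correctly
# (usage follows the facebookresearch/sam2 README; adjust paths as needed).
import torch
from sam2.build_sam import build_sam2

checkpoint = "./checkpoints/sam2.1_hiera_large.pt"   # assumed checkpoint location
model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"     # config shipped with the sam2 package
device = "cuda" if torch.cuda.is_available() else "cpu"

sam2_model = build_sam2(model_cfg, checkpoint, device=device)
print(sum(p.numel() for p in sam2_model.parameters()) / 1e6, "M parameters")
```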
Use scripts/train.sh and scripts/test.sh to run training and inference, respectively.
All saliency results are measured with the DVSOD evaluation tool (DVSOD/DVSOD-Evaluation).
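For the reported numbers, please use the DVSOD-Evaluation tool above; the snippet below is only an illustrative sanity check that computes MAE, one of the standard saliency metrics, for a single prediction/ground-truth pair (file paths are placeholders).

```python
# Quick sanity check of one standard SOD metric (MAE); for the paper's
# numbers, use the DVSOD/DVSOD-Evaluation tool. Paths are placeholders.
import numpy as np
from PIL import Image


def mae(pred_path: str, gt_path: str) -> float:
    """Mean absolute error between a saliency map and its ground truth, both scaled to [0, 1]."""
    pred = np.asarray(Image.open(pred_path).convert("L"), dtype=np.float32) / 255.0
    gt = np.asarray(Image.open(gt_path).convert("L"), dtype=np.float32) / 255.0
    assert pred.shape == gt.shape, "prediction and ground truth must share the same resolution"
    return float(np.abs(pred - gt).mean())


# Example (placeholder paths):
# print(mae("results/video_001/frame_0001.png", "gt/video_001/frame_0001.png"))
```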
The benchmark results of our work can be accessed at:
This work builds on MemSAM and SAM2. We thank the authors of these projects for their open-source contributions!
If you find our work useful, please cite our paper. Thank you!
```bibtex
@article{lin2025sam,
  title={SAM-DAQ: Segment Anything Model with Depth-guided Adaptive Queries for RGB-D Video Salient Object Detection},
  author={Lin, Jia and Zhou, Xiaofei and Liu, Jiyuan and Cong, Runmin and Zhang, Guodao and Liu, Zhi and Zhang, Jiyong},
  journal={arXiv preprint arXiv:2511.09870},
  year={2025}
}
```
