This repository includes the official implementation of the paper:
Video Individual Counting with Implicit One-to-Many Matching
IEEE International Conference on Image Processing (ICIP), 2025
Xuhui Zhu1, Jing Xu2, Bingjie Wang3, Huikang Dai2, Hao Lu1
1Huazhong University of Science and Technology, China
2FiberHome Telecommunication Technologies Co., Ltd., China
3University of Rochester, Rochester, USA
Video Individual Counting (VIC) aims to estimate pedestrian flux from a video. Existing VIC approaches, however, mainly follow a one-to-one (O2O) matching strategy in which the same pedestrian must be exactly matched across frames, making them sensitive to appearance variations and missed detections. In this work, we show that O2O matching can be relaxed into a one-to-many (O2M) matching problem, which better fits the nature of VIC and exploits the social grouping behavior of walking pedestrians. We therefore introduce OMAN, a simple but effective VIC model with implicit One-to-Many mAtchiNg, featuring an implicit context generator and a one-to-many pairwise matcher. Experiments on the SenseCrowd and CroHD benchmarks show that OMAN achieves state-of-the-art performance.
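For intuition, the toy sketch below contrasts O2M with O2O matching on feature similarities: instead of forcing each pedestrian onto a single exact match, each query is softly associated with several candidates. This is an illustrative sketch only, not OMAN's implicit matcher; the cosine similarity, top-k grouping, and temperature `tau` are assumptions made for exposition.

```python
import torch
import torch.nn.functional as F

def o2m_match(feat_t0, feat_t1, k=3, tau=0.1):
    """Toy one-to-many matching between two frames.

    feat_t0: (N, C) pedestrian features from the earlier frame
    feat_t1: (M, C) pedestrian features from the later frame
    Returns an (N, M) soft assignment with k nonzero entries per row,
    rather than the single hard match an O2O strategy would enforce.
    """
    sim = F.normalize(feat_t0, dim=1) @ F.normalize(feat_t1, dim=1).T
    topk = sim.topk(k=min(k, sim.size(1)), dim=1)    # k candidates per query
    weights = F.softmax(topk.values / tau, dim=1)    # soft weights over candidates
    return torch.zeros_like(sim).scatter_(1, topk.indices, weights)

assign = o2m_match(torch.randn(4, 16), torch.randn(5, 16))
print(assign.sum(dim=1))  # each row distributes weight 1 over its k candidates
```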
Clone and set up the OMAN repository:
```bash
git clone https://github.com/tiny-smart/OMAN
cd OMAN
conda create -n OMAN python=3.9
conda activate OMAN
pip install -r requirements.txt
```
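After installation, you can sanity-check the setup against the versions listed under Environment below (a minimal snippet; the last line prints `True` only if a usable GPU is detected):

```python
import torch
import torchvision

print(torch.__version__)          # expect 2.0.1
print(torchvision.__version__)    # expect 0.15.2
print(torch.cuda.is_available())  # GPU check
```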
- SenseCrowd: Download the dataset from Baidu disk or from the original dataset link.
- Download the ImageNet-pretrained ConvNeXt weights [Baidu disk][Google drive] and put them in the `pretrained` folder. Alternatively, set your own pre-trained model path in `models/backbones/backbone.py`.
- To test OMAN on the SenseCrowd dataset, run
```bash
python test.py
```
- To evaluate the results after testing, run

```bash
python eval_metrics.py
```
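For reference, the sketch below shows how the metrics in the Models table (MAE, MSE, WRAE) are commonly computed in VIC work. It is not necessarily identical to `eval_metrics.py`; we assume here that the reported MSE is the root of the mean squared error and that WRAE weights each video's relative error by its frame count.

```python
import numpy as np

def vic_metrics(pred, gt, num_frames):
    """Per-video counting metrics under the assumed conventions above.

    pred, gt:   predicted / ground-truth individual counts per video
    num_frames: frames per video, used as WRAE weights
    """
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    w = np.asarray(num_frames, float)
    mae = np.abs(pred - gt).mean()
    mse = np.sqrt(((pred - gt) ** 2).mean())                     # reported as MSE
    wrae = (w / w.sum() * np.abs(pred - gt) / gt).sum() * 100.0  # in percent
    return mae, mse, wrae

print(vic_metrics(pred=[95, 210], gt=[100, 200], num_frames=[300, 450]))
```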
- Environment:
  - python==3.9
  - pytorch==2.0.1
  - torchvision==0.15.2
- Models:
| Dataset | Model Link | MAE | MSE | WRAE |
|---|---|---|---|---|
| SenseCrowd | SENSE.pth [Baidu disk][Google drive] | 8.58 | 16.80 | 10.89% |
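To inspect the downloaded weights outside of `test.py`, a minimal sketch follows. The path `pretrained/SENSE.pth` and the `state_dict` key are assumptions; the actual layout depends on the released checkpoint.

```python
import torch

# Load on CPU for inspection; adjust the path to wherever you saved the file.
ckpt = torch.load('pretrained/SENSE.pth', map_location='cpu')
state = ckpt.get('state_dict', ckpt) if isinstance(ckpt, dict) else ckpt
print(len(state), 'tensors')  # number of stored parameter tensors
print(next(iter(state)))      # name of the first parameter
```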
If you find this work helpful for your research, please consider citing:
```bibtex
@INPROCEEDINGS{11084398,
  author={Zhu, Xuhui and Xu, Jing and Wang, Bingjie and Dai, Huikang and Lu, Hao},
  booktitle={2025 IEEE International Conference on Image Processing (ICIP)},
  title={Video Individual Counting with Implicit One-to-Many Matching},
  year={2025},
  pages={61-66},
  keywords={Video individual counting;pedestrian flux;semantic correspondence;one-to-many matching},
  doi={10.1109/ICIP55913.2025.11084398}}
```
This code is for academic purposes only. Contact: Xuhui Zhu (XuhuiZhu@hust.edu.cn)
We thank the authors of CGNet and PET for open-sourcing their work.
