Bin Li\*¹, Ruichi Zhang\*², Han Liang†³, Jingyan Zhang¹, Juze Zhang⁴, Xin Chen³, Lan Xu¹, Jingyi Yu¹, Jingya Wang†¹,⁵

¹ShanghaiTech University ²University of Pennsylvania ³ByteDance ⁴Stanford University ⁵InstAdapt

\*Equal contribution †Corresponding Author
InterAgent is the first end-to-end framework for text-driven physics-based multi-agent humanoid control.
It introduces:
- 🧠 An autoregressive diffusion transformer
- 🔀 Multi-stream blocks decoupling proprioception, exteroception, and action
- 🔗 A novel interaction graph exteroception representation
- ⚡ Sparse edge-based attention for robust interaction modeling
InterAgent produces coherent, physically plausible, and semantically faithful multi-agent behaviors from only text prompts.
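To make the sparse edge-based attention idea concrete, here is a minimal NumPy sketch in which each node attends only to its neighbors in an interaction graph, rather than to all nodes. All names, shapes, and the edge convention below are our own illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def edge_attention(queries, keys, values, edges):
    """Toy sparse attention: each node attends only to its graph neighbors.

    queries/keys/values: (num_nodes, dim) arrays (hypothetical layout).
    edges: list of (dst, src) pairs defining the interaction graph.
    """
    num_nodes, dim = queries.shape
    out = np.zeros_like(values)
    # Group source nodes by destination so the softmax is per-destination.
    neighbors = {i: [] for i in range(num_nodes)}
    for dst, src in edges:
        neighbors[dst].append(src)
    for dst, srcs in neighbors.items():
        if not srcs:
            continue  # nodes with no incoming edges receive no message
        scores = queries[dst] @ keys[srcs].T / np.sqrt(dim)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()  # softmax restricted to graph edges
        out[dst] = weights @ values[srcs]
    return out
```

Restricting the softmax to graph edges is what keeps the attention sparse: cost scales with the number of edges rather than quadratically with the number of nodes.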
Clone the repository and create the environment:
```bash
git clone https://github.com/BinLee26/InterAgent.git
cd InterAgent
conda create -n interagent python=3.8 -y
conda activate interagent
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -e .
```

Download the training file `interhuman_train.pkl` from HuggingFace and place it under `data/`.
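A quick way to confirm the download is intact is to load it and inspect the top-level structure. The actual contents of `interhuman_train.pkl` are not documented here, so the sketch below uses a placeholder file in place of the real one:

```python
import pickle
from pathlib import Path

def summarize_pkl(path):
    """Load a pickle and report its top-level structure.

    The layout of the real interhuman_train.pkl is an assumption here;
    this only checks that the file deserializes cleanly."""
    with open(path, "rb") as f:
        data = pickle.load(f)
    if isinstance(data, dict):
        return {"type": "dict", "keys": sorted(data.keys()), "size": len(data)}
    return {"type": type(data).__name__, "size": len(data)}

# Demonstration on a dummy file standing in for data/interhuman_train.pkl.
dummy = Path("dummy_train.pkl")
dummy.write_bytes(pickle.dumps({"motions": [], "texts": []}))
print(summarize_pkl(dummy))  # prints a short structure summary
dummy.unlink()
```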
We rely on the official ASE implementation to obtain tracking state-action pairs.
If you wish to track your own motion data:
- Modify ASE to support two agents
- You may refer to the implementation in `./inference` for a working example of the two-agent setup
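As a very rough sketch of what tracking two agents in lockstep might involve, one can keep each agent's state-action rollouts side by side in a shared batch. Everything below is hypothetical scaffolding, not actual ASE code:

```python
import numpy as np

def pack_two_agent_batch(states_a, actions_a, states_b, actions_b):
    """Hypothetical helper: align two agents' tracking rollouts into one
    batch of paired state-action samples (not taken from ASE itself)."""
    assert len(states_a) == len(states_b), "agents must be tracked in lockstep"
    states = np.stack([states_a, states_b], axis=1)    # (T, 2, state_dim)
    actions = np.stack([actions_a, actions_b], axis=1)  # (T, 2, act_dim)
    return states, actions
```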
After downloading the data, run:
```bash
python training/interagent/dataset/dataset_create_lmdb.py
```

To train the policy network:
```bash
bash scripts/train.sh
```

You may modify hyperparameters in the corresponding config files if needed.
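Conceptually, the LMDB conversion boils down to serializing each training sample to bytes under an indexed key. The sketch below shows that packing step with the standard library only; the key format is a guess at a common convention, and the actual `dataset_create_lmdb.py` layout may differ:

```python
import pickle

def serialize_samples(samples):
    """Sketch of the key/value packing an LMDB conversion typically does.

    Each sample is pickled under a zero-padded index key; a real converter
    would then write these records through an lmdb write transaction."""
    records = {}
    for i, sample in enumerate(samples):
        key = f"{i:08d}".encode("ascii")  # LMDB keys and values are bytes
        records[key] = pickle.dumps(sample)
    return records
```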
Follow the setup instructions in the official ASE repository to configure the simulation environment.
Download the pretrained checkpoint from HuggingFace and place it in `checkpoint/`.
```bash
cd inference/multi-agent
bash infer.sh
```

If you find our work useful, please consider citing:
```bibtex
@article{li2025interagent,
  title={InterAgent: Physics-based Multi-agent Command Execution via Diffusion on Interaction Graphs},
  author={Li, Bin and Zhang, Ruichi and Liang, Han and Zhang, Jingyan and Zhang, Juze and Chen, Xin and Xu, Lan and Yu, Jingyi and Wang, Jingya},
  journal={arXiv preprint arXiv:2512.07410},
  year={2025}
}
```

This project partially builds upon several excellent open-source works.
We sincerely thank the authors for making their code publicly available.
Please refer to the LICENSE file for details.
