LongCat-Video-Avatar

🔥 Latest News

Dec 16, 2025: 🚀 We are excited to announce the release of LongCat-Video-Avatar, a unified model that delivers expressive and highly dynamic audio-driven character animation, supporting native tasks including Audio-Text-to-Video, Audio-Text-Image-to-Video, and Video Continuation with seamless compatibility for both single-stream and multi-stream audio inputs. The release includes our Technical Report, code, model weights, and project page.

Support Multiple Generation Modes: a single unified model handles audio-text-to-video (AT2V) generation, audio-text-image-to-video (ATI2V) generation, and video continuation.
Natural Human Dynamics: Disentangled unconditional guidance decouples speech signals from motion dynamics, producing natural behavior.
Avoid Repetitive Content: Reference skip attention selectively incorporates reference cues, preserving identity while preventing excessive leakage from the conditioning image.
Alleviate Error Accumulation from VAE: Cross-Chunk Latent Stitching eliminates redundant VAE decode-encode cycles, reducing pixel degradation in long sequences.
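
The feature list does not say how the guidance terms are computed. As a point of reference only, here is a minimal sketch of standard multi-condition classifier-free guidance, in which text and audio each receive their own guidance weight. This is a generic, well-known technique and is not the disentangled unconditional guidance described in the Technical Report. The function name `multi_condition_guidance` and the choice of three noise predictions (unconditional, text-only, text+audio) are hypothetical assumptions made for illustration.

```python
import numpy as np


def multi_condition_guidance(eps_uncond, eps_text, eps_text_audio,
                             text_scale, audio_scale):
    """Hypothetical generic multi-condition classifier-free guidance.

    eps_uncond:     noise prediction with neither text nor audio conditioning
    eps_text:       noise prediction conditioned on text only
    eps_text_audio: noise prediction conditioned on text and audio
    """
    eps_uncond = np.asarray(eps_uncond, dtype=float)
    eps_text = np.asarray(eps_text, dtype=float)
    eps_text_audio = np.asarray(eps_text_audio, dtype=float)

    # Text guidance: move away from the fully unconditional prediction.
    text_term = text_scale * (eps_text - eps_uncond)
    # Audio guidance: isolate what audio adds on top of text alone,
    # so its strength can be tuned independently of the text weight.
    audio_term = audio_scale * (eps_text_audio - eps_text)
    return eps_uncond + text_term + audio_term
```

With both scales set to 1.0 the result reduces to `eps_text_audio`, and with both set to 0.0 it reduces to `eps_uncond`; raising `audio_scale` alone strengthens the audio signal without changing the text guidance.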
