AnimaSync

Voice-driven 3D avatar animation engine for the browser.

Extracts emotion from speech and generates lip sync, facial expressions, and body motion in real time — entirely client-side via Rust/WASM.

Features

Voice → Full-body Animation
Not just lip sync. Analyzes speech to generate lip movements, emotional facial expressions, eye blinks, and body poses, all from a single audio stream.

Emotion-aware Expressions
Automatically maps vocal characteristics to facial expressions. Eyebrow raises, smile intensity, jaw dynamics, and blink patterns respond to how things are said, not just what is said.

Built-in Body Motion
Embedded VRMA bone animation clips (idle / speaking poses) with automatic crossfade. Your avatar breathes, shifts weight, and moves naturally out of the box.

Browser-native WASM
No server needed. The entire pipeline runs in the browser at 30fps with near-native performance via Rust → WebAssembly. ARKit-compatible 52- or 111-dim output.

Real-time Streaming
AudioWorklet-based microphone capture with ~300ms latency. Feed live mic, TTS, or recorded audio and get animated avatar frames back in real time.

Plug & Play
3 lines of code to go from audio to animated avatar. 30-day free trial, no signup. First-class Three.js + VRM integration.

What AnimaSync Does

Most lip sync engines stop at mouth shapes. AnimaSync goes further — it treats voice as the complete animation source:

| Layer | What it generates | How |
| --- | --- | --- |
| Lip Sync | Mouth shapes matching phonemes | ONNX inference → ARKit blendshapes (jaw, mouth, tongue) |
| Facial Expression | Emotion-driven brows, cheeks, eyes | Voice energy & pitch → expression mapping + anatomical constraints |
| Eye Animation | Natural blinks, micro-movements | Stochastic blink injection (2.5–4.5s intervals, 15% double-blink) |
| Body Motion | Idle breathing, speaking gestures | Embedded VRMA bone clips with automatic idle ↔ speaking crossfade |

One audio stream in → a fully animated 3D avatar out.
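
To make the "stochastic blink injection" row concrete, here is a minimal sketch of how blink times could be scheduled with the stated 2.5–4.5s intervals and 15% double-blink chance. It is an illustration only, not AnimaSync's implementation: `scheduleBlinks`, `BlinkEvent`, and the uniform distribution of gaps are all assumptions.

```typescript
// Hypothetical sketch of stochastic blink scheduling; not AnimaSync's code.
type Rng = () => number; // returns a value in [0, 1)

interface BlinkEvent {
  timeSec: number; // when the blink starts
  double: boolean; // true if a second blink should follow right away
}

function scheduleBlinks(durationSec: number, rng: Rng = Math.random): BlinkEvent[] {
  const MIN_GAP = 2.5; // seconds, from the table above
  const MAX_GAP = 4.5; // seconds, from the table above
  const DOUBLE_CHANCE = 0.15; // 15% double-blink

  const events: BlinkEvent[] = [];
  // Assumption: gaps are drawn uniformly between MIN_GAP and MAX_GAP.
  let t = MIN_GAP + rng() * (MAX_GAP - MIN_GAP);
  while (t < durationSec) {
    events.push({ timeSec: t, double: rng() < DOUBLE_CHANCE });
    t += MIN_GAP + rng() * (MAX_GAP - MIN_GAP);
  }
  return events;
}
```

Passing the random source as a parameter keeps the schedule reproducible in tests; in a browser the default `Math.random` is enough.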

Quick Start

Install

# V2 recommended for most use cases
npm install @goodganglabs/lipsync-wasm-v2

# V1 for full 111-dim expression control
npm install @goodganglabs/lipsync-wasm-v1

Peer dependency: onnxruntime-web >= 1.17.0

Minimal Example

import { LipSyncWasmWrapper } from '@goodganglabs/lipsync-wasm-v2';

const lipsync = new LipSyncWasmWrapper();
await lipsync.init(); // 30-day free trial — no key needed

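// audioFile: a File, e.g. from an <input type="file"> element.
// applyToYourAvatar: your own function that sets the avatar's blendshape weights.
// One call returns lip sync + expressions + blinks, all at once.
const result = await lipsync.processFile(audioFile);
for (let i = 0; i < result.frame_count; i++) {
  const frame = lipsync.getFrame(result, i); // number[52], one full-face frame
  applyToYourAvatar(frame);
}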

CDN (No Bundler)

<script src="https://cdn.jsdelivr.net/npm/onnxruntime-web@1.17.0/dist/ort.min.js"></script>
<script type="module">
  const CDN = 'https://cdn.jsdelivr.net/npm/@goodganglabs/lipsync-wasm-v2@latest';
  const { LipSyncWasmWrapper } = await import(`${CDN}/lipsync-wasm-wrapper.js`);

  const lipsync = new LipSyncWasmWrapper({ wasmPath: `${CDN}/lipsync_wasm_v2.js` });
  await lipsync.init();
  // Ready to process audio
</script>

Examples

Working examples you can run locally — zero npm install, all loaded from CDN.

| Example | Description | Source |
| --- | --- | --- |
| V1 Data | V1 phoneme engine: 52 ARKit blendshapes visualization, ONNX inference, playback. | index.html |
| V2 Data | V2 student model: 52 ARKit blendshapes direct prediction, crisp mouth. | index.html |
| V1 vs V2 | Side-by-side dual avatar comparison. Same voice, two animation engines. | index.html |

Run any example:

cd examples/vanilla-basic   # or vanilla-avatar, vanilla-comparison
npx serve .                  # or: python3 -m http.server 8080

V1 vs V2

| | V2 (Recommended) | V1 (Full Control) |
| --- | --- | --- |
| npm | `@goodganglabs/lipsync-wasm-v2` | `@goodganglabs/lipsync-wasm-v1` |
| Output | 52-dim ARKit blendshapes | 111-dim ARKit blendshapes |
| Model | Student distillation (direct prediction) | Phoneme classification → viseme mapping |
| Post-processing | crisp_mouth + fade + auto-blink | OneEuroFilter + anatomical constraints |
| Expression generation | Blink injection in post-process | Built-in `IdleExpressionGenerator` (blinks + micro-expressions) |
| Voice activity | Not included | Built-in `VoiceActivityDetector` (body pose switching) |
| ONNX fallback | None (ONNX required) | Heuristic mode (energy-based) |
| Body motion | VRMA idle/speaking (both versions) | VRMA idle/speaking + VAD auto-switch |
| Best for | Most projects, quick integration | Full expression control, custom avatars |
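
Since V2 does not include voice activity detection, an application that wants idle ↔ speaking switching may need its own detector. The following is a rough sketch of an energy-based detector with hysteresis, and is not the built-in `VoiceActivityDetector`. The names `rms` and `createEnergyVad`, the thresholds, and the hangover length are all assumptions chosen for illustration.

```typescript
// Hypothetical energy-based VAD sketch; thresholds are illustrative guesses.

// Root-mean-square energy of one audio chunk (samples in [-1, 1]).
function rms(chunk: Float32Array): number {
  if (chunk.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < chunk.length; i++) sum += chunk[i] * chunk[i];
  return Math.sqrt(sum / chunk.length);
}

// Returns an update function: feed it chunks, get back "is speaking".
function createEnergyVad(
  onThreshold = 0.02, // energy at or above this switches to speaking
  offThreshold = 0.01, // energy below this counts as a quiet chunk
  hangoverChunks = 5, // quiet chunks required before switching to idle
): (chunk: Float32Array) => boolean {
  let speaking = false;
  let quiet = 0;
  return (chunk: Float32Array): boolean => {
    const energy = rms(chunk);
    if (energy >= onThreshold) {
      speaking = true;
      quiet = 0;
    } else if (speaking && energy < offThreshold) {
      quiet += 1;
      if (quiet >= hangoverChunks) {
        speaking = false;
        quiet = 0;
      }
    }
    // Energy between the two thresholds leaves the state unchanged.
    return speaking;
  };
}
```

The two thresholds plus the hangover count keep the pose from flickering on short pauses between words.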

Architecture

┌─────────────────────────────────────────────────────────────────────┐
│  Browser                                                             │
│                                                                      │
│  Audio Source (File / Mic / TTS)                                     │
│       │                                                              │
│       ▼                                                              │
│  ┌──────────┐    ┌────────────┐    ┌──────────────────────────────┐ │
│  │   WASM   │    │    ONNX    │    │           WASM               │ │
│  │ Feature  │───▶│ Inference  │───▶│  Post-processing             │ │
│  │ Extract  │    │   (JS)     │    │  + Expression mapping        │ │
│  └──────────┘    └────────────┘    └────────────┬─────────────────┘ │
│                                                  │                   │
│                        ┌─────────────────────────┼────────────┐     │
│                        │                         │            │     │
│                        ▼                         ▼            ▼     │
│                   Lip Sync              Facial Expression   Blinks  │
│                 (jaw, mouth,          (brows, cheeks,     (natural  │
│                  tongue)               smile, frown)     stochastic)│
│                        │                         │            │     │
│                        └─────────────┬───────────┘            │     │
│                                      ▼                        │     │
│                           52/111-dim ARKit Blendshapes @30fps │     │
│                                      │  ◄─────────────────────┘     │
│                                      ▼                              │
│                        ┌──────────────────────────┐                 │
│                        │  VRMA Bone Animation      │                │
│                        │  idle ↔ speaking crossfade │                │
│                        │  (body pose + gestures)   │                │
│                        └────────────┬─────────────┘                 │
│                                     ▼                               │
│                           3D Avatar (Three.js / Babylon / Unity)    │
└─────────────────────────────────────────────────────────────────────┘
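
The idle ↔ speaking crossfade box in the diagram can be pictured as a single weight that moves toward 0 (idle) or 1 (speaking) a little each frame. This is a minimal sketch of that idea, not the engine's code: `stepCrossfade`, `poseWeights`, and the 0.4s default fade time are assumptions.

```typescript
// Hypothetical crossfade sketch; the fade duration is an illustrative guess.
interface PoseWeights {
  idle: number;
  speaking: number;
}

// Move the speaking weight toward its target by at most dtSec / fadeSec.
function stepCrossfade(current: number, speaking: boolean, dtSec: number, fadeSec = 0.4): number {
  const target = speaking ? 1 : 0;
  const maxStep = fadeSec > 0 ? dtSec / fadeSec : 1;
  const delta = target - current;
  const step = Math.max(-maxStep, Math.min(maxStep, delta));
  return Math.min(1, Math.max(0, current + step));
}

// The two clip weights always sum to 1.
function poseWeights(speakingWeight: number): PoseWeights {
  return { idle: 1 - speakingWeight, speaking: speakingWeight };
}
```

In a Three.js setup, weights like these could be applied to the two clips' actions with `AnimationAction.setEffectiveWeight`, assuming each VRMA clip has been loaded into its own action.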

V2 Pipeline

Audio 16kHz PCM
  → [WASM] librosa-compatible features: 141-dim @30fps
  → [JS]   ONNX student model → 52-dim (lip sync + expressions)
  → [WASM] crisp_mouth (mouth sharpening) → fade_in_out (natural onset/offset)
  → [WASM] add_blinks (stochastic eye animation)
  → [WASM] Preset blending: expression channels (brows, eyes) blended with lip sync
  → [VRMA] Bone animation: idle ↔ speaking pose auto-crossfade
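
As an illustration of the `fade_in_out` step, the sketch below applies a linear ramp to the first and last few frames of a flat blendshape array so the face eases in and out of speech. The actual WASM step may work differently; the function name `fadeInOut`, the linear ramp, and the default of 5 fade frames are assumptions.

```typescript
// Hypothetical fade sketch over a flat frame-major array (frameCount × dim).
function fadeInOut(
  blendshapes: number[],
  frameCount: number,
  dim: number,
  fadeFrames = 5, // assumed ramp length, in frames
): number[] {
  const out = blendshapes.slice();
  // Never let the fade-in and fade-out ramps overlap on short clips.
  const n = Math.min(fadeFrames, Math.floor(frameCount / 2));
  for (let f = 0; f < n; f++) {
    const gain = f / n; // 0 on the outermost frame, rising toward 1
    const tail = frameCount - 1 - f; // mirrored frame at the end
    for (let d = 0; d < dim; d++) {
      out[f * dim + d] *= gain;
      out[tail * dim + d] *= gain;
    }
  }
  return out;
}
```

The input array is copied, so the original result stays untouched.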

V1 Pipeline

Audio 16kHz PCM
  → [WASM] MFCC extraction: 13-dim @100fps
  → [JS]   ONNX inference: 61 phonemes → 22 visemes
  → [WASM] Viseme → 111-dim ARKit blendshapes (lip + expression + extras)
  → [WASM] FPS conversion: 100fps → 30fps
  → [WASM] Anatomical constraints (bilateral symmetry + jaw correction)
  → [WASM] OneEuroFilter (temporal smoothing for natural motion)
  → [WASM] Preset blending: face 40% (expression) + mouth 60% (lip sync)
  → [WASM] IdleExpressionGenerator: blinks (2.5–4.5s, 15% double) + micro-expressions
  → [VRMA] Bone animation: idle ↔ speaking pose crossfade (VAD-triggered)
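
For readers unfamiliar with the OneEuroFilter step, here is a compact sketch of the standard One Euro filter (an adaptive low-pass filter whose cutoff rises with signal speed). It follows the published algorithm, but the class name, parameter defaults, and how AnimaSync tunes or applies the filter internally are assumptions.

```typescript
// Sketch of the standard One Euro filter for a single scalar channel.
class OneEuroFilter {
  private xPrev: number | null = null; // previous filtered value
  private dxPrev = 0; // previous filtered derivative
  private readonly minCutoff: number;
  private readonly beta: number;
  private readonly dCutoff: number;

  // Defaults are illustrative, not AnimaSync's tuning.
  constructor(minCutoff = 1.0, beta = 0.0, dCutoff = 1.0) {
    this.minCutoff = minCutoff;
    this.beta = beta;
    this.dCutoff = dCutoff;
  }

  // Smoothing factor for a given cutoff frequency (Hz) and time step (s).
  private static alpha(cutoff: number, dt: number): number {
    const tau = 1 / (2 * Math.PI * cutoff);
    return 1 / (1 + tau / dt);
  }

  filter(x: number, dt: number): number {
    if (this.xPrev === null) {
      this.xPrev = x; // first sample passes through unchanged
      return x;
    }
    // Low-pass the derivative, then use its magnitude to adapt the cutoff.
    const dx = (x - this.xPrev) / dt;
    const aD = OneEuroFilter.alpha(this.dCutoff, dt);
    const dxHat = aD * dx + (1 - aD) * this.dxPrev;
    const cutoff = this.minCutoff + this.beta * Math.abs(dxHat);
    const a = OneEuroFilter.alpha(cutoff, dt);
    const xHat = a * x + (1 - a) * this.xPrev;
    this.xPrev = xHat;
    this.dxPrev = dxHat;
    return xHat;
  }
}
```

A blendshape stream would presumably use one filter instance per channel, called with `dt = 1 / 30` at the 30fps output rate.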

API Reference

Both V1 and V2 expose the same LipSyncWasmWrapper class:

class LipSyncWasmWrapper {
  constructor(options?: { wasmPath?: string });

  readonly ready: boolean;
  readonly modelVersion: 'v1' | 'v2';
  readonly blendshapeDim: 111 | 52;

  // Initialize — validates license + loads ONNX model
  init(options?: {
    licenseKey?: string;
    onProgress?: (stage: string, percent: number) => void;
    preset?: boolean | string;
  }): Promise<{ mode: string }>;

  // Batch processing
  processFile(file: File): Promise<ProcessResult>;
  processAudio(pcm16k: Float32Array): Promise<ProcessResult>;
  processAudioBuffer(buf: AudioBuffer): Promise<ProcessResult>;

  // Real-time streaming
  processAudioChunk(chunk: Float32Array, isLast?: boolean): Promise<ProcessResult | null>;

  // Extract single frame
  getFrame(result: ProcessResult, index: number): number[];

  // Bone animations
  getVrmaBytes(): { idle: Uint8Array; speaking: Uint8Array };

  // Cleanup
  reset(): void;
  dispose(): void;
}

interface ProcessResult {
  blendshapes: number[];  // flat array: frame_count × dim
  frame_count: number;
  fps: number;            // always 30
  mode: string;
}
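
Because `blendshapes` is documented as a flat `frame_count × dim` array, pulling out one frame amounts to slicing. The helper below sketches that indexing for cases where you want to work with the raw result directly. It assumes a frame-major layout (all values of frame 0, then frame 1, and so on), which the comment above suggests but does not spell out; `frameAt` and `FlatResult` are hypothetical names, not part of the package.

```typescript
// Hypothetical helper; assumes frame-major layout of the flat array.
interface FlatResult {
  blendshapes: number[];
  frame_count: number;
}

function frameAt(result: FlatResult, index: number): number[] {
  if (!Number.isInteger(index) || index < 0 || index >= result.frame_count) {
    throw new RangeError(`frame index ${index} out of range`);
  }
  // Derive the per-frame dimension (52 or 111) from the array itself.
  const dim = result.blendshapes.length / result.frame_count;
  if (!Number.isInteger(dim)) {
    throw new Error("blendshapes length is not a multiple of frame_count");
  }
  return result.blendshapes.slice(index * dim, (index + 1) * dim);
}
```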

Method Quick Reference

| Method | Use Case |
| --- | --- |
| `processFile(file)` | File upload → returns lip sync + expression + blink frames |
| `processAudio(float32)` | Pre-loaded audio (e.g., fetched from TTS API) |
| `processAudioChunk(chunk)` | Real-time mic / TTS streaming |
| `getVrmaBytes()` | Bone animation clips for idle breathing & speaking gestures |
| `reset()` | Clear streaming state between utterances |
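
`processAudio` takes 16kHz PCM, so audio decoded at another rate (44.1kHz or 48kHz is common) has to be resampled first. The sketch below shows a simple linear-interpolation resampler for mono samples. `resampleLinear` is a hypothetical helper, not part of the package, and it applies no anti-aliasing filter, so treat it as a starting point only.

```typescript
// Hypothetical linear-interpolation resampler for mono Float32 samples.
function resampleLinear(input: Float32Array, srcRate: number, dstRate = 16000): Float32Array {
  if (srcRate === dstRate) return input.slice();
  const outLen = Math.floor((input.length * dstRate) / srcRate);
  const out = new Float32Array(outLen);
  const ratio = srcRate / dstRate; // source samples per output sample
  for (let i = 0; i < outLen; i++) {
    const pos = i * ratio; // fractional position in the source
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    // Blend the two neighbouring source samples.
    out[i] = input[i0] * (1 - frac) + input[i1] * frac;
  }
  return out;
}
```

For better quality, rendering through an `OfflineAudioContext` created at 16000 Hz is one alternative, since the browser then handles the sample-rate conversion.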

Loading Progress Stages

await lipsync.init({
  onProgress: (stage, percent) => {
    // stage: 'wasm' → 'license' → 'decrypt' → 'onnx'
    updateProgressBar(stage, percent);
  }
});
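
If you want one progress bar across all four stages, the stage name and percent can be folded into a single number. The sketch below assumes `percent` is reported per stage on a 0–100 scale and that each stage deserves equal weight; neither is stated above, so adjust if the callback behaves differently. `overallProgress` is a hypothetical helper.

```typescript
// Stage order as listed in the onProgress comment above.
const STAGES: readonly string[] = ["wasm", "license", "decrypt", "onnx"];

// Hypothetical helper: combine (stage, percent) into an overall 0-100 value.
// Assumes percent is per-stage (0-100) and stages are equally weighted.
function overallProgress(stage: string, percent: number): number {
  const index = STAGES.indexOf(stage);
  if (index < 0) return 0; // unknown stage name: report no progress
  const clamped = Math.min(100, Math.max(0, percent));
  return (index * 100 + clamped) / STAGES.length;
}
```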

Real-time Streaming Pattern

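// 1. Capture mic audio at 16kHz via AudioWorklet
const ctx = new AudioContext({ sampleRate: 16000 });
// ... setup AudioWorklet (see examples/vanilla-avatar); `worklet` is the resulting node

// 2. Feed chunks → get blendshape frames back
const frameQueue = [];
worklet.port.onmessage = async (e) => {
  const result = await lipsync.processAudioChunk(e.data);
  if (result) {
    for (let i = 0; i < result.frame_count; i++) {
      frameQueue.push(lipsync.getFrame(result, i));
    }
  }
};

// 3. Consume at 30fps in render loop. requestAnimationFrame fires at the
//    display refresh rate (often 60Hz), so pace frames against a deadline.
const FRAME_MS = 1000 / 30;
let nextFrameTime = 0;
function render(now) {
  requestAnimationFrame(render);
  if (frameQueue.length > 0 && now >= nextFrameTime) {
    applyToAvatar(frameQueue.shift());
    // Advance from the previous deadline to avoid drift; resync if far behind.
    nextFrameTime = Math.max(nextFrameTime + FRAME_MS, now);
  }
}
requestAnimationFrame(render);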
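
The same streaming call can also be driven from audio that is already in memory, for example a TTS response, by slicing it into chunks. The sketch below does that against the `processAudioChunk(chunk, isLast?)` signature from the API reference. `feedInChunks` and `ChunkProcessor` are hypothetical names, and the reading of `isLast` as "this chunk ends the utterance" is an assumption.

```typescript
// Minimal shape of the streaming method, matching the documented signature.
interface ChunkProcessor<T> {
  processAudioChunk(chunk: Float32Array, isLast?: boolean): Promise<T | null>;
}

// Hypothetical helper: feed a 16kHz buffer in fixed-size chunks, in order.
async function feedInChunks<T>(
  processor: ChunkProcessor<T>,
  pcm16k: Float32Array,
  chunkSize: number, // samples per chunk, e.g. 1600 for 100ms at 16kHz
  onResult: (result: T) => void,
): Promise<number> {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  let sent = 0;
  for (let start = 0; start < pcm16k.length; start += chunkSize) {
    const end = Math.min(start + chunkSize, pcm16k.length);
    const isLast = end === pcm16k.length; // assumed meaning: final chunk
    const result = await processor.processAudioChunk(pcm16k.subarray(start, end), isLast);
    if (result !== null) onResult(result);
    sent += 1;
  }
  return sent; // number of chunks submitted
}
```

Awaiting each call before sending the next keeps chunks in order, which matters for any engine that carries state between chunks.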

Licensing

30-Day Free Trial

Call init() without a license key. All features available, no signup needed.

await lipsync.init();                                    // free trial
await lipsync.init({ licenseKey: 'ggl_your_key' });     // paid license

| | Free Trial | Paid License |
| --- | --- | --- |
| Duration | 30 days from first use | Unlimited |
| Setup | None (automatic) | Pass `licenseKey` to `init()` |
| Domain | Any | Configurable per key |
| Features | Full access | Full access |

Contact GoodGang Labs for license inquiries.

Security

- ONNX models are AES-256-GCM encrypted and embedded into the WASM binary.
- No separate model files are served; decryption happens at runtime.
- License tokens are Ed25519-signed with a 24-hour TTL.
- Tokens are cached in sessionStorage to minimize server requests.
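
To illustrate the caching idea only, here is a sketch of a 24-hour TTL cache over a `sessionStorage`-like store. It is not the library's token handling: the storage key, the JSON shape, and the helper names are all hypothetical, and a client-side freshness check like this does not replace signature verification.

```typescript
// Subset of the Web Storage API; window.sessionStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const TOKEN_TTL_MS = 24 * 60 * 60 * 1000; // 24-hour TTL, as stated above
const CACHE_KEY = "example_license_token"; // hypothetical storage key

function saveToken(store: KeyValueStore, token: string, nowMs: number): void {
  store.setItem(CACHE_KEY, JSON.stringify({ token, issuedAtMs: nowMs }));
}

// Returns the cached token if it is younger than the TTL, otherwise null.
function loadFreshToken(store: KeyValueStore, nowMs: number): string | null {
  const raw = store.getItem(CACHE_KEY);
  if (raw === null) return null;
  let parsed: unknown = null;
  try {
    parsed = JSON.parse(raw);
  } catch {
    parsed = null; // corrupted entry is treated as missing
  }
  const entry = parsed as { token?: unknown; issuedAtMs?: unknown } | null;
  if (
    entry !== null &&
    typeof entry === "object" &&
    typeof entry.token === "string" &&
    typeof entry.issuedAtMs === "number" &&
    nowMs - entry.issuedAtMs < TOKEN_TTL_MS
  ) {
    return entry.token;
  }
  store.removeItem(CACHE_KEY); // expired or malformed: drop it
  return null;
}
```

A caller would try `loadFreshToken(sessionStorage, Date.now())` first and only contact the license server when it returns `null`.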

Built by GoodGang Labs
