Voice-driven 3D avatar animation engine for the browser.
Extracts emotion from speech and generates lip sync, facial expressions, and body motion in real time — entirely client-side via Rust/WASM.
**Voice → Full-body Animation** · **Emotion-aware Expressions** · **Built-in Body Motion**
**Browser-native WASM** · **Real-time Streaming** · **Plug & Play**
Most lip sync engines stop at mouth shapes. AnimaSync goes further — it treats voice as the complete animation source:
| Layer | What it generates | How |
|---|---|---|
| Lip Sync | Mouth shapes matching phonemes | ONNX inference → ARKit blendshapes (jaw, mouth, tongue) |
| Facial Expression | Emotion-driven brows, cheeks, eyes | Voice energy & pitch → expression mapping + anatomical constraints |
| Eye Animation | Natural blinks, micro-movements | Stochastic blink injection (2.5–4.5s intervals, 15% double-blink) |
| Body Motion | Idle breathing, speaking gestures | Embedded VRMA bone clips with automatic idle ↔ speaking crossfade |
One audio stream in → a fully animated 3D avatar out.
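On the consuming side, each output frame is just an array of ARKit blendshape weights. As a minimal sketch, here is how one frame might be applied to a Three.js-style mesh; the channel names and their ordering below are assumptions for illustration, not the engine's actual 52-channel index table:

```javascript
// Hypothetical sketch: map a per-frame weight array onto morph targets.
// ARKIT_ORDER is an assumed subset — the engine's real channel ordering
// must come from its documentation.
const ARKIT_ORDER = ['jawOpen', 'mouthFunnel', 'mouthSmileLeft', 'eyeBlinkLeft', 'eyeBlinkRight'];

function applyFrame(mesh, frame, order = ARKIT_ORDER) {
  for (let i = 0; i < order.length; i++) {
    // three.js convention: morphTargetDictionary maps name → influence index
    const idx = mesh.morphTargetDictionary[order[i]];
    if (idx !== undefined) mesh.morphTargetInfluences[idx] = frame[i];
  }
  return mesh;
}
```

Any renderer with named morph targets (Three.js, Babylon, a Unity bridge) can consume the frames the same way.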
```bash
# V2 recommended for most use cases
npm install @goodganglabs/lipsync-wasm-v2

# V1 for full 111-dim expression control
npm install @goodganglabs/lipsync-wasm-v1
```

Peer dependency: `onnxruntime-web >= 1.17.0`
```js
import { LipSyncWasmWrapper } from '@goodganglabs/lipsync-wasm-v2';

const lipsync = new LipSyncWasmWrapper();
await lipsync.init(); // 30-day free trial — no key needed

// One call — get lip sync + expressions + blinks, all at once
const result = await lipsync.processFile(audioFile);
for (let i = 0; i < result.frame_count; i++) {
  const frame = lipsync.getFrame(result, i); // number[52] — full face animation
  applyToYourAvatar(frame);
}
```

Or load everything from a CDN, with no build step:

```html
<script src="https://cdn.jsdelivr.net/npm/onnxruntime-web@1.17.0/dist/ort.min.js"></script>
<script type="module">
  const CDN = 'https://cdn.jsdelivr.net/npm/@goodganglabs/lipsync-wasm-v2@latest';
  const { LipSyncWasmWrapper } = await import(`${CDN}/lipsync-wasm-wrapper.js`);
  const lipsync = new LipSyncWasmWrapper({ wasmPath: `${CDN}/lipsync_wasm_v2.js` });
  await lipsync.init();
  // Ready to process audio
</script>
```

Working examples you can run locally — zero npm install, all loaded from CDN.
| Example | Description | Source |
|---|---|---|
| V1 Data | V1 phoneme engine — 52 ARKit blendshapes visualization, ONNX inference, playback. | index.html |
| V2 Data | V2 student model — 52 ARKit blendshapes direct prediction, crisp mouth. | index.html |
| V1 vs V2 | Side-by-side dual avatar comparison. Same voice, two animation engines. | index.html |
Run any example:

```bash
cd examples/vanilla-basic   # or vanilla-avatar, vanilla-comparison
npx serve .                 # or: python3 -m http.server 8080
```

| | V2 (Recommended) | V1 (Full Control) |
|---|---|---|
| npm | `@goodganglabs/lipsync-wasm-v2` | `@goodganglabs/lipsync-wasm-v1` |
| Output | 52-dim ARKit blendshapes | 111-dim ARKit blendshapes |
| Model | Student distillation (direct prediction) | Phoneme classification → viseme mapping |
| Post-processing | `crisp_mouth` + fade + auto-blink | OneEuroFilter + anatomical constraints |
| Expression generation | Blink injection in post-process | Built-in IdleExpressionGenerator (blinks + micro-expressions) |
| Voice activity | Not included | Built-in VoiceActivityDetector (body pose switching) |
| ONNX fallback | None (ONNX required) | Heuristic mode (energy-based) |
| Body motion | VRMA idle/speaking | VRMA idle/speaking + VAD auto-switch |
| Best for | Most projects, quick integration | Full expression control, custom avatars |
```
┌─────────────────────────────────────────────────────────────────────┐
│                              Browser                                │
│                                                                     │
│   Audio Source (File / Mic / TTS)                                   │
│        │                                                            │
│        ▼                                                            │
│  ┌──────────┐   ┌────────────┐   ┌──────────────────────────────┐   │
│  │  WASM    │   │   ONNX     │   │  WASM                        │   │
│  │  Feature │──▶│  Inference │──▶│  Post-processing             │   │
│  │  Extract │   │  (JS)      │   │  + Expression mapping        │   │
│  └──────────┘   └────────────┘   └────────────┬─────────────────┘   │
│                                               │                     │
│                      ┌────────────────────────┼────────────┐        │
│                      │                        │            │        │
│                      ▼                        ▼            ▼        │
│                  Lip Sync          Facial Expression    Blinks      │
│                  (jaw, mouth,      (brows, cheeks,      (natural    │
│                   tongue)           smile, frown)       stochastic) │
│                      │                        │            │        │
│                      └───────────┬────────────┘            │        │
│                                  ▼                         │        │
│                 52/111-dim ARKit Blendshapes @30fps        │        │
│                                  │ ◄──────────────────────┘         │
│                                  ▼                                  │
│                    ┌───────────────────────────┐                    │
│                    │   VRMA Bone Animation     │                    │
│                    │ idle ↔ speaking crossfade │                    │
│                    │  (body pose + gestures)   │                    │
│                    └────────────┬──────────────┘                    │
│                                 ▼                                   │
│            3D Avatar (Three.js / Babylon / Unity)                   │
└─────────────────────────────────────────────────────────────────────┘
```
**V2 pipeline:**

```
Audio 16kHz PCM
  → [WASM] librosa-compatible features: 141-dim @30fps
  → [JS]   ONNX student model → 52-dim (lip sync + expressions)
  → [WASM] crisp_mouth (mouth sharpening) → fade_in_out (natural onset/offset)
  → [WASM] add_blinks (stochastic eye animation)
  → [WASM] Preset blending: expression channels (brows, eyes) blended with lip sync
  → [VRMA] Bone animation: idle ↔ speaking pose auto-crossfade
```
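The `fade_in_out` stage can be pictured as a short gain ramp at the clip boundaries, easing blendshape intensity in and out instead of snapping. A hedged sketch only; the ramp length is an assumption and the real implementation lives inside the WASM module:

```javascript
// Illustrative fade ramp: scale each frame's weights by a gain that rises over
// the first `rampFrames` frames and falls over the last `rampFrames` frames.
function fadeInOut(frames, rampFrames = 5) {
  const n = frames.length;
  return frames.map((frame, i) => {
    // gain is 1 in the middle, < 1 near either boundary
    const gain = Math.min(1, (i + 1) / rampFrames, (n - i) / rampFrames);
    return frame.map(v => v * gain);
  });
}
```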
**V1 pipeline:**

```
Audio 16kHz PCM
  → [WASM] MFCC extraction: 13-dim @100fps
  → [JS]   ONNX inference: 61 phonemes → 22 visemes
  → [WASM] Viseme → 111-dim ARKit blendshapes (lip + expression + extras)
  → [WASM] FPS conversion: 100fps → 30fps
  → [WASM] Anatomical constraints (bilateral symmetry + jaw correction)
  → [WASM] OneEuroFilter (temporal smoothing for natural motion)
  → [WASM] Preset blending: face 40% (expression) + mouth 60% (lip sync)
  → [WASM] IdleExpressionGenerator: blinks (2.5–4.5s, 15% double) + micro-expressions
  → [VRMA] Bone animation: idle ↔ speaking pose crossfade (VAD-triggered)
```
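The blink timing both pipelines describe (a blink every 2.5–4.5 s, with a 15% chance of a double-blink) can be sketched as a small stochastic scheduler. This is illustrative only; the double-blink offset is an assumption and the real generator is internal to the WASM module:

```javascript
// Illustrative blink scheduler: returns blink timestamps (in seconds) for a
// clip of the given duration. `rng` is injectable so the schedule is testable.
function scheduleBlinks(durationSec, rng = Math.random) {
  const times = [];
  let t = 2.5 + rng() * 2.0;                // first blink after 2.5–4.5 s
  while (t < durationSec) {
    times.push(t);
    if (rng() < 0.15) times.push(t + 0.25); // 15% double-blink, ~250 ms later (assumed offset)
    t += 2.5 + rng() * 2.0;                 // next blink in 2.5–4.5 s
  }
  return times;
}
```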
Both V1 and V2 expose the same `LipSyncWasmWrapper` class:
```ts
class LipSyncWasmWrapper {
  constructor(options?: { wasmPath?: string });

  readonly ready: boolean;
  readonly modelVersion: 'v1' | 'v2';
  readonly blendshapeDim: 111 | 52;

  // Initialize — validates license + loads ONNX model
  init(options?: {
    licenseKey?: string;
    onProgress?: (stage: string, percent: number) => void;
    preset?: boolean | string;
  }): Promise<{ mode: string }>;

  // Batch processing
  processFile(file: File): Promise<ProcessResult>;
  processAudio(pcm16k: Float32Array): Promise<ProcessResult>;
  processAudioBuffer(buf: AudioBuffer): Promise<ProcessResult>;

  // Real-time streaming
  processAudioChunk(chunk: Float32Array, isLast?: boolean): Promise<ProcessResult | null>;

  // Extract single frame
  getFrame(result: ProcessResult, index: number): number[];

  // Bone animations
  getVrmaBytes(): { idle: Uint8Array; speaking: Uint8Array };

  // Cleanup
  reset(): void;
  dispose(): void;
}
```
```ts
interface ProcessResult {
  blendshapes: number[]; // flat array: frame_count × dim
  frame_count: number;
  fps: number;           // always 30
  mode: string;
}
```

| Method | Use Case |
|---|---|
| `processFile(file)` | File upload → returns lip sync + expression + blink frames |
| `processAudio(float32)` | Pre-loaded audio (e.g., fetched from a TTS API) |
| `processAudioChunk(chunk)` | Real-time mic / TTS streaming |
| `getVrmaBytes()` | Bone animation clips for idle breathing & speaking gestures |
| `reset()` | Clear streaming state between utterances |
```js
await lipsync.init({
  onProgress: (stage, percent) => {
    // stage: 'wasm' → 'license' → 'decrypt' → 'onnx'
    updateProgressBar(stage, percent);
  }
});
```

```js
// 1. Capture mic audio at 16kHz via AudioWorklet
const ctx = new AudioContext({ sampleRate: 16000 });
const frameQueue = [];
// ... setup AudioWorklet (see examples/vanilla-avatar)

// 2. Feed chunks → get blendshape frames back
worklet.port.onmessage = async (e) => {
  const result = await lipsync.processAudioChunk(e.data);
  if (result) {
    for (let i = 0; i < result.frame_count; i++) {
      frameQueue.push(lipsync.getFrame(result, i));
    }
  }
};

// 3. Consume at 30fps in render loop
function render() {
  requestAnimationFrame(render);
  if (frameQueue.length > 0) {
    applyToAvatar(frameQueue.shift());
  }
}
render();
```

Call `init()` without a license key. All features available, no signup needed.
```js
await lipsync.init();                               // free trial
await lipsync.init({ licenseKey: 'ggl_your_key' }); // paid license
```

| | Free Trial | Paid License |
|---|---|---|
| Duration | 30 days from first use | Unlimited |
| Setup | None (automatic) | Pass `licenseKey` to `init()` |
| Domain | Any | Configurable per key |
| Features | Full access | Full access |
Contact GoodGang Labs for license inquiries.
- ONNX models are AES-256-GCM encrypted and embedded into the WASM binary
- No separate model files are served — decryption happens at runtime
- License tokens are Ed25519-signed with a 24-hour TTL
- Tokens are cached in `sessionStorage` to minimize server requests
Built by GoodGang Labs