zen5 is a next-generation AI architecture that uses complexity-aware hierarchical routing to dispatch queries across expert modules harvested from the largest open-source models.
Unlike standard models that spend the same compute on every query, zen5 adapts compute to task difficulty:
| Tier | Active Params | Latency | Example Task |
|---|---|---|---|
| T0 | 0.8B | <50ms | "Hello, how are you?" |
| T1 | 9B | <200ms | "Summarize this article" |
| T2 | 10-40B | <2s | "Compare Keynesian vs monetarist economics" |
| T3 | 40-80B | <5s | "Derive Black-Scholes from first principles" |
| T4 | 80-100B+ | <10s | "Design a novel consensus algorithm" |
```
Input → ComplexityEstimator (64 tokens) → Tier Assignment
  T0: 0.8B dense    → direct generation
  T1: 9B dense      → standard generation
  T2: 10-40B MoE    → multi-expert routing
  T3: 40-80B MoE    → deep expert routing
  T4: 80-100B+ MoE  → frontier expert routing + adaptive escalation
```
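As a rough illustration, the tier-assignment step above might look like the following sketch. The threshold values and the `estimate_complexity` stub are hypothetical stand-ins for the learned ComplexityEstimator, not zen5's actual implementation.

```python
# Hypothetical sketch of complexity-aware tier assignment.
# Thresholds and the estimator stub are illustrative only.

TIER_THRESHOLDS = [
    (0.2, "T0"),  # trivial chat      -> 0.8B dense
    (0.4, "T1"),  # simple tasks      -> 9B dense
    (0.6, "T2"),  # analysis          -> 10-40B MoE
    (0.8, "T3"),  # derivations       -> 40-80B MoE
    (1.1, "T4"),  # frontier work     -> 80-100B+ MoE
]

def assign_tier(complexity: float) -> str:
    """Map a [0, 1] complexity score to a compute tier."""
    for threshold, tier in TIER_THRESHOLDS:
        if complexity < threshold:
            return tier
    return "T4"

def estimate_complexity(prompt: str, max_tokens: int = 64) -> float:
    """Stand-in for the learned ComplexityEstimator: here, a crude
    length heuristic over the first 64 whitespace-split tokens."""
    tokens = prompt.split()[:max_tokens]
    return min(len(tokens) / max_tokens, 1.0)

print(assign_tier(estimate_complexity("Hello, how are you?")))  # → T0
```

In the real system the estimator is a trained 207M-parameter model; the point of the sketch is only the control flow — score once on a short prefix, then commit to a tier.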
- Complexity-Aware Routing — Lightweight estimator predicts task difficulty from first 64 tokens, activates appropriate tier
- Cross-Architecture Experts — Experts from 6+ diverse model families (Gated DeltaNet, Lightning Attention, DeepseekV3 MoE, FP8 MoE, GLM MoE)
- MoE++ Zero-Computation Experts (ICLR 2025 Oral) — Zero/Copy/Constant experts for trivial tokens, 1.1-2.1x throughput
- ReMoE ReLU Routing (ICLR 2025) — Adaptive expert count via ReLU gating (no fixed Top-K)
- Adaptive Escalation — Mid-generation promotion to higher tiers when confidence drops
- Transfusion — Unified autoregressive (text) + diffusion (3D/video) in single architecture
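The ReMoE-style ReLU routing listed above can be sketched in a few lines. The expert functions and gate weights here are toy placeholders assuming nothing about zen5's actual router; the point is that ReLU gating makes the number of active experts input-dependent rather than a fixed Top-K.

```python
# Toy sketch of ReLU-gated expert routing (ReMoE-style): ReLU zeros out
# negative gate logits, so the active expert count adapts per token.

def relu(x: float) -> float:
    return max(0.0, x)

def route(hidden: list[float], gate_weights: list[list[float]], experts) -> float:
    # One gate logit per expert: dot(hidden, w_e), then ReLU.
    gates = [relu(sum(h * w for h, w in zip(hidden, row))) for row in gate_weights]
    total = sum(gates)
    if total == 0.0:
        return 0.0  # no expert fires: residual path only (cf. zero-computation experts)
    # Weighted combination over only the experts with positive gates.
    return sum(g / total * experts[i](hidden) for i, g in enumerate(gates) if g > 0.0)

# Two toy "experts" over a 2-d hidden state.
experts = [lambda h: sum(h), lambda h: sum(h) * 2.0]
gate_weights = [[1.0, 0.0], [-1.0, 0.0]]  # expert 1 only fires when h[0] < 0

print(route([1.0, 1.0], gate_weights, experts))  # only expert 0 active → 2.0
```

Note how the all-gates-zero branch gives a natural home for MoE++-style zero-computation behavior: trivial tokens can skip expert FFNs entirely.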
| Source | Params | Type | Architecture | Tier |
|---|---|---|---|---|
| Qwen3.5-0.8B | 0.8B | Dense | Gated DeltaNet | T0 |
| Qwen3.5-9B | 9B | Dense | Gated DeltaNet | T1 |
| MiniMax-M2.5 | 230B | MoE | Lightning Attention | T2 |
| GLM-5 | 744B | MoE | GLM MoE | T3 |
| Kimi K2.5 | 1.04T | MoE (384 experts) | DeepseekV3 | T4 |
| Ling-1T | 1T | MoE | FP8 MoE | T4 |
| Modality | Source | Architecture |
|---|---|---|
| Vision | Qwen3-VL | Native ViT |
| Video | Wan2.2 / CogVideoX | MoE video diffusion |
| 3D | TRELLIS.2 | Rectified Flow DiT + SC-VAE |
| Audio | Qwen3-Omni / zen-tts | Thinker-Talker |
| Model | Total Params | Active Params | Target |
|---|---|---|---|
| zen5 | 750B | 0.8-50B | General |
| zen5-coder | 1.8T | 0.8-80B | Code |
| zen5-omni | 2.5T | 0.8-100B | Omnimodal |
| zen5-max | 3.1T | 0.8-100B+ | Frontier |
Only ~394M parameters are trained (~0.013% of zen5-max's 3.1T total):
- ComplexityEstimator: 207M — predicts task difficulty tier
- AlignmentLayer: 134M — projects diverse architectures to shared space
- MoDERouter: 52M — per-tier expert routing with ReLU gating
All expert weights are frozen — extracted and served as-is from source models.
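As a quick sanity check on the numbers above (assuming the ~0.013% figure is measured against zen5-max's 3.1T total):

```python
# Trainable-parameter budget from the component list above.
trained = {
    "ComplexityEstimator": 207_000_000,
    "AlignmentLayer": 134_000_000,
    "MoDERouter": 52_000_000,
}
total_trained = sum(trained.values())          # 393M, i.e. ~394M as stated
fraction = total_trained / 3_100_000_000_000   # vs. zen5-max's 3.1T total
print(f"{total_trained / 1e6:.0f}M trained, {fraction:.5%} of total")
```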
```sh
# List expert pool
python scripts/expert_extraction.py list

# Extract experts from source models
python scripts/expert_extraction.py extract-all --output ./experts/

# Train complexity router
python scripts/router_training.py train --output ./checkpoints/

# Benchmark routing accuracy
python scripts/benchmark.py routing --checkpoint ./checkpoints/router/best.pt

# Generate full evaluation report
python scripts/benchmark.py report --checkpoint ./checkpoints/router/best.pt
```

zen5 is trained in the open on the Hanzo Network. Training consists of 5 phases:
- Expert Extraction — Harvest FFN/MoE blocks from source models
- Alignment Pre-training — Learn cross-architecture projections
- Router Training — Train complexity estimator and tier routers
- Integration — Joint fine-tuning with frozen experts
- Omnimodal Fusion — Add vision/video/3D/audio experts
See docs/ARCHITECTURE.md for detailed training strategy and hardware requirements.
```
zen5/
├── scripts/
│   ├── expert_extraction.py   # Extract experts from source models
│   ├── router_training.py     # Train complexity router + alignment
│   ├── benchmark.py           # Evaluation and benchmarking suite
│   └── hf_repos.py            # HuggingFace model card generation
├── docs/
│   └── ARCHITECTURE.md        # Full architecture specification
├── training/                  # Training configs and data
├── checkpoints/               # Model checkpoints
└── paper/                     # Technical report
```
```
torch>=2.4
transformers>=4.48
safetensors
huggingface_hub
```
- MoE++ (ICLR 2025 Oral) — Zero-Computation Experts
- ReMoE (ICLR 2025) — ReLU-based MoE routing
- HMoE (EMNLP 2025) — Heterogeneous expert sizes
- Symbolic-MoE (2025) — Skill-based routing
- Uni-MoE-2.0 — Dynamic capacity multimodal MoE
- Transfusion (Meta 2024) — Unified AR + diffusion
- Zen LM — Model family home
- Hanzo AI — Infrastructure and training
- HuggingFace — Model downloads
- Architecture Paper — Coming soon
Apache 2.0
zen5: Mixture of Diverse Experts — Clarity Through Diversity