
πŸš€ @ruvector/ruvllm-wasm v2.0.0 β€” Browser-native LLM inference is here #242

@ruvnet

@ruvector/ruvllm-wasm v2.0.0 β€” Browser-Native AI Runtime

The first functional release of @ruvector/ruvllm-wasm is live on npm. It replaces the deprecated v0.1.0 placeholder with a fully compiled 435 KB WebAssembly binary β€” semantic routing, adaptive learning, KV cache management, and chat template formatting, all running entirely client-side.

No server. No API keys. No network latency.

Full documentation: crates/ruvllm-wasm/README.md


Install

```bash
npm install @ruvector/ruvllm-wasm
```

Quick start

```js
import init, { ChatTemplateWasm, ChatMessageWasm, HnswRouterWasm, healthCheck } from '@ruvector/ruvllm-wasm';

await init();
console.log(healthCheck()); // true

// Format conversations for any model family
const template = ChatTemplateWasm.detectFromModelId("meta-llama/Llama-3-8B");
const prompt = template.format([
  ChatMessageWasm.system("You are a helpful assistant."),
  ChatMessageWasm.user("What can you do?"),
]);

// Route queries to the best agent in <1ms using HNSW semantic search.
// `embedding`, `embedding2`, and `queryEmbedding` are 384-dim Float32Arrays
// produced by your embedding model of choice.
const router = new HnswRouterWasm(384, 1000);
router.addPattern(embedding, "code-agent", "handles code generation");
router.addPattern(embedding2, "research-agent", "handles research queries");
const result = router.route(queryEmbedding);
console.log(result.name, result.score); // "code-agent" 0.95

// Persist and restore router state
const snapshot = router.toJson();
const restored = HnswRouterWasm.fromJson(snapshot);
```
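For intuition, route() returns the stored pattern whose embedding is most similar to the query under cosine similarity. Below is a self-contained sketch of that scoring as a plain linear scan β€” the very thing the HNSW index exists to avoid β€” with hypothetical agent names; it is a conceptual illustration, not the library's implementation:

```typescript
// Conceptual sketch of cosine-similarity routing (NOT the HNSW index itself,
// which skips the O(n) scan via a navigable small-world graph).
type Pattern = { name: string; embedding: Float32Array };

function cosine(a: Float32Array, b: Float32Array): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

function routeLinear(patterns: Pattern[], query: Float32Array) {
  let best = { name: "", score: -Infinity };
  for (const p of patterns) {
    const score = cosine(p.embedding, query);
    if (score > best.score) best = { name: p.name, score };
  }
  return best;
}
```

The claimed 150x speedup comes from replacing this linear scan with graph traversal, which visits only a logarithmic fraction of stored patterns.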

What ships in this release

| Feature | Type | Description |
| --- | --- | --- |
| HNSW Semantic Router | `HnswRouterWasm` | Bidirectional graph with cosine similarity β€” 150x faster than a linear scan. Add patterns, route queries, serialize to JSON. |
| KV Cache | `KvCacheWasm` | Two-tier token cache (FP32 tail for recent tokens + u8-quantized store for history). Configurable tail length, head count, and head dimension. |
| Chat Templates | `ChatTemplateWasm` | Formats prompts for 7 model families: Llama3, Llama2, Mistral, Qwen, ChatML, Phi, Gemma. Auto-detects from the model ID. |
| MicroLoRA | `MicroLoraWasm` | Per-request LoRA adaptation (rank 1-4). Forward pass, gradient accumulation, weight updates β€” all in <1ms. Serialize/restore adapter state. |
| SONA Instant | `SonaInstantWasm` | EMA quality tracking, adaptive rank adjustment, EWC-lite weight importance, pattern buffer with cosine-similarity suggestions. |
| Memory Pool | `InferenceArenaWasm` | O(1) bump allocator for inference temporaries. Pre-size for model dimensions. Tracks the high-water mark. |
| Buffer Pool | `BufferPoolWasm` | Pre-allocated buffers in size classes (1 KB-256 KB). Configurable max per class. Pool hit-rate tracking. |
| Web Workers | `ParallelInference` | Async parallel matmul, attention, and layerNorm across multiple workers. SharedArrayBuffer detection, capability-level reporting. |
| Feature Detection | Free functions | `feature_summary()`, `optimal_worker_count()`, `supports_parallel_inference()`, `is_simd_available()`, `cross_origin_isolated()` |
| TypeScript | `.d.ts` | Complete type definitions for all 20+ exported types and free functions |
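To see why rank 1-4 adaptation fits in a sub-millisecond budget, consider the arithmetic: a LoRA delta W = B·A at rank r adds only r·(dIn + dOut) multiply-adds on top of the base matmul. A toy rank-1 forward pass (a hypothetical sketch, not the MicroLoraWasm internals):

```typescript
// Sketch of a rank-1 LoRA forward pass: y = W x + alpha * B (A x).
// Wx is the base output, precomputed by the ordinary dense layer.
function loraForward(
  Wx: Float32Array,   // base output W x (length dOut)
  A: Float32Array,    // rank-1 down-projection row (length dIn)
  B: Float32Array,    // rank-1 up-projection column (length dOut)
  x: Float32Array,    // input activations (length dIn)
  alpha: number,      // adapter scaling factor
): Float32Array {
  let ax = 0;
  for (let i = 0; i < x.length; i++) ax += A[i] * x[i];      // A x: dIn mults
  const y = new Float32Array(Wx.length);
  for (let j = 0; j < y.length; j++) {
    y[j] = Wx[j] + alpha * B[j] * ax;                        // + B(Ax): dOut mults
  }
  return y;
}
```

The adapter cost is dIn + dOut multiplies per rank regardless of the base layer's dIn·dOut weight count, which is what keeps per-request adaptation cheap.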

Package stats

| Metric | Value |
| --- | --- |
| WASM binary | 435 KB (178 KB gzipped) |
| JS glue | 128 KB |
| TypeScript defs | 45 KB |
| Exported types | 20+ |
| Browser support | Chrome 57+, Firefox 52+, Safari 11+, Edge 79+ |

Architecture

JavaScript/TypeScript App
        β”‚
        β–Ό
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚ ruvllm_wasm β”‚  ← 435KB compiled WASM
  β”‚   .js glue  β”‚  ← wasm-bindgen generated
  β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
         β”‚
    β”Œβ”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚                             β”‚
    β–Ό                             β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Core     β”‚  β”‚ Intelligent Features β”‚
β”‚          β”‚  β”‚                      β”‚
β”‚ KvCache  β”‚  β”‚ HnswRouter (150x)    β”‚
β”‚ Arena    β”‚  β”‚ MicroLoRA (<1ms)     β”‚
β”‚ BufPool  β”‚  β”‚ SONA Instant         β”‚
β”‚ Chat TPL β”‚  β”‚ Web Workers          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
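The Web Workers box depends on runtime capability checks, surfaced through the free functions listed above. A sketch of a worker-count heuristic β€” an assumption for illustration, not necessarily `optimal_worker_count()`'s actual policy β€” is to leave one core for the main thread and fall back to a single worker when parallel inference is unavailable:

```typescript
// Hypothetical worker-count heuristic: reserve one core for the main thread;
// degrade to 1 worker when parallel inference isn't supported (e.g. no
// cross-origin-isolated SharedArrayBuffer).
function pickWorkerCount(
  hardwareConcurrency: number | undefined,
  parallelOk: boolean,
): number {
  if (!parallelOk) return 1;
  return Math.max(1, (hardwareConcurrency ?? 4) - 1); // assume 4 cores if unknown
}
```

In the browser this would be driven by `navigator.hardwareConcurrency` and the package's `supports_parallel_inference()` check.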

Build from source

```bash
# Requires: rustup target add wasm32-unknown-unknown && cargo install wasm-pack

# Release build (workaround for Rust 1.91 codegen bug)
CARGO_PROFILE_RELEASE_CODEGEN_UNITS=256 CARGO_PROFILE_RELEASE_LTO=off \
  wasm-pack build crates/ruvllm-wasm --target web --scope ruvector --release

# Dev build (no workaround needed)
wasm-pack build crates/ruvllm-wasm --target web --scope ruvector --dev
```

See ADR-084 for build details and known limitations.
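Note that SharedArrayBuffer (used by the ParallelInference worker path) is only exposed when the page is cross-origin isolated, which requires two response headers from whatever serves the build output. A minimal sketch using Node's built-in `node:http` module β€” the port and file handling are placeholders, not part of the package:

```typescript
import * as http from "node:http";

// COOP/COEP headers required for crossOriginIsolated === true, which gates
// SharedArrayBuffer availability in the browser.
const isolationHeaders: Record<string, string> = {
  "Cross-Origin-Opener-Policy": "same-origin",
  "Cross-Origin-Embedder-Policy": "require-corp",
};

const server = http.createServer((req, res) => {
  for (const [k, v] of Object.entries(isolationHeaders)) res.setHeader(k, v);
  if (req.url?.endsWith(".wasm")) {
    // WebAssembly.instantiateStreaming requires this MIME type.
    res.setHeader("Content-Type", "application/wasm");
  }
  res.end("placeholder: stream the requested file from pkg/ here");
});
// server.listen(8080);
```

Without these headers, `cross_origin_isolated()` reports false and the runtime falls back to single-threaded execution.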

What's next

  • IntelligentLLM β€” Combined router + LoRA + SONA in one unified API
  • WebGPU attention β€” GPU-accelerated attention (matmul GPU path already works)
  • GGUF streaming loader β€” Load quantized models directly in the browser
  • Worker completion signals β€” Replace setTimeout polling with message-based coordination

Related packages

| Package | Version | Description |
| --- | --- | --- |
| @ruvector/ruvllm | 2.5.2 | Node.js LLM orchestration with SONA learning |
| ruvector | 0.2.11 | Full CLI with 48 commands + 91 MCP tools |
| @ruvector/rvf | β€” | Cognitive container runtime |

PR: #241 | ADR: ADR-084 | Docs: crates/ruvllm-wasm/README.md
