Closed as not planned
Labels: enhancement (New feature or request)
@ruvector/ruvllm-wasm v2.0.0: Browser-Native AI Runtime
The first functional release of @ruvector/ruvllm-wasm is live on npm. It replaces the deprecated v0.1.0 placeholder with a fully compiled 435 KB WebAssembly binary: semantic routing, adaptive learning, KV cache management, and chat template formatting, all running entirely client-side.
No server. No API keys. No network latency.
Full documentation: crates/ruvllm-wasm/README.md
Install

```sh
npm install @ruvector/ruvllm-wasm
```

Quick start
```js
import init, { ChatTemplateWasm, ChatMessageWasm, HnswRouterWasm, healthCheck } from '@ruvector/ruvllm-wasm';

await init();
console.log(healthCheck()); // true

// Format conversations for any model family
const template = ChatTemplateWasm.detectFromModelId("meta-llama/Llama-3-8B");
const prompt = template.format([
  ChatMessageWasm.system("You are a helpful assistant."),
  ChatMessageWasm.user("What can you do?"),
]);

// Route queries to the best agent in <1ms using HNSW semantic search
const router = new HnswRouterWasm(384, 1000);
router.addPattern(embedding, "code-agent", "handles code generation");
router.addPattern(embedding2, "research-agent", "handles research queries");
const result = router.route(queryEmbedding);
console.log(result.name, result.score); // "code-agent" 0.95

// Persist and restore router state
const snapshot = router.toJson();
const restored = HnswRouterWasm.fromJson(snapshot);
```

What ships in this release
| Feature | Type | Description |
|---|---|---|
| HNSW Semantic Router | `HnswRouterWasm` | Bidirectional graph with cosine similarity, 150x faster than linear scan. Add patterns, route queries, serialize to JSON. |
| KV Cache | `KvCacheWasm` | Two-tier token cache (FP32 tail for recent tokens + u8 quantized store for history). Configurable tail length, head count, head dimension. |
| Chat Templates | `ChatTemplateWasm` | Format prompts for 7 model families: Llama3, Llama2, Mistral, Qwen, ChatML, Phi, Gemma. Auto-detect from model ID. |
| MicroLoRA | `MicroLoraWasm` | Per-request LoRA adaptation (rank 1-4). Forward pass, gradient accumulation, weight updates, all in <1ms. Serialize/restore adapter state. |
| SONA Instant | `SonaInstantWasm` | EMA quality tracking, adaptive rank adjustment, EWC-lite weight importance, pattern buffer with cosine-similarity suggestion. |
| Memory Pool | `InferenceArenaWasm` | O(1) bump allocator for inference temporaries. Pre-size for model dimensions. Tracks high-water mark. |
| Buffer Pool | `BufferPoolWasm` | Pre-allocated buffers in size classes (1KB-256KB). Configurable max per class. Pool hit rate tracking. |
| Web Workers | `ParallelInference` | Async parallel matmul, attention, layerNorm across multiple workers. SharedArrayBuffer detection, capability-level reporting. |
| Feature Detection | Free functions | `feature_summary()`, `optimal_worker_count()`, `supports_parallel_inference()`, `is_simd_available()`, `cross_origin_isolated()` |
| TypeScript | `.d.ts` | Complete type definitions for all 20+ exported types and free functions |
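To illustrate the KV Cache row's "u8 quantized store": conceptually, older history is compressed from FP32 to one byte per value using a per-vector scale, trading a little precision for roughly 4x less memory. The sketch below is an assumption about the general technique, not `KvCacheWasm`'s actual internals; `quantizeU8` and `dequantizeU8` are illustrative helpers.

```javascript
// Conceptual sketch of u8 quantization with a per-vector scale
// (illustrative only; not KvCacheWasm's real implementation).
function quantizeU8(vec) {
  // Scale so the largest magnitude maps to ±127, centered at 128.
  const maxAbs = Math.max(...vec.map(Math.abs)) || 1;
  const scale = maxAbs / 127;
  const q = Uint8Array.from(vec, (v) => Math.round(v / scale) + 128);
  return { q, scale };
}

function dequantizeU8({ q, scale }) {
  return Float32Array.from(q, (b) => (b - 128) * scale);
}

const original = [0.5, -1.0, 0.25];
const restored = dequantizeU8(quantizeU8(original));
// restored is within about scale/2 of each original value
```

The FP32 "tail" for recent tokens exists precisely because those values are read back most often, where this rounding error would matter most.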
Package stats
| Metric | Value |
|---|---|
| WASM binary | 435 KB (178 KB gzipped) |
| JS glue | 128 KB |
| TypeScript defs | 45 KB |
| Exported types | 20+ |
| Browser support | Chrome 57+, Firefox 52+, Safari 11+, Edge 79+ |
Architecture

```
JavaScript/TypeScript App
           │
           ▼
   ┌───────────────┐
   │  ruvllm_wasm  │ ← 435KB compiled WASM
   │   .js glue    │ ← wasm-bindgen generated
   └───────┬───────┘
           │
     ┌─────┴──────────────────────┐
     │                            │
     ▼                            ▼
┌──────────┐          ┌──────────────────────┐
│   Core   │          │ Intelligent Features │
│          │          │                      │
│ KvCache  │          │ HnswRouter (150x)    │
│ Arena    │          │ MicroLoRA (<1ms)     │
│ BufPool  │          │ SONA Instant         │
│ Chat TPL │          │ Web Workers          │
└──────────┘          └──────────────────────┘
```
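The Web Workers branch above is gated on browser capabilities: SharedArrayBuffer is only usable when the page is cross-origin isolated (COOP/COEP headers). The sketch below shows the kind of environment checks that feature-detection helpers like `supports_parallel_inference()` presumably perform; `detectCapabilities` is an illustrative name, not part of the package's API.

```javascript
// Hedged sketch of browser capability detection (illustrative, not the
// package's actual logic). Guards make it safe outside a browser too.
function detectCapabilities() {
  return {
    // True only when the page is served with COOP/COEP headers.
    crossOriginIsolated:
      typeof crossOriginIsolated !== 'undefined' && crossOriginIsolated,
    // SharedArrayBuffer is hidden by browsers without isolation.
    sharedArrayBuffer: typeof SharedArrayBuffer !== 'undefined',
    // Reasonable upper bound on worker count.
    workers:
      typeof navigator !== 'undefined' && navigator.hardwareConcurrency
        ? navigator.hardwareConcurrency
        : 1,
  };
}

console.log(detectCapabilities());
```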
Build from source

```sh
# Requires: rustup target add wasm32-unknown-unknown && cargo install wasm-pack

# Release build (workaround for a Rust 1.91 codegen bug)
CARGO_PROFILE_RELEASE_CODEGEN_UNITS=256 CARGO_PROFILE_RELEASE_LTO=off \
  wasm-pack build crates/ruvllm-wasm --target web --scope ruvector --release

# Dev build (no workaround needed)
wasm-pack build crates/ruvllm-wasm --target web --scope ruvector --dev
```

See ADR-084 for build details and known limitations.
What's next
- IntelligentLLM: combined router + LoRA + SONA in one unified API
- WebGPU attention: GPU-accelerated attention (the matmul GPU path already works)
- GGUF streaming loader: load quantized models directly in the browser
- Worker completion signals: replace setTimeout polling with message-based coordination
Related packages
| Package | Version | Description |
|---|---|---|
| `@ruvector/ruvllm` | 2.5.2 | Node.js LLM orchestration with SONA learning |
| `ruvector` | 0.2.11 | Full CLI with 48 commands + 91 MCP tools |
| `@ruvector/rvf` | – | Cognitive container runtime |
PR: #241 | ADR: ADR-084 | Docs: crates/ruvllm-wasm/README.md