feat: ruvllm-wasm v2.0.0 — first functional WASM publish #240

@ruvnet

Summary

Published @ruvector/ruvllm-wasm@2.0.0 to npm with compiled WASM binaries, replacing the deprecated v0.1.0 placeholder.

What's Included

  • KV Cache — Two-tier (FP32 tail + u8 quantized store) token management
  • Memory Pooling — Arena allocator + buffer pool for minimal allocation overhead
  • Chat Templates — Llama3, Mistral, Qwen, ChatML, Phi, Gemma format support
  • HNSW Semantic Router — Bidirectional graph with cosine similarity
  • MicroLoRA — Per-request adaptation (rank 1-4)
  • SONA Instant — EMA quality tracking + adaptive rank
  • Web Workers — Parallel inference with SharedArrayBuffer detection
  • TypeScript — Complete .d.ts type definitions
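To make the two-tier KV cache idea concrete, here is a minimal TypeScript sketch of one way such a structure could work: recent entries stay in an FP32 tail, and entries that fall out of the tail are quantized to u8. This is not the ruvllm-wasm implementation or API. The names (`TwoTierKvCache`, `quantize`, `tailCapacity`) and the per-vector min/scale quantization scheme are assumptions chosen for illustration.

```typescript
// Illustrative only: none of these names come from @ruvector/ruvllm-wasm.

interface QuantizedVec {
  q: Uint8Array; // one byte per element
  min: number;   // value that maps to byte 0
  scale: number; // value step per byte increment
}

// Per-vector affine quantization (an assumed scheme): x ≈ min + q * scale.
function quantize(v: Float32Array): QuantizedVec {
  let min = Infinity;
  let max = -Infinity;
  for (let i = 0; i < v.length; i++) {
    if (v[i] < min) min = v[i];
    if (v[i] > max) max = v[i];
  }
  // A constant (or empty) vector has no range; any non-zero scale works.
  const scale = max > min ? (max - min) / 255 : 1;
  const q = new Uint8Array(v.length);
  for (let i = 0; i < v.length; i++) {
    q[i] = Math.round((v[i] - min) / scale);
  }
  return { q, min, scale };
}

function dequantize(e: QuantizedVec): Float32Array {
  const out = new Float32Array(e.q.length);
  for (let i = 0; i < out.length; i++) {
    out[i] = e.min + e.q[i] * e.scale;
  }
  return out;
}

class TwoTierKvCache {
  private tail: Float32Array[] = [];  // most recent entries, full precision
  private store: QuantizedVec[] = []; // older entries, u8 quantized
  private tailCapacity: number;

  constructor(tailCapacity: number) {
    if (!Number.isInteger(tailCapacity) || tailCapacity < 1) {
      throw new RangeError("tailCapacity must be a positive integer");
    }
    this.tailCapacity = tailCapacity;
  }

  get length(): number {
    return this.store.length + this.tail.length;
  }

  get quantizedCount(): number {
    return this.store.length;
  }

  // Append a new entry; the oldest tail entry is demoted to u8 when the tail overflows.
  append(vec: Float32Array): void {
    this.tail.push(vec);
    if (this.tail.length > this.tailCapacity) {
      const oldest = this.tail.shift() as Float32Array;
      this.store.push(quantize(oldest));
    }
  }

  // Read entry `index` (0 = oldest). Quantized entries are reconstructed approximately.
  get(index: number): Float32Array {
    if (index < 0 || index >= this.length) {
      throw new RangeError("index out of range");
    }
    if (index < this.store.length) {
      return dequantize(this.store[index]);
    }
    return this.tail[index - this.store.length];
  }
}
```

In this sketch the trade-off is explicit: entries in the tail are returned exactly, while older entries cost one byte per element plus two floats of metadata and are recovered to within roughly half a quantization step.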

Build Notes

Rust 1.91 has a WASM codegen bug (type mismatch in release profile). Workaround:

CARGO_PROFILE_RELEASE_CODEGEN_UNITS=256 CARGO_PROFILE_RELEASE_LTO=off \
  wasm-pack build crates/ruvllm-wasm --target web --scope ruvector --release

Not Yet Wired

  • IntelligentLLMWasm (combined router+lora+sona) — commented out, needs API alignment
  • WebGPU attention shader (matmul works, attention falls back to CPU)
  • GGUF model streaming loader
  • Worker pool task-completion signaling (currently uses setTimeout polling)
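For the worker pool item, one common alternative to setTimeout polling is to tag each task with an id and resolve a pending promise when the matching reply message arrives. The sketch below shows that pattern in isolation. It is not the package's worker pool: `TaskDispatcher`, `WorkerLike`, and the `{ id, payload }` / `{ id, ok, result }` message shapes are hypothetical.

```typescript
// Hypothetical names throughout; this is not the ruvllm-wasm worker pool.

// Minimal surface needed from a worker; a real Worker would need a thin adapter.
interface WorkerLike {
  postMessage(msg: unknown): void;
  onmessage: ((ev: { data: any }) => void) | null;
}

// Assumed reply shape sent back by the worker script.
interface TaskReply {
  id: number;
  ok: boolean;
  result?: unknown;
  error?: string;
}

interface PendingTask {
  resolve: (value: unknown) => void;
  reject: (reason: Error) => void;
}

class TaskDispatcher {
  private nextId = 0;
  private pending = new Map<number, PendingTask>();
  private worker: WorkerLike;

  constructor(worker: WorkerLike) {
    this.worker = worker;
    // Completion is event-driven: each reply settles exactly one pending promise.
    worker.onmessage = (ev) => this.settle(ev.data as TaskReply);
  }

  get pendingCount(): number {
    return this.pending.size;
  }

  // Send a task to the worker and return a promise for its result.
  run(payload: unknown): Promise<unknown> {
    const id = this.nextId++;
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.worker.postMessage({ id, payload });
    });
  }

  private settle(reply: TaskReply): void {
    const entry = this.pending.get(reply.id);
    if (!entry) return; // unknown or already-settled id: ignore
    this.pending.delete(reply.id);
    if (reply.ok) {
      entry.resolve(reply.result);
    } else {
      entry.reject(new Error(reply.error ?? "worker task failed"));
    }
  }
}
```

With this shape, callers simply `await dispatcher.run(payload)`, and no timer is needed to discover that a task has finished. Timeouts and worker crash handling are left out of the sketch.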
