SpillwaveSolutions · RichardHightower · Mar 12, 2026 · Mar 11, 2026
diff --git a/.planning/PROJECT.md b/.planning/PROJECT.md
@@ -2,8 +2,21 @@
 
 ## Current State
 
-**Version:** v2.5 (Shipped 2026-03-10)
-**Status:** Production-ready with semantic dedup, stale filtering, 5-CLI E2E test harness, and full adapter coverage
+**Version:** v2.6 (In Progress)
+**Status:** Building retrieval quality, lifecycle automation, and episodic memory
+
+## Current Milestone: v2.6 Retrieval Quality, Lifecycle & Episodic Memory
+
+**Goal:** Complete hybrid search, add ranking intelligence, automate index lifecycle, expose operational metrics, and enable the system to learn from past task outcomes.
+
+**Target features:**
+- Complete BM25 hybrid search wiring (currently hardcoded `false`)
+- Salience scoring at write time + usage-based decay in retrieval ranking
+- Automated vector pruning and BM25 lifecycle policies via scheduler
+- Admin observability RPCs for dedup/ranking metrics
+- Episodic memory — record task outcomes, search similar past episodes, value-based retention
+
+**Previous version:** v2.5 (Shipped 2026-03-10) — semantic dedup, stale filtering, 5-CLI E2E test harness
 
 The system implements a complete 6-layer cognitive stack with control plane, multi-agent support, semantic dedup, retrieval quality filtering, and comprehensive testing:
 - Layer 0: Raw Events (RocksDB) — agent-tagged, dedup-aware (store-and-skip-outbox)
@@ -209,12 +222,37 @@ Agent Memory implements a layered cognitive architecture:
 - [x] Configurable staleness parameters via config.toml — v2.5
 - [x] 10 E2E tests proving dedup, stale filtering, and fail-open — v2.5
 
-### Active
+### Active (v2.6)
+
+**Hybrid Search**
+- [ ] BM25 wired into hybrid search handler and retrieval routing
+
+**Ranking Quality**
+- [ ] Salience scoring at write time (TOC nodes, Grips)
+- [ ] Usage-based decay in retrieval ranking (access_count tracking)
+
+**Lifecycle Automation**
+- [ ] Vector index pruning via scheduler job
+- [ ] BM25 lifecycle policy with level-filtered rebuild
+
+**Observability**
+- [ ] Admin RPCs for dedup metrics (buffer_size, events skipped)
+- [ ] Ranking metrics exposure (salience distribution, usage stats)
+- [ ] `deduplicated` field in IngestEventResponse
+
+**Episodic Memory**
+- [ ] Episode schema and RocksDB storage (CF_EPISODES)
+- [ ] gRPC RPCs (StartEpisode, RecordAction, CompleteEpisode, GetSimilarEpisodes)
+- [ ] Value-based retention (outcome score sweet spot)
+- [ ] Retrieval integration for similar episode search
+
+### Deferred / Future
 
-**Deferred / Future**
 - Cross-project unified memory
-- Admin dedup dashboard (events skipped, threshold hits, buffer utilization)
 - Per-agent dedup scoping
+- Consolidation hook (extract durable knowledge from events, needs NLP/LLM)
+- True daemonization (double-fork on Unix)
+- API-based summarizer wiring (OpenAI/Anthropic)
 
 ### Out of Scope
 
@@ -314,4 +352,4 @@ CLI client and agent skill query the daemon. Agent receives TOC navigation tools
 | std::sync::RwLock for InFlightBuffer | Operations are sub-microsecond; tokio RwLock overhead unnecessary | ✓ Validated v2.5 |
 
 ---
-*Last updated: 2026-03-10 after v2.5 milestone*
+*Last updated: 2026-03-10 after v2.6 milestone start*
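Review note: the hybrid search target in this milestone is specified in REQUIREMENTS.md (HYBRID-02) as fusing BM25 and vector results via RRF (Reciprocal Rank Fusion). A minimal sketch of that fusion step follows. The function name `rrf_fuse`, the plain string document IDs, and the constant `k = 60` are assumptions made for illustration; nothing in this diff shows the project's handler types or the `k` value it uses.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion over several ranked lists of document IDs.
/// Each list is ordered best-first. `k` is the smoothing constant
/// (60 is a commonly used value; the project's setting is not shown in this diff).
pub fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, doc_id) in ranking.iter().enumerate() {
            // Ranks are 1-based, so the top hit contributes 1 / (k + 1).
            let rank = (idx + 1) as f64;
            *scores.entry((*doc_id).to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties are broken by ID so output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

A document that appears in both the BM25 list and the vector list accumulates two contributions, so it tends to outrank documents found by only one layer. That is the behavior the HYBRID-04 E2E test would need to observe.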
diff --git a/.planning/REQUIREMENTS.md b/.planning/REQUIREMENTS.md
@@ -0,0 +1,152 @@
+# Requirements: Agent Memory v2.6
+
+**Defined:** 2026-03-10
+**Core Value:** Agent can answer "what were we talking about last week?" without scanning everything
+
+## v2.6 Requirements
+
+Requirements for the Retrieval Quality, Lifecycle & Episodic Memory milestone. Each requirement maps to a roadmap phase.
+
+### Hybrid Search
+
+- [ ] **HYBRID-01**: BM25 wired into HybridSearchHandler (currently hardcoded `bm25_available() = false`)
+- [ ] **HYBRID-02**: Hybrid search returns combined BM25 + vector results via RRF score fusion
+- [ ] **HYBRID-03**: BM25 fallback enabled in retrieval routing when vector index unavailable
+- [ ] **HYBRID-04**: E2E test verifies hybrid search returns results from both BM25 and vector layers
+
+### Ranking
+
+- [ ] **RANK-01**: Salience score calculated at write time on TOC nodes (length_density + kind_boost + pinned_boost)
+- [ ] **RANK-02**: Salience score calculated at write time on Grips
+- [ ] **RANK-03**: `is_pinned` field added to TocNode and Grip (default false)
+- [ ] **RANK-04**: Usage tracking: `access_count` and `last_accessed` updated on retrieval hits
+- [ ] **RANK-05**: Usage-based decay penalty applied in retrieval ranking: `1.0 / (1.0 + 0.15 * access_count)`
+- [ ] **RANK-06**: Combined ranking formula: `similarity * salience_factor * usage_penalty`
+- [ ] **RANK-07**: Ranking composites with existing StaleFilter (score floor at 50% to prevent collapse)
+- [ ] **RANK-08**: Salience and usage_decay configurable via config.toml sections
+- [ ] **RANK-09**: E2E test: pinned/high-salience items rank higher than low-salience items
+- [ ] **RANK-10**: E2E test: frequently accessed items score lower than fresh items (usage decay)
+
+### Lifecycle
+
+- [ ] **LIFE-01**: Vector pruning scheduler job calls existing `prune(age_days)` on configurable schedule
+- [ ] **LIFE-02**: CLI command: `memory-daemon admin prune-vectors --age-days N`
+- [ ] **LIFE-03**: Config: `[lifecycle.vector] segment_retention_days` controls pruning threshold
+- [ ] **LIFE-04**: BM25 rebuild with level filter excludes fine-grain docs after rollup
+- [ ] **LIFE-05**: CLI command: `memory-daemon admin rebuild-bm25 --min-level day`
+- [ ] **LIFE-06**: Config: `[lifecycle.bm25] min_level_after_rollup` controls BM25 retention granularity
+- [ ] **LIFE-07**: E2E test: old segments pruned from vector index after lifecycle job runs
+
+### Observability
+
+- [ ] **OBS-01**: `buffer_size` exposed in GetDedupStatus (currently hardcoded 0)
+- [ ] **OBS-02**: `deduplicated` field added to IngestEventResponse (deferred proto change from v2.5)
+- [ ] **OBS-03**: Dedup threshold hit rate and events_skipped rate exposed via admin RPC
+- [ ] **OBS-04**: Ranking metrics (salience distribution, usage decay stats) queryable via admin RPC
+- [ ] **OBS-05**: CLI: `memory-daemon status --verbose` shows dedup/ranking health summary
+
+### Episodic Memory
+
+- [ ] **EPIS-01**: Episode struct with episode_id, task, plan, actions, outcome_score, lessons_learned, failure_modes, embedding, created_at
+- [ ] **EPIS-02**: Action struct with action_type, input, result, timestamp
+- [ ] **EPIS-03**: CF_EPISODES column family in RocksDB for episode storage
+- [ ] **EPIS-04**: StartEpisode gRPC RPC creates new episode and returns episode_id
+- [ ] **EPIS-05**: RecordAction gRPC RPC appends action to in-progress episode
+- [ ] **EPIS-06**: CompleteEpisode gRPC RPC finalizes episode with outcome_score, lessons, failure_modes
+- [ ] **EPIS-07**: GetSimilarEpisodes gRPC RPC searches by vector similarity on episode embeddings
+- [ ] **EPIS-08**: Value-based retention: episodes scored by distance from 0.65 optimal outcome
+- [ ] **EPIS-09**: Retention threshold: episodes with value_score < 0.18 eligible for pruning
+- [ ] **EPIS-10**: Configurable via `[episodic]` config section (enabled, value_threshold, max_episodes)
+- [ ] **EPIS-11**: E2E test: create episode → complete → search by similarity returns match
+- [ ] **EPIS-12**: E2E test: value-based retention correctly identifies low/high value episodes
+
+## Future Requirements
+
+Deferred to v2.7+. Tracked but not in the current roadmap.
+
+### Consolidation
+
+- **CONS-01**: Extract durable knowledge (preferences, constraints, procedures) from recent events
+- **CONS-02**: Daily consolidation scheduler job with NLP/LLM pattern extraction
+- **CONS-03**: CF_CONSOLIDATED column family for extracted knowledge atoms
+
+### Cross-Project
+
+- **XPROJ-01**: Unified memory queries across multiple project stores
+- **XPROJ-02**: Cross-project dedup for shared context
+
+### Agent Scoping
+
+- **SCOPE-01**: Per-agent dedup thresholds (only dedup within same agent's history)
+- **SCOPE-02**: Agent-filtered lifecycle policies
+
+### Operational
+
+- **OPS-01**: True daemonization (double-fork on Unix)
+- **OPS-02**: API-based summarizer wiring (OpenAI/Anthropic when key present)
+- **OPS-03**: Config example file (config.toml.example) shipped with binary
+
+## Out of Scope
+
+| Feature | Reason |
+|---------|--------|
+| LLM-based episode summarization | Adds latency, hallucination risk, external dependency |
+| Automatic memory forgetting/deletion | Violates append-only invariant |
+| Real-time outcome feedback loops | Out of scope for v2.6; needs agent framework integration |
+| Graph-based episode dependencies | Overengineered for initial episode support |
+| Per-agent lifecycle scoping | Defer to v2.7 when multi-agent dedup is validated |
+| Continuous outcome recording | Adoption killer — complete episodes only |
+| Real-time index rebuilds | UX killer — batch via scheduler only |
+| Cross-project memory | Requires architectural rethink of per-project isolation |
+
+## Traceability
+
+| Requirement | Phase | Status |
+|-------------|-------|--------|
+| HYBRID-01 | Phase 39 | Pending |
+| HYBRID-02 | Phase 39 | Pending |
+| HYBRID-03 | Phase 39 | Pending |
+| HYBRID-04 | Phase 39 | Pending |
+| RANK-01 | Phase 40 | Pending |
+| RANK-02 | Phase 40 | Pending |
+| RANK-03 | Phase 40 | Pending |
+| RANK-04 | Phase 40 | Pending |
+| RANK-05 | Phase 40 | Pending |
+| RANK-06 | Phase 40 | Pending |
+| RANK-07 | Phase 40 | Pending |
+| RANK-08 | Phase 40 | Pending |
+| RANK-09 | Phase 40 | Pending |
+| RANK-10 | Phase 40 | Pending |
+| LIFE-01 | Phase 41 | Pending |
+| LIFE-02 | Phase 41 | Pending |
+| LIFE-03 | Phase 41 | Pending |
+| LIFE-04 | Phase 41 | Pending |
+| LIFE-05 | Phase 41 | Pending |
+| LIFE-06 | Phase 41 | Pending |
+| LIFE-07 | Phase 41 | Pending |
+| OBS-01 | Phase 42 | Pending |
+| OBS-02 | Phase 42 | Pending |
+| OBS-03 | Phase 42 | Pending |
+| OBS-04 | Phase 42 | Pending |
+| OBS-05 | Phase 42 | Pending |
+| EPIS-01 | Phase 43 | Pending |
+| EPIS-02 | Phase 43 | Pending |
+| EPIS-03 | Phase 43 | Pending |
+| EPIS-04 | Phase 44 | Pending |
+| EPIS-05 | Phase 44 | Pending |
+| EPIS-06 | Phase 44 | Pending |
+| EPIS-07 | Phase 44 | Pending |
+| EPIS-08 | Phase 44 | Pending |
+| EPIS-09 | Phase 44 | Pending |
+| EPIS-10 | Phase 44 | Pending |
+| EPIS-11 | Phase 44 | Pending |
+| EPIS-12 | Phase 44 | Pending |
+
+**Coverage:**
+- v2.6 requirements: 38 total
+- Mapped to phases: 38
+- Unmapped: 0 ✓
+
+---
+*Requirements defined: 2026-03-10*
+*Last updated: 2026-03-10 after initial definition*
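Review note: RANK-05 through RANK-07 above define the ranking arithmetic, and a small sketch makes the interaction between the usage penalty and the score floor concrete. The function and constant names are hypothetical. Reading the "score floor at 50%" in RANK-07 as 50% of the raw similarity is an assumption; the requirement does not say what the floor is relative to.

```rust
/// Decay rate taken from RANK-05.
const USAGE_DECAY_RATE: f64 = 0.15;
/// Floor from RANK-07, interpreted here as a fraction of raw similarity (assumption).
const SCORE_FLOOR_FRACTION: f64 = 0.5;

/// RANK-05: 1.0 for a never-accessed item, shrinking toward 0 as access_count grows.
pub fn usage_penalty(access_count: u32) -> f64 {
    1.0 / (1.0 + USAGE_DECAY_RATE * f64::from(access_count))
}

/// RANK-06 formula with the RANK-07 floor applied afterwards.
/// Assumes a non-negative similarity score.
pub fn combined_score(similarity: f64, salience_factor: f64, access_count: u32) -> f64 {
    let raw = similarity * salience_factor * usage_penalty(access_count);
    // The floor keeps heavy usage decay from pushing a strong match toward zero.
    raw.max(similarity * SCORE_FLOOR_FRACTION)
}
```

Under these assumptions, an item with similarity 0.8, a neutral salience factor of 1.0, and 10 prior accesses has a usage penalty of 0.4, so the unfloored score would be 0.32; the floor lifts it to 0.4. With a neutral salience factor the floor starts to bind once `access_count` exceeds 6, which is worth keeping in mind when writing the RANK-10 test.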
diff --git a/.planning/ROADMAP.md b/.planning/ROADMAP.md
@@ -9,6 +9,7 @@
 - ✅ **v2.3 Install & Setup Experience** — Phases 28-29 (shipped 2026-02-12)
 - ✅ **v2.4 Headless CLI Testing** — Phases 30-34 (shipped 2026-03-05)
 - ✅ **v2.5 Semantic Dedup & Retrieval Quality** — Phases 35-38 (shipped 2026-03-10)
+- **v2.6 Retrieval Quality, Lifecycle & Episodic Memory** — Phases 39-44 (in progress)
 
 ## Phases
 
@@ -95,19 +96,129 @@ See: `.planning/milestones/v2.4-ROADMAP.md`
 </details>
 
 <details>
-<summary>✅ v2.5 Semantic Dedup & Retrieval Quality (Phases 35-38) — SHIPPED 2026-03-10</summary>
+<summary>v2.5 Semantic Dedup & Retrieval Quality (Phases 35-38) -- SHIPPED 2026-03-10</summary>
 
-- [x] Phase 35: DedupGate Foundation (2/2 plans) — completed 2026-03-05
-- [x] Phase 36: Ingest Pipeline Wiring (3/3 plans) — completed 2026-03-06
-- [x] Phase 37: StaleFilter (3/3 plans) — completed 2026-03-09
-- [x] Phase 38: E2E Validation (3/3 plans) — completed 2026-03-10
+- [x] Phase 35: DedupGate Foundation (2/2 plans) -- completed 2026-03-05
+- [x] Phase 36: Ingest Pipeline Wiring (3/3 plans) -- completed 2026-03-06
+- [x] Phase 37: StaleFilter (3/3 plans) -- completed 2026-03-09
+- [x] Phase 38: E2E Validation (3/3 plans) -- completed 2026-03-10
 
 See: `.planning/milestones/v2.5-ROADMAP.md`
 
 </details>
 
+### v2.6 Retrieval Quality, Lifecycle & Episodic Memory (In Progress)
+
+**Milestone Goal:** Complete hybrid search wiring, add ranking intelligence with salience and usage decay, automate index lifecycle, expose operational observability metrics, and enable episodic memory for learning from past task outcomes.
+
+- [ ] **Phase 39: BM25 Hybrid Wiring** - Wire BM25 into hybrid search handler and retrieval routing
+- [ ] **Phase 40: Salience Scoring + Usage Decay** - Ranking quality with write-time salience and retrieval-time usage decay
+- [ ] **Phase 41: Lifecycle Automation** - Scheduled vector pruning and BM25 lifecycle policies
+- [ ] **Phase 42: Observability RPCs** - Admin metrics for dedup, ranking, and operational health
+- [ ] **Phase 43: Episodic Memory Schema & Storage** - Episode and Action data model with RocksDB column family
+- [ ] **Phase 44: Episodic Memory gRPC & Retrieval** - Episode lifecycle RPCs, similarity search, and value-based retention
+
+## Phase Details
+
+### Phase 39: BM25 Hybrid Wiring
+**Goal**: Users get combined lexical and semantic search results from a single query, with BM25 serving as fallback when vector index is unavailable
+**Depends on**: v2.5 (shipped)
+**Requirements**: HYBRID-01, HYBRID-02, HYBRID-03, HYBRID-04
+**Success Criteria** (what must be TRUE):
+  1. A teleport_query returns results that include both BM25 keyword matches and vector similarity matches, fused via RRF scoring
+  2. When the vector index is unavailable, route_query falls back to BM25-only results instead of returning an empty result set
+  3. The hybrid search handler reports bm25_available() = true (no longer hardcoded false)
+  4. An E2E test proves that a query matching content indexed by both BM25 and vector returns combined results from both layers
+**Plans**: 2
+
+Plans:
+- [ ] 39-01: Wire BM25 into HybridSearchHandler and retrieval routing
+- [ ] 39-02: E2E hybrid search test
+
+### Phase 40: Salience Scoring + Usage Decay
+**Goal**: Retrieval results are ranked by a composed formula that rewards high-salience content, penalizes overused results, and composes cleanly with existing stale filtering
+**Depends on**: Phase 39
+**Requirements**: RANK-01, RANK-02, RANK-03, RANK-04, RANK-05, RANK-06, RANK-07, RANK-08, RANK-09, RANK-10
+**Success Criteria** (what must be TRUE):
+  1. TOC nodes and Grips have salience scores calculated at write time based on length density, kind boost, and pinned boost
+  2. Retrieval results for pinned or high-salience items consistently rank higher than low-salience items of similar similarity
+  3. Frequently accessed results receive a usage decay penalty so that fresh results surface above stale, over-accessed ones
+  4. The combined ranking formula (`similarity * salience_factor * usage_penalty`) composes with StaleFilter without collapsing scores below the min_confidence threshold
+  5. Salience weights and usage decay parameters are configurable via config.toml sections
+**Plans**: 3
+
+Plans:
+- [ ] 40-01: Salience scoring at write time
+- [ ] 40-02: Usage-based decay in retrieval ranking
+- [ ] 40-03: Ranking E2E tests
+
+### Phase 41: Lifecycle Automation
+**Goal**: Index sizes are automatically managed through scheduled pruning jobs, preventing unbounded growth of vector and BM25 indexes
+**Depends on**: Phase 40
+**Requirements**: LIFE-01, LIFE-02, LIFE-03, LIFE-04, LIFE-05, LIFE-06, LIFE-07
+**Success Criteria** (what must be TRUE):
+  1. Old vector index segments are automatically pruned by the scheduler based on configurable segment_retention_days
+  2. An admin CLI command allows manual vector pruning with --age-days parameter
+  3. BM25 index can be rebuilt with a --min-level filter that excludes fine-grain segment docs after rollup
+  4. An admin CLI command allows manual BM25 rebuild with level filtering
+  5. An E2E test proves that old segments are removed from the vector index after a lifecycle job runs
+**Plans**: 2
+
+Plans:
+- [ ] 41-01: Vector pruning wiring + CLI command
+- [ ] 41-02: BM25 lifecycle policy + E2E test
+
+### Phase 42: Observability RPCs
+**Goal**: Operators can inspect dedup, ranking, and system health metrics through admin RPCs and CLI, enabling production monitoring and debugging
+**Depends on**: Phase 40
+**Requirements**: OBS-01, OBS-02, OBS-03, OBS-04, OBS-05
+**Success Criteria** (what must be TRUE):
+  1. GetDedupStatus returns the actual InFlightBuffer size and dedup hit rate (no longer hardcoded 0)
+  2. IngestEventResponse includes a deduplicated boolean field indicating whether the event was a duplicate
+  3. Ranking metrics (salience distribution, usage decay stats) are queryable via admin RPC
+  4. `memory-daemon status --verbose` prints a human-readable summary of dedup and ranking health
+**Plans**: 2
+
+Plans:
+- [ ] 42-01: Dedup observability — buffer size + deduplicated field
+- [ ] 42-02: Ranking metrics + verbose status CLI
+
+### Phase 43: Episodic Memory Schema & Storage
+**Goal**: The system has a persistent, queryable storage layer for task episodes with structured actions and outcomes
+**Depends on**: v2.5 (shipped) — independent of Phases 39-42
+**Requirements**: EPIS-01, EPIS-02, EPIS-03
+**Success Criteria** (what must be TRUE):
+  1. Episode struct exists with episode_id, task, plan, actions, outcome_score, lessons_learned, failure_modes, embedding, and created_at fields
+  2. Action struct exists with action_type, input, result, and timestamp fields
+  3. CF_EPISODES column family is registered in RocksDB and episodes can be stored and retrieved by ID
+**Plans**: 1
+
+Plans:
+- [ ] 43-01: Episode schema, storage, and column family
+
+### Phase 44: Episodic Memory gRPC & Retrieval
+**Goal**: Agents can record task outcomes as episodes, search for similar past episodes by vector similarity, and the system retains episodes based on their learning value
+**Depends on**: Phase 43
+**Requirements**: EPIS-04, EPIS-05, EPIS-06, EPIS-07, EPIS-08, EPIS-09, EPIS-10, EPIS-11, EPIS-12
+**Success Criteria** (what must be TRUE):
+  1. An agent can start an episode, record actions during execution, and complete it with an outcome score and lessons learned
+  2. GetSimilarEpisodes returns past episodes ranked by vector similarity to a query embedding, enabling "we solved this before" retrieval
+  3. Value-based retention scores episodes by distance from the 0.65 optimal outcome, and episodes below the retention threshold are eligible for pruning
+  4. Episodic memory is configurable via [episodic] config section (enabled flag, value_threshold, max_episodes)
+  5. E2E tests prove the full episode lifecycle (create, record, complete, search) and value-based retention scoring
+**Plans**: 3
+
+Plans:
+- [ ] 44-01: Episode gRPC proto definitions and handler
+- [ ] 44-02: Similar episode search and value-based retention
+- [ ] 44-03: Episodic memory E2E tests
+
 ## Progress
 
+**Execution Order:**
+Phases execute in numeric order: 39 → 40 → 41 → 42 → 43 → 44
+Note: Phases 43-44 (Episodic Memory) are independent of 39-42 and could be parallelized.
+
 | Phase | Milestone | Plans | Status | Completed |
 |-------|-----------|-------|--------|-----------|
 | 1-9 | v1.0 | 20/20 | Complete | 2026-01-30 |
@@ -116,11 +227,14 @@ See: `.planning/milestones/v2.5-ROADMAP.md`
 | 24-27 | v2.2 | 10/10 | Complete | 2026-02-11 |
 | 28-29 | v2.3 | 2/2 | Complete | 2026-02-12 |
 | 30-34 | v2.4 | 15/15 | Complete | 2026-03-05 |
-| 35 | v2.5 | 2/2 | Complete | 2026-03-05 |
-| 36 | v2.5 | 3/3 | Complete | 2026-03-06 |
-| 37 | v2.5 | 3/3 | Complete | 2026-03-09 |
-| 38 | v2.5 | 3/3 | Complete | 2026-03-10 |
+| 35-38 | v2.5 | 11/11 | Complete | 2026-03-10 |
+| 39. BM25 Hybrid Wiring | v2.6 | 0/2 | Planned | - |
+| 40. Salience + Usage Decay | v2.6 | 0/3 | Planned | - |
+| 41. Lifecycle Automation | v2.6 | 0/2 | Planned | - |
+| 42. Observability RPCs | v2.6 | 0/2 | Planned | - |
+| 43. Episodic Schema & Storage | v2.6 | 0/1 | Planned | - |
+| 44. Episodic gRPC & Retrieval | v2.6 | 0/3 | Planned | - |
 
 ---
 
-*Updated: 2026-03-10 after v2.5 milestone shipped*
+*Updated: 2026-03-11 after v2.6 roadmap created*
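Review note: the Phase 43 success criteria list the Episode and Action fields by name only. The sketch below shows one possible shape for that data model. The field names come from EPIS-01 and EPIS-02; every field type, the `Option` on `outcome_score`, and the `start` and `complete` helpers are assumptions for illustration and are not taken from the project's code.

```rust
/// One step taken during an episode (EPIS-02). Field types are assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct Action {
    pub action_type: String,
    pub input: String,
    pub result: String,
    pub timestamp: i64, // assumed: Unix epoch milliseconds
}

/// A task episode (EPIS-01). Field types are assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct Episode {
    pub episode_id: String,
    pub task: String,
    pub plan: Vec<String>,
    pub actions: Vec<Action>,
    pub outcome_score: Option<f64>, // None until the episode is completed
    pub lessons_learned: Vec<String>,
    pub failure_modes: Vec<String>,
    pub embedding: Vec<f32>,
    pub created_at: i64, // assumed: Unix epoch milliseconds
}

impl Episode {
    /// Hypothetical helper: the state a StartEpisode call might create.
    pub fn start(episode_id: &str, task: &str, created_at: i64) -> Self {
        Episode {
            episode_id: episode_id.to_string(),
            task: task.to_string(),
            plan: Vec::new(),
            actions: Vec::new(),
            outcome_score: None,
            lessons_learned: Vec::new(),
            failure_modes: Vec::new(),
            embedding: Vec::new(),
            created_at,
        }
    }

    /// Hypothetical helper: the update a CompleteEpisode call might apply.
    pub fn complete(&mut self, outcome_score: f64, lessons: Vec<String>, failure_modes: Vec<String>) {
        self.outcome_score = Some(outcome_score);
        self.lessons_learned = lessons;
        self.failure_modes = failure_modes;
    }
}
```

Keeping `outcome_score` optional lets storage distinguish an in-progress episode from a completed one with a score of 0.0, which matters if value-based retention (EPIS-08) should only consider completed episodes.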