Merged
23 commits
9baa8ad
docs: start milestone v2.6 Retrieval Quality, Lifecycle & Episodic Me…
RichardHightower Mar 11, 2026
c0a482d
docs: complete v2.6 project research
RichardHightower Mar 11, 2026
982ecd6
docs: define milestone v2.6 requirements
RichardHightower Mar 11, 2026
5580747
docs: create milestone v2.6 roadmap (6 phases)
RichardHightower Mar 11, 2026
dbd26bc
docs: plan all v2.6 phases (39-44, 13 plans)
RichardHightower Mar 11, 2026
937a61d
feat(43-01): define Episode, Action, and ActionResult types
RichardHightower Mar 11, 2026
0421c2e
feat(43-01): add CF_EPISODES column family for episodic memory
RichardHightower Mar 11, 2026
71cbb83
feat(43-01): add episode CRUD storage operations
RichardHightower Mar 11, 2026
bacb8a8
feat(43-01): add EpisodicConfig with value scoring parameters
RichardHightower Mar 11, 2026
f7608d3
chore(43-01): apply cargo fmt formatting fixes
RichardHightower Mar 11, 2026
d40c677
Merge branch 'main' into feature/phase-43-episodic-schema-storage
RichardHightower Mar 11, 2026
01451b6
docs(43-01): complete Episode Schema, Storage, and Column Family plan
RichardHightower Mar 11, 2026
86dc6a3
feat(40): wire salience enrichment + usage decay into retrieval pipeline
RichardHightower Mar 11, 2026
a93ca85
feat(40): salience scoring + usage decay in retrieval ranking
RichardHightower Mar 11, 2026
33d12a7
test(40): add E2E ranking tests for salience enrichment and usage decay
RichardHightower Mar 11, 2026
99479f4
fix(39): restore BM25 hybrid wiring after auto-revert
RichardHightower Mar 11, 2026
dd92651
feat(41): lifecycle automation - vector pruning CLI + BM25 rebuild
RichardHightower Mar 11, 2026
01a0da4
feat(42): observability RPCs - dedup buffer, ranking metrics, verbose…
RichardHightower Mar 11, 2026
b2979ec
merge: integrate phase-43 episodic schema into feature branch
RichardHightower Mar 12, 2026
8b14b32
feat(44): episodic memory gRPC, handler, similarity search, and E2E t…
RichardHightower Mar 12, 2026
f1d69c4
docs: mark v2.6 milestone 100% complete (13/13 plans)
RichardHightower Mar 12, 2026
e834b7b
Merge phase-44 episodic gRPC into feature branch
RichardHightower Mar 12, 2026
bf11c84
fix: escape Arc<Storage> in rustdoc comment to fix doc build
RichardHightower Mar 12, 2026
50 changes: 44 additions & 6 deletions .planning/PROJECT.md
@@ -2,8 +2,21 @@

## Current State

**Version:** v2.5 (Shipped 2026-03-10)
**Status:** Production-ready with semantic dedup, stale filtering, 5-CLI E2E test harness, and full adapter coverage
**Version:** v2.6 (In Progress)
**Status:** Building retrieval quality, lifecycle automation, and episodic memory

## Current Milestone: v2.6 Retrieval Quality, Lifecycle & Episodic Memory

**Goal:** Complete hybrid search, add ranking intelligence, automate index lifecycle, expose operational metrics, and enable the system to learn from past task outcomes.

**Target features:**
- Complete BM25 hybrid search wiring (currently hardcoded `false`)
- Salience scoring at write time + usage-based decay in retrieval ranking
- Automated vector pruning and BM25 lifecycle policies via scheduler
- Admin observability RPCs for dedup/ranking metrics
- Episodic memory — record task outcomes, search similar past episodes, value-based retention

**Previous version:** v2.5 (Shipped 2026-03-10) — semantic dedup, stale filtering, 5-CLI E2E test harness

The system implements a complete 6-layer cognitive stack with control plane, multi-agent support, semantic dedup, retrieval quality filtering, and comprehensive testing:
- Layer 0: Raw Events (RocksDB) — agent-tagged, dedup-aware (store-and-skip-outbox)
@@ -209,12 +222,37 @@ Agent Memory implements a layered cognitive architecture:
- [x] Configurable staleness parameters via config.toml — v2.5
- [x] 10 E2E tests proving dedup, stale filtering, and fail-open — v2.5

### Active
### Active (v2.6)

**Hybrid Search**
- [ ] BM25 wired into hybrid search handler and retrieval routing

**Ranking Quality**
- [ ] Salience scoring at write time (TOC nodes, Grips)
- [ ] Usage-based decay in retrieval ranking (access_count tracking)

**Lifecycle Automation**
- [ ] Vector index pruning via scheduler job
- [ ] BM25 lifecycle policy with level-filtered rebuild

**Observability**
- [ ] Admin RPCs for dedup metrics (buffer_size, events skipped)
- [ ] Ranking metrics exposure (salience distribution, usage stats)
- [ ] `deduplicated` field in IngestEventResponse

**Episodic Memory**
- [ ] Episode schema and RocksDB storage (CF_EPISODES)
- [ ] gRPC RPCs (StartEpisode, RecordAction, CompleteEpisode, GetSimilarEpisodes)
- [ ] Value-based retention (outcome score sweet spot)
- [ ] Retrieval integration for similar episode search

### Deferred / Future

**Deferred / Future**
- Cross-project unified memory
- Admin dedup dashboard (events skipped, threshold hits, buffer utilization)
- Per-agent dedup scoping
- Consolidation hook (extract durable knowledge from events, needs NLP/LLM)
- True daemonization (double-fork on Unix)
- API-based summarizer wiring (OpenAI/Anthropic)

### Out of Scope

@@ -314,4 +352,4 @@ CLI client and agent skill query the daemon. Agent receives TOC navigation tools
| std::sync::RwLock for InFlightBuffer | Operations are sub-microsecond; tokio RwLock overhead unnecessary | ✓ Validated v2.5 |

---
*Last updated: 2026-03-10 after v2.5 milestone*
*Last updated: 2026-03-10 after v2.6 milestone start*
152 changes: 152 additions & 0 deletions .planning/REQUIREMENTS.md
@@ -0,0 +1,152 @@
# Requirements: Agent Memory v2.6

**Defined:** 2026-03-10
**Core Value:** Agent can answer "what were we talking about last week?" without scanning everything

## v2.6 Requirements

Requirements for Retrieval Quality, Lifecycle & Episodic Memory milestone. Each maps to roadmap phases.

### Hybrid Search

- [ ] **HYBRID-01**: BM25 wired into HybridSearchHandler (currently hardcoded `bm25_available() = false`)
- [ ] **HYBRID-02**: Hybrid search returns combined BM25 + vector results via RRF score fusion
- [ ] **HYBRID-03**: BM25 fallback enabled in retrieval routing when vector index unavailable
- [ ] **HYBRID-04**: E2E test verifies hybrid search returns results from both BM25 and vector layers
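
HYBRID-02 names RRF (reciprocal rank fusion) as the step that combines BM25 and vector results. The following is a minimal sketch of that technique, assuming each layer returns an ordered list of document IDs. The function name `rrf_fuse`, the string IDs, and the constant `k = 60.0` (a commonly used default) are illustrative assumptions and are not taken from the project's code.

```rust
use std::collections::HashMap;

/// Fuse several ranked ID lists with reciprocal rank fusion.
/// Each document scores the sum of 1 / (k + rank) over every list it
/// appears in, where rank is 1-based.
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, id) in ranking.iter().enumerate() {
            let rank = (idx + 1) as f64;
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by ID so output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}

fn main() {
    // Hypothetical result lists from the two layers.
    let bm25 = vec!["doc-a", "doc-b", "doc-c"];
    let vector = vec!["doc-b", "doc-d", "doc-a"];
    for (id, score) in rrf_fuse(&[bm25, vector], 60.0) {
        println!("{id}: {score:.5}");
    }
}
```

Documents found by both layers accumulate two terms, so they tend to rise above documents found by only one layer, which is the behavior HYBRID-04 asks the E2E test to verify.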

### Ranking

- [ ] **RANK-01**: Salience score calculated at write time on TOC nodes (length_density + kind_boost + pinned_boost)
- [ ] **RANK-02**: Salience score calculated at write time on Grips
- [ ] **RANK-03**: `is_pinned` field added to TocNode and Grip (default false)
- [ ] **RANK-04**: Usage tracking: `access_count` and `last_accessed` updated on retrieval hits
- [ ] **RANK-05**: Usage-based decay penalty applied in retrieval ranking (`1.0 / (1.0 + 0.15 * access_count)`)
- [ ] **RANK-06**: Combined ranking formula: `similarity * salience_factor * usage_penalty`
- [ ] **RANK-07**: Ranking composites with existing StaleFilter (score floor at 50% to prevent collapse)
- [ ] **RANK-08**: Salience and usage_decay configurable via config.toml sections
- [ ] **RANK-09**: E2E test: pinned/high-salience items rank higher than low-salience items
- [ ] **RANK-10**: E2E test: frequently-accessed items score lower than fresh items (usage decay)
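
The formulas in RANK-05 through RANK-07 can be sketched as below. This is a sketch under stated assumptions: the function names are hypothetical, `salience_factor` is taken as an input because its derivation from the salience score is not specified here, and the 50% floor is interpreted as 50% of the raw similarity, which is one plausible reading of RANK-07.

```rust
/// Usage decay penalty from RANK-05: 1.0 / (1.0 + rate * access_count).
fn usage_penalty(access_count: u32, decay_rate: f64) -> f64 {
    1.0 / (1.0 + decay_rate * f64::from(access_count))
}

/// Combined score from RANK-06, clamped from below by a floor.
fn combined_score(similarity: f64, salience_factor: f64, access_count: u32) -> f64 {
    const DECAY_RATE: f64 = 0.15; // value stated in RANK-05
    const FLOOR_RATIO: f64 = 0.5; // assumption: floor is 50% of raw similarity
    let raw = similarity * salience_factor * usage_penalty(access_count, DECAY_RATE);
    raw.max(similarity * FLOOR_RATIO)
}

fn main() {
    // A never-accessed item keeps its full score.
    println!("fresh:   {:.3}", combined_score(0.8, 1.0, 0));
    // After 20 accesses the raw score drops to a quarter, so the floor applies.
    println!("overuse: {:.3}", combined_score(0.8, 1.0, 20));
}
```

Without the floor, the penalty alone would push a heavily accessed but highly relevant item toward zero; the clamp keeps it at half its similarity at worst.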

### Lifecycle

- [ ] **LIFE-01**: Vector pruning scheduler job calls existing `prune(age_days)` on configurable schedule
- [ ] **LIFE-02**: CLI command: `memory-daemon admin prune-vectors --age-days N`
- [ ] **LIFE-03**: Config: `[lifecycle.vector] segment_retention_days` controls pruning threshold
- [ ] **LIFE-04**: BM25 rebuild with level filter excludes fine-grain docs after rollup
- [ ] **LIFE-05**: CLI command: `memory-daemon admin rebuild-bm25 --min-level day`
- [ ] **LIFE-06**: Config: `[lifecycle.bm25] min_level_after_rollup` controls BM25 retention granularity
- [ ] **LIFE-07**: E2E test: old segments pruned from vector index after lifecycle job runs

### Observability

- [ ] **OBS-01**: `buffer_size` exposed in GetDedupStatus (currently hardcoded 0)
- [ ] **OBS-02**: `deduplicated` field added to IngestEventResponse (deferred proto change from v2.5)
- [ ] **OBS-03**: Dedup threshold hit rate and events_skipped rate exposed via admin RPC
- [ ] **OBS-04**: Ranking metrics (salience distribution, usage decay stats) queryable via admin RPC
- [ ] **OBS-05**: CLI: `memory-daemon status --verbose` shows dedup/ranking health summary

### Episodic Memory

- [ ] **EPIS-01**: Episode struct with episode_id, task, plan, actions, outcome_score, lessons_learned, failure_modes, embedding, created_at
- [ ] **EPIS-02**: Action struct with action_type, input, result, timestamp
- [ ] **EPIS-03**: CF_EPISODES column family in RocksDB for episode storage
- [ ] **EPIS-04**: StartEpisode gRPC RPC creates new episode and returns episode_id
- [ ] **EPIS-05**: RecordAction gRPC RPC appends action to in-progress episode
- [ ] **EPIS-06**: CompleteEpisode gRPC RPC finalizes episode with outcome_score, lessons, failure_modes
- [ ] **EPIS-07**: GetSimilarEpisodes gRPC RPC searches by vector similarity on episode embeddings
- [ ] **EPIS-08**: Value-based retention: episodes scored by distance from 0.65 optimal outcome
- [ ] **EPIS-09**: Retention threshold: episodes with value_score < 0.18 eligible for pruning
- [ ] **EPIS-10**: Configurable via `[episodic]` config section (enabled, value_threshold, max_episodes)
- [ ] **EPIS-11**: E2E test: create episode → complete → search by similarity returns match
- [ ] **EPIS-12**: E2E test: value-based retention correctly identifies low/high value episodes
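
EPIS-08 and EPIS-09 give the optimal outcome (0.65) and the pruning threshold (0.18) but not the scoring function itself. The sketch below assumes one possible function: outcome scores lie in [0, 1], and the value score falls linearly from 1.0 at the optimum to 0.0 at the farthest possible outcome. The linear shape and the function names are assumptions for illustration only.

```rust
const OPTIMAL_OUTCOME: f64 = 0.65; // from EPIS-08
const VALUE_THRESHOLD: f64 = 0.18; // from EPIS-09

/// Assumed scoring: 1.0 at the optimal outcome, decreasing linearly with
/// distance, normalized so the farthest outcome in [0, 1] scores 0.0.
fn value_score(outcome_score: f64) -> f64 {
    let max_distance = OPTIMAL_OUTCOME.max(1.0 - OPTIMAL_OUTCOME);
    let distance = (outcome_score.clamp(0.0, 1.0) - OPTIMAL_OUTCOME).abs();
    1.0 - distance / max_distance
}

/// An episode is eligible for pruning when its value score is below the threshold.
fn prune_eligible(outcome_score: f64) -> bool {
    value_score(outcome_score) < VALUE_THRESHOLD
}

fn main() {
    for outcome in [0.05, 0.65, 1.0] {
        println!(
            "outcome {outcome:.2}: value {:.3}, prune eligible: {}",
            value_score(outcome),
            prune_eligible(outcome)
        );
    }
}
```

Under this assumed shape, near-total failures fall below the threshold while both moderate successes and perfect outcomes are retained; a different curve would move those boundaries.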

## Future Requirements

Deferred to v2.7+. Tracked but not in current roadmap.

### Consolidation

- **CONS-01**: Extract durable knowledge (preferences, constraints, procedures) from recent events
- **CONS-02**: Daily consolidation scheduler job with NLP/LLM pattern extraction
- **CONS-03**: CF_CONSOLIDATED column family for extracted knowledge atoms

### Cross-Project

- **XPROJ-01**: Unified memory queries across multiple project stores
- **XPROJ-02**: Cross-project dedup for shared context

### Agent Scoping

- **SCOPE-01**: Per-agent dedup thresholds (only dedup within same agent's history)
- **SCOPE-02**: Agent-filtered lifecycle policies

### Operational

- **OPS-01**: True daemonization (double-fork on Unix)
- **OPS-02**: API-based summarizer wiring (OpenAI/Anthropic when key present)
- **OPS-03**: Config example file (config.toml.example) shipped with binary

## Out of Scope

| Feature | Reason |
|---------|--------|
| LLM-based episode summarization | Adds latency, hallucination risk, external dependency |
| Automatic memory forgetting/deletion | Violates append-only invariant |
| Real-time outcome feedback loops | Out of scope for v2.6; need agent framework integration |
| Graph-based episode dependencies | Overengineered for initial episode support |
| Per-agent lifecycle scoping | Defer to v2.7 when multi-agent dedup is validated |
| Continuous outcome recording | Adoption killer — complete episodes only |
| Real-time index rebuilds | UX killer — batch via scheduler only |
| Cross-project memory | Requires architectural rethink of per-project isolation |

## Traceability

| Requirement | Phase | Status |
|-------------|-------|--------|
| HYBRID-01 | Phase 39 | Pending |
| HYBRID-02 | Phase 39 | Pending |
| HYBRID-03 | Phase 39 | Pending |
| HYBRID-04 | Phase 39 | Pending |
| RANK-01 | Phase 40 | Pending |
| RANK-02 | Phase 40 | Pending |
| RANK-03 | Phase 40 | Pending |
| RANK-04 | Phase 40 | Pending |
| RANK-05 | Phase 40 | Pending |
| RANK-06 | Phase 40 | Pending |
| RANK-07 | Phase 40 | Pending |
| RANK-08 | Phase 40 | Pending |
| RANK-09 | Phase 40 | Pending |
| RANK-10 | Phase 40 | Pending |
| LIFE-01 | Phase 41 | Pending |
| LIFE-02 | Phase 41 | Pending |
| LIFE-03 | Phase 41 | Pending |
| LIFE-04 | Phase 41 | Pending |
| LIFE-05 | Phase 41 | Pending |
| LIFE-06 | Phase 41 | Pending |
| LIFE-07 | Phase 41 | Pending |
| OBS-01 | Phase 42 | Pending |
| OBS-02 | Phase 42 | Pending |
| OBS-03 | Phase 42 | Pending |
| OBS-04 | Phase 42 | Pending |
| OBS-05 | Phase 42 | Pending |
| EPIS-01 | Phase 43 | Pending |
| EPIS-02 | Phase 43 | Pending |
| EPIS-03 | Phase 43 | Pending |
| EPIS-04 | Phase 44 | Pending |
| EPIS-05 | Phase 44 | Pending |
| EPIS-06 | Phase 44 | Pending |
| EPIS-07 | Phase 44 | Pending |
| EPIS-08 | Phase 44 | Pending |
| EPIS-09 | Phase 44 | Pending |
| EPIS-10 | Phase 44 | Pending |
| EPIS-11 | Phase 44 | Pending |
| EPIS-12 | Phase 44 | Pending |

**Coverage:**
- v2.6 requirements: 38 total
- Mapped to phases: 38
- Unmapped: 0 ✓

---
*Requirements defined: 2026-03-10*
*Last updated: 2026-03-10 after initial definition*
134 changes: 124 additions & 10 deletions .planning/ROADMAP.md
@@ -9,6 +9,7 @@
- ✅ **v2.3 Install & Setup Experience** — Phases 28-29 (shipped 2026-02-12)
- ✅ **v2.4 Headless CLI Testing** — Phases 30-34 (shipped 2026-03-05)
- ✅ **v2.5 Semantic Dedup & Retrieval Quality** — Phases 35-38 (shipped 2026-03-10)
- **v2.6 Retrieval Quality, Lifecycle & Episodic Memory** — Phases 39-44 (in progress)

## Phases

@@ -95,19 +96,129 @@ See: `.planning/milestones/v2.4-ROADMAP.md`
</details>

<details>
<summary>v2.5 Semantic Dedup & Retrieval Quality (Phases 35-38) SHIPPED 2026-03-10</summary>
<summary>v2.5 Semantic Dedup & Retrieval Quality (Phases 35-38) -- SHIPPED 2026-03-10</summary>

- [x] Phase 35: DedupGate Foundation (2/2 plans) completed 2026-03-05
- [x] Phase 36: Ingest Pipeline Wiring (3/3 plans) completed 2026-03-06
- [x] Phase 37: StaleFilter (3/3 plans) completed 2026-03-09
- [x] Phase 38: E2E Validation (3/3 plans) completed 2026-03-10
- [x] Phase 35: DedupGate Foundation (2/2 plans) -- completed 2026-03-05
- [x] Phase 36: Ingest Pipeline Wiring (3/3 plans) -- completed 2026-03-06
- [x] Phase 37: StaleFilter (3/3 plans) -- completed 2026-03-09
- [x] Phase 38: E2E Validation (3/3 plans) -- completed 2026-03-10

See: `.planning/milestones/v2.5-ROADMAP.md`

</details>

### v2.6 Retrieval Quality, Lifecycle & Episodic Memory (In Progress)

**Milestone Goal:** Complete hybrid search wiring, add ranking intelligence with salience and usage decay, automate index lifecycle, expose operational observability metrics, and enable episodic memory for learning from past task outcomes.

- [ ] **Phase 39: BM25 Hybrid Wiring** - Wire BM25 into hybrid search handler and retrieval routing
- [ ] **Phase 40: Salience Scoring + Usage Decay** - Ranking quality with write-time salience and retrieval-time usage decay
- [ ] **Phase 41: Lifecycle Automation** - Scheduled vector pruning and BM25 lifecycle policies
- [ ] **Phase 42: Observability RPCs** - Admin metrics for dedup, ranking, and operational health
- [ ] **Phase 43: Episodic Memory Schema & Storage** - Episode and Action data model with RocksDB column family
- [ ] **Phase 44: Episodic Memory gRPC & Retrieval** - Episode lifecycle RPCs, similarity search, and value-based retention

## Phase Details

### Phase 39: BM25 Hybrid Wiring
**Goal**: Users get combined lexical and semantic search results from a single query, with BM25 serving as fallback when vector index is unavailable
**Depends on**: v2.5 (shipped)
**Requirements**: HYBRID-01, HYBRID-02, HYBRID-03, HYBRID-04
**Success Criteria** (what must be TRUE):
1. A teleport_query returns results that include both BM25 keyword matches and vector similarity matches, fused via RRF scoring
2. When the vector index is unavailable, route_query falls back to BM25-only results instead of returning empty
3. The hybrid search handler reports bm25_available() = true (no longer hardcoded false)
4. An E2E test proves that a query matching content indexed by both BM25 and vector returns combined results from both layers
**Plans**: 2

Plans:
- [ ] 39-01: Wire BM25 into HybridSearchHandler and retrieval routing
- [ ] 39-02: E2E hybrid search test

### Phase 40: Salience Scoring + Usage Decay
**Goal**: Retrieval results are ranked by a composed formula that rewards high-salience content, penalizes overused results, and composes cleanly with existing stale filtering
**Depends on**: Phase 39
**Requirements**: RANK-01, RANK-02, RANK-03, RANK-04, RANK-05, RANK-06, RANK-07, RANK-08, RANK-09, RANK-10
**Success Criteria** (what must be TRUE):
1. TOC nodes and Grips have salience scores calculated at write time based on length density, kind boost, and pinned boost
2. Retrieval results for pinned or high-salience items consistently rank higher than low-salience items of similar similarity
3. Frequently accessed results receive a usage decay penalty so that fresh results surface above stale, over-accessed ones
4. The combined ranking formula (similarity * salience_factor * usage_penalty) composes with StaleFilter without collapsing scores below the min_confidence threshold
5. Salience weights and usage decay parameters are configurable via config.toml sections
**Plans**: 3

Plans:
- [ ] 40-01: Salience scoring at write time
- [ ] 40-02: Usage-based decay in retrieval ranking
- [ ] 40-03: Ranking E2E tests

### Phase 41: Lifecycle Automation
**Goal**: Index sizes are automatically managed through scheduled pruning jobs, preventing unbounded growth of vector and BM25 indexes
**Depends on**: Phase 40
**Requirements**: LIFE-01, LIFE-02, LIFE-03, LIFE-04, LIFE-05, LIFE-06, LIFE-07
**Success Criteria** (what must be TRUE):
1. Old vector index segments are automatically pruned by the scheduler based on configurable segment_retention_days
2. An admin CLI command allows manual vector pruning with --age-days parameter
3. BM25 index can be rebuilt with a --min-level filter that excludes fine-grain segment docs after rollup
4. An admin CLI command allows manual BM25 rebuild with level filtering
5. An E2E test proves that old segments are removed from the vector index after a lifecycle job runs
**Plans**: 2

Plans:
- [ ] 41-01: Vector pruning wiring + CLI command
- [ ] 41-02: BM25 lifecycle policy + E2E test

### Phase 42: Observability RPCs
**Goal**: Operators can inspect dedup, ranking, and system health metrics through admin RPCs and CLI, enabling production monitoring and debugging
**Depends on**: Phase 40
**Requirements**: OBS-01, OBS-02, OBS-03, OBS-04, OBS-05
**Success Criteria** (what must be TRUE):
1. GetDedupStatus returns the actual InFlightBuffer size and dedup hit rate (no longer hardcoded 0)
2. IngestEventResponse includes a deduplicated boolean field indicating whether the event was a duplicate
3. Ranking metrics (salience distribution, usage decay stats) are queryable via admin RPC
4. `memory-daemon status --verbose` prints a human-readable summary of dedup and ranking health
**Plans**: 2

Plans:
- [ ] 42-01: Dedup observability — buffer size + deduplicated field
- [ ] 42-02: Ranking metrics + verbose status CLI

### Phase 43: Episodic Memory Schema & Storage
**Goal**: The system has a persistent, queryable storage layer for task episodes with structured actions and outcomes
**Depends on**: v2.5 (shipped) — independent of Phases 39-42
**Requirements**: EPIS-01, EPIS-02, EPIS-03
**Success Criteria** (what must be TRUE):
1. Episode struct exists with episode_id, task, plan, actions, outcome_score, lessons_learned, failure_modes, embedding, and created_at fields
2. Action struct exists with action_type, input, result, and timestamp fields
3. CF_EPISODES column family is registered in RocksDB and episodes can be stored and retrieved by ID
**Plans**: 1

Plans:
- [ ] 43-01: Episode schema, storage, and column family
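
The data model in the Phase 43 success criteria could look roughly like the following. The field names come from EPIS-01 and EPIS-02; every field type, the `ActionResult` variants, and the helper methods are assumptions made for illustration, and persistence into CF_EPISODES is left out.

```rust
/// Outcome of a single action. The variants are assumed, not taken from the source.
#[derive(Debug, Clone, PartialEq)]
enum ActionResult {
    Success(String),
    Failure(String),
}

/// Fields listed in EPIS-02; types are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct Action {
    action_type: String,
    input: String,
    result: ActionResult,
    timestamp: u64, // assumed: milliseconds since the Unix epoch
}

/// Fields listed in EPIS-01; types are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct Episode {
    episode_id: String,
    task: String,
    plan: Vec<String>,
    actions: Vec<Action>,
    outcome_score: Option<f32>, // None while the episode is in progress
    lessons_learned: Vec<String>,
    failure_modes: Vec<String>,
    embedding: Option<Vec<f32>>,
    created_at: u64,
}

impl Episode {
    /// Create an in-progress episode with no actions and no outcome.
    fn new(episode_id: &str, task: &str, created_at: u64) -> Self {
        Episode {
            episode_id: episode_id.to_string(),
            task: task.to_string(),
            plan: Vec::new(),
            actions: Vec::new(),
            outcome_score: None,
            lessons_learned: Vec::new(),
            failure_modes: Vec::new(),
            embedding: None,
            created_at,
        }
    }

    /// Append an action to the episode.
    fn record_action(&mut self, action: Action) {
        self.actions.push(action);
    }

    /// Finalize the episode with an outcome score and lessons learned.
    fn complete(&mut self, outcome_score: f32, lessons: Vec<String>) {
        self.outcome_score = Some(outcome_score);
        self.lessons_learned = lessons;
    }
}

fn main() {
    let mut episode = Episode::new("ep-1", "fix failing doc build", 1_000);
    episode.record_action(Action {
        action_type: "edit".to_string(),
        input: "escape generic in doc comment".to_string(),
        result: ActionResult::Success("doc build passes".to_string()),
        timestamp: 1_500,
    });
    episode.complete(0.7, vec!["escape angle brackets in rustdoc".to_string()]);
    println!("{episode:#?}");
}
```

Modeling `outcome_score` as an `Option` is one way to distinguish in-progress episodes from completed ones without a separate status field.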

### Phase 44: Episodic Memory gRPC & Retrieval
**Goal**: Agents can record task outcomes as episodes, search for similar past episodes by vector similarity, and the system retains episodes based on their learning value
**Depends on**: Phase 43
**Requirements**: EPIS-04, EPIS-05, EPIS-06, EPIS-07, EPIS-08, EPIS-09, EPIS-10, EPIS-11, EPIS-12
**Success Criteria** (what must be TRUE):
1. An agent can start an episode, record actions during execution, and complete it with an outcome score and lessons learned
2. GetSimilarEpisodes returns past episodes ranked by vector similarity to a query embedding, enabling "we solved this before" retrieval
3. Value-based retention scores episodes by distance from the 0.65 optimal outcome, and episodes below the retention threshold are eligible for pruning
4. Episodic memory is configurable via [episodic] config section (enabled flag, value_threshold, max_episodes)
5. E2E tests prove the full episode lifecycle (create, record, complete, search) and value-based retention scoring
**Plans**: 3

Plans:
- [ ] 44-01: Episode gRPC proto definitions and handler
- [ ] 44-02: Similar episode search and value-based retention
- [ ] 44-03: Episodic memory E2E tests

## Progress

**Execution Order:**
Phases execute in numeric order: 39 → 40 → 41 → 42 → 43 → 44
Note: Phases 43-44 (Episodic Memory) are independent of 39-42 and could be parallelized.

| Phase | Milestone | Plans | Status | Completed |
|-------|-----------|-------|--------|-----------|
| 1-9 | v1.0 | 20/20 | Complete | 2026-01-30 |
@@ -116,11 +227,14 @@ See: `.planning/milestones/v2.5-ROADMAP.md`
| 24-27 | v2.2 | 10/10 | Complete | 2026-02-11 |
| 28-29 | v2.3 | 2/2 | Complete | 2026-02-12 |
| 30-34 | v2.4 | 15/15 | Complete | 2026-03-05 |
| 35 | v2.5 | 2/2 | Complete | 2026-03-05 |
| 36 | v2.5 | 3/3 | Complete | 2026-03-06 |
| 37 | v2.5 | 3/3 | Complete | 2026-03-09 |
| 38 | v2.5 | 3/3 | Complete | 2026-03-10 |
| 35-38 | v2.5 | 11/11 | Complete | 2026-03-10 |
| 39. BM25 Hybrid Wiring | v2.6 | 0/2 | Planned | - |
| 40. Salience + Usage Decay | v2.6 | 0/3 | Planned | - |
| 41. Lifecycle Automation | v2.6 | 0/2 | Planned | - |
| 42. Observability RPCs | v2.6 | 0/2 | Planned | - |
| 43. Episodic Schema & Storage | v2.6 | 0/1 | Planned | - |
| 44. Episodic gRPC & Retrieval | v2.6 | 0/3 | Planned | - |

---

*Updated: 2026-03-10 after v2.5 milestone shipped*
*Updated: 2026-03-11 after v2.6 roadmap created*