Summary
Implement persistent cross-episode memory so agents learn and improve from run to run. This is one of Agent Arena's most distinctive features -- no mainstream agent framework provides game-episode-aware persistent learning out of the box.
An agent that dies to fire in Episode 1 should avoid that area in Episode 2. An agent that discovers a berry cluster should go there first next time. An agent that learns "craft torch before exploring" should do that automatically in future episodes.
Why This Matters
Cross-episode learning is:
- Our killer feature: No framework (LangGraph, CrewAI, etc.) provides this. It is uniquely valuable in a simulation environment.
- The most compelling AI learning opportunity: Memory management, pattern recognition, strategy evaluation, and reward attribution are core agent challenges.
- Visually rewarding: Watching an agent visibly improve across episodes is powerful for learners.
- Framework-compatible: Persistent memory is exposed as query tools -- frameworks do not need to know about persistence.
Architecture
How It Fits the Three-Category Model
Cross-episode memory surfaces through query tools. The framework does not need to know or care that results come from a persistent store:
CONTEXT (this tick observation -- current episode only)
"You see 2 berries nearby, health=100, explored 0%"
QUERY TOOLS (now include persistent data from past episodes)
get_episode_summary(count=3) -> "Last run: berry cluster NE, died to fire at center"
query_spatial_memory(pos, radius) -> includes locations from previous episodes with confidence
get_strategy_notes() -> "Craft torch before exploring (confirmed 2x)"
recall_location("workbench") -> "(5, 0, 8) -- seen in 3 previous episodes"
ACTION TOOLS (this tick decision -- unchanged)
move_to(...) | collect(...) | craft_item(...) | explore(...)
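The key property above is that persistence is invisible to the framework: a query tool is just a function that happens to consult both stores. A minimal sketch of that idea (all names and data shapes here are illustrative, not the project's actual API):

```python
# A query tool that transparently blends current-episode and persistent data.
# The framework only ever sees the returned string.

def make_recall_location(current_memory: dict, persistent_memory: dict):
    """Build a recall_location query tool as a closure over both stores."""
    def recall_location(object_type: str) -> str:
        # Prefer what we saw this episode; fall back to past episodes.
        if object_type in current_memory:
            pos = current_memory[object_type]
            return f"{pos} -- seen this episode"
        if object_type in persistent_memory:
            pos, episodes = persistent_memory[object_type]
            return f"{pos} -- seen in {episodes} previous episodes"
        return f"no known location for {object_type!r}"
    return recall_location

current = {"berry": (10, 0, 5)}                      # this episode only
persistent = {"workbench": ((5, 0, 8), 3)}           # survives restarts
recall = make_recall_location(current, persistent)
print(recall("workbench"))  # -> "(5, 0, 8) -- seen in 3 previous episodes"
```

The framework registers `recall_location` like any other tool; nothing in its signature reveals that part of the answer came from disk.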
Per-Episode Lifecycle
Episode starts
|
v
Agent plays (ticks 1..N)
- Current observations via context
- Query tools return current + persistent memory
- SpatialMemory tracks object locations this episode
- EpisodeMemoryManager tracks key events this episode
|
v
Episode ends (agent dies, time runs out, objective met)
|
v
Post-Episode Processing
1. Generate episode summary (score, key events, decisions)
2. Merge spatial knowledge into persistent store (with confidence)
3. Evaluate strategies (which decisions correlated with good outcomes?)
4. Save to persistent store
|
v
Next episode starts with richer memory
Persistent Store
Simple file-based storage per agent (no database needed at this scale):
persistent_memory/
agent_001/
episodes.json # Episode summaries with scores and key events
spatial_knowledge.json # Aggregated object locations with confidence scores
strategies.json # Learned strategies with confirmation counts
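One plausible on-disk shape for `spatial_knowledge.json`, shown only to make the confidence-score idea concrete (field names are illustrative, not a fixed schema):

```json
{
  "berry": [
    {"pos": [10, 0, 5], "episodes_seen": [3, 4, 5], "confidence": 0.8}
  ],
  "fire": [
    {"pos": [5, 0, 5], "episodes_seen": [1, 3], "confidence": 0.6}
  ]
}
```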
Three Layers of Learning (Implementation Phases)
Layer 1: Factual Memory (what happened)
Store raw observations and events from past episodes.
- Episode summaries: "Collected 4 berries, took 15 damage, score 62"
- Object locations: "Berries seen at (10,0,5), (12,0,3) in Episode 3"
- Events: "Took fire damage at (5,0,5) on tick 23"
AI concepts learned: Memory storage, retrieval, recency weighting
Layer 2: Pattern Recognition (what recurs)
Aggregate facts across episodes to identify reliable patterns.
- "Berries consistently spawn in NE quadrant (seen 4/5 episodes, high confidence)"
- "Fire always appears near center (seen 3/5 episodes)"
- Confidence scores decay if patterns are not confirmed in recent episodes
AI concepts learned: Statistical aggregation, confidence scoring, belief updating
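The decay rule above can be sketched with a simple exponential model -- confirmations pull confidence toward 1.0, silence decays it toward 0. The constants are placeholders, not tuned values:

```python
# Illustrative Layer 2 belief update (boost/decay constants are assumptions).

def update_confidence(confidence: float, confirmed: bool,
                      boost: float = 0.3, decay: float = 0.8) -> float:
    """Boost confidence when a pattern is re-confirmed this episode;
    otherwise decay it."""
    if confirmed:
        return confidence + boost * (1.0 - confidence)
    return confidence * decay

# A pattern confirmed in 3 straight episodes, then missing for 2:
c = 0.0
for confirmed in [True, True, True, False, False]:
    c = update_confidence(c, confirmed)
print(round(c, 3))  # climbs to ~0.66, then decays to ~0.42
```

This keeps beliefs honest: a berry cluster not seen for several episodes gradually stops dominating the agent's plans.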
Layer 3: Strategy Learning (what works)
Track which decisions led to good outcomes across episodes.
- "Episodes where I crafted torch first scored 40% higher on average"
- "Exploring before collecting leads to better resource discovery"
- "Approaching fire from the east is safer (less damage taken)"
AI concepts learned: Reward attribution, strategy evaluation, counterfactual reasoning
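A claim like "torch first scored 40% higher" falls out of a straightforward comparison of average scores with and without a decision. A minimal sketch, assuming episode records shaped like the summaries in Layer 1 (field names are illustrative):

```python
# Sketch of Layer 3 reward attribution: compare mean score with and
# without a given decision across past episodes.

from statistics import mean

def evaluate_decision(episodes: list[dict], decision: str) -> str:
    with_d = [e["score"] for e in episodes if decision in e["decisions"]]
    without = [e["score"] for e in episodes if decision not in e["decisions"]]
    if not with_d or not without:
        return f"not enough data for {decision!r}"
    lift = (mean(with_d) - mean(without)) / mean(without) * 100
    return (f"Episodes where I did {decision!r} scored "
            f"{lift:+.0f}% on average ({len(with_d)} episodes)")

episodes = [
    {"score": 25, "decisions": []},
    {"score": 45, "decisions": []},
    {"score": 62, "decisions": ["torch_first"]},
    {"score": 78, "decisions": ["torch_first"]},
]
print(evaluate_decision(episodes, "torch_first"))
```

A real implementation would also want a minimum sample size before trusting a correlation; with two episodes per arm, this is suggestive, not proof.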
Existing Code to Build On
| Component | Status | What is Needed |
|---|---|---|
| EpisodeMemoryManager | Exists -- tracks within-episode events, detects boundaries | Add persistence to disk between episodes |
| SpatialMemory | Exists -- grid-based index with staleness tracking | Add cross-episode merge with confidence decay |
| Episode summaries | Exists in EpisodeMemoryManager | Add post-episode summary generation and storage |
| Strategy notes | Does not exist | New: track decision-outcome correlations |
| Persistent store | Does not exist | New: JSON file per agent that survives restarts |
| Confidence scoring | Does not exist | New: track how many episodes confirm a pattern |
Implementation Plan
Phase 1: Persistence Layer (2 days)
- Create PersistentMemoryStore class (JSON file read/write per agent)
- Add episode boundary hooks: on_episode_start() loads, on_episode_end() saves
- Store episode summaries with score, key events, tick count, timestamp
- Expose get_episode_summary(count=N) as query tool returning past N episodes
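A minimal sketch of the Phase 1 pieces together -- the file layout follows the `persistent_memory/<agent_id>/episodes.json` scheme above, but method bodies and field names beyond `on_episode_start`/`on_episode_end`/`get_episode_summary` are assumptions:

```python
# Phase 1 persistence layer sketch: JSON file per agent, loaded at episode
# start, saved at episode end, surfaced through a query tool.

import json
from pathlib import Path

class PersistentMemoryStore:
    def __init__(self, root: str, agent_id: str):
        self.dir = Path(root) / agent_id
        self.dir.mkdir(parents=True, exist_ok=True)
        self.episodes: list[dict] = []

    def on_episode_start(self) -> None:
        # Load whatever survived the last run (or process restart).
        path = self.dir / "episodes.json"
        self.episodes = json.loads(path.read_text()) if path.exists() else []

    def on_episode_end(self, summary: dict) -> None:
        self.episodes.append(summary)
        (self.dir / "episodes.json").write_text(
            json.dumps(self.episodes, indent=2))

    def get_episode_summary(self, count: int = 3) -> list[dict]:
        # Query tool: the framework sees plain data, not the file behind it.
        return self.episodes[-count:]

store = PersistentMemoryStore("persistent_memory", "agent_001")
store.on_episode_start()
store.on_episode_end({"score": 62, "key_events": ["died to fire"], "ticks": 120})
print(store.get_episode_summary(count=1))
```

Because the store is re-read on every `on_episode_start()`, a fresh process picks up exactly where the previous one left off, which is the "survives restarts" requirement from the table above.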
Phase 2: Spatial Persistence (2 days)
- On episode end, merge current SpatialMemory into persistent spatial store
- Add confidence scores: objects seen in multiple episodes get higher confidence
- Add recency decay: old observations lose confidence over time
- query_spatial_memory() returns both current and persistent results (flagged appropriately)
- recall_location() searches persistent store for named object types
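The merge step can be sketched as one pass over the union of both stores: locations re-seen this episode gain confidence, everything else decays. Data shapes and constants here are illustrative, not the real SpatialMemory format:

```python
# Phase 2 sketch: fold this episode's sightings into the persistent
# spatial store with confidence boost/decay (constants are assumptions).

def merge_spatial(persistent: dict, episode_sightings: dict,
                  boost: float = 0.3, decay: float = 0.9) -> dict:
    merged = {}
    for key in set(persistent) | set(episode_sightings):
        conf = persistent.get(key, 0.0)
        if key in episode_sightings:
            conf = conf + boost * (1.0 - conf)   # re-confirmed this episode
        else:
            conf = conf * decay                  # not seen: recency decay
        merged[key] = round(conf, 3)
    return merged

# key = (object_type, position) for illustration
persistent = {("berry", (10, 0, 5)): 0.51, ("fire", (5, 0, 5)): 0.3}
seen_now = {("berry", (10, 0, 5)): True}
print(merge_spatial(persistent, seen_now))
```

Flagging results by origin then becomes trivial: anything in `episode_sightings` is "current", everything else is "remembered with confidence X".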
Phase 3: Strategy Learning with RAG (3-4 days)
Phases 1 and 2 use structured JSON storage because the data is well-defined (episode summaries, object coordinates, scores). Phase 3 is where RAG becomes the right tool -- strategy insights are unstructured, natural language observations that accumulate over many episodes:
- "When I explored north before crafting, I found more resources but took more damage"
- "Crafting a torch helped in episodes with fire but was wasted effort in episodes without"
- "Approaching the berry cluster from the east avoided the fire hazard"
These do not fit neatly into JSON. A question like "how do I craft a torch?" or "what should I do when health is low near fire?" needs semantic search across many past experiences.
Implementation approach: Ingest tick-by-tick experience logs and strategy notes into a RAG store (we already have Milvus + nomic-embed-text on the Ubuntu server). Expose via a query tool:
ask_experience("how do I craft a torch?")
-> "Need 2 wood. Must be near a workbench. Crafted successfully in Ep3 tick 45.
Tip: gather wood first, workbench is usually near center of map."
ask_experience("what should I do when health is low?")
-> "Avoid fire (center area). Move to NE corner where berries are safe.
In Ep2, fleeing north when health dropped below 50 prevented death."
- Log tick-by-tick experience data (decisions, outcomes, context) during episodes
- Post-episode: ingest experience logs into RAG store (Milvus via existing infrastructure)
- Expose ask_experience(question) as query tool -- semantic search over past experiences
- Track key decisions per episode (first action, crafting order, exploration strategy)
- Correlate decisions with episode outcomes (score, damage, resources, survival time)
- Generate strategy notes: "When I did X, outcome was Y (N episodes)"
- Expose get_strategy_notes() as query tool (structured) alongside ask_experience() (RAG)
- Add confirmation counting: strategies confirmed across episodes gain weight
Note: Phases 1-2 (structured JSON) should be completed first. They solve 80% of cross-episode learning with 20% of the effort. RAG in Phase 3 handles the long tail of unstructured experiential knowledge that grows over many episodes.
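The `ask_experience` flow can be prototyped without the vector infrastructure. In the sketch below, a bag-of-words overlap score stands in for real embedding similarity purely so the ingest-then-query loop is runnable end to end; the actual implementation would embed with nomic-embed-text and search Milvus instead:

```python
# Toy stand-in for the Phase 3 RAG flow. Word-overlap scoring here is a
# placeholder for vector similarity (nomic-embed-text + Milvus in the
# real pipeline).

import re

def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

class ExperienceStore:
    def __init__(self):
        self.notes: list[str] = []

    def ingest(self, note: str) -> None:
        # Post-episode: add a strategy note / experience log entry.
        self.notes.append(note)

    def ask_experience(self, question: str, top_k: int = 1) -> list[str]:
        # Return the top_k most relevant past experiences.
        q = tokenize(question)
        scored = sorted(self.notes,
                        key=lambda n: len(q & tokenize(n)), reverse=True)
        return scored[:top_k]

store = ExperienceStore()
store.ingest("Crafted torch in Ep3 tick 45: need 2 wood, be near workbench")
store.ingest("Fleeing north when health dropped below 50 prevented death in Ep2")
print(store.ask_experience("how do I craft a torch?"))
```

Swapping the scoring function for embedding-distance search is the only structural change needed to move this onto the existing Milvus setup.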
Phase 4: Integration with Framework Starters (1-2 days)
- LangGraph starter demonstrates cross-episode learning
- Tutorial explains: memory management, when to trust old data, confidence thresholds
- Show measurable improvement: graph score across episodes
Example: Agent Improving Over 5 Episodes
Episode 1 (Score: 25)
- Wanders randomly, finds 2 berries, walks into fire, dies
- Saves: fire location, berry locations, "died to fire" event
Episode 2 (Score: 45)
- Remembers fire location, avoids center
- Finds berry cluster in NE corner
- Saves: NE berry cluster (confidence: 1 episode), fire avoidance strategy
Episode 3 (Score: 62)
- Goes directly to NE corner (remembered from Ep2)
- Berry cluster confirmed (confidence: 2 episodes)
- Discovers workbench, crafts torch
- Saves: torch crafting improved exploration
Episode 4 (Score: 78)
- Crafts torch first (strategy from Ep3)
- Heads to NE corner (high confidence)
- Explores more safely with torch
- Strategy confirmed: "torch first" now has 2 confirmations
Episode 5 (Score: 91)
- Optimal opening: craft torch -> head NE -> collect berries -> explore south
- All strategies have high confidence
- Agent is now consistently performing well
Dependencies
- #71 (Add tool completion callbacks from Godot to Python) -- needed to track decision outcomes
- #74 (Add framework adapter system for LangGraph, Claude Agent SDK, and other agent frameworks) -- persistent memory exposed as query tools
- Existing EpisodeMemoryManager and SpatialMemory as foundation
Success Criteria
- Agent measurably improves over 5+ episodes on foraging scenario
- Persistent memory survives process restarts
- Query tools return cross-episode data transparently
- At least one starter demonstrates and teaches cross-episode learning
- Learner can inspect what the agent "learned" via memory files or inspector