Problem
Issues #71–#77 each solve a piece of the refactor, but no single issue owns the integrated end-to-end user experience. Each issue could be "done" in isolation without the full flow actually working.
This meta-issue defines the concrete acceptance gate: a brand-new developer can go from `git clone` to watching their agent play a scenario in under 10 minutes, with zero Godot knowledge.
The Target Experience
```bash
# Step 1: Clone and enter a starter (2 min)
git clone https://github.com/JustInternetAI/AgentArena.git
cd AgentArena/starters/beginner

# Step 2: Install dependencies (2 min)
pip install -r requirements.txt

# Step 3: Run (10 seconds)
python run.py --scenario foraging

# What happens:
# - Python agent server starts on port 5000
# - Game window launches automatically (compiled executable)
# - Foraging scenario loads
# - Agent starts making decisions and moving
# - User watches their agent in the game window
# - Terminal shows decision log
```

No Godot editor. No manual scene selection. No pressing SPACE. No configuring ports.
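The single-command flow implies that `run.py` parses a few flags, starts the game as a child process, and serves the agent in the foreground. The sketch below assumes that shape; `build_parser`, `launch_and_supervise`, and the `serve` callback are hypothetical names rather than the repository's actual code, and the defaults (`--episodes 1`, `--port 5000`) are inferred from this issue. The important part is the `finally` clause: however the server stops (normal completion or Ctrl+C), the game process is terminated, then killed if it has not exited within 5 seconds.

```python
import argparse
import subprocess
import sys
from typing import Callable, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI mirroring the commands in this issue."""
    parser = argparse.ArgumentParser(description="Run an Agent Arena scenario")
    parser.add_argument("--scenario", required=True, help="e.g. foraging")
    parser.add_argument("--episodes", type=int, default=1)
    parser.add_argument("--port", type=int, default=5000)
    return parser


def launch_and_supervise(game_cmd: Sequence[str], serve: Callable[[], None]) -> int:
    """Start the game, run the agent server in the foreground, always stop the game.

    `serve` is a hypothetical callback that blocks until all episodes finish.
    Returns 0 on normal completion and 130 (the usual SIGINT exit code) on Ctrl+C.
    """
    game = subprocess.Popen(list(game_cmd))
    try:
        serve()
        return 0
    except KeyboardInterrupt:
        print("Interrupted, shutting down...", file=sys.stderr)
        return 130
    finally:
        if game.poll() is None:  # game still running: ask it to exit
            game.terminate()
            try:
                game.wait(timeout=5)
            except subprocess.TimeoutExpired:
                game.kill()  # did not exit in time: force it
                game.wait()
```

In a real launcher, `game_cmd` would be the path to the compiled executable plus whatever arguments select the scenario; how the executable receives the scenario name is not specified here.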
Smoke Test (Integration Gate)
Before any release, this test must pass:

A LangGraph agent plays 3 episodes of foraging, scoring >50 on episode 3, launched with a single `python run.py --scenario foraging --episodes 3` command, with zero manual intervention.
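The gate can be automated by running the exact command unattended and checking the scores it reports. This is a minimal sketch that assumes `run.py` prints one line per episode in the form `Episode N score: X`; that log format, the 600-second timeout, and the helper names are all assumptions for illustration.

```python
import re
import subprocess
import sys
from typing import List

# Hypothetical log line format; adjust to whatever run.py actually prints.
SCORE_LINE = re.compile(r"Episode (\d+) score: (-?\d+(?:\.\d+)?)")


def parse_scores(log_text: str) -> List[float]:
    """Extract per-episode scores from the log, ordered by episode number."""
    pairs = sorted((int(ep), float(score)) for ep, score in SCORE_LINE.findall(log_text))
    return [score for _, score in pairs]


def smoke_gate_passed(scores: List[float], episodes: int = 3, threshold: float = 50.0) -> bool:
    """Exactly `episodes` episodes ran and the last one scored above `threshold`."""
    return len(scores) == episodes and scores[-1] > threshold


def run_smoke_test(timeout_s: float = 600.0) -> bool:
    """Run the single command with no interaction and apply the gate to its output."""
    result = subprocess.run(
        [sys.executable, "run.py", "--scenario", "foraging", "--episodes", "3"],
        capture_output=True,
        text=True,
        timeout=timeout_s,  # assumed upper bound for three episodes
    )
    return result.returncode == 0 and smoke_gate_passed(parse_scores(result.stdout))
```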
This forces the following to work together:
- Compiled executable auto-launches (#77)
- Tool completion callbacks work (#71)
- Framework adapter runs the agent (#74)
- Episode lifecycle restarts between episodes (#78-ish: Consolidate SDK, single arena API, single IPC server)
- Persistent memory improves performance across episodes (#76)
Checklist (Cross-Issue Integration)
Installation
- `pip install -r requirements.txt` installs the SDK and all dependencies
- No manual path manipulation or `sys.path` hacks needed
- Game executable is pre-built and available (download or included)
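One way to verify the installation items is a preflight import check run from the starter directory with no `sys.path` edits: if every required package resolves through `importlib.util.find_spec`, the install did its job. The import name `agent_arena_sdk` below is a hypothetical placeholder for whatever the SDK's real import name is.

```python
import importlib.util
from typing import Iterable, List

# Hypothetical import names; replace with the starter's real requirements.
REQUIRED_MODULES = ("agent_arena_sdk",)


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on the current sys.path.

    Only pass top-level names: find_spec on a dotted name imports the parent
    package first and raises ModuleNotFoundError if that parent is missing.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


def require_installed(names: Iterable[str] = REQUIRED_MODULES) -> None:
    """Exit with the user-facing hint from the error-handling checklist."""
    missing = missing_modules(names)
    if missing:
        raise SystemExit(
            f"Missing packages: {', '.join(missing)}. "
            "Run pip install -r requirements.txt first."
        )
```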
Single Command Launch
- `python run.py --scenario foraging` starts everything
- Game window appears within 5 seconds
- Agent connects and starts moving within 10 seconds
- No user interaction needed after the command
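The 5-second and 10-second targets are easiest to enforce in the smoke test with a polling helper that fails loudly when a deadline is missed. This is one assumed approach: `wait_until` polls any predicate, and `port_accepting` is one example predicate (a TCP connect to the agent server's port). It only shows that the server is listening; detecting that the game window appeared or that the agent is moving would need signals from the game or server that this issue does not specify, and a bare probe connection may show up in the server's logs as a client.

```python
import socket
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout_s: float, interval_s: float = 0.1) -> float:
    """Poll `predicate` until it returns True; return elapsed seconds or raise TimeoutError."""
    start = time.monotonic()
    while True:
        if predicate():
            return time.monotonic() - start
        if time.monotonic() - start >= timeout_s:
            raise TimeoutError(f"Condition not met within {timeout_s:.1f}s")
        time.sleep(interval_s)


def port_accepting(port: int, host: str = "127.0.0.1") -> bool:
    """Example predicate: True if something is accepting TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=0.5):
            return True
    except OSError:
        return False
```

For example, `wait_until(lambda: port_accepting(5000), timeout_s=10)` would fail the test if the server were not listening within 10 seconds.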
Visible Feedback
- Terminal shows: "Agent connected", "Episode 1 started", decision logs
- Game window shows: agent moving, collecting resources, score updating
- On episode end: terminal shows score summary
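For the end-of-episode summary, a small formatter keeps the terminal output consistent and easy for a smoke test to parse. The line format below is an assumption, not the project's actual output.

```python
from typing import Sequence


def format_score_summary(scores: Sequence[float]) -> str:
    """Render one line per episode plus a best/last line (hypothetical format)."""
    lines = [f"Episode {i} score: {score:.1f}" for i, score in enumerate(scores, start=1)]
    if scores:
        lines.append(f"Best: {max(scores):.1f}  Last: {scores[-1]:.1f}")
    return "\n".join(lines)
```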
Error Handling
- Clear error if game executable not found ("Download from: ...")
- Clear error if port in use ("Port 5000 already in use, try --port 5001")
- Clear error if dependencies missing ("pip install -r requirements.txt first")
- Graceful shutdown on Ctrl+C (kills both Python server and game window)
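The first two error messages can come from preflight checks that run before anything is launched. This is a sketch under assumptions: `PreflightError`, `check_port_free`, and `check_executable` are hypothetical names, and `DOWNLOAD_URL` is a placeholder because the issue leaves the download location unspecified. The port probe is best-effort: it tries to bind the port itself, so it should use the same host the agent server will bind, and another process could still grab the port between the check and the server start.

```python
import socket
from pathlib import Path

# Placeholder: the issue does not give the actual download location.
DOWNLOAD_URL = "<release download page>"


class PreflightError(RuntimeError):
    """Raised with a user-facing message when the environment is not ready to launch."""


def check_port_free(port: int, host: str = "127.0.0.1") -> None:
    """Fail with the checklist's message if `port` cannot be bound on `host`."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            raise PreflightError(
                f"Port {port} already in use, try --port {port + 1}"
            ) from None


def check_executable(path: Path) -> None:
    """Fail with a download hint if the compiled game executable is missing."""
    if not path.is_file():
        raise PreflightError(
            f"Game executable not found at {path}. Download from: {DOWNLOAD_URL}"
        )
```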
Documentation
- Root README has 5-line quickstart matching the target experience above
- Each starter README matches (beginner, intermediate, langchain, claude-sdk)
- Troubleshooting section covers top 5 failure modes
Issues That Contribute
| Issue | What it provides |
|---|---|
| #77 | Compiled executable, auto-launch, scenario selection |
| #71 | Tool completion callbacks (agent sees results) |
| #72 | Mock testing (develop without Godot) |
| #73 | Complete intermediate starter |
| #74 | Framework adapters (LangGraph, Claude SDK) |
| #75 | Game-side inspector |
| #76 | Persistent cross-episode memory |
| SDK consolidation | Single API surface |
| SDK packaging | `pip install` works |
| Episode lifecycle | Auto-restart between episodes |
This Issue Is "Done" When
A fresh machine with Python 3.11 and no prior Agent Arena setup can complete the target experience above. Tested on both Windows and Ubuntu.
Estimated Effort
Not a separate work item: this is the integration test that validates all other issues are truly complete. Roughly half a day to write the automated smoke test, verify it on a clean machine, and update the READMEs.