End-to-end onboarding gate: git clone to running agent in 10 minutes #82

## Problem

Issues #71–#77 each solve a piece of the refactor, but no single issue owns the **integrated end-to-end user experience**. Each one could be marked "done" in isolation while the full flow still does not work.

This meta-issue defines the concrete acceptance gate: a brand-new developer can go from `git clone` to watching their agent play a scenario in under 10 minutes, with zero Godot knowledge.

## The Target Experience

```bash
# Step 1: Clone and enter a starter (2 min)
git clone https://github.com/JustInternetAI/AgentArena.git
cd AgentArena/starters/beginner

# Step 2: Install dependencies (2 min)
pip install -r requirements.txt

# Step 3: Run (10 seconds)
python run.py --scenario foraging

# What happens:
# - Python agent server starts on port 5000
# - Game window launches automatically (compiled executable)
# - Foraging scenario loads
# - Agent starts making decisions and moving
# - User watches their agent in the game window
# - Terminal shows decision log
```

No Godot editor. No manual scene selection. No pressing SPACE. No configuring ports.
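As a rough illustration of the command-line surface this implies, here is a minimal sketch of an argument parser for `run.py`. Only the flag names `--scenario`, `--episodes`, and `--port`, and the default port 5000, come from this issue. The function name `build_parser`, making `--scenario` required, and the default of one episode are assumptions for the sake of the example, not the actual implementation.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI for run.py; only the flag names come from this issue."""
    parser = argparse.ArgumentParser(
        prog="run.py",
        description="Start the agent server and launch the game (sketch).",
    )
    # Assumption: the scenario must always be given explicitly.
    parser.add_argument("--scenario", required=True, help="Scenario to load, e.g. foraging")
    # Assumption: a single episode when --episodes is omitted.
    parser.add_argument("--episodes", type=int, default=1, help="Number of episodes to play")
    # Port 5000 is the default named in the target experience.
    parser.add_argument("--port", type=int, default=5000, help="Agent server port")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--scenario", "foraging", "--episodes", "3"])
print(f"scenario={args.scenario} episodes={args.episodes} port={args.port}")
```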

## Smoke Test (Integration Gate)

Before any release, this test must pass:

> **A LangGraph agent plays 3 episodes of foraging, scoring >50 on episode 3, launched with a single `python run.py --scenario foraging --episodes 3` command, with zero manual intervention.**

This forces the following to work together:
- Compiled executable auto-launches (#77)
- Tool completion callbacks work (#71)
- Framework adapter runs the agent (#74)
- Episode lifecycle restarts between episodes (episode lifecycle issue, number unconfirmed, roughly #78)
- Persistent memory improves performance across episodes (#76)
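The pass/fail part of this gate could be checked along the following lines. This is only a sketch: the issue does not define the terminal log format, so the `Episode N finished: score=X` line, the names `parse_episode_scores` and `gate_passes`, and the sample log are all hypothetical. Only the numbers (3 episodes, score above 50 on the last one) come from the gate statement.

```python
import re

# Assumed log line format; the real decision log may look different.
SCORE_LINE = re.compile(r"Episode (\d+) finished: score=(-?\d+(?:\.\d+)?)")


def parse_episode_scores(log_text: str) -> dict[int, float]:
    """Map episode number to final score for every score line found."""
    return {int(m.group(1)): float(m.group(2)) for m in SCORE_LINE.finditer(log_text)}


def gate_passes(scores: dict[int, float], episodes: int = 3, threshold: float = 50.0) -> bool:
    """True if every episode reported a score and the last one is strictly above threshold."""
    if sorted(scores) != list(range(1, episodes + 1)):
        return False  # an episode is missing, so the lifecycle did not complete
    return scores[episodes] > threshold


# Hypothetical terminal output from a three-episode run.
sample_log = """\
Agent connected
Episode 1 started
Episode 1 finished: score=12
Episode 2 started
Episode 2 finished: score=37.5
Episode 3 started
Episode 3 finished: score=61
"""

scores = parse_episode_scores(sample_log)
print(scores, gate_passes(scores))
```

In an automated test, the log text would presumably come from running the single command with `subprocess.run(..., capture_output=True, text=True)` and a generous timeout, rather than from a hard-coded string.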

## Checklist (Cross-Issue Integration)

### Installation
- [ ] `pip install -r requirements.txt` installs SDK + all dependencies
- [ ] No manual path manipulation or sys.path hacks needed
- [ ] Game executable is pre-built and available (download or included)

### Single Command Launch
- [ ] `python run.py --scenario foraging` starts everything
- [ ] Game window appears within 5 seconds
- [ ] Agent connects and starts moving within 10 seconds
- [ ] No user interaction needed after the command
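The timing items above amount to "poll for a condition until a deadline". A minimal sketch of that pattern follows. The helpers `wait_until` and `port_is_open` are hypothetical names, and a TCP connection to the server port is only an assumed stand-in for "agent connects"; verifying that the game window appeared would need a different check. The demo listens on a throwaway local port so it runs without the game.

```python
import socket
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float, interval: float = 0.05) -> bool:
    """Poll condition until it returns True or timeout seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def port_is_open(port: int, host: str = "127.0.0.1") -> bool:
    """True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=0.25):
            return True
    except OSError:
        return False


# Demo: a local listener stands in for the agent server.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen()
demo_port = listener.getsockname()[1]

# 10 seconds mirrors the "agent connects within 10 seconds" budget.
connected = wait_until(lambda: port_is_open(demo_port), timeout=10.0)
listener.close()
print("connected within budget:", connected)
```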

### Visible Feedback
- [ ] Terminal shows: "Agent connected", "Episode 1 started", decision logs
- [ ] Game window shows: agent moving, collecting resources, score updating
- [ ] On episode end: terminal shows score summary

### Error Handling
- [ ] Clear error if game executable not found ("Download from: ...")
- [ ] Clear error if port in use ("Port 5000 already in use, try --port 5001")
- [ ] Clear error if dependencies missing ("pip install -r requirements.txt first")
- [ ] Graceful shutdown on Ctrl+C (kills both Python server and game window)
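Two of these items, the port-in-use message and the Ctrl+C shutdown, can be sketched with the standard library alone. The names `LaunchError`, `ensure_port_free`, and `run_with_cleanup` are hypothetical, and the sketch makes simplifying assumptions: any bind failure is reported as "in use", and the game is represented by a child Python process that just sleeps.

```python
import socket
import subprocess
import sys
from typing import Callable, Optional


class LaunchError(RuntimeError):
    """Raised with a user-facing message when startup cannot proceed."""


def ensure_port_free(port: int, host: str = "127.0.0.1") -> None:
    """Raise LaunchError if the port cannot be bound.

    Simplification: every bind failure is reported as "already in use".
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            raise LaunchError(
                f"Port {port} already in use, try --port {port + 1}"
            ) from exc


def run_with_cleanup(game_cmd: list[str], serve: Callable[[], None]) -> Optional[int]:
    """Start the game process, run the server, and always stop the game afterwards."""
    game = subprocess.Popen(game_cmd)
    try:
        serve()  # blocks until the run ends or the user presses Ctrl+C
    except KeyboardInterrupt:
        print("Shutting down...")
    finally:
        game.terminate()  # ask the child to exit
        try:
            game.wait(timeout=5)
        except subprocess.TimeoutExpired:
            game.kill()  # force it if it ignores the request
            game.wait()
    return game.returncode


def interrupted_server() -> None:
    raise KeyboardInterrupt  # simulates the user pressing Ctrl+C


# Demo: a sleeping child process stands in for the game executable.
stand_in_game = [sys.executable, "-c", "import time; time.sleep(60)"]
exit_code = run_with_cleanup(stand_in_game, interrupted_server)
print("game exited with", exit_code)
```

The `finally` block is the important design choice here: the game process is stopped whether the server returns normally, is interrupted, or raises an unexpected error.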

### Documentation
- [ ] Root README has 5-line quickstart matching the target experience above
- [ ] Each starter README (beginner, intermediate, langchain, claude-sdk) matches the root quickstart
- [ ] Troubleshooting section covers top 5 failure modes

## Issues That Contribute

| Issue | What it provides |
|-------|-----------------|
| #77 | Compiled executable, auto-launch, scenario selection |
| #71 | Tool completion callbacks (agent sees results) |
| #72 | Mock testing (develop without Godot) |
| #73 | Complete intermediate starter |
| #74 | Framework adapters (LangGraph, Claude SDK) |
| #75 | Game-side inspector |
| #76 | Persistent cross-episode memory |
| SDK consolidation | Single API surface |
| SDK packaging | pip install works |
| Episode lifecycle | Auto-restart between episodes |

## This Issue Is "Done" When

A fresh machine with Python 3.11 and no prior Agent Arena setup can complete the target experience above. This must be verified on both Windows and Ubuntu.

## Estimated Effort

This is not a separate work item; it is the integration test that validates that all other issues are truly complete. Estimate: about half a day to write the automated smoke test, verify it on a clean machine, and update the READMEs.
