Problem
Agent Arena has two observability layers that don't talk to each other:
- Game-side (TraceStore): Observations, tool results, scores, keyed by (agent_id, tick)
- LLM-side (LangSmith/Anthropic console): Prompts, model responses, token usage, latency
When debugging a bad decision at tick 42, a user has to manually correlate between these systems. There's no way to click on a tick and see the full chain: what the agent saw → what prompt was built → what the LLM returned → what tool was called → what happened in the game.
Proposed Solution
Add a trace bridge that links game ticks to framework trace IDs, creating a unified view.
How it works
- SDK passes tick context to the decide callback:
  The decide(observation) function already receives the tick via observation.tick. No change needed.
- Framework starters attach Arena metadata to LLM calls:
    # In starters/langchain/agent.py
    result = graph.invoke(
        {"observation": obs},
        config={"metadata": {"arena_tick": obs.tick, "arena_agent": obs.agent_id}}
    )
LangSmith automatically indexes this metadata, making it searchable.
- TraceStore captures the framework trace URL back:
    # After the LLM call, store the link
    trace.add_step("framework_trace", {
        "langsmith_run_id": run_id,
        "langsmith_url": f"https://smith.langchain.com/runs/{run_id}"
    })
- Result: unified per-tick trace
    Tick 42:
      observation: {pos: [1,2,3], resources: [{name: "berry", dist: 3.2}]}
      framework_trace: https://smith.langchain.com/runs/abc123  ← click to see LLM details
      decision: {tool: "collect", params: {target: "berry"}}
      tool_result: {success: true, items_collected: 1}
      score: {resources_collected: 5}
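The steps above could be sketched end to end as follows. This is a minimal, self-contained illustration, not Agent Arena's actual implementation: the `TraceStore` class shown here, its `(agent_id, tick)` keyed dictionary, the `add_step` signature, and the `render_tick` helper are all assumptions made for the example. Only the step names and the LangSmith URL pattern come from the proposal.

```python
from collections import defaultdict
from typing import Any


class TraceStore:
    """Hypothetical in-memory stand-in for the game-side TraceStore."""

    def __init__(self) -> None:
        # Steps are keyed by (agent_id, tick) and kept in insertion order.
        self._steps: dict[tuple[str, int], list[tuple[str, Any]]] = defaultdict(list)

    def add_step(self, agent_id: str, tick: int, name: str, data: Any) -> None:
        self._steps[(agent_id, tick)].append((name, data))

    def steps(self, agent_id: str, tick: int) -> list[tuple[str, Any]]:
        # .get() avoids creating empty entries for ticks that were never traced.
        return list(self._steps.get((agent_id, tick), []))


def render_tick(store: TraceStore, agent_id: str, tick: int) -> str:
    """Format one tick as a unified trace, one step per line."""
    lines = [f"Tick {tick}:"]
    for name, data in store.steps(agent_id, tick):
        if name == "framework_trace":
            data = data["langsmith_url"]  # show only the clickable link
        lines.append(f"  {name}: {data}")
    return "\n".join(lines)


store = TraceStore()
run_id = "abc123"  # placeholder run ID, as in the example trace above
store.add_step("agent_1", 42, "observation", {"pos": [1, 2, 3]})
store.add_step("agent_1", 42, "framework_trace", {
    "langsmith_run_id": run_id,
    "langsmith_url": f"https://smith.langchain.com/runs/{run_id}",
})
store.add_step("agent_1", 42, "decision", {"tool": "collect", "params": {"target": "berry"}})
print(render_tick(store, "agent_1", 42))
```

Keeping the framework link as just another step means the game-side trace stays the single source of ordering, and the inspector does not need to know anything about LangSmith beyond the URL.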
Framework-agnostic design
The bridge should work with any framework:
- LangGraph: LangSmith run metadata + callbacks
- Claude SDK: Anthropic console trace IDs
- OpenAI SDK: OpenAI dashboard request IDs
- Custom: Any string URL/ID the user wants to attach
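One possible shape for a framework-agnostic link record is sketched below. The `FrameworkTraceLink` dataclass, its field names, and the `URL_TEMPLATES` table are hypothetical. Only the LangSmith URL pattern appears in this proposal, so the other frameworks are stored as bare IDs here rather than given guessed URLs.

```python
from dataclasses import dataclass
from typing import Optional

# Only the LangSmith pattern appears in this proposal; templates for other
# frameworks are deliberately left out rather than guessed.
URL_TEMPLATES = {"langsmith": "https://smith.langchain.com/runs/{trace_id}"}


@dataclass(frozen=True)
class FrameworkTraceLink:
    """Hypothetical record linking one (agent_id, tick) to an external trace."""
    framework: str             # e.g. "langsmith", "anthropic", "openai", "custom"
    trace_id: str              # run ID, request ID, or any user-chosen string
    url: Optional[str] = None  # explicit URL, if the user already has one

    def resolve_url(self) -> Optional[str]:
        """Return the explicit URL, else a templated one, else None."""
        if self.url is not None:
            return self.url
        template = URL_TEMPLATES.get(self.framework)
        return template.format(trace_id=self.trace_id) if template else None


links = [
    FrameworkTraceLink("langsmith", "abc123"),
    FrameworkTraceLink("openai", "req_placeholder"),  # ID only, no URL known
    FrameworkTraceLink("custom", "run-7", url="https://example.com/traces/run-7"),
]
for link in links:
    print(link.framework, link.resolve_url())
```

Storing the raw ID alongside an optional URL lets the inspector show something useful even for frameworks where no deep link can be built.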
The SDK provides a simple hook:
def decide(observation: Observation) -> Decision:
    # User's framework code here...
    observation.trace_metadata["framework_url"] = langsmith_url
    return decision
Acceptance Criteria
- TraceStore supports storing external trace links per (agent_id, tick)
- LangGraph starter attaches arena_tick + arena_agent as LangSmith run metadata
- TraceStore captures LangSmith run URL back into the game-side trace
- A user can go from tick → full prompt/response in LangSmith with one click
- Design is framework-agnostic (works for Claude SDK, OpenAI, etc.)
- Documentation shows the debugging workflow end-to-end
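As a rough way to exercise the first criterion together with the decide hook described earlier, the SDK side could look something like the sketch below. The trimmed-down `Observation` class, the `run_tick` wrapper, and the `external_links` dictionary are assumptions for illustration; the real SDK types and TraceStore interface may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Observation:
    """Hypothetical, trimmed-down Observation with only the fields used here."""
    agent_id: str
    tick: int
    trace_metadata: dict[str, Any] = field(default_factory=dict)


def run_tick(
    decide: Callable[[Observation], Any],
    observation: Observation,
    external_links: dict[tuple[str, int], dict[str, Any]],
) -> Any:
    """Call the user's decide() and record whatever it put in trace_metadata."""
    decision = decide(observation)
    if observation.trace_metadata:
        key = (observation.agent_id, observation.tick)
        external_links[key] = dict(observation.trace_metadata)  # defensive copy
    return decision


def decide(observation: Observation) -> dict[str, Any]:
    # A real agent would call its framework here and obtain the run URL.
    observation.trace_metadata["framework_url"] = "https://smith.langchain.com/runs/abc123"
    return {"tool": "collect", "params": {"target": "berry"}}


links: dict[tuple[str, int], dict[str, Any]] = {}
decision = run_tick(decide, Observation(agent_id="agent_1", tick=42), links)
print(links[("agent_1", 42)]["framework_url"])
```

Because the wrapper only reads `trace_metadata` after `decide` returns, agents that never set it keep working unchanged.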
Dependencies
- Depends on #74, "Add framework adapter system for LangGraph, Claude Agent SDK, and other agent frameworks" (at least one working framework starter is needed)
- Related to #75, "Refactor inspector into game-side debugger, delegate LLM tracing to frameworks" (the game-side trace becomes the inspector's data source)
Estimated Effort
1 day (after #74 is complete)