
feat: multi-turn support in evalview capture (record conversation sessions as turns: YAML) #64

@hidai25

Summary

evalview capture is a proxy that records real agent traffic as test YAML files, but every captured request becomes its own single-turn test (input: query: ...). If a user's app makes 3 consecutive calls to the agent during one session, capture should group them into one file with a turns: block instead of writing 3 separate files.

Current behaviour

3 requests to the agent → 3 separate single-turn YAML files:

tests/captured-1.yaml   (input: query: "find flights NYC-Paris")
tests/captured-2.yaml   (input: query: "book the cheapest")
tests/captured-3.yaml   (input: query: "send me a confirmation")

Desired behaviour

Add a --session-timeout option (default: 30 seconds). Requests that arrive within that many seconds of the previous request are grouped into one YAML file with a turns: block:

name: captured-session-1
turns:
  - query: "find flights NYC-Paris"
  - query: "book the cheapest"
  - query: "send me a confirmation"
expected:
  tools: []
thresholds:
  min_score: 70
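The grouping rule can be sketched as a pure function over already-captured requests. This is a minimal sketch, assuming each request is available as a (timestamp in seconds, query) pair in arrival order; group_into_sessions and render_turns_yaml are hypothetical names, not existing evalview functions. The YAML is emitted by hand so only the standard library is needed: json.dumps produces double-quoted strings, which YAML also accepts.

```python
import json
from typing import List, Tuple

# Hypothetical record shape: (arrival time in seconds, query text).
Request = Tuple[float, str]


def group_into_sessions(requests: List[Request], timeout: float) -> List[List[Request]]:
    """Split time-ordered requests into sessions.

    A new session starts when the gap since the previous request exceeds
    `timeout`. A timeout of 0 (or less) disables grouping, so every
    request becomes its own session.
    """
    sessions: List[List[Request]] = []
    for req in requests:
        same_session = (
            sessions
            and timeout > 0
            and req[0] - sessions[-1][-1][0] <= timeout
        )
        if same_session:
            sessions[-1].append(req)
        else:
            sessions.append([req])
    return sessions


def render_turns_yaml(name: str, queries: List[str]) -> str:
    """Render one multi-turn test in the layout proposed in this issue.

    `name` is assumed to be a plain slug such as "captured-session-1",
    so it is written unquoted; queries are quoted via json.dumps.
    """
    lines = [f"name: {name}", "turns:"]
    lines += [f"  - query: {json.dumps(q)}" for q in queries]
    lines += ["expected:", "  tools: []", "thresholds:", "  min_score: 70"]
    return "\n".join(lines) + "\n"
```

For the three example queries sent at 0 s, 12 s and 20 s, followed by another request at 95 s, group_into_sessions(..., timeout=30) yields two sessions of sizes 3 and 1; with timeout=0 it yields four single-request sessions, matching the "--session-timeout 0 disables grouping" criterion.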

Implementation hints

  • Capture logic lives in evalview/cli.py in the capture command
  • Each captured request is currently written immediately — buffer them per session instead
  • A session ends when the gap between two consecutive requests exceeds --session-timeout seconds, or when capture is stopped with Ctrl-C
  • When a session closes, if len(requests) >= 2 → write a turns: YAML; otherwise write a single-turn YAML as today
  • The ConversationTurn model in evalview/core/types.py is the schema to use
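Replacing the write-immediately behaviour with per-session buffering could look roughly like the sketch below. It assumes the capture loop can call a hook for each recorded request and that writing a test file can be abstracted as a callback; SessionBuffer, on_request, close and the write callback are hypothetical names. Turns are stored as plain dicts with a query key only because the fields of ConversationTurn are not shown in this issue; a real patch should build ConversationTurn objects instead. The file names are placeholders.

```python
import time
from typing import Callable, Dict, List, Optional


class SessionBuffer:
    """Buffer captured requests and emit one test per session (hypothetical)."""

    def __init__(self, timeout: float, write: Callable[[Dict], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.write = write          # receives a test dict to serialise as YAML
        self.clock = clock          # injectable for testing
        self.queries: List[str] = []
        self.last_seen: Optional[float] = None
        self.count = 0

    def on_request(self, query: str) -> None:
        now = self.clock()
        gap_exceeded = (
            self.last_seen is not None and now - self.last_seen > self.timeout
        )
        # timeout <= 0 disables grouping: close the open session before
        # every new request, so each request becomes a single-turn test.
        if self.queries and (self.timeout <= 0 or gap_exceeded):
            self.flush()
        self.queries.append(query)
        self.last_seen = now

    def flush(self) -> None:
        """Write the open session (if any) and start a new one."""
        if not self.queries:
            return
        self.count += 1
        if len(self.queries) >= 2:
            test = {"name": f"captured-session-{self.count}",
                    "turns": [{"query": q} for q in self.queries]}
        else:
            # Backward compatible single-turn shape: input: query: ...
            test = {"name": f"captured-{self.count}",
                    "input": {"query": self.queries[0]}}
        test["expected"] = {"tools": []}
        test["thresholds"] = {"min_score": 70}
        self.write(test)
        self.queries = []

    def close(self) -> None:
        """Flush the last session; call from a finally or Ctrl-C handler."""
        self.flush()
```

In this sketch a gap is only detected when the next request arrives, so an idle session is written either on the next request or when close() runs. Calling close() from a finally: block (or an except KeyboardInterrupt: handler) in the capture loop ensures the last open session is written on Ctrl-C; a background timer could flush idle sessions sooner, at the cost of extra locking.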

Files to touch

  • evalview/cli.py: the capture command
  • evalview/core/types.py: already defines ConversationTurn

Acceptance criteria

  • evalview capture --session-timeout 30 groups requests into turns
  • Single requests still produce single-turn YAML (backward compatible)
  • Generated YAML is valid and passes evalview run
  • --session-timeout 0 disables grouping (always single-turn)

Metadata

Assignees: No one assigned

Labels: cli (Command-line interface), enhancement (New feature or request), good first issue (Good for newcomers), help wanted (Extra attention is needed)

Projects: No projects

Milestone: No milestone

Relationships: None yet

Development: No branches or pull requests