feat: multi-turn conversation support in HTML visual report (Mermaid diagram + turn sections) #65

@hidai25

Summary

The HTML visual report (evalview report) generates a Mermaid sequence diagram for each test. For multi-turn tests, all turns' tool calls are merged into one flat diagram with no indication of where each turn starts and ends.

Current behaviour

The Mermaid diagram shows all steps in one flat sequence — for a 3-turn booking test you get 8 tool calls with no turn boundaries visible.

Desired behaviour

Add visual separators between turns in the Mermaid diagram:

sequenceDiagram
  participant User
  participant Agent

  Note over User,Agent: Turn 1 — "Find flights NYC to Paris"
  User->>Agent: find flights NYC to Paris
  Agent->>Flights API: search_flights(origin=NYC, dest=CDG)
  Flights API-->>Agent: [results]

  Note over User,Agent: Turn 2 — "Book the cheapest one"
  User->>Agent: book the cheapest one
  Agent->>Bookings API: book_flight(flight_id=AA123)
  ...
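
A generator producing output along these lines could be sketched as follows. This is only an illustration under stated assumptions: `Step` is a hypothetical stand-in for `ExecutionStep`, `turn_index` is assumed to be 0-based, `queries` is assumed to hold one query string per turn, and the `Tools` participant and the plain hyphen in the note text are placeholders rather than what `generators.py` actually emits.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    """Hypothetical stand-in for ExecutionStep; only the fields used here."""
    tool_name: str
    turn_index: Optional[int] = None  # assumed 0-based; None for single-turn traces


def truncate(text: str, limit: int = 60) -> str:
    """Collapse whitespace and shorten text to at most `limit` characters."""
    text = " ".join(text.split())  # newlines would break a one-line Mermaid note
    return text if len(text) <= limit else text[: limit - 3] + "..."


def mermaid_with_turns(steps: List[Step], queries: List[str]) -> str:
    """Build a sequence diagram, adding a Note whenever the turn index changes."""
    lines = ["sequenceDiagram", "  participant User", "  participant Agent"]
    current_turn = None
    for step in steps:
        if step.turn_index is not None and step.turn_index != current_turn:
            current_turn = step.turn_index
            query = truncate(queries[current_turn])  # assumes one query per turn
            lines.append(f"  Note over User,Agent: Turn {current_turn + 1} - {query}")
            lines.append(f"  User->>Agent: {query}")
        # Untagged steps (turn_index is None) fall through with no note added.
        lines.append(f"  Agent->>Tools: {step.tool_name}")
    return "\n".join(lines)
```

Because notes are only emitted for steps that carry a `turn_index`, a trace whose steps are all untagged yields the same flat diagram as before, which is one way to keep single-turn tests unaffected.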

Also add a "Turns" section in the test result card showing each turn's query and tool calls in collapsible <details> blocks.
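
As a rough sketch of that "Turns" section, the helper below renders one `<details>` block per turn. The function name, the `(query, tool_calls)` tuple shape, and the surrounding `<section>` markup are all assumptions made for illustration; the real report template may structure the card differently.

```python
from html import escape
from typing import List, Tuple


def render_turns_section(turns: List[Tuple[str, List[str]]]) -> str:
    """Render each turn as a collapsible <details> block.

    `turns` is a hypothetical shape: one (query, [tool call names]) pair per turn.
    """
    if len(turns) <= 1:
        return ""  # leave single-turn result cards unchanged
    blocks = []
    for number, (query, tool_calls) in enumerate(turns, start=1):
        # Escape user-supplied text so queries cannot inject markup.
        items = "".join(f"<li><code>{escape(call)}</code></li>" for call in tool_calls)
        blocks.append(
            f"<details><summary>Turn {number}: {escape(query)}</summary>"
            f"<ul>{items}</ul></details>"
        )
    return '<section class="turns"><h4>Turns</h4>' + "".join(blocks) + "</section>"
```

Escaping the query and tool-call text matters here because both come from test data and end up inside the HTML report.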

Implementation hints

  • Mermaid generation is in evalview/visualization/generators.py
  • The ExecutionTrace has a flat steps list — you'll need to tag steps with their turn index
    • Option A (simpler): Add a turn_index: Optional[int] field to ExecutionStep in evalview/core/types.py and set it in _execute_multi_turn_trace
    • Option B (no schema change): Use step timestamps to infer boundaries from turn_traces end times
  • Mermaid Note over syntax for turn headers: Note over User,Agent: Turn 1 — query text
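
For Option B, the boundary inference could look roughly like the sketch below. It assumes each step exposes a timestamp and each entry in turn_traces exposes an end time, both comparable `datetime` values, with end times sorted ascending; the function and parameter names are hypothetical and not part of the existing code.

```python
from bisect import bisect_left
from datetime import datetime, timedelta
from typing import List


def infer_turn_indices(
    step_times: List[datetime], turn_end_times: List[datetime]
) -> List[int]:
    """Assign each step to the first turn whose end time is >= the step time.

    Assumes turn_end_times is sorted ascending. Steps stamped after the last
    turn's end are clamped into the last turn.
    """
    if not turn_end_times:
        raise ValueError("need at least one turn end time")
    last = len(turn_end_times) - 1
    return [min(bisect_left(turn_end_times, t), last) for t in step_times]


# Usage with made-up timestamps: three turns ending at +10s, +20s, +30s.
base = datetime(2024, 1, 1, 12, 0, 0)
ends = [base + timedelta(seconds=s) for s in (10, 20, 30)]
steps = [base + timedelta(seconds=s) for s in (5, 10, 15, 29, 35)]
print(infer_turn_indices(steps, ends))  # [0, 0, 1, 2, 2]
```

This avoids a schema change but depends on clocks being consistent between steps and turns, which is why Option A is likely the more robust choice.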

Files to touch

  • evalview/visualization/generators.py — Mermaid generation
  • evalview/core/types.py — optionally add turn_index to ExecutionStep
  • evalview/cli.py: update _execute_multi_turn_trace to tag steps with turn index

Acceptance criteria

  • Mermaid diagram shows Note over separators between turns
  • Each turn's query appears in the note text (truncated to 60 chars)
  • Single-turn tests are unaffected
  • HTML report still opens and renders correctly
