feat: per-turn pass/fail display in multi-turn console output #63

@hidai25

Description

Summary

Right now, multi-turn tests show a single merged result in the console, with no visibility into which individual turn caused a failure.

Current behaviour

✗  flight-booking-conversation   score=52  (3 turns)

Desired behaviour

✗  flight-booking-conversation   score=52  (3 turns)
     Turn 1 ✓  search_flights called, output OK
     Turn 2 ✗  book_flight not called  ← failure here
     Turn 3 –  skipped (previous turn failed)

Implementation hints

  • _execute_multi_turn_trace() in evalview/cli.py already collects turn_traces (one ExecutionTrace per turn)
  • The per-turn traces are merged before evaluation — split them out first, evaluate each independently, then merge
  • The verbose log line is at cli.py around the is_multi_turn dispatch block — add per-turn status lines there with [dim] formatting
  • Use ✓ / ✗ / – glyphs, indented 5 spaces to align under the test name
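The per-turn display the hints describe could be sketched roughly like this. `TurnStatus` and `format_turn_line` are hypothetical names for illustration, not the actual evalview API:

```python
from enum import Enum


class TurnStatus(Enum):
    """Glyph per turn outcome (hypothetical helper, not evalview's)."""
    PASSED = "\u2713"   # ✓
    FAILED = "\u2717"   # ✗
    SKIPPED = "\u2013"  # – (turn not run because an earlier turn failed)


def format_turn_line(index: int, status: TurnStatus, detail: str) -> str:
    """Render one per-turn status line, indented 5 spaces so it aligns
    under the test name in the console summary."""
    return f"     Turn {index} {status.value}  {detail}"


lines = [
    format_turn_line(1, TurnStatus.PASSED, "search_flights called, output OK"),
    format_turn_line(2, TurnStatus.FAILED, "book_flight not called"),
    format_turn_line(3, TurnStatus.SKIPPED, "skipped (previous turn failed)"),
]
print("\n".join(lines))
```

If the console uses Rich, each line could additionally be wrapped in `[dim]...[/dim]` markup as the hint suggests; the plain-string version above keeps the sketch dependency-free.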

Files to touch

  • evalview/cli.py: execute_single_test + _execute_multi_turn_trace
  • Optionally evalview/reporters/console_reporter.py if you want a shared formatter
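The split-evaluate-merge flow from the hints above could look like the sketch below. `TurnResult`, `evaluate_turns`, and the `evaluate_one` callback are assumed names for illustration; the real trace and result types live in evalview:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class TurnResult:
    """Outcome of one turn (hypothetical stand-in for evalview's result type)."""
    passed: bool
    detail: str


def evaluate_turns(
    turn_traces: List[object],
    evaluate_one: Callable[[object], TurnResult],
) -> Tuple[List[TurnResult], bool]:
    """Evaluate each turn trace independently instead of merging first.

    Turns after the first failure are marked skipped (matching the
    'Turn 3 – skipped' line in the desired output), and the merged
    verdict is the conjunction of the per-turn results.
    """
    results: List[TurnResult] = []
    failed = False
    for trace in turn_traces:
        if failed:
            results.append(TurnResult(False, "skipped (previous turn failed)"))
            continue
        result = evaluate_one(trace)
        results.append(result)
        if not result.passed:
            failed = True
    merged_passed = all(r.passed for r in results)
    return results, merged_passed
```

A shared formatter in console_reporter.py could then take the `results` list and emit one status line per turn, keeping scoring logic untouched per the acceptance criteria.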

Good first issue?

Yes. The plumbing (the turn_traces list) is already there; this is purely a display change, with evaluation logic untouched.

Acceptance criteria

  • evalview run shows per-turn status for multi-turn tests
  • Pass/fail per turn is visible even without --verbose
  • No changes to scoring logic

Metadata


Labels

  • cli (Command-line interface)
  • enhancement (New feature or request)
  • good first issue (Good for newcomers)
  • help wanted (Extra attention is needed)
