Add --eval factual named evaluator #58

@hidai25

Description

Summary

Add a built-in evaluator, activated with --eval factual, that checks whether the agent's final response is factually consistent with its tool outputs. Currently, users must write a rubric from scratch to get any factuality check.

What it should do

Compare the agent's final response against the data returned by its tool calls and flag contradictions. No rubric needed in the YAML.

How to implement

  • Add evalview/evaluators/factual_evaluator.py
  • Follow the pattern of existing evaluators in evalview/evaluators/
  • Hook it into the --eval flag in evalview/cli.py
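A minimal sketch of what factual_evaluator.py might look like. The base-class interface, method names, and result type here are assumptions — mirror the actual pattern used by the existing evaluators in evalview/evaluators/. The substring heuristic is only a placeholder for the real consistency check (likely an LLM judge prompted with the tool outputs and the response):

```python
# evalview/evaluators/factual_evaluator.py -- hypothetical sketch; adapt to
# the real evaluator interface used elsewhere in the codebase.
import re
from dataclasses import dataclass


@dataclass
class EvalResult:
    # Assumed result shape; use the project's actual result type.
    passed: bool
    reason: str


class FactualEvaluator:
    """Flags final responses that contradict the agent's tool outputs."""

    name = "factual"  # what --eval factual would map to in cli.py

    def evaluate(self, response: str, tool_outputs: list[str]) -> EvalResult:
        # Naive placeholder heuristic: every figure cited in the response
        # should appear verbatim in at least one tool output.
        claims = re.findall(r"\d[\d,.]*", response)
        missing = [c for c in claims
                   if not any(c in out for out in tool_outputs)]
        if missing:
            return EvalResult(False, f"Unsupported figures in response: {missing}")
        return EvalResult(True, "No contradictions detected")
```

Because no rubric is involved, the CLI can construct this evaluator directly when it sees --eval factual, with no extra YAML required.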

Acceptance criteria

  • evalview run --eval factual works without a rubric in the YAML
  • Test passes when response is consistent with tool outputs
  • Test fails when response contradicts tool outputs
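The two behavioral criteria could be covered by tests along these lines. check_consistency is a stand-in for whatever the new evaluator ends up exposing, and the flight example is invented for illustration:

```python
import re


def check_consistency(response: str, tool_outputs: list[str]) -> bool:
    # Stand-in for the evaluator's core check: the response is consistent
    # if every figure it cites is backed by some tool output.
    figures = re.findall(r"\d[\d,.]*", response)
    return all(any(f in out for out in tool_outputs) for f in figures)


def test_consistent_response_passes():
    assert check_consistency("Flight AA12 departs at 09:30",
                             ["flight_lookup -> AA12 departs 09:30"])


def test_contradictory_response_fails():
    assert not check_consistency("Flight AA12 departs at 11:45",
                                 ["flight_lookup -> AA12 departs 09:30"])
```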

Good first issue notes

This is a well-scoped, isolated task. The evaluator pattern is consistent across the codebase — look at existing evaluators for reference. No changes to core architecture needed.
