Summary
Add a built-in evaluator, activated with `--eval factual`, that checks whether the agent's response is factually consistent with its tool outputs. Currently users must write a rubric from scratch.
What it should do
Compare the agent's final response against the data returned by its tool calls and flag contradictions. No rubric needed in the YAML.
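The core comparison could start very simply. Below is a minimal sketch of one possible consistency check, flagging numbers in the response that never appear in any tool output. The function name and the numeric heuristic are illustrative assumptions, not evalview's actual design; a real implementation would likely use an LLM judge or richer claim extraction.

```python
# Hypothetical sketch of the core check: flag claims in the response
# that contradict values returned by tool calls. Names are illustrative;
# evalview's real evaluator interface may differ.
import re

def find_contradictions(response: str, tool_outputs: list[str]) -> list[str]:
    """Return numbers mentioned in the response that never appear
    in any tool output (a crude proxy for factual inconsistency)."""
    corpus = " ".join(tool_outputs)
    tool_numbers = set(re.findall(r"\d+(?:\.\d+)?", corpus))
    response_numbers = re.findall(r"\d+(?:\.\d+)?", response)
    return [n for n in response_numbers if n not in tool_numbers]

# Consistent: every figure in the answer came from a tool call
assert find_contradictions("Revenue was 42 million.", ["revenue: 42 million"]) == []
# Contradiction: the response states a number the tools never returned
assert find_contradictions("Revenue was 99 million.", ["revenue: 42 million"]) == ["99"]
```

The key point is that no rubric is involved: the "ground truth" is whatever the tools returned during the run.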
How to implement
- Add `evalview/evaluators/factual_evaluator.py`
- Follow the pattern of existing evaluators in `evalview/evaluators/`
- Hook it into the `--eval` flag in `evalview/cli.py`
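A possible shape for the new evaluator, assuming existing evaluators expose an `evaluate(response, tool_outputs)` method that returns a pass/fail result with a reason. The base-class-free structure, `EvalResult`, and the word-overlap heuristic below are all assumptions; check the actual classes in `evalview/evaluators/` for the real interface.

```python
# Hypothetical sketch only: the real evaluator pattern lives in
# evalview/evaluators/ and may use a different base class and result type.
from dataclasses import dataclass

@dataclass
class EvalResult:
    passed: bool
    reason: str

class FactualEvaluator:
    name = "factual"  # the name `--eval factual` would select

    def evaluate(self, response: str, tool_outputs: list[str]) -> EvalResult:
        # Naive check: a sentence counts as "grounded" if it shares at
        # least one content word (>3 chars) with some tool output.
        corpus = " ".join(tool_outputs).lower()
        for sentence in filter(None, (s.strip() for s in response.split("."))):
            words = [w for w in sentence.lower().split() if len(w) > 3]
            if words and not any(w in corpus for w in words):
                return EvalResult(False, f"Unsupported claim: {sentence!r}")
        return EvalResult(True, "All claims grounded in tool outputs")
```

Usage would then mirror the other evaluators, e.g. `FactualEvaluator().evaluate(response, tool_outputs)`, with the CLI constructing the instance when the flag is passed.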
Acceptance criteria
- `evalview run --eval factual` works without a rubric in the YAML
- Test passes when the response is consistent with tool outputs
- Test fails when the response contradicts tool outputs
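For the CLI side, one way to satisfy the first criterion is a name-to-evaluator registry that the flag looks up. The registry, decorator, and placeholder check below are assumptions sketched with stdlib `argparse`; `evalview/cli.py` may wire this differently.

```python
# Hypothetical wiring sketch: a name -> evaluator registry that
# `--eval factual` could dispatch to. Not evalview's actual API.
import argparse

EVALUATORS = {}

def register(name):
    def deco(fn):
        EVALUATORS[name] = fn
        return fn
    return deco

@register("factual")
def factual_eval(response, tool_outputs):
    # Placeholder check: pass iff the response and some tool output
    # contain one another (stand-in for a real consistency check).
    return any(response.lower() in out.lower() or out.lower() in response.lower()
               for out in tool_outputs)

parser = argparse.ArgumentParser()
parser.add_argument("--eval", choices=sorted(EVALUATORS))
args = parser.parse_args(["--eval", "factual"])
evaluator = EVALUATORS[args.eval]
print(evaluator("sunny", ["It is sunny today"]))  # prints True; no rubric needed
```

Registering by name keeps the acceptance tests straightforward: one test feeds a consistent response/tool-output pair and expects a pass, the other feeds a contradicting pair and expects a fail.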
Good first issue notes
This is a well-scoped, isolated task. The evaluator pattern is consistent across the codebase — look at existing evaluators for reference. No changes to core architecture needed.