Open
Labels: good first issue (Good for newcomers), help wanted (Extra attention is needed)
Description
Summary
Add a built-in evaluator that checks whether the agent's response contains claims not supported by its tool outputs. Activated with --eval no-hallucination.
What it should do
Use LLM-as-judge with a fixed prompt to compare the agent's final response against the data returned by its tool calls. Flag any claims in the response that have no grounding in the tool outputs.
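A fixed judge prompt for this kind of grounding check might look like the sketch below. The issue doesn't specify the exact prompt wording or how evalview invokes its judge, so the template text and function name here are illustrative assumptions.

```python
# Hypothetical judge prompt for the grounding check. The real wording and
# the judge-call API in evalview are not specified in the issue.
JUDGE_PROMPT_TEMPLATE = """You are checking an AI agent's response for hallucinations.

Tool outputs (the only trusted source of facts):
{tool_outputs}

Agent response:
{response}

List every claim in the response that is not supported by the tool outputs.
If every claim is grounded, reply with exactly: GROUNDED
Otherwise reply with: UNGROUNDED, followed by the unsupported claims."""


def build_judge_prompt(tool_outputs: str, response: str) -> str:
    """Fill the fixed template with one test run's data."""
    return JUDGE_PROMPT_TEMPLATE.format(
        tool_outputs=tool_outputs, response=response
    )
```

Keeping the template a single constant makes the "fixed prompt" requirement explicit: nothing from the YAML rubric feeds into it.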
How to implement
- Add evalview/evaluators/hallucination_evaluator.py
- Follow the pattern of existing evaluators in evalview/evaluators/
- Use the existing LLM judge infrastructure (already set up in the codebase)
- Hook it into the --eval flag in evalview/cli.py
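Putting those steps together, a minimal evaluator could be sketched as follows. evalview's actual base class, result type, and judge client aren't shown in the issue, so every name below is an invented stand-in; the real class should mirror whatever output_evaluator.py does.

```python
import json
from dataclasses import dataclass

# Sketch only: evalview's real base class, judge client, and result type are
# not shown in the issue, so these are invented stand-ins.


@dataclass
class EvalResult:  # stand-in for evalview's real result type
    passed: bool
    reason: str


class HallucinationEvaluator:
    """Flags response claims with no grounding in the tool outputs."""

    name = "no-hallucination"  # what --eval no-hallucination would select

    def __init__(self, judge):
        # judge: callable(prompt: str) -> str, e.g. the existing LLM judge
        self.judge = judge

    def evaluate(self, response: str, tool_calls: list[dict]) -> EvalResult:
        # Serialize every tool result so the judge sees all the evidence.
        tool_outputs = "\n".join(json.dumps(c, default=str) for c in tool_calls)
        prompt = (
            "Tool outputs (the only trusted source of facts):\n"
            + tool_outputs
            + "\n\nAgent response:\n"
            + response
            + "\n\nReply GROUNDED if every claim is supported by the tool "
            "outputs, otherwise reply UNGROUNDED with the unsupported claims."
        )
        verdict = self.judge(prompt)
        passed = verdict.strip().upper().startswith("GROUNDED")
        return EvalResult(passed=passed, reason=verdict)
```

Taking the judge as a constructor argument keeps the evaluator testable with a stub judge, without touching the real LLM infrastructure.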
Acceptance criteria
- evalview run --eval no-hallucination works without a rubric in the YAML
- Test fails when response contains a claim not present in tool outputs
- Test passes when response is fully grounded in tool data
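The last two criteria can be made concrete with a deterministic stand-in for the judge. The toy check below (numbers in the response must appear in the tool output) is purely illustrative; real tests would exercise the evaluator against the actual LLM judge or a mocked one.

```python
import re

# Toy, deterministic grounding check standing in for the LLM judge, used
# only to make the two acceptance cases concrete. All names are invented.


def contains_ungrounded_claim(response: str, tool_output: str) -> bool:
    """True if the response cites a number absent from the tool output."""
    response_numbers = set(re.findall(r"\d+", response))
    tool_numbers = set(re.findall(r"\d+", tool_output))
    return not response_numbers <= tool_numbers


def test_fails_on_ungrounded_claim():
    # Response invents $900; the tool only reported 500 -> should flag.
    assert contains_ungrounded_claim("Revenue was $900", "revenue: 500")


def test_passes_when_grounded():
    # Response repeats the tool's figure exactly -> fully grounded.
    assert not contains_ungrounded_claim("Revenue was $500", "revenue: 500")
```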
Good first issue notes
The LLM judge infrastructure is already built — you just need to write the evaluator class and the judge prompt. Look at evalview/evaluators/output_evaluator.py as the closest reference.