28 changes: 21 additions & 7 deletions tests/script/test_compare_evaluations.py
@@ -191,18 +191,32 @@ def test_compare_score_distributions_scipy_example(
         # (though the exact p-values depend on the implementation)
         assert "tests" in result
 
-    def test_compare_score_distributions_identical_data(
+    def test_compare_score_distributions_precise_delta(
         self, comparison_instance: EvaluationComparison
     ) -> None:
-        """Test _compare_score_distributions with identical data."""
-        scores1 = [0.8, 0.8, 0.8, 0.8, 0.8]
-        scores2 = [0.8, 0.8, 0.8, 0.8, 0.8]
+        """
+        This test validates behavior with a precise 0.001 mean difference instead of identical values, using reasonable floating-point tolerance and verifying mean calculations.
+        """
+        scores1 = [0.7999, 0.7988, 0.799, 0.80, 0.81]
+        scores2 = [s + 0.001 for s in scores1]
 
         result = comparison_instance._compare_score_distributions(scores1, scores2)
 
-        assert result["run1_stats"]["mean"] == result["run2_stats"]["mean"]
-        assert result["mean_difference"] == 0.0
-        assert result["relative_change"] == 0.0
+        expected_mean1 = 0.80154
+        expected_diff = 0.001
+        expected_rel_change = (expected_diff / expected_mean1) * 100  # ~0.1248%
+
+        assert result["run1_stats"]["mean"] == pytest.approx(expected_mean1)
+        f"Baseline mean mismatch. Expected {expected_mean1}, got {result['run1_stats']['mean']}"
Comment on lines +209 to +210
⚠️ Potential issue | 🔴 Critical

Bug: assertion message is a detached expression, never displayed on failure.

Line 209's assert ends without a message. Line 210 is a standalone f-string expression that is silently discarded — it's not the assertion's failure message.

🐛 Proposed fix
-        assert result["run1_stats"]["mean"] == pytest.approx(expected_mean1)
-        f"Baseline mean mismatch. Expected {expected_mean1}, got {result['run1_stats']['mean']}"
+        assert result["run1_stats"]["mean"] == pytest.approx(expected_mean1), (
+            f"Baseline mean mismatch. Expected {expected_mean1}, got {result['run1_stats']['mean']}"
+        )
🤖 Prompt for AI Agents
In `@tests/script/test_compare_evaluations.py` around lines 209-210, the assertion currently lacks its failure message because the f-string on the next line is a detached expression; update the assert that compares result["run1_stats"]["mean"] to pytest.approx(expected_mean1) to include the f-string as the second argument (assert <condition>, f"...") so the message with expected_mean1 and the actual result['run1_stats']['mean'] is displayed on failure; locate this in the test_compare_evaluations.py test where result and expected_mean1 are used.
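
Editor's note: as a standalone illustration of the pitfall flagged in this comment (hypothetical names and values, not taken from the PR), a bare f-string on the line after an assert is an ordinary expression statement. It is either never reached (when the assert fails) or evaluated and discarded (when it passes); only the two-operand form attaches the message to the AssertionError.

```python
# Minimal sketch of the detached-assertion pitfall (hypothetical values).
value = 2

# Broken form: the f-string is a separate statement, so a failing assert
# raises AssertionError with no message at all.
try:
    assert value == 1
    f"Expected 1, got {value}"  # only reached if the assert passes, then discarded
except AssertionError as exc:
    print(repr(exc))  # AssertionError() -- no message attached

# Correct form: the message is the assert statement's second operand.
try:
    assert value == 1, f"Expected 1, got {value}"
except AssertionError as exc:
    print(repr(exc))  # AssertionError('Expected 1, got 2')
```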


+
+        expected_mean2 = expected_mean1 + expected_diff
+        assert result["run2_stats"]["mean"] == pytest.approx(
+            expected_mean2
+        ), f"Adjusted mean mismatch. Expected {expected_mean2}, got {result['run2_stats']['mean']}"
+
+        assert result["mean_difference"] == pytest.approx(
+            expected_diff
+        ), f"Mean difference mismatch. Expected {expected_diff}, got {result['mean_difference']}"
+
     def test_perform_pass_rate_tests_basic(
         self, comparison_instance: EvaluationComparison
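
Editor's note: for readers skimming the diff, here is a minimal sketch of the arithmetic the new test pins down. The helper below is hypothetical (the repository's actual `_compare_score_distributions` is not shown on this page); it only mirrors the dict keys and the mean, difference, and percent relative-change calculations the assertions rely on.

```python
# Hypothetical stand-in for _compare_score_distributions, assuming it reports
# per-run means plus the mean difference and a percent relative change.
from statistics import mean


def sketch_compare_score_distributions(
    scores1: list[float], scores2: list[float]
) -> dict:
    mean1, mean2 = mean(scores1), mean(scores2)
    diff = mean2 - mean1
    return {
        "run1_stats": {"mean": mean1},
        "run2_stats": {"mean": mean2},
        "mean_difference": diff,
        "relative_change": (diff / mean1) * 100 if mean1 else 0.0,
    }


scores1 = [0.7999, 0.7988, 0.799, 0.80, 0.81]
scores2 = [s + 0.001 for s in scores1]
result = sketch_compare_score_distributions(scores1, scores2)

print(result["run1_stats"]["mean"])  # ≈ 0.80154
print(result["mean_difference"])     # ≈ 0.001 (floating point, hence pytest.approx)
print(result["relative_change"])     # ≈ 0.1248 (%)
```

Under these assumptions, the expected values hard-coded in the test (0.80154, 0.001, ~0.1248%) follow directly from the chosen score lists.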