[DRAFT] harness: Add Vidore V3 benchmark and BEIR metrics support by jioffe502 · Pull Request #1378 · NVIDIA/nv-ingest

jioffe502 · 2026-02-05T18:12:45Z

Description

Adds Vidore V3 benchmark support and BEIR evaluation metrics to the test harness.

Changes

Add Vidore V3 dataset configurations with HuggingFace integration for ground truth
Add dataset groups feature for running multiple datasets (e.g., --dataset=vidore)
Add optional BEIR metrics (NDCG, MAP, Precision) for recall evaluation

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.
If adjusting docker-compose.yaml environment variables, have you ensured those are mirrored in the Helm values.yaml file?

- Add 8 Vidore V3 dataset configurations (finance_en, industrial, computer_science, pharmaceuticals, hr, energy, physics, finance_fr)
- Add vidore_load_ground_truth() using HuggingFace datasets API
- Add vidore_recall() evaluator with PDF-only matching
- Add extract_page_as_image, extract_method, image_elements_modality config options to support Vidore's OCR-based page image retrieval
- Add datasets>=2.0.0 dependency for HuggingFace qrels loading

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Jacob Ioffe <jioffe@nvidia.com>
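For reviewers, here is a minimal sketch of what a recall check with PDF-only matching could look like. It is not the code in this PR: the helper names `pdf_stem` and `recall_at_k`, the shape of the inputs, and the reading of "PDF-only matching" as "compare documents by PDF file stem and ignore directories" are all assumptions made for illustration.

```python
from pathlib import PurePosixPath


def pdf_stem(source: str) -> str:
    """Reduce a path such as '/data/report.pdf' to 'report' (hypothetical helper)."""
    return PurePosixPath(source).stem


def recall_at_k(ground_truth: dict, retrieved: dict, k: int) -> float:
    """Fraction of queries with at least one relevant PDF among the top-k results.

    ground_truth: query -> iterable of relevant PDF names (assumed shape)
    retrieved:    query -> ranked list of source paths (assumed shape)
    """
    if not ground_truth:
        return 0.0
    hits = 0
    for query, relevant in ground_truth.items():
        # Compare by file stem only, so directory prefixes do not matter.
        top_k = {pdf_stem(s) for s in retrieved.get(query, [])[:k]}
        if top_k & {pdf_stem(r) for r in relevant}:
            hits += 1
    return hits / len(ground_truth)


# Illustrative data, not taken from any Vidore dataset.
ground_truth = {"q1": ["a.pdf"], "q2": ["b.pdf"]}
retrieved = {
    "q1": ["/data/a.pdf", "/data/c.pdf"],
    "q2": ["/data/c.pdf", "/data/b.pdf"],
}
recall_1 = recall_at_k(ground_truth, retrieved, k=1)
recall_2 = recall_at_k(ground_truth, retrieved, k=2)
```

Counting a query as a hit when any relevant PDF appears in the top k is one common convention for recall@k; the harness may use a different definition.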

- Add dataset_groups section to test_configs.yaml with vidore, vidore_english, vidore_quick groups
- Add expand_dataset_names() in config.py to handle group expansion
- Add --list-datasets CLI option to show available datasets and groups
- Update README.md with dataset groups documentation

Usage:
uv run nv-ingest-harness-run --list-datasets
uv run nv-ingest-harness-run --case=e2e_recall --dataset=vidore
uv run nv-ingest-harness-run --case=e2e_recall --dataset=vidore_quick

Note: test_configs.yaml includes temp test settings (vdb_backend: milvus, reranker_mode: none, modified vidore_quick) - revert after testing.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
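The group expansion described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the `expand_dataset_names()` implementation in config.py: the function name `expand_groups`, the group members, and the dataset names are hypothetical.

```python
def expand_groups(names: list, groups: dict) -> list:
    """Replace each group name with its member datasets.

    Names that are not groups pass through unchanged. Order is preserved
    and duplicates are dropped, so a dataset listed both directly and via
    a group runs only once.
    """
    expanded = []
    for name in names:
        for member in groups.get(name, [name]):
            if member not in expanded:
                expanded.append(member)
    return expanded


# Hypothetical group definition; the real members live in test_configs.yaml.
groups = {"vidore_quick": ["finance_en", "hr"]}

result = expand_groups(["vidore_quick", "hr", "other_dataset"], groups)
print(result)  # ['finance_en', 'hr', 'other_dataset']
```

Preserving order and dropping duplicates keeps run output predictable when a dataset is named both directly and through a group.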

- Add optional BEIR evaluation (NDCG, MAP, Precision) to recall tests
- Configurable via enable_beir in test_configs.yaml or ENABLE_BEIR env var
- Add beir>=2.0.0 dependency to harness
- Add nvidia/llama-nemotron-embed-vl-1b-v2 to known embedding models

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
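For context on what the three metrics measure, here is a small pure-Python sketch of their standard definitions. It does not call the beir package and is not the harness code; the function names and sample data are made up for illustration. NDCG here uses linear gain with a log2 rank discount, a common convention.

```python
import math


def precision_at_k(ranked: list, relevant: set, k: int) -> float:
    """Share of the top-k results that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked: list, relevant: set) -> float:
    """Mean of precision values taken at each rank where a relevant doc appears."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def ndcg_at_k(ranked: list, gains: dict, k: int) -> float:
    """DCG of the ranking divided by the DCG of an ideal ordering."""
    dcg = sum(gains.get(doc, 0) / math.log2(i + 2) for i, doc in enumerate(ranked[:k]))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# Illustrative single-query example: d1 and d3 are relevant.
ranked = ["d1", "d2", "d3"]
gains = {"d1": 1, "d3": 1}

p2 = precision_at_k(ranked, set(gains), k=2)
ap = average_precision(ranked, set(gains))
ndcg3 = ndcg_at_k(ranked, gains, k=3)
```

MAP is the mean of `average_precision` over all queries; the single-query value above is only one term of that mean.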

- Add embed model fallback detection (dim=1024 warning) to e2e.py and recall.py
- Add Milvus collection vector dimension verification after ingestion
- Enable BEIR metrics by default for all Vidore V3 datasets

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Jacob Ioffe <jioffe@nvidia.com>
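The fallback detection above could work along these lines. This is a sketch under assumptions, not the logic in e2e.py or recall.py: the function name, the logger name, and the idea that the caller supplies the expected dimension are hypothetical. Only the 1024 value comes from the commit message.

```python
import logging

logger = logging.getLogger("harness_sketch")  # hypothetical logger name

# The commit message mentions a dim=1024 warning; this sketch treats 1024
# as the signature of an unintended fallback embedding model.
FALLBACK_DIM = 1024


def check_embedding_dim(expected_dim: int, observed_dim: int) -> bool:
    """Return True when the observed vector dimension matches the expected one.

    Logs a more specific warning when the mismatch looks like a fallback.
    """
    if observed_dim == expected_dim:
        return True
    if observed_dim == FALLBACK_DIM:
        logger.warning(
            "Observed dim=%d but expected %d; a fallback embedding model may be in use.",
            observed_dim,
            expected_dim,
        )
    else:
        logger.warning(
            "Embedding dimension mismatch: expected %d, observed %d.",
            expected_dim,
            observed_dim,
        )
    return False


# Illustrative values only; real dimensions depend on the configured model.
ok = check_embedding_dim(expected_dim=2048, observed_dim=2048)
fallback = check_embedding_dim(expected_dim=2048, observed_dim=1024)
```

The same comparison could be run against the vector dimension reported by the Milvus collection after ingestion, which is the second check the commit describes.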

jioffe502 and others added 3 commits February 3, 2026 23:31

jioffe502 requested a review from a team as a code owner February 5, 2026 18:12

jioffe502 requested review from ChrisJar, charlesbluca and drobison00 and removed request for drobison00 February 5, 2026 18:12

jioffe502 marked this pull request as draft February 5, 2026 18:13

jioffe502 wants to merge 4 commits into NVIDIA:main from jioffe502:vidore-v3-benchmark
