refactor: clean up code smells and fix minor issues by sumukshashidhar · Pull Request #197 · huggingface/yourbench

sumukshashidhar · 2025-12-28T03:21:30Z

Summary

This PR addresses several code quality issues identified during a comprehensive codebase review.

Changes

1. Fixed Unused `_validate_mode()` Function (HIGH priority)

File: yourbench/pipeline/question_generation/_core.py
The _validate_mode() function was defined but never used
The same validation logic was duplicated 3 times in run_single_shot(), run_multi_hop(), and run_cross_document()
Added _get_mode_from_config() wrapper and replaced all 3 duplicated blocks

2. Removed Dead Code `_expand_env_in_dict()` (MEDIUM priority)

File: yourbench/conf/schema.py
Removed function that was defined but never called anywhere in the codebase

3. Fixed Loguru Teardown Error (LOW priority - cosmetic)

File: yourbench/utils/inference/inference_tracking.py
Removed logger calls in _write_aggregate_log() atexit handler
This fixes the ValueError: I/O operation on closed file noise during test runs

4. Improved Prompt Load Error Handling (HIGH priority)

File: yourbench/conf/loader.py
Changed from generic except Exception with warning to:
- except AttributeError: pass for expected disabled stages
- except Exception: logger.error(...) for unexpected errors
Provides better visibility into actual problems vs expected behavior

Impact

Net change: -22 lines (16 added, 38 removed)
All 104 tests pass
Ruff checks pass
No breaking changes to public API

Testing

uv run pytest tests/ -q --tb=short  # 104 passed
uv run ruff check yourbench/        # All checks passed
uv run ruff format --check yourbench/ # All formatted

- Use existing _validate_mode() instead of duplicating validation 3x - Remove unused _expand_env_in_dict() dead code - Fix loguru teardown error in inference tracking - Improve prompt load error handling with proper exception types Net: -22 lines, all 104 tests pass

codecov-commenter · 2025-12-28T03:22:22Z

Codecov Report

❌ Patch coverage is 40.00000% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 52.94%. Comparing base (804782b) to head (caa3c4a).

Files with missing lines	Patch %	Lines
yourbench/conf/loader.py	0.00%	3 Missing ⚠️
yourbench/pipeline/question_generation/_core.py	66.66%	2 Missing ⚠️
yourbench/utils/inference/inference_tracking.py	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #197      +/-   ##
==========================================
+ Coverage   52.41%   52.94%   +0.53%     
==========================================
  Files          32       32              
  Lines        2852     2831      -21     
==========================================
+ Hits         1495     1499       +4     
+ Misses       1357     1332      -25

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…v.py - Create centralized env.py with expand_env_value, expand_env_recursive, and validate_env_expanded functions - Remove duplicate _expand_env from conf/schema.py (use shared utility) - Update conf/loader.py to use expand_env_recursive - Update utils/dataset_engine.py to use validate_env_expanded - All 104 tests passing

…models This makes config validation stricter - typos in config field names will now raise ValidationError instead of being silently ignored. All 14 Pydantic model classes updated.

Reduced inference_builders.py from 331 to 238 lines (28% reduction). New helper functions: - _builder_context(): Context manager for metrics and timing - _log_builder_completion(): Unified logging for both builders - _build_tags(): Common tag building logic - _create_call(): Shared InferenceCall creation Used dataclass field() for cleaner defaults in InferenceJob and BuilderMetrics.

When a model has base_url set (indicating a custom endpoint like OpenAI), but api_key is not set, log a warning at model initialization time. This gives early feedback rather than cryptic failures during inference.

- Add 14 tests for yourbench/utils/env.py module - Cover expand_env_value, expand_env_recursive, validate_env_expanded - Test edge cases: nested structures, missing vars, special syntax Also update docs/CONFIGURATION.md to document strict validation

- Replace custom $VAR expansion with Python's native os.path.expandvars() - Now supports both $VAR and ${VAR} syntax automatically - Remove _resolve_hf_organization (unused special case) - Remove _expand_env_vars wrapper in loader.py (inline call) - Update tests for new behavior env.py: 90 → 65 lines (-28%)

Merge related test cases into single test methods: - test_env_utils: 13 → 5 tests - test_schema_validation: 16 → 5 tests - test_comb: 17 → 11 tests Total: 117 → 91 tests (-22%)

- Remove aws_support_documentation (no data folder) - Remove generate_questions_from_docs (overlaps with custom_prompts_demo) - Remove mckinsey_global_report (overlaps with custom_prompts_demo) - Rename childrens_books -> custom_prompts_demo - Add shared example/prompts/ with reusable templates - Update example/README.md with comparison table Remaining examples: - default_example: Quickstart with HuggingFace - harry_potter_quizz: Comprehensive tutorial - custom_prompts_demo: Custom prompt usage - local_vllm_private_data: Self-hosted models - rich_pdf_extraction_with_gemini: LLM ingestion

- Replace naive regex in _best_effort_json_extract() with balanced bracket parser that correctly handles nested JSON structures - Add _is_valid_question_list() helper to filter non-question arrays - Fixes multi-hop question parsing that was extracting partial JSON

Some models return questions wrapped in a dict like {"questions": [...]} instead of a raw array. This adds handling for common wrapper keys: - questions - question_list - qa_pairs - pairs - data Also adds debug logging for parsing diagnostics.

sumukshashidhar added 4 commits December 28, 2025 03:27

refactor(schema): change extra:allow to extra:forbid on all Pydantic …

2aa9a8c

…models This makes config validation stricter - typos in config field names will now raise ValidationError instead of being silently ignored. All 14 Pydantic model classes updated.

feat(inference): add warning for missing api_key with custom base_url

9a4a0a2

When a model has base_url set (indicating a custom endpoint like OpenAI), but api_key is not set, log a warning at model initialization time. This gives early feedback rather than cryptic failures during inference.

sumukshashidhar force-pushed the refactor/code-quality-improvements branch from c9b1c4b to 9a4a0a2 Compare December 28, 2025 04:01

sumukshashidhar added 3 commits December 28, 2025 04:04

test: consolidate redundant tests while maintaining coverage

14ae23b

Merge related test cases into single test methods: - test_env_utils: 13 → 5 tests - test_schema_validation: 16 → 5 tests - test_comb: 17 → 11 tests Total: 117 → 91 tests (-22%)

sumukshashidhar force-pushed the refactor/code-quality-improvements branch from cff6f86 to 14ae23b Compare December 28, 2025 04:33

sumukshashidhar added 9 commits December 28, 2025 04:38

docs: document both $VAR and ${VAR} env var syntax

3d7a9f3

fix: handle dict config in cross_document_utils

7f89b40

fix: add warning when no valid questions are parsed

d178ad8

fix: add missing return statement in cross_document_utils

9de635e

perf: increase default max_concurrent_requests from 32 to 128

4ebc8ff

docs: update documentation for consolidated examples and new defaults

978fc98

sumukshashidhar merged commit 61842d6 into main Dec 28, 2025
6 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: clean up code smells and fix minor issues#197

refactor: clean up code smells and fix minor issues#197
sumukshashidhar merged 17 commits intomainfrom
refactor/code-quality-improvements

sumukshashidhar commented Dec 28, 2025

Uh oh!

codecov-commenter commented Dec 28, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

sumukshashidhar commented Dec 28, 2025

Summary

Changes

1. Fixed Unused _validate_mode() Function (HIGH priority)

2. Removed Dead Code _expand_env_in_dict() (MEDIUM priority)

3. Fixed Loguru Teardown Error (LOW priority - cosmetic)

4. Improved Prompt Load Error Handling (HIGH priority)

Impact

Testing

Uh oh!

codecov-commenter commented Dec 28, 2025

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

1. Fixed Unused `_validate_mode()` Function (HIGH priority)

2. Removed Dead Code `_expand_env_in_dict()` (MEDIUM priority)