Skip to content

refactor: clean up code smells and fix minor issues#197

Merged
sumukshashidhar merged 17 commits intomainfrom
refactor/code-quality-improvements
Dec 28, 2025
Merged

refactor: clean up code smells and fix minor issues#197
sumukshashidhar merged 17 commits intomainfrom
refactor/code-quality-improvements

Conversation

@sumukshashidhar
Copy link
Collaborator

Summary

This PR addresses several code quality issues identified during a comprehensive codebase review.

Changes

1. Fixed Unused _validate_mode() Function (HIGH priority)

  • File: yourbench/pipeline/question_generation/_core.py
  • The _validate_mode() function was defined but never used
  • The same validation logic was duplicated 3 times in run_single_shot(), run_multi_hop(), and run_cross_document()
  • Added _get_mode_from_config() wrapper and replaced all 3 duplicated blocks

2. Removed Dead Code _expand_env_in_dict() (MEDIUM priority)

  • File: yourbench/conf/schema.py
  • Removed function that was defined but never called anywhere in the codebase

3. Fixed Loguru Teardown Error (LOW priority - cosmetic)

  • File: yourbench/utils/inference/inference_tracking.py
  • Removed logger calls in _write_aggregate_log() atexit handler
  • This fixes the ValueError: I/O operation on closed file noise during test runs

4. Improved Prompt Load Error Handling (HIGH priority)

  • File: yourbench/conf/loader.py
  • Changed from generic except Exception with warning to:
    • except AttributeError: pass for expected disabled stages
    • except Exception: logger.error(...) for unexpected errors
  • Provides better visibility into actual problems vs expected behavior

Impact

  • Net change: -22 lines (16 added, 38 removed)
  • All 104 tests pass
  • Ruff checks pass
  • No breaking changes to public API

Testing

uv run pytest tests/ -q --tb=short  # 104 passed
uv run ruff check yourbench/        # All checks passed
uv run ruff format --check yourbench/ # All formatted

- Use existing _validate_mode() instead of duplicating validation 3x
- Remove unused _expand_env_in_dict() dead code
- Fix loguru teardown error in inference tracking
- Improve prompt load error handling with proper exception types

Net: -22 lines, all 104 tests pass
@codecov-commenter
Copy link

Codecov Report

❌ Patch coverage is 40.00000% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 52.94%. Comparing base (804782b) to head (caa3c4a).

Files with missing lines Patch % Lines
yourbench/conf/loader.py 0.00% 3 Missing ⚠️
yourbench/pipeline/question_generation/_core.py 66.66% 2 Missing ⚠️
yourbench/utils/inference/inference_tracking.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #197      +/-   ##
==========================================
+ Coverage   52.41%   52.94%   +0.53%     
==========================================
  Files          32       32              
  Lines        2852     2831      -21     
==========================================
+ Hits         1495     1499       +4     
+ Misses       1357     1332      -25     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…v.py

- Create centralized env.py with expand_env_value, expand_env_recursive, and validate_env_expanded functions
- Remove duplicate _expand_env from conf/schema.py (use shared utility)
- Update conf/loader.py to use expand_env_recursive
- Update utils/dataset_engine.py to use validate_env_expanded
- All 104 tests passing
…models

This makes config validation stricter - typos in config field names
will now raise ValidationError instead of being silently ignored.

All 14 Pydantic model classes updated.
Reduced inference_builders.py from 331 to 238 lines (28% reduction).

New helper functions:
- _builder_context(): Context manager for metrics and timing
- _log_builder_completion(): Unified logging for both builders
- _build_tags(): Common tag building logic
- _create_call(): Shared InferenceCall creation

Used dataclass field() for cleaner defaults in InferenceJob and BuilderMetrics.
When a model has base_url set (indicating a custom endpoint like OpenAI),
but api_key is not set, log a warning at model initialization time.

This gives early feedback rather than cryptic failures during inference.
@sumukshashidhar sumukshashidhar force-pushed the refactor/code-quality-improvements branch from c9b1c4b to 9a4a0a2 Compare December 28, 2025 04:01
- Add 14 tests for yourbench/utils/env.py module
- Cover expand_env_value, expand_env_recursive, validate_env_expanded
- Test edge cases: nested structures, missing vars, special syntax

Also update docs/CONFIGURATION.md to document strict validation
- Replace custom $VAR expansion with Python's native os.path.expandvars()
- Now supports both $VAR and ${VAR} syntax automatically
- Remove _resolve_hf_organization (unused special case)
- Remove _expand_env_vars wrapper in loader.py (inline call)
- Update tests for new behavior

env.py: 90 → 65 lines (-28%)
Merge related test cases into single test methods:
- test_env_utils: 13 → 5 tests
- test_schema_validation: 16 → 5 tests
- test_comb: 17 → 11 tests

Total: 117 → 91 tests (-22%)
@sumukshashidhar sumukshashidhar force-pushed the refactor/code-quality-improvements branch from cff6f86 to 14ae23b Compare December 28, 2025 04:33
- Remove aws_support_documentation (no data folder)
- Remove generate_questions_from_docs (overlaps with custom_prompts_demo)
- Remove mckinsey_global_report (overlaps with custom_prompts_demo)
- Rename childrens_books -> custom_prompts_demo
- Add shared example/prompts/ with reusable templates
- Update example/README.md with comparison table

Remaining examples:
- default_example: Quickstart with HuggingFace
- harry_potter_quizz: Comprehensive tutorial
- custom_prompts_demo: Custom prompt usage
- local_vllm_private_data: Self-hosted models
- rich_pdf_extraction_with_gemini: LLM ingestion
- Replace naive regex in _best_effort_json_extract() with balanced
  bracket parser that correctly handles nested JSON structures
- Add _is_valid_question_list() helper to filter non-question arrays
- Fixes multi-hop question parsing that was extracting partial JSON
Some models return questions wrapped in a dict like {"questions": [...]}
instead of a raw array. This adds handling for common wrapper keys:
- questions
- question_list
- qa_pairs
- pairs
- data

Also adds debug logging for parsing diagnostics.
@sumukshashidhar sumukshashidhar merged commit 61842d6 into main Dec 28, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants