refactor: clean up code smells and fix minor issues#197
Merged
sumukshashidhar merged 17 commits intomainfrom Dec 28, 2025
Merged
refactor: clean up code smells and fix minor issues#197sumukshashidhar merged 17 commits intomainfrom
sumukshashidhar merged 17 commits intomainfrom
Conversation
- Use existing _validate_mode() instead of duplicating validation 3x - Remove unused _expand_env_in_dict() dead code - Fix loguru teardown error in inference tracking - Improve prompt load error handling with proper exception types Net: -22 lines, all 104 tests pass
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #197 +/- ##
==========================================
+ Coverage 52.41% 52.94% +0.53%
==========================================
Files 32 32
Lines 2852 2831 -21
==========================================
+ Hits 1495 1499 +4
+ Misses 1357 1332 -25 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…v.py - Create centralized env.py with expand_env_value, expand_env_recursive, and validate_env_expanded functions - Remove duplicate _expand_env from conf/schema.py (use shared utility) - Update conf/loader.py to use expand_env_recursive - Update utils/dataset_engine.py to use validate_env_expanded - All 104 tests passing
…models This makes config validation stricter - typos in config field names will now raise ValidationError instead of being silently ignored. All 14 Pydantic model classes updated.
Reduced inference_builders.py from 331 to 238 lines (28% reduction). New helper functions: - _builder_context(): Context manager for metrics and timing - _log_builder_completion(): Unified logging for both builders - _build_tags(): Common tag building logic - _create_call(): Shared InferenceCall creation Used dataclass field() for cleaner defaults in InferenceJob and BuilderMetrics.
When a model has base_url set (indicating a custom endpoint like OpenAI), but api_key is not set, log a warning at model initialization time. This gives early feedback rather than cryptic failures during inference.
c9b1c4b to
9a4a0a2
Compare
- Add 14 tests for yourbench/utils/env.py module - Cover expand_env_value, expand_env_recursive, validate_env_expanded - Test edge cases: nested structures, missing vars, special syntax Also update docs/CONFIGURATION.md to document strict validation
- Replace custom $VAR expansion with Python's native os.path.expandvars()
- Now supports both $VAR and ${VAR} syntax automatically
- Remove _resolve_hf_organization (unused special case)
- Remove _expand_env_vars wrapper in loader.py (inline call)
- Update tests for new behavior
env.py: 90 → 65 lines (-28%)
Merge related test cases into single test methods: - test_env_utils: 13 → 5 tests - test_schema_validation: 16 → 5 tests - test_comb: 17 → 11 tests Total: 117 → 91 tests (-22%)
cff6f86 to
14ae23b
Compare
- Remove aws_support_documentation (no data folder) - Remove generate_questions_from_docs (overlaps with custom_prompts_demo) - Remove mckinsey_global_report (overlaps with custom_prompts_demo) - Rename childrens_books -> custom_prompts_demo - Add shared example/prompts/ with reusable templates - Update example/README.md with comparison table Remaining examples: - default_example: Quickstart with HuggingFace - harry_potter_quizz: Comprehensive tutorial - custom_prompts_demo: Custom prompt usage - local_vllm_private_data: Self-hosted models - rich_pdf_extraction_with_gemini: LLM ingestion
- Replace naive regex in _best_effort_json_extract() with balanced bracket parser that correctly handles nested JSON structures - Add _is_valid_question_list() helper to filter non-question arrays - Fixes multi-hop question parsing that was extracting partial JSON
Some models return questions wrapped in a dict like {"questions": [...]}
instead of a raw array. This adds handling for common wrapper keys:
- questions
- question_list
- qa_pairs
- pairs
- data
Also adds debug logging for parsing diagnostics.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR addresses several code quality issues identified during a comprehensive codebase review.
Changes
1. Fixed Unused
_validate_mode()Function (HIGH priority)yourbench/pipeline/question_generation/_core.py_validate_mode()function was defined but never usedrun_single_shot(),run_multi_hop(), andrun_cross_document()_get_mode_from_config()wrapper and replaced all 3 duplicated blocks2. Removed Dead Code
_expand_env_in_dict()(MEDIUM priority)yourbench/conf/schema.py3. Fixed Loguru Teardown Error (LOW priority - cosmetic)
yourbench/utils/inference/inference_tracking.py_write_aggregate_log()atexit handlerValueError: I/O operation on closed filenoise during test runs4. Improved Prompt Load Error Handling (HIGH priority)
yourbench/conf/loader.pyexcept Exceptionwith warning to:except AttributeError: passfor expected disabled stagesexcept Exception: logger.error(...)for unexpected errorsImpact
Testing