Integrate MeasureDefScorer Part 1: Foundation for Version-Agnostic Scoring #859

lukedegruchy · 2025-12-10T18:52:44Z

Merge Request Description

Summary

This MR implements Part 1 of a 2-part plan to integrate version-agnostic measure scoring into the clinical-reasoning evaluation workflow. Part 1 focuses on creating foundation infrastructure and a new external API for post-hoc measure scoring without modifying existing evaluation behavior. The primary deliverable is MeasureReportScoringFhirAdapter, a version-agnostic API that enables external consumers (specifically the cdr-cr project) to score MeasureReports that already have population counts but lack calculated scores.

- New External API: Created MeasureReportScoringFhirAdapter as a simplified, version-agnostic entry point for post-hoc MeasureReport scoring, eliminating the need for external consumers to manage version-specific scorers and builders.
- Score Copying Infrastructure: Added copyScoresFromDef() methods to R4 and DSTU3 MeasureReport builders that copy scores from Def objects to FHIR MeasureReports. This infrastructure is currently inactive (Def objects have null scores in Part 1) and will be activated in Part 2 when MeasureDefScorer is integrated into the evaluation workflow.
- Def Class Enhancements: Enhanced StratumDef with getMeasureScore() for improvement notation and createSnapshot() for immutable copies, plus added createSnapshot() to MeasureDef and other Def classes to support the Def capture framework.
- Version-Agnostic Testing Framework: Implemented a comprehensive fhir2deftest package with SelectedDef assertion APIs and adapters that enable writing measure evaluation tests once and running them against multiple FHIR versions (DSTU3, R4, future R5).
- Deprecation Documentation: Added javadoc notices to IMeasureReportScorer documenting the migration path to the new API while keeping all existing scorer functionality unchanged.
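
For reviewers unfamiliar with the improvement notation adjustment mentioned for getMeasureScore(), here is a minimal sketch of the rule as the commit messages in this PR describe it ("increase" returns the score as-is, "decrease" returns 1 - score). The class, enum, and method names are hypothetical and are not the actual GroupDef/StratumDef API.

```java
final class ImprovementNotationSketch {

    /** Hypothetical stand-in for the improvement notation a group carries. */
    enum Notation { INCREASE, DECREASE }

    /**
     * Returns the score to report for a raw score.
     * A null raw score (nothing computed yet) stays null.
     */
    static Double adjust(Double rawScore, Notation notation) {
        if (rawScore == null) {
            return null;
        }
        // "decrease" inverts the score; "increase" leaves it untouched.
        return notation == Notation.DECREASE ? 1.0 - rawScore : rawScore;
    }

    public static void main(String[] args) {
        System.out.println(adjust(0.25, Notation.INCREASE)); // 0.25
        System.out.println(adjust(0.25, Notation.DECREASE)); // 0.75
        System.out.println(adjust(null, Notation.DECREASE)); // null
    }
}
```

Keeping the null case explicit matters in Part 1, where Def objects are expected to carry null scores.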

Code Review Suggestions

- External API Design: Review MeasureReportScoringFhirAdapter and IMeasureReportScoringFhirAdapter interface design for usability by external consumers. Verify the static factory pattern and version detection logic align with clinical-reasoning API conventions.
- Score Copying Logic Correctness: In R4MeasureReportBuilder.copyScoresFromDef() and Dstu3MeasureReportBuilder.copyScoresFromDef(), verify the smart matching logic (positional for single groups/stratifiers, ID-based for multiple) correctly handles edge cases like missing IDs, mismatched group counts, and component stratifiers.
- FHIR Resource Mutation Safety: The score copying code uses patterns like if (!group.hasId()) to avoid mutating FHIR resources when checking for IDs. Verify these safe access patterns are consistently applied throughout the copying logic and no inadvertent side effects (like creating empty Coding objects via .getCodingFirstRep()) occur.
- Part 1 Isolation Guarantee: Verify that score copying infrastructure remains truly inactive in Part 1. Check that copyScoresFromDef() is called but does nothing because Def objects have null scores, and old scorers remain the sole source of scores in regular evaluation workflows.
- Def Snapshot Semantics: Review the createSnapshot() methods in GroupDef, StratumDef, and MeasureDef to ensure they create proper immutable copies. Verify that mutable score fields are copied correctly and defensive copying of collections is complete.
- Testing Framework Architecture: Examine the fhir2deftest package design (Fhir2DefUnifiedMeasureTestHandler, FhirVersionTestContext, MeasureServiceAdapter) for potential coupling issues or abstraction leaks that could make multi-version testing fragile.
- Migration Impact on cdr-cr: Consider whether the API changes to IMeasureReportScorer (javadoc deprecation only, no functional changes) adequately guide cdr-cr developers to the new MeasureReportScoringFhirAdapter API without breaking their existing code.
- DefCaptureCallback Integration: Review how DefCaptureCallback is integrated into R4MeasureProcessor and Dstu3MeasureProcessor. Verify callback invocation happens at the correct point (after evaluation, before builder) and the snapshot semantics are correct.
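
To make the "smart matching" review item concrete, the following is a rough sketch of positional-versus-ID matching under stated assumptions. It is not the code in copyScoresFromDef(): DefGroup, ReportGroup, and findDefGroup are hypothetical names, and the choice to return an empty result when an ID is missing is an assumption about how the edge case might be handled.

```java
import java.util.List;
import java.util.Optional;

final class GroupMatchingSketch {

    /** Hypothetical stand-in for a Def-side group; id may be null. */
    record DefGroup(String id, Double score) {}

    /** Hypothetical stand-in for a MeasureReport group; id may be null. */
    record ReportGroup(String id) {}

    static Optional<DefGroup> findDefGroup(
            ReportGroup target, List<ReportGroup> reportGroups, List<DefGroup> defGroups) {
        // Positional match: exactly one group on each side, so ids are not needed.
        if (reportGroups.size() == 1 && defGroups.size() == 1) {
            return Optional.of(defGroups.get(0));
        }
        // With several groups, a missing id cannot be matched safely (assumed behavior).
        if (target.id() == null) {
            return Optional.empty();
        }
        // ID-based match: order on either side does not matter.
        return defGroups.stream()
                .filter(def -> target.id().equals(def.id()))
                .findFirst();
    }

    public static void main(String[] args) {
        List<DefGroup> defs = List.of(new DefGroup("group-b", 0.8), new DefGroup("group-a", 0.6));
        List<ReportGroup> reports = List.of(new ReportGroup("group-a"), new ReportGroup("group-b"));
        for (ReportGroup report : reports) {
            System.out.println(report.id() + " -> "
                    + findDefGroup(report, reports, defs).map(DefGroup::score));
        }
    }
}
```

The mismatched-count case (one report group, two Def groups) falls through to the ID branch here, which is one reasonable reading of the review question but should be checked against the real builders.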

QA Test Suggestions

Setup

Prepare test data:
- R4 and DSTU3 Measure resources with proportion scoring
- Corresponding MeasureReports with population counts but no scores
- Measures with single and multiple groups
- Measures with stratifiers (both single-component and multi-component)

Test Cases

…for counts

This refactoring eliminates duplicate count calculations and establishes Def classes (MeasureDef, GroupDef, PopulationDef, StratumDef) as the authoritative source for all population counts in measure scoring.

Key Changes:
- Refactored R4MeasureReportScorer to read counts from Def classes instead of MeasureReport FHIR components
- Added getPopulationCount() methods to GroupDef and StratumDef for cleaner count retrieval
- Added helper methods to StratumDef: getStratumPopulation() and getPopulationCount()
- Simplified ContinuousVariableObservationConverter interface to only handle conversion to FHIR Quantities
- Removed ContinuousVariableObservationConverter parameter from entire ContinuousVariableObservationHandler call chain (no longer needed)
- Added convertCqlResultToQuantityDef() method to handle CQL result conversion to QuantityDef
- Updated exception types in convertCqlResultToQuantityDef() to use FHIR InvalidRequestException
- Refactored MeasureScorerTest to use record classes instead of nested HashMaps
- Fixed stratum scoring by properly handling CRITERIA-type stratifiers

Benefits:
- Eliminates redundant count calculations between Def classes and MeasureReport
- Improves FHIR version independence (R4, DSTU3, R5 can share common logic)
- Better separation of concerns: Def classes own count logic, converters only handle FHIR serialization
- Improved code maintainability and testability

All 931 tests passing in cqf-fhir-cr module.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

…-scorer-count-and-quantity-def-refactoring

Create a FHIR-version-agnostic measure scorer that uses Def classes as the single source of truth for both data and computed scores, eliminating the need for version-specific scoring logic duplication.

Architecture changes:
- Rename MeasureReportScorer interface to IMeasureReportScorer
- Create new MeasureDefScorer class (602 lines) with version-agnostic scoring logic that mutates Def objects instead of returning scores
- Convert StratumDef from record to class to support mutable score field
- Add score state (Double) with getters/setters to GroupDef and StratumDef

Key features:
- Def-first iteration pattern: uses MeasureDef.groups() and StratifierDef.getStratum() instead of FHIR object iteration
- Mutation-based design: scoreGroup() is void and sets scores directly on GroupDef/StratumDef objects via setScore()
- Enhanced switch expressions with early returns in aggregate() method
- GroupDef.getMeasureScore() handles improvement notation adjustment ("increase" returns score as-is, "decrease" returns 1 - score)
- Population basis support: correctly handles boolean (count subjects) vs non-boolean (count resources) scoring differences

Comprehensive test coverage (784 lines, 16 tests):
- Proportion, ratio, and continuous variable scoring
- Stratifier scoring with multiple strata
- Zero denominator edge cases
- All aggregate methods (SUM, AVG, MIN, MAX, MEDIAN, COUNT)
- Improvement notation adjustment (increase/decrease)
- Population basis impact (boolean vs Encounter counts)
- Null, zero, and negative score handling

Population basis is critical: boolean basis counts unique subjects (2/3 = 0.667) while Encounter basis counts all resources (5/9 = 0.556) for the same population data.

Files changed:
- New: IMeasureReportScorer.java (renamed interface)
- New: MeasureDefScorer.java (version-agnostic scorer)
- New: MeasureDefScorerTest.java (comprehensive unit tests)
- New: PRPs/version-agnostic-measure-report-scorer.md (PRP documentation)
- Modified: GroupDef.java (+54 lines: score field, getMeasureScore)
- Modified: StratumDef.java (+48 lines: record to class, score field)
- Modified: BaseMeasureReportScorer.java (interface rename)
- Modified: R4MeasureReportBuilder.java (interface rename)
- Modified: Dstu3MeasureReportBuilder.java (interface rename)
- Deleted: MeasureReportScorer.java (renamed to IMeasureReportScorer)

All 328 measure tests passing. No regressions.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
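
The population-basis figures quoted in this commit message (2/3 = 0.667 versus 5/9 = 0.556) can be reproduced with a small sketch. Everything here is illustrative: the map-of-lists data shape, the method names, and the sample data are assumptions chosen to yield those figures, and returning null for a zero denominator is an assumed convention rather than confirmed MeasureDefScorer behavior.

```java
import java.util.List;
import java.util.Map;

final class PopulationBasisSketch {

    /** Counts a population: unique subjects for boolean basis, every resource otherwise. */
    static int count(Map<String, List<String>> resourcesBySubject, boolean booleanBasis) {
        if (booleanBasis) {
            return resourcesBySubject.size();
        }
        return resourcesBySubject.values().stream().mapToInt(List::size).sum();
    }

    /** Proportion score; null for a zero denominator (assumed convention). */
    static Double proportion(int numerator, int denominator) {
        if (denominator == 0) {
            return null;
        }
        return (double) numerator / denominator;
    }

    public static void main(String[] args) {
        // Hypothetical data: 3 subjects with 9 encounters in the denominator,
        // 2 of those subjects with 5 encounters in the numerator.
        Map<String, List<String>> denominator = Map.of(
                "Patient/1", List.of("e1", "e2", "e3"),
                "Patient/2", List.of("e4", "e5", "e6"),
                "Patient/3", List.of("e7", "e8", "e9"));
        Map<String, List<String>> numerator = Map.of(
                "Patient/1", List.of("e1", "e2", "e3"),
                "Patient/2", List.of("e4", "e5"));

        // Boolean basis: 2 subjects / 3 subjects.
        System.out.printf("%.3f%n", proportion(count(numerator, true), count(denominator, true)));
        // Encounter basis: 5 resources / 9 resources.
        System.out.printf("%.3f%n", proportion(count(numerator, false), count(denominator, false)));
    }
}
```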

…Def class design

This commit completes comprehensive test coverage for MeasureDefScorer and addresses implementation gaps discovered through test-driven development.

Implementation Changes:
- Add scoreRatioMeasureObservationGroup() method for group-level ratio with observations scoring (lines 153-199)
- Fix critical stratum filtering bug in getResultsForStratum() - now correctly filters at Map.Entry level (subject ID) instead of observation Map keys
- Implement 4 new comprehensive tests for ratio with observations, cohort measures, and edge cases

Def Class Enhancements:
- Replace String ID with PopulationDef reference in StratumPopulationDef for better type safety and elimination of ID-based lookups
- Rename GroupDef.get() to getPopulationDefs() for clarity
- Add GroupDef.getFirstWithId() helper method for criteriaReference matching
- Enhance PopulationDef.toString() with comprehensive debugging information
- Update StratumPopulationDef.toString() to use PopulationDef's toString()

Test Coverage:
- testScoreGroup_RatioWithObservations_GroupLevel: Group-level ratio scoring
- testScoreStratifier_RatioWithObservations_StratumLevel: Stratum-level ratio scoring with gender stratification
- testScoreGroup_CohortMeasure_NoScoreSet: Verify cohort measures return null
- testScoreGroup_MissingScoringType_ThrowsException: Validation error handling
- Remove unused createMeasurePopulationConcept(type, criteria) helper method

Documentation:
- Add "Implementation Outcomes and Lessons Learned" section to version-agnostic PRP documenting actual implementation results
- Update measure-def-scorer-test-coverage-enhancement.md to reflect completed work
- Document data structure insights and R4 pattern reuse

All Tests: 954 passing (20 in MeasureDefScorerTest), 0 failures, 0 errors

This work represents Phase 1 completion of the version-agnostic measure scorer implementation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
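
The stratum filtering fix described in this commit (filter on the outer Map.Entry key, the subject ID, rather than on the inner observation map's keys) might look roughly like the sketch below. The data shape is an assumption: the PR text does not show the real types used by getResultsForStratum(), so the nested map and all names here are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

final class StratumFilterSketch {

    /**
     * Keeps only the per-subject observation maps whose subject ID belongs to the stratum.
     * Outer key: subject ID. Inner map: observation key to observed value (assumed shape).
     */
    static Map<String, Map<String, Double>> resultsForStratum(
            Map<String, Map<String, Double>> observationsBySubject, Set<String> stratumSubjectIds) {
        return observationsBySubject.entrySet().stream()
                // Filter on the entry key (subject ID), not on keys of the inner map.
                .filter(entry -> stratumSubjectIds.contains(entry.getKey()))
                .collect(Collectors.toMap(
                        Map.Entry::getKey, Map.Entry::getValue, (first, second) -> first, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> all = new LinkedHashMap<>();
        all.put("Patient/1", Map.of("Encounter/1", 30.0));
        all.put("Patient/2", Map.of("Encounter/2", 45.0));
        all.put("Patient/3", Map.of("Encounter/3", 60.0));

        // Only subjects in the stratum survive; their observations are kept untouched.
        System.out.println(resultsForStratum(all, Set.of("Patient/1", "Patient/3")).keySet());
    }
}
```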

…-agnostic-measure-report-scorer

This commit completes the implementation of a version-agnostic test framework for capturing and asserting on MeasureDef state across FHIR versions (DSTU3, R4).

## Core Infrastructure (Phase 1)

### Def Snapshot API
- Add DefCaptureCallback interface for capturing MeasureDef snapshots
- Implement createSnapshot() methods on all Def classes:
  - MeasureDef, GroupDef, PopulationDef, StratifierDef, StratumDef
  - StratifierComponentDef, SdeDef
  - Deep copy collections, share FHIR resource references
- Add defCaptureCallback field to MeasureEvaluationOptions

### PopulationDef Refactoring
- Add populationBasis field to PopulationDef (moved from GroupDef dependency)
- Add getPopulationBasis() and isBooleanBasis() methods with Javadoc
- Streamline getCount() to use own populationBasis (no GroupDef parameter)
- Update createSnapshot() to include populationBasis field

### Processor Integration
- Add callback invocation to R4MeasureProcessor (after processResults)
- Add callback invocation to Dstu3MeasureProcessor (after processResults)
- Update MeasureDefBuilder classes to pass populationBasis to PopulationDef

## Version-Agnostic Test Framework (Phases 2-6)

### Test Infrastructure (21 new files in fhir2deftest/)
- Fhir2DefUnifiedMeasureTestHandler: Unified Given/When/Then DSL
- FhirVersionTestContext: Auto-detect FHIR version, factory for adapters
- MeasureServiceAdapter interface: Version-agnostic evaluation API
- Request/Response wrappers: MeasureEvaluationRequest, MeasureReportAdapter

### Version-Specific Adapters
- R4MeasureServiceAdapter: R4 single/multi-measure support
- R4MeasureReportAdapter, R4MultiMeasureReportAdapter
- Dstu3MeasureServiceAdapter: DSTU3 single-measure support
- Dstu3MeasureReportAdapter

### Fluent Assertion Classes (6 classes)
- Selected: Base class for fluent navigation
- SelectedDef: MeasureDef assertions and navigation
- SelectedDefGroup: GroupDef assertions, population/stratifier navigation
- SelectedDefPopulation: Subject/resource assertions, count validation
- SelectedDefStratifier: Stratum navigation and assertions
- SelectedDefStratum: Stratum population navigation
- SelectedDefStratumPopulation: Stratum-level count assertions

### Repository Configuration
- Add in-memory filtering to Fhir2DefUnifiedMeasureTestHandler:
  - SEARCH_FILTER_MODE.FILTER_IN_MEMORY
  - TERMINOLOGY_FILTER_MODE.FILTER_IN_MEMORY
  - VALUESET_EXPANSION_MODE.PERFORM_NAIVE_EXPANSION
- Matches DSTU3/R4 Measure class configuration

### Integration Tests
- ContinuousVariableResourceMeasureObservationFhir2DefTest (R4)
  - Tests continuous variable measure with MEASUREOBSERVATION
  - Validates Def capture, population counts, stratifiers
- Dstu3MeasureAdditionalDataFhir2DefTest (DSTU3)
  - Tests measure evaluation with additional data Bundle
  - Validates Def capture and population counts

## Enhanced Test Coverage

### PopulationDefTest (10 tests, was 3)
- Add isBooleanBasis() assertions to all existing tests
- Add 7 new tests covering different population basis types:
  - testIsBooleanBasis_WithBooleanBasis/WithNonBooleanBasis
  - testGetCount_BooleanBasis_CountsUniqueSubjects
  - testGetCount_EncounterBasis_CountsAllResources
  - testGetCount_StringBasis_CountsAllResources
  - testGetCount_DateBasis_CountsAllResources
  - testGetCount_MeasureObservation_CountsObservations

### MeasureDefScorerTest
- Fix population basis types to match master (source of truth):
  - testScoreGroup_SetsScoreOnGroupDef: boolean -> Encounter
  - testScoreGroup_ProportionWithExclusions: boolean -> String
- Ensure same CodeDef instance used for PopulationDef and GroupDef

## Code Cleanup

### SelectedDef Classes
- Remove Arrays.asList() wrappers, iterate directly over varargs
- Fix SelectedDefPopulation.hasType() to compare with MeasurePopulationType.toCode()
- Clean up unused imports

### Test Files
- Remove 6 framework-specific tests from Dstu3MeasureAdditionalDataFhir2DefTest
- Add Map import to PopulationDefTest, use non-qualified references

### Javadoc
- Add comprehensive Javadoc for PopulationDef.getPopulationBasis()
- Add comprehensive Javadoc for PopulationDef.isBooleanBasis()
- All existing Javadoc verified for accuracy

## Testing
- All 974 tests passing (13 skipped)
- R4 and DSTU3 Def capture working correctly
- Unified DSL working for both FHIR versions

## Documentation
- Update version-agnostic-def-capture-framework.md PRP:
  - Mark all phases as COMPLETE ✅
  - Add Implementation Summary section
  - Document final statistics and achievements
  - Note deviations from original plan

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
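
As a rough illustration of the snapshot semantics this commit describes (deep copy collections, share FHIR resource references, hand the snapshot to a capture callback), consider the sketch below. PopulationSketch and CaptureCallback are hypothetical; the real DefCaptureCallback signature and the real Def classes are not shown in this PR text.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SnapshotSketch {

    /** Hypothetical callback shape, standing in for DefCaptureCallback. */
    interface CaptureCallback {
        void onCapture(PopulationSketch snapshot);
    }

    /** Hypothetical, simplified population: one shared reference plus a subject collection. */
    static final class PopulationSketch {
        final Object sharedDefinition; // stands in for a FHIR resource reference
        final Set<String> subjects;

        PopulationSketch(Object sharedDefinition, Set<String> subjects) {
            this.sharedDefinition = sharedDefinition;
            this.subjects = subjects;
        }

        PopulationSketch createSnapshot() {
            // Copy the collection (unmodifiable), but share the resource reference.
            return new PopulationSketch(sharedDefinition, Set.copyOf(subjects));
        }
    }

    public static void main(String[] args) {
        List<PopulationSketch> captured = new ArrayList<>();
        CaptureCallback callback = captured::add;

        PopulationSketch live = new PopulationSketch(new Object(), new HashSet<>(Set.of("Patient/1")));
        callback.onCapture(live.createSnapshot());
        live.subjects.add("Patient/2"); // later mutation must not leak into the snapshot

        System.out.println(captured.get(0).subjects.size()); // 1
        System.out.println(live.subjects.size());            // 2
    }
}
```

Sharing the resource reference keeps snapshots cheap, at the cost that any later mutation of the shared FHIR object would still be visible through the snapshot.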

…-def-test-suite

…oring

This commit implements Part 1 of the MeasureDefScorer integration plan, establishing foundation components for version-agnostic measure scoring and creating a new external API for the cdr-cr project. No behavioral changes to existing measure evaluation workflows.

## Part 1 Scope: Foundation & Refactoring (No Behavioral Changes)

This is Part 1 of a 2-part integration. Part 1 creates infrastructure without changing existing behavior. Part 2 will integrate MeasureDefScorer into the evaluation workflow.

### Key Constraint
Old R4/DSTU3 scorers remain active and unchanged during regular measure evaluation. New scoring infrastructure only activated via MeasureReportScoringFhirAdapter API.

## Changes Implemented

### 1. StratumDef Enhancement
- Added getMeasureScore() method for applying improvement notation to stratifiers
- Added createSnapshot() method for creating immutable copies with score state
- Stratifiers inherit improvement notation from parent group (per FHIR spec)

### 2. MeasureDefScorer Refinements
- Added COHORT and COMPOSITE measure scoring support (returns null scores per spec)
- Enhanced javadoc and method documentation
- All scoring logic operates on Def classes (version-agnostic)

### 3. Score Copying Infrastructure in Builders (Inactive in Part 1)
- **R4MeasureReportBuilder**: Added copyScoresFromDef() method
  - Smart matching: positional for single groups/stratifiers, ID-based for multiple
  - Safe FHIR access patterns prevent side effects
  - Currently inactive (Def objects have null scores in Part 1)
- **Dstu3MeasureReportBuilder**: Added copyScoresFromDef() method
  - Same smart matching and safe access patterns as R4
  - Currently inactive (Def objects have null scores in Part 1)

### 4. External API for cdr-cr Project (NEW)
Created MeasureReportScoringFhirAdapter - version-agnostic post-hoc scoring API:
- **MeasureReportScoringFhirAdapter**: Static factory entry point
  - Auto-detects FHIR version, routes to appropriate adapter
  - Validates measure/report compatibility
- **IMeasureReportScoringFhirAdapter**: Version-agnostic interface
  - Default workflow: build MeasureDef → populate counts → score → copy back
  - Abstract methods for version-specific FHIR operations
- **R4MeasureReportScoringFhirAdapter**: R4 implementation
  - Builds MeasureDef from Measure structure
  - Populates counts from MeasureReport
  - Uses MeasureDefScorer for version-agnostic scoring
  - Copies scores back to MeasureReport
- **Dstu3MeasureReportScoringFhirAdapter**: DSTU3 implementation
  - Same workflow as R4, adapted for DSTU3 types

### 5. Deprecation Documentation
- Added javadoc deprecation notices to IMeasureReportScorer interface
- Documented migration path for external consumers
- Old scorers remain fully functional (no runtime warnings)

### 6. Comprehensive Testing
- **MeasureReportScoringFhirAdapterTest**: Tests external API
  - R4 and DSTU3 proportion measure scoring
  - Actual score assertions (0.75, 0.21, 0.80, 0.60)
  - Version mismatch detection
  - Null validation
- **Dstu3MeasureReportBuilderTest**: Tests score copying infrastructure (DSTU3)
  - Verifies copyScoresFromDef() integration point
  - Currently no scores to copy (Part 1 foundation only)
- **R4MeasureReportBuilderTest**: Tests score copying infrastructure (R4)
  - Verifies copyScoresFromDef() integration point
  - Currently no scores to copy (Part 1 foundation only)

## Architecture Benefits
1. **External API Ready**: cdr-cr project can immediately use MeasureReportScoringFhirAdapter
2. **Zero Risk**: No changes to existing evaluation behavior
3. **Version Independence**: New API works with R4 and DSTU3, ready for R5
4. **Safe Refactoring**: Score copying infrastructure in place for Part 2 activation
5. **Backward Compatible**: Old scorers continue working exactly as before

## Part 2 Preview
Part 2 will:
- Activate MeasureDefScorer in MeasureEvaluationResultHandler
- Remove old scorer calls from builders (switch to copyScoresFromDef)
- Add Fhir2Def integration tests
- Complete the transition to version-agnostic scoring

## Testing
- All 987 tests passing (0 failures, 0 errors, 13 skipped)
- Added 513 lines of new test coverage
- Comprehensive unit tests for new external API
- Score copying infrastructure validated (ready for Part 2)

## Statistics
- 19 files changed: +5092 insertions, -19 deletions
- 3 PRP documents (master plan + Part 1 + Part 2)
- No breaking changes to public APIs
- All existing tests maintained and passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
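
The adapter design described in this commit (a static factory that routes by FHIR version, plus an interface whose default method runs build → populate counts → score → copy back) can be sketched in miniature as follows. This is a structural illustration only: every type and method name is hypothetical, Measure and MeasureReport are collapsed into one ReportSketch, proportion scoring is inlined where the real code would call MeasureDefScorer, and none of the signatures reflect the actual MeasureReportScoringFhirAdapter API.

```java
final class ScoringAdapterSketch {

    /** Hypothetical report: counts already present, score missing until scored. */
    static final class ReportSketch {
        final String fhirVersion;
        final int numerator;
        final int denominator;
        Double measureScore;

        ReportSketch(String fhirVersion, int numerator, int denominator) {
            this.fhirVersion = fhirVersion;
            this.numerator = numerator;
            this.denominator = denominator;
        }
    }

    /** Hypothetical, simplified Def: counts in, score out. */
    static final class DefSketch {
        int numerator;
        int denominator;
        Double score;
    }

    interface Adapter {
        // Version-specific steps, implemented per FHIR version.
        DefSketch buildDef(ReportSketch report);
        void populateCounts(DefSketch def, ReportSketch report);
        void copyScores(DefSketch def, ReportSketch report);

        // Shared workflow: build, populate counts, score, copy back.
        default void score(ReportSketch report) {
            DefSketch def = buildDef(report);
            populateCounts(def, report);
            if (def.denominator != 0) { // score stays null for a zero denominator (assumed)
                def.score = (double) def.numerator / def.denominator;
            }
            copyScores(def, report);
        }
    }

    static final class BasicAdapter implements Adapter {
        @Override public DefSketch buildDef(ReportSketch report) { return new DefSketch(); }
        @Override public void populateCounts(DefSketch def, ReportSketch report) {
            def.numerator = report.numerator;
            def.denominator = report.denominator;
        }
        @Override public void copyScores(DefSketch def, ReportSketch report) {
            report.measureScore = def.score;
        }
    }

    /** Static factory: picks an adapter from the detected version string. */
    static Adapter forVersion(String fhirVersion) {
        return switch (fhirVersion) {
            case "R4", "DSTU3" -> new BasicAdapter();
            default -> throw new IllegalArgumentException("Unsupported FHIR version: " + fhirVersion);
        };
    }

    static void score(ReportSketch report) {
        forVersion(report.fhirVersion).score(report);
    }

    public static void main(String[] args) {
        ReportSketch report = new ReportSketch("R4", 3, 4);
        score(report);
        System.out.println(report.measureScore); // 0.75
    }
}
```

A caller only touches the static score entry point; the version-specific pieces stay behind the interface, which is the property the review item on external API design asks reviewers to check.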

github-actions · 2025-12-10T18:53:31Z

Formatting check succeeded!

codecov · 2025-12-10T19:03:27Z

Codecov Report

❌ Patch coverage is 60.94183% with 141 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.53%. Comparing base (0f85487) to head (e66cce2).
⚠️ Report is 2 commits behind head on master.

Files with missing lines	Patch %	Lines
...ir/cr/measure/dstu3/Dstu3MeasureReportBuilder.java	24.13%	41 Missing and 3 partials ⚠️
...cqf/fhir/cr/measure/r4/R4MeasureReportBuilder.java	61.17%	14 Missing and 19 partials ⚠️
...e/common/Dstu3MeasureReportScoringFhirAdapter.java	58.92%	19 Missing and 4 partials ⚠️
...sure/common/R4MeasureReportScoringFhirAdapter.java	58.92%	19 Missing and 4 partials ⚠️
...asure/common/IMeasureReportScoringFhirAdapter.java	66.66%	2 Missing and 1 partial ⚠️
...easure/common/MeasureReportScoringFhirAdapter.java	85.71%	2 Missing and 1 partial ⚠️
...opencds/cqf/fhir/cr/measure/common/StratumDef.java	66.66%	3 Missing ⚠️
...ncds/cqf/fhir/cr/measure/common/StratifierDef.java	77.77%	1 Missing and 1 partial ⚠️
...cds/cqf/fhir/cr/measure/r4/R4MeasureProcessor.java	50.00%	1 Missing and 1 partial ⚠️
...s/cqf/fhir/cr/measure/common/MeasureDefScorer.java	50.00%	1 Missing ⚠️
... and 4 more

Additional details and impacted files

@@             Coverage Diff              @@
##             master     #859      +/-   ##
============================================
- Coverage     73.68%   73.53%   -0.15%     
  Complexity      277      277              
============================================
  Files           576      580       +4     
  Lines         27159    27513     +354     
  Branches       3459     3518      +59     
============================================
+ Hits          20011    20233     +222     
- Misses         5432     5522      +90     
- Partials       1716     1758      +42


lukedegruchy and others added 8 commits December 2, 2025 16:46

Merge remote-tracking branch 'origin/master' into ld-20121202-measure…

f1ca2c4

…-scorer-count-and-quantity-def-refactoring

Merge remote-tracking branch 'origin/master' into ld-20251203-version…

dd990af

…-agnostic-measure-report-scorer

Merge remote-tracking branch 'origin/master' into ld-20251203-fhir-to…

85d4d27

…-def-test-suite

lukedegruchy changed the title ~~Ld 20251208 integrate new measure scorer~~ Integrate new FHIR-version agnostic measure scoring API from clinical-reasoning Dec 10, 2025

lukedegruchy changed the title ~~Integrate new FHIR-version agnostic measure scoring API from clinical-reasoning~~ Integrate MeasureDefScorer Part 1: Foundation for Version-Agnostic Scoring Dec 10, 2025
