feat: LCD-anchored confidence scoring algorithm (v2) #32
… policy
- T001: PolicyCriterion + PolicyDefinition Pydantic models with LCD metadata
- T002: Weighted LCD compliance scoring algorithm with bypass logic
- T003: Generic fallback policy builder for unsupported procedure codes
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- T001: PolicyCriterion + PolicyDefinition data models
- T003: Generic fallback policy for unsupported procedure codes
- T004: PolicyRegistry with resolve() and generic fallback
- T007: 5 seed policies (MRI Lumbar L34220, MRI Brain L37373, TKA L36575, PT L34049, ESI L39240)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- T005: Add optional policy_id and lcd_reference to PAFormResponse
- T006: Evidence extractor accepts PolicyDefinition, includes LCD context in prompts, parses confidence signals
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

# Conflicts:
#   apps/intelligence/src/models/policy.py

- T008: Form generator delegates scoring to ConfidenceScorer, includes policy metadata
  - generate_form_data now accepts PolicyDefinition instead of dict
  - Recommendation and confidence_score come from calculate_confidence()
  - PAFormResponse includes policy_id and lcd_reference fields
- T009: Analyze endpoint uses PolicyRegistry.resolve() instead of hardcoded policy
  - Removed SUPPORTED_PROCEDURE_CODES gate (unknown CPTs get generic policy)
  - Removed EXAMPLE_POLICY import and dead _build_field_mappings helper
  - evidence_extractor accepts PolicyDefinition | dict via _normalize_criteria()
- T010: All 34 tests pass
Prerequisite modules created:
- src/models/policy.py: PolicyCriterion, PolicyDefinition
- src/reasoning/confidence_scorer.py: ScoreResult, calculate_confidence
- src/policies/generic_policy.py: build_generic_policy
- src/policies/registry.py: PolicyRegistry with 5 seed LCD policies
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Caution: Review failed. The pull request is closed.
📝 Walkthrough
Introduces LCD-backed PolicyDefinition models, a PolicyRegistry with seeded policies and generic fallback, a weighted confidence scorer, and wiring to use registry-resolved policies in evidence extraction, scoring, and form generation; removes hard-coded CPT validation and related mapping helpers.
Sequence Diagram
sequenceDiagram
participant Client
participant AnalyzeAPI as Analyze API
participant Registry as PolicyRegistry
participant Extractor as EvidenceExtractor
participant LLM
participant Scorer as ConfidenceScorer
participant Generator as FormGenerator
Client->>AnalyzeAPI: POST /analyze (clinical_bundle, procedure_code)
AnalyzeAPI->>Registry: resolve(procedure_code)
Registry-->>AnalyzeAPI: PolicyDefinition
AnalyzeAPI->>Extractor: extract_evidence(clinical_bundle, policy)
Extractor->>LLM: evaluate_criterion (concurrent per criterion)
LLM-->>Extractor: assessment + confidence
Extractor-->>AnalyzeAPI: list[EvidenceItem]
AnalyzeAPI->>Scorer: calculate_confidence(evidence, policy)
Scorer-->>AnalyzeAPI: ScoreResult(score, recommendation)
AnalyzeAPI->>Generator: generate_form_data(clinical_bundle, evidence, policy)
Generator-->>AnalyzeAPI: PAFormResponse (includes policy_id, lcd_reference)
AnalyzeAPI-->>Client: PAFormResponse
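The resolve step in the diagram can be sketched roughly as below. This is an illustration under assumptions, not the PR's code: the dataclass stands in for the Pydantic PolicyDefinition, and the `generic-fallback` and `mri-lumbar` policy ids and the body of `build_generic_policy` are hypothetical. Only the names PolicyRegistry, register, resolve, build_generic_policy, and the L34220 CPT codes come from the PR text.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PolicyDefinition:
    """Stand-in for the PR's Pydantic model (fields assumed for illustration)."""
    policy_id: str
    procedure_codes: list[str]
    lcd_reference: str | None = None


def build_generic_policy(procedure_code: str) -> PolicyDefinition:
    # Hypothetical fallback: unsupported codes get a policy with no LCD reference.
    return PolicyDefinition(policy_id="generic-fallback", procedure_codes=[procedure_code])


class PolicyRegistry:
    def __init__(self) -> None:
        self._by_cpt: dict[str, PolicyDefinition] = {}

    def register(self, policy: PolicyDefinition) -> None:
        # Index the policy under every CPT code it covers.
        for cpt in policy.procedure_codes:
            self._by_cpt[cpt] = policy

    def resolve(self, procedure_code: str) -> PolicyDefinition:
        # Known CPT -> seeded policy; unknown CPT -> generic policy.
        policy = self._by_cpt.get(procedure_code)
        return policy if policy is not None else build_generic_policy(procedure_code)


registry = PolicyRegistry()
registry.register(PolicyDefinition("mri-lumbar", ["72148", "72149", "72158"], "L34220"))
```

Because resolve() always returns a policy, the analyze endpoint needs no separate "supported codes" gate, which matches the removal of SUPPORTED_PROCEDURE_CODES described in the commits.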
Estimated Code Review Effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 3 passed
Actionable comments posted: 6
🧹 Nitpick comments (4)
apps/intelligence/src/reasoning/confidence_scorer.py (1)
17-21: Consider using Pydantic BaseModel for consistency.
The codebase uses Pydantic for data validation (per coding guidelines). While dataclass works fine for this simple container, using BaseModel would maintain consistency and provide validation/serialization for free.
♻️ Optional: Convert to Pydantic model
-from dataclasses import dataclass
+from pydantic import BaseModel
-@dataclass
-class ScoreResult:
-    score: float
-    recommendation: Literal["APPROVE", "MANUAL_REVIEW", "NEED_INFO"]
+class ScoreResult(BaseModel):
+    score: float
+    recommendation: Literal["APPROVE", "MANUAL_REVIEW", "NEED_INFO"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/intelligence/src/reasoning/confidence_scorer.py` around lines 17 - 21, Replace the dataclass ScoreResult with a Pydantic BaseModel to align with project validation/serialization conventions: convert the class named ScoreResult to inherit from pydantic.BaseModel, keep the fields score: float and recommendation: Literal["APPROVE","MANUAL_REVIEW","NEED_INFO"], and add any needed type validators or field constraints (e.g., ensure score is within expected range) so serialization and validation behave like other models in the codebase.
apps/intelligence/src/tests/test_confidence_scorer.py (1)
8-9: Parameter id shadows builtin.
Minor style issue: id shadows the Python builtin. Consider renaming to criterion_id for clarity.
Rename parameter
-def _make_criterion(id: str, weight: float, required: bool = False, bypasses: list[str] | None = None) -> PolicyCriterion:
-    return PolicyCriterion(id=id, description=f"Test {id}", weight=weight, required=required, bypasses=bypasses or [])
+def _make_criterion(criterion_id: str, weight: float, required: bool = False, bypasses: list[str] | None = None) -> PolicyCriterion:
+    return PolicyCriterion(id=criterion_id, description=f"Test {criterion_id}", weight=weight, required=required, bypasses=bypasses or [])
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/intelligence/src/tests/test_confidence_scorer.py` around lines 8 - 9, The helper function _make_criterion currently uses parameter name id which shadows the built-in; rename the parameter to criterion_id (and update its type hint) and replace all uses inside the function (e.g., the PolicyCriterion(id=...) argument) and any test calls in this file to use criterion_id instead of id so the code no longer hides the builtin and remains clear; ensure the function signature and all references (calls to _make_criterion) are updated consistently.
apps/intelligence/src/policies/registry.py (1)
13-15: Consider warning on CPT collision during registration.
If two policies claim the same CPT code, the second silently overwrites the first. This is likely fine for the current seed-only use case, but could cause subtle bugs if policies are dynamically registered later.
Optional: Add collision detection
 def register(self, policy: PolicyDefinition) -> None:
     for cpt in policy.procedure_codes:
+        if cpt in self._by_cpt:
+            import logging
+            logging.warning(f"CPT {cpt} already registered; overwriting with {policy.policy_id}")
         self._by_cpt[cpt] = policy
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/intelligence/src/policies/registry.py` around lines 13 - 15, The register method currently overwrites existing entries in self._by_cpt when two PolicyDefinition objects share a procedure_codes value; update register(self, policy: PolicyDefinition) to detect collisions by checking "if cpt in self._by_cpt" before assignment and emit a warning (using the module logger or warnings.warn) that includes the CPT value and both the existing policy and the incoming policy identifiers (e.g., policy.name or repr(policy) and repr(self._by_cpt[cpt])) so developers can see which policy is being replaced, then continue to assign self._by_cpt[cpt] = policy.
apps/intelligence/src/models/policy.py (1)
6-14: Consider adding Field constraints for weight.
The comment indicates weight should be 0.0-1.0, but there's no validation. Malformed policy definitions could produce unexpected confidence scores.
Add Field validation
+from pydantic import BaseModel, Field
-from pydantic import BaseModel
 class PolicyCriterion(BaseModel):
     """A single criterion from a coverage policy."""
     id: str
     description: str
-    weight: float  # 0.0-1.0, clinical importance
+    weight: float = Field(ge=0.0, le=1.0, description="Clinical importance weight")
     required: bool = False  # Hard gate - if NOT_MET, caps score
     lcd_section: str | None = None  # e.g. "L34220 §4.2"
     bypasses: list[str] = []  # criterion IDs this one bypasses when MET
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/intelligence/src/models/policy.py` around lines 6 - 14, The PolicyCriterion model lacks validation for weight and uses a mutable default for bypasses; update the weight field in class PolicyCriterion to enforce 0.0-1.0 bounds using a Pydantic Field (e.g., Field(ge=0.0, le=1.0)) and change bypasses to use a safe default factory (e.g., list) instead of a literal empty list; import Field from pydantic if not already present so validation applies at model init.
def register_all_seeds(registry) -> None:
    for policy in ALL_SEED_POLICIES:
        registry.register(policy)
Add type annotation for registry parameter.
Per coding guidelines, all functions must have complete type annotations.
🔧 Add type hint
+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from src.policies.registry import PolicyRegistry
+
-def register_all_seeds(registry) -> None:
+def register_all_seeds(registry: "PolicyRegistry") -> None:
     for policy in ALL_SEED_POLICIES:
         registry.register(policy)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/intelligence/src/policies/seed/__init__.py` around lines 11 - 13, The
function register_all_seeds lacks a type annotation for its registry parameter;
update its signature to include the appropriate registry interface/type (the
object expected by register_all_seeds that exposes register), e.g., annotate
registry with the correct protocol or concrete class used in this module so
callers and linters know its type (refer to register_all_seeds,
ALL_SEED_POLICIES, and the registry.register usage to determine the proper type
to import and use).
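The "protocol" option mentioned in the prompt above might look like the following sketch, as an alternative to importing the concrete class under TYPE_CHECKING. The names `SupportsRegister` and `ListRegistry` are hypothetical, and `ALL_SEED_POLICIES` here is a placeholder list rather than the real seed data.

```python
from typing import Protocol


class SupportsRegister(Protocol):
    """Hypothetical structural type: anything exposing register()."""

    def register(self, policy: object) -> None: ...


# Placeholder for the real seed list defined in the seed package.
ALL_SEED_POLICIES: list[object] = ["policy-a", "policy-b"]


def register_all_seeds(registry: SupportsRegister) -> None:
    for policy in ALL_SEED_POLICIES:
        registry.register(policy)


class ListRegistry:
    """Toy registry used only to exercise the function."""

    def __init__(self) -> None:
        self.items: list[object] = []

    def register(self, policy: object) -> None:
        self.items.append(policy)


toy = ListRegistry()
register_all_seeds(toy)
```

A Protocol avoids the import cycle without string annotations, at the cost of a looser contract than naming PolicyRegistry directly.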
if isinstance(criterion, PolicyCriterion):
    criterion_id = criterion.id
    criterion_desc = criterion.description
    lcd_section = criterion.lcd_section
else:
    criterion_id = criterion.get("id", "unknown")
    criterion_desc = criterion.get("description", "")
    lcd_section = None
Preserve lcd_section for dict-based criteria.
Currently Line 41 forces lcd_section = None, so LCD context is dropped even if provided in dict form.
🛠️ Suggested fix
-    lcd_section = None
+    lcd_section = criterion.get("lcd_section")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/intelligence/src/reasoning/evidence_extractor.py` around lines 34 - 41,
The dict branch drops lcd_section by setting it to None; change it to read
lcd_section from the dict (e.g., lcd_section = criterion.get("lcd_section")) so
that any provided LCD context is preserved; update the code around the
PolicyCriterion check where criterion_id, criterion_desc, and lcd_section are
set to use criterion.get("id", "unknown"), criterion.get("description", ""), and
criterion.get("lcd_section") respectively (defaulting to None if absent).
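A self-contained sketch of the suggested behavior follows. The helper name `criterion_fields` is hypothetical, and the dataclass is a stand-in for the PR's Pydantic PolicyCriterion limited to the three fields used here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class PolicyCriterion:
    """Stand-in for the PR's model (only the fields needed for this example)."""
    id: str
    description: str
    lcd_section: str | None = None


def criterion_fields(
    criterion: PolicyCriterion | dict[str, Any],
) -> tuple[str, str, str | None]:
    """Return (id, description, lcd_section) for either input form."""
    if isinstance(criterion, PolicyCriterion):
        return criterion.id, criterion.description, criterion.lcd_section
    return (
        criterion.get("id", "unknown"),
        criterion.get("description", ""),
        criterion.get("lcd_section"),  # None when the key is absent
    )
```

With dict.get the dict branch behaves the same as before when no lcd_section key is present, so the change only affects callers that already supply one.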
has_not_met = any(e.status == "NOT_MET" for e in evidence)
has_unclear = any(e.status == "UNCLEAR" for e in evidence)
procedure_code = policy.procedure_codes[0] if policy.procedure_codes else "72148"
Avoid hardcoded CPT fallback in Line 46.
Defaulting to "72148" can mislabel non‑MRI policies when procedure_codes is empty. Prefer "Unknown" or raise/propagate a policy validation error.
🛠️ Suggested fix
-    procedure_code = policy.procedure_codes[0] if policy.procedure_codes else "72148"
+    procedure_code = policy.procedure_codes[0] if policy.procedure_codes else "Unknown"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/intelligence/src/reasoning/form_generator.py` at line 46, Replace the
hardcoded CPT fallback "72148" used when procedure_codes is empty: in the
assignment to procedure_code (variable procedure_code, referencing
policy.procedure_codes) either default to a neutral sentinel like "Unknown"
(procedure_code = policy.procedure_codes[0] if policy.procedure_codes else
"Unknown") or raise/propagate a validation exception (e.g., raise
PolicyValidationError("missing procedure_codes") or ValueError) so callers can
handle invalid policies; if you choose to raise, update any callers of the form
generation path to catch/handle that exception.
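The "raise" alternative from the prompt above could be sketched as a small helper. The name `primary_procedure_code` is hypothetical, and ValueError is used here in place of a project-specific exception such as the PolicyValidationError the prompt mentions.

```python
def primary_procedure_code(procedure_codes: list[str]) -> str:
    """Return the first CPT code, or raise if the policy defines none."""
    if not procedure_codes:
        # Fail loudly rather than mislabel the form with an unrelated CPT.
        raise ValueError("policy has no procedure_codes")
    return procedure_codes[0]
```

Raising keeps an invalid policy from silently producing a form, but it does require the caller on the form generation path to handle the exception.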
@@ -0,0 +1,60 @@
"""Tests for PAFormResponse model update."""
import pytest
Unused import.
pytest is imported but not used in this test module.
🧹 Remove unused import
 """Tests for PAFormResponse model update."""
-import pytest
 from src.models.pa_form import PAFormResponse
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/intelligence/src/tests/test_pa_form_model.py` at line 2, The test module
test_pa_form_model contains an unused top-level import "import pytest"; remove
that import line (or if you intended to use pytest fixtures/assertions, update
the tests to reference pytest) so the module no longer has an unused import;
look for the "import pytest" statement at the top of test_pa_form_model and
delete it.
@@ -0,0 +1,75 @@
"""Tests for policy data models."""
import pytest
Unused import.
pytest is imported but not used in this test module.
🧹 Remove unused import
 """Tests for policy data models."""
-import pytest
 from src.models.policy import PolicyCriterion, PolicyDefinition
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/intelligence/src/tests/test_policy_model.py` at line 2, Remove the
unused top-level import "pytest" from the test module (the import statement that
reads "import pytest"), leaving only the necessary imports for tests in
test_policy_model.py; if you expect to use pytest fixtures later, instead add a
TODO comment or reintroduce the import when needed.
def test_all_seed_cpts_resolve_to_lcd_policy():
    """All 14 seed CPT codes resolve to LCD-backed policies."""
    seed_cpts = [
        "72148", "72149", "72158",  # MRI Lumbar
        "70551", "70552", "70553",  # MRI Brain
        "27447",  # TKA
        "97161", "97162", "97163",  # Physical Therapy
        "62322", "62323",  # Epidural Steroid
    ]
    for cpt in seed_cpts:
        result = registry.resolve(cpt)
        assert result.lcd_reference is not None, f"CPT {cpt} should have LCD reference"
Docstring claims 14 CPTs, but list contains 12.
The docstring states "All 14 seed CPT codes" but the seed_cpts list only contains 12 codes. Either the docstring is stale or two CPTs are missing from the test coverage.
Proposed fix
 def test_all_seed_cpts_resolve_to_lcd_policy():
-    """All 14 seed CPT codes resolve to LCD-backed policies."""
+    """All 12 seed CPT codes resolve to LCD-backed policies."""
     seed_cpts = [
         "72148", "72149", "72158",  # MRI Lumbar
         "70551", "70552", "70553",  # MRI Brain
         "27447",  # TKA
         "97161", "97162", "97163",  # Physical Therapy
         "62322", "62323",  # Epidural Steroid
     ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/intelligence/src/tests/test_policy_registry.py` around lines 46 - 57,
The test test_all_seed_cpts_resolve_to_lcd_policy has a mismatched docstring
("All 14 seed CPT codes") versus the actual seed_cpts list (12 items); fix by
either updating the docstring to "All 12 seed CPT codes" or adding the two
missing CPT strings to the seed_cpts list so it truly contains 14 entries,
ensuring each CPT is validated via registry.resolve and the assert remains
unchanged.
GenerateIdAsync used string OrderByDescending which returned PA-DEMO-004 as max ID. int.TryParse failed on "DEMO-004", counter reset to 1, causing duplicate key PA-001. Fix filters to only PA-NNN IDs and finds true numeric max. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
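The failure mode in this commit message can be reproduced in a few lines. This is a Python analogue for illustration only, not the actual GenerateIdAsync code; the helper name `next_pa_id` and the three-digit zero padding are assumptions inferred from the IDs quoted above.

```python
import re

# Only IDs of the exact form PA-<digits> take part in numbering.
PA_ID = re.compile(r"PA-(\d+)")


def next_pa_id(existing_ids: list[str]) -> str:
    """Return the next PA-NNN id based on the true numeric maximum."""
    numbers = [int(m.group(1)) for i in existing_ids if (m := PA_ID.fullmatch(i))]
    next_number = max(numbers, default=0) + 1
    return f"PA-{next_number:03d}"


ids = ["PA-001", "PA-002", "PA-010", "PA-DEMO-004"]

# A string-ordered maximum picks the demo id, because "D" sorts after digits.
string_max = max(ids)
```

Here `string_max` is the demo id, which cannot be parsed as a number, whereas `next_pa_id(ids)` ignores it and continues from the highest numeric id.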
d98ede4 to
b687927
Summary
Replaces the naive 3-tier fixed confidence scoring (90%/70%/60%) with a weighted, LCD-anchored algorithm that evaluates individual policy criteria against extracted clinical evidence. Introduces a PolicyRegistry with 5 real CMS Local Coverage Determination seed policies and a generic fallback, enabling granular confidence scoring with bypass logic, hard gates, and empirically-derived recommendations.
Changes
Σ(weight × status × confidence) / Σ(weight × confidence), with hard gate penalties for required NOT_MET criteria and configurable floor/ceiling
Test Plan
Results: 93 tests pass | +1,266 / -163 lines | 23 files
Design: docs/designs/2026-02-22-confidence-algorithm-v2.md
Plan: docs/plans/2026-02-22-confidence-algorithm-v2.md
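For readers without access to the design doc, the weighted formula from the Changes section can be sketched as follows. Several values are assumptions that the PR text does not state: the numeric status mapping (MET=1.0, UNCLEAR=0.5, NOT_MET=0.0), the hard-gate cap of 0.5, and the 0.05/0.95 floor and ceiling. The `Criterion` and `Evidence` dataclasses and the `weighted_score` name are hypothetical stand-ins, not the PR's ConfidenceScorer.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumed numeric value per status; not taken from the PR.
STATUS_VALUE = {"MET": 1.0, "UNCLEAR": 0.5, "NOT_MET": 0.0}


@dataclass
class Criterion:
    id: str
    weight: float
    required: bool = False
    bypasses: list[str] = field(default_factory=list)


@dataclass
class Evidence:
    criterion_id: str
    status: str
    confidence: float


def weighted_score(
    criteria: list[Criterion],
    evidence: list[Evidence],
    gate_cap: float = 0.5,   # assumed cap when a required criterion is NOT_MET
    floor: float = 0.05,     # assumed floor
    ceiling: float = 0.95,   # assumed ceiling
) -> float:
    by_id = {e.criterion_id: e for e in evidence}
    # Criteria bypassed by another criterion that is MET drop out of both sums.
    bypassed = {
        target
        for c in criteria
        if c.id in by_id and by_id[c.id].status == "MET"
        for target in c.bypasses
    }
    num = den = 0.0
    gate_failed = False
    for c in criteria:
        e = by_id.get(c.id)
        if e is None or c.id in bypassed:
            continue
        num += c.weight * STATUS_VALUE[e.status] * e.confidence
        den += c.weight * e.confidence
        if c.required and e.status == "NOT_MET":
            gate_failed = True
    score = num / den if den > 0 else floor
    if gate_failed:
        score = min(score, gate_cap)  # hard gate penalty
    return max(floor, min(ceiling, score))
```

Weighting the denominator by confidence means a low-confidence assessment pulls the score less in either direction than a high-confidence one for the same criterion weight.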