From 7e380a23dfcbb56291a7b78df9675ac3135ec67f Mon Sep 17 00:00:00 2001 From: Chris Sewell Date: Mon, 2 Feb 2026 18:08:23 +0100 Subject: [PATCH 1/3] =?UTF-8?q?=F0=9F=94=A7=20Refactor=20`populate=5Ffield?= =?UTF-8?q?=5Ftype`=20to=20use=20type-directed=20schema=20walking?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This PR simplifies the `populate_field_type` function by replacing runtime keyword detection with type-directed traversal of the `SchemasRootType` structure. ## Changes ### Refactored `populate_field_type` (~190 → ~130 lines) **Before:** Generic recursive function operating on `Any`, detecting JSON schema keywords (`properties`, `items`, `contains`, etc.) at runtime to infer structure, with special-case handling for network context. **After:** Type-directed implementation that surgically walks the known typed structure: - `SchemasRootType` → `select` / `validate` - `ValidateSchemaType` → `local` / `network` - `NeedFieldsSchemaType` → `properties` / `allOf` - `ResolvedLinkSchemaType` → recursive `items` / `contains` ### Key improvements 1. **Clearer structure** — 5 focused helper functions with single responsibilities: - `_process_validate_schema` - `_process_resolved_link` - `_process_need_fields_schema` - `_inject_field_type` - `_inject_type_and_item_type` 2. **Eliminated ambiguity** — No more `is_inside_network` path checks; the type structure dictates context 3. **Defensive handling** — All field accesses guarded with `isinstance()` checks; blanket exception handler wraps unexpected errors with context 4. 
**Unified field lookup** — Single pattern for extra/link/core fields with fallback to `NeedsCoreFields` ### Added tests New test file `tests/schema/test_populate_field_type.py` with 32 tests covering: - Type injection for extra fields, link fields, and core fields - Array type injection with `items`/`contains` - Recursive processing of `network` and `allOf` - Error handling for unknown fields, type mismatches, and malformed schemas - Defensive behavior for missing/invalid structure ## Breaking changes None. All existing tests pass. Error messages for malformed schemas may differ slightly in wording but still provide equivalent information. --- docs/schema/local_validation_cache.md | 294 ++++++++ sphinx_needs/schema/config_utils.py | 412 ++++++----- tests/schema/__snapshots__/test_schema.ambr | 12 +- tests/schema/test_populate_field_type.py | 728 ++++++++++++++++++++ 4 files changed, 1264 insertions(+), 182 deletions(-) create mode 100644 docs/schema/local_validation_cache.md create mode 100644 tests/schema/test_populate_field_type.py diff --git a/docs/schema/local_validation_cache.md b/docs/schema/local_validation_cache.md new file mode 100644 index 000000000..32c06bad1 --- /dev/null +++ b/docs/schema/local_validation_cache.md @@ -0,0 +1,294 @@ +# Local Validation Caching System + +This document explains the local validation caching system implemented in `sphinx_needs/schema/core.py` to optimize schema validation performance. + +## Overview + +The caching system avoids redundant JSON schema validation when the same need is validated against the same schema multiple times. This is common in network validation where linked needs are traversed repeatedly from different starting points. 
+ +## Problem Statement + +In network validation, the same need can be validated multiple times: + +``` +Need REQ_002 links to REQ_001 +Need REQ_003 links to REQ_001 + + ┌────────┐ links ┌────────┐ + │REQ_002 │──────────────►│REQ_001 │ + └────────┘ └────────┘ + ▲ + ┌────────┐ links │ + │REQ_003 │────────────────────┘ + └────────┘ + +Validation order (without cache): +1. Validate REQ_002 + └── Network: validate REQ_001 (expensive!) + +2. Validate REQ_003 + └── Network: validate REQ_001 (same validation repeated!) +``` + +## Solution: Higher-Level Caching + +Rather than restructuring the recursive algorithm (which would change warning attribution), we cache at a higher level: the raw validation results before context-dependent data is added. + +## Data Structures + +### CachedLocalResult + +Stores the raw validation outcome without context-dependent data: + +```python +@dataclass(slots=True) +class CachedLocalResult: + reduced_need: dict[str, Any] # The reduced need that was validated + errors: tuple[ValidationError, ...] 
# Validation errors (empty if passed) + schema_error: str | None = None # Schema compilation error, if any + + @property + def success(self) -> bool: + return not self.errors and self.schema_error is None +``` + +### LocalValidationCache + +A cache keyed by `(need_id, schema_key)`: + +```python +@dataclass(slots=True) +class LocalValidationCache: + _cache: dict[tuple[str, tuple[str, ...]], CachedLocalResult] + + def get(self, need_id: str, schema_key: tuple[str, ...]) -> CachedLocalResult | None + def store(self, need_id: str, schema_key: tuple[str, ...], result: CachedLocalResult) -> None + def invalidate(self, need_id: str) -> None # Remove all entries for a need + def clear(self) -> None +``` + +## Cache Key Structure + +``` +schema_key = tuple of schema path components + +Examples: + ("requirement_schema", "local") + ("requirement_schema", "validate", "network", "implements", "items", "local") + ("specification_schema", "local") + +Combined with need_id for full cache key: + ("REQ_001", ("requirement_schema", "local")) + ("REQ_001", ("requirement_schema", "validate", "network", ...)) +``` + +## Validation Flow + +### Without Cache + +``` +validate_type_schema(needs, schema) + │ + ▼ + ┌───────────────┐ + │ For each need │ + └───────────────┘ + │ + ▼ +recurse_validate_schemas(need, ...) + │ + ├──► LOCAL: get_ontology_warnings() + │ │ + │ ▼ + │ ┌─────────────────────────┐ + │ │ 1. reduce_need() │ ◄── Expensive + │ │ 2. validator.validate() │ ◄── Expensive + │ │ 3. Build warnings │ + │ └─────────────────────────┘ + │ + └──► NETWORK: For each link_type + │ + ▼ + For each target_need_id + │ + ▼ + recurse_validate_schemas(target_need, ...) ◄── Same need may be + │ validated multiple + ▼ times! 
+ (repeat LOCAL + NETWORK) +``` + +### With Cache + +``` +validate_type_schema(needs, schema) + │ + ▼ + ┌─────────────────────────────┐ + │ Create LocalValidationCache │ + │ Create validator_cache │ + └─────────────────────────────┘ + │ + ▼ + ┌───────────────┐ + │ For each need │ + └───────────────┘ + │ + ▼ +recurse_validate_schemas(need, ..., local_cache) + │ + ├──► LOCAL validation: + │ │ + │ ▼ + │ ┌──────────────────────────────────┐ + │ │ cache_key = (schema_path, "local")│ + │ │ cached = local_cache.get( │ + │ │ need_id, cache_key) │ + │ └──────────────────────────────────┘ + │ │ + │ ┌──────┴──────┐ + │ │ │ + │ ▼ ▼ + │ CACHE HIT CACHE MISS + │ │ │ + │ ▼ ▼ + │ ┌────────┐ ┌─────────────────────┐ + │ │Rebuild │ │ 1. reduce_need() │ + │ │warnings│ │ 2. validate() │ + │ │from │ │ 3. Cache result │ + │ │cached │ │ 4. Build warnings │ + │ │result │ └─────────────────────┘ + │ └────────┘ + │ + └──► NETWORK: (unchanged, passes local_cache down) +``` + +## What Gets Cached vs. Rebuilt + +### Cached (Context-Independent) + +| Field | Description | +|-------|-------------| +| `reduced_need` | The need dict after field reduction | +| `errors` | Tuple of `ValidationError` objects | +| `schema_error` | Schema compilation error message | + +### Rebuilt (Context-Dependent) + +| Field | Description | +|-------|-------------| +| `need_path` | Path like `["REQ_002", "implements", "REQ_001"]` | +| `schema_path` | Path like `["my_schema", "local", "properties"]` | +| `user_message` | Only at `recurse_level == 0` | +| `user_severity` | Only at `recurse_level == 0` | +| `rule` | `local_fail` vs `network_local_fail` | + +This separation is key: the expensive validation is cached, but warnings are rebuilt with the correct context for each call site. 
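The cached/rebuilt split can be illustrated with a small sketch. The names here (`CachedResult`, `build_warnings`) are simplified stand-ins for `CachedLocalResult` and `_construct_warnings_from_cache()`, assumed for illustration only:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedResult:
    # Context-independent: safe to share across call sites.
    errors: tuple[str, ...]


def build_warnings(cached: CachedResult, need_path: list[str]) -> list[dict]:
    # Context (need_path) is applied at warning-construction time, so the
    # same cached result yields correctly attributed warnings per call site.
    return [{"message": e, "need_path": list(need_path)} for e in cached.errors]


cached = CachedResult(errors=("'priority' is a required property",))

# One cached validation outcome, two different traversal contexts:
w1 = build_warnings(cached, ["REQ_002", "implements", "REQ_001"])
w2 = build_warnings(cached, ["REQ_003", "implements", "REQ_001"])

assert w1[0]["message"] == w2[0]["message"]      # shared cached error
assert w1[0]["need_path"] != w2[0]["need_path"]  # distinct attribution
```

This is why warning attribution is unchanged by the cache: only the context-free validation outcome is reused.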
+ +## Invalidation for Incremental Updates + +When a need is modified, its cached entries can be invalidated: + +``` +Before modification: +┌─────────────────────────────────────────┐ +│ Cache entries: │ +│ (REQ_001, schema_A) → result1 │ +│ (REQ_001, schema_B) → result2 │ +│ (REQ_002, schema_A) → result3 │ +│ (REQ_003, schema_A) → result4 │ +└─────────────────────────────────────────┘ + +After local_cache.invalidate("REQ_001"): +┌─────────────────────────────────────────┐ +│ Cache entries: │ +│ (REQ_002, schema_A) → result3 ✓ │ +│ (REQ_003, schema_A) → result4 ✓ │ +└─────────────────────────────────────────┘ +``` + +## Code Changes + +### Modified Files + +#### `sphinx_needs/schema/core.py` + +1. **Added imports:** + ```python + from dataclasses import dataclass, field as dataclass_field + ``` + +2. **Added `CachedLocalResult` dataclass** (after line 35): + - Stores validation results without context + - Has `success` property + +3. **Added `LocalValidationCache` dataclass:** + - `get()` - Retrieve cached result + - `store()` - Store a validation result + - `invalidate()` - Remove all entries for a need + - `clear()` - Clear entire cache + - `__len__()` - Return cache size + +4. **Modified `validate_type_schema()`:** + - Creates `LocalValidationCache` instance + - Passes `local_cache` to `recurse_validate_schemas()` + +5. **Modified `recurse_validate_schemas()`:** + - Added `local_cache: LocalValidationCache` parameter + - Checks cache before validating + - Uses `_construct_warnings_from_cache()` on cache hit + - Passes `local_cache` and `cache_key` to `get_ontology_warnings()` on cache miss + +6. **Added `_construct_warnings_from_cache()` function:** + - Rebuilds `OntologyWarning` objects from cached results + - Applies current context (need_path, schema_path, user_message, user_severity) + +7. 
**Modified `get_ontology_warnings()`:** + - Added optional `local_cache` and `cache_key` parameters + - Stores results in cache when parameters are provided + +### New Test Files + +#### `tests/schema/test_core_cache.py` + +Unit tests for the caching classes: + +- `TestCachedLocalResult` - Tests for the result dataclass +- `TestLocalValidationCache` - Tests for cache operations including: + - Basic get/store + - Different schema keys for same need + - Different needs with same schema + - Invalidation + - Clear + - Overwrite + +## Usage Example + +```python +from sphinx_needs.schema.core import LocalValidationCache, validate_type_schema + +# The cache is created and used automatically within validate_type_schema +# No changes needed to calling code + +# For incremental updates (future use): +cache = LocalValidationCache() +# ... validation happens ... + +# When a need is modified: +cache.invalidate("REQ_001") # Clear cached results for this need + +# Re-validate only affected needs +``` + +## Performance Impact + +The caching is most effective when: + +1. **Many needs link to the same targets** - Each target is validated once instead of N times +2. **Deep network validation** - Reduces exponential re-validation +3. 
**Multiple schemas validate the same needs** - Each (need, schema) pair is cached separately + +The cache has minimal overhead: +- Cache lookup is O(1) dict access +- Stored data is already computed (no copies needed) +- Memory usage scales with unique (need_id, schema_key) pairs diff --git a/sphinx_needs/schema/config_utils.py b/sphinx_needs/schema/config_utils.py index 3920e384b..64bd349eb 100644 --- a/sphinx_needs/schema/config_utils.py +++ b/sphinx_needs/schema/config_utils.py @@ -20,6 +20,8 @@ AllOfSchemaType, ExtraOptionAndLinkSchemaTypes, NeedFieldsSchemaType, + RefItemType, + ResolvedLinkSchemaType, SchemasRootType, ValidateSchemaType, get_schema_name, @@ -261,199 +263,212 @@ def resolve_refs( def populate_field_type( - curr_item: Any, + schema: SchemasRootType, schema_name: str, fields_schema: FieldsSchema, - path: str = "", ) -> None: """ Inject field type into select / validate fields in schema. - If a schema is defined on extra options, the type fields is - required. It is the primary type information for the field. - The JSON schema definition may also contain the type (NotRequired) - but it must match the extra option type. If it is not given, it - is injected by this function. - - Users might not be aware that schema validation will not complain if e.g. - a minimium is set for an integer, but the type is not. JSON schema - validators only complain if the type matches to the constraint. - - If the field is of type extra option but no type is defined, string is assumed. - For link field types, an array of strings is the only possible value. - Due to this, it is NotRequired for extra link schema configuration. + This is a type-directed implementation that surgically walks through the + typed structure of SchemasRootType, rather than using runtime keyword detection. - :param curr_item: The current schema item being processed - :param schema_name: The name/identifier of the root schema + :param schema: The root schema being processed. 
+ This is not yet validated, but is expected to be dereferenced already. + :param schema_name: The name/identifier of the schema for error reporting :param fields_schema: The fields schema containing field definitions - :param path: The current path in the schema hierarchy for error reporting """ - # TODO(Marco): this function could be improved to run on defined types, not on Any; - # this would make the detection of 'array' or 'object' safer; - # however, type check looks over the final schema anyway - if isinstance(curr_item, dict): - # set 'object' type - keys_indicating_object = {"properties", "required", "unevaluatedProperties"} - found_keys_object = [key for key in keys_indicating_object if key in curr_item] - if found_keys_object: - if "type" not in curr_item: - curr_item["type"] = "object" - else: - if curr_item["type"] != "object": - raise NeedsConfigException( - f"Config error in schema '{schema_name}' at path '{path}': " - f"Item has keys {found_keys_object}, but type is '{curr_item['type']}', " - f"expected 'object'." 
+ try: + # Handle 'select' (optional) + if "select" in schema: + select = schema["select"] + if isinstance(select, dict): + _process_need_fields_schema( + select, schema_name, fields_schema, "select" + ) + + # Handle 'validate' (may not be present if schema is malformed) + if "validate" in schema: + validate = schema["validate"] + if isinstance(validate, dict): + _process_validate_schema( + validate, schema_name, fields_schema, "validate" + ) + except NeedsConfigException: + # Re-raise our own exceptions + raise + except Exception as exc: + # Catch unexpected errors due to malformed schema structure + raise NeedsConfigException( + f"Unexpected error while processing schema '{schema_name}': {exc}" + ) from exc + + +def _process_validate_schema( + validate: ValidateSchemaType, + schema_name: str, + fields_schema: FieldsSchema, + path: str, +) -> None: + """Process ValidateSchemaType - has local and network fields.""" + # Handle 'local' (optional) + if "local" in validate: + local = validate["local"] + if isinstance(local, dict): + _process_need_fields_schema( + local, schema_name, fields_schema, f"{path}.local" + ) + + # Handle 'network' (optional) - dict of link_name -> ResolvedLinkSchemaType + if "network" in validate: + network = validate["network"] + if isinstance(network, dict): + for link_name, resolved_link in network.items(): + if isinstance(resolved_link, dict): + _process_resolved_link( + resolved_link, + schema_name, + fields_schema, + f"{path}.network.{link_name}", ) - # Skip array type injection if we're directly inside a 'network' context, - # because in that case keys like 'contains', 'items' are link field names, - # not JSON schema array keywords - is_inside_network = path.endswith(".network") or path == "validate.network" - - keys_indicating_array = { - "items", - "contains", - "minItems", - "maxItems", - "minContains", - "maxContains", - } - found_keys_array = [key for key in keys_indicating_array if key in curr_item] - if ( - any(key in curr_item 
for key in keys_indicating_array) - and not is_inside_network - ): - if "type" not in curr_item: - curr_item["type"] = "array" - else: - if curr_item["type"] != "array": - raise NeedsConfigException( - f"Config error in schema '{schema_name}' at path '{path}': " - f"Item has keys {found_keys_array}, but type is '{curr_item['type']}', " - f"expected 'array'." + + +def _process_resolved_link( + resolved: ResolvedLinkSchemaType, + schema_name: str, + fields_schema: FieldsSchema, + path: str, +) -> None: + """Process ResolvedLinkSchemaType - inject 'array' type, recurse into items/contains.""" + # Inject type="array" for the resolved link array (always an array of needs) + if "type" not in resolved: + resolved["type"] = "array" + + # 'items' contains ValidateSchemaType (recursive) + if "items" in resolved: + items = resolved["items"] + if isinstance(items, dict): + _process_validate_schema(items, schema_name, fields_schema, f"{path}.items") + + # 'contains' contains ValidateSchemaType (recursive) + if "contains" in resolved: + contains = resolved["contains"] + if isinstance(contains, dict): + _process_validate_schema( + contains, schema_name, fields_schema, f"{path}.contains" + ) + + +def _process_need_fields_schema( + schema: RefItemType | AllOfSchemaType | NeedFieldsSchemaType, + schema_name: str, + fields_schema: FieldsSchema, + path: str, +) -> None: + """Process a schema that could be RefItemType, AllOfSchemaType, or NeedFieldsSchemaType.""" + # Skip if it's a $ref (should be resolved by this point, but guard for type safety) + if "$ref" in schema: + return + + # Cast to NeedFieldsSchemaType for property access (safe after $ref check) + need_schema = cast(NeedFieldsSchemaType, schema) + + # Inject 'object' type if properties/required/unevaluatedProperties exist + if ( + "properties" in need_schema + or "required" in need_schema + or "unevaluatedProperties" in need_schema + ): + if "type" not in need_schema: + need_schema["type"] = "object" + elif 
need_schema["type"] != "object": + raise NeedsConfigException( + f"Config error in schema '{schema_name}' at path '{path}': " + f"Schema has object keywords but type is '{need_schema['type']}', expected 'object'." + ) + + # Process properties - THE KEY INJECTION POINT + if "properties" in need_schema: + properties = need_schema["properties"] + if isinstance(properties, dict): + properties_path = f"{path}.properties" + for field_name, field_schema in properties.items(): + if isinstance(field_name, str) and isinstance(field_schema, dict): + _inject_field_type( + field_name, + field_schema, + schema_name, + fields_schema, + f"{properties_path}.{field_name}", ) - found_properties_or_all_of = False - if ( - "properties" in curr_item - and isinstance(curr_item["properties"], dict) - and all(isinstance(k, str) for k in curr_item["properties"]) - and all(isinstance(v, dict) for v in curr_item["properties"].values()) - ): - found_properties_or_all_of = True - properties_path = f"{path}.properties" if path else "properties" - - for key, value in curr_item["properties"].items(): - field_path = f"{properties_path}.{key}" - - if fields_schema.get_extra_field(key) is not None: - extra_field = fields_schema.get_extra_field(key) - assert extra_field is not None # Type narrowing for pylance - if "type" not in value: - value["type"] = extra_field.type - else: - if value["type"] != extra_field.type: - raise NeedsConfigException( - f"Config error in schema '{schema_name}' at path '{field_path}': " - f"Field '{key}' has type '{value['type']}', but expected '{extra_field.type}'." 
- ) - if extra_field.type == "array": - container_fields = ["items", "contains"] - for container_field in container_fields: - if container_field in value: - container_path = f"{field_path}.{container_field}" - if "type" not in value[container_field]: - value[container_field]["type"] = ( - extra_field.item_type - ) - else: - if ( - value[container_field]["type"] - != extra_field.item_type - ): - raise NeedsConfigException( - f"Config error in schema '{schema_name}' at path '{container_path}': " - f"Field '{key}' has {container_field}.type '{value[container_field]['type']}', " - f"but expected '{extra_field.item_type}'." - ) - elif fields_schema.get_link_field(key) is not None: - link_field = fields_schema.get_link_field(key) - assert link_field is not None # Type narrowing for pylance - if "type" not in value: - value["type"] = "array" - else: - if value["type"] != "array": - raise NeedsConfigException( - f"Config error in schema '{schema_name}' at path '{field_path}': " - f"Field '{key}' has type '{value['type']}', but expected 'array'." - ) - container_fields = ["items", "contains"] - for container_field in container_fields: - if container_field in value: - container_path = f"{field_path}.{container_field}" - if "type" not in value[container_field]: - value[container_field]["type"] = "string" - else: - if value[container_field]["type"] != "string": - raise NeedsConfigException( - f"Config error in schema '{schema_name}' at path '{container_path}': " - f"Field '{key}' has {container_field}.type '{value[container_field]['type']}', " - f"but expected 'string'." 
- ) - else: - # first try to resolve from FieldsSchema - core_field = fields_schema.get_core_field(key) - if core_field is not None: - _type = core_field.type - _item_type = core_field.item_type - else: - # no success, look in NeedsCoreFields - core_field_result = get_core_field_type(key) - if core_field_result is None: - # field is unknown - raise NeedsConfigException( - f"Config error in schema '{schema_name}' at path '{field_path}': " - f"Field '{key}' is not a known extra option, extra link, or core field." - ) - _type, _item_type = core_field_result - if "type" not in value: - value["type"] = _type - else: - if value["type"] != _type: - raise NeedsConfigException( - f"Config error in schema '{schema_name}' at path '{field_path}': " - f"Field '{key}' has type '{value['type']}', but expected '{_type}'." - ) - if _type == "array": - container_fields = ["items", "contains"] - for container_field in container_fields: - if container_field in value: - container_path = f"{field_path}.{container_field}" - if "type" not in value[container_field]: - value[container_field]["type"] = _item_type - else: - if value[container_field]["type"] != _item_type: - raise NeedsConfigException( - f"Config error in schema '{schema_name}' at path '{container_path}': " - f"Field '{key}' has {container_field}.type '{value[container_field]['type']}', " - f"but expected '{_item_type}'." 
- ) - if "allOf" in curr_item and isinstance(curr_item["allOf"], list): - found_properties_or_all_of = True - for index, value in enumerate(curr_item["allOf"]): - all_of_path = f"{path}.allOf[{index}]" if path else f"allOf[{index}]" - populate_field_type(value, schema_name, fields_schema, all_of_path) - if not found_properties_or_all_of: - # iterate deeper - for key, value in curr_item.items(): - current_path = f"{path}.{key}" if path else key - populate_field_type(value, schema_name, fields_schema, current_path) - elif isinstance(curr_item, list): - for index, value in enumerate(curr_item): - current_path = f"{path}[{index}]" if path else f"[{index}]" - populate_field_type(value, schema_name, fields_schema, current_path) + + # Handle allOf (recurse into each element) + if "allOf" in need_schema: + all_of = need_schema["allOf"] + if isinstance(all_of, list): + for idx, item in enumerate(all_of): + if isinstance(item, dict): + _process_need_fields_schema( + item, schema_name, fields_schema, f"{path}.allOf[{idx}]" + ) + + +def _inject_field_type( + field_name: str, + field_schema: ExtraOptionAndLinkSchemaTypes, + schema_name: str, + fields_schema: FieldsSchema, + path: str, +) -> None: + """Inject type into a single field schema based on FieldsSchema lookup.""" + if (extra_field := fields_schema.get_extra_field(field_name)) is not None: + _inject_type_and_item_type( + field_schema, + extra_field.type, + extra_field.item_type, + field_name, + schema_name, + path, + ) + elif (link_field := fields_schema.get_link_field(field_name)) is not None: + # Link fields are always array of strings + _inject_type_and_item_type( + field_schema, + link_field.type, + link_field.item_type, + field_name, + schema_name, + path, + ) + elif (core_field := fields_schema.get_core_field(field_name)) is not None: + _inject_type_and_item_type( + field_schema, + core_field.type, + core_field.item_type, + field_name, + schema_name, + path, + ) + else: + # Fallback to NeedsCoreFields lookup (for 
core fields not yet in FieldsSchema) + core_field_result = _get_core_field_type(field_name) + if core_field_result is None: + raise NeedsConfigException( + f"Config error in schema '{schema_name}' at path '{path}': " + f"Field '{field_name}' is not a known extra option, extra link, or core field." + ) + field_type, item_type = core_field_result + _inject_type_and_item_type( + field_schema, + field_type, + item_type, + field_name, + schema_name, + path, + ) -def get_core_field_type( +def _get_core_field_type( field_name: str, ) -> ( tuple[ @@ -500,3 +515,40 @@ def get_core_field_type( # array without items type is unexpected return None return core_type, None + + +def _inject_type_and_item_type( + field_schema: ExtraOptionAndLinkSchemaTypes, + expected_type: Literal["string", "boolean", "integer", "number", "array"], + expected_item_type: Literal["string", "boolean", "integer", "number"] | None, + field_name: str, + schema_name: str, + path: str, +) -> None: + """Inject type and item type (for arrays) into a field schema, validating any existing values.""" + # Cast to dict for mutation (TypedDict union makes direct assignment complex) + schema_dict = cast(dict[str, Any], field_schema) + + # Inject or validate the main type + if "type" not in schema_dict: + schema_dict["type"] = expected_type + elif schema_dict["type"] != expected_type: + raise NeedsConfigException( + f"Config error in schema '{schema_name}' at path '{path}': " + f"Field '{field_name}' has type '{schema_dict['type']}', but expected '{expected_type}'." 
+ ) + + # For array types, also inject/validate item types in 'items' and 'contains' + if expected_type == "array" and expected_item_type is not None: + for container_key in ("items", "contains"): + if container_key in schema_dict: + container = schema_dict[container_key] + container_path = f"{path}.{container_key}" + if "type" not in container: + container["type"] = expected_item_type + elif container["type"] != expected_item_type: + raise NeedsConfigException( + f"Config error in schema '{schema_name}' at path '{container_path}': " + f"Field '{field_name}' has {container_key}.type '{container['type']}', " + f"but expected '{expected_item_type}'." + ) diff --git a/tests/schema/__snapshots__/test_schema.ambr b/tests/schema/__snapshots__/test_schema.ambr index bf35f61c5..514b48c17 100644 --- a/tests/schema/__snapshots__/test_schema.ambr +++ b/tests/schema/__snapshots__/test_schema.ambr @@ -705,10 +705,18 @@ "Invalid extra option 'efforts': Invalid schema: Extra option schema has invalid type 'unknown'. Must be one of ['array', 'boolean', 'integer', 'number', 'string']." # --- # name: test_schema_config[type_mismatch_array_error] - "Config error in schema '[0]' at path 'validate.local': Item has keys ['items'], but type is 'object', expected 'array'." + ''' + Schemas entry '[0]' is not valid: + {"items":{},"type":"object"} is not valid under any of the schemas listed in the 'anyOf' keyword + + Failed validating "anyOf" in schema["properties"]["validate"]["$ref"]["properties"]["local"] + + On instance["validate"]["local"]: + {"items":{},"type":"object"} + ''' # --- # name: test_schema_config[type_mismatch_object_error] - "Config error in schema '[0]' at path 'validate.local': Item has keys ['properties'], but type is 'array', expected 'object'." + "Config error in schema '[0]' at path 'validate.local': Schema has object keywords but type is 'array', expected 'object'." 
# --- # name: test_schema_config[type_mismatch_object_extra_option_error] "Config error in schema '[0]' at path 'validate.local.properties.efforts': Field 'efforts' has type 'integer', but expected 'string'." diff --git a/tests/schema/test_populate_field_type.py b/tests/schema/test_populate_field_type.py new file mode 100644 index 000000000..74cfe4fab --- /dev/null +++ b/tests/schema/test_populate_field_type.py @@ -0,0 +1,728 @@ +"""Tests for populate_field_type function in config_utils.""" + +from __future__ import annotations + +from typing import Any +from unittest.mock import MagicMock + +import pytest + +from sphinx_needs.exceptions import NeedsConfigException +from sphinx_needs.needs_schema import FieldSchema, FieldsSchema, LinkSchema +from sphinx_needs.schema.config_utils import populate_field_type + + +def create_mock_fields_schema( + extra_fields: dict[str, tuple[str, str | None]] | None = None, + link_fields: dict[str, tuple[str, str]] | None = None, + core_fields: dict[str, tuple[str, str | None]] | None = None, +) -> FieldsSchema: + """Create a mock FieldsSchema with specified fields. 
+ + :param extra_fields: dict of field_name -> (type, item_type) + :param link_fields: dict of field_name -> (type, item_type) + :param core_fields: dict of field_name -> (type, item_type) + """ + mock = MagicMock(spec=FieldsSchema) + + extra_fields = extra_fields or {} + link_fields = link_fields or {} + core_fields = core_fields or {} + + def get_extra_field(name: str) -> FieldSchema | None: + if name in extra_fields: + field_type, item_type = extra_fields[name] + field = MagicMock(spec=FieldSchema) + field.type = field_type + field.item_type = item_type + return field + return None + + def get_link_field(name: str) -> LinkSchema | None: + if name in link_fields: + field_type, item_type = link_fields[name] + field = MagicMock(spec=LinkSchema) + field.type = field_type + field.item_type = item_type + return field + return None + + def get_core_field(name: str) -> FieldSchema | None: + if name in core_fields: + field_type, item_type = core_fields[name] + field = MagicMock(spec=FieldSchema) + field.type = field_type + field.item_type = item_type + return field + return None + + mock.get_extra_field = get_extra_field + mock.get_link_field = get_link_field + mock.get_core_field = get_core_field + + return mock + + +class TestPopulateFieldTypeSuccess: + """Test successful type injection scenarios.""" + + def test_empty_schema(self) -> None: + """Test schema with only idx (minimal valid structure).""" + schema: dict[str, Any] = {"idx": 0, "validate": {}} + fields_schema = create_mock_fields_schema() + + populate_field_type(schema, "[0]", fields_schema) + + # No changes expected + assert schema == {"idx": 0, "validate": {}} + + def test_select_with_properties_injects_object_type(self) -> None: + """Test that select with properties gets type='object' injected.""" + schema: dict[str, Any] = { + "idx": 0, + "select": {"properties": {}}, + "validate": {}, + } + fields_schema = create_mock_fields_schema() + + populate_field_type(schema, "[0]", fields_schema) + + assert 
schema["select"]["type"] == "object" + + def test_validate_local_with_properties_injects_object_type(self) -> None: + """Test that validate.local with properties gets type='object' injected.""" + schema: dict[str, Any] = { + "idx": 0, + "validate": {"local": {"properties": {}}}, + } + fields_schema = create_mock_fields_schema() + + populate_field_type(schema, "[0]", fields_schema) + + assert schema["validate"]["local"]["type"] == "object" + + def test_extra_field_type_injection(self) -> None: + """Test type injection for extra option fields.""" + schema: dict[str, Any] = { + "idx": 0, + "validate": { + "local": { + "properties": { + "priority": {}, + } + } + }, + } + fields_schema = create_mock_fields_schema( + extra_fields={"priority": ("string", None)} + ) + + populate_field_type(schema, "[0]", fields_schema) + + assert schema["validate"]["local"]["properties"]["priority"]["type"] == "string" + + def test_extra_field_array_type_injection(self) -> None: + """Test type injection for array extra option fields with items.""" + schema: dict[str, Any] = { + "idx": 0, + "validate": { + "local": { + "properties": { + "categories": {"items": {}}, + } + } + }, + } + fields_schema = create_mock_fields_schema( + extra_fields={"categories": ("array", "string")} + ) + + populate_field_type(schema, "[0]", fields_schema) + + props = schema["validate"]["local"]["properties"]["categories"] + assert props["type"] == "array" + assert props["items"]["type"] == "string" + + def test_extra_field_array_contains_injection(self) -> None: + """Test type injection for array extra option fields with contains.""" + schema: dict[str, Any] = { + "idx": 0, + "validate": { + "local": { + "properties": { + "tags": {"contains": {}}, + } + } + }, + } + fields_schema = create_mock_fields_schema( + extra_fields={"tags": ("array", "string")} + ) + + populate_field_type(schema, "[0]", fields_schema) + + props = schema["validate"]["local"]["properties"]["tags"] + assert props["type"] == "array" + assert 
props["contains"]["type"] == "string" + + def test_link_field_type_injection(self) -> None: + """Test type injection for link fields.""" + schema: dict[str, Any] = { + "idx": 0, + "validate": { + "local": { + "properties": { + "links": {}, + } + } + }, + } + fields_schema = create_mock_fields_schema( + link_fields={"links": ("array", "string")} + ) + + populate_field_type(schema, "[0]", fields_schema) + + assert schema["validate"]["local"]["properties"]["links"]["type"] == "array" + + def test_link_field_items_type_injection(self) -> None: + """Test type injection for link fields with items.""" + schema: dict[str, Any] = { + "idx": 0, + "validate": { + "local": { + "properties": { + "links": {"items": {}}, + } + } + }, + } + fields_schema = create_mock_fields_schema( + link_fields={"links": ("array", "string")} + ) + + populate_field_type(schema, "[0]", fields_schema) + + props = schema["validate"]["local"]["properties"]["links"] + assert props["type"] == "array" + assert props["items"]["type"] == "string" + + def test_core_field_from_fields_schema(self) -> None: + """Test type injection for core fields from FieldsSchema.""" + schema: dict[str, Any] = { + "idx": 0, + "validate": { + "local": { + "properties": { + "status": {}, + } + } + }, + } + fields_schema = create_mock_fields_schema( + core_fields={"status": ("string", None)} + ) + + populate_field_type(schema, "[0]", fields_schema) + + assert schema["validate"]["local"]["properties"]["status"]["type"] == "string" + + def test_core_field_fallback_to_needs_core_fields(self) -> None: + """Test type injection for core fields via NeedsCoreFields fallback.""" + schema: dict[str, Any] = { + "idx": 0, + "validate": { + "local": { + "properties": { + "id": {}, # 'id' is a known core field + } + } + }, + } + # Empty fields_schema - will fall back to NeedsCoreFields + fields_schema = create_mock_fields_schema() + + populate_field_type(schema, "[0]", fields_schema) + + # 'id' should get type='string' from NeedsCoreFields + 
assert schema["validate"]["local"]["properties"]["id"]["type"] == "string" + + def test_network_link_array_type_injection(self) -> None: + """Test type='array' injection for network resolved links.""" + schema: dict[str, Any] = { + "idx": 0, + "validate": { + "network": { + "links": {}, + } + }, + } + fields_schema = create_mock_fields_schema( + link_fields={"links": ("array", "string")} + ) + + populate_field_type(schema, "[0]", fields_schema) + + assert schema["validate"]["network"]["links"]["type"] == "array" + + def test_network_items_recursive_processing(self) -> None: + """Test recursive processing of network items.""" + schema: dict[str, Any] = { + "idx": 0, + "validate": { + "network": { + "links": { + "items": { + "local": { + "properties": { + "priority": {}, + } + } + } + }, + } + }, + } + fields_schema = create_mock_fields_schema( + link_fields={"links": ("array", "string")}, + extra_fields={"priority": ("integer", None)}, + ) + + populate_field_type(schema, "[0]", fields_schema) + + # Check network link gets array type + assert schema["validate"]["network"]["links"]["type"] == "array" + # Check nested local properties get processed + nested_props = schema["validate"]["network"]["links"]["items"]["local"][ + "properties" + ] + assert nested_props["priority"]["type"] == "integer" + + def test_network_contains_recursive_processing(self) -> None: + """Test recursive processing of network contains.""" + schema: dict[str, Any] = { + "idx": 0, + "validate": { + "network": { + "links": { + "contains": { + "local": { + "properties": { + "status": {}, + } + } + } + }, + } + }, + } + fields_schema = create_mock_fields_schema( + link_fields={"links": ("array", "string")}, + core_fields={"status": ("string", None)}, + ) + + populate_field_type(schema, "[0]", fields_schema) + + # Check nested contains local properties get processed + nested_props = schema["validate"]["network"]["links"]["contains"]["local"][ + "properties" + ] + assert nested_props["status"]["type"] 
== "string" + + def test_allof_recursive_processing(self) -> None: + """Test recursive processing of allOf entries.""" + schema: dict[str, Any] = { + "idx": 0, + "validate": { + "local": { + "allOf": [ + {"properties": {"field1": {}}}, + {"properties": {"field2": {}}}, + ] + } + }, + } + fields_schema = create_mock_fields_schema( + extra_fields={ + "field1": ("string", None), + "field2": ("integer", None), + } + ) + + populate_field_type(schema, "[0]", fields_schema) + + all_of = schema["validate"]["local"]["allOf"] + assert all_of[0]["properties"]["field1"]["type"] == "string" + assert all_of[1]["properties"]["field2"]["type"] == "integer" + + def test_existing_correct_type_preserved(self) -> None: + """Test that existing correct type is preserved.""" + schema: dict[str, Any] = { + "idx": 0, + "validate": { + "local": { + "properties": { + "priority": {"type": "string"}, + } + } + }, + } + fields_schema = create_mock_fields_schema( + extra_fields={"priority": ("string", None)} + ) + + populate_field_type(schema, "[0]", fields_schema) + + assert schema["validate"]["local"]["properties"]["priority"]["type"] == "string" + + def test_ref_skipped(self) -> None: + """Test that $ref entries are skipped (should be resolved before this).""" + schema: dict[str, Any] = { + "idx": 0, + "validate": { + "local": {"$ref": "#/$defs/something"}, + }, + } + fields_schema = create_mock_fields_schema() + + # Should not raise, just skip + populate_field_type(schema, "[0]", fields_schema) + + assert schema["validate"]["local"] == {"$ref": "#/$defs/something"} + + def test_missing_validate_is_safe(self) -> None: + """Test that missing validate field doesn't crash.""" + schema: dict[str, Any] = {"idx": 0} + fields_schema = create_mock_fields_schema() + + # Should not raise + populate_field_type(schema, "[0]", fields_schema) + + def test_non_dict_values_ignored(self) -> None: + """Test that non-dict values in unexpected places are safely ignored.""" + schema: dict[str, Any] = { + "idx": 
0, + "select": "not a dict", # Malformed but should be ignored + "validate": {"local": "also not a dict"}, + } + fields_schema = create_mock_fields_schema() + + # Should not raise - defensive checks should skip non-dicts + populate_field_type(schema, "[0]", fields_schema) + + +class TestPopulateFieldTypeFailure: + """Test error handling scenarios.""" + + def test_unknown_field_raises_error(self) -> None: + """Test that unknown field in properties raises NeedsConfigException.""" + schema: dict[str, Any] = { + "idx": 0, + "validate": { + "local": { + "properties": { + "unknown_field": {}, + } + } + }, + } + fields_schema = create_mock_fields_schema() + + with pytest.raises(NeedsConfigException) as exc_info: + populate_field_type(schema, "[0]", fields_schema) + + assert "unknown_field" in str(exc_info.value) + assert "not a known extra option, extra link, or core field" in str( + exc_info.value + ) + + def test_type_mismatch_extra_field(self) -> None: + """Test that type mismatch for extra field raises error.""" + schema: dict[str, Any] = { + "idx": 0, + "validate": { + "local": { + "properties": { + "priority": {"type": "integer"}, # Wrong type + } + } + }, + } + fields_schema = create_mock_fields_schema( + extra_fields={"priority": ("string", None)} + ) + + with pytest.raises(NeedsConfigException) as exc_info: + populate_field_type(schema, "[0]", fields_schema) + + assert "priority" in str(exc_info.value) + assert "has type 'integer'" in str(exc_info.value) + assert "expected 'string'" in str(exc_info.value) + + def test_type_mismatch_link_field(self) -> None: + """Test that type mismatch for link field raises error.""" + schema: dict[str, Any] = { + "idx": 0, + "validate": { + "local": { + "properties": { + "links": {"type": "string"}, # Should be array + } + } + }, + } + fields_schema = create_mock_fields_schema( + link_fields={"links": ("array", "string")} + ) + + with pytest.raises(NeedsConfigException) as exc_info: + populate_field_type(schema, "[0]", 
fields_schema) + + assert "links" in str(exc_info.value) + assert "has type 'string'" in str(exc_info.value) + assert "expected 'array'" in str(exc_info.value) + + def test_item_type_mismatch(self) -> None: + """Test that item type mismatch raises error.""" + schema: dict[str, Any] = { + "idx": 0, + "validate": { + "local": { + "properties": { + "tags": { + "items": {"type": "integer"}, # Should be string + }, + } + } + }, + } + fields_schema = create_mock_fields_schema( + extra_fields={"tags": ("array", "string")} + ) + + with pytest.raises(NeedsConfigException) as exc_info: + populate_field_type(schema, "[0]", fields_schema) + + assert "tags" in str(exc_info.value) + assert "items.type 'integer'" in str(exc_info.value) + assert "expected 'string'" in str(exc_info.value) + + def test_contains_type_mismatch(self) -> None: + """Test that contains type mismatch raises error.""" + schema: dict[str, Any] = { + "idx": 0, + "validate": { + "local": { + "properties": { + "categories": { + "contains": {"type": "boolean"}, # Should be string + }, + } + } + }, + } + fields_schema = create_mock_fields_schema( + extra_fields={"categories": ("array", "string")} + ) + + with pytest.raises(NeedsConfigException) as exc_info: + populate_field_type(schema, "[0]", fields_schema) + + assert "categories" in str(exc_info.value) + assert "contains.type 'boolean'" in str(exc_info.value) + assert "expected 'string'" in str(exc_info.value) + + def test_object_type_conflict_with_properties(self) -> None: + """Test that type != 'object' with properties keyword raises error.""" + schema: dict[str, Any] = { + "idx": 0, + "validate": { + "local": { + "type": "array", # Conflict with properties + "properties": {}, + } + }, + } + fields_schema = create_mock_fields_schema() + + with pytest.raises(NeedsConfigException) as exc_info: + populate_field_type(schema, "[0]", fields_schema) + + assert "object keywords" in str(exc_info.value) + assert "type is 'array'" in str(exc_info.value) + assert 
"expected 'object'" in str(exc_info.value) + + def test_error_includes_path(self) -> None: + """Test that error messages include the schema path.""" + schema: dict[str, Any] = { + "idx": 0, + "validate": { + "network": { + "links": { + "items": { + "local": { + "properties": { + "unknown": {}, + } + } + } + } + } + }, + } + fields_schema = create_mock_fields_schema( + link_fields={"links": ("array", "string")} + ) + + with pytest.raises(NeedsConfigException) as exc_info: + populate_field_type(schema, "test_schema[0]", fields_schema) + + error_msg = str(exc_info.value) + assert "test_schema[0]" in error_msg + assert "validate.network.links.items.local.properties.unknown" in error_msg + + +class TestPopulateFieldTypeEdgeCases: + """Test edge cases and defensive behavior.""" + + def test_deeply_nested_allof(self) -> None: + """Test deeply nested allOf structures.""" + schema: dict[str, Any] = { + "idx": 0, + "validate": { + "local": { + "allOf": [ + { + "allOf": [ + {"properties": {"deep_field": {}}}, + ] + }, + ] + } + }, + } + fields_schema = create_mock_fields_schema( + extra_fields={"deep_field": ("boolean", None)} + ) + + populate_field_type(schema, "[0]", fields_schema) + + deep_props = schema["validate"]["local"]["allOf"][0]["allOf"][0]["properties"] + assert deep_props["deep_field"]["type"] == "boolean" + + def test_multiple_fields_in_properties(self) -> None: + """Test multiple fields processed correctly.""" + schema: dict[str, Any] = { + "idx": 0, + "validate": { + "local": { + "properties": { + "field1": {}, + "field2": {}, + "field3": {"items": {}}, + } + } + }, + } + fields_schema = create_mock_fields_schema( + extra_fields={ + "field1": ("string", None), + "field2": ("integer", None), + "field3": ("array", "number"), + } + ) + + populate_field_type(schema, "[0]", fields_schema) + + props = schema["validate"]["local"]["properties"] + assert props["field1"]["type"] == "string" + assert props["field2"]["type"] == "integer" + assert props["field3"]["type"] 
== "array" + assert props["field3"]["items"]["type"] == "number" + + def test_select_and_validate_both_processed(self) -> None: + """Test both select and validate are processed.""" + schema: dict[str, Any] = { + "idx": 0, + "select": {"properties": {"type": {}}}, + "validate": { + "local": {"properties": {"status": {}}}, + }, + } + fields_schema = create_mock_fields_schema( + core_fields={ + "type": ("string", None), + "status": ("string", None), + } + ) + + populate_field_type(schema, "[0]", fields_schema) + + assert schema["select"]["properties"]["type"]["type"] == "string" + assert schema["validate"]["local"]["properties"]["status"]["type"] == "string" + + def test_network_multiple_link_types(self) -> None: + """Test multiple link types in network.""" + schema: dict[str, Any] = { + "idx": 0, + "validate": { + "network": { + "links": {}, + "parent": {}, + } + }, + } + fields_schema = create_mock_fields_schema( + link_fields={ + "links": ("array", "string"), + "parent": ("array", "string"), + } + ) + + populate_field_type(schema, "[0]", fields_schema) + + assert schema["validate"]["network"]["links"]["type"] == "array" + assert schema["validate"]["network"]["parent"]["type"] == "array" + + def test_empty_properties_dict(self) -> None: + """Test empty properties dict is handled.""" + schema: dict[str, Any] = { + "idx": 0, + "validate": { + "local": {"properties": {}}, + }, + } + fields_schema = create_mock_fields_schema() + + populate_field_type(schema, "[0]", fields_schema) + + assert schema["validate"]["local"]["type"] == "object" + assert schema["validate"]["local"]["properties"] == {} + + def test_required_without_properties(self) -> None: + """Test required keyword alone triggers object type injection.""" + schema: dict[str, Any] = { + "idx": 0, + "validate": { + "local": {"required": ["field1"]}, + }, + } + fields_schema = create_mock_fields_schema() + + populate_field_type(schema, "[0]", fields_schema) + + assert schema["validate"]["local"]["type"] == 
"object" + + def test_unevaluated_properties_triggers_object_type(self) -> None: + """Test unevaluatedProperties keyword triggers object type injection.""" + schema: dict[str, Any] = { + "idx": 0, + "validate": { + "local": {"unevaluatedProperties": False}, + }, + } + fields_schema = create_mock_fields_schema() + + populate_field_type(schema, "[0]", fields_schema) + + assert schema["validate"]["local"]["type"] == "object" From efd832488a5b5da58c956e8ccaafc778049320c4 Mon Sep 17 00:00:00 2001 From: Chris Sewell Date: Mon, 2 Feb 2026 21:31:49 +0100 Subject: [PATCH 2/3] Delete local_validation_cache.md populate_field_type --- docs/schema/local_validation_cache.md | 294 -------------------------- 1 file changed, 294 deletions(-) delete mode 100644 docs/schema/local_validation_cache.md diff --git a/docs/schema/local_validation_cache.md b/docs/schema/local_validation_cache.md deleted file mode 100644 index 32c06bad1..000000000 --- a/docs/schema/local_validation_cache.md +++ /dev/null @@ -1,294 +0,0 @@ -# Local Validation Caching System - -This document explains the local validation caching system implemented in `sphinx_needs/schema/core.py` to optimize schema validation performance. - -## Overview - -The caching system avoids redundant JSON schema validation when the same need is validated against the same schema multiple times. This is common in network validation where linked needs are traversed repeatedly from different starting points. - -## Problem Statement - -In network validation, the same need can be validated multiple times: - -``` -Need REQ_002 links to REQ_001 -Need REQ_003 links to REQ_001 - - ┌────────┐ links ┌────────┐ - │REQ_002 │──────────────►│REQ_001 │ - └────────┘ └────────┘ - ▲ - ┌────────┐ links │ - │REQ_003 │────────────────────┘ - └────────┘ - -Validation order (without cache): -1. Validate REQ_002 - └── Network: validate REQ_001 (expensive!) - -2. Validate REQ_003 - └── Network: validate REQ_001 (same validation repeated!) 
-``` - -## Solution: Higher-Level Caching - -Rather than restructuring the recursive algorithm (which would change warning attribution), we cache at a higher level: the raw validation results before context-dependent data is added. - -## Data Structures - -### CachedLocalResult - -Stores the raw validation outcome without context-dependent data: - -```python -@dataclass(slots=True) -class CachedLocalResult: - reduced_need: dict[str, Any] # The reduced need that was validated - errors: tuple[ValidationError, ...] # Validation errors (empty if passed) - schema_error: str | None = None # Schema compilation error, if any - - @property - def success(self) -> bool: - return not self.errors and self.schema_error is None -``` - -### LocalValidationCache - -A cache keyed by `(need_id, schema_key)`: - -```python -@dataclass(slots=True) -class LocalValidationCache: - _cache: dict[tuple[str, tuple[str, ...]], CachedLocalResult] - - def get(self, need_id: str, schema_key: tuple[str, ...]) -> CachedLocalResult | None - def store(self, need_id: str, schema_key: tuple[str, ...], result: CachedLocalResult) -> None - def invalidate(self, need_id: str) -> None # Remove all entries for a need - def clear(self) -> None -``` - -## Cache Key Structure - -``` -schema_key = tuple of schema path components - -Examples: - ("requirement_schema", "local") - ("requirement_schema", "validate", "network", "implements", "items", "local") - ("specification_schema", "local") - -Combined with need_id for full cache key: - ("REQ_001", ("requirement_schema", "local")) - ("REQ_001", ("requirement_schema", "validate", "network", ...)) -``` - -## Validation Flow - -### Without Cache - -``` -validate_type_schema(needs, schema) - │ - ▼ - ┌───────────────┐ - │ For each need │ - └───────────────┘ - │ - ▼ -recurse_validate_schemas(need, ...) - │ - ├──► LOCAL: get_ontology_warnings() - │ │ - │ ▼ - │ ┌─────────────────────────┐ - │ │ 1. reduce_need() │ ◄── Expensive - │ │ 2. 
validator.validate() │ ◄── Expensive - │ │ 3. Build warnings │ - │ └─────────────────────────┘ - │ - └──► NETWORK: For each link_type - │ - ▼ - For each target_need_id - │ - ▼ - recurse_validate_schemas(target_need, ...) ◄── Same need may be - │ validated multiple - ▼ times! - (repeat LOCAL + NETWORK) -``` - -### With Cache - -``` -validate_type_schema(needs, schema) - │ - ▼ - ┌─────────────────────────────┐ - │ Create LocalValidationCache │ - │ Create validator_cache │ - └─────────────────────────────┘ - │ - ▼ - ┌───────────────┐ - │ For each need │ - └───────────────┘ - │ - ▼ -recurse_validate_schemas(need, ..., local_cache) - │ - ├──► LOCAL validation: - │ │ - │ ▼ - │ ┌──────────────────────────────────┐ - │ │ cache_key = (schema_path, "local")│ - │ │ cached = local_cache.get( │ - │ │ need_id, cache_key) │ - │ └──────────────────────────────────┘ - │ │ - │ ┌──────┴──────┐ - │ │ │ - │ ▼ ▼ - │ CACHE HIT CACHE MISS - │ │ │ - │ ▼ ▼ - │ ┌────────┐ ┌─────────────────────┐ - │ │Rebuild │ │ 1. reduce_need() │ - │ │warnings│ │ 2. validate() │ - │ │from │ │ 3. Cache result │ - │ │cached │ │ 4. Build warnings │ - │ │result │ └─────────────────────┘ - │ └────────┘ - │ - └──► NETWORK: (unchanged, passes local_cache down) -``` - -## What Gets Cached vs. 
Rebuilt - -### Cached (Context-Independent) - -| Field | Description | -|-------|-------------| -| `reduced_need` | The need dict after field reduction | -| `errors` | Tuple of `ValidationError` objects | -| `schema_error` | Schema compilation error message | - -### Rebuilt (Context-Dependent) - -| Field | Description | -|-------|-------------| -| `need_path` | Path like `["REQ_002", "implements", "REQ_001"]` | -| `schema_path` | Path like `["my_schema", "local", "properties"]` | -| `user_message` | Only at `recurse_level == 0` | -| `user_severity` | Only at `recurse_level == 0` | -| `rule` | `local_fail` vs `network_local_fail` | - -This separation is key: the expensive validation is cached, but warnings are rebuilt with the correct context for each call site. - -## Invalidation for Incremental Updates - -When a need is modified, its cached entries can be invalidated: - -``` -Before modification: -┌─────────────────────────────────────────┐ -│ Cache entries: │ -│ (REQ_001, schema_A) → result1 │ -│ (REQ_001, schema_B) → result2 │ -│ (REQ_002, schema_A) → result3 │ -│ (REQ_003, schema_A) → result4 │ -└─────────────────────────────────────────┘ - -After local_cache.invalidate("REQ_001"): -┌─────────────────────────────────────────┐ -│ Cache entries: │ -│ (REQ_002, schema_A) → result3 ✓ │ -│ (REQ_003, schema_A) → result4 ✓ │ -└─────────────────────────────────────────┘ -``` - -## Code Changes - -### Modified Files - -#### `sphinx_needs/schema/core.py` - -1. **Added imports:** - ```python - from dataclasses import dataclass, field as dataclass_field - ``` - -2. **Added `CachedLocalResult` dataclass** (after line 35): - - Stores validation results without context - - Has `success` property - -3. **Added `LocalValidationCache` dataclass:** - - `get()` - Retrieve cached result - - `store()` - Store a validation result - - `invalidate()` - Remove all entries for a need - - `clear()` - Clear entire cache - - `__len__()` - Return cache size - -4. 
**Modified `validate_type_schema()`:** - - Creates `LocalValidationCache` instance - - Passes `local_cache` to `recurse_validate_schemas()` - -5. **Modified `recurse_validate_schemas()`:** - - Added `local_cache: LocalValidationCache` parameter - - Checks cache before validating - - Uses `_construct_warnings_from_cache()` on cache hit - - Passes `local_cache` and `cache_key` to `get_ontology_warnings()` on cache miss - -6. **Added `_construct_warnings_from_cache()` function:** - - Rebuilds `OntologyWarning` objects from cached results - - Applies current context (need_path, schema_path, user_message, user_severity) - -7. **Modified `get_ontology_warnings()`:** - - Added optional `local_cache` and `cache_key` parameters - - Stores results in cache when parameters are provided - -### New Test Files - -#### `tests/schema/test_core_cache.py` - -Unit tests for the caching classes: - -- `TestCachedLocalResult` - Tests for the result dataclass -- `TestLocalValidationCache` - Tests for cache operations including: - - Basic get/store - - Different schema keys for same need - - Different needs with same schema - - Invalidation - - Clear - - Overwrite - -## Usage Example - -```python -from sphinx_needs.schema.core import LocalValidationCache, validate_type_schema - -# The cache is created and used automatically within validate_type_schema -# No changes needed to calling code - -# For incremental updates (future use): -cache = LocalValidationCache() -# ... validation happens ... - -# When a need is modified: -cache.invalidate("REQ_001") # Clear cached results for this need - -# Re-validate only affected needs -``` - -## Performance Impact - -The caching is most effective when: - -1. **Many needs link to the same targets** - Each target is validated once instead of N times -2. **Deep network validation** - Reduces exponential re-validation -3. 
**Multiple schemas validate the same needs** - Each (need, schema) pair is cached separately - -The cache has minimal overhead: -- Cache lookup is O(1) dict access -- Stored data is already computed (no copies needed) -- Memory usage scales with unique (need_id, schema_key) pairs From fae627e9d2a30d567322cde696286ff66b498f6a Mon Sep 17 00:00:00 2001 From: Chris Sewell Date: Mon, 2 Feb 2026 21:57:17 +0100 Subject: [PATCH 3/3] Update config_utils.py --- sphinx_needs/schema/config_utils.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sphinx_needs/schema/config_utils.py b/sphinx_needs/schema/config_utils.py index 64bd349eb..d28848a5f 100644 --- a/sphinx_needs/schema/config_utils.py +++ b/sphinx_needs/schema/config_utils.py @@ -271,7 +271,7 @@ def populate_field_type( Inject field type into select / validate fields in schema. This is a type-directed implementation that surgically walks through the - typed structure of SchemasRootType, rather than using runtime keyword detection. + typed structure of SchemasRootType. :param schema: The root schema being processed. This is not yet validated, but is expected to be dereferenced already.