Auto-Refresh Index Reconciliation: Sync Registry Metadata with Filesystem Reality #70

@jsbattig

[Conversation Reference: "User requested that auto-refresh should reconcile registry metadata with filesystem reality. When auto-refresh runs, it should detect ALL index types present on disk and update the registry to match - ensuring metadata always reflects what actually exists."]

Story Overview

Objective: Use each auto-refresh run to reconcile the golden repo registry with filesystem reality by detecting ALL index types that exist on disk and synchronizing the registry metadata accordingly.

User Value: The registry always reflects the true state of indexes in a golden repo. Users never have stale metadata, and all existing indexes are automatically maintained during refresh.

Acceptance Criteria Summary: Auto-refresh detects all existing index types, reconciles registry metadata with reality, and regenerates all detected indexes.

Acceptance Criteria

AC1: Filesystem-Based Index Detection for ALL Types

Scenario: The refresh scheduler detects all index types that exist on disk

Given a golden repo with various indexes created on disk
When the refresh scheduler prepares to create a new versioned index
Then it detects which index types exist by inspecting the filesystem:
  - Semantic: .code-indexer/index/code-indexer/ exists
  - FTS: .code-indexer/fts/ exists
  - Temporal: .code-indexer/index/code-indexer-temporal/ exists
  - SCIP: .code-indexer/scip/*.scip.db files exist
And it returns a complete picture of what indexes are present

Technical Requirements:

  • Add _detect_existing_indexes(repo_path: Path) -> Dict[str, bool] method to RefreshScheduler
  • Return dict with keys: "semantic", "fts", "temporal", "scip" and boolean presence values
  • Use same detection logic as ActivatedRepoIndexManager._get_*_status() methods
  • Call detection on source repo BEFORE CoW clone

AC2: Registry Reconciliation with Filesystem Reality

Scenario: Registry is updated to match what actually exists on disk

Given a golden repo where registry metadata doesn't match filesystem reality
When auto-refresh runs and detects the actual indexes present
Then the registry is updated to reflect reality:
  - If index exists on disk but registry says disabled → enable in registry
  - If index missing from disk but registry says enabled → disable in registry
And a reconciliation log entry is created showing what changed

Technical Requirements:

  • Add _reconcile_registry_with_filesystem(alias_name: str, detected: Dict[str, bool]) method
  • Compare detected state with registry state for each index type
  • Update registry for any mismatches (both directions)
  • Log all reconciliation changes: "Reconciled {alias}: {index_type} {old_state} → {new_state}"
  • Handle all index types: semantic, fts, temporal, scip
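The comparison step above can be sketched as a pure function that covers all four index types. This is only an illustration under stated assumptions: the function name compute_reconciliation, the REGISTRY_FIELDS mapping, and the alias used in the usage lines are hypothetical; the field names and defaults follow AC3 and the .get(..., False) fallbacks used elsewhere in this issue.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from detected index type to registry field name.
# Field names are taken from AC3; the mapping itself is an assumption.
REGISTRY_FIELDS: Dict[str, str] = {
    "semantic": "has_semantic",
    "fts": "has_fts",
    "temporal": "enable_temporal",
    "scip": "enable_scip",
}

# Values assumed when an older registry entry lacks a field.
FIELD_DEFAULTS: Dict[str, bool] = {
    "has_semantic": True,
    "has_fts": True,
    "enable_temporal": False,
    "enable_scip": False,
}


def compute_reconciliation(
    alias: str, repo_info: Dict[str, object], detected: Dict[str, bool]
) -> Tuple[Dict[str, bool], List[str]]:
    """Return (registry updates, log messages) for every mismatched index type."""
    updates: Dict[str, bool] = {}
    messages: List[str] = []
    for index_type, field in REGISTRY_FIELDS.items():
        old_state = bool(repo_info.get(field, FIELD_DEFAULTS[field]))
        new_state = bool(detected.get(index_type, False))
        if old_state != new_state:
            # Works in both directions: enable when found, disable when missing.
            updates[field] = new_state
            messages.append(
                f"Reconciled {alias}: {index_type} {old_state} → {new_state}"
            )
    return updates, messages


# Usage with a hypothetical alias and an entry that predates enable_scip.
updates, messages = compute_reconciliation(
    "my-repo-global",
    {"enable_temporal": True},
    {"semantic": True, "fts": True, "temporal": False, "scip": True},
)
```

Keeping the comparison free of I/O makes both mismatch directions easy to unit test; the caller would then pass the updates to the registry and emit the messages through the logger.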

AC3: Registry Schema with Complete Index Tracking

Scenario: Registry can track presence of all index types

Given the GlobalRegistry schema
When storing golden repo configuration
Then it can track presence/enablement of all index types:
  - has_semantic: bool (default True - always created with semantic)
  - has_fts: bool (default True - created with --fts)
  - enable_temporal: bool (existing field)
  - enable_scip: bool (new field)
And backward compatibility is maintained for existing entries

Technical Requirements:

  • Ensure registry schema supports tracking all four index types
  • Add enable_scip: bool field to GlobalRegistry repo entries
  • Backward compatibility: missing fields default appropriately
  • Add an update_global_repo() method for partial updates, if one does not already exist
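One way to picture the backward-compatibility and partial-update requirements is the in-memory stand-in below. It is a sketch, not the real GlobalRegistry: the class InMemoryRegistry, the repo_url key, and the alias are hypothetical, and the actual registry persists its entries to disk.

```python
from typing import Any, Dict, Optional

# Defaults applied when an existing entry lacks a field (per AC3; the False
# defaults mirror the .get(..., False) fallbacks used in this issue).
INDEX_FIELD_DEFAULTS: Dict[str, bool] = {
    "has_semantic": True,
    "has_fts": True,
    "enable_temporal": False,
    "enable_scip": False,
}


class InMemoryRegistry:
    """Hypothetical stand-in for GlobalRegistry, kept in memory for illustration."""

    def __init__(self, repos: Optional[Dict[str, Dict[str, Any]]] = None) -> None:
        # Copy each entry so callers cannot mutate stored state by accident.
        self._repos = {alias: dict(entry) for alias, entry in (repos or {}).items()}

    def get_global_repo(self, alias_name: str) -> Optional[Dict[str, Any]]:
        entry = self._repos.get(alias_name)
        if entry is None:
            return None
        # Stored values win; defaults only fill in fields the entry lacks.
        return {**INDEX_FIELD_DEFAULTS, **entry}

    def update_global_repo(self, alias_name: str, **fields: Any) -> None:
        """Partial update: only the given fields change, the rest are preserved."""
        if alias_name not in self._repos:
            raise KeyError(f"Unknown global repo: {alias_name}")
        self._repos[alias_name].update(fields)


# Usage: an entry written before enable_scip existed.
registry = InMemoryRegistry(
    {"legacy-global": {"repo_url": "https://example.invalid/repo.git", "enable_temporal": True}}
)
before = registry.get_global_repo("legacy-global")
registry.update_global_repo("legacy-global", enable_scip=True)
after = registry.get_global_repo("legacy-global")
```

Applying defaults on read rather than rewriting stored entries means old registry files need no migration step.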

AC4: Detection-Based Index Regeneration

Scenario: All detected indexes are regenerated during refresh

Given a golden repo with indexes detected on disk
When _create_new_index() creates a new versioned copy
Then it regenerates ALL detected index types in order:
  1. Semantic + FTS (always; these are required indexes)
  2. Temporal (if detected)
  3. SCIP (if detected)
And each regeneration respects its configured timeout

Technical Requirements:

  • Step 5a: cidx index --fts (semantic + FTS)
  • Step 5b: cidx index --index-commits if temporal detected
  • Step 5c: cidx scip generate if SCIP detected
  • Add cidx_scip_generate_timeout to ResourceConfig (default: 1800s)
  • Use temporal_options from registry when regenerating temporal
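The ordering in steps 5a to 5c could be expressed as a small plan builder, sketched below. The names RegenerationStep and build_regeneration_plan and the timeout dictionary keys are hypothetical, and the 3600-second values are placeholders; only the three cidx commands and the 1800-second SCIP default come from this issue. Passing temporal_options through to the temporal command is not modeled here.

```python
from typing import Dict, List, NamedTuple


class RegenerationStep(NamedTuple):
    name: str
    command: List[str]
    timeout_seconds: int
    required: bool  # a failure in a required step aborts the refresh


def build_regeneration_plan(
    detected: Dict[str, bool], timeouts: Dict[str, int]
) -> List[RegenerationStep]:
    """Return regeneration steps in the order 5a, 5b, 5c."""
    # Step 5a always runs: semantic + FTS are the required indexes.
    plan = [
        RegenerationStep(
            "semantic+fts", ["cidx", "index", "--fts"], timeouts["index"], True
        )
    ]
    # Step 5b: only when a temporal index was detected on disk.
    if detected.get("temporal"):
        plan.append(
            RegenerationStep(
                "temporal",
                ["cidx", "index", "--index-commits"],
                timeouts["temporal"],
                False,
            )
        )
    # Step 5c: only when SCIP indexes were detected on disk.
    if detected.get("scip"):
        plan.append(
            RegenerationStep(
                "scip", ["cidx", "scip", "generate"], timeouts["scip"], False
            )
        )
    return plan


# Usage: "index" and "temporal" values are placeholders; 1800 is the proposed
# default for cidx_scip_generate_timeout.
plan = build_regeneration_plan(
    {"semantic": True, "fts": True, "temporal": False, "scip": True},
    {"index": 3600, "temporal": 3600, "scip": 1800},
)
```

Each step carries its own timeout, so the caller can pass timeout_seconds straight to whatever runs the command.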

AC5: Graceful Failure Handling Per Index Type

Scenario: Failure in one index type doesn't block others

Given a golden repo where one index generation might fail
When any index generation fails or times out
Then the error is logged with details for that specific index type
And the refresh continues with remaining index types
And the refresh result includes per-index-type status
And registry is updated to reflect what actually succeeded

Technical Requirements:

  • Each index type (temporal, SCIP) wrapped in try/except
  • Log warning (not error) for optional index failures
  • Semantic+FTS failure is still fatal (required indexes)
  • After regeneration, re-detect and update registry with actual results
  • Return refresh result with status for each index type

AC6: Reconciliation on Every Refresh (Continuous Sync)

Scenario: Every auto-refresh keeps registry in sync

Given a golden repo that undergoes multiple auto-refreshes over time
When each auto-refresh runs
Then it always reconciles registry with current filesystem state
And manual index additions/removals between refreshes are detected
And the registry converges to match reality after each refresh

Technical Requirements:

  • Reconciliation runs at start of every _create_new_index() call
  • Reconciliation also runs at end (to capture regeneration results)
  • Idempotent: running multiple times produces same result
  • No manual intervention needed to fix metadata drift
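The idempotency requirement can be checked with something like the following, which uses a plain dictionary in place of a registry entry. The reconcile function here is a simplified, hypothetical stand-in limited to the temporal and SCIP flags.

```python
from typing import Dict


def reconcile(entry: Dict[str, bool], detected: Dict[str, bool]) -> int:
    """Overwrite the temporal/SCIP flags with detected state.

    Returns the number of fields that changed, so a return value of 0
    means the entry already matched the filesystem.
    """
    changed = 0
    for index_type, field in (("temporal", "enable_temporal"), ("scip", "enable_scip")):
        if entry.get(field, False) != detected[index_type]:
            entry[field] = detected[index_type]
            changed += 1
    return changed


entry = {"enable_temporal": True}  # stale: temporal was removed, SCIP was added
detected = {"semantic": True, "fts": True, "temporal": False, "scip": True}

first_pass = reconcile(entry, detected)   # fixes both mismatches
second_pass = reconcile(entry, detected)  # nothing left to change
```

Because the registry state is derived only from the detected state, a second pass over an unchanged filesystem is a no-op, which is what lets repeated refreshes converge.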

Implementation Status

Progress Tracking:

  • Core implementation complete
  • Unit tests passing
  • Integration tests passing
  • Code review approved
  • Manual E2E testing completed
  • Documentation updated

Completion: 0/6 tasks complete (0%)

Technical Implementation Details

Files to Modify

src/code_indexer/global_repos/refresh_scheduler.py  # Add detection + reconciliation + Step 5c
src/code_indexer/global_repos/global_registry.py    # Add enable_scip, update_global_repo()
src/code_indexer/server/utils/config_manager.py     # Add cidx_scip_generate_timeout

Detection Method

def _detect_existing_indexes(self, repo_path: Path) -> Dict[str, bool]:
    """Detect which index types exist on disk."""
    return {
        "semantic": (repo_path / ".code-indexer" / "index" / "code-indexer").exists(),
        "fts": (repo_path / ".code-indexer" / "fts").exists(),
        "temporal": (repo_path / ".code-indexer" / "index" / "code-indexer-temporal").exists(),
        "scip": self._has_scip_indexes(repo_path),
    }

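def _has_scip_indexes(self, repo_path: Path) -> bool:
    """Return True if at least one *.scip.db file exists under .code-indexer/scip/."""
    scip_dir = repo_path / ".code-indexer" / "scip"
    # any() stops at the first match instead of building the full list.
    return scip_dir.exists() and any(scip_dir.glob("*.scip.db"))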
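To see the detection rules in action, here is a standalone version exercised against a temporary directory. It is a sketch: the free function detect_existing_indexes mirrors the method above without the RefreshScheduler class, and the file name project.scip.db is a made-up example.

```python
import tempfile
from pathlib import Path
from typing import Dict


def detect_existing_indexes(repo_path: Path) -> Dict[str, bool]:
    """Standalone mirror of _detect_existing_indexes, for illustration only."""
    base = repo_path / ".code-indexer"
    scip_dir = base / "scip"
    return {
        "semantic": (base / "index" / "code-indexer").exists(),
        "fts": (base / "fts").exists(),
        "temporal": (base / "index" / "code-indexer-temporal").exists(),
        "scip": scip_dir.exists() and any(scip_dir.glob("*.scip.db")),
    }


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    # Create only a semantic index directory and one SCIP database file.
    (repo / ".code-indexer" / "index" / "code-indexer").mkdir(parents=True)
    (repo / ".code-indexer" / "scip").mkdir(parents=True)
    (repo / ".code-indexer" / "scip" / "project.scip.db").touch()

    detected = detect_existing_indexes(repo)

    # An empty scip/ directory alone does not count as a SCIP index.
    (repo / ".code-indexer" / "scip" / "project.scip.db").unlink()
    detected_without_db = detect_existing_indexes(repo)
```

The second call shows why SCIP detection globs for *.scip.db rather than checking the directory: a leftover empty scip/ directory should not enable SCIP in the registry.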

Reconciliation Method

def _reconcile_registry_with_filesystem(self, alias_name: str, detected: Dict[str, bool]):
    """Reconcile registry metadata with filesystem reality."""
    repo_info = self.registry.get_global_repo(alias_name)
    if not repo_info:
        return

    changes = []

    # Temporal reconciliation
    registry_temporal = repo_info.get("enable_temporal", False)
    if detected["temporal"] != registry_temporal:
        changes.append(f"enable_temporal: {registry_temporal} → {detected['temporal']}")

    # SCIP reconciliation
    registry_scip = repo_info.get("enable_scip", False)
    if detected["scip"] != registry_scip:
        changes.append(f"enable_scip: {registry_scip} → {detected['scip']}")

    if changes:
        logger.info(f"Reconciling {alias_name} registry with filesystem: {', '.join(changes)}")
        self.registry.update_global_repo(
            alias_name,
            enable_temporal=detected["temporal"],
            enable_scip=detected["scip"],
        )

Modified _create_new_index() Flow

def _create_new_index(self, alias_name: str, source_path: str) -> str:
    # RECONCILIATION POINT 1: Detect and sync before cloning
    detected = self._detect_existing_indexes(Path(source_path))
    logger.info(f"Detected indexes in source: {detected}")
    self._reconcile_registry_with_filesystem(alias_name, detected)

    # ... CoW clone, git fix, cidx fix-config ...

    # Step 5a: Semantic + FTS (always required)
    # ... existing code ...

    # Step 5b: Temporal (if detected)
    if detected["temporal"]:
        logger.info("Temporal indexes detected, regenerating...")
        # Use temporal_options from registry if available
        # ... run cidx index --index-commits ...

    # Step 5c: SCIP (if detected)
    if detected["scip"]:
        logger.info("SCIP indexes detected, regenerating...")
        # ... run cidx scip generate ...

    # RECONCILIATION POINT 2: Re-detect after regeneration to confirm
    final_state = self._detect_existing_indexes(versioned_path)
    self._reconcile_registry_with_filesystem(alias_name, final_state)

    return str(versioned_path)

Testing Requirements

Unit Test Coverage

  • _detect_existing_indexes() returns correct dict for all combinations
  • _reconcile_registry_with_filesystem() updates for mismatches in both directions
  • Registry updates persisted correctly
  • Reconciliation is idempotent

Integration Test Coverage

  • Refresh with all four index types: all detected and regenerated
  • Index on disk but not in registry: detected, regenerated, registry updated
  • Index in registry but not on disk: registry updated to reflect absence
  • Multiple refreshes converge to stable state

Definition of Done

Functional Completion

  • Detection logic for all index types implemented
  • Bidirectional registry reconciliation implemented
  • All detected indexes regenerated during refresh
  • Graceful failure handling per index type
  • Registry reflects reality after every refresh

Quality Validation

  • >90% test coverage for new code
  • All tests passing
  • Code review approved
  • Manual testing validated

Story Points: 5 (detection + bidirectional reconciliation + SCIP regeneration)
Priority: High (ensures registry metadata accuracy, prevents index staleness)
Dependencies: None
Success Metric: After any auto-refresh, registry metadata exactly matches filesystem reality for all index types
