Description
[Conversation Reference: "User requested that auto-refresh should reconcile registry metadata with filesystem reality. When auto-refresh runs, it should detect ALL index types present on disk and update the registry to match - ensuring metadata always reflects what actually exists."]
Story Overview
Objective: Use auto-refresh as an opportunity to reconcile the golden repo registry with filesystem reality - detecting ALL index types that exist on disk and synchronizing the registry metadata accordingly.
User Value: The registry always reflects the true state of indexes in a golden repo. Users never have stale metadata, and all existing indexes are automatically maintained during refresh.
Acceptance Criteria Summary: Auto-refresh detects all existing index types, reconciles registry metadata with reality, and regenerates all detected indexes.
Acceptance Criteria
AC1: Filesystem-Based Index Detection for ALL Types
Scenario: The refresh scheduler detects all index types that exist on disk
Given a golden repo with various indexes created on disk
When the refresh scheduler prepares to create a new versioned index
Then it detects which index types exist by inspecting the filesystem:
- Semantic: .code-indexer/index/code-indexer/ exists
- FTS: .code-indexer/fts/ exists
- Temporal: .code-indexer/index/code-indexer-temporal/ exists
- SCIP: .code-indexer/scip/*.scip.db files exist
And it returns a complete picture of what indexes are present
Technical Requirements:
- Add _detect_existing_indexes(repo_path: Path) -> Dict[str, bool] method to RefreshScheduler
- Return dict with keys: "semantic", "fts", "temporal", "scip" and boolean presence values
- Use same detection logic as ActivatedRepoIndexManager._get_*_status() methods
- Call detection on source repo BEFORE CoW clone
AC2: Registry Reconciliation with Filesystem Reality
Scenario: Registry is updated to match what actually exists on disk
Given a golden repo where registry metadata doesn't match filesystem reality
When auto-refresh runs and detects the actual indexes present
Then the registry is updated to reflect reality:
- If index exists on disk but registry says disabled → enable in registry
- If index missing from disk but registry says enabled → disable in registry
And a reconciliation log entry is created showing what changed
Technical Requirements:
- Add _reconcile_registry_with_filesystem(alias_name: str, detected: Dict[str, bool]) method
- Compare detected state with registry state for each index type
- Update registry for any mismatches (both directions)
- Log all reconciliation changes: "Reconciled {alias}: {index_type} {old_state} → {new_state}"
- Handle all index types: semantic, fts, temporal, scip
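The bidirectional comparison above can be sketched as a pure function that is independent of the real registry. The helper name diff_index_state and the plain-dict inputs keyed by index type are illustrative assumptions, not part of the codebase; only the log format is taken from the requirement above.

```python
from typing import Dict, List, Tuple

INDEX_TYPES = ("semantic", "fts", "temporal", "scip")


def diff_index_state(
    alias: str, registry_state: Dict[str, bool], detected: Dict[str, bool]
) -> Tuple[Dict[str, bool], List[str]]:
    """Return (updates, log_lines) for every index type whose registry flag
    disagrees with what was detected on disk. Hypothetical helper."""
    updates: Dict[str, bool] = {}
    log_lines: List[str] = []
    for index_type in INDEX_TYPES:
        old_state = registry_state.get(index_type, False)
        new_state = detected[index_type]
        if old_state != new_state:
            # Covers both directions: disabled -> enabled and enabled -> disabled
            updates[index_type] = new_state
            log_lines.append(
                f"Reconciled {alias}: {index_type} {old_state} → {new_state}"
            )
    return updates, log_lines


# Example: temporal exists on disk but is disabled; SCIP is enabled but missing.
updates, log_lines = diff_index_state(
    "my-repo",
    {"semantic": True, "fts": True, "temporal": False, "scip": True},
    {"semantic": True, "fts": True, "temporal": True, "scip": False},
)
```

Keeping the comparison free of side effects makes both mismatch directions easy to unit test before wiring the result into the registry update.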
AC3: Registry Schema with Complete Index Tracking
Scenario: Registry can track presence of all index types
Given the GlobalRegistry schema
When storing golden repo configuration
Then it can track presence/enablement of all index types:
- has_semantic: bool (default True - always created with semantic)
- has_fts: bool (default True - created with --fts)
- enable_temporal: bool (existing field)
- enable_scip: bool (new field)
And backward compatibility is maintained for existing entries
Technical Requirements:
- Ensure registry schema supports tracking all four index types
- Add enable_scip: bool field to GlobalRegistry repo entries
- Backward compatibility: missing fields default appropriately
- Add update_global_repo() method if not exists for partial updates
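One way to get backward-compatible defaults and partial updates is sketched below. The RepoIndexFlags class, the update_entry function, and the alias_name key are hypothetical stand-ins, since the actual GlobalRegistry storage format is not shown here. The False defaults for enable_temporal and enable_scip follow the reconciliation snippet later in this story.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass
class RepoIndexFlags:
    """Hypothetical view of the index-tracking fields of a registry entry."""
    has_semantic: bool = True
    has_fts: bool = True
    enable_temporal: bool = False
    enable_scip: bool = False

    @classmethod
    def from_entry(cls, entry: Dict[str, Any]) -> "RepoIndexFlags":
        # Ignore unrelated keys; fields missing from older entries keep their defaults.
        known = {f.name for f in fields(cls)}
        return cls(**{k: bool(v) for k, v in entry.items() if k in known})


def update_entry(entry: Dict[str, Any], **changes: bool) -> Dict[str, Any]:
    """Partial update: only the supplied fields change. Illustrative stand-in
    for update_global_repo(); returns a new dict rather than mutating."""
    unknown = set(changes) - {f.name for f in fields(RepoIndexFlags)}
    if unknown:
        raise KeyError(f"Unknown index fields: {sorted(unknown)}")
    return {**entry, **changes}


legacy_entry = {"alias_name": "my-repo", "enable_temporal": True}  # written before enable_scip existed
flags = RepoIndexFlags.from_entry(legacy_entry)
updated = update_entry(legacy_entry, enable_scip=True)
```

Here the legacy entry loads without error and enable_scip falls back to its default, while the partial update touches only the field that was passed.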
AC4: Detection-Based Index Regeneration
Scenario: All detected indexes are regenerated during refresh
Given a golden repo with indexes detected on disk
When _create_new_index() creates a new versioned copy
Then it regenerates ALL detected index types in order:
1. Semantic + FTS (always regenerated; these are the required indexes)
2. Temporal (if detected)
3. SCIP (if detected)
And each regeneration respects its configured timeout
Technical Requirements:
- Step 5a: cidx index --fts (semantic + FTS)
- Step 5b: cidx index --index-commits if temporal detected
- Step 5c: cidx scip generate if SCIP detected
- Add cidx_scip_generate_timeout to ResourceConfig (default: 1800s)
- Use temporal_options from registry when regenerating temporal
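The ordering of steps 5a to 5c could be expressed as a small planning function. This is a sketch only: RegenStep and plan_regeneration are hypothetical names, the commands are the ones listed above, and neither temporal_options nor the per-step timeouts are modeled.

```python
from typing import Dict, List, NamedTuple


class RegenStep(NamedTuple):
    name: str
    command: List[str]
    required: bool


def plan_regeneration(detected: Dict[str, bool]) -> List[RegenStep]:
    """Order the regeneration steps: 5a always, 5b and 5c only when detected."""
    steps = [RegenStep("semantic+fts", ["cidx", "index", "--fts"], required=True)]
    if detected.get("temporal"):
        steps.append(
            RegenStep("temporal", ["cidx", "index", "--index-commits"], required=False)
        )
    if detected.get("scip"):
        steps.append(RegenStep("scip", ["cidx", "scip", "generate"], required=False))
    return steps


# Example: SCIP present on disk, temporal absent.
plan = plan_regeneration(
    {"semantic": True, "fts": True, "temporal": False, "scip": True}
)
```

Each planned command would then be executed in the versioned copy with its own timeout, for example by passing the ResourceConfig value as the timeout argument of subprocess.run.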
AC5: Graceful Failure Handling Per Index Type
Scenario: Failure in one index type doesn't block others
Given a golden repo where one index generation might fail
When any index generation fails or times out
Then the error is logged with details for that specific index type
And the refresh continues with remaining index types
And the refresh result includes per-index-type status
And registry is updated to reflect what actually succeeded
Technical Requirements:
- Each optional index type (temporal, SCIP) wrapped in its own try/except
- Log warning (not error) for optional index failures
- Semantic+FTS failure is still fatal (required indexes)
- After regeneration, re-detect and update registry with actual results
- Return refresh result with status for each index type
AC6: Reconciliation on Every Refresh (Continuous Sync)
Scenario: Every auto-refresh keeps registry in sync
Given a golden repo that undergoes multiple auto-refreshes over time
When each auto-refresh runs
Then it always reconciles registry with current filesystem state
And manual index additions/removals between refreshes are detected
And the registry converges to match reality after each refresh
Technical Requirements:
- Reconciliation runs at start of every _create_new_index() call
- Reconciliation also runs at end (to capture regeneration results)
- Idempotent: running multiple times produces same result
- No manual intervention needed to fix metadata drift
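The idempotence requirement can be illustrated with an in-memory stand-in for the registry. InMemoryRegistry and its reconcile method are hypothetical and exist only to show that a second pass over unchanged filesystem state produces no further writes.

```python
from typing import Dict


class InMemoryRegistry:
    """Hypothetical stand-in for the registry; counts writes."""

    def __init__(self, state: Dict[str, bool]) -> None:
        self.state = dict(state)
        self.writes = 0

    def reconcile(self, detected: Dict[str, bool]) -> Dict[str, bool]:
        """Apply only the differing flags and return what changed."""
        changes = {k: v for k, v in detected.items() if self.state.get(k) != v}
        if changes:
            self.state.update(changes)
            self.writes += 1
        return changes


on_disk = {"semantic": True, "fts": True, "temporal": True, "scip": False}
registry = InMemoryRegistry(
    {"semantic": True, "fts": True, "temporal": False, "scip": True}
)

first = registry.reconcile(on_disk)   # start of refresh: drift is corrected
second = registry.reconcile(on_disk)  # end of refresh: nothing left to change
```

Writing only when something differs is what lets reconciliation run at both the start and the end of every refresh without churning the registry.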
Implementation Status
Progress Tracking:
- [ ] Core implementation complete
- [ ] Unit tests passing
- [ ] Integration tests passing
- [ ] Code review approved
- [ ] Manual E2E testing completed
- [ ] Documentation updated
Completion: 0/6 tasks complete (0%)
Technical Implementation Details
Files to Modify
src/code_indexer/global_repos/refresh_scheduler.py # Add detection + reconciliation + Step 5c
src/code_indexer/global_repos/global_registry.py # Add enable_scip, update_global_repo()
src/code_indexer/server/utils/config_manager.py # Add cidx_scip_generate_timeout
Detection Method
def _detect_existing_indexes(self, repo_path: Path) -> Dict[str, bool]:
    """Detect which index types exist on disk."""
    return {
        "semantic": (repo_path / ".code-indexer" / "index" / "code-indexer").exists(),
        "fts": (repo_path / ".code-indexer" / "fts").exists(),
        "temporal": (repo_path / ".code-indexer" / "index" / "code-indexer-temporal").exists(),
        "scip": self._has_scip_indexes(repo_path),
    }

def _has_scip_indexes(self, repo_path: Path) -> bool:
    scip_dir = repo_path / ".code-indexer" / "scip"
    return scip_dir.exists() and bool(list(scip_dir.glob("*.scip.db")))
Reconciliation Method
def _reconcile_registry_with_filesystem(self, alias_name: str, detected: Dict[str, bool]) -> None:
    """Reconcile registry metadata with filesystem reality."""
    repo_info = self.registry.get_global_repo(alias_name)
    if not repo_info:
        return

    # Registry field and backward-compatible default for each index type (see AC3)
    registry_fields = {
        "semantic": ("has_semantic", True),
        "fts": ("has_fts", True),
        "temporal": ("enable_temporal", False),
        "scip": ("enable_scip", False),
    }

    updates = {}
    for index_type, (field, default) in registry_fields.items():
        old_state = repo_info.get(field, default)
        new_state = detected[index_type]
        if new_state != old_state:
            logger.info(f"Reconciled {alias_name}: {index_type} {old_state} → {new_state}")
            updates[field] = new_state

    # Write only when something changed, so repeated runs are idempotent
    if updates:
        self.registry.update_global_repo(alias_name, **updates)
Modified _create_new_index() Flow
def _create_new_index(self, alias_name: str, source_path: str) -> str:
    # RECONCILIATION POINT 1: Detect and sync before cloning
    detected = self._detect_existing_indexes(Path(source_path))
    logger.info(f"Detected indexes in source: {detected}")
    self._reconcile_registry_with_filesystem(alias_name, detected)

    # ... CoW clone, git fix, cidx fix-config ...

    # Step 5a: Semantic + FTS (always required)
    # ... existing code ...

    # Step 5b: Temporal (if detected)
    if detected["temporal"]:
        logger.info("Temporal indexes detected, regenerating...")
        # Use temporal_options from registry if available
        # ... run cidx index --index-commits ...

    # Step 5c: SCIP (if detected)
    if detected["scip"]:
        logger.info("SCIP indexes detected, regenerating...")
        # ... run cidx scip generate ...

    # RECONCILIATION POINT 2: Re-detect after regeneration to confirm
    final_state = self._detect_existing_indexes(versioned_path)
    self._reconcile_registry_with_filesystem(alias_name, final_state)
    return str(versioned_path)
Testing Requirements
Unit Test Coverage
- _detect_existing_indexes() returns correct dict for all combinations
- _reconcile_registry_with_filesystem() updates for mismatches in both directions
- Registry updates persisted correctly
- Reconciliation is idempotent
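A possible shape for the "all combinations" detection test is sketched below, using only the standard library. It works on a standalone copy of the detection logic so that it runs without RefreshScheduler. The helper names and the project.scip.db file name are made up for the test; the directory paths are the ones listed in AC1.

```python
import itertools
import tempfile
from pathlib import Path
from typing import Dict

# Marker paths taken from AC1
MARKERS = {
    "semantic": Path(".code-indexer/index/code-indexer"),
    "fts": Path(".code-indexer/fts"),
    "temporal": Path(".code-indexer/index/code-indexer-temporal"),
}


def detect_existing_indexes(repo_path: Path) -> Dict[str, bool]:
    """Standalone copy of the detection logic, for testing."""
    detected = {name: (repo_path / rel).exists() for name, rel in MARKERS.items()}
    scip_dir = repo_path / ".code-indexer" / "scip"
    detected["scip"] = scip_dir.is_dir() and any(scip_dir.glob("*.scip.db"))
    return detected


def make_repo(root: Path, present: Dict[str, bool]) -> None:
    """Create only the marker directories/files for index types flagged present."""
    for name, rel in MARKERS.items():
        if present[name]:
            (root / rel).mkdir(parents=True)
    if present["scip"]:
        scip_dir = root / ".code-indexer" / "scip"
        scip_dir.mkdir(parents=True)
        (scip_dir / "project.scip.db").touch()


def check_all_combinations() -> int:
    """Verify detection for all 16 presence combinations; return how many ran."""
    names = ["semantic", "fts", "temporal", "scip"]
    count = 0
    for flags in itertools.product([False, True], repeat=len(names)):
        expected = dict(zip(names, flags))
        with tempfile.TemporaryDirectory() as tmp:
            make_repo(Path(tmp), expected)
            assert detect_existing_indexes(Path(tmp)) == expected
        count += 1
    return count
```

In a pytest suite the same loop would more naturally become a parametrized test over the sixteen flag tuples, with tmp_path replacing the temporary directory.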
Integration Test Coverage
- Refresh with all four index types: all detected and regenerated
- Index on disk but not in registry: detected, regenerated, registry updated
- Index in registry but not on disk: registry updated to reflect absence
- Multiple refreshes converge to stable state
Definition of Done
Functional Completion
- Detection logic for all index types implemented
- Bidirectional registry reconciliation implemented
- All detected indexes regenerated during refresh
- Graceful failure handling per index type
- Registry reflects reality after every refresh
Quality Validation
- >90% test coverage for new code
- All tests passing
- Code review approved
- Manual testing validated
Story Points: 5 (detection + bidirectional reconciliation + SCIP regeneration)
Priority: High (ensures registry metadata accuracy, prevents index staleness)
Dependencies: None
Success Metric: After any auto-refresh, registry metadata exactly matches filesystem reality for all index types