
[STORY] Log Context Inventory and Standardization #86

@jsbattig

Description


Part of: #71

Feature: Self-Monitoring Core Infrastructure

Story: Log Context Inventory and Standardization

Overview

Objective: Audit all warning/error/critical log statements in the CIDX server codebase to ensure each has uniquely identifiable contextual information, enabling accurate source tracing and deduplication by the self-monitoring Claude instance.

User Value: As the self-monitoring Claude instance, I need each log entry to be uniquely identifiable so I can accurately determine the source code location, avoid creating duplicate issues, and provide actionable bug reports with precise reproduction context.

Problem Statement

Current logging practices across 95 server files (809 warning/error/critical calls) are inconsistent:

  • Some logs include correlation_id
  • Some include operation prefixes (e.g., "Hybrid auth ({auth_type}): ...")
  • Many are plain messages without unique identifiers

Without standardized identification, the self-monitoring system cannot:

  1. Distinguish similar errors from different code locations
  2. Create stable fingerprints for deduplication
  3. Provide precise source tracing in bug reports

Acceptance Criteria

AC1: Error Code System

Given the CIDX server codebase
When a standardized error code system is defined
Then each warning/error/critical log has a unique error code
And error codes follow format: {SUBSYSTEM}-{CATEGORY}-{NUMBER}
And error codes are documented in a central registry

Examples:
  | Error Code      | Meaning                                    |
  | AUTH-OIDC-001   | OIDC discovery endpoint unreachable        |
  | AUTH-OIDC-002   | OIDC token validation failed               |
  | GIT-SYNC-001    | Pre-pull clearing failed                   |
  | MCP-HANDLER-001 | Tool execution failed                      |
  | CACHE-HNSW-001  | Index load failed                          |
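A minimal sketch of how the format could be validated. It assumes SUBSYSTEM and CATEGORY are uppercase ASCII letters and NUMBER is always three digits, as in every example above. `parse_error_code` is a hypothetical helper, not an existing CIDX function.

```python
import re
from typing import Tuple

# Assumption: uppercase ASCII SUBSYSTEM/CATEGORY and a three-digit NUMBER,
# matching every example in the table above.
ERROR_CODE_RE = re.compile(
    r"(?P<subsystem>[A-Z]+)-(?P<category>[A-Z]+)-(?P<number>[0-9]{3})"
)


def parse_error_code(code: str) -> Tuple[str, str, int]:
    """Split an error code into (subsystem, category, number); raise ValueError if malformed."""
    match = ERROR_CODE_RE.fullmatch(code)  # fullmatch: no leading/trailing junk allowed
    if match is None:
        raise ValueError(f"malformed error code: {code!r}")
    return match["subsystem"], match["category"], int(match["number"])
```

Using `fullmatch` rather than `match` rejects codes with trailing characters such as `AUTH-OIDC-0011`.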

AC2: Subsystem Inventory

Given the 95 server files with logging
When subsystems are inventoried
Then each subsystem has a unique prefix:
  | Prefix    | Subsystem                          | Files |
  | AUTH      | Authentication (OIDC, session)     | 4+    |
  | GIT       | Git operations (sync, branches)    | 5+    |
  | MCP       | MCP handlers and session           | 3+    |
  | CACHE     | Index caching (HNSW, FTS, payload) | 4+    |
  | REPO      | Repository management              | 7+    |
  | QUERY     | Search and query                   | 3+    |
  | VALID     | Validation and health              | 5+    |
  | DEPLOY    | Auto-update and deployment         | 4+    |
  | SCIP      | SCIP code intelligence             | 4+    |
  | TELEM     | Telemetry and metrics              | 5+    |
  | STORE     | Storage and migration              | 2+    |
  | SVC       | General services                   | 15+   |
  | WEB       | Web routes and UI                  | 2+    |
  | APP       | Application startup                | 3+    |
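One way to keep the prefixes above machine-checkable is a single mapping that the registry and its tests can share. This is a sketch: `SUBSYSTEM_PREFIXES` and `subsystem_for` are hypothetical names, and the entries are copied from the table.

```python
from typing import Dict

# Copied from the AC2 inventory table; the variable name is hypothetical.
SUBSYSTEM_PREFIXES: Dict[str, str] = {
    "AUTH": "Authentication (OIDC, session)",
    "GIT": "Git operations (sync, branches)",
    "MCP": "MCP handlers and session",
    "CACHE": "Index caching (HNSW, FTS, payload)",
    "REPO": "Repository management",
    "QUERY": "Search and query",
    "VALID": "Validation and health",
    "DEPLOY": "Auto-update and deployment",
    "SCIP": "SCIP code intelligence",
    "TELEM": "Telemetry and metrics",
    "STORE": "Storage and migration",
    "SVC": "General services",
    "WEB": "Web routes and UI",
    "APP": "Application startup",
}


def subsystem_for(code: str) -> str:
    """Return the subsystem description for an error code, rejecting unknown prefixes."""
    prefix = code.split("-", 1)[0]
    try:
        return SUBSYSTEM_PREFIXES[prefix]
    except KeyError:
        raise ValueError(f"unknown subsystem prefix {prefix!r} in {code!r}") from None
```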

AC3: Logging Standard Implementation

Given a log statement emitting warning/error/critical
When the standardized format is applied
Then it includes:
  - Error code as first element: "[{ERROR_CODE}]"
  - Operation context (what was being attempted)
  - Relevant variable data (sanitized, no secrets)
  - correlation_id in extra dict (if available)

Example (before):
  logger.error(f"Pre-pull clearing failed (non-blocking): {e}")

Example (after):
  logger.error(
      f"[GIT-SYNC-001] Pre-pull clearing failed: {e}",
      extra={"correlation_id": get_correlation_id(), "error_code": "GIT-SYNC-001"}
  )
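The AC3 shape could be wrapped in a helper in the planned logging_utils.py so that call sites cannot forget either the prefix or the extra fields. This is a sketch: `log_coded` is a hypothetical name, and the correlation id is passed in explicitly because `get_correlation_id()` is not defined in this story.

```python
import logging
from typing import Optional


def log_coded(
    logger: logging.Logger,
    level: int,
    error_code: str,
    message: str,
    correlation_id: Optional[str] = None,
) -> None:
    """Emit "[CODE] message" and attach error_code/correlation_id as record attributes."""
    extra = {"error_code": error_code}
    if correlation_id is not None:  # only include correlation_id when one is available
        extra["correlation_id"] = correlation_id
    # stacklevel=2 makes the record's filename/lineno point at the caller, not this helper.
    logger.log(level, "[%s] %s", error_code, message, extra=extra, stacklevel=2)
```

Passing the code and message as `%s` arguments defers formatting until a handler actually emits the record, and `stacklevel` (Python 3.8+) preserves the caller's file and line, which the source tracing in this story depends on.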

AC4: Error Code Registry File

Given the error code system
When a registry file is created
Then it exists at src/code_indexer/server/error_codes.py
And it contains:
  - All error codes as constants
  - Human-readable descriptions
  - Severity level (warning, error, critical)
  - Suggested actions/documentation links
And it is importable by the self-monitoring system
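A consistency check that a unit test, or the self-monitoring side after importing the registry, could run. This sketch restates the dataclass from the Error Code Format section with one assumed addition, an optional `doc_url` field for the documentation links listed above. `AUTH_OIDC_001` illustrates the "codes as constants" requirement; `registry_problems` is a hypothetical name.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional

_CODE_RE = re.compile(r"[A-Z]+-[A-Z]+-[0-9]{3}")


class Severity(Enum):
    WARNING = "warning"
    ERROR = "error"
    CRITICAL = "critical"


@dataclass(frozen=True)
class ErrorDefinition:
    code: str
    description: str
    severity: Severity
    action: str
    doc_url: Optional[str] = None  # assumption: optional documentation link


# Constant so call sites import the code instead of retyping the string.
AUTH_OIDC_001 = "AUTH-OIDC-001"

ERROR_REGISTRY: Dict[str, ErrorDefinition] = {
    AUTH_OIDC_001: ErrorDefinition(
        code=AUTH_OIDC_001,
        description="OIDC discovery endpoint unreachable",
        severity=Severity.ERROR,
        action="Check OIDC provider connectivity and configuration",
    ),
}


def registry_problems(registry: Dict[str, ErrorDefinition]) -> List[str]:
    """Return human-readable problems; an empty list means the registry is consistent."""
    problems = []
    for key, definition in registry.items():
        if key != definition.code:
            problems.append(f"{key}: key does not match code {definition.code!r}")
        if not _CODE_RE.fullmatch(key):
            problems.append(f"{key}: does not match SUBSYSTEM-CATEGORY-NUMBER")
        if not definition.description.strip():
            problems.append(f"{key}: empty description")
        if not definition.action.strip():
            problems.append(f"{key}: empty action")
    return problems
```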

AC5: Migration Checklist

Given the 809 existing log statements
When migration is complete
Then all warning/error/critical logs have error codes
And a migration report documents:
  - Total statements migrated
  - Statements per subsystem
  - Any statements intentionally excluded (with justification)
And migration is verified by grep showing no untagged warning/error logs
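A rough way to produce the per-subsystem counts for the migration report, assuming calls are written as `logger.<level>(` with a plain or f-string literal as the first argument. Because `\s*` also matches newlines, a message that starts on the line after `logger.error(` (the AC3 shape) is still counted as tagged. Other logger names, concatenated strings, and `%`-style code arguments are not recognized, so treat the output as a cross-check rather than the authoritative total. `migration_report` is a hypothetical helper.

```python
import re
from collections import Counter
from typing import Dict

_CALL_RE = re.compile(r"logger\.(?:warning|error|critical)\(")
# \s* spans newlines, so a message on the line after "logger.error(" still matches.
_TAGGED_RE = re.compile(
    r"logger\.(?:warning|error|critical)\(\s*f?[\"']\[([A-Z]+)-[A-Z]+-[0-9]{3}\]"
)


def migration_report(sources: Dict[str, str]) -> Dict[str, object]:
    """Summarize tagged vs. untagged warning/error/critical calls across {path: source_text}."""
    per_subsystem: Counter = Counter()
    untagged: Dict[str, int] = {}
    total = 0
    for path, text in sources.items():
        calls = len(_CALL_RE.findall(text))
        tags = _TAGGED_RE.findall(text)  # one captured SUBSYSTEM per tagged call
        per_subsystem.update(tags)
        total += calls
        if calls > len(tags):
            untagged[path] = calls - len(tags)
    return {
        "total_calls": total,
        "tagged": sum(per_subsystem.values()),
        "per_subsystem": dict(per_subsystem),
        "untagged_by_file": untagged,
    }
```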

AC6: Self-Monitoring Integration

Given the error code registry
When the self-monitoring prompt is assembled
Then it includes the error code registry (or path to it)
And Claude can map any [ERROR_CODE] to its definition
And deduplication uses error_code as primary fingerprint component
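A sketch of a fingerprint that keeps error_code as its primary component. It assumes the rest of the message is normalized by masking UUIDs, hex addresses, and digit runs, so the same failure with different ids collapses to one fingerprint. The masking rules and the 16-character digest length are illustrative choices, not part of this story.

```python
import hashlib
import re

_UUID_RE = re.compile(
    r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
)
_HEX_RE = re.compile(r"\b0x[0-9a-fA-F]+\b")
_NUM_RE = re.compile(r"[0-9]+")


def fingerprint(error_code: str, message: str) -> str:
    """Return "<error_code>:<digest>" where the digest covers the normalized message."""
    normalized = _UUID_RE.sub("<uuid>", message)  # mask UUIDs before digits are replaced
    normalized = _HEX_RE.sub("<hex>", normalized)
    normalized = _NUM_RE.sub("<n>", normalized)
    digest = hashlib.sha256(f"{error_code}|{normalized}".encode("utf-8")).hexdigest()[:16]
    # The code is the prefix, so fingerprints group by code before anything else.
    return f"{error_code}:{digest}"
```

The normalized message only separates distinct failure shapes that share a code; two events with different codes never share a fingerprint.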

Technical Requirements

Files to Create

  • src/code_indexer/server/error_codes.py (central registry)
  • src/code_indexer/server/logging_utils.py (helper functions)
  • docs/error-codes.md (developer documentation)

Files to Modify (809 log statements across 95 files)

Priority order by impact:

  1. src/code_indexer/server/mcp/handlers.py (135 calls, the most of any file)
  2. src/code_indexer/server/routers/git.py (42 calls)
  3. src/code_indexer/server/web/routes.py (41 calls)
  4. src/code_indexer/server/repositories/activated_repo_manager.py (41 calls)
  5. src/code_indexer/server/app.py (37 calls)
  6. Remaining 90 files

Error Code Format

# src/code_indexer/server/error_codes.py
from dataclasses import dataclass
from enum import Enum

class Severity(Enum):
    WARNING = "warning"
    ERROR = "error"
    CRITICAL = "critical"

@dataclass
class ErrorDefinition:
    code: str
    description: str
    severity: Severity
    action: str  # What to do when this occurs

ERROR_REGISTRY = {
    "AUTH-OIDC-001": ErrorDefinition(
        code="AUTH-OIDC-001",
        description="OIDC discovery endpoint unreachable",
        severity=Severity.ERROR,
        action="Check OIDC provider connectivity and configuration"
    ),
    # ... all error codes
}
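Because each registry entry already records a severity, a helper could derive the log level from it so that a call site cannot log a registered warning at error level. This sketch restates a minimal `Severity`/`ErrorDefinition` so it runs on its own; `log_registered` and `_LEVELS` are hypothetical names, and an unregistered code raises `KeyError` by design.

```python
import logging
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Severity(Enum):
    WARNING = "warning"
    ERROR = "error"
    CRITICAL = "critical"


@dataclass(frozen=True)
class ErrorDefinition:
    code: str
    description: str
    severity: Severity
    action: str


ERROR_REGISTRY: Dict[str, ErrorDefinition] = {
    "AUTH-OIDC-001": ErrorDefinition(
        code="AUTH-OIDC-001",
        description="OIDC discovery endpoint unreachable",
        severity=Severity.ERROR,
        action="Check OIDC provider connectivity and configuration",
    ),
}

# Registry severity -> stdlib logging level.
_LEVELS: Dict[Severity, int] = {
    Severity.WARNING: logging.WARNING,
    Severity.ERROR: logging.ERROR,
    Severity.CRITICAL: logging.CRITICAL,
}


def log_registered(
    logger: logging.Logger,
    code: str,
    message: str,
    correlation_id: Optional[str] = None,
) -> None:
    """Log "[CODE] message" at the level the registry assigns to CODE."""
    definition = ERROR_REGISTRY[code]  # KeyError: unregistered codes fail loudly
    extra = {"error_code": code}
    if correlation_id is not None:
        extra["correlation_id"] = correlation_id
    logger.log(_LEVELS[definition.severity], "[%s] %s", code, message,
               extra=extra, stacklevel=2)
```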

Verification Command

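# After migration, this should return 0 matches.
# grep is line-based: a call whose message starts on the line after "logger.error("
# (the AC3 "after" shape) is listed here even when it is tagged, so review such hits.
grep -rnE "logger\.(warning|error|critical)\(" src/code_indexer/server/ | grep -vE "\[[A-Z]+-[A-Z]+-[0-9]{3}\]"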
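An AST-based check avoids grep's line-based matching, because it inspects each call's first argument no matter how the call is wrapped across lines. A sketch, assuming the logger variable is named `logger` (as in the grep pattern) and that the code must open the message, per AC3. Calls made through wrapper helpers or other logger names are not inspected. `untagged_log_calls` is a hypothetical name.

```python
import ast
import re
from typing import List, Tuple

# Only the levels this story covers.
LOG_LEVELS = {"warning", "error", "critical"}
# The code must be the first element of the message (AC3).
TAG_RE = re.compile(r"\[[A-Z]+-[A-Z]+-[0-9]{3}\]")


def _leading_literal(call: ast.Call) -> str:
    """Return the literal text at the start of the call's first positional argument."""
    if not call.args:
        return ""
    arg = call.args[0]
    if isinstance(arg, ast.JoinedStr) and arg.values:  # f-string: first piece only
        arg = arg.values[0]
    if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
        return arg.value
    return ""


def untagged_log_calls(source: str, filename: str = "<source>") -> List[Tuple[str, int]]:
    """Return (filename, line) for each logger.warning/error/critical call lacking a leading tag."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in LOG_LEVELS
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "logger"
            and not TAG_RE.match(_leading_literal(node))
        ):
            hits.append((filename, node.lineno))
    return sorted(hits)
```

Running this over every `*.py` file under src/code_indexer/server/ and failing when any file returns hits would cover the verification-script item in the testing requirements.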

Testing Requirements

  • Unit test for error_codes.py registry completeness
  • Unit test for logging_utils helper functions
  • Integration test verifying log output format
  • Verification script that all logs have error codes
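For the registry completeness test, one reasonable definition is that every code tagged in source is registered and every registered code is used somewhere. A pytest-style sketch with stubbed inputs; `registry_gaps` is a hypothetical helper, and whether unused registry entries should fail the build is a policy choice this story leaves open.

```python
import re
from typing import Dict, Iterable, Set, Tuple

_TAG_RE = re.compile(r"\[([A-Z]+-[A-Z]+-[0-9]{3})\]")


def registry_gaps(
    registry_codes: Iterable[str], sources: Dict[str, str]
) -> Tuple[Set[str], Set[str]]:
    """Return (codes used in source but not registered, registered codes never used)."""
    registered = set(registry_codes)
    used: Set[str] = set()
    for text in sources.values():
        used.update(_TAG_RE.findall(text))
    return used - registered, registered - used


def test_registry_is_complete() -> None:
    # In a real test, registry_codes would be ERROR_REGISTRY.keys() and sources the
    # contents of src/code_indexer/server/**/*.py; both are stubbed here.
    registry_codes = {"AUTH-OIDC-001", "GIT-SYNC-001"}
    sources = {
        "auth.py": 'logger.error("[AUTH-OIDC-001] OIDC discovery endpoint unreachable")',
        "sync.py": 'logger.error(f"[GIT-SYNC-001] Pre-pull clearing failed: {e}")',
    }
    unregistered, unused = registry_gaps(registry_codes, sources)
    assert not unregistered, f"codes missing from registry: {sorted(unregistered)}"
    assert not unused, f"registered codes never logged: {sorted(unused)}"
```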

Dependencies

Definition of Done

  • Error code system defined and documented
  • All 95 files audited and subsystem prefixes assigned
  • error_codes.py registry created with all codes
  • All 809 warning/error/critical logs migrated to new format
  • Verification grep returns 0 untagged logs
  • docs/error-codes.md created for developers
  • Self-monitoring prompt updated to include registry
  • All tests pass
  • Code review approved
