[STORY] Log Analysis and Issue Creation #73

**Part of**: #71

# Feature: Self-Monitoring Core Infrastructure
# Story: Log Analysis and Issue Creation

## Overview

**Objective**: Implement Claude Code log analysis with intelligent querying and GitHub issue creation.

**User Value**: As an admin, I want the system to automatically analyze logs, classify issues, and create non-duplicate GitHub issues so that problems are identified and tracked without manual log review.

## Acceptance Criteria

### AC1: Embedded Assets
```gherkin
Given the self_monitoring package
When issue_manager.py is copied from ~/.claude/scripts/utils/
Then it exists at src/code_indexer/server/self_monitoring/issue_manager.py
And it is extended to write issue metadata to SQLite on successful creation
And bug_report_standards.md is copied to src/code_indexer/server/self_monitoring/standards/

Given the embedded issue_manager.py
When a GitHub issue is successfully created
Then a row is inserted into self_monitoring_issues table
With github_issue_number, github_issue_url, classification, title, source_log_ids, created_at
```
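
A minimal sketch of the SQLite extension described in AC1, under stated assumptions: the column names come from the criteria above, but the column types, the JSON encoding of `source_log_ids`, and the function name `record_issue` are hypothetical and not taken from the existing `issue_manager.py`.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Assumed schema: column names follow AC1, types are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS self_monitoring_issues (
    id INTEGER PRIMARY KEY,
    github_issue_number INTEGER NOT NULL,
    github_issue_url TEXT NOT NULL,
    classification TEXT NOT NULL,
    title TEXT NOT NULL,
    source_log_ids TEXT NOT NULL,  -- JSON-encoded list of log IDs (assumption)
    created_at TEXT NOT NULL       -- ISO 8601 timestamp in UTC (assumption)
)
"""


def record_issue(conn, number, url, classification, title, source_log_ids):
    """Insert one metadata row after GitHub confirms the issue was created."""
    with conn:  # commits on success, rolls back if the insert raises
        conn.execute(SCHEMA)
        conn.execute(
            "INSERT INTO self_monitoring_issues "
            "(github_issue_number, github_issue_url, classification, "
            " title, source_log_ids, created_at) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (
                number,
                url,
                classification,
                title,
                json.dumps(source_log_ids),
                datetime.now(timezone.utc).isoformat(),
            ),
        )
```

Calling this only after the GitHub call returns successfully keeps the table free of rows for issues that were never created.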

### AC2: Claude Prompt Assembly
```gherkin
Given a scheduled self-monitoring scan triggers
When the Claude prompt is assembled
Then it includes: log database path (~/.cidx-server/logs.db), table schema, last_scan_log_id
And it includes list of existing open GitHub issues for duplicate checking
And it includes the editable prompt_template from config
And it instructs Claude to query logs via the sqlite3 CLI (log contents are not embedded in the prompt)
And it warns about large log volumes and smart querying strategy
```
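
One way the prompt assembly in AC2 could look, as a sketch only. The function name `assemble_prompt`, the shape of `open_issues` (dicts with `number` and `title`), and the exact wording of the instructions are assumptions; the real wording would come from the editable `prompt_template`.

```python
# Schema summary passed to Claude; mirrors the logs table in this story.
LOGS_SCHEMA = "logs(id, timestamp, level, source, message, correlation_id, extra_data)"


def assemble_prompt(prompt_template, logs_db_path, last_scan_log_id, open_issues):
    """Build the prompt text; log rows themselves are never included."""
    issue_lines = "\n".join(
        f"- #{issue['number']}: {issue['title']}" for issue in open_issues
    ) or "- (none)"
    sections = [
        prompt_template.strip(),
        f"Log database: {logs_db_path}\n"
        f"Table schema: {LOGS_SCHEMA}\n"
        f"Last scanned log id: {last_scan_log_id}",
        "Query the logs yourself with the sqlite3 CLI. "
        "Log rows are not included in this prompt.",
        "The logs table may be large. Filter on id greater than the last "
        "scanned id, restrict by level, and use LIMIT.",
        "Existing open issues (check before creating new ones):\n" + issue_lines,
    ]
    return "\n\n".join(sections)
```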

### AC3: Log Delta Tracking
```gherkin
Given a successful scan completes
When Claude returns status="SUCCESS" with max_log_id_processed
Then the scan record in self_monitoring_scans is updated with log_id_end
And the next scan will use this as log_id_start

Given a scan fails (status="FAILURE")
When the scan record is updated
Then log_id_end is NOT advanced
And the next scan retries from the same log_id_start
```
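
The delta-tracking rule in AC3 can be sketched as follows. The columns `id`, `status`, `log_id_start`, and `log_id_end` on `self_monitoring_scans` are assumed for illustration, as are the helper names; only the advance-on-success behavior is taken from the criteria.

```python
import sqlite3


def complete_scan(conn, scan_id, response):
    """Record the scan outcome; advance log_id_end only on SUCCESS."""
    with conn:
        if response.get("status") == "SUCCESS":
            conn.execute(
                "UPDATE self_monitoring_scans "
                "SET status = 'SUCCESS', log_id_end = ? WHERE id = ?",
                (response["max_log_id_processed"], scan_id),
            )
        else:
            # log_id_end stays NULL, so the next scan retries the same range.
            conn.execute(
                "UPDATE self_monitoring_scans SET status = 'FAILURE' WHERE id = ?",
                (scan_id,),
            )


def next_log_id_start(conn):
    """Return the log_id_end of the latest successful scan, or 0 if none."""
    row = conn.execute(
        "SELECT MAX(log_id_end) FROM self_monitoring_scans "
        "WHERE status = 'SUCCESS'"
    ).fetchone()
    return row[0] or 0
```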

### AC4: Issue Classification
```gherkin
Given Claude analyzes log entries
When issues are identified
Then they are classified as one of: server_bug, client_misuse, documentation_gap
And server_bug issues get [BUG] prefix
And client_misuse issues get [CLIENT] prefix
And documentation_gap issues get [DOCS] prefix and reference specific MCP tool docs
```
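
The classification-to-prefix mapping in AC4 is small enough to express as a lookup table. The sketch below assumes a hypothetical helper `prefixed_title`; rejecting unknown classifications with `ValueError` is a design choice for illustration, not a requirement of this story.

```python
# Mapping taken directly from AC4.
TITLE_PREFIXES = {
    "server_bug": "[BUG]",
    "client_misuse": "[CLIENT]",
    "documentation_gap": "[DOCS]",
}


def prefixed_title(classification, summary):
    """Return the issue title with the prefix for its classification."""
    try:
        prefix = TITLE_PREFIXES[classification]
    except KeyError:
        raise ValueError(f"unknown classification: {classification!r}") from None
    return f"{prefix} {summary.strip()}"
```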

### AC5: Duplicate Detection Algorithm

```gherkin
Given Claude receives the deduplication context
When a potential new issue is identified from logs
Then Claude applies the three-tier deduplication algorithm:

Tier 1 - Error Code Match (Exact):
  Given a log entry contains [ERROR_CODE] (e.g., [GIT-SYNC-001])
  When an existing open issue title contains the same error code
  Then the issue is a DUPLICATE - skip creation
  And increment duplicates_skipped counter

Tier 2 - Fingerprint Match (Structural):
  Given a log entry without error code OR no Tier 1 match
  When fingerprint is computed as: sha256(classification + ":" + source_file + ":" + error_type)
  And an existing issue has matching fingerprint in metadata
  Then the issue is a DUPLICATE - skip creation

Tier 3 - Semantic Similarity (Fallback):
  Given no Tier 1 or Tier 2 match
  When Claude compares the proposed issue against existing issues
  And semantic similarity exceeds 85% threshold on:
    - Error message pattern (ignoring variable data like IDs, timestamps)
    - Affected component/subsystem
    - Root cause description
  Then the issue is a POTENTIAL DUPLICATE
  And Claude adds comment to existing issue instead of creating new one
  And reports in potential_duplicates_commented count

No Match:
  Given no match in any tier
  Then create new GitHub issue
  And store fingerprint in self_monitoring_issues table
```
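
The order of the three tiers can be sketched as a simple cascade. In this story the decision is made by Claude following the prompt, so the code below is only an illustration of the control flow: the function `decide` and the three injected checkers are hypothetical, and each checker is assumed to return a matching issue number or `None`.

```python
from typing import Callable, Optional, Tuple

# A checker takes the candidate issue and returns the number of the
# matching existing issue, or None when there is no match (assumption).
Checker = Callable[[dict], Optional[int]]


def decide(candidate: dict, tier1: Checker, tier2: Checker,
           tier3: Checker) -> Tuple[str, Optional[int]]:
    """Return (action, matched_issue_number) for a candidate issue.

    Actions: "skip" (Tier 1 or 2 duplicate), "comment" (Tier 3 potential
    duplicate), or "create" (no match in any tier).
    """
    for action, check in (("skip", tier1), ("skip", tier2), ("comment", tier3)):
        match = check(candidate)
        if match is not None:
            return action, match
    return "create", None
```

Evaluating the tiers in order means the cheap, deterministic checks run first and the semantic comparison is only reached when both of them find nothing.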

### AC5b: Deduplication Context Assembly

```gherkin
Given Claude prompt is assembled for log analysis
When deduplication context is included
Then it contains:
  - List of all open GitHub issues whose titles carry the [BUG], [CLIENT], or [DOCS] prefix
  - For each issue: number, title, body (first 500 chars), labels, created_at
  - Fingerprints from self_monitoring_issues table for closed issues (last 90 days)
  - Error code registry from Story #86 (for mapping codes to definitions)

And Claude is instructed to:
  - Extract [ERROR_CODE] from log entries as primary identifier
  - Compute fingerprint for issues without error codes
  - Check all three tiers before deciding to create
  - Prefer commenting on existing issues over creating duplicates
```
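
A sketch of how the deduplication context in AC5b might be assembled before it is rendered into the prompt. The input shapes (issue dicts with `number`, `title`, `body`, `labels`, `created_at`) and the function names are assumptions; the 500-character body limit is from the criteria.

```python
BODY_CHARS = 500  # body excerpt length from AC5b


def summarize_issue(issue):
    """Keep only the fields AC5b lists, with the body truncated."""
    return {
        "number": issue["number"],
        "title": issue["title"],
        "body": (issue.get("body") or "")[:BODY_CHARS],
        "labels": list(issue.get("labels") or []),
        "created_at": issue.get("created_at"),
    }


def build_dedup_context(open_issues, closed_fingerprints, error_code_registry):
    """Bundle everything Claude needs to run the three tiers."""
    return {
        "open_issues": [summarize_issue(issue) for issue in open_issues],
        "closed_issue_fingerprints": sorted(closed_fingerprints),
        "error_code_registry": dict(error_code_registry),
    }
```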

### AC5c: Issue Metadata for Future Deduplication

```gherkin
Given a new issue is created by self-monitoring
When the issue metadata is stored in self_monitoring_issues
Then it includes:
  - github_issue_number, github_issue_url
  - classification (server_bug, client_misuse, documentation_gap)
  - error_codes (array of [ERROR_CODE] values found in source logs)
  - fingerprint (computed hash for Tier 2 matching)
  - source_log_ids (array of log IDs that triggered this issue)
  - source_files (array of source files mentioned in logs)
  - created_at

And this metadata is used in future scans for deduplication
And metadata is retained for 90 days after issue is closed
```

### AC6: Claude Response Format
```gherkin
Given Claude completes log analysis
When it returns the result
Then the response is valid JSON with: status, max_log_id_processed, issues_created (array of issue IDs), duplicates_skipped, potential_duplicates_commented, error
And status is "SUCCESS" if analysis completed (even with zero issues)
And status is "FAILURE" if analysis could not complete
```
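
Response parsing for AC6 could be sketched like this. Treating every listed field as required (with `error` presumably null on success) is an assumption, as is the function name `parse_scan_response`.

```python
import json

# Field names from AC6; treating all of them as mandatory is an assumption.
REQUIRED_KEYS = {
    "status",
    "max_log_id_processed",
    "issues_created",
    "duplicates_skipped",
    "potential_duplicates_commented",
    "error",
}


def parse_scan_response(raw):
    """Parse and validate Claude's JSON response, raising ValueError if bad."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("response must be a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"response is missing keys: {sorted(missing)}")
    if data["status"] not in ("SUCCESS", "FAILURE"):
        raise ValueError(f"unexpected status: {data['status']!r}")
    if not isinstance(data["issues_created"], list):
        raise ValueError("issues_created must be an array")
    return data
```

A caller would likely treat a `ValueError` here the same way as `status="FAILURE"`, so that `log_id_end` is not advanced on an unparseable response.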

## Technical Requirements

### Files to Create
- src/code_indexer/server/self_monitoring/issue_manager.py (extended copy)
- src/code_indexer/server/self_monitoring/scanner.py (Claude job assembly)
- src/code_indexer/server/self_monitoring/standards/bug_report_standards.md

### Path Derivation
- Server data dir: config_service.config_manager.server_dir
- Logs database: {server_dir}/logs.db
- Embedded assets: Path(__file__).parent for self_monitoring package
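
The path rules above amount to two joins. In this sketch `server_dir` stands in for `config_service.config_manager.server_dir` and `package_dir` for `Path(__file__).parent` inside the self_monitoring package; both helper names are hypothetical.

```python
from pathlib import Path


def logs_db_path(server_dir):
    """Logs database lives directly under the server data directory."""
    return Path(server_dir) / "logs.db"


def standards_path(package_dir):
    """Embedded bug report standards, relative to the package directory."""
    return Path(package_dir) / "standards" / "bug_report_standards.md"
```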

### Logs Database Schema Reference
```sql
logs (
    id INTEGER PRIMARY KEY,
    timestamp TEXT,
    level TEXT,  -- DEBUG, INFO, WARNING, ERROR, CRITICAL
    source TEXT,
    message TEXT,
    correlation_id TEXT,
    extra_data TEXT
)
```
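
As an illustration of the "smart querying" AC2 asks for, the sketch below aggregates new log rows by level and source instead of reading every row. In this story Claude issues such queries through the sqlite3 CLI; the Python wrapper, the function name, and the choice to look only at WARNING and above are assumptions made for the example.

```python
import sqlite3


def summarize_new_logs(conn, last_scan_log_id, limit=50):
    """Count new WARNING+ rows per (level, source) since the last scan."""
    return conn.execute(
        "SELECT level, source, COUNT(*) AS n, MAX(id) AS max_id "
        "FROM logs "
        "WHERE id > ? AND level IN ('WARNING', 'ERROR', 'CRITICAL') "
        "GROUP BY level, source "
        "ORDER BY n DESC, level, source "
        "LIMIT ?",
        (last_scan_log_id, limit),
    ).fetchall()
```

Starting from a grouped summary keeps the result size bounded even when the delta since the last scan is large; individual messages can then be fetched only for the groups that look relevant.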

### Deduplication Algorithm Implementation Notes

**Tier 1 (Error Code)**: Fast, deterministic. Requires Story #86 completion for error codes.
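
A sketch of Tier 1 matching. The error code pattern is inferred from the single example `[GIT-SYNC-001]` (uppercase segments joined by hyphens, ending in digits); the actual format is defined by Story #86, so the regular expression and function names here are assumptions.

```python
import re

# Assumed code shape: [WORD-WORD-123]. Plain prefixes such as [BUG] do not
# match because a trailing numeric segment is required.
ERROR_CODE_RE = re.compile(r"\[([A-Z]+(?:-[A-Z]+)*-\d+)\]")


def extract_error_codes(text):
    """Return the set of error codes found in a log message or title."""
    return set(ERROR_CODE_RE.findall(text))


def find_tier1_duplicate(log_message, open_issues):
    """Return the number of an open issue sharing an error code, else None."""
    codes = extract_error_codes(log_message)
    if not codes:
        return None
    for issue in open_issues:
        if codes & extract_error_codes(issue["title"]):
            return issue["number"]
    return None
```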

**Tier 2 (Fingerprint)**: Hash computation: `sha256(classification + ":" + source_file + ":" + error_type)`. Stored in `self_monitoring_issues.fingerprint` column.
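
The Tier 2 hash can be computed with the standard library. This sketch follows the formula above; encoding the joined string as UTF-8 and storing the hex digest are assumptions, and `compute_fingerprint` is a hypothetical name.

```python
import hashlib


def compute_fingerprint(classification, source_file, error_type):
    """sha256 over 'classification:source_file:error_type', as a hex string."""
    payload = f"{classification}:{source_file}:{error_type}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

The `:` separator keeps adjacent fields from running together, so different field combinations are less likely to produce the same input string.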

**Tier 3 (Semantic)**: Claude's judgment call. The prompt instructs Claude to normalize messages (remove timestamps, IDs, paths) before comparison. The 85% threshold is a guideline for that judgment rather than a computed score: in practice, 3+ of 4 key attributes must match.
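
The normalization step for Tier 3 is described as an instruction to Claude, but the same idea can be sketched in code to show what "remove timestamps, IDs, paths" might mean. The specific patterns and placeholders below are assumptions chosen for illustration.

```python
import re

# Order matters: timestamps and UUIDs are replaced before bare numbers.
_RULES = [
    (re.compile(
        r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?"
    ), "<TS>"),
    (re.compile(
        r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
        r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
    ), "<UUID>"),
    (re.compile(r"(?:/[\w.\-]+)+"), "<PATH>"),
    (re.compile(r"\b\d+\b"), "<N>"),
]


def normalize_message(message):
    """Replace variable data with placeholders and collapse whitespace."""
    for pattern, placeholder in _RULES:
        message = pattern.sub(placeholder, message)
    return " ".join(message.split())
```

Two messages that differ only in timestamps, numeric IDs, or file paths normalize to the same string, which makes the error message pattern easier to compare.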

### Dependencies

- **Story #86 (Log Context Inventory)**: PREREQUISITE - Provides error codes for Tier 1 deduplication
- Error code registry must be available before this story can implement Tier 1 matching

## Testing Requirements

- Unit tests for issue_manager.py SQLite extension
- Unit tests for prompt assembly
- Unit tests for response parsing
- Unit tests for fingerprint computation
- Integration test with mock Claude response
- E2E test creating actual GitHub issue (in test repo)
- Test for duplicate detection across all three tiers

## Definition of Done

- [ ] Embedded assets copied and extended
- [ ] Claude prompt assembly includes all required context
- [ ] Log delta tracking works correctly
- [ ] Issue classification produces correct prefixes
- [ ] Three-tier duplicate detection implemented and documented
- [ ] Fingerprint storage in self_monitoring_issues works
- [ ] Response parsing handles success and failure
- [ ] All tests pass
- [ ] Code review approved

