feat(discovery): Improve category coverage detection with AI classification

## Summary

The current category coverage detection in `LeadAgent._get_category_coverage()` uses simple keyword matching to determine which discovery topics have been covered. This approach has several limitations, listed below.

### Current Implementation
```python
category_keywords = {
    "problem": ["problem", "solve", "issue", "challenge", "pain", "need"],
    "users": ["user", "customer", "audience", "target", "who", "people"],
    "features": ["feature", "function", "capability", "able to", "requirement"],
    "constraints": ["constraint", "limit", "budget", "timeline", "restriction"],
    "tech_stack": ["tech", "stack", "language", "framework", "database", "tool"],
}
```

### Limitations
1. **False positives**: An answer such as "I don't have any problem with the current solution" marks "problem" as covered because it contains the keyword
2. **False negatives**: A detailed problem description that uses none of the keywords is missed
3. **Context ignorance**: Matching ignores the semantic meaning of an answer
4. **Language sensitivity**: The keyword lists are English-only, so answers in other languages are unlikely to match
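
The first two limitations can be reproduced with a small sketch. The body of `_get_category_coverage()` is not shown in this issue, so the case-insensitive substring matching below is an assumption, and `keyword_categories` is a hypothetical stand-in rather than the actual method:

```python
from typing import Set

category_keywords = {
    "problem": ["problem", "solve", "issue", "challenge", "pain", "need"],
    "users": ["user", "customer", "audience", "target", "who", "people"],
    "features": ["feature", "function", "capability", "able to", "requirement"],
    "constraints": ["constraint", "limit", "budget", "timeline", "restriction"],
    "tech_stack": ["tech", "stack", "language", "framework", "database", "tool"],
}


def keyword_categories(answer: str) -> Set[str]:
    """Hypothetical baseline: a category counts as covered when any of its
    keywords occurs as a substring of the lowercased answer (assumed logic)."""
    text = answer.lower()
    return {
        category
        for category, keywords in category_keywords.items()
        if any(keyword in text for keyword in keywords)
    }


# Limitation 1: the negated sentence still contains the keyword "problem".
false_positive = keyword_categories("I don't have any problem with the current solution")

# Limitation 2: a real problem description that uses none of the keywords.
false_negative = keyword_categories("Nurses lose an hour every shift retyping charts by hand")
```

Under these assumptions `false_positive` is `{"problem"}` and `false_negative` is the empty set.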

### Proposed Improvements

#### Option 1: AI-Powered Classification (Recommended)
Use Claude to analyze each answer and determine which categories it addresses:

```python
from typing import List


def _classify_answer_categories(self, answer: str, question: str) -> List[str]:
    """Use AI to classify which categories an answer addresses."""
    prompt = f"""Analyze this Q&A from a software project discovery session.

Question: {question}
Answer: {answer}

Which of these categories does the answer provide meaningful information about?
- problem: What problem the application solves
- users: Who the target users are
- features: What features are required
- constraints: Technical or business constraints
- tech_stack: Preferred technologies

Return only the category names that are meaningfully addressed, comma-separated.
If none apply, return "none".
"""
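    # Assumption: `self._call_ai` is a hypothetical helper that returns the
    # model's raw text reply; substitute the agent's actual provider call.
    raw = self._call_ai(prompt)
    valid = {"problem", "users", "features", "constraints", "tech_stack"}
    names = [name.strip().lower() for name in raw.split(",")]
    # "none" and any unrecognized names fall outside `valid` and are dropped.
    return [name for name in names if name in valid]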
```

Benefits:
- Semantic understanding of answers
- Handles varied phrasings
- Can detect partial coverage
- Language agnostic

Drawbacks:
- Additional API calls (cost/latency)
- Non-deterministic
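
One possible way to soften both drawbacks is to memoize results per Q&A pair, so each pair is sent to the model at most once and repeated lookups return the same answer. The sketch below is only an illustration: `cached_classifier` and the `Classifier` alias are hypothetical names, and the wrapped callable is assumed to take `(answer, question)` like `_classify_answer_categories`. It does not make the model itself deterministic; it only stabilizes results within one process.

```python
from typing import Callable, Dict, List, Tuple

# Assumed shape of the classifier: (answer, question) -> category names.
Classifier = Callable[[str, str], List[str]]


def cached_classifier(classify: Classifier) -> Classifier:
    """Wrap a classifier so each distinct (answer, question) pair is
    classified once; later calls reuse the stored result."""
    cache: Dict[Tuple[str, str], List[str]] = {}

    def wrapper(answer: str, question: str) -> List[str]:
        key = (answer.strip(), question.strip())
        if key not in cache:
            cache[key] = classify(answer, question)
        # Return a copy so callers cannot mutate the cached list.
        return list(cache[key])

    return wrapper


# Usage with a stub in place of the real AI call.
calls = []


def fake_classify(answer: str, question: str) -> List[str]:
    calls.append(answer)
    return ["users"]


classify = cached_classifier(fake_classify)
first = classify("Mostly field technicians", "Who will use it?")
second = classify("Mostly field technicians", "Who will use it?")
```

Here the stub is invoked once even though `classify` is called twice.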

#### Option 2: Enhanced Keyword Matching
Improve keyword lists and add:
- Phrase matching ("target audience", "end users")
- Negative keyword exclusion ("don't have a problem")
- Synonym expansion
- Answer length thresholds per category
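
As a rough illustration of the first two items, the sketch below matches whole phrases on word boundaries and skips a match when a negation word appears within three words before it. The phrase lists, the three-word window, and the names `PHRASES`, `NEGATION_BEFORE`, and `enhanced_categories` are all assumptions made for this example; synonym expansion and length thresholds are left out.

```python
import re
from typing import Set

# Hypothetical phrase lists; a real implementation would cover all categories.
PHRASES = {
    "problem": ["problem", "pain point", "challenge"],
    "users": ["target audience", "end users", "customers"],
}

# A negation word followed by at most three words, anchored at the end of
# the text that precedes a phrase match.
NEGATION_BEFORE = re.compile(r"\b(?:no|not|don't|doesn't|without)\b(?:\s+\w+){0,3}\s*$")


def enhanced_categories(answer: str) -> Set[str]:
    """Return categories whose phrases appear un-negated in the answer."""
    text = answer.lower()
    found: Set[str] = set()
    for category, phrases in PHRASES.items():
        for phrase in phrases:
            pattern = r"\b" + re.escape(phrase) + r"\b"
            for match in re.finditer(pattern, text):
                preceding = text[: match.start()]
                if not NEGATION_BEFORE.search(preceding):
                    found.add(category)
    return found


negated = enhanced_categories("I don't have any problem with the current solution")
affirmed = enhanced_categories("Our end users are nurses who struggle with slow charting")
```

With these lists, `negated` is empty and `affirmed` is `{"users"}`. A fixed negation window is a blunt tool, so sentences such as "not only a problem but..." would still be misread.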

#### Option 3: Hybrid Approach
Use keyword matching first, then AI classification only for ambiguous cases.
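
The issue does not define what counts as ambiguous, so the sketch below picks one hypothetical rule: fall back to AI when the keyword stage finds nothing or when the answer contains a negation word. `hybrid_categories`, `is_ambiguous`, and both injected callables are illustrative names, not existing code.

```python
import re
from typing import Callable, List, Set

NEGATION_WORDS = re.compile(r"\b(?:no|not|never|don't|doesn't|without)\b")


def is_ambiguous(answer: str, keyword_hits: Set[str]) -> bool:
    """Assumed rule: no keyword hits, or a negation word that may flip them."""
    return not keyword_hits or bool(NEGATION_WORDS.search(answer.lower()))


def hybrid_categories(
    answer: str,
    question: str,
    keyword_match: Callable[[str], Set[str]],
    ai_classify: Callable[[str, str], List[str]],
) -> Set[str]:
    """Trust keyword hits for clear answers; defer ambiguous ones to AI."""
    hits = keyword_match(answer)
    if is_ambiguous(answer, hits):
        return set(ai_classify(answer, question))
    return hits


# Usage with stubs standing in for the keyword stage and the AI call.
ai_calls = []


def stub_keywords(answer: str) -> Set[str]:
    return {"problem"} if "problem" in answer.lower() else set()


def stub_ai(answer: str, question: str) -> List[str]:
    ai_calls.append(answer)
    return []


clear = hybrid_categories("The main problem is slow reporting", "What hurts?", stub_keywords, stub_ai)
unclear = hybrid_categories("I don't have any problem", "What hurts?", stub_keywords, stub_ai)
```

In this run only the negated answer reaches the AI stub, so the clear answer costs no API call.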

### Acceptance Criteria
- [ ] Category detection correctly identifies topic coverage from semantic meaning
- [ ] False positive rate reduced (e.g., "no problem" doesn't mark problem covered)
- [ ] Tests updated to verify improved accuracy
- [ ] Document the classification approach in `docs/discovery-socratic-methodology.md`
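
To make the false-positive criterion checkable, a test could compute the rate over a small hand-labeled set and compare the old and new detectors. The helper below is a sketch under that assumption; `false_positive_rate` and the example answers are hypothetical and not taken from the existing test suite.

```python
from typing import Callable, List, Set, Tuple

# (answer, categories a human reviewer says are covered)
LabeledExample = Tuple[str, Set[str]]


def false_positive_rate(
    classify: Callable[[str], Set[str]],
    examples: List[LabeledExample],
    category: str,
) -> float:
    """Share of answers NOT labeled with `category` that the classifier
    nevertheless flags with it."""
    negatives = [answer for answer, expected in examples if category not in expected]
    if not negatives:
        return 0.0
    flagged = sum(1 for answer in negatives if category in classify(answer))
    return flagged / len(negatives)


# Hypothetical labeled answers for illustration only.
examples: List[LabeledExample] = [
    ("I don't have any problem with the current solution", set()),
    ("No problem at all", set()),
    ("We use PostgreSQL", {"tech_stack"}),
    ("Slow reports are our biggest problem", {"problem"}),
]


def keyword_baseline(answer: str) -> Set[str]:
    # Stand-in for the current keyword detector, reduced to one keyword.
    return {"problem"} if "problem" in answer.lower() else set()


baseline_rate = false_positive_rate(keyword_baseline, examples, "problem")
```

On this toy set the baseline flags two of the three negative answers, and an improved detector would be expected to score lower on the same examples.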

### Related
- PR #257: Initial Socratic discovery implementation
- Current location: `codeframe/agents/lead_agent.py:_get_category_coverage()`
