Open
Labels
enhancement (New feature or request)
Description
Summary
The current category coverage detection in LeadAgent._get_category_coverage() uses simple keyword matching to determine which discovery topics have been covered. This approach has several limitations, described below.
Current Implementation
category_keywords = {
    "problem": ["problem", "solve", "issue", "challenge", "pain", "need"],
    "users": ["user", "customer", "audience", "target", "who", "people"],
    "features": ["feature", "function", "capability", "able to", "requirement"],
    "constraints": ["constraint", "limit", "budget", "timeline", "restriction"],
    "tech_stack": ["tech", "stack", "language", "framework", "database", "tool"],
}
Limitations
- False positives: The answer "I don't have any problem with the current solution" would mark "problem" as covered
- False negatives: Detailed problem descriptions that contain none of the keywords are missed
- Context ignorance: Matching ignores the semantic meaning of answers
- Language sensitivity: Only English keywords are matched
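The first two limitations can be reproduced with a small sketch. The keyword table is the one listed above; the matching logic is an assumed reconstruction (case-insensitive substring search), since the body of _get_category_coverage() is not shown in this issue, and naive_categories is a hypothetical name.

```python
from typing import Dict, List, Set

# Keyword table copied from the issue. The matching logic below is an assumed
# reconstruction (case-insensitive substring search), not the real method body.
CATEGORY_KEYWORDS: Dict[str, List[str]] = {
    "problem": ["problem", "solve", "issue", "challenge", "pain", "need"],
    "users": ["user", "customer", "audience", "target", "who", "people"],
    "features": ["feature", "function", "capability", "able to", "requirement"],
    "constraints": ["constraint", "limit", "budget", "timeline", "restriction"],
    "tech_stack": ["tech", "stack", "language", "framework", "database", "tool"],
}


def naive_categories(answer: str) -> Set[str]:
    """Return every category with at least one keyword appearing in the answer."""
    text = answer.lower()
    return {
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    }


# False positive: the user says there is no problem, yet "problem" is flagged.
false_positive = naive_categories("I don't have any problem with the current solution")

# False negative: a real problem statement that contains none of the keywords.
false_negative = naive_categories("Our invoices take three days to reconcile by hand")
```

Under these assumptions, false_positive contains "problem" and false_negative is empty.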
Proposed Improvements
Option 1: AI-Powered Classification (Recommended)
Use Claude to analyze each answer and determine which categories it addresses:
def _classify_answer_categories(self, answer: str, question: str) -> List[str]:
    """Use AI to classify which categories an answer addresses."""
    prompt = f"""Analyze this Q&A from a software project discovery session.
Question: {question}
Answer: {answer}
Which of these categories does the answer provide meaningful information about?
- problem: What problem the application solves
- users: Who the target users are
- features: What features are required
- constraints: Technical or business constraints
- tech_stack: Preferred technologies
Return only the category names that are meaningfully addressed, comma-separated.
If none apply, return "none".
"""
    # Call AI and parse response
Benefits:
- Semantic understanding of answers
- Handles varied phrasings
- Can detect partial coverage
- Language agnostic
Drawbacks:
- Additional API calls (cost/latency)
- Non-deterministic
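One way the "call AI and parse response" step might look is sketched below. To avoid guessing at any client API, the model call is injected as a plain callable; complete, parse_categories, and classify_answer are hypothetical names, and the prompt is a shortened variant of the one above. Filtering the reply against the known category names also limits the effect of non-deterministic or malformed output.

```python
from typing import Callable, List

CATEGORIES = ("problem", "users", "features", "constraints", "tech_stack")


def parse_categories(response: str) -> List[str]:
    """Turn a comma-separated model reply into known category names."""
    names = {part.strip(" .\n\t").lower() for part in response.split(",")}
    # "none", empty fragments, and unknown names fall out here because only
    # members of CATEGORIES are kept (in canonical order, without duplicates).
    return [category for category in CATEGORIES if category in names]


def classify_answer(
    question: str, answer: str, complete: Callable[[str], str]
) -> List[str]:
    """Build a prompt, call the injected completion function, parse the reply."""
    # `complete` is an assumption: any function that takes a prompt string
    # and returns the model's text reply.
    prompt = (
        "Analyze this Q&A from a software project discovery session.\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        f"Categories: {', '.join(CATEGORIES)}\n"
        "Return only the category names that are meaningfully addressed, "
        'comma-separated. If none apply, return "none".'
    )
    return parse_categories(complete(prompt))
```

Injecting the completion function keeps the parser testable without network access; a unit test can pass a stub such as lambda prompt: "users, problem".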
Option 2: Enhanced Keyword Matching
Improve keyword lists and add:
- Phrase matching ("target audience", "end users")
- Negative keyword exclusion ("don't have a problem")
- Synonym expansion
- Answer length thresholds per category
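A minimal sketch of these four additions follows. The phrase lists, negation cues, word-count thresholds, and the name enhanced_categories are all illustrative assumptions, and only two categories are shown to keep it short.

```python
import re
from typing import Dict, List, Set

# Hypothetical phrase lists (including a few synonyms) and per-category
# minimum answer lengths, chosen only for illustration.
PHRASES: Dict[str, List[str]] = {
    "problem": ["problem", "pain point", "challenge", "struggle"],
    "users": ["target audience", "end users", "customers"],
}
MIN_WORDS: Dict[str, int] = {"problem": 5, "users": 3}

# A negation cue followed, within at most three intervening words, by a phrase.
NEGATION = r"\b(?:no|not|never|don't|doesn't|without)\b(?:\W+\w+){0,3}?\W+"


def enhanced_categories(answer: str) -> Set[str]:
    """Keyword matching with whole-phrase search, negation exclusion, length gates."""
    text = answer.lower()
    word_count = len(text.split())
    covered: Set[str] = set()
    for category, phrases in PHRASES.items():
        if word_count < MIN_WORDS[category]:
            continue  # answer too short to count as coverage for this category
        for phrase in phrases:
            escaped = re.escape(phrase)
            if not re.search(r"\b" + escaped + r"\b", text):
                continue
            # Simplification: one negated occurrence rules the phrase out,
            # even if the same phrase also appears un-negated elsewhere.
            if re.search(NEGATION + escaped + r"\b", text):
                continue
            covered.add(category)
            break
    return covered
```

With these lists, "I don't have any problem with the current solution" no longer marks "problem" as covered, while "Our end users are accountants at small firms" marks "users".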
Option 3: Hybrid Approach
Use keyword matching first, then AI classification only for ambiguous cases.
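The routing could be sketched as follows. What counts as "ambiguous" is not defined in this issue, so the two signals used here (a negation cue next to keyword hits, or a non-trivial answer with no hits at all) are assumptions, as are the names hybrid_categories, is_ambiguous, and the injected classify callable standing in for the AI classifier.

```python
import re
from typing import Callable, Dict, List, Set

CATEGORY_KEYWORDS: Dict[str, List[str]] = {
    "problem": ["problem", "solve", "issue", "challenge", "pain", "need"],
    "users": ["user", "customer", "audience", "target", "who", "people"],
    "features": ["feature", "function", "capability", "able to", "requirement"],
    "constraints": ["constraint", "limit", "budget", "timeline", "restriction"],
    "tech_stack": ["tech", "stack", "language", "framework", "database", "tool"],
}

# Assumed ambiguity signals; neither comes from the issue itself.
NEGATION_CUE = re.compile(r"\b(?:no|not|never|without)\b|n't\b")
MIN_WORDS_FOR_REVIEW = 4


def keyword_hits(answer: str) -> Set[str]:
    """Cheap first pass: categories whose keywords appear in the answer."""
    text = answer.lower()
    return {c for c, kws in CATEGORY_KEYWORDS.items() if any(k in text for k in kws)}


def is_ambiguous(answer: str, hits: Set[str]) -> bool:
    """Decide whether the keyword result should be double-checked by the AI."""
    text = answer.lower()
    if hits:
        # Keywords matched, but a negation may have flipped their meaning.
        return NEGATION_CUE.search(text) is not None
    # Nothing matched: a longer answer may still carry information.
    return len(text.split()) >= MIN_WORDS_FOR_REVIEW


def hybrid_categories(answer: str, classify: Callable[[str], Set[str]]) -> Set[str]:
    """Use keyword hits when they look reliable, else defer to `classify`."""
    hits = keyword_hits(answer)
    if is_ambiguous(answer, hits):
        return classify(answer)
    return hits
```

In this sketch, an answer like "Our customers are small accounting firms" is resolved by keywords alone, so the AI classifier is only called for negated or keyword-free answers.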
Acceptance Criteria
- Category detection correctly identifies topic coverage from semantic meaning
- False positive rate reduced (e.g., "no problem" doesn't mark problem covered)
- Tests updated to verify improved accuracy
- Document the classification approach in docs/discovery-socratic-methodology.md
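To make "false positive rate reduced" checkable in tests, one option is a small helper that scores any classifier against hand-labeled answers. This is only a sketch: false_positive_rate and the LabeledExample shape are hypothetical, and a labeled example set would still need to be written for the project.

```python
from typing import Callable, Iterable, Set, Tuple

# (answer text, categories a human reviewer judged as actually covered)
LabeledExample = Tuple[str, Set[str]]


def false_positive_rate(
    classify: Callable[[str], Set[str]],
    examples: Iterable[LabeledExample],
    category: str,
) -> float:
    """Share of answers NOT labeled with `category` that the classifier flags anyway."""
    negatives = [answer for answer, labels in examples if category not in labels]
    if not negatives:
        return 0.0  # no negative examples, so nothing can be a false positive
    flagged = sum(1 for answer in negatives if category in classify(answer))
    return flagged / len(negatives)
```

A test could then compute this rate for the old and new detection on the same examples and assert that the new one is lower.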
Related
- PR #257 (feat(discovery): implement AI-powered Socratic questioning system): Initial Socratic discovery implementation
- Current location: codeframe/agents/lead_agent.py:_get_category_coverage()