111 changes: 90 additions & 21 deletions CLAUDE.md
@@ -1,10 +1,10 @@
# CodeFRAME Development Guidelines (v2 Reset)

Last updated: 2026-02-03
Last updated: 2026-02-15

This repo is in an **in-place v2 refactor** ("strangler rewrite"). The goal is to deliver a **headless, CLI-first Golden Path** and treat all UI/server layers as optional adapters.

**Status: Phase 1 Complete ✅ | Phase 2 Complete ✅** - Server layer with full REST API, authentication, rate limiting, and real-time streaming. See `docs/V2_STRATEGIC_ROADMAP.md` for the 5-phase plan.
**Status: Phase 1 ✅ | Phase 2 ✅ | Phase 2.5 ✅** - ReAct agent is default engine. Server layer with full REST API, authentication, rate limiting, and real-time streaming. See `docs/V2_STRATEGIC_ROADMAP.md` for the 5-phase plan.

If you are an agent working in this repo: **do not improvise architecture**. Follow the documents listed below.

@@ -31,13 +31,14 @@ If you are an agent working in this repo: **do not improvise architecture**. Fol

---

## Current Reality (Phase 1 & 2 Complete)
## Current Reality (Phase 1, 2 & 2.5 Complete)

### What's Working Now
- **Full agent execution**: `cf work start <task-id> --execute`
- **Full agent execution**: `cf work start <task-id> --execute` (uses ReAct engine by default)
- **Engine selection**: `--engine react` (default) or `--engine plan` (legacy)
- **Verbose mode**: `cf work start <task-id> --execute --verbose` shows detailed progress
- **Dry run mode**: `cf work start <task-id> --execute --dry-run`
- **Self-correction loop**: Agent automatically fixes failing verification gates (up to 3 attempts)
- **Self-correction loop**: Agent automatically fixes failing verification gates (up to 5 attempts with ReAct)
Comment on lines +34 to +41

⚠️ Potential issue | 🟡 Minor

Align self-correction retry count across the doc.
Line 41 says “up to 5 attempts,” but the later “Agent Self-Correction & Observability” section still states 3 retries. Please reconcile the count in one place or the other.

🤖 Prompt for AI Agents
In `@CLAUDE.md` around lines 34 - 41, The document is inconsistent: the "What's
Working Now" section mentions "up to 5 attempts" while the "Agent
Self-Correction & Observability" section still says "3 retries"; update the
latter so both places match (change the text in the "Agent Self-Correction &
Observability" section from "3 retries" to "up to 5 attempts" or vice versa if
you prefer 3), ensuring the phrases "up to 5 attempts" and "Agent
Self-Correction & Observability" are consistent across the file.

- **FAILED task status**: Tasks can transition to FAILED for proper error visibility
- **Tech stack configuration**: `cf init . --detect` auto-detects tech stack from project files
- **Project preferences**: Agent loads AGENTS.md or CLAUDE.md for per-project configuration
@@ -79,9 +80,12 @@ If you are an agent working in this repo: **do not improvise architecture**. Fol
```
codeframe/
├── core/ # Headless domain + orchestration (NO FastAPI imports)
│ ├── agent.py # Agent orchestrator with blocker detection
│ ├── planner.py # LLM-powered implementation planning
│ ├── executor.py # Code execution engine with rollback
│ ├── react_agent.py # ReAct agent (default engine) - observe-think-act loop
│ ├── tools.py # Tool definitions for ReAct agent (7 tools)
│ ├── editor.py # Search-replace file editor with fuzzy matching
│ ├── agent.py # Legacy plan-based agent (--engine plan)
│ ├── planner.py # LLM-powered implementation planning (plan engine)
│ ├── executor.py # Code execution engine with rollback (plan engine)
│ ├── context.py # Task context loader with relevance scoring
│ ├── tasks.py # Task management with depends_on field
│ ├── blockers.py # Human-in-the-loop blocker system
@@ -200,14 +204,17 @@ At all times:

| Component | File | Purpose |
|-----------|------|---------|
| **ReactAgent** | **`core/react_agent.py`** | **Default engine: observe-think-act loop with tool use** |
| **Tools** | **`core/tools.py`** | **7 agent tools: read/edit/create file, run command/tests, search, list** |
| **Editor** | **`core/editor.py`** | **Search-replace editor with 4-level fuzzy matching** |
| LLM Adapter | `adapters/llm/base.py` | Protocol, ModelSelector, Purpose enum |
| Anthropic Provider | `adapters/llm/anthropic.py` | Claude integration with streaming |
| Mock Provider | `adapters/llm/mock.py` | Testing with call tracking |
| Context Loader | `core/context.py` | Codebase scanning, relevance scoring |
| Planner | `core/planner.py` | Task → ImplementationPlan via LLM |
| Executor | `core/executor.py` | File ops, shell commands, rollback |
| Agent | `core/agent.py` | Orchestration loop, blocker detection |
| Runtime | `core/runtime.py` | Run lifecycle, agent invocation |
| Planner | `core/planner.py` | Task → ImplementationPlan via LLM (plan engine) |
| Executor | `core/executor.py` | File ops, shell commands, rollback (plan engine) |
| Agent (legacy) | `core/agent.py` | Plan-based orchestration (--engine plan) |
| Runtime | `core/runtime.py` | Run lifecycle, engine selection, agent invocation |
| Conductor | `core/conductor.py` | Batch orchestration, worker pool |
| Dependency Graph | `core/dependency_graph.py` | DAG operations, topological sort |
| Dependency Analyzer | `core/dependency_analyzer.py` | LLM-based dependency inference |
@@ -228,13 +235,50 @@ Task-based heuristic via `Purpose` enum:

Future: `cf tasks set provider <id> <provider>` for per-task override.

### Execution Flow
```
### Engine Selection

CodeFRAME supports two execution engines, selected via `--engine`:

| Engine | Flag | Pattern | Best For |
|--------|------|---------|----------|
| **ReAct** (default) | `--engine react` | Observe → Think → Act loop | Most tasks, adaptive execution |
| **Plan** (legacy) | `--engine plan` | Plan all steps → Execute sequentially | Well-defined, predictable tasks |
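
The table above amounts to a name-to-engine dispatch with `react` as the default. A minimal sketch of that idea follows, assuming a plain dictionary registry. `run_react`, `run_plan`, `ENGINES`, and `execute` are hypothetical names for illustration only; they are not the signatures used in `core/runtime.py`.

```python
from typing import Callable, Dict

# Hypothetical stand-ins for the two engines. The real implementations live in
# core/react_agent.py and core/agent.py and take different arguments.
def run_react(task_id: str) -> str:
    return f"react:{task_id}"


def run_plan(task_id: str) -> str:
    return f"plan:{task_id}"


# Assumed registry: engine name -> callable that runs a task.
ENGINES: Dict[str, Callable[[str], str]] = {
    "react": run_react,
    "plan": run_plan,
}
DEFAULT_ENGINE = "react"


def execute(task_id: str, engine: str = DEFAULT_ENGINE) -> str:
    """Dispatch to the selected engine, rejecting unknown names early."""
    try:
        runner = ENGINES[engine]
    except KeyError:
        raise ValueError(
            f"unknown engine {engine!r}; expected one of {sorted(ENGINES)}"
        ) from None
    return runner(task_id)
```

Keeping the default in one constant means a later change of default touches a single line rather than every function signature that accepts an `engine` argument.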

### Execution Flow (ReAct — default)
```text
cf work start <id> --execute [--verbose]
├── runtime.start_task_run() # Creates run, transitions task→IN_PROGRESS
└── runtime.execute_agent(verbose=True/False)
└── runtime.execute_agent(engine="react")
└── ReactAgent.run(task_id)
├── Load context (PRD, codebase, blockers, AGENTS.md, tech_stack)
├── Build layered system prompt
└── Tool-use loop (until complete/blocked/failed):
├── LLM decides next action (tool call)
├── Execute tool: read_file, edit_file, create_file,
│ run_command, run_tests, search_codebase, list_files
├── Observe result → feed back to LLM
├── Incremental verification (ruff after file changes)
└── Token budget management (3-tier compaction)
└── Final verification with self-correction (up to 5 retries)
└── Update run/task status based on agent result
├── COMPLETED → complete_run() → task→DONE
├── BLOCKED → block_run() → task→BLOCKED
└── FAILED → fail_run() → task→FAILED
```
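
The tool-use loop in the flow above can be sketched roughly as follows. This is an illustration under stated assumptions, not the `ReactAgent` implementation: `Action`, `LoopResult`, `react_loop`, and the `decide` callback are hypothetical, the LLM is replaced by a scripted function, and the BLOCKED state, incremental linting, and token compaction are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Action:
    tool: Optional[str]  # None means the model declares the task finished
    argument: str = ""


@dataclass
class LoopResult:
    status: str  # "COMPLETED" or "FAILED" in this simplified sketch
    observations: List[str] = field(default_factory=list)


def react_loop(
    decide: Callable[[List[str]], Action],
    tools: Dict[str, Callable[[str], str]],
    max_iterations: int = 10,
) -> LoopResult:
    """Run think -> act -> observe until done or the budget runs out."""
    observations: List[str] = []
    for _ in range(max_iterations):
        action = decide(observations)  # think: choose the next step
        if action.tool is None:
            return LoopResult("COMPLETED", observations)
        handler = tools.get(action.tool)
        if handler is None:
            # Feed the mistake back so the next decision can correct it.
            observations.append(f"error: unknown tool {action.tool}")
            continue
        observations.append(handler(action.argument))  # act, then observe
    return LoopResult("FAILED", observations)  # iteration budget exhausted


# Scripted stand-in for the LLM: one action per observation seen so far.
def scripted_decide(observations: List[str]) -> Action:
    script = [Action("read_file", "app.py"), Action("run_tests"), Action(None)]
    return script[len(observations)]


demo_tools = {
    "read_file": lambda path: f"contents of {path}",
    "run_tests": lambda _: "tests passed",
}
result = react_loop(scripted_decide, demo_tools)
```

Because each decision sees every earlier observation, the loop can react to a failed command on the next step, which is the main difference from planning all steps up front.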

### Execution Flow (Plan — legacy, `--engine plan`)
```text
cf work start <id> --execute --engine plan
├── runtime.start_task_run()
└── runtime.execute_agent(engine="plan")
├── agent.run(task_id)
│ ├── Load context (PRD, codebase, blockers, AGENTS.md)
@@ -289,7 +333,8 @@ cf tasks show <id>

# Work execution (single task)
cf work start <task-id> # Creates run record
cf work start <task-id> --execute # Runs AI agent
cf work start <task-id> --execute # Runs AI agent (ReAct engine, default)
cf work start <task-id> --execute --engine plan # Use legacy plan engine
cf work start <task-id> --execute --verbose # With detailed output
cf work start <task-id> --execute --dry-run # Preview changes
cf work stop <task-id> # Cancel stale run
@@ -298,13 +343,14 @@ cf work follow <task-id> # Stream real-time output
cf work follow <task-id> --tail 50 # Show last 50 lines then stream

# Batch execution (multiple tasks)
cf work batch run <id1> <id2> ... # Execute multiple tasks
cf work batch run <id1> <id2> ... # Execute multiple tasks (ReAct default)
cf work batch run --all-ready # All READY tasks
cf work batch run --all-ready --engine plan # Use legacy plan engine
cf work batch run --strategy serial # Serial (default)
cf work batch run --strategy parallel # Parallel execution
cf work batch run --strategy auto # LLM-inferred dependencies
cf work batch run --max-parallel 4 # Concurrent limit
cf work batch run --retry 3 # Auto-retry failures
cf work batch run --retry 3 # Auto-retry failures
cf work batch status [batch_id] # Show batch status
cf work batch cancel <batch_id> # Cancel running batch
cf work batch resume <batch_id> # Re-run failed tasks
@@ -360,6 +406,11 @@ Do not expand frontend scope during Golden Path work.
- `docs/AGENT_IMPLEMENTATION_TASKS.md` - Agent system components
- `docs/V2_STRATEGIC_ROADMAP.md` - 5-phase plan from CLI to multi-agent

### Agent Architecture (Phase 2.5)
- `docs/AGENT_V3_UNIFIED_PLAN.md` - ReAct architecture design and rules
- `docs/REACT_AGENT_ARCHITECTURE.md` - Deep-dive: tools, editor, token management
- `docs/REACT_AGENT_ANALYSIS.md` - Golden path test run analysis

### API Documentation (Phase 2)
- `/docs` - Swagger UI (interactive API explorer)
- `/redoc` - ReDoc (readable API documentation)
@@ -406,19 +457,37 @@ If you are unsure which direction to take, default to:

---

## Recent Updates (2026-02-03)
## Recent Updates (2026-02-15)

### Phase 2 Complete: Server Layer
All Phase 2 deliverables are complete:
### Phase 2.5 Complete: ReAct Agent Architecture (#355)
Default execution engine switched from plan-based to **ReAct (Reasoning + Acting)**.

**What changed:**
- Default engine is now `"react"` — all `cf work start --execute` and `cf work batch run` commands use ReactAgent
- Legacy plan engine available via `--engine plan` flag
- ReactAgent uses iterative tool-use loop (observe → think → act) instead of plan-all-then-execute
- 7 structured tools: `read_file`, `edit_file`, `create_file`, `run_command`, `run_tests`, `search_codebase`, `list_files`
- Search-replace editing with 4-level fuzzy matching (exact → whitespace-normalized → indentation-agnostic → fuzzy)
- Token budget management with 3-tier compaction
- Adaptive iteration budget based on task complexity

**Phase 2.5 deliverables:**
- ✅ ReAct agent implementation (`core/react_agent.py`, `core/tools.py`, `core/editor.py`)
- ✅ CLI `--engine` flag (#353)
- ✅ API engine parameter (#354)
- ✅ Default switch to react + documentation (#355)

| Phase | Focus | Status |
|-------|-------|--------|
| 1 | CLI Completion | ✅ **Complete** |
| 2 | Server Layer | ✅ **Complete** |
| 2.5 | ReAct Agent | ✅ **Complete** |
| 3 | Web UI Rebuild | Planned |
| 4 | Multi-Agent Coordination | Planned |
| 5 | Advanced Features | Planned |

### Phase 2 Complete: Server Layer (2026-02-03)

**Phase 2 deliverables completed:**
- ✅ Server audit and refactor (#322) - 15 v2 routers following thin adapter pattern
- ✅ API key authentication (#326) - Scopes: read/write/admin
14 changes: 7 additions & 7 deletions codeframe/cli/app.py
@@ -1994,9 +1994,9 @@ def work_start(
help="Run stub agent (for testing, does nothing real)",
),
engine: str = typer.Option(
"plan",
"react",
"--engine",
help="Agent engine: 'plan' (default, step-based) or 'react' (ReAct tool-use loop)",
help="Agent engine: 'react' (default, ReAct tool-use loop) or 'plan' (legacy step-based)",
),
) -> None:
"""Start working on a task.
@@ -2007,7 +2007,7 @@ def work_start(
Example:
codeframe work start abc123
codeframe work start abc123 --execute
codeframe work start abc123 --execute --engine react
codeframe work start abc123 --execute --engine plan
codeframe work start abc123 --execute --dry-run
codeframe work start abc123 --execute --verbose
"""
@@ -2056,7 +2056,7 @@ def work_start(
mode = "[dim](dry run)[/dim]" if dry_run else ""
debug_mode = " [dim](debug logging enabled)[/dim]" if debug else ""
verbose_mode = " [dim](verbose)[/dim]" if verbose else ""
engine_mode = f" [dim](engine={engine})[/dim]" if engine != "plan" else ""
engine_mode = f" [dim](engine={engine})[/dim]" if engine != "react" else ""
console.print(f"\n[bold]Executing agent...{mode}{debug_mode}{verbose_mode}{engine_mode}[/bold]")

try:
@@ -2860,9 +2860,9 @@ def batch_run(
help="Run verification gates (pytest, ruff) after successful batch completion",
),
engine: str = typer.Option(
"plan",
"react",
"--engine",
help="Agent engine: 'plan' (default, step-based) or 'react' (ReAct tool-use loop)",
help="Agent engine: 'react' (default, ReAct tool-use loop) or 'plan' (legacy step-based)",
),
) -> None:
"""Execute multiple tasks in batch.
@@ -2876,7 +2876,7 @@ def batch_run(
codeframe work batch run task1 task2 task3
codeframe work batch run --all-ready
codeframe work batch run --all-ready --strategy serial
codeframe work batch run --all-ready --engine react
codeframe work batch run --all-ready --engine plan
codeframe work batch run task1 task2 --dry-run
codeframe work batch run task1 task2 --retry 2
"""
8 changes: 4 additions & 4 deletions codeframe/core/conductor.py
@@ -456,7 +456,7 @@ class BatchRun:
started_at: datetime
completed_at: Optional[datetime]
results: dict[str, str] = field(default_factory=dict)
engine: str = "plan"
engine: str = "react"


def start_batch(
@@ -468,7 +468,7 @@ def start_batch(
dry_run: bool = False,
max_retries: int = 0,
on_event: Optional[Callable[[str, dict], None]] = None,
engine: str = "plan",
engine: str = "react",
) -> BatchRun:
"""Start a batch execution of multiple tasks.

@@ -481,7 +481,7 @@ def start_batch(
dry_run: If True, don't actually execute tasks
max_retries: Max retry attempts for failed tasks (0 = no retries)
on_event: Optional callback for batch events
engine: Agent engine to use ("plan" or "react")
engine: Agent engine to use ("react" default, or "plan" for legacy)

Returns:
BatchRun with results populated
@@ -1699,7 +1699,7 @@ def _execute_task_subprocess(
workspace: Workspace,
task_id: str,
batch_id: Optional[str] = None,
engine: str = "plan",
engine: str = "react",
) -> str:
"""Execute a single task via subprocess.

4 changes: 2 additions & 2 deletions codeframe/core/runtime.py
@@ -596,7 +596,7 @@ def execute_agent(
verbose: bool = False,
fix_coordinator: Optional["GlobalFixCoordinator"] = None,
event_publisher: Optional["EventPublisher"] = None,
engine: str = "plan",
engine: str = "react",
) -> "AgentState":
"""Execute a task using the agent orchestrator.

@@ -611,7 +611,7 @@ def execute_agent(
verbose: If True, print detailed progress to stdout
fix_coordinator: Optional coordinator for global fixes (for parallel execution)
event_publisher: Optional EventPublisher for SSE streaming (real-time events)
engine: Agent engine to use ("plan" for existing Agent, "react" for ReactAgent)
engine: Agent engine to use ("react" for ReactAgent (default), "plan" for legacy Agent)

Returns:
Final AgentState after execution
10 changes: 5 additions & 5 deletions codeframe/ui/routers/tasks_v2.py
@@ -52,8 +52,8 @@ class ApproveTasksRequest(BaseModel):
description="Whether to start batch execution after approval",
)
engine: str = Field(
"plan",
description="Execution engine: 'plan' (default) or 'react' (ReAct loop)",
"react",
description="Execution engine: 'react' (default, ReAct loop) or 'plan' (legacy step-based)",
)

@model_validator(mode="after")
Expand Down Expand Up @@ -109,8 +109,8 @@ class StartExecutionRequest(BaseModel):
description="Number of retries for failed tasks",
)
engine: str = Field(
"plan",
description="Execution engine: 'plan' (default) or 'react' (ReAct loop)",
"react",
description="Execution engine: 'react' (default, ReAct loop) or 'plan' (legacy step-based)",
)

@model_validator(mode="after")
@@ -586,7 +586,7 @@ async def start_single_task(
execute: bool = Query(False, description="Run agent execution (requires ANTHROPIC_API_KEY)"),
dry_run: bool = Query(False, description="Preview changes without making them"),
verbose: bool = Query(False, description="Show detailed progress output"),
engine: Literal["plan", "react"] = Query("plan", description="Execution engine: 'plan' (default) or 'react' (ReAct loop)"),
engine: Literal["plan", "react"] = Query("react", description="Execution engine: 'react' (default, ReAct loop) or 'plan' (legacy step-based)"),
workspace: Workspace = Depends(get_v2_workspace),
) -> dict[str, Any]:
"""Start a single task run.
6 changes: 3 additions & 3 deletions docs/AGENT_V3_UNIFIED_PLAN.md
@@ -1,7 +1,7 @@
# Agent V3: Unified Architectural Plan

**Date**: 2026-02-07
**Status**: Final Draft - Synthesized from research team debate
**Status**: ✅ Implemented - Default engine since 2026-02-15 (#355)
**Sources**: AGENT_ARCHITECTURE_RESEARCH.md, AGENT_FRAMEWORK_DEEP_DIVE.md, AGENT_ARCHITECTURE_CRITIQUE.md, REACT_AGENT_ARCHITECTURE.md

---
@@ -27,7 +27,7 @@ This plan redesigns CodeFRAME's agent execution from Plan-and-Execute to a **Hyb
3. **Lint after every file change** — catch errors immediately, not after 92 accumulate
4. **Model is the planner** — the LLM decides what to do next based on observed reality
5. **Fewer tools = higher accuracy** — 7 focused tools, not a large surface area
6. **Backward compatible** — `--engine plan` preserved as default until ReAct is validated
6. **Backward compatible** — `--engine plan` available as fallback (ReAct is now default)

---

@@ -39,7 +39,7 @@ This plan redesigns CodeFRAME's agent execution from Plan-and-Execute to a **Hyb
cf work start <task-id> --execute [--engine react]
├── runtime.start_task_run()
│ └── Select engine: "plan" (default, existing) or "react" (new)
│ └── Select engine: "react" (default) or "plan" (legacy)
└── runtime.execute_agent(engine="react")