Closed
Labels
phase-2.5 (Phase 2.5: Agent Execution Redesign (ReAct)), phase-2.5-F (Phase F: Validation & Default Switch), quality, testing
Description
Context
Re-running the end-to-end CLI validation (#353) with the ReAct agent revealed that all 12 tasks completed successfully (DONE status, 0 blockers, 18 min total), but the generated code has 3 categories of bugs that the verification gates failed to catch. The orchestration pipeline works; the quality assurance has gaps.
Test run: cf-test project, todo app (FastAPI + Next.js), serial strategy, 12 LLM-generated tasks.
Bugs the gates missed
| Bug | Root Cause | Why Gate Missed It |
|---|---|---|
| `init_db()` signature mismatch: test passes an argument, function takes 0 | Agent wrote tests against an imagined API | pytest gate likely saw import errors and treated them as "no tests" rather than failure |
| Double `/api/api/` prefix: `APIRouter(prefix="/api")` + `include_router(prefix="/api")` | Cross-file inconsistency between tasks | No integration/smoke test gate exists |
| TypeScript type error: `completed` passed to `CreateTodoRequest`, which only has `title` | Frontend tasks didn't cross-check the API service types | No `tsc` gate exists in the gate system |
| Task ordering: `requirements.txt` generated as task 11 but tests ran in task 6 | Serial strategy uses priority order, not dependency order | No dep-install-before-test enforcement |
Requirements
1. Add TypeScript type-check gate
- Detect `tsconfig.json` in workspace → run `npx tsc --noEmit` or `npm run type-check` if script exists
- Add to `_detect_available_gates()` in `gates.py`
- Timeout: 2 min (same as mypy)
- This would have caught the `CreateTodoRequest` type error
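A minimal sketch of the detection rule above, for discussion only. The function name `detect_typescript_gate` and the constant `TSC_TIMEOUT_SECONDS` are hypothetical; how the result plugs into `_detect_available_gates()` depends on the existing code in `gates.py`, which is not shown here.

```python
import json
from pathlib import Path
from typing import List, Optional

# Hypothetical constant: 2 minutes, matching the mypy gate timeout named above.
TSC_TIMEOUT_SECONDS = 120


def detect_typescript_gate(workspace: Path) -> Optional[List[str]]:
    """Return the type-check command for the workspace, or None if no tsconfig.json."""
    if not (workspace / "tsconfig.json").is_file():
        return None

    package_json = workspace / "package.json"
    if package_json.is_file():
        try:
            data = json.loads(package_json.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, OSError):
            data = {}  # unreadable package.json: fall back to plain tsc
        scripts = data.get("scripts") if isinstance(data, dict) else None
        # Prefer the project's own script when it defines one.
        if isinstance(scripts, dict) and "type-check" in scripts:
            return ["npm", "run", "type-check"]

    return ["npx", "tsc", "--noEmit"]
```

The command is returned as an argument list so the gate runner can pass it to its existing subprocess call together with the timeout.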
2. Harden pytest gate failure handling
- If pytest returns non-zero due to import errors or collection errors, the gate must report FAILED (not skip)
- Distinguish between "no tests found" (acceptable) vs "tests errored during collection" (failure)
- Parse pytest output for `ERROR` vs `no tests ran` patterns
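One possible shape for the classification, as a sketch. `GateOutcome` and `classify_pytest_result` are hypothetical names, not the existing gate API. It relies on pytest's documented exit codes (5 means no tests were collected; collection errors interrupt the run with a different non-zero code) and also checks the output patterns named above.

```python
from enum import Enum


class GateOutcome(Enum):  # hypothetical; the real gate result type may differ
    PASSED = "passed"
    FAILED = "failed"
    SKIPPED = "skipped"


def classify_pytest_result(returncode: int, output: str) -> GateOutcome:
    """Map a finished pytest run to a gate outcome."""
    if returncode == 0:
        return GateOutcome.PASSED

    # Summary lines such as "ERROR tests/test_db.py" indicate collection/import errors.
    has_error_line = any(line.startswith("ERROR") for line in output.splitlines())

    # Exit code 5 = no tests collected. Treat as an acceptable skip only when
    # pytest also says so and reported no errors.
    if returncode == 5 and "no tests ran" in output and not has_error_line:
        return GateOutcome.SKIPPED

    # Everything else (test failures, collection errors, internal or usage
    # errors) is a failure, never a silent skip.
    return GateOutcome.FAILED
```

Defaulting to `FAILED` for any unrecognized non-zero exit is the conservative choice: an unexpected pytest state blocks the task instead of passing it.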
3. Add build verification gate
- Python: `python -c "from main import app"` (or detect entry point)
- Node/TS: `npm run build` if build script exists in package.json
- This catches import errors, missing deps, and type errors in one shot
- Run as part of final verification, not incremental
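A rough sketch of how the build commands could be selected. `detect_build_commands` and its `entry_module` / `app_attr` parameters are hypothetical, and defaulting to `main` / `app` is an assumption taken from the example command above; real entry-point detection would likely need more than a filename check.

```python
import json
import sys
from pathlib import Path
from typing import List


def detect_build_commands(
    workspace: Path,
    entry_module: str = "main",  # assumed default entry point
    app_attr: str = "app",       # assumed application object name
) -> List[List[str]]:
    """Return the build-verification commands that apply to this workspace."""
    commands: List[List[str]] = []

    # Python: importing the app surfaces import errors and missing deps.
    if (workspace / f"{entry_module}.py").is_file():
        commands.append(
            [sys.executable, "-c", f"from {entry_module} import {app_attr}"]
        )

    # Node/TS: only run the build when package.json defines a build script.
    package_json = workspace / "package.json"
    if package_json.is_file():
        try:
            data = json.loads(package_json.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, OSError):
            data = {}
        scripts = data.get("scripts") if isinstance(data, dict) else None
        if isinstance(scripts, dict) and "build" in scripts:
            commands.append(["npm", "run", "build"])

    return commands
```

The commands would need to run with the workspace as the working directory so that `from main import app` resolves against the generated code.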
4. Add end-of-batch validation step
- After `BATCH_COMPLETED` in `conductor.py`, re-run all gates against the full workspace
- This catches cross-task inconsistencies that per-task gates miss
- Report batch as `PARTIAL` if post-batch gates fail (tasks DONE but integration broken)
- Hook point: after the final `BATCH_COMPLETED` event emission
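The status downgrade could look roughly like the sketch below. `run_post_batch_gates` is a hypothetical helper, the gates are modeled as plain callables returning `True` on success, and the `"COMPLETED"` status string in the usage example is an assumption; only `PARTIAL` comes from this issue. The actual conductor and gate interfaces may differ.

```python
from typing import Callable, Dict, List, Tuple


def run_post_batch_gates(
    batch_status: str,
    gates: Dict[str, Callable[[], bool]],
) -> Tuple[str, List[str]]:
    """Re-run every gate against the full workspace after the batch finishes.

    Returns the (possibly downgraded) batch status and the names of failed gates.
    """
    failed: List[str] = []
    for name, gate in gates.items():
        try:
            passed = gate()
        except Exception:
            # A gate that crashes counts as a failure, not as a skip.
            passed = False
        if not passed:
            failed.append(name)

    # Tasks are DONE but integration is broken: report PARTIAL.
    return ("PARTIAL" if failed else batch_status, failed)


# Usage: a batch whose tasks all finished, but whose tsc gate fails afterwards.
status, failed_gates = run_post_batch_gates(
    "COMPLETED", {"pytest": lambda: True, "tsc": lambda: False}
)
```

Returning the failed gate names alongside the status lets the conductor include them in whatever event or report follows the batch.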
5. Ensure dependency installation before test gates
- Before running pytest/npm-test gates, check if dependencies are installed
- If `requirements.txt` exists but no venv/site-packages → run `pip install -r requirements.txt`
- If `package.json` exists but no `node_modules/` → run `npm install`
- This is a prerequisite step in the gate runner, not a separate gate
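A sketch of the prerequisite check, under stated assumptions: `missing_dependency_installs` is a hypothetical name, and treating a `.venv` or `venv` directory as "dependencies installed" is a simplification of the venv/site-packages check described above.

```python
from pathlib import Path
from typing import List

# Assumed virtualenv directory names; the real check may inspect site-packages.
VENV_DIR_NAMES = (".venv", "venv")


def missing_dependency_installs(workspace: Path) -> List[List[str]]:
    """Return the install commands to run before the test gates, if any."""
    commands: List[List[str]] = []

    has_venv = any((workspace / name).is_dir() for name in VENV_DIR_NAMES)
    if (workspace / "requirements.txt").is_file() and not has_venv:
        commands.append(["pip", "install", "-r", "requirements.txt"])

    has_node_modules = (workspace / "node_modules").is_dir()
    if (workspace / "package.json").is_file() and not has_node_modules:
        commands.append(["npm", "install"])

    return commands
```

Because this only computes commands, the gate runner can call it before each test gate and skip installation entirely when the list is empty.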
Acceptance Criteria
- `tsc --noEmit` gate auto-detected and run for TypeScript projects
- pytest gate fails on collection/import errors (not silent skip)
- Build gate (`npm run build` / Python import check) runs during final verification
- Conductor runs full gate sweep after batch completion
- Gate runner installs missing deps before running test gates
- Re-run the cf-test todo app scenario: all 3 bug categories either caught or prevented
Ordering
This issue must be completed before:
- #354: [Phase 2.5-F] Verify ReAct engine works via API routes
- #355: [Phase 2.5-F] Switch default engine to react and update documentation
The gate hardening ensures that the validation in #354 is meaningful and the engine switch in #355 is backed by real quality checks.
Test Plan
- Run the same cf-test todo app scenario after implementing fixes
- Verify that either:
  - The agent produces correct code (gates prevent the bugs during self-correction), OR
  - Gates catch the bugs and tasks are marked FAILED (not false DONE)
- Verify batch-level validation catches cross-task inconsistencies