
[Phase 2.5-F] Harden verification gates and add batch-level QA #385

@frankbria

Description


Context

Re-running the end-to-end CLI validation (#353) with the ReAct agent revealed that all 12 tasks completed successfully (DONE status, 0 blockers, 18 min total), but the generated code has 3 categories of bugs that the verification gates failed to catch. The orchestration pipeline works; the quality assurance has gaps.

Test run: cf-test project, todo app (FastAPI + Next.js), serial strategy, 12 LLM-generated tasks.

Bugs the gates missed

| Bug | Root cause | Why the gate missed it |
| --- | --- | --- |
| init_db() signature mismatch: the test passes an argument, but the function takes none | The agent wrote tests against an imagined API | The pytest gate likely saw import errors and treated them as "no tests" rather than as a failure |
| Double /api/api/ prefix: APIRouter(prefix="/api") combined with include_router(prefix="/api") | Cross-file inconsistency between tasks | No integration/smoke-test gate exists |
| TypeScript type error: completed passed to CreateTodoRequest, which only has title | Frontend tasks didn't cross-check the API service types | No tsc gate exists in the gate system |
| Task ordering: requirements.txt generated as task 11, but tests ran in task 6 | The serial strategy uses priority order, not dependency order | No dependency-install-before-test enforcement |

Requirements

1. Add TypeScript type-check gate

  • Detect tsconfig.json in the workspace → run npm run type-check if that script exists in package.json, otherwise npx tsc --noEmit
  • Add to _detect_available_gates() in gates.py
  • Timeout: 2 min (same as mypy)
  • This would have caught the CreateTodoRequest type error
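A minimal sketch of the detection rule above, assuming the gate runner passes the workspace root as a pathlib.Path and expects an argv-style command list. detect_typescript_gate and TYPESCRIPT_GATE_TIMEOUT_S are hypothetical names, not existing gates.py code; the real _detect_available_gates() may be structured differently.

```python
import json
from pathlib import Path
from typing import Optional

# Same 2-minute budget the issue specifies for mypy.
TYPESCRIPT_GATE_TIMEOUT_S = 120


def detect_typescript_gate(workspace: Path) -> Optional[list[str]]:
    """Return the type-check command for a TypeScript workspace, or None.

    Hypothetical helper illustrating the detection rule; not gates.py's API.
    """
    if not (workspace / "tsconfig.json").is_file():
        return None  # not a TypeScript project: the gate is not available

    package_json = workspace / "package.json"
    if package_json.is_file():
        try:
            data = json.loads(package_json.read_text())
        except json.JSONDecodeError:
            data = {}  # malformed package.json: fall back to plain tsc
        scripts = data.get("scripts", {}) if isinstance(data, dict) else {}
        if "type-check" in scripts:
            # Prefer the project's own script so its custom flags are respected.
            return ["npm", "run", "type-check"]

    # --noEmit type-checks the project without writing any .js output.
    return ["npx", "tsc", "--noEmit"]
```

Preferring the project's own script keeps the gate aligned with however the frontend task configured type checking; npx tsc --noEmit is only the fallback.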

2. Harden pytest gate failure handling

  • If pytest returns non-zero due to import errors or collection errors, the gate must report FAILED (not skip)
  • Distinguish between "no tests found" (acceptable) vs "tests errored during collection" (failure)
  • Parse pytest output for ERROR patterns vs. the "no tests ran" message
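One way to make the distinction robust is to combine pytest's exit code with its output. Per pytest's documented exit codes, 5 means no tests were collected, while an import or collection error normally aborts the run with exit code 2 and an "Interrupted: N error(s) during collection" line. The sketch below assumes the gate has both the return code and the combined stdout/stderr; GateOutcome and classify_pytest_result are hypothetical names.

```python
import re
from enum import Enum


class GateOutcome(Enum):
    PASSED = "passed"
    FAILED = "failed"
    SKIPPED = "skipped"  # suite is genuinely empty: acceptable per the issue


# Documented pytest exit codes used here (see pytest.ExitCode).
PYTEST_OK = 0
PYTEST_NO_TESTS_COLLECTED = 5

# Text pytest prints when a test module cannot be imported or collected.
_COLLECTION_ERROR = re.compile(r"ERROR collecting|errors? during collection")


def classify_pytest_result(returncode: int, output: str) -> GateOutcome:
    """Map a pytest run to a gate outcome (hypothetical helper, not gates.py)."""
    collection_broken = bool(_COLLECTION_ERROR.search(output))
    if returncode == PYTEST_OK and not collection_broken:
        return GateOutcome.PASSED
    if returncode == PYTEST_NO_TESTS_COLLECTED and not collection_broken:
        # "no tests ran" with no collection errors: nothing to verify yet.
        return GateOutcome.SKIPPED
    # Everything else (failing tests, collection/import errors reported as
    # exit code 2, internal or usage errors) fails the gate instead of skipping.
    return GateOutcome.FAILED
```

Treating every unrecognized exit code as FAILED errs on the side of blocking a task, which matches the requirement that collection errors must never be reported as a skip.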

3. Add build verification gate

  • Python: python -c "from main import app" (or detect entry point)
  • Node/TS: npm run build if build script exists in package.json
  • This catches import errors, missing deps, and type errors in one shot
  • Run as part of final verification, not during incremental per-task checks
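A sketch of how the build gate could choose its commands, assuming they run with the workspace root as the working directory. The defaults entry_module="main" and app_name="app" simply mirror the issue's example import and stand in for real entry-point detection; detect_build_commands is a hypothetical name.

```python
import json
import sys
from pathlib import Path


def detect_build_commands(
    workspace: Path, entry_module: str = "main", app_name: str = "app"
) -> list[list[str]]:
    """Return build-verification commands for the final verification pass.

    Hypothetical helper; entry_module/app_name are placeholder assumptions.
    """
    commands: list[list[str]] = []

    if (workspace / f"{entry_module}.py").is_file():
        # Importing the app surfaces ImportError and missing dependencies
        # without starting a server.
        commands.append([sys.executable, "-c", f"from {entry_module} import {app_name}"])

    package_json = workspace / "package.json"
    if package_json.is_file():
        try:
            data = json.loads(package_json.read_text())
        except json.JSONDecodeError:
            data = {}
        scripts = data.get("scripts", {}) if isinstance(data, dict) else {}
        if "build" in scripts:
            # For Next.js projects, `next build` typically type-checks as well.
            commands.append(["npm", "run", "build"])

    return commands
```

Each command would run once in the final verification pass; a non-zero exit from any of them fails the build gate.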

4. Add end-of-batch validation step

  • After BATCH_COMPLETED in conductor.py, re-run all gates against the full workspace
  • This catches cross-task inconsistencies that per-task gates miss
  • Report batch as PARTIAL if post-batch gates fail (tasks DONE but integration broken)
  • Hook point: after the final BATCH_COMPLETED event emission
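A minimal sketch of the post-batch sweep, assuming the conductor can supply each task's final status as a string and a callable that re-runs every gate against the full workspace. BatchStatus, GateResult, BatchReport and finalize_batch are hypothetical names, not conductor.py's API. The issue only specifies the all-DONE case; returning FAILED when some task is not DONE is a placeholder assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class BatchStatus(Enum):
    COMPLETED = "completed"
    PARTIAL = "partial"  # all tasks DONE, but post-batch gates failed
    FAILED = "failed"


@dataclass
class GateResult:
    name: str
    passed: bool
    detail: str = ""


@dataclass
class BatchReport:
    status: BatchStatus
    failed_gates: list[GateResult] = field(default_factory=list)


def finalize_batch(
    task_statuses: list[str],
    run_all_gates: Callable[[], list[GateResult]],
) -> BatchReport:
    """Post-batch validation, meant to run after the final BATCH_COMPLETED event."""
    if any(status != "DONE" for status in task_statuses):
        # Placeholder assumption: per-task failures already decide the outcome.
        return BatchReport(BatchStatus.FAILED)

    # Re-run every gate on the whole workspace to catch cross-task
    # inconsistencies that per-task gates cannot see.
    failures = [result for result in run_all_gates() if not result.passed]
    if failures:
        return BatchReport(BatchStatus.PARTIAL, failures)
    return BatchReport(BatchStatus.COMPLETED)
```

Passing the gate sweep in as a callable keeps the status logic testable without a real workspace.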

5. Ensure dependency installation before test gates

  • Before running pytest/npm-test gates, check if dependencies are installed
  • If requirements.txt exists but no venv/site-packages → run pip install -r requirements.txt
  • If package.json exists but no node_modules/ → run npm install
  • This is a prerequisite step in the gate runner, not a separate gate
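A sketch of the prerequisite step, assuming "dependencies installed" can be approximated by the presence of a .venv/ or venv/ directory and of node_modules/. Those directory names, the 600-second timeout, and the function names are assumptions for illustration. pip is invoked through the interpreter running the gates so that pytest sees the installed packages.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical venv directory names; the real gate runner may track its own.
_VENV_DIRS = (".venv", "venv")


def plan_dependency_installs(workspace: Path) -> list[list[str]]:
    """Return the install commands needed before pytest/npm-test gates run.

    "Installed" is approximated by directory presence, not by inspecting
    what site-packages or node_modules actually contain.
    """
    commands: list[list[str]] = []

    has_venv = any((workspace / name).is_dir() for name in _VENV_DIRS)
    if (workspace / "requirements.txt").is_file() and not has_venv:
        # Install with the interpreter that runs the gates.
        commands.append([sys.executable, "-m", "pip", "install", "-r", "requirements.txt"])

    if (workspace / "package.json").is_file() and not (workspace / "node_modules").is_dir():
        commands.append(["npm", "install"])

    return commands


def install_missing_dependencies(workspace: Path, timeout_s: int = 600) -> None:
    """Prerequisite step of the gate runner, not a gate of its own."""
    for cmd in plan_dependency_installs(workspace):
        # check=True raises CalledProcessError, so a failed install is reported
        # directly instead of resurfacing later as test-collection errors.
        subprocess.run(cmd, cwd=workspace, check=True, timeout=timeout_s)
```

Because this sketch installs into the gate runner's interpreter rather than creating a venv, the pip step can run again on later gate passes; that is safe because pip skips requirements that are already satisfied.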

Acceptance Criteria

  • tsc --noEmit gate auto-detected and run for TypeScript projects
  • pytest gate fails on collection/import errors (not silent skip)
  • Build gate (npm run build / Python import check) runs during final verification
  • Conductor runs full gate sweep after batch completion
  • Gate runner installs missing deps before running test gates
  • Re-run the cf-test todo app scenario — all 3 bug categories either caught or prevented

Ordering

This issue must be completed before:

The gate hardening ensures that the validation in #354 is meaningful and the engine switch in #355 is backed by real quality checks.

Test Plan

  1. Run the same cf-test todo app scenario after implementing fixes
  2. Verify that the agent either:
    • Produces correct code (gates prevent the bugs during self-correction), OR
    • Gates catch the bugs and tasks are marked FAILED (not false DONE)
  3. Verify batch-level validation catches cross-task inconsistencies

Metadata


Assignees

No one assigned

    Labels

    phase-2.5 (Phase 2.5: Agent Execution Redesign (ReAct)), phase-2.5-F (Phase F: Validation & Default Switch), quality, testing
