From aae91a40575b0ac522e6878725979f3c532b7b27 Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 25 Feb 2026 13:26:00 +0000 Subject: [PATCH 1/2] Restructure README: reposition single/multi-repo as operating modes, improve flow MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Replace "Single-Repo (Backward Compatible)" framing with equal "Operating Modes" section presenting single-repo and multi-repo as first-class features - Reorder sections for better narrative flow: Hook → Features → Quick Start → How It Works → Benchmark → Examples → Modes → API Reference - Add scannable "Features" section replacing the "Why SWE-AF" list - Add social proof stats bar near the top (95/100 benchmark, $19 cost, 400+ agents) - Consolidate duplicate curl examples (was ~6 places, now deduplicated) - Tighten Quick Start into Railway / Local / Docker subsections - Fold "Adaptive Factory Control" into "How a Build Works" with control loop table - Move benchmark and real-world examples into dedicated sections This is a draft for review — further iteration expected on restoring open-source runtime examples and the clean build-process bullets. https://claude.ai/code/session_01XgBbDVMbE52cKdU1dCNEsV --- README.md | 601 +++++++++++++++++++++--------------------------------- 1 file changed, 236 insertions(+), 365 deletions(-) diff --git a/README.md b/README.md index b9a5fc5..f2790b3 100644 --- a/README.md +++ b/README.md @@ -14,193 +14,45 @@ ![WorldSpace Community Developer](https://img.shields.io/badge/WorldSpace-Community%20Developer-111827?style=for-the-badge) [![Example PR](https://img.shields.io/badge/Example-PR%20%23179-ff6b35?style=for-the-badge&logo=github)](https://github.com/Agent-Field/agentfield/pull/179) - +**One API call → full engineering team → shipped code.**

 Quick Start • - Why SWE-AF • - In Action • - Factory Control • - Benchmark • + Features • + How It Works • + Benchmark • + Modes • API • - Architecture Doc + Architecture

-One API call spins up a full autonomous engineering team that can scope, build, adapt, and ship complex software end to end. -SWE-AF is a first step toward **autonomous software engineering factories**, scaling from simple goals to hard multi-issue programs with hundreds to thousands of agent invocations. +SWE-AF spins up a coordinated fleet of AI agents — product managers, architects, coders, reviewers, testers — that scope, build, adapt, and ship software end to end. No scaffolding, no human-in-the-loop. One goal in, verified PR out.

SWE-AF autonomous engineering fleet banner

-## One-Call DX - -```bash -curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \ - -H "Content-Type: application/json" \ - -d @- <<'JSON' -{ - "input": { - "goal": "Refactor and harden auth + billing flows", - "repo_url": "https://github.com/user/my-project", - "config": { - "runtime": "claude_code", - "models": { - "default": "sonnet", - "coder": "opus", - "qa": "opus" - }, - "enable_learning": true - } - } -} -JSON -``` - -Swap `models.default` and any role key (`coder`, `qa`, `architect`, etc.) to any model your runtime supports. - -## Multi-Repository Workspace Support - -SWE-AF supports coordinated work across multiple repositories in a single build. This is useful when your project consists of a primary application plus shared libraries, monorepo sub-projects, or dependent microservices. - -### Use Cases - -- **Primary App + Shared Libraries**: Build a web application that depends on a shared utilities or SDK library. -- **Monorepo Sub-Projects**: Coordinate changes across multiple packages in a monorepo (each repo_url points to a sub-directory or separate repo). -- **Dependent Microservices**: When a feature spans multiple services (e.g., API + Worker Queue), define roles to orchestrate changes across boundaries. 
- -### Single-Repo (Backward Compatible) - -Single-repository builds work exactly as before — just use `repo_url` or `repo_path` at the top level: - -```bash -curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \ - -H "Content-Type: application/json" \ - -d '{ - "input": { - "goal": "Add JWT auth", - "repo_url": "https://github.com/user/my-project" - } - }' -``` - -### Multi-Repo Configuration - -Pass `config.repos` as an array of repository objects, each with `repo_url` (or `repo_path`) and a `role`: - -```bash -curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \ - -H "Content-Type: application/json" \ - -d '{ - "input": { - "goal": "Add JWT auth across API and shared-lib", - "config": { - "repos": [ - { - "repo_url": "https://github.com/org/main-app", - "role": "primary" - }, - { - "repo_url": "https://github.com/org/shared-lib", - "role": "dependency" - } - ], - "runtime": "claude_code", - "models": { - "default": "sonnet" - } - } - } - }' -``` - -**Roles:** -- `primary`: The main application being built. Changes here drive the build; failures block progress. -- `dependency`: Libraries or services that may be modified to support the primary repo. Failures are captured but don't block. - -## Autonomous Build Spotlight - -Rust-based Python compiler benchmark (built autonomously): - -| Metric | CPython (subprocess) | RustPython (SWE-AF) | Improvement | -| ---------------------- | -------------------- | ---------------------------- | ----------------------- | -| Steady-state execution | Baseline (~19ms) | Optimized in-process runtime | **88.3x-602.3x faster** | -| Geometric mean | 1.0x baseline | 253.8x | **253.8x** | -| Peak throughput | ~52 ops/s | 31,807 ops/s | **~612x** | - -
-Measurement methodology - -Throughput comparison measures different execution models: CPython subprocess spawn (~19ms per call → ~52 ops/s) vs RustPython pre-warmed interpreter pool (in-process). This is the real-world tradeoff the system was built to optimize — replacing repeated subprocess invocations with a persistent pool for short-snippet execution. - -
- -Artifact trail includes **175 tracked autonomous agents** across planning, coding, review, merge, and verification. - -Details: [`examples/llm-rust-python-compiler-sonnet/README.md`](examples/llm-rust-python-compiler-sonnet/README.md) - -## Why SWE-AF - -Most agent frameworks are harnesses around a single coder loop. SWE-AF is a software engineering factory built from coordinated harnesses. - -- Hardness-aware execution: easy issues pass through quickly, while hard issues trigger deeper adaptation and DAG-level replanning instead of blind retries. -- Factory architecture: planning, execution, and governance agents run as a coordinated control stack. -- Continual learning (optional): with `enable_learning=true`, conventions and failure patterns discovered early are injected into downstream issues. -- Agent-scale parallelism: dependency-level scheduling + isolated git worktrees allow large fan-out without branch collisions. -- Fleet-scale orchestration with AgentField: many SWE-AF nodes can run continuously in parallel, driving thousands of agent invocations across concurrent builds. -- Explicit compromise tracking: when scope is relaxed, debt is typed, severity-rated, and propagated. -- Long-run reliability: checkpointed execution supports `resume_build` after crashes or interruptions. - -## In Action - -[PR #179: Go SDK DID/VC Registration](https://github.com/Agent-Field/agentfield/pull/179) — built entirely by SWE-AF (Claude runtime with haiku-class models). One API call, zero human code. - -| Metric | Value | -| ------------------- | ------------------ | -| Issues completed | 10/10 | -| Tests passing | 217 | -| Acceptance criteria | 34/34 | -| Agent invocations | 79 | -| Model | `claude-haiku-4-5` | -| **Total cost** | **$19.23** | - -
-Cost breakdown by agent role - -| Role | Cost | % | -| ---------------------------------- | ----- | ----- | -| Coder | $5.88 | 30.6% | -| Code Reviewer | $3.48 | 18.1% | -| QA | $1.78 | 9.2% | -| GitHub PR | $1.66 | 8.6% | -| Integration Tester | $1.59 | 8.3% | -| Merger | $1.22 | 6.3% | -| Workspace Ops | $1.77 | 9.2% | -| Planning (PM + Arch + TL + Sprint) | $0.79 | 4.1% | -| Verifier + Finalize | $0.34 | 1.8% | -| Synthesizer | $0.05 | 0.2% | - -79 invocations, 2,070 conversation turns. Planning agents scope and decompose; coders work in parallel isolated worktrees; reviewers and QA validate each issue; merger integrates branches; verifier checks acceptance criteria against the PRD. - -
- -**Claude & open-source models supported**: Run builds with either runtime and tune models per role in one flat config map. -- `runtime: "claude_code"` maps to Claude backend. -- `runtime: "open_code"` maps to OpenCode backend (OpenRouter/OpenAI/Google/Anthropic model IDs). +
-## Adaptive Factory Control +| Scored **95/100** on benchmark | 10/10 issues, **$19** total cost | **400–500+** agents per build | Claude, MiniMax, DeepSeek, Qwen | +|:---:|:---:|:---:|:---:| +| Beats Claude Code (73) & Codex (62) | [Real PR — zero human code](https://github.com/Agent-Field/agentfield/pull/179) | Planning → coding → QA → merge | Any model, any provider | -SWE-AF uses three nested control loops to adapt to task difficulty in real time: +
-| Loop | Scope | Trigger | Action | -| ----------- | ------------- | -------------------- | ---------------------------------------------------------------------------------- | -| Inner loop | Single issue | QA/review fails | Coder retries with feedback | -| Middle loop | Single issue | Inner loop exhausted | `run_issue_advisor` retries with a new approach, splits work, or accepts with debt | -| Outer loop | Remaining DAG | Escalated failures | `run_replanner` restructures remaining issues and dependencies | +## Features -This is the core factory-control behavior: control agents supervise worker agents and continuously reshape the plan as reality changes. +- **Factory, not a wrapper** — Planning, execution, and governance agents run as a coordinated control stack. Not just a coder loop with retries. +- **Hardness-aware execution** — Easy issues pass through fast. Hard issues trigger deeper adaptation and DAG-level replanning instead of blind retries. +- **Multi-model, multi-provider** — Assign different models per role (`coder: opus`, `qa: haiku`). Works with Claude, OpenRouter, OpenAI, and Google. +- **Single-repo and multi-repo modes** — Point at one repository or orchestrate coordinated changes across multiple repos in a single build. +- **Continual learning** — With `enable_learning: true`, conventions and failure patterns discovered early get injected into downstream issues. +- **Agent-scale parallelism** — Dependency-level scheduling + isolated git worktrees allow large fan-out without branch collisions. +- **Self-correcting builds** — Three nested control loops (inner retry → advisor adaptation → DAG replanning) handle failures automatically. +- **Crash recovery** — Checkpointed execution supports `resume_build` after interruptions. +- **Draft PR output** — Pass a `repo_url` and SWE-AF clones, builds, and opens a draft PR on GitHub. ## Quick Start @@ -213,123 +65,93 @@ One click deploys SWE-AF + AgentField control plane + PostgreSQL. 
Set two enviro - `CLAUDE_CODE_OAUTH_TOKEN` — run `claude setup-token` in [Claude Code CLI](https://docs.anthropic.com/en/docs/claude-code) (uses Pro/Max subscription credits) - `GH_TOKEN` — GitHub personal access token with `repo` scope for draft PR creation -Once deployed, trigger a build: +Then trigger a build: ```bash -curl -X POST https://.up.railway.app/api/v1/execute/async/swe-planner.build \ +curl -X POST https://.up.railway.app/api/v1/execute/async/swe-planner.build \ -H "Content-Type: application/json" \ -H "X-API-Key: this-is-a-secret" \ -d '{"input": {"goal": "Add JWT auth", "repo_url": "https://github.com/user/my-repo"}}' ``` -### 1. Requirements (local) +### Run Locally -- Python 3.12+ -- AgentField control plane (`af`) -- AI provider API key (Anthropic, OpenRouter, OpenAI, or Google) +```bash +# 1. Install +python3.12 -m venv .venv && source .venv/bin/activate +pip install -e ".[dev]" -### 2. Install +# 2. Start the control plane and register the node +af # starts AgentField on :8080 +python -m swe_af # registers node "swe-planner" -```bash -python3.12 -m venv .venv -source .venv/bin/activate -python -m pip install --upgrade pip -python -m pip install -e ".[dev]" +# 3. Trigger a build +curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \ + -H "Content-Type: application/json" \ + -d '{"input": {"goal": "Add JWT auth", "repo_url": "https://github.com/user/my-project"}}' ``` -### 3. Run +### Docker ```bash -af # starts AgentField control plane on :8080 -python -m swe_af # registers node id "swe-planner" +cp .env.example .env # add your API key + optional GH_TOKEN +docker compose up -d + +# Scale workers +docker compose up --scale swe-agent=3 -d ``` -### 4. 
Trigger a build +Use a host control plane instead of the Docker service: ```bash -# Default (uses Claude) -curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \ - -H "Content-Type: application/json" \ - -d @- <<'JSON' -{ - "input": { - "goal": "Add JWT auth to all API endpoints", - "repo_url": "https://github.com/user/my-project" - } -} -JSON +docker compose -f docker-compose.local.yml up -d +``` -# With open-source runtime + flat role map -curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \ - -H "Content-Type: application/json" \ - -d @- <<'JSON' -{ - "input": { - "goal": "Add JWT auth", - "repo_url": "https://github.com/user/my-project", - "config": { - "runtime": "open_code", - "models": { - "default": "openrouter/minimax/minimax-m2.5" - } - } - } -} -JSON +## How a Build Works -# Local workspace mode (repo_path) + targeted role override -curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \ - -H "Content-Type: application/json" \ - -d @- <<'JSON' -{ - "input": { - "goal": "Refactor and harden auth + billing flows", - "repo_path": "/path/to/repo", - "config": { - "runtime": "claude_code", - "models": { - "default": "sonnet", - "coder": "opus", - "qa": "opus" - }, - "enable_learning": true - } - } -} -JSON +```text +Goal → PM → Architect → Tech Lead → Sprint Planner → Issue DAG + ↓ + ┌────────────────────────────────┘ + ↓ (parallel, isolated worktrees) + ┌─────────┐ + │ Issue N │ → Coder → QA → Reviewer → Synthesizer + └─────────┘ ↑ │ + ↑ └── retry ───────┘ (inner loop) + │ + └── advisor / replanner (middle + outer loops) + ↓ + Merge → Integration Test → Verify → Draft PR ``` -For OpenRouter with `open_code`, use model IDs in `openrouter//` format (for example `openrouter/minimax/minimax-m2.5`). 
- -## What Happens In One Build +Three nested control loops handle task difficulty in real time: -- Architecture is generated and reviewed before coding starts -- Issues are dependency-sorted and run in parallel across isolated worktrees -- Each issue gets dedicated coder, tester, and reviewer passes -- Failed issues trigger advisor-driven adaptation (split, re-scope, or escalate) -- Escalations trigger replanning of the remaining DAG -- End result is merged, integration-tested, and verified against acceptance criteria +| Loop | Scope | Trigger | Action | +|------|-------|---------|--------| +| **Inner** | Single issue | QA/review fails | Coder retries with feedback | +| **Middle** | Single issue | Inner loop exhausted | Advisor retries with new approach, splits work, or accepts with debt | +| **Outer** | Remaining DAG | Escalated failures | Replanner restructures remaining issues and dependencies |
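The inner and middle loops in the table above can be pictured as nested retry budgets. The sketch below is illustrative only and is not SWE-AF's implementation: the budget names and defaults (`max_coding_iterations`, `max_advisor_invocations`) come from the configuration table in this README, while the way they nest, the `Budgets` class, the `run_issue` function, and the `attempt_passes` callback are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Budgets:
    # Defaults copied from the README's configuration table.
    max_coding_iterations: int = 5      # inner-loop retry budget
    max_advisor_invocations: int = 2    # middle-loop advisor budget


def run_issue(attempt_passes: Callable[[int, int], bool],
              budgets: Budgets = Budgets()) -> str:
    """Simulate the inner and middle loops for a single issue.

    attempt_passes(approach, iteration) is a hypothetical stand-in for
    "the coder's output passed QA and review".
    """
    # Approach 0 is the original plan; each later approach models one
    # advisor invocation that changes how the issue is tackled (assumption).
    for approach in range(budgets.max_advisor_invocations + 1):
        # Inner loop: the coder retries with feedback until the budget runs out.
        for iteration in range(budgets.max_coding_iterations):
            if attempt_passes(approach, iteration):
                return "completed"
    # Both budgets exhausted: hand the issue to the outer loop (replanner).
    return "escalated"


if __name__ == "__main__":
    # Passes on the third try of the first advisor-suggested approach.
    print(run_issue(lambda approach, iteration: (approach, iteration) == (1, 2)))
    # Never passes, so the issue is escalated.
    print(run_issue(lambda approach, iteration: False))
```

Under these assumed semantics, an issue that never passes uses every inner-loop attempt for the original plan plus one full round per advisor invocation before it escalates.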

SWE-AF architecture

-> Typical runs spin up 400-500+ agent instances across planning, execution, QA, and verification. For larger DAGs and repeated adaptation/replanning cycles, SWE-AF can scale into the high hundreds to thousands of agent invocations in a single build. +> Typical runs spin up 400–500+ agent instances across planning, execution, QA, and verification. Larger DAGs and repeated adaptation cycles scale into the thousands. -## Benchmark Snapshot +## Benchmark -**95/100 with haiku and MiniMax**: SWE-AF scored 95/100 with both Claude haiku-class routing ($20) and MiniMax M2.5 via open runtime ($6), outperforming Claude Code sonnet (73), Codex o3 (62), and Claude Code haiku (59) on the same prompt. +**95/100** with both Claude haiku-class routing ($20) and MiniMax M2.5 via open runtime ($6), outperforming Claude Code sonnet (73), Codex o3 (62), and Claude Code haiku (59) on the same prompt. -| Dimension | SWE-AF (haiku) | SWE-AF (MiniMax) | CC Sonnet | Codex (o3) | CC Haiku | -| --------------- | -------------- | ---------------- | --------- | ---------- | -------- | -| Functional (30) | **30** | **30** | **30** | **30** | **30** | -| Structure (20) | **20** | **20** | 10 | 10 | 10 | -| Hygiene (20) | **20** | **20** | 16 | 10 | 7 | -| Git (15) | **15** | **15** | 2 | 2 | 2 | -| Quality (15) | 10 | 10 | **15** | 10 | 10 | -| Total | **95** | **95** | **73** | **62** | **59** | -| **Cost** | **~$20** | **~$6** | ? | ? | ? | -| **Time** | ~30-40 min | 43 min | ? | ? | ? 
| +| Dimension | SWE-AF (haiku) | SWE-AF (MiniMax) | CC Sonnet | Codex (o3) | CC Haiku | +|-----------|---------------|-----------------|-----------|-----------|---------| +| Functional (30) | **30** | **30** | **30** | **30** | **30** | +| Structure (20) | **20** | **20** | 10 | 10 | 10 | +| Hygiene (20) | **20** | **20** | 16 | 10 | 7 | +| Git (15) | **15** | **15** | 2 | 2 | 2 | +| Quality (15) | 10 | 10 | **15** | 10 | 10 | +| **Total** | **95** | **95** | **73** | **62** | **59** | +| **Cost** | **~$20** | **~$6** | ? | ? | ? | +| **Time** | ~30–40 min | 43 min | ? | ? | ? |
Full benchmark details and reproduction @@ -342,13 +164,13 @@ Same prompt tested across multiple agents. SWE-AF with Claude runtime (haiku-cla ### Scoring framework -| Dimension | Points | What it measures | -| ---------- | ------ | ------------------------------------------------ | -| Functional | 30 | CLI behavior and passing tests | -| Structure | 20 | Modular source layout and test organization | -| Hygiene | 20 | `.gitignore`, clean status, no junk artifacts | -| Git | 15 | Commit discipline and message quality | -| Quality | 15 | Error handling, package metadata, README quality | +| Dimension | Points | What it measures | +|-----------|--------|-----------------| +| Functional | 30 | CLI behavior and passing tests | +| Structure | 20 | Modular source layout and test organization | +| Hygiene | 20 | `.gitignore`, clean status, no junk artifacts | +| Git | 15 | Commit discipline and message quality | +| Quality | 15 | Error handling, package metadata, README quality | ### Reproduction @@ -390,13 +212,13 @@ curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \ JSON # Claude Code (haiku) -claude -p "Build a Node.js CLI todo app with add, list, complete, and delete commands. Data should persist to a JSON file. Initialize git, write tests, and commit your work." --model haiku --dangerously-skip-permissions +claude -p "Build a Node.js CLI todo app ..." --model haiku --dangerously-skip-permissions # Claude Code (sonnet) -claude -p "Build a Node.js CLI todo app with add, list, complete, and delete commands. Data should persist to a JSON file. Initialize git, write tests, and commit your work." --model sonnet --dangerously-skip-permissions +claude -p "Build a Node.js CLI todo app ..." --model sonnet --dangerously-skip-permissions # Codex (gpt-5.3-codex) -codex exec "Build a Node.js CLI todo app with add, list, complete, and delete commands. Data should persist to a JSON file. Initialize git, write tests, and commit your work." 
--full-auto +codex exec "Build a Node.js CLI todo app ..." --full-auto ``` **MiniMax M2.5 Measured Metrics (Feb 2026):** @@ -410,77 +232,121 @@ Benchmark assets, logs, evaluator, and generated projects live in [`examples/age
-## Docker +## Real-World Examples -```bash -cp .env.example .env -# Add your API key: ANTHROPIC_API_KEY, OPENROUTER_API_KEY, OPENAI_API_KEY, or GOOGLE_API_KEY -# Optionally add GH_TOKEN for draft PR workflow +### PR #179: Go SDK — Built Entirely by SWE-AF -docker compose up -d -``` +[PR #179: Go SDK DID/VC Registration](https://github.com/Agent-Field/agentfield/pull/179) — one API call, zero human code, haiku-class models. -Submit a build: +| Metric | Value | +|--------|-------| +| Issues completed | 10/10 | +| Tests passing | 217 | +| Acceptance criteria | 34/34 | +| Agent invocations | 79 | +| Model | `claude-haiku-4-5` | +| **Total cost** | **$19.23** | -```bash -# Default (Claude) -curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \ - -H "Content-Type: application/json" \ - -d @- <<'JSON' -{ - "input": { - "goal": "Add JWT auth", - "repo_url": "https://github.com/user/my-repo" - } -} -JSON +
+Cost breakdown by agent role -# With open-source runtime (set OPENROUTER_API_KEY in .env) -curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \ - -H "Content-Type: application/json" \ - -d @- <<'JSON' -{ - "input": { - "goal": "Add JWT auth", - "repo_url": "https://github.com/user/my-repo", - "config": { - "runtime": "open_code", - "models": { - "default": "openrouter/minimax/minimax-m2.5" - } - } - } -} -JSON +| Role | Cost | % | +|------|------|---| +| Coder | $5.88 | 30.6% | +| Code Reviewer | $3.48 | 18.1% | +| QA | $1.78 | 9.2% | +| GitHub PR | $1.66 | 8.6% | +| Integration Tester | $1.59 | 8.3% | +| Merger | $1.22 | 6.3% | +| Workspace Ops | $1.77 | 9.2% | +| Planning (PM + Arch + TL + Sprint) | $0.79 | 4.1% | +| Verifier + Finalize | $0.34 | 1.8% | +| Synthesizer | $0.05 | 0.2% | + +79 invocations, 2,070 conversation turns. Planning agents scope and decompose; coders work in parallel isolated worktrees; reviewers and QA validate each issue; merger integrates branches; verifier checks acceptance criteria against the PRD. + +
+ +### Autonomous Build Spotlight + +Rust-based Python compiler benchmark (built autonomously): + +| Metric | CPython (subprocess) | RustPython (SWE-AF) | Improvement | +|--------|---------------------|---------------------|-------------| +| Steady-state execution | Baseline (~19ms) | Optimized in-process runtime | **88.3x–602.3x faster** | +| Geometric mean | 1.0x baseline | 253.8x | **253.8x** | +| Peak throughput | ~52 ops/s | 31,807 ops/s | **~612x** | + +
+Measurement methodology + +Throughput comparison measures different execution models: CPython subprocess spawn (~19ms per call → ~52 ops/s) vs RustPython pre-warmed interpreter pool (in-process). This is the real-world tradeoff the system was built to optimize — replacing repeated subprocess invocations with a persistent pool for short-snippet execution. + +
+ +Artifact trail includes **175 tracked autonomous agents** across planning, coding, review, merge, and verification. + +Details: [`examples/llm-rust-python-compiler-sonnet/README.md`](examples/llm-rust-python-compiler-sonnet/README.md) + +## Operating Modes -# Local workspace mode (repo_path) +SWE-AF works in two modes: point it at a single repository, or orchestrate coordinated changes across multiple repos in one build. + +### Single-Repository Mode + +The default. Pass `repo_url` (remote) or `repo_path` (local) and SWE-AF handles everything: + +```bash curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \ -H "Content-Type: application/json" \ - -d @- <<'JSON' -{ - "input": { - "goal": "Add JWT auth", - "repo_path": "/workspaces/my-repo" - } -} -JSON + -d '{ + "input": { + "goal": "Add JWT auth", + "repo_url": "https://github.com/user/my-project" + } + }' ``` -Scale workers: +### Multi-Repository Mode + +When your work spans multiple codebases — a primary app plus shared libraries, monorepo sub-projects, or dependent microservices — pass `config.repos` as an array with roles: ```bash -docker compose up --scale swe-agent=3 -d +curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \ + -H "Content-Type: application/json" \ + -d '{ + "input": { + "goal": "Add JWT auth across API and shared-lib", + "config": { + "repos": [ + { + "repo_url": "https://github.com/org/main-app", + "role": "primary" + }, + { + "repo_url": "https://github.com/org/shared-lib", + "role": "dependency" + } + ], + "runtime": "claude_code", + "models": { "default": "sonnet" } + } + } + }' ``` -Use a host control plane instead of Docker control-plane service: +**Roles:** +- `primary` — The main application. Changes here drive the build; failures block progress. +- `dependency` — Libraries or services modified to support the primary repo. Failures are captured but don't block. 
-```bash -docker compose -f docker-compose.local.yml up -d -``` +**Use cases:** +- Primary app + shared SDK or utilities library +- Monorepo sub-projects that live in separate repos +- Feature spanning multiple microservices (e.g., API + worker queue) -## GitHub Repo Workflow (Clone -> Build -> Draft PR) +## One-Call DX -Pass `repo_url` instead of `repo_path` to let SWE-AF clone and open a draft PR after execution. +Every build is a single API call. Swap runtimes and assign models per agent role in one flat config: ```bash curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \ @@ -488,25 +354,35 @@ curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \ -d @- <<'JSON' { "input": { + "goal": "Refactor and harden auth + billing flows", "repo_url": "https://github.com/user/my-project", - "goal": "Add comprehensive test coverage", "config": { "runtime": "claude_code", "models": { "default": "sonnet", "coder": "opus", "qa": "opus" - } + }, + "enable_learning": true } } } JSON ``` -Requirements: +**Runtimes supported:** +- `runtime: "claude_code"` — Claude backend (Anthropic) +- `runtime: "open_code"` — OpenCode backend (OpenRouter/OpenAI/Google/Anthropic model IDs) -- `GH_TOKEN` in `.env` with `repo` scope -- Repo access for that token +For OpenRouter, use model IDs in `openrouter//` format (e.g., `openrouter/minimax/minimax-m2.5`). + +## GitHub Repo Workflow + +Pass `repo_url` instead of `repo_path` to let SWE-AF clone the repo and open a draft PR after the build completes. + +Requirements: +- `GH_TOKEN` environment variable with `repo` scope +- Token must have access to the target repository ## API Reference @@ -516,7 +392,7 @@ Requirements: Core async endpoints (returns an `execution_id` immediately): ```bash -# Full build: plan -> execute -> verify +# Full build: plan → execute → verify POST /api/v1/execute/async/swe-planner.build # Plan only @@ -544,24 +420,24 @@ Every specialist is also callable directly:
Agent execution flow -| Agent | In -> Out | -| ------------------------ | ---------------------------------------------------- | -| `run_product_manager` | goal -> PRD | -| `run_architect` | PRD -> architecture | -| `run_tech_lead` | architecture -> review | -| `run_sprint_planner` | architecture -> issue DAG | -| `run_issue_writer` | issue spec -> detailed issue | -| `run_coder` | issue + worktree -> code + tests + commit | -| `run_qa` | worktree -> test results | -| `run_code_reviewer` | worktree -> quality/security review | -| `run_qa_synthesizer` | QA + review -> FIX / APPROVE / BLOCK | -| `run_issue_advisor` | failure context -> adapt / split / accept / escalate | -| `run_replanner` | build state + failures -> restructured plan | -| `run_merger` | branches -> merged output | -| `run_integration_tester` | merged repo -> integration results | -| `run_verifier` | repo + PRD -> acceptance pass/fail | -| `generate_fix_issues` | failed criteria -> targeted fix issues | -| `run_github_pr` | branch -> push + draft PR | +| Agent | In → Out | +|-------|----------| +| `run_product_manager` | goal → PRD | +| `run_architect` | PRD → architecture | +| `run_tech_lead` | architecture → review | +| `run_sprint_planner` | architecture → issue DAG | +| `run_issue_writer` | issue spec → detailed issue | +| `run_coder` | issue + worktree → code + tests + commit | +| `run_qa` | worktree → test results | +| `run_code_reviewer` | worktree → quality/security review | +| `run_qa_synthesizer` | QA + review → FIX / APPROVE / BLOCK | +| `run_issue_advisor` | failure context → adapt / split / accept / escalate | +| `run_replanner` | build state + failures → restructured plan | +| `run_merger` | branches → merged output | +| `run_integration_tester` | merged repo → integration results | +| `run_verifier` | repo + PRD → acceptance pass/fail | +| `generate_fix_issues` | failed criteria → targeted fix issues | +| `run_github_pr` | branch → push + draft PR |
@@ -570,23 +446,23 @@ Every specialist is also callable directly: Pass `config` to `build` or `execute`. Full schema: [`swe_af/execution/schemas.py`](swe_af/execution/schemas.py) -| Key | Default | Description | -| ------------------------- | --------------- | ----------------------------------------------------- | -| `runtime` | `"claude_code"` | Model runtime: `"claude_code"` or `"open_code"` | -| `models` | `null` | Flat role-model map (`default` + role keys below) | -| `max_coding_iterations` | `5` | Inner-loop retry budget | -| `max_advisor_invocations` | `2` | Middle-loop advisor budget | -| `max_replans` | `2` | Build-level replanning budget | -| `enable_issue_advisor` | `true` | Enable issue adaptation | -| `enable_replanning` | `true` | Enable global replanning | -| `enable_learning` | `false` | Enable cross-issue shared memory (continual learning) | -| `agent_timeout_seconds` | `2700` | Per-agent timeout | -| `agent_max_turns` | `150` | Tool-use turn budget | +| Key | Default | Description | +|-----|---------|-------------| +| `runtime` | `"claude_code"` | Model runtime: `"claude_code"` or `"open_code"` | +| `models` | `null` | Flat role→model map (`default` + role keys below) | +| `max_coding_iterations` | `5` | Inner-loop retry budget | +| `max_advisor_invocations` | `2` | Middle-loop advisor budget | +| `max_replans` | `2` | Build-level replanning budget | +| `enable_issue_advisor` | `true` | Enable issue adaptation | +| `enable_replanning` | `true` | Enable global replanning | +| `enable_learning` | `false` | Enable cross-issue shared memory (continual learning) | +| `agent_timeout_seconds` | `2700` | Per-agent timeout | +| `agent_max_turns` | `150` | Tool-use turn budget |
-Model Role Keys +Model role keys `models` supports: @@ -596,12 +472,7 @@ Pass `config` to `build` or `execute`. Full schema: [`swe_af/execution/schemas.p - `replan`, `retry_advisor`, `issue_writer`, `issue_advisor` - `verifier`, `git`, `merger`, `integration_tester` -
- -
-Resolution order - -`runtime defaults` < `models.default` < `models.` +Resolution order: `runtime defaults` < `models.default` < `models.`
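As a rough illustration of the resolution order, the sketch below reads `<` as "is overridden by": a role-specific key wins over `models.default`, which wins over the runtime's built-in default. The function name `resolve_model` and the fallback value `"sonnet"` are hypothetical and chosen for the example; the real per-runtime defaults are not listed here.

```python
from typing import Mapping, Optional

# Hypothetical runtime fallback, used only for this example.
ASSUMED_RUNTIME_DEFAULT = "sonnet"


def resolve_model(role: str,
                  models: Optional[Mapping[str, str]] = None,
                  runtime_default: str = ASSUMED_RUNTIME_DEFAULT) -> str:
    """Pick the model for a role: role key, then `default`, then runtime default."""
    models = models or {}
    if role in models:
        # A role-specific entry has the highest precedence.
        return models[role]
    # Fall back to the flat map's `default`, then to the runtime default.
    return models.get("default", runtime_default)


if __name__ == "__main__":
    config_models = {"default": "sonnet", "coder": "opus", "qa": "opus"}
    for role in ("coder", "qa", "verifier"):
        print(role, resolve_model(role, config_models))
```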
@@ -661,7 +532,7 @@ make clean-examples
-Security and Community +Security and community - Contribution guide: [`docs/CONTRIBUTING.md`](docs/CONTRIBUTING.md) - Code of conduct: [`CODE_OF_CONDUCT.md`](CODE_OF_CONDUCT.md) @@ -673,4 +544,4 @@ make clean-examples --- -SWE-AF is built on [AgentField](https://github.com/Agent-Field/agentfield) as a first step from single-agent harnesses to autonomous software engineering factories. +SWE-AF is built on [AgentField](https://github.com/Agent-Field/agentfield) — a first step from single-agent harnesses to autonomous software engineering factories. From b2a06f02bf21a44cab741abf56a37813a59c59c2 Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 25 Feb 2026 13:46:41 +0000 Subject: [PATCH 2/2] Reposition single/multi-repo as equal operating modes, improve README polish MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Rename "Multi-Repository Workspace Support" → "Operating Modes" with "Single-Repository Mode" and "Multi-Repository Mode" as equal subsections - Remove all "backward compatible" and "exactly as before" framing - Move use cases into Multi-Repository Mode section (where they're relevant) - Add tagline "One API call → full engineering team → shipped code." - Bold lead phrases in Why SWE-AF for scannability - Add multi-model/multi-provider as a Why SWE-AF bullet - Sharpen Why SWE-AF opening line - Add "Modes" to nav links, rename "Benchmark Snapshot" → "Benchmark" - Enumerate agent roles in intro paragraph for clarity https://claude.ai/code/session_01XgBbDVMbE52cKdU1dCNEsV --- README.md | 592 +++++++++++++++++++++++++++++++++--------------------- 1 file changed, 361 insertions(+), 231 deletions(-) diff --git a/README.md b/README.md index f2790b3..36a39fb 100644 --- a/README.md +++ b/README.md @@ -18,8 +18,9 @@

Quick Start • - Features • - How It Works • + Why SWE-AF • + In Action • + Factory ControlBenchmarkModesAPI • @@ -28,31 +29,179 @@ -SWE-AF spins up a coordinated fleet of AI agents — product managers, architects, coders, reviewers, testers — that scope, build, adapt, and ship software end to end. No scaffolding, no human-in-the-loop. One goal in, verified PR out. +One API call spins up a full autonomous engineering team — product managers, architects, coders, reviewers, testers — that scopes, builds, adapts, and ships complex software end to end. +SWE-AF is a first step toward **autonomous software engineering factories**, scaling from simple goals to hard multi-issue programs with hundreds to thousands of agent invocations.

SWE-AF autonomous engineering fleet banner

-
+## One-Call DX -| Scored **95/100** on benchmark | 10/10 issues, **$19** total cost | **400–500+** agents per build | Claude, MiniMax, DeepSeek, Qwen | -|:---:|:---:|:---:|:---:| -| Beats Claude Code (73) & Codex (62) | [Real PR — zero human code](https://github.com/Agent-Field/agentfield/pull/179) | Planning → coding → QA → merge | Any model, any provider | +```bash +curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \ + -H "Content-Type: application/json" \ + -d @- <<'JSON' +{ + "input": { + "goal": "Refactor and harden auth + billing flows", + "repo_url": "https://github.com/user/my-project", + "config": { + "runtime": "claude_code", + "models": { + "default": "sonnet", + "coder": "opus", + "qa": "opus" + }, + "enable_learning": true + } + } +} +JSON +``` -
+Swap `models.default` and any role key (`coder`, `qa`, `architect`, etc.) to any model your runtime supports. + +## Operating Modes + +SWE-AF works in two modes: point it at a single repository, or orchestrate coordinated changes across multiple repos in one build. + +### Single-Repository Mode + +The default. Pass `repo_url` (remote) or `repo_path` (local) and SWE-AF handles everything: + +```bash +curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \ + -H "Content-Type: application/json" \ + -d '{ + "input": { + "goal": "Add JWT auth", + "repo_url": "https://github.com/user/my-project" + } + }' +``` + +### Multi-Repository Mode + +When your work spans multiple codebases — a primary app plus shared libraries, monorepo sub-projects, or dependent microservices — pass `config.repos` as an array with roles: + +```bash +curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \ + -H "Content-Type: application/json" \ + -d '{ + "input": { + "goal": "Add JWT auth across API and shared-lib", + "config": { + "repos": [ + { + "repo_url": "https://github.com/org/main-app", + "role": "primary" + }, + { + "repo_url": "https://github.com/org/shared-lib", + "role": "dependency" + } + ], + "runtime": "claude_code", + "models": { + "default": "sonnet" + } + } + } + }' +``` -## Features +**Roles:** +- `primary` — The main application. Changes here drive the build; failures block progress. +- `dependency` — Libraries or services modified to support the primary repo. Failures are captured but don't block. 
+ +**Use cases:** +- Primary app + shared SDK or utilities library +- Monorepo sub-projects that live in separate repos +- Feature spanning multiple microservices (e.g., API + worker queue) + +## Autonomous Build Spotlight + +Rust-based Python compiler benchmark (built autonomously): + +| Metric | CPython (subprocess) | RustPython (SWE-AF) | Improvement | +| ---------------------- | -------------------- | ---------------------------- | ----------------------- | +| Steady-state execution | Baseline (~19ms) | Optimized in-process runtime | **88.3x-602.3x faster** | +| Geometric mean | 1.0x baseline | 253.8x | **253.8x** | +| Peak throughput | ~52 ops/s | 31,807 ops/s | **~612x** | + +
+Measurement methodology + +Throughput comparison measures different execution models: CPython subprocess spawn (~19ms per call → ~52 ops/s) vs RustPython pre-warmed interpreter pool (in-process). This is the real-world tradeoff the system was built to optimize — replacing repeated subprocess invocations with a persistent pool for short-snippet execution. + +
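The throughput figures quoted in the methodology note can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the table and note (~19 ms per subprocess spawn, ~52 ops/s, 31,807 ops/s); the variable names are illustrative and not part of the benchmark harness.

```python
# Figures taken from the benchmark table and methodology note.
SPAWN_MS = 19.0            # approximate CPython subprocess spawn time per call
CPYTHON_OPS_QUOTED = 52    # ops/s quoted for the subprocess model
POOL_OPS = 31_807          # ops/s quoted for the pre-warmed RustPython pool

# One call every ~19 ms gives the subprocess throughput ceiling.
cpython_ops = 1000.0 / SPAWN_MS

# Ratio of the pooled throughput to the quoted subprocess throughput.
peak_ratio = POOL_OPS / CPYTHON_OPS_QUOTED

print(f"subprocess throughput: {cpython_ops:.1f} ops/s")  # 52.6 ops/s
print(f"peak throughput ratio: {peak_ratio:.0f}x")        # 612x
```

Both results match the rounded values in the table, which suggests the peak-throughput ratio is computed against the ~52 ops/s subprocess figure.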
+ +Artifact trail includes **175 tracked autonomous agents** across planning, coding, review, merge, and verification. + +Details: [`examples/llm-rust-python-compiler-sonnet/README.md`](examples/llm-rust-python-compiler-sonnet/README.md) + +## Why SWE-AF + +Most agent frameworks wrap a single coder loop. SWE-AF is a coordinated engineering factory — planning, execution, and governance agents run as a control stack that adapts in real time. + +- **Hardness-aware execution** — easy issues pass through quickly, while hard issues trigger deeper adaptation and DAG-level replanning instead of blind retries. +- **Factory architecture** — not a single-agent wrapper. Planning, execution, and governance agents run as a coordinated control stack. +- **Multi-model, multi-provider** — assign different models per role (`coder: opus`, `qa: haiku`). Works with Claude, OpenRouter, OpenAI, and Google. +- **Continual learning** — with `enable_learning=true`, conventions and failure patterns discovered early are injected into downstream issues. +- **Agent-scale parallelism** — dependency-level scheduling + isolated git worktrees allow large fan-out without branch collisions. +- **Fleet-scale orchestration** — many SWE-AF nodes can run continuously in parallel via AgentField, driving thousands of agent invocations across concurrent builds. +- **Explicit compromise tracking** — when scope is relaxed, debt is typed, severity-rated, and propagated. +- **Long-run reliability** — checkpointed execution supports `resume_build` after crashes or interruptions. + +## In Action + +[PR #179: Go SDK DID/VC Registration](https://github.com/Agent-Field/agentfield/pull/179) — built entirely by SWE-AF (Claude runtime with haiku-class models). One API call, zero human code. 
+ +| Metric | Value | +| ------------------- | ------------------ | +| Issues completed | 10/10 | +| Tests passing | 217 | +| Acceptance criteria | 34/34 | +| Agent invocations | 79 | +| Model | `claude-haiku-4-5` | +| **Total cost** | **$19.23** | + +
+Cost breakdown by agent role + +| Role | Cost | % | +| ---------------------------------- | ----- | ----- | +| Coder | $5.88 | 30.6% | +| Code Reviewer | $3.48 | 18.1% | +| QA | $1.78 | 9.2% | +| GitHub PR | $1.66 | 8.6% | +| Integration Tester | $1.59 | 8.3% | +| Merger | $1.22 | 6.3% | +| Workspace Ops | $1.77 | 9.2% | +| Planning (PM + Arch + TL + Sprint) | $0.79 | 4.1% | +| Verifier + Finalize | $0.34 | 1.8% | +| Synthesizer | $0.05 | 0.2% | + +79 invocations, 2,070 conversation turns. Planning agents scope and decompose; coders work in parallel isolated worktrees; reviewers and QA validate each issue; merger integrates branches; verifier checks acceptance criteria against the PRD. -- **Factory, not a wrapper** — Planning, execution, and governance agents run as a coordinated control stack. Not just a coder loop with retries. -- **Hardness-aware execution** — Easy issues pass through fast. Hard issues trigger deeper adaptation and DAG-level replanning instead of blind retries. -- **Multi-model, multi-provider** — Assign different models per role (`coder: opus`, `qa: haiku`). Works with Claude, OpenRouter, OpenAI, and Google. -- **Single-repo and multi-repo modes** — Point at one repository or orchestrate coordinated changes across multiple repos in a single build. -- **Continual learning** — With `enable_learning: true`, conventions and failure patterns discovered early get injected into downstream issues. -- **Agent-scale parallelism** — Dependency-level scheduling + isolated git worktrees allow large fan-out without branch collisions. -- **Self-correcting builds** — Three nested control loops (inner retry → advisor adaptation → DAG replanning) handle failures automatically. -- **Crash recovery** — Checkpointed execution supports `resume_build` after interruptions. -- **Draft PR output** — Pass a `repo_url` and SWE-AF clones, builds, and opens a draft PR on GitHub. +
+ +**Claude & open-source models supported**: Run builds with either runtime and tune models per role in one flat config map. +- `runtime: "claude_code"` maps to Claude backend. +- `runtime: "open_code"` maps to OpenCode backend (OpenRouter/OpenAI/Google/Anthropic model IDs). + +## Adaptive Factory Control + +SWE-AF uses three nested control loops to adapt to task difficulty in real time: + +| Loop | Scope | Trigger | Action | +| ----------- | ------------- | -------------------- | ---------------------------------------------------------------------------------- | +| Inner loop | Single issue | QA/review fails | Coder retries with feedback | +| Middle loop | Single issue | Inner loop exhausted | `run_issue_advisor` retries with a new approach, splits work, or accepts with debt | +| Outer loop | Remaining DAG | Escalated failures | `run_replanner` restructures remaining issues and dependencies | + +This is the core factory-control behavior: control agents supervise worker agents and continuously reshape the plan as reality changes. ## Quick Start @@ -65,93 +214,123 @@ One click deploys SWE-AF + AgentField control plane + PostgreSQL. Set two enviro - `CLAUDE_CODE_OAUTH_TOKEN` — run `claude setup-token` in [Claude Code CLI](https://docs.anthropic.com/en/docs/claude-code) (uses Pro/Max subscription credits) - `GH_TOKEN` — GitHub personal access token with `repo` scope for draft PR creation -Then trigger a build: +Once deployed, trigger a build: ```bash -curl -X POST https://.up.railway.app/api/v1/execute/async/swe-planner.build \ +curl -X POST https://.up.railway.app/api/v1/execute/async/swe-planner.build \ -H "Content-Type: application/json" \ -H "X-API-Key: this-is-a-secret" \ -d '{"input": {"goal": "Add JWT auth", "repo_url": "https://github.com/user/my-repo"}}' ``` -### Run Locally +### 1. Requirements (local) -```bash -# 1. 
Install -python3.12 -m venv .venv && source .venv/bin/activate -pip install -e ".[dev]" +- Python 3.12+ +- AgentField control plane (`af`) +- AI provider API key (Anthropic, OpenRouter, OpenAI, or Google) -# 2. Start the control plane and register the node -af # starts AgentField on :8080 -python -m swe_af # registers node "swe-planner" +### 2. Install -# 3. Trigger a build -curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \ - -H "Content-Type: application/json" \ - -d '{"input": {"goal": "Add JWT auth", "repo_url": "https://github.com/user/my-project"}}' +```bash +python3.12 -m venv .venv +source .venv/bin/activate +python -m pip install --upgrade pip +python -m pip install -e ".[dev]" ``` -### Docker +### 3. Run ```bash -cp .env.example .env # add your API key + optional GH_TOKEN -docker compose up -d - -# Scale workers -docker compose up --scale swe-agent=3 -d +af # starts AgentField control plane on :8080 +python -m swe_af # registers node id "swe-planner" ``` -Use a host control plane instead of the Docker service: +### 4. 
Trigger a build ```bash -docker compose -f docker-compose.local.yml up -d -``` +# Default (uses Claude) +curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \ + -H "Content-Type: application/json" \ + -d @- <<'JSON' +{ + "input": { + "goal": "Add JWT auth to all API endpoints", + "repo_url": "https://github.com/user/my-project" + } +} +JSON -## How a Build Works +# With open-source runtime + flat role map +curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \ + -H "Content-Type: application/json" \ + -d @- <<'JSON' +{ + "input": { + "goal": "Add JWT auth", + "repo_url": "https://github.com/user/my-project", + "config": { + "runtime": "open_code", + "models": { + "default": "openrouter/minimax/minimax-m2.5" + } + } + } +} +JSON -```text -Goal → PM → Architect → Tech Lead → Sprint Planner → Issue DAG - ↓ - ┌────────────────────────────────┘ - ↓ (parallel, isolated worktrees) - ┌─────────┐ - │ Issue N │ → Coder → QA → Reviewer → Synthesizer - └─────────┘ ↑ │ - ↑ └── retry ───────┘ (inner loop) - │ - └── advisor / replanner (middle + outer loops) - ↓ - Merge → Integration Test → Verify → Draft PR +# Local workspace mode (repo_path) + targeted role override +curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \ + -H "Content-Type: application/json" \ + -d @- <<'JSON' +{ + "input": { + "goal": "Refactor and harden auth + billing flows", + "repo_path": "/path/to/repo", + "config": { + "runtime": "claude_code", + "models": { + "default": "sonnet", + "coder": "opus", + "qa": "opus" + }, + "enable_learning": true + } + } +} +JSON ``` -Three nested control loops handle task difficulty in real time: +For OpenRouter with `open_code`, use model IDs in `openrouter//` format (for example `openrouter/minimax/minimax-m2.5`). 
+ +## What Happens In One Build -| Loop | Scope | Trigger | Action | -|------|-------|---------|--------| -| **Inner** | Single issue | QA/review fails | Coder retries with feedback | -| **Middle** | Single issue | Inner loop exhausted | Advisor retries with new approach, splits work, or accepts with debt | -| **Outer** | Remaining DAG | Escalated failures | Replanner restructures remaining issues and dependencies | +- Architecture is generated and reviewed before coding starts +- Issues are dependency-sorted and run in parallel across isolated worktrees +- Each issue gets dedicated coder, tester, and reviewer passes +- Failed issues trigger advisor-driven adaptation (split, re-scope, or escalate) +- Escalations trigger replanning of the remaining DAG +- End result is merged, integration-tested, and verified against acceptance criteria

SWE-AF architecture

-> Typical runs spin up 400–500+ agent instances across planning, execution, QA, and verification. Larger DAGs and repeated adaptation cycles scale into the thousands. +> Typical runs spin up 400-500+ agent instances across planning, execution, QA, and verification. For larger DAGs and repeated adaptation/replanning cycles, SWE-AF can scale into the high hundreds to thousands of agent invocations in a single build. ## Benchmark -**95/100** with both Claude haiku-class routing ($20) and MiniMax M2.5 via open runtime ($6), outperforming Claude Code sonnet (73), Codex o3 (62), and Claude Code haiku (59) on the same prompt. +**95/100 with haiku and MiniMax**: SWE-AF scored 95/100 with both Claude haiku-class routing ($20) and MiniMax M2.5 via open runtime ($6), outperforming Claude Code sonnet (73), Codex o3 (62), and Claude Code haiku (59) on the same prompt. -| Dimension | SWE-AF (haiku) | SWE-AF (MiniMax) | CC Sonnet | Codex (o3) | CC Haiku | -|-----------|---------------|-----------------|-----------|-----------|---------| -| Functional (30) | **30** | **30** | **30** | **30** | **30** | -| Structure (20) | **20** | **20** | 10 | 10 | 10 | -| Hygiene (20) | **20** | **20** | 16 | 10 | 7 | -| Git (15) | **15** | **15** | 2 | 2 | 2 | -| Quality (15) | 10 | 10 | **15** | 10 | 10 | -| **Total** | **95** | **95** | **73** | **62** | **59** | -| **Cost** | **~$20** | **~$6** | ? | ? | ? | -| **Time** | ~30–40 min | 43 min | ? | ? | ? | +| Dimension | SWE-AF (haiku) | SWE-AF (MiniMax) | CC Sonnet | Codex (o3) | CC Haiku | +| --------------- | -------------- | ---------------- | --------- | ---------- | -------- | +| Functional (30) | **30** | **30** | **30** | **30** | **30** | +| Structure (20) | **20** | **20** | 10 | 10 | 10 | +| Hygiene (20) | **20** | **20** | 16 | 10 | 7 | +| Git (15) | **15** | **15** | 2 | 2 | 2 | +| Quality (15) | 10 | 10 | **15** | 10 | 10 | +| Total | **95** | **95** | **73** | **62** | **59** | +| **Cost** | **~$20** | **~$6** | ? | ? | ? 
| +| **Time** | ~30-40 min | 43 min | ? | ? | ? |
Full benchmark details and reproduction @@ -164,13 +343,13 @@ Same prompt tested across multiple agents. SWE-AF with Claude runtime (haiku-cla ### Scoring framework -| Dimension | Points | What it measures | -|-----------|--------|-----------------| -| Functional | 30 | CLI behavior and passing tests | -| Structure | 20 | Modular source layout and test organization | -| Hygiene | 20 | `.gitignore`, clean status, no junk artifacts | -| Git | 15 | Commit discipline and message quality | -| Quality | 15 | Error handling, package metadata, README quality | +| Dimension | Points | What it measures | +| ---------- | ------ | ------------------------------------------------ | +| Functional | 30 | CLI behavior and passing tests | +| Structure | 20 | Modular source layout and test organization | +| Hygiene | 20 | `.gitignore`, clean status, no junk artifacts | +| Git | 15 | Commit discipline and message quality | +| Quality | 15 | Error handling, package metadata, README quality | ### Reproduction @@ -212,13 +391,13 @@ curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \ JSON # Claude Code (haiku) -claude -p "Build a Node.js CLI todo app ..." --model haiku --dangerously-skip-permissions +claude -p "Build a Node.js CLI todo app with add, list, complete, and delete commands. Data should persist to a JSON file. Initialize git, write tests, and commit your work." --model haiku --dangerously-skip-permissions # Claude Code (sonnet) -claude -p "Build a Node.js CLI todo app ..." --model sonnet --dangerously-skip-permissions +claude -p "Build a Node.js CLI todo app with add, list, complete, and delete commands. Data should persist to a JSON file. Initialize git, write tests, and commit your work." --model sonnet --dangerously-skip-permissions # Codex (gpt-5.3-codex) -codex exec "Build a Node.js CLI todo app ..." --full-auto +codex exec "Build a Node.js CLI todo app with add, list, complete, and delete commands. Data should persist to a JSON file. 
Initialize git, write tests, and commit your work." --full-auto ``` **MiniMax M2.5 Measured Metrics (Feb 2026):** @@ -232,121 +411,77 @@ Benchmark assets, logs, evaluator, and generated projects live in [`examples/age
-## Real-World Examples - -### PR #179: Go SDK — Built Entirely by SWE-AF - -[PR #179: Go SDK DID/VC Registration](https://github.com/Agent-Field/agentfield/pull/179) — one API call, zero human code, haiku-class models. - -| Metric | Value | -|--------|-------| -| Issues completed | 10/10 | -| Tests passing | 217 | -| Acceptance criteria | 34/34 | -| Agent invocations | 79 | -| Model | `claude-haiku-4-5` | -| **Total cost** | **$19.23** | - -
-Cost breakdown by agent role - -| Role | Cost | % | -|------|------|---| -| Coder | $5.88 | 30.6% | -| Code Reviewer | $3.48 | 18.1% | -| QA | $1.78 | 9.2% | -| GitHub PR | $1.66 | 8.6% | -| Integration Tester | $1.59 | 8.3% | -| Merger | $1.22 | 6.3% | -| Workspace Ops | $1.77 | 9.2% | -| Planning (PM + Arch + TL + Sprint) | $0.79 | 4.1% | -| Verifier + Finalize | $0.34 | 1.8% | -| Synthesizer | $0.05 | 0.2% | - -79 invocations, 2,070 conversation turns. Planning agents scope and decompose; coders work in parallel isolated worktrees; reviewers and QA validate each issue; merger integrates branches; verifier checks acceptance criteria against the PRD. - -
- -### Autonomous Build Spotlight +## Docker -Rust-based Python compiler benchmark (built autonomously): - -| Metric | CPython (subprocess) | RustPython (SWE-AF) | Improvement | -|--------|---------------------|---------------------|-------------| -| Steady-state execution | Baseline (~19ms) | Optimized in-process runtime | **88.3x–602.3x faster** | -| Geometric mean | 1.0x baseline | 253.8x | **253.8x** | -| Peak throughput | ~52 ops/s | 31,807 ops/s | **~612x** | - -
-Measurement methodology - -Throughput comparison measures different execution models: CPython subprocess spawn (~19ms per call → ~52 ops/s) vs RustPython pre-warmed interpreter pool (in-process). This is the real-world tradeoff the system was built to optimize — replacing repeated subprocess invocations with a persistent pool for short-snippet execution. - -
- -Artifact trail includes **175 tracked autonomous agents** across planning, coding, review, merge, and verification. - -Details: [`examples/llm-rust-python-compiler-sonnet/README.md`](examples/llm-rust-python-compiler-sonnet/README.md) - -## Operating Modes - -SWE-AF works in two modes: point it at a single repository, or orchestrate coordinated changes across multiple repos in one build. +```bash +cp .env.example .env +# Add your API key: ANTHROPIC_API_KEY, OPENROUTER_API_KEY, OPENAI_API_KEY, or GOOGLE_API_KEY +# Optionally add GH_TOKEN for draft PR workflow -### Single-Repository Mode +docker compose up -d +``` -The default. Pass `repo_url` (remote) or `repo_path` (local) and SWE-AF handles everything: +Submit a build: ```bash +# Default (Claude) curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \ -H "Content-Type: application/json" \ - -d '{ - "input": { - "goal": "Add JWT auth", - "repo_url": "https://github.com/user/my-project" - } - }' -``` - -### Multi-Repository Mode - -When your work spans multiple codebases — a primary app plus shared libraries, monorepo sub-projects, or dependent microservices — pass `config.repos` as an array with roles: + -d @- <<'JSON' +{ + "input": { + "goal": "Add JWT auth", + "repo_url": "https://github.com/user/my-repo" + } +} +JSON -```bash +# With open-source runtime (set OPENROUTER_API_KEY in .env) curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \ -H "Content-Type: application/json" \ - -d '{ - "input": { - "goal": "Add JWT auth across API and shared-lib", - "config": { - "repos": [ - { - "repo_url": "https://github.com/org/main-app", - "role": "primary" - }, - { - "repo_url": "https://github.com/org/shared-lib", - "role": "dependency" - } - ], - "runtime": "claude_code", - "models": { "default": "sonnet" } + -d @- <<'JSON' +{ + "input": { + "goal": "Add JWT auth", + "repo_url": "https://github.com/user/my-repo", + "config": { + "runtime": "open_code", + "models": { + 
"default": "openrouter/minimax/minimax-m2.5" } } - }' + } +} +JSON + +# Local workspace mode (repo_path) +curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \ + -H "Content-Type: application/json" \ + -d @- <<'JSON' +{ + "input": { + "goal": "Add JWT auth", + "repo_path": "/workspaces/my-repo" + } +} +JSON ``` -**Roles:** -- `primary` — The main application. Changes here drive the build; failures block progress. -- `dependency` — Libraries or services modified to support the primary repo. Failures are captured but don't block. +Scale workers: -**Use cases:** -- Primary app + shared SDK or utilities library -- Monorepo sub-projects that live in separate repos -- Feature spanning multiple microservices (e.g., API + worker queue) +```bash +docker compose up --scale swe-agent=3 -d +``` -## One-Call DX +Use a host control plane instead of Docker control-plane service: + +```bash +docker compose -f docker-compose.local.yml up -d +``` -Every build is a single API call. Swap runtimes and assign models per agent role in one flat config: +## GitHub Repo Workflow (Clone -> Build -> Draft PR) + +Pass `repo_url` instead of `repo_path` to let SWE-AF clone and open a draft PR after execution. 
```bash curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \ @@ -354,35 +489,25 @@ curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \ -d @- <<'JSON' { "input": { - "goal": "Refactor and harden auth + billing flows", "repo_url": "https://github.com/user/my-project", + "goal": "Add comprehensive test coverage", "config": { "runtime": "claude_code", "models": { "default": "sonnet", "coder": "opus", "qa": "opus" - }, - "enable_learning": true + } } } } JSON ``` -**Runtimes supported:** -- `runtime: "claude_code"` — Claude backend (Anthropic) -- `runtime: "open_code"` — OpenCode backend (OpenRouter/OpenAI/Google/Anthropic model IDs) - -For OpenRouter, use model IDs in `openrouter//` format (e.g., `openrouter/minimax/minimax-m2.5`). - -## GitHub Repo Workflow - -Pass `repo_url` instead of `repo_path` to let SWE-AF clone the repo and open a draft PR after the build completes. - Requirements: -- `GH_TOKEN` environment variable with `repo` scope -- Token must have access to the target repository + +- `GH_TOKEN` in `.env` with `repo` scope +- Repo access for that token ## API Reference @@ -392,7 +517,7 @@ Requirements: Core async endpoints (returns an `execution_id` immediately): ```bash -# Full build: plan → execute → verify +# Full build: plan -> execute -> verify POST /api/v1/execute/async/swe-planner.build # Plan only @@ -420,24 +545,24 @@ Every specialist is also callable directly:
Agent execution flow -| Agent | In → Out | -|-------|----------| -| `run_product_manager` | goal → PRD | -| `run_architect` | PRD → architecture | -| `run_tech_lead` | architecture → review | -| `run_sprint_planner` | architecture → issue DAG | -| `run_issue_writer` | issue spec → detailed issue | -| `run_coder` | issue + worktree → code + tests + commit | -| `run_qa` | worktree → test results | -| `run_code_reviewer` | worktree → quality/security review | -| `run_qa_synthesizer` | QA + review → FIX / APPROVE / BLOCK | -| `run_issue_advisor` | failure context → adapt / split / accept / escalate | -| `run_replanner` | build state + failures → restructured plan | -| `run_merger` | branches → merged output | -| `run_integration_tester` | merged repo → integration results | -| `run_verifier` | repo + PRD → acceptance pass/fail | -| `generate_fix_issues` | failed criteria → targeted fix issues | -| `run_github_pr` | branch → push + draft PR | +| Agent | In -> Out | +| ------------------------ | ---------------------------------------------------- | +| `run_product_manager` | goal -> PRD | +| `run_architect` | PRD -> architecture | +| `run_tech_lead` | architecture -> review | +| `run_sprint_planner` | architecture -> issue DAG | +| `run_issue_writer` | issue spec -> detailed issue | +| `run_coder` | issue + worktree -> code + tests + commit | +| `run_qa` | worktree -> test results | +| `run_code_reviewer` | worktree -> quality/security review | +| `run_qa_synthesizer` | QA + review -> FIX / APPROVE / BLOCK | +| `run_issue_advisor` | failure context -> adapt / split / accept / escalate | +| `run_replanner` | build state + failures -> restructured plan | +| `run_merger` | branches -> merged output | +| `run_integration_tester` | merged repo -> integration results | +| `run_verifier` | repo + PRD -> acceptance pass/fail | +| `generate_fix_issues` | failed criteria -> targeted fix issues | +| `run_github_pr` | branch -> push + draft PR |
@@ -446,23 +571,23 @@ Every specialist is also callable directly: Pass `config` to `build` or `execute`. Full schema: [`swe_af/execution/schemas.py`](swe_af/execution/schemas.py) -| Key | Default | Description | -|-----|---------|-------------| -| `runtime` | `"claude_code"` | Model runtime: `"claude_code"` or `"open_code"` | -| `models` | `null` | Flat role→model map (`default` + role keys below) | -| `max_coding_iterations` | `5` | Inner-loop retry budget | -| `max_advisor_invocations` | `2` | Middle-loop advisor budget | -| `max_replans` | `2` | Build-level replanning budget | -| `enable_issue_advisor` | `true` | Enable issue adaptation | -| `enable_replanning` | `true` | Enable global replanning | -| `enable_learning` | `false` | Enable cross-issue shared memory (continual learning) | -| `agent_timeout_seconds` | `2700` | Per-agent timeout | -| `agent_max_turns` | `150` | Tool-use turn budget | +| Key | Default | Description | +| ------------------------- | --------------- | ----------------------------------------------------- | +| `runtime` | `"claude_code"` | Model runtime: `"claude_code"` or `"open_code"` | +| `models` | `null` | Flat role-model map (`default` + role keys below) | +| `max_coding_iterations` | `5` | Inner-loop retry budget | +| `max_advisor_invocations` | `2` | Middle-loop advisor budget | +| `max_replans` | `2` | Build-level replanning budget | +| `enable_issue_advisor` | `true` | Enable issue adaptation | +| `enable_replanning` | `true` | Enable global replanning | +| `enable_learning` | `false` | Enable cross-issue shared memory (continual learning) | +| `agent_timeout_seconds` | `2700` | Per-agent timeout | +| `agent_max_turns` | `150` | Tool-use turn budget |
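The budget keys in the table above bound the nested control loops. As a rough illustration of how the inner and middle budgets might compose for a single issue, here is a minimal Python sketch. It is not SWE-AF's implementation: `run_issue`, `attempt`, and the assumption that each advisor invocation grants a fresh inner-loop budget are all hypothetical; only the default values come from the table.

```python
from typing import Callable

# Defaults copied from the configuration table; everything else is hypothetical.
MAX_CODING_ITERATIONS = 5    # inner-loop retry budget
MAX_ADVISOR_INVOCATIONS = 2  # middle-loop advisor budget


def run_issue(
    attempt: Callable[[int, int], bool],
    max_coding_iterations: int = MAX_CODING_ITERATIONS,
    max_advisor_invocations: int = MAX_ADVISOR_INVOCATIONS,
) -> str:
    """Return "approved" if any attempt passes, otherwise "escalated".

    `attempt(advisor_round, iteration)` stands in for one coder -> QA/review
    pass and returns True when the issue is approved.
    """
    # Round 0 is the original approach; each later round models one advisor
    # invocation that (by assumption) restarts the inner loop.
    for advisor_round in range(max_advisor_invocations + 1):
        for iteration in range(max_coding_iterations):
            if attempt(advisor_round, iteration):
                return "approved"
    # Both budgets are exhausted: hand the issue to the outer loop (replanning).
    return "escalated"
```

Under this assumed composition, an issue that never passes is attempted (2 + 1) × 5 = 15 times with the default budgets before it escalates; `max_replans` would then cap how often the outer loop restructures the remaining DAG.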
-Model role keys +Model Role Keys `models` supports: @@ -472,7 +597,12 @@ Pass `config` to `build` or `execute`. Full schema: [`swe_af/execution/schemas.p - `replan`, `retry_advisor`, `issue_writer`, `issue_advisor` - `verifier`, `git`, `merger`, `integration_tester` -Resolution order: `runtime defaults` < `models.default` < `models.` +
+ +
+Resolution order + +`runtime defaults` < `models.default` < `models.<role>`
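The resolution order above reads as a three-level fallback: a role-specific key wins over `models.default`, which wins over the runtime's built-in default. A minimal sketch of that lookup, assuming the runtime defaults can be modelled as a plain role-to-model dict (the `resolve_model` helper and the sample values are hypothetical, not taken from `swe_af`):

```python
from typing import Mapping, Optional


def resolve_model(
    role: str,
    runtime_defaults: Mapping[str, str],
    models: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the model for `role`, highest-priority source first."""
    models = models or {}
    if role in models:            # role-specific override
        return models[role]
    if "default" in models:       # models.default
        return models["default"]
    return runtime_defaults[role]  # runtime default (assumed keyed by role)


# Hypothetical runtime defaults, for illustration only.
runtime_defaults = {"coder": "sonnet", "qa": "sonnet"}
config_models = {"default": "haiku", "coder": "opus"}

print(resolve_model("coder", runtime_defaults, config_models))  # opus
print(resolve_model("qa", runtime_defaults, config_models))     # haiku
print(resolve_model("qa", runtime_defaults))                    # sonnet
```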
@@ -532,7 +662,7 @@ make clean-examples
-Security and community +Security and Community - Contribution guide: [`docs/CONTRIBUTING.md`](docs/CONTRIBUTING.md) - Code of conduct: [`CODE_OF_CONDUCT.md`](CODE_OF_CONDUCT.md) @@ -544,4 +674,4 @@ make clean-examples --- -SWE-AF is built on [AgentField](https://github.com/Agent-Field/agentfield) — a first step from single-agent harnesses to autonomous software engineering factories. +SWE-AF is built on [AgentField](https://github.com/Agent-Field/agentfield) as a first step from single-agent harnesses to autonomous software engineering factories.