A minimal harness demonstrating long-running autonomous coding with the Claude Agent SDK. This demo implements a three-agent pattern (an initializer agent for new projects, an onboarding agent for existing codebases, and a coding agent for all later sessions) that can build complete applications over multiple sessions.
New in v2.0: Support for existing codebases! The agent can now analyze and continue development on existing projects, not just build from scratch.
Required: Install the latest versions of both Claude Code and the Claude Agent SDK:
# Install Claude Code CLI (latest version required)
npm install -g @anthropic-ai/claude-code
# Install Python dependencies
pip install -r requirements.txt
Verify your installations:
claude --version # Should be latest version
pip show claude-code-sdk # Check SDK is installed
API Key: Set your Anthropic API key:
export ANTHROPIC_API_KEY='your-api-key-here'
New Project (build from scratch):
python aidd-c.py --project-dir ./my_project --spec ./specs/app_spec.txt
Existing Codebase (analyze and continue):
python aidd-c.py --project-dir ./path/to/existing/app
Testing with limited iterations:
python aidd-c.py --project-dir ./my_project --spec ./specs/app_spec.txt --max-iterations 3
Warning: This demo takes a long time to run!
- First session (initialization): The agent generates a metadata directory (.auto, .autok, or .automaker) with a feature_list.json containing 200 test cases. This takes several minutes and may appear to hang. That is normal: the agent is writing out all the features.
- Subsequent sessions: Each coding iteration can take 5-15 minutes depending on complexity.
- Full app: Building all 200 features typically requires many hours of total runtime across multiple sessions.
Tip: The 200-feature target in the prompts is designed for comprehensive coverage. For a faster demo, edit prompts/initializer.md to reduce the feature count (e.g., 20-50 features).
The system automatically detects which agent to use based on the project directory state:
- Initializer Agent (Session 1 - New Projects):
  - Triggered when: Directory is empty or doesn't exist
  - Reads spec.txt from the metadata directory and creates feature_list.json with 200 test cases
  - Sets up project structure and initializes git
  - Begins implementation if time permits
- Onboarding Agent (Session 1 - Existing Codebases):
  - Triggered when: Directory has existing code but no metadata directory with feature_list.json
  - Analyzes the existing codebase to understand what's implemented
  - Creates spec.txt in the metadata directory, inferring it from the code if none exists
  - Creates feature_list.json with existing features marked as passing
  - Identifies missing features and technical debt
  - Prepares for continued development
- Coding Agent (Sessions 2+):
  - Triggered when: feature_list.json exists in the metadata directory
  - Picks up where the previous session left off
  - Implements features one by one
  - Marks them as passing in feature_list.json
  - Works on both new and existing codebases

- Each session runs with a fresh context window
- Progress is persisted via the metadata directory (feature_list.json) and git commits
- The agent auto-continues between sessions (3 second delay)
- Press Ctrl+C to pause; run the same command to resume
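The detection rules above can be sketched as a small function. This is only an illustration of the decision order, not the harness's actual code: the names `select_agent`, `find_feature_list`, and `METADATA_DIR_NAMES` are hypothetical, and the directory names in the tuple are placeholders for whichever metadata directories your setup uses.

```python
from pathlib import Path
from typing import Optional

# Assumption: placeholder metadata directory names; adjust to match your setup.
METADATA_DIR_NAMES = (".aidd", ".autok", ".automaker")


def find_feature_list(project_dir: Path) -> Optional[Path]:
    """Return the first feature_list.json found in a known metadata directory."""
    for name in METADATA_DIR_NAMES:
        candidate = project_dir / name / "feature_list.json"
        if candidate.is_file():
            return candidate
    return None


def select_agent(project_dir: Path) -> str:
    """Map the project directory state to an agent name (hypothetical helper)."""
    if find_feature_list(project_dir) is not None:
        return "coding"
    if not project_dir.is_dir():
        return "initializer"
    # Assumption: a metadata directory without feature_list.json is not "existing code".
    has_code = any(p.name not in METADATA_DIR_NAMES for p in project_dir.iterdir())
    return "onboarding" if has_code else "initializer"
```

Checking for feature_list.json first means a resumed run always lands on the coding agent, regardless of what else is in the directory.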
This demo uses a defense-in-depth security approach (see security.py and client.py):
- OS-level Sandbox: Bash commands run in an isolated environment
- Filesystem Restrictions: File operations restricted to the project directory only
- Bash Allowlist: Only specific commands are permitted:
  - File inspection: ls, cat, head, tail, wc, grep
  - Node.js: npm, node
  - Version control: git
  - Process management: ps, lsof, sleep, pkill (dev processes only)
Commands not in the allowlist are blocked by the security hook.
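To make the allowlist idea concrete, here is a deliberately conservative sketch of such a check. It is not the code in security.py: the function names are hypothetical, the set simply mirrors the commands listed above, and the extra validation implied by "dev processes only" for pkill is not modeled. It splits a command line on common shell separators and requires every segment to start with an allowed program name.

```python
import re
import shlex
from typing import List

# Mirrors the commands listed above; the real ALLOWED_COMMANDS lives in security.py.
ALLOWED_COMMANDS = frozenset(
    {"ls", "cat", "head", "tail", "wc", "grep", "npm", "node", "git",
     "ps", "lsof", "sleep", "pkill"}
)

# Separators that can chain several programs on one command line.
_SEPARATORS = re.compile(r"&&|\|\||[;|&\n]")


def command_names(command: str) -> List[str]:
    """Return the first token of every segment; raises ValueError on bad quoting."""
    names = []
    for segment in _SEPARATORS.split(command):
        tokens = shlex.split(segment)
        if tokens:
            names.append(tokens[0])
    return names


def is_allowed(command: str) -> bool:
    """True only if every program in the command line is on the allowlist."""
    # Command substitution could hide a program name, so reject it outright.
    if "`" in command or "$(" in command:
        return False
    try:
        names = command_names(command)
    except ValueError:
        return False
    return bool(names) and all(name in ALLOWED_COMMANDS for name in names)
```

Because the split ignores quoting, a quoted separator such as `grep "a|b" file.txt` is rejected. For a sketch, erring toward false positives is the safer failure mode.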
autonomous-coding/
├── aidd-c.py # Main entry point
├── agent.py # Agent session logic
├── client.py # Claude SDK client configuration
├── security.py # Bash command allowlist and validation
├── progress.py # Progress tracking utilities
├── prompts.py # Prompt loading utilities
├── prompts/
│ ├── initializer.md # First session prompt (new projects)
│ ├── onboarding.md # First session prompt (existing codebases)
│ └── coding.md # Continuation session prompt
├── specs/
│ └── app_spec.txt # Application specification
└── requirements.txt # Python dependencies
After running, your project directory will contain:
my_project/
├── .aidd/ # or .autok/ or .automaker/ (whichever is found/created)
│ ├── feature_list.json # Test cases (source of truth)
│ ├── spec.txt # Copied specification
│ ├── init.sh # Environment setup script
│ └── claude-progress.txt # Session progress notes
├── .claude_settings.json # Security settings
└── [application files] # Generated application code
The metadata directory's feature_list.json file uses an enhanced schema with rich metadata for better tracking:
{
  "area": "backend",
  "category": "functional",
  "description": "User can log in with email and password",
  "priority": "critical",
  "status": "open",
  "created_at": "2025-01-15",
  "closed_at": null,
  "steps": [
    "Step 1: Navigate to login page",
    "Step 2: Enter credentials",
    "Step 3: Verify login success"
  ],
  "passes": false
}
Field Definitions:
| Field | Values | Description |
|---|---|---|
| area | database, backend, frontend, testing, security, devex, docs | System area |
| category | functional, style, security, performance, accessibility | Test type |
| priority | critical, high, medium, low | Implementation priority |
| status | open, in_progress, resolved, deferred | Current state |
| created_at | YYYY-MM-DD | Date feature was added |
| closed_at | YYYY-MM-DD or null | Date feature was completed |
| steps | Array of strings | Testing steps |
| passes | true or false | Whether feature passes testing |
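The field definitions above translate directly into a small validator, which can be handy for checking a hand-edited feature_list.json. This is a sketch under the assumption that each entry is a JSON object shaped like the example; `validate_feature` and `ALLOWED_VALUES` are hypothetical names, not part of the harness.

```python
from datetime import datetime
from typing import Any, Dict, List

# Allowed values copied from the field definitions table.
ALLOWED_VALUES = {
    "area": ("database", "backend", "frontend", "testing", "security", "devex", "docs"),
    "category": ("functional", "style", "security", "performance", "accessibility"),
    "priority": ("critical", "high", "medium", "low"),
    "status": ("open", "in_progress", "resolved", "deferred"),
}


def _is_iso_date(value: Any) -> bool:
    """True for 10-character strings that parse as YYYY-MM-DD."""
    if not isinstance(value, str) or len(value) != 10:
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def validate_feature(feature: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for field, allowed in ALLOWED_VALUES.items():
        if feature.get(field) not in allowed:
            errors.append(f"{field}: expected one of {', '.join(allowed)}")
    if not _is_iso_date(feature.get("created_at")):
        errors.append("created_at: expected YYYY-MM-DD")
    closed_at = feature.get("closed_at")
    if closed_at is not None and not _is_iso_date(closed_at):
        errors.append("closed_at: expected YYYY-MM-DD or null")
    steps = feature.get("steps")
    if not isinstance(steps, list) or not all(isinstance(s, str) for s in steps):
        errors.append("steps: expected an array of strings")
    if not isinstance(feature.get("passes"), bool):
        errors.append("passes: expected true or false")
    return errors
```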
Progress Display:
The agent displays progress summaries including:
- Overall passing/total counts
- Status breakdown (open, in_progress, resolved, deferred)
- Priority breakdown (critical, high, medium, low)
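A summary like the one described above could be computed as follows. This is a minimal sketch that assumes feature_list.json holds a JSON array of entries in the schema shown earlier; `summarize_progress` and `format_progress` are hypothetical helpers, and the output format is illustrative rather than what progress.py prints.

```python
from collections import Counter
from typing import Any, Dict, List


def summarize_progress(features: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Count passing features and group entries by status and priority."""
    return {
        "passing": sum(1 for f in features if f.get("passes") is True),
        "total": len(features),
        "by_status": dict(Counter(f.get("status", "unknown") for f in features)),
        "by_priority": dict(Counter(f.get("priority", "unknown") for f in features)),
    }


def format_progress(summary: Dict[str, Any]) -> str:
    """Render the summary on one line, with keys sorted for stable output."""
    status = ", ".join(f"{k}={v}" for k, v in sorted(summary["by_status"].items()))
    priority = ", ".join(f"{k}={v}" for k, v in sorted(summary["by_priority"].items()))
    return (
        f"{summary['passing']}/{summary['total']} passing"
        f" | status: {status} | priority: {priority}"
    )
```

To use it against a real project, load the file with `json.loads(path.read_text())` and pass the resulting list to `summarize_progress`.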
After the agent completes (or pauses), you can run the generated application:
cd generations/my_project
# Run the setup script created by the agent
./[metadata-dir]/init.sh
# Or manually (typical for Node.js apps):
npm install
npm run dev
The application will typically be available at http://localhost:3000 or similar (check the agent's output or [metadata-dir]/init.sh for the exact URL).
| Option | Description | Default |
|---|---|---|
| --project-dir | Directory for the project (required) | None |
| --spec | Specification file (required for new projects) | None |
| --max-iterations | Max agent iterations | Unlimited |
| --model | Claude model for all phases | claude-sonnet-4-5-20250929 |
| --init-model | Model for init/onboarding (overrides --model) | Same as --model |
| --code-model | Model for coding phases (overrides --model) | Same as --model |
| --idle-timeout | Abort session if no output for N seconds | 180 |
| --quit-on-abort | Quit after N consecutive failures | 0 (never) |
You can use different models for different phases to optimize cost and performance:
# Use Haiku 4.5 for setup (cheaper), Sonnet 4.5 for coding (more capable)
python aidd-c.py --project-dir ./my_project --spec ./specs/app_spec.txt \
--init-model claude-haiku-4-5-20251001 \
--code-model claude-sonnet-4-5-20250929
# Use Opus 4.5 for complex coding tasks
python aidd-c.py --project-dir ./my_project --spec ./specs/app_spec.txt \
--code-model claude-opus-4-5-20251101
Recommended configurations:
| Use Case | Init Model | Code Model |
|---|---|---|
| Cost-optimized | claude-haiku-4-5-20251001 | claude-sonnet-4-5-20250929 |
| Balanced | claude-sonnet-4-5-20250929 | claude-sonnet-4-5-20250929 |
| Maximum quality | claude-sonnet-4-5-20250929 | claude-opus-4-5-20251101 |
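The override rule ("--init-model and --code-model fall back to --model") can be expressed in a few lines of argparse. The sketch below covers only the three model flags and is an assumption about how such resolution might look; `build_parser` and `resolve_models` are hypothetical names, not the parser in aidd-c.py.

```python
import argparse
from typing import List, Tuple

DEFAULT_MODEL = "claude-sonnet-4-5-20250929"


def build_parser() -> argparse.ArgumentParser:
    """Parser covering only the model-related flags from the options table."""
    parser = argparse.ArgumentParser(description="Model selection sketch")
    parser.add_argument("--model", default=DEFAULT_MODEL)
    parser.add_argument("--init-model", default=None)
    parser.add_argument("--code-model", default=None)
    return parser


def resolve_models(argv: List[str]) -> Tuple[str, str]:
    """Return (init_model, code_model), falling back to --model when unset."""
    args = build_parser().parse_args(argv)
    return (args.init_model or args.model, args.code_model or args.model)
```

For example, `resolve_models(["--init-model", "claude-haiku-4-5-20251001"])` yields the Haiku model for initialization and the default Sonnet model for coding, which matches the "Cost-optimized" row.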
The idle timeout feature automatically detects and handles stuck agent sessions. If the agent produces no output for the specified number of seconds, the session is aborted and a fresh session is started.
# Use default 180-second idle timeout
python aidd-c.py --project-dir ./my_project
# Increase timeout for complex operations (5 minutes)
python aidd-c.py --project-dir ./my_project --idle-timeout 300
# Disable idle timeout entirely
python aidd-c.py --project-dir ./my_project --idle-timeout 0
When to adjust idle timeout:
- Increase if you're seeing false timeouts during long-running operations
- Decrease if you want faster detection of stuck sessions
- Disable (0) if you want the agent to run without time limits
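One way to picture the idle timeout is as a per-message deadline on an async stream: the clock restarts each time the agent emits output. The sketch below uses only the standard library and a generic async iterator of strings. It does not use the Claude Agent SDK, and `consume_with_idle_timeout` and `IdleTimeout` are hypothetical names.

```python
import asyncio
from typing import AsyncIterator, List, Optional


class IdleTimeout(Exception):
    """Raised when the stream stays silent longer than the idle timeout."""


async def consume_with_idle_timeout(
    stream: AsyncIterator[str], idle_timeout: Optional[float]
) -> List[str]:
    """Collect messages, aborting if any single wait exceeds idle_timeout seconds."""
    # A timeout of 0 (or None) disables the check, mirroring --idle-timeout 0.
    timeout = idle_timeout if idle_timeout and idle_timeout > 0 else None
    iterator = stream.__aiter__()
    messages = []
    while True:
        try:
            # The deadline applies to each message, not to the whole session.
            message = await asyncio.wait_for(iterator.__anext__(), timeout=timeout)
        except StopAsyncIteration:
            break
        except asyncio.TimeoutError:
            raise IdleTimeout(f"no output for {timeout} seconds") from None
        messages.append(message)
    return messages
```

A caller would catch `IdleTimeout`, count it as a failed session, and start a fresh one.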
The failure threshold feature tracks consecutive failures (errors and idle timeouts) and can automatically quit after reaching a threshold. This prevents infinite retry loops when something is fundamentally broken.
# Default: never quit, keep retrying forever
python aidd-c.py --project-dir ./my_project
# Quit after 3 consecutive failures
python aidd-c.py --project-dir ./my_project --quit-on-abort 3
# Quit after 5 consecutive failures (more resilient)
python aidd-c.py --project-dir ./my_project --quit-on-abort 5
How it works:
- Counter increments on errors or idle timeouts
- Counter resets to 0 on successful session completion
- When counter reaches threshold, the agent stops
- Use 0 (default) to disable and keep retrying indefinitely
When to use:
- Production runs: Set to 3-5 to avoid wasting compute on broken sessions
- Development/debugging: Set to 0 to allow manual investigation
- Unattended runs: Set to a reasonable threshold to prevent runaway costs
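The counter behavior described under "How it works" fits in a tiny class. This is an illustrative sketch, with `FailureTracker` as a hypothetical name; the harness's own bookkeeping may differ.

```python
class FailureTracker:
    """Track consecutive failed sessions against a quit threshold."""

    def __init__(self, threshold: int = 0) -> None:
        # A threshold of 0 means "never quit", matching the --quit-on-abort default.
        self.threshold = threshold
        self.consecutive = 0

    def record(self, success: bool) -> bool:
        """Record one session outcome; return True when the run should stop."""
        if success:
            self.consecutive = 0  # any successful session resets the streak
        else:
            self.consecutive += 1  # errors and idle timeouts both count
        return self.threshold > 0 and self.consecutive >= self.threshold
```

With a threshold of 3, the sequence fail, fail, success, fail, fail, fail stops only on the final failure, because the success in the middle resets the streak.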
Edit specs/app_spec.txt to specify a different application to build.
Edit prompts/initializer.md and change the "200 features" requirement to a smaller number for faster demos.
Edit security.py to add or remove commands from ALLOWED_COMMANDS.
"Appears to hang on first run"
This is normal. The initializer agent is generating 200 detailed test cases, which takes significant time. Watch for [Tool: ...] output to confirm the agent is working.
"Command blocked by security hook"
The agent tried to run a command not in the allowlist. This is the security system working as intended. If needed, add the command to ALLOWED_COMMANDS in security.py.
"API key not set"
Ensure ANTHROPIC_API_KEY is exported in your shell environment.
Internal Anthropic use.