egg

A structurally enforced SDLC pipeline for autonomous LLM agents — turning tasks into reviewed pull requests with mandatory human gates.

Inspired by Andy Weir's short story "The Egg" — a contained environment where development happens before emerging into the world. The agent works inside the egg; when ready, it "hatches" via human review and merge.

Note: this project is under heavy development. The core workflow is functional but is continually being refined and refactored. Expect breakage and changing behavior for the foreseeable future.

How It Works

Use the egg-sdlc CLI to launch a multi-agent pipeline. The agent cannot skip steps, self-approve work, or bypass review — these constraints are enforced by the gateway infrastructure, not by prompts.

         ┌───────────┐     ┌───────────┐     ┌───────────────┐     ┌──────────┐
         │  REFINE   │────▶│   PLAN    │────▶│  IMPLEMENT    │────▶│  HUMAN   │
         │  task     │     │           │     │  + review     │     │  MERGE   │
         └─────┬─────┘     └─────┬─────┘     └───────────────┘     └────┬─────┘
               │                 │                                      │
               ▼                 ▼                                      ▼
          Human gate        Human gate                             GitHub UI
        (approve plan)  (approve tasks)                          (final merge)

Refine — Agent analyzes the task and produces a requirements document. Human approves.
Plan — Agent breaks work into tasks with acceptance criteria. Human approves before any code is written.
Implement — Agent creates a draft PR and implements tasks. CI runs, automated review provides line-level feedback. Re-implementation cycles continue until all checks pass.
Merge — Draft PR is marked ready. Only a human can merge via GitHub UI.

The egg-sdlc CLI supports two modes:

Issue mode (egg-sdlc -r <repo_dir> -i <issue_num>): GitHub-issue-driven pipeline with full remote integration
Local mode (egg-sdlc with no args): Prompt-driven pipeline that runs entirely locally

Both modes create a pipeline in the orchestrator, which spawns sandbox containers to execute each phase as a DAG. Pipeline state lives in a JSON contract (.egg-state/contracts/{identifier}.json) committed to the feature branch, giving full auditability of every phase transition.
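
As a rough illustration of how a committed JSON contract can give an audit trail, the sketch below appends each phase transition to a file at the path pattern mentioned above. The field names (`phase`, `transitions`, `from`, `to`, `at`) and the helper functions are hypothetical; the real contract schema is not described in this README.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def contract_path(repo_root: Path, identifier: str) -> Path:
    """Build the contract location using the path pattern from this README."""
    return repo_root / ".egg-state" / "contracts" / f"{identifier}.json"


def record_transition(repo_root: Path, identifier: str, from_phase: str, to_phase: str) -> dict:
    """Append one phase transition to the contract file (hypothetical schema)."""
    path = contract_path(repo_root, identifier)
    path.parent.mkdir(parents=True, exist_ok=True)
    if path.exists():
        contract = json.loads(path.read_text())
    else:
        contract = {"phase": from_phase, "transitions": []}
    # Every transition is kept, so the history survives in the committed file.
    contract["transitions"].append(
        {"from": from_phase, "to": to_phase, "at": datetime.now(timezone.utc).isoformat()}
    )
    contract["phase"] = to_phase
    path.write_text(json.dumps(contract, indent=2))
    return contract
```

Because the file lives on the feature branch, each recorded transition would also show up in the branch's git history.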

For convenience, the /sdlc skill inside the egg interactive session redirects to egg-sdlc, so you can invoke it either way.

The Gateway

The gateway is the enforcement engine. It sits between the agent sandbox and the outside world, validating every operation against the current pipeline phase and role permissions.

┌─────────────────────────────────────────────────────────────────────────────┐
│                                    egg                                      │
│                                                                             │
│   ┌───────────────────────────────┐      ┌───────────────────────────────┐  │
│   │       Gateway Sidecar         │      │      Sandbox Container        │  │
│   │      (Enforcement Engine)     │      │     (Untrusted Agent)         │  │
│   │                               │      │                               │  │
│   │  ┌─────────────────────────┐  │ HTTP │  ┌─────────────────────────┐  │  │
│   │  │ Phase Filter            │◀─┼──────┼──│ git/gh wrappers         │  │  │
│   │  │ (block ops by phase)    │  │      │  │ (intercept all ops)     │  │  │
│   │  └─────────────────────────┘  │      │  └─────────────────────────┘  │  │
│   │                               │      │                               │  │
│   │  ┌─────────────────────────┐  │      │  ┌─────────────────────────┐  │  │
│   │  │ Role Validator          │  │ API  │  │ egg-contract CLI        │  │  │
│   │  │ (enforce field ownership│◀─┼──────┼──│ (state mutations)       │  │  │
│   │  │  for contract mutations)│  │      │  │                         │  │  │
│   │  └─────────────────────────┘  │      │  └─────────────────────────┘  │  │
│   │                               │      │                               │  │
│   │  ┌─────────────────────────┐  │      │  ┌─────────────────────────┐  │  │
│   │  │ Credential Injection    │  │Proxy │  │ Claude Code             │  │  │
│   │  │ (secrets never in       │◀─┼──────┼──│ (ANTHROPIC_BASE_URL)    │  │  │
│   │  │  sandbox)               │  │      │  │                         │  │  │
│   │  └─────────────────────────┘  │      │  └─────────────────────────┘  │  │
│   │                               │      │                               │  │
│   │  ┌─────────────────────────┐  │HTTPS │  ┌─────────────────────────┐  │  │
│   │  │ Network Policy          │◀─┼──────┼──│ All outbound traffic    │  │  │
│   │  │ (domain allowlist)      │  │Proxy │  │ (private mode)          │  │  │
│   │  └─────────────────────────┘  │      │  └─────────────────────────┘  │  │
│   │                               │      │                               │  │
│   │  HAS: GitHub tokens,          │      │  HAS: Workspace files only    │  │
│   │  Anthropic keys, network      │      │  NO: Credentials, .git/,      │  │
│   │                               │      │      direct network (private) │  │
│   └───────────────────────────────┘      └───────────────────────────────┘  │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Key principle: The agent cannot bypass controls because the capabilities don't exist in its environment. The gateway physically blocks operations — this is infrastructure enforcement, not behavioral controls.

What's Enforced

The gateway enforces both process controls (SDLC phases) and security controls (credential isolation) as aspects of the same system:

What	How It's Prevented
Agent skips to implementation	Gateway blocks code pushes via `git push` during refine and plan phases (only state files may be pushed)
Agent self-approves work	Role-based validation: implementer cannot modify task status
Agent merges its own PR	Gateway has no merge endpoint — humans must merge via GitHub UI
Agent steals credentials	Credentials never enter sandbox; gateway injects them at request time
Agent pushes to main	Gateway enforces branch policies; agent can only push to `egg/*` branches
Agent tampers with contracts	File-level restrictions block implementers from modifying contract files via `git push`
Agent exfiltrates code	Private mode restricts network to Anthropic API + private GitHub repos only
Agent accesses other workspaces	Each agent gets isolated git worktree; `.git/` is shadowed
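
The branch rule in the table above (pushes only to `egg/*`) can be pictured as a simple pattern check. This is a minimal sketch, not the gateway's actual code: the function name `may_push` and the handling of the `refs/heads/` prefix are assumptions made for illustration.

```python
from fnmatch import fnmatchcase

# Pattern taken from the table above; the gateway's real configuration may differ.
ALLOWED_BRANCH_PATTERNS = ("egg/*",)


def may_push(ref: str) -> bool:
    """Return True only when the target branch matches an allowed pattern."""
    branch = ref.removeprefix("refs/heads/")  # accept full refs as well as short names
    return any(fnmatchcase(branch, pattern) for pattern in ALLOWED_BRANCH_PATTERNS)
```

The check denies by default: anything that does not match an allowed pattern, including `main`, is rejected.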

Phase Permissions

Each pipeline phase has a defined set of permitted operations:

Phase	Allowed Operations	Exit Requires
Refine	`gh issue comment/edit`, `git push` (state files), `egg-contract add-decision`	Human approval
Plan	`gh issue comment/edit`, `git push` (state files), `egg-contract add-decision`	Human approval
Implement	`git push` (code), `egg-contract add-commit/update-notes`	All checks pass (CI + PR review)
PR	`gh pr create/edit/comment`, `git push`	Human merge
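
One way to read the table above is as a per-phase allowlist with deny-by-default semantics. The sketch below encodes it that way under stated assumptions: the operation labels (`"git push state"`, `"git push code"`) are hypothetical names for the two push kinds, and treating the PR phase's `git push` as a code push is a guess.

```python
# Hypothetical encoding of the Phase Permissions table; labels are illustrative.
_ISSUE_PHASE_OPS = frozenset(
    {"gh issue comment", "gh issue edit", "git push state", "egg-contract add-decision"}
)

ALLOWED_OPERATIONS: dict[str, frozenset[str]] = {
    "refine": _ISSUE_PHASE_OPS,
    "plan": _ISSUE_PHASE_OPS,
    "implement": frozenset(
        {"git push code", "egg-contract add-commit", "egg-contract update-notes"}
    ),
    # Assumption: the PR phase's `git push` refers to code pushes.
    "pr": frozenset({"gh pr create", "gh pr edit", "gh pr comment", "git push code"}),
}


def is_allowed(phase: str, operation: str) -> bool:
    """Deny by default: unknown phases and unlisted operations are rejected."""
    return operation in ALLOWED_OPERATIONS.get(phase, frozenset())
```

With this shape, a code push during refine is rejected because the operation is absent from the phase's set, not because the agent chose to comply.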

How Isolation Works

Zero credential exposure: Anthropic API requests route through gateway; GitHub operations use wrappers that call gateway API. Container environment is sanitized.
Git isolation: Each agent gets an isolated worktree. The .git/ directory is shadowed (tmpfs mount) — the agent cannot access git metadata directly.
Network modes: Public mode allows full internet + credential-injected API calls. Private mode restricts to Anthropic API + private GitHub repos only.

Multi-Agent Orchestration

The orchestrator (orchestrator/) manages parallel execution of specialized agent roles. Each role runs in its own sandbox container with scoped permissions enforced by the gateway.

Implementation Phase Roles

During the implement phase, work is divided across these specialized agents:

Role	Responsibility
Coder	Write code, create commits, push branches
Tester	Write and run tests, validate acceptance criteria
Documenter	Update docs, READMEs, and changelogs
Integrator	Run full test suite, validate integration

Execution model: Wave-based with dependencies. The coder runs first, then tester and documenter run in parallel (both depend on coder's output). The integrator runs after the coder and tester complete (it does not wait for the documenter).
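
The dependency structure described above can be sketched with Python's standard `graphlib` module. The role names and dependencies come from this section; the `execution_waves` helper is hypothetical and says nothing about how the orchestrator is actually implemented.

```python
from graphlib import TopologicalSorter

# Each role maps to the set of roles it depends on (taken from the text above).
IMPLEMENT_DEPENDENCIES = {
    "coder": set(),
    "tester": {"coder"},
    "documenter": {"coder"},
    "integrator": {"coder", "tester"},  # does not depend on the documenter
}


def execution_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group roles into waves; every role in a wave has all dependencies done."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(execution_waves(IMPLEMENT_DEPENDENCIES))
# [['coder'], ['documenter', 'tester'], ['integrator']]
```

This sketch completes a whole wave before starting the next. A scheduler that marks roles done individually could start the integrator as soon as the tester finishes, which matches the note that it does not wait for the documenter.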

Reviewer Roles

Reviewers run as a separate step after workers complete in refine, plan, and implement phases:

Refine Phase:

Refine Reviewer: Analysis quality and completeness
Agent Design Reviewer: Agent-mode alignment and anti-patterns

Plan Phase:

Unified Reviewer: Plan quality, task structure, acceptance criteria
Plan Reviewer: Plan-specific quality (task breakdown, dependencies, test strategy)

Implement Phase:

Unified Reviewer: Comprehensive review across all criteria
Code Reviewer: Security, correctness, code quality
Contract Reviewer: Verify acceptance criteria met, task completion status

Execution model: Reviewers always run as a separate step after all workers (and checkers, if applicable) complete. They spawn in parallel with a configurable concurrency limit (max_parallel_agents). In implement phase, reviewers run after the integrator completes. In plan phase, reviewers run after the task planner and risk analyst complete. In refine phase, reviewers run after the refiner completes.
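
A concurrency cap such as `max_parallel_agents` is commonly implemented with a semaphore. The following is a minimal sketch of that idea using `asyncio`; the `run_reviewers` function and the `review` callback are hypothetical stand-ins for spawning reviewer containers.

```python
import asyncio
from typing import Awaitable, Callable


async def run_reviewers(
    reviewers: list[str],
    max_parallel_agents: int,
    review: Callable[[str], Awaitable[str]],
) -> list[str]:
    """Run all reviewers concurrently, never more than max_parallel_agents at once."""
    semaphore = asyncio.Semaphore(max_parallel_agents)

    async def limited(name: str) -> str:
        async with semaphore:  # waits here while the limit is reached
            return await review(name)

    # gather preserves the input order of results regardless of finish order
    return await asyncio.gather(*(limited(name) for name in reviewers))
```

Results come back in the same order as the `reviewers` list, which keeps downstream aggregation simple.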

Refine Phase Roles

During the refine phase, a specialized agent produces the analysis:

Role	Responsibility
Refiner	Analyze task, research codebase, evaluate options, recommend approach

Execution model: Refiner runs first, then reviewers validate the analysis before human approval.

Plan Phase Roles

During the plan phase, work is divided across these specialized agents:

Role	Responsibility
Architect	Analyze task, research codebase, recommend approach
Task Planner	Break work into phases and discrete tasks with acceptance criteria
Risk Analyst	Identify technical risks, propose mitigation strategies

Execution model: Architect runs first, then task planner and risk analyst run in parallel (both depend on architect's analysis).

How It Works

The orchestrator handles wave-based execution, dependency tracking, container lifecycle, and result collection. Each role's file access is restricted by the gateway — agents can only read/write files within their permission scope. Handoff data (e.g., changed files from coder) is passed between waves via the EGG_HANDOFF_DATA environment variable.
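
To make the handoff mechanism concrete, here is a sketch of passing data through the `EGG_HANDOFF_DATA` environment variable. It assumes the payload is JSON, which this README does not state; the helper names and the `changed_files` key are hypothetical.

```python
import json
import os
from collections.abc import Mapping


def encode_handoff(data: dict) -> str:
    """Serialize handoff data for the next wave (assumes a JSON payload)."""
    return json.dumps(data, sort_keys=True)


def read_handoff(environ: Mapping[str, str] = os.environ) -> dict:
    """Read handoff data from the environment; empty dict when none was passed."""
    raw = environ.get("EGG_HANDOFF_DATA", "")
    return json.loads(raw) if raw else {}
```

A first-wave role such as the coder would produce the encoded string, and a later role would call `read_handoff()` at startup to learn which files changed.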

Starting a Pipeline

# Direct CLI (recommended)
egg-sdlc -r egg -i 123    # Issue mode — repo dir + issue number
egg-sdlc -r egg 123       # Short form (positional issue)
egg-sdlc --private -r myrepo -i 456  # Private mode (network lockdown)
egg-sdlc                  # Local mode — interactive prompt
egg-sdlc -r egg -p "Add user auth"  # Local mode — non-interactive with prompt

# Or from inside an egg session
egg
/sdlc -r egg -i 123       # Redirects to egg-sdlc

The pipeline creates a draft PR automatically when entering the implement phase. Once all checks pass, the PR is marked ready for human review and merge.

Restarting Failed Pipelines

If a pipeline fails, you can restart it from the failed phase using the orchestrator API:

curl -X POST http://localhost:9849/api/v1/pipelines/{pipeline-id}/start

This resets the failed phase to pending, clears error state, and resumes execution from that phase. Completed and cancelled pipelines cannot be restarted.
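
The same restart call can be made from a script. This sketch uses only the Python standard library and the endpoint shown in the curl command above; the function names are hypothetical, and since the response format is not documented here, only the HTTP status is returned.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

DEFAULT_ORCHESTRATOR_URL = "http://localhost:9849"  # port from the curl example above


def build_restart_request(pipeline_id: str, base_url: str = DEFAULT_ORCHESTRATOR_URL) -> Request:
    """Build the POST request that restarts a failed pipeline."""
    url = f"{base_url}/api/v1/pipelines/{quote(pipeline_id, safe='')}/start"
    return Request(url, method="POST")


def restart_pipeline(pipeline_id: str, base_url: str = DEFAULT_ORCHESTRATOR_URL) -> int:
    """Send the restart request and return the HTTP status code."""
    with urlopen(build_restart_request(pipeline_id, base_url), timeout=10) as response:
        return response.status
```

Note that `urlopen` raises `urllib.error.HTTPError` on 4xx and 5xx responses, so a restart attempt on a completed or cancelled pipeline would likely surface as an exception.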

Human-in-the-Loop Checkpoints

At each phase boundary (refine and plan), the pipeline pauses for human approval before proceeding. Interaction happens through checkbox-based UI in GitHub comments (issue mode) or terminal prompts (local mode). In local mode, the orchestrator also supports requesting changes to re-run a phase with feedback (limited by max_review_cycles, default 3).

Guidance: Provide additional context, adjust acceptance criteria, break into subtasks
Override: Mark complete, skip tasks, cancel pipeline
Manual: Complete manually, reassign
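
The local-mode change-request loop mentioned above can be sketched as a bounded retry. This is an illustration under assumptions: `run_phase` and `get_feedback` are hypothetical callbacks, the limit is interpreted as the total number of runs, and raising an error when it is exhausted is a guess at the behavior.

```python
from typing import Callable, Optional


def run_phase_with_review(
    run_phase: Callable[[Optional[str]], str],
    get_feedback: Callable[[str], Optional[str]],
    max_review_cycles: int = 3,  # default taken from the text above
) -> tuple[str, int]:
    """Re-run a phase with human feedback until approved or the limit is hit."""
    feedback: Optional[str] = None
    for cycle in range(1, max_review_cycles + 1):
        output = run_phase(feedback)
        feedback = get_feedback(output)  # None means the human approved
        if feedback is None:
            return output, cycle
    raise RuntimeError(f"not approved after {max_review_cycles} review cycles")
```

Bounding the loop keeps a phase from cycling indefinitely when the agent cannot satisfy the requested changes.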

Quick Start

# Clone and install
git clone https://github.com/jwbron/egg.git
cd egg
pip install -e ./sandbox

# Run egg — auto-setup prompts on first run
egg

Running egg starts the gateway and sandbox automatically. On first run it will prompt you to configure repositories and credentials via egg --setup. By default it launches in public mode (full internet access); use egg --private for network-locked private repo mode.

Use egg-sdlc to launch the full SDLC pipeline, or work interactively inside an egg session with individual agent modes (/coder-mode, /tester-mode, etc.).

See the Local Quickstart Guide for detailed setup instructions including PAT-based authentication.

Docker Compose (Advanced)

For managing the gateway stack separately:

# Initialize configuration
bin/egg-deploy init

# Edit .env with your credentials (GITHUB_USER_TOKEN, etc.)
vim .env

# Start the gateway
bin/egg-deploy up

# Start a sandbox session against the running gateway
egg --compose

See the Deployment Guide for deployment options.

CLI Reference

egg CLI

Command	Description
`egg`	Start interactive sandbox session (public mode, auto-setup on first run)
`egg --public`	Explicit public mode (full internet access, default)
`egg --private`	Private mode (Anthropic API + private GitHub repos only, network lockdown)
`egg --setup`	Run interactive setup wizard
`egg --reset`	Reset configuration and start over
`egg --exec <cmd>`	Execute command in ephemeral container
`egg --compose`	Start gateway via Docker Compose
`egg --compose --down`	Stop the Docker Compose stack (gateway + orchestrator)
`egg --compose --build`	Rebuild compose images before starting

egg-deploy CLI

For production/advanced deployments using Docker Compose:

Command	Description
`bin/egg-deploy init`	Initialize configuration files
`bin/egg-deploy up`	Start the gateway stack
`bin/egg-deploy down`	Stop the gateway stack
`bin/egg-deploy status`	Show container status and health
`bin/egg-deploy logs`	Follow gateway logs
`bin/egg-deploy build`	Rebuild Docker images

egg-status CLI

For monitoring all active SDLC pipelines in real-time:

Command	Description
`bin/egg-status`	Stream real-time status for all active pipelines
`bin/egg-status --once`	Show current snapshot and exit
`bin/egg-status --all`	Include completed/failed pipelines
`bin/egg-status --verbose`	Show full DAG instead of compact status
`bin/egg-status --ascii`	Use ASCII-only characters (no Unicode)
`bin/egg-status --port <port>`	Specify orchestrator port (default: 9849)

egg-pipeline-watch CLI

For watching a specific pipeline's progress with DAG visualization:

Command	Description
`bin/egg-pipeline-watch <pipeline-id>`	Stream DAG visualization for a specific pipeline
`bin/egg-pipeline-watch <pipeline-id> --compact`	Show compact single-line status instead of full DAG
`bin/egg-pipeline-watch <pipeline-id> --once`	Show current state and exit (no streaming)
`bin/egg-pipeline-watch <pipeline-id> --ascii`	Use ASCII-only characters (no Unicode)

egg-orch CLI

For programmatic interaction with the orchestrator API (usable by both agents and humans):

Command Group	Description
`egg-orch health`	Check orchestrator + gateway health
`egg-orch pipeline list/get/create/status/delete`	Pipeline management
`egg-orch signal complete/progress/error/heartbeat`	Send agent signals
`egg-orch phase get/advance/start/complete`	Phase transitions
`egg-orch decision list/create/resolve/status`	HITL decision queue
`egg-orch container list/spawn/get/stop/logs`	Container operations
`egg-orch gateway health/phase/permissions`	Gateway operations
`egg-orch env`	Show orchestrator environment variables

All commands support --json for machine-readable output. Run egg-orch <command> --help for detailed usage.

egg-checkpoint CLI

For querying agent session checkpoints across multi-agent pipelines:

Command	Description
`egg-checkpoint list [filters]`	List checkpoints with multi-dimensional filtering
`egg-checkpoint show <id-or-commit>`	Display full checkpoint details (transcript, tool calls, files touched)
`egg-checkpoint browse --issue <n>`	Filter checkpoints by issue number
`egg-checkpoint context [filters]`	Cross-agent context summary grouped by phase and agent type

Filters: --issue, --pr, --pipeline, --session, --branch, --trigger, --status, --agent-type, --phase, --limit, --json

See the Checkpoint Access Guide for detailed usage examples.

Flags

Flag	Description
`--private`	Enable private mode (Anthropic API + private GitHub repos only)
`--public`	Enable public mode (full internet access, default)
`--compose`	Use Docker Compose to manage the gateway stack
`--down`	Stop the Docker Compose stack (use with `--compose`)
`--build`	Rebuild compose images before starting (use with `--compose`)
`--multi-agent`	Enable multi-agent execution (wave-based parallel agents)
`--no-multi-agent`	Disable multi-agent execution (single-agent mode)
`--max-parallel <n>`	Maximum parallel agents per wave (default: 10)
`--exec <cmd>`	Execute command in new ephemeral container
`--timeout <min>`	Timeout for --exec commands (default: 30)
`--auth <method>`	Anthropic auth method for --exec: `oauth-token` (default) or `api-key`
`--rebuild`	Force rebuild Docker image
`--time`	Show startup timing breakdown for debugging
`-v, --verbose`	Show detailed output instead of progress bar

Documentation

SDLC Pipeline

SDLC Pipeline Guide — Operational guide, CLI commands, triggering
ADR: SDLC Pipeline — Architecture, threat model, security properties

Architecture

Documentation Index — Navigation hub for all docs
Architecture Overview — System design and components
Gateway Sidecar — Policy enforcement, API endpoints, credential injection
Sandbox Container — Agent environment, tools, wrappers
Project Structure — Directory layout and component map

Architecture Decision Records

SDLC Pipeline — Structurally enforced agent checkpoints
Git Isolation — Worktree isolation design
Credential Injection — Zero-credential sandbox design
Declarative Setup — Setup wizard architecture
Standardized Logging — Structured logging interface
All ADRs — Complete index (7 implemented, 3 in-progress)

Component Documentation

Shared Libraries — Config, logging, and git utilities
Configuration — Repository and host configuration

Guides

Local Quickstart — Get running locally with PAT authentication
Deployment Guide — Deployment options
Deploy Migration — Migrating from legacy deployments
Agent Development — Developing agent strategies
Agent Mode Design — When to use constraints vs. freedom

Other

Human-in-the-Loop Decisions — Decision workflow and checkbox UI
Agentic Feedback Loop — The foundational feedback loop
Why egg Works — Safety, quality, and collaboration
Contributing — Development setup and workflow

Versioning

egg uses semantic versioning for Docker images.

Docker Images

# Latest stable (updated on every release)
docker pull ghcr.io/jwbron/egg-sandbox:latest

# Major version (updated on v0.x.y releases)
docker pull ghcr.io/jwbron/egg-sandbox:v0

# Exact version
docker pull ghcr.io/jwbron/egg-sandbox:v0.1.0

Breaking Changes

v0.x.y: Pre-stable releases. Minor versions may contain breaking changes.
v1.x.y and later: Stable releases. Breaking changes only in major version bumps.

See RELEASING.md for the release process.

Development

make setup           # Set up development environment
make lint            # Run all linters
make test            # Run all tests
make test-integration # Run integration tests
make test-e2e        # Run end-to-end tests
make lint-fix        # Auto-fix lint issues
make build           # Build Docker images

Requires Python >= 3.11. See CONTRIBUTING.md for development guidelines.

License

MIT License — see LICENSE for details.
