Skip to content

The missing DevOps layer for coding agents. Flow, feedback, and memory that compounds between sessions.

License

Notifications You must be signed in to change notification settings

boshu2/agentops

Context engineering — crafting what enters the window

AgentOps

Coding agents forget everything between sessions. This fixes that.

Version Skills License

How It Works · See It Work · Install · The Path · Skills · Deep Dive · FAQ


How It Works

Coding agents get a blank context window every session. AgentOps automates the feedback loop so each session starts smarter than the last — learnings extracted, quality-gated, and re-injected automatically.

DevOps' Three Ways: flow, feedback, continual learning. AgentOps applies them to the agent loop, then compounds memory between sessions.

The building blocks: primitives you can mix and match into a custom pipeline that fits your workflow.

  • Flow: orchestration skills that move WIP through the system. Research → plan → validate → build → review → learn — single-piece flow, minimizing context switches.
  • Feedback: shorten the feedback loop until defects can't survive it. Multi-model councils catch issues before code ships. Hooks make the rules unavoidable — validation gates, push blocking, regression auto-revert. Problems found Friday don't wait until Monday.
  • Learning: stop rediscovering what you already know. Every session extracts learnings, scores them, and re-injects them at the next session start. Knowledge compounds instead of decaying. Session 50 knows what session 1 learned the hard way.

See It Work

Use one piece. No pipeline required — every skill works standalone:

> /council validate this PR

[council] 3 judges spawned
[judge-1] PASS — JWT implementation correct
[judge-2] WARN — rate limiting missing on /login
[judge-3] PASS — refresh rotation implemented
Consensus: WARN — add rate limiting before shipping

Three weeks later, different session:

> /knowledge "rate limiting"

1. .agents/learnings/2026-01-28-rate-limiting.md
   [established] Token bucket with Redis — chose over sliding window for burst tolerance
2. .agents/patterns/api-middleware.md
   Pattern: rate limit at middleware layer, not per-handler

Your agent reads these automatically at session start — no CLI required, just skills + .agents/.

Wire it all together when you're ready:

> /rpi "add retry backoff to rate limiter"

[research]    Found 2 prior learnings on rate limiting (injected)
[plan]        2 issues, 1 wave → epic ag-0058
[pre-mortem]  4 judges → PASS (knew about Redis choice from prior session)
[crank]       Wave 1: ██ 2/2
[vibe]        3 judges → PASS
[post-mortem] 2 new learnings → .agents/
[flywheel]    Next: /rpi "add circuit breaker to external API calls"
More examples — /crank, /evolve

Parallel agents with fresh context:

> /crank ag-0042

[crank] Epic: ag-0042 — 6 issues, 3 waves
[wave-1] ██████ 3/3 complete
[wave-2] ████── 2/2 complete
[wave-3] ██──── 1/1 complete
[vibe] PASS — all gates locked
[post-mortem] 4 learnings extracted

Goal-driven improvement loop:

> /evolve --max-cycles=5

[evolve] GOALS.yaml: 4 goals loaded
[cycle-1] Measuring fitness... 2/4 passing
         Worst gap: test-pass-rate (weight: 10)
         /rpi "Improve test-pass-rate" → 3 issues, 2 waves
         Re-measure: 3/4 passing ✓
[cycle-2] Worst gap: doc-coverage (weight: 7)
         /rpi "Improve doc-coverage" → 2 issues, 1 wave
         Re-measure: 4/4 passing ✓
[cycle-3] All goals met. Checking harvested work...
         Picked: "add smoke test for /evolve" (from post-mortem)
[teardown] /post-mortem → 5 learnings extracted

Install

# Claude Code, Codex CLI, Cursor (most users)
npx skills@latest add boshu2/agentops --all -g

# OpenCode
curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install-opencode.sh | bash

Then type /quickstart in your agent chat.

# Claude Code plugin (alternative)
claude plugin add boshu2/agentops
Full setup — CLI + hooks (optional)
brew tap boshu2/agentops https://github.com/boshu2/homebrew-agentops && brew install agentops
cd /path/to/your/repo
ao init --hooks --full

12 hooks across 8 lifecycle event types. Adds knowledge injection/extraction, validation gates, session lifecycle. All skills work without it.

OpenCode — plugin + skills

Installs 7 hooks (tool enrichment, audit logging, compaction resilience) and symlinks all skills. Restart OpenCode after install. Details: .opencode/INSTALL.md

Your data — what it touches, where it lives, how to remove it

Local-only. No telemetry. No cloud. No accounts.

What Where Reversible?
Skills Global skills dir (outside your repo; for Claude Code: ~/.claude/skills/) npx skills@latest remove boshu2/agentops -g
Knowledge artifacts .agents/ in your repo (git-ignored by default) rm -rf .agents/
Hook registration .claude/settings.json ao hooks uninstall or delete entries
Git push gate Pre-push hook (optional, only with CLI) AGENTOPS_HOOKS_DISABLED=1

Nothing modifies your source code. Nothing phones home. Everything is open source — audit it yourself.

Troubleshooting: docs/troubleshooting.md


The Path

/quickstart                          ← Day 1: guided tour on your codebase (~10 min)
    │
/council, /research, /vibe           ← Week 1: use skills standalone
    │
/rpi "goal"                          ← Week 2: full lifecycle — research → ship → learn
    │
/product → /goals generate           ← Define what good looks like
    │
/evolve                              ← Ongoing: measure goals, fix gaps, compound

Start with /quickstart. Use individual skills when you need them. Graduate to /rpi for end-to-end. When you're ready for hands-free improvement: /product defines your mission and personas, /goals generate scans for fitness goals, and /evolve pursues them.


Deep Dive

Standard iterative development — research, plan, validate, build, review, learn — automated for agents that can't carry context between sessions.

This is DevOps thinking applied to agent work: the Three Ways as composable primitives.

  • Flow: wave-based execution (/crank) + workflow orchestration (/rpi) to keep work moving.
  • Feedback: shift-left validation (/pre-mortem, /vibe, /council) plus optional gates/hooks to make feedback unavoidable.
  • Continual learning: post-mortems turn outcomes into reusable knowledge in .agents/, so the next session starts smarter.
  /rpi "goal"
    │
    ├── /research → /plan → /pre-mortem → /crank → /vibe
    │
    ▼
  /post-mortem
    ├── validates what shipped
    ├── extracts learnings → .agents/
    └── suggests next /rpi command ────┐
                                       │
   /rpi "next goal" ◄──────────────────┘

The post-mortem analyzes each learning, asks "what process would this improve?", and writes improvement proposals. It hands you a ready-to-copy /rpi command. Paste it, walk away.

Learnings pass quality gates (specificity, actionability, novelty) and land in tiered pools. Freshness decay ensures recent insights outweigh stale patterns. The formal model is straightforward: if retrieval rate × usage rate exceeds decay rate, knowledge compounds. If not, it decays to zero.

Phase details — what each step does
  1. /research — Explores your codebase. Produces a research artifact with findings and recommendations.

  2. /plan — Decomposes the goal into issues with dependency waves. Derives scope boundaries and conformance checks. Creates a beads epic (git-native issue tracking).

  3. /pre-mortem — Judges simulate failures before you write code, including a spec-completeness judge. FAIL? Re-plan with feedback (max 3 retries).

  4. /crank — Spawns parallel agents in dependency-ordered waves. Each worker gets fresh context. Lead validates and commits. Runs until every issue is closed. --test-first for spec-first TDD.

  5. /vibe — Judges validate the code. FAIL? Re-crank with failure context and re-vibe (max 3).

  6. /post-mortem — Council validates the implementation. Retro extracts learnings. Suggests the next /rpi command.

/rpi "goal" runs all six end to end. Use --interactive for human gates at research and plan.

Phased RPI — fresh context per phase for larger goals

ao rpi phased "goal" runs each phase in its own session — no context bleed between phases.

ao rpi phased "add rate limiting"      # Hands-free, fresh context per phase
ao rpi phased "add auth" &             # Run multiple in parallel (auto-worktrees)
ao rpi phased --from=crank "fix perf"  # Resume from any phase

Use /rpi when context fits in one session. Use ao rpi phased when it doesn't.

Goal-driven mode — /evolve with GOALS.yaml

Bootstrap with /goals generate — it scans your repo (PRODUCT.md, README, skills, tests) and proposes mechanically verifiable goals. Or write them by hand:

# GOALS.yaml
version: 1
goals:
  - id: test-pass-rate
    description: "All tests pass"
    check: "make test"
    weight: 10

Then /evolve measures them, picks the worst gap, runs /rpi to fix it, re-measures ALL goals (regressed commits auto-revert), and loops. It commits locally — you control when to push. Kill switch: echo "stop" > ~/.config/evolve/KILL

Maintain over time: /goals shows pass/fail status, /goals prune finds stale or broken checks.

References — science, systems theory, prior art

Built on Darr 1995 (decay rates), Sweller 1988 (cognitive load), Liu et al. 2023 (lost-in-the-middle), MemRL 2025 (RL for memory).

AgentOps concentrates on the high-leverage end of Meadows' hierarchy: information flows (#6), rules (#5), self-organization (#4), goals (#3). The bet: changing the loop beats tuning the output.

Deep dive: docs/how-it-works.md — Brownian Ratchet, Ralph Loops, agent backends, hooks, context windowing.


Skills

43 skills: 33 user-facing, 10 internal (fire automatically). Each level composes the ones below it.

Scope Skill What it does
Single review /council Multiple judges (Claude + Codex) debate, surface disagreement, converge on a verdict. Customize with --preset=security-audit, --perspectives="a,b,c", or --perspectives-file
Single issue /implement Full lifecycle for one task — research, plan, build, validate, learn
Multi-issue waves /crank Parallel agents in dependency-ordered waves with fresh context per worker
Full lifecycle /rpi Research → Plan → Pre-mortem → Crank → Vibe → Post-mortem — one command
Hands-free loop /evolve Measures fitness goals, picks the worst gap, ships a fix, rolls back regressions, repeats

Supporting skills: /research, /plan, /vibe, /pre-mortem, /post-mortem, /product, /goals, /readme, /status, /quickstart, /bug-hunt, /doc, /release, /knowledge, /handoff

Full reference: docs/SKILLS.md

Cross-runtime orchestration — mix Claude, Codex, OpenCode

AgentOps orchestrates across runtimes. Claude can lead a team of Codex workers. Codex judges can review Claude's output.

Spawning Backend How it works Best for
Native teams TeamCreate + SendMessage — built into Claude Code Tight coordination, debate
Background tasks Task(run_in_background=true) — last-resort fallback When no team APIs available
Codex sub-agents /codex-team — Claude orchestrates Codex workers Cross-vendor validation
tmux + Agent Mail /swarm --mode=distributed — full process isolation Long-running work, crash recovery

How AgentOps Fits With Other Tools

These are fellow experiments in making coding agents work. Use pieces from any of them.

Alternative What it does well Where AgentOps focuses differently
GSD Clean subagent spawning, fights context rot Cross-session memory (GSD keeps context fresh within a session; AgentOps carries knowledge between sessions)
Compound Engineer Knowledge compounding, structured loop Multi-model councils and validation gates — independent judges debating before and after code ships

Detailed comparisons →


The ao CLI

Optional. The CLI is plumbing — skills and hooks call it automatically. Install via the Full setup section above.

The killer feature: run the full lifecycle from your terminal — no chat session required:

ao rpi phased "add rate limiting"   # Spawns Claude, runs research → plan → ship → learn
ao rpi phased "fix auth bug" &      # Run multiple in parallel (auto-worktrees)
ao rpi phased --from=crank ag-058   # Resume from any phase

Each phase gets its own fresh context window. Walk away, come back to committed code + extracted learnings.

Other commands you'll use:

ao search "query"      # Search knowledge across files and chat history
ao demo                # Interactive demo

Full reference: CLI Commands


FAQ

docs/FAQ.md — comparisons, limitations, subagent nesting, PRODUCT.md, uninstall.


Built on — Ralph Wiggum, Multiclaude, beads, CASS, MemRL

Ralph Wiggum (fresh context per agent) · Multiclaude (validation gates) · beads (git-native issues) · CASS (session search) · MemRL (cross-session memory)

Contributing

Issue tracking — Beads / bd

Git-native issues in .beads/. bd onboard (setup) · bd ready (find work) · bd show <id> · bd close <id> · bd sync. More: AGENTS.md

See CONTRIBUTING.md. If AgentOps helped you ship something, post in Discussions.

License

Apache-2.0 · Docs · How It Works · FAQ · Glossary · Architecture · CLI Reference · Changelog