Coding agents get a blank context window every session. AgentOps automates the feedback loop so each session starts smarter than the last — learnings extracted, quality-gated, and re-injected automatically.
DevOps' Three Ways: flow, feedback, continual learning. AgentOps applies them to the agent loop, then compounds memory between sessions.
The building blocks: primitives you can mix and match into a custom pipeline that fits your workflow.
- Flow: orchestration skills that move WIP through the system. Research → plan → validate → build → review → learn — single-piece flow, minimizing context switches.
- Feedback: shorten the feedback loop until defects can't survive it. Multi-model councils catch issues before code ships. Hooks make the rules unavoidable — validation gates, push blocking, regression auto-revert. Problems found Friday don't wait until Monday.
- Learning: stop rediscovering what you already know. Every session extracts learnings, scores them, and re-injects them at the next session start. Knowledge compounds instead of decaying. Session 50 knows what session 1 learned the hard way.
Use one piece. No pipeline required — every skill works standalone:
> /council validate this PR
[council] 3 judges spawned
[judge-1] PASS — JWT implementation correct
[judge-2] WARN — rate limiting missing on /login
[judge-3] PASS — refresh rotation implemented
Consensus: WARN — add rate limiting before shipping
Three weeks later, different session:
> /knowledge "rate limiting"
1. .agents/learnings/2026-01-28-rate-limiting.md
[established] Token bucket with Redis — chose over sliding window for burst tolerance
2. .agents/patterns/api-middleware.md
Pattern: rate limit at middleware layer, not per-handler
Your agent reads these automatically at session start — no CLI required, just skills + .agents/.
Wire it all together when you're ready:
> /rpi "add retry backoff to rate limiter"
[research] Found 2 prior learnings on rate limiting (injected)
[plan] 2 issues, 1 wave → epic ag-0058
[pre-mortem] 4 judges → PASS (knew about Redis choice from prior session)
[crank] Wave 1: ██ 2/2
[vibe] 3 judges → PASS
[post-mortem] 2 new learnings → .agents/
[flywheel] Next: /rpi "add circuit breaker to external API calls"
More examples — /crank, /evolve
Parallel agents with fresh context:
> /crank ag-0042
[crank] Epic: ag-0042 — 6 issues, 3 waves
[wave-1] ██████ 3/3 complete
[wave-2] ████── 2/2 complete
[wave-3] ██──── 1/1 complete
[vibe] PASS — all gates locked
[post-mortem] 4 learnings extracted
Goal-driven improvement loop:
> /evolve --max-cycles=5
[evolve] GOALS.yaml: 4 goals loaded
[cycle-1] Measuring fitness... 2/4 passing
Worst gap: test-pass-rate (weight: 10)
/rpi "Improve test-pass-rate" → 3 issues, 2 waves
Re-measure: 3/4 passing ✓
[cycle-2] Worst gap: doc-coverage (weight: 7)
/rpi "Improve doc-coverage" → 2 issues, 1 wave
Re-measure: 4/4 passing ✓
[cycle-3] All goals met. Checking harvested work...
Picked: "add smoke test for /evolve" (from post-mortem)
[teardown] /post-mortem → 5 learnings extracted
# Claude Code, Codex CLI, Cursor (most users)
npx skills@latest add boshu2/agentops --all -g
# OpenCode
curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install-opencode.sh | bash
Then type /quickstart in your agent chat.
# Claude Code plugin (alternative)
claude plugin add boshu2/agentops
Full setup: CLI + hooks (optional)
brew tap boshu2/agentops https://github.com/boshu2/homebrew-agentops && brew install agentops
cd /path/to/your/repo
ao init --hooks --full
12 hooks across 8 lifecycle event types. Adds knowledge injection/extraction, validation gates, session lifecycle. All skills work without it.
OpenCode — plugin + skills
Installs 7 hooks (tool enrichment, audit logging, compaction resilience) and symlinks all skills. Restart OpenCode after install. Details: .opencode/INSTALL.md
Your data — what it touches, where it lives, how to remove it
Local-only. No telemetry. No cloud. No accounts.
| What | Where | Reversible? |
|---|---|---|
| Skills | Global skills dir (outside your repo; for Claude Code: ~/.claude/skills/) | npx skills@latest remove boshu2/agentops -g |
| Knowledge artifacts | .agents/ in your repo (git-ignored by default) | rm -rf .agents/ |
| Hook registration | .claude/settings.json | ao hooks uninstall or delete entries |
| Git push gate | Pre-push hook (optional, only with CLI) | AGENTOPS_HOOKS_DISABLED=1 |
Nothing modifies your source code. Nothing phones home. Everything is open source — audit it yourself.
Troubleshooting: docs/troubleshooting.md
/quickstart ← Day 1: guided tour on your codebase (~10 min)
│
/council, /research, /vibe ← Week 1: use skills standalone
│
/rpi "goal" ← Week 2: full lifecycle — research → ship → learn
│
/product → /goals generate ← Define what good looks like
│
/evolve ← Ongoing: measure goals, fix gaps, compound
Start with /quickstart. Use individual skills when you need them. Graduate to /rpi for end-to-end. When you're ready for hands-free improvement: /product defines your mission and personas, /goals generate scans for fitness goals, and /evolve pursues them.
Standard iterative development — research, plan, validate, build, review, learn — automated for agents that can't carry context between sessions.
This is DevOps thinking applied to agent work: the Three Ways as composable primitives.
- Flow: wave-based execution (/crank) + workflow orchestration (/rpi) to keep work moving.
- Feedback: shift-left validation (/pre-mortem, /vibe, /council) plus optional gates/hooks to make feedback unavoidable.
- Continual learning: post-mortems turn outcomes into reusable knowledge in .agents/, so the next session starts smarter.
/rpi "goal"
│
├── /research → /plan → /pre-mortem → /crank → /vibe
│
▼
/post-mortem
├── validates what shipped
├── extracts learnings → .agents/
└── suggests next /rpi command ────┐
│
/rpi "next goal" ◄──────────────────┘
The post-mortem analyzes each learning, asks "what process would this improve?", and writes improvement proposals. It hands you a ready-to-copy /rpi command. Paste it, walk away.
Learnings pass quality gates (specificity, actionability, novelty) and land in tiered pools. Freshness decay ensures recent insights outweigh stale patterns. The formal model is straightforward: if retrieval rate × usage rate exceeds decay rate, knowledge compounds. If not, it decays to zero.
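One way to read the compounding condition above is as a per-session growth factor. The sketch below is an illustration only, not AgentOps's scoring code: the function name `project_knowledge`, the multiplicative update rule, and the sample rates are all assumptions chosen to show the threshold behavior.

```python
def project_knowledge(initial: float, retrieval_rate: float,
                      usage_rate: float, decay_rate: float,
                      sessions: int) -> float:
    """Project a knowledge stock forward, one step per session.

    Assumed model (not taken from the AgentOps source): each session the
    stock grows by retrieval_rate * usage_rate and shrinks by decay_rate.
    """
    net_growth = retrieval_rate * usage_rate - decay_rate
    stock = initial
    for _ in range(sessions):
        # Clamp at zero: a knowledge stock cannot go negative.
        stock = max(0.0, stock * (1.0 + net_growth))
    return stock


# Retrieval x usage (0.30) exceeds decay (0.10): the stock compounds.
compounding = project_knowledge(100.0, 0.6, 0.5, 0.10, sessions=50)
# Retrieval x usage (0.04) is below decay (0.10): the stock shrinks toward zero.
decaying = project_knowledge(100.0, 0.2, 0.2, 0.10, sessions=50)
print(compounding > 100.0, decaying < 100.0)
```

Under this reading, the sign of `retrieval_rate * usage_rate - decay_rate` decides the outcome: the same starting stock either grows without bound or fades, which is why the quality gates and freshness weighting matter more than the size of any single learning.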
Phase details — what each step does
- /research: Explores your codebase. Produces a research artifact with findings and recommendations.
- /plan: Decomposes the goal into issues with dependency waves. Derives scope boundaries and conformance checks. Creates a beads epic (git-native issue tracking).
- /pre-mortem: Judges simulate failures before you write code, including a spec-completeness judge. FAIL? Re-plan with feedback (max 3 retries).
- /crank: Spawns parallel agents in dependency-ordered waves. Each worker gets fresh context. Lead validates and commits. Runs until every issue is closed. --test-first for spec-first TDD.
- /vibe: Judges validate the code. FAIL? Re-crank with failure context and re-vibe (max 3).
- /post-mortem: Council validates the implementation. Retro extracts learnings. Suggests the next /rpi command.
/rpi "goal" runs all six end to end. Use --interactive for human gates at research and plan.
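The dependency-ordered waves that /plan produces and /crank executes can be pictured as a layered topological sort. This is a minimal sketch under that assumption; `plan_waves` and the issue ids are hypothetical and do not reflect how AgentOps or beads store issues internally.

```python
def plan_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group issues into waves so each issue runs after its dependencies.

    `dependencies` maps an issue id to the ids it depends on. Issues that
    share a wave do not depend on each other, so they could run in parallel.
    """
    remaining = {issue: set(deps) for issue, deps in dependencies.items()}
    unknown = set().union(*remaining.values()) - set(remaining)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")

    waves: list[list[str]] = []
    done: set[str] = set()
    while remaining:
        # An issue is ready once everything it depends on has finished.
        ready = sorted(i for i, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError(f"dependency cycle among: {sorted(remaining)}")
        waves.append(ready)
        done.update(ready)
        for issue in ready:
            del remaining[issue]
    return waves


# Hypothetical epic: ag-3 needs ag-1; ag-4 needs ag-2 and ag-3.
issues = {"ag-1": set(), "ag-2": set(), "ag-3": {"ag-1"}, "ag-4": {"ag-2", "ag-3"}}
print(plan_waves(issues))  # [['ag-1', 'ag-2'], ['ag-3'], ['ag-4']]
```

Each inner list corresponds to one wave: its issues have no unmet dependencies, so separate workers can take them at the same time before the next wave starts.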
Phased RPI — fresh context per phase for larger goals
ao rpi phased "goal" runs each phase in its own session — no context bleed between phases.
ao rpi phased "add rate limiting" # Hands-free, fresh context per phase
ao rpi phased "add auth" & # Run multiple in parallel (auto-worktrees)
ao rpi phased --from=crank "fix perf" # Resume from any phase
Use /rpi when context fits in one session. Use ao rpi phased when it doesn't.
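Resuming with --from can be thought of as slicing the ordered phase list. The sketch below assumes the flag simply skips every phase before the named one. Only `crank` appears as a flag value in this README, so the other phase identifiers are assumed spellings, and `phases_from` is a hypothetical helper rather than part of the ao CLI.

```python
# Phase order follows the /rpi description; identifiers other than "crank"
# are assumed spellings, not confirmed flag values.
PHASES = ["research", "plan", "pre-mortem", "crank", "vibe", "post-mortem"]


def phases_from(start: str = "research") -> list[str]:
    """Return the phases that would still run when resuming at `start`."""
    if start not in PHASES:
        raise ValueError(f"unknown phase {start!r}; expected one of {PHASES}")
    return PHASES[PHASES.index(start):]


print(phases_from("crank"))  # ['crank', 'vibe', 'post-mortem']
```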
Goal-driven mode — /evolve with GOALS.yaml
Bootstrap with /goals generate — it scans your repo (PRODUCT.md, README, skills, tests) and proposes mechanically verifiable goals. Or write them by hand:
# GOALS.yaml
version: 1
goals:
  - id: test-pass-rate
    description: "All tests pass"
    check: "make test"
    weight: 10

Then /evolve measures them, picks the worst gap, runs /rpi to fix it, re-measures ALL goals (regressed commits auto-revert), and loops. It commits locally; you control when to push. Kill switch: echo "stop" > ~/.config/evolve/KILL
Maintain over time: /goals shows pass/fail status, /goals prune finds stale or broken checks.
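The measure, fix, re-measure cycle described above can be sketched as a small loop. This is an illustration under stated assumptions, not the /evolve implementation: the `evolve` function and its `check`, `improve`, and `revert` callables are hypothetical stand-ins for running a goal's check command, invoking /rpi, and reverting a commit. It also assumes the "worst gap" is the failing goal with the highest weight, and it omits the kill switch.

```python
from typing import Callable

Goal = dict  # expects keys "id" (str) and "weight" (number)


def evolve(goals: list[Goal],
           check: Callable[[Goal], bool],
           improve: Callable[[Goal], None],
           revert: Callable[[], None],
           max_cycles: int = 5) -> list[str]:
    """Run a measure -> fix worst gap -> re-measure loop; return an event log."""
    log = []
    for cycle in range(1, max_cycles + 1):
        before = {g["id"]: check(g) for g in goals}
        failing = [g for g in goals if not before[g["id"]]]
        if not failing:
            log.append(f"cycle {cycle}: all goals met")
            break
        # Assumption: the worst gap is the failing goal with the highest weight.
        worst = max(failing, key=lambda g: g["weight"])
        improve(worst)
        # Re-measure every goal, not only the one that was worked on.
        after = {g["id"]: check(g) for g in goals}
        regressed = [gid for gid, ok in before.items() if ok and not after[gid]]
        if regressed:
            revert()
            log.append(f"cycle {cycle}: reverted, regressed {regressed}")
        else:
            log.append(f"cycle {cycle}: worked on {worst['id']}")
    return log


# In-memory stand-ins for real checks and fixes.
state = {"test-pass-rate": False, "doc-coverage": False, "lint": True}
goals = [{"id": "test-pass-rate", "weight": 10},
         {"id": "doc-coverage", "weight": 7},
         {"id": "lint", "weight": 3}]
for line in evolve(goals,
                   check=lambda g: state[g["id"]],
                   improve=lambda g: state.__setitem__(g["id"], True),
                   revert=lambda: None):
    print(line)
```

Re-measuring all goals after each fix is what lets the loop detect a regression in a goal it was not working on and undo the change before moving to the next gap.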
References — science, systems theory, prior art
Built on Darr 1995 (decay rates), Sweller 1988 (cognitive load), Liu et al. 2023 (lost-in-the-middle), MemRL 2025 (RL for memory).
AgentOps concentrates on the high-leverage end of Meadows' hierarchy: information flows (#6), rules (#5), self-organization (#4), goals (#3). The bet: changing the loop beats tuning the output.
Deep dive: docs/how-it-works.md — Brownian Ratchet, Ralph Loops, agent backends, hooks, context windowing.
43 skills: 33 user-facing, 10 internal (fire automatically). Each level composes the ones below it.
| Scope | Skill | What it does |
|---|---|---|
| Single review | /council | Multiple judges (Claude + Codex) debate, surface disagreement, converge on a verdict. Customize with --preset=security-audit, --perspectives="a,b,c", or --perspectives-file |
| Single issue | /implement | Full lifecycle for one task: research, plan, build, validate, learn |
| Multi-issue waves | /crank | Parallel agents in dependency-ordered waves with fresh context per worker |
| Full lifecycle | /rpi | Research → Plan → Pre-mortem → Crank → Vibe → Post-mortem, in one command |
| Hands-free loop | /evolve | Measures fitness goals, picks the worst gap, ships a fix, rolls back regressions, repeats |
Supporting skills: /research, /plan, /vibe, /pre-mortem, /post-mortem, /product, /goals, /readme, /status, /quickstart, /bug-hunt, /doc, /release, /knowledge, /handoff
Full reference: docs/SKILLS.md
Cross-runtime orchestration — mix Claude, Codex, OpenCode
AgentOps orchestrates across runtimes. Claude can lead a team of Codex workers. Codex judges can review Claude's output.
| Spawning Backend | How it works | Best for |
|---|---|---|
| Native teams | TeamCreate + SendMessage, built into Claude Code | Tight coordination, debate |
| Background tasks | Task(run_in_background=true), last-resort fallback | When no team APIs available |
| Codex sub-agents | /codex-team: Claude orchestrates Codex workers | Cross-vendor validation |
| tmux + Agent Mail | /swarm --mode=distributed: full process isolation | Long-running work, crash recovery |
These are fellow experiments in making coding agents work. Use pieces from any of them.
| Alternative | What it does well | Where AgentOps focuses differently |
|---|---|---|
| GSD | Clean subagent spawning, fights context rot | Cross-session memory (GSD keeps context fresh within a session; AgentOps carries knowledge between sessions) |
| Compound Engineer | Knowledge compounding, structured loop | Multi-model councils and validation gates — independent judges debating before and after code ships |
Optional. The CLI is plumbing — skills and hooks call it automatically. Install via the Full setup section above.
The killer feature: run the full lifecycle from your terminal — no chat session required:
ao rpi phased "add rate limiting" # Spawns Claude, runs research → plan → ship → learn
ao rpi phased "fix auth bug" & # Run multiple in parallel (auto-worktrees)
ao rpi phased --from=crank ag-058 # Resume from any phase
Each phase gets its own fresh context window. Walk away, come back to committed code + extracted learnings.
Other commands you'll use:
ao search "query" # Search knowledge across files and chat history
ao demo # Interactive demo
Full reference: CLI Commands
docs/FAQ.md — comparisons, limitations, subagent nesting, PRODUCT.md, uninstall.
Built on — Ralph Wiggum, Multiclaude, beads, CASS, MemRL
Ralph Wiggum (fresh context per agent) · Multiclaude (validation gates) · beads (git-native issues) · CASS (session search) · MemRL (cross-session memory)
Issue tracking — Beads / bd
Git-native issues in .beads/. bd onboard (setup) · bd ready (find work) · bd show <id> · bd close <id> · bd sync. More: AGENTS.md
See CONTRIBUTING.md. If AgentOps helped you ship something, post in Discussions.
Apache-2.0 · Docs · How It Works · FAQ · Glossary · Architecture · CLI Reference · Changelog