AgentOps

Context engineering — crafting what enters the window

Coding agents forget everything between sessions. This fixes that.

How It Works · See It Work · Install · The Path · Skills · Deep Dive · FAQ

How It Works

Coding agents get a blank context window every session. AgentOps automates the feedback loop so each session starts smarter than the last — learnings extracted, quality-gated, and re-injected automatically.

DevOps' Three Ways: flow, feedback, continual learning. AgentOps applies them to the agent loop, then compounds memory between sessions.

The building blocks: primitives you can mix and match into a custom pipeline that fits your workflow.

Flow: orchestration skills that move WIP through the system. Research → plan → validate → build → review → learn — single-piece flow, minimizing context switches.
Feedback: shorten the feedback loop until defects can't survive it. Multi-model councils catch issues before code ships. Hooks make the rules unavoidable — validation gates, push blocking, regression auto-revert. Problems found Friday don't wait until Monday.
Learning: stop rediscovering what you already know. Every session extracts learnings, scores them, and re-injects them at the next session start. Knowledge compounds instead of decaying. Session 50 knows what session 1 learned the hard way.

See It Work

Use one piece. No pipeline required — every skill works standalone:

> /council validate this PR

[council] 3 judges spawned
[judge-1] PASS — JWT implementation correct
[judge-2] WARN — rate limiting missing on /login
[judge-3] PASS — refresh rotation implemented
Consensus: WARN — add rate limiting before shipping

Three weeks later, different session:

> /knowledge "rate limiting"

1. .agents/learnings/2026-01-28-rate-limiting.md
   [established] Token bucket with Redis — chose over sliding window for burst tolerance
2. .agents/patterns/api-middleware.md
   Pattern: rate limit at middleware layer, not per-handler

Your agent reads these automatically at session start — no CLI required, just skills + .agents/.
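
As a rough illustration of what "reads these automatically" could involve, the sketch below scans a `.agents/` directory for Markdown notes and orders them newest first, using the date prefix seen in the example filename above. The helper names `file_date` and `collect_learnings`, the `limit` parameter, and the rule that undated files sort last are assumptions for illustration, not the AgentOps implementation.

```python
import re
from datetime import date
from pathlib import Path

# Matches the YYYY-MM-DD prefix seen in names like 2026-01-28-rate-limiting.md
DATE_PREFIX = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-")


def file_date(path: Path) -> date:
    """Return the date encoded in the filename, or date.min if there is none."""
    match = DATE_PREFIX.match(path.name)
    if not match:
        return date.min  # undated files (e.g. patterns) sort last
    try:
        return date(*(int(part) for part in match.groups()))
    except ValueError:
        return date.min  # prefix looked like a date but was not a valid one


def collect_learnings(root: Path, limit: int = 5) -> list[Path]:
    """Hypothetical helper: newest-first list of Markdown notes under root."""
    notes = sorted(
        root.rglob("*.md"),
        key=lambda p: (file_date(p), p.name),
        reverse=True,
    )
    return notes[:limit]
```

Calling `collect_learnings(Path(".agents"))` would return up to five note paths, most recent first, which a session-start step could then load into context.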

Wire it all together when you're ready:

> /rpi "add retry backoff to rate limiter"

[research]    Found 2 prior learnings on rate limiting (injected)
[plan]        2 issues, 1 wave → epic ag-0058
[pre-mortem]  4 judges → PASS (knew about Redis choice from prior session)
[crank]       Wave 1: ██ 2/2
[vibe]        3 judges → PASS
[post-mortem] 2 new learnings → .agents/
[flywheel]    Next: /rpi "add circuit breaker to external API calls"

More examples — /crank, /evolve

Parallel agents with fresh context:

> /crank ag-0042

[crank] Epic: ag-0042 — 6 issues, 3 waves
[wave-1] ██████ 3/3 complete
[wave-2] ████── 2/2 complete
[wave-3] ██──── 1/1 complete
[vibe] PASS — all gates locked
[post-mortem] 4 learnings extracted

Goal-driven improvement loop:

> /evolve --max-cycles=5

[evolve] GOALS.yaml: 4 goals loaded
[cycle-1] Measuring fitness... 2/4 passing
         Worst gap: test-pass-rate (weight: 10)
         /rpi "Improve test-pass-rate" → 3 issues, 2 waves
         Re-measure: 3/4 passing ✓
[cycle-2] Worst gap: doc-coverage (weight: 7)
         /rpi "Improve doc-coverage" → 2 issues, 1 wave
         Re-measure: 4/4 passing ✓
[cycle-3] All goals met. Checking harvested work...
         Picked: "add smoke test for /evolve" (from post-mortem)
[teardown] /post-mortem → 5 learnings extracted

Install

# Claude Code, Codex CLI, Cursor (most users)
npx skills@latest add boshu2/agentops --all -g

# OpenCode
curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install-opencode.sh | bash

Then type /quickstart in your agent chat.

# Claude Code plugin (alternative)
claude plugin add boshu2/agentops

Full setup — CLI + hooks (optional)

brew tap boshu2/agentops https://github.com/boshu2/homebrew-agentops && brew install agentops
cd /path/to/your/repo
ao init --hooks --full

The full setup registers 12 hooks across 8 lifecycle event types, adding knowledge injection/extraction, validation gates, and session lifecycle handling. All skills work without this setup.

OpenCode — plugin + skills

Installs 7 hooks (tool enrichment, audit logging, compaction resilience) and symlinks all skills. Restart OpenCode after install. Details: .opencode/INSTALL.md

Your data — what it touches, where it lives, how to remove it

Local-only. No telemetry. No cloud. No accounts.

| What | Where | Reversible? |
| --- | --- | --- |
| Skills | Global skills dir (outside your repo; for Claude Code: `~/.claude/skills/`) | `npx skills@latest remove boshu2/agentops -g` |
| Knowledge artifacts | `.agents/` in your repo (git-ignored by default) | `rm -rf .agents/` |
| Hook registration | `.claude/settings.json` | `ao hooks uninstall` or delete entries |
| Git push gate | Pre-push hook (optional, only with CLI) | `AGENTOPS_HOOKS_DISABLED=1` |

Nothing modifies your source code. Nothing phones home. Everything is open source — audit it yourself.

Troubleshooting: docs/troubleshooting.md

The Path

/quickstart                          ← Day 1: guided tour on your codebase (~10 min)
    │
/council, /research, /vibe           ← Week 1: use skills standalone
    │
/rpi "goal"                          ← Week 2: full lifecycle — research → ship → learn
    │
/product → /goals generate           ← Define what good looks like
    │
/evolve                              ← Ongoing: measure goals, fix gaps, compound

Start with /quickstart. Use individual skills when you need them. Graduate to /rpi for end-to-end. When you're ready for hands-free improvement: /product defines your mission and personas, /goals generate scans for fitness goals, and /evolve pursues them.

Deep Dive

Standard iterative development — research, plan, validate, build, review, learn — automated for agents that can't carry context between sessions.

This is DevOps thinking applied to agent work: the Three Ways as composable primitives.

Flow: wave-based execution (/crank) + workflow orchestration (/rpi) to keep work moving.
Feedback: shift-left validation (/pre-mortem, /vibe, /council) plus optional gates/hooks to make feedback unavoidable.
Continual learning: post-mortems turn outcomes into reusable knowledge in .agents/, so the next session starts smarter.

  /rpi "goal"
    │
    ├── /research → /plan → /pre-mortem → /crank → /vibe
    │
    ▼
  /post-mortem
    ├── validates what shipped
    ├── extracts learnings → .agents/
    └── suggests next /rpi command ────┐
                                       │
   /rpi "next goal" ◄──────────────────┘

The post-mortem analyzes each learning, asks "what process would this improve?", and writes improvement proposals. It hands you a ready-to-copy /rpi command. Paste it, walk away.

Learnings pass quality gates (specificity, actionability, novelty) and land in tiered pools. Freshness decay ensures recent insights outweigh stale patterns. The formal model is straightforward: if retrieval rate × usage rate exceeds decay rate, knowledge compounds. If not, it decays to zero.
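
One way to read that condition is as a per-session growth factor. The sketch below assumes a simple multiplicative model in which the level of useful knowledge changes by (retrieval rate × usage rate − decay rate) each session. The function name, its parameters, the example rates, and the model form itself are illustrative assumptions, not the project's actual formula.

```python
def simulate_knowledge(retrieval: float, usage: float, decay: float,
                       sessions: int, start: float = 1.0) -> list[float]:
    """Track a knowledge level over sessions under an assumed multiplicative model.

    Each session the level is multiplied by 1 + retrieval * usage - decay,
    floored at zero so the level never goes negative.
    """
    growth = 1.0 + retrieval * usage - decay
    levels = [start]
    for _ in range(sessions):
        levels.append(max(0.0, levels[-1] * growth))
    return levels


# retrieval * usage = 0.30 exceeds decay = 0.20, so the level grows.
compounding = simulate_knowledge(retrieval=0.6, usage=0.5, decay=0.2, sessions=50)
# retrieval * usage = 0.15 is below decay = 0.20, so the level shrinks toward zero.
decaying = simulate_knowledge(retrieval=0.3, usage=0.5, decay=0.2, sessions=50)

print(compounding[-1] > compounding[0])  # True
print(decaying[-1] < decaying[0])        # True
```

Under this reading, improving either retrieval or usage has the same effect on the product, which is why both the injection step and the quality of what gets injected matter.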

Phase details — what each step does

/research — Explores your codebase. Produces a research artifact with findings and recommendations.
/plan — Decomposes the goal into issues with dependency waves. Derives scope boundaries and conformance checks. Creates a beads epic (git-native issue tracking).
/pre-mortem — Judges simulate failures before you write code, including a spec-completeness judge. FAIL? Re-plan with feedback (max 3 retries).
/crank — Spawns parallel agents in dependency-ordered waves. Each worker gets fresh context. Lead validates and commits. Runs until every issue is closed. --test-first for spec-first TDD.
/vibe — Judges validate the code. FAIL? Re-crank with failure context and re-vibe (max 3).
/post-mortem — Council validates the implementation. Retro extracts learnings. Suggests the next /rpi command.

/rpi "goal" runs all six end to end. Use --interactive for human gates at research and plan.
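
The "dependency waves" that /plan produces and /crank executes can be pictured as layers of a dependency graph: every issue in a wave depends only on issues from earlier waves, so the issues within one wave can run in parallel. The following is a minimal sketch under that assumption. The `plan_waves` function and the issue IDs are hypothetical and say nothing about how AgentOps or beads actually store dependencies.

```python
def plan_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group issues into waves; deps maps each issue to the issues it depends on."""
    remaining = {issue: set(needs) for issue, needs in deps.items()}
    done: set[str] = set()
    waves: list[list[str]] = []
    while remaining:
        # An issue is ready once all of its dependencies are already done.
        ready = sorted(i for i, needs in remaining.items() if needs <= done)
        if not ready:
            raise ValueError("dependency cycle or unknown dependency: "
                             + ", ".join(sorted(remaining)))
        waves.append(ready)
        done.update(ready)
        for issue in ready:
            del remaining[issue]
    return waves


issues = {
    "a": set(),
    "b": set(),
    "c": {"a"},
    "d": {"a", "b"},
    "e": {"c", "d"},
}
print(plan_waves(issues))  # [['a', 'b'], ['c', 'd'], ['e']]
```

Raising on an empty `ready` set keeps the loop from spinning forever when the plan contains a cycle or points at an issue that does not exist.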

Phased RPI — fresh context per phase for larger goals

ao rpi phased "goal" runs each phase in its own session — no context bleed between phases.

ao rpi phased "add rate limiting"      # Hands-free, fresh context per phase
ao rpi phased "add auth" &             # Run multiple in parallel (auto-worktrees)
ao rpi phased --from=crank "fix perf"  # Resume from any phase

Use /rpi when context fits in one session. Use ao rpi phased when it doesn't.

Goal-driven mode — /evolve with GOALS.yaml

Bootstrap with /goals generate — it scans your repo (PRODUCT.md, README, skills, tests) and proposes mechanically verifiable goals. Or write them by hand:

# GOALS.yaml
version: 1
goals:
  - id: test-pass-rate
    description: "All tests pass"
    check: "make test"
    weight: 10

Then /evolve measures them, picks the worst gap, runs /rpi to fix it, re-measures ALL goals (regressed commits auto-revert), and loops. It commits locally — you control when to push. Kill switch: echo "stop" > ~/.config/evolve/KILL
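
The loop described above can be sketched in a few lines. This is a simplified model, not the /evolve implementation: goals are plain Python callables instead of shell checks, `fix` stands in for an /rpi run, and `revert` stands in for the auto-revert. Those names and the `Goal` dataclass are hypothetical. "Worst gap" is read here as the failing goal with the highest weight, which matches the sample output earlier in this README but is still an assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Goal:
    id: str
    weight: int
    check: Callable[[], bool]  # stands in for a shell check such as "make test"


def failing(goals: list[Goal]) -> set[str]:
    """Re-measure every goal and return the ids that currently fail."""
    return {g.id for g in goals if not g.check()}


def evolve(goals: list[Goal], fix: Callable[[Goal], None],
           revert: Callable[[], None], max_cycles: int = 5) -> list[str]:
    """Run the measure, fix, re-measure loop and return a log of what happened."""
    log: list[str] = []
    for _ in range(max_cycles):
        before = failing(goals)
        if not before:
            log.append("all goals met")
            break
        # Assumed reading of "worst gap": the heaviest goal that is failing.
        worst = max((g for g in goals if g.id in before), key=lambda g: g.weight)
        fix(worst)
        after = failing(goals)  # re-measure ALL goals, not just the one fixed
        if after - before:  # a goal that passed before now fails
            revert()
            log.append(f"reverted fix for {worst.id}")
        elif worst.id in after:
            log.append(f"no progress on {worst.id}")
        else:
            log.append(f"fixed {worst.id}")
    return log
```

Comparing the failing sets before and after each fix is what makes the revert decision possible: a fix that closes one gap while opening another is treated as a regression.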

Maintain over time: /goals shows pass/fail status, /goals prune finds stale or broken checks.

References — science, systems theory, prior art

Built on Darr 1995 (decay rates), Sweller 1988 (cognitive load), Liu et al. 2023 (lost-in-the-middle), MemRL 2025 (RL for memory).

AgentOps concentrates on the high-leverage end of Meadows' hierarchy: information flows (#6), rules (#5), self-organization (#4), goals (#3). The bet: changing the loop beats tuning the output.

Deep dive: docs/how-it-works.md — Brownian Ratchet, Ralph Loops, agent backends, hooks, context windowing.

Skills

43 skills: 33 user-facing, 10 internal (fire automatically). Each level composes the ones below it.

| Scope | Skill | What it does |
| --- | --- | --- |
| Single review | `/council` | Multiple judges (Claude + Codex) debate, surface disagreement, converge on a verdict. Customize with `--preset=security-audit`, `--perspectives="a,b,c"`, or `--perspectives-file` |
| Single issue | `/implement` | Full lifecycle for one task: research, plan, build, validate, learn |
| Multi-issue waves | `/crank` | Parallel agents in dependency-ordered waves with fresh context per worker |
| Full lifecycle | `/rpi` | Research → Plan → Pre-mortem → Crank → Vibe → Post-mortem, in one command |
| Hands-free loop | `/evolve` | Measures fitness goals, picks the worst gap, ships a fix, rolls back regressions, repeats |

Supporting skills: /research, /plan, /vibe, /pre-mortem, /post-mortem, /product, /goals, /readme, /status, /quickstart, /bug-hunt, /doc, /release, /knowledge, /handoff

Full reference: docs/SKILLS.md
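
To make the /council row concrete: in the sample run earlier, two PASS verdicts and one WARN produce a WARN consensus. One simple rule that reproduces that outcome is "the most severe verdict wins". The sketch below implements only that rule. The real skill has judges debate before converging, and the `consensus` function and the severity ordering here are assumptions.

```python
# Assumed severity order, least to most severe.
SEVERITY = {"PASS": 0, "WARN": 1, "FAIL": 2}


def consensus(verdicts: list[str]) -> str:
    """Return the most severe verdict among the judges."""
    if not verdicts:
        raise ValueError("at least one verdict is required")
    return max(verdicts, key=SEVERITY.__getitem__)


print(consensus(["PASS", "WARN", "PASS"]))  # WARN
```

A majority vote would be an equally plausible rule, but it would report PASS for the sample run, so it does not fit the output shown.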

Cross-runtime orchestration — mix Claude, Codex, OpenCode

AgentOps orchestrates across runtimes. Claude can lead a team of Codex workers. Codex judges can review Claude's output.

| Spawning Backend | How it works | Best for |
| --- | --- | --- |
| Native teams | `TeamCreate` + `SendMessage`, built into Claude Code | Tight coordination, debate |
| Background tasks | `Task(run_in_background=true)`, last-resort fallback | When no team APIs available |
| Codex sub-agents | `/codex-team`: Claude orchestrates Codex workers | Cross-vendor validation |
| tmux + Agent Mail | `/swarm --mode=distributed`, full process isolation | Long-running work, crash recovery |

How AgentOps Fits With Other Tools

These are fellow experiments in making coding agents work. Use pieces from any of them.

| Alternative | What it does well | Where AgentOps focuses differently |
| --- | --- | --- |
| GSD | Clean subagent spawning, fights context rot | Cross-session memory (GSD keeps context fresh within a session; AgentOps carries knowledge between sessions) |
| Compound Engineer | Knowledge compounding, structured loop | Multi-model councils and validation gates: independent judges debating before and after code ships |

Detailed comparisons →

The `ao` CLI

Optional. The CLI is plumbing — skills and hooks call it automatically. Install via the Full setup section above.

The killer feature is running the full lifecycle from your terminal, with no chat session required:

ao rpi phased "add rate limiting"   # Spawns Claude, runs research → plan → ship → learn
ao rpi phased "fix auth bug" &      # Run multiple in parallel (auto-worktrees)
ao rpi phased --from=crank ag-058   # Resume from any phase

Each phase gets its own fresh context window. Walk away, come back to committed code + extracted learnings.

Other commands you'll use:

ao search "query"      # Search knowledge across files and chat history
ao demo                # Interactive demo

Full reference: CLI Commands

FAQ

docs/FAQ.md — comparisons, limitations, subagent nesting, PRODUCT.md, uninstall.

Built on — Ralph Wiggum, Multiclaude, beads, CASS, MemRL

Ralph Wiggum (fresh context per agent) · Multiclaude (validation gates) · beads (git-native issues) · CASS (session search) · MemRL (cross-session memory)

Contributing

Issue tracking — Beads / bd

Git-native issues in .beads/. bd onboard (setup) · bd ready (find work) · bd show <id> · bd close <id> · bd sync. More: AGENTS.md

See CONTRIBUTING.md. If AgentOps helped you ship something, post in Discussions.

License

Apache-2.0 · Docs · How It Works · FAQ · Glossary · Architecture · CLI Reference · Changelog
