Principles that turn ad-hoc agent usage into a reliable, compounding practice.
Every agent session starts from zero. Same context problems. Same mistakes repeated. Same rework. You get good results sometimes and bad results other times, with no idea why. Your agent forgets everything between sessions — every conversation is like training a new junior developer.
The model isn't the problem. The operations are.
Most people blame the model when they get bad results. The real problem is how they operate: overloaded context windows, no session memory, no validation, no learning loops. Fix the operations and the same model performs dramatically better.
12-Factor AgentOps makes each session smarter than the last. Not through better models — through better operations.
Session 1: Your agent knows nothing about your codebase. It makes common mistakes. It ignores your conventions. It writes code that doesn't fit.
Session 10: Your agent knows your patterns. It avoids documented pitfalls. It follows your conventions because they're in the context.
Session 100: Your agent operates with institutional memory. It knows what's been tried and failed. It knows why architectural decisions were made. It builds on everything that came before.
The hook: Knowledge compounding is the one thing no amount of model improvement replaces. Better models with amnesia still repeat your mistakes.
No plugins, no tooling, no setup. Just a text file and discipline.
Step 1: Create a learnings.md file in your project root.
Step 2: After each agent session, append what worked and what didn't:
## Auth Middleware (2026-02-15)
- CORS requires explicit OPTIONS preflight handlers. Default config silently drops them.
- Session tokens must be validated server-side; client-side checks are insufficient.
- The auth middleware chain is: rate-limit → CORS → session → route handler.

Step 3: Point your agent at it on startup. In Claude Code, add to CLAUDE.md:

Read learnings.md before starting any task.

In Cursor, add to .cursorrules. In Codex, add to AGENTS.md. The mechanism varies; the principle doesn't.
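That's it. Once learnings.md is committed alongside your code, you're doing Factors I (context management), II (git tracking), and VII (knowledge extraction) at a basic level. Your agent should stop repeating the mistakes you've documented, because they are in its context from the first prompt.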
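If appending by hand gets skipped, Step 2 can be scripted. The following is a minimal sketch, not part of any AgentOps tooling: the function name `append_learnings` and its parameters are hypothetical, and it assumes entries follow the dated-heading-plus-bullets shape of the example entry above.

```python
from __future__ import annotations

from datetime import date
from pathlib import Path


def append_learnings(path: Path, topic: str, lessons: list[str], on: date | None = None) -> str:
    """Append one dated section to the learnings file and return the text written."""
    on = on or date.today()
    # Same shape as the example entry: a dated heading followed by bullets.
    section = f"## {topic} ({on.isoformat()})\n" + "".join(f"- {lesson}\n" for lesson in lessons)

    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    # Keep exactly one blank line between sections so the file stays readable.
    if not existing or existing.endswith("\n\n"):
        separator = ""
    elif existing.endswith("\n"):
        separator = "\n"
    else:
        separator = "\n\n"

    with path.open("a", encoding="utf-8") as fh:
        fh.write(separator + section)
    return section
```

Calling `append_learnings(Path("learnings.md"), "Auth Middleware", ["..."])` at the end of a session, followed by a commit, covers Steps 1 and 2 in one go.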
When to level up: When learnings.md exceeds ~50 entries or you stop reading it before sessions, you're ready for more structure.
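The ~50-entry threshold is easy to check mechanically. This sketch assumes that one top-level bullet equals one entry, which is an interpretation rather than something the factors define; `count_entries` and `ready_to_level_up` are hypothetical names.

```python
import re

ENTRY_THRESHOLD = 50  # the "~50 entries" rule of thumb from the text


def count_entries(markdown: str) -> int:
    """Count top-level bullets, treating each one as a single learning entry (an assumption)."""
    return len(re.findall(r"^- ", markdown, flags=re.MULTILINE))


def ready_to_level_up(markdown: str, threshold: int = ENTRY_THRESHOLD) -> bool:
    """True once the file has grown past the threshold."""
    return count_entries(markdown) > threshold
```

The other signal, that you have stopped reading the file before sessions, cannot be scripted; treat the count as a prompt to check yourself, not as the decision.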
Twelve vendor-neutral principles organized in four tiers. Start at the top. Each tier builds on the previous one. You can stop at any tier and keep the value.
Non-negotiable basics that work with zero tooling. Get these wrong and nothing else matters.
| # | Factor | The Rule |
|---|---|---|
| I | Context Is Everything | Manage what enters the context window like you manage what enters production. |
| II | Track Everything in Git | If it's not in git, it didn't happen. |
| III | One Agent, One Job | Each agent gets a scoped task and fresh context. Never reuse a saturated window. |
Without tooling: Keep sessions short. Start fresh for new tasks. Write handoff summaries. Commit your learnings.md. One issue per agent session.
How work flows through agents. The discipline that separates "prompting and hoping" from a reliable operating model.
| # | Factor | The Rule |
|---|---|---|
| IV | Research Before You Build | Understand the problem space before generating a single line of code. |
| V | Validate Externally | No agent grades its own work. Ever. |
| VI | Lock Progress Forward | Once work passes validation, it ratchets — it cannot regress. |
Without tooling: Research before implementing. Have a different session (or human) review the work. Commit validated work to protected branches.
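One way to picture the ratchet in Factor VI: keep a baseline of checks that have passed validation, and reject any change that makes one of them fail. The sketch below is illustrative only. `ratchet` and `RatchetResult` are hypothetical names, and it assumes the set of passing checks comes from an external validator (CI, a different session, or a human), never from the agent that produced the work.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RatchetResult:
    accepted: bool
    regressions: frozenset[str]  # checks that passed before and do not pass now
    baseline: frozenset[str]     # baseline to carry into the next run


def ratchet(baseline: set[str] | frozenset[str], passing_now: set[str] | frozenset[str]) -> RatchetResult:
    """Accept new work only if every previously validated check still passes."""
    baseline = frozenset(baseline)
    passing_now = frozenset(passing_now)
    regressions = baseline - passing_now
    if regressions:
        # Reject: the baseline stays exactly as it was.
        return RatchetResult(False, regressions, baseline)
    # Accept: the baseline only ever grows.
    return RatchetResult(True, frozenset(), baseline | passing_now)
```

A protected branch with required status checks gives the same guarantee without any code; the sketch only makes the rule explicit.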
Systematic extraction and injection of knowledge. This is where sessions start getting measurably smarter over time.
| # | Factor | The Rule |
|---|---|---|
| VII | Extract Learnings | Every session produces two outputs — the work product and the lessons learned. |
| VIII | Compound Knowledge | Learnings must flow back into future sessions automatically. |
| IX | Measure What Matters | Track fitness toward goals, not activity metrics. |
Factor VIII is the hero. It's the knowledge flywheel: extract learnings, gate for quality, inject into future sessions, measure retrieval, let stale knowledge decay. This is the differentiator that can't be commoditized — better models don't replace institutional memory.
Without tooling: Manually update learnings.md after each session. Review it weekly and prune stale entries. It's tedious but it works. The AgentOps plugin automates this — but the principle is portable.
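The weekly pruning pass can be partly assisted by a script that lists old sections for review. This is a rough sketch, not how the AgentOps plugin handles decay: it assumes every section heading ends with an ISO date in parentheses, as in the quickstart example, and the 90-day cutoff and the name `stale_sections` are arbitrary choices made for illustration.

```python
from __future__ import annotations

import re
from datetime import date, timedelta

# Matches headings such as "## Auth Middleware (2026-02-15)".
HEADING = re.compile(r"^## (?P<topic>.+?) \((?P<day>\d{4}-\d{2}-\d{2})\)\s*$", re.MULTILINE)


def stale_sections(markdown: str, today: date, max_age_days: int = 90) -> list[tuple[str, date]]:
    """Return (topic, date) for each section older than the cutoff, in file order."""
    cutoff = today - timedelta(days=max_age_days)
    found = []
    for match in HEADING.finditer(markdown):
        try:
            written = date.fromisoformat(match["day"])
        except ValueError:
            continue  # skip headings whose date is not a real calendar date
        if written < cutoff:
            found.append((match["topic"], written))
    return found
```

The script only flags candidates. Age alone does not make a learning wrong, so a human still decides what to delete.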
Multi-agent orchestration patterns. Skip this entire tier if you work solo. You lose nothing. These patterns apply when you're running parallel agents on complex projects.
| # | Factor | The Rule |
|---|---|---|
| X | Isolate Workers | Each worker gets its own workspace, its own context, and zero shared mutable state. |
| XI | Supervise Hierarchically | Escalation flows up, never sideways. |
| XII | Harvest Failures as Wisdom | Failed attempts are data. Extract and index them with the same rigor as successes. |
Without tooling: Use git worktrees for parallel work. Designate one person (or agent) as coordinator. Document what doesn't work alongside what does.
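For the worktree suggestion, a small helper can give each worker its own checkout and branch. This is a sketch under assumptions: the `agent/<task>` branch naming and the sibling-directory layout are hypothetical conventions, and the helper names are made up for this example. The underlying command, `git worktree add -b <branch> <path>`, is standard git.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def worktree_command(repo: Path, task: str) -> list[str]:
    """Build the `git worktree add` invocation for one worker's isolated checkout."""
    repo = repo.resolve()  # absolute paths, so `-C` does not change what the target means
    branch = f"agent/{task}"                       # hypothetical branch naming scheme
    workdir = repo.parent / f"{repo.name}-{task}"  # sibling directory, one per worker
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(workdir)]


def create_worktree(repo: Path, task: str) -> None:
    """Run the command; raises CalledProcessError if git refuses (e.g. the branch exists)."""
    subprocess.run(worktree_command(repo, task), check=True)
```

Each worker then runs in its own directory on its own branch, so the only shared state is the repository history, and merges go through the coordinator.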
You use Claude Code, Cursor, or Codex daily. Some sessions produce great results. Others are frustrating wastes of time. The difference isn't the model — it's the context.
Factors I-III give you immediate improvement: keep context focused, track what you learn, start fresh for each task. Factor VII (extracting learnings) and Factor VIII (compounding knowledge) make each session build on the last.
Your team runs agents in parallel. Work conflicts. Learnings from one developer's sessions don't help others. There's no consistent quality bar.
Factors IV-VI add workflow discipline: research first, validate externally, lock progress forward. Factor VIII gives you shared institutional memory. Scale factors (X-XII) provide isolation and coordination patterns.
You're designing agent tooling and need proven operational principles. Every framework reinvents context management, validation, and knowledge persistence from scratch.
These 12 factors are the shared vocabulary. They're vendor-neutral, grounded in 20+ years of DevOps and SRE practice, and tested in production.
You can start with zero infrastructure and level up when you need to:
Quickstart (5 min) → learnings.md file, zero tooling
Foundation (I-III) → Context discipline, git tracking, fresh sessions
Workflow (IV-VI) → Research, validation, ratcheting
Knowledge (VII-IX) → Extraction, compounding, measurement
Scale (X-XII) → Multi-agent isolation, supervision, failure harvesting (OPTIONAL)
Key principle: You can stop at any level and keep the value. Each level justifies the next, but none requires it.
When to level up:
- Quickstart → Foundation: When your learnings.md gets unwieldy or you notice repeated context problems
- Foundation → Workflow: When you find yourself re-explaining codebase patterns to new sessions
- Workflow → Knowledge: When the same mistakes recur across sessions despite research
- Knowledge → Scale: When you're running multiple agents in parallel and conflicts emerge
These principles stand on decades of proven methodology:
| Source | Factors |
|---|---|
| DevOps practices (20+ years) | I, V, VI, IX |
| Site Reliability Engineering (Google, 15+ years) | V, VI, IX |
| Cognitive load theory (Sweller, 1988) | I, III |
| Unix philosophy (1978) | III |
| GitOps methodology (10+ years) | II |
| Microservices patterns (10+ years) | III, X, XI |
| Zero-trust architecture (10+ years) | V |
| Learning science (decades) | VII, VIII, XII |
| Project | Relationship |
|---|---|
| 12-Factor App (Heroku, 2011) | How to build cloud-native apps. 12-Factor AgentOps is about how to operate with agents. |
| 12-Factor Agents (HumanLayer) | How to build agent applications. 12-Factor AgentOps is about how to operate with them. |
| Vibe Coding (Gene Kim, Steve Yegge) | The methodology of AI-assisted coding. 12-Factor AgentOps is the operational discipline underneath. |
The AgentOps plugin is the reference implementation of these factors for Claude Code. It automates the knowledge flywheel (extraction, quality gating, semantic retrieval, decay management), provides research and planning skills, and implements multi-agent coordination patterns.
But the plugin is not a prerequisite. Every factor in this document can be applied manually with zero tooling. The principles are universal; the automation is optional.
Try the factors in your context. Document what works and what doesn't. Share via issues or PRs.
The factors evolve through production validation and community feedback.
License: CC BY-SA 4.0 (content) / Apache 2.0 (code)
- v1.0 (2025-01-27): Initial twelve factors — coding agent validation focus
- v2.0 (2025-12-27): Production implementation patterns added
- v3.0 (2026-02-15): Pivot to full operational discipline. Factors rewritten. Adoption model inverted (results-first, not manifesto-first). Knowledge compounding as hero differentiator. Scale factors marked optional.