Operational patterns for AI agents and infrastructure, from solo to enterprise scale. Applies DevOps and SRE principles to the intersection: infrastructure FOR AI + AI FOR infrastructure. Alpha framework extracting meta-patterns from mission-critical environments.

boshu2/12-factor-agentops

12-Factor AgentOps

The Operational Discipline for Working With AI Agents

Principles that turn ad-hoc agent usage into a reliable, compounding practice.


The Problem

Every agent session starts from zero. Same context problems. Same mistakes repeated. Same rework. You get good results sometimes and bad results other times, with no idea why. Your agent forgets everything between sessions — every conversation is like training a new junior developer.

The model isn't the problem. The operations are.

Most people blame the model when they get bad results. The real problem is how they operate: overloaded context windows, no session memory, no validation, no learning loops. Fix the operations and the same model produces better, more consistent results.

The Solution: Knowledge Compounds

12-Factor AgentOps makes each session smarter than the last. Not through better models — through better operations.

Session 1: Your agent knows nothing about your codebase. It makes common mistakes. It ignores your conventions. It writes code that doesn't fit.

Session 10: Your agent knows your patterns. It avoids documented pitfalls. It follows your conventions because they're in the context.

Session 100: Your agent operates with institutional memory. It knows what's been tried and failed. It knows why architectural decisions were made. It builds on everything that came before.

The hook: Knowledge compounding is the one thing no amount of model improvement replaces. Better models with amnesia still repeat your mistakes.


Quickstart: 5 Minutes, Zero Infrastructure

No plugins, no tooling, no setup. Just a text file and discipline.

Step 1: Create a learnings.md file in your project root.

Step 2: After each agent session, append what worked and what didn't:

## Auth Middleware (2026-02-15)
- CORS requires explicit OPTIONS preflight handlers. Default config silently drops them.
- Session tokens must be validated server-side; client-side checks are insufficient.
- The auth middleware chain is: rate-limit → CORS → session → route handler.

Step 3: Point your agent at it on startup. In Claude Code, add to CLAUDE.md:

Read learnings.md before starting any task.

In Cursor, add the same line to .cursorrules. In Codex, add it to AGENTS.md. The mechanism varies; the principle doesn't.

That's it. You're now doing Factors I (context management), II (git tracking, once you commit the file), and VII (knowledge extraction) at a basic level. With the file in context, your agent can avoid the mistakes you have already documented.

When to level up: When learnings.md exceeds ~50 entries or you stop reading it before sessions, you're ready for more structure.
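
If appending by hand gets tedious, Step 2 is easy to script. The following is a minimal sketch, not part of the framework: the function names `append_learning` and `count_entries` are hypothetical, and it assumes each entry starts with a `## Topic (YYYY-MM-DD)` heading as in the example above.

```python
from datetime import date
from pathlib import Path


def append_learning(path, topic, lessons, today=None):
    """Append a dated section to the learnings file, creating it if needed."""
    today = today or date.today()
    lines = [f"## {topic} ({today.isoformat()})"]
    lines += [f"- {lesson}" for lesson in lessons]
    entry = "\n".join(lines) + "\n"
    file = Path(path)
    # Separate entries with a blank line once the file has content.
    prefix = "\n" if file.exists() and file.stat().st_size > 0 else ""
    with file.open("a", encoding="utf-8") as handle:
        handle.write(prefix + entry)
    return entry


def count_entries(path):
    """Count entries, treating each '## ' heading as one entry (an assumption)."""
    file = Path(path)
    if not file.exists():
        return 0
    text = file.read_text(encoding="utf-8")
    return sum(1 for line in text.splitlines() if line.startswith("## "))
```

Calling `count_entries("learnings.md")` gives you a rough signal for the ~50-entry threshold mentioned above.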


The 12 Factors

Twelve vendor-neutral principles organized in four tiers. Start at the top. Each tier builds on the previous one. You can stop at any tier and keep the value.

Foundation (I–III) — Start Here

Non-negotiable basics that work with zero tooling. Get these wrong and nothing else matters.

| # | Factor | The Rule |
|---|--------|----------|
| I | Context Is Everything | Manage what enters the context window like you manage what enters production. |
| II | Track Everything in Git | If it's not in git, it didn't happen. |
| III | One Agent, One Job | Each agent gets a scoped task and fresh context. Never reuse a saturated window. |

Without tooling: Keep sessions short. Start fresh for new tasks. Write handoff summaries. Commit your learnings.md. One issue per agent session.
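
One way to picture Factor I is a budget on what you load. The sketch below is an illustration under stated assumptions, not the framework's method: `select_within_budget` is a hypothetical helper, and it uses character count as a crude stand-in for tokens. It keeps the newest sections of a notes file that fit the budget and drops the rest.

```python
def select_within_budget(sections, max_chars):
    """Keep the newest sections whose combined length fits max_chars.

    `sections` is a list of strings in file order (oldest first).
    Character count is only a rough proxy for context-window tokens.
    """
    kept = []
    used = 0
    for section in reversed(sections):  # walk newest to oldest
        if used + len(section) > max_chars:
            break  # stop at the first section that does not fit
        kept.append(section)
        used += len(section)
    kept.reverse()  # restore oldest-first order for readability
    return kept
```

Preferring the newest entries is a design choice for this sketch; you might instead rank by relevance to the task at hand.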

Workflow (IV–VI) — The Discipline

How work flows through agents. The discipline that separates "prompting and hoping" from a reliable operating model.

| # | Factor | The Rule |
|---|--------|----------|
| IV | Research Before You Build | Understand the problem space before generating a single line of code. |
| V | Validate Externally | No agent grades its own work. Ever. |
| VI | Lock Progress Forward | Once work passes validation, it ratchets: it cannot regress. |

Without tooling: Research before implementing. Have a different session (or human) review the work. Commit validated work to protected branches.
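
Protected branches are the ratchet described above. The same idea can be applied to a number, such as a passing-test count. This is a minimal sketch under assumptions: the `ratchet` function, the JSON state file, and the "higher is better" metric are all hypothetical and not part of the framework.

```python
import json
from pathlib import Path


def ratchet(state_path, metric, value):
    """Accept `value` only if it is at least the recorded best for `metric`.

    Returns True and records the value when the check passes. Returns False
    and leaves the state file untouched when the value would be a regression.
    Assumes higher values are better.
    """
    file = Path(state_path)
    state = json.loads(file.read_text(encoding="utf-8")) if file.exists() else {}
    best = state.get(metric)
    if best is not None and value < best:
        return False
    state[metric] = value
    file.write_text(json.dumps(state, indent=2, sort_keys=True), encoding="utf-8")
    return True
```

Run from CI or a pre-merge script, a `False` result would block the change. Commit the state file so the ratchet itself is tracked in git (Factor II).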

Knowledge (VII–IX) — Where Compounding Kicks In

Systematic extraction and injection of knowledge. This is where sessions start getting measurably smarter over time.

| # | Factor | The Rule |
|---|--------|----------|
| VII | Extract Learnings | Every session produces two outputs: the work product and the lessons learned. |
| VIII | Compound Knowledge | Learnings must flow back into future sessions automatically. |
| IX | Measure What Matters | Track fitness toward goals, not activity metrics. |

Factor VIII is the hero. It's the knowledge flywheel: extract learnings, gate for quality, inject into future sessions, measure retrieval, let stale knowledge decay. This is the differentiator that can't be commoditized — better models don't replace institutional memory.

Without tooling: Manually update learnings.md after each session. Review it weekly and prune stale entries. It's tedious but it works. The AgentOps plugin automates this — but the principle is portable.
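
The weekly prune can be given a head start by flagging old entries for review. The following is a rough sketch, not the plugin's decay logic: `stale_topics` is a hypothetical helper, the 90-day default is an arbitrary assumption, and it only recognizes headings in the `## Topic (YYYY-MM-DD)` form used in the Quickstart.

```python
import re
from datetime import date

# Matches headings such as "## Auth Middleware (2026-02-15)".
HEADING = re.compile(r"^## (?P<topic>.+) \((?P<day>\d{4}-\d{2}-\d{2})\)\s*$")


def stale_topics(markdown_text, today, max_age_days=90):
    """Return (topic, age_in_days) for dated headings older than max_age_days."""
    stale = []
    for line in markdown_text.splitlines():
        match = HEADING.match(line)
        if not match:
            continue  # bullets, blank lines, undated headings
        age = (today - date.fromisoformat(match["day"])).days
        if age > max_age_days:
            stale.append((match["topic"], age))
    return stale
```

Age is only a prompt to review an entry. Whether it is actually stale is still a human call.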

Scale (X–XII) — Advanced, Optional

Multi-agent orchestration patterns. Skip this entire tier if you work solo. You lose nothing. These patterns apply when you're running parallel agents on complex projects.

| # | Factor | The Rule |
|---|--------|----------|
| X | Isolate Workers | Each worker gets its own workspace, its own context, and zero shared mutable state. |
| XI | Supervise Hierarchically | Escalation flows up, never sideways. |
| XII | Harvest Failures as Wisdom | Failed attempts are data. Extract and index them with the same rigor as successes. |

Without tooling: Use git worktrees for parallel work. Designate one person (or agent) as coordinator. Document what doesn't work alongside what does.
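
For the worktree approach, a small wrapper keeps the naming consistent across workers. This is a sketch under assumptions: the sibling-directory layout, the `agent/<task>` branch naming, and both function names are hypothetical. Only `git -C` and `git worktree add -b` are real git options.

```python
import subprocess
from pathlib import Path


def worktree_command(repo_dir, task_id):
    """Build the `git worktree add` command for one worker's isolated checkout."""
    repo = Path(repo_dir)
    # Hypothetical layout: a sibling directory and a branch named after the task.
    worktree_path = repo.parent / f"{repo.name}-{task_id}"
    branch = f"agent/{task_id}"
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(worktree_path)]


def create_worktree(repo_dir, task_id):
    """Run the command. Raises CalledProcessError if git rejects it."""
    subprocess.run(worktree_command(repo_dir, task_id), check=True)
```

Each worker then gets its own directory and branch, so parallel sessions do not share a working tree.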


Why These Factors?

For the Solo Developer

You use Claude Code, Cursor, or Codex daily. Some sessions produce great results. Others are frustrating wastes of time. The difference isn't the model — it's the context.

Factors I–III give you immediate improvement: keep context focused, track what you learn, start fresh for each task. Factor VII (extracting learnings) and Factor VIII (compounding knowledge) make each session build on the last.

For the Tech Lead

Your team runs agents in parallel. Work conflicts. Learnings from one developer's sessions don't help others. There's no consistent quality bar.

Factors IV–VI add workflow discipline: research first, validate externally, lock progress forward. Factor VIII gives you shared institutional memory. Scale factors (X–XII) provide isolation and coordination patterns.

For the Tool Builder

You're designing agent tooling and need proven operational principles. Every framework reinvents context management, validation, and knowledge persistence from scratch.

These 12 factors are the shared vocabulary. They're vendor-neutral, grounded in 20+ years of DevOps and SRE practice, and tested in production.


Adoption Path

You can start with zero infrastructure and level up when you need to:

Quickstart (5 min)     → learnings.md file, zero tooling
Foundation (I-III)     → Context discipline, git tracking, fresh sessions
Workflow (IV-VI)       → Research, validation, ratcheting
Knowledge (VII-IX)     → Extraction, compounding, measurement
Scale (X-XII)          → Multi-agent isolation, supervision, failure harvesting (OPTIONAL)

Key principle: You can stop at any level and keep the value. Each level justifies the next, but none requires it.

When to level up:

  • Quickstart → Foundation: When your learnings.md gets unwieldy or you notice repeated context problems
  • Foundation → Workflow: When you find yourself re-explaining codebase patterns to new sessions
  • Workflow → Knowledge: When the same mistakes recur across sessions despite research
  • Knowledge → Scale: When you're running multiple agents in parallel and conflicts emerge

Where This Comes From

These principles stand on decades of proven methodology:

| Source | Factors |
|--------|---------|
| DevOps practices (20+ years) | I, V, VI, IX |
| Site Reliability Engineering (Google, 15+ years) | V, VI, IX |
| Cognitive load theory (Sweller, 1988) | I, III |
| Unix philosophy (1978) | III |
| GitOps methodology (10+ years) | II |
| Microservices patterns (10+ years) | III, X, XI |
| Zero-trust architecture (10+ years) | V |
| Learning science (decades) | VII, VIII, XII |

Related Projects

| Project | Relationship |
|---------|--------------|
| 12-Factor App (Heroku, 2011) | How to build cloud-native apps. We're how to operate with agents. |
| 12-Factor Agents (HumanLayer) | How to build agent applications. We're how to operate with them. |
| Vibe Coding (Gene Kim, Steve Yegge) | The methodology of AI-assisted coding. We're the operational discipline underneath. |

Reference Implementation

The AgentOps plugin is the reference implementation of these factors for Claude Code. It automates the knowledge flywheel (extraction, quality gating, semantic retrieval, decay management), provides research and planning skills, and implements multi-agent coordination patterns.

But the plugin is not a prerequisite. Every factor in this document can be applied manually with zero tooling. The principles are universal; the automation is optional.


Contributing

Try the factors in your context. Document what works and what doesn't. Share via issues or PRs.

The factors evolve through production validation and community feedback.

License: CC BY-SA 4.0 (content) / Apache 2.0 (code)


Version History

  • v1.0 (2025-01-27): Initial twelve factors — coding agent validation focus
  • v2.0 (2025-12-27): Production implementation patterns added
  • v3.0 (2026-02-15): Pivot to full operational discipline. Factors rewritten. Adoption model inverted (results-first, not manifesto-first). Knowledge compounding as hero differentiator. Scale factors marked optional.
