From b6d9da078692cd940644da599c290e4de12925cc Mon Sep 17 00:00:00 2001 From: Adrian Date: Sun, 8 Mar 2026 09:13:38 -0400 Subject: [PATCH 1/2] feat(plugin): add tdd skill for test-driven development workflow Add opt-in TDD skill that integrates red-green-refactor discipline into implementation plans when triggered by TDD-related keywords. Includes the full cycle (RED/GREEN/REFACTOR), plan augmentation format, test granularity guidance, and anti-patterns reference. Language-agnostic, strongly opinionated on process. Designed as an MVP starting point for public submission. Bumps plugin to 2.39.0 (21 skills). --- .../.claude-plugin/plugin.json | 4 +- plugins/compound-engineering/CHANGELOG.md | 8 + plugins/compound-engineering/README.md | 3 +- .../compound-engineering/skills/tdd/SKILL.md | 141 ++++++++++++++++++ .../skills/tdd/references/anti-patterns.md | 100 +++++++++++++ 5 files changed, 253 insertions(+), 3 deletions(-) create mode 100644 plugins/compound-engineering/skills/tdd/SKILL.md create mode 100644 plugins/compound-engineering/skills/tdd/references/anti-patterns.md diff --git a/plugins/compound-engineering/.claude-plugin/plugin.json b/plugins/compound-engineering/.claude-plugin/plugin.json index e659557f..98b7e6ea 100644 --- a/plugins/compound-engineering/.claude-plugin/plugin.json +++ b/plugins/compound-engineering/.claude-plugin/plugin.json @@ -1,7 +1,7 @@ { "name": "compound-engineering", - "version": "2.38.1", - "description": "AI-powered development tools. 29 agents, 22 commands, 20 skills, 1 MCP server for code review, research, design, and workflow automation.", + "version": "2.39.0", + "description": "AI-powered development tools. 
29 agents, 22 commands, 21 skills, 1 MCP server for code review, research, design, and workflow automation.", "author": { "name": "Kieran Klaassen", "email": "kieran@every.to", diff --git a/plugins/compound-engineering/CHANGELOG.md b/plugins/compound-engineering/CHANGELOG.md index e7664980..ffbcaaf6 100644 --- a/plugins/compound-engineering/CHANGELOG.md +++ b/plugins/compound-engineering/CHANGELOG.md @@ -5,6 +5,14 @@ All notable changes to the compound-engineering plugin will be documented in thi The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). +## [2.39.0] - 2026-03-08 + +### Added + +- **`tdd` skill** -- Opt-in test-driven development workflow that integrates red-green-refactor discipline into implementation plans. Triggered when the user mentions TDD-related keywords before or after `/ce:plan`. Includes the full TDD cycle (RED/GREEN/REFACTOR), plan augmentation format for restructuring tasks into behavioral increments, test granularity guidance, and a reference file of common TDD anti-patterns with corrections. Language-agnostic and strongly opinionated. + +--- + ## [2.38.1] - 2026-03-01 ### Fixed diff --git a/plugins/compound-engineering/README.md b/plugins/compound-engineering/README.md index 33a4ea15..0654ce57 100644 --- a/plugins/compound-engineering/README.md +++ b/plugins/compound-engineering/README.md @@ -8,7 +8,7 @@ AI-powered development tools that get smarter with every use. 
Make each unit of |-----------|-------| | Agents | 29 | | Commands | 22 | -| Skills | 20 | +| Skills | 21 | | MCP Servers | 1 | ## Agents @@ -126,6 +126,7 @@ Core workflow commands use `ce:` prefix to unambiguously identify them as compou | `dspy-ruby` | Build type-safe LLM applications with DSPy.rb | | `frontend-design` | Create production-grade frontend interfaces | | `skill-creator` | Guide for creating effective Claude Code skills | +| `tdd` | Integrate red-green-refactor TDD discipline into plans and coding | ### Content & Workflow diff --git a/plugins/compound-engineering/skills/tdd/SKILL.md b/plugins/compound-engineering/skills/tdd/SKILL.md new file mode 100644 index 00000000..395964b9 --- /dev/null +++ b/plugins/compound-engineering/skills/tdd/SKILL.md @@ -0,0 +1,141 @@ +--- +name: tdd +description: This skill should be used when implementing features using test-driven development. It integrates red-green-refactor discipline into implementation plans and coding workflows. Triggers on "TDD", "test-driven", "test first", "write tests first", "red-green-refactor", or when the user requests a test-driven approach before or after plan creation. +--- + +# Test-Driven Development + +Integrate red-green-refactor discipline into implementation plans and coding workflows. This skill provides opinionated, language-agnostic TDD process guidance -- not a testing style guide. + +## Philosophy + +The test is the first consumer of the API. If it is awkward to test, the design is wrong. 
+ +- Red-green-refactor is non-negotiable -- writing code then backfilling tests is not TDD +- Each cycle targets one behavioral change and one assertion +- Refactoring happens ONLY when tests are green +- Tests describe behavior, not implementation details +- When in doubt, default to TDD + +## When to Use + +**Apply TDD when:** +- Implementing new features or business logic +- Fixing bugs (reproduce the bug as a failing test first) +- Designing APIs or interfaces (the test reveals ergonomic issues early) +- Working with complex domain logic + +**Skip TDD when:** +- Running exploratory spikes or prototyping throwaway code +- Wiring configuration, boilerplate, or plumbing +- Doing pure UI layout and styling work +- The task is a trivial rename or one-line change + +When a user mentions TDD-related keywords before or after `/ce:plan`, restructure the plan's implementation tasks to follow the red-green-refactor cycle. When TDD is not mentioned, do not apply this skill. + +## The Cycle + +For each behavioral increment in a work item: + +### 1. RED -- Write a Failing Test + +Write a test that describes the expected behavior. Run the test suite and confirm the new test fails. + +If the test passes immediately, it is testing nothing new -- rewrite it or question whether the behavior already exists. + +**Test naming convention:** describe behavior, not method names. + +| Bad | Good | +|-----|------| +| `test_check_expiry` | `test_expired_subscription_denies_access` | +| `test_process` | `test_payment_creates_invoice_and_sends_receipt` | +| `test_validate` | `test_missing_email_returns_validation_error` | + +### 2. GREEN -- Make It Pass + +Write the minimum code to make the failing test pass. No cleverness. No optimization. No refactoring. Just make it green. + +Run the full test suite -- confirm all tests pass, not just the new one. + +### 3. 
REFACTOR -- Clean Up on Green + +With all tests passing, improve the code: +- Remove duplication +- Improve naming +- Extract abstractions that have earned their existence +- Simplify conditional logic + +Run the test suite after each change. Never refactor on red. + +### 4. Repeat + +Move to the next behavioral increment. Each cycle should take minutes, not hours. If a cycle is dragging, the increment is too large -- split it. + +## Plan Augmentation + +When TDD is requested alongside `/ce:plan`, restructure each implementation task to follow the cycle. Break features into behavioral increments, each with explicit RED/GREEN/REFACTOR steps. + +**Standard plan task:** + +```markdown +- [ ] Implement user authentication + - Add User model with email/password + - Add login endpoint + - Add session management +``` + +**TDD-augmented plan task:** + +```markdown +- [ ] Implement user authentication (TDD) + - RED: Test that User model validates email presence and format + - GREEN: Add User model with email validation + - RED: Test that login endpoint returns token for valid credentials + - GREEN: Implement login endpoint with token generation + - RED: Test that login rejects invalid credentials with 401 + - GREEN: Add credential verification + - REFACTOR: Extract authentication logic if duplication emerged + - RED: Test that expired sessions are rejected + - GREEN: Add session expiry check + - REFACTOR: Clean up session management +``` + +Each RED step names the specific behavior being tested. Each GREEN step names the minimum implementation. REFACTOR steps appear when enough code has accumulated to warrant cleanup. 
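One RED/GREEN pair from the augmented plan above, carried out concretely (language-neutral pseudocode; the endpoint and helper names are illustrative, not part of the skill):

```
# RED: test that login rejects invalid credentials with 401
test "login rejects invalid credentials with 401":
    response = post("/login", email: "a@example.com", password: "wrong")
    assert response.status == 401        # run it -- it must fail before GREEN

# GREEN: minimum code to pass -- verify credentials, nothing more
def login(request):
    if not credentials_valid(request.email, request.password):
        return response(status: 401)
    return response(status: 200, token: issue_token(request.email))
```

The REFACTOR step would then look for duplication between this and the valid-credentials cycle before moving to the next increment.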
+ +## Test Granularity + +Choose the starting test level based on the situation: + +| Situation | Start With | Rationale | +|-----------|-----------|-----------| +| New domain/business logic | Unit test | Fast feedback loop, drives internal design | +| New API endpoint or feature | Integration test | Verifies the full request/response contract | +| Bug fix | Test at the level the bug manifests | Proves the fix, prevents regression | +| Refactoring existing code | Existing tests (add if missing) | Characterization tests first if no coverage exists | +| Data transformation | Unit test | Pure functions are easiest to test in isolation | + +Do not be dogmatic about one level. Start where the value is highest, add other levels as needed. + +## Discipline Rules + +1. **Never skip RED.** A test that was never red never proved anything. The red step confirms the test can detect failure. +2. **Never commit without a corresponding test.** If the behavior changed, a test should document that change. +3. **Never refactor on red.** Get to green first, then clean up. Mixing implementation and refactoring creates confusion about what broke. +4. **Keep cycles small.** If a RED-GREEN cycle takes more than 15-20 minutes, the behavioral increment is too large. Split it. +5. **Test behavior, not implementation.** Assert on outcomes visible to the caller, not internal state. See [anti-patterns.md](./references/anti-patterns.md) for common mistakes. +6. **Run the full suite frequently.** Not just the new test -- the full suite. Catch unintended breakage early. + +## Decomposing Features into TDD Increments + +The hardest part of TDD is deciding what to test first. Use this heuristic: + +1. **Start with the happy path.** What is the simplest successful case? +2. **Add validation and error cases.** What inputs should be rejected? +3. **Add edge cases.** Empty collections, boundary values, concurrent access. +4. **Add integration points.** How does this interact with other components? 
+ +Each of these becomes a RED-GREEN-REFACTOR cycle. Resist the urge to write all tests upfront -- discover the design incrementally. + +## Anti-Patterns + +For detailed guidance on common TDD mistakes and corrections, see [anti-patterns.md](./references/anti-patterns.md). diff --git a/plugins/compound-engineering/skills/tdd/references/anti-patterns.md b/plugins/compound-engineering/skills/tdd/references/anti-patterns.md new file mode 100644 index 00000000..d8f8b612 --- /dev/null +++ b/plugins/compound-engineering/skills/tdd/references/anti-patterns.md @@ -0,0 +1,100 @@ +# TDD Anti-Patterns + +Common mistakes that undermine test-driven development, and how to correct them. + +## Testing Implementation Instead of Behavior + +**Problem:** Tests are coupled to internal details -- private methods, internal state, specific data structures. Any refactoring breaks tests even when behavior is unchanged. + +**Example (bad):** +``` +# Tests that the internal cache hash has a specific key +assert user._cache[:permissions] == [:read, :write] +``` + +**Correction:** Assert on outcomes visible to the caller. +``` +# Tests that the user has the expected permissions +assert user.can?(:read) +assert user.can?(:write) +``` + +**Rule of thumb:** If renaming a private method or changing an internal data structure breaks a test, the test is coupled to implementation. + +## Writing All Tests First, Then All Code + +**Problem:** Writing a full test suite upfront before any implementation. This front-loads design decisions and eliminates the feedback loop that makes TDD valuable. + +**Correction:** One cycle at a time. Write one failing test, make it pass, refactor, then write the next test. Let each passing test inform the next one. + +## Mocking Everything + +**Problem:** Every dependency is mocked, so tests pass but the system does not actually work. Tests verify that mocks return what they were told to return. 
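An illustrative sketch of the problem (pseudocode; `Checkout` and the `mock` helper are hypothetical):

**Example (bad):**
```
# Every collaborator is a double -- only the wiring is exercised
repo     = mock(find_order: fake_order)
pricer   = mock(total_for: 100)
mailer   = mock(send_receipt: true)
checkout = Checkout(repo, pricer, mailer)
assert checkout.process(order_id) == 100   # passes even if the real pricer is broken
```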
+ +**Correction:** Mock at boundaries (external APIs, file systems, third-party services). Use real objects for internal collaborators. If a test requires more than 2-3 mocks, the code under test may have too many dependencies -- that is a design signal, not a testing problem. + +## Skipping the Refactor Step + +**Problem:** RED-GREEN without REFACTOR. The code works but accumulates duplication, unclear naming, and tangled logic. Technical debt compounds with every cycle. + +**Correction:** Refactoring is not optional. After each GREEN, ask: Is there duplication? Are names clear? Is there an abstraction trying to emerge? Even if the answer is "no changes needed," the pause to evaluate is part of the discipline. + +## Testing Framework Code + +**Problem:** Writing tests that verify the framework does its job -- e.g., testing that Rails validates presence when `validates :name, presence: true` is declared. + +**Correction:** Test _your_ logic, not the framework's. If the framework has a well-tested feature and the code simply declares it, trust the framework. Focus tests on business rules, edge cases, and custom behavior. + +## Gold-Plating Tests + +**Problem:** Over-specified assertions that check every detail of the response. Tests break when irrelevant fields change. + +**Example (bad):** +``` +assert response == { + id: 1, name: "Alice", email: "alice@example.com", + created_at: "2026-01-01T00:00:00Z", updated_at: "2026-01-01T00:00:00Z", + role: "admin", last_login: nil, avatar_url: nil +} +``` + +**Correction:** Assert on what matters to the behavior being tested. +``` +assert response[:name] == "Alice" +assert response[:role] == "admin" +``` + +## Slow Test Suites + +**Problem:** Tests take so long that developers stop running them frequently. The feedback loop -- the core value of TDD -- breaks down. 
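An illustrative sketch (pseudocode; `User` and its methods are hypothetical) of a pure-logic test paying for I/O it does not need:

**Example (bad):**
```
# Formatting logic tested through the database
user = User.create_in_db(first: "Alice", last: "Baker")   # network + disk round-trip
assert user.display_name() == "Alice B."
```

**Example (better):**
```
# Same behavior with an in-memory object -- no I/O
user = User(first: "Alice", last: "Baker")
assert user.display_name() == "Alice B."
```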
+ +**Correction:** +- Prefer unit tests (fast) over integration tests (slower) for logic-heavy code +- Use integration tests strategically for critical paths, not for every code path +- Avoid unnecessary database hits in unit tests +- If the suite takes more than 30 seconds locally, investigate what is slow + +## The Test-After Trap + +**Problem:** Writing implementation first, then retrofitting tests. The tests end up verifying the implementation rather than specifying behavior. Tests shaped by existing code cannot drive design. + +**Correction:** Commit to writing the test first, even when the implementation seems obvious. The discipline of writing the test first often reveals edge cases and API awkwardness that would otherwise be missed. + +## Testing Too Many Things at Once + +**Problem:** A single test verifies multiple behaviors. When it fails, it is unclear which behavior broke. + +**Example (bad):** +``` +test "user registration" do + # Creates user, sends email, logs event, redirects -- all in one test +end +``` + +**Correction:** One behavior per test. Split into: "registration creates user," "registration sends welcome email," "registration logs signup event." Each test is a sentence that describes one thing. + +## Ignoring Test Failure Messages + +**Problem:** Tests fail with unhelpful messages like "expected true, got false" or "assertion failed." Debugging requires reading the test source. + +**Correction:** Write assertion messages that explain what went wrong in business terms. Good failure messages save time during debugging and serve as documentation. 
From 3c67cd27520be1d9e2688e34083620e0552b1a95 Mon Sep 17 00:00:00 2001 From: Adrian Date: Sun, 8 Mar 2026 09:52:37 -0400 Subject: [PATCH 2/2] fix(tdd): address review feedback on skill quality and completeness - Fix internal contradiction: discipline rule #2 now scoped to behavioral changes, consistent with 'Skip TDD when' section - Fix 'one assertion' to 'one behavior per test' throughout - Add activation sequence: test runner detection, plan vs no-plan routing - Add characterization test guidance for untested codebases - Expand RED step guidance for when test passes immediately - Declare Classical/Chicago school stance, acknowledge London school - Rename 'Plan Augmentation' to 'Restructuring Plans for TDD' - Merge decomposition heuristic into plan section, remove redundancy - Remove duplicate anti-patterns link - Add 3 AI-specific anti-patterns: over-implementing on GREEN, mirror tests, skipping RED verification - Fix Ruby-biased code examples in anti-patterns for language neutrality - Add code example to 'Ignoring Test Failure Messages' entry - Add Classical school context to 'Mocking Everything' entry --- .../compound-engineering/skills/tdd/SKILL.md | 51 +++++++++--------- .../skills/tdd/references/anti-patterns.md | 54 ++++++++++++++++--- 2 files changed, 75 insertions(+), 30 deletions(-) diff --git a/plugins/compound-engineering/skills/tdd/SKILL.md b/plugins/compound-engineering/skills/tdd/SKILL.md index 395964b9..3a717b1d 100644 --- a/plugins/compound-engineering/skills/tdd/SKILL.md +++ b/plugins/compound-engineering/skills/tdd/SKILL.md @@ -11,11 +11,7 @@ Integrate red-green-refactor discipline into implementation plans and coding wor The test is the first consumer of the API. If it is awkward to test, the design is wrong. 
-- Red-green-refactor is non-negotiable -- writing code then backfilling tests is not TDD -- Each cycle targets one behavioral change and one assertion -- Refactoring happens ONLY when tests are green -- Tests describe behavior, not implementation details -- When in doubt, default to TDD +This skill follows the Classical (Chicago) school of TDD: prefer real objects over test doubles, assert on state and outcomes, mock only at system boundaries. The London (Mockist) school -- which drives design through interaction-based testing with mocks for all collaborators -- is a valid alternative but is not the approach taught here. ## When to Use @@ -31,7 +27,15 @@ The test is the first consumer of the API. If it is awkward to test, the design - Doing pure UI layout and styling work - The task is a trivial rename or one-line change -When a user mentions TDD-related keywords before or after `/ce:plan`, restructure the plan's implementation tasks to follow the red-green-refactor cycle. When TDD is not mentioned, do not apply this skill. +When TDD is not mentioned, do not apply this skill. + +## Activation + +When this skill triggers: + +1. **Detect the test runner.** Identify the project's test framework from configuration files (package.json, Gemfile, pyproject.toml, go.mod, Cargo.toml, pom.xml, etc.). Use the detected runner for all RED/GREEN verification steps. +2. **If a plan exists or is being created** (`/ce:plan`), restructure implementation tasks to follow the red-green-refactor cycle (see Restructuring Plans for TDD below). +3. **If no plan exists,** begin the first RED step for the current task. ## The Cycle @@ -41,7 +45,7 @@ For each behavioral increment in a work item: Write a test that describes the expected behavior. Run the test suite and confirm the new test fails. -If the test passes immediately, it is testing nothing new -- rewrite it or question whether the behavior already exists. +If the test passes immediately, investigate why. 
If the behavior already exists, skip this cycle and move to the next increment. If the test is not actually exercising the intended behavior, rewrite it until it fails. A test that was never red is not trustworthy. **Test naming convention:** describe behavior, not method names. @@ -71,10 +75,12 @@ Run the test suite after each change. Never refactor on red. Move to the next behavioral increment. Each cycle should take minutes, not hours. If a cycle is dragging, the increment is too large -- split it. -## Plan Augmentation +## Restructuring Plans for TDD When TDD is requested alongside `/ce:plan`, restructure each implementation task to follow the cycle. Break features into behavioral increments, each with explicit RED/GREEN/REFACTOR steps. +To decompose a feature into increments: start with the simplest happy path, then add validation and error cases, then edge cases, then integration points. Each becomes a RED-GREEN-REFACTOR cycle. Resist the urge to write all tests upfront -- discover the design incrementally. + **Standard plan task:** ```markdown @@ -116,26 +122,23 @@ Choose the starting test level based on the situation: Do not be dogmatic about one level. Start where the value is highest, add other levels as needed. +### Adding TDD to Untested Code + +When adding features to code with no test coverage, do not start TDD on the new feature immediately. First, write characterization tests -- tests that document the code's *current* behavior, whether correct or not: + +1. Run the existing code and observe what it does +2. Write tests that assert the current behavior +3. Confirm these tests pass (they should -- they describe what already exists) +4. Now begin TDD for the new feature, with characterization tests as a safety net + +Characterization tests prevent the new feature from accidentally breaking existing behavior that users depend on. + ## Discipline Rules 1. **Never skip RED.** A test that was never red never proved anything. 
The red step confirms the test can detect failure. -2. **Never commit without a corresponding test.** If the behavior changed, a test should document that change. +2. **Never commit a behavioral change without a corresponding test.** Configuration, boilerplate, and trivial renames are exempt (see "Skip TDD when" above), but any change to how the system behaves needs a test. 3. **Never refactor on red.** Get to green first, then clean up. Mixing implementation and refactoring creates confusion about what broke. 4. **Keep cycles small.** If a RED-GREEN cycle takes more than 15-20 minutes, the behavioral increment is too large. Split it. -5. **Test behavior, not implementation.** Assert on outcomes visible to the caller, not internal state. See [anti-patterns.md](./references/anti-patterns.md) for common mistakes. +5. **Test behavior, not implementation.** Assert on outcomes visible to the caller, not internal state. A single behavior may have multiple observable effects worth asserting -- "one behavior per test" not "one assertion per test." See [anti-patterns.md](./references/anti-patterns.md) for common mistakes. 6. **Run the full suite frequently.** Not just the new test -- the full suite. Catch unintended breakage early. -## Decomposing Features into TDD Increments - -The hardest part of TDD is deciding what to test first. Use this heuristic: - -1. **Start with the happy path.** What is the simplest successful case? -2. **Add validation and error cases.** What inputs should be rejected? -3. **Add edge cases.** Empty collections, boundary values, concurrent access. -4. **Add integration points.** How does this interact with other components? - -Each of these becomes a RED-GREEN-REFACTOR cycle. Resist the urge to write all tests upfront -- discover the design incrementally. - -## Anti-Patterns - -For detailed guidance on common TDD mistakes and corrections, see [anti-patterns.md](./references/anti-patterns.md). 
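The characterization-test workflow described under "Adding TDD to Untested Code" can be sketched as (pseudocode; `legacy_shipping_cost` and its outputs are illustrative values observed by running the code, not taken from a spec):

```
# Pin down what the code DOES today, even where it looks wrong
assert legacy_shipping_cost(weight: 2.0) == 7.50
assert legacy_shipping_cost(weight: 0.0) == 7.50   # surprising, but current behavior
# With these green as a safety net, begin RED-GREEN-REFACTOR for the new feature
```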
diff --git a/plugins/compound-engineering/skills/tdd/references/anti-patterns.md b/plugins/compound-engineering/skills/tdd/references/anti-patterns.md index d8f8b612..fcf9e9e4 100644 --- a/plugins/compound-engineering/skills/tdd/references/anti-patterns.md +++ b/plugins/compound-engineering/skills/tdd/references/anti-patterns.md @@ -8,15 +8,15 @@ Common mistakes that undermine test-driven development, and how to correct them. **Example (bad):** ``` -# Tests that the internal cache hash has a specific key -assert user._cache[:permissions] == [:read, :write] +# Tests internal cache state +assert user.internal_cache["permissions"] == ["read", "write"] ``` **Correction:** Assert on outcomes visible to the caller. ``` # Tests that the user has the expected permissions -assert user.can?(:read) -assert user.can?(:write) +assert user.has_permission("read") +assert user.has_permission("write") ``` **Rule of thumb:** If renaming a private method or changing an internal data structure breaks a test, the test is coupled to implementation. @@ -31,7 +31,9 @@ assert user.can?(:write) **Problem:** Every dependency is mocked, so tests pass but the system does not actually work. Tests verify that mocks return what they were told to return. -**Correction:** Mock at boundaries (external APIs, file systems, third-party services). Use real objects for internal collaborators. If a test requires more than 2-3 mocks, the code under test may have too many dependencies -- that is a design signal, not a testing problem. +**Correction (Classical/Chicago approach):** Mock at boundaries (external APIs, file systems, third-party services). Use real objects for internal collaborators. If a test requires more than 2-3 test doubles, the code under test may have too many dependencies -- that is a design signal, not a testing problem. + +**Note:** The London/Mockist school deliberately mocks all collaborators to drive interface design. That is a different philosophy, not an error. 
This skill follows the Classical approach. ## Skipping the Refactor Step @@ -97,4 +99,44 @@ end **Problem:** Tests fail with unhelpful messages like "expected true, got false" or "assertion failed." Debugging requires reading the test source. -**Correction:** Write assertion messages that explain what went wrong in business terms. Good failure messages save time during debugging and serve as documentation. +**Example (bad):** +``` +assert result +``` + +**Correction:** Include context in the assertion message. +``` +assert result, "Expected expired subscription to deny access, but access was granted" +``` + +Failure messages are documentation. When a test fails six months from now, the message should explain the business rule without reading the test body. + +## Over-Implementing on GREEN (AI-Specific) + +**Problem:** AI coding assistants tend to write the complete, optimized solution in the GREEN step instead of the minimum code to pass the test. This defeats TDD's incremental design benefit -- the design emerges from many small steps, not one large leap. + +**Example (bad):** +``` +# RED: test that fizzbuzz(3) returns "Fizz" +# GREEN: writes the entire FizzBuzz algorithm including Buzz, FizzBuzz, and edge cases +``` + +**Correction:** In the GREEN step, write only what the current failing test demands. If the test says `fizzbuzz(3)` returns `"Fizz"`, a valid GREEN implementation is `return "Fizz"`. The next test will force generalization. + +## Mirror Tests (AI-Specific) + +**Problem:** AI generates tests that are structural copies of the implementation -- the test essentially re-implements the production code and asserts they match. These tests pass by definition and catch nothing. + +**Example (bad):** +``` +# Production: total = price * quantity * (1 - discount) +# Test: assert total == price * quantity * (1 - discount) +``` + +**Correction:** Tests should use concrete values, not replicate the formula. 
Assert `calculate_total(10, 5, 0.1) == 45`, not `calculate_total(price, qty, disc) == price * qty * (1 - disc)`. + +## Skipping RED Verification (AI-Specific) + +**Problem:** AI writes the test and implementation together in one pass without verifying the test fails first. This skips the most important step -- confirming the test can detect failure. + +**Correction:** Always run the test before writing implementation code. Observe the failure. Read the failure message. Only then write the GREEN step. This is the discipline that separates TDD from "writing tests."
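The corrected sequence can be sketched as (pseudocode; names are illustrative):

```
# 1. Write only the test -- no implementation yet
test "expired subscription denies access": ...

# 2. Run it and read the failure before writing any production code
#    FAIL: has_access is not defined      <- proves the test can detect failure

# 3. GREEN: write the minimum implementation, rerun, observe the pass
# 4. Confirm it passes for the right reason, then refactor on green
```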