diff --git a/plugins/compound-engineering/.claude-plugin/plugin.json b/plugins/compound-engineering/.claude-plugin/plugin.json index e659557f..98b7e6ea 100644 --- a/plugins/compound-engineering/.claude-plugin/plugin.json +++ b/plugins/compound-engineering/.claude-plugin/plugin.json @@ -1,7 +1,7 @@ { "name": "compound-engineering", - "version": "2.38.1", - "description": "AI-powered development tools. 29 agents, 22 commands, 20 skills, 1 MCP server for code review, research, design, and workflow automation.", + "version": "2.39.0", + "description": "AI-powered development tools. 29 agents, 22 commands, 21 skills, 1 MCP server for code review, research, design, and workflow automation.", "author": { "name": "Kieran Klaassen", "email": "kieran@every.to", diff --git a/plugins/compound-engineering/CHANGELOG.md b/plugins/compound-engineering/CHANGELOG.md index e7664980..ffbcaaf6 100644 --- a/plugins/compound-engineering/CHANGELOG.md +++ b/plugins/compound-engineering/CHANGELOG.md @@ -5,6 +5,14 @@ All notable changes to the compound-engineering plugin will be documented in thi The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). +## [2.39.0] - 2026-03-08 + +### Added + +- **`tdd` skill** -- Opt-in test-driven development workflow that integrates red-green-refactor discipline into implementation plans. Triggered when the user mentions TDD-related keywords before or after `/ce:plan`. Includes the full TDD cycle (RED/GREEN/REFACTOR), plan augmentation format for restructuring tasks into behavioral increments, test granularity guidance, and a reference file of common TDD anti-patterns with corrections. Language-agnostic and strongly opinionated. 
+ +--- + ## [2.38.1] - 2026-03-01 ### Fixed diff --git a/plugins/compound-engineering/README.md b/plugins/compound-engineering/README.md index 33a4ea15..0654ce57 100644 --- a/plugins/compound-engineering/README.md +++ b/plugins/compound-engineering/README.md @@ -8,7 +8,7 @@ AI-powered development tools that get smarter with every use. Make each unit of |-----------|-------| | Agents | 29 | | Commands | 22 | -| Skills | 20 | +| Skills | 21 | | MCP Servers | 1 | ## Agents @@ -126,6 +126,7 @@ Core workflow commands use `ce:` prefix to unambiguously identify them as compou | `dspy-ruby` | Build type-safe LLM applications with DSPy.rb | | `frontend-design` | Create production-grade frontend interfaces | | `skill-creator` | Guide for creating effective Claude Code skills | +| `tdd` | Integrate red-green-refactor TDD discipline into plans and coding | ### Content & Workflow diff --git a/plugins/compound-engineering/skills/tdd/SKILL.md b/plugins/compound-engineering/skills/tdd/SKILL.md new file mode 100644 index 00000000..3a717b1d --- /dev/null +++ b/plugins/compound-engineering/skills/tdd/SKILL.md @@ -0,0 +1,144 @@ +--- +name: tdd +description: This skill should be used when implementing features using test-driven development. It integrates red-green-refactor discipline into implementation plans and coding workflows. Triggers on "TDD", "test-driven", "test first", "write tests first", "red-green-refactor", or when the user requests a test-driven approach before or after plan creation. +--- + +# Test-Driven Development + +Integrate red-green-refactor discipline into implementation plans and coding workflows. This skill provides opinionated, language-agnostic TDD process guidance -- not a testing style guide. + +## Philosophy + +The test is the first consumer of the API. If it is awkward to test, the design is wrong. 
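A minimal illustration of this principle (a hypothetical Python sketch; the function names are illustrative, not part of the skill): a function that reaches for a hidden dependency is awkward to test, while passing the dependency in makes the test a plain assertion.

```python
from datetime import datetime, timezone

# Awkward to test: the function reads the wall clock itself, so a test
# must patch datetime to control "now".
def is_expired_awkward(expires_at: datetime) -> bool:
    return expires_at < datetime.now(timezone.utc)

# Testable: "now" is an explicit parameter the test can supply.
def is_expired(expires_at: datetime, now: datetime) -> bool:
    return expires_at < now

# The test needs no mocking -- just concrete values.
now = datetime(2026, 3, 8, tzinfo=timezone.utc)
assert is_expired(datetime(2026, 3, 1, tzinfo=timezone.utc), now)
assert not is_expired(datetime(2026, 3, 9, tzinfo=timezone.utc), now)
```

The awkwardness of the first version is exactly the design feedback the test surfaces early.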
+ +This skill follows the Classical (Chicago) school of TDD: prefer real objects over test doubles, assert on state and outcomes, mock only at system boundaries. The London (Mockist) school -- which drives design through interaction-based testing with mocks for all collaborators -- is a valid alternative but is not the approach taught here. + +## When to Use + +**Apply TDD when:** +- Implementing new features or business logic +- Fixing bugs (reproduce the bug as a failing test first) +- Designing APIs or interfaces (the test reveals ergonomic issues early) +- Working with complex domain logic + +**Skip TDD when:** +- Running exploratory spikes or prototyping throwaway code +- Wiring configuration, boilerplate, or plumbing +- Doing pure UI layout and styling work +- The task is a trivial rename or one-line change + +When TDD is not mentioned, do not apply this skill. + +## Activation + +When this skill triggers: + +1. **Detect the test runner.** Identify the project's test framework from configuration files (package.json, Gemfile, pyproject.toml, go.mod, Cargo.toml, pom.xml, etc.). Use the detected runner for all RED/GREEN verification steps. +2. **If a plan exists or is being created** (`/ce:plan`), restructure implementation tasks to follow the red-green-refactor cycle (see Restructuring Plans for TDD below). +3. **If no plan exists,** begin the first RED step for the current task. + +## The Cycle + +For each behavioral increment in a work item: + +### 1. RED -- Write a Failing Test + +Write a test that describes the expected behavior. Run the test suite and confirm the new test fails. + +If the test passes immediately, investigate why. If the behavior already exists, skip this cycle and move to the next increment. If the test is not actually exercising the intended behavior, rewrite it until it fails. A test that was never red is not trustworthy. + +**Test naming convention:** describe behavior, not method names. 
+ +| Bad | Good | +|-----|------| +| `test_check_expiry` | `test_expired_subscription_denies_access` | +| `test_process` | `test_payment_creates_invoice_and_sends_receipt` | +| `test_validate` | `test_missing_email_returns_validation_error` | + +### 2. GREEN -- Make It Pass + +Write the minimum code to make the failing test pass. No cleverness. No optimization. No refactoring. Just make it green. + +Run the full test suite -- confirm all tests pass, not just the new one. + +### 3. REFACTOR -- Clean Up on Green + +With all tests passing, improve the code: +- Remove duplication +- Improve naming +- Extract abstractions that have earned their existence +- Simplify conditional logic + +Run the test suite after each change. Never refactor on red. + +### 4. Repeat + +Move to the next behavioral increment. Each cycle should take minutes, not hours. If a cycle is dragging, the increment is too large -- split it. + +## Restructuring Plans for TDD + +When TDD is requested alongside `/ce:plan`, restructure each implementation task to follow the cycle. Break features into behavioral increments, each with explicit RED/GREEN/REFACTOR steps. + +To decompose a feature into increments: start with the simplest happy path, then add validation and error cases, then edge cases, then integration points. Each becomes a RED-GREEN-REFACTOR cycle. Resist the urge to write all tests upfront -- discover the design incrementally. 
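As a concrete sketch of this ordering (hypothetical Python; `apply_discount` and both test names are illustrative, not part of the skill):

```python
# Hypothetical function under test -- the GREEN result after two cycles.
def apply_discount(total: float, percent: float) -> float:
    if percent < 0:
        raise ValueError("discount percent must be non-negative")
    return total * (1 - percent / 100)

# Increment 1 (simplest happy path): written first, seen red, made green.
def test_discount_applies_to_order_total():
    assert apply_discount(total=100, percent=10) == 90

# Increment 2 (validation case): written only after increment 1 is green.
def test_negative_discount_is_rejected():
    try:
        apply_discount(total=100, percent=-5)
        assert False, "expected ValueError for a negative discount"
    except ValueError:
        pass

test_discount_applies_to_order_total()
test_negative_discount_is_rejected()
```

Edge cases (a 100% discount, a zero total) and integration points would follow as further cycles.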
+ +**Standard plan task:** + +```markdown +- [ ] Implement user authentication + - Add User model with email/password + - Add login endpoint + - Add session management +``` + +**TDD-augmented plan task:** + +```markdown +- [ ] Implement user authentication (TDD) + - RED: Test that User model validates email presence and format + - GREEN: Add User model with email validation + - RED: Test that login endpoint returns token for valid credentials + - GREEN: Implement login endpoint with token generation + - RED: Test that login rejects invalid credentials with 401 + - GREEN: Add credential verification + - REFACTOR: Extract authentication logic if duplication emerged + - RED: Test that expired sessions are rejected + - GREEN: Add session expiry check + - REFACTOR: Clean up session management +``` + +Each RED step names the specific behavior being tested. Each GREEN step names the minimum implementation. REFACTOR steps appear when enough code has accumulated to warrant cleanup. + +## Test Granularity + +Choose the starting test level based on the situation: + +| Situation | Start With | Rationale | +|-----------|-----------|-----------| +| New domain/business logic | Unit test | Fast feedback loop, drives internal design | +| New API endpoint or feature | Integration test | Verifies the full request/response contract | +| Bug fix | Test at the level the bug manifests | Proves the fix, prevents regression | +| Refactoring existing code | Existing tests (add if missing) | Characterization tests first if no coverage exists | +| Data transformation | Unit test | Pure functions are easiest to test in isolation | + +Do not be dogmatic about one level. Start where the value is highest, add other levels as needed. + +### Adding TDD to Untested Code + +When adding features to code with no test coverage, do not start TDD on the new feature immediately. First, write characterization tests -- tests that document the code's *current* behavior, whether correct or not: + +1. 
Run the existing code and observe what it does +2. Write tests that assert the current behavior +3. Confirm these tests pass (they should -- they describe what already exists) +4. Now begin TDD for the new feature, with characterization tests as a safety net + +Characterization tests prevent the new feature from accidentally breaking existing behavior that users depend on. + +## Discipline Rules + +1. **Never skip RED.** A test that was never red never proved anything. The red step confirms the test can detect failure. +2. **Never commit a behavioral change without a corresponding test.** Configuration, boilerplate, and trivial renames are exempt (see "Skip TDD when" above), but any change to how the system behaves needs a test. +3. **Never refactor on red.** Get to green first, then clean up. Mixing implementation and refactoring creates confusion about what broke. +4. **Keep cycles small.** If a RED-GREEN cycle takes more than 15-20 minutes, the behavioral increment is too large. Split it. +5. **Test behavior, not implementation.** Assert on outcomes visible to the caller, not internal state. A single behavior may have multiple observable effects worth asserting -- "one behavior per test" not "one assertion per test." See [anti-patterns.md](./references/anti-patterns.md) for common mistakes. +6. **Run the full suite frequently.** Not just the new test -- the full suite. Catch unintended breakage early. + diff --git a/plugins/compound-engineering/skills/tdd/references/anti-patterns.md b/plugins/compound-engineering/skills/tdd/references/anti-patterns.md new file mode 100644 index 00000000..fcf9e9e4 --- /dev/null +++ b/plugins/compound-engineering/skills/tdd/references/anti-patterns.md @@ -0,0 +1,142 @@ +# TDD Anti-Patterns + +Common mistakes that undermine test-driven development, and how to correct them. 
+ +## Testing Implementation Instead of Behavior + +**Problem:** Tests are coupled to internal details -- private methods, internal state, specific data structures. Any refactoring breaks tests even when behavior is unchanged. + +**Example (bad):** +``` +# Tests internal cache state +assert user.internal_cache["permissions"] == ["read", "write"] +``` + +**Correction:** Assert on outcomes visible to the caller. +``` +# Tests that the user has the expected permissions +assert user.has_permission("read") +assert user.has_permission("write") +``` + +**Rule of thumb:** If renaming a private method or changing an internal data structure breaks a test, the test is coupled to implementation. + +## Writing All Tests First, Then All Code + +**Problem:** Writing a full test suite upfront before any implementation. This front-loads design decisions and eliminates the feedback loop that makes TDD valuable. + +**Correction:** One cycle at a time. Write one failing test, make it pass, refactor, then write the next test. Let each passing test inform the next one. + +## Mocking Everything + +**Problem:** Every dependency is mocked, so tests pass but the system does not actually work. Tests verify that mocks return what they were told to return. + +**Correction (Classical/Chicago approach):** Mock at boundaries (external APIs, file systems, third-party services). Use real objects for internal collaborators. If a test requires more than 2-3 test doubles, the code under test may have too many dependencies -- that is a design signal, not a testing problem. + +**Note:** The London/Mockist school deliberately mocks all collaborators to drive interface design. That is a different philosophy, not an error. This skill follows the Classical approach. + +## Skipping the Refactor Step + +**Problem:** RED-GREEN without REFACTOR. The code works but accumulates duplication, unclear naming, and tangled logic. Technical debt compounds with every cycle. 
+ +**Correction:** Refactoring is not optional. After each GREEN, ask: Is there duplication? Are names clear? Is there an abstraction trying to emerge? Even if the answer is "no changes needed," the pause to evaluate is part of the discipline. + +## Testing Framework Code + +**Problem:** Writing tests that verify the framework does its job -- e.g., testing that Rails validates presence when `validates :name, presence: true` is declared. + +**Correction:** Test _your_ logic, not the framework's. If the framework has a well-tested feature and the code simply declares it, trust the framework. Focus tests on business rules, edge cases, and custom behavior. + +## Gold-Plating Tests + +**Problem:** Over-specified assertions that check every detail of the response. Tests break when irrelevant fields change. + +**Example (bad):** +``` +assert response == { + id: 1, name: "Alice", email: "alice@example.com", + created_at: "2026-01-01T00:00:00Z", updated_at: "2026-01-01T00:00:00Z", + role: "admin", last_login: nil, avatar_url: nil +} +``` + +**Correction:** Assert on what matters to the behavior being tested. +``` +assert response[:name] == "Alice" +assert response[:role] == "admin" +``` + +## Slow Test Suites + +**Problem:** Tests take so long that developers stop running them frequently. The feedback loop -- the core value of TDD -- breaks down. + +**Correction:** +- Prefer unit tests (fast) over integration tests (slower) for logic-heavy code +- Use integration tests strategically for critical paths, not for every code path +- Avoid unnecessary database hits in unit tests +- If the suite takes more than 30 seconds locally, investigate what is slow + +## The Test-After Trap + +**Problem:** Writing implementation first, then retrofitting tests. The tests end up verifying the implementation rather than specifying behavior. Tests shaped by existing code cannot drive design. + +**Correction:** Commit to writing the test first, even when the implementation seems obvious. 
The discipline of writing the test first often reveals edge cases and API awkwardness that would otherwise be missed. + +## Testing Too Many Things at Once + +**Problem:** A single test verifies multiple behaviors. When it fails, it is unclear which behavior broke. + +**Example (bad):** +``` +test "user registration" do + # Creates user, sends email, logs event, redirects -- all in one test +end +``` + +**Correction:** One behavior per test. Split into: "registration creates user," "registration sends welcome email," "registration logs signup event." Each test is a sentence that describes one thing. + +## Ignoring Test Failure Messages + +**Problem:** Tests fail with unhelpful messages like "expected true, got false" or "assertion failed." Debugging requires reading the test source. + +**Example (bad):** +``` +assert result +``` + +**Correction:** Include context in the assertion message. +``` +assert result, "Expected expired subscription to deny access, but access was granted" +``` + +Failure messages are documentation. When a test fails six months from now, the message should explain the business rule without reading the test body. + +## Over-Implementing on GREEN (AI-Specific) + +**Problem:** AI coding assistants tend to write the complete, optimized solution in the GREEN step instead of the minimum code to pass the test. This defeats TDD's incremental design benefit -- the design emerges from many small steps, not one large leap. + +**Example (bad):** +``` +# RED: test that fizzbuzz(3) returns "Fizz" +# GREEN: writes the entire FizzBuzz algorithm including Buzz, FizzBuzz, and edge cases +``` + +**Correction:** In the GREEN step, write only what the current failing test demands. If the test says `fizzbuzz(3)` returns `"Fizz"`, a valid GREEN implementation is `return "Fizz"`. The next test will force generalization. 
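A sketch of how the next test forces that generalization (hypothetical Python, continuing the FizzBuzz example above):

```python
# Cycle 1 -- GREEN for "fizzbuzz(3) returns 'Fizz'": a hard-coded
# constant is a valid minimum implementation.
def fizzbuzz_v1(n: int) -> str:
    return "Fizz"

assert fizzbuzz_v1(3) == "Fizz"

# Cycle 2 -- RED: "fizzbuzz(5) returns 'Buzz'" fails against v1,
# which forces the first real generalization.
def fizzbuzz_v2(n: int) -> str:
    if n % 5 == 0:
        return "Buzz"
    return "Fizz"

assert fizzbuzz_v2(3) == "Fizz"
assert fizzbuzz_v2(5) == "Buzz"
```

Each new test earns exactly the generality it demands, and no more.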
+ +## Mirror Tests (AI-Specific) + +**Problem:** AI generates tests that are structural copies of the implementation -- the test essentially re-implements the production code and asserts they match. These tests pass by definition and catch nothing. + +**Example (bad):** +``` +# Production: total = price * quantity * (1 - discount) +# Test: assert total == price * quantity * (1 - discount) +``` + +**Correction:** Tests should use concrete values, not replicate the formula. Assert `calculate_total(10, 5, 0.1) == 45`, not `calculate_total(price, qty, disc) == price * qty * (1 - disc)`. + +## Skipping RED Verification (AI-Specific) + +**Problem:** AI writes the test and implementation together in one pass without verifying the test fails first. This skips the most important step -- confirming the test can detect failure. + +**Correction:** Always run the test before writing implementation code. Observe the failure. Read the failure message. Only then write the GREEN step. This is the discipline that separates TDD from "writing tests."