From 988a7eab317e3eff37198ac5a845f0e6022ad8fc Mon Sep 17 00:00:00 2001
From: Richard Hart
Date: Tue, 27 Jan 2026 10:55:38 -0800
Subject: [PATCH] Add Generated Artifact Verification to reflection framework

Add verification steps for generated artifacts before declaring work
complete. These checks are meant to catch common AI agent failures:
- Cross-references to non-existent tools/APIs
- Sensitive information in committed files (absolute paths, usernames)
- Documentation drift (stale counts, outdated references)
- Claims not verified against actual system state

Changes:
- Add Step 1.4: Dependency & Impact Verification with mandatory checks
  before recommending changes (renumbers Fact-Checking Required to 5)
- Add Step 1.6: Generated Artifact Verification with checklist and commands
- Add Dependency/Impact Gaps to Refinement Triggers (9 items, 4 of them
  for generated artifacts); renumber Architecture Violations to 5
- Add 5 new items to Self-Refine Checklist (4 artifact checks plus a
  dependency check)

Motivation: External review consistently catches issues that
self-reflection misses. These verification steps formalize what
external reviewers check.

Co-Authored-By: Claude Opus 4.5
---
 plugins/reflexion/commands/reflect.md | 68 ++++++++++++++++++++++++++-
 1 file changed, 66 insertions(+), 2 deletions(-)

diff --git a/plugins/reflexion/commands/reflect.md b/plugins/reflexion/commands/reflect.md
index 0e3b0ed..be36a1a 100644
--- a/plugins/reflexion/commands/reflect.md
+++ b/plugins/reflexion/commands/reflect.md
@@ -81,12 +81,60 @@ Before proceeding, evaluate your most recent output against these criteria:
    - [ ] Are there edge cases that haven't been considered?
    - [ ] Could there be unintended side effects?
 
-4. **Fact-Checking Required**
+4. **Dependency & Impact Verification** (CRITICAL - per ISSUE-086, DEC-096)
+   - [ ] For ANY proposed addition/deletion/modification, have you checked for dependencies?
+   - [ ] Have you searched for related decisions (DEC-###) that this change may supersede, or that may supersede it?
+   - [ ] Have you checked AUTHORITATIVE.yaml for active evaluations or status?
+   - [ ] Have you searched the ecosystem for files/processes that depend on items being changed?
+   - [ ] If recommending removal of anything, have you verified nothing depends on it?
+
+   **Mandatory Checks Before Recommending Changes:**
+   ```bash
+   # Check for active evaluations/status
+   grep -A20 "item_name" ~/dev/AUTHORITATIVE.yaml | grep -i "status\|evaluation\|active"
+
+   # Check for ecosystem dependencies
+   grep -ri --include="*.md" --include="*.yaml" "item_name" ~/dev/infrastructure/ | head -20
+
+   # Check for related/superseding decisions
+   grep -i "item_name" ~/dev/infrastructure/dev-env-docs/DECISIONS-LOG.md | head -10
+
+   # Check for dedicated project directories
+   find ~/dev/infrastructure -maxdepth 2 -type d -iname "*item_name*" 2>/dev/null
+   ```
+
+   **HARD RULE:** If ANY check reveals active dependencies, evaluations, or pending decisions, FLAG THIS IN THE EVALUATION. Do not approve work that recommends changes without dependency verification.
+
+5. **Fact-Checking Required**
    - [ ] Have you made any claims about performance? (needs verification)
    - [ ] Have you stated any technical facts? (needs source/verification)
    - [ ] Have you referenced best practices? (needs validation)
    - [ ] Have you made security assertions? (needs careful review)
 
+6. **Generated Artifact Verification** (CRITICAL for any generated code/content)
+   - [ ] **Cross-references validated**: Any references to external tools, APIs, or files verified to exist with correct names
+   - [ ] **Security scan**: Generated files checked for sensitive information (absolute paths with usernames, credentials, internal URLs)
+   - [ ] **Documentation sync**: If counts, stats, or references changed, all documentation citing them has been updated
+   - [ ] **State verification**: Claims about system state verified with actual commands, not memory
+
+   **Verification Commands (run before declaring complete):**
+   ```bash
+   # Cross-reference check: verify tool/API names exist
+   # Example for MCP tools:
+   grep -o 'mcp_[a-z0-9_]*' generated_file.py | sort -u | while IFS= read -r tool; do
+     grep -qF -- "$tool" ~/.config/claude/claude_desktop_config.json || echo "MISSING: $tool"
+   done
+
+   # Security scan: check staged files for sensitive paths (Linux, macOS, Windows)
+   git diff --cached --name-only -z --diff-filter=ACMR | xargs -0 grep -lE '/home/|/Users/|C:\\Users|%USERPROFILE%' 2>/dev/null
+
+   # Documentation sync: find docs referencing old values after changes
+   # Example: if you changed a count from 117 to 118 (-w skips matches like 1175)
+   grep -rnw "117" docs/ *.md | grep -i "count\|total\|items"
+   ```
+
+   **HARD RULE:** Do not declare work complete until verification commands confirm claims match reality.
+
 ### Step 2: Decision Point
 
 Based on the assessment above, determine:
@@ -526,7 +574,18 @@ Automatically trigger refinement if any of these conditions are met:
    - No library search for common problems
    - No consideration of existing services
 
-4. **Architecture Violations**
+4. **Dependency/Impact Gaps** (CRITICAL)
+   - Recommended deletion/removal without dependency check
+   - Cited prior decision (DEC-###) without checking for superseding decisions
+   - Proposed config changes without checking AUTHORITATIVE.yaml
+   - Modified ecosystem files without searching for dependents
+   - Any destructive action without PRE-MODIFICATION GATE checks
+   - Generated cross-references without validation against the source of truth
+   - Committed files containing absolute paths or usernames
+   - Changed counts/stats without updating referencing documentation
+   - Declared complete without running verification commands
+
+5. **Architecture Violations**
    - Business logic in controllers/views
    - Domain logic depending on infrastructure
    - Unclear boundaries between contexts
@@ -548,6 +607,11 @@ Before finalizing any output:
 - [ ] Did I search for existing libraries before writing custom code?
 - [ ] Is the architecture aligned with Clean Architecture/DDD principles?
 - [ ] Are names domain-specific rather than generic (utils/helpers)?
+- [ ] **CROSS-REFERENCE CHECK:** Did I verify every tool/API/file reference against the actual inventory instead of assuming it exists?
+- [ ] **SECURITY CHECK:** Did I scan generated files for sensitive info (paths, usernames, credentials)?
+- [ ] **DOCUMENTATION SYNC:** Did I update all docs that reference changed values?
+- [ ] **STATE VERIFICATION:** Did I verify claims with actual commands rather than memory?
+- [ ] **DEPENDENCY CHECK:** For any additions/deletions/modifications, did I verify that no active dependencies, evaluations, or superseding decisions exist?
 
 ### Reflexion Questions
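The Step 1.6 verification commands and the Step 1.4 ecosystem dependency check could also be kept as reusable shell functions, so an agent calls one helper per check and gets a nonzero exit status on failure instead of retyping pipelines. A minimal sketch, assuming bash (it uses process substitution and `read -d ''`); the function names, the regexes, and passing an explicit inventory file to the cross-reference check are hypothetical choices, not part of this patch.

```bash
#!/usr/bin/env bash
# Sketch only: every function name, pattern, and argument convention below
# is hypothetical and illustrates the checks described in reflect.md.

# Security scan: print each file containing an absolute home-directory path
# or a Windows user-profile reference. Returns 1 if any file matched.
scan_sensitive_paths() {
  local pattern='/home/|/Users/|C:\\Users|%USERPROFILE%'
  local hits=0 f
  for f in "$@"; do
    [ -f "$f" ] || continue   # skip paths that no longer exist
    if grep -Eq -- "$pattern" "$f"; then
      printf 'SENSITIVE: %s\n' "$f"
      hits=1
    fi
  done
  return "$hits"
}

# Run scan_sensitive_paths on files staged for commit (added, copied,
# modified, renamed). It reads the working-tree copies, and git prints
# staged paths relative to the repository root, so run it from there.
scan_staged() {
  local files=() f
  while IFS= read -r -d '' f; do
    files+=("$f")
  done < <(git diff --cached --name-only -z --diff-filter=ACMR)
  [ "${#files[@]}" -gt 0 ] || return 0
  scan_sensitive_paths "${files[@]}"
}

# Cross-reference check: print each identifier matching the extended regex
# PATTERN in GENERATED that does not occur as a whole word in INVENTORY
# (the source of truth). Returns 1 if anything is missing.
check_cross_refs() {
  local generated="$1" inventory="$2" pattern="$3" missing=0 ref
  while IFS= read -r ref; do
    if ! grep -qFw -- "$ref" "$inventory"; then
      printf 'MISSING: %s\n' "$ref"
      missing=1
    fi
  done < <(grep -oE -- "$pattern" "$generated" | sort -u)
  return "$missing"
}

# Documentation sync: list Markdown lines under DIR that contain OLD_VALUE
# as a whole word and also mention count, total, or items (any case).
find_stale_counts() {
  local old="$1" dir="$2"
  grep -rnw --include='*.md' -- "$old" "$dir" | grep -Ei 'count|total|items'
}

# Dependency check: list Markdown/YAML files under ROOT that mention ITEM
# (case-insensitive), as candidates for dependents of the item.
find_dependents() {
  local item="$1" root="$2"
  grep -rli --include='*.md' --include='*.yaml' -- "$item" "$root"
}
```

For example, `scan_staged || echo "remove sensitive paths before committing"` could run as a pre-commit step, and `check_cross_refs generated_file.py inventory.txt 'mcp_[a-z0-9_]+'` would compare generated tool names against `inventory.txt`, a hypothetical file listing the names the project actually provides. Returning a status code rather than only printing lets a hook or script enforce the HARD RULE mechanically.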