@rhart696 rhart696 commented Jan 27, 2026

What

Add "Generated Artifact Verification" step to the reflection framework, catching common AI agent failures before work is declared complete.

Why

External code review consistently catches issues that self-reflection misses:

| Issue Type | Example | Root Cause |
| --- | --- | --- |
| Invalid cross-references | query_decision instead of query_decisions | Assumed tool names without verification |
| Leaked sensitive info | /home/username/... in committed files | No security scan of generated output |
| Documentation drift | "117 items" when actual count is 118 | Changed data without updating docs |
| Unverified claims | "This is configured" without checking | Relied on memory, not actual state |
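
As an illustration of the first row, here is a minimal sketch of a cross-reference check. It is not taken from the PR: the function name `check_tool_refs`, the `query_` prefix, and the idea of a plain-text list of known tool names (one per line) are all assumptions made for the example.

```shell
# Hypothetical sketch: flag tool names that a document references but that are
# missing from a list of known tools (one name per line). Not from the PR.

# check_tool_refs DOC KNOWN_TOOLS
# Prints one line per unknown reference; returns 1 if any were found.
check_tool_refs() {
  # -o prints only the matched identifiers; sort -u removes duplicates.
  # grep -vxFf keeps names that equal no whole line of KNOWN_TOOLS.
  refs_unknown=$(grep -oE 'query_[a-z_]+' "$1" | sort -u | grep -vxFf "$2" || true)
  if [ -n "$refs_unknown" ]; then
    printf '%s\n' "$refs_unknown" | sed 's/^/unknown tool reference: /'
    return 1
  fi
}
```

With a known-tools file containing `query_decisions`, a document that mentions `query_decision` would be reported, because the match is on the whole name rather than a prefix.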

Changes

  1. New Step 1.6: Generated Artifact Verification checklist with verification commands
  2. Updated Refinement Triggers: Added 4 generated artifact failure patterns
  3. Updated Self-Refine Checklist: Added 4 verification items (CROSS-REFERENCE CHECK, SECURITY CHECK, DOCUMENTATION SYNC, STATE VERIFICATION)

Token Impact

Adds ~400 tokens to ~6500 existing (~6% increase).
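
The percentage follows directly from the two estimates above. A small sketch of the arithmetic, using a hypothetical helper name (`pct_increase`) and the PR's approximate figures:

```shell
# pct_increase ADDED EXISTING
# Prints ADDED as a percentage of EXISTING, rounded to one decimal place.
pct_increase() {
  awk -v added="$1" -v existing="$2" \
    'BEGIN { printf "%.1f%%\n", added / existing * 100 }'
}

pct_increase 400 6500   # prints 6.2%
```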

Testing Status

Motivation source: Real-world scenario where external review (Codex GPT-5.2) caught 4 issues that self-reflection (Claude + /reflexion:reflect at 3.5/5 score) missed.

Testing recommendation: Apply changes locally with --plugin-dir and verify new checklist items appear during reflection.

Checklist

  • Follows plugin structure
  • Minimal token footprint
  • Focused on quality - addresses a specific, documented failure mode
  • Uses MUST/SHOULD tags appropriately (HARD RULE for critical verification)

Note

Introduces critical verification gates to the reflection checklist in plugins/reflexion/commands/reflect.md.

  • Adds Dependency & Impact Verification step with mandatory grep/find checks against AUTHORITATIVE.yaml, decisions logs, and ecosystem dependents; includes a hard rule to flag active dependencies
  • Adds Generated Artifact Verification step with cross-reference validation, security scan for sensitive paths, documentation sync, and state verification, plus runnable commands
  • Updates Refinement Triggers with a new "Dependency/Impact Gaps" category; shifts "Architecture Violations" to follow it
  • Expands Final Self-Refine Checklist with new verification items (cross-reference, security, doc sync, state verification, dependency check)

Written by Cursor Bugbot for commit 988a7ea.

Add verification steps for generated artifacts before declaring work complete.
This catches common AI agent failures:
- Cross-references to non-existent tools/APIs
- Sensitive information in committed files (absolute paths, usernames)
- Documentation drift (stale counts, outdated references)
- Claims not verified against actual system state

Changes:
- Add Step 1.6: Generated Artifact Verification with checklist and commands
- Add 4 new items to Refinement Triggers (Dependency/Impact Gaps)
- Add 4 new items to Self-Refine Checklist

Motivation: External review consistently catches issues that self-reflection
misses. These verification steps formalize what external reviewers check.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings January 27, 2026 18:56
Copilot AI left a comment

Pull request overview

This PR extends the reflection framework to explicitly verify generated artifacts (code/content) and detect common AI agent failure modes before work is marked complete.

Changes:

  • Add a new Dependency & Impact Verification section with mandatory CLI checks for dependencies, decisions, and ecosystem impact.
  • Introduce a Generated Artifact Verification checklist plus concrete verification commands for cross-references, security scanning, and documentation sync.
  • Expand refinement triggers and the final self-refine checklist to cover dependency/impact gaps and generated artifact verification requirements.
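
To make the documentation-sync idea concrete, here is a minimal sketch that compares a count stated in a document with the actual count. The helper name `check_count_claim` and the "<number> <label>" phrasing it searches for are assumptions for illustration, not the PR's own command.

```shell
# Hypothetical sketch of a documentation-sync check; not taken from the PR.

# check_count_claim DOC LABEL ACTUAL
# Finds the first "<number> LABEL" phrase in DOC and compares it with ACTUAL.
# Returns 0 when they agree, 1 when they differ or no such phrase exists.
check_count_claim() {
  claimed=$(grep -oE "[0-9]+ $2" "$1" | head -n 1 | grep -oE '^[0-9]+' || true)
  if [ -z "$claimed" ]; then
    printf 'no "%s" count found in %s\n' "$2" "$1"
    return 1
  fi
  if [ "$claimed" -ne "$3" ]; then
    printf 'doc says %s %s, actual is %s\n' "$claimed" "$2" "$3"
    return 1
  fi
}
```

For the "117 items" case from the PR description, calling it with an actual count of 118 would report the mismatch and return a non-zero status.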

done

# Security scan: check staged files for sensitive paths (Linux, macOS, Windows)
git diff --cached --name-only | xargs grep -l '/home/\|/Users/\|C:\\Users\|%USERPROFILE%' 2>/dev/null
Copilot AI Jan 27, 2026

The git diff --cached --name-only | xargs grep -l ... pipeline does not behave cleanly when there are no staged files: GNU xargs still runs grep once with an empty file list, so grep has no file operands, scans empty standard input, and exits with a non-zero status. An empty staging area is a common, legitimate case, and this conflicts with the "HARD RULE" expectation that these commands can be run cleanly as a gate. Consider making the command no-op-safe for an empty file list (for example by using an option like xargs -r, which GNU xargs supports, or by guarding the grep invocation) so that the verification step doesn't fail spuriously when there's nothing to scan.

Suggested change
git diff --cached --name-only | xargs grep -l '/home/\|/Users/\|C:\\Users\|%USERPROFILE%' 2>/dev/null
git diff --cached --name-only | xargs -r grep -l '/home/\|/Users/\|C:\\Users\|%USERPROFILE%' 2>/dev/null
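
A guarded variant might look like the sketch below. It is one possible approach under stated assumptions, not the reviewed change itself: the function name `scan_staged_for_paths` is hypothetical, it uses NUL-separated names so paths with spaces survive, and it passes /dev/null as an extra operand so grep never falls back to standard input when the list is empty.

```shell
# Hypothetical sketch of an empty-list-safe security scan. Run it from the
# repository top level, since git prints paths relative to that directory.
# It greps the working-tree copies of staged files, not the staged blobs.

# scan_staged_for_paths
# Prints staged files that contain a home-directory path and returns 1.
# Prints nothing and returns 0 when nothing is staged or nothing matches.
scan_staged_for_paths() {
  scan_pattern='/home/|/Users/|C:\\Users|%USERPROFILE%'
  # --diff-filter=ACMR skips deleted files, which grep could not open.
  scan_hits=$(git diff --cached --name-only -z --diff-filter=ACMR |
    xargs -0 grep -lE -- "$scan_pattern" /dev/null 2>/dev/null || true)
  if [ -n "$scan_hits" ]; then
    printf '%s\n' "$scan_hits"
    return 1
  fi
}
```

Returning non-zero only when a match is found keeps the gate's meaning unambiguous: success means nothing suspicious was staged, including the case where nothing was staged at all.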

- Cited prior decision (DEC-###) without checking for superseding decisions
- Proposed config changes without checking AUTHORITATIVE.yaml
- Modified ecosystem files without searching for dependents
- Any destructive action without PRE-MODIFICATION GATE checks
Copilot AI Jan 27, 2026

The reference to PRE-MODIFICATION GATE here introduces a critical-sounding check without explaining what that gate is or where its checks are defined, which can make it hard for readers to understand how to comply with this trigger. Consider either defining the PRE-MODIFICATION GATE earlier in this document or linking to the section/file where its checklist and process are described, so evaluators have concrete guidance on what is expected.

Suggested change
- Any destructive action without PRE-MODIFICATION GATE checks
- Any destructive action without passing the PRE-MODIFICATION GATE (a pre-change safety checklist covering dependencies, backups, approvals, and rollback)
