
feat: Implement user story prompt validation [Fixes #346] #484

Open
beknobloch wants to merge 3 commits into promptdriven:main from beknobloch:user_stories

Conversation

@beknobloch (Contributor) commented:

Implements user story validation and fix workflows for prompt development.

Summary

  • Added pdd story-test to validate prompts against user stories (story__*.md).
  • Added core user story utilities:
    • run_user_story_tests to validate story compliance via detect.
    • run_user_story_fix to apply story-driven prompt updates and re-validate.
  • Integrated automatic user story validation into pdd change after prompt modifications.
  • Extended pdd fix with user story mode (pdd fix user_stories/story__*.md).
  • Added user story template and README documentation for setup and usage.
  • Added test coverage for command wiring and user story validation/fix behavior.
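The discovery step described above can be sketched roughly as follows. This is a minimal illustration assuming the layout named in this PR (a user_stories/ directory of story__*.md files plus a story__template.md starter), not the actual contents of pdd/user_story_tests.py:

```python
from pathlib import Path

def discover_story_files(stories_dir: str = "user_stories") -> list[Path]:
    """Collect story__*.md files, skipping the starter template."""
    return sorted(
        p for p in Path(stories_dir).glob("story__*.md")
        if p.name != "story__template.md"
    )
```

Each discovered story would then be validated against the prompt files via detect, as the bullets above describe.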

New files

  • pdd/user_story_tests.py: Contains logic for user story validation and fixing.
  • error_log.txt: Logs for pytest output and validation attempts.
  • user_stories/story__template.md: Template for creating user stories.
  • tests/test_user_story_tests.py: Unit tests for user story functionality.

Test Results

  • Unit tests: PASS
  • Regression tests: PASS
  • Sync regression: PASS
  • Test coverage: 79% for tests/test_user_story_tests.py and 86% for tests/test_change_main.py

Fixes #346

- Added user story validation feature to ensure prompt changes align with user stories.
- Introduced `story-test` command for validating prompt changes against user stories.
- Implemented `run_user_story_tests` and `run_user_story_fix` functions for handling user story tests and fixes.
- Updated `README.md` to include documentation for new user story features and commands.
- Added tests for user story validation and fix functionality to ensure reliability.

- Introduced multiple tests for the `change_main` function to validate input handling, including requirements for change prompts and input codes.
- Added tests to ensure proper error messages for CSV-related issues, such as missing files, empty headers, and incorrect input formats.
- Implemented checks for handling exceptions during CSV processing and ensured appropriate responses for invalid inputs.
- Enhanced test coverage for both CSV and non-CSV scenarios to improve reliability and robustness of the functionality.
@gltanaka requested a review from Copilot on February 10, 2026 at 23:34.
Copilot AI (Contributor) left a comment:

Pull request overview

Adds a user-story driven prompt validation and fix workflow, including a new CLI command and automatic validation after prompt changes.

Changes:

  • Introduces pdd story-test to validate prompts against user_stories/story__*.md.
  • Adds pdd/user_story_tests.py utilities to discover story/prompt files, validate via detect_change, and apply story-driven fixes.
  • Integrates optional user story validation into pdd change and adds a user-story mode to pdd fix.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
user_stories/story__template.md Adds a starter template for writing user stories.
pdd/user_story_tests.py Implements discovery, validation, and fix flows for user story testing.
pdd/change_main.py Runs user story validation after prompt modifications and can fail the command on story failures.
pdd/commands/analysis.py Adds story-test Click command wiring to run story validation from CLI.
pdd/commands/init.py Registers the new story-test command.
pdd/commands/fix.py Adds a “user story fix mode” that runs story-driven fixes for a single story__*.md.
tests/test_user_story_tests.py Adds unit coverage for story discovery, validation, and fix behavior.
tests/test_change_main.py Adds tests for change flow interactions with user story validation (including skip / CSV behaviors).
tests/commands/test_analysis.py Adds CLI wiring tests for story-test.
tests/commands/test_fix.py Adds CLI wiring test for user story fix mode.
README.md Documents story-test, validation defaults/overrides, and fix mode usage.


Comment on lines +329 to +331
patch.object(Path, 'mkdir') as mock_mkdir, \
patch("pdd.change_main.run_user_story_tests") as mock_story_tests: # Mock user story validation
mock_story_tests.return_value = (True, [], 0.0, "")

Copilot AI Feb 10, 2026


This with ... as mock_story_tests: block is followed by a mis-indented statement (mock_story_tests.return_value = ...). As written, Python will raise an IndentationError because there is no new block opened between line 330 and 331. Align line 331 with the rest of the with block body (same indentation level as result = change_main(...)).
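The corrected shape can be sketched with stand-in patch targets (os functions here, since pdd.change_main is not importable in isolation): every statement in the body sits one level below the `with`, at a single shared indentation level.

```python
import os
from unittest.mock import patch

def run_with_mocks():
    # Both context managers belong to one `with` statement; the body
    # lines below share one indentation level, so no IndentationError.
    with patch("os.getcwd") as mock_cwd, \
         patch("os.path.exists") as mock_exists:
        mock_cwd.return_value = "/tmp"
        mock_exists.return_value = True
        return os.getcwd(), os.path.exists("anything")
```

In the test under review, the `mock_story_tests.return_value = (True, [], 0.0, "")` line would be aligned the same way, alongside the `result = change_main(...)` call.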

Comment on lines +247 to +249
changed_files.append(str(prompt_path))
if result_message.startswith("[bold red]Error"):
    errors.append(result_message)

Copilot AI Feb 10, 2026


Error detection here is unreliable: change_main() often returns plain-text error messages (e.g., \"Error during prompt modification: ...\") that do not start with \"[bold red]Error\", so failures can be silently treated as success. Also, changed_files.append(...) happens even when change_main() fails. A concrete fix is to make change_main() return an explicit success flag (or raise on failure) and use that to (1) decide whether to append to changed_files and (2) collect errors consistently. If changing change_main()'s return type is too invasive, standardize change_main() error messages so they all share a consistent, machine-detectable prefix and check for that prefix here before appending changed_files.

Suggested change
-    changed_files.append(str(prompt_path))
-    if result_message.startswith("[bold red]Error"):
-        errors.append(result_message)
+    # Treat both Rich-styled and plain-text "Error" prefixes as failures.
+    is_error = isinstance(result_message, str) and (
+        result_message.startswith("[bold red]Error")
+        or result_message.startswith("Error")
+    )
+    if is_error:
+        errors.append(result_message)
+    else:
+        changed_files.append(str(prompt_path))

strength=strength,
temperature=temperature,
time=time_budget,
verbose=not quiet,

Copilot AI Feb 10, 2026


The validation call sets verbose=not quiet, which conflates verbosity with quiet mode and ignores the existing ctx.obj.get('verbose', False) option used elsewhere. This changes behavior (validation becomes verbose whenever quiet is false) and makes verbosity inconsistent across commands. Consider passing the actual configured verbose flag (e.g., ctx.obj.get('verbose', False)) and keeping quiet solely responsible for suppressing output.

Suggested change
-    verbose=not quiet,
+    verbose=ctx.obj.get("verbose", False),
Comment on lines +525 to +533
merged: List[Path] = []
seen = set()
for pf in override_prompts + base_prompts:
    key = pf.name.lower()
    if key in seen:
        continue
    merged.append(pf)
    seen.add(key)
validation_prompt_files = merged

Copilot AI Feb 10, 2026


De-duplicating by pf.name.lower() can drop distinct prompt files that share the same basename in different subdirectories. That can cause user story validation to miss prompts (false passes) depending on repository layout. Use a uniqueness key based on full normalized path (e.g., str(pf.resolve()).lower()) or avoid de-duplication entirely and let detect_change see all prompt files.
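A minimal sketch of the fix the comment proposes, keying uniqueness on the resolved path rather than the basename (the function wrapper is hypothetical; the PR's real code assigns to validation_prompt_files inline):

```python
from pathlib import Path
from typing import List

def merge_prompts(override_prompts: List[Path], base_prompts: List[Path]) -> List[Path]:
    """Merge override and base prompt lists, overrides first.

    Keys on the full resolved path so same-named files in different
    directories are both kept; true duplicates still collapse, with
    the override copy winning because it is seen first.
    """
    merged: List[Path] = []
    seen = set()
    for pf in override_prompts + base_prompts:
        key = str(pf.resolve()).lower()
        if key in seen:
            continue
        merged.append(pf)
        seen.add(key)
    return merged
```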

@gltanaka self-requested a review on February 11, 2026 at 01:08.
@gltanaka (Contributor) left a comment:

Hey @beknobloch, thanks for putting this together! Really like the approach of reusing detect as the validation mechanism — it's a clean design that avoids reinventing anything. The recursion prevention with skip_user_stories is well thought out too.

A few suggestions:

Design question — command naming:
The existing CLI commands are mostly single words (detect, fix, change, trace). Since story-test is essentially running detect in batch mode against story files, would it make sense to add this as a flag on detect instead (e.g., pdd detect --stories)? That would keep the command surface smaller and make the relationship to detect more obvious. Open to discussing this though.

Code suggestions:

  1. Fragile error detection in run_user_story_fix (user_story_tests.py ~line 248): result_message.startswith("[bold red]Error") is coupled to Rich markup formatting. If the formatting in change_main ever changes, this silently breaks. Could we use a more reliable signal from the return value?
  2. Hardcoded src/ in _prompt_to_code_path (~line 96): code_dir = prompts_dir.parent / "src" assumes the code directory is always ../src relative to prompts. Since the stories and prompts dirs both have env var overrides, it would be nice to have the same flexibility here (e.g., PDD_SRC_DIR).
  3. Docstrings and logging: The helper functions in user_story_tests.py are missing docstrings (the project style guide requires them), and the module doesn't set up logger = logging.getLogger(__name__) like other modules do. Adding both would bring it in line with the rest of the codebase.
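Points 2 and 3 above could be sketched at the top of user_story_tests.py roughly like this. The PDD_SRC_DIR variable name is the reviewer's suggestion, and the helper name here is hypothetical:

```python
import logging
import os
from pathlib import Path

# Module-level logger, matching the convention used elsewhere in the codebase.
logger = logging.getLogger(__name__)

def code_dir_for(prompts_dir: Path) -> Path:
    """Return the code directory for a prompts directory.

    Honors a PDD_SRC_DIR environment override, falling back to the
    current ../src layout relative to the prompts directory.
    """
    override = os.environ.get("PDD_SRC_DIR")
    return Path(override) if override else prompts_dir.parent / "src"
```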

Minor nits:

  • The setattr(ctx, "obj", ctx_obj) calls in run_user_story_fix are likely redundant since ctx.obj is already a dict reference that's mutated in place.

Overall this is solid work — the test coverage on the core validation path is good, and the integration into change and fix is minimally invasive. Nice job! 🎉

Development

Successfully merging this pull request may close these issues.

Using user stories as unit tests for prompts
