feat: warn on misspelled language suffix in prompt filenames #491

Draft
Serhan-Asad wants to merge 2 commits into promptdriven:main from Serhan-Asad:fix/issue-451
@Serhan-Asad

Summary

  • Fixes default_language from .pddrc being ignored during language detection
  • Adds fuzzy matching to detect misspelled language suffixes in prompt filenames (e.g., new_typscript.prompt) and warns the user before falling back to default_language, rather than falling back silently
  • Uses Levenshtein distance ≤ 2 with minimum token length of 4 to avoid false positives on short language names (r, d, go)
  • Refactors _is_known_language to share the language set via _get_known_languages helper (no duplication)
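The fuzzy-matching rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function names omit the leading underscore to mark them as stand-ins, KNOWN_LANGUAGES is a made-up subset, and applying the length-4 minimum to the filename token (rather than to the candidate language) is an assumption.

```python
from typing import Iterable, Optional

# Illustrative subset only (assumption); the PR obtains the real set
# from its _get_known_languages helper.
KNOWN_LANGUAGES = {"python", "typescript", "javascript", "rust", "go", "r", "d"}


def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def closest_known_language(
    token: str,
    known: Iterable[str] = KNOWN_LANGUAGES,
    max_distance: int = 2,
    min_length: int = 4,
) -> Optional[str]:
    """Return the nearest known language within max_distance, else None."""
    token = token.lower()
    if len(token) < min_length:
        return None  # short tokens are skipped to avoid false positives
    best: Optional[str] = None
    best_distance = max_distance + 1
    for lang in sorted(known):  # sorted so ties resolve deterministically
        distance = levenshtein_distance(token, lang)
        if distance < best_distance:
            best, best_distance = lang, distance
    return best
```

With this sketch, closest_known_language("typscript") yields "typescript" (one missing letter), while a short token such as "go" is never fuzzy-matched.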

Fixes #451

Test plan

  • 23 new tests covering _get_known_languages, _levenshtein_distance, _closest_known_language, and end-to-end warning behavior
  • All 90 construct_paths tests pass
  • Full test suite passes (773 passed, 7 skipped, 1 unrelated pre-existing failure)

…riven#451)

Adds fuzzy matching (Levenshtein distance ≤ 2, token length ≥ 4) to detect
misspelled language suffixes in prompt filenames and warn the user before
falling back to default_language. Refactors _is_known_language to share
the language set via _get_known_languages.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

This PR enhances language detection in prompt filenames by adding fuzzy matching to detect and warn about misspelled language suffixes (e.g., "typscript" instead of "typescript"). Additionally, it fixes a bug where the default_language from .pddrc was being ignored during language detection.

Changes:

  • Refactored language validation to use a shared _get_known_languages() helper to eliminate code duplication
  • Added Levenshtein distance calculation and fuzzy matching to detect misspelled language suffixes with a threshold of ≤2 edits
  • Implemented fallback to default_language from .pddrc when no language can be determined from other sources
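A rough sketch of the fallback order described in the last bullet, under stated assumptions: the helper names, the explicit_language parameter, and the rule that the language is the token after the last underscore in the filename are all hypothetical, and the real logic in pdd/construct_paths.py may consult other sources.

```python
from pathlib import Path
from typing import Optional, Set

# Illustrative subset only (assumption).
KNOWN = {"python", "typescript", "javascript"}


def language_from_filename(prompt_path: str, known: Set[str]) -> Optional[str]:
    """Return the filename suffix if it is a known language, else None."""
    stem = Path(prompt_path).stem  # "new_typescript" for "new_typescript.prompt"
    if "_" not in stem:
        return None
    suffix = stem.rsplit("_", 1)[1].lower()
    return suffix if suffix in known else None


def resolve_language(
    prompt_path: str,
    explicit_language: Optional[str] = None,
    default_language: Optional[str] = None,  # e.g. the value read from .pddrc
    known: Set[str] = KNOWN,
) -> str:
    """Try explicit value, then filename suffix, then the configured default."""
    if explicit_language:
        return explicit_language.lower()
    detected = language_from_filename(prompt_path, known)
    if detected:
        return detected
    if default_language:
        return default_language.lower()
    raise ValueError(f"Could not determine language for {prompt_path!r}")
```

In this sketch the default is consulted only after every other source has failed, so a valid filename suffix still wins over the configured default.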

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File | Description
pdd/construct_paths.py | Refactored _is_known_language to use _get_known_languages(), added _levenshtein_distance and _closest_known_language functions for fuzzy matching, and integrated misspelling warnings with default language fallback
tests/test_construct_paths.py | Added 23 new tests covering language detection helpers, fuzzy matching behavior, and end-to-end warning scenarios; patched _find_pddrc_file in existing tests to prevent configuration interference

  'nim', 'ocaml', 'groovy', 'coffeescript', 'fish', 'zsh',
- 'prisma', 'lean', 'agda',
- # Frontend / templating
+ 'prisma', 'lean', 'agda', 'lisp', 'scheme', 'ada',
Copilot AI Feb 11, 2026

The new languages 'lisp', 'scheme', and 'ada' were added on the same line as 'prisma', 'lean', and 'agda', breaking the logical grouping pattern. Based on the removed comment, these appear to be different categories. Consider moving them to a separate line or adding a comment to clarify their grouping.

Suggested change
- 'prisma', 'lean', 'agda', 'lisp', 'scheme', 'ada',
+ 'prisma', 'lean', 'agda',
+ 'lisp', 'scheme', 'ada',

Comment on lines 3174 to 3176
# Exact matches are handled by _is_known_language, not this function
# But if called, distance is 0 which is <= 2, so it returns the match
assert _closest_known_language("typescript") == "typescript"
Copilot AI Feb 11, 2026

This test assumes that exact matches return the language itself, but the comment suggests this case should be handled by _is_known_language instead. Consider testing the actual expected behavior where _closest_known_language is only called for non-exact matches, or explicitly document that returning exact matches is intentional fallback behavior.
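One way to test the first option, where the fuzzy matcher is only reached for non-exact tokens, is sketched below. The classify_suffix wrapper and the recording stub are hypothetical and exist only for illustration; the PR's actual call site in construct_paths may be structured differently.

```python
from typing import Callable, List, Optional, Set, Tuple


def classify_suffix(
    token: str,
    known: Set[str],
    closest: Callable[[str], Optional[str]],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (language, suggestion); exact matches skip the fuzzy matcher."""
    token = token.lower()
    if token in known:
        return token, None
    return None, closest(token)


def test_exact_match_never_reaches_fuzzy_matcher() -> None:
    calls: List[str] = []

    def recording_closest(token: str) -> Optional[str]:
        # Stub standing in for the fuzzy matcher; records each call.
        calls.append(token)
        return "typescript"

    known = {"typescript", "python"}

    # Exact match: resolved directly, matcher untouched.
    assert classify_suffix("typescript", known, recording_closest) == ("typescript", None)
    assert calls == []

    # Misspelling: no language, but the matcher supplies a suggestion.
    assert classify_suffix("typscript", known, recording_closest) == (None, "typescript")
    assert calls == ["typscript"]
```

A test shaped like this pins down the contract at the call site, so the distance-0 behavior of the matcher itself can remain an undocumented detail.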

Serhan-Asad marked this pull request as draft February 11, 2026 22:24
Move Levenshtein distance and closest language matching to a future
feature PR. Keep _get_known_languages refactor and default_language
fallback. Also fix language grouping per Copilot review.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Serhan-Asad added a commit to Serhan-Asad/pdd that referenced this pull request Feb 13, 2026
Duplicate of PR promptdriven#491 for testing purposes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

Language detection ignores .pddrc default_language setting

1 participant