
Review and suggest solution for issue #19 (#20)

Open
lene wants to merge 1 commit into master from claude/issue-19-solution-01RSkK7pKtYreK2AEFGuQMNc

Conversation

@lene (Owner) commented Nov 16, 2025

When a file is moved and a new file takes its place at the same path, the hash cache would incorrectly reuse the old hash, leading to incorrect duplicate detection and potential data loss with delete actions.

Changes:

  • Use composite cache keys (path, mtime, size) instead of path alone
  • Automatically clean up old entries when a file at the same path is updated
  • Validate file modification time and size before using cached hash
  • Automatic migration from old cache format with backward compatibility
  • Add test case for file replacement scenario
  • Bump version to 0.11.11
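The composite-key scheme in the changes above can be sketched as follows. This is a minimal illustration, not code from the PR itself; the class and method names (`HashCache`, `get`, `add`) are assumptions based on the description:

```python
from pathlib import Path
from typing import Dict, Optional, Tuple

# (path, mtime, size) -> hash; a stale entry can no longer be looked up
# once the file at that path changes, because its key no longer matches.
CacheKey = Tuple[str, float, int]


class HashCache:
    """Illustrative composite-key hash cache (names are hypothetical)."""

    def __init__(self) -> None:
        self._cache: Dict[CacheKey, str] = {}

    @staticmethod
    def _key(path: Path) -> CacheKey:
        stat = path.stat()
        return (str(path), stat.st_mtime, stat.st_size)

    def get(self, path: Path) -> Optional[str]:
        # Only return a cached hash if the file's current mtime and size
        # still match the entry; otherwise behave as a cache miss.
        return self._cache.get(self._key(path))

    def add(self, path: Path, file_hash: str) -> None:
        # Remove any stale entries for the same path before inserting,
        # so a replaced file never resurrects the old hash.
        self._cache = {
            k: v for k, v in self._cache.items() if k[0] != str(path)
        }
        self._cache[self._key(path)] = file_hash
```

Because `get()` rebuilds the key from the file's current metadata, a file that was replaced in place simply misses the cache and gets rehashed, which is the safe behavior for delete actions.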

Technical implementation:

  • Cache type changed from Dict[Path, Hash] to Dict[Tuple[Path, float, int], Hash]
  • On get(): construct key from current file stats, return None if no match
  • On add(): remove any existing entries for the path, add new composite key
  • Migration: old-format entries are assigned dummy metadata (0.0, 0) and recalculated on first access
  • Both JSON and Pickle formats supported with automatic format detection
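The migration step described above could look roughly like this. A hedged sketch only: the function name and type aliases are invented for illustration, and the dummy metadata `(0.0, 0)` is taken from the list above:

```python
from typing import Dict, Tuple, Union

OldCache = Dict[str, str]                     # path -> hash (old format)
NewCache = Dict[Tuple[str, float, int], str]  # (path, mtime, size) -> hash


def migrate(cache: Union[OldCache, NewCache]) -> NewCache:
    """Upgrade an old flat cache to the composite-key format.

    Old entries receive dummy metadata (0.0, 0). Since no real file will
    stat to that mtime/size pair, lookups miss and the hash is
    recalculated (and re-cached with real metadata) on first access.
    """
    migrated: NewCache = {}
    for key, value in cache.items():
        if isinstance(key, tuple) and len(key) == 3:
            migrated[key] = value                  # already new format
        else:
            migrated[(str(key), 0.0, 0)] = value   # dummy metadata
    return migrated
```

Checking the key shape per entry also makes the function idempotent, so loading an already-migrated cache is harmless.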

This fix prevents dangerous false positives that could lead to accidental data deletion while maintaining performance and cache efficiency.

