@ali90h commented Sep 13, 2025

Summary

This PR implements the enhanced scan functionality as described in issue #110, adding hierarchical scanning capabilities with depth control, pattern-based filtering, and gitignore integration.

🚀 New Features

Hierarchical Scanning

  • `--depth N`: Controls scan depth (0=root only, unlimited by default)
  • Recursive scanning: Uses `pathlib.Path.rglob` for efficient file discovery
  • Depth filtering: Files are filtered by the number of directory levels between the root and the file; the filename itself is not counted
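
The depth rule above can be sketched as follows. This is a minimal illustration, not the code merged in this PR: `iter_files` is a hypothetical helper name, and the only detail taken from the description is that discovery uses `pathlib.Path.rglob` and that depth counts directories, not the filename.

```python
from pathlib import Path
from typing import Iterator, Optional


def iter_files(root: Path, depth: Optional[int] = None) -> Iterator[Path]:
    """Yield files under root whose directory depth is at most `depth`.

    Hypothetical helper: depth counts directory components relative to
    root and excludes the filename, so a file directly in root has depth 0.
    """
    # Sorting keeps the output order stable across runs and platforms.
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # Number of directories between root and the file.
        dir_depth = len(path.relative_to(root).parts) - 1
        if depth is None or dir_depth <= depth:
            yield path
```

With this rule, `depth=0` yields only files that sit directly in the root, and `depth=None` yields everything `rglob` finds.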

Pattern-Based Filtering

  • `--ignore PATTERN`: Excludes files/directories matching glob patterns (repeatable)
  • Flexible patterns: Supports both file and directory patterns
  • Multiple patterns: Can specify multiple `--ignore` flags
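
One way the repeatable `--ignore` patterns could be applied is sketched below. The matching rules here are an assumption for illustration (a trailing `/` marks a directory pattern, anything else is tried against the full relative path and the filename); `is_ignored` is a hypothetical name and the PR's actual matching may differ.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_ignored(rel_path: str, patterns: list[str]) -> bool:
    """Return True if rel_path (POSIX-style, relative to root) matches any pattern."""
    parts = PurePosixPath(rel_path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: compare against each directory component.
            name = pattern.rstrip("/")
            if any(fnmatchcase(part, name) for part in parts[:-1]):
                return True
        elif fnmatchcase(rel_path, pattern) or fnmatchcase(parts[-1], pattern):
            # File pattern: compare against the whole path or just the filename.
            return True
    return False
```

For example, `is_ignored("node_modules/pkg/index.js", ["node_modules/", "dist/"])` is true, while `is_ignored("src/app.py", ["node_modules/", "dist/"])` is false.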

Gitignore Integration

  • `--respect-gitignore`: Honors .gitignore rules when scanning
  • Enhanced parsing: Supports negation patterns (`!pattern`)
  • Directory patterns: Handles both file and directory ignore rules
  • Precedence handling: Later rules override earlier ones
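
The negation and precedence behaviour can be illustrated with a deliberately simplified matcher. This sketch is not the parser added in this PR and does not implement full gitignore semantics (for instance, git will not re-include a file whose parent directory is excluded); the function names are hypothetical. It only shows the "last matching rule wins" idea.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def parse_gitignore(text: str) -> list[tuple[str, bool]]:
    """Return (pattern, negated) pairs, skipping blank lines and comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        negated = line.startswith("!")
        rules.append((line[1:] if negated else line, negated))
    return rules


def _matches(rel_path: str, pattern: str) -> bool:
    parts = PurePosixPath(rel_path).parts
    if pattern.endswith("/"):
        # Directory rule: match any directory component of the path.
        name = pattern.rstrip("/")
        return any(fnmatchcase(part, name) for part in parts[:-1])
    return fnmatchcase(rel_path, pattern) or fnmatchcase(parts[-1], pattern)


def is_gitignored(rel_path: str, rules: list[tuple[str, bool]]) -> bool:
    """Apply rules in order; a later matching rule overrides an earlier one."""
    ignored = False
    for pattern, negated in rules:
        if _matches(rel_path, pattern):
            ignored = not negated
    return ignored
```

Given the rules `*.log`, `!keep.log`, and `build/`, the file `debug.log` is ignored, `keep.log` is re-included by the negation, and anything under `build/` is ignored.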

File Sampling

  • `files_sample`: Always present in JSON output with stable ordering
  • `--show N`: Configurable number of sample files per language (default: 5)
  • Deterministic: Files are sorted for consistent results across runs
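
A minimal sketch of deterministic sampling, assuming the scanner already holds a mapping from language name to matched file paths. `sample_files` and its input shape are hypothetical; only the default of 5 and the sorted, always-present `files_sample` come from the description above.

```python
def sample_files(
    files_by_language: dict[str, list[str]], show: int = 5
) -> dict[str, dict[str, list[str]]]:
    """Build a per-language `files_sample` list with stable ordering."""
    return {
        # Sort languages and paths so repeated runs give identical output;
        # a language with no files still gets an empty `files_sample`.
        lang: {"files_sample": sorted(paths)[:show]}
        for lang, paths in sorted(files_by_language.items())
    }
```

Sorting before slicing matters here: slicing first would make the sample depend on filesystem traversal order.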

🔧 API Changes

Enhanced `collect_evidence()` Function

```python
from pathlib import Path
from typing import Optional

def collect_evidence(
    root: Path,
    depth: Optional[int] = None,  # 0=root only, None=unlimited
    ignore_patterns: Optional[list[str]] = None,  # Glob patterns to ignore
    respect_gitignore: bool = False,  # Honor .gitignore rules
    show_files_sample: Optional[int] = None,  # Number of sample files
): ...  # signature only; body and return type are not shown here
```

JSON Schema Updates

  • `files_sample`: Array of sample file paths (always present)
  • Stable ordering: All arrays are sorted deterministically
  • Backward compatible: All existing fields preserved

🧪 Testing

Comprehensive Test Coverage

  • 62 scan-related tests all passing
  • New test suites:
    • `test_scan_enhanced_golden.py` (5 tests for golden file validation)
    • `test_scan_gitignore.py` (6 tests for gitignore functionality)
  • Golden files: Added for different scan scenarios
  • Manual test validation: All acceptance criteria verified
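
For readers unfamiliar with golden-file validation, the general pattern looks roughly like this. The helper names are hypothetical and this is not taken from `test_scan_enhanced_golden.py`; it only shows why stable serialization is needed for the comparison to be deterministic.

```python
import json
from pathlib import Path


def render_golden(result: dict) -> str:
    """Serialize a scan result in a stable form: sorted keys, fixed indent, trailing newline."""
    return json.dumps(result, indent=2, sort_keys=True) + "\n"


def matches_golden(result: dict, golden_path: Path) -> bool:
    """Compare the rendered result with the stored golden file, character for character."""
    return render_golden(result) == golden_path.read_text(encoding="utf-8")
```

Note that `sort_keys=True` only orders object keys; arrays such as `files_sample` must already be sorted by the code that produces them.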

Test Scenarios Covered

  • Depth control (0, 2, unlimited)
  • Pattern filtering with various glob patterns
  • Gitignore integration including negation patterns
  • File sampling with different limits
  • Language detection with filtered files
  • Edge cases (empty directories, missing gitignore, etc.)

📚 Documentation

Updated Documentation

  • README.md: Enhanced with new options, features, and usage examples
  • CLI help: All new flags documented with clear descriptions
  • Usage examples: Practical examples for all new functionality

Usage Examples

```bash
# Depth control
autorepro scan --depth 0 # Root only
autorepro scan --depth 2 # Up to 2 levels deep

# Pattern filtering
autorepro scan --ignore 'node_modules/' --ignore 'dist/'

# Gitignore integration
autorepro scan --respect-gitignore

# File sampling
autorepro scan --json --show 3 # Up to 3 sample files per language
```

✅ Acceptance Criteria

All acceptance criteria from issue #110 are met:

  • depth=0 replicates current behavior (repository root only)
  • Ignored files do not contribute to detection scores or language presence
  • `files_sample` is present under each `languages` entry in JSON, with stable ordering across runs
  • Manual tests all pass as specified
  • Golden files match deterministically
  • Backward compatibility maintained - all existing tests pass

🔄 Manual Test Results

All manual tests from the issue pass:

  1. ✅ `depth=0` → python only
  2. ✅ `depth=2` → python,node
  3. ✅ `ignore pattern excludes subdir` → python only
  4. ✅ `files_sample length reflects --show` → correct length
  5. ✅ `.gitignore scenario` → python only (respects gitignore)
  6. ✅ `files_sample always present in JSON` → true

🚦 Breaking Changes

None - This is a fully backward-compatible enhancement:

  • All existing CLI usage continues to work unchanged
  • JSON output includes new `files_sample` field but preserves all existing fields
  • Default behavior (no new flags) remains identical to previous version

🔗 Related

Closes #110


Ready for review - All tests passing, documentation updated, and acceptance criteria met.

Implements T-020: Enhance scan — depth, ignore, and patterns (#110)

## New Features
- **Hierarchical scanning**: --depth N controls scan depth (0=root only, unlimited by default)
- **Pattern filtering**: --ignore PATTERN excludes files/directories (repeatable)
- **Gitignore integration**: --respect-gitignore honors .gitignore rules including negation patterns
- **File sampling**: JSON output includes files_sample array (default 5, configurable with --show N)

## API Changes
- Enhanced collect_evidence() with depth, ignore_patterns, respect_gitignore, show_files_sample parameters
- files_sample field now always present in JSON output with stable ordering
- Improved gitignore parsing with support for negation patterns (!pattern)

## Testing
- Added comprehensive test suites for enhanced functionality
- Created golden test files for different scan scenarios
- All existing tests pass, maintaining backward compatibility
- 62 scan-related tests covering all new features

## Documentation
- Updated README.md with new options and usage examples
- Enhanced CLI help text for all new flags
- Added examples for depth control, filtering, and gitignore integration

Fixes #110

@ali90h merged commit 13c4266 into main on Sep 13, 2025
2 checks passed