Support .gitignore patterns for excluding files during source discovery #57

@patdhlk

Description

Currently, sphinx-codelinks can optionally use the .gitignore file at the source root (src_dir) to exclude files during source discovery. Several enhancements are needed to make this feature more robust and flexible.

Current Behavior

When gitignore = true is set in the configuration, the tool uses the .gitignore file at the source root to filter out files. This works for basic cases but has limitations.

Requested Enhancements

  1. Support nested .gitignore files: Git supports .gitignore files in subdirectories, which apply to files within that directory and its children. Currently, only the root .gitignore is considered.

  2. Support .gitignore patterns in configuration: Allow users to specify gitignore-style patterns directly in the codelinks.toml configuration without needing an actual .gitignore file. This is useful for:

    • CI/CD environments where the .gitignore might not be available
    • Excluding files that shouldn't be in .gitignore but should be excluded from traceability
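    • Bazel builds where the source tree structure differs from the git repository

  3. Support global gitignore: Respect the user's global ignore file, i.e. the path set via core.excludesFile (commonly ~/.gitignore_global) or, if that is unset, Git's default of $XDG_CONFIG_HOME/git/ignore (falling back to ~/.config/git/ignore)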

  4. Support .git/info/exclude: This file contains patterns specific to the local repository that aren't shared via .gitignore
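
A minimal sketch of how the ignore sources listed above could be collected for a single file, assuming the precedence order described in Git's gitignore documentation (global excludes file lowest, then .git/info/exclude, then .gitignore files from the source root down to the file's own directory). The function names `default_global_excludes` and `ignore_files_for` and their parameters are hypothetical and not part of sphinx-codelinks; reading core.excludesFile from the Git configuration is left to the caller.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def default_global_excludes(env: Mapping[str, str]) -> Path:
    """Git's fallback location when core.excludesFile is not configured."""
    xdg = env.get("XDG_CONFIG_HOME")
    if xdg:
        base = Path(xdg)
    else:
        base = Path(env.get("HOME") or os.path.expanduser("~")) / ".config"
    return base / "git" / "ignore"


def ignore_files_for(
    file_path: Path,
    src_dir: Path,
    git_dir: Optional[Path] = None,
    global_excludes: Optional[Path] = None,
) -> List[Path]:
    """Return the existing ignore files that apply to file_path.

    The result is ordered from lowest to highest precedence.
    Raises ValueError if file_path is not located under src_dir.
    """
    found: List[Path] = []

    # Lowest precedence: the user's global excludes file, if given.
    if global_excludes is not None and global_excludes.is_file():
        found.append(global_excludes)

    # Next: repository-local patterns that are not shared via .gitignore.
    if git_dir is not None:
        exclude = git_dir / "info" / "exclude"
        if exclude.is_file():
            found.append(exclude)

    # Highest: .gitignore files from src_dir down to the file's directory.
    root = src_dir.resolve()
    rel_parent = file_path.resolve().relative_to(root).parent
    current = root
    directories = [current]
    for part in rel_parent.parts:
        current = current / part
        directories.append(current)
    for directory in directories:
        candidate = directory / ".gitignore"
        if candidate.is_file():
            found.append(candidate)
    return found
```

Returning the files in precedence order keeps the pattern parsing separate from the discovery step, so each file can be parsed with its own directory as the base for relative patterns.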

Proposed Configuration

[codelinks.projects.my_project.source_discover]
src_dir = "./src"

# Existing option - use .gitignore at src_dir root
gitignore = true

# NEW: Use nested .gitignore files in subdirectories
nested_gitignore = true

# NEW: Additional gitignore-style patterns (applied after .gitignore)
gitignore_patterns = [
    "**/__pycache__/",
    "**/node_modules/",
    "*.generated.py",
    "vendor/**",
]

# NEW: Path to a custom gitignore file (useful for Bazel builds)
# Open question: is this really needed, i.e. is there an actual use case? Not required initially.
gitignore_file = "/path/to/custom/.gitignore"
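
As a rough illustration of how the proposed options might be represented and validated on the Python side, here is a sketch using a plain dataclass. The class name `GitignoreOptions`, the `validate` method, and the error messages are hypothetical; the field names mirror the proposed TOML keys, and the defaults chosen for the new options are assumptions rather than decided behavior.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Optional


@dataclass
class GitignoreOptions:
    """Hypothetical container for the gitignore-related options."""

    gitignore: bool                      # existing option
    nested_gitignore: bool = False       # assumed default: off
    gitignore_patterns: List[str] = field(default_factory=list)
    gitignore_file: Optional[str] = None

    def validate(self) -> List[str]:
        """Return a list of human-readable errors (empty if valid)."""
        errors: List[str] = []

        for name in ("gitignore", "nested_gitignore"):
            if not isinstance(getattr(self, name), bool):
                errors.append(f"{name} must be a boolean")

        patterns = self.gitignore_patterns
        if not isinstance(patterns, list) or not all(
            isinstance(p, str) for p in patterns
        ):
            errors.append("gitignore_patterns must be a list of strings")
        elif any(not p.strip() for p in patterns):
            errors.append("gitignore_patterns must not contain empty patterns")

        if self.gitignore_file is not None:
            if not isinstance(self.gitignore_file, str):
                errors.append("gitignore_file must be a string path")
            elif not Path(self.gitignore_file).is_file():
                errors.append(
                    f"gitignore_file does not exist: {self.gitignore_file}"
                )
        return errors
```

Collecting errors in a list instead of raising on the first one lets a configuration check report every problem in one pass.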

Use Cases

  1. Monorepo with multiple projects: Each project directory has its own .gitignore with project-specific patterns
  2. Generated code exclusion: Exclude auto-generated files that match certain patterns without modifying .gitignore
  3. [optional] Bazel/Buck builds: Specify a custom gitignore file path or inline patterns when the build sandbox doesn't have access to the original .gitignore
  4. CI/CD pipelines: Exclude test fixtures, mock data, or vendored dependencies from traceability analysis

Technical Details

File: src/sphinx_codelinks/source_discover/source_discover.py

The current implementation uses the gitignore_parser library to parse the .gitignore file:

from pathlib import Path

from gitignore_parser import parse_gitignore

# Currently only checks root .gitignore
if self.src_discover_config.gitignore:
    gitignore_path = Path(self.src_discover_config.src_dir) / ".gitignore"
    if gitignore_path.exists():
        self.gitignore_matcher = parse_gitignore(gitignore_path)
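
Since parse_gitignore returns a callable that takes a file path and reports whether it is ignored, one possible way to support several pattern sources is to keep a list of such callables and combine them. The sketch below shows only that composition step; `combine_matchers` and the `Matcher` alias are hypothetical names, and the simple "any source ignores the file" rule is an assumption that does not honor negation patterns (`!pattern`) across different sources.

```python
from typing import Callable, Iterable, List

# A matcher takes a file path and returns True if the file is ignored.
Matcher = Callable[[str], bool]


def combine_matchers(matchers: Iterable[Matcher]) -> Matcher:
    """Build one matcher that ignores a file if any source ignores it."""
    active: List[Matcher] = list(matchers)

    def is_ignored(file_path: str) -> bool:
        return any(matcher(file_path) for matcher in active)

    return is_ignored


# Stand-in matchers for illustration; in practice these could be the
# callables returned by parse_gitignore for each ignore file.
ignore_logs = lambda path: path.endswith(".log")
ignore_generated = lambda path: path.endswith(".generated.py")

is_ignored = combine_matchers([ignore_logs, ignore_generated])
print(is_ignored("build/output.log"))  # True
print(is_ignored("src/main.py"))       # False
```

With an empty list of matchers the combined matcher ignores nothing, which matches the behavior of gitignore = false.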

Acceptance Criteria

  • Support gitignore_patterns configuration option for inline patterns
  • Support gitignore_file configuration option for custom gitignore file path
  • Support nested_gitignore option to recursively apply .gitignore files in subdirectories
  • Add configuration validation for new options
  • Add documentation for new configuration options
  • Add unit tests for each new feature
  • Maintain backward compatibility with existing gitignore = true/false option

Related

  • Current gitignore implementation: src/sphinx_codelinks/source_discover/source_discover.py
  • Configuration: src/sphinx_codelinks/source_discover/config.py
  • Tests: tests/test_source_discover.py

Labels

  • enhancement
  • source-discover
  • configuration

Priority

Medium - This is a quality-of-life improvement that would benefit users with complex repository structures or non-standard build systems.
