Commit bb04736

committed
add mcp config
1 parent 1bcd0f7 commit bb04736

1 file changed

+110
-0
lines changed

.github/copilot-instructions.md

Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
1+
# GitHub Backup - AI Coding Instructions
2+
3+
## Project Overview
4+
5+
This is a Python CLI tool for comprehensive GitHub data backup. The architecture follows a single-module design with clear separation of concerns across functional areas.
6+
7+
## Core Architecture
8+
9+
### Main Entry Points
10+
- **`bin/github-backup`**: CLI entry point that orchestrates the backup workflow
11+
- **`github_backup/github_backup.py`**: Single module containing all core functionality (1400+ lines)
12+
- **`github_backup/__init__.py`**: Version tracking only
13+
14+
### Data Flow Pattern
15+
1. **Parse & Authenticate** → `parse_args()` → `get_auth()` → `get_authenticated_user()`
16+
2. **Discover** → `retrieve_repositories()` → `filter_repositories()`
17+
3. **Backup** → `backup_repositories()` + `backup_account()`
18+
19+
### GitHub API Integration
20+
- Uses `retrieve_data_gen()` for paginated API calls with automatic rate limiting
21+
- Template-based URL construction: `"https://{host}/repos/{owner}/{name}/issues"`
22+
- Built-in retry logic for 502 errors and incomplete reads
23+
- Supports both classic tokens (`-t`) and fine-grained tokens (`-f`)
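
A minimal sketch of the template-plus-pagination idea from the bullets above. It is not the project's `retrieve_data_gen()`: `build_url`, `paginate`, and the injected `fetch_page` callable are hypothetical names, and rate limiting and retries are left out.

```python
from typing import Callable, Dict, Iterator, List

# Template copied from the bullet above; the values filled in are placeholders.
ISSUES_TEMPLATE = "https://{host}/repos/{owner}/{name}/issues"


def build_url(template: str, **parts: str) -> str:
    """Fill a URL template with host, owner, and repository name."""
    return template.format(**parts)


def paginate(
    fetch_page: Callable[[str, int, int], List[Dict]],
    url: str,
    per_page: int = 100,
) -> Iterator[Dict]:
    """Yield items page by page until a short (or empty) page comes back.

    fetch_page is a hypothetical callable (url, page, per_page) -> list;
    a real one would issue the HTTP request.
    """
    page = 1
    while True:
        items = fetch_page(url, page, per_page)
        yield from items
        if len(items) < per_page:
            break
        page += 1
```

Passing the fetcher in as an argument keeps the loop testable without network access.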
24+
25+
## Key Development Patterns
26+
27+
### Authentication Flexibility
28+
```python
29+
# Supports multiple auth methods in get_auth():
30+
# - Fine-grained tokens (github_pat_...)
31+
# - Classic tokens with x-oauth-basic
32+
# - Basic username/password
33+
# - OSX Keychain integration
34+
# - GitHub App authentication (--as-app)
35+
```
36+
37+
### Incremental Backup Strategy
38+
- **Time-based**: `--incremental` uses API `since` parameter with last backup timestamp
39+
- **File-based**: `--incremental-by-files` compares filesystem modification times
40+
- State stored in `{output_dir}/last_update` file
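
A rough sketch of the time-based variant, assuming the `last_update` file holds a single ISO 8601 timestamp (the file's exact format is an assumption). The helper names `read_last_update`, `write_last_update`, and `incremental_params` are hypothetical.

```python
import os
from typing import Dict, Optional


def read_last_update(output_dir: str) -> Optional[str]:
    """Return the stored timestamp, or None if no previous run was recorded."""
    path = os.path.join(output_dir, "last_update")
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8") as handle:
        return handle.read().strip() or None


def write_last_update(output_dir: str, timestamp: str) -> None:
    """Record the timestamp of the run that just finished."""
    with open(os.path.join(output_dir, "last_update"), "w", encoding="utf-8") as handle:
        handle.write(timestamp)


def incremental_params(output_dir: str) -> Dict[str, str]:
    """Build the query parameters for an incremental API request."""
    since = read_last_update(output_dir)
    return {"since": since} if since else {}
```

Calling `write_last_update()` only after a run completes is one way to avoid advancing the timestamp past data that was never fetched.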
41+
42+
### Git Repository Handling
43+
- Uses `logging_subprocess()` wrapper for all git operations
44+
- Supports both regular clones and bare/mirror clones (`--bare` → `git clone --mirror`)
45+
- SSH vs HTTPS preference via `--prefer-ssh` flag
46+
- LFS support with `git lfs fetch --all --prune`
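
A small sketch of how the clone options above could map onto a git command line. `clone_command` is a hypothetical helper, not a function from the module; `ssh_url` and `clone_url` are the field names the GitHub REST API returns for a repository.

```python
from typing import Dict, List


def clone_command(
    repo: Dict[str, str],
    local_dir: str,
    bare: bool = False,
    prefer_ssh: bool = False,
) -> List[str]:
    """Build the argument list for cloning one repository."""
    # Choose the remote according to the SSH/HTTPS preference.
    remote = repo["ssh_url"] if prefer_ssh else repo["clone_url"]
    command = ["git", "clone"]
    if bare:
        # Per the note above, the tool's --bare flag maps to --mirror.
        command.append("--mirror")
    command.extend([remote, local_dir])
    return command
```

Returning a list rather than a string lets the result go straight to `subprocess` without shell quoting.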
47+
48+
### Output Directory Structure
49+
```
50+
{output_dir}/
51+
├── repositories/{repo_name}/repository/ # Git clones
52+
├── starred/{owner}/{repo_name}/ # Starred repos
53+
├── gists/{gist_id}/ # User gists
54+
├── account/{starred,followers,following}.json
55+
└── {repo}/issues/{number}.json # Per-repo data
56+
```
57+
58+
## Development Workflows
59+
60+
### Testing & Linting
61+
```bash
62+
# No unit tests exist - this is acknowledged in README
63+
pip install flake8
64+
flake8 --ignore=E501,E203,W503 # Same as CI
65+
```
66+
67+
### Docker Development
68+
```bash
69+
docker run --rm -v /path/to/backup:/data --name github-backup \
70+
ghcr.io/josegonzalez/python-github-backup -o /data $OPTIONS $USER
71+
```
72+
73+
### Release Process
74+
- Automated via GitHub Actions (`automatic-release.yml`, `tagged-release.yml`)
75+
- Version bumping in `github_backup/__init__.py`
76+
- Docker image publishing to ghcr.io
77+
78+
## Critical Implementation Details
79+
80+
### Rate Limiting Strategy
81+
- Automatic throttling based on `x-ratelimit-remaining` header
82+
- Custom throttling via `--throttle-limit` and `--throttle-pause`
83+
- Exponential backoff for 403 rate limit responses
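
An illustrative sketch of the two delay calculations described above. The thresholds, defaults, and the function names `throttle_delay` and `backoff_delay` are assumptions; the module's actual logic may differ.

```python
from typing import Mapping


def throttle_delay(
    headers: Mapping[str, str],
    throttle_limit: int = 0,
    throttle_pause: float = 0.0,
) -> float:
    """Return how long to pause before the next request, in seconds.

    Assumes header names have already been lower-cased by the caller.
    """
    remaining = headers.get("x-ratelimit-remaining")
    if remaining is None:
        return 0.0
    if throttle_limit and int(remaining) <= throttle_limit:
        return throttle_pause
    return 0.0


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff: base * 2**attempt, capped at `cap` seconds."""
    return min(cap, base * 2 ** attempt)
```

For example, `throttle_delay({"x-ratelimit-remaining": "10"}, throttle_limit=20, throttle_pause=30.0)` returns `30.0`, while a response with plenty of quota left returns `0.0`.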
84+
85+
### Error Handling Philosophy
86+
- Graceful degradation for missing data (404s logged but don't block)
87+
- Blocking errors (403 auth failures) abort the entire run
88+
- Incomplete reads get 3 retry attempts with 5-second delays
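
A sketch of the incomplete-read retry described above, assuming "3 retry attempts" means three attempts in total. `read_with_retries` and its parameters are hypothetical; `IncompleteRead` is the standard-library exception from `http.client`.

```python
import time
from http.client import IncompleteRead
from typing import Callable, TypeVar

T = TypeVar("T")


def read_with_retries(
    read: Callable[[], T],
    attempts: int = 3,
    delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call read(), retrying on IncompleteRead with a fixed delay."""
    for attempt in range(1, attempts + 1):
        try:
            return read()
        except IncompleteRead:
            if attempt == attempts:
                # Out of attempts: let the caller decide what to do.
                raise
            sleep(delay)
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter exists only so the delay can be stubbed out in tests.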
89+
90+
### File I/O Patterns
91+
- Atomic writes via `.temp` files then `os.rename()`
92+
- UTF-8 encoding with `codecs.open()` for JSON files
93+
- JSON formatting: `ensure_ascii=False, sort_keys=True, indent=4`
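
The three bullets above combine into roughly the following. `write_json_atomic` is a hypothetical name; the `.temp` suffix, `codecs.open()`, and the `json.dump()` arguments come from the notes above.

```python
import codecs
import json
import os
from typing import Any


def write_json_atomic(path: str, data: Any) -> None:
    """Write JSON to a .temp file, then rename it over the final path."""
    temp_path = path + ".temp"
    with codecs.open(temp_path, "w", encoding="utf-8") as handle:
        json.dump(data, handle, ensure_ascii=False, sort_keys=True, indent=4)
    # A reader never sees a half-written file at `path`.
    os.rename(temp_path, path)
```

On Windows, `os.rename()` raises an error when the destination already exists; `os.replace()` is the portable alternative if that matters.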
94+
95+
## Common Gotchas
96+
97+
1. **`--all` doesn't include everything**: It omits private repos, forks, starred repos, LFS, and gists
98+
2. **`--bare` is actually `--mirror`**: Uses `git clone --mirror`, not `git clone --bare`
99+
3. **Starred gists**: Stored in same directory as user gists, not separately
100+
4. **Incremental risks**: Failed runs can cause missing data in subsequent incremental backups
101+
5. **Authentication scope**: Fine-grained tokens need specific repository and user permissions
102+
103+
## Extension Points
104+
105+
When adding new backup types, follow the pattern:
106+
1. Add CLI argument in `parse_args()`
107+
2. Create `backup_*()` function following existing patterns
108+
3. Call from `backup_repositories()` or `backup_account()`
109+
4. Use `retrieve_data()` for API calls and `mkdir_p()` for directories
110+
5. Follow atomic file writing pattern with `.temp` files
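
A hedged sketch of these five steps for an invented backup type. Everything named "widgets" is hypothetical, the `fetch` callable stands in for `retrieve_data()` (whose real signature is not shown here), and `os.makedirs()` stands in for `mkdir_p()`.

```python
import argparse
import json
import os
from typing import Callable, Dict, Iterable, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Step 1: register the new (hypothetical) CLI flag."""
    parser = argparse.ArgumentParser(description="extension sketch")
    parser.add_argument("--widgets", action="store_true", dest="include_widgets",
                        help="back up widgets (hypothetical data type)")
    return parser.parse_args(argv)


def backup_widgets(args: argparse.Namespace, repo_cwd: str,
                   fetch: Callable[[], Iterable[Dict]]) -> int:
    """Steps 2-5: fetch items and write one JSON file per item."""
    if not args.include_widgets:
        return 0
    widget_dir = os.path.join(repo_cwd, "widgets")
    os.makedirs(widget_dir, exist_ok=True)  # stand-in for mkdir_p()
    count = 0
    for widget in fetch():  # stand-in for retrieve_data()
        path = os.path.join(widget_dir, "{0}.json".format(widget["id"]))
        temp_path = path + ".temp"
        with open(temp_path, "w", encoding="utf-8") as handle:
            json.dump(widget, handle, ensure_ascii=False, sort_keys=True, indent=4)
        os.rename(temp_path, path)  # atomic swap into place
        count += 1
    return count
```

In the real module the call would be added inside `backup_repositories()` or `backup_account()`, per step 3.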
