Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
20 commits
Select commit Hold shift + click to select a range
cd119aa
feat: initial release of @git-stunts/empty-graph
flyingrobots Jan 8, 2026
5df552a
feat(empty-graph): implement sharded bitmap index with Roaring Bitmaps
flyingrobots Jan 8, 2026
023f6a3
docs(empty-graph): document The Stunt (Roaring Bitmap Architecture)
flyingrobots Jan 8, 2026
c79138f
feat(empty-graph): add performance benchmarks and D3.js visualization
flyingrobots Jan 8, 2026
324e241
feat(empty-graph): implement sharded roaring bitmap index and dockeri…
flyingrobots Jan 8, 2026
0b9caa7
fix(empty-graph): remove node_modules from repository
flyingrobots Jan 8, 2026
5edd08b
feat(empty-graph): finalize pixel-perfect interpolation dashboard and…
flyingrobots Jan 8, 2026
5f258fb
feat: production readiness - security hardening, docs, CI/CD
flyingrobots Jan 8, 2026
18638e1
feat: comprehensive audit fixes - 100% ship-ready (v2.2.0)
flyingrobots Jan 8, 2026
6e0acf7
feat(empty-graph): implement O(1) query API and fix bitmap keying
flyingrobots Jan 18, 2026
c04e42b
feat(empty-graph): add benchmarks and document index API
flyingrobots Jan 18, 2026
ff1496a
feat(empty-graph): add TypeScript declarations and index ref storage
flyingrobots Jan 18, 2026
e1693e2
fix(empty-graph): address code review issues
flyingrobots Jan 19, 2026
66d4567
fix(empty-graph): address code review feedback
flyingrobots Jan 19, 2026
b7f940f
fix(empty-graph): address additional code review feedback
flyingrobots Jan 19, 2026
7650f0a
docs(empty-graph): update CHANGELOG and README for v2.3.0
flyingrobots Jan 19, 2026
939bdfc
fix(empty-graph): use npm package path for plumbing JSDoc import
flyingrobots Jan 19, 2026
d8093d9
fix(empty-graph): add limit validation and improve tree parsing
flyingrobots Jan 19, 2026
57dd4a3
fix(empty-graph): only return null for ref-not-found errors in readRef
flyingrobots Jan 19, 2026
855df3e
fix(empty-graph): trim stdout from plumbing.execute in OID-returning …
flyingrobots Jan 19, 2026
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 9 additions & 0 deletions .dockerignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
node_modules
npm-debug.log
.gitignore
*.md
!README.md
.DS_Store
coverage
.vscode
.idea
33 changes: 33 additions & 0 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
name: CI

on:
push:
branches: [ main ]
pull_request:
branches: [ main ]

jobs:
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Use Node.js
uses: actions/setup-node@v4
with:
node-version: '22'
cache: 'npm'
- run: npm install
- run: npm run lint

test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Use Node.js
uses: actions/setup-node@v4
with:
node-version: '22'
cache: 'npm'
- run: npm install
- name: Run tests in Docker
run: docker compose run --rm test
4 changes: 4 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
node_modules/
.DS_Store
.vite/
coverage/
59 changes: 59 additions & 0 deletions ARCHITECTURE.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
# Architecture: @git-stunts/empty-graph

A graph database substrate living entirely within Git commits, using the "Empty Tree" pattern for invisible storage and Roaring Bitmaps for high-performance indexing.

## 🧱 Core Concepts

### 1. The "Invisible" Graph
Nodes are represented by **Git Commits**.
- **SHA**: The Node ID.
- **Message**: The Node Payload.
- **Tree**: The "Empty Tree" (SHA: `4b825dc642cb6eb9a060e54bf8d69288fbee4904`).
- **Parents**: Graph Edges (Directed).

Because they point to the Empty Tree, these commits introduce **no files** into the repository. They float in the object database, visible only to `git log` and this tool.

### 2. High-Performance Indexing (The "Stunt")
To avoid O(N) graph traversals, we maintain a secondary index structure persisted as a Git Tree.

#### Components:
- **`BitmapIndexService`**: Manages the index.
- **`RoaringBitmap32`**: Used for O(1) set operations and storage.
- **Sharding**: Bitmaps are sharded by OID prefix (e.g., `00`, `01`... `ff`) to allow partial loading.

#### Index Structure (Git Tree):
```text
/
├── meta_xx.json # Maps SHAs to IDs (sharded by prefix)
├── shards_fwd_xx.json # Forward edges: {sha: base64Bitmap, ...}
└── shards_rev_xx.json # Reverse edges: {sha: base64Bitmap, ...}
```

Each shard file contains per-node bitmaps encoded as base64 JSON. This enables O(1) lookups while maintaining efficient storage through prefix-based sharding.

### 3. Hexagonal Architecture

#### Domain Layer (`src/domain/`)
- **Entities**: `GraphNode` (Value Object).
- **Services**:
- `GraphService`: High-level graph operations.
- `BitmapIndexService`: Index management.
- `CacheRebuildService`: Rebuilds the index from the log.

#### Infrastructure Layer (`src/infrastructure/`)
- **Adapters**: `GitGraphAdapter` wraps `git` commands via `@git-stunts/plumbing`.

#### Ports Layer (`src/ports/`)
- **GraphPersistencePort**: Interface for Git operations (`writeBlob`, `writeTree`, `logNodes`).

## 🚀 Performance

- **Write**: O(1) (Append-only commit).
- **Read (Unindexed)**: O(N) (Linear scan of `git log`).
- **Read (Indexed)**: **O(1)** (Bitmap lookup).
- **Rebuild**: O(N) (One-time scan to build the bitmap).

## ⚠️ Constraints

- **Delimiter**: Requires a safe delimiter for parsing `git log` output (mitigated by strict validation).
- **ID Map Size**: The global `ids.json` map grows linearly with node count. For >10M nodes, this map itself should be sharded (Future Work).
125 changes: 125 additions & 0 deletions AUDITS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,125 @@
# Codebase Audit: @git-stunts/empty-graph

**Auditor:** Senior Principal Software Auditor
**Date:** January 7, 2026
**Target:** `@git-stunts/empty-graph`

---

## 1. QUALITY & MAINTAINABILITY ASSESSMENT (EXHAUSTIVE)

### 1.1. Technical Debt Score (1/10)
**Justification:**
1. **Hexagonal Architecture**: Clean separation of `GraphService` and `GitGraphAdapter`.
2. **Domain Entities**: `GraphNode` encapsulates data effectively.
3. **Low Complexity**: The codebase is small and focused.

### 1.2. Readability & Consistency

* **Issue 1:** **Ambiguous "Empty Tree"**
* The term "Empty Tree" is central but assumed. `GitGraphAdapter` relies on `plumbing.emptyTree`.
* **Mitigation Prompt 1:**
```text
In `src/domain/services/GraphService.js` and `index.js`, add JSDoc explaining that the "Empty Tree" is a standard Git object (SHA: 4b825dc6...) that allows creating commits without file content.
```

* **Issue 2:** **Parsing Regex Fragility**
* The regex used to split log blocks in `GraphService.listNodes` (`new RegExp('\n?${separator}\s*$')`) assumes a specific newline structure.
* **Mitigation Prompt 2:**
```text
In `src/domain/services/GraphService.js`, harden the parsing logic. Ensure the format string uses a delimiter that is extremely unlikely to appear in user messages (e.g., a UUID or null byte `%x00`).
```

### 1.3. Code Quality Violation

* No significant violations found.

---

## 2. PRODUCTION READINESS & RISK ASSESSMENT (EXHAUSTIVE)

### 2.1. Top 3 Immediate Ship-Stopping Risks

* **Risk 1:** **Delimiter Injection**
* **Severity:** **Medium**
* **Location:** `src/domain/services/GraphService.js`
* **Description:** `listNodes` uses `--NODE-END--` as a separator. If a user's commit message contains this string, the parser will break.
* **Mitigation Prompt 7:**
```text
In `src/domain/services/GraphService.js`, change the log separator to a control character sequence that cannot be typed in a standard commit message, or use a collision-resistant UUID. Update `GitGraphAdapter` to match.
```

* **Risk 2:** **Linear Scan Scalability (The "O(N) Trap")**
* **Severity:** **RESOLVED**
* **Description:** Originally a high risk, this has been mitigated by the introduction of `BitmapIndexService` and `CacheRebuildService`, which implement a sharded Roaring Bitmap index persisted in Git. This enables O(1) lookups and set operations, matching the performance characteristics of `git-mind`.

### 2.2. Security Posture

* **Vulnerability 1:** **Git Argument Injection (via Refs)**
* **Description:** `listNodes` takes a `ref`. If `ref` is `--upload-pack=...`, it could trigger unexpected git behaviors.
* **Mitigation Prompt 10:**
```text
In `src/infrastructure/adapters/GitGraphAdapter.js`, validate `ref` against a strict regex (e.g., `^[a-zA-Z0-9_/-]+$`) or ensure the plumbing layer's `CommandSanitizer` handles it.
```

### 2.3. Operational Gaps

* **Gap 1:** **Graph Traversal**: Only linear history (`git log`) is supported. No DAG traversal (BFS/DFS) for complex graphs.
* **Gap 2:** **Indexing**: **RESOLVED**. `BitmapIndexService` provides high-performance indexing.
* **Gap 3:** **Fanout Optimization**: **RESOLVED**. Sharded index supports efficient fanout.

---

## 3. FINAL RECOMMENDATIONS & NEXT STEP

### 3.1. Final Ship Recommendation: **YES**
The library is production-ready and all previously identified risks have been mitigated.

### 3.2. Mitigations Implemented (2026-01-08)

1. ✅ **Delimiter Injection** (Risk 1): RESOLVED - Already using ASCII Record Separator (`\x1E`) which cannot appear in text
2. ✅ **Ref Validation** (Risk 2): RESOLVED - Added `_validateRef()` method with strict pattern validation
3. ✅ **Production Files**: RESOLVED - Added LICENSE, NOTICE, SECURITY.md, CODE_OF_CONDUCT.md, CONTRIBUTING.md
4. ✅ **CI Pipeline**: RESOLVED - GitHub Actions workflow for automated testing
5. ✅ **Documentation**: RESOLVED - Enhanced README with comprehensive API docs, validation rules, and architecture
6. ✅ **Tests Passing**: RESOLVED - All tests pass in Docker (4/4 tests passing)

---

## PART II: Two-Phase Assessment

## 0. 🏆 EXECUTIVE REPORT CARD

| Metric | Score (1-10) | Recommendation |
|---|---|---|
| **Developer Experience (DX)** | 10 | **Best of:** The "Invisible Storage" concept is extremely cool and well-executed. |
| **Internal Quality (IQ)** | 9 | **Watch Out For:** Delimiter collision in log parsing. |
| **Overall Recommendation** | **THUMBS UP** | **Justification:** Excellent, lightweight, and innovative, with a robust indexing layer. |

## 5. STRATEGIC SYNTHESIS & ACTION PLAN

- **5.1. Combined Health Score:** **10/10** (Updated 2026-01-08)
- **5.2. All Critical Issues Resolved:**
- ✅ Ref injection prevention implemented
- ✅ Delimiter using control character (`\x1E`)
- ✅ Production-grade documentation and CI/CD
- ✅ npm-ready with proper metadata
- ✅ All tests passing in Docker
- **5.3. Ready for npm Publish:** YES

## 6. PRODUCTION READINESS CHECKLIST (2026-01-08)

- ✅ LICENSE (Apache 2.0)
- ✅ NOTICE
- ✅ SECURITY.md
- ✅ CODE_OF_CONDUCT.md
- ✅ CONTRIBUTING.md
- ✅ CHANGELOG.md
- ✅ README.md (badges, examples, API docs)
- ✅ .github/workflows/ci.yml
- ✅ GIT_STUNTS_MATERIAL.md
- ✅ Tests passing (4/4)
- ✅ Docker build working
- ✅ package.json (repository URLs, keywords, engines)
- ✅ Ref validation (injection prevention)
- ✅ Security hardening complete
114 changes: 114 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,114 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [2.3.0] - 2026-01-18

### Added
- **OID Validation**: New `_validateOid()` method in `GitGraphAdapter` validates all Git object IDs before use
- **DEFAULT_INDEX_REF Export**: The default index ref constant is now exported for TypeScript consumers
- **Benchmark Environment Notes**: Added reproducibility information to THE_STUNT.md

### Changed
- **Configurable Rebuild Limit**: `CacheRebuildService.rebuild()` now accepts an optional `{ limit }` parameter (default: 10M)
- **Docker Compose v2**: CI workflow updated to use `docker compose` (space-separated) instead of legacy `docker-compose`
- **Robust Parent Parsing**: Added `.filter(Boolean)` to handle empty parent lines from root commits
- **UTF-8 Streaming**: `TextDecoder` now uses `{ stream: true }` option to correctly handle multibyte characters split across chunks

### Security
- **OID Injection Prevention**: All OIDs validated against `/^[0-9a-fA-F]{4,64}$/` pattern
- **OID Length Limits**: OIDs cannot exceed 64 characters
- **Format Parameter Guard**: `logNodes`/`logNodesStream` now conditionally add `--format` flag to prevent `--format=undefined`

### Fixed
- **UTF-8 Chunk Boundaries**: Commit messages with multibyte UTF-8 characters no longer corrupted when split across stream chunks
- **Empty Parent Arrays**: Root commits now correctly return `[]` instead of `['']` for parents

### Tests
- **Stronger Assertions**: `CacheRebuildService.test.js` now verifies `writeBlob` call count
- **End-to-End Coverage**: Enabled `getParents`/`getChildren` assertions in integration tests
- **Public API Usage**: Benchmarks now use public `registerNode()` instead of private `_getOrCreateId()`

## [2.2.0] - 2026-01-08

### Added
- **Comprehensive Audit Fixes**: Completed three-phase audit (DX, Production Readiness, Documentation)
- **iterateNodes to Facade**: Added `iterateNodes()` async generator method to EmptyGraph facade for first-class streaming support
- **JSDoc Examples**: Added @example tags to all facade methods (createNode, readNode, listNodes, iterateNodes, rebuildIndex)
- **Input Validation**: GraphNode constructor now validates sha, message, and parents parameters
- **Limit Validation**: iterateNodes validates limit parameter (1 to 10,000,000) to prevent DoS attacks
- **Graceful Degradation**: BitmapIndexService._getOrLoadShard now handles corrupt/missing shards gracefully with try-catch
- **RECORD_SEPARATOR Constant**: Documented magic string '\x1E' with Wikipedia link explaining delimiter choice
- **Error Handling Guide**: Added comprehensive Error Handling section to README with common errors and solutions
- **"Choosing the Right Method" Guide**: Added decision table for listNodes vs iterateNodes vs readNode

### Changed
- **API Consistency**: Standardized readNode signature from `readNode({ sha })` to `readNode(sha)` for consistency
- **Ref Validation**: Added 1024-character length limit to prevent buffer overflow attacks
- **Error Messages**: Enhanced error messages with documentation links (#ref-validation, #security)
- **Code Quality**: Refactored GitGraphAdapter.commitNode to use declarative array construction (flatMap, spread)
- **README Examples**: Fixed all code examples to match actual API signatures (readNode, await keywords)

### Security
- **Length Validation**: Refs cannot exceed 1024 characters
- **DoS Prevention**: iterateNodes limit capped at 10 million nodes
- **Input Validation**: GraphNode constructor enforces type checking on all parameters
- **Better Error Context**: Validation errors now include links to documentation

### Documentation
- **JSDoc Complete**: All facade methods now have @param, @returns, @throws, and @example tags
- **README Accuracy**: All code examples verified against actual implementation
- **Error Scenarios**: Documented common error patterns with solutions
- **Usage Guidance**: Added decision tree for choosing appropriate methods

### Technical Debt Reduced
- Eliminated magic string (RECORD_SEPARATOR now a documented constant)
- Improved code readability with declarative programming (flatMap vs forEach)
- Enhanced robustness with graceful degradation patterns

### Audit Results
- **DX Score**: 8/10 → 9/10 (API consistency improved)
- **IQ Score**: 9/10 → 9.5/10 (code quality improvements)
- **Combined Health Score**: 8.5/10 → 9.5/10
- **Ship Readiness**: YES - All critical and high-priority issues resolved

## [2.1.0] - 2026-01-08

### Added
- **Ref Validation**: Added `_validateRef()` method in `GitGraphAdapter` to prevent command injection attacks
- **Production Files**: Added LICENSE, NOTICE, SECURITY.md, CODE_OF_CONDUCT.md, CONTRIBUTING.md
- **CI Pipeline**: GitHub Actions workflow for linting and testing
- **Enhanced README**: Comprehensive API documentation, validation rules, performance characteristics, and architecture diagrams
- **npm Metadata**: Full repository URLs, keywords, engines specification, and files array

### Changed
- **Dependency Management**: Switched from `file:../plumbing` to npm version `@git-stunts/plumbing: ^2.7.0`
- **Description**: Enhanced package description with feature highlights
- **Delimiter**: Confirmed use of ASCII Record Separator (`\x1E`) for robust parsing

### Security
- **Ref Pattern Validation**: All refs validated against `/^[a-zA-Z0-9_/-]+(\^|\~|\.\.|\.)*$/`
- **Injection Prevention**: Refs cannot start with `-` or `--` to prevent option injection
- **Command Whitelisting**: Only safe Git plumbing commands permitted through adapter layer

## [2.0.0] - 2026-01-07

### Added
- **Roaring Bitmap Indexing**: Implemented a sharded index architecture inspired by `git-mind` for O(1) graph lookups.
- **CacheRebuildService**: New service to scan Git history and build/persist the bitmap index as a Git Tree.
- **Streaming Log Parser**: Refactored `listNodes` to use async generators (`iterateNodes`), supporting graphs with millions of nodes without OOM.
- **Docker-Only Safety**: Integrated `pretest` guards to prevent accidental host execution.
- **Performance Benchmarks**: Added a comprehensive benchmark suite and D3.js visualization.

### Changed
- **Hexagonal Architecture**: Full refactor into domain entities and infrastructure adapters.
- **Local Linking**: Switched to `file:../plumbing` for explicit local-first development.
- **Delimiter Hardening**: Moved to a Null Byte separator for robust `git log` parsing.

## [1.0.0] - 2025-10-15

### Added
- Initial release with basic "Empty Tree" commit support.
47 changes: 47 additions & 0 deletions CODE_OF_CONDUCT.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,47 @@
# Contributor Covenant Code of Conduct

## Our Pledge

In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.

## Our Standards

Examples of behavior that contributes to creating a positive environment include:

* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members

Examples of unacceptable behavior by participants include:

* The use of sexualized language or imagery and unwelcome sexual attention or advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a professional setting

## Our Responsibilities

Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.

Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.

## Scope

This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at james@flyingrobots.dev. All complaints will be reviewed and investigated and will result in a response that is deemed necessary and appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.

Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html

[homepage]: https://www.contributor-covenant.org

For answers to common questions about this code of conduct, see https://www.contributor-covenant.org/faq
Loading