203 changes: 203 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,203 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

git-ai is a local code-understanding tool that builds a semantic layer over a codebase using retrieval-augmented generation (RAG) techniques. It combines vector search (LanceDB) with graph-based analysis (CozoDB) so that AI agents can understand code structure and relationships beyond what plain text search provides.

**Key Design Principle**: Indices travel with the code in the Git repository. Check out any commit, branch, or tag and its semantic index is available immediately, without rebuilding.

## Development Commands

```bash
# Build
npm run build # Compile TypeScript to dist/

# Development run
npm run start -- --help # Run directly with ts-node

# Testing
npm test # Full test suite (build + E2E)
npm run test:cli # CLI-specific tests
npm run test:parser # Parser verification

# Global install for local testing
npm i -g .
```

**Important**: After building, test with the compiled CLI to verify packaging:
```bash
node dist/bin/git-ai.js --help
```

## Architecture Overview

### Three-Layer Architecture

```
CLI Layer (src/cli/)
Core Layer (src/core/)
Data Layer (LanceDB + CozoDB)
```

**CLI Layer** (`src/cli/`):
- **Commands**: Commander.js command definitions in `cli/commands/`
- **Handlers**: Business logic in `cli/handlers/` (one per command type)
- **Schemas**: Zod validation schemas in `cli/schemas/`
- **Types**: CLI-specific types and the `executeHandler` wrapper in `cli/types.ts`

**Core Layer** (`src/core/`):
- **indexer.ts / indexerIncremental.ts**: Parallel indexing with worker pools
- **lancedb.ts**: Vector database (SQ8-quantized embeddings)
- **cozo.ts / astGraph.ts**: Graph database for AST relationships
- **parser.ts**: Tree-sitter based multi-language parsing
- **embedding.ts**: ONNX-based semantic embeddings
- **search.ts**: Multi-strategy retrieval (vector + graph + hybrid)
- **repoMap.ts**: PageRank-based importance scoring
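`repoMap.ts` is described here only as "PageRank-based". The following is a rough illustration of the idea and not the project's implementation: files that many other files reference end up with higher scores. The adjacency-list input, `damping = 0.85`, and the fixed iteration count are all assumptions.

```typescript
// Minimal PageRank over a directed "file A references file B" graph.
// The input format (adjacency list keyed by node name) is an assumption.
function pageRank(
  edges: Record<string, string[]>,
  damping = 0.85,
  iterations = 50,
): Record<string, number> {
  // Collect every node, including ones that only appear as targets.
  const nodeSet = new Set<string>(Object.keys(edges));
  for (const targets of Object.values(edges)) {
    targets.forEach(t => nodeSet.add(t));
  }
  const nodes = Array.from(nodeSet);
  const n = nodes.length;
  let rank: Record<string, number> = {};
  nodes.forEach(v => { rank[v] = 1 / n; });

  for (let i = 0; i < iterations; i++) {
    const next: Record<string, number> = {};
    nodes.forEach(v => { next[v] = (1 - damping) / n; });
    // Nodes with no outgoing references ("dangling") spread their rank evenly.
    let dangling = 0;
    for (const v of nodes) {
      const out = edges[v] ?? [];
      if (out.length === 0) {
        dangling += rank[v];
        continue;
      }
      for (const t of out) next[t] += (damping * rank[v]) / out.length;
    }
    nodes.forEach(v => { next[v] += (damping * dangling) / n; });
    rank = next;
  }
  return rank;
}

// Usage: utils.ts is referenced by both other files, so it ranks highest.
const scores = pageRank({
  'cli.ts': ['search.ts', 'utils.ts'],
  'search.ts': ['utils.ts'],
  'utils.ts': [],
});
console.log(Object.entries(scores).sort((a, b) => b[1] - a[1]).map(([file]) => file));
// [ 'utils.ts', 'search.ts', 'cli.ts' ]
```

Because dangling rank is redistributed, the scores always sum to 1. That makes them comparable across repositories of different sizes.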

### Data Flow

**Indexing**: Source files → Tree-sitter AST → Embeddings + Symbol extraction → LanceDB (chunks) + CozoDB (refs)

**Search**: Query → Classification → Multi-strategy retrieval → Reranking → Results
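This document does not say how `search.ts` merges results from its different strategies. Reciprocal rank fusion (RRF) is one common technique for combining ranked lists, shown here purely as an illustration. Each strategy contributes `1 / (k + rank)` per hit. The constant `k = 60` and the string result IDs are assumptions.

```typescript
// Reciprocal rank fusion: combine several ranked lists of result IDs.
// Items ranked highly by more than one strategy float to the top.
function reciprocalRankFusion(lists: string[][], k = 60): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score })).sort((a, b) => b.score - a.score);
}

// Usage: merge hypothetical vector-search and graph-search hits.
const vectorHits = ['auth.ts#login', 'user.ts#User', 'db.ts#connect'];
const graphHits = ['user.ts#User', 'session.ts#create'];
const fused = reciprocalRankFusion([vectorHits, graphHits]);
console.log(fused.map(r => r.id));
// [ 'user.ts#User', 'auth.ts#login', 'session.ts#create', 'db.ts#connect' ]
```

RRF uses only rank positions. Vector similarities and graph scores therefore never need to be normalized onto a common scale.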

### Standard CLI Output Format

All CLI commands output JSON for agent readability:

**Success**:
```json
{
  "ok": true,
  "command": "semantic",
  "repoRoot": "/path/to/repo",
  "timestamp": "2024-01-01T00:00:00Z",
  "duration_ms": 123,
  "data": { ... }
}
```

**Error**:
```json
{
  "ok": false,
  "reason": "index_not_found",
  "message": "No semantic index found",
  "command": "semantic",
  "hint": "Run 'git-ai ai index --overwrite' to create an index"
}
```

See `src/cli/types.ts` for `CLIResult`, `CLIError`, `ErrorReasons`, and `ErrorHints`.
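The envelope above could be typed roughly as follows. This is a minimal sketch and not the contents of `src/cli/types.ts`. The names `CliSuccess`, `CliFailure`, `success`, and `failure` are hypothetical. `success` shows one way a shared wrapper could stamp `timestamp` and `duration_ms` on every result, so that individual handlers cannot forget them.

```typescript
// Hypothetical shapes mirroring the JSON examples above; the real
// definitions live in src/cli/types.ts and may differ.
interface CliSuccess<T> {
  ok: true;
  command: string;
  repoRoot: string;
  timestamp: string;   // ISO-8601 string
  duration_ms: number; // wall-clock time since the handler started
  data: T;
}

interface CliFailure {
  ok: false;
  reason: string;  // machine-readable, e.g. "index_not_found"
  message: string; // human-readable explanation
  command: string;
  hint?: string;   // suggested next step for the agent
}

type CliResult<T> = CliSuccess<T> | CliFailure;

// Stamp timing metadata in one place so every handler emits the same envelope.
function success<T>(command: string, repoRoot: string, startedAt: number, data: T): CliSuccess<T> {
  const now = Date.now();
  return {
    ok: true,
    command,
    repoRoot,
    timestamp: new Date(now).toISOString(),
    duration_ms: now - startedAt,
    data,
  };
}

function failure(command: string, reason: string, message: string, hint?: string): CliFailure {
  return hint === undefined
    ? { ok: false, reason, message, command }
    : { ok: false, reason, message, command, hint };
}

// Usage: agents branch on `ok` before touching `data`.
const started = Date.now();
const result: CliResult<{ count: number }> = success('semantic', '/path/to/repo', started, { count: 3 });
console.log(JSON.stringify(result));
```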

Comment on lines +67 to +95
Copilot AI Feb 13, 2026

This doc section specifies that all CLI success responses include timestamp and duration_ms, but the current handler implementations (e.g., src/cli/handlers/queryFilesHandlers.ts) don’t appear to include these fields in their returned JSON—only in logs. Either update the implementation to match this documented output contract, or adjust the examples/claims here so they reflect actual CLI output.

## Key Files by Purpose

### Entry Points
- `bin/git-ai.ts`: Main CLI—proxies to git for non-AI commands, registers `ai` command
- `src/commands/ai.ts`: AI command registry (all `git-ai ai *` subcommands)

### Indexing System
- `src/core/indexer.ts`: Parallel indexing with HNSW vector index
- `src/core/indexerIncremental.ts`: Smart rebuild strategies
- `src/core/parser.ts`: Multi-language Tree-sitter adapters
- `src/core/embedding.ts`: ONNX runtime for local embeddings
- `src/core/lancedb.ts`: LanceDB management (chunks table)
- `src/core/sq8.ts`: Vector quantization for storage efficiency
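`sq8.ts` is described only as vector quantization for storage efficiency. A typical 8-bit scalar quantizer (SQ8) maps each float component to one byte using a minimum and a scale. That cuts storage to roughly a quarter of a `Float32Array`. The sketch below assumes a per-vector min/max range. The project's actual scheme (for example, per-dimension ranges or a different byte layout) may differ.

```typescript
// 8-bit scalar quantization: one byte per dimension plus (min, scale).
interface Sq8Vector {
  codes: Uint8Array; // quantized values in [0, 255]
  min: number;       // smallest original component
  scale: number;     // (max - min) / 255, or 0 for a constant vector
}

function quantizeSq8(v: Float32Array): Sq8Vector {
  let min = Infinity;
  let max = -Infinity;
  for (let i = 0; i < v.length; i++) {
    if (v[i] < min) min = v[i];
    if (v[i] > max) max = v[i];
  }
  const scale = max > min ? (max - min) / 255 : 0;
  const codes = new Uint8Array(v.length);
  for (let i = 0; i < v.length; i++) {
    // Round to the nearest of 256 evenly spaced levels between min and max.
    codes[i] = scale === 0 ? 0 : Math.round((v[i] - min) / scale);
  }
  return { codes, min, scale };
}

function dequantizeSq8(q: Sq8Vector): Float32Array {
  const out = new Float32Array(q.codes.length);
  for (let i = 0; i < q.codes.length; i++) out[i] = q.min + q.codes[i] * q.scale;
  return out;
}

// Usage: reconstruction error is bounded by half a quantization step.
const original = Float32Array.from([0.12, -0.5, 0.93, 0.0]);
const q = quantizeSq8(original);
const restored = dequantizeSq8(q);
console.log(q.codes, restored);
```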

### Search & Retrieval
- `src/core/search.ts`: Query classification and multi-strategy routing
- `src/core/symbolSearch.ts`: Symbol-based search functionality
- `src/core/astGraphQuery.ts`: Graph-based call relationship queries

### Graph Database
- `src/core/cozo.ts`: CozoDB interface (refs table)
- `src/core/astGraph.ts`: AST graph construction

### Repository Management
- `src/core/git.ts`: Git repository handling
- `src/core/workspace.ts`: Workspace path resolution
- `src/core/manifest.ts`: Index versioning and compatibility checking
- `src/core/indexCheck.ts`: Index validation

### Archive & Distribution
- `src/core/archive.ts`: Pack/unpack index archives (.git-ai/lancedb.tar.gz)
- `src/core/lfs.ts`: Git LFS integration for index storage

### MCP Server
- `src/mcp/server.ts`: MCP server implementation (stdio + HTTP modes)
- `src/mcp/handlers/`: MCP tool implementations
- `src/mcp/tools/`: MCP tool registry

## MCP Integration

The MCP server enables AI agents to query git-ai indices. Every MCP tool requires a `path` parameter naming the target repository. There is no implicit repository selection, so each call is atomic and self-contained.

**Two modes**:
- **stdio mode** (default): Single-agent connection
- **HTTP mode** (`--http`): Multiple concurrent agents with session management
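To illustrate the explicit-`path` rule, here is a minimal sketch of a guard that a tool handler might run first. `resolveRepoPath` is hypothetical. Requiring `.git` at exactly the given path is also an assumption; the real server may accept subdirectories and walk up to the repository root.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Hypothetical guard run before any tool logic: the caller must name the
// repository explicitly; nothing falls back to the server's working directory.
function resolveRepoPath(args: { path?: unknown }): string {
  if (typeof args.path !== 'string' || args.path.trim() === '') {
    throw new Error('missing required "path" parameter');
  }
  const repoRoot = path.resolve(args.path);
  // `.git` is a directory in normal clones and a file in linked worktrees;
  // existsSync accepts both.
  if (!fs.existsSync(path.join(repoRoot, '.git'))) {
    throw new Error(`not a Git repository root: ${repoRoot}`);
  }
  return repoRoot;
}

// Usage: a throwaway directory containing .git passes the guard.
const demo = fs.mkdtempSync(path.join(os.tmpdir(), 'git-ai-demo-'));
fs.mkdirSync(path.join(demo, '.git'));
const resolved = resolveRepoPath({ path: demo });
console.log(resolved === path.resolve(demo)); // true
fs.rmSync(demo, { recursive: true, force: true });
```

Because the guard never consults the process's current directory, two agents sharing one HTTP-mode server cannot accidentally query each other's repositories.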

## Language Support

Supported languages are defined in `src/core/parser.ts`:
- TypeScript/JavaScript (`.ts`, `.tsx`, `.js`, `.jsx`)
- Java (`.java`)
- Python (`.py`)
- Go (`.go`)
- Rust (`.rs`)
- C (`.c`, `.h`)
- Markdown (`.md`, `.mdx`)
- YAML (`.yml`, `.yaml`)

Each language has a separate LanceDB table with its own HNSW index.
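Because each language gets its own table, files must first be routed by extension. Below is a sketch of that lookup, transcribed from the list above. The `Lang` identifiers and the `langForPath` helper are hypothetical and are not exports of `parser.ts`.

```typescript
// Hypothetical language identifiers; parser.ts may use different names.
type Lang = 'typescript' | 'java' | 'python' | 'go' | 'rust' | 'c' | 'markdown' | 'yaml';

// Extension table transcribed from the supported-language list above
// (TypeScript and JavaScript are grouped, as in that list).
const EXT_TO_LANG: Record<string, Lang> = {
  '.ts': 'typescript', '.tsx': 'typescript', '.js': 'typescript', '.jsx': 'typescript',
  '.java': 'java',
  '.py': 'python',
  '.go': 'go',
  '.rs': 'rust',
  '.c': 'c', '.h': 'c',
  '.md': 'markdown', '.mdx': 'markdown',
  '.yml': 'yaml', '.yaml': 'yaml',
};

// Returns the language for a file path, or undefined if it is not indexed.
function langForPath(filePath: string): Lang | undefined {
  const dot = filePath.lastIndexOf('.');
  const slash = Math.max(filePath.lastIndexOf('/'), filePath.lastIndexOf('\\'));
  if (dot <= slash) return undefined; // no extension in the final path segment
  return EXT_TO_LANG[filePath.slice(dot)];
}

console.log(langForPath('src/core/parser.ts')); // typescript
console.log(langForPath('README.md'));          // markdown
console.log(langForPath('Makefile'));           // undefined
```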

## File Filtering

Indexing respects three filter mechanisms, in priority order (highest first):
1. `.aiignore` - Highest priority, explicit exclusions
2. `.git-ai/include.txt` - Force-includes matching paths, overriding `.gitignore`
3. `.gitignore` - Standard Git ignore patterns

Pattern syntax: `**` (any dirs), `*` (any chars), `directory/` (entire dir)
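One reading of the priority order above is sketched below. `.aiignore` wins outright; otherwise a match in `include.txt` forces inclusion; otherwise `.gitignore` can exclude. The glob translation handles only the three listed syntaxes (`**`, `*`, trailing `/`), and `shouldIndex`, `globToRegExp`, and `FilterRules` are hypothetical names. The real indexer likely uses a complete gitignore matcher.

```typescript
// Translate the documented pattern forms into a RegExp over
// '/'-separated paths relative to the repo root:
//   **   -> any characters, including '/'
//   *    -> any characters except '/'
//   dir/ -> the directory and everything beneath it
function globToRegExp(pattern: string): RegExp {
  let p = pattern.trim();
  if (p.endsWith('/')) p = p.slice(0, -1);
  // As in .gitignore, a pattern with no '/' matches at any depth.
  const anchored = p.includes('/');
  if (p.startsWith('/')) p = p.slice(1);
  let body = '';
  for (let i = 0; i < p.length; i++) {
    const c = p[i];
    if (c === '*' && p[i + 1] === '*') {
      if (p[i + 2] === '/') {
        body += '(?:.*/)?'; // "**/" also matches zero directories
        i += 2;
      } else {
        body += '.*';
        i += 1;
      }
    } else if (c === '*') {
      body += '[^/]*';
    } else {
      body += c.replace(/[.+?^${}()|[\]\\]/g, '\\$&'); // escape regex metacharacters
    }
  }
  const prefix = anchored ? '^' : '^(?:.*/)?';
  // A match on a directory also covers every path beneath it.
  return new RegExp(`${prefix}${body}(?:/.*)?$`);
}

interface FilterRules {
  aiignore: string[];  // lines from .aiignore
  include: string[];   // lines from .git-ai/include.txt
  gitignore: string[]; // lines from .gitignore
}

function matchesAny(patterns: string[], relPath: string): boolean {
  return patterns.some(p => p.trim() !== '' && globToRegExp(p).test(relPath));
}

// Priority: .aiignore > .git-ai/include.txt > .gitignore
function shouldIndex(relPath: string, rules: FilterRules): boolean {
  if (matchesAny(rules.aiignore, relPath)) return false;
  if (matchesAny(rules.include, relPath)) return true;
  return !matchesAny(rules.gitignore, relPath);
}

const rules: FilterRules = {
  aiignore: ['**/*.min.js'],
  include: ['generated/api/'],
  gitignore: ['generated/', 'node_modules/'],
};
console.log(shouldIndex('src/app.ts', rules));                  // true
console.log(shouldIndex('node_modules/x/index.js', rules));     // false
console.log(shouldIndex('generated/api/client.ts', rules));     // true (force-included)
console.log(shouldIndex('generated/api/client.min.js', rules)); // false (.aiignore wins)
```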

## Testing

Tests live in `test/` and use several file formats (`.test.mjs`, `.test.ts`, `.test.js`).

Run single tests with Node's native test runner:
```bash
node --test test/cliCommands.test.js
```

## Native Dependencies

This project uses native modules that may need build tools:
- `@lancedb/lancedb` - Vector database (platform-specific prebuilt binaries)
- `cozo-node` - Graph database
- `onnxruntime-node` - ONNX runtime
- `tree-sitter-*` - Language parsers

If native builds fail, ensure:
- Node.js >= 18
- Build tools installed (Windows: Visual Studio Build Tools, Linux: build-essential)

## Common Tasks

**Add a new CLI command**:
1. Create handler in `src/cli/handlers/yourHandler.ts`
2. Create Zod schema in `src/cli/schemas/` (optional)
3. Register in `src/cli/registry.ts`
4. Add Commander command in `src/cli/commands/yourCommand.ts`
5. Register in `src/commands/ai.ts`

**Add language support**:
1. Add Tree-sitter grammar in `package.json` dependencies
2. Extend `src/core/parser.ts` with new language adapter
3. Test with `npm run test:parser`

**Add MCP tool**:
1. Create handler in `src/mcp/handlers/`
2. Register in `src/mcp/tools/`
3. Export from `src/mcp/server.ts`
3 changes: 2 additions & 1 deletion package.json
@@ -11,7 +11,8 @@
"scripts": {
"build": "tsc",
"start": "ts-node bin/git-ai.ts",
-"test": "npm run build && node dist/bin/git-ai.js ai index --overwrite && node --test test/*.test.mjs test/*.test.ts",
+"test": "npm run build && node dist/bin/git-ai.js ai index --overwrite && node --test test/*.test.mjs test/*.test.ts test/*.test.js",
+"test:cli": "bash test-cli.sh",
Copilot AI Feb 13, 2026

npm run test:cli is implemented via bash test-cli.sh, which will fail on Windows environments that don’t have bash available. Since this package declares win32 support in package.json, consider making the CLI test runner cross-platform (e.g., a small Node script) or integrating the CLI tests into the existing node --test ... invocation.

Suggested change
-"test:cli": "bash test-cli.sh",
+"test:cli": "node test-cli.js",

"test:parser": "ts-node test/verify_parsing.ts"
},
"files": [
12 changes: 9 additions & 3 deletions src/cli/handlers/queryFilesHandlers.ts
@@ -10,7 +10,6 @@ import type { SearchFilesInput } from '../schemas/queryFilesSchemas';
import {
isCLIError,
buildRepoMapAttachment,
-filterWorkspaceRowsByLang,
} from './sharedHelpers';

function escapeQuotes(s: string): string {
@@ -249,11 +248,18 @@ export async function handleSearchFiles(input: SearchFilesInput): Promise<CLIRes

const repoMap = input.withRepoMap ? await buildRepoMapAttachment(ctx.repoRoot, input) : undefined;

+const files = rows.map(r => ({
+path: String(r.file || ''),
+symbol: String(r.symbol || ''),
+kind: String(r.kind || ''),
+lang: String(r.lang || ''),
+}));
+
return success({
repoRoot: ctx.repoRoot,
-count: rows.length,
+count: files.length,
lang: input.lang,
-rows,
+files,
...(repoMap ? { repo_map: repoMap } : {}),
Comment on lines 259 to 263

P1: Normalize workspace query-files payload to `files`

This branch now returns files, but handleSearchFiles still returns rows when inferWorkspaceRoot(repoRoot) is true (return success({ ...res, rows, ... }) in the workspace path), so the response schema depends on repo type. In manifest workspaces, clients that adopted the new files[].path contract will break even though the same command was called; both branches should emit the same top-level shape (files/count) to keep the CLI API consistent.

});
Comment on lines 258 to 264
Copilot AI Feb 13, 2026

This handler logs duration_ms, and the PR description/CLAUDE.md describe timestamps + duration being included in all CLI JSON outputs, but the returned success({...}) payload here doesn’t include timestamp or duration_ms. If the standardized output format is part of this PR, add those fields to the returned JSON (or ensure a shared wrapper injects them consistently).

} catch (e) {
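The P1 comment asks both branches of `handleSearchFiles` to emit the same top-level shape. One way to do that is to normalize rows in a single function and call it from both the single-repo and the workspace paths. The sketch below uses a hypothetical `toFileEntries`/`filesPayload` pair and a simplified row type, since the real row type is not shown in this diff.

```typescript
// Simplified stand-in for the LanceDB/workspace row type (assumption).
type Row = Record<string, unknown>;

interface FileEntry {
  path: string;
  symbol: string;
  kind: string;
  lang: string;
}

// Same normalization as the new single-repo branch: missing fields become ''.
function toFileEntries(rows: Row[]): FileEntry[] {
  return rows.map(r => ({
    path: String(r.file || ''),
    symbol: String(r.symbol || ''),
    kind: String(r.kind || ''),
    lang: String(r.lang || ''),
  }));
}

// Hypothetical payload builder shared by both branches, so clients always
// receive files/count regardless of repository type.
function filesPayload(repoRoot: string, lang: string | undefined, rows: Row[]) {
  const files = toFileEntries(rows);
  return { repoRoot, count: files.length, lang, files };
}

const payload = filesPayload('/path/to/repo', 'ts', [
  { file: 'src/core/search.ts', symbol: 'search', kind: 'function', lang: 'ts' },
  { file: 'src/core/lfs.ts' },
]);
console.log(JSON.stringify(payload));
```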
8 changes: 4 additions & 4 deletions src/core/lfs.ts
@@ -1,7 +1,7 @@
import { spawnSync } from 'child_process';

-function runGit(args: string[], cwd: string) {
-const res = spawnSync('git', args, { cwd, stdio: 'inherit' });
+function runGit(args: string[], cwd: string, silent: boolean = false) {
+const res = spawnSync('git', args, { cwd, stdio: silent ? 'ignore' : 'inherit' });
if (res.status !== 0) throw new Error(`git ${args.join(' ')} failed`);
⚠️ WARNING: Error information is lost in silent mode

When silent=true, spawnSync's stderr output is discarded. If the git command fails, the user cannot see any cause and only receives a vague error message, which makes debugging difficult.

Suggestion: When throwing the error, include the information from res.error and res.stderr so that meaningful diagnostics are available even in silent mode.

Suggested change
-if (res.status !== 0) throw new Error(`git ${args.join(' ')} failed`);
+if (res.status !== 0) {
+const msg = res.error?.message || res.stderr?.toString() || '';
+throw new Error(`git ${args.join(' ')} failed: ${msg}`);
+}

💡 SUGGESTION: The res.error field is not checked

The res.error field returned by spawnSync (set when spawning itself fails, e.g. the command does not exist) is not checked. If git is not on the PATH, res.status may be null and the error information is lost.

Suggestion: Check res.error before checking status.

Suggested change
-if (res.status !== 0) throw new Error(`git ${args.join(' ')} failed`);
+if (res.error) throw new Error(`git spawn failed: ${res.error.message}`);
+if (res.status !== 0) throw new Error(`git ${args.join(' ')} failed`);

Comment on lines +4 to 5
Copilot AI Feb 13, 2026

When silent is true, spawnSync uses stdio: 'ignore', so any git error output is discarded. If the command fails, the thrown error (git ... failed) provides no stderr/stdout context, which makes diagnosing LFS tracking failures difficult. Consider using stdio: 'pipe' (or at least capturing stderr) in silent mode and include the captured output in the thrown error message while still suppressing normal console noise.

Suggested change
-const res = spawnSync('git', args, { cwd, stdio: silent ? 'ignore' : 'inherit' });
-if (res.status !== 0) throw new Error(`git ${args.join(' ')} failed`);
+const res = spawnSync('git', args, { cwd, stdio: silent ? 'pipe' : 'inherit' });
+if (res.status !== 0) {
+let details = '';
+if (res.stdout) {
+const stdout = res.stdout.toString().trim();
+if (stdout) details += `\nstdout:\n${stdout}`;
+}
+if (res.stderr) {
+const stderr = res.stderr.toString().trim();
+if (stderr) details += `\nstderr:\n${stderr}`;
+}
+throw new Error(`git ${args.join(' ')} failed${details}`);
+}

}

@@ -18,7 +18,7 @@ export function isGitLfsInstalled(cwd: string): boolean {

export function ensureLfsTracking(cwd: string, pattern: string): { tracked: boolean } {
if (!isGitLfsInstalled(cwd)) return { tracked: false };
-runGit(['lfs', 'track', pattern], cwd);
-runGit(['add', '.gitattributes'], cwd);
+runGit(['lfs', 'track', pattern], cwd, true);
+runGit(['add', '.gitattributes'], cwd, true);
return { tracked: true };
}
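The three review comments on `runGit` can be combined. The sketch below is one possible shape, not the merged code. It checks `res.error` first (the process could not be spawned, e.g. git is missing from `PATH`). It uses `stdio: 'pipe'` in silent mode so that output is captured rather than discarded, and it appends any captured stderr to the thrown error. The generic `runChecked` helper is hypothetical; it exists so the same logic can be exercised with any executable.

```typescript
import { spawnSync } from 'node:child_process';

// Hypothetical generic runner: in silent mode, capture output instead of
// discarding it, and surface the captured stderr if the command fails.
function runChecked(cmd: string, args: string[], cwd: string, silent = false): void {
  const res = spawnSync(cmd, args, {
    cwd,
    stdio: silent ? 'pipe' : 'inherit',
    encoding: 'utf8',
  });
  const label = `${cmd} ${args.join(' ')}`.trim();
  // res.error is set when the process could not be spawned at all
  // (for example, the executable is not on PATH); res.status is then null.
  if (res.error) {
    throw new Error(`${label} could not be started: ${res.error.message}`);
  }
  if (res.status !== 0) {
    const how = res.status !== null ? `exit code ${res.status}` : `signal ${res.signal}`;
    // With stdio 'inherit', stderr was already shown and res.stderr is null.
    const stderr = typeof res.stderr === 'string' ? res.stderr.trim() : '';
    throw new Error(`${label} failed with ${how}${stderr ? `\nstderr:\n${stderr}` : ''}`);
  }
}

// runGit keeps the signature used by ensureLfsTracking above.
function runGit(args: string[], cwd: string, silent = false): void {
  runChecked('git', args, cwd, silent);
}
```

With this shape, `ensureLfsTracking` can keep calling `runGit([...], cwd, true)` unchanged. Successful runs stay quiet, and failures still explain themselves.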
7 changes: 7 additions & 0 deletions test-cli.sh
@@ -0,0 +1,7 @@
#!/bin/bash
if [ -f test/cliCommands.test.js ]; then
npm run build && node --test test/cliCommands.test.js
else
echo "cliCommands.test.js not found (skipping CLI tests)"
exit 0
fi
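As the review comment on `package.json` notes, `bash test-cli.sh` will not run on Windows machines without bash. Below is a minimal cross-platform sketch of the same logic, written in TypeScript to match the repository. It assumes it is launched from the package root, as npm scripts are. It could be run with `ts-node` like the existing `test:parser` script, or compiled to the `test-cli.js` the reviewer suggests. These file names are assumptions.

```typescript
import { existsSync } from 'node:fs';
import { spawnSync } from 'node:child_process';

// Run a command with inherited stdio and return its exit code.
function run(cmd: string, args: string[], useShell = false): number {
  const res = spawnSync(cmd, args, { stdio: 'inherit', shell: useShell });
  if (res.error) throw res.error;
  return res.status ?? 1; // null status means the child was killed by a signal
}

function main(): number {
  const testFile = 'test/cliCommands.test.js';
  if (!existsSync(testFile)) {
    console.log('cliCommands.test.js not found (skipping CLI tests)');
    return 0;
  }
  // On Windows, npm is a .cmd shim and must be launched through a shell.
  const buildStatus = run('npm', ['run', 'build'], process.platform === 'win32');
  if (buildStatus !== 0) return buildStatus;
  // process.execPath is the running node binary, so no PATH lookup is needed.
  return run(process.execPath, ['--test', testFile]);
}

process.exitCode = main();
```

This mirrors the shell script's behavior: it skips with exit code 0 when the test file is absent, and otherwise builds and then runs the CLI tests. It avoids any dependency on bash.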
7 changes: 3 additions & 4 deletions test/e2e.test.js
@@ -86,10 +86,9 @@ test('git-ai works in Spring Boot and Vue repos', async () => {
runOk('node', [CLI, 'ai', 'agent', 'install'], repo);
assert.ok(runOk('node', [CLI, 'ai', 'agent', 'install', '--overwrite'], repo).status === 0);
{
-const skill = await fs.readFile(path.join(repo, '.agents', 'skills', 'git-ai-mcp', 'SKILL.md'), 'utf-8');
-const rule = await fs.readFile(path.join(repo, '.agents', 'rules', 'git-ai-mcp', 'RULE.md'), 'utf-8');
-assert.ok(skill.includes('git-ai-mcp'));
-assert.ok(rule.includes('git-ai-mcp'));
+// git-ai-code-search has SKILL.md but no RULE.md, so only check SKILL
⚠️ WARNING: Test coverage reduced

The original code verified that both SKILL.md and RULE.md existed and contained the right content; after this change, only SKILL.md is verified. If git-ai-code-search is supposed to have both SKILL and RULE components, the test is incomplete.

Suggestion: Confirm that git-ai-code-search really does not need a RULE.md file. If it does not, add a comment stating that this is expected behavior rather than an oversight. If a RULE may be needed in the future, keep the original assertion or add an explicit TODO comment.

Suggested change
-// git-ai-code-search has SKILL.md but no RULE.md, so only check SKILL
+// Confirm that git-ai-code-search really does not need RULE.md
+// TODO: if RULE.md is added in the future, restore the RULE assertion
+const skill = await fs.readFile(...)

+const skill = await fs.readFile(path.join(repo, '.agents', 'skills', 'git-ai-code-search', 'SKILL.md'), 'utf-8');
+assert.ok(skill.includes('git-ai-code-search'), 'git-ai-code-search skill should be installed');
💡 SUGGESTION: File read has no error handling

The fs.readFile call is not wrapped in a try-catch. If the file does not exist or the read fails, the test throws an unhandled Promise rejection instead of a clear assertion failure.

Suggestion: Use try-catch, or the assertion library's error expectations, to handle possible file-read errors.

Suggested change
-assert.ok(skill.includes('git-ai-code-search'), 'git-ai-code-search skill should be installed');
+try {
+const skill = await fs.readFile(path.join(repo, '.agents', 'skills', 'git-ai-code-search', 'SKILL.md'), 'utf-8');
+assert.ok(skill.includes('git-ai-code-search'), 'git-ai-code-search skill should be installed');
+} catch (err) {
+assert.fail(`Failed to read SKILL.md: ${err.message}`);
+}

}
runOk('git', ['add', '.git-ai/meta.json', '.git-ai/lancedb.tar.gz'], repo);
runOk('git', ['commit', '-m', 'add git-ai index'], repo);