Optimize rebase/squash rewrite performance and expand regression coverage #515

Open
svarlamov wants to merge 3 commits into feat/corehooks from feat/rebase-perf-feb-11-1


svarlamov commented Feb 13, 2026

Summary

This PR substantially reduces git-ai overhead for rewrite workflows (rebase, merge --squash, and related post-commit paths), while tightening correctness with targeted regression tests.

The work is focused on first-principles performance improvements:

  • reduce total work done,
  • reduce number of git subprocesses,
  • avoid content hydration unless strictly necessary,
  • preserve existing authorship semantics.

Problem

In large repos and rewrite-heavy flows (Graphite stacks, large generated diffs, long commit ranges), rewrite-related operations were dominated by:

  • many small sequential git calls,
  • broad-path scans instead of staged/changed subsets,
  • expensive attribution/materialization on paths that cannot affect AI output,
  • avoidable per-commit/per-file note/path parsing.

This led to multi-minute stalls in real rebase/squash scenarios.

Design/Approach

1) Shrink worksets to AI-relevant + staged/changed paths

  • Squash prep now starts from actual staged files after merge --squash instead of broad branch diffs.
  • Path sets are narrowed to files touched by AI notes in relevant commit ranges.
  • When the staged set is small, squash prep bypasses the expensive source-side file extraction where it is safe to do so.
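
A minimal sketch of the workset narrowing described above, assuming the staged paths and the set of AI-touched paths are already available as plain strings. The helper name `narrow_workset` and its signature are hypothetical and are not taken from this PR:

```rust
use std::collections::BTreeSet;

/// Hypothetical helper (not from the PR): keep only the staged paths that
/// also appear in the set of paths touched by AI notes. Everything else
/// cannot affect AI attribution output, so it is dropped from the workset.
fn narrow_workset<'a>(staged: &[&'a str], ai_touched: &BTreeSet<String>) -> Vec<&'a str> {
    staged
        .iter()
        .copied()
        // `*p` is a `&str`, which a `BTreeSet<String>` can look up directly.
        .filter(|p| ai_touched.contains(*p))
        .collect()
}
```

The order of the staged list is preserved, so downstream processing stays deterministic.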

2) Batch git object reads

  • Added batched content readers using git cat-file --batch:
    • Repository::get_files_content_at_commit
    • optimized Repository::get_all_staged_files_content
  • Rebase/squash and traversal code now hydrate blobs in bulk rather than per-file calls.
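
For context, `git cat-file --batch` answers each requested object with a header line `<oid> <type> <size>`, then `<size>` bytes of content and a trailing newline, or with `<name> missing` when the object cannot be found. The sketch below parses that stream from an in-memory buffer. It is an illustration only: `BatchEntry` and `parse_batch_output` are hypothetical names, and this is not the parser used by `Repository::get_files_content_at_commit`.

```rust
/// One entry parsed from `git cat-file --batch` output (hypothetical type).
#[derive(Debug, PartialEq)]
enum BatchEntry {
    Found { oid: String, kind: String, data: Vec<u8> },
    Missing { name: String },
}

/// Parse raw `git cat-file --batch` stdout.
/// Returns `None` when the payload is truncated or malformed.
fn parse_batch_output(mut buf: &[u8]) -> Option<Vec<BatchEntry>> {
    let mut entries = Vec::new();
    while !buf.is_empty() {
        // Header line: "<oid> <type> <size>" or "<name> missing".
        let nl = buf.iter().position(|&b| b == b'\n')?;
        let header = std::str::from_utf8(&buf[..nl]).ok()?;
        buf = &buf[nl + 1..];
        if let Some(name) = header.strip_suffix(" missing") {
            entries.push(BatchEntry::Missing { name: name.to_string() });
            continue;
        }
        let mut parts = header.split(' ');
        let (oid, kind, size) = (parts.next()?, parts.next()?, parts.next()?);
        let size: usize = size.parse().ok()?;
        // The content must be fully present and followed by a single LF.
        if buf.len() <= size || buf[size] != b'\n' {
            return None;
        }
        entries.push(BatchEntry::Found {
            oid: oid.to_string(),
            kind: kind.to_string(),
            data: buf[..size].to_vec(),
        });
        buf = &buf[size + 1..];
    }
    Some(entries)
}
```

Reading by the declared size, rather than splitting on newlines, is what lets one subprocess serve many blobs safely, including binary content.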

3) Fast-paths for no-op / metadata-only scenarios

  • Rebase note remap fast path for unchanged tracked blobs.
  • Squash prep direct-reuse path when target AI side is empty and staged content matches source AI state.
  • Metadata-only note remap is retained where no AI-touched files need a full rewrite.
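
One way to picture the unchanged-blob check behind these fast paths is shown below. It assumes path-to-blob-OID maps for the original and rewritten commits are already in hand; the function name `tracked_blobs_unchanged` and the map layout are hypothetical, not the PR's actual data structures:

```rust
use std::collections::HashMap;

/// Hypothetical check (not from the PR): returns true when every AI-tracked
/// path resolves to the same blob OID before and after the rewrite, in which
/// case notes could be remapped without rehydrating any file content.
/// A path absent from both maps also counts as unchanged.
fn tracked_blobs_unchanged(
    tracked_paths: &[&str],
    original: &HashMap<String, String>,  // path -> blob OID in the original commit
    rewritten: &HashMap<String, String>, // path -> blob OID in the rewritten commit
) -> bool {
    tracked_paths
        .iter()
        .all(|p| original.get(*p) == rewritten.get(*p))
}
```

Only tracked paths are compared, so changes to untracked files do not force the slow path.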

4) Avoid unnecessary heavy attribution/materialization

  • VirtualAttributions now supports line-only/initial-only loaders for post-commit paths:
    • from_just_working_log_line_only
    • from_initial_only_line_only
  • This avoids eager file-content + char-range hydration where line-level data is sufficient.
  • The blame path adds a machine-mode fast path that skips presentation-only human-author hydration when it is not needed.

5) Squash pre-commit checkpoint skip handshake

  • Squash prep stores the staged index tree OID as a marker.
  • The pre-commit checkpoint fast path skips when the tree is unchanged and INITIAL already contains the needed AI attribution.
  • Post-commit consumes INITIAL directly in this path.
  • The marker is cleaned up and reset as part of its lifecycle.
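
The skip decision can be sketched as a pure predicate over the two conditions listed above. The struct and field names are hypothetical; the real marker storage and INITIAL inspection live in the repository and checkpoint code and are not reproduced here:

```rust
/// Hypothetical snapshot of the inputs to the pre-commit fast path.
struct SquashState<'a> {
    /// Tree OID stored by squash prep, if a marker exists.
    marker_tree_oid: Option<&'a str>,
    /// Tree OID of the index at pre-commit time.
    current_tree_oid: &'a str,
    /// Whether INITIAL already holds the needed AI attribution.
    initial_has_attribution: bool,
}

/// Skip the checkpoint only when the marker matches the current index tree
/// AND the needed attribution is already recorded. A missing marker or a
/// mismatched tree falls back to the normal checkpoint path.
fn can_skip_checkpoint(s: &SquashState<'_>) -> bool {
    s.initial_has_attribution && s.marker_tree_oid == Some(s.current_tree_oid)
}
```

Requiring both conditions keeps the fast path conservative: any edit to the index after squash prep changes the tree OID and disables the skip.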

Correctness & Semantics

Key invariant preserved:

  • We do not change authorship semantics to chase speed.

Notable fix included:

  • Restored virtual-attribution file tracking semantics so that files are not dropped just because their blamed AI lines are empty (this previously caused a note-loss regression in certain rebase cases).

Test Coverage Added

New edge/regression tests cover:

  • squash pre-commit skip marker match/mismatch behavior,
  • squash marker persistence + cleanup lifecycle,
  • post-commit filtering predicate correctness for human/AI/override paths,
  • line-attribution compression merge boundaries,
  • VirtualAttributions initial-only and message-clear behavior,
  • authorship traversal parser robustness (quoted paths, truncated batch payload),
  • batched repository readers and index-tree OID behavior.
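
To illustrate what a "merge boundary" means for line-attribution compression, here is a small sketch that assumes compression collapses consecutive lines with the same author into inclusive ranges. That reading, along with `LineRange` and `compress_lines`, is an assumption made for illustration and is not the PR's implementation:

```rust
/// Inclusive range of lines attributed to one author (hypothetical type).
#[derive(Debug, PartialEq, Clone)]
struct LineRange {
    start: u32,
    end: u32,
    author: String,
}

/// Collapse sorted (line, author) pairs into ranges. Two entries merge only
/// when the author matches and the line numbers are adjacent; a gap or an
/// author change starts a new range.
fn compress_lines(lines: &[(u32, &str)]) -> Vec<LineRange> {
    let mut out: Vec<LineRange> = Vec::new();
    for &(line, author) in lines {
        if let Some(last) = out.last_mut() {
            if last.author == author && last.end.checked_add(1) == Some(line) {
                last.end = line; // extend the current range
                continue;
            }
        }
        out.push(LineRange { start: line, end: line, author: author.to_string() });
    }
    out
}
```

The boundary cases worth testing are exactly the two break conditions: an author change between adjacent lines, and a gap between lines by the same author.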

Representative test additions:

  • src/commands/checkpoint.rs
  • src/git/repo_storage.rs
  • src/authorship/post_commit.rs
  • src/authorship/rebase_authorship.rs
  • src/authorship/virtual_attribution.rs
  • src/git/authorship_traversal.rs
  • src/git/repository.rs

Bench/Diagnostics

  • Added benchmark harness for heavy squash scenarios:
    • scripts/benchmarks/git/benchmark_nasty_squashes.sh
  • Added/expanded perf logging around rewrite/post-commit hot paths to make bottlenecks explicit.

Validation

Ran full suite successfully:

  • cargo test -- --test-threads=1

No test failures.

Scope

Primary touched areas:

  • rewrite orchestration (rebase_authorship, squash prep/rewrite)
  • virtual attribution loading/conversion paths
  • blame/traversal batching optimizations
  • checkpoint/post-commit fast paths
  • repository/storage helpers for batch reads and squash markers


@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 2 potential issues.

View 10 additional findings in Devin Review.

git-ai-cloud-dev bot commented Feb 13, 2026

Stats powered by Git AI

🧠 you    ████████████████░░░░  82%
🤖 ai     ░░░░░░░░░░░░░░░░████  18%
More stats
  • 0.0 lines generated for every 1 accepted
  • 0 seconds waiting for AI
  • Top model: codex::gpt-5.3-codex (804 accepted lines, 7 generated lines)

AI code tracked with git-ai

CLAassistant commented Feb 13, 2026

CLA assistant check
All committers have signed the CLA.

git-ai-cloud bot commented Feb 13, 2026

Stats powered by Git AI

🧠 you    ████████████████████  100%
🤖 ai     ░░░░░░░░░░░░░░░░░░░░  0%
More stats
  • 0.0 lines generated for every 1 accepted
  • 0 seconds waiting for AI

svarlamov force-pushed the feat/rebase-perf-feb-11-1 branch from 3864488 to 313ae9f on February 14, 2026 19:18
svarlamov changed the base branch from main to feat/corehooks on February 14, 2026 19:18

2 participants