feat(replicache): Add bulk insertion optimization with putMany #5380
Conversation
Add efficient bulk insertion methods (`putMany`) to the BTree and database layers, significantly improving performance for large batch operations such as sync patches.

Core Changes
- Add `putMany()` to `BTreeWrite` with a fast path for empty trees and a slow path for merging
- Add `putMany()` to `DataNodeImpl` and `InternalNodeImpl` for node-level bulk operations
- Add `Write.putMany()` in the database layer with index update support
- Add `optimizePatch()` to eliminate redundant operations in sync patches
- Extract `binarySearchFrom()` to enable optimized searching from a start index

Performance

Benchmark results (putMany vs sequential put):
- **100 entries (small values)**: 3.36x faster
- **1,000 entries (small values)**: 5.30x faster
- **10,000 entries (small values)**: 4.15x faster
- **Construction only (10,000 entries)**: 53.73x faster
- **Update operations (1,000 entries)**: 4.47x faster

Additional benefits:
- Reduces chunk writes through optimal tree construction
- Minimizes redundant operations through patch optimization

Testing
- Add comprehensive test suite covering bulk operations, rebalancing, and edge cases
- Add performance benchmarks comparing sequential `put()` vs `putMany()`
- Add patch optimization tests with 24 scenarios

Implementation Details

Fast path (empty tree):
- Builds the tree bottom-up using optimal partitioning
- Constructs the ideal tree structure in a single pass
- Reuses arrays to minimize allocations

Slow path (existing tree):
- Groups entries by affected child nodes
- Performs batch rebalancing per group
- Uses a restricted binary search for sorted input

Patch optimization:
- Drops operations before the last clear
- Merges consecutive operations on the same key
- Removes pointless deletes after a clear
- Sorts operations for optimal bulk loading

No breaking changes. This is an additive optimization compatible with the V6 and V7 formats.
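The fast path's bottom-up construction can be sketched roughly as follows. This is an illustrative model, not the actual Replicache implementation: `Entry`, `TreeNode`, `partition`, and the count-based node capacity are stand-ins for Replicache's size-based chunking.

```typescript
type Entry = [key: string, value: unknown];
type TreeNode = {level: number; items: unknown[]};

// Split an array into near-equal chunks of at most maxPerNode items each.
function partition<T>(items: T[], maxPerNode: number): T[][] {
  const count = Math.max(1, Math.ceil(items.length / maxPerNode));
  const size = Math.ceil(items.length / count);
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Build the tree bottom-up: leaves first, then internal levels until a
// single root remains. Assumes at least one entry.
function buildBottomUp(sorted: Entry[], maxPerNode: number): TreeNode {
  let level = 0;
  let nodes: TreeNode[] = partition(sorted, maxPerNode).map(items => ({level, items}));
  while (nodes.length > 1) {
    level++;
    nodes = partition(nodes, maxPerNode).map(items => ({level, items}));
  }
  return nodes[0];
}
```

Because each level is built in one pass over the level below it, the shape of the tree is decided up front rather than by repeated splitting, which is where the construction-only speedup comes from.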
| Branch | arv/basic-repl-btree-opt |
| Testbed | self-hosted |
| Benchmark | Result: ops/s x 1e3 (Δ%) | Lower Boundary: ops/s x 1e3 (limit %) |
|---|---|---|
| src/client/custom.bench.ts > big schema | 146.36 (+7.26%), baseline 136.45 | 120.72 (82.48%) |
| src/client/zero.bench.ts > basics > All 1000 rows x 10 columns (numbers) | 2.67 (+9.88%), baseline 2.43 | 2.13 (79.72%) |
| src/client/zero.bench.ts > pk compare > pk = N | 68.64 (+9.82%), baseline 62.51 | 54.62 (79.57%) |
| src/client/zero.bench.ts > with filter > Lower rows 500 x 10 columns (numbers) | 4.04 (+8.08%), baseline 3.74 | 3.29 (81.60%) |
| Branch | arv/basic-repl-btree-opt |
| Testbed | self-hosted |
| Benchmark | Result: ops/s (Δ%) | Lower Boundary: ops/s (limit %) |
|---|---|---|
| 1 exists: track.exists(album) | 13,248.78 (-2.35%), baseline 13,568.11 | 11,726.40 (88.51%) |
| 10 exists (AND) | 217,268.61 (+10.23%), baseline 197,108.58 | 165,059.45 (75.97%) |
| 10 exists (OR) | 3,970.80 (+1.57%), baseline 3,909.60 | 3,377.88 (85.07%) |
| 12 exists (AND) | 194,538.65 (+12.19%), baseline 173,394.43 | 144,732.98 (74.40%) |
| 12 exists (OR) | 3,256.62 (-2.02%), baseline 3,323.87 | 2,860.63 (87.84%) |
| 12 level nesting | 2,925.18 (+1.35%), baseline 2,886.12 | 2,481.14 (84.82%) |
| 2 exists (AND): track.exists(album).exists(genre) | 5,089.22 (-0.21%), baseline 5,100.05 | 4,416.91 (86.79%) |
| 3 exists (AND) | 1,961.40 (-1.67%), baseline 1,994.77 | 1,744.00 (88.92%) |
| 3 exists (OR) | 980.65 (-1.63%), baseline 996.91 | 864.97 (88.20%) |
| 5 exists (AND) | 310.12 (-1.19%), baseline 313.86 | 272.66 (87.92%) |
| 5 exists (OR) | 162.39 (-1.61%), baseline 165.04 | 142.66 (87.85%) |
| Nested 2 levels: track > album > artist | 4,454.47 (-0.11%), baseline 4,459.38 | 3,875.96 (87.01%) |
| Nested 4 levels: playlist > tracks > album > artist | 735.22 (+0.31%), baseline 732.96 | 640.11 (87.06%) |
| Nested with filters: track > album > artist (filtered) | 3,625.67 (-2.60%), baseline 3,722.41 | 3,255.74 (89.80%) |
| planned: playlist.exists(tracks) | 659.53 (+8.05%), baseline 610.41 | 540.29 (81.92%) |
| planned: track.exists(album) OR exists(genre) | 172.00 (+5.35%), baseline 163.27 | 146.14 (84.97%) |
| planned: track.exists(album) where title="Big Ones" | 7,889.80 (+6.05%), baseline 7,439.85 | 6,655.42 (84.35%) |
| planned: track.exists(album).exists(genre) | 42.07 (+9.24%), baseline 38.52 | 33.95 (80.70%) |
| planned: track.exists(album).exists(genre) with filters | 5,701.89 (+8.87%), baseline 5,237.13 | 4,649.59 (81.54%) |
| planned: track.exists(playlists) | 4.25 (+8.07%), baseline 3.94 | 3.50 (82.31%) |
| unplanned: playlist.exists(tracks) | 641.86 (+8.09%), baseline 593.84 | 525.58 (81.88%) |
| unplanned: track.exists(album) OR exists(genre) | 47.81 (+9.15%), baseline 43.80 | 38.31 (80.13%) |
| unplanned: track.exists(album) where title="Big Ones" | 59.80 (+8.46%), baseline 55.14 | 48.85 (81.68%) |
| unplanned: track.exists(album).exists(genre) | 41.70 (+8.70%), baseline 38.36 | 33.98 (81.49%) |
| unplanned: track.exists(album).exists(genre) with filters | 58.18 (+7.88%), baseline 53.93 | 48.07 (82.63%) |
| unplanned: track.exists(playlists) | 4.20 (+6.81%), baseline 3.93 | 3.50 (83.44%) |
| zpg: all playlists | 5.83 (+4.60%), baseline 5.58 | 5.06 (86.71%) |
| zql: all playlists | 8.30 (+13.15%), baseline 7.34 | 6.23 (75.01%) |
| zql: edit for limited query, inside the bound | 236,243.98 (+14.73%), baseline 205,921.66 | 176,733.26 (74.81%) |
| zql: edit for limited query, outside the bound | 241,826.86 (+16.12%), baseline 208,254.49 | 168,537.64 (69.69%) |
| zql: push into limited query, inside the bound | 115,442.52 (+10.07%), baseline 104,877.52 | 90,152.32 (78.09%) |
| zql: push into limited query, outside the bound | 419,986.62 (+10.77%), baseline 379,152.22 | 305,209.21 (72.67%) |
| zql: push into unlimited query | 352,967.36 (+12.78%), baseline 312,983.55 | 267,596.15 (75.81%) |
| zqlite: all playlists | 1.88 (+10.52%), baseline 1.71 | 1.46 (77.50%) |
| zqlite: edit for limited query, inside the bound | 82,254.27 (+11.68%), baseline 73,654.38 | 59,981.43 (72.92%) |
| zqlite: edit for limited query, outside the bound | 84,532.17 (+15.88%), baseline 72,951.16 | 56,100.14 (66.37%) |
| zqlite: push into limited query, inside the bound | 4,115.09 (+2.53%), baseline 4,013.49 | 3,642.06 (88.50%) |
| zqlite: push into limited query, outside the bound | 94,092.16 (+8.68%), baseline 86,577.73 | 76,592.61 (81.40%) |
| zqlite: push into unlimited query | 133,914.01 (+11.54%), baseline 120,064.02 | 102,272.74 (76.37%) |
| Branch | arv/basic-repl-btree-opt |
| Testbed | Linux |
| Benchmark | File Size: KB (Δ%) | Upper Boundary: KB (limit %) |
|---|---|---|
| zero-package.tgz | 1,800.69 (+0.21%), baseline 1,796.88 | 1,832.82 (98.25%) |
| zero.js | 247.33 (+0.76%), baseline 245.47 | 250.38 (98.78%) |
| zero.js.br | 67.60 (+0.59%), baseline 67.20 | 68.55 (98.61%) |
We've now been using this build of Replicache in our Expo React Native mobile app, using the SQLite kvStore against op-sqlite@15.2, for quite a while and are very happy with it. It is saving us over half of our initial sync snapshot time (download complete to ready).
We've discovered an issue with this branch where the output of watchers doesn't match the patch from the server. I've prepared a unit test to demonstrate the issue which passes on main and fails here. I will send it to @arv. |
grgbkr left a comment:
Very nice optimization work!
```ts
expect(structure2).toEqual(structure1);

// Also verify both trees have the same data
await withRead(dagStore1, async dagRead1 => {
```
Instead of nesting the reads like this, maybe use a helper and do them in sequence:
```ts
function checkContents(dagStore, hash) {
  return withRead(dagStore, async dagRead => {
    const tree = new BTreeRead(
      dagRead,
      formatVersion,
      hash,
      getEntrySize,
      chunkHeaderSize,
    );
    for (let i = 0; i < 500; i++) {
      const key = `key${i.toString().padStart(4, '0')}`;
      expect(await tree.get(key)).toBe(i);
    }
  });
}
await checkContents(dagStore1, hash1);
await checkContents(dagStore2, hash2);
```
```ts
}

for (const formatVersion of [FormatVersion.V6, FormatVersion.V7] as const) {
  test(`putMany empty entries > v${formatVersion}`, async () => {
```
Instead of repeating the `> v${formatVersion}` suffix in every test name, you could do:
```ts
for (const formatVersion of [FormatVersion.V6, FormatVersion.V7] as const) {
  describe(`v${formatVersion}`, () => {
    test(`putMany empty entries`, async () => {
    });
    // etc
  });
}
```
I'm also thinking it is time to remove the old format.
```ts
  });
});

test(`putMany triggers merge and partition > v${formatVersion}`, async () => {
```
How do we know in the tests that a merge or partition is actually being triggered? Comments describing how these tests trigger the various merges/partitions would help to clarify.
```ts
);

// Create internal nodes
currentLevel = parentPartitions.map(entries =>
```
Above you comment "reuse arrays to minimize allocations", but doesn't this `map` create a new array? Should this be

```ts
for (let i = 0; i < parentPartitions.length; i++) {
  currentLevel[i] = this.newInternalNodeImpl(parentPartitions[i], level);
}
currentLevel.length = parentPartitions.length;
```

Or maybe track the logical length of `currentLevel` in its own variable instead of resizing the array:

```ts
for (let i = 0; i < parentPartitions.length; i++) {
  currentLevel[i] = this.newInternalNodeImpl(parentPartitions[i], level);
}
currentLevelLength = parentPartitions.length;
```
```ts
  return;
}

// Slow path: merge with existing tree
```
Is this slow path still faster than putting each entry with a separate put call?
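For context on what the slow path does before rebalancing, the "group entries by affected child nodes" step might look like the sketch below. `groupByChild` and the max-key pivot representation are hypothetical, not the actual Replicache internals; the point is that sorted input lets the child index move only forward, which is the same property the restricted `binarySearchFrom()` exploits.

```typescript
// Hypothetical sketch: assign each sorted incoming key to the child subtree
// that should absorb it. pivots[i] is the max key of child i; the last child
// takes everything greater than the earlier pivots.
function groupByChild(pivots: readonly string[], keys: readonly string[]): string[][] {
  const groups: string[][] = pivots.map(() => []);
  let child = 0;
  for (const key of keys) {
    // Sorted input: never look back at earlier children.
    while (child < pivots.length - 1 && key > pivots[child]) {
      child++;
    }
    groups[child].push(key);
  }
  return groups;
}
```

Each group then gets applied to its child in one batch, so rebalancing happens once per touched child rather than once per entry.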
```ts
);
await apply(lc, dbWrite, patch);

expect(await dbWrite.get('a')).toBe(2);
```
Maybe these tests should use `scan` to read and assert on the entire content of the db?
```ts
 * 1. Dropping all operations before the last 'clear'
 * 2. For each key: put/del replace all previous operations; updates accumulate
 * 3. Removing standalone 'del' operations after a clear (deleting from empty tree)
 * 4. Merging updates after puts into a single put operation
```
Why can't updates that don't have a put before them be merged?
A put + merge -> put. I don't recall if I did merge + merge -> merge, but it should be correct to do.
```ts
 * 2. For each key: put/del replace all previous operations; updates accumulate
 * 3. Removing standalone 'del' operations after a clear (deleting from empty tree)
 * 4. Merging updates after puts into a single put operation
 * Note: Order is preserved for operations on the same key, but operations
```
If we also merged updates, would there always be just one operation per key?
I think that is correct. In theory we could get a del followed by a merge, but I have to check again whether that is an error or ignored.
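The rules under discussion can be modeled end to end in a small sketch. The op shapes and the name `optimizePatchSketch` are illustrative and do not match the real `optimizePatch` signature; rule 4 here merges an update into a preceding put, as the doc comment describes.

```typescript
type Op =
  | {op: 'clear'}
  | {op: 'put'; key: string; value: Record<string, unknown>}
  | {op: 'del'; key: string}
  | {op: 'update'; key: string; merge: Record<string, unknown>};

function optimizePatchSketch(patch: Op[]): Op[] {
  // Rule 1: everything before the last 'clear' is dead.
  const lastClear = patch.map(p => p.op).lastIndexOf('clear');
  const hadClear = lastClear !== -1;
  const ops = hadClear ? patch.slice(lastClear + 1) : patch;

  // Rules 2-4: collapse operations per key.
  const byKey = new Map<string, Op>();
  for (const p of ops) {
    if (p.op === 'clear') continue; // only the last clear survives rule 1
    const prev = byKey.get(p.key);
    if (p.op === 'update' && prev !== undefined && prev.op === 'put') {
      // Rule 4: put followed by update becomes a single put.
      byKey.set(p.key, {op: 'put', key: p.key, value: {...prev.value, ...p.merge}});
    } else if (p.op === 'del' && hadClear && prev === undefined) {
      // Rule 3: deleting a key that the clear already removed is a no-op.
    } else {
      // Rule 2: put/del replace whatever came before on this key.
      byKey.set(p.key, p);
    }
  }

  // Sort by key so consecutive puts can be bulk loaded.
  const out: Op[] = hadClear ? [{op: 'clear'}] : [];
  for (const key of [...byKey.keys()].sort()) {
    out.push(byKey.get(key)!);
  }
  return out;
}
```

Under this model the answer to the question above would be yes: after collapsing, each key carries at most one surviving operation.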
```ts
// already sorted

// Use putMany which will use BTreeWrite.fromEntries if the map is empty
```
What is BTreeWrite.fromEntries?
Old reference. I removed it and dealt with it without a new API.
```ts
if (existing.length === 1 && existing[0].op === 'put') {
  const {value} = existing[0];
  assertObject(value);
  const merged = mergeUpdate(p, value);
```
It would be good to add some tests around merging updates that have a constrain property; I feel uncertain whether those are being merged correctly.
feat(replicache): Add bulk insertion optimization with putMany
Overview
This PR adds bulk insertion optimization to Replicache's BTree and database layer, significantly improving performance for large batch operations like sync patches.
Changes
Core BTree Changes
`packages/replicache/src/btree/node.ts`:
- Add `putMany()` method to `DataNodeImpl` for efficient merging of sorted entries
- Add `putMany()` method to `InternalNodeImpl` with child grouping and rebalancing
- Add `putManyMergeAndPartition()` helper for node rebalancing during bulk operations
- Extract `binarySearchFrom()` to enable optimized searching from a start index
- Update `readTreeData()` to accept a `getEntrySize` parameter (test helper improvement)

`packages/replicache/src/btree/write.ts`:
- Add `BTreeWrite.putMany()` method with two optimized paths: a fast path for empty trees and a slow path that merges into an existing tree

Database Layer
`packages/replicache/src/db/write.ts`:
- Add `Write.putMany()` method that delegates to `BTreeWrite.putMany()`
- Maintains index update support with the same semantics as `put()`

Sync Layer Optimization
`packages/replicache/src/sync/patch.ts`:
- Add `optimizePatch()` function to eliminate redundant operations, dropping operations before the last `clear` and removing pointless `del` operations after a `clear`
- Update `apply()` to use optimized patches with bulk loading
- Add `mergeUpdate()` helper for update operation handling
- Add `bulkLoadPuts()` to handle consecutive put operations efficiently

Performance Impact
The optimization targets common sync patterns, where large batches of sorted entries are applied at once.
Benchmark Results
Comparison of `putMany()` vs sequential `put()` operations:
- 100 entries (small values): 3.36x faster
- 1,000 entries (small values): 5.30x faster
- 10,000 entries (small values): 4.15x faster
- Construction only (10,000 entries): 53.73x faster
- Update operations (1,000 entries): 4.47x faster

Key findings: `putMany()` is roughly 3-5x faster for typical batch sizes and over 50x faster for construction-only workloads.

Additional benefits:
- Reduces chunk writes through optimal tree construction
- Minimizes redundant operations through patch optimization
Testing
New test files:
- `packages/replicache/src/btree/write.bench.ts`: performance benchmarks comparing sequential `put()` vs `putMany()`
- `packages/replicache/src/btree/node.test.ts`: 17 new tests for `putMany()` behavior
- `packages/replicache/src/db/write.test.ts`: 3 new tests for database-level `putMany()`
- `packages/replicache/src/sync/patch.test.ts`: 24 new tests for patch optimization

Test scenarios covered include bulk operations, rebalancing, and edge cases.
Compatibility
- No breaking changes: `put()` and `del()` methods remain unchanged
- `putMany()` is an additive optimization that can be adopted incrementally
- Compatible with both the V6 and V7 formats

Implementation Details
Key Algorithm Improvements
- Fast path (empty tree): builds the tree bottom-up with optimal partitioning in a single pass
- Slow path (existing tree): groups entries by affected child node and performs batch rebalancing per group, using a restricted binary search for sorted input
- Patch optimization: drops operations before the last clear, merges operations per key, and sorts for bulk loading

Memory Efficiency
- Reuses arrays during tree construction to minimize allocations
Future Work
- `delMany()` for bulk deletions