Optimize chunker hot loop for ~1.5-2x throughput#307

Merged
folbricht merged 1 commit into master from chunker-benchmarks
Feb 8, 2026

Conversation

folbricht (Owner) commented Feb 8, 2026

Summary

  • Replace expensive DIVL instruction in boundary detection (hValue % disc == disc-1) with Lemire's fast divisibility test — a multiply-and-compare that costs ~5 cycles vs ~26 for hardware division
  • Convert hashTable from slice to [256]uint32 fixed-size array, eliminating bounds checks on all accesses
  • Add precomputed hashTableRotated table, removing one RotateLeft32 call per byte in the hot loop
  • Hoist struct fields (hValue, hIdx, buf, window pointer, Lemire constants) into local variables to avoid repeated pointer dereferences
  • Replace (hIdx + 1) % 48 with a branch (if hIdx >= 48) — perfectly predicted, taken once per 48 iterations
  • Reuse backing buffer in fillBuffer() to reduce allocations from O(filesize/buffersize) to O(1)
  • Add b.SetBytes() to benchmarks for direct MB/s throughput reporting; modernize to use b.Loop()

All optimizations preserve identical chunk boundaries, verified by TestChunkerLargeFile which checks exact SHA512/256 hashes for every chunk.

Test plan

  • go test -run TestChunkerLargeFile — exact chunk boundary regression (SHA512/256 hashes)
  • go test -run TestChunker — all chunker tests pass
  • go test -bench=BenchmarkChunk -benchmem — verify throughput improvement
  • CI passes on Linux, macOS, Windows

🤖 Generated with Claude Code

Before:

goos: linux
goarch: amd64
pkg: github.com/folbricht/desync
cpu: Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz
BenchmarkChunker-4         	     272	   4304447 ns/op	 2621489 B/op	       2 allocs/op
BenchmarkChunkNull1M-4     	     205	   5907772 ns/op	 2621489 B/op	       2 allocs/op
BenchmarkChunkNull10M-4    	      15	  69146171 ns/op	13107254 B/op	       6 allocs/op
BenchmarkChunkNull50M-4    	       4	 294544244 ns/op	55050312 B/op	      23 allocs/op
BenchmarkChunkNull100M-4   	       2	 593006637 ns/op	107479112 B/op	      43 allocs/op

After:

goos: linux
goarch: amd64
pkg: github.com/folbricht/desync
cpu: Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz
BenchmarkChunker-4         	     472	   2371186 ns/op	 442.22 MB/s	 2621489 B/op	       2 allocs/op
BenchmarkChunkNull1M-4     	     376	   3125929 ns/op	 335.44 MB/s	 2621490 B/op	       2 allocs/op
BenchmarkChunkNull10M-4    	      38	  31382746 ns/op	 334.12 MB/s	 2621489 B/op	       2 allocs/op
BenchmarkChunkNull50M-4    	       7	 153479795 ns/op	 341.60 MB/s	 2621488 B/op	       2 allocs/op
BenchmarkChunkNull100M-4   	       4	 307347136 ns/op	 341.17 MB/s	 2621488 B/op	       2 allocs/op

Closes #244

Replace the two main bottlenecks in the chunker's byte-processing loop:
the expensive DIVL instruction for boundary detection (~26 cycles/byte)
and the modulo-48 window index (~14 instructions/byte).

Key changes:
- Replace modulo boundary check with Lemire's fast divisibility test
  (multiply-and-compare, ~5 cycles vs ~26 for hardware division)
- Convert hashTable from slice to [256]uint32 array (eliminates bounds checks)
- Add precomputed hashTableRotated table (removes one RotateLeft32 per byte)
- Hoist struct fields into local variables in the hot loop
- Replace modulo-48 window index with branch (predicted once per 48 iterations)
- Reuse backing buffer in fillBuffer() to reduce allocations to O(1)
- Add b.SetBytes() to benchmarks for direct MB/s reporting
- Modernize benchmarks to use b.Loop() and bytes.NewReader

All optimizations preserve identical chunk boundaries, verified by
TestChunkerLargeFile which checks exact SHA512/256 hashes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
folbricht merged commit f7c780b into master Feb 8, 2026
6 checks passed
Successfully merging this pull request may close these issues.

borg is faster than "desync make" despite borg is single-threaded
