
caolan commented Jan 26, 2026

This patch rewrites all DeltaOps to store `seq` values relative to the `seq` of the enclosing block. Because integers are stored with a variable-width encoding, small offsets take fewer bytes, which reduces block size when pointers reference nearby blocks (the common case).
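To make the size saving concrete, here is a small sketch comparing the bytes spent on absolute versus relative pointer seqs for one block. It assumes a compact-encoding-style `uint` varint (1 byte for values up to 252, otherwise a prefix byte followed by a 2-, 4- or 8-byte integer); `uintByteLength` and `pointerBytes` are hypothetical helpers, not part of Hyperbee2, and the seq values are invented for illustration.

```javascript
// Byte length of an unsigned integer under an assumed compact-encoding-style
// varint: 1 byte up to 0xfc, else a 1-byte prefix plus a 2-, 4- or 8-byte int.
function uintByteLength (n) {
  if (n <= 0xfc) return 1
  if (n <= 0xffff) return 3
  if (n <= 0xffffffff) return 5
  return 9
}

// Bytes spent on pointer seqs for one block, stored absolute vs relative.
// Offsets are blockSeq - seq, i.e. non-negative for backward pointers.
function pointerBytes (blockSeq, pointerSeqs) {
  let absolute = 0
  let relative = 0
  for (const seq of pointerSeqs) {
    absolute += uintByteLength(seq)
    relative += uintByteLength(blockSeq - seq)
  }
  return { absolute, relative }
}

// Hypothetical block 1,000,000 pointing at a few recent blocks.
const sizes = pointerBytes(1000000, [999999, 999990, 999800])
console.log(sizes) // { absolute: 15, relative: 3 }
```

Under this scheme any absolute seq above 65,535 costs at least 5 bytes, while nearby offsets stay at 1 byte, so the saving grows as the core gets longer.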

Note: this currently uses a signed (`int`) encoding for the relative seq values. For local writes (where `core === 0`), an unsigned (`uint`) encoding should suffice, because pointers only point backwards or to their own block. However, when writing on top of a remote Hyperbee2, a pointer can reference a higher seq number than the block it was written in. We must either support both positive and negative seq offsets or restrict this optimisation to pointers into the local core (0). This patch currently does the former.
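The sketch below illustrates why a signed encoding covers both cases. It assumes the `int` encoding is ZigZag over an unsigned varint (our understanding of compact-encoding's `int`, but treat that as an assumption). `relativeSeq`, `zigzagEncode` and `zigzagDecode` are illustrative names, and the offset is taken as `blockSeq - pointerSeq`, so backward pointers are non-negative and forward pointers are negative.

```javascript
// Signed offset from the enclosing block to the pointed-at block.
// >= 0: pointer to the same or an earlier block (the local-write case).
// <  0: pointer to a later block (possible on top of a remote Hyperbee2).
function relativeSeq (blockSeq, pointerSeq) {
  return blockSeq - pointerSeq
}

// ZigZag maps signed integers onto unsigned ones so small magnitudes stay
// small: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, 2 -> 4, ...
// Uses arithmetic rather than bit shifts so values above 2^31 still work.
function zigzagEncode (n) {
  return n < 0 ? -2 * n - 1 : 2 * n
}

function zigzagDecode (z) {
  return z % 2 === 1 ? -(z + 1) / 2 : z / 2
}

const blockSeq = 500
const pointers = [500, 499, 490, 503] // the last one points forward
const offsets = pointers.map(p => relativeSeq(blockSeq, p))
const encoded = offsets.map(zigzagEncode)

console.log(offsets) // [ 0, 1, 10, -3 ]
console.log(encoded) // [ 0, 2, 20, 5 ]
```

The price of ZigZag is one bit of range: if a single byte holds values up to 252 (again assuming compact-encoding's rules), only backward offsets up to 126 fit in one byte instead of up to 252. Restricting the optimisation to `core === 0` and using `uint` would avoid that cost.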

Compression benchmarks indicate that this patch reduces block overhead by approximately 18%. The impact on read/write performance does not appear to be significant.

I considered three ways to implement this change:

  1. Write new code in encoding.js that converts pointers to relative values before encoding and back to absolute values after decoding (the approach taken here).
  2. Write a new compact-encoding encoder that does the same work. This has the advantage of not having to clone the DeltaOps first.
  3. Update the tree to work with relative pointers internally, requiring no change to the encode/decode logic (apart from supporting past versions).

I chose option 1 because it is the least intrusive code change. However, it does involve cloning DeltaOps before making their pointers relative, because the originals are referenced directly by the tree. It also requires special care around ownership of objects between encoding.js, index.js, and write.js. Ownership might be made clearer by moving these steps into write.js or index.js (when creating the batches or inflating the blocks), but that code already felt complex enough.
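Option 1's clone-then-convert step can be sketched as follows. The DeltaOp shape (`{ type, key, pointers: [{ core, seq }] }`) and the helper names are hypothetical, chosen only to show the idea: build converted copies for the encoder so the tree's own objects keep their absolute seqs, then apply the inverse transform after decoding.

```javascript
// Hypothetical DeltaOp shape for illustration only (real DeltaOps may differ):
//   { type, key, pointers: [{ core, seq }] }

// Copy ops, transforming every pointer seq with `fn`. The originals, which
// the in-memory tree still references, are left untouched.
function mapPointerSeqs (ops, fn) {
  return ops.map(op => ({
    ...op, // shallow copy: non-pointer fields are still shared
    pointers: op.pointers.map(p => ({ ...p, seq: fn(p.seq) }))
  }))
}

// Before encoding: store offsets from the enclosing block's seq.
function toRelative (blockSeq, ops) {
  return mapPointerSeqs(ops, seq => blockSeq - seq)
}

// After decoding: turn offsets back into absolute seqs.
function toAbsolute (blockSeq, ops) {
  return mapPointerSeqs(ops, offset => blockSeq - offset)
}

const blockSeq = 42
const ops = [
  { type: 'put', key: 'a', pointers: [{ core: 0, seq: 40 }, { core: 0, seq: 42 }] }
]

const relative = toRelative(blockSeq, ops) // what would be handed to the encoder
const decoded = toAbsolute(blockSeq, relative) // what would come back after decoding

console.log(relative[0].pointers.map(p => p.seq)) // [ 2, 0 ]
console.log(ops[0].pointers[0].seq) // 40: the tree's copy is unchanged
```

The copies are shallow: only the op wrappers, pointer arrays and pointer objects are new, while every other field is still shared with the tree. That sharing is where ownership between modules still needs care.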

Option 2 was rejected because the encoder would need extra context (the seq of the enclosing block) during encoding and decoding. That process is currently well decoupled from the rest of the logic, and it makes sense to keep it that way.

Option 3 was rejected because it is a much more intrusive change and I'd be concerned about regressions.

Compares Hyperbee2 performance against a baseline RocksDB for small and large(ish) keys and values, in both large and small write batches.

Introduces a new script in package.json, run with `npm run bench`.
Times `get()` calls in ascending, descending, and random key order, as well as iteration over keys.

Runs as part of `npm run bench`.
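A minimal sketch of this read-benchmark pattern: generate the three key orders and time sequential `get()` calls. The key format, the seeded PRNG (mulberry32) and the `Map`-backed `get` stand-in are assumptions for illustration. This is not the PR's benchmark script, and iteration timing is omitted.

```javascript
// Hypothetical key format: zero-padded so string order matches numeric order.
function makeKeys (n) {
  const keys = []
  for (let i = 0; i < n; i++) keys.push('key-' + String(i).padStart(8, '0'))
  return keys
}

// mulberry32: a tiny seeded PRNG so the "random" order is repeatable.
function mulberry32 (seed) {
  let a = seed | 0
  return function () {
    a = (a + 0x6d2b79f5) | 0
    let t = a
    t = Math.imul(t ^ (t >>> 15), t | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

// Fisher-Yates shuffle on a copy of the input.
function shuffled (items, rand) {
  const out = items.slice()
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1))
    const tmp = out[i]
    out[i] = out[j]
    out[j] = tmp
  }
  return out
}

// Time sequential awaited lookups; returns elapsed milliseconds.
async function timeGets (get, keys) {
  const start = process.hrtime.bigint()
  for (const key of keys) await get(key)
  return Number(process.hrtime.bigint() - start) / 1e6
}

const keys = makeKeys(1000)
const orders = {
  ascending: keys,
  descending: keys.slice().reverse(),
  random: shuffled(keys, mulberry32(1))
}

// Stand-in store; swap in a real database lookup to benchmark it.
const store = new Map(keys.map(k => [k, k]))
const get = async key => store.get(key)

async function main () {
  for (const [name, order] of Object.entries(orders)) {
    const ms = await timeGets(get, order)
    console.log(`${name}: ${ms.toFixed(2)} ms`)
  }
}

main().catch(err => {
  console.error(err)
  process.exitCode = 1
})
```

A fixed seed keeps the random order identical across runs, so timings from different branches compare like for like.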