Use relative pointers in block encoding #21
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This patch rewrites all DeltaOps to use
seqvalues relative to theseqnumber of the enclosing block. This takes advantage of variable width integer encoding to reduce block size when pointers reference nearby blocks (a common case).Note: this currently uses an
intencoding for the relativeseqvalues. For local writes (where core === 0), only auintshould be required (i.e. pointers should only point backwards or to their own block). But when adding writes on top of a remote Hyperbee2, it's possible to have a pointer point to a higherseqnumber than the block it was written in. Either we support positive and negativeseqoffsets or we restrict this optimisation to pointers to the local core (0). This patch currently does the former.Compression benchmarks indicate this patch results in an approximately 18% reduction in block overhead costs. Impact on read/write performance does not seem significant.
I considered three ways to implement this change:
encoding.jsthat converts pointers to relative before encoding and back to absolute after decoding (this is what I did).DeltaOpsfirst.I chose option 1 because it is the least intrusive code change. However, it does involve cloning DeltaOps before making their pointers relative because they are referenced directly in the tree. It also requires special care around ownership of objects between
encoding.js,index.js, andwrite.js. Ownership might be made clearer by moving these steps insidewrite.jsorindex.js(when creating the batches or inflating the blocks) but that code already felt complex enough.Option 2 was rejected because it requires new context during the encoding/decoding process. This process is currently well decoupled from the rest of the logic and it makes sense to keep it that way.
Option 3 was rejected because it is a much more intrusive change and I'd be concerned about regressions.