Skip to content

Range-Based Set Reconciliation (negentropy) #9

@hoytech

Description

@hoytech

Hi! Have you seen the set reconcilliation algorithm described here? https://arxiv.org/abs/2212.13567

I wrote a C++ library that defines and implements this as a concrete protocol: https://github.com/hoytech/negentropy

Before building negentropy, I looked into the algorithms that gensync implements, and my conclusion was that range-based reconcilliation is superior under most conditions:

  • Handles very large cardinalities (we are routinely using it for syncing data-sets of size 10 million, and greater)
  • No tuning required: works predictably well with large or small number of differences
  • Bandwidth overhead is logarithmic in DB size and linear in number of symmetric differences
  • Negligible CPU usage compared with CPISync/minisketch
  • No false positives or failed decodes like with bloom-filter based algos

But I would love to have some objective benchmarks to verify this (or be proven mistaken!).

Thanks for any help!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions