Skip to content

Remarks on implementation: #1

@acerone85

Description

@acerone85

Some remarks and suggestions to improve on the implementation:

  1. There are some unnecessary extra allocations, e.g. in the function sha256_32 you allocate a buffer where you move data. In practice, you can move the buffer out of the hasher:
#[inline(always)]
fn sha256_32(data: &[u8]) -> Bytes32 {
    let mut hasher = Sha256::new();
    hasher.update(data);
    let out = hasher.finalize();
    // Avoid an extra allocation/copy  by returning `out`'s internal buffer
    let arr: [u8; 32] = *(out.as_ref());
    arr
}
  1. The implementation is fairly low level, which makes it difficult to follow and make the implementation error prone. For example, the function subtree_root_sparse will panic if an empty tree is passed as input. Also, some implementation details are not clear, e.g. why the check if i + 1 < level_nodes.len() && level_nodes[i + 1].0 == (idx ^ 1) { is used to test if the sibling of a node is present.

  2. Judging from the implementation, the structure implemented is more a 256-ary sparse merkle tree, rather than a binary one. In general, using a size 256 as the radix of the tree might result in bigger proofs, and a smaller Radix is preferred (for example, many variations of Sparse Merkle trees such as Merkle Patricia Tries(https://ethereum.org/developers/docs/data-structures-and-encoding/patricia-merkle-trie/) and Jellyfish Merkle trees` (https://developers.diem.com/papers/jellyfish-merkle-tree/2021-01-14.pdf) have a radix of 16.

  3. In general, when programming in rust it is easier to leverage the NewType pattern (https://rust-unofficial.github.io/patterns/patterns/behavioural/newtype.html) to guarantee strong invariants at compile time. For example, the function stem_bit returns a u8, but in practice one could define its own Bit enum to guarantee that the result is either 0 or 1.

#[repr(u8)] // Will be represented in memory as a u8
pub enum Bit {
Zero, // will be represented in memory as `00000000`
One  // will be represented in memory as `00000001`
}
  1. When implementing Sparse Merkle Tries or any variant, it is easier to store all nodes in a HashMap<KeyPrefix, Node>, where KeyPrefix is a sequence from 0 to 32 bytes, which needs to satisfy the trait bounds for HashMap::Key (e.g. Hash), and Node contains a bitmap of which children are defined. If the i-th bit of the bitmap is set, then the node has a child at position i, otherwise it has no child at that position. When traversing the tree from the root to a leaf, it makes sense to define an iterator that helps you select the next node to be traversed, rather than relying on a low-level implementation.

  2. Depending on whether a value of 000...000 for a leaf is the same as the leaf not being present, the implementation might not be collision resistant. From what I understand, because h32or64 returns 00...0 when hashing the concatenation of 00...0 and 00...0, the empty tree and a tree whose all leaves have value 00...0 (or just a leaf with that value) will have the same root hash.

Another question from my side, what is the rationale behind the need for Sparse Merkle trees?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions