lib: create data_structures directory, implement hashset data structure#470
lib: create data_structures directory, implement hashset data structure#470pzmarzly wants to merge 6 commits intofacebook:mainfrom
Conversation
A simple data hashing function.
src/libbpfilter/set.c
Outdated
|
|
||
| static uint64_t _bf_set_elem_hash(const void *data, void *ctx) | ||
| { | ||
| return bf_fnv1a(data, *(const size_t *)ctx, 0); |
There was a problem hiding this comment.
Claude: FNV-1a hash seeded with 0 instead of BF_FNV1A_INIT
The documentation for bf_fnv1a in helper.h explicitly states "Pass @ref BF_FNV1A_INIT as @p hash for the initial call." Using 0 instead of the FNV-1a offset basis (0xcbf29ce484222325ULL) degrades hash distribution quality, leading to more collisions in the hashset. The unit test for bf_fnv1a in tests/unit/libbpfilter/helper.c correctly uses BF_FNV1A_INIT.
Same issue in tests/unit/libbpfilter/data_structures/hashset.c line 14.
| return bf_fnv1a(data, *(const size_t *)ctx, 0); | |
| return bf_fnv1a(data, *(const size_t *)ctx, BF_FNV1A_INIT); |
| static uint64_t uint32_hash(const void *data, void *ctx) | ||
| { | ||
| (void)ctx; | ||
| return bf_fnv1a(data, sizeof(uint32_t), 0); |
There was a problem hiding this comment.
Claude: FNV-1a hash seeded with 0 instead of BF_FNV1A_INIT
Same issue as in set.c — should use BF_FNV1A_INIT to match the documented API contract and produce correct hash distribution.
| return bf_fnv1a(data, sizeof(uint32_t), 0); | |
| return bf_fnv1a(data, sizeof(uint32_t), BF_FNV1A_INIT); |
| * | ||
| * If the element already exists, no duplicate is added and 0 is returned. | ||
| * The hashset grows automatically when the load factor is exceeded. | ||
| * The hashset takes ownership of @p data. |
There was a problem hiding this comment.
Claude: bf_hashset_add ownership semantics are ambiguous for duplicates
The doc says "The hashset takes ownership of @p data" but when the element already exists, bf_hashset_add returns 0 without storing data — ownership is silently not taken. This is a memory management footgun for callers using the idiomatic TAKE_PTR pattern after a successful add.
The current caller (bf_set_add_elem) works around this by pre-checking with bf_hashset_contains, but future callers may not. Consider either:
- Updating the doc to clarify: "On successful insertion, the hashset takes ownership. If the element already exists, ownership is not transferred."
- Or returning
-EEXISTfor duplicates so callers can distinguish the two cases.
|
|
||
| bf_dump_prefix_push(prefix); |
There was a problem hiding this comment.
Claude: bf_set_dump lost bf_dump_prefix_last call for last element
The old bf_list_foreach loop called bf_dump_prefix_last(prefix) when iterating over the last element, producing correct tree-like connector characters in dump output. The new bf_set_foreach (backed by bf_hashset_foreach) has no way to detect the last element, so the final element's prefix will show a ├ connector instead of └. This is a consistent pattern elsewhere in the codebase (see bf_chain_dump, bf_rule_dump, bf_hook_dump).
| */ | ||
|
|
||
| /// Sentinel pointer value for a tombstoned (deleted) slot. | ||
| #define _BF_HASHSET_TOMBSTONE ((void *)1) |
There was a problem hiding this comment.
Claude: _BF_HASHSET_TOMBSTONE is an implementation detail exposed in the public header
This sentinel value is only needed by bf_hashset_foreach's expansion. Consider moving it to the .c file and restructuring the foreach macro to call an inline helper that encapsulates the tombstone check, keeping this internal constant out of the public API surface.
|
Claude: review of facebook/bpfilter #470 (02938a3) Must fix
Suggestions
Nits
|
314af18 to
02938a3
Compare
| return 0; | ||
| return bf_hashset_is_empty(&set->elems); | ||
| } | ||
|
|
There was a problem hiding this comment.
Claude: Memory leak in bf_set_add_elem when bf_hashset_add fails during grow
TAKE_PTR(_elem) nullifies _elem before bf_hashset_add executes. If bf_hashset_add fails (e.g., -ENOMEM from _bf_hashset_grow), the allocated copy is lost — _elem is already NULL so _cleanup_free_ won't free it, and bf_hashset_add didn't store it.
Fix by separating the ownership transfer:
| int r = bf_hashset_add(&set->elems, _elem); | |
| if (r) | |
| return r; | |
| TAKE_PTR(_elem); | |
| return 0; |
| _free_bf_vector_ struct bf_vector *_vec = NULL; | ||
|
|
||
| assert(vec); | ||
| assert(elem_size); |
There was a problem hiding this comment.
Claude: assert() on non-pointer value
assert(elem_size) validates a size_t, but the style guide states: "Only use assert() for pointer values. For other validation, use if () with appropriate error logging and return codes."
All existing assert() calls in src/libbpfilter/ validate pointers. Consider:
| assert(elem_size); | |
| if (!elem_size) | |
| return -EINVAL; |
| *(void **)bf_vector_get(&(set)->slots, _bf_hset_idx)); \ | ||
| _bf_hset_brk; _bf_hset_brk = 0) | ||
|
|
||
| /** |
There was a problem hiding this comment.
Claude: Duplicated slot-liveness check between header and .c file
This _bf_hashset_slot_is_occupied function is functionally identical to _bf_slot_is_live in hashset.c (both check slot != NULL && slot != tombstone). The header version uses the magic (void *)1 while the .c version uses _BF_HASHSET_TOMBSTONE.
Consider consolidating into a single static inline in the header and using it from both the macro and the .c file, avoiding the duplicated magic constant.
qdeslandes
left a comment
There was a problem hiding this comment.
Alright, first pass of review and a few things to fix. I'll do a second, deeper pass, when those are solved. That being said, it's a very welcome addition! :D
| /** | ||
| * @brief Compute a FNV-1a 64-bit hash. | ||
| * | ||
| * Pass @ref BF_FNV1A_INIT as @p hash for the initial call. To hash |
There was a problem hiding this comment.
@ref should be replaced with backticks.
| assert(data); | ||
|
|
||
| const uint8_t *bytes = data; |
There was a problem hiding this comment.
const uint8_t *bytes = data;
assert(data);
There was a problem hiding this comment.
Not fan of bpfilter/data_structures/... path. As we're moving to daemon-less, I would suggest bpfilter/core/..., which will later on contain other core structures of the project (e.g. bf_chain).
There was a problem hiding this comment.
Check dynbuf.c, it's the same logic but dynbuf uses elem_size = 1. I would suggest implementing bf_vector_take to take ownership of the data (so bf_vector would re-allocate memory on the next add). Then (in a subsequent commit), replace bf_dynbuf with bf_vector.
Could you also add a @todo in bf_program to replace img with a bf_vector?
| _free_bf_vector_ struct bf_vector *_vec = NULL; | ||
|
|
||
| assert(vec); | ||
| assert(elem_size); |
There was a problem hiding this comment.
Only assert pointers for NULL. elem_size should be checked and return an error on 0.
| static inline bool _bf_hashset_slot_is_occupied(void *slot) | ||
| { | ||
| return slot != NULL && slot != (void *)1; | ||
| } |
There was a problem hiding this comment.
No private functions in headers.
| * @param set Pointer to the hashset. Must be non-NULL. | ||
| * @param elem_var Name of the @c void* variable to hold each element. | ||
| */ | ||
| #define bf_hashset_foreach(set, elem_var) \ |
There was a problem hiding this comment.
Can't you leverage bf_vector_foreach in here to make it simpler?
|
|
||
| #include "test.h" | ||
|
|
||
| static uint64_t uint32_hash(const void *data, void *ctx) |
There was a problem hiding this comment.
Static symbols are prefixed with _bf_
There was a problem hiding this comment.
Does it need to be part of the public headers?
There was a problem hiding this comment.
It might be worth having a bf_hashset_take.
WIP draft
See #460