Conversation
|
Fixes #418 |
28cbdfa to
57a46e9
Compare
|
Claude: review of facebook/bpfilter #460 (61d3779) Must fix
Suggestions
Nits
CLAUDE.md improvements
|
57a46e9 to
6b64c48
Compare
6b64c48 to
783f5a3
Compare
783f5a3 to
c480a27
Compare
c480a27 to
516ba3c
Compare
516ba3c to
3821d0d
Compare
3821d0d to
72eab46
Compare
5f6eab6 to
a49465f
Compare
|
Thanks for working on this! I have some architectural thoughts that I'd be interested to hear your take on. Right now, in
Why does this matter?
My understanding of the current approach is that My opinion would be to add Two other (possibly connected) smaller points:
@pzmarzly @qdeslandes Interested to hear your thoughts! |
a49465f to
b07b9dc
Compare
Dynamically-sized array of fixed-size elements. Elements are stored inline in a single contiguous allocation. The vector doubles in capacity on overflow.
A simple data hashing function.
Callers previously accessed the inner bf_list directly. Provide proper iteration and size accessors so the set implementation can be swapped without touching every call site.
Open-addressing hashset with linear probing backed by bf_vector. Elements are fixed-size blobs hashed with FNV-1a and compared with memcmp. Removed elements leave tombstones; the table grows at 70% load but never shrinks. This provides an O(1) average-case alternative to the linked-list-based bf_set for large element counts.
When cmake -DUSE_HASHSET=1 is passed, set.h redirects all bf_set_* symbols to their bf_hashset_* counterparts via preprocessor macros. The CI workflow is extended to test both paths. USE_HASHSET defaults to off for now.
|
@yaakov-stein i like your suggestion a lot. Initially I thought I'll try splitting the "low-level" code into
Probably not. I thought that this way I will surely not block the daemonless work in any way, and that we can flip the flag once @qdeslandes's work is complete, but now I see that it may actually cause more merge conflicts for little gain. |
b07b9dc to
61d3779
Compare
| * @brief Iterate over all occupied elements in a hashset. | ||
| * | ||
| * @param set Pointer to the hashset to iterate over. Must be non-NULL. | ||
| * @param elem_var Name of the void* variable to hold each element. | ||
| */ | ||
| #define bf_hashset_foreach(set, elem_var) \ | ||
| for (size_t _bf_hset_idx = 0, _bf_hset_brk = 0; \ | ||
| _bf_hset_idx < bf_hashset_cap(set) && !_bf_hset_brk; ++_bf_hset_idx) \ | ||
| if (*(uint8_t *)bf_vector_get(&(set)->slots, _bf_hset_idx) != \ | ||
| _BF_HASHSET_SLOT_OCCUPIED) \ | ||
| continue; \ | ||
| else \ | ||
| for (void *(elem_var) = \ |
There was a problem hiding this comment.
Claude: bf_hashset_foreach uses function calls in loop condition unlike other foreach macros (suggestion)
The macro calls bf_hashset_cap(set) on every outer-loop iteration and bf_vector_get(...) twice per occupied slot (once for the status check, once for the element pointer). By contrast, bf_list_foreach and bf_vector_foreach access struct fields directly (e.g., (list)->head, (vec)->data, (vec)->len) without function-call overhead.
Consider accessing (set)->slots fields directly for consistency:
- Use
(set)->slots.leninstead ofbf_hashset_cap(set) - Compute slot pointers from
(set)->slots.datainstead of callingbf_vector_get()
|
Is this still required as we have #470 ? |
|
Closing, #470 is better |
Currently,
bf_setusesbf_listunder the hood. This is suboptimal performance-wise. Here I am addingbf_hashset, with the exact same API asbf_set. Ifcmake -DUSE_HASHSET=1is passed, it will beused.
lib: add bf_vector—bf_hashsetis backed by a growablebf_vector, with overflow-safe capacity doubling.lib: helper: add FNV-1a hash function— imported from another PR, a simple hash function.lib: add bf_set_foreach and bf_set_size— currently we directly use the innerbf_listin many places. This is a leak of implementation details, so let's make proper getters.lib: add bf_hashset— main commit. Implements a simple hashset (tombstones on deletion, grows 2x when load factor exceeds 70%, doesn't shrink, linear probing on collision). Includesbf_hashset_foreach,add_many/remove_manyfor bulk operations, and serialization support.build: add USE_HASHSET option to switch to hashset implementation— adds CMake flag to build withhashset.cin place ofset.c. Changes GH Actions workflow to test bothset.candhashset.c.I'm keeping
USE_HASHSETdefault as 0 for now.