Description
I cannot bulk load (train()) with the error bound used in the paper (32): it fails with a segmentation fault or, in some cases, an assertion error. To get it to run without these errors, I have to raise the error bound to 1024, and much higher for some of my 'harder' datasets. By contrast, an error bound of 32 works fine with, for example, a synthetic lognormal dataset. Could you please help me fix the issue, or point out what I am doing wrong?
For reference, please try the SOSD datasets (https://github.com/learnedsystems/SOSD). In particular, we tested 200M keys of osm: https://www.dropbox.com/s/j1d4ufn4fyb4po2/osm_cellids_800M_uint64.zst?dl=1
I adapted your code to work with uint64_t keys by changing key_type in function.h and adding a generic version of binary_search_branchless() in util.h. In case there is more code that hardcodes signed int64_t, I also tested with the keys converted to signed format, but got the same result.
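To make the adaptation concrete, a key-type-generic branchless search of the kind described above could look roughly like the sketch below. This is illustrative only: the name lower_bound_branchless and its signature are hypothetical, and it is not the exact code in util.h or the repository's binary_search_branchless().

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical generic lower-bound search over a sorted array.
// Returns the index of the first element that is >= key, or n if none is.
// K can be uint64_t, int64_t, or any type with a strict-weak operator<.
template <typename K>
std::size_t lower_bound_branchless(const K* arr, std::size_t n, K key) {
    if (n == 0) return 0;
    const K* base = arr;
    while (n > 1) {
        std::size_t half = n / 2;
        // Written as a select so that compilers can emit a conditional
        // move instead of a data-dependent branch.
        base = (base[half] < key) ? base + half : base;
        n -= half;
    }
    return static_cast<std::size_t>(base - arr) + (*base < key ? 1 : 0);
}
```

Because the comparison uses the key type's own operator<, instantiating it with uint64_t compares keys as unsigned values. Note that any key at or above 2^63 becomes negative when reinterpreted as int64_t, so converting such keys to signed format changes their sort order unless the data is re-sorted.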