The Problem
Currently, when using pllmod_binary_partition_load, the user can either specify an existing partition to load data into (which requires prior knowledge of the partition's attributes) or pass NULL instead. In the latter case the function creates a new partition that corresponds exactly to the one being loaded.
However, consider this use case:
The stored partition is large, for example because it belongs to a large tree. To avoid memory issues, the user wants to load only the bare minimum needed to create the partition object, then fetch the big data items (usually the CLVs) in separate chunks as they are needed.
One way to currently do this is:
Keep a local copy of binary_block_header_apply to read the basic attributes from the binary file, then use pll_partition_create to create a matching partition with the minimal number of CLVs allocated. Reallocate the CLV pointers and flag each one as "not currently in memory" (for example by setting it to NULL), then check for that condition whenever a CLV is accessed (via a custom wrapper function) and load the CLV from disk if it is not in memory.
This requires code duplication (as there is no user-exposed way of polling block headers) and detailed knowledge of partition internals (freeing and reallocating CLV memory, knowing the width of a CLV [for which there is currently no helper function], block-related internals). It generally involves a lot of custom code for a task that could become very commonplace in the future (large trees).
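To make the workaround concrete, here is a minimal sketch of such a wrapper in plain C. It does not use any libpll calls: the lazy_partition struct, its fields, and the lazy_clv_get and lazy_clv_evict functions are all hypothetical, and the file layout (CLVs stored back to back as raw doubles starting at a known offset) is an assumption made for illustration only.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the parts of a partition that matter here.
   None of these names come from libpll. */
typedef struct {
  FILE    *file;        /* open binary file holding the dumped CLVs */
  long     clv_offset;  /* assumed byte offset of the first CLV */
  size_t   clv_len;     /* number of doubles per CLV */
  size_t   clv_count;   /* number of CLV slots */
  double **clv;         /* a NULL entry means "not currently in memory" */
} lazy_partition;

/* Return CLV `index`, reading it from disk on first access.
   Returns NULL on a bad index, allocation failure, or I/O error. */
double *lazy_clv_get(lazy_partition *p, size_t index)
{
  if (index >= p->clv_count)
    return NULL;
  if (p->clv[index])
    return p->clv[index];           /* already in memory */

  double *buf = malloc(p->clv_len * sizeof(double));
  if (!buf)
    return NULL;

  long pos = p->clv_offset + (long)(index * p->clv_len * sizeof(double));
  if (fseek(p->file, pos, SEEK_SET) != 0 ||
      fread(buf, sizeof(double), p->clv_len, p->file) != p->clv_len)
  {
    free(buf);
    return NULL;
  }

  p->clv[index] = buf;
  return buf;
}

/* Drop a CLV from memory; it is re-read on the next access. */
void lazy_clv_evict(lazy_partition *p, size_t index)
{
  if (index < p->clv_count)
  {
    free(p->clv[index]);
    p->clv[index] = NULL;
  }
}
```

Every access has to go through lazy_clv_get for this to work, which is exactly the kind of custom plumbing the issue describes.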
Solution 1
Expose, through the binary interface, one or more functions that return a "skeleton partition" which the user can supply to pllmod_binary_partition_load. These functions would load only the bare minimum and establish a convention, such as partition->clv[x] == NULL, for detecting whether a CLV still needs to be loaded. The dynamic loading itself is left to the user.
Solution 2
Extend the module with more attributes that specify the behavior of pllmod_binary_partition_load.
For example, the user already specifies during pllmod_binary_partition_dump whether CLVs should be dumped in the same block and operation (via PLLMOD_BIN_ATTRIB_PARTITION_DUMP_CLV).
Currently this flag is ignored with regard to how much memory should be allocated (which I think is good, as the user gets a "valid" partition back that doesn't segfault from simply trying to access a CLV).
The idea would be to let the user supply attributes such as PLLMOD_BIN_ATTRIB_PARTITION_ALLOC_CLV to the load function.
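As a rough illustration of how such a flag could steer allocation, the sketch below uses a stand-in bit flag, EXAMPLE_ATTRIB_ALLOC_CLV, rather than the real PLLMOD_BIN_ATTRIB_* values. The flag value and the example_alloc_clvs and example_free_clvs helpers are hypothetical and do not exist in the module.

```c
#include <stdlib.h>

/* Hypothetical flag; not the module's actual attribute value. */
#define EXAMPLE_ATTRIB_ALLOC_CLV (1u << 0)

/* Free a CLV pointer table and every buffer it still holds. */
void example_free_clvs(double **clv, size_t clv_count)
{
  if (!clv)
    return;
  for (size_t i = 0; i < clv_count; ++i)
    free(clv[i]);
  free(clv);
}

/* Always allocate the pointer table; allocate the CLV buffers only when
   the ALLOC flag is set. Without the flag every entry stays NULL, which
   marks the CLV as "not loaded". Returns NULL on allocation failure. */
double **example_alloc_clvs(unsigned int attributes,
                            size_t clv_count, size_t clv_len)
{
  double **clv = malloc(clv_count * sizeof(double *));
  if (!clv)
    return NULL;
  for (size_t i = 0; i < clv_count; ++i)
    clv[i] = NULL;

  if (!(attributes & EXAMPLE_ATTRIB_ALLOC_CLV))
    return clv;                     /* skeleton partition: no CLV memory */

  for (size_t i = 0; i < clv_count; ++i)
  {
    clv[i] = calloc(clv_len, sizeof(double));
    if (!clv[i])
    {
      example_free_clvs(clv, clv_count);
      return NULL;
    }
  }
  return clv;
}
```

With this shape, the default (flag set) would preserve today's behavior of returning a fully allocated partition, while omitting the flag would be an explicit opt-in to NULL CLV entries.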
Solution 3
Implement dynamic loading completely, and consistently, in the binary module.
This would involve wrapping almost all major accesses to the partition after loading, or at least providing functions that check for non-loaded data and functions that load specific parts.
There would also have to be some effort to make this thread-safe.
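One simple way to approach the thread-safety part is to guard the "check, then load" step with a mutex. The following is only a sketch under that assumption: shared_clv_cache, clv_loader, shared_clv_get, and the toy example_counting_loader are hypothetical names, and a real implementation would read from the binary file instead of filling in dummy values.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical loader callback: fill dst with CLV `index`.
   Returns 0 on success, nonzero on failure. */
typedef int (*clv_loader)(void *ctx, size_t index, double *dst, size_t len);

typedef struct {
  pthread_mutex_t lock;   /* guards the clv table */
  clv_loader      load;
  void           *ctx;    /* passed through to the loader */
  size_t          clv_len;
  size_t          clv_count;
  double        **clv;    /* a NULL entry means "not loaded" */
} shared_clv_cache;

/* Return CLV `index`, loading it at most once even if several threads
   request it concurrently. Returns NULL on a bad index or load failure. */
double *shared_clv_get(shared_clv_cache *c, size_t index)
{
  if (index >= c->clv_count)
    return NULL;

  pthread_mutex_lock(&c->lock);
  if (!c->clv[index])
  {
    double *buf = malloc(c->clv_len * sizeof(double));
    if (buf && c->load(c->ctx, index, buf, c->clv_len) == 0)
      c->clv[index] = buf;
    else
      free(buf);                    /* free(NULL) is a no-op */
  }
  double *result = c->clv[index];
  pthread_mutex_unlock(&c->lock);
  return result;
}

/* Toy loader for demonstration: fills the CLV with its own index and
   counts how often it was called (ctx points to a size_t counter). */
int example_counting_loader(void *ctx, size_t index, double *dst, size_t len)
{
  ++*(size_t *)ctx;
  for (size_t i = 0; i < len; ++i)
    dst[i] = (double)index;
  return 0;
}
```

A single lock keeps the sketch short but serializes all loads, including the disk reads. Finer-grained locking (for example one lock per CLV) would allow parallel loads at the cost of more bookkeeping, and any code that reads the clv table directly, bypassing the wrapper, would remain unsynchronized.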