Add an optional cache for ParquetFragmentStreamer#1492
@lru_cache(maxsize=1)  # TODO: decide on reasonable upper bound
i would pass max_cache_size as a param here instead of use_cache
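A minimal sketch of what passing `max_cache_size` could look like, assuming the cached work is a plain function of the dataset path. `load_dataset_metadata` and `make_cached_loader` are hypothetical names for illustration, not code from this PR. Because `lru_cache` fixes `maxsize` at decoration time, the decorator has to be applied at runtime:

```python
from functools import lru_cache
from typing import Callable


def load_dataset_metadata(path: str) -> dict:
    """Hypothetical stand-in for the expensive filesystem scan and metadata parse."""
    return {"path": path}


def make_cached_loader(max_cache_size: int) -> Callable[[str], dict]:
    # Apply lru_cache at call time so the bound comes from a parameter
    # instead of being hard-coded in the decorator.
    return lru_cache(maxsize=max_cache_size)(load_dataset_metadata)


loader = make_cached_loader(max_cache_size=2)
loader("data/train")
loader("data/train")  # served from the cache
print(loader.cache_info())
```

Note that every call to `make_cached_loader` returns a wrapper with its own cache, so this form alone does not share entries across instances.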
yes but how can you set dynamically (tied to an object instance) a singleton component that you want to exist across instances?
maybe as some global context...
what if different ParquetFragmentStreamer instances ask for different cache sizes? or were you thinking of something like AssetStore i.e. a singleton handled by fairseq2 and controlled via the recipe? that would be a deeper change though I think, as I would need to touch recipe composition
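One way to picture the "global context" option, as a rough sketch only: a process-wide cache whose capacity instances can request to raise. `SharedLRUCache` and the "largest request wins" policy are assumptions made for illustration; they are not part of this PR or of fairseq2.

```python
from collections import OrderedDict
from threading import Lock
from typing import Any, Callable, Hashable


class SharedLRUCache:
    """Hypothetical process-wide LRU cache shared by all streamer instances."""

    def __init__(self, max_size: int = 1) -> None:
        self._max_size = max_size
        self._entries = OrderedDict()
        self._lock = Lock()

    @property
    def max_size(self) -> int:
        return self._max_size

    def __len__(self) -> int:
        return len(self._entries)

    def ensure_capacity(self, requested: int) -> None:
        # Assumed policy: the largest request wins and the cache never shrinks.
        with self._lock:
            self._max_size = max(self._max_size, requested)

    def get_or_create(self, key: Hashable, factory: Callable[[], Any]) -> Any:
        with self._lock:
            if key in self._entries:
                self._entries.move_to_end(key)  # mark as most recently used
                return self._entries[key]
        value = factory()  # built outside the lock; may run twice under contention
        with self._lock:
            value = self._entries.setdefault(key, value)
            self._entries.move_to_end(key)
            while len(self._entries) > self._max_size:
                self._entries.popitem(last=False)  # evict least recently used
        return value


# Module-level singleton; each instance would call ensure_capacity() on init.
DATASET_CACHE = SharedLRUCache()
DATASET_CACHE.ensure_capacity(4)
DATASET_CACHE.ensure_capacity(2)  # smaller request leaves the capacity at 4
```

With this policy, conflicting sizes resolve silently to the maximum; raising an error on mismatch or moving the setting into the recipe would be alternative choices with different trade-offs.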
thinking that we probably want to also cache
What does this PR do? Please describe:
ParquetFragmentStreamer will initialize N separate Dataset instances, duplicating much of the work (filesystem scanning, metadata parsing, etc.).
Does your PR introduce any breaking changes? If yes, please list them:
Check list: