-
Notifications
You must be signed in to change notification settings - Fork 49
Sorcha/dev/zarr3 compaction #1450
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
|
Latest version can read and write with the zipstore format but needs tidying up (produces lots of print statements to narrow down the issues when trying to run the evaluation from a .yml file) |
|
|
Integration test fails, need to investigate this further! |
What is the performance impact for running evaluation with using zipstore? |
tjhunter
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some small style comments. Otherwise, it looks ready to try out on a larger scale.
When testing into the logic, I moved the creation of the zarr store straight into inference. Otherwise, I am personally confused when we write during inference and when we just calculate validation losses.
@grassesi : do you know why we don't write data when calling validation during training?
| """Convert raw dask arrays into chunked dask-aware xarray dataset.""" | ||
| chunks = (chunk_nsamples, *self.data.shape[1:]) | ||
|
|
||
| chunks = (chunk_nsamples, *(max(x // 4, 1) for x in self.data.shape[1:])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is this formula coming from? Someone else should be able to update it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extracted out into a scale factor parameter now for easier updating
| self.data_root: zarr.Group | None = None | ||
| self._store: LocalStore | None = None | ||
| # determine type using extension | ||
| ext = self._store_path.suffix[1:] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are repeating yourself
| if self.type == "zip": | ||
| if self.create: | ||
| _logger.info("Creating zipstore") | ||
| self._store = ZipStore(self._store_path, mode="a") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
append is more permissive, but there is a cost in opening and closing the store. I would rather have the store opened for the full duration of the write rather than for each chunk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mode = "w" works with the new develop code, before I had some issues with overwriting so was using append - thanks for fixing this :)
We don't write during validation since it's slows down the validation, generates additional files/storage, and is rarely used. There's a config parameter that controls this: WeatherGenerator/config/default_config.yml Line 337 in 1776b0a
|
…rcha/dev/zarr3_compaction
Hi Christian, sorry I missed this before the holidays - the latest tests are all in the hedge doc: https://gitlab.jsc.fz-juelich.de/hedgedoc/3X10vtdVQumQZy4qpBiHWg - there seems to be an small imporevemnt when using the ZipStore to evaluate |
Description
Draft PR to showcase how to enable sharding (experimentation underway to optimise performance)
Issue Number
DRAFT
Closes #1384
Checklist before asking for review
./scripts/actions.sh lint./scripts/actions.sh unit-test./scripts/actions.sh integration-testlaunch-slurm.py --time 60