Sorcha/dev/zarr3 compaction #1450

enssow · 2025-12-11T14:21:53Z

Description

Draft PR to showcase how to enable sharding (experimentation underway to optimise performance)
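
For context on why sharding matters here, the sketch below counts how many objects (files) a single array occupies in a store with and without sharding. It is only an illustration of the Zarr v3 sharding idea, not code from this PR: `count_store_objects` is a hypothetical helper, the shapes are made-up numbers, and metadata files such as zarr.json are not counted.

```python
import math


def count_store_objects(shape, chunks, shards=None):
    """Count the data objects one array occupies in a store.

    Without sharding, every chunk is its own object. With sharding,
    several chunks are packed into one shard, and each shard is one object.
    """
    if shards is not None:
        # A shard must hold a whole number of chunks along each dimension.
        for shard_len, chunk_len in zip(shards, chunks):
            if shard_len % chunk_len != 0:
                raise ValueError("each shard dimension must be a multiple of the chunk dimension")
    unit = chunks if shards is None else shards
    return math.prod(math.ceil(n / u) for n, u in zip(shape, unit))


# Illustrative numbers only: a 1000 x 1000 array with 100 x 100 chunks.
unsharded = count_store_objects((1000, 1000), (100, 100))
sharded = count_store_objects((1000, 1000), (100, 100), shards=(500, 500))
print(unsharded, sharded)
```

Fewer, larger objects are usually kinder to parallel file systems, which is the motivation for experimenting with shard sizes.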

Issue Number

DRAFT
Closes #1384

Checklist before asking for review

I have performed a self-review of my code
My changes comply with basic sanity checks:
- I have fixed formatting issues with ./scripts/actions.sh lint
- I have run unit tests with ./scripts/actions.sh unit-test
- I have documented my code and I have updated the docstrings.
- I have added unit tests, if relevant
I have tried my changes with data and code:
- I have run the integration tests with ./scripts/actions.sh integration-test
- (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
- (bigger changes and experiments) I have shared a hedgedoc in the GitHub issue with all the configurations and runs for these experiments
I have informed and aligned with people impacted by my change:
- for config changes: the MatterMost channels and/or a design doc
- for changes of dependencies: the MatterMost software development channel

Revert "Implement per-channel logging (#283)" (#434)

…mpaction

…ng common.io)

…mpaction

enssow · 2025-12-17T17:33:55Z

Latest version can read and write with the zipstore format but needs tidying up (produces lots of print statements to narrow down the issues when trying to run the evaluation from a .yml file)

enssow · 2025-12-19T18:23:34Z

Specify the type of store to create during an inference run with --zarr-store local or --zarr-store zip:
uv run --offline inference --from_run_id zs581tqh --samples 1 --streams_output ERA5 --zarr-store local
uv run --offline inference --from_run_id zs581tqh --samples 1 --streams_output ERA5 --zarr-store zip
The code to run evaluation remains the same, with no extra config sections required (the type of store is detected from the extension), e.g. uv run evaluate --config ../test_evaluate_zip.yml. I tested the latest code with runs v6l27pog (zip) and bvu0y897 (local).
Backwards compatible: evaluating run v8pqzh4y produces the warning below, but the plots are still generated.
/p/project1/weatherai/owens1/WeatherGenerator/.venv/lib/python3.12/site-packages/zarr/core/group.py:568: ZarrUserWarning: Both zarr.json (Zarr format 3) and .zgroup (Zarr format 2) metadata objects exist at file:///p/scratch/weatherai/shared_work/results/v8pqzh4y/validation_chkpt00000_rank0000.zarr. Zarr format 3 will be used.
  warnings.warn(msg, category=ZarrUserWarning, stacklevel=1)
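
As a rough illustration of detecting the store type from the extension, here is a minimal sketch. It is not the implementation in this PR: `detect_store_kind` is a hypothetical name, and the assumption that zip stores end in .zip while local stores end in .zarr is mine (only the .zarr case appears in the paths in this thread).

```python
from pathlib import Path


def detect_store_kind(path: str) -> str:
    """Return "zip" or "local" based on the final file extension.

    Assumption: zip stores end in ".zip" and directory stores end in ".zarr".
    """
    suffix = Path(path).suffix.lower()
    if suffix == ".zip":
        return "zip"
    if suffix == ".zarr":
        return "local"
    raise ValueError(f"cannot infer store type from {path!r}")


print(detect_store_kind("validation_chkpt00000_rank0000.zarr"))
print(detect_store_kind("validation_chkpt00000_rank0000.zarr.zip"))
```

Raising on an unknown extension, rather than silently defaulting to a local store, makes a mistyped path fail early.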

enssow · 2025-12-19T18:23:52Z

The integration test fails; I need to investigate this further.

clessig · 2025-12-20T12:55:11Z

> Latest version can read and write with the zipstore format but needs tidying up (produces lots of print statements to narrow down the issues when trying to run the evaluation from a .yml file)

What is the performance impact of running evaluation when using zipstore?

tjhunter

Some small style comments. Otherwise, it looks ready to try out on a larger scale.

When testing the logic, I moved the creation of the zarr store straight into inference. Otherwise, I personally find it confusing when we write output during inference and when we only calculate validation losses.

@grassesi : do you know why we don't write data when calling validation during training?

packages/common/src/weathergen/common/config.py

tjhunter · 2025-12-29T15:58:22Z