Using Zarr as a processing step to improve IO #126
Further speed improvements could be sought by adding the Numba compiler to the various simple computations. For instance, convert the radiation code to NumPy and compile it with Numba. An example mixing both: https://examples.dask.org/applications/stencils-with-numba.html and the official documentation: https://numba.pydata.org/
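A minimal sketch of what that could look like, assuming the computation is already a plain loop over 1-D NumPy arrays; the function name and formula below are illustrative placeholders, not the actual radiation code:

```python
import numpy as np
from numba import njit

@njit(cache=True)
def scale_radiation(sw_in, cos_illum, cos_zenith):
    """Illustrative element-wise rescaling over 1-D arrays (placeholder,
    not the real radiation scheme). Numba compiles the loop to machine
    code, removing Python-level overhead on long timeseries."""
    out = np.empty_like(sw_in)
    for i in range(sw_in.size):
        if cos_zenith[i] > 0.0:
            out[i] = sw_in[i] * cos_illum[i] / cos_zenith[i]
        else:
            out[i] = 0.0
    return out
```

The first call pays a one-off compilation cost; `cache=True` keeps the compiled version between runs.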
The current work seems to be running well. I tested downscaling a 1960-2024 timeseries for 8 points in a single shot; it took less than a minute on a large server, with the server's capacity close to fully used. This was not even possible with the previous implementation. I have now added a tool to convert a stack of netcdf files into a zarr store. Conversion from netcdf to zarr can be tricky. @joelfiddes you may want to check a bit what I coded. I am about to merge my dev branch.
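For reference, a hedged sketch of what such a conversion can look like with xarray; the file pattern, chunk sizes and store path are assumptions, not the exact code in the dev branch:

```python
import xarray as xr

# Open the stack of netcdf files lazily (pattern is hypothetical).
ds = xr.open_mfdataset("inputs/climate/SURF_*.nc", combine="by_coords", parallel=True)

# Rechunk so that extracting a point reads long, contiguous time blocks.
ds = ds.chunk({"time": 8760, "latitude": -1, "longitude": -1})

# Drop chunk encoding inherited from netcdf; mismatched encodings are one of
# the things that make netcdf -> zarr conversion tricky.
for var in ds.data_vars:
    ds[var].encoding.pop("chunks", None)

# Write a consolidated Zarr store (faster to open later).
ds.to_zarr("inputs/climate/era5_surf.zarr", mode="w", consolidated=True)
```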
As part of the branch tps_zarr I implemented a conversion of ERA5 netcdf files to a Zarr store. Then I developed two parallelization workflows. Finally, downscaled data are stored as netcdf or zarr (the latter option is only available when using Dask).
Quick benchmarking on my local machine (50 points to downscale, 1.5 years) shows a really good improvement:
I am not sure how this scales up on larger computers or servers, but the Dask settings change the speed and memory usage by a fair amount. Dask combined with outputting to netcdf brings no gain in terms of time.
TimeSplitter is not implemented in this version. This is where the Zarr -> Dask -> Zarr workflow (sketched below) may shine and handle very large datasets better than manual splitting with TimeSplitter and multicore. This has not been tested yet.
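To make the idea concrete, here is a minimal sketch of a Zarr -> Dask -> Zarr flow; the point selection stands in for the downscaling step, and the paths, worker counts and memory limit are assumptions:

```python
import xarray as xr
from dask.distributed import Client

if __name__ == "__main__":
    # The Dask settings (workers, threads, memory limit) are what changes
    # speed and memory usage the most in practice.
    client = Client(n_workers=4, threads_per_worker=2, memory_limit="4GB")

    # Lazy open: nothing is read until the final write.
    era5 = xr.open_zarr("inputs/climate/era5_surf.zarr", consolidated=True)

    # Placeholder for the downscaling step: pick the grid cells nearest to a
    # handful of points, keeping everything lazy.
    subset = era5.sel(latitude=[46.1, 46.2], longitude=[7.5, 7.6], method="nearest")

    # Drop encoding chunks inherited from the source store to avoid conflicts.
    for var in subset.data_vars:
        subset[var].encoding.pop("chunks", None)

    # Streaming write back to Zarr, chunk by chunk, so the full timeseries
    # never needs to fit in memory at once.
    subset.to_zarr("outputs/downscaled.zarr", mode="w", consolidated=True)

    client.close()
```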
Warning: the config file has been updated to include more options. It will be ported to the documentation if merged to main.