Downscaling dataset production script. #879

Open
frodre wants to merge 10 commits into main from script/downscaling-zarr-processing

frodre (Collaborator) commented Feb 27, 2026

To create the downscaling datasets, we do some light processing: gathering variables from various files and renaming fields and coordinates. This PR formalizes that process into a script, scripts/downscaling/process_from_raw_zarrs.py. I used it to produce the +4K datasets under gs://vcm-ml-scratch/andrep/downscaling-xshield-amip-plus-4k, which now reside on weka under /climate-default/2026-02-23-X-SHiELD-AMIP-plus-4K-downscaling.

For now, all the configuration lives within the script, but if we do this for other data, we should probably split it out into YAML-based configs or something similar. The script supports a --dry-run mode to preview writes and can process any combination of the dataset resolutions (100km, 25km, 3km). It allows partial variable appends and complete overwrites, but errors if the output already exists and no special zarr write mode is specified.
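To make the write-mode behavior concrete, here is a minimal sketch of how such a command line could be wired up. It is not the code in this PR: only `--dry-run` and the three resolutions come from the description above, while `--resolutions`, `--append`, `--overwrite`, `build_parser`, and `resolve_write_mode` are hypothetical names chosen for illustration. The returned strings follow the `mode` values accepted by xarray's `to_zarr` (`"w"` to overwrite, `"a"` to append, `"w-"` to create and fail if the store exists).

```python
import argparse

# Resolutions named in the PR description.
RESOLUTIONS = ("100km", "25km", "3km")


def build_parser() -> argparse.ArgumentParser:
    """Build an illustrative CLI; every flag except --dry-run is hypothetical."""
    parser = argparse.ArgumentParser(
        description="Sketch of a downscaling dataset processing CLI"
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="preview what would be written without writing anything",
    )
    parser.add_argument(
        "--resolutions",
        nargs="+",
        choices=RESOLUTIONS,
        default=list(RESOLUTIONS),
        help="which dataset resolutions to process",
    )
    # Append and overwrite are mutually exclusive write modes.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--append", action="store_true")
    mode.add_argument("--overwrite", action="store_true")
    return parser


def resolve_write_mode(output_exists: bool, append: bool, overwrite: bool) -> str:
    """Map the requested behavior to a zarr write mode string.

    Raises FileExistsError when the output exists and neither append nor
    overwrite was requested.
    """
    if append and overwrite:
        raise ValueError("choose at most one of append and overwrite")
    if overwrite:
        return "w"
    if append:
        return "a"
    if output_exists:
        raise FileExistsError(
            "output exists; request an append or overwrite to proceed"
        )
    return "w-"
```

With this sketch, `build_parser().parse_args(["--dry-run", "--resolutions", "25km", "3km"])` would select two resolutions in preview mode, and `resolve_write_mode(True, False, False)` would raise because no write mode was requested for an existing output.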

@frodre frodre marked this pull request as ready for review February 27, 2026 21:20