Reformat weather datasets into zarr.
Browse the datasets produced by this repo at https://dynamical.org/catalog/.
- See AGENTS.md for an overview of the approach and this repository.
- Integrate a new dataset to be reformatted.
- Add a new variable to an existing dataset.
We use
- `uv` to manage dependencies and python environments
- `ruff` for linting and formatting
- `mypy` for type checking
- `pytest` for testing
- `pre-commit` to automatically lint and format as you git commit
- Install uv
- Run `uv run pre-commit install` to set up the git hooks
- If you use VSCode, you may want to install the extensions (ruff, mypy) it will recommend when you open this folder
- `uv run main --help` - list all datasets
- `uv run main <DATASET_ID> update-template`
- `uv run main <DATASET_ID> backfill-local <INIT_TIME_END>`
- Add dependency: `uv add <package> [--dev]`. Use `--dev` to add a development only dependency.
- Lint: `uv run ruff check [--fix]`
- Type check: `uv run mypy`
- Format: `uv run ruff format`
- Tests:
  - Run tests in parallel on all available cores: `uv run pytest`
  - Run tests serially: `uv run pytest -n 0`
To reformat a large archive we parallelize work across multiple cloud servers.
We use
- `docker` to package the code and dependencies
- `kubernetes` indexed jobs to run work in parallel
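
As a rough illustration of how an indexed job can divide work, the sketch below has each worker pick its share of a task list from its completion index. Kubernetes sets the `JOB_COMPLETION_INDEX` environment variable in each pod of an indexed job. Everything else here (`shard_for_worker`, the chunk names, the worker count) is hypothetical and is not taken from this repository's code, which may partition work differently.

```python
import os


def shard_for_worker(items: list[str], worker_index: int, num_workers: int) -> list[str]:
    """Return the items assigned to one worker using a round-robin split."""
    if not 0 <= worker_index < num_workers:
        raise ValueError(f"worker_index {worker_index} not in [0, {num_workers})")
    return items[worker_index::num_workers]


# Kubernetes sets JOB_COMPLETION_INDEX in each pod of an indexed job.
# Default to 0 so this sketch also runs outside a cluster.
worker_index = int(os.environ.get("JOB_COMPLETION_INDEX", "0"))

# Assumed value; in practice it would match the job's completion count.
num_workers = 4

# Hypothetical units of work, e.g. pieces of the archive to reformat.
chunks = [f"chunk-{i}" for i in range(10)]

my_chunks = shard_for_worker(chunks, worker_index, num_workers)
print(f"worker {worker_index} processes {my_chunks}")
```

Because each index selects a disjoint slice, the workers need no coordination with each other, and together they cover every item exactly once.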
- Install `docker` and `kubectl`. Make sure `docker` can be found at `/usr/bin/docker` and `kubectl` at `/usr/bin/kubectl`.
- Set up a docker image repository and export the `DOCKER_REPOSITORY` environment variable in your local shell, e.g. `export DOCKER_REPOSITORY=container.registry/<project-id>/reformatters/main`. Follow your registry's instructions to allow your docker to authenticate and push images to the registry.
- Set up a kubernetes cluster and configure kubectl to point to your cluster, e.g. `aws eks update-kubeconfig --region <region> --name <cluster-name>`, `gcloud container clusters get-credentials <cluster-name> --region <region> --project <project>`, etc.
- Create a kubectl secret containing a single JSON-encoded value to be passed to fsspec `storage_options` or splatted as keyword arguments to an icechunk storage opener: `kubectl create secret generic your-destination-storage-options-key --from-literal=contents='{"key": "...", "secret": "..."}'`. See `storage.py`.
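
To make the two uses of the secret concrete, here is a minimal sketch of decoding the JSON value and either passing it whole or splatting it as keyword arguments. It uses only the standard library. `parse_storage_options`, `open_destination`, and the bucket path are hypothetical stand-ins and do not reflect what `storage.py` actually does.

```python
import json
from typing import Any


def parse_storage_options(contents: str) -> dict[str, Any]:
    """Decode the secret's single JSON value into a dict of options."""
    options = json.loads(contents)
    if not isinstance(options, dict):
        raise ValueError("storage options must be a JSON object")
    return options


def open_destination(path: str, **kwargs: Any) -> str:
    """Hypothetical stand-in for a storage opener that takes keyword arguments."""
    return f"{path} opened with options: {sorted(kwargs)}"


# Same shape as the value given to --from-literal=contents=... in the kubectl command.
contents = '{"key": "...", "secret": "..."}'
storage_options = parse_storage_options(contents)

# Use 1: hand the dict over whole, as with an fsspec-style storage_options argument.
# Use 2: splat it so each JSON key becomes a keyword argument of the opener.
summary = open_destination("s3://hypothetical-bucket/dataset.zarr", **storage_options)
print(summary)
```

Keeping the secret as one JSON object means the same value works for both styles without the code needing to know which keys a given storage backend expects.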