Summary
Establish a reliable data access and storage pipeline for ERA5 reanalysis data and NOAA climate indices. This phase lays the foundation for all subsequent analysis.
Parent Issue: #1
Objectives
- Download and cache ERA5 sea level pressure and geopotential height data
- Load and parse NOAA climate indices (NAO, AO, ONI, PDO)
- Implement preprocessing (anomaly computation, regridding)
- Create comprehensive test suite
System Context
data/
├── raw/ # Downloaded NetCDF files
│ └── era5/
├── processed/ # Anomalies, regridded data
└── external/ # NOAA indices, EM-DAT
Files to Create/Modify
| File | Action | Description |
|---|---|---|
| `src/data/download.py` | Create | ERA5Downloader class with CDS API |
| `src/data/loaders.py` | Modify | Add ERA5 loader, EM-DAT loader |
| `src/data/preprocessing.py` | Create | AnomalyCalculator, Regridder classes |
| `tests/test_data.py` | Modify | Add download and preprocessing tests |
Implementation Checklist
CDS API Setup
- Document CDS account creation and API key setup
- Test CDS API connectivity
- Handle authentication errors gracefully
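One way to handle authentication errors gracefully is to validate the credentials file before any request is sent. The sketch below assumes the `cdsapi` client's default `~/.cdsapirc` file with `url:` and `key:` entries; the function name `check_cds_credentials` and the exception class `CDSCredentialsError` are hypothetical and not part of the existing code base.

```python
from pathlib import Path


class CDSCredentialsError(RuntimeError):
    """Raised when the CDS API credentials file is missing or incomplete."""


def check_cds_credentials(path: Path = Path.home() / ".cdsapirc") -> dict:
    """Parse a .cdsapirc-style file and return its 'url' and 'key' entries."""
    if not path.is_file():
        raise CDSCredentialsError(
            f"No credentials file at {path}; create a CDS account and "
            "save your API key there."
        )
    config = {}
    for line in path.read_text().splitlines():
        # Each entry is a 'name: value' pair. Split on the first colon only,
        # because the URL value itself contains colons.
        name, sep, value = line.partition(":")
        if sep:
            config[name.strip()] = value.strip()
    missing = [name for name in ("url", "key") if not config.get(name)]
    if missing:
        raise CDSCredentialsError(f"{path} is missing: {', '.join(missing)}")
    return config
```

Calling this once at startup turns a confusing mid-download failure into an early, readable error message.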
ERA5 Downloader
- Implement `ERA5Downloader` class
- Add request builder for monthly SLP/Z500 variables
- Implement download checkpointing (resume failed downloads)
- Add progress tracking with logging
- Handle rate limits with exponential backoff
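The rate-limit item above could be covered by a small retry helper. This is a minimal sketch, not the project's implementation: `retry_with_backoff` and its parameters are hypothetical names, and it retries on any exception, whereas a real downloader would probably narrow that to rate-limit and network errors.

```python
import logging
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on any exception with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as exc:
            if attempt == max_attempts:
                raise
            # The delay doubles each attempt (base, 2*base, 4*base, ...),
            # capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Up to 10% random jitter spreads out retries from parallel workers.
            delay += random.uniform(0, delay * 0.1)
            logger.warning(
                "Attempt %d/%d failed (%s); retrying in %.1fs",
                attempt, max_attempts, exc, delay,
            )
            sleep(delay)
    raise RuntimeError("unreachable")
```

The `sleep` argument is injectable so that tests can record the delays instead of actually waiting.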
NOAA Index Loader (Partially Complete)
- Implement `NOAAIndexLoader` class
- Parse NOAA PSL format for NAO, AO, ONI, PDO
- Add caching to file system
- Handle network errors gracefully
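As a rough illustration of the parsing step, the sketch below assumes the common PSL text layout: a first line with start and end years, then one row per year with twelve monthly values, followed by a missing-value sentinel line and free-text footer. The function name `parse_psl_text` and the default sentinel `-99.9` are assumptions; the sentinel differs between index files and should be checked per file. To stay dependency-free it returns a plain mapping, which the loader could then convert to a pandas object with a datetime index.

```python
import math
from datetime import date
from typing import Dict


def parse_psl_text(text: str, missing: float = -99.9) -> Dict[date, float]:
    """Parse PSL-style monthly index text into {first-of-month date: value}."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    # Header line: start year and end year of the record.
    start_year, end_year = int(rows[0][0]), int(rows[0][1])
    series: Dict[date, float] = {}
    for fields in rows[1:]:
        # Data rows are a year plus 12 monthly values; the sentinel line and
        # footer text do not match this shape and are skipped.
        if len(fields) != 13:
            continue
        try:
            year = int(fields[0])
            values = [float(v) for v in fields[1:]]
        except ValueError:
            continue
        if not start_year <= year <= end_year:
            continue
        for month, value in enumerate(values, start=1):
            is_missing = math.isclose(value, missing)
            series[date(year, month, 1)] = math.nan if is_missing else value
    return series
```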
Preprocessing
- Implement `AnomalyCalculator` (remove climatological mean)
- Implement `Regridder` for resolution standardization
- Add latitude weighting for EOF preparation
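For the latitude weighting item, a common choice before EOF analysis on a regular latitude-longitude grid is to multiply each grid point by the square root of the cosine of its latitude, so that the covariance matrix is effectively area-weighted. The issue does not say which weighting is intended, so the sketch below is one plausible reading; the function names and the `(time, lat, lon)` axis order are assumptions.

```python
import numpy as np


def latitude_weights(lat_deg: np.ndarray) -> np.ndarray:
    """Return sqrt(cos(latitude)) weights for latitudes given in degrees."""
    cos_lat = np.cos(np.deg2rad(lat_deg))
    # Clip guards against tiny negative values from rounding at the poles.
    return np.sqrt(np.clip(cos_lat, 0.0, None))


def apply_latitude_weights(field: np.ndarray, lat_deg: np.ndarray) -> np.ndarray:
    """Weight a (time, lat, lon) array along its latitude axis."""
    weights = latitude_weights(lat_deg)
    # Broadcast the 1-D weights across the time and longitude axes.
    return field * weights[np.newaxis, :, np.newaxis]
```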
Testing
- Unit tests for NOAA loader
- Integration test for ERA5 download (small subset)
- Test anomaly computation produces zero-mean fields
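The zero-mean check could look roughly like the test below. To keep the sketch free of file and network dependencies, it uses a NumPy stand-in (`monthly_anomalies`, a hypothetical helper) that mirrors the group-by-calendar-month logic rather than calling the project's xarray-based function; it assumes the time axis starts in January and covers whole years.

```python
import numpy as np


def monthly_anomalies(data: np.ndarray) -> np.ndarray:
    """Subtract each calendar month's mean from a (time, ...) array."""
    if data.shape[0] % 12 != 0:
        raise ValueError("time axis must cover whole years")
    n_years = data.shape[0] // 12
    # Reshape to (year, month, ...) so axis 0 runs over the years.
    by_month = data.reshape(n_years, 12, *data.shape[1:])
    climatology = by_month.mean(axis=0, keepdims=True)
    return (by_month - climatology).reshape(data.shape)


def test_anomalies_have_zero_monthly_mean():
    rng = np.random.default_rng(0)
    # Three years of synthetic pressure-like data on a 4 x 8 grid.
    data = rng.normal(loc=1013.0, scale=5.0, size=(36, 4, 8))
    anomalies = monthly_anomalies(data)
    # Averaging each calendar month over all years should give zero.
    monthly_mean = anomalies.reshape(3, 12, 4, 8).mean(axis=0)
    np.testing.assert_allclose(monthly_mean, 0.0, atol=1e-10)
```

Note that the property under test is a zero mean per calendar month, which also implies a zero mean over the whole record when every month has the same number of samples.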
Code Snippets
ERA5 Download Request

    # src/data/download.py
    def _build_request(self, variable: str, year: int, month: int) -> dict:
        """Build CDS API request for ERA5 monthly means."""
        return {
            "product_type": "monthly_averaged_reanalysis",
            "variable": variable,
            "year": str(year),
            "month": f"{month:02d}",
            "time": "00:00",
            "format": "netcdf",
        }

Anomaly Calculation

    # src/data/preprocessing.py
    import xarray as xr

    def compute_anomalies(data: xr.DataArray) -> xr.DataArray:
        """Remove monthly climatology from data.

        Args:
            data: DataArray with time dimension

        Returns:
            Anomalies (deviations from monthly mean)
        """
        # Mean of each calendar month across all years
        climatology = data.groupby("time.month").mean("time")
        # Subtract the matching month's climatology from every time step
        anomalies = data.groupby("time.month") - climatology
        return anomalies

Verification

    # Test ERA5 download
    python -m src.data.download --variable msl --year 2020 --month 1 --dry-run
    # Verify NOAA loader
    python -c "from src.data.loaders import NOAAIndexLoader; print(NOAAIndexLoader().load_index('NAO').head())"
    # Run tests
    pytest tests/test_data.py -v

Technical Challenges
| Challenge | Mitigation |
|---|---|
| ERA5 downloads slow | Start with NCEP (smaller), use dask for lazy loading |
| CDS rate limits | Implement exponential backoff, queue requests |
| NetCDF memory issues | Use dask chunking from start |
| Network failures | Checkpointing, automatic retry |
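The checkpointing mitigation can be as simple as treating each finished file as its own checkpoint. The sketch below is one possible approach under stated assumptions: `download_with_checkpoint`, the `fetch` callback, and the `era5_<year>_<month>.nc` naming scheme are hypothetical. Each file is written to a temporary `.part` path and renamed only after the fetch returns, so an interrupted run never leaves a truncated file that looks complete.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def download_with_checkpoint(
    tasks: Iterable[Tuple[int, int]],
    fetch: Callable[[int, int, Path], None],
    out_dir: Path,
) -> List[Path]:
    """Fetch each (year, month) once, skipping files that already exist."""
    out_dir.mkdir(parents=True, exist_ok=True)
    completed = []
    for year, month in tasks:
        target = out_dir / f"era5_{year}_{month:02d}.nc"
        if not target.exists():
            # Write to a temporary name first; a crash mid-download leaves
            # only the .part file, which the next run overwrites.
            partial = target.with_name(target.name + ".part")
            fetch(year, month, partial)
            partial.replace(target)
        completed.append(target)
    return completed
```

Re-running the same task list after a failure then resumes from the first month that has no finished file.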
Definition of Done
- ERA5 monthly SLP downloads successfully for any year/month
- NOAA indices load into pandas DataFrame with datetime index
- Anomaly computation produces zero-mean monthly fields
- All tests pass with `pytest tests/test_data.py`
- Coverage >80% for data module