Releases: vergauwenthomas/MetObs_toolkit

v1.0.2

25 Feb 13:00
2e5e420

What's Changed

Full Changelog: v0.4.7...v1.0.2

v1.0.0

19 Dec 15:48

What's Changed

Full Changelog: v0.4.7...v1.0.0

v0.4.7

20 Oct 10:58
d1f30f7

Summary

This release contains a set of improvements, bug fixes, API refinements and tests. Key themes: gap-filling refactor and robustness improvements, new plotting helpers (pandas-backed), better modeldata filtering and selection, GEE authentication/test helpers, logging hardening, improved distance-matrix/buddy-check logic, and added min/max constraints to gap-fills.
Version bump: 0.4.6 → 0.4.7

Highlights

Gap handling refactor

  • A gap overview API (gap_overview_df / gap_status_overview_df) provides concise, one-row-per-gap summaries at the SensorData, Station and Dataset levels.
  • Defaults and validation changed: gap-size checks were added, and the new parameter max_gap_duration_to_fill controls whether a gap may be filled (defaults were adjusted to make the behaviour more intuitive).
  • New gap statuses and logic: a "partially successful gapfill" status is introduced, and the gap flagging logic now treats partially successful gaps more intuitively during sequential gap filling.
  • Many gapfill methods were refactored to accept and propagate max_gap_duration_to_fill and the optional min_value / max_value constraints.

Gap-filling value constraints

  • New support for min_value and max_value in the core gap-fill paths (raw, debiased, diurnal debiased, weighted diurnal).
  • Filled values can be clipped to prevent unphysical results; tests were added for these constraints.
  • Internal fill functions were updated to accept min/max (e.g. fill_regular_debias, fill_with_diurnal_debias, fill_with_weighted_diurnal_debias).
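
The effect of min_value / max_value on filled values can be pictured with a small standalone sketch. This is not the toolkit's internal code: the function name clip_filled_values and the sample humidity values are assumptions made for illustration, and only the clipping idea itself comes from the notes above.

```python
from typing import List, Optional, Sequence


def clip_filled_values(
    values: Sequence[float],
    min_value: Optional[float] = None,
    max_value: Optional[float] = None,
) -> List[float]:
    """Clip each value to [min_value, max_value]; None leaves a side open.

    Hypothetical helper, written only to illustrate the clipping idea.
    """
    clipped = []
    for value in values:
        if min_value is not None and value < min_value:
            value = min_value
        if max_value is not None and value > max_value:
            value = max_value
        clipped.append(value)
    return clipped


# Made-up debiased model values for relative humidity (%).
filled_humidity = [98.2, 101.7, 100.4, -0.3, 55.0]
print(clip_filled_values(filled_humidity, min_value=0.0, max_value=100.0))
# → [98.2, 100.0, 100.0, 0.0, 55.0]
```

Debiasing can push model values outside the physically valid range (humidity above 100 %, for example), which is the kind of result the constraints are meant to prevent.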

Model-data selection & plotting

  • New helper filter_modeldatadf for robust filtering of the modeldata DataFrame by obstype, modelname and modelvariable; used internally by the plotting functions.
  • Station/Dataset plotting improvements:
    • modeldata_name and modeldata_kwargs added to make_plot to select specific modeldata series for plotting.
    • A new parameter modeltype makes it possible to select a model data "type" different from the obstype if needed (defaults to obstype).
  • New convenient pandas-backed plotting helpers:
    • ModelTimeSeries.pd_plot (wrapper around pandas.Series.plot for model timeseries)
    • SensorData.pd_plot (wrapper around pandas.Series.plot for sensordata, with label filtering support)
  • Plotting internals refactored to expose these simpler pd-plot entrypoints.
  • Tests and baselines added for the new pd plots and modeldata plotting.

New and improved utilities

  • convert_to_numeric_series added (and integrated into the dataset and sensordata import paths) to handle values that use a comma as decimal separator.
  • Timestamp and xarray conversion fixes: timedelta and timestamp attrs are serialized in xarray conversions; netCDF engine handling improved (netcdf4 is selected by default unless overridden) to avoid Unicode issues.
  • New dev/test tooling files added for GEE: a script to test the GEE authentication environment (deployment/test_gee_auth.py) and updates to the CI/dev pipeline scripts.
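
The comma-as-decimal handling mentioned for convert_to_numeric_series can be illustrated with a minimal pure-Python sketch. The function to_float_comma_aware is hypothetical and is not the toolkit's implementation; it assumes that values contain no thousands separators and that unparseable entries should become NaN.

```python
import math
from typing import Iterable, List


def to_float_comma_aware(raw_values: Iterable[object]) -> List[float]:
    """Convert raw values to floats, accepting ',' as the decimal separator.

    Hypothetical helper: entries that cannot be parsed become NaN
    instead of raising an error.
    """
    converted = []
    for raw in raw_values:
        text = str(raw).strip().replace(",", ".")
        try:
            converted.append(float(text))
        except ValueError:
            converted.append(math.nan)
    return converted


# Mixed input as it might appear in a CSV column read as text.
print(to_float_comma_aware(["12,5", "3.2", 7, "n/a"]))
```

A direct cast of the string "12,5" to float raises a ValueError, which is why a dedicated conversion step is useful during import.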

GEE and geemap

  • GEE initialization/auth flow improved: try default initialization first; if that fails, fall back to authentication. Added handling and tests for known EarthEngine/gee changes in the test pipeline.
  • Dependency pinning: earthengine-api pinned to <=1.6.11 for compatibility with geemap 0.35.3.
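
The initialize-first, authenticate-on-failure flow described above can be sketched generically. The function connect_with_fallback and the two stand-in callables are assumptions for illustration; with the earthengine-api package the callables would presumably be ee.Initialize and ee.Authenticate, but the toolkit's actual connect_to_gee logic may differ.

```python
from typing import Callable


def connect_with_fallback(
    initialize: Callable[[], None],
    authenticate: Callable[[], None],
) -> str:
    """Try to initialize; authenticate and retry only if that fails.

    Returns "initialized" when no authentication was needed and
    "authenticated" when the fallback path was taken.
    """
    try:
        initialize()
        return "initialized"
    except Exception:  # broad on purpose: this is only a sketch
        authenticate()
        initialize()
        return "authenticated"


# Stand-ins that mimic missing credentials on the first attempt.
state = {"has_credentials": False}


def fake_initialize() -> None:
    if not state["has_credentials"]:
        raise RuntimeError("no credentials found")


def fake_authenticate() -> None:
    state["has_credentials"] = True


print(connect_with_fallback(fake_initialize, fake_authenticate))  # authenticated
print(connect_with_fallback(fake_initialize, fake_authenticate))  # initialized
```

Trying initialization first avoids an interactive authentication prompt on machines where valid credentials are already stored.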

Logging improvements

  • The logging module now avoids creating duplicate FileHandlers / StreamHandlers. Existing handlers are checked for a duplicate filepath/level before new handlers are added.
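
A duplicate-handler check of this kind can be written with the standard logging module alone. The sketch below is an assumption about the general approach, not a copy of the toolkit's code; add_file_handler_once and the logger name metobs_demo are hypothetical.

```python
import logging
import os
import tempfile


def add_file_handler_once(logger, filepath, level=logging.INFO):
    """Attach a FileHandler unless one with the same path and level exists."""
    target = os.path.abspath(filepath)
    for handler in logger.handlers:
        if (
            isinstance(handler, logging.FileHandler)
            and handler.baseFilename == target
            and handler.level == level
        ):
            return handler  # reuse the existing handler, add nothing
    handler = logging.FileHandler(target)
    handler.setLevel(level)
    logger.addHandler(handler)
    return handler


demo_logger = logging.getLogger("metobs_demo")  # hypothetical logger name
log_path = os.path.join(tempfile.mkdtemp(), "demo.log")

first = add_file_handler_once(demo_logger, log_path)
second = add_file_handler_once(demo_logger, log_path)  # same path and level
print(first is second, len(demo_logger.handlers))
```

Without such a check, calling a logging setup function twice (for instance by re-running a notebook cell) writes every message to the same file twice.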

Buddy check & distance matrix

  • Buddy-check fixes: bug fixes and improved messaging when joining duplicate messages in the buddy-check loop; new tests added to cover edge cases.
  • The distance matrix now uses BallTree with the haversine metric for better performance and correctness at scale. A separate helper generate_distance_matrix was added.
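
To make the haversine metric concrete, here is a brute-force, pure-Python distance matrix. It deliberately does not use BallTree and is not the toolkit's generate_distance_matrix; the function names, the Earth radius constant and the station coordinates are assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    # min() guards against rounding slightly above 1 for antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def distance_matrix_km(coords):
    """Symmetric matrix of pairwise haversine distances (O(n^2) pairs)."""
    n = len(coords)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = haversine_km(*coords[i], *coords[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


# Approximate (lat, lon) of three hypothetical station locations.
stations = [(51.05, 3.72), (50.85, 4.35), (51.22, 4.40)]
for row in distance_matrix_km(stations):
    print([round(d, 1) for d in row])
```

The brute-force version evaluates every pair of stations; a tree-based structure such as BallTree avoids much of that work when only neighbours within a given radius are needed, which is the typical buddy-check query.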
Docs, examples and tests

Bug fixes (representative)

  • Fixed gap-filling logic edge cases and gap-size validation (overly large gaps are no longer filled by default).
  • Handled Unicode / netCDF engine issues when saving netCDF (default to netcdf4).
  • Fixed a bug in the filtering of the model data frame used for plotting and selection.
  • Fixed the buddy-check duplicate message/iteration bug and added tests that reproduce the triggers.
  • Fixed handling of comma-as-decimal when importing datasets.
  • Fixed geemap-related test and notebook display issues (closing figures after comparison).

API / Behaviour changes (important for users)

  • Station.modeldata: function/return types and usage were adjusted. Model data selection APIs were improved; a helper filter_modeldatadf was added to reliably extract model rows from the model datadf. Check your code if you iterate over station.modeldata or rely on its previous type.
  • New and renamed parameters:
    • Most gapfill and interpolation methods changed from "max_consec_fill" (count-based) to "max_gap_duration_to_fill" (duration-based, independent of the time resolution). Defaults changed (common defaults are 3h for interpolation and 12h for model-based fills).
    • Many Dataset/Station/SensorData gapfill methods now accept optional min_value and max_value arguments (to constrain filled values).
  • Dataset/Station/SensorData now expose gap_overview_df methods (returning a compact per-gap summary).
  • ModelTimeSeries.pd_plot and SensorData.pd_plot now exist as convenience wrappers.
  • GEE: the connect_to_gee flow attempts initialization first and authenticates only if necessary. Tests were added to check for local credentials.
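
The difference between a count-based and a duration-based limit is easiest to see on one outage recorded at two time resolutions. The sketch below uses only the standard library; the helper names are hypothetical, and measuring gap duration from the first to the last missing timestamp is an assumption that may not match the toolkit's exact definition.

```python
from datetime import datetime, timedelta


def missing_timestamps(start, end, resolution):
    """All expected timestamps from start to end (inclusive) at a resolution."""
    stamps = []
    current = start
    while current <= end:
        stamps.append(current)
        current += resolution
    return stamps


def is_within_duration_limit(missing, max_gap_duration_to_fill):
    """Duration-based check: compare the gap's time span to the limit."""
    return missing[-1] - missing[0] <= max_gap_duration_to_fill


outage_start = datetime(2024, 1, 1, 10, 0)
outage_end = datetime(2024, 1, 1, 12, 0)
limit = timedelta(hours=3)

for resolution in (timedelta(minutes=10), timedelta(hours=1)):
    missing = missing_timestamps(outage_start, outage_end, resolution)
    # The record count differs per resolution, the duration verdict does not.
    print(len(missing), is_within_duration_limit(missing, limit))
# → 13 True
# → 3 True
```

A count-based limit of, say, 5 records would fill this outage in the hourly series but refuse it in the 10-minute series, even though the missing period is identical.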

Migration guide (suggested)

  • If you previously used max_consec_fill:
    • Replace it with max_gap_duration_to_fill and pass a pandas Timedelta or a string such as "3h" (e.g. max_gap_duration_to_fill="3h" or pd.Timedelta("3h")).
    • Example: dataset.interpolate_gaps(..., max_gap_duration_to_fill="3h")
  • To limit filled values:
    • Pass min_value and/or max_value to fill_gaps_with_raw_modeldata, fill_gaps_with_debiased_modeldata, fill_gaps_with_diurnal_debiased_modeldata, fill_gaps_with_weighted_diurnal_debiased_modeldata.
  • For plotting:
    • Use the new pd_plot helpers for quick plots: my_modeltimeseries.pd_plot(...) and my_sensordata.pd_plot(show_labels=["ok"], **kwargs).
    • To choose specific model data series in make_plot, use modeldata_name or modeldata_kwargs.
  • For selecting modeldata rows from the combined DataFrame:
    • Use filter_modeldatadf(modeldatadf, trgobstype, modelname, modelvariable) to robustly get the intended subset.
  • If you relied on the old Dataset/Station.gaps API for "singular_gaps", switch to gap_overview_df/gap_status_overview_df for single-row-per-gap summaries.

Dependency notes

  • earthengine-api: pinned to <= 1.6.11 due to geemap compatibility (geemap 0.35.3).
  • geemap >= 0.35.3 required.
  • Minor updates across docs/testing tooling.

Developer / internal notes

Contributors (from commit co-authors)

  • Thomas Vergauwen
  • Leon Adriaensen (@ADRIE-A3)
  • Copilot / automated code-assist contributions mentioned in the commit history

v0.4.6

29 Sep 09:08

Release Notes - MetObs_toolkit v0.4.6

Note: v0.4.5 does not exist. It was skipped because of an installation bug on Python 3.10; PyPI restrictions forced me to skip that release.

Release Highlights:
This release delivers enhancements, bug fixes, and improved robustness for the MetObs_toolkit. It focuses on better data handling, new plotting functionalities and fixes for various edge-cases.

🚀 New Features & Enhancements

  • Data Import Robustness:
    • Added support for comma as a decimal symbol when importing data.
    • Introduced convert_to_numeric_series for safer numeric conversions, replacing direct .astype calls.
  • Plotting Improvements:
    • The make_plot() method of the Station class now supports:
      • modeldata_name, for easier selection of a model series.
      • modeldata_kwargs, to select specific modeldata series.
      • modeltype, a new parameter to plot a type of modeldata that differs from the obstype.
  • Site Metadata Enrichment:
    • Added lcz (Local Climate Zone) and altitude as attributes of site.
    • These are now included in the API documentation.
  • Quality Control (QC) Improvements:
    • Enhanced buddy check:
      • More informative error messages with iteration reference.
      • Fix for duplicate messages by joining them.
      • Added tests for relevant edge cases.

🐛 Bug Fixes

  • Fixed bug in test baselines and ensured correct location for baseline data.
  • Fixed bug where altitude being NaN could cause processing errors.
  • Fixed bugs in tests and improved test coverage.
  • Addressed Sphinx warnings in the documentation.
  • Resolved several grammar errors in code comments.

🧪 Testing & Maintenance

  • Added and improved tests for plotting and QC edge-cases.
  • Updated test baselines for more robust regression checking.
  • Black formatting and code style improvements across multiple modules.

🔢 Versioning

  • Version set to v0.4.6.

New Contributors

Full Changelog: v0.4.4...v0.4.6

v0.4.4

05 Sep 11:39
d390206

Release name: v0.4.4
Tag: v0.4.4
Compare: changes since v0.4.3

Highlights

  • Data IO and formats
    • Parquet reader support added. (#557)
    • New to_parquet and to_csv methods for Dataset and Station classes. (#556)
    • CF-compliant netCDF serialization for xarray Datasets with nested attributes. (#558)
  • Model data improvements
    • ModelTimeseries unit conversion handling and ModelObstype renaming for clearer semantics and consistency. (#543, #545)
  • Robustness and correctness
    • Fix for NaTType error in frequency estimation when the variable list is empty. (#562) Thanks to @ADRIE-A3 for reporting (#561).
    • Safer gapfilling invocation by checking stations for obstype when GF is called on Dataset. (#566)
    • Runtime warnings standardized by converting them to structured logging. (#565)
    • Improved QC error handling on Dataset. (#560)
  • Developer experience and docs
    • Human-readable repr methods for main classes to aid debugging and inspection. (#568)
    • README updated to include conda install instructions and badge. (#555)

Potential behavior changes

Renamed/standardized “ModelObstype” naming and unit-conversion handling for model time series. Downstream user code referencing the old name or implicit conversions may need to adapt. (#543, #545)

Closed issues addressed in this release window

  • Error importing data, NaTType in frequency for empty variable list: reported by @ADRIE-A3, fixed via (#562). (#561)
  • template_build_prompt() to accept arguments: opened by @pratiman-91. (#551)
  • Use of pint for units and conversion: opened by @pratiman-91. (#549)
  • Update docs to latest version: opened by @pratiman-91. (#547)
  • Update Repo About information: opened by @pratiman-91. (#548)

Contributors (thank you!)

Code contributions:

@vergauwenthomas (#543, #545, #560, #566)
@pratiman-91 (#555, #557)
@Copilot (app/bot) (#556, #558, #562, #565, #568)

Issue reporters:

@ADRIE-A3 (#561)
@pratiman-91 (#547, #548, #549, #551)

Included pull requests (since v0.4.3)

#543 — Modeltimeseries unit conv handling and modelobstype renaming. (@vergauwenthomas)
#545 — Modeltimeseries unit conv. (@vergauwenthomas)
#555 — Update README.md to include conda install and badge. (@pratiman-91)
#556 — Add to_parquet and to_csv methods for Dataset and Station classes. (@Copilot)
#557 — Parquet reader. (@pratiman-91)
#558 — Implement CF-compliant netCDF serialization for xarray Datasets with nested attributes. (@Copilot)
#560 — Qc on dataset error handling. (@vergauwenthomas)
#562 — Fix NaTType error in frequency estimation for empty variable lists. (@Copilot)
#565 — Standardize warning formatting by converting operational warnings to logging. (@Copilot)
#566 — Check stations for obstype when GF is called on Dataset. (@vergauwenthomas)
#568 — Implement human-readable repr methods for all main classes. (@Copilot)

v0.4.3

25 Aug 13:53

What's Changed

Full Changelog: v0.4.0...v0.4.3

v0.4.0

16 May 12:09
84b87e4

What's Changed

New Contributors

Full Changelog: v0.3.0...v0.4.0

v0.4.0a

14 May 10:41

What's Changed

New Contributors

Full Changelog: v0.3.0...v0.4.0a

v0.3.0

10 Sep 13:11
1de19d0

The following parts have undergone major revision:

  • Gaps: The separate "missing observation" concept no longer exists. Everything that is missing is now considered a gap.
  • Gap filling: Multiple methods of varying complexity are available for filling gaps with modeldata.
  • Template: Templates are now stored as JSON files and handled by a dedicated class.
  • Modeldata: Modeldata has a specific class for static and dynamic datasets.
  • Documentation: The API documentation now has examples for all user-accessible functions and methods.

What's Changed

Full Changelog: v0.2.1...v0.3.0

v0.2.1

16 Jul 10:01
37e3ee8

Templates are now handled by Template(), and JSON files are used to store them.

What's Changed

Full Changelog: v0.2.0...v0.2.1