Support use of Google Film for temporal interpolation #2270
base: master
Conversation
…pecific plugin and addition of a Google Film option to the temporal interpolation plugin and CLI.
…he situation where a range of forecast periods are available, but within this range there are gaps to be filled. This class also checks an attribute, which contains information about the forecast source at each forecast period. Forecast periods at which there is a transition from one forecast source to another will be the target for the interpolation.
Force-pushed from 1949898 to e0fa0b5.
bayliffe left a comment:
A nice round 30 comments for you.
```
- pysteps
- python-stratify
- tensorflow
- tensorflow-hub
```
Can you please update the list of exclusions in environment_b to indicate that the temporal interpolation / gap filling CLIs are unsupported by that environment:

improver/envs/environment_b.yml, line 2 in e0fa0b5:

```
# It does not include dependencies for the following IMPROVER CLIs:
```
```
Identifies missing forecast periods in a sequence and fills them using
temporal interpolation. Can optionally regenerate periods at forecast
source transitions when cluster_sources configuration is provided.
```
Your comments here and below talk about forecast periods, but that's potentially misleading. For the cube shown below, where the data is all valid at the same time, I have a set of forecast periods due to differing forecast reference times:
```
duration_of_sunshine / (seconds)    (forecast_reference_time: 15; latitude: 960; longitude: 1280)
    Dimension coordinates:
        forecast_reference_time    x    -    -
        latitude                   -    x    -
        longitude                  -    -    x
    Auxiliary coordinates:
        forecast_period            x    -    -
    Scalar coordinates:
        time                  2012-07-25 00:00:00
        originating_centre    European Centre for Medium Range Weather Forecasts
    Attributes:
        Conventions           'CF-1.7'
        GRIB_PARAM            GRIB1:t032c128n098
```
It is not appropriate to use this tool to try and fill in a missing forecast initialisation time, but that's what would happen in this case. The IMPROVER convention is to have a single validity time in a cube, which this example adheres to; it simply isn't structured in the way we normally do it.
All of this is simply to say that I think it is important to make clear here that the intention is to fill in missing validity times in a sequence of validity times, providing a more complete forecast evolution through time.
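For illustration, a minimal sketch of the intended use; the validity times below are hypothetical, chosen to match those used in the tests later in this PR:

```python
from datetime import datetime

# A sequence of validity times for one diagnostic: 06:00 is absent and is
# the target for interpolation between its neighbouring times.
available_times = [datetime(2017, 11, 1, 3), datetime(2017, 11, 1, 9)]
missing_times = [datetime(2017, 11, 1, 6)]  # a missing validity time, not a
                                            # missing initialisation time
```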
```
a dimension coordinate (will be sliced automatically), or multiple
individual cubes representing different forecast periods. All cubes
should have forecast_period coordinates and represent the same
diagnostic at different forecast periods.
```
See comment above as to why this is potentially misleading.
```
This dictionary maps realization indices to their forecast sources
and periods. When provided with interpolation_window_in_hours,
enables identification and regeneration of forecast periods at
source transitions. Format: {realization_index: {source_name: [periods]}}
```
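For illustration, a dictionary matching this documented format might look like the following; all source names and period values are hypothetical:

```python
# Hypothetical cluster_sources mapping: realization index -> forecast source
# name -> forecast periods (hours) drawn from that source.
cluster_sources = {
    0: {"nwp_det": [0, 3, 6], "nwp_ens": [12, 18, 24]},
    1: {"nwp_det": [0, 3], "nwp_ens": [9, 12, 18, 24]},
}
# For realization 0, the switch from nwp_det (last period 6) to nwp_ens
# (first period 12) is a source transition, so the periods around it would
# be targets for regeneration by interpolation.
```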
I'm not sufficiently au fait with the clustering stuff yet to understand how the forecast period / validity time / forecast reference time coordinates are used across models and whether that's guided you towards all the discussion of forecast periods here.
```python
    clipping_bounds = tuple(float(x) for x in clipping_bounds)
else:
    clipping_bounds = (0.0, 1.0)
```
Why can you not handle clipping bounds in the same way as the other CLI to avoid adding things here that Carwyn will have to later move to a meta plugin?
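For context, clipping bounds of this kind are ultimately used to constrain the interpolated output to a valid range. A minimal sketch, assuming plain NumPy clipping and example data:

```python
import numpy as np

lower, upper = 0.0, 1.0  # the default bounds from the snippet above
interpolated_data = np.array([-0.2, 0.4, 1.3], dtype=np.float32)  # example only
clipped = np.clip(interpolated_data, lower, upper)  # -> [0.0, 0.4, 1.0]
```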
```python
def google_film_mock_model():
    """Create a mock TensorFlow Hub model that returns a simple interpolation."""

    class MockModel:
        def __call__(self, inputs):
            # Simple linear interpolation for testing
            time_frac = inputs["time"][0]
            x0 = inputs["x0"][0]
            x1 = inputs["x1"][0]
            # Linear interpolation: x0 + time_frac * (x1 - x0)
            interpolated = x0 + time_frac * (x1 - x0)
            return {"image": np.expand_dims(interpolated, axis=0)}

    return MockModel()
```
A slightly different implementation of the setup_google_film_mock method in test_TemporalInterpolation.py. Is there a reason to define the same thing twice (it's been imported for reuse in test_ForecastPeriodGapFiller.py)?
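One common way to avoid the duplication would be a shared pytest fixture, e.g. in a conftest.py; a sketch, assuming that file location (the fixture body is taken from the snippet above):

```python
# conftest.py (hypothetical shared location)
import numpy as np
import pytest


@pytest.fixture
def google_film_mock_model():
    """Mock TensorFlow Hub model returning a simple linear interpolation."""

    class MockModel:
        def __call__(self, inputs):
            time_frac = inputs["time"][0]
            x0 = inputs["x0"][0]
            x1 = inputs["x1"][0]
            return {"image": np.expand_dims(x0 + time_frac * (x1 - x0), axis=0)}

    return MockModel()
```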
```python
model_path = "/path/to/model"
scaling = "minmax"
```
Could we modify these values so there is less overlap with the subsequent default test?
```python
)
# Data should be different from input cubes (interpolated)
assert not np.allclose(result_cube.data, cube1.data)
assert not np.allclose(result_cube.data, cube2.data)
```
cube1 and cube2 are modified by the GoogleFilmInterpolation plugin (the scaling is applied but not reversed on these inputs). As such, a demonstration that the data is not the same has little value here. Additionally, we would probably expect cube2 (unscaled) and the result[-1] data to match, as this is the final node and the interpolation should not have changed it.
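A sketch of the stronger check suggested here, assuming process returns a sequence of cubes indexable as result[-1] and that the unscaled input is copied before the plugin mutates it:

```python
cube2_original = cube2.copy()  # unscaled copy, taken before process()
result = plugin.process(cube1, cube2, template)
# The final node should pass through the interpolation unchanged.
np.testing.assert_allclose(result[-1].data, cube2_original.data, rtol=1e-5)
```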
```python
def test_google_film_process_time_fraction_calculation(
    google_film_sample_cubes, google_film_mock_model, monkeypatch
):
    """Test that time fractions are calculated correctly for interpolation."""
    cube1, cube2 = google_film_sample_cubes

    # Create template with a single intermediate time (50% through)
    time = datetime.datetime(2017, 11, 1, 6)  # Midpoint between 3 and 9
    npoints = 5
    data = np.ones((npoints, npoints), dtype=np.float32) * 4
    template = diagnostic_cube(
        data=data,
        time=time,
        frt=datetime.datetime(2017, 11, 1, 3),
        spatial_grid="latlon",
    )

    captured_time_fractions = []

    class CapturingMockModel:
        def __call__(self, inputs):
            captured_time_fractions.append(inputs["time"][0])
            time_frac = inputs["time"][0]
            x0 = inputs["x0"][0]
            x1 = inputs["x1"][0]
            interpolated = x0 + time_frac * (x1 - x0)
            return {"image": np.expand_dims(interpolated, axis=0)}

    def mock_load_model(self, model_path):
        return CapturingMockModel()

    monkeypatch.setattr(GoogleFilmInterpolation, "load_model", mock_load_model)

    plugin = GoogleFilmInterpolation(model_path="/mock/path")
    plugin.process(cube1, cube2, template)

    # Time should be 50% through (6 AM is midpoint between 3 AM and 9 AM)
    assert len(captured_time_fractions) == 1
    np.testing.assert_almost_equal(captured_time_fractions[0], 0.5, decimal=5)
```
Clever. Gross, but clever.
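For reference, the fraction the capturing mock records is just the normalised position of the template time between the two input frames:

```python
t0, t1, t = 3.0, 9.0, 6.0  # hours, matching the times in the test above
time_frac = (t - t0) / (t1 - t0)  # (6 - 3) / (9 - 3) = 0.5
```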
```python
assert not np.allclose(result_cube.data, cube2.data)
# Data should be in a reasonable range (after reverse scaling)
assert np.all(result_cube.data >= 0.5)
assert np.all(result_cube.data <= 10.0)
```
Isn't the model >100MB, suggesting this test is way too heavy to be included in unit tests (although I assume it will mostly be skipped)?
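A common pattern for keeping a heavy, download-dependent test out of routine runs looks something like this; a sketch, with a hypothetical marker and test name:

```python
import pytest

tf_hub = pytest.importorskip("tensorflow_hub")  # skip if the dependency is absent


@pytest.mark.slow  # hypothetical marker, deselected in routine test runs
def test_full_film_model_interpolation():
    ...
```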
Addresses https://github.com/metoppv/mo-blue-team/issues/763
improver_test_data PR: metoppv/improver_test_data#118
Description
This PR adds two pieces of functionality:
File breakdown:
…temporal-interpolate CLI, but is intended to take a collection of forecast periods and fill gaps, or smooth across transitions between forecast sources.

Testing: