
Conversation

@clessig (Collaborator) commented Nov 24, 2025

Description

Updated config for new model developments, as well as overall improvement of structure.

Issue Number

Is this PR a draft? Mark it as draft.

Checklist before asking for review

  • I have performed a self-review of my code
  • My changes comply with basic sanity checks:
    • I have fixed formatting issues with ./scripts/actions.sh lint
    • I have run unit tests with ./scripts/actions.sh unit-test
    • I have documented my code and I have updated the docstrings.
    • I have added unit tests, if relevant
  • I have tried my changes with data and code:
    • I have run the integration tests with ./scripts/actions.sh integration-test
    • (bigger changes) I have run a full training and I have written the run_id(s) in the comment: launch-slurm.py --time 60
    • (bigger changes and experiments) I have shared a HedgeDoc in the GitHub issue with all the configurations and runs for these experiments
  • I have informed and aligned with people impacted by my change:
    • for config changes: the MatterMost channels and/or a design doc
    • for changes of dependencies: the MatterMost software development channel

Comment on lines +200 to +204
# start_date: 197901010000
start_date: 201401010000
end_date: 202012310000
start_date_val: 202101010000
end_date_val: 202201010000
Contributor

Since we support the ISO datetime format, we can use it in the default config.
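
For example, the dates above could then read (a sketch, assuming the same keys and that the config loader accepts ISO 8601 strings):

start_date: 2014-01-01T00:00
end_date: 2020-12-31T00:00
start_date_val: 2021-01-01T00:00
end_date_val: 2022-01-01T00:00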

@@ -0,0 +1,386 @@
# streams_directory: "./config/streams/era5_1deg/"
streams_directory: "./config/streams/era5_nppatms_synop/"
Collaborator

Noting it could also be stream: ...

### Model parameters ###

model :
embedding :
Collaborator

Consistency: should it be assimilation_engine or embedding? The code mostly talks about the assimilation engine.

Collaborator Author

We will have an Embedding module very soon.

mlp_hidden_factor: 2

forecast_engine:
pass
Collaborator

You could just put forecast_engine: {} or even forecast_engine:

# blocks: 6
# dropout_rate : 0.1

decoder :
Collaborator

Should it be split across streams? E.g.:

decoder:
   ERA5: 
      type: ...

Collaborator Author

This is in the stream config (where it belongs). At the moment it's a combination of some global params specified here and local params in the stream configs.
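
A hypothetical sketch of that split (the key names and the stream file are illustrative only, not the actual schema):

# model config: global decoder defaults shared by all streams
decoder:
  some_shared_param: ...    # hypothetical key

# stream config, e.g. a file under ./config/streams/era5_nppatms_synop/
ERA5:
  decoder:
    type: ...               # hypothetical per-stream setting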

# a regex that needs to fully match the name of the modules you want to freeze
# e.g. ".*ERA5" will match any module whose name ends in ERA5
# encoders and decoders that exist per stream have the stream name attached at the end
freeze_modules: ""
Collaborator

Eventually, it makes more sense to me to move that into the description of the model:

model:
   frozen: True
   forecast_engine:
       frozen: False

The current regex will not handle code refactoring with new names or packages. Longer-term question.

Collaborator Author

I think it's still better, for the moment, to specify a list of modules to be frozen.
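
As a sketch of the current mechanism, using the example regex from the config comment above:

# freeze every per-stream encoder/decoder whose name ends in ERA5
freeze_modules: ".*ERA5"

# default: freeze nothing
# freeze_modules: ""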

freeze_modules: ""


forecast :
Collaborator

Why is it separate and not part of the model?

Collaborator Author

Will be in the model


### Learning rate params ###

learning_rate :
Collaborator

Why not under training or learning? We have other keys such as the descent algorithm, etc.
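
A hypothetical layout of that suggestion (the training section and the optimizer key are illustrative, not the actual schema):

training:
  learning_rate:
    ...              # current learning-rate params, unchanged
  optimizer: ...     # hypothetical: descent algorithm and related keys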

### Shared model+training parameters ###
# TODO: rename

shared_params :
Collaborator

This name is vague and its contents are not coherent. To me, most of these params would go under the model.

mode : "student-teacher"
#
source :
- masking_params :
Collaborator

This indentation is not the format you mean. It should be:

source:
  - masking_params:
    strategy: healpix
    num_samples: 4
    rate: 0.4
    hl_mask: 4
    same_strategy_per_batch: false
    teacher_relationship: subset
    
  - masking_params:
    strategy: random
    num_samples: 4
    rate: 0.4
    hl_mask: 4
    same_strategy_per_batch: false
    teacher_relationship: subset

or equivalently in JSON:

    "source": [
        {
            "masking_params": null,
            "strategy": "healpix",
            "num_samples": 4,
            "rate": 0.4,
            "hl_mask": 4,
            "same_strategy_per_batch": false,
            "teacher_relationship": "subset"
        },
        {
            "masking_params": null,
            "strategy": "random",
            "num_samples": 4,
            "rate": 0.4,
            "hl_mask": 4,
            "same_strategy_per_batch": false,
            "teacher_relationship": "subset"
        }
    ]

checkpoint: 250
log_validation: 0


Collaborator

Missing the latest tags section.


### Latent noising parameters ###

latent_noise :
Collaborator Author

This needs to go under training_strategy.
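
A sketch of that move, assuming a training_strategy section and keeping the existing keys unchanged:

training_strategy:
  latent_noise:
    ...   # existing latent noising parameters, nested here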

@tjhunter tjhunter marked this pull request as draft December 15, 2025 08:46
@clessig (Collaborator Author) commented Jan 5, 2026

Will be merged with #1541

