
Execute GEOS via calling gcm_run.j + rename marine suites -- part 1, 2, & 3 #677

Open
Dooruk wants to merge 76 commits into develop from feature/exec_geos_direct_part1

Collaborator

Dooruk commented Dec 15, 2025

This is ready to go in. CI-workflows need to be modified to handle new suite names like so:

GEOS-ESM/CI-workflows#32

Main change 1:

SOCA (marine) suites are renamed:

3dvar -> 3dvar_marine
3dvar_cycle -> 3dvar_marine_cycle
3dfgat_cycle -> 3dfgat_marine_cycle
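
For CI-workflows or scripts that still refer to the old names, the rename can be captured as a simple lookup. This is only a sketch: `SUITE_RENAMES` and `new_suite_name` are hypothetical names, not part of SWELL or CI-workflows; only the mapping itself comes from the list above.

```python
# Hypothetical helper (not part of SWELL); the old -> new pairs are the
# three renames listed in this PR description.
SUITE_RENAMES = {
    "3dvar": "3dvar_marine",
    "3dvar_cycle": "3dvar_marine_cycle",
    "3dfgat_cycle": "3dfgat_marine_cycle",
}


def new_suite_name(name: str) -> str:
    """Return the renamed suite, or the name unchanged if it was not renamed."""
    return SUITE_RENAMES.get(name, name)
```

Suites that were not renamed pass through unchanged, so the lookup can be applied to every suite name in a test matrix.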

Main change 2:

Cylc calls gcm_run.j directly in flow.cylc.

With this new approach, SWELL can point to an existing GEOS experiment folder (via the geos_homdir key in experiment.yaml), and the forecast folder now lives under the experiment's GEOSgcm/forecast directory. Hotstarting is possible. The forecast directory is no longer erased, so MAPL history outputs can accumulate there. I updated the docs a bit and might add more on GEOSgcm execution.

For those who stumbled upon this PR, more details on change 2 below:

The main thing happening here is that Cylc (flow.cylc) now calls gcm_run.j directly. To facilitate this, a forecast directory is created under {swell_exp_dir}/GEOSgcm/forecast. This forecast folder is a replica of a GEOS experiment folder, with only a few changes to where HOMDIR and EXPDIR are defined. Model execution happens under {swell_exp_dir}/GEOSgcm/forecast/scratch, as in typical GEOS model runs.

Why was this change necessary:

  • We want SWELL to be less involved in GEOS model execution task(s). The previous method required lots of file manipulation in the forecast directory (in particular because of the boundary condition files, AKA /RC files). That creates incompatibilities when running or testing different GEOSgcm versions. With multiple products and update frequencies, compatibility across versions is an important requirement.
  • With Cylc templating, the forecast directory can't be updated in flow.cylc if it is templated in a time-dependent way.
  • subprocess simply couldn't run GEOSv12 on Milan nodes. I tried many combinations; it never got past the initialization stage.
  • Defining sufficient nodes, MPI layouts, etc. is handled by gcm_run.j. If users make a mistake and request too few SLURM nodes, GEOS tries to submit hundreds of instances to compensate for the lack of compute resources, and then NCCS will yell at you.
  • (long-term relevance) The gcm_run.j and gcm_setup.j scripts are being or will be modernized. This work is underway but might take a long time (especially gcm_run.j).
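
To make the node-count point concrete: as far as I understand it, the model needs one MPI task per cell of its NX × NY layout, so the SLURM request has to cover at least NX × NY cores (ignoring any extra I/O-server nodes). A minimal sketch of that arithmetic follows. `min_nodes` is a hypothetical helper, not something SWELL or gcm_run.j provides, and the cores-per-node value is an input to supply for the target node type.

```python
def min_nodes(nx: int, ny: int, cores_per_node: int) -> int:
    """Smallest node count whose cores cover an NX x NY MPI layout.

    Assumption: one MPI task per layout cell and no extra I/O-server nodes.
    """
    if nx < 1 or ny < 1 or cores_per_node < 1:
        raise ValueError("nx, ny and cores_per_node must be positive")
    tasks = nx * ny
    # Integer ceiling division: round up to whole nodes.
    return (tasks + cores_per_node - 1) // cores_per_node
```

For example, a 12 × 24 layout is 288 tasks; with 120 usable cores per node (an illustrative number, not a statement about Milan nodes), `min_nodes(12, 24, 120)` gives 3.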

⚠️ This means that to use a gcm_run.j in SWELL, some parts of it should be erased or commented out. Alternatively, my idea is that gcm_run.j could have conditional sections, guarded by something like SWELL_active, so that gcm_run.j can skip those sections, which are mainly postprocessing anyway.
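
As an illustration of the "commented out" option, here is a rough sketch that prefixes every line between two marker lines with `#`, the csh comment character. Nothing here exists in gcm_run.j or SWELL: the function `comment_out_block` and the marker strings in the usage note are hypothetical, and a real gcm_run.j would need such markers added by hand first.

```python
def comment_out_block(lines, begin_marker, end_marker, prefix="#"):
    """Comment out the lines strictly between two marker lines.

    The marker lines themselves are kept as they are, so the edit can be
    located (and reverted) later.
    """
    out = []
    inside = False
    for line in lines:
        if begin_marker in line:
            inside = True
            out.append(line)
        elif end_marker in line:
            inside = False
            out.append(line)
        elif inside:
            out.append(prefix + line)
        else:
            out.append(line)
    return out
```

With hypothetical markers such as `# SWELL_SKIP_BEGIN` and `# SWELL_SKIP_END` placed around a postprocessing section, the function leaves everything outside the markers untouched.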

More details in the comment below: #677 (comment)

Finally, a little primer on gcm_run.j

Let's consider gcm_run.j in 4 stages:

  1. SLURM & node assignment
  2. Preprocessing
  3. Execution
  4. Postprocessing

In the current implementation, SWELL handles 2 & 3 via Python and subprocess, and 1 is assumed to be set properly by the user, which caused trouble with NCCS. For DA purposes, 4 (postprocessing) is explicitly handled by SWELL, but that is not the focus of this PR.

In this proposed implementation, the main difference is that we rely on gcm_run.j for 2 and 3, making surgical edits via PrepCoupledGeosRundir at a few locations and running gcm_run.j directly from Cylc (which doesn't capture failed exit status):

    [[RunGeos]]
        script = "{{experiment_path}}/forecast/gcm_run.j"
        platform = {{platform}}
        [[[job]]]
            shell = /bin/csh
        [[[directives]]]
        {%- for key, value in scheduling["RunGeos"]["directives"]["all"].items() %}
            --{{key}} = {{value}}
        {%- endfor %}

I created the 3dfgat_coupled_cycle suite for testing; it should work by default if anyone has time to check it out.
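
To give a feel for what a "surgical edit" of HOMDIR or EXPDIR might look like, here is a minimal sketch. It is not the PrepCoupledGeosRundir implementation; it assumes the script sets these variables with csh `setenv NAME value` lines, and `set_csh_env` is a hypothetical helper name.

```python
import re


def set_csh_env(script_text: str, name: str, value: str) -> str:
    """Point the first `setenv NAME ...` line of a csh script at `value`.

    Assumption: the variable is set on its own line as `setenv NAME value`.
    Indentation and spacing before the value are preserved.
    """
    pattern = re.compile(
        rf"^([ \t]*setenv[ \t]+{re.escape(name)}[ \t]+).*$", re.MULTILINE
    )
    # A function replacement avoids backslash-escape surprises in `value`.
    new_text, count = pattern.subn(
        lambda m: m.group(1) + value, script_text, count=1
    )
    if count == 0:
        raise ValueError(f"no 'setenv {name}' line found")
    return new_text
```

Raising when no matching line is found is a deliberate choice: a silent no-op would leave the script pointing at the original experiment folder.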

Contributor

Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Dooruk and others added 22 commits February 23, 2026 10:30
* some changes for geosv12

* changes in geos class utility for geos v12

* changes for 3dvar_cycle to test get, prep and run

* rename suites, take out obsolete parts and tasks

* breaking down PR/660 into smaller pieces -- part3, adding hofx_cf experiment (#694)

* first commit

* fix failing tests

* clean up and handling NO

* fix coding norm

* fail if obs is not listed in observation_ioda_names.yaml

* add tier 1 tests

* fix coding norm

* Update src/swell/configuration/jedi/interfaces/geos_cf/task_questions.yaml

Co-authored-by: Doruk Ardağ <38666458+Dooruk@users.noreply.github.com>

* Update src/swell/configuration/jedi/interfaces/geos_cf/task_questions.yaml

Co-authored-by: Doruk Ardağ <38666458+Dooruk@users.noreply.github.com>

* remove channels from eva yaml

---------

Co-authored-by: Doruk Ardağ <38666458+Dooruk@users.noreply.github.com>

* add docstrings and proper naming for tasks as forecast directory is not erased

* version bump

* suite changes with new task names

* code improvements

* Implement R2D2 Ingest Suite (#675)

* Script to setup new r2d2 credentials

* Create a new swell task to test new r2d2

* Adapt get_observations to new R2D2

* Remove exit()

* Add r2d2 configs (#318)

* Update swell tasks to new R2D2 (#318)

* Update r2d2 version of  save obs diagnostics #318

* Remove unused files #318

* Create r2d2 file register script #318

* Add scripts for manual setup for R2D2 #318

* Clean up files (#318)

* Clean up the files (#318)

* Update Python coding norms (#318)

* Fix pycode styles

* Remove redundant lines

* Load R2D2 credentials under TaskBase (#318)

* Load credentials under create R2D2 config (#318)

* make R2D2 host/compiler detection support dynamic (#318)

* Add docs for credential setup (#318)

* Update r2d2_config for cascade (#318)

* Move credentials under create_task (#318)

* Move scripts under utilities (#318)

* Fix pylint errors

* Fix AttributeError when fetching bias correction files (#318)

* Fix bias correction arguments (#318)

* Fix bias correction argument (#318)

* Add file type argument (#318)

* Fix bias correction ingest

* Improve file extension support

* Add logging

* Fix bias coefficient ingestion

* Go back to existing bias naming convention

* Use JCSDA enums for bias files

* Improve logging

* Fix code style

* extend exclude list to ignore venv and build directories

* Change the script name register_files with ingest_files

* Create a test suite

* add ingest question default

* Fix Slurm qos

* Add observation yaml for ingest config

* Fix datetime with string

* suite config

* Add defaults and override yaml for ingest

* Switch to new cylc

* Fix suite config

* Add file pattern

* Make ingest more modular

* Move obs configs under JEDI config directory.

* Add ingest config yamls for obs

* Implement ingest background suite. (#646)

* Use provider name from the provider list (#646)

* Clean up unused parts in yaml files (#646)

* Fetch model name from experiment.yaml (#646)

* Create searh for already ingested files and skip (#646)

* Remove window offset (#646)

* Delete an old doc

* Delete an old script

* Add detailed docstrings. (#646)

* Add type hints. (#646).

* End the task if no source pattern is found (#646).

* Made R2D2 related exceptions explicit. (#646).

* Update background yaml files (#646).

* Fix the glob pattern (#646)

* Add search for already ingested files (#646)

* Fix pycode style tests (#646)

* Fix code tests (#646)

* Remove check function #646

* Create a standalone script to delete obs within range

* Fix pycode style

* Add step for restart

* Removing ingest background suite for another PR (#646)

* Create a README (#646)

* Fix window_start calculation. (#646)

* Fix pycodestyle

* Add docs to Swell website (#646)

* Fix doc path for the website (#646)

* Change path for config files with  experiments directory. (#646)

* Update README (#646)

* make mom6_iau model dependent

* cycle times hack

* make experiment.yaml non-alphabetical again by using default ruamel

* relevant for experiment.yaml

* minor fix for MOM6 IAU

* cycle times and overrride fixes

* add tier2 cycling run

* fix platform defaults

---------

Co-authored-by: Maryam Abdi-Oskouei <mary.abdi@gmail.com>
Co-authored-by: Furkan Goktas <ftgoktas@gmail.com>
Dooruk changed the title from "Execute GEOS via calling gcm_run.j -- part 1" to "Execute GEOS via calling gcm_run.j + rename marine suites -- part 1, 2, & 3" on Mar 2, 2026
Collaborator Author

Dooruk commented Mar 2, 2026

This is ready to go in and/or final testing. CI-workflows need to be modified to handle new suite names though.


jeromebarre commented Mar 4, 2026

@Dooruk For the sake of clarity and simplicity, would it be possible to break the PR down in two, one for Main change 1 and one for Main change 2? They are two distinct feature improvements here. That would also speed up and facilitate the reviews. Thanks a lot!

Collaborator Author

Dooruk commented Mar 5, 2026

@Dooruk For the sake of clarity and simplicity, would it be possible to break the PR down in two, one for Main change 1 and one for Main change 2? They are two distinct feature improvements here. That would also speed up and facilitate the reviews. Thanks a lot!

I already did this. I'm running out of time before I go on FMLA so I won't be able to do that again. Main change 1 is just renaming suites anyway.

I created this PR after your similar comment; it was only part 1 for a couple of months or so. @mer-a-o reviewed it and decided she will implement the Skylab approach to execute GEOS for compo, and we will collaborate on running NWP afterwards. This is not the be-all and end-all of executing GEOS, but it is urgently needed so the marine group can run some experiments while I'm gone.


8 participants