Kokkos Thermodynamics #992

Thanduriel · 2025-11-17T10:23:51Z

Work in progress on the Kokkos (GPU) thermodynamics (issue #672).

To track changes in buffers and to sync data between host and device, a more data-oriented approach is needed. To this end, the management of fields was made more explicit through the ModelArrayAccessor. The whole model and the tests have been updated to use this new interface, which requires:

* explicitly listing all fields that have to be accessed for an operation
* moving the update code into a lambda to capture the fields
* updating member functions called in a kernel to get all field data explicitly through their arguments

See, for example, physics/src/modules/LateralIceSpreadModule/HiblerSpread.cpp. The USE_KOKKOS block should be ignored since it is just a copy that has not been updated yet.
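To make the three requirements concrete, here is a minimal host-only sketch. It does not use the real ModelArrayAccessor interface: `FieldStore`, `forEachElement`, `meltedThickness` and the field names are hypothetical stand-ins chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a field store; not the nextSIM-DG API.
using Field = std::vector<double>;

struct FieldStore {
    std::map<std::string, Field> fields;

    // Fields are requested by name, so every access is visible to the store.
    Field& get(const std::string& name) { return fields.at(name); }
};

// Formerly this kind of helper would read class state; here all field
// data arrives through the arguments.
inline double meltedThickness(double hice, double rate, double dt)
{
    const double melted = rate * dt;
    return melted < hice ? hice - melted : 0.0;
}

// Hypothetical element loop: calls fn(i) for each element index.
template <typename Fn>
void forEachElement(std::size_t n, Fn&& fn)
{
    for (std::size_t i = 0; i < n; ++i)
        fn(i);
}

void updateThickness(FieldStore& store, double dt)
{
    // 1. Explicitly list every field the operation touches.
    Field& hice = store.get("hice");
    const Field& rate = store.get("meltRate");
    assert(hice.size() == rate.size());

    // 2. The update code lives in a lambda that captures those fields.
    forEachElement(hice.size(), [&hice, &rate, dt](std::size_t i) {
        // 3. Helpers get their data through arguments only.
        hice[i] = meltedThickness(hice[i], rate[i], dt);
    });
}
```

Because every field is requested up front, a store of this kind would know which buffers an operation reads or writes before the loop runs, which is the information needed to decide when a host/device sync is required.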

To actually enable device execution, the code needs to be ported to Kokkos, which involves further changes:

* request the device buffers instead of the host ModelArray
* all data used in the kernel has to be captured explicitly by copy (especially static member variables)
* math functions (cos, min, sqrt) in the kernel have to be replaced with versions from Kokkos

See for example physics/src/modules/IceThermodynamicsModule/ThermoWinton.cpp.
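The capture and math-function rules can be sketched as follows. This is an illustration under assumptions, not code from the PR: `ExampleThermo`, `scale` and the `kmath` alias are hypothetical, and the loop is a serial stand-in. In an actual port the lambda would be declared with `KOKKOS_LAMBDA`, dispatched through `Kokkos::parallel_for`, and `buffer` would refer to device memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumption: USE_KOKKOS is defined by the build when Kokkos is available.
#ifdef USE_KOKKOS
#include <Kokkos_Core.hpp>
namespace kmath = Kokkos; // Kokkos::sqrt, Kokkos::min
#else
namespace kmath = std; // host fallback: std::sqrt, std::min
#endif

// Hypothetical module with a static parameter, for illustration only.
struct ExampleThermo {
    static double scale;

    // buffer and n stand in for a device buffer and its element count.
    static void update(double* buffer, std::size_t n);
};

double ExampleThermo::scale = 2.0;

void ExampleThermo::update(double* buffer, std::size_t n)
{
    // A static member lives in host memory, so copy its value into a
    // local that the lambda can capture by value.
    const double localScale = scale;

    // Capture by copy only: the pointer and the scalar are both copied.
    auto kernel = [=](std::size_t i) {
        const double root = kmath::sqrt(buffer[i]);
        buffer[i] = kmath::min(localScale * root, 10.0);
    };

    // Serial stand-in for the parallel dispatch.
    for (std::size_t i = 0; i < n; ++i)
        kernel(i);
}
```

Routing the math calls through a single namespace alias keeps the kernel body identical in both builds, so only the alias changes when Kokkos is enabled.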

The question is, how should these two different code paths be separated? In the dynamics, the Kokkos code is kept in separate files that build on top of the existing reference implementation, replacing all the relevant parts. This is necessary to achieve a tight integration for the actual computations. In the long term, the idea was to focus on maintaining the Kokkos version, since it runs on both CPU and GPU, and perhaps to remove the old implementation at some point.

Thermodynamics could be ported in the same way, but since the individual components are much smaller it is also feasible to insert the switch inside the update routines and reuse more of the code. In fact, I built a light abstraction layer that hides all the specifics so there is just one version of the code; see ThermoWinton::update. Of course, the combined kernel for overElements still needs to adhere to the stricter rules of a Kokkos kernel, so using these abstractions would not automatically make all the code compatible. This abstraction is also somewhat redundant because Kokkos does effectively the same thing with its execution space concept. It just brings the added benefit of working directly with ModelArray and not needing Kokkos to build NeXtSIM-DG.
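As a rough illustration of such a switch inside the update routine (not the abstraction layer from this PR), the sketch below hides the dispatch behind one helper. `dispatchElements`, `EXAMPLE_KERNEL`, `scaleField` and the use of `USE_KOKKOS` as the switch are assumptions made for the example; only `Kokkos::parallel_for`, `Kokkos::RangePolicy` and `KOKKOS_LAMBDA` are actual Kokkos names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: USE_KOKKOS is defined by the build when Kokkos is available.
#ifdef USE_KOKKOS
#include <Kokkos_Core.hpp>
#define EXAMPLE_KERNEL KOKKOS_LAMBDA
#else
#define EXAMPLE_KERNEL [=]
#endif

// Hypothetical dispatch helper: one call site, two back ends.
template <typename Kernel>
void dispatchElements(std::size_t n, const Kernel& kernel)
{
#ifdef USE_KOKKOS
    Kokkos::parallel_for("dispatchElements", Kokkos::RangePolicy<>(0, n), kernel);
#else
    for (std::size_t i = 0; i < n; ++i)
        kernel(i);
#endif
}

// Single version of the update code. data is a raw pointer standing in
// for whichever buffer (host or device) the build provides.
void scaleField(double* data, std::size_t n, double factor)
{
    dispatchElements(n, EXAMPLE_KERNEL(std::size_t i) { data[i] *= factor; });
}
```

The kernel body still has to follow the Kokkos rules (capture by copy, no host-only calls) even in the serial build, otherwise it compiles without Kokkos and breaks with it.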

Thanduriel · 2026-01-26T10:15:53Z

Concerning virtual function calls in overElements kernels

@timspainNERSC

The files to look at:

* _IIceAlbedo.hpp_ changes to the interface in line with other ModelComponents
* _CCSMIceAlbedo.cpp_ / _SMUIceAlbedo.cpp_ concrete implementations that I had to port, one for the test and the other is the default for ERA5Atmosphere
* _FiniteElementFluxes.cpp_ for the use in `calculateIce()` (previous) vs `updateIce()` (now)

Consequences of the new interface that I can't properly evaluate:

* Since the inputs to `surfaceShortWaveBalance()` are no longer provided by the calling module, it is technically less flexible. However, in all the cases where it is currently used, it seems like `snowThickness` is always computed the same way, `i0` is a configured constant parameter and `temperature` is just `tsurf`.
* `i0` previously was a parameter of the calling module, e.g. `FiniteElementFluxes`, but is now part of the concrete `IIceAlbedo` implementation. If it is logically always the same, it could also go right into `IIceAlbedo` itself. I did not do that because there is no associated compilation unit (.cpp file) to define `i0`, and ModelComponents seem to always be implemented inline in the headers. Is there a reason for this?

timspainNERSC · 2026-01-26T14:32:18Z

> Concerning virtual function calls in overElements kernels
>
> @timspainNERSC
>
> The files to look at:
>
> * _IIceAlbedo.hpp_ changes to the interface in line with other ModelComponents
> * _CCSMIceAlbedo.cpp_ / _SMUIceAlbedo.cpp_ concrete implementations that I had to port, one for the test and the other is the default for ERA5Atmosphere

The ice albedo changes look fine as they stand. We only ever need to calculate the ice albedo once per timestep, so it makes sense to use a ModelComponent.

> * _FiniteElementFluxes.cpp_ for the use in `calculateIce()` (previous) vs `updateIce()` (now)

That looks fine, too.

> Consequences of the new interface that I can't properly evaluate:
>
> * Since the inputs to `surfaceShortWaveBalance()` are no longer provided by the calling module, it is technically less flexible. However, in all the cases where it is currently used, it seems like `snowThickness` is always computed the same way, `i0` is a configured constant parameter and `temperature` is just `tsurf`.
> * `i0` previously was a parameter of the calling module, e.g. `FiniteElementFluxes`, but is now part of the concrete `IIceAlbedo` implementation. If it is logically always the same, it could also go right into `IIceAlbedo` itself. I did not do that because there is no associated compilation unit (.cpp file) to define `i0`, and ModelComponents seem to always be implemented inline in the headers. Is there a reason for this?

Some of the peculiarities are due to me not fully analysing what the original code in Lagrangian nextSIM was doing, and only roughly replicating which parameters belonged to which components.

It's not that ModelComponents don't have a .cpp source file, but rather that that is how the module system works. As it currently works, there is an interface header (IModuleName) and then both header and source files for the implementation classes. The Module system is part of the code that you have been looking at much more recently than I have, so perhaps there is a simple way to include a .cpp file in the code generation and building. It's certainly not impossible, but I don't have the shape of the code in my head any longer to say what changes would be necessary.

(The PrognosticData and SlabOcean classes are both derived from ModelComponent and have .cpp source files. Everything else uses the Module system, though.)
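For readers following the thread, the pattern under discussion (compute the albedo once per timestep as a whole-field update, so that element kernels never call a virtual function) might look roughly like this. It is a sketch under assumptions: `IAlbedoExample`, `TwoValueAlbedo`, `absorbedShortwave` and the numeric values are hypothetical and are not the real `IIceAlbedo` interface or its parameters.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical interface, loosely modelled on the discussion; not the
// real IIceAlbedo.
struct IAlbedoExample {
    virtual ~IAlbedoExample() = default;

    // One virtual call per timestep: fills the whole albedo field.
    virtual void update(const std::vector<double>& snowThickness,
        std::vector<double>& albedo) const = 0;
};

struct TwoValueAlbedo : IAlbedoExample {
    double iceAlbedo = 0.5; // illustrative values, not physical parameters
    double snowAlbedo = 0.75;

    void update(const std::vector<double>& snowThickness,
        std::vector<double>& albedo) const override
    {
        albedo.resize(snowThickness.size());
        for (std::size_t i = 0; i < snowThickness.size(); ++i)
            albedo[i] = snowThickness[i] > 0.0 ? snowAlbedo : iceAlbedo;
    }
};

// Element loop of the calling module: it reads plain field data only,
// so no virtual dispatch happens per element.
void absorbedShortwave(const std::vector<double>& albedo,
    const std::vector<double>& swIn, std::vector<double>& swAbsorbed)
{
    assert(albedo.size() == swIn.size());
    swAbsorbed.resize(swIn.size());
    for (std::size_t i = 0; i < swIn.size(); ++i)
        swAbsorbed[i] = (1.0 - albedo[i]) * swIn[i];
}
```

The trade-off is the one raised in the thread: the albedo component now gathers its own inputs, so the caller can no longer pass different ones per call.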

Thanduriel added 18 commits October 24, 2025 16:28

created ModelArrayStore for host device sync across ModelComponent boundaries

a8abc89

fixed asserts

3414c0f

added field resizing in ModelArrayStore

84bbcbd

adapted ModelArrayRef test for ModelArrayAccessor

5c613b0

* made device buffer init lazy
* removed resize_arrays()
* fix: moved sync state to ExtModelArray

added test for device usage with ModelArrayAccessor

02cce10

limited lifetime of the global ModelArrayStore

7e24787

[WIP] updated core to use ModelArrayAccessor

16991c7

[WIP] updated physics to use ModelArrayAccessor

6c0ccc7

updated SlabOcean test

9c3d00d

updated all tests to use ModelArrayAccessor

783e5d1

added assertions

d8bf580

use correct sss field in TOPAZ update

6db27ed

fixed WITH_KOKKOS switch

127b113

perform ThermoWinton update on device

eaf74cd

eliminated data transfers for advection

d83837c

working tests with Kokkos enabled

c94510e

fixed crash of testPrognosticData on finalize in release builds

3b14d3e

* PDTestDynamics is registered under the proper name

created abstractions to use same code w/out kokkos

5659544

Thanduriel force-pushed the kokkos_ModelArrayStore branch from bba9367 to 23e8a6e Compare December 22, 2025 14:02

Thanduriel added 11 commits December 22, 2025 15:20

ported HiblerSpread to device

23e8a6e

ported ConstantHealing to device

f01c72b

ported SlabOcean to device

abebce6

ported ThermoIce0 to device

9c4d1c0

replaced remaining std funcs in ThermoIce0

d436daa

ported computeGradientOfSeaSurfaceHeight to device

fa0ac76

pre-allocate all SSH gradient fields

0a99772

corrections to comments of ssh gradient computation

c6c0050

share device buffers for advected fields between thermodynamics and dynamics

ff26e6a

perform kernel setData on device

701bebb

fix: check right dimensions in kokkosMA2DG

820086e

Thanduriel added 7 commits January 15, 2026 19:03

made field copy operations more flexible

6b7fa05

optimized data transfers in getDG0Data and setData

10612ae

applied clang formatting

aa9c71a

made setData functions virtual again for consistency

5c2a20c

fixed test compilation by moving KokkosDGModelArray.hpp into dynamics

4e4f44d

ported FiniteElementFluxes to device

5ec6796

fixed tests

45ddeb5

* updated CCSMIceAlbedo
* removed obsolete code
* restored old interface of FiniteElementSpecHum

Thanduriel added 3 commits January 26, 2026 11:23

applied formatting

2599996

fixed non Kokkos build

bebf2e4

ruff autofix

9313693
