
Fix GPU compat of sparse dae solvers#3073

Draft
hexaeder wants to merge 11 commits into SciML:master from hexaeder:hw/sparse_dae_gpu

Conversation

@hexaeder
Contributor

This PR attempts to make most of the sparse DAE solvers GPU compatible. The main changes are:

  • special handling of Diagonal mass matrices (improves performance by an order of magnitude for certain DAE init calls)
  • moves the find_algebraic_vars_eqs function to OrdinaryDiffEqCore
  • fixes (mostly scalar-indexing) problems in the init or step code of several methods
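
The Diagonal fast path can be illustrated with a small sketch. Note that `algebraic_vars` and its dispatch here are hypothetical, not the actual PR code: for a Diagonal mass matrix, the algebraic variables are exactly the zero diagonal entries, so a single broadcast replaces a column-by-column scan that would trigger scalar indexing on a GPU sparse matrix.

```julia
using LinearAlgebra

# Hypothetical sketch (not the actual PR code): detect the algebraic
# variables of a mass-matrix DAE  M u' = f(u).  A variable is algebraic
# iff its column of the mass matrix is zero.

# Fast path: for a Diagonal mass matrix this is a single broadcast,
# which also works on GPU arrays without scalar indexing.
algebraic_vars(M::Diagonal) = M.diag .== 0

# Generic fallback: scan each column (fine on the CPU, but would
# trigger scalar indexing for a GPU sparse matrix).
algebraic_vars(M::AbstractMatrix) = [all(iszero, view(M, :, j)) for j in axes(M, 2)]

M = Diagonal([1.0, 1.0, 0.0])
algebraic_vars(M)   # only the third variable is algebraic
```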

This PR needs JuliaGPU/CUDA.jl#3032 to slice into the sparse jacobians.

The tests live in the simple_dae.jl script of the GPU test suite. Currently, they are more of a debug script: I define a custom test system and execute it with all kinds of algorithms. The tests:

  • compare the CPU solution of the different solvers against the "known good" CPU solution of Rodas5P
  • compare the GPU solution of the different solvers to their CPU counterpart
  • do all of this for Diagonal mass matrices (non-diagonal ones are much harder to support)
  • run the system without a sparsity pattern, with the sparsity pattern as CUDA CSR, and with the sparsity pattern as CUDA CSC
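
For context, here is a minimal sketch of the three jacobian-prototype variants. This is CPU-side only and the names are illustrative, not the actual test code; the CUDA conversions appear as comments since they require CUDA.jl:

```julia
using LinearAlgebra, SparseArrays

# Toy DAE structure: n-1 differential variables plus one algebraic
# variable, with a tridiagonal jacobian sparsity pattern.
n = 4
M = Diagonal([ones(n - 1); 0.0])   # diagonal mass matrix, last row zero
jac_proto = spdiagm(0 => ones(n), 1 => ones(n - 1), -1 => ones(n - 1))

# The three variants the tests run (GPU conversions commented out):
#   1. no sparsity pattern       -> jac_prototype = nothing
#   2. CUDA CSR (CUDSS LU path)  -> CuSparseMatrixCSR(jac_proto)
#   3. CUDA CSC (Krylov path)    -> CuSparseMatrixCSC(jac_proto)
nnz(jac_proto)   # 3n - 2 stored entries in the tridiagonal pattern
```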

I would like input from the maintainers on what the tests should look like. Currently I see several problems:

  • it takes a long time, since (too) many algorithms are tested
  • some algorithms seem to perform badly on my toy model, resulting in large discrepancies between the CPU solutions of one algorithm vs another
  • some algorithms perform weirdly with CSC vs CSR jacobian prototypes. CSC is always worse. The CUDA default is CSC, but CUDSS only supports LU factorization on CSR. So the tests currently compare CSC to CPU with Krylov, and CSR to CPU with the default linsolve.
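
The storage-format asymmetry behind this can be shown with the CPU CSC type alone. SparseMatrixCSC stores columns contiguously, so column access is cheap while row access must scan every column; CSR is the mirror image, which is why the CUDA default (CSC) and CUDSS's LU (CSR-only) pull in different directions:

```julia
using SparseArrays

# Illustrative only: the CPU sparse type is compressed sparse column.
A = sprand(1000, 1000, 0.01)
col = A[:, 17]   # cheap: one contiguous column slice
row = A[17, :]   # expensive on CSC: must scan all 1000 columns
```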

Algorithm overview

Fully working

ROS2, ROS3, ROS3PRL, ROS3PRL2, Rodas3, Scholz4_7, ROS34PW3, RosShamp4, Veldd4, Velds4, GRK4T, GRK4A,
Ros4LStab, Rodas4, Rodas42, Rodas4P, Rodas4P2, ROK4a, Rodas5, Rodas5P, Rodas5Pe, Rodas5Pr, Rodas6P,
ImplicitEuler, SDIRK2, ABDF2, QNDF1, QBDF1, Trapezoid

CSC path with elevated errors (Krylov vs direct LU accuracy)

These pass but need relaxed csc_tol. The dense and CSR jacobian paths are fine:
Rosenbrock23, ROS2PR, ROS2S, ROS34PW2, ROS34PRw, Cash4, Hairer4, Hairer42, QNDF2, QBDF2

Working, but solver is a poor fit for this DAE problem

GPU correctly reproduces the CPU result, but the solver itself diverges from the Rodas5P reference:
ROS3PR, ROS3P, ROS34PW1a, ROS34PW1b, QBDF1

Special case: Rosenbrock32/CSC: GPU and CPU Krylov agree, but the combination gives catastrophic
errors vs the reference. Effectively unusable with CSC.

Not yet working (require code changes)

The following algorithms fail, and I consider them out of scope for this PR since they'd require substantial work:

  • Rodas23W, Rodas3P: scalar indexing in calculate_interpoldiff!
  • QNDF, QBDF: DeviceMemory error in LinAlg (some mul! call within the step)
  • FBDF: scalar indexing in reinitFBDF!
  • RadauIIA3/5/9, AdaptiveRadau: lots of problems

Checklist

  • Appropriate tests were added
  • Any code changes were done in a way that does not break public API
  • All documentation related to code changes were updated
  • The new code follows the
    contributor guidelines, in particular the SciML Style Guide and
    COLPRAC.
  • Any new documentation only uses public API

