Fix GPU compat of sparse DAE solvers #3073
Draft
hexaeder wants to merge 11 commits into SciML:master from
Conversation
This PR attempts to make most of the sparse DAE solvers GPU-compatible. The main changes are:
- Moved the `find_algebraic_vars_eqs` function to OrdinaryDiffEqCore

This PR needs JuliaGPU/CUDA.jl#3032 to slice into the sparse Jacobians.
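To make the setup concrete, here is a hedged sketch (not taken from the PR) of the kind of problem this targets: a toy semi-explicit DAE with a `Diagonal` mass matrix whose Jacobian prototype is a CUSPARSE matrix, so the Rosenbrock solvers factorize on the GPU. The system, parameter values, and sparsity pattern are all illustrative assumptions.

```julia
# Sketch only: a toy DAE with a diagonal mass matrix on the GPU.
# The zero diagonal entry marks u[2] as an algebraic variable.
using OrdinaryDiffEq, CUDA, CUDA.CUSPARSE, SparseArrays, LinearAlgebra

function f!(du, u, p, t)
    du[1] = -u[1] + u[2]            # differential equation
    du[2] = u[1]^2 + u[2]^2 - p     # algebraic constraint: 0 = u1² + u2² - p
end

M = Diagonal(cu([1.0, 0.0]))                    # diagonal mass matrix on the GPU
Jproto = CuSparseMatrixCSR(sparse(ones(2, 2)))  # CSR layout so CUDSS LU applies
fun = ODEFunction(f!; mass_matrix = M, jac_prototype = Jproto)
prob = ODEProblem(fun, cu([0.5, 0.5]), (0.0, 1.0), 1.0)
sol = solve(prob, Rodas5P())
```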
The tests are in the `simple_dae.jl` script in the GPU tests. Currently, those are more like a debug script: I define a custom test system and execute it with all kinds of algorithms. The tests compare CPU and GPU solutions for `Diagonal` mass matrices (non-diagonal mass matrices are much harder to support). I would like input from the maintainers on what the tests should look like. Currently I see several problems:
`CUDSS` only supports LU factorization on CSR, so the tests currently compare CSC to CPU with Krylov and CSR to CPU with the default linsolve.

Algorithm overview
Fully working
ROS2, ROS3, ROS3PRL, ROS3PRL2, Rodas3, Scholz4_7, ROS34PW3, RosShamp4, Veldd4, Velds4, GRK4T, GRK4A,
Ros4LStab, Rodas4, Rodas42, Rodas4P, Rodas4P2, ROK4a, Rodas5, Rodas5P, Rodas5Pe, Rodas5Pr, Rodas6P,
ImplicitEuler, SDIRK2, ABDF2, QNDF1, QBDF1, Trapezoid
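The CPU-vs-GPU comparison pattern described above can be sketched roughly as follows; `cpu_prob`, `gpu_prob`, and `tol` are assumed to exist (mirrored CPU/GPU versions of the same test system and a comparison tolerance), and the algorithm list is illustrative.

```julia
# Sketch only: solve the same DAE on CPU and GPU and compare trajectories.
using OrdinaryDiffEq, CUDA, Test

for alg in (Rodas5P(), Rodas4(), ImplicitEuler())
    cpu_sol = solve(cpu_prob, alg; saveat = 0.1)
    gpu_sol = solve(gpu_prob, alg; saveat = 0.1)
    # Copy the GPU result back to the host before comparing endpoint states.
    @test maximum(abs.(Array(gpu_sol[end]) .- cpu_sol[end])) < tol
end
```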
CSC path with elevated errors (Krylov vs direct LU accuracy)
These pass but need a relaxed `csc_tol`. The dense and CSR Jacobian paths are fine:
Rosenbrock23, ROS2PR, ROS2S, ROS34PW2, ROS34PRw, Cash4, Hairer4, Hairer42, QNDF2, QBDF2
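Selecting the linear solver per Jacobian layout, as the tests do, might look like this hedged sketch (the choice of `Rosenbrock23` is illustrative; `KrylovJL_GMRES` and `linsolve` are the standard LinearSolve.jl / OrdinaryDiffEq.jl interface):

```julia
# Sketch only: Krylov (GMRES) for the CSC path, default linsolve for CSR,
# where CUDSS provides the LU factorization.
using OrdinaryDiffEq, LinearSolve

alg_csc = Rosenbrock23(linsolve = KrylovJL_GMRES())  # CSC: no CUDSS LU, use Krylov
alg_csr = Rosenbrock23()                             # CSR: default linsolve (LU via CUDSS)
```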
Working, but solver is a poor fit for this DAE problem
The GPU correctly reproduces the CPU result, but the solver itself diverges from the Rodas5P reference:
ROS3PR, ROS3P, ROS34PW1a, ROS34PW1b, QBDF1
Special case: Rosenbrock32/CSC: GPU and CPU Krylov agree, but the combination gives catastrophic
errors vs the reference. Effectively unusable with CSC.
Not yet working (require code changes)
The following algorithms fail; I consider them out of scope for this PR since they'd require substantial work:
- `Rodas23W`, `Rodas3P`: scalar indexing in `calculate_interpoldiff!`
- `QNDF`, `QBDF`: `DeviceMemory` error in LinAlg (some `mul!` call within the step)
- `FBDF`: scalar indexing in `reinitFBDF!`
- `RadauIIA3/5/9`, `AdaptiveRadau`: lots of problems

Checklist
- Contributor guidelines, in particular the SciML Style Guide and COLPRAC.