
Fix SimpleNonlinearSolve GPU compat without unnecessary @generated #854

Open
ChrisRackauckas-Claude wants to merge 12 commits into SciML:master from ChrisRackauckas-Claude:gpu-compat-remove-generated


@ChrisRackauckas-Claude
Contributor

Summary

  • Builds on #809 (GPU compatibility for SimpleNonlinearSolve) but removes the unnecessary @generated functions
  • SciMLBase.isinplace extracts a type parameter and returns a compile-time constant — the compiler constant-folds it, so @generated is not needed to eliminate dead branches
  • Keeps all other GPU compatibility changes from #809: NonlinearAliasSpecifier standardization, get_alias_u0/should_cache_fx helpers, SimpleHalley GPU exclusion, Buildkite pipeline updates

Specifically reverted these back to regular functions:

  • incompatible_backend_and_problem (was @generated, now regular function — net zero diff vs master)
  • evaluate_f!! (was @generated, now regular function with SciMLBase.isinplace(f))
  • evaluate_f (was @generated, now regular function with SciMLBase.isinplace(prob))
  • should_cache_fx (was @generated, now @inline function)
  • prepare_jacobian (was @generated, now regular function)
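
A minimal sketch of the constant-folding argument, assuming a toy problem type: `ToyProblem`, `toy_isinplace`, and `toy_evaluate` are hypothetical names for illustration only and are not the SciMLBase or NonlinearSolveBase definitions. The in-place flag lives in a type parameter, so a plain function that branches on it is specialized per problem type and the compiler can drop the untaken branch without `@generated`.

```julia
# Hypothetical stand-in for a problem type that carries its in-place flag
# as the first type parameter (not the real SciMLBase type).
struct ToyProblem{iip, F}
    f::F
end
ToyProblem{iip}(f::F) where {iip, F} = ToyProblem{iip, F}(f)

# The flag is read from the type, so its value is known at compile time
# for every concrete ToyProblem type.
toy_isinplace(::ToyProblem{iip}) where {iip} = iip

# A regular function: after specialization on typeof(prob), the condition
# is a constant and only one branch needs to be compiled.
function toy_evaluate(prob::ToyProblem, u, p)
    if toy_isinplace(prob)
        du = similar(u)
        prob.f(du, u, p)
        return du
    else
        return prob.f(u, p)
    end
end

oop = ToyProblem{false}((u, p) -> u .* u .- p)
iip = ToyProblem{true}((du, u, p) -> (du .= u .* u .- p; nothing))

res_oop = toy_evaluate(oop, [1.0, 2.0], 2.0)
res_iip = toy_evaluate(iip, [1.0, 2.0], 2.0)
```

In a REPL, `@code_typed toy_evaluate(oop, [1.0, 2.0], 2.0)` is one way to check whether the in-place branch has been eliminated for the out-of-place problem.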

The @assert in compute_jacobian!! remains commented out since throw is genuinely GPU-incompatible.

Addresses @oscardssmith's concern on #809 about compile time impact of @generated functions.

Supersedes #809.

Test plan

  • NonlinearSolveBase tests pass (16/16)
  • SimpleNonlinearSolve tests pass (35,584/35,588, 4 pre-existing broken)
  • GPU CI (Buildkite) — needs GPU runner

🤖 Generated with Claude Code

utkarsh530 and others added 11 commits February 26, 2026 11:08
Removed 'src' from the coverage directories in the pipeline configuration.
SciMLBase.isinplace extracts a type parameter and returns a compile-time
constant (true/false). The compiler constant-folds it, so @generated
functions are unnecessary for eliminating dead branches. Regular functions
with if SciMLBase.isinplace(prob) infer identically.

Reverted: incompatible_backend_and_problem, evaluate_f!!, evaluate_f,
should_cache_fx, and prepare_jacobian back to regular functions.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@utkarsh530
Member

Still doesn't work for me locally:

using SimpleNonlinearSolve, StaticArrays, CUDA

f(u, p) = u .* u .- p

function kernel_function(prob, alg)
    solve(prob, alg)
    return nothing
end


nu0 = @SVector[1.0f0, 1.0f0]

sllprob = convert(SimpleNonlinearSolve.ImmutableNonlinearProblem, NonlinearProblem{false}(f, nu0, 2.0f0))

alg = SimpleBroyden(; linesearch = Val(true))


solve(sllprob, SimpleNewtonRaphson(; autodiff = AutoForwardDiff()))


solve(sllprob, alg)

@cuda kernel_function(sllprob, SimpleNewtonRaphson(; autodiff = AutoForwardDiff()))

@cuda kernel_function(sllprob, SimpleDFSane())

@cuda kernel_function(sllprob, SimpleHalley(; autodiff = AutoForwardDiff()))


@cuda kernel_function(sllprob, SimpleLimitedMemoryBroyden(; linesearch = Val(true)))
julia> @cuda kernel_function(sllprob, SimpleNewtonRaphson(; autodiff = AutoForwardDiff()))
ERROR: a type error was thrown during kernel execution on thread (1, 1, 1) in block (1, 1, 1).
Stacktrace:
 [1] incompatible_backend_and_problem at /home/utkarsh530/.julia/dev/NonlinearSolve/lib/NonlinearSolveBase/src/autodiff.jl:112
 [2] select_jacobian_autodiff at /home/utkarsh530/.julia/dev/NonlinearSolve/lib/NonlinearSolveBase/src/autodiff.jl:86
 [3] configure_autodiff at /home/utkarsh530/.julia/dev/NonlinearSolve/lib/SimpleNonlinearSolve/src/raphson.jl:28
 [4] #solve#39 at /home/utkarsh530/.julia/dev/NonlinearSolve/lib/SimpleNonlinearSolve/src/SimpleNonlinearSolve.jl:103
 [5] solve at /home/utkarsh530/.julia/dev/NonlinearSolve/lib/SimpleNonlinearSolve/src/SimpleNonlinearSolve.jl:94
 [6] kernel_function at ./REPL[4]:2

CUDA.HostKernel for kernel_function(SciMLBase.ImmutableNonlinearProblem{SVector{2, Float32}, Float32, Float32, NonlinearFunction{Float32, SciMLBase.FullSpecialize, typeof(f), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED_NO_TIME), Nothing, Nothing, Nothing, Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, @NamedTuple{}}, SciMLBase.StandardNonlinearProblem}, SimpleNewtonRaphson{AutoForwardDiff{nothing, Nothing}})

julia> @cuda kernel_function(sllprob, SimpleDFSane())
ERROR: KernelException: exception thrown during kernel execution on device NVIDIA GeForce RTX 5080
Stacktrace:
  [1] check_exceptions
    @ ~/.julia/packages/CUDA/724Sm/src/compiler/exceptions.jl:39
  [2] #device_synchronize#1041
    @ ~/.julia/packages/CUDA/724Sm/lib/cudadrv/synchronization.jl:191
  [3] device_synchronize
    @ ~/.julia/packages/CUDA/724Sm/lib/cudadrv/synchronization.jl:178 [inlined]
  [4] checked_cuModuleLoadDataEx
    @ ~/.julia/packages/CUDA/724Sm/lib/cudadrv/module.jl:18
  [5] CuModule
    @ ~/.julia/packages/CUDA/724Sm/lib/cudadrv/module.jl:60
  [6] CuModule
    @ ~/.julia/packages/CUDA/724Sm/lib/cudadrv/module.jl:49 [inlined]
  [7] link
    @ ~/.julia/packages/CUDA/724Sm/src/compiler/compilation.jl:409
  [8] actual_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/OCZFZ/src/execution.jl:270
  [9] cached_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/OCZFZ/src/execution.jl:159
 [10] macro expansion
    @ ~/.julia/packages/CUDA/724Sm/src/compiler/execution.jl:373 [inlined]
 [11] macro expansion
    @ ./lock.jl:273 [inlined]
 [12] cufunction(f::typeof(kernel_function), tt::Type{Tuple{SciMLBase.ImmutableNonlinearProblem{…}, SimpleDFSane{…}}}; kwargs::@Kwargs{})
    @ CUDA ~/.julia/packages/CUDA/724Sm/src/compiler/execution.jl:368
 [13] cufunction(f::typeof(kernel_function), tt::Type{Tuple{SciMLBase.ImmutableNonlinearProblem{…}, SimpleDFSane{…}}})
    @ CUDA ~/.julia/packages/CUDA/724Sm/src/compiler/execution.jl:365
 [14] top-level scope
    @ ~/.julia/packages/CUDA/724Sm/src/compiler/execution.jl:112
Some type information was truncated. Use `show(err)` to see complete types.

julia> @cuda kernel_function(sllprob, SimpleHalley(; autodiff = AutoForwardDiff()))
CUDA.HostKernel for kernel_function(SciMLBase.ImmutableNonlinearProblem{SVector{2, Float32}, Float32, Float32, NonlinearFunction{Float32, SciMLBase.FullSpecialize, typeof(f), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED_NO_TIME), Nothing, Nothing, Nothing, Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, @NamedTuple{}}, SciMLBase.StandardNonlinearProblem}, SimpleHalley{AutoForwardDiff{nothing, Nothing}})

julia> @cuda kernel_function(sllprob, SimpleLimitedMemoryBroyden(; linesearch = Val(true)))
ERROR: a type error was thrown during kernel execution on thread (1, 1, 1) in block (1, 1, 1).
Stacktrace:
 [1] incompatible_backend_and_problem at /home/utkarsh530/.julia/dev/NonlinearSolve/lib/NonlinearSolveBase/src/autodiff.jl:112
 [2] select_jacobian_autodiff at /home/utkarsh530/.julia/dev/NonlinearSolve/lib/NonlinearSolveBase/src/autodiff.jl:86
 [3] configure_autodiff at /home/utkarsh530/.julia/dev/NonlinearSolve/lib/SimpleNonlinearSolve/src/halley.jl:25
 [4] #solve#39 at /home/utkarsh530/.julia/dev/NonlinearSolve/lib/SimpleNonlinearSolve/src/SimpleNonlinearSolve.jl:103
 [5] solve at /home/utkarsh530/.julia/dev/NonlinearSolve/lib/SimpleNonlinearSolve/src/SimpleNonlinearSolve.jl:94
 [6] kernel_function at ./REPL[4]:2

julia> @cuda kernel_function(sllprob, SimpleLimitedMemoryBroyden(; linesearch = Val(true)))
ERROR: KernelException: exception thrown during kernel execution on device NVIDIA GeForce RTX 5080
Stacktrace:
  [1] check_exceptions
    @ ~/.julia/packages/CUDA/724Sm/src/compiler/exceptions.jl:39
  [2] #device_synchronize#1041
    @ ~/.julia/packages/CUDA/724Sm/lib/cudadrv/synchronization.jl:191
  [3] device_synchronize
    @ ~/.julia/packages/CUDA/724Sm/lib/cudadrv/synchronization.jl:178 [inlined]
  [4] checked_cuModuleLoadDataEx
    @ ~/.julia/packages/CUDA/724Sm/lib/cudadrv/module.jl:18
  [5] CuModule
    @ ~/.julia/packages/CUDA/724Sm/lib/cudadrv/module.jl:60
  [6] CuModule
    @ ~/.julia/packages/CUDA/724Sm/lib/cudadrv/module.jl:49 [inlined]
  [7] link
    @ ~/.julia/packages/CUDA/724Sm/src/compiler/compilation.jl:409
  [8] actual_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/OCZFZ/src/execution.jl:270
  [9] cached_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/OCZFZ/src/execution.jl:159
 [10] macro expansion
    @ ~/.julia/packages/CUDA/724Sm/src/compiler/execution.jl:373 [inlined]
 [11] macro expansion
    @ ./lock.jl:273 [inlined]
 [12] cufunction(f::typeof(kernel_function), tt::Type{Tuple{SciMLBase.ImmutableNonlinearProblem{SVector{…}, Float32, Float32, NonlinearFunction{…}, @Kwargs{}, SciMLBase.StandardNonlinearProblem}, SimpleLimitedMemoryBroyden{Val{…}, Val{…}, Nothing}}}; kwargs::@Kwargs{})
    @ CUDA ~/.julia/packages/CUDA/724Sm/src/compiler/execution.jl:368
 [13] cufunction(f::typeof(kernel_function), tt::Type{Tuple{SciMLBase.ImmutableNonlinearProblem{SVector{…}, Float32, Float32, NonlinearFunction{…}, @Kwargs{}, SciMLBase.StandardNonlinearProblem}, SimpleLimitedMemoryBroyden{Val{…}, Val{…}, Nothing}}})
    @ CUDA ~/.julia/packages/CUDA/724Sm/src/compiler/execution.jl:365
 [14] top-level scope
    @ ~/.julia/packages/CUDA/724Sm/src/compiler/execution.jl:112
Some type information was truncated. Use `show(err)` to see complete types.

@AdityaPandeyCN
Contributor

Hi @ChrisRackauckas, with SciML/SciMLBase.jl#1270 going in, this PR should be functional. I tested Utkarsh's example above and it ran without errors for me, though I excluded SimpleHalley because it has some other compatibility issues.

@ChrisRackauckas
Member

First order test failure looks real?

@AdityaPandeyCN
Contributor

Only one: LevenbergMarquardt with AutoZygote() on SVector{2, Float64}.

@AdityaPandeyCN
Contributor

This seems like a known bug. It has been flagged here (https://github.com/SciML/NonlinearSolve.jl/blob/master/lib/NonlinearSolveFirstOrder/test/rootfind_tests.jl#L412), and it was the common failure across all the first-order failures, so can we ignore them?

@ChrisRackauckas
Member

If it were that, then the tests would be passing, right?

@AdityaPandeyCN
Contributor

I missed that it was set to broken_test. It is working now (I don't know exactly why, maybe some upstream fix), so we can change it to a regular test. It passes for me locally:

julia> runtests("test/rootfind_tests.jl"; name="LevenbergMarquardt")
[ Info: Scanning for test items in project `NonlinearSolveFirstOrder` at paths: test/rootfind_tests.jl
[ Info: Finished scanning for test items in 0.04 seconds.
[ Info: Scheduling 1 tests on pid 72272
21:50:14 | START (1/1) test item "LevenbergMarquardt" at test/rootfind_tests.jl:390
21:51:17 | DONE  (1/1) test item "LevenbergMarquardt" 62.4 secs (22.3% compile, 65.6% GC), 56.04 M allocs (4.314 GB)
┌ Warning: Test item "LevenbergMarquardt" at test/rootfind_tests.jl:390 contains test sets without tests:
│ "[IIP] u0: Vector{Float64}"
└ @ ReTestItems ~/.julia/packages/ReTestItems/rFUty/src/log_capture.jl:328
[ Tests Completed: 1/1 test items were run.
Test Summary:                          | Pass  Total     Time
NonlinearSolveFirstOrder               |   45     45  1m02.5s
  test                                 |   45     45         
    test/rootfind_tests.jl             |   45     45         
      LevenbergMarquardt               |   45     45  1m02.5s
        ad = ADTypes.AutoForwardDiff() |   12     12    18.5s
        ad = ADTypes.AutoZygote()      |    9      9    13.3s
        ad = ADTypes.AutoFiniteDiff()  |   12     12    15.3s
        ad = ADTypes.AutoEnzyme()      |   12     12    15.3s

@AdityaPandeyCN
Contributor

AdityaPandeyCN commented Mar 17, 2026

To be clearer: this test (https://github.com/SciML/NonlinearSolve.jl/blob/master/lib/NonlinearSolveFirstOrder/test/rootfind_tests.jl#L410-L427) was marked as a broken test because of a bug.

It used to throw an exception, which was recorded as broken, so the run passed. Now it actually returns a value, possibly because of an upstream fix:

Error in testset "[OOP] u0: StaticArraysCore.SVector{2, Float64}" on worker 72272:
Error During Test at /home/aditya/NonlinearSolve.jl/lib/NonlinearSolveFirstOrder/test/rootfind_tests.jl:413
  Expression evaluated to non-Boolean
  Expression: solve_oop(quadratic_f, u0; solver)
       Value: [1.41421356236321, 1.41421356236321]

This correct value is not what broken_test expects, so it reports an error. We should just make it a proper test:

@testset "[OOP] u0: $(typeof(u0))" for u0 in ([1.0, 1.0], 1.0, @SVector([1.0, 1.0]))
    sol = solve_oop(quadratic_f, u0; solver)
    @test SciMLBase.successful_retcode(sol)
    err = maximum(abs, quadratic_f(sol.u, 2.0))
    @test err < 1.0e-9

    cache = init(
        NonlinearProblem{false}(quadratic_f, u0, 2.0),
        LevenbergMarquardt(), abstol = 1.0e-9
    )
    @test (@ballocated solve!($cache)) < 200
end
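
The failure mode described above can be reproduced in isolation with the Test standard library. This is only a sketch: `still_broken_solver` and `fixed_solver` are hypothetical stand-ins for a solver call that still throws and one that now returns a solution, and the surrounding `try`/`catch` is there just to capture the testset summary instead of aborting the script.

```julia
using Test

# Hypothetical stand-ins, not NonlinearSolve functions.
still_broken_solver() = error("not supported")   # still throws
fixed_solver() = [1.0, 1.0]                       # now returns a (non-Boolean) value

outcome = try
    @testset "test_broken demo" begin
        # The expression throws, so the result is recorded as Broken.
        @test_broken still_broken_solver()
        # The expression returns a non-Boolean value, so the result is
        # recorded as an Error ("Expression evaluated to non-Boolean").
        @test_broken fixed_solver()
    end
catch err
    err   # a top-level testset with errors throws a TestSetException
end
```

Once the underlying call returns normally, `@test_broken` no longer hides it, which is why converting the item to ordinary `@test` assertions on the returned solution is the natural follow-up.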

@AdityaPandeyCN
Contributor

Should I raise a separate PR, or can we push that here?

4 participants