Add precompile workload for Dual and SubArray broadcast operations #1291
Merged
ChrisRackauckas merged 4 commits into SciML:master on Mar 1, 2026
…ions

The ForwardDiff extension defines the OrdinaryDiffEqTag Dual type but had no precompile workload. ODE functions using @view with broadcast operations (e.g. `dy .= k .* y1 .+ k .* y2 .* y3`) trigger ~2.5s of compilation at runtime for SubArray{Dual{...}} broadcast type trees.

This adds comprehensive precompilation of:
- Scalar Dual arithmetic (+, -, *, /, ^, negation, abs)
- Scalar Dual math functions (exp, log, sin, cos, tan, sqrt, etc.)
- Scalar Dual comparisons and predicates (min, max, isnan, isfinite)
- Vector{Dual} broadcast operations (.+, .-, .*, ./, .^)
- Vector{Dual} reductions (sum, norm, dot)
- SubArray{Float64} and SubArray{Dual} broadcast patterns matching common ODE right-hand-side functions

Testing shows this reduces first-solve time for view-based ODE functions from ~3.0s to ~0.8s (73% reduction).

Addresses SciML/DifferentialEquations.jl#1125

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
The previous commit only exercised `dst .= .-k .* sv1 .+ k .* sv2 .* sv3` (negated first term), but ODE functions commonly use positive first terms like `dst .= k .* sv1 .+ k .* sv2 .* sv3`. These create different Broadcasted type trees that weren't being pre-warmed.

Adding this pattern reduces first-solve time from 0.80s to 0.31s, now nearly matching the 0.25s direct-indexing baseline.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
…ding blocks

Fused multi-operand broadcast expressions (e.g. `dy .= k .* y1 .+ k .* y2 .* y3`) create unique nested Broadcasted types per expression and cannot be generically precompiled. Only the primitive SubArray operations (copy, scale, multiply, add, subtract, power, negate) are truly generic building blocks.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
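To illustrate why each fused expression needs its own compilation, the following sketch builds two lazy broadcast trees by hand and compares their types. It is a minimal illustration using only `Base.Broadcast`; the nesting is written out explicitly and only approximates what the dotted syntax lowers to, and the variable names are placeholders rather than code from this PR.

```julia
using Base.Broadcast: broadcasted, materialize

y = [1.0, 2.0, 3.0]
sv1, sv2, sv3 = @view(y[1:1]), @view(y[2:2]), @view(y[3:3])
k = 0.5

# Lazy tree for a positive first term, roughly `k .* sv1 .+ k .* sv2 .* sv3`.
pos_tree = broadcasted(+, broadcasted(*, k, sv1),
                          broadcasted(*, broadcasted(*, k, sv2), sv3))

# Lazy tree for a negated first term, roughly `.-k .* sv1 .+ k .* sv2 .* sv3`.
neg_tree = broadcasted(+, broadcasted(*, broadcasted(-, k), sv1),
                          broadcasted(*, broadcasted(*, k, sv2), sv3))

# The nesting is encoded in the type parameters, so the two trees are
# different concrete types and each gets its own specialized code.
types_differ = typeof(pos_tree) != typeof(neg_tree)

# Materializing evaluates each tree into an ordinary Vector.
pos_value = materialize(pos_tree)
neg_value = materialize(neg_tree)
```

Because the extra `-` node changes the type of the whole tree, warming up one expression does not help the other, which is consistent with the observation in the commit message above.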
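Updated: Removed expression-specific fused broadcast patterns (e.g. `dy .= k .* y1 .+ k .* y2 .* y3`). These create unique nested `Broadcasted` types per expression and cannot be generically precompiled. The precompile workload now contains only generic building blocks, such as the primitive SubArray operations (copy, scale, multiply, add, subtract, power, negate).
These provide a modest TTFX (time-to-first-X) improvement (~13% for view-based functions) but are universally useful operations that benefit all ForwardDiff-based differentiation.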
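As a rough sketch of what "generic building blocks" could mean in practice, the function below runs each primitive view operation once for a given element type. The name `exercise_view_primitives` is hypothetical and the body is not taken from the PR; in the actual extension such calls would presumably sit inside a `PrecompileTools.@compile_workload` block and also be run with the Dual element type. Only `Float64` is used here so the example needs no packages.

```julia
# Hypothetical helper: runs each primitive SubArray operation once for element type T.
function exercise_view_primitives(::Type{T}) where {T}
    buf = T[1, 2, 3, 4]
    out = zeros(T, 2)
    src1 = @view buf[1:2]
    src2 = @view buf[3:4]
    dst = @view out[1:2]
    k = T(2)

    dst .= src1            # copy
    dst .= k .* src1       # scale
    dst .= src1 .* src2    # multiply
    dst .= src1 .+ src2    # add
    dst .= src1 .- src2    # subtract
    dst .= src1 .^ 2       # power
    dst .= .-src1          # negate
    return out
end

# The last operation executed is the negation, so `out` holds -src1.
result = exercise_view_primitives(Float64)
```

Each statement is a single-operation broadcast, so the compiled methods are shared by any user code that performs the same primitive on the same array types, regardless of how the user's larger expressions are fused.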
The VectorContinuousCallback termination time can vary slightly across platforms. Use atol=1e-4 instead of exact floating-point comparison.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
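The difference between the two comparison styles can be sketched as follows. The numbers are made up for illustration and are not the values from the actual test; only the `atol=1e-4` tolerance comes from the commit message.

```julia
# Hypothetical values standing in for a callback termination time.
expected_t = 2.5
observed_t = 2.5 + 3e-6   # small platform-dependent deviation

# Exact comparison fails on any deviation, however small.
exact_match = observed_t == expected_t

# Absolute-tolerance comparison accepts deviations up to 1e-4.
close_enough = isapprox(observed_t, expected_t; atol = 1e-4)
```

In a test file the same check is usually written with the `Test` standard library as `@test observed_t ≈ expected_t atol=1e-4`.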
Summary
- Adds a `PrecompileTools.@compile_workload` block to `DiffEqBaseForwardDiffExt.jl` that exercises common scalar, array, and SubArray operations on `Dual{Tag{OrdinaryDiffEqTag, Float64}, Float64, 1}`
- The extension defines the `dualT` type but had no precompilation, causing ~2.5s of broadcast compilation overhead at runtime for ODE functions using views

What's precompiled
Scalar Dual operations:
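- `+`, `-`, `*`, `/`, `^`, negation, `abs`
- `exp`, `log`, `sin`, `cos`, `tan`, `sqrt`, `cbrt`, `asin`, `acos`, `atan`, `sinh`, `cosh`, `tanh`
- `<`, `>`, `min`, `max`, `isnan`, `isinf`, `isfinite`
- `zero`, `one`, `float`, `ForwardDiff.value`, `ForwardDiff.partials`

Vector{Dual} operations: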
- `.+`, `.-`, `.*`, `./`, `.^`
- `out .= v1 .* s .+ v2`, etc.
- `sum`, `sum(abs2, ...)`, `maximum(abs, ...)`
- `dot`, `norm` (1, 2, Inf)
- `copy`, `fill!`

SubArray broadcast patterns (Float64 and Dual):
- `dst .= -k .* src1 .+ k .* src2 .* src3` (linear combination of views)
- `dst .= k .* src1 .- k .* src2 .^ 2 .- k .* src2 .* src3` (subtraction chain with power)
- `dst .= k .* src .^ 2` (scaled power)

Benchmark
Testing with the ROBER problem from DifferentialEquations.jl#1125: first-solve time for view-based ODE functions drops from ~3.0s to ~0.8s, a 73% reduction in first-solve overhead.
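For context, a view-based ROBER right-hand side of the kind discussed here might look like the sketch below. The function name, the choice of one-element views, and the parameter values are assumptions for illustration; the exact function from the linked issue is not reproduced in this PR. The three broadcast statements follow the three SubArray patterns listed above.

```julia
# Hypothetical view-based ROBER right-hand side (not the exact function from the issue).
function rober_views!(du, u, p, t)
    k1, k2, k3 = p
    y1, y2, y3 = @view(u[1:1]), @view(u[2:2]), @view(u[3:3])
    dy1, dy2, dy3 = @view(du[1:1]), @view(du[2:2]), @view(du[3:3])

    dy1 .= .-k1 .* y1 .+ k3 .* y2 .* y3                   # linear combination of views
    dy2 .= k1 .* y1 .- k2 .* y2 .^ 2 .- k3 .* y2 .* y3    # subtraction chain with power
    dy3 .= k2 .* y2 .^ 2                                  # scaled power
    return nothing
end

u = [1.0, 0.0, 0.0]
du = zeros(3)
p = (0.04, 3e7, 1e4)   # commonly used ROBER rate constants

# The first call includes compilation of the broadcast kernels; the second does not.
t_first = @elapsed rober_views!(du, u, p, 0.0)
t_second = @elapsed rober_views!(du, u, p, 0.0)
```

Timing the first and second calls separately, as in the last two lines, is a simple way to see the compilation cost that a precompile workload aims to move out of the user's session. With an ODE solver the same function is additionally compiled for Dual-valued arrays, which is the case this PR targets.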
Test plan
🤖 Generated with Claude Code