Add precompile workload for Dual and SubArray broadcast operations by ChrisRackauckas-Claude · Pull Request #1291 · SciML/DiffEqBase.jl

ChrisRackauckas-Claude · 2026-03-01T04:30:46Z

Summary

Adds a PrecompileTools.@compile_workload block to DiffEqBaseForwardDiffExt.jl that exercises common scalar, array, and SubArray operations on Dual{Tag{OrdinaryDiffEqTag, Float64}, Float64, 1}
The extension already defined this dualT type but had no precompilation, causing ~2.5s of broadcast compilation overhead at runtime for ODE functions using views

What's precompiled

Scalar Dual operations:

Arithmetic: +, -, *, /, ^, negation, abs
Math functions: exp, log, sin, cos, tan, sqrt, cbrt, asin, acos, atan, sinh, cosh, tanh
Comparisons/predicates: <, >, min, max, isnan, isinf, isfinite
Conversion: zero, one, float, ForwardDiff.value, ForwardDiff.partials

Vector{Dual} operations:

Broadcast: .+, .-, .*, ./, .^
In-place broadcast: out .= v1 .* s .+ v2, etc.
Reductions: sum, sum(abs2, ...), maximum(abs, ...)
LinearAlgebra: dot, norm (1, 2, Inf)
copy, fill!

SubArray broadcast patterns (Float64 and Dual):

dst .= -k .* src1 .+ k .* src2 .* src3 (linear combination of views)
dst .= k .* src1 .- k .* src2 .^ 2 .- k .* src2 .* src3 (subtraction chain with power)
dst .= k .* src .^ 2 (scaled power)
Simple patterns: assignment, scaling, element-wise ops, negation

Benchmark

Testing with the ROBER problem from DifferentialEquations.jl#1125:

Scenario	First solve
Baseline (views, no pre-warming)	3.05s
With SubArray+Dual pre-warming	0.80s
Direct indexing (no views)	0.25s

73% reduction in first-solve overhead for view-based ODE functions.

Test plan

All DiffEqBase tests pass locally
Precompile workload code verified independently
CI passes

🤖 Generated with Claude Code

…ions The ForwardDiff extension defines the OrdinaryDiffEqTag Dual type but had no precompile workload. ODE functions using @view with broadcast operations (e.g. `dy .= k .* y1 .+ k .* y2 .* y3`) trigger ~2.5s of compilation at runtime for SubArray{Dual{...}} broadcast type trees. This adds comprehensive precompilation of: - Scalar Dual arithmetic (+, -, *, /, ^, negation, abs) - Scalar Dual math functions (exp, log, sin, cos, tan, sqrt, etc.) - Scalar Dual comparisons and predicates (min, max, isnan, isfinite) - Vector{Dual} broadcast operations (.+, .-, .*, ./, .^) - Vector{Dual} reductions (sum, norm, dot) - SubArray{Float64} and SubArray{Dual} broadcast patterns matching common ODE right-hand-side functions Testing shows this reduces first-solve time for view-based ODE functions from ~3.0s to ~0.8s (73% reduction). Addresses SciML/DifferentialEquations.jl#1125 Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

The previous commit only exercised `dst .= .-k .* sv1 .+ k .* sv2 .* sv3` (negated first term), but ODE functions commonly use positive first terms like `dst .= k .* sv1 .+ k .* sv2 .* sv3`. These create different Broadcasted type trees that weren't being pre-warmed. Adding this pattern reduces first-solve time from 0.80s to 0.31s, now nearly matching the 0.25s direct-indexing baseline. Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

…ding blocks Fused multi-operand broadcast expressions (e.g. `dy .= k .* y1 .+ k .* y2 .* y3`) create unique nested Broadcasted types per expression and cannot be generically precompiled. Only the primitive SubArray operations (copy, scale, multiply, add, subtract, power, negate) are truly generic building blocks. Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

ChrisRackauckas-Claude · 2026-03-01T04:47:54Z

Updated: Removed expression-specific fused broadcast patterns (e.g. dsv1 .= k .* sv1 .+ k .* sv2 .* sv3).

These create unique nested Broadcasted type trees per expression and can't be generically precompiled. The dominant TTFX cost for view-based ODE functions (~2.5s of 3.0s) comes from these fused trees, which is a Julia broadcast fusion design constraint.

The precompile workload now contains only generic building blocks:

Scalar Dual arithmetic, math functions, comparisons
Vector{Dual} broadcast, reductions, LinearAlgebra ops
Primitive SubArray operations (copy, scale, multiply, add, subtract, power, negate)

These provide modest TTFX improvement (~13% for view-based functions) but are universally useful operations that benefit all ForwardDiff-based differentiation.

The VectorContinuousCallback termination time can vary slightly across platforms. Use atol=1e-4 instead of exact floating point comparison. Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

ChrisRackauckas added 3 commits February 28, 2026 23:30

ChrisRackauckas-Claude mentioned this pull request Mar 1, 2026

Add precompile workload for Dual and SubArray broadcast operations SciML/NonlinearSolve.jl#858

Merged

4 tasks

Bump tolerance on community callback test

0ba42fc

The VectorContinuousCallback termination time can vary slightly across platforms. Use atol=1e-4 instead of exact floating point comparison. Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

ChrisRackauckas merged commit d202eff into SciML:master Mar 1, 2026
39 of 46 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add precompile workload for Dual and SubArray broadcast operations#1291

Add precompile workload for Dual and SubArray broadcast operations#1291
ChrisRackauckas merged 4 commits intoSciML:masterfrom
ChrisRackauckas-Claude:precompile-subarray-dual-broadcast

ChrisRackauckas-Claude commented Mar 1, 2026

Uh oh!

ChrisRackauckas-Claude commented Mar 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

ChrisRackauckas-Claude commented Mar 1, 2026

Summary

What's precompiled

Benchmark

Test plan

Uh oh!

ChrisRackauckas-Claude commented Mar 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants