enable torch compile with cudagraphs by sef43 · Pull Request #20 · isayevlab/aimnetcentral

sef43 · 2025-11-11T11:09:13Z

As discussed by @giadefa This PR adds the necessary changes to AIMNet2 so that the model can be torch.compile'd with cudagraphs enabled.

This speeds up small molecule MD significantly. The new example ase_md.py script demonstrates the speedup.
The original runs 10000 steps in 76 seconds, the new version runs in 15 seconds.

Original:

Torch version: 2.8.0+cu128
CUDA available, version 12.8, device: NVIDIA GeForce RTX 4090
running without torch_compile+cudagraphs
energy: -79772.80223186754
Time[ps]      Etot[eV]     Epot[eV]     Ekin[eV]    T[K]
0.0000       -79772.802   -79772.802        0.000     0.0
1.0000       -79765.760   -79769.961        4.201   287.6
2.0000       -79766.232   -79770.634        4.402   301.4
3.0000       -79765.783   -79770.592        4.809   329.2
4.0000       -79766.131   -79770.321        4.191   286.9
5.0000       -79764.888   -79770.354        5.466   374.2
6.0000       -79765.719   -79770.835        5.116   350.3
7.0000       -79766.672   -79771.049        4.377   299.7
8.0000       -79766.566   -79770.707        4.141   283.5
9.0000       -79766.490   -79770.862        4.371   299.3
10.0000      -79767.403   -79771.511        4.109   281.3
Completed MD in 76.3 s (7.634 ms/step)

New with torch.compile(self.model, fullgraph=True, options={'triton.cudagraphs':True}):

Torch version: 2.8.0+cu128
CUDA available, version 12.8, device: NVIDIA GeForce RTX 4090
running with torch_compile+cudagraphs
energy: -79772.8125
Time[ps]      Etot[eV]     Epot[eV]     Ekin[eV]    T[K]
0.0000       -79772.812   -79772.812        0.000     0.0
1.0000       -79766.053   -79770.633        4.580   313.5
2.0000       -79766.829   -79770.547        3.718   254.5
3.0000       -79766.958   -79771.672        4.714   322.7
4.0000       -79766.121   -79770.664        4.543   311.0
5.0000       -79766.836   -79771.133        4.297   294.2
6.0000       -79765.915   -79770.078        4.164   285.1
7.0000       -79766.040   -79771.055        5.015   343.3
8.0000       -79766.454   -79770.727        4.273   292.5
9.0000       -79766.441   -79770.531        4.090   280.0
10.0000      -79766.390   -79771.016        4.625   316.7
Completed MD in 15.1 s (1.511 ms/step)

It is currently only implemented for nb_mode=0 and a single molecule.

The key required changes are to replace data dependent control flow with compile time constant control flow. Therefore, I have added a setup_for_compile_cudagraphs method to some modules to do this.

The feature is supported by the ASE calculator interface.

giadefa · 2025-11-11T14:29:07Z

great!

This PR adds torch.compile support with CUDA graphs for significant speedups on GPU molecular dynamics simulations. Based on community PR #20 from Acellera, but reworked with improvements: **Changes:** - Add `compile_mode=True` parameter to `AIMNet2Calculator` and `AIMNet2ASE` - Add `compile_nb_mode` parameter throughout to avoid data-dependent control flow that breaks CUDA graph capture - Add `get_model_definition_path()` for mapping model names to YAML definitions - Add `cosine_cutoff_tensor()` for CUDA graphs compatibility - Add `enable_compile_mode()` to AIMNet2Base to propagate compile settings - Add `calc_masks_fixed_nb_mode()` for compile-time mask calculation **Improvements over original PR #20:** - Generalized model loading (not hardcoded to one model) - Backward-compatible (original `cosine_cutoff` signature unchanged) - Comprehensive test coverage - Code style compliance (passes pre-commit) - Based on current main branch **Limitations:** - Only `nb_mode=0` (single molecule, dense) supported - Requires CUDA - No PBC support in compile mode - First call has compilation overhead **Usage:** ```python from aimnet.calculators import AIMNet2Calculator calc = AIMNet2Calculator("aimnet2", compile_mode=True) ```

enable torc compile with cudagraphs

78fb758

isayev mentioned this pull request Dec 16, 2025

feat: Add torch.compile with CUDA graphs support for ~5x MD speedup #29

Open

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

enable torch compile with cudagraphs#20

enable torch compile with cudagraphs#20
sef43 wants to merge 1 commit intoisayevlab:mainfrom
Acellera:main

sef43 commented Nov 11, 2025

Uh oh!

giadefa commented Nov 11, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Comments

Conversation

sef43 commented Nov 11, 2025

Uh oh!

giadefa commented Nov 11, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Comments