feat: Add torch.compile with CUDA graphs support for ~5x MD speedup #29
This PR adds `torch.compile` support with CUDA graphs for significant speedups on GPU molecular dynamics simulations. Based on community PR #20 from Acellera, but reworked with improvements.

**Changes:**

- Add `compile_mode=True` parameter to `AIMNet2Calculator` and `AIMNet2ASE`
- Add `compile_nb_mode` parameter throughout to avoid data-dependent control flow that breaks CUDA graph capture
- Add `get_model_definition_path()` for mapping model names to YAML definitions
- Add `cosine_cutoff_tensor()` for CUDA graphs compatibility
- Add `enable_compile_mode()` to AIMNet2Base to propagate compile settings
- Add `calc_masks_fixed_nb_mode()` for compile-time mask calculation

**Improvements over original PR #20:**

- Generalized model loading (not hardcoded to one model)
- Backward-compatible (original `cosine_cutoff` signature unchanged)
- Comprehensive test coverage
- Code style compliance (passes pre-commit)
- Based on current main branch

**Limitations:**

- Only `nb_mode=0` (single molecule, dense) supported
- Requires CUDA
- No PBC support in compile mode
- First call has compilation overhead

**Usage:**

```python
from aimnet.calculators import AIMNet2Calculator

calc = AIMNet2Calculator("aimnet2", compile_mode=True)
```
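To illustrate what "data-dependent control flow" means in the Changes list, here is a minimal sketch using a cosine cutoff as the example. It is not the PR's `cosine_cutoff_tensor()`, whose implementation is not shown here; both function names below are hypothetical, and plain floats stand in for tensors. The first version branches on the value of the distance, which is the kind of pattern that cannot be captured in a CUDA graph. The second computes the same result with a clamp.

```python
import math


def cutoff_with_branch(d: float, rc: float) -> float:
    """Hypothetical reference version: the code path depends on the value of d."""
    if d >= rc:
        return 0.0
    return 0.5 * (math.cos(math.pi * d / rc) + 1.0)


def cutoff_branch_free(d: float, rc: float) -> float:
    """Hypothetical branch-free version: the same operations run for every input."""
    # Clamping d/rc to at most 1 makes the cosine term evaluate to -1 beyond rc,
    # so the result is 0 there without an `if`.
    # With tensors, the analogous call would be torch.clamp(d / rc, max=1.0).
    x = min(d / rc, 1.0)
    return 0.5 * (math.cos(math.pi * x) + 1.0)


if __name__ == "__main__":
    for d in (0.0, 1.0, 2.5, 5.0, 7.0):
        print(d, cutoff_with_branch(d, 5.0), cutoff_branch_free(d, 5.0))
```

Both versions agree for distances inside and outside the cutoff radius; only the second keeps the sequence of operations fixed regardless of the input values.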
**Pull Request Review: torch.compile with CUDA Graphs Support**

**Summary:** This PR adds …

**✅ Strengths:**

- Code Quality
- Test Coverage
- Performance & Documentation

**🔍 Issues & Concerns:**

1. Critical: Potential Device Mismatch in …
**Performance:**

Based on benchmarks from community PR #20, compiled MD runs roughly 5x faster (the figure quoted in the PR title).

**Usage with ASE:**

`AIMNet2ASE` accepts the same `compile_mode=True` parameter as `AIMNet2Calculator`.

**Test plan:**

- `pytest -m gpu tests/test_compile.py` (requires CUDA)
- `python examples/ase_md_compiled.py --compile`

Closes #20 (supersedes with improvements)
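When checking the speedup locally, the first-call compilation overhead listed under Limitations should be kept out of the steady-state number. The helper below is a minimal sketch of that measurement pattern, not part of the PR: `time_with_warmup` and its parameters are hypothetical names, and `step` stands for any zero-argument callable, such as a closure that runs one MD step.

```python
import time
from typing import Callable, Tuple


def time_with_warmup(
    step: Callable[[], object],
    n_warmup: int = 1,
    n_timed: int = 10,
) -> Tuple[float, float]:
    """Return (total warm-up seconds, mean seconds per timed call).

    Assumption: `step` blocks until its work is finished. For GPU work that
    means synchronizing inside `step` (e.g. torch.cuda.synchronize()),
    otherwise the clock only measures kernel launch time.
    """
    start = time.perf_counter()
    for _ in range(n_warmup):
        step()  # includes any one-time compilation or graph capture
    warmup_s = time.perf_counter() - start

    start = time.perf_counter()
    for _ in range(n_timed):
        step()
    per_call_s = (time.perf_counter() - start) / n_timed
    return warmup_s, per_call_s
```

Running the helper once with a compiled calculator and once with an uncompiled one, then comparing the two `per_call_s` values, gives a speedup figure that excludes the one-time compile cost.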