
Add comprehensive documentation: tutorials, model guides, and advanced workflows#38

Open

isayev wants to merge 1 commit into main from docs/documentation-expansion

Conversation

@isayev commented Feb 16, 2026

Summary

Comprehensive documentation expansion adding 19 new pages and updating 6 existing files (+5,664 lines). Content was written, then reviewed by expert computational chemistry agents who identified ~15 critical scientific accuracy issues, all of which were fixed before this PR.

New content

Models (7 pages)

  • Model selection guide with decision flowchart and "What AIMNet2 Cannot Do" box
  • Architecture overview covering AEV descriptors, ConvSV mechanism, NSE charge equilibration
  • Per-model reference pages for all 5 model families
  • AIMNet2-2025 promoted as recommended B97-3c model (supersedes aimnet2_b973c)
  • AIMNet2-Pd updated to reflect CPCM implicit solvation (THF) baked into the model

Tutorials (6 pages)

  • Single-point calculations (water + aspirin worked examples)
  • Geometry optimization with ASE BFGS + thermochemistry workflow
  • Molecular dynamics (NVT/NPT) with compile_model guidance
  • Periodic systems with DSF/Ewald Coulomb method comparison
  • Batch processing with mol_idx construction and memory management
  • Performance tuning (Warp warmup, torch.compile, dense/sparse modes)

Advanced guides (6 pages)

  • Open-shell chemistry with AIMNet2-NSE (BDE calculations, radical stability)
  • Conformer search pipeline (RDKit ETKDG + AIMNet2 optimization + Boltzmann populations)
  • Reaction paths and transition states via PySisyphus NEB/IRC
  • Intermolecular interactions (water dimer, benzene dimer, BSSE discussion)
  • Palladium catalysis workflows (coordination geometries, Suzuki coupling) with CPCM/THF solvation context
  • Charged systems (charge parameter, dipole moments, Ewald limitations)

Updated files

  • mkdocs.yml: full nav restructure + pymdownx extensions
  • README.md: fix compile_model param, correct ASE API, update Pd model with CPCM/THF
  • .pre-commit-config.yaml: disable MD046 (conflicts with MkDocs admonitions)
  • docs/index.md: replace Material grid cards with standard markdown
  • docs/getting_started.md: scope to installation + add Loading Your Molecule section
  • docs/long_range.md: correct Ewald scaling from O(N log N) to O(N^2)

Scientific accuracy fixes

  • Paper reference: Chemical Science (not JCTC)
  • NSE acronym: Neutral Spin Equilibrated
  • Multi-reference terminology: CASSCF/CASPT2/NEVPT2 (not multi-reference DFT)
  • Aspirin atom types match C9H8O4
  • fmax: max atomic force magnitude (norm), not max component
  • externalstress=0.0: vacuum (0 Pa), not 1 atm
  • Ewald scaling: O(N^2), not O(N log N)
  • Hessian memory: 9M values (~36 MB), not 9B (~36 GB)
  • Alanine dipeptide SMILES corrected
  • ZPE correction warnings added to BDE and barrier calculations
  • Water dimer geometry fixed for proper H-bonding
  • SRCoulomb cutoff: 5.0 A (not ~4.6 A)
  • AIMNet2-Pd: CPCM implicit solvation (THF) documented across model page, guide, tutorial, and README
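The Hessian memory fix in the list above is simple arithmetic to check (assuming, as the 9M figure implies, a 1000-atom example): an N-atom Cartesian Hessian has (3N)^2 entries.

```python
# Verify the Hessian memory arithmetic from the fix list above.
# For N atoms, the Cartesian Hessian is a (3N x 3N) matrix.
n_atoms = 1000
n_entries = (3 * n_atoms) ** 2   # 9,000,000 values, not 9 billion
mem_mb = n_entries * 4 / 1e6     # float32 = 4 bytes -> 36.0 MB, not 36 GB

print(f"{n_entries:,} values, {mem_mb:.1f} MB")
```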

Review feedback addressed (@zubatyuk)

All 15 inline review comments resolved:

  • aimnet2nse.md: NSE channels clarified as alpha/beta spin charges with independent equilibration
  • architecture.md: Added SAE training workflow (float64 subtract, float32 train, float64 export) and aimnet calc_sae CLI
  • geometry_optimization.md: fmax tightened from 0.01 to 0.005 eV/A with tiered guidance
  • performance.md (10 items):
    • Warp init timing corrected (~2 seconds)
    • Kernel JIT section rewritten with cold/warm cache distinction
    • max-autotune example verified and kept
    • Dense mode table simplified (removed redundant non-PBC qualifier)
    • Dense mode description reframed as beneficial and automatically enabled
    • nb_threshold tuning simplified to default 120 works well
    • Synchronization points subsection removed (internal detail)
    • torch.cuda.Event promoted to primary timing method
    • Multi-GPU section: distributed compute confirmed supported, only DataParallel sharding unsupported
    • Adaptive neighbor list 1.5x growth factor verified in source code
  • single_point.md: compile_model note rewritten -- PyTorch caches compiled kernels and only recompiles on significant size changes
  • getting_started.md: Verification script prints device info

Test plan

  • mkdocs build --strict passes
  • pre-commit run --all-files passes (markdownlint, prettier, codespell, ruff)
  • All code examples use compile_model (not compile_mode)
  • All ASE examples use AIMNet2ASE(base_calc, charge=...) pattern
  • AIMNet2-Pd correctly describes CPCM/THF implicit solvation throughout
  • All 15 review comments from @zubatyuk addressed

@isayev force-pushed the docs/documentation-expansion branch 3 times, most recently from b651244 to d439d17 on February 16, 2026 at 19:18

### Architecture Difference

The key architectural difference from the standard AIMNet2 model is `num_charge_channels=2`. The first channel handles molecular charge (as in all AIMNet2 models), and the second channel encodes spin multiplicity. This allows the model to learn spin-dependent energy contributions without separate models for each spin state.
@zubatyuk commented Feb 16, 2026:

Two channels are for alpha and beta charges. See https://github.com/isayevlab/aimnetcentral/blob/main/aimnet/models/aimnet2.py#L100

Give a note that two charge channels are equilibrated independently and each is constrained to the total number of alpha and beta electrons. This might lead to spin-polarization (e.g. non-equivalent alpha and beta charges, or non-zero spin-charges) even for singlet systems.
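To make the constraint concrete, here is a small stand-alone sketch (the helper is hypothetical, not part of the aimnet API): each of the two NSE channels is constrained to the total number of alpha or beta electrons, which follow directly from the molecular charge and spin multiplicity.

```python
def alpha_beta_electrons(atomic_numbers, charge, mult):
    """Total alpha/beta electron counts that constrain the two NSE
    charge channels (hypothetical helper, not part of aimnet)."""
    n_elec = sum(atomic_numbers) - charge
    n_unpaired = mult - 1                  # multiplicity = 2S + 1
    if (n_elec - n_unpaired) % 2:
        raise ValueError("impossible charge/multiplicity combination")
    n_beta = (n_elec - n_unpaired) // 2
    n_alpha = n_beta + n_unpaired
    return n_alpha, n_beta

# Hydroxyl radical OH: 9 electrons, doublet -> 5 alpha, 4 beta
print(alpha_beta_electrons([8, 1], charge=0, mult=2))
```

Because the two channels are equilibrated independently under these separate constraints, per-atom alpha and beta charges need not be equal even when the totals are (the spin-polarization point in the comment above).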


The SAE values are stored in float64 precision to avoid numerical issues when
computing energy differences between large molecules.

@zubatyuk commented Feb 16, 2026:

Note that the common approach is to subtract large SAE values (as float64) before training, train small per atom type bias (as float32), and add SAE back during model export (as float64).
Note that SAE could be calculated as average per atom type energies in the dataset using aimnet calc_sae.
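The subtract-in-float64, train-in-float32, add-back-in-float64 workflow can be sketched with numpy (the SAE numbers below are made up for illustration; real values come from `aimnet calc_sae`):

```python
import numpy as np

# Made-up SAE values (eV) per atom type; real values are computed as
# average per-atom-type energies over the dataset via `aimnet calc_sae`.
sae = {1: -13.6, 6: -1029.5, 8: -2041.2}

def split_energy(atomic_numbers, total_energy):
    """Subtract the large SAE sum in float64; the small residual is
    what the model is trained on in float32."""
    sae_sum = np.float64(sum(sae[z] for z in atomic_numbers))
    residual = np.float32(total_energy - sae_sum)  # small, safe in fp32
    return sae_sum, residual

def export_energy(atomic_numbers, residual):
    """At export time, add the float64 SAE sum back to the prediction."""
    sae_sum = np.float64(sum(sae[z] for z in atomic_numbers))
    return sae_sum + np.float64(residual)

# Water (H2O): large total energy, small learnable residual
atoms = [8, 1, 1]
sae_sum, res = split_energy(atoms, total_energy=-2069.1)
print(f"SAE sum: {sae_sum:.1f} eV, residual: {res:.3f} eV")
print(f"Reconstructed: {export_energy(atoms, res):.3f} eV")
```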

Comment on lines +105 to +106
when the maximum atomic force magnitude (i.e., the largest per-atom force norm
sqrt(fx^2+fy^2+fz^2)) drops below this threshold. A value of 0.01 eV/A is a
@zubatyuk:

5e-3 or 2e-3 eV/A
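For reference, the distinction the excerpt draws (largest per-atom force norm vs. largest single component) looks like this in numpy:

```python
import numpy as np

# Forces for a 3-atom system, shape (n_atoms, 3), in eV/A
forces = np.array([
    [0.003, 0.004, 0.000],   # norm 0.005
    [0.004, 0.000, 0.000],   # norm 0.004
    [0.001, 0.001, 0.001],   # norm ~0.0017
])

# What ASE's fmax criterion compares against: largest per-atom force norm
fmax_norm = np.linalg.norm(forces, axis=1).max()   # 0.005

# Not the same as the largest single component
fmax_component = np.abs(forces).max()              # 0.004

print(f"max per-atom norm: {fmax_norm:.4f} eV/A")
print(f"max component:     {fmax_component:.4f} eV/A")
```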

automatically. This initializes the Warp runtime and detects available GPUs:

```python
import aimnet # Triggers wp.init() on first kernel import: ~2-5 seconds
```

@zubatyuk:

It's way less, and only on first compilation. Warp caches compiled kernels between sessions.

This cost is paid once per Python process.

### Kernel JIT Compilation (10-30 seconds)

@zubatyuk:

It is way less.

```python
    "aimnet2",
    compile_model=True,
    compile_kwargs={"mode": "max-autotune"},
)
```
@zubatyuk:

Please make sure that max-autotune actually works.


| Condition | Mode | Complexity | Neighbor Lists |
| ---------------------------------------- | ---------- | ---------- | -------------- |
| N <= `nb_threshold` AND CUDA AND non-PBC | **Dense** | O(N^2) | No (all-pairs) |
@zubatyuk:

'AND non-PBC' is extra here.

@zubatyuk:

It is better to say that fully-connected mode is beneficial (and enables in Calculator interface) for small molecules on CUDA.
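The dense-vs-sparse distinction can be illustrated in plain numpy (a sketch, not aimnet code): dense (fully connected) mode evaluates all O(N^2) pairs with no neighbor list, while sparse mode first builds an explicit list of pairs within the cutoff; inside the cutoff both see the same interactions.

```python
import numpy as np

rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 10.0, size=(50, 3))   # 50 atoms in a 10 A box
cutoff = 5.0

# Dense ("fully connected") view: consider all N*(N-1)/2 pairs at once
diff = coords[:, None, :] - coords[None, :, :]
dist = np.linalg.norm(diff, axis=-1)
i, j = np.triu_indices(len(coords), k=1)
dense_pairs = int((dist[i, j] < cutoff).sum())

# Sparse view: build an explicit neighbor list first, then iterate it
neighbor_list = [(a, b) for a, b in zip(i, j) if dist[a, b] < cutoff]

print(f"pairs within cutoff: {dense_pairs} (dense) vs "
      f"{len(neighbor_list)} (sparse)")
```

For small N the dense all-pairs tensor is cheap and avoids neighbor-list bookkeeping entirely, which is why it pays off for small molecules on CUDA.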


```python
# High-memory GPU (e.g., 40+ GB) -- stay in dense mode longer
calc = AIMNet2Calculator("aimnet2", nb_threshold=200)
```
@zubatyuk:

Do benchmarks first.

with 100 atoms in dense mode, others with 150 atoms in sparse mode) causes
recompilation each time, negating the benefits. Set `nb_threshold` so that your
typical workload stays in one mode.

@zubatyuk:

Does it?

`num_neighbors.max().item()` is called to determine the actual maximum neighbor count
for trimming. This is a single `.item()` call per neighbor list computation (not per
atom). For most workloads, this overhead is negligible compared to the NN forward pass.

@zubatyuk:

Should we go that deep?

```python
elapsed = time.perf_counter() - start

print(f"Average time: {elapsed / n_iterations * 1000:.2f} ms/call")
```
@zubatyuk:

perf_counter works. Better use torch.cuda.Event
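A torch.cuda.Event version of the timing loop could look like this (a sketch with a CPU fallback so it also runs without a GPU; the matmul is a stand-in for a calculator call):

```python
import time
import torch

def timed_ms(fn, n_iterations=10):
    """Average wall time per call in ms, using CUDA events when available."""
    if torch.cuda.is_available():
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        torch.cuda.synchronize()
        start.record()
        for _ in range(n_iterations):
            fn()
        end.record()
        torch.cuda.synchronize()   # wait for all queued kernels to finish
        return start.elapsed_time(end) / n_iterations
    # CPU fallback: perf_counter is fine when no GPU kernels are queued
    t0 = time.perf_counter()
    for _ in range(n_iterations):
        fn()
    return (time.perf_counter() - t0) * 1000 / n_iterations

x = torch.randn(256, 256)
print(f"Average time: {timed_ms(lambda: x @ x):.3f} ms/call")
```

Events time the GPU stream itself, so they avoid the hidden synchronization that a host-side perf_counter measurement forces.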


### Do Not Use Multiple GPUs

AIMNet2 does not support multi-GPU execution (DataParallel or DistributedDataParallel).
@zubatyuk:

Why? It does not do sharding though. Distributed compute is supported.

**not** compile the neighbor list construction or the external Coulomb/DFTD3
modules. The benefit is greatest for repeated evaluations on the same
system size, such as MD trajectories or geometry optimizations.

@zubatyuk:

No. In default compile mode, PyTorch triggers re-compilation only if problem size changes significantly, and caches compiled kernels.

```python
print(f"Energy: {result['energy'].item():.4f} eV")
print(f"Forces shape: {result['forces'].shape}")
print("AIMNet2 loaded successfully")
```
@zubatyuk:

Should also print current device and maybe loaded model.
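A device report like the reviewer asks for could be appended to the verification script, for example:

```python
import torch

# Report which device the calculation ran on (and the GPU name, if any)
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Device: {device}")
if device == "cuda":
    print(f"GPU: {torch.cuda.get_device_name(0)}")
```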

isayev added a commit that referenced this pull request Feb 18, 2026
- Correct NSE charge channels description (alpha/beta, not charge/spin)
- Add SAE training workflow detail and aimnet calc_sae mention
- Tighten fmax recommendation to 0.005 eV/A
- Update Warp init and JIT timing with measured values (L40S benchmarks)
- Clarify max-autotune provides marginal benefit
- Simplify dense mode table (PBC note moved to footnote)
- Reframe dense mode as beneficial for small molecules on CUDA
- Simplify nb_threshold advice based on benchmark data
- Remove Synchronization Points subsection (internal detail)
- Use torch.cuda.Event as primary timing example
- Rewrite multi-GPU section: distributed compute supported, sharding not
- Fix compile_model recompilation claim (caches graphs, not per-size)
- Print device in getting_started verification script
@isayev force-pushed the docs/documentation-expansion branch from 3f7c250 to 1de5f60 on February 18, 2026 at 18:07
@isayev force-pushed the docs/documentation-expansion branch from 1de5f60 to 800ba8f on February 18, 2026 at 18:13
@isayev force-pushed the docs/documentation-expansion branch from 800ba8f to 75d78ba on February 18, 2026 at 18:17