# Add comprehensive documentation: tutorials, model guides, and advanced workflows (#38)

## Conversation
Force-pushed from b651244 to d439d17.
> ### Architecture Difference
>
> The key architectural difference from the standard AIMNet2 model is `num_charge_channels=2`. The first channel handles molecular charge (as in all AIMNet2 models), and the second channel encodes spin multiplicity. This allows the model to learn spin-dependent energy contributions without separate models for each spin state.
**@zubatyuk:** The two channels are for alpha and beta charges; see https://github.com/isayevlab/aimnetcentral/blob/main/aimnet/models/aimnet2.py#L100. Add a note that the two charge channels are equilibrated independently and each is constrained to the total number of alpha and beta electrons, respectively. This can lead to spin polarization (e.g., non-equivalent alpha and beta charges, or non-zero spin charges) even for singlet systems.
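To make the constraint concrete, here is a small sketch (our own illustration, not the aimnetcentral API) of how total charge and spin multiplicity map onto the per-channel electron-count targets:

```python
# Illustration only -- not part of the aimnet API. Shows how total charge and
# spin multiplicity determine the alpha/beta electron counts that the two
# charge channels are independently constrained to.
def channel_targets(n_electrons_neutral: int, charge: int, multiplicity: int):
    n_elec = n_electrons_neutral - charge
    n_unpaired = multiplicity - 1           # number of unpaired electrons (2S)
    n_alpha = (n_elec + n_unpaired) // 2    # constraint for the alpha channel
    n_beta = (n_elec - n_unpaired) // 2     # constraint for the beta channel
    return n_alpha, n_beta

# Water radical cation (H2O+): 10 electrons when neutral, charge +1, doublet
print(channel_targets(10, 1, 2))  # -> (5, 4)
```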
> The SAE values are stored in float64 precision to avoid numerical issues when
> computing energy differences between large molecules.
**@zubatyuk:** Note that the common approach is to subtract the large SAE values (as float64) before training, train a small per-atom-type bias (as float32), and add the SAE back during model export (as float64). Also note that SAE can be calculated as average per-atom-type energies over the dataset using `aimnet calc_sae`.
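A rough sketch of that workflow (array names and shapes are our assumptions; the real tooling is the `aimnet calc_sae` CLI):

```python
import numpy as np

# Hedged sketch of the SAE workflow described above; `counts` and `energies`
# are hypothetical arrays, not the aimnet data format.
def fit_sae(counts: np.ndarray, energies: np.ndarray) -> np.ndarray:
    """Least-squares per-atom-type energies (SAE), computed in float64."""
    sae, *_ = np.linalg.lstsq(counts.astype(np.float64),
                              energies.astype(np.float64), rcond=None)
    return sae

counts = np.array([[2, 1], [0, 2]])     # per-molecule atom counts (H, O)
energies = np.array([-76.4, -150.3])    # total energies, float64
sae = fit_sae(counts, energies)
residual = energies - counts @ sae      # small residual: safe to train as float32
print(residual.astype(np.float32))      # float64 SAE is added back at export
```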
> when the maximum atomic force magnitude (i.e., the largest per-atom force norm
> sqrt(fx^2 + fy^2 + fz^2)) drops below this threshold. A value of 0.01 eV/A is a
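A minimal sketch of that convergence test (our own helper, using the tightened 0.005 eV/A value from the follow-up commit):

```python
import numpy as np

# Hypothetical helper illustrating the criterion quoted above: converge when
# the largest per-atom force norm sqrt(fx^2 + fy^2 + fz^2) drops below fmax.
def converged(forces: np.ndarray, fmax: float = 0.005) -> bool:
    """forces: (n_atoms, 3) array in eV/A."""
    return float(np.sqrt((forces ** 2).sum(axis=1)).max()) < fmax

forces = np.array([[0.001, 0.002, 0.0], [0.003, 0.0, 0.001]])
print(converged(forces))  # True: max per-atom norm ~0.0032 eV/A
```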
> automatically. This initializes the Warp runtime and detects available GPUs:
>
> ```python
> import aimnet  # Triggers wp.init() on first kernel import: ~2-5 seconds
> ```
**@zubatyuk:** It's way less, and only on first compilation. Warp caches compiled kernels between sessions.
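One way to check this on your own machine (a sketch; `warp` is NVIDIA's Warp package that the quoted docs refer to):

```python
import time
import warp as wp

# Time the one-time runtime initialization the reviewer refers to. Warp also
# caches compiled kernels on disk, so kernel JIT cost is paid only on the
# first run for a given kernel, not once per session.
t0 = time.perf_counter()
wp.init()
print(f"wp.init() took {time.perf_counter() - t0:.2f} s")
```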
> This cost is paid once per Python process.
>
> ### Kernel JIT Compilation (10-30 seconds)
>
> ```python
> calc = AIMNet2Calculator(
>     "aimnet2",
>     compile_model=True,
>     compile_kwargs={"mode": "max-autotune"},
> )
> ```
**@zubatyuk:** Please make sure that max-autotune actually works.
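A quick way to verify the mode end-to-end before documenting it (generic PyTorch sketch, not the AIMNet2 test suite):

```python
import torch

# Sanity-check that mode="max-autotune" compiles and runs for a
# representative input; any autotune failure surfaces on the first call.
model = torch.nn.Sequential(torch.nn.Linear(128, 128), torch.nn.SiLU()).cuda()
compiled = torch.compile(model, mode="max-autotune")
out = compiled(torch.randn(32, 128, device="cuda"))  # triggers compilation
print(out.shape)  # torch.Size([32, 128])
```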
> | Condition                                | Mode      | Complexity | Neighbor Lists |
> | ---------------------------------------- | --------- | ---------- | -------------- |
> | N <= `nb_threshold` AND CUDA AND non-PBC | **Dense** | O(N^2)     | No (all-pairs) |
**@zubatyuk:** 'AND non-PBC' is extraneous here.

**@zubatyuk:** It is better to say that fully-connected mode is beneficial (and is enabled in the Calculator interface) for small molecules on CUDA.
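Expressed as a sketch of the selection rule with both comments applied (a hypothetical helper, not the library's internals):

```python
# Hypothetical restatement of the dispatch rule after the review: dense
# (fully-connected) mode is a CUDA optimization for small systems; PBC is
# not part of the condition.
def use_dense(n_atoms: int, nb_threshold: int, on_cuda: bool) -> bool:
    return on_cuda and n_atoms <= nb_threshold
```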
> ```python
> # High-memory GPU (e.g., 40+ GB) -- stay in dense mode longer
> calc = AIMNet2Calculator("aimnet2", nb_threshold=200)
> ```
> with 100 atoms in dense mode, others with 150 atoms in sparse mode) causes
> recompilation each time, negating the benefits. Set `nb_threshold` so that your
> typical workload stays in one mode.
> `num_neighbors.max().item()` is called to determine the actual maximum neighbor count
> for trimming. This is a single `.item()` call per neighbor list computation (not per
> atom). For most workloads, this overhead is negligible compared to the NN forward pass.
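To illustrate what that single synchronization does (a toy sketch, not the actual neighbor-list code):

```python
import torch

# Toy example of the trimming step: one .item() host-device sync per neighbor
# list build, used to cut the padded neighbor tensor to the true maximum.
num_neighbors = torch.tensor([4, 7, 5])           # per-atom neighbor counts
padded = torch.zeros(3, 16, dtype=torch.long)     # padded to a fixed width
max_nb = num_neighbors.max().item()               # the single sync point
trimmed = padded[:, :max_nb]
print(trimmed.shape)  # torch.Size([3, 7])
```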
> ```python
> elapsed = time.perf_counter() - start
>
> print(f"Average time: {elapsed / n_iterations * 1000:.2f} ms/call")
> ```
**@zubatyuk:** `perf_counter` works, but better to use `torch.cuda.Event`.
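Following the suggestion, a self-contained version of the benchmark using `torch.cuda.Event` (the workload here is a stand-in for the actual forward call):

```python
import torch

def run_model():
    # Stand-in for the AIMNet2 forward pass.
    a = torch.rand(1024, 1024, device="cuda")
    torch.mm(a, a)

n_iterations = 100
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
for _ in range(n_iterations):
    run_model()
end.record()
torch.cuda.synchronize()  # wait for queued GPU work before reading the timer
print(f"Average time: {start.elapsed_time(end) / n_iterations:.2f} ms/call")
```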
> ### Do Not Use Multiple GPUs
>
> AIMNet2 does not support multi-GPU execution (DataParallel or DistributedDataParallel).
**@zubatyuk:** Why? It does not do sharding, but distributed compute is supported.
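In that spirit, distributed throughput can come from running one independent calculator per process/GPU, e.g. under `torchrun` (a sketch under that assumption; a single system is never sharded across devices):

```python
import os
import torch

# Sketch: one process per GPU (e.g. launched with torchrun), each evaluating
# its own share of systems on its own calculator instance. No single-system
# sharding across devices, consistent with the review note.
rank = int(os.environ.get("LOCAL_RANK", 0))
device = f"cuda:{rank % max(torch.cuda.device_count(), 1)}"
torch.cuda.set_device(device)
# calc = AIMNet2Calculator("aimnet2")        # loads onto the current device
# my_systems = all_systems[rank::world_size] # static round-robin work split
print(f"rank {rank} -> {device}")
```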
> **not** compile the neighbor list construction or the external Coulomb/DFTD3
> modules. The benefit is greatest for repeated evaluations on the same
> system size, such as MD trajectories or geometry optimizations.
**@zubatyuk:** No. In default compile mode, PyTorch triggers recompilation only if the problem size changes significantly, and it caches compiled kernels.
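The caching behavior is easy to observe with a generic `torch.compile` example (illustrative, not AIMNet2-specific):

```python
import torch

# Default-mode torch.compile caches compiled graphs; a changed input size may
# trigger one recompile into a dynamic-shape graph, after which further sizes
# reuse the cache rather than recompiling per size.
fn = torch.compile(lambda x: (x ** 2).sum())
for n in (100, 150, 200, 100):
    fn(torch.randn(n))  # first call compiles; later sizes hit cached graphs
print("no per-size recompilation in steady state")
```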
| print(f"Energy: {result['energy'].item():.4f} eV") | ||
| print(f"Forces shape: {result['forces'].shape}") | ||
| print("AIMNet2 loaded successfully") | ||
| ``` |
**@zubatyuk:** This should also print the current device, and maybe the loaded model.
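For example, the verification script could end with something like this (a sketch; `model_name` mirrors the string the script loaded and is not an attribute of the calculator):

```python
import torch

# Suggested addition to the verification script, per the review; adjust the
# names to the actual getting_started example.
model_name = "aimnet2"
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Device: {device}")
print(f"Model: {model_name} loaded successfully")
```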
Commit message:

- Correct NSE charge channels description (alpha/beta, not charge/spin)
- Add SAE training workflow detail and `aimnet calc_sae` mention
- Tighten fmax recommendation to 0.005 eV/A
- Update Warp init and JIT timing with measured values (L40S benchmarks)
- Clarify max-autotune provides marginal benefit
- Simplify dense mode table (PBC note moved to footnote)
- Reframe dense mode as beneficial for small molecules on CUDA
- Simplify `nb_threshold` advice based on benchmark data
- Remove Synchronization Points subsection (internal detail)
- Use `torch.cuda.Event` as primary timing example
- Rewrite multi-GPU section: distributed compute supported, sharding not
- Fix `compile_model` recompilation claim (caches graphs, not per-size)
- Print device in getting_started verification script
Force-pushed from 3f7c250 to 1de5f60.
Force-pushed from 1de5f60 to 800ba8f.
Force-pushed from 800ba8f to 75d78ba.
## Summary

Comprehensive documentation expansion adding 19 new pages and updating 6 existing files (+5,664 lines). Content was written, then reviewed by expert computational chemistry agents who identified ~15 critical scientific accuracy issues, all of which were fixed before this PR was opened.
### New content

- Models (7 pages)
- Tutorials (6 pages): `compile_model` guidance, `mol_idx` construction and memory management
- Advanced guides (6 pages)

### Updated files

- `mkdocs.yml`: full nav restructure + pymdownx extensions
- `README.md`: fix `compile_model` param, correct ASE API, update Pd model with CPCM/THF
- `.pre-commit-config.yaml`: disable MD046 (conflicts with MkDocs admonitions)
- `docs/index.md`: replace Material grid cards with standard markdown
- `docs/getting_started.md`: scope to installation + add Loading Your Molecule section
- `docs/long_range.md`: correct Ewald scaling from O(N log N) to O(N^2)

### Scientific accuracy fixes
### Review feedback addressed (@zubatyuk)

All 15 inline review comments resolved:

- `aimnet calc_sae` CLI

### Test plan