Conversation
1. Address the flying ice cube syndrome
2. Ensure trajectory files are written
3. Properly deal with pathnames
4. Use different intervals for restart file and trajectory file updates
There are still a few questions about passing data back to DeepDriveMD:
- Does it pick up the DCD trajectory file?
- Does the HDF5 contact map file work?
While you can step through a trajectory time step by time step, pulling out each frame, you cannot use that to write a trajectory directly (in another format, for example). You have to select some atoms to create an AtomGroup, which gets updated every time you read a time step, and pass that AtomGroup to the trajectory writer. If you pass a time step to the trajectory writer, the code crashes, complaining that the object you passed is neither an AtomGroup nor a Universe. To make matters worse, the MDAnalysis documentation is full of broken examples that pass time steps to the trajectory writer. This page https://www.mdanalysis.org/MDAnalysisTutorial/writing.html is the only place where I found the correct way of doing this. The other interesting question is how select_atoms works; thankfully the selection "all" seems to work.
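The pattern from that tutorial page can be sketched as below. The function name and file arguments are placeholders; the key point is that the AtomGroup, not the Timestep, goes to the Writer:

```python
def rewrite_trajectory(topology, trajectory, out_path, selection="all"):
    """Re-write a trajectory frame by frame (e.g. to another format).

    Passing the Timestep `ts` to the Writer instead of the AtomGroup
    raises an error that the object is neither an AtomGroup nor a Universe.
    """
    import MDAnalysis as mda  # imported here so the sketch is self-contained

    u = mda.Universe(topology, trajectory)
    atoms = u.select_atoms(selection)       # AtomGroup, updated on every read
    with mda.Writer(out_path, atoms.n_atoms) as w:
        for ts in u.trajectory:             # iterating updates `atoms` in place
            w.write(atoms)                  # pass the AtomGroup, not `ts`
```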
File locking seems to cause problems on the Crusher compute nodes for no apparent reason.
The run_nwchem.py component seems to be working now. The aggregation step that follows still fails. The reason the aggregation fails is that NWChem's XYZ file writer may write the coordinates with Fortran's "*"-notation. This is definitely not compliant with the XYZ file format, so we need to replace the occurrences of this notation with straight numbers.
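Fortran's list-directed output collapses repeated values into count*value (for example, 3*0.000000 for three zero coordinates). A small sketch of the replacement; the function name and regex are ours, not NWChem's:

```python
import re

def expand_fortran_repeats(line: str) -> str:
    """Expand Fortran repeat notation like '3*0.000000' into
    '0.000000 0.000000 0.000000' so the line becomes valid XYZ."""
    def repl(m):
        count, value = int(m.group(1)), m.group(2)
        return " ".join([value] * count)
    # also accept Fortran-style D exponents, e.g. 2*1.5d+00
    return re.sub(r"(\d+)\*(-?\d+\.?\d*(?:[eEdD][+-]?\d+)?)", repl, line)
```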
So instead of just trying to activate LMS we should check whether it is there and only activate it if it is installed.
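One way to sketch such a guard in Python; the module name "lms" is a placeholder, the real package name may differ:

```python
import importlib.util

def should_activate_lms(module_name="lms"):
    """Return True only when the LMS package is importable, so we activate
    it only where it is actually installed. The default module name is an
    assumption; substitute the real LMS package name."""
    return importlib.util.find_spec(module_name) is not None
```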
Note that the atoms in 7CZ4-unfolded.pdb and 7CZ4-folded.pdb are ordered differently; i.e., the ordering within the residues differs. I am not sure whether the RMSD calculation handles this correctly.
The structure is the same as in 7cz4_fixedFH_allH_nwc_small.pdb, but now the atoms in the protein residues have been reordered in the NWChem convention. I think this reordering will help in calculating the RMSD.
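A toy illustration of why the atom ordering matters: a naive RMSD pairs atoms by position in the list, so two identical structures with reordered atoms give a nonzero "RMSD". The coordinates here are made up:

```python
import math

def rmsd(a, b):
    """Naive RMSD between two coordinate lists; assumes identical atom
    ordering in a and b (no reordering or alignment is attempted)."""
    assert len(a) == len(b)
    s = sum((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
            for (x1, y1, z1), (x2, y2, z2) in zip(a, b))
    return math.sqrt(s / len(a))

coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
shuffled = [coords[1], coords[0], coords[2]]  # same structure, atoms swapped
```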
- change run.sh to deal with multiple use cases instead of just one
- nwchem/config.py: change the atom selection for the contact map generation
- nwchem.py/run_nwchem.py: we need to be able to copy additional data files to define the NWChem calculation correctly
- 7cz4/config.yaml: the contact map is now larger, 179x179
The new atom selection makes sure we select an even number of atoms in the 7cz4 use case (and also in the bba use case), and we set the contact map to the corresponding size.
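As a rough sketch of what such a contact map is, an N x N boolean matrix over the selected atoms; the plain-Python implementation and the distance cutoff are illustrative, the real pipeline builds it from an MDAnalysis atom selection:

```python
import math

def contact_map(coords, cutoff=8.0):
    """Boolean N x N contact map: atoms i and j are "in contact" when their
    distance is below `cutoff`. For the 7cz4 selection N would be 179,
    giving the 179x179 map mentioned above."""
    n = len(coords)
    cmap = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            cmap[i][j] = math.dist(coords[i], coords[j]) < cutoff
    return cmap
```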
This script pulls data from the HDF5 files generated from the trajectories. The contact maps are transformed into the latent space and selected dimensions as well as the RMSD values are stored in a CSV file. This CSV file should be easy to visualize using Matplotlib.
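The CSV output step might look like the following; the column names z0 and z1 are hypothetical stand-ins for the selected latent dimensions:

```python
import csv
import io

# hypothetical rows: a few selected latent dimensions plus the RMSD per frame
rows = [
    {"frame": 0, "z0": 0.12, "z1": -0.40, "rmsd": 1.83},
    {"frame": 1, "z0": 0.10, "z1": -0.35, "rmsd": 1.61},
]

buf = io.StringIO()  # stands in for the CSV file on disk
writer = csv.DictWriter(buf, fieldnames=["frame", "z0", "z1", "rmsd"])
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()  # ready to load into Matplotlib/pandas
```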
N2P2 insists on using 3-body potentials even if there is only 1 atom of a given element in the training set. Potentially this is causing major problems in the calculations, with NaNs all over the place. So I am adding some structures with more oxygen atoms to see if that fixes it.
N2P2's scaling program will crash if there are any unbound atoms in the training set.
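A cheap pre-flight check along these lines could catch such structures before they reach N2P2's scaling program; the cutoff value and function name are placeholders:

```python
import math

def unbound_atoms(coords, cutoff=2.0):
    """Return indices of atoms that have no neighbour within `cutoff`
    (a hypothetical "unbound atom" screen; a real check would use
    per-element bonding distances rather than one global cutoff)."""
    out = []
    for i, a in enumerate(coords):
        if all(math.dist(a, b) > cutoff
               for j, b in enumerate(coords) if j != i):
            out.append(i)
    return out
```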
Initial experience suggests that in the N2P2 model we are going to be significantly impacted by implementation details. I am surprised by how porous the abstractions are. Examples of issues that need investigation are:
- the complexity of the descriptors (many Gaussian and trigonometric functions)
- the complexity of the activation functions (higher-order polynomials)
- the fact that the last activation functions are still linear functions, so no matter the complicated descriptors and the polynomial activation functions in the hidden layers, in the end you are still fitting a function with straight line segments!
- the convergence rate of the training:
  - 10 epochs is normally not enough, except when you update your model with every training point and have a massive training set
  - with small training sets (a couple of hundred points) the convergence of training seems very slow, with errors on the order of 10e+08 on the training set after 100 epochs
  - in addition, even with a small training set it takes 7 minutes per epoch no matter how many resources you throw at it
To Do:
- With DeePMD, LAMMPS produces a model_devi.out file comparing the results of different models to assess the model precision. With N2P2, LAMMPS uses a single model, so we need to implement this comparison ourselves. Part of the infrastructure is there now, but the actual comparison and the writing of the model_devi.out file still need to be implemented.
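The core of that comparison, the per-atom spread of predicted forces across the model ensemble, could be sketched as below. The statistics chosen (max/min/avg of the per-atom standard deviation) follow what model_devi.out reports for forces, but treat the exact column layout as an assumption:

```python
import math

def force_deviation_stats(forces_per_model):
    """Per-atom standard deviation of the force vector across an ensemble
    of models, reduced to (max, min, avg) over atoms.

    forces_per_model: [n_models][n_atoms][3] nested lists of force components.
    """
    n_models = len(forces_per_model)
    n_atoms = len(forces_per_model[0])
    devs = []
    for i in range(n_atoms):
        # mean force vector on atom i across the ensemble
        mean = [sum(m[i][k] for m in forces_per_model) / n_models
                for k in range(3)]
        # variance of the 3-vector deviation from that mean
        var = sum(sum((m[i][k] - mean[k]) ** 2 for k in range(3))
                  for m in forces_per_model) / n_models
        devs.append(math.sqrt(var))
    return max(devs), min(devs), sum(devs) / len(devs)
```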
The test case we are studying has only single bonds, so let's not confuse the neural network with data related to things it doesn't need to know about.
These calculations cannot be used with N2P2 because that code crashes if there are unbound atoms.
This pull request wants to add NWChem support to DeepDriveMD.
For this purpose a new directory `DeepDriveMD-pipeline/deepdrivemd/sim/nwchem` has been added. Initially this directory was a copy of `DeepDriveMD-pipeline/deepdrivemd/sim/openmm`. This pull request adds the following files:
- `nwchem.py` - contains input generators for the various NWChem calculations needed for what DeepDriveMD tries to accomplish
- `nwchem_test.py` - contains a Python script that executes the functionality of `nwchem.py` on the 1FME example

TO DO:
- [x] create `run_nwchem.py` from `run_openmm.py`
- [x] adapt `run_nwchem.py` to executing NWChem instead of OpenMM
- [x] adapt `config.py` to NWChem
- [ ] other