Skip to content

Conversation

@nbehrnd
Copy link
Contributor

@nbehrnd nbehrnd commented Apr 27, 2023

In previous commits, the assignment of bonds was based on a universal
distance threshold. By the present approach, the atom dictionary was
extended by tabulated data about covalent atomic radii and standard
deviations of these data which are used to decide if two atoms assumed
as neutral are close enough to form a bond. The individual decision
can be reported to the CLI by the optional -v, or --verbose flag.

This commit builds on top of an PR,[1] and addresses an issue.[2]

[1] #7
[2] #6

Test-wise addition for / about nitrogen.  Tested on the example of
pyridine by openbabel from the SMILES string (`c1ccncc1`), and
diethylamine (`CCNCC`).
Tested for the examples of diethylether (`CCOCC`), THF (`C1COCC1`),
and propan-2-ol (`C(C)(O)(C)`) in .xyz files generated by obabel.
Tested with openbabel generated .xyz about diethyl thioether
(`CCSCC`) and furfural (`O=Cc1cccs1`).  However, contrasting to
e.g., the depiction with openbabel generated .pov, the C-S bonds
no longer are visible.
Tested with openbabel SMILES string of fluoroethane (`CCF`).  The
.pov/.ini and subsequent generation of .png seem fine.
Introduction of chlorine.  Tested with an .xyz about chloroethane
(`CCCl`) generated by openbabel, the chlorine atom is depicted as
sphere, separated from the backbone of the molecule.  Because of
a similar observation for compounds with C-S bonds, and because the
analogue CCF was rendered well, perhaps the assignment where to
draw a bond faces an obstacle here.
The previous approach requires four lines each for every atom type
as an instance of class Atom.  I think an approach which uses the
element symbol as key to a the corresponding values of mass,
radiosity, and rgb coordinates in a dictionary is - in comparison
to this one - easier to read, and to maintain.  Lacking any better
inspiration, atoms with atom types not yet explicitly defined in
said dictionary default to hydrogen.  In such an instance, the
conversion will not stop for good, however is going to issue one
warning for each atom not fully defined (yet).
The previous definition of bonds if distances are less than
1.6 A is too simple.  For one, there are covalent bonds which
exceed this threshold (C-I).  For two, simple increase of this
limit to say 1.8 A can cause geminal hydrogens on methylene
units to be assigned a H-H bond in addition to an already
existing C-H bond.

This commit ventures out to add the property dictionary of the
atoms additional values of covalent radii, and the standard
deviations of these covalent radii.  Eventually, there shall be
a comparison between the distance as computed reading the .xyz
file, and a theoretical threshold then used to assign a bond,
or not.  For now, the implementation is after the 1.6 A limit
and only considers C and H as test subjects.  Hence, the test
is not yet as effective as wanted.
Though still incomplete, it is time to move the currently relevant
operations into a function.
Instead of a fix 1.6 Angstrom limit applicable on all bonds as
a maximal bond length, the check to build one bond, or not now
is based on comparing covalent radii and distances inferred
from the xyz file read.
In previous versions of the script, there was the uniform
threshold of interatomic distances to check if two atoms
could form a bond (any kind of covalent bond).  With 1.6 A,
this was fine for C-C, and C-H, but too short e.g. for
C-S, or C-I.  An increase to 1.8 A rendered the display of
the longer bonds possible; however, simultaneously built
H-H bonds for geminal hydrogens on alkyl groups.

To test the new approach based on comparison of covalence
radii, benzene.xyz now is complemented by pentylbenzene.xyz
to extend the coverage of tests possible.
The test with pentylbenzene.xyz is encouraging (no spooky
H-H bonds for geminal H on alkyl chains), and the greeter
from the test function can be removed.
The coverage of the script's dictionary and covalent radii
is extended, added are N, O, and S.
The coverage of atomic radii is extended -- two periods
H to Ar, plus the more frequently encountered Br and I.
More as a proof of 'a priori, the concept is working',
than backed by chemical evidence for such a compound,
an example with C, H, N, O, P, S, and I is added as a
colorful test model.  Coverage should be enough for many
simpler molecular compounds.

Lines of the chemical dictionary incomplete for covalent
radii are currently muted which doesn't affect the output of
the other elements.
The atom radii are covered all between number 1 (hydrogen,
H) and 96 (Curium, Cm).  This likely covers most cases of
application of this script in mind.
The sequence of presenting the covalent radii, their standard
deviation; the results of subsequent computation of a threshold
to decide if a covalent bond could form (based on atoms assumed
as uncharged) was changed.  Explicit format definitions on the
fixed precision.
The user now has to press -v (or --verbose) to obtain a detailed
log to the CLI.
@nbehrnd
Copy link
Contributor Author

nbehrnd commented Apr 27, 2023

To ease a review, individual commits are retained separate. I'm then can squash-merge some of them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant