Output Molecular Weight for a Multi-Molecule SDF File

Let’s say we want to print out the molecular weight of every molecule in an SD file. Why? Well, we might want to plot a histogram of the distribution, or test whether its mean differs significantly (in the statistical sense) from that of another SD file.


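Using the openbabel module directly:

from openbabel import openbabel as ob

obconversion = ob.OBConversion()
obconversion.SetInFormat("sdf")  # set the input format explicitly
obmol = ob.OBMol()

# ReadFile opens the file and reads the first molecule; Read fetches the next
notatend = obconversion.ReadFile(obmol, "../xsaa.sdf")
while notatend:
    print(obmol.GetMolWt())
    obmol = ob.OBMol()  # fresh molecule object for the next record
    notatend = obconversion.Read(obmol)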


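The same using Pybel:

from openbabel import pybel

# readfile yields one pybel.Molecule per record in the file
for molecule in pybel.readfile("sdf", "../xsaa.sdf"):
    print(molecule.molwt)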
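
The statistical comparison mentioned at the start of this section could be sketched as follows. This is only one possible approach: it uses Welch’s t statistic, which is an assumption here and not something the example above prescribes. The helper names `welch_t` and `read_weights` are hypothetical, and each sample needs at least two molecules with some spread in weight.

```python
import math
import statistics


def welch_t(weights_a, weights_b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    mean_a, mean_b = statistics.mean(weights_a), statistics.mean(weights_b)
    # statistics.variance is the sample variance (n - 1 in the denominator)
    standard_error = math.sqrt(
        statistics.variance(weights_a) / len(weights_a)
        + statistics.variance(weights_b) / len(weights_b)
    )
    return (mean_a - mean_b) / standard_error


def read_weights(path):
    """Collect molecular weights from an SD file (needs the Open Babel bindings)."""
    from openbabel import pybel  # imported lazily so welch_t works without it
    return [molecule.molwt for molecule in pybel.readfile("sdf", path)]
```

A call such as `welch_t(read_weights("../xsaa.sdf"), read_weights("other.sdf"))`, where `other.sdf` stands for a second file of your own, returns the statistic only. Turning it into a p-value also needs the degrees of freedom, for which a statistics package is the more practical route.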

Find information on all of the atoms and bonds connected to a particular atom

First of all, look at all of the classes in the Open Babel API that end with “Iter”. You should use these whenever you need to do something like iterate over all of the atoms or bonds connected to a particular atom, iterate over all the atoms in a molecule, iterate over all of the residues in a protein, and so on.
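
As a small sketch of the whole-molecule case, OBMolAtomIter visits every atom of an OBMol in turn. The helper names below are hypothetical; `element_counts` accepts any iterable of objects with a `GetAtomicNum()` method, so it can be fed directly from an iterator.

```python
from collections import Counter


def element_counts(atoms):
    """Tally atomic numbers over any iterable of OBAtom-like objects."""
    return Counter(atom.GetAtomicNum() for atom in atoms)


def element_counts_for_smiles(smiles):
    """Count elements in a molecule built from SMILES (needs Open Babel)."""
    from openbabel import openbabel as ob, pybel  # imported lazily
    obmol = pybel.readstring("smi", smiles).OBMol
    # OBMolAtomIter yields each OBAtom of the molecule
    return element_counts(ob.OBMolAtomIter(obmol))
```

Keeping the tally separate from the iterator means the same function works for other sources of atoms, such as the neighbours of a single atom.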

As an example, let’s say we want to find information on all of the bond orders and atoms connected to a particular OBAtom called ‘obatom’. The idea is that we iterate over the neighbouring atoms using OBAtomAtomIter, and then find the bond between the neighbouring atom and ‘obatom’. Alternatively, we could have iterated over the bonds (OBAtomBondIter), but we would need to look at the indices of the two atoms at the ends of the bond to find out which is the neighbouring atom:

for neighbour_atom in ob.OBAtomAtomIter(obatom):
    print(neighbour_atom.GetAtomicNum())
    bond = obatom.GetBond(neighbour_atom)
    print(bond.GetBondOrder())
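
The bond-based alternative described above might look like the following sketch. The function name `neighbours_from_bonds` is hypothetical; it compares the begin and end atom indices of each bond with the index of the atom of interest to work out which end is the neighbour.

```python
def neighbours_from_bonds(bonds, atom_idx):
    """Return (neighbour index, bond order) pairs for the atom at atom_idx.

    `bonds` is any iterable of OBBond-like objects, for example
    ob.OBAtomBondIter(obatom); `atom_idx` is obatom.GetIdx().
    """
    result = []
    for bond in bonds:
        begin, end = bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()
        # whichever end is not our atom is the neighbour
        neighbour_idx = end if begin == atom_idx else begin
        result.append((neighbour_idx, bond.GetBondOrder()))
    return result
```

With Open Babel available, it would be called as `neighbours_from_bonds(ob.OBAtomBondIter(obatom), obatom.GetIdx())`. OBBond also offers `GetNbrAtom(obatom)`, which returns the atom at the other end directly and avoids the index comparison.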

Examples from around the web

Split an SDF file using the molecule titles

The following was a request on the CCL.net list:

Hi all, Does anyone have a script to split an SDF file into single sdfs named after each individual molecule as specified in first line of parent multi file?

The solution is simple…

from openbabel import pybel

for mol in pybel.readfile("sdf", "bigmol.sdf"):
    mol.write("sdf", "%s.sdf" % mol.title)
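
The short version assumes every title is non-empty, unique, and safe to use as a file name. A slightly more defensive sketch is given below; the helper names `title_to_filename` and `split_sdf` and the fallback naming scheme are assumptions made for illustration, not part of the original answer.

```python
import re


def title_to_filename(title, index, used):
    """Turn a molecule title into a unique, file-system-friendly .sdf name."""
    # replace runs of anything other than letters, digits, '.', '_' or '-'
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", title.strip()).strip("._")
    if not stem:
        stem = "molecule_%d" % index  # fallback for empty titles
    candidate = stem
    suffix = 1
    while candidate in used:  # avoid clobbering an earlier molecule
        suffix += 1
        candidate = "%s_%d" % (stem, suffix)
    used.add(candidate)
    return candidate + ".sdf"


def split_sdf(path):
    """Write each molecule in `path` to its own file (needs Open Babel)."""
    from openbabel import pybel  # imported lazily
    used = set()
    for index, mol in enumerate(pybel.readfile("sdf", path), start=1):
        mol.write("sdf", title_to_filename(mol.title, index, used))
```

Note that `mol.write` refuses by default to overwrite a file that already exists, so rerunning the script in the same directory needs either a clean-up first or `overwrite=True`.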

An implementation of RECAP

TJ O’Donnell (of gNova) has written an implementation of the RECAP fragmentation algorithm in 130 lines of Python. The code is at [1].

TJ’s book, “Design and Use of Relational Databases in Chemistry”, also contains examples of Python code using Open Babel to create and query molecular databases (see for example the link to Open Babel code in the Appendix).