Pybel#
Pybel provides convenience functions and classes that make it
simpler to use the Open Babel libraries from Python, especially for
file input/output and for accessing the attributes of atoms and
molecules. The Atom and Molecule classes used by Pybel can be
converted to and from the OBAtom and OBMol used by the
openbabel
module. These features are discussed in more detail
below.
The rationale and technical details behind Pybel are described in O’Boyle et al [omh2008]. To support further development of Pybel, please cite this paper if you use Pybel to obtain results for publication.
Information on the Pybel API can be found at the interactive Python
prompt using the help()
function. The full API is also listed in
the next section (see Pybel API).
To use Pybel, use from openbabel import pybel
.
Atoms and Molecules#
A
Molecule
can be created in any of three ways:
From an :obapi:`OBMol`, using
Molecule(myOBMol)
By reading from a file (see Input/Output below)
By reading from a string (see Input/Output below)
An Atom
be created in two different ways:
From an :obapi:`OBAtom`, using
Atom(myOBAtom)
By accessing the
atoms
attribute of aMolecule
Molecules have the following attributes: atoms
, charge
, data
, dim
,
energy
, exactmass
, formula
, molwt
, spin
, sssr
, title
and unitcell
(if crystal data). The atoms
attribute provides a
list of the Atoms in a Molecule. The data
attribute returns a
dictionary-like object for accessing and editing the data fields
associated with the molecule (technically, it’s a
MoleculeData
object, but you can use it like it’s a regular dictionary). The
unitcell
attribute gives access to any unit cell data
associated with the molecule (see
:obapi:`OBUnitCell`).
The remaining attributes correspond directly to attributes of
OBMols: e.g. formula
is equivalent to
:obapi:`OBMol::GetFormula() <OpenBabel::OBMol::GetFormula>`. For more information on what these
attributes are, please see the Open Babel C++ documentation for
:obapi:`OBMol`.
For example, let’s suppose we have an SD file containing descriptor values in the data fields:
>>> mol = next(readfile("sdf", "calculatedprops.sdf")) # (readfile is described below)
>>> print(mol.molwt)
100.1
>>> print(len(mol.atoms))
16
>>> print(mol.data.keys())
{'Comment': 'Created by CDK', 'NSC': 1, 'Hydrogen Bond Donors': 3,
'Surface Area': 342.43, .... }
>>> print(mol.data['Hydrogen Bond Donors'])
3
>>> mol.data['Random Value'] = random.randint(0,1000) # Add a descriptor containing noise
Molecules have a write()
method that writes a representation of a Molecule to a file or to a
string. See Input/Output below. They also have
a calcfp()
method that calculates a molecular fingerprint. See Fingerprints
below.
The draw()
method of a Molecule generates 2D coordinates and a 2D depiction of
a molecule. It uses the
OASA library by Beda
Kosata to do this. The default
options are to show the image on the screen (show=True
), not to
write to a file (filename=None
), to calculate 2D coordinates
(usecoords=False
) but not to store them (update=False
).
The addh()
and removeh()
methods allow hydrogens to be added and removed.
If a molecule does not have 3D coordinates, they can be generated
using the make3D()
method. By default, this includes 50 steps of a geometry
optimisation using the MMFF94 forcefield. The list of available
forcefields is stored in the
forcefields
variable. To further optimise the structure, you can use the
localopt()
method, which by default carries out 500 steps of an optimisation
using MMFF94. Note that hydrogens need to be added before calling
localopt()
.
The calcdesc()
method of a Molecule returns a dictionary containing descriptor
values for LogP, Polar Surface Area (“TPSA”) and Molar Refractivity
(“MR”). A list of the available descriptors is contained in the
variable descs
.
If only one or two descriptor values are required, you can specify
the names as follows: calcdesc(["LogP", "TPSA"])
. Since the
data
attribute of a Molecule is also a dictionary, you can
easily add the result of calcdesc()
to an SD file (for example)
as follows:
mol = next(readfile("sdf", "without_desc.sdf"))
descvalues = mol.calcdesc()
# In Python, the update method of a dictionary allows you
# to add the contents of one dictionary to another
mol.data.update(descvalues)
output = Outputfile("sdf", "with_desc.sdf")
output.write(mol)
output.close()
For convenience, a Molecule provides an iterator over its Atoms. This is used as follows:
for atom in myMolecule:
# do something with atom
Atoms have the following attributes: atomicmass
, atomicnum
,
coords
, exactmass
, formalcharge
, heavyvalence
,
heterovalence
, hyb
, idx
, implicitvalence
, isotope
,
partialcharge
, spin
, type
, valence
, vector
. The .coords
attribute provides a tuple (x, y, z) of the atom’s coordinates. The
remaining attributes are as for the Get methods of
:obapi:`OBAtom`.
Input/Output#
One of the strengths of Open Babel is the number of chemical file
formats that it can handle (see Supported File Formats and Options). Pybel provides a dictionary of the
input and output formats in the variables informats
and outformats
where the keys are the three-letter codes for each format (e.g.
pdb
) and the values are the descriptions (e.g. Protein Data Bank
format
).
Pybel greatly simplifies the process of reading and writing molecules to and from strings or files. There are two functions for reading Molecules:
readstring()
reads a Molecule from a stringreadfile()
provides an iterator over the Molecules in a file
Here are some examples of their use. Note in particular the use of
next()
to access the first (and possibly only) molecule in a
file:
>>> mymol = readstring("smi", "CCCC")
>>> print(mymol.molwt)
58
>>> for mymol in readfile("sdf", "largeSDfile.sdf")
... print(mymol.molwt)
>>> singlemol = next(readfile("pdb", "1CRN.pdb"))
If a single molecule is to be written to a molecule or string, the
write()
method of the Molecule should be used:
mymol.write(format)
returns a stringmymol.write(format, filename)
writes the Molecule to a file. An optional additional parameter,overwrite
, should be set toTrue
if you wish to overwrite an existing file.
For files containing multiple molecules, the
Outputfile
class should be used instead. This is initialised with a format and
filename (and optional overwrite
parameter). To write a
Molecule to the file, the
write()
method of the Outputfile is called with the Molecule as a
parameter. When all molecules have been written, the
close()
method of the Outputfile should be called.
Here are some examples of output using the Pybel methods and classes:
>>> print(mymol.write("smi"))
'CCCC'
>>> mymol.write("smi", "outputfile.txt")
>>> largeSDfile = Outputfile("sdf", "multipleSD.sdf")
>>> largeSDfile.write(mymol)
>>> largeSDfile.write(myothermol)
>>> largeSDfile.close()
Fingerprints#
A Fingerprint
can be created in either of two ways:
From a vector returned by the OpenBabel GetFingerprint() method, using
Fingerprint(myvector)
By calling the
calcfp()
method of a Molecule
The calcfp()
method takes an optional argument, fptype
,
which should be one of the fingerprint types supported by OpenBabel
(see Molecular fingerprints and similarity searching). The
list of supported fingerprints is stored in the variable
fps
.
If unspecified, the default fingerprint (FP2
) is calculated.
Once created, the Fingerprint has two attributes: fp
gives the
original OpenBabel vector corresponding to the fingerprint, and
bits
gives a list of the bits that are set.
The Tanimoto coefficient of two Fingerprints can be calculated
using the |
operator.
Here is an example of its use:
>>> from openbabel import pybel
>>> smiles = ['CCCC', 'CCCN']
>>> mols = [pybel.readstring("smi", x) for x in smiles] # Create a list of two molecules
>>> fps = [x.calcfp() for x in mols] # Calculate their fingerprints
>>> print(fps[0].bits, fps[1].bits)
[261, 385, 671] [83, 261, 349, 671, 907]
>>> print(fps[0] | fps[1]) # Print the Tanimoto coefficient
0.3333
SMARTS matching#
Pybel also provides a simplified API to the Open Babel SMARTS
pattern matcher. A
Smarts
object is created, and the
findall()
method is then used to return a list of the matches to a given
Molecule.
Here is an example of its use:
>>> mol = readstring("smi","CCN(CC)CC") # triethylamine
>>> smarts = Smarts("[#6][#6]") # Matches an ethyl group
>>> print(smarts.findall(mol))
[(1, 2), (4, 5), (6, 7)]
Combining Pybel with openbabel.py
#
It is easy to combine the ease of use of Pybel with the
comprehensive coverage of the Open Babel toolkit that
openbabel.py
provides. Pybel is really a wrapper around
openbabel.py
, with the result that the OBAtom and OBMol used by
openbabel.py
can be interconverted to the Atom and Molecule used by
Pybel.
The following example shows how to read a molecule from a PDB file
using Pybel, and then how to use openbabel.py
to add hydrogens. It
also illustrates how to find out information on what methods and
classes are available, while at the interactive Python prompt.
>>> from openbabel import pybel
>>> mol = next(pybel.readfile("pdb", "1PYB"))
>>> help(mol)
Help on Molecule in module pybel object:
...
| Attributes:
| atoms, charge, dim, energy, exactmass, flags, formula,
| mod, molwt, spin, sssr, title.
...
| The original Open Babel molecule can be accessed using the attribute:
| OBMol
...
>>> print(len(mol.atoms), mol.molwt)
3430 49315.2
>>> dir(mol.OBMol) # Show the list of methods provided by openbabel.py
['AddAtom', 'AddBond', 'AddConformer', 'AddHydrogens', 'AddPolarHydrogens', ... ]
>>> mol.OBMol.AddHydrogens()
>>> print(len(mol.atoms), mol.molwt)
7244 49406.0
The next example is an extension of one of the openbabel.py
examples at the top of this page. It shows how a molecule could be
created using openbabel.py
, and then written to a file using
Pybel:
from openbabel import openbabel, pybel
mol = openbabel.OBMol()
a = mol.NewAtom()
a.SetAtomicNum(6) # carbon atom
a.SetVector(0.0, 1.0, 2.0) # coordinates
b = mol.NewAtom()
mol.AddBond(1, 2, 1) # atoms indexed from 1
pybelmol = pybel.Molecule(mol)
pybelmol.write("sdf", "outputfile.sdf")