Difference between revisions of "Python"
Baoilleach (Talk | contribs) m (Python 1.2) |
Baoilleach (Talk | contribs) m |
||
Line 167: | Line 167: | ||
</pre> | </pre> | ||
− | Atoms have the following attributes: atomicmass, atomicnum, cidx, coords, coordidx, exactmass, formatcharge, heavyvalence, heterovalence, hyb, idx, implicitvalence, index, isotope, partialcharge, spin, type, valence, vector. The <code>.coords</code> attribute provides a tuple (x, y, z) of the atom's coordinates. The remaining attributes are as for OBAtom | + | Atoms have the following attributes: atomicmass, atomicnum, cidx, coords, coordidx, exactmass, formatcharge, heavyvalence, heterovalence, hyb, idx, implicitvalence, index, isotope, partialcharge, spin, type, valence, vector. The <code>.coords</code> attribute provides a tuple (x, y, z) of the atom's coordinates. The remaining attributes are as for [http://openbabel.sourceforge.net/api/2.1.0/classOpenBabel_1_1OBAtom.shtml OBAtom]. |
==== Input/Output ==== | ==== Input/Output ==== |
Revision as of 00:01, 1 June 2007
Contents
Accessing Open Babel with Python
There are two ways to access the Open Babel library using Python:
- The
openbabel
module, a direct binding of the Open Babel C++ library, created using the SWIG package. -
Pybel
, a set of convenience functions and classes that simplifies access to theopenbabel
module. (Only available with OpenBabel >= 2.0.3)
For more examples of using Open Babel from Python, see the developer Python tutorial. If you have any questions or find any problems, please send an email to the openbabel-scripting mailing list.
The openbabel module
The openbabel
module provides direct access to the C++ Open Babel library from Python. This binding is generated using the SWIG package and provides access to almost all of the Open Babel interfaces via Python, including the base classes OBMol, OBAtom, OBBond, and OBResidue, as well as the conversion framework OBConversion. As such, essentially any call in the C++ API is available to Python scripts with very little difference in syntax. As a result, the principal documentation is the Open Babel C++ API documentation.
Examples
Here we give some examples of common Python syntax for the openbabel
module and pointers to the appropriate sections of the API documentation.
The example script below creates atoms and bonds one-by-one using the OBMol, OBAtom, and OBBond classes.
import openbabel mol = openbabel.OBMol() print 'Should print 0 (atoms)' print mol.NumAtoms() a = mol.NewAtom() a.SetAtomicNum(6) # carbon atom a.SetVector(0.0, 1.0, 2.0) # coordinates b = mol.NewAtom() mol.AddBond(1, 2, 1) # atoms indexed from 1 print 'Should print 2 (atoms)' print mol.NumAtoms() print 'Should print 1 (bond)' print mol.NumBonds() mol.Clear();
More commonly, Open Babel can be used to read in molecules using the OBConversion framework. The following script reads in molecular information (a SMI file) from a string, adds hydrogens, and writes out an MDL file as a string.
import openbabel obConversion = openbabel.OBConversion() obConversion.SetInAndOutFormats("smi", "mdl") mol = openbabel.OBMol() obConversion.ReadString(mol, "C1=CC=CS1") print 'Should print 5 (atoms)' print mol.NumAtoms() mol.AddHydrogens() print 'Should print 9 (atoms) after adding hydrogens' print mol.NumAtoms() outMDL = obConversion.WriteString(mol)
The following script writes out a file using a filename, rather than reading and writing to a Python string.
import openbabel obConversion = openbabel.OBConversion() obConversion.SetInAndOutFormats("pdb", "mol2") mol = openbabel.OBMol() obConversion.ReadFile(mol, "1ABC.pdb.gz") # Open Babel will uncompress automatically mol.AddHydrogens() print mol.NumAtoms() print mol.NumBonds() print mol.NumResidues() obConversion.WriteFile(mol, '1abc.mol2')
That's it! There's more information on particular calls in the library API. Feel free to address questions to the openbabel-scripting mailing list.
Using iterators
A number of Open Babel toolkit classes provide iterators over various objects; these classes are identifiable by the suffix "Iter" in the list of toolkit classes in the API. The OBMolAtomIter class, for example, makes it easy to iterate over the atoms in a molecule. These iterator classes can be used using the typical Python syntax for iterators:
for atom in openbabel.OBMolAtomIter(mymol): print atom.GetAtomicMass()
Calling a method requiring an array of C doubles
Some Open Babel toolkit methods, for example OBMol::Rotate(), require an array of doubles. It's not possible to directly use a list of floats when calling such a function from Python. Instead, you need to first explicitly create a C array using the double_array() function:
obMol.Rotate([1.0, -54.7, 3]) # Error! myarray = openbabel.double_array([1.0, -54.7, 3]) obMol.Rotate(myarray) # Works!
Combining numpy with Open Babel
If you are using the Python numerical extension, numpy, and you try to pass values from a numpy array to Open Babel, it may not work unless you convert the values to Python built-in types first:
import numpy, openbabel mol = openbabel.OBMol() atom = mol.NewAtom() coord = numpy.array([1.2, 2.3, 4.6], "float32") atom.SetVector(coord[0], coord[1], coord[2]) # Error atom.SetVector(float(coord[0]), float(coord[1]), float(coord[2])) # No error coord = numpy.array([1.2, 2.3, 4.6], "float64") atom.SetVector(coord[0], coord[1], coord[2]) # No error either - not all numpy arrays will cause an error
Pybel
Pybel provides convenience functions and classes that make it simpler to use the Open Babel libraries from Python, especially for file input/output and for accessing the attributes of atoms and molecules. The Atom and Molecule classes used by Pybel can be converted to and from the OBAtom and OBMol used by the openbabel
module. These features are discussed in more detail below.
Information on the Pybel API can be found at the interactive Python prompt using the help()
function, and is also available here: Pybel API.
To use Pybel, use "import pybel
" or "from pybel import *
".
Atoms and Molecules
A Molecule can be created in any of four ways:
- From an OBMol, using
Molecule(myOBMol)
- By reading from a file (see Input/Output below)
- By reading from a string (see Input/Output below)
- An empty Molecule can be created using
Molecule()
An Atom can be created in three different ways:
- From an OBAtom, using
Atom(myOBAtom)
- By accessing the
.atoms
attribute of a Molecule - An empty Atom can be created using
Atom()
It is always possible to access the OBMol or OBAtom on which a Molecule or Atom is based, by accessing the appropriate attribute, either .OBMol
or .OBAtom
. In this way, it is easy to combine the convenience of pybel
with the many additional capabilities present in openbabel
. See Combining Pybel with openbabel.py below.
Molecules have the following attributes: atoms, charge, data, dim, energy, exactmass, flags, formula, mod, molwt, spin, sssr, title and unitcell (if crystal data). The .atoms
attribute provides a list of the Atoms in a Molecule. The .data
attribute returns a dictionary of the data fields associated with the molecule, and can be accessed and edited like a normal dictionary. The .unitcell
attribute gives access to any unit cell data associated with the molecule (see OBUnitCell). The remaining attributes correspond directly to attributes of OBMols: e.g. Molecule.formula
is equivalent to OBMol.GetFormula()
. For more information on what these attributes are, please see the Open Babel Library documentation for OBMol.
For example, let's suppose we have an SD file containing descriptor values in the data fields:
>>> mol = readfile("sdf", "calculatedprops.sdf").next() # (readfile is described below) >>> print mol.molwt 100.1 >>> print len(mol.atoms) 16 >>> print mol.data.keys() {'Comment': 'Created by CDK', 'NSC': 1, 'Hydrogen Bond Donors': 3, 'Surface Area': 342.43, .... } >>> print mol.data['Hydrogen Bond Donors'] 3 >>> mol.data['Random Value'] = random.randint(0,1000) # Add a descriptor containing noise
Molecules have a .write()
method that writes a representation of a Molecule to a file or to a string. See Input/Output below. They also have a .calcfp()
method that calculates a molecular fingerprint. See Fingerprints below.
For convenience, a Molecule provides an iterator over its Atoms. This is used as follows:
for atom in myMolecule: # do something with atom
Atoms have the following attributes: atomicmass, atomicnum, cidx, coords, coordidx, exactmass, formatcharge, heavyvalence, heterovalence, hyb, idx, implicitvalence, index, isotope, partialcharge, spin, type, valence, vector. The .coords
attribute provides a tuple (x, y, z) of the atom's coordinates. The remaining attributes are as for OBAtom.
Input/Output
One of the strengths of Open Babel is the number of chemical file formats that it can handle. Pybel provides a dictionary of the input and output formats in the variables informats and outformats, where the keys are the three-letter codes for each format (e.g. 'pdb') and the values are the descriptions (e.g. 'Protein Data Bank format').
Pybel greatly simplifies the process of reading and writing molecules to and from strings or files. There are two functions for reading Molecules:
-
readstring(format, string)
reads a Molecule from a string -
readfile(format, filename)
provides an iterator over the Molecules in a file
Here are some examples of their use. Note in particular the use of .next()
to access the first (and possibly only) molecule in a file:
>>> mymol = readstring("smi", "CCCC") >>> print mymol.molwt 45 >>> for mymol in readfile("sdf", "largeSDfile.sdf") ... print mymol.molwt >>> singlemol = readfile("pdb", "1CRN.pdb").next()
If a single molecule is to be written to a molecule or string, the .write()
method of the Molecule should be used:
-
mymol.write(format)
returns a string -
mymol.write(format, filename)
writes the Molecule to a file. An optional additional parameter,overwrite
, should be set toTrue
if you wish to overwrite an existing file.
For files containing multiple molecules, the Outputfile class should be used instead. This is initialised with a format and filename (and optional overwrite
parameter). To write a Molecule to the file, the .write()
method of the Outputfile is called with the Molecule as a parameter. When all molecules have been written, the .close()
method of the Outputfile should be called.
Here are some examples of output using the Pybel methods and classes:
>>> print mymol.write("smi") 'CCCC' >>> mymol.write("smi", "outputfile.txt") >>> largeSDfile = Outputfile("sdf", "multipleSD.sdf") >>> largeSDfile.write(mymol) >>> largeSDfile.write(myothermol) >>> largeSDfile.close()
Fingerprints
A Fingerprint can be created in either of two ways:
- From a vector returned by the OpenBabel GetFingerprint() method, using
Fingerprint(myvector)
- By calling the
.calcfp()
method of a Molecule
The .calcfp()
method takes an optional argument, fptype
, which should be one of the fingerprint types supported by OpenBabel (see Tutorial:Fingerprints). If unspecified, the default fingerprint ("FP2") is calculated.
Once created, the Fingerprint has two attributes: fp
gives the original OpenBabel vector corresponding to the fingerprint, and bits
gives a list of the bits that are set.
The Tanimoto coefficient of two Fingerprints can be calculated using the "|" operator.
Here is an example of its use:
>>> import pybel >>> smiles = ['CCCC', 'CCCN'] >>> mols = [pybel.readstring("smi", x) for x in smiles] # Create two molecules from the SMILES >>> fps = [x.calcfp() for x in mols] # Calculate their fingerprints >>> print fps[0].bits, fps[1].bits [261, 385, 671] [83, 261, 349, 671, 907] >>> print fps[0] | fps[1] # Print the Tanimoto coefficient 0.3333
SMARTS matching
Pybel also provides a simplified API to the Open Babel SMARTS pattern matcher. A Smarts object is created, and the .findall()
method is then used to return a list of the matches to a given Molecule.
Here is an example of its use:
>>> mol = readstring("smi","CCN(CC)CC") # triethylamine >>> smarts = Smarts("[#6][#6]") # Matches an ethyl group >>> print smarts.findall(mol) [(1, 2), (4, 5), (6, 7)]
Combining Pybel with openbabel.py
It is easy to combine the ease of use of Pybel, with the comprehensive coverage of the Open Babel toolkit that openbabel.py
provides. Pybel is really a wrapper around openbabel.py
, with the result that the OBAtom and OBMol used by openbabel.py can be interconverted to the Atom and Molecule used by Pybel.
The following example shows how to read a molecule from a PDB file using Pybel, and then how to use openbabel.py to add hydrogens. It also illustrates how to find out information on what methods and classes are available, while at the interactive Python prompt.
>>> import pybel >>> mol = pybel.readfile("pdb", "1PYB").next() >>> help(mol) Help on Molecule in module pybel object: ... | Attributes: | atoms, charge, dim, energy, exactmass, flags, formula, | mod, molwt, spin, sssr, title. ... | The original Open Babel molecule can be accessed using the attribute: | OBMol ... >>> print len(mol.atoms), mol.molwt 3430 49315.2 >>> dir(mol.OBMol) # Show the list of methods provided by openbabel.py ['AddAtom', 'AddBond', 'AddConformer', 'AddHydrogens', 'AddPolarHydrogens', ... ] >>> mol.OBMol.AddHydrogens() >>> print len(mol.atoms), mol.molwt 7244 49406.0
The next example is an extension of one of the openbabel.py
examples at the top of this page. It shows how a molecule could be created using openbabel.py
, and then written to a file using Pybel:
import openbabel, pybel mol = openbabel.OBMol() a = mol.NewAtom() a.SetAtomicNum(6) # carbon atom a.SetVector(0.0, 1.0, 2.0) # coordinates b = mol.NewAtom() mol.AddBond(1, 2, 1) # atoms indexed from 1 pybelmol = pybel.Molecule(mol) pybelmol.write("sdf", "outputfile.sdf")