OBFingerprint Class Reference

The base class for fingerprints. More...

#include <openbabel/fingerprint.h>

Public Types

enum  FptFlag { FPT_UNIQUEBITS = 1 }
typedef std::map< const char *, OBPlugin *, CharPtrLess > PluginMapType
typedef PluginMapType::const_iterator PluginIterator

Public Member Functions

virtual ~OBFingerprint ()
void SetBit (std::vector< unsigned int > &vec, const unsigned int n)
bool GetBit (const std::vector< unsigned int > &vec, const unsigned int n)
void Fold (std::vector< unsigned int > &vec, unsigned int nbits)
virtual bool GetFingerprint (OBBase *pOb, std::vector< unsigned int > &fp, int nbits=0)=0
virtual unsigned int Flags ()
virtual std::string DescribeBits (const std::vector< unsigned int > fp, bool bSet=true)
virtual const char * Description ()
virtual bool Display (std::string &txt, const char *param, const char *ID=NULL)
virtual OBPlugin * MakeInstance (const std::vector< std::string > &)
const char * GetID () const
virtual PluginMapType & GetMap () const =0

Static Public Member Functions

static double Tanimoto (const std::vector< unsigned int > &vec1, const std::vector< unsigned int > &vec2)
static double Tanimoto (const std::vector< unsigned int > &vec1, const unsigned int *p2)
static unsigned int Getbitsperint ()
static OBFingerprint * FindFingerprint (const char *ID)
static OBPlugin * GetPlugin (const char *Type, const char *ID)
static bool ListAsVector (const char *PluginID, const char *param, std::vector< std::string > &vlist)
static void List (const char *PluginID, const char *param=NULL, std::ostream *os=&std::cout)
static std::string ListAsString (const char *PluginID, const char *param=NULL)
static std::string FirstLine (const char *txt)
static PluginIterator Begin (const char *PluginID)
static PluginIterator End (const char *PluginID)

Static Protected Member Functions

static PluginMapType & PluginMap ()
static PluginMapType & GetTypeMap (const char *PluginID)
static OBPlugin * BaseFindType (PluginMapType &Map, const char *ID)

Protected Attributes

const char * _id

Classes

struct  bit_or
 Function object to set bits.


Detailed Description

The base class for fingerprints.

These fingerprints are a condensed representation of molecules (or other objects) as a list of boolean values (actually bits in a vector<unsigned int>) whose length is a power of 2. The main motivation is fast searching of data sources containing large numbers of molecules (up to several million). Open Babel provides some routines which can search text files containing lists of molecules in any format. See the documentation on the class FastSearch.

There are descriptions of molecular fingerprints at
http://www.daylight.com/dayhtml/doc/theory/theory.finger.html and
http://www.mesaac.com/Fingerprint.htm
Many methods of preparing fingerprints have been described, but the type supported currently in OpenBabel has each bit representing a substructure (or other molecular property). If a substructure is present in the molecule, then a particular bit is set to 1. But because the hashing method may also map other substructures to the same bit, a match does not guarantee that a particular substructure is present; there may be false positives. However, with proper design, a large fraction of irrelevant molecules in a data set can be eliminated in a fast search with boolean methods on the fingerprints. It then becomes feasible to make a definitive substructure search by conventional methods on this reduced list even if it is slow.
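
The screening idea described above can be sketched in a few lines. This is a minimal illustration and not Open Babel code: makeFingerprint, mayContain and the use of std::hash on substructure labels are all assumptions made for the example, and a std::vector<bool> stands in for the packed vector<unsigned int>.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: build a fingerprint of nbits bits from a list of
// substructure labels by hashing each label to a bit position.
// std::hash stands in for whatever hashing a real fingerprint type uses.
std::vector<bool> makeFingerprint(const std::vector<std::string>& fragments,
                                  std::size_t nbits)
{
    assert(nbits > 0);
    std::vector<bool> fp(nbits, false);
    for (const std::string& frag : fragments)
        fp[std::hash<std::string>{}(frag) % nbits] = true;
    return fp;
}

// Screen: the target can contain the query only if every bit set in the
// query is also set in the target. A true result means "maybe" (bit
// collisions give false positives); a false result is definitive.
bool mayContain(const std::vector<bool>& query, const std::vector<bool>& target)
{
    if (query.size() != target.size())
        return false;
    for (std::size_t i = 0; i < query.size(); ++i)
        if (query[i] && !target[i])
            return false;
    return true;
}
```

Only the molecules for which mayContain returns true would need to go on to the slower, definitive substructure search.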

OpenBabel provides a framework for applying new types of fingerprints without changing any existing code. They are derived from OBFingerprint and the source file is just compiled with the rest of OpenBabel. Alternatively, they can be separately compiled as a DLL or shared library and discovered when OpenBabel runs.

For more on these specific implementations of fingerprints in Open Babel, please take a look at the developer's wiki: http://openbabel.org/wiki/Fingerprints

Fingerprints derived from this abstract base class OBFingerprint can be for any object derived from OBBase (not just for OBMol). Each derived class provides an ID as a string, and OBFingerprint keeps a map of these IDs so that it can provide a pointer to the class when requested in FindFingerprint.
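
The ID-to-instance map can be pictured with the following self-registering sketch. It is not the Open Babel implementation: FingerprintSketch, CharPtrLessSketch and Find are hypothetical names, and the comparator body is an assumption based only on the CharPtrLess name in the typedefs.

```cpp
#include <cassert>
#include <cstring>
#include <map>

// Orders C-string keys by content rather than by pointer value.
struct CharPtrLessSketch {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) < 0;
    }
};

class FingerprintSketch {
public:
    // Each instance records itself in the map under its ID.
    explicit FingerprintSketch(const char* id) : _id(id) { registry()[id] = this; }
    virtual ~FingerprintSketch() { registry().erase(_id); }

    const char* GetID() const { return _id; }

    // Returns the registered instance, or a null pointer if the ID is unknown.
    static FingerprintSketch* Find(const char* id) {
        assert(id != nullptr);
        RegistryMap::const_iterator it = registry().find(id);
        return it == registry().end() ? nullptr : it->second;
    }

private:
    typedef std::map<const char*, FingerprintSketch*, CharPtrLessSketch> RegistryMap;
    static RegistryMap& registry() {
        static RegistryMap m;  // created on first use
        return m;
    }
    const char* _id;
};

// A global instance registers itself before main() runs.
FingerprintSketch theSketchFp("sketchID");
```

With this arrangement, FingerprintSketch::Find("sketchID") yields a pointer to theSketchFp, and an unknown ID yields a null pointer.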

-- To define a fingerprint type --

The classes derived from OBFingerprint are required to provide a GetFingerprint() routine and a Description() routine:

    class MyFpType : public OBFingerprint
    {
    public:
       MyFpType(const char* id) : OBFingerprint(id) {}

       virtual bool GetFingerprint(OBBase* pOb, vector<unsigned int>& fp, int nbits)
       {
          //Convert pOb to the required type, usually OBMol
          OBMol* pmol = dynamic_cast<OBMol*>(pOb);
          if(!pmol)
             return false; //not the expected type of object
          fp.resize(required_number_of_words);
          ...
          //use SetBit(fp,n); to set the nth bit

          if(nbits)
             Fold(fp, nbits);
          return true;
       }

       virtual const char* Description(){ return "Some descriptive text";}
       ...
    };

Declare a global instance with the ID you will use in -f options to specify its use.

    MyFpType theMyFpType("myfpID");

-- To obtain a fingerprint --

    OBMol mol;
    ...
    vector<unsigned int> fp;
    OBFingerprint::GetDefault()->GetFingerprint(&mol, fp); //gets default size of fingerprint
or
    vector<unsigned int> fp;
    OBFingerprint* pFP = OBFingerprint::FindFingerprint("myfpID");
    ...and maybe...
    pFP->GetFingerprint(&mol,fp, 128); //fold down to 128 bits if it was originally larger

-- To print a list of available fingerprint types --

    std::string id;
    OBFingerprint* pFPrt=NULL;
    while(OBFingerprint::GetNextFPrt(id, pFPrt))
    {
       cout << id << " -- " << pFPrt->Description() << endl;
    }

Fingerprints are handled as vector<unsigned int>, so the number of bits in this vector and their order are platform and compiler dependent, because of differences in the size of int types and in endianness. Use fingerprints (and fastsearch indexes containing them) only for comparing with other fingerprints prepared on the same machine.
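
To make the platform dependence concrete, here is a small sketch of how the word size enters the picture. The names kBitsPerInt, wordsForBits and toBitString are hypothetical and are not part of OBFingerprint. The bit string lists bit 0 first, one character per bit, so it does not depend on byte order. It is shown only as one possible illustration, not as a format Open Babel uses.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Value bits in one unsigned int on this platform (commonly 32).
const std::size_t kBitsPerInt = std::numeric_limits<unsigned int>::digits;

// Number of words needed to hold nbits bits, rounding up.
std::size_t wordsForBits(std::size_t nbits)
{
    assert(nbits > 0);
    return (nbits + kBitsPerInt - 1) / kBitsPerInt;
}

// Text form with bit 0 first: '1' for a set bit, '0' for a clear bit.
std::string toBitString(const std::vector<unsigned int>& fp)
{
    std::string s;
    s.reserve(fp.size() * kBitsPerInt);
    for (unsigned int word : fp)
        for (std::size_t b = 0; b < kBitsPerInt; ++b)
            s += ((word >> b) & 1u) ? '1' : '0';
    return s;
}
```

The same 1024-bit fingerprint would occupy 32 words where unsigned int has 32 bits but a different number elsewhere, which is why raw word vectors should not be exchanged between machines.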

The FingerprintFormat class is an output format which displays fingerprints as hexadecimal. When multiple molecules are supplied it will calculate the Tanimoto coefficient from the first molecule to each of the others. It also shows whether the first molecule is a possible substructure of each of the others, i.e. whether all the bits set in the fingerprint for the first molecule are set in the fingerprint of the others. To display hexadecimal information when multiple molecules are provided it is necessary to use the -xh option.

To see a list of available fingerprint types, type babel -F on the command line. The -xF option of the FingerprintFormat class also provides this output, but due to a quirk in the way the program works, it is necessary to have a valid input molecule for this option to work.


Member Typedef Documentation

typedef std::map<const char*, OBPlugin*, CharPtrLess> PluginMapType [inherited]

typedef PluginMapType::const_iterator PluginIterator [inherited]


Member Enumeration Documentation

enum FptFlag

Optional flags.

Enumerator:
FPT_UNIQUEBITS 


Constructor & Destructor Documentation

virtual ~OBFingerprint (  )  [inline, virtual]


Member Function Documentation

void SetBit ( std::vector< unsigned int > &  vec,
const unsigned int  n 
)

Sets the nth bit.

bool GetBit ( const std::vector< unsigned int > &  vec,
const unsigned int  n 
)

Returns true if the nth bit is set.
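
As a rough picture of what setting and testing the nth bit of a vector<unsigned int> involves, consider the sketch below. It is written independently of the Open Babel source. setBitSketch and getBitSketch are hypothetical free functions, and the choice of bit 0 as the least significant bit of word 0 is an assumption.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Value bits in one unsigned int on this platform (commonly 32).
const unsigned int kBitsPerInt = std::numeric_limits<unsigned int>::digits;

// Set bit n: word n / kBitsPerInt, position n % kBitsPerInt within it.
void setBitSketch(std::vector<unsigned int>& vec, unsigned int n)
{
    assert(n / kBitsPerInt < vec.size());  // the vector must already be large enough
    vec[n / kBitsPerInt] |= 1u << (n % kBitsPerInt);
}

// Test bit n using the same word and position arithmetic.
bool getBitSketch(const std::vector<unsigned int>& vec, unsigned int n)
{
    assert(n / kBitsPerInt < vec.size());
    return ((vec[n / kBitsPerInt] >> (n % kBitsPerInt)) & 1u) != 0;
}
```

In this sketch the caller sizes the vector first, as the GetFingerprint example does with fp.resize(), and an out-of-range index trips the assertion.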

void Fold ( std::vector< unsigned int > &  vec,
unsigned int  nbits 
)

Repeatedly ORs the top half of the fingerprint with the bottom half, halving its length each time, and stops before the length would become smaller than nbits.
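
Folding can be sketched as follows. This is an illustrative reimplementation, not the library routine: foldSketch is a hypothetical name, and it assumes the fingerprint is a whole number of words so that it cannot fold below one word.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Value bits in one unsigned int on this platform (commonly 32).
const std::size_t kBitsPerInt = std::numeric_limits<unsigned int>::digits;

// Halve the fingerprint while the halved length still holds at least nbits bits.
void foldSketch(std::vector<unsigned int>& vec, unsigned int nbits)
{
    assert(nbits > 0);
    while (vec.size() % 2 == 0 && (vec.size() / 2) * kBitsPerInt >= nbits) {
        const std::size_t half = vec.size() / 2;
        // OR each word of the top half into the matching word of the bottom half.
        for (std::size_t i = 0; i < half; ++i)
            vec[i] |= vec[i + half];
        vec.resize(half);  // discard the top half
    }
}
```

Every bit set before folding is still represented afterwards, so a folded fingerprint can still be used for screening, at the cost of more collisions and therefore more false positives.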

virtual bool GetFingerprint ( OBBase *  pOb,
std::vector< unsigned int > &  fp,
int  nbits = 0 
) [pure virtual]

Returns:
fingerprint in vector, which may be resized, folded to nbits (if nbits!=0)

Referenced by FastSearchIndexer::Add(), FastSearch::Find(), FastSearch::FindMatch(), and FastSearch::FindSimilar().

virtual unsigned int Flags (  )  [inline, virtual]

virtual std::string DescribeBits ( const std::vector< unsigned int >  fp,
bool  bSet = true 
) [inline, virtual]

Returns:
a description of each bit that is set (or unset, if bSet=false)
Since:
version 2.2

double Tanimoto ( const std::vector< unsigned int > &  vec1,
const std::vector< unsigned int > &  vec2 
) [static]

Returns:
the Tanimoto coefficient between two vectors

static double Tanimoto ( const std::vector< unsigned int > &  vec1,
const unsigned int *  p2 
) [inline, static]

Inline version of Tanimoto() taking a pointer for the second vector.

If used for two vectors, vec1 and vec2, call as Tanimoto(vec1, &vec2[0]);
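
For reference, the Tanimoto coefficient of two bit vectors is the number of bits set in both divided by the number of bits set in either. The sketch below shows that calculation for both calling styles. It is not the Open Babel source: the function names are hypothetical, and the return values chosen for mismatched lengths (-1) and for two all-zero fingerprints (0) are assumptions of this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Count the 1 bits in a word by clearing the lowest set bit on each pass.
std::size_t countBits(unsigned int w)
{
    std::size_t n = 0;
    for (; w != 0; w &= w - 1)
        ++n;
    return n;
}

// Pointer form: p2 must point to at least vec1.size() words.
double tanimotoPtrSketch(const std::vector<unsigned int>& vec1, const unsigned int* p2)
{
    assert(p2 != nullptr || vec1.empty());
    std::size_t andBits = 0, orBits = 0;
    for (std::size_t i = 0; i < vec1.size(); ++i) {
        andBits += countBits(vec1[i] & p2[i]);
        orBits  += countBits(vec1[i] | p2[i]);
    }
    // Assumption: two all-zero fingerprints are given a similarity of 0.
    return orBits == 0 ? 0.0
                       : static_cast<double>(andBits) / static_cast<double>(orBits);
}

// Vector form: checks the lengths, then reuses the pointer form.
double tanimotoSketch(const std::vector<unsigned int>& vec1,
                      const std::vector<unsigned int>& vec2)
{
    if (vec1.size() != vec2.size())
        return -1.0;  // assumption: a length mismatch is signalled with -1
    return tanimotoPtrSketch(vec1, vec2.data());
}
```

For example, the words 7 (binary 0111) and 14 (binary 1110) share two set bits out of four set in either, giving a coefficient of 0.5.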

static OBFingerprint* FindFingerprint ( const char *  ID  )  [inline, static]

For backward compatibility; a synonym of OBFingerprint::FindType.

Returns:
a pointer to a fingerprint (the default if ID is empty), or NULL if not available

Referenced by FptIndex::CheckFP().

virtual const char* Description (  )  [inline, virtual, inherited]

Required description of a sub-type.

Reimplemented in OBFormat, OBGroupContrib, and OpTransform.

Referenced by OBPlugin::Display(), and OBOp::OpOptions().

bool Display ( std::string &  txt,
const char *  param,
const char *  ID = NULL 
) [virtual, inherited]

Write information on a plugin class to the string txt. Return false if not written. The default implementation outputs: the ID, a tab character, and the first line of the Description. The param string can be used in derived types to provide different outputs.

Reimplemented in OBDescriptor, and OBFormat.

Referenced by OBDescriptor::Display().

virtual OBPlugin* MakeInstance ( const std::vector< std::string > &   )  [inline, virtual, inherited]

Make a new instance of the class. See the OpTransform, OBGroupContrib and SmartsDescriptor classes for derived versions. Usually, the first parameter is the classname, the next three are parameters (ID, filename, description) for a constructor, and the rest are data.

Reimplemented in OBGroupContrib, and OpTransform.

Referenced by OBConversion::LoadFormatFiles().

static OBPlugin* GetPlugin ( const char *  Type,
const char *  ID 
) [inline, static, inherited]

Get a pointer to a plugin from its type and ID. Return NULL if not found. Not cast to Type*.

Referenced by OBConversion::LoadFormatFiles().

const char* GetID (  )  const [inline, inherited]

Return the ID of the sub-type instance.

Referenced by OBPlugin::Display(), OBFormat::Display(), and OBDescriptor::PredictAndSave().

bool ListAsVector ( const char *  PluginID,
const char *  param,
std::vector< std::string > &  vlist 
) [static, inherited]

Output a list of sub-type classes of the type PluginID, or, if PluginID is "plugins" or empty, a list of the base types. If PluginID is not recognized or is NULL, the base types are output and the return is false.

Referenced by OBConversion::GetSupportedInputFormat(), OBConversion::GetSupportedOutputFormat(), and OBPlugin::List().

void List ( const char *  PluginID,
const char *  param = NULL,
std::ostream *  os = &std::cout 
) [static, inherited]

As ListAsVector but sent to an ostream with a default of cout if not specified.

Referenced by OBPlugin::ListAsString().

string ListAsString ( const char *  PluginID,
const char *  param = NULL 
) [static, inherited]

As ListAsVector but returns a string containing the list.

string FirstLine ( const char *  txt  )  [static, inherited]

Utility function to return only the first line of a string.

Referenced by OBPlugin::Display(), OBFormat::Display(), and OBOp::OpOptions().
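
A first-line helper of this kind, together with the default Display text described earlier (the ID, a tab character, and the first line of the Description), might look like the sketch below. Both function names are hypothetical and the code is not taken from OBPlugin.

```cpp
#include <cassert>
#include <string>

// Return the text up to, but not including, the first newline.
std::string firstLineSketch(const char* txt)
{
    assert(txt != nullptr);
    std::string line(txt);
    const std::string::size_type pos = line.find('\n');
    if (pos != std::string::npos)
        line.erase(pos);
    return line;
}

// Compose "ID<tab>first line of description", as in the default Display output.
std::string displayLineSketch(const char* id, const char* description)
{
    assert(id != nullptr);
    return std::string(id) + '\t' + firstLineSketch(description);
}
```

Keeping only the first line lets a long, multi-line Description serve both as a one-line summary in listings and as full help text elsewhere.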

static PluginIterator Begin ( const char *  PluginID  )  [inline, static, inherited]

Return an iterator at the start of the map of the plugin types PluginID or, if there is no such map, the end of the top level plugin map.

Referenced by OBConversion::GetNextFormat(), and OBOp::OpOptions().

static PluginIterator End ( const char *  PluginID  )  [inline, static, inherited]

virtual PluginMapType& GetMap (  )  const [pure virtual, inherited]

Returns the map of the subtypes.

Referenced by OBFormat::RegisterFormat().

static PluginMapType& PluginMap (  )  [inline, static, protected, inherited]

Returns a reference to the map of the plugin types. Is a function rather than a static member variable to avoid initialization problems.

Referenced by OBPlugin::GetTypeMap(), OBPlugin::ListAsVector(), and OBFormat::RegisterFormat().
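
The initialization problem mentioned here is the usual one with globals in different translation units: their construction order is unspecified, so a static member map might not exist yet when a global plugin instance tries to register itself. Wrapping the map in a function avoids this, as the following sketch shows. TypeCountMap, typeMapSketch and RegistrarSketch are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical registry: plugin type name -> number of registrations.
typedef std::map<std::string, int> TypeCountMap;

// The map is a function-local static, so it is constructed the first time
// this function is called, even when that call comes from the constructor
// of a global object in another translation unit.
TypeCountMap& typeMapSketch()
{
    static TypeCountMap m;
    return m;
}

// A global object that registers a type name during static initialization.
struct RegistrarSketch {
    explicit RegistrarSketch(const std::string& typeName)
    {
        assert(!typeName.empty());
        ++typeMapSketch()[typeName];
    }
};

RegistrarSketch fingerprintsRegistrar("fingerprints");
RegistrarSketch formatsRegistrar("formats");
```

Whichever registrar happens to be constructed first creates the map, and every later call sees the same object.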

OBPlugin::PluginMapType & GetTypeMap ( const char *  PluginID  )  [static, protected, inherited]

Returns the map of a particular plugin type, e.g. GetTypeMap("fingerprints").

static OBPlugin* BaseFindType ( PluginMapType &  Map,
const char *  ID 
) [inline, static, protected, inherited]

Returns the type with the specified ID, or NULL if not found. Will be cast to the appropriate class in the calling routine.


Member Data Documentation

const char* _id [protected, inherited]

