OBDescriptor Class Reference

Base class for molecular properties, descriptors or features.

#include <openbabel/descriptor.h>

Inherits OBPlugin. Inherited by OBGroupContrib.

Public Types

typedef std::map< const char *, OBPlugin *, CharPtrLess > PluginMapType
typedef PluginMapType::const_iterator PluginIterator

Public Member Functions

const char * TypeID ()
virtual double Predict (OBBase *, std::string *=NULL)
double PredictAndSave (OBBase *pOb, std::string *param=NULL)
virtual double GetStringValue (OBBase *pOb, std::string &svalue, std::string *param=NULL)
virtual bool Compare (OBBase *pOb, std::istream &ss, bool noEval, std::string *param=NULL)
virtual bool Display (std::string &txt, const char *param, const char *ID=NULL)
virtual bool Order (double p1, double p2)
virtual bool Order (std::string s1, std::string s2)
virtual const char * Description ()
virtual OBPlugin * MakeInstance (const std::vector< std::string > &)
virtual void Init ()
const char * GetID () const
virtual PluginMapType & GetMap () const =0

Static Public Member Functions

static bool FilterCompare (OBBase *pOb, std::istream &ss, bool noEval)
static void AddProperties (OBBase *pOb, const std::string &DescrList)
static void DeleteProperties (OBBase *pOb, const std::string &DescrList)
static std::string GetValues (OBBase *pOb, const std::string &DescrList)
static std::pair< std::string, std::string > GetIdentifier (std::istream &optionText)
static OBPlugin * GetPlugin (const char *Type, const char *ID)
static bool ListAsVector (const char *PluginID, const char *param, std::vector< std::string > &vlist)
static void List (const char *PluginID, const char *param=NULL, std::ostream *os=&std::cout)
static std::string ListAsString (const char *PluginID, const char *param=NULL)
static std::string FirstLine (const char *txt)
static PluginIterator Begin (const char *PluginID)
static PluginIterator End (const char *PluginID)

Static Protected Member Functions

static double ParsePredicate (std::istream &optionText, char &ch1, char &ch2, std::string &svalue)
static bool ReadStringFromFilter (std::istream &ss, std::string &result)
static bool CompareStringWithFilter (std::istream &optionText, std::string &s, bool noEval, bool NoCompOK=false)
static bool ispunctU (char ch)
static bool MatchPairData (OBBase *pOb, std::string &s)
static PluginMapType & PluginMap ()
static PluginMapType & GetTypeMap (const char *PluginID)
static OBPlugin * BaseFindType (PluginMapType &Map, const char *ID)

Protected Attributes

const char * _id

Detailed Description

Base class for molecular properties, descriptors or features.

Since:
version 2.2

OBDescriptor and Filtering

On the command line, the option --filter filter-string converts only those molecules that meet the criteria specified in the filter-string. This is useful for selecting particular molecules from a set. It is used like this:

  babel dataset.sdf outfile.smi --filter "MW>200 SMARTS!=c1ccccc1 PUBCHEM_CACTVS_ROTATABLE_BOND<5"

The identifier "PUBCHEM_CACTVS_ROTATABLE_BOND" is the name of an attribute of an OBPairData, which has probably been imported from a property in an SDF or CML file. The identifier names are (currently) case-sensitive. A comparison is made with the value in the OBPairData. This is a numeric comparison if both operands can be converted to numbers (as in the example). If the 5 had been enclosed in single or double quotes, the comparison would have been a string comparison, which gives a different result in some cases. OBPairData is searched first to match an identifier.
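The difference between the two kinds of comparison can be illustrated with a minimal, standard-library-only sketch. The helpers toNumber() and lessThan() are hypothetical names invented for this illustration; they are not part of Open Babel and do not reproduce its implementation.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: true if the whole of s parses as a number.
bool toNumber(const std::string& s, double& out) {
    if (s.empty()) return false;
    char* end = nullptr;
    out = std::strtod(s.c_str(), &end);
    return end == s.c_str() + s.size();
}

// Hypothetical helper: evaluates "stored < filterValue".
// A quoted filter value forces a string comparison; otherwise the
// comparison is numeric when both operands convert to numbers.
bool lessThan(const std::string& stored, const std::string& filterValue, bool quoted) {
    double a = 0.0, b = 0.0;
    if (!quoted && toNumber(stored, a) && toNumber(filterValue, b))
        return a < b;               // numeric comparison
    return stored < filterValue;    // lexicographic string comparison
}

// Usage: the same operands give different answers.
void demoLessThan() {
    assert(!lessThan("10", "5", false)); // numeric: 10 < 5 is false
    assert(lessThan("10", "5", true));   // string: "10" sorts before "5"
}
```

With a stored value of 10, the filter X<5 fails numerically but X<"5" succeeds as a string comparison, because '1' sorts before '5'.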

If no OBPairData attributes match, the identifier is taken to be the ID of an OBDescriptor class object. The class OBDescriptor is the base class for classes that wrap molecular properties, descriptors or features. In the example, "MW" and "SMARTS" are OBDescriptor IDs and are case-insensitive. They are plugin classes, like fingerprints, forcefields and formats, so that new molecular features can be added or old ones removed (to prevent code bloat) without altering old code. A list of the available descriptors can be obtained from the command line with babel -L descriptors, or from the functions OBPlugin::List, OBPlugin::ListAsString and OBPlugin::ListAsVector.

The filter-string is interpreted by a static function of OBDescriptor, FilterCompare(). This identifies the descriptor IDs and then calls a virtual function, Compare(), of each OBDescriptor class to interpret the rest of the relational expression, for example ">200" or "=c1ccccc1". The default version of Compare() is suitable for descriptors, like MW or logP, which return a double from their Predict() method. Classes like SMARTS, which need different semantics, provide their own.

By default, as in the example, OBDescriptor::FilterCompare() ANDs each comparison, so that all the comparisons must be true for the test to succeed. However, the filter-string can also be a full boolean expression with &, |, ! and parentheses, allowing any combination of features to be selected. FilterCompare() calls itself recursively to give AND precedence over OR, and evaluation is not carried out if it is not needed.

The aim has been to make interpretation of the filter-string as liberal as possible: AND can be written as & or &&, and spaces or commas are accepted wherever they are reasonable.

The base class, OBDescriptor, uses pointers to OBBase in its functions, like OBFormat, to improve extensibility: reactions could have features too. It does mean that a dynamic_cast is needed at the start of the Predict(OBBase* pOb, string*) functions.
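The cast pattern can be sketched as follows. Base, Mol and Reaction are simplified stand-ins invented for this illustration, not the Open Babel classes, and returning NaN for an unsupported object type is an assumption of the sketch.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <string>

// Stand-ins for OBBase and its subclasses; NOT the Open Babel types.
struct Base { virtual ~Base() {} };
struct Mol : Base { double weight = 0.0; };
struct Reaction : Base {};

// Hypothetical descriptor function: it accepts any Base*, then narrows
// the pointer to the type it understands before doing any work.
double PredictWeight(Base* pOb, std::string* /*param*/ = nullptr) {
    Mol* pMol = dynamic_cast<Mol*>(pOb);
    if (!pMol)
        return std::numeric_limits<double>::quiet_NaN(); // assumed failure value
    return pMol->weight;
}

// Usage: a molecule yields a value, any other object type yields NaN.
void demoPredictWeight() {
    Mol m;
    m.weight = 180.16;
    Reaction r;
    assert(PredictWeight(&m) == 180.16);
    assert(std::isnan(PredictWeight(&r)));
}
```

The benefit of this design is that the same function signature can later serve other object types without changing the base class interface.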

To use a particular descriptor, like logP, when programming with the API, use code like the following:

  OBDescriptor* pDescr = OBDescriptor::FindType("logP");
  double val = 0.0;
  if(pDescr)
    val = pDescr->Predict(mol, param); // mol: an OBBase* (e.g. an OBMol*); param: a std::string* or NULL

To add the descriptor ID and the predicted data to OBPairData attached to the object, use PredictAndSave().

Descriptors can have a string parameter, which each descriptor can interpret as it wants, for instance as multiple numeric values. The parameter is in brackets after the descriptor name, e.g. popcount(FP4). In the above programming example, param is a pointer to a std::string which has a default value of NULL, meaning no parameter. GetStringValue() and Compare() are similar.

To parse a string for descriptors use GetIdentifier(), which returns both the ID and the parameter, if there is one.
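As an illustration of the kind of parsing involved, here is a minimal sketch that splits name(param) into its two parts using only the standard library. parseIdentifier() is a hypothetical function written for this example; it is not the GetIdentifier() implementation and ignores error handling.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical sketch: reads "name" or "name(param)" from the stream and
// leaves the stream positioned just after what was read.
std::pair<std::string, std::string> parseIdentifier(std::istream& in) {
    std::string id, param;
    in >> std::ws;                                   // skip leading whitespace
    for (;;) {
        int c = in.peek();
        if (c == std::istream::traits_type::eof()) break;
        if (!(std::isalnum(c) || c == '_')) break;   // IDs: letters, digits, '_'
        id += static_cast<char>(in.get());
    }
    if (in.peek() == '(') {
        in.get();                                    // consume '('
        std::getline(in, param, ')');                // read up to the closing ')'
    }
    return std::make_pair(id, param);
}

// Usage: the predicate that follows the identifier stays in the stream.
void demoParseIdentifier() {
    std::istringstream ss("popcount(FP4)>3");
    std::pair<std::string, std::string> idp = parseIdentifier(ss);
    assert(idp.first == "popcount");
    assert(idp.second == "FP4");
}
```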

This facility can be called from the command line. Use the option --add "descriptor list", which will add the requested descriptors to the molecule. They are then visible as properties in SDF and CML formats. The IDs in the list can be separated by spaces or commas. All descriptors provide an output value as a string through a virtual function GetStringValue(OBBase* pOb, string& svalue), which assigns to svalue either the value of a string descriptor (like inchi) or a string representation of a numerical property like logP.

The classes MWFilter and TitleFilter illustrate the code that has to be provided for numerical and non-numerical descriptors.


Member Typedef Documentation

typedef std::map<const char*, OBPlugin*, CharPtrLess> PluginMapType [inherited]

Member Function Documentation

const char* TypeID (  ) [inline, virtual]

Redefined by each plugin type: "formats", "fingerprints", etc.

Reimplemented from OBPlugin.

virtual double Predict ( OBBase * ,
std::string *  = NULL 
) [inline, virtual]
Returns:
the value of a numeric descriptor

Reimplemented in OBGroupContrib.

double PredictAndSave ( OBBase *  pOb,
std::string *  param = NULL 
)
Returns:
the value of the descriptor and adds it to the object's OBPairData

Referenced by OBDescriptor::AddProperties().

double GetStringValue ( OBBase *  pOb,
std::string &  svalue,
std::string *  param = NULL 
) [virtual]

For non-numeric descriptors, provides a string value in svalue and returns NaN. For numeric descriptors, provides a string representation in svalue and returns the numeric value.

This default version provides a string representation of the numeric value.

Referenced by OBDescriptor::GetValues().

bool Compare ( OBBase *  pOb,
std::istream &  ss,
bool  noEval,
std::string *  param = NULL 
) [virtual]

Parses the filter stream for a relational expression and returns its result when applied to the chemical object.

Compare() is a virtual function and can be overridden to allow different comparison behaviour. The default implementation here is suitable for OBDescriptor classes which return a double value. The stringstream is parsed to retrieve a comparison operator, one of > < >= <= = == !=, and a numerical value. The function compares this with the value returned by Predict() and returns the result. The stringstream is left positioned after the number, and its state reflects whether any errors have occurred. If noEval is true, the parsing is as normal but Predict() is not called and the function returns false.
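The behaviour described above can be sketched with standard streams alone. compareNumeric() is a hypothetical function for illustration; the argument predicted stands in for the value Predict() would return, and signalling a parse error with badbit is an assumption of this sketch rather than a statement about the library's code.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical sketch of a default numeric comparison.
bool compareNumeric(std::istream& ss, double predicted, bool noEval) {
    std::string op;
    ss >> std::ws;
    // Collect the operator: any run of the characters < > = !
    while (ss.peek() == '<' || ss.peek() == '>' || ss.peek() == '=' || ss.peek() == '!')
        op += static_cast<char>(ss.get());
    double filterValue = 0.0;
    if (op.empty() || !(ss >> filterValue)) {
        ss.setstate(std::ios::badbit);       // report the parse error via the stream
        return false;
    }
    if (noEval) return false;                // parsed, but deliberately not evaluated
    if (op == ">")  return predicted >  filterValue;
    if (op == "<")  return predicted <  filterValue;
    if (op == ">=") return predicted >= filterValue;
    if (op == "<=") return predicted <= filterValue;
    if (op == "=" || op == "==") return predicted == filterValue;
    if (op == "!=") return predicted != filterValue;
    ss.setstate(std::ios::badbit);           // unrecognised operator
    return false;
}

// Usage: the same text is parsed with and without evaluation.
void demoCompareNumeric() {
    std::istringstream a(">200"), b(">200");
    assert(compareNumeric(a, 250.0, false));  // 250 > 200
    assert(!compareNumeric(b, 250.0, true));  // noEval: always false
    assert(!b.fail());                        // but the stream was still parsed cleanly
}
```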

Referenced by OBDescriptor::FilterCompare().

bool Display ( std::string &  txt,
const char *  param,
const char *  ID = NULL 
) [virtual]

Writes information on a plugin class to the string txt. If the parameter is a descriptor ID, displays the verbose description for that descriptor only, e.g. babel -L descriptors HBA1

Reimplemented from OBPlugin.

virtual bool Order ( double  p1,
double  p2 
) [inline, virtual]

Comparison of the values of the descriptor. Used in sorting. Descriptors may use more complicated ordering than this default (e.g. InChIFilter).

virtual bool Order ( std::string  s1,
std::string  s2 
) [inline, virtual]

bool FilterCompare ( OBBase *  pOb,
std::istream &  optionText,
bool  noEval 
) [static]

Interprets the --filter option string and returns the combined result of all the comparisons it contains.

The string has the form: PropertyID1 predicate1 [booleanOp] PropertyID2 predicate2 ... The propertyIDs are the IDs of instances of an OBDescriptor class or the attributes of OBPairData, and contain only letters, numbers and underscores. The predicates must start with a punctuation character and are interpreted by the Compare function of the OBDescriptor class. The default implementation expects a comparison operator and a number, e.g. >=1.3. Whitespace is optional and is ignored. Each predicate and this OBBase object (usually an OBMol) is passed to the Compare function of an OBDescriptor. The results of the comparisons are combined in a boolean expression (which can include parentheses) in the normal way. The AND operator can be & or &&, the OR operator can be | or ||, and the unary NOT is !. The expected operator precedence is achieved using recursive calls of the function. If there is no boolean operator, all the tests have to return true for the function to return true, i.e. the default is AND. If the first operand of an AND is false, or of an OR is true, the parsing of the second operand continues but no comparisons are done, since the result does not matter.
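The precedence and skip-evaluation rules can be sketched as a small recursive-descent evaluator. This is an illustration under stated assumptions, not the FilterCompare() implementation: FilterSketch and Values are hypothetical names, descriptor values come from a plain map instead of OBDescriptor objects, every predicate is numeric, and the ID must exist in the map.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <sstream>
#include <string>

typedef std::map<std::string, double> Values;  // hypothetical: ID -> value

struct FilterSketch {
    std::istringstream in;
    const Values& vals;
    int evaluated;                              // predicates actually evaluated

    FilterSketch(const std::string& text, const Values& v)
        : in(text), vals(v), evaluated(0) {}

    // Skips spaces and commas; returns the next character without consuming it.
    int next() {
        while (in.peek() == ' ' || in.peek() == ',') in.get();
        return in.peek();
    }

    // predicate := ID op number
    bool predicate(bool noEval) {
        std::string id, op;
        next();
        while (std::isalnum(in.peek()) || in.peek() == '_') id += char(in.get());
        next();
        while (in.peek() == '<' || in.peek() == '>' || in.peek() == '=' || in.peek() == '!')
            op += char(in.get());
        double rhs = 0.0;
        in >> rhs;
        if (noEval) return false;               // parsed only, result is ignored
        ++evaluated;
        double lhs = vals.at(id);               // throws if the ID is unknown
        if (op == ">")  return lhs > rhs;
        if (op == "<")  return lhs < rhs;
        if (op == ">=") return lhs >= rhs;
        if (op == "<=") return lhs <= rhs;
        if (op == "!=") return lhs != rhs;
        return lhs == rhs;                      // "=" or "=="
    }

    // factor := '!' factor | '(' expr ')' | predicate
    bool factor(bool noEval) {
        int c = next();
        if (c == '!') { in.get(); return !factor(noEval); }
        if (c == '(') {
            in.get();
            bool r = expr(noEval);
            if (next() == ')') in.get();
            return r;
        }
        return predicate(noEval);
    }

    // term := factor { ['&' | '&&'] factor }; AND is also the default operator.
    bool term(bool noEval) {
        bool result = factor(noEval);
        for (;;) {
            int c = next();
            if (c == '&') { while (in.peek() == '&') in.get(); }
            else if (!(std::isalnum(c) || c == '_' || c == '!' || c == '(')) break;
            bool rhs = factor(noEval || !result);   // left false: parse, do not evaluate
            result = result && rhs;
        }
        return result;
    }

    // expr := term { ('|' | '||') term }; OR binds less tightly than AND.
    bool expr(bool noEval) {
        bool result = term(noEval);
        while (next() == '|') {
            while (in.peek() == '|') in.get();
            bool rhs = term(noEval || result);      // left true: parse, do not evaluate
            result = result || rhs;
        }
        return result;
    }
};

// Usage: AND binds tighter than OR, and the right-hand side is not evaluated.
void demoFilterSketch() {
    Values v;
    v["MW"] = 250; v["logP"] = 6;
    FilterSketch f("MW>200 | MW<100 & logP<5", v);
    assert(f.expr(false));        // true | (false & false)
    assert(f.evaluated == 1);     // only MW>200 was evaluated
}
```

Splitting the grammar into expr, term and factor is what gives AND its higher precedence: each OR operand is itself a complete chain of ANDed factors.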

void AddProperties ( OBBase *  pOb,
const std::string &  DescrList 
) [static]

Reads a list of descriptor IDs and calls PredictAndSave() for each.
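Splitting such a list on spaces and commas can be sketched as below. splitDescriptorList() is a hypothetical helper for illustration only; it assumes that no descriptor parameter in the list contains a comma or a space, which the real parser may well handle differently.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits a descriptor list on spaces and commas.
std::vector<std::string> splitDescriptorList(const std::string& list) {
    std::string normalised = list;
    for (char& ch : normalised)
        if (ch == ',') ch = ' ';              // treat commas as whitespace
    std::istringstream in(normalised);
    std::vector<std::string> ids;
    for (std::string id; in >> id; )          // operator>> skips runs of whitespace
        ids.push_back(id);
    return ids;
}

// Usage: mixed separators produce one entry per ID.
void demoSplitDescriptorList() {
    std::vector<std::string> ids = splitDescriptorList("MW, logP HBA1");
    assert(ids.size() == 3);
    assert(ids[0] == "MW" && ids[1] == "logP" && ids[2] == "HBA1");
}
```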

void DeleteProperties ( OBBase *  pOb,
const std::string &  DescrList 
) [static]

Deletes all the OBPairDatas whose attribute names are in the list (if they exist).

string GetValues ( OBBase *  pOb,
const std::string &  DescrList 
) [static]

Reads a list of descriptor IDs and OBPairData names and returns a list of values, each preceded by a space, or by the first character in the list if that character is whitespace or punctuation.

pair< string, string > GetIdentifier ( std::istream &  optionText ) [static]

Read an identifier and its parameter from the filter string.

double ParsePredicate ( std::istream &  optionText,
char &  ch1,
char &  ch2,
std::string &  svalue 
) [static, protected]

Reads a comparison operator and the string that follows it. Returns the numeric value of the string if possible, otherwise NaN.

bool ReadStringFromFilter ( std::istream &  ss,
std::string &  result 
) [static, protected]

Reads a string from the filter stream, optionally preceded by = or !=.

Returns:
false if != operator found, and true otherwise.

On entry the stringstream position should be just after the ID. On exit it is after the string. If there is an error, the stringstream badbit is set. Returns false if != is found, to indicate negation. The string can take any of the following forms:

  mystring  =mystring  ==mystring    [must be terminated by a space or tab]
  "mystring"  'mystring'  ="mystring"  ='mystring'    [mystring can contain spaces or tabs]
  !=mystring  !="mystring"    [returns false, indicating negation]

There can be spaces or tabs after the operator (=, == or !=).
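A minimal sketch of reading these forms with standard streams follows. readFilterString() is a hypothetical function written for this illustration; it is not the library's code, and it does not reject malformed input such as a lone ! without a following =.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical sketch. Returns false when the string was introduced
// by "!=", and true otherwise.
bool readFilterString(std::istream& ss, std::string& result) {
    bool negate = false;
    ss >> std::ws;
    if (ss.peek() == '!') { ss.get(); negate = true; }
    while (ss.peek() == '=') ss.get();      // accepts "=", "==" and the '=' of "!="
    ss >> std::ws;                          // spaces or tabs may follow the operator
    int c = ss.peek();
    if (c == '"' || c == '\'') {
        ss.get();                                        // opening quote
        std::getline(ss, result, static_cast<char>(c));  // up to the matching quote
    } else {
        ss >> result;                                    // unquoted: ends at whitespace
    }
    if (ss.fail()) ss.setstate(std::ios::badbit);        // nothing could be read
    return !negate;
}

// Usage: a quoted string keeps its spaces, and != reports negation.
void demoReadFilterString() {
    std::string s;
    std::istringstream a("=\"my string\" next");
    assert(readFilterString(a, s) && s == "my string");
    std::istringstream b("!='abc'");
    assert(!readFilterString(b, s) && s == "abc");
}
```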

bool CompareStringWithFilter ( std::istream &  optionText,
std::string &  s,
bool  noEval,
bool  NoCompOK = false 
) [static, protected]

Makes a comparison using the operator and a string read from the filter stream with a provided string.

Returns:
the result of the comparison, or true if NoCompOK==true and there is no comparison operator.

static bool ispunctU ( char  ch ) [inline, static, protected]

bool MatchPairData ( OBBase *  pOb,
std::string &  s 
) [static, protected]
Returns:
true if s (with or without _ replaced by spaces) is a PairData attribute. On return s is the form which matches.
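The matching rule can be sketched with a plain map standing in for the object's OBPairData. PairData and matchPairData() are hypothetical names used only for this illustration, and trying the unmodified name first is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the OBPairData attached to an object:
// attribute name -> value.
typedef std::map<std::string, std::string> PairData;

// Returns true if s, or s with underscores replaced by spaces, is an
// attribute name. On success s holds the form that matched.
bool matchPairData(const PairData& data, std::string& s) {
    if (data.count(s)) return true;                   // exact name (assumed to be tried first)
    std::string spaced = s;
    std::replace(spaced.begin(), spaced.end(), '_', ' ');
    if (spaced != s && data.count(spaced)) {
        s = spaced;                                   // report the matching form
        return true;
    }
    return false;
}

// Usage: an underscore in the filter can stand for a space in the attribute name.
void demoMatchPairData() {
    PairData d;
    d["Melting point"] = "122";
    std::string name = "Melting_point";
    assert(matchPairData(d, name));
    assert(name == "Melting point");
}
```

This is what allows an attribute whose name contains spaces to be written in a filter-string, where identifiers contain only letters, numbers and underscores.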
