Base class for molecular properties, descriptors or features.
#include <openbabel/descriptor.h>
Public Types
typedef std::map< const char *, OBPlugin *, CharPtrLess > | PluginMapType |
typedef PluginMapType::const_iterator | PluginIterator |
Public Member Functions
const char * | TypeID () |
virtual double | Predict (OBBase *, std::string *=NULL) |
double | PredictAndSave (OBBase *pOb, std::string *param=NULL) |
virtual double | GetStringValue (OBBase *pOb, std::string &svalue, std::string *param=NULL) |
virtual bool | Compare (OBBase *pOb, std::istream &ss, bool noEval, std::string *param=NULL) |
virtual bool | Display (std::string &txt, const char *param, const char *ID=NULL) |
virtual bool | Order (double p1, double p2) |
virtual bool | Order (std::string s1, std::string s2) |
virtual const char * | Description () |
virtual OBPlugin * | MakeInstance (const std::vector< std::string > &) |
virtual void | Init () |
const char * | GetID () const |
virtual PluginMapType & | GetMap () const =0 |
Static Public Member Functions
static bool | FilterCompare (OBBase *pOb, std::istream &ss, bool noEval) |
static void | AddProperties (OBBase *pOb, const std::string &DescrList) |
static void | DeleteProperties (OBBase *pOb, const std::string &DescrList) |
static std::string | GetValues (OBBase *pOb, const std::string &DescrList) |
static std::pair< std::string, std::string > | GetIdentifier (std::istream &optionText) |
static OBPlugin * | GetPlugin (const char *Type, const char *ID) |
static bool | ListAsVector (const char *PluginID, const char *param, std::vector< std::string > &vlist) |
static void | List (const char *PluginID, const char *param=NULL, std::ostream *os=&std::cout) |
static std::string | ListAsString (const char *PluginID, const char *param=NULL) |
static std::string | FirstLine (const char *txt) |
static PluginIterator | Begin (const char *PluginID) |
static PluginIterator | End (const char *PluginID) |
Static Protected Member Functions
static double | ParsePredicate (std::istream &optionText, char &ch1, char &ch2, std::string &svalue) |
static bool | ReadStringFromFilter (std::istream &ss, std::string &result) |
static bool | CompareStringWithFilter (std::istream &optionText, std::string &s, bool noEval, bool NoCompOK=false) |
static bool | ispunctU (char ch) |
static bool | MatchPairData (OBBase *pOb, std::string &s) |
static PluginMapType & | PluginMap () |
static PluginMapType & | GetTypeMap (const char *PluginID) |
static OBPlugin * | BaseFindType (PluginMapType &Map, const char *ID) |
Protected Attributes
const char * | _id |
OBDescriptor and Filtering
On the command line, the option --filter filter-string converts only those molecules which meet the criteria specified in filter-string. This is useful for selecting particular molecules from a set. It is used like:
    babel dataset.sdf outfile.smi --filter "MW>200 SMARTS!=c1ccccc1 PUBCHEM_CACTVS_ROTATABLE_BOND<5"
The identifier "PUBCHEM_CACTVS_ROTATABLE_BOND" is the name of an attribute of an OBPairData which has probably been imported from a property in an SDF or CML file. The identifier names are (currently) case sensitive. A comparison is made with the value in the OBPairData. This is a numeric comparison if both operands can be converted to numbers (as in the example). If the 5 had been enclosed in single or double quotes, the comparison would have been a string comparison, which gives a different result in some cases. OBPairData is searched first to match an identifier.
If no OBPairData attribute matches, the identifier is taken to be the ID of an OBDescriptor class object. The class OBDescriptor is the base class for classes that wrap molecular properties, descriptors or features. In the example, "MW" and "SMARTS" are OBDescriptor IDs and are case insensitive. They are plugin classes, like fingerprints, forcefields and formats, so that new molecular features can be added or old ones removed (to prevent code bloat) without altering old code. The available descriptors can be listed from the command line with babel -L descriptors, or from the functions OBPlugin::List, OBPlugin::ListAsString and OBPlugin::ListAsVector.
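The difference between a numeric and a string comparison can be seen in a small standalone sketch. It does not use Open Babel: toNumber and lessThan are hypothetical helpers, and the rule used to decide whether an operand is a number is an assumption modelled on the description above, not the library's code.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: true if the whole string parses as a number.
bool toNumber(const std::string& s, double& out) {
  if (s.empty()) return false;
  char* end = nullptr;
  out = std::strtod(s.c_str(), &end);
  return end == s.c_str() + s.size();
}

// Hypothetical helper: numeric "<" when both operands are numbers,
// otherwise lexicographic string "<".
bool lessThan(const std::string& a, const std::string& b) {
  double x = 0, y = 0;
  if (toNumber(a, x) && toNumber(b, y)) return x < y;
  return a < b;
}
```

With these helpers, lessThan("10", "5") is false because 10 is not less than 5, whereas the plain string comparison std::string("10") < std::string("5") is true because the character '1' sorts before '5'.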
The filter-string is interpreted by a static function of OBDescriptor, FilterCompare(). This identifies the descriptor IDs and then calls a virtual function, Compare(), of each OBDescriptor class to interpret the rest of the relational expression, for example, ">200" or "=c1ccccc1". The default version of Compare() is suitable for descriptors, like MW or logP, which return a double from their Predict() method. Classes like SMARTS, which need different semantics, provide their own.
By default, as in the example, OBDescriptor::FilterCompare() ANDs the comparisons, so that all of them must be true for the test to succeed. However, filter-string can also be a full boolean expression, with &, |, ! and parentheses, allowing any combination of features to be selected. FilterCompare() calls itself recursively to give AND precedence over OR, and evaluation is not carried out if it is not needed.
The aim has been to make interpretation of the filter-string as liberal as possible: AND can be written as & or &&, and spaces or commas are accepted wherever they are reasonable.
The base class OBDescriptor, like OBFormat, uses pointers to OBBase in its functions to improve extensibility: reactions could have features too. It does mean that a dynamic_cast is needed at the start of the Predict(OBBase* pOb, string*) functions.
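The precedence and lazy-evaluation rules described above can be illustrated with a standalone recursive-descent sketch. It is not the Open Babel implementation: Props, evalOperand, evalAnd, evalOr and evalFilter are hypothetical names, only the operators > and < are handled, and property values come from a plain map rather than from descriptors.

```cpp
#include <cctype>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-in for descriptor values, keyed by ID.
using Props = std::map<std::string, double>;

bool evalOr(std::istream& in, const Props& p, bool noEval);

// Spaces and commas are both treated as separators.
void skipSeparators(std::istream& in) {
  while (in.good() && (std::isspace(in.peek()) || in.peek() == ','))
    in.get();
}

// One operand: "(expr)", "!operand", or "ID>number" / "ID<number".
bool evalOperand(std::istream& in, const Props& p, bool noEval) {
  skipSeparators(in);
  if (in.peek() == '(') {
    in.get();
    bool r = evalOr(in, p, noEval);
    skipSeparators(in);
    if (in.peek() == ')') in.get();
    return r;
  }
  if (in.peek() == '!') {
    in.get();
    return !evalOperand(in, p, noEval);
  }
  std::string id;
  while (std::isalnum(in.peek()) || in.peek() == '_')
    id += static_cast<char>(in.get());
  skipSeparators(in);
  char op = static_cast<char>(in.get());
  double limit = 0;
  in >> limit;
  if (noEval) return false;  // parsed, but deliberately not evaluated
  auto it = p.find(id);
  if (it == p.end()) return false;
  return op == '>' ? it->second > limit : it->second < limit;
}

// AND level: operands joined by "&", "&&" or nothing (implicit AND).
bool evalAnd(std::istream& in, const Props& p, bool noEval) {
  bool result = evalOperand(in, p, noEval);
  for (;;) {
    skipSeparators(in);
    int c = in.peek();
    if (c == std::istream::traits_type::eof() || c == '|' || c == ')')
      return result;
    while (in.peek() == '&') in.get();
    // Once the result is false, keep parsing but stop evaluating.
    bool rhs = evalOperand(in, p, noEval || !result);
    result = result && rhs;
  }
}

// OR level: lowest precedence, so each side is a whole AND expression.
bool evalOr(std::istream& in, const Props& p, bool noEval) {
  bool result = evalAnd(in, p, noEval);
  for (;;) {
    skipSeparators(in);
    if (in.peek() != '|') return result;
    while (in.peek() == '|') in.get();
    // Once the result is true, keep parsing but stop evaluating.
    bool rhs = evalAnd(in, p, noEval || result);
    result = result || rhs;
  }
}

bool evalFilter(const std::string& text, const Props& p) {
  std::istringstream in(text);
  return evalOr(in, p, false);
}
```

In this sketch, "MW>200 | logP>5 & MW>300" is read as MW>200 | (logP>5 & MW>300), so a left operand that is already true makes the whole expression true while the rest is still consumed from the stream.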
To use a particular descriptor, like logP, when programming with the API, use code like the following:
    OBDescriptor* pDescr = OBDescriptor::FindType("logP");
    if(pDescr) {
      // mol is a pointer to the OBBase-derived object, e.g. an OBMol
      double val = pDescr->Predict(mol, param);
    }
To add the descriptor ID and the predicted data to OBPairData attached to the object, use PredictAndSave().
Descriptors can have a string parameter, which each descriptor can interpret as it wants, for instance as multiple numeric values. The parameter is in brackets after the descriptor name, e.g. popcount(FP4). In the above programming example, param is a pointer to a std::string which has a default value of NULL, meaning no parameter. GetStringValue() and Compare() take their parameter in the same way.
To parse a string for descriptors use GetIdentifier(), which returns both the ID and the parameter, if there is one.
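As an illustration of the ID-plus-parameter syntax, the following standalone sketch splits text such as popcount(FP4) into its two parts. splitIdentifier is a hypothetical stand-in written for this page and is not the body of GetIdentifier(); the character set accepted in an ID (letters, digits and underscores) follows the description of FilterCompare().

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in: reads "ID" or "ID(param)" from the stream and
// returns {ID, param}. param is empty when no parameter is present.
std::pair<std::string, std::string> splitIdentifier(std::istream& in) {
  std::string id, param;
  in >> std::ws;  // skip leading whitespace
  while (std::isalnum(in.peek()) || in.peek() == '_')
    id += static_cast<char>(in.get());
  if (in.peek() == '(') {
    in.get();                      // discard '('
    std::getline(in, param, ')');  // read up to ')' and discard it
  }
  return {id, param};
}
```

After the call the stream is positioned just past the identifier, so for the input "popcount(FP4)>3" the next character available is '>'.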
This facility can be called from the command line. Use the option --add "descriptor list", which will add the requested descriptors to the molecule. They are then visible as properties in SDF and CML formats. The IDs in the list can be separated by spaces or commas. All descriptors provide an output value as a string through a virtual function, GetStringValue(OBBase* pOb, string& svalue), which assigns to svalue either the value of a string descriptor (like inchi) or a string representation of a numerical property like logP.
The classes MWFilter and TitleFilter illustrate the code that has to be provided for numerical and non-numerical descriptors.
typedef std::map<const char*, OBPlugin*, CharPtrLess> PluginMapType [inherited]
const char* TypeID () [inline, virtual]
Redefined by each plugin type: "formats", "fingerprints", etc.
Reimplemented from OBPlugin.
virtual double Predict (OBBase *, std::string * = NULL) [inline, virtual]
Reimplemented in OBGroupContrib.
double PredictAndSave (OBBase * pOb, std::string * param = NULL)
Referenced by OBDescriptor::AddProperties().
double GetStringValue (OBBase * pOb, std::string & svalue, std::string * param = NULL) [virtual]
For non-numeric descriptors, assigns the string value to svalue and returns NaN. For numeric descriptors, assigns a string representation of the value to svalue and returns the numeric value.
This default version provides a string representation of the numeric value.
Referenced by OBDescriptor::GetValues().
bool Compare (OBBase * pOb, std::istream & ss, bool noEval, std::string * param = NULL) [virtual]
Parses the filter stream for a relational expression and returns its result when applied to the chemical object.
Compare() is a virtual function and can be overridden to allow different comparison behaviour. The default implementation here is suitable for OBDescriptor classes which return a double value. The stringstream is parsed to retrieve a comparison operator, one of > < >= <= = == !=, and a numerical value. The function compares this number with the value returned by Predict() and returns the result. The stringstream is left positioned after the number, and its state reflects whether any errors have occurred. If noEval is true, the parsing is as normal but Predict() is not called and the function returns false.
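The behaviour described for the default comparison can be sketched without Open Babel as follows. compareNumber is a hypothetical helper: it takes the already-computed value in place of a call to Predict(), and signalling malformed input with failbit is an assumption of this sketch.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the default numeric comparison: reads an
// operator and a number from the stream and applies them to `value`.
bool compareNumber(std::istream& in, double value, bool noEval) {
  in >> std::ws;
  std::string op;
  while (in.peek() == '>' || in.peek() == '<' ||
         in.peek() == '=' || in.peek() == '!')
    op += static_cast<char>(in.get());
  double rhs = 0;
  if (!(in >> rhs)) return false;  // no number: the stream is now failed
  if (noEval) return false;        // parse only, do not compare
  if (op == ">")  return value > rhs;
  if (op == "<")  return value < rhs;
  if (op == ">=") return value >= rhs;
  if (op == "<=") return value <= rhs;
  if (op == "=" || op == "==") return value == rhs;
  if (op == "!=") return value != rhs;
  in.setstate(std::ios::failbit);  // unknown operator
  return false;
}
```

In both the evaluating and the noEval case, the stream ends up just after the number, so a caller can continue parsing the next term of a filter string.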
Referenced by OBDescriptor::FilterCompare().
bool Display (std::string & txt, const char * param, const char * ID = NULL) [virtual]
Writes information on a plugin class to the string txt. If the parameter is a descriptor ID, displays the verbose description for that descriptor only, e.g. babel -L descriptors HBA1
Reimplemented from OBPlugin.
virtual bool Order (double p1, double p2) [inline, virtual]
Comparison of the values of the descriptor. Used in sorting. Descriptors may use more complicated ordering than this default (e.g. InChIFilter).
virtual bool Order (std::string s1, std::string s2) [inline, virtual]
bool FilterCompare (OBBase * pOb, std::istream & optionText, bool noEval) [static]
Interprets the --filter option string and returns the combined result of all the comparisons it contains.
The string has the form:
    PropertyID1 predicate1 [booleanOp] PropertyID2 predicate2 ...
The propertyIDs are the IDs of instances of an OBDescriptor class or the attributes of OBPairData, and contain only letters, numbers and underscores. The predicates must start with a punctuation character and are interpreted by the Compare function of the OBDescriptor class. The default implementation expects a comparison operator and a number, e.g. >=1.3. Whitespace is optional and is ignored. Each predicate and this OBBase object (usually an OBMol) is passed to the Compare function of an OBDescriptor.
The results of the comparisons are combined in a boolean expression (which can include parentheses) in the normal way. The AND operator can be & or &&, the OR operator can be | or ||, and the unary NOT is !. The expected operator precedence is achieved using recursive calls of the function. If there is no boolean operator, all the tests have to return true for the function to return true, i.e. the default is AND. If the first operand of an AND is false, or the first operand of an OR is true, the parsing of the second operand continues but no comparisons are done, since the result does not matter.
void AddProperties (OBBase * pOb, const std::string & DescrList) [static]
Reads a list of descriptor IDs and calls PredictAndSave() for each.
void DeleteProperties (OBBase * pOb, const std::string & DescrList) [static]
Deletes all the OBPairData entries whose attribute names are in the list (if they exist).
string GetValues (OBBase * pOb, const std::string & DescrList) [static]
Reads a list of descriptor IDs and OBPairData names and returns a list of values, each preceded by a space, or by the first character in the list if that character is whitespace or punctuation.
pair< string, string > GetIdentifier (std::istream & optionText) [static]
Reads an identifier and its parameter from the filter string.
double ParsePredicate (std::istream & optionText, char & ch1, char & ch2, std::string & svalue) [static, protected]
Reads a comparison operator and the string that follows it. Returns the numeric value of the string if it can be converted, otherwise NaN.
bool ReadStringFromFilter (std::istream & ss, std::string & result) [static, protected]
Reads a string from the filter stream, optionally preceded by = or !=.
On entry the stringstream position should be just after the ID. On exit it is after the string. If there is an error, the stringstream badbit is set. Returns false if != is found, to indicate negation. The text can be in any of the following forms:
    mystring  =mystring  ==mystring    [must be terminated by a space or tab]
    "mystring"  'mystring'  ="mystring"  ='mystring'    [mystring can contain spaces or tabs]
    !=mystring  !="mystring"    [returns false, indicating negation]
There can be spaces or tabs after the operator (=, == or !=).
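The accepted forms can be exercised with a standalone sketch. readFilterString is a hypothetical helper written for this page, not the library function; unlike the description above, it does not set badbit on errors, and its handling of edge cases is an assumption.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in: reads [= | == | !=] followed by a quoted or
// unquoted string. Returns false when the operator is "!=".
bool readFilterString(std::istream& in, std::string& result) {
  result.clear();
  bool positive = true;
  in >> std::ws;
  if (in.peek() == '!') {  // "!=" means negate
    in.get();
    positive = false;
  }
  while (in.peek() == '=') in.get();  // accept "=" or "=="
  in >> std::ws;                      // spaces or tabs after the operator
  int quote = in.peek();
  if (quote == '"' || quote == '\'') {
    in.get();
    // Quoted text runs to the matching quote and may contain spaces.
    std::getline(in, result, static_cast<char>(quote));
  } else {
    in >> result;  // unquoted text ends at whitespace
  }
  return positive;
}
```

For example, the input !="my string" yields the result my string with a return value of false, while == foo yields foo with a return value of true.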
bool CompareStringWithFilter (std::istream & optionText, std::string & s, bool noEval, bool NoCompOK = false) [static, protected]
Compares a provided string with a string read from the filter stream, using the operator that is also read from the stream.
static bool ispunctU (char ch) [inline, static, protected]
bool MatchPairData (OBBase * pOb, std::string & s) [static, protected]