OBSmartsPattern Class Reference
[Substructure Searching]

SMARTS (SMiles ARbitrary Target Specification) substructure searching.

#include <openbabel/parsmart.h>

Public Types

enum  MatchType { All, Single, AllUnique }

Public Member Functions

 OBSmartsPattern ()
virtual ~OBSmartsPattern ()
 OBSmartsPattern (const OBSmartsPattern &cp)
OBSmartsPattern & operator= (const OBSmartsPattern &cp)
void WriteMapList (std::ostream &)
Initialization Methods
bool Init (const char *pattern)
bool Init (const std::string &pattern)
Pattern Properties
const std::string & GetSMARTS () const
std::string & GetSMARTS ()
bool Empty () const
bool IsValid () const
unsigned int NumAtoms () const
unsigned int NumBonds () const
void GetBond (int &src, int &dst, int &ord, int idx)
int GetAtomicNum (int idx)
int GetCharge (int idx)
int GetVectorBinding (int idx) const
Matching methods (SMARTS on a specific OBMol)
bool Match (OBMol &mol, bool single=false)
bool Match (OBMol &mol, std::vector< std::vector< int > > &mlist, MatchType mtype=All) const
bool HasMatch (OBMol &mol) const
bool RestrictedMatch (OBMol &mol, std::vector< std::pair< int, int > > &pairs, bool single=false)
bool RestrictedMatch (OBMol &mol, OBBitVec &bv, bool single=false)
unsigned int NumMatches () const
std::vector< std::vector< int > > & GetMapList ()
std::vector< std::vector< int > >::iterator BeginMList ()
std::vector< std::vector< int > >::iterator EndMList ()
std::vector< std::vector< int > > & GetUMapList ()

Protected Member Functions

Pattern * ParseSMARTSPattern (void)
Pattern * ParseSMARTSPart (Pattern *, int)
Pattern * SMARTSError (Pattern *pat)
Pattern * ParseSMARTSError (Pattern *pat, BondExpr *expr)
AtomExpr * ParseSimpleAtomPrimitive (void)
AtomExpr * ParseComplexAtomPrimitive (void)
AtomExpr * ParseAtomExpr (int level)
BondExpr * ParseBondPrimitive (void)
BondExpr * ParseBondExpr (int level)
Pattern * ParseSMARTSString (char *ptr)
Pattern * ParseSMARTSRecord (char *ptr)
int GetVectorBinding ()
Pattern * SMARTSParser (Pattern *pat, ParseState *stat, int prev, int part)

Protected Attributes

OBSmartsPrivate * _d
std::vector< bool > _growbond
std::vector< std::vector< int > > _mlist
Pattern * _pat
std::string _str
char * _buffer
char * LexPtr
char * MainPtr

Detailed Description

SMARTS (SMiles ARbitrary Target Specification) substructure searching.

Substructure search is a core tool in a small-molecule programming library. An efficient substructure search engine reduces the amount of hard-coded logic needed for molecule perception and makes certain operations more flexible. For instance, atom typing can be performed with hard-coded rules based on element type and bond order (or hybridization). Alternatively, atom typing can be done by matching a set of substructure rules read at run time, in which case customizing for an application (such as changing the pH) becomes straightforward. Roger Sayle donated a SMARTS parser, which became the basis for SMARTS matching in Open Babel.

For more information on the SMARTS support in Open Babel, see the wiki page: http://openbabel.org/wiki/SMARTS

The SMARTS matcher, or OBSmartsPattern, is a separate object which can match patterns in the OBMol class. The following code demonstrates how to use the OBSmartsPattern class:

    OBMol mol;
    ...
    OBSmartsPattern sp;
    sp.Init("CC");
    sp.Match(mol);
    vector<vector<int> > maplist;
    maplist = sp.GetMapList();
    //or maplist = sp.GetUMapList();
    //print out the results
    vector<vector<int> >::iterator i;
    vector<int>::iterator j;
    for (i = maplist.begin();i != maplist.end();++i)
    {
      for (j = i->begin();j != i->end();++j)
        cout << *j << ' ';
      cout << endl;
    }

The preceding code reads in a molecule, initializes a SMARTS pattern of two single-bonded carbons, and locates all instances of the pattern in the molecule. Note that calling the Match() function does not return the results of the substructure match. The results from a match are stored in the OBSmartsPattern, and a call to GetMapList() or GetUMapList() must be made to extract the results. The function GetMapList() returns all matches of a particular pattern while GetUMapList() returns only the unique matches. For instance, the pattern [OD1]~C~[OD1] describes a carboxylate group. This pattern will match both atom number permutations of the carboxylate, and if GetMapList() is called, both matches will be returned. If GetUMapList() is called only unique matches of the pattern will be returned. A unique match is defined as one which does not cover the identical atoms that a previous match has covered.
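
As a rough illustration of the uniqueness rule described above, the following standalone sketch filters a match list so that only the first match covering each distinct set of atoms is kept. UniqueMatches is a hypothetical helper, not part of Open Babel; it assumes only that each match is a vector of atom indices, and Open Babel's own implementation of GetUMapList() may differ in its details.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical helper (not an Open Babel function): keep only the first
// match seen for each distinct set of atom indices.
std::vector<std::vector<int> > UniqueMatches(
    const std::vector<std::vector<int> > &matches)
{
  std::set<std::vector<int> > seen;      // sorted atom sets already covered
  std::vector<std::vector<int> > unique;
  for (std::size_t i = 0; i < matches.size(); ++i)
  {
    std::vector<int> key(matches[i]);
    std::sort(key.begin(), key.end());   // order-independent key for the atom set
    if (seen.insert(key).second)         // true only when the atom set is new
      unique.push_back(matches[i]);      // keep the match in its original atom order
  }
  return unique;
}
```

With a carboxylate-style input such as {3, 2, 4} and {4, 2, 3}, both matches cover the same three atoms, so only the first would be kept by this sketch.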


Member Enumeration Documentation

enum MatchType
Enumerator:
All 
Single 
AllUnique 

Constructor & Destructor Documentation

OBSmartsPattern (  ) [inline]
~OBSmartsPattern (  ) [virtual]
OBSmartsPattern ( const OBSmartsPattern & cp ) [inline]

Member Function Documentation

Pattern * ParseSMARTSPattern ( void   ) [protected]
Pattern * ParseSMARTSPart ( Pattern * result,
int  part 
) [protected]
Pattern * SMARTSError ( Pattern * pat ) [protected]
Pattern * ParseSMARTSError ( Pattern * pat,
BondExpr * expr 
) [protected]
AtomExpr * ParseSimpleAtomPrimitive ( void   ) [protected]
AtomExpr * ParseComplexAtomPrimitive ( void   ) [protected]
AtomExpr * ParseAtomExpr ( int  level ) [protected]
BondExpr * ParseBondPrimitive ( void   ) [protected]
BondExpr * ParseBondExpr ( int  level ) [protected]
Pattern * ParseSMARTSString ( char *  ptr ) [protected]
Pattern * ParseSMARTSRecord ( char *  ptr ) [protected]
int GetVectorBinding (  ) [protected]
Pattern * SMARTSParser ( Pattern * pat,
ParseState * stat,
int  prev,
int  part 
) [protected]
OBSmartsPattern& operator= ( const OBSmartsPattern & cp ) [inline]
bool Init ( const char *  pattern )
bool Init ( const std::string &  pattern )

Parse the pattern SMARTS string.

Returns:
Whether the pattern is a valid SMARTS expression
const std::string& GetSMARTS (  ) const [inline]
Returns:
the SMARTS string which is currently used
std::string& GetSMARTS (  ) [inline]
Returns:
the SMARTS string which is currently used
bool Empty (  ) const [inline]
Returns:
Whether the SMARTS pattern is an empty expression (e.g., invalid)
bool IsValid (  ) const [inline]
Returns:
Whether the SMARTS pattern is a valid expression
unsigned int NumAtoms (  ) const [inline]
Returns:
the number of atoms in the SMARTS pattern

Referenced by OBPhModel::ParseLine().

unsigned int NumBonds (  ) const [inline]
Returns:
the number of bonds in the SMARTS pattern
void GetBond ( int &  src,
int &  dst,
int &  ord,
int  idx 
)

Access the bond idx in the internal pattern

Parameters:
src  The index of the beginning atom
dst  The index of the end atom
ord  The bond order of this bond
idx  The index of the bond in the SMARTS pattern
int GetAtomicNum ( int  idx )
Returns:
the atomic number of the atom idx in the internal pattern
int GetCharge ( int  idx )
Returns:
the formal charge of the atom idx in the internal pattern
int GetVectorBinding ( int  idx ) const [inline]
Returns:
the vector binding of the atom idx in the internal pattern
bool Match ( OBMol & mol,
bool  single = false 
)

Perform SMARTS matching for the pattern specified using Init().

Parameters:
mol  The molecule to use for matching
single  Whether only a single match is required (faster). Default is false.
Returns:
Whether matches occurred

Referenced by OBBondTyper::AssignFunctionalGroupBonds(), OpenBabel::CorrectBadResonanceForm(), OBMol::DoTransformations(), OBAtom::MatchesSMARTS(), and OBMol::NewPerceiveKekuleBonds().

bool Match ( OBMol & mol,
std::vector< std::vector< int > > &  mlist,
MatchType  mtype = All 
) const

Perform SMARTS matching for the pattern specified using Init(). This version is (more) thread safe.

Parameters:
mol  The molecule to use for matching
mlist  The resulting match list
mtype  The match type to use. Default is All.
Returns:
Whether matches occurred
bool HasMatch ( OBMol & mol ) const

Thread safe check for any SMARTS match

Parameters:
mol  The molecule to use for matching
Returns:
Whether there exists any match
bool RestrictedMatch ( OBMol & mol,
std::vector< std::pair< int, int > > &  pairs,
bool  single = false 
)
bool RestrictedMatch ( OBMol & mol,
OBBitVec & bv,
bool  single = false 
)
unsigned int NumMatches (  ) const [inline]
Returns:
the number of non-unique SMARTS matches. To get the number of unique SMARTS matches, query GetUMapList().size()
std::vector<std::vector<int> >& GetMapList (  ) [inline]
Returns:
the entire list of non-unique matches for this pattern
See also:
GetUMapList()

Referenced by OBRotorRules::GetRotorIncrements().

std::vector<std::vector<int> >::iterator BeginMList (  ) [inline]
Returns:
An iterator over the (non-unique) match list, starting at the beginning
std::vector<std::vector<int> >::iterator EndMList (  ) [inline]
Returns:
An iterator over the non-unique match list, set to the end
std::vector< std::vector< int > > & GetUMapList (  )
Returns:
the entire list of unique matches for this pattern. A unique match is defined as one which does not cover the identical atoms that a previous match has covered.

For instance, the pattern [OD1]~C~[OD1] describes a carboxylate group. This pattern will match both atom number permutations of the carboxylate, and if GetMapList() is called, both matches will be returned. If GetUMapList() is called only unique matches of the pattern will be returned.

Referenced by OBBondTyper::AssignFunctionalGroupBonds(), OpenBabel::CorrectBadResonanceForm(), and OBAtom::MatchesSMARTS().

void WriteMapList ( std::ostream &  ofs )

Debugging -- write a list of matches to the output stream.


Member Data Documentation

OBSmartsPrivate* _d [protected]

Internal data storage for future expansion.

std::vector<bool> _growbond [protected]
Deprecated:
(Not used)
std::vector<std::vector<int> > _mlist [protected]

The list of matches.

Pattern* _pat [protected]

The parsed SMARTS pattern.

std::string _str [protected]

The string of the SMARTS expression.

Referenced by OBSmartsPattern::operator=().

char* _buffer [protected]
char* LexPtr [protected]
char* MainPtr [protected]
