Fastsearch format (fs)#
Fingerprint-aided substructure and similarity searching
Writing to the fs format makes an index of a multi-molecule datafile:
obabel dataset.sdf -ofs
This prepares an index dataset.fs
with default parameters, and is slow
(~30 minutes for a 250,000 molecule file).
However, when reading from the fs format searches are much faster, a few seconds, and so can be done interactively.
The search target is the parameter of the -s
option and can be
slightly extended SMILES (with [#n]
atoms and ~
bonds) or
the name of a file containing a molecule.
Several types of searches are possible:
Identical molecule:
obabel index.fs -O outfile.yyy -s SMILES exact
Substructure:
obabel index.fs -O outfile.yyy -s SMILES or obabel index.fs -O outfile.yyy -s filename.xxx
where
xxx
is a format id known to OpenBabel, e.g. sdfMolecular similarity based on Tanimoto coefficient:
obabel index.fs -O outfile.yyy -at15 -sSMILES # best 15 molecules obabel index.fs -O outfile.yyy -at0.7 -sSMILES # Tanimoto >0.7 obabel index.fs -O outfile.yyy -at0.7,0.9 -sSMILES # Tanimoto >0.7 && Tanimoto < 0.9
The datafile plus the -ifs
option can be used instead of the index file.
NOTE on 32-bit systems the datafile MUST NOT be larger than 4GB.
Dative bonds like -[N+][O-](=O) are indexed as -N(=O)(=O), and when searching the target molecule should be in the second form.
Read Options#
- -t <num>
Do similarity search:<num>mols or <num> as min Tanimoto
- -a
Add Tanimoto coeff to title in similarity search
- -l <num>
Maximum number of candidates. Default<4000>
- -e
Exact match
Alternative to using exact in
-s
parameter, see above- -n
No further SMARTS filtering after fingerprint phase
Write Options#
- -f <num>
Fingerprint type
If not specified, the default fingerprint (currently FP2) is used
- -N <num>
Fold fingerprint to <num> bits
- -u
Update an existing index