Working with Generic Data

Since:
2.3

Introduction

Generic data is the mechanism OpenBabel uses to attach additional information to objects, usually molecules, atoms or bonds (OBMol, OBAtom and OBBond). The data can be almost anything, since OBPairTemplate can store any datatype (classes have to be copyable, though). For example, a file format may contain strings or numbers for each molecule (e.g. QM energy, biological activity, chemical supplier and price), and these can be stored in the OBMol object. When the file format is used to read a file, the program can then access and use this data.

A concrete example is the PDB file format, which specifies a large number of protein-specific data types. All data which cannot be stored using the API is stored as strings in the OBMol object. A program (e.g. a 3D molecular viewer) can retrieve the data (e.g. secondary structure) and use it. It would not be practical to add dedicated API methods for every such data type.

Design

There are two abstract classes defining the interfaces. The first, OBGenericData, makes it possible to work with derived classes without knowing anything about the data itself. It contains methods (OBGenericData::SetAttribute and OBGenericData::GetAttribute) for associating the data with a name. In terms of a std::map<std::string, T> analogy, the attribute is the key for the data T. GetValue always returns a std::string, and derived classes should convert their data to a string when possible; returning an empty string is acceptable, though. The second class, OBBase, defines an interface to store, retrieve and remove OBGenericData objects by attribute, type or source. To extend the std::map analogy, classes derived from OBBase are the map.
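The std::map analogy can be made concrete with a small stand-alone sketch. The classes below (GenericData, StringData, Base) and the lookup helper are hypothetical stand-ins written for illustration only; they are not the OpenBabel classes, and the real OBGenericData and OBBase differ in detail (for example, this sketch supports lookup by attribute only, not by type or source).

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for OBGenericData: a named piece of data
// that can always be viewed as a string.
class GenericData {
public:
  virtual ~GenericData() {}
  void SetAttribute(const std::string &name) { attr_ = name; }
  const std::string &GetAttribute() const { return attr_; }
  // Derived classes return a string form of their data; the
  // fallback here is an empty string.
  virtual std::string GetValue() const { return std::string(); }
private:
  std::string attr_;
};

// Hypothetical stand-in for a string-valued data class.
class StringData : public GenericData {
public:
  void SetValue(const std::string &v) { value_ = v; }
  std::string GetValue() const override { return value_; }
private:
  std::string value_;
};

// Hypothetical stand-in for OBBase: "the map". It owns the data
// objects and finds them again by attribute.
class Base {
public:
  void SetData(std::unique_ptr<GenericData> d) {
    const std::string key = d->GetAttribute();  // copy the key before moving d
    data_[key] = std::move(d);
  }
  bool HasData(const std::string &attr) const {
    return data_.count(attr) != 0;
  }
  GenericData *GetData(const std::string &attr) const {
    auto it = data_.find(attr);
    return it == data_.end() ? nullptr : it->second.get();
  }
  bool DeleteData(const std::string &attr) {
    return data_.erase(attr) != 0;
  }
private:
  std::map<std::string, std::unique_ptr<GenericData>> data_;
};

// Works through the base interface only: no knowledge of the
// concrete data type is needed to get a string out.
std::string lookup(const Base &obj, const std::string &attr) {
  const GenericData *d = obj.GetData(attr);
  return d ? d->GetValue() : std::string();
}
```

The point of the sketch is the division of labour: the data object carries its own key (the attribute), and the container only needs the base interface to store, find and remove it.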

Storing strings and numbers

In many cases storing strings and numbers is all you need. Strings can be stored using the OBPairData class. For numbers there are OBPairInteger and OBPairFloatingPoint. Although the interface is almost the same for these classes, separate examples are given to make them easier to copy and paste.

Storing and retrieving a string:

 // storing a string
 OBPairData *supplier = new OBPairData;
 supplier->SetAttribute("supplier"); // the name or key for the data
 supplier->SetValue("some supplier name/id"); // reading from a file for example
 mol.SetData(supplier);

 // retrieve the string by attribute
 if (mol.HasData("supplier")) {
   OBPairData *supplier = dynamic_cast<OBPairData*>(mol.GetData("supplier"));
   cout << "supplier: " << supplier->GetValue() << endl;
 }

Storing and retrieving an integer:

 // storing an integer
 OBPairInteger *data = new OBPairInteger;
 data->SetAttribute("numAromRings"); // the name or key for the data
 data->SetValue(numAromRings); // computed before
 mol.SetData(data);

 // retrieve the integer by attribute
 if (mol.HasData("numAromRings")) {
   OBPairInteger *data = dynamic_cast<OBPairInteger*>(mol.GetData("numAromRings"));
   cout << "number of aromatic rings: " << data->GetGenericValue() << endl;
 }

There is a small difference between strings and numbers: GetValue always returns a string, so numeric values are read with GetGenericValue instead. OBPairInteger and OBPairFloatingPoint are actually typedefs for OBPairTemplate, which defines the appropriate GetGenericValue method to return the numeric data type.
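The relationship between the typedefs and the template can be sketched in isolation. The names below (GenericData, PairTemplate, PairInteger, PairFloatingPoint) are hypothetical stand-ins that mirror the description above, not the OpenBabel declarations; in particular, returning an empty string from GetValue is an assumption made for this sketch.

```cpp
#include <string>

// Hypothetical stand-in for OBGenericData.
class GenericData {
public:
  virtual ~GenericData() {}
  // The string view of the data; empty in this sketch.
  virtual std::string GetValue() const { return std::string(); }
};

// Hypothetical stand-in for OBPairTemplate: stores one copyable
// value of any type and returns it with its real type.
template <class ValueT>
class PairTemplate : public GenericData {
public:
  PairTemplate() : value_() {}
  void SetValue(const ValueT &v) { value_ = v; }
  const ValueT &GetGenericValue() const { return value_; }
private:
  ValueT value_;
};

// The numeric classes are just instantiations of the template.
typedef PairTemplate<int> PairInteger;
typedef PairTemplate<double> PairFloatingPoint;
```

Because each instantiation is a distinct type, a pointer to the base class has to be cast back to the matching instantiation before GetGenericValue can be called, which is why the examples in this section use dynamic_cast. A dynamic_cast to the wrong instantiation yields a null pointer, so checking the result before dereferencing is a sensible precaution.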

Storing and retrieving a floating point value:

 // storing a floating point value
 OBPairFloatingPoint *data = new OBPairFloatingPoint;
 data->SetAttribute("activity"); // the name or key for the data
 data->SetValue(8.3); // computed before
 mol.SetData(data);

 // retrieve the floating point value by attribute
 if (mol.HasData("activity")) {
   OBPairFloatingPoint *data = dynamic_cast<OBPairFloatingPoint*>(mol.GetData("activity"));
   cout << "biological activity: " << data->GetGenericValue() << endl;
 }

Truly generic data using OBPairTemplate

Although there are a number of classes for specific data types, OBPairTemplate can often accomplish the same with less code. The second example below illustrates this; a simpler example is given first.

Storing a list of suppliers in an OBMol object:

 typedef OBPairTemplate< std::vector<std::string> > SupplierData;
 // storing the supplier list
 SupplierData *data = new SupplierData;
 data->SetAttribute("suppliers");
 data->SetValue(suppliers);
 mol.SetData(data);

 // retrieve the supplier list
 if (mol.HasData("suppliers")) {
   SupplierData *data = dynamic_cast<SupplierData*>(mol.GetData("suppliers"));
   const std::vector<std::string> &suppliers = data->GetGenericValue();
   for (unsigned int i = 0; i < suppliers.size(); ++i)
     cout << suppliers[i] << endl;
 }

Storing complex data in an OBMol object:

 // data representation struct
 struct MyDataRepr {
   double value, error;
   string unit;
 };
 typedef OBPairTemplate< MyDataRepr > MyData;

 // storing the complex data
 MyData *data = new MyData;
 data->SetAttribute("mydata");
 MyDataRepr repr;
 repr.value = 5.3;
 repr.error = 0.3;
 repr.unit = "kJ/mol";
 data->SetValue(repr);
 mol.SetData(data);

 // retrieve the complex data
 if (mol.HasData("mydata")) {
   MyData *data = dynamic_cast<MyData*>(mol.GetData("mydata"));
   const MyDataRepr &repr = data->GetGenericValue();
   cout << repr.value << " +/- " << repr.error << " " << repr.unit << endl;
 }

Specific data types

A number of specific OBGenericData subclasses are provided for frequently used data types: AliasData, OBAngleData, OBAtomClassData, OBChiralData, OBCommentData, OBConformerData, OBDOSData, OBElectronicTransitionData, OBExternalBondData, OBGridData, OBMatrixData, OBNasaThermoData, OBOrbitalEnergyData, OBPairData, OBRateData, OBRingData, OBRotamerList, OBRotationData, OBSerialNums, OBSetData, OBStereoBase, OBSymmetryData, OBTorsionData, OBUnitCell, OBVectorData, OBVibrationData, OBVirtualBond. Consult the documentation for these classes for more information.

Generic data & file formats

Various file formats read and write generic data. This section contains an overview of the data used by file formats. When adding or extending a file format it is highly recommended to update this section.

Read data ordered by data type

This section only contains information on data types used in a similar way by at least two file formats.

OBUnitCell:

Read data ordered by format

adfformat: *.adfout

cacaoformat: *.caccrt

carformat: *.car *.arc

chemkinformat: *.ck
