Chemical Registration Systems
Chemical Registration is the “big brother” of cheminformatics.
A cheminformatics system is primarily devoted to recording chemical
structure. Chemical Registration systems are additionally concerned
with:
- Structural novelty - ensure that each compound is only
registered once
- Structural normalization - ensure that structures with
alternative representations (such as nitro groups, ferrocenes, and
tautomers) are entered in a uniform way.
- Structure drawing - ensure that compounds are drawn in a uniform
fashion, so that they can be quickly recognized “by eye”.
- Maintaining relationships among related compounds. For example,
all salt forms of a compound should be recognized as being related
to one another, and compounds in different solvates are also
related.
- Registering mixtures, formulations and alternative structures.
- Registering compounds the structure of which is unknown.
- Roles, responsibilities, security, and company workflow.
- Updates, amendments and corrections, and controlling propagation
of changes (e.g. does changing a compound change a mixture
containing that compound?)
The scope of Chemical Registration Systems is far beyond the goals
of this brief introduction to cheminformatics. However, to
illustrate just one of the points above, let’s consider structural
novelty. In real life, chemical structure can be very ambiguous.
Imagine you have five bottles of a particular compound that has a
stereo center:
- The contents of the first bottle were carefully analyzed, and
found to be a single stereoisomer.
- The contents of the second bottle were carefully analyzed and
found to contain a racemic mixture of the stereoisomers.
- The stereoisomers of the third bottle are unknown. It may be
pure, or have one predominant form, or be a racemic mixture.
- The fourth bottle was obtained by running part of the contents
of bottle #2 through a chromatographic separation. It is
isotopically pure, but you don’t know which stereoisomer.
- The fifth bottle is the other fraction from the same separation
of #4. It is also isotopically pure, but you don’t know which
stereoisomer, but you know it’s the opposite of #4.
Which of these five bottles contain the same compound, and which
are different? That is the essential task of a chemical registry
system, which would consider all five to be different. After all,
you probably have data about each bottle (that’s why you have
them), and you must be able to record it and not confuse it with
the other bottles.
In this example above, consider what is known and not known:
Bottle |
Known |
Not Known |
1 |
Everything |
Nothing |
2 |
Everything |
Nothing |
3 |
Compound is known |
Stereochemistry |
4 |
Compound and purity known, stereochemistry is opposite of #5 |
Specific stereochemistry |
5 |
Compound and purity known, stereochemistry is opposite of #4 |
Specific stereochemistry |
A cheminformatics system has no way to record the contents of the
five bottles; it is only concerned with structure. By contrast, a
chemical registration system can record both what is known as
well as what is not known. This is the critical difference
between the two.