1João Aires de Sousa
Classification of chemical reactions
Strasbourg Summer School on Chemoinformatics, Obernai, 22-25 June 2008
João Aires-de-Sousa, Universidade Nova de Lisboa, Portugal
Why do we need to classify reactions?
- Retrieval of reactions from databases
- Linking of reaction information from different sources
- Construction of knowledge bases for reaction prediction and synthesis design
- Automatic procedures for analyses and correlations in databases
- In bioinformatics: the reconstruction of metabolic pathways from genomes requires the classification of enzymatic reactions.
Representation of reactions
1. Representation of the reaction center
- A bond is defined as a reaction center if it is made or broken.
- An atom is defined as a reaction center if it changes its number of implicit hydrogens, number of valencies, number of π-electrons, or atomic charge, or if a bond connecting it is a reaction center.
2. Representation of the differences between the structures of products and reactants (an implicit representation of the reaction centers and their environments).
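The bond-based part of the reaction-center definition can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the reaction is already atom-mapped, bonds are keyed by pairs of map numbers, and the function name `reaction_centers` and the example mapping are hypothetical. Changes in hydrogens, valencies, π-electrons, or charge are not checked here.

```python
def bond(i, j):
    """Order-independent key for the bond between mapped atoms i and j."""
    return frozenset((i, j))


def reaction_centers(reactant_bonds, product_bonds):
    """Return (center_bonds, center_atoms) for an atom-mapped reaction.

    Each argument maps bond(i, j) -> bond order. Only made or broken bonds
    are detected; changes in bond order are ignored in this sketch.
    """
    # A bond present on one side only has been made or broken.
    center_bonds = set(reactant_bonds) ^ set(product_bonds)
    # Atoms attached to a reaction-center bond are reaction centers too.
    center_atoms = set()
    for b in center_bonds:
        center_atoms |= b
    return center_bonds, center_atoms


# Hypothetical mapping: C1-H2 + Cl3-C4(=O5) -> C1-C4(=O5) + H2-Cl3
reactants = {bond(1, 2): 1, bond(3, 4): 1, bond(4, 5): 2}
products = {bond(1, 4): 1, bond(4, 5): 2, bond(2, 3): 1}
center_bonds, center_atoms = reaction_centers(reactants, products)
```

The unchanged C=O bond (atoms 4 and 5) is not reported as a reaction-center bond, although atom 4 is a reaction-center atom through its other bonds.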
Representation of the reaction center
Manual assignment of the reaction center and atom-to-atom mapping
or
an algorithm for the automatic assignment of the reaction center and atom-to-atom mapping.
ClassCodes: structural information about reaction centers and immediate environment
Hashcodes are calculated for all reaction centers, taking into account atom properties:
- atom type
- valence state
- total number of bonded hydrogens (implicit plus explicitly drawn)
- number of π-electrons
- aromaticity
- formal charges
- reaction center information
The sum of all reaction center hashcodes of all reactants and one product of a reaction provides the unique reaction classification code: the ClassCode.
http://infochem.de/en/products/software/classify.shtml
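The idea of summing per-atom hashcodes can be illustrated with a short sketch. It is not the InfoChem algorithm, whose hash function is not given in the slides: SHA-256 truncated to 32 bits stands in for it, and the property dictionary layout and the names `center_hashcode` and `class_code` are assumptions made for illustration.

```python
import hashlib

# Property names follow the list on the slide; the dictionary layout is hypothetical.
PROPERTIES = ("atom_type", "valence_state", "hydrogens", "pi_electrons",
              "aromatic", "formal_charge", "center_info")


def center_hashcode(atom):
    """Hash the properties of one reaction-center atom to a 32-bit integer."""
    key = "|".join(str(atom[name]) for name in PROPERTIES)
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def class_code(center_atoms):
    """Sum the hashcodes; a sum does not depend on the order of the atoms."""
    return sum(center_hashcode(atom) for atom in center_atoms) % 2**32


# Hypothetical reaction-center atoms.
carbon = {"atom_type": "C", "valence_state": 4, "hydrogens": 0, "pi_electrons": 1,
          "aromatic": False, "formal_charge": 0, "center_info": "bond broken"}
chlorine = {"atom_type": "Cl", "valence_state": 1, "hydrogens": 0, "pi_electrons": 0,
            "aromatic": False, "formal_charge": 0, "center_info": "bond broken"}
code = class_code([carbon, chlorine])
```

Using a sum means the code stays the same however the reaction-center atoms are enumerated.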
Inclusion of atoms in the immediate environment (spheres):
- reaction centers only (0-sphere = BROAD)
- reaction centers + α-atoms (1-sphere = MEDIUM)
- reaction centers + β-atoms (2-sphere = NARROW)
Multiple occurrences of identical transformations are handled as one.
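Collecting the atoms of a given sphere is a breadth-first walk from the reaction centers. The sketch below assumes a molecule given as an adjacency dictionary; `atoms_within_sphere` and the five-atom chain are hypothetical and only illustrate how the three levels widen the environment.

```python
from collections import deque

# Sphere sizes for the three levels named on the slide.
LEVELS = {"BROAD": 0, "MEDIUM": 1, "NARROW": 2}


def atoms_within_sphere(adjacency, centers, sphere):
    """Atoms at most `sphere` bonds away from any reaction-center atom."""
    distance = {atom: 0 for atom in centers}
    queue = deque(centers)
    while queue:
        atom = queue.popleft()
        if distance[atom] == sphere:
            continue  # do not expand beyond the requested sphere
        for neighbor in adjacency[atom]:
            if neighbor not in distance:
                distance[neighbor] = distance[atom] + 1
                queue.append(neighbor)
    return set(distance)


# Hypothetical chain 1-2-3-4-5 with atom 1 as the only reaction center.
adjacency = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
environments = {name: atoms_within_sphere(adjacency, [1], n)
                for name, n in LEVELS.items()}
```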
RC (Reaction Classification) numbers
A reaction is decomposed into reactant pairs.
Each pair is then structurally aligned to identify the reaction center (R), the matched region (M), and the difference region (D).
The RC number represents the conversion patterns of atom types in these three regions.
M. Kotera, Y. Okuno, M. Hattori, S. Goto, M. Kanehisa, J. Am. Chem. Soc. 2004, 126, 16487-16498.
A list of 68 predefined atom types (e.g., C1a, C1b, C1c).
A list of numerical codes for conversion patterns of atom types, e.g.
1. C1a -> C1a
2. C1a -> C1b, C1b -> C1a
…
14. C1c -> C1c
…
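A lookup of conversion-pattern codes could look like the sketch below. Only the three codes shown on the slide are included; the rest of the table is not reproduced. Treating a code as direction-independent is a reading of the slide's entry 2, which lists both directions under one number, and the tuple returned by `rdm_codes` is an assumed layout, not the published RC number format.

```python
# Codes copied from the examples on the slide; the full table is not shown there.
PATTERN_CODES = {
    frozenset(["C1a"]): 1,          # C1a -> C1a
    frozenset(["C1a", "C1b"]): 2,   # C1a -> C1b or C1b -> C1a
    frozenset(["C1c"]): 14,         # C1c -> C1c
}


def pattern_code(type_before, type_after):
    """Numerical code of one atom-type conversion (raises KeyError if unknown)."""
    return PATTERN_CODES[frozenset([type_before, type_after])]


def rdm_codes(r_pair, d_pair, m_pair):
    """Codes for the reaction center (R), difference (D) and matched (M) atoms."""
    return tuple(pattern_code(before, after) for before, after in (r_pair, d_pair, m_pair))


# Hypothetical reactant pair.
codes = rdm_codes(("C1a", "C1b"), ("C1a", "C1a"), ("C1c", "C1c"))
```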
Application: automatic assignment of EC numbers to enzymatic reactions
A query reaction is classified based on the reactions in a database that share the same RC number. Different restrictions are possible (e.g., matching R, D, and M, or only R and D).
EC sub-subclasses could be assigned with an accuracy of about 90% (coverage: 62% of a data set).
KEGG website: http://www.genome.jp
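The classification step can be sketched as a lookup followed by a vote. The slides do not say how ties or conflicting hits are resolved, so the majority vote here is an assumption, and the small database of code tuples and EC numbers is entirely hypothetical.

```python
from collections import Counter


def predict_ec_sub_subclass(query_codes, database, use_m=True):
    """Predict the first three EC digits from reactions sharing the RC codes.

    database: list of ((r, d, m), "a.b.c.d") pairs.
    use_m=False restricts the match to the R and D codes only.
    Returns None when no database reaction matches (query not covered).
    """
    width = 3 if use_m else 2
    key = tuple(query_codes[:width])
    votes = Counter(
        ".".join(ec.split(".")[:3])
        for codes, ec in database
        if tuple(codes[:width]) == key
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical database entries, for illustration only.
database = [
    ((2, 1, 14), "1.1.1.1"),
    ((2, 1, 14), "1.1.1.2"),
    ((2, 1, 3), "2.3.1.5"),
    ((5, 5, 5), "3.1.1.1"),
]
```

Relaxing the match from RDM to RD increases coverage at the cost of specificity, which is consistent with the coverage figure being reported alongside the accuracy.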
Condensed Reaction Graphs (CRG)
Vladutz – Fujita – Varnek
Reactants and products are merged in an imaginary transition state or pseudo-compound.
Bond types are defined according to their fate in the reaction (‘no bond’ to single, single to double, double to ‘no bond’, …).
[Figure: example ‘pseudo-compound’]
A. Varnek, D. Fourches, F. Hoonakker, V. P. Solov’ev, J. Comput.-Aided Mol. Des. 2005, 19, 693–703
S. Fujita, J. Chem. Inf. Comput. Sci. 1986, 26, 205-212; J. Chem. Inf. Comput. Sci. 1987, 27, 99-104
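Merging the two sides of a mapped reaction into one graph amounts to labelling every atom pair with its bond order before and after the reaction. The sketch below assumes atom-mapped bond dictionaries and uses 0 for ‘no bond’; the function name and the mapped Diels-Alder example are illustrative and not taken from the cited papers.

```python
def condensed_reaction_graph(reactant_bonds, product_bonds):
    """Label each atom pair with (order_before, order_after); 0 means 'no bond'."""
    crg = {}
    for pair in set(reactant_bonds) | set(product_bonds):
        crg[pair] = (reactant_bonds.get(pair, 0), product_bonds.get(pair, 0))
    return crg


# Mapped Diels-Alder example: butadiene (atoms 1-4) + ethene (5, 6) -> cyclohexene.
reactants = {(1, 2): 2, (2, 3): 1, (3, 4): 2, (5, 6): 2}
products = {(1, 2): 1, (2, 3): 2, (3, 4): 1, (4, 5): 1, (5, 6): 1, (1, 6): 1}
crg = condensed_reaction_graph(reactants, products)
```

Here the two new ring bonds carry the label (0, 1), and the bond between atoms 2 and 3 carries (1, 2).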
The number of occurrences of a fragment is a descriptor.
Fragments are sequences of connected atoms of predefined size.
A fragment encodes atom types and bond types.
Descriptors can be used to assess the similarity between reactions, or for QSPR studies with reactions (just as with molecules).
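Fragment counting on a condensed graph can be sketched as an enumeration of simple paths. This is a simplified illustration, not the fragmentation scheme of the cited work: the size limits, the string spelling of a fragment, and the bond labels such as "2>1" are all assumptions.

```python
from collections import Counter


def fragment_counts(atom_types, crg_bonds, min_atoms=2, max_atoms=3):
    """Count linear fragments (simple paths) in a condensed reaction graph.

    atom_types: {atom: symbol}
    crg_bonds: {(i, j): label}, the label encoding the fate of the bond.
    """
    neighbors = {atom: [] for atom in atom_types}
    for (i, j), label in crg_bonds.items():
        neighbors[i].append((j, label))
        neighbors[j].append((i, label))

    counts = Counter()

    def extend(path, tokens):
        if len(path) >= min_atoms:
            # Use the smaller of the two reading directions as canonical spelling.
            key = min(tuple(tokens), tuple(reversed(tokens)))
            counts["".join(key)] += 1
        if len(path) == max_atoms:
            return
        for nxt, label in neighbors[path[-1]]:
            if nxt not in path:
                extend(path + [nxt], tokens + ["(" + label + ")", atom_types[nxt]])

    for atom in atom_types:
        extend([atom], [atom_types[atom]])
    # Every path is discovered once from each end, so halve the counts.
    return {fragment: n // 2 for fragment, n in counts.items()}


# Hypothetical three-atom graph: a C=C bond reduced to single, a C-O bond formed.
descriptors = fragment_counts({1: "C", 2: "C", 3: "O"},
                              {(1, 2): "2>1", (2, 3): "0>1"})
```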
Fingerprints of enzymatic reaction features
Mitchell lab
N. M. O’Boyle, G. L. Holliday, D. E. Almonacid, J. B. O. Mitchell, J. Mol. Biol. 2007, 368, 1484–1499
58 features of a reaction are used as reaction descriptors:
- number of reactants, number of products minus number of reactants, number of cycles in products minus number of cycles in reactants
- number of times a bond type is involved in the reaction (21 bond types)
- involvement of cofactors
- total number of each type of bond order change: bond formation, bond cleavage, changes in order from 1 to 2, 2 to 1, 3 to 2
- charge changes by atom type
- involvement of radicals
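A handful of these features can be computed directly from a list of bond-order changes. The sketch below covers only seven of the 58 features; the feature names and the input format are assumptions, not the layout used in the cited paper.

```python
from collections import Counter


def reaction_features(n_reactants, n_products, bond_changes):
    """A small subset of reaction features.

    bond_changes: list of (order_before, order_after) pairs, 0 meaning no bond.
    """
    changes = Counter(bond_changes)
    formed = sum(n for (before, after), n in changes.items() if before == 0 and after > 0)
    cleaved = sum(n for (before, after), n in changes.items() if before > 0 and after == 0)
    return {
        "n_reactants": n_reactants,
        "products_minus_reactants": n_products - n_reactants,
        "bonds_formed": formed,
        "bonds_cleaved": cleaved,
        "order_1_to_2": changes[(1, 2)],
        "order_2_to_1": changes[(2, 1)],
        "order_3_to_2": changes[(3, 2)],
    }


# Hypothetical cycloaddition: two molecules combine into one.
features = reaction_features(2, 1, [(2, 1), (1, 2), (2, 1), (2, 1), (0, 1), (0, 1)])
```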
Applications:
- assessing the similarity between individual steps of enzymatic reaction mechanisms
- quantitatively measuring the similarity of enzymatic reactions based on their explicit mechanisms
Representation of reactions by physicochemical properties of the reaction center
In principle, physicochemical properties of the reaction center are related to the mechanism.
Two situations:
Representation / classification of reactions with a common reaction center
Representation / classification of reactions with diverse reaction centers
Reactions with a common reaction center
L. Chen, J. Gasteiger, J. Am. Chem. Soc. 1997, 119, 4033-4042
Example 1: C---H + Cl---C=O → C---C=O
Electronic variables (one value each): total charge, σ-electronegativity, π-electronegativity, polarizability, aromaticity indicator
A reaction is represented by these 5 electronic parameters.
Mapping of 74 reactions with a common reaction center on a Kohonen neural network
C---H + Cl---C=O → C---C=O
Dark gray: nucleophilic aliphatic substitution of acyl chlorides
Medium gray: acylation of C=C bonds
Light gray: acylation of arenes.
L. Chen, J. Gasteiger, J. Am. Chem. Soc. 1997, 119, 4033-4042
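How reactions end up at positions on a Kohonen map can be illustrated with a minimal self-organizing map in plain Python. This is a generic sketch, not the network used in the cited study: the map size, learning-rate schedule, neighborhood function, and the four five-parameter vectors are all assumptions.

```python
import math
import random


def winner(weights, vec):
    """Grid position (row, col) of the neuron whose weights are closest to vec."""
    positions = [(r, c) for r in range(len(weights)) for c in range(len(weights[0]))]
    return min(
        positions,
        key=lambda rc: sum((w - v) ** 2 for w, v in zip(weights[rc[0]][rc[1]], vec)),
    )


def train_som(vectors, rows, cols, epochs=50, seed=0):
    """Train a small rectangular Kohonen map; returns weights[row][col] -> vector."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    total_steps = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        for vec in rng.sample(vectors, len(vectors)):  # shuffled presentation order
            remaining = 1.0 - step / total_steps
            rate = 0.5 * remaining                      # learning rate decays to 0
            radius = 0.5 + remaining * max(rows, cols) / 2.0
            win_r, win_c = winner(weights, vec)
            for r in range(rows):
                for c in range(cols):
                    grid_dist = math.hypot(r - win_r, c - win_c)
                    if grid_dist > radius:
                        continue
                    influence = math.exp(-grid_dist ** 2 / (2.0 * radius ** 2))
                    neuron = weights[r][c]
                    for k in range(dim):
                        neuron[k] += rate * influence * (vec[k] - neuron[k])
            step += 1
    return weights


# Hypothetical reactions, each described by 5 electronic parameters scaled to [0, 1].
reactions = [[0.1, 0.2, 0.1, 0.0, 0.0], [0.2, 0.1, 0.0, 0.1, 0.0],
             [0.9, 0.8, 1.0, 0.9, 1.0], [0.8, 0.9, 0.9, 1.0, 1.0]]
som = train_som(reactions, rows=3, cols=3)
positions = [winner(som, vec) for vec in reactions]
```

Each reaction is then displayed at the grid position of its winning neuron, which is how a shaded map of reaction types like the one described above is obtained.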
Reactions with a common reaction center
L. Chen, J. Gasteiger, J. Am. Chem. Soc. 1997, 119, 4033-4042
Example 2: C=C + H–C → H–C–C–C
Electronic variables: total charge (2 values), σ-electronegativity (2 values), π-electronegativity (2 values), polarizability (1 value)
A reaction is represented by these 7 electronic parameters.
Mapping of 120 reactions with a common reaction center on a Kohonen neural network
C=C + H–C → H–C–C–C
L. Chen, J. Gasteiger, J. Am. Chem. Soc. 1997, 119, 4033-4042
Reactions with common features in the reaction center
H. Satoh, O. Sacher, T. Nakata, L. Chen, J. Gasteiger, K. Funatsu, J. Chem. Inf. Comput. Sci. 1998, 38, 210-219
Example 3: reactions with an oxygen atom at the reaction site
The changes in σ-charge, π-charge, σ-electronegativity, π-electronegativity, polarizability, and pKa values at the oxygen atoms of the reaction sites are taken as a representation of the reaction.
Mapping on a Kohonen neural network of 152 O-atoms from 131 reactions with an O-atom at the reaction center
H. Satoh, O. Sacher, T. Nakata, L. Chen, J. Gasteiger, K. Funatsu, J. Chem. Inf. Comput. Sci. 1998, 38, 210-219
a: reductions and alkylations
b: cleavage of epoxides, ethers, lactones and esters
c: oxidation of alcohols
d: formation of epoxides, ethers, lactones, and esters
Reactions with diverse reaction centers
Sacher-Gasteiger approach
6 physicochemical properties for the bonds of products at the reaction center (bond order, difference in σ- and π-electronegativity between the two atoms of the bond, difference in total charge between the two atoms of the bond, stabilization of + and – charge by delocalization).
Reactions represented by vectors with room for up to 6 bonds. Bonds must be ranked and oriented.
Mapping of 33,613 reactions from the Theilheimer database on a 92×92 Kohonen network.
O. Sacher, Ph.D. Thesis, University of Erlangen-Nuremberg.
http://www2.chemie.uni-erlangen.de/services/dissonline/data/dissertation/Oliver_Sacher/html
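A fixed-length vector with room for up to 6 bonds of 6 properties each has 36 components. The sketch below shows one way to build it. The ranking used here (a plain descending sort) and the zero padding are placeholders: the actual ranking and orientation rules are defined in the thesis and are not reproduced in the slides.

```python
MAX_BONDS = 6
PROPERTIES_PER_BOND = 6


def reaction_vector(bonds):
    """Build a 36-component vector from the product bonds at the reaction center.

    bonds: list of 6-tuples of bond properties.
    """
    if len(bonds) > MAX_BONDS:
        raise ValueError("more than 6 reaction-center bonds")
    if any(len(b) != PROPERTIES_PER_BOND for b in bonds):
        raise ValueError("each bond needs exactly 6 properties")
    # Placeholder ranking so that equivalent reactions give the same vector.
    ranked = sorted(bonds, reverse=True)
    # Assumption: unused bond slots are filled with zeros.
    padding = [(0.0,) * PROPERTIES_PER_BOND] * (MAX_BONDS - len(ranked))
    return [value for b in ranked + padding for value in b]


# Hypothetical property values for two reaction-center bonds.
bond_a = (1.0, 0.3, 0.0, 0.12, 0.5, 0.1)
bond_b = (2.0, 1.1, 0.4, 0.30, 0.2, 0.6)
vector = reaction_vector([bond_a, bond_b])
```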
Representation of differences between the structures of products and reactants
Daylight fingerprints of reactions
The difference between the fingerprint of the reactant molecules and the fingerprint of the product molecules reflects the bond changes that occur during the reaction.
- Applies to stoichiometric reactions
- Avoids assignment of reaction centers and atom-to-atom mapping
Because fingerprints are binary, multiple occurrences of a path are not encoded, and a simple subtraction is not enough.
It is required to keep track of the count of each path in the reactant and product and then subtract the counts of a given path. If the difference in count is non-zero, then the path is used to set a bit in the difference fingerprint. If the difference in count is zero, then no bit is set for that path in the difference fingerprint.
http://www.daylight.com/dayhtml/doc/theory/theory.finger.html
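The count-then-compare procedure described above can be sketched as follows. Path enumeration is not shown: paths are assumed to arrive as strings. The use of CRC32 to pick a single bit per path and the 64-bit fingerprint length are assumptions made for illustration, not the Daylight hashing scheme.

```python
import zlib
from collections import Counter


def difference_fingerprint(reactant_paths, product_paths, n_bits=64):
    """Set a bit for every path whose count differs between the two sides."""
    reactant_counts = Counter(reactant_paths)
    product_counts = Counter(product_paths)
    bits = [0] * n_bits
    for path in set(reactant_counts) | set(product_counts):
        if reactant_counts[path] != product_counts[path]:
            bits[zlib.crc32(path.encode("utf-8")) % n_bits] = 1
    return bits


# Same paths and counts on both sides: nothing is set.
unchanged = difference_fingerprint(["CC", "CC", "CO"], ["CO", "CC", "CC"])
# Same set of paths, different counts: a binary subtraction would miss this.
count_only = difference_fingerprint(["CC"], ["CC", "CC"])
```

The second example is the case the slide warns about: both sides contain the same path, so their binary fingerprints are identical, yet the counts differ and the difference fingerprint records it.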
Reaction MOLMAPs
MOLecular Maps of Atom-level Properties (based on Kohonen neural networks)
Bonds are classified by Kohonen SOMs from their physicochemical and topological features.
Bond types are assigned by the Kohonen SOM.
MOLMAPs encode the bond types that exist in a molecule.
Q.-Y. Zhang, J. Aires-de-Sousa, J. Chem. Inf. Model. 2005, 45 (6), 1775-1783.
D. A. R. S. Latino, J. Aires-de-Sousa, Angew. Chem. Int. Ed. 2006, 45 (13), 2066-2069.
Generating the MOLMAP of molecules
1. Train a self-organising map (SOM) with a diversity of BONDS from different structures (each bond described by n bond properties).
2. Submit all the bonds in ONE structure to the trained SOM.
3. The pattern of neurons activated by the bonds of one structure is a molecular descriptor (MOLMAP): a map of the bonds present in that molecule (a reactivity fingerprint?). The frequency of activation is counted for each neuron to yield a numerical representation.
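Steps 2 and 3 can be sketched as counting, for each neuron, how many bonds of the structure it wins. The sketch assumes an already trained SOM given as a nested list of weight vectors; the 2×2 map and the two-property bond vectors are hypothetical stand-ins for a real trained network.

```python
def molmap(bond_vectors, weights):
    """Count how many bonds of one structure activate each neuron of a trained SOM.

    weights[row][col] is the weight vector of a neuron; the result is a flat
    list with one activation count per neuron (row-major order).
    """
    rows, cols = len(weights), len(weights[0])
    counts = [0] * (rows * cols)
    for vec in bond_vectors:
        best = min(
            range(rows * cols),
            key=lambda n: sum((w - v) ** 2
                              for w, v in zip(weights[n // cols][n % cols], vec)),
        )
        counts[best] += 1
    return counts


# Hypothetical "trained" 2x2 SOM over two bond properties.
trained = [[[0.0, 0.0], [0.0, 1.0]],
           [[1.0, 0.0], [1.0, 1.0]]]
# Three bonds of one hypothetical structure.
descriptor = molmap([[0.1, 0.0], [0.9, 1.0], [1.0, 0.9]], trained)
```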
Generating the MOLMAP of a reaction
MOLMAP of reaction = MOLMAP of products – MOLMAP of reactants
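The subtraction is element-wise over the neurons of the map. In the sketch below, summing the MOLMAPs of several molecules on one side before subtracting is an assumption, and the four-neuron vectors are hypothetical.

```python
def reaction_molmap(reactant_molmaps, product_molmaps):
    """Element-wise difference: summed product MOLMAPs minus summed reactant MOLMAPs."""
    size = len(product_molmaps[0])
    products = [sum(m[i] for m in product_molmaps) for i in range(size)]
    reactants = [sum(m[i] for m in reactant_molmaps) for i in range(size)]
    return [p - r for p, r in zip(products, reactants)]


# Two hypothetical reactants and one product on a four-neuron map.
difference = reaction_molmap([[1, 0, 0, 2], [0, 1, 0, 0]], [[1, 1, 1, 1]])
```

Neurons activated equally on both sides cancel, so the non-zero entries point at the bond types that appear or disappear in the reaction.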
Application to the classification of organic reactions
Application to the classification of enzymatic reactions
- EC 1.x.x.x Oxidoreductases
- EC 2.x.x.x Transferases
- EC 3.x.x.x Hydrolases
- EC 4.x.x.x Lyases
- EC 5.x.x.x Isomerases
- EC 6.x.x.x Ligases
Mapping of > 3,000 reactions from the KEGG database on a SOM.
Prediction of EC numbers with Random Forests (accuracy up to 89%, 82%, and 80% for the 1st, 2nd, and 3rd digits respectively).
Identification of similar reactions with differences at the class level of the EC number.
D. A. R. S. Latino, J. Aires-de-Sousa, Angew. Chem. Int. Ed. 2006, 45 (13), 2066-2069.
Reaction signatures
J.-L. Faulon, M. Misra, S. Martin, K. Sale, R. Sapra, Bioinformatics 2008, 24, 225–233
J.-L. Faulon, D. P. Visco, Jr., R. S. Pophale, J. Chem. Inf. Comput. Sci. 2003, 43, 707-720
An atomic signature is a canonical representation of the subgraph surrounding a particular atom. This subgraph includes all atoms and bonds up to a predefined distance from the given atom (the height, h).
Each component of a molecular signature counts the number of occurrences of a particular atomic signature in the molecule.
Reaction signature = Σ molecular signatures of products – Σ molecular signatures of substrates
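The three definitions can be illustrated with a deliberately simplified signature of height 1: the atom symbol followed by its sorted neighbor symbols. This is not the canonical signature algorithm of the cited papers, which handles arbitrary heights and bond types; the data layout and the H2 + Cl2 example are assumptions.

```python
from collections import Counter


def atomic_signature(atom, symbols, adjacency):
    """Simplified height-1 signature: atom symbol plus sorted neighbor symbols."""
    neighbors = sorted(symbols[n] for n in adjacency[atom])
    return symbols[atom] + "(" + "".join(neighbors) + ")"


def molecular_signature(symbols, adjacency):
    """Number of occurrences of each atomic signature in one molecule."""
    return Counter(atomic_signature(atom, symbols, adjacency) for atom in symbols)


def reaction_signature(substrates, products):
    """Sum of product signatures minus sum of substrate signatures."""
    total = Counter()
    for symbols, adjacency in products:
        total.update(molecular_signature(symbols, adjacency))
    for symbols, adjacency in substrates:
        total.subtract(molecular_signature(symbols, adjacency))
    return {signature: n for signature, n in total.items() if n != 0}


# H2 + Cl2 -> 2 HCl, each molecule given as (symbols, adjacency).
h2 = ({1: "H", 2: "H"}, {1: [2], 2: [1]})
cl2 = ({1: "Cl", 2: "Cl"}, {1: [2], 2: [1]})
hcl = ({1: "H", 2: "Cl"}, {1: [2], 2: [1]})
signature = reaction_signature([h2, cl2], [hcl, hcl])
```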
Application to the classification of enzymatic reactions for the automatic assignment of EC numbers
Data set: 6,556 reactions from the KEGG database.
Machine learning technique: Support Vector Machines (SVM)
Correct assignment of EC class, subclass, and sub-subclass in up to 91%, 84%, and 88% of cases, respectively (leave-one-out cross-validation).
Fingerprints of reactions based on atom types
Fingerprints generated for reactant and product molecules separately, based on Sybyl atom types and atom types augmented with a single layer around the central atom.
The difference fingerprint is defined by the differences in occurrence of each atom type in the reactant and product fingerprints.
L. Ridder, M. Wagener, ChemMedChem 2008, 3, 821 – 832
Application to the classification of metabolic reactions to assist in the establishment of rules for reaction prediction.
L. Ridder, M. Wagener, ChemMedChem 2008, 3, 821 – 832
Projection of all reactions in the training set on a 2D plane to optimally reflect reaction fingerprint distances calculated between all pairs of reactions. The method is based on stochastic proximity embedding, and optimizes the distances between points on a 2D plane to correspond as much as possible to the distances calculated in the fingerprint space between all pairs of metabolic reactions.
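The embedding step can be sketched with a basic form of stochastic proximity embedding: random pairs of points are repeatedly moved so that their distance on the plane approaches their distance in fingerprint space, with a learning rate that decreases over time. The schedule, the number of cycles, and the function name are assumptions, not the settings of the cited work.

```python
import math
import random


def stochastic_proximity_embedding(distances, cycles=200, steps_per_cycle=50, seed=0):
    """Place items on a 2D plane so that pair distances approach `distances`.

    distances[i][j] is the target (fingerprint-space) distance between i and j.
    """
    rng = random.Random(seed)
    n = len(distances)
    coords = [[rng.random(), rng.random()] for _ in range(n)]
    for cycle in range(cycles):
        rate = 1.0 - cycle / cycles  # learning rate decreases linearly
        for _ in range(steps_per_cycle):
            i, j = rng.sample(range(n), 2)  # pick a random pair of items
            dx = coords[i][0] - coords[j][0]
            dy = coords[i][1] - coords[j][1]
            current = math.hypot(dx, dy)
            # Move both points along their connecting line, half the correction each.
            scale = rate * 0.5 * (distances[i][j] - current) / (current + 1e-9)
            coords[i][0] += scale * dx
            coords[i][1] += scale * dy
            coords[j][0] -= scale * dx
            coords[j][1] -= scale * dy
    return coords


# Hypothetical fingerprint distances between three reactions.
target = [[0.0, 1.0, 2.0],
          [1.0, 0.0, 1.5],
          [2.0, 1.5, 0.0]]
plane = stochastic_proximity_embedding(target)
```

Only one pair is adjusted per step, which is what keeps the method cheap when all pairwise distances of a large training set have to be approximated.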
References
2. L. Chen in Handbook of Chemoinformatics, Vol. 1 (Eds.: J. Gasteiger, T. Engel), Wiley-VCH, New York, 2003, 348-388.
3. http://infochem.de/en/products/software/classify.shtml
4. M. Kotera, Y. Okuno, M. Hattori, S. Goto, M. Kanehisa, J. Am. Chem. Soc. 2004, 126, 16487-16498.
5. A. Varnek, D. Fourches, F. Hoonakker, V. P. Solov’ev, J. Comput.-Aided Mol. Des. 2005, 19, 693-703.
6. S. Fujita, J. Chem. Inf. Comput. Sci. 1986, 26, 205-212; J. Chem. Inf. Comput. Sci. 1987, 27, 99-104.
7. N. M. O’Boyle, G. L. Holliday, D. E. Almonacid, J. B. O. Mitchell, J. Mol. Biol. 2007, 368, 1484-1499.
8. L. Chen, J. Gasteiger, J. Am. Chem. Soc. 1997, 119, 4033-4042.
9. H. Satoh, O. Sacher, T. Nakata, L. Chen, J. Gasteiger, K. Funatsu, J. Chem. Inf. Comput. Sci. 1998, 38, 210-219.
10. O. Sacher, Ph.D. Thesis, University of Erlangen-Nuremberg. http://www2.chemie.uni-erlangen.de/services/dissonline/data/dissertation/Oliver_Sacher/html
11. http://www.daylight.com/dayhtml/doc/theory/theory.finger.html
12. Q.-Y. Zhang, J. Aires-de-Sousa, J. Chem. Inf. Model. 2005, 45 (6), 1775-1783.
13. D. A. R. S. Latino, J. Aires-de-Sousa, Angew. Chem. Int. Ed. 2006, 45 (13), 2066-2069.
14. J.-L. Faulon, D. P. Visco, Jr., R. S. Pophale, J. Chem. Inf. Comput. Sci. 2003, 43, 707-720.
15. J.-L. Faulon, M. Misra, S. Martin, K. Sale, R. Sapra, Bioinformatics 2008, 24, 225-233.
16. L. Ridder, M. Wagener, ChemMedChem 2008, 3, 821-832.