Strasbourg Summer School on Chemoinformatics Obernai, 22...

1 João Aires de Sousa Classification of chemical reactions Strasbourg Summer School on Chemoinformatics Obernai, 22-25 June 2008 João Aires-de-Sousa Universidade Nova de Lisboa Portugal


Classification of chemical reactions

Strasbourg Summer School on Chemoinformatics, Obernai, 22-25 June 2008

João Aires-de-Sousa, Universidade Nova de Lisboa, Portugal

Why do we need to classify reactions?

- Retrieval of reactions from databases
- Linking of reaction information from different sources
- Construction of knowledge bases for reaction prediction and synthesis design
- Automatic procedures for analyses and correlations in databases
- In bioinformatics: the reconstruction of metabolic pathways from genomes requires the classification of enzymatic reactions.

Representation of reactions

1. Representation of the reaction center
- A bond is defined as a reaction center if it is made or broken.
- An atom is defined as a reaction center if it changes its number of implicit hydrogens, number of valencies, number of π-electrons, or atomic charge, or if a bond connected to it is a reaction center.

2. Representation of differences between the structures of products and reactants (implicit representation of the reaction centers and their environments).
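The bond criterion for a reaction center (a bond that is made or broken) can be sketched as follows. This is only an illustration: it assumes the atom-to-atom mapping is already known and that each side of the reaction is given as a dictionary from pairs of mapped atom numbers to bond orders, which is a hypothetical input format. The HCl by-product in the example is also an assumption added to balance the reaction.

```python
def reaction_center_bonds(reactant_bonds, product_bonds):
    """Return the bonds that are made and the bonds that are broken.

    Both arguments map frozenset({atom_i, atom_j}) -> bond order, where the
    atom numbers come from an atom-to-atom mapping (assumed to be given).
    """
    made = set(product_bonds) - set(reactant_bonds)
    broken = set(reactant_bonds) - set(product_bonds)
    return made, broken


def bond(i, j):
    """Order-independent key for the bond between mapped atoms i and j."""
    return frozenset((i, j))


# Hypothetical mapped example:
# C(1)-H(2) + Cl(3)-C(4)=O(5) -> C(1)-C(4)=O(5) + H(2)-Cl(3)
reactants = {bond(1, 2): 1, bond(3, 4): 1, bond(4, 5): 2}
products = {bond(1, 4): 1, bond(4, 5): 2, bond(2, 3): 1}
made, broken = reaction_center_bonds(reactants, products)
```

The C=O bond appears on both sides and is therefore not reported; detecting atoms that change charge or hydrogen count would need the atom properties as additional input.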


Representations of the reaction center

Representation of the reaction center

The reaction center and the atom-to-atom mapping are assigned either manually or by an algorithm for automatic assignment.

ClassCodes: structural information about reaction centers and immediate environment

Hashcodes are calculated for all reaction centers, taking into account atom properties:
- atom type
- valence state
- total number of bonded hydrogens (implicit plus explicitly drawn)
- number of π-electrons
- aromaticity
- formal charges
- reaction center information

The sum of all reaction center hashcodes of all reactants and one product of a reaction provides the unique reaction classification code: the ClassCode.

http://infochem.de/en/products/software/classify.shtml

ClassCodes: structural information about reaction centers and immediate environment

Inclusion of atoms in the immediate environment (spheres):
- reaction centers only (0-sphere = BROAD)
- reaction centers + α-atoms (1-sphere = MEDIUM)
- reaction centers + β-atoms (2-sphere = NARROW)

Multiple occurrences of identical transformations are handled as one.
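The idea of summing per-center hashcodes into one code can be illustrated with a small sketch. The actual hashing scheme behind ClassCodes is not described on the slides, so `hashlib.sha1` is used here purely as a stand-in, and the property tuples (including labels such as "sp3" and "bond broken") are hypothetical. Environment atoms for the MEDIUM or NARROW levels could be appended to each tuple in the same way.

```python
import hashlib


def center_hash(properties):
    """Hash one reaction center from a tuple of its atom properties.

    sha1 is only a stand-in; the real ClassCode hashing is not specified here.
    """
    text = "|".join(str(p) for p in properties)
    return int.from_bytes(hashlib.sha1(text.encode()).digest()[:8], "big")


def class_code(centers):
    """Sum the hashcodes of the distinct reaction centers.

    A sum does not depend on the order of the centers, and building a set
    first treats multiple occurrences of an identical center as one.
    """
    return sum({center_hash(c) for c in centers}) % 2**64


# Hypothetical property tuples: (atom type, valence state, bonded hydrogens,
# pi-electrons, aromatic, formal charge, reaction center information)
center_a = ("C", "sp3", 1, 0, False, 0, "bond broken")
center_b = ("Cl", "sp3", 0, 0, False, 0, "bond broken")
code = class_code([center_a, center_b])
```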


RC (Reaction Classification) numbers

A reaction is decomposed into reactant pairs.

Each pair is then structurally aligned to identify the reaction center (R), the matched region (M), and the difference region (D).

The RC number represents the conversion patterns of atom types in these three regions.

M. Kotera, Y. Okuno, M. Hattori, S. Goto, M. Kanehisa, J. Am. Chem. Soc. 2004, 126, 16487-16498.

RC (Reaction Classification) numbers

A list of 68 predefined atom types, e.g. C1a, C1b, C1c.

A list of numerical codes for conversion patterns of atom types, e.g.
1. C1a -> C1a
2. C1a -> C1b, C1b -> C1a
…
14. C1c -> C1c
…
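A lookup of conversion-pattern codes could be sketched as below. Only the three patterns quoted above are included; the rest of the table is not reproduced, and the function name is hypothetical. Because code 2 covers both C1a -> C1b and C1b -> C1a, the sketch keys the table on unordered pairs of atom types.

```python
# Only the patterns quoted on the slide; the full table is not reproduced here.
PATTERN_CODES = {
    frozenset(["C1a"]): 1,          # C1a -> C1a
    frozenset(["C1a", "C1b"]): 2,   # C1a -> C1b and C1b -> C1a share one code
    frozenset(["C1c"]): 14,         # C1c -> C1c
}


def pattern_code(atom_type_before, atom_type_after):
    """Return the numerical code of a conversion pattern, or None if unknown."""
    return PATTERN_CODES.get(frozenset([atom_type_before, atom_type_after]))
```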


RC (Reaction Classification) numbers

Application

automatic assignment of EC numbers to enzymatic reactions

A query reaction is classified based on reactions sharing the same RC number in a database. Different restrictions are possible (e.g., RDM or only RD).

EC sub-subclasses could be assigned with an accuracy of about 90% (coverage: 62% of a data set).

KEGG website http://www.genome.jp
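Classifying a query by the reactions that share its RC number might look like the sketch below. The majority vote is an assumption about how matching reactions are combined, and the RC labels in the example are placeholders rather than real RC numbers. A query with no match returns `None`, which corresponds to a reaction outside the coverage.

```python
from collections import Counter


def assign_ec(query_rc, database):
    """Predict an EC sub-subclass from database reactions with the same RC number.

    database: list of (rc_number, ec_sub_subclass) pairs.
    Returns None when no database reaction shares the query's RC number.
    """
    votes = Counter(ec for rc, ec in database if rc == query_rc)
    if not votes:
        return None  # query not covered by the database
    # Majority vote (an assumption; the slide does not state the rule).
    return votes.most_common(1)[0][0]


# Placeholder labels, not real RC numbers.
database = [
    ("rc_a", "1.1.1"),
    ("rc_a", "1.1.1"),
    ("rc_a", "1.2.1"),
    ("rc_b", "2.7.1"),
]
prediction = assign_ec("rc_a", database)
```

A looser restriction (for example matching on only part of the pattern) would simply change how the `rc_number` keys are built before the lookup.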

Condensed Reaction Graphs (CRG)
Vladutz – Fujita – Varnek

Reactants and products are merged in an imaginary transition state or pseudo-compound.

Bond types are defined according to their fate in the reaction (‘no bond’ to single, single to double, double to ‘no bond’, …).

[Figure: reactants and products merged into a ‘pseudo-compound’]

A. Varnek, D. Fourches, F. Hoonakker, V. P. Solov’ev, J. Comput.-Aided Mol. Des. 2005, 19, 693–703

S. Fujita, J. Chem. Inf. Comput. Sci. 1986, 26, 205-212; J. Chem. Inf. Comput. Sci. 1987, 27, 99-104


Fragments are sequences of connected atoms of predefined size.

A fragment encodes atom types and bond types.

The number of occurrences of a fragment is a descriptor.

Descriptors can be used for the assessment of similarity between reactions, or for QSPR studies with reactions (just like with molecules).
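Counting fragments in a condensed graph can be sketched as follows, assuming the pseudo-compound is already available as a dictionary of atom types and a dictionary of bond labels. The label strings such as "0>1" (no bond to single) are hypothetical, and only linear fragments are enumerated.

```python
from collections import Counter


def fragment_counts(atoms, bonds, max_atoms=3):
    """Count linear fragments (paths of 2..max_atoms atoms) in a condensed graph.

    atoms: dict atom_index -> atom type, e.g. {0: "C", 1: "C"}
    bonds: dict frozenset({i, j}) -> label for the fate of the bond,
           e.g. "1>2" for single to double (label strings are hypothetical).
    """
    neighbors = {i: [] for i in atoms}
    for pair in bonds:
        i, j = tuple(pair)
        neighbors[i].append(j)
        neighbors[j].append(i)

    counts = Counter()

    def extend(path):
        if len(path) >= 2:
            # Interleave atom types and bond labels along the path.
            tokens = [atoms[path[0]]]
            for a, b in zip(path, path[1:]):
                tokens += [bonds[frozenset((a, b))], atoms[b]]
            # A path is found once from each end; use one canonical spelling.
            key = min(tuple(tokens), tuple(reversed(tokens)))
            counts[key] += 1
        if len(path) < max_atoms:
            for nxt in neighbors[path[-1]]:
                if nxt not in path:
                    extend(path + [nxt])

    for start in atoms:
        extend([start])
    # Every path was counted from both ends, so halve the counts.
    return Counter({k: v // 2 for k, v in counts.items()})


# Hypothetical pseudo-compound: a C-C bond is formed while C=O becomes C-O.
atoms = {0: "C", 1: "C", 2: "O"}
bonds = {frozenset((0, 1)): "0>1", frozenset((1, 2)): "2>1"}
descriptors = fragment_counts(atoms, bonds)
```

The resulting counts form a descriptor vector that can be compared between reactions in the same way as fragment descriptors of molecules.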

Fingerprints of enzymatic reaction features
Mitchell lab

N. M. O’Boyle, G. L. Holliday, D. E. Almonacid, J. B. O. Mitchell, J. Mol. Biol. 2007, 368, 1484–1499

58 features of a reaction are used as reaction descriptors:
- number of reactants; number of products minus number of reactants; number of cycles in products minus number of cycles in reactants
- number of times a bond type is involved in the reaction (21 bond types)
- involvement of cofactors
- the total number of each type of bond order change: bond formation, bond cleavage, changes in order from 1 to 2, 2 to 1, 3 to 2
- charge changes by atom type
- involvement of radicals
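One group of these features, the totals for each type of bond order change, can be sketched as below. The input format (a list of before/after bond orders, with 0 for 'no bond') and the feature names are assumptions made for the illustration.

```python
from collections import Counter


def bond_change_counts(changes):
    """Count each type of bond order change in a reaction.

    changes: list of (order_before, order_after) pairs for the bonds that
    change; an order of 0 means 'no bond'.
    """
    counts = Counter()
    for before, after in changes:
        if before == 0 and after > 0:
            counts["formation"] += 1
        elif before > 0 and after == 0:
            counts["cleavage"] += 1
        elif before != after:
            counts[f"{before} to {after}"] += 1
    return counts


# Hypothetical list of changed bonds.
features = bond_change_counts([(0, 1), (1, 0), (1, 2), (2, 1), (1, 2)])
```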


Application to assess similarity between individual steps of enzymatic reaction mechanisms.

Application to quantitatively measure the similarity of enzymatic reactions based upon their explicit mechanisms.

Representation of the reaction center with physicochemical parameters


Representation of reactions by physicochemical properties of the reaction center

In principle, physicochemical properties of the reaction center are related to the mechanism.

Two situations:

Representation / classification of reactions with a common reaction center

Representation / classification of reactions with diverse reaction centers

Representation of reactions by physicochemical properties of the reaction center

Reactions with a common reaction center
L. Chen, J. Gasteiger, J. Am. Chem. Soc. 1997, 119, 4033-4042

Example 1: C---H + Cl---C=O → C---C=O

Electronic variable
total charge               X
σ-electronegativity        X
π-electronegativity        X
polarizability             X
aromaticity indicator      X

A reaction is represented by these 5 electronic parameters.


Representation of reactions by physicochemical properties of the reaction center

Mapping of 74 reactions with a common reaction center on a Kohonen neural network

C---H + Cl---C=O → C---C=O

Dark gray: nucleophilic aliphatic substitution of acyl chlorides

Medium gray: acylation of C=C bonds

Light gray: acylation of arenes.

L. Chen, J. Gasteiger, J. Am. Chem. Soc. 1997, 119, 4033-4042
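For readers unfamiliar with Kohonen networks, a minimal self-organising map can be sketched in plain Python. This is a generic textbook-style version, not the implementation used in the cited work: the rectangular (non-toroidal) grid, the Gaussian neighborhood, the decay schedules, and the toy 5-component vectors are all assumptions.

```python
import math
import random


def winner(weights, x):
    """Return (row, col) of the neuron whose weights are closest to x."""
    best = None
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(x, w))
            if best is None or d < best[0]:
                best = (d, r, c)
    return best[1], best[2]


def train_som(data, rows, cols, epochs=50, seed=0):
    """Train a small rectangular Kohonen map; returns weights[row][col]."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    for epoch in range(epochs):
        frac = epoch / epochs
        rate = 0.5 * (1.0 - frac)                          # learning rate decays
        radius = max(rows, cols) / 2 * (1.0 - frac) + 0.5  # neighborhood shrinks
        for x in rng.sample(data, len(data)):              # shuffled presentation
            br, bc = winner(weights, x)
            for r in range(rows):
                for c in range(cols):
                    d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-d2 / (2 * radius ** 2))
                    w = weights[r][c]
                    for k in range(dim):
                        w[k] += rate * h * (x[k] - w[k])
    return weights


# Toy data: two groups of hypothetical 5-parameter reaction vectors.
group_1 = [[0.0, 0.0, 0.0, 0.0, 0.0] for _ in range(5)]
group_2 = [[1.0, 1.0, 1.0, 1.0, 1.0] for _ in range(5)]
som = train_som(group_1 + group_2, rows=4, cols=4)
```

After training, each reaction is assigned to its winning neuron, and reactions of the same type are expected to fall in neighboring regions of the map.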

Representation of reactions by physicochemical properties of the reaction center

Reactions with a common reaction center
L. Chen, J. Gasteiger, J. Am. Chem. Soc. 1997, 119, 4033-4042

Example 2: C=C + H–C → H–C–C–C

Electronic variable
total charge               X  X
σ-electronegativity        X  X
π-electronegativity        X  X
polarizability             X

A reaction is represented by these 7 electronic parameters.


Representation of reactions by physicochemical properties of the reaction center

Mapping of 120 reactions with a common reaction center on a Kohonen neural network

C=C + H–C → H–C–C–C

L. Chen, J. Gasteiger, J. Am. Chem. Soc. 1997, 119, 4033-4042

Representation of reactions by physicochemical properties of the reaction center

Reactions with common features in the reaction center
H. Satoh, O. Sacher, T. Nakata, L. Chen, J. Gasteiger, K. Funatsu, J. Chem. Inf. Comput. Sci. 1998, 38, 210-219

Example 3: reactions with an oxygen atom at the reaction site

The changes in σ-charge, π-charge, σ-electronegativity, π-electronegativity, polarizability, and pKa values at the oxygen atoms of the reaction sites are taken as a representation of the reaction.


Representation of reactions by physicochemical properties of the reaction center

Mapping on a Kohonen neural network of 152 O-atoms from 131 reactions with an O-atom at the reaction center

H. Satoh, O. Sacher, T. Nakata, L. Chen, J. Gasteiger, K. Funatsu, J. Chem. Inf. Comput. Sci. 1998, 38, 210-219

a: reductions and alkylations

b: cleavage of epoxides, ethers, lactones and esters

c: oxidation of alcohols

d: formation of epoxides, ethers, lactones, and esters

Representation of reactions by physicochemical properties of the reaction center

Reactions with diverse reaction centers

Sacher-Gasteiger approach

6 physicochemical properties for the bonds of products at the reaction center (bond order, difference in σ- and π-electronegativity between the two atoms of the bond, difference in total charge between the two atoms of the bond, stabilization of + and – charge by delocalization).

Reactions are represented by vectors with room for up to 6 bonds. Bonds must be ranked and oriented.

Mapping of 33,613 reactions from the Theilheimer database on a 92×92 Kohonen network.

O. Sacher. Ph. D. Thesis, University of Erlangen-Nuremberg.
http://www2.chemie.uni-erlangen.de/services/dissonline/data/dissertation/Oliver_Sacher/html
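A fixed-length vector with room for up to 6 bonds might be assembled as in the sketch below. The ranking rule is not given on the slide, so it is passed in by the caller; zero-padding of unused slots and the resulting length of 6 × 6 = 36 components are assumptions, and the property values in the example are made up.

```python
MAX_BONDS = 6
N_PROPERTIES = 6


def reaction_vector(bond_properties, rank_key):
    """Build a fixed-length vector from the reaction-center bonds of the products.

    bond_properties: list of 6-element lists, one per bond.
    rank_key: function used to rank the bonds (the actual ranking rule is
    not stated on the slide, so it is left to the caller).
    """
    if len(bond_properties) > MAX_BONDS:
        raise ValueError("more than 6 reaction-center bonds")
    ranked = sorted(bond_properties, key=rank_key, reverse=True)
    vector = [value for bond in ranked for value in bond]
    # Pad with zeros (an assumption) so every reaction has 36 components.
    vector += [0.0] * (MAX_BONDS * N_PROPERTIES - len(vector))
    return vector


# Two hypothetical bonds; ranking by the first property (bond order) is
# only an example of a ranking rule.
bonds = [
    [1.0, 0.2, 0.0, 0.1, 0.3, 0.4],
    [2.0, 0.5, 0.6, 0.2, 0.1, 0.0],
]
vector = reaction_vector(bonds, rank_key=lambda b: b[0])
```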


Representation of differences between the structures of products and reactants

Daylight fingerprints of reactions

The difference in the fingerprint of the reactant molecules and the fingerprint of the product molecules reflects the bond changes which occur during the reaction.

- Applies to stoichiometric reactions
- Avoids assignment of reaction centers and atom-to-atom mapping

Because fingerprints are binary, multiple occurrences of a path are not encoded, and a simple subtraction is not enough.

Instead, the count of each path in the reactants and in the products must be tracked, and the counts of a given path subtracted. If the difference in count is non-zero, the path is used to set a bit in the difference fingerprint. If the difference in count is zero, no bit is set for that path.

http://www.daylight.com/dayhtml/doc/theory/theory.finger.html
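The count-then-subtract procedure can be sketched in a few lines. Path enumeration is assumed to have been done already; the path strings, the fingerprint width, and the use of `zlib.crc32` to pick a bit are stand-ins for whatever the real toolkit does.

```python
import zlib
from collections import Counter


def difference_fingerprint(reactant_paths, product_paths, n_bits=64):
    """Set a bit for every path whose count differs between the two sides."""
    reactant_counts = Counter(reactant_paths)
    product_counts = Counter(product_paths)
    bits = [0] * n_bits
    for path in set(reactant_counts) | set(product_counts):
        if reactant_counts[path] != product_counts[path]:
            # crc32 stands in for the real path hashing, which is not given here.
            bits[zlib.crc32(path.encode()) % n_bits] = 1
    return bits


# Hypothetical path lists: "C-C" occurs twice in the reactants and once in
# the products, a change that two binary fingerprints alone would not show.
reactant_paths = ["C-C", "C-C", "C-O"]
product_paths = ["C-C", "C=O"]
fingerprint = difference_fingerprint(reactant_paths, product_paths)
```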

Reaction MOLMAPs

MOLecular Maps of Atom-level Properties (based on Kohonen neural networks)

Bonds are classified by Kohonen SOMs from their physicochemical and topological features.

Bond types are assigned by the Kohonen SOM.

MOLMAPs encode the bond types that exist in a molecule.

Q.-Y. Zhang, J. Aires-de-Sousa, J. Chem. Inf. Model. 2005, 45 (6), 1775-1783.

D. A. R. S. Latino, J. Aires-de-Sousa, Angew. Chem. Int. Ed. 2006, 45 (13), 2066-2069.

Reaction MOLMAPs

Generating the MOLMAP of molecules

1. Train a self-organising map (SOM) with a diversity of BONDS from different structures (each bond described by n bond properties).

2. Submit all the bonds in ONE structure to the trained SOM.

3. The pattern of neurons activated by the bonds of one structure is a molecular descriptor (MOLMAP): a map of the bonds present in that molecule (a reactivity fingerprint?). The frequency of activation is counted for each neuron to yield a numerical representation.

Reaction MOLMAPs

Generating the MOLMAP of a reaction

MOLMAP of products - MOLMAP of reactants = MOLMAP of reaction
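The subtraction of MOLMAPs can be sketched as follows, assuming a trained map is already available as a grid of weight vectors. The tiny 1×2 map with a single bond property is a toy example, not a realistic SOM, and the function names are hypothetical.

```python
def molmap(bond_vectors, som_weights):
    """Count, for each neuron, how many bonds of one structure activate it."""
    rows, cols = len(som_weights), len(som_weights[0])
    counts = [[0] * cols for _ in range(rows)]
    for x in bond_vectors:
        # The activated neuron is the one with the closest weight vector.
        best = min(
            ((r, c) for r in range(rows) for c in range(cols)),
            key=lambda rc: sum((a - b) ** 2
                               for a, b in zip(x, som_weights[rc[0]][rc[1]])),
        )
        counts[best[0]][best[1]] += 1
    return counts


def reaction_molmap(product_bonds, reactant_bonds, som_weights):
    """MOLMAP of products minus MOLMAP of reactants, neuron by neuron."""
    p = molmap(product_bonds, som_weights)
    r = molmap(reactant_bonds, som_weights)
    return [[pv - rv for pv, rv in zip(prow, rrow)] for prow, rrow in zip(p, r)]


# Toy trained map: one row, two neurons, one bond property each.
som_weights = [[[0.0], [1.0]]]
reactant_bonds = [[0.1], [0.9]]
product_bonds = [[0.8], [0.95]]
result = reaction_molmap(product_bonds, reactant_bonds, som_weights)
```

Negative entries mark bond types lost in the reaction and positive entries mark bond types gained; bonds that are unchanged cancel out.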

Reaction MOLMAPs

Application to the classification of organic reactions

Application to the classification of enzymatic reactions

EC 1.x.x.x Oxidoreductases
EC 2.x.x.x Transferases
EC 3.x.x.x Hydrolases
EC 4.x.x.x Lyases
EC 5.x.x.x Isomerases
EC 6.x.x.x Ligases

Mapping of > 3,000 reactions from the KEGG database on a SOM.

Prediction of EC numbers with Random Forests (accuracy up to 89%, 82%, and 80% for the 1st, 2nd, and 3rd digits respectively).

Identification of similar reactions with differences at the class level of the EC number.

D. A. R. S. Latino, J. Aires-de-Sousa, Angew. Chem. Int. Ed. 2006, 45 (13), 2066-2069.

Reaction signatures

J.-L. Faulon, M. Misra, S. Martin, K. Sale, R. Sapra, Bioinformatics 2008, 24, 225–233

J.-L. Faulon, D. P. Visco, Jr., R. S. Pophale, J. Chem. Inf. Comput. Sci. 2003, 43, 707-720

An atomic signature is a canonical representation of the subgraph surrounding a particular atom. This subgraph includes all atoms and bonds up to a predefined distance from the given atom (the height, h).

Each component of a molecular signature counts the number of occurrences of a particular atomic signature in the molecule.

Reaction signature = Σ molecular signatures of products – Σ molecular signatures of substrates
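The arithmetic of the reaction signature can be illustrated with a heavily simplified sketch. Real atomic signatures are canonical subgraphs up to height h; here each atom is described only by its type and the sorted types of its direct neighbors, hydrogens and bond orders are ignored, and the heavy-atom graphs for an ester hydrolysis are a hypothetical example.

```python
from collections import Counter


def molecular_signature(atoms, bonds):
    """Simplified height-1 signature: atom type plus sorted neighbor types.

    atoms: dict atom_index -> element symbol; bonds: list of (i, j) pairs.
    """
    neighbors = {i: [] for i in atoms}
    for i, j in bonds:
        neighbors[i].append(atoms[j])
        neighbors[j].append(atoms[i])
    return Counter(
        atoms[i] + "(" + "".join(sorted(neighbors[i])) + ")" for i in atoms
    )


def reaction_signature(substrates, products):
    """Sum of product signatures minus sum of substrate signatures."""
    total = Counter()
    for atoms, bonds in products:
        total.update(molecular_signature(atoms, bonds))
    for atoms, bonds in substrates:
        total.subtract(molecular_signature(atoms, bonds))
    return {sig: n for sig, n in total.items() if n != 0}


# Heavy-atom graphs: methyl acetate + water -> acetic acid + methanol
methyl_acetate = ({0: "C", 1: "C", 2: "O", 3: "O", 4: "C"},
                  [(0, 1), (1, 2), (1, 3), (3, 4)])
water = ({0: "O"}, [])
acetic_acid = ({0: "C", 1: "C", 2: "O", 3: "O"}, [(0, 1), (1, 2), (1, 3)])
methanol = ({0: "C", 1: "O"}, [(0, 1)])
signature = reaction_signature([methyl_acetate, water], [acetic_acid, methanol])
```

Atoms whose environment is unchanged cancel out, so only the signatures around the reaction center remain.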

Reaction signatures
J.-L. Faulon, M. Misra, S. Martin, K. Sale, R. Sapra, Bioinformatics 2008, 24, 225–233

Application to the classification of enzymatic reactions for the automatic assignment of EC numbers

Data set: 6,556 reactions from the KEGG database.

Machine learning technique: Support Vector Machines (SVM)

Correct assignment of EC class, subclass, and sub-subclass in up to 91%, 84% and 88% of cases, respectively (leave-one-out cross-validation).


Fingerprints of reactions based on atom types

Fingerprints generated for reactant and product molecules separately, based on Sybyl atom types and atom types augmented with a single layer around the central atom.

The difference fingerprint is defined by the differences in occurrence of each atom type in the reactant and product fingerprints.

L. Ridder, M. Wagener, ChemMedChem 2008, 3, 821 – 832

Fingerprints of reactions based on atom types

Application to the classification of metabolic reactions to assist in the establishment of rules for reaction prediction.

L. Ridder, M. Wagener, ChemMedChem 2008, 3, 821 – 832

All reactions in the training set are projected on a 2D plane by stochastic proximity embedding, which optimizes the distances between points on the plane to correspond as closely as possible to the fingerprint distances calculated between all pairs of metabolic reactions.
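Stochastic proximity embedding in its basic pairwise-update form can be sketched as below. The number of steps, the linear learning-rate schedule, and the random initialisation are assumptions for illustration, not the settings used in the cited work, and the 3-4-5 distance matrix is a toy input.

```python
import math
import random


def spe(distances, n_steps=20000, seed=0):
    """Embed n points in 2D so that their distances approach the targets.

    distances: symmetric n x n list of target distances (for example,
    distances between reaction fingerprints).
    """
    rng = random.Random(seed)
    n = len(distances)
    coords = [[rng.random(), rng.random()] for _ in range(n)]
    for step in range(n_steps):
        rate = 1.0 - step / n_steps      # learning rate decreases towards 0
        i, j = rng.sample(range(n), 2)   # pick a random pair of points
        dx = coords[i][0] - coords[j][0]
        dy = coords[i][1] - coords[j][1]
        d = math.hypot(dx, dy)
        # Move both points along their connecting line so that the current
        # distance d moves towards the target distance.
        scale = rate * 0.5 * (distances[i][j] - d) / (d + 1e-9)
        coords[i][0] += scale * dx
        coords[i][1] += scale * dy
        coords[j][0] -= scale * dx
        coords[j][1] -= scale * dy
    return coords


# Toy target distances: the three corners of a 3-4-5 triangle.
targets = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
points = spe(targets)
```

For real fingerprint distances an exact 2D embedding rarely exists, so the final map only approximates the original distances.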

References

2. L. Chen in Handbook of Chemoinformatics, Vol. 1 (Eds.: J. Gasteiger, T. Engel), Wiley-VCH, New York, 2003, 348-388.
3. http://infochem.de/en/products/software/classify.shtml
4. M. Kotera, Y. Okuno, M. Hattori, S. Goto, M. Kanehisa, J. Am. Chem. Soc. 2004, 126, 16487-16498.
5. A. Varnek, D. Fourches, F. Hoonakker, V. P. Solov’ev, J. Comput.-Aided Mol. Des. 2005, 19, 693-703.
6. S. Fujita, J. Chem. Inf. Comput. Sci. 1986, 26, 205-212; J. Chem. Inf. Comput. Sci. 1987, 27, 99-104.
7. N. M. O’Boyle, G. L. Holliday, D. E. Almonacid, J. B. O. Mitchell, J. Mol. Biol. 2007, 368, 1484-1499.
8. L. Chen, J. Gasteiger, J. Am. Chem. Soc. 1997, 119, 4033-4042.
9. H. Satoh, O. Sacher, T. Nakata, L. Chen, J. Gasteiger, K. Funatsu, J. Chem. Inf. Comput. Sci. 1998, 38, 210-219.
10. O. Sacher. Ph. D. Thesis, University of Erlangen-Nuremberg. http://www2.chemie.uni-erlangen.de/services/dissonline/data/dissertation/Oliver_Sacher/html
11. http://www.daylight.com/dayhtml/doc/theory/theory.finger.html
12. Q.-Y. Zhang, J. Aires-de-Sousa, J. Chem. Inf. Model. 2005, 45 (6), 1775-1783.
13. D. A. R. S. Latino, J. Aires-de-Sousa, Angew. Chem. Int. Ed. 2006, 45 (13), 2066-2069.
14. J.-L. Faulon, D. P. Visco, Jr., R. S. Pophale, J. Chem. Inf. Comput. Sci. 2003, 43, 707-720.
15. J.-L. Faulon, M. Misra, S. Martin, K. Sale, R. Sapra, Bioinformatics 2008, 24, 225-233.
16. L. Ridder, M. Wagener, ChemMedChem 2008, 3, 821-832.