
Exploration of Chemical Space by Molecular Morphing

David Hoksza (1), Daniel Svozil (2)

(1) SIRET Research Group, Department of Software Engineering, FMP, Charles University in Prague, Czech Republic

(2) Laboratory of Informatics and Chemistry, Institute of Chemical Technology, Prague, Czech Republic

Outline

• Overview and Motivation

• Chemical Space Exploration

o morphing operators

o molecule representation

o distance definition

o space exploration

• Experimental Evaluation

October 26, 2011 BIBE 2011 2

Chemical Space

• All possible organic compounds comprise a “chemical space”

• Can be viewed as being analogous to the cosmological universe in its vastness, with chemical compounds populating space instead of stars

• Size

o Estimated size of the chemical space: 10^100 to 10^200 (SciFinder ~ 6 × 10^7)

o Around one sextillion (10^21) stars in the observable universe

o For example, there are more than 10^29 possible derivatives of n-hexane

o Chemical space is infinite for our purposes

• Not all theoretically postulated compounds fall within the limits of what is synthetically feasible


Chemical Space Exploration - Motivation

• Motivation

o 2 ligands


General Algorithm

1. Generate n morphs from MS

2. Accept each morph with a probability given by its distance to MT

3. Accepted morphs form generation M1

4. For each morph Mi from M1, repeat from step 1 using MS = Mi

5. Finish when one of the morphs is identical to MT
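The five steps can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. Several things are assumptions: the acceptance rule p = 1 - distance (the slide only says the probability is given by the distance to MT), the distance being scaled to [0, 1], the hypothetical names explore, max_generation and seed, and the toy "molecules" (bit tuples morphed by flipping one bit), which stand in for real structures and morphing operators.

```python
import random


def explore(start, target, morph, distance, n=10, max_iterations=100,
            max_generation=50, seed=0):
    """Return a list of molecules leading from start to target, or None."""
    rng = random.Random(seed)
    parent = {start: None}   # the source each molecule was first morphed from
    generation = [start]     # molecules currently playing the role of MS
    for _ in range(max_iterations):
        if target in parent:                 # step 5: a morph equals MT
            break
        accepted = []
        for source in generation:            # step 4: repeat with MS = Mi
            for _ in range(n):               # step 1: generate n morphs
                candidate = morph(source, rng)
                if candidate in parent:      # already seen, skip it
                    continue
                # step 2 (assumed rule): closer morphs are accepted more often
                if rng.random() < 1.0 - distance(candidate, target):
                    parent[candidate] = source
                    accepted.append(candidate)
        if accepted:                         # step 3: accepted morphs form the next generation
            accepted.sort(key=lambda m: distance(m, target))
            generation = accepted[:max_generation]  # assumed cap on generation size
    if target not in parent:
        return None
    path, node = [], target
    while node is not None:                  # walk back from MT to MS
        path.append(node)
        node = parent[node]
    return path[::-1]


# Toy stand-ins (assumptions): a "molecule" is a tuple of bits.
def flip_one_bit(bits, rng):
    i = rng.randrange(len(bits))
    return bits[:i] + (1 - bits[i],) + bits[i + 1:]


def hamming_fraction(a, b):
    return sum(x != y for x, y in zip(a, b)) / len(a)


start = (0, 0, 0, 0, 0, 0)
target = (1, 1, 0, 1, 0, 1)
path = explore(start, target, flip_one_bit, hamming_fraction)
```

If a path is found, it starts at MS, ends at MT, and each consecutive pair differs by one morphing step. The search is stochastic, so it may also return None when the iteration limit is reached first.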


Molecular Structure Representation

• Fragment-based representation

o The fragments present in a structure can be represented as a sequence of 0s and 1s

00010100010101000101010011110100

• 0 means the fragment is not present in the structure

• 1 means the fragment is present in the structure (perhaps multiple times)

o structural keys – fixed dictionary of fragments (1:1 relationship between bit and fragment; problem: a structure may contain no fragments from the dictionary)

o hashed fingerprints – the fragment description (e.g. C-C-N-C-O) is hashed to a position, e.g. in the range 1-1024, and that bit is set (problems: collisions; how to work back from a position to the fragment?)
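The two representations can be contrasted with a short Python sketch. It assumes fragments are already available as strings such as "C-C-N-C-O"; fragment enumeration itself is not shown. The function names and the choice of CRC32 as the hash are illustrative assumptions, and bit positions are 0-based (0 to 1023) rather than 1-1024.

```python
import zlib


def structural_key(fragments, dictionary):
    """One bit per dictionary entry: 1 if that fragment occurs in the structure."""
    present = set(fragments)
    return [1 if frag in present else 0 for frag in dictionary]


def hashed_fingerprint(fragments, n_bits=1024):
    """Hash each fragment description to a bit position and set that bit."""
    bits = [0] * n_bits
    for frag in fragments:
        # CRC32 is an assumed, deterministic hash; different fragments may
        # collide on the same position, and the mapping cannot be inverted.
        position = zlib.crc32(frag.encode("utf-8")) % n_bits
        bits[position] = 1
    return bits


dictionary = ["C-C", "C-O", "C-N"]                 # hypothetical fixed dictionary
key = structural_key(["C-C", "C-N"], dictionary)   # [1, 0, 1]
empty = structural_key(["S-S"], dictionary)        # no dictionary fragment present
hashed = hashed_fingerprint(["C-C-N-C-O", "C-C", "C-C"])
```

The structural key stays interpretable bit by bit but yields all zeros for a structure outside the dictionary, whereas the hashed fingerprint always sets some bit at the cost of collisions.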


Molecular Structure Similarity

• Count the “on” bits in each molecule (A and B)

• Count the “on” bits common to both molecules (C)

struct A: 00010100010101000101010011110100 13 bits on (A)

struct B: 00000000100101001001000011100000 8 bits on (B)

A AND B: 00000000000101000001000011100000 6 bits on (C)

• Tanimoto similarity coefficient

similarity = C / (A + B − C) = 6 / (13 + 8 − 6) = 0.4
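The worked example can be checked with a few lines of Python. This is a minimal sketch that treats fingerprints as strings of "0" and "1"; the function name and the convention of returning 0.0 for two all-zero fingerprints are assumptions.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity C / (A + B - C) of two equal-length bit strings."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    a = fp_a.count("1")                                          # bits on in A
    b = fp_b.count("1")                                          # bits on in B
    c = sum(x == "1" and y == "1" for x, y in zip(fp_a, fp_b))   # bits on in both
    union = a + b - c
    return c / union if union else 0.0   # assumed convention for empty fingerprints


struct_a = "00010100010101000101010011110100"
struct_b = "00000000100101001001000011100000"
similarity = tanimoto(struct_a, struct_b)   # 6 / (13 + 8 - 6)
```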


Morphing Operators

[Figure: morphing operators and an example path from MS to MT]


Exploration Parameters

• cnt_max_iterations

• cnt_morphs

• cnt_morphs_det

• dist_det

• cnt_accept

• cnt_accept_max

• cnt_it_prune

• cnt_morphs_max


Evaluation - Datasets

• 3 datasets of start/target pairs from PubChem

• 20 pairs in each set

• 3 difficulty levels based on pair similarity

o start and target structures represented by their PubChem substructure fingerprints

o similarity quantified as the Tanimoto score

• D1 … 0.7 – 0.8 similarity

• D2 … 0.5 – 0.6 similarity

• D3 … 0.3 – 0.4 similarity

• time constraint – 8h
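Assigning a start/target pair to a dataset by its Tanimoto score could look like the sketch below. The function name and the treatment of the ranges as closed intervals are assumptions; the slide gives only the ranges themselves.

```python
def difficulty_level(similarity):
    """Map a Tanimoto similarity to D1, D2 or D3, or None if it is in no range."""
    levels = {"D1": (0.7, 0.8), "D2": (0.5, 0.6), "D3": (0.3, 0.4)}
    for name, (low, high) in levels.items():
        if low <= similarity <= high:   # assumed: both bounds inclusive
            return name
    return None


level = difficulty_level(0.75)   # "D1"
```

Pairs whose similarity falls between the ranges (for example 0.65) belong to none of the three datasets in this sketch.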


Evaluation - Results


75% 60% 35%

Molpher Student Project

• To start at the end of 2011

• Algorithm optimization

• Parallel processing

• Visualization

• Extensive Logging


Questions?
