Varun Khanna and Shoba Ranganathan, Macquarie University, Sydney, Australia
Physicochemical property space distribution among human metabolites, drugs and toxins
2
Outline of the presentation
• Introduction
• Chemoinformatics and current drug discovery approach
• Drug-likeness and related measures
• Molecular bioactivity space
• Results
• Conclusion
3
Chemoinformatics
Chemistry + Informatics = Chemoinformatics (Brown 1998; Willett 2007)
Involves many sub-disciplines today, such as:
• Similarity and diversity analysis
• CASD: Computer-Aided Synthesis Design
• CASE: Computer-Aided Structure Elucidation
• QSAR: Quantitative Structure-Activity Relationship
4
Current drug discovery process: 10-15 years & US $1 billion
Disease Target identification
Lead identification
Preclinical testing
Human clinical trials
Approved by regulatory authorities
Market
5
Toxicity: major cause of drug failures
Schuster D, Laggner C, Langer T: Why drugs fail - a study on side effects in new chemical entities. Curr Pharm Des 2005, 11(27):3545-3559.
Gut J, Bagatto D: Theragenomic knowledge management for individualised safety of drugs, chemicals, pollutants and dietary ingredients. Expert Opin Drug Metab Toxicol 2005, 1(3):537-554.
However, there is no comparison of toxins to drugs or any other drug-like set of molecules.
Data resources available:
• Distributed Structure-Searchable Toxicity (DSSTox)
• Carcinogenic Potency Database (potency.berkeley.edu)
7
A very brief history of “drug-likeness”
Lipinski’s Rule of Five (Ro5) has dominated drug design and discovery since 1997.
A molecule is “non-drug-like” if it has >5 hydrogen bond donors, >10 hydrogen bond acceptors, molecular mass >500 Da and lipophilicity (measured as AlogP) >5.
More recently, metabolite-likeness has become important for designing targeted drugs that act on specific metabolic pathways (Dobson et al., 2009).
Data resources available:
• Human Metabolome Database (www.hmdb.ca)
• DrugBank (www.drugbank.ca)
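The Ro5 limits quoted on this slide can be written as a small check. This is a minimal sketch, assuming the four descriptors have already been computed by some chemistry toolkit. The `Descriptors` container, its field names and the example values are hypothetical and are not taken from the presentation or from any database.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Descriptors:
    """Precomputed properties for one molecule (hypothetical container)."""
    h_bond_donors: int
    h_bond_acceptors: int
    molecular_mass: float  # in Da
    alogp: float           # lipophilicity estimate


def ro5_violations(d: Descriptors) -> List[str]:
    """Return the names of the Ro5 limits that the molecule exceeds."""
    violations = []
    if d.h_bond_donors > 5:
        violations.append("h_bond_donors")
    if d.h_bond_acceptors > 10:
        violations.append("h_bond_acceptors")
    if d.molecular_mass > 500:
        violations.append("molecular_mass")
    if d.alogp > 5:
        violations.append("alogp")
    return violations


# Illustrative values only, not real measurements.
small = Descriptors(h_bond_donors=1, h_bond_acceptors=4,
                    molecular_mass=180.2, alogp=1.2)
large = Descriptors(h_bond_donors=7, h_bond_acceptors=12,
                    molecular_mass=810.0, alogp=6.3)

print(ro5_violations(small))
print(ro5_violations(large))
```

Returning the list of exceeded limits, rather than a single yes/no flag, leaves the choice of how many violations count as “non-drug-like” to the caller.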
8
Molecular bioactivity space
[Figure: diagram of molecular bioactivity space; region labels: NP, DRUGS, TOXINS, METABOLITES]
9
Large-scale physicochemical property comparison
[Figure: overlap diagram of the three datasets: Metabolites (4568), Drugs (3248), Toxins (995); overlap labels: 186 (4.38 %), 228 (2.91 %), 92 (1.65 %)]
In this paper, we present:
• Comprehensive analysis of drugs, metabolites and toxins
• Comparison of Ro5, 1D and 3D properties
• Clustered (or representative) vs. unclustered (or raw) datasets (for the first time)
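The slide does not say which pair of datasets each overlap count belongs to, or how the percentages were derived. As a hedged arithmetic check, the sketch below assumes each percentage is the overlap divided by the combined size of the two datasets involved. The pairing in `candidates` is an inference from that arithmetic, not something stated in the presentation.

```python
# Dataset sizes and overlap labels as they appear on the slide.
sizes = {"metabolites": 4568, "drugs": 3248, "toxins": 995}
slide_labels = {186: 4.38, 228: 2.91, 92: 1.65}

# Assumed pairing of each overlap count with two datasets (an inference).
candidates = {
    186: ("drugs", "toxins"),
    228: ("drugs", "metabolites"),
    92: ("toxins", "metabolites"),
}


def pct_of_combined(overlap: int, size_a: int, size_b: int) -> float:
    """Overlap as a percentage of the summed sizes of the two datasets."""
    return 100.0 * overlap / (size_a + size_b)


results = {}
for overlap, (a, b) in candidates.items():
    results[overlap] = pct_of_combined(overlap, sizes[a], sizes[b])
    print(f"{overlap:>4} {a}/{b}: computed {results[overlap]:.3f} %, "
          f"slide shows {slide_labels[overlap]} %")
```

Under this assumption each computed value agrees with the slide label to within 0.01 percentage points.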
10
Clustered and unclustered (raw) datasets
Dataset Metabolites Drugs Toxins
Unclustered M: 6582 D: 4829 T: 1448
Clustered CM: 4568 CD: 3248 CT: 995
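The table gives only absolute counts. A short calculation, using just the numbers in the table, shows what fraction of each raw dataset remains as cluster representatives. The variable names are illustrative.

```python
# Counts copied from the table: (unclustered, clustered).
counts = {
    "metabolites": (6582, 4568),
    "drugs": (4829, 3248),
    "toxins": (1448, 995),
}

# Percentage of raw molecules retained after clustering.
retained = {
    name: 100.0 * clustered / raw
    for name, (raw, clustered) in counts.items()
}

for name, pct in retained.items():
    print(f"{name}: {pct:.1f} % of the raw dataset retained")
```

All three datasets shrink by a similar proportion, roughly a third, so the clustered sets keep about the same relative sizes as the raw ones.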
11
Properties of drug-like molecules
• Lipinski properties (Ro5)
• 1D properties: number of atoms, number of nitrogen and oxygen atoms, number of rings, number of rotatable bonds
• 3D properties: molecular volume, molecular surface area, molecular polar surface area, molecular solvent accessible surface area
Analysis
• Software: SciTegic Pipeline Pilot (accelrys.com/products/scitegic)
• Clustering: using the “Cluster Clara” algorithm with ECFP_4 fingerprints as molecular descriptors
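Fingerprint-based clustering rests on a similarity measure between fingerprints and, for CLARA-style methods, on medoids (actual dataset members chosen as cluster representatives). The sketch below illustrates those two ingredients with Tanimoto similarity on toy bit sets. It is not the Pipeline Pilot implementation. The fingerprints are made-up sets of integers standing in for ECFP_4 bits, and the function names are hypothetical.

```python
from typing import FrozenSet, List, Sequence

# A fingerprint is modelled as the set of its "on" bit indices.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two bit sets."""
    if not a and not b:
        return 1.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def medoid(members: Sequence[Fingerprint]) -> Fingerprint:
    """Member with the smallest summed Tanimoto distance to all members."""
    return min(members,
               key=lambda m: sum(1.0 - tanimoto(m, o) for o in members))


def assign(fps: Sequence[Fingerprint],
           medoids: Sequence[Fingerprint]) -> List[int]:
    """For each fingerprint, the index of its most similar medoid."""
    return [max(range(len(medoids)), key=lambda k: tanimoto(fp, medoids[k]))
            for fp in fps]


# Toy fingerprints, invented for illustration.
fps = [
    frozenset({1, 2, 3}),
    frozenset({2, 3, 4}),
    frozenset({1, 2, 3, 4}),
    frozenset({10, 11}),
    frozenset({10, 11, 12}),
]
labels = assign(fps, medoids=[fps[0], fps[3]])
print(labels)
print(sorted(medoid(fps[:3])))
```

Keeping one representative per cluster, such as the medoid, is one way to obtain a “clustered” dataset from a raw one. The presentation does not state how its representatives were chosen.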
13
“Rule of five” analysis
Lipinski properties by dataset:

Dataset                  Molecular weight <500 Da   H-bond donor <=5   H-bond acceptor <=10   Log P <5
HMDB (Metabolites)       34%                        84%                84%                    35%
DDB (Drugs)              84%                        86%                87%                    92%
CPDB (Toxic molecules)   94%                        98%                97%                    92%
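A table of this kind can be produced by testing every molecule in a dataset against each criterion separately and reporting the share that passes. The sketch below uses the four thresholds from the table headers. The `Mol` record and the toy molecules are hypothetical, and the resulting numbers are unrelated to the HMDB, DDB or CPDB figures.

```python
from typing import Callable, Dict, List, NamedTuple


class Mol(NamedTuple):
    """Hypothetical record of precomputed descriptors for one molecule."""
    mw: float    # molecular weight in Da
    hbd: int     # hydrogen bond donors
    hba: int     # hydrogen bond acceptors
    logp: float  # lipophilicity


# Criteria as given in the table headers.
CRITERIA: Dict[str, Callable[[Mol], bool]] = {
    "MW <500 Da": lambda m: m.mw < 500,
    "H-bond donor <=5": lambda m: m.hbd <= 5,
    "H-bond acceptor <=10": lambda m: m.hba <= 10,
    "Log P <5": lambda m: m.logp < 5,
}


def pass_rates(mols: List[Mol]) -> Dict[str, float]:
    """Percentage of molecules satisfying each criterion independently."""
    if not mols:
        raise ValueError("dataset is empty")
    return {
        name: 100.0 * sum(1 for m in mols if test(m)) / len(mols)
        for name, test in CRITERIA.items()
    }


# Made-up dataset for illustration only.
toy = [
    Mol(180.2, 1, 4, 1.2),
    Mol(810.0, 7, 12, 6.3),
    Mol(450.5, 2, 6, 5.4),
    Mol(320.4, 0, 3, 2.8),
]
rates = pass_rates(toy)
for name, pct in rates.items():
    print(f"{name}: {pct:.0f}%")
```

Because each column is evaluated on its own, a dataset can score high on one property and low on another, as the metabolite row does.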
14
Lipinski properties
[Figure: four histograms showing the percentage of molecules for Metabolites, Drugs and Toxins. Panels: a. Molecular weight; b. AlogP; c. Lipinski hydrogen bond donor; d. Lipinski hydrogen bond acceptor]
15
1D property comparison
[Figure: four histograms showing the percentage of molecules for Metabolites, Drugs and Toxins. Panels: a. Number of atoms; b. Number of carbon atoms; c. Number of nitrogen atoms; d. Number of oxygen atoms]
16
3D property comparison
[Figure: four histograms showing the percentage of molecules for Metabolites, Drugs and Toxins. Panels: a. Molecular surface area; b. Molecular volume; c. Molecular polar surface area; d. Molecular solvent accessible volume]
Clustered vs. raw datasets - I
[Figure: histograms (y-axis: percentage) comparing clustered and raw datasets for metabolites (M), drugs (D) and toxins (T). Panels: a. Number of oxygen atoms; b. Number of rings. Annotations on the slide: ~10 % drop, ~9 % drop, ~9 % drop]
Clustered vs. raw datasets - II
[Figure: histograms (y-axis: percentage) comparing clustered and raw datasets for metabolites (M), drugs (D) and toxins (T). Panels: a. Molecular solubility; b. Molecular polar surface area. Annotations on the slide: ~10 % drop, ~15 % rise]
19
Functional group analysis
Functional group    Metabolite dataset   Drugs dataset   Toxin dataset
Aromatic atom       17.4%                70.6%           62.3%
Benzene             10.3%                56.0%           53.0%
HBA Ester           56.3%                13.8%           15.4%
Primary amine       28.0%                14.4%           12.0%
Secondary amine     11.4%                64.0%           41.2%
Tertiary amine      44.6%                80.0%           60.0%
Quaternary amine    15.3%                2.1%            0.5%
Primary amide       1.5%                 4.5%            3.9%
Secondary amide     11.4%                31.0%           14.5%
Tertiary amide      2.8%                 16.8%           9.2%
Alkyl halide        ~0.5%                ~0.5%           3.2%
Azo                 0.0%                 ~0.5%           3.4%
Nitroso             ~0.5%                0.6%            8.4%
21
Conclusions
70% of the metabolites lie outside the Lipinski universe, whereas 90% of the toxins abide by Lipinski’s rule.
The Ro5 does not explicitly take toxicity into account; as a result, present-day drugs are more akin to toxins than to metabolites.
Empirical rules like the “Ro5” can be refined to increase the coverage of drugs or drug-like molecules that are clearly not close to toxic compounds.
Clustered and unclustered datasets are very similar, except in the case of the number of oxygen atoms, the molecular polar surface area and the number of rings.
22
Related work
Customary medicinal plant database:
Gaikwad J, Khanna V, Vemulpad S, Jamie J, Kohen J, Ranganathan S: CMKb: a web-based prototype for integrating Australian Aboriginal customary medicinal plant knowledge. BMC Bioinformatics 2008, 9 Suppl 12:S25.
Invited chemoinformatics book chapter:
Khanna V, Ranganathan S: In Silico Methods for the Analysis of Metabolites and Drug Molecules, in Algorithms in Computational Molecular Biology: Techniques, Approaches and Applications, eds. M. Elloumi and A.Y. Zomaya, Wiley, 2009, accepted.
23
Acknowledgement
VK is grateful to:
• Macquarie University for the award of a Research Excellence Scholarship (MQRES)
• PhD Supervisor and Co-supervisors @ MQ
• Colleagues and friends @ MQ
• InCoB2009 Program and Organizing Committee members
24
Thank you and
Questions
25
Supplementary data