Chemogenomics John Overington
The screen versions of these slides have full details of copyright and acknowledgements
1
Chemogenomics
Dr. John Overington, EMBL-EBI
2
Drug Discovery
Target Discovery
• Target identification
• Microarray profiling
• Target validation
• Assay development
• Biochemistry
• Clinical/Animal disease models
Lead Discovery
• High-throughput Screening (HTS)
• Fragment-based screening
• Focused libraries
• Screening collection
Lead Optimisation
• Medicinal Chemistry
• Structure-based drug design
• Selectivity screens
• ADMET screens
• Cellular/Animal disease models
• Pharmacokinetics
Preclinical Development
• Toxicology
• In vivo safety pharmacology
• Formulation
• Dose prediction
Clinical Trials
• Phase 1: PK, tolerability
• Phase 2: Efficacy
• Phase 3: Safety & Efficacy
• Launch (Phase 4): Indication discovery & expansion
Stages: Discovery, Development, Use
Data: Med. Chem. SAR, Clinical Candidates, Drugs
3
Ligand Binding
• Interaction ∝ distance from ligand
– Two general components to binding energy
Van der Waals: E ∝ r⁻⁶ (short range)
Electrostatic: E ∝ r⁻² (longer range)
• Use close contacts for selectivity assessment
– Analyse binding site amino acid physicochemistry differences
[Plot: % energy (0 to 100) against atomic separation (0 to 10 Å)]
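The two distance dependences on this slide can be compared numerically. The following is a minimal sketch, not the calculation behind the original plot: the function name and the 2 Å reference separation used for the 100% point are assumptions chosen only for illustration.

```python
import numpy as np

def relative_energy(r, exponent, r_ref=2.0):
    """Energy at separation r as a percentage of its value at r_ref,
    for a term that scales as E ∝ r**(-exponent).
    r_ref = 2.0 Å is an assumed normalisation point, not taken from the slide."""
    r = np.asarray(r, dtype=float)
    return 100.0 * (r / r_ref) ** (-exponent)

separations = np.array([2.0, 4.0, 6.0, 8.0, 10.0])  # Å
vdw = relative_energy(separations, 6)    # Van der Waals, E ∝ r^-6
elec = relative_energy(separations, 2)   # Electrostatic, E ∝ r^-2

for r, v, e in zip(separations, vdw, elec):
    print(f"{r:4.1f} Å   vdW {v:8.4f}%   electrostatic {e:8.4f}%")
```

Under these assumptions the r⁻⁶ term falls away within a few ångströms while the r⁻² term persists, which is consistent with using close contacts for selectivity assessment.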
4
Similar Ligands Bind Similar Proteins
• QSAR analysis of inhibition of a set of enzymes
• Screening data
– 5 enzymes
– 18 compounds
– IC50 values compared
Carefully selected substrates and assays
Catalytic domain screening construct
5
Structure Activity Relationship (SAR) Table
Compound  Enzyme-1  Enzyme-2  Enzyme-3  Enzyme-4  Enzyme-5
1         1.83      -0.21     0.86      -0.62     0.40
2         3.97      3.46      1.21      3.36      3.66
3         4.67      3.53      0.75      2.92      3.28
4         2.71      2.98      1.04      3.82      4.39
5         2.57      2.88      0.64      4.15      4.36
6         3.36      2.89      1.02      3.58      3.94
7         3.26      2.84      0.66      3.49      4.00
8         3.99      2.55      0.08      3.77      3.69
9         3.69      2.44      0.00      2.56      1.96
10        3.91      2.95      0.17      3.89      3.25
11        4.73      4.45      1.54      4.46      4.65
12        3.77      2.87      0.32      2.75      2.97
13        3.85      2.64      0.26      3.50      3.91
14        3.99      2.65      0.30      2.72      2.55
15        4.54      4.71      1.74      4.81      4.68
16        3.72      2.88      0.48      3.30      4.70
17        4.15      2.69      0.07      3.59      4.65
18        4.16      4.12      2.17      4.16      4.75
6
Correlation of Binding
• Correlation coefficient matrices
– Cluster analysis - multidimensional scaling
– N x M activity matrix - N proteins and M compounds
N x N: ‘similarity’ of proteins
M x M: ‘similarity’ of inhibitors
– How similar are inhibition profiles
– Insights into selectivity
7
Correlation of Binding (2)
          enzyme-1  enzyme-2  enzyme-3  enzyme-4  enzyme-5
enzyme-1  1.000     0.795     0.147     0.572     0.447
enzyme-2            1.000     0.523     0.877     0.771
enzyme-3                      1.000     0.285     0.365
enzyme-4                                1.000     0.894
enzyme-5                                          1.000
Correlation matrix (standardised log activity data)
Most similar pair, enzyme-5 and enzyme-4
Least similar pair, enzyme-3 and enzyme-1
Perfectly correlated SAR = 1.00, uncorrelated = 0.00, anti-correlated = -1.00
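A protein-by-protein and a compound-by-compound correlation matrix can be sketched directly from the SAR table values. This is an illustrative sketch using plain Pearson correlation via NumPy; the slides do not state exactly how the data were standardised, so the printed values should be comparable to, but may not exactly reproduce, the matrix shown here. The variable names are assumptions.

```python
import numpy as np

# Rows: compounds 1-18; columns: Enzyme-1 .. Enzyme-5 (values from the SAR table).
sar = np.array([
    [1.83, -0.21, 0.86, -0.62, 0.40],
    [3.97, 3.46, 1.21, 3.36, 3.66],
    [4.67, 3.53, 0.75, 2.92, 3.28],
    [2.71, 2.98, 1.04, 3.82, 4.39],
    [2.57, 2.88, 0.64, 4.15, 4.36],
    [3.36, 2.89, 1.02, 3.58, 3.94],
    [3.26, 2.84, 0.66, 3.49, 4.00],
    [3.99, 2.55, 0.08, 3.77, 3.69],
    [3.69, 2.44, 0.00, 2.56, 1.96],
    [3.91, 2.95, 0.17, 3.89, 3.25],
    [4.73, 4.45, 1.54, 4.46, 4.65],
    [3.77, 2.87, 0.32, 2.75, 2.97],
    [3.85, 2.64, 0.26, 3.50, 3.91],
    [3.99, 2.65, 0.30, 2.72, 2.55],
    [4.54, 4.71, 1.74, 4.81, 4.68],
    [3.72, 2.88, 0.48, 3.30, 4.70],
    [4.15, 2.69, 0.07, 3.59, 4.65],
    [4.16, 4.12, 2.17, 4.16, 4.75],
])

# np.corrcoef treats each row as one variable, so transpose to correlate enzymes.
enzyme_corr = np.corrcoef(sar.T)    # 5 x 5: 'similarity' of proteins
compound_corr = np.corrcoef(sar)    # 18 x 18: 'similarity' of inhibitors

print(np.round(enzyme_corr, 3))
```

Pearson correlation is unchanged by shifting or rescaling each column, so standardising the log activities per enzyme beforehand would not alter `enzyme_corr`.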
8
Protein Bioactivity Clustering
[Plot: Enzyme-1 to Enzyme-5 positioned on the 1st axis (-0.4 to 0.6) against the 2nd axis (-0.3 to 0.3)]
Inhibitor SAR-based clustering
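A two-axis layout of this kind can be sketched with classical multidimensional scaling applied to the correlation matrix from the previous slide. This is a sketch under stated assumptions: the dissimilarity is taken as 1 minus the correlation, and classical (eigendecomposition-based) MDS is used; the slides do not say which MDS variant or distance was used, so the coordinates need not match the original plot.

```python
import numpy as np

names = ["Enzyme-1", "Enzyme-2", "Enzyme-3", "Enzyme-4", "Enzyme-5"]

# Symmetric form of the correlation matrix shown on the slide.
corr = np.array([
    [1.000, 0.795, 0.147, 0.572, 0.447],
    [0.795, 1.000, 0.523, 0.877, 0.771],
    [0.147, 0.523, 1.000, 0.285, 0.365],
    [0.572, 0.877, 0.285, 1.000, 0.894],
    [0.447, 0.771, 0.365, 0.894, 1.000],
])

def classical_mds(dist, n_components=2):
    """Classical MDS: double-centre the squared distances, then take the
    leading eigenvectors scaled by the square roots of their eigenvalues."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Negative eigenvalues (non-Euclidean input) are clipped to zero.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

dist = 1.0 - corr  # assumed dissimilarity measure
coords = classical_mds(dist)

for name, (x, y) in zip(names, coords):
    print(f"{name}: 1st axis {x:+.3f}, 2nd axis {y:+.3f}")
```

The sign of each axis is arbitrary in MDS, so a mirrored layout is equivalent.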
9
Binding Site
• Built representative inhibitor complexed to library of enzyme models
• Extracted all residues within 9 Å of ligand
– Binding site contains 46 residues
19 invariant – across all sequences
27 invariant – across set of (enzyme-1, enzyme-2, enzyme-3, enzyme-4, enzyme-5)
• Clustered sequences based on properties of active site residues
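The idea of comparing binding sites by residue properties can be sketched as follows. Everything specific here is an assumption for illustration: the three short aligned site strings are hypothetical (not the 46-residue sites from the slide), and Kyte-Doolittle hydropathy stands in for whichever residue properties were actually used.

```python
import numpy as np

# Kyte-Doolittle hydropathy, used here as one example residue property.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Hypothetical aligned binding-site residues (illustrative only).
sites = {
    "enzyme-A": "GKVLDHF",
    "enzyme-B": "GKVLDHY",
    "enzyme-C": "GRILEHW",
}

def invariant_positions(seqs):
    """Indices of aligned positions where every sequence has the same residue."""
    return [i for i, column in enumerate(zip(*seqs)) if len(set(column)) == 1]

def property_vector(seq):
    """Encode a binding site as a vector of per-residue property values."""
    return np.array([HYDROPATHY[res] for res in seq])

names = list(sites)
vectors = np.array([property_vector(sites[n]) for n in names])

# Pairwise Euclidean distances between property vectors; this matrix can be
# passed to any clustering or scaling method.
distances = np.linalg.norm(vectors[:, None, :] - vectors[None, :, :], axis=-1)

print("Invariant positions:", invariant_positions(sites.values()))
print(np.round(distances, 2))
```

Only the variable positions contribute to the distances, so the invariant positions could be dropped before encoding without changing the result.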
10
Binding Site Clustering (2)
[Plot: Enzyme-1 to Enzyme-5 positioned on Axis 1 (-10 to 20) against Axis 2 (-10 to 15); Enzyme-2 and Enzyme-4 are labelled together]
Active site sequence-based clustering
11
Similar Proteins Bind Similar Ligands
• SAR profiles mirror active site differences
– Can anticipate activity profile of newly identified enzyme sequences
– Can anticipate likely selectivity issues in vivo
Provides focus for biology experiments
Hypothesis-driven selectivity screens
12
Binding Features Mapped to Structure
• General tools to map binding features on to 3-D protein models
– Andrews binding energy
P.R. Andrews et al., J. Med. Chem. 27, pp. 1648-1657 (1984)
Fragment-based contributions to ligand binding
– Eisenberg and McLachlan solvation parameters
D. Eisenberg and A.D. McLachlan, Nature, 319, pp. 199-203 (1986)
Captures important empirical components of free energy of solvation/hydrophobic effect
– AAindex database
Kawashima et al., Nuc. Acids Res., 27, pp. 368-369 (1999)
402 residue-based indices
– Wold QSAR parameters
Sandberg et al., J. Med. Chem., 41, pp. 2481-2491 (1998)
Factor analysis of physicochemical amino acid features
13
Structure Mapping of Andrews Energy
Mapping of Andrews binding energy to ligand site residues of the rhodopsin structure.
Note the high predicted binding energy for the retinal-binding lysine N atom.
14
Drug Discovery (2)
Target Discovery
• Target identification
• Microarray profiling
• Target validation
• Assay development
• Biochemistry
• Clinical/Animal disease models
Lead Discovery
• High-throughput Screening (HTS)
• Fragment-based screening
• Focused libraries
• Screening collection
Lead Optimisation
• Medicinal Chemistry
• Structure-based drug design
• Selectivity screens
• ADMET screens
• Cellular/Animal disease models
• Pharmacokinetics
Preclinical Development
• Toxicology
• In vivo safety pharmacology
• Formulation
• Dose prediction
Clinical Trials
• Phase 1: PK, tolerability
• Phase 2: Efficacy
• Phase 3: Safety & Efficacy
• Launch (Phase 4): Indication discovery & expansion
Stages: Discovery, Development, Use
Data: Med. Chem. SAR, Clinical Candidates, Drugs
15
ChEMBL
16
J.P. Overington, B. Al-Lazikani & A.L. Hopkins (2006) ‘How many drug targets are there?’ Nat. Rev. Drug Disc., 5, 993-996
17
Targets of Launched Drugs
Nat. Rev. Drug Disc., 5, pp. 993-996 (2006)
18
Drug Approvals
19
NFκB Pathway
20
FDA Approved Drugs
21
Clinical Candidates
• Database of clinical development candidates
– Contains ~12,000 2-D structures/sequences
Estimated size ~35-45,000 compounds
22
Clinical Candidates (2)
23
Drug Discovery (3)
> 1,000,000 compound records
> 5,000,000 bioactivities
~45,000 abstracted papers
~12,000 clinical candidates
~1,300 drugs
Target Discovery
• Target identification
• Microarray profiling
• Target validation
• Assay development
• Biochemistry
• Clinical/Animal disease models
Lead Discovery
• High-throughput Screening (HTS)
• Fragment-based screening
• Focused libraries
• Screening collection
Lead Optimisation
• Medicinal Chemistry
• Structure-based drug design
• Selectivity screens
• ADMET screens
• Cellular/Animal disease models
• Pharmacokinetics
Preclinical Development
• Toxicology
• In vivo safety pharmacology
• Formulation
• Dose prediction
Clinical Trials
• Phase 1: PK, tolerability
• Phase 2: Efficacy
• Phase 3: Safety & Efficacy
• Launch (Phase 4): Indication discovery & expansion
Stages: Discovery, Development, Use
Data: Med. Chem. SAR, Clinical Candidates, Drugs
24
What Is the ChEMBL Data?
25
SAR Data
Compound
Assay
Ki = 4.5 nM
>ThrombinMAHVRGLQLPGCLALAALCSLVHSQHVFLAPQQ ARSLLQ RVRRAN TFLEEV RKGNLE RECVEE TCSYEEAFEALESSTATDVFWAKYTACETARTPRDKL AACLEG NCAEGL GTNYRG HVNITR SGIECQ LWRSRYPHKPEINSTTHPGADLQENFCRNPDSSTTGP WCYTTD PTVRRQ ECSIPV CGQDQV TVAMTP RSEGSSVNLSPPLEQCVPDRGQQYQGRLAVTTHGLPC LAWASA QAKALS KHQDFN SAVQLV ENFCRN PDGDEEGVWCYVAGKPGDFGYCDLNYCEEAVEEETGD GLDEDS DRAIEG RTATSE YQTFFN PRTFGS GEADCGLRPLFEKKSLEDKTERELLESYIDGRIVEGS DAEIGM SPWQVM LFRKSP QELLCG ASLISD RWVLTAAHCLLYPPWDKNFTENDLLVRIGKHSRTRYE RNIEKI SMLEKI YIHPRY NWRENL DRDIAL MKLKKPVAFSDYIHPVCLPDRETAASLLQAGYKGRVT GWGNLK ETWTAN VGKGQP SVLQVV NLPIVE RPVCKDSTRIRITDNMFCAGYKPDEGKRGDACEGDSG GPFVMK SPFNNR WYQMGI VSWGEG CDRDGK YGFYTHVFRLKKWIQKVIDQFGE
ED2 = 230 nM
What Is the ChEMBL Data? (2)
Inhibition of human thrombin
PTT (partial thromboplastin time)
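Activity values such as these are often compared on a negative log10 molar scale (for example pKi), which is one way to obtain log activity data of the kind used in the earlier SAR table. The helper below is a minimal sketch; the function name is an assumption, and treating the ED2 value on the same scale is shown only for illustration.

```python
import math

def p_activity(value_nm):
    """Convert a concentration in nM to its negative log10 molar value
    (e.g. Ki in nM to pKi)."""
    if value_nm <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(value_nm * 1e-9)

ki_p = p_activity(4.5)      # Ki = 4.5 nM from the slide
ed2_p = p_activity(230.0)   # ED2 = 230 nM from the slide

print(f"Ki  4.5 nM -> {ki_p:.2f}")
print(f"ED2 230 nM -> {ed2_p:.2f}")
```

On this scale a larger number means a more potent compound, and a tenfold change in concentration corresponds to one log unit.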
26
Drug Optimisation
Generations: Prototype, 1st generation, 2nd generation, 3rd generation, 4th generation
Azomycin (1956): Streptomyces natural product, trichomonacidal, ‘toxic’
Metronidazole 1962
Clotrimazole 1970
Miconazole 1970
Econazole 1972
Tinidazole 1970
Bifonazole 1981
Sulconazole 1980
Ketoconazole 1978
Itraconazole 1984
Terconazole 1980
Voriconazole 2002
Fluconazole 1988
Fosfluconazole 2004
Posaconazole 2005
Structural classes: imidazole, triazole
[Chemical structure drawings not reproduced]
After W. Sneader
27
A. Gaulton, L. Bellis, J. Chambers, M. Davies, A. Hersey, Y. Light, S. McGlinchey, R. Akhtar, F. Atkinson, A.P. Bento, B. Al-Lazikani, D. Michalovich, & J.P. Overington (2011) ‘ChEMBL: A Large-scale Bioactivity Database For Chemical Biology And Drug Discovery’ Nucl. Acids Res. Database Issue
28
L.J. Bellis, R. Akhtar, B. Al-Lazikani, F. Atkinson, P. Bento, J. Chambers, M. Davies, A. Gaulton, A. Hersey, K. Ikeda, F.A. Kruger, Y. Light, S. McGlinchey, R. Santos, B. Stauch & J.P. Overington (2011) ‘Collation and Data-mining of Literature Bioactivity Data for Drug Discovery’ Biochem. Soc. Trans. 39, 1365-1370
29
30
ChEMBL (2)
31
Compound Searching
32
Chart Views of Data
33
Chart Views of Data (2)
34
Target Class Data
35
Organism Class Data
36
37
38
Drug Target Assessment
Potential Targets
Multiparameter Druggability Scoring
Scoring parameters: F (Feature), S (Structure), L (Ligand), Sel (Selectivity)
Evidence-based Objective Target Ranking
Ranked targets: P003, P010, P210, P002, P007, P196, P083, P012, P051, P199, P023, P037, P060, P058
>P001 TMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHEL
RVISSIEQKTERNEKKQQMGKEYREKIEAELQDICNDKMKGDYFRYLSEVASGDNKQTTVSNSQQAYQEAFEIS
EILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLGEGEN>P002
RIISSIEQKEENKGGEDKLKMIREYRQMVETELKLICYYKMKGDYHRYLAEFATGNDRKEAAENSLVAYKAASD
YYEILNSPDRACRLAKAAFDDAIAELDTLSEESYKDSEQNKEALQDVEDENQ>P003
VISSIEQKTMADGNEKKLEKVKAYREKIEKELETVCNFYLKMKGDYYRYLAEVASGEKKNSVVEASEAAYKEAF
FYYEIQNAPEQACLLAKQAFDDAIAELDTLNEDSYKDAGEGN>P004
VDREQLVQKARLAEQAERYDDMAAAMKNVTELNEPLSVISSIEQKTSADGNEKKIEMVRAYREKIEKELEAVCQ
FYLKMKGDYYRYLAEVATGEKRATVVESSEKAYSEAHFYYEIQNAPEQACHLAKTAFDDAIAELDTLNEDSYKDGGEGNN
>P005 MERASLIQKAKLAEQAERYEDMAAFMKGAVEKGEELS
VLSSIEQKSNEEGSEEKGPEVREYREKVETELQGVCDLKMKGDYYRYLAEVATGDDKKRIIDSARSAYQEAMDIYEIANSPEEAISLAKTTFDEAMADLHTLSEDSYKDST
EAPQEPQS>P006
MEKTELIQKAKLAEQAERYDDMATCMKAVTEQGAELSVISSIEQKTDTSDKKLQLIKDYREKVESELRSICTTVMKGDYFRYLAEVACGDDRKQTIDNSQGAYQEAFDISK
ILNNPELACTLAKTAFDEAIAELDTLNEDSYKDSTLIEGAEN
>P007 MDKNELVQKAKLAEQAERYDDMAACMKSVTEQGAELSVVSSIEQKTEGAEKKQQMAREYREKIETELRDICNDV
MKGDYYRYLAEVAAGDDKKGIVDQSQQAYQEAFEISKILNSPEKACSLAKTAFDEAIAELDTLSEESYKDSTLI
EGGEN
...
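The workflow on this slide, scoring each potential target on several parameters and then ranking, can be sketched as below. This is a hypothetical illustration only: the slide does not say how the Feature, Structure, Ligand and Selectivity scores are combined, so the equal weights, the 0 to 1 score scale and the example values are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class TargetScores:
    target_id: str
    feature: float      # F: sequence feature-based score (assumed 0-1 scale)
    structure: float    # S: structure-based score
    ligand: float       # L: ligand-based score
    selectivity: float  # Sel: selectivity score

# Hypothetical equal weights; the real weighting scheme is not given.
WEIGHTS = {"feature": 0.25, "structure": 0.25, "ligand": 0.25, "selectivity": 0.25}

def combined_score(target, weights=WEIGHTS):
    """Weighted sum of the four parameter scores."""
    return sum(weights[name] * getattr(target, name) for name in weights)

def rank_targets(targets, weights=WEIGHTS):
    """Return targets ordered from highest to lowest combined score."""
    return sorted(targets, key=lambda t: combined_score(t, weights), reverse=True)

# Made-up scores for three of the target identifiers shown on the slide.
targets = [
    TargetScores("P001", 0.2, 0.5, 0.1, 0.4),
    TargetScores("P002", 0.9, 0.6, 0.7, 0.5),
    TargetScores("P003", 0.9, 0.8, 0.9, 0.7),
]

ranked = rank_targets(targets)
for t in ranked:
    print(f"{t.target_id}: {combined_score(t):.3f}")
```

Keeping the weights in a separate mapping makes it easy to test how sensitive the ranking is to the relative importance given to each parameter.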
39
Sequence-Based Target Scoring
[Histogram: number of genes against druggability score, marking the top 5%, top 15% and top 25% of genes]
Example drugs and their targets:
EFLORNITHINE: Ornithine decarboxylase
PYRAZINAMIDE: Fatty Acid Synthase**
SULFADOXINE: Dihydropteroate synthetase
ATOVAQUONE: Cytochrome b
[Chemical structure drawings not reproduced]
40
Structure-Based Scoring
Columns: Target, Structure, Site
HIV-1 reverse transcriptase
HIV-1 proteinase
LCK SH2 domain
• Poorly defined site
• Highly peptidic nature
• Lipophilic polyvalent acid
• Lead optimisation graveyard
• ‘Tantalising’
• Extended site
• Highly peptidic nature
• Likely poor PK
• Compact enclosed site
• Primarily hydrophobic
• Balanced H-bond don/acc
• Inherent flexibility of site
41
Structure-Based Scoring (2)
Druggable / Non-druggable
• Machine Learning approach
LDA (Linear Discriminant Analysis)
PDA (Penalized Linear Discriminant Analysis)
KRIDGE (Kernel Ridge Regression Model)
MLP (Multi Layer Perceptron)
SVM (Support Vector Machine with Gaussian kernels)
CART (Classification and regression trees)
ADABOOST
K-Nearest Neighbour Models
Boosted Versions of CART, MLP, KRIDGE
Heterogeneous ensemble, trained on recall:
        accuracy  precision  recall   F1
train:  0.95283   0.73943    0.93215  0.82231
test:   0.92683   0.60942    0.85261  0.70299
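For reference, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the precision and recall reported on this slide; the function and variable names are assumptions. The recomputed values come out close to, but not identical with, the reported F1 column, and the slide does not say how the metrics were aggregated.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) taken from the slide.
reported = {
    "train": (0.73943, 0.93215, 0.82231),
    "test": (0.60942, 0.85261, 0.70299),
}

recomputed = {split: f1_score(p, r) for split, (p, r, _) in reported.items()}

for split, (p, r, f1_reported) in reported.items():
    print(f"{split}: recomputed F1 {recomputed[split]:.4f}, reported F1 {f1_reported:.4f}")
```

The gap between high recall and lower precision is what would be expected from a classifier tuned towards recall, as the slide states.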
42
Structure-Based Scoring (3)
43
Predicted Druggability (Small Mol)
44
F. Agüero, B. Al-Lazikani, M. Aslett, M. Berriman, F.S. Buckner, R.K. Campbell, S. Carmona, I.M. Carruthers, A.W.E. Chan, F. Chen, G.J. Crowther, C. Hertz-Fowler, A.L. Hopkins, G. McAllister, S. Nwaka, J.P. Overington, A. Pain, G.V. Paolini, U. Pieper, S.A. Ralph, A. Riechers, D.S. Roos, A. Šali, D. Shanmugam, T. Suzuki, W.C. Van Voorhis, & C.L. Verlinde (2008) ‘Genomic-scale Prioritization of Drug Targets: TDRtargets.org’ Nature Rev. Drug. Discov., 7, 900-907. DOI: 10.1038/nrd2684
45
Druggability Scoring – Validation
Molecular target (or resistance mechanism target) is in top 1% scoring genes in feature-based druggability scoring
46
The ChEMBL-og - www.chemblog.org
47