Chemogenomics John Overington


Transcript of Chemogenomics John Overington

Page 1: Chemogenomics John Overington

Chemogenomics John Overington

The screen versions of these slides have full details of copyright and acknowledgements

1

Chemogenomics

Dr. John Overington, EMBL-EBI

[email protected]

2

Drug Discovery

Target Discovery
• Target identification
• Microarray profiling
• Target validation
• Assay development
• Biochemistry
• Clinical/Animal disease models

Lead Discovery
• High-throughput Screening (HTS)
• Fragment-based screening
• Focused libraries
• Screening collection

Lead Optimisation
• Medicinal Chemistry
• Structure-based drug design
• Selectivity screens
• ADMET screens
• Cellular/Animal disease models
• Pharmacokinetics

Preclinical Development
• Toxicology
• In vivo safety pharmacology
• Formulation
• Dose prediction

Clinical Trials
• Phase 1: PK, tolerability
• Phase 2: Efficacy
• Phase 3: Safety & Efficacy
• Launch (Phase 4): Indication discovery & expansion

Med. Chem. SAR → Clinical Candidates → Drugs

Discovery → Development → Use

3

Ligand Binding

• Interaction energy depends on distance from the ligand
  – Two general components to binding energy
      Van der Waals: $E \propto r^{-6}$ (short range)
      Electrostatic: $E \propto r^{-2}$ (longer range)

• Use close contacts for selectivity assessment
  – Analyse binding site amino acid physicochemistry differences

[Plot: % Energy (0 to 100) against atomic separation (0 to 10 Å)]
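To make the short-range versus longer-range contrast concrete, the two power laws quoted on the slide can be tabulated. This is a minimal sketch, not the calculation behind the slide's plot: the function name `relative_energy` and the 2 Å reference separation (taken as 100%) are assumptions made for illustration.

```python
def relative_energy(r, exponent, r_ref=2.0):
    """Energy at separation r (Å) as a percentage of its value at r_ref,
    for a term that scales as r**(-exponent).

    r_ref = 2.0 Å is an illustrative choice, not a value from the slides.
    """
    return 100.0 * (r_ref / r) ** exponent


if __name__ == "__main__":
    for r in (2.0, 4.0, 6.0, 8.0, 10.0):
        vdw = relative_energy(r, 6)    # van der Waals term, E ∝ r^-6
        elec = relative_energy(r, 2)   # electrostatic term as given on the slide, E ∝ r^-2
        print(f"{r:4.1f} Å   van der Waals {vdw:8.4f}%   electrostatic {elec:8.4f}%")
```

Doubling the separation cuts the r^-6 term to 1/64 of its value but the r^-2 term only to 1/4, which is why close contacts dominate when comparing binding sites for selectivity.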

Page 2: Chemogenomics John Overington


4

Similar Ligands Bind Similar Proteins

• QSAR analysis of inhibition of a set of enzymes

• Screening data
  – 5 enzymes
  – 18 compounds
  – IC50 values compared
      Carefully selected substrates and assays
      Catalytic domain screening construct

5

Structure Activity Relationship (SAR) Table

Compound  Enzyme-1  Enzyme-2  Enzyme-3  Enzyme-4  Enzyme-5
    1       1.83     -0.21      0.86     -0.62      0.40
    2       3.97      3.46      1.21      3.36      3.66
    3       4.67      3.53      0.75      2.92      3.28
    4       2.71      2.98      1.04      3.82      4.39
    5       2.57      2.88      0.64      4.15      4.36
    6       3.36      2.89      1.02      3.58      3.94
    7       3.26      2.84      0.66      3.49      4.00
    8       3.99      2.55      0.08      3.77      3.69
    9       3.69      2.44      0.00      2.56      1.96
   10       3.91      2.95      0.17      3.89      3.25
   11       4.73      4.45      1.54      4.46      4.65
   12       3.77      2.87      0.32      2.75      2.97
   13       3.85      2.64      0.26      3.50      3.91
   14       3.99      2.65      0.30      2.72      2.55
   15       4.54      4.71      1.74      4.81      4.68
   16       3.72      2.88      0.48      3.30      4.70
   17       4.15      2.69      0.07      3.59      4.65
   18       4.16      4.12      2.17      4.16      4.75

6

Correlation of Binding

• Correlation coefficient matrices
  – Cluster analysis - multidimensional scaling
  – N x M activity matrix - N proteins and M compounds
      N x N: ‘similarity’ of proteins
      M x M: ‘similarity’ of inhibitors
  – How similar are inhibition profiles
  – Insights into selectivity
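The two similarity matrices can be sketched with NumPy using the values from the SAR table slide. This assumes a plain Pearson correlation; the slides do not state exactly how the data were standardised, so the numbers may not match the published matrix in every entry. The variable names are illustrative.

```python
import numpy as np

# Rows: compounds 1-18; columns: enzyme-1 .. enzyme-5 (values from the SAR table slide).
activity = np.array([
    [1.83, -0.21, 0.86, -0.62, 0.40],
    [3.97, 3.46, 1.21, 3.36, 3.66],
    [4.67, 3.53, 0.75, 2.92, 3.28],
    [2.71, 2.98, 1.04, 3.82, 4.39],
    [2.57, 2.88, 0.64, 4.15, 4.36],
    [3.36, 2.89, 1.02, 3.58, 3.94],
    [3.26, 2.84, 0.66, 3.49, 4.00],
    [3.99, 2.55, 0.08, 3.77, 3.69],
    [3.69, 2.44, 0.00, 2.56, 1.96],
    [3.91, 2.95, 0.17, 3.89, 3.25],
    [4.73, 4.45, 1.54, 4.46, 4.65],
    [3.77, 2.87, 0.32, 2.75, 2.97],
    [3.85, 2.64, 0.26, 3.50, 3.91],
    [3.99, 2.65, 0.30, 2.72, 2.55],
    [4.54, 4.71, 1.74, 4.81, 4.68],
    [3.72, 2.88, 0.48, 3.30, 4.70],
    [4.15, 2.69, 0.07, 3.59, 4.65],
    [4.16, 4.12, 2.17, 4.16, 4.75],
])

# N x N: correlate columns, i.e. how similarly two enzymes respond across compounds.
protein_similarity = np.corrcoef(activity, rowvar=False)

# M x M: correlate rows, i.e. how similarly two inhibitors behave across enzymes.
compound_similarity = np.corrcoef(activity, rowvar=True)

if __name__ == "__main__":
    np.set_printoptions(precision=3, suppress=True)
    print(protein_similarity)
```

With this convention the enzyme-1/enzyme-3 entry comes out near 0.15, the weakest relationship among the five enzymes.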

Page 3: Chemogenomics John Overington


7

Correlation of Binding (2)

           enzyme-1  enzyme-2  enzyme-3  enzyme-4  enzyme-5
enzyme-1     1.000     0.795     0.147     0.572     0.447
enzyme-2               1.000     0.523     0.877     0.771
enzyme-3                         1.000     0.285     0.365
enzyme-4                                   1.000     0.894
enzyme-5                                             1.000

Correlation matrix (standardised log activity data)

Most similar pair, enzyme-5 and enzyme-4

Least similar pair, enzyme-3 and enzyme-1

Perfectly correlated SAR = 1.00, uncorrelated = 0.00, anti-correlated = -1.00

8

Protein Bioactivity Clustering

[Plot: inhibitor SAR-based clustering of Enzyme-1 to Enzyme-5; 1st axis (-0.4 to 0.6) against 2nd axis (-0.3 to 0.3)]
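A two-axis layout of this kind can be approximated from the correlation matrix by classical multidimensional scaling. The sketch below is an assumption about the general technique, not the slide's actual procedure: the conversion of correlation to distance as 1 - r and the function name `classical_mds` are illustrative choices, so the coordinates need not match the plotted ones.

```python
import numpy as np

labels = ["enzyme-1", "enzyme-2", "enzyme-3", "enzyme-4", "enzyme-5"]

# Symmetric form of the correlation matrix shown on the "Correlation of Binding (2)" slide.
corr = np.array([
    [1.000, 0.795, 0.147, 0.572, 0.447],
    [0.795, 1.000, 0.523, 0.877, 0.771],
    [0.147, 0.523, 1.000, 0.285, 0.365],
    [0.572, 0.877, 0.285, 1.000, 0.894],
    [0.447, 0.771, 0.365, 0.894, 1.000],
])


def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS: embed points so that Euclidean distances
    approximate the entries of the distance matrix `dist`."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering   # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(gram)              # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]     # keep the largest ones
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))  # guard against small negative values
    return eigvecs[:, order] * scale


# Assumed dissimilarity: 1 - correlation (0 for identical profiles).
coords = classical_mds(1.0 - corr)

if __name__ == "__main__":
    for name, (x, y) in zip(labels, coords):
        print(f"{name}: 1st axis {x:+.3f}, 2nd axis {y:+.3f}")
```

The sign of each axis is arbitrary in this method, so a mirrored plot carries the same information.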

9

Binding Site

• Built representative inhibitor complexed to library of enzyme models

• Extracted all residues within 9 Å of ligand
  – Binding site contains 46 residues
      19 invariant – across all sequences
      27 invariant – across set of (enzyme-1, enzyme-2, enzyme-3, enzyme-4, enzyme-5)

• Clustered sequences based on properties of active site residues
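The residue-extraction step can be sketched as a simple distance cutoff. This is a minimal illustration under stated assumptions: coordinates are plain NumPy arrays in Å, a residue counts as part of the site if any of its atoms lies within the cutoff of any ligand atom, and the function name and toy coordinates are hypothetical rather than taken from the models described on the slide.

```python
import numpy as np


def residues_near_ligand(ligand_xyz, atom_xyz, atom_residue_ids, cutoff=9.0):
    """Return sorted residue ids that have at least one atom within `cutoff` Å
    of any ligand atom.

    ligand_xyz:       (L, 3) array of ligand atom coordinates
    atom_xyz:         (A, 3) array of protein atom coordinates
    atom_residue_ids: length-A sequence giving the residue id of each protein atom
    """
    # Pairwise distances between every protein atom and every ligand atom.
    diff = atom_xyz[:, None, :] - ligand_xyz[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    close = dist.min(axis=1) <= cutoff
    return sorted({atom_residue_ids[i] for i in np.flatnonzero(close)})


# Toy data (hypothetical): a two-atom ligand and four protein atoms.
ligand = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
protein_atoms = np.array([
    [3.0, 0.0, 0.0],    # residue 10, close to the ligand
    [0.0, 8.5, 0.0],    # residue 11, just inside 9 Å
    [0.0, 0.0, 20.0],   # residue 12, too far away
    [30.0, 0.0, 0.0],   # residue 10 again, a distant atom
])
residue_ids = [10, 11, 12, 10]

site = residues_near_ligand(ligand, protein_atoms, residue_ids)
print(site)
```

Applying the same selection to every model in an aligned family gives a comparable set of site positions, which can then be checked for invariance or clustered on residue properties.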

Page 4: Chemogenomics John Overington


10

Binding Site Clustering (2)

[Plot: active site sequence-based clustering of Enzyme-1 to Enzyme-5; Axis 1 (-10 to 20) against Axis 2 (-10 to 15)]

11

Similar Proteins Bind Similar Ligands

• SAR profiles mirror active site differences
  – Can anticipate activity profile of newly identified enzyme sequences
  – Can anticipate likely selectivity issues in vivo
      Provide focus for biology experiments
      Hypothesis-driven selectivity screens

12

Binding Features Mapped to Structure

• General tools to map binding features on to 3-D protein models
  – Andrews binding energy
      P.R. Andrews et al., J. Med. Chem., 27, pp. 1648-1657 (1984)
      Fragment-based contributions to ligand binding
  – Eisenberg and McLachlan solvation parameters
      D. Eisenberg and A.D. McLachlan, Nature, 319, pp. 199-203 (1986)
      Captures important empirical components of free energy of solvation/hydrophobic effect
  – AAindex database
      Kawashima et al., Nuc. Acids Res., 27, pp. 368-369 (1999)
      402 residue-based indices
  – Wold QSAR parameters
      Sandberg et al., J. Med. Chem., 41, pp. 2481-2491 (1998)
      Factor analysis of physicochemical amino acid features

Page 5: Chemogenomics John Overington


13

Structure Mapping of Andrews Energy

Mapping of Andrews binding energy to the ligand site residues of the rhodopsin structure.

Note the high predicted binding energy for the N atom of the retinal-binding lysine.

14

Drug Discovery (2)

[Pipeline diagram repeated from slide 2: Target Discovery, Lead Discovery, Lead Optimisation, Preclinical Development, Clinical Trials]

15

ChEMBL


Page 6: Chemogenomics John Overington


16

J.P. Overington, B. Al-Lazikani & A.L. Hopkins (2006) ‘How many drug targets are there?’ Nat. Rev. Drug Disc., 5, 993-996

17

Targets of Launched Drugs

Nat. Rev. Drug Disc., 5, pp. 993-996 (2006)

18

Drug Approvals

Page 7: Chemogenomics John Overington


19

NFκB Pathway


20

FDA Approved Drugs


21

Clinical Candidates

• Database of clinical development candidates
  – Contains ~12,000 2-D structures/sequences
      Estimated size ~35-45,000 compounds

Page 8: Chemogenomics John Overington


22

Clinical Candidates (2)


23

Drug Discovery (3)

• >1,000,000 compound records
• >5,000,000 bioactivities
• ~45,000 abstracted papers
• ~12,000 clinical candidates
• ~1,300 drugs

[Pipeline diagram repeated from slide 2 (Target Discovery, Lead Discovery, Lead Optimisation, Preclinical Development, Clinical Trials), with the stages Med. Chem. SAR, Clinical Candidates and Drugs]

24

What Is the ChEMBL Data?

Page 9: Chemogenomics John Overington


25

SAR Data

Compound

Assay

Ki = 4.5 nM

>Thrombin
MAHVRGLQLPGCLALAALCSLVHSQHVFLAPQQARSLLQRVRRANTFLEEVRKGNLERECVEE
TCSYEEAFEALESSTATDVFWAKYTACETARTPRDKLAACLEGNCAEGLGTNYRGHVNITRSGIECQ
LWRSRYPHKPEINSTTHPGADLQENFCRNPDSSTTGPWCYTTDPTVRRQECSIPVCGQDQVTVAMTP
RSEGSSVNLSPPLEQCVPDRGQQYQGRLAVTTHGLPCLAWASAQAKALSKHQDFNSAVQLVENFCRN
PDGDEEGVWCYVAGKPGDFGYCDLNYCEEAVEEETGDGLDEDSDRAIEGRTATSEYQTFFNPRTFGS
GEADCGLRPLFEKKSLEDKTERELLESYIDGRIVEGSDAEIGMSPWQVMLFRKSPQELLCGASLISD
RWVLTAAHCLLYPPWDKNFTENDLLVRIGKHSRTRYERNIEKISMLEKIYIHPRYNWRENLDRDIAL
MKLKKPVAFSDYIHPVCLPDRETAASLLQAGYKGRVTGWGNLKETWTANVGKGQPSVLQVVNLPIVE
RPVCKDSTRIRITDNMFCAGYKPDEGKRGDACEGDSGGPFVMKSPFNNRWYQMGIVSWGEGCDRDGK
YGFYTHVFRLKKWIQKVIDQFGE

ED2 = 230 nM

What Is the ChEMBL Data? (2)

Inhibition of human Thrombin

PTT (partial thromboplastin time)

26

Drug Optimisation

Generations: Prototype, 1st generation, 2nd generation, 3rd generation, 4th generation

Azomycin (1956): Streptomyces natural product, trichomonacidal, ‘toxic’

Compounds shown, with year (chemical structures omitted):
• Metronidazole 1962
• Clotrimazole 1970
• Miconazole 1970
• Econazole 1972
• Tinidazole 1970
• Bifonazole 1981
• Sulconazole 1980
• Ketoconazole 1978
• Itraconazole 1984
• Terconazole 1980
• Voriconazole 2002
• Fluconazole 1988
• Fosfluconazole 2004
• Posaconazole 2005

Classes: Imidazole, triazole

After W. Sneader

27

A. Gaulton, L. Bellis, J. Chambers, M. Davies, A. Hersey, Y. Light, S. McGlinchey, R. Akhtar, F. Atkinson, A.P. Bento, B. Al-Lazikani, D. Michalovich, & J.P. Overington (2011) ‘ChEMBL: A Large-scale Bioactivity Database For Chemical Biology And Drug Discovery’ Nucl. Acids Res. Database Issue

Page 10: Chemogenomics John Overington


28

L.J. Bellis, R. Akhtar, B. Al-Lazikani, F. Atkinson, P. Bento, J. Chambers, M. Davies, A. Gaulton, A. Hersey, K. Ikeda, F.A. Kruger, Y. Light, S. McGlinchey, R. Santos, B. Stauch & J.P. Overington (2011) ‘Collation and Data-mining of Literature Bioactivity Data for Drug Discovery’ Biochem. Soc. Trans. 39, 1365-1370

29

30

ChEMBL (2)

Page 11: Chemogenomics John Overington


31

Compound Searching


32

Chart Views of Data


33

Chart Views of Data (2)


Page 12: Chemogenomics John Overington


34

Target Class Data


35

Organism Class Data

36

Page 13: Chemogenomics John Overington


37

38

Drug Target Assessment

Potential Targets

Multiparameter Druggability Scoring

Evidence-based Objective Target Ranking

Scoring columns: F (Feature), S (Structure), L (Ligand), Sel (Selectivity)

Example ranked targets: P003, P010, P210, P002, P007, P196, P083, P012, P051, P199, P023, P037, P060, P058

[Figure: FASTA-format protein sequences of the potential targets (P001, P002, P003, P004, P005, P006, P007, ...)]

39

Sequence-Based Target Scoring

[Histogram: number of genes against druggability score, with the top 5%, top 15% and top 25% of genes marked]

Example drugs and their targets (chemical structures omitted):
• EFLORNITHINE: Ornithine decarboxylase
• PYRAZINAMIDE: Fatty Acid Synthase**
• SULFADOXINE: Dihydropteroate synthetase
• ATOVAQUONE: Cytochrome b

Page 14: Chemogenomics John Overington


40

Structure-Based Scoring

Columns: Target, Structure, Site

HIV-1 reverse transcriptase
• Compact enclosed site
• Primarily hydrophobic
• Balanced H-bond don/acc
• Inherent flexibility of site

HIV-1 proteinase
• Extended site
• Highly peptidic nature
• Likely poor PK

LCK SH2 domain
• Poorly defined site
• Highly peptidic nature
• Lipophilic polyvalent acid
• Lead optimisation graveyard
• ‘Tantalising’

41

Structure-Based Scoring (2)

Classes: Druggable, Non-druggable

• Machine Learning approach
    LDA (Linear Discriminant Analysis)
    PDA (Penalized Linear Discriminant Analysis)
    KRIDGE (Kernel Ridge Regression Model)
    MLP (Multi Layer Perceptron)
    SVM (Support Vector Machine with Gaussian kernels)
    CART (Classification and regression trees)
    ADABOOST
    K-Nearest Neighbour Models
    Boosted versions of CART, MLP, KRIDGE

Heterogeneous ensemble, trained on recall:

          accuracy   precision   recall     F1
train:    0.95283    0.73943     0.93215    0.82231
test:     0.92683    0.60942     0.85261    0.70299
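For readers less familiar with the four reported measures, the sketch below shows how they follow from a confusion matrix. The counts are hypothetical and chosen only for illustration; they are not the data behind the slide, and the function name is an assumption.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 for a binary classifier, where the
    positive class is taken to be 'druggable'."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)   # fraction of predicted druggable sites that truly are
    recall = tp / (tp + fn)      # fraction of truly druggable sites that are found
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 100 druggable and 900 non-druggable sites.
metrics = classification_metrics(tp=85, fp=55, tn=845, fn=15)

if __name__ == "__main__":
    for name, value in metrics.items():
        print(f"{name}: {value:.5f}")
```

With an imbalanced set like this one, accuracy can stay high while precision is much lower, which is one reason to report all four numbers rather than accuracy alone.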

42

Structure-Based Scoring (3)

Page 15: Chemogenomics John Overington


43

Predicted Druggability (Small Mol)


44

F. Agüero, B. Al-Lazikani, M. Aslett, M. Berriman, F.S. Buckner, R.K. Campbell, S. Carmona, I.M. Carruthers, A.W.E. Chan, F. Chen, G.J. Crowther, C. Hertz-Fowler, A.L. Hopkins, G. McAllister, S. Nwaka, J.P. Overington, A. Pain, G.V. Paolini, U. Pieper, S.A. Ralph, A. Riechers, D.S. Roos, A. Šali, D. Shanmugam, T. Suzuki, W.C. Van Voorhis, & C.L. Verlinde (2008) ‘Genomic-scale Prioritization of Drug Targets: TDRtargets.org’ Nature Rev. Drug. Discov., 7, 900-7. DOI: 10.1038/nrd2684

45

Druggability Scoring – Validation

The molecular target (or resistance mechanism target) is among the top 1% of scoring genes in feature-based druggability scoring

Page 16: Chemogenomics John Overington


46

The ChEMBL-og - www.chemblog.org

47