Surveying ligand- and target-based similarities within the Kinome
Stephan Schürer & Steven Muskal
Kinase Targets of Clinical Interest from Vieth et al. Drug Disc. Today 10, 839 (2005).
Eidogen-Sertanty KKB SAR Data Point Distribution
Primary targets w/ reported clinical data
Reported secondary targets & targets w/ >60% ID
Kinase SAR Knowledgebase – Hot Targets
>362,000 SAR data points curated from >4,270 journal articles and patents
>130 Bayesian QSAR Models
About Eidogen-Sertanty
• Knowledge-Driven Discovery Solutions Provider
• Formed in March 2005 when Sertanty (Libraria Sertanty 2003) acquired Eidogen (Bionomix 2000)
• >$20M Invested in Technology Development
• 12 FTEs
• Worldwide Customer Base
• Cash-Positive
• DirectDesign™ Discovery Collaborations
• In Silico Target Screening (“Target Fishing” and Repurposing)
• Target and compound prioritization services
• Fast Follower Design: Novel, Patentable Leads
• Chemogenomic Databases & Analysis Software
• TIP™ - Structural Informatics Platform
• KKB™ - Kinase SAR and Chemistry Knowledgebase
• CHIP™ - Chemical Intelligence Platform
> 400K Sequences
> 158K Chains & Models
> 388K Sites
> 33M Sequence Similarities
> 69M Structure Similarities
> 62M Site Similarities
TIP Algorithm Engine
STRUCTFAST™
Basic Principle: Gaps known to exist should not be strongly penalized.
Known Gap
Structure Alignment of Homologous Crystal Structures
STructure Realization Utilizing Cogent Tips From Aligned Structural Templates
Leverages experimental structure and structural alignment data to create better alignments
Known Gap
1) Convergent Island Statistics: A fast method for determining local alignment score significance. Bioinformatics, 2005, 21, 2827-2831
2) STRUCTFAST: Protein Sequence Remote Homology Detection and Alignment Using Novel Dynamic Programming and Profile-Profile Scoring. Proteins, 2006, 64, 960-967
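The basic principle on this slide (gaps known to exist should not be strongly penalized) can be illustrated with a minimal sketch. This is not the STRUCTFAST algorithm itself: it is a plain Needleman-Wunsch global alignment in which the gap cost is reduced at positions where a gap is already known from structural alignments. The function name, the `soft_gap_positions` parameter, and all scoring values are assumptions chosen for illustration.

```python
def global_alignment_score(seq_a, seq_b, soft_gap_positions=frozenset(),
                           match=2, mismatch=-1, gap=-4, soft_gap=-1):
    """Needleman-Wunsch global alignment score with linear gap costs.

    Leaving residue i of seq_a unaligned costs `soft_gap` when i is in
    `soft_gap_positions` (a gap known from structural alignments of
    homologous templates), and the full `gap` penalty otherwise.
    All scoring values are illustrative assumptions.
    """
    n, m = len(seq_a), len(seq_b)

    def skip_a(i):
        # Cost of aligning residue i of seq_a against a gap in seq_b.
        return soft_gap if i in soft_gap_positions else gap

    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + skip_a(i - 1)
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,       # align the two residues
                score[i - 1][j] + skip_a(i - 1),  # residue of seq_a vs. gap
                score[i][j - 1] + gap,            # residue of seq_b vs. gap
            )
    return score[n][m]


# Toy sequences: "DEF" is an insertion in seq_a relative to seq_b.
uniform = global_alignment_score("ACDEFGH", "ACGH")
informed = global_alignment_score("ACDEFGH", "ACGH", soft_gap_positions={2, 3, 4})
```

With the known insertion marked as a soft-gap region, the same alignment path scores higher, so a remote homolog with a structurally documented insertion is not rejected on gap cost alone.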
SiteSeeker™
Geometric Site-Finding Algorithms Find Many Pockets
But they don’t know which pockets are important!
Evolutionary Trace Approach
Can’t clearly define site boundary
Not all conserved residues are functionally relevant
SiteSeeker combines both methods
Reliability & Confidence
We use proteins with apo- & co-crystal structures in the PDB to test the accuracy & reliability of the method
Allows us to map SiteSeeker score to predicted confidence! (e.g. at this SiteSeeker score, 80% are “real” co-crystal sites)
Sites with <60% confidence are not stored in TIP
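The score-to-confidence mapping described for SiteSeeker can be sketched as a simple binned calibration. The sketch below assumes a benchmark of (score, is_real) pairs and a set of score bin edges. The benchmark values, bin edges, and function names are all hypothetical; only the 60% cutoff comes from the slide.

```python
from bisect import bisect_right


def calibrate(benchmark, bin_edges):
    """Fraction of benchmark sites that are real co-crystal sites, per score bin.

    benchmark: iterable of (score, is_real) pairs (hypothetical data).
    bin_edges: sorted score thresholds; bin k holds scores in
               [bin_edges[k-1], bin_edges[k]).
    """
    totals = [0] * (len(bin_edges) + 1)
    real = [0] * (len(bin_edges) + 1)
    for score, is_real in benchmark:
        k = bisect_right(bin_edges, score)
        totals[k] += 1
        real[k] += int(is_real)
    return [r / t if t else None for r, t in zip(real, totals)]


def keep_site(score, bin_edges, table, min_confidence=0.6):
    """Keep a predicted site only if its bin's confidence reaches the cutoff."""
    confidence = table[bisect_right(bin_edges, score)]
    return confidence is not None and confidence >= min_confidence


# Made-up benchmark: predicted-site scores and whether a ligand really binds there.
benchmark = [(0.1, False), (0.2, False), (0.3, True), (0.6, True),
             (0.7, False), (0.8, True), (0.9, True), (0.95, True)]
edges = [0.5]
table = calibrate(benchmark, edges)
```

In this toy data, four of the five sites scoring at or above 0.5 are real, so a new site in that bin would be stored, while a site in the lower bin would be discarded.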
SiteSorter™
Weighted Clique Detection Algorithm
Importance of Points Related To Conservation In Multiple Sequence Alignment
Surface Atoms Assigned One of 5 Different Chemical Characters
Matching points increase the SiteSorter similarity score
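A weighted clique comparison of two sites can be sketched as follows. This is a generic illustration, not the SiteSorter implementation: site points are assumed to be (chemical type, coordinates, weight) tuples, two point pairings are treated as compatible when they preserve the inter-point distance within a tolerance, and the score is the weight of the heaviest clique of mutually compatible pairings. The type labels, the tolerance, and the use of the smaller of the two weights per pairing are assumptions.

```python
from math import dist


def site_similarity(site_a, site_b, tol=1.0):
    """Weight of the heaviest set of mutually compatible point pairings.

    Each site is a list of (chem_type, (x, y, z), weight) tuples, where the
    weight stands in for conservation-derived importance (an assumption).
    """
    # Nodes of the correspondence graph: pairings of same-type points.
    nodes = [(i, j) for i, pa in enumerate(site_a)
             for j, pb in enumerate(site_b) if pa[0] == pb[0]]
    weight = {n: min(site_a[n[0]][2], site_b[n[1]][2]) for n in nodes}

    def compatible(u, v):
        (i, j), (k, l) = u, v
        if i == k or j == l:
            return False  # a point may be used in only one pairing
        d_a = dist(site_a[i][1], site_a[k][1])
        d_b = dist(site_b[j][1], site_b[l][1])
        return abs(d_a - d_b) <= tol

    adj = {n: {m for m in nodes if m != n and compatible(n, m)} for n in nodes}
    best = 0.0

    def extend(current, candidates):
        nonlocal best
        best = max(best, current)
        # Prune: even taking every remaining candidate cannot beat the best.
        if current + sum(weight[c] for c in candidates) <= best:
            return
        ordered = sorted(candidates)
        for idx, c in enumerate(ordered):
            extend(current + weight[c], set(ordered[idx + 1:]) & adj[c])

    extend(0.0, set(nodes))
    return best


site_a = [("donor", (0, 0, 0), 1.0), ("acceptor", (3, 0, 0), 0.5),
          ("hydrophobic", (0, 4, 0), 0.25)]
# Same geometry, translated by 10 along x.
site_b = [("donor", (10, 0, 0), 1.0), ("acceptor", (13, 0, 0), 0.5),
          ("hydrophobic", (10, 4, 0), 0.25)]
# Hydrophobic point displaced, so it no longer fits the other two pairings.
site_c = [("donor", (10, 0, 0), 1.0), ("acceptor", (13, 0, 0), 0.5),
          ("hydrophobic", (10, 9, 0), 0.25)]
```

The exhaustive search is only practical for small point sets; it is meant to show how matching points add to the score, not how to do this at database scale.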
TIP Content
>75,000 Human Sequences
>116,000 Total PDB chains (~50K PDBs)
>42,000 Homology Models
>194,000 PDB co-crystal sites
>190,000 Predicted Sites (on PDBs & Models)
>33M Sequence Similarities
>69M Structural Similarities
>62M Site Similarities
Automatically updated with new models as the PDB grows
Updated monthly with new PDBs and models:
e.g. March 2006: 661 new PDBs added; 447 new models built
- 153 had no previous structure in TIP
- 294 had “better” models built
e.g. July 2008: 576 new PDBs added; 1045 new models built
Kinase Knowledgebase (KKB)
Kinase inhibitor structures and SAR data mined from > 4278 journal articles/patents
KKB Content Summary (Q2-2008):
# of kinase targets: >390
# of SAR Data points: > 362,000
# of unique kinase molecules with SAR data: >120,000
# of annotated assay protocols: >16,000
# of annotated chemical reactions: >2,300
# of unique kinase inhibitors: >465,000 (~340K enumerated from patent chemistries)
KKB Growth Rate:
• Average 15-20K SAR data points added per quarter
• Average 20-30K unique structures added per quarter
Kinase Knowledgebase (KKB)
Kinase inhibitor structures and SAR data mined from > 4100 journal articles/patents
KKB Content Summary (Q1-2008):
# of kinase targets: >300
# of SAR Data points: > 345,000
# of unique kinase molecules with SAR data: >118,000
# of annotated assay protocols: >15,350
# of annotated chemical reactions: >2,300
# of unique kinase inhibitors: >463,000 (~340K enumerated from patent chemistries)
KKB Growth Rate:
• Average 15-20K SAR data points added per quarter
• Average 20-30K unique structures added per quarter
Kinase Validation Set
Three sizable datasets freely available to the research community
http://www.eidogen-sertanty.com/kinasednld.php
LIMK1 – ATP binding site comparison LIMK1
AURKA SRC 62% ID in ATP Site
>3000 inhibitors in KKB
58% ID in ATP Site
>5100 inhibitors in KKB
Hbond donors Hbond acceptors Hbond donor/acceptors
LCK 58% ID in ATP Site
>8200 inhibitors in KKB
Conserved with LIMK1 Not conserved with LIMK1
The ATP site of LIMK1 shares a high level of homology with several well-studied kinases
Kinome by Sequence
Kinase domain sequence similarities - MST
[Figure: minimum spanning tree of kinase domain sequence similarities; group labels: CAMK, AGC, AGC second domain, TK, others, CK1, CMGC, STE, RGC, TKL]
Kinome by SAR
Relating kinase targets by SAR
• Relationships derived from Bayesian categorization models
• Adapted from Schuffenhauer Org Biomol Chem 2004 3256
• Bayesian categorization models built within PipelinePilot:
• Kinase enzyme assay data, activity cutoff pIC50 > 6.5; all other compounds “negative”
• Functional group connectivity fingerprints length 4
• ROC > 0.7
• Bayesian feature weights (~10,000 features) extracted for each model
• Correlation matrix determined between Bayesian vectors
• Visualization via minimum spanning trees (Kruskal algorithm)
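The last three steps (feature-weight vectors, correlation matrix, minimum spanning tree) can be sketched in plain Python. This is a minimal illustration under stated assumptions: the weight vectors below are made-up four-feature toys rather than the ~10,000-feature Bayesian vectors, the kinase names K1 to K4 are placeholders, and using 1 minus the Pearson correlation as the edge length for Kruskal's algorithm is an assumption about how similarity is turned into a distance.

```python
from itertools import combinations
from math import sqrt


def pearson(u, v):
    """Pearson correlation between two equal-length weight vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v)


def kruskal_mst(names, similarity):
    """Spanning tree linking the most similar models (Kruskal's algorithm).

    similarity maps each pair from combinations(names, 2) to a correlation;
    edges are processed in order of increasing distance 1 - similarity.
    """
    parent = {name: name for name in names}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for a, b in sorted(combinations(names, 2), key=lambda e: 1 - similarity[e]):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # edge joins two separate components
            parent[root_a] = root_b
            tree.append((a, b))
    return tree


# Hypothetical Bayesian feature-weight vectors for four kinase models.
weights = {
    "K1": [1, 2, 3, 4],
    "K2": [2, 4, 6, 8.5],
    "K3": [4, 3, 2, 1],
    "K4": [4, 3, 2, 0],
}
names = sorted(weights)
similarity = {(a, b): pearson(weights[a], weights[b])
              for a, b in combinations(names, 2)}
tree = kruskal_mst(names, similarity)
```

Models whose feature weights rise and fall together end up adjacent in the tree, which is what lets the tree be read as a map of SAR relationships between targets.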
Kinase SAR Bayesian models
[Figure: ROC score (y-axis, ticks 0.92 to 1) vs. KKB Num DP (x-axis, ticks 12 to 6000), one labeled point per kinase model; legend: CMGC, CAMK, AGC, STE, TK, TKL, others]
130 Kinase Enzyme Models; FCFP_4 fingerprints; scaled by number of actives
Kinase target relationships by SAR – MST
[Figure: minimum spanning tree of kinase SAR models; cluster labels: TK, others, PKCs, CMGC, AGC, MAPKs, PI3Ks, CDKs]
130 kinase models; MST – all “similarities” > 0.27
SAR-based similarity vs. Sequence identity
[Figure: scatter plot of SAR feature similarity (y-axis) vs. sequence identity (x-axis); highlighted pairs: CDC2A / CHUK and FGFR2 / FGFR3]
CDC2A and CHUK: > 90 ligands with activity against both targets
FGFR2 / FGFR3: no similar ligands
[Figure: histogram of FCFP4 Tanimoto similarity (all pairs); x-axis: similarity bin (0.02 to 0.32), y-axis: count; legend: both active, one active, neither active; panel labels: top active FGFR2, top active FGFR3]
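The all-pairs Tanimoto comparison behind this histogram can be sketched as follows, assuming each ligand fingerprint is available as a set of feature identifiers. The feature IDs and ligand sets below are made up, and the function names are hypothetical; real FCFP4 fingerprints would come from a cheminformatics toolkit.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as feature-ID sets."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def all_pairs_tanimoto(ligands_a, ligands_b):
    """Similarity of every ligand of target A against every ligand of target B."""
    return [tanimoto(x, y) for x in ligands_a for y in ligands_b]


# Hypothetical feature-ID sets standing in for FCFP4 fingerprints.
target_a_ligands = [{1, 2, 3}, {1, 2, 3, 4}]
target_b_ligands = [{2, 3, 4}, {7, 8, 9}]
similarities = all_pairs_tanimoto(target_a_ligands, target_b_ligands)
```

If every cross-target pair scores low, as reported here for FGFR2 and FGFR3, the two ligand sets share little chemistry even when the targets are close in sequence.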
Kinome by structure binding site similarities
Relating kinases by ATP binding-site similarity
• Human Kinase domain sequences extracted (Sugen, Swissprot, PFAM)
• Human Kinome (500 sequences) modeled using STRUCTFAST
• Multiple models per sequence (subset of 263 presented here)
• Binding sites for all models computed (SiteSeeker)
• Binding site similarity scores computed (SiteSorter)
• Similarity scores normalized: AB_Norm := AB / (AA + BB - AB)
• AB – Site Similarity between sites A & B
• AA / BB – “Self Site” Similarity Scores
• Analysis and visualization with MST
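The normalization AB_Norm := AB / (AA + BB - AB) has the same form as a Tanimoto coefficient, with the self-site scores playing the role of set sizes. A minimal sketch, with a hypothetical function name and made-up raw scores:

```python
def normalized_site_similarity(ab, aa, bb):
    """AB_Norm = AB / (AA + BB - AB).

    ab: raw similarity score between sites A and B
    aa, bb: "self site" similarity scores of A and B
    """
    denominator = aa + bb - ab
    if denominator <= 0:
        raise ValueError("self-site scores must exceed the cross-site score")
    return ab / denominator


# Made-up raw scores: identical sites normalize to 1, partial overlap to less.
identical = normalized_site_similarity(50, 50, 50)
partial = normalized_site_similarity(30, 50, 40)
```

Normalizing this way removes the dependence on site size, so a large site does not appear similar to everything simply because it has more points to match.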
Kinase Site Similarity Relationships – MST
[Figure: minimum spanning tree of kinase binding-site similarities; group labels: TK, others, CAMK, STE, TKL, AGC, CK1, CMGC]
263 kinases; MST – all “similarities” > 0.6
Sequence vs. Site Similarity
[Figure: scatter plot of site similarity (y-axis) vs. sequence identity (x-axis); legend: same template, different template; highlighted pairs: MAP3K8 / NTRK1 and MAST4 / PIK3R4]
Similar sites – different sequences
• STE_STE11_MAP3K8: template 1u5rA
• TK_Trk_TRKA (NTRK1): template 1ir3A
MAP3K8 NTRK1
LGKGAY.V.A.K.V.E.V.MEFV.GGS.S.D.NN.M.D
LGEGAF V A K – E V FE-M –GD – D –N L D
Similar sites – different site AA composition
• AGC_MAST_MAST4: template 1z5mA
• Other_VPS15_PIK3R4: template 1z5mA
• Site sequence similarity: 0.2
• Normalized (physicochemical) site similarity: 0.78
PIK3R4 MAST4
.K.ISNG.GAV.A.K.V.MEYVEGGD.T.K.DN.L.TD
.K.LGST.FKV.K.F.P.FRQYVRDN.D.S.EN.M.TD
MAST4 PIK3R4
What did we learn?
Expected global trend:
Similar sequence results in physicochemical- and fold-similar binding sites
Dissimilar sequences do not always result in different binding sites
Binding site similarities group in “patches” by domain sequence similarity
Subtle differences in site relationships among groups and sub-types
Modeling templates influence results:
For many kinases no experimental structures exist, but they can be modeled
Growing body of structural information will optimize the picture
Body of selective Kinase compounds continues to grow
In principle, small molecules can be optimized to differentiate between very similar (sequence) kinases
Conclusions and Next steps
Quantifying similarity relationships within the Kinome can provide insight in early Kinase drug development
Similarity within the Kinome should consider SAR-based and structure-based binding site similarity (vs. domain sequence-based similarity)
Next steps include
Analyze trends with respect to DFG-In/DFG-out
Quantify template effects
Investigate effects of site size and predicted vs. templated sites
Acknowledgements
• Stephan Schürer
• Kevin Hambly
• Joe Danzer
• Brian Palmer
• Derek Debe
• Aleksandar Poleksic
• Accelrys/Scitegic - Shikha Varma-O'Brien/Ton van Daelen
Contact
Steven Muskal
Chief Executive [email protected]