Numerical Characterization of Structure-Activity Relationships from a Medicinal ... ·...
Transcript of Numerical Characterization of Structure-Activity Relationships from a Medicinal ... ·...
Defining & UsingStructure-Activity
Landscapes
Rajarshi Guha
Background
Visualization
Utilization
Predictive Models
3D models
Chemical spaces
Summary
Numerical Characterization ofStructure-Activity Relationships from a
Medicinal Chemists Point of View
Rajarshi Guha
School of InformaticsIndiana University
National Medicinal Chemistry SymposiumPittsburgh, PA15th June, 2008
Defining & UsingStructure-Activity
Landscapes
Rajarshi Guha
Background
Visualization
Utilization
Predictive Models
3D models
Chemical spaces
Summary
Structure Activity Relationships
Assumptions
I Similar molecules will have similar activities
I Small changes in structure will lead to small changes inactivity
I One implication is that SAR’s are additive
I This is the basis for QSAR modeling
Martin, Y.C. et al., J. Med. Chem., 2002, 45, 4350–4358
Defining & UsingStructure-Activity
Landscapes
Rajarshi Guha
Background
Visualization
Utilization
Predictive Models
3D models
Chemical spaces
Summary
Structure Activity Landscapes
Melanocortin-4 receptor inhibitors
N
N
O
F3C
NH2
NH2
Cl Cl
N
N
O
F3C
NH2
NH
Cl Cl
O
Ki = 39.0 nM Ki = 1.8 nM
N
N
O
NH2
NH
Cl Cl
O
F3C
NH2
N
N
O
F3C
NH2
NH
Cl Cl
O NH2
Ki = 10.0 nM Ki = 1.0 nM
Tran, J.A. et al., Bioorg. Med. Chem. Lett., 2007, 15, 5166–5176
Defining & UsingStructure-Activity
Landscapes
Rajarshi Guha
Background
Visualization
Utilization
Predictive Models
3D models
Chemical spaces
Summary
Structure Activity Landscapes
Rugged gorges or rolling hills?
I Small structural changes associated with large activitychanges represent steep slopes in the landscape
I Activity Cliffs
I But traditionally, QSAR assumes gentle slopes
I Machine learning is not very good for special cases
Maggiora, G.M., J. Chem. Inf. Model., 2006, 46, 1535–1535
Defining & UsingStructure-Activity
Landscapes
Rajarshi Guha
Background
Visualization
Utilization
Predictive Models
3D models
Chemical spaces
Summary
Characterizing the Landscape
Converting activity cliffs to numbers
I A cliff can be numerically characterized
I Structure-Activity Landscape Index (SALI)
SALIi ,j =|Ai − Aj |
1− sim(i , j)
I Cliffs are characterized by elements of the matrix withvery large values
Guha, R.; Van Drie, J.H., J. Chem. Inf. Model., 2008, 48, 646–658
Defining & UsingStructure-Activity
Landscapes
Rajarshi Guha
Background
Visualization
Utilization
Predictive Models
3D models
Chemical spaces
Summary
Visualizing the SALI Matrix
Defining & UsingStructure-Activity
Landscapes
Rajarshi Guha
Background
Visualization
Utilization
Predictive Models
3D models
Chemical spaces
Summary
Visualizing SALI Values
Alternatives?
I A heatmap is an easy to understand visualization
I Coupled with brushing, can be a handy tool
I A more flexible approach is to consider a network view ofthe matrix
The SALI graph
I Compounds are nodes
I Nodes i , j are connected if SALIi ,j > X
I Only display connected nodes
Defining & UsingStructure-Activity
Landscapes
Rajarshi Guha
Background
Visualization
Utilization
Predictive Models
3D models
Chemical spaces
Summary
Visualizing the SALI Graph
●1
●2●3
●4
●5
●6
●7
●8
●9
●10●12
●13
●15
●17
●18
●19
●20 ●21●22
●23●24
●25
●27
●28
●29
●30
●31
●34●35
●37
●38
●39
●40
●41 ●42
●43
●44
●45
●46
●49
●50
●51
●52
●53
●54
●55
●56●57
●58
●59
●60
●62
●63
●64
●65
●66
●68
●69
●71 ●72 ●73
●75
●76
I Nodes are ordered such that the tail node in an edge haslower activity than the head node
Defining & UsingStructure-Activity
Landscapes
Rajarshi Guha
Background
Visualization
Utilization
Predictive Models
3D models
Chemical spaces
Summary
Better Visualization
SALIViewer
I Java application for generating and visualizing SALIgraphs
I Create SALI graphs from SMILES and activity data,using the CDK fingerprints
I Easily examine SALI graphs at different cutoffs
I Provides 2D depictions for nodes and edges
I Generate SALI curves
http://cheminfo.informatics.indiana.edu/~rguha/code/java/salivis
Defining & UsingStructure-Activity
Landscapes
Rajarshi Guha
Background
Visualization
Utilization
Predictive Models
3D models
Chemical spaces
Summary
Better Visualization - SALIViewer
Defining & UsingStructure-Activity
Landscapes
Rajarshi Guha
Background
Visualization
Utilization
Predictive Models
3D models
Chemical spaces
Summary
Varying Fingerprint Methods
BCI 1052 bit
Tanimoto Similarity
Den
sity
0.70 0.75 0.80 0.85 0.90 0.95 1.00
02
46
8MACCS 166 bit
Tanimoto Similarity
Den
sity
0.70 0.75 0.80 0.85 0.90 0.95 1.00
02
46
8
CDK 1024 bit
Tanimoto Similarity
Den
sity
0.6 0.7 0.8 0.9 1.0
02
46
8
I Shorter fingerprints will lead to more “similar” pairs
I Requires a higher cutoff to focus on significant cliffs
Defining & UsingStructure-Activity
Landscapes
Rajarshi Guha
Background
Visualization
Utilization
Predictive Models
3D models
Chemical spaces
Summary
Varying the Similarity Metric
I The similarity metric does not affect the SALI values
Defining & UsingStructure-Activity
Landscapes
Rajarshi Guha
Background
Visualization
Utilization
Predictive Models
3D models
Chemical spaces
Summary
SALI Graphs & Predictive Models
I The graph view allows us to view SAR’s and identifytrends easily
I The aim of a QSAR model is to encode SAR’s
I Traditionally, we consider the quality of a model in termsof RMSE or R2
I But in general, we’re not as interested in RMSE’s as weare in whether the model predicted something as moreactive than something else
I What we want to have is the correct orderingI We assume the model is statistically significant
Defining & UsingStructure-Activity
Landscapes
Rajarshi Guha
Background
Visualization
Utilization
Predictive Models
3D models
Chemical spaces
Summary
SALI Graphs & Predictive Models
Measuring model quality
I A QSAR model should easily encode the “rolling hills”
I A good model captures the most significant cliffs
I Can be formalized as
How many of the edge orderings of a SALIgraph does the model predict correctly?
I Define S(X ), representing the number of edges correctlypredicted for a SALI network at a threshold X
I Repeat for varying X and obtain the SALI curve
Defining & UsingStructure-Activity
Landscapes
Rajarshi Guha
Background
Visualization
Utilization
Predictive Models
3D models
Chemical spaces
Summary
SALI Graphs & Predictive Models
Measuring model quality
I A QSAR model should easily encode the “rolling hills”
I A good model captures the most significant cliffs
I Can be formalized as
How many of the edge orderings of a SALIgraph does the model predict correctly?
I Define S(X ), representing the number of edges correctlypredicted for a SALI network at a threshold X
I Repeat for varying X and obtain the SALI curve
Defining & UsingStructure-Activity
Landscapes
Rajarshi Guha
Background
Visualization
Utilization
Predictive Models
3D models
Chemical spaces
Summary
SALI Curves - An Example
0.0 0.2 0.4 0.6 0.8 1.0
−1.
0−
0.5
0.0
0.5
1.0
X
S(X
)
3−descriptor5−descriptorScrambled 3−descriptor
Defining & UsingStructure-Activity
Landscapes
Rajarshi Guha
Background
Visualization
Utilization
Predictive Models
3D models
Chemical spaces
Summary
SALI Curves & Model Comparison
I Considered four datasets
I Developed linear regression models, using exhaustivesearch for feature selection
I Identify three models for each datasetI Minimum RMSE (“best”)I Median RMSEI Maximum RMSE (“worst”)
I Generate SALI curves for each model and summarize bydataset
Guha, R.; Van Drie, J.H., J. Chem. Inf. Model., submitted
Defining & UsingStructure-Activity
Landscapes
Rajarshi Guha
Background
Visualization
Utilization
Predictive Models
3D models
Chemical spaces
Summary
SALI Curves & Model Comparison
0.0 0.2 0.4 0.6 0.8 1.0
−1.
0−
0.5
0.0
0.5
1.0
X
S(X
)
Min−RMSEMedian−RMSEMax−RMSE
0.0 0.2 0.4 0.6 0.8 1.0
−1.
0−
0.5
0.0
0.5
1.0
X
S(X
)
Min−RMSEMedian−RMSEMax−RMSE
PDGFR (3) Artemisinin (3)
0.0 0.2 0.4 0.6 0.8 1.0
−1.
0−
0.5
0.0
0.5
1.0
X
S(X
)
Min−RMSEMedian−RMSEMax−RMSE
0.0 0.2 0.4 0.6 0.8 1.0
−1.
0−
0.5
0.0
0.5
1.0
X
S(X
)Min−RMSEMedian−RMSEMax−RMSE
Melanocortin (6) Benzodiazepine (5)
Defining & UsingStructure-Activity
Landscapes
Rajarshi Guha
Background
Visualization
Utilization
Predictive Models
3D models
Chemical spaces
Summary
SALI Curves & Model Comparison
I The initial and final portions ofthe curve are of interest
I It’s also useful to summarizethe whole curve
I We evaluate the area betweenthe curve and the X-axis (SCI)
I -1 ≤ SCI ≤ 1 0.0 0.2 0.4 0.6 0.8 1.0−
1.0
−0.
50.
00.
51.
0
X
S(X
)
SCI = 0.12
SALI Curve Integral
Defining & UsingStructure-Activity
Landscapes
Rajarshi Guha
Background
Visualization
Utilization
Predictive Models
3D models
Chemical spaces
Summary
SALI Curves & Model Comparison
PDGFR Artemisinin Melanocortin Benzodiazepine
SC
I
−0.
10.
00.
10.
20.
30.
40.
50.
6Min−RMSEMedian−RMSEMax−RMSE
Defining & UsingStructure-Activity
Landscapes
Rajarshi Guha
Background
Visualization
Utilization
Predictive Models
3D models
Chemical spaces
Summary
Examining Any Type of Model . . .
I Previous examples make use of predicted values fromQSAR models
I We can consider any “prediction” that is supposed totrack observed activity
I RanksI Energies
I Allows us to apply this approach to any type ofcomputational model that predicts something
I DockingI CoMFAI Pharmacophore
Defining & UsingStructure-Activity
Landscapes
Rajarshi Guha
Background
Visualization
Utilization
Predictive Models
3D models
Chemical spaces
Summary
Docking & CoMFA Models
0.0 0.2 0.4 0.6 0.8 1.0
−1.
0−
0.5
0.0
0.5
1.0
X
S(X
)
SCI = 0.95
0.0 0.2 0.4 0.6 0.8 1.0−
1.0
−0.
50.
00.
51.
0
X
S(X
)SCI = 0.99
Docking CoMFA
I Not surprising that 3D models capture more cliffs
I The CoMFA model is nearly perfect!
Holloway, K. et al, J. Med. Chem., 1995, 38, 305–317
Cavalli, A. et al, J. Med. Chem., 2002, 45, 3844–3853
Defining & UsingStructure-Activity
Landscapes
Rajarshi Guha
Background
Visualization
Utilization
Predictive Models
3D models
Chemical spaces
Summary
Comparing Landscapes
I The SALI curve is a function ofI datasetI descriptor space
I We can quantify a descriptor spaces ability to encodethe structure-activity landscape using SALI graphs
I What is the size of the graph as a function of SALIcutoff?
I The SALI approach allows us to investigate molecularrepresentations that may not be directly accessible
I Work in progress
Defining & UsingStructure-Activity
Landscapes
Rajarshi Guha
Background
Visualization
Utilization
Predictive Models
3D models
Chemical spaces
Summary
Comparing Landscapes
I The SALI curve is a function ofI datasetI descriptor space
I We can quantify a descriptor spaces ability to encodethe structure-activity landscape using SALI graphs
I What is the size of the graph as a function of SALIcutoff?
I The SALI approach allows us to investigate molecularrepresentations that may not be directly accessible
I Work in progress
Defining & UsingStructure-Activity
Landscapes
Rajarshi Guha
Background
Visualization
Utilization
Predictive Models
3D models
Chemical spaces
Summary
Comparing Landscapes
I The SALI curve is a function ofI datasetI descriptor space
I We can quantify a descriptor spaces ability to encodethe structure-activity landscape using SALI graphs
I What is the size of the graph as a function of SALIcutoff?
I The SALI approach allows us to investigate molecularrepresentations that may not be directly accessible
I Work in progress
Defining & UsingStructure-Activity
Landscapes
Rajarshi Guha
Background
Visualization
Utilization
Predictive Models
3D models
Chemical spaces
Summary
Comparing Landscapes
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
SALI Cutoff
Nor
mal
ized
Num
ber
of C
liffs
FingerprintBest 3−DescriptorWorst 3−Descriptor
A Type 2 SALI curve for the PDGFR dataset, comparing 3different molecular representations
Defining & UsingStructure-Activity
Landscapes
Rajarshi Guha
Background
Visualization
Utilization
Predictive Models
3D models
Chemical spaces
Summary
What’s Next?
I SALI graphs and curves represent a framework forexploring structure-∗ landscapes
Open questions
I Weighted SALI graphs (ADMET, synthetic feasibility)
I Is it correct to identify cliffs using fingerprints, and thenpredict cliffs using different descriptors?
I Can we use SALI curves to compare 3D and 2Ddescriptor spaces?
I Can we use SCI for feature selection?
Defining & UsingStructure-Activity
Landscapes
Rajarshi Guha
Background
Visualization
Utilization
Predictive Models
3D models
Chemical spaces
Summary
Conclusions
I The SALI is an effective way to numerically encodeactivity cliffs
I The network view of these values allows us to exploreSAR’s in an intuitive way
I Using the SALI curve allows us to compare predictivemodels in a manner that is intuitive for a medicinalchemist
Defining & UsingStructure-Activity
Landscapes
Rajarshi Guha
Background
Visualization
Utilization
Predictive Models
3D models
Chemical spaces
Summary
Acknowledgments
I John Van Drie
Defining & UsingStructure-Activity
Landscapes
Rajarshi Guha
Background
Visualization
Utilization
Predictive Models
3D models
Chemical spaces
Summary
Defining & UsingStructure-Activity
Landscapes
Rajarshi Guha
Background
Visualization
Utilization
Predictive Models
3D models
Chemical spaces
Summary
Making Use of the SALI Graph
I A little difficult with a non-interactive graph
I We can investigate a series of transformations thatincrease (or decrease) activity
I Identify two types of SAR’sI BroadI DetailedI Depends on what cutoff we choose
I These correspond somewhat to the continuous anddiscontinuous SAR’s described by Peltason et al.
Peltason, L. et al., J. Med. Chem., 2007, 50, 5571–5578
Defining & UsingStructure-Activity
Landscapes
Rajarshi Guha
Background
Visualization
Utilization
Predictive Models
3D models
Chemical spaces
Summary
Glucocorticoid Inhibitors
06−20 06−16
06−30 07−12
07−17
06−42 06−24 06−44 07−30
07−38
07−4106−21 06−32 06−41
06−43 07−13 07−14
07−19 07−20 07−23
07−35 07−36 07−37 07−39 07−42
I 62 dihydroquinoline derivatives
I IC50’s reported, some values were censored
I 50% SALI graph generated using 1052 bit BCIfingerprints
Takahashi, H. et al, Bioorg. Med. Chem. Lett., 2007, 17, 5091–5095
Defining & UsingStructure-Activity
Landscapes
Rajarshi Guha
Background
Visualization
Utilization
Predictive Models
3D models
Chemical spaces
Summary
Glucocorticoid Inhibitors
I Moving from ally or phenylethylto ethyl causes a 6-fold increasein activity
I Reducing bulk at this positionseems to improve activity
I Pretty broad conclusion
I But ethyl is not much smallerthan allyl
I We need more detail
NH
O
O
07-20, 2000 nM
NH
O
O
07-23, 2000 nM
NH
O
O
07-17, 355 nM
Defining & UsingStructure-Activity
Landscapes
Rajarshi Guha
Background
Visualization
Utilization
Predictive Models
3D models
Chemical spaces
Summary
Glucocorticoid Inhibitors
●06−20
●06−23
●06−16 ●06−36
●06−40●06−41 ●06−30 ●06−31 ●07−12●07−15
●07−17
●07−18
●07−19 ●07−20 ●07−23
●07−22●06−42●06−24
●06−44
●07−30●07−38
●07−41●06−21 ●06−32 ●06−37●06−39
●06−43 ●07−13 ●07−14●07−21●07−25●07−26 ●07−27 ●07−29
●07−35●07−36
●07−37
●07−39
●07−42
I Generated using a 30% cutoff
Defining & UsingStructure-Activity
Landscapes
Rajarshi Guha
Background
Visualization
Utilization
Predictive Models
3D models
Chemical spaces
Summary
Glucocorticoid Inhibitors
●06−20
●06−23
●06−16 ●06−36
●06−40●06−41 ●06−30 ●06−31 ●07−12●07−15
●07−17
●07−18
●07−19 ●07−20 ●07−23
●07−22●06−42●06−24
●06−44
●07−30●07−38
●07−41●06−21 ●06−32 ●06−37●06−39
●06−43 ●07−13 ●07−14●07−21●07−25●07−26 ●07−27 ●07−29
●07−35●07−36
●07−37
●07−39
●07−42
I Generated using a 30% cutoff
Defining & UsingStructure-Activity
Landscapes
Rajarshi Guha
Background
Visualization
Utilization
Predictive Models
3D models
Chemical spaces
Summary
Glucocorticoid Inhibitors
NH
O
O
07-15, 2000 nM
I Suggests that electron density isalso important
I Lower π density possibly correlatesto increased activity
I Confirmed by 07-23 → 07-18
I 07-15 → 07-17 is interesting sincethe change increases the bulk
NH
O
O
07-20, 2000 nM
NH
O
O
07-18, 710 nM
NH
O
O
07-17, 355 nM
Defining & UsingStructure-Activity
Landscapes
Rajarshi Guha
Background
Visualization
Utilization
Predictive Models
3D models
Chemical spaces
Summary
Glucocorticoid Inhibitors
I These observations match those made by Takahashiet al.
I More detailed graphs exhibit longer paths that focus onthe bulk of side chains at the C4-α position
I A number of paths consider changes to the epoxidesubstitution
I Usually of length 1I Highlights the fact that bulk at the C4 α has greater
impact on activity than epoxide substitutions
I The SALI graph stresses the non-linearity of the SAR
Defining & UsingStructure-Activity
Landscapes
Rajarshi Guha
Background
Visualization
Utilization
Predictive Models
3D models
Chemical spaces
Summary
SALI Curves - Control Experiments
0.0 0.2 0.4 0.6 0.8 1.0
−1.
0−
0.5
0.0
0.5
1.0
X
S(X
)
0.0 0.2 0.4 0.6 0.8 1.0
−1.
0−
0.5
0.0
0.5
1.0
X
S(X
)
No noise1050100200
Scrambling
I Scramble the Y-variable andrebuild the model
I Evaluate the SALI curve
I Repeat 50 times and takethe mean of the counts for agiven cutoff
Noise
I Add uniform noise to eachdescriptor, rebuild the model
I We expect little variation inthe plateau