SLAS talk 2016
Ensuring Chemical Structure, Biological Data and Computational Model Quality
Sean Ekins
1 Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC 27526, USA
2 Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, CA 94010, USA
3 Collaborations Pharmaceuticals, Inc., 5616 Hilltop Needmore Road, Fuquay-Varina, NC 27526, USA
4 Phoenix Nest, Inc., P.O. Box 150057, Brooklyn, NY 11215, USA
5 Hereditary Neuropathy Foundation, 401 Park Avenue South, 10th Floor, New York, NY 10016, USA
Email: [email protected] Twitter: collabchem
Outline
Database Quality
Molecule structure availability
Dispensing Error
Simulating Error
NIH Probe Quality
BIA 10-2474
"Well, here's another nice mess you've gotten me into!"
Summary of Data Rich World: What do we trust?
‘Big’ Chemistry Databases: 1 billion molecules, but how many are real?
It Started for me by Looking at Malaria Data with SMARTS Filters
Med. Chem. Commun., 2010,1, 325-330
Used various filters (Pfizer, Glaxo, Abbott – implemented by University of New Mexico) with antimalarial datasets
Found large percentages of libraries were failing filters
Some filters more stringent than others (Alarm vs Glaxo)
Proposed wider use of such filters
PAINS also appeared in 2010
Circa 2011-2012: Structure Quality Issues Everywhere
NPC Browser http://tripod.nih.gov/npc/
Database released, and within days hundreds of errors were found in structures
DDT, 16: 747-750 (2011)
Science Translational Medicine 2011
DDT 17: 685-701 (2012)
Circa 2013-now: Finding Structures of Pharma Molecules is Hard
DDT, 18: 58-70 (2013)
NCATS and MRC made molecule identifiers from several pharmas available without structures. This continues today
Limits computational repurposing efforts, transparency
DDT editorial Dec 2011
http://goo.gl/dIqhU
This editorial led to collaboration
It’s Not Just Structure Quality we Need to Worry About
How do you Move a Liquid?
Images courtesy of Bing, Tecan
McDonald et al., Science 2008, 322, 917.
Belaiche et al., Clin Chem 2009, 55, 1883-1884
Plastic Leaching
Using Literature Data From Different Dispensing Methods to Generate Computational Models
Few molecule structures and corresponding datasets are public
Using data from 2 AstraZeneca patents:
Tyrosine kinase EphB4 pharmacophores (Accelrys Discovery Studio) were developed using data for 14 compounds
IC50 determined using different dispensing methods
Analyzed correlation with simple descriptors (SAS JMP)
Calculated LogP correlation with log IC50 data for acoustic dispensing (r2 = 0.34, p < 0.05, N = 14)
Barlaam, B. C.; Ducray, R., WO 2009/010794 A1, 2009
Barlaam, B. C.; Ducray, R.; Kettle, J. G., US 7,718,653 B2, 2010
14 Compounds With Structures and IC50 Data
Compound #   IC50 Acoustic (µM)   IC50 Tips (µM)   Ratio IC50 Tip / IC50 ADE
5            0.002                0.553            276.5
4            0.003                0.146            48.7
7            0.003                0.778            259.3
W7b          0.004                0.152            42.5
8            0.004                0.445            111.3
W5           0.006                0.087            13.7
6            0.007                0.973            139.0
W3           0.012                0.049            4.2
W1           0.014                0.112            8.2
9            0.052                0.170            3.3
10           0.064                0.817            12.8
W12          0.158                0.250            1.6
W11          0.207                14.400           69.6
11           0.486                3.030            6.2
Barlaam, B. C.; Ducray, R., WO 2009/010794 A1, 2009
Barlaam, B. C.; Ducray, R.; Kettle, J. G., US 7,718,653 B2, 2010
[Figure: scatter plot of log IC50 (tips) against log IC50 (acoustic); both axes run from -3 to 1.5]
log IC50 values for tip-based serial dilution and dispensing versus acoustic dispensing with direct dilution
Poor correlation: R2 = 0.246
The acoustic technique always gave more potent IC50 values
PLoS ONE 8(5): e62325 (2013)
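The comparison on this slide can be re-derived from the IC50 values listed for the 14 compounds. The sketch below is an illustration, not the analysis from the paper (which used SAS JMP). It uses the rounded IC50 values as they appear on the slide, so the result should land close to, but not exactly on, the reported R2 = 0.246. The names `IC50_DATA` and `r_squared` are chosen here for illustration.

```python
import math

# (compound, IC50 acoustic in µM, IC50 tips in µM), copied from the slide's table.
IC50_DATA = [
    ("5", 0.002, 0.553), ("4", 0.003, 0.146), ("7", 0.003, 0.778),
    ("W7b", 0.004, 0.152), ("8", 0.004, 0.445), ("W5", 0.006, 0.087),
    ("6", 0.007, 0.973), ("W3", 0.012, 0.049), ("W1", 0.014, 0.112),
    ("9", 0.052, 0.170), ("10", 0.064, 0.817), ("W12", 0.158, 0.250),
    ("W11", 0.207, 14.400), ("11", 0.486, 3.030),
]


def r_squared(xs, ys):
    """Squared Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


log_acoustic = [math.log10(a) for _, a, _ in IC50_DATA]
log_tips = [math.log10(t) for _, _, t in IC50_DATA]

r2 = r_squared(log_acoustic, log_tips)
# "More potent" means a lower IC50, so check acoustic < tips for every compound.
acoustic_always_more_potent = all(a < t for _, a, t in IC50_DATA)

print(f"R^2 (log IC50 tips vs log IC50 acoustic) = {r2:.3f}")
print("Acoustic IC50 lower for every compound:", acoustic_always_more_potent)
```

Working on the log scale matters here: the IC50 values span more than two orders of magnitude, so a correlation on raw values would be dominated by the few weakest compounds.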
Process                      HPF   HBA   HBD   Observed vs. predicted IC50 r
Acoustic mediated process    2     1     1     0.92
Tip-based process            0     2     1     0.80
HPF = hydrophobic features; HBA = hydrogen bond acceptor; HBD = hydrogen bond donor
PLoS ONE 8(5): e62325 (2013)
Tyrosine Kinase EphB4 Pharmacophores
Generated with Discovery Studio (Accelrys)
[Figure: two pharmacophore panels, "Acoustic" and "Tip-based"]
Cyan = hydrophobic
Green = hydrogen bond acceptor
Purple = hydrogen bond donor
Each model shows most potent molecule mapping
Test Set Evaluation of Pharmacophores
• An additional 12 compounds from AstraZeneca (Barlaam, B. C.; Ducray, R., WO 2008/132505 A1, 2008)
• 10 of these compounds had data for tip-based dispensing and 2 for acoustic dispensing
• Calculated logP and logD showed low but statistically significant correlations with tip-based dispensing (r2 = 0.39, p < 0.05 and r2 = 0.24, p < 0.05; N = 36)
• Used as a test set for pharmacophores
• The two compounds analyzed with acoustic liquid handling were predicted in the top 3 using the ‘acoustic’ pharmacophore
• The ‘tip-based’ pharmacophore failed to rank the retrieved compounds correctly
PLoS ONE 8(5): e62325 (2013)
Automated Receptor-Ligand Pharmacophore Generation Method
Pharmacophores for the tyrosine kinase EphB4 generated from crystal structures in the Protein Data Bank (PDB) using Discovery Studio version 3.5.5
Cyan = hydrophobic
Green = hydrogen bond acceptor
Purple = hydrogen bond donor
Grey = excluded volumes
Each model shows most potent molecule mapping
Bioorg Med Chem Lett 2010, 20, 6242-6245.
Bioorg Med Chem Lett 2008, 18, 5717-5721.
Bioorg Med Chem Lett 2008, 18, 2776-2780.
Bioorg Med Chem Lett 2011, 21, 2207-2211.
PLoS ONE 8(5): e62325 (2013)
Dispensing Issues Summary
• In the absence of structural data, pharmacophores and other computational and statistical models are used to guide medicinal chemistry in early drug discovery.
• Our findings suggest acoustic dispensing methods could improve HTS results and avoid the development of misleading computational models and statistical relationships.
• Automated pharmacophores are closer to the pharmacophore generated with acoustic data: all have hydrophobic features, which are missing from the tip-based pharmacophore model
• Importance of hydrophobicity seen with logP correlation and crystal structure interactions
• Public databases should annotate this metadata alongside biological data points, to create larger datasets for comparing different computational methods.
PLoS ONE 8(5): e62325 (2013)
Bootstrapping for Evaluating Dispensing Error
Simple computational replica of experiment
Simulate experiments
Understand error
Just need assay protocol, data on imprecision and inaccuracy
Can be used before an assay is ever performed
IPython notebook available
Hanson, Ekins and Chodera, J Comput Aided Mol Des 29: 1073-1086 (2015)
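To make the idea of a "computational replica" concrete, here is a minimal sketch of simulating a tip-based serial dilution many times with random pipetting error. It is not the authors' IPython notebook. The error model (a fixed relative bias plus normally distributed relative imprecision per dispense) and every parameter value below (volumes, 5% CV, 2% bias, 2000 replicates) are assumptions chosen for illustration.

```python
import random
import statistics


def dispense(nominal_volume, bias, cv, rng):
    """One simulated delivered volume: the nominal volume scaled by a fixed
    relative bias (inaccuracy) and a random relative error (imprecision)."""
    return nominal_volume * (1.0 + bias) * (1.0 + rng.gauss(0.0, cv))


def simulate_serial_dilution(c0, n_steps, v_transfer, v_diluent, cv, bias, rng):
    """Simulate one serial dilution and return the actual concentration per well."""
    concentrations = [c0]
    for _ in range(n_steps):
        transferred = dispense(v_transfer, bias, cv, rng)
        # Assumption: the diluent dispense is imprecise but unbiased.
        diluent = dispense(v_diluent, 0.0, cv, rng)
        concentrations.append(
            concentrations[-1] * transferred / (transferred + diluent)
        )
    return concentrations


def relative_spread(n_replicates, seed=0, **kwargs):
    """Repeat the simulated experiment and return, for each well, the
    coefficient of variation of the actual concentration."""
    rng = random.Random(seed)
    runs = [simulate_serial_dilution(rng=rng, **kwargs) for _ in range(n_replicates)]
    per_well = list(zip(*runs))
    return [statistics.stdev(w) / statistics.mean(w) for w in per_well]


# Hypothetical protocol: 8 two-fold dilution steps, 5% CV, 2% transfer bias.
spread = relative_spread(n_replicates=2000, seed=0, c0=10.0, n_steps=8,
                         v_transfer=50.0, v_diluent=50.0, cv=0.05, bias=0.02)
for well, cv_well in enumerate(spread):
    print(f"well {well}: CV of actual concentration = {cv_well:.3f}")
```

Because each well inherits the error of the well before it, the spread grows along the series, which is one way to see why the number of wells in a dilution series affects the error in the fitted IC50.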
Modeling Error Using the Bootstrap Principle
Simulate Error and bias in dispensing
Hanson, Ekins and Chodera, J Comput Aided Mol Des 29: 1073-1086 (2015)
Modeling Error Using the Bootstrap Principle
Can account for some but not all error
Hanson, Ekins and Chodera, J Comput Aided Mol Des 29: 1073-1086 (2015)
Modeling Error Using the Bootstrap Principle
The number of wells for dilution series can impact error
Try simulation for yourself https://goo.gl/Rku8c5
Hanson, Ekins and Chodera, J Comput Aided Mol Des 29: 1073-1086 (2015)
What is a Probe? Crowdsourcing NIH Probe Evaluation
NIH spent a decade funding HTS efforts as part of the MLSCN and MLPCN
By 2010 $576.6M in funding
Various definitions of a probe
Potency, selectivity, solubility and availability
Little has been done to learn from this work
J Chem Inf Model. 54(10): 2996-3004 (2014)
Could One Medicinal Chemist Be Enough?
But do we really need a crowd? Could 1 medicinal chemist be enough?
> 40 years' experience
Chris Lipinski scored the original 64 compounds; he was close to the median
Found more probes since 2009
• Now scored more than 300 NIH probes for desirability
Extensive due diligence
Based on literature (public/private)
Chemical reactivity
J Chem Inf Model. 54(10): 2996-3004 (2014)
J Med Chem. 58(5): 2068-76 (2015)
Contribution of Criteria for Considering Compounds as Undesirable
79% of 322 probes are desirable
J Chem Inf Model. 54(10): 2996-3004 (2014)
Simple Property Comparison for NIH Probes
Properties from CDD
Properties from Discovery Studio
Desirable probes tend to have higher molecular weight (MWT) and more rotatable bonds and heavy atoms
J Chem Inf Model. 54(10): 2996-3004 (2014)
Expert Evaluation vs PAINS and Bad Apple
Desirable probes were less likely to be flagged as promiscuous by PAINS or BadApple than probes scored as undesirable
(Fisher's exact test, p < 0.0001 for PAINS and p = 0.04 for BadApple)
J Chem Inf Model. 54(10): 2996-3004 (2014)
Since the rule of 5 there has been a considerable focus on more rules – ALERTS, PAINS, QED, BadApple etc
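For readers unfamiliar with the test behind the PAINS and BadApple comparison, the sketch below implements a two-sided Fisher's exact test for a 2x2 table using only the standard library. The slide does not give the underlying counts, so the table in the example is hypothetical and does not reproduce the reported p-values. The function name `fisher_exact_two_sided` is chosen here for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k counts in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_observed = prob(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Sum every possible table that is no more likely than the observed one
    # (small tolerance guards against floating-point ties).
    return sum(
        p for p in (prob(k) for k in range(k_min, k_max + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts, for illustration only:
# rows = desirable / undesirable probes, columns = flagged / not flagged.
p_value = fisher_exact_two_sided(2, 18, 8, 12)
print(f"two-sided p = {p_value:.4f}")
```

An exact test is a natural choice when some cells of the table are small, as is likely when only a minority of probes trip a given filter.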
Cross Validation of NIH Probes Machine Learning Models
Bayesian models built with FCFP_6 descriptors + 8 simple descriptors
Leave out 50% x 100
5-fold cross validation for n = 307 models
External test sets
J Chem Inf Model. 54(10): 2996-3004 (2014)
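The two internal validation schemes named on this slide can be sketched as index generators. This is only an illustration of the splitting logic, not the Discovery Studio or CDD implementation. It assumes that "n307" refers to 307 scored compounds, and the function names `k_fold_indices` and `leave_out_splits` are chosen here.

```python
import random


def k_fold_indices(n_items, k, seed=0):
    """Yield (train, test) index lists for shuffled k-fold cross validation."""
    rng = random.Random(seed)
    indices = list(range(n_items))
    rng.shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def leave_out_splits(n_items, fraction, n_repeats, seed=0):
    """Yield (train, test) index lists for repeated random leave-out validation."""
    rng = random.Random(seed)
    n_test = round(n_items * fraction)
    for _ in range(n_repeats):
        shuffled = rng.sample(range(n_items), n_items)
        yield shuffled[n_test:], shuffled[:n_test]


# Assumed dataset size of 307 compounds.
five_fold = list(k_fold_indices(307, k=5))
leave_half_out = list(leave_out_splits(307, fraction=0.5, n_repeats=100))
print(len(five_fold), "folds;", len(leave_half_out), "leave-out-50% splits")
```

In k-fold validation every compound is held out exactly once, whereas the repeated leave-out-50% scheme trains on much less data per model and so gives a harsher estimate of how the model generalizes.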
Comparison of Desirability Scores with Bayesian Learning Predicted Scores and Other Metrics
• The colors on the heat map correspond to the value of the indicated metric for each probe, listed vertically.
• The scale was normalized internally with green corresponding to the optimal condition within each metric.
• Data in CDD Public and can be used with
3-fold cross validation ROC = 0.69
J Chem Inf Model. 54(10): 2996-3004 (2014)
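The ROC value quoted for cross validation is the area under the receiver operating characteristic curve. As a reminder of what that number means, the sketch below computes it directly as the probability that a randomly chosen desirable probe is scored above a randomly chosen undesirable one. The labels and scores in the example are made up for illustration and are not data from the study.

```python
def roc_auc(labels, scores):
    """ROC AUC: the fraction of (positive, negative) pairs in which the
    positive has the higher score, counting ties as half."""
    positives = [s for label, s in zip(labels, scores) if label]
    negatives = [s for label, s in zip(labels, scores) if not label]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up example: 1 = desirable, 0 = undesirable, with model scores.
example_labels = [1, 0, 1, 0]
example_scores = [0.9, 0.8, 0.2, 0.1]
print("ROC AUC =", roc_auc(example_labels, example_scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so 0.69 indicates a model that ranks desirable probes above undesirable ones more often than chance but far from always.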
NIH Probes now Added to Approved Drugs Mobile App
http://goo.gl/PVkQeo
Making the data more accessible as we are drowning in molecules
Ligand efficiency higher in undesirable compounds
Bayesian model preferable in classifying desirable compounds vs other molecule quality metrics
Model could improve probe selection, score libraries, prior to more extensive due diligence
Probes could be scored by additional chemists depending on needs, e.g. with a bias toward CNS or anticancer
J Chem Inf Model. 54(10): 2996-3004 (2014)
Issues Raised in NIH Probes Search
Complexities in finding the NIH MLP probes in PubChem
Identifier and structure searches in CAS SciFinder™ reveal an extreme disclosure
The parallel worlds of commercial and public database disclosure do not completely intersect
Integration and intersections of databases and the need for bioassay ontology adoption
[Figure labels: Public, Commercial]
J Med Chem. 58(5): 2068-76 (2015)
The Tragic Case of BIA 10-2474
Crowdsourcing BIA 10-2474: Target(s), Predictions and Speculations
Nobody confirmed molecule name / structure used in trial in first few days
Predictions with Polypharma, Bayesian models and SEA (Shoichet lab)
Suggested promiscuity, beyond target of FAAH
BIA 10-2474 / Metabolite Predictions: Structure Ultimately Was Not the Same
Raises questions on Openness, transparency
Use of software for predictions
Quality and utility of predictive tools
But without information on structure it's impossible
Making Predictions Open in Real Time
www.collabchem.com http://cheminf20.org/ http://cdsouthan.blogspot.com/
Recommendations
Need more collaboration or openness in terms of availability of chemistry and biology data. Role of publishers?
Increased communication between the various databases that are both public and proprietary
Companies need to be more transparent: structure/ID deposition for Phase I clinical trial data globally
Could lead to more opportunity for discovery / repurposing
Chance to profile compounds with computational tools and flag possible issues
Role of ‘armchair science’ and crowd in raising issues is valid
Acknowledgements
Alex M. Clark
Antony J. Williams
Christopher Southan
John Chodera and Sonya Hanson
NIH NCATS 9R44TR000942-02 “Biocomputation across distributed private datasets to enhance drug discovery”.
Nadia Litterman
Joe Olechno
Christopher A. Lipinski
Barry A. Bunin
Jeremy Yang for the link to BadApple
Biovia for providing Discovery Studio
Extra slides
Key Recent References
Modeling error in experimental assays using the bootstrap principle: understanding discrepancies between assays using different dispensing technologies. Hanson SM, Ekins S, Chodera JD. J Comput Aided Mol Des. 2015 Dec;29(12):1073-86.
Open Source Bayesian Models. 2. Mining a "Big Dataset" To Create and Validate Models with ChEMBL. Clark AM, Ekins S. J Chem Inf Model. 2015 Jun 22;55(6):1246-60.
Parallel worlds of public and commercial bioactive chemistry data. Lipinski CA, Litterman NK, Southan C, Williams AJ, Clark AM, Ekins S. J Med Chem. 2015 Mar 12;58(5):2068-76.
Computational prediction and validation of an expert's evaluation of chemical probes. Litterman NK, Lipinski CA, Bunin BA, Ekins S. J Chem Inf Model. 2014 Oct 27;54(10):2996-3004.
Dispensing processes impact apparent biological activity as determined by computational and statistical analyses. Ekins S, Olechno J, Williams AJ. PLoS One. 2013 May 1;8(5):e62325.
Extra Resources
https://github.com/choderalab/cadd-grc-2013
https://github.com/choderalab/cadd-grc-2013/blob/master/slides/2013-07-21%20CADD%20GRC%20-%20Experimental%20Terror%20-%207%20interleaved.pdf
https://github.com/choderalab/dispensing-errors-manuscript/blob/master/notebooks/echo-vs-tips.ipynb