Medicinal Chemistry Due Diligence: Computational Predictions of an expert’s evaluation of the NIH...
sean-ekins -
Medicinal Chemistry Due Diligence: Computational Predictions of an expert’s evaluation of the NIH chemical probes
SaaS
Easy to use
Used by Academia, Industry, Biotech
Private
Selective collaboration
100s of published datasets
Copyright © 2013 All Rights Reserved Collaborative Drug Discovery
MM4TB: 25 organizations
New
Old
Neuroscience
Kinetoplastid Drug Development Consortium
NIH spent a decade funding HTS efforts as part of the MLSCN and MLPCN
By 2010, $576.6M in funding
Various definitions of a probe
Potency, selectivity, solubility and availability
Little has been done to learn from this work
Lajiness et al.: 13 chemists assessed 22,000 compounds (2000 each) for drug- or lead-likeness. They were not consistent in rejecting undesirable compounds (J Med Chem 2004, 47: 4891-6)
Hack et al.: 145 chemists asked to fill holes in a screening library (J Chem Inf Model 2011, 51: 3275-86)
Kutchukian et al.: medicinal chemists surveyed on selecting fragments for a lead; lack of consensus in compound selection (PLOS ONE 2012, 7: e48476)
Since the rule of 5 there has been considerable focus on more rules: ALERTS, PAINS, QED, BadApple, etc.
But do we really need a crowd? Could 1 medicinal chemist be enough?
> 40 years experience
Chris Lipinski scored the original 64 compounds; he was close to the median
Found more probes since 2009
Now scored more than 300 NIH probes for desirability
Extensive due diligence
▪ Based on literature (public/private)
▪ Chemical Reactivity
79% of 322 probes are desirable
ML010 (CID 17757274)
valsartan (CID 60846)
CAS 1164083-19-5, US20120040982 (CID 57498937)
ML160 (CID 824820)
representing molecules of different classes from public and commercial databases
Properties from CDD
Properties from Discovery Studio
Desirable probes tend to have higher molecular weight and more rotatable bonds and heavy atoms
Yellow - desirable
Blue - undesirable
Yellow – chemical probes
Blue - Microsource spectrum compounds
Desirable probes are less likely to be filtered as promiscuous by PAINS or BadApple than those scored as undesirable (Fisher's exact test, p<0.0001 for PAINS and p=0.04 for BadApple).
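A minimal sketch of how a Fisher's exact test on a 2x2 table (desirable vs undesirable, flagged vs not flagged) can be computed. The function name and the counts in the usage lines are hypothetical illustrations, not the study's data, and the two-sided convention used here (summing all tables no more probable than the observed one) is an assumption about which variant was applied.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows could be desirable / undesirable probes and columns
    flagged / not flagged by a filter (hypothetical layout).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is at most as probable as the observed one.
    p_value = sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Made-up counts for illustration only (not taken from the slides).
p = fisher_exact_two_sided(5, 45, 12, 18)
print(f"two-sided p = {p:.4g}")
```

A small p-value here would indicate that the flag rate differs between the two groups more than chance alone would explain.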
322 NIH MLP probes clustered into 44 groups using ECFP_6 fingerprints, with a Tanimoto similarity threshold of >0.11 for cluster membership.
Blue - desirable
Red - undesirable
Circle area is proportional to cluster size, and singletons are represented as a dot.
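The Tanimoto similarity used for cluster membership can be sketched on fingerprints stored as sets of integers. The slides do not say which clustering algorithm was used, so the greedy leader-style grouping below is only a stand-in to show how a threshold such as >0.11 would be applied; the function names and the toy fingerprints are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity: shared bits divided by bits present in either."""
    a, b = set(fp_a), set(fp_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def leader_cluster(fingerprints, threshold=0.11):
    """Greedy grouping (an assumed stand-in, not the method from the slides).

    Each fingerprint joins the first cluster whose leader it resembles
    with similarity above the threshold; otherwise it starts a new cluster.
    """
    clusters = []  # list of (leader_index, member_indices)
    for i, fp in enumerate(fingerprints):
        for leader, members in clusters:
            if tanimoto(fingerprints[leader], fp) > threshold:
                members.append(i)
                break
        else:
            clusters.append((i, [i]))
    return [members for _, members in clusters]


# Toy fingerprints (hypothetical bit identifiers).
toy = [{1, 2, 3}, {2, 3, 4}, {10, 11}, {10, 12}, {99}]
print(leader_cluster(toy))
```

With this scheme a fingerprint that matches no leader ends up alone, which corresponds to the singletons drawn as dots.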
Drug discovery is repetitive and there are 1000s of diseases
Drug discovery is high risk
Do we need robots or just smarter programs that discover the ideas we test?
What would happen if we could model Chris’s decisions
Potential for other non-medicinal chemists to benefit
Streamline scoring compounds, save time
NIH probes
FCFP_6 descriptors + 8 simple descriptors
Bayesian models validated by leaving out 50% of the data, repeated 100 times
5-fold cross validation for the n307 models
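To make the idea of a Bayesian model over fingerprint features concrete, here is a minimal sketch of a Laplacian-corrected naive Bayes scorer, the form commonly described for fingerprint-based Bayesian models. It is an assumption that the models on these slides follow this form; the 8 simple descriptors are left out, and all names and the toy training data are hypothetical.

```python
from collections import Counter
from math import log


def train_laplacian_bayes(fingerprints, labels):
    """Return a per-feature weight table.

    fingerprints: iterable of sets of integer feature ids
    labels: 1 for desirable, 0 for undesirable (same order)
    """
    base_rate = sum(labels) / len(labels)
    seen = Counter()         # molecules containing each feature
    seen_active = Counter()  # desirable molecules containing each feature
    for fp, label in zip(fingerprints, labels):
        for bit in set(fp):
            seen[bit] += 1
            if label:
                seen_active[bit] += 1
    # Laplacian-corrected weight: rare features are pulled toward zero.
    return {
        bit: log((seen_active[bit] + 1) / (seen[bit] * base_rate + 1))
        for bit in seen
    }


def score(model, fp):
    """Sum the weights of known features; unseen features contribute 0."""
    return sum(model.get(bit, 0.0) for bit in set(fp))


# Toy training set (hypothetical feature ids and labels).
train_fps = [{1, 2}, {1, 3}, {4, 5}, {4, 6}]
train_labels = [1, 1, 0, 0]
model = train_laplacian_bayes(train_fps, train_labels)
print(score(model, {1, 2}), score(model, {4, 5}))
```

A higher score suggests the molecule shares more features with compounds scored as desirable. Validation such as leave-out-50% repeated 100 times would wrap this train and score step in a resampling loop.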
• The colors on the heat map correspond to the value of the indicated metric for each probe, listed vertically.
• The scale was normalized internally with green corresponding to the optimal condition within each metric.
MODELS RESIDE IN PAPERS
NOT ACCESSIBLE… THIS IS UNDESIRABLE
How do we share them? How do we use them?
Open Extended Connectivity Fingerprints
ECFP_6 FCFP_6
Collected, deduplicated, hashed
Sparse integers
• Invented for Pipeline Pilot: public method, proprietary details
• Often used with Bayesian models: many published papers
• Built a new implementation: open source, Java, CDK
– stable: fingerprints don't change with each new toolkit release
– well defined: easy to document precise steps
– easy to port: already migrated to iOS (Objective-C) for TB Mobile app
• Provides core basis feature for CDD open source model service
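The "collected, deduplicated, hashed, sparse integers" idea can be illustrated with a toy circular fingerprint. This is a deliberately simplified sketch and not the ECFP_6 algorithm or the CDK implementation: the atom invariants (label plus neighbour count), the use of CRC32, and the function name are all assumptions, and the real method's duplicate-substructure handling is omitted.

```python
import zlib


def circular_fingerprint(atoms, bonds, diameter=6):
    """Toy circular fingerprint over a molecular graph.

    atoms: list of atom labels, e.g. element symbols
    bonds: list of (i, j) index pairs
    Returns a sorted list of distinct 32-bit integers.
    """
    neighbours = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)

    def hashed(text):
        return zlib.crc32(text.encode("utf-8"))

    # Iteration 0: one identifier per atom from simple invariants.
    current = [
        hashed(f"{label}|{len(neighbours[i])}")
        for i, label in enumerate(atoms)
    ]
    collected = set(current)

    # Each iteration widens the environment by one bond (diameter 6 = 3 rounds).
    for _ in range(diameter // 2):
        updated = []
        for i in range(len(atoms)):
            around = sorted(current[j] for j in neighbours[i])
            text = f"{current[i]}|" + ",".join(str(v) for v in around)
            updated.append(hashed(text))
        current = updated
        collected.update(current)  # the set deduplicates identifiers

    return sorted(collected)


# Heavy atoms of ethanol as a toy input: C-C-O.
print(circular_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)]))
```

Because neighbour identifiers are sorted before hashing, the result does not depend on atom ordering, which is the kind of stability the bullet points describe.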
Data + One Click =
Uses Bayesian algorithm and FCFP_6 fingerprints
Rebuilt the n307 model in CDD Models
3 fold cross validation
ROC = 0.69
http://goo.gl/PVkQeo
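For readers unfamiliar with the ROC value quoted for the rebuilt model, the area under the ROC curve can be sketched as the fraction of (desirable, undesirable) pairs that the model ranks correctly. The function name and the toy scores are hypothetical; this is not the CDD Models implementation.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: model outputs, higher meaning more likely desirable
    labels: 1 for desirable, 0 for undesirable (same order)
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy scores for four compounds (hypothetical values).
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so 0.69 sits between the two.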
Making the data more accessible as we are drowning in molecules
[Figure axis: log database size (millions), ticks from -1 to 3.5]
Ligand efficiency higher in undesirable compounds
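Ligand efficiency is usually defined as binding free energy per heavy atom. The sketch below uses the common approximation LE ≈ 1.37 × pIC50 / heavy-atom count (kcal/mol per heavy atom, near room temperature); whether the slides used exactly this definition is an assumption, and the function name and example values are hypothetical.

```python
def ligand_efficiency(p_activity, heavy_atoms, kcal_per_log_unit=1.37):
    """Approximate ligand efficiency in kcal/mol per heavy atom.

    p_activity: -log10 of the potency in molar units (e.g. pIC50)
    heavy_atoms: number of non-hydrogen atoms
    kcal_per_log_unit: about RT*ln(10) near room temperature (assumed)
    """
    if heavy_atoms <= 0:
        raise ValueError("heavy_atoms must be positive")
    return kcal_per_log_unit * p_activity / heavy_atoms


# Same potency, different size (hypothetical compounds).
print(ligand_efficiency(7.0, 25))
print(ligand_efficiency(7.0, 40))
```

Because the heavy-atom count is in the denominator, smaller compounds score higher at equal potency, which fits with the observation that desirable probes tend to have more heavy atoms.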
Bayesian model preferable in classifying desirable compounds vs other molecule quality metrics
Model could improve probe selection, score libraries, prior to more extensive due diligence
Probes could be scored by additional chemists depending on needs, e.g. a bias toward CNS or anticancer.
CNS
Anticancer
NIH probes
Complexities in finding the NIH MLP probes in PubChem
Identifier and structure searches in CAS SciFinder™ reveals an extreme disclosure
The parallel worlds of commercial and public database disclosure do not completely intersect
Integration and intersections of databases and the need for bioassay ontology adoption
Public Commercial
Need more collaboration or openness in terms of availability of chemistry and biology data.
Increased communication between the various databases that are both public and proprietary
Major hurdles exist to prevent this from happening - too much commercial value to proprietary databases
Clearly CAS and the other commercial vendors have to take notice
We acknowledge that the Bayesian model software within CDD was developed with support from Award Number 9R44TR000942-02 “Biocomputation across distributed private datasets to enhance drug discovery” from the NCATS.
SE gratefully acknowledges Biovia (formerly Accelrys) for providing Discovery Studio.
SE thanks Jeremy Yang for the link to BadApple.
Litterman NK, Lipinski CA, Bunin BA, Ekins S. Computational Prediction and Validation of an Expert's Evaluation of Chemical Probes. J Chem Inf Model. 2014 Oct 27;54(10):2996-3004. doi: 10.1021/ci500445u. Epub 2014 Oct 7.
Lipinski CA, Litterman NK, Southan C, Williams AJ, Clark AM, Ekins S. The parallel worlds of public and commercial bioactive chemistry data. J Med Chem. Epub 2014 Nov 21.