Medicinal Chemistry Due Diligence: Computational Predictions of an expert’s evaluation of the NIH chemical probes

SaaS

Easy to use

Used by Academia, Industry, Biotech

Private

Selective collaboration

100s of published datasets

Copyright © 2013 All Rights Reserved Collaborative Drug Discovery

MM4TB: 25 organizations

New

Old

Neuroscience

Kinetoplastid Drug Development Consortium

NIH spent a decade funding HTS efforts as part of the MLSCN and MLPCN

By 2010, $576.6M in funding

Various definitions of a probe

Potency, selectivity, solubility and availability

Little has been done to learn from this work

Lajiness et al. - 13 chemists assessed 22,000 compounds (2,000 each) for drug or lead likeness. They were not consistent in rejecting undesirable compounds (J Med Chem 2004, 47: 4891-6)

Hack et al. - 145 chemists selected compounds to fill holes in a screening library (J Chem Inf Model 2011, 51: 3275-86)

Kutchukian et al. - medicinal chemists surveyed on selecting fragments for a lead showed a lack of consensus in compound selection (PLOS ONE 2012, 7, e48476)

Since the rule of 5 there has been a considerable focus on more rules: ALERTS, PAINS, QED, BadApple, etc.

But do we really need a crowd? Could 1 medicinal chemist, with >40 years of experience, be enough?

Chris Lipinski scored the original 64 compounds; he was close to the median

Found more probes since 2009

Now scored more than 300 NIH probes for desirability

Extensive due diligence

▪ Based on literature (public/private)

▪ Chemical Reactivity

79% of 322 probes are desirable

ML010 (CID 17757274)

valsartan (CID 60846)

CAS 1164083-19-5, US20120040982 (CID 57498937)

ML160 (CID 824820)

representing molecules of different classes from public and commercial databases

Properties from CDD

Properties from Discovery Studio

Desirable probes show higher molecular weight, rotatable bond count and heavy atom count

Yellow - desirable

Blue - undesirable

Yellow – chemical probes

Blue - Microsource spectrum compounds

Desirable probes are less likely than those scored as undesirable to be flagged as promiscuous by the PAINS or BadApple filters

(Fisher's exact test, p<0.0001 for PAINS and p=0.04 for BadApple).
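As an illustration of the test named above, here is a minimal sketch of a two-sided Fisher's exact test on a 2x2 table (desirable vs. undesirable, flagged vs. not flagged). The function name and the toy counts are assumptions for illustration only; they are not the counts from the probe study.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows could be desirable / undesirable probes, columns flagged / not flagged.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probability of every table at least as unlikely as the observed one
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))

# Toy counts (hypothetical, not from the study)
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

A small p-value would indicate that the flagged fraction differs between the two groups by more than chance would explain.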

322 NIH MLP probes clustered into 44 groups using ECFP_6 fingerprints, with a Tanimoto similarity threshold of >0.11 for cluster membership.

Blue - desirable

Red - undesirable

Circle area is proportional to cluster size, and singletons are represented as a dot.
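The similarity measure behind the clustering can be sketched as follows, assuming each fingerprint is held as a set of sparse integer feature codes. The function names are illustrative, and the clustering algorithm itself is not specified on the slide, so only the pairwise similarity and the threshold check are shown.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets of integers."""
    a, b = set(fp_a), set(fp_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)

def same_cluster(fp_a, fp_b, threshold=0.11):
    """True if the pair exceeds the membership threshold quoted on the slide."""
    return tanimoto(fp_a, fp_b) > threshold

# Toy fingerprints (hypothetical feature codes)
similarity = tanimoto({1, 2, 3}, {2, 3, 4})
```

With a threshold as low as 0.11, even loosely related structures fall in the same group, which is consistent with 322 probes collapsing into 44 clusters.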

Drug discovery is repetitive and there are 1000s of diseases

Drug discovery is high risk

Do we need robots or just smarter programs that discover the ideas we test?

What would happen if we could model Chris’s decisions?

Potential for other non-medicinal chemists to benefit

Streamline scoring compounds, save time

NIH probes

FCFP_6 descriptors + 8 simple descriptors

Leave out 50% x 100 validation of Bayesian models

5 fold cross validation for n307 models
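A minimal sketch of how 5-fold cross-validation partitions a dataset, assuming "n307" refers to a set of 307 scored compounds. The function name and the fixed seed are illustrative choices, not details from the slides.

```python
import random

def k_fold_indices(n_items, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

splits = list(k_fold_indices(307, 5))
```

Each compound is held out exactly once, so every prediction used for validation comes from a model that never saw that compound during training.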

• The colors on the heat map correspond to the value of the indicated metric for each probe, listed vertically.

• The scale was normalized internally with green corresponding to the optimal condition within each metric.

MODELS RESIDE IN PAPERS, NOT ACCESSIBLE… THIS IS UNDESIRABLE

How do we share them? How do we use them?

Open Extended Connectivity Fingerprints

ECFP_6 FCFP_6

Collected, deduplicated, hashed

Sparse integers

• Invented for Pipeline Pilot: public method, proprietary details

• Often used with Bayesian models: many published papers

• Built a new implementation: open source, Java, CDK

– stable: fingerprints don't change with each new toolkit release

– well defined: easy to document precise steps

– easy to port: already migrated to iOS (Objective-C) for TB Mobile app

• Provides core basis feature for CDD open source model service

Data + One Click =

Uses Bayesian algorithm and FCFP_6 fingerprints
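One common way to pair a Bayesian classifier with fingerprints is the Laplacian-corrected naive Bayes estimator. The sketch below assumes that variant and assumes fingerprints are sets of integer feature codes; the slides do not state which exact estimator is used, and the function names are illustrative.

```python
from math import log

def train_laplacian_bayes(fingerprints, labels):
    """Return a per-feature weight table from fingerprints and 0/1 labels."""
    base_rate = sum(labels) / len(labels)
    feat_total, feat_active = {}, {}
    for fp, label in zip(fingerprints, labels):
        for feature in set(fp):
            feat_total[feature] = feat_total.get(feature, 0) + 1
            feat_active[feature] = feat_active.get(feature, 0) + (1 if label else 0)
    # Laplacian correction pulls rarely seen features toward a weight of zero
    return {f: log((feat_active[f] + 1) / (feat_total[f] * base_rate + 1))
            for f in feat_total}

def score(weights, fp):
    """Sum the weights of the features present; unseen features contribute 0."""
    return sum(weights.get(feature, 0.0) for feature in set(fp))

# Toy training set (hypothetical feature codes; 1 = desirable, 0 = undesirable)
weights = train_laplacian_bayes([{1, 2}, {1, 3}, {4}, {4, 5}], [1, 1, 0, 0])
```

A higher summed score means the molecule shares more features with compounds labeled desirable in training.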

Rebuilt the n307 model in CDD Models

3 fold cross validation

ROC = 0.69

http://goo.gl/PVkQeo
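For readers unfamiliar with the ROC value quoted above, this sketch computes the area under the ROC curve as the fraction of (positive, negative) pairs that the model ranks correctly, with ties counted as half. The function name and toy scores are illustrative and unrelated to the n307 model.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking of positives vs negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Toy labels and scores (hypothetical)
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so 0.69 indicates a modest but real enrichment.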

Making the data more accessible as we are drowning in molecules

Axis: log database size (millions)

http://goo.gl/PVkQeo

Ligand efficiency higher in undesirable compounds

Bayesian model preferable in classifying desirable compounds vs other molecule quality metrics

Model could improve probe selection, score libraries, prior to more extensive due diligence

Probes could be scored by additional chemists depending on needs, e.g. bias to CNS or anticancer.
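Ligand efficiency, mentioned in the first point above, is commonly defined as binding free energy per heavy atom. The sketch below assumes that common definition with potency expressed as pIC50 (or pKi); the slides do not say which form was used, and the function name and example values are illustrative.

```python
from math import log

GAS_CONSTANT = 0.0019872  # kcal/(mol*K)

def ligand_efficiency(p_activity, heavy_atoms, temperature_k=298.15):
    """Approximate ligand efficiency in kcal/mol per heavy atom.

    Uses -dG = ln(10) * R * T * pActivity, roughly 1.37 * pActivity at 25 C.
    """
    if heavy_atoms <= 0:
        raise ValueError("heavy_atoms must be positive")
    return log(10) * GAS_CONSTANT * temperature_k * p_activity / heavy_atoms

# Hypothetical compound: pIC50 of 7.0 with 25 heavy atoms
le = ligand_efficiency(7.0, 25)
```

Because the heavy atom count is in the denominator, small potent compounds score highest, which may help explain why the metric can favor compounds the expert scored as undesirable.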

CNS

Anticancer

NIH probes

Complexities in finding the NIH MLP probes in PubChem

Identifier and structure searches in CAS SciFinder™ reveal an extreme disclosure

The parallel worlds of commercial and public database disclosure do not completely intersect

Integration and intersections of databases and the need for bioassay ontology adoption

Public Commercial

Need more collaboration or openness in terms of availability of chemistry and biology data.

Increased communication between the various databases that are both public and proprietary

Major hurdles exist to prevent this from happening - too much commercial value to proprietary databases

Clearly CAS and the other commercial vendors have to take notice

We acknowledge that the Bayesian model software within CDD was developed with support from Award Number 9R44TR000942-02 “Biocomputation across distributed private datasets to enhance drug discovery” from the NCATS.

SE gratefully acknowledges Biovia (formerly Accelrys) for providing Discovery Studio.

SE thanks Jeremy Yang for the link to BadApple

Litterman NK, Lipinski CA, Bunin BA, Ekins S. Computational Prediction and Validation of an Expert's Evaluation of Chemical Probes. J Chem Inf Model. 2014 Oct 27;54(10):2996-3004. doi: 10.1021/ci500445u. Epub 2014 Oct 7.

Lipinski CA, Litterman NK, Southan C, Williams AJ, Clark AM, Ekins S. The parallel worlds of public and commercial bioactive chemistry data. J Med Chem. Epub 2014 Nov 21.
