Predicting Pharmacology

WP van Hoorn, Feb 2006. Predicting Pharmacology. Willem van Hoorn, Pfizer Global Research & Development, Sandwich, UK. Pipeline Pilot UGM, San Diego, Mar 2006.

description

A unified database of structure/activity data is presented. This database was used to derive activity / classification models with Bayesian statistics and Linear Discriminant Analysis. This work has been published: http://www.nature.com/nbt/journal/v24/n7/abs/nbt1228.html

Transcript of Predicting Pharmacology

Page 1: Predicting Pharmacology

WP van Hoorn, Feb 2006

Predicting Pharmacology

Willem van Hoorn

Pfizer Global Research & Development

Sandwich

UK

Pipeline Pilot UGM, San Diego, Mar 2006

Page 2: Predicting Pharmacology

Willem van Hoorn

Standing on the Shoulders of Giants

Gaia Paolini

Richard Shapland

Andrew Hopkins

Jonathan Mason

Page 3: Predicting Pharmacology

The Work of Giants

4.8 M structures

275k active compounds

600k activities (IC50, etc)

3k targets

800 human targets

Inpharmatica StARLITe

Cerep Bioprint

Thomson IDDB

Pfizer in house

• Oracle / DayCart cartridge
• Structures stored as SMILES
• Pipeline Pilot:
• Canonical tautomers, salt stripping, etc.
• Access: ODBC components + web service
• Pfizer compound structure retrieval

Unified DB

Page 4: Predicting Pharmacology

Why Giants Are Required

Page 5: Predicting Pharmacology

Unified Database as Starting Point

Unified DB
• Bayesian Learn Molecular Categories: predicting activities
• Linear Discriminant Analysis (LDA): predicting gene families
• Polypharmacology interaction network

Page 6: Predicting Pharmacology

Polypharmacology Network From Binding Data

[Figure: polypharmacology network. Node: target. Edge: compound.]

Gene classes labelled in the figure:
• Metalloproteases
• Cysteine proteases
• Serine proteases
• Aspartyl proteases
• Phosphodiesterases
• Aminergic GPCRs
• Peptide GPCRs
• GPCRs (others: classes A, B & C)
• Enzymes (hydrolases, transferases, oxidoreductases & others)
• Ion Channels
• Nuclear hormone receptors
• Kinases
• Miscellaneous
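The network on this slide (targets as nodes, compounds as edges) can be sketched in a few lines. This is a minimal illustration that assumes two targets are joined whenever at least one compound is active against both. The compound and target names are invented placeholders, not data from the unified database.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (compound, target) activity pairs, for illustration only.
activities = [
    ("cpd1", "targetA"), ("cpd1", "targetB"),
    ("cpd2", "targetA"), ("cpd2", "targetB"), ("cpd2", "targetC"),
    ("cpd3", "targetD"),
]

def build_network(pairs):
    """Return {(target_a, target_b): set of compounds active on both}."""
    targets_by_compound = defaultdict(set)
    for compound, target in pairs:
        targets_by_compound[compound].add(target)
    edges = defaultdict(set)
    for compound, targets in targets_by_compound.items():
        # Every pair of targets hit by the same compound gets an edge.
        for a, b in combinations(sorted(targets), 2):
            edges[(a, b)].add(compound)
    return dict(edges)

network = build_network(activities)
for (a, b), shared in sorted(network.items()):
    print(f"{a} -- {b}: {len(shared)} shared compound(s)")
```

A target with only selective compounds (targetD here) stays unconnected, while the number of shared compounds could serve as an edge weight.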

Page 7: Predicting Pharmacology

Deriving Multi-Category Bayesian Model

Unified DB
• Filter: 238k actives (≤ 10 µM), human target, Mw < 1000, pass reactivity filter, ≥ 10 actives / target
• Descriptor: FCFP_6
• Random split: 90% / 214k for training, 10% / 23,792 compounds for testing
• Test set: 55,781 activities
• 698 models
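The slides do not spell out the scoring formula behind the Bayesian models. The sketch below assumes the Laplacian-corrected naive Bayes estimator that is commonly described for fingerprint-based models of this kind: each feature gets a weight log((A_f + 1) / (N_f * P + 1)), where N_f samples contain the feature, A_f of them are active, and P is the overall fraction of actives. The integer feature IDs stand in for FCFP_6 bits and are hypothetical.

```python
import math
from collections import Counter

def train(samples):
    """samples: list of (feature_set, is_active). Returns {feature: weight}."""
    n_total = len(samples)
    n_active = sum(1 for _, active in samples if active)
    base_rate = n_active / n_total
    seen = Counter()         # N_f: samples containing feature f
    seen_active = Counter()  # A_f: active samples containing feature f
    for features, active in samples:
        for f in features:
            seen[f] += 1
            if active:
                seen_active[f] += 1
    # Laplacian-corrected weight (an assumption, see the lead-in).
    return {f: math.log((seen_active[f] + 1) / (seen[f] * base_rate + 1))
            for f in seen}

def score(weights, features):
    """Sum the weights of the known features; unseen features contribute 0."""
    return sum(weights.get(f, 0.0) for f in features)

# Toy training data with made-up feature IDs.
training = [
    ({1, 2, 3}, True),
    ({1, 2, 4}, True),
    ({3, 5, 6}, False),
    ({5, 6, 7}, False),
    ({4, 6, 7}, False),
    ({2, 5, 7}, False),
]
weights = train(training)
like_actives = score(weights, {1, 2, 8})
like_inactives = score(weights, {5, 6, 7})
print(like_actives, like_inactives)
```

In a multi-category setting, one such model would be trained per target, with the actives of that target as the positive class, which is consistent with the 698 models reported here.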

Page 8: Predicting Pharmacology

Assessing the Predictions of the Random Test Set

Large number of predictions:
• 23,792 * 698 ~ 16.6M
• 55,781 activities, rest unknown presumed inactive
• Interpretation of Bayesian score?
• Score ≥ cut-off: active, rest inactive
• # predicted actives = F(cut-off)

Comparison with random:
• For each cut-off: calculate number of predicted actives
• Generate exactly same number of random predicted actives
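The assessment procedure above can be sketched as follows. This is an illustration on synthetic scores, not the original Pipeline Pilot protocol: pairs scoring at or above the cut-off are called active, unmeasured pairs are presumed inactive, and the random baseline draws exactly as many compound/target pairs as the model predicted. All function and variable names are hypothetical.

```python
import random

def evaluate(scores, actives, cutoff, seed=0):
    """scores: {(compound, target): score}; actives: set of known active pairs."""
    predicted = {pair for pair, s in scores.items() if s >= cutoff}
    tp = len(predicted & actives)
    fp = len(predicted - actives)   # includes unmeasured pairs presumed inactive
    fn = len(actives - predicted)
    # Random baseline: same number of predictions, drawn uniformly.
    rng = random.Random(seed)
    random_pred = set(rng.sample(sorted(scores), len(predicted)))
    random_tp = len(random_pred & actives)
    return {"predicted": len(predicted), "tp": tp, "fp": fp,
            "fn": fn, "random_tp": random_tp}

# Synthetic example: 50 compounds x 20 targets, 30 known actives.
rng = random.Random(1)
pairs = [(c, t) for c in range(50) for t in range(20)]
actives = set(rng.sample(pairs, 30))
scores = {p: rng.gauss(60 if p in actives else 0, 20) for p in pairs}

results = {cutoff: evaluate(scores, actives, cutoff) for cutoff in (0, 25, 50)}
for cutoff, r in results.items():
    print(cutoff, r)
```

Raising the cut-off shrinks the number of predicted actives, so the random baseline has to be regenerated for every cut-off.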

Page 9: Predicting Pharmacology

Assessing the Predictions of the Random Test Set

Bayesian cut-off: 50

• 58,428 predictions / 17,210 compounds
• 16,281 compounds ≥1 correct prediction
• 31,600 true positives (random: 292)
• Enrichment ~ 100 fold
• 26,828 false positives (random: 55,489)
• 24,181 false negatives
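The counts on this slide can be turned into the usual summary ratios. The snippet below is plain arithmetic on the reported numbers. The terms precision and recall are added here for illustration and do not appear on the slide.

```python
def summarize(tp, fp, fn, random_tp):
    """Summary ratios from confusion counts and a random-baseline hit count."""
    return {
        "predictions": tp + fp,
        "known_actives": tp + fn,
        "precision": tp / (tp + fp),   # fraction of predictions that are correct
        "recall": tp / (tp + fn),      # fraction of known actives recovered
        "enrichment": tp / random_tp,  # true positives relative to random
    }

random_test_set = summarize(tp=31_600, fp=26_828, fn=24_181, random_tp=292)
for name, value in random_test_set.items():
    print(f"{name}: {value:.3f}" if isinstance(value, float) else f"{name}: {value}")
```

The totals reproduce the 58,428 predictions and the 55,781 test-set activities, and the enrichment of roughly 108 matches the "~ 100 fold" quoted above.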

Page 10: Predicting Pharmacology

Predicted Polypharmacology Network At Bayesian Cut-off 50

[Figure: predicted polypharmacology network. Legend: true positive prediction, false positive prediction.]

Gene classes labelled in the figure:
• Nuclear hormone receptors
• Ion Channels
• Phosphodiesterases
• Aminergic GPCRs
• Peptide GPCRs
• GPCRs (others)
• Enzymes (others)

Page 11: Predicting Pharmacology

Predicted Polypharmacology Network At Bayesian Cut-off 50

• At confidence level 50, most predictions are intra gene class
• Quite a few false positive connections coincide with true positives
• Exceptions: Ion Channels, Enzymes-others
• Although the prediction is wrong, the connection is right?
• Or the prediction is right and the connection is a false negative (not measured?)
• Most interesting part of predicted connections to test
• Compare to Peter Willett’s work in similarity searches: (next) nearest neighbours of inactive nearest neighbours are equally likely to be active as the nearest neighbours themselves: J. Med. Chem. 2005, 48, 7049

Page 12: Predicting Pharmacology

A More Challenging Test Set: Cerep Bioprint

Unified DB
• Filter: 238k actives (≤ 10 µM), human target, Mw < 1000, pass reactivity filter, ≥ 10 actives / target
• Descriptor: FCFP_6
• Training set: 237k
• 694 models
• Test set: Bioprint, 997 compounds, 316 targets

Page 13: Predicting Pharmacology

A More Challenging Test Set: Cerep Bioprint

Bayesian cut-off: 50

• 720 predictions / 291 compounds
• 210 compounds ≥1 correct prediction
• 433 true positives (random: 17)
• Enrichment ~ 25 fold
• 287 false positives (random: 55,489)
• 12,281 false negatives

Page 14: Predicting Pharmacology

Another Look At The Same Data

Bayesian cut-off: 0

• 36,222 predictions
• 6,121 true positives
• 30,101 false positives
• 6,593 false negatives
• 48% of actives in 11% of data
• Plus 378 extra predicted targets
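The "48% of actives in 11% of data" statement can be checked from the numbers on the Bioprint slides. The sketch assumes that "data" means the full 997 x 316 compound/target matrix and that the 378 extra targets are the models without a Bioprint counterpart (694 minus 316). Both readings are inferences, not statements from the talk.

```python
compounds, measured_targets, models = 997, 316, 694
tp, fp, fn = 6_121, 30_101, 6_593

matrix_size = compounds * measured_targets   # all compound/target pairs
predictions = tp + fp                        # pairs flagged at cut-off 0
known_actives = tp + fn                      # actives in Bioprint

fraction_actives_found = tp / known_actives
fraction_of_matrix_flagged = predictions / matrix_size
extra_targets = models - measured_targets

print(f"actives found: {fraction_actives_found:.1%}")
print(f"matrix flagged: {fraction_of_matrix_flagged:.1%}")
print(f"extra predicted targets: {extra_targets}")
```

Under these assumptions, testing about one ninth of the matrix would recover nearly half of the known actives, which is the trade-off the slide points to.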

Page 15: Predicting Pharmacology

A More Challenging Test Set: Cerep Bioprint

• Bioprint harder to predict than 10% random test set
• Data can be interpreted depending on need
• Few high confidence predictions, appropriate for triaging HTS hits
• Many low confidence predictions, appropriate for risk assessment of lead

Page 16: Predicting Pharmacology

Linear Discriminant Analysis

[Figure: banknote with the measured dimensions: length, height, left rim, bottom rim, diagonal]

Source: H. Lohninger, Teach/Me Data Analysis, http://www.vias.org/tmdatanaleng

NOTE    Length  Left   Right  Bottom  Top    Diagonal  Genuine
BN1     214.8   131.0  131.1  9.000   9.700  141.0     true
BN2     214.6   129.7  129.7  8.100   9.500  141.7     true
BN3     214.8   129.7  129.7  8.700   9.600  142.2     true
BN4     214.8   129.7  129.6  7.500   10.40  142.0     true
BN5     215.0   129.6  129.7  10.40   7.700  141.8     true
BN6     215.7   130.8  130.5  9.000   10.10  141.4     true
BN7     215.5   129.5  129.7  7.900   9.600  141.6     true
BN8     214.5   129.6  129.2  7.200   10.70  141.7     true
BN9     214.9   129.4  129.7  8.200   11.00  141.9     true
BN10    215.2   130.4  130.3  9.200   10.00  140.7     true
….      ….      ….     ….     ….      ….     ….        ….
BN195   214.9   130.3  130.5  11.60   10.60  139.8     false
BN196   215.0   130.4  130.3  9.900   12.10  139.6     false
BN197   215.1   130.3  129.9  10.30   11.50  139.7     false
BN198   214.8   130.3  130.4  10.60   11.10  140.0     false
BN199   214.7   130.7  130.8  11.20   11.20  139.4     false
BN200   214.3   129.9  129.9  10.20   11.50  139.6     false

• Similar to PCA, which tries to represent classes
• LDA tries to discover what distinguishes classes
• Compare letters: O and Q
• PCA focuses on circle, LDA on tail
• Web example: distinguish between genuine and forged banknotes
• Training set: 200 banknotes, 100 genuine / 100 forgeries

Page 17: Predicting Pharmacology

Predicting Forgeries with LDA and Bayesian

LDA

NOTE  Length  Left   Right  Bottom  Top    Diagonal  BankNotes  LD1
BN1   215.1   130.0  129.8  9.100   10.20  141.5     true       2.501
BN2   214.7   130.7  130.8  11.20   11.20  139.4     false      -4.561
BN3   214.3   129.9  129.9  10.20   11.50  139.6     false      -3.390
BN4   214.7   130.0  129.4  7.800   10.00  141.2     true       4.060

Bayesian

NOTE  Length  Left   Right  Bottom  Top    Diagonal  BankNotesBayes
BN1   215.1   130.0  129.8  9.100   10.20  141.5     1.992
BN2   214.7   130.7  130.8  11.20   11.20  139.4     -6.611
BN3   214.3   129.9  129.9  10.20   11.50  139.6     -6.341
BN4   214.7   130.0  129.4  7.800   10.00  141.2     1.771
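To make the LDA step concrete, here is a minimal two-class Fisher discriminant in plain Python. It is a sketch under strong simplifications: it uses only the Bottom and Diagonal columns, and only the 16 training rows printed on the previous slide rather than all 200 banknotes. The resulting scores therefore differ from the LD1 values shown here, although the predicted classes agree for the four test notes. All function names are illustrative.

```python
# (Bottom, Diagonal) for the training rows listed on the LDA slide.
genuine = [(9.0, 141.0), (8.1, 141.7), (8.7, 142.2), (7.5, 142.0), (10.4, 141.8),
           (9.0, 141.4), (7.9, 141.6), (7.2, 141.7), (8.2, 141.9), (9.2, 140.7)]
forged = [(11.6, 139.8), (9.9, 139.6), (10.3, 139.7), (10.6, 140.0),
          (11.2, 139.4), (10.2, 139.6)]

def mean(rows):
    n = len(rows)
    return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)

def scatter(rows, m):
    """Within-class scatter of one class as (sxx, sxy, syy)."""
    sxx = sum((r[0] - m[0]) ** 2 for r in rows)
    sxy = sum((r[0] - m[0]) * (r[1] - m[1]) for r in rows)
    syy = sum((r[1] - m[1]) ** 2 for r in rows)
    return sxx, sxy, syy

def fit_lda(class_a, class_b):
    """Fisher direction w = Sw^-1 (mean_a - mean_b) and a midpoint threshold."""
    ma, mb = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, ma), scatter(class_b, mb)
    sxx, sxy, syy = (sa[i] + sb[i] for i in range(3))
    det = sxx * syy - sxy * sxy
    dx, dy = ma[0] - mb[0], ma[1] - mb[1]
    # Closed-form inverse of the 2x2 pooled scatter matrix.
    w = ((syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det)
    midpoint = ((ma[0] + mb[0]) / 2, (ma[1] + mb[1]) / 2)
    threshold = w[0] * midpoint[0] + w[1] * midpoint[1]
    return w, threshold

def discriminant(w, threshold, row):
    """Positive values fall on the side of class_a (genuine)."""
    return w[0] * row[0] + w[1] * row[1] - threshold

w, threshold = fit_lda(genuine, forged)
test_notes = {"BN1": (9.1, 141.5), "BN2": (11.2, 139.4),
              "BN3": (10.2, 139.6), "BN4": (7.8, 141.2)}
predicted = {}
for name, row in test_notes.items():
    d = discriminant(w, threshold, row)
    predicted[name] = d > 0
    print(name, round(d, 3), "genuine" if d > 0 else "forged")
```

The discriminant is a single weighted sum of the measurements, so scoring a new note costs almost nothing once the weights are fitted.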

Page 18: Predicting Pharmacology

Predicting Gene Class by Physical Properties

Compounds binding to different gene classes possess different physical property distributions:

[Figure: Mw and clogP distributions]

Can this be used to predict gene class from physical properties alone?

How does LDA compare to Bayesian?

Page 19: Predicting Pharmacology

Predicting Gene Class by Physical Properties

Unified DB
• Filter: 148k actives (≤ 10 µM), human target, Mw < 1000, pass reactivity filter, binding to single target class only

20 Gene Classes:
• Aminergic GPCRs
• Aspartyl Proteases
• Cysteine Proteases
• Enzymes- others
• GPCRs Class A- others
• GPCRs Class B
• GPCRs Class C
• Hydrolases
• Ion Channels- Ligand_Gated
• Ion Channels- others
• Kinases- others
• Metalloproteases
• Nuclear hormone receptors
• Others
• Oxidoreductases
• PDEs
• Peptide GPCRs
• Protein Kinases
• Serine Proteases
• Transferases

Page 20: Predicting Pharmacology

Predicting Gene Class by Physical Properties

10 Descriptors:
• Molecular_Weight
• Num_H_Acceptors
• Num_H_Donors
• Num_RotatableBonds
• Molecular_PolarSurfaceArea
• No_IonCenters
• Molecular_Solubility
• Molecular_SurfaceArea
• ClogP
• *Andrews*

Data split: 147,534 compounds in total, 118,118 for training, 29,416 for testing

Page 21: Predicting Pharmacology

Predicting Gene Class by Physical Properties

Each model column gives the number of test compounds assigned to the class, with the number of correct assignments in parentheses.

Target class                 Experiment  Random (correct)  Bayes (correct)  LDA (correct)
Aminergic GPCRs              3228        1463 (153)        2299 (902)       5864 (2042)
Aspartyl Proteases           728         1479 (29)         1670 (614)       1180 (613)
Cysteine Proteases           252         1470 (13)         1464 (73)        75 (28)
Enzymes- others              2574        1451 (135)        1 (0)            1969 (321)
GPCRs Class A- others        2647        1524 (117)        962 (309)        1268 (366)
GPCRs Class B                226         1477 (14)         2340 (115)       1 (0)
GPCRs Class C                339         1438 (29)         3346 (146)       0 (0)
Hydrolases                   286         1461 (15)         350 (42)         0 (0)
Ion Channels- Ligand_Gated   764         1448 (52)         2109 (280)       47 (0)
Ion Channels- others         594         1441 (29)         964 (104)        152 (59)
Kinases- others              198         1430 (11)         1545 (100)       0 (0)
Metalloproteases             849         1515 (47)         1626 (293)       279 (74)
Nuclear hormone receptors    1238        1459 (53)         2083 (345)       482 (163)
Others                       3336        1465 (167)        90 (47)          2638 (499)
Oxidoreductases              1385        1492 (54)         1437 (329)       888 (241)
PDEs                         1178        1468 (56)         2176 (392)       791 (248)
Peptide GPCRs                5027        1461 (236)        2809 (1135)      8123 (2811)
Protein Kinases              2927        1488 (148)        341 (147)        5309 (1423)
Serine Proteases             913         1526 (53)         792 (133)        349 (137)
Transferases                 727         1460 (36)         1012 (125)       1 (0)
Total                        29416       29416 (1447)      29416 (5631)     29416 (9025)

Page 22: Predicting Pharmacology

Predicting Gene Class by Physical Properties

• Enrichment over random: LDA ~ 6 fold, Bayes ~ 4 fold
• Bayesian: more equal spread
• LDA: some baskets contain too many eggs?
• Some of the misclassifications might be true: many missing values
• Unbiased and fast method to (pre)screen large compound collection
• Compare with other unbiased methods: docking, pharmacophore search
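The enrichment figures and the "too many eggs" remark can be reproduced from the Page 21 table. The per-class prediction counts below are read from that table. The top-three share is an illustrative measure chosen here, not one used in the talk.

```python
correct = {"LDA": 9_025, "Bayes": 5_631, "Random": 1_447}
enrichment = {m: correct[m] / correct["Random"] for m in ("LDA", "Bayes")}

# Number of test compounds assigned to each of the 20 gene classes (Page 21).
lda_predicted = [5864, 1180, 75, 1969, 1268, 1, 0, 0, 47, 152,
                 0, 279, 482, 2638, 888, 791, 8123, 5309, 349, 1]
bayes_predicted = [2299, 1670, 1464, 1, 962, 2340, 3346, 350, 2109, 964,
                   1545, 1626, 2083, 90, 1437, 2176, 2809, 341, 792, 1012]

def top_share(counts, k=3):
    """Fraction of all predictions that land in the k most-predicted classes."""
    return sum(sorted(counts, reverse=True)[:k]) / sum(counts)

print({m: round(v, 2) for m, v in enrichment.items()})
print(f"LDA top-3 share:   {top_share(lda_predicted):.1%}")
print(f"Bayes top-3 share: {top_share(bayes_predicted):.1%}")
```

LDA places roughly two thirds of all test compounds in its three largest classes, against under a third for the Bayesian model, which is one way to quantify the more equal spread noted above.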

Page 23: Predicting Pharmacology

Conclusions

• Data from heterogeneous sources can be combined in one knowledge base
• Predictive Bayesian models can be derived from it
• Models are adaptive, regenerate to incorporate latest experimental results
• Models are not a replacement for experiment
• Models can lead to substantially lower screening investment
• Drug design compared to supermarket stock inventory: just in time delivery vs. just enough screening
• Don’t discount simple molecular properties
