Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types...

52
Big Data in Chemistry Institute of Structural Biology (STB), Helmholtz Zentrum München (HMGU) Dr. Igor V. Tetko Neuherberg, 18 October 2016

Transcript of Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types...

Page 1: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Big Data in Chemistry

Institute of Structural Biology (STB), Helmholtz Zentrum München (HMGU)

Dr.IgorV.Tetko

Neuherberg,18October2016

Page 2: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Outline

•  Sources•  ExampleofBigData•  Dataqualityandcomplexity•  Annota?onoflargevirtualsets•  Deeplearning•  Securedatasharing•  Outlook

Page 3: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Big Data Sources

DowereallyhaveBigDatainchemistry?Whatkindoflargedatadowehave?

Page 4: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Big Data definition

Bigdataisatermfordatasetsthataresolargeorcomplexthattradi?onaldataprocessingapplica?ons

areinadequate(Wikipedia)

Page 5: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Large Chemical Database

100,000

1,000,000

10,000,000

100,000,000

1,000,000,000

Compounds

Exp.Facts

?

Page 6: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Data Types

Database Main data types

ChEMBL v. 211 Data mined from literature and PubChemHTS assays

BindingDB2 Experimental protein-small molecule interaction data

PubChem3 Bioactivity data from HTS assays

Reaxys4 Literature mined property, activity and reaction data

SciFinder (CAS)5 Experimental properties, 13C and 1H NMR spectra, reaction data

GOSTAR6 Target-linked data from patents and articles

AZ IBIS7 AZ in-house SAR data points

OCHEM8 Mainly ADMET data collected from literature

1)PapadatosG,etal.JComputAidedMolDes2015;29(9)885-96.2)GilsonMK,etal.NucleicAcidsRes2016;44(D1):D1045-53.3)KimS,etal.NucleicAcidsRes2016;44(D1):D1202-13.4)h^p://www.elsevier.com/solu?ons/reaxys5)h^p://www.cas.org/products/scifinder6)h^p://www.gostardb.com7)MuresanSetal.DrugDiscovToday2011;16(23-24):1019-30.8)SushkoI,etal..JComputAidedMolDes2011;25(6):533-54.

Page 7: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Big Data sizesBigdataisatermfordatasetsthataresolargeorcomplexthattradi?onaldataprocessingapplica?onsareinadequate(Wikipedia)

CCBY-SA3.0,h^ps://commons.wikimedia.org/w/index.php?curid=29452425

1exabyte:1018

Page 8: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Large Chemical Database

100,0001,000,000

10,000,000100,000,000

1,000,000,00010,000,000,000100,000,000,000

1,000,000,000,00010,000,000,000,000

100,000,000,000,0001,000,000,000,000,000

10,000,000,000,000,000100,000,000,000,000,000

1,000,000,000,000,000,000

Compounds

Exp.Facts

Page 9: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Big Data are relative to a field

•  Methodstoanalyzesuchdatadonotexist•  Wemaynotsufficienttechnicalresources(speed,memory)to

usetheexis?ngmethods•  Wemaynothaveknowledgetousetheexis?ngmethodsThustheBigDatacanappeardueto:Physicalchallenges(hardware)Knowledgechallenges(informa?cs,sogware)

Page 10: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Example of Big Data

Whichdataarereallybigones?

Page 11: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

What data sizes are “big” ones?

“Generalmel?ngpointpredic?onbasedonadiversecompounddatasetandar?ficialneuralnetworks”Karthikeyanetal.J.Chem.Inf.Model.2005,45(3),681-90.N=4173

à  Largedataset~50kà  Bigdataset~250k

Page 12: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Melting Point Datasets

•  Bergström277•  Bradley2886•  OCHEM22404•  Enamine21883

data

Bergström

Bradley

OCHEM

Enamine

TetkoetalJ.Chem.Inf.Model.2014,22;54(12):3320-9.

Page 13: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

275k Melting Point Datasets

•  Bergström277•  Bradley2886•  OCHEM22404•  Enamine21883•  PATENTS228079

data

Bergström Bradley OCHEM Enamine Patents

TetkoetalJ.Chemoinforma2cs,2016,8,2.

COMBINED:OCHEM+Enamine+Bradley+Bergström

Page 14: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Extraction of MP information from patents

•  [0835] To a solution of 2-amino-4,6-dimethoxybenzamide (0.195 g, 0.99 mmol) and 5-(2-(tert-butyldimethylsilyloxy)ethoxy)-6-phenylpicolinaldehyde (0.355 g, 0.99 mmol) in N,N-dimethyl acetamide (10 ml), was added NaHSO3 (0.264 g, 1.49 mmol) and p-toluenesulfonic acid monohydrate (0.038 g, 0.198 mmol). The reaction mixture was heated at 120° C. for 16 h. After that time the reaction was cooled to rt and the solvent was removed under reduced pressure. The reaction mixture was then diluted with water (150 mL) and neutralized with NaHCO3. The precipitated solids were collected by filtration, washed with water and dried to give 2-(5-(2-(tert-butyldimethylsilyloxy)ethoxy)-6-phenylpyridin-2-yl)-5,7-dimethoxyquinazolin-4(3H)-one (0.500 g, 94%) as an off-white solid: 1H NMR (400 MHz, DMSO-d6) δ 11.08 (s, 1H), 8.35 (d, J=8.98 Hz, 1H), 8.21 (d, J=2.34 Hz, 2H), 7.82 (d, J=8.59 Hz, 1H), 7.44-7.52 (m, 3H), 6.81 (d, J=2.34 Hz, 1H), 6.58 (d, J=2.34 Hz, 1H), 4.24-4.32 (m, 2H), 3.94-4.00 (m, 2H), 3.92 (s, 3H), 3.86 (s, 3H), 0.85 (s, 9H), 0.08 (s, 6H); ESI MS m/z 534 [M+H]+.

•  http://www.google.com/patents/US20140140956

Page 15: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Extracting of melting points from patents

Workflow

NextMoveLtd,UK

Page 16: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Extraction of MP information from patents

Page 17: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Modeling of MP data

Package name

Type of descriptors

Number of descriptors

Matrix size, billions

Non zero values, millions

Sparseness

Functional Groups integer 595 0.18 3.1 33

QNPR integer 1502 0.45 6.3 49

MolPrint binary 688634 205 8.1 7200

Estate count float 631 0.19 10 14

Inductive float 54 0.02 11 1

ECFP4 binary 1024 0.31 12 25

Isida integer 5886 1.75 18 37

ChemAxon float 498 0.15 23 1.5

GSFrag integer 1138 0.34 24 5.7

CDK float 239 0.07 27 2

Adriana float 200 0.06 32 1.3

Mera, Mersy float 571 0.17 61 1.1

Dragon float 1647 0.49 183 1.5

Page 18: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Large à Big

•  NeuralNetworkswastooslow(ensembletraining!)àSVMwasused

•  Supportofparallelcalcula?ons(48core)•  Supportofgridanalysis(>1000CPUs)

•  Storageoffulldatamatrix->sparsedatamatrix

Page 19: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Prediction errors for Bergström drug like compounds using models developed with different training sets

0

10

20

30

40

50

60

277(Bergström) 4k(Karthikeyan) 50k(Literature) 275k(Patents)

°C

trainingsetsize

Page 20: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Prediction of Huuskonen set using ALOGPS logP and MP based on 50k measurements

logS=0.5–0.01(MP-25)–logKow

Page 21: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Prediction of Huuskonen set using ALOGPS logP and MP based on 230k measurements

logS=0.5–0.01(MP-25)–logKow

Page 22: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Big Data Quality and Complexity

Whyisitveryimportant?Howdomainspecificanalysiscouldhelp?

Page 23: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Suscep?bilityofCPM-basedHTStoscreeningcompound-basedinterference.(A)Assayschema?cfortheCPM-basedHTSusedinthisstudy.TheassaymeasurestheHATac?vityoftheR^109–Vps75complex,whichcatalyzesthetransferofanacetylmoietyfromacetyl-CoAtospecificlysineresiduesontheAsf1–dH3–H4substratecomplextoproduceacetylatedhistoneresiduesandcoenzymeA(CoA).Addi?onofthethiol-scavengingprobeCPMleadstoahighlyfluorescentadductbyreac?ngwiththeCoAbyproduct,whichisusedtoquan?fyHATac?vityviafluorescenceintensitymeasurement.(B)Representa?veassayinterferencechemotypesiden?fiedduringpost-HTStriage.

DahlinetalJ.Med.Chem.2015,58,2091-2113.

99.8%FHs!

Page 24: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Promiscuous compounds filters

Page 25: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Promiscuous compounds filters

Page 26: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Pan Assay INterference compoundS (PAINS) Filters

•  colorquenching•  singletoxygenquenching•  auto-fluorescence•  covalentbinding•  inherently“s?cking”

compounds•  disrupttheinterac?on

betweenthetagoftheproteinandbindingsiteofthedetec?onsystem

BaellandHolloway,J.Med.Chem.,2010,53:2719-40.

~500filtersbasedonN=93212compounds

Page 27: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Structural & Toxic Alerts at http://ochem.eu •  Screening of compounds against published toxicity alerts,

groups, frequent hitters •  Filter alerts by endpoints or publications •  Create or upload custom SMARTS rules

Sushkoetal,J.Chem.Inf.Model,2012,52(8):2310-6.

>500func?onalgroups>2.3kalertsintotal

Page 28: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Identification of AlphaScreen-HIS Frequent Hitters

For Peer Review

210x297mm (300 x 300 DPI)

Page 31 of 43

http://mc.manuscriptcentral.com/jbsc

Journal of Biomolecular Screening

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960

TrueHitsTM:StreptavidinDonorbeadBio?nylatedAcceptorbeads

SchorppetalJ.Biomol.Screen.2014,9,715-726.

Page 29: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Mode Of Action of AlphaScreen-HIS Frequent Hitters

SchorppetalJ.Biomol.Screen.2014,9,715-726.

Page 30: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Bio Assays Ontology relationships

Abeyruwan,U.etal“EvolvingBioAssayOntology(BAO):Modulariza?on,Integra?onandApplica?ons,”JournalofBiomedicalSeman?cs,vol.5,no.1:S5,2014.

Page 31: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Annotation of large chemical spaces

BigData,whichhavebeenalwaysinchemistry.

Page 32: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Virtual chemical spaces

1,000

10,000

100,000

1,000,000

10,000,000

100,000,000

1,000,000,000

10,000,000,000

100,000,000,000

1,000,000,000,000

Synthesizable~1024andtotalspaceis~1060

Page 33: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Virtual chemical spaces

1.00E+031.00E+041.00E+051.00E+061.00E+071.00E+081.00E+091.00E+101.00E+111.00E+121.00E+131.00E+141.00E+151.00E+161.00E+171.00E+18

GDB*N–allpossiblechemicalswith≤Natoms

Page 34: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Virtual chemical spaces

1.00E+03

1.00E+05

1.00E+07

1.00E+09

1.00E+11

1.00E+13

1.00E+15

1.00E+17

1.00E+19

1.00E+21

1.00E+23

Synthesizable~1024

Page 35: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Virtual chemical spaces

1.00E+031.00E+061.00E+091.00E+121.00E+151.00E+181.00E+211.00E+241.00E+271.00E+301.00E+331.00E+361.00E+391.00E+421.00E+451.00E+481.00E+511.00E+541.00E+571.00E+60

Synthesizable~1024andtotal“drug–like”spaceis~1060

Page 36: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Annotation of compounds

•  ALOGPS2.1*(predic?onoflogPandwatersolubilityofchemicalcompounds)

•  ~100,000moleculesperminute

•  Annota?onofGDB-17willtake~3yearsofcalcula?onsusingonecore

•  ~10minutesonLeibnizSupercompu?ngCentrewith241,000cores

*Tetko,I.V.J.Chem.Inf.Comput.Sci.2001,41,1407-1421.

Page 37: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

We can’t predict unpredictable!

N

O O

newmeasurement

N O

O OHN O

O

newseriestopredict

Page 38: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

New machine learning approaches

WhichmethodscanhelpuswithBigData?

Page 39: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

CourtesyofProf.J.Bajorath

Page 40: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Multi-task learning

0

0.1

0.2

0.3

0.4

0.5

0.6

Mea

n A

bsol

ute

Erro

rone several

Human data

FatBrainLiverKidneyMuscle

Problem:• predic?onof?ssue-airpar??oncoefficients• smalldatasets30-100molecules(human&ratdata)Results:simultaneouspredic?onofseveralproper?esincreasedtheaccuracyofmodels

Varnek,A.etalJ.Chem.Inf.Model.2009,49,133-44.

Page 41: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Renaissance of neural networks

Deeplearning–  Massiveneuralnetworkswiththousandsofneuronsandlayers–  Newlearningmethods(dropouttechnique)

Examplesoftheuseofdeeplearningtechnology:–  Recogni?onofChinesecharacterswithhumanaccuracy–  VictoryinGo-tournament–  Diagnos?csofbreastcancer

Baskin,I.I.;Winkler,D.;Tetko,I.V.Arenaissanceofneuralnetworksindrugdiscovery.Expertopinionondrugdiscovery2016,11(8):785-95.

Page 42: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

h^p://adsabs.harvard.edu/abs/2015arXiv150202072R

259datasets•  128PubChem•  17MUV•  102DUD-E•  12Tox21

Total~40Mdatapointsfor1.6Mcompounds

Descriptors:ECFP4RDKit

Page 43: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Multitask Networks Learning Results

•  Massivelymul?tasknetworksobtainpredic?veaccuraciessignificantlybe^erthansingle-taskmethods.

•  Thepredic?vepowerofmul?tasknetworksimprovesasaddi?onaltasksanddataareadded.

•  Thetotalamountofdataandthetotalnumberoftasksbothcontributesignificantlytomul?taskimprovement.

•  Mul?tasknetworksaffordlimitedtransferabilitytotasksnotinthetrainingset.

h^p://adsabs.harvard.edu/abs/2015arXiv150202072R

Page 44: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Multitask benefit from increasing tasks and data independently.

h^p://adsabs.harvard.edu/abs/2015arXiv150202072R

Page 45: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Secure Information Sharing

Howcanweshareinforma?onbutnotdata?Howcanweenablecoopera?onbetweenindustries?

Page 46: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Secure Sharing of information

•  CINF/COMPworkshopwasorganizedduringACSin2005byProf.Oprea•  Variousstructurerepresenta?on(descriptors)wereproposed•  Severalmethodsforsecuresharingwereintroduced

•  Butinthetheore?callimit*–  SMILESrepresenta?onofmolecules:CCC,CNCCC,c1ccccc1–  Zippingofstructuresrequires<1bitperatom–  Structurewith32atomsrequires<32bits–  Anydescriptorortheircombina?onwith>32bitscouldbeusedtodecodea

molecule(intheory)

*Tetko,I.V.;Abagyan,R.;Oprea,T.I.J.Comput.Aided.Mol.Des.2005,19,749-764.

Page 47: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Currently used technologies

“Honestbroker”–  Receivesdescriptors(orstructures)–  Developmodelsanddonotrevealtheunderlyingdata

Sharingrela?onshipsbetweenstructures–  MatchedMolecularPairs(changesinpropertyduetochangeofgroups)

Sharingdevelopedmodels–  Structuralalerts–  Computa?onalpredic?onmodels

Sharingreliablepredic?ons(surrogatedata)*

*Tetko,I.V.;Abagyan,R.;Oprea,T.I.J.Comput.Aided.Mol.Des.2005,19,749-764.

Page 48: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Multi-party secure computation

Page 49: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Secure summation

Page 50: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Conclusions

ExpectaOonsü  Improvedpredic?onofproper?es,and

ac?vi?esü  Improvedpoly-pharmacologyü  Searchofnewchemistry(topdown

explora?onanddenovodesign)ü  Predic?onofinvivoenpoints

Challengesü  Newmachinelearningapproaches(deep

learning)ü  Integra?onofdiversedataandapriory

knowledge(ontology,pathways,invitro,invivo,simula?onresults,differenterrors,etc.)

ü  Applicabilitydomainü  Securedatasharingü  Datavisualiza?onü  Denovodesign

Page 51: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Further reading

•  Tetko,I.V.;Engkvist,O.;Koch,U.;Reymond,J.L.;Chen,H.,BIGCHEM:ChallengesandOpportuni?esforBigDataAnalysisinChemistry.MolInform2016,35(11-12):615-621(OpenAccess).

•  Tetko,I.V.;Engkvist,O.;Chen,H.Does'BigData'existinmedicinalchemistry,andifso,howcanitbeharnessed?FutureMedChem.20168(15):1801-1806(OpenAccess).

Page 52: Big Data in Chemistry - Aboutbigchem.eu/sites/default/files/School1_Tetko.pdf · Data Types Database Main data types ChEMBL v. 211 Data mined from literature and PubChem HTS assays

Acknowledgements

Dr.Y.SushkoDr.S.NovotarskyiMr.R.KörnerMrs.E.SalminaDr.K.Hadian(TOX,HMGU)Dr.A.Williams(USA)Dr.D.Lowe(UK)

Dr.O.Engkvist(AZ)Dr.H.Chen(AZ)Dr.U.Koch(LDC)Prof.J.-L.Reymond(UniBern)Dr.I.Baskin(MSU)Dr.D.Winkler(CSIRO)AndBIGCHEMpartners