Linking Open Drug Data to Cheminformatics and Proteochemometrics


My talk at SWAT4LS 2009 in Amsterdam.


Linking Open Drug Data to Cheminformatics and Proteochemometrics

Egon Willighagen <http://chem-bla-ics.blogspot.com/>

Bioclipse & Proteochemometric Group (Prof. Wikberg)
Department of Pharmaceutical Biosciences

Uppsala University

2009-11-20

Knowledge...

Solanum lycopersicum...

We model our world, but ...
Life is not uni- or bivariate
Knowledge is not either
But we think of it as such
Information Loss!

2009-11-20 Bioclipse & Proteochemometric Group - 2 - Egon Willighagen | chem-bla-ics.blogspot.com

Names...

benzene

3-[4-[3-(1-methyl-7-oxo-3-propyl-4H-pyrazolo[4,3-d]pyrimidin-5-yl)-4-propoxyphenyl]sulfonylpiperazin-1-yl]propanoic acid

InChI=1S/C25H34N6O6S/c1-4-6-19-22-23(29(3)28-19)25(34)27-24(26-22)18-16-17(7-8-20(18)37-15-5-2)38(35,36)31-13-11-30(12-14-31)10-9-21(32)33/h7-8,16H,4-6,9-15H2,1-3H3,(H,32,33)(H,26,27,34)
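Of the three kinds of name on this slide, only the InChI is designed to be machine-readable. As a minimal sketch (not part of the talk), the element counts can be read from its formula layer with the Python standard library alone. The helper name `inchi_formula_counts` is hypothetical, and the parser assumes a single-component formula; a real application would use a toolkit such as the CDK.

```python
import re

# InChI string copied from the slide.
INCHI = (
    "InChI=1S/C25H34N6O6S/c1-4-6-19-22-23(29(3)28-19)25(34)27-24(26-22)"
    "18-16-17(7-8-20(18)37-15-5-2)38(35,36)31-13-11-30(12-14-31)10-9-21"
    "(32)33/h7-8,16H,4-6,9-15H2,1-3H3,(H,32,33)(H,26,27,34)"
)


def inchi_formula_counts(inchi: str) -> dict:
    """Return element counts from the formula layer of an InChI string.

    Hypothetical helper: handles single-component formulas only
    (no '.' separators and no leading multipliers).
    """
    layers = inchi.split("/")
    if len(layers) < 2 or not layers[0].startswith("InChI="):
        raise ValueError("not an InChI string")
    counts = {}
    # The formula layer is the second '/'-separated field, e.g. C25H34N6O6S.
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", layers[1]):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


print(inchi_formula_counts(INCHI))
# {'C': 25, 'H': 34, 'N': 6, 'O': 6, 'S': 1}
```

The connectivity (`/c`) and hydrogen (`/h`) layers carry the actual graph, which is what a cheminformatics library would parse.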


... Molecular reality...

1 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000


... and Numbers


Knowledge Representation: Information Loss


Data Analysis


Proteochemometrics


Main Theme

How do we navigate dimensionality space?
How do we include prior knowledge?
While minimizing information loss?
With optimal knowledge extraction?
And maximizing interpretability?
Without ending up in random correlation?
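The last question can be made concrete with a small experiment. The sketch below is not from the talk and uses purely random data: it correlates random "descriptors" with a random "response" and reports the best absolute Pearson correlation found among the first few descriptors versus among all of them. The function names and parameter values are illustrative assumptions.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def best_chance_correlation(n_samples=10, n_descriptors=500, n_few=5, seed=42):
    """Return the best |r| among the first n_few and among all descriptors.

    Both the response and every descriptor are random noise, so any
    correlation found here is a chance correlation by construction.
    """
    rng = random.Random(seed)
    response = [rng.gauss(0, 1) for _ in range(n_samples)]
    correlations = [
        abs(pearson([rng.gauss(0, 1) for _ in range(n_samples)], response))
        for _ in range(n_descriptors)
    ]
    return max(correlations[:n_few]), max(correlations)


best_few, best_many = best_chance_correlation()
print(f"best |r| among the first 5 random descriptors: {best_few:.2f}")
print(f"best |r| among all 500 random descriptors:     {best_many:.2f}")
```

Because the larger set contains the smaller one, the second value can never be lower than the first. With few samples and many descriptors it is typically much higher, which is why model validation matters in high-dimensional descriptor spaces.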


OpenMolecules RDF: dereferenceable URI

http://rdf.openmolecules.net/
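A minimal sketch of how such a URI could be built from an InChI. The pattern (base URL, `?`, then the InChI string unencoded) is inferred from the `owl:sameAs` URI shown on the "CDK as RDF" slide of this deck. The function name `openmolecules_uri` is hypothetical, and nothing here assumes the service still resolves; no network call is made.

```python
BASE = "http://rdf.openmolecules.net/"


def openmolecules_uri(inchi: str) -> str:
    """Build an OpenMolecules-style URI for an InChI (hypothetical helper).

    Assumption: the InChI is appended after '?' without percent-encoding,
    matching the one example URI that appears in the slides.
    """
    if not inchi.startswith("InChI="):
        raise ValueError("expected a string starting with 'InChI='")
    return BASE + "?" + inchi


# Methanol, as used later in the deck.
print(openmolecules_uri("InChI=1/CH4O/c1-2/h2H,1H3"))
# http://rdf.openmolecules.net/?InChI=1/CH4O/c1-2/h2H,1H3
```

Using the InChI itself as the key means any tool that can compute an InChI can also compute the URI, without a lookup table.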


OpenMolecules RDF: linked data

http://rdf.openmolecules.net/


The Chemistry Development Kit

A Family of Projects
  CDK-Taverna (chemoinformatics workflows)
  JChemPaint (semantic 2D editor)
  ChemoJava (GPL-ed extension)

Goals
  library of cheminformatics algorithms
  educational

Usage
  CDK: 100+ times cited in scientific literature
  Bioclipse, KNIME, Jumbo (CML), AMBIT, ...

C. Steinbeck et al., J. Chem. Inf. Comput. Sci., 2003
C. Steinbeck et al., Curr. Pharm. Design, 2006


Bioclipse

O. Spjuth et al., BMC Bioinformatics 2007, 8:59


Integration

Services
  databases: PubChem
  web services
  Google Spreadsheets
  MyExperiment.org: Bioclipse Scripting Language
  Twitter, ...
  journals, ...

Techniques
  SOAP, REST, XMPP, ...
  Resource Description Framework
  dedicated APIs


Bioclipse-RDF

local RDF storage
read/write RDF/XML, N3
run SPARQL queries (local and remote)
extract RDF from XHTML/RDFa

Thanx to Jena and Pellet.
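Bioclipse runs these queries from its own scripting environment on top of Jena. The Python sketch below is not from the talk; it only illustrates what a remote SPARQL query looks like on the wire, using the `query` parameter defined by the SPARQL protocol. The endpoint URL and the query are hypothetical, and the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; substitute a real SPARQL endpoint URL.
ENDPOINT = "http://example.org/sparql"

# Illustrative query: find resources titled "Methanol".
QUERY = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?molecule WHERE {
  ?molecule dc:title "Methanol" .
}
LIMIT 10
""".strip()


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) an HTTP GET request for a SPARQL query."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for SPARQL results in JSON; an endpoint may offer other formats.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would execute it against a live endpoint.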


Quote of the Day

"There are too many people doing data integration, this is a waste of a lot of smart people’s time"

@alanruttenberg at #swat4ls2009 via dullhunk - twitter


SPARQL end points

GNU FDL
  NMRShiftDB data (also available via Bio2RDF)

CC0
  ChemPedia
  Open Notebook Science Solubility


Names 2 Graphs 2 Numbers...


Disease 2 PDB


CDK as RDF

model1:atom1 a cdk:Atom ;
    cdk:hasFormalCharge "1" ;
    cdk:symbol "O" .

model1:atom2 a cdk:Atom ;
    cdk:symbol "C" .

model1:mol1 a cdk:Molecule ;
    dc:title "Methanol" ;
    owl:sameAs <http://rdf.openmolecules.net/?InChI=1/CH4O/c1-2/h2H,1H3> ;
    cdk:hasAtom model1:atom2 ,
        model1:atom1 ;
    cdk:hasBond model1:bond1 .
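To illustrate the step from graph to numbers, the triples on this slide can be held as plain tuples and matched by pattern. This is a sketch in Python rather than the Jena-based machinery the talk refers to; the `match` and `element_symbols` helpers are hypothetical, and `a` is written out as `rdf:type`.

```python
# The triples from the slide, as (subject, predicate, object) tuples.
TRIPLES = [
    ("model1:atom1", "rdf:type", "cdk:Atom"),
    ("model1:atom1", "cdk:hasFormalCharge", "1"),
    ("model1:atom1", "cdk:symbol", "O"),
    ("model1:atom2", "rdf:type", "cdk:Atom"),
    ("model1:atom2", "cdk:symbol", "C"),
    ("model1:mol1", "rdf:type", "cdk:Molecule"),
    ("model1:mol1", "dc:title", "Methanol"),
    ("model1:mol1", "owl:sameAs",
     "http://rdf.openmolecules.net/?InChI=1/CH4O/c1-2/h2H,1H3"),
    ("model1:mol1", "cdk:hasAtom", "model1:atom2"),
    ("model1:mol1", "cdk:hasAtom", "model1:atom1"),
    ("model1:mol1", "cdk:hasBond", "model1:bond1"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def element_symbols(triples, molecule):
    """Follow cdk:hasAtom, then cdk:symbol, to list a molecule's elements."""
    symbols = []
    for _, _, atom in match(triples, s=molecule, p="cdk:hasAtom"):
        for _, _, symbol in match(triples, s=atom, p="cdk:symbol"):
            symbols.append(symbol)
    return symbols


symbols = element_symbols(TRIPLES, "model1:mol1")
print(symbols)       # ['C', 'O']
print(len(symbols))  # 2, a (very simple) numerical descriptor
```

The two-hop lookup in `element_symbols` is what a SPARQL query with two triple patterns would express declaratively.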


Proteochemometrics


OWL for Descriptors

Used for model and data.


MyExperiment: Bioclipse Scripting Language


What does this bring us?

Platform to integrate the RDF with the computation world
Bioclipse as single point of access
Scripting, sharing of scripts with MyExperiment.org
Bridge the nominal with the numerical world


Where next?

Framework
  Triple generation on demand (XMPP, SADI, ...)
  Ontology alignments
  Semantic Mediawiki integration

Proteochemometrics
  Knowledge discovery
  Data set aggregation
  Automated model validation


The Details

http://www.citeulike.org/user/egonw/tag/papers

http://chem-bla-ics.blogspot.com

http://egonw.github.com

waveto:egon.willighagen@googlewave.com
