Linking Open Drug Data to Cheminformatics and Proteochemometrics
Egon Willighagen <http://chem-bla-ics.blogspot.com/>
Bioclipse & Proteochemometric Group (Prof. Wikberg)
Department of Pharmaceutical Biosciences
Uppsala University
2009-11-20
Knowledge...
Solanum lycopersicum...
We model our world, but ...
Life is not uni- or bivariate
Knowledge is not either
But we think of it as such
Information Loss!
2009-11-20 Bioclipse & Proteochemometric Group - 2 - Egon Willighagen | chem-bla-ics.blogspot.com
Names...
benzene
3-[4-[3-(1-methyl-7-oxo-3-propyl-4H-pyrazolo[4,3-d]pyrimidin-5-yl)-4-propoxyphenyl]sulfonylpiperazin-1-yl]propanoic acid
InChI=1S/C25H34N6O6S/c1-4-6-19-22-23(29(3)28-19)25(34)27-24(26-22)18-16-17(7-8-20(18)37-15-5-2)38(35,36)31-13-11-30(12-14-31)10-9-21(32)33/h7-8,16H,4-6,9-15H2,1-3H3,(H,32,33)(H,26,27,34)
... Molecular reality...
1 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000
... and Numbers
Knowledge Representation: Information Loss
Data Analysis
Proteochemometrics
Main Theme
How do we navigate dimensionality space?
How do we include prior knowledge?
While minimizing information loss?
With optimal knowledge extraction?
And maximizing interpretability?
Without ending up in random correlation?
OpenMolecules RDF: dereferenceable URI
http://rdf.openmolecules.net/
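A minimal sketch of what "dereferenceable URI" can mean in practice, assuming the URI pattern seen in the methanol example later in this deck (the InChI appended verbatim after `?`). The function names are hypothetical, and the `Accept` header is an assumption: the slides do not say whether the server performs content negotiation. The request is only built here, not sent.

```python
from urllib.request import Request

BASE = "http://rdf.openmolecules.net/"


def openmolecules_uri(inchi: str) -> str:
    """Return the URI for a compound identified by its InChI string."""
    if not inchi.startswith("InChI="):
        raise ValueError("expected a string starting with 'InChI='")
    # The InChI is appended verbatim after '?', as in the deck's example URI.
    return BASE + "?" + inchi


def rdf_request(inchi: str) -> Request:
    """Build (but do not send) a request that asks for RDF/XML.

    Assumption: the server honours the Accept header.
    """
    return Request(openmolecules_uri(inchi),
                   headers={"Accept": "application/rdf+xml"})


methanol = openmolecules_uri("InChI=1/CH4O/c1-2/h2H,1H3")
print(methanol)
```

Passing the result of `rdf_request(...)` to `urllib.request.urlopen` would perform the actual lookup.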
OpenMolecules RDF: linked data
http://rdf.openmolecules.net/
The Chemistry Development Kit
A Family of Projects
  CDK-Taverna (chemoinformatics workflows)
  JChemPaint (semantic 2D editor)
  ChemoJava (GPL-ed extension)
Goals
  library of cheminformatics algorithms
  educational
Usage
  CDK: 100+ times cited in scientific literature
  Bioclipse, KNIME, Jumbo (CML), AMBIT, ...
C. Steinbeck et al., J. Chem. Inf. Comput. Sci., 2003
C. Steinbeck et al., Curr. Pharm. Design, 2006
Bioclipse
O. Spjuth et al., BMC Bioinformatics 2007, 8:59
Integration
Services
  databases: PubChem
  web services
  Google Spreadsheets
  MyExperiment.org: Bioclipse Scripting Language
  Twitter, ...
  journals, ...
Techniques
  SOAP, REST, XMPP, ...
  Resource Description Framework
  dedicated APIs
Bioclipse-RDF
local RDF storage
read/write RDF/XML, N3
run SPARQL queries (local and remote)
extract RDF from XHTML/RDFa
Thanks to Jena and Pellet.
Quote of the Day
"There are too many people doing data integration, this is a waste of a lot of smart people’s time"
@alanruttenberg at #swat4ls2009 via dullhunk - twitter
SPARQL end points
GNU FDL
  NMRShiftDB data (also available via Bio2RDF)
CC0
  ChemPedia
  Open Notebook Science Solubility
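As an illustration of how a remote SPARQL end point is typically addressed, the sketch below encodes a query into a GET URL using the `query` parameter defined by the SPARQL Protocol. The endpoint URL is a placeholder, not one of the end points named on this slide, and the query is a generic "first five triples" probe rather than anything specific to these data sets.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint URL, used only for illustration.
ENDPOINT = "http://example.org/sparql"

# Generic probe: return any five triples from the store.
QUERY = """\
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 5
"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Encode a query as a SPARQL Protocol GET URL (query= parameter)."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)
```

Fetching that URL with an `Accept` header for a SPARQL results format would return the bindings. Which formats a given end point supports has to be checked per service.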
Names 2 Graphs 2 Numbers...
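One small, hedged illustration of going from a name-like identifier to numbers: the sketch below reads the formula layer of the InChI shown earlier in this deck and derives element counts and a heavy-atom count. It skips the graph step entirely and only handles single-component formulas. The function names are hypothetical and this is not how CDK or Bioclipse compute descriptors.

```python
import re


def inchi_formula_counts(inchi: str) -> dict:
    """Element counts from the formula layer of a single-component InChI."""
    layers = inchi.split("/")
    if len(layers) < 2 or not layers[0].startswith("InChI="):
        raise ValueError("not an InChI string")
    formula = layers[1]
    # Multi-component formulas ("A.B" or a leading multiplier) are out of scope.
    if not formula or "." in formula or formula[0].isdigit():
        raise ValueError("formula layer not handled in this sketch")
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def heavy_atom_count(counts: dict) -> int:
    """Number of non-hydrogen atoms."""
    return sum(n for symbol, n in counts.items() if symbol != "H")


INCHI = (
    "InChI=1S/C25H34N6O6S/c1-4-6-19-22-23(29(3)28-19)25(34)27-24(26-22)"
    "18-16-17(7-8-20(18)37-15-5-2)38(35,36)31-13-11-30(12-14-31)10-9-21"
    "(32)33/h7-8,16H,4-6,9-15H2,1-3H3,(H,32,33)(H,26,27,34)"
)

counts = inchi_formula_counts(INCHI)
print(counts, heavy_atom_count(counts))
```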
Disease 2 PDB
CDK as RDF
model1:atom1 a cdk:Atom ;
    cdk:hasFormalCharge "1" ;
    cdk:symbol "O" .

model1:atom2 a cdk:Atom ;
    cdk:symbol "C" .

model1:mol1 a cdk:Molecule ;
    dc:title "Methanol" ;
    owl:sameAs <http://rdf.openmolecules.net/?InChI=1/CH4O/c1-2/h2H,1H3> ;
    cdk:hasAtom model1:atom2 ,
        model1:atom1 ;
    cdk:hasBond model1:bond1 .
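To make the shape of this serialization concrete, here is a minimal sketch that emits Turtle fragments using the same predicates as the slide (`cdk:hasFormalCharge`, `cdk:symbol`, `cdk:hasAtom`, `cdk:hasBond`). The helper functions are hypothetical and are not the CDK's own RDF writer. The `@prefix` declarations are left out because the slide does not give the namespace IRIs, so the output is a fragment rather than a complete Turtle document. String values are not escaped.

```python
def atom_turtle(model: str, index: int, symbol: str, formal_charge=None) -> str:
    """Turtle fragment for one atom, following the slide's vocabulary."""
    lines = [f"{model}:atom{index} a cdk:Atom ;"]
    if formal_charge is not None:
        lines.append(f'    cdk:hasFormalCharge "{formal_charge}" ;')
    lines.append(f'    cdk:symbol "{symbol}" .')
    return "\n".join(lines)


def molecule_turtle(model: str, index: int, title: str,
                    atom_indices, bond_indices, same_as=None) -> str:
    """Turtle fragment for a molecule that links to its atoms and bonds."""
    atoms = " , ".join(f"{model}:atom{i}" for i in atom_indices)
    bonds = " , ".join(f"{model}:bond{i}" for i in bond_indices)
    lines = [f"{model}:mol{index} a cdk:Molecule ;",
             f'    dc:title "{title}" ;']
    if same_as:
        lines.append(f"    owl:sameAs <{same_as}> ;")
    lines.append(f"    cdk:hasAtom {atoms} ;")
    lines.append(f"    cdk:hasBond {bonds} .")
    return "\n".join(lines)


fragment = "\n\n".join([
    atom_turtle("model1", 1, "O", formal_charge=1),
    atom_turtle("model1", 2, "C"),
    molecule_turtle(
        "model1", 1, "Methanol", [2, 1], [1],
        same_as="http://rdf.openmolecules.net/?InChI=1/CH4O/c1-2/h2H,1H3"),
])
print(fragment)
```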
Proteochemometrics
OWL for Descriptors
Used for model and data.
MyExperiment: Bioclipse Scripting Language
What does this bring us?
Platform to integrate the RDF with the computation world
Bioclipse as single point of access
Scripting, sharing of scripts with MyExperiment.org
Bridge the nominal with the numerical world
Where next?
Framework
  Triple generation on demand (XMPP, SADI, ...)
  Ontology alignments
  Semantic Mediawiki integration
Proteochemometrics
  Knowledge discovery
  Data set aggregation
  Automated model validation
The Details
http://www.citeulike.org/user/egonw/tag/papers
http://chem-bla-ics.blogspot.com
http://egonw.github.com
waveto: