Triples for the People (Scientists): Liberating biological knowledge with the Semantic Web
Ottawa/Chicago Semantic Web Meetup : 23-11-09
Michel Dumontier, Ph.D.
Associate Professor of Bioinformatics
Carleton University
Department of Biology
School of Computer Science
Institute of Biochemistry
Ottawa Institute of Systems Biology
Ottawa-Carleton Institute of Biomedical Engineering
Carole Goble (ISWC 2005)
Web-based Knowledge Discovery: a very painful process
With current web search engines, it takes a lot of digging to get answers
Portals provide structured information and give better results
Surface web: 167 terabytes
Deep web: 91,000 terabytes
Ratio: 545-to-one
We need to expose the deep web
Data silos – not made for sharing
We want to simultaneously query the 1000+ biological databases
How do we integrate these resources?
The Semantic Web is a web of knowledge.
It is about standards for publishing, sharing and querying knowledge drawn from diverse sources
It enables the answering of sophisticated questions
A growing web of linked data
Bio2RDF provides a framework to link data networks together
Resource Description Framework (RDF)
Uniform Resource Identifiers (URIs) can be used as entity names
http://bio2rdf.org/uniprot:P05067
is a name for Amyloid precursor protein
http://bio2rdf.org/omim:104300
is a name for Alzheimer disease
uniprot:P05067
omim:104300
Allows one to talk about anything
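The relationship between the short names and the full URIs on this slide can be sketched as follows. This is a minimal illustration, assuming the short form is simply the part after the Bio2RDF base; the `expand` helper and `BIO2RDF_BASE` constant are hypothetical names, not part of Bio2RDF.

```python
# Illustrative only: expand a "prefix:id" short name into a Bio2RDF-style URI.
BIO2RDF_BASE = "http://bio2rdf.org/"  # base taken from the URIs on the slide


def expand(short_name: str) -> str:
    """Turn 'uniprot:P05067' into 'http://bio2rdf.org/uniprot:P05067'."""
    prefix, sep, local_id = short_name.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"expected 'prefix:id', got {short_name!r}")
    return f"{BIO2RDF_BASE}{prefix}:{local_id}"


# Each URI acts as a globally unique name for one entity.
names = {
    expand("uniprot:P05067"): "Amyloid precursor protein",
    expand("omim:104300"): "Alzheimer disease",
}
```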
Resource Description Framework (RDF)
An RDF statement consists of:
– Subject: resource identified by a URI
– Predicate: resource identified by a URI
– Object: resource or literal
Example: uniprot:P05067 is a Protein
Allows one to express statements
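One way to picture a statement is as a three-field record. The sketch below is only an illustration: the `Triple` type is a hypothetical name, and the class URI under `example.org` is a placeholder, since the slide does not give a URI for Protein.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # resource identified by a URI
    predicate: str  # resource identified by a URI
    obj: str        # resource (URI) or literal


# rdf:type is the standard RDF predicate for "is a".
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

statement = Triple(
    subject="http://bio2rdf.org/uniprot:P05067",
    predicate=RDF_TYPE,
    obj="http://example.org/Protein",  # placeholder class URI (assumption)
)
```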
Multi-Source Data Integration
[Figure: statements about uniprot:P05067 from three sources are merged into a unified view]
UniProt: uniprot:P05067 is a Protein
Gene Ontology: uniprot:P05067 located in Membrane
iRefIndex: uniprot:P05067 interacts with uniprot:P05067
Unified view (UniProt + Gene Ontology + iRefIndex): uniprot:P05067 with "has name", "located in" and "interacts with" edges to Protein, Membrane and uniprot:P05067
Depends on consistent naming
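The integration on this slide can be sketched with plain sets of triples. This is a toy illustration, assuming each source contributes one statement as the figure suggests; the `describe` helper and the mismatched `UniProtKB/P05067` name are made up for the example.

```python
# Each source contributes its own statements about the same identifier.
uniprot = {("uniprot:P05067", "is a", "Protein")}
gene_ontology = {("uniprot:P05067", "located in", "Membrane")}
irefindex = {("uniprot:P05067", "interacts with", "uniprot:P05067")}

# Because all three sources use the same name, a set union is the whole merge.
unified = uniprot | gene_ontology | irefindex


def describe(graph, subject):
    """Return every (predicate, object) pair recorded for a subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


# A source that names the protein differently does not join the unified view:
# its statement stays attached to a separate, unconnected node.
inconsistent = unified | {
    ("UniProtKB/P05067", "has name", "Amyloid precursor protein")
}
```

Calling `describe(unified, "uniprot:P05067")` returns all three statements, while the extra statement in `inconsistent` is invisible from that name, which is why the unified view depends on consistent naming.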
Building statements creates knowledge
uniprot:P05067 is a Protein
uniprot:P05067 label "Amyloid precursor protein"
omim:104300 is a Disease
omim:104300 label "Alzheimer Disease"
uniprot:P05067 is involved in omim:104300
RDF has multiple representations
RDF/XML
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [<!ENTITY u "http://purl.uniprot.org/uniprot/">]>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:u="http://purl.uniprot.org/uniprot/">
  <rdf:Description rdf:about="&u;Q16665">
    <rdf:type rdf:resource="&u;Protein"/>
  </rdf:Description>
</rdf:RDF>
RDF/N3
@prefix u: <http://purl.uniprot.org/uniprot/> .
u:Q16665 a u:Protein .
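To make the point concrete, the same statement can be written out in two line-oriented syntaxes from one in-memory triple. This is a hand-rolled sketch for illustration, not a replacement for an RDF library; the function names `to_ntriples` and `to_turtle` are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
U = "http://purl.uniprot.org/uniprot/"

# One statement, held independently of any serialization.
triple = (U + "Q16665", RDF_TYPE, U + "Protein")


def to_ntriples(t):
    """N-Triples: one statement per line, every URI written in full."""
    s, p, o = t
    return f"<{s}> <{p}> <{o}> ."


def to_turtle(t, prefix="u", namespace=U):
    """Turtle/N3: declare a prefix once, then abbreviate; 'a' means rdf:type."""
    s, p, o = t

    def short(uri):
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
        return f"<{uri}>"

    pred = "a" if p == RDF_TYPE else short(p)
    return f"@prefix {prefix}: <{namespace}> .\n{short(s)} {pred} {short(o)} ."
```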
Bio2RDF’s RDFized data fits together
Bio2RDF serves up over 4 billion triples of linked biological data
Something you can look up or search for, with rich descriptions
Bio2RDF: Raw Data!
SPARQL is the new cool kid on the block
SQL SPARQL
Bio2RDF’s describe service uses SPARQL
Resource described: http://bio2rdf.org/ns:id
CONSTRUCT {
  ?s ?p ?o .
}
WHERE {
  ?s ?p ?o .
  FILTER(?s = <http://bio2rdf.org/ns:id>) .
}
Sent to http://ns.bio2rdf.org/sparql?query=...
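A rough sketch of how such a request could be assembled is shown below. It only builds the URL and sends nothing; the `describe_request` function is a hypothetical name, and the endpoint pattern is copied from the slide, so it may not match any service that is live today.

```python
from urllib.parse import urlencode

# The CONSTRUCT query from the slide, with ns and id left as placeholders.
DESCRIBE_TEMPLATE = """CONSTRUCT {{
  ?s ?p ?o .
}}
WHERE {{
  ?s ?p ?o .
  FILTER(?s = <http://bio2rdf.org/{ns}:{id}>) .
}}"""


def describe_request(ns: str, identifier: str) -> str:
    """Build the GET URL for the namespace-specific SPARQL endpoint."""
    query = DESCRIBE_TEMPLATE.format(ns=ns, id=identifier)
    endpoint = f"http://{ns}.bio2rdf.org/sparql"  # pattern shown on the slide
    # urlencode percent-encodes the query so it is safe inside a URL.
    return f"{endpoint}?{urlencode({'query': query})}"


url = describe_request("uniprot", "P05067")
```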
Bio2RDF’s search service uses SPARQL
http://bio2rdf.org/search/hexokinase
[Figure labels: kegg, uniprot, gene, bio2rdf.org]
Yay for data!
But how do we discover more than what was in the data?
Ontology as Strategy
Reasoning and Inference through Semantics
Fact: uniprot:P05067 is a Protein
Ontology: Protein is a Molecule
Inferred: uniprot:P05067 is a Molecule
Fact + ontology = knowledge base
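The inference on this slide can be sketched in a few lines. This is a toy illustration assuming a single-parent class hierarchy, far simpler than what an OWL reasoner does; `inferred_types` is a hypothetical helper name.

```python
# Facts: instance -> asserted class. Ontology: class -> direct superclass.
facts = {"uniprot:P05067": "Protein"}
ontology = {"Protein": "Molecule"}


def inferred_types(instance: str) -> list[str]:
    """Follow 'is a' links upward from the asserted class."""
    types = []
    current = facts.get(instance)
    # Stop at the top of the hierarchy, or if a cycle brings us back.
    while current is not None and current not in types:
        types.append(current)
        current = ontology.get(current)
    return types
```

Only "is a Protein" was stated as a fact; "is a Molecule" appears in the result because the ontology says every Protein is a Molecule.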
Logic Based Ontologies Are Conceptual Lego
A simple ontology: Animals
[Figure: class hierarchy with Living Thing, Animal, Plant, Grass, Tree, Person, Cow, Carnivore, Herbivore, Body Part, Arm and Leg, linked by "eats" and "has part" relations]
The Web Ontology Language (OWL) Has Explicit Semantics
It can therefore be used to capture knowledge in a machine-understandable way
Key idea: Subsumption
• Subsumption is the primary axis (relationship) in OWL
• Superclass/subclass relationship, “is a”
• All members of a subclass must be members of its superclasses
• All Proteins are also Molecules
• Protein is a subclass of Molecule
• Molecule is a superclass of Protein
• Molecule subsumes Protein
• owl:Thing is the superclass of all classes
[Figure: Protein nested inside Molecule]
Key Idea: Disjointness
Stating that two classes are disjoint means that something cannot be both a Protein and DNA
[Figure: non-overlapping Protein and DNA classes; dots = individuals]
This can help us find errors
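How disjointness exposes errors can be sketched as a simple consistency check. The individuals `x1` and `x2` are invented for the example, and `inconsistencies` is a hypothetical helper; a real OWL reasoner would report the same kind of clash as an inconsistency.

```python
# Declared disjointness axioms (unordered pairs of class names).
disjoint = {frozenset({"Protein", "DNA"})}

# Asserted types per individual; "x2" is a deliberately inconsistent record.
asserted = {
    "x1": {"Protein"},
    "x2": {"Protein", "DNA"},
}


def inconsistencies(asserted_types, disjoint_pairs):
    """Return individuals typed with both members of a disjoint pair."""
    return sorted(
        name
        for name, types in asserted_types.items()
        if any(pair <= types for pair in disjoint_pairs)
    )
```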
Key Idea: Class equivalence
Transcription Factor: “A protein that binds to DNA and regulates gene expression.”
By stating the necessary and sufficient conditions we discover new knowledge
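The idea of classifying by necessary and sufficient conditions can be sketched as follows. The individuals `p1` and `p2` and the property names are invented for the example; the definition itself follows the slide.

```python
# Toy individuals described only by the properties named in the definition.
individuals = {
    "p1": {"is_protein": True, "binds_dna": True, "regulates_expression": True},
    "p2": {"is_protein": True, "binds_dna": True, "regulates_expression": False},
}

CONDITIONS = ("is_protein", "binds_dna", "regulates_expression")


def is_transcription_factor(props: dict) -> bool:
    """Necessary and sufficient: protein AND binds DNA AND regulates expression."""
    return all(props.get(key, False) for key in CONDITIONS)


# Nobody asserted that p1 is a transcription factor; it follows from the definition.
inferred = sorted(
    name for name, props in individuals.items() if is_transcription_factor(props)
)
```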
Barry Smith
Many ontologies required
Over 170 bio-ontologies
We’re interested in Personalized Medicine
The ability to offer
• The Right Drug
• To The Right Patient
• For The Right Disease
• At The Right Time
• With The Right Dosage
Genetic and metabolic data will allow drugs to be tailored to patient subgroups
PharmGKB is an emerging resource for pharmacogenomics
+ Role of genes, gene variants, drugs
+ Pharmacokinetics
+ Pharmacodynamics
+ Clinical outcomes
+ Links to publications
- Natural language descriptions
- Variant details in publications
PHARMACOGENOMICS OF DEPRESSION KNOWLEDGE BASE
Contains statements from 11 of 40 relevant publications involving 45 genes / gene variants, 57 drugs annotated with 19 classes of antidepressants, 45 drug treatments, 47 drug-gene interactions, 29 clinical outcomes, 10 drug-induced side-effects, and 8 gene-disease interactions.
QUERYING THE PDKB
Protégé 4, FaCT++, DL Query Tab
Query: Nortriptyline-induced side effects for ABCB1 gene variants
'side effect' that 'is realized by' some ('drug treatment' that 'involves' some 'nortriptyline' and 'involves' some ('variant of' some 'ABCB1'))
Result: postural hypotension is a side effect of nortriptyline treatment of depression for individuals presenting the 3435C>T genotype
Web-based Knowledge Discovery
Some of our queries need services
The Holy Grail:
Align the promoters of all serine threonine kinases involved exclusively in the regulation of cell sorting during wound healing in blood vessels.
Retrieve and align 2000nt 5' from every serine/threonine kinase in Mus musculus expressed exclusively in the tunica [I | M |A] whose expression increases 5X or more within 5 hours of wounding but is not activated during the normal development of blood vessels, and is <40% homologous in the active site to kinases known to be involved in cell-cycle regulation in any other species.
Semantic Automated Discovery and Integration
http://sadiframework.org
Mark Wilkinson, UBC
Michel Dumontier, Carleton University
Christopher Baker, UNB
As OWL Axioms
HomologousGeneImage is owl:equivalentTo {
Gene Q hasImage Image P
Gene Q hasSequence Sequence Q
Gene R hasSequence Sequence R
Sequence Q similarTo Sequence R
Gene R = “my gene of interest” }
Build a knowledge base from a series of questions
You want to join the knowledge web
Share your data
Bridge your data with others in semantic communities
Time-sensitive or frequently updated data is one way to encourage more visits.
The Knowledge Web
• Merging data & services
• Reasoning & question answering
• Persistent (RESTful)
• Trust & Security
Data consumers must be able to rely upon your data to use it as a foundation for their own applications.
Join the knowledge web.