The Semantic Web - This time... its Personal
-
Upload
mark-wilkinson -
Category
Technology
-
view
1.343 -
download
2
description
Transcript of The Semantic Web - This time... its Personal
“Shopping for data should be as easy
as shopping for shoes!!”
Carole Goble
The Semantic WebThis time… it’s personal!
Mark Wilkinson, PI Bioinformatics, Heart + Lung Institute @ St. Paul’s HospitalVancouver, BC, Canada
My Lab
Engineering &
Research
Programmers&
Students
Coolness!&
Study of Coolness!
Middleware
Robert Stevens
Shouldn’t be
seen!
DEMO
...to give you incentive to listen to the rest of the presentation
Show me patients with elevated creatinine along with their latest BUN and creatinine levels
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX patients: <http://sadiframework.org/ontologies/patients.owl#> PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#> SELECT ?patient ?bun ?creatFROM <http://sadiframework.org/ontologies/patients.rdf>WHERE {
?patient rdf:type patients:ElevatedCreatininePatient .?patient pred:latestBUN ?bun . ?patient pred:latestCreatinine ?creat .
}
VOILA!
There was no database...
There was no warehouse...
There was no patient data anywhere annotated as
“Elevated Creatinine Patient”
How did I answer a question where the required data didn’t exist?
(...no, I didn’t just make it up LOL!)
In the beginning was the problem...
The Problem
The Problem
The Holy Grail:(circa 2002)
Align the promoters of all serine threonine kinases involved exclusively in the regulation of cell sorting during wound healing in blood vessels.
Retrieve and align 2000nt 5' from every serine/threonine kinase in Mus musculus expressed exclusively in the tunica [I | M |A] whose expression increases 5X or more within 5 hours of wounding but is not activated during the normal development of blood vessels, and is <40% homologous in the active site to kinases known to be involved in cell-cycle regulation in any other species.
The Problem
The Problem
The Solution??
Why not?
Heart
Heart
You don’t know
what I know!
You don’t think
how I think!
No Personalization!
So… what can we do?
What we need
How do we make Bruce’s knowledge machine-readable?
Ontologies!
Two Problems with that…
#1 Ontology Spectrum
Catalog/ID
SelectedLogical
Constraints(disjointness,
inverse, …)
Terms/glossary
Thesauri“narrowerterm”relation
Formalis-a
Frames(Properties)
Informalis-a
Formalinstance
Value Restrs.
GeneralLogical
constraints
Originally from AAAI 1999- Ontologies Panel by Gruninger, Lehmann, McGuinness, Uschold, Welty; – updated by McGuinness.Description in: www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html
#1 Ontology Spectrum
Catalog/ID
SelectedLogical
Constraints(disjointness,
inverse, …)
Terms/glossary
Thesauri“narrowerterm”relation
Formalis-a
Frames(Properties)
Informalis-a
Formalinstance
Value Restrs.
GeneralLogical
constraints
#1 Ontology Spectrum
Catalog/ID
SelectedLogical
Constraints(disjointness,
inverse, …)
Terms/glossary
Thesauri“narrowerterm”relation
Formalis-a
Frames(Properties)
Informalis-a
Formalinstance
Value Restrs.
GeneralLogical
constraints
WHY?
#1 Ontology Spectrum
Catalog/ID
SelectedLogical
Constraints(disjointness,
inverse, …)
Terms/glossary
Thesauri“narrowerterm”relation
Formalis-a
Frames(Properties)
Informalis-a
Formalinstance
Value Restrs.
GeneralLogical
constraints
WHY?
Because I say so!
Because it fulfils XYZ
Because I say so?!?
That’s not very... Personal...
No room for disagreement
No place for scientific discourse
It is what it is...Because I say so!
Clay Shirky
“Ontology is over-rated”
http://www.shirky.com/writings/ontology_overrated.html
If we’re going to personalize the Semantic Web
We must change the way we create and use
classification systems(“shelves”)
“Get rid of the shelf”
- Clay Shirkey
You don’t know
what I know!
So don’t tell ME
how data should be interpreted!
How do we get rid of the shelf?
This is going to hurt...
Web Services
vs.
Semantic Web
Web Servicesare not “connected” to
the Semantic Web
Why?
Web ServicesXML + XML Schema
Semantic WebRDF + OWL
Web ServicesPOST of SOAP-XML
Semantic WebGET of RDF-XML
Web ServicesNo (rigorous) semantics
Semantic WebRich, flexible semantics
Web Services&
Semantic Web
Fundamentally and deeply different Web technologies!
>1000 X more data!
Accessing these databases and
analytical algorithms “transparently”,
based on an individual researcher’s
ideas, beliefs, and preferences
will help us personalize
medical research
Mark Butler (2003) Is the semantic web hype? Hewlett Packard laboratories presentation at MMU, 2003-03-12
Semantic Web?(my definition)
An information system where machines can receive information from one source, re-interpret it, and correctly use it for a purpose that the source had
not anticipated.
Re-interpretation
Correct re-use
Both are critical to
the personalization
of research
Building a
personalized Semantic Web…
Step-by-step…
Founding partner
Semantic Automated Discovery and Integrationhttp://sadiframework.org
(open source)MicrosoftResearch
“best-practices” for Semantic Web Service provision
standards-compliant
Lightweight(only 2 “rules”)
Rules come from observations:
SADI Observation #1:
Web Services in Bioinformatics create implicit biological relationships
between their input and output
SADI Observation #1:
SADI Best Practice #1
Make the implicit explicit…
A Web Service should create “triples” linking the input data to the output data, thus explicitly describing the semantic
relationship between them
SADI Best Practice #1
This is what bioinformatics Web Services implicitly do anyway!
Easy to implement this as a best-practice
SADI Observation #2:HTTP GET and POST
GET guarantees the response relates to the request URI
in a very precise and predictable way
POST does not…
SADI Observation #2:GET and POST
That’s why Web Services have a fundamentally different behaviour than the Semantic Web
SADI Observation #2:GET and POST
We can fix that!
(without breaking any existing rules or standards!)
SADI Best Practice #2
SUBJECT URI of the output graph (triples) is the same
as the SUBJECT URI of the input graph (triples)
(the output is “about” the input... Now explicitly!)
Consequence
The “Semantics” of our interaction with the Web Service are now
explicit and identical to the “Semantics” of GET
SADI Web Service Interfaces
Service Interfaces defined by two OWL classes:
SADI Web Service Interfaces
OWL Class #1: My Input Class
SADI Web Service Interfaces
OWL Class #2: My Output Class
SADI Web Service Interfaces
My Service consumes OWL Individuals of Class #1
and returns OWL Individuals of Class #2
…but the URI of those two individuals is the same!(see best practice #2)
How do we discover services?
Since input and output are about the same “thing”,
we can automatically determine
what a service does
by comparing
the Input and Output OWL classes
Automatically index services in a registry based
on what properties (predicates) Services add to
their respective input data
How do we discover services?
EXAMPLE
Input Data: BRCA1 rdf:type Gene ID
Output Data: BRCA1 hasDNASequence AGCTTAGCCA…
Registry Index: Service provides “hasDNASequence” property to Gene IDs
Now we can answer questions like
“what is the DNA sequence of BRCA1?”
Discover a SADI Web Service that generates the
DNA Sequence property for gene identifiers
Okay, enough tech gobbledygook
What will this do for ME?
Demo #1
Imagine there is a “virtual database” containing all of the data from all of the databases,together with the output of
every conceivable analysis
How do we query that database?
“SHARE”
Semantic Health And Research Environment
SADI client application
What pathways does UniProt protein P47989 belong to?
PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#>PREFIX ont: <http://ontology.dumontierlab.com/>PREFIX uniprot: <http://lsrn.org/UniProt:>SELECT ?gene ?pathway WHERE {
uniprot:P47989 pred:isEncodedBy ?gene . ?gene ont:isParticipantIn ?pathway .
}
Recapwhat we just saw
A standard SPARQL query was entered into SHARE, a SADI-aware query engine
Recapwhat we just saw
The query was interpreted to extract the properties being queried and these were passed to SADI for Web Service discovery
Recapwhat we just saw
SADI searched-for, found, and accessed all databases and/or analytical tools capable
of generating those properties
Recapwhat we just saw
We posed, and answered a complex database query
WITHOUT A DATABASE
(in fact, the data didn’t even have to exist...)
The Holy Grail:
Align the promoters of all serine threonine kinases involved exclusively in the regulation of cell sorting during wound healing in blood vessels.
Retrieve and align 2000nt 5' from every serine/threonine kinase in Mus musculus expressed exclusively in the tunica [I | M |A] whose expression increases 5X or more within 5 hours of wounding but is not activated during the normal development of blood vessels, and is <40% homologous in the active site to kinases known to be involved in cell-cycle regulation in any other species.
Cool!
…but I’m supposed to be personalizing research…
Let’s make this a little more personal by bringing in Ontologies
My Definition of Ontology (for this talk)
Ontologies explicitly define the things that exist in “the world”
based on what properties each kind of thing must have
Ontology Spectrum
Catalog/ID
SelectedLogical
Constraints(disjointness,
inverse, …)
Terms/glossary
Thesauri“narrowerterm”relation
Formalis-a
Frames(Properties)
Informalis-a
Formalinstance
Value Restrs.
GeneralLogical
constraints
Demo #2Discover instances of OWL classes
from data that doesn’t exist…
Data exhibits “late binding”
Late binding:
“purpose and meaning” of the data is
not determined untilthe moment it is required
Benefitof late binding
Data is amenable toconstant re-interpretation
...MY interpretation
How?
DO NOT PRE-CLASSIFY DATA
Just hang properties on it
Ontologies are in the “Frames” area of the Ontology spectrum, and therefore can leverage SADI and be
“executed” as workflows
???
Did you just say
“execute ontologies as workflows?!”
Show me patients with elevated creatinine along with their latest BUN and creatinine levels
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX patients: <http://sadiframework.org/ontologies/patients.owl#> PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#> SELECT ?patient ?bun ?creatFROM <http://sadiframework.org/ontologies/patients.rdf>WHERE {
?patient rdf:type patients:ElevatedCreatininePatient .?patient pred:latestBUN ?bun . ?patient pred:latestCreatinine ?creat .
}
Start burrowing through the OWL class find that we need aregression model OWL class
Regression models have features like slopes and intercepts… and so onThe class is completely decomposed until a set of required Services are discoveredcapable of creating all these necessary properties
Successful decomposition of the OWL class to discover the need for a LinearRegression Web Service, and so on
VOILA!
Current Research Project Benjamin Vandervalk
There are many ways to resolve these ontologies and queries into workflows
My student Benjamin is currently studying optimization of this query resolution strategy
Interesting counterpart to DBMS query resolution because there are no indices for Web Services, and other issues (e.g. Service speed) are an important factor.
The Holy Grail:
Align the promoters of all serine threonine kinases involved exclusively in the regulation of cell sorting during wound healing in blood vessels.
Retrieve and align 2000nt 5' from every serine/threonine kinase in Mus musculus expressed exclusively in the tunica [I | M |A] whose expression increases 5X or more within 5 hours of wounding but is not activated during the normal development of blood vessels, and is <40% homologous in the active site to kinases known to be involved in cell-cycle regulation in any other species.
OWL Class restrictions converted into workflows
SPARQL queries converted into workflows
Reasoning happening in parallel with query execution
Data fulfilling OWL models is discovered,
or generated through running analytical tools
SADI and CardioSHARE
I still don’t seehow this is“Personal”
??
SELECT ?patient ?bun ?creatFROM <http://sadiframework.org/ontologies/patients.rdf>WHERE {
?patient rdf:type patients:ElevatedCreatininePatient .?patient pred:latestBUN ?bun . ?patient pred:latestCreatinine ?creat .
}
I created a small ontologydescribing my definition of
an Elevated Creatinine Patient
… it was MY ontology!
I can re-use it
I can modify it as I change my world-view
I can publish it for others to use
Others can modify it to fit THEIR world-view
My personal world-view is being dynamically resolved againstglobal data and knowledge
…but it’s bigger than that…
“Elevated Creatinine Patient”
I made that up! It came out of my head!
What’s another word for a world-view that you make-up?
Hypothesis
Current Research Project
We believe that ontologies and hypotheses are, in some ways, the same “thing”…
…simply assertions about individuals that may or may not exist
Current Research ProjectSoroush Samadian
e-Copy Clinical Outcomes investigations to determine if Outcomes hypotheses can be modeled as OWL ontologies and automatically resolved by SHARE
Recap
SADI Semantic Web Services generate triples; the predicates of those triples are indexed... Period.
For a given query, determine which properties are available, and which need to be
discovered/generated
Find services that generate the properties we need
Semantic Web
An information system where machines can receive information from one source, re-interpret it, and correctly use it for a purpose that the source had
not anticipated.
My Purpose!!
What SADI + SHARE supports
Re-interpretation
We constantly compare the collection of properties, gathered from third-parties worldwide, to whatever world-model (query/ontology) we wish to view it
through.
MY world model
Novel re-use
There is no way for the provider to dictate how their data should be used, or how it should be interpreted. They simply add
their properties into the “data cloud” and those properties are used in whatever
way is appropriate for ME.
What SADI + SHARE supports
And all this because SADI simply requires
that the input URI
is the same
as the output URI
Important “wins”
Data remains distributed
no warehouse!
Data is not “exposed” as a SPARQL endpoint
greater provider-control over computational resources
Semi-automated SADI service writing and deployment
Taverna
Semantically-guided SADI service discovery and pipelining
SADI Plug-ins
Where do we go from here?
Consent
Who defines my consent?
IRB
(in Canada REB)
Where’s MY opinionin that process?
New Research Project - iConsent
I define an ontology describing the conditions under which I would allow my bio/medical data to be used
Type of study, funding source, type of information
Highly granular, highly detailedHighly PERSONAL
New Research Project - iConsent
Researchers approach the database with an “instance” of their study (e.g. an instance of the OBI)
Semantic negotiation between my ontology and their study parameters determines which aspects of my data they are
allowed access to (if any)
New Research Project - iConsent
For Patient:
Personalized access to personal information
Much more granular than current consent models
New Research Project - iConsent
For Researcher:
Access to more data from willing patients
Automated profiling of data you weren’t allowed access to – detection of sample bias!
Simple and Open WINS!Join us!
We have recently received funding from CANARIEto assist and train service providers
in deploying their own SADI Semantic Web Services
Come join us – we’re having a lot of fun!!
http://sadiframework.orghttp://twitter.com/sadiframework
Fin
This presentation available on SlideShare: keywords ‘sadi’, ‘wilkinson’
C r e d i t s
B e n j a m i n V a n d e r V a l k M S c S t u d e n t B i o i n f o r m a t i c s T r a i n i n g P r o g r a m m e
L u k e M c C a r t h y L e a d P r o g r a m m e r , S A D I
S o r o u s h S a m a d i a n P h D S t u d e n tB i o i n f o r m a t i c s T r a i n i n g P r o g r a m m e
Microsoft Research