The Semantic Web - This time... its Personal

160
“Shopping for data should be as easy as shopping for shoes!!” Carole Goble

description

My presentation on SADI, SHARE, CardioSHARE, and the new iConsent project. Presented to the faculty and students at Stanford Medical Informatics, Palo Alto, USA. May 14th, 2010. How do we make the semantic web, and medical research, more personal? (both for the researcher and for the patient) I present some ideas we're exploring

Transcript of The Semantic Web - This time... its Personal

Page 1: The Semantic Web - This time... its Personal

“Shopping for data should be as easy

as shopping for shoes!!”

Carole Goble

Page 2: The Semantic Web - This time... its Personal

The Semantic WebThis time… it’s personal!

Mark Wilkinson, PI Bioinformatics, Heart + Lung Institute @ St. Paul’s HospitalVancouver, BC, Canada

Page 3: The Semantic Web - This time... its Personal

My Lab

Page 4: The Semantic Web - This time... its Personal

Engineering &

Research

Page 5: The Semantic Web - This time... its Personal

Programmers&

Students

Page 6: The Semantic Web - This time... its Personal

Coolness!&

Study of Coolness!

Page 7: The Semantic Web - This time... its Personal

Middleware

Page 8: The Semantic Web - This time... its Personal

Robert Stevens

Page 9: The Semantic Web - This time... its Personal
Page 10: The Semantic Web - This time... its Personal

Shouldn’t be

seen!

Page 11: The Semantic Web - This time... its Personal
Page 12: The Semantic Web - This time... its Personal

DEMO

...to give you incentive to listen to the rest of the presentation

Page 13: The Semantic Web - This time... its Personal

Show me patients with elevated creatinine along with their latest BUN and creatinine levels

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX patients: <http://sadiframework.org/ontologies/patients.owl#> PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#> SELECT ?patient ?bun ?creatFROM <http://sadiframework.org/ontologies/patients.rdf>WHERE {

?patient rdf:type patients:ElevatedCreatininePatient .?patient pred:latestBUN ?bun . ?patient pred:latestCreatinine ?creat .

}

Page 14: The Semantic Web - This time... its Personal

VOILA!

Page 15: The Semantic Web - This time... its Personal

There was no database...

Page 16: The Semantic Web - This time... its Personal

There was no warehouse...

Page 17: The Semantic Web - This time... its Personal

There was no patient data anywhere annotated as

“Elevated Creatinine Patient”

Page 18: The Semantic Web - This time... its Personal

How did I answer a question where the required data didn’t exist?

(...no, I didn’t just make it up LOL!)

Page 19: The Semantic Web - This time... its Personal

In the beginning was the problem...

Page 20: The Semantic Web - This time... its Personal

The Problem

Page 21: The Semantic Web - This time... its Personal

The Problem

Page 22: The Semantic Web - This time... its Personal

The Holy Grail:(circa 2002)

Align the promoters of all serine threonine kinases involved exclusively in the regulation of cell sorting during wound healing in blood vessels.

Retrieve and align 2000nt 5' from every serine/threonine kinase in Mus musculus expressed exclusively in the tunica [I | M |A] whose expression increases 5X or more within 5 hours of wounding but is not activated during the normal development of blood vessels, and is <40% homologous in the active site to kinases known to be involved in cell-cycle regulation in any other species.

Page 23: The Semantic Web - This time... its Personal

The Problem

Page 24: The Semantic Web - This time... its Personal

The Problem

Page 25: The Semantic Web - This time... its Personal

The Solution??

Page 26: The Semantic Web - This time... its Personal

Why not?

Heart

Heart

Page 27: The Semantic Web - This time... its Personal

You don’t know

what I know!

Page 28: The Semantic Web - This time... its Personal

You don’t think

how I think!

Page 29: The Semantic Web - This time... its Personal

No Personalization!

Page 30: The Semantic Web - This time... its Personal

So… what can we do?

Page 31: The Semantic Web - This time... its Personal

What we need

Page 32: The Semantic Web - This time... its Personal

How do we make Bruce’s knowledge machine-readable?

Page 33: The Semantic Web - This time... its Personal

Ontologies!

Page 34: The Semantic Web - This time... its Personal

Two Problems with that…

Page 35: The Semantic Web - This time... its Personal

#1 Ontology Spectrum

Catalog/ID

SelectedLogical

Constraints(disjointness,

inverse, …)

Terms/glossary

Thesauri“narrowerterm”relation

Formalis-a

Frames(Properties)

Informalis-a

Formalinstance

Value Restrs.

GeneralLogical

constraints

Originally from AAAI 1999- Ontologies Panel by Gruninger, Lehmann, McGuinness, Uschold, Welty; – updated by McGuinness.Description in: www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html

Page 36: The Semantic Web - This time... its Personal

#1 Ontology Spectrum

Catalog/ID

SelectedLogical

Constraints(disjointness,

inverse, …)

Terms/glossary

Thesauri“narrowerterm”relation

Formalis-a

Frames(Properties)

Informalis-a

Formalinstance

Value Restrs.

GeneralLogical

constraints

Page 37: The Semantic Web - This time... its Personal

#1 Ontology Spectrum

Catalog/ID

SelectedLogical

Constraints(disjointness,

inverse, …)

Terms/glossary

Thesauri“narrowerterm”relation

Formalis-a

Frames(Properties)

Informalis-a

Formalinstance

Value Restrs.

GeneralLogical

constraints

WHY?

Page 38: The Semantic Web - This time... its Personal

#1 Ontology Spectrum

Catalog/ID

SelectedLogical

Constraints(disjointness,

inverse, …)

Terms/glossary

Thesauri“narrowerterm”relation

Formalis-a

Frames(Properties)

Informalis-a

Formalinstance

Value Restrs.

GeneralLogical

constraints

WHY?

Because I say so!

Because it fulfils XYZ

Page 39: The Semantic Web - This time... its Personal

Because I say so?!?

That’s not very... Personal...

Page 40: The Semantic Web - This time... its Personal
Page 41: The Semantic Web - This time... its Personal

No room for disagreement

Page 42: The Semantic Web - This time... its Personal

No place for scientific discourse

Page 43: The Semantic Web - This time... its Personal

It is what it is...Because I say so!

Page 44: The Semantic Web - This time... its Personal

Clay Shirky

“Ontology is over-rated”

http://www.shirky.com/writings/ontology_overrated.html

Page 45: The Semantic Web - This time... its Personal

If we’re going to personalize the Semantic Web

We must change the way we create and use

classification systems(“shelves”)

Page 46: The Semantic Web - This time... its Personal

“Get rid of the shelf”

- Clay Shirkey

Page 47: The Semantic Web - This time... its Personal

You don’t know

what I know!

Page 48: The Semantic Web - This time... its Personal

So don’t tell ME

how data should be interpreted!

Page 49: The Semantic Web - This time... its Personal

How do we get rid of the shelf?

Page 50: The Semantic Web - This time... its Personal

This is going to hurt...

Page 51: The Semantic Web - This time... its Personal

Web Services

vs.

Semantic Web

Page 52: The Semantic Web - This time... its Personal

Web Servicesare not “connected” to

the Semantic Web

Why?

Page 53: The Semantic Web - This time... its Personal

Web ServicesXML + XML Schema

Semantic WebRDF + OWL

Page 54: The Semantic Web - This time... its Personal

Web ServicesPOST of SOAP-XML

Semantic WebGET of RDF-XML

Page 55: The Semantic Web - This time... its Personal

Web ServicesNo (rigorous) semantics

Semantic WebRich, flexible semantics

Page 56: The Semantic Web - This time... its Personal

Web Services&

Semantic Web

Fundamentally and deeply different Web technologies!

Page 57: The Semantic Web - This time... its Personal
Page 58: The Semantic Web - This time... its Personal

>1000 X more data!

Page 59: The Semantic Web - This time... its Personal

Accessing these databases and

analytical algorithms “transparently”,

based on an individual researcher’s

ideas, beliefs, and preferences

will help us personalize

medical research

Page 60: The Semantic Web - This time... its Personal

Mark Butler (2003) Is the semantic web hype? Hewlett Packard laboratories presentation at MMU, 2003-03-12

Page 61: The Semantic Web - This time... its Personal

Semantic Web?(my definition)

An information system where machines can receive information from one source, re-interpret it, and correctly use it for a purpose that the source had

not anticipated.

Page 62: The Semantic Web - This time... its Personal

Re-interpretation

Correct re-use

Both are critical to

the personalization

of research

Page 63: The Semantic Web - This time... its Personal

Building a

personalized Semantic Web…

Step-by-step…

Page 64: The Semantic Web - This time... its Personal

Founding partner

Semantic Automated Discovery and Integrationhttp://sadiframework.org

(open source)MicrosoftResearch

Page 65: The Semantic Web - This time... its Personal

“best-practices” for Semantic Web Service provision

Page 66: The Semantic Web - This time... its Personal

standards-compliant

Page 67: The Semantic Web - This time... its Personal

Lightweight(only 2 “rules”)

Page 68: The Semantic Web - This time... its Personal

Rules come from observations:

Page 69: The Semantic Web - This time... its Personal

SADI Observation #1:

Web Services in Bioinformatics create implicit biological relationships

between their input and output

Page 70: The Semantic Web - This time... its Personal

SADI Observation #1:

Page 71: The Semantic Web - This time... its Personal

SADI Best Practice #1

Make the implicit explicit…

A Web Service should create “triples” linking the input data to the output data, thus explicitly describing the semantic

relationship between them

Page 72: The Semantic Web - This time... its Personal

SADI Best Practice #1

This is what bioinformatics Web Services implicitly do anyway!

Easy to implement this as a best-practice

Page 73: The Semantic Web - This time... its Personal

SADI Observation #2:HTTP GET and POST

GET guarantees the response relates to the request URI

in a very precise and predictable way

POST does not…

Page 74: The Semantic Web - This time... its Personal

SADI Observation #2:GET and POST

That’s why Web Services have a fundamentally different behaviour than the Semantic Web

Page 75: The Semantic Web - This time... its Personal

SADI Observation #2:GET and POST

We can fix that!

(without breaking any existing rules or standards!)

Page 76: The Semantic Web - This time... its Personal

SADI Best Practice #2

SUBJECT URI of the output graph (triples) is the same

as the SUBJECT URI of the input graph (triples)

(the output is “about” the input... Now explicitly!)

Page 77: The Semantic Web - This time... its Personal

Consequence

The “Semantics” of our interaction with the Web Service are now

explicit and identical to the “Semantics” of GET

Page 78: The Semantic Web - This time... its Personal

SADI Web Service Interfaces

Service Interfaces defined by two OWL classes:

Page 79: The Semantic Web - This time... its Personal

SADI Web Service Interfaces

OWL Class #1: My Input Class

Page 80: The Semantic Web - This time... its Personal

SADI Web Service Interfaces

OWL Class #2: My Output Class

Page 81: The Semantic Web - This time... its Personal

SADI Web Service Interfaces

My Service consumes OWL Individuals of Class #1

and returns OWL Individuals of Class #2

…but the URI of those two individuals is the same!(see best practice #2)

Page 82: The Semantic Web - This time... its Personal

How do we discover services?

Since input and output are about the same “thing”,

we can automatically determine

what a service does

by comparing

the Input and Output OWL classes

Page 83: The Semantic Web - This time... its Personal

Automatically index services in a registry based

on what properties (predicates) Services add to

their respective input data

How do we discover services?

Page 84: The Semantic Web - This time... its Personal

EXAMPLE

Input Data: BRCA1 rdf:type Gene ID

Output Data: BRCA1 hasDNASequence AGCTTAGCCA…

Registry Index: Service provides “hasDNASequence” property to Gene IDs

Page 85: The Semantic Web - This time... its Personal

Now we can answer questions like

“what is the DNA sequence of BRCA1?”

Discover a SADI Web Service that generates the

DNA Sequence property for gene identifiers

Page 86: The Semantic Web - This time... its Personal

Okay, enough tech gobbledygook

What will this do for ME?

Page 87: The Semantic Web - This time... its Personal

Demo #1

Page 88: The Semantic Web - This time... its Personal

Imagine there is a “virtual database” containing all of the data from all of the databases,together with the output of

every conceivable analysis

Page 89: The Semantic Web - This time... its Personal

How do we query that database?

Page 90: The Semantic Web - This time... its Personal

“SHARE”

Semantic Health And Research Environment

SADI client application

Page 91: The Semantic Web - This time... its Personal

What pathways does UniProt protein P47989 belong to?

PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#>PREFIX ont: <http://ontology.dumontierlab.com/>PREFIX uniprot: <http://lsrn.org/UniProt:>SELECT ?gene ?pathway WHERE {

uniprot:P47989 pred:isEncodedBy ?gene . ?gene ont:isParticipantIn ?pathway .

}

Page 92: The Semantic Web - This time... its Personal
Page 93: The Semantic Web - This time... its Personal
Page 94: The Semantic Web - This time... its Personal
Page 95: The Semantic Web - This time... its Personal

Recapwhat we just saw

A standard SPARQL query was entered into SHARE, a SADI-aware query engine

Page 96: The Semantic Web - This time... its Personal

Recapwhat we just saw

The query was interpreted to extract the properties being queried and these were passed to SADI for Web Service discovery

Page 97: The Semantic Web - This time... its Personal

Recapwhat we just saw

SADI searched-for, found, and accessed all databases and/or analytical tools capable

of generating those properties

Page 98: The Semantic Web - This time... its Personal

Recapwhat we just saw

We posed, and answered a complex database query

WITHOUT A DATABASE

(in fact, the data didn’t even have to exist...)

Page 99: The Semantic Web - This time... its Personal

The Holy Grail:

Align the promoters of all serine threonine kinases involved exclusively in the regulation of cell sorting during wound healing in blood vessels.

Retrieve and align 2000nt 5' from every serine/threonine kinase in Mus musculus expressed exclusively in the tunica [I | M |A] whose expression increases 5X or more within 5 hours of wounding but is not activated during the normal development of blood vessels, and is <40% homologous in the active site to kinases known to be involved in cell-cycle regulation in any other species.

Page 100: The Semantic Web - This time... its Personal

Cool!

Page 101: The Semantic Web - This time... its Personal

…but I’m supposed to be personalizing research…

Let’s make this a little more personal by bringing in Ontologies

Page 102: The Semantic Web - This time... its Personal

My Definition of Ontology (for this talk)

Ontologies explicitly define the things that exist in “the world”

based on what properties each kind of thing must have

Page 103: The Semantic Web - This time... its Personal

Ontology Spectrum

Catalog/ID

SelectedLogical

Constraints(disjointness,

inverse, …)

Terms/glossary

Thesauri“narrowerterm”relation

Formalis-a

Frames(Properties)

Informalis-a

Formalinstance

Value Restrs.

GeneralLogical

constraints

Page 104: The Semantic Web - This time... its Personal

Demo #2Discover instances of OWL classes

from data that doesn’t exist…

Page 105: The Semantic Web - This time... its Personal

Data exhibits “late binding”

Page 106: The Semantic Web - This time... its Personal
Page 107: The Semantic Web - This time... its Personal

Late binding:

“purpose and meaning” of the data is

not determined untilthe moment it is required

Page 108: The Semantic Web - This time... its Personal

Benefitof late binding

Data is amenable toconstant re-interpretation

...MY interpretation

Page 109: The Semantic Web - This time... its Personal

How?

DO NOT PRE-CLASSIFY DATA

Just hang properties on it

Page 110: The Semantic Web - This time... its Personal

Ontologies are in the “Frames” area of the Ontology spectrum, and therefore can leverage SADI and be

“executed” as workflows

Page 111: The Semantic Web - This time... its Personal

???

Page 112: The Semantic Web - This time... its Personal

Did you just say

“execute ontologies as workflows?!”

Page 113: The Semantic Web - This time... its Personal

Show me patients with elevated creatinine along with their latest BUN and creatinine levels

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX patients: <http://sadiframework.org/ontologies/patients.owl#> PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#> SELECT ?patient ?bun ?creatFROM <http://sadiframework.org/ontologies/patients.rdf>WHERE {

?patient rdf:type patients:ElevatedCreatininePatient .?patient pred:latestBUN ?bun . ?patient pred:latestCreatinine ?creat .

}

Page 114: The Semantic Web - This time... its Personal

Start burrowing through the OWL class find that we need aregression model OWL class

Page 115: The Semantic Web - This time... its Personal

Regression models have features like slopes and intercepts… and so onThe class is completely decomposed until a set of required Services are discoveredcapable of creating all these necessary properties

Page 116: The Semantic Web - This time... its Personal

Successful decomposition of the OWL class to discover the need for a LinearRegression Web Service, and so on

Page 117: The Semantic Web - This time... its Personal

VOILA!

Page 118: The Semantic Web - This time... its Personal

Current Research Project Benjamin Vandervalk

There are many ways to resolve these ontologies and queries into workflows

My student Benjamin is currently studying optimization of this query resolution strategy

Interesting counterpart to DBMS query resolution because there are no indices for Web Services, and other issues (e.g. Service speed) are an important factor.

Page 119: The Semantic Web - This time... its Personal

The Holy Grail:

Align the promoters of all serine threonine kinases involved exclusively in the regulation of cell sorting during wound healing in blood vessels.

Retrieve and align 2000nt 5' from every serine/threonine kinase in Mus musculus expressed exclusively in the tunica [I | M |A] whose expression increases 5X or more within 5 hours of wounding but is not activated during the normal development of blood vessels, and is <40% homologous in the active site to kinases known to be involved in cell-cycle regulation in any other species.

Page 120: The Semantic Web - This time... its Personal

OWL Class restrictions converted into workflows

SPARQL queries converted into workflows

Reasoning happening in parallel with query execution

Data fulfilling OWL models is discovered,

or generated through running analytical tools

SADI and CardioSHARE

Page 121: The Semantic Web - This time... its Personal

I still don’t seehow this is“Personal”

??

Page 122: The Semantic Web - This time... its Personal

SELECT ?patient ?bun ?creatFROM <http://sadiframework.org/ontologies/patients.rdf>WHERE {

?patient rdf:type patients:ElevatedCreatininePatient .?patient pred:latestBUN ?bun . ?patient pred:latestCreatinine ?creat .

}

Page 123: The Semantic Web - This time... its Personal

I created a small ontologydescribing my definition of

an Elevated Creatinine Patient

Page 124: The Semantic Web - This time... its Personal

… it was MY ontology!

Page 125: The Semantic Web - This time... its Personal

I can re-use it

Page 126: The Semantic Web - This time... its Personal

I can modify it as I change my world-view

Page 127: The Semantic Web - This time... its Personal

I can publish it for others to use

Page 128: The Semantic Web - This time... its Personal

Others can modify it to fit THEIR world-view

Page 129: The Semantic Web - This time... its Personal
Page 130: The Semantic Web - This time... its Personal

My personal world-view is being dynamically resolved againstglobal data and knowledge

Page 131: The Semantic Web - This time... its Personal

…but it’s bigger than that…

Page 132: The Semantic Web - This time... its Personal

“Elevated Creatinine Patient”

Page 133: The Semantic Web - This time... its Personal

I made that up! It came out of my head!

Page 134: The Semantic Web - This time... its Personal

What’s another word for a world-view that you make-up?

Hypothesis

Page 135: The Semantic Web - This time... its Personal

Current Research Project

We believe that ontologies and hypotheses are, in some ways, the same “thing”…

…simply assertions about individuals that may or may not exist

Page 136: The Semantic Web - This time... its Personal

Current Research ProjectSoroush Samadian

e-Copy Clinical Outcomes investigations to determine if Outcomes hypotheses can be modeled as OWL ontologies and automatically resolved by SHARE

Page 137: The Semantic Web - This time... its Personal

Recap

SADI Semantic Web Services generate triples; the predicates of those triples are indexed... Period.

For a given query, determine which properties are available, and which need to be

discovered/generated

Find services that generate the properties we need

Page 138: The Semantic Web - This time... its Personal

Semantic Web

An information system where machines can receive information from one source, re-interpret it, and correctly use it for a purpose that the source had

not anticipated.

My Purpose!!

Page 139: The Semantic Web - This time... its Personal

What SADI + SHARE supports

Re-interpretation

We constantly compare the collection of properties, gathered from third-parties worldwide, to whatever world-model (query/ontology) we wish to view it

through.

MY world model

Page 140: The Semantic Web - This time... its Personal

Novel re-use

There is no way for the provider to dictate how their data should be used, or how it should be interpreted. They simply add

their properties into the “data cloud” and those properties are used in whatever

way is appropriate for ME.

What SADI + SHARE supports

Page 141: The Semantic Web - This time... its Personal

And all this because SADI simply requires

that the input URI

is the same

as the output URI

Page 142: The Semantic Web - This time... its Personal

Important “wins”

Page 143: The Semantic Web - This time... its Personal

Data remains distributed

no warehouse!

Page 144: The Semantic Web - This time... its Personal

Data is not “exposed” as a SPARQL endpoint

greater provider-control over computational resources

Page 145: The Semantic Web - This time... its Personal

Semi-automated SADI service writing and deployment

Taverna

Semantically-guided SADI service discovery and pipelining

SADI Plug-ins

Page 146: The Semantic Web - This time... its Personal

Where do we go from here?

Page 147: The Semantic Web - This time... its Personal
Page 148: The Semantic Web - This time... its Personal
Page 149: The Semantic Web - This time... its Personal

Consent

Page 150: The Semantic Web - This time... its Personal

Who defines my consent?

Page 151: The Semantic Web - This time... its Personal

IRB

(in Canada REB)

Page 152: The Semantic Web - This time... its Personal

Where’s MY opinionin that process?

Page 153: The Semantic Web - This time... its Personal

New Research Project - iConsent

I define an ontology describing the conditions under which I would allow my bio/medical data to be used

Type of study, funding source, type of information

Highly granular, highly detailedHighly PERSONAL

Page 154: The Semantic Web - This time... its Personal

New Research Project - iConsent

Researchers approach the database with an “instance” of their study (e.g. an instance of the OBI)

Semantic negotiation between my ontology and their study parameters determines which aspects of my data they are

allowed access to (if any)

Page 155: The Semantic Web - This time... its Personal

New Research Project - iConsent

For Patient:

Personalized access to personal information

Much more granular than current consent models

Page 156: The Semantic Web - This time... its Personal

New Research Project - iConsent

For Researcher:

Access to more data from willing patients

Automated profiling of data you weren’t allowed access to – detection of sample bias!

Page 157: The Semantic Web - This time... its Personal

Simple and Open WINS!Join us!

We have recently received funding from CANARIEto assist and train service providers

in deploying their own SADI Semantic Web Services

Come join us – we’re having a lot of fun!!

http://sadiframework.orghttp://twitter.com/sadiframework

Page 158: The Semantic Web - This time... its Personal

Fin

This presentation available on SlideShare: keywords ‘sadi’, ‘wilkinson’

Page 159: The Semantic Web - This time... its Personal

C r e d i t s

B e n j a m i n V a n d e r V a l k M S c S t u d e n t B i o i n f o r m a t i c s T r a i n i n g P r o g r a m m e

L u k e M c C a r t h y L e a d P r o g r a m m e r , S A D I

S o r o u s h S a m a d i a n P h D S t u d e n tB i o i n f o r m a t i c s T r a i n i n g P r o g r a m m e

Page 160: The Semantic Web - This time... its Personal

Microsoft Research