Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

ONTOLOGY BASED DATA ACCESS Architecture, Techniques and Systems Mariano Rodríguez-Muro KRDB Research Group Free University of Bozen-Bolzano BMIR, Stanford February, 2012

Seminar on Ontology Based Data Access for RDBMSs through query rewriting at Stanford's BMIR lab. 2012.

Transcript of Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

Page 1: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

ONTOLOGY BASED DATA ACCESS Architecture, Techniques and Systems

Mariano Rodríguez-Muro KRDB Research Group

Free University of Bozen-Bolzano BMIR, Stanford February, 2012

Page 2: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

ONTOLOGIES Reasoning and Data

OBDA: Architecture, Techniques and Systems

Page 3: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

Ontologies

• A formal conceptualization of a domain of interest
• They come in many different languages: RDFS, OBO, OWL 2, SWRL, etc.
• Uses
  • Documentation
  • Knowledge exchange
  • Discovering new knowledge
• Ontologies + Data…

Page 4: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

Instance reasoning

• Instance reasoning
  • Infer new information about the data
  • Detect inconsistent data
  • Use inferred information for complex queries (e.g., SPARQL)
• Queries
  • Is :person/mariano an instance of :Mammal?
  • Retrieve all instances of :Mammal
  • SELECT ?x ?y WHERE { ?x a :Mammal; :hasAncestor ?y. ?y a :Mammal }
• Requirements
  • Fast execution
  • Efficient resource management
  • Big data, big ontologies

Page 5: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

The usual workflow

[Diagram of the usual workflow. Labels: Reasoner, Source, Application, Communication, Ontology, Inputs, Triples, Application Code]

Page 6: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

Problem with approach

• Software complexity
• Duplication
• Data refreshing
• Data structure is lost (primary keys, foreign keys, information about the import procedure)

[Workflow diagram. Labels: Reasoner, Source, Application, Communication, Ontology, Inputs, Triples, Application Code]

Page 7: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

OBDA Models and Architecture

Page 8: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

OBDA as an Architecture

[Diagram. Labels: Reasoner, Source, Application, Direct Communication, Ontology, OBDA Model, Inputs]

Page 9: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

OBDA Models: Sources and Mappings

“A formal specification of the relationship between data in a data source and the vocabulary of the ontology”

[Diagram. Labels: OBDA Model, Source, Source Declaration, A set of mappings]

Page 10: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

Mapping

“A tuple of 2 queries, one over the source and one over the ontology, with the same signature. Intuitively, a mapping associates the data specified by q_s with the answers for q_o.”

q_s ⊆ q_o

SELECT patient_id AS id FROM condition WHERE c_id = 3333
⊆ CardiacArrestPatient(?id) → q(?id)

Example: the source row id = (23) yields the assertion <23> rdf:type CardiacArrestPatient

Page 11: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

Example OBDA model

SELECT patient_id AS id FROM condition WHERE c_id = 3333
⊆ CardiacArrestPatient(?id) → q(?id)

SELECT id, name, age, ssn FROM patient
⊆ Patient(?id) ^ name(?id,?name) ^ age(?id,?age) ^ ssn(?id,?ssn) → q(?id,?name,?age,?ssn)

Table: patient

id [PKEY]   name   age   ssn
12345       John   37    xxx-999
…           …      …     …

Table: condition

patient_id [FKEY]   c_id [FKEY]
12345               3333
…                   …

Page 12: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

Example OBDA model

Table: patient

id [PKEY]   name   age   ssn
12345       John   37    xxx-999
…           …      …     …

Table: condition

patient_id [FKEY]   c_id [FKEY]
12345               3333
…                   …

Materialized triples:

<12345> rdf:type :Patient .
<12345> :name "John" .
<12345> :age "37" .
<12345> :ssn "xxx-999" .
<12345> rdf:type :CardiacArrestPatient .
…
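The materialization shown above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two example tables are loaded into an in-memory SQLite database, the column types are guesses, and the `MAPPINGS` structure and the `build_example_db` and `materialize` functions are hypothetical names that are not part of OBDALib or Quest. The SQL for the `condition` table selects `patient_id AS id`, since that table has no `id` column.

```python
import sqlite3


def build_example_db():
    """Create the two example tables in memory (column types are assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # lets us read columns by name
    conn.executescript("""
        CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, ssn TEXT);
        CREATE TABLE condition (patient_id INTEGER REFERENCES patient(id), c_id INTEGER);
        INSERT INTO patient VALUES (12345, 'John', 37, 'xxx-999');
        INSERT INTO condition VALUES (12345, 3333);
    """)
    return conn


# Hypothetical mapping format: source SQL plus triple templates.
# Template = (predicate, kind, value); kind "class" means value is a class name,
# kind "column" means value names the column that holds the literal.
MAPPINGS = [
    ("SELECT id, name, age, ssn FROM patient",
     [("rdf:type", "class", ":Patient"),
      (":name", "column", "name"),
      (":age", "column", "age"),
      (":ssn", "column", "ssn")]),
    ("SELECT patient_id AS id FROM condition WHERE c_id = 3333",
     [("rdf:type", "class", ":CardiacArrestPatient")]),
]


def materialize(conn, mappings):
    """Turn every row returned by a mapping's SQL into ABox triples."""
    triples = []
    for sql, templates in mappings:
        for row in conn.execute(sql):
            subject = f"<{row['id']}>"
            for predicate, kind, value in templates:
                obj = value if kind == "class" else f'"{row[value]}"'
                triples.append((subject, predicate, obj))
    return triples


if __name__ == "__main__":
    for s, p, o in materialize(build_example_db(), MAPPINGS):
        print(s, p, o, ".")
```

Running the script prints one line per assertion for patient 12345, in the same order as the triples listed on this page.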

Page 13: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

The Pay-off

• At least
  • The source is documented
  • Data handling can be done automatically (by the reasoner)
  • Reduced cost of application development and maintenance
  • The reasoner can analyze source and mappings to minimize the cost of inference
• The sweet spot
  • On-the-fly data access
  • Reasoning by query rewriting
  • Exploitation of efficient engines

Page 14: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

QUERY REWRITING

Page 15: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

Query Rewriting in a Nutshell

• Given a query Q, a TBox T, and an OBDA model <D, M>, compute a query Q’ such that:

answer(Q, T, mat(D, M)) = answer(Q’, D)

where mat(D, M) is the collection of assertions that results from “materializing” the mappings into ABox assertions (assertional triples)

Page 16: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

Example OBDA model

SELECT patient_id AS id FROM condition WHERE c_id = 3333
⤳ CardiacArrestPatient(?id) → q(?id)

SELECT id, name, age, ssn FROM patient
⤳ Patient(?id) ^ name(?id,?name) ^ age(?id,?age) ^ ssn(?id,?ssn) → q(?id,?name,?age,?ssn)

Table: patient

id [PKEY]   name   age   ssn
12345       John   37    xxx-999
…           …      …     …

Table: condition

patient_id [FKEY]   c_id [FKEY]
12345               3333
…                   …

Page 17: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

Query Rewriting: An example

Ontology (TBox)

SubClassOf(:CardiacArrest :HeartCondition)
SubClassOf(:CardiacArrestPatient :Patient)
SubClassOf(:CardiacArrestPatient ObjectSomeValuesFrom(:affectedBy :CardiacArrest))

Query (SPARQL)

SELECT ?p ?name ?ssn WHERE {
  ?p a :Patient; :name ?name; :ssn ?ssn; :age ?age;
     :affectedBy [ a :HeartCondition ].
  FILTER (?age >= 21 && ?age <= 50)
}

Page 18: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

Query Rewriting: An example

Rewritten query

SELECT ?p ?name ?ssn WHERE {
  { ?p a :Patient; :name ?name; :ssn ?ssn; :age ?age;
       :affectedBy [ a :HeartCondition ].
    FILTER (?age >= 21 && ?age <= 50) }
  UNION
  { ?p a :Patient; :name ?name; :ssn ?ssn; :age ?age;
       :affectedBy [ a :CardiacArrest ].
    FILTER (?age >= 21 && ?age <= 50) }
  UNION
  { ?p a :Patient; :name ?name; :ssn ?ssn; :age ?age; a :CardiacArrestPatient.
    FILTER (?age >= 21 && ?age <= 50) }
  UNION …
}
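One ingredient of the rewriting above, expanding a class atom into the union of its subclasses, can be sketched as follows. This is a deliberately simplified illustration and not Quest's algorithm: it only uses the two atomic SubClassOf axioms of the example TBox, it ignores the existential axiom (so it does not produce the :CardiacArrestPatient disjunct), and `SUBCLASS_OF`, `subclasses` and `rewrite` are hypothetical names.

```python
from itertools import product

# Atomic subclass axioms from the example TBox: (subclass, superclass).
SUBCLASS_OF = [
    (":CardiacArrest", ":HeartCondition"),
    (":CardiacArrestPatient", ":Patient"),
]


def subclasses(cls, axioms):
    """Return cls followed by every class that is transitively a subclass of it."""
    found = [cls]
    for candidate in found:  # the list grows while we iterate (breadth-first)
        for sub, sup in axioms:
            if sup == candidate and sub not in found:
                found.append(sub)
    return found


def rewrite(class_atoms, axioms):
    """Expand a conjunction of class atoms (variable, class) into a union of conjunctions."""
    alternatives = [[(var, c) for c in subclasses(cls, axioms)]
                    for var, cls in class_atoms]
    # One disjunct per combination of alternatives, one alternative per atom.
    return [list(combo) for combo in product(*alternatives)]


if __name__ == "__main__":
    query = [("?p", ":Patient"), ("?c", ":HeartCondition")]
    for disjunct in rewrite(query, SUBCLASS_OF):
        print(" ^ ".join(f"{cls}({var})" for var, cls in disjunct))
```

The number of disjuncts is the product of the number of alternatives per atom (here 2 × 2 = 4), which gives a feel for how quickly a rewriting can grow as atoms are added.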

Page 19: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

Query Rewriting: An example

SQL query

SELECT tp.id AS p, tp.name AS name, tp.ssn AS ssn
FROM patient tp JOIN condition tc ON tp.id = tc.patient_id
WHERE tc.c_id = 3333 AND tp.age >= 21 AND tp.age <= 50

Answer

?p      ?name   ?ssn
12345   John    xxx-999

“Fast execution even in the presence of millions of assertions”
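As a sanity check, the SQL query can be run against the example tables. The sketch below uses an in-memory SQLite database; the column types and the second patient row (id 67890) are made up for illustration and only serve to show the age filter at work. The query is written with the `tc` alias in the WHERE clause and projects `ssn`, so that the result matches the answer columns ?p, ?name, ?ssn. `run_example` is a hypothetical helper name.

```python
import sqlite3

# SQL produced for the example query, written with the tc alias and the ssn column.
SQL = """
SELECT tp.id AS p, tp.name AS name, tp.ssn AS ssn
FROM patient tp JOIN condition tc ON tp.id = tc.patient_id
WHERE tc.c_id = 3333 AND tp.age >= 21 AND tp.age <= 50
"""


def run_example():
    """Load the example tables and return the rows answered by SQL."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, ssn TEXT);
        CREATE TABLE condition (patient_id INTEGER REFERENCES patient(id), c_id INTEGER);
        INSERT INTO patient VALUES (12345, 'John', 37, 'xxx-999');
        INSERT INTO condition VALUES (12345, 3333);
        -- Made-up row outside the age range, not part of the slides.
        INSERT INTO patient VALUES (67890, 'Jane', 62, 'yyy-000');
        INSERT INTO condition VALUES (67890, 3333);
    """)
    return conn.execute(SQL).fetchall()


if __name__ == "__main__":
    for row in run_example():
        print(row)
```

No triples are materialized at any point: the answer comes directly from the relational engine, which is the behaviour the rewriting approach aims for.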

Page 20: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

That Simple?

• Warning: Query rewritings can easily grow exponentially.
• Effective query rewriting requires:
  • A highly efficient rewriting algorithm that is able to detect redundancy
  • Highly efficient SQL generation:
    • Detect redundant SQL (w.r.t. constraints and mappings)
    • Optimize individual SQL queries (w.r.t. constraints and mappings)
    • Generate optimal SQL (w.r.t. the database engine)
    • Deal with the impedance mismatch (URIs and literals vs. data values)
  • Database engine tuning (indexing, buffers, disk, etc.)
• Effective query rewriting gives you:
  • Fast system initialization
  • Small footprint
  • Fast query execution

Page 21: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

Efficient Languages (for pure query rewriting)

• RDFS, DL-Lite, OWL 2 QL
• Datalog±
• DL-Lite/OWL 2 QL/Datalog± fragments of SWRL

Promising Languages (for combined approaches)

• EL++ and OWL 2 EL
• OWL-Horst and OWL 2 RL
• SWRL with limited recursion

Page 22: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

SYSTEMS OBDALib, OBDA Plugin for Protégé 4

Page 23: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

OBDA as an Architecture

[Diagram. Labels: Ontology, Reasoner, OBDA Model, Source, Application, Communication, Inputs]

Page 24: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

OBDALib

A Java library for:
• OBDA Model creation and manipulation
• OBDA Model persistence
• Interfaces for OBDA-capable reasoners
• SQL parsing and Datalog translation
• RDBMS metadata extraction libraries
• OBDA model materialization

In the near future:
• Automatic OBDA model generation (compatible with W3C’s RDB2RDF direct mapping)
• Support for W3C’s R2RML syntax

Page 25: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

OBDA Plugin for Protégé 4

“A plugin to write and test OBDA models and to interact with OBDA-capable reasoners”

Page 26: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

OBDA Model tab and tools

Page 27: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

OBDA Model tab and tools

Page 28: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

OBDA Model synch

An EditorKitHook plugin to:
• Associate an OBDA model with the editor environment
• Synchronize OBDA models with OBDA-capable reasoners

Page 29: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

DataQuery Tab

Page 30: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

SYSTEMS Quest

Page 31: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

Quest

An OBDA-capable reasoner with a focus on fast and efficient query answering over very large ontologies and volumes of data.

Features:
• Support for RDFS, OWL 2 QL and DL-Lite
• SPARQL
• On-the-fly reasoning based on query rewriting
  • Read-only “Virtual OBDA” mode
  • Read/Write “Triple-store” mode
• Generation of highly optimized SQL
• OWLAPI 3 and Protégé support

Page 32: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

Quest in virtual mode

[Diagram. Labels: Ontology, Quest, OBDA Model, Source, Application, JDBC, Inputs]

MySQL, PostgreSQL, DB2 and Oracle

Page 33: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

Data integration with Quest in virtual mode

[Diagram. Labels: Ontology, Quest, OBDA Model, Database Federator, Application, JDBC, Inputs]

E.g., Teiid

Page 34: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

Read/Write triple-store mode

[Diagram. Labels: Ontology, Quest, Triples, JDBC Storage, Application, JDBC]

Storage is based on the Semantic Index technique (ISWC11, KR12)

A technique based on “smart index” computation that allows hierarchy inferences to be retrieved by means of interval queries (FAST SQL!)
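The idea of answering hierarchy inferences with interval queries can be illustrated with a small sketch. It is a simplification under loud assumptions and not the published Semantic Index technique: the class hierarchy is assumed to be a forest (each class has at most one parent), classes are numbered in depth-first pre-order so that every class and its descendants occupy one contiguous interval, and all names (`HIERARCHY`, `assign_intervals`, `build_index_db`, `instances_of`, the `class_assertion` table, individual <67890>) are hypothetical.

```python
import sqlite3

# Assumed forest-shaped hierarchy built from the example classes: parent -> children.
HIERARCHY = {
    ":Patient": [":CardiacArrestPatient"],
    ":HeartCondition": [":CardiacArrest"],
}
ROOTS = [":Patient", ":HeartCondition"]


def assign_intervals(hierarchy, roots):
    """Number classes in depth-first pre-order; return {class: (low, high)}."""
    intervals = {}
    counter = 0

    def visit(cls):
        nonlocal counter
        counter += 1
        low = counter
        for child in hierarchy.get(cls, []):
            visit(child)
        intervals[cls] = (low, counter)  # high = largest index in the subtree

    for root in roots:
        visit(root)
    return intervals


def build_index_db():
    """Store class assertions keyed by the index of their asserted class."""
    intervals = assign_intervals(HIERARCHY, ROOTS)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE class_assertion (individual TEXT, idx INTEGER)")
    conn.execute("CREATE INDEX class_idx ON class_assertion (idx)")
    assertions = [("<12345>", ":CardiacArrestPatient"),
                  ("<67890>", ":Patient")]  # second individual is made up
    conn.executemany("INSERT INTO class_assertion VALUES (?, ?)",
                     [(ind, intervals[cls][0]) for ind, cls in assertions])
    return conn, intervals


def instances_of(conn, intervals, cls):
    """All asserted or inferred instances of cls, via a single range query."""
    low, high = intervals[cls]
    rows = conn.execute(
        "SELECT individual FROM class_assertion "
        "WHERE idx BETWEEN ? AND ? ORDER BY individual", (low, high))
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn, intervals = build_index_db()
    print(instances_of(conn, intervals, ":Patient"))
```

In this sketch, asking for the instances of :Patient also returns <12345>, which was only asserted to be a :CardiacArrestPatient, without storing any inferred triple. Hierarchies that are not trees would need more than one interval per class, which this sketch does not cover.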

Page 35: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

Performance in triple-store mode: Resource Index Experiments

• Input:
  • Ontology: The asserted is-a relations in obs_relation (for all RI ontologies)
  • Data: The annotations for ClinicalTrials.gov
  • Queries, e.g.:

SELECT ?x WHERE { ?x a :DNA_Repair_Gene; a :Antigen_Gene; a :Cancer_Gene. }

Page 36: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

Performance in triple-store mode: Resource Index Experiments

• System setup costs:
  • Resource Index workflow:
    • Ontology Closure: X ?
    • CT annotation closure: 7 days (naïve), 40 mins (optimized)
    • Space requirements for CT: 16 GB + isa-closure: 70 GB
  • Using a naïve implementation of Quest’s reasoning technique for the RI:
    • Ontology Closure: 5 mins
    • CT annotation closure: none
    • Space requirements for CT: 16 GB
• Execution speed: roughly the same
• Potential to eliminate all _isa_annotation_tables and the closure of relation_isa.

Page 37: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

DEMO

Page 38: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

CONCLUSIONS

Page 39: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

Summary

• OBDA as an architecture
  • Benefits: Software Complexity, Optimization and On-the-fly query answering
• Basis of query rewriting in OBDA
• Introduced
  • OBDALib
  • OBDA Plugin for Protégé
  • Quest
• Briefly mentioned the performance advantages of Quest’s reasoning technique

Page 40: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

Where to go now?

• Resource index overhauling?
• Demos?
• More detail on the techniques?
• More details on the systems?
• Development and plugins for Protégé
• Projects?!
• You call it :)

Page 41: Stanford'12 Intro to Ontology Based Data Access for RDBMS through query rewriting

THANK YOU
