Global-as-View Ontology-Based Data Access for Relational Data · 2019-10-17 · Knowledge graphs...

People and Knowledge Networks · WeST

Department 4: Computer Science, Institute for Web Science and Technologies

Global-as-View Ontology-Based Data Access for Relational Data

Master's thesis for the degree of Master of Science (M.Sc.)

in the Computer Science degree program

submitted by Adrian Skubella

First reviewer: Prof. Dr. Steffen Staab, Institute for Web Science and Technologies

Second reviewer: M. Sc. Daniel Janke, Institute for Web Science and Technologies

Koblenz, September 2019

Declaration

I hereby confirm that I wrote this thesis independently and that I did not use any aids other than those stated, in particular no Internet sources that are not named in the list of references, and that this thesis has not previously been submitted in any other examination procedure. The submitted written version corresponds to the version on the electronic storage medium (CD-ROM).

Yes No

I agree to this thesis being made available in the library. ◻ ◻

I agree to the publication of this thesis on the Internet. ◻ ◻

The text of this thesis is available under a Creative Commons license (CC BY-SA 4.0). ◻ ◻

The source code is available under a GNU General Public License (GPLv3). ◻ ◻

The collected data are available under a Creative Commons license (CC BY-SA 4.0). ◻ ◻

(Place, Date) (Signature)


Note

• If you would like us to contact you for the graduation ceremony, please provide your personal e-mail address: ________

• If you would like us to send you an invite to join the WeST Alumni and Members group on LinkedIn, please provide your LinkedIn ID: ________


Summary

Ontology-Based Data Access (OBDA) is a technology for mapping different data sources onto a global schema, which can subsequently be queried. This technology can be used, for example, to integrate relational data into knowledge graphs. In this thesis, a formal framework for OBDA systems was developed. Based on this formal framework, the OBDA system UltrawrapOBDA was formalized. Furthermore, UltrawrapOBDA was reimplemented and extended, and the storage space required by the system was optimized. Results of the Texas Benchmark show that the reimplemented system is on average 3.16 times faster than UltrawrapOBDA and 1.87 times faster than the OBDA system Ontop. In addition, the execution times of the reimplementation and the optimized reimplementation are comparable, while the optimized system requires 55% less storage space than the unoptimized system.

Abstract

Ontology-Based Data Access (OBDA) is a paradigm with which different data sources can be mapped onto a global schema that can be queried. One use case of OBDA is the integration of relational data into knowledge graphs. In this thesis, a formal framework for OBDA systems is presented. Based on this framework, the OBDA system UltrawrapOBDA is formalized. UltrawrapOBDA has been reimplemented and extended, and the space consumption of the system has been optimized. Results of the Texas Benchmark show that the reimplemented system is on average 3.16 times faster than UltrawrapOBDA and on average 1.87 times faster than the state-of-the-art OBDA system Ontop. Furthermore, the execution times of the reimplemented system and the optimized reimplementation are comparable, while the space consumption of the optimized system is reduced by 55% compared to the unoptimized version.


Contents

1. Introduction
   1.1. Research Questions
   1.2. Methodology

2. Preliminaries
   2.1. Resource Description Framework
   2.2. Ontologies
   2.3. SPARQL Protocol and RDF Query Language
   2.4. Relational Data Model
   2.5. Relational Algebra
   2.6. Structured Query Language (SQL)

3. Ontology Based Data Access
   3.1. Mapping
   3.2. Formal Framework for Ontology Based Data Access

4. Ultrawrap
   4.1. Compilation Phase
   4.2. Tripleview Optimization
   4.3. Runtime Phase
   4.4. SQL Optimizations

5. Optimization
   5.1. View Optimization
   5.2. Support of Exclusive Superclass Instances

6. Evaluation
   6.1. Texas Benchmark
   6.2. Experimental Setup
   6.3. Evaluation Results
   6.4. Summary of Results

7. Related Work
   7.1. Ontop
   7.2. Optique
   7.3. Mastro
   7.4. D2RQ
   7.5. Morph-RDB

8. Conclusion and Future Research

Acknowledgments

A. Overview of symbols

1. Introduction

Knowledge graphs store knowledge about various domains. Examples of knowledge graphs are the open-source knowledge graph Wikidata¹, Microsoft Satori² and the Google Knowledge Graph³. The Google Knowledge Graph is used, for instance, to display information connected to the search term in a Google search. One way to store graphs is the triple-based Resource Description Framework (RDF), which represents directed, labelled graphs. Ontologies describe a schema for RDF data, and with their help new knowledge can be inferred from existing RDF data. For instance, consider an ontology that defines master student and bachelor student as subclasses of the class student, and consider that Alice is a master student and Bob is a bachelor student. With the help of the ontology, it can be inferred that Alice and Bob are also instances of student.

A lot of information is only available in relational databases, whose query language is the Structured Query Language (SQL). One way to integrate information stored in relational databases into knowledge graphs is to extract the relational data, translate it to RDF triples and store it in a database for RDF, called a triplestore. Such an approach is called extract, transform, load (ETL). Since the relational database is still used after an ETL process, a drawback of this strategy is that a second database system is needed, so the data is stored twice: once in the relational database and once in the triplestore. Furthermore, every time the data in the relational database is updated, it needs to be translated and stored in the triplestore again.

An alternative way of integrating relational data into graphs is Ontology-Based Data Access (OBDA). OBDA systems virtualize the information stored in the relations of a relational database as an RDF graph. This means that the relational schema of a relational database is mapped onto an ontology, which serves as global schema. Queries written in the SPARQL Protocol and RDF Query Language (SPARQL), the standard query language for RDF graphs, can then be issued against the ontology. These queries are translated to SQL queries based on the mappings. The SQL queries retrieve data from the relational database such that the query results are equivalent to the results obtained when the SPARQL query is executed on the actual RDF graph. Figure 1 gives an overview of an OBDA system.

Figure 1: Overview of an OBDA system. (Components: a relational database with its relational schema and relations; an ontology serving as schema of the virtualized graph; mappings through which the relations are virtualized as that graph; a user or application issuing SPARQL queries.)

In this thesis, a formal framework for OBDA systems that is independent of a particular implementation has been introduced. With this framework, the OBDA system UltrawrapOBDA [1] has been formalized. Furthermore, UltrawrapOBDA has been reimplemented and optimized. Contrary to the original system, the optimized reimplementation supports instances of superclasses that are not instances of any subclass of the superclass. In addition, the space required to use the system has been reduced. The implemented system has been benchmarked with the Texas Benchmark, and the results have been compared to the benchmark results of UltrawrapOBDA and the state-of-the-art OBDA system Ontop [2].

¹ https://www.wikidata.org, last retrieved 20.09.2019
² https://blogs.bing.com/search/2013/03/21/understand-your-world-with-bing/, last retrieved 20.09.2019
³ https://developers.google.com/knowledge-graph/

1.1. Research Questions

In order to reimplement and optimize UltrawrapOBDA, the following research questions have been answered in this thesis.

• Research question 1: How can UltrawrapOBDA be formally defined? Even though UltrawrapOBDA is presented in [1], it is only partly formally defined. Therefore, a formal definition of the complete OBDA system is needed.

• Research question 2: How can the space consumption of materialized views be reduced? UltrawrapOBDA uses views, which are virtual tables based on the result sets of SQL queries, to virtualize an RDF graph. To enhance the performance of the OBDA system, views are materialized, which means that the result sets of the SQL queries are physically stored. In these materialized views, data is stored redundantly; therefore, the space consumption of materialized views can potentially be reduced.

• Research question 3: How can instances of superclasses be used independently of their subclasses? UltrawrapOBDA creates a single SQL view for each class in an ontology. Views for superclasses are defined as the union of all of their subclass views. Consequently, each instance of a superclass also has to be an instance of at least one subclass of the superclass. However, RDF allows for instances of superclasses that are not instances of any subclass of the superclass. Consider an ontology that defines master student and bachelor student as subclasses of student. Furthermore, consider that Alice is an instance of master student and that Bob is an instance of bachelor student. In addition to Alice and Bob, the PhD student Carol exists. Since Carol is a PhD student, she is an instance of student, but not an instance of master student or bachelor student. The student view is defined as the union of the master student and the bachelor student views; it therefore contains Alice and Bob, but not Carol, who is an exclusive superclass instance. However, Carol should be included in the student view.

• Research question 4: How well does the reimplemented and optimized system perform? The reimplemented system and the optimized reimplementation should be evaluated using a benchmark for OBDA systems. The benchmark results should be used to evaluate the effect of the optimizations. To compare the performance of the new system with that of existing OBDA systems, the benchmark results should be compared to the benchmark results of UltrawrapOBDA and Ontop.
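The limitation behind research question 3 can be reproduced in a few lines of SQL. The following Python sketch uses an in-memory SQLite database; the table and view names (master_student, bachelor_student, student, exclusive_student) are hypothetical and are not the views UltrawrapOBDA generates. It defines the student view as the union of its subclass views and shows that Carol has nowhere to appear. The last view illustrates just one conceivable remedy, an extra table for exclusive superclass instances; it is not necessarily the approach described in section 5.2.

```python
import sqlite3

# Hypothetical schema for the running example: one table per subclass.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE master_student (name TEXT);
    CREATE TABLE bachelor_student (name TEXT);
    INSERT INTO master_student VALUES ('Alice');
    INSERT INTO bachelor_student VALUES ('Bob');

    -- Superclass view defined as the union of its subclass views.
    CREATE VIEW student AS
        SELECT name FROM master_student
        UNION
        SELECT name FROM bachelor_student;
""")

# Carol (a PhD student) belongs to no subclass table, so the view cannot contain her.
students = sorted(name for (name,) in conn.execute("SELECT name FROM student"))
print(students)  # ['Alice', 'Bob']

# One conceivable remedy (illustrative only): a separate table for exclusive
# superclass instances that is added to the union.
conn.executescript("""
    CREATE TABLE exclusive_student (name TEXT);
    INSERT INTO exclusive_student VALUES ('Carol');
    CREATE VIEW student_with_exclusive AS
        SELECT name FROM student
        UNION
        SELECT name FROM exclusive_student;
""")
all_students = sorted(
    name for (name,) in conn.execute("SELECT name FROM student_with_exclusive")
)
print(all_students)  # ['Alice', 'Bob', 'Carol']
```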

1.2. Methodology

In the first step, the subsets of RDF, ontologies, SPARQL and relational algebra needed for this thesis have been introduced in section 2. After that, a formal framework for OBDA systems that is independent of the actual OBDA system has been defined in section 3. With the help of this framework, UltrawrapOBDA has been formalized in section 4, which answers research question 1.

After all necessary parts of the OBDA system had been defined, the system was reimplemented. Then the optimizations were defined and implemented. Section 5.1 describes how certain attributes can be omitted in views to reduce the space needed by the OBDA system; this section addresses research question 2. Section 5.2 describes how the system supports exclusive superclass instances, which answers research question 3.

To answer research question 4, the implementation has been benchmarked with the Texas Benchmark [3], and the results have been compared to the benchmark results of the state-of-the-art OBDA system Ontop⁴ and to the benchmark results provided for UltrawrapOBDA. The benchmark results and their comparison are described in section 6. Section 7 summarizes related work in the field of OBDA, and section 8 summarizes the results of the thesis and presents possible future research.

⁴ https://ontop.inf.unibz.it/


2. Preliminaries

The OBDA system implemented in this thesis enables querying relational data with SPARQL based on mappings, which map relational data onto a given ontology. This section defines the data schemas as well as the query languages for RDF and relational data. Due to the considerable number of symbols introduced in this section, table 42 in appendix A summarizes them.

2.1. Resource Description Framework

The Resource Description Framework (RDF) represents a directed labelled graph. In the OBDA system implemented in this work, relational data is virtualized as an RDF graph, which allows relational data to be queried with the query language for RDF. An RDF graph consists of triples called RDF triples.⁵

Definition 1 (RDF Triple and RDF Graph)
IBL denotes the set I ∪ B ∪ L, where I, B and L are disjoint sets of IRIs, blank nodes and literals, respectively. An RDF triple tr is a triple (s, p, o) ∈ (I ∪ B) × I × IBL. In an RDF triple, s is called the subject, p the predicate and o the object of the triple. Furthermore, an RDF graph G is a set of RDF triples. [4]
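To make Definition 1 concrete, the following Python sketch encodes the three disjoint sets I, B and L as separate frozen dataclasses and checks the positional constraint (s, p, o) ∈ (I ∪ B) × I × IBL. The class and function names (IRI, BlankNode, Literal, is_rdf_triple) are illustrative choices and do not come from the thesis implementation.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical_form: str


# IBL = I ∪ B ∪ L: a term is an IRI, a blank node or a literal.
Term = Union[IRI, BlankNode, Literal]


def is_rdf_triple(s: Term, p: Term, o: Term) -> bool:
    """Check (s, p, o) ∈ (I ∪ B) × I × IBL from Definition 1."""
    return (
        isinstance(s, (IRI, BlankNode))  # subject: IRI or blank node
        and isinstance(p, IRI)  # predicate: IRI only
        and isinstance(o, (IRI, BlankNode, Literal))  # object: any term
    )


UNI = "http://www.university.com/"
RDF_TYPE = IRI("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")

# The triple of Listing 1.
alice_is_master = (IRI(UNI + "Alice"), RDF_TYPE, IRI(UNI + "MasterStudent"))
# Not an RDF triple: a literal may not appear in subject position.
literal_subject = (Literal("Computer Science"), IRI(UNI + "field"), IRI(UNI + "Alice"))

print(is_rdf_triple(*alice_is_master))  # True
print(is_rdf_triple(*literal_subject))  # False

# An RDF graph G is a set of RDF triples; frozen dataclasses are hashable.
G = {alice_is_master}
```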

Example 1
Listing 1 shows a single RDF triple in the N-Triples format.⁶

<http://www.university.com/Alice> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.university.com/MasterStudent> .

Listing 1: A single RDF triple in the N-Triples format.

A graphical representation of the triple is depicted in figure 2.

⁵ https://www.w3.org/TR/2014/NOTE-rdf11-primer-20140624/, last retrieved 28.03.2019
⁶ https://www.w3.org/TR/n-triples/, last retrieved 28.03.2019


Figure 2: Graphical representation of an RDF triple (subject http://www.university.com/Alice, edge labelled http://www.w3.org/1999/02/22-rdf-syntax-ns#type, object http://www.university.com/MasterStudent).

Example 2
Listing 2 gives an example of an RDF graph written in the N-Triples format. The data holds information about two students at a university, namely Alice and Bob. It states that Alice is a master student and that Bob is a bachelor student. Furthermore, it states that Alice and Bob study computer science. A graphical representation of the resulting RDF graph is depicted in figure 3. In this RDF graph, "Computer Science" is a literal, which is illustrated in figure 3 by the rectangular shape of the vertex.


<http://www.university.com/Alice> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.university.com/MasterStudent> .
<http://www.university.com/Alice> <http://www.university.com/field> "Computer Science" .
<http://www.university.com/Bob> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.university.com/BachelorStudent> .
<http://www.university.com/Bob> <http://www.university.com/field> "Computer Science" .

Listing 2: RDF data written in the N-Triples format.


Figure 3: Graphical representation of an RDF graph (vertices http://www.university.com/Alice and http://www.university.com/Bob, linked to http://www.university.com/MasterStudent and http://www.university.com/BachelorStudent respectively, and each linked via http://www.university.com/field to the literal "Computer Science", drawn as a rectangle).
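Because an RDF graph is just a set of triples, a document such as listing 2 can be loaded into a plain Python set. The parser below is a deliberately simplified sketch, not a conforming N-Triples parser: it assumes one triple per line and supports only IRIs, blank nodes and plain string literals without escapes, language tags or datatypes. For real data, an established RDF library is the safer choice.

```python
import re

# One N-Triples term: an IRI <...>, a blank node _:label, or a plain "..." literal.
TERM = r'(<[^>]*>|_:\S+|"[^"]*")'
STATEMENT = re.compile(rf"^\s*{TERM}\s+{TERM}\s+{TERM}\s*\.\s*$")


def parse_ntriples(text: str) -> set[tuple[str, str, str]]:
    """Parse simplified N-Triples into an RDF graph, i.e. a set of triples.

    Terms keep their N-Triples spelling, e.g. '<http://...>' or '"Computer Science"'.
    Escapes, language tags and datatypes are deliberately not supported.
    """
    graph = set()
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        match = STATEMENT.match(line)
        if match is None:
            raise ValueError(f"line {line_no}: unsupported statement: {line!r}")
        graph.add(match.groups())
    return graph


LISTING_2 = """
<http://www.university.com/Alice> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.university.com/MasterStudent> .
<http://www.university.com/Alice> <http://www.university.com/field> "Computer Science" .
<http://www.university.com/Bob> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.university.com/BachelorStudent> .
<http://www.university.com/Bob> <http://www.university.com/field> "Computer Science" .
"""

G = parse_ntriples(LISTING_2)
print(len(G))  # 4

# Objects that are literals (drawn as rectangles in figure 3).
literals = {o for (_, _, o) in G if o.startswith('"')}
print(literals)  # {'"Computer Science"'}
```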

2.2. Ontologies

The term ontology is overloaded because it has different meanings in different fields of research. [5] defines an ontology in the context of computer science as "a means to formally model the structure of a system, i.e., the relevant entities and relations that emerge from its observation, and which are useful to our purpose". In the case of RDF, ontologies describe the schema of RDF data. Ontologies define classes or concepts and the relations between them. Furthermore, ontologies often define class hierarchies.

In this work, ontologies provide a global schema onto which relational data is mapped. SPARQL queries are written against this global schema. Based on the mappings, these SPARQL queries are translated into queries over the underlying relational data such that they retrieve the desired information from it.

Ontologies can be represented with the Web Ontology Language (OWL), which can be serialized as RDF data. This work considers the subset of OWL defined in definition 2. To distinguish triples that belong to an ontology from those that are not part of it, ontological triples and assertional triples are defined hereinafter. These definitions are based on [1] and [6].


Definition 2 (Ontological Terms)
The set T_ontological = {subClassOf, subProperty, domain, range, type, equivalentClass, equivalentProperty, inverse, symmetricProperty} is the set of ontological terms.

For simplicity the full IRIs of ontological terms are omitted.

Definition 3 (Ontological Triples)
An RDF triple (s, p, o) is an ontological triple if

1) s ∈ (I ∖ T_ontological) and
2) either p ∈ (T_ontological ∖ {type}) and o ∈ (I ∖ T_ontological),
   or p = type and o = symmetricProperty.

Definition 4 (Assertional Triples)
An RDF triple is assertional if it is not ontological.

Based on the definition of ontological triples, an ontology can be defined as follows.

Definition 5 (Ontology)
An ontology O is a set of ontological triples.
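Definitions 3 to 5 translate directly into a classification function. In the following Python sketch, ontological terms are written by their local names (the definitions above omit the full IRIs as well). As an assumption of the sketch, literals are recognized by surrounding double quotes and blank nodes by a leading "_:"; the property knows is a hypothetical example of a symmetric property.

```python
# Local names stand in for the full IRIs of the ontological terms (Definition 2).
T_ONTOLOGICAL = {
    "subClassOf", "subProperty", "domain", "range", "type",
    "equivalentClass", "equivalentProperty", "inverse", "symmetricProperty",
}


def is_iri(term: str) -> bool:
    # Sketch assumption: literals are quoted, blank nodes start with "_:",
    # and every other string is treated as an IRI.
    return not term.startswith('"') and not term.startswith("_:")


def is_ontological(s: str, p: str, o: str) -> bool:
    """Definition 3."""
    if not (is_iri(s) and s not in T_ONTOLOGICAL):  # condition 1)
        return False
    if p in T_ONTOLOGICAL - {"type"}:  # first case of condition 2)
        return is_iri(o) and o not in T_ONTOLOGICAL
    return p == "type" and o == "symmetricProperty"  # second case of condition 2)


def is_assertional(s: str, p: str, o: str) -> bool:
    """Definition 4: every triple that is not ontological is assertional."""
    return not is_ontological(s, p, o)


triples = [
    ("MasterStudent", "subClassOf", "Student"),  # ontological
    ("knows", "type", "symmetricProperty"),  # ontological (hypothetical property)
    ("Alice", "type", "MasterStudent"),  # assertional
    ("Alice", "field", '"Computer Science"'),  # assertional
]

# Definition 5: an ontology is a set of ontological triples.
ontology = {t for t in triples if is_ontological(*t)}
print(len(ontology))  # 2
print(is_assertional("Alice", "type", "MasterStudent"))  # True
```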

The semantics τ_G(tr) of a triple tr in an ontology are presented in the following definition, which is based on [1].

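Definition 6 (Semantics of Ontological Triples)
The semantics of an ontological triple tr is the evaluation of the function τ_G(tr) over the RDF graph G.

$$
\begin{aligned}
&\text{a) } \tau_G(s, \mathit{subClassOf}, o) = \forall x \in IBL : (x, \mathit{type}, s) \in G \rightarrow (x, \mathit{type}, o) \in G\\
&\text{b) } \tau_G(s, \mathit{subProperty}, o) = \forall x, y \in IBL : (x, s, y) \in G \rightarrow (x, o, y) \in G\\
&\text{c) } \tau_G(s, \mathit{domain}, o) = \forall x, y \in IBL : (x, s, y) \in G \rightarrow (x, \mathit{type}, o) \in G\\
&\text{d) } \tau_G(s, \mathit{range}, o) = \forall x, y \in IBL : (x, s, y) \in G \rightarrow (y, \mathit{type}, o) \in G\\
&\text{e) } \tau_G(s, \mathit{equivalentClass}, o) = \forall x \in IBL : (x, \mathit{type}, s) \in G \leftrightarrow (x, \mathit{type}, o) \in G\\
&\text{f) } \tau_G(s, \mathit{equivalentProperty}, o) = \forall x, y \in IBL : (x, s, y) \in G \leftrightarrow (x, o, y) \in G\\
&\text{g) } \tau_G(s, \mathit{inverse}, o) = \forall x, y \in IBL : (x, s, y) \in G \leftrightarrow (y, o, x) \in G\\
&\text{h) } \tau_G(s, \mathit{type}, \mathit{symmetricProperty}) = \forall x, y \in IBL : (x, s, y) \in G \rightarrow (y, s, x) \in G
\end{aligned}
$$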

Note that additional triples can be inferred besides those defined in definition 6. This work deals only with the subset of inferred triples considered useful: an inferred triple is considered useful if it is inferred by one of the rules presented in definition 6. Furthermore, an IRI i is called an instance of a class c if there exists an RDF triple (i, type, c).


Example 3
Listing 3 gives an example of an ontology. This ontology defines that each instance of BachelorStudent or MasterStudent is also an instance of Student, because the former two classes are subclasses of Student. When the RDF dataset shown in listing 2 and the ontology are combined, the triples shown in listing 4 can be inferred based on definition 6 a).

<http://www.university.com/BachelorStudent> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://www.university.com/Student> .
<http://www.university.com/MasterStudent> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://www.university.com/Student> .

Listing 3: An example of an OWL ontology.


<http://www.university.com/Alice> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.university.com/Student> .
<http://www.university.com/Bob> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.university.com/Student> .

Listing 4: Newly inferred triples based on an ontology.
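The rules of definition 6 can be applied mechanically until no new triple appears, which reproduces listing 4. The following Python sketch performs naive forward chaining to a fixpoint over triples whose terms are written by local names instead of full IRIs. It only illustrates the semantics; the function names are hypothetical, and the sketch does not claim to reflect how UltrawrapOBDA or the reimplementation handle inference.

```python
Triple = tuple[str, str, str]


def apply_rules(onto: Triple, t: Triple) -> set[Triple]:
    """Triples that one ontological triple derives from one data triple (definition 6)."""
    s, p, o = onto
    x, q, y = t
    out: set[Triple] = set()
    if p == "subClassOf" and q == "type" and y == s:  # a)
        out.add((x, "type", o))
    elif p == "subProperty" and q == s:  # b)
        out.add((x, o, y))
    elif p == "domain" and q == s:  # c)
        out.add((x, "type", o))
    elif p == "range" and q == s:  # d)
        out.add((y, "type", o))
    elif p == "equivalentClass" and q == "type" and y in (s, o):  # e), both directions
        out.add((x, "type", o if y == s else s))
    elif p == "equivalentProperty" and q in (s, o):  # f), both directions
        out.add((x, o if q == s else s, y))
    elif p == "inverse" and q in (s, o):  # g), both directions
        out.add((y, o if q == s else s, x))
    elif p == "type" and o == "symmetricProperty" and q == s:  # h)
        out.add((y, s, x))
    return out


def materialize(graph: set[Triple], ontology: set[Triple]) -> set[Triple]:
    """Naive forward chaining: apply all rules until a fixpoint is reached."""
    closure = set(graph)
    while True:
        derived = {d for onto in ontology for t in closure for d in apply_rules(onto, t)}
        if derived <= closure:  # nothing new: fixpoint reached
            return closure
        closure |= derived


ontology = {  # listing 3, with local names
    ("BachelorStudent", "subClassOf", "Student"),
    ("MasterStudent", "subClassOf", "Student"),
}
data = {  # listing 2, with local names
    ("Alice", "type", "MasterStudent"),
    ("Alice", "field", '"Computer Science"'),
    ("Bob", "type", "BachelorStudent"),
    ("Bob", "field", '"Computer Science"'),
}

inferred = materialize(data, ontology) - data
print(sorted(inferred))  # [('Alice', 'type', 'Student'), ('Bob', 'type', 'Student')]
```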

2.3. SPARQL Protocol and RDF Query Language

The SPARQL Protocol and RDF Query Language (SPARQL) is the standard query language for RDF data. This subsection introduces the subset of SPARQL needed for this thesis. The definitions for SPARQL are based on the W3C recommendation for SPARQL 1.1 [7] and on [8].

Example 4
Listing 5 shows a simple SPARQL query that retrieves the field in which Alice is pursuing her master's degree. The query uses so-called prefixes, which define abbreviations that shorten IRIs in a query. For instance, PREFIX uni: <http://www.university.com/> defines that uni:Alice stands for <http://www.university.com/Alice>.


PREFIX uni: <http://www.university.com/>

SELECT ?field WHERE {
  uni:Alice uni:field ?field .
}

Listing 5: SPARQL query retrieving the field Alice studies in.

When this query is issued against the graph depicted in figure 3, "Computer Science" is bound to the variable ?field. The variable binding that maps ?field to "Computer Science" is then returned, because ?field is stated after the SELECT keyword in the query.

Syntax

The following definition introduces the syntax of SPARQL. In SPARQL queries, graph patterns are used to define which data should be retrieved.

Definition 7 (Graph Pattern and Triple Pattern)
A tuple tp ∈ (IBL ∪ V) × (I ∪ V) × (IBL ∪ V), where V denotes the set of variables, is a graph pattern and is called a triple pattern. The set of triple patterns is denoted by TP. With graph patterns P, P1 and P2:

{P} is a graph pattern.
P1 . P2 is a graph pattern and is called join.
P1 OPTIONAL P2 is a graph pattern and is called optional.
P1 UNION P2 is a graph pattern and is called union.

Furthermore, var(tp) is the set of variables that occur in tp.

Example 5
An example of a triple pattern is uni:Alice uni:field ?field in the SPARQL query depicted in listing 5. Furthermore, the SPARQL queries depicted in listings 6, 7 and 8 contain examples of join, optional and union graph patterns, respectively.

PREFIX uni: <http://www.university.com/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?field ?type WHERE {
  uni:Alice uni:field ?field .
  uni:Alice rdf:type ?type
}

Listing 6: Query containing a join graph pattern.



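PREFIX uni: <http://www.university.com/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?field WHERE {
  uni:Alice uni:field ?field
  OPTIONAL { uni:Alice rdf:type uni:BachelorStudent }
}

Listing 7: Query containing an optional graph pattern.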


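PREFIX uni: <http://www.university.com/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?type WHERE {
  { uni:Alice rdf:type ?type }
  UNION
  { uni:Bob rdf:type ?type }
}

Listing 8: Query containing a union graph pattern.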

Graph patterns are used within so-called SELECT queries in SPARQL. The queries depicted in listings 5, 6, 7 and 8 are SELECT queries.

Definition 8 (SELECT Query)
If P is a graph pattern and V is a set of variables, then SELECT V WHERE P and SELECT * WHERE P are SELECT queries.

Semantics

To retrieve data from an RDF graph with a SELECT query, so-called variable bindings are used.

Definition 9 (Variable Bindings)
The partial function µ : V → IBL is called a variable binding. For a triple pattern tp, µ(tp) denotes the triple obtained when all variables in tp are replaced according to µ. The domain dom(µ) of a variable binding µ is the set of variables on which µ is defined.

Example 6
An example of a variable binding is µ = {(?type, uni:MasterStudent)}. In this example, ?type is the variable on which the binding is defined and uni:MasterStudent is the value bound to ?type. Consider the triple pattern tp = (uni:Alice, rdf:type, ?type). The triple obtained by µ(tp) is (uni:Alice, rdf:type, uni:MasterStudent).

Variable bindings can be compatible as described in the following definition.


Definition 10 (Compatible Variable Bindings)
Two variable bindings µ1 and µ2 are compatible if ∀x ∈ dom(µ1) ∩ dom(µ2) : µ1(x) = µ2(x).

Example 7
Consider the variable bindings µ1 = {(?type, uni:MasterStudent), (?person, uni:Alice)} and µ2 = {(?type, uni:MasterStudent), (?field, "Computer Science")}. The shared domain of the two variable bindings is dom(µ1) ∩ dom(µ2) = {?type}. Since µ1(?type) = µ2(?type) = uni:MasterStudent, the two variable bindings are compatible.

The join, union, difference and left outer join of two sets of variable bindings are defined as follows.

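Definition 11 (Join, Union and Difference of Sets of Variable Bindings)
Let Ω1 and Ω2 be two sets of variable bindings; then the join (1), union (2), difference (3) and left outer join (4) of these sets are defined as follows:

$$
\begin{aligned}
&(1)\quad \Omega_1 \bowtie \Omega_2 = \{\mu_1 \cup \mu_2 \mid \mu_1 \in \Omega_1,\ \mu_2 \in \Omega_2 \text{ and } \mu_1, \mu_2 \text{ are compatible}\}\\
&(2)\quad \Omega_1 \cup \Omega_2 = \{\mu \mid \mu \in \Omega_1 \text{ or } \mu \in \Omega_2\}\\
&(3)\quad \Omega_1 \setminus \Omega_2 = \{\mu \in \Omega_1 \mid \forall \mu' \in \Omega_2 : \mu \text{ and } \mu' \text{ are not compatible}\}\\
&(4)\quad \Omega_1 \mathbin{\text{⟕}} \Omega_2 = (\Omega_1 \bowtie \Omega_2) \cup (\Omega_1 \setminus \Omega_2)
\end{aligned}
$$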
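The four operations of definition 11 can be sketched in Python by representing a variable binding µ as a frozenset of (variable, value) pairs, so that bindings can be elements of a set Ω. The helper names below (binding, compatible, join, union, minus, left_outer_join) are illustrative choices, not part of the thesis. The usage lines check example 7 and show that the left outer join keeps µ1 when it has no compatible partner.

```python
def binding(pairs: dict[str, str]) -> frozenset:
    """Build a hashable binding from a dict such as {"?type": "uni:MasterStudent"}."""
    return frozenset(pairs.items())


def compatible(mu1: frozenset, mu2: frozenset) -> bool:
    """Definition 10: the bindings agree on every variable they share."""
    d1, d2 = dict(mu1), dict(mu2)
    return all(d1[x] == d2[x] for x in d1.keys() & d2.keys())


def join(omega1: set, omega2: set) -> set:
    """(1) Ω1 ⋈ Ω2: merge every compatible pair of bindings."""
    return {mu1 | mu2 for mu1 in omega1 for mu2 in omega2 if compatible(mu1, mu2)}


def union(omega1: set, omega2: set) -> set:
    """(2) Ω1 ∪ Ω2."""
    return omega1 | omega2


def minus(omega1: set, omega2: set) -> set:
    """(3) Ω1 ∖ Ω2: bindings of Ω1 that are compatible with no binding of Ω2."""
    return {mu for mu in omega1 if not any(compatible(mu, nu) for nu in omega2)}


def left_outer_join(omega1: set, omega2: set) -> set:
    """(4) Ω1 ⟕ Ω2 = (Ω1 ⋈ Ω2) ∪ (Ω1 ∖ Ω2)."""
    return union(join(omega1, omega2), minus(omega1, omega2))


# Example 7
mu1 = binding({"?type": "uni:MasterStudent", "?person": "uni:Alice"})
mu2 = binding({"?type": "uni:MasterStudent", "?field": '"Computer Science"'})
print(compatible(mu1, mu2))  # True

merged = binding({"?type": "uni:MasterStudent", "?person": "uni:Alice",
                  "?field": '"Computer Science"'})
print(join({mu1}, {mu2}) == {merged})  # True

# mu3 disagrees with mu1 on ?type, so the left outer join keeps mu1 unchanged.
mu3 = binding({"?type": "uni:BachelorStudent"})
print(left_outer_join({mu1}, {mu3}) == {mu1})  # True
```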

Based on these definitions, the evaluation of a graph pattern can be defined. It is denoted by the function ⟦·⟧_G, where G is the graph on which the graph pattern is evaluated.

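Definition 12 (Evaluation of Graph Pattern)
Let G be an RDF graph, let tp be a triple pattern and let P1 and P2 be graph patterns; then the evaluation ⟦P⟧_G is defined as:

$$
\begin{aligned}
[\![tp]\!]_G &= \{\mu \mid \mathit{dom}(\mu) = \mathit{var}(tp) \text{ and } \mu(tp) \in G\}\\
[\![P_1 . P_2]\!]_G &= [\![P_1]\!]_G \bowtie [\![P_2]\!]_G\\
[\![P_1 \text{ OPTIONAL } P_2]\!]_G &= [\![P_1]\!]_G \mathbin{\text{⟕}} [\![P_2]\!]_G\\
[\![P_1 \text{ UNION } P_2]\!]_G &= [\![P_1]\!]_G \cup [\![P_2]\!]_G
\end{aligned}
$$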

With this evaluation of graph patterns, the evaluation of a SELECT query is defined as follows.

Definition 13 (Evaluation of SELECT Query)
The evaluation ⟦Q⟧_G of a query Q of the form SELECT V WHERE P on an RDF graph G is the set of all projections µ|_V of bindings µ ∈ ⟦P⟧_G to V, where the projection µ|_V is the binding that coincides with µ on V and is undefined elsewhere. The evaluation of SELECT * WHERE P is equal to the evaluation of SELECT V WHERE P, where V = var(P) and var(P) denotes the set of all variables in P.
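Putting definitions 9 to 13 together, the following Python sketch evaluates graph patterns bottom-up over the graph of listing 2, written with prefixed names. Patterns are nested tuples such as ("tp", s, p, o) or ("union", P1, P2), an encoding chosen only for this sketch; it ignores FILTER, blank-node semantics and SPARQL's multiset semantics, since the definitions above work with sets. The usage lines evaluate the queries of listings 5, 7 and 8.

```python
def is_variable(term: str) -> bool:
    return term.startswith("?")


def compatible(mu1: frozenset, mu2: frozenset) -> bool:
    """Definition 10."""
    d1, d2 = dict(mu1), dict(mu2)
    return all(d1[x] == d2[x] for x in d1.keys() & d2.keys())


def match(tp: tuple, triple: tuple):
    """Return µ with dom(µ) = var(tp) and µ(tp) = triple, or None if no such µ exists."""
    mu = {}
    for pattern_term, value in zip(tp, triple):
        if is_variable(pattern_term):
            if mu.setdefault(pattern_term, value) != value:
                return None  # one variable would need two different values
        elif pattern_term != value:
            return None  # constant term does not match
    return frozenset(mu.items())


def evaluate(pattern: tuple, graph: set) -> set:
    """Definition 12: ⟦P⟧_G for patterns encoded as nested tuples."""
    kind = pattern[0]
    if kind == "tp":
        candidates = (match(pattern[1:], triple) for triple in graph)
        return {mu for mu in candidates if mu is not None}
    left, right = evaluate(pattern[1], graph), evaluate(pattern[2], graph)
    joined = {m1 | m2 for m1 in left for m2 in right if compatible(m1, m2)}
    if kind == "join":
        return joined
    if kind == "union":
        return left | right
    if kind == "optional":  # (Ω1 ⋈ Ω2) ∪ (Ω1 ∖ Ω2)
        return joined | {m1 for m1 in left if not any(compatible(m1, m2) for m2 in right)}
    raise ValueError(f"unknown pattern kind: {kind}")


def select(variables: set, pattern: tuple, graph: set) -> set:
    """Definition 13: project each binding of ⟦P⟧_G onto the selected variables."""
    return {frozenset((v, val) for v, val in mu if v in variables)
            for mu in evaluate(pattern, graph)}


# The graph of listing 2, abbreviated with the prefixes uni: and rdf:.
G = {
    ("uni:Alice", "rdf:type", "uni:MasterStudent"),
    ("uni:Alice", "uni:field", '"Computer Science"'),
    ("uni:Bob", "rdf:type", "uni:BachelorStudent"),
    ("uni:Bob", "uni:field", '"Computer Science"'),
}

field_of_alice = ("tp", "uni:Alice", "uni:field", "?field")
q5 = select({"?field"}, field_of_alice, G)
q7 = select({"?field"}, ("optional", field_of_alice,
                         ("tp", "uni:Alice", "rdf:type", "uni:BachelorStudent")), G)
q8 = select({"?type"}, ("union", ("tp", "uni:Alice", "rdf:type", "?type"),
                        ("tp", "uni:Bob", "rdf:type", "?type")), G)

print([dict(mu) for mu in q5])  # [{'?field': '"Computer Science"'}]
print(q7 == q5)  # True: Alice is not a bachelor student, so OPTIONAL adds nothing
print(sorted(dict(mu)["?type"] for mu in q8))  # ['uni:BachelorStudent', 'uni:MasterStudent']
```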


2.4. Relational Data Model

Relational database systems are the backbone of many websites and software systems [9]. This section presents the basics of relational databases and of the underlying relational model that are needed for this thesis. These basics are based on [10] and [1]. In relational databases, data is stored as relations.

Example 8
Table 1 shows an example of a relation schema together with a relation. The relation schema defines various attributes, depicted as the column names in table 1, namely ID, Name and Field, with the domains integer, character string and character string, respectively.

STUDENT

| ID | Name  | Field            |
|----|-------|------------------|
| 1  | Alice | Computer Science |
| 2  | Bob   | Computer Science |

Table 1: Table depicting relation schema and relation.

Definition 14 (Domain)
A domain D is a set of atomic values.

Since NULL values often appear in real-world datasets, the NULL value is defined hereinafter in the context of relational algebra.

Definition 15 (NULL)
NULL ∉ D is the keyword that denotes the absence of a value.

Definition 16 (Relation Schema)
A relation schema R(A1, A2, ..., An) is the schema of a single relation, where A1, ..., An are the attributes of the relation schema. The arity of the relation schema is n.

Based on the relation schema, relations can be defined. Informally speaking, a relation is the set of entries in a table defined by the relation schema.

Definition 17 (Relation)
A relation r of a relation schema R(A1, A2, ..., An) is a set of tuples r = {tu1, tu2, ..., tum}, where each tuple tu is an ordered list of values <v1, v2, ..., vn> with vi ∈ dom(Ai) ∪ {NULL}. The ith value of a tuple tu is denoted by tu[Ai]. Furthermore, att(r) denotes the set of attributes {A1, A2, ..., An} of r.

An example of a relation is the set of tuples r = {tu1, tu2}, where tu1 = <1, Alice, Computer Science> and tu2 = <2, Bob, Computer Science>, as depicted in table 1. Thus, tu1[ID] = 1 and tu2[Name] = Bob are examples of how values in tuples can be denoted.

So far, only the schema of a single relation has been discussed. A complete database also has a schema, called the relational schema.

Definition 18 (Relational Schema)
A relational schema S = {R1, R2, ..., Rn} of a database is a set of relation schemas. Each attribute Ai in Rj ∈ S has a domain D, denoted by dom(Ai).

Furthermore, relational schemas can be instantiated.

Definition 19 (Instance of Relational Schema)
An instance s = {r1, r2, ..., rn} of a relational schema S is a set of relations where, for each relation schema Ri ∈ S, a corresponding relation ri exists in s.

Based on an instance s of a relational schema, the instance of a relation schema R that is included in s can be written as R^s.

2.5. Relational Algebra

Query languages used to query relational data, such as the well-known Structured Query Language (SQL), are defined based on relational algebra. Relational algebra uses sets together with the union, the difference and the Cartesian product from set theory. The definitions in this section are those introduced in [1].

Syntax

Queries in relational algebra are written as relational algebra expressions. A relational algebra expression ϕ and its attributes att(ϕ) are defined hereinafter. In the following sections, S denotes a relational schema, s an instance of a relational schema S, R a relation schema and r an instance of a single relation schema R.

Definition 20 (Relation in Relational Algebra)
Let ϕ = R and R ∈ S. Then ϕ is a relational algebra expression over S such that att(ϕ) = att(R).

Definition 21 (NULL)
Let A be an attribute and ϕ = NULL_A; then ϕ is a relational algebra expression over S where att(ϕ) = {A}.

Definition 22 (Condition)
Let 𝒜 be a set of attribute names and let A ∈ 𝒜. Furthermore, let a ∈ D. Then a condition cond_𝒜 is of one of the forms:

A = a

A ≠ a

isNull(A)

isNotNull(A)

true

If cond1_𝒜 and cond2_𝒜 are conditions, then

cond1_𝒜 ∧ cond2_𝒜

and

cond1_𝒜 ∨ cond2_𝒜

are conditions.

Example 9
Consider the relation depicted in table 1. Let 𝒜 = att(STUDENT) = {ID, Name, Field}; then cond_{ID,Name,Field} = isNotNull(Name) is a condition.

Definition 23 (Selection)
Let ϕ1 be a relational algebra expression over S and let 𝒜 ⊆ att(ϕ1). Then the following expression ϕ2 is a relational algebra expression with att(ϕ2) = att(ϕ1):

ϕ2 = σ_{cond_𝒜}(ϕ1)

Example 10
For instance, σ_{ID=1}(STUDENT) is a selection on the relation depicted in table 1.

Definition 24 (Projection)
Let ϕ1 be a relational algebra expression over S with U ⊆ att(ϕ1). Let ϕ2 = π_U(ϕ1); then ϕ2 is a relational algebra expression over S and att(ϕ2) = U.

Example 11
An example of a projection on the relation STUDENT depicted in table 1 is π_{ID,Field}(STUDENT).
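Selection and projection can be sketched in Python with a relation as a list of dictionaries, where None plays the role of NULL. The semantics of conditions are not fixed by the syntax above, so the sketch assumes, as SQL does, that A = a and A ≠ a are never satisfied when A is NULL. Projection removes duplicate tuples because relations are sets. All function names are illustrative.

```python
from typing import Any, Callable, Optional

Row = dict[str, Optional[Any]]  # None stands for NULL
Condition = Callable[[Row], bool]

STUDENT: list[Row] = [
    {"ID": 1, "Name": "Alice", "Field": "Computer Science"},
    {"ID": 2, "Name": "Bob", "Field": "Computer Science"},
]


# Conditions of definition 22; comparisons with NULL are treated as unsatisfied (assumption).
def eq(attr: str, value: Any) -> Condition:
    return lambda row: row[attr] is not None and row[attr] == value


def neq(attr: str, value: Any) -> Condition:
    return lambda row: row[attr] is not None and row[attr] != value


def is_null(attr: str) -> Condition:
    return lambda row: row[attr] is None


def is_not_null(attr: str) -> Condition:
    return lambda row: row[attr] is not None


def both(c1: Condition, c2: Condition) -> Condition:  # c1 ∧ c2
    return lambda row: c1(row) and c2(row)


def either(c1: Condition, c2: Condition) -> Condition:  # c1 ∨ c2
    return lambda row: c1(row) or c2(row)


def selection(cond: Condition, relation: list[Row]) -> list[Row]:
    """σ_cond: keep the tuples that satisfy the condition."""
    return [row for row in relation if cond(row)]


def projection(attrs: list[str], relation: list[Row]) -> list[Row]:
    """π_U: keep only the attributes in U and drop duplicates (relations are sets)."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attrs, key)))
    return result


print(selection(eq("ID", 1), STUDENT))
# [{'ID': 1, 'Name': 'Alice', 'Field': 'Computer Science'}]
print(projection(["ID", "Field"], STUDENT))
# [{'ID': 1, 'Field': 'Computer Science'}, {'ID': 2, 'Field': 'Computer Science'}]
print(projection(["Field"], STUDENT))
# [{'Field': 'Computer Science'}]
```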

Definition 25 (Coalesce)
Let ϕ1 be a relational algebra expression over S, let A1, A2 ∈ att(ϕ1) and let Anew ∉ att(ϕ1); then ϕ2 = κ_{A1,A2,Anew}(ϕ1) is a relational algebra expression with att(ϕ2) = att(ϕ1) ∪ {Anew}.

Example 12
Consider the relation depicted in table 2. An example of a coalesce is the relational algebra expression κ_{PostalCode,City,Location}(ADDRESS).


ADDRESS

| Name  | City    | PostalCode |
|-------|---------|------------|
| Alice | Koblenz | 56073      |
| Bob   | Cologne | NULL       |
| Carol | NULL    | NULL       |

Table 2: Table depicting a relation that stores cities and postal codes of persons.
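The semantics of coalesce are not given in this section. Assuming that κ_{A1,A2,Anew} behaves like SQL's COALESCE, i.e. Anew takes the value of A1 if it is not NULL and otherwise the value of A2, example 12 corresponds to the SQL query below. The sketch runs it on table 2 in an in-memory SQLite database using Python's built-in sqlite3 module; the lower-case table name address is an illustrative choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE address (Name TEXT, City TEXT, PostalCode TEXT)")
conn.executemany(
    "INSERT INTO address VALUES (?, ?, ?)",
    [("Alice", "Koblenz", "56073"), ("Bob", "Cologne", None), ("Carol", None, None)],
)

# κ_{PostalCode,City,Location}(ADDRESS): Location is the first non-NULL of PostalCode, City.
rows = conn.execute(
    "SELECT Name, City, PostalCode, COALESCE(PostalCode, City) AS Location "
    "FROM address ORDER BY Name"
).fetchall()

for row in rows:
    print(row)
# ('Alice', 'Koblenz', '56073', '56073')
# ('Bob', 'Cologne', None, 'Cologne')
# ('Carol', None, None, None)
```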

Definition 26 (Rename of Attribute)
Let ϕ1 be a relational algebra expression over S. Furthermore, let A ∈ att(ϕ1) and let B ∉ att(ϕ1). If ϕ2 = ρ_{A→B}(ϕ1), then ϕ2 is a relational algebra expression over S with att(ϕ2) = (att(ϕ1) ∖ {A}) ∪ {B}.

Example 13 Considering the relation depicted in table 1, $\varrho_{Name \to FirstName}(STUDENT)$ is an example of a rename.

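Definition 27 (Union) Let $\varphi_1, \varphi_2$ be relational algebra expressions over $S$ with $\mathit{att}(\varphi_1) = \mathit{att}(\varphi_2)$, and let $\varphi_3 = \varphi_1 \cup \varphi_2$. Then $\varphi_3$ is a relational algebra expression over $S$ and $\mathit{att}(\varphi_3) = \mathit{att}(\varphi_1)$.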

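Example 14 Assume the relation called PERSON depicted in table 3. Since STUDENT and PERSON do not have the same attributes, Definition 27 does not allow the expression $STUDENT \cup PERSON$ directly. A union of the two relations can, however, be formed over their common attribute Name as $\pi_{Name}(STUDENT) \cup \pi_{Name}(PERSON)$.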

PERSON

| Name | City | Age |
|---|---|---|
| Alice | Koblenz | 22 |
| Bob | Cologne | 30 |
| Carol | Koblenz | 23 |

Table 3: Table depicting relation schema and relation for persons.

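Definition 28 (Outer Union) Let $\varphi_1, \varphi_2$ be relational algebra expressions over $S$ and let $\varphi_3 = \varphi_1 \uplus \varphi_2$. Then $\varphi_3$ is a relational algebra expression over $S$ and $\mathit{att}(\varphi_3) = \mathit{att}(\varphi_1) \cup \mathit{att}(\varphi_2)$.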

Example 15 The outer union of the relations STUDENT and PERSON is $STUDENT \uplus PERSON$.

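Definition 29 (Difference) Let $\varphi_1, \varphi_2$ be relational algebra expressions over $S$ with $\mathit{att}(\varphi_1) = \mathit{att}(\varphi_2)$, and let $\varphi_3 = \varphi_1 \setminus \varphi_2$. Then $\varphi_3$ is a relational algebra expression over $S$ and $\mathit{att}(\varphi_3) = \mathit{att}(\varphi_1)$.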

Example 16 Consider the relation STUDENT depicted in table 1 and the relation SINGLE_STUDENT depicted in table 4.


SINGLE_STUDENT

| ID | Name | Field |
|---|---|---|
| 2 | Bob | Computer Science |

Table 4: Table depicting relation schema and relation.

Example 17 The difference of the two relations STUDENT and SINGLE_STUDENT can be expressed as $STUDENT \setminus SINGLE\_STUDENT$.

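Definition 30 (Cross Join) Let $\varphi_1, \varphi_2$ be relational algebra expressions over $S$ with disjoint attribute sets, i.e. $\mathit{att}(\varphi_1) \cap \mathit{att}(\varphi_2) = \emptyset$, and let $\varphi_3 = \varphi_1 \times \varphi_2$. Then $\varphi_3$ is a relational algebra expression over $S$ and $\mathit{att}(\varphi_3) = \mathit{att}(\varphi_1) \cup \mathit{att}(\varphi_2)$.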

Example 18 Consider the relations STUDENT and CITY depicted in table 1 and table 5, respectively. The cross product of the two relations can be written as $STUDENT \times CITY$.

CITY

| CityName | PostalCode |
|---|---|
| Koblenz | 56068 |
| Cologne | 50667 |

Table 5: Table depicting relation storing city names and postal codes.

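Definition 31 (Theta Join) Let $\varphi_1, \varphi_2$ be relational algebra expressions over $S$ with $\mathit{att}(\varphi_1) \cap \mathit{att}(\varphi_2) = \emptyset$, let $\mathcal{A} \subseteq \mathit{att}(\varphi_1) \cup \mathit{att}(\varphi_2)$ and let $\varphi_3 = \varphi_1 \bowtie_{\mathit{cond}_{\mathcal{A}}} \varphi_2$. Then $\varphi_3$ is a relational algebra expression with $\mathit{att}(\varphi_3) = \mathit{att}(\varphi_1) \cup \mathit{att}(\varphi_2)$.
If no condition is given in a theta join $\varphi_1 \bowtie \varphi_2$, the theta join is equivalent to

$\varphi_1 \bowtie_{\mathit{true}} \varphi_2$.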

Example 19 Consider the relation shown in table 2 and the relation shown in table 6, which lists codes for missing information. A join with NULL in the join condition is $ADDRESS \bowtie_{PostalCode=NULL} CODES$.

CODES

| Code | Description |
|---|---|
| 1 | Postal Code |
| 2 | Incomplete |

Table 6: Relation holding information about codes and their meaning.


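Definition 32 (Left Outer Join) Let $\varphi_1, \varphi_2$ be relational algebra expressions over $S$ with $\mathit{att}(\varphi_1) \cap \mathit{att}(\varphi_2) = \emptyset$, let $\mathcal{A} \subseteq \mathit{att}(\varphi_1) \cup \mathit{att}(\varphi_2)$ and let $\varphi_3 = \varphi_1 ⟕_{\mathit{cond}_{\mathcal{A}}} \varphi_2$. Then $\varphi_3$ is a relational algebra expression over $S$ with $\mathit{att}(\varphi_3) = \mathit{att}(\varphi_1) \cup \mathit{att}(\varphi_2)$, called left outer join.
If no condition is given in a left outer join $\varphi_1 ⟕ \varphi_2$, the left outer join is equivalent to $\varphi_1 ⟕_{\mathit{true}} \varphi_2$.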

Example 20 An example of a left outer join of the relations STUDENT and GRADES, depicted in table 1 and table 7 respectively, is $STUDENT ⟕_{ID=StudentID} GRADES$.

GRADES

| StudentID | AverageGrade |
|---|---|
| 1 | 1.7 |

Table 7: Table depicting grades for students.

The attribute sets of two joined relations are not necessarily disjoint. In such cases the fully qualified name of an attribute can be used, which prefixes the attribute with its source relation. Consider the relation STUDENT depicted in table 1 and the relation PERSON depicted in table 3. Both relations have an attribute called Name. The fully qualified names of these attributes are STUDENT.Name and PERSON.Name, respectively.
If a relation is joined with itself (self join), the source relation no longer identifies an attribute unambiguously. In that case, the attribute can be renamed or the underlying relational algebra expression can be given a name.

Definition 33 (Naming of Relational Algebra Expression) Let $\varphi$ be a relational algebra expression and let $E$ be an arbitrary but unambiguous name for it. Then $\rho_E(\varphi)$ is a relational algebra expression such that each $A \in \mathit{att}(\varphi)$ can be addressed as $E.A$.

This definition does not change any value in the actual relation.

Example 21 Consider the relation depicted in table 1 and suppose it is to be joined with itself on the attribute ID. To keep the attributes unambiguous, the following relational algebra expression may be used:

$$STUDENT \bowtie_{STUDENT.ID = renamedSTUDENT.ID} \rho_{renamedSTUDENT}(STUDENT)$$

Although parentheses are not explicitly mentioned in the syntax definition, they are sometimes used for clarification.
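The attribute bookkeeping of the syntax definitions can be checked mechanically. The following Python sketch is only an illustration under our own modelling assumptions: the class names `Rel`, `Project`, `Rename` and `CrossJoin` are hypothetical, and only four of the expression forms are covered. It computes $\mathit{att}(\varphi)$ recursively and rejects expressions that violate a side condition, such as the disjointness required by the cross join.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Rel:
    """A relation name R together with att(R) (Definition 20)."""
    name: str
    attrs: FrozenSet[str]


@dataclass(frozen=True)
class Project:
    """pi_U(phi) (Definition 24)."""
    attrs: FrozenSet[str]
    child: "Expr"


@dataclass(frozen=True)
class Rename:
    """rename_{A -> B}(phi) (Definition 26)."""
    old: str
    new: str
    child: "Expr"


@dataclass(frozen=True)
class CrossJoin:
    """phi1 x phi2 (Definition 30)."""
    left: "Expr"
    right: "Expr"


Expr = Union[Rel, Project, Rename, CrossJoin]


def att(expr: Expr) -> FrozenSet[str]:
    """Compute att(phi) and check the side condition of each definition."""
    if isinstance(expr, Rel):
        return expr.attrs
    if isinstance(expr, Project):
        inner = att(expr.child)
        if not expr.attrs <= inner:
            raise ValueError("projection requires U to be a subset of att(phi)")
        return expr.attrs
    if isinstance(expr, Rename):
        inner = att(expr.child)
        if expr.old not in inner or expr.new in inner:
            raise ValueError("rename requires A in att(phi) and B not in att(phi)")
        return (inner - {expr.old}) | {expr.new}
    if isinstance(expr, CrossJoin):
        left, right = att(expr.left), att(expr.right)
        if left & right:
            raise ValueError(f"cross join needs disjoint attributes, shared: {sorted(left & right)}")
        return left | right
    raise TypeError(f"unsupported expression: {expr!r}")


STUDENT = Rel("STUDENT", frozenset({"ID", "Name", "Field"}))
PERSON = Rel("PERSON", frozenset({"Name", "City", "Age"}))
CITY = Rel("CITY", frozenset({"CityName", "PostalCode"}))

print(sorted(att(CrossJoin(STUDENT, CITY))))
print(sorted(att(Rename("Name", "FirstName", STUDENT))))
```

Joining STUDENT with PERSON in this sketch raises an error because both have the attribute Name, which is exactly the situation that fully qualified names or $\rho_E$ resolve.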


Semantics

Based on the syntax of relational algebra, its semantics are introduced in the following. Let $S$ be a relational schema, $s$ an instance of $S$, $R$ a relation schema and $r$ an instance of a single relation schema $R$. Furthermore, let $\varphi$ be a relational algebra expression over $S$. The evaluation $\llbracket \varphi \rrbracket_s$ of $\varphi$ over the instance $s$ of $S$ is defined as follows.

Definition 34 (Evaluation of Relation Name) Let $\varphi = R$; then $\llbracket \varphi \rrbracket_s = R^s$.

Let student and person be the relations (instances) of the relation schemas STUDENT and PERSON, respectively. Furthermore, let the relational schema instance be $s = \{student, person\}$. Then $\llbracket STUDENT \rrbracket_s = STUDENT^s = student$.

Definition 35 (Evaluation of NULL) Let $\varphi = NULL_A$, where $A$ is an attribute; then $\llbracket \varphi \rrbracket_s = \{tu\}$, where $tu$ is a tuple such that $tu[A] = NULL$.

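Definition 36 (Evaluation of Condition) The evaluation $\llbracket \mathit{cond}_{\mathcal{A}} \rrbracket_{tu}$ of a condition over a tuple $tu$ is defined as:

$$\llbracket A = a \rrbracket_{tu} := \begin{cases} \mathit{true}, & \text{if } tu[A] = a \\ \mathit{false}, & \text{otherwise} \end{cases}$$

$$\llbracket A \neq a \rrbracket_{tu} := \begin{cases} \mathit{true}, & \text{if } tu[A] \neq a \\ \mathit{false}, & \text{otherwise} \end{cases}$$

$$\llbracket \mathit{isNull}(A) \rrbracket_{tu} := \begin{cases} \mathit{true}, & \text{if } tu[A] = NULL \\ \mathit{false}, & \text{otherwise} \end{cases}$$

$$\llbracket \mathit{isNotNull}(A) \rrbracket_{tu} := \begin{cases} \mathit{true}, & \text{if } tu[A] \neq NULL \\ \mathit{false}, & \text{otherwise} \end{cases}$$

$$\llbracket \mathit{true} \rrbracket_{tu} := \mathit{true}$$

Furthermore, let $\mathit{cond1}_{\mathcal{A}}, \mathit{cond2}_{\mathcal{A}}$ be conditions; then:

$$\llbracket \mathit{cond1}_{\mathcal{A}} \land \mathit{cond2}_{\mathcal{A}} \rrbracket_{tu} := \llbracket \mathit{cond1}_{\mathcal{A}} \rrbracket_{tu} \land \llbracket \mathit{cond2}_{\mathcal{A}} \rrbracket_{tu}$$

$$\llbracket \mathit{cond1}_{\mathcal{A}} \lor \mathit{cond2}_{\mathcal{A}} \rrbracket_{tu} := \llbracket \mathit{cond1}_{\mathcal{A}} \rrbracket_{tu} \lor \llbracket \mathit{cond2}_{\mathcal{A}} \rrbracket_{tu}$$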

Example 22 Consider the single tuple $\langle 2, Bob, \text{Computer Science} \rangle$ depicted in table 4. The evaluation of the condition $ID = 2$ over this tuple is:

$$\llbracket ID = 2 \rrbracket_{\langle 2, Bob, \text{Computer Science} \rangle} = \mathit{true}$$
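Definition 36 translates almost line by line into a recursive evaluator. The following sketch assumes a representation of our own choosing: a tuple is a Python dictionary from attribute names to values, `None` plays the role of NULL, and the condition classes (`Eq`, `Neq`, `IsNull`, `IsNotNull`, `TrueCond`, `And`, `Or`) are hypothetical names. As in the definition, evaluation is two-valued, so $A = NULL$ is true when $tu[A]$ is NULL; this differs from SQL, where a comparison with NULL is never true.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass(frozen=True)
class Eq:          # A = a
    attr: str
    value: Any


@dataclass(frozen=True)
class Neq:         # A != a
    attr: str
    value: Any


@dataclass(frozen=True)
class IsNull:      # isNull(A)
    attr: str


@dataclass(frozen=True)
class IsNotNull:   # isNotNull(A)
    attr: str


@dataclass(frozen=True)
class TrueCond:    # true
    pass


@dataclass(frozen=True)
class And:         # cond1 AND cond2
    left: "Cond"
    right: "Cond"


@dataclass(frozen=True)
class Or:          # cond1 OR cond2
    left: "Cond"
    right: "Cond"


Cond = Union[Eq, Neq, IsNull, IsNotNull, TrueCond, And, Or]


def evaluate(cond: Cond, tu: dict) -> bool:
    """[[cond]]_tu following Definition 36; None stands for NULL."""
    if isinstance(cond, Eq):
        return tu[cond.attr] == cond.value
    if isinstance(cond, Neq):
        return tu[cond.attr] != cond.value
    if isinstance(cond, IsNull):
        return tu[cond.attr] is None
    if isinstance(cond, IsNotNull):
        return tu[cond.attr] is not None
    if isinstance(cond, TrueCond):
        return True
    if isinstance(cond, And):
        return evaluate(cond.left, tu) and evaluate(cond.right, tu)
    if isinstance(cond, Or):
        return evaluate(cond.left, tu) or evaluate(cond.right, tu)
    raise TypeError(f"unknown condition: {cond!r}")


bob = {"ID": 2, "Name": "Bob", "Field": "Computer Science"}
print(evaluate(Eq("ID", 2), bob))                        # Example 22
print(evaluate(And(Eq("ID", 2), IsNull("Name")), bob))
```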


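Definition 37 (Evaluation of Selection) The evaluation of a selection selects the tuples of a relation that satisfy a condition. Let $\varphi_1$ be a relational algebra expression over $S$ and let $\mathit{cond}_{\mathcal{A}}$ be a condition with $\mathcal{A} \subseteq \mathit{att}(\varphi_1)$; then the evaluation of the selection $\sigma_{\mathit{cond}_{\mathcal{A}}}(\varphi_1)$ is defined as:

$$\llbracket \sigma_{\mathit{cond}_{\mathcal{A}}}(\varphi_1) \rrbracket_s := \{ tu \in \llbracket \varphi_1 \rrbracket_s \mid \llbracket \mathit{cond}_{\mathcal{A}} \rrbracket_{tu} = \mathit{true} \}$$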

Example 23 The evaluation of the selection $\sigma_{ID=1}(STUDENT)$ on the relation depicted in table 1 is $\{\langle 1, Alice, \text{Computer Science} \rangle\}$.

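Definition 38 (Evaluation of Projection) The evaluation of a projection chooses a subset of the attributes of a relation. Let $\varphi_1$ be a relational algebra expression over $S$ and $U \subseteq \mathit{att}(\varphi_1)$.

If $\varphi_2 = \pi_U(\varphi_1)$, then

$$\llbracket \varphi_2 \rrbracket_s := \{ tu' \mid tu \in \llbracket \varphi_1 \rrbracket_s \text{ and } \forall A \in U : tu'[A] = tu[A] \}$$

where each $tu'$ is a tuple over exactly the attributes in $U$.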

Example 24 The evaluation of the projection $\pi_{ID, Field}(STUDENT)$ is the set of tuples $\{\langle 1, \text{Computer Science} \rangle, \langle 2, \text{Computer Science} \rangle\}$.
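Selection and projection can be sketched over the same dictionary representation of tuples (an assumption for illustration, with `None` as NULL). To keep the sketch short, the condition is passed as a Python predicate instead of the condition syntax of Definition 22. Because relations are sets, the projection removes duplicates, which the last call shows.

```python
STUDENT = [
    {"ID": 1, "Name": "Alice", "Field": "Computer Science"},
    {"ID": 2, "Name": "Bob", "Field": "Computer Science"},
]


def dedupe(rows):
    """Keep the first occurrence of every tuple, so that a list behaves like a set."""
    seen, result = set(), []
    for row in rows:
        key = frozenset(row.items())
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


def select(rows, condition):
    """[[sigma_cond(phi)]]_s: the tuples tu of phi for which [[cond]]_tu is true."""
    return [tu for tu in rows if condition(tu)]


def project(rows, attributes):
    """[[pi_U(phi)]]_s: every tuple restricted to U, duplicates removed."""
    return dedupe([{a: tu[a] for a in attributes} for tu in rows])


print(select(STUDENT, lambda tu: tu["ID"] == 1))   # Example 23
print(project(STUDENT, ["ID", "Field"]))           # Example 24
print(project(STUDENT, ["Field"]))                 # two tuples collapse into one
```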

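Definition 39 (Evaluation of Coalesce) Let $\varphi_1$ be a relational algebra expression over $S$ and let $\varphi_2 = \kappa_{A_1, A_2, A_{new}}(\varphi_1)$; then

$$\llbracket \varphi_2 \rrbracket_s := \{ tu \mid tu_1 \in \llbracket \varphi_1 \rrbracket_s,\ \forall A \in \mathit{att}(\varphi_1) : tu[A] = tu_1[A] \text{ and } tu[A_{new}] = \begin{cases} tu_1[A_2], & \text{if } tu_1[A_1] = NULL \\ tu_1[A_1], & \text{otherwise} \end{cases} \}$$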

Example 25 The evaluation of the coalesce $\kappa_{PostalCode, City, Location}(ADDRESS)$ is depicted in table 8.

| Name | City | PostalCode | Location |
|---|---|---|---|
| Alice | Koblenz | 56073 | 56073 |
| Bob | Cologne | NULL | Cologne |
| Carol | NULL | NULL | NULL |

Table 8: Table depicting result of a coalesce.
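The coalesce semantics of Definition 39 can be reproduced in a few lines. The sketch below is an illustration only: tuples are dictionaries, `None` stands for NULL, and the function name `coalesce` is our own. Applied to ADDRESS with $A_1 = PostalCode$, $A_2 = City$ and $A_{new} = Location$, it yields the Location column of table 8.

```python
ADDRESS = [
    {"Name": "Alice", "City": "Koblenz", "PostalCode": 56073},
    {"Name": "Bob", "City": "Cologne", "PostalCode": None},
    {"Name": "Carol", "City": None, "PostalCode": None},
]


def coalesce(rows, a1, a2, a_new):
    """[[kappa_{A1,A2,Anew}(phi)]]_s: copy each tuple and set Anew to A1, or to A2 if A1 is NULL."""
    result = []
    for tu1 in rows:
        if a_new in tu1:
            # Definition 25 requires Anew not to be an attribute of phi
            raise ValueError(f"{a_new} is already an attribute")
        tu = dict(tu1)
        tu[a_new] = tu1[a2] if tu1[a1] is None else tu1[a1]
        result.append(tu)
    return result


for row in coalesce(ADDRESS, "PostalCode", "City", "Location"):
    print(row)
```

Note that Carol's Location stays NULL because both inputs are NULL, which matches the last row of table 8.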

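Definition 40 (Evaluation of Rename of Attribute) The evaluation of a rename renames an attribute. Let $\varphi_1$ be a relational algebra expression, $A \in \mathit{att}(\varphi_1)$ and $B \notin \mathit{att}(\varphi_1)$. The evaluation of the rename operation $\varrho_{A \to B}(\varphi_1)$ is the set of tuples in which the attribute $A$ is replaced by $B$.

Let $\varphi_2 = \varrho_{A \to B}(\varphi_1)$; then

$$\llbracket \varphi_2 \rrbracket_s := \{ tu' \mid tu \in \llbracket \varphi_1 \rrbracket_s \text{ and } tu'[B] = tu[A] \text{ and } \forall C \in \mathit{att}(\varphi_1) : C \neq A \Rightarrow tu'[C] = tu[C] \}$$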

Example 26 The evaluation of $\varrho_{Name \to FirstName}(STUDENT)$ on table 1 results in table 9.

STUDENT

| ID | FirstName | Field |
|---|---|---|
| 1 | Alice | Computer Science |
| 2 | Bob | Computer Science |

Table 9: Result of rename operation.
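A minimal sketch of the rename of Definition 40, again assuming tuples as Python dictionaries (the function name `rename` is ours). It checks the side conditions of Definition 26 and keeps all other attributes unchanged, reproducing table 9.

```python
STUDENT = [
    {"ID": 1, "Name": "Alice", "Field": "Computer Science"},
    {"ID": 2, "Name": "Bob", "Field": "Computer Science"},
]


def rename(rows, old, new):
    """[[rename_{A->B}(phi)]]_s: tu'[B] = tu[A], all other attributes unchanged."""
    result = []
    for tu in rows:
        if old not in tu or new in tu:
            raise ValueError("rename requires A in att(phi) and B not in att(phi)")
        # rebuild the tuple, replacing only the key of the renamed attribute
        result.append({(new if attr == old else attr): value for attr, value in tu.items()})
    return result


for row in rename(STUDENT, "Name", "FirstName"):
    print(row)
```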

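Definition 41 (Evaluation of Union) Let $\varphi_1 = \varphi_2 \cup \varphi_3$; then the evaluation of the union is defined as:

$$\llbracket \varphi_1 \rrbracket_s := \llbracket \varphi_2 \rrbracket_s \cup \llbracket \varphi_3 \rrbracket_s$$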

Example 27 Consider the STUDENT relation depicted in table 1 and the BIOLOGY_STUDENT relation depicted in table 10. The union of these two relations, $STUDENT \cup BIOLOGY\_STUDENT$, is depicted in table 11.

BIOLOGY_STUDENT

| ID | Name | Field |
|---|---|---|
| 3 | Carol | Biology |

Table 10: Table depicting biology student.

| ID | Name | Field |
|---|---|---|
| 1 | Alice | Computer Science |
| 2 | Bob | Computer Science |
| 3 | Carol | Biology |

Table 11: Result of union operation.

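Definition 42 (Evaluation of Outer Union) Let $\varphi_1 = \varphi_2 \uplus \varphi_3$; then

$$\llbracket \varphi_1 \rrbracket_s := \{ tu \mid \exists tu_2 \in \llbracket \varphi_2 \rrbracket_s : (\forall A_1 \in \mathit{att}(\varphi_2) : tu[A_1] = tu_2[A_1]) \text{ and } (\forall A_2 \in \mathit{att}(\varphi_3) \setminus \mathit{att}(\varphi_2) : tu[A_2] = NULL) \}$$

$$\cup\ \{ tu \mid \exists tu_3 \in \llbracket \varphi_3 \rrbracket_s : (\forall A_3 \in \mathit{att}(\varphi_3) : tu[A_3] = tu_3[A_3]) \text{ and } (\forall A_4 \in \mathit{att}(\varphi_2) \setminus \mathit{att}(\varphi_3) : tu[A_4] = NULL) \}$$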


Example 28 The outer union of the relations PERSON and STUDENT, $PERSON \uplus STUDENT$, is depicted in table 12.

| Name | City | Age | ID | Field |
|---|---|---|---|---|
| Alice | Koblenz | 22 | NULL | NULL |
| Bob | Cologne | 30 | NULL | NULL |
| Carol | Koblenz | 23 | NULL | NULL |
| Alice | NULL | NULL | 1 | Computer Science |
| Bob | NULL | NULL | 2 | Computer Science |

Table 12: Result of outer union.
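The outer union pads every tuple with NULL for the attributes that only the other operand has. The following sketch is an illustration under stated assumptions: tuples are dictionaries, `None` is NULL, and the attribute lists are passed explicitly so that the result schema is known even for empty relations. It reproduces table 12 for $PERSON \uplus STUDENT$.

```python
PERSON_ATTRS = ["Name", "City", "Age"]
PERSON = [
    {"Name": "Alice", "City": "Koblenz", "Age": 22},
    {"Name": "Bob", "City": "Cologne", "Age": 30},
    {"Name": "Carol", "City": "Koblenz", "Age": 23},
]
STUDENT_ATTRS = ["ID", "Name", "Field"]
STUDENT = [
    {"ID": 1, "Name": "Alice", "Field": "Computer Science"},
    {"ID": 2, "Name": "Bob", "Field": "Computer Science"},
]


def outer_union(left, left_attrs, right, right_attrs):
    """[[phi2 ⊎ phi3]]_s: all tuples of both sides, padded with None (NULL)
    for the attributes that only the other side has (Definition 42)."""
    all_attrs = left_attrs + [a for a in right_attrs if a not in left_attrs]
    result, seen = [], set()
    for tu in left + right:
        padded = {a: tu.get(a) for a in all_attrs}  # missing attributes become None
        key = tuple(padded[a] for a in all_attrs)
        if key not in seen:  # set semantics: drop duplicate tuples
            seen.add(key)
            result.append(padded)
    return result


for row in outer_union(PERSON, PERSON_ATTRS, STUDENT, STUDENT_ATTRS):
    print(row)
```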

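Definition 43 (Evaluation of Difference) If $\varphi_1 = \varphi_2 \setminus \varphi_3$, then $\llbracket \varphi_1 \rrbracket_s := \llbracket \varphi_2 \rrbracket_s \setminus \llbracket \varphi_3 \rrbracket_s$.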

Example 29 The result of the difference operation $STUDENT \setminus SINGLE\_STUDENT$ is depicted in table 13.

STUDENT

| ID | Name | Field |
|---|---|---|
| 1 | Alice | Computer Science |

Table 13: Difference of two relations.
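Union and difference are plain set operations on union-compatible relations. In the sketch below (tuples as dictionaries, an assumption for illustration), union-compatibility is checked on the first tuple of each non-empty operand, and the difference assumes its left operand is already duplicate-free. Both calls reproduce tables 11 and 13.

```python
STUDENT = [
    {"ID": 1, "Name": "Alice", "Field": "Computer Science"},
    {"ID": 2, "Name": "Bob", "Field": "Computer Science"},
]
BIOLOGY_STUDENT = [{"ID": 3, "Name": "Carol", "Field": "Biology"}]
SINGLE_STUDENT = [{"ID": 2, "Name": "Bob", "Field": "Computer Science"}]


def key(tu):
    """Hashable identity of a tuple, independent of attribute order."""
    return frozenset(tu.items())


def check_compatible(r1, r2):
    """Definitions 27 and 29 require att(phi1) = att(phi2)."""
    if r1 and r2 and set(r1[0]) != set(r2[0]):
        raise ValueError("operands must have the same attributes")


def union(r1, r2):
    """[[phi2 ∪ phi3]]_s without duplicates."""
    check_compatible(r1, r2)
    result, seen = [], set()
    for tu in r1 + r2:
        if key(tu) not in seen:
            seen.add(key(tu))
            result.append(tu)
    return result


def difference(r1, r2):
    """[[phi2 \\ phi3]]_s: tuples of r1 that do not occur in r2."""
    check_compatible(r1, r2)
    removed = {key(tu) for tu in r2}
    return [tu for tu in r1 if key(tu) not in removed]


print(union(STUDENT, BIOLOGY_STUDENT))       # table 11
print(difference(STUDENT, SINGLE_STUDENT))   # table 13
```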

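Definition 44 (Evaluation of Cross Join) Let $\varphi_1 = \varphi_2 \times \varphi_3$; then the evaluation of the cross join is defined as:

$$\llbracket \varphi_1 \rrbracket_s := \{ tu \mid \exists tu_1 \in \llbracket \varphi_2 \rrbracket_s, \exists tu_2 \in \llbracket \varphi_3 \rrbracket_s : (\forall A \in \mathit{att}(\varphi_2) : tu[A] = tu_1[A]) \text{ and } (\forall A \in \mathit{att}(\varphi_3) : tu[A] = tu_2[A]) \}$$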

Example 30 Consider the relations STUDENT and CITY depicted in table 1 and table 5, respectively. The cross product of the two relations, $STUDENT \times CITY$, is depicted in table 14.

| ID | Name | Field | CityName | PostalCode |
|---|---|---|---|---|
| 1 | Alice | Computer Science | Koblenz | 56068 |
| 1 | Alice | Computer Science | Cologne | 50667 |
| 2 | Bob | Computer Science | Koblenz | 56068 |
| 2 | Bob | Computer Science | Cologne | 50667 |

Table 14: Result of a cross join.
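The cross join combines every tuple of the left operand with every tuple of the right one. This sketch (dictionaries as tuples, a modelling assumption) also enforces the disjointness of the attribute sets required by Definition 30 and reproduces the four rows of table 14 in the same order.

```python
STUDENT = [
    {"ID": 1, "Name": "Alice", "Field": "Computer Science"},
    {"ID": 2, "Name": "Bob", "Field": "Computer Science"},
]
CITY = [
    {"CityName": "Koblenz", "PostalCode": 56068},
    {"CityName": "Cologne", "PostalCode": 50667},
]


def cross_join(left, right):
    """[[phi2 x phi3]]_s: every combination of a left and a right tuple."""
    result = []
    for tu1 in left:
        for tu2 in right:
            if set(tu1) & set(tu2):
                raise ValueError("cross join requires disjoint attribute sets")
            result.append({**tu1, **tu2})  # merge both tuples into one
    return result


for row in cross_join(STUDENT, CITY):
    print(row)
```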


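Definition 45 (Evaluation of Theta Join) Let $\varphi_2, \varphi_3$ be relational algebra expressions over $S$ and let $\varphi_1 = \varphi_2 \bowtie_{\mathit{cond}_{\mathcal{A}}} \varphi_3$. Then the evaluation of the theta join is defined as:

$$\llbracket \varphi_1 \rrbracket_s := \llbracket \sigma_{\mathit{cond}_{\mathcal{A}}}(\varphi_2 \times \varphi_3) \rrbracket_s$$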

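Example 31 The evaluation of the theta join $ADDRESS \bowtie_{PostalCode=NULL} CODES$ is depicted in table 15. Since conditions are evaluated as in Definition 36, $PostalCode = NULL$ holds exactly for the tuples whose PostalCode is NULL, unlike in SQL, where a comparison with NULL never evaluates to true.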

| Name | City | PostalCode | Code | Description |
|---|---|---|---|---|
| Bob | Cologne | NULL | 1 | Postal Code |
| Bob | Cologne | NULL | 2 | Incomplete |
| Carol | NULL | NULL | 1 | Postal Code |
| Carol | NULL | NULL | 2 | Incomplete |

Table 15: Table depicting evaluation of join with NULL in join condition.

Example 32 Executing the join $STUDENT \bowtie_{STUDENT.Name = PERSON.Name} PERSON$ results in the relation depicted in table 16.

| ID | Name | Field | Name | City | Age |
|---|---|---|---|---|---|
| 1 | Alice | Computer Science | Alice | Koblenz | 22 |
| 2 | Bob | Computer Science | Bob | Cologne | 30 |

Table 16: Result of join operation.
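Definition 45 defines the theta join as a selection over a cross product, which the sketch below implements literally. It assumes dictionaries as tuples, `None` as NULL and Python predicates as join conditions; the helper `qualify` is a hypothetical name that prefixes attributes with their source relation, so that STUDENT and PERSON become disjoint as required. The two calls reproduce tables 15 and 16.

```python
ADDRESS = [
    {"Name": "Alice", "City": "Koblenz", "PostalCode": 56073},
    {"Name": "Bob", "City": "Cologne", "PostalCode": None},
    {"Name": "Carol", "City": None, "PostalCode": None},
]
CODES = [{"Code": 1, "Description": "Postal Code"}, {"Code": 2, "Description": "Incomplete"}]
STUDENT = [
    {"ID": 1, "Name": "Alice", "Field": "Computer Science"},
    {"ID": 2, "Name": "Bob", "Field": "Computer Science"},
]
PERSON = [
    {"Name": "Alice", "City": "Koblenz", "Age": 22},
    {"Name": "Bob", "City": "Cologne", "Age": 30},
    {"Name": "Carol", "City": "Koblenz", "Age": 23},
]


def cross_join(left, right):
    result = []
    for tu1 in left:
        for tu2 in right:
            if set(tu1) & set(tu2):
                raise ValueError("attribute sets must be disjoint")
            result.append({**tu1, **tu2})
    return result


def theta_join(left, right, condition):
    """[[phi2 ⋈_cond phi3]]_s = [[sigma_cond(phi2 x phi3)]]_s (Definition 45)."""
    return [tu for tu in cross_join(left, right) if condition(tu)]


def qualify(rows, relation):
    """Use fully qualified attribute names, e.g. Name -> STUDENT.Name."""
    return [{f"{relation}.{a}": v for a, v in tu.items()} for tu in rows]


# table 15: PostalCode = NULL, two-valued as in Definition 36
t15 = theta_join(ADDRESS, CODES, lambda tu: tu["PostalCode"] is None)
# table 16: join on the fully qualified Name attributes
t16 = theta_join(qualify(STUDENT, "STUDENT"), qualify(PERSON, "PERSON"),
                 lambda tu: tu["STUDENT.Name"] == tu["PERSON.Name"])
print(len(t15), len(t16))
```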

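Definition 46 (Evaluation of Left Outer Join) Let $\varphi_2, \varphi_3$ be relational algebra expressions over $S$ and let $\varphi_1 = \varphi_2 ⟕_{\mathit{cond}_{\mathcal{A}}} \varphi_3$; then the evaluation of the left outer join is defined as:

$$\llbracket \varphi_1 \rrbracket_s = \llbracket (\varphi_2 \bowtie_{\mathit{cond}_{\mathcal{A}}} \varphi_3) \uplus (\varphi_2 \setminus \pi_{\mathit{att}(\varphi_2)}(\varphi_2 \bowtie_{\mathit{cond}_{\mathcal{A}}} \varphi_3)) \rrbracket_s$$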

Example 33 The relation resulting from the evaluation of the left outer join $STUDENT ⟕_{ID=StudentID} GRADES$ of the two relations STUDENT and GRADES, with the join condition $ID = StudentID$, is depicted in table 17.

| ID | Name | Field | StudentID | AverageGrade |
|---|---|---|---|---|
| 1 | Alice | Computer Science | 1 | 1.7 |
| 2 | Bob | Computer Science | NULL | NULL |

Table 17: Result of left outer join.
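The left outer join can be built exactly from the pieces of Definition 46: the theta join, the projection of the join onto $\mathit{att}(\varphi_2)$, a difference and an outer union that pads with NULL. The sketch assumes dictionaries as tuples, `None` as NULL and explicit attribute lists; it is an illustration of the definition, not an efficient join algorithm. The result equals table 17.

```python
STUDENT_ATTRS = ["ID", "Name", "Field"]
STUDENT = [
    {"ID": 1, "Name": "Alice", "Field": "Computer Science"},
    {"ID": 2, "Name": "Bob", "Field": "Computer Science"},
]
GRADES_ATTRS = ["StudentID", "AverageGrade"]
GRADES = [{"StudentID": 1, "AverageGrade": 1.7}]


def theta_join(left, right, condition):
    """sigma_cond(left x right)."""
    return [{**l, **r} for l in left for r in right if condition({**l, **r})]


def left_outer_join(left, left_attrs, right, right_attrs, condition):
    joined = theta_join(left, right, condition)
    # pi_{att(phi2)}(phi2 ⋈ phi3): the left tuples that found a join partner
    matched = {frozenset((a, tu[a]) for a in left_attrs) for tu in joined}
    # phi2 minus that projection: the left tuples without a partner
    unmatched = [tu for tu in left if frozenset(tu.items()) not in matched]
    # outer union: pad the unmatched tuples with None for att(phi3)
    all_attrs = left_attrs + right_attrs
    padded = [{a: tu.get(a) for a in all_attrs} for tu in unmatched]
    return joined + padded


result = left_outer_join(STUDENT, STUDENT_ATTRS, GRADES, GRADES_ATTRS,
                         lambda tu: tu["ID"] == tu["StudentID"])
for row in result:
    print(row)
```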


2.6. Structured Query Language (SQL)

The Structured Query Language (SQL) is used to query relational data. SQL is based on relational algebra; the translation of SQL to relational algebra is defined in [11]. In the context of this master thesis, SQL is needed to query the virtualized RDF graph.
Relational data is queried with so-called SQL SELECT queries. An example of such a query, retrieving the name of the student with the ID 1 from the relation depicted in table 1, is shown in listing 9.

SELECT NAME FROM STUDENT WHERE ID = 1

Listing 9: SQL query retrieving name.

This query returns Alice as NAME. Here, SELECT NAME corresponds to the relational algebra projection $\pi_{NAME}(\varphi_1)$, where $\varphi_1$ is defined by the rest of the query. The clause WHERE ID = 1 corresponds to the relational algebra selection $\sigma_{ID=1}(\varphi_2)$, and $\varphi_2$ is defined by FROM STUDENT, which states that the selection is executed on the table STUDENT. Therefore, the complete relational algebra expression of the query depicted in listing 9 is $\pi_{NAME}(\sigma_{ID=1}(STUDENT))$.
Besides SELECT queries, this work also requires the creation of views. A view is a stored SELECT query whose result is presented as a table; this table of results is the view. Besides normal views, materialized views exist. In a materialized view, the results of the SELECT query are physically stored in the database, whereas in a non-materialized view, the SELECT query is evaluated every time the view is accessed. Listing 10 gives an example of a query that creates a materialized view.

-- the view name student_name is chosen for illustration
CREATE MATERIALIZED VIEW student_name AS SELECT NAME FROM STUDENT WHERE ID = 1

Listing 10: SQL query creating a materialized view.
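The difference between storing a query and storing its result can be observed directly. The following sketch uses Python's built-in `sqlite3` module with an in-memory database; since SQLite does not support `CREATE MATERIALIZED VIEW`, it creates a plain view (the name `student_name` is hypothetical). After an update of the base table, the view shows the new value, because its query is evaluated again on every access.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (ID INTEGER PRIMARY KEY, NAME TEXT, FIELD TEXT)")
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?)",
    [(1, "Alice", "Computer Science"), (2, "Bob", "Computer Science")],
)

# Listing 9, i.e. pi_NAME(sigma_{ID=1}(STUDENT))
before = conn.execute("SELECT NAME FROM STUDENT WHERE ID = 1").fetchall()
print(before)  # [('Alice',)]

# SQLite has no CREATE MATERIALIZED VIEW; a plain view stores only the query.
conn.execute("CREATE VIEW student_name AS SELECT NAME FROM STUDENT WHERE ID = 1")
conn.execute("UPDATE STUDENT SET NAME = 'Alicia' WHERE ID = 1")

# The view is re-evaluated when it is accessed, so it reflects the update.
after = conn.execute("SELECT NAME FROM student_name").fetchall()
print(after)  # [('Alicia',)]
```

A materialized view would instead keep returning the stored result until it is refreshed.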


3. Ontology Based Data Access

In order to integrate a relational database into an RDF graph, Ontology Based Data Access (OBDA) is used. This section presents a formal framework for OBDA, which is used to formalize the OBDA system implemented in this thesis.

3.1. Mapping

In ontology based data access, a relational schema $S$ is mapped onto an ontology $O$ based on a mapping $M$ such that SPARQL queries can be issued against an instance of $S$. Thereby, the ontology $O$ serves as the global schema of the data. This means that once a mapping has been defined, a user of the OBDA system does not need any knowledge of the underlying relational data. The user can simply issue SPARQL queries against the ontology and retrieve the desired information. Hereinafter, mappings from relational data onto RDF data are defined.

Definition 47 (Mapping Templates) Let $A$ be an attribute of a relation. An arbitrary string $\theta$ is a mapping template. Substrings of the form {A} can occur in a mapping template and denote template variables.⁷ Furthermore, $\mathit{att}(\theta)$ denotes the set of attribute names in $\theta$.

Example 34 The string http://www.uni.com/student/{ID} is a mapping template with one template variable, namely ID.

Definition 48 (Evaluation of Mapping Templates) Let $A$ be an attribute of a relation and let $tu$ be a tuple in a relation. The evaluation of a single template variable {A} is defined as follows:

$$\llbracket \{A\} \rrbracket_{tu} := \mathit{str}(tu[A])$$

where $\mathit{str}$ denotes the function that creates a string from a given input. The evaluation $\llbracket \theta \rrbracket_{tu}$ of a mapping template for a given tuple $tu$ is the string obtained by replacing each template variable in $\theta$ with its evaluation. Furthermore, if $R$ is the relation schema of $tu$, then $\mathit{att}(\theta) \subseteq \mathit{att}(R)$.

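Example 35 Consider the tuple $tu = \langle 1, Alice, \text{Computer Science} \rangle$ from table 1. The evaluation is

$$\llbracket \texttt{http://www.uni.com/student/\{ID\}} \rrbracket_{tu} = \texttt{http://www.uni.com/student/}\llbracket \{ID\} \rrbracket_{tu} = \texttt{http://www.uni.com/student/1}$$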
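Template evaluation is a single left-to-right scan. The sketch below is our own illustration of Definition 48 together with the escaping rules of footnote 7: `{A}` is replaced by `str(tu[A])`, and `\{`, `\}` and `\\` produce the literal characters. The function name `evaluate_template` and the error handling for malformed templates are assumptions, since the definition does not say how such templates are treated.

```python
ESCAPABLE = {"{", "}", "\\"}


def evaluate_template(template: str, tu: dict) -> str:
    r"""Evaluate a mapping template for the tuple tu.

    Every template variable {A} is replaced by str(tu[A]); the escapes
    \{, \} and \\ produce the literal characters.
    """
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "\\":
            if i + 1 >= len(template) or template[i + 1] not in ESCAPABLE:
                raise ValueError(f"invalid escape at position {i}")
            out.append(template[i + 1])  # escaped character is taken literally
            i += 2
        elif ch == "{":
            end = template.find("}", i + 1)
            if end == -1:
                raise ValueError(f"unclosed template variable at position {i}")
            out.append(str(tu[template[i + 1:end]]))  # [[{A}]]_tu = str(tu[A])
            i = end + 1
        elif ch == "}":
            raise ValueError(f"unescaped '}}' at position {i}")
        else:
            out.append(ch)
            i += 1
    return "".join(out)


alice = {"ID": 1, "Name": "Alice", "Field": "Computer Science"}
print(evaluate_template("http://www.uni.com/student/{ID}", alice))  # Example 35
print(evaluate_template(r"\{{Name}\}", alice))
```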

⁷ If { or } are used within θ without marking up a template variable, they have to be escaped as \{ or \}. Consequently, \ also has to be escaped as \\ if it is not used as an escape character.


Definition 49 (Mapping Rule) Let $\varphi$ be a relational algebra expression, $\theta_1$ and $\theta_2$ mapping templates and $iri \in I$; then a mapping rule is:

$$\varphi\ (\theta_1, iri, \theta_2)$$

Example 36 The following mapping rule defines that for each student in table 1 a triple is created whose subject contains the ID, whose predicate is always the IRI http://www.uni.com/name and whose object is an IRI including the value stored in the NAME column of the relation STUDENT:

STUDENT (http://www.uni.com/student/{ID},

http://www.uni.com/name,

http://www.uni.com/student/{NAME})

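Definition 50 (Evaluation of Mapping Rule) The evaluation of a mapping rule over an instance $s$ of a relational schema is a set of triples:

$$\llbracket \varphi\ (\theta_1, iri, \theta_2) \rrbracket_s := \{ (\llbracket \theta_1 \rrbracket_{tu}, iri, \llbracket \theta_2 \rrbracket_{tu}) \mid tu \in \llbracket \varphi \rrbracket_s \}$$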

Example 37 The evaluation of the mapping rule shown in example 36 results in the two triples depicted in listing 11.

<http://www.uni.com/student/1> <http://www.uni.com/name> <http://www.uni.com/student/Alice> .
<http://www.uni.com/student/2> <http://www.uni.com/name> <http://www.uni.com/student/Bob> .

Listing 11: RDF triples resulting from evaluation of mapping rules.
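Definition 50 evaluates both templates once per tuple of $\llbracket \varphi \rrbracket_s$. The sketch below reproduces listing 11 under simplifying assumptions: $\varphi$ is just the list of STUDENT tuples (with the attribute spelled NAME as in example 36), template evaluation uses a regular expression and ignores the escapes of footnote 7, and `to_ntriples` only handles triples whose three components are IRIs.

```python
import re

STUDENT = [
    {"ID": 1, "NAME": "Alice", "FIELD": "Computer Science"},
    {"ID": 2, "NAME": "Bob", "FIELD": "Computer Science"},
]


def fill(template, tu):
    """Simplified template evaluation: replace {A} by str(tu[A]); no escapes."""
    return re.sub(r"\{(\w+)\}", lambda m: str(tu[m.group(1)]), template)


def evaluate_rule(rows, theta1, iri, theta2):
    """[[phi (theta1, iri, theta2)]]_s as a duplicate-free list of triples."""
    triples = []
    for tu in rows:
        triple = (fill(theta1, tu), iri, fill(theta2, tu))
        if triple not in triples:
            triples.append(triple)
    return triples


def to_ntriples(triples):
    """Serialize IRI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


triples = evaluate_rule(STUDENT,
                        "http://www.uni.com/student/{ID}",
                        "http://www.uni.com/name",
                        "http://www.uni.com/student/{NAME}")
print(to_ntriples(triples))
```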

Definition 51 (Mapping)A mapping M is a set of mapping rules.

3.2. Formal Framework for Ontology Based Data Access

Having defined all inputs of an OBDA system, a formal framework for OBDA is now presented. The definitions of this formal framework are based on [12] and [1].


Definition 52 (OBDA Specification) An OBDA specification $(S, M, O)$ specifies how the relational schema $S$ is mapped onto the ontology $O$ based on the mapping $M$, such that the evaluation of every mapping rule in $M$ results in valid RDF triples.

With the help of an OBDA specification, an instance $s$ of a relational schema, and therefore the relational data in $s$, can be mapped onto the ontology $O$.

Definition 53 (OBDA Instance) An OBDA instance is a tuple $((S, M, O), s)$ where $s$ is an instance of the relational schema $S$.

SPARQL queries can be issued against an OBDA instance such that sets of variable mappings are returned that correspond to the triples created by the evaluation of each mapping rule in a mapping $M$. In order to also obtain results that are not explicitly stored in the data but can be inferred with the help of the ontology, two approaches can be used. In the first approach, the input mapping is saturated with additional rules, such that the mapping also creates all implicit triples.

Definition 54 (Mapping Saturation) For a given mapping $M$ and an ontology $O$, the function $\mathit{sat}(M, O)$ produces a saturated mapping $M'$ with $M \subseteq M'$. Here, $M'$ is the set of mapping rules that produces all triples produced by $M$ as well as all implicit triples that can be inferred based on $O$.

Example 38 Consider the ontology consisting of one triple:

(http://www.uni.com/BachelorStudent,

http://www.w3.org/2000/01/rdf-schema#subClassOf,

http://www.uni.com/Student)

This ontology defines that each bachelor student is also a student. Consider the following mapping $M$.

http://www.w3.org/1999/02/22-rdf-syntax-ns#type,

http://www.uni.com/BachelorStudent)

Based on the ontology, a mapping rule defining that each bachelor student is also a student has to be added to the mapping in order to create a saturated mapping.


The saturated mapping $M'$ is depicted below.

http://www.uni.com/BachelorStudent),




The second approach to also consider implicit knowledge when querying an OBDA instance is to extend queries according to the given ontology.

Definition 55 (Query Extension) For a SPARQL SELECT query $Q$ and an ontology $O$, the function $\mathit{extend}(Q, O)$ extends the query $Q$ based on the ontology $O$ to the query $Q'$. Here, $Q'$ returns all variable bindings that would be returned if $Q$ were executed on an RDF graph that includes all implicit triples based on $O$.

Example 39 The query depicted in listing 12 retrieves all vertices that are of the type student. Consider the ontology from example 38. The ontology defines that each bachelor student is also a student. Therefore, the query can be extended to the query depicted in listing 13, in which the union of all students and bachelor students is created to obtain all implicit results.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX uni: <http://www.uni.com/>
SELECT ?s WHERE {
  ?s rdf:type uni:Student
}

Listing 12: Input query to an OBDA system.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX uni: <http://www.uni.com/>
SELECT ?s WHERE {
  { ?s rdf:type uni:Student }
  UNION
  { ?s rdf:type uni:BachelorStudent }
}

Listing 13: Ontology based extended query.
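For the special case of a single type pattern, the extension of example 39 can be sketched as a string rewrite: collect the class and all of its direct and indirect subclasses from the ontology and join one group graph pattern per class with UNION. The sketch covers only `rdf:type` patterns with a constant class and only subclass axioms; the function names and the class `uni:FirstYearStudent` are hypothetical and serve only to illustrate transitivity.

```python
def subclasses(cls, subclass_of):
    """cls together with all of its direct and indirect subclasses."""
    found, stack = [cls], [cls]
    while stack:
        current = stack.pop()
        for sub, sup in subclass_of:
            if sup == current and sub not in found:
                found.append(sub)
                stack.append(sub)
    return found


def extend_type_pattern(var, cls, subclass_of):
    """Rewrite `var rdf:type cls` into a UNION over cls and its subclasses."""
    branches = [f"{{ {var} rdf:type {c} }}" for c in subclasses(cls, subclass_of)]
    return "\nUNION\n".join(branches)


ONTOLOGY = [("uni:BachelorStudent", "uni:Student")]   # example 38
print(extend_type_pattern("?s", "uni:Student", ONTOLOGY))
```

Wrapped in the PREFIX declarations and `SELECT ?s WHERE { ... }`, the printed pattern corresponds to the body of listing 13.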

In order to obtain results from the underlying relational database, the SPARQL query has to be rewritten to an SQL query that retrieves the desired results. The SPARQL query is rewritten based on the mapping. The SQL query is then issued against the underlying relational database, and its result is transformed into the respective SPARQL results, which are then returned to the user of the system.

Definition 56 (Relation to Variable Binding Transformation) Given a relation schema $R$ and an instance $r$ of this schema, the function $\mathit{transform}(r)$ transforms the relation into a set of variable bindings.

$$\mathit{transform}(r) = \{ \mu \mid tu \in r \text{ and } \mu = \{ (\mathit{toVar}(A), tu[A]) \mid A \in \mathit{att}(R) \text{ and } tu[A] \neq NULL \} \}$$

Here, the static function $\mathit{toVar}(A)$ creates a SPARQL variable from an attribute name.

Example 40 Consider the relation depicted in table 1. The result of $\mathit{transform}(STUDENT)$ is:

$$\mathit{transform}(STUDENT) = \{ \{(?ID, 1), (?NAME, Alice), (?FIELD, \text{Computer Science})\}, \{(?ID, 2), (?NAME, Bob), (?FIELD, \text{Computer Science})\} \}$$
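The transformation of Definition 56 maps each tuple to one binding and leaves NULL-valued attributes unbound. In this sketch a binding is a Python dictionary from variable names to values and `None` stands for NULL (modelling assumptions); a list is returned instead of a set because dictionaries are not hashable.

```python
def to_var(attribute):
    """toVar(A): the SPARQL variable for an attribute name."""
    return "?" + attribute


def transform(relation):
    """transform(r): one binding per tuple; NULL (None) attributes stay unbound."""
    return [
        {to_var(a): value for a, value in tu.items() if value is not None}
        for tu in relation
    ]


STUDENT = [
    {"ID": 1, "NAME": "Alice", "FIELD": "Computer Science"},
    {"ID": 2, "NAME": "Bob", "FIELD": "Computer Science"},
]
print(transform(STUDENT))                                            # example 40
print(transform([{"Name": "Carol", "City": None, "PostalCode": None}]))
```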

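Definition 57 (Query Rewriting) Given a SPARQL query $Q$ and an OBDA instance $((S, M, O), s)$, the function $\mathit{rewrite}(Q, M)$ rewrites $Q$ to an SQL query such that:

$$\mathit{transform}(\llbracket \mathit{rewrite}(\mathit{extend}(Q, O), \mathit{sat}(M, O)) \rrbracket_s) = \llbracket Q \rrbracket_{\llbracket \mathit{sat}(M, O) \rrbracket_s}$$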

Figure 4 depicts the dataflow in an OBDA system. The mapping saturation and the query extension based on the ontology are the first steps in the figure. After that, the extended SPARQL query is translated to SQL with the help of the mapping. The resulting SQL query is executed on the underlying relational database, and the query results are transformed to variable bindings.

[Figure 4 (diagram): Mapping and Ontology → Saturate Mapping → Saturated Mapping; SPARQL Query and Ontology → Extend Query → Extended SPARQL Query → Translate Query (using the saturated mapping) → SQL Query → Execute SQL Query (on the Relational Database) → SQL Results → Transform Results → SPARQL Results]

Figure 4: Dataflow in an OBDA system.


Figure 5: Dataflow in UltrawrapOBDA.

4. Ultrawrap

UltrawrapOBDA is an OBDA system developed by Juan Sequeda [1]. The system allows querying relational data stored in an Oracle database with SPARQL. Before UltrawrapOBDA is extended, this section presents the system. UltrawrapOBDA distinguishes between a compilation phase and a runtime phase. The dataflow in UltrawrapOBDA is depicted in figure 5.

4.1. Compilation Phase

In the compilation phase, UltrawrapOBDA is prepared for querying relational data with SPARQL in the later runtime phase. The input of the compilation phase is an ontology $O$, a mapping $M$, a relational schema $S$ and an instance $s$ of $S$. Formally speaking, the input forms an OBDA instance $((S, M, O), s)$. The OBDA system supports implicit triples by saturating the input mapping with additional mapping rules. In UltrawrapOBDA, the saturation of the mapping is achieved with inference rules of the form $(s, p, o) : \frac{\rho_1}{\rho_2}$, where, given a triple $(s, p, o)$ in the ontology, a mapping rule $\rho_2$ is returned if a mapping rule $\rho_1$ exists in the mapping. The mapping rules used in UltrawrapOBDA only allow complete strings as mapping templates, not template variables. The inference rules are listed in definition 58.

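Definition 58 (Ultrawrap Inference Rules) Let $A, B \in I$, let $\theta, \theta_1, \theta_2$ be mapping templates and let $\varphi$ be a relational algebra expression; then the inference rules are defined as:

$$(A, \mathit{subClassOf}, B) : \frac{\varphi\ (\theta, \mathit{type}, A)}{\varphi\ (\theta, \mathit{type}, B)}$$

$$(A, \mathit{subProperty}, B) : \frac{\varphi\ (\theta_1, A, \theta_2)}{\varphi\ (\theta_1, B, \theta_2)}$$

$$(A, \mathit{domain}, B) : \frac{\varphi\ (\theta_1, A, \theta_2)}{\varphi\ (\theta_1, \mathit{type}, B)}$$

$$(A, \mathit{range}, B) : \frac{\varphi\ (\theta_1, A, \theta_2)}{\varphi\ (\theta_2, \mathit{type}, B)}$$

$$(A, \mathit{equivalentClass}, B) \text{ or } (B, \mathit{equivalentClass}, A) : \frac{\varphi\ (\theta, \mathit{type}, A)}{\varphi\ (\theta, \mathit{type}, B)}$$

$$(A, \mathit{equivalentProperty}, B) \text{ or } (B, \mathit{equivalentProperty}, A) : \frac{\varphi\ (\theta_1, A, \theta_2)}{\varphi\ (\theta_1, B, \theta_2)}$$

$$(A, \mathit{inverseProperty}, B) \text{ or } (B, \mathit{inverseProperty}, A) : \frac{\varphi\ (\theta_1, A, \theta_2)}{\varphi\ (\theta_2, B, \theta_1)}$$

$$(A, \mathit{symmetricProperty}, B) \text{ or } (B, \mathit{symmetricProperty}, A) : \frac{\varphi\ (\theta_1, A, \theta_2)}{\varphi\ (\theta_2, A, \theta_1)}$$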

These inference rules are applied until a fixpoint is reached, i.e. until the set of rules does not change anymore. The saturation of mappings is defined in the following definition.

Definition 59 (Saturation of Mappings) Let $M$ be a mapping and $O$ an ontology; then a single saturation step is defined as:

$$\mathit{sat}'(M, O) = M \cup \{ \rho_2 \mid (s, p, o) : \frac{\rho_1}{\rho_2} \text{ where } (s, p, o) \in O \text{ and } \rho_1 \in M \} \tag{1}$$

Subsequently, the mapping saturation function $\mathit{sat}$ is defined as:

$$\mathit{sat}(M, O) = \begin{cases} M, & \text{if } \mathit{sat}'(M, O) = M \\ \mathit{sat}(\mathit{sat}'(M, O), O), & \text{otherwise} \end{cases} \tag{2}$$
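Equations (1) and (2) describe a fixpoint computation, sketched below in Python. The modelling is our own: a mapping rule $\varphi\ (\theta_1, p, \theta_2)$ is the tuple `(phi, theta1, p, theta2)` with `phi` only a label, ontology terms are abbreviated to short names such as `"subClassOf"` and `"type"` instead of full IRIs, and the symmetric-property rule is omitted. The recursion of equation (2) is written as a loop, which terminates because only finitely many rules can be built from the given templates and ontology terms.

```python
TYPE = "type"


def conclusions(rule, axiom):
    """Rules derived from one mapping rule and one ontology triple (part of Definition 58)."""
    phi, s, p, o = rule
    a, rel, b = axiom
    derived = []
    if rel == "subClassOf" and p == TYPE and o == a:
        derived.append((phi, s, TYPE, b))
    elif rel == "subProperty" and p == a:
        derived.append((phi, s, b, o))
    elif rel == "domain" and p == a:
        derived.append((phi, s, TYPE, b))
    elif rel == "range" and p == a:
        derived.append((phi, o, TYPE, b))
    elif rel == "equivalentClass" and p == TYPE:
        if o == a:  # the axiom applies in both directions
            derived.append((phi, s, TYPE, b))
        if o == b:
            derived.append((phi, s, TYPE, a))
    elif rel == "equivalentProperty":
        if p == a:
            derived.append((phi, s, b, o))
        if p == b:
            derived.append((phi, s, a, o))
    elif rel == "inverseProperty":
        if p == a:
            derived.append((phi, o, b, s))
        if p == b:
            derived.append((phi, o, a, s))
    return derived


def sat_step(mapping, ontology):
    """sat'(M, O), equation (1)."""
    result = set(mapping)
    for rule in mapping:
        for axiom in ontology:
            result.update(conclusions(rule, axiom))
    return result


def sat(mapping, ontology):
    """sat(M, O), equation (2): apply sat' until nothing changes."""
    current = set(mapping)
    while True:
        nxt = sat_step(current, ontology)
        if nxt == current:
            return current
        current = nxt


MAPPING = {("STUDENT", "http://www.uni.com/student/{ID}", TYPE, "BachelorStudent")}
# "Person" is a hypothetical superclass added to show that saturation is transitive
ONTOLOGY = {("BachelorStudent", "subClassOf", "Student"), ("Student", "subClassOf", "Person")}
for rule in sorted(sat(MAPPING, ONTOLOGY)):
    print(rule)
```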

The formally defined mapping rules in the saturated mapping are implemented as SQL queries that create SQL views in the underlying relational database during the view selection. These SQL views represent a virtualized RDF graph. A single view is created for each property and for each RDF type. These views are called tripleviews.

Example 41 Consider the relation depicted in table 18. This relation stores information about courses at a university, and its primary key is ID. Furthermore, consider a mapping rule which defines that for each tuple in the relation a triple is created in which the subject contains the ID of the tuple, the predicate is http://www.uni.com/description and the object is the value stored in DESCRIPTION. The mapping is shown in (3).


COURSES

| ID | NAME | LECTURER | DESCRIPTION |
|---|---|---|---|
| c1 | Mathematics | Dr. Strange | Teaches the basics of mathematics. |
| c2 | Physics | Dr. Octavius | Physics is one of the most fundamental scientific disciplines, and its main goal is to understand how the universe behaves. |
| c3 | Databases | Dr. Acula | Teaches relational algebra and SQL. |

Table 18: Relation holding information about courses at a university.

COURSES (http://www.uni.com/course/{ID},

http://www.uni.com/description,

"{DESCRIPTION}")

(3)

The corresponding SQL query that creates the tripleview is depicted in listing 14. Executing this query on the relation shown in table 18 would result in the view depicted in table 19. Such a tripleview is created for each property and for each class.

CREATE VIEW descriptionView (S, P, O) AS
SELECT
  CONCAT('http://www.uni.com/course/', ID) AS S,
  'http://www.uni.com/description' AS P,
  CONCAT('"', DESCRIPTION, '"') AS O
FROM
  COURSES

Listing 14: SQL query creating SQL view based on mapping rule.

descriptionView

| S | P | O |
|---|---|---|
| http://www.uni.com/course/c1 | http://www.uni.com/description | "Teaches the basics of mathematics." |
| http://www.uni.com/course/c2 | http://www.uni.com/description | "Physics is one of the most fundamental scientific disciplines, and its main goal is to understand how the universe behaves." |
| http://www.uni.com/course/c3 | http://www.uni.com/description | "Teaches relational algebra and SQL." |

Table 19: Tripleview containing all triples with http://www.uni.com/description as predicate.


There may exist more than one mapping rule creating triples with the same predicate. For instance, consider the following mapping rule.

COURSES (http://www.uni.com/course/{NAME},

http://www.uni.com/description,

"{DESCRIPTION}")

(4)

Together with the mapping rule shown in (3), two subqueries are used to create the view. The description view is created as the union of these subqueries. The respective SQL query is depicted in listing 15.

CREATE VIEW descriptionView (S, P, O) AS
SELECT
  CONCAT('http://www.uni.com/course/', ID) AS S,
  'http://www.uni.com/description' AS P,
  CONCAT('"', DESCRIPTION, '"') AS O
FROM
  COURSES
UNION
SELECT
  CONCAT('http://www.uni.com/course/', NAME) AS S,
  'http://www.uni.com/description' AS P,
  CONCAT('"', DESCRIPTION, '"') AS O
FROM
  COURSES

Listing 15: SQL query creating SQL view based on two mapping rules.
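The construction of listing 15 can be automated: every mapping rule becomes one SELECT, templates become string concatenations and the SELECTs are combined with UNION. The following sketch is an illustration, not UltrawrapOBDA's implementation: rules are modelled as `(table, theta1, predicate, theta2)` tuples, and the generated SQL uses the standard `||` concatenation operator instead of the `CONCAT` function of the listings so that it runs on Python's built-in SQLite. It creates the description view from rules (3) and (4) and queries it.

```python
import re
import sqlite3


def template_to_sql(template):
    """Translate a template such as 'http://www.uni.com/course/{ID}' into
    the SQL expression 'http://www.uni.com/course/' || ID."""
    pieces = re.split(r"\{(\w+)\}", template)  # literal, attribute, literal, ...
    parts = []
    for i, piece in enumerate(pieces):
        if i % 2 == 1:
            parts.append(piece)  # template variable -> column reference
        elif piece:
            parts.append("'" + piece.replace("'", "''") + "'")  # quoted SQL literal
    return " || ".join(parts) if parts else "''"


def tripleview_sql(view, rules):
    """One SELECT per mapping rule (table, theta1, predicate, theta2), combined with UNION."""
    selects = [
        f"SELECT {template_to_sql(s)} AS S, {template_to_sql(p)} AS P, "
        f"{template_to_sql(o)} AS O FROM {table}"
        for table, s, p, o in rules
    ]
    return f"CREATE VIEW {view} AS\n" + "\nUNION\n".join(selects)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE COURSES (ID TEXT PRIMARY KEY, NAME TEXT, LECTURER TEXT, DESCRIPTION TEXT)")
conn.executemany("INSERT INTO COURSES VALUES (?, ?, ?, ?)", [
    ("c1", "Mathematics", "Dr. Strange", "Teaches the basics of mathematics."),
    ("c2", "Physics", "Dr. Octavius", "Physics is one of the most fundamental scientific "
     "disciplines, and its main goal is to understand how the universe behaves."),
    ("c3", "Databases", "Dr. Acula", "Teaches relational algebra and SQL."),
])

DESCRIPTION = "http://www.uni.com/description"
rules = [
    ("COURSES", "http://www.uni.com/course/{ID}", DESCRIPTION, '"{DESCRIPTION}"'),    # rule (3)
    ("COURSES", "http://www.uni.com/course/{NAME}", DESCRIPTION, '"{DESCRIPTION}"'),  # rule (4)
]
sql = tripleview_sql("descriptionView", rules)
print(sql)
conn.execute(sql)
rows = conn.execute("SELECT S, O FROM descriptionView ORDER BY S").fetchall()
```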

The SQL subqueries created by multiple mapping rules that define the same class are unioned analogously to create the respective view for the class.

Definition 60 (View Function) The function $\mathit{view}(iri)$, with $iri \in I$, returns the respective view name for a property or a class.

The view function is needed to retrieve the view name for a given property or class when SPARQL queries are later translated into SQL queries.
Furthermore, a view is created that contains all triples, independent of their property or class. This view is needed if a SPARQL query contains a triple pattern whose predicate is a variable. Such a triple pattern cannot be mapped to any other tripleview and is therefore mapped to the view containing all triples. This view is called allTriplesView.

4.2. Tripleview Optimization

In order to reduce the execution time of SPARQL queries posed against the OBDA system, tripleviews may be optimized. Sequeda names three possible optimizations of the tripleviews:

1. Addition of primary key columns.

2. Creation of separate tripleviews for different data types.

3. Materialization of views.

1. Addition of primary key columns: Indices improve the performance of relational databases by minimizing the number of disk accesses required when a query is executed. An index maps key values, such as primary keys, to the physical location of the corresponding rows on disk.
Sequeda argues that, because the subject column S and the object column O in the tripleviews do not correspond to the primary keys of the source relation of a triple, SQL optimizers cannot leverage indices to speed up query execution. Therefore, two additional columns can be added to a tripleview: S_pk, which holds the primary key of the tuple from which the subject is taken, and O_pk, which does the same for the object. O_pk is NULL if O is a literal and not an IRI.
Since the views the system works with are queries that are executed whenever a view is accessed, the data is still stored in the source relations. Therefore, queries that join on these additional primary key columns can exploit the indices of the source relations and run faster.

Example 42 Consider the description view from example 41. Adding the primary keys from the source relation to the tripleview results in the view depicted in table 20. Here, the value in the primary key column for the object is NULL, because the objects are literals.

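descriptionView

| S | S_pk | P | O | O_pk |
|---|---|---|---|---|
| http://www.uni.com/course/c1 | c1 | http://www.uni.com/description | "Teaches the basics of mathematics." | NULL |
| http://www.uni.com/course/c2 | c2 | http://www.uni.com/description | "Physics is one of the most fundamental scientific disciplines, and its main goal is to understand how the universe behaves." | NULL |
| http://www.uni.com/course/c3 | c3 | http://www.uni.com/description | "Teaches relational algebra and SQL." | NULL |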

Table 20: Triple view having additional primary key columns.

2. Creation of separate tripleviews for different data types: The second proposed tripleview optimization creates separate views depending on the datatype of the object column in a tripleview. In the first step, separate tripleviews are created depending on the predicate of a triple or the class of an instance, as described above. The triples in such a tripleview may have different source relations, and all values in these tripleviews are cast to the datatype varchar. Sequeda argues that the object column of a tripleview is then as large as the largest corresponding column of any of its source relations, which leads to poor query performance. Therefore, separate tripleviews are created for the same property when the object columns have different datatypes.

BOOKS

| ID | NAME | DESCRIPTION |
|---|---|---|
| b1 | Basics of Databases | This book covers the basic topics of databases. |
| b2 | Physics in a Nutshell | A collection of physics formulas. |

Table 21: Relation holding information about literature used at a university.

Example 43 Consider the BOOKS relation depicted in table 21, which holds information about books used for teaching at a university. Furthermore, consider a mapping rule that creates triples from this relation, where the subject corresponds to the ID, the predicate is http://www.uni.com/description and the object is a literal that corresponds to the value stored in the DESCRIPTION column. The mapping is:

BOOKS (http://www.uni.com/books/{ID},

http://www.uni.com/description,

"{DESCRIPTION}")

(5)

Now assume that the DESCRIPTION column in the BOOKS relation is of type varchar(50) and that the DESCRIPTION column in the COURSES relation depicted in table 18 is of type varchar(150). Even though the mapping rules define that triples with the predicate http://www.uni.com/description are created from both relations, these triples would not be stored in the same tripleview, because the object columns have different data types. Instead, two tripleviews for the property http://www.uni.com/description would be created: one for the triples whose object column has the datatype varchar(50) and one for those whose object column has the datatype varchar(150).
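The grouping behind this optimization can be sketched as keying tripleviews by the pair (predicate, object datatype) instead of the predicate alone. The dictionary layout of the rules below (`table`, `predicate`, `object_type`) is a hypothetical description chosen for illustration; how UltrawrapOBDA actually stores this information is not specified here.

```python
from collections import defaultdict

# Hypothetical description of mapping rules (3) and (5): source table, predicate
# and the SQL type of the column that becomes the object of the triple.
RULES = [
    {"table": "COURSES", "predicate": "http://www.uni.com/description", "object_type": "varchar(150)"},
    {"table": "BOOKS", "predicate": "http://www.uni.com/description", "object_type": "varchar(50)"},
]


def group_by_datatype(rules):
    """One tripleview per (predicate, object datatype) instead of one per predicate."""
    groups = defaultdict(list)
    for rule in rules:
        groups[(rule["predicate"], rule["object_type"])].append(rule["table"])
    return dict(groups)


views = group_by_datatype(RULES)
for (predicate, datatype), tables in sorted(views.items()):
    print(predicate, datatype, tables)
```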

3. Materialization of views:
In UltrawrapOBDA a distinction is drawn between tripleviews and materialized tripleviews. Tripleviews are stored as queries, which are executed whenever a view is accessed. Materialized tripleviews, on the other hand, are stored as actual relations. This means that the underlying query does not have to be executed when a materialized tripleview is accessed.

Sequeda argues that materializing every tripleview leads to the best query execution times of UltrawrapOBDA at the cost of additional space. Materializing no tripleview requires no additional space, but leads to higher query execution times. In order to keep the space required for storing tripleviews small while still achieving low execution times, Sequeda proposes to materialize only leaf views. In the case of properties, leaf views are the tripleviews whose property has no subproperties. In the case of classes, leaf views are the tripleviews whose class does not have any subclasses.

Example 44
Consider the ontology shown in listing 3, which defines that bachelor students and master students are subclasses of the class student. Furthermore, bachelor and master students do not have any subclasses, and there is no instance of student that is not also a bachelor or master student. Therefore, the leaf views are the views for the bachelor student and master student classes, and these tripleviews are materialized. Finally, the non-materialized student tripleview is defined as the union of the tripleviews of its subclasses, i.e. bachelor student and master student.
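The two compilation-phase decisions described above, one tripleview per predicate and object datatype and materialization of leaf views only, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the UltrawrapOBDA implementation: mapping rules are reduced to (relation, predicate, object type) triples, the ARTICLES relation and the leaf-view queries are hypothetical placeholders, and the DDL strings follow PostgreSQL syntax. Rewriting the object column to the superclass inside the union is an assumption about how superclass views are formed.

```python
from collections import defaultdict

# Simplified mapping rules: (source relation, predicate IRI, SQL type of the object column).
# BOOKS and COURSES follow Example 43; ARTICLES is a hypothetical third relation.
RULES = [
    ("BOOKS", "http://www.uni.com/description", "varchar(50)"),
    ("COURSES", "http://www.uni.com/description", "varchar(150)"),
    ("ARTICLES", "http://www.uni.com/description", "varchar(50)"),
]


def group_tripleviews(rules):
    """One tripleview per (predicate, object datatype); each lists its source relations."""
    views = defaultdict(list)
    for relation, predicate, object_type in rules:
        views[(predicate, object_type)].append(relation)
    return dict(views)


# Class hierarchy from Example 44: class -> direct subclasses.
SUBCLASSES = {
    "student": ["bachelorstudent", "masterstudent"],
    "bachelorstudent": [],
    "masterstudent": [],
}
# Hypothetical defining queries of the leaf views (normally derived from the mapping).
LEAF_QUERIES = {
    "bachelorstudent": "SELECT S, P, O FROM bachelor_triples",
    "masterstudent": "SELECT S, P, O FROM master_triples",
}


def creation_order(subclasses):
    """Post-order traversal: every subclass view is created before its superclass view."""
    order, seen = [], set()

    def visit(cls):
        if cls in seen:
            return
        seen.add(cls)
        for sub in subclasses.get(cls, []):
            visit(sub)
        order.append(cls)

    for cls in subclasses:
        visit(cls)
    return order


def class_view_ddl(subclasses, leaf_queries):
    """Materialize leaf views only; define every other class view as a union of its subclass views."""
    ddl = []
    for cls in creation_order(subclasses):
        subs = subclasses.get(cls, [])
        if not subs:
            ddl.append(f"CREATE MATERIALIZED VIEW {cls} AS {leaf_queries[cls]};")
        else:
            # Subclass views store the subclass as object, so O is rewritten to the superclass (assumption).
            branches = [f"SELECT S, P, '{cls}' AS O FROM {sub}" for sub in subs]
            ddl.append(f"CREATE VIEW {cls} AS " + " UNION ".join(branches) + ";")
    return ddl


for (predicate, object_type), relations in group_tripleviews(RULES).items():
    print(predicate, object_type, relations)
for statement in class_view_ddl(SUBCLASSES, LEAF_QUERIES):
    print(statement)
```

Running the script prints one line per tripleview (BOOKS and ARTICLES share one, COURSES gets its own) and the DDL statements in dependency order, so both leaf views exist before the student view that unions them.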

4.3. Runtime Phase

Based on the virtualized graph that is created in the compilation phase and stored in tripleviews, SPARQL queries can be issued against the OBDA system in the runtime phase. A SPARQL query posed against the OBDA system is translated to an SQL query, which retrieves the desired data from the previously created views. The results of the SQL query are translated to variable bindings, which are finally returned. Since implicit triples are inferred by mapping saturation and not by query extension, the function extend(Q,O), which extends a SPARQL query Q based on the ontology O to infer implicit triples, is the identity function. Formally speaking:

Definition 61 (Query Extension in Ultrawrap)
Let Q be a SPARQL query and O an ontology. Then the query extension in Ultrawrap is defined as:

$$\mathit{extend}(Q, O) = Q \tag{6}$$

A SPARQL query issued against UltrawrapOBDA is converted to an abstract syntax tree by a parser based on the SPARQL grammar⁸. An abstract syntax tree is a tree representation of the syntactic structure of the query. This syntax tree is then used to translate the SPARQL query to an SQL query. The translation of SPARQL to SQL is mainly based on [13] and [14].

Example 45
Consider the query depicted in listing 16, which asks for all persons that hear a lecture or work at the laboratory, together with the names of the lectures. In figure 6 the abstract syntax tree of this SPARQL query is shown.

⁸ https://www.w3.org/TR/sparql11-query/#grammar last retrieved 1.07.2019


SELECT ?person ?name
└── JOIN
    ├── UNION
    │   ├── ?person <http://www.uni.com/hears> ?lecture
    │   └── ?person <http://www.uni.com/worksAt> <http://www.uni.com/laboratory>
    └── ?lecture <http://www.uni.com/name> ?name

Figure 6: SPARQL abstract syntax tree.

SELECT ?person ?name WHERE {
  { ?person <http://www.uni.com/hears> ?lecture }
  UNION
  { ?person <http://www.uni.com/worksAt> <http://www.uni.com/laboratory> } .
  ?lecture <http://www.uni.com/name> ?name
}

Listing 16: SPARQL query containing union and join.

The nodes in the SPARQL syntax tree are translated to SQL statements. The examples following the next definitions are based on the virtualized graph shown in table 22 and table 23. Additionally, the allTriplesView, which contains all triples of the virtualized graph, is depicted in table 24. The tables depict the views that are created in the compilation phase. In these tables the prefix uni: is used for better legibility, i.e. uni:c1 stands for http://www.uni.com/c1.

course
S      | P    | O
uni:c1 | type | uni:course
uni:c2 | type | uni:course

student
S      | P    | O
uni:s1 | type | uni:student
uni:s2 | type | uni:student

professor
S      | P    | O
uni:p1 | type | uni:professor

Table 22: Views containing triples that define types of instances.


hears
S      | P         | O
uni:s1 | uni:hears | uni:c1
uni:s2 | uni:hears | uni:c2

name
S      | P        | O
uni:c1 | uni:name | "Chemistry"
uni:c2 | uni:name | "Biology"

worksAt
S      | P           | O
uni:p1 | uni:worksAt | uni:laboratory

description
S      | P               | O
uni:c1 | uni:description | "The basics of chemistry"

Table 23: Views for triples with different properties.

allTriplesView
S      | P               | O
uni:c1 | type            | uni:course
uni:c2 | type            | uni:course
uni:s1 | type            | uni:student
uni:s2 | type            | uni:student
uni:p1 | type            | uni:professor
uni:s1 | uni:hears       | uni:c1
uni:s2 | uni:hears       | uni:c2
uni:c1 | uni:name        | "Chemistry"
uni:c2 | uni:name        | "Biology"
uni:p1 | uni:worksAt     | uni:laboratory
uni:c1 | uni:description | "The basics of chemistry"

Table 24: View that contains all triples in the virtualized graph.

Single triple patterns are translated to relational algebra expressions, as specified in the following definition.

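Definition 62 (Triple Pattern Translation)
Let $v_1, v_2, v_3 \in V$ and let $iri_1, iri_2, iri_3 \in I$. Furthermore, let $\mathit{view}$ be the function defined in definition 60. The translation $\mathit{trans}((s,p,o))$ of a triple pattern is defined as follows:

$$
\begin{aligned}
\mathit{trans}((v_1, v_2, v_3)) &:= \rho_{S\to v_1}(\rho_{P\to v_2}(\rho_{O\to v_3}(\pi_{S,P,O}(\mathit{allTriplesView}))))\\
\mathit{trans}((v_1, v_2, iri_1)) &:= \rho_{S\to v_1}(\rho_{P\to v_2}(\pi_{S,P}(\sigma_{O=iri_1}(\mathit{allTriplesView}))))\\
\mathit{trans}((v_1, iri_1, v_2)) &:= \begin{cases} \rho_{S\to v_1}(\rho_{O\to v_2}(\pi_{S,O}(\sigma_{P=iri_1}(\mathit{allTriplesView})))), & \text{if } iri_1 = \mathit{type}\\ \rho_{S\to v_1}(\rho_{O\to v_2}(\pi_{S,O}(\mathit{view}(iri_1)))), & \text{otherwise} \end{cases}\\
\mathit{trans}((v_1, iri_1, iri_2)) &:= \begin{cases} \rho_{S\to v_1}(\pi_{S}(\mathit{view}(iri_2))), & \text{if } iri_1 = \mathit{type}\\ \rho_{S\to v_1}(\pi_{S}(\sigma_{O=iri_2}(\mathit{view}(iri_1)))), & \text{otherwise} \end{cases}\\
\mathit{trans}((iri_1, v_1, v_2)) &:= \rho_{P\to v_1}(\rho_{O\to v_2}(\pi_{P,O}(\sigma_{S=iri_1}(\mathit{allTriplesView}))))\\
\mathit{trans}((iri_1, v_1, iri_2)) &:= \rho_{P\to v_1}(\pi_{P}(\sigma_{S=iri_1}(\sigma_{O=iri_2}(\mathit{allTriplesView}))))\\
\mathit{trans}((iri_1, iri_2, v_1)) &:= \begin{cases} \rho_{O\to v_1}(\pi_{O}(\sigma_{S=iri_1}(\sigma_{P=iri_2}(\mathit{allTriplesView})))), & \text{if } iri_2 = \mathit{type}\\ \rho_{O\to v_1}(\pi_{O}(\sigma_{S=iri_1}(\mathit{view}(iri_2)))), & \text{otherwise} \end{cases}\\
\mathit{trans}((iri_1, iri_2, iri_3)) &:= \begin{cases} \pi_{\emptyset}(\sigma_{S=iri_1}(\mathit{view}(iri_3))), & \text{if } iri_2 = \mathit{type}\\ \pi_{\emptyset}(\sigma_{S=iri_1}(\sigma_{O=iri_3}(\mathit{view}(iri_2)))), & \text{otherwise} \end{cases}
\end{aligned}
$$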
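The case split of definition 62 can be mirrored by a small generator that emits SQL instead of relational algebra. The sketch below is an illustration under assumptions, not the translator used in UltrawrapOBDA: the function names translate and is_var, the VIEWS dictionary standing in for view(·), and the use of SELECT 1 for the empty projection are hypothetical choices. Selections become WHERE conditions and renamings become column aliases.

```python
TYPE = "type"


def is_var(term: str) -> bool:
    return term.startswith("?")


def translate(s: str, p: str, o: str, view: dict) -> str:
    """Translate the triple pattern (s, p, o) into SQL, following the case split of Definition 62.

    `view` is a hypothetical lookup from a class or property IRI to the name of its tripleview.
    Constants are inlined without escaping, which is acceptable only in a sketch.
    """
    if is_var(p) or (p == TYPE and is_var(o)):
        # Variable predicate, or a type pattern with unknown class: only allTriplesView can answer it.
        source, slots = "allTriplesView", [("S", s), ("P", p), ("O", o)]
    elif p == TYPE:
        # Type pattern with a known class: view(o) holds exactly the instances of that class.
        source, slots = view[o], [("S", s)]
    else:
        # Known non-type predicate: view(p) holds exactly the triples with that predicate.
        source, slots = view[p], [("S", s), ("O", o)]

    conditions = [f"{column} = '{term}'" for column, term in slots if not is_var(term)]  # sigma
    columns = [f"{column} AS {term[1:]}" for column, term in slots if is_var(term)]    # pi and rho
    # An empty projection only tells whether the pattern matches; SELECT 1 approximates it.
    sql = f"SELECT {', '.join(columns) or '1'} FROM {source}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


VIEWS = {  # hypothetical IRI-to-view lookup for the views in tables 22 and 23
    "http://www.uni.com/course": "course",
    "http://www.uni.com/hears": "hears",
    "http://www.uni.com/worksAt": "worksAt",
    "http://www.uni.com/name": "name",
}
print(translate("?person", "http://www.uni.com/hears", "?lecture", VIEWS))
print(translate("?person", "http://www.uni.com/worksAt", "http://www.uni.com/laboratory", VIEWS))
```

For the first two triple patterns of listing 16, the script prints SELECT S AS person, O AS lecture FROM hears and SELECT S AS person FROM worksAt WHERE O = 'http://www.uni.com/laboratory', which correspond to the relational algebra expressions derived in the next example.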

Example 46
Consider the query depicted in listing 16. This query contains three triple patterns. In the following, the relational algebra expressions obtained by translating these triple patterns are shown together with their evaluations.

The first triple pattern is ?person <http://www.uni.com/hears> ?lecture and is translated to the relational algebra expression:

$$\rho_{S\to person}(\rho_{O\to lecture}(\pi_{S,O}(\mathit{hears})))$$

Evaluating this relational algebra expression on the virtualized graph results in the relation depicted in table 25.

person | lecture
uni:s1 | uni:c1
uni:s2 | uni:c2

Table 25: Evaluation of a translated triple pattern.

The second triple pattern in the query, ?person <http://www.uni.com/worksAt> <http://www.uni.com/laboratory>, is translated to the relational algebra expression:

$$\rho_{S\to person}(\pi_{S}(\sigma_{O=\text{uni:laboratory}}(\mathit{worksAt})))$$

The evaluation of this relational algebra expression results in the following relation:

person
uni:p1


The last triple pattern in the query, ?lecture <http://www.uni.com/name> ?name, is translated to:

$$\rho_{S\to lecture}(\rho_{O\to name}(\pi_{S,O}(\mathit{name})))$$

The evaluation of this relational algebra expression is depicted in table 27.

lecture | name
uni:c1  | "Chemistry"
uni:c2  | "Biology"


Definition 63 (Translation of SPARQL Union)
Let P1, P2 be graph patterns. The translation of a SPARQL union is defined as:

$$\mathit{trans}(P_1 \text{ UNION } P_2) := \mathit{trans}(P_1) \uplus \mathit{trans}(P_2)$$

where ⊎ denotes the outer union.
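A minimal Python sketch of the outer union ⊎, assuming that tuples are represented as dictionaries from attribute names to values and that None stands for NULL; the function name outer_union is hypothetical. Each tuple is extended with NULL for the attributes it lacks, and duplicates are kept, mirroring the multiset semantics of SPARQL UNION.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]  # one tuple; None plays the role of NULL


def outer_union(left: List[Row], right: List[Row]) -> List[Row]:
    """Outer union: extend every tuple with NULL for missing attributes, then keep all tuples."""
    attributes: List[str] = []
    for row in left + right:
        for attribute in row:
            if attribute not in attributes:
                attributes.append(attribute)
    return [{a: row.get(a) for a in attributes} for row in left + right]


# Evaluations of tp1 and tp2 from Example 46.
TP1 = [{"person": "uni:s1", "lecture": "uni:c1"}, {"person": "uni:s2", "lecture": "uni:c2"}]
TP2 = [{"person": "uni:p1"}]

for row in outer_union(TP1, TP2):
    print(row)
```

With the evaluations of tp1 and tp2, it yields the three tuples of table 28, the last one being {'person': 'uni:p1', 'lecture': None}.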

Example 47
Consider the query depicted in listing 16. The translations of its triple patterns and their evaluations have been presented in example 46. Let the triple pattern ?person <http://www.uni.com/hears> ?lecture be tp1 and let ?person <http://www.uni.com/worksAt> <http://www.uni.com/laboratory> be tp2. Then the translation of the SPARQL union tp1 UNION tp2 is the relational algebra expression:

$$\mathit{trans}(tp_1) \uplus \mathit{trans}(tp_2)$$

Evaluating this translation on the virtualized graph results in the following relation:

person | lecture
uni:s1 | uni:c1
uni:s2 | uni:c2
uni:p1 | NULL

Table 28: Evaluation of a translated SPARQL Union.

This relation shows why an outer union is needed for the translation of the SPARQL union. The evaluation of the translation of tp1 has a different set of attributes than the evaluation of the translation of tp2. Formally: $att(\mathit{trans}(tp_1)) \neq att(\mathit{trans}(tp_2))$. In relational algebra, the two operands of a union must have the same attributes, whereas the domains of two variable bindings in a SPARQL union do not have to be equal. Therefore, an outer union, which pads missing attributes with NULL, has to be used in relational algebra to mimic the SPARQL union.

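Definition 64 (Translation of SPARQL Join)
Let $P_1, P_2$ be graph patterns with $att(\mathit{trans}(P_1)) \cap att(\mathit{trans}(P_2)) \neq \emptyset$. Let $A'_1, A'_2, \ldots, A'_n$ be the attributes in $att(\mathit{trans}(P_1)) \cap att(\mathit{trans}(P_2))$ and let $A_1, A_2, \ldots, A_m$ be the attributes in $(att(\mathit{trans}(P_1)) \cup att(\mathit{trans}(P_2))) \setminus (att(\mathit{trans}(P_1)) \cap att(\mathit{trans}(P_2)))$. Then the translation of a SPARQL join with common attributes on both sides of the join is defined as:

$$
\begin{aligned}
\mathit{trans}(P_1 . P_2) := \pi_{A_1,\ldots,A_m,A'_1,\ldots,A'_n}\Big(&\kappa_{P_1.A'_1,\,P_2.A'_1,\,A'_1}\big(\kappa_{P_1.A'_2,\,P_2.A'_2,\,A'_2}(\ldots\kappa_{P_1.A'_n,\,P_2.A'_n,\,A'_n}(\\
&\quad \mathit{trans}(P_1) \bowtie_{cond_1 \wedge cond_2 \wedge \ldots \wedge cond_n} \mathit{trans}(P_2)\\
&)\ldots)\big)\Big),
\end{aligned}
$$

where $cond_i = (P_1.A'_i = P_2.A'_i \vee P_1.A'_i = \text{NULL} \vee P_2.A'_i = \text{NULL})$.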

Example 48
Once again consider the query depicted in listing 16. The translations of its triple patterns and their evaluations were shown in example 46, and the translation and evaluation of the union were shown in example 47. Let P be the union in the query and let tp be the triple pattern ?lecture <http://www.uni.com/name> ?name. Then the translation of the SPARQL join P . tp is:

$$\pi_{person,lecture,name}\big(\kappa_{gp_1.lecture,\,gp_2.lecture,\,lecture}(\rho_{gp_1}(\mathit{trans}(P)) \bowtie_{gp_1.lecture = gp_2.lecture \,\vee\, gp_1.lecture = \text{NULL} \,\vee\, gp_2.lecture = \text{NULL}} \rho_{gp_2}(\mathit{trans}(tp)))\big)$$

The evaluation of this relational algebra expression is depicted in table 29.

person | lecture | name
uni:s1 | uni:c1  | "Chemistry"
uni:s2 | uni:c2  | "Biology"
uni:p1 | uni:c1  | "Chemistry"
uni:p1 | uni:c2  | "Biology"

Table 29: Evaluation of a translated SPARQL join.


Excursion: If a natural join, which joins two relations on all attributes with the same name, were used to join the relations, a different result would be produced. The result of the natural join is depicted in table 30.

person | lecture | name
uni:s1 | uni:c1  | "Chemistry"
uni:s2 | uni:c2  | "Biology"

Table 30: Evaluation of natural join.

All rows that have uni:p1 as person are missing. This is because a comparison with NULL never evaluates to true, so the natural join does not match tuples on NULL values. In the evaluation of the translation of the SPARQL union, the value of lecture was NULL where the person was uni:p1; therefore, results with uni:p1 as person are missing when the natural join is used.
In order to also obtain these results, a theta join is used for attributes with the same name, whose condition also accepts NULL values on either side. Furthermore, the coalesce function ensures that the lecture value is not NULL if at least one of the joined tuples provides a value for it. If the coalesce function had not been used, the values for lecture would be NULL for tuples that have uni:p1 as person.
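The difference between the natural join and the NULL-tolerant theta join with coalesce can be reproduced with a small Python sketch. Tuples are dictionaries, None stands for NULL, and the function names are hypothetical; the sketch follows the join condition of definition 64 but is not the SQL that UltrawrapOBDA generates.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]  # None plays the role of NULL

# Outer union of tp1 and tp2 (table 28) and the evaluation of the name triple pattern (table 27).
UNION_ROWS = [
    {"person": "uni:s1", "lecture": "uni:c1"},
    {"person": "uni:s2", "lecture": "uni:c2"},
    {"person": "uni:p1", "lecture": None},
]
NAME_ROWS = [
    {"lecture": "uni:c1", "name": "Chemistry"},
    {"lecture": "uni:c2", "name": "Biology"},
]


def natural_join(left: List[Row], right: List[Row]) -> List[Row]:
    """SQL-style natural join: NULL never equals anything, so such tuples are dropped."""
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] is not None and l[a] == r[a] for a in l.keys() & r.keys())
    ]


def null_tolerant_join(left: List[Row], right: List[Row]) -> List[Row]:
    """Theta join whose condition also accepts NULL on either side, followed by coalesce."""
    result = []
    for l in left:
        for r in right:
            shared = l.keys() & r.keys()
            if all(l[a] == r[a] or l[a] is None or r[a] is None for a in shared):
                merged = dict(l)
                for a, value in r.items():
                    if merged.get(a) is None:  # coalesce: prefer a non-NULL value
                        merged[a] = value
                result.append(merged)
    return result


print(natural_join(UNION_ROWS, NAME_ROWS))
print(null_tolerant_join(UNION_ROWS, NAME_ROWS))
```

On these relations, natural_join returns the two tuples of table 30, while null_tolerant_join returns the four tuples of table 29. When the relations share no attribute, the condition is trivially true and both functions compute the cross join, as in the following example.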

Example 49
Let ?student type <http://www.uni.com/student> be tp1 and let ?course type <http://www.uni.com/course> be tp2. Furthermore, let table 31 and table 32 be the relations resulting from evaluating the translations of these triple patterns, respectively.

student
uni:s1
uni:s2

Table 31: Evaluation of a triple pattern.

course
uni:c1
uni:c2

Table 32: Evaluation of a triple pattern.

The relations have no common attribute names; therefore, the translation of the join tp1 . tp2 is:

$$\pi_{student,course}(\mathit{trans}(tp_1) \bowtie_{true} \mathit{trans}(tp_2))$$

This expression is equivalent to the cross join of the two relations. The result of the join is displayed in table 33.

student | course
uni:s1  | uni:c1
uni:s1  | uni:c2
uni:s2  | uni:c1
uni:s2  | uni:c2

Table 33: Evaluation of a SPARQL join without common attributes.

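Definition 65 (Translation of SPARQL Optional)
Let $P_1, P_2$ be graph patterns with $att(\mathit{trans}(P_1)) \cap att(\mathit{trans}(P_2)) \neq \emptyset$. Let $A'_1, A'_2, \ldots, A'_n$ be the attributes in $att(\mathit{trans}(P_1)) \cap att(\mathit{trans}(P_2))$ and let $A_1, A_2, \ldots, A_m$ be the attributes in $(att(\mathit{trans}(P_1)) \cup att(\mathit{trans}(P_2))) \setminus (att(\mathit{trans}(P_1)) \cap att(\mathit{trans}(P_2)))$. Then the translation of a SPARQL optional with common attributes on both sides of the join is defined as:

$$
\begin{aligned}
\mathit{trans}(P_1 \text{ OPTIONAL } P_2) := \pi_{A_1,\ldots,A_m,A'_1,\ldots,A'_n}\Big(&\kappa_{P_1.A'_1,\,P_2.A'_1,\,A'_1}\big(\kappa_{P_1.A'_2,\,P_2.A'_2,\,A'_2}(\ldots\kappa_{P_1.A'_n,\,P_2.A'_n,\,A'_n}(\\
&\quad \mathit{trans}(P_1) \mathbin{⟕}_{cond_1 \wedge cond_2 \wedge \ldots \wedge cond_n} \mathit{trans}(P_2)\\
&)\ldots)\big)\Big),
\end{aligned}
$$

where ⟕ denotes the left outer join and $cond_i = (P_1.A'_i = P_2.A'_i \vee P_1.A'_i = \text{NULL} \vee P_2.A'_i = \text{NULL})$.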

SELECT ?person ?description WHERE {
  { ?person <http://www.uni.com/hears> ?lecture }
  UNION
  { ?person <http://www.uni.com/worksAt> <http://www.uni.com/laboratory> }
  OPTIONAL { ?lecture <http://www.uni.com/description> ?description }
}

Listing 17: SPARQL query containing a union and an optional graph pattern.

Example 50
Consider the query depicted in listing 17. This query is similar to the query depicted in listing 16, with the difference that it contains a SPARQL optional instead of a SPARQL join, and that the last triple pattern is different. Thus, the translations and evaluations of the first two triple patterns and of the union are the same as in example 46 and example 47. The evaluation of the translation of the triple pattern ?lecture <http://www.uni.com/description> ?description is shown in the following table:

lecture | description
uni:c1  | "The basics of chemistry"

Table 34: Evaluation of translated triple pattern.

Now let P be the union in the query and let tp be the triple pattern ?lecture <http://www.uni.com/description> ?description. Then the translation of the SPARQL optional is:

$$\pi_{person,lecture,description}\big(\kappa_{gp_1.lecture,\,gp_2.lecture,\,lecture}(\rho_{gp_1}(\mathit{trans}(P)) \mathbin{⟕}_{gp_1.lecture = gp_2.lecture \,\vee\, gp_1.lecture = \text{NULL} \,\vee\, gp_2.lecture = \text{NULL}} \rho_{gp_2}(\mathit{trans}(tp)))\big)$$

The evaluation of this relational algebra expression is depicted in table 35.

person | lecture | description
uni:s1 | uni:c1  | "The basics of chemistry"
uni:s2 | uni:c2  | NULL
uni:p1 | uni:c1  | "The basics of chemistry"

Table 35: Evaluation of a translated SPARQL optional.

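Similar to the translation of the SPARQL join, a natural left outer join would not produce all desired tuples. In this case, the tuple <uni:p1, uni:c1, "The basics of chemistry"> would be missing from the result relation, because the NULL value of lecture in the tuple of uni:p1 does not match any lecture; instead, the tuple <uni:p1, NULL, NULL> would be produced. Therefore, the theta join and the coalesce function are also used for the translation of the SPARQL optional. Like the translation of a SPARQL join, the translation of a SPARQL optional without common attributes on either side reduces to a join with the condition true, i.e. a left outer join that is equivalent to a cross join whenever the right-hand relation is non-empty.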
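Analogously, the SPARQL optional can be sketched as a left outer join with the NULL-tolerant condition followed by coalesce. As before, tuples are dictionaries with None as NULL, and all names are hypothetical; the sketch illustrates definition 65 and is not the generated SQL.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]  # None plays the role of NULL

UNION_ROWS = [  # outer union of tp1 and tp2 (table 28)
    {"person": "uni:s1", "lecture": "uni:c1"},
    {"person": "uni:s2", "lecture": "uni:c2"},
    {"person": "uni:p1", "lecture": None},
]
DESCRIPTION_ROWS = [{"lecture": "uni:c1", "description": "The basics of chemistry"}]  # table 34


def compatible(l: Row, r: Row) -> bool:
    """Join condition of Definition 65: equal values, or NULL on either side."""
    return all(l[a] == r[a] or l[a] is None or r[a] is None for a in l.keys() & r.keys())


def coalesce_merge(l: Row, r: Row) -> Row:
    merged = dict(l)
    for a, value in r.items():
        if merged.get(a) is None:  # coalesce: keep the first non-NULL value
            merged[a] = value
    return merged


def null_tolerant_left_join(left: List[Row], right: List[Row]) -> List[Row]:
    """Left outer join with the NULL-tolerant condition, followed by coalesce."""
    right_attributes = {a for r in right for a in r}
    result = []
    for l in left:
        matches = [coalesce_merge(l, r) for r in right if compatible(l, r)]
        if matches:
            result.extend(matches)
        else:
            # No partner: keep the left tuple and pad the right-only attributes with NULL.
            result.append({**{a: None for a in right_attributes}, **l})
    return result


for row in null_tolerant_left_join(UNION_ROWS, DESCRIPTION_ROWS):
    print(row)
```

On these relations, the function returns the three tuples of table 35; with an empty right-hand relation, it simply returns the left tuples unchanged.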

Definition 66 (Translation of SPARQL SELECT)
The translation of a SPARQL SELECT query with the set of variables V′ and the graph pattern P is defined as:

$$\mathit{trans}(\text{SELECT } V' \text{ WHERE } P) := \pi_{V'}(\mathit{trans}(P))$$

Example 51
Consider the SPARQL query in listing 16. The translations and the evaluations of the translations were presented in example 46, example 47 and example 48. Let P be the graph pattern of the query, i.e. the SPARQL join. Then the translation of the SPARQL query is:

$$\pi_{person,name}(\mathit{trans}(P))$$

The evaluation of the relational algebra expression is depicted in table 36.

person | name
uni:s1 | "Chemistry"
uni:s2 | "Biology"
uni:p1 | "Chemistry"
uni:p1 | "Biology"

Table 36: Evaluation of the translation of a SPARQL SELECT.

After a query has been translated, it is executed on the relational database. The relation that results from this execution is then translated to variable bindings according to definition 56.

4.4. SQL Optimizations

The SPARQL query that a user issues against the OBDA system is translated to an SQL query that is executed on the underlying relational databases. Sequeda argues that there are two important SQL query optimizations that relational databases can use to enhance the performance of UltrawrapOBDA: the detection of unsatisfiable conditions and the elimination of self joins. These query optimizations were originally presented in [15], [16] and [17].

Detection of Unsatisfiable Conditions

This optimization checks whether a query result will be empty before the query is actually executed. Consider the SQL query depicted in listing 18.

SELECT * FROM descriptionView
WHERE S = 'http://www.uni.com/course/c1'
  AND S = 'http://www.uni.com/course/c2'

Listing 18: SQL query with unsatisfiable condition.

Since the query requires the subject S to be http://www.uni.com/course/c1 and http://www.uni.com/course/c2 at the same time, it will return an empty result. Thus, it is not necessary to execute the query: the empty result can be detected by inspecting the unsatisfiable condition.
Queries detected to have empty results can be used to prune unnecessary sub-trees of a union. Since the union of a non-empty relation with an empty relation equals the non-empty relation, the empty operand of the union may be omitted.
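Detecting this kind of unsatisfiable condition and pruning the corresponding union branches can be sketched in a few lines of Python. The sketch assumes that a condition is a conjunction of column = constant equalities, as in listing 18; inequalities, ranges and disjunctions, which a real optimizer also has to handle, are out of scope. The view name nameView and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# A condition is a conjunction of "column = constant" equalities, e.g. a WHERE clause.
Condition = List[Tuple[str, str]]


def is_unsatisfiable(condition: Condition) -> bool:
    """True if some column is required to equal two different constants at the same time."""
    required: Dict[str, str] = {}
    for column, constant in condition:
        if required.setdefault(column, constant) != constant:
            return True
    return False


def prune_union(branches: List[Tuple[str, Condition]]) -> List[Tuple[str, Condition]]:
    """Drop union branches whose condition can never hold; they contribute no tuples."""
    return [(view, cond) for view, cond in branches if not is_unsatisfiable(cond)]


# The WHERE clause of listing 18.
LISTING_18 = [("S", "http://www.uni.com/course/c1"), ("S", "http://www.uni.com/course/c2")]
# A hypothetical union with one unsatisfiable and one satisfiable branch.
BRANCHES = [
    ("descriptionView", LISTING_18),
    ("nameView", [("S", "http://www.uni.com/course/c1")]),
]
print(prune_union(BRANCHES))
```

Only the nameView branch survives, which corresponds to dropping the empty operand of the union without executing it.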

Example 52
Consider that ϕ1 is a non-empty relation and that ϕ2 is an empty relation with att(ϕ1) = att(ϕ2). Then:

$$[\![\varphi_1 \cup \varphi_2]\!]_s = [\![\varphi_1]\!]_s$$

Furthermore, if multiple relations are unioned, all empty relations may be omitted. Consider that ϕ1 and ϕ2 are non-empty relations and that ϕ3 and ϕ4 are empty relations. Furthermore, att(ϕ1) = att(ϕ2) = att(ϕ3) = att(ϕ4). Then:

$$[\![\varphi_1 \cup \varphi_2 \cup \varphi_3 \cup \varphi_4]\!]_s = [\![\varphi_1 \cup \varphi_2]\!]_s$$

Self Join Elimination

The second optimization eliminates unnecessary joins of tables with themselves. Such a join of a table with itself is called a self join. There are two different types of self join elimination: 1) self join elimination of projection and 2) self join elimination of selection.
1) Self join elimination of projection: Consider the relation depicted in table 1 and the query depicted in listing 19.

SELECT t1.Name, t2.Field
FROM STUDENT t1, STUDENT t2
WHERE t1.ID = 1 AND t1.ID = t2.ID

Listing 19: SQL query with unnecessary self join of projection.

This query projects attributes of the same relation, but the attributes are taken from two copies of the relation that are then joined. Thereby, an unnecessary self join is constructed. Provided that ID is a key of STUDENT, t1 and t2 always refer to the same tuple, so the self join can be eliminated by rewriting the query to the query depicted in listing 20. This query is a straightforward projection and selection on a single table.

SELECT Name, Field
FROM STUDENT
WHERE ID = 1

Listing 20: SQL query after elimination of self join of projection.

2) Self join elimination of selection: The second type of self join occurs during the selection. Consider the query depicted in listing 21.

SELECT t1.ID
FROM STUDENT t1, STUDENT t2
WHERE t1.Name = 'Alice' AND t2.Field = 'Computer Science'
  AND t1.ID = t2.ID

Listing 21: SQL query with unnecessary self join of selection.


In this query the selection is done separately on t1 and t2, which are actually the same relation, and the results are joined afterwards. This query can be rewritten to the query depicted in listing 22.

SELECT ID
FROM STUDENT
WHERE Name = 'Alice' AND Field = 'Computer Science'

Listing 22: SQL query after elimination of self join of selection.

Again, after the self join elimination the query is a simple selection and projectionon a single relation.
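Both rewritings can be checked on a small example with Python's built-in sqlite3 module. The STUDENT rows below are hypothetical sample data, and ID is declared as the primary key, which is the assumption under which the self join can be removed. The script runs each original query and its rewritten form and compares the results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ID is declared as primary key; the rewritings below rely on this.
conn.execute("CREATE TABLE STUDENT (ID INTEGER PRIMARY KEY, Name TEXT, Field TEXT)")
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?)",
    [(1, "Alice", "Computer Science"), (2, "Bob", "Physics"), (3, "Alice", "Physics")],
)

LISTING_19 = """SELECT t1.Name, t2.Field
FROM STUDENT t1, STUDENT t2
WHERE t1.ID = 1 AND t1.ID = t2.ID"""
LISTING_20 = "SELECT Name, Field FROM STUDENT WHERE ID = 1"
LISTING_21 = """SELECT t1.ID
FROM STUDENT t1, STUDENT t2
WHERE t1.Name = 'Alice' AND t2.Field = 'Computer Science'
  AND t1.ID = t2.ID"""
LISTING_22 = "SELECT ID FROM STUDENT WHERE Name = 'Alice' AND Field = 'Computer Science'"

for before, after in [(LISTING_19, LISTING_20), (LISTING_21, LISTING_22)]:
    rows_before = sorted(conn.execute(before).fetchall())
    rows_after = sorted(conn.execute(after).fetchall())
    print(rows_before, rows_after, rows_before == rows_after)
```

The third sample row (an Alice studying Physics) shows why both conditions must refer to the same tuple: it satisfies t1.Name = 'Alice' but is filtered out by the join on ID, exactly as in the single-table query.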


5. Optimization

After UltrawrapOBDA had been reimplemented as described in section 4, the system was optimized in a next step. Two optimizations were added to the system. The first optimization is a view optimization that reduces the space needed to materialize views by omitting attributes that store the same value in each row of a view. The second optimization adds support for exclusive superclass instances.

UltrawrapOBDA creates superclass views by generating the union of all subclass views of the superclass. Consequently, every instance of a superclass also has to be an instance of at least one of its subclasses. The second optimization allows for exclusive instances of the superclass, i.e. instances of the superclass that are not instances of any of its subclasses.

5.1. View Optimization

In the compilation phase of UltrawrapOBDA, a view is generated for each class and for each property in the given ontology. These views have three columns to store triples in: S, P and O, denoting subject, predicate and object, respectively. Consider the uni:BachelorStudent view depicted in table 37.

bachelorstudent
S      | P    | O
uni:s1 | type | uni:BachelorStudent
uni:s2 | type | uni:BachelorStudent

Table 37: View containing all instances of the type uni:BachelorStudent.

The values in the columns P and O are the same in each row of table 37. Since the value of the P column is type in every class view and the value of the O column is encoded in the name of the view, the P and O columns may be omitted in class views. This means that a class view has only one column named S, which contains all instances of the class. Take for instance the optimized uni:BachelorStudent view shown in table 38. The view only stores the instances of the type uni:BachelorStudent and omits the P and the O columns.

bachelorstudent
S
uni:s1
uni:s2

Table 38: Optimized view with only one column.

Likewise, the value stored in the P column of a property view is the same in every row of that view. Consider the property view depicted in table 39, which holds all triples that have the property uni:hears as predicate. Therefore, the value of the P column is uni:hears in each row of the view. In the optimized view the P column is omitted, and thus the property view has only two attributes, namely S and O. The optimized property view for the property uni:hears is depicted in table 40.

hears
S      | P         | O
uni:s1 | uni:hears | uni:c1
uni:s2 | uni:hears | uni:c2

Table 39: View containing all triples with the predicate uni:hears.

hears
S      | O
uni:s1 | uni:c1
uni:s2 | uni:c2

Table 40: Optimized view with only two columns.

Since the optimized class views and the optimized property views have different sets of attributes, the allTriplesView cannot be generated by simply creating the union of all views. For class views, the P and O columns have to be added, and for property views, the P column has to be added. Therefore, the allTriplesView is created as specified in the following definition.

Definition 67 (Creation of allTriplesView)
Let O be an ontology. Let c1, c2, ..., cn be the classes in O and let p1, p2, ..., pm be the properties in O. Then the allTriplesView is created by the following SQL query.

CREATE VIEW allTriplesView AS
SELECT S, 'type' AS P, 'c1' AS O FROM view(c1)
UNION
SELECT S, 'type' AS P, 'c2' AS O FROM view(c2)
UNION
...
UNION
SELECT S, 'type' AS P, 'cn' AS O FROM view(cn)
UNION
SELECT S, 'p1' AS P, O FROM view(p1)
UNION
SELECT S, 'p2' AS P, O FROM view(p2)
UNION
...
UNION
SELECT S, 'pm' AS P, O FROM view(pm)


Example 53
Consider the optimized uni:BachelorStudent view and the optimized uni:hears view depicted in table 38 and table 40, respectively. The allTriplesView created from these two views would be generated by the SQL query:

CREATE VIEW allTriplesView AS
SELECT S, 'type' AS P, 'http://www.uni.com/BachelorStudent' AS O
FROM bachelorstudent
UNION
SELECT S, 'http://www.uni.com/hears' AS P, O
FROM hears
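The optimized views and the allTriplesView of example 53 can be tried out with Python's sqlite3 module. SQLite has no materialized views, so plain tables stand in for the materialized class and property views, and all values are abbreviated with the uni: prefix as in tables 38 and 40 rather than written as full IRIs. This is a sketch for illustration, not the PostgreSQL setup of the thesis.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Optimized class view (table 38): only the S column is stored.
conn.execute("CREATE TABLE bachelorstudent (S TEXT)")
conn.executemany("INSERT INTO bachelorstudent VALUES (?)", [("uni:s1",), ("uni:s2",)])
# Optimized property view (table 40): the constant P column is dropped.
conn.execute("CREATE TABLE hears (S TEXT, O TEXT)")
conn.executemany("INSERT INTO hears VALUES (?, ?)", [("uni:s1", "uni:c1"), ("uni:s2", "uni:c2")])

# allTriplesView re-adds the omitted columns as constants, following Definition 67.
conn.execute("""
CREATE VIEW allTriplesView AS
SELECT S, 'type' AS P, 'uni:BachelorStudent' AS O FROM bachelorstudent
UNION
SELECT S, 'uni:hears' AS P, O FROM hears
""")

all_triples = conn.execute("SELECT S, P, O FROM allTriplesView ORDER BY P, S").fetchall()
# A pattern with a variable predicate, e.g. ?x ?p uni:c1, must be answered by allTriplesView.
with_object_c1 = conn.execute("SELECT S, P FROM allTriplesView WHERE O = 'uni:c1'").fetchall()
print(all_triples)
print(with_object_c1)
```

The view returns the four original triples, including the P and O values that are no longer stored in the optimized views.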

Even though data is omitted in the optimized views, the translation of SPARQL queries to SQL queries described in section 4.3 does not have to be changed. Since only the translations of triple patterns address views directly, the translation of other SPARQL constructs, such as join, optional and union, is not affected by optimizing the views. For the translation of triple patterns, several cases have to be considered:

1. The predicate is a variable.
When the predicate of a triple pattern is a variable, the triple pattern is mapped onto the allTriplesView. Since the allTriplesView has the three columns S, P and O, it can answer any triple pattern, regardless of whether the subject and object of the triple pattern are variables, IRIs or literals.

2. The predicate is type.
If the predicate is type, four different combinations of variables v, v1, v2 and IRIs iri, iri1, iri2 are possible:
   (a) v1 type v2
   If the subject and the object of a triple pattern are variables and the predicate is type, the class view cannot be determined, and thus the triple pattern is mapped onto the allTriplesView, which can answer the triple pattern.
   (b) v type iri
   If the subject of a triple pattern with the predicate type is a variable and the object is an IRI, the triple pattern can be mapped onto view(iri). Since only the subject of the triple pattern is a variable, the respective view has enough information to answer the translation of the triple pattern. Take for instance the triple pattern ?student type <http://www.uni.com/BachelorStudent>. This triple pattern with one variable is translated to the relational algebra expression $\rho_{S\to student}(\pi_{S}(\mathit{bachelorstudent}))$. Since only the S column is projected, the uni:BachelorStudent view depicted in table 38 can answer this triple pattern.
   (c) iri type v
   Similar to case 2a, a class view cannot be determined, since the object of the triple pattern is a variable. Consequently, this triple pattern is mapped onto the allTriplesView, which can answer the translation of the triple pattern.
   (d) iri1 type iri2
   If the subject and the object of a triple pattern with type as predicate are IRIs, the triple pattern is mapped onto view(iri2). All rows with the value iri1 stored in the S column are selected, and then the empty set is projected. Consider the triple pattern uni:s1 type <http://www.uni.com/BachelorStudent>. This triple pattern is translated to $\pi_{\emptyset}(\sigma_{S=\text{uni:s1}}(\mathit{bachelorstudent}))$. Once again, this relational algebra expression can be answered by the view shown in table 38.

3. The predicate is an IRI and not type.
If the predicate p of a triple pattern is an IRI different from type, four different combinations of variables v, v1, v2 and IRIs iri, iri1, iri2 are possible:
   (a) v1 p v2
   If subject and object are variables, the triple pattern is mapped onto view(p), and the S and O columns of the view are projected. Consider the triple pattern ?student uni:hears ?course. The relational algebra expression for this triple pattern is $\rho_{S\to student}(\rho_{O\to course}(\pi_{S,O}(\mathit{hears})))$. This expression can be answered by the optimized uni:hears view shown in table 40.
   (b) v p iri
   If only the subject of the triple pattern is a variable, the triple pattern is mapped onto view(p), and the rows with iri as the value of O are selected from the view. Then the values of the column S are projected. Consider the triple pattern ?student uni:hears uni:c1. It is translated to the relational algebra expression $\rho_{S\to student}(\pi_{S}(\sigma_{O=\text{uni:c1}}(\mathit{hears})))$. Since only the columns S and O are addressed, the expression can be answered by the optimized uni:hears view.
   (c) iri p v
   This case is analogous to case 3b, with the difference that the object is a variable and the subject is an IRI. Therefore, it can also be answered by the optimized views.
   (d) iri1 p iri2
   Finally, the last case is a triple pattern with two IRIs. Once again, the triple pattern is mapped onto view(p). All rows where the value stored in the S column corresponds to iri1 and the value stored in the O column corresponds to iri2 are selected, and then the empty set is projected. Consider the triple pattern uni:s1 uni:hears uni:c1. It is translated to the relational algebra expression $\pi_{\emptyset}(\sigma_{S=\text{uni:s1}}(\sigma_{O=\text{uni:c1}}(\mathit{hears})))$. This expression can be answered by the optimized uni:hears view shown in table 40.

In summary, the P and O columns of the original class views are never addressed when a triple pattern is translated to a class view, and therefore they can be omitted in the optimized class views. Likewise, the P column of the original property views is never addressed, and consequently it can be omitted in the optimized property views. Since the omitted columns are not accessed in the original views either, the triple pattern translation does not have to be altered for the optimized views.

5.2. Support of Exclusive Superclass Instances

So far, a view for a superclass was created as the union of all views of the subclasses of the superclass. Take for instance the classes uni:BachelorStudent and uni:MasterStudent, which are subclasses of the class uni:Student. The uni:Student view is created as the union of the uni:BachelorStudent and the uni:MasterStudent view. In the reimplemented system, each instance of uni:Student is therefore also an instance of uni:BachelorStudent or uni:MasterStudent. However, in an RDF graph, instances of uni:Student can exist that are neither instances of uni:BachelorStudent nor of uni:MasterStudent. Consider the IRI <http://www.uni.com/studentPhd1>, which denotes a PhD student. Since a PhD student is neither a bachelor student nor a master student, the PhD student <http://www.uni.com/studentPhd1> is only of the type uni:Student.
Since the uni:Student view is the union of its subclass views, such exclusive superclass instances could not be represented by the original uni:Student view. In order to support exclusive superclass instances, a materialized view exclusiveStudent is created that contains all instances that are of the type uni:Student and not of the type uni:BachelorStudent or uni:MasterStudent. After that, the uni:Student view is defined as the union of the uni:BachelorStudent view, the uni:MasterStudent view and the exclusiveStudent view. The set of mapping rules that define such an exclusive view is defined as follows:

Definition 68 (Mapping Rules for Exclusive Superclasses)
Let c be a superclass and let SC be the set of subclasses of c. Let M be a mapping. The set of mapping rules to create the exclusive view for the superclass c is defined as:

$$M_c = \{\, m \in M \mid m = \varphi\,(\theta, \mathit{type}, c) \ \text{and}\ \nexists\, m' \in M : m' = \varphi\,(\theta, \mathit{type}, sc) \text{ where } sc \in SC \,\}$$

Note that the system currently only checks for the syntactic equivalence of ϕ and notfor the semantic equivalence.

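Example 54
Let uni:BachelorStudent be a subclass of uni:Student. For simplicity, assume that uni:Student does not have the subclass uni:MasterStudent. Furthermore, consider the following mapping rules:

$$
\begin{aligned}
m_1 &= \mathit{STUDENT}\,(\texttt{http://www.uni.com/student/ID},\ \mathit{type},\ \texttt{http://www.uni.com/BachelorStudent})\\
m_2 &= \mathit{STUDENT}\,(\texttt{http://www.uni.com/student/ID},\ \mathit{type},\ \texttt{http://www.uni.com/Student})\\
m_3 &= \mathit{PHDSTUDENT}\,(\texttt{http://www.uni.com/student/ID},\ \mathit{type},\ \texttt{http://www.uni.com/Student})
\end{aligned}
$$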
The mapping rule m1 is not included in M_Student, since the object of the triples created by this rule is uni:BachelorStudent and not uni:Student. However, the instances of the class uni:BachelorStudent that are defined by this mapping rule are later included in the uni:Student view, because uni:BachelorStudent is a subclass of uni:Student. Mapping rules m1 and m2 have the same relational algebra expression, namely STUDENT, the same template http://www.uni.com/student/ID for subjects, and type as predicate IRI. Thus, the evaluation of the two mapping rules returns the same subjects, with the difference that all objects of triples created by m1 are uni:BachelorStudent and all objects of triples created by m2 are uni:Student. Since the instances created by evaluating m1 are later unioned into the uni:Student view, mapping rule m2 can be excluded from M_Student. Finally, M_Student = {m3}, because the instances created by m3 are not included in any subclass view of uni:Student.

The materialized view for exclusive instances of the uni:Student class is created by the following SQL query:

CREATE MATERIALIZED VIEW exclusiveStudent AS
SELECT CONCAT('http://www.uni.com/student/', ID) AS S
FROM PHDSTUDENT

Finally, the uni:Student view would be created by the following SQL query:

CREATE VIEW student AS
SELECT * FROM exclusiveStudent
UNION
SELECT * FROM bachelorStudent
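Definition 68 can be sketched as a filter over mapping rules. The sketch assumes a simplified rule representation with the fields phi, theta, predicate and obj (hypothetical names) and, like the system described above, compares phi and theta only syntactically, i.e. as strings. The set SC is passed explicitly as a set of class IRIs.

```python
from typing import List, NamedTuple, Set

TYPE = "type"


class MappingRule(NamedTuple):
    phi: str        # relational algebra expression, compared syntactically as in the text
    theta: str      # subject template
    predicate: str
    obj: str


def exclusive_rules(c: str, subclasses: Set[str], mapping: List[MappingRule]) -> List[MappingRule]:
    """M_c of Definition 68: type rules for c without a same-(phi, theta) rule typing into a subclass."""
    def covered_by_subclass(m: MappingRule) -> bool:
        return any(
            other.phi == m.phi and other.theta == m.theta
            and other.predicate == TYPE and other.obj in subclasses
            for other in mapping
        )

    return [m for m in mapping if m.predicate == TYPE and m.obj == c and not covered_by_subclass(m)]


SUBJECT = "http://www.uni.com/student/ID"
STUDENT_CLASS = "http://www.uni.com/Student"
BACHELOR_CLASS = "http://www.uni.com/BachelorStudent"
M1 = MappingRule("STUDENT", SUBJECT, TYPE, BACHELOR_CLASS)
M2 = MappingRule("STUDENT", SUBJECT, TYPE, STUDENT_CLASS)
M3 = MappingRule("PHDSTUDENT", SUBJECT, TYPE, STUDENT_CLASS)

print(exclusive_rules(STUDENT_CLASS, {BACHELOR_CLASS}, [M1, M2, M3]))
```

For the rules of example 54, exclusive_rules returns only m3, in line with M_Student = {m3}; without any subclasses, both m2 and m3 would be kept.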


6. Evaluation

Sequeda's UltrawrapOBDA, which is based on Oracle DB, is not available as open source. However, benchmark results of the Texas Benchmark [3] for UltrawrapOBDA are provided in [1]. To check whether the reimplementation created in this thesis, which is based on PostgreSQL⁹, has a performance comparable to UltrawrapOBDA, the Texas Benchmark has been used to evaluate the reimplemented system and the optimized system, both with PostgreSQL as the underlying relational database. Furthermore, the benchmark is used to evaluate the open source OBDA system Ontop¹⁰ with PostgreSQL as the underlying relational database, in order to compare the OBDA system implemented in this thesis with a freely available OBDA system.
In the following section, the benchmark used for the evaluation is presented. After that, the benchmark results of UltrawrapOBDA provided in [1], the reimplemented system, the optimized system and Ontop are compared.

6.1. Texas Benchmark

The Texas Benchmark for OBDA systems was inspired by the Wisconsin Benchmark for relational databases [18]. It provides relational data, multiple mappings from this relational data onto RDF triples, various ontologies, which serve as the global schema of the OBDA system, and a set of benchmark queries.

Dataset

The dataset of the Texas Benchmark consists of a single database with a single relation called PRODUCT. This relation has 21 attributes and one million rows, and its primary key is PRODUCTID. Besides the primary key, there are 14 filler attributes that have random strings as values. Additionally, the relation has the attributes TWO, FIVE, TEN, TWENTY, FIFTY and HUNDRED. The value of each of these attributes is a random integer between 1 and the number given by the attribute name. For instance, the value of the attribute TWO is either 1 or 2, and the value of the attribute FIFTY is a number between 1 and 50. These 6 attributes are used to create views with a specific selectivity in the compilation phase.
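To illustrate how these attributes control selectivity, the following Python sketch builds a scaled-down PRODUCT table in an in-memory SQLite database and checks that a selection such as TEN = 1 returns roughly one tenth of the rows. The filler attributes are omitted, and the row count, the random seed and the helper name `build_products` are assumptions made for illustration; this is not the benchmark's data generator.

```python
import random
import sqlite3

# Selectivity attributes and the upper bound of their random values.
SELECTIVITY_COLUMNS = {"TWO": 2, "FIVE": 5, "TEN": 10,
                       "TWENTY": 20, "FIFTY": 50, "HUNDRED": 100}


def build_products(n_rows: int, seed: int = 42) -> sqlite3.Connection:
    """Create a scaled-down PRODUCT table with the six selectivity columns
    (filler columns omitted); each column holds a random integer in [1, k]."""
    rng = random.Random(seed)
    con = sqlite3.connect(":memory:")
    cols = ", ".join(f"{c} INTEGER" for c in SELECTIVITY_COLUMNS)
    con.execute(f"CREATE TABLE PRODUCT (PRODUCTID INTEGER PRIMARY KEY, {cols})")
    rows = [(pid, *(rng.randint(1, k) for k in SELECTIVITY_COLUMNS.values()))
            for pid in range(1, n_rows + 1)]
    placeholders = "?" + ", ?" * len(SELECTIVITY_COLUMNS)
    con.executemany(f"INSERT INTO PRODUCT VALUES ({placeholders})", rows)
    return con


con = build_products(20_000)
for col, k in SELECTIVITY_COLUMNS.items():
    (count,) = con.execute(f"SELECT COUNT(*) FROM PRODUCT WHERE {col} = 1").fetchone()
    print(f"{col}: {count / 20_000:.3f} of the rows (expected about {1 / k:.3f})")
```

Scaled up to one million rows, a selection on TEN returns about 100000 rows and a selection on HUNDRED about 10000.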

Ontology

The ontologies in the Texas Benchmark are rather simple and contain only two types of triples: 1) triples of the form (x, type, owl:Class), which define that x is a class, and 2) triples of the form (x, subClassOf, y). Consequently, the ontologies used for the benchmark define tree-like class structures. The Texas Benchmark provides 5 ontologies with different depths. The depth of an ontology is defined as the highest number of classes that are transitively connected by edges labeled with subClassOf.

⁹ https://www.postgresql.org/, last retrieved 12.7.2019
¹⁰ https://ontop.inf.unibz.it/, last retrieved 9.9.2019

Consider the ontology consisting of the two triples:

(MasterStudent, subClassOf, Student), (Student, subClassOf, Person)

Figure 7 depicts this ontology and its depth. Since three classes are transitively connected by edges labeled with subClassOf, the overall depth of this ontology is 3.
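The depth definition can be made concrete with a short Python sketch that counts the classes on the longest subClassOf chain. The triple encoding and the function name `ontology_depth` are assumptions for illustration, and the hierarchy is assumed to be acyclic.

```python
from functools import lru_cache

# Ontology as (subclass, "subClassOf", superclass) triples.
ONTOLOGY = [
    ("MasterStudent", "subClassOf", "Student"),
    ("Student", "subClassOf", "Person"),
]


def ontology_depth(triples) -> int:
    """Depth = number of classes on the longest chain of subClassOf edges."""
    parents = {}
    classes = set()
    for sub, pred, sup in triples:
        if pred == "subClassOf":
            parents.setdefault(sub, set()).add(sup)
            classes.update((sub, sup))

    @lru_cache(maxsize=None)
    def chain_length(cls: str) -> int:
        # Longest chain starting at cls and following subClassOf upwards
        # (assumes the hierarchy contains no cycles).
        return 1 + max((chain_length(p) for p in parents.get(cls, ())), default=0)

    return max((chain_length(c) for c in classes), default=0)


print(ontology_depth(ONTOLOGY))  # 3
```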

Figure 7: Depths of an ontology (classes Person, Student and MasterStudent connected by subClassOf edges, annotated with depths 1 to 3).

Note that the actual benchmark ontologies have more than one class at a single depth of the tree, and classes have more than one subclass. In total, the benchmark contains 5 ontologies with the depths 2, 3, 4, 5 and 6. Each ontology has exactly 100 leaf classes.

Mapping

The Texas Benchmark provides R2RML mappings [19] from relational data to RDF triples. There are 6 mappings for each depth of the ontology and therefore 30 mappings in total. All mapping rules in these mappings have the form:

$$\phi \rightsquigarrow (\texttt{http://www.obda-benchmark.org/texas/instance/Product/PRODUCTID},\ \mathit{type},\ c)$$

where $\phi = \pi_{PRODUCTID}(\sigma_{selec=i}(PRODUCT))$, selec is one of TWO, FIVE, TEN, TWENTY, FIFTY or HUNDRED, and i is an integer value. Furthermore, c is a class defined in the respective ontology. This means that all triples in the virtualized RDF graph have type as predicate.


Example 55. Let $\phi = \pi_{PRODUCTID}(\sigma_{HUNDRED=30}(PRODUCT))$ and let class1 be a class in the ontology. Then an example of a mapping rule in the benchmark is:

$$\phi \rightsquigarrow (\texttt{http://www.obda-benchmark.org/texas/instance/Product/PRODUCTID},\ \mathit{type},\ \mathit{class1})$$

This rule defines that for each row in the relation PRODUCT that has the value 30 for the attribute HUNDRED, a triple is created whose subject is built from the PRODUCTID, whose predicate is type and whose object is class1.
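Evaluating such a mapping rule amounts to running its relational algebra expression as SQL and filling the subject template. The following Python sketch does this for the rule of Example 55 on a tiny hypothetical PRODUCT table in SQLite; the helper names `rule_to_sql` and `evaluate_rule` are assumptions, and `type` is spelled out as the standard rdf:type IRI.

```python
import sqlite3

BASE = "http://www.obda-benchmark.org/texas/instance/Product/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SELEC_ATTRIBUTES = {"TWO", "FIVE", "TEN", "TWENTY", "FIFTY", "HUNDRED"}


def rule_to_sql(selec: str, i: int) -> str:
    """SQL for phi = pi_PRODUCTID(sigma_{selec=i}(PRODUCT)); the subject IRI
    is built from the mapping template by string concatenation."""
    if selec not in SELEC_ATTRIBUTES:
        raise ValueError(f"unknown selectivity attribute: {selec}")
    return (f"SELECT '{BASE}' || PRODUCTID AS S FROM PRODUCT "
            f"WHERE {selec} = {int(i)} ORDER BY PRODUCTID")


def evaluate_rule(con: sqlite3.Connection, selec: str, i: int, cls: str) -> list:
    """Evaluate one mapping rule and return its (subject, type, class) triples."""
    return [(s, RDF_TYPE, cls) for (s,) in con.execute(rule_to_sql(selec, i))]


# Tiny hypothetical PRODUCT table with only the columns the rule needs.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE PRODUCT (PRODUCTID INTEGER PRIMARY KEY, HUNDRED INTEGER)")
con.executemany("INSERT INTO PRODUCT VALUES (?, ?)", [(1, 30), (2, 7), (3, 30)])

triples = evaluate_rule(con, "HUNDRED", 30, "class1")
for triple in triples:
    print(triple)
```

Only the products 1 and 3 satisfy HUNDRED = 30, so the rule yields two type triples.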

For a single ontology depth there are 6 mappings with different selectivities, which differ from each other only in the attribute selec used in ϕ. For example, in each mapping rule of the mapping with selectivity 10, selec is TEN; consequently, each mapping rule yields approximately 100000 instances of its class. In each mapping rule of the mapping with selectivity 100, selec is HUNDRED, and therefore each mapping rule yields about 10000 instances. Furthermore, the Texas Benchmark provides unsaturated and saturated mappings with respect to the ontology of the respective depth. Therefore, the correctness of the mapping saturation implemented in an OBDA system can be checked by comparing the generated saturated mapping to the provided saturated mapping.

Queries

A benchmark query set is provided for each ontology depth. The number of queries for an ontology depth equals the depth; for instance, there are 3 benchmark queries for the ontology with depth 3. Consider the ontology with depth 3 depicted in figure 7. Each query in the benchmark query set asks for all instances of a single class, each at a different depth of the class tree. For the example ontology, the first query would ask for all instances of Person, the second for all instances of Student and the third for all instances of MasterStudent. In this way, the benchmark evaluates how well instances of subclasses can be retrieved from the virtualized graph. Listing 23 depicts an example query from the benchmark that asks for all instances of a class at depth 1 of the ontology.

    PREFIX d3: <http://www.obda-benchmark.org/texas/ontologies/03DepthThree.owl#>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    SELECT ?x WHERE {
      ?x rdf:type d3:D1_5 .
    }

Listing 23: Example of a benchmark query.


6.2. Experimental Setup

For the reimplemented system, the optimized system and Ontop 1.18.1, the benchmark was executed on an Ubuntu 16.04 machine with 8 GB of memory, 500 GB of disk space and four 1.7 GHz processor cores. The Java version on the machine was 1.8.0.171 and the PostgreSQL version was 9.5.17. In comparison, the benchmark for UltrawrapOBDA was executed on Microsoft Windows Server 2008 R2 Standard with Oracle 11g Release 2 Enterprise Edition as relational database; that machine had four Intel Xeon X7460 cores at 2.66 GHz and 16 GB of RAM.

All queries were executed 10 times, and the highest and the lowest execution time of each query were discarded to reduce the influence of outliers. The average execution time per query was then calculated from the remaining 8 execution times.
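The averaging procedure is a trimmed mean. The following Python sketch shows it; the function name `trimmed_average` and the run times are hypothetical, not measured values.

```python
from statistics import mean


def trimmed_average(times_ms: list) -> float:
    """Discard the single highest and lowest run time and average the rest
    (10 runs leave 8 values)."""
    if len(times_ms) < 3:
        raise ValueError("need at least three measurements")
    remaining = sorted(times_ms)[1:-1]
    return mean(remaining)


runs = [120, 118, 125, 119, 300, 121, 117, 122, 90, 120]  # hypothetical ms values
print(trimmed_average(runs))  # 120.25
```

The outliers 300 ms and 90 ms do not influence the result, whereas a plain mean of all ten runs would be 135.2 ms.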

6.3. Evaluation Results

The first step of both the reimplemented system and the optimized reimplemented system is to saturate the mapping. The newly generated saturated mapping is equivalent to the saturated mapping provided with the benchmark, which indicates that the mapping saturation for the ontological term subClassOf is implemented correctly.

Table 41 gives the average query execution time for each combination of depth and selectivity for UltrawrapOBDA, the reimplemented system, the optimized reimplemented system and Ontop. For instance, D5_20 in the first column means that the depth of the ontology was 5 and the selectivity was 20.
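Mapping saturation for subClassOf, as checked above, can be sketched as follows: every rule (ϕ, t, C) is copied once for each transitive superclass D of C with D as the new class. This is a simplified sketch, not the thesis implementation; the `Rule` tuple, the helper names and the example rule are assumptions.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    expression: str  # relational algebra expression, kept as opaque text
    template: str    # subject IRI template
    cls: str         # class used as object of the type triple


def superclasses(cls: str, parents: dict) -> set:
    """All transitive superclasses of cls; parents maps a class to its direct superclasses."""
    seen: set = set()
    stack = list(parents.get(cls, ()))
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, ()))
    return seen


def saturate(rules: list, parents: dict) -> list:
    """For each rule (phi, t, C) add (phi, t, D) for every superclass D of C."""
    saturated = list(rules)
    for rule in rules:
        for sup in sorted(superclasses(rule.cls, parents)):
            saturated.append(rule._replace(cls=sup))
    return saturated


parents = {"MasterStudent": ["Student"], "Student": ["Person"]}
rules = [Rule("MASTERSTUDENT", "http://www.uni.com/student/ID", "MasterStudent")]
for r in saturate(rules, parents):
    print(r.cls)  # MasterStudent, then Person, then Student
```

Collecting the superclasses in a set means each superclass is added once per rule, even if it is reachable along several subClassOf paths.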

The results show that the execution times of the reimplementation and the optimized reimplementation are lower than those of UltrawrapOBDA and Ontop in all cases. Furthermore, the execution times of the reimplementation and the optimized reimplementation are very similar to each other.

Each OBDA system has its highest average execution time per query for the ontology with depth 2 and selectivity 2. UltrawrapOBDA needs on average 9795 ms to execute a query, the reimplementation 1798 ms, the optimized reimplementation 1790 ms and Ontop 2206 ms. Relative to the mean of the two reimplementations, UltrawrapOBDA thus needs about 5.46 times as long, whereas Ontop needs only about 1.23 times as long. UltrawrapOBDA and both reimplementations achieve their shortest execution times for the ontology with depth 2 and selectivity 100; Ontop's shortest time (677 ms) occurs for depth 2 and selectivity 50. For depth 2 and selectivity 100, the average execution time per query is 550 ms for UltrawrapOBDA and 721 ms for Ontop, whereas the reimplemented system needs only 121 ms and the optimized reimplementation about 111 ms. Relative to the mean of the two reimplementations, UltrawrapOBDA therefore needs about 4.74 times as long and Ontop about 6.22 times as long.
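The slowdown factors can be recomputed from Table 41. The following Python sketch assumes that the baseline is the mean of the reimplementation and the optimized reimplementation; the dictionary keys and the function name `slowdown` are illustrative choices.

```python
# Average execution times in ms, taken from table 41.
TIMES = {
    "D2_2":    {"ultrawrap": 9795, "reimpl": 1798, "optimized": 1790, "ontop": 2206},
    "D2_100":  {"ultrawrap": 550,  "reimpl": 121,  "optimized": 111,  "ontop": 721},
    "Average": {"ultrawrap": 2156, "reimpl": 687,  "optimized": 676,  "ontop": 1274},
}


def slowdown(row: dict, system: str) -> float:
    """How many times as long `system` takes as the newly implemented systems,
    using the mean of both reimplementations as baseline."""
    baseline = (row["reimpl"] + row["optimized"]) / 2
    return row[system] / baseline


for setting, row in TIMES.items():
    print(setting,
          f"UltrawrapOBDA x{slowdown(row, 'ultrawrap'):.2f}",
          f"Ontop x{slowdown(row, 'ontop'):.2f}")
```

Using only the unoptimized reimplementation as baseline gives slightly different factors, for example 2156 / 687 ≈ 3.14 for the overall average of UltrawrapOBDA.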

∗ Since UltrawrapOBDA is not publicly available, the results for this system are taken from [1]; UltrawrapOBDA was therefore evaluated in a different experimental setting.


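| Depth and selectivity | UltrawrapOBDA∗ | Reimplementation | Optimized reimplementation | Ontop |
|---|---|---|---|---|
| D6_100 | 903 | 412 | 402 | 1395 |
| D6_50 | 960 | 443 | 446 | 1145 |
| D6_20 | 952 | 469 | 453 | 1024 |
| D6_10 | 1150 | 589 | 580 | 1073 |
| D6_5 | 2237 | 834 | 790 | 1222 |
| D6_2 | 3528 | 1770 | 1736 | 2031 |
| D5_100 | 876 | 400 | 371 | 1277 |
| D5_50 | 904 | 392 | 379 | 1046 |
| D5_20 | 1094 | 475 | 470 | 1010 |
| D5_10 | 1087 | 543 | 550 | 1044 |
| D5_5 | 2196 | 858 | 846 | 1297 |
| D5_2 | 3934 | 1718 | 1709 | 2111 |
| D4_100 | 1148 | 426 | 361 | 1293 |
| D4_50 | 1205 | 428 | 411 | 1090 |
| D4_20 | 1259 | 472 | 449 | 978 |
| D4_10 | 1525 | 535 | 532 | 1004 |
| D4_5 | 2973 | 874 | 872 | 1328 |
| D4_2 | 4990 | 1789 | 1759 | 2027 |
| D3_100 | 1040 | 305 | 300 | 1116 |
| D3_50 | 1253 | 322 | 313 | 979 |
| D3_20 | 1417 | 365 | 362 | 879 |
| D3_10 | 1793 | 453 | 452 | 998 |
| D3_5 | 2537 | 689 | 678 | 1233 |
| D3_2 | 6697 | 1746 | 1753 | 2183 |
| D2_100 | 550 | 121 | 111 | 721 |
| D2_50 | 640 | 133 | 142 | 677 |
| D2_20 | 995 | 200 | 198 | 710 |
| D2_10 | 1955 | 362 | 359 | 881 |
| D2_5 | 3835 | 718 | 713 | 1260 |
| D2_2 | 9795 | 1798 | 1790 | 2206 |
| Average | 2156 | 687 | 676 | 1274 |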

Table 41: Execution times of SPARQL queries on OBDA systems in ms.


Generally, UltrawrapOBDA seems to be faster than Ontop for higher selectivity values such as 100 or 50, whereas Ontop seems to be faster than UltrawrapOBDA for lower values such as 2 or 5. Since UltrawrapOBDA was evaluated with a different experimental setup and that evaluation is 6 years old, comparisons with UltrawrapOBDA must be treated with care. Figure 8 shows the average execution time per query over all depths and selectivities.

Figure 8: Average execution time per query.

The chart shows that UltrawrapOBDA has the highest average execution time, with 2156 ms per query; relative to the mean of the two reimplementations (681.5 ms), it thus needs on average about 3.16 times as long as the newly implemented systems. Ontop has an average query execution time of 1274 ms; it is faster than UltrawrapOBDA but slower than the newly implemented systems, needing on average about 1.87 times as long. Note that Ontop neither creates views nor stores any additional data in the relational database; instead, it creates SQL queries that are executed directly on the unchanged relational database. The unoptimized reimplementation needs on average 687 ms and the optimized reimplementation 676 ms to execute a query. Thus, both reimplementations have comparable execution times, even though the optimized version supports exclusive superclass instances and needs less space to materialize views.

The space required to store the relational data of the Texas Benchmark without any views or materialized views is 1219 MB. Materializing views requires additional space. There is no information on how much additional space UltrawrapOBDA needs to materialize views, and Ontop does not create any additional views. Therefore, only the space required by the reimplementation and the optimized reimplementation is compared. Figure 9 presents the additional space required for materializing views.

Figure 9: Additional Space Required for Materializing Views.

The chart shows that the optimized version requires only 91 MB of additional storage space for the materialized views, whereas the unoptimized views require 205 MB. The optimization thus reduces the required space by about 114 MB, or about 55%. While the storage space was reduced, the average execution time barely changed, as shown in figure 8.

In [20] it is argued that database benchmarks tend to measure execution times but often neglect the completeness and correctness of query results. Since fast query execution is of little use if the result set is incorrect or incomplete, the virtualized graph created in the compilation phase has been compared to the results of Ontop. Executing the query SELECT * WHERE { ?s ?p ?o } returns a result set containing all triples in the OBDA system. This result was created for both Ontop and the newly implemented system. The comparison shows that each result in the result set of Ontop is contained in the result set of the new system and vice versa. However, the result set of the newly implemented system included duplicate results. These duplicates were caused by the mapping saturation in combination with a subclass relationship that did not have a tree structure. Other potential causes of duplicate results are carelessly defined mapping rules or the implemented bag semantics of SPARQL. Nevertheless, apart from duplicates, the reimplemented system and Ontop returned the same results, which suggests that both systems return complete and correct results.
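One way in which a non-tree subclass relationship can produce duplicates is sketched below in Python. The diamond-shaped hierarchy, the helper name `saturate_by_paths` and the path-by-path propagation are assumptions chosen for illustration, not a description of the thesis code: if a rule is copied along every subClassOf path, a superclass reachable via two paths receives two copies, and combining the copies with bag semantics (UNION ALL) returns each instance twice, whereas UNION removes the duplicate.

```python
import sqlite3
from collections import Counter

# Hypothetical diamond-shaped hierarchy: D is a subclass of both B and C,
# and B and C are both subclasses of A.
PARENTS = {"D": ["B", "C"], "B": ["A"], "C": ["A"]}


def saturate_by_paths(cls: str) -> list:
    """Copy a rule for cls to every superclass along every subClassOf path;
    a superclass reachable via k paths receives k copies."""
    copies = []
    for parent in PARENTS.get(cls, []):
        copies.append(parent)
        copies.extend(saturate_by_paths(parent))
    return copies


derived = Counter(saturate_by_paths("D"))
print(derived)  # A receives two copies, B and C one each

# Each copy contributes one SELECT to the view of A. With bag semantics
# (UNION ALL) an instance of D appears twice; UNION removes the duplicate.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE d_instances (S TEXT)")
con.execute("INSERT INTO d_instances VALUES ('x1')")
branches = ["SELECT S FROM d_instances"] * derived["A"]
bag = [s for (s,) in con.execute(" UNION ALL ".join(branches))]
as_set = [s for (s,) in con.execute(" UNION ".join(branches))]
print(bag, as_set)
```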


6.4. Summary of Results

The results show that the newly implemented system generally has lower execution times than UltrawrapOBDA and Ontop. However, some threats to validity must be considered. Because UltrawrapOBDA is not publicly available, the benchmark results for UltrawrapOBDA used in this thesis were taken from [1]; that evaluation was therefore conducted in a different experimental setting. Furthermore, that evaluation is 6 years old, and a new evaluation may yield different results. To obtain a more meaningful comparison, the state-of-the-art OBDA system Ontop was benchmarked against the reimplementation. However, the newly implemented OBDA system returned duplicate results and Ontop did not. The lower execution times of the new system may partly be explained by the fact that Ontop filters duplicate results and the new system does not. Another aspect is that the Texas Benchmark has only a single table in the underlying relational database, whereas real-world databases usually have multiple tables and views. Furthermore, the ontologies provided for the benchmark only use the ontological term subClassOf and therefore do not test how well OBDA systems support other ontological terms. Finally, the benchmark queries are very simple: they only test triple patterns and SPARQL projections and neglect other SPARQL features.

Nonetheless, the newly implemented system produces a correctly saturated mapping, and the result sets created by Ontop and the newly implemented system contained the same results, although the newly implemented system returned duplicates. In terms of execution time, the new OBDA system is faster than its competitors UltrawrapOBDA and Ontop. Furthermore, the reimplemented system and the optimized reimplemented system have comparable execution times, even though the optimized version needs about 55% less space to materialize views and supports exclusive superclass instances.


7. Related Work

The OBDA system reimplemented in this thesis is based on UltrawrapOBDA. The following sections present various other OBDA systems and their functionality.

7.1. Ontop

In [2] the open source OBDA system Ontop is presented. Ontop is developed by the Knowledge Representation meets Databases research group at the Free University of Bozen-Bolzano¹¹. Ontop supports various commercial and open source relational database systems, for instance Oracle, Microsoft SQL Server, PostgreSQL or MySQL. The system distinguishes between the off-line phase, in which the system is prepared, and the on-line phase, in which SPARQL queries can be issued against the system. In the off-line phase, three inputs are given to Ontop: an ontology, an instance of a relational database and a mapping. Ontop then takes three steps to enable querying the relational database with SPARQL:

1. The input ontology is given to a reasoner and all implicit triples in the ontology are inferred. Ontop supports RDFS [21] and OWL 2 QL [22] as ontology languages and hence supports more ontological terms than the reimplemented system.

2. So-called T-mappings are created from the input mapping. This means that all triples that would be inferred based on the ontology are also represented by the mapping. The mapping for the OBDA system can be defined either in a syntax designed specifically for Ontop mappings or in R2RML [19], the W3C standard language for expressing mappings from relational databases to RDF datasets. This step is similar to the saturation of mappings in UltrawrapOBDA.

3. Mapping rules are optimized to enable efficient query rewriting. There is no such step in UltrawrapOBDA and consequently none in the system presented in this thesis.

After the off-line phase has finished, SPARQL queries can be issued against Ontop in the on-line phase. Ontop supports SPARQL 1.0 queries and the OWL 2 QL entailment regime of SPARQL 1.1. The translation of SPARQL queries into SQL is realized by a system called Quest [23]. Answering a SPARQL query can be divided into three steps:

1. The SPARQL query is translated into a corresponding SQL query based on the T-mapping. Each basic graph pattern is replaced according to the mapping, a SPARQL join is translated to an SQL inner join, a SPARQL OPTIONAL clause is translated to an SQL left join, and SPARQL unions and filters are translated to SQL unions and filters, respectively.

2. The resulting unoptimized SQL query is then optimized by pushing joins inside unions, eliminating redundant self-joins and removing subqueries where possible.

3. The SQL query is executed and the results are returned as corresponding SPARQL results.

¹¹ https://ontop.inf.unibz.it/about-us/, last retrieved 17.05.19
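The operator correspondence in the first step can be illustrated with a toy translator over a nested-tuple algebra. This is a greatly simplified sketch, not Quest's or Ontop's actual translation: the tuple encoding and the function name `to_sql` are assumptions, unfolded triple patterns are given directly as SQL strings, and SPARQL unions are mapped to UNION ALL to keep SPARQL's bag semantics.

```python
def to_sql(node: tuple) -> str:
    """Translate a toy SPARQL algebra tree (nested tuples) into SQL text.

    ("bgp", sql)                     -> the SQL a triple pattern unfolds to
    ("join", left, right, cond)      -> INNER JOIN
    ("optional", left, right, cond)  -> LEFT JOIN
    ("union", left, right)           -> UNION ALL (bag semantics)
    ("filter", child, cond)          -> WHERE
    """
    op = node[0]
    if op == "bgp":
        return node[1]
    if op in ("join", "optional"):
        kind = "INNER JOIN" if op == "join" else "LEFT JOIN"
        return (f"SELECT * FROM ({to_sql(node[1])}) AS l "
                f"{kind} ({to_sql(node[2])}) AS r ON {node[3]}")
    if op == "union":
        return f"{to_sql(node[1])} UNION ALL {to_sql(node[2])}"
    if op == "filter":
        return f"SELECT * FROM ({to_sql(node[1])}) AS f WHERE {node[2]}"
    raise ValueError(f"unsupported operator: {op}")


query = ("optional",
         ("bgp", "SELECT ID AS x FROM STUDENT"),
         ("bgp", "SELECT ID AS x, NAME AS name FROM PERSON"),
         "l.x = r.x")
print(to_sql(query))
```

A real translator additionally has to map SPARQL variables to SQL columns consistently and handle unbound values; the sketch only shows the structural correspondence.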

Contrary to UltrawrapOBDA and consequently the system introduced in this thesis, Ontop does not add any views to the underlying relational database. When a SPARQL query is issued against Ontop, the query is rewritten and optimized, and the resulting SQL query is executed directly on the original relational database.

7.2. Optique

In [24] the OBDA system Optique is presented. Optique relies on Ontop for query answering. This means that Optique creates T-mappings and translates SPARQL queries to SQL queries in the same way as Ontop does. Like Ontop, Optique supports R2RML as mapping language and allows for inference based on OWL 2 QL. Optique leverages the capabilities of Ontop but claims to go beyond the OBDA paradigm by addressing four prerequisites:

1. An OBDA system needs an ontology as global view and mappings in order to query the underlying data. Since these artifacts may become very complex, they should be maintainable and comprehensible. To address this prerequisite, Optique extracts ontologies and mappings directly from the relational schema of the data. This process is based on the W3C recommendation for a direct mapping of relational data onto RDF data [25], in which a single class is created for each table in the relational schema and a single property for each column. The quality of the created ontology and mapping strongly depends on the quality of the relational schema, but the result may be a good starting point. Furthermore, the ontology matching tool LogMap [26] could be used to combine existing ontologies with the newly generated ontology. However, the authors also mention that in most cases work by ontology experts is still needed to finalize the ontology and the mapping for use in the OBDA context.

2. Users of an OBDA system cannot be expected to know SPARQL. Therefore, a user interface for query formulation as well as an interface that helps with understanding the ontology is needed. Optique provides an ontology-based user interface called Optique VQS, with which users who are not familiar with a query language like SPARQL can create SPARQL queries. The interface provides a browser to navigate through the ontology and a query generation interface that hides the actual SPARQL syntax.

3. It is claimed that existing OBDA systems do not handle big ontologies and mappings well and therefore do not scale. To address this prerequisite, the authors refer to the benchmark results presented in [24], which show that Ontop scales well.

4. Not all data is relational; geospatial, temporal and streaming data should also be taken into consideration and made available via the OBDA system. Optique addresses this prerequisite by allowing streaming data to be queried with the OBDA system. The relational data stream management system used in Optique is called Exareme [27]. Based on the stream data, the mappings in Optique create not only static virtual data but also ontology-level streams. These cannot be processed with standard SPARQL. Therefore, Optique also provides an extension of SPARQL called STARQL (streaming and temporal ontology access with a reasoning-based query language). This query language combines classical SPARQL syntax with a custom syntax for querying streaming data.
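The table-to-class and column-to-property idea behind the direct mapping mentioned in the first prerequisite can be sketched in Python. This is a heavily simplified sketch in the spirit of the W3C Direct Mapping [25], not a conforming implementation: the function name `direct_mapping`, the base IRI and the sample row are assumptions, and foreign keys and datatypes are ignored.

```python
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def direct_mapping(base: str, table: str, pk: str, rows: list) -> list:
    """Simplified table-to-class / column-to-property mapping.

    One class IRI per table, one property IRI per column and one subject IRI
    per row, built from the primary key. Foreign keys and datatypes are
    ignored, and NULL values produce no triple."""
    cls = base + quote(table)
    triples = []
    for row in rows:
        subject = f"{cls}/{quote(pk)}={quote(str(row[pk]))}"
        triples.append((subject, RDF_TYPE, cls))
        for column, value in row.items():
            if value is not None:
                triples.append((subject, f"{cls}#{quote(column)}", value))
    return triples


for t in direct_mapping("http://example.com/", "PRODUCT", "ID",
                        [{"ID": 1, "NAME": "pen", "COLOR": None}]):
    print(t)
```

The generated vocabulary simply mirrors the table and column names, which is why the quality of such an ontology depends on the quality of the relational schema.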

7.3. Mastro

In [28] the Java-based OBDA system Mastro is presented. It was developed at Sapienza University of Rome and the Free University of Bozen-Bolzano. It is claimed that Mastro supports any relational database that can be accessed with a JDBC driver; however, experiments were conducted on an IBM DB2 relational database¹² and an Oracle 10g relational database¹³. In [29] the integration of R2RML as mapping language into Mastro is presented.

According to the Mastro web page¹⁴, Mastro currently fully supports R2RML mappings. For query answering, Mastro is based on conjunctive queries, and therefore only SPARQL joins and unions are supported. Mastro fully supports OWL 2 QL ontologies, and inference is achieved by query rewriting in two steps: first, the given SPARQL query is extended according to the input ontology; second, the extended query is translated to an SQL query based on the given mappings. The resulting SQL query is then executed and the results are translated to variable bindings.
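The two rewriting steps can be illustrated for the simplest case, a query for all instances of a class under subClassOf axioms. This toy sketch is not Mastro's algorithm, which handles full OWL 2 QL and conjunctive queries: the function names `rewrite_class_query` and `unfold` and the mapping queries are hypothetical.

```python
def rewrite_class_query(cls: str, direct_subclasses: dict) -> list:
    """First step (restricted to subClassOf axioms): the query asking for all
    instances of cls becomes a union of class atoms, one for cls and one for
    each of its transitive subclasses."""
    atoms, stack = [], [cls]
    while stack:
        c = stack.pop()
        if c not in atoms:
            atoms.append(c)
            stack.extend(direct_subclasses.get(c, []))
    return atoms


def unfold(atoms: list, mapping_sql: dict) -> str:
    """Second step: replace each class atom by the SQL query of its mapping
    (atoms without a mapping contribute nothing) and union the results."""
    return " UNION ".join(mapping_sql[a] for a in atoms if a in mapping_sql)


subclass_of = {"Person": ["Student"], "Student": ["MasterStudent"]}
mapping_sql = {  # hypothetical mappings
    "Student": "SELECT ID FROM STUDENT",
    "MasterStudent": "SELECT ID FROM MASTERSTUDENT",
}
atoms = rewrite_class_query("Person", subclass_of)
print(atoms)  # ['Person', 'Student', 'MasterStudent']
print(unfold(atoms, mapping_sql))
```

In contrast to the materialized views of UltrawrapOBDA, the ontology is taken into account at query time here, so nothing has to be stored in the database.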

¹² https://www.ibm.com/analytics/db2, last retrieved 18.09.2019
¹³ https://www.oracle.com/technetwork/database/express-edition/database10gxe-459378.html, last retrieved 18.09.2019
¹⁴ http://obdasystems.com/mastro, last retrieved 19.05.19


7.4. D2RQ

Among the first OBDA systems was the open source system D2RQ¹⁵. It supports multiple relational databases such as Oracle, MySQL, PostgreSQL, SQL Server and HSQLDB. D2RQ enables querying a virtualized RDF graph with SPARQL based on the D2RQ mapping language, which describes the relationship between relational data and the desired RDF data. Similar to R2RML, D2RQ mappings are serialized in RDF. A mapping is either written manually or created automatically by mapping each table onto a new RDF class and each column of the table onto a new property.

D2RQ does not distinguish between phases and does not create any views in the relational database. Based on the mappings, SPARQL queries issued against the system are translated to SQL queries that are executed on the underlying, unchanged relational database. D2RQ does not infer any implicit knowledge based on a given ontology and therefore only allows for querying exactly the data that is given in the underlying relational database.

In [30] an approach that allows SPARQL queries to update the underlying relational data is proposed. The system D2RQ++ supports INSERT and DELETE queries and ensures that no integrity constraints of the original relational database are violated. The system introduced in this thesis does not support any update actions.

7.5. Morph-RDB

Morph-RDB [31] is an OBDA system based on R2RML mappings. In addition to the relational databases MySQL, PostgreSQL and H2, Morph-RDB also allows for querying CSV files and the column store MonetDB¹⁶ as an RDF graph. The OBDA system does not create any views in the relational database and has only one phase, in which SPARQL queries are translated to SQL queries. A set of query optimizations is used to improve the performance of the generated SQL queries; among the strategies Morph-RDB uses during query translation from SPARQL to SQL are self-join elimination, subquery elimination and left-outer-join elimination. The system does not support any inference: underlying relational data may be queried with SPARQL, but no new triples can be inferred based on the given ontology. Besides serving as an OBDA system, Morph-RDB can also be used as an R2RML processor.

This means that, based on the input R2RML mapping and the underlying database, Morph-RDB can create and store an RDF graph consisting of the triples specified by the mapping and the underlying data. However, most OBDA systems can achieve the same by simply issuing a query that retrieves all triples. Nonetheless, dedicated R2RML processors might generate RDF graphs from relational data more efficiently than OBDA systems.

¹⁵ http://d2rq.org/, last retrieved 19.05.19
¹⁶ https://www.monetdb.org/, last retrieved 18.09.2019


8. Conclusion and Future Research

Relational databases are among the most widely used kinds of databases. To include relational data in knowledge graphs, OBDA systems can be used: based on an ontology that serves as global schema and mappings from relational data onto this ontology, relational data can be queried with SPARQL.

In this thesis a formal framework for OBDA systems has been introduced. Based on this framework, the OBDA system UltrawrapOBDA has been formally defined. UltrawrapOBDA uses views and materialized views to create a virtualized graph that is queryable with SPARQL. After reimplementing this system, two optimizations have been made: i) the number of columns in views has been reduced, and consequently the space needed to store materialized views has been reduced; ii) support for instances of superclasses that are not instances of any of their subclasses has been added. In the reimplemented unoptimized system, instances of superclasses also had to be instances of at least one subclass of the superclass.

The reimplementation, the optimized reimplementation and the state-of-the-art OBDA system Ontop have been benchmarked with the Texas Benchmark, a benchmark created specifically for OBDA systems. The benchmark results show that the average query execution times of the reimplemented and the optimized system are comparable, even though the optimized system supports exclusive superclass instances and the space required to materialize views has been reduced by approximately 55%. The comparison of the execution times shows that UltrawrapOBDA needs on average about 3.16 times as long to execute queries as the newly implemented systems, and Ontop about 1.87 times as long. However, the reimplementation returns duplicate results and Ontop does not. Therefore, one aspect that could be addressed in future research is to identify which duplicates should be retained based on the bag semantics of SPARQL as described in [32] and which duplicates should be eliminated.

Currently, the implemented OBDA system supports SPARQL triple patterns, projection, joins, optionals and unions. In the future, more SPARQL features could be added to the system, such as filters, MINUS or property paths. Furthermore, once unwanted duplicate results are eliminated, the system could be extended to support aggregation functions. Another SPARQL feature that could be added is SPARQL UPDATE, to add data to the relational database via SPARQL queries. In that case, the materialized views would also have to be updated whenever the relational database is updated, so that the OBDA system stays consistent. The system developed in this thesis only supports a subset of OWL 2 QL and could be extended to fully support OWL 2 QL.

Furthermore, query execution times measured on cold and warm caches could be compared to quantify the speed-ups gained from caching. Since the Texas Benchmark evaluates only very specific features of OBDA systems, as described in section 6.4, the implemented OBDA system could be evaluated more extensively with other OBDA benchmarks such as the NPD benchmark [33], or a new benchmark could be developed that focuses on measuring execution times of diverse queries as well as on testing the completeness and correctness of results.


Acknowledgments

First and foremost, I want to thank my advisor Daniel Janke, not only for the support during this thesis but for all the things I have learned during my master's program. Secondly, I want to thank Martin Leinberger, without whom this thesis would not have been possible. With Daniel and Martin, each meeting had the perfect balance of productive discussions and laughs.

I also want to thank Steffen Staab for his guidance during this thesis and for all the opportunities I had during my master's program.

Next, I want to thank Frederik Rüther, Nick Theisen and Thies Möhlenhof for proof-reading this thesis and for every beer we had together during our studies.

Finally, I want to thank my family and my girlfriend for their constant support during this thesis and throughout my studies.


References

[1] J. F. Sequeda, Integrating Relational Databases with the Semantic Web. PhD thesis, University of Texas at Austin, May 2015.

[2] D. Calvanese, B. Cogrel, S. Komla-Ebri, R. Kontchakov, D. Lanti, M. Rezk, M. Rodriguez-Muro, and G. Xiao, “Ontop: Answering SPARQL queries over relational databases,” Semantic Web, vol. 8, Feb. 2016.

[3] J. F. Sequeda, M. Arenas, and D. P. Miranker, “OBDA: Query rewriting or materialization? In practice, both!,” in The Semantic Web – ISWC 2014 (P. Mika, T. Tudorache, A. Bernstein, C. Welty, C. Knoblock, D. Vrandečić, P. Groth, N. Noy, K. Janowicz, and C. Goble, eds.), (Cham), pp. 535–551, Springer International Publishing, 2014.

[4] M. Lanthaler, D. Wood, and R. Cyganiak, “RDF 1.1 concepts and abstract syntax,” W3C recommendation, W3C, Feb. 2014. http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/.

[5] N. Guarino, D. Oberle, and S. Staab, What Is an Ontology?, pp. 1–17. May 2009.

[6] J. Weaver and J. A. Hendler, “Parallel materialization of the finite RDFS closure for hundreds of millions of triples,” in The Semantic Web - ISWC 2009 (A. Bernstein, D. R. Karger, T. Heath, L. Feigenbaum, D. Maynard, E. Motta, and K. Thirunarayan, eds.), (Berlin, Heidelberg), pp. 682–697, Springer Berlin Heidelberg, 2009.

[7] S. Harris and A. Seaborne, “SPARQL 1.1 query language,” W3C recommendation, W3C, Mar. 2013. http://www.w3.org/TR/2013/REC-sparql11-query-20130321/.

[8] J. Pérez, M. Arenas, and C. Gutierrez, “Semantics and complexity of SPARQL,” in The Semantic Web - ISWC 2006 (I. Cruz, S. Decker, D. Allemang, C. Preist, D. Schwabe, P. Mika, M. Uschold, and L. M. Aroyo, eds.), (Berlin, Heidelberg), pp. 30–43, Springer Berlin Heidelberg, 2006.

[9] B. He, M. Patel, Z. Zhang, and K. C.-C. Chang, “Accessing the deep web,” Commun. ACM, vol. 50, pp. 94–101, May 2007.

[10] R. Elmasri and S. Navathe, Fundamentals of Database Systems, ch. 3, pp. 59–85. USA: Addison-Wesley Publishing Company, 6th ed., 2010.

[11] S. Ceri and G. Gottlob, “Translating SQL into relational algebra: Optimization, semantics, and equivalence of SQL queries,” IEEE Transactions on Software Engineering, vol. SE-11, pp. 324–345, April 1985.


[12] G. Xiao, D. Calvanese, R. Kontchakov, D. Lembo, A. Poggi, R. Rosati, and M. Zakharyaschev, “Ontology-based data access: A survey,” in Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pp. 5511–5519, International Joint Conferences on Artificial Intelligence Organization, July 2018.

[13] A. Chebotko, S. Lu, and F. Fotouhi, “Semantics preserving SPARQL-to-SQL translation,” Data & Knowledge Engineering, vol. 68, pp. 973–1000, Oct. 2009.

[14] R. Cyganiak, “A relational algebra for SPARQL,” Jan. 2005.

[15] U. S. Chakravarthy, J. Grant, and J. Minker, “Logic-based approach to semantic query optimization,” ACM Trans. Database Syst., vol. 15, pp. 162–207, June 1990.

[16] Q. Cheng, J. Gryz, F. Koo, T. Y. C. Leung, L. Liu, X. Qian, and K. B. Schiefer,“Implementation of two semantic query optimization techniques in db2 universaldatabase,” in Proceedings of the 25th International Conference on Very LargeData Bases, VLDB ’99, (San Francisco, CA, USA), pp. 687–698, Morgan Kauf-mann Publishers Inc., 1999.

[17] S. T. Shenoy and Z. M. Ozsoyoglu, “A system for semantic query optimization,”in Proceedings of the 1987 ACM SIGMOD International Conference on Manage-ment of Data, SIGMOD ’87, (New York, NY, USA), pp. 181–195, ACM, 1987.

[18] D. J. DeWitt, “The wisconsin benchmark: Past, present, and future,” in TheBenchmark Handbook, 1991.

[19] S. Sundara, S. Das, and R. Cyganiak, “R2RML: RDB to RDFmapping language,” W3C recommendation, W3C, Sept. 2012.http://www.w3.org/TR/2012/REC-r2rml-20120927/.

[20] A. Skubella, D. Janke, and S. Staab, “Beseppi: Semantic-based benchmarking ofproperty path implementations,” in The Semantic Web (P. Hitzler, M. Fernández,K. Janowicz, A. Zaveri, A. J. Gray, V. Lopez, A. Haller, and K. Hammar, eds.),(Cham), pp. 475–490, Springer International Publishing, 2019.

[21] D. Brickley and R. Guha, “RDF schema 1.1,” W3C recommendation, W3C, Feb.2014. http://www.w3.org/TR/2014/REC-rdf-schema-20140225/.

[22] I. Horrocks, B. C. Grau, Z. Wu, A. Fokoue, and B. Motik, “OWL 2web ontology language profiles,” W3C recommendation, W3C, Oct. 2009.http://www.w3.org/TR/2009/REC-owl2-profiles-20091027/.

[23] M. Rodríguez-Muro and D. Calvanese, “Quest, a system for ontology based dataaccess,” CEUR Workshop Proceedings, vol. 849, 01 2012.

[24] M. Giese, A. Soylu, G. Vega-Gorgojo, A. Waaler, P. Haase, E. Jiménez-Ruiz, D. Lanti, M. Rezk, G. Xiao, O. Ozcep, and R. Rosati, “Optique: Zooming in on big data,” Computer, vol. 48, pp. 60–67, Mar. 2015.

[25] A. Bertails, E. Prud’hommeaux, M. Arenas, and J. Sequeda, “A direct mapping of relational data to RDF,” W3C recommendation, W3C, Sept. 2012. http://www.w3.org/TR/2012/REC-rdb-direct-mapping-20120927/.

[26] E. Jiménez-Ruiz and B. Cuenca Grau, “LogMap: Logic-based and scalable ontology matching,” in The Semantic Web – ISWC 2011 (L. Aroyo, C. Welty, H. Alani, J. Taylor, A. Bernstein, L. Kagal, N. Noy, and E. Blomqvist, eds.), (Berlin, Heidelberg), pp. 273–288, Springer Berlin Heidelberg, 2011.

[27] H. Kllapi, E. Sitaridi, M. M. Tsangaris, and Y. Ioannidis, “Schedule optimization for data processing flows on the cloud,” in Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, SIGMOD ’11, (New York, NY, USA), pp. 289–300, ACM, 2011.

[28] D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, A. Poggi, M. Rodriguez-Muro, R. Rosati, M. Ruzzi, and D. F. Savo, “The MASTRO system for ontology-based data access,” Semant. Web, vol. 2, pp. 43–53, Jan. 2011.

[29] M. Namici, “R2RML mappings in OBDA systems: Enabling comparison among OBDA tools,” CoRR, vol. abs/1804.01405, 2018.

[30] V. Eisenberg and Y. Kanza, “D2RQ/Update: Updating relational data via virtual RDF,” WWW ’12 – Proceedings of the 21st Annual Conference on World Wide Web Companion, Apr. 2012.

[31] F. Priyatna, O. Corcho, and J. Sequeda, “Formalisation and experiences of R2RML-based SPARQL to SQL query translation using Morph,” pp. 479–490, Apr. 2014.

[32] C. Nikolaou, E. V. Kostylev, G. Konstantinidis, M. Kaminski, B. C. Grau, and I. Horrocks, “Foundations of ontology-based data access under bag semantics,” Artif. Intell., vol. 274, pp. 91–132, 2019.

[33] D. Lanti, M. Rezk, G. Xiao, and D. Calvanese, “The NPD benchmark: Reality check for OBDA systems,” in Proceedings of the 18th International Conference on Extending Database Technology (EDBT), pp. 617–628, 2015.

A. Overview of symbols

Symbol                      Meaning
I                           The set of all IRIs.
B                           The set of all blank nodes.
L                           The set of all literals.
IBL                         I ∪ B ∪ L.
tr                          An RDF triple.
G                           A set of RDF triples, called an RDF graph.
T_ontological               The set of ontological terms.
O                           An ontology.
τ^G_tr                      The evaluation of the ontological triple tr over the RDF graph G.
V                           The set of possible variables in a SPARQL query.
tp                          A triple pattern, tp ∈ (IBL ∪ V) × (I ∪ V) × (IBL ∪ V).
TP                          The set of triple patterns.
P                           A graph pattern.
var(P)                      The set of variables in P.
µ                           A variable binding.
dom(µ)                      Function returning the domain of µ.
µ(tp)                       The triple obtained by replacing all variables in tp according to µ.
Ω                           A set of variable bindings.
Q                           A SPARQL query.
D                           A domain.
A                           An attribute name.
R(A1, A2, ..., An)          A relation schema with the attributes A1, A2, ..., An.
dom(Ai)                     Returns the domain of the attribute Ai.
att(R)                      Returns the set of attributes in R.
r = {tu1, tu2, ..., tun}    A relation.
tu = ⟨v1, v2, ..., vn⟩      A tuple in a relation with the values v1, v2, ..., vn.
tu[Ai]                      Returns the i-th value vi of the tuple tu.
S = {R1, R2, ..., Rn}       A relational schema.
s = {r1, r2, ..., rn}       An instance of the relational schema S.
ϕ                           A relational algebra expression.
σ_cond(ϕ)                   Selection of the tuples in ϕ that satisfy the condition cond.
π_{A1,...,An}(ϕ)            The projection of ϕ onto the attributes A1, ..., An.
ρ_{A1→A2}(ϕ)                Renaming of the attribute A1 to A2 in ϕ.
ϕ1 ⊎ ϕ2                     The outer union of ϕ1 and ϕ2.
ϕ1 ∖ ϕ2                     The difference of ϕ1 and ϕ2.
ϕ1 × ϕ2                     The cross join of ϕ1 and ϕ2.
ϕ1 ⋈_cond ϕ2                The theta join of ϕ1 and ϕ2 with the condition cond.
ϕ1 ⟕_cond ϕ2                The left outer join of ϕ1 and ϕ2 with the condition cond.
θ                           A mapping template.
ϕ (θ1, iri, θ2)             A mapping rule.
(S, M, O)                   An OBDA specification.
((S, M, O), s)              An OBDA instance.
sat(M, O)                   Function that saturates the mapping M based on the ontology O.
extend(Q, O)                Function that extends the SPARQL query Q based on the ontology O.
transform(r)                Function that transforms a relation r into variable bindings.
rewrite(Q, M)               Function that rewrites the SPARQL query Q into an SQL query based on the mapping M.
(s, p, o) : ρ1 ρ2           Inference rule with the RDF triple (s, p, o) and the mapping rules ρ1 and ρ2.
trans(P)                    Function that translates the graph pattern P into a relational algebra expression.
M_C                         Set of mapping rules that create the exclusive view for the class C.

Table 42: Overview of the symbols introduced in this thesis.
