Semantic Web Query Processing with Relational Databases
Artem Chebotko
Department of Computer Science, Wayne State University
1/23/2007 2
Outline
The Semantic Web
RDF
SPARQL
Relational Storage of RDF data
SPARQL-to-SQL Translation
Relational Nested Optional Join
My Web page with Semantics
<foaf:Person rdf:about="http://www.cs.wayne.edu/~artem/ID">
<foaf:name>Artem Chebotko</foaf:name>
<foaf:homepage rdf:resource="http://www.cs.wayne.edu/~artem" />
<foaf:img rdf:resource="http://www.cs.wayne.edu/~artem/main/welcome/welcome.jpg" />
<foaf:workplaceHomepage rdf:resource="http://www.cs.wayne.edu"/>
</foaf:Person>
The Semantic Web
A Web of data (vs. a Web of documents)
… machine-processable/readable data
Framework for integration and combination of data from various sources
Data reuse across application, organization, and community boundaries
RDF
RDF (Resource Description Framework) provides a common framework for representing resources and the relations among them. Anything can be a resource (e.g., a person or a file).
RDF provides a data model and a syntax
<foaf:Person rdf:about="http://www.cs.wayne.edu/~artem/ID">
<foaf:name>Artem Chebotko</foaf:name>
<foaf:homepage rdf:resource="http://www.cs.wayne.edu/~artem" />
<foaf:img rdf:resource="http://www.cs.wayne.edu/~artem/main/welcome/welcome.jpg" />
<foaf:workplaceHomepage rdf:resource="http://www.cs.wayne.edu"/>
</foaf:Person>
RDF Model
An RDF statement is a triple that consists of a subject, a predicate, and an object.
Namespace prefix: foaf="http://xmlns.com/foaf/0.1/"
<foaf:Person rdf:about="http://www.cs.wayne.edu/~artem/ID">
<foaf:name>Artem Chebotko</foaf:name>
<foaf:homepage rdf:resource="http://www.cs.wayne.edu/~artem" />
<foaf:img rdf:resource="http://www.cs.wayne.edu/~artem/main/welcome/welcome.jpg" />
<foaf:workplaceHomepage rdf:resource="http://www.cs.wayne.edu"/>
</foaf:Person>
RDF Model
RDF’s graph model: RDF models statements as nodes and edges in a graph.
[Figure: RDF graph. The node http://www.cs.wayne.edu/~artem/ID has four outgoing edges: foaf:name to the literal "Artem Chebotko", foaf:homepage to http://www.cs.wayne.edu/~artem, foaf:img to http://www.cs.wayne.edu/~artem/main/welcome/welcome.jpg, and foaf:workplaceHomepage to http://www.cs.wayne.edu]
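As a rough illustration of the graph model, the four statements of the FOAF example can be held as (subject, predicate, object) tuples, with the outgoing edges of a node collected by a simple scan. The names ME, TRIPLES, and outgoing_edges are made up for this sketch and do not come from any RDF library.

```python
# The four statements of the FOAF example as (subject, predicate, object) triples.
# ME, TRIPLES and outgoing_edges are illustrative names, not a library API.
ME = "http://www.cs.wayne.edu/~artem/ID"
TRIPLES = [
    (ME, "foaf:name", "Artem Chebotko"),
    (ME, "foaf:homepage", "http://www.cs.wayne.edu/~artem"),
    (ME, "foaf:img", "http://www.cs.wayne.edu/~artem/main/welcome/welcome.jpg"),
    (ME, "foaf:workplaceHomepage", "http://www.cs.wayne.edu"),
]


def outgoing_edges(triples, subject):
    """Return {predicate: [objects]} for the edges leaving one subject node."""
    edges = {}
    for s, p, o in triples:
        if s == subject:
            edges.setdefault(p, []).append(o)
    return edges


edges = outgoing_edges(TRIPLES, ME)
print(edges["foaf:name"])
```

Each tuple is one edge of the graph: the subject and object are nodes and the predicate labels the edge between them.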
SPARQL
SPARQL is an RDF query language
Graph pattern matching
Basic graph patterns, optional graph patterns, etc.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?url
FROM <my-foaf-data.rdf>
WHERE {
  ?someone foaf:name "Artem Chebotko" .
  ?someone foaf:homepage ?url .
}
Query 1: Find the homepage URL of Artem Chebotko
Result 1: ?url is bound to the value “http://www.cs.wayne.edu/~artem”
?url
http://www.cs.wayne.edu/~artem
SPARQL
Query 2: Find both the homepage and weblog of Artem Chebotko
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?url ?log
FROM <my-foaf-data.rdf>
WHERE {
  ?someone foaf:name "Artem Chebotko" .
  ?someone foaf:homepage ?url .
  ?someone foaf:weblog ?log .
}
Result 2: no solution. The data contains no foaf:weblog triple, so the whole pattern fails and neither ?url nor ?log is bound
?url ?log
SPARQL
Query 3: Find (1) the homepage of Artem Chebotko and
(2) his weblog if this information is available
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?url ?log
FROM <my-foaf-data.rdf>
WHERE {
  ?someone foaf:name "Artem Chebotko" .
  ?someone foaf:homepage ?url .
  OPTIONAL { ?someone foaf:weblog ?log . }
}
Result 3: ?url is bound to “http://www.cs.wayne.edu/~artem” and ?log is unbound
?url ?log
http://www.cs.wayne.edu/~artem
SPARQL
Basic semantics of OPTIONAL patterns
The evaluation of an OPTIONAL clause is not obligated to succeed; if it fails, no value is returned for the variables in the SELECT clause that it leaves unbound.
Semantics of shared variables
In general, shared variables must be bound to the same values. A variable can be shared among subjects, among predicates, among objects, and across these positions.
More complicated semantics follows …
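The two rules above can be sketched with a toy matcher over in-memory triples. This is only an illustration under simplifying assumptions (terms are plain strings, variables start with "?", no FILTER or UNION); match_pattern, match_bgp, and optional are hypothetical helper names, not part of SPARQL or of any library.

```python
# Toy evaluation of basic graph patterns and a single OPTIONAL clause.
# All function names here are illustrative assumptions.
ME = "http://www.cs.wayne.edu/~artem/ID"
TRIPLES = [
    (ME, "foaf:name", "Artem Chebotko"),
    (ME, "foaf:homepage", "http://www.cs.wayne.edu/~artem"),
    (ME, "foaf:workplaceHomepage", "http://www.cs.wayne.edu"),
]


def match_pattern(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A shared variable must keep the value it is already bound to.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def match_bgp(patterns, triples, binding=None):
    """Return every binding that satisfies all triple patterns."""
    solutions = [dict(binding or {})]
    for pattern in patterns:
        extended = []
        for solution in solutions:
            for triple in triples:
                candidate = match_pattern(pattern, triple, solution)
                if candidate is not None:
                    extended.append(candidate)
        solutions = extended
    return solutions


def optional(solutions, patterns, triples):
    """Extend each solution where possible; otherwise keep it unchanged."""
    out = []
    for solution in solutions:
        extended = match_bgp(patterns, triples, solution)
        out.extend(extended if extended else [solution])
    return out


required = [("?someone", "foaf:name", "Artem Chebotko"),
            ("?someone", "foaf:homepage", "?url")]
weblog = [("?someone", "foaf:weblog", "?log")]

query2 = match_bgp(required + weblog, TRIPLES)                 # weblog required
query3 = optional(match_bgp(required, TRIPLES), weblog, TRIPLES)  # weblog optional
print(query2, query3)
```

With the weblog pattern required (Query 2) there is no solution at all, while with OPTIONAL (Query 3) the solution survives and simply has no entry for ?log.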
SPARQL
Semantics of parallel OPTIONAL patterns
While the failure of the evaluation of an OPTIONAL clause does not block the evaluation of a following parallel OPTIONAL clause, the success of the evaluation of an OPTIONAL clause obligates the same variables in the following parallel OPTIONAL clauses to be bound to the same values.
SPARQL
Query 4: Find (1) the homepage of Artem Chebotko and
(2) his weblog if this information is available
(3) his workplace homepage if this information is available
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?url ?log ?work
FROM <my-foaf-data.rdf>
WHERE {
  ?someone foaf:name "Artem Chebotko" .
  ?someone foaf:homepage ?url .
  OPTIONAL { ?someone foaf:weblog ?log . }
  OPTIONAL { ?someone foaf:workplaceHomepage ?work . }
}
Result 4: ?url is bound to “http://www.cs.wayne.edu/~artem”, ?log is unbound, and ?work is bound to “http://www.cs.wayne.edu”
?url ?log ?work
http://www.cs.wayne.edu/~artem (unbound) http://www.cs.wayne.edu
What if …
OPTIONAL { ?someone foaf:workplaceHomepage ?log .}
SPARQL
Semantics of nested OPTIONAL patterns
Before an OPTIONAL clause is evaluated, all containing basic graph patterns or OPTIONAL clauses must have succeeded.
SPARQL
Query 5: Find (1) the homepage of Artem Chebotko and
(2) his weblog if this information is available
(3) his workplace homepage if this information is available and the weblog is available
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?url ?log ?work
FROM <my-foaf-data.rdf>
WHERE {
  ?someone foaf:name "Artem Chebotko" .
  ?someone foaf:homepage ?url .
  OPTIONAL {
    ?someone foaf:weblog ?log .
    OPTIONAL { ?someone foaf:workplaceHomepage ?work . }
  }
}
Result 5: ?url is bound to “http://www.cs.wayne.edu/~artem”; ?log and ?work are unbound (the nested OPTIONAL is not evaluated because the weblog pattern failed)
?url ?log ?work
http://www.cs.wayne.edu/~artem
Relational Storage of RDF data
The increasing amount of RDF data on the Web highlights the need for efficient and effective management of that data.
Using relational database technology as a basis for storing and querying RDF data is a reasonable choice as this technology is well understood and known to have good performance.
Relational Storage of RDF data
The simplest one: a single table, Triples
More complicated (and more efficient) storage schemas are possible
subject | predicate | object
http://www.cs.wayne.edu/~artem/ID | foaf:name | Artem Chebotko
http://www.cs.wayne.edu/~artem/ID | foaf:homepage | http://www.cs.wayne.edu/~artem
http://www.cs.wayne.edu/~artem/ID | foaf:img | http://www.cs.wayne.edu/~artem/main/welcome/welcome.jpg
http://www.cs.wayne.edu/~artem/ID | foaf:workplaceHomepage | http://www.cs.wayne.edu
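A minimal sketch of this single-table schema, using Python's built-in sqlite3 module as a stand-in for the relational database. The TEXT column types and the in-memory database are assumptions made for the example; the slides do not prescribe them.

```python
import sqlite3

# In-memory database used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Triples (subject TEXT, predicate TEXT, object TEXT)")

ME = "http://www.cs.wayne.edu/~artem/ID"
conn.executemany(
    "INSERT INTO Triples VALUES (?, ?, ?)",
    [
        (ME, "foaf:name", "Artem Chebotko"),
        (ME, "foaf:homepage", "http://www.cs.wayne.edu/~artem"),
        (ME, "foaf:img", "http://www.cs.wayne.edu/~artem/main/welcome/welcome.jpg"),
        (ME, "foaf:workplaceHomepage", "http://www.cs.wayne.edu"),
    ],
)

# Look up one property of one resource: a selection on subject and predicate.
rows = conn.execute(
    "SELECT object FROM Triples WHERE subject = ? AND predicate = ?",
    (ME, "foaf:homepage"),
).fetchall()
print(rows)
```

Every RDF statement becomes one row, so any graph fits the schema, at the cost of self-joins on Triples for every additional triple pattern in a query.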
SPARQL-to-SQL Translation
Problem: Relational databases “know” SQL, but not SPARQL
Solution: translate SPARQL queries into equivalent SQL queries in order to access RDF data stored in a relational database
Algorithm BGPtoSQL to translate a SPARQL basic graph pattern to its SQL equivalent
Algorithm SPARQLtoSQL to translate SPARQL queries with arbitrarily complex optional graph patterns
BGPtoSQL
Basic idea:
Step 1:
Assign a unique table alias to every triple pattern, e.g., t1 and t2
Construct the FROM clause to contain all the table aliases
WHERE { ?someone foaf:name "Artem Chebotko" . ?someone foaf:homepage ?url . }
FROM Triples t1, Triples t2
BGPtoSQL
Step 2: Construct the SELECT clause to contain every relational attribute that corresponds to a distinct variable
WHERE { ?someone foaf:name "Artem Chebotko" . ?someone foaf:homepage ?url . }
SELECT t1.subject AS someone, t2.object AS url
FROM Triples t1, Triples t2
BGPtoSQL
Step 3: Construct the WHERE clause to restrict attribute values to the corresponding URIs and literals
WHERE { ?someone foaf:name "Artem Chebotko" . ?someone foaf:homepage ?url . }
SELECT t1.subject AS someone, t2.object AS url
FROM Triples t1, Triples t2
WHERE t1.predicate = 'foaf:name' AND t1.object = 'Artem Chebotko' AND
t2.predicate = 'foaf:homepage'
BGPtoSQL
Step 4: Create an inverted list for variables
?someone: t1.subject, t2.subject
?url: t2.object
Finish the WHERE clause: attributes that correspond to shared variables must have the same values
WHERE { ?someone foaf:name "Artem Chebotko" . ?someone foaf:homepage ?url . }
SELECT t1.subject AS someone, t2.object AS url
FROM Triples t1, Triples t2
WHERE t1.predicate = 'foaf:name' AND t1.object = 'Artem Chebotko' AND
t2.predicate = 'foaf:homepage' AND t1.subject = t2.subject
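The four steps can be sketched as a short Python function. This is a reading of the slides, not the authors' implementation: bgp_to_sql is a hypothetical name, triple patterns are assumed to be tuples of strings with variables starting with "?", variable names are assumed to be valid SQL identifiers, the pattern is assumed to contain at least one variable, and constants are passed as query parameters instead of being inlined.

```python
import sqlite3

COLUMNS = ("subject", "predicate", "object")


def bgp_to_sql(patterns):
    """Translate a list of (s, p, o) triple patterns into (sql, params)."""
    # Step 1: one table alias per triple pattern.
    aliases = [f"t{i}" for i in range(1, len(patterns) + 1)]
    inverted = {}               # Step 4: variable -> attributes it occurs in
    conditions, params = [], []
    for alias, pattern in zip(aliases, patterns):
        for column, term in zip(COLUMNS, pattern):
            attribute = f"{alias}.{column}"
            if term.startswith("?"):
                inverted.setdefault(term[1:], []).append(attribute)
            else:
                # Step 3: restrict the attribute to the URI or literal.
                conditions.append(f"{attribute} = ?")
                params.append(term)
    # Step 4: attributes of a shared variable must have the same value.
    for attributes in inverted.values():
        conditions += [f"{attributes[0]} = {other}" for other in attributes[1:]]
    # Step 2: one output column per distinct variable.
    select = ", ".join(f"{attrs[0]} AS {var}" for var, attrs in inverted.items())
    sql = f"SELECT {select} FROM " + ", ".join(f"Triples {a}" for a in aliases)
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


sql, params = bgp_to_sql([
    ("?someone", "foaf:name", "Artem Chebotko"),
    ("?someone", "foaf:homepage", "?url"),
])

# Run the generated query against a small Triples table.
ME = "http://www.cs.wayne.edu/~artem/ID"
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany("INSERT INTO Triples VALUES (?, ?, ?)", [
    (ME, "foaf:name", "Artem Chebotko"),
    (ME, "foaf:homepage", "http://www.cs.wayne.edu/~artem"),
])
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

Apart from the "?" placeholders, the generated statement has the same SELECT, FROM, and WHERE structure as the query built up over the four steps.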
SPARQLtoSQL
Step 1: Translate all BGPs to SQL with BGPtoSQL, e.g., into q1, q2, q3, q4
SELECT ?url ?log ?topic
WHERE {
  ?someone foaf:name "Artem Chebotko" . ?someone foaf:homepage ?url .
  OPTIONAL { ?someone foaf:weblog ?log .
    OPTIONAL { ?url foaf:topic ?topic . }
  }
  OPTIONAL { ?someone <http://www.example.org/blog> ?log . }
}
SPARQLtoSQL
Step 2: Join the 'relations' (q1, q2, q3, q4) with LEFT OUTER JOIN, in the order in which their corresponding graph patterns appear in the query
SELECT ?url ?log ?topic
WHERE {
  ?someone foaf:name "Artem Chebotko" . ?someone foaf:homepage ?url .
  OPTIONAL { ?someone foaf:weblog ?log .
    OPTIONAL { ?url foaf:topic ?topic . }
  }
  OPTIONAL { ?someone <http://www.example.org/blog> ?log . }
}
Q = SELECT r1.someone AS someone, r1.url AS url, r2.log AS log
FROM (q1) r1 LEFT OUTER JOIN (q2) r2 ON (r1.someone = r2.someone)
SPARQLtoSQL
SELECT ?url ?log ?topic
WHERE {
  ?someone foaf:name "Artem Chebotko" . ?someone foaf:homepage ?url .
  OPTIONAL { ?someone foaf:weblog ?log .
    OPTIONAL { ?url foaf:topic ?topic . }
  }
  OPTIONAL { ?someone <http://www.example.org/blog> ?log . }
}
Q = SELECT r11.someone AS someone, r11.url AS url, r11.log AS log, r22.topic AS topic
FROM (Q) r11 LEFT OUTER JOIN (q3) r22 ON (r11.url = r22.url AND r11.log IS NOT NULL)
SPARQLtoSQL
SELECT ?url ?log ?topic
WHERE {
  ?someone foaf:name "Artem Chebotko" . ?someone foaf:homepage ?url .
  OPTIONAL { ?someone foaf:weblog ?log .
    OPTIONAL { ?url foaf:topic ?topic . }
  }
  OPTIONAL { ?someone <http://www.example.org/blog> ?log . }
}
Q = SELECT r111.someone AS someone, r111.url AS url,
COALESCE(r111.log, r222.log) AS log, r111.topic AS topic
FROM (Q) r111 LEFT OUTER JOIN (q4) r222 ON (
r111.someone = r222.someone
AND (r111.log = r222.log OR r111.log IS NULL) )
SPARQLtoSQL
Step 3: Project only the required attributes (variables)
SELECT ?url ?log ?topic
WHERE {
  ?someone foaf:name "Artem Chebotko" . ?someone foaf:homepage ?url .
  OPTIONAL { ?someone foaf:weblog ?log .
    OPTIONAL { ?url foaf:topic ?topic . }
  }
  OPTIONAL { ?someone <http://www.example.org/blog> ?log . }
}
SELECT r.url AS url, r.log AS log, r.topic AS topic
FROM (Q) r
SPARQLtoSQL
Almost complete query (need to replace q1, q2, q3, q4)
SELECT r.url AS url, r.log AS log, r.topic AS topic
FROM (
  SELECT r111.someone AS someone, r111.url AS url,
         COALESCE(r111.log, r222.log) AS log, r111.topic AS topic
  FROM (
    SELECT r11.someone AS someone, r11.url AS url, r11.log AS log, r22.topic AS topic
    FROM (
      SELECT r1.someone AS someone, r1.url AS url, r2.log AS log
      FROM (q1) r1 LEFT OUTER JOIN (q2) r2 ON (r1.someone = r2.someone)
    ) r11 LEFT OUTER JOIN (q3) r22 ON (r11.url = r22.url AND r11.log IS NOT NULL)
  ) r111 LEFT OUTER JOIN (q4) r222 ON (
    r111.someone = r222.someone
    AND (r111.log = r222.log OR r111.log IS NULL) )
) r
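To see the nested query run, the sketch below substitutes hand-written SQL for q1 to q4 and evaluates the result with Python's sqlite3 module. The q2, q3, and q4 strings are assumptions written in the BGPtoSQL style, and the http://blog.example.org/artem triple is hypothetical data added only so that the last OPTIONAL clause has something to match.

```python
import sqlite3

ME = "http://www.cs.wayne.edu/~artem/ID"
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany("INSERT INTO Triples VALUES (?, ?, ?)", [
    (ME, "foaf:name", "Artem Chebotko"),
    (ME, "foaf:homepage", "http://www.cs.wayne.edu/~artem"),
    # Hypothetical triple, not part of the original example data.
    (ME, "http://www.example.org/blog", "http://blog.example.org/artem"),
])

# One SQL query per basic graph pattern (assumed BGPtoSQL output).
q1 = """SELECT t1.subject AS someone, t2.object AS url
        FROM Triples t1, Triples t2
        WHERE t1.predicate = 'foaf:name' AND t1.object = 'Artem Chebotko'
          AND t2.predicate = 'foaf:homepage' AND t1.subject = t2.subject"""
q2 = """SELECT t.subject AS someone, t.object AS log
        FROM Triples t WHERE t.predicate = 'foaf:weblog'"""
q3 = """SELECT t.subject AS url, t.object AS topic
        FROM Triples t WHERE t.predicate = 'foaf:topic'"""
q4 = """SELECT t.subject AS someone, t.object AS log
        FROM Triples t WHERE t.predicate = 'http://www.example.org/blog'"""

full_query = f"""
SELECT r.url AS url, r.log AS log, r.topic AS topic
FROM (
  SELECT r111.someone AS someone, r111.url AS url,
         COALESCE(r111.log, r222.log) AS log, r111.topic AS topic
  FROM (
    SELECT r11.someone AS someone, r11.url AS url, r11.log AS log, r22.topic AS topic
    FROM (
      SELECT r1.someone AS someone, r1.url AS url, r2.log AS log
      FROM ({q1}) r1 LEFT OUTER JOIN ({q2}) r2 ON (r1.someone = r2.someone)
    ) r11 LEFT OUTER JOIN ({q3}) r22 ON (r11.url = r22.url AND r11.log IS NOT NULL)
  ) r111 LEFT OUTER JOIN ({q4}) r222 ON (
    r111.someone = r222.someone
    AND (r111.log = r222.log OR r111.log IS NULL) )
) r
"""

rows = conn.execute(full_query).fetchall()
print(rows)
```

With this data the foaf:weblog clause fails, so the nested foaf:topic clause is skipped through the IS NOT NULL check, and the parallel clause then supplies ?log through COALESCE.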
Experimental Study
Dataset: WordNet, 700,000+ triples
Translation algorithms are very efficient and scalable. For example, SPARQLtoSQL translated queries with fewer than 50 OPTIONAL clauses, each containing one triple pattern, in less than 0.001 sec. regardless of the clause tree layout.
The evaluation of most sample queries in Oracle proved unsatisfactory (on the order of seconds), mainly because of the simple relational schema.
Note that this does not imply that the algorithms are not practical. SPARQLtoSQL does not directly depend on a particular database schema as long as the BGPtoSQL stub for the database is provided, which we believe is a reasonable expectation from existing RDF storage systems.
Experimental Study
The evaluation of sample queries in the in-memory relational database showed much better results.
In these experiments, we were able to try different implementations of the left outer join based on nested-loops, sort-merge, and simple hash methods.
New Example
[Figure: RDF graph with a schema level and an instance level. Schema: the classes GradStudent and Professor, related by hasAdvisor and hasCoadvisor. Instances: Tom, Jerry, Jeff, and Natalia, linked to the schema by rdf:type and to each other by hasAdvisor and hasCoadvisor edges.]
New Example
Retrieve:
(1) every graduate student in the RDF graph;
(2) the student's advisor if this information is available;
(3) the student's coadvisor if this information is available and if the student's advisor has been successfully retrieved in the previous step.
In other words, the query returns students and as many advisors as possible; there is no point in returning a coadvisor for a student who has no advisor.
Motivation: Computation Waste with LOJ
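R1 (stu): (Jerry); (Natalia)
R2 (stu, adv): (Jerry, Tom)
R3 (stu, coadv): (Jerry, Jeff); (Natalia, Tom)
R4 = R1 LEFT OUTER JOIN R2 ON R1.stu = R2.stu
R4 (stu, adv): (Jerry, Tom); (Natalia, NULL)
Rres = R4 LEFT OUTER JOIN R3 ON R4.stu = R3.stu AND R4.adv IS NOT NULL
Rres (stu, adv, coadv): (Jerry, Tom, Jeff); (Natalia, NULL, NULL)
The tuple (Natalia, NULL) in R4 is still compared with every tuple of R3, although the IS NOT NULL condition guarantees that it cannot match.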
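The left outer join plan for this example can be reproduced with Python's sqlite3 module. The table and column names follow the relations in the example; the loj helper and the comparison run without the IS NOT NULL check are additions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE R1 (stu TEXT);
CREATE TABLE R2 (stu TEXT, adv TEXT);
CREATE TABLE R3 (stu TEXT, coadv TEXT);
INSERT INTO R1 VALUES ('Jerry'), ('Natalia');
INSERT INTO R2 VALUES ('Jerry', 'Tom');
INSERT INTO R3 VALUES ('Jerry', 'Jeff'), ('Natalia', 'Tom');
""")


def loj(extra_condition):
    """Evaluate (R1 LOJ R2) LOJ R3 with an optional extra join condition."""
    sql = f"""
    SELECT R4.stu, R4.adv, R3.coadv
    FROM (SELECT R1.stu AS stu, R2.adv AS adv
          FROM R1 LEFT OUTER JOIN R2 ON R1.stu = R2.stu) R4
    LEFT OUTER JOIN R3 ON (R4.stu = R3.stu {extra_condition})
    ORDER BY R4.stu
    """
    return conn.execute(sql).fetchall()


with_check = loj("AND R4.adv IS NOT NULL")   # nested OPTIONAL semantics
without_check = loj("")                       # plain LOJ on stu only
print(with_check)
print(without_check)
```

Without the check, Natalia would be given the coadvisor Tom even though she has no advisor, which is why the translation needs the IS NOT NULL condition when it uses a left outer join.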
1/23/2007 39
Nested Optional Join
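A novel relational operator to translate nested optional patterns
An alternative to the left outer join
Joins twin relations (base relation + optional relation)
A base relation: tuples that have the potential to satisfy a join condition if used in a nested optional join.
An optional relation: tuples that are guaranteed to fail a join condition if used in a nested optional join.
[Figure: the nested optional join of the twin relations (Rb, Ro) and (Sb, So) produces the twin relation (Qb, Qo). For a tuple r of Rb and a tuple s of Sb, the condition r(a) = s(b) is tested: if it is true, the joined tuple rs goes to Qb; if it is false, the NULL-padded tuple rn goes to Qo. A tuple r of Ro is NULL-padded to rn.]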
SPARQL-to-SQL Translation with NOJ
(R1b, R1o)
R1b (stu): (Jerry); (Natalia)
R1o (stu): empty
(R2b, R2o)
R2b (stu, adv): (Jerry, Tom)
R2o (stu, adv): empty
(R3b, R3o)
R3b (stu, coadv): (Jerry, Jeff); (Natalia, Tom)
R3o (stu, coadv): empty
(R4b, R4o) = (R1b, R1o) NOJ (R2b, R2o) on (R1b, R1o).stu = (R2b, R2o).stu
R4b (stu, adv): (Jerry, Tom)
R4o (stu, adv): (Natalia, NULL)
(Rresb, Rreso) = (R4b, R4o) NOJ (R3b, R3o) on (R4b, R4o).stu = (R3b, R3o).stu
Rresb (stu, adv, coadv): (Jerry, Tom, Jeff)
Rreso (stu, adv, coadv): (Natalia, NULL, NULL)
Nested Optional Join
NOJ vs. LOJ
The NOJ processes the tuples that are guaranteed to be NULL padded very efficiently, in linear time
The NOJ does not require the NOT NULL check to return correct results
NOJ algorithms
Nested-loops NOJ algorithm NL-NOJ
Sort-merge NOJ algorithm SM-NOJ
Simple hash NOJ algorithm SH-NOJ
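A nested-loops variant can be sketched in Python as follows. This is a reading of the slides, not the authors' NL-NOJ algorithm: nl_noj is a hypothetical name, tuples are modelled as dicts, twin relations as (base, optional) pairs of lists, and the optional part of the right operand is assumed not to take part in the join (it is empty in the example).

```python
def nl_noj(left, right, left_key, right_key, right_attrs):
    """Nested-loops sketch of the nested optional join over twin relations.

    left and right are (base, optional) pairs of lists of dict tuples;
    right_attrs names the attributes contributed by the right operand.
    """
    left_base, left_optional = left
    right_base, _ = right   # assumption: the right optional part is not used
    null_pad = {attr: None for attr in right_attrs}
    out_base, out_optional = [], []
    for r in left_base:
        matches = [s for s in right_base if r[left_key] == s[right_key]]
        if matches:
            # Matching tuples can still satisfy later joins: base relation.
            out_base.extend({**r, **s} for s in matches)
        else:
            out_optional.append({**r, **null_pad})
    # Tuples that already failed are NULL-padded in one linear pass,
    # without any comparison and without an IS NOT NULL check.
    out_optional.extend({**r, **null_pad} for r in left_optional)
    return out_base, out_optional


R1 = ([{"stu": "Jerry"}, {"stu": "Natalia"}], [])
R2 = ([{"stu": "Jerry", "adv": "Tom"}], [])
R3 = ([{"stu": "Jerry", "coadv": "Jeff"}, {"stu": "Natalia", "coadv": "Tom"}], [])

R4 = nl_noj(R1, R2, "stu", "stu", ["adv"])
Rres = nl_noj(R4, R3, "stu", "stu", ["coadv"])
print(Rres)
```

Natalia's tuple lands in the optional part after the first join, so the second join pads it with NULL directly instead of comparing it against the coadvisor tuples.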
1/23/2007 43
Nested Optional Join
For in-memory evaluation:
JSF <= 0.005: SH-NOJ
JSF >= 0.8: NL-NOJ
0.005 < JSF < 0.8: SM-NOJ
Possible Future Work
Extending our work to support other SPARQL constructs, such as UNION, FILTER, etc.
Adding intelligence to our SPARQL-to-SQL translation to support the nested optional join.
Investigating possible optimizations of parallel optional graph patterns.
Defining the relational algebra for SPARQL with the support of nested and parallel optional joins.
… and more
1/23/2007 45
References
Artem Chebotko, Mustafa Atay, Shiyong Lu, and Farshad Fotouhi. "Extending Relational Databases with a Nested Optional Join for Efficient Semantic Web Query Processing". Technical Report TR-DB-052006-CLJF, Department of Computer Science, Wayne State University, November 2006.
Artem Chebotko, Shiyong Lu, Hasan M. Jamil, and Farshad Fotouhi. "Semantics Preserving SPARQL-to-SQL Query Translation for Optional Graph Patterns". Technical Report TR-DB-052006-CLJF, Department of Computer Science, Wayne State University, May 2006.