Optimizing Query Answering
under
Ontological Constraints
Giorgio Orsi1,2 and Andreas Pieris2
1Institute for the Future of Computing
Oxford Martin School
University of Oxford
2Department of Computer Science
University of Oxford
VLDB 2011
Ontological Databases
Ontological Reasoning DB Constraints
Ontological DB
Ontological Databases
D
D
D
ABox
TBox
Ontological Reasoning DB Constraints
Ontological DB
Ontological Databases
D
D
D
Q(X) 9Y (X,Y)
ABox
TBox
Ontological Reasoning DB Constraints
Ontological DB
Ontological Databases
D
D
D
, ABox
TBox
{ t | D [ ² 9u (t,u) }
Ontological Reasoning DB Constraints
Ontological DB
Q(X) 9Y (X,Y)
Ontological Constraints (examples)
Concept Inclusions: 8X emp(X) person(X)
(Inverse) Relation Inclusion:
Relation Transitivity: 8X8Y8Z mgs(X,Y),mgs(Y,Z) mgs(X,Z)
8X8Y manages(X,Y) isManaged(Y,X)
Participation: 8X emp(X) 9Y report(X,Y)
Disjointness: 8X emp(X), customer(X) ?
Functionality: 8X8Y8Z reports(X,Y),reports(X,Z) Y = Z
Datalog§
¡ Datalog variant allowing in the head:
- 9-variables ! TGDs 8X8Y (X,Y) 9Z (X,Z)
- Equality atoms ! EGDs 8X (X) Xi=Xj
- Constant false (?) ! NCs 8X (X) ?
Datalog+
[Cali’ et Al, PODS 09]
Datalog+
Datalog§ [Cali’ et Al, PODS 09]
¡ Datalog variant allowing in the head:
- 9-variables ! TGDs 8X8Y (X,Y) 9Z (X,Z)
- Equality atoms ! EGDs 8X (X) Xi=Xj
- Constant false (?) ! NCs 8X (X) ?
¡ But, query answering under Datalog+ is undecidable
Datalog+
Datalog§ [Cali’ et Al, PODS 09]
¡ Datalog variant allowing in the head:
- 9-variables ! TGDs 8X8Y (X,Y) 9Z (X,Z)
- Equality atoms ! EGDs 8X (X) Xi=Xj
- Constant false (?) ! NCs 8X (X) ?
¡ Datalog+ is syntactically restricted ! Datalog§
¡ But, query answering under Datalog+ is undecidable
Datalog+
Datalog§ [Cali’ et Al, PODS 09]
¡ Datalog variant allowing in the head:
- 9-variables ! TGDs 8X8Y (X,Y) 9Z (X,Z)
- Equality atoms ! EGDs 8X (X) Xi=Xj
- Constant false (?) ! NCs 8X (X) ?
¡ Datalog+ is syntactically restricted ! Datalog§
¡ But, query answering under Datalog+ is undecidable
¡ TGDs more expressive than inclusion dependencies
8D8P8A runs(D,P),area(P,A) 9E employee(E,D,A)
The Chase Procedure
Input: Database D, set of TGDs
Output: A model of D [
person(john)
8X person(X) 9Y father(Y,X) 8X8Y father(X,Y) person(X)
D
chase(D,) = D [ ?
The Chase Procedure
Input: Database D, set of TGDs
Output: A model of D [
person(john)
D
chase(D,) = D [ {father(z1,john)
8X person(X) 9Y father(Y,X) 8X8Y father(X,Y) person(X)
The Chase Procedure
Input: Database D, set of TGDs
Output: A model of D [
person(john)
D
chase(D,) = D [ {father(z1,john), person(z1)
8X person(X) 9Y father(Y,X) 8X8Y father(X,Y) person(X)
The Chase Procedure
Input: Database D, set of TGDs
Output: A model of D [
person(john)
D
chase(D,) = D [ {father(z1,john), person(z1), father(z2,z1)
8X person(X) 9Y father(Y,X) 8X8Y father(X,Y) person(X)
The Chase Procedure
Input: Database D, set of TGDs
Output: A model of D [
person(john)
D
chase(D,) = D [ {father(z1,john), person(z1), father(z2,z1), …}
8X person(X) 9Y father(Y,X) 8X8Y father(X,Y) person(X)
Query Answering via Chase
[see, e.g., Deutsch, Nash & Remmel, PODS 08]
D [ ² Q , chase(D,) ² Q
D
. . .
C = chase(D,)
M1
M2
h1 h2
h1(C) h2(C)
Q h
Query Answering via Chase
D
. . .
C = chase(D,)
M1
M2
h1 h2
h1(C) h2(C)
Q h
Possibly Infinite!
Q
Query answering via rewriting
Q
Q
compilation
Query answering via rewriting
Q
evaluation
Q
Q
compilation
D
Query answering via rewriting
Chase vs rewriting
Linear TGDs
8X8Y r(X,Y) 9Z (X,Z)
single body atom
¡ Properly generalize inclusion dependencies.
¡ Enjoy the bounded-derivation depth property.
¡ FO-rewritable query answering in AC0 (data complexity).
Bounded derivation depth property (BDDP)
D
Q
Bounded derivation depth property (BDDP)
D
Q k levels
constant
w.r.t. D
chase(D,) ² Q ) chasek(D,) ² Q
Bounded derivation depth property (BDDP)
BDDP ) First-Order Rewritability
D
Q k levels
constant
w.r.t. D
Q
q promotesTo(A,B), customer(B) (original query)
promoter(X) Y promotesTo(X,Y)
promotesTo(X,Y) customer(Y)
q promotesTo(A,B), customer(B) Q
FO-rewritability: Example [Gottlob et Al., ICDE 11]
q promotesTo(A,B), customer(B)
q promotesTo(A,B), customer(V0,B)
{ Y = B }
( V0 is fresh )
promoter(X) Y promotesTo(X,Y)
promotesTo(X,Y) customer(Y)
Q
FO-rewritability: Example [Gottlob et Al., ICDE 11]
Q
q promotesTo(A,B), customer(B)
q promotesTo(A,B), customer(B)
q promotesTo(A,B), promotesTo(V0,B) ans(A) promotesTo(A,B)
factorization
{ A = V0 }
promoter(X) Y promotesTo(X,Y)
promotesTo(X,Y) customer(Y)
Q
FO-rewritability: Example [Gottlob et Al., ICDE 11]
Q
q promotesTo(A,B), customer(B)
q promoter(A)
promoter(X) Y promotesTo(X,Y)
promotesTo(X,Y) customer(Y)
Q
FO-rewritability: Example [Gottlob et Al., ICDE 11]
Q
q promotesTo(A,B), customer(B)
q promotesTo(A,B) {X = A, Y = B}
q promotesTo(A,B), customer(B)
UCQ rewriting
(first-order)
promoter(X) Y promotesTo(X,Y)
promotesTo(X,Y) customer(Y)
Q
FO-rewritability: Example [Gottlob et Al., ICDE 11]
Q
q promoter(A)
q promotesTo(A,B), customer(B)
q promotesTo(A,B)
q promotesTo(A,B), customer(B)
FO-rewritability
¡ Desirable properties of a FO-rewriting:
independent on the DB
executable by any DBMS
easy to compute (e.g., polynomial time)
small size (e.g., polynomial size)
FO-rewritability
¡ Unions of Conjunctive Queries (UCQs)
executable by any DBMS
DB independent
easy to optimize and distribute
worst-case exponential size in Q and
Calvanese et Al, JAR 07
Perez Urbina et Al, JAL 09
Cali’ et Al, PODS 09
Gottlob et Al, ICDE 11
and others…
¡ Desirable properties of a FO-rewriting:
independent on the DB
executable by any DBMS
easy to compute (e.g., polynomial time)
small size (e.g., polynomial size)
¡ Combined and hybrid FO-rewriting
good computational properties
(e.g., polynomial in size)
requires access to the DB
Kontchakov et Al., KR 10
Gottlob and Schwentick, DL 11
FO-rewritability
¡ Purely intensional Datalog rewriting
very compressed representation
purely intensional
requires view-creation or Datalog engine
¡ Combined and hybrid FO-rewriting
good computational properties
(e.g., polynomial in size)
requires access to the DB
Kontchakov et Al., KR 10
Gottlob and Schwentick, DL 11
Perez Urbina et Al, JAL 09
Rosati and Almatelli., KR 10
Orsi and Pieris., VLDB 11
FO-rewritability
Datalog rewriting: keep it first-order!
¡ A Datalog query is (in general) not a first-order query
a non-recursive Datalog query is a first-order query
a bounded Datalog query is a first-order query
an output-predicate-bounded Datalog query is a first-order query
Datalog rewriting: keep it first-order!
¡ Input:
a (w.l.o.g. boolean) conjunctive query Q = <q,ρ>
Q : q(X) p(X), s(X,Y) <q, q(X) p(X),s(X,Y) >
a set of linear TGDs
¡ Output:
a bounded Datalog query Q = <q,π >
¡ A Datalog query is (in general) not a first-order query
a non-recursive Datalog query is a first-order query
a bounded Datalog query is a first-order query
an output-predicate-bounded Datalog query is a first-order query
Predicate-bounded Datalog
r(X,Y), t(Y) → t(X)
r(X,Y ) → r(Y,X)
r(X,Y ), t(Z) → q(X)
¡ bounded Datalog program
all IDB predicates are bounded.
¡ p-bounded Datalog program
the predicate p is bounded.
Predicate-bounded Datalog
r(X,Y), t(Y) → t(X)
r(X,Y ) → r(Y,X)
r(X,Y ), t(Z) → q(X)
¡ bounded Datalog program
all IDB predicates are bounded.
¡ p-bounded Datalog program
the predicate p is bounded.
t is unbounded t(X) cannot be computed using
a DB-independent FO-query
Predicate-bounded Datalog
r(X,Y), t(Y) → t(X)
r(X,Y ) → r(Y,X)
r(X,Y ), t(Z) → q(X)
¡ bounded Datalog program
all IDB predicates are bounded.
¡ p-bounded Datalog program
the predicate p is bounded.
t is unbounded t(X) cannot be computed using
a DB-independent FO-query
t(X) r(X,Y), t(Y)
Predicate-bounded Datalog
r(X,Y), t(Y) → t(X)
r(X,Y ) → r(Y,X)
r(X,Y ), t(Z) → q(X)
¡ bounded Datalog program
all IDB predicates are bounded.
¡ p-bounded Datalog program
the predicate p is bounded.
t is unbounded t(X) cannot be computed using
a DB-independent FO-query
t(X) r(X,Y), r(Y,Z), t(Z)
t(X) r(X,Y), t(Y) v
Predicate-bounded Datalog
r(X,Y), t(Y) → t(X)
r(X,Y ) → r(Y,X)
r(X,Y ), t(Z) → q(X)
¡ bounded Datalog program
all IDB predicates are bounded.
¡ p-bounded Datalog program
the predicate p is bounded.
t is unbounded t(X) cannot be computed using
a DB-independent FO-query
t(X) r(X,Y), r(Y,Z), t(Z) v
t(X) r(X,Y), t(Y) v
t(X) r(X,Y), r(Y,Z), r(Z,W), t(W)
Predicate-bounded Datalog
r(X,Y), t(Y) → t(X)
r(X,Y ) → r(Y,X)
r(X,Y ), t(Z) → q(X)
¡ bounded Datalog program
all IDB predicates are bounded.
¡ p-bounded Datalog program
the predicate p is bounded.
t is unbounded t(X) cannot be computed using
a DB-independent FO-query
t(X) r(X,Y), r(Y,Z), t(Z) v
t(X) r(X,Y), t(Y) v
t(X) r(X,Y), r(Y,Z), r(Z,W), t(W) v
…
Predicate-bounded Datalog
r(X,Y), t(Y) → t(X)
r(X,Y ) → r(Y,X)
r(X,Y ), t(Z) → q(X)
¡ bounded Datalog program
all IDB predicates are bounded.
¡ p-bounded Datalog program
the predicate p is bounded.
q is bounded q(X) can be computed using
a DB-independent FO-query
Predicate-bounded Datalog
r(X,Y), t(Y) → t(X)
r(X,Y ) → r(Y,X)
r(X,Y ), t(Z) → q(X)
¡ bounded Datalog program
all IDB predicates are bounded.
¡ p-bounded Datalog program
the predicate p is bounded.
q is bounded q(X) can be computed using
a DB-independent FO-query
q(X) r(X,Y)
Predicate-bounded Datalog
r(X,Y), t(Y) → t(X)
r(X,Y ) → r(Y,X)
r(X,Y ), t(Z) → q(X)
¡ bounded Datalog program
all IDB predicates are bounded.
¡ p-bounded Datalog program
the predicate p is bounded.
q is bounded q(X) can be computed using
a DB-independent FO-query
q(X) r(Y,X)
q(X) r(X,Y) v
Output-predicate-bounded Datalog and BDDP
OPB
DATALOG
PROGRAM
Output-predicate-bounded Datalog and BDDP
INPUT
RULES
OPB
DATALOG
PROGRAM
BDDP
RULES
OUTPUT
RULES
Output-predicate-bounded Datalog and BDDP
OPB
DATALOG
PROGRAM
enjoy
BDDP
property
non recursive non recursive
INPUT
RULES
BDDP
RULES
OUTPUT
RULES
Datalog Rewriting: Skolemization (and renaming)
r(X,Y) Z s(Y,Z)
s(X,Y) Z p(Y,Y,Z)
p(X,Y,Z) t(Z)
Datalog Rewriting: Skolemization (and renaming)
r(X,Y) Z s(Y,Z)
s(X,Y) Z p(Y,Y,Z)
p(X,Y,Z) t(Z)
r(X1,Y1) s(Y1,f1(Y1))
s(X2,Y2) p(Y2,Y2,f2(Y2))
p(X3,Y3,Z3) t(Z3)
f
Datalog Rewriting: Skolemization (and renaming)
r(X,Y) Z s(Y,Z)
s(X,Y) Z p(Y,Y,Z)
p(X,Y,Z) t(Z)
¡ f and are equisatisfiable (not equivalent)
¡ Introduce one Skolem function for each existential variable
r(X1,Y1) s(Y1,f1(Y1))
s(X2,Y2) p(Y2,Y2,f2(Y2))
p(X3,Y3,Z3) t(Z3)
f
Datalog Rewriting: Rule Saturation
¡ Apply resolution inference rule to rules in f
at least one of the rules contains Skolem terms
δ1 : r (X1,Y1) s(Y1,f1(Y1))
δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))
δ3 : p(X3,Y3,Z3) t(Z3)
f
Datalog Rewriting: Rule Saturation
¡ Apply resolution inference rule to rules in f
at least one of the rules contains Skolem terms
f [f]
…
r(X1,Y1) p(f1(Y1) ,f1(Y1), f2(f1(Y1)))
…
δ1 : r (X1,Y1) s(Y1,f1(Y1))
δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))
δ3 : p(X3,Y3,Z3) t(Z3)
Datalog Rewriting: Properties of Rule Saturation
¡ [f] mimics the chase derivations.
Datalog Rewriting: Properties of Rule Saturation
¡ [f] mimics the chase derivations.
δ1 : r (X1,Y1) s(Y1,f1(Y1))
δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))
δ3 : p(X3,Y3,Z3) t(Z3)
Datalog Rewriting: Properties of Rule Saturation
¡ [f] mimics the chase derivations.
¡ [f] depends only on .
¡ [f] is possibly infinite
linear TGDs have BDDP: suffices to construct it up to k steps [f]k.
δ1 : r (X1,Y1) s(Y1,f1(Y1))
δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))
δ3 : p(X3,Y3,Z3) t(Z3)
Datalog Rewriting: Query Saturation
¡ resolve [f] with the query Q.
use only rules with Skolem terms.
Datalog Rewriting: Query Saturation
¡ resolve [f] with the query Q.
use only rules with Skolem terms.
…
δ1 : r (X1,Y1) s(Y1,f1(Y1))
δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))
δ3 : p(X3,Y3,Z3) t(Z3)
…
[δ12]] : r (X1,Y1) p(f1(Y1) ,f1(Y1), f2(f1(Y1)))
…
Q s(A,B), p(B,B,C) [f] Q
…
Q r(X1,Y1), p(f1(Y1), f1(Y1),C)
…
[Q,f]
Datalog Rewriting: Query Saturation
¡ bypasses chase derivations with function symbols
…
δ1 : r (X1,Y1) s(Y1,f1(Y1))
δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))
δ3 : p(X3,Y3,Z3) t(Z3)
…
[δ12]] : r (X1,Y1) p(f1(Y1) ,f1(Y1), f2(f1(Y1)))
…
Q s(A,B), p(B,B,C)
[f]
Q
Datalog Rewriting: Finalization
¡ keep only the function-free rules from [f] [ [Q,f]
¡ derivations producing certain answers are captured by function-
symbol-free rules.
¡ (optional) add auxiliary rules to make the rewriting a proper Datalog
query i.e., clear distinction IDB/EDB.
OPB-Datalog and BDDP revisited
OPB
DATALOG
PROGRAM
INPUT
RULES
BDDP
RULES
OUTPUT
RULES
ff([f]) ff([Q,f]) aux
enjoy
BDDP
property
non recursive non recursive
¡ use the predicate graph to reduce the number of rules in f
δ1 : r (X1,Y1) s(Y1,f1(Y1))
δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))
δ3 : p(X3,Y3,Z3) t(Z3)
f
Q s(A,B), p(B,B,C)
Q
Optimizations: Pruning
¡ use the predicate graph to reduce the number of rules in f
δ1 : r (X1,Y1) s(Y1,f1(Y1))
δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))
δ3 : p(X3,Y3,Z3) t(Z3)
f
Q s(A,B), p(B,B,C)
Q
Optimizations: Pruning
r s
p t
Q
Optimizations: Pruning
¡ use the predicate graph to reduce the number of rules in f
δ1 : r (X1,Y1) s(Y1,f1(Y1))
δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))
δ3 : p(X3,Y3,Z3) t(Z3)
f
Q s(A,B), p(B,B,C)
Q
r s
p t
Q
Optimizations: Pruning
¡ use the predicate graph to reduce the number of rules in f
δ1 : r (X1,Y1) s(Y1,f1(Y1))
δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))
δ3 : p(X3,Y3,Z3) t(Z3)
f
Q s(A,B), p(B,B,C)
Q
¡ we are no longer
independent on Q!
r s
p t
Q
Optimizations: Query Elimination
¡ eliminate implied atoms during query saturation
δ1 : r (X1,Y1) s(Y1,f1(Y1))
δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))
f Q
Q s(A,B), p(B,B,C)
Optimizations: Query Elimination
¡ eliminate implied atoms during query saturation
f Q
s(A,B) ² p(B,B,C)
Q s(A,B), p(B,B,C)
atom coverage
δ1 : r (X1,Y1) s(Y1,f1(Y1))
δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))
Optimizations: Query Elimination
¡ eliminate implied atoms during query saturation
f
Q s(A,B), p(B,B,C) ≡ Q s(A,B)
Q
s(A,B) ² p(B,B,C)
Q s(A,B), p(B,B,C)
atom coverage
δ1 : r (X1,Y1) s(Y1,f1(Y1))
δ2 : s(X2,Y2) p(Y2,Y2,f2(Y2))
Optimizations: Query Elimination
¡ unique elimination strategy (w.r.t. the number of eliminated atoms)
see paper.
¡ given m = |body(ρ)| and n = ||
worst-case size of [f] [ [Q,f] is O((n∙m)m)
worst-case size of Q = <q,π > is O(n+m)m
¡ atom coverage under linear TGDs can be checked in polynomial time
see paper.
Experimental Results
Discussion
¡ Datalog rewriting is substantially more compact than UCQ rewriting.
¡ Does Datalog improve the end-to-end performance?
¡ Extend the procedure to larger classes of TGDs
guarded TGDs [Cali’ et Al, PODS 09] non FO-rewritable
sticky-join TGDs [Cali’ et Al, VLDB 10]
The Datalog Family
Thomas
Lukasiewicz
Georg
Gottlob
Andreas
Pieris
Andrea
Calì
Giorgio
Orsi
Thank you!
Top Related