ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting
Perm: Processing Provenance and Data on the Same Data Model through Query Rewriting

Boris Glavic
Database Technology Group
Department of Informatics, University of Zurich

Gustavo Alonso
Systems Group
Department of Computer Science, ETH Zurich
Overview
1. Introduction to Perm
2. The Perm Provenance Representation
3. Query Rewriting for Provenance Computation
4. Perm Implementation
5. Experimental Results
6. Conclusion
1. Introduction
Relational provenance (figure): a query transformation maps data items of the base relations to data items of the result relation.
Query: which input data item(s) influenced which output data item(s)?
Granularity: tuple, attribute value, ...
Contribution semantics: influence (why), copy (where), ...
The problem of computing this type of provenance has been solved before (see, e.g., [Cui, Widom ICDE '00]), but previous approaches use a non-relational representation of provenance data, separate provenance from “normal” data, and compute provenance with non-relational methods.
Perm (Provenance Extension of the Relational Model) is a provenance management system that uses a “pure” relational representation of provenance: query result tuples and their provenance tuples are represented as a single relation.
Benefits: provenance can be stored in a standard DBMS, queried using SQL, and directly interpreted by a user, and it is directly associated with the “normal” data it describes.
Provenance computation uses query rewriting: given a query q, generate a query q+ that computes the provenance of all result tuples of q.
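A minimal sketch of the q / q+ relationship, assuming a single selection over the items relation from the running example. The q+ below is written by hand, not generated by Perm; it keeps the result attributes of q and appends a copy of the contributing input tuple as provenance attributes. The p_ prefix is an assumed naming scheme, and SQLite (via Python's sqlite3 module) stands in for PostgreSQL.

```python
import sqlite3

# items(id, price) as in the running example; SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, price INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, 100), (2, 10), (3, 25)])

# q: an ordinary query.
q = "SELECT id, price FROM items WHERE price > 20 ORDER BY id"

# q+: hand-written (hypothetical); same result attributes, plus the
# attributes of the contributing input tuple as provenance attributes.
q_plus = """
SELECT id, price,
       id AS p_id, price AS p_price
FROM items
WHERE price > 20
ORDER BY id
"""

result = conn.execute(q).fetchall()
result_plus = conn.execute(q_plus).fetchall()
print(result)       # original result tuples
print(result_plus)  # the same tuples, each extended with its provenance
```

Every tuple of q+ projected onto its first two attributes is a tuple of q, which is what keeps result data and provenance in a single relation.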
Benefits: the rewritten query is expressed in relational algebra, so it can be optimized and executed by an RDBMS; for example, it can be stored as a view or used as a subquery.
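To illustrate the subquery point, here is a sketch in which a hand-written q+ (hypothetical, for the join of sales and items from the running example) is nested inside another SQL query to count how many result tuples each item contributed to. The p_ attribute names are assumptions; SQLite stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sName TEXT, itemId INTEGER)")
conn.execute("CREATE TABLE items (id INTEGER, price INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("Migros", 1), ("Migros", 2), ("Migros", 2), ("Coop", 3), ("Coop", 3)])
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, 100), (2, 10), (3, 25)])

# Hand-written q+ (hypothetical) for
#   q = SELECT sName, price FROM sales, items WHERE itemId = id
q_plus = """
SELECT sName, price,
       sName AS p_sName, itemId AS p_itemId, id AS p_id, price AS p_price
FROM sales, items
WHERE itemId = id
"""

# q+ is plain SQL, so it can be used as a subquery like any other relation:
# to how many result tuples did each item contribute?
usage = conn.execute(f"""
SELECT p_id, count(*) AS n_results
FROM ({q_plus}) AS prov
GROUP BY p_id
ORDER BY p_id
""").fetchall()
print(usage)
```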
2. The Perm Approach
Example relations:

sales
sName   itemId
Migros  1
Migros  2
Migros  2
Coop    3
Coop    3

items
id  price
1   100
2   10
3   25
Example query: compute the sum of sales for each shop.

SELECT sName, sum(price)
FROM sales, items
WHERE itemId = id
GROUP BY sName;
result
sName   sum(price)
Migros  120
Coop    50
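The example query can be checked against the result table with a few lines of Python, using SQLite as a stand-in for PostgreSQL and loading the sales and items tuples listed above. The query text is the one from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sName TEXT, itemId INTEGER)")
conn.execute("CREATE TABLE items (id INTEGER, price INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("Migros", 1), ("Migros", 2), ("Migros", 2), ("Coop", 3), ("Coop", 3)])
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, 100), (2, 10), (3, 25)])

# The example query from the slide, unchanged.
rows = conn.execute(
    "SELECT sName, sum(price) FROM sales, items WHERE itemId = id GROUP BY sName"
).fetchall()
result = dict(rows)  # shop name -> total price of its sales
print(result)
```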
Desired result format: each result tuple is extended with the attributes of the contributing tuples from every accessed base relation.

original attributes | relation 1 attributes | ... | relation n attributes
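The layout above can be expressed as a small schema-building helper. This is a sketch under an assumed naming scheme (prov_<relation>_<attribute>); the function name provenance_schema is hypothetical, and the exact names Perm generates are not given here.

```python
from typing import List, Tuple

def provenance_schema(result_attrs: List[str],
                      relations: List[Tuple[str, List[str]]]) -> List[str]:
    """Return the result attributes followed by one renamed copy of the
    attributes of each accessed base relation, in access order."""
    schema = list(result_attrs)
    for rel_name, attrs in relations:
        # Assumed naming scheme: prov_<relation>_<attribute>
        schema.extend(f"prov_{rel_name}_{a}" for a in attrs)
    return schema

cols = provenance_schema(
    ["sName", "sum"],
    [("sales", ["sName", "itemId"]), ("items", ["id", "price"])],
)
print(cols)
# ['sName', 'sum', 'prov_sales_sName', 'prov_sales_itemId', 'prov_items_id', 'prov_items_price']
```

Including the relation name in each provenance attribute avoids clashes such as sName, which occurs both in the result and in sales.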
Provenance of the example query in this format:

Original result     sales                items
sName   sum(price)  P(sName)  P(itemId)  P(id)  P(price)
Migros  120         Migros    1          1      100
Migros  120         Migros    2          2      10
Migros  120         Migros    2          2      10
Coop    50          Coop      3          3      25
Coop    50          Coop      3          3      25
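The provenance relation above can be reproduced with a hand-written rewritten query, sketched here with SQLite standing in for PostgreSQL. It follows the aggregation rewrite idea covered in Section 3: the original aggregation result is joined back to the contributing (sales, items) tuple pairs. The aliases total, g, and p_* are assumptions, and this is not necessarily the SQL Perm generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sName TEXT, itemId INTEGER)")
conn.execute("CREATE TABLE items (id INTEGER, price INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("Migros", 1), ("Migros", 2), ("Migros", 2), ("Coop", 3), ("Coop", 3)])
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, 100), (2, 10), (3, 25)])

# Hand-written q+: original result (agg) joined with the contributing
# input tuples (prov) on the grouping attribute sName.
q_plus = """
SELECT agg.sName, agg.total,
       prov.p_sName, prov.p_itemId, prov.p_id, prov.p_price
FROM (SELECT sName, sum(price) AS total
      FROM sales, items WHERE itemId = id
      GROUP BY sName) AS agg
LEFT OUTER JOIN
     (SELECT sName AS g, sName AS p_sName, itemId AS p_itemId,
             id AS p_id, price AS p_price
      FROM sales, items WHERE itemId = id) AS prov
ON agg.sName = prov.g
ORDER BY agg.sName DESC, prov.p_id
"""
rows = conn.execute(q_plus).fetchall()
for row in rows:
    print(row)
```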
3. Query Rewriting for Provenance Computation
Rewrite method basics: use the algebra representation of the query and replace every algebra operator with an algebra statement that propagates provenance alongside the original results. This requires a rewrite rule for each relational algebra operator.
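A toy version of the "one rewrite rule per operator" idea, assuming a hypothetical mini-algebra with base relation, selection, projection, and inner join that is rewritten bottom-up into SQL. The rules (copy base attributes as provenance, pass provenance through selections and projections, concatenate it for joins) are a simplified sketch, not Perm's implementation; attribute names are assumed unique across relations, and SQLite runs the generated SQL.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Tuple, Union

# Toy algebra (hypothetical classes, for illustration only).
@dataclass
class Rel:          # base relation
    name: str
    attrs: List[str]

@dataclass
class Select:       # selection with an SQL condition
    cond: str
    child: "Op"

@dataclass
class Project:      # projection on a list of attributes
    attrs: List[str]
    child: "Op"

@dataclass
class Join:         # inner join with an SQL condition
    cond: str
    left: "Op"
    right: "Op"

Op = Union[Rel, Select, Project, Join]

def rewrite(op: Op) -> Tuple[str, List[str], List[str]]:
    """Return (SQL of op+, normal attributes, provenance attributes P(op+))."""
    if isinstance(op, Rel):
        # R+: duplicate every attribute as a provenance attribute.
        prov = [f"p_{op.name}_{a}" for a in op.attrs]
        cols = op.attrs + [f"{a} AS {p}" for a, p in zip(op.attrs, prov)]
        return f"SELECT {', '.join(cols)} FROM {op.name}", list(op.attrs), prov
    if isinstance(op, Select):
        # Selection: filter the rewritten input; provenance passes through.
        sql, attrs, prov = rewrite(op.child)
        return f"SELECT * FROM ({sql}) AS s WHERE {op.cond}", attrs, prov
    if isinstance(op, Project):
        # Projection: keep the projected attributes and all provenance attributes.
        sql, _, prov = rewrite(op.child)
        return f"SELECT {', '.join(op.attrs + prov)} FROM ({sql}) AS p", list(op.attrs), prov
    if isinstance(op, Join):
        # Join: join the rewritten inputs and concatenate their provenance.
        lsql, lattrs, lprov = rewrite(op.left)
        rsql, rattrs, rprov = rewrite(op.right)
        cols = ", ".join(lattrs + rattrs + lprov + rprov)
        sql = f"SELECT {cols} FROM ({lsql}) AS l JOIN ({rsql}) AS r ON {op.cond}"
        return sql, lattrs + rattrs, lprov + rprov
    raise TypeError(f"no rewrite rule for {type(op).__name__}")

# Usage: project sName, price over sales JOIN (items with price > 20).
q = Project(["sName", "price"],
            Join("itemId = id",
                 Rel("sales", ["sName", "itemId"]),
                 Select("price > 20", Rel("items", ["id", "price"]))))
sql, attrs, prov = rewrite(q)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sName TEXT, itemId INTEGER)")
conn.execute("CREATE TABLE items (id INTEGER, price INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("Migros", 1), ("Migros", 2), ("Migros", 2), ("Coop", 3), ("Coop", 3)])
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, 100), (2, 10), (3, 25)])
rows = conn.execute(sql + " ORDER BY sName").fetchall()
print(rows)
```

Because each rule only needs the rewritten SQL and the provenance attribute list of its inputs, the recursion handles arbitrarily nested operator trees.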
Rewrite process (figure): in the algebra tree of the query (operators op1, op2, op3), a rewrite rule replaces op1 with the algebra statement op1a, op1b, op1c; applying the rewrite rules to every operator yields the rewritten query.
Rewrite rule notation:
T+     rewritten statement (query) for T
P(T+)  provenance attributes of T+
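Rewrite rule example (aggregation). The query

SELECT agg, G FROM T GROUP BY G

is rewritten into

SELECT agg, G, P(T+)
FROM (SELECT agg, G FROM T GROUP BY G) AS agg
LEFT OUTER JOIN
     (SELECT G AS G', P(T+) FROM T+) AS prov
ON (G = G')

The original aggregation is computed on T, and its result is joined with the rewritten input T+ on the grouping attributes, so each result tuple is paired with the provenance of every input tuple in its group.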
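The aggregation rule can be turned into a small SQL generator. In this sketch the function rewrite_aggregation and the G' spelling <attr>_prime are hypothetical; T+ is passed in as SQL, here the base-relation copy of sales(shop, month, revenue) from the next example. SQLite runs the result.

```python
import sqlite3
from typing import List

def rewrite_aggregation(group_by: List[str], agg_expr: str, t_sql: str,
                        t_plus_sql: str, prov_attrs: List[str]) -> str:
    """SQL for (SELECT agg, G FROM T GROUP BY G)+ following the rule above."""
    g = ", ".join(group_by)
    g_renamed = ", ".join(f"{a} AS {a}_prime" for a in group_by)     # G AS G'
    on = " AND ".join(f"agg.{a} = prov.{a}_prime" for a in group_by)  # G = G'
    p_out = ", ".join(f"prov.{p}" for p in prov_attrs)
    p_in = ", ".join(prov_attrs)
    return (
        f"SELECT agg.*, {p_out} "
        f"FROM (SELECT {agg_expr}, {g} FROM ({t_sql}) AS t GROUP BY {g}) AS agg "
        f"LEFT OUTER JOIN (SELECT {g_renamed}, {p_in} FROM ({t_plus_sql}) AS tp) AS prov "
        f"ON {on}"
    )

# Usage on the sales(shop, month, revenue) example.
t_sql = "SELECT shop, month, revenue FROM sales"
t_plus_sql = ("SELECT shop, month, revenue, shop AS pShop, month AS pMonth, "
              "revenue AS pRevenue FROM sales")  # sales+ copies its attributes
sql = rewrite_aggregation(["shop"], "sum(revenue) AS sum", t_sql, t_plus_sql,
                          ["pShop", "pMonth", "pRevenue"])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (shop TEXT, month TEXT, revenue INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [("Migros", "Jan", 100), ("Migros", "Feb", 10), ("Migros", "Mar", 10),
                  ("Coop", "Jan", 25), ("Coop", "Feb", 25)])
rows = sorted(conn.execute(sql).fetchall())
for row in rows:
    print(row)
```

The join uses plain equality as in the rule above, so a group whose grouping value is NULL would not find its provenance (NULL = NULL is not true in SQL); a null-safe comparison such as PostgreSQL's IS NOT DISTINCT FROM would be needed for that case.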
Rewrite rule example, applied to:

SELECT sum(revenue) AS sum, shop FROM sales GROUP BY shop

sales
shop    month  revenue
Migros  Jan    100
Migros  Feb    10
Migros  Mar    10
Coop    Jan    25
Coop    Feb    25

result
sum  shop
120  Migros
50   Coop
Rewritten query:

SELECT sum, shop, pShop, pMonth, pRevenue
FROM (SELECT sum(revenue) AS sum, shop FROM sales GROUP BY shop) AS agg
LEFT OUTER JOIN
     (SELECT shop AS shop2, shop AS pShop, month AS pMonth, revenue AS pRevenue
      FROM sales) AS prov
ON (shop = shop2)

Here the rewritten base relation sales+ simply copies each attribute of sales into a provenance attribute (pShop, pMonth, pRevenue).

sum  shop    pShop   pMonth  pRevenue
120  Migros  Migros  Jan     100
120  Migros  Migros  Feb     10
120  Migros  Migros  Mar     10
50   Coop    Coop    Jan     25
50   Coop    Coop    Feb     25
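Since the provenance is an ordinary relation, it can be queried with plain SQL. As a sketch (SQLite standing in for PostgreSQL), the rewritten query is nested in a query that re-aggregates pRevenue per result tuple; the recomputed value should equal the original sum, and the tuple count shows how many input tuples contributed. The aliases recomputed and n_inputs are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (shop TEXT, month TEXT, revenue INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [("Migros", "Jan", 100), ("Migros", "Feb", 10), ("Migros", "Mar", 10),
                  ("Coop", "Jan", 25), ("Coop", "Feb", 25)])

# The rewritten query with sales+ spelled out as attribute copies.
q_plus = """
SELECT sum, shop, pShop, pMonth, pRevenue
FROM (SELECT sum(revenue) AS sum, shop FROM sales GROUP BY shop) AS agg
LEFT OUTER JOIN
     (SELECT shop AS shop2, shop AS pShop, month AS pMonth, revenue AS pRevenue
      FROM sales) AS prov
ON (shop = shop2)
"""

# Query the provenance: re-aggregate pRevenue for each original result tuple.
check = conn.execute(f"""
SELECT shop, sum, sum(pRevenue) AS recomputed, count(*) AS n_inputs
FROM ({q_plus}) AS q
GROUP BY shop, sum
ORDER BY shop
""").fetchall()
print(check)
```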
4. Perm Implementation
Perm is implemented as an extension inside the PostgreSQL DBMS, so client applications are not affected. It adds an extended SQL language and the Perm module, which implements the algebraic rewrite rules as query rewrites.
SQL-PLE, the SQL extension: SELECT PROVENANCE ...
Benefits: provenance queries compose with standard SQL, e.g.:
CREATE VIEW x AS SELECT PROVENANCE ...
SELECT PROVENANCE ... INTO x ...
SELECT ... FROM (SELECT PROVENANCE ...
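The view and INTO forms correspond to lazy and eager provenance computation. SQLite has no SELECT PROVENANCE, so in this sketch a hand-written q+ (hypothetical) stands in for what Perm would produce, and SQLite's CREATE TABLE ... AS replaces SELECT ... INTO, which SQLite does not support. The view recomputes provenance on every access; the table is a snapshot taken when it was created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, price INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, 100), (2, 10), (3, 25)])

# Stand-in for a generated provenance query (written by hand here).
q_plus = "SELECT id, price, id AS p_id, price AS p_price FROM items WHERE price > 20"

# Lazy: a view; provenance is computed when the view is queried.
conn.execute(f"CREATE VIEW prov_lazy AS {q_plus}")
# Eager: materialize the provenance now (SQLite spelling of SELECT ... INTO).
conn.execute(f"CREATE TABLE prov_eager AS {q_plus}")

# Later update to the base relation.
conn.execute("INSERT INTO items VALUES (4, 50)")

lazy = conn.execute("SELECT p_id FROM prov_lazy ORDER BY p_id").fetchall()
eager = conn.execute("SELECT p_id FROM prov_eager ORDER BY p_id").fetchall()
print(lazy)   # reflects the new item
print(eager)  # snapshot taken before the insert
```

The trade-off is the usual one between recomputation cost on each access (lazy) and storage plus staleness (eager).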
Perm architecture (figure): a SELECT PROVENANCE ... statement passes through the Parser & Analyser, the Rewriter, the Perm Module, the Planner, and the Executor. The figure shows the intermediate query trees Q = ..., Q' = ..., and Q'+ = ..., and a plan fragment MergeJoin (...).
5. Experimental Results
[Figure: TPC-H benchmark results]
6. Conclusion
Benefits: Perm computes provenance for SQL queries, provides the full power of SQL for querying provenance data, supports lazy or eager computation, reuses existing database technology, and supports external provenance.
Future work: physical operators for more efficient provenance computation, storage compression, inclusion of transformation provenance, support for different contribution semantics, and support for various granularities.
Questions