Materialized View Selection for XQuery Workloads
description
Transcript of Materialized View Selection for XQuery Workloads
Materialized View Selection for XQuery Workloads
Asterios Katsifodimos1, Ioana Manolescu1 & Vasilis Vassalos2
1Inria Saclay & Université Paris-Sud, 2Athens University of Economics and Business
Athens University of
Economics and Business
Materialized View Selection for XQuery Workloads 2
View selection in XML databasesProblem definition
Find a set of materialized views that minimizes workload evaluation costs not exceeding a space budget.
Materialized View Selection for XQuery Workloads 3
Materialized View Selection for XQuery Workloads
View selection for multiple-views XQuery rewriting
Rich subset of XQuery Tree patterns with multiple return nodes and value joins
We provide Candidate view pruning methods View selection algorithms:
Utility-Based Greedy (UDG) Reduce-Optimize Algorithm (ROA)
Extensive experimental evaluation Outperforming & extending state-of-the-art works
Contributions
Materialized View Selection for XQuery Workloads
Outline
The View Selection Problem
View Language & Candidate Views
View Selection Algorithms
Related Work & Experimentation
- 4
Materialized View Selection for XQuery Workloads 5
Query and view languageAnatomy of a query
cont=subtree of the text element Value-join
Return the ID of every book along with its text and author if the book author has a paper in the SIGMOD conference.
ID
ID of book
book
textcont authorval
paper
author conference[=“SIGMOD”]
Materialized View Selection for XQuery Workloads 6
Candidate Views
JOIN[v1.authorID>v2.bookID]
SCAN(v1) SCAN(v2)
PROJECT[textcont, authorval]
Rewriting
v1
authorID,val
v2
bookID
textID,cont
Candidate Views
Example:
Query
book
authorvaltextcont
Candidate views: views that can participate in a rewriting of a query. Property: candidate views are exactly those embeddable in a query.
Materialized View Selection for XQuery Workloads 7
Candidate Views
Number of candidate views For query of m value joins and k tree patterns:
Early pruning is needed
Rules of thumb for pruning Drop all views that can be replaced by others Views should not store anything extraneous
Challenge: remove maximum number of views Preserve low cost and/or small size rewriting possibilities.
Materialized View Selection for XQuery Workloads
Candidate Views
8
Pruning techniques
book
authorvaltextcont
v2
authorID, cont
v2‘
authorID,val
v3
bookID
textID,cont
v1
book
authorID
Query Candidate Views
② Do not store unnecessary data i.e. useless cont, val or //-axis Avoid expensive rewritings Save space
① Annotate all nodes with ID Maximize rewriting
opportunities
v1‘
bookID
authorID v3‘
bookID
textID,cont
Materialized View Selection for XQuery Workloads
Outline
The View Selection Problem
View Language & Candidate Views
View Selection Algorithms
Related Work & Experimentation
- 9
Materialized View Selection for XQuery Workloads 10
Materializing a set of views
Benefit of materializing a set of views
benefit (V, Q)=(cost of evaluating Q over D) – (cost of evaluating Q over V)
Computation of benefit requires invoking rewriting algorithm Expensive!
Space occupancy of a view set V Total size (in bytes)
View set benefit
Materialized View Selection for XQuery Workloads 11
View Selection Algorithms
High similarity with the classic 0-1 knapsack problem
Typical element of the greedy algorithms for knapsack:
utility(v,Q)=benefit({v} U V, Q)/size(v)
Knapsack-inspired view selection
Knapsack View Selection
Weight View Size
Profit Benefit (evaluation cost savings)
Materialized View Selection for XQuery Workloads 12
S=12
View Selection AlgorithmsUtility-Driven Greedy (UDG) Algorithm
U=Utility(=benefit/size)S=Space occupancy
Space BudgetCandidate Views
U=10S=7
U=60S=5
U=50S=4
U=8S=2
1. Enumerate candidate views
2. Compute view utilities
3. Order views by utility
4. Select the view of largest utility fitting in budget
5. Repeat 2-4 until budget exhausted
Materialized View Selection for XQuery Workloads 13
S=12
View Selection Algorithms
1. Enumerate candidates
2. Compute utilities
3. Order by utility
4. Select the view of largest utility fitting in budget
5. Repeat 2-4 until budget exhausted
Utility-Driven Greedy (UDG) Algorithm
U=Utility(=benefit/size)S=Space occupancy
Space BudgetCandidate Views
U=12S=7
U=40S=5
U=64S=4
U=9S=2
Materialized View Selection for XQuery Workloads 14
S=12
View Selection AlgorithmsUtility-Driven Greedy (UDG) Algorithm
U=Utility(=benefit/size)S=Space occupancy
Space BudgetCandidate Views
U=13S=7
U=10S=5
U=64S=4
U=4S=2
1. Enumerate candidates
2. Compute utilities
3. Order by utility
4. Select the view of largest utility fitting in budget
5. Repeat 2-4 until budget exhausted
Greedy algorithms for knapsack not a perfect fit for our problem
Utility of a view may change after every round depends on other views already selected
Materialized View Selection for XQuery Workloads 15
View Selection AlgorithmsState space search (state=candidate view set)
S1
S3
S4 S5
S6 S7 S8
S9
S10
S11
S12 S13
S14
S15 S16
Initial state:
Best state:
query workload
largest benefit under space budget
transform(S1)S8
Materialized View Selection for XQuery Workloads 16
View Selection Algorithms
View Break: break a view in smaller parts
Reveals common sub-expressions of views
Can reduce or increase space occupancy
Increases query evaluation costs
State Transformations: Break, Join, Generalize, Adapt
book
textcont authorval
paper
author conference[=“SIGMOD”]
Materialized View Selection for XQuery Workloads 17
View Selection Algorithms
Join: opposite to Break, join two views into one
Reduces evaluation costs
Joined views can be smaller in size
State Transformations: Break, Join, Generalize, Adapt
book
textcont authorval
paper
author conference[=“SIGMOD”]
ID
val ,ID
Materialized View Selection for XQuery Workloads 18
View Selection Algorithms
Generalize: generalization/relaxation of a view
Reveals common sub-expressions of views
Can reduce or increase space occupancy
Increases query evaluation costs
State Transformations: Break, Join, Generalize, Adapt
book
textcont authorval
paper
author conferenceval[=“SIGMOD”]val
cont
Materialized View Selection for XQuery Workloads 19
View Selection Algorithms
Adapt: specialization of views by1. Conversion of //-axis to /-axis 2. Addition of existential nodes
Reduces evaluation costs
“Adapted” views can be smaller in size
State Transformations: Break, Join, Generalize, Adapt
book
text author
paper
author conferenceval[=“SIGMOD”]
cont
Break, Join, Generalize, Adapt Allow to generate all states Guaranteed not to generate pruned views
Materialized View Selection for XQuery Workloads 20
View Selection Algorithms
Huge number of states
Call rewriting algorithm after every state transition
Need for heuristics
Proposal: heuristic three-phase algorithm ROA
The Reduce-Optimize algorithm (ROA)
OptimizeJump
Reduce
Materialized View Selection for XQuery Workloads 21
View Selection AlgorithmsThe Reduce-Optimize algorithm (ROA)
Space Budget
Time
Time
Space
Occupa
ncy
Benefi
t
Reduce Optimize Jump Reduce Optimize Reduce ...
SolutionBest State Revisited StateIntermediary State
Materialized View Selection for XQuery Workloads 22
View Selection Algorithms
1. Some transitions may apply several transformations at once
2. Stop the rewriting algorithm early After k rewritings found or At a timeout
3. Consider only the lowest cost rewritings
Reducing ROA search time - heuristics
Materialized View Selection for XQuery Workloads
Outline
The View Selection Problem
View Language & Candidate Views
View Selection Algorithms
Related Work & Experimentation
- 23
Materialized View Selection for XQuery Workloads 24
Related Work
Algorithm Rewriting power
[Mandhani, Suciu VLDB05] 1-view rewritings
[Tang et. al. DASFAA09] 1-view rewritings
Utility-Driven Greedy Multiple view rewritings
Reduce-Optimize Multiple view rewritings
Materialized View Selection for XQuery Workloads 25
Experimental Evaluation
Queries Workloads
Tree patterns: Q1(14), Q2 (50), Q3(100) Tree patterns + joins: Q4 (50), 20% joins
Query Selectivity ⅓ low, ⅓ medium, ⅓ high
Database: 1GB XMark (10x100MB documents)
Settings
Space budget S=size(Q) Tested space budgets:
S, S/2, S/4, S/6
Algorithms UDG and ROA Competitors:
[Mandhani & Suciu VLDB05] [Tang et al. DASFAA09]
Implementation ViP2P*, Java
*http://vip2p.saclay.inria.fr
Materialized View Selection for XQuery Workloads 26
Experimental EvaluationWorkload Evaluation Time of Q1 (14 queries)
Reduce-Optimize (ROA)
Space/Time Greedy [Tang et al. DASFAA09]
Set-Cover Greedy [Mandhani & Suciu VLDB05]
Utility-Driven Greedy (UDG)
Space Optimal [Tang et al. DASFAA09]
Hit
Rat
io
Eva
luat
ion
time
vers
us d
ocs
Materialized View Selection for XQuery Workloads 27
Experimental EvaluationEvaluation Time & hit ratio for Q3 (100 queries)
Reduce-Optimize (ROA)
Set-Cover Greedy [Mandhani & Suciu VLDB05]
Hit
Rat
io
Eva
luat
ion
time
vers
us d
ocs
Materialized View Selection for XQuery Workloads 28
Experimental EvaluationROA evaluation for Q4 (50 queries, 20% value-joined)
% of evaluation time vs. documents
Hit Ratio
Materialized View Selection for XQuery Workloads 29
Conclusions
Automatic selection of XQuery views for multiple-views rewritings
Reduction of candidate views By orders of magnitude
ROA performs better than related work Scales and manages to find good solutions relatively fast
80% of the benefits attained in ~2 minutes Maximum benefit attained within 25 minutes.
Algorithms of [Tang et. al. DASFAA09] did not scale beyond 14 queries
Utility Drive Greedy (UDG) did not scale beyond 50 queries
Thank you
- 30
Questions?
?
Materialized View Selection for XQuery Workloads
BACKUP
- 31
Materialized View Selection for XQuery Workloads 32
Cost of algebraic plans
Algebraic Plan cost Execution cost of an operator has
A CPU execution cost and An IO cost Both depend on input
Evaluation cost of a plan: Calculated bottom-up
Estimating the evaluation cost of a rewriting
Data Statistics DataGuide of every document
Enriched with information: # of instances of a path Average path val size (bytes) Average path cont size (bytes) Distinct values of each path
Used to estimate Cardinality & size of a view
Materialized View Selection for XQuery Workloads 33
Cost of algebraic plans
View Size Cardinality
v1 500KB 50v2 100KB 10
Cost estimation example
JOIN[v1.author=v2.author]
SCAN(v1)
SCAN(v2)
SELECT[conference=“SIGMOD”]
PROJECT[textcont, authorval]
IO=100 | CPU=10
IO=100 | CPU=10+10IO= 500 | CPU=50
IO=500+100 | CPU=70+50*5
IO=600 | CPU=320+25
OUTPUT=50
OUTPUT=10
OUTPUT=5
OUTPUT=25 (50*5*0.1)
OUTPUT=25
Materialized View Selection for XQuery Workloads 34
Experimental EvaluationROA time to attain increasing benefits (minutes)
Materialized View Selection for XQuery Workloads 35
Experimental EvaluationCandidate views pruning
CS0max Maximum estimated number of candidate views
CS0min Minimum estimated number of candidate views
CS1 Pruned candidate view set
CS2 Pruned candidate view set – only linear path candidates
Materialized View Selection for XQuery Workloads 36
Candidate Views
The cardinality of the set of candidate views of a tree pattern query q of |q| nodes is bounded by:
Size of the set of candidate views for a tree pattern
Combinations of nodes of q: ({a},{b},{c},{a,b},{a,c},{a,b,c})
Edge combinations: how to connect nodes with (/, //) e.g. /a/b, //a/b, /a//b, //a//b}.
There are 12 return node variations for each node in a pattern e.g. (aID,cont,aval,aID,val…)
Example: q=/a/bval/c
Materialized View Selection for XQuery Workloads 37
Candidate Views
Given a joined pattern q with: k tree patterns and m value-joins
The candidate view set size of q is bounded by:
Size of the set of candidate views for a joined pattern
Value join combinationsNumber of views resulting from all possible cartesian products of k tree patterns
Materialized View Selection for XQuery Workloads 38
View Selection Algorithms
The benefit of materializing a view set V is The difference in cost of evaluating the workload over V
vs. evaluating from the documents
Benefit of materializing a set of views
Cost of evaluating query q given the set
of materialized views V
Cost of evaluating query q from the
documents
Frequency of query q
Materialized View Selection for XQuery Workloads 39
Tree Pattern queryof |q| nodes
Joined Pattern query with m value joins & k tree patterns