Materialized View Selection and Maintenance using Multi-Query Optimization
Hoshi Mistry, Prasan Roy, S. Sudarshan, Krithi Ramamritham
Materialized Views
Complex results materialized in order to speed up queries that depend on these results
Increasingly being supported by commercial database systems (e.g. Oracle8i)
Crucial in data warehousing environments
Materialized View Maintenance
As underlying data changes, the materialized views need to be refreshed
Efficient view maintenance crucial!
– Need for up-to-date query responses is growing
– Amount of data added to data warehouses is increasing
– Maintenance time window is shrinking
Focus
Efficient techniques for maintenance of a set of materialized views (MVs) by
– Transient materialization of common subexpressions (CSEs)
– Selection of additional MVs
– Computation of the best maintenance policy and plan for each MV
Transient Materialization of Common Subexpressions
CSEs materialized to reduce maintenance cost by sharing computation, disposed after use
Motivated by Blakeley et al. [SIGMOD86], Ross et al. [SIGMOD96]
– Huge search space; considered impractical
Earlier work by Sellis [TODS88]
Efficient heuristic algorithms proposed by Roy et al. [SIGMOD00]
Selection of Additional MVs
Additional views materialized permanently to reduce the overall maintenance cost
Motivated by Ross et al. [SIGMOD96]
– restricted to incremental maintenance only
– do not consider transient materialization
MV selection in general addressed in Roussopoulos [TODS82], Agrawal et al. [VLDB00]
Best Maintenance Policy and Plan Computation
For each MV,
– Determine the best maintenance policy (incremental or recomputation)
– Find the corresponding best plan
Earlier work by Vista [EDBT98]
– Does not take into account transient materialization of CSEs or presence of other MVs
Current systems need manual specification of the maintenance policy
Contribution
A framework that consolidates the choice of
– CSEs to be transiently materialized
– Additional MVs
– Best maintenance plan (incremental/recomputation)
Integrated with a state-of-the-art query optimizer (Volcano [ICDE93])
Example
[Figure: example maintenance plan. Node labels: dA, A, B, C, D, dE; BC, DE; ABC, CDE, BCDE. Operator labels: merge. Each view is annotated with its policy (incremental refresh or recomputation) and its materialization (permanent or transient); the initial set of views is marked.]
Approach
Setting up the search space of maintenance plans
Best maintenance plan computation
Transient/Permanent materialized view selection
Setting Up the Maintenance Plan Space
The Query DAG representation for recomputation plans
Incorporating incremental plans
Representation of the Recomputation Plan Space
AND/OR Query DAG
[Figure: AND/OR Query DAG built from equivalence classes (OR nodes) and operations (AND nodes). Node labels: A, B, C, D; AB, BC, CD; ABC, BCD. The best plan is highlighted.]
Additionally incorporates subsumption derivations
Details in Roy et al. [SIGMOD00]
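The AND/OR structure can be illustrated with a minimal Python sketch. It is not the Volcano implementation: the class names `OpNode` and `EquivNode`, the function `best_plan`, and all cost figures are hypothetical and chosen only for illustration. Each equivalence class picks the cheapest of its alternative operations, and each operation pays for all of its inputs. The sketch costs every node independently, so it does not model the sharing of subexpressions discussed in these slides.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class OpNode:
    """AND node: one operation; every input equivalence class is required."""
    name: str
    local_cost: float
    inputs: list = field(default_factory=list)


@dataclass(eq=False)
class EquivNode:
    """OR node: an equivalence class; any one of its operations computes it."""
    name: str
    ops: list = field(default_factory=list)
    scan_cost: float = 0.0  # used only when the node is a base relation


def best_plan(node, memo=None):
    """Return (cost, chosen OpNode or None) for the cheapest plan of `node`."""
    memo = {} if memo is None else memo
    if id(node) not in memo:
        if not node.ops:  # base relation: nothing to choose
            memo[id(node)] = (node.scan_cost, None)
        else:
            memo[id(node)] = min(
                ((op.local_cost + sum(best_plan(i, memo)[0] for i in op.inputs), op)
                 for op in node.ops),
                key=lambda pair: pair[0],
            )
    return memo[id(node)]


# Toy DAG with invented costs: ABC is either (AB) join C or A join (BC).
A, B, C = (EquivNode(n, scan_cost=10) for n in "ABC")
AB = EquivNode("AB", [OpNode("A join B", 50, [A, B])])
BC = EquivNode("BC", [OpNode("B join C", 20, [B, C])])
ABC = EquivNode("ABC", [OpNode("(AB) join C", 30, [AB, C]),
                        OpNode("A join (BC)", 40, [A, BC])])

cost, op = best_plan(ABC)
print(cost, op.name)  # 90 A join (BC)
```

With these invented numbers the plan through BC costs 40 + 10 + 40 = 90, against 30 + 70 + 10 = 110 through AB, so the OR node for ABC selects "A join (BC)".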
Incremental Plans: Propagation Based Differential Generation
Differentials propagated one at a time
For each differential dR
– Start at dR and compute node differentials bottom-up along the "best plan" in a topological order
– Differential of a node computed as a function of its inputs and their differentials
• e.g. $d(E_1 \bowtie E_2) = (E_1 \bowtie dE_2) \cup (E_2 \bowtie dE_1) \cup (dE_1 \bowtie dE_2)$, where $dE_i$ = differential of $E_i$ wrt dR
– Refresh the relation R and the affected MVs wrt dR by merging with the differentials computed as above
Ross et al. [SIGMOD96]
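The join rule above can be checked on a small example. The following Python sketch makes several assumptions that are not in the slides: relations are sets of tuples, differentials contain insertions only, and the join is on a single shared attribute. The function names `join` and `join_differential` are hypothetical.

```python
def join(r, s):
    """Join r(a, b) with s(b, c) on the shared attribute b (set semantics)."""
    return {(a, b, c) for (a, b) in r for (b2, c) in s if b == b2}


def join_differential(e1, de1, e2, de2):
    """Differential of e1 join e2, given insert-only differentials de1 and de2."""
    return join(e1, de2) | join(de1, e2) | join(de1, de2)


# Toy data (invented): old contents and the tuples being inserted.
e1, de1 = {(1, "x"), (2, "y")}, {(3, "x")}
e2, de2 = {("x", 10), ("y", 20)}, {("y", 30)}

old_view = join(e1, e2)
delta = join_differential(e1, de1, e2, de2)

# Merging the differential into the old view gives the same result as
# recomputing the join over the updated inputs.
refreshed = old_view | delta
recomputed = join(e1 | de1, e2 | de2)
print(refreshed == recomputed)
```

The incremental path touches only the differentials and the rows they join with, whereas recomputation rescans both full inputs; which is cheaper depends on the sizes involved, which is why the policy is chosen by cost.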
Incorporating Incremental Plans: Propagation Based Differential Generation
[Figures: four AND/OR DAGs of equivalence classes (OR nodes) and operations (AND nodes), one per propagated differential, each with the best plan highlighted. Node labels as extracted:]
– Propagation of dA: dA, B, C; BdA, BC; BCdA
– Propagation of dB: A, dB, C, D; AdB, CD, CdB; ACdB, CDdB
– Propagation of dC: A, B, dC, D; AB, BdC, DdC; ABdC, BDdC
– Propagation of dD: B, C, dD; BC, CdD; BCdD
Incorporating Incremental Plans
Logical representation
[Figure: node labels AB, A, B, dA, dB, BdA, AdB. Annotations: recomputation plan, incremental plan, Merge operator.]
For each equiv node and each base differential affecting it
– Introduce a new equiv node representing its differential
– Populate with the differential plans
Maintain statistics for the full expression after successive merges
Large space overhead!
Incorporating Incremental Plans
Actual space-efficient representation
[Figure: node labels AB, A, B, dA, dB, BdA, AdB.]
Reuse the same structure for successive propagation cycles
– separate best plan pointers for each cycle
– separate statistics for the full expression after successive merges
Also incorporates sort-orders, indices, etc.
Roy et al. [SIGMOD00]
Approach
Setting up the search space of maintenance plans
Best maintenance plan computation
Transient/Permanent materialized view selection
Maintenance Plan Computation
Given
– Set of nodes Mt materialized transiently
• can include full results as well as differentials
– Set of nodes Mp materialized permanently
• includes full results but not differentials
compute the best consolidated maintenance plan for Mp
Maintenance Plan Computation
Best plan computed using a query optimizer extended as follows:
– Plan accessing a materialized view (trans/perm) does not include its computation, only its use
– Cost of a maintenance plan
$\mathrm{totalcost}(M_p, M_t) = \sum_{e \in M_p} \mathrm{maintcost}(e \mid M_p, M_t) + \sum_{e \in M_t} \mathrm{trmatcost}(e \mid M_p, M_t)$
where
$\mathrm{maintcost}(e \mid M_p, M_t)$: cost of cheapest maintenance plan for e (recomputation/incremental)
$\mathrm{trmatcost}(e \mid M_p, M_t)$: cost of computing and materializing e
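As a worked instance of the total cost formula, the Python sketch below uses an invented toy cost model in place of a real optimizer. The cost tables, the fixed saving of 40 per view, and the function names are all hypothetical; in the framework these costs come from the extended optimizer, not from lookup tables.

```python
# Invented numbers, for illustration only.
BASE_MAINT = {"ABC": 100, "BCD": 120}  # maintenance cost of each view without sharing
TRMAT = {"BC": 50}                     # cost to compute and transiently store BC
SAVING_WITH_BC = 40                    # assumed saving per view when BC is available


def maintcost(e, mp, mt):
    """Toy stand-in for the cheapest maintenance plan cost of e given Mp and Mt."""
    saving = SAVING_WITH_BC if "BC" in mt else 0
    return BASE_MAINT[e] - saving


def trmatcost(e, mp, mt):
    """Toy stand-in for the cost of computing and materializing e."""
    return TRMAT[e]


def total_cost(mp, mt):
    """Sum of maintenance costs over Mp plus transient materialization costs over Mt."""
    return (sum(maintcost(e, mp, mt) for e in mp)
            + sum(trmatcost(e, mp, mt) for e in mt))


views = {"ABC", "BCD"}
without_sharing = total_cost(views, set())
with_sharing = total_cost(views, {"BC"})
print(without_sharing, with_sharing)  # 220 190
```

In this toy setting, materializing BC transiently costs 50 but saves 40 on each of the two views, so the total drops from 220 to 190.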
Approach
Setting up the search space of maintenance plans
Best maintenance plan computation
Transient/Permanent materialized view selection
Transient/Permanent Materialized View Selection
Given set of MVs M already materialized, determine
– Set of nodes Mt to materialize transiently
– Set of nodes Mp (⊇ M) to materialize permanently
such that totalcost(Mp, Mt) is minimized
Exhaustive approach too expensive. Need heuristics!
Transient/Permanent Materialized View Selection
A Greedy Heuristic
Input: Initial MVs M
Output: Mp (⊇ M), Mt, corresp. best plan
Begin
    Mp = M; Mt = {}
    S = set of equivalence nodes in the DAG for M
    While ( S ≠ {} )
        Pick z ∈ S which maximizes Benefit(z | Mp, Mt)
        If ( Benefit(z | Mp, Mt) ≤ 0 )
            break
        If ( z is a full result and maintcost(z | Mp, Mt) < trmatcost(z | Mp, Mt) )
            Mp = Mp ∪ {z}
        else
            Mt = Mt ∪ {z}
        S = S – {z}
    Return (Mp, Mt)
End
How to compute Benefit(z | Mp, Mt)?
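The loop above can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not the authors' implementation: the benefit and cost functions are passed in as parameters, and the toy tables (including a benefit that does not change as nodes are selected) are invented simplifications. Naming differentials with a leading "d" is also an assumption made for the example.

```python
def greedy_select(initial_mvs, candidates, benefit, maintcost, trmatcost, is_full_result):
    """Greedy choice of permanently (mp) and transiently (mt) materialized nodes."""
    mp, mt = set(initial_mvs), set()
    s = set(candidates)
    while s:
        # sorted() only makes tie-breaking deterministic in this sketch
        z = max(sorted(s), key=lambda n: benefit(n, mp, mt))
        if benefit(z, mp, mt) <= 0:
            break  # no remaining candidate pays for itself
        if is_full_result(z) and maintcost(z, mp, mt) < trmatcost(z, mp, mt):
            mp.add(z)  # cheaper to keep and maintain than to rebuild each time
        else:
            mt.add(z)  # compute during maintenance, discard afterwards
        s.remove(z)
    return mp, mt


# Toy inputs (invented): a static benefit table instead of an optimizer call.
BENEFIT = {"BC": 30, "dBC": 10, "CD": -5}
MAINT = {"BC": 20}
TRMAT = {"BC": 50, "dBC": 5}

mp, mt = greedy_select(
    initial_mvs={"ABC"},
    candidates=BENEFIT,
    benefit=lambda z, mp, mt: BENEFIT[z],
    maintcost=lambda z, mp, mt: MAINT[z],
    trmatcost=lambda z, mp, mt: TRMAT[z],
    is_full_result=lambda z: not z.startswith("d"),
)
print(sorted(mp), sorted(mt))  # ['ABC', 'BC'] ['dBC']
```

Here BC is a full result that is cheaper to maintain (20) than to rebuild (50), so it joins Mp; the differential dBC can only be transient; CD has negative benefit, so the loop stops.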
Transient/Permanent Materialized View Selection
Benefit Computation
$\mathrm{Benefit}(z \mid M_p, M_t) = \mathrm{gain}(z \mid M_p, M_t) - \mathrm{investment}(z \mid M_p, M_t)$
where
$\mathrm{gain}(z \mid M_p, M_t) = \sum_{e \in M_p} \bigl(\mathrm{maintcost}(e \mid M_p, M_t) - \mathrm{maintcost}(e \mid M_p, M_t \cup \{z\})\bigr) + \sum_{e \in M_t} \bigl(\mathrm{trmatcost}(e \mid M_p, M_t) - \mathrm{trmatcost}(e \mid M_p, M_t \cup \{z\})\bigr)$
and
$\mathrm{investment}(z \mid M_p, M_t) = \min\bigl(\mathrm{maintcost}(z \mid M_p, M_t),\ \mathrm{trmatcost}(z \mid M_p, M_t)\bigr)$ if z is a full result
$\mathrm{investment}(z \mid M_p, M_t) = \mathrm{trmatcost}(z \mid M_p, M_t)$ if z is a differential
Benefit computation expensive. Need efficient techniques!
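The benefit definition can be traced with a small Python sketch. As before, the cost model is an invented toy (fixed tables and an assumed saving of 40 per view when BC is available), and the function names are hypothetical; a real system would obtain `maintcost` and `trmatcost` from the optimizer, which is what makes each benefit evaluation expensive.

```python
# Invented numbers, for illustration only.
BASE_MAINT = {"ABC": 100, "BCD": 120, "BC": 70}
TRMAT = {"BC": 50}


def maintcost(e, mp, mt):
    """Toy maintenance cost: views other than BC get 40 cheaper once BC is in mt."""
    saving = 40 if ("BC" in mt and e != "BC") else 0
    return BASE_MAINT[e] - saving


def trmatcost(e, mp, mt):
    """Toy cost of computing and materializing e."""
    return TRMAT[e]


def gain(z, mp, mt):
    """Cost reduction on the already chosen nodes when z becomes available."""
    with_z = mt | {z}
    return (sum(maintcost(e, mp, mt) - maintcost(e, mp, with_z) for e in mp)
            + sum(trmatcost(e, mp, mt) - trmatcost(e, mp, with_z) for e in mt))


def investment(z, mp, mt, is_full_result):
    """Cost of making z available: the cheaper option for full results."""
    if is_full_result(z):
        return min(maintcost(z, mp, mt), trmatcost(z, mp, mt))
    return trmatcost(z, mp, mt)


def benefit(z, mp, mt, is_full_result):
    return gain(z, mp, mt) - investment(z, mp, mt, is_full_result)


views = {"ABC", "BCD"}
b = benefit("BC", views, set(), is_full_result=lambda z: not z.startswith("d"))
print(b)  # 30
```

With these numbers the gain is (100 - 60) + (120 - 80) = 80 and the investment is min(70, 50) = 50, giving a benefit of 30.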
Transient/Permanent Materialized View Selection
Improving Efficiency of the Greedy Heuristic
Cost-propagation based incremental techniques to efficiently compute Benefit
Monotonicity assumption
– Reduces the number of Benefit computations
Techniques to determine if a node can be shared across a given maintenance plan
– Reduces the number of nodes considered for transient materialization
Adapted from Roy et al. [SIGMOD00]. See paper for details.
Benchmark
Set of Views
– 10 views (5 with aggregates, 5 without) on 8 distinct relations, refreshed together
Single Views
– Same views as above, refreshed separately
Effect of Transient and Permanent Materialization
[Charts: Single Views; Set of Views]
Effect of Adaptive Maintenance Policy Selection
[Charts: Single Views; Set of Views]
Scalability Analysis
[Charts: Optimization Memory Requirements; Optimization Time]
Negligible one-time costs
Conclusion
Presented techniques
– Automate sharing of computation
– Automate view selection
– Automate maintenance policy selection and plan computation
– Do the above in an integrated manner
• leading to benefits greater than could be achieved by considering each dimension individually
– Are efficient and scalable
• the overall benefits greatly outweigh the one-time cost
– Integrate with state-of-the-art optimizers (e.g. MS SQL-Server)
Future Work
Extend presented techniques
– To handle limited space
– To speed up a workload of queries in addition to maintenance of a set of materialized views
– To work in dynamic query result caching environments
Questions