Multi-Query Optimization and Applications Prasan Roy Indian Institute of Technology - Bombay.



May 2000 Multi-Query Optimization and Applications 2

Motivation

Queries often involve repeated computation
  – Queries on overlapping views, stored procedures, nested queries, etc.
  – Update expressions for a set of overlapping materialized views
  – Automatically generated queries
      • XML-QL complex path expressions → SQL query batches

Our focus: faster query processing by avoiding repeated computation

Outline

Multi-query optimization
Application to related problems
  – Query result caching
  – Materialized view selection and maintenance
Conclusions and future work

Multi-Query Optimization

Prasan Roy, S. Seshadri, S. Sudarshan and Siddhesh Bhobe, Efficient and Extensible Algorithms for Multi-Query Optimization, ACM SIGMOD 2000

Motivating Example

[Figure: the best plan for A JOIN B JOIN C and the best plan for B JOIN C JOIN D, each chosen independently and annotated with per-operation costs. Foreign key dependency: A → B → C → D. Total cost = 460.]

Motivating Example

[Figure: alternative plans for the same two queries that share the common subexpression B JOIN C (node BC), annotated with per-operation costs. Foreign key dependency: A → B → C → D. Total cost = 370; benefit = 90.]

Problem Statement

Find the cheapest plan exploiting transiently materialized common subexpressions (CSEs)
  – Assumption: no shared pipelines

[Figure: plan over A, B, C and D with the common subexpression highlighted.]

Problems

Locally optimal subplans may not be globally optimal
Mutually exclusive alternatives
  – Queries: (A JOIN B JOIN C), (B JOIN C JOIN D), (C JOIN D JOIN E)
  – What to share: (B JOIN C) or (C JOIN D)?
Materializing and sharing a CSE is not necessarily cheaper

Example

[Figure: the best plan for A JOIN B JOIN C and the best plan for B JOIN C JOIN D, each chosen independently, with a different set of per-operation costs. Foreign key dependency: A → B → C → D. Total cost = 154.]

Example

[Figure: plans for the same two queries sharing the common subexpression B JOIN C (node BC). Foreign key dependency: A → B → C → D. Total cost = 172; benefit = -18.]
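The benefit figures in the two examples are simply the difference between the total cost without sharing and the total cost with the shared node materialized. A minimal sketch of that arithmetic, using only the totals quoted on the slides (the function name is an assumption made for illustration):

```python
def benefit(cost_without_sharing: int, cost_with_sharing: int) -> int:
    """Cost saved by sharing; a negative value means sharing makes things worse."""
    return cost_without_sharing - cost_with_sharing


# Totals quoted in the motivating example: 460 without sharing, 370 with BC shared.
first_example = benefit(460, 370)

# Totals quoted in the second example: 154 without sharing, 172 with BC shared.
second_example = benefit(154, 172)

print(first_example, second_example)
```

The second case is the one that motivates a cost-based decision: sharing B JOIN C there costs more than recomputing it.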

Approach

1. Set up the search space of execution plans

2. Explore the search space to find the best execution plan

Representation of Plan Space

AND/OR Query DAG
  – Equivalence class (OR node)
  – Operation (AND node)

[Figure: AND/OR query DAG with equivalence nodes ABC, BCD, AB, BC, CD, A, B, C and D; an example plan (solution graph) is highlighted.]

DAG Generation Modifications: Unification

Volcano: duplicate subexpressions, hence no CSEs!

[Figure: separate expression DAGs for ABC (over AB, BC, A, B, C) and BCD (over BC, CD, B, C, D), each containing its own copy of BC.]

Modification: duplicate subexpressions unified
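One way to picture unification is as interning: every subexpression is mapped to a canonical key, and a node is created only the first time that key is seen. The sketch below is a toy illustration under loud assumptions, not the optimizer's actual data structure: joins are identified purely by the set of relations they cover, every subset of a query's relations is registered (no cross-product pruning), and the class and method names are hypothetical.

```python
from itertools import combinations


class UnifiedDag:
    """Toy DAG keeping exactly one equivalence node per set of joined relations."""

    def __init__(self):
        self.node_ids = {}  # canonical key (frozenset of relations) -> node id
        self.used_by = {}   # node id -> names of the queries that contain it

    def intern(self, relations):
        # Joins are commutative and associative, so a frozenset is a canonical key.
        key = frozenset(relations)
        return self.node_ids.setdefault(key, len(self.node_ids))

    def add_query(self, name, relations):
        rels = sorted(relations)
        # Register every sub-join with at least two relations (assumption: no pruning).
        for size in range(2, len(rels) + 1):
            for subset in combinations(rels, size):
                node = self.intern(subset)
                self.used_by.setdefault(node, set()).add(name)

    def shared_keys(self):
        # A node reached from more than one query is a common subexpression.
        by_id = {node: key for key, node in self.node_ids.items()}
        return [by_id[n] for n, queries in sorted(self.used_by.items()) if len(queries) > 1]


dag = UnifiedDag()
dag.add_query("Q1", "ABC")
dag.add_query("Q2", "BCD")
print(dag.shared_keys())
```

Because both queries intern the same key for B JOIN C, the shared node shows up without any separate matching pass.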

DAG Generation Modifications: Subsumption

Volcano: no expression subsumption, hence missed CSEs

[Figure: subsumption derivations among the selection predicates (A<10), (A>50), (A<10 or A>50) and (A>10).]

Modification: subsumption derivations introduced
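A subsumption derivation is possible whenever one selection predicate implies another: the result of the weaker selection can then be filtered further to obtain the stronger one. The sketch below checks that implication for the predicates named on the slide. It assumes, purely for illustration, that each predicate on A is written as a list of disjoint open intervals; the representation and the function name are not taken from the source.

```python
import math

INF = math.inf


def implies(p, q):
    """True if every value of A satisfying p also satisfies q.

    p and q are lists of disjoint open intervals (low, high); assumption:
    an interval of p must fit inside a single interval of q.
    """
    return all(
        any(q_low <= p_low and p_high <= q_high for q_low, q_high in q)
        for p_low, p_high in p
    )


a_lt_10 = [(-INF, 10)]
a_gt_50 = [(50, INF)]
a_gt_10 = [(10, INF)]
a_lt_10_or_gt_50 = [(-INF, 10), (50, INF)]

# (A<10) and (A>50) can both be derived from the disjunction (A<10 or A>50).
print(implies(a_lt_10, a_lt_10_or_gt_50), implies(a_gt_50, a_lt_10_or_gt_50))

# (A>50) can be derived from (A>10), but not the other way round.
print(implies(a_gt_50, a_gt_10), implies(a_gt_10, a_gt_50))
```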

Exploring the Search Space: An Exhaustive Algorithm

Input: DAG for query Q
Output: Set of nodes to materialize, corresponding best plan

1. Y = set of equivalence nodes in DAG
2. Pick X ⊆ Y which minimizes BestCost(Q, X)
3. Return X

BestCost(Q, X) = cost of the best plan for Q given that the nodes in X are transiently materialized

Too expensive! Need heuristics.
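The exhaustive algorithm can be sketched as a search over every subset of candidate nodes. In the sketch below, BestCost is stood in for by a lookup table: the entries for {} (460) and {BC} (370) are the totals from the motivating example, while the entries involving CD are invented for illustration only. The function and table names are assumptions.

```python
from itertools import chain, combinations

# Stand-in for BestCost(Q, X). 460 and 370 come from the motivating example;
# the two entries involving "CD" are hypothetical values made up for this sketch.
TOY_COSTS = {
    frozenset(): 460,
    frozenset({"BC"}): 370,
    frozenset({"CD"}): 480,
    frozenset({"BC", "CD"}): 390,
}


def toy_best_cost(materialized):
    return TOY_COSTS[frozenset(materialized)]


def exhaustive_mqo(candidates, best_cost):
    """Try every subset X of the candidates and keep the cheapest one."""
    candidates = list(candidates)
    subsets = chain.from_iterable(
        combinations(candidates, k) for k in range(len(candidates) + 1)
    )
    return min((frozenset(s) for s in subsets), key=best_cost)


best = exhaustive_mqo(["BC", "CD"], toy_best_cost)
print(sorted(best), toy_best_cost(best))
```

With n candidate nodes the loop evaluates BestCost 2^n times, which is the reason the slide calls the approach too expensive.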

Exploring the Search Space: A Greedy Heuristic

Input: DAG for query Q
Output: Set of nodes to materialize, corresponding best plan

1. X = {}; Y = set of equivalence nodes in DAG
2. While (Y ≠ {})
       Pick z ∈ Y which maximizes Benefit(z | Q, X)
       If (Benefit(z | Q, X) > 0)
           Y = Y - {z}; X = X ∪ {z}
       Else Y = {}
3. Return X

Benefit(z | Q, X) = BestCost(Q, X) - BestCost(Q, X ∪ {z})

Appeared in [Gupta, ICDT97]. Our contribution: improve efficiency
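The greedy loop translates almost line for line into code. The sketch below is a straightforward, unoptimized rendering that calls BestCost for every remaining candidate in every round. BestCost is again a stand-in lookup table: 460 and 370 are the totals from the motivating example, and the entries involving CD are hypothetical values chosen for illustration.

```python
# Stand-in for BestCost(Q, X). 460 and 370 come from the motivating example;
# the two entries involving "CD" are hypothetical values made up for this sketch.
TOY_COSTS = {
    frozenset(): 460,
    frozenset({"BC"}): 370,
    frozenset({"CD"}): 480,
    frozenset({"BC", "CD"}): 390,
}


def toy_best_cost(materialized):
    return TOY_COSTS[frozenset(materialized)]


def greedy_mqo(candidates, best_cost):
    """Repeatedly materialize the node with the largest positive benefit."""
    chosen = frozenset()        # X in the pseudocode
    remaining = set(candidates)  # Y in the pseudocode
    while remaining:
        current = best_cost(chosen)

        def benefit(z):
            # Benefit(z | Q, X) = BestCost(Q, X) - BestCost(Q, X U {z})
            return current - best_cost(chosen | {z})

        z = max(sorted(remaining), key=benefit)
        if benefit(z) > 0:
            remaining.remove(z)
            chosen = chosen | {z}
        else:
            break  # no remaining node helps: same effect as "Else Y = {}"
    return chosen


picked = greedy_mqo(["BC", "CD"], toy_best_cost)
print(sorted(picked), toy_best_cost(picked))
```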

Improving Efficiency: Summary

Input: DAG for query Q
Output: Set of nodes to materialize, corresponding best plan

1. X = {}; Y = set of equivalence nodes in DAG
2. While (Y ≠ {})
       Pick z ∈ Y which maximizes Benefit(z | Q, X)
       If (Benefit(z | Q, X) > 0)
           Y = Y - {z}; X = X ∪ {z}
       Else Y = {}
3. Return X

Restrict the set of materialization candidates
Compute Benefit efficiently
Heuristically avoid computing Benefit for some nodes

Improving Efficiency: Only CSEs Materialized

CSEs identified in a bottom-up traversal

[Figure: AND/OR query DAG over ABC, BCD, AB, BC, CD, A, B, C and D with the common subexpression highlighted.]


Efficient Benefit Computation: Incremental Re-optimization

X: set of CSEs already materialized
z: unmaterialized CSE

Best plan given X materialized → best plan given X ∪ {z} materialized

Observation: best plans change only for the ancestors of z

Incremental Re-optimization: Example

[Figure: query DAG over ABC, BCD, AB, BC, CD, A, B, C and D annotated with best-plan costs for X = {}, and with the updated costs after materializing z = (B JOIN C).]

Incremental Re-optimization: Efficient Propagation

Ancestor nodes visited bottom-up in a topological order
  – Guarantees no revisits
Propagation path pruned if the current node’s best cost remains unchanged
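A minimal sketch of this propagation follows, under several assumptions that are not in the source: the DAG is a dictionary mapping each equivalence node to its alternative operations, a materialized node is charged a fixed reuse cost wherever it is consumed, the one-off cost of computing and storing it is ignored, and all cost figures and names are illustrative.

```python
import heapq


def node_cost(dag, node, cost, reuse_cost):
    """Cheapest operation for a node; materialized children are charged their reuse cost."""
    return min(
        op_cost + sum(reuse_cost.get(c, cost[c]) for c in children)
        for op_cost, children in dag[node]
    )


def full_costs(dag, topo_order, reuse_cost):
    cost = {}
    for node in topo_order:  # children always precede parents
        cost[node] = node_cost(dag, node, cost, reuse_cost)
    return cost


def parents_of(dag):
    parents = {}
    for node, ops in dag.items():
        for _, children in ops:
            for c in children:
                parents.setdefault(c, set()).add(node)
    return parents


def materialize(dag, topo_order, cost, reuse_cost, z, z_reuse_cost):
    """Update costs after materializing z; returns the nodes that were re-examined."""
    parents = parents_of(dag)
    rank = {n: i for i, n in enumerate(topo_order)}
    reuse_cost[z] = z_reuse_cost
    queued = set(parents.get(z, ()))
    heap = [(rank[p], p) for p in queued]
    heapq.heapify(heap)  # popping by topological rank guarantees no revisits
    visited = []
    while heap:
        _, node = heapq.heappop(heap)
        visited.append(node)
        new_cost = node_cost(dag, node, cost, reuse_cost)
        if new_cost == cost[node]:
            continue  # unchanged best cost: prune this propagation path
        cost[node] = new_cost
        for p in parents.get(node, ()):
            if p not in queued:
                queued.add(p)
                heapq.heappush(heap, (rank[p], p))
    return visited


# Illustrative DAG: (operation cost, children) alternatives per equivalence node.
dag = {
    "A": [(10, [])], "B": [(10, [])], "C": [(10, [])], "D": [(10, [])],
    "AB": [(100, ["A", "B"])],
    "BC": [(100, ["B", "C"])],
    "CD": [(100, ["C", "D"])],
    "ABC": [(100, ["AB", "C"]), (100, ["A", "BC"])],
    "BCD": [(100, ["BC", "D"]), (100, ["B", "CD"])],
}
topo = ["A", "B", "C", "D", "AB", "BC", "CD", "ABC", "BCD"]
reuse = {}
cost = full_costs(dag, topo, reuse)
touched = materialize(dag, topo, cost, reuse, "BC", 10)
print(touched, cost["ABC"], cost["BCD"])
```

Only the two ancestors of BC are re-examined; AB, CD and the base relations keep their earlier best costs untouched.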


Avoiding Benefit Computation

Monotonicity assumption
  – Benefit of a node does not increase due to materialization of other nodes
      • Often true
An earlier benefit of a node is an upper bound on its current benefit
Do not recompute a node’s benefit if another node’s current benefit is greater
Optimization costs decrease by 90%
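Under the monotonicity assumption, stale benefits can be kept in a max-heap as upper bounds and refreshed only when they reach the top. The sketch below illustrates that idea; the benefit function, the node names and their values are hypothetical and were chosen so that the assumption holds, and the round-stamp bookkeeping is one possible way to tell current values from stale ones.

```python
import heapq


def lazy_greedy(candidates, benefit):
    """Greedy selection that recomputes a benefit only when its stale bound is the largest."""
    chosen = set()
    round_no = 0
    # Heap entries: (-bound, node, round in which the bound was computed).
    heap = [(-benefit(z, frozenset(chosen)), z, round_no) for z in sorted(candidates)]
    heapq.heapify(heap)
    while heap:
        neg_bound, z, stamp = heapq.heappop(heap)
        if -neg_bound <= 0:
            break  # the largest upper bound is not positive: nothing left to gain
        if stamp == round_no:
            # Current value, and it beats every other node's upper bound: pick it.
            chosen.add(z)
            round_no += 1
        else:
            # Stale bound: refresh it and put it back.
            heapq.heappush(heap, (-benefit(z, frozenset(chosen)), z, round_no))
    return chosen


# Hypothetical benefits: BC and CD are alternatives, so each loses its benefit
# once the other is materialized; AB and DE are unaffected by other choices.
BASE = {"BC": 90, "CD": 60, "DE": 40, "AB": 5}
calls = []


def toy_benefit(z, chosen):
    calls.append(z)
    if (z == "CD" and "BC" in chosen) or (z == "BC" and "CD" in chosen):
        return -20
    return BASE[z]


picked = lazy_greedy(BASE, toy_benefit)
print(sorted(picked), len(calls))
```

In this toy run the benefit function is called 7 times, whereas recomputing every remaining candidate in every round would take 4 + 3 + 2 + 1 = 10 calls. The figure is specific to this made-up input and says nothing about the savings measured in the talk.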

Experimental Results

TPCD-0.1 on Microsoft SQL Server 6.5, using SQL rewriting for MQO

[Figure: bar chart of execution time (secs), axis 0 to 1000, for queries Q2, Q2-D, Q11 and Q15; series: No-MQO and MQO (Greedy).]

Alternatives to Greedy: Volcano-SH

A lightweight post-pass heuristic
1. Compute the best plan for each query independently, using Volcano
2. Find the set of nodes in the best plans to materialize (cost-based)
Similar previous work [Subramanian and Venkataraman, SIGMOD 1998]

Alternatives to Greedy: Volcano-RU

A lightweight extension of Volcano
1. Batched queries optimized in sequence Q1, Q2, …, Qn
2. Find the best plan for query Qi given the best plans for queries Qj, j < i
3. Cost-based materialization of nodes in best plans of Qj, j < i
Plan quality sensitive to the query sequence

Experimental Results

TPCD-0.1 query batches

[Figure: bar chart of estimated execution time (secs), axis 0 to 800, for batches BQ1 to BQ5; series: Volcano, Volcano-SH, Volcano-RU and Greedy.]

Experimental Results

TPCD-0.1 query batches

[Figure: bar chart of optimization time (secs), logarithmic scale from 0.01 to 10, for batches BQ1 to BQ5; series: Volcano, Volcano-SH, Volcano-RU and Greedy.]

Features

Easily implemented
  – First MQO implementation integrated with a state-of-the-art optimizer (as far as we know)
  – Also partially prototyped on Microsoft SQL-Server
Support for index selection
  – Index modeled as physical property (like “interesting order”)
Extensible and flexible
  – New operators, data models
  – Readily adapts to other problems
      • Query result caching
      • Materialized view selection/maintenance

Query Result Caching

P. Roy, K. Ramamritham, S. Seshadri, P. Shenoy and S. Sudarshan, Don’t Trash Your Intermediate Results, Cache ‘em, Submitted for publication

Problem Statement

Minimize the total execution time of an online workload by
  – Caching intermediate/final results of individual queries, and
  – Using these cached results to answer later queries

System Model

Contributions

Intermediate as well as final results cached
  – Optimizer-driven cache management
  – Adapts to workload changes
Cache-aware cost-based optimization
  – Novel framework for cached result matching

Experimental Results

Overheads negligible
Performance on 900 query TPCD-1 based uniform cube-point workload

Materialized View Selection and Maintenance

Hoshi Mistry, Prasan Roy, K. Ramamritham and S. Sudarshan, Materialized View Selection and Maintenance Using Multi-Query Optimization, Submitted for publication

Problem Statement

Speed up maintenance of a set of materialized views by
  – Exploiting CSEs between different view maintenance expressions
  – Selecting additional views to be materialized

Contributions

Optimization of maintenance expressions
  – Support for transiently materialized “delta” views
Nicely integrates transient vs permanent view materialization choices

Experimental Results

Overheads negligible
Performance benefit for maintenance of two TPCD-0.1 based SPJA views

Conclusion

MQO is practical
  – Low overheads, high benefits
  – Easily implemented and integrated
Leads to novel solutions to related problems
  – Query result caching
  – Materialized view selection and maintenance

Future Work

Further extensions of MQO
  – Shared execution pipelines
Query result caching in presence of updates
Other problems
  – Continuous queries, XML view caching, etc.

Other Contributions

Garbage Collection in Object Oriented Databases
  – Developed a “transaction-aware” cyclic reference counting algorithm
  – Provided a formal proof of correctness

S. Ashwin, Prasan Roy, S. Seshadri, Avi Silberschatz and S. Sudarshan, Garbage Collection in Object-Oriented Databases Using Transactional Cyclic Reference Counting, VLDB 1997

Prasan Roy, S. Seshadri, Avi Silberschatz, S. Sudarshan and S. Ashwin, Garbage Collection in Object-Oriented Databases Using Transactional Cyclic Reference Counting, Invited Paper, VLDB Journal, August 1998