CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath...

42
CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost- based optimization) Shivnath Babu

Transcript of CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath...

Page 1: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

CPS216: Advanced Database Systems

Notes 09:Query Optimization (Cost-based optimization)

Shivnath Babu

Page 2: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

Query Optimization Problem

Pick the best plan from the space of physical plans

Page 3: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

Cost-Based Optimization

• Prune the space of plans using heuristics

• Estimate cost for remaining plans– Be smart about how you iterate through plans

• Pick the plan with least cost

Focus on queries with joins

Page 4: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

Heuristics for pruning plan space

• Predicates as early as possible

• Avoid plans with cross products

• Only left-deep join trees

Page 5: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

Physical Plan Selection

Logical Query Plan

P1 P2 …. Pn

C1 C2 …. Cn

Pick minimum cost one

Physical plans

Costs

Page 6: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

Review of Notation

• T (R) : Number of tuples in R

• B (R) : Number of blocks in R

Page 7: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

Simple Cost Model

Cost (R S) = T(R) + T(S) Cost (R S) = T(R) + T(S)

All other operators have 0 cost

Note: The simple cost model used for illustration only

Page 8: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

Cost Model Example

R S

T

X

T(R) + T(S)

T(X) + T(T)

Total Cost: T(R) + T(S) + T(T) + T(X)

Page 9: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

Selinger Algorithm

• Dynamic Programming based

• Dynamic Programming:– General algorithmic paradigm– Exploits “principle of optimality”– Useful reading:

• Chapter 16, Introduction to Algorithms,Cormen, Leiserson, Rivest

Page 10: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

Principle of Optimality

Optimal for “whole” made up from optimal for “parts”

Page 11: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

Principle of Optimality

Query: R1 R2 R3 R4 R5

R3 R2

R4

R1

R5Optimal Plan:

Page 12: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

Principle of Optimality

Query: R1 R2 R3 R4 R5

R3 R2

R4

R1

R5Optimal Plan:

Optimal plan for joining R3, R2, R4, R1

Page 13: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

Principle of Optimality

Query: R1 R2 R3 R4 R5

R3 R2

R4

R1

R5Optimal Plan:

Optimal plan for joining R3, R2, R4

Page 14: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

Exploiting Principle of Optimality

Query: R1 R2 … Rn

R3 R1

R2

R2 R3

R1

Optimalfor joining R1, R2, R3

Sub-Optimalfor joining R1, R2, R3

Page 15: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

Exploiting Principle of Optimality

R3 R1

R2

Ri

RjSub-Optimal

for joining R1,…,Rn

A sub-optimal sub-plan cannot lead to anoptimal plan

Page 16: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

Query: R1 R2 R3 R4

{ R1 } { R2 } { R3 } { R4 }

{ R1, R2 } { R1, R3 } { R1, R4 } { R2, R3 } { R2, R4 } { R3, R4 }

{ R1, R2, R3 } { R1, R2, R4 } { R1, R3, R4 } { R2, R3, R4 }

{ R1, R2, R3, R4 }

Progressof

algorithm

Selinger Algorithm:

Page 17: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

Notation

OPT ( { R1, R2, R3 } ):

Cost of optimal plan to join R1,R2,R3

T ( { R1, R2, R3 } ):

Number of tuples in R1 R2 R3

Page 18: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

OPT ( { R1, R2, R3 } ):

OPT ( { R2, R3 } ) + T ( { R2, R3 } ) + T(R1)

OPT ( { R1, R2 } ) + T ( { R1, R2 } ) + T(R3)

OPT ( { R1, R3 } ) + T ( { R1, R3 } ) + T(R2)

Min

Selinger Algorithm:

Note: Valid only for the simple cost model

Page 19: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

Query: R1 R2 R3 R4

{ R1 } { R2 } { R3 } { R4 }

{ R1, R2 } { R1, R3 } { R1, R4 } { R2, R3 } { R2, R4 } { R3, R4 }

{ R1, R2, R3 } { R1, R2, R4 } { R1, R3, R4 } { R2, R3, R4 }

{ R1, R2, R3, R4 }

Progressof

algorithm

Selinger Algorithm:

Page 20: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

Query: R1 R2 R3 R4

{ R1 } { R2 } { R3 } { R4 }

{ R1, R2 } { R1, R3 } { R1, R4 } { R2, R3 } { R2, R4 } { R3, R4 }

{ R1, R2, R3 } { R1, R2, R4 } { R1, R3, R4 } { R2, R3, R4 }

{ R1, R2, R3, R4 }

Progressof

algorithm

Selinger Algorithm:

Page 21: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

R2

R3

R4

R1

Selinger Algorithm:

Optimal plan:

Query: R1 R2 R3 R4

Page 22: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

More Complex Cost Model

• DB System:– Two join algorithms:

• Tuple-based nested loop join• Sort-Merge join

– Two access methods• Table Scan• Index Scan (all indexes are in memory)

– Plans pipelined as much as possible

• Cost: Number of disk I/O s

Page 23: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

Cost of Table Scan

Table Scan

R

Cost: B (R)

Page 24: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

Cost of Clustered Index Scan

Index Scan

R

Cost: B (R)

Page 25: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

Cost of Clustered Index Scan

Index Scan

R

Cost: B (X)R.A > 50

X

Page 26: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

Cost of Non-Clustered Index Scan

Index Scan

R

Cost: T (R)

Page 27: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

Cost of Non-Clustered Index Scan

Index Scan

R

Cost: T (X)R.A > 50

X

Page 28: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

Cost of Tuple-Based NLJ

NLJ

Cost for entireentire plan:

Cost (Outer) + T(X) x Cost (Inner)

Inner

X

Outer

Page 29: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

Cost of Sort-Merge Join

Right

X

Left

Y

Sort Sort

Merge Cost for entire plan:

Cost (Right) + Cost (Left) +2 (B (X) + B (Y) )

R1.A = R2.A

R1 R2

Page 30: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

Cost of Sort-Merge Join

Right

X

Left

Y

Sort

Merge Cost for entire plan:

Cost (Right) + Cost (Left) +2 B (Y)

R1.A = R2.A

R1 R2

Sorted on R1.A

Page 31: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

Cost of Sort-Merge Join

Right

X

Left

Y

Merge Cost for entire plan:

Cost (Right) + Cost (Left)

R1.A = R2.A

R1 R2

Sorted on R1.A

Sorted on R2.A

Page 32: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

Cost of Sort-Merge Join

Bottom Line: Cost depends on sorted-ness of inputs

Page 33: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

Principle of Optimality?

Query: R1 R2 R3 R4 R5

SMJ

R1

Is Plan X the optimal plan for joining R2,R3,R4,R5?

Optimal plan:(R1.A = R2.A)

Plan X

Scan

Page 34: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

Violation of Principle of Optimality

Plan YPlan X

(sorted on R2.A) (unsorted on R2.A)

Optimal plan for joiningR2,R3,R4,R4

Suboptimal plan for joiningR2,R3,R4,R5

Page 35: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

Principle of Optimality?

Query: R1 R2 R3 R4 R5

SMJOptimal plan:

(R1.A = R2.A)

Plan X

Can we assert anything about plan X?

R1

Scan

Page 36: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

Weaker Principle of Optimality

If plan X produces output sorted on R2.A then If plan X produces output sorted on R2.A then plan X is the plan X is the optimal planoptimal plan for joining R2,R3,R4,R5 for joining R2,R3,R4,R5 that produces output sorted on R2.Athat produces output sorted on R2.A

If plan X produces output unsorted on R2.A thenIf plan X produces output unsorted on R2.A thenplan X is the plan X is the optimal planoptimal plan for joining R2, R3, R4, R5 for joining R2, R3, R4, R5

Page 37: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

Interesting Order

• An attribute is an interesting orderinteresting order if:– participates in a join predicate– Occurs in the Group By clause– Occurs in the Order By clause

Page 38: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

Interesting Order: Example

Select *From R1(A,B), R2(A,B), R3(B,C)Where R1.A = R2.A and R2.B = R3.B

Interesting Orders: R1.A, R2.A, R2.B, R3.B

Page 39: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

Modified Selinger Algorithm

{R1} {R1}(A) {R3}(B){R2} {R2}(A) {R2}(B) {R3}

{R1,R2} {R1,R2}(A) {R1,R2}(B) {R2,R3} {R2,R3}(A) {R2,R3}(B)

{R1,R2,R3}

Page 40: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

Notation

{R1,R2} (C)

Optimal way of joining R1, R2 so that output is sortedon attribute R2.C

Page 41: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

Modified Selinger Algorithm

{R1} {R1}(A) {R3}(B){R2} {R2}(A) {R2}(B) {R3}

{R1,R2} {R1,R2}(A) {R1,R2}(B) {R2,R3} {R2,R3}(A) {R2,R3}(B)

{R1,R2,R3}

Page 42: CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.

Roadmap

• Query Optimization– Heuristics for pruning plan space– Selinger Algorithm– Estimating costs of plans