15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove...

77
Intro to Database Systems 15-445/15-645 Fall 2019 Andy Pavlo Computer Science Carnegie Mellon University AP 15 Query Planning & Optimization Part 2

Transcript of 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove...

Page 1: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

Intro to Database Systems

15-445/15-645

Fall 2019

Andy PavloComputer Science Carnegie Mellon UniversityAP

15Query Planning &Optimization

Part 2

Page 2: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

ADMINISTRIVIA

Project #3 will be released this week.It is due Sun Nov 17th @ 11:59pm.

Homework #4 will be released next week.It is due Wed Nov 13th @ 11:59pm.

2

Page 3: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

QUERY OPTIMIZATION

Heuristics / Rules→ Rewrite the query to remove stupid / inefficient things.→ These techniques may need to examine catalog, but they

do not need to examine data.

Cost-based Search→ Use a model to estimate the cost of executing a plan.→ Evaluate multiple equivalent plans for a query and pick

the one with the lowest cost.

3

Page 4: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

TODAY'S AGENDA

Plan Cost Estimation

Plan Enumeration

Nested Sub-queries

4

Page 5: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

COST ESTIMATION

How long will a query take?→ CPU: Small cost; tough to estimate→ Disk: # of block transfers→ Memory: Amount of DRAM used→ Network: # of messages

How many tuples will be read/written?

It is too expensive to run every possible plan to determine this information, so the DBMS need a way to derive this information…

5

Page 6: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

STATISTICS

The DBMS stores internal statistics about tables, attributes, and indexes in its internal catalog.

Different systems update them at different times.

Manual invocations:→ Postgres/SQLite: ANALYZE→ Oracle/MySQL: ANALYZE TABLE→ SQL Server: UPDATE STATISTICS→ DB2: RUNSTATS

6

Page 7: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

STATISTICS

For each relation R, the DBMS maintains the following information:→ NR: Number of tuples in R.→ V(A,R): Number of distinct values for attribute A.

7

Page 8: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

DERIVABLE STATISTICS

The selection cardinality SC(A,R) is the average number of records with a value for an attribute A given NR / V(A,R)

Note that this assumes data uniformity.→ 10,000 students, 10 colleges – how many students in SCS?

8

Page 9: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

SELECTION STATISTICS

Equality predicates on unique keys are easy to estimate.

What about more complex predicates? What is their selectivity?

9

SELECT * FROM people WHERE id = 123

SELECT * FROM people WHERE val > 1000

SELECT * FROM people WHERE age = 30AND status = 'Lit'

CREATE TABLE people (id INT PRIMARY KEY,val INT NOT NULL,age INT NOT NULL,status VARCHAR(16)

);

Page 10: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

COMPLEX PREDICATES

The selectivity (sel) of a predicate P is the fraction of tuples that qualify.

Formula depends on type of predicate:→ Equality→ Range→ Negation→ Conjunction→ Disjunction

10

Page 11: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

COMPLEX PREDICATES

The selectivity (sel) of a predicate P is the fraction of tuples that qualify.

Formula depends on type of predicate:→ Equality→ Range→ Negation→ Conjunction→ Disjunction

10

Page 12: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

SELECTIONS COMPLEX PREDICATES

Assume that V(age,people) has five distinct values (0–4) and NR = 5

Equality Predicate: A=constant→ sel(A=constant) = SC(P) / NR→ Example: sel(age=2) =

11

0 1 2 3 4

cou

nt

age

SC(age=2)=1

SELECT * FROM people WHERE age = 2

1/5

Page 13: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

0 1 2 3 4

cou

nt

age

SELECTIONS COMPLEX PREDICATES

Range Predicate:→ sel(A>=a) = (Amax – a) / (Amax – Amin)→ Example: sel(age>=2)

12

≈ (4 – 2) / (4 – 0)≈ 1/2

agemin = 0

SELECT * FROM people WHERE age >= 2

agemax = 4

Page 14: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

0 1 2 3 4

cou

nt

age

SELECTIONS COMPLEX PREDICATES

Negation Query:→ sel(not P) = 1 – sel(P)→ Example: sel(age != 2)

13

SC(age=2)=1

SELECT * FROM people WHERE age != 2

Page 15: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

0 1 2 3 4

cou

nt

age

SELECTIONS COMPLEX PREDICATES

Negation Query:→ sel(not P) = 1 – sel(P)→ Example: sel(age != 2)

Observation: Selectivity ≈ Probability

13

= 1 – (1/5) = 4/5

SC(age!=2)=2 SC(age!=2)=2

SELECT * FROM people WHERE age != 2

Page 16: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

SELECTIONS COMPLEX PREDICATES

Conjunction: → sel(P1 ⋀ P2) = sel(P1) ∙ sel(P2)→ sel(age=2 ⋀ name LIKE 'A%')

This assumes that the predicates are independent.

14

SELECT * FROM people WHERE age = 2AND name LIKE 'A%'

P1 P2

Page 17: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

SELECTIONS COMPLEX PREDICATES

Conjunction: → sel(P1 ⋀ P2) = sel(P1) ∙ sel(P2)→ sel(age=2 ⋀ name LIKE 'A%')

This assumes that the predicates are independent.

14

SELECT * FROM people WHERE age = 2AND name LIKE 'A%'

P1 P2

Page 18: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

SELECTIONS COMPLEX PREDICATES

Disjunction: → sel(P1 ⋁ P2)

= sel(P1) + sel(P2) – sel(P1⋀P2)= sel(P1) + sel(P2) – sel(P1) ∙ sel(P2)

→ sel(age=2 OR name LIKE 'A%')

This again assumes that theselectivities are independent.

15

SELECT * FROM people WHERE age = 2

OR name LIKE 'A%'

P1 P2

Page 19: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

SELECTIONS COMPLEX PREDICATES

Disjunction: → sel(P1 ⋁ P2)

= sel(P1) + sel(P2) – sel(P1⋀P2)= sel(P1) + sel(P2) – sel(P1) ∙ sel(P2)

→ sel(age=2 OR name LIKE 'A%')

This again assumes that theselectivities are independent.

15

SELECT * FROM people WHERE age = 2

OR name LIKE 'A%'

P1 P2

Page 20: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

SELECTION CARDINALIT Y

Assumption #1: Uniform Data→ The distribution of values (except for the heavy hitters) is

the same.

Assumption #2: Independent Predicates→ The predicates on attributes are independent

Assumption #3: Inclusion Principle→ The domain of join keys overlap such that each key in the

inner relation will also exist in the outer table.

16

Page 21: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

CORREL ATED AT TRIBUTES

Consider a database of automobiles:→ # of Makes = 10, # of Models = 100

And the following query:→ (make="Honda" AND model="Accord")

With the independence and uniformity assumptions, the selectivity is:→ 1/10 × 1/100 = 0.001

But since only Honda makes Accords the real selectivity is 1/100 = 0.01

17

Source: Guy Lohman

Page 22: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

COST ESTIMATIONS

Our formulas are nice, but we assume that data values are uniformly distributed.

20

0

5

10

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Uniform Approximation

Distinct values of attribute

# of occurrences

Page 23: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

COST ESTIMATIONS

Our formulas are nice, but we assume that data values are uniformly distributed.

21

0

5

10

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Non-Uniform Approximation

Page 24: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

COST ESTIMATIONS

Our formulas are nice, but we assume that data values are uniformly distributed.

21

0

5

10

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Non-Uniform Approximation

Bucket #1Count=8

Bucket #2Count=4

Bucket #3Count=15

Bucket #4Count=3

Bucket #5Count=14

Page 25: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

COST ESTIMATIONS

Our formulas are nice, but we assume that data values are uniformly distributed.

21

Bucket #1Count=8

Bucket #2Count=4

Bucket #3Count=15

Bucket #4Count=3

Bucket #5Count=14

0

5

10

15

1-3 4-6 7-9 10-12 13-15

Non-Uniform Approximation

Bucket Ranges

Page 26: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

HISTOGRAMS WITH QUANTILES

Vary the width of buckets so that the total number of occurrences for each bucket is roughly the same.

22

0

5

10

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Histogram (Quantiles)

Bucket #1Count=12

Bucket #2Count=12

Bucket #3Count=9

Bucket #4Count=12

Page 27: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

HISTOGRAMS WITH QUANTILES

Vary the width of buckets so that the total number of occurrences for each bucket is roughly the same.

22

0

5

10

15

1-5 6-8 9-13 14-15

Histogram (Quantiles)

Page 28: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

SAMPLING

Modern DBMSs also collect samples from tables to estimate selectivities.

Update samples when the underlying tables changes significantly.

23

⋮1 billion tuples

SELECT AVG(age)FROM people WHERE age > 50

id name age status

1001 Obama 58 Rested

1002 Kanye 41 Weird

1003 Tupac 25 Dead

1004 Bieber 25 Crunk

1005 Andy 38 Lit

Page 29: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

SAMPLING

Modern DBMSs also collect samples from tables to estimate selectivities.

Update samples when the underlying tables changes significantly.

23

⋮1 billion tuples

SELECT AVG(age)FROM people WHERE age > 50

id name age status

1001 Obama 58 Rested

1002 Kanye 41 Weird

1003 Tupac 25 Dead

1004 Bieber 25 Crunk

1005 Andy 38 Lit

1001 Obama 58 Rested

1003 Tupac 25 Dead

1005 Andy 38 Lit

Table Sample

Page 30: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

SAMPLING

Modern DBMSs also collect samples from tables to estimate selectivities.

Update samples when the underlying tables changes significantly.

23

⋮1 billion tuples1/3sel(age>50) =

SELECT AVG(age)FROM people WHERE age > 50

id name age status

1001 Obama 58 Rested

1002 Kanye 41 Weird

1003 Tupac 25 Dead

1004 Bieber 25 Crunk

1005 Andy 38 Lit

1001 Obama 58 Rested

1003 Tupac 25 Dead

1005 Andy 38 Lit

Table Sample

Page 31: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

OBSERVATION

Now that we can (roughly) estimate the selectivity of predicates, what can we actually do with them?

24

Page 32: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

QUERY OPTIMIZATION

After performing rule-based rewriting, the DBMS will enumerate different plans for the query and estimate their costs.→ Single relation.→ Multiple relations.→ Nested sub-queries.

It chooses the best plan it has seen for the query after exhausting all plans or some timeout.

25

Page 33: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

SINGLE-REL ATION QUERY PL ANNING

Pick the best access method.→ Sequential Scan→ Binary Search (clustered indexes)→ Index Scan

Predicate evaluation ordering.

Simple heuristics are often good enough for this.

OLTP queries are especially easy…

26

Page 34: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

OLTP QUERY PL ANNING

Query planning for OLTP queries is easy because they are sargable (Search Argument Able).→ It is usually just picking the best index.→ Joins are almost always on foreign key relationships with

a small cardinality.→ Can be implemented with simple heuristics.

27

CREATE TABLE people (id INT PRIMARY KEY,val INT NOT NULL,⋮

);

SELECT * FROM peopleWHERE id = 123;

Page 35: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

MULTI-REL ATION QUERY PL ANNING

As number of joins increases, number of alternative plans grows rapidly→ We need to restrict search space.

Fundamental decision in System R: only left-deep join trees are considered.→ Modern DBMSs do not always make this assumption

anymore.

28

Page 36: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

MULTI-REL ATION QUERY PL ANNING

Fundamental decision in System R: Only consider left-deep join trees.

29

⨝⨝

A B

C

D

⨝⨝

A B

C

D

⨝⨝

A BC D

Page 37: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

MULTI-REL ATION QUERY PL ANNING

Fundamental decision in System R: Only consider left-deep join trees.

29

⨝⨝

A B

C

D

⨝⨝

A B

C

D

⨝⨝

A BC DX X

Page 38: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

MULTI-REL ATION QUERY PL ANNING

Fundamental decision in System R: Only consider left-deep join trees.

Allows for fully pipelined plans where intermediate results are not written to temp files.→ Not all left-deep trees are fully pipelined.

30

Page 39: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

MULTI-REL ATION QUERY PL ANNING

Enumerate the orderings→ Example: Left-deep tree #1, Left-deep tree #2…

Enumerate the plans for each operator→ Example: Hash, Sort-Merge, Nested Loop…

Enumerate the access paths for each table→ Example: Index #1, Index #2, Seq Scan…

Use dynamic programming to reduce the number of cost estimations.

31

Page 40: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

DYNAMIC PROGRAMMING

32

SortMerge JoinR.a=S.a

SortMerge JoinT.b=S.b

Hash JoinT.b=S.b

R ⨝ ST

T ⨝ SR

R ⨝ S ⨝ T

Hash JoinR.a=S.a SELECT * FROM R, S, T

WHERE R.a = S.aAND S.b = T.b

Cost: 300

Cost: 400

Cost: 280

Cost: 200

RST

Page 41: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

DYNAMIC PROGRAMMING

32

Hash JoinT.b=S.b

R ⨝ ST

T ⨝ SR

R ⨝ S ⨝ T

Hash JoinR.a=S.a SELECT * FROM R, S, T

WHERE R.a = S.aAND S.b = T.b

Cost: 300

Cost: 200

RST

Page 42: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

DYNAMIC PROGRAMMING

32

Hash JoinT.b=S.b

R ⨝ ST

T ⨝ SR

R ⨝ S ⨝ T

Hash JoinR.a=S.a

Hash JoinS.b=T.b

SortMerge JoinS.b=T.b

SortMerge JoinS.a=R.a

Hash JoinS.a=R.a

SELECT * FROM R, S, TWHERE R.a = S.aAND S.b = T.b

Cost: 300

Cost: 200

Cost: 450

Cost: 300

Cost: 400

Cost: 380

RST

Page 43: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

DYNAMIC PROGRAMMING

32

Hash JoinT.b=S.b

R ⨝ ST

T ⨝ SR

R ⨝ S ⨝ T

Hash JoinR.a=S.a

Hash JoinS.b=T.b

SortMerge JoinS.a=R.a

SELECT * FROM R, S, TWHERE R.a = S.aAND S.b = T.b

Cost: 300

Cost: 200

Cost: 300

Cost: 380

RST

Page 44: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

DYNAMIC PROGRAMMING

32

Hash JoinT.b=S.b

R ⨝ ST

T ⨝ SR

R ⨝ S ⨝ TSortMerge JoinS.a=R.a

SELECT * FROM R, S, TWHERE R.a = S.aAND S.b = T.b

Cost: 200

Cost: 300

RST

Page 45: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

CANDIDATE PL AN EXAMPLE

How to generate plans for search algorithm:→ Enumerate relation orderings→ Enumerate join algorithm choices→ Enumerate access method choices

No real DBMSs does it this way.It’s actually more messy…

33

SELECT * FROM R, S, TWHERE R.a = S.aAND S.b = T.b

Page 46: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

CANDIDATE PL ANS

Step #1: Enumerate relation orderings

34

T R

S ⨝

S T

R ×

R S

T

R S

T ⨝

S R

T ×

S T

R

Prune plans with cross-products immediately!

Page 47: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

CANDIDATE PL ANS

Step #1: Enumerate relation orderings

34

T R

S ⨝

S T

R ×

R S

T

R S

T ⨝

S R

T ×

S T

R

X

XPrune plans with cross-products immediately!

Page 48: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

CANDIDATE PL ANS

Step #2: Enumerate join algorithm choices

35

R S

T

Do this for the other plans.

R S

TNLJ

NLJ

R S

THJ

NLJ

R S

TNLJ

HJ

R S

T

HJ

HJ

Page 49: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

CANDIDATE PL ANS

Step #2: Enumerate join algorithm choices

35

R S

T

Do this for the other plans.

R S

TNLJ

NLJ

R S

THJ

NLJ

R S

TNLJ

HJ

R S

T

HJ

HJ

Page 50: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

CANDIDATE PL ANS

Step #3: Enumerate access method choices

36

R S

T

HJ

HJ

Do this for the other plans.

HJ

HJ

SeqScan SeqScan

SeqScan

HJ

HJ

SeqScan IndexScan(S.b)

SeqScan

Page 51: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

POSTGRES OPTIMIZER

Examines all types of join trees→ Left-deep, Right-deep, bushy

Two optimizer implementations:→ Traditional Dynamic Programming Approach→ Genetic Query Optimizer (GEQO)

Postgres uses the traditional algorithm when # of tables in query is less than 12 and switches to GEQO when there are 12 or more.

37

Page 52: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

POSTGRES OPTIMIZER

38

1st Generation

R S

T

NL

NLCost:300

T R

S

NL

HJ

S R

T

HJ

HJ

Cost:200

Cost:100

Page 53: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

POSTGRES OPTIMIZER

38

Best:1001st Generation

R S

T

NL

NLCost:300

T R

S

NL

HJ

S R

T

HJ

HJ

XCost:200

Cost:100

Page 54: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

POSTGRES OPTIMIZER

38

Best:1001st Generation 2nd Generation

R S

T

NL

NLCost:300

T R

S

NL

HJ

S R

T

HJ

HJ

XCost:200

Cost:100

S R

T

HJ

HJ

R T

S

NL

HJ

T R

S

HJ

HJ

Cost:80

Cost:200

Cost:110

Page 55: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

POSTGRES OPTIMIZER

38

1st Generation 2nd GenerationBest:80

R S

T

NL

NLCost:300

T R

S

NL

HJ

S R

T

HJ

HJ

XCost:200

Cost:100

S R

T

HJ

HJ

R T

S

NL

HJ

T R

S

HJ

HJ

Cost:80

Cost:200

Cost:110

Page 56: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

POSTGRES OPTIMIZER

38

1st Generation 2nd GenerationBest:80

R S

T

NL

NLCost:300

T R

S

NL

HJ

S R

T

HJ

HJ

XCost:200

Cost:100

S R

T

HJ

HJ

R T

S

NL

HJ

T R

S

HJ

HJ

X

Cost:80

Cost:200

Cost:110

Page 57: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

POSTGRES OPTIMIZER

38

1st Generation 2nd Generation 3rd Generation

Best:80

R S

T

NL

NLCost:300

T R

S

NL

HJ

S R

T

HJ

HJ

XCost:200

Cost:100

S R

T

HJ

HJ

R T

S

NL

HJ

T R

S

HJ

HJ

X

Cost:80

Cost:200

Cost:110

R S

T

HJ

HJ

R S

T

HJ

HJ

R T

S

HJ

HJ

Cost:90

Cost:160

Cost:120

Page 58: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

NESTED SUB-QUERIES

The DBMS treats nested sub-queries in the where clause as functions that take parameters and return a single value or set of values.

Two Approaches:→ Rewrite to de-correlate and/or flatten them→ Decompose nested query and store result to temporary

table

39

Page 59: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

NESTED SUB-QUERIES: REWRITE

40

SELECT name FROM sailors AS SWHERE EXISTS (

SELECT * FROM reserves AS RWHERE S.sid = R.sidAND R.day = '2018-10-15'

)

Page 60: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

NESTED SUB-QUERIES: REWRITE

40

SELECT name FROM sailors AS SWHERE EXISTS (

SELECT * FROM reserves AS RWHERE S.sid = R.sidAND R.day = '2018-10-15'

)

SELECT nameFROM sailors AS S, reserves AS RWHERE S.sid = R.sidAND R.day = '2018-10-15'

Page 61: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

NESTED SUB-QUERIES: DECOMPOSE

41

For each sailor with the highest rating (over all sailors) and at least two reservations for red boats, find the sailor id and the earliest date on which the sailor has a reservation for a red boat.

SELECT S.sid, MIN(R.day)FROM sailors S, reserves R, boats BWHERE S.sid = R.sidAND R.bid = B.bidAND B.color = 'red'AND S.rating = (SELECT MAX(S2.rating)

FROM sailors S2)GROUP BY S.sid

HAVING COUNT(*) > 1

Page 62: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

DECOMPOSING QUERIES

For harder queries, the optimizer breaks up queries into blocks and then concentrates on one block at a time.

Sub-queries are written to a temporary table that are discarded after the query finishes.

42

Page 63: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

DECOMPOSING QUERIES

43

SELECT S.sid, MIN(R.day)FROM sailors S, reserves R, boats BWHERE S.sid = R.sidAND R.bid = B.bidAND B.color = 'red'AND S.rating = (SELECT MAX(S2.rating)

FROM sailors S2)GROUP BY S.sid

HAVING COUNT(*) > 1

Nested Block

Page 64: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

DECOMPOSING QUERIES

43

SELECT S.sid, MIN(R.day)FROM sailors S, reserves R, boats BWHERE S.sid = R.sidAND R.bid = B.bidAND B.color = 'red'AND S.rating = (SELECT MAX(S2.rating)

FROM sailors S2)GROUP BY S.sid

HAVING COUNT(*) > 1

Nested Block

SELECT MAX(rating) FROM sailors

Page 65: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

DECOMPOSING QUERIES

43

SELECT S.sid, MIN(R.day)FROM sailors S, reserves R, boats BWHERE S.sid = R.sidAND R.bid = B.bidAND B.color = 'red'AND S.rating = (SELECT MAX(S2.rating)

FROM sailors S2)GROUP BY S.sid

HAVING COUNT(*) > 1

SELECT MAX(rating) FROM sailors

###

Page 66: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

DECOMPOSING QUERIES

43

SELECT S.sid, MIN(R.day)FROM sailors S, reserves R, boats BWHERE S.sid = R.sidAND R.bid = B.bidAND B.color = 'red'AND S.rating = (SELECT MAX(S2.rating)

FROM sailors S2)GROUP BY S.sid

HAVING COUNT(*) > 1

Outer Block

SELECT MAX(rating) FROM sailors

###

Page 67: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

CONCLUSION

Filter early as possible.

Selectivity estimations→ Uniformity→ Independence→ Histograms→ Join selectivity

Dynamic programming for join orderings

Rewrite nested queries

Again, query optimization is hard…

44

Page 68: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

EXTRA CREDIT

Each student can earn extra credit if they write a encyclopedia article about a DBMS.→ Can be academic/commercial, active/historical.

Each article will use a standard taxonomy.→ For each feature category, you select pre-defined options

for your DBMS.→ You will then need to provide a summary paragraph with

citations for that category.

45

Page 69: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

EXTRA CREDIT

Each student can earn extra credit if they write a encyclopedia article about a DBMS.→ Can be academic/commercial, active/historical.

Each article will use a standard taxonomy.→ For each feature category, you select pre-defined options

for your DBMS.→ You will then need to provide a summary paragraph with

citations for that category.

45

Page 70: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

EXTRA CREDIT

Each student can earn extra credit if they write a encyclopedia article about a DBMS.→ Can be academic/commercial, active/historical.

Each article will use a standard taxonomy.→ For each feature category, you select pre-defined options

for your DBMS.→ You will then need to provide a summary paragraph with

citations for that category.

45

Page 71: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

EXTRA CREDIT

Each student can earn extra credit if they write a encyclopedia article about a DBMS.→ Can be academic/commercial, active/historical.

Each article will use a standard taxonomy.→ For each feature category, you select pre-defined options

for your DBMS.→ You will then need to provide a summary paragraph with

citations for that category.

45

Page 72: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

DBDB.IO

All the articles will be hosted on dbdb.io→ I will post registration details on Piazza.

I will post a sign-up sheet for you to pick what DBMS you want to write about.→ If you choose a widely known DBMS, then the article will

need to be comprehensive.→ If you choose an obscure DBMS, then you will have to do

the best you can to find information.

46

Page 73: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

DBDB.IO

All the articles will be hosted on dbdb.io→ I will post registration details on Piazza.

I will post a sign-up sheet for you to pick what DBMS you want to write about.→ If you choose a widely known DBMS, then the article will

need to be comprehensive.→ If you choose an obscure DBMS, then you will have to do

the best you can to find information.

46

Page 75: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

HOW TO DECIDE

Pick a DBMS based on whatever criteria you want:→ Country of Origin→ Popularity→ Programming Language→ Single-Node vs. Embedded vs. Distributed→ Disk vs. Memory→ Row Store vs. Column Store→ Open-Source vs. Proprietary

48

Page 76: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

PL AGIARISM WARNING

This article must be your own writing with your own images. You may not copy text/images directly from papers or other sources that you find on the web.

Plagiarism will not be tolerated.See CMU's Policy on Academic Integrity for additional information.

49

Page 77: 15 Query Planning &OptimizationQUERY OPTIMIZATION Heuristics / Rules →Rewrite the query to remove stupid / inefficient things. →These techniques may need to examine catalog, but

CMU 15-445/645 (Fall 2019)

NEXT CL ASS

Transactions!→ aka the second hardest part about database systems

50