CMU 15-445/645 Database Systems (Fall 2018 :: Query Optimization · 2019-03-06 · CMU 15-445/645...

Database Systems 15-445/15-645 Fall 2018 Andy Pavlo Computer Science Carnegie Mellon Univ. AP Lecture #13 Query Optimization



CMU 15-445/645 (Fall 2018)

ADMINISTRIVIA

Mid-term Exam is on Wednesday October 17th

→ See mid-term exam guide for more info.

Project #2 – Checkpoint #2 is due Friday October 19th @ 11:59pm.


QUERY OPTIMIZATION

Remember that SQL is declarative.
→ The user tells the DBMS what answer they want, not how to get the answer.

There can be a big difference in performance depending on which plan is used:
→ See last week: 1.3 hours vs. 0.45 seconds


IBM SYSTEM R

First implementation of a query optimizer.

People argued that the DBMS could never choose a query plan better than what a human could write.

A lot of the concepts from System R’s optimizer are still used today.


QUERY OPTIMIZATION

Heuristics / Rules
→ Rewrite the query to remove stupid / inefficient things.
→ Does not require a cost model.

Cost-based Search
→ Use a cost model to evaluate multiple equivalent plans and pick the one with the lowest cost.


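QUERY PLANNING OVERVIEW

[Figure: query planning pipeline. SQL Query → Parser → Abstract Syntax Tree → Binder (Name → Internal ID, via the System Catalog) → Annotated AST → Rewriter (Optional) → Annotated AST → Optimizer (consults the System Catalog and the Cost Model) → Query Plan]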


TODAY'S AGENDA

Relational Algebra Equivalences

Plan Cost Estimation

Plan Enumeration

Nested Sub-queries

Mid-Term Review


RELATIONAL ALGEBRA EQUIVALENCES

Two relational algebra expressions are equivalent if they generate the same set of tuples.

The DBMS can identify better query plans without a cost model.

This is often called query rewriting.


PREDICATE PUSHDOWN

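[Figure: two equivalent plans for the query below. Left: π s.name,e.cid over σ grade='A' over student ⨝ enrolled on s.sid=e.sid. Right: σ grade='A' is pushed below the join and applied to enrolled before joining.]

SELECT s.name, e.cid
  FROM student AS s, enrolled AS e
 WHERE s.sid = e.sid
   AND e.grade = 'A'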


RELATIONAL ALGEBRA EQUIVALENCES

SELECT s.name, e.cid
  FROM student AS s, enrolled AS e
 WHERE s.sid = e.sid
   AND e.grade = 'A'

π name,cid (σ grade='A' (student ⋈ enrolled))
=
π name,cid (student ⋈ (σ grade='A' (enrolled)))
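As a rough illustration of why the two expressions are interchangeable, the Python sketch below evaluates both plans over tiny in-memory tables. The sample rows and the helper names (`join`, `plan_filter_after_join`, `plan_filter_before_join`) are invented for this example and are not part of the lecture.

```python
# Toy tables; the rows are invented for illustration only.
student = [{"sid": 1, "name": "Ann"}, {"sid": 2, "name": "Bob"}]
enrolled = [
    {"sid": 1, "cid": "15-445", "grade": "A"},
    {"sid": 2, "cid": "15-445", "grade": "B"},
    {"sid": 2, "cid": "15-721", "grade": "A"},
]

def join(students, enrollments):
    """Nested-loop equi-join on sid, returning (student_row, enrolled_row) pairs."""
    return [(s, e) for s in students for e in enrollments if s["sid"] == e["sid"]]

def plan_filter_after_join():
    # pi name,cid ( sigma grade='A' ( student JOIN enrolled ) )
    joined = join(student, enrolled)
    return [(s["name"], e["cid"]) for s, e in joined if e["grade"] == "A"]

def plan_filter_before_join():
    # pi name,cid ( student JOIN sigma grade='A' ( enrolled ) )
    only_a = [e for e in enrolled if e["grade"] == "A"]
    return [(s["name"], e["cid"]) for s, e in join(student, only_a)]

# Both plans produce the same tuples; the second joins fewer rows.
print(sorted(plan_filter_after_join()) == sorted(plan_filter_before_join()))
```

The second plan hands the join two enrolled rows instead of three, which is the whole point of pushing the filter down.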


RELATIONAL ALGEBRA EQUIVALENCES

Selections:
→ Perform filters as early as possible.
→ Reorder predicates so that the DBMS applies the most selective one first.
→ Break a complex predicate, and push down:
   σp1∧p2∧…pn(R) = σp1(σp2(…σpn(R)))

Simplify a complex predicate:
→ (X=Y AND Y=3) → X=3 AND Y=3


RELATIONAL ALGEBRA EQUIVALENCES

Projections:
→ Perform them early to create smaller tuples and reduce intermediate results (if duplicates are eliminated)
→ Project out all attributes except the ones requested or required (e.g., joining keys)

This is not important for a column store…


PROJECTION PUSHDOWN

[Figure: two equivalent plans for the query below. Left: π s.name,e.cid over σ grade='A' over student ⨝ enrolled on s.sid=e.sid. Right: the same plan with extra projections below the join, π sid,name on student and π sid,cid on enrolled.]

SELECT s.name, e.cid
  FROM student AS s, enrolled AS e
 WHERE s.sid = e.sid
   AND e.grade = 'A'


MORE EXAMPLES

Source: Lukas Eder

CREATE TABLE A (
  id INT PRIMARY KEY,
  val INT NOT NULL
);

Impossible / Unnecessary Predicates
→ SELECT * FROM A WHERE 1 = 0;
   The predicate is always false, so the result is empty.
→ SELECT * FROM A WHERE 1 = 1;
   becomes: SELECT * FROM A;

Join Elimination
→ SELECT A1.*
     FROM A AS A1 JOIN A AS A2
       ON A1.id = A2.id;
   becomes: SELECT * FROM A;


MORE EXAMPLES

Source: Lukas Eder

CREATE TABLE A (
  id INT PRIMARY KEY,
  val INT NOT NULL
);

Ignoring Projections
→ SELECT * FROM A AS A1
    WHERE EXISTS (SELECT * FROM A AS A2
                   WHERE A1.id = A2.id);
   becomes: SELECT * FROM A;

Merging Predicates
→ SELECT * FROM A
    WHERE val BETWEEN 1 AND 100
       OR val BETWEEN 50 AND 150;
   becomes: SELECT * FROM A
             WHERE val BETWEEN 1 AND 150;
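The predicate-merging rewrite can be sketched as interval merging. The function below is a hypothetical helper, not how any particular DBMS implements it: it takes the closed ranges of OR-ed BETWEEN predicates on one column and collapses the ones that overlap.

```python
def merge_between_ranges(ranges):
    """Merge overlapping closed intervals (lo, hi) that are OR-ed together."""
    merged = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1]:
            # Overlaps the previous interval: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged

# val BETWEEN 1 AND 100 OR val BETWEEN 50 AND 150
print(merge_between_ranges([(1, 100), (50, 150)]))  # [(1, 150)]
```

Ranges that do not overlap are left as separate predicates, so the rewrite never changes which tuples qualify.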


RELATIONAL ALGEBRA EQUIVALENCES

Joins:
→ Commutative, associative
   R⋈S = S⋈R
   (R⋈S)⋈T = R⋈(S⋈T)

How many different orderings are there for an n-way join?


Catalan number ≈ 4^n
→ Exhaustive enumeration will be too slow.

We’ll see in a second how an optimizer limits the search space...
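To get a feel for the growth, the sketch below prints the n-th Catalan number next to 4^n for a few values of n, using the closed form C_n = C(2n, n) / (n + 1). The slide does not spell out exactly which quantity it counts (tree shapes only, or shapes combined with relation orderings), so treat this as an illustration of the growth rate and not as an exact plan count.

```python
from math import comb

def catalan(n):
    """n-th Catalan number: C_n = C(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)

# Print n, the n-th Catalan number, and 4^n side by side.
for n in (2, 4, 8, 16):
    print(n, catalan(n), 4 ** n)
```

Even at n = 16 the count is in the tens of millions, which is why the optimizer has to prune.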


COST ESTIMATION

How long will a query take?
→ CPU: Small cost; tough to estimate
→ Disk: # of block transfers
→ Memory: Amount of DRAM used
→ Network: # of messages

How many tuples will be read/written?

What statistics do we need to keep?


STATISTICS

The DBMS stores internal statistics about tables, attributes, and indexes in its internal catalog.

Different systems update them at different times.

Manual invocations:
→ Postgres/SQLite: ANALYZE
→ Oracle/MySQL: ANALYZE TABLE
→ SQL Server: UPDATE STATISTICS
→ DB2: RUNSTATS


STATISTICS

For each relation R, the DBMS maintains the following information:
→ NR: Number of tuples in R.
→ V(A,R): Number of distinct values for attribute A.


DERIVABLE STATISTICS

The selection cardinality SC(A,R) is the average number of records with a given value for attribute A:
→ SC(A,R) = NR / V(A,R)

Note that this assumes data uniformity.
→ 10,000 students, 10 colleges – how many students in SCS?
→ The uniform estimate is 10,000 / 10 = 1,000, whatever the real enrollment is.


SELECTION STATISTICS

Equality predicates on unique keys are easy to estimate.

SELECT * FROM people WHERE id = 123

What about more complex predicates? What is their selectivity?

SELECT * FROM people WHERE val > 1000

SELECT * FROM people
 WHERE age = 30
   AND status = 'Lit'


COMPLEX PREDICATES

The selectivity (sel) of a predicate P is the fraction of tuples that qualify.

Formula depends on type of predicate:
→ Equality
→ Range
→ Negation
→ Conjunction
→ Disjunction


SELECTIONS: COMPLEX PREDICATES

Assume that V(age,people) has five distinct values (0–4) and NR = 5

Equality Predicate: A=constant
→ sel(A=constant) = SC(P) / NR
→ Example: sel(age=2) = 1/5

SELECT * FROM people WHERE age = 2

[Figure: bar chart of count per age value (0–4), annotated with V(age,people)=5 and SC(age=2)=1]


SELECTIONS: COMPLEX PREDICATES

Range Query:
→ sel(A>=a) = (Amax – a) / (Amax – Amin)
→ Example: sel(age>=2) = (4 – 2) / (4 – 0) = 1/2

SELECT * FROM people WHERE age >= 2

[Figure: bar chart of count per age value (0–4), annotated with agemin = 0 and agemax = 4]


SELECTIONS: COMPLEX PREDICATES

Negation Query:
→ sel(not P) = 1 – sel(P)
→ Example: sel(age != 2) = 1 – (1/5) = 4/5

Observation: Selectivity ≈ Probability

SELECT * FROM people WHERE age != 2

[Figure: bar chart of count per age value (0–4), annotated with SC(age=2)=1 and SC(age!=2)=2 on each side of age=2]
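The equality, range, and negation formulas can be checked against the people/age example with a few lines of Python. This is a minimal sketch under the uniformity assumption, where the equality case reduces to 1 / V(A,R); the function names are made up for this example.

```python
from fractions import Fraction

def sel_equality(num_distinct):
    """sel(A = c) under uniformity: each distinct value is equally likely."""
    return Fraction(1, num_distinct)

def sel_range_ge(a, a_min, a_max):
    """sel(A >= a) = (Amax - a) / (Amax - Amin)."""
    return Fraction(a_max - a, a_max - a_min)

def sel_not(sel_p):
    """sel(not P) = 1 - sel(P)."""
    return 1 - sel_p

# The people/age example: five distinct ages, 0 through 4.
print(sel_equality(5))           # 1/5
print(sel_range_ge(2, 0, 4))     # 1/2
print(sel_not(sel_equality(5)))  # 4/5
```

Using `Fraction` keeps the results exact, so they can be compared directly with the values on the slides.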


SELECTIONS: COMPLEX PREDICATES

Conjunction:
→ sel(P1 ⋀ P2) = sel(P1) ∙ sel(P2)
→ sel(age=2 ⋀ name LIKE 'A%')

This assumes that the predicates are independent.

SELECT * FROM people
 WHERE age = 2
   AND name LIKE 'A%'

[Figure: two regions labeled P1 and P2]


SELECTIONS: COMPLEX PREDICATES

Disjunction:
→ sel(P1 ⋁ P2)
   = sel(P1) + sel(P2) – sel(P1 ⋀ P2)
   = sel(P1) + sel(P2) – sel(P1) ∙ sel(P2)
→ sel(age=2 OR name LIKE 'A%')

This again assumes that the selectivities are independent.

SELECT * FROM people
 WHERE age = 2
    OR name LIKE 'A%'

[Figure: two regions labeled P1 and P2]
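A short sketch of the conjunction and disjunction formulas under the independence assumption. The value used for sel(name LIKE 'A%') is hypothetical, since the slides do not give one; sel(age = 2) = 1/5 comes from the earlier example.

```python
from fractions import Fraction

def sel_and(sel_p1, sel_p2):
    """sel(P1 AND P2) = sel(P1) * sel(P2), assuming independence."""
    return sel_p1 * sel_p2

def sel_or(sel_p1, sel_p2):
    """sel(P1 OR P2) = sel(P1) + sel(P2) - sel(P1 AND P2)."""
    return sel_p1 + sel_p2 - sel_and(sel_p1, sel_p2)

sel_age = Fraction(1, 5)    # sel(age = 2) from the earlier example
sel_name = Fraction(1, 10)  # hypothetical value for sel(name LIKE 'A%')

print(sel_and(sel_age, sel_name))  # 1/50
print(sel_or(sel_age, sel_name))   # 7/25
```

If the two predicates are correlated in the real data, both estimates can be far off, which is the caveat the slide raises.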


RESULT SIZE ESTIMATION FOR JOINS

Given a join of R and S, what is the range of possible result sizes in # of tuples?

In other words, for a given tuple of R, how many tuples of S will it match?


RESULT SIZE ESTIMATION FOR JOINS

General case: Rcols ⋂ Scols = {A} where A is not a key for either table.
→ Match each R-tuple with S-tuples:
   estSize ≈ NR ∙ NS / V(A,S)
→ Symmetrically, for S:
   estSize ≈ NR ∙ NS / V(A,R)

Overall:
→ estSize ≈ NR ∙ NS / max({V(A,S), V(A,R)})
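The overall estimate is a one-liner. The statistics in the call below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def estimate_join_size(n_r, n_s, v_a_r, v_a_s):
    """estSize ~= N_R * N_S / max(V(A,R), V(A,S))."""
    return n_r * n_s / max(v_a_r, v_a_s)

# 1000 tuples in R, 200 in S; A has 50 distinct values in R and 20 in S.
print(estimate_join_size(n_r=1000, n_s=200, v_a_r=50, v_a_s=20))  # 4000.0
```

Dividing by the larger distinct-value count gives the smaller of the two per-side estimates.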


COST ESTIMATIONS

Our formulas are nice but we assume that data values are uniformly distributed.

[Figure: "Uniform Approximation". x-axis: distinct values of attribute (1–15); y-axis: # of occurrences]

[Figure: "Non-Uniform Approximation". x-axis: bucket ranges; y-axis: # of occurrences]

Bucket  Range  Count
#1      1-3    8
#2      4-6    4
#3      7-9    15
#4      10-12  3
#5      13-15  14


HISTOGRAMS WITH QUANTILES

A histogram type wherein the "spread" of each bucket is the same: bucket boundaries are chosen so that each bucket holds roughly the same number of occurrences.

[Figure: "Equi-depth Histogram (Quantiles)". x-axis: attribute values 1–15, regrouped into bucket ranges; y-axis: # of occurrences]

Bucket  Range  Count
#1      1-5    12
#2      6-8    12
#3      9-13   9
#4      14-15  12
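One way a histogram like this could feed an estimate is sketched below: look up the bucket that contains the value and assume values are spread uniformly inside that bucket. The bucket ranges and counts are taken from the quantile slide, but pairing them in order is an assumption, as are the function names and the integer-valued attribute.

```python
# (low, high, count) per bucket; pairing ranges with counts in order is an assumption.
BUCKETS = [(1, 5, 12), (6, 8, 12), (9, 13, 9), (14, 15, 12)]

def estimate_count(value, buckets=BUCKETS):
    """Estimated # of occurrences of `value`, assuming a uniform spread
    over the integer values inside its bucket."""
    for low, high, count in buckets:
        if low <= value <= high:
            return count / (high - low + 1)
    return 0.0  # value falls outside every bucket

def estimate_selectivity(value, buckets=BUCKETS):
    """Estimated fraction of all tuples that have A = value."""
    total = sum(count for _, _, count in buckets)
    return estimate_count(value, buckets) / total

print(estimate_count(7))   # 12 occurrences spread over 3 values -> 4.0
print(estimate_count(10))  # 9 occurrences spread over 5 values -> 1.8
```

The uniformity assumption is still there, but it now applies inside one bucket instead of across the whole column.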


SAMPLING

Modern DBMSs also collect samples from tables to estimate selectivities.

Update samples when the underlying tables change significantly.

SELECT AVG(age)
  FROM people
 WHERE age > 50

people (1 billion tuples):
id    name    age  status
1001  Obama   56   Rested
1002  Kanye   40   Weird
1003  Tupac   25   Dead
1004  Bieber  23   Crunk
1005  Andy    37   Lit
⋮

Table Sample:
id    name    age  status
1001  Obama   56   Rested
1003  Tupac   25   Dead
1005  Andy    37   Lit

sel(age>50) = 1/3
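The sample-based estimate is just the fraction of sampled rows that pass the predicate. The sketch below reproduces the 1/3 figure from the three sampled rows; the helper name `sample_selectivity` is made up for this example.

```python
# The three sampled rows from the slide: (id, name, age, status).
sample = [
    (1001, "Obama", 56, "Rested"),
    (1003, "Tupac", 25, "Dead"),
    (1005, "Andy", 37, "Lit"),
]

def sample_selectivity(rows, predicate):
    """Fraction of sampled rows that satisfy the predicate."""
    return sum(1 for row in rows if predicate(row)) / len(rows)

sel = sample_selectivity(sample, lambda row: row[2] > 50)
print(sel)                         # one of the three sampled rows qualifies
print(round(sel * 1_000_000_000))  # scaled up to the 1 billion tuple table
```

With only three rows the estimate is very coarse; a real sample would be much larger.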


OBSERVATION

Now that we can (roughly) estimate the selectivity of predicates, what can we actually do with them?


QUERY OPTIMIZATION

After performing rule-based rewriting, the DBMS will enumerate different plans for the query and estimate their costs.
→ Single relation.
→ Multiple relations.
→ Nested sub-queries.

It chooses the best plan it has seen for the query after exhausting all plans or some timeout.


SINGLE-RELATION QUERY PLANNING

Pick the best access method.
→ Sequential Scan
→ Binary Search (clustered indexes)
→ Index Scan

Simple heuristics are often good enough for this.

OLTP queries are especially easy.


OLTP QUERY PLANNING

Query planning for OLTP queries is easy because they are sargable.
→ Search Argument Able
→ It is usually just picking the best index.
→ Joins are almost always on foreign key relationships with a small cardinality.
→ Can be implemented with simple heuristics.


MULTI-RELATION QUERY PLANNING

As the number of joins increases, the number of alternative plans grows rapidly.
→ We need to restrict the search space.

Fundamental decision in System R: only left-deep join trees are considered.
→ Modern DBMSs do not always make this assumption anymore.


MULTI-RELATION QUERY PLANNING

Fundamental decision in System R: Only consider left-deep join trees.

[Figure: three join trees over A, B, C, D. Only the left-deep tree is kept; the other two are crossed out.]


MULTI-RELATION QUERY PLANNING

Fundamental decision in System R: Only consider left-deep join trees.

Allows for fully pipelined plans where intermediate results are not written to temp files.
→ Not all left-deep trees are fully pipelined.


MULTI-RELATION QUERY PLANNING

Enumerate the orderings
→ Example: Left-deep tree #1, Left-deep tree #2…

Enumerate the plans for each operator
→ Example: Hash, Sort-Merge, Nested Loop…

Enumerate the access paths for each table
→ Example: Index #1, Index #2, Seq Scan…

Use dynamic programming to reduce the number of cost estimations.


DYNAMIC PROGRAMMING

SELECT * FROM R, S, T
 WHERE R.a = S.a
   AND S.b = T.b

[Figure: dynamic-programming search from the base relations R, S, T to R ⨝ S ⨝ T, shown in several steps. First level: R ⨝ S via SortMerge Join or Hash Join on R.a=S.a, and T ⨝ S via SortMerge Join or Hash Join on T.b=S.b; the Hash Join is kept in both cases. Second level: each intermediate result is joined with the remaining relation via Hash Join or SortMerge Join (on S.b=T.b or S.a=R.a); Hash Join S.b=T.b and SortMerge Join S.a=R.a are kept. Final plan: Hash Join on T.b=S.b, then SortMerge Join on S.a=R.a.]
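The search in the figure can be sketched as a small dynamic program over subsets of relations. This is a simplified illustration, not System R's actual algorithm: it only chooses a left-deep join order (no join algorithm or access path), it skips cross-products, and it uses a made-up cost model (the sum of estimated intermediate result sizes). The cardinalities and join selectivities are hypothetical.

```python
from itertools import combinations

# Hypothetical statistics; a real optimizer would read these from the catalog.
CARD = {"R": 1000, "S": 100, "T": 10}
# Join predicates from the query (R.a = S.a, S.b = T.b) with made-up selectivities.
JOIN_SEL = {frozenset("RS"): 0.01, frozenset("ST"): 0.1}

def result_size(rels):
    """Estimated cardinality of joining all relations in `rels`."""
    size = 1.0
    for r in rels:
        size *= CARD[r]
    for pair, sel in JOIN_SEL.items():
        if pair <= rels:
            size *= sel
    return size

def connected(left, rel):
    """True if `rel` shares a join predicate with some relation in `left`."""
    return any(frozenset((l, rel)) in JOIN_SEL for l in left)

def best_left_deep_plan(relations):
    # best[subset] = (cost, join order); cost = sum of intermediate result sizes.
    best = {frozenset([r]): (0.0, (r,)) for r in relations}
    for k in range(2, len(relations) + 1):
        for subset in map(frozenset, combinations(relations, k)):
            for rel in sorted(subset):
                left = subset - {rel}
                if left not in best or not connected(left, rel):
                    continue  # skip cross-products and unreachable subsets
                cost = best[left][0] + result_size(subset)
                if subset not in best or cost < best[subset][0]:
                    best[subset] = (cost, best[left][1] + (rel,))
    return best[frozenset(relations)]

cost, order = best_left_deep_plan(["R", "S", "T"])
print(order, cost)
```

Each subset keeps only its cheapest plan, so larger subsets reuse those results instead of re-costing every ordering from scratch. With these particular numbers the sketch joins T and S first and then R.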


CANDIDATE PLAN EXAMPLE

How to generate plans for the search algorithm:
→ Enumerate relation orderings
→ Enumerate join algorithm choices
→ Enumerate access method choices

No real DBMS does it this way. It’s actually more messy…

SELECT * FROM R, S, T
 WHERE R.a = S.a
   AND S.b = T.b


CANDIDATE PLANS

Step #1: Enumerate relation orderings

[Figure: six orderings of R, S, T drawn as join trees. The two orderings that need a cross-product (×) are crossed out.]

Prune plans with cross-products immediately!


CANDIDATE PLANS

Step #2: Enumerate join algorithm choices

[Figure: one ordering of R, S, T expanded into four plans, one per combination of join algorithm for its two joins: NLJ/NLJ, HJ/NLJ, NLJ/HJ, HJ/HJ.]

Do this for the other plans.


CANDIDATE PLANS

Step #3: Enumerate access method choices

[Figure: the HJ/HJ plan over R, S, T expanded by access method, for example SeqScan on all three tables, or SeqScan on two tables and IndexScan(S.b) on the third.]

Do this for the other plans.


NESTED SUB-QUERIES

The DBMS treats nested sub-queries in the where clause as functions that take parameters and return a single value or set of values.

Two Approaches:
→ Rewrite to de-correlate and/or flatten them
→ Decompose nested query and store result to temporary table

50


NESTED SUB-QUERIES: REWRITE

SELECT name FROM sailors AS S
 WHERE EXISTS (
   SELECT * FROM reserves AS R
    WHERE S.sid = R.sid
      AND R.day = '2018-10-15'
 )

SELECT name
  FROM sailors AS S, reserves AS R
 WHERE S.sid = R.sid
   AND R.day = '2018-10-15'
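As a quick check of the rewrite, the sketch below runs both forms against a small in-memory SQLite database and compares the results. The table contents and the extra `bid` column are made up for illustration. Note that the join form can return a name more than once when a sailor has several matching reservations; the sample data assumes at most one reservation per sailor per day, so the two forms agree here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: one reservation per sailor per day.
conn.executescript("""
    CREATE TABLE sailors (sid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE reserves (sid INTEGER, bid INTEGER, day TEXT);
    INSERT INTO sailors VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cat');
    INSERT INTO reserves VALUES
        (1, 101, '2018-10-15'),
        (2, 102, '2018-10-16'),
        (3, 103, '2018-10-15');
""")

# Correlated form: the sub-query is evaluated per sailor.
nested = """
    SELECT name FROM sailors AS S
     WHERE EXISTS (
       SELECT * FROM reserves AS R
        WHERE S.sid = R.sid AND R.day = '2018-10-15')
"""

# Flattened form: the sub-query becomes a join.
flattened = """
    SELECT name FROM sailors AS S, reserves AS R
     WHERE S.sid = R.sid AND R.day = '2018-10-15'
"""

nested_names = sorted(row[0] for row in conn.execute(nested))
flattened_names = sorted(row[0] for row in conn.execute(flattened))
print(nested_names, flattened_names)
```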


NESTED SUB-QUERIES: DECOMPOSE

For each sailor with the highest rating (over all sailors) and at least two reservations for red boats, find the sailor id and the earliest date on which the sailor has a reservation for a red boat.

SELECT S.sid, MIN(R.day)
  FROM sailors S, reserves R, boats B
 WHERE S.sid = R.sid
   AND R.bid = B.bid
   AND B.color = 'red'
   AND S.rating = (SELECT MAX(S2.rating)
                     FROM sailors S2)
 GROUP BY S.sid
HAVING COUNT(*) > 1


DECOMPOSING QUERIES

For harder queries, the optimizer breaks up queries into blocks and then concentrates on one block at a time.

Sub-query results are written to temporary tables that are discarded after the query finishes.

DECOMPOSING QUERIES

SELECT S.sid, MIN(R.day)
  FROM sailors S, reserves R, boats B
 WHERE S.sid = R.sid
   AND R.bid = B.bid
   AND B.color = 'red'
   AND S.rating = (SELECT MAX(S2.rating)
                     FROM sailors S2)
 GROUP BY S.sid
HAVING COUNT(*) > 1

Nested Block:

SELECT MAX(rating) FROM sailors

Outer Block: the query above with the nested block replaced by its result (###).
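The decomposition can be mimicked by hand with SQLite from Python. This is a sketch of the idea only, not how any particular DBMS implements it: the nested block is run once, its result is stored in a temporary table (the name `max_rating` and the sample rows are hypothetical), and the outer block then reads that stored value instead of recomputing the aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data for the sailors / boats / reserves schema.
conn.executescript("""
    CREATE TABLE sailors (sid INTEGER PRIMARY KEY, rating INTEGER);
    CREATE TABLE boats (bid INTEGER PRIMARY KEY, color TEXT);
    CREATE TABLE reserves (sid INTEGER, bid INTEGER, day TEXT);
    INSERT INTO sailors VALUES (1, 10), (2, 10), (3, 7);
    INSERT INTO boats VALUES (101, 'red'), (102, 'red'), (103, 'blue');
    INSERT INTO reserves VALUES
        (1, 101, '2018-10-03'), (1, 102, '2018-10-01'),
        (2, 101, '2018-10-05'), (2, 103, '2018-10-02'),
        (3, 101, '2018-10-04'), (3, 102, '2018-10-06');
""")

# Nested block: evaluate once and keep the result in a temporary table.
conn.execute(
    "CREATE TEMP TABLE max_rating AS SELECT MAX(rating) AS rating FROM sailors"
)

# Outer block: the original query, reading the stored value.
outer_block = """
    SELECT S.sid, MIN(R.day)
      FROM sailors S, reserves R, boats B
     WHERE S.sid = R.sid
       AND R.bid = B.bid
       AND B.color = 'red'
       AND S.rating = (SELECT rating FROM max_rating)
     GROUP BY S.sid
    HAVING COUNT(*) > 1
"""
result = conn.execute(outer_block).fetchall()

# The temporary table is discarded once the query has finished.
conn.execute("DROP TABLE max_rating")
print(result)
```

With this sample data only sailor 1 qualifies: sailor 2 has the top rating but a single red-boat reservation, and sailor 3 has two red-boat reservations but a lower rating.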


CONCLUSION

Filter as early as possible.

Selectivity estimations
→ Uniformity
→ Independence
→ Histograms
→ Join selectivity

Dynamic programming for join orderings

Rewrite nested queries

Query optimization is really hard…
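As a reminder of how the uniformity and independence assumptions turn into numbers, here is a minimal sketch. The function names, the table size, and the distinct-value counts are all made up for illustration.

```python
from math import prod


def equality_selectivity(distinct_values: int) -> float:
    """sel(A = c) under uniformity: each distinct value is equally likely."""
    return 1.0 / distinct_values


def conjunction_selectivity(*selectivities: float) -> float:
    """sel(P1 AND P2 ...) under independence: multiply the selectivities."""
    return prod(selectivities)


def estimated_rows(table_rows: int, selectivity: float) -> float:
    """Estimated output cardinality of a filter over a table."""
    return table_rows * selectivity


# Hypothetical table: 1000 rows, 4 distinct colors, 10 distinct ratings.
sel_color = equality_selectivity(4)
sel_rating = equality_selectivity(10)
rows = estimated_rows(1000, conjunction_selectivity(sel_color, sel_rating))
print(f"estimated rows: {rows:.1f}")
```

Both assumptions are often wrong on real data (skewed values, correlated columns), which is one reason histograms are on the list above.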


Midterm Exam

Who: You

What: Midterm Exam

When: Wed Oct 17th 12:00pm ‐ 1:20pm

Where: Posner Mellon Auditorium

Why: https://youtu.be/xgMiaIPxSlc


MIDTERM

What to bring:
→ CMU ID
→ Calculator
→ One 8.5x11" page of notes (double-sided)

What not to bring:
→ Live animals
→ Your wet laundry


MIDTERM

Covers up to Joins (inclusive).
→ Closed book, one sheet of notes (double-sided)
→ Please email Andy if you need special accommodations.

https://15445.courses.cs.cmu.edu/fall2018/midterm-guide.html


RELATIONAL MODEL

Integrity Constraints

Relational Algebra


SQL

Basic operations:
→ SELECT / INSERT / UPDATE / DELETE
→ WHERE predicates
→ Output control

More complex operations:
→ Joins
→ Aggregates
→ Common Table Expressions


STORAGE

Buffer Management Policies
→ LRU / MRU / CLOCK

On-Disk File Organization
→ Heaps
→ Linked Lists

Page Layout
→ Slotted Pages
→ Log-Structured


HASHING

Static Hashing
→ Linear Probing
→ Robin Hood
→ Cuckoo Hashing

Dynamic Hashing
→ Extendible Hashing
→ Linear Hashing

Comparison with B+Trees


TREE INDEXES

B+Tree
→ Insertions / Deletions
→ Splits / Merges
→ Difference with B-Tree
→ Latch Crabbing / Coupling

Radix Trees

Skip Lists


SORTING

Two-way External Merge Sort

General External Merge Sort

Cost to sort different data sets with different numbers of buffers.


QUERY PROCESSING

Processing Models
→ Advantages / Disadvantages

Join Algorithms
→ Nested Loop
→ Sort-Merge
→ Hash


NEXT CLASS

Parallel Query Execution