CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

42
CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu

Transcript of CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

Page 1: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

CPS216: Advanced Database Systems

Notes 03:Query Processing (Overview, contd.)

Shivnath Babu

Page 2: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

parse

Query rewriting

Physical plan generation

execute

result

SQL query

parse tree

logical query planstatistics

physical query plan

QueryOptimization

Query Execution

Overview of Query

Processing

Page 3: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

parse

Query rewriting

Physical plan generation

execute

result

SQL query

parse tree

logical query planstatistics

physical query plan

Initial logical plan

“Best” logical plan

Logical plan

Rewrite rules

Page 4: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

Query Rewriting

B,D

R.A = “c” Λ R.C = S.C

X

R S

B,D

R.A = “c”

X

R S

R.C = S.C

We will revisit it towards the end of this lecture

Page 5: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

parse

Query rewriting

Physical plan generation

execute

result

SQL query

parse tree

Best logical query planstatistics

Best physical query plan

Page 6: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

Physical Plan Generation

B,D

R.A = “c”

R

S

Natural join

Best logical planR S

Index scan Table scan

Hash join

Project

Page 7: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

parse

Query rewriting

Physical plan generation

execute

result

SQL query

parse tree

Best logical query planstatistics

Best physical query plan

Enumerate possible physical plans

Find the cost of each plan

Pick plan with minimum cost

Page 8: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

Physical Plan Generation

Logical Query Plan

P1 P2 …. Pn

C1 C2 …. Cn

Pick minimum cost one

Physical plans

Costs

Page 9: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

Plans for Query Execution• Roadmap

– Path of a SQL query– Operator trees– Physical Vs Logical plans– Plumbing: Materialization Vs pipelining

Page 10: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

Modern DBMS Architecture

Disk(s)

Applications

OS

Parser

Query Optimizer

Query Executor

Storage Manager

Logical query plan

Physical query plan

Access method API calls

SQL

File system API callsStorage system API calls

DBMS

Page 11: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

Logical Plans Vs. Physical Plans

B,D

R.A = “c”

R

S

Natural join

Best logical planR S

Index scan Table scan

Hash join

Project

Page 12: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

B,D

R.A = “c”

R

S

Operator Plumbing

• Materialization: output of one operator written to disk, next operator reads from the disk

• Pipelining: output of one operator directly fed to next operator

Page 13: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

B,D

R.A = “c”

R

S

Materialization

Materialized here

Page 14: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

B,D

R.A = “c”

R

S

Iterators: Pipelining

Each operator supports:• Open()• GetNext()• Close()

Page 15: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

Iterator for Table Scan (R)

Open() { /** initialize variables */ b = first block of R; t = first tuple in block b;}

GetNext() { IF (t is past last tuple in block b) { set b to next block; IF (there is no next block) /** no more tuples */ RETURN EOT; ELSE t = first tuple in b; } /** return current tuple */ oldt = t; set t to next tuple in block b; RETURN oldt;}

Close() { /** nothing to be done */}

Page 16: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

Iterator for Select

Open() { /** initialize child */ Child.Open();}

GetNext() { LOOP: t = Child.GetNext(); IF (t == EOT) { /** no more tuples */ RETURN EOT; } ELSE IF (t.A == “c”) RETURN t; ENDLOOP:}

Close() { /** inform child */ Child.Close();}

R.A = “c”

Page 17: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

Iterator for Sort

Open() { /** Bulk of the work is here */ Child.Open(); Read all tuples from Child and sort them}

GetNext() { IF (more tuples) RETURN next tuple in order; ELSE RETURN EOT;}

Close() { /** inform child */ Child.Close();}

R.A

Page 18: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

• TNLJ (conceptually)

for each r Lexp do

for each s Rexp do

if Lexp.C = Rexp.C, output r,s

Iterator for Tuple Nested Loop Join

Lexp Rexp

Page 19: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

Example 1: Left-Deep Plan

R1(A,B)

TableScan

R2(B,C)

TableScan

R3(C,D)

TableScan

TNLJ

TNLJ

Question: What is the sequence of getNext() calls?

Page 20: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

Example 2: Right-Deep Plan

R3(C,D)

TableScan

TNLJ

R1(A,B)

TableScan

R2(B,C)

TableScan

TNLJ

Question: What is the sequence of getNext() calls?

Page 21: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

Example

Worked on blackboard

Page 22: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

Cost Measure for a Physical Plan

• There are many cost measures– Time to completion– Number of I/Os (we will see a lot of this)– Number of getNext() calls

• Tradeoff: Simplicity of estimation Vs. Accurate estimation of performance as seen by user

Page 23: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

Textbook outline

Chapter 15

15.1 Physical operators- Scan, Sort (Ch. 11.4), Indexes (Ch. 13)

15.2-15.6 Implementing operators +

estimating their cost

15.8 Buffer Management

15.9 Parallel Processing

Page 24: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

Chapter 16

16.1 Parsing

16.2 Algebraic laws

16.3 Parse tree logical query plan

16.4 Estimating result sizes

16.5-16.7 Cost based optimization

Textbook outline (contd.)

Page 25: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

Chapter 5 Relational Algebra

Chapter 6 SQL

Background Material

Page 26: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

parse

Query rewriting

Physical plan generation

execute

result

SQL query

parse tree

logical query planstatistics

physical query plan

Query Processing - In class order

2; 16.1

3; 16.2,16.3

1; 13, 15

4; 16.4—16.7

First: A quick look at this

Page 27: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

Why do we need Query Rewriting?

• Pruning the HUGE space of physical plans– Eliminating redundant conditions/operators– Rules that will improve performance with very

high probability

• Preprocessing– Getting queries into a form that we know how

to handle best

Reduces optimization time drastically without noticeably affecting quality

Page 28: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

Some Query Rewrite Rules

• Transform one logical plan into another– Do not use statistics

• Equivalences in relational algebra

• Push-down predicates

• Do projects early

• Avoid cross-products if possible

Page 29: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

Equivalences in Relational Algebra

R S = S R Commutativity

(R S) T = R (S T) Associativity

Also holds for: Cross Products, Union, Intersection

R x S = S x R

(R x S) x T = R x (S x T)

R U S = S U R

R U (S U T) = (R U S) U T

Page 30: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

Apply Rewrite Rule (1)

B,D [ R.C=S.C [R.A=“c”(R X S)]]

B,D

R.A = “c” Λ R.C = S.C

X

R S

B,D

R.A = “c”

X

R S

R.C = S.C

Page 31: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

Rules: Project

Let: X = set of attributes

Y = set of attributes

XY = X U Y

xy (R) = x [y (R)]

Page 32: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

Let p = predicate with only R attribs

q = predicate with only S attribs

m = predicate with only R,S attribs

p (R S) =

q (R S) =

Rules: combined

[p (R)] S

R [q (S)]

Page 33: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

Apply Rewrite Rule (2)

B,D [ R.C=S.C [R.A=“c”(R)] X S]

B,D

R.A = “c”

X

R

S

R.C = S.C

B,D

R.A = “c”

X

R S

R.C = S.C

Page 34: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

Apply Rewrite Rule (3)

B,D [[R.A=“c”(R)] S]

B,D

R.A = “c”

R

S

B,D

R.A = “c”

X

R

S

R.C = S.CNatural join

Page 35: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

Rules: combined (continued)

pq (R S) = [p (R)] [q (S)]

pqm (R S) =

m [(p R) (q S)]pvq (R S) =

[(p R) S] U [R (q S)]

Page 36: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

p1p2 (R) p1 [p2 (R)]

p (R S) [p (R)] S

R S S R

x [p (R)] x {p [xz (R)]}

Which are “good” transformations?

Page 37: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

Conventional wisdom: do projects early

Example: R(A,B,C,D,E) P: (A=3) (B=“cat”)

E {p (R)} vs. E {p{ABE(R)}}

Page 38: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

But: What if we have A, B indexes?

B = “cat” A=3

Intersect pointers to get

pointers to matching tuples

Page 39: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

Bottom line:

• No transformation is always good

• Some are usually good: – Push selections down– Avoid cross-products if possible– Subqueries Joins

Page 40: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

Avoid Cross Products (if possible)

• Which join trees avoid cross-products?• If you can't avoid cross products, perform

them as late as possible

Select B,DFrom R,S,T,UWhere R.A = S.B R.C=T.C R.D = U.D

Page 41: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

More Query Rewrite Rules

• Transform one logical plan into another– Do not use statistics

• Equivalences in relational algebra• Push-down predicates• Do projects early• Avoid cross-products if possible• Use left-deep trees• Subqueries Joins • Use of constraints, e.g., uniqueness

Page 42: CPS216: Advanced Database Systems Notes 03:Query Processing (Overview, contd.) Shivnath Babu.

Announcements• First homework will be posted next week

– I will send an email announcement when it is posted

• We will start discussing past projects and project ideas from next week– Your first goal is to have a concretely defined project by

the last week of September (the earlier the better!)– Small class single student projects– October and November should be used to work on the

projects– Final “demo’s” will be in the first week of December