Query Evaluation
Partially using Prof. Hector Garcia-Molina’s slides (Notes06, Notes07)
http://www-db.stanford.edu/~ullman/dscb.html
Donghui Zhang, Northeastern University
2
Query Evaluation
SQL Query:
SELECT E.Name FROM Emp E WHERE E.SSN<5000 AND E.Age>50
Server: ???
• Check the data and meta data;
• Produce query result
Query Result:
Michael Jordan
Donghui Zhang
3
Query Evaluation Steps
• Query Compiling: get logical Q.P.
• Query Optimization: choose a physical Q.P.
• Query Execution: execute
4
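SQL query
→ parse → parse tree
→ convert → logical query plan
→ apply laws → “improved” l.q.p.
→ estimate result sizes (using statistics) → l.q.p. + sizes
→ consider physical plans → {P1, P2, …}
→ estimate costs → {(P1,C1), (P2,C2), ...}
→ pick best → Pi
→ execute → answer
Stages: query compiling (parse, convert, apply laws); query optimization (estimate result sizes, consider physical plans, estimate costs, pick best); query execution (execute).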
5
Query Compiling Parse
• Background knowledge: Grammar.
• Input: SQL query.
• Output: a parse tree.
• Start with a simple grammar:
  – Only SFW (no group by, having, nested query)
  – Simple AND condition (no OR, UNION, EXISTS, IN, …)
  – One table (no conditions like E.did=D.did)
SELECT E.Name FROM Emp E WHERE E.SSN<5000 AND E.Age>50
6
Query Compiling Parse Grammar
• <SFW> := SELECT <SelList> FROM <Table> WHERE <CondList>
• <SelList> := <Attribute> | <Attribute>, <SelList>
• <CondList> := <Condition> | <Condition> AND <CondList>
• <Condition> := <Attribute> <op> <value>
• <op> := > | < | = | >= | <=
SELECT E.Name FROM Emp E WHERE E.SSN<5000 AND E.Age>50
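A grammar this small maps directly onto a recursive-descent parser. The sketch below is one possible illustration, not the parser of any particular DBMS: the names `tokenize` and `parse_sfw`, the nested-tuple tree shape, and the choice to read `<Table>` as a table name plus alias (as in `Emp E`) are all assumptions made for this example.

```python
import re

# Two-character operators come first so ">=" is not split into ">" and "=".
TOKEN = re.compile(r"\s*(>=|<=|[<>=,]|[A-Za-z_][\w.]*|\d+)")
OPS = {">", "<", "=", ">=", "<="}

def tokenize(sql):
    """Split a query string into keywords, names, numbers and operators."""
    sql, tokens, pos = sql.strip(), [], 0
    while pos < len(sql):
        m = TOKEN.match(sql, pos)
        if m is None:
            raise SyntaxError(f"unexpected text at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse_sfw(sql):
    """Parse <SFW> and return the parse tree as nested tuples."""
    toks, pos = tokenize(sql), 0

    def take():
        nonlocal pos
        if pos >= len(toks):
            raise SyntaxError("unexpected end of query")
        pos += 1
        return toks[pos - 1]

    def expect(word):
        if take().upper() != word:
            raise SyntaxError(f"expected {word}")

    def peek_is(word):
        return pos < len(toks) and toks[pos].upper() == word

    def sel_list():        # <SelList> := <Attribute> | <Attribute>, <SelList>
        attr = ("Attribute", take())
        if peek_is(","):
            take()
            return ("SelList", attr, sel_list())
        return ("SelList", attr)

    def condition():       # <Condition> := <Attribute> <op> <value>
        attr, op, val = take(), take(), take()
        if op not in OPS:
            raise SyntaxError(f"bad operator {op}")
        return ("Condition", ("Attribute", attr), ("op", op), ("value", val))

    def cond_list():       # <CondList> := <Condition> | <Condition> AND <CondList>
        cond = condition()
        if peek_is("AND"):
            take()
            return ("CondList", cond, cond_list())
        return ("CondList", cond)

    expect("SELECT")
    sel = sel_list()
    expect("FROM")
    table = ("Table", take() + " " + take())   # assumption: table name plus alias
    expect("WHERE")
    conds = cond_list()
    if pos != len(toks):
        raise SyntaxError("trailing tokens")
    return ("SFW", sel, table, conds)

tree = parse_sfw("SELECT E.Name FROM Emp E WHERE E.SSN<5000 AND E.Age>50")
```

Each grammar rule becomes one small function, and the right-recursive rules (`<SelList>`, `<CondList>`) become recursive calls, so the returned tuples have the same nesting as the parse tree.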
7
Query Compiling Parse Parse Tree
SELECT E.Name FROM Emp E WHERE E.SSN<5000 AND E.Age>50
<SFW>
  SELECT
  <SelList>
    <Attribute>: E.Name
  FROM
  <Table>: Emp E
  WHERE
  <CondList>
    <Condition>
      <Attribute>: E.SSN
      <op>: <
      <value>: 5000
    AND
    <CondList>
      <Condition>
        <Attribute>: E.Age
        <op>: >
        <value>: 50
8
Query Compiling Convert
• Input: a parse tree.
• Output: a logical query plan.
• Algorithm: $\sigma$ followed by $\pi$.
$\pi_{E.Name}(\sigma_{E.SSN<5000 \text{ AND } E.Age>50}(E))$
• Alternatively, a l.q.p tree (root first):
SELECT E.Name FROM Emp E WHERE E.SSN<5000 AND E.Age>50
$\pi_{E.Name}$
  $\sigma_{E.SSN<5000 \text{ AND } E.Age>50}$
    Emp E
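To make the logical plan concrete, here is a minimal sketch that evaluates "selection, then projection" over an in-memory list of rows. The helper names `select` and `project` and all sample rows are hypothetical (the SSN and Age values are made up for illustration); a real engine would read pages from disk rather than a Python list.

```python
# Hypothetical sample rows: the numbers are invented purely for illustration.
emp = [
    {"SSN": 1234, "Name": "Michael Jordan", "Age": 55},
    {"SSN": 4321, "Name": "Donghui Zhang", "Age": 51},
    {"SSN": 7777, "Name": "Alice", "Age": 60},
    {"SSN": 2222, "Name": "Bob", "Age": 30},
]

def select(rows, predicate):
    """Selection (sigma): keep the rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]

def project(rows, attrs):
    """Projection (pi): keep only the listed attributes of each row."""
    return [{a: r[a] for a in attrs} for r in rows]

# pi_{E.Name}( sigma_{E.SSN<5000 AND E.Age>50}( E ) )
result = project(
    select(emp, lambda r: r["SSN"] < 5000 and r["Age"] > 50),
    ["Name"],
)
```

The nesting of the two calls mirrors the l.q.p tree: the innermost call is the leaf side, the outermost call is the root.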
9
Query Compiling Apply Laws
• Replace $\sigma$ over $\times$ with $\bowtie$, push $\sigma$ [and $\pi$] down.
• Only used for multiple tables. So skip.
SELECT E.Name FROM Emp E WHERE E.SSN<5000 AND E.Age>50
$\pi_{E.Name}$
  $\sigma_{E.SSN<5000 \text{ AND } E.Age>50}$
    Emp E
11
Query Optimization Estimate Result Sizes
• The size of each input table is stored as meta data.
• Intermediate result: size not known, but needed to estimate I/O cost of physical plan.
• But for the simple case, $\pi$ can be evaluated on the fly, so there is no need to estimate the size of the $\sigma$ result. So skip.
SELECT E.Name FROM Emp E WHERE E.SSN<5000 AND E.Age>50
12
Query Optimization Consider Physical Plans
• Associate each RA operator with an implementation scheme.
• Multiple implementation schemes? Enumerate all.
SELECT E.Name FROM Emp E WHERE E.SSN<5000 AND E.Age>50
Plan 1 (always works!)
$\pi_{E.Name}$ [on-the-fly]
  $\sigma_{E.SSN<5000 \text{ AND } E.Age>50}$ [scan]
    Emp E
13
Query Optimization Consider Physical Plans
• For the other physical plans, need to know what indices exist.
• Primary index: controls the actual storage of a table.
  – Suppose a primary B+-tree index exists on SSN.
• Secondary index: built on some other attribute. Does not store the actual record. Each leaf entry stores a set of page IDs in the primary index.
  – Suppose a secondary B+-tree index exists on Age.
SELECT E.Name FROM Emp E WHERE E.SSN<5000 AND E.Age>50
e.g. entry in Age index: Age=50, pageIDs={1, 4, 6}
[Figure: SSN index with leaf pages 1 to 6]
14
Query Optimization Consider Physical Plans
SELECT E.Name FROM Emp E WHERE E.SSN<5000 AND E.Age>50
Plan 2
$\pi_{E.Name}$ [on-the-fly]
  $\sigma_{E.SSN<5000 \text{ AND } E.Age>50}$ [range search in SSN index]
    Emp E
15
Query Optimization Consider Physical Plans
SELECT E.Name FROM Emp E WHERE E.SSN<5000 AND E.Age>50
Plan 3
$\pi_{E.Name}$ [on-the-fly]
  $\sigma_{E.SSN<5000 \text{ AND } E.Age>50}$ [range search in Age index, follow pointers to SSN index]
    Emp E
16
Query Optimization Estimate Costs
• Estimate #I/Os for each physical plan.
• Pick the cheapest one.
• Input: physical plan.
• Additional input:
  – meta data (e.g. how many levels a B+-tree has)
  – assumptions (e.g. the root node of every B+-tree is pinned)
  – memory buffer size.
SELECT E.Name FROM Emp E WHERE E.SSN<5000 AND E.Age>50
17
Query Optimization Estimate Costs Meta Data
• All the database tables.
• For each table R:
  – Schema
  – T(R): #records in R
  – For every attribute A:
    • V(R, A): #distinct values of A
    • min(R, A): minimum value of A
    • max(R, A): maximum value of A
  – Primary index: #levels, #leaf nodes.
  – Secondary index: #levels, #leaf nodes, average #pageIDs per leaf entry.
SELECT E.Name FROM Emp E WHERE E.SSN<5000 AND E.Age>50
18
Query Optimization Estimate Costs sample input
• Assume for table E:
  – Schema = (SSN: int, Name: string, Age: int, Salary: int)
  – T(E) = 100 tuples.
  – For attribute SSN: V(E, SSN)=100, min(E, SSN)=0000, max(E, SSN)=9999
  – For attribute Age: V(E, Age)=20, min(E, Age)=21, max(E, Age)=60
  – Primary index on SSN: 3 level B+-tree, 50 leaf nodes.
  – Secondary index on Age: 2 level B+-tree, 10 leaf nodes, every leaf entry points to 3.5 pageIDs (on average).
• Assumptions: all B+-tree roots are pinned. Can reach the first leaf page of a B+-tree directly.
• Memory buffer size: 2 pages.
SELECT E.Name FROM Emp E WHERE E.SSN<5000 AND E.Age>50
19
Query Optimization Estimate Costs
SELECT E.Name FROM Emp E WHERE E.SSN<5000 AND E.Age>50
Plan 1 (always works!)
$\pi_{E.Name}$ [on-the-fly]
  $\sigma_{E.SSN<5000 \text{ AND } E.Age>50}$ [scan]
    Emp E
• Cost = 50. (The primary index has 50 leaf nodes. Assume we can reach the first leaf page of a B+-tree directly.)
20
Query Optimization Estimate Costs
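SELECT E.Name FROM Emp E WHERE E.SSN<5000 AND E.Age>50
Plan 2
$\pi_{E.Name}$ [on-the-fly]
  $\sigma_{E.SSN<5000 \text{ AND } E.Age>50}$ [range search in SSN index]
    Emp E
• Cost = 25. SSN<5000 selects half of the employees, so 50/2=25 leaf nodes.
• Note: if the condition is E.SSN>5000, 1 more I/O is needed, because the search has to descend the index to find the first qualifying leaf instead of starting at the first leaf page.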
21
Query Optimization Estimate Costs
SELECT E.Name FROM Emp E WHERE E.SSN<5000 AND E.Age>50
Plan 3
$\pi_{E.Name}$ [on-the-fly]
  $\sigma_{E.SSN<5000 \text{ AND } E.Age>50}$ [range search in Age index, follow pointers to SSN index]
    Emp E
• Cost = $\lceil 10/4 \rceil + \lceil 20/4 \times 3.5 \rceil = 3 + 18 = 21$, rounding each term up to whole I/Os.
• First term: #I/Os in the Age index. The Age index has 10 leaf nodes. Check 1/4 of them, since [51,60] is 1/4 of [21,60].
• Second term: #I/Os in the SSN index. 20 distinct ages divided by 4 to get #ages in [51,60], times 3.5 (#pageIDs per leaf entry).
24
Query Optimization Pick Best
SELECT E.Name FROM Emp E WHERE E.SSN<5000 AND E.Age>50
physical plan                      I/O cost
Plan 1: scan                       50
Plan 2: range search SSN index     25
Plan 3: range search Age index     21   (Pick!)
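The three single-table estimates can be reproduced with a few lines of arithmetic. This is only a sketch of the slides' cost model: the variable names are made up, and rounding each term up to a whole I/O is an assumption chosen because it reproduces the stated cost of 21 for Plan 3.

```python
import math

leaf_ssn = 50     # leaf nodes in the primary SSN index
leaf_age = 10     # leaf nodes in the secondary Age index
v_age = 20        # V(E, Age)
page_ids = 3.5    # average #pageIDs per Age leaf entry

sel_ssn = (5000 - 0) / (9999 - 0 + 1)      # fraction of the SSN range below 5000
sel_age = (60 - 51 + 1) / (60 - 21 + 1)    # [51,60] as a fraction of [21,60]

costs = {
    "Plan 1: scan": leaf_ssn,
    "Plan 2: range search SSN index": math.ceil(leaf_ssn * sel_ssn),
    # Age-index leaves visited, plus SSN-index pages fetched through the pointers.
    "Plan 3: range search Age index": math.ceil(leaf_age * sel_age)
    + math.ceil(v_age * sel_age * page_ids),
}
best = min(costs, key=costs.get)   # the plan with the fewest estimated I/Os
```

Under these numbers `best` is Plan 3, which matches the plan picked on the slide.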
26
Another case study: two tables.
• Extended grammar:
  – Only SFW (no group by, having, nested query)
  – Simple AND condition (no OR, UNION, EXISTS, IN, …)
  – Allow two tables (allow conditions like E.did=D.did)
• Example query:
SELECT E.Name, D.Dname FROM Emp E, Dept D WHERE E.Did=D.Did AND E.SSN<5000 AND D.budget=1000
27
Query Compiling Parse Grammar
• <SFW> := SELECT <SelList> FROM <TableList> WHERE <CondList>
• <SelList> := <Attribute> | <Attribute>, <SelList>
• <TableList> := <Table> | <Table>, <Table>
• <CondList> := <Condition> | <Condition> AND <CondList>
• <Condition> := <Attribute> <op> <value> | <Attribute> = <Attribute>
• <op> := > | < | = | >= | <=
SELECT E.Name, D.Dname FROM Emp E, Dept D WHERE E.Did=D.Did AND E.SSN<5000 AND D.budget=1000
28
Query Compiling Parse Parse Tree
SELECT E.Name, D.Dname FROM Emp E, Dept D WHERE E.Did=D.Did AND E.SSN<5000 AND D.budget=1000
<SFW>
  SELECT <SelList> FROM <TableList> WHERE <CondList>
<SelList>
  <Attribute>: E.Name
  ,
  <SelList>
    <Attribute>: D.Dname
29
Query Compiling Parse Parse Tree
SELECT E.Name, D.Dname FROM Emp E, Dept D WHERE E.Did=D.Did AND E.SSN<5000 AND D.budget=1000
<SFW>
  SELECT <SelList> FROM <TableList> WHERE <CondList>
<TableList>
  <Table>: Emp E
  ,
  <Table>: Dept D
30
Query Compiling Parse Parse Tree
SELECT E.Name, D.Dname FROM Emp E, Dept D WHERE E.Did=D.Did AND E.SSN<5000 AND D.budget=1000
<SFW>
  SELECT <SelList> FROM <TableList> WHERE <CondList>
<CondList>
  <Condition>
    <Attribute>: E.Did
    =
    <Attribute>: D.Did
  AND
  <CondList>
    <Condition>
    AND
    <CondList>
      <Condition>
31
Query Compiling Convert
• Algorithm: $\times$ then $\sigma$ then $\pi$.
$\pi_{E.Name, D.Dname}(\sigma_{E.Did=D.Did \text{ AND } E.SSN<5000 \text{ AND } D.budget=1000}(E \times D))$
• The l.q.p tree:
SELECT E.Name, D.Dname FROM Emp E, Dept D WHERE E.Did=D.Did AND E.SSN<5000 AND D.budget=1000
$\pi_{E.Name, D.Dname}$
  $\sigma_{E.Did=D.Did \text{ AND } E.SSN<5000 \text{ AND } D.budget=1000}$
    $\times$
      Emp E
      Dept D
32
Query Compiling Apply Laws
• Always always: (try to) replace $\sigma$ over $\times$ with $\bowtie$!
SELECT E.Name, D.Dname FROM Emp E, Dept D WHERE E.Did=D.Did AND E.SSN<5000 AND D.budget=1000
$\pi_{E.Name, D.Dname}$
  $\sigma_{E.Did=D.Did \text{ AND } E.SSN<5000 \text{ AND } D.budget=1000}$
    $\times$
      Emp E
      Dept D
33
Query Compiling Apply Laws
• Always always: (try to) replace $\sigma$ over $\times$ with $\bowtie$!
• Also, push $\sigma$ down.
SELECT E.Name, D.Dname FROM Emp E, Dept D WHERE E.Did=D.Did AND E.SSN<5000 AND D.budget=1000
$\pi_{E.Name, D.Dname}$
  $\sigma_{E.SSN<5000 \text{ AND } D.budget=1000}$
    $\bowtie$
      Emp E
      Dept D
35
Query Compiling Apply Laws
• Always always: (try to) replace $\sigma$ over $\times$ with $\bowtie$!
• Also, push $\sigma$ down.
SELECT E.Name, D.Dname FROM Emp E, Dept D WHERE E.Did=D.Did AND E.SSN<5000 AND D.budget=1000
$\pi_{E.Name, D.Dname}$
  $\bowtie$
    $\sigma_{E.SSN<5000}$
      Emp E
    $\sigma_{D.budget=1000}$
      Dept D
36
Query Compiling Apply Laws Theory Behind
• Let
  p = predicate with only E attributes
  q = predicate with only D attributes
  m = E & D’s common attributes are equal
• We have:
$\sigma_{p \wedge q \wedge m}(E \times D) = \sigma_p(E) \bowtie \sigma_q(D)$
SELECT E.Name, D.Dname FROM Emp E, Dept D WHERE E.Did=D.Did AND E.SSN<5000 AND D.budget=1000
38
Query Optimization Consider Physical Plans
• Because join is so important, let’s skip result size estimation for now, and let’s assume selections are not pushed down.
SELECT E.Name, D.Dname FROM Emp E, Dept D WHERE E.Did=D.Did AND E.SSN<5000 AND D.budget=1000
$\pi_{E.Name, D.Dname}$
  $\sigma_{E.SSN<5000 \text{ AND } D.budget=1000}$
    $\bowtie$
      Emp E
      Dept D
39
Four Join Algorithms
• Iteration join (nested loop join)
• Merge join
• Hash join
• Join with index
40
Example: $E \bowtie D$ over common attribute Did
• E:
  – T(E)=10,000
  – primary index on SSN, 3 levels.
  – |E| = 1,000 leaf nodes.
• D:
  – T(D)=5,000
  – primary index on Did. 3 levels.
  – |D| = 500 leaf nodes.
• Memory available = 101 blocks
41
Iteration Join
1. for every block in E
2.     scan through D;
3.     join records in the E block with records in the D block.
• I/O cost = |E| + |E| * |D| = 1000 + 1000*500 = 501,000.
• Works well with a small buffer (e.g. two blocks).
42
• Can we do better? Use our memory:
(1) Read 100 blocks of E
(2) Read all of D (using 1 block) + join
(3) Repeat until done
• I/O cost = |E| + |E|/100 * |D| = 1000 + 10*500 = 6,000.
43
• Can we do better? Reverse join order: $D \bowtie E$, i.e. for every 100 D blocks, go through E.
• I/O cost = |D| + |D|/100 * |E| = 500 + 5*1000 = 5,500.
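The three iteration-join costs above all follow one formula, sketched below. The function name `nested_loop_cost` is hypothetical, and the model follows the slides in reserving one memory block for scanning the inner table and ignoring output cost.

```python
import math

def nested_loop_cost(outer_blocks, inner_blocks, memory_blocks):
    """I/Os for block nested loop join: read the outer table once,
    and scan the whole inner table once per memory-sized chunk of the outer."""
    chunk = memory_blocks - 1                  # one block is kept for the inner scan
    chunks = math.ceil(outer_blocks / chunk)   # number of passes over the inner table
    return outer_blocks + chunks * inner_blocks

E, D = 1000, 500                                 # |E| and |D| in blocks
cost_two_blocks = nested_loop_cost(E, D, 2)      # 1000 + 1000*500
cost_e_outer = nested_loop_cost(E, D, 101)       # 1000 + 10*500
cost_d_outer = nested_loop_cost(D, E, 101)       # 500 + 5*1000
```

Putting the smaller table on the outside wins because the outer size determines how many times the inner table is rescanned.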
44
• Merge join (conceptually)
(1) if R1 and R2 not sorted, sort them
(2) i ← 1; j ← 1;
    While (i ≤ T(R1)) ∧ (j ≤ T(R2)) do
        if R1{ i }.C = R2{ j }.C then outputTuples
        else if R1{ i }.C > R2{ j }.C then j ← j+1
        else if R1{ i }.C < R2{ j }.C then i ← i+1
45
Procedure Output-Tuples
While (R1{ i }.C = R2{ j }.C) ∧ (i ≤ T(R1)) do
    [ jj ← j;
      while (R1{ i }.C = R2{ jj }.C) ∧ (jj ≤ T(R2)) do
          [ output pair R1{ i }, R2{ jj };
            jj ← jj+1 ]
      i ← i+1 ]
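The merge join pseudocode translates almost line for line into runnable code. The sketch below assumes both inputs are in-memory lists already sorted on the join key, uses 0-based indices instead of the 1-based ones on the slides, and introduces a hypothetical `key` parameter to extract the join attribute C.

```python
def merge_join(r1, r2, key=lambda t: t):
    """Join two lists that are already sorted on the join key."""
    out, i, j = [], 0, 0
    while i < len(r1) and j < len(r2):
        a, b = key(r1[i]), key(r2[j])
        if a > b:
            j += 1
        elif a < b:
            i += 1
        else:
            # Output-Tuples: pair every R1 tuple carrying this key
            # with the whole run of equal-keyed R2 tuples.
            while i < len(r1) and key(r1[i]) == b:
                jj = j
                while jj < len(r2) and key(r2[jj]) == b:
                    out.append((r1[i], r2[jj]))
                    jj += 1
                i += 1
            # i is now past this key; j catches up through the a > b branch.
    return out

# Join-column values taken from the example table in the slides.
r1 = [10, 20, 20, 30, 40]
r2 = [5, 20, 20, 30, 30, 50, 52]
pairs = merge_join(r1, r2)
```

As in the pseudocode, `j` is deliberately left at the start of the matching run, so that a later R1 tuple with the same key sees all of its R2 partners.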
46
Example
i    R1{i}.C    R2{j}.C    j
1    10         5          1
2    20         20         2
3    20         20         3
4    30         30         4
5    40         30         5
                50         6
                52         7
47
Merge Join Cost
• Recall that |E|=1000, |D|=500. And D is already sorted on Did.
• External sort E: pass 0, by reading and writing E, produces a file with 10 sorted runs. Another read is enough.
• No need to write! Can pipeline to join operator.
• Cost = 3*1000 + 500 = 3,500.
48
• Hash join (conceptual)
  – Hash function h, range 0 → k
  – Buckets for R1: G0, G1, ... Gk
  – Buckets for R2: H0, H1, ... Hk
Algorithm
(1) Hash R1 tuples into G buckets
(2) Hash R2 tuples into H buckets
(3) For i = 0 to k do
        match tuples in Gi, Hi buckets
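The three steps of the conceptual hash join can be sketched in memory as follows. The function name `hash_join`, the `buckets` parameter, and the default hash (value modulo number of buckets) are assumptions for this illustration; a disk-based version would write each bucket to a file before the matching phase.

```python
def hash_join(r1, r2, buckets=2, h=lambda value, n: value % n):
    """Partition both inputs with the same hash function, then match bucket by bucket."""
    g = [[] for _ in range(buckets)]    # G buckets for R1
    hb = [[] for _ in range(buckets)]   # H buckets for R2
    for t in r1:                        # (1) hash R1 tuples into G buckets
        g[h(t, buckets)].append(t)
    for t in r2:                        # (2) hash R2 tuples into H buckets
        hb[h(t, buckets)].append(t)
    out = []
    for i in range(buckets):            # (3) only tuples in Gi and Hi can join
        out.extend((a, b) for a in g[i] for b in hb[i] if a == b)
    return g, hb, out

# Two sample inputs; with 2 buckets the hash separates even from odd values.
r1 = [2, 4, 3, 5, 8, 9]
r2 = [5, 4, 12, 3, 13, 8, 11, 14]
g, hb, matches = hash_join(r1, r2)
```

Matching only within bucket pairs is what makes the join cheap: tuples that hash differently can never be equal, so they are never compared.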
49
Simple example hash: even/odd
R1: 2, 4, 3, 5, 8, 9
R2: 5, 4, 12, 3, 13, 8, 11, 14
Buckets    R1         R2
Even:      2 4 8      4 12 8 14
Odd:       3 5 9      5 3 13 11
50
Hash Join Cost
• Read + write both E and D for partitioning, then read to join.
• Cost = 3 * (1000 + 500) = 4,500.
51
• Join with index (Conceptually)
For each $r \in E$ do
    Find the corresponding D tuple by probing index.
• Assuming the root is pinned in memory, Cost = |E| + T(E)*2 = 1000 + 10,000*2 = 21,000.
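For a side-by-side view, the merge, hash, and index join estimates for this example can be recomputed from the table statistics. The variable names below are made up for the sketch; the formulas are the ones stated on the slides.

```python
E_blocks, D_blocks = 1000, 500   # |E| and |D| in leaf blocks
E_tuples = 10_000                # T(E)
index_levels = 3                 # levels of the primary index on D.Did

# Merge join: sort E (read, write runs, read again); D is already sorted on Did.
merge_cost = 3 * E_blocks + D_blocks
# Hash join: read and write both tables to partition them, then read to join.
hash_cost = 3 * (E_blocks + D_blocks)
# Index join: scan E, and probe D's index once per E tuple; with the root
# pinned, each probe reads the remaining (levels - 1) nodes.
index_cost = E_blocks + E_tuples * (index_levels - 1)
```

The index join is the most expensive here because it pays two I/Os per E tuple rather than per block.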
52
Note:
• The costs are different if selection conditions are integrated!
• E.g. for the index join, only half of E needs to be checked, so the cost should be 500+5,000*2=10,500.
• A selection condition that is not used during the join should be evaluated to filter the join result. E.g. the index join checked D without evaluating the selection condition on D.
53
physical plan with selections being pushed down
• Finally, let’s consider pushing down selections.
• Now that the join operator takes intermediate results (which could be written to disk), we need to estimate their sizes…
SELECT E.Name, D.Dname FROM Emp E, Dept D WHERE E.Did=D.Did AND E.SSN<5000 AND D.budget=1000
$\pi_{E.Name, D.Dname}$
  $\bowtie$
    $\sigma_{E.SSN<5000}$
      Emp E
    $\sigma_{D.budget=1000}$
      Dept D
55
Estimating result size
• Keep statistics for relation R
  – T(R) : # tuples in R
  – S(R) : # of bytes in each R tuple
  – V(R, A) : # distinct values in R for attribute A
  – min(R, A)
  – max(R, A)
56
Example: relation R
A: 20 byte string
B: 4 byte integer
C: 8 byte date
D: 5 byte string
A    B  C   D
cat  1  10  a
cat  1  20  b
dog  1  30  a
dog  1  40  c
bat  1  50  d
T(R) = 5, S(R) = 37
V(R,A) = 3, V(R,B) = 1, V(R,C) = 5, V(R,D) = 4
57
Size estimates for $W = R1 \times R2$
$T(W) = T(R1) \times T(R2)$
$S(W) = S(R1) + S(R2)$
58
Size estimate for $W = \sigma_{A=a}(R)$
S(W) = S(R)
T(W) = ?
59
Example: relation R
V(R,A)=3, V(R,B)=1, V(R,C)=5, V(R,D)=4
A    B  C   D
cat  1  10  a
cat  1  20  b
dog  1  30  a
dog  1  40  c
bat  1  50  d
$W = \sigma_{Z=val}(R)$
$T(W) = \frac{T(R)}{V(R,Z)}$
60
Assumption:
Values in select expression Z = val are uniformly distributed over possible V(R,Z) values.
61
What about $W = \sigma_{Z \ge val}(R)$?
T(W) = ?
• T(W) = T(R)/2?
62
• Solution: Estimate values in range
Example: R with attribute Z, Min=1, Max=20, V(R,Z)=10
$W = \sigma_{Z \ge 16}(R)$
$f = \frac{5}{20}$ (fraction of range)
$T(W) = f \times T(R)$
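Both selection estimates are one-line formulas; the sketch below writes them as functions. The function names are hypothetical, the range estimate assumes an integer-valued attribute with a "greater than or equal" condition, and T(R) = 100 in the range example is a made-up value, since the slide gives no T(R) for that example.

```python
def est_equality(t_r, v_r_z):
    """T(W) for W = sigma_{Z=val}(R), assuming values are uniformly distributed."""
    return t_r / v_r_z

def est_range_ge(t_r, lo, hi, val):
    """T(W) for W = sigma_{Z>=val}(R): the fraction of the integer range
    [lo, hi] at or above val, times T(R)."""
    f = (hi - val + 1) / (hi - lo + 1)
    return f * t_r

# Equality on attribute A of the five-tuple example relation: T(R)=5, V(R,A)=3.
t_equal = est_equality(5, 3)
# Range example: Min=1, Max=20, Z >= 16, so f = 5/20. T(R)=100 is hypothetical.
t_range = est_range_ge(100, 1, 20, 16)
```

The range estimate uses only min and max, not V(R,Z), so it implicitly assumes the values are spread evenly over the whole range.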
63
Size estimate for $W = R1 \bowtie R2$
Let X = attributes of R1, Y = attributes of R2
Case 1: $X \cap Y = \emptyset$
Same as $R1 \times R2$
64
Case 2: $W = R1 \bowtie R2$, $X \cap Y = A$
R1(A, B, C), R2(A, D)
Assumption:
$V(R1,A) \le V(R2,A) \Rightarrow$ Every A value in R1 is in R2
$V(R2,A) \le V(R1,A) \Rightarrow$ Every A value in R2 is in R1
65
Computing T(W) when $V(R1,A) \le V(R2,A)$
R1(A, B, C), R2(A, D)
Take 1 tuple from R1 and match it against R2.
1 tuple matches with $\frac{T(R2)}{V(R2,A)}$ tuples...
so $T(W) = \frac{T(R2)}{V(R2,A)} \times T(R1)$
66
• $V(R1,A) \le V(R2,A)$: $T(W) = \frac{T(R2) \times T(R1)}{V(R2,A)}$
• $V(R2,A) \le V(R1,A)$: $T(W) = \frac{T(R2) \times T(R1)}{V(R1,A)}$
[A is common attribute]
67
In general, for $W = R1 \bowtie R2$:
$T(W) = \frac{T(R2) \times T(R1)}{\max\{ V(R1,A), V(R2,A) \}}$
68
$S(W) = S(R1) + S(R2) - S(A)$, where S(A) is the size of attribute A
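The join size rules can be written as two small functions. This is only a sketch: the function names are hypothetical and the numbers in the usage lines are invented for illustration, not taken from the slides.

```python
def est_join_tuples(t1, t2, v1_a, v2_a):
    """T(W) for W = R1 join R2 on common attribute A:
    T(R1) * T(R2) / max(V(R1,A), V(R2,A))."""
    return t1 * t2 / max(v1_a, v2_a)

def est_join_width(s1, s2, s_a):
    """S(W): both tuple widths, counting the common attribute A only once."""
    return s1 + s2 - s_a

# Hypothetical statistics: T(R1)=100, T(R2)=50, V(R1,A)=10, V(R2,A)=25.
t_w = est_join_tuples(100, 50, 10, 25)
# Hypothetical widths: S(R1)=37, S(R2)=20, and A is a 4-byte attribute.
s_w = est_join_width(37, 20, 4)
```

Dividing by the larger V covers both containment cases at once, which is why the general formula uses the max.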
69
Note: for complex expressions, need intermediate T, S, V results.
E.g. $W = [\sigma_{A=a}(R1)] \bowtie R2$
Treat $\sigma_{A=a}(R1)$ as relation U
T(U) = T(R1)/V(R1,A), S(U) = S(R1)
Also need V(U, *) !!
70
To estimate Vs
E.g., $U = \sigma_{A=a}(R1)$. Say R1 has attribs A, B, C, D
V(U, A) = ?  V(U, B) = ?  V(U, C) = ?  V(U, D) = ?
71
Example: relation R1
V(R1,A)=3, V(R1,B)=1, V(R1,C)=5, V(R1,D)=3
$U = \sigma_{A=a}(R1)$
A    B  C   D
cat  1  10  10
cat  1  20  20
dog  1  30  10
dog  1  40  30
bat  1  50  10
V(U,A) = 1, V(U,B) = 1, $V(U,C) = \frac{T(R1)}{V(R1,A)}$
V(U,D) ... somewhere in between
72
For an arbitrary attribute D other than A (the attribute being selected), V(R1,D) ranges from 1 to T(R1), and V(U,D) ranges from 1 to T(R1)/V(R1,A).
Let’s make
$\frac{V(R1,D)}{T(R1)} = \frac{V(U,D)}{T(R1)/V(R1,A)}$
Or, $V(U,D) = \frac{V(R1,D)}{V(R1,A)}$
73
For Joins $U = R1(A,B) \bowtie R2(A,C)$
V(U,A) = min { V(R1, A), V(R2, A) }
V(U,B) = V(R1, B)
V(U,C) = V(R2, C)
74
Example:
$Z = R1(A,B) \bowtie R2(B,C) \bowtie R3(C,D)$
R1: T(R1) = 1000, V(R1,A)=50, V(R1,B)=100
R2: T(R2) = 2000, V(R2,B)=200, V(R2,C)=300
R3: T(R3) = 3000, V(R3,C)=90, V(R3,D)=500
75
Partial Result: $U = R1 \bowtie R2$
$T(U) = \frac{1000 \times 2000}{200}$
V(U,A) = 50, V(U,B) = 100, V(U,C) = 300
76
$Z = U \bowtie R3$
$T(Z) = \frac{1000 \times 2000 \times 3000}{200 \times 300}$
V(Z,A) = 50, V(Z,B) = 100, V(Z,C) = 90, V(Z,D) = 500
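The three-relation example can be checked by applying the join rules twice, carrying the intermediate T and V values along. The helper `join_stats` and its dictionary representation of the V values are assumptions made for this sketch.

```python
def join_stats(t1, v1, t2, v2, attr):
    """Estimate T and the V values for R1 join R2 on one common attribute `attr`.
    v1 and v2 map attribute names to V(R, attribute)."""
    # Integer division is safe here because the example numbers divide evenly.
    t = t1 * t2 // max(v1[attr], v2[attr])
    v = {**v1, **v2}                      # non-join attributes keep their V
    v[attr] = min(v1[attr], v2[attr])     # join attribute: the smaller V survives
    return t, v

# U = R1 join R2 on B
t_u, v_u = join_stats(1000, {"A": 50, "B": 100}, 2000, {"B": 200, "C": 300}, "B")
# Z = U join R3 on C
t_z, v_z = join_stats(t_u, v_u, 3000, {"C": 90, "D": 500}, "C")
```

Evaluating the fractions this way gives T(U) = 10,000 and T(Z) = 100,000, and the V values agree with the ones listed for U and Z.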
77
Example
SELECT E.Name, D.Dname FROM Emp E, Dept D WHERE E.Did=D.Did AND E.SSN<5000 AND D.budget=1000
• E:
  – T(E)=10,000
  – primary index on SSN, 3 levels.
  – |E| = 1,000 leaf nodes.
  – V(E,SSN)=10,000: from 0000 to 9999.
• D:
  – T(D)=5,000
  – primary index on Did. 3 levels.
  – |D| = 500 leaf nodes.
  – V(D,budget)=20: from 100 to 10,000.
• Memory available = 11 blocks
• ?? What’s the best physical plan?
Note: |E’| = 500, |D’| = 25, where E’ and D’ denote the results of the selections on E and D.
78
SELECT E.Name, D.Dname FROM Emp E, Dept D WHERE E.Did=D.Did AND E.SSN<5000 AND D.budget=1000
l.q.p
$\pi_{E.Name, D.Dname}$
  $\bowtie$
    $\sigma_{E.SSN<5000}$
      Emp E
    $\sigma_{D.budget=1000}$
      Dept D
79
SELECT E.Name, D.Dname FROM Emp E, Dept D WHERE E.Did=D.Did AND E.SSN<5000 AND D.budget=1000
p.q.p #1
$\pi_{E.Name, D.Dname}$
  $\bowtie$ [iteration join; D is outer table]
    $\sigma_{E.SSN<5000}$ [range search]
      Emp E
    $\sigma_{D.budget=1000}$ [scan]
      Dept D
Cost = 500 (read D) + 25 (write D’) + 25 + ceiling(25/10)*500 = 2050
80
SELECT E.Name, D.Dname FROM Emp E, Dept D WHERE E.Did=D.Did AND E.SSN<5000 AND D.budget=1000
p.q.p #2
$\pi_{E.Name, D.Dname}$
  $\bowtie$ [sort merge]
    $\sigma_{E.SSN<5000}$ [range search]
      Emp E
    $\sigma_{D.budget=1000}$ [scan]
      Dept D
Cost = 5*500 (sort E’; no write) + 500 (read D) = 3000
81
SELECT E.Name, D.Dname FROM Emp E, Dept D WHERE E.Did=D.Did AND E.SSN<5000 AND D.budget=1000
p.q.p #3
$\pi_{E.Name, D.Dname}$
  $\bowtie$ [hash join]
    $\sigma_{E.SSN<5000}$ [range search]
      Emp E
    $\sigma_{D.budget=1000}$ [scan]
      Dept D
Cost = 3*500 (for E’) + 500 (read D) + 25 (write D’) + 3*25 (for D’) = 2100
Note: M should be bigger than sqrt(min{|E’|, |D’|})+1.
  - Why?
  - What if not?
82
SELECT E.Name, D.Dname FROM Emp E, Dept D WHERE E.Did=D.Did AND E.SSN<5000 AND D.budget=1000
p.q.p #4
$\pi_{E.Name, D.Dname}$
  $\sigma_{D.budget=1000}$ (applied to the join result)
    $\bowtie$ [index nested loop join]
      $\sigma_{E.SSN<5000}$ [range search]
        Emp E
      Dept D (probed through its primary index on Did)
Cost = 500 (scan E’) + 5000*(3-1) (for D) = 10,500
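Summing the terms listed for each physical plan gives a quick comparison. The sketch below only redoes the slides' arithmetic; the variable names and dictionary keys are made up. Note that the terms listed for the hash join plan add up to 2,100.

```python
import math

E_sel, D_all, D_sel = 500, 500, 25   # |E'|, |D| and |D'| in blocks
T_E_sel = 5000                       # tuples in E' (half of T(E) = 10,000)
M = 11                               # memory blocks
index_levels = 3                     # levels of the primary index on D.Did

plans = {
    # read D, write D', read D', then scan E' once per (M - 1)-block chunk of D'
    "#1 iteration join, D outer": D_all + D_sel + D_sel
    + math.ceil(D_sel / (M - 1)) * E_sel,
    # sort E' (5 passes over 500 blocks, final result pipelined), read D
    "#2 sort merge": 5 * E_sel + D_all,
    # 3 passes over E', read D, write D', 3 passes over D'
    "#3 hash join": 3 * E_sel + D_all + D_sel + 3 * D_sel,
    # scan E', then probe D's index once per E' tuple (root pinned)
    "#4 index nested loop join": E_sel + T_E_sel * (index_levels - 1),
}
best = min(plans, key=plans.get)   # cheapest plan under these estimates
```

Under these estimates the iteration join with D as the outer table is cheapest, because D’ is small enough that E’ is rescanned only three times.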
83
Some notes
• For BNL (block nested loop), merge, and hash joins: always push selection!
• For index join, do not push selection on the inner table (the one whose primary key is involved in the join condition).
• For BNL, make the smaller table be the outer table – join could be free if it fits in memory!