09 Rewrites
Transcript of 09 Rewrites
Relational Algebra Equivalencies
Feb 19, 2014
Static Hashing
[Figure: keys k are mapped by h(k) % N to one of N primary bucket pages (stored contiguously, numbered 0, 1, 2, …, N-1); each primary page chains to a linked list of overflow pages.]
Problem
• Overflow pages are expensive.
• Resizing a hash table is slow.
What do we do about it?
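Both problems show up in even a toy implementation. The sketch below is illustrative Python with assumed names (`StaticHashTable` is not from the slides): keys go to bucket h(k) % N with N fixed at creation, extra keys spill into overflow lists, and any resize would mean rehashing every key into a new table.

```python
# Minimal sketch of static hashing with overflow chains (illustrative, not the
# course's code). N primary buckets are fixed; full buckets spill to overflow.
class StaticHashTable:
    def __init__(self, n_buckets=4, capacity=2):
        self.n = n_buckets
        self.capacity = capacity                        # keys per primary page
        self.primary = [[] for _ in range(n_buckets)]
        self.overflow = [[] for _ in range(n_buckets)]  # stands in for a linked
                                                        # list of overflow pages
    def insert(self, k):
        b = hash(k) % self.n                 # h(k) % N picks the primary bucket
        if len(self.primary[b]) < self.capacity:
            self.primary[b].append(k)
        else:
            self.overflow[b].append(k)       # expensive: lookups must scan this

    def lookup(self, k):
        b = hash(k) % self.n
        return k in self.primary[b] or k in self.overflow[b]
```

With skewed inserts the overflow chains grow without bound, and N can only change by rebuilding the whole table.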
Extendible Hashing
[Figure: a 4-entry directory (entries 00, 01, 10, 11; global depth gd = 2) points to data pages A, B, C, D, each with local depth ld = 2. A holds 4, 12, 32, 16; B holds 1, 5, 21, 13; C holds 10; D holds 15, 7, 19.]
The directory and the data pages each have an associated "depth" (global and local, respectively).
To look up a value, use the last gd bits of the key's hash value as an index into the directory.
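The lookup rule fits in one line. This snippet uses hypothetical names and assumes the hash value has already been computed:

```python
# Sketch of the extendible-hashing directory lookup: the last gd bits of the
# key's hash index into the directory, which points at the data page.
def extendible_lookup(directory, gd, h_k):
    return directory[h_k & ((1 << gd) - 1)]   # mask keeps the last gd bits
```

For example, with gd = 2 and directory [A, B, C, D], a hash ending in the bits 01 lands on page B.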
Extendible Hashing
Insert 20 (h(20) = 10100)
(Need to split bucket A; ld = gd, so the directory must double: gd = 3.)
[Figure: the directory doubles to 8 entries (000–111; gd = 3). Bucket A splits on its third bit into A2 (4, 12, 20; ld = 3) and A (32, 16; ld = 3).]
Directory entries for buckets not being split point to the same bucket as before.
Extendible Hashing
Insert 31 (h(31) = 1001)
(Need to split bucket B.)
[Figure: the 8-entry directory (gd = 3) is unchanged; bucket B splits into B2 (1, 21; ld = 3) and B (5, 13, 31; ld = 3). Buckets A2 (4, 12, 20), A (32, 16), C (10), and D (15, 7, 19) are untouched.]
You don't need to double the directory when splitting a bucket with ld < gd.
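The two insert cases above (double the directory when ld = gd; merely redirect entries when ld < gd) can be sketched in Python. This is an assumed minimal structure, not the slides' code: the hash is taken to be the key itself, and the page capacity is fixed at 4.

```python
CAPACITY = 4  # assumed page capacity

class Bucket:
    def __init__(self, ld):
        self.ld = ld          # local depth
        self.keys = []

class ExtendibleHash:
    def __init__(self):
        self.gd = 1           # global depth
        b0, b1 = Bucket(1), Bucket(1)
        self.dir = [b0, b1]   # directory, indexed by the last gd bits

    def lookup(self, k):
        return k in self.dir[k & ((1 << self.gd) - 1)].keys

    def insert(self, k):
        b = self.dir[k & ((1 << self.gd) - 1)]
        if len(b.keys) < CAPACITY:
            b.keys.append(k)
            return
        if b.ld == self.gd:          # case 1: ld == gd, double the directory
            self.dir = self.dir + self.dir
            self.gd += 1
        b.ld += 1                    # split b and its image on bit ld
        image = Bucket(b.ld)
        old, b.keys = b.keys, []
        for i, tgt in enumerate(self.dir):
            if tgt is b and (i >> (b.ld - 1)) & 1:
                self.dir[i] = image  # redirect only the affected entries;
                                     # the rest keep pointing at b
        for key in old + [k]:
            self.insert(key)         # redistribute; may split again if skewed
```

Note that doubling copies only the directory (pointers), never the data pages, which is the point of the indirection.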
Linear Hashing
[Figure: 8 buckets labeled 000–111 existed at the start of this round (Level = 3, so 2^3 = 8 entries). Next = 2 marks the next bucket to be split; buckets before Next have already been split this round, and their "split image" buckets were created this round.]
Linear Hashing: Lookups
Next = 2, Level = 3 (2^3 = 8 entries); some buckets have overflow pages.
Lookup k where h(k) % 2^Level = 4 (100).
Since 4 ≥ Next, bucket 4 has not been split this round: use entry 4.
Linear Hashing: Lookups
Next = 2, Level = 3 (2^3 = 8 entries); some buckets have overflow pages.
Lookup k where h(k) % 2^Level = 1 (001).
Since 1 < Next, bucket 1 has already been split this round: use entry h(k) % 2^(Level+1), which is either 1 (0001) or 9 (1001).
Linear Hashing: Splitting
Next = 2, Level = 3 (2^3 = 8 entries); some buckets have overflow pages.
To split: split bucket Next, partitioning its keys on bit Level of the hash, then increment Next.
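The lookup and splitting rules above combine into a small sketch. This is illustrative Python with assumed names: the hash is the key itself, and a bucket spilling past 4 entries triggers a round-robin split (a plausible trigger policy, not necessarily the slides').

```python
class LinearHash:
    def __init__(self, level=3):
        self.level = level            # 2^level buckets at the start of the round
        self.next = 0                 # next bucket to be split
        self.buckets = [[] for _ in range(2 ** level)]

    def _addr(self, k):
        a = k % (2 ** self.level)     # h(k) % 2^Level
        if a < self.next:             # this bucket was already split this round,
            a = k % (2 ** (self.level + 1))   # so use one more bit of the hash
        return a

    def lookup(self, k):
        return k in self.buckets[self._addr(k)]

    def insert(self, k):
        a = self._addr(k)
        self.buckets[a].append(k)
        if len(self.buckets[a]) > 4:  # assumed trigger: a bucket overflows
            self.split()

    def split(self):
        old, self.buckets[self.next] = self.buckets[self.next], []
        self.buckets.append([])       # the split image, at index next + 2^level
        for k in old:                 # partition on bit `level` of the hash
            self.buckets[k % (2 ** (self.level + 1))].append(k)
        self.next += 1
        if self.next == 2 ** self.level:  # round complete: new round, new level
            self.level += 1
            self.next = 0
```

Unlike extendible hashing there is no directory: growth is one bucket at a time, and the Next/Level counters tell lookups which buckets have already been split.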
Hashing Algorithms
• Indirection: use pointers to avoid bulk copies of the entire hash table (Extendible Hashing).
• Partial: incrementally grow the hash table (Linear Hashing).
• Continuous: allow bucket boundaries to move (Consistent Hashing).
Consistent Hashing
• Insight: Make split/merge faster by making bin boundaries deterministic, but random.
• Used mostly in distributed data-stores
• (Amazon, Facebook, …)
• Minimal applications to file-based storage.
(‘Chord: A Scalable Peer-to-peer…’, Stoica et al.)
Consistent Hashing
[Figure: the hash space, 0 to 2^32 - 1, drawn as a line.]
Consistent Hashing
[Figure: the hash space drawn as a circle, with tick marks at 2^29 - 1, 2^30 - 1, 2^31 - 1, and 2^32 - 1.]
Modular arithmetic (mod 2^32): (2^32 - 1) + 1 = 0, so the numbers form a "ring."
Consistent Hashing
Assign each bucket a random point on the ring.
[Figure: buckets A and B placed at points on the ring, with the arc between each bucket's point and its predecessor's point marked as the "hash range" of A and of B.]
Each bucket contains the values that hash to ring positions between its point and its predecessor's point.
Consistent Hashing
[Figure: two ring diagrams with buckets A, B, and C, illustrating that adding or removing a bucket only redraws the boundary shared with its neighbor.]
Consistent Hashing
• Splits/merges are cheap: at most 2 buckets are affected, and there is no need for page duplication.
• Mapping a hash value to a bucket is expensive: you need a lookup mechanism/directory (Chord is a decentralized lookup mechanism).
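A minimal sketch of the scheme, with illustrative names (the 2^32 ring size and SHA-256-derived points are assumptions, not from the slides): each bucket gets a deterministic-but-random point, and a key belongs to the first bucket at or after its own position, wrapping around.

```python
import bisect
import hashlib

RING = 2 ** 32

def ring_pos(name):
    # deterministic "random" point: first 4 bytes of a cryptographic hash
    return int.from_bytes(hashlib.sha256(name.encode()).digest()[:4], "big")

class ConsistentHash:
    def __init__(self, buckets):
        self.points = sorted((ring_pos(b), b) for b in buckets)

    def bucket_for(self, key):
        p = ring_pos(key) % RING
        i = bisect.bisect_right(self.points, (p, ""))  # successor on the ring
        return self.points[i % len(self.points)][1]    # wrap: a ring, not a line

    def add_bucket(self, b):
        # only keys in the gap before b's point move to b; every other
        # bucket's range is untouched
        bisect.insort(self.points, (ring_pos(b), b))
```

The split-cheapness claim is visible directly: after `add_bucket("D")`, every key either keeps its old bucket or moves to D.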
Group Work
Static Hashing vs. Extendible Hashing vs. Linear Hashing vs. Consistent Hashing:
What advantages does each algorithm have? In what situation might you want to use each?
Recap: Using Indexes
Using Indexes 1: Selection predicate + File Scan = Potential Index Scan
σ_{(R.A>1) ∧ (R.B<R.C)}(FileScan(R)) becomes σ_{R.B<R.C}(IndexScan(R, R.A, Range(1, ∞)))
Recap: Using Indexes
Using Indexes 2: Join + 2× File Scan = Potential Index Nested Loop
FileScan(R) ⋈_{R.A<S.B} FileScan(S) becomes IndexNLJ(S, S.B, Range(R.A, ∞)) over FileScan(R)
Index-Nested Loop
���21
foreach hA, . . .i 2 R
IndexNLJ(S, S.B,Range(R.A,1))
FileScan(R)
foreach hB, . . .i 2 IndexScan(S, S.B,Range(A,1))
emit hA, . . . , B, . . .i
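The pseudocode above can be made runnable under an assumed layout (R tuples start with attribute A, S tuples start with attribute B); a list sorted on S.B plus binary search stands in for the index:

```python
import bisect

# Sketch of an index nested-loop join for the condition R.A < S.B.
def index_nlj(R, S):
    S_sorted = sorted(S)                     # the "index" on S.B
    keys = [s[0] for s in S_sorted]
    for r in R:
        i = bisect.bisect_right(keys, r[0])  # first S tuple with S.B > R.A
        for s in S_sorted[i:]:               # IndexScan(S, S.B, Range(A, ∞))
            yield r + s                      # emit ⟨A, …, B, …⟩
```

Each R tuple does one O(log |S|) probe plus a scan of only the matching suffix, instead of a full scan of S.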
Equivalencies
(No beard) ≠ (Beard) — good vs. evil.
(Leonard Nimoy) = (Zachary Quinto) — different actor, same character.
Optimization
• Rewriting Queries into Equivalent Forms
• Some rewrites are always good ideas
• Pushing down selections
• Pushing down projections
• Some rewrites might be good ideas
• Join reordering
• How do we get to these equivalent forms?
RA Equivalencies
Selection:
  σ_{c1}(σ_{c2}(R)) ≡ σ_{c2}(σ_{c1}(R))   (Commutative)
  σ_{c1 ∧ … ∧ cn}(R) ≡ σ_{c1}(…(σ_{cn}(R)))   (Combinable)
Projection:
  π_{a1}(R) ≡ π_{a1}(…(π_{an}(R)))   (Idempotent; requires a1 ⊆ … ⊆ an)
Join (and Cross Product):
  R ⋈ (S ⋈ T) ≡ (R ⋈ S) ⋈ T   (Associative)
  (R ⋈ S) ≡ (S ⋈ R)   (Commutative)
Try It: Show that R ⋈ (S ⋈ T) ≡ (T ⋈ R) ⋈ S
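These identities can be spot-checked empirically (a sketch, not a proof). The code below implements a natural join over dict-shaped tuples with made-up toy relations, and compares the two sides of the "Try It" rearrangement:

```python
# Natural join: tuples combine when they agree on all shared attributes.
def natural_join(R, S):
    out = []
    for r in R:
        for s in S:
            if all(r[k] == s[k] for k in r.keys() & s.keys()):
                out.append({**r, **s})
    return out

# Toy relations: R(a, b), S(b, c), T(c, d).
R = [{"a": i, "b": i % 2} for i in range(4)]
S = [{"b": j % 2, "c": j} for j in range(4)]
T = [{"c": k, "d": k + 1} for k in range(4)]

def canon(rel):
    # compare as bags of attribute/value pairs, ignoring tuple order
    return sorted(tuple(sorted(t.items())) for t in rel)
```

Checking `canon(natural_join(R, natural_join(S, T))) == canon(natural_join(natural_join(T, R), S))` exercises both associativity and commutativity at once, which is exactly how the "Try It" rewrite decomposes.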
Intermission���25
RA Equivalencies
Selection commutes with Projection (but only if attribute set a and condition c are compatible: a must include all columns referenced by c):
  π_a(σ_c(R)) ≡ σ_c(π_a(R))
Show that: π_a(σ_c(R)) ≡ π_a(σ_c(π_{a ∪ cols(c)}(R)))
RA Equivalencies
Selection combines with Cross Product to form a Join, as per the definition of Join:
  σ_c(R × S) ≡ R ⋈_c S
(Note: this only helps if we have a join algorithm for conditions like c.)
Show that: σ_{R.B=S.B ∧ R.A>3}(R × S) ≡ σ_{R.A>3}(R ⋈_{R.B=S.B} S)
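A quick empirical check of this "Show that" claim on made-up toy relations (a sketch, not a proof; attribute positions are an assumption of the sketch):

```python
# Tiny relational operators over position-indexed tuples.
def select(pred, rel):
    return [t for t in rel if pred(t)]

def cross(R, S):
    return [r + s for r in R for s in S]

def theta_join(R, S, cond):
    return [r + s for r in R for s in S if cond(r + s)]

# Toy relations: R has schema (A, B), S has schema (B, C); in a joined
# tuple, positions 0,1 are R.A, R.B and positions 2,3 are S.B, S.C.
R = [(a, a % 3) for a in range(6)]
S = [(b % 3, b) for b in range(6)]
```

Filtering the cross product with the full condition, versus joining on R.B = S.B and then filtering on R.A > 3, produces identical results.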
RA Equivalencies
Selection commutes with Cross Product (but only if condition c references attributes of R exclusively):
  σ_c(R × S) ≡ (σ_c(R)) × S
Show that: σ_c(R ⋈_{c'} S) ≡ (σ_c(R)) ⋈_{c'} S
Example: σ_{R.B=S.B ∧ R.A>3}(R × S) ≡ (σ_{R.A>3}(R)) ⋈_{R.B=S.B} S
RA Equivalencies
Projection commutes (distributes) over Cross Product (where a1 and a2 are the attributes in a from R and S, respectively):
  π_a(R × S) ≡ (π_{a1}(R)) × (π_{a2}(S))
Show that: π_a(R ⋈_c S) ≡ (π_{a1}(R)) ⋈_c (π_{a2}(S)) — under what condition?
How can we work around this limitation?
  π_a((π_{a1 ∪ (cols(c) ∩ cols(R))}(R)) ⋈_c (π_{a2 ∪ (cols(c) ∩ cols(S))}(S)))
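The workaround can be spot-checked on toy data (illustrative names, a sketch rather than a proof). Below, the join condition references column b of R and d of S, so the pre-projections must keep those columns even though the final result only wants a and c:

```python
# Dict-shaped tuples; project keeps a subset of attributes.
def project(attrs, rel):
    return [{k: t[k] for k in attrs} for t in rel]

def join(R, S, cond):
    return [{**r, **s} for r in R for s in S if cond(r, s)]

# Toy relations: R(a, b, x) and S(c, d); x exists only to give the
# pre-projection on R something to throw away.
R = [{"a": i, "b": i % 2, "x": 0} for i in range(4)]
S = [{"c": j, "d": j % 2} for j in range(4)]
cond = lambda r, s: r["b"] == s["d"]   # c references b and d

# target: pi_{a,c}(R join_c S)
target = project(["a", "c"], join(R, S, cond))
# workaround: pre-project to {a} ∪ cols(c) and {c} ∪ cols(c), join, project
pushed = project(["a", "c"],
                 join(project(["a", "b"], R), project(["c", "d"], S), cond))
```

Naively projecting straight to {a} and {c} would drop b and d, leaving the join condition unevaluable; keeping cols(c) until after the join is exactly the workaround in the formula above.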
RA Equivalencies
Union and Intersection are Commutative and Associative.
Selection and Projection both commute with both Union and Intersection.
Enumerating Possible Plans
• There are many algorithms suitable for processing a specific data set:
  1. How do we identify feasible algorithms?
  2. How do we choose between them?