Date posted: 21-Dec-2015
CS 4432 lecture #10 - indexing & hashing
CS4432: Database Systems II
Lecture #10
Professor Elke A. Rundensteiner
Chapter 4 – INDEXING Wrap-up
1. B+-tree Odds and Ends
2. Hashing (briefly)
B+Tree Example, n=3
[Figure: B+ tree with three levels]
Root: 100
Second level: 30 | 120, 150, 180
Leaves: 3, 5, 11 | 30, 35 | 100, 101, 110 | 120, 130 | 150, 156, 179 | 180, 200
Comparison: B-tree vs. indexed sequential file
Indexed sequential file:
• Less space, so lookup faster
• Inserts managed by overflow area
• Requires temporary restructuring
• Unpredictable performance
B-tree:
• Consumes more space, so lookup slower
• Each insert/delete potentially restructures
• Built-in restructuring
• Predictable performance
• DBA does not know when to reorganize
• DBA does not know how full to load pages of new index
B-trees better …
• A la buffering… Is LRU a good policy for B+tree buffers?
Of course not!
Should try to keep root in memory at all times
(and perhaps some nodes from second level)
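The buffering point can be illustrated with a small simulation. This is only a sketch under stated assumptions: the class `PinnedLRU`, the page ids, and the access pattern (every lookup walks root, one inner node, then a different leaf) are all hypothetical and not part of the lecture. Both caches get two frames in total; one is plain LRU, the other reserves a frame for the root.

```python
from collections import OrderedDict


class PinnedLRU:
    """LRU page cache that can pin chosen pages (e.g. the B+tree root)."""

    def __init__(self, capacity, pinned_ids=()):
        self.capacity = capacity            # frames for unpinned pages only
        self.pinned_ids = set(pinned_ids)   # page ids that are never evicted
        self.pinned = {}
        self.lru = OrderedDict()            # oldest entry first
        self.disk_reads = 0

    def get(self, page_id):
        if page_id in self.pinned:
            return self.pinned[page_id]
        if page_id in self.lru:
            self.lru.move_to_end(page_id)   # mark as most recently used
            return self.lru[page_id]
        self.disk_reads += 1                # simulated disk read
        page = f"contents of {page_id}"
        if page_id in self.pinned_ids:
            self.pinned[page_id] = page
        else:
            self.lru[page_id] = page
            if len(self.lru) > self.capacity:
                self.lru.popitem(last=False)  # evict least recently used
        return page


def run_lookups(cache, num_lookups=5):
    """Each lookup walks root -> inner -> a different leaf."""
    for k in range(num_lookups):
        for page_id in ("root", "inner", f"leaf{k}"):
            cache.get(page_id)
    return cache.disk_reads


# Two frames in total for both caches.
plain_reads = run_lookups(PinnedLRU(capacity=2))
pinned_reads = run_lookups(PinnedLRU(capacity=1, pinned_ids={"root"}))
```

With this access pattern plain LRU keeps evicting the root just before it is needed again, so the pinned variant issues fewer simulated disk reads. Other access patterns may behave differently.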
Interesting problem:
For B+tree, how large should n be?
…
n is number of keys / node
Assumptions: n children per node and N records in database
(1) Time to read a B-tree node from disk is (tseek + tread*n) msec.
(2) Once in main memory, use binary search to locate key: (a + b*log_2(n)) msec.
(3) Need to search (read) log_n(N) tree nodes.
(4) t-search = (tseek + tread*n + a + b*log_2(n)) * log_n(N)
Can get: f(n) = time to find a record
[Figure: plot of f(n) against n, with its minimum at n = nopt]
Find nopt by setting f'(n) = 0.
What happens to nopt as:
• Disk gets faster?
• CPU gets faster?
…
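Instead of solving f'(n) = 0 by hand, nopt can be located numerically. The sketch below assumes the search-time formula from the assumptions, t-search = (tseek + tread*n + a + b*log_2(n)) * log_n(N). The function names and all parameter values are made up for illustration and do not describe any real disk or CPU.

```python
import math


def search_time(n, N, t_seek, t_read, a, b):
    """f(n): time (msec) to find one record with n children per node."""
    per_node = t_seek + t_read * n + a + b * math.log2(n)
    levels = math.log(N) / math.log(n)      # log_n(N) nodes are read
    return per_node * levels


def best_fanout(N, t_seek, t_read, a, b, n_max=2000):
    """Brute-force search for the n in [2, n_max] that minimizes f(n)."""
    return min(range(2, n_max + 1),
               key=lambda n: search_time(n, N, t_seek, t_read, a, b))


# Hypothetical parameters: one million records, times in msec.
N = 1_000_000
slow_seek = best_fanout(N, t_seek=10.0, t_read=0.01, a=0.001, b=0.001)
fast_seek = best_fanout(N, t_seek=1.0, t_read=0.01, a=0.001, b=0.001)
print("nopt with slow seek:", slow_seek)
print("nopt with fast seek:", fast_seek)
```

Under these assumed numbers a shorter seek time moves nopt to a smaller n. Note also that in this particular model the b term contributes b*log_2(n) * log_n(N) = b*log_2(N) in total, which does not depend on n.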
Bulk Loading of B+ Tree
• For large collection of records, create B+ tree.
• Method 1: Repeatedly insert records (slow).
• Method 2: Bulk Loading (more efficient).
• Initialization:
  – Sort all data entries
  – Insert pointer to first (leaf) page in new (root) page.
[Figure: a new Root page pointing to the first of the sorted pages of data entries, which are not yet in the B+ tree]
Sorted data entries: 3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*
Bulk Loading (Contd.)
• Index entries for leaf pages always entered into right-most index page
• When this fills up, it splits. (Split may go up right-most path to root.)
Faster than repeated inserts, especially when one considers locking!
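[Figure: bulk loading in progress. The Root holds keys 10 and 20; the index pages below it hold 6, 12, 23 and 35. Data entry pages: 3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*; the right-most data entry pages are not yet in the B+ tree.]
[Figure: after the root splits. The new Root holds key 20; the level below holds 10 and 35; the index pages below those hold 6, 12, 23 and 38. The remaining right-most data entry pages are not yet in the B+ tree.]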
Summary of Bulk Loading
• Method 1: multiple inserts.
  – Slow.
  – Does not give sequential storage of leaves.
• Method 2: Bulk Loading
  – Has advantages for concurrency control.
  – Fewer I/Os during build.
  – Leaves will be stored sequentially (and linked)
  – Can control “fill factor” on pages.
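To make the idea of building the index from sorted entries concrete, here is a minimal sketch. It is a simplification, not the algorithm from the slides: rather than inserting into the right-most index page and splitting, it packs leaves and then builds each index level in one pass, and it does not enforce minimum node occupancy. The dictionary node layout and the names `bulk_load`, `lookup`, `leaf_capacity`, and `fanout` are assumptions made for illustration.

```python
import bisect


def bulk_load(sorted_keys, leaf_capacity=2, fanout=3):
    """Build a B+-tree-like index bottom-up from already sorted keys."""
    if not sorted_keys:
        return {"leaf": True, "keys": []}
    # Pack the sorted entries into leaf pages, left to right.
    leaves = [sorted_keys[i:i + leaf_capacity]
              for i in range(0, len(sorted_keys), leaf_capacity)]
    # Each level entry is (smallest key in subtree, node).
    level = [(leaf[0], {"leaf": True, "keys": leaf}) for leaf in leaves]
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            node = {"leaf": False,
                    # Separator = smallest key of each child but the first.
                    "keys": [low for low, _ in group[1:]],
                    "children": [child for _, child in group]}
            parents.append((group[0][0], node))
        level = parents
    return level[0][1]


def lookup(node, key):
    """Follow separators down to a leaf and test membership."""
    while not node["leaf"]:
        node = node["children"][bisect.bisect_right(node["keys"], key)]
    return key in node["keys"]


# The data entries from the slides, already sorted.
entries = [3, 4, 6, 9, 10, 11, 12, 13, 20, 22, 23, 31, 35, 36, 38, 41, 44]
root = bulk_load(entries)
```

Because the leaves are written left to right in key order, they end up stored sequentially, and the leaf capacity passed in plays the role of a fill factor.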
Hashing
[Figure: key → h(key) selects the bucket in which <key> is stored]
Buckets (typically 1 disk block)
Example hash function
• Key = ‘x1 x2 … xn’, an n-byte character string
• Have b buckets
• h: add x1 + x2 + … + xn
  – compute sum modulo b
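The byte-sum hash function can be written in a few lines. This is a sketch only; encoding the key as UTF-8 and the choice of b = 4 buckets in the usage lines are assumptions for illustration.

```python
def h(key: str, b: int) -> int:
    """Add up the bytes of the key, then take the sum modulo b."""
    return sum(key.encode("utf-8")) % b


# Usage with b = 4 buckets (an arbitrary choice).
bucket_ab = h("ab", 4)   # bytes 97 + 98 = 195, and 195 % 4 = 3
bucket_ba = h("ba", 4)   # same bytes in a different order
```

Because addition ignores byte order, any two keys that are permutations of each other land in the same bucket, which is one reason this simple function may distribute keys poorly.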
This may not be best function … Read Knuth Vol. 3 if you really need to select a good function.
Good hash function: expected number of keys/bucket is the same for all buckets
Within a bucket:
• Do we keep keys sorted?
• Yes, if CPU time critical & Inserts/Deletes not too frequent
Next: example to illustrate inserts, overflows, deletes
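EXAMPLE: 2 records/bucket
INSERT:
h(a) = 1
h(b) = 2
h(c) = 1
h(d) = 0
Buckets after these inserts:
0: d
1: a, c
2: b
3: (empty)
Then h(e) = 1: bucket 1 is already full, so e goes into an overflow block chained to bucket 1.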
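The insert example can be replayed with a small sketch of buckets that chain overflow blocks. The hash values are taken from the example; the class name `Bucket`, its methods, and the lookup table `H` standing in for the hash function are hypothetical.

```python
class Bucket:
    """A block holding up to `capacity` keys, with an optional overflow chain."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.keys = []
        self.overflow = None

    def insert(self, key):
        if len(self.keys) < self.capacity:
            self.keys.append(key)
        else:
            # Block is full: push the key down the overflow chain.
            if self.overflow is None:
                self.overflow = Bucket(self.capacity)
            self.overflow.insert(key)

    def blocks(self):
        """Keys per block, primary block first, then overflow blocks."""
        out, block = [], self
        while block is not None:
            out.append(list(block.keys))
            block = block.overflow
        return out


# Hash values as given in the example (stand-in for a real hash function).
H = {"a": 1, "b": 2, "c": 1, "d": 0, "e": 1}

table = [Bucket() for _ in range(4)]
for key in ["a", "b", "c", "d", "e"]:
    table[H[key]].insert(key)
```

After the five inserts, bucket 1 consists of a primary block with a and c plus one overflow block with e, so a lookup for e costs one extra block read.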
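EXAMPLE: deletion
[Figure: buckets 0 to 3 holding the keys a, b, c, d, e, f, g, with an overflow block; the exact bucket layout is not recoverable from this transcript]
Delete: e, f
• After deleting f, maybe move “g” up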
Rule of thumb:
• Try to keep space utilization between 50% and 80%
  Utilization = (# keys used) / (total # keys that fit)
• If < 50%, wasting space
• If > 80%, overflows significant
  (depends on how good hash function is & on # keys/bucket)
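The utilization formula is easy to check by hand or in code. The sketch below assumes a table of equally sized buckets; the function names and the wording of the returned advice are illustrative only.

```python
def utilization(keys_used, num_buckets, keys_per_bucket):
    """Utilization = (# keys used) / (total # keys that fit)."""
    return keys_used / (num_buckets * keys_per_bucket)


def advice(u):
    """Apply the 50% to 80% rule of thumb to a utilization value."""
    if u < 0.5:
        return "wasting space"
    if u > 0.8:
        return "overflows significant"
    return "within rule of thumb"


# Four keys stored in 4 buckets of 2 records each.
u = utilization(keys_used=4, num_buckets=4, keys_per_bucket=2)
```

For four keys in 4 buckets of 2 records each, the utilization is 4 / 8 = 0.5, which sits exactly at the lower edge of the recommended range.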
How do we cope with growth?
• Overflows and reorganizations
• Dynamic hashing
  • Extensible hashing
  • Others …
Extensible hashing: idea 1
(a) Use i of b bits output by hash function
[Figure: h(K) = 00110101 is b bits long; only the first i bits are used, and i grows over time]
Extensible hashing: idea 2
(b) Use directory
[Figure: the first i bits of h(K), written h(K)[i], select a directory entry, which points to a bucket]
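Example: h(k) is 4 bits; 2 keys/bucket
Directory: i = 1
  0 → bucket (uses 1 bit): 0001
  1 → bucket (uses 1 bit): 1001, 1100
Insert 1010
• The bucket for 1 is full and already uses all i = 1 bits, so it splits and the directory doubles.
New directory: i = 2
  00, 01 → bucket (uses 1 bit): 0001
  10 → bucket (uses 2 bits): 1001, 1010
  11 → bucket (uses 2 bits): 1100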
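Example continued
Directory: i = 2
  00, 01 → bucket (uses 1 bit): 0001
  10 → bucket (uses 2 bits): 1001, 1010
  11 → bucket (uses 2 bits): 1100
Insert: 0111, 0000
• 0111 fits into the bucket shared by 00 and 01, which now holds 0001, 0111.
• 0000 overflows that bucket. It uses only 1 bit while i = 2, so it splits without doubling the directory:
  00 → bucket (uses 2 bits): 0000, 0001
  01 → bucket (uses 2 bits): 0111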
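Example continued
Directory: i = 2
  00 → bucket (uses 2 bits): 0000, 0001
  01 → bucket (uses 2 bits): 0111
  10 → bucket (uses 2 bits): 1001, 1010
  11 → bucket (uses 2 bits): 1100
Insert: 1001
• The bucket for 10 is full and already uses all i = 2 bits, so the directory doubles to i = 3 and the bucket splits:
  100 → bucket (uses 3 bits): 1001, 1001
  101 → bucket (uses 3 bits): 1010
• New directory entries: 000, 001, 010, 011, 100, 101, 110, 111. The other buckets are unchanged, and each is now shared by two directory entries.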
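The whole example can be replayed with a compact sketch of an extensible hash table. It is written under assumptions: keys are given directly as their 4-bit hash values, the directory is indexed by the first i bits, and the names `ExtensibleHash`, `Bucket`, `depth`, and `_slot` are hypothetical. Running out of hash bits raises an error here instead of chaining an overflow block.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth   # number of leading hash bits this bucket uses
        self.keys = []


class ExtensibleHash:
    def __init__(self, bits=4, capacity=2):
        self.bits = bits             # b: length of h(K) in bits
        self.capacity = capacity     # keys per bucket
        self.i = 1                   # bits currently used by the directory
        self.directory = [Bucket(1), Bucket(1)]

    def _slot(self, h):
        return h >> (self.bits - self.i)   # first i bits of h

    def insert(self, h):
        bucket = self.directory[self._slot(h)]
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(h)
            return
        if bucket.depth == self.i:
            if self.i == self.bits:
                raise OverflowError("no hash bits left to split on")
            # Double the directory: every entry is duplicated in place.
            self.directory = [b for b in self.directory for _ in (0, 1)]
            self.i += 1
        # Split the full bucket on its next bit.
        bucket.depth += 1
        sibling = Bucket(bucket.depth)
        pending, bucket.keys = bucket.keys + [h], []
        for slot, b in enumerate(self.directory):
            if b is bucket and (slot >> (self.i - bucket.depth)) & 1:
                self.directory[slot] = sibling
        for k in pending:            # redistribute; may split again
            self.insert(k)


table = ExtensibleHash(bits=4, capacity=2)
for h in [0b0001, 0b1001, 0b1100, 0b1010, 0b0111, 0b0000, 0b1001]:
    table.insert(h)
```

The first three inserts create the starting state of the example, and the remaining four reproduce its splits: one directory doubling for 1010, a split without doubling for 0000, and a second doubling for the final 1001.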
Extensible hashing: deletion
• Merge blocks and cut directory if possible
(Reverse insert procedure)
Summary: Extensible hashing
+ Can handle growing files
  • with less wasted space
  • with no full reorganizations
- Indirection (Not bad if directory in memory)
- Directory doubles in size (Now it fits, now it does not)
Indexing vs Hashing
• Hashing good for probes given key
  e.g., SELECT …
        FROM R
        WHERE R.A = 5
• INDEXING (including B-trees) good for range searches:
  e.g., SELECT …
        FROM R
        WHERE R.A > 5
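The contrast between the two query types can be sketched in memory. The sample rows are invented, a Python dict stands in for a hash index, and a sorted list searched with `bisect` stands in for the ordered leaf level of a B-tree; none of this is from the lecture.

```python
import bisect

# Hypothetical relation R as (A value, record id) pairs.
rows = [(3, "r1"), (5, "r2"), (5, "r3"), (8, "r4"), (12, "r5")]

# Hash index on R.A: direct lookup by key, but no ordering between keys.
hash_index = {}
for a, rid in rows:
    hash_index.setdefault(a, []).append(rid)

# Ordered index on R.A: keys kept sorted, as in B-tree leaves.
sorted_index = sorted(rows)
sorted_keys = [a for a, _ in sorted_index]


def probe(a):
    """WHERE R.A = a, answered with one hash lookup."""
    return hash_index.get(a, [])


def range_greater(a):
    """WHERE R.A > a, answered by one search plus a sequential scan."""
    start = bisect.bisect_right(sorted_keys, a)
    return [rid for _, rid in sorted_index[start:]]
```

The hash index could answer the range query only by examining every key, whereas the ordered index finds the start position once and then reads neighbouring entries in order.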
The BIG picture….
• Chapters 2 & 3: Storage, records, blocks...
• Chapter 4 & 5: Access Mechanisms
  - Indexes
  - B trees
  - Hashing
  - Multi key
• Chapter 6 & 7: Query Processing