CPSC 461 Final Review I
Hessam Zakerzadeh, Dina Said
9.1) What is the most important difference between a disk and a tape?
Tapes are sequential devices that do not support direct access to a desired page. We must essentially step through all pages in order.
Disks support direct access to a desired page.
Exercise 11.4 Answer the following questions about Linear Hashing:
1. How does Linear Hashing provide an average-case search cost of only slightly more than one disk I/O, given that overflow buckets are part of its data structure?
Linear Hashing
• No directory
• More flexibility w.r.t. the time of bucket splits
• Worse performance than Extendible Hashing if data is skewed
• Uses a family of hash functions h0, h1, h2, ... such that h_i(v) = h(v) mod (2^i * N)
  – N is the initial number of buckets
  – If N is a power of 2, say 2^d0, then apply h and look at the last d_i bits, where d_i = d_0 + i
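The hash-function family above can be sketched in Python (a minimal sketch; the names `h` and `h_i` are hypothetical, and any base hash function could stand in for `h`):

```python
N = 4  # initial number of buckets (a power of 2, here 2^2, so d0 = 2)

def h(v):
    # Base hash function; for integer keys the identity suffices for illustration.
    return v

def h_i(v, i):
    # i-th function in the family: h_i(v) = h(v) mod (2^i * N).
    # Equivalently, look at the last d0 + i bits of h(v).
    return h(v) % (2**i * N)

# h_0 looks at the last 2 bits, h_1 at the last 3 bits, and so on.
print(h_i(44, 0))  # 44 = 101100b, last 2 bits 00 -> bucket 0
print(h_i(44, 1))  # last 3 bits 100 -> bucket 4
```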
Inserting a Data Entry in LH
• Find the bucket by applying h_Level / h_{Level+1}.
• If the bucket to insert into is full:
  – Add an overflow page and insert the data entry.
  – (Maybe) split the Next bucket and increment Next.
• Else simply insert the data entry into the bucket.
Bucket Split
• A split can be triggered by:
  – the addition of a new overflow page
  – conditions such as space utilization
• Whenever a split is triggered, the Next bucket is split, and hash function h_{Level+1} redistributes entries between this bucket (say bucket number b) and its split image; the split image is therefore bucket number b + N_Level.
• Next = Next + 1.
Example: Insert 44 (101100), 9 (01001)
Level=0, Next=0, N=4. Neither insert overflows its bucket, so after the inserts the primary pages are:

  h1   h0   Primary pages
  000  00   32* 44* 36*
  001  01   9* 25* 5*
  010  10   14* 18* 10* 30*
  011  11   31* 35* 11* 7*

(The h1 column is for illustration only.)
Example: Insert 43 (101011)
Level=0, N=4. Key 43 hashes via h0 to bucket 11, which is full, so an overflow page is added; the split that this triggers is applied to bucket Next=0, and h1 redistributes its entries between bucket 000 and its split image 100.

Before the insert (Level=0, Next=0):

  h1   h0   Primary pages
  000  00   32* 44* 36*
  001  01   9* 25* 5*
  010  10   14* 18* 10* 30*
  011  11   31* 35* 11* 7*

After the insert and split (Level=0, Next=1):

  h1   h0   Primary pages        Overflow pages
  000  00   32*
  001  01   9* 25* 5*
  010  10   14* 18* 10* 30*
  011  11   31* 35* 11* 7*      43*
  100  00   44* 36*
Example: End of a Round
Before inserting 50 (Level=0, Next=3):

  h1   h0   Primary pages        Overflow pages
  000  00   32*
  001  01   9* 25*
  010  10   66* 18* 10* 34*
  011  11   31* 35* 7* 11*     43*
  100  00   44* 36*
  101  01   5* 37* 29*
  110  10   14* 30* 22*
Insert 50 (110010): h0 gives bucket 10, which is before Next, so h1 applies: the last three bits are 010. Bucket 010 is full, so an overflow page holding 50* is added and bucket Next=3 is split; this ends the round, so Level becomes 1 and Next wraps to 0.

After the insert (Level=1, Next=0):

  h1    Primary pages        Overflow pages
  000   32*
  001   9* 25*
  010   66* 18* 10* 34*      50*
  011   35* 11* 43*
  100   44* 36*
  101   5* 37* 29*
  110   14* 30* 22*
  111   31* 7*
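The lookup rule the examples follow (apply h_Level first; if the result points before Next, that bucket has already been split this round, so re-hash with h_{Level+1}) can be sketched in Python (a minimal sketch; `bucket_for` is a hypothetical name):

```python
N = 4  # initial number of buckets

def bucket_for(v, level, next_bucket):
    # Apply h_level; re-hash with h_{level+1} if the bucket was already
    # split in the current round.
    b = v % (2**level * N)
    if b < next_bucket:
        b = v % (2**(level + 1) * N)
    return b

# Just before inserting 50, the state is Level=0, Next=3:
print(bucket_for(50, 0, 3))   # 50 % 4 = 2 < 3, so use h1: 50 % 8 = 2 (bucket 010)
# After the round ends (Level=1, Next=0):
print(bucket_for(43, 1, 0))   # 43 % 8 = 3 (bucket 011)
```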
Exercise 11.4 Answer the following questions about Linear Hashing:
1. How does Linear Hashing provide an average-case search cost of only slightly more than one disk I/O, given that overflow buckets are part of its data structure?
If we start with an index that has B buckets, then during a round all B buckets are split in order, one after the other. A good hash function is expected to distribute the search key values uniformly across the buckets, and because a split can be triggered by conditions such as space utilization, the length of each overflow chain stays short. Therefore the number of overflow pages on any chain is not expected to exceed one, and the average search costs only slightly more than one disk I/O.
Exercise 11.4 Answer the following questions about Linear Hashing:
Does Linear Hashing guarantee at most one disk access to retrieve a record with a given key value?
No. Overflow chains are part of the structure, so no such guarantee is provided.
Exercise 11.4 Answer the following questions about Linear Hashing:
If a Linear Hashing index using Alternative (1) for data entries contains N records, with P records per page and an average storage utilization of 80 percent, what is the worst-case cost for an equality search? Under what conditions would this cost be the actual search cost?
Maximum number of records in each page = 0.8 × P.
If all keys map to the same bucket, that bucket's chain contains ⌈N / (0.8 P)⌉ pages, so the worst-case cost of an equality search is ⌈N / (0.8 P)⌉ I/Os. This is the actual search cost exactly when the hash function maps every key to the same bucket.
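The worst-case count above can be checked numerically (a minimal sketch; the record and page counts below are hypothetical, not from the exercise):

```python
import math

def worst_case_equality_search(N, P, utilization=0.8):
    # If every key hashes to one bucket, its chain holds all N entries,
    # spread over ceil(N / (utilization * P)) pages; each page is one I/O.
    return math.ceil(N / (utilization * P))

# Hypothetical numbers: 10,000 records, 100 records per page, 80% utilization.
print(worst_case_equality_search(10_000, 100))  # 125 page I/Os
```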
Exercise 11.4 Answer the following questions about Linear Hashing:
If the hash function distributes data entries over the space of bucket numbers in a very skewed (non-uniform) way, what can you say about the space utilization in data pages?
Space utilization = number of pages holding data / total number of pages.
If the data is heavily skewed, all records map to the same bucket, say bucket 0. Suppose we start with m primary pages. Each overflow page added to bucket 0 triggers a split, but the split image stays empty because every entry still hashes to bucket 0's chain. After n overflow pages have been added, n splits have occurred, so:
  Pages holding data = n + 1 (bucket 0's primary page plus its n overflow pages)
  Total number of pages = m + n primary pages + n overflow pages = m + 2n
  Space utilization = (n + 1) / (m + 2n) < 50% → very bad
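The utilization bound can be sketched numerically (hypothetical values of m and n for illustration):

```python
def skewed_utilization(m, n):
    # m initial primary pages; all entries hash to bucket 0.
    # Each of the n overflow pages triggers one split, adding one (empty)
    # primary page, so total pages = m + n primary + n overflow.
    # Only bucket 0's chain (1 primary + n overflow pages) holds data.
    return (n + 1) / (m + 2 * n)

# With m = 4 primary pages and n = 10 overflow pages:
print(skewed_utilization(4, 10))  # 11/24, about 0.458 -- below 50%
```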
Exercise 13.4
File: 10 × 10^6 pages; 320 buffer pages; average seek time 10 ms; average rotational delay 5 ms; transfer time 1 ms per page; page size 4 KB.

Pass 0: ⌈10 × 10^6 / 320⌉ = 31250 runs
  Read cost per run = 10 + 5 + 1 × 320 = 335 ms
  Write cost per run = 10 + 5 + 1 × 320 = 335 ms
  Total I/O cost of Pass 0 = number of runs × (read cost + write cost)
    = 31250 × 2 × (15 + 320) ms
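The Pass 0 arithmetic can be checked in Python (all figures are taken from the exercise parameters above):

```python
import math

PAGES = 10 * 10**6            # file size in pages
BUFFER = 320                  # buffer pages, so each Pass 0 run is 320 pages
SEEK, ROT, XFER = 10, 5, 1    # seek, rotational delay, per-page transfer (ms)

runs = math.ceil(PAGES / BUFFER)            # 31250 runs
cost_per_run = SEEK + ROT + XFER * BUFFER   # 335 ms to read (or write) one run
pass0_cost = runs * 2 * cost_per_run        # every run is read once and written once
print(runs, pass0_cost)                     # 31250 runs, 20937500 ms
```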
Total cost of the subsequent merging passes = number of passes × (read cost + write cost)
  Number of passes = ⌈log_F 31250⌉ = ⌈ln 31250 / ln F⌉, where F is the number of ways merged
  Read/write cost per pass = number of blocks × (10 + 5 + 1 × pages per block)
  Number of blocks = ⌈10 × 10^6 / pages per block⌉
b) Create 256 'input' buffers of 1 page each, create an 'output' buffer of 64 pages, and do 256-way merges.
  Number of passes = ⌈ln 31250 / ln 256⌉ = 2
  Read cost per pass = 10 × 10^6 blocks of 1 page × (15 + 1) = 16 × 10^7 ms
  Write cost per pass = number of blocks × (10 + 5 + 1 × pages per block)
    = 156250 × (15 + 64) ms, where number of blocks = ⌈10 × 10^6 / 64⌉ = 156250
  Total cost of the merging passes = number of passes × (read cost + write cost)
    = 2 × (16 × 10^7 + 156250 × (15 + 64)) ms
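Part (b) can be checked end to end in Python (all figures come from the exercise parameters):

```python
import math

PAGES = 10 * 10**6   # file size in pages
SEEK_ROT = 15        # seek + rotational delay, ms
RUNS = 31250         # sorted runs produced by Pass 0

# 256-way merge: input blocks of 1 page, output blocks of 64 pages.
passes = math.ceil(math.log(RUNS) / math.log(256))  # 2 merging passes
read_cost = PAGES * (SEEK_ROT + 1)                  # 10^7 one-page input blocks
write_blocks = math.ceil(PAGES / 64)                # 156250 output blocks
write_cost = write_blocks * (SEEK_ROT + 64)
total = passes * (read_cost + write_cost)
print(passes, read_cost, write_cost, total)
```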
e) Create four 'input' buffers of 64 pages each, create an 'output' buffer of 64 pages, and do four-way merges.
  Number of passes = ⌈ln 31250 / ln 4⌉ = 8
  Read/write cost per pass = number of blocks × (10 + 5 + 1 × pages per block)
    = 156250 × (15 + 64) ms, where number of blocks = ⌈10 × 10^6 / 64⌉ = 156250
  Total cost = 8 × 2 × 156250 × (15 + 64) ms
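Part (e) can be checked the same way (all figures come from the exercise parameters):

```python
import math

PAGES = 10 * 10**6   # file size in pages
SEEK_ROT = 15        # seek + rotational delay, ms
RUNS = 31250         # sorted runs produced by Pass 0

# Four-way merge: input and output blocks of 64 pages each.
passes = math.ceil(math.log(RUNS) / math.log(4))   # 8 merging passes
blocks = math.ceil(PAGES / 64)                     # 156250 blocks per read/write
cost_per_pass = 2 * blocks * (SEEK_ROT + 64)       # read + write each pass
total = passes * cost_per_pass
print(passes, total)
```

Comparing with part (b): the larger input blocks make each pass far cheaper on seeks, but the lower fan-in quadruples the number of passes.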