CPSC 461 Final Review I Hessam Zakerzadeh Dina Said.

9.1) What is the most important difference between a disk and a tape?

Tapes are sequential devices that do not support direct access to a desired page. We must essentially step through all pages in order.

Disks support direct access to a desired page.

Exercise 11.4: Answer the following questions about Linear Hashing:

1. How does Linear Hashing provide an average-case search cost of only slightly more than one disk I/O, given that overflow buckets are part of its data structure?

Linear Hashing

• No directory.
• More flexibility w.r.t. the timing of bucket splits.
• Worse performance than Extendible Hashing if data is skewed.
• Uses a family of hash functions $h_0, h_1, h_2, \ldots$ such that $h_i(v) = h(v) \bmod 2^i N$
– N is the initial number of buckets.
– If N is a power of 2, $N = 2^{d_0}$, then $h_i$ consists of applying h and looking at the last $d_i$ bits, where $d_i = d_0 + i$.
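A minimal Python sketch of the hash family, assuming (as in the slide examples, where 43 is written as 101011) that the underlying hash h is the identity on integer keys. The names `h_i` and `last_bits` are illustrative, not from the slides. It checks that, when N is a power of two, taking h(v) mod 2^i N gives the same bucket as keeping the last d_i = d_0 + i bits.

```python
def h(v: int) -> int:
    # Assumption: identity hash, matching the binary values shown in the examples.
    return v


def h_i(v: int, i: int, n: int) -> int:
    """h_i(v) = h(v) mod (2^i * N), where N is the initial number of buckets."""
    return h(v) % ((2 ** i) * n)


def last_bits(v: int, d: int) -> int:
    """Keep only the last d bits of h(v)."""
    return h(v) & ((1 << d) - 1)


if __name__ == "__main__":
    n, d0 = 4, 2  # N = 4 = 2^2, so d0 = 2
    for v in [43, 44, 9, 50]:
        for i in range(3):
            # With N a power of two, the modulus and the bit mask agree (d_i = d0 + i).
            assert h_i(v, i, n) == last_bits(v, d0 + i)
    print(h_i(44, 0, n), h_i(44, 1, n))  # 0 4
```

The bit-mask form is why the examples label buckets by their last bits (00, 01, ... under h0; 000, 001, ... under h1).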

Inserting a Data Entry in LH

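Find the bucket by applying $h_{Level}$; if that bucket number is less than Next (the bucket has already been split in this round), apply $h_{Level+1}$ instead.

If the bucket to insert into is full:
Add an overflow page and insert the data entry.
(Maybe) split the Next bucket and increment Next.

Else simply insert the data entry into the bucket.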
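The bucket lookup can be sketched in Python as follows, again assuming an identity hash on integer keys; `find_bucket` and its parameters are hypothetical names chosen for illustration. Buckets numbered below Next were already split in the current round, so for them $h_{Level+1}$ decides between the bucket and its split image.

```python
def h(v: int) -> int:
    # Assumption: identity hash on integer keys, as in the slide examples.
    return v


def find_bucket(v: int, level: int, next_: int, n: int) -> int:
    """Return the Linear Hashing bucket number for key v.

    n is the initial number of buckets N; next_ is the Next pointer.
    """
    b = h(v) % ((2 ** level) * n)                # h_Level
    if b < next_:                                # bucket already split this round
        b = h(v) % ((2 ** (level + 1)) * n)      # h_{Level+1}
    return b
```

For example, with Level=0, Next=1 and N=4 (the state right after bucket 0 has been split), key 44 first maps to bucket 0; since 0 < Next, h1 sends it to bucket 4 (100), while 43 stays in bucket 3 (11).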

Bucket Split

A split can be triggered by:
• the addition of a new overflow page, or
• conditions such as space utilization.

Whenever a split is triggered, the Next bucket is split, and the hash function $h_{Level+1}$ redistributes entries between this bucket (say bucket number b) and its split image; the split image is therefore bucket number $b + N_{Level}$, where $N_{Level} = 2^{Level} N$.

Next ← Next + 1.
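A minimal sketch of one split step in Python, assuming an identity hash and modelling each bucket as a plain list of keys (primary and overflow pages are not distinguished here). The function name `split_next` is hypothetical. The round-end reset (Level+1, Next=0 once Next reaches $N_{Level}$) follows the "End of a Round" example later in this review.

```python
from typing import Dict, List, Tuple


def split_next(buckets: Dict[int, List[int]], level: int, next_: int,
               n: int) -> Tuple[int, int]:
    """Split bucket `next_` and return the updated (level, next_).

    Entries of bucket b = next_ are redistributed with
    h_{Level+1}(v) = v mod (2 * N_Level) between b and its split image
    b + N_Level, where N_Level = 2^Level * N (identity hash assumed).
    """
    n_level = (2 ** level) * n
    b = next_
    image = b + n_level
    entries = buckets.get(b, [])
    buckets[b] = [v for v in entries if v % (2 * n_level) == b]
    buckets[image] = [v for v in entries if v % (2 * n_level) == image]
    next_ += 1
    if next_ == n_level:  # every bucket of this round has been split
        level, next_ = level + 1, 0
    return level, next_
```

Splitting bucket 0 of {0: [44, 36, 32], ...} with Level=0, Next=0, N=4 leaves 32* in bucket 000 and moves 44* and 36* to the split image 100, matching the insert-43 example.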

Example: Insert 44 (101100), 9 (01001)

Level=0, Next=0, N=4 (primary pages)

h1    h0    Primary pages
000   00    44*, 36*, 32*
001   01    25*, 9*, 5*
010   10    14*, 18*, 10*, 30*
011   11    31*, 35*, 11*, 7*

(The h1 column is for illustration only!)

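Example: Insert 43 (101011)

Before the insert, the file is in the state shown above (Level=0, Next=0, N=4).

43 hashes to bucket 11 under h0, which is full, so an overflow page is added. This triggers a split of bucket Next=0, whose entries are redistributed by h1 between bucket 000 and its split image 100.

After the insert: Level=0, Next=1

h1    h0    Primary pages          Overflow pages
000   00    32*
001   01    25*, 9*, 5*
010   10    14*, 18*, 10*, 30*
011   11    31*, 35*, 11*, 7*      43*
100   00    44*, 36*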

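Example: End of a Round

Level=0, Next=3

h1    h0    Primary pages          Overflow pages
000   00    32*
001   01    9*, 25*
010   10    66*, 18*, 10*, 34*
011   11    31*, 35*, 7*, 11*      43*
100   00    44*, 36*
101   01    5*, 37*, 29*
110   10    14*, 30*, 22*

Insert 50 (110010): h0 gives bucket 10, which has already been split, so h1 gives bucket 010. That bucket is full, so an overflow page is added, which splits bucket 011, the last bucket of the round. Level becomes 1 and Next returns to 0.

Level=1, Next=0

h1    h0    Primary pages          Overflow pages
000   00    32*
001   01    9*, 25*
010   10    66*, 18*, 10*, 34*     50*
011   11    43*, 35*, 11*
100   00    44*, 36*
101   01    5*, 37*, 29*
110   10    14*, 30*, 22*
111   11    31*, 7*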
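To check the three examples end to end, here is a toy Linear Hashing file in Python. It is a sketch under stated assumptions, not the course's reference implementation: the hash is the identity, each page holds 4 entries (as drawn in the figures), and a split is triggered exactly when a new overflow page is added. The insert order 37, 29, 22, 66, 34 is an assumption chosen to reproduce the "Level=0, Next=3" state; the slides only show the resulting state.

```python
from typing import List

CAPACITY = 4  # assumption: 4 data entries per page, as drawn in the slide figures


class LinearHash:
    """Toy Linear Hashing file (identity hash, split whenever an overflow page is added)."""

    def __init__(self, n: int = 4) -> None:
        self.n = n          # initial number of buckets N
        self.level = 0
        self.next = 0
        # buckets[b] is the page chain of bucket b: primary page first, then overflow pages
        self.buckets: List[List[List[int]]] = [[[]] for _ in range(n)]

    def n_level(self) -> int:
        return (2 ** self.level) * self.n

    def bucket_of(self, v: int) -> int:
        b = v % self.n_level()                 # h_Level
        if b < self.next:                      # bucket already split in this round
            b = v % (2 * self.n_level())       # h_{Level+1}
        return b

    def _place(self, b: int, v: int) -> bool:
        """Append v to bucket b; return True if a new overflow page had to be added."""
        pages = self.buckets[b]
        if len(pages[-1]) < CAPACITY:
            pages[-1].append(v)
            return False
        pages.append([v])
        return True

    def _split_next(self) -> None:
        n_level = self.n_level()
        b = self.next
        entries = [v for page in self.buckets[b] for v in page]
        self.buckets[b] = [[]]
        self.buckets.append([[]])              # split image, bucket number b + N_Level
        for v in entries:
            self._place(v % (2 * n_level), v)  # h_{Level+1} redistributes the entries
        self.next += 1
        if self.next == n_level:               # end of a round
            self.level += 1
            self.next = 0

    def insert(self, v: int) -> None:
        if self._place(self.bucket_of(v), v):
            self._split_next()

    def contents(self, b: int) -> List[int]:
        return sorted(v for page in self.buckets[b] for v in page)

    def num_pages(self, b: int) -> int:
        return len(self.buckets[b])


def replay_example() -> LinearHash:
    lh = LinearHash(n=4)
    # Initial state of the slides (Level=0, Next=0, N=4); no bucket overflows yet.
    for v in [32, 44, 36, 9, 25, 5, 14, 18, 10, 30, 31, 35, 7, 11]:
        lh.insert(v)
    lh.insert(43)  # overflow in bucket 11 -> split bucket 0 into 000 / 100
    # Assumption: these inserts reproduce the "End of a Round" state (Level=0, Next=3).
    for v in [37, 29, 22, 66, 34]:
        lh.insert(v)
    lh.insert(50)  # overflow in bucket 010 -> split bucket 011 -> Level=1, Next=0
    return lh


if __name__ == "__main__":
    lh = replay_example()
    for b in range(len(lh.buckets)):
        print(format(b, "03b"), lh.contents(b), "pages:", lh.num_pages(b))
```

Because the split is driven by the Next pointer rather than by the overflowing bucket, the bucket that overflowed (010 when 50 is inserted) keeps its overflow page until its own turn to split comes in the next round.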


1. How does Linear Hashing provide an average-case search cost of only slightly more than one disk I/O, given that overflow buckets are part of its data structure?

If we start with an index that has B buckets, all the buckets are split in order, one after the other, during the round.

The hash function is expected to distribute the search-key values uniformly across the buckets.

A split can also be triggered by conditions such as space utilization, which keeps the overflow chains short.

Therefore, the number of overflow pages in a bucket is not expected to be more than 1, so an equality search costs only slightly more than one disk I/O on average.


Does Linear Hashing guarantee at most one disk access to retrieve a record with a given key value?


No. Overflow chains are part of the structure, so no such guarantee is provided.


If a Linear Hashing index using Alternative (1) for data entries contains N records, with P records per page and an average storage utilization of 80 percent, what is the worst-case cost for an equality search? Under what conditions would this cost be the actual search cost?

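Average number of records per page = 0.8 × P (80% utilization).

If all keys map to the same bucket, that bucket's chain holds all N records in N / (0.8P) pages, and an equality search may have to read every one of them. This worst-case cost of N / (0.8P) I/Os is the actual search cost when the hash function maps all keys to the same bucket.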
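The worst-case formula as a small Python helper. The function name and the example values N = 1000, P = 10 are hypothetical; integer arithmetic is used so that 0.8 × P does not introduce floating-point rounding.

```python
def worst_case_pages(n_records: int, p: int, utilization_pct: int = 80) -> int:
    """Pages in a single bucket chain if all N records hash to the same bucket.

    With Alternative (1) the data entries are the records themselves; at 80%
    utilization a page holds about 0.8 * P records, so the chain has
    N / (0.8 * P) pages, computed here as ceil(100 * N / (pct * P)).
    """
    return -(-100 * n_records // (utilization_pct * p))


if __name__ == "__main__":
    print(worst_case_pages(1000, 10))  # 125
```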


If the hash function distributes data entries over the space of bucket numbers in a very skewed (non-uniform) way, what can you say about the space utilization in data pages?

Space utilization = Number of pages holding data / Total number of pages

If the data is extremely skewed, all records are mapped to the same bucket.

Suppose that we start with m primary pages. All records are mapped to bucket 0, and each overflow page added to bucket 0 causes a split. Suppose we add n overflow pages to bucket 0 → n splits add n new bucket pages, which stay empty.
Pages holding data = n + 1 (bucket 0's primary page plus its n overflow pages)
Total number of pages = m + n + n = m + 2n
Space utilization = (n + 1) / (m + 2n) < 50% whenever m > 2 → very bad
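The skewed-case utilization can be evaluated exactly with Python's `fractions.Fraction`. The function name is hypothetical, and the model assumes, as in the derivation above, that bucket 0's n + 1 pages are full and every split image stays empty.

```python
from fractions import Fraction


def skewed_utilization(m: int, n: int) -> Fraction:
    """Space utilization when every record hashes to bucket 0.

    m: primary pages at the start; n: overflow pages added to bucket 0.
    Each overflow page triggers one split that adds one empty bucket page,
    so there are m + 2n pages, of which only n + 1 hold data.
    """
    return Fraction(n + 1, m + 2 * n)


if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(n, skewed_utilization(4, n), float(skewed_utilization(4, n)))
```

As n grows the ratio approaches 1/2 from below (for m > 2), so even a very long run of inserts cannot push utilization above 50%.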

File: 10 × 10^6 pages; buffer: 320 pages; average seek time 10 ms, average rotational delay 5 ms, transfer time 1 ms per page; the page size is 4K.

Pass 0:
No. of runs = ceil(10 × 10^6 / 320) = 31250 runs
Read cost per run = (10 + 5 + 1 × 320) ms
Write cost per run = (10 + 5 + 1 × 320) ms
Total I/O cost = No. of runs × (read cost + write cost)
= 31250 × 2 × (15 + 320) ms → cost of Pass 0
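The Pass 0 arithmetic as a short Python sketch. The helper names are hypothetical; it assumes, as above, that each run fills the 320-page buffer and is read and written as one contiguous block (here 10 × 10^6 is an exact multiple of 320, so every run is full).

```python
SEEK_MS = 10  # average seek time
ROT_MS = 5    # average rotational delay
XFER_MS = 1   # transfer time per page


def block_cost_ms(pages_in_block: int) -> int:
    """Cost of one read or write of a contiguous block: seek + rotation + transfer."""
    return SEEK_MS + ROT_MS + XFER_MS * pages_in_block


def pass0_cost_ms(file_pages: int, buffer_pages: int) -> int:
    """Pass 0: each run is read as one block and written back as one block."""
    runs = -(-file_pages // buffer_pages)  # ceil(file_pages / buffer_pages)
    return runs * 2 * block_cost_ms(buffer_pages)


if __name__ == "__main__":
    print(pass0_cost_ms(10_000_000, 320))  # 20937500 ms = 31250 * 2 * 335
```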

Subsequent merge passes:

Total cost for subsequent merges = No. of passes × (read cost + write cost)

No. of passes = $\lceil \log_k 31250 \rceil = \lceil \ln 31250 / \ln k \rceil$, where k is the number of ways of the merge

Read/write cost = No. of blocks × (10 + 5 + 1 × No. of pages per block) ms, where No. of blocks = ceil(10 × 10^6 / No. of pages per block)
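The two general formulas can be sketched in Python. Here the pass count is computed by repeated ceiling division rather than with logarithms: after p passes there are ceil(R / k^p) runs left, so the loop stops exactly at ceil(log_k R), without the floating-point risk of ln(R) / ln(k) when R is an exact power of k. Function and parameter names are hypothetical.

```python
def merge_passes(runs: int, ways: int) -> int:
    """Number of k-way merge passes, i.e. ceil(log_k(runs)), using integer arithmetic."""
    passes = 0
    while runs > 1:
        runs = -(-runs // ways)  # each pass merges groups of `ways` runs into one
        passes += 1
    return passes


def io_cost_ms(file_pages: int, pages_per_block: int,
               seek_ms: int = 10, rot_ms: int = 5, xfer_ms: int = 1) -> int:
    """Cost of reading (or writing) the whole file in blocks of pages_per_block pages."""
    blocks = -(-file_pages // pages_per_block)
    return blocks * (seek_ms + rot_ms + xfer_ms * pages_per_block)


if __name__ == "__main__":
    for k in (256, 16, 8, 4):
        print(k, merge_passes(31250, k))
    print(io_cost_ms(10_000_000, 64))  # 156250 blocks * 79 ms
```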

b) Create 256 ‘input’ buffers of 1 page each, create an ‘output’ buffer of 64 pages, and do 256-way merges.

Total cost for subsequent merges = No. of passes × (read cost + write cost)

No. of passes = ceil(ln 31250 / ln No. of ways) = ceil(ln 31250 / ln 256) = 2

Read cost: the input buffers hold 1 page each, so all 10 × 10^6 pages are read one page at a time:
Read cost = 10 × 10^6 × (10 + 5 + 1) = 16 × 10^7 ms

Write cost = No. of blocks × (10 + 5 + 1 × No. of pages per block) = 156250 × (15 + 64) ms, where No. of blocks = ceil(10 × 10^6 / 64) = 156250

Total cost = 2 × (16 × 10^7 + 156250 × (15 + 64)) ms
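Evaluating option b) numerically, as a self-contained Python sketch; the helper names are hypothetical and the cost model is the one stated above (15 ms of seek plus rotation per block, 1 ms per page transferred).

```python
FILE_PAGES = 10_000_000
RUNS = 31_250  # runs produced by Pass 0


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def merge_passes(runs: int, ways: int) -> int:
    """ceil(log_ways(runs)) via repeated ceiling division."""
    passes = 0
    while runs > 1:
        runs = ceil_div(runs, ways)
        passes += 1
    return passes


def io_cost_ms(pages: int, pages_per_block: int) -> int:
    """(10 ms seek + 5 ms rotation + 1 ms per page) for each block."""
    return ceil_div(pages, pages_per_block) * (10 + 5 + pages_per_block)


# b) 256 one-page input buffers, one 64-page output buffer, 256-way merges
passes_b = merge_passes(RUNS, 256)    # 2
read_b = io_cost_ms(FILE_PAGES, 1)    # input read one page at a time: 16 * 10^7 ms
write_b = io_cost_ms(FILE_PAGES, 64)  # output written in 64-page blocks
total_b = passes_b * (read_b + write_b)

if __name__ == "__main__":
    print(passes_b, read_b, write_b, total_b)
```

The 1-page input blocks dominate: reading costs 16 ms per page, while writing in 64-page blocks costs under 1.25 ms per page.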

e) Create four ‘input’ buffers of 64 pages each, create an ‘output’ buffer of 64 pages, and do four-way merges.


No. of passes = ceil(ln 31250 / ln No. of ways) = ceil(ln 31250 / ln 4) = 8

Read/write cost = No. of blocks × (10 + 5 + 1 × No. of pages per block) = 156250 × (15 + 64) ms each, where No. of blocks = ceil(10 × 10^6 / 64) = 156250

Total cost = 8 × (2 × 156250 × (15 + 64)) ms
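Evaluating option e) the same way, as a self-contained Python sketch with hypothetical helper names; both input and output now move in 64-page blocks.

```python
FILE_PAGES = 10_000_000
RUNS = 31_250  # runs produced by Pass 0


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def merge_passes(runs: int, ways: int) -> int:
    """ceil(log_ways(runs)) via repeated ceiling division."""
    passes = 0
    while runs > 1:
        runs = ceil_div(runs, ways)
        passes += 1
    return passes


def io_cost_ms(pages: int, pages_per_block: int) -> int:
    """(10 ms seek + 5 ms rotation + 1 ms per page) for each block."""
    return ceil_div(pages, pages_per_block) * (10 + 5 + pages_per_block)


# e) four 64-page input buffers, one 64-page output buffer, 4-way merges
passes_e = merge_passes(RUNS, 4)   # 8
rw_e = io_cost_ms(FILE_PAGES, 64)  # read cost = write cost = 156250 * 79 ms
total_e = passes_e * 2 * rw_e

if __name__ == "__main__":
    print(passes_e, rw_e, total_e)
```

Under this cost model e) comes to 197,500,000 ms for the merge passes, less than the 344,687,500 ms that the formula for b) gives: the larger input blocks spread the 15 ms of seek and rotation over 64 pages, which outweighs needing 8 passes instead of 2.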
