Chapter 5 Hashing - Computer Science | William & Marytadavis/cs303/ch05sm.pdfChapter 5 Hashing...

18
Chapter 5 Hashing Introduction 2 hashing performs basic operations, such as insertion, deletion, and finds in average time Hashing 3 a hash table is merely an of some fixed size hashing converts into locations in a hash table searching on the key becomes something like array lookup hashing is typically a many-to-one map: multiple keys are mapped to the same array index mapping multiple keys to the same position results in a ____________ that must be resolved two parts to hashing: a hash function, which transforms keys into array indices a collision resolution procedure Hashing Functions 4 let be the set of search keys hash functions map into the set of in the hash table ideally, distributes over the slots of the hash table, to minimize collisions if we are hashing items, we want the number of items hashed to each location to be close to example: Library of Congress Classification System hash function if we look at the first part of the call numbers (e.g., E470, PN1995) collision resolution involves going to the stacks and looking through the books almost all of CS is hashed to QA75 and QA76 (BAD)

Transcript of Chapter 5 Hashing - Computer Science | William & Marytadavis/cs303/ch05sm.pdfChapter 5 Hashing...

Page 1: Chapter 5 Hashing - Computer Science | William & Marytadavis/cs303/ch05sm.pdfChapter 5 Hashing Introduction 2 hashing performs basic operations, such as insertion, deletion, and finds

Chapter 5Hashing

Introduction

2

hashing performs basic operations, such as insertion,deletion, and finds in average time

Hashing

3

a hash table is merely an of some fixed sizehashing converts into locations in a hashtable

searching on the key becomes something like arraylookup

hashing is typically a many-to-one map: multiple keys aremapped to the same array index

mapping multiple keys to the same position results in a____________ that must be resolved

two parts to hashing:a hash function, which transforms keys into array indicesa collision resolution procedure

Hashing Functions

4

let be the set of search keyshash functions map into the set of in thehash table

ideally, distributes over the slots ofthe hash table, to minimize collisionsif we are hashing items, we want the number of itemshashed to each location to be close to

example: Library of Congress Classification Systemhash function if we look at the first part of the call numbers(e.g., E470, PN1995)

collision resolution involves going to the stacks andlooking through the booksalmost all of CS is hashed to QA75 and QA76 (BAD)

Page 2: Chapter 5 Hashing - Computer Science | William & Marytadavis/cs303/ch05sm.pdfChapter 5 Hashing Introduction 2 hashing performs basic operations, such as insertion, deletion, and finds

Hashing Functions

5

suppose we are storing a set of nonnegative integersgiven , we can obtain hash values between 0 and 1 with thehash function

when is divided byfast operation, but we need to be careful when choosing

example: if , is just the lowest-order bits ofare all the hash values equally likely?

choosing to be a not too close to a power of 2works well in practice

Hashing Functions

6

we can also use the hash function below for floating pointnumbers if we interpret the bits as an ______________

two ways to do this in C, assuming long int and doubletypes have the same lengthfirst method uses C to accomplish this task

Hashing Functions

7

we can also use the hash function below for floating pointnumbers if we interpret the bits as an integer (cont.)second uses a , which is a variable that can holdobjects of different types and sizes

Hashing Functions

8

we can hash strings by combining a hash of each _________

is an additional parameter we get to chooseif is larger than any character value, then this approach iswhat you would obtain if you treated the string as a base-______________

Page 3: Chapter 5 Hashing - Computer Science | William & Marytadavis/cs303/ch05sm.pdfChapter 5 Hashing Introduction 2 hashing performs basic operations, such as insertion, deletion, and finds

Hashing Functions

9

K&R suggest a slightly simpler hash function, correspondingto

Weiss suggests

Hashing Functions

10

we can use the idea for strings if our search key has_____________ parts, say, street, city, state:

hash = ((street * R + city) % M) * R + state) % M;

same ideas apply to hashing vectors

Hash Functions

11

the choice of parameters can have a ____________ effecton the results of hashingcompare the text's string hashing algorithm for differentpairs of and

plot _______________ of the number of words hashed toeach hash table location; we use the American dictionaryfrom the aspell program as data (305,089 words)

Hash Functions

12

example:good: words are __________________________

Page 4: Chapter 5 Hashing - Computer Science | William & Marytadavis/cs303/ch05sm.pdfChapter 5 Hashing Introduction 2 hashing performs basic operations, such as insertion, deletion, and finds

Hash Functions

13

example:very bad

Hash Functions

14

example:better

Hash Functions

15

example:bad

Collision Resolution

16

hash table collisionoccurs when elements hash to the __________________in the tablevarious for dealing with collision

separate chainingopen addressinglinear probingother methods

Page 5: Chapter 5 Hashing - Computer Science | William & Marytadavis/cs303/ch05sm.pdfChapter 5 Hashing Introduction 2 hashing performs basic operations, such as insertion, deletion, and finds

Separate Chaining

17

separate chainingkeep a list of all elements that to the same locationeach location in the hash table is a _________________example: first 10 squares

Separate Chaining

18

insert, search, delete in listsall proportional to of linked listinsert

new elements can be inserted at of listduplicates can increment ______________

other structures could be used instead of listsbinary search treeanother hash table

linked lists good if table is and hash function isgood

Separate Chaining

19

how long are the linked lists in a hash table?value: where is the number of keys

and is the size of the tableis it reasonable to assume the hash table would exhibitthis behavior?

load factoraverage length of a listtime to search: time to evaluate the hashfunction + time to the list

unsuccessful search:successful search:

Separate Chaining

20

observationsmore important than table size

general rule: make the table as large as the number ofelements to be stored,keep table size prime to ensure good ________________

Page 6: Chapter 5 Hashing - Computer Science | William & Marytadavis/cs303/ch05sm.pdfChapter 5 Hashing Introduction 2 hashing performs basic operations, such as insertion, deletion, and finds

Separate Chaining

21

declaration of hash structure

Separate Chaining

22

hash member function

Separate Chaining

23

routines for separate chaining

Separate Chaining

24

routines for separate chaining

Page 7: Chapter 5 Hashing - Computer Science | William & Marytadavis/cs303/ch05sm.pdfChapter 5 Hashing Introduction 2 hashing performs basic operations, such as insertion, deletion, and finds

Open Addressing

25

linked lists incur extra coststime to for new cellseffort and complexity of defining second data structure

a different collision strategy involves placing colliding keysin nearby slots

if a collision occurs, try cells until an emptyone is foundbigger table size needed withload factor should be below

three common strategieslinear probingquadratic probingdouble hashing

Linear Probing

26

linear probing insert operationwhen is hashed, if slot is open, place thereif there is a collision, then start looking for an empty slotstarting with location in the hash table, andproceed through , , 0, 1, 2,

wrapping around the hash table, looking foran empty slot

search operation is similarchecking whether a table entry is vacant (or is one weseek) is called a ___________

Linear Probing

27

example: add 89, 18, 49, 58, 69 with and

Linear Probing

28

as long as the table is , a vacant cell can be foundbut time to locate an empty cell can become largeblocks of occupied cells results in primary _____________

deleting entries leaves __________some entries may no longer be foundmay require moving many other entries

expected number of probes

for search hits:

for insertion and search misses:

for , these values are and , respectively

Page 8: Chapter 5 Hashing - Computer Science | William & Marytadavis/cs303/ch05sm.pdfChapter 5 Hashing Introduction 2 hashing performs basic operations, such as insertion, deletion, and finds

Linear Probing

29

performance of linear probing (dashed) vs. more randomcollision resolution

adequate up toSuccessful, Unsuccessful, Insertion

Quadratic Probing

30

quadratic probingeliminates _______________________collision function is quadraticexample: add 89, 18, 49, 58, 69 with and

Quadratic Probing

31

in linear probing, letting table get nearly greatlyhurts performancequadratic probing

no of finding an empty cell once thetable gets larger than half fullat most, of the table can be used to resolvecollisionsif table is half empty and the table size is prime, then weare always guaranteed to accommodate a new elementcould end up with situation where all keys map to thesame table location

Quadratic Probing

32

quadratic probingcollisions will probe the same alternative cells

clusteringcauses less than half an extra probe per search

Page 9: Chapter 5 Hashing - Computer Science | William & Marytadavis/cs303/ch05sm.pdfChapter 5 Hashing Introduction 2 hashing performs basic operations, such as insertion, deletion, and finds

Double Hashing

33

double hashing

apply a second hash function to and probe across___________________________function must never evaluate to ____make sure all cells can be probed

Double Hashing

34

double hashing examplewith

is a prime smaller than table sizeinsert 89, 18, 49, 58, 69

Double Hashing

35

double hashing example (cont.)note here that the size of the table (10) is not primeif 23 inserted in the table, it would collide with 58

since and the table size is 10,only one alternative location, which is taken

Rehashing

36

table may get _______________run time of operations may take too longinsertions may for quadratic resolution

too many removals may be intermixed with insertionssolution: build a new table (with a newhash function)

go through original hash table to compute a hash valuefor each (non-deleted) elementinsert it into the new table

Page 10: Chapter 5 Hashing - Computer Science | William & Marytadavis/cs303/ch05sm.pdfChapter 5 Hashing Introduction 2 hashing performs basic operations, such as insertion, deletion, and finds

Rehashing

37

example: insert 13, 15, 24, 6 into a hash table of size 7with

Rehashing

38

example (cont.)insert 23table will be over 70% full; therefore, a new table iscreated

Rehashing

39

example (cont.)new table is size 17new hash functionall old elements are inserted into newtable

Rehashing

40

rehashing run time since elements and to rehashthe entire table of size roughly

must have been insertions since last rehashrehashing may run OK if in ________________

if interactive session, rehashing operation could producea slowdown

rehashing can be implemented with _________________could rehash as soon as the table is half fullcould rehash only when an insertion failscould rehash only when a certain ______ ___ is reached

may be best, as performance degrades as load factorincreases

Page 11: Chapter 5 Hashing - Computer Science | William & Marytadavis/cs303/ch05sm.pdfChapter 5 Hashing Introduction 2 hashing performs basic operations, such as insertion, deletion, and finds

Hash Tables with Worst-Case Access

41

hash tables so faraverage case for insertions, searches, and

deletionsseparate chaining: worst case

some queries will take nearly logarithmic timeworst-case time would be better

important for applications such as lookup tables forrouters and memory cachesif is known in advance, and elements can be_______________, worst-case time isachievable

Hash Tables with Worst-Case Access

42

perfect hashingassume all items known _________________separate chaining

if the number of lists continually increases, the lists willbecome shorter and shorterwith enough lists, high probability of ______________two problems

number of lists might be unreasonably __________the hashing might still be unfortunate

can be made large enough to have probabilityof no collisionsif collision detected, clear table and try again with adifferent hash function (at most done 2 times)

Hash Tables with Worst-Case Access

43

perfect hashing (cont.)how large must be?

theoretically, should be , which is ____________solution: use lists

resolve collisions by using hash tables instead oflinked lists

each of these lists can have elementseach secondary hash table will use a different hashfunction until it is ______________________

can also perform similar operation for primary hashtable

total size of secondary hash tables is at most

Hash Tables with Worst-Case Access

44

perfect hashing (cont.)example: slots 1, 3, 5, 7 empty; slots 0, 4, 8 have 1element each; slots 2, 6 have 2 elements each; slot 9has 3 elements

Page 12: Chapter 5 Hashing - Computer Science | William & Marytadavis/cs303/ch05sm.pdfChapter 5 Hashing Introduction 2 hashing performs basic operations, such as insertion, deletion, and finds

Hash Tables with Worst-Case Access

45

cuckoo hashingbound known for a long time

researchers surprised in 1990s to learn that if one of twotables were chosen as items were inserted,the size of the largest list would be , which issignificantly smallermain idea: use 2 tables

neither more than fulluse a separate hash function for eachitem will be stored in one of these two locationscollisions resolved by elements

Hash Tables with Worst-Case Access

46

cuckoo hashing (cont.)example: 6 items; 2 tables of size 5; each table hasrandomly chosen hash function

A can be placed at position 0 in Table 1 or position 2 inTable 2a search therefore requires at most 2 table accesses inthis exampleitem deletion is trivial

Hash Tables with Worst-Case Access

47

cuckoo hashing (cont.)insertion

ensure item is not already in one of the tablesuse first hash function and if first table location is___________, insert thereif location in first table is occupied

____________ element there and place current itemin correct position in first tabledisplaced element goes to its alternate hash positionin the second table

Hash Tables with Worst-Case Access

48

cuckoo hashing (cont.)example: insert A

insert B (displace A)

Page 13: Chapter 5 Hashing - Computer Science | William & Marytadavis/cs303/ch05sm.pdfChapter 5 Hashing Introduction 2 hashing performs basic operations, such as insertion, deletion, and finds

Hash Tables with Worst-Case Access

49

cuckoo hashing (cont.)insert C

insert D (displace C) and E

Hash Tables with Worst-Case Access

50

cuckoo hashing (cont.)insert F (displace E) (E displaces A)

(A displaces B) (B relocated)

Hash Tables with Worst-Case Access

51

cuckoo hashing (cont.)insert G

displacements are _______________G D B A E F C G

also results in a displacement cycle

Hash Tables with Worst-Case Access

52

cuckoo hashing (cont.)cycles

_______insertions should require < displacements

if a certain number of displacements is reached on aninsertion, tables can be with new hashfunctions

Page 14: Chapter 5 Hashing - Computer Science | William & Marytadavis/cs303/ch05sm.pdfChapter 5 Hashing Introduction 2 hashing performs basic operations, such as insertion, deletion, and finds

Hash Tables with Worst-Case Access

53

cuckoo hashing (cont.)variations

higher number of tables (3 or 4)place item in second hash slot immediately instead of________________ other itemsallow each cell to store keys

space utilization increased

Hash Tables with Worst-Case Access

54

cuckoo hashing (cont.)benefits

worst-case lookup and deletion timesavoidance of _____________________potential for __________________

potential issuesextremely sensitive to choice of hash functionstime for insertion increases rapidly as load factorapproaches 0.5

Hash Tables with Worst-Case Access

55

hopscotch hashingimproves on linear probing algorithm

linear probing tries cells in sequential order, startingfrom hash location, which can be long due to primaryand secondary clusteringinstead, hopscotch hashing places a bound on__________________ of the probe sequence

results in worst-case constant-time lookupcan be parallelized

Hash Tables with Worst-Case Access

56

hopscotch hashing (cont.)if insertion would place an element too far from itshash location, go backward and otherelements

evicted elements cannot be placed farther than themaximal length

each position in the table contains information aboutthe current element inhabiting it, plus others that________ to it

Page 15: Chapter 5 Hashing - Computer Science | William & Marytadavis/cs303/ch05sm.pdfChapter 5 Hashing Introduction 2 hashing performs basic operations, such as insertion, deletion, and finds

Hash Tables with Worst-Case Access

57

hopscotch hashing (cont.)example: MAX_DIST = 4

each bit string provides 1 bit of information about thecurrent position and the next 3 that follow

1: item hashes to current location; 0: no

Hash Tables with Worst-Case Access

58

hopscotch hashing (cont.)example: insert H in 9

try in position 13, but too far, so try candidates foreviction (10, 11, 12)evict G in 11

Hash Tables with Worst-Case Access

59

hopscotch hashing (cont.)example: insert I in 6

position 14 too far, so try positions 11, 12, 13G can move down oneposition 13 still too far; F can move down one

Hash Tables with Worst-Case Access

60

hopscotch hashing (cont.)example: insert I in 6

position 12 still too far, so try positions 9, 10, 11B can move down threenow slot is open for I, fourth from 6

Page 16: Chapter 5 Hashing - Computer Science | William & Marytadavis/cs303/ch05sm.pdfChapter 5 Hashing Introduction 2 hashing performs basic operations, such as insertion, deletion, and finds

Hash Tables with Worst-Case Access

61

universal hashingin principle, we can end up with a situation where all ofour keys are hashed to the in thehash table (bad)more realistically, we could choose a hash functionthat does not distribute the keysto avoid this, we can choose the hash function______________ so that it is independent of the keysbeing storedyields provably good performance on average

Hash Tables with Worst-Case Access

62

universal hashing (cont.)let be a finite collection of functions mappingour set of keys to the range }

is a collection if for each pair ofdistinct keys , the number of hash functions

for which is at mostthat is, with a randomly selected hash function ,the chance of a between distinct andis not more than the probability of a collision if

and were chosen randomly andindependently from }

Hash Tables with Worst-Case Access

63

universal hashing (cont.)example: choose a prime sufficiently large that everykey is in the range 0 to (inclusive)let andthen the family

is a universal class of hash functions

Hash Tables with Worst-Case Access

64

extendible hashingamount of data too large to fit in _____________

main consideration is then the number of diskaccesses

assume we need to store records andrecords fit in one disk blockcurrent problems

if probing or separate chaining is used, collisionscould cause to be examinedduring a searchrehashing would be expensive in this case

Page 17: Chapter 5 Hashing - Computer Science | William & Marytadavis/cs303/ch05sm.pdfChapter 5 Hashing Introduction 2 hashing performs basic operations, such as insertion, deletion, and finds

Hash Tables with Worst-Case Access

65

extendible hashing (cont.)allows search to be performed in disk accesses

insertions require a bit moreuse B-tree

as increases, height of B-tree ______________could make height = 1, but multi-way branchingwould be extremely high

Hash Tables with Worst-Case Access

66

extendible hashing (cont.)example: 6-bit integers

root contains 4 pointers determined by first 2 bitseach leaf has up to 4 elements

Hash Tables with Worst-Case Access

67

extendible hashing (cont.)example: insert 100100

place in third leaf, but fullsplit leaf into 2 leaves, determined by 3 bits

Hash Tables with Worst-Case Access

68

extendible hashing (cont.)example: insert 000000

first leaf split

Page 18: Chapter 5 Hashing - Computer Science | William & Marytadavis/cs303/ch05sm.pdfChapter 5 Hashing Introduction 2 hashing performs basic operations, such as insertion, deletion, and finds

Hash Tables with Worst-Case Access

69

extendible hashing (cont.)considerations

several directory may be required if theelements in a leaf agree in more than D+1 leadingbits

number of bits to distinguish bit stringsdoes not work well with (duplicates: does not work at all)

Hash Tables with Worst-Case Access

70

final pointschoose hash function carefullywatch _________________

separate chaining: close to 1probing hashing: 0.5

hash tables have some _______________not possible to find min/maxnot possible to for a string unless theexact string is known

binary search trees can do this, and is onlyslightly worse than

Hash Tables with Worst-Case Access

71

final points (cont.)hash tables good for

symbol tablegaming

remembering locations to avoid recomputingthrough transposition table

spell checkers