Hashing 1. Def. Hash Table a data structure in which objects are associated with an integer and...

Hashing

1. Def. Hash Table - a data structure in which objects are associated with an integer and stored in an array at that integer location (i.e. a key field is used to determine the index of the item). This provides constant-time insert and lookup.

Ex. Student records stored in an array where each student is assigned an id no. and that number is used for the index. Are there any problems with this idea?

Space may be wasted, and insertions of new students are limited by the original size of the array. Having to know the student id no. in order to look up a record is also inconvenient.

Applications that need quick lookup include 911 calls and airline control tower flight number tables.

2. Def. Hash Function - a function used to convert numbers from a large range into numbers in a small range.

(The key field is usually the large range and the index of the array is usually the small range.)

Ex. Dictionary of 50,000 words. Use the word itself as the key field, but code it numerically to determine a unique location to store the word in the array.

Let a = 1, b = 2, c = 3, …z = 26 and let positions of letters in the word have power of ten values:

Ex. dab = 4*10^2 + 1*10^1 + 2*10^0 = 412

What size array would be needed to store these 50,000 words, if no word is longer than 10 characters?

zzzzzzzzzz would have the code 28,888,888,886 (too big - bigger than the largest int - no array could be that big). Also, if locations were chosen this way, there would be many, many empty cells.
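The coding scheme above can be sketched in Java as follows. This is only an illustration: the class and method names (`WordCoder`, `wordCode`) are made up for this example, and the sketch assumes lowercase words made of the letters a-z only. A `long` is used because the code for a 10-letter word does not fit in an `int`.

```java
// Hypothetical helper, not part of the original notes.
class WordCoder {
    // Codes a lowercase word with a=1 ... z=26, weighting each position
    // by a power of ten (the rightmost letter has weight 10^0).
    static long wordCode(String word) {
        long code = 0;
        for (int i = 0; i < word.length(); i++) {
            int letter = word.charAt(i) - 'a' + 1; // a=1, b=2, ..., z=26
            code = code * 10 + letter;             // move earlier letters up one power of ten
        }
        return code;
    }

    public static void main(String[] args) {
        System.out.println(wordCode("dab"));          // 412
        long big = wordCode("zzzzzzzzzz");
        System.out.println(big);                      // 28888888886
        System.out.println(big > Integer.MAX_VALUE);  // true
    }
}
```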

What size array should be used for this dictionary?

100,000 - usually twice as large as the no. of items to allow room for collisions (def. obvious but coming up)

A hash function is needed to convert the numeric code to a smaller range.

Commonly used hash function: index = largerange % arraysize

Ex. Hash the word gave to find its location in the array dictionary.

7*10^3 + 1*10^2 + 22*10^1 + 5*10^0 = 7325, and 7325 % 100,000 = 7325

Ex. Hash the word gaty to find its location in the array dictionary.

7*10^3 + 1*10^2 + 20*10^1 + 25*10^0 = 7325, and 7325 % 100,000 = 7325

COLLISION!
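A minimal sketch of the two examples above, assuming the array size of 100,000 suggested earlier for the dictionary. The names `ModHashDemo`, `wordCode` and `hash` are hypothetical and chosen only for this illustration.

```java
// Hypothetical demo class, not part of the original notes.
class ModHashDemo {
    static final int ARRAY_SIZE = 100_000; // assumed table size (twice the 50,000 words)

    // Numeric code of a lowercase word: a=1 ... z=26, positions weighted by powers of ten.
    static long wordCode(String word) {
        long code = 0;
        for (int i = 0; i < word.length(); i++) {
            code = code * 10 + (word.charAt(i) - 'a' + 1);
        }
        return code;
    }

    // index = largerange % arraysize
    static int hash(String word) {
        return (int) (wordCode(word) % ARRAY_SIZE);
    }

    public static void main(String[] args) {
        System.out.println(hash("gave")); // 7325
        System.out.println(hash("gaty")); // 7325, the same cell: a collision
    }
}
```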

3. Def. Collision - occurs when a key hashes to a cell that is already occupied.

4. There are 2 methods to resolve collisions:

Def. Open addressing - in case of collision, search for or store in some other available cell.

Def. Separate chaining - install a linked list at each index of the array and insert all items that hash to an index at the beginning of the list.
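Separate chaining might look roughly like this in Java. This is a sketch under assumptions: it stores plain integer keys rather than full objects, uses `java.util.LinkedList` for the lists, and the class name `ChainedTable` and its methods are invented for this example.

```java
import java.util.LinkedList;

// Hypothetical sketch of separate chaining, not part of the original notes.
class ChainedTable {
    private final LinkedList<Integer>[] cells; // one linked list per array index

    @SuppressWarnings("unchecked")
    ChainedTable(int size) {
        cells = new LinkedList[size];
        for (int i = 0; i < size; i++) {
            cells[i] = new LinkedList<>();
        }
    }

    private int hash(int key) {
        return Math.floorMod(key, cells.length); // index = key % arraysize, never negative
    }

    // Insert at the beginning of the list for this index.
    void insert(int key) {
        cells[hash(key)].addFirst(key);
    }

    // Only the one list at the hashed index is searched, linearly.
    boolean contains(int key) {
        return cells[hash(key)].contains(key);
    }

    // Removes one occurrence of the key from its list, if present.
    boolean delete(int key) {
        return cells[hash(key)].remove(Integer.valueOf(key));
    }
}
```

Two keys that hash to the same index (for example 7325 and 107325 in a table of size 100,000) simply end up in the same list.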

5. Types of open addressing:

Linear probe method - if collision occurs at index x, search locations x+1, x+2, etc.

Ex. Gaty would be stored in location 7326 (if available) otherwise location 7327, or 7328, etc.

Note: resolves collisions but primary clusters occur.

Quadratic probe method - if collision occurs at index x, search locations x+1, x+2^2, x+3^2, etc. (i.e. x+1, x+4, x+9, ...).

Note: resolves primary clusters, but secondary clusters occur.
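The two probe sequences can be compared with a short sketch. It assumes the usual definition of the quadratic probe, in which the offsets from x are the squares 1, 4, 9, ..., and it wraps around the end of the array. The class and method names are hypothetical.

```java
// Hypothetical sketch comparing probe sequences, not part of the original notes.
class ProbeSequences {
    // Linear probe: x+1, x+2, x+3, ... (wrapping around the end of the array).
    static int[] linear(int x, int count, int arraySize) {
        int[] out = new int[count];
        for (int i = 1; i <= count; i++) {
            out[i - 1] = (x + i) % arraySize;
        }
        return out;
    }

    // Quadratic probe: x+1, x+4, x+9, ... (offset is the square of the probe number).
    static int[] quadratic(int x, int count, int arraySize) {
        int[] out = new int[count];
        for (int i = 1; i <= count; i++) {
            out[i - 1] = (x + i * i) % arraySize;
        }
        return out;
    }
}
```

For x = 7325 in a table of size 100,000, the first three linear probes are 7326, 7327, 7328 and the first three quadratic probes are 7326, 7329, 7334.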

Rehashing (also called double hashing) - when a collision occurs, determine the step used to search for an available cell by hashing the key value again with a second function.

Ex. step = 5 - key % 5

What steps result? 5, 4, 3, 2, 1

How is this different from the linear & quadratic probe methods?

The step is different for different keys. Note: table size must be prime in order to probe all cells.

Ex. size=20, step=5, x=0: 0,5,10,15,0,5,10,15,…

Try size=19, step=5, x=0: 0,5,10,15,1,6,11,16,2,7,12,17,3,8,13,18,4,9,14
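The effect of a prime table size can be checked with a small sketch that records which cells a fixed step visits before the sequence starts repeating. The names `DoubleHashProbe`, `step` and `visited` are made up for this example, and keys are assumed to be non-negative.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of the double-hashing probe sequence, not part of the original notes.
class DoubleHashProbe {
    // Second hash function from the example: yields a step of 5, 4, 3, 2 or 1.
    static int step(int key) {
        return 5 - key % 5;
    }

    // Cells visited, in order, starting at 'start' and advancing by 'step'
    // until a cell repeats.
    static Set<Integer> visited(int start, int step, int arraySize) {
        Set<Integer> seen = new LinkedHashSet<>();
        int hashVal = start;
        while (seen.add(hashVal)) { // add() returns false once a cell repeats
            hashVal += step;
            hashVal %= arraySize;   // wrap around
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(visited(0, 5, 20));        // [0, 5, 10, 15]
        System.out.println(visited(0, 5, 19).size()); // 19, every cell is probed
    }
}
```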

Write code to increase a hash value by step.

hashVal += step;

What do we do if a hash value becomes greater than or equal to the size of the array?

Wrap around: hashVal %= arraysize;

What do we do about duplicate key values?

Should not be allowed. When first item with key is found, search stops. Second item with same key would never be found. Select a key value that is unique to the item. (ex. Social security no.)

How do we handle deletions?

Replace one field by -1 rather than replace the entire object by null. Often object info may be needed in the future. Ex. Even when an employee leaves, pension & tax info is needed. However, there is another reason in this code. Something undesirable occurs if the object is replaced by null. Demonstrate what and explain why.

What method requires this condition and why?

while (hashRay[hashVal] != null && hashRay[hashVal].iData != -1)
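One way to explore these questions is a small open-addressing table with a linear probe. This is a sketch, not the code the notes refer to: it reuses the names `hashRay`, `hashVal` and `iData` from the condition above, but the classes `DataItem` and `OpenTable` and their methods are assumptions. It places the condition in the insert loop, which is one plausible reading, and it assumes non-negative keys and a table that always has at least one free cell.

```java
// Hypothetical sketch, not the original course code.
class DataItem {
    int iData; // key field; -1 marks a deleted item
    DataItem(int key) { iData = key; }
}

class OpenTable {
    private final DataItem[] hashRay;
    private final int arraySize;

    OpenTable(int size) {
        arraySize = size;
        hashRay = new DataItem[size];
    }

    private int hashFunc(int key) {
        return key % arraySize;
    }

    // A cell can be used if it is empty (null) or holds a deleted item (-1).
    void insert(DataItem item) {
        int hashVal = hashFunc(item.iData);
        while (hashRay[hashVal] != null && hashRay[hashVal].iData != -1) {
            hashVal++;            // linear probe
            hashVal %= arraySize; // wrap around
        }
        hashRay[hashVal] = item;
    }

    // The search stops at the first null cell, so a null left behind by a
    // deletion would hide every item stored after it in the probe sequence.
    DataItem find(int key) {
        int hashVal = hashFunc(key);
        for (int probes = 0; probes < arraySize && hashRay[hashVal] != null; probes++) {
            if (hashRay[hashVal].iData == key) {
                return hashRay[hashVal];
            }
            hashVal++;
            hashVal %= arraySize;
        }
        return null;
    }

    // Deletion marks the key field with -1 instead of setting the cell to null.
    boolean delete(int key) {
        DataItem item = find(key);
        if (item == null) {
            return false;
        }
        item.iData = -1;
        return true;
    }
}
```

With a table of size 11, the keys 7, 18 and 29 all hash to index 7 and occupy cells 7, 8 and 9. After 18 is deleted, 29 can still be found because cell 8 is marked with -1 rather than null, and a later insert of 40 reuses cell 8.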

6. Def. Load factor - the ratio of the no. of items in a hash table to the size of the table (array).

The more full a table is the worse clustering becomes. Therefore, hash tables should be designed to never become more than 1/2 to 2/3 full when open addressing is used.
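As a small worked example, the load factor and the 2/3 guideline can be written as follows. The class and method names are hypothetical, and the 2/3 threshold is simply the upper end of the range given above.

```java
// Hypothetical helper, not part of the original notes.
class LoadFactorDemo {
    // Load factor = number of items / size of the array.
    static double loadFactor(int numItems, int arraySize) {
        return (double) numItems / arraySize;
    }

    // True when an open-addressing table is more than 2/3 full.
    static boolean tooFull(int numItems, int arraySize) {
        return loadFactor(numItems, arraySize) > 2.0 / 3.0;
    }
}
```

For the dictionary example, 50,000 words in an array of 100,000 cells give a load factor of 0.5, which is within the guideline.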

7. When separate chaining is used to resolve collisions, is load factor a concern?

No. n items or more can be placed in a table of size n, and the load factor will be 1 or more (i.e. some locations will hold more than one item in their linked lists).

How do we handle duplicates with separate chaining?

Duplicates are allowed and will be stored in the same list. Note: search process slows as list is searched linearly.

How do we handle deletions?

Deletions can be made from a linked list, if appropriate for the application, without empty cell problems resulting.

8. What is the advantage of a hash table?

O(1) average-case complexity to search for or insert an item (i.e. constant time regardless of the number of items, as long as collisions stay rare).

9. Disadvantage?

Must know the size of array needed in advance (in Java, arrays cannot be resized - another, bigger array would be needed). This problem is reduced when separate chaining is used.

Also, there is no way to access items in order.