Asst. Prof. Dr. İlker Kocabaş
Hash Tables
Overview
• Information Retrieval
• Binary Search Trees
• Hashing
• Applications
• Example
• Hash Functions
• Hash Tables
• Collisions
• Linear Probing
• Problems with Linear Probing
• Chaining
Example: Bibliography
R. Kruse, C. Tondo, B. Leung, “Data Structures and Program Design in C”, 1991, Prentice Hall.
E. Horowitz, S. Sahni, S. Anderson-Freed, “Fundamentals of Data Structures in C”, 1993, Computer Science Press.
R. Sedgewick, “Algorithms in C”, 1990, Addison-Wesley.
A. Aho, J. Hopcroft, J. Ullman, “Data Structures and Algorithms”, 1983, Addison-Wesley.
T.A. Standish, “Data Structures, Algorithms & Software Principles in C”, 1995, Addison-Wesley.
D. Knuth, “The Art of Computer Programming”, 1975, Addison-Wesley.
Y. Langsam, M. Augenstein, A. M. Tenenbaum, “Data Structures using C and C++”, 1996, Prentice Hall.
Insert the information into a Binary Search Tree, using the first author’s surname as the key.
Insertion order: Kruse, Horowitz, Sedgewick, Aho, Knuth, Langsam, Standish
Resulting tree:
               Kruse
             /       \
      Horowitz       Sedgewick
       /    \         /     \
    Aho    Knuth  Langsam  Standish
Complexity
Operation    Balanced Trees    Unbalanced Trees
Inserting    O(log(n))         O(n)
Searching    O(log(n))         O(n)
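The difference between the balanced and unbalanced cases can be seen by inserting the same seven surnames in two different orders. The following is a minimal sketch, not code from the slides: `BstNode`, `bst_insert`, `bst_height`, `bst_free` and `height_after_inserts` are hypothetical names, and keys are compared with `strcmp`.

```c
#include <stdlib.h>
#include <string.h>

/* One tree node; the key is a surname (hypothetical type for this sketch). */
typedef struct BstNode {
    const char *key;
    struct BstNode *left, *right;
} BstNode;

/* Insert key below root and return the subtree root. Duplicates are ignored. */
BstNode *bst_insert(BstNode *root, const char *key) {
    if (root == NULL) {
        BstNode *n = malloc(sizeof *n);
        if (n == NULL) return NULL;   /* allocation failed: key is not stored */
        n->key = key;
        n->left = n->right = NULL;
        return n;
    }
    int cmp = strcmp(key, root->key);
    if (cmp < 0) root->left = bst_insert(root->left, key);
    else if (cmp > 0) root->right = bst_insert(root->right, key);
    return root;
}

/* Number of levels in the tree (0 for an empty tree). */
int bst_height(const BstNode *root) {
    if (root == NULL) return 0;
    int l = bst_height(root->left);
    int r = bst_height(root->right);
    return 1 + (l > r ? l : r);
}

/* Release every node of the tree. */
void bst_free(BstNode *root) {
    if (root == NULL) return;
    bst_free(root->left);
    bst_free(root->right);
    free(root);
}

/* Build a tree from n keys in the given order and report its height. */
int height_after_inserts(const char *const keys[], int n) {
    BstNode *root = NULL;
    for (int i = 0; i < n; i++) root = bst_insert(root, keys[i]);
    int height = bst_height(root);
    bst_free(root);
    return height;
}
```

With the insertion order used above the tree has 3 levels; inserting the same names in alphabetical order produces a chain of 7 levels, which is the O(n) case.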
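Hashing
[Figure: a key is passed to the hash function, which returns a position pos in the hash table, an array with positions 0 to TABLESIZE - 1.]
Example:
[Figure: the hash function maps the key “Kruse” to position 5 in a hash table with positions 0 to 6.]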
Hashing
• Each item has a unique key.
• Use a large array called a Hash Table.
• Use a Hash Function.
Applications
• Databases.
• Spell checkers.
• Computer chess games.
• Compilers.
Operations
• Initialize: all locations in the Hash Table are empty.
• Insert
• Search
• Delete
Hash Function
• Maps keys to positions in the Hash Table.
A good hash function should:
• Be easy to calculate.
• Use all of the key.
• Spread the keys uniformly.
Example: Hash Function #1
unsigned hash(const char* s)
{
    int i = 0;
    unsigned value = 0;
    /* Fold in one character at a time until the terminating '\0'. */
    while (s[i] != '\0') {
        value = (s[i] + 31*value) % 101;
        i++;
    }
    return value;
}
Example: Hash Function #1
value = (s[i] + 31*value) % 101;
Key: “Aho”, from A. Aho, J. Hopcroft, J. Ullman, “Data Structures and Algorithms”, 1983, Addison-Wesley.
'A' = 65
'h' = 104
'o' = 111
value = (65 + 31 * 0) % 101 = 65
value = (104 + 31 * 65) % 101 = 99
value = (111 + 31 * 99) % 101 = 49
Example: Hash Function #1
value = (s[i] + 31*value) % 101;
Key          Hash Value
Aho          49
Kruse        95
Standish     60
Horowitz     28
Langsam      21
Sedgewick    24
Knuth        44
The resulting table is “sparse”.
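Example: Hash Function #2
value = (s[i] + 1024*value) % 128;
Key          Hash Value
Aho          111
Kruse        101
Standish     104
Horowitz     122
Langsam      109
Sedgewick    107
Knuth        104
Likely to result in “clustering”: 1024 is a multiple of 128, so the term 1024*value vanishes modulo 128 and the hash value is simply the character code of the last letter of the key ('o' = 111, 'e' = 101, 'h' = 104, ...). Standish and Knuth both end in 'h' and therefore both hash to 104.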
Example: Hash Function #3
value = (s[i] + 3*value) % 7;
Key          Hash Value
Aho          0
Kruse        5
Standish     1
Horowitz     5
Langsam      5
Sedgewick    2
Knuth        1
Several keys share the same position: these are “collisions”.
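The three example functions differ only in the multiplier and in the table size, so they can be compared side by side. The following is a sketch under that reading: `hash_poly` and the wrappers `hash1`, `hash2` and `hash3` are hypothetical names, and characters are assumed to be ASCII.

```c
/* The loop from Hash Function #1, with the multiplier and the table size
   passed in as parameters (hypothetical helper, not from the slides). */
unsigned hash_poly(const char *s, unsigned mult, unsigned mod) {
    unsigned value = 0;
    for (int i = 0; s[i] != '\0'; i++) {
        value = ((unsigned char)s[i] + mult * value) % mod;
    }
    return value;
}

/* Hash Function #1: value = (s[i] + 31*value) % 101 */
unsigned hash1(const char *s) { return hash_poly(s, 31, 101); }

/* Hash Function #2: value = (s[i] + 1024*value) % 128 */
unsigned hash2(const char *s) { return hash_poly(s, 1024, 128); }

/* Hash Function #3: value = (s[i] + 3*value) % 7 */
unsigned hash3(const char *s) { return hash_poly(s, 3, 7); }
```

For example, `hash1("Aho")` is 49 and `hash1("Kruse")` is 95, `hash2("Standish")` and `hash2("Knuth")` are both 104, and `hash3("Kruse")` and `hash3("Horowitz")` are both 5. Running the remaining names through `hash3` is a quick cross-check of the table: with ASCII codes “Langsam” evaluates to 2 rather than the listed 5, so the insertion examples are best read against the positions as listed.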
Insert
• Apply hash function to get a position.
• Try to insert key at this position.
• Deal with collision.
Example: Insert
Keys: Aho, Kruse, Standish, Horowitz, Langsam, Sedgewick, Knuth (hash table with positions 0 to 6)
• Aho: the hash function gives 0, so Aho is stored at position 0.
• Kruse: the hash function gives 5, so Kruse is stored at position 5.
• Standish: the hash function gives 1, so Standish is stored at position 1.
Search
• Apply hash function to get a position.
• Look in that position.
• Deal with collision.
Example: Search
Keys: Aho, Kruse, Standish, Horowitz, Langsam, Sedgewick, Knuth. The table holds Aho (position 0), Standish (position 1) and Kruse (position 5).
• Search for Kruse: the hash function gives 5 and position 5 holds Kruse. Found.
• Search for Sedgewick: the hash function gives 2 and position 2 is empty. Not found.
Hash Tables: Collision Resolution
Example: Hash Function #3
value = (s[i] + 3*value) % 7;
Key          Hash Value
Aho          0
Kruse        5
Standish     1
Horowitz     5
Langsam      5
Sedgewick    2
Knuth        1
Several keys share the same position: these are “collisions”.
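Collision
• When two keys are mapped to the same position.
• Very likely. Compare birthdays: the probability that at least two people in a group share a birthday grows quickly with the size of the group.
Birthdays
Number of People    Probability
10                  0.1169
20                  0.4114
30                  0.7063
40                  0.8912
50                  0.9704
60                  0.9941
70                  0.9992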
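The birthday figures can be reproduced by multiplying out the chance that every person (or key) lands on a different day (or position). The following is a small sketch assuming that all positions are equally likely; `collision_probability` is a hypothetical name.

```c
/* Probability that at least two of n keys land in the same one of `slots`
   equally likely positions. With slots = 365 this is the birthday problem. */
double collision_probability(int n, int slots) {
    double all_distinct = 1.0;
    for (int i = 0; i < n; i++) {
        /* The (i+1)-th key must avoid the i positions already taken. */
        all_distinct *= (double)(slots - i) / slots;
    }
    return 1.0 - all_distinct;
}
```

`collision_probability(10, 365)` is about 0.1169 and `collision_probability(70, 365)` about 0.9992, matching the first and last rows of the table. The same function applies to a hash table by passing the table size as `slots`.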
Collision Resolution
Two methods are commonly used:
• Linear Probing.
• Chaining.
Linear Probing
• Linear search in the array from the position where the collision occurred.
Insert with Linear Probing
• Apply hash function to get a position.
• Try to insert key at this position.
• Deal with collision.
• Must also deal with a full table!
Example: Insert with Linear Probing
Keys: Aho, Kruse, Standish, Horowitz, Langsam, Sedgewick, Knuth (hash table with positions 0 to 6)
• Aho hashes to 0 and is stored at position 0.
• Kruse hashes to 5 and is stored at position 5.
• Standish hashes to 1 and is stored at position 1.
• Horowitz hashes to 5, which already holds Kruse. The next position, 6, is empty, so Horowitz is stored at position 6.
• Langsam hashes to 5. Positions 5 and 6 are occupied, the probe wraps around to positions 0 and 1, which are also occupied, and Langsam is stored at position 2.
Table after these five insertions:
Position    Key
0           Aho
1           Standish
2           Langsam
3           (empty)
4           (empty)
5           Kruse
6           Horowitz
module linearProbe(item)
{
    position = hash(key of item)
    count = 0
    loop {
        if (count == hashTableSize) then {
            output “Table is full”
            exit loop
        }
        if (hashTable[position] is empty) then {
            hashTable[position] = item
            exit loop
        }
        position = (position + 1) % hashTableSize
        count++
    }
}
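The linearProbe module translates to C fairly directly. The version below is a sketch with several assumptions: an empty slot is represented by `NULL`, the table has 7 positions, and `slide_hash` simply looks up the positions listed in the Hash Function #3 table instead of recomputing them. `slide_hash`, `SLIDE_HASH` and `linear_probe_insert` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 7

/* Hash positions as listed in the Hash Function #3 table (looked up, not computed). */
static const struct { const char *key; int pos; } SLIDE_HASH[] = {
    {"Aho", 0}, {"Kruse", 5}, {"Standish", 1}, {"Horowitz", 5},
    {"Langsam", 5}, {"Sedgewick", 2}, {"Knuth", 1}
};

/* Return the listed position for key; unknown keys fall back to 0 (arbitrary choice). */
int slide_hash(const char *key) {
    for (size_t i = 0; i < sizeof SLIDE_HASH / sizeof SLIDE_HASH[0]; i++) {
        if (strcmp(SLIDE_HASH[i].key, key) == 0) return SLIDE_HASH[i].pos;
    }
    return 0;
}

/* Insert key by linear probing. A NULL slot means "empty".
   Returns the position used, or -1 if the table is full. */
int linear_probe_insert(const char *table[TABLE_SIZE], const char *key) {
    int position = slide_hash(key);
    for (int count = 0; count < TABLE_SIZE; count++) {
        if (table[position] == NULL) {
            table[position] = key;
            return position;
        }
        position = (position + 1) % TABLE_SIZE;  /* wrap around at the end */
    }
    return -1;  /* every slot was examined and none was empty */
}
```

Inserting the seven names in the order Aho, Kruse, Standish, Horowitz, Langsam, Sedgewick, Knuth places them at positions 0, 5, 1, 6, 2, 3 and 4; any further insertion returns -1 because the table is full.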
Search with Linear Probing
• Apply hash function to get a position.
• Look in that position.
• Deal with collision.
Example: Search with Linear Probing
Keys: Aho, Kruse, Standish, Horowitz, Langsam, Sedgewick, Knuth. The table holds Aho (position 0), Standish (position 1), Langsam (position 2), Kruse (position 5) and Horowitz (position 6).
• Search for Langsam: the hash function gives 5. Positions 5, 6, 0 and 1 hold other keys; position 2 holds Langsam. Found.
• Search for Knuth: the hash function gives 1. Positions 1 and 2 hold other keys; position 3 is empty. Not found.
module search(target)
{
    count = 0
    position = hash(key of target)
    loop {
        if (count == hashTableSize) then {
            output “Target is not in Hash Table”
            return -1
        }
        else if (hashTable[position] is empty) then {
            output “Item is not in Hash Table”
            return -1
        }
        else if (hashTable[position].key == target) then {
            return position
        }
        position = (position + 1) % hashTableSize
        count++
    }
}
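A C sketch of the search module follows. To keep it independent of any particular hash function, the caller passes in the starting position (the hash of the key); that parameter, the `NULL`-means-empty convention and the name `linear_probe_search` are assumptions of this sketch.

```c
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 7

/* Search for key by linear probing, beginning at `start` (the key's hash
   position, supplied by the caller). Returns the position, or -1 if absent. */
int linear_probe_search(const char *const table[TABLE_SIZE],
                        const char *key, int start) {
    int position = start;
    for (int count = 0; count < TABLE_SIZE; count++) {
        if (table[position] == NULL) {
            return -1;  /* an empty slot ends the run: the key was never stored */
        }
        if (strcmp(table[position], key) == 0) {
            return position;
        }
        position = (position + 1) % TABLE_SIZE;
    }
    return -1;  /* all slots examined: the table is full and the key is absent */
}
```

With the table {Aho, Standish, Langsam, empty, empty, Kruse, Horowitz}, searching for Langsam from position 5 returns 2, and searching for Knuth from position 1 returns -1 after reaching the empty position 3.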
Delete with Linear Probing
• Use the search function to find the item.
• If found, check whether any items stored after it were placed there by probing past the item’s position.
• If so, move those items back in the hash table and delete the item; otherwise a later search would stop at the new gap and miss them.
Very difficult and time/resource consuming!
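One way to carry out the “move them back” step is sketched below. It is an illustration under stated assumptions rather than the method of the slides: each slot stores the key together with its home (hash) position, an empty slot has a `NULL` key, and `Slot`, `between` and `linear_probe_delete` are hypothetical names.

```c
#include <stddef.h>

#define TABLE_SIZE 7

/* A slot holds a key and the position its hash gave (its "home").
   key == NULL marks an empty slot. Storing the home position is an
   assumption made to keep this sketch free of a hash function. */
typedef struct {
    const char *key;
    int home;
} Slot;

/* True if k lies cyclically in the range (hole, j]. In that case the item
   at j is still reachable from its home k without crossing the hole. */
static int between(int hole, int k, int j) {
    return (hole <= j) ? (hole < k && k <= j) : (hole < k || k <= j);
}

/* Empty table[pos], then walk the run of occupied slots that follows it and
   move back every item whose probe sequence would be cut by the gap. */
void linear_probe_delete(Slot table[TABLE_SIZE], int pos) {
    int hole = pos;
    table[hole].key = NULL;
    for (int j = (hole + 1) % TABLE_SIZE; table[j].key != NULL;
         j = (j + 1) % TABLE_SIZE) {
        if (between(hole, table[j].home, j)) {
            continue;              /* still reachable: leave it in place */
        }
        table[hole] = table[j];    /* move the item back into the gap */
        table[j].key = NULL;       /* its old slot becomes the new gap */
        hole = j;
    }
}
```

Starting from the table {Aho, Standish, Langsam, empty, empty, Kruse, Horowitz} and deleting Kruse at position 5, Horowitz moves back to position 5 and Langsam moves from position 2 to position 6, so both remain reachable from their hash position 5.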
Linear Probing: Problems
• Speed.
• Tendency for clustering to occur as the table becomes half full.
• Deletion of records is very difficult.
• If implemented in arrays, the table may become full fairly quickly; resizing is time and resource consuming.
Chaining
• Uses a Linked List at each position in the Hash Table.
• The linked list at a position contains all the items that ‘hash’ to that position.
• May keep linked lists sorted or not.
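[Figure: a hash table with positions 0, 1, 2, 3, ..., each holding a linked list.]
Example: Chaining
Keys: Aho, Kruse, Standish, Horowitz, Langsam, Sedgewick, Knuth
Hash values: 0, 5, 1, 5, 5, 2, 1
Position    Size    Linked list
0           1       Aho
1           2       Standish, Knuth
2           1       Sedgewick
3           0       (empty)
4           0       (empty)
5           3       Kruse, Horowitz, Langsam
6           0       (empty)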
Hashtable with Chaining
At each position in the array you have a list:
List hashTable[MAXTABLE];
You must initialise each list in the table.
[Figure: array positions 0, 1, 2, ..., each with its list and the size of that list.]
Insert with Chaining
• Apply hash function to get a position in the array.
• Insert key into the Linked List at this position in the array.
module InsertChaining(item)
{
    posHash = hash(key of item)
    insert (hashTable[posHash], item);
}
[Figure: positions 0, 1 and 2 with their lists: Aho; Standish, Knuth; Sedgewick.]
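InsertChaining can be sketched in C with a singly linked list per position. The details below are assumptions: the lists are unsorted and new keys are appended at the end, the table has 7 positions, and `hash3` recomputes Hash Function #3 from its formula. `Node`, `init_table`, `insert_chaining` and `chain_length` are hypothetical names.

```c
#include <stdlib.h>

#define MAXTABLE 7

/* One list node holding a key. */
typedef struct Node {
    const char *key;
    struct Node *next;
} Node;

/* Hash Function #3: value = (s[i] + 3*value) % 7 */
unsigned hash3(const char *s) {
    unsigned value = 0;
    for (int i = 0; s[i] != '\0'; i++) {
        value = ((unsigned char)s[i] + 3 * value) % MAXTABLE;
    }
    return value;
}

/* Every list starts out empty. */
void init_table(Node *table[MAXTABLE]) {
    for (int i = 0; i < MAXTABLE; i++) table[i] = NULL;
}

/* Append key to the list at its hash position.
   Returns 1 on success, 0 if memory could not be allocated. */
int insert_chaining(Node *table[MAXTABLE], const char *key) {
    Node *n = malloc(sizeof *n);
    if (n == NULL) return 0;
    n->key = key;
    n->next = NULL;
    Node **link = &table[hash3(key)];
    while (*link != NULL) link = &(*link)->next;  /* walk to the end of the list */
    *link = n;
    return 1;
}

/* Number of items stored in the list at position pos. */
int chain_length(Node *const table[MAXTABLE], unsigned pos) {
    int count = 0;
    for (const Node *n = table[pos]; n != NULL; n = n->next) count++;
    return count;
}
```

A collision needs no special handling here: Standish and Knuth both hash to 1 and simply end up in the same list, one after the other. The sketch never frees its nodes; a full implementation would also release them.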
Search with Chaining
• Apply hash function to get a position in the array.
• Search the Linked List at this position in the array.
/* module returns NULL if not found, or the address of the
 * node if found */
module SearchChaining(item)
{
    posHash = hash(key of item)
    Node* found;
    found = searchList (hashTable[posHash], item);
    return found;
}
Delete with Chaining
• Apply hash function to get a position in the array.
• Delete the node in the Linked List at this position in the array.
/* module uses the Linked list delete function to delete an item
 * inside that list, it does nothing if that item isn’t there. */
module DeleteChaining(item)
{
    posHash = hash(key of item)
    deleteList (hashTable[posHash], item);
}
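SearchChaining and DeleteChaining both delegate to list helpers (searchList and deleteList in the pseudocode). A possible shape for those helpers is sketched below; the signatures are assumptions, as is the choice to return the unlinked node so that the caller decides whether to free it.

```c
#include <stddef.h>
#include <string.h>

/* One list node holding a key. */
typedef struct Node {
    const char *key;
    struct Node *next;
} Node;

/* Return the node holding key in the list that starts at head, or NULL. */
Node *search_list(Node *head, const char *key) {
    for (Node *n = head; n != NULL; n = n->next) {
        if (strcmp(n->key, key) == 0) return n;
    }
    return NULL;
}

/* Unlink the first node holding key from the list at *head.
   Returns the unlinked node, or NULL (list unchanged) if key is absent. */
Node *delete_list(Node **head, const char *key) {
    for (Node **link = head; *link != NULL; link = &(*link)->next) {
        if (strcmp((*link)->key, key) == 0) {
            Node *removed = *link;
            *link = removed->next;   /* bypass the node */
            removed->next = NULL;
            return removed;
        }
    }
    return NULL;
}
```

Only the one list selected by the hash function is touched, so no other item has to be moved. This is why deletion with chaining is much simpler than with linear probing.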
Disadvantages of Chaining
• Uses more space.
• More complex to implement: contains a linked list at every element in the array.
• Requires linear searching.
• May be time consuming.
Advantages of Chaining
• Insertions and Deletions are easy and quick.
• Allows more records to be stored.
• Naturally resizable, allows a varying number of records to be stored.
Dictionaries and Hash Tables
Double Hashing
• Double hashing uses a secondary hash function $d(k)$ and handles collisions by placing an item in the first available cell of the series $(i + j\,d(k)) \bmod N$ for $j = 0, 1, \ldots, N - 1$, where $i$ is the position given by the primary hash function.
• The secondary hash function $d(k)$ cannot have zero values.
• The table size $N$ must be a prime to allow probing of all the cells.
• Common choice of compression map for the secondary hash function: $d_2(k) = q - (k \bmod q)$, where $q < N$ and $q$ is a prime.
• The possible values for $d_2(k)$ are $1, 2, \ldots, q$.
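The probe series can be tried out on a small table. Everything concrete in this sketch is an assumption made for illustration: integer keys, table size N = 13, q = 7, primary hash k mod N, and -1 as the marker for an empty cell. `h`, `d` and `double_hash_insert` are hypothetical names.

```c
#define N 13        /* table size, a prime (assumed for this example) */
#define Q 7         /* a prime smaller than N for the secondary hash (assumed) */
#define EMPTY (-1)  /* marker for a free cell; keys must be non-negative */

/* Primary hash (assumed): k mod N. */
int h(int k) { return k % N; }

/* Secondary hash: q - (k mod q), which takes values 1..Q and is never 0. */
int d(int k) { return Q - (k % Q); }

/* Insert k into the first free cell of the series (i + j*d(k)) mod N,
   j = 0, 1, ..., N-1. Returns the cell used, or -1 if none was free. */
int double_hash_insert(int table[N], int k) {
    int i = h(k);
    for (int j = 0; j < N; j++) {
        int cell = (i + j * d(k)) % N;
        if (table[cell] == EMPTY) {
            table[cell] = k;
            return cell;
        }
    }
    return -1;
}
```

Inserting 18, 41, 22, 44, 59, 32, 31 and 73 in that order into an empty table uses cells 5, 2, 9, 10, 7, 6, 0 and 8. Key 44 collides with 18 at cell 5 and steps by d(44) = 5 to cell 10; key 31 also starts at cell 5 and steps by d(31) = 4, passing the occupied cell 9 before settling in cell 0.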
Performance of Hashing
• In the worst case, searches, insertions and removals on a hash table take O(n) time.
• The worst case occurs when all the keys inserted into the dictionary collide.
• The load factor $\alpha = n/N$ affects the performance of a hash table.
• Assuming that the hash values are like random numbers, it can be shown that the expected number of probes for an insertion with open addressing is $1 / (1 - \alpha)$.
• The expected running time of all the dictionary ADT operations in a hash table is O(1).
• In practice, hashing is very fast provided the load factor is not close to 100%.
• Applications of hash tables: small databases, compilers, browser caches.
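To see why the load factor matters, it helps to evaluate the expected number of probes for an insertion with open addressing, $1/(1-\alpha)$ with $\alpha = n/N$, at a few load factors. The values below follow directly from that expression, under the same assumption that hash values behave like random numbers:

```latex
\[
\frac{1}{1-\alpha}:\qquad
\alpha = 0.5 \;\Rightarrow\; 2,\qquad
\alpha = 0.75 \;\Rightarrow\; 4,\qquad
\alpha = 0.9 \;\Rightarrow\; 10,\qquad
\alpha = 0.99 \;\Rightarrow\; 100
\]
```

A half-full table needs about two probes per insertion on average, while a table that is 99% full needs about a hundred.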