Post on 26-Jan-2016
COSC 1030 Lecture 10
Hash Table
Topics
Table
Hash Concept
Hash Function
Resolve collision
Complexity Analysis
Table
Table
– A collection of entries
– Entry: <key, info>
– Insert, search and delete
– Update, and retrieve
Array representation
– Indexed
– Maps key to index
Hash Table
Hash Table
– A table
– Key range >> table size
– Many-to-one mapping (hashing)
– Indexed – hash code as index
Tabbed Address Book
– Map names to A:Z
– Multiple names start with same letter
  Same tab, sequential slots
Hash Table ADT
interface Hashtable {
    void insert(Item anItem);
    Item search(Key aKey);
    boolean remove(Key aKey);
    boolean isFull();
    boolean isEmpty();
}
Hash Function
Maps key to index evenly
For any n in N, hash(n) = n mod M, where M is the size of the hash table.
hash(k*M + n) = n, where n < M, k: integer
Map to integer first if key is not an integer
– A:Z → 0:25
– String s → h(s[0]) + h(s[1])*26 + … + h(s[n-1])*26^(n-1)
– String s → h(s[0])*26^(n-1) + … + h(s[n-1])
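Hash Function
String s → h(s[0])*26^(n-1) + … + h(s[n-1])
// Maps a letter to 0..25 (A:Z → 0:25); assumes c is a letter.
int toInt(char c) {
    return Character.toUpperCase(c) - 'A';
}
// Horner's rule for the sum above; the int result can overflow for long strings.
int toInt(String s) {
    assert (s != null);
    int c = 0;
    for (int i = 0; i < s.length(); i++) {
        c = c * 26 + toInt(s.charAt(i));
    }
    return c;
}
// hash(int n) is n mod M, as defined for integer keys.
int hash(String s) { return hash(toInt(s)); }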
Example
Table[7] – HASHTABLE_SIZE = 7
Insert ‘B2’, ‘H7’, ‘M12’, ‘D4’, ‘Z26’ into the table
– Hash codes: 2, 0, 5, 4, 5 (‘Z26’ collides with ‘M12’ at slot 5)
Collision
– The slot indexed by hash code is already occupied
A simple solution
– Sequentially decrease the index until an empty slot is found or the table is full
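The example can be traced with a short sketch. It assumes the keys are the numeric parts of the labels (2, 7, 12, 4, 26) and stores them in a plain `Integer[]`; the class and method names are illustrative only.

```java
import java.util.Arrays;

class LinearProbeExample {
    static final int HASHTABLE_SIZE = 7;

    // Inserts each key with the "decrease index until an empty slot" rule
    // and returns the final table (null marks an empty slot).
    static Integer[] insertAll(int[] keys) {
        Integer[] table = new Integer[HASHTABLE_SIZE];
        int count = 0;
        for (int key : keys) {
            if (count == HASHTABLE_SIZE) break;   // table is full
            int index = key % HASHTABLE_SIZE;     // hash code
            while (table[index] != null) {        // collision: step down by 1
                index--;
                if (index < 0) index += HASHTABLE_SIZE;
            }
            table[index] = key;
            count++;
        }
        return table;
    }

    public static void main(String[] args) {
        // B2, H7, M12, D4, Z26 reduced to their numeric parts
        int[] keys = {2, 7, 12, 4, 26};
        System.out.println(Arrays.toString(insertAll(keys)));
        // prints [7, null, 2, 26, 4, 12, null]
    }
}
```

Key 26 hashes to slot 5, finds slots 5 and 4 occupied, and lands in slot 3.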
Collision Possibility
How often may a collision occur?
Insert 100 random numbers into a table of 200 slots
– P(at least one collision) = 1 – Π((200 – i)/200), i = 0:99
  = 1 – 6.66E-14 > 0.99999999999993
Load factor
– 100/200 = 0.5 = 50% → 0.99999999999993
– 20/200 = 0.1 = 10% → 0.63
– 10/200 = 0.05 = 5% → 0.2
Default load factor is 75% in Java Hashtable
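The figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch assuming keys hash uniformly and independently; the class and method names are made up for illustration.

```java
class CollisionProbability {
    // Probability that at least one collision occurs when `inserts` uniformly
    // random keys are hashed into `slots` slots:
    // 1 - product over i = 0 .. inserts-1 of (slots - i) / slots.
    static double atLeastOneCollision(int inserts, int slots) {
        double noCollision = 1.0;
        for (int i = 0; i < inserts; i++) {
            noCollision *= (double) (slots - i) / slots;
        }
        return 1.0 - noCollision;
    }

    public static void main(String[] args) {
        int slots = 200;
        for (int n : new int[] {10, 20, 100}) {
            System.out.printf("load factor %.2f: %.14f%n",
                    (double) n / slots, atLeastOneCollision(n, slots));
        }
    }
}
```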
Primary Cluster
The biggest solid block in the hash table
Join clusters
The bigger the primary cluster is, the easier it is to grow
Distribute evenly to avoid primary cluster
Probe Method
What can we do when a collision occurs?
– A consistent way of searching for an empty slot
– Probe
Linear probe – decrease index by 1, wrap around when it drops below 0
Double hash – use quotient to calculate decrement
– Max(1, (Key / M) % M)
Separate chaining – linked list to store collision items
Hash tree – link to another hash table (A4)
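Of the four options, separate chaining is the only one not shown in code later in these slides. It might be sketched as follows, assuming non-negative integer keys and a fixed table size of 7; `ChainedTable` and its methods are hypothetical names, not part of the Hashtable ADT above.

```java
import java.util.LinkedList;

class ChainedTable {
    private static final int M = 7;   // table size (assumption)

    // Each slot holds a linked list of the keys that hash to it.
    @SuppressWarnings("unchecked")
    private final LinkedList<Integer>[] slots = new LinkedList[M];

    void insert(int key) {
        int h = key % M;
        if (slots[h] == null) slots[h] = new LinkedList<>();
        slots[h].add(key);            // colliding keys share the same chain
    }

    boolean contains(int key) {
        int h = key % M;
        return slots[h] != null && slots[h].contains(key);
    }

    boolean remove(int key) {
        int h = key % M;
        // Integer.valueOf selects remove(Object), not remove(int index).
        return slots[h] != null && slots[h].remove(Integer.valueOf(key));
    }

    int chainLength(int h) {
        return slots[h] == null ? 0 : slots[h].size();
    }
}
```

With chaining, a collision never consumes another key's slot, so no probe sequence is needed.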
Probe sequence coverage
Ensure the probe sequence covers the whole table
– Utilizes the whole table
– Even distribution
– M and probe decrement are relatively prime
  No common factor except 1
– Make M a prime number
  M and any decrement (< M) are relatively prime
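A small experiment shows why the decrement and M should share no common factor. The helper below is an illustrative sketch (names are assumptions) that counts the slots reached by stepping down by `dec` from `start`, for 1 <= dec < m.

```java
class ProbeCoverage {
    // Counts how many distinct slots a probe sequence visits when it starts
    // at `start` and repeatedly steps down by `dec` in a table of size m.
    static int slotsVisited(int m, int start, int dec) {
        boolean[] seen = new boolean[m];
        int index = start;
        int visited = 0;
        while (!seen[index]) {
            seen[index] = true;
            visited++;
            index -= dec;
            if (index < 0) index += m;   // wrap around
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(slotsVisited(8, 0, 2));  // prints 4: 8 and 2 share the factor 2
        System.out.println(slotsVisited(7, 0, 2));  // prints 7: 7 is prime
    }
}
```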
Probe Method
void insert(Item item) {
    if (!isFull()) {
        int index = probe(item.key);
        assert (index >= 0 && index < M);
        table[index] = item;
        count++;
    }
}
Linear Probe Method
int probe(int key) {
    int hashcode = key % HASHTABLE_SIZE;
    if (table[hashcode] == null) {
        return hashcode;
    } else {
        int index = hashcode;
        do {
            index--;
            if (index < 0) index += HASHTABLE_SIZE;   // wrap around
        } while (index != hashcode && table[index] != null);
        if (index == hashcode) return -1;   // no empty slot found
        else return index;
    }
}
Double Hash Probe Method
int probe(int key) {
    int hashcode = key % HASHTABLE_SIZE;
    if (table[hashcode] == null) {
        return hashcode;
    } else {
        int index = hashcode;
        int dec = (key / HASHTABLE_SIZE) % HASHTABLE_SIZE;
        dec = Math.max(1, dec);   // decrement must be at least 1
        do {
            index -= dec;
            if (index < 0) index += HASHTABLE_SIZE;   // wrap around
        } while (index != hashcode && table[index] != null);
        if (index == hashcode) return -1;   // no empty slot found
        else return index;
    }
}
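Search Method
Item search(int key) {
    int hashcode = key % HASHTABLE_SIZE;
    int dec = Math.max(1, (key / HASHTABLE_SIZE) % HASHTABLE_SIZE);
    int index = hashcode;
    while (table[index] != null) {
        if (table[index].key == key) return table[index];
        index -= dec;
        if (index < 0) index += HASHTABLE_SIZE;   // wrap around, as in probe
        if (index == hashcode) break;             // whole probe sequence visited
    }
    return null;   // key not found
}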
Delete Method
Difficulty with delete when open addressing
– Destroys the hash probe chain
Solution
– Set a deleted flag
– Search takes it as occupied
– Insert takes it as deleted
– Forms primary cluster
Separate chaining
– Move one up from chained structure
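The deleted-flag solution can be illustrated with a linear-probe table. This is a minimal sketch under stated assumptions: non-negative integer keys, a fixed size of 7, and a parallel `deleted` array. All names are hypothetical, and `insert` does not check for duplicate keys.

```java
class LazyDeleteTable {
    private static final int M = 7;                     // table size (assumption)
    private final Integer[] keys = new Integer[M];
    private final boolean[] deleted = new boolean[M];   // the "deleted flag"

    private static int next(int index) {                // step down by 1, wrap at 0
        return (index == 0) ? M - 1 : index - 1;
    }

    boolean insert(int key) {
        int index = key % M;
        for (int i = 0; i < M; i++) {
            if (keys[index] == null || deleted[index]) { // insert treats a deleted slot as free
                keys[index] = key;
                deleted[index] = false;
                return true;
            }
            index = next(index);
        }
        return false;                                    // no free slot
    }

    boolean contains(int key) {
        int index = key % M;
        for (int i = 0; i < M; i++) {
            if (keys[index] == null) return false;       // truly empty: end of probe chain
            if (!deleted[index] && keys[index] == key) return true;
            index = next(index);                         // search treats a deleted slot as occupied
        }
        return false;
    }

    boolean remove(int key) {
        int index = key % M;
        for (int i = 0; i < M; i++) {
            if (keys[index] == null) return false;
            if (!deleted[index] && keys[index] == key) {
                deleted[index] = true;                   // flag only, so the chain stays intact
                return true;
            }
            index = next(index);
        }
        return false;
    }
}
```

After inserting 12 and 26 (both hash to 5) and removing 12, a search for 26 still walks past slot 5 and finds it in slot 4.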
Efficiency
Successful search
– Best case – first hit, one comparison
– Average
  Half of average length of probe sequence
  Load factor dependent
  O(1) if load factor < 0.5
– Worst case – longest probe sequence
  Load factor dependent
Unsuccessful search
– Average – average length of probe sequence
– Worst case – longest probe sequence
Advanced Topics
Choosing Hash Functions
– Generate hash code randomly and uniformly
– Use all bits of the key
– Assume K = b0b1b2b3
– Division
  h(k) = k % M; p(k) = max(1, (k / M) % M)
– Folding
  h(k) = (b1 ^ b3) % M; p(k) = (b0 ^ b2) % M; // XOR
– Middle squaring
  h(k) = (b1b2) ^ 2
– Truncating
  h(k) = b3;
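Division and folding might look as follows in code. The sketch assumes K is a 32-bit key whose four bytes are b0 (most significant) to b3, which the slide does not state, and it adds `Math.max(1, ...)` to the folding decrement so that it is never 0. Both choices, like the names, are assumptions.

```java
class HashChoices {
    static final int M = 7;   // table size, assumed prime

    // Division: hash code from the remainder, probe decrement from the quotient.
    static int divisionHash(int k)  { return k % M; }
    static int divisionProbe(int k) { return Math.max(1, (k / M) % M); }

    // Byte i of a 32-bit key, with b0 as the most significant byte (assumption).
    static int b(int k, int i) { return (k >>> (8 * (3 - i))) & 0xFF; }

    // Folding: XOR two bytes of the key, then reduce modulo M.
    static int foldingHash(int k)  { return (b(k, 1) ^ b(k, 3)) % M; }

    // Math.max(1, ...) is added here so the decrement is never 0 (assumption).
    static int foldingProbe(int k) { return Math.max(1, (b(k, 0) ^ b(k, 2)) % M); }

    public static void main(String[] args) {
        int k = 0x12345678;   // b0 = 0x12, b1 = 0x34, b2 = 0x56, b3 = 0x78
        System.out.println(foldingHash(k) + " " + foldingProbe(k));   // prints 6 5
    }
}
```

The parentheses around the XOR matter in Java, because `%` binds more tightly than `^`.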
Advanced Topics
Hash Tree
– Separate chained collision resolution
– Recursively hashing the key
[Figure: hash tree diagram; every node is labelled "Hash Table"]
Hash Tree
void insert(int key, Item item) {
    int h = h(key);
    int k = g(key);   // one-to-one mapping Key → Key
    if (table[h] == null) {
        table[h] = item;
    } else {
        if (table[h].link == null) table[h].link = new HashTree();
        table[h].link.insert(k, item);
    }
}
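A self-contained version with a matching search could look like the sketch below. It assumes h(k) = k % M and g(k) = k / M for non-negative integer keys. The slide leaves both functions open, so these choices, the `Entry` class, and the `String` payload are all illustrative.

```java
class HashTree {
    private static final int M = 7;   // slots per level (assumption)

    private static class Entry {
        final int key;                // original key, kept for comparison in search
        final String info;
        HashTree link;                // sub-table for keys that collide in this slot
        Entry(int key, String info) { this.key = key; this.info = info; }
    }

    private final Entry[] table = new Entry[M];

    void insert(int key, String info) { insert(key, key, info); }

    String search(int key) { return search(key, key); }

    // k is the key as seen at this level: h(k) = k % M, g(k) = k / M (assumed).
    private void insert(int k, int key, String info) {
        int h = k % M;
        if (table[h] == null) {
            table[h] = new Entry(key, info);
        } else {
            if (table[h].link == null) table[h].link = new HashTree();
            table[h].link.insert(k / M, key, info);   // rehash one level down
        }
    }

    private String search(int k, int key) {
        int h = k % M;
        if (table[h] == null) return null;
        if (table[h].key == key) return table[h].info;
        return table[h].link == null ? null : table[h].link.search(k / M, key);
    }
}
```

Keys 5, 12 and 19 all hash to slot 5 at the top level; 12 and 19 then separate in the linked table because their quotients (1 and 2) differ.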