COMP 103 Hashing 2013-T2 Lecture 28 Thomas Kuehne School of Engineering and Computer Science,...
COMP 103
Hashing
2013-T2 Lecture 28
Thomas Kuehne
School of Engineering and Computer Science, Victoria University of Wellington
Marcus Frean, Lindsay Groves, Peter Andreae and Thomas Kuehne, VUW
RECAP-TODAY
RECAP
  Linked structures, including trees and heaps
  achieved O(log n) insert/find performance
TODAY
  Mind-blowingly fast sorting
  O(1) insert/find performance!
Linear Time Sorting Algorithm
Constant time per entry to sort
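private int[] hashSort(int[] numbers) {
    int[] present = new int[7];               // one counter per possible value 0..6
    for (int i = 0; i < numbers.length; i++)
        present[numbers[i]]++;                // count how often each value occurs
    return present;                           // the counts, in value order
}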
Limitations
  elements must be integers
  element value range must be limited
  frequency data structure may be sparsely populated
[Figure: the array numbers = 5, 3, 5, 2, 6, 1 and the array present (indices 0 to 6) holding the count of each value]
cf. BucketSort
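The counting step on this slide can be extended to write out the sorted values. The following is a minimal sketch, assuming every value lies in the range 0..maxValue; the class name CountingSortDemo and the parameter maxValue are illustrative and not from the lecture.

```java
import java.util.Arrays;

class CountingSortDemo {
    // Sorts values that are assumed (not checked) to lie in 0..maxValue.
    static int[] countingSort(int[] numbers, int maxValue) {
        int[] present = new int[maxValue + 1];
        for (int n : numbers)
            present[n]++;                          // constant work per entry

        int[] sorted = new int[numbers.length];
        int next = 0;
        for (int value = 0; value <= maxValue; value++)
            for (int c = 0; c < present[value]; c++)
                sorted[next++] = value;            // emit each value as often as it occurred
        return sorted;
    }

    public static void main(String[] args) {
        int[] numbers = {5, 3, 5, 2, 6, 1};
        System.out.println(Arrays.toString(countingSort(numbers, 6)));  // [1, 2, 3, 5, 5, 6]
    }
}
```

The total work is proportional to numbers.length plus the size of the value range, which is why a large or sparse range is a limitation.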
Hashing
Fixing the limitations
  convert the element into an integer
  use a hash function to assign an integer to an element
Potential
  Sets, Bags and Maps with constant-time insert/find!
Challenges
  how to compute the hash code?
  how to deal with collisions?
O(1) Sets with big values?
We need a way to compute an index for an object: add(“2001 – A Space Odyssey”)
“Hashing”: compute the “hash code” of an object
[Figure: boolean array with indices 0 to N; the hash function maps “2001 – A Space Odyssey” to index 581, which is marked ✔]
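The idea of turning an object into an array index might be sketched as follows. This assumes Java's built-in hashCode() as the hash function and an arbitrary table size; the class BooleanTableSet is hypothetical. The index computed here for a film title will generally not be 581, which is only the slide's illustration.

```java
class BooleanTableSet {
    private final boolean[] present;

    BooleanTableSet(int size) {
        present = new boolean[size];
    }

    // Map any object to an index in 0..size-1.
    // floorMod is used because hashCode() may be negative.
    int indexOf(Object item) {
        return Math.floorMod(item.hashCode(), present.length);
    }

    void add(Object item) {
        present[indexOf(item)] = true;
    }

    // May answer true for an item that was never added
    // if it shares an index with one that was.
    boolean contains(Object item) {
        return present[indexOf(item)];
    }
}
```

Both add and contains take constant time once the hash code is known, but the boolean table cannot tell two items with the same index apart.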
O(1) Sets with big values?
But there are too many possible film titles!
Suppose the hash function always produces a number between 0 and 1000
  ⇒ some film titles must end up with the same number!
  ⇒ “Collision”
[Figure: “Gravity” and “2001 – A Space Odyssey” both hash to the same index (581) of the boolean array]
Detecting collisions
  Store the item in the array, instead of a boolean
Questions
  1. How to choose a hash function that minimises collisions?
  2. How to manage collisions when they occur?
[Figure: array with indices 0 to N that stores the items themselves; “Gravity” and “2001 – A Space Odyssey” hash to the same cell]
Computing Hash Codes
Wish list: summary for a hashCode function
  Should produce an integer
  Should distribute the hash codes evenly through the range
    minimises collisions
  Should be fast to compute
  Should take account of all components of the object
  Must be consistent with equals()
    two items that are equal must have the same hash value
Can we avoid clashes altogether?
  That would be perfect! ⇒ perfect hash function
A Simple Hash Function for Strings
We could add up the codes of all the characters:
private int hash(String value) {
    int hashCode = 0;
    for (int i = 0; i < value.length(); i++)
        hashCode += value.charAt(i);
    return hashCode;
}
Why is this not very good?
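One way to see the weakness is to run the function on strings that contain the same characters in a different order: the sum ignores position, so such strings always collide. A small sketch (the class name SumHashDemo is illustrative):

```java
class SumHashDemo {
    // The character-sum hash from the slide.
    static int hash(String value) {
        int hashCode = 0;
        for (int i = 0; i < value.length(); i++)
            hashCode += value.charAt(i);
        return hashCode;
    }

    public static void main(String[] args) {
        System.out.println(hash("COMP102"));  // 450
        System.out.println(hash("COMP201"));  // 450, same characters in a different order
        System.out.println(hash("DEAF101"));  // 418
    }
}
```

The results also fall into a narrow range, because every seven-character course code sums to a few hundred at most.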
Example: Hashing course codes
418 ← DEAF101
419 ← DEAF102 DEAF201
⋮
429 ← BBSC201 MDIA101
430 ← ECHI410 MDIA102 MDIA201
431 ← ECHI303 JAPA111 JAPA201 MDIA202 MDIA220 MDIA301
432 ← ARCH101 ASIA101 BBSC231 BBSC303 BBSC321 CHEM201 ECHI403 ECHI412 JAPA112 JAPA211 JAPA301 MDIA203 MDIA302 MDIA320
⋮
450 ← ANTH412 ARCH389 ARTH111 BIOL228 BIOL327 BIOL372 CHEM489 COML304 COML403 COML421 COMP102 COMP201 CRIM313 CRIM421 DESN215 DESN233 ECON328 ECON409 ECON418 ECON508 EDUC449 EDUC458 EDUC548 EDUC557 ENGL228 ENGL408 ENGL426 ENGL435 ENGL444 ENGL453 FREN124 FREN331 FREN403 FREN412 GEOL362 GEOL407 GERM214 GERM403 GERM412 INFO213 INFO312 INFO402 ITAL206 ITAL215 LALS501 LATI404 LING224 LING323 LING404 MAOR102 MARK304 MARK403 MATH206 MATH314 MATH323 MATH431 MOFI403 PHIL104 PHIL203 PHIL302 PHIL320 PHIL401 PHIL410 RELI321 RELI411 SAMO101
⋮
a lot of collisions!
Better Hash Functions
Make the contribution of each character depend on its position:
private int hash(String course) {
    int k = 257;
    int hashCode = 0;
    for (int i = 0; i < course.length(); i++)
        hashCode = hashCode * k + course.charAt(i);
    return hashCode;
}
$\mathrm{hashCode}(s) = k^6 s_0 + k^5 s_1 + k^4 s_2 + k^3 s_3 + k^2 s_4 + k^1 s_5 + s_6$
(it is best to use a prime number for the constant k)
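A short sketch of the positional hash in use, showing that reordering the characters now changes the result. The helper index and its tableSize parameter are assumptions added for illustration; they are needed because int arithmetic overflows silently for longer strings, so the raw hash code can be negative.

```java
class PositionalHashDemo {
    // The position-dependent hash from the slide.
    static int hash(String course) {
        int k = 257;
        int hashCode = 0;
        for (int i = 0; i < course.length(); i++)
            hashCode = hashCode * k + course.charAt(i);  // int overflow wraps around
        return hashCode;
    }

    // Hypothetical helper: reduce the hash code to a valid array index.
    static int index(String course, int tableSize) {
        return Math.floorMod(hash(course), tableSize);
    }

    public static void main(String[] args) {
        System.out.println(hash("AB"));   // 65 * 257 + 66 = 16771
        System.out.println(hash("BA"));   // 66 * 257 + 65 = 17027
        System.out.println(hash("COMP102") == hash("COMP201"));  // false
    }
}
```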
Perfect Hash Functions
Perfect hash function
  gives no collisions for a given data set
Example - for VUW courses
private int hash(String course) {
    int hash = 0;
    for (int i = 0; i < course.length(); i++)
        hash = (hash * 51 + course.charAt(i)) % 72201;
    return hash;
}
Building a perfect hash function
  is very difficult
  is very specific to a particular set of possible values
  is only useful in very specialised circumstances
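Whether a hash function is perfect for a given data set can be checked mechanically by hashing every key and looking for duplicates. A minimal sketch follows; the method isPerfectFor and the class name are illustrative. The full list of VUW course codes is not reproduced in these slides, so the example only checks two codes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class PerfectHashCheck {
    // The hash function from the slide.
    static int hash(String course) {
        int hash = 0;
        for (int i = 0; i < course.length(); i++)
            hash = (hash * 51 + course.charAt(i)) % 72201;
        return hash;
    }

    // Returns true if no two distinct keys share a hash value.
    static boolean isPerfectFor(Set<String> keys) {
        Set<Integer> seen = new HashSet<>();
        for (String key : keys)
            if (!seen.add(hash(key)))
                return false;          // this hash value was already taken: collision
        return true;
    }

    public static void main(String[] args) {
        Set<String> sample = new HashSet<>(Arrays.asList("COMP102", "COMP201"));
        System.out.println(isPerfectFor(sample));  // true
    }
}
```

A check like this says nothing about keys outside the set, which is why a perfect hash function is tied to one particular data set.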
Dealing with Collisions
Two approaches
  Use a collection at each place (“buckets” or “chaining”)
  Look for an empty place in the hashtable (“probing” or “open addressing”)
[Figure: “2001 – A Space Odyssey” and “Gravity” hash to the same cell of an array with indices 0 to N]
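The second approach might look like the following sketch. It uses linear probing (try the next cell, wrapping around), which is only one possible probing strategy; the slide does not say which one is meant. The class ProbingSet is hypothetical, has a fixed capacity, and does not support removal.

```java
class ProbingSet {
    private final String[] table;
    private int count = 0;

    ProbingSet(int capacity) {
        table = new String[capacity];
    }

    // The cell where the search for an item starts.
    private int home(String item) {
        return Math.floorMod(item.hashCode(), table.length);
    }

    // Returns true if the item was added, false if it was already present.
    boolean add(String item) {
        int i = home(item);
        for (int probes = 0; probes < table.length; probes++) {
            if (table[i] == null) {            // empty place found: store the item here
                table[i] = item;
                count++;
                return true;
            }
            if (table[i].equals(item))
                return false;
            i = (i + 1) % table.length;        // linear probing: move to the next cell
        }
        throw new IllegalStateException("table is full");
    }

    boolean contains(String item) {
        int i = home(item);
        for (int probes = 0; probes < table.length; probes++) {
            if (table[i] == null) return false;    // an empty cell ends the search
            if (table[i].equals(item)) return true;
            i = (i + 1) % table.length;
        }
        return false;
    }

    int size() {
        return count;
    }
}
```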
Collisions: chaining / buckets
Store a Set in each cell:
  hash value → which set
Performance?
  if the array is of size k, each subset will be about 1/k of size()
  cost ≈ cost(hashCode) + cost(subset)
[Figure: hash table whose cells hold small sets of animal names: ant, fox, hen, dog, bee, kea, cow, elk, owl, pig, sow, tui, ape, bat, bug, cat, eel, gnu, jay, nit, ray, yak, cod, roe]
This is what Java's HashMap does.
If the sets get too big
  Rehash: double the array size and reassign the elements
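Chaining with rehashing might be sketched as below. The class ChainedHashSet is hypothetical, it uses a small list as the collection in each cell, and the rehash threshold (more than two items per bucket on average) is an arbitrary choice for illustration, not the rule that Java's HashMap uses.

```java
import java.util.ArrayList;
import java.util.List;

class ChainedHashSet {
    private List<List<String>> buckets;
    private int size = 0;

    ChainedHashSet(int capacity) {
        buckets = newBuckets(capacity);
    }

    private static List<List<String>> newBuckets(int n) {
        List<List<String>> result = new ArrayList<List<String>>(n);
        for (int i = 0; i < n; i++)
            result.add(new ArrayList<String>());
        return result;
    }

    // hash value -> which bucket
    private List<String> bucketFor(String item) {
        return buckets.get(Math.floorMod(item.hashCode(), buckets.size()));
    }

    boolean contains(String item) {
        return bucketFor(item).contains(item);   // cost(hashCode) + cost of searching one bucket
    }

    boolean add(String item) {
        List<String> bucket = bucketFor(item);
        if (bucket.contains(item)) return false;
        bucket.add(item);
        size++;
        if (size > 2 * buckets.size())           // buckets are getting too big (assumed threshold)
            rehash();
        return true;
    }

    // Double the array size and reassign every element to its new bucket.
    private void rehash() {
        List<List<String>> old = buckets;
        buckets = newBuckets(2 * old.size());
        for (List<String> bucket : old)
            for (String item : bucket)
                bucketFor(item).add(item);
    }

    int size() { return size; }
    int capacity() { return buckets.size(); }
}
```

Elements must be reassigned after doubling because the bucket index depends on the array size.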
Java and hashCode
All objects have a hashCode method and an equals method, so:
  you can call equals on any object
  you can put any object into a HashSet, HashMap, …
Many predefined classes (e.g. String) have good equals and hashCode methods defined
The default equals method:
  compares references, i.e., equals is ==
  if this is not what you want, define your own equals method
The default hashCode
  returns an integer based on the object's identity (e.g. its reference / pointer value)
If you redefine equals, you should redefine hashCode too!
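A minimal sketch of redefining both methods together, so that equal objects have equal hash codes. The class Film and its fields are hypothetical, and the title is assumed to be non-null.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

final class Film {
    private final String title;   // assumed non-null
    private final int year;

    Film(String title, int year) {
        this.title = title;
        this.year = year;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Film)) return false;
        Film that = (Film) other;
        return year == that.year && title.equals(that.title);
    }

    @Override
    public int hashCode() {
        return Objects.hash(title, year);   // built from the same fields that equals compares
    }

    public static void main(String[] args) {
        Set<Film> films = new HashSet<>();
        films.add(new Film("Gravity", 2013));
        // A different object with equal contents is found in the set.
        System.out.println(films.contains(new Film("Gravity", 2013)));  // true
    }
}
```

Without the hashCode override, the two equal Film objects would usually land in different buckets and the lookup would fail.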