Preparation Data Structures 09 hash tables
-
Upload
andres-mendez-vazquez -
Category
Education
-
view
25 -
download
2
Transcript of Preparation Data Structures 09 hash tables
Images/cinvestav-1.jpg
Outline1 Introduction2 Operation in a ADT dictionary
AddRemoveGetValueContainsIterators
3 Example4 Implementation5 Hash Tables
Number of KeysHash FunctionsHashing By Division
6 Overflow HandlingOpen AddressingChaining
2 / 127
Images/cinvestav-1.jpg
Dictionaries
DefinitionThe ADT dictionary—also called a map, table, or associativearray—contains entries that each have two parts:
A keyword—usually called a search key—such as an English word or aperson’s nameA value—such as a definition, an address, or a telephonenumber—associated with that key
3 / 127
Images/cinvestav-1.jpg
Dictionaries
DefinitionThe ADT dictionary—also called a map, table, or associativearray—contains entries that each have two parts:
A keyword—usually called a search key—such as an English word or aperson’s nameA value—such as a definition, an address, or a telephonenumber—associated with that key
3 / 127
Images/cinvestav-1.jpg
Dictionaries
DefinitionThe ADT dictionary—also called a map, table, or associativearray—contains entries that each have two parts:
A keyword—usually called a search key—such as an English word or aperson’s nameA value—such as a definition, an address, or a telephonenumber—associated with that key
3 / 127
Images/cinvestav-1.jpg
Example
Dictionary with DuplicatesPairs are of the form (word, meaning).
More than a single meaning(bolt, a threaded pin)(bolt, a crash of thunder)(bolt, to shoot forth suddenly)(bolt, a gulp)(bolt, a standard roll of cloth)etc.
4 / 127
Images/cinvestav-1.jpg
Example
Dictionary with DuplicatesPairs are of the form (word, meaning).
More than a single meaning(bolt, a threaded pin)(bolt, a crash of thunder)(bolt, to shoot forth suddenly)(bolt, a gulp)(bolt, a standard roll of cloth)etc.
4 / 127
Images/cinvestav-1.jpg
Example
Dictionary with DuplicatesPairs are of the form (word, meaning).
More than a single meaning(bolt, a threaded pin)(bolt, a crash of thunder)(bolt, to shoot forth suddenly)(bolt, a gulp)(bolt, a standard roll of cloth)etc.
4 / 127
Images/cinvestav-1.jpg
Example
Dictionary with DuplicatesPairs are of the form (word, meaning).
More than a single meaning(bolt, a threaded pin)(bolt, a crash of thunder)(bolt, to shoot forth suddenly)(bolt, a gulp)(bolt, a standard roll of cloth)etc.
4 / 127
Images/cinvestav-1.jpg
Example
Dictionary with DuplicatesPairs are of the form (word, meaning).
More than a single meaning(bolt, a threaded pin)(bolt, a crash of thunder)(bolt, to shoot forth suddenly)(bolt, a gulp)(bolt, a standard roll of cloth)etc.
4 / 127
Images/cinvestav-1.jpg
Example
Dictionary with DuplicatesPairs are of the form (word, meaning).
More than a single meaning(bolt, a threaded pin)(bolt, a crash of thunder)(bolt, to shoot forth suddenly)(bolt, a gulp)(bolt, a standard roll of cloth)etc.
4 / 127
Images/cinvestav-1.jpg
Example
Dictionary with DuplicatesPairs are of the form (word, meaning).
More than a single meaning(bolt, a threaded pin)(bolt, a crash of thunder)(bolt, to shoot forth suddenly)(bolt, a gulp)(bolt, a standard roll of cloth)etc.
4 / 127
Images/cinvestav-1.jpg
Operation in the ADT dictionary
Common operations with most databasesinsert adds a new entry to the dictionary, given a search key andassociated value.delete removes an entry, given its associated search keyretrieve retrieves the value associated with a given search keysearch sees whether the dictionary contains a given search keytraverse
I It traverse all the search keys in the dictionaryI It traverse all the values in the dictionary
6 / 127
Images/cinvestav-1.jpg
Operation in the ADT dictionary
Common operations with most databasesinsert adds a new entry to the dictionary, given a search key andassociated value.delete removes an entry, given its associated search keyretrieve retrieves the value associated with a given search keysearch sees whether the dictionary contains a given search keytraverse
I It traverse all the search keys in the dictionaryI It traverse all the values in the dictionary
6 / 127
Images/cinvestav-1.jpg
Operation in the ADT dictionary
Common operations with most databasesinsert adds a new entry to the dictionary, given a search key andassociated value.delete removes an entry, given its associated search keyretrieve retrieves the value associated with a given search keysearch sees whether the dictionary contains a given search keytraverse
I It traverse all the search keys in the dictionaryI It traverse all the values in the dictionary
6 / 127
Images/cinvestav-1.jpg
Operation in the ADT dictionary
Common operations with most databasesinsert adds a new entry to the dictionary, given a search key andassociated value.delete removes an entry, given its associated search keyretrieve retrieves the value associated with a given search keysearch sees whether the dictionary contains a given search keytraverse
I It traverse all the search keys in the dictionaryI It traverse all the values in the dictionary
6 / 127
Images/cinvestav-1.jpg
Operation in the ADT dictionary
Common operations with most databasesinsert adds a new entry to the dictionary, given a search key andassociated value.delete removes an entry, given its associated search keyretrieve retrieves the value associated with a given search keysearch sees whether the dictionary contains a given search keytraverse
I It traverse all the search keys in the dictionaryI It traverse all the values in the dictionary
6 / 127
Images/cinvestav-1.jpg
Operation in the ADT dictionary
Common operations with most databasesinsert adds a new entry to the dictionary, given a search key andassociated value.delete removes an entry, given its associated search keyretrieve retrieves the value associated with a given search keysearch sees whether the dictionary contains a given search keytraverse
I It traverse all the search keys in the dictionaryI It traverse all the values in the dictionary
6 / 127
Images/cinvestav-1.jpg
Operation in the ADT dictionary
Common operations with most databasesinsert adds a new entry to the dictionary, given a search key andassociated value.delete removes an entry, given its associated search keyretrieve retrieves the value associated with a given search keysearch sees whether the dictionary contains a given search keytraverse
I It traverse all the search keys in the dictionaryI It traverse all the values in the dictionary
6 / 127
Images/cinvestav-1.jpg
In addition
We have the following extra operationsDetect whether a dictionary is emptyGet the number of entries in the dictionaryRemove all entries from the dictionary
7 / 127
Images/cinvestav-1.jpg
In addition
We have the following extra operationsDetect whether a dictionary is emptyGet the number of entries in the dictionaryRemove all entries from the dictionary
7 / 127
Images/cinvestav-1.jpg
In addition
We have the following extra operationsDetect whether a dictionary is emptyGet the number of entries in the dictionaryRemove all entries from the dictionary
7 / 127
Images/cinvestav-1.jpg
Outline1 Introduction2 Operation in a ADT dictionary
AddRemoveGetValueContainsIterators
3 Example4 Implementation5 Hash Tables
Number of KeysHash FunctionsHashing By Division
6 Overflow HandlingOpen AddressingChaining
8 / 127
Images/cinvestav-1.jpg
Specifications: Add
Pseudocodeadd(key, value)
TaskIt adds the pair (key , value) to the dictionary.
Input and OutputInput: key is an object search key, value is an associated object.
Output: None.
9 / 127
Images/cinvestav-1.jpg
Specifications: Add
Pseudocodeadd(key, value)
TaskIt adds the pair (key , value) to the dictionary.
Input and OutputInput: key is an object search key, value is an associated object.
Output: None.
9 / 127
Images/cinvestav-1.jpg
Specifications: Add
Pseudocodeadd(key, value)
TaskIt adds the pair (key , value) to the dictionary.
Input and OutputInput: key is an object search key, value is an associated object.
Output: None.
9 / 127
Images/cinvestav-1.jpg
Outline1 Introduction2 Operation in a ADT dictionary
AddRemoveGetValueContainsIterators
3 Example4 Implementation5 Hash Tables
Number of KeysHash FunctionsHashing By Division
6 Overflow HandlingOpen AddressingChaining
11 / 127
Images/cinvestav-1.jpg
Specifications: Remove
Pseudocoderemove(key)
TaskIt removes from the dictionary the entry that corresponds to a given searchkey.
Input and OutputInput: key is an object search key.
Output: Returns either the value that was associated with the searchkey or null if no such object exists.
12 / 127
Images/cinvestav-1.jpg
Specifications: Remove
Pseudocoderemove(key)
TaskIt removes from the dictionary the entry that corresponds to a given searchkey.
Input and OutputInput: key is an object search key.
Output: Returns either the value that was associated with the searchkey or null if no such object exists.
12 / 127
Images/cinvestav-1.jpg
Specifications: Remove
Pseudocoderemove(key)
TaskIt removes from the dictionary the entry that corresponds to a given searchkey.
Input and OutputInput: key is an object search key.
Output: Returns either the value that was associated with the searchkey or null if no such object exists.
12 / 127
Images/cinvestav-1.jpg
Outline1 Introduction2 Operation in a ADT dictionary
AddRemoveGetValueContainsIterators
3 Example4 Implementation5 Hash Tables
Number of KeysHash FunctionsHashing By Division
6 Overflow HandlingOpen AddressingChaining
14 / 127
Images/cinvestav-1.jpg
Specifications: GetValue
PseudocodegetValue(key)
TaskIt retrieves from the dictionary the value that corresponds to a givensearch key.
Input and OutputInput: key is an object search key.
Output: Returns either the value associated with the search key ornull if no such object exists.
15 / 127
Images/cinvestav-1.jpg
Specifications: GetValue
PseudocodegetValue(key)
TaskIt retrieves from the dictionary the value that corresponds to a givensearch key.
Input and OutputInput: key is an object search key.
Output: Returns either the value associated with the search key ornull if no such object exists.
15 / 127
Images/cinvestav-1.jpg
Specifications: GetValue
PseudocodegetValue(key)
TaskIt retrieves from the dictionary the value that corresponds to a givensearch key.
Input and OutputInput: key is an object search key.
Output: Returns either the value associated with the search key ornull if no such object exists.
15 / 127
Images/cinvestav-1.jpg
Outline1 Introduction2 Operation in a ADT dictionary
AddRemoveGetValueContainsIterators
3 Example4 Implementation5 Hash Tables
Number of KeysHash FunctionsHashing By Division
6 Overflow HandlingOpen AddressingChaining
16 / 127
Images/cinvestav-1.jpg
Specifications: Contains
Pseudocodecontains(key)
TaskIt sees whether any entry in the dictionary has a given search key.
Input and OutputInput: key is an object search key.
Output: Returns true if an entry in the dictionary has key as itssearch key.
17 / 127
Images/cinvestav-1.jpg
Specifications: Contains
Pseudocodecontains(key)
TaskIt sees whether any entry in the dictionary has a given search key.
Input and OutputInput: key is an object search key.
Output: Returns true if an entry in the dictionary has key as itssearch key.
17 / 127
Images/cinvestav-1.jpg
Specifications: Contains
Pseudocodecontains(key)
TaskIt sees whether any entry in the dictionary has a given search key.
Input and OutputInput: key is an object search key.
Output: Returns true if an entry in the dictionary has key as itssearch key.
17 / 127
Images/cinvestav-1.jpg
Outline1 Introduction2 Operation in a ADT dictionary
AddRemoveGetValueContainsIterators
3 Example4 Implementation5 Hash Tables
Number of KeysHash FunctionsHashing By Division
6 Overflow HandlingOpen AddressingChaining
18 / 127
Images/cinvestav-1.jpg
Specifications: GetKeyIterator
PseudocodegetKeyIterator()
TaskIt creates an iterator that traverses all search keys in the dictionary.
Input and OutputInput: None.
Output: Returns an iterator that provides sequential access to thesearch keys in the dictionary.
19 / 127
Images/cinvestav-1.jpg
Specifications: GetKeyIterator
PseudocodegetKeyIterator()
TaskIt creates an iterator that traverses all search keys in the dictionary.
Input and OutputInput: None.
Output: Returns an iterator that provides sequential access to thesearch keys in the dictionary.
19 / 127
Images/cinvestav-1.jpg
Specifications: GetKeyIterator
PseudocodegetKeyIterator()
TaskIt creates an iterator that traverses all search keys in the dictionary.
Input and OutputInput: None.
Output: Returns an iterator that provides sequential access to thesearch keys in the dictionary.
19 / 127
Images/cinvestav-1.jpg
Specifications: GetValueIterator
PseudocodegetValueIterator()
TaskIt creates an iterator that traverses all values in the dictionary.
Input and OutputInput: None.
Output: Returns an iterator that provides sequential access to thevalues in the dictionary.
20 / 127
Images/cinvestav-1.jpg
Specifications: GetValueIterator
PseudocodegetValueIterator()
TaskIt creates an iterator that traverses all values in the dictionary.
Input and OutputInput: None.
Output: Returns an iterator that provides sequential access to thevalues in the dictionary.
20 / 127
Images/cinvestav-1.jpg
Specifications: GetValueIterator
PseudocodegetValueIterator()
TaskIt creates an iterator that traverses all values in the dictionary.
Input and OutputInput: None.
Output: Returns an iterator that provides sequential access to thevalues in the dictionary.
20 / 127
Images/cinvestav-1.jpg
Other Operations
isEmpty()It sees whether the dictionary is empty.
getSize()It gets the size of the dictionary.
clear()It removes all entries from the dictionary.
21 / 127
Images/cinvestav-1.jpg
Other Operations
isEmpty()It sees whether the dictionary is empty.
getSize()It gets the size of the dictionary.
clear()It removes all entries from the dictionary.
21 / 127
Images/cinvestav-1.jpg
Other Operations
isEmpty()It sees whether the dictionary is empty.
getSize()It gets the size of the dictionary.
clear()It removes all entries from the dictionary.
21 / 127
Images/cinvestav-1.jpg
However, we have two scenarios
Distinct search keysCase 1 You can refuse to add another key-value.Case 2 You can change the existing value associated with key to
the new value. Then, you return the old value
Duplicate search keysif the method add adds every given key-value entry to a dictionary
The methods remove and getValue must deal with multiple entriesthat have the same search key.What do you remove or return!!!
22 / 127
Images/cinvestav-1.jpg
However, we have two scenarios
Distinct search keysCase 1 You can refuse to add another key-value.Case 2 You can change the existing value associated with key to
the new value. Then, you return the old value
Duplicate search keysif the method add adds every given key-value entry to a dictionary
The methods remove and getValue must deal with multiple entriesthat have the same search key.What do you remove or return!!!
22 / 127
Images/cinvestav-1.jpg
However, we have two scenarios
Distinct search keysCase 1 You can refuse to add another key-value.Case 2 You can change the existing value associated with key to
the new value. Then, you return the old value
Duplicate search keysif the method add adds every given key-value entry to a dictionary
The methods remove and getValue must deal with multiple entriesthat have the same search key.What do you remove or return!!!
22 / 127
Images/cinvestav-1.jpg
However, we have two scenarios
Distinct search keysCase 1 You can refuse to add another key-value.Case 2 You can change the existing value associated with key to
the new value. Then, you return the old value
Duplicate search keysif the method add adds every given key-value entry to a dictionary
The methods remove and getValue must deal with multiple entriesthat have the same search key.What do you remove or return!!!
22 / 127
Images/cinvestav-1.jpg
However, we have two scenarios
Distinct search keysCase 1 You can refuse to add another key-value.Case 2 You can change the existing value associated with key to
the new value. Then, you return the old value
Duplicate search keysif the method add adds every given key-value entry to a dictionary
The methods remove and getValue must deal with multiple entriesthat have the same search key.What do you remove or return!!!
22 / 127
Images/cinvestav-1.jpg
Interface
We have the following interface
impor t j a v a . u t i l . I t e r a t o r ;
p u b l i c i n t e r f a c e D i c t i o n a r y I n t e r f a c e <Key , Value>{
p u b l i c Value add (Key k , Value Item ) ;p u b l i c Value remove (Key k ) ;p u b l i c Value ge tVa lue (Key k ) ;p u b l i c boo l ean c o n t a i n s (Key k ) ;p u b l i c I t e r a t o r <Key> g e tK e y I t e r a t o r ( ) ;p u b l i c I t e r a t o r <Value> g e t V a l u e I t e r a t o r ( ) ;p u b l i c boo l ean isEmpty ( ) ;p u b l i c i n t g e t S i z e ( ) ;p u b l i c vo i d c l e a r ( ) ;
}
23 / 127
Images/cinvestav-1.jpg
Where we can use this ADT?
In the phone directory problemIt is a directory that uses a name as the key and adds and returns a phonenumber
For exampleName Number
Suzanne Nouveaux 401-555-1234Andres Mendez-Vazquez 301-123-2345
... ...
24 / 127
Images/cinvestav-1.jpg
Where we can use this ADT?
In the phone directory problemIt is a directory that uses a name as the key and adds and returns a phonenumber
For exampleName Number
Suzanne Nouveaux 401-555-1234Andres Mendez-Vazquez 301-123-2345
... ...
24 / 127
Images/cinvestav-1.jpg
Thus, we have the following diagram
A class Diagram
TelephoneDirectory
readFile(data)getPhoneNumber(name)
PhoneDirectory
DictionaryName
String
1 1
*
* *
*
25 / 127
Images/cinvestav-1.jpg
Basic Code
We have something like this
p u b l i c c l a s s Te l e phoneD i r e c t o r y{
p r i v a t e D i c t i o n a r y I n t e r f a c e <Name , S t r i ng> phoneBook ;p u b l i c Te l e phoneD i r e c t o r y ( ){
phoneBook = new So r t e dD i c t i o n a r y <Name , S t r i ng >() ;} // end d e f a u l t c o n s t r u c t o r
/∗∗ Reads a t e x t f i l e o f names and t e l e p h o n e numbers∗∗/p u b l i c vo i d r e a d F i l e ( Scanner data ){ . . . }/∗∗ Gets the phone number o f a g i v e n pe r son . ∗/p u b l i c S t r i n g getPhoneNumber (Name personName ){ . . . } // end getPhoneNumber
} // end T e l e p h o n e D i r e c t o r y
26 / 127
Images/cinvestav-1.jpg
Now, the Big Question
It is a big oneHow do we implement this data structure?
Possible waysLinear ListSkip ListHash Tables...
27 / 127
Images/cinvestav-1.jpg
Now, the Big Question
It is a big oneHow do we implement this data structure?
Possible waysLinear ListSkip ListHash Tables...
27 / 127
Images/cinvestav-1.jpg
Now, the Big Question
It is a big oneHow do we implement this data structure?
Possible waysLinear ListSkip ListHash Tables...
27 / 127
Images/cinvestav-1.jpg
Now, the Big Question
It is a big oneHow do we implement this data structure?
Possible waysLinear ListSkip ListHash Tables...
27 / 127
Images/cinvestav-1.jpg
Now, the Big Question
It is a big oneHow do we implement this data structure?
Possible waysLinear ListSkip ListHash Tables...
27 / 127
Images/cinvestav-1.jpg
First: Represent As A Linear List
You haveL = (e0, e1, ..., en−1)
WhereEach ei is a pair (key, element).
We can use the following representationsArray or linked representation.
28 / 127
Images/cinvestav-1.jpg
First: Represent As A Linear List
You haveL = (e0, e1, ..., en−1)
WhereEach ei is a pair (key, element).
We can use the following representationsArray or linked representation.
28 / 127
Images/cinvestav-1.jpg
First: Represent As A Linear List
You haveL = (e0, e1, ..., en−1)
WhereEach ei is a pair (key, element).
We can use the following representationsArray or linked representation.
28 / 127
Images/cinvestav-1.jpg
Array Representation
Our classic array
b c d e a
We have thenOperation in Array Representation Complexity
getValue(theKey) O(size)add(theKey, theItem) O(size) to find duplicate
O(1) to add at right endremove(theKey) O(size)
29 / 127
Images/cinvestav-1.jpg
Array Representation
Our classic array
b c d e a
We have thenOperation in Array Representation Complexity
getValue(theKey) O(size)add(theKey, theItem) O(size) to find duplicate
O(1) to add at right endremove(theKey) O(size)
29 / 127
Images/cinvestav-1.jpg
What if we sort the array?
Sorted Keys
b c d ea
We have thenOperation in Array Representation Complexity
getValue(theKey) O(logsize)add(theKey, theItem) O(logsize) to find duplicate
O(size) to addremove(theKey) O(size)
30 / 127
Images/cinvestav-1.jpg
What if we sort the array?
Sorted Keys
b c d ea
We have thenOperation in Array Representation Complexity
getValue(theKey) O(logsize)add(theKey, theItem) O(logsize) to find duplicate
O(size) to addremove(theKey) O(size)
30 / 127
Images/cinvestav-1.jpg
Unsorted Chain
Our Structure
b c d e a
ComplexityOperation in Chain Representation Complexity
getValue(theKey) O(size)add(theKey, theItem) O(size) to find duplicate
O(1) to addremove(theKey) O(size)
31 / 127
Images/cinvestav-1.jpg
Unsorted Chain
Our Structure
b c d e a
ComplexityOperation in Chain Representation Complexity
getValue(theKey) O(size)add(theKey, theItem) O(size) to find duplicate
O(1) to addremove(theKey) O(size)
31 / 127
Images/cinvestav-1.jpg
Sorted Chain
Our Structure
b c d ea
ComplexityOperation in Chain Representation Complexity
getValue(theKey) O(size)add(theKey, theItem) O(size) to find duplicate
O(1) to addremove(theKey) O(size)
32 / 127
Images/cinvestav-1.jpg
Sorted Chain
Our Structure
b c d ea
ComplexityOperation in Chain Representation Complexity
getValue(theKey) O(size)add(theKey, theItem) O(size) to find duplicate
O(1) to addremove(theKey) O(size)
32 / 127
Images/cinvestav-1.jpg
Skip Lists: we will skip it - It is for an advance class ofanalysis of algorithms
We have something like this
ComplexityOperation Complexity - Worst Case Complexity - Expected
getValue(theKey) O(size) O(log size)add(theKey, theItem) O(size) O(logsize)remove(theKey) O(size) O(logsize)
33 / 127
Images/cinvestav-1.jpg
Skip Lists: we will skip it - It is for an advance class ofanalysis of algorithms
We have something like this
ComplexityOperation Complexity - Worst Case Complexity - Expected
getValue(theKey) O(size) O(log size)add(theKey, theItem) O(size) O(logsize)remove(theKey) O(size) O(logsize)
33 / 127
Images/cinvestav-1.jpg
We will concentrate our efforts in the Hash Tables
DefinitionA hash table or hash map T is a data structure, most commonly anarray, that uses a hash function to efficiently map certain identifiers ofkeys (e.g. person names) to associated values.
Why?Operation in Array Representation Complexity - Worst Case Complexity - Expected
getValue(theKey) O(size) O (1 + C)
add(theKey, theItem) O(size) O (1 + C)
remove(theKey) O(size) O (1 + C)
34 / 127
Images/cinvestav-1.jpg
We will concentrate our efforts in the Hash Tables
DefinitionA hash table or hash map T is a data structure, most commonly anarray, that uses a hash function to efficiently map certain identifiers ofkeys (e.g. person names) to associated values.
Why?Operation in Array Representation Complexity - Worst Case Complexity - Expected
getValue(theKey) O(size) O (1 + C)
add(theKey, theItem) O(size) O (1 + C)
remove(theKey) O(size) O (1 + C)
34 / 127
Images/cinvestav-1.jpg
Then
AdvantagesThey have the advantage of having a expected complexity ofoperations of O(1 + C)
I Still, be aware of C because this will change depending on whichoverflow policy you use...
35 / 127
Images/cinvestav-1.jpg
Then
AdvantagesThey have the advantage of having a expected complexity ofoperations of O(1 + C)
I Still, be aware of C because this will change depending on whichoverflow policy you use...
35 / 127
Images/cinvestav-1.jpg
Outline1 Introduction2 Operation in a ADT dictionary
AddRemoveGetValueContainsIterators
3 Example4 Implementation5 Hash Tables
Number of KeysHash FunctionsHashing By Division
6 Overflow HandlingOpen AddressingChaining
36 / 127
Images/cinvestav-1.jpg
You have two cases for this data structure
FirstSmall universe of keys.
SecondLarge number of keys
37 / 127
Images/cinvestav-1.jpg
You have two cases for this data structure
FirstSmall universe of keys.
SecondLarge number of keys
37 / 127
Images/cinvestav-1.jpg
When you have a small universe of keys, U
We can do the followingKey values are direct addresses in the array.Direct implementation or Direct-address tables.
Operations1 Direct-Address-Search(Table, key)
I return Table[key]2 Direct-Address-Search(Table,key,value)
I Table[key]=value3 Direct-Address-Delete(T , x)
I Table[key]=null
38 / 127
Images/cinvestav-1.jpg
When you have a small universe of keys, U
We can do the followingKey values are direct addresses in the array.Direct implementation or Direct-address tables.
Operations1 Direct-Address-Search(Table, key)
I return Table[key]2 Direct-Address-Search(Table,key,value)
I Table[key]=value3 Direct-Address-Delete(T , x)
I Table[key]=null
38 / 127
Images/cinvestav-1.jpg
When you have a small universe of keys, U
We can do the followingKey values are direct addresses in the array.Direct implementation or Direct-address tables.
Operations1 Direct-Address-Search(Table, key)
I return Table[key]2 Direct-Address-Search(Table,key,value)
I Table[key]=value3 Direct-Address-Delete(T , x)
I Table[key]=null
38 / 127
Images/cinvestav-1.jpg
When you have a small universe of keys, U
We can do the followingKey values are direct addresses in the array.Direct implementation or Direct-address tables.
Operations1 Direct-Address-Search(Table, key)
I return Table[key]2 Direct-Address-Search(Table,key,value)
I Table[key]=value3 Direct-Address-Delete(T , x)
I Table[key]=null
38 / 127
Images/cinvestav-1.jpg
When you have a small universe of keys, U
We can do the followingKey values are direct addresses in the array.Direct implementation or Direct-address tables.
Operations1 Direct-Address-Search(Table, key)
I return Table[key]2 Direct-Address-Search(Table,key,value)
I Table[key]=value3 Direct-Address-Delete(T , x)
I Table[key]=null
38 / 127
Images/cinvestav-1.jpg
When you have a small universe of keys, U
We can do the followingKey values are direct addresses in the array.Direct implementation or Direct-address tables.
Operations1 Direct-Address-Search(Table, key)
I return Table[key]2 Direct-Address-Search(Table,key,value)
I Table[key]=value3 Direct-Address-Delete(T , x)
I Table[key]=null
38 / 127
Images/cinvestav-1.jpg
When you have a small universe of keys, U
We can do the followingKey values are direct addresses in the array.Direct implementation or Direct-address tables.
Operations1 Direct-Address-Search(Table, key)
I return Table[key]2 Direct-Address-Search(Table,key,value)
I Table[key]=value3 Direct-Address-Delete(T , x)
I Table[key]=null
38 / 127
Images/cinvestav-1.jpg
When you have a small universe of keys, U
We can do the followingKey values are direct addresses in the array.Direct implementation or Direct-address tables.
Operations1 Direct-Address-Search(Table, key)
I return Table[key]2 Direct-Address-Search(Table,key,value)
I Table[key]=value3 Direct-Address-Delete(T , x)
I Table[key]=null
38 / 127
Images/cinvestav-1.jpg
When you have a large universe of keys, U
ThenThen, it is impractical to store a table of the size of |U|.
You can use a especial function for mapping
h : U→{0, 1, ...,m − 1} (1)
39 / 127
Images/cinvestav-1.jpg
When you have a large universe of keys, U
ThenThen, it is impractical to store a table of the size of |U|.
You can use a especial function for mapping
h : U→{0, 1, ...,m − 1} (1)
39 / 127
Images/cinvestav-1.jpg
Example
Imagine that you haveA 1D array (or table) table[0 : m − 1].
Thush(k)is the home bucket for key k.
ThenEvery dictionary pair (key, Item) is stored in its home bucket table[h[key ]].
40 / 127
Images/cinvestav-1.jpg
Example
Imagine that you haveA 1D array (or table) table[0 : m − 1].
Thush(k)is the home bucket for key k.
ThenEvery dictionary pair (key, Item) is stored in its home bucket table[h[key ]].
40 / 127
Images/cinvestav-1.jpg
Example
Imagine that you haveA 1D array (or table) table[0 : m − 1].
Thush(k)is the home bucket for key k.
ThenEvery dictionary pair (key, Item) is stored in its home bucket table[h[key ]].
40 / 127
Images/cinvestav-1.jpg
Example
Push the following pairs in a hash table of size m = 8(22,a), (33,c), (3,d), (73,e), (85,f).
Hash function iskey/11
Then, we have that
41 / 127
Images/cinvestav-1.jpg
Example
Push the following pairs in a hash table of size m = 8(22,a), (33,c), (3,d), (73,e), (85,f).
Hash function iskey/11
Then, we have that
41 / 127
Images/cinvestav-1.jpg
Example
Push the following pairs in a hash table of size m = 8(22,a), (33,c), (3,d), (73,e), (85,f).
Hash function iskey/11
Then, we have that
41 / 127
Images/cinvestav-1.jpg
What Can Go Wrong?
What if we add?Where does (26,g) go?
Then
PROBLEM!!!Keys that have the same home bucket are synonyms.22 and 26 are synonyms with respect to the hash function that is inuse.This is known as collision or overflow.
42 / 127
Images/cinvestav-1.jpg
What Can Go Wrong?
What if we add?Where does (26,g) go?
Then
PROBLEM!!!Keys that have the same home bucket are synonyms.22 and 26 are synonyms with respect to the hash function that is inuse.This is known as collision or overflow.
42 / 127
Images/cinvestav-1.jpg
What Can Go Wrong?
What if we add?Where does (26,g) go?
Then
PROBLEM!!!Keys that have the same home bucket are synonyms.22 and 26 are synonyms with respect to the hash function that is inuse.This is known as collision or overflow.
42 / 127
Images/cinvestav-1.jpg
What Can Go Wrong?
What if we add?Where does (26,g) go?
Then
PROBLEM!!!Keys that have the same home bucket are synonyms.22 and 26 are synonyms with respect to the hash function that is inuse.This is known as collision or overflow.
42 / 127
Images/cinvestav-1.jpg
Collisions
This is a problemWe might try to avoid this by using a suitable hash function h.
IdeaMake appear to be “random” enough to avoid collisions altogether(Highly Improbable) or to minimize the probability of them.
You still have the problem of collisionsPossible Solutions to the problem:
1 Chaining2 Open Addressing
43 / 127
Images/cinvestav-1.jpg
Collisions
This is a problemWe might try to avoid this by using a suitable hash function h.
IdeaMake appear to be “random” enough to avoid collisions altogether(Highly Improbable) or to minimize the probability of them.
You still have the problem of collisionsPossible Solutions to the problem:
1 Chaining2 Open Addressing
43 / 127
Images/cinvestav-1.jpg
Collisions
This is a problemWe might try to avoid this by using a suitable hash function h.
IdeaMake appear to be “random” enough to avoid collisions altogether(Highly Improbable) or to minimize the probability of them.
You still have the problem of collisionsPossible Solutions to the problem:
1 Chaining2 Open Addressing
43 / 127
Images/cinvestav-1.jpg
Collisions
This is a problemWe might try to avoid this by using a suitable hash function h.
IdeaMake appear to be “random” enough to avoid collisions altogether(Highly Improbable) or to minimize the probability of them.
You still have the problem of collisionsPossible Solutions to the problem:
1 Chaining2 Open Addressing
43 / 127
Images/cinvestav-1.jpg
Collisions
This is a problemWe might try to avoid this by using a suitable hash function h.
IdeaMake appear to be “random” enough to avoid collisions altogether(Highly Improbable) or to minimize the probability of them.
You still have the problem of collisionsPossible Solutions to the problem:
1 Chaining2 Open Addressing
43 / 127
Images/cinvestav-1.jpg
Other Issues
First IssueThe choice of the possible hash function.
SecondThe collision handling method
ThirdThe size (number of buckets) at the hash table
44 / 127
Images/cinvestav-1.jpg
Other Issues
First IssueThe choice of the possible hash function.
SecondThe collision handling method
ThirdThe size (number of buckets) at the hash table
44 / 127
Images/cinvestav-1.jpg
Other Issues
First IssueThe choice of the possible hash function.
SecondThe collision handling method
ThirdThe size (number of buckets) at the hash table
44 / 127
Images/cinvestav-1.jpg
Outline1 Introduction2 Operation in a ADT dictionary
AddRemoveGetValueContainsIterators
3 Example4 Implementation5 Hash Tables
Number of KeysHash FunctionsHashing By Division
6 Overflow HandlingOpen AddressingChaining
45 / 127
Images/cinvestav-1.jpg
Hash Functions
They have two parts1 The conversion of the key into an integer in the case the key is not an
integer.2 The mapping to the home bucket.
46 / 127
Images/cinvestav-1.jpg
Hash Functions
They have two parts1 The conversion of the key into an integer in the case the key is not an
integer.2 The mapping to the home bucket.
46 / 127
Images/cinvestav-1.jpg
Hash Functions
They have two parts1 The conversion of the key into an integer in the case the key is not an
integer.2 The mapping to the home bucket.
46 / 127
Images/cinvestav-1.jpg
String To Integer
Something NotableEach Java character is 2 bytes long.An int is 4 bytes long.
We could have the following string s = “pt”We then do the following:
1 int answer = s.charAt(0);2 answer = (answer<�<16)+s.charAt(1) ;
47 / 127
Images/cinvestav-1.jpg
String To Integer
Something NotableEach Java character is 2 bytes long.An int is 4 bytes long.
We could have the following string s = “pt”We then do the following:
1 int answer = s.charAt(0);2 answer = (answer<�<16)+s.charAt(1) ;
47 / 127
Images/cinvestav-1.jpg
String To Integer
Something NotableEach Java character is 2 bytes long.An int is 4 bytes long.
We could have the following string s = “pt”We then do the following:
1 int answer = s.charAt(0);2 answer = (answer<�<16)+s.charAt(1) ;
47 / 127
Images/cinvestav-1.jpg
String To Integer
Something NotableEach Java character is 2 bytes long.An int is 4 bytes long.
We could have the following string s = “pt”We then do the following:
1 int answer = s.charAt(0);2 answer = (answer<�<16)+s.charAt(1) ;
47 / 127
Images/cinvestav-1.jpg
String To Integer
Something NotableEach Java character is 2 bytes long.An int is 4 bytes long.
We could have the following string s = “pt”We then do the following:
1 int answer = s.charAt(0);2 answer = (answer<�<16)+s.charAt(1) ;
47 / 127
Images/cinvestav-1.jpg
What about longer Strings?
We can do the following process
p u b l i c s t a t i c i n t i n t e g e r ( S t r i n g s ){
i n t l e n g t h = s . l e n g t h ( ) ;// number o f c h a r a c t e r s i n s
i n t answer = 0 ;i f ( l e n g t h % 2 == 1){// l e n g t h i s odd
answer = s . charAt ( l e n g t h − 1 ) ;l eng th −−;
}
48 / 127
Images/cinvestav-1.jpg
Cont...
The rest of the code
// l e n g t h i s now evenf o r ( i n t i = 0 ; i < l e n g t h ; i += 2){// do two c h a r a c t e r s a t a t ime
answer += s . charAt ( i ) ;answer += ( ( i n t ) s . charAt ( i + 1) ) << 16 ;
}r e t u r n ( answer < 0) ? −answer : answer ; }
49 / 127
Images/cinvestav-1.jpg
Analysis of hashing: Which hash function?
Consider that:Good hash functions should maintain the property of simple uniformhashing!
The keys have the same probability 1/m to be hashed to any bucket!!!A uniform hash function minimizes the likelihood of an overflow whenkeys are selected at random.
50 / 127
Images/cinvestav-1.jpg
Analysis of hashing: Which hash function?
Consider that:Good hash functions should maintain the property of simple uniformhashing!
The keys have the same probability 1/m to be hashed to any bucket!!!A uniform hash function minimizes the likelihood of an overflow whenkeys are selected at random.
50 / 127
Images/cinvestav-1.jpg
Analysis of hashing: Which hash function?
Consider that:Good hash functions should maintain the property of simple uniformhashing!
The keys have the same probability 1/m to be hashed to any bucket!!!A uniform hash function minimizes the likelihood of an overflow whenkeys are selected at random.
50 / 127
Images/cinvestav-1.jpg
Possible hash function when the keys are natural numbers
The division methodh(k) = k mod m.Good choices for m are primes not too close to a power of 2.
51 / 127
Images/cinvestav-1.jpg
Possible hash function when the keys are natural numbers
The division methodh(k) = k mod m.Good choices for m are primes not too close to a power of 2.
51 / 127
Images/cinvestav-1.jpg
What if...Question:What about something with keys in a normal distribution?
52 / 127
Images/cinvestav-1.jpg
Outline1 Introduction2 Operation in a ADT dictionary
AddRemoveGetValueContainsIterators
3 Example4 Implementation5 Hash Tables
Number of KeysHash FunctionsHashing By Division
6 Overflow HandlingOpen AddressingChaining
53 / 127
Images/cinvestav-1.jpg
Hashing By Division
Universe of keyskeySpace = all integers.
Thus, we have thatFor every m, the number of integers that get mapped (hashed) into bucketi is approximately 232/m.
PropertiesThe division method results in a uniform hash function when keySpace =all integers.
54 / 127
Images/cinvestav-1.jpg
Hashing By Division
Universe of keyskeySpace = all integers.
Thus, we have thatFor every m, the number of integers that get mapped (hashed) into bucketi is approximately 232/m.
PropertiesThe division method results in a uniform hash function when keySpace =all integers.
54 / 127
Images/cinvestav-1.jpg
Hashing By Division
Universe of keyskeySpace = all integers.
Thus, we have thatFor every m, the number of integers that get mapped (hashed) into bucketi is approximately 232/m.
PropertiesThe division method results in a uniform hash function when keySpace =all integers.
54 / 127
Images/cinvestav-1.jpg
However
ProblemIn practice, keys tend to be correlated.
ThusThe choice of the divisor b affects the distribution of home buckets.
ThenBecause of this correlation, applications tend to have a bias towards keysthat map into odd integers (or into even ones).
55 / 127
Images/cinvestav-1.jpg
However
ProblemIn practice, keys tend to be correlated.
ThusThe choice of the divisor b affects the distribution of home buckets.
ThenBecause of this correlation, applications tend to have a bias towards keysthat map into odd integers (or into even ones).
55 / 127
Images/cinvestav-1.jpg
However
ProblemIn practice, keys tend to be correlated.
ThusThe choice of the divisor b affects the distribution of home buckets.
ThenBecause of this correlation, applications tend to have a bias towards keysthat map into odd integers (or into even ones).
55 / 127
Images/cinvestav-1.jpg
Examples
Odd number and m an even numberOdd integers hash into odd home buckets
15%14 = 1, 3%14 = 3, 23%14 = 9
Even number and m an even numberEven integers into even home buckets.
20%14 = 6, 30%14 = 2, 8%14 = 8
PropertiesThe bias in the keys results in a bias toward either the odd or even homebuckets.
56 / 127
Images/cinvestav-1.jpg
Examples
Odd number and m an even numberOdd integers hash into odd home buckets
15%14 = 1, 3%14 = 3, 23%14 = 9
Even number and m an even numberEven integers into even home buckets.
20%14 = 6, 30%14 = 2, 8%14 = 8
PropertiesThe bias in the keys results in a bias toward either the odd or even homebuckets.
56 / 127
Images/cinvestav-1.jpg
Examples
Odd number and m an even numberOdd integers hash into odd home buckets
15%14 = 1, 3%14 = 3, 23%14 = 9
Even number and m an even numberEven integers into even home buckets.
20%14 = 6, 30%14 = 2, 8%14 = 8
PropertiesThe bias in the keys results in a bias toward either the odd or even homebuckets.
56 / 127
Images/cinvestav-1.jpg
What if we use an Odd number?
Odd number and m an odd numberodd integers may hash into any home.
15%15 = 0, 3%15 = 3, 23%15 = 8
Even number and m an odd numberEven integers may hash into any home.
20%15 = 5, 30%15 = 0, 8%15 = 8
ThusThe bias in the keys does not result in a bias toward either the odd or evenhome buckets.
57 / 127
Images/cinvestav-1.jpg
What if we use an Odd number?
Odd number and m an odd numberodd integers may hash into any home.
15%15 = 0, 3%15 = 3, 23%15 = 8
Even number and m an odd numberEven integers may hash into any home.
20%15 = 5, 30%15 = 0, 8%15 = 8
ThusThe bias in the keys does not result in a bias toward either the odd or evenhome buckets.
57 / 127
Images/cinvestav-1.jpg
What if we use an Odd number?
Odd number and m an odd numberodd integers may hash into any home.
15%15 = 0, 3%15 = 3, 23%15 = 8
Even number and m an odd numberEven integers may hash into any home.
20%15 = 5, 30%15 = 0, 8%15 = 8
ThusThe bias in the keys does not result in a bias toward either the odd or evenhome buckets.
57 / 127
Images/cinvestav-1.jpg
Bias
Something NotableThe bias in the keys does not result in a bias toward either the odd or evenhome buckets.
Then, we haveWe have a better chance of uniformly distributed home buckets.
ThusSo do not use an even divisor.
58 / 127
Images/cinvestav-1.jpg
Bias
Something NotableThe bias in the keys does not result in a bias toward either the odd or evenhome buckets.
Then, we haveWe have a better chance of uniformly distributed home buckets.
ThusSo do not use an even divisor.
58 / 127
Images/cinvestav-1.jpg
Bias
Something NotableThe bias in the keys does not result in a bias toward either the odd or evenhome buckets.
Then, we haveWe have a better chance of uniformly distributed home buckets.
ThusSo do not use an even divisor.
58 / 127
Images/cinvestav-1.jpg
Selecting The Divisor
Another ProblemSimilar biased distribution of home buckets is seen, in practice, when thedivisor is a multiple of prime numbers such as 3, 5, 7, . . .
HoweverThe effect of each prime divisor p of m decreases as p gets larger.
Rules of Choosing mIdeally, choose m so that it is a prime number.Not to close to a power of 2.
59 / 127
Images/cinvestav-1.jpg
Selecting The Divisor
Another ProblemSimilar biased distribution of home buckets is seen, in practice, when thedivisor is a multiple of prime numbers such as 3, 5, 7, . . .
HoweverThe effect of each prime divisor p of m decreases as p gets larger.
Rules of Choosing mIdeally, choose m so that it is a prime number.Not to close to a power of 2.
59 / 127
Images/cinvestav-1.jpg
Selecting The Divisor
Another ProblemSimilar biased distribution of home buckets is seen, in practice, when thedivisor is a multiple of prime numbers such as 3, 5, 7, . . .
HoweverThe effect of each prime divisor p of m decreases as p gets larger.
Rules of Choosing mIdeally, choose m so that it is a prime number.Not to close to a power of 2.
59 / 127
Images/cinvestav-1.jpg
However
Something NotableEven with this hash function, we can have problems
RememberThe Gaussian Keys...
60 / 127
Images/cinvestav-1.jpg
However
Something NotableEven with this hash function, we can have problems
RememberThe Gaussian Keys...
60 / 127
Images/cinvestav-1.jpg
Java.util.HashTable
Quite simpleSimply uses a divisor that is an odd number.
Simplify the implementationThis simplifies implementation because we must be able to resize the hashtable as more pairs are put into the dictionary.
Why?Array doubling, for example, requires you to go from a 1D array tablewhose length is m (which is odd) to an array whose length is 2m + 1(which is also odd).
61 / 127
Images/cinvestav-1.jpg
Java.util.HashTable
Quite simpleSimply uses a divisor that is an odd number.
Simplify the implementationThis simplifies implementation because we must be able to resize the hashtable as more pairs are put into the dictionary.
Why?Array doubling, for example, requires you to go from a 1D array tablewhose length is m (which is odd) to an array whose length is 2m + 1(which is also odd).
61 / 127
Images/cinvestav-1.jpg
Java.util.HashTable
Quite simpleSimply uses a divisor that is an odd number.
Simplify the implementationThis simplifies implementation because we must be able to resize the hashtable as more pairs are put into the dictionary.
Why?Array doubling, for example, requires you to go from a 1D array tablewhose length is m (which is odd) to an array whose length is 2m + 1(which is also odd).
61 / 127
Images/cinvestav-1.jpg
A Better Solution: Universal Hashing
IssuesIn practice, keys are not randomly distributed.Any fixed hash function might yield retrieval O(n) time.
GoalTo find hash functions that produce uniform random table indexesirrespective of the keys.
IdeaTo select a hash function at random from a designed class of functions atthe beginning of the execution.
62 / 127
Images/cinvestav-1.jpg
A Better Solution: Universal Hashing
IssuesIn practice, keys are not randomly distributed.Any fixed hash function might yield retrieval O(n) time.
GoalTo find hash functions that produce uniform random table indexesirrespective of the keys.
IdeaTo select a hash function at random from a designed class of functions atthe beginning of the execution.
62 / 127
Images/cinvestav-1.jpg
A Better Solution: Universal Hashing
IssuesIn practice, keys are not randomly distributed.Any fixed hash function might yield retrieval O(n) time.
GoalTo find hash functions that produce uniform random table indexesirrespective of the keys.
IdeaTo select a hash function at random from a designed class of functions atthe beginning of the execution.
62 / 127
Images/cinvestav-1.jpg
Hashing methods: Universal hashing
Example
Set of hash functions
Choose a hash function randomly
(At the beginning of the execution)
HASH TABLE
63 / 127
Images/cinvestav-1.jpg
Example of Universal HashProceed as follows:
Choose a primer number p large enough so that every possible key k is inthe range [0, ..., p − 1]
Zp = {0, 1, ..., p − 1}and Z∗p = {1, ..., p − 1}
Define the following hash function:
ha,b(k) = ((ak + b) mod p) mod m,∀a ∈ Z∗p and b ∈ Zp
The family of all such hash functions is:
Hp,m = {ha,b : a ∈ Z∗p and b ∈ Zp}
Importanta and b are chosen randomly at the beginning of execution.The class Hp,m of hash functions is universal. 64 / 127
Images/cinvestav-1.jpg
Example of Universal HashProceed as follows:
Choose a primer number p large enough so that every possible key k is inthe range [0, ..., p − 1]
Zp = {0, 1, ..., p − 1}and Z∗p = {1, ..., p − 1}
Define the following hash function:
ha,b(k) = ((ak + b) mod p) mod m,∀a ∈ Z∗p and b ∈ Zp
The family of all such hash functions is:
Hp,m = {ha,b : a ∈ Z∗p and b ∈ Zp}
Importanta and b are chosen randomly at the beginning of execution.The class Hp,m of hash functions is universal. 64 / 127
Images/cinvestav-1.jpg
Example of Universal HashProceed as follows:
Choose a primer number p large enough so that every possible key k is inthe range [0, ..., p − 1]
Zp = {0, 1, ..., p − 1}and Z∗p = {1, ..., p − 1}
Define the following hash function:
ha,b(k) = ((ak + b) mod p) mod m,∀a ∈ Z∗p and b ∈ Zp
The family of all such hash functions is:
Hp,m = {ha,b : a ∈ Z∗p and b ∈ Zp}
Importanta and b are chosen randomly at the beginning of execution.The class Hp,m of hash functions is universal. 64 / 127
Images/cinvestav-1.jpg
Example of Universal HashProceed as follows:
Choose a primer number p large enough so that every possible key k is inthe range [0, ..., p − 1]
Zp = {0, 1, ..., p − 1}and Z∗p = {1, ..., p − 1}
Define the following hash function:
ha,b(k) = ((ak + b) mod p) mod m,∀a ∈ Z∗p and b ∈ Zp
The family of all such hash functions is:
Hp,m = {ha,b : a ∈ Z∗p and b ∈ Zp}
Importanta and b are chosen randomly at the beginning of execution.The class Hp,m of hash functions is universal. 64 / 127
Images/cinvestav-1.jpg
Example of Universal HashProceed as follows:
Choose a primer number p large enough so that every possible key k is inthe range [0, ..., p − 1]
Zp = {0, 1, ..., p − 1}and Z∗p = {1, ..., p − 1}
Define the following hash function:
ha,b(k) = ((ak + b) mod p) mod m,∀a ∈ Z∗p and b ∈ Zp
The family of all such hash functions is:
Hp,m = {ha,b : a ∈ Z∗p and b ∈ Zp}
Importanta and b are chosen randomly at the beginning of execution.The class Hp,m of hash functions is universal. 64 / 127
Images/cinvestav-1.jpg
Example of Universal HashProceed as follows:
Choose a primer number p large enough so that every possible key k is inthe range [0, ..., p − 1]
Zp = {0, 1, ..., p − 1}and Z∗p = {1, ..., p − 1}
Define the following hash function:
ha,b(k) = ((ak + b) mod p) mod m,∀a ∈ Z∗p and b ∈ Zp
The family of all such hash functions is:
Hp,m = {ha,b : a ∈ Z∗p and b ∈ Zp}
Importanta and b are chosen randomly at the beginning of execution.The class Hp,m of hash functions is universal. 64 / 127
Images/cinvestav-1.jpg
Example of Universal HashProceed as follows:
Choose a primer number p large enough so that every possible key k is inthe range [0, ..., p − 1]
Zp = {0, 1, ..., p − 1}and Z∗p = {1, ..., p − 1}
Define the following hash function:
ha,b(k) = ((ak + b) mod p) mod m,∀a ∈ Z∗p and b ∈ Zp
The family of all such hash functions is:
Hp,m = {ha,b : a ∈ Z∗p and b ∈ Zp}
Importanta and b are chosen randomly at the beginning of execution.The class Hp,m of hash functions is universal. 64 / 127
Images/cinvestav-1.jpg
Example of Universal HashProceed as follows:
Choose a primer number p large enough so that every possible key k is inthe range [0, ..., p − 1]
Zp = {0, 1, ..., p − 1}and Z∗p = {1, ..., p − 1}
Define the following hash function:
ha,b(k) = ((ak + b) mod p) mod m,∀a ∈ Z∗p and b ∈ Zp
The family of all such hash functions is:
Hp,m = {ha,b : a ∈ Z∗p and b ∈ Zp}
Importanta and b are chosen randomly at the beginning of execution.The class Hp,m of hash functions is universal. 64 / 127
Images/cinvestav-1.jpg
Example: Universal hash functions
Examplep = 977, m = 50, a and b random numbers
I ha,b(k) = ((ak + b) mod p) mod m
65 / 127
Images/cinvestav-1.jpg
Example of key distribution
Example, mean = 488.5 and dispersion = 5
66 / 127
Images/cinvestav-1.jpg
Another Example: Matrix MethodThen
Let us say keys are u-bits long.Say the table size M is power of 2.an index is b-bits long with M = 2b.
The h functionPick h to be a random b-by-u 0/1 matrix, and define h(x) = hxwhere after the inner product we apply mod 2
Example
h
b
1 0 0 00 1 1 11 1 1 0
u
x1010
=
h (x) 110
71 / 127
Images/cinvestav-1.jpg
Another Example: Matrix MethodThen
Let us say keys are u-bits long.Say the table size M is power of 2.an index is b-bits long with M = 2b.
The h functionPick h to be a random b-by-u 0/1 matrix, and define h(x) = hxwhere after the inner product we apply mod 2
Example
h
b
1 0 0 00 1 1 11 1 1 0
u
x1010
=
h (x) 110
71 / 127
Images/cinvestav-1.jpg
Another Example: Matrix MethodThen
Let us say keys are u-bits long.Say the table size M is power of 2.an index is b-bits long with M = 2b.
The h functionPick h to be a random b-by-u 0/1 matrix, and define h(x) = hxwhere after the inner product we apply mod 2
Example
h
b
1 0 0 00 1 1 11 1 1 0
u
x1010
=
h (x) 110
71 / 127
Images/cinvestav-1.jpg
Another Example: Matrix MethodThen
Let us say keys are u-bits long.Say the table size M is power of 2.an index is b-bits long with M = 2b.
The h functionPick h to be a random b-by-u 0/1 matrix, and define h(x) = hxwhere after the inner product we apply mod 2
Example
h
b
1 0 0 00 1 1 11 1 1 0
u
x1010
=
h (x) 110
71 / 127
Images/cinvestav-1.jpg
Another Example: Matrix MethodThen
Let us say keys are u-bits long.Say the table size M is power of 2.an index is b-bits long with M = 2b.
The h functionPick h to be a random b-by-u 0/1 matrix, and define h(x) = hxwhere after the inner product we apply mod 2
Example
h
b
1 0 0 00 1 1 11 1 1 0
u
x1010
=
h (x) 110
71 / 127
Images/cinvestav-1.jpg
Implementation of the column*vector mod 2
Code - SWAR-Popcount - Divide and Conquer
// This works on l y i n 32 b i t si n t p roduc t ( i n t row , i n t v e c t o r ){
i n t i = row & ve c t o r ;
i = i − ( ( i >> 1) & 0x55555555 ) ;i = ( i & 0x33333333 ) + ( ( i >> 2) & 0x33333333 ) ;i = ( ( ( i + ( i >> 4)) & 0x0F0F0F0F ) ∗ 0x01010101 ) >> 24 ;
r e t u r n i & 0x00000001 ;
}
72 / 127
Images/cinvestav-1.jpg
Explanation
Counting Duo-BitsA bit-duo (two neighboring bits) can be interpreted with bit 0 = a, and bit1 = b as
duo := 2b + a
The duo population is
popcnt(duo) := b + a
This can be achieved by(2b + a)− (2b + a)÷ 2 or (2b + a)− b
73 / 127
Images/cinvestav-1.jpg
Explanation
Counting Duo-BitsA bit-duo (two neighboring bits) can be interpreted with bit 0 = a, and bit1 = b as
duo := 2b + a
The duo population is
popcnt(duo) := b + a
This can be achieved by(2b + a)− (2b + a)÷ 2 or (2b + a)− b
73 / 127
Images/cinvestav-1.jpg
Explanation
Counting Duo-BitsA bit-duo (two neighboring bits) can be interpreted with bit 0 = a, and bit1 = b as
duo := 2b + a
The duo population is
popcnt(duo) := b + a
This can be achieved by(2b + a)− (2b + a)÷ 2 or (2b + a)− b
73 / 127
Images/cinvestav-1.jpg
We have thenSomething Notable
i i div 2 popcnt(i)00 00 0001 00 0110 01 0111 01 10
We have thenOnly the lower bit is needed from i div 2 - and one do not has to worryabout borrows from neighboring duos.
We need to clear the upper bitsRemember 5 in binary 0101 (Granularity Problem)... we use the 0 toremove the upper bits
74 / 127
Images/cinvestav-1.jpg
We have thenSomething Notable
i i div 2 popcnt(i)00 00 0001 00 0110 01 0111 01 10
We have thenOnly the lower bit is needed from i div 2 - and one do not has to worryabout borrows from neighboring duos.
We need to clear the upper bitsRemember 5 in binary 0101 (Granularity Problem)... we use the 0 toremove the upper bits
74 / 127
Images/cinvestav-1.jpg
We have thenSomething Notable
i i div 2 popcnt(i)00 00 0001 00 0110 01 0111 01 10
We have thenOnly the lower bit is needed from i div 2 - and one do not has to worryabout borrows from neighboring duos.
We need to clear the upper bitsRemember 5 in binary 0101 (Granularity Problem)... we use the 0 toremove the upper bits
74 / 127
Images/cinvestav-1.jpg
Thus, we have that
Clearing BitsSWAR-wise, one needs to clear all "even" bits of the div 2 subtrahendto perform a 32-bit subtraction of all 16 duos:
I x = x - ((x >�> 1) & 0x55555555);
NoteThe popcount-result of the bit-duos still takes two bits.
Now
What?
75 / 127
Images/cinvestav-1.jpg
Thus, we have that
Clearing BitsSWAR-wise, one needs to clear all "even" bits of the div 2 subtrahendto perform a 32-bit subtraction of all 16 duos:
I x = x - ((x >�> 1) & 0x55555555);
NoteThe popcount-result of the bit-duos still takes two bits.
Now
What?
75 / 127
Images/cinvestav-1.jpg
Thus, we have that
Clearing BitsSWAR-wise, one needs to clear all "even" bits of the div 2 subtrahendto perform a 32-bit subtraction of all 16 duos:
I x = x - ((x >�> 1) & 0x55555555);
NoteThe popcount-result of the bit-duos still takes two bits.
Now
What?
75 / 127
Images/cinvestav-1.jpg
Counting Nibble-Bits
We haveThe next step is to add the duo-counts to populations of four neighboringbits, the 8 nibble-counts, which may range from zero to four
We can do this using 3 in binary0x3=0011
76 / 127
Images/cinvestav-1.jpg
Counting Nibble-Bits
We haveThe next step is to add the duo-counts to populations of four neighboringbits, the 8 nibble-counts, which may range from zero to four
We can do this using 3 in binary0x3=0011
76 / 127
Images/cinvestav-1.jpg
How?
How many ones you can have in 4 bits0 to 22
Thus, we only need 4 bits to store the resultWe create a counter for each of the two bits in four bits
This is done using the mask - after How many ones can you have intwo bits?
I From 0 to 2 you can use the lower 2 bits to represent this!!!
This is the instructioni = (i & 0x33333333) + ((i >�> 2) & 0x33333333);
77 / 127
Images/cinvestav-1.jpg
How?
How many ones you can have in 4 bits0 to 22
Thus, we only need 4 bits to store the resultWe create a counter for each of the two bits in four bits
This is done using the mask - after How many ones can you have intwo bits?
I From 0 to 2 you can use the lower 2 bits to represent this!!!
This is the instructioni = (i & 0x33333333) + ((i >�> 2) & 0x33333333);
77 / 127
Images/cinvestav-1.jpg
How?
How many ones you can have in 4 bits0 to 22
Thus, we only need 4 bits to store the resultWe create a counter for each of the two bits in four bits
This is done using the mask - after How many ones can you have intwo bits?
I From 0 to 2 you can use the lower 2 bits to represent this!!!
This is the instructioni = (i & 0x33333333) + ((i >�> 2) & 0x33333333);
77 / 127
Images/cinvestav-1.jpg
How?
How many ones you can have in 4 bits0 to 22
Thus, we only need 4 bits to store the resultWe create a counter for each of the two bits in four bits
This is done using the mask - after How many ones can you have intwo bits?
I From 0 to 2 you can use the lower 2 bits to represent this!!!
This is the instructioni = (i & 0x33333333) + ((i >�> 2) & 0x33333333);
77 / 127
Images/cinvestav-1.jpg
How?
How many ones you can have in 4 bits0 to 22
Thus, we only need 4 bits to store the resultWe create a counter for each of the two bits in four bits
This is done using the mask - after How many ones can you have intwo bits?
I From 0 to 2 you can use the lower 2 bits to represent this!!!
This is the instructioni = (i & 0x33333333) + ((i >�> 2) & 0x33333333);
77 / 127
Images/cinvestav-1.jpg
Example
We haveOld i new i 3 new i&3
0000 0000 0011 00000001 0001 0011 00010010 0001 0011 00010011 0010 0011 00100100 0100 0011 00000101 0101 0011 00011111 1010 0011 0010
new i>�>2 3 new i>�>2&3
0000 0011 00000000 0011 00000000 0011 00000000 0011 00000001 0011 00010001 0011 00010010 0011 0010
Final i
0000000100010010000100100100
78 / 127
Images/cinvestav-1.jpg
We have the following
Now what?You already got the idea? Now it is about to get the byte-populationsfrom two nibble-populations (8 bits)
Do you remember your hexadecimal notation0x0f = 00001111
How many ones do you have in 8 positionsFrom 0 to 23, you only require 4 bits for this!!!
79 / 127
Images/cinvestav-1.jpg
We have the following
Now what?You already got the idea? Now it is about to get the byte-populationsfrom two nibble-populations (8 bits)
Do you remember your hexadecimal notation0x0f = 00001111
How many ones do you have in 8 positionsFrom 0 to 23, you only require 4 bits for this!!!
79 / 127
Images/cinvestav-1.jpg
We have the following
Now what?You already got the idea? Now it is about to get the byte-populationsfrom two nibble-populations (8 bits)
Do you remember your hexadecimal notation0x0f = 00001111
How many ones do you have in 8 positionsFrom 0 to 23, you only require 4 bits for this!!!
79 / 127
Images/cinvestav-1.jpg
Thus the next step
It will get the counter for the number of ones in 8 bits(i + (i >�> 4)) & 0x0f0f0f0f0f0f0f0f;
80 / 127
Images/cinvestav-1.jpg
Example
We haveOld x 2 bits 4 bits
11110000 10100000 0100000011111111 10101010 01000100
Now we get1 01000000 7−→ (i >�> 4) → 00000100 7−→ i+(i >�> 4)→01000100&000011117−→ 00000100
2 01000100 7−→ (i >�> 4) → 00000100 7−→ i+(i >�> 4)→01001000&000011117−→ 00001000
81 / 127
Images/cinvestav-1.jpg
Example
We haveOld x 2 bits 4 bits
11110000 10100000 0100000011111111 10101010 01000100
Now we get1 01000000 7−→ (i >�> 4) → 00000100 7−→ i+(i >�> 4)→01000100&000011117−→ 00000100
2 01000100 7−→ (i >�> 4) → 00000100 7−→ i+(i >�> 4)→01001000&000011117−→ 00001000
81 / 127
Images/cinvestav-1.jpg
We can keep doing this
It is possible to keep goingi = (i + (i >�> 8)) & 0x00ff00ff
Theni = (i + (i >�> 16)) & 0x000000ff
82 / 127
Images/cinvestav-1.jpg
We can keep doing this
It is possible to keep goingi = (i + (i >�> 8)) & 0x00ff00ff
Theni = (i + (i >�> 16)) & 0x000000ff
82 / 127
Images/cinvestav-1.jpg
With today fast 64 bit multiplications
Something Notableone can multiply the vector of 8-byte-counts with 0x0101010101010101 toget the final result in the most significant byte,
For this the code(((i + (i >�> 4)) & 0x0F0F0F0F) * 0x01010101)>�>24
83 / 127
Images/cinvestav-1.jpg
With today fast 64 bit multiplications
Something Notableone can multiply the vector of 8-byte-counts with 0x0101010101010101 toget the final result in the most significant byte,
For this the code(((i + (i >�> 4)) & 0x0F0F0F0F) * 0x01010101)>�>24
83 / 127
Images/cinvestav-1.jpg
Divide and Conquer
This is the natural way we do many thingsWe always attack smaller versions first of the large one!!!
84 / 127
Images/cinvestav-1.jpg
Overflow HandlingOverflowAn overflow occurs when the home bucket for a new pair (key, element) isfull.
One strategy to handle overflow, small universe of keysSearch the hash table in some systematic fashion for a bucket that is notfull.
Linear probing (linear open addressing).Quadratic probing.Random probing.
The other strategy, a large universe of keysEliminate overflows by permitting each bucket to keep a list of all pairs forwhich it is the home bucket.
Array linear list.Chain.
85 / 127
Images/cinvestav-1.jpg
Overflow HandlingOverflowAn overflow occurs when the home bucket for a new pair (key, element) isfull.
One strategy to handle overflow, small universe of keysSearch the hash table in some systematic fashion for a bucket that is notfull.
Linear probing (linear open addressing).Quadratic probing.Random probing.
The other strategy, a large universe of keysEliminate overflows by permitting each bucket to keep a list of all pairs forwhich it is the home bucket.
Array linear list.Chain.
85 / 127
Images/cinvestav-1.jpg
Overflow HandlingOverflowAn overflow occurs when the home bucket for a new pair (key, element) isfull.
One strategy to handle overflow, small universe of keysSearch the hash table in some systematic fashion for a bucket that is notfull.
Linear probing (linear open addressing).Quadratic probing.Random probing.
The other strategy, a large universe of keysEliminate overflows by permitting each bucket to keep a list of all pairs forwhich it is the home bucket.
Array linear list.Chain.
85 / 127
Images/cinvestav-1.jpg
Overflow HandlingOverflowAn overflow occurs when the home bucket for a new pair (key, element) isfull.
One strategy to handle overflow, small universe of keysSearch the hash table in some systematic fashion for a bucket that is notfull.
Linear probing (linear open addressing).Quadratic probing.Random probing.
The other strategy, a large universe of keysEliminate overflows by permitting each bucket to keep a list of all pairs forwhich it is the home bucket.
Array linear list.Chain.
85 / 127
Images/cinvestav-1.jpg
Overflow HandlingOverflowAn overflow occurs when the home bucket for a new pair (key, element) isfull.
One strategy to handle overflow, small universe of keysSearch the hash table in some systematic fashion for a bucket that is notfull.
Linear probing (linear open addressing).Quadratic probing.Random probing.
The other strategy, a large universe of keysEliminate overflows by permitting each bucket to keep a list of all pairs forwhich it is the home bucket.
Array linear list.Chain.
85 / 127
Images/cinvestav-1.jpg
Overflow HandlingOverflowAn overflow occurs when the home bucket for a new pair (key, element) isfull.
One strategy to handle overflow, small universe of keysSearch the hash table in some systematic fashion for a bucket that is notfull.
Linear probing (linear open addressing).Quadratic probing.Random probing.
The other strategy, a large universe of keysEliminate overflows by permitting each bucket to keep a list of all pairs forwhich it is the home bucket.
Array linear list.Chain.
85 / 127
Images/cinvestav-1.jpg
Outline1 Introduction2 Operation in a ADT dictionary
AddRemoveGetValueContainsIterators
3 Example4 Implementation5 Hash Tables
Number of KeysHash FunctionsHashing By Division
6 Overflow HandlingOpen AddressingChaining
86 / 127
Images/cinvestav-1.jpg
Open addressing
DefinitionAll the elements occupy the hash table itself.
What is it?We systematically examine table slots until either we find the desiredelement or we have ascertained that the element is not in the table.
AdvantagesThe advantage of open addressing is that it avoids pointers altogether.
87 / 127
Images/cinvestav-1.jpg
Open addressing
DefinitionAll the elements occupy the hash table itself.
What is it?We systematically examine table slots until either we find the desiredelement or we have ascertained that the element is not in the table.
AdvantagesThe advantage of open addressing is that it avoids pointers altogether.
87 / 127
Images/cinvestav-1.jpg
Open addressing
DefinitionAll the elements occupy the hash table itself.
What is it?We systematically examine table slots until either we find the desiredelement or we have ascertained that the element is not in the table.
AdvantagesThe advantage of open addressing is that it avoids pointers altogether.
87 / 127
Images/cinvestav-1.jpg
Insert in Open addressing
Extended hash function to probeInstead of being fixed in the order 0, 1, 2, ...,m − 1 with Θ (n) searchtimeExtend the hash function toh : U × {0, 1, ...,m − 1} → {0, 1, ...,m − 1}This gives the probe sequence 〈h(k, 0), h(k, 1), ..., h(k,m − 1)〉
I A permutation of 〈0, 1, 2, ...,m − 1〉
88 / 127
Images/cinvestav-1.jpg
Insert in Open addressing
Extended hash function to probeInstead of being fixed in the order 0, 1, 2, ...,m − 1 with Θ (n) searchtimeExtend the hash function toh : U × {0, 1, ...,m − 1} → {0, 1, ...,m − 1}This gives the probe sequence 〈h(k, 0), h(k, 1), ..., h(k,m − 1)〉
I A permutation of 〈0, 1, 2, ...,m − 1〉
88 / 127
Images/cinvestav-1.jpg
Insert in Open addressing
Extended hash function to probeInstead of being fixed in the order 0, 1, 2, ...,m − 1 with Θ (n) searchtimeExtend the hash function toh : U × {0, 1, ...,m − 1} → {0, 1, ...,m − 1}This gives the probe sequence 〈h(k, 0), h(k, 1), ..., h(k,m − 1)〉
I A permutation of 〈0, 1, 2, ...,m − 1〉
88 / 127
Images/cinvestav-1.jpg
Insert in Open addressing
Extended hash function to probeInstead of being fixed in the order 0, 1, 2, ...,m − 1 with Θ (n) searchtimeExtend the hash function toh : U × {0, 1, ...,m − 1} → {0, 1, ...,m − 1}This gives the probe sequence 〈h(k, 0), h(k, 1), ..., h(k,m − 1)〉
I A permutation of 〈0, 1, 2, ...,m − 1〉
88 / 127
Images/cinvestav-1.jpg
Hashing methods in Open Addressing
HASH-INSERT(T , k)1 i = 02 repeat3 j = h (k , i)4 if T [j ] == NIL5 T [j ] = k6 return j7 else i = i + 18 until i == m9 error “Hash Table Overflow”
89 / 127
Images/cinvestav-1.jpg
Hashing methods in Open Addressing
HASH-SEARCH(T,k)1 i = 02 repeat3 j = h (k , i)4 if T [j ] == k5 return j6 i = i + 17 until T [j ] == NIL or i == m8 return NIL
90 / 127
Images/cinvestav-1.jpg
Linear probing: Definition and properties
Hash functionGiven an ordinary hash function h′ : 0, 1, ...,m − 1→ U fori = 0, 1, ...,m − 1, we get the extended hash function
h(k, i) =(h′(k) + i
)mod m, (2)
Sequence of probesGiven key k, we first probe T [h′(k)], then T [h′(k) + 1] and so on untilT [m − 1]. Then, we wrap around T [0] to T [h′(k)− 1].
Distinct probesBecause the initial probe determines the entire probe sequence, there arem distinct probe sequences.
91 / 127
Images/cinvestav-1.jpg
Linear probing: Definition and properties
Hash functionGiven an ordinary hash function h′ : 0, 1, ...,m − 1→ U fori = 0, 1, ...,m − 1, we get the extended hash function
h(k, i) =(h′(k) + i
)mod m, (2)
Sequence of probesGiven key k, we first probe T [h′(k)], then T [h′(k) + 1] and so on untilT [m − 1]. Then, we wrap around T [0] to T [h′(k)− 1].
Distinct probesBecause the initial probe determines the entire probe sequence, there arem distinct probe sequences.
91 / 127
Images/cinvestav-1.jpg
Linear probing: Definition and properties
Hash functionGiven an ordinary hash function h′ : 0, 1, ...,m − 1→ U fori = 0, 1, ...,m − 1, we get the extended hash function
h(k, i) =(h′(k) + i
)mod m, (2)
Sequence of probesGiven key k, we first probe T [h′(k)], then T [h′(k) + 1] and so on untilT [m − 1]. Then, we wrap around T [0] to T [h′(k)− 1].
Distinct probesBecause the initial probe determines the entire probe sequence, there arem distinct probe sequences.
91 / 127
Images/cinvestav-1.jpg
Linear probing: Definition and properties
DisadvantagesLinear probing suffers of primary clustering.Long runs of occupied slots build up increasing the average searchtime.
I Clusters arise because an empty slot preceded by i full slots gets fillednext with probability i+1
m .
Long runs of occupied slots tend to get longer, and the averagesearch time increases.
92 / 127
Images/cinvestav-1.jpg
Linear probing: Definition and properties
DisadvantagesLinear probing suffers of primary clustering.Long runs of occupied slots build up increasing the average searchtime.
I Clusters arise because an empty slot preceded by i full slots gets fillednext with probability i+1
m .
Long runs of occupied slots tend to get longer, and the averagesearch time increases.
92 / 127
Images/cinvestav-1.jpg
Linear probing: Definition and properties
DisadvantagesLinear probing suffers of primary clustering.Long runs of occupied slots build up increasing the average searchtime.
I Clusters arise because an empty slot preceded by i full slots gets fillednext with probability i+1
m .
Long runs of occupied slots tend to get longer, and the averagesearch time increases.
92 / 127
Images/cinvestav-1.jpg
Linear probing: Definition and properties
DisadvantagesLinear probing suffers of primary clustering.Long runs of occupied slots build up increasing the average searchtime.
I Clusters arise because an empty slot preceded by i full slots gets fillednext with probability i+1
m .
Long runs of occupied slots tend to get longer, and the averagesearch time increases.
92 / 127
Images/cinvestav-1.jpg
Example: Linear Probing – Get And Put
Constraintsdivisor = m (number of buckets) = 17.Home bucket = key % 17.
ThenPut in pairs whose keys are 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45
We have
93 / 127
Images/cinvestav-1.jpg
Example: Linear Probing – Get And Put
Constraintsdivisor = m (number of buckets) = 17.Home bucket = key % 17.
ThenPut in pairs whose keys are 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45
We have
93 / 127
Images/cinvestav-1.jpg
Example: Linear Probing – Get And Put
Constraintsdivisor = m (number of buckets) = 17.Home bucket = key % 17.
ThenPut in pairs whose keys are 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45
We have
93 / 127
Images/cinvestav-1.jpg
Linear Probing – Remove
Example
remove(0)
Compact ClusterSearch cluster for pair (if any) to fill vacated bucket.
94 / 127
Images/cinvestav-1.jpg
Linear Probing – Remove
Example
remove(0)
Compact ClusterSearch cluster for pair (if any) to fill vacated bucket.
94 / 127
Images/cinvestav-1.jpg
Linear Probing – Remove
Example
remove(0)
Compact ClusterSearch cluster for pair (if any) to fill vacated bucket.
94 / 127
Images/cinvestav-1.jpg
Linear Probing – remove(34)
Example
remove(34)
Compact ClusterSearch cluster for pair (if any) to fill vacated bucket.
95 / 127
Images/cinvestav-1.jpg
Linear Probing – remove(34)
Example
remove(34)
Compact ClusterSearch cluster for pair (if any) to fill vacated bucket.
95 / 127
Images/cinvestav-1.jpg
Linear Probing – remove(34)Example
remove(34)
Compact ClusterSearch cluster for pair (if any) to fill vacated bucket.
95 / 127
Images/cinvestav-1.jpg
Linear Probing – remove(29)
Example
Compact ClusterSearch cluster for pair (if any) to fill vacated bucket.
96 / 127
Images/cinvestav-1.jpg
Linear Probing – remove(29)
Example
Compact ClusterSearch cluster for pair (if any) to fill vacated bucket.
96 / 127
Images/cinvestav-1.jpg
Code for RemovingWe have the following
p u b l i c vo i d remove ( key ){i n t p o s i t i o n sChe c k ed = 1 ;i n t i = F i ndS l o t (Key ) ;i f ( Table [ i ] == n u l l )
r e t u r n ; // key i s not i n the t a b l ej = i ;wh i l e ( p o s i t i o n sCh e c k ed <= Table . l e n g t h ){
j = ( j +1) % Table . l e n g t h ;i f ( Table [ j ] == n u l l ) b reak ;k = Hashing ( Table [ j ] . key ) ;i f ( i < j && ( k <= i | | k > j ) ) | |
( j < i && ( k <= i && k > j ) ) {Table [ i ] = Table [ j ] ;i = j ;
}po s i t i o nChe ck ed++;
}Table [ i ] = n u l l ;
}97 / 127
Images/cinvestav-1.jpg
Explanation
FirstFor all records in a cluster, there must be no vacant slots between theirnatural hash position and their current position (else lookups willterminate before finding the record).
Secondk is the raw hash where the record at j would naturally land in thehash table if there were no collisions.
ThusThis test is asking if the record at j is invalidly positioned with respect tothe required properties of a cluster now that i is vacant.
98 / 127
Images/cinvestav-1.jpg
Explanation
FirstFor all records in a cluster, there must be no vacant slots between theirnatural hash position and their current position (else lookups willterminate before finding the record).
Secondk is the raw hash where the record at j would naturally land in thehash table if there were no collisions.
ThusThis test is asking if the record at j is invalidly positioned with respect tothe required properties of a cluster now that i is vacant.
98 / 127
Images/cinvestav-1.jpg
Explanation
FirstFor all records in a cluster, there must be no vacant slots between theirnatural hash position and their current position (else lookups willterminate before finding the record).
Secondk is the raw hash where the record at j would naturally land in thehash table if there were no collisions.
ThusThis test is asking if the record at j is invalidly positioned with respect tothe required properties of a cluster now that i is vacant.
98 / 127
Images/cinvestav-1.jpg
Case 1
We have the followingCase 1
i j
k k
we have i < jIf i < k ≤ j then moving j to the i position will be incorrect... Why?
99 / 127
Images/cinvestav-1.jpg
Case 2
We have the followingCase 2
ij
k
We have j < iIf k ≤ j < or i < k then moving j to the i position will be incorrect...Why?
100 / 127
Images/cinvestav-1.jpg
Complexity
We have thenWorst-case get/put/remove time is Θ(n), where n is the number of pairsin the table.
Something NotableThis happens when all pairs are in the same cluster.
101 / 127
Images/cinvestav-1.jpg
Complexity
We have thenWorst-case get/put/remove time is Θ(n), where n is the number of pairsin the table.
Something NotableThis happens when all pairs are in the same cluster.
101 / 127
Images/cinvestav-1.jpg
Explaining α
We have thatα= loading density = (number of pairs)/m.
In the example α = 12/17
We have the following terms to explain complexity in Open AddressingSn = expected number of buckets examined in a successful searchwhen n is large.Un= expected number of buckets examined in a unsuccessful searchwhen n is large
102 / 127
Images/cinvestav-1.jpg
Explaining α
We have thatα= loading density = (number of pairs)/m.
In the example α = 12/17
We have the following terms to explain complexity in Open AddressingSn = expected number of buckets examined in a successful searchwhen n is large.Un= expected number of buckets examined in a unsuccessful searchwhen n is large
102 / 127
Images/cinvestav-1.jpg
Explaining α
We have thatα= loading density = (number of pairs)/m.
In the example α = 12/17
We have the following terms to explain complexity in Open AddressingSn = expected number of buckets examined in a successful searchwhen n is large.Un= expected number of buckets examined in a unsuccessful searchwhen n is large
102 / 127
Images/cinvestav-1.jpg
Explaining α
We have thatα= loading density = (number of pairs)/m.
In the example α = 12/17
We have the following terms to explain complexity in Open AddressingSn = expected number of buckets examined in a successful searchwhen n is large.Un= expected number of buckets examined in a unsuccessful searchwhen n is large
102 / 127
Images/cinvestav-1.jpg
Expected Performance in Open Addressing, 0 ≤ α ≤ 1
We have for unsuccessful search
Un = O(1 + 1
1− α
)(3)
We have for successful search
Sn = O(1 + 1
αln 1
1− α
)(4)
103 / 127
Images/cinvestav-1.jpg
Expected Performance in Open Addressing, 0 ≤ α ≤ 1
We have for unsuccessful search
Un = O(1 + 1
1− α
)(3)
We have for successful search
Sn = O(1 + 1
αln 1
1− α
)(4)
103 / 127
Images/cinvestav-1.jpg
Thus, for different α′s
We have using uniform keys!!!α Sn Un
0.50 1.5 2.50.75 2.5 8.50.90 5.5 50.5
Recommendationα ≤ 0.75 is recommended.
104 / 127
Images/cinvestav-1.jpg
Thus, for different α′s
We have using uniform keys!!!α Sn Un
0.50 1.5 2.50.75 2.5 8.50.90 5.5 50.5
Recommendationα ≤ 0.75 is recommended.
104 / 127
Images/cinvestav-1.jpg
Hash Table Design
For exampleIf performance requirements are given, determine maximum permissibleloading density.
We want a successful search to make no more than 5 compares(expected)
4 = 1 + 11−α ‘
α = 3/4
We want an unsuccessful search to make no more than 6 compares(expected).
6 = 1 + 1α log 1
1−α
α ≈ 0.964
105 / 127
Images/cinvestav-1.jpg
Hash Table Design
For exampleIf performance requirements are given, determine maximum permissibleloading density.
We want a successful search to make no more than 5 compares(expected)
4 = 1 + 11−α ‘
α = 3/4
We want an unsuccessful search to make no more than 6 compares(expected).
6 = 1 + 1α log 1
1−α
α ≈ 0.964
105 / 127
Images/cinvestav-1.jpg
Hash Table Design
For exampleIf performance requirements are given, determine maximum permissibleloading density.
We want a successful search to make no more than 5 compares(expected)
4 = 1 + 11−α ‘
α = 3/4
We want an unsuccessful search to make no more than 6 compares(expected).
6 = 1 + 1α log 1
1−α
α ≈ 0.964
105 / 127
Images/cinvestav-1.jpg
Use the minimum α for your trigger of doubling the array!!!
Thus, we have
αf ≤ min {3/4, 0.964}
106 / 127
Images/cinvestav-1.jpg
Hash Table Design
Dynamic resizing of table.Whenever loading density exceeds threshold, rehash into a table ofapproximately twice the current size.
107 / 127
Images/cinvestav-1.jpg
But , even with a nice division method!!!
Example using keys uniformly distributedIt was generated using the division method
Then
108 / 127
Images/cinvestav-1.jpg
But , even with a nice division method!!!Example using keys uniformly distributedIt was generated using the division method
Then
108 / 127
Images/cinvestav-1.jpg
Example
Example using Gaussian keysIt was generated using the division method
Then
109 / 127
Images/cinvestav-1.jpg
ExampleExample using Gaussian keysIt was generated using the division method
Then
109 / 127
Images/cinvestav-1.jpg
Outline1 Introduction2 Operation in a ADT dictionary
AddRemoveGetValueContainsIterators
3 Example4 Implementation5 Hash Tables
Number of KeysHash FunctionsHashing By Division
6 Overflow HandlingOpen AddressingChaining
110 / 127
Images/cinvestav-1.jpg
Linear List Of Synonyms
ThusEach bucket keeps a linear list of all pairs for which it is the homebucket.The linear list may or may not be sorted by key.The linear list may be an array linear list or a chain.
111 / 127
Images/cinvestav-1.jpg
Collision Handling: Chaining
A Possible SolutionInsert the elements that hash to the same slot into a linked list.
U(Universe of Keys)
112 / 127
Images/cinvestav-1.jpg
Example Sorted Chains
Add to a hash table with m = 11Put in pairs whose keys are 6, 17, 12, 23, 28, 5, 16, 3, 8
So, we haveHome bucket = key % 11.
113 / 127
Images/cinvestav-1.jpg
Example Sorted Chains
Add to a hash table with m = 11Put in pairs whose keys are 6, 17, 12, 23, 28, 5, 16, 3, 8
So, we haveHome bucket = key % 11.
113 / 127
Images/cinvestav-1.jpg
Example
The Table
0
1
2
3
4
5
6
7
8
9
10
6, 17, 12, 23, 28, 5, 16, 3, 8
114 / 127
Images/cinvestav-1.jpg
Example
The Table
0
1
2
3
4
5
6
7
8
9
10
6, 17, 12, 23, 28, 5, 16, 3
8
115 / 127
Images/cinvestav-1.jpg
Example
The Table
0
1
2
3
4
5
6
7
8
9
10
6, 17, 12, 23, 28, 5, 16
8
3
116 / 127
Images/cinvestav-1.jpg
Example
The Table
0
1
2
3
4
5
6
7
8
9
10
6, 17, 12, 23, 28, 5
8
3
16
117 / 127
Images/cinvestav-1.jpg
Example
The Table
0
1
2
3
4
5
6
7
8
9
10
6, 17, 12, 23, 28, 5
8
3
16
118 / 127
Images/cinvestav-1.jpg
Example
The Table
0
1
2
3
4
5
6
7
8
9
10
6, 17, 12, 23, 28
8
3
5 16
119 / 127
Images/cinvestav-1.jpg
Expected Complexity of Hash Table under Chaining
We have for unsuccessful search
Un = O (1 + α) (5)
We have for successful search
Sn = O (1 + α) (6)
126 / 127
Images/cinvestav-1.jpg
Expected Complexity of Hash Table under Chaining
We have for unsuccessful search
Un = O (1 + α) (5)
We have for successful search
Sn = O (1 + α) (6)
126 / 127
Images/cinvestav-1.jpg
Now, Back to the real world
java.util.HashtableIt uses unsorted chains.It uses a default initial m = divisor = 101It uses a default α ≤ 0.75When loading density exceeds a max permissible threshold, It rehashwith new m = 2m+1.
127 / 127
Images/cinvestav-1.jpg
Now, Back to the real world
java.util.HashtableIt uses unsorted chains.It uses a default initial m = divisor = 101It uses a default α ≤ 0.75When loading density exceeds a max permissible threshold, It rehashwith new m = 2m+1.
127 / 127
Images/cinvestav-1.jpg
Now, Back to the real world
java.util.HashtableIt uses unsorted chains.It uses a default initial m = divisor = 101It uses a default α ≤ 0.75When loading density exceeds a max permissible threshold, It rehashwith new m = 2m+1.
127 / 127