Hashing
by Rafael Jaffarove
CS157b

Motivation
- Fast data access
  - Search
  - Insertion
  - Deletion
- Ideal seek time is O(1)

Types of Organization
- File organization
  - search-key points to the disk block with the desired record
- Index organization
  - search-key is stored together with a pointer in a hash table. The pointer points to the particular bucket where the record is stored

Types of Hashing
- Static hashing
  - Fixed file size
- Dynamic hashing
  - Extendable hashing

Problems with Static Hashing
- Databases tend to grow over time
- The number of buckets must be predefined
  - If the number is too large, space is wasted
  - If the number is too small, we have too many collisions
    - Bucket overflow

Handling Bucket Overflow
- Providing overflow buckets
  - If an initial bucket is full, a new bucket is given.
  - If the second bucket is full, then a 3rd bucket is given, and so on.
  - Additional buckets are linked together in a linked list
- Problems:
  - searches and insertions might take linear time
  - deletions are difficult to perform
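A minimal sketch of overflow chaining in a static hash file, to show why a search can degrade to linear time. The names `Bucket` and `StaticHashFile`, the capacity of 2 records per bucket, and the sum-of-character-codes hash are assumptions made for illustration, not part of the slides. Duplicate keys are not checked.

```python
BUCKET_CAPACITY = 2  # assumed capacity, for illustration only


class Bucket:
    def __init__(self):
        self.records = []     # at most BUCKET_CAPACITY (key, value) pairs
        self.overflow = None  # next bucket in the overflow chain, if any


class StaticHashFile:
    def __init__(self, num_buckets):
        # The number of primary buckets is fixed when the file is created.
        self.buckets = [Bucket() for _ in range(num_buckets)]

    def _primary(self, key):
        return self.buckets[sum(ord(c) for c in key) % len(self.buckets)]

    def insert(self, key, value):
        bucket = self._primary(key)
        # Walk the chain until a bucket with free space is found,
        # appending a new overflow bucket when the chain ends.
        while len(bucket.records) >= BUCKET_CAPACITY:
            if bucket.overflow is None:
                bucket.overflow = Bucket()
            bucket = bucket.overflow
        bucket.records.append((key, value))

    def search(self, key):
        """Return (value, overflow_hops); value is None if the key is absent."""
        bucket = self._primary(key)
        hops = 0
        while bucket is not None:
            for k, v in bucket.records:
                if k == key:
                    return v, hops
            bucket = bucket.overflow
            hops += 1
        return None, hops
```

With a single primary bucket, every record lands in the same chain, so the number of hops grows with the number of records; that is the linear-time case listed under Problems.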

Dynamic Hashing
- Extendable hashing
  - buckets created as needed

Example of extendable hashing
- Insert the following countries into the database: England, France, China, Germany, Egypt, Australia
- We will use a hash function that sums the ASCII codes of all characters in a name
- Assumption: a bucket can't hold more than 2 records

Extendable Hashing
- Example (contd.)

Extendable Hashing
- Problem with dynamic hashing
  - additional level of indirection
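The country example can be sketched in code as follows. The hash (sum of ASCII codes) and the bucket capacity of 2 come from the slides. Everything else is an assumption made for illustration: the directory is indexed by the low-order bits of the hash, it starts with a single bucket, and the names `ExtendableHash`, `Bucket`, and `ascii_sum` are hypothetical. The slides' own figures may have used a different bit convention.

```python
def ascii_sum(name):
    # Hash function from the example: sum of the ASCII codes of all characters.
    return sum(ord(ch) for ch in name)


class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth  # number of hash bits this bucket distinguishes
        self.keys = []


class ExtendableHash:
    def __init__(self, capacity=2):
        self.capacity = capacity
        self.global_depth = 0
        self.directory = [Bucket(0)]    # the extra level of indirection

    def _index(self, key):
        # Assumption: use the low-order global_depth bits of the hash.
        return ascii_sum(key) & ((1 << self.global_depth) - 1)

    def contains(self, key):
        return key in self.directory[self._index(key)].keys

    def insert(self, key):
        # Note: this loops forever if more than `capacity` keys share one hash.
        while True:
            bucket = self.directory[self._index(key)]
            if len(bucket.keys) < self.capacity:
                bucket.keys.append(key)
                return
            self._split(bucket)

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            # Double the directory; both halves point at the same buckets.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        new_bit = 1 << (bucket.local_depth - 1)
        # Entries whose newly significant bit is 1 move to the sibling.
        for i, b in enumerate(self.directory):
            if b is bucket and i & new_bit:
                self.directory[i] = sibling
        # Redistribute the old bucket's keys between the two buckets.
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            self.directory[self._index(k)].keys.append(k)


table = ExtendableHash(capacity=2)
for country in ["England", "France", "China", "Germany", "Egypt", "Australia"]:
    table.insert(country)
```

Every lookup goes through `directory` before reaching a bucket, which is the additional level of indirection mentioned above.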

Hash function
- Importance of choosing the right hash function
  - Uniform function = even distribution of data
  - Table size is a prime number
- No hash function avoids collisions for arbitrary keys, so collisions are possible
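A small illustration of both points, under assumed inputs. The helper names `ascii_sum_hash` and `slots_used`, the integer keys, and the table sizes 10 and 11 are chosen for this sketch only. Keys that share a factor with the table size pile into few slots, while a prime size spreads them; and a sum-of-codes hash collides on any two anagrams regardless of table size.

```python
def ascii_sum_hash(name, table_size):
    # Sum of character codes, reduced to a slot number.
    return sum(ord(ch) for ch in name) % table_size


def slots_used(keys, table_size):
    # Number of distinct slots that integer keys fall into under key % table_size.
    return len({k % table_size for k in keys})


keys = list(range(0, 100, 10))        # 0, 10, 20, ..., 90

composite_spread = slots_used(keys, 10)  # every key is a multiple of the table size
prime_spread = slots_used(keys, 11)      # 11 shares no factor with the keys' stride

# Anagrams always collide under a sum-of-codes hash.
collision = ascii_sum_hash("listen", 11) == ascii_sum_hash("silent", 11)
```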

Handling Collisions
- Linear probing
- Quadratic probing
- Double hashing
- Chaining

Linear Probing
- If a slot is used, take the next available one
  - If the next is used, continue until an empty slot is found
  - If the end of the table is reached, wrap around from the beginning.
- Problems:
  - Clustering of data
  - How far to go if there are no empty slots?
  - Deletion: deleting a key in the middle of a cluster
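A minimal sketch of linear probing that addresses two of the listed problems in one common way; the slides do not say which remedy they intend, so treat these as assumptions. The probe stops after one full pass over the table, and a deleted key is replaced by a tombstone marker so that later keys in the same cluster stay reachable. Keys are assumed to be non-negative integers hashed with `key % size`, and the class name is hypothetical.

```python
EMPTY = object()    # slot never used
DELETED = object()  # tombstone: slot was used, key has been removed


class LinearProbingTable:
    def __init__(self, size=11):
        self.slots = [EMPTY] * size

    def _probe(self, key):
        # Visit each slot once, starting at the home slot and wrapping around.
        start = key % len(self.slots)
        for i in range(len(self.slots)):
            yield (start + i) % len(self.slots)

    def insert(self, key):
        first_free = None
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                if first_free is None:
                    first_free = idx
                break                      # key cannot be further along
            if slot is DELETED:
                if first_free is None:
                    first_free = idx       # remember it, keep checking for key
            elif slot == key:
                return                     # already present
        if first_free is None:
            raise RuntimeError("table is full")
        self.slots[first_free] = key

    def contains(self, key):
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                return False               # an empty slot ends the cluster
            if slot is not DELETED and slot == key:
                return True
        return False

    def delete(self, key):
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                return False
            if slot is not DELETED and slot == key:
                self.slots[idx] = DELETED  # keep the cluster connected
                return True
        return False
```

Without the tombstone, deleting a key in the middle of a cluster would leave an empty slot that makes `contains` stop early and miss keys stored after it.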

Quadratic probing
- To avoid clustering, take not the next slot but the slots at offsets 1^2, 2^2, 3^2, 4^2, etc.
- Problem:
  - Secondary clustering, since the same seek pattern is used in case of a collision
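The probe pattern can be sketched as below, assuming integer keys, a home slot of `key % table_size`, and a table size of 11; the function names are hypothetical. Keys that share a home slot walk exactly the same sequence of slots, which is the secondary clustering noted above.

```python
def quadratic_probe_sequence(home, table_size, max_probes):
    # Offsets 0, 1^2, 2^2, 3^2, ... from the home slot, wrapping around.
    return [(home + i * i) % table_size for i in range(max_probes)]


def insert(table, key):
    """Place key in the first free slot along its probe sequence; return the slot."""
    size = len(table)
    for idx in quadratic_probe_sequence(key % size, size, size):
        if table[idx] is None:
            table[idx] = key
            return idx
    raise RuntimeError("no free slot found along the probe sequence")


table = [None] * 11
placed = [insert(table, k) for k in (3, 14, 25)]  # all three have home slot 3
```

Unlike linear probing, the sequence does not necessarily visit every slot, so an insert can fail even when the table still has free slots.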

Double Hashing
- In case of collision, apply a second hash function.
- Overall better performance than linear and quadratic probing
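A minimal sketch of one common form of double hashing, in which the second hash function supplies the step size between probes. The two functions used here, `key % m` and `1 + key % (m - 1)`, are a textbook-style choice assumed for illustration; the slides do not specify them. The table size is assumed to be prime so that every step size visits all slots.

```python
def probe_sequence(key, table_size):
    home = key % table_size              # first hash: home slot
    step = 1 + key % (table_size - 1)    # second hash: step size, never zero
    return [(home + i * step) % table_size for i in range(table_size)]


def insert(table, key):
    """Place key in the first free slot along its probe sequence; return the slot."""
    for idx in probe_sequence(key, len(table)):
        if table[idx] is None:
            table[idx] = key
            return idx
    raise RuntimeError("table is full")


table = [None] * 11
placed = [insert(table, k) for k in (3, 14, 25)]  # same home slot, different steps
```

Keys 3, 14, and 25 all start at slot 3 but then diverge because their step sizes differ, so they do not share one probe pattern the way they would under quadratic probing.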

Chaining
- Entries are linked lists
- In case of a collision, the entries are added to those linked lists.
- Problem:
  - If many keys collide in the same slot, searching for a key in that slot's linked list becomes linear. Alternative data structures are used to solve this problem (e.g. B+-trees).
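A minimal sketch of chaining. Python lists stand in for the linked lists, the hash is the sum of character codes modulo the number of slots, and the class and method names are hypothetical; all of these are assumptions made for illustration.

```python
class ChainedHashTable:
    def __init__(self, num_slots=11):
        # One chain per slot; each chain holds (key, value) pairs.
        self.slots = [[] for _ in range(num_slots)]

    def _chain(self, key):
        return self.slots[sum(ord(c) for c in key) % len(self.slots)]

    def put(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)   # overwrite an existing key
                return
        chain.append((key, value))        # a collision just extends the chain

    def get(self, key, default=None):
        # Linear scan of one chain: slow only if this chain is long.
        for k, v in self._chain(key):
            if k == key:
                return v
        return default

    def remove(self, key):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return True
        return False


table = ChainedHashTable()
table.put("listen", 1)
table.put("silent", 2)   # anagram of "listen": same character sum, same chain
```

Deletion is simple here compared with probing, since removing an entry from one chain does not affect how any other key is found.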