Hashing by Rafael Jaffarove, CS157b

Page 1: Hashing

by Rafael Jaffarove, CS157b

Page 2: Motivation

- Fast data access
  - Search
  - Insertion
  - Deletion
- Ideal seek time is O(1)

Page 3: Types of Organization

- File organization
  - The search key points to the disk block with the desired record
- Index organization
  - The search key is stored together with a pointer in a hash table. The pointer points to the particular bucket where the record is stored

Page 4: Types of Hashing

- Static hashing
  - Fixed file size
- Dynamic hashing
  - Extendable hashing

Page 5: Problems with Static Hashing

- Databases tend to grow over time
- The number of buckets must be predefined
  - If the number is too large, space is wasted
  - If the number is too small, there are too many collisions
- Bucket overflow

Page 6: Handling Bucket Overflow

- Providing overflow buckets
  - If an initial bucket is full, a new bucket is given. If the second bucket is full, then a 3rd bucket is given, and so on.
  - Additional buckets are linked together in a linked list
- Problems:
  - Searches and insertions might take linear time
  - Deletions are difficult to perform
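
The overflow-bucket scheme described on this slide can be sketched as follows. This is a minimal illustration, not the deck's own code: the names `Bucket` and `StaticHashFile`, the integer keys, and the `key % num_buckets` hash function are all assumptions made for the example.

```python
class Bucket:
    """A fixed-capacity bucket with a link to an optional overflow bucket."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []
        self.overflow = None  # next bucket in the linked list, if any


class StaticHashFile:
    """Static hashing: the number of primary buckets never changes."""

    def __init__(self, num_buckets, capacity=2):
        self.buckets = [Bucket(capacity) for _ in range(num_buckets)]

    def insert(self, key):
        bucket = self.buckets[key % len(self.buckets)]
        # Walk the chain until a bucket with free space is found,
        # allocating a new overflow bucket at the end if necessary.
        while len(bucket.records) >= bucket.capacity:
            if bucket.overflow is None:
                bucket.overflow = Bucket(bucket.capacity)
            bucket = bucket.overflow
        bucket.records.append(key)

    def search(self, key):
        """Return how many overflow buckets were followed, or None if absent."""
        bucket = self.buckets[key % len(self.buckets)]
        hops = 0
        while bucket is not None:
            if key in bucket.records:
                return hops
            bucket = bucket.overflow
            hops += 1
        return None


hash_file = StaticHashFile(num_buckets=4, capacity=2)
for key in (0, 4, 8, 12, 16):  # every key hashes to primary bucket 0
    hash_file.insert(key)
```

Because all five keys land in the same primary bucket, finding the last one requires following two overflow links. This is the linear search cost listed under the problems above.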

Page 7: Dynamic Hashing

- Extendable hashing
  - Buckets are created as needed
- Example of extendable hashing
  - Insert the following countries into the database: England, France, China, Germany, Egypt, Australia
  - We will use a hash function that sums the ASCII codes of all characters in a name
  - Assumption: a bucket can't hold more than 2 records
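
The example setup above can be tried out with a short sketch. The hash function (sum of ASCII codes) and the bucket capacity of 2 come from the slide. Everything else is an assumption made for illustration: the names `ascii_sum`, `Bucket`, and `ExtendableHash`, and in particular the choice to index the directory by the low-order bits of the hash (the transcript does not say which bits the original example used, and many textbooks use the high-order bits instead).

```python
def ascii_sum(name):
    """Hash function from the slide: sum of the ASCII codes of all characters."""
    return sum(ord(ch) for ch in name)


class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth  # number of hash bits this bucket depends on
        self.keys = []


class ExtendableHash:
    MAX_DEPTH = 32  # guard: keys with identical hashes could otherwise split forever

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.global_depth = 0
        self.directory = [Bucket(0)]  # 2**global_depth entries

    def _slot(self, key):
        # Assumption: use the low-order global_depth bits of the hash.
        return ascii_sum(key) & ((1 << self.global_depth) - 1)

    def lookup(self, key):
        return key in self.directory[self._slot(key)].keys

    def insert(self, key):
        while True:
            bucket = self.directory[self._slot(key)]
            if key in bucket.keys:
                return
            if len(bucket.keys) < self.capacity:
                bucket.keys.append(key)
                return
            self._split(bucket)  # bucket is full: split it and retry

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            if self.global_depth >= self.MAX_DEPTH:
                raise OverflowError("too many keys share the same hash bits")
            # Double the directory; entry i and entry i + old_size share a bucket.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bit = 1 << bucket.local_depth  # the newly examined hash bit
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        old_keys = bucket.keys
        bucket.keys = [k for k in old_keys if not ascii_sum(k) & bit]
        sibling.keys = [k for k in old_keys if ascii_sum(k) & bit]
        # Redirect the directory entries whose new bit is set to the sibling.
        for i, entry in enumerate(self.directory):
            if entry is bucket and i & bit:
                self.directory[i] = sibling


countries = ["England", "France", "China", "Germany", "Egypt", "Australia"]
table = ExtendableHash(capacity=2)
for country in countries:
    table.insert(country)
```

Under these assumptions the hash values are England 697, France 591, China 483, Germany 723, Egypt 521, and Australia 934. The first five are odd, so they share their lowest bit and force repeated splits, and the directory ends with a global depth of 3.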

Page 8: Extendable Hashing

Example (contd.)

Page 13: Extendable Hashing

- Problem with dynamic hashing
  - Additional level of indirection

Page 14: Hash function

- Importance of choosing the right hash function
  - Uniform function = even distribution of data
  - Table size is a prime number
- In general, no hash function is perfect for a set of keys that is not known in advance, so collisions are possible
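
One commonly given reason for a prime table size, assuming the hash is a simple `key % table_size`, is that keys following a regular pattern still spread over all slots. The sketch below is only an illustration of that effect; the helper name `slots_used` and the sample keys are invented for the example.

```python
def slots_used(keys, table_size):
    """Count how many distinct slots the keys occupy under key % table_size."""
    return len({key % table_size for key in keys})


# Hypothetical patterned keys: 100 multiples of 4.
keys = range(0, 400, 4)

used_with_8 = slots_used(keys, 8)  # 8 shares a factor with the key stride
used_with_7 = slots_used(keys, 7)  # 7 is prime
```

With a table size of 8 the keys fall into only 2 slots, while with a table size of 7 they reach all 7 slots.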

Page 15: Handling Collisions

- Linear probing
- Quadratic probing
- Double hashing
- Chaining

Page 16: Linear Probing

- If a slot is used, take the next available one
  - If the next slot is used, continue until an empty slot is found
  - If the end of the table is reached, wrap around to the beginning
- Problems:
  - Clustering of data
  - How far to go if there are no empty slots?
  - Deletion: deleting a key in the middle of a cluster can cut off the keys stored after it
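
A minimal sketch of linear probing follows, including one common way to address the last two problems: the probe loop stops after one full pass over the table, and deleted slots are marked with a tombstone so that later keys in a cluster stay reachable. The class name `LinearProbingTable`, the integer keys, and the `key % size` hash are assumptions made for the example; the slides do not prescribe a deletion strategy.

```python
EMPTY = object()    # slot has never been used
DELETED = object()  # tombstone: slot was used, then deleted


class LinearProbingTable:
    def __init__(self, size=7):
        self.slots = [EMPTY] * size

    def _probe(self, key):
        # Visit every slot once, starting at the home slot and wrapping around.
        start = key % len(self.slots)
        for i in range(len(self.slots)):
            yield (start + i) % len(self.slots)

    def insert(self, key):
        first_free = None
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                if first_free is None:
                    first_free = idx
                break  # key cannot be stored beyond an empty slot
            if slot is DELETED:
                if first_free is None:
                    first_free = idx  # reusable, but keep looking for the key
            elif slot == key:
                return idx  # already present
        if first_free is None:
            raise OverflowError("table is full")
        self.slots[first_free] = key
        return first_free

    def search(self, key):
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                return None
            if slot is not DELETED and slot == key:
                return idx
        return None

    def delete(self, key):
        idx = self.search(key)
        if idx is None:
            return False
        self.slots[idx] = DELETED  # keep the cluster connected for later keys
        return True
```

For example, in a table of size 7 the keys 10, 17, and 24 all hash to slot 3 and end up in slots 3, 4, and 5. After deleting 17, a search for 24 still succeeds because the tombstone in slot 4 does not stop the probe.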

Page 17: Quadratic probing

- To avoid clustering, take not the next slot but the slots 1^2, 2^2, 3^2, 4^2, etc. positions away from the original slot
- Problem:
  - Secondary clustering, since the same seek pattern is used in case of a collision
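
The probe pattern can be sketched as below, assuming integer keys, a `key % table_size` hash, and `None` as the empty-slot marker; the function names are hypothetical. Keys with the same home slot follow exactly the same sequence, which is the secondary clustering mentioned above.

```python
def quadratic_probe_sequence(home, table_size, max_probes):
    """Slots visited from `home`: home + 0^2, home + 1^2, home + 2^2, ... (mod size)."""
    return [(home + i * i) % table_size for i in range(max_probes)]


def quadratic_insert(table, key):
    """Insert `key` into `table` (a list using None for empty slots); return its slot."""
    size = len(table)
    for i in range(size):
        idx = (key % size + i * i) % size
        if table[idx] is None or table[idx] == key:
            table[idx] = key
            return idx
    # Quadratic probing does not visit every slot, so this can
    # happen even when the table still has empty slots.
    raise OverflowError("no free slot found along the probe sequence")


table = [None] * 11
placed = [quadratic_insert(table, key) for key in (3, 14, 25)]  # all have home slot 3
```

Here the three keys land in slots 3, 4, and 7, that is, at offsets 0, 1, and 4 from the shared home slot.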

Page 18: Double Hashing

- In case of a collision, apply a second hash function
- Overall better performance than linear and quadratic probing
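
A minimal sketch of double hashing, in which the second hash function sets the step size between probes. The slide does not name a second function, so the pair used here, `h1(key) = key % size` and `h2(key) = 1 + key % (size - 1)`, is an assumption (a common textbook choice for a prime table size), as are the function names.

```python
def double_hash_sequence(key, table_size, max_probes):
    """Slots visited for `key`: h1 + i * h2 (mod size), for i = 0, 1, 2, ..."""
    h1 = key % table_size
    h2 = 1 + key % (table_size - 1)  # never 0, so the probe always advances
    return [(h1 + i * h2) % table_size for i in range(max_probes)]


def double_hash_insert(table, key):
    """Insert `key` into `table` (a list using None for empty slots); return its slot."""
    for idx in double_hash_sequence(key, len(table), len(table)):
        if table[idx] is None or table[idx] == key:
            table[idx] = key
            return idx
    raise OverflowError("no free slot found along the probe sequence")


table = [None] * 11
placed = [double_hash_insert(table, key) for key in (3, 14, 25)]  # all have h1 = 3
```

All three keys collide on slot 3, but their step sizes differ (4, 5, and 6), so they end up in slots 3, 8, and 9 instead of following one shared probe pattern.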

Page 19: Chaining

- Entries are linked lists
- In case of a collision, the entries are added to those linked lists
- Problem:
  - In case of frequent collisions on the same hash value, the search for a key in the linked list becomes linear. Alternative data structures (e.g. B+-trees) are used to solve this problem.
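
Chaining can be sketched as follows. Python lists stand in for the linked lists, and the class name `ChainedHashTable`, the integer keys, and the `key % size` hash are assumptions made for the example.

```python
class ChainedHashTable:
    def __init__(self, size=7):
        # One chain per slot; a Python list stands in for a linked list.
        self.chains = [[] for _ in range(size)]

    def insert(self, key, value):
        chain = self.chains[key % len(self.chains)]
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                chain[i] = (key, value)  # overwrite an existing entry
                return
        chain.append((key, value))

    def search(self, key):
        # Linear scan of the chain: cost grows with the number of collisions.
        for existing_key, value in self.chains[key % len(self.chains)]:
            if existing_key == key:
                return value
        return None

    def longest_chain(self):
        return max(len(chain) for chain in self.chains)


table = ChainedHashTable(size=7)
for key, value in ((1, "a"), (8, "b"), (15, "c")):  # all hash to slot 1
    table.insert(key, value)
```

With all three keys in one chain, looking up the last one means scanning the whole chain. This is the linear behaviour that motivates replacing the list with another structure when collisions are frequent.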