Cormen Algo-lec7

download Cormen Algo-lec7

of 26

Transcript of Cormen Algo-lec7

  • 8/14/2019 Cormen Algo-lec7

    1/26

    October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.1

    Introduction to Algorithms

    6.046J/18.401J

    LECTURE 7Hashing I

    Direct-access tables

    Resolving collisions bychaining

    Choosing hash functions

    Open addressing

    Prof. Charles E. Leiserson

  • 8/14/2019 Cormen Algo-lec7

    2/26

    October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.2

    Symbol-table problem

    Symbol table Sholding n records:

    recordxkey[x]

    key[x]

    Otherfieldscontaining

    satellite data

    Operations on S:

    INSERT(S,x)

    DELETE(S,x) SEARCH(S, k)

    How should the data structure Sbe organized?

  • 8/14/2019 Cormen Algo-lec7

    3/26

    October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.3

    Direct-access table

    IDEA: Suppose that the keys are drawn fromthe set U {0, 1, , m1}, and keys aredistinct. Set up an array T[0 . . m1]:

    T[k] =x ifx Kand key[x] = k,

    NIL otherwise.Then, operations take (1) time.

    Problem: The range of keys can be large: 64-bit numbers (which represent

    18,446,744,073,709,551,616 different keys),

    character strings (even larger!).

  • 8/14/2019 Cormen Algo-lec7

    4/26October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.4

    Hash functions

    Solution: Use a hash function h to map theuniverse Uof all keys into

    {0, 1, , m1}:

    U

    S

    k1

    k2 k3

    k4k5

    0

    m1

    h(k1)

    h(k4)

    h(k2)

    h(k3)

    T

    = h(k5)

    As each key is inserted, h maps it to a slot ofT.When a record to be inserted maps to an alreadyoccupied slot in T, a collision occurs.

  • 8/14/2019 Cormen Algo-lec7

    5/26October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.5

    Resolving collisions by

    chaining Link records in the same slot into a list.

    h(49) = h(86) = h(52) = i

    T

    4949 8686 5252

    Worst case:

    Every key

    hashes to thesame slot.

    Access time =

    (n) if|S| = n

    i

  • 8/14/2019 Cormen Algo-lec7

    6/26

  • 8/14/2019 Cormen Algo-lec7

    7/26October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.7

    Search cost

    The expected time for an unsuccessfulsearch for a record with a given key is

    = (1 + ).

  • 8/14/2019 Cormen Algo-lec7

    8/26

  • 8/14/2019 Cormen Algo-lec7

    9/26

    October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.9

    Search cost

    The expected time for an unsuccessfulsearch for a record with a given key is

    = (1 + ).

    apply hash functionand access slot

    searchthe list

    Expected search time = (1) if = O(1),

    or equivalently, ifn = O(m).

  • 8/14/2019 Cormen Algo-lec7

    10/26

  • 8/14/2019 Cormen Algo-lec7

    11/26

    October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.11

    Choosing a hash function

    The assumption of simple uniform hashing

    is hard to guarantee, but several commontechniques tend to work well in practice aslong as their deficiencies can be avoided.

    Desirata:

    A good hash function should distribute the

    keys uniformly into the slots of the table. Regularity in the key distribution should

    not affect this uniformity.

  • 8/14/2019 Cormen Algo-lec7

    12/26

    October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.12

    Division method

    Assume all keys are integers, and define

    h(k) = kmod m.

    Deficiency: Dont pick an m that has a smalldivisord. A preponderance of keys that arecongruent modulo

    dcan adversely affect

    uniformity.

    h(k)

    Extreme deficiency: Ifm = 2r, then the hash

    doesnt even depend on all the bits ofk:

    Ifk= 10110001110110102 and r= 6, then

    h(k) = 0110102 .

  • 8/14/2019 Cormen Algo-lec7

    13/26

    October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.13

    Division method (continued)

    h(k) = kmod m.

    Pickm to be a prime not too close to a powerof2 or10 and not otherwise used prominentlyin the computing environment.

    Annoyance: Sometimes, making the table size a prime is

    inconvenient.But, this method is popular, although the nextmethod well see is usually superior.

  • 8/14/2019 Cormen Algo-lec7

    14/26

    October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.14

    Multiplication method

    Assume that all keys are integers, m = 2r, and ourcomputer has w-bit words. Define

    h(k) = (Akmod 2w) rsh (wr),

    where rsh is the bitwise right-shift operator and

    A is an odd integer in the range 2w1

  • 8/14/2019 Cormen Algo-lec7

    15/26

    October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.15

    Multiplication method

    example

    4

    0

    35

    2617

    Modular wheel

    h(k) = (Akmod 2w) rsh (wr)

    Suppose that m = 8 = 23

    and that our computerhas w = 7-bit words:

    1 0 1 1 0 0 1 1 1 0 1 0 1 11 0 0 1 0 1 0 0 1 1 0 0 1 1

    = A= k

    h(k) A ..2A

    ..

    3A

    ..

    R l i lli i b

  • 8/14/2019 Cormen Algo-lec7

    16/26

    October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.16

    Resolving collisions by open

    addressingNo storage is used outside of the hash table itself.

    Insertion systematically probes the table until anempty slot is found.

    The hash function depends on both the key and

    probe number:h : U {0, 1, , m1} {0, 1, , m1}.

    The probe sequence h(k,0), h(k,1), , h(k,m1)should be a permutation of{0, 1, , m1}. The table may fill up, and deletion is difficult (but

    not impossible).

  • 8/14/2019 Cormen Algo-lec7

    17/26

  • 8/14/2019 Cormen Algo-lec7

    18/26

    October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.18

    Example of open addressing

    Insert key k= 496:

    0. Probe h(496,0)586133

    204

    481

    T0

    collision1. Probe h(496,1) 586

    m1

  • 8/14/2019 Cormen Algo-lec7

    19/26

    October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.19

    Example of open addressing

    Insert key k= 496:

    0. Probe h(496,0)586133

    204

    481

    T0

    1. Probe h(496,1)

    496

    2. Probe h(496,2)

    insertion

    m1

  • 8/14/2019 Cormen Algo-lec7

    20/26

    October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.20

    Example of open addressing

    Search for key k= 496:

    0. Probe h(496,0)586133

    204

    481

    T0

    m1

    1. Probe h(496,1)

    496

    2. Probe h(496,2)

    Search uses the same probesequence, terminating suc-cessfully if it finds the key

    and unsuccessfully if it encounters an empty slot.

  • 8/14/2019 Cormen Algo-lec7

    21/26

    October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.21

    Probing strategies

    Linear probing:

    Given an ordinary hash function h(k), linearprobing uses the hash function

    h(k,i) = (h(k) + i) mod m.This method, though simple, suffers fromprimaryclustering, where long runs of occupied slots build

    up, increasing the average search time. Moreover,the long runs of occupied slots tend to get longer.

  • 8/14/2019 Cormen Algo-lec7

    22/26

    October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.22

    Probing strategies

    Double hashing

    Given two ordinary hash functions h1(k) and h2(k),double hashing uses the hash function

    h(k,i) = (h1(k) + i

    h2(k)) mod m.This method generally produces excellent results,

    but h2(k) must be relatively prime to m. One way

    is to make m a power of2 and design h2(k) toproduce only odd numbers.

  • 8/14/2019 Cormen Algo-lec7

    23/26

    October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.23

    Analysis of open addressing

    We make the assumption ofuniform hashing:

    Each key is equally likely to have any one ofthe m!permutations as its probe sequence.

    Theorem. Given an open-addressed hashtable with load factor = n/m < 1, theexpected number of probes in an unsuccessfulsearch is at most 1/(1).

  • 8/14/2019 Cormen Algo-lec7

    24/26

    October 3, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.24

    Proof of the theorem

    Proof. At least one probe is always necessary.

    With probability n/m, the first probe hits anoccupied slot, and a second probe is necessary.

    With probability (n1)/(m1), the second probe

    hits an occupied slot, and a third probe isnecessary.

    With probability (n2)/(m2), the third probehits an occupied slot, etc.

    =