Cormen Algo-lec8

download Cormen Algo-lec8

of 19

Transcript of Cormen Algo-lec8

  • 8/14/2019 Cormen Algo-lec8

    1/19

    Introduction to Algorithms

    6.046J/18.401J

    LECTURE

    8Hashing II Universal hashing

    Universality theorem Constructing a set of

    universal hash functions

    Perfect hashing

    Prof. Charles E. LeisersonOctober 5, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.1

  • 8/14/2019 Cormen Algo-lec8

    2/19

    A weakness of hashing

    Problem: For any hash function h, a set of keys exists that can cause the average access time of a hash table to skyrocket. An adversary can pick all keys from

    {kU: h(k) = i} for some slot i.IDEA: Choose the hash function at random,independently of the keys.

    Even if an adversary can see your code,he or she cannot find a bad set of keys,since he or she doesnt know exactly

    which hash function will be chosen.October 5, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.2

  • 8/14/2019 Cormen Algo-lec8

    3/19

    Universal hashing

    Definition. Let Ube a universe of keys, andletH be a finite collection of hash functions,

    each mapping Uto {0, 1, , m1}. We sayH is universalif for allx,y U, wherex y,we have |{h H : h(x) = h(y)}| = |H| /m.That is, the chance of a collision {h : h(x) = Hh(y)}

    betweenx andy is

    1/m if we choose h |H|randomly fromH.

    m

    October 5, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.3

  • 8/14/2019 Cormen Algo-lec8

    4/19

    Universality is good

    Theorem. Let hbe a hash function chosen

    (uniformly) at random from a universal setH

    of hash functions. Suppose h is used to hash

    n arbitrary keys into the m slots of a table T.Then, for a given keyx, we have

    E[#collisions withx] < n/m.

    October 5, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.4

  • 8/14/2019 Cormen Algo-lec8

    5/19

    Proof of theorem

    Proof. Let Cxbe the random variable denoting

    the total number of collisions of keys in Twithx, and let1 ifh(x) = h(y),

    c =xy 0 otherwise.

    Note: E[cxy] = 1/m and Cx = cxy . }{Ty x

    October 5, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.5

  • 8/14/2019 Cormen Algo-lec8

    6/19

    Proof (continued)

    [CE x ]=E

    Take expectation of both sides.

    cxy }{Ty x

    October 5, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.6

  • 8/14/2019 Cormen Algo-lec8

    7/19

    Proof (continued)

    [CE x ]=E Take expectationof both sides.

    cxy }{Ty x

    = [cE xy ] Linearity ofexpectation. }{Ty x

    October 5, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.7

  • 8/14/2019 Cormen Algo-lec8

    8/19

    Proof (continued)

    [CE x ]=E

    Take expectation of both sides.

    cxy }{Ty x

    = [cE xy ] Linearity ofexpectation. }{Ty x

    = 1/ m E[cxy] = 1/m. }{Ty x

    October 5, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.8

  • 8/14/2019 Cormen Algo-lec8

    9/19

    Proof (continued)

    CE[ x ]=E Take expectationof both sides.

    cxyTy x }{

    = [cE xy ] Linearity ofexpectation.Ty x }{

    = 1/ m E[cxy] = 1/m.Ty x }{1 Algebra..n=m

    October 5, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.9

  • 8/14/2019 Cormen Algo-lec8

    10/19

    Constructing a set of

    universal hash functionsLet mbe prime. Decompose key kinto r+ 1digits, each with value in the set {0, 1, , m1}.That is, let k= k0, k1, , kr, where 0 ki < m.Randomized strategy:Picka = a0, a1, , arwhere each ai is chosen randomly from {0, 1, , m1}.

    rDefine ha (k) = ka i mod m . Dot product,i modulo mi=0

    |H| = mr+ 1. REMEMBERHow big is H = {ha}?THIS!

    October 5, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.10

  • 8/14/2019 Cormen Algo-lec8

    11/19

    Universality of dot-product

    hash functionsTheorem. The setH = {ha} is universal.

    Proof. Suppose that x = x0,x1, ,xrandy =y0,y1, ,yrbe distinct keys. Thus, they differin at least one digit position, wlog position 0.

    For how many ha H dox andy collide?We must have ha(x) = ha(y), which implies that

    r r

    xa i ya (mod m) .i i ii=0 i=0

    October 5, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.11

  • 8/14/2019 Cormen Algo-lec8

    12/19

    Proof (continued)

    Equivalently, we haver

    ai (xi yi ) 0 (mod m)i=0or

    ra0 (x0 y0 ) + ai (xi yi ) 0 (mod m) ,

    i=1which implies that

    r

    a0 (x0 y0 ) ai (xi yi ) (mod m) .i=1

    October 5, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.12

  • 8/14/2019 Cormen Algo-lec8

    13/19

    Fact from number theory

    Theorem. Let mbe prime. For anyzZmsuch thatz0, there exists a uniquez1 Zmsuch that

    zz1 1 (mod m).Example: m = 7.

    z 1 2 3 4 5 6z1 1 4 5 2 3 6

    October 5, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.13

  • 8/14/2019 Cormen Algo-lec8

    14/19

    Back to the proof

    We haver

    a0 (x0 y0 ) ai (xi yi ) (mod m) ,i=1and sincex0 y0 , an inverse (x0y0 )1 must exist,which implies that

    ra0 ai (xi yi ) (x0 y0 )1 (mod m).

    i=1

    Thus, for any choices ofa1, a2, , ar, exactlyone choice ofa0 causesx andy to collide.

    October 5, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.14

  • 8/14/2019 Cormen Algo-lec8

    15/19

    Proof (completed)

    Q. How many has causex andy to collide?

    A. There are m choices for each ofa1, a2, , ar,but once these are chosen, exactly one choicefora0 causesx andy to collide, namely

    r (x0 y0 ) 1

    = ai(xi yi) mod m .a0 i 1=Thus, the number ofh s that causex andy

    ato collide is mr 1 = mr= |H|/m.

    October 5, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.15

  • 8/14/2019 Cormen Algo-lec8

    16/19

    40 37 22

    26

    14 274 31

    1 00

    9 86

    Perfect hashing

    Given a set ofn keys, construct a static hash table of size m = O(n) such that SEARCH takes (1) time in the worst case.

    40 37

    26

    14 27

    S4

    S6

    4 31

    1 00

    9 86

    h31(14) = h31(27) = 1

    T S1IDEA: Two-

    level scheme

    01with universal 2

    hashing at 3both levels. 4

    No collisions 5at level 2!

    6m a 0 1 2 3 4 5 6 7 8

    October 5, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.16

    22

  • 8/14/2019 Cormen Algo-lec8

    17/19

    Collisions at level 2

    Theorem. LetHbe a class of universal hash

    functions for a table of size m = n2. Then, if we

    use a random h H to hash n keys into the table,the expected number of collisions is at most 1/2.

    Proof. By the definition of universality, the

    probability that 2 given keys in the table collidenunderh is 1/m = 1/n22

    of keys that can possibly collide, the expectedSince there are pairs.

    number of collisions is

    n 1 =

    (nn 1) 1

  • 8/14/2019 Cormen Algo-lec8

    18/19

    No collisions at level 2

    Corollary. The probability of no collisionsis at least 1/2.

    Proof. Markovs inequality says that for anynonnegative random variableX, we have

    Pr{Xt} E[X]/t.Applying this inequality with t= 1, we findthat the probability of1 or more collisions is

    at most 1/2.Thus, just by testing random hash functions

    in H, well quickly find one that works. October 5, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.18

  • 8/14/2019 Cormen Algo-lec8

    19/19

    Analysis of storage

    For the level-1 hash table T, choose m = n, andlet nibe random variable for the number of keys

    that hash to slot i in T. By using ni2 slots for thelevel-2 hash table Si, the expected total storagerequired for the two-level scheme is therefore

    E

    m 1

    i=0( ) ni

    2 = (n),

    since the analysis is identical to the analysis fromrecitation of the expected running time of bucketsort. (For a probability bound, apply Markov.)

    October 5, 2005 Copyright 2001-5 by Erik D. Demaine and Charles E. Leiserson L7.19