FC06d_LCPRG


  • Dr. John Mellor-Crummey

    Department of Computer Science, Rice University

    [email protected]

    Random Number Generation

    COMP 528 Lecture 21, 5 April 2005

  • 2 Topics for Today

    Understand
    - Motivation
    - Desired properties of a good generator
    - Linear congruential generators: multiplicative and mixed
    - Tausworthe generators
    - Combined generators
    - Seed selection
    - Myths about random number generation
    - What's used today: MATLAB, R, Linux

  • 3 Why Random Number Generation?

    Simulation must generate random values for variables in a specified random distribution
    - examples: normal, exponential, ...
    How? Two steps
    - random number generation: generate a sequence of uniform FP random numbers in [0,1]
    - random variate generation: transform a uniform random sequence to produce a sequence with the desired distribution
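    The two steps can be sketched as follows. This is a minimal illustration, not the lecture's code: Python's built-in generator stands in for step 1, the inverse-transform method is one assumed choice for step 2, and the names `uniform_stream` and `exponential_variates` are hypothetical.

```python
import math
import random

def uniform_stream(seed, n):
    """Step 1: n uniform floats in [0, 1); Python's built-in generator is a stand-in."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

def exponential_variates(uniforms, rate):
    """Step 2: inverse transform, mapping each uniform u to -ln(1 - u) / rate."""
    return [-math.log(1.0 - u) / rate for u in uniforms]

u = uniform_stream(seed=12345, n=100_000)
x = exponential_variates(u, rate=2.0)
sample_mean = sum(x) / len(x)   # should land near 1 / rate = 0.5
print(sample_mean)
```

    The same uniform sequence could feed a different transform to obtain another distribution; only step 2 changes.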

  • 4 How Random Number Generators Work

    Most commonly use a recurrence relation
    - recurrence is a function of the last number (or the last few numbers), e.g.
        $x_n = f(x_{n-1}, x_{n-2}, \ldots)$
    Example: $x_n = (5x_{n-1} + 1) \bmod 16$
    - for $x_0 = 5$, the first 32 numbers are 10, 3, 0, 1, 6, 15, 12, 13, 2, 11, 8, 9, 14, 7, 4, 5, 10, 3, 0, 1, 6, 15, 12, 13, 2, 11, 8, 9, 14, 7, 4, 5
    - the x's are integers in [0,15]
    - dividing by 16 gives random numbers in the interval [0,1)
    Properties of pseudo-random number sequences
    - from the seed value, can determine the entire sequence
    - they pass statistical tests for randomness
    - reproducibility (often desirable)
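    The example recurrence can be reproduced in a few lines. A minimal sketch; the function name `lcg_sequence` is hypothetical.

```python
def lcg_sequence(a, b, m, seed, count):
    """Return the next `count` values of x_n = (a * x_{n-1} + b) mod m after `seed`."""
    values, x = [], seed
    for _ in range(count):
        x = (a * x + b) % m
        values.append(x)
    return values

xs = lcg_sequence(a=5, b=1, m=16, seed=5, count=32)
us = [x / 16 for x in xs]   # scale the integers to floats in [0, 1)
print(xs)
```

    Re-running with the same seed gives the same list, which is the reproducibility property noted on the slide.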

  • 5 Random Number Sequences

    Some generators do not repeat the initial part of a sequence

    [Figure: a random number sequence annotated with "tail", "cycle length", and "period"]
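    One way to measure the non-repeating initial part and the repeating part of a sequence is to record when each value first appears. This is a sketch under the assumption that "tail" means the values produced before the cycle starts; `tail_and_cycle` is a hypothetical helper.

```python
def tail_and_cycle(step, seed):
    """Return (tail_length, cycle_length) for the sequence seed, step(seed), ..."""
    first_seen = {}            # value -> index at which it first appeared
    x, index = seed, 0
    while x not in first_seen:
        first_seen[x] = index
        x = step(x)
        index += 1
    tail = first_seen[x]       # values before the first repeated value
    return tail, index - tail

print(tail_and_cycle(lambda x: (5 * x + 1) % 16, 5))   # no tail, cycle of 16
print(tail_and_cycle(lambda x: (x * x) % 16, 2))       # 2, 4 form a tail, then 0 repeats
```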

  • 6 Desired Properties of a Good Generator

    Efficiently computable
    Period should be large
    - don't want random numbers in a simulation to recycle
    Successive values should be
    - independent
    - uniformly distributed

  • 7 Linear-Congruential Generators

    1951: D.H. Lehmer found that residues of successive powers of a number have good randomness
    Lehmer's generator: multiplicative LCG
        $x_n = a^n \bmod m$; after computing $x_{n-1}$, $x_n = a x_{n-1} \bmod m$
        (a: multiplier, m: modulus)
    Modern generalization: mixed LCG
        $x_n = (a x_{n-1} + b) \bmod m$, with a, b, m > 0
    Result: the $x_n$ are integers in [0, m-1]
    Popular because
    - analyzed easily
    - certain guarantees can be made about their properties

  • 8 Properties of LCGs

        $x_n = (a x_{n-1} + b) \bmod m$

    Choice of a, b, m affects
    - period
    - autocorrelation
    Observations about LCGs
    - period can never be more than m, so modulus m should be large
    - $m = 2^k$ yields efficient implementation by truncation
    - if b is non-zero, obtain period of m iff
        - m and b are relatively prime
        - every prime that is a factor of m is also a factor of a - 1
        - if m is a multiple of 4, a - 1 must be too
    - all of these conditions are met if
        - $m = 2^k$, for some integer k
        - a = 4c + 1, for some integer c
        - b is an odd integer
    Full-period generator = one with period m
    - not all are equally good
    - lower autocorrelation between adjacent elements = better
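    The three full-period conditions can be checked mechanically and compared against a brute-force period count. A sketch for small moduli only; all function names are hypothetical, and `measured_period` assumes gcd(a, m) = 1 so that the sequence returns to its seed.

```python
from math import gcd

def prime_factors(n):
    """Set of prime factors of n, by trial division (fine for small n)."""
    factors, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors

def meets_full_period_conditions(a, b, m):
    """Check the three conditions listed above for a mixed LCG (b non-zero)."""
    if b == 0 or gcd(b, m) != 1:
        return False
    if any((a - 1) % p != 0 for p in prime_factors(m)):
        return False
    if m % 4 == 0 and (a - 1) % 4 != 0:
        return False
    return True

def measured_period(a, b, m, seed=0):
    """Steps until the sequence returns to `seed`; None if it does not within m steps."""
    x = seed
    for steps in range(1, m + 1):
        x = (a * x + b) % m
        if x == seed:
            return steps
    return None

for a, b, m in [(5, 1, 16), (3, 1, 16)]:
    print(a, b, m, meets_full_period_conditions(a, b, m), measured_period(a, b, m))
```

    With m = 16, the multiplier a = 5 satisfies all three conditions, while a = 3 fails the multiple-of-4 condition and its measured period is shorter than m.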

  • 9 Example: Two Candidate LCGs

    Which is better?
        $x_n = ((2^{34} + 1) x_{n-1} + 1) \bmod 2^{35}$
        $x_n = ((2^{18} + 1) x_{n-1} + 1) \bmod 2^{35}$
    Both must be full-period generators of the form $x_n = (a x_{n-1} + b) \bmod m$, since
    - $m = 2^k$, for some integer k
    - a = 4c + 1, for some integer c
    - b is an odd integer
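    A rough way to compare the two candidates is to look at how much successive values differ. This is only an informal diagnostic, not an autocorrelation computation; the helper name `successive_differences` and the seed are assumptions made for illustration.

```python
def successive_differences(a, b, m, seed, count):
    """Collect the distinct values of (x_n - x_{n-1}) mod m over `count` steps."""
    diffs, x = set(), seed
    for _ in range(count):
        nxt = (a * x + b) % m
        diffs.add((nxt - x) % m)
        x = nxt
    return diffs

m = 2**35
d1 = successive_differences(2**34 + 1, 1, m, seed=1, count=1000)
d2 = successive_differences(2**18 + 1, 1, m, seed=1, count=1000)
print(len(d1), len(d2))
```

    With the multiplier $2^{34} + 1$, each value is the previous one plus either 1 or $2^{34} + 1$ (mod $2^{35}$), so adjacent values are tightly linked. The multiplier $2^{18} + 1$ produces many different increments over the same number of steps.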

  • 10

    Multiplicative LCGs

    More efficient than mixed LCGs: no addition
    Two classes: $m = 2^k$ and $m \neq 2^k$

  • 11

    Multiplicative LCG with $m = 2^k$

        $x_n = a^n \bmod 2^k$

    Most efficient LCG: mod = truncation
    Not full-period: maximum possible period for $m = 2^k$ is $2^{k-2}$
    - only possible if multiplier $a = 8i \pm 3$ and $x_0$ is odd
    - consider
        $x_n = 5 x_{n-1} \bmod 2^5$ (lcg_m2k_good)
        $x_n = 7 x_{n-1} \bmod 2^5$ (lcg_m2k_bad)
    If a period of $2^{k-2}$ suffices, may use multiplicative LCG for efficiency
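    The two example generators can be compared by counting steps until the seed recurs. A minimal sketch; `multiplicative_period` is a hypothetical helper and the seeds are chosen for illustration.

```python
def multiplicative_period(a, m, seed):
    """Steps until x_n = a * x_{n-1} mod m returns to `seed` (None if not within m steps)."""
    x = seed
    for steps in range(1, m + 1):
        x = (a * x) % m
        if x == seed:
            return steps
    return None

print(multiplicative_period(5, 2**5, seed=1))   # multiplier of the form 8i - 3, odd seed
print(multiplicative_period(7, 2**5, seed=1))   # multiplier not of the form 8i +/- 3
print(multiplicative_period(5, 2**5, seed=2))   # same good multiplier, even seed
```

    Only the first combination reaches the maximum period $2^{5-2} = 8$; the other two cycle after 4 steps.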

  • 12

    Multiplicative LCG with $m \neq 2^k$

        $x_n = a^n \bmod m$, $m \neq 2^k$

    Avoid small period of LCG when $m = 2^k$: use prime modulus
    Full-period generator with proper choice of a
    - when a is a primitive root of m, i.e. $a^n \bmod m \neq 1$ for n = 1, 2, ..., m-2
    Consider
        $x_n = 3 x_{n-1} \bmod 31$ (lcg_mprime_good)
        $x_n = 5 x_{n-1} \bmod 31$ (lcg_mprime_bad)
        Note: $5^3 \bmod 31 = 125 \bmod 31 = 1$
    Observations
    - unlike mixed LCG, $x_n$ can never be 0 when m is prime
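    The primitive-root condition can be tested by finding the smallest power of a that equals 1 mod m. A sketch for small prime moduli; both function names are hypothetical.

```python
def multiplicative_order(a, m):
    """Smallest n >= 1 with a**n mod m == 1 (assumes gcd(a, m) == 1)."""
    value, n = a % m, 1
    while value != 1:
        value = (value * a) % m
        n += 1
    return n

def is_primitive_root(a, prime_m):
    """True when the powers of a run through all of 1 .. prime_m - 1."""
    return multiplicative_order(a, prime_m) == prime_m - 1

print(multiplicative_order(3, 31), is_primitive_root(3, 31))
print(multiplicative_order(5, 31), is_primitive_root(5, 31))
```

    The order of a is also the period of $x_n = a x_{n-1} \bmod m$ started from $x_0 = 1$, which is why the multiplier 5 gives such a short cycle.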

  • 13

    Examining Bits of a Multiplicative LCG

        $x_n = 25173\, x_{n-1} \bmod 2^{16}$

    testgenerator(@r1,1,20)

      n   decimal   binary
    ---   -------   -----------------
      1     25173   01100010 01010101
      2     12345   00110000 00111001
      3     54509   11010100 11101101
      4     27825   01101100 10110001
      5     55493   11011000 11000101
      6     25449   01100011 01101001
      7     13277   00110011 11011101
      8     53857   11010010 01100001
      9     64565   11111100 00110101
     10      1945   00000111 10011001
     11      6093   00010111 11001101
     12     24849   01100001 00010001
     13     48293   10111100 10100101
     14     52425   11001100 11001001
     15     61629   11110000 10111101
     16     18625   01001000 11000001
     17      2581   00001010 00010101
     18     25337   01100010 11111001
     19     11949   00101110 10101101
     20     47473   10111001 01110001

    (bit 1 is the least significant bit)
    - bit 1: always 1
    - bit 2: always 0
    - bit 3: cycle (10) of length 2
    - bit 4: cycle (0110) of length 4
    In general: kth bit follows cycle of length $2^{k-2}$, k ≥ 2
    Typical of multiplicative LCG with modulus $2^k$
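    The table can be regenerated with a short script. This is a Python sketch, not the `testgenerator` routine named on the slide; `lcg_table` is a hypothetical helper, the seed 1 is inferred from the first table row, and the 16-bit formatting assumes m = 2^16.

```python
def lcg_table(a, b, m, seed, count):
    """Rows (n, decimal value, 16-bit binary string) for x_n = (a * x_{n-1} + b) mod m."""
    rows, x = [], seed
    for n in range(1, count + 1):
        x = (a * x + b) % m
        bits = format(x, "016b")
        rows.append((n, x, bits[:8] + " " + bits[8:]))
    return rows

rows = lcg_table(a=25173, b=0, m=2**16, seed=1, count=20)
for n, x, bits in rows:
    print(f"{n:3d}   {x:7d}   {bits}")

# Lowest two bits of every value: bit 1 stays 1 and bit 2 stays 0.
low_bits = [(x & 1, (x >> 1) & 1) for _, x, _ in rows]
```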

  • 14

    Examining Bits of a Mixed LCG

        $x_n = (25173\, x_{n-1} + 13849) \bmod 2^{16}$

    testgenerator(@r2,1,20)

      n   decimal   binary
    ---   -------   -----------------
      1     39022   10011000 01101110
      2     61087   11101110 10011111
      3     20196   01001110 11100100
      4     45005   10101111 11001101
      5      3882   00001111 00101010
      6     21259   01010011 00001011
      7     65216   11111110 11000000
      8     19417   01001011 11011001
      9     30502   01110111 00100110
     10     20919   01010001 10110111
     11     26076   01100101 11011100
     12     16421   01000000 00100101
     13     44130   10101100 01100010
     14     63139   11110110 10100011
     15     32824   10000000 00111000
     16     14513   00111000 10110001
     17     51934   11001010 11011110
     18     36303   10001101 11001111
     19     35284   10001001 11010100
     20      8573   00100001 01111101

    (bit 1 is the least significant bit)
    - bit 1: cycle (10) of length 2
    - bit 2: cycle (1100) of length 4
    - bit 3: cycle (11110000) of length 8
    In general: kth bit follows cycle of length $2^k$
    Typical of mixed LCG with modulus $2^k$
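    The per-bit cycle lengths can be measured directly over one full period. A sketch assuming a power-of-two modulus; `bit_period` is a hypothetical helper and only power-of-two cycle lengths are tried.

```python
def bit_period(a, b, m, k, seed=1):
    """Smallest power-of-two period of bit k (k = 1 is the least significant bit)."""
    bits, x = [], seed
    for _ in range(m):                  # one full pass of m values
        x = (a * x + b) % m
        bits.append((x >> (k - 1)) & 1)
    p = 1
    while p < m:
        if all(bits[i] == bits[(i + p) % m] for i in range(m)):
            return p
        p *= 2
    return m

periods = {k: bit_period(25173, 13849, 2**16, k) for k in range(1, 9)}
print(periods)
```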

  • 15

    LCG Cautions

    Properties guaranteed only if computations are exact: no roundoff
    - use integer arithmetic without overflow
    Low-order bits not very random, high-order bits better
    - if one wants k bits and k < machine word length, it is better to choose the high-order k bits than the low-order k bits

  • 16

    Tausworthe Generators

    Significant interest in huge random numbers
    - cryptographic applications want many-bit random numbers
    - produce k-bit numbers by
        - producing a random sequence of bits
        - chunking the bit stream into k-bit quantities
    1965: Tausworthe generator
        $b_n = c_{q-1} b_{n-1} \oplus c_{q-2} b_{n-2} \oplus c_{q-3} b_{n-3} \oplus \cdots \oplus c_0 b_{n-q}$
    - $c_i$ and $b_i$ are binary variables
    - $\oplus$ is the xor operation (mod 2 addition)
    - uses last q bits of bit stream to compute next bit
    - autoregressive, order q: AR(q)
    - AR(q) generator maximum period = $2^q - 1$
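    The general recurrence translates almost literally into code. A minimal sketch; `tausworthe_bits` is a hypothetical name, and the coefficients used in the example ($c_0 = c_3 = 1$ with q = 7, i.e. $b_n = b_{n-4} \oplus b_{n-7}$) and the all-ones seed are chosen for illustration.

```python
def tausworthe_bits(c, seed_bits, count):
    """Bits b_0 .. b_{count-1}; c[i] is c_i and seed_bits is [b_0, ..., b_{q-1}]."""
    q = len(c)
    bits = list(seed_bits)
    while len(bits) < count:
        n = len(bits)
        nxt = 0
        for j in range(1, q + 1):
            nxt ^= c[q - j] & bits[n - j]   # term c_{q-j} * b_{n-j}
        bits.append(nxt)
    return bits[:count]

c = [1, 0, 0, 1, 0, 0, 0]                   # c_0 = 1, c_3 = 1, others 0
bits = tausworthe_bits(c, seed_bits=[1] * 7, count=254)
print("".join(map(str, bits[:21])))
```

    With these coefficients the bit stream repeats after $2^7 - 1 = 127$ bits, the maximum for q = 7.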

  • 17

    Tausworthe Generator Notation

    Characteristic polynomial notation
        $x^7 + x^3 + 1$  (characteristic polynomial)
        $b_{n+7} \oplus b_{n+3} \oplus b_n = 0$, n = 0, 1, 2, ...
        $b_{n+7} = b_{n+3} \oplus b_n$, n = 0, 1, 2, ...
        $b_n = b_{n-4} \oplus b_{n-7}$, n = 7, 8, 9, ...
    Most polynomials for Tausworthe generators are trinomials
    Period depends on characteristic polynomial
    - if period = $2^q - 1$, characteristic polynomial is a primitive polynomial

  • 18

    Implementing Tausworthe Generators

    Linear feedback shift registers
        $x^7 + x^3 + 1$
        $b_{n+7} \oplus b_{n+3} \oplus b_n = 0$, n = 0, 1, 2, ...
        $b_{n+7} = b_{n+3} \oplus b_n$, n = 0, 1, 2, ...
        $b_n = b_{n-4} \oplus b_{n-7}$, n = 7, 8, 9, ...

    [Figure: shift register with stages $b_{n-1}$ through $b_{n-7}$, feedback bit $b_n$, and output "out"]

    Disadvantage of Tausworthe generators
    - while sequence is good overall, local behavior may not be
        - known to perform negatively on runs up and down test
        - first-order serial correlation almost 0
        - suspected that some polynomials may give poor high-order correlation
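    A shift-register version of the recurrence $b_n = b_{n-4} \oplus b_{n-7}$ can keep the last seven bits in one integer. A sketch only; the bit layout (bit i of the state holds $b_{n-1-i}$) and the name `lfsr_bits` are assumptions made for this example.

```python
def lfsr_bits(seed_state, count):
    """Generate `count` bits; bit i of the 7-bit state (0 = least significant) holds b_{n-1-i}."""
    state = seed_state & 0x7F
    if state == 0:
        raise ValueError("the all-zero state never leaves zero")
    out = []
    for _ in range(count):
        new_bit = ((state >> 3) ^ (state >> 6)) & 1   # b_{n-4} xor b_{n-7}
        state = ((state << 1) | new_bit) & 0x7F       # shift in b_n, drop b_{n-7}
        out.append(new_bit)
    return out

bits = lfsr_bits(0b1111111, 254)
print("".join(map(str, bits[:14])))
```

    Each step costs one xor, one shift, and one mask, which is why this form suits hardware and wide-word software implementations.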

  • 19

    Generating k-bit Random Numbers

    k-bit random numbers $x_n$ from binary sequence $b_n$
    Generalized feedback shift register method (Lewis & Payne '73)
        $x_n = 0.b_n b_{n+s} b_{n+2s} \ldots b_{n+(k-1)s}$
    - s is a carefully selected delay
    - s ≥ k: $x_n$ and $x_j$ have no bits in common for n ≠ j
    - s relatively prime to $2^q - 1$: guarantees full period for $x_n$
    Advantage
    - $x_n$ can be generated very efficiently with wide-word shift and exclusive-or operations
    Requires
    - storing an array of seed numbers
    - careful initialization of seed array

  • 20

    Extended Fibonacci Generators

    Fibonacci sequence: $x_n = x_{n-1} + x_{n-2}$
    Fibonacci RNG: $x_n = (x_{n-1} + x_{n-2}) \bmod m$
    Properties
    - not very good randomness
    - high serial correlation
    Extended Fibonacci generator (Marsaglia 1983)
        $x_n = (x_{n-5} + x_{n-17}) \bmod 2^k$
    - state: ring buffer B with 17 values
    - initialization
        - save integers in 17 values (not all integers even)
        - initialize j=16, k=4 cursors for buffer (the cursor k is unrelated to the exponent in $2^k$)
    - generate
        - x = B[j] + B[k]
        - B[j] = x
        - j = j - 1 mod 17; k = k - 1 mod 17
        - return x
    Properties
    - passes most statistical tests
    - period = $2^k (2^{17} - 1)$ (much longer than LCGs)
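    The ring-buffer steps above can be sketched as a small class. Assumptions for illustration: a 32-bit word (so the sum is reduced mod 2^32), arbitrary seed values, and the hypothetical class name `ExtendedFibonacci`.

```python
class ExtendedFibonacci:
    """x_n = (x_{n-5} + x_{n-17}) mod 2^word_bits using a 17-entry ring buffer."""

    def __init__(self, seeds, word_bits=32):
        if len(seeds) != 17 or all(s % 2 == 0 for s in seeds):
            raise ValueError("need 17 seed integers, not all even")
        self.mask = (1 << word_bits) - 1
        self.buf = [s & self.mask for s in seeds]
        self.j, self.k = 16, 4              # cursors into the buffer

    def next(self):
        x = (self.buf[self.j] + self.buf[self.k]) & self.mask
        self.buf[self.j] = x                # overwrite the oldest value
        self.j = (self.j - 1) % 17
        self.k = (self.k - 1) % 17
        return x

# Arbitrary illustrative seeds (the first one is odd).
seeds = [(i * 2654435761 + 1) % 2**32 for i in range(17)]
gen = ExtendedFibonacci(seeds)
outputs = [gen.next() for _ in range(100)]
print(outputs[:5])
```

    Because both cursors move together, the slot at k always holds the value written five steps earlier, while the slot at j holds the value from seventeen steps earlier.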

  • 21

    Some Combined Generators

    Can combine 2 or more generators to produce a better one
    Adding random numbers from 2 or more generators
    - if $x_n$ and $y_n$ are random sequences in [0, m-1], then $w_n = (x_n + y_n) \bmod m$ can be used as a random number
    - why do this? can increase period and randomness if the two generators have different periods
    Exclusive-or random numbers from 2 or more generators
    - Santha & Vazirani (1984): xor of 2 random n-bit streams generates a more random sequence
    Shuffle
    - use sequence a to pick which recent element in sequence b to return
    - Marsaglia & Bray (1964)
        - keep 100 items of sequence b
        - use sequence a to select which to return next and replace
    - claim: better k-distributivity than LFSR methods
    - problem: not easy to skip long sequence for multi-stream simulations
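    The additive combination and the shuffle can both be sketched with generator functions. The helper names and the two component LCGs are assumptions chosen for illustration, and the shuffle is a plain reading of the steps listed above rather than a tuned implementation.

```python
import itertools

def lcg(a, b, m, seed):
    """Endless stream of x_n = (a * x_{n-1} + b) mod m."""
    x = seed
    while True:
        x = (a * x + b) % m
        yield x

def add_combined(gen_x, gen_y, m):
    """w_n = (x_n + y_n) mod m."""
    for x, y in zip(gen_x, gen_y):
        yield (x + y) % m

def shuffled(gen_a, gen_b, m, table_size=100):
    """Use sequence a to pick which stored element of sequence b to return and replace."""
    table = [next(gen_b) for _ in range(table_size)]
    for a in gen_a:
        slot = a * table_size // m          # map a in [0, m-1] to a table index
        out = table[slot]
        table[slot] = next(gen_b)
        yield out

M = 2**16
w = list(itertools.islice(
    add_combined(lcg(25173, 13849, M, 1), lcg(25173, 0, M, 1), M), 1000))
s = list(itertools.islice(
    shuffled(lcg(25173, 13849, M, 1), lcg(5, 1, M, 7), M), 1000))
print(w[:5], s[:5])
```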

  • 22

    Seed Selection Issues

    Wrong combination of seed and RNG can hurt
    - especially if RNG is flawed, e.g. seed might be an RNG fixed point
    Cases
    - one stream needed
        - if RNG has full period, then any seed is as good as another
    - multiple streams needed
        - e.g. queue simulation requires an interarrival time stream and a service time stream
        - requires special care!

  • 23

    Seed Selection Guidelines I

    Don't use 0
    - multiplicative LCGs and Tausworthe generators would stick at 0
    Avoid even values
    - seed should be odd for multiplicative LCG with $m = 2^k$
    - for full-period generators, all non-zero values equally good
    Don't subdivide one stream
    - don't use a single stream for all random variables
    - might be a strong correlation between items in same stream
    Use non-overlapping streams
    - each stream requires separate seed
    - don't use same seed for 2 or more streams!
    - if seeds are bad, streams will overlap and not be independent
    - right way: select seeds so streams don't overlap at all
    - example: need 3 streams of 20,000 numbers
        - pick $u_0$ as seed for first stream
        - pick $u_{20000}$ as seed for second stream
        - pick $u_{40000}$ as seed for third stream
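    The three-stream example can be sketched by stepping one generator forward 20,000 values at a time. The LCG parameters, the starting seed, and the helper names are assumptions for illustration; a full-period generator with period 2^16 is long enough to hold three streams of 20,000.

```python
def advance(a, b, m, x, steps):
    """Apply x -> (a * x + b) mod m `steps` times."""
    for _ in range(steps):
        x = (a * x + b) % m
    return x

def stream_seeds(a, b, m, u0, stream_length, num_streams):
    """Seeds u_0, u_L, u_2L, ... so that consecutive streams of length L do not overlap."""
    seeds, x = [], u0
    for _ in range(num_streams):
        seeds.append(x)
        x = advance(a, b, m, x, stream_length)
    return seeds

A, B, M = 25173, 13849, 2**16
seeds = stream_seeds(A, B, M, u0=1, stream_length=20_000, num_streams=3)
print(seeds)
```

    Stepping one value at a time is the simplest approach; it is adequate here because the streams are short.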

  • 24

    Seed Selection Guidelines II

    Reuse seeds in successive replications
    - if simulation experiment is replicated several times, can use seeds from end of previous replication in next one
    Don't use random seeds
    - simulation can't be reproduced
    - impossible to guarantee multiple streams won't overlap

  • 25

    Myths I

    A complex set of operations leads to random results
    - complicated code does not imply a random sequence of numbers that will pass tests of uniformity and independence
    A single test of goodness suffices
    - sequence 0, 1, ..., m-1 is not random but passes the chi-square test; it will fail the run test
    - use as many tests as possible
    Pseudo-random numbers are unpredictable
    - e.g. can identify LCG parameters with a few numbers and predict
    - LCG unsuitable for cryptographic applications where unpredictability is desired
    Some seeds are better than others
    - e.g. odd vs. even, avoid particular seeds, etc.
    - may be true for some generators, but these should be avoided!
        $x_n = (9806 x_{n-1} + 1) \bmod (2^{17} - 1)$: 37,911 is a fixed point!
    - any non-zero seed should produce equally valid results
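    The fixed point is easy to confirm numerically. A minimal check; the function name `step` is hypothetical.

```python
def step(x):
    """One step of x_n = (9806 * x_{n-1} + 1) mod (2^17 - 1)."""
    return (9806 * x + 1) % (2**17 - 1)

x = 37911
stuck = [x]
for _ in range(5):
    x = step(x)
    stuck.append(x)
print(stuck)   # every entry is the seed itself
```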

  • 26

    Myths II

    Accurate implementation is not important
    - period and randomness are guaranteed only if formula is implemented without overflow or truncation
    - overflows and truncations can
        - change the path of a generator
        - reduce the period
    Bits of successive words are equally-randomly distributed
    - if an algorithm produces a k-bit wide number, randomness is only guaranteed when all k bits are used
    - unless specified otherwise, assume any particular bit position (or sequence thereof) will not be equally random

  • 27

    What's Used Today: MATLAB

    rand function
    - lagged Fibonacci generator
    - seed cache of 32 floating point numbers
    - combined with a shift register random integer generator
    - core: j ^= (j17); j ^= (j

  • 28

    What's Used Today: R

    Mersenne-Twister (Matsumoto and Nishimura, 1998) [default]
    - twisted GFSR based on Mersenne primes
    - seed: 624-dimensional set of 32-bit integers + a cursor
    - period: $2^{19937} - 1$
    - equi-distribution in 623 consecutive dimensions (whole period)
    - [note: variant of MT for independent parallel streams exists too]
    Knuth-TAOCP (Knuth, 1997)
    - GFSR using lagged Fibonacci sequences with subtraction
    - $X[j] = (X[j-100] - X[j-37]) \bmod 2^{30}$
    - seed: the set of the 100 last numbers + cyclic shift of buffer
    - period: about $2^{129}$
    Knuth-TAOCP-2002
    - initialization of GFSR from seed was altered

  • 29

    What's Used Today: R (continued)

    Wichmann-Hill
    - seed: integer vector of length 3
        - seed[i] is in 1:(p[i] - 1)
        - p is the length 3 vector of primes, p = (30269, 30307, 30323)
    - cycle length: 6.9536e12 = prod(p-1)/4
    - reference: Applied Statistics (1984) 33, 123
    Marsaglia-Multicarry
    - multiply-with-carry RNG (Marsaglia)
    - seed: two integers, all values allowed
    - period: > $2^{60}$
    - has passed all tests (according to Marsaglia)
    Super-Duper (Marsaglia)
    - doesn't pass the MTUPLE test of the Diehard battery
    - period: about 4.6*10^18 for most initial seeds
    - seed: 2 integers (first: all values allowed; second: odd value)
    - default seeds are the Tausworthe and congruence long integers
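    The Wichmann-Hill cycle length quoted above follows directly from the three primes. A small arithmetic check in Python (the slide's `prod(p-1)/4` is R notation); `math.prod` requires Python 3.8 or later.

```python
from math import prod

p = (30269, 30307, 30323)
cycle_length = prod(q - 1 for q in p) // 4   # prod(p - 1) / 4
print(f"{cycle_length:.4e}")
```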

  • 30

    What's Used Today: Linux

    random function
    - non-linear additive feedback-based generator
    - state: 8, 32, 64, 128, or 256 bytes
    - all bits considered random
    rand function
    - bottom 12 bits go through cyclic pattern
    - higher-order bits more random