FC06d_LCPRG
-
Dr. John Mellor-Crummey
Department of Computer Science, Rice University
Random Number Generation
COMP 528 Lecture 21 5 April 2005
-
Topics for Today
Understand
  Motivation
  Desired properties of a good generator
  Linear congruential generators: multiplicative and mixed
  Tausworthe generators
  Combined generators
  Seed selection
  Myths about random number generation
  What's used today: MATLAB, R, Linux
-
Why Random Number Generation?
Simulation must generate random values for variables in a specified random distribution
  examples: normal, exponential, ...
How? Two steps
  random number generation: generate a sequence of uniform FP random numbers in [0,1]
  random variate generation: transform a uniform random sequence to produce a sequence with the desired distribution
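The two steps can be sketched for the exponential distribution. This is a minimal illustration, not the lecture's own code: it assumes the inverse-transform method for step 2 and uses Python's built-in `random.Random` as the uniform source; the name `exponential_variates` and the seed 528 are made up for the example.

```python
import math
import random

def exponential_variates(rate, count, rng):
    """Transform uniform numbers in [0, 1) into exponential variates."""
    values = []
    for _ in range(count):
        u = rng.random()                          # step 1: uniform random number in [0, 1)
        values.append(-math.log(1.0 - u) / rate)  # step 2: apply the inverse CDF
    return values

rng = random.Random(528)   # fixed seed so the run is reproducible
sample = exponential_variates(rate=2.0, count=100_000, rng=rng)
print(sum(sample) / len(sample))   # should be close to 1/rate = 0.5
```

Using `1.0 - u` rather than `u` keeps the argument of the logarithm in (0, 1], so the logarithm is always defined.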
-
How Random Number Generators Work
Most commonly use recurrence relation
  recurrence is a function of the last one (or a few) numbers, e.g.
  $x_n = f(x_{n-1}, x_{n-2}, \ldots)$
Example: $x_n = (5x_{n-1} + 1) \bmod 16$
  for $x_0 = 5$, the first 32 numbers are 10, 3, 0, 1, 6, 15, 12, 13, 2, 11, 8, 9, 14, 7, 4, 5, 10, 3, 0, 1, 6, 15, 12, 13, 2, 11, 8, 9, 14, 7, 4, 5
  the x's are integers in [0, 15]
  dividing by 16 gives random numbers in the interval [0, 1)
Properties of pseudo-random number sequences
  from the seed value, can determine the entire sequence
  they pass statistical tests for randomness
  reproducibility (often desirable)
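The example recurrence is small enough to run directly. The sketch below is only an illustration; the generator function `lcg` and its parameter names are chosen here, not taken from the slides.

```python
def lcg(a, b, m, seed):
    """Yield x_1, x_2, ... for x_n = (a * x_{n-1} + b) mod m."""
    x = seed
    while True:
        x = (a * x + b) % m
        yield x

gen = lcg(a=5, b=1, m=16, seed=5)
xs = [next(gen) for _ in range(32)]   # integers in [0, 15]
us = [x / 16 for x in xs]             # scaled into [0, 1)
print(xs)
```

The second half of `xs` repeats the first half, which shows the period of 16 for this seed.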
-
Random Number Sequences
Some generators do not repeat the initial part of a sequence
[Figure: a sequence that starts with a tail and then enters a repeating cycle; labels: tail, cycle length, period]
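A small sketch can measure the tail and the cycle length of any recurrence by remembering where each value was first seen. The function name `tail_and_cycle` is hypothetical, and the map x -> (x*x + 1) mod 16 is only a toy chosen because it has a visible tail; it is not a recommended generator.

```python
def tail_and_cycle(step, seed):
    """Return (tail length, cycle length) of the sequence seed, step(seed), ..."""
    first_seen = {}              # value -> index of its first occurrence
    x, n = seed, 0
    while x not in first_seen:
        first_seen[x] = n
        x = step(x)
        n += 1
    tail = first_seen[x]         # number of values before the cycle starts
    return tail, n - tail

print(tail_and_cycle(lambda x: (x * x + 1) % 16, 0))   # (3, 2): 0, 1, 2, then 5, 10 repeat
print(tail_and_cycle(lambda x: (5 * x + 1) % 16, 5))   # (0, 16): no tail, full cycle
```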
-
Desired Properties of a Good Generator
Efficiently computable
Period should be large
  don't want random numbers in a simulation to recycle
Successive values should be
  independent
  uniformly distributed
-
Linear-Congruential Generators
1951: D.H. Lehmer found that residues of successive powers of a number have good randomness
  $x_n = a^n \bmod m$; after computing $x_{n-1}$, $x_n = a x_{n-1} \bmod m$
  a is the multiplier, m is the modulus
Lehmer's generator: multiplicative LCG
Modern generalization: mixed LCG
  $x_n = (a x_{n-1} + b) \bmod m$, with a, b, m > 0
Result: $x_n$ are integers in [0, m-1]
Popular because
  analyzed easily
  certain guarantees can be made about their properties
-
Properties of LCGs
$x_n = (a x_{n-1} + b) \bmod m$
Choice of a, b, m affects
  period
  autocorrelation
Observations about LCGs
  period can never be more than m
  modulus m should be large
  $m = 2^k$ yields efficient implementation by truncation
  if b is non-zero, obtain period of m iff
    m and b are relatively prime
    every prime that is a factor of m is also a factor of a - 1
    if m is a multiple of 4, a - 1 must be too
  all of these conditions are met if
    $m = 2^k$, for some integer k
    a = 4c + 1, for some integer c
    b is an odd integer
Full-period generator = one with period m
  not all are equally good
  lower autocorrelation between adjacent elements = better
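The three full-period conditions can be checked mechanically and compared against a brute-force period count. This is a sketch for small moduli only; the helper names (`prime_factors`, `meets_full_period_conditions`, `period`) and the test parameters (3, 1, 16) are assumptions made for illustration.

```python
from math import gcd

def prime_factors(n):
    """Return the set of prime factors of n by trial division."""
    factors, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors

def meets_full_period_conditions(a, b, m):
    """Check the three conditions for a mixed LCG with non-zero b."""
    return (
        gcd(b, m) == 1                                        # m and b relatively prime
        and all((a - 1) % p == 0 for p in prime_factors(m))   # primes of m divide a - 1
        and (m % 4 != 0 or (a - 1) % 4 == 0)                  # 4 | m implies 4 | a - 1
    )

def period(a, b, m, seed=0):
    """Measure the cycle length by brute force; only sensible for small m."""
    seen, x, n = {}, seed, 0
    while x not in seen:
        seen[x] = n
        x = (a * x + b) % m
        n += 1
    return n - seen[x]

for a, b, m in [(5, 1, 16), (3, 1, 16)]:
    print(a, b, m, meets_full_period_conditions(a, b, m), period(a, b, m))
# 5 1 16 True 16
# 3 1 16 False 8
```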
-
Example: Two Candidate LCGs
$x_n = ((2^{34} + 1) x_{n-1} + 1) \bmod 2^{35}$
$x_n = ((2^{18} + 1) x_{n-1} + 1) \bmod 2^{35}$
Which is better?
Both must be full-period generators of the form $x_n = (a x_{n-1} + b) \bmod m$
  $m = 2^k$, for some integer k
  a = 4c + 1, for some integer c
  b is an odd integer
-
Multiplicative LCGs
More efficient than mixed LCGs: no addition
Two classes: $m = 2^k$, $m \neq 2^k$
-
Multiplicative LCG with $m = 2^k$
$x_n = a x_{n-1} \bmod 2^k$
Most efficient LCG: mod = truncation
Not full-period: maximum possible period for $m = 2^k$ is $2^{k-2}$
  only possible if multiplier $a = 8i \pm 3$ and $x_0$ is odd
  consider
    $x_n = 5 x_{n-1} \bmod 2^5$ (lcg_m2k_good)
    $x_n = 7 x_{n-1} \bmod 2^5$ (lcg_m2k_bad)
If a $2^{k-2}$ period suffices, may use multiplicative LCG for efficiency
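The two example generators can be compared by counting how many steps it takes to return to the seed. A minimal sketch; `multiplicative_period` is a name invented here, and the even seed 2 is added only to show why the seed should be odd.

```python
def multiplicative_period(a, m, seed):
    """Count steps until x_n = a * x_{n-1} mod m first returns to the seed."""
    x, n = (a * seed) % m, 1
    while x != seed:
        x = (a * x) % m
        n += 1
        if n > m:   # guard: the seed is not on a cycle
            raise ValueError("sequence never returns to the seed")
    return n

print(multiplicative_period(5, 2**5, 1))   # 8 = 2^(5-2): a = 8*1 - 3, odd seed
print(multiplicative_period(7, 2**5, 1))   # 4: a = 7 is not of the form 8i +/- 3
print(multiplicative_period(5, 2**5, 2))   # 4: even seed shortens the period
```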
-
Multiplicative LCG with $m \neq 2^k$
$x_n = a x_{n-1} \bmod m$, $m \neq 2^k$
Avoid small period of LCG when $m = 2^k$: use prime modulus
Full-period generator with proper choice of a
  when a is a primitive root of m
  i.e. $a^n \bmod m \neq 1$ for n = 1, 2, ..., m-2
Consider
  $x_n = 3 x_{n-1} \bmod 31$ (lcg_mprime_good)
  $x_n = 5 x_{n-1} \bmod 31$ (lcg_mprime_bad)
  Note: $5^3 \bmod 31 = 125 \bmod 31 = 1$
Observations
  unlike mixed LCG, $x_n$ can never be 0 when m is prime
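The primitive-root condition can be tested by brute force for a small prime modulus. The sketch below assumes m is prime and uses made-up helper names (`is_primitive_root`, `sequence`); it is meant only to reproduce the two examples.

```python
def is_primitive_root(a, m):
    """True if a^n mod m != 1 for every n in 1..m-2 (m assumed prime)."""
    return all(pow(a, n, m) != 1 for n in range(1, m - 1))

def sequence(a, m, seed, count):
    """First `count` values of x_n = a * x_{n-1} mod m."""
    xs, x = [], seed
    for _ in range(count):
        x = (a * x) % m
        xs.append(x)
    return xs

print(is_primitive_root(3, 31), len(set(sequence(3, 31, 1, 30))))   # True 30
print(is_primitive_root(5, 31), sequence(5, 31, 1, 6))              # False [5, 25, 1, 5, 25, 1]
```

With a = 3 the generator visits all 30 non-zero residues; with a = 5 it cycles through only three values.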
-
Examining Bits of a Multiplicative LCG
$x_n = 25{,}173\, x_{n-1} \bmod 2^{16}$
testgenerator(@r1,1,20)
  n   decimal   binary
---  --------  -----------------
  1    25173   01100010 01010101
  2    12345   00110000 00111001
  3    54509   11010100 11101101
  4    27825   01101100 10110001
  5    55493   11011000 11000101
  6    25449   01100011 01101001
  7    13277   00110011 11011101
  8    53857   11010010 01100001
  9    64565   11111100 00110101
 10     1945   00000111 10011001
 11     6093   00010111 11001101
 12    24849   01100001 00010001
 13    48293   10111100 10100101
 14    52425   11001100 11001001
 15    61629   11110000 10111101
 16    18625   01001000 11000001
 17     2581   00001010 00010101
 18    25337   01100010 11111001
 19    11949   00101110 10101101
 20    47473   10111001 01110001
bit 1: always 1
bit 2: always 0
bit 3: cycle (10) of length 2
bit 4: cycle (0110) of length 4
In general: kth bit follows cycle of length $2^{k-2}$, $k \geq 2$
Typical of multiplicative LCG with modulus $2^k$
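The per-bit cycles can be measured rather than read off the table. This is a sketch under the assumption that bit 1 means the least significant bit and that the seed is 1 (which reproduces the first table row); `bit_cycle_length` and `multiplicative_lcg` are illustrative names.

```python
def bit_cycle_length(values, bit):
    """Smallest p such that the given bit (1 = least significant) repeats with period p."""
    bits = [(v >> (bit - 1)) & 1 for v in values]
    for p in range(1, len(bits) // 2 + 1):
        if all(bits[i] == bits[i + p] for i in range(len(bits) - p)):
            return p
    return None   # no period found within the sample

def multiplicative_lcg(a, m, seed, count):
    xs, x = [], seed
    for _ in range(count):
        x = (a * x) % m
        xs.append(x)
    return xs

xs = multiplicative_lcg(25173, 2**16, 1, 256)
for bit in range(1, 9):
    print(bit, bit_cycle_length(xs, bit))
```

A cycle length of 1 corresponds to a bit that never changes, as with bits 1 and 2 in the table.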
-
Examining Bits of a Mixed LCG
$x_n = (25{,}173\, x_{n-1} + 13{,}849) \bmod 2^{16}$
testgenerator(@r2,1,20)
  n   decimal   binary
---  --------  -----------------
  1    39022   10011000 01101110
  2    61087   11101110 10011111
  3    20196   01001110 11100100
  4    45005   10101111 11001101
  5     3882   00001111 00101010
  6    21259   01010011 00001011
  7    65216   11111110 11000000
  8    19417   01001011 11011001
  9    30502   01110111 00100110
 10    20919   01010001 10110111
 11    26076   01100101 11011100
 12    16421   01000000 00100101
 13    44130   10101100 01100010
 14    63139   11110110 10100011
 15    32824   10000000 00111000
 16    14513   00111000 10110001
 17    51934   11001010 11011110
 18    36303   10001101 11001111
 19    35284   10001001 11010100
 20     8573   00100001 01111101
bit 1: cycle (10) of length 2
bit 2: cycle (1100) of length 4
bit 3: cycle (11110000) of length 8
In general: kth bit follows cycle of length $2^k$
Typical of mixed LCG with modulus $2^k$
-
LCG Cautions
Properties guaranteed only if computations are exact: no roundoff
  use integer arithmetic without overflow
Low-order bits not very random, high-order bits better
  if one wants k bits and k < machine word length, better to choose the high-order k bits than the low-order k bits
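Choosing high-order bits amounts to a right shift instead of a mask. The sketch below reuses the mixed LCG with multiplier 25,173, increment 13,849 and modulus 2^16 with seed 1; the constants `WORD_BITS` and `K` and the function name are assumptions for the example.

```python
def mixed_lcg(count, seed=1, a=25173, b=13849, m=2**16):
    xs, x = [], seed
    for _ in range(count):
        x = (a * x + b) % m
        xs.append(x)
    return xs

WORD_BITS = 16   # width of the generator's output
K = 3            # number of bits wanted

xs = mixed_lcg(16)
low = [x & ((1 << K) - 1) for x in xs]       # low-order K bits
high = [x >> (WORD_BITS - K) for x in xs]    # high-order K bits

print(low[:8], low[8:])     # the second group repeats the first
print(high[:8], high[8:])   # no such short repetition
```

The low three bits only ever walk through one fixed cycle of eight values, which is why the shift is the safer choice.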
-
Tausworthe Generators
Significant interest in huge random numbers
  cryptographic applications want many-bit random numbers
  produce k-bit numbers by
    producing a random sequence of bits
    chunking the bit stream into k-bit quantities
1965: Tausworthe generator
  $b_n = c_{q-1} b_{n-1} \oplus c_{q-2} b_{n-2} \oplus c_{q-3} b_{n-3} \oplus \cdots \oplus c_0 b_{n-q}$
  $c_i$ and $b_i$ are binary variables
  $\oplus$ is the xor operation (mod 2 addition)
  uses last q bits of bit stream to compute next bit
  autoregressive, order q: AR(q)
AR(q) generator maximum period = $2^q - 1$
-
Tausworthe Generator Notation
Characteristic polynomial notation
  characteristic polynomial: $x^7 + x^3 + 1$
  $b_{n+7} \oplus b_{n+3} \oplus b_n = 0$, n = 0, 1, 2, ...
  $b_{n+7} = b_{n+3} \oplus b_n$, n = 0, 1, 2, ...
  $b_n = b_{n-4} \oplus b_{n-7}$, n = 7, 8, 9, ...
Most polynomials for Tausworthe generators are trinomials
Period depends on characteristic polynomial
  if period = $2^q - 1$, characteristic polynomial is a primitive polynomial
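The recurrence for $x^7 + x^3 + 1$ can be run directly on a list of bits. This sketch assumes an all-ones seed for the first seven bits; the function names `tausworthe_bits` and `bit_period` are illustrative.

```python
def tausworthe_bits(count, seed_bits=(1, 1, 1, 1, 1, 1, 1)):
    """Bits of b_n = b_{n-4} xor b_{n-7}, starting from seven seed bits."""
    bits = list(seed_bits)
    while len(bits) < count:
        bits.append(bits[-4] ^ bits[-7])
    return bits[:count]

def bit_period(bits, q=7):
    """Offset at which the first q bits reappear, i.e. the period of the stream."""
    start = bits[:q]
    for p in range(1, len(bits) - q + 1):
        if bits[p:p + q] == start:
            return p
    return None

bits = tausworthe_bits(300)
print("".join(map(str, bits[:14])))   # 11111110000111
print(bit_period(bits))               # 127 = 2^7 - 1
```

Because the next bit depends only on the previous seven, the stream repeats as soon as the seven seed bits reappear.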
-
Implementing Tausworthe Generators
Linear feedback shift registers
  [Figure: linear feedback shift register for $x^7 + x^3 + 1$, with cells $b_{n-1}$ through $b_{n-7}$, feedback input $b_n$, and output "out"]
  $b_{n+7} \oplus b_{n+3} \oplus b_n = 0$, n = 0, 1, 2, ...
  $b_{n+7} = b_{n+3} \oplus b_n$, n = 0, 1, 2, ...
  $b_n = b_{n-4} \oplus b_{n-7}$, n = 7, 8, 9, ...
Disadvantage of Tausworthe generators
  while sequence is good overall, local behavior may not be
  known to perform negatively on runs up and down test
  first-order serial correlation almost 0
  suspected that some polynomials may give poor high-order correlations
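A shift register for this polynomial fits in a single integer. The sketch below is one possible layout, assumed here rather than taken from the slide's figure: bit i of `state` holds $b_{n-1-i}$, and the new bit is shifted in at the low end.

```python
def lfsr_step(state):
    """Advance a 7-bit register; bit i of state holds b_{n-1-i}."""
    new_bit = ((state >> 3) ^ (state >> 6)) & 1   # b_{n-4} xor b_{n-7}
    state = ((state << 1) | new_bit) & 0x7F        # shift in b_n, drop b_{n-7}
    return state, new_bit

state = 0x7F   # all ones; any non-zero start works
out = []
for _ in range(127):
    state, bit = lfsr_step(state)
    out.append(bit)

print(state == 0x7F)   # True: the register is back at its start after 127 steps
print(sum(out))        # 64 ones (and 63 zeros) in one full period
```

An all-zero register would stay at zero forever, which is why zero is excluded as a starting state.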
-
Generating k-bit Random Numbers
k-bit random numbers $x_n$ from binary sequence $b_n$
Generalized feedback shift register method (Lewis & Payne '73)
  $x_n = 0.b_n b_{n+s} b_{n+2s} \ldots b_{n+(k-1)s}$
  s is carefully selected
    delay $s \geq k$: $x_n$ and $x_j$ have no bits in common for $n \neq j$
    s relatively prime to $2^q - 1$: guarantees full period for $x_n$
Advantage
  $x_n$ can be generated very efficiently with wide-word shift and exclusive-or operations
Requires
  storing an array of seed numbers
  careful initialization of seed array
-
Extended Fibonacci Generators
Fibonacci sequence: $x_n = x_{n-1} + x_{n-2}$
Fibonacci RNG: $x_n = (x_{n-1} + x_{n-2}) \bmod m$
Properties
  not very good randomness
  high serial correlation
Extended Fibonacci generator (Marsaglia 1983): $x_n = (x_{n-5} + x_{n-17}) \bmod 2^k$
  state: ring buffer with 17 values
  initialization
    save integers in 17 values (not all integers even)
    initialize j=16, k=4 cursors for buffer
  generate
    x = B[j] + B[k]
    B[j] = x
    j = j - 1 mod 17; k = k - 1 mod 17
    return x
  Properties
    passes most statistical tests
    period = $2^k(2^{17} - 1)$ (much longer than LCGs)
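The ring-buffer steps translate almost line for line into code. This is a sketch, not Marsaglia's published implementation: the class name, the 32-bit default word size (`bits`) and the seed values 1..17 are assumptions made for the example.

```python
class ExtendedFibonacci:
    """x_n = (x_{n-5} + x_{n-17}) mod 2^bits, kept in a 17-entry ring buffer."""

    def __init__(self, seeds, bits=32):
        if len(seeds) != 17 or all(s % 2 == 0 for s in seeds):
            raise ValueError("need 17 seed integers, not all even")
        self.mask = (1 << bits) - 1
        self.buf = [s & self.mask for s in seeds]
        self.j, self.k = 16, 4   # cursors: oldest value and the lag-5 value

    def next(self):
        x = (self.buf[self.j] + self.buf[self.k]) & self.mask
        self.buf[self.j] = x     # newest value overwrites the oldest
        self.j = (self.j - 1) % 17
        self.k = (self.k - 1) % 17
        return x

gen = ExtendedFibonacci(list(range(1, 18)))
print([gen.next() for _ in range(3)])   # [22, 20, 18]
```

Because both cursors move backwards, B[0] always sits next to the most recent value and B[16] starts as the oldest, so the cursors stay 5 and 17 steps behind the value being produced.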
-
Some Combined Generators
Can combine 2 or more generators to produce a better one
Adding random numbers from 2 or more generators
  if $x_n$ and $y_n$ are random sequences in [0, m-1], then $w_n = (x_n + y_n) \bmod m$ can be used as a random number
  why do this? can increase period and randomness if two generators have different periods
Exclusive-or random numbers from 2 or more generators
  Santha & Vazirani (1984): xor of 2 random n-bit streams generates a more random sequence
Shuffle
  use sequence a to pick which recent element in sequence b to return
  Marsaglia & Bray (1964)
    keep 100 items of sequence b
    use sequence a to select which to return next and replace
  claim: better k-distributivity than LFSR methods
  problem: not easy to skip long sequence for multi-stream simulations
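The shuffle scheme can be sketched with two small LCGs standing in for sequences a and b. The parameters of the second generator (4097, 1, 2^16) are hypothetical values picked only so the two sequences differ, and the function names are made up for this example.

```python
def lcg(a, b, m, seed):
    x = seed
    while True:
        x = (a * x + b) % m
        yield x

def shuffled(selector, source, selector_modulus, table_size=100):
    """Return items of `source` in an order chosen by `selector`."""
    table = [next(source) for _ in range(table_size)]   # recent items of sequence b
    while True:
        index = next(selector) * table_size // selector_modulus
        yield table[index]
        table[index] = next(source)                      # replace the item just used

M = 2**16
seq_a = lcg(25173, 13849, M, seed=1)    # selects the table slot
seq_b = lcg(4097, 1, M, seed=7)         # supplies the values (hypothetical parameters)
combined = shuffled(seq_a, seq_b, selector_modulus=M)
out = [next(combined) for _ in range(1000)]
print(out[:5])
```

Scaling the selector value by `table_size / selector_modulus` uses its high-order part to pick the slot, in line with the caution about low-order LCG bits.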
-
Seed Selection Issues
Wrong combination of seed and RNG can hurt
  especially if RNG is flawed
  e.g. seed might be RNG fixed point
Cases
  one stream needed
    if RNG has full period, then any seed as good as another
  multiple streams needed
    e.g. queue simulation requires
      interarrival time stream
      service time stream
    requires special care!
-
Seed Selection Guidelines I
Don't use 0
  multiplicative LCGs and Tausworthe generators would stick at 0
Avoid even values
  seed should be odd for multiplicative LCG with $m = 2^k$
  for full period generators, all non-zero values equally good
Don't subdivide one stream
  don't use a single stream for all random variables
  might be a strong correlation between items in same stream
Use non-overlapping streams
  each stream requires separate seed
  don't use same seed for 2 or more streams!
  if seeds are bad, streams will overlap and not be independent
  right way: select seeds so streams don't overlap at all
  example: need 3 streams of 20,000 numbers
    pick $u_0$ as seed for first stream
    pick $u_{20,000}$ as seed for second stream
    pick $u_{40,000}$ as seed for third stream
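The three-stream example can be sketched by stepping one generator forward to collect the seeds. This assumes the mixed LCG with multiplier 25,173, increment 13,849 and modulus 2^16 (period 65,536, enough for 60,000 numbers); the helper names are invented for the illustration.

```python
def step(x, a=25173, b=13849, m=2**16):
    return (a * x + b) % m

def stream_seeds(first_seed, streams, length):
    """Seeds u_0, u_length, u_(2*length), ... taken from one underlying sequence."""
    seeds, x = [], first_seed
    for _ in range(streams):
        seeds.append(x)
        for _ in range(length):
            x = step(x)
    return seeds

def stream(seed, length):
    xs, x = [], seed
    for _ in range(length):
        x = step(x)
        xs.append(x)
    return xs

seeds = stream_seeds(first_seed=1, streams=3, length=20_000)
streams = [stream(s, 20_000) for s in seeds]
all_values = [x for s in streams for x in s]
print(seeds)
print(len(set(all_values)) == len(all_values))   # True: the streams do not overlap
```

Stepping 20,000 times to find each seed is fine at this scale; for long streams a jump-ahead formula would be the usual replacement.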
-
Seed Selection Guidelines II
Reuse seeds in successive replications
  if simulation experiment is replicated several times, can use seeds from end of previous replication in next one
Don't use random seeds
  simulation can't be reproduced
  impossible to guarantee multiple streams won't overlap
-
Myths I
A complex set of operations leads to random results
  complicated code does not guarantee a random sequence of numbers that will pass tests of uniformity and independence
A single test of goodness suffices
  sequence 0, 1, ..., m-1 is not random but passes chi-square test; will fail run test
  use as many tests as possible
Pseudo-random numbers are unpredictable
  e.g. can identify LCG parameters with a few numbers and predict
  LCG unsuitable for cryptographic applications where unpredictability is desired
Some seeds are better than others
  e.g. odd vs. even, avoid particular seeds, etc.
  may be true for some generators, but these should be avoided!
  any non-zero seed should produce equally valid results
  e.g. for $x_n = (9806 x_{n-1} + 1) \bmod (2^{17} - 1)$, 37,911 is a fixed point!
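Both the predictability myth and the fixed point can be checked on the generator $x_n = (9806 x_{n-1} + 1) \bmod (2^{17} - 1)$. The sketch assumes the observer already knows the modulus and sees three consecutive outputs; `recover_parameters` is a hypothetical helper, and `pow(..., -1, m)` needs Python 3.8 or later.

```python
M = 2**17 - 1   # modulus, assumed known to the observer (it is prime)

def step(x, a=9806, b=1, m=M):
    return (a * x + b) % m

def recover_parameters(x0, x1, x2, m):
    """Solve x1 = a*x0 + b and x2 = a*x1 + b (mod m) for a and b."""
    a = (x2 - x1) * pow(x1 - x0, -1, m) % m   # requires x1 - x0 to be invertible mod m
    b = (x1 - a * x0) % m
    return a, b

x0 = 1
x1 = step(x0)
x2 = step(x1)
x3 = step(x2)

a, b = recover_parameters(x0, x1, x2, M)
print(a, b)                       # 9806 1
print((a * x2 + b) % M == x3)     # True: the next output is predicted
print(step(37911) == 37911)       # True: 37,911 maps to itself
```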
-
Myths II
Accurate implementation is not important
  period and randomness are guaranteed only if formula is implemented without overflow or truncation
  overflows and truncations can
    change the path of a generator
    reduce the period
Bits of successive words are equally-randomly distributed
  if an algorithm produces a k-bit wide number, randomness is only guaranteed when all k bits are used
  unless specified otherwise, assume any particular bit position (or sequence thereof) will not be equally random
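The effect of overflow can be imitated by masking the product to 32 bits before taking the modulus. The multiplier 16807 and modulus 2^31 - 1 are not from the slides; they are a widely used multiplicative LCG chosen here only because the product exceeds 32 bits after two steps.

```python
A, M = 16807, 2**31 - 1   # illustrative parameters, not taken from the slides

def exact_step(x):
    return (A * x) % M                 # Python integers never overflow

def truncated_step(x):
    product = (A * x) & 0xFFFFFFFF     # keep only the low 32 bits, as a 32-bit register would
    return product % M

def run(step, seed, count):
    xs, x = [], seed
    for _ in range(count):
        x = step(x)
        xs.append(x)
    return xs

print(run(exact_step, 1, 4))
print(run(truncated_step, 1, 4))
```

The two runs agree for the first two values and part ways at the third, after which the truncated version follows a different path entirely.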
-
What's Used Today: MATLAB
rand function
  lagged Fibonacci generator
  seed cache of 32 floating point numbers
  combined with a shift register random integer generator
core: j ^= (j17); j ^= (j
-
What's Used Today: R
Mersenne-Twister (Matsumoto and Nishimura, 1998) [default]
  twisted GFSR based on Mersenne primes
  seed: 624-dimensional set of 32-bit integers + a cursor
  period: $2^{19937} - 1$
  equi-distribution in 623 consecutive dimensions (whole period)
  [note: variant of MT for independent parallel streams exists too]
Knuth-TAOCP (Knuth, 1997)
  GFSR using lagged Fibonacci sequences with subtraction
  X[j] = (X[j-100] - X[j-37]) mod $2^{30}$
  seed: the set of the 100 last numbers + cyclic shift of buffer
  period: about $2^{129}$
Knuth-TAOCP-2002
  initialization of GFSR from seed was altered
-
What's Used Today: R (continued)
Wichmann-Hill
  seed: integer vector of length 3
    seed[i] is in 1:(p[i] - 1)
    p is the length 3 vector of primes, p = (30269, 30307, 30323)
  cycle length: 6.9536e12 = prod(p-1)/4
  reference: Applied Statistics (1984) 33, 123
Marsaglia-Multicarry
  multiply-with-carry RNG (Marsaglia)
  seed: two integers, all values allowed
  period: > $2^{60}$
  has passed all tests (according to Marsaglia)
Super-Duper (Marsaglia)
  doesn't pass the MTUPLE test of the Diehard battery
  period: about 4.6*10^18 for most initial seeds
  seed: 2 integers (first: all values allowed; second: odd value)
  default seeds are the Tausworthe and congruence long integers
-
What's Used Today: Linux
random function
  non-linear additive feedback-based generator
  state: 8, 32, 64, 128, or 256 bytes
  all bits considered random
rand function
  bottom 12 bits go through cyclic pattern
  higher-order bits more random