Random Number Generators.

24
07/04/22 1 Modeling and Modeling and Simulation - 8 Simulation - 8 Random Number Generators.

description

Random Number Generators. There is no such thing as a "random number generator", just processes that will generate sequences of numbers that will satisfy more and more statistical tests for "randomness". - PowerPoint PPT Presentation

Transcript of Random Number Generators.

Page 1: Random Number Generators.

04/24/23 1

Modeling and Simulation - 8Modeling and Simulation - 8

• Random Number Generators.

Page 2: Random Number Generators.

04/24/23 2

Modeling and Simulation - 8Modeling and Simulation - 8There is no such thing as a "random number generator", just processes that will generate sequences of numbers that will satisfy more and more statistical tests for "randomness".In some cases the process will generate an unrepeatable sequence - not really desirable from the point of view of creating "repeatable tests", but probably desirable from the point of view of satisfying tests for randomness - while in other cases the sequence will be completely determined by its first element - and still pass an acceptable battery of randomness tests.Basically, what one wants is a way of producing numbers that appear to be independent draws from the U(0,1) distribution, in that they pass a series of statistical tests.

Page 3: Random Number Generators.

04/24/23 3

Modeling and Simulation - 8Modeling and Simulation - 8Characteristics of a good random number generator.1) The numbers produced should appear uniformly distributed on [0, 1] and uncorrelated.2) The generator should be fast and not require much memory - the first condition can be satisfied by a large table, but the second can't.3) The stream of random numbers should be reproducible, both for debugging purposes and for comparison between different systems (or ones where just some parameters have been changed).4) There should be an ability to generate several independent streams.

Page 4: Random Number Generators.

04/24/23 4

Modeling and Simulation - 8Modeling and Simulation - 8One of the earliest generators, credited to von Neumann and Morgenstern (the originators of the theory of games - although John von Neumann was involved in the origination of much that has existed only since 1940), uses the so-called mid-square method. It simply starts from a four digit number, squares it, takes the middle four digits of the result, squares that, etc. It is not very good - tends to zero out fairly quickly - but was the starter.

Page 5: Random Number Generators.

04/24/23 5

Modeling and Simulation - 8Modeling and Simulation - 8Linear Congruential Generators.A sequence of numbers Z1, Z2, …, is defined recursively

Zi = (aZi-1 + c) (mod m),

where m (the modulus), a (the multiplier), c (the increment) and Z0 (the seed) are positive integers.To obtain a sequence of random numbers in [0, 1], we need to divide by m: Ui = Zi/m.In addition to non-negativity, the integers m, a, c and Z0 should satisfy the inequalities:

0 < m, a < m, c < m, Z0 < m.There are at least two obvious problems with this scheme.

Page 6: Random Number Generators.

04/24/23 6

Modeling and Simulation - 8Modeling and Simulation - 8 problem 1:

Zi = ai Z0 +c ai −1( )a−1

⎡ ⎣ ⎢

⎤ ⎦ ⎥ modm( )

The sequence is completely determined from the four parameters - not surprising, but the formula is surprisingly simple…problem 2: the only values attainable are

0, 1/m, 2/m, 3/m, …, i/m, …, (m - 1)/m,and this may be already unduly optimistic since choice for the other parameters and the approximations introduced by floating point computations may substantially reduce this set.This implies that we should expect cycles: sequences Z0, Z1, …, Zn, where Z0 = Zn, and such that no other pair 0 ≤ j < k ≤ n satisfies Zj = Zk. The length of the cycle is called the period of the generator. It should be obvious that the period is at most m.

Page 7: Random Number Generators.

04/24/23 7

Modeling and Simulation - 8Modeling and Simulation - 8Theorem: The Linear Congruence Generator Zi = (aZi-1 + c) (mod m),

has full period (= m) if and only if:a) m and c are relatively prime;b) if q is a prime factor of m, then q divides a - 1;c) if 4 divides m, then 4 divides a - 1.

Prescription: take m to be the largest prime representable in the hardware and go. Noooo…. Why not? Efficiency of operations: the remainder after division by m needs to be obtained, and division is, usually, a slow operation. Choosing m to be of the form 2b, allows one to use hardware operations like shifts and make use of controlled overflow effects, thus speeding up the generation. Choosing a = 2l + 1 allows for a faster multiplication by a. Unfortunately, this has undesired effects on the properties of the generator - even though one can still have full period. The "Law of no free lunch" strikes again.

Page 8: Random Number Generators.

04/24/23 8

Modeling and Simulation - 8Modeling and Simulation - 8If c = 0, then we lose the full period property. It appears still possible to obtain period m - 1 - with a careful choice of m and a.The choice m = 2b appears to have various negatives: either much shorter periods, large gaps in [0, 1], or other bad statistical properties.If m is chosen to be the largest representable prime, choosing a such that the smallest integer l for which al - 1 is divisible by m is l = m - 1 (a is a primitive element modulo m), will give period m - 1. This kind of generator is called a prime modulus multiplicative LCG.A difficulty with this method is the determination of a: no good recipes seem to exists, and trial and error can be rather time consuming...

Page 9: Random Number Generators.

04/24/23 9

Modeling and Simulation - 8Modeling and Simulation - 8Multiple Recursive Generators.A general pattern for a generator would be provided by a function of the type

Zi = g(Zi-1, Zi-2, …, Zi-q) (mod m), q ≥ 1.The linear congruence generators are a special case; a few generators are given as follows.

Quadratic Congruential Generator: g(Zi-1, Zi-2, …, Zi-q) = a Z2

i-1 + b Zi-1 + c

Multiple Recursive Generator: g(Zi-1, Zi-2, …, Zi-q) = a1 Zi-1 + a2 Zi-2 + ... + aq Zi-q

"Fibonacci" Generator: (Fibonacci numbers for q = 2)g(Zi-1, Zi-2, …, Zi-q) = Zi-1 + Zi-q

Page 10: Random Number Generators.

04/24/23 10

Modeling and Simulation - 8Modeling and Simulation - 8Obviously, the choices of the parameters determine the periods and the statistical properties. The first method is related to the mid-square one; the second gives rise to large periods (up to mq - 1); the third has reasonably large periods but unacceptable statistical properties.A last possibility is to change both the multiplier a and the addend c according to two linear recursive generator streams each time we want a new random variate (Zi). It appears that both large periods and good statistical properties can be obtained: the cost is the extra time and complexity - nothing new here.

Page 11: Random Number Generators.

04/24/23 11

Modeling and Simulation - 8Modeling and Simulation - 8Composite Generators.One can try to take two or more separate generators and combine them in some way to generate the final sequence of random numbers. The hope is that combining two bad generators in some relatively inexpensive way will give a good sequence - longer period and better statistical behavior.Shuffling: use the first generator to generate [V1, V2, …, Vk]. Use the second generator to generate an integer in {1, 2, …, k}, say J. Now pick VJ. Replace VJ by the next number generated by the first generator, and repeat the process.Additivity: let {Z1i} and {Z2i} denote the sequences generated by two different generators with different moduli. Let Zi = (Z1i - Z2i) (mod m) for some integer m. Finally set Ui = Zi/m for the (hopefully) uniform variate in [0, 1]. These appear to have long period and good statistical properties, while each of the individual generators does not need large multipliers.

Page 12: Random Number Generators.

04/24/23 12

Modeling and Simulation - 8Modeling and Simulation - 8Tweaking MRGs.J different MRGs, with functions g1, g2, …, gJ, are implemented simultaneously to to form the sequences {Z1,k}, {Z2,k}, …, {ZJ,k}. Let m1 be the modulus from the first one; let 1, 2, …, k be specified constants, and define

Yj = (1 Z1,j + 2 Z2,j +…+ k ZJ,j) (mod m1)

Uj =Yj/m1.See code - for a portable implementation with period approximately 3.1*1057, and excellent statistical properties

Page 13: Random Number Generators.

04/24/23 13

Modeling and Simulation - 8Modeling and Simulation - 8Testing Random Number Generators.We will limit ourselves to introducing some empirical tests - to be applied to actual sequences produced by a generator. No theoretical discussion will be pursued.1) Uniformity. This is just a special case of the 2 test with all parameters known.Divide [0, 1] into k subintervals of equal length and generate U1, U2, …, Un, where k is fairly large (k ≥ 100) and n is large enough so that n/k ≥ 5 (recall that it is recommended that each interval contain at least 5 sample points). Let fj be the number of the Uj's in the j-th subinterval, and compute

χ2 =kn

fj −nk

⎛ ⎝

⎞ ⎠

2

j=1k∑

which, for large n, will have an approximate chi-square distribution with k - 1 degrees of freedom, under the null hypothesis that the Ui's are IID U(0, 1) random variables.

Page 14: Random Number Generators.

04/24/23 14

Modeling and Simulation - 8Modeling and Simulation - 8We reject the null hypothesis at level , if 2 ≥ 2k-1,1-

where 2k-1,1- is the upper 1 - critical point of the chi-square distribution with k - 1 degrees of freedom.Example:

Page 15: Random Number Generators.

04/24/23 15

Modeling and Simulation - 8Modeling and Simulation - 82) Serial Test.This is a generalization of the chi-square test to higher dimensions. Pick a positive integer d and construct d-dimensional vectors U1 = (U1, …, Ud), U2 = (Ud+1, …, U2d), …, Un, points in the d-dimensional hypercube [0, 1]d. Divide [0, 1] into k equal subintervals, and let fj1j2...jd

be the number of Uj's having first component in subinterval j1, second in subinterval j2, etc.. Let χ2 d( )=kd

n... f j1 j2 ...jd − n

kd⎛ ⎝

⎞ ⎠ j d=1

k∑j2=1k∑j1=1

k∑which has an approximate chi-square distribution with kd - 1 degrees of freedom. Again, make sure that each small hypercube is likely to contain at least 5 samples.Why this test? If the individual Ui's are correlated, the distribution of the d-vectors Ui will deviate from d-dimensional uniformity. If d = 2, this means that the pairs (Ui, Ui+1) will tend to cluster along a diagonal - the SW-NE one if positively correlated..

Page 16: Random Number Generators.

04/24/23 16

Modeling and Simulation - 8Modeling and Simulation - 8Example:

Page 17: Random Number Generators.

04/24/23 17

Modeling and Simulation - 8Modeling and Simulation - 83) Runs Test. (or Runs-Up test)This is a test of independence, not of uniformity.A run-up is a sequence Ui ≤ Ui+1 ≤ Ui+2 ≤ … ≤ Ui+n, of maximal length (i.e., any longer one, from either end, will violate the inequality). Now count the number of runs:

ri = numberof runsuptolength i, fori =1,2,...,5;numberof runsupof length ≥6, fori =6.

⎧ ⎨ ⎩ The test statistic is given by the formula

R=1n

aij ri −nbi( ) rj −nbj( )j=16∑i=1

6∑whereand the matrix of the aij is given by:

b1,b2,...,b6( ) = 16, 5

24, 11120,

19720,

295040,

1840

⎛ ⎝

⎞ ⎠

Page 18: Random Number Generators.

04/24/23 18

Modeling and Simulation - 8Modeling and Simulation - 8

aij[ ]=

4,520.4 9,044.9 13,568 18,091 22,615 27,8929,044.9 18,097 27,139 36,187 45,234 55,789

13,568 27,139 40,721 54,281 67,852 83,68518,091 36,187 54,281 72,414 90,470 111,58022,615 45,234 67,852 90,470 113,262 139,47627,892 55,789 83,685 111,580 139,476 172,860

⎢ ⎢ ⎢ ⎢ ⎢

⎥ ⎥ ⎥ ⎥ ⎥

A derivation, and appropriate references can be found in D. Knuth, The Art of Computer Programming, Vol II, Seminumerical Algorithms.For n ≥ 4000 (or larger), R will have an approximate chi-square distribution with 6 degrees of freedom, under the null hypothesis that the Ui's are IID random variables.

Page 19: Random Number Generators.

04/24/23 19

Modeling and Simulation - 8Modeling and Simulation - 8Example:

Page 20: Random Number Generators.

04/24/23 20

Modeling and Simulation - 8Modeling and Simulation - 84) Correlation Test.Do the generated Uj's exhibit discernible correlation? Simply compute estimates of the correlation at lags k = 1, 2, …, l.

Recall that the correlation at lag j in a sequence of random variables X1, X2, …, is defined as

ρj =C j

C0=Cov Xi,Xi+j( )

C0=E Xi Xi+j( )−E Xi( )E Xi+j( )

Var Xi( )We are interested in determining whether we have a uniform distribution, so that Xi = Ui (the samples are - hopefully -uniformly distributed in [0, 1]). From this hypothesis we have E(Ui) = 1/2, Var(Ui) = 1/12. We have Cj = E(Ui Ui+j) - 1/4, C0 = 1/12. Finally,

j = 12 E(Ui Ui+j) -3.How do we estimate it? Or, how do we estimate E(Ui Ui+j) ?

Page 21: Random Number Generators.

04/24/23 21

Modeling and Simulation - 8Modeling and Simulation - 8If n is large, we can use a subsequence, and we should obtain an acceptable estimate. What is this subsequence? Let

h = floor((n - 1)/j) - 1.Then:so that we are just using the sequence U1, U1+j, U1+2j,…

If the Uj's are independent, one can also show that

ˆ ρ j = 12h+1 U1+kjU1+(k+1)jk=0

h∑ −3

Var ˆ ρ j( ) =13h+7h+1( )2 .

Finally, under the null hypothesis that j = 0 - assuming n large - one can show that the test statistic

Aj =ˆ ρ j

Var ˆ ρ j( )has an approximate standard normal distribution. This gives us a test for zero correlation with lag j at level by rejecting the zero correlation hypothesis if |Aj| > z1-/2.

Page 22: Random Number Generators.

04/24/23 22

Modeling and Simulation - 8Modeling and Simulation - 8The test should be carried out at several values of j.Example:

Page 23: Random Number Generators.

04/24/23 23

Modeling and Simulation - 8Modeling and Simulation - 8Some observations on empirical tests.The primary objection is that they all suffer from testing only what has been generated: what happens after one stops? Is there any guarantee that we did not stop just before our generator "goes bad"? What does "going bad" mean?There were a large number of studies about environmental causes for cancers - triggered by geographical clusters of cancers. Unfortunately, clustering is the way we would expect random occurrences to present themselves. The clustering could be noticeable enough to cause a town to start all kinds of studies, while the reality is that there are no unusual factors acting on the appearance of the cancers: statistics caused a panic.

Page 24: Random Number Generators.

04/24/23 24

Modeling and Simulation - 8Modeling and Simulation - 8Theoretical Tests: See Knuth, Vol. 2 - and, to some extent, our textbook.Discussion of the C code.