Download - cs668-lec10-MonteCarlo.ppt

7/27/2019 cs668-lec10-MonteCarlo.ppt

http://slidepdf.com/reader/full/cs668-lec10-montecarloppt 1/65

Lecture 10 Outline

Monte Carlo methods

History of methods

Sequential random number generators

Parallel random number generators

Generating non-uniform random numbers

Monte Carlo case studies



Monte Carlo Methods

Monte Carlo is another name for statisticalsampling methods of great importance to physicsand computer science

Applications of Monte Carlo Method

Evaluating integrals of arbitrary functions of 6+dimensions

Predicting future values of stocks

Solving partial differential equations Sharpening satellite images

Modeling cell populations

Finding approximate solutions to NP-hard

problems



• In 1738, Swiss physicist and mathematician Daniel Bernoulli published

Hydrodynamica which laid the basis for the kinetic theory of gases: great

numbers of molecules moving in all directions, that their impact on asurface causes the gas pressure that we feel, and that what we experience

as heat is simply the kinetic energy of their motion.

• In 1859, Scottish physicist James Clerk Maxwell formulated the

distribution of molecular velocities, which gave the proportion of molecules having a certain velocity in a specific range. This was the

first-ever statistical law in physics. Maxwell used a simple thought

experiment: particles must move independent of any chosen coordinates,

hence the only possible distribution of velocities must be normal in each

coordinate.

• In 1864, Ludwig Boltzmann, a young student in Vienna, came across

Maxwell‟s paper and was so inspired by it that he spent much of his

long, distinguished, and tortured life developing the subject further.

An Interesting History

http://en.wikipedia.org/wiki/Daniel_Bernoulli

http://en.wikipedia.org/wiki/Kinetic_theory_of_gases

http://en.wikipedia.org/wiki/Heat

http://en.wikipedia.org/wiki/James_Clerk_Maxwell

http://en.wikipedia.org/wiki/Maxwell_distribution

http://en.wikipedia.org/wiki/Ludwig_Boltzmann

http://en.wikipedia.org/wiki/Ludwig_Boltzmann

http://en.wikipedia.org/wiki/Maxwell_distribution

http://en.wikipedia.org/wiki/James_Clerk_Maxwell

http://en.wikipedia.org/wiki/Heat

http://en.wikipedia.org/wiki/Kinetic_theory_of_gases

http://en.wikipedia.org/wiki/Daniel_Bernoulli



History of Monte Carlo Method Credit for inventing the Monte Carlo method is shared by Stanislaw Ulam,

John von Neuman and Nicholas Metropolis.

Ulam, a Polish born mathematician, worked for John von Neumann on theManhattan Project. Ulam is known for designing the hydrogen bomb withEdward Teller in 1951. In a thought experiment he conceived of the MCmethod in 1946 while pondering the probabilities of winning a card game of solitaire.

Ulam, von Neuman, and Metropolis developed algorithms for computer implementations, as well as exploring means of transforming non-random problems into random forms that would facilitate their solution via statisticalsampling. This work transformed statistical sampling from a mathematicalcuriosity to a formal methodology applicable to a wide variety of problems. Itwas Metropolis who named the new methodology after the casinos of MonteCarlo. Ulam and Metropolis published a paper called “The Monte CarloMethod” in Journal of the American Statistical Association, 44 (247), 335-341, in 1949.



Solving Integration Problems via Statistical Sampling:

Monte Carlo Approximation

How to evaluate integral of f(x)?



Integration Approximation

Can approximate using another function g(x)




Can approximate by taking the average or expected value




Estimate the average by taking N samples



Monte Carlo Integration

Im = Monte Carlo estimate

N = number of samples

x1, x2, …, x N are uniformly distributed random numbers between aand b




We have the definition of expected value and how to estimate it.

Since the expected value can be expressed as an integral, the integral

is also approximated by the sum.

To simplify the integral, we can substitute g(x) = f(x)p(x).



Variance

The variance describes how much the sampled values vary fromeach other.

Variance proportional to 1/N



Variance

Standard Deviation is just the square root of the variance

Standard Deviation proportional to 1 / sqrt(N)

Need 4X samples to halve the error



Variance

Problem: Variance (noise) decreases slowly

Using more samples only removes a small amount of noise



Variance Reduction

There are several ways to reduce the variance Importance Sampling

Stratified Sampling

Quasi-random Sampling

Metropolis Random Mutations



Importance Sampling

Idea: use more samples in important regions of the function If function is high in small areas, use more samples there



Importance Sampling

Want g/p to have low variance

Choose a good function p similar to g:



Stratified Sampling

Partition S into smaller domains Si

Evaluate integral as sum of integrals over Si

Example: jittering for pixel sampling

Often works much better than importance sampling in practice



Parallelism in Monte Carlo Methods

Monte Carlo methods often amenable to

parallelism

Find an estimate about p times faster

OR

Reduce error of estimate by p1/2



Random versus Pseudo-random

Virtually all computers have “random number”generators

Their operation is deterministic

Sequences are predictable

More accurately called “pseudo-random number”generators

In this chapter “random” is shorthand for “pseudo-random”

“RNG” means “random number generator”



Properties of an Ideal RNG

Uniformly distributed

Uncorrelated

Never cycles

Satisfies any statistical test for randomness

Reproducible

Machine-independent

Changing “seed” value changes sequence

Easily split into independent subsequences

Fast

Limited memory requirements



No RNG Is Ideal

Finite precision arithmetic finite number

of states cycles

Period = length of cycle

If period > number of values needed,

effectively acyclic

Reproducible correlations

Often speed versus quality trade-offs



Linear Congruential RNGs

M c X a X ii mod)( 1

Multiplier

Additive constant

Modulus

Sequence depends on choice of seed, X 0



Period of Linear Congruential RNG

Maximum period is M

For 32-bit integers maximum period is 232,

or about 4 billion

This is too small for modern computers

Use a generator with at least 48 bits of

precision



Producing Floating-Point Numbers

X i, a, c, and M are all integers

X is range in value from 0 to M -1

To produce floating-point numbers in range

[0, 1), divide X i by M



Defects of Linear Congruential RNGs

Least significant bits correlated

Especially when M is a power of 2

k -tuples of random numbers form a lattice

Points tend to lie on hyperplanes

Especially pronounced when k is large



Lagged Fibonacci RNGs

qi pii X X X

p and q are lags, p > q * is any binary arithmetic operation

Addition modulo M

Subtraction modulo M Multiplication modulo M

Bitwise exclusive or



Properties of Lagged Fibonacci RNGs

Require p seed values

Careful selection of seed values, p, and q

can result in very long periods and goodrandomness

For example, suppose M has b bits

Maximum period for additive laggedFibonacci RNG is (2 p -1)2b-1



Ideal Parallel RNGs

All properties of sequential RNGs

No correlations among numbers in different

sequences

Scalability

Locality



Parallel RNG Designs

Manager-worker

Leapfrog

Sequence splitting

Independent sequences



Manager-Worker Parallel RNG

Manager process generates random

numbers

Worker processes consume them

If algorithm is synchronous, may achieve

goal of consistency

Not scalable

Does not exhibit locality



Leapfrog Method

Process with rank 1 of 4 processes



Properties of Leapfrog Method

Easy modify linear congruential RNG to

support jumping by p

Can allow parallel program to generatesame tuples as sequential program

Does not support dynamic creation of new

random number streams



Sequence Splitting

Process with rank 1 of 4 processes



Properties of Sequence Splitting

Forces each process to move ahead to its

starting point

Does not support goal of reproducibility

May run into long-range correlation

problems

Can be modified to support dynamiccreation of new sequences



Independent Sequences

Run sequential RNG on each process

Start each with different seed(s) or other

parameters

Example: linear congruential RNGs with

different additive constants

Works well with lagged Fibonacci RNGs

Supports goals of locality and scalability



Statistical Simulation: Metropolis Algorithm

Metropolis algorithm. [Metropolis, Rosenbluth, Rosenbluth, Teller,

Teller 1953] Simulate behavior of a physical system according to principles of

statistical mechanics.

Globally biased toward "downhill" lower-energy steps, but

occasionally makes "uphill" steps to break out of local minima.

Gibbs-Boltzmann function. The probability of finding a physical

system in a state with energy E is proportional to e -E / (kT), where T > 0

is temperature and k is a constant.

For any temperature T > 0, function is monotone decreasing

function of energy E.

System more likely to be in a lower energy state than higher one.

T large: high and low energy states have roughly same

probability

T small: low energy states are much more probable



Metropolis algorithm.

Given a fixed temperature T, maintain current state S.

Randomly perturb current state S to new state S' N(S). If E(S') E(S), update current state to S'

Otherwise, update current state to S' with probability e - E / (kT),

where E = E(S') - E(S) > 0.

Convergence Theorem. Let f S(t) be fraction of first t steps in whichsimulation is in state S. Then, assuming some technical conditions,

with probability 1:

Intuition. Simulation spends roughly the right amount of time in each

state, according to Gibbs-Boltzmann equation.

limt

f S (t ) 1

Z e E (S ) /(kT ),

where Z e E (S ) /(kT )

S N (S )

.



Simulated Annealing Simulated annealing.

T large

probability of accepting an uphill move is large. T small uphill moves are almost never accepted.

Idea: turn knob to control T.

Cooling schedule: T = T(i) at iteration i.

Physical analog.

Take solid and raise it to high temperature, we do not expect it to

maintain a nice crystal structure.

Take a molten solid and freeze it very abruptly, we do not expect

to get a perfect crystal either.

Annealing: cool material gradually from high temperature,

allowing it to reach equilibrium at succession of intermediate

lower temperatures.



Other Distributions

Analytical transformations

Box-Muller Transformation

Rejection method



Analytical Transformation-probability density function f(x)

-cumulative distribution F(x)

In theory of probability , a quanti le function of a distribution is theinverse of its cumulative distribution function.

http://en.wikipedia.org/wiki/Theory_of_probability

http://en.wikipedia.org/wiki/Probability_distribution

http://en.wikipedia.org/wiki/Inverse_function

http://en.wikipedia.org/wiki/Cumulative_distribution_function

http://en.wikipedia.org/wiki/Cumulative_distribution_function

http://en.wikipedia.org/wiki/Inverse_function

http://en.wikipedia.org/wiki/Probability_distribution

http://en.wikipedia.org/wiki/Theory_of_probability



Exponential Distribution:An exponential distribution arises naturally when modeling the time between independent

events that happen at a constant average rate and are memoryless. One of the few cases

where the quartile function is known analytically.

1.0

m xe

m x f

/1)(

m xe x F /1)(

)1ln()(1 umu F



Example 1:

Produce four samples from an exponential

distribution with mean 3

Uniform sample: 0.540, 0.619, 0.452, 0.095

Take natural log of each value and multiply

by -3

Exponential sample: 1.850, 1.440, 2.317,7.072



Example 2:

Simulation advances in time steps of 1 second

Probability of an event happening is from an exponential

distribution with mean 5 seconds

What is probability that event will happen in next second?

F(x=1/5) =1 - exp(-1/5)) = 0.181269247

Use uniform random number to test for occurrence of

event (if u < 0.181 then „event‟ else „no event‟)



Normal Distributions:


Cannot invert cumulative distribution

function to produce formula yielding

random numbers from normal (gaussian)distribution

Box-Muller transformation produces a pair

of standard normal deviates g 1 and g 2 froma pair of normal deviates u1 and u2




repeat

v 1 2u 1 - 1

v 2

2u 2 - 1r v 1 2 + v 2

2

until r > 0 and r < 1

f sqrt (-2 ln r /r )

g 1 f v 1

g 2 f v 2

This is a consequence of

the fact that the chi-

square distribution withtwo degrees of freedom

is an easily-generated

exponential random

variable.



Example

Produce four samples from a normal

distribution with mean 0 and standard

deviation 1

u 1 u 2 v 1 v 2 r f g 1 g 2

0.234 0.784 -0.532 0.568 0.605 1.290 -0.686 0.732

0.824 0.039 0.648 -0.921 1.269

0.430 0.176 -0.140 -0.648 0.439 1.935 -0.271 -1.254



Rejection Method



Example

Generate random variables from this

probability density function

otherwise,0

4/2/4if ),28/()84(

4/0if ,sin

)(

x x

x x

x f



Example (cont.)

otherwise,0

4/20if ),4/2/(1)(

x xh

)2/2/()4/2(

)2/2/()4/2(

otherwise,0

4/20if ,2/2)( x xh

So h(x) f(x) for all x



Example (cont.)

x i u i u i h(x i ) f(x i ) Outcome

0.860 0.975 0.689 0.681 Reject

1.518 0.357 0.252 0.448 Accept

0.357 0.920 0.650 0.349 Reject

1.306 0.272 0.192 0.523 Accept

Two samples from f(x) are 1.518 and 1.306



Case Studies (Topics Introduced)

Temperature inside a 2-D plate (Randomwalk)

Two-dimensional Ising model(Metropolis algorithm)

Room assignment problem (Simulatedannealing)

Parking garage (Monte Carlo time)

Traffic circle (Simulating queues)



Temperature Inside a 2-D Plate

Random walk



Example of Random Walk

}3,2,1,0{410 uu

13121322



NP-Hard Assignment Problems

TSP:Find a tour of US cities that minimizes distance.



Physical Annealing

Heat a solid until it melts

Cool slowly to allow material to reach state

of minimum energy Produces strong, defect-free crystal with

regular structure



Simulated Annealing

Makes analogy between physical annealing

and solving combinatorial optimization

problem Solution to problem = state of material

Value of objective function = energy

associated with state Optimal solution = minimum energy state



How Simulated Annealing Works

Iterative algorithm, slowly lower T

Randomly change solution to create

alternate solution Compute , the change in value of objective

function

If < 0, then jump to alternate solution

Otherwise, jump to alternate solution with

probability e- /T



Performance of Simulated

Annealing Rate of convergence depends on initial value of T

and temperature change function

Geometric temperature change functions typical;e.g., T i+1 = 0.999 T i

Not guaranteed to find optimal solution

Same algorithm using different random number streams can converge on different solutions

Opportunity for parallelism



Convergence

Starting with higher

initial temperature

leads to more iterations

before convergence



Parking Garage

Parking garage has S stalls

Car arrivals fit Poisson distribution with

mean A: Exponentially distributed inter-arrival times

Stay in garage fits a normal distribution

with mean M and standard deviation M /S



Implementation Idea

101.2 142.1 70.3 91.7 223.1

64.2

Current Time

Times Spaces Are Available

15

Car Count Cars Rejected

2



Summary (1/3)

Applications of Monte Carlo methods

Numerical integration

Simulation

Random number generators

Linear congruential

Lagged Fibonacci



Summary (2/3)

Parallel random number generators

Manager/worker

Leapfrog

Sequence splitting

Independent sequences

Non-uniform distributions

Analytical transformations Box-Muller transformation

(Rejection method)



Summary (3/3)

Concepts revealed in case studies

Monte Carlo time

Random walk

Metropolis algorithm

Simulated annealing

Modeling queues