Monte Carlo Integration I
Digital Image Synthesisg g yYung-Yu Chuang
with slides by Pat Hanrahan and Torsten Moller
Introduction
)ωp,( ooL )ω,p( oeL
iiiio ωθcos)ωp,()ω,ωp,(2
dLf is• The integral equations generally don’t have
analytic solutions, so we must turn to numerical methods.
• Standard methods like Trapezoidal integration p gor Gaussian quadrature are not effective for high-dimensional and discontinuous integrals.g g
Numerical quadrature
• Suppose we want to calculate , but can’t solve it analytically The approximations
b
adxxfI )(
can t solve it analytically. The approximations through quadrature rules have the form
n
iii xfwI
1)(ˆ
which is essentially the weighted sum of samples of the function at various pointsp p
Midpoint rule
convergenceconvergence
Trapezoid rule
convergenceconvergence
Simpson’s rule
• Similar to trapezoid but using a quadratic polynomial approximationpolynomial approximation
convergenceconvergence
assuming f has a continuous fourth derivative.
Curse of dimensionality and discontinuity
• For an sd f i ffunction f,
• If the 1d rule has a convergence rate of O(n-r), the sd rule would require a much larger number (ns) of samples to work as well as the 1d one. Thus, the convergence rate is only O(n-r/s).
• If f is discontinuous, convergence is O(n-1/s) for , g ( )sd.
Randomized algorithms
• Las Vegas v.s. Monte CarloL V l i h i h b • Las Vegas: always gives the right answer by using randomness.
• Monte Carlo: gives the right answer on the average. Results depend on random numbers used, but statistically likely to be close to the right answer.
Monte Carlo integration
• Monte Carlo integration: uses sampling to estimate the values of integrals It only estimate the values of integrals. It only requires to be able to evaluate the integrand at arbitrary points making it easy to implementarbitrary points, making it easy to implementand applicable to many problems.If l d it t th t • If n samples are used, its converges at the rate of O(n-1/2). That is, to cut the error in half, it is
t l t f ti necessary to evaluate four times as many samples.
• Images by Monte Carlo methods are often noisy. Most current methods try to reduce noise.
Monte Carlo methods
• AdvantagesE t i l t– Easy to implement
– Easy to think about (but be careful of statistical bias)R b t h d ith l i t d d – Robust when used with complex integrands and domains (shapes, lights, …)Efficient for high dimensional integrals– Efficient for high dimensional integrals
• Disadvantages– Noisy– Slow (many samples needed for convergence)
Basic concepts
• X is a random variableA l i f i d i bl i • Applying a function to a random variable gives another random variable, Y=f(X).
• CDF (cumulative distribution function) }Pr{)( xXxP
• PDF (probability density function): nonnegative, sum to 1 xdP )(
}{)(
sum to 1
i l if d i bl ξ ( id d dx
xdPxp )()(
• canonical uniform random variable ξ (provided by standard library and easy to transform to th di t ib ti )other distributions)
Discrete probability distributions• Discrete events Xi with probability pi
n0ip
1
1n
ii
p
ip
• Cumulative PDF (distribution)ip
j
P p 1
• Construction of samples:1
j ii
P p
U
• Construction of samples:To randomly select an event,
Select Xi if 1i iP U P 0iP 0
Uniform random variable 3X
Continuous probability distributions
• PDF ( )p x Uniform
( ) 0p x
• CDF1x
(1) 1P ( )P x
0
( ) ( )P x p x dx ( ) Pr( )P x X x
P ( ) ( )X d
0Pr( ) ( )X p x dx
10( ) ( )P P 10
Expected values
• Average value of a function f(x) over some distribution of values p(x) over its domain Ddistribution of values p(x) over its domain D
dxxpxfxfE )()()(
• Example: cos function over [0, π], p is uniform
Dp dxxpxfxfE )()()(
p [ , ], p
01cos)cos( dxxxE 00cos)cos( dxxxp
Variance
• Expected deviation from the expected valueF d l f if i h • Fundamental concept of quantifying the error in Monte Carlo methods
2)()()]([ xfExfExfV
Properties
)()( xfaExafE
ii XfEXfE )()(
ii
)()( 2 xfVaxafV )()( xfVaxafV
22 )()()( xfExfExfV )()()( xfExfExfV
Monte Carlo estimator
• Assume that we want to evaluate the integral of evaluate the integral of f(x) over [a,b]Gi if d • Given a uniform random variable Xi over [a,b], M t C l ti t Monte Carlo estimator
N
XfabF )(
says that the expected
i
iN XfN
F1
)(
says that the expected value E[FN] of the estimator FN equals the estimator FN equals the integral
General Monte Carlo estimator
• Given a random variable X drawn from an arbitrary PDF p(x) then the estimator isarbitrary PDF p(x), then the estimator is
N
iXfF )(1
i i
N XpNF
1 )(
• Although the converge rate of MC estimator is O(N1/2), slower than other integral methods, its ( )converge rate is independent of the dimension, making it the only practical method for high dimensional integral
Convergence of Monte Carlo
• Chebyshev’s inequality: let X be a random variable with expected value μ and variance σ2 variable with expected value μ and variance σ2. For any real number k>0,
2
1}|Pr{|k
kX
• For example, for k= , it shows that at least half of the value lie in the interval
2)2,2(
• Let , the MC estimate FN becomes)(/)( iii XpXfY N1
),(
i
iN YN
F1
1
Convergence of Monte Carlo• According to Chebyshev’s inequality,
1
2][|][|Pr NNN
FVFEF
YVN
YVN
YVN
YN
VFVN
i
N
i
N
iN1111][ 22
Plugging into Chebyshev’s inequality
NNNN i
ii
ii
iN ][1
21
21
• Plugging into Chebyshev’s inequality,
21
][1||Pr YVIFN
So, for a fixed threshold, the error decreases at
||PrN
IFN
, ,the rate N-1/2.
Properties of estimators
• An estimator FN is called unbiased if for all N
That is, the expected value is independent of N.QFE N ][
• Otherwise, the bias of the estimator is defined as
• If the bias goes to zero as N increases the
QFEF NN ][][
• If the bias goes to zero as N increases, the estimator is called consistent
0][li F 0][lim NN
F
QFE ][lim QFE NN
][lim
Example of a biased consistent estimator
• Suppose we are doing antialiasing on a 1d pixel, to determine the pixel value we need to to determine the pixel value, we need to evaluate , where is the filter function with
1
0)()( dxxfxwI )(xw
1
1)( dfilter function with• A common way to evaluate this is
0
1)( dxxw
N
N
i
N
i iiN
Xw
XfXwF 1
)(
)()(
• When N=1, we have i iXw
1)(
XfX )()( IdxxfXfEXw
XfXwEFE
1
011
111 )()]([
)()()(][
Example of a biased consistent estimator
• When N=2, we have
Idxdxxwxw
xfxwxfxwFE
1
0
1
0 2121
22112 )()(
)()()()(][
• However, when N is very large, the bias approaches to zeropp
N
i ii
N
XfXwNF
1
1
)()(1
N
i i
N
XwN 1
)(1
XfXN
1)()(1liIdxxfxw
dxxw
dxxfxw
XwN
XfXwNFE
N
i i
i iiNNN
1
01
0
1
0
1
1)()(
)(
)()(
)(1lim
)()(lim][lim
N i iN 01
)(
Choosing samples
• N
iN X
XfN
F)()(1
• Carefully choosing the PDF from which samples are drawn is an important technique to reduce
i iXpN 1 )(
are drawn is an important technique to reduce variance. We want the f/p to have a low variance Hence it is necessary to be able to variance. Hence, it is necessary to be able to draw samples from the chosen PDF.
• How to sample an arbitrary distribution from a • How to sample an arbitrary distribution from a variable of uniform distribution?– Inversionve s o– Rejection– Transform
Inversion method
• Cumulative probability distribution function
( ) Pr( )P x X x
• Construction of samplesSolve for X=P-1(U)
1
Solve for X P (U)
• Must know:
U
• Must know:1. The integral of p(x)
12. The inverse function P-1(x)X
0
Proof for the inversion method
• Let U be an uniform random variable and its CDF is P (x) x We will show that Y P-1(U) has CDF is Pu(x)=x. We will show that Y=P-1(U) has the CDF P(x).
Proof for the inversion method
• Let U be an uniform random variable and its CDF is P (x) x We will show that Y P-1(U) has CDF is Pu(x)=x. We will show that Y=P-1(U) has the CDF P(x).
)())(()(Pr)(PrPr 1 xPxPPxPUxUPxY u
because P is monotonic,)()( 2121 xPxPxx
Thus, Y’s CDF is exactly P(x).)()( 2121
Inversion method
• Compute CDF P(x)
• Compute P-1(x)
• Obtain ξξ
• Compute Xi=P-1(ξ)
Example: power functionIt is used in sampling Blinn’s microfacet model.
nnxxp )(
Example: power function
A
It is used in sampling Blinn’s microfacet model.
• Assume( ) ( 1) np x n x
11 1
0 0
11 1
nn xx dx
n n
0 0
1( ) nP x x
1 1~ ( ) ( ) nX p x X P U U
Trick (It only works for sampling power distribution)
1 2 1max( , , , , )n nY U U U U
11
n
1
1
Pr( ) Pr( ) n
i
Y x U x x
Example: exponential distribution
useful for rendering participating media.axcexp )(
• Compute CDF P(x)
• Compute P-1(x)• Compute P (x)
Obt i ξ• Obtain ξ• Compute Xi=P-1(ξ)
Example: exponential distribution
useful for rendering participating media.axcexp )(
10
dxce ax ac
• Compute CDF P(x)
0axx as edsaexP 1)(
• Compute P-1(x)
edsaexP 1)(0
1• Compute P (x)
Obt i ξ
)1ln(1)(1 xa
xP
• Obtain ξ• Compute Xi=P-1(ξ) ln1)1ln(1X )(
aa
Rejection method
• Sometimes, we can’t integrate into CDF or invert CDF
1
( )I f x dx
invert CDF
0
( )f
dx dy
( )y f x
• Algorithm( )y f x
dx dy
• Algorithm
Pick U1 and U2Accept U if U < f(U )Accept U1 if U2 < f(U1)
• Wasteful? Efficiency = Area / Area of rectangle• Wasteful? Efficiency = Area / Area of rectangle
Rejection method
• Rejection method is a dart-throwing method without performing integration and inversion without performing integration and inversion.
1. Find q(x) so that p(x)<Mq(x)2. Dart throwing
a. Choose a pair (X, ξ), where X is sampled from q(x)b. If (ξ<p(X)/Mq(X)) return X
• Equivalently, we pick a qu vale tly, we p c a point (X, ξMq(X)). If it lies beneath (X)it lies beneath p(X)then we are fine.
Why it works
• For each iteration, we generate Xi from q. The sample is returned if ξ<p(X)/Mq(X) which sample is returned if ξ<p(X)/Mq(X), which happens with probability p(X)/Mq(X). S th b bilit t t i • So, the probability to return x is
xpxpxq )()()(
whose integral is 1/M
MxMqxq
)()(
whose integral is 1/M• Thus, when a sample is returned (with
probability 1/M) X is distributed according to probability 1/M), Xi is distributed according to p(x).
Example: sampling a unit diskvoid RejectionSampleDisk(float *x, float *y) {
float sx, sy;, y;do {
sx = 1.f -2.f * RandomFloat();sy = 1.f -2.f * RandomFloat();
} while (sx*sx + sy*sy > 1.f)*x = sx; *y = sy;*x = sx; *y = sy;
}
π/4~78.5% good samples, gets worse in higher dimensions, for example, for sphere, π/6~52.4% dimensions, for example, for sphere, π/6 52.4%
Transformation of variables
• Given a random variable X from distribution px(x)to a random variable Y y(X) where y is one toto a random variable Y=y(X), where y is one-to-one, i.e. monotonic. We want to derive the distribution of Y p (y)distribution of Y, py(y).
• )(}Pr{)}(Pr{))(( xPxXxyYxyP xy P (x)
• PDF:
dxxdP
dxxydP xy )())((
x
Px(x)
dxdx
)(xpdyydPdyyp y )()(
xPy(y)
)(xpxdxy
dydxyyp y
y )(
)()(1
xpdyyp
y
)()( xpdxyyp xy
Example
xxpx 2)( XY sin
2
11
1sin2
cos2)()(cos)(
yy
xxxpxyp xy
1cos yx
Transformation method
• A problem to apply the above method is that we usually have some PDF to sample from not we usually have some PDF to sample from, not a given transformation.Gi d i bl X ith ( ) d • Given a source random variable X with px(x) and a target distribution py(y), try transform X into t th d i bl Y th t Y h th to another random variable Y so that Y has the distribution py(y).
• We first have to find a transformation y(x) so that Px(x)=Py(y(x)). Thus, y
))(()( 1 xPPxy xy
Transformation method
• Let’s prove that the above transform works.W fi h h d i bl ( )We first prove that the random variable Z= Px(x) has a uniform distribution. If so, then )(1 ZPy
should have distribution Py from the inversion method.
Thus Z is uniform and the transformation works xxPPxPXxXPxZ xxxx ))(()(Pr)(PrPr 11
Thus, Z is uniform and the transformation works.• It is an obvious generalization of the inversion
method in which X is uniform and P (x)=xmethod, in which X is uniform and Px(x)=x.
Example
xxpx )( yy eyp )(y
Example
xxpx )( yy eyp )(y
2)(
2xxPx yy eyP )(
2yyPy ln)(1
2lnln2)2
ln())(()(2
1 xxxPPxy xy
Thus, if X has the distribution , then the
2
xxpx )(random variable has the distribution2lnln2 XY
x
yy eyp )(
Multiple dimensions
We often need the other way around,
Spherical coordinates
• The spherical coordinate representation of directions is idirections is
sinsincossin
ryrx
cosrz
sin|| 2rJ sin|| rJT
),,(sin),,( 2 zyxprrp
Spherical coordinates
• Now, look at relation between spherical directions and a solid anglesdirections and a solid angles
ddd sin• Hence, the density in terms of ,
dpddp )(),( dpddp )(),(
)(sin),( pp )(),( pp
Multidimensional sampling
• Separable case: independently sample X from pxand Y from p p(x y) p (x)p (y)and Y from py. p(x,y)=px(x)py(y)
• Often, this is not possible. Compute the i l d it f ti ( ) fi tmarginal density function p(x) first.
dyyxpxp )()(
• Then compute the conditional density function
dyyxpxp ),()(
• Then, compute the conditional density function
)(),()|( yxpxyp
• Use 1D sampling with p(x) and p(y|x).)(
)|(xp
yp
Sampling a hemisphere
• Sample a hemisphere uniformly, i.e. cp )(
Sampling a hemisphere
• Sample a hemisphere uniformly, i.e. cp )(
21)( p )(1 p
21
c
• Sample first
2sin),( p
sin2
sin),()(22
ddpp
• Now sampling
200
21
)(),()|(
ppp
2)(p
Sampling a hemisphere
• Now, we use inversion technique in order to sample the PDF’ssample the PDF s
cos1''sin)( dP cos1sin)(0
dP
'1)|( dP
• Inverting these:
2'
2)|(
0
dP
• Inverting these:
11cos
2
1
2
Sampling a hemisphere
• Convert these to Cartesian coordinate
11cos
212 1)2cos(cossin x
2
1
2cos
212 1)2sin(sinsin
y
Si il d i ti f f ll h
1cos z
• Similar derivation for a full sphere
Sampling a disk
RIGHT Equi-ArealWRONG Equi-Areal RIGHT Equi ArealWRONG Equi Areal
1
2
2 Ur U
12 U
r U
2U2r U
Sampling a disk
RIGHT Equi-ArealWRONG Equi-Areal RIGHT Equi ArealWRONG Equi Areal
12 U 12 U
2r U2r U
Sampling a disk
• Uniform1),( yxp
ryxrprp ),(),(
• Sample r first.
d 2)()(2
• Then sample
rdrprp 2),()(0
• Then, sample .
21
)(),()|(
rprprp
• Invert the CDF.
2)(rp
2)( rrP )|( rP)( rrP
2
)|( rP
r 2 1r 22
Shirley’s mapping
1r U
2
14UU
Sampling a triangle
0u ( ,1 )u u
01
vu v
v
1u v
u1
121 1 1 (1 ) 1u u
1u v
0 0 00
(1 ) 1(1 )2 2uA dv du u du
( ) 2p u v( , ) 2p u v
Sampling a triangle
• Here u and v are not independent! ( , ) 2p u v
),()|( vupvup
• Conditional probability
d)()( )()|(
vpvup
1
( ) 2 2(1 )u
d
dvvupup ),()(
0 2( ) 2(1 ) (1 )u
P u u du u 0
( ) 2 2(1 )p u dv u 0 11u U
1( | )p v u
0 00( ) 2(1 ) (1 )P u u du u
0 1 2v U U
00 0 00 0
1( | ) ( | )(1 ) (1 )
o ov v vP v u p v u dv dvu u
( | )(1 )
p v uu
0 00 0(1 ) (1 )u u
Cosine weighted hemisphere
cos)( p
dp )(1
2
0 0
2 sincos1 ddc 0 0
2 sincos21
dc 01c
ddd sin
sincos1),( p
Cosine weighted hemisphere
sincos1),( p
sincos),(p
2ii2i1)(2
d
2sinsincos2sincos)(0
dp
1)(p
21
)(),()|(
ppp
1212cos
21)( P )21(cos
21
11
22
2)|( P
2
22 22)|(
22
Cosine weighted hemisphere
• Malley’s method: uniformly generates points on the unit disk and then generates directions by the unit disk and then generates directions by projecting them up to the hemisphere above it.
Vector CosineSampleHemisphere(float u1 float u2){Vector CosineSampleHemisphere(float u1,float u2){Vector ret;ConcentricSampleDisk(u1 u2 &ret x &ret y);ConcentricSampleDisk(u1, u2, &ret.x, &ret.y);ret.z = sqrtf(max(0.f,1.f - ret.x*ret.x -
ret.y*ret.y));ret.y ret.y));return ret;
}}
r
Cosine weighted hemisphere
• Why does Malley’s method work?U i di k li r)(• Unit disk sampling
• Map to hemisphere
rp ),(
),(sin),( r
i),( rY ),( XT
sinr
)()())(( 1 xpxJxTp xTy
)()())(( pp xTy
cos0cos
)(
xJ cos10
)(
xJT
Cosine weighted hemisphere
),( rY ),( XT
sinr),( ),(
)()())(( 1 xpxJxTp )()())(( xpxJxTp xTy
0cos
cos100cos
)(
xJT
sincos),(),( rpJp T ),(),( pp T
Sampling Phong lobe
np cos)(
ncp cos)(
2 2/
1sincos ddc n
0 0
0
1coscos2 dc n 12
c
1cos
1coscos2
dc 11
n1n
21
nc
sincos1)( nnp
sincos
2),(p
Sampling Phong lobe
sincos2
1),( nnp
sincos)1(sincos2
1)(2
0
nn ndnp
sincos)1()'('
0
n dnP
cos)1(coscos)1('cos1'
0
n
n ndn
'cos1
1)1(coscos)1(
11cos0
n
nndn
cos1
11
1cos n
Sampling Phong lobe
sincos2
1),( nnp
1sincos),()|( 21pp
nn
2
'1
2sincos)1()()|(
'
2
np
p n
221)|'(
0
dP
22
Sampling Phong lobe
cosine-weighted hemisphereWhen n=1, it is actually equivalent to
)2,(cos),( 1,n 211
211 2),21(cos
21),(
212cos
21)( P 21 cos1cos1)( nP
222 cos1sin21)sin21(
21
212cos
21
Piecewise-constant 2d distributions
• Sample from discrete 2D distributions. Useful for texture maps and environment lightsfor texture maps and environment lights.
• Consider f(u, v) defined by a set of nunv values f[ ]f[ui, vj].
• Given a continuous [u, v], we will use [u’, v’] to denote the corresponding discrete (ui, vj) indices.
Piecewise-constant 2d distributions
1 1
],[1),(u vn n
jif vufdudvvufIintegral 0 0
],[),(i j
jivu
f fnn
f
vufvuf '')(
integral
i j jivu vufnn
vufdudvvuf
vufvup,)(1
,),(
i iu vufn
d
',)1()()(marginal
f
i
Iduvupvp
),()(marginaldensity
]'[]','[
)(),()|(
vpIvuf
vpvupvup fconditional
probability ][)( pp
Piecewise-constant 2d distributions
),( vuf Distribution2D
low-resmarginal
low-resconditionalmarginal
density)(vp
conditionalprobability
)|( vup)(vp )|( vup
Metropolis sampling
• Metropolis sampling can efficiently generate a set of samples from any non negative function fset of samples from any non-negative function frequiring only the ability to evaluate f.Di d t i l i th • Disadvantage: successive samples in the sequence are often correlated. It is not possible t th t ll b f l to ensure that a small number of samples generated by Metropolis is well distributed over th d i Th i t h i lik the domain. There is no technique like stratified sampling for Metropolis.
Metropolis sampling
• Problem: given an arbitrary function
assuming
generate a set of samples
Metropolis sampling
• StepsG i i i l l – Generate initial sample x0
– mutating current sample xi to propose x’– If it is accepted, xi+1 = x’
Otherwise xi 1 = xiOtherwise, xi+1 xi
• Acceptance probability guarantees distribution is the stationary distribution fis the stationary distribution f
Metropolis sampling
• Mutations propose x’ given xi
• T(x→x’) is the tentative transition probability density of proposing x’ from xy p p g
• Being able to calculate tentative transition probability is the only restriction for the choice probability is the only restriction for the choice of mutations
• a(x→x’) is the acceptance probability of accepting the transition
• By defining a(x→x’) carefully, we ensure
Metropolis sampling
• Detailed balance
stationary distributionstationary distribution
Pseudo code
Pseudo code (expected value)
Binary example I
Binary example II
Acceptance probability
• Does not affect unbiasedness; just varianceW i i h b i i • Want transitions to happen because transitions are often heading where f is large
• Maximize the acceptance probability– Explore state space better– Reduce correlation
Mutation strategy
• Very free and flexible; the only requirement is to be able to calculate transition probabilityto be able to calculate transition probability
• Based on applications and experience• The more mutation, the better• Relative frequency of them is not so importantq y p
Start-up bias
• Using an initial sample not from f’s distribution leads to a problem called start up bias leads to a problem called start-up bias.
• Solution #1: run MS for a while and use the t l th i iti l l t t t current sample as the initial sample to re-start
the process.– Expensive start-up cost– Need to guess when to re-start
• Solution #2: use another available sampling method to start
1D example
1D example (mutation)
1D example
mutation 1 mutation 210,000 iterations
1D example
mutation 1 mutation 2300,000 iterations
1D example
mutation 1 90% mutation 2+ 10% mutation 1
Periodically using uniform mutations increases ergodicity
2D example (image copy)
2D example (image copy)
2D example (image copy)
1 sample i l
8 samples i l
256 samples i lper pixel per pixel per pixel
Top Related