Monte Carlo Integration

Ch 5: Monte Carlo Integration and Variance Reduction

Book: Statistical Computing with R, Maria L. Rizzo, Chapman & Hall/CRC, 2008


Integral estimation

$g(x)$ is a function. We want to compute $\int g(x)\,dx$, assuming the integral is finite.

We use facts from statistical moments to estimate integrals. Recall that if $X$ is a random variable with distribution $f(x)$ (written as $X \sim f(x)$) and $Y = g(X)$ is another random variable, then

$$E(Y) = E[g(X)] = \int g(x) f(x)\,dx.$$

This suggests the following estimator. Let $X_1, X_2, \dots, X_n$ be an i.i.d. sample from $f(x)$. Then

$$\frac{1}{n}\sum_{i=1}^{n} g(X_i)$$

is an unbiased estimator of $E[g(X)]$.

Simple Monte Carlo estimator for an integral over [0,1]

Goal is to estimate $\theta = \int_0^1 g(x)\,dx$.

• Generate $X_1, X_2, \dots, X_m$ i.i.d. Uniform(0,1) random variables.
• $\hat\theta = \frac{1}{m}\sum_{i=1}^{m} g(X_i) \to E[g(X)] = \theta$ with probability 1, by the Strong Law of Large Numbers.

Exercise: Write R code to compute the Monte Carlo estimate of the integral of exp(-x) on the interval [0,1] and compare it to the exact answer.

U(0,1) is used because its density equals 1 on the domain of integration [0,1], so $E[g(X)] = \int_0^1 g(x)\cdot 1\,dx = \theta$.
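Below is one possible answer to the exercise. It is sketched in Python (standard library only) rather than R, so that it runs as-is. The helper name `mc_integral_01`, the seed, and m = 100,000 are our own illustrative choices. The exact value is $1 - e^{-1} \approx 0.63212$.

```python
import math
import random


def mc_integral_01(g, m, rng):
    """Simple Monte Carlo estimate of the integral of g over [0, 1].

    Draws m i.i.d. Uniform(0, 1) values and returns the average of g over them.
    """
    return sum(g(rng.random()) for _ in range(m)) / m


rng = random.Random(12345)          # fixed seed so the run is reproducible
m = 100_000
theta_hat = mc_integral_01(lambda x: math.exp(-x), m, rng)
exact = 1 - math.exp(-1)            # integral of exp(-x) on [0, 1]
print(f"MC estimate: {theta_hat:.5f}   exact: {exact:.5f}")
```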


One step harder: domain [a,b]

Estimate $\theta = \int_a^b g(t)\,dt$.

• One idea is to use a change of variables so that the simple Monte Carlo estimator over [0,1] can be used.
• Specifically, find a function $y(t)$ such that $y(a) = 0$ and $y(b) = 1$, and perform the integration

$$\int_a^b g(t)\,dt = \int_0^1 g(t(y))\,\frac{dt}{dy}\,dy.$$

• The function that works is $y = (t-a)/(b-a)$. Then $t = a + (b-a)y$ and $dt/dy = b - a$, so

$$\theta = (b-a)\int_0^1 g\big(a + (b-a)y\big)\,dy.$$

One step harder: domain [a,b]

Alternatively, find a probability density with limits $(a,b)$, for example the $U(a,b)$ density, and use that.

• The $U(a,b)$ density has the form $f_U(u) = \frac{1}{b-a}\,I(a \le u \le b)$, where $I(\cdot)$ is the indicator function.
• Note that the integral we want is related to the expectation with respect to the $U(a,b)$ density $f_U(u)$ as follows:

$$\int_a^b g(t)\,dt = (b-a)\int_a^b g(t)\,\frac{1}{b-a}\,dt = (b-a)\int_a^b g(u) f_U(u)\,du = (b-a)\,E[g(U)].$$

SAMPLING ALGORITHM:

• Generate $X_1, X_2, \dots, X_m \overset{iid}{\sim} U(a,b)$.
• $\hat\theta = (b-a)\,\frac{1}{m}\sum_{i=1}^{m} g(X_i)$.
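Here is a Python sketch of this sampling algorithm, written as a standalone example. `mc_integral_ab` is a hypothetical helper name. The test integrand $g(t) = e^{-t}$ on $[2, 4]$ is our own choice for checking the result; its exact value is $e^{-2} - e^{-4}$.

```python
import math
import random


def mc_integral_ab(g, a, b, m, rng):
    """Monte Carlo estimate of the integral of g over [a, b].

    Draws X_1..X_m i.i.d. U(a, b) and returns (b - a) times the mean of g(X_i).
    """
    total = sum(g(rng.uniform(a, b)) for _ in range(m))
    return (b - a) * total / m


rng = random.Random(2008)
theta_hat = mc_integral_ab(lambda t: math.exp(-t), 2.0, 4.0, 100_000, rng)
exact = math.exp(-2) - math.exp(-4)     # integral of exp(-t) on [2, 4]
print(theta_hat, exact)
```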

Example 5.3 from book: Non-finite limits

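Use the above approach to estimate the standard normal cdf

$$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt$$

for an arbitrary $x$.

• For $x > 0$, $\Phi(x) = 0.5 + \frac{1}{\sqrt{2\pi}}\int_0^x e^{-t^2/2}\,dt$, which takes us back to finite limits.
• For $x < 0$, $\Phi(x) = 1 - \Phi(-x)$, so use the above method.
• The problem reduces to estimating $\theta = \int_0^x e^{-t^2/2}\,dt$ for $x > 0$.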

Example 5.3: Non-finite limits

• Estimate $\Phi(x) = 0.5 + \frac{1}{\sqrt{2\pi}}\int_0^x e^{-t^2/2}\,dt$ for an arbitrary $x > 0$.
• This could be done by generating $U(0,x)$ random variables, as just shown, but this would require a new generation for every choice of $x$.
• Could we possibly solve the problem for every $x$ by generating just one sample of $U(0,1)$ random variables?

Example 5.3: Non-finite limits

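• Estimate $\theta = \int_0^x e^{-t^2/2}\,dt$ for an arbitrary $x$.
• Use a change of variables with $y = t/x$.
• Then $t = 0 \Rightarrow y = 0$, $t = x \Rightarrow y = 1$, and $dt = x\,dy$.
• The integral to be solved becomes

$$\theta = \int_0^1 x\,e^{-(xy)^2/2}\,dy = E\big[x\,e^{-(xY)^2/2}\big], \quad \text{where } Y \sim U(0,1).$$

SAMPLING ALGORITHM

• Generate $U_1, \dots, U_m \overset{iid}{\sim} U(0,1)$.
• Set $\hat\theta = \frac{1}{m}\sum_{i=1}^{m} x\,e^{-(U_i x)^2/2}$.
• If $x > 0$, $\hat\Phi(x) = 0.5 + \hat\theta/\sqrt{2\pi}$; if $x < 0$, $\hat\Phi(x) = 1 - \hat\Phi(-x)$.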

R Code for Example 5.3

Estimates $\Phi(x)$ for 10 positive values of x ranging from 0.1 to 2.5.

Note that u, and hence g, is a vector; we loop through the vector x.

R's built-in function pnorm computes $\Phi(x)$ directly, so it can be used to check the estimates.

The estimates are close to the pnorm values except for the largest values of x.
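The slide's R code is not reproduced in this transcript. As a hedged stand-in, here is a Python sketch of the Example 5.3 algorithm. It also answers the question raised earlier: a single $U(0,1)$ sample is generated once and reused for every x. The function names, the seed, and m = 10,000 are assumptions. `phi_exact` uses the error function and plays the role of R's pnorm.

```python
import math
import random


def phi_hat_uniform(xs, m, rng):
    """Estimate Phi(x) for every x in xs from ONE shared U(0,1) sample.

    For x > 0: theta = int_0^x exp(-t^2/2) dt = E[x * exp(-(x*U)^2/2)], U ~ U(0,1),
    and Phi(x) = 0.5 + theta / sqrt(2*pi). For x < 0, Phi(x) = 1 - Phi(-x).
    """
    u = [rng.random() for _ in range(m)]          # generated once, reused for all x
    estimates = []
    for x in xs:
        ax = abs(x)                               # work with |x|, then reflect if x < 0
        theta_hat = sum(ax * math.exp(-(ax * ui) ** 2 / 2) for ui in u) / m
        phi_pos = 0.5 + theta_hat / math.sqrt(2 * math.pi)
        estimates.append(phi_pos if x >= 0 else 1 - phi_pos)
    return estimates


def phi_exact(x):
    """Standard normal cdf via the error function (plays the role of R's pnorm)."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


rng = random.Random(53)
xs = [0.1 + i * (2.5 - 0.1) / 9 for i in range(10)]   # 10 values from 0.1 to 2.5
est = phi_hat_uniform(xs, 10_000, rng)
for x, e in zip(xs, est):
    print(f"x = {x:.2f}  MC = {e:.5f}  exact = {phi_exact(x):.5f}")
```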


Example 5.4: Semi-finite limits

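Calculate $\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt$, where you have a standard normal generator at your disposal.

• Let $Z \sim N(0,1)$. Then

$$E[I(Z \le x)] = \int_{-\infty}^{\infty} I(z \le x)\, f_Z(z)\,dz = \int_{-\infty}^{\infty} I(z \le x)\,\frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\,dz = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\,dz = \Phi(x).$$

SAMPLING ALGORITHM

• Generate $Z_1, \dots, Z_m \overset{iid}{\sim} N(0,1)$.
• Set $\hat\Phi(x) = \frac{1}{m}\sum_{i=1}^{m} I(Z_i \le x)$.

By the strong law of large numbers, this estimate converges to the true normal probability $P(Z \le x)$ with probability 1 as $m \to \infty$.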

R Code for Example 5.4

In R's apply function, MARGIN = 1 means the function is applied over rows.
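Likewise, here is a Python sketch of the hit-or-miss algorithm of Example 5.4. It is not the slide's R code, which used apply. Instead, a single standard normal sample is reused for every x in a plain loop. The helper names, the grid of x values, the seed, and m are illustrative assumptions.

```python
import math
import random


def phi_hat_hit_or_miss(xs, m, rng):
    """Estimate Phi(x) = P(Z <= x) as the share of N(0,1) draws that are <= x.

    One standard normal sample is generated and reused for every x in xs.
    """
    z = [rng.gauss(0.0, 1.0) for _ in range(m)]
    return [sum(1 for zi in z if zi <= x) / m for x in xs]


def phi_exact(x):
    """Standard normal cdf via the error function (plays the role of pnorm)."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


rng = random.Random(54)
xs = [-1.0, 0.0, 1.0, 2.0, 2.5]
est = phi_hat_hit_or_miss(xs, 20_000, rng)
for x, e in zip(xs, est):
    print(f"x = {x:5.2f}  hit-or-miss = {e:.4f}  exact = {phi_exact(x):.4f}")
```

Because the same sample is used for every x, the estimates are automatically nondecreasing in x, just like the true cdf.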


General Result

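• Let $f(x)$ be a probability density supported on the set $A$.
• To estimate $\theta = \int_A g(x) f(x)\,dx$, generate $X_1, \dots, X_m \overset{iid}{\sim} f(x)$ and set $\hat\theta = \frac{1}{m}\sum_{i=1}^{m} g(X_i)$.
• Then $\hat\theta \to E(\hat\theta) = \theta$ as $m \to \infty$ with probability 1, by the Strong Law of Large Numbers (SLLN).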
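The general result can be wrapped in one small Python function that accepts any sampler for f. The name `mc_expectation` and the check case are our own illustrative choices. The check uses $X \sim \mathrm{Exp}(1)$, for which $E[X^2] = 2$.

```python
import random


def mc_expectation(g, sampler, m):
    """Estimate theta = E[g(X)] = integral_A g(x) f(x) dx.

    sampler() must return one draw X ~ f; the result is the sample mean of
    g(X_1), ..., g(X_m).
    """
    return sum(g(sampler()) for _ in range(m)) / m


rng = random.Random(7)
# Example: X ~ Exp(1), so E[X^2] = 2.
theta_hat = mc_expectation(lambda x: x * x, lambda: rng.expovariate(1.0), 100_000)
print(theta_hat)
```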

Standard errors

To calculate the standard error of $\hat\theta = \frac{1}{m}\sum_{i=1}^{m} g(X_i)$, we realize that $\hat\theta$ is a sample mean of the independent $g(X_1), g(X_2), \dots, g(X_m)$, and we use basic statistical principles:

$$\mathrm{Var}(\hat\theta) = \mathrm{Var}\Big(\frac{1}{m}\sum_{i=1}^{m} g(X_i)\Big) = \frac{1}{m^2}\sum_{i=1}^{m} \mathrm{Var}(g(X_i)) = \frac{1}{m^2}\, m\sigma^2 = \frac{\sigma^2}{m},$$

where $\sigma^2 = \mathrm{Var}(g(X))$ is the variance of the random variable $g(X)$.

Recall from statistics that $\mathrm{Var}(\bar X) = \sigma^2/n$.

This uses the fact that the variance of a sum of independent random variables is the sum of their variances.

Standard errors

• $\hat\theta$ is a sample mean of the independent $g(X_1), g(X_2), \dots, g(X_m)$. Basic statistical principles give $\mathrm{Var}(\hat\theta) = \sigma^2/m$, where $\sigma^2 = \mathrm{Var}(g(X))$.
• How do we estimate $\sigma^2$? By the sample variance of $g(X_1), g(X_2), \dots, g(X_m)$.
• Recall from statistics that the unbiased estimate of the variance is

$$s^2 = \frac{1}{m-1}\sum_{i=1}^{m}\big(g(X_i) - \hat\theta\big)^2,$$

while the maximum likelihood estimate is

$$\hat\sigma^2 = \frac{1}{m}\sum_{i=1}^{m}\big(g(X_i) - \hat\theta\big)^2.$$

Since $m/(m-1)$ approaches 1 for large $m$, and the user can make $m$ as large as desired, we follow the book and use the second estimate.


Standard errors

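$$\widehat{\mathrm{Var}}(\hat\theta) = \frac{\hat\sigma^2}{m} = \frac{1}{m}\cdot\frac{1}{m}\sum_{i=1}^{m}\big(g(X_i) - \hat\theta\big)^2 = \frac{1}{m^2}\sum_{i=1}^{m}\big(g(X_i) - \hat\theta\big)^2,$$

and

$$\mathrm{s.e.}(\hat\theta) = \frac{1}{m}\Big\{\sum_{i=1}^{m}\big(g(X_i) - \hat\theta\big)^2\Big\}^{1/2}.$$

Be careful to have two factors of $m$ in the denominator of the variance ($m^2$ in total).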
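The two formulas above translate directly into code. This Python sketch uses a hypothetical helper `mc_estimate_with_se` and the illustrative integrand $g(x) = e^{-x}$ on [0,1]. Note the single division by m after the square root: $\sqrt{\sum (g(X_i)-\hat\theta)^2}\,/\,m$ is the same as $\hat\sigma/\sqrt{m}$.

```python
import math
import random


def mc_estimate_with_se(values):
    """Given g(X_1), ..., g(X_m), return (theta_hat, se) using
    theta_hat = mean and se = (1/m) * sqrt(sum((g_i - theta_hat)^2))."""
    m = len(values)
    theta_hat = sum(values) / m
    ss = sum((v - theta_hat) ** 2 for v in values)
    se = math.sqrt(ss) / m          # variance is ss / m**2, so se = sqrt(ss) / m
    return theta_hat, se


rng = random.Random(17)
m = 10_000
vals = [math.exp(-rng.random()) for _ in range(m)]   # g(X_i) = exp(-X_i), X_i ~ U(0,1)
theta_hat, se = mc_estimate_with_se(vals)
print(theta_hat, se)
```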

Confidence intervals (CI)

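• The Central Limit Theorem (CLT) implies that

$$\frac{\hat\theta - E(\hat\theta)}{\sqrt{\mathrm{Var}(\hat\theta)}} \to N(0,1) \quad \text{in distribution as } m \to \infty.$$

• Since $E(\hat\theta) = \theta$, this fact is used to develop a 95% confidence interval for $\theta$.
• For $Z \sim N(0,1)$, $P(-1.96 < Z < 1.96) = 0.95$. Substituting $(\hat\theta - \theta)/\mathrm{s.e.}(\hat\theta)$ for $Z$ gives

$$P\Big(-1.96 < \frac{\hat\theta - \theta}{\mathrm{s.e.}(\hat\theta)} < 1.96\Big) \approx 0.95 \quad\text{and}\quad P\big(\hat\theta - 1.96\,\mathrm{s.e.}(\hat\theta) < \theta < \hat\theta + 1.96\,\mathrm{s.e.}(\hat\theta)\big) \approx 0.95.$$

• A 95% CI for $\theta$ is $\hat\theta \pm 1.96\,\mathrm{s.e.}(\hat\theta)$.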
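A quick way to check that the 95% interval behaves as advertised is to repeat the whole experiment many times and count how often the interval covers the true value. This Python sketch assumes $g(x) = e^{-x}$ on [0,1], whose true value is $1 - e^{-1}$. It uses 400 repetitions of m = 1,000 draws and a hypothetical helper `ci95`. The empirical coverage should land near 0.95, though not exactly at it.

```python
import math
import random


def ci95(values):
    """Approximate 95% CI theta_hat +/- 1.96 * se(theta_hat) from g(X_1..X_m)."""
    m = len(values)
    theta_hat = sum(values) / m
    se = math.sqrt(sum((v - theta_hat) ** 2 for v in values)) / m
    return theta_hat - 1.96 * se, theta_hat + 1.96 * se


rng = random.Random(95)
exact = 1 - math.exp(-1)          # true integral of exp(-x) on [0, 1]
reps, m = 400, 1_000
covered = 0
for _ in range(reps):
    vals = [math.exp(-rng.random()) for _ in range(m)]
    lower, upper = ci95(vals)
    covered += lower < exact < upper   # True counts as 1
print(f"empirical coverage: {covered / reps:.3f}")
```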

Example 5.5

Note that the mean already includes one division by m.

In R, = can be used instead of <- for assignment.

Example 5.5 continued

Let $x = 2$ and $Z \sim N(0,1)$.

• $g(Z) = I(Z < x)$ is a Bernoulli random variable, taking value 1 if $Z < x$ and 0 otherwise:

$$E[g(Z)] = E[I(Z < x)] = 1\cdot P(Z < x) + 0\cdot P(Z \ge x) = P(Z < x) = \Phi(x).$$

• $\Phi(x)$ is the success probability, $P[g(Z) = 1] = \Phi(x)$. Therefore, according to the Bernoulli distribution, $\mathrm{Var}[g(Z)] = \Phi(x)(1 - \Phi(x))$.
• $\hat\theta = \frac{1}{m}\sum_{i=1}^{m} g(Z_i)$ is the average of independent Bernoulli trials. It also equals the proportion of successes out of $m$ trials, each with success probability $\Phi(x)$. The number of successes is Binomially distributed, and the variance of the proportion $\mathrm{Bin}(m, \Phi(x))/m$ is $\Phi(x)(1 - \Phi(x))/m$.

If this does not ring a bell, perhaps this will: $\mathrm{Bin}(n, p)$ has variance $np(1-p)$, so the sample proportion $\mathrm{Bin}(n,p)/n$ has variance $p(1-p)/n$.

Example 5.5 from book

> pnorm(2)
[1] 0.9772499

$\Phi(2) \approx 0.97725$, which would yield the theoretical variance $0.97725(1 - 0.97725)/10{,}000 \approx 2.223\text{e-}06$. The MC variance estimate is very close.

(Figure: R output showing the MC variance estimate.)

Remarks on Example 5.5

1.) Some prefer the second estimate, $\widehat{\mathrm{Var}}(\hat\theta) \approx \hat\Phi(x)(1 - \hat\Phi(x))/m$, rather than the MC estimate for the case of estimating proportions. Either can be used.

2.) The algorithm just shown for estimating functions of the general form $P(Z < x) = E[I(Z < x)]$ is sometimes referred to as the "hit or miss" algorithm. The name comes from the fact that it generates a lot of random variables and records the hits ($Z < x$).

3.) This algorithm could require many simulations, however, if $x$ is at the lower end of the support space of $Z$, for example −0.06 in the example below.
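Here is a Python sketch of Example 5.5 that computes both variance estimates from remark 1. The seed and variable names are our own assumptions, so the numbers will not match the book's output exactly. One observation is worth making explicit. For 0/1 data, the MC estimate with divisor m is algebraically identical to the plug-in $\hat\Phi(x)(1-\hat\Phi(x))/m$, because $\sum_i (g_i - \hat p)^2 = m\hat p(1-\hat p)$. The two differ only if the divisor $m-1$ is used.

```python
import math
import random

rng = random.Random(55)
x, m = 2.0, 10_000
# g(Z_i) = I(Z_i < x) for Z_i ~ N(0,1): the hit-or-miss indicators.
g = [1.0 if rng.gauss(0.0, 1.0) < x else 0.0 for _ in range(m)]
phi_hat = sum(g) / m
# MC variance estimate of phi_hat: (1/m^2) * sum((g_i - phi_hat)^2).
var_mc = sum((gi - phi_hat) ** 2 for gi in g) / m ** 2
# Plug-in Binomial-proportion estimate from remark 1.
var_binom = phi_hat * (1 - phi_hat) / m
# Theoretical value using the exact Phi(2) (about 0.97725).
phi_true = 0.5 * (1 + math.erf(x / math.sqrt(2)))
var_theory = phi_true * (1 - phi_true) / m
print(f"phi_hat = {phi_hat:.5f}")
print(f"MC variance {var_mc:.4e}  plug-in {var_binom:.4e}  theory {var_theory:.4e}")
```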

Efficiency

• Efficiency in general means doing things faster.
• In simulation, it means getting a smaller variance of your estimate for the same number of simulations.
• If $\hat\theta_1$ and $\hat\theta_2$ are two estimators for $\theta$, then $\hat\theta_1$ is more efficient than $\hat\theta_2$ if $\mathrm{Var}(\hat\theta_1)/\mathrm{Var}(\hat\theta_2) < 1$.
• Efficiency is called a second-order property. As the cartoon suggests, first make sure your estimator is correct (unbiased), and only then concern yourself with efficiency.

Notes on efficiency

• Variances are unknown, so their MC estimates are used for efficiency calculations.
• Variances of averages are of order $1/m$: they decrease as the number of simulations increases. So one way to decrease the variance is to increase the number of simulations.
• Sometimes the percent reduction from using $\hat\theta_2$ instead of $\hat\theta_1$ is reported:

$$100 \times \frac{\mathrm{Var}(\hat\theta_1) - \mathrm{Var}(\hat\theta_2)}{\mathrm{Var}(\hat\theta_1)}.$$
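To illustrate the percent-reduction formula, this Python sketch compares the two estimators of $\Phi(2)$ from Examples 5.3 and 5.4. $\hat\theta_1$ is the $U(0,1)$ change-of-variables estimator of Example 5.3, and $\hat\theta_2$ is the hit-or-miss estimator of Example 5.4. The pairing, x = 2, the seed, and m are our own choices. A back-of-the-envelope calculation of the two true variances suggests a reduction of roughly 58% in favor of hit-or-miss at x = 2. So here the simpler estimator is actually the more efficient one.

```python
import math
import random


def sample_var(vals):
    """Variance estimate with divisor m (the version used in these notes)."""
    m = len(vals)
    mean = sum(vals) / m
    return sum((v - mean) ** 2 for v in vals) / m


rng = random.Random(23)
x, m = 2.0, 20_000
# Estimator 1 (Example 5.3): Phi_hat = 0.5 + mean(x*exp(-(x*U)^2/2)) / sqrt(2*pi).
y1 = [0.5 + x * math.exp(-(x * rng.random()) ** 2 / 2) / math.sqrt(2 * math.pi)
      for _ in range(m)]
# Estimator 2 (Example 5.4, hit or miss): Phi_hat = mean(I(Z <= x)).
y2 = [1.0 if rng.gauss(0.0, 1.0) <= x else 0.0 for _ in range(m)]
var1 = sample_var(y1) / m      # MC estimate of Var(theta_hat_1)
var2 = sample_var(y2) / m      # MC estimate of Var(theta_hat_2)
reduction = 100 * (var1 - var2) / var1
print(f"Var1 = {var1:.3e}, Var2 = {var2:.3e}, percent reduction = {reduction:.1f}%")
```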

Power calculations

• Statistical power calculations refer to determining how many samples to collect, or simulations to perform, to get a desired level of accuracy.
• We saw earlier that $\mathrm{Var}(\hat\theta) = \sigma^2/m$, where $\sigma^2$ is the true variance of the object we are taking the average of, $g(X)$.
• Suppose we are planning to run a simulation study that is costly, and we want to determine the number of simulations needed to achieve a standard error below $\varepsilon$. We have an "a priori" estimate of $\sigma^2$ from prior experiments.
• We solve for $m$ to obtain the $m$ that we need:

$$\frac{\sigma}{\sqrt{m}} < \varepsilon \iff m > \frac{\sigma^2}{\varepsilon^2}.$$

Jim Carrey, Bruce Almighty
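Here is the sample-size rule $m > \sigma^2/\varepsilon^2$ as a Python sketch. The pilot run of 1,000 draws of $e^{-X}$, $X \sim U(0,1)$, stands in for the "a priori" estimate from prior experiments. Both the pilot run and the target $\varepsilon = 0.001$ are purely hypothetical. For this g the true $\sigma^2 \approx 0.0328$, so the answer should be roughly 33,000 simulations.

```python
import math
import random


def sims_needed(sigma2, eps):
    """Smallest integer m with sigma / sqrt(m) < eps, i.e. m > sigma^2 / eps^2
    (up to floating-point rounding at exact boundaries)."""
    return math.floor(sigma2 / eps ** 2) + 1


# Hypothetical pilot run giving the a priori estimate of sigma^2.
rng = random.Random(24)
pilot = [math.exp(-rng.random()) for _ in range(1_000)]
mean = sum(pilot) / len(pilot)
sigma2_hat = sum((v - mean) ** 2 for v in pilot) / len(pilot)
m = sims_needed(sigma2_hat, eps=0.001)
print(f"sigma^2 estimate {sigma2_hat:.5f} -> need m = {m} simulations")
```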


Tricks for reducing MC variance

There are some tricks for reducing the variance of MC integration, which ultimately reduce the number of random variables that must be generated. Two of them, antithetic variables and control variates, are covered in Sections 5.4 and 5.5 of the book. These are beyond the scope of the course.


Importance sampling

MOTIVATION:

To calculate $\int_a^b g(t)\,dt$ using MC integration, we have used $U(a,b)$ as a generating density, noting that

$$\int_a^b g(t)\,dt = (b-a)\int_a^b g(t)\,\frac{1}{b-a}\,dt = (b-a)\int_a^b g(u) f_U(u)\,du = (b-a)\,E[g(U)].$$

The sampling algorithm was:

• Generate $X_1, X_2, \dots, X_m \overset{iid}{\sim} U(a,b)$.
• $\hat\theta = (b-a)\,\frac{1}{m}\sum_{i=1}^{m} g(X_i)$.

Importance sampling

• Generate $X_1, X_2, \dots, X_m \overset{iid}{\sim} U(a,b)$.
• $\hat\theta = (b-a)\,\frac{1}{m}\sum_{i=1}^{m} g(X_i)$.

This will not work well if $g$ is not well matched by the $U(a,b)$ density.

(Figure: plot of g(x).)

Importance sampling

$$\int_a^b g(t)\,dt = (b-a)\int_a^b g(t)\,\frac{1}{b-a}\,dt = (b-a)\int_a^b g(u) f_U(u)\,du = (b-a)\,E[g(U)].$$

The idea is to replace the generating density $f_U$ here with a density that is easy to sample from and whose shape more closely matches the function to be integrated.

Importance sampling

GOAL: Calculate $\int g(x)\,dx$.

LOGIC:

• Find a density $f(x)$ that you can generate from, such that $f(x) > 0$ on the set $\{x : g(x) \ne 0\}$. $f(x)$ is called the importance function.
• Let $Y = g(X)/f(X)$ be a transformed random variable of $X$, where $X \sim f(x)$.
• Then

$$E[Y] = E\Big[\frac{g(X)}{f(X)}\Big] = \int \frac{g(x)}{f(x)}\, f(x)\,dx = \int g(x)\,dx$$

gives the required integral.

ALGORITHM:

• Generate $X_1, \dots, X_m \overset{iid}{\sim} f(x)$.
• Set $\hat\theta = \bar Y = \frac{1}{m}\sum_{i=1}^{m} \frac{g(X_i)}{f(X_i)}$.
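Here is a generic Python sketch of the importance-sampling algorithm. The names `importance_sampling` and `g` are hypothetical, and so is the test problem. We assume the integrand $g(x) = e^{-x}/(1+x^2)$ on (0,1) and the importance function $f = \mathrm{Exp}(1)$. The support of f, $(0,\infty)$, covers the set where $g \ne 0$.

```python
import math
import random


def importance_sampling(g, sample_f, density_f, m):
    """Estimate integral g(x) dx as the mean of g(X_i)/f(X_i), X_i ~ f.

    f must be positive wherever g is nonzero; sample_f() draws one X ~ f and
    density_f(x) evaluates f(x).
    """
    total = 0.0
    for _ in range(m):
        x = sample_f()
        total += g(x) / density_f(x)
    return total / m


def g(x):
    # Hypothetical integrand for illustration: exp(-x) / (1 + x^2) on (0, 1), 0 elsewhere.
    return math.exp(-x) / (1 + x * x) if 0 < x < 1 else 0.0


rng = random.Random(29)
# Importance function f = Exp(1): positive on (0, inf), which covers (0, 1).
theta_hat = importance_sampling(g, lambda: rng.expovariate(1.0),
                                lambda x: math.exp(-x), 50_000)
print(theta_hat)
```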

Picking the right f

• Recall from earlier that

$$\mathrm{Var}(\bar Y) = \mathrm{Var}\Big(\frac{1}{m}\sum_{i=1}^{m} Y_i\Big) = \frac{\mathrm{Var}(Y)}{m} = \frac{1}{m}\,\mathrm{Var}\Big(\frac{g(X)}{f(X)}\Big).$$

• We want to choose $f(x)$ such that $g(X)/f(X)$ has little variability.
• The best way to do this is to choose $f$ to mimic the shape of $g$ as closely as possible, so that $g(X)/f(X) \approx c$, a constant, since the variance of a constant is 0.
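To see why a constant ratio is the ideal, consider a hypothetical case where f can be chosen exactly proportional to g. Take $g(x) = e^{-x}$ on (0,1) and $f(x) = e^{-x}/(1-e^{-1})$. Every draw then gives $g(X)/f(X) = 1 - e^{-1}$, which is the exact integral, with zero variance. The catch is that normalizing such an f requires the very integral we want, so this ideal is a guide rather than a recipe. The Python sketch below samples f by inverting its cdf.

```python
import math
import random

C = 1 - math.exp(-1)                 # normalizing constant of exp(-x) on (0, 1)


def g(x):
    return math.exp(-x)              # integrand on (0, 1); true integral is C


def f(x):
    return math.exp(-x) / C          # density with exactly the shape of g


def sample_f(rng):
    # Inverse-cdf draw: F(x) = (1 - exp(-x)) / C, so x = -log(1 - U * C).
    return -math.log(1 - rng.random() * C)


rng = random.Random(30)
ratios = [g(x) / f(x) for x in (sample_f(rng) for _ in range(1_000))]
print(min(ratios), max(ratios))      # every ratio equals C up to rounding
```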

Example from book

Candidate importance functions: f0 = Uniform, f1 = Exp(1), f2 = Cauchy (= t1), f3 = rescaled Exp(1), f4 = rescaled Cauchy.

Note that some of these densities have a bigger support than g.

Example continued

(Figure: g(x) plotted together with the densities f0, f1, f2, f3, f4.)

•  Plot g(x) and each of the f’s.
•  See which f matches the shape of g most closely.
•  It looks as if f3 is the best.

Example continued

(Figure: g/f2, g/f3 and g/f4 plotted.)

•  Plot g(x)/f(x) for each of the f’s.
•  See which is most constant.
•  f3 looks the best.
•  Rescaling the Cauchy (f2 → f4) really helped!

Example continued

(Code output panels: Uniform, Exp(1), Cauchy.)

Note: sampled points outside (0,1) have g(x) = 0, so it does not matter what value you assign to them.

Example continued

(Code output panels: Re-scaled Exp(1), Re-scaled Cauchy.)

Example continued

•  f3 has the smallest standard error, followed by f4.
•  The Cauchy (f2) is the worst. Its support is so much larger than [0,1] that most of the generated values of g/f are 0; in fact, 75% were 0.

(Output: summary statistics of g/f2.)
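Here is a Python sketch that reproduces the comparison. The slides only show plots, so the integrand and candidates are assumptions. We take $g(x) = e^{-x}/(1+x^2)$ on (0,1) and five candidate densities:

- f0 = 1 on (0,1)
- f1 = $e^{-x}$
- f2 = $1/(\pi(1+x^2))$
- f3 = $e^{-x}/(1-e^{-1})$ on (0,1)
- f4 = $4/(\pi(1+x^2))$ on (0,1)

We believe these match the book's example but have not checked them against the slide's code. Draws from the rescaled densities use the inverse-cdf method. In line with the note on the Uniform, Exp(1) and Cauchy output, g checks whether x lies in (0,1) before evaluating exp(-x). This sidesteps overflow for large negative Cauchy draws. The Cauchy puts probability 1/4 on (0,1), so about 75% of its ratios should be zero.

```python
import math
import random

E1 = 1 - math.exp(-1)


def g(x):
    # Assumed integrand; zero outside (0, 1). The interval is checked first so
    # exp(-x) is never evaluated at huge negative Cauchy draws (overflow).
    if not 0 < x < 1:
        return 0.0
    return math.exp(-x) / (1 + x * x)


# Each entry: (name, sampler(rng), density f(x)).
CANDIDATES = [
    ("f0 Uniform(0,1)", lambda r: r.random(), lambda x: 1.0),
    ("f1 Exp(1)", lambda r: r.expovariate(1.0), lambda x: math.exp(-x)),
    ("f2 Cauchy", lambda r: math.tan(math.pi * (r.random() - 0.5)),
     lambda x: 1 / (math.pi * (1 + x * x))),
    ("f3 rescaled Exp(1)", lambda r: -math.log(1 - r.random() * E1),
     lambda x: math.exp(-x) / E1),
    ("f4 rescaled Cauchy", lambda r: math.tan(math.pi * r.random() / 4),
     lambda x: 4 / (math.pi * (1 + x * x))),
]

rng = random.Random(36)
m = 10_000
results = {}
for name, sample, dens in CANDIDATES:
    ratios = [g(x) / dens(x) for x in (sample(rng) for _ in range(m))]
    est = sum(ratios) / m
    se = math.sqrt(sum((q - est) ** 2 for q in ratios)) / m
    zero_frac = sum(1 for q in ratios if q == 0.0) / m
    results[name] = (est, se, zero_frac)
    print(f"{name:20s} estimate {est:.5f}  se {se:.5f}  share of zeros {zero_frac:.2f}")
```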

Importance sampling to calculate expectations

GOAL: For $X \sim f(x)$, we want to calculate $E(g(X)) = \int g(x) f(x)\,dx$.

• Although $f(x)$ is already a probability density, it is not easy to sample from. This happens regularly in Bayesian inference.
• We can still apply importance sampling. Here one needs to find another density $\phi(x)$ to sample from, sometimes referred to as the envelope function, that now closely resembles $g(x)f(x)$.

ALGORITHM:

• Generate $X_1, \dots, X_m \overset{iid}{\sim} \phi(x)$.
• Set $\hat E[g(X)] = \frac{1}{m}\sum_{i=1}^{m} \frac{g(X_i) f(X_i)}{\phi(X_i)}$.

•  All estimates approach the true value of the integral as m approaches infinity, by the SLLN.
•  Despite its simplicity, importance sampling is rarely used for expectation calculations, because an appropriate envelope function is difficult to find.
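Finally, here is a Python sketch of importance sampling for an expectation. The example is ours and deliberately simple. The target is $E[X^2] = 1$ for $X \sim N(0,1)$. We pretend N(0,1) is hard to sample from and use the wider envelope $\phi = N(0, 2^2)$ instead. The names `is_expectation` and `normal_pdf` are hypothetical.

```python
import math
import random


def is_expectation(g, f, sample_phi, phi, m):
    """Estimate E_f[g(X)] by the mean of g(X_i) f(X_i) / phi(X_i), X_i ~ phi."""
    total = 0.0
    for _ in range(m):
        x = sample_phi()
        total += g(x) * f(x) / phi(x)
    return total / m


def normal_pdf(x, sd=1.0):
    """Density of N(0, sd^2)."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


rng = random.Random(37)
# Illustration only: target f = N(0,1), envelope phi = N(0, 2^2), g(x) = x^2.
est = is_expectation(lambda x: x * x, normal_pdf,
                     lambda: rng.gauss(0.0, 2.0), lambda x: normal_pdf(x, 2.0),
                     20_000)
print(est)
```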

End of Chapter 5

Ch 6: MC Methods in Inference

•  Very important applications of what is learned in Chapter 5.
•  Not covered in this course except as potential homework problems.