Monte Carlo Notes From Main Book


Transcript of Monte Carlo Notes From Main Book

  • 7/26/2019 Monte Carlo Notes From Main Book


Markov Chain Monte Carlo Methods

Christian P. Robert

Université Paris Dauphine and CREST-INSEE
http://www.ceremade.dauphine.fr/~xian

3-6 May 2005

    Outline

    Motivation and leading example

    Random variable generation

    Monte Carlo Integration

    Notions on Markov Chains

    The Metropolis-Hastings Algorithm

    The Gibbs Sampler

    MCMC tools for variable dimension problems

    Sequential importance sampling

New [2004] edition: [book cover figure]

Motivation and leading example

    Introduction
    Likelihood methods
    Missing variable models
    Bayesian Methods
    Bayesian troubles
Introduction

Latent structures make life harder!

Even simple models may lead to computational complications, as in latent variable models

    f(x|\theta) = \int f(x, x^*|\theta) \, dx^*

If (x, x^*) observed, fine!

If only x observed, trouble!
Example (Mixture models)

Models of mixtures of distributions:

    X \sim f_j  with probability  p_j,

for j = 1, 2, \ldots, k, with overall density

    X \sim p_1 f_1(x) + \cdots + p_k f_k(x).

For a sample of independent random variables (X_1, \ldots, X_n), sample density

    \prod_{i=1}^{n} \{ p_1 f_1(x_i) + \cdots + p_k f_k(x_i) \}.

Expanding this product involves k^n elementary terms: prohibitive to compute in large samples.
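Note that only the *expansion* of the product is prohibitive: the sample density itself can be evaluated term by term in O(nk) operations. A minimal sketch (the parameter values below are illustrative, not taken from the slides):

```python
import math

def mixture_loglik(xs, weights, means, sds):
    """Log-likelihood of a normal mixture, evaluated term by term:
    O(n*k) work instead of expanding the product into k^n terms."""
    ll = 0.0
    for x in xs:
        dens = sum(
            p * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
            for p, m, s in zip(weights, means, sds)
        )
        ll += math.log(dens)
    return ll

# Illustrative data for a 0.3 N(0,1) + 0.7 N(2,1) mixture
print(mixture_loglik([0.1, 1.9, 2.5], [0.3, 0.7], [0.0, 2.0], [1.0, 1.0]))  # about -4.39
```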
[Figure: likelihood surface in (\mu_1, \mu_2) for the case of the 0.3 N(\mu_1, 1) + 0.7 N(\mu_2, 1) likelihood]
Likelihood methods

Maximum likelihood methods

For an iid sample X_1, \ldots, X_n from a population with density f(x|\theta_1, \ldots, \theta_k), the likelihood function is

    L(\theta|x) = L(\theta_1, \ldots, \theta_k|x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i|\theta_1, \ldots, \theta_k).

Global justifications from asymptotics.

Computational difficulty depends on structure, e.g. latent variables.
Example (Mixtures again)

For a mixture of two normal distributions,

    p \, N(\mu, \tau^2) + (1-p) \, N(\theta, \sigma^2),

likelihood proportional to

    \prod_{i=1}^{n} \left[ p \, \tau^{-1} \varphi\left( \frac{x_i - \mu}{\tau} \right) + (1-p) \, \sigma^{-1} \varphi\left( \frac{x_i - \theta}{\sigma} \right) \right]

containing 2^n terms.
Standard maximization techniques often fail to find the global maximum because of multimodality of the likelihood function.

Example

In the special case

    f(x|\mu, \sigma) = (1-\varepsilon) \exp\{(-1/2) x^2\} + \frac{\varepsilon}{\sigma} \exp\{(-1/2\sigma^2)(x-\mu)^2\}    (1)

with \varepsilon > 0 known, whatever n, the likelihood is unbounded:

    \lim_{\sigma \to 0} \ell(\mu = x_1, \sigma | x_1, \ldots, x_n) = \infty
Missing variable models

The special case of missing variable models

Consider again a latent variable representation

    g(x|\theta) = \int_{\mathcal{Z}} f(x, z|\theta) \, dz

Define the completed (but unobserved) likelihood

    L^c(\theta|x, z) = f(x, z|\theta)

Useful for optimisation algorithms.
The EM Algorithm

Algorithm (Expectation-Maximisation)

Iterate (in m):

1. (E step) Compute

    Q(\theta|\theta^{(m)}, x) = E[\log L^c(\theta|x, Z) \,|\, \theta^{(m)}, x],

2. (M step) Maximise Q(\theta|\theta^{(m)}, x) in \theta and take

    \theta^{(m+1)} = \arg\max_\theta Q(\theta|\theta^{(m)}, x),

until a fixed point [of Q] is reached.
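As an illustration of the two steps (this code is a minimal sketch, not from the slides): EM for the means of a two-component normal mixture with known weight p and unit variances. The E step computes the posterior component probabilities that define Q, and the M step maximises Q, which here reduces to weighted averages:

```python
import math

def em_two_normal_means(xs, p, n_iter=50, mu1=0.0, mu2=1.0):
    """EM for the means of a p*N(mu1,1) + (1-p)*N(mu2,1) mixture,
    with known weight p and unit variances (a minimal sketch)."""
    phi = lambda z: math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    for _ in range(n_iter):
        # E step: posterior probability that each x_i came from component 1
        w = []
        for x in xs:
            a = p * phi(x - mu1)
            b = (1 - p) * phi(x - mu2)
            w.append(a / (a + b))
        # M step: weighted means maximise Q(theta | theta^(m), x)
        mu1 = sum(wi * x for wi, x in zip(w, xs)) / sum(w)
        mu2 = sum((1 - wi) * x for wi, x in zip(w, xs)) / (len(xs) - sum(w))
    return mu1, mu2
```

On data clustered around 0 and 3, e.g. `em_two_normal_means([-0.3, 0.1, 0.2, 2.8, 3.1, 3.3], p=0.5)`, the iterates converge to a local maximum with means near 0 and 3.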
[Figure: histogram of an N(0,1) sample; sample from (1)]
[Figure: likelihood surface of .7 N(\mu_1, 1) + .3 N(\mu_2, 1) and EM steps]
Bayesian Methods

The Bayesian Perspective

In the Bayesian paradigm, the information brought by the data x, realization of

    X \sim f(x|\theta),

is combined with prior information specified by a prior distribution with density

    \pi(\theta)
Central tool

Summary in a probability distribution, \pi(\theta|x), called the posterior distribution.

Derived from the joint distribution f(x|\theta)\pi(\theta), according to

    \pi(\theta|x) = \frac{f(x|\theta)\pi(\theta)}{\int f(x|\theta)\pi(\theta) \, d\theta},

[Bayes Theorem] where

    m(x) = \int f(x|\theta)\pi(\theta) \, d\theta

is the marginal density of X.
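When \theta is restricted to a finite grid of values, Bayes' theorem is a one-liner; a hypothetical sketch (names and values are illustrative):

```python
def grid_posterior(prior, lik):
    """Posterior on a finite grid of theta values via Bayes' theorem:
    pi(theta|x) = f(x|theta) * pi(theta) / m(x),
    where m(x) is the sum of the joint over the grid."""
    joint = [p * l for p, l in zip(prior, lik)]
    m = sum(joint)  # marginal density m(x), the normalising constant
    return [j / m for j in joint]

print(grid_posterior([0.5, 0.5], [0.2, 0.8]))  # [0.2, 0.8]
```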
Central tool... central to Bayesian inference

Posterior defined up to a constant as

    \pi(\theta|x) \propto f(x|\theta) \, \pi(\theta)

  • Operates conditional upon the observations

  • Integrates simultaneously prior information and information brought by x

  • Avoids averaging over the unobserved values of x

  • Coherent updating of the information available on \theta, independent of the order in which i.i.d. observations are collected

  • Provides a complete inferential scope and a unique motor of inference
Bayesian troubles

Conjugate bonanza...

Example (Binomial)

For an observation X \sim B(n, p) the so-called conjugate prior is the family of beta Be(a, b) distributions.

The classical Bayes estimator is the posterior mean

    \frac{\Gamma(a+b+n)}{\Gamma(a+x)\,\Gamma(n-x+b)} \int_0^1 p \cdot p^{x+a-1} (1-p)^{n-x+b-1} \, dp = \frac{x+a}{a+b+n}.
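The closed form (x+a)/(a+b+n) can be checked numerically; a sketch via midpoint integration of the unnormalised posterior (illustrative values, not from the slides):

```python
def beta_binomial_posterior_mean(x, n, a, b, steps=20000):
    """Posterior mean of p for X ~ B(n, p) with p ~ Be(a, b),
    computed by midpoint integration; should match (x + a)/(a + b + n)."""
    h = 1.0 / steps
    num = den = 0.0
    for i in range(steps):
        p = (i + 0.5) * h
        w = p ** (x + a - 1) * (1 - p) ** (n - x + b - 1)  # unnormalised posterior
        num += p * w
        den += w
    return num / den

print(beta_binomial_posterior_mean(x=3, n=10, a=1, b=1))  # close to 4/12
```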
Example (Normal)

In the normal N(\mu, \sigma^2) case, with both \mu and \sigma unknown, conjugate prior on \theta = (\mu, \sigma^2) of the form

    (\sigma^2)^{-\lambda_\sigma - 3/2} \exp\{ -( \lambda_\mu (\mu - \xi)^2 + \alpha ) / 2\sigma^2 \}

since

    \pi((\mu, \sigma^2)|x_1, \ldots, x_n)
        \propto (\sigma^2)^{-\lambda_\sigma - 3/2} \exp\{ -( \lambda_\mu (\mu - \xi)^2 + \alpha ) / 2\sigma^2 \}
            \times (\sigma^2)^{-n/2} \exp\{ -( n(\mu - \bar{x})^2 + s_x^2 ) / 2\sigma^2 \}
        \propto (\sigma^2)^{-\lambda_\sigma - (n+3)/2} \exp\{ -( (\lambda_\mu + n)(\mu - \xi(x))^2 + \alpha + s_x^2 + \frac{n\lambda_\mu}{n + \lambda_\mu} (\bar{x} - \xi)^2 ) / 2\sigma^2 \}
...and conjugate curse

The use of conjugate priors for computational reasons

  • implies a restriction on the modeling of the available prior information

  • may be detrimental to the usefulness of the Bayesian approach

  • gives an impression of subjective manipulation of the prior information disconnected from reality.
A typology of Bayes computational problems

(i). use of a complex parameter space, as for instance in constrained parameter sets like those resulting from imposing stationarity constraints in dynamic models;

(ii). use of a complex sampling model with an intractable likelihood, as for instance in missing data and graphical models;

(iii). use of a huge dataset;

(iv). use of a complex prior distribution (which may be the posterior distribution associated with an earlier sample);

(v). use of a complex inferential procedure, as for instance Bayes factors

    B^\pi_{01}(x) = \frac{P(\theta \in \Theta_0 \mid x)}{P(\theta \in \Theta_1 \mid x)} \bigg/ \frac{\pi(\Theta_0)}{\pi(\Theta_1)}.
Example (Mixture once again)

Observations from

    x_1, \ldots, x_n \sim f(x|\theta) = p \, \varphi(x; \mu_1, \sigma_1) + (1-p) \, \varphi(x; \mu_2, \sigma_2)

Prior

    \mu_i \mid \sigma_i \sim N(\xi_i, \sigma_i^2 / n_i), \quad \sigma_i^2 \sim IG(\nu_i/2, s_i^2/2), \quad p \sim Be(\alpha, \beta)

Posterior

    \pi(\theta|x_1, \ldots, x_n) \propto \prod_{j=1}^{n} \{ p \, \varphi(x_j; \mu_1, \sigma_1) + (1-p) \, \varphi(x_j; \mu_2, \sigma_2) \} \, \pi(\theta)
        = \sum_{\ell=0}^{n} \sum_{(k_t)} \omega((k_t)) \, \pi(\theta|(k_t))
Example (Mixture once again (contd))

For a given permutation (k_t), conditional posterior distribution

    \pi(\theta|(k_t)) = N\left( \xi_1(k_t), \frac{\sigma_1^2}{n_1 + \ell} \right) \times IG((\nu_1 + \ell)/2, s_1(k_t)/2)
        \times N\left( \xi_2(k_t), \frac{\sigma_2^2}{n_2 + n - \ell} \right) \times IG((\nu_2 + n - \ell)/2, s_2(k_t)/2)
        \times Be(\alpha + \ell, \beta + n - \ell)
Example (Mixture once again (contd))

where

    \bar{x}_1(k_t) = \frac{1}{\ell} \sum_{t=1}^{\ell} x_{k_t}, \quad \hat{s}_1(k_t) = \sum_{t=1}^{\ell} (x_{k_t} - \bar{x}_1(k_t))^2,
    \bar{x}_2(k_t) = \frac{1}{n - \ell} \sum_{t=\ell+1}^{n} x_{k_t}, \quad \hat{s}_2(k_t) = \sum_{t=\ell+1}^{n} (x_{k_t} - \bar{x}_2(k_t))^2

and

    \xi_1(k_t) = \frac{n_1 \xi_1 + \ell \, \bar{x}_1(k_t)}{n_1 + \ell}, \quad \xi_2(k_t) = \frac{n_2 \xi_2 + (n - \ell) \, \bar{x}_2(k_t)}{n_2 + n - \ell},
    s_1(k_t) = s_1^2 + \hat{s}_1^2(k_t) + \frac{n_1 \ell}{n_1 + \ell} (\xi_1 - \bar{x}_1(k_t))^2,
    s_2(k_t) = s_2^2 + \hat{s}_2^2(k_t) + \frac{n_2 (n - \ell)}{n_2 + n - \ell} (\xi_2 - \bar{x}_2(k_t))^2,

posterior updates of the hyperparameters.
Example (Mixture once again)

Bayes estimator of \theta:

    \delta^\pi(x_1, \ldots, x_n) = \sum_{\ell=0}^{n} \sum_{(k_t)} \omega((k_t)) \, E[\theta|x, (k_t)]

Too costly: 2^n terms
Example (Poly-t priors)

Normal observation x \sim N(\theta, 1), with conjugate prior

    \theta \sim N(\mu, \tau^2)

Closed form expression for the posterior mean

    \frac{\int \theta \, f(x|\theta) \, \pi(\theta) \, d\theta}{\int f(x|\theta) \, \pi(\theta) \, d\theta} = \frac{\mu + \tau^2 x}{1 + \tau^2}.
Example (Poly-t priors (2))

More involved prior distribution: poly-t distribution

[Bauwens, 1985]

    \pi(\theta) \propto \prod_{i=1}^{k} \left[ \alpha_i + (\theta - \beta_i)^2 \right]^{-\nu_i}, \quad \nu_i > 0

Computation of E[\theta|x]???
Example (AR(p) model)

Auto-regressive representation of a time series,

    x_t = \sum_{i=1}^{p} \theta_i x_{t-i} + \varepsilon_t

If order p unknown, predictive distribution of x_{t+1} given by

    \pi(x_{t+1}|x_t, \ldots, x_1) \propto \int f(x_{t+1}|x_t, \ldots, x_{t-p+1}) \, \pi(\theta, p|x_t, \ldots, x_1) \, dp \, d\theta,
Example (AR(p) model (contd))

Integration over the parameters of all models

    \sum_{p=0}^{\infty} \int f(x_{t+1}|x_t, \ldots, x_{t-p+1}) \, \pi(\theta|p, x_t, \ldots, x_1) \, d\theta \; \pi(p|x_t, \ldots, x_1).
Example (AR(p) model (contd))

Multiple layers of complexity:

(i). Complex parameter space within each AR(p) model because of stationarity constraint

(ii). if p unbounded, infinity of models

(iii). \theta varies between models AR(p) and AR(p + 1), with a different stationarity constraint (except for root reparameterisation).

(iv). if prediction used sequentially, every tick/second/hour/day, posterior distribution \pi(\theta, p|x_t, \ldots, x_1) must be re-evaluated
Random variable generation

    Basic methods
    Uniform pseudo-random generator
    Beyond Uniform distributions
    Transformation methods
    Accept-Reject Methods
    Fundamental theorem of simulation
    Log-concave densities
  • Rely on the possibility of producing (computer-wise) an endless flow of random variables (usually iid) from well-known distributions

  • Given a uniform random number generator, illustration of methods that produce random variables from both standard and nonstandard distributions
Basic methods

The inverse transform method

For a function F on R, the generalized inverse of F, F^-, is defined by

    F^-(u) = \inf\{ x; \; F(x) \geq u \}.

Definition (Probability Integral Transform)

If U \sim U_{[0,1]}, then the random variable F^-(U) has the distribution F.
The inverse transform method (2)

To generate a random variable X \sim F, simply generate

    U \sim U_{[0,1]}

and then make the transform

    x = F^-(u)
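For distributions with an invertible cdf the two steps are immediate; a sketch for the exponential distribution, where F(x) = 1 - exp(-\lambda x) inverts in closed form (not code from the slides):

```python
import math
import random

def sample_exponential(lam, rng=random.random):
    """Inverse transform: for F(x) = 1 - exp(-lam*x), the generalized
    inverse is F^-(u) = -log(1 - u)/lam, so F^-(U) ~ Exp(lam)."""
    u = rng()  # U ~ Uniform[0, 1)
    return -math.log(1.0 - u) / lam

random.seed(0)
xs = [sample_exponential(2.0) for _ in range(100_000)]
print(sum(xs) / len(xs))  # sample mean, close to 1/lam = 0.5
```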
Uniform pseudo-random generator

Desiderata and limitations

  • Production of a deterministic sequence of values in [0, 1] which imitates a sequence of iid uniform random variables U_{[0,1]}.

  • Can't use the physical imitation of a random draw [no guarantee of uniformity, no reproducibility]

  • Random sequence in the sense: Having generated (X_1, \ldots, X_n), knowledge of X_n [or of (X_1, \ldots, X_n)] imparts no discernible knowledge of the value of X_{n+1}.

  • Deterministic: Given the initial value X_0, sample (X_1, \ldots, X_n) always the same

  • Validity of a random number generator based on a single sample X_1, \ldots, X_n when n tends to +\infty, not on replications

        (X_{11}, \ldots, X_{1n}), (X_{21}, \ldots, X_{2n}), \ldots, (X_{k1}, \ldots, X_{kn})

    where n fixed and k tends to infinity
  • 7/26/2019 Monte Carlo Notes From Main Book

    73/584

    p g

    Algorithm starting from an initial value 0u01 and atransformationD, which produces a sequence

    (ui) = (Di(u

    0))

    in [0, 1].For all n,

    (u1, , un)reproduces the behavior of an iid U[0,1] sample (V1, , Vn) whencompared through usual tests

    Markov Chain Monte Carlo Methods

    Random variable generation

    Uniform pseudo-random generator

    Uniform pseudo-random generator (2)


Validity means the sequence U1, . . . , Un leads to accepting the hypothesis

H: U1, . . . , Un are iid U[0,1].


The set of tests used is generally of some consequence:
- Kolmogorov–Smirnov and other nonparametric tests
- Time series methods, for correlation between Ui and (Ui−1, . . . , Ui−k)
- Marsaglia's battery of tests called Die Hard(!)
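As an illustration of the first item, the one-sample Kolmogorov–Smirnov statistic against the U[0,1] CDF can be computed directly (a minimal sketch, without the p-value machinery of a full test):

```python
def ks_uniform(sample):
    """Kolmogorov-Smirnov statistic D_n = sup_x |F_n(x) - x| for the
    U[0,1] null, where F_n is the empirical CDF of the sample."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        # the empirical CDF jumps from i/n to (i+1)/n at x
        d = max(d, abs((i + 1) / n - x), abs(x - i / n))
    return d
```

Values of D_n large relative to roughly 1.36/√n (the asymptotic 5% critical value) lead to rejecting H.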

Usual generators

    In R and S-plus, procedure runif()

    The Uniform Distribution

    Description:

    runif generates random deviates.

    Example:

    u


[Figure: sequence plot of a uniform sample and its histogram]

Usual generators (2)

In C, procedure rand() or random()

SYNOPSIS

#include <stdlib.h>

long int random(void);

DESCRIPTION

The random() function uses a non-linear additive
feedback random number generator employing a
default table of size 31 long integers to return
successive pseudo-random numbers in the range
from 0 to RAND_MAX. The period of this random
generator is very large, approximately
16*((2**31)-1).

RETURN VALUE

random() returns a value between 0 and RAND_MAX.

Usual generators (3)

In Scilab, procedure rand()

rand() : with no arguments gives a scalar whose
value changes each time it is referenced. By
default, random numbers are uniformly distributed
in the interval (0,1). rand('normal') switches to
a normal distribution with mean 0 and variance 1.

EXAMPLE

x=rand(10,10,'uniform')

Beyond Uniform generators


- Generation of any sequence of random variables can be formally implemented through a uniform generator
- Distributions with explicit F⁻ (for instance, exponential and Weibull distributions): use the probability integral transform
- Case-specific methods rely on properties of the distribution (for instance, normal distribution, Poisson distribution)
- More generic methods (for instance, accept-reject and ratio-of-uniforms)
- Simulation of the standard distributions is accomplished quite efficiently by many numerical and statistical programming packages.

Transformation methods

Case where a distribution F is linked in a simple way to another distribution that is easy to simulate.

Example (Exponential variables)

If U ~ U[0,1], the random variable

X = −log U/λ

has distribution

P(X ≤ x) = P(−log U ≤ λx) = P(U ≥ e^{−λx}) = 1 − e^{−λx},

the exponential distribution Exp(λ).
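A sketch of this transform in code (1 − U is used instead of U, which has the same distribution but avoids log(0)):

```python
import math
import random

def rexp(lam, rng=random):
    """One Exp(lam) draw via the probability integral transform
    X = -log(U)/lam with U ~ U[0,1]."""
    u = 1.0 - rng.random()       # in (0, 1], so the log is finite
    return -math.log(u) / lam

rng = random.Random(0)
sample = [rexp(2.0, rng) for _ in range(10000)]
# the empirical mean estimates 1/lam = 0.5
```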


Other random variables that can be generated starting from an exponential include

Y = −2 Σ_{j=1}^{ν} log(Uj) ~ χ²_{2ν}

Y = −(1/λ) Σ_{j=1}^{a} log(Uj) ~ Ga(a, λ)

Y = Σ_{j=1}^{a} log(Uj) / Σ_{j=1}^{a+b} log(Uj) ~ Be(a, b)
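The three transformations above, sketched in code for integer parameters:

```python
import math
import random

def exp1(rng):
    """One Exp(1) draw, -log(U)."""
    return -math.log(1.0 - rng.random())

def chi2_even(nu, rng=random):
    """chi-squared with 2*nu degrees of freedom: -2 times a sum of nu log(U_j)."""
    return 2.0 * sum(exp1(rng) for _ in range(nu))

def gamma_int(a, lam, rng=random):
    """Ga(a, lam) for integer shape a: a sum of a Exp(lam) draws."""
    return sum(exp1(rng) for _ in range(a)) / lam

def beta_int(a, b, rng=random):
    """Be(a, b) for integer a, b via the ratio of the two log(U_j) sums."""
    x = sum(exp1(rng) for _ in range(a))
    y = sum(exp1(rng) for _ in range(b))
    return x / (x + y)
```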

Points to note

- Transformation quite simple to use
- There are more efficient algorithms for gamma and beta random variables
- Cannot generate gamma random variables with a non-integer shape parameter
- For instance, cannot get a χ²₁ variable, which would get us a N(0, 1) variable.

Box-Muller Algorithm


Example (Normal variables)

If (r, θ) are the polar coordinates of (X1, X2), then

r² = X1² + X2² ~ χ²₂ = Exp(1/2) and θ ~ U[0, 2π]

Consequence: If U1, U2 iid U[0,1],

X1 = √(−2 log(U1)) cos(2πU2)
X2 = √(−2 log(U1)) sin(2πU2)

iid N(0, 1).

Box-Muller Algorithm (2)

1. Generate U1, U2 iid U[0,1];
2. Define
   x1 = √(−2 log(u1)) cos(2πu2),
   x2 = √(−2 log(u1)) sin(2πu2);
3. Take x1 and x2 as two independent draws from N(0, 1).
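The three steps, sketched in code:

```python
import math
import random

def box_muller(rng=random):
    """Return two independent N(0, 1) draws from two uniforms."""
    u1 = 1.0 - rng.random()      # in (0, 1], keeps the log finite
    u2 = rng.random()
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2.0 * math.pi * u2), r * math.sin(2.0 * math.pi * u2)
```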

Box-Muller Algorithm (3)

[Figure: scatterplot and histogram of a Box-Muller normal sample]

- Unlike algorithms based on the CLT, this algorithm is exact
- Get two normals for the price of two uniforms
- Drawback (in speed) in calculating log, cos and sin.

More transforms

Example (Poisson generation)

Poisson–exponential connection: If N ~ P(λ) and Xi ~ Exp(λ), i ∈ ℕ*,

P(N = k) = P(X1 + · · · + Xk ≤ 1 < X1 + · · · + Xk+1).

More Poisson

- A Poisson can be simulated by generating Exp(λ) variables until their sum exceeds 1.
- This method is simple, but is really practical only for smaller values of λ.
- On average, the number of exponential variables required is λ.
- Other approaches are more suitable for large λ's.
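A sketch of this small-λ method:

```python
import math
import random

def rpois(lam, rng=random):
    """N ~ P(lam): count how many Exp(lam) inter-arrival times fit in
    [0, 1] before the running sum exceeds 1."""
    k, total = 0, 0.0
    while True:
        total += -math.log(1.0 - rng.random()) / lam   # one Exp(lam) draw
        if total > 1.0:
            return k
        k += 1
```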

Atkinson's Poisson

To generate N ~ P(λ):

1. Define β = π/√(3λ), α = βλ and k = log c − λ − log β;
2. Generate U1 ~ U[0,1] and calculate
   x = {α − log[(1 − u1)/u1]}/β
   until x > −0.5;
3. Define N = ⌊x + 0.5⌋ and generate U2 ~ U[0,1];
4. Accept N if
   α − βx + log (u2/{1 + exp(α − βx)}²) ≤ k + N log λ − log N!.

Negative extension

A generator of Poisson random variables can produce negative binomial random variables since

Y ~ Ga(n, (1 − p)/p), X|y ~ P(y)

implies X ~ Neg(n, p)

Mixture representation

- The representation of the negative binomial is a particular case of a mixture distribution
- The principle of a mixture representation is to represent a density f as the marginal of another distribution, for example

  f(x) = Σ_{i∈Y} pi fi(x)

- If the component distributions fi(x) can be easily generated, X can be obtained by first choosing fi with probability pi and then generating an observation from fi.
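A sketch of the two-step scheme (the two-component normal mixture is an arbitrary illustration):

```python
import random

def sample_mixture(weights, samplers, rng=random):
    """Draw from f(x) = sum_i p_i f_i(x): choose component i with
    probability p_i, then draw one observation from f_i."""
    u, acc = rng.random(), 0.0
    for p, draw in zip(weights, samplers):
        acc += p
        if u < acc:
            return draw()
    return samplers[-1]()        # guard against floating-point round-off

# illustration: the two-component mixture 0.3 N(-2,1) + 0.7 N(3,1)
rng = random.Random(4)
xs = [sample_mixture([0.3, 0.7],
                     [lambda: rng.gauss(-2.0, 1.0),
                      lambda: rng.gauss(3.0, 1.0)], rng)
      for _ in range(20000)]
```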

Partitioned sampling

Special case of mixture sampling when

fi(x) = f(x) I_{Ai}(x) / ∫_{Ai} f(x) dx

and

pi = Pr(X ∈ Ai)

for a partition (Ai)i

Accept-Reject Methods

Accept-Reject algorithm

- Many distributions exist from which it is difficult, or even impossible, to directly simulate.
- Another class of methods only requires us to know the functional form of the density f of interest up to a multiplicative constant.
- The key to this method is to use a simpler (simulation-wise) density g, the instrumental density, from which the simulation from the target density f is actually done.

Fundamental theorem of simulation

Lemma

Simulating

X ~ f(x)

is equivalent to simulating

(X, U) ~ U{(x, u) : 0 < u < f(x)}

[Figure: the region under the graph of the density f(x)]

The Accept-Reject algorithm

Given a density of interest f, find a density g and a constant M such that

f(x) ≤ M g(x)

on the support of f.

1. Generate X ~ g, U ~ U[0,1];
2. Accept Y = X if U ≤ f(X)/M g(X);
3. Return to 1. otherwise.
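The algorithm in code, instantiated (as an arbitrary illustration) with the Be(2, 2) target f(x) = 6x(1 − x), a uniform proposal g and M = 1.5:

```python
import random

def accept_reject(f, g_sample, g_pdf, M, rng=random):
    """Draw X ~ g until U <= f(X)/(M g(X)); the accepted X has density f.
    f need only be known up to the constant absorbed into M."""
    while True:
        x = g_sample()
        if rng.random() <= f(x) / (M * g_pdf(x)):
            return x

# illustration: Be(2,2) target, U[0,1] proposal, M = max f = 1.5
rng = random.Random(5)
xs = [accept_reject(lambda x: 6.0 * x * (1.0 - x), rng.random,
                    lambda x: 1.0, 1.5, rng) for _ in range(10000)]
```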

Validation of the Accept-Reject method

Warranty:

This algorithm produces a variable Y distributed according to f

[Figure: illustration of the Accept-Reject method]

Two interesting properties

First, it provides a generic method to simulate from any density f that is known up to a multiplicative factor.
Property particularly important in Bayesian calculations where the posterior distribution

π(θ|x) ∝ π(θ) f(x|θ)

is specified up to a normalizing constant.

Second, the probability of acceptance in the algorithm is 1/M, e.g., the expected number of trials until a variable is accepted is M.

More interesting properties

- In cases where f and g are both probability densities, the constant M is necessarily larger than 1.
- The size of M, and thus the efficiency of the algorithm, are functions of how closely g can imitate f, especially in the tails.
- For f/g to remain bounded, it is necessary for g to have tails thicker than those of f. It is therefore impossible to use the A-R algorithm to simulate a Cauchy distribution f using a normal distribution g; however, the reverse works quite well.

No Cauchy!


Example (Normal from a Cauchy)

Take

f(x) = (1/√(2π)) exp(−x²/2)

and

g(x) = (1/π) · 1/(1 + x²),

densities of the normal and Cauchy distributions.

Then

f(x)/g(x) = √(π/2) (1 + x²) e^{−x²/2} ≤ √(2π/e) = 1.52

attained at x = ±1.


Example (Normal from a Cauchy (2))

So probability of acceptance

1/1.52 = 0.66,

and, on the average, one out of every three simulated Cauchy variables is rejected.
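A sketch of this example in code; the Cauchy proposal is drawn by inversion, and with M = √(2π/e) the ratio f(x)/M g(x) simplifies to (1 + x²) e^{−(x²−1)/2}/2:

```python
import math
import random

def normal_from_cauchy(rng=random):
    """Accept-Reject draw of N(0,1) from a Cauchy(0,1) proposal with
    M = sqrt(2*pi/e); `ratio` below is f(x)/(M g(x))."""
    while True:
        x = math.tan(math.pi * (rng.random() - 0.5))   # Cauchy by inversion
        ratio = (1.0 + x * x) * math.exp(-(x * x - 1.0) / 2.0) / 2.0
        if rng.random() <= ratio:
            return x
```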

No Double!

Example (Normal/Double Exponential)

Generate a N(0, 1) by using a double-exponential distribution with density

g(x|α) = (α/2) exp(−α|x|)

Then

f(x)/g(x|α) ≤ √(2/π) α⁻¹ e^{α²/2}

and the minimum of this bound (in α) is attained for α = 1


Example (Normal/Double Exponential (2))

Probability of acceptance

√(π/2e) = .76

To produce one normal random variable requires on the average 1/.76 ≈ 1.3 uniform variables.


Example (Gamma generation)

Illustrates a real advantage of the Accept-Reject algorithm.
The gamma distribution Ga(α, β) can be represented as a sum of exponential random variables only if α is an integer.

Example (Gamma generation (2))

Can use the Accept-Reject algorithm with instrumental distribution Ga(a, b), with a = ⌊α⌋, α ≥ 0. (Without loss of generality, β = 1.)

Up to a normalizing constant,

f/g_b = b⁻ᵃ x^{α−a} exp{−(1 − b)x} ≤ b⁻ᵃ ((α − a)/(1 − b))^{α−a} e^{−(α−a)}

for b ≤ 1. The maximum is attained at b = a/α.

Cheng and Feast's Gamma generator

Gamma Ga(α, 1), α > 1 distribution

1. Define c1 = α − 1, c2 = (α − (1/6α))/c1, c3 = 2/c1, c4 = 1 + c3, and c5 = 1/√α.
2. Repeat
   generate U1, U2
   take U1 = U2 + c5(1 − 1.86 U1) if α > 2.5
   until 0 < U1 < 1


Truncated Normal simulation

Example (Truncated Normal distributions)

Constraint x ≥ μ̲ produces density proportional to

e^{−(x−μ)²/2σ²} I_{x ≥ μ̲}

for a bound μ̲ large compared with μ.

There exist alternatives far superior to the naïve method of generating a N(μ, σ²) until exceeding μ̲, which requires an average number of

1/Φ((μ − μ̲)/σ)

simulations from N(μ, σ²) for a single acceptance.



Example (Truncated Normal distributions (2))

Instrumental distribution: translated exponential distribution, E(α, μ̲), with density

g_α(z) = α e^{−α(z−μ̲)} I_{z ≥ μ̲}.

The ratio f/g_α is bounded by

f/g_α ≤ 1/α · exp(α²/2 − αμ̲)  if α > μ̲,
f/g_α ≤ 1/α · exp(−μ̲²/2)     otherwise.
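A sketch of this scheme for μ = 0, σ = 1, using one standard choice of rate, α* = (μ̲ + √(μ̲² + 4))/2 (due to Robert, 1995, and not stated on the slide); with this choice the acceptance probability reduces to exp(−(z − α*)²/2):

```python
import math
import random

def trunc_norm_tail(mu_lower, rng=random):
    """N(0,1) conditioned on X >= mu_lower, via accept-reject with a
    translated exponential E(alpha, mu_lower) proposal."""
    alpha = (mu_lower + math.sqrt(mu_lower ** 2 + 4.0)) / 2.0
    while True:
        # translated exponential draw: mu_lower + Exp(alpha)
        z = mu_lower - math.log(1.0 - rng.random()) / alpha
        if rng.random() <= math.exp(-(z - alpha) ** 2 / 2.0):
            return z
```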

Log-concave densities

Log-concave densities (1)

Densities f whose logarithm is concave, for instance Bayesian posterior distributions such that

log π(θ|x) = log π(θ) + log f(x|θ) + c

is concave in θ

Log-concave densities (2)


Take Sn = {xi, i = 0, 1, . . . , n + 1} ⊂ supp(f) such that h(xi) = log f(xi) is known up to the same constant.

By concavity of h, the line Li,i+1 through (xi, h(xi)) and (xi+1, h(xi+1)) is

[Figure: log f(x) with chord L2,3 through points x1, . . . , x4]

- below h in [xi, xi+1] and
- above this graph outside this interval

Log-concave densities (3)

For x ∈ [xi, xi+1], if

h̄n(x) = min{Li−1,i(x), Li+1,i+2(x)} and h̲n(x) = Li,i+1(x),

the envelopes are

h̲n(x) ≤ h(x) ≤ h̄n(x)

uniformly on the support of f, with

h̲n(x) = −∞ and h̄n(x) = min(L0,1(x), Ln,n+1(x))

on [x0, xn+1]ᶜ.

Log-concave densities (4)

Therefore, if

f̲n(x) = exp h̲n(x) and f̄n(x) = exp h̄n(x)

then

f̲n(x) ≤ f(x) ≤ f̄n(x) = ϖn gn(x),

where ϖn is the normalizing constant of f̄n

ARS Algorithm

1. Initialize n and Sn.
2. Generate X ~ gn(x), U ~ U[0,1].
3. If U ≤ f̲n(X)/ϖn gn(X), accept X;
   otherwise, if U ≤ f(X)/ϖn gn(X), accept X.


Example (Northern Pintail ducks)

Ducks captured at time i with both probability pi and size N of the population unknown.

Dataset

(n1, . . . , n11) = (32, 20, 8, 5, 1, 2, 0, 2, 1, 1, 0)

Number of recoveries over the years 1957–1968 of N = 1612 Northern Pintail ducks banded in 1956


Example (Northern Pintail ducks (2))

Corresponding conditional likelihood

L(p1, . . . , pI | N, n1, . . . , nI) = N!/(N − r)! ∏_{i=1}^{I} pi^{ni} (1 − pi)^{N−ni},

where I is the number of captures, ni the number of captured animals during the ith capture, and r the total number of different captured animals.


Example (Northern Pintail ducks (3))

Prior selection: If

N ~ P(λ)

and

αi = log (pi/(1 − pi)) ~ N(μi, σ²),

[Normal logistic]


Example (Northern Pintail ducks (4))

Posterior distribution

π(α, N | λ, n1, . . . , nI) ∝ N!/(N − r)! · (λᴺ/N!) ∏_{i=1}^{I} (1 + e^{αi})^{−N} ∏_{i=1}^{I} exp{αi ni − (1/2σ²)(αi − μi)²}

Example (Northern Pintail ducks (5))

For the conditional posterior distribution

π(αi | N, n1, . . . , nI) ∝ exp{αi ni − (1/2σ²)(αi − μi)²} (1 + e^{αi})^{−N},

the ARS algorithm can be implemented since

αi ni − (1/2σ²)(αi − μi)² − N log(1 + e^{αi})

is concave in αi.

Posterior distributions of capture log-odds ratios for the years 1957–1965.

[Figure: posterior density panels, including years 1960–1965]

[Figure (1960): true distribution versus histogram of simulated sample]

Monte Carlo Integration

Monte Carlo integration

Motivation and leading example

Random variable generation

Monte Carlo Integration
Introduction
Monte Carlo integration
Importance Sampling
Acceleration methods
Bayesian importance sampling

Notions on Markov Chains

The Metropolis-Hastings Algorithm

Introduction

Quick reminder

Two major classes of numerical problems arise in statistical inference:

- Optimization - generally associated with the likelihood approach
- Integration - generally associated with the Bayesian approach



Example (Bayesian decision theory)

Bayes estimators are not always posterior expectations, but rather solutions of the minimization problem

min_δ ∫ L(δ, θ) π(θ) f(x|θ) dθ.

Proper loss: For L(δ, θ) = (δ − θ)², the Bayes estimator is the posterior mean.
Absolute error loss: For L(δ, θ) = |δ − θ|, the Bayes estimator is the posterior median.
With no loss function, use the maximum a posteriori (MAP) estimator

arg max_θ ℓ(θ|x) π(θ)

Monte Carlo integration

Theme: Generic problem of evaluating the integral

I = Ef[h(X)] = ∫_X h(x) f(x) dx

where X is uni- or multidimensional, f is a closed form, partly closed form, or implicit density, and h is a function

Monte Carlo integration (2)


Monte Carlo solution

First use a sample (X1, . . . , Xm) from the density f to approximate the integral I by the empirical average

h̄m = (1/m) Σ_{j=1}^{m} h(xj)

which converges

h̄m → Ef[h(X)]

by the Strong Law of Large Numbers
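The empirical average, sketched in code (the choice of E[X²] under a N(0,1) density is an arbitrary illustration):

```python
import random

def mc_estimate(h, sampler, m):
    """Approximate E_f[h(X)] by the empirical average of h over m draws
    from f (the sampler)."""
    return sum(h(sampler()) for _ in range(m)) / m

rng = random.Random(8)
# E[X^2] = 1 for X ~ N(0, 1)
est = mc_estimate(lambda x: x * x, lambda: rng.gauss(0.0, 1.0), 50000)
```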

Monte Carlo precision

Estimate the variance with

    vm= 1

    m

    1

    m 1m

    j=1

    [h(xj) hm]2,

    and for m large,

    hm Ef[h(X)]vm

    N(0, 1).

    Note: This can lead to the construction of a convergence test andof confidence bounds on the approximation ofEf[h(X)].
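The estimator h̄_m and its variance-based confidence bound can be sketched as follows; a minimal Python illustration (not from the slides), where the function names and the N(0, 1) test case are assumptions for the example.

```python
import math
import random

random.seed(42)

def mc_estimate(h, sampler, m):
    """Plain Monte Carlo: empirical average of h over m draws,
    with the variance estimate v_m and a 95% normal half-width."""
    xs = [sampler() for _ in range(m)]
    hbar = sum(h(x) for x in xs) / m
    v_m = sum((h(x) - hbar) ** 2 for x in xs) / (m * (m - 1))
    half_width = 1.96 * math.sqrt(v_m)
    return hbar, half_width

# Toy check: E[X^2] = 1 for X ~ N(0, 1)
est, hw = mc_estimate(lambda x: x * x, lambda: random.gauss(0.0, 1.0), 100_000)
```

The half-width shrinks at the usual 1/√m rate, which is what the CLT statement above formalizes.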

    Example (Cauchy prior/normal sample)

    For estimating a normal mean, a robust prior is a Cauchy prior:

    X ~ N(θ, 1), θ ~ C(0, 1).

    Under squared error loss, the posterior mean is

    δ^π(x) = ∫ θ/(1 + θ²) e^{−(x−θ)²/2} dθ / ∫ 1/(1 + θ²) e^{−(x−θ)²/2} dθ

    Example (Cauchy prior/normal sample (2))

    The form of δ^π suggests simulating iid variables

    θ₁, . . . , θ_m ~ N(x, 1)

    and calculating

    δ̂_m(x) = Σ_{i=1}^m θ_i/(1 + θ_i²) / Σ_{i=1}^m 1/(1 + θ_i²) .

    The Law of Large Numbers implies

    δ̂_m(x) → δ^π(x) as m → ∞.
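The estimator δ̂_m above can be sketched directly; a minimal Python version (not from the slides), with x = 10 matching the figure that follows.

```python
import random

random.seed(0)

def cauchy_posterior_mean(x, m=200_000):
    """Monte Carlo approximation of the posterior mean under a Cauchy
    prior and N(theta, 1) likelihood: simulate theta_i ~ N(x, 1) and
    form the ratio of weighted averages."""
    num = den = 0.0
    for _ in range(m):
        t = random.gauss(x, 1.0)
        w = 1.0 / (1.0 + t * t)  # Cauchy prior weight 1/(1 + theta^2)
        num += t * w
        den += w
    return num / den

delta_10 = cauchy_posterior_mean(10.0)
```

For x = 10 the estimate settles slightly below 10, reflecting the mild shrinkage induced by the Cauchy prior.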

    [Figure: Range of estimators δ̂_m over 1000 iterations, for 100 runs and x = 10.]

    Importance sampling

    Paradox: simulation from f (the true density) is not necessarily optimal.

    The alternative to direct sampling from f is importance sampling,
    based on the alternative representation

    E_f[h(X)] = ∫_X h(x) [f(x)/g(x)] g(x) dx ,

    which allows us to use distributions other than f.

    Importance sampling algorithm

    Evaluation of

    E_f[h(X)] = ∫_X h(x) f(x) dx

    by

    1. Generate a sample X₁, . . . , X_m from a distribution g
    2. Use the approximation

       (1/m) Σ_{j=1}^m [f(X_j)/g(X_j)] h(X_j)

    Same thing as before!!!

    Convergence of the estimator:

    (1/m) Σ_{j=1}^m [f(X_j)/g(X_j)] h(X_j) → ∫_X h(x) f(x) dx

    for any choice of the distribution g
    [as long as supp(g) ⊃ supp(f)].
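The two-step algorithm above can be sketched in a few lines; a minimal Python example (not from the slides), where the instrumental N(0, 3) and the N(3, 1) target are illustrative assumptions.

```python
import math
import random

random.seed(1)

def importance_sampling(h, f_pdf, g_pdf, g_sampler, m):
    """Basic importance sampling: draw from g, reweight by f/g.
    Requires supp(g) to contain supp(f)."""
    total = 0.0
    for _ in range(m):
        x = g_sampler()
        total += (f_pdf(x) / g_pdf(x)) * h(x)
    return total / m

def norm_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# Toy check: E_f[X] = 3 for f = N(3, 1), using g = N(0, 3) as instrumental
est = importance_sampling(
    h=lambda x: x,
    f_pdf=lambda x: norm_pdf(x, 3.0, 1.0),
    g_pdf=lambda x: norm_pdf(x, 0.0, 3.0),
    g_sampler=lambda: random.gauss(0.0, 3.0),
    m=200_000,
)
```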

    Important details

    The instrumental distribution g is chosen from distributions that are
    easy to simulate.

    The same sample (generated from g) can be used repeatedly,
    not only for different functions h, but also for different
    densities f.

    Even dependent proposals can be used, as seen later in the PMC chapter.

    Although g can be any density, some choices are better than
    others:

    Finite variance only when

    E_f[h²(X) f(X)/g(X)] = ∫_X h²(x) f²(x)/g(x) dx < ∞ .

    Ratio of the densities

    ρ(x) = p(x)/p₀(x) = √(2/π) exp(x²/2) / (1 + x²)

    very badly behaved: e.g.,

    ∫ ρ(x)² p₀(x) dx = ∞.

    Poor performance of the associated importance sampling
    estimator.

    [Figure: Range and average of 500 replications of the IS estimate of E[exp X] over 10,000 iterations.]

    Optimal importance function

    The choice of g that minimizes the variance of the
    importance sampling estimator is

    g*(x) = |h(x)| f(x) / ∫ |h(z)| f(z) dz .

    Practical impact

    The self-normalized version

    Σ_{j=1}^m h(X_j) f(X_j)/g(X_j) / Σ_{j=1}^m f(X_j)/g(X_j) ,

    where f and g are known up to constants,

    also converges to I by the Strong Law of Large Numbers;
    it is biased, but the bias is quite small, and in some settings it
    beats the unbiased estimator in squared error loss.

    Using the optimal solution does not always work:

    Σ_{j=1}^m h(x_j) f(x_j)/{|h(x_j)| f(x_j)} / Σ_{j=1}^m f(x_j)/{|h(x_j)| f(x_j)}
      = (#positive h − #negative h) / Σ_{j=1}^m 1/|h(x_j)|
    Self-normalised importance sampling

    For the ratio estimator

    δ̂ⁿ_h = Σ_{i=1}^n ω_i h(x_i) / Σ_{i=1}^n ω_i

    with X_i ~ g and weights W_i = ω_i such that

    E[W_i | X_i = x] = f(x)/g(x)

    Self-normalised variance

    Then

    var(δ̂ⁿ_h) ≈ (1/(n²μ²)) { var(Sⁿ_h) − 2 E^π[h] cov(Sⁿ_h, Sⁿ_1) + E^π[h]² var(Sⁿ_1) } ,

    with μ = E[W_i], for

    Sⁿ_h = Σ_{i=1}^n W_i h(X_i) ,   Sⁿ_1 = Σ_{i=1}^n W_i .

    Rough approximation:

    var(δ̂ⁿ_h) ≈ (1/n) var_π(h(X)) { 1 + var_g(W) }
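The ratio estimator δ̂ⁿ_h only needs f up to a constant; a minimal Python sketch (not from the slides), where the unnormalised N(2, 1) target and N(0, 2²) instrumental are illustrative assumptions.

```python
import math
import random

random.seed(2)

def self_normalised_is(h, f_unnorm, g_pdf, g_sampler, n):
    """Self-normalised importance sampling: sum(w_i h(x_i)) / sum(w_i),
    usable when f is known only up to a normalising constant."""
    num = den = 0.0
    for _ in range(n):
        x = g_sampler()
        w = f_unnorm(x) / g_pdf(x)
        num += w * h(x)
        den += w
    return num / den

# Toy check: target f proportional to exp(-(x-2)^2/2), so E_f[X] = 2
est = self_normalised_is(
    h=lambda x: x,
    f_unnorm=lambda x: math.exp(-0.5 * (x - 2.0) ** 2),
    g_pdf=lambda x: math.exp(-0.5 * (x / 2.0) ** 2) / (2.0 * math.sqrt(2 * math.pi)),
    g_sampler=lambda: random.gauss(0.0, 2.0),
    n=200_000,
)
```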

    Example (Student's t distribution)

    X ~ T(ν, θ, σ²), with density

    f(x) = Γ((ν + 1)/2) / (σ √(νπ) Γ(ν/2)) · (1 + (x − θ)²/(νσ²))^{−(ν+1)/2} .

    Without loss of generality, take θ = 0, σ = 1.
    Problem: calculate the integral

    ∫_{2.1}^∞ (sin(x)/x)ⁿ f(x) dx.

    Example (Student's t distribution (2))

    Simulation possibilities:

    Directly from f, since f = N(0, 1)/√(χ²_ν/ν)

    Importance sampling using Cauchy C(0, 1)

    Importance sampling using a normal N(0, 1)
    (expected to be nonoptimal)

    Importance sampling using a U([0, 1/2.1]),
    after a change of variables

    [Figure: Sampling from f (solid lines), importance sampling with
    Cauchy instrumental (short dashes), U([0, 1/2.1]) instrumental
    (long dashes) and normal instrumental (dots), over 50,000 iterations.]
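Two of the possibilities above can be compared numerically; a minimal Python sketch (not from the slides) of direct sampling from f versus importance sampling with a Cauchy instrumental, taking ν = 12 and exponent n = 1 in the integrand as illustrative assumptions.

```python
import math
import random

random.seed(3)

NU = 12  # degrees of freedom (illustrative choice)

def t_pdf(x, nu=NU):
    """Student's t density with nu degrees of freedom (theta=0, sigma=1)."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)

def t_sample(nu=NU):
    """Direct simulation: N(0,1) divided by sqrt(chi^2_nu / nu)."""
    z = random.gauss(0.0, 1.0)
    chi2 = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(nu))
    return z / math.sqrt(chi2 / nu)

def cauchy_pdf(x):
    return 1.0 / (math.pi * (1.0 + x * x))

# Integrand restricted to (2.1, infinity), with n = 1
h = lambda x: (math.sin(x) / x) if x > 2.1 else 0.0

m = 100_000
direct = sum(h(t_sample()) for _ in range(m)) / m
cauchy_is = 0.0
for _ in range(m):
    x = math.tan(math.pi * (random.random() - 0.5))  # Cauchy C(0,1) draw
    cauchy_is += (t_pdf(x) / cauchy_pdf(x)) * h(x)
cauchy_is /= m
```

Both estimators target the same integral; the Cauchy weights stay bounded because the t(12) tails are lighter than the Cauchy tails.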

    IS suffers from curse of dimensionality

    As the dimension increases, the discrepancy between the importance and
    the target distributions worsens.

    Let V₁, V₂, . . . be independent positive random variables with
    E[V_k] = 1, and set U_n = Π_{k=1}^n V_k.
    Since E[V_{n+1}] = 1 and V_{n+1} is independent of F_n,

    E(U_{n+1} | F_n) = U_n E(V_{n+1} | F_n) = U_n ,

    and thus {U_n}_{n≥0} is a martingale.

    Since x ↦ √x is concave, by Jensen's inequality,

    E(√U_{n+1} | F_n) ≤ √E(U_{n+1} | F_n) = √U_n ,

    and thus {√U_n}_{n≥0} is a supermartingale.
    Assume E(√V_{n+1}) < 1. Then

    E(√U_n) = Π_{k=1}^n E(√V_k) → 0, n → ∞.

    But {√U_n}_{n≥0} is a nonnegative supermartingale, and thus √U_n
    converges a.s. to a random variable Z ≥ 0. By Fatou's lemma,

    E(Z) = E(lim_{n→∞} √U_n) ≤ lim inf_{n→∞} E(√U_n) = 0.

    Hence Z = 0 and U_n → 0 a.s., which implies that the martingale
    {U_n}_{n≥0} is not regular.

    Apply these results to V_k = (dπ/dμ)(ξ_{i,k}), i ∈ {1, . . . , N}:

    E[√((dπ/dμ)(ξ_{i,k}))] ≤ √(E[(dπ/dμ)(ξ_{i,k})]) = 1 ,

    with equality iff dπ/dμ = 1, μ-a.e., i.e. π = μ.

    Thus all importance weights converge to 0.
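The weight degeneracy can be seen empirically; a minimal Python sketch (illustrative assumptions: target π = N(0, 1)^dim, instrumental μ = N(0, 1.5²)^dim) showing the average log importance weight drifting linearly to −∞ with the dimension.

```python
import math
import random

random.seed(9)

def avg_log_weight(dim, n_particles=200):
    """Average log importance weight for a product target/instrumental
    pair: per-coordinate weights multiply, so log-weights accumulate a
    strictly negative drift per added coordinate."""
    sd = 1.5
    logs = []
    for _ in range(n_particles):
        lw = 0.0
        for _ in range(dim):
            x = random.gauss(0.0, sd)
            # log N(0,1) density minus log N(0, sd^2) density at x
            lw += (-0.5 * x * x) - (-0.5 * (x / sd) ** 2 - math.log(sd))
        logs.append(lw)
    return sum(logs) / n_particles

low, high = avg_log_weight(1), avg_log_weight(50)
```

Here `high` (dimension 50) is far more negative than `low` (dimension 1), i.e. almost all weights are numerically negligible in high dimension.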

    too volatile!

    Example (Stochastic volatility model)

    y_t = exp(x_t/2) ε_t ,  ε_t ~ N(0, 1)

    with AR(1) log-variance process (or volatility)

    x_{t+1} = φ x_t + σ u_t ,  u_t ~ N(0, 1)
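The model can be simulated forward directly from the two equations above; a minimal Python sketch (not from the slides), with φ = 0.9 and σ = 0.3 as illustrative parameter values.

```python
import math
import random

random.seed(4)

def simulate_sv(T, phi=0.9, sigma=0.3):
    """Simulate a stochastic volatility path: AR(1) log-variance x_t and
    observations y_t = exp(x_t/2) * eps_t."""
    xs, ys = [], []
    # start from the stationary distribution of the AR(1) process
    x = random.gauss(0.0, sigma / math.sqrt(1 - phi ** 2))
    for _ in range(T):
        xs.append(x)
        ys.append(math.exp(x / 2.0) * random.gauss(0.0, 1.0))
        x = phi * x + sigma * random.gauss(0.0, 1.0)
    return xs, ys

xs, ys = simulate_sv(500)
```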

    [Figure: Evolution of IBM stocks (corrected from trend and log-ratio-ed) over 500 time points.]

    Example (Stochastic volatility model (2))

    The observed likelihood is unavailable in closed form.
    The joint posterior (or conditional) distribution of the hidden state
    sequence {X_k}_{1≤k≤K} can be evaluated explicitly,

    Π_{k=2}^K exp −{σ⁻²(x_k − φx_{k−1})² + exp(−x_k) y_k² + x_k} / 2 ,   (2)

    up to a normalizing constant.

    Computational problems

    Example (Stochastic volatility model (3))

    Direct simulation from this distribution is impossible because of

    (a) the dependence among the X_k's,
    (b) the dimension of the sequence {X_k}_{1≤k≤K}, and
    (c) the exponential term exp(−x_k) y_k² within (2).

    Importance sampling

    Example (Stochastic volatility model (4))

    Natural candidate: replace the exponential term with a quadratic
    approximation to preserve Gaussianity.

    E.g., expand exp(−x_k) around its conditional expectation φx_{k−1} as

    exp(−x_k) ≈ exp(−φx_{k−1}) { 1 − (x_k − φx_{k−1}) + ½ (x_k − φx_{k−1})² }

    Example (Stochastic volatility model (5))

    Corresponding Gaussian importance distribution with mean

    μ_k = [ φx_{k−1} {σ⁻² + y_k² exp(−φx_{k−1})/2} − {1 − y_k² exp(−φx_{k−1})}/2 ]
            / {σ⁻² + y_k² exp(−φx_{k−1})/2}

    and variance

    τ_k² = {σ⁻² + y_k² exp(−φx_{k−1})/2}⁻¹ .

    Prior proposal on X₁,

    X₁ ~ N(0, σ²)

    Example (Stochastic volatility model (6))

    Simulation starts with X₁ and proceeds forward to X_n, each X_k
    being generated conditional on Y_k and the previously generated X_{k−1}.
    The importance weight is computed sequentially as the product of

    [ exp −{σ⁻²(x_k − φx_{k−1})² + exp(−x_k) y_k² + x_k}/2 ]
      / [ τ_k⁻¹ exp −{(x_k − μ_k)²/2τ_k²} ]     (1 ≤ k ≤ K)

    [Figure: Histogram of the logarithms of the importance weights (left)
    and comparison between the true volatility and the best fit,
    based on 10,000 simulated importance samples.]

    [Figure: Corresponding range of the simulated {X_k}_{1≤k≤100},
    compared with the true value.]

    Acceleration methods

    Correlated simulations

    Negative correlation reduces variance.
    A special technique, but efficient when it applies.
    Two samples (X₁, . . . , X_m) and (Y₁, . . . , Y_m) from f to estimate

    I = ∫_R h(x) f(x) dx

    by

    Î₁ = (1/m) Σ_{i=1}^m h(X_i)  and  Î₂ = (1/m) Σ_{i=1}^m h(Y_i)

    with mean I and variance σ²

    Variance reduction

    Variance of the average:

    var( (Î₁ + Î₂)/2 ) = σ²/2 + (1/2) cov(Î₁, Î₂).

    If the two samples are negatively correlated,

    cov(Î₁, Î₂) ≤ 0 ,

    they improve on two independent samples of the same size.

    Antithetic variables

    If f is symmetric about μ, take Y_i = 2μ − X_i.
    If X_i = F⁻¹(U_i), take Y_i = F⁻¹(1 − U_i).
    If (A_i)_i is a partition of X, use partitioned sampling by sampling the
    X_j's in each A_i (requires knowing Pr(A_i)).
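The second construction above (pairing U with 1 − U) can be sketched directly; a minimal Python example (not from the slides), with E[exp(U)] = e − 1 as an illustrative test case.

```python
import math
import random

random.seed(5)

def antithetic_mean(h, m):
    """Antithetic variables for I = E[h(U)], U ~ U(0,1): pair each
    draw u with 1 - u, so the two halves are negatively correlated
    for monotone h."""
    total = 0.0
    for _ in range(m):
        u = random.random()
        total += 0.5 * (h(u) + h(1.0 - u))
    return total / m

# Toy check: E[exp(U)] = e - 1 for U ~ U(0, 1)
est = antithetic_mean(math.exp, 50_000)
```

For this monotone h the antithetic pairs remove most of the variance compared with 2m independent draws.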

    Control variates

    out of control!

    For

    I = ∫ h(x) f(x) dx

    unknown and

    I₀ = ∫ h₀(x) f(x) dx

    known, I₀ is estimated by Î₀ and I is estimated by Î

    Control variates (2)

    Combined estimator

    Î^β = Î + β(Î₀ − I₀)

    Î^β is unbiased for I and

    var(Î^β) = var(Î) + β² var(Î₀) + 2β cov(Î, Î₀)

    Optimal control

    Optimal choice of β:

    β* = − cov(Î, Î₀) / var(Î₀) ,

    with

    var(Î^{β*}) = (1 − ρ²) var(Î) ,

    where ρ is the correlation between Î and Î₀.

    Usual solution: regression coefficient of h(x_i) over h₀(x_i)
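The regression-coefficient recipe above can be sketched as follows; a minimal Python example (not from the slides), using h(u) = exp(u) with the control h₀(u) = u, whose mean 1/2 is known.

```python
import math
import random

random.seed(6)

def control_variate_estimate(h, h0, I0, sampler, m):
    """Control variates: estimate I = E[h(X)] using a control h0 with
    known mean I0. beta is set to minus the empirical regression
    coefficient of h(x_i) on h0(x_i)."""
    xs = [sampler() for _ in range(m)]
    hs = [h(x) for x in xs]
    h0s = [h0(x) for x in xs]
    hbar = sum(hs) / m
    h0bar = sum(h0s) / m
    cov = sum((a - hbar) * (b - h0bar) for a, b in zip(hs, h0s)) / m
    var0 = sum((b - h0bar) ** 2 for b in h0s) / m
    beta = -cov / var0
    return hbar + beta * (h0bar - I0)

# Toy check: I = E[exp(U)] = e - 1, control h0(u) = u with known mean 1/2
est = control_variate_estimate(
    h=math.exp, h0=lambda u: u, I0=0.5,
    sampler=random.random, m=50_000,
)
```

Because exp(u) and u are highly correlated on (0, 1), the residual variance (1 − ρ²) var(Î) is a small fraction of the plain Monte Carlo variance.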

    Example (Quantile Approximation)

    Evaluate

    ϱ = Pr(X > a) = ∫_a^∞ f(x) dx

    by

    ϱ̂ = (1/n) Σ_{i=1}^n I(X_i > a),

    with X_i iid f.
    Assume Pr(X > μ) = 1/2 is known.

    Example (Quantile Approximation (2))

    Control variate estimator:

    ϱ̃ = (1/n) Σ_{i=1}^n I(X_i > a) + β { (1/n) Σ_{i=1}^n I(X_i > μ) − Pr(X > μ) }

    improves upon ϱ̂ if

    β < 0  and  |β| < 2 cov(ϱ̂, ϱ̂₀) / var(ϱ̂₀) .
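The indicator control variate can be sketched as follows; a minimal Python example (not from the slides), with X ~ N(0, 1), a = 1, and β = −0.3 — an illustrative value, roughly the optimal −cov/var₀ for this toy case, satisfying the condition above.

```python
import random

random.seed(7)

def tail_prob_cv(a, mu, sampler, n, beta=-0.3):
    """Estimate Pr(X > a) with the indicator I(X > mu) as control
    variate, using the known value Pr(X > mu) = 1/2 for a density
    symmetric about mu."""
    hits_a = hits_mu = 0
    for _ in range(n):
        x = sampler()
        hits_a += int(x > a)
        hits_mu += int(x > mu)
    return hits_a / n + beta * (hits_mu / n - 0.5)

# Toy check: X ~ N(0, 1), Pr(X > 1.0) is about 0.1587
est = tail_prob_cv(1.0, 0.0, lambda: random.gauss(0.0, 1.0), 100_000)
```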

    Integration by conditioning

    Use the Rao-Blackwell Theorem:

    var(E[δ(X)|Y]) ≤ var(δ(X))

    Consequence

    If Î is an unbiased estimator of I = E_f[h(X)], with X simulated from a
    joint density f̃(x, y), where

    ∫ f̃(x, y) dy = f(x),

    the estimator

    Î* = E_{f̃}[Î | Y₁, . . . , Y_n]

    dominates Î(X₁, . . . , X_n) variance-wise (and is unbiased)

    Example (Student's t expectation)

    For

    E[h(X)] = E[exp(−X²)] with X ~ T(ν, 0, σ²),

    a Student's t distribution can be simulated as

    X | y ~ N(μ, σ²y) and Y⁻¹ ~ χ²_ν/ν.

    Example (Student's t expectation (2))

    The empirical average

    (1/m) Σ_{j=1}^m exp(−X²_j)

    can be improved upon from the joint sample

    ((X₁, Y₁), . . . , (X_m, Y_m))

    since

    (1/m) Σ_{j=1}^m E[exp(−X²)|Y_j] = (1/m) Σ_{j=1}^m 1/√(2σ²Y_j + 1)

    is the conditional expectation.
    In this example, the precision is ten times better.
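The two estimators can be compared side by side; a minimal Python sketch (not from the slides), where ν = 4.6 and σ² = 1 are illustrative parameter choices.

```python
import math
import random

random.seed(8)

NU, SIGMA2 = 4.6, 1.0  # illustrative parameter choices

def sample_joint():
    """Simulate (X, Y): Y^{-1} ~ chi^2_nu / nu, then X | Y ~ N(0, sigma^2 Y)."""
    # chi^2 with (possibly non-integer) nu d.f. via Gamma(nu/2, scale 2)
    w = random.gammavariate(NU / 2.0, 2.0)
    y = NU / w
    x = random.gauss(0.0, math.sqrt(SIGMA2 * y))
    return x, y

m = 50_000
pairs = [sample_joint() for _ in range(m)]
# Plain empirical average of exp(-X^2)
plain = sum(math.exp(-x * x) for x, _ in pairs) / m
# Rao-Blackwellized version: average the conditional expectations
rao_blackwell = sum(1.0 / math.sqrt(2.0 * SIGMA2 * y + 1.0) for _, y in pairs) / m
```

Both converge to E[exp(−X²)]; the conditioned version has markedly smaller variance, as the slide notes.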

    [Figure: convergence of the estimators of E[exp(−X²)] over 10,000 iterations.]