
  • Bayesian Estimation of DSGE Models

    Stéphane Adjemian

    Université du Maine, GAINS & CEPREMAP

    [email protected]

    http://www.dynare.org/stepan

    June 28, 2011


  • Bayesian paradigm (motivations)

    • Bayesian estimation of DSGE models with Dynare.

    1. Data are not informative enough...

    2. DSGE models are misspecified.

    3. Model comparison.

    • Prior elicitation.

    • Efficiency issues.


  • Bayesian paradigm (basics)

    • A model defines a parametrized joint probability distribution over a sample of variables:

    p(Y^\star_T \mid \theta)    (1)

    ⇒ Likelihood.

    • We assume that our prior information about the parameters

    can be summarized by a joint probability density function.

    Let the prior density be p0(θ).

    • The posterior distribution is given by the Bayes theorem:

    p_1(\theta \mid Y^\star_T) = \frac{p_0(\theta)\, p(Y^\star_T \mid \theta)}{p(Y^\star_T)}    (2)


  • Bayesian paradigm (basics, contd)

    • The denominator is defined by:

    p(Y^\star_T) = \int_\Theta p_0(\theta)\, p(Y^\star_T \mid \theta)\, d\theta    (3)

    ⇒ the marginal density of the sample.

    ⇒ A weighted mean of the sample conditional densities

    over all the possible values for the parameters.

    • The posterior density is proportional to the product of the

    prior density and the density of the sample.

    p_1(\theta \mid Y^\star_T) \propto p_0(\theta)\, p(Y^\star_T \mid \theta)

    ⇒ That’s all we need for any inference about θ!

    • The prior density deforms the shape of the likelihood!


  • A simple example (I)

    • Data Generating Process

    y_t = \mu + \varepsilon_t

    where \varepsilon_t \sim_{iid} \mathcal{N}(0,1) is a gaussian white noise.

    • Let YT ≡ (y1, . . . , yT ). The likelihood is given by:

    p(Y_T \mid \mu) = (2\pi)^{-\frac{T}{2}} e^{-\frac{1}{2}\sum_{t=1}^{T}(y_t-\mu)^2}

    • And the ML estimator of µ is:

    \hat{\mu}_{ML,T} = \frac{1}{T}\sum_{t=1}^{T} y_t \equiv \bar{y}


  • A simple example (II)

    • Note that the variance of this estimator is a simple function

    of the sample size

    \mathbb{V}[\hat{\mu}_{ML,T}] = \frac{1}{T}

    • Noting that:

    \sum_{t=1}^{T}(y_t-\mu)^2 = \nu s^2 + T(\mu-\hat{\mu})^2

    with \nu = T-1 and s^2 = (T-1)^{-1}\sum_{t=1}^{T}(y_t-\hat{\mu})^2.

    • The likelihood can be equivalently written as:

    p(Y_T \mid \mu) = (2\pi)^{-\frac{T}{2}} e^{-\frac{1}{2}\left(\nu s^2 + T(\mu-\hat{\mu})^2\right)}

    The two statistics s^2 and \hat{\mu} summarize the sample information.


  • A simple example (II, bis)

    \sum_{t=1}^{T}(y_t-\mu)^2 = \sum_{t=1}^{T}\left([y_t-\hat{\mu}] - [\mu-\hat{\mu}]\right)^2

    = \sum_{t=1}^{T}(y_t-\hat{\mu})^2 + \sum_{t=1}^{T}(\mu-\hat{\mu})^2 - 2\sum_{t=1}^{T}(y_t-\hat{\mu})(\mu-\hat{\mu})

    = \nu s^2 + T(\mu-\hat{\mu})^2 - 2\left(\sum_{t=1}^{T} y_t - T\hat{\mu}\right)(\mu-\hat{\mu})

    = \nu s^2 + T(\mu-\hat{\mu})^2

    The last term cancels out by definition of the sample mean.


  • A simple example (III)

    • Let our prior be a gaussian distribution with expectation

    µ0 and variance σ2µ.

    • The posterior density is defined, up to a constant, by:

    p(\mu \mid Y_T) \propto (2\pi\sigma^2_\mu)^{-\frac{1}{2}} e^{-\frac{1}{2}\frac{(\mu-\mu_0)^2}{\sigma^2_\mu}} \times (2\pi)^{-\frac{T}{2}} e^{-\frac{1}{2}\left(\nu s^2 + T(\mu-\hat{\mu})^2\right)}

    where the missing constant (denominator) is the marginal

    density (does not depend on µ).

    • We also have:

    p(\mu \mid Y_T) \propto \exp\left\{-\frac{1}{2}\left(T(\mu-\hat{\mu})^2 + \frac{1}{\sigma^2_\mu}(\mu-\mu_0)^2\right)\right\}


  • A simple example (IV)

    A(\mu) = T(\mu-\hat{\mu})^2 + \frac{1}{\sigma^2_\mu}(\mu-\mu_0)^2

    = T\left(\mu^2 + \hat{\mu}^2 - 2\mu\hat{\mu}\right) + \frac{1}{\sigma^2_\mu}\left(\mu^2 + \mu_0^2 - 2\mu\mu_0\right)

    = \left(T + \frac{1}{\sigma^2_\mu}\right)\mu^2 - 2\mu\left(T\hat{\mu} + \frac{1}{\sigma^2_\mu}\mu_0\right) + \left(T\hat{\mu}^2 + \frac{1}{\sigma^2_\mu}\mu_0^2\right)

    = \left(T + \frac{1}{\sigma^2_\mu}\right)\left[\mu^2 - 2\mu\,\frac{T\hat{\mu} + \frac{1}{\sigma^2_\mu}\mu_0}{T + \frac{1}{\sigma^2_\mu}}\right] + \left(T\hat{\mu}^2 + \frac{1}{\sigma^2_\mu}\mu_0^2\right)

    = \left(T + \frac{1}{\sigma^2_\mu}\right)\left[\mu - \frac{T\hat{\mu} + \frac{1}{\sigma^2_\mu}\mu_0}{T + \frac{1}{\sigma^2_\mu}}\right]^2 + \left(T\hat{\mu}^2 + \frac{1}{\sigma^2_\mu}\mu_0^2\right) - \frac{\left(T\hat{\mu} + \frac{1}{\sigma^2_\mu}\mu_0\right)^2}{T + \frac{1}{\sigma^2_\mu}}


  • A simple example (V)

    • Finally we have:

    p(\mu \mid Y_T) \propto \exp\left\{-\frac{1}{2}\left(T + \frac{1}{\sigma^2_\mu}\right)\left[\mu - \frac{T\hat{\mu} + \frac{1}{\sigma^2_\mu}\mu_0}{T + \frac{1}{\sigma^2_\mu}}\right]^2\right\}

    • Up to a constant, this is a gaussian density with (posterior)

    expectation:

    \mathbb{E}[\mu] = \frac{T\hat{\mu} + \frac{1}{\sigma^2_\mu}\mu_0}{T + \frac{1}{\sigma^2_\mu}}

    and (posterior) variance:

    \mathbb{V}[\mu] = \frac{1}{T + \frac{1}{\sigma^2_\mu}}


  • A simple example (VI, The bridge)

    • The posterior mean is a convex combination of the prior

    mean and the ML estimate.

    – If σ2µ → ∞ (no prior information) then E[µ] → µ̂ (ML).

    – If σ2µ → 0 (calibration) then E[µ] → µ0.

    • If σ2µ
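    A quick numerical check of the posterior mean and variance formulas (a sketch, not part of the slides; the sample, prior mean and prior variances below are made up):

    import numpy as np

    rng = np.random.default_rng(0)
    T, mu_true = 50, 1.0
    y = mu_true + rng.standard_normal(T)       # y_t = mu + eps_t, eps_t ~ N(0, 1)
    mu_hat = y.mean()                          # ML estimate

    mu0 = 0.0                                  # prior mean (assumed)
    for s2mu in (1e3, 1.0, 1e-3):              # diffuse, informative, dogmatic prior
        post_mean = (T * mu_hat + mu0 / s2mu) / (T + 1.0 / s2mu)
        post_var = 1.0 / (T + 1.0 / s2mu)
        print(f"s2mu={s2mu:g}: E[mu|Y]={post_mean:.3f}, V[mu|Y]={post_var:.4f}")

    With the diffuse prior the posterior mean is essentially \hat{\mu}; with the dogmatic prior it collapses on \mu_0.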

  • Bayesian paradigm (II, Model Comparison)

    • Comparison of marginal densities of the (same) data across

    models.

    • p(Y⋆T |I) measures the fit of model I.

    • Suppose we have a prior distribution over models A, B, ...:

    p(A), p(B), ...

    • Again, using the Bayes theorem we can compute the

    posterior distribution over models:

    p(\mathcal{I} \mid Y^\star_T) = \frac{p(\mathcal{I})\, p(Y^\star_T \mid \mathcal{I})}{\sum_{\mathcal{I}} p(\mathcal{I})\, p(Y^\star_T \mid \mathcal{I})}

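    A small sketch of this computation (not from the slides; the log marginal densities and the model prior below are illustrative numbers):

    import numpy as np

    log_marginal = {"A": 1296.3, "B": 1292.1}         # log p(Y|I), e.g. as reported by an estimation run
    log_prior = {"A": np.log(0.5), "B": np.log(0.5)}  # p(A) = p(B) = 1/2

    log_post = {m: log_prior[m] + log_marginal[m] for m in log_marginal}
    norm = np.logaddexp.reduce(list(log_post.values()))   # log of the denominator
    post = {m: np.exp(lp - norm) for m, lp in log_post.items()}
    print(post)   # model A gets about 98.5% posterior probability

    Working with log marginal densities avoids overflow, since the levels p(Y^\star_T \mid \mathcal{I}) are far outside floating-point range.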

  • Estimation of DSGE models (I, Reduced form)

    • Compute the steady state of the model (a system of non-linear recurrence equations).

    • Compute a linear approximation of the model.

    • Solve the linearized model:

    \underbrace{y_t - \bar{y}(\theta)}_{n\times 1} = \underbrace{T(\theta)}_{n\times n}\,\underbrace{(y_{t-1} - \bar{y}(\theta))}_{n\times 1} + \underbrace{R(\theta)}_{n\times q}\,\underbrace{\varepsilon_t}_{q\times 1}    (4)

    where n is the number of endogenous variables, q is the

    number of structural innovations.

    – The reduced form model is non linear w.r.t. the deep parameters.

    – We do not observe all the endogenous variables.


  • Estimation of DSGE models (II, SSM)

    • Let y⋆t be a subset of yt gathering p observed variables.

    • To bring the model to the data, we use a state-space

    representation:

    y^\star_t = Z\left(\hat{y}_t + \bar{y}(\theta)\right) + \eta_t    (5a)

    \hat{y}_t = T(\theta)\hat{y}_{t-1} + R(\theta)\varepsilon_t    (5b)

    where \hat{y}_t = y_t - \bar{y}(\theta).

    • Equation (5b) is the reduced form of the DSGE model.

    ⇒ state equation

    • Equation (5a) selects a subset of the endogenous variables,

    Z is a p× n matrix filled with zeros and ones.

    ⇒ measurement equation


  • Estimation of DSGE models (III, Likelihood) – a –

    • Let Y⋆T = {y⋆1, y⋆2, . . . , y⋆T} be the sample.

    • Let ψ be the vector of parameters to be estimated (θ, the

    covariance matrices of ε and η).

    • The likelihood, that is the density of Y⋆T conditionally on

    the parameters, is given by:

    \mathcal{L}(\psi; Y^\star_T) = p(Y^\star_T \mid \psi) = p(y^\star_0 \mid \psi)\prod_{t=1}^{T} p\left(y^\star_t \mid Y^\star_{t-1}, \psi\right)    (6)

    • To evaluate the likelihood we need to specify the marginal density p(y⋆0|ψ) (or p(y0|ψ)) and the conditional density p(y⋆t|Y⋆t−1, ψ).


  • Estimation of DSGE models (III, Likelihood) – b –

    • The state-space model (5) describes the evolution of the endogenous variables’ distribution.

    • The distribution of the initial condition (y0) is set equal to

    the ergodic distribution of the stochastic difference

    equation (so that the distribution of yt is time invariant).

    • Because we consider a linear(ized) reduced form model and the disturbances are supposed to be gaussian (say ε ∼ N(0,Σ)), the initial (ergodic) distribution is also gaussian:

    y_0 \sim \mathcal{N}\left(\mathbb{E}_\infty[y_t], \mathbb{V}_\infty[y_t]\right)

    • Unit roots (diffuse Kalman filter).


  • Estimation (III, Likelihood) – c –

    • Evaluation of the density of y⋆t |Y⋆t−1 is not trivial, because

    y⋆t also depends on unobserved endogenous variables.

    • The following identity can be used:

    p\left(y^\star_t \mid Y^\star_{t-1}, \psi\right) = \int_\Lambda p(y^\star_t \mid y_t, \psi)\, p(y_t \mid Y^\star_{t-1}, \psi)\, dy_t    (7)

    The density of y⋆t|Y⋆t−1 is the mean of the density of y⋆t|yt weighted by the density of yt|Y⋆t−1.

    • The first conditional density is given by the measurement

    equation (5a).

    • A Kalman filter is used to evaluate the density of the latent

    variables (yt) conditional on the sample up to time t− 1

    (Y⋆t−1) [⇒ predictive density ].


  • Estimation (III, Likelihood) – d –

    • The Kalman filter can be seen as a Bayesian recursive estimation routine:

    p(y_t \mid Y^\star_{t-1}, \psi) = \int_\Lambda p(y_t \mid y_{t-1}, \psi)\, p(y_{t-1} \mid Y^\star_{t-1}, \psi)\, dy_{t-1}    (8a)

    p(y_t \mid Y^\star_t, \psi) = \frac{p(y^\star_t \mid y_t, \psi)\, p(y_t \mid Y^\star_{t-1}, \psi)}{\int_\Lambda p(y^\star_t \mid y_t, \psi)\, p(y_t \mid Y^\star_{t-1}, \psi)\, dy_t}    (8b)

    • Equation (8a) says that the predictive density of the latent variables is the mean of the density of yt|yt−1, given by the state equation (5b), weighted by the density of yt−1 conditional on Y⋆t−1 (given by (8b)).

    • The update equation (8b) is an application of the Bayes

    theorem → how to update our knowledge about the latent

    variables when new information (data) becomes available.


  • Estimation (III, Likelihood) – e –

    p(y_t \mid Y^\star_t, \psi) = \frac{p(y^\star_t \mid y_t, \psi)\, p(y_t \mid Y^\star_{t-1}, \psi)}{\int_\Lambda p(y^\star_t \mid y_t, \psi)\, p(y_t \mid Y^\star_{t-1}, \psi)\, dy_t}

    • p(yt|Y⋆t−1, ψ) is the a priori density of the latent variables at time t.

    • p (y⋆t |yt, ψ) is the density of the observation at time t

    knowing the state and the parameters (this density is

    obtained from the measurement equation (5a)) ⇒ the

    likelihood associated to y⋆t .

    • ∫Λ p(y⋆t|yt, ψ) p(yt|Y⋆t−1, ψ) dyt is the marginal density of the new information.


  • Estimation (III, Likelihood) – f –

    • The linear–gaussian Kalman filter recursion is given by:

    v_t = y^\star_t - Z\left(\hat{y}_t + \bar{y}(\theta)\right)
    F_t = Z P_t Z' + \mathbb{V}[\eta]
    K_t = T(\theta) P_t Z' F_t^{-1}
    \hat{y}_{t+1} = T(\theta)\hat{y}_t + K_t v_t
    P_{t+1} = T(\theta) P_t \left(T(\theta) - K_t Z\right)' + R(\theta)\Sigma R(\theta)'

    for t = 1, \dots, T, with \hat{y}_0 and P_0 given.

    • Finally the (log)-likelihood is:

    \ln\mathcal{L}(\psi \mid Y^\star_T) = -\frac{Tk}{2}\ln(2\pi) - \frac{1}{2}\sum_{t=1}^{T}\ln|F_t| - \frac{1}{2}\sum_{t=1}^{T} v_t' F_t^{-1} v_t

    where k is the number of observed variables.

    • References: Harvey, Hamilton.

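    A minimal sketch of this recursion in code (not Dynare's implementation; all the matrices are placeholder inputs and the function name is made up):

    import numpy as np

    def kalman_loglik(ystar, Z, T, R, Sigma, V_eta, ybar, yhat0, P0):
        # ystar: (nobs, k) array of observations y*_1, ..., y*_T
        nobs, k = ystar.shape
        yhat, P = yhat0.copy(), P0.copy()
        loglik = -0.5 * nobs * k * np.log(2.0 * np.pi)
        for t in range(nobs):
            v = ystar[t] - Z @ (yhat + ybar)              # prediction error v_t
            F = Z @ P @ Z.T + V_eta                       # its covariance F_t
            Finv = np.linalg.inv(F)
            loglik += -0.5 * (np.log(np.linalg.det(F)) + v @ Finv @ v)
            K = T @ P @ Z.T @ Finv                        # Kalman gain K_t
            yhat = T @ yhat + K @ v                       # one-step-ahead state
            P = T @ P @ (T - K @ Z).T + R @ Sigma @ R.T   # and its covariance
        return loglik

    In the DSGE case, T(θ), R(θ) and ȳ(θ) come from the solved reduced form (4), and Z simply selects the observed rows of the state vector.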

  • Simulations for exact posterior analysis

    • Noting that:

    \mathbb{E}[\varphi(\psi)] = \int_\Psi \varphi(\psi)\, p_1(\psi \mid Y^\star_T)\, d\psi

    we can use the empirical mean of \left(\varphi(\psi^{(1)}), \varphi(\psi^{(2)}), \dots, \varphi(\psi^{(n)})\right), where the \psi^{(i)} are draws from the posterior distribution, to evaluate the expectation of \varphi(\psi). The approximation error goes to zero as n \to \infty.

    • We need to simulate draws from the posterior distribution

    ⇒ Metropolis-Hastings.

    • We build a stochastic recurrence whose limiting

    distribution is the posterior distribution.

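    A toy illustration of this Monte Carlo principle (a sketch, not from the slides; the posterior moments below are made-up stand-ins):

    import numpy as np

    rng = np.random.default_rng(1)
    post_mean, post_var = 0.95, 1.0 / 51.0                # illustrative posterior moments
    draws = rng.normal(post_mean, np.sqrt(post_var), size=100_000)

    phi = lambda psi: psi**2                              # any function of the parameters
    print("Monte Carlo estimate of E[phi(psi)]:", phi(draws).mean())
    print("Exact value (mean^2 + variance):    ", post_mean**2 + post_var)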

  • Simulations (Metropolis-Hastings) – a –

    1. Choose a starting point Ψ0 & run a loop over 2-3-4.

    2. Draw a proposal Ψ⋆ from a jumping distribution

    J(Ψ⋆|Ψt−1) = N (Ψt−1, c× Ωm)

    3. Compute the acceptance ratio

    r = \frac{p_1(\Psi^\star \mid Y^\star_T)}{p_1(\Psi_{t-1} \mid Y^\star_T)} = \frac{\mathcal{K}(\Psi^\star \mid Y^\star_T)}{\mathcal{K}(\Psi_{t-1} \mid Y^\star_T)}

    4. Finally

    \Psi_t = \begin{cases} \Psi^\star & \text{with probability } \min(r,1) \\ \Psi_{t-1} & \text{otherwise.} \end{cases}

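    A sketch of these four steps in code (not Dynare's implementation; the target below is the posterior kernel of the simple gaussian-mean example, and the data, prior and scale c are made up):

    import numpy as np

    rng = np.random.default_rng(2)
    y = 1.0 + rng.standard_normal(50)                       # fake data, mu_true = 1
    mu0, s2mu = 0.0, 1.0                                    # prior N(mu0, s2mu)

    def log_kernel(mu):
        # log prior + log likelihood, up to constants (the posterior kernel K)
        return -0.5 * (mu - mu0) ** 2 / s2mu - 0.5 * np.sum((y - mu) ** 2)

    c, ndraws = 0.3, 20_000                                 # scale of the jumps, chain length
    chain, current, accepted = np.empty(ndraws), 0.0, 0
    for t in range(ndraws):
        proposal = current + c * rng.standard_normal()      # step 2: draw from the jumping density
        log_r = log_kernel(proposal) - log_kernel(current)  # step 3: log acceptance ratio
        if np.log(rng.uniform()) < log_r:                   # step 4: accept with probability min(r, 1)
            current, accepted = proposal, accepted + 1
        chain[t] = current

    print("acceptance rate:", accepted / ndraws)
    print("posterior mean (MH, after burn-in):", chain[5000:].mean())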

  • Simulations (Metropolis-Hastings) – b –

    [Figure: the posterior kernel \mathcal{K}, with the initial draw \theta^o and its kernel value \mathcal{K}(\theta^o) marked.]


  • Simulations (Metropolis-Hastings) – c –

    [Figure: the posterior kernel; the proposal is accepted, so \theta^1 = \theta^\star and \mathcal{K}(\theta^1) = \mathcal{K}(\theta^\star).]


  • Simulations (Metropolis-Hastings) – d –

    [Figure: the posterior kernel with \theta^o, \theta^1 and a new proposal \theta^\star, whose kernel value \mathcal{K}(\theta^\star) enters the acceptance ratio.]


  • Simulations (Metropolis-Hastings) – e –

    • How should we choose the scale factor c (variance of the

    jumping distribution) ?

    • The acceptance rate should be strictly positive and not too large.

    • How many draws ?

    • Convergence has to be assessed...

    • Parallel Markov chains → pooled moments have to be close to within-chain moments (see the sketch below).

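    A sketch of such a pooled-versus-within check (an assumed approach in the spirit of the Brooks-Gelman diagnostics, not Dynare's exact computation; the chains below are fake draws):

    import numpy as np

    def pooled_vs_within(chains):
        # chains: (nchains, ndraws) array of posterior draws for one parameter
        within_mean = chains.mean(axis=1)          # one mean per chain
        within_var = chains.var(axis=1, ddof=1)    # one variance per chain
        pooled_mean = chains.mean()                # moments of the pooled draws
        pooled_var = chains.var(ddof=1)
        print("chain means:", np.round(within_mean, 3), " pooled:", np.round(pooled_mean, 3))
        print("chain vars :", np.round(within_var, 3), " pooled:", np.round(pooled_var, 3))

    rng = np.random.default_rng(3)
    pooled_vs_within(rng.normal(0.95, 0.14, size=(4, 10_000)))   # four well-mixed chains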

  • Dynare syntax (I)

    var A B C;

    varexo E;

    parameters a b c d e f ;

    model(linear);

    A=A(+1)-b/e*(B-C(+1)+A(+1)-A);

    C=f*A+(1-d)*C(-1);

    ....

    end;


  • Dynare syntax (II)

    estimated_params;

    stderr e, uniform_pdf,,,0,1;

    ....

    a, normal_pdf, 1.5, 0.25;

    b, gamma_pdf, 1, 2;

    c, gamma_pdf, 2, 2, 1;

    ....

    end;

    varobs pie r y rw;

    estimation(datafile=dataraba,first_obs=10,

    ...,mh_jscale=0.5);


  • Prior Elicitation

    • The results may depend heavily on our choice for the prior

    density or the parametrization of the model (not

    asymptotically).

    • How to choose the prior ?

    – Subjective choice (data driven or theoretical), example:

    the Calvo parameter for the Phillips curve.

    – Objective choice, examples: the (optimized)

    Minnesota prior for VAR (Phillips, 1996).

    • Robustness of the results must be evaluated:

    – Try different parametrizations.

    – Use more general prior densities.

    – Uninformative priors.


  • Prior Elicitation (parametrization of the model) – a –

    • Estimation of the Phillips curve :

    \pi_t = \beta E\pi_{t+1} + \frac{(1-\xi_p)(1-\beta\xi_p)}{\xi_p}\left((\sigma_c+\sigma_l)y_t + \tau_t\right)

    • ξp is the (Calvo) probability (for an intermediate firm) of being able to optimally choose its price at time t. With probability 1 − ξp the price is indexed on past inflation and/or steady state inflation.

    • Let αp ≡ 1/(1−ξp) be the expected period length during which a firm will not optimally adjust its price.

    • Let λ = (1−ξp)(1−βξp)/ξp be the slope of the Phillips curve.

    • Suppose that β, σc and σl are known.


  • Prior Elicitation (parametrization of the model) – b –

    • The prior may be defined on ξp, αp or the slope λ.

    • Say we choose a uniform prior for the Calvo probability:

    ξp ∼ U[.51,.99]

    The prior mean is .75 (so that the implied value for αp is 4

    quarters). This prior is often thought of as a non informative

    prior...

    • An alternative would be to choose a uniform prior for αp:

    αp ∼ U[1/(1−.51), 1/(1−.99)]

    • These two priors are very different!

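    A quick way to see the difference (a sketch, not from the slides): simulate the uniform prior on αp and look at the implied draws of ξp = 1 − 1/αp.

    import numpy as np

    rng = np.random.default_rng(4)
    alpha = rng.uniform(1 / (1 - 0.51), 1 / (1 - 0.99), size=1_000_000)  # uniform prior on alpha_p
    xi = 1.0 - 1.0 / alpha                                               # implied draws of xi_p

    # Under the uniform prior on xi_p over [.51, .99] this probability is about 4%;
    # here roughly two thirds of the prior mass piles up above .97.
    print("P(xi_p > .97) =", np.mean(xi > 0.97))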

  • Prior Elicitation (parametrization of the model) – c –

    [Figure]

    ⇒ The prior on αp is much more informative than the prior on ξp.


  • Prior Elicitation (parametrization of the model) – d –

    [Figure: implied prior density of ξp = 1 − 1/αp when the prior density of αp is uniform.]


  • Prior Elicitation (more general prior densities)

    • Robustness of the results may be evaluated by considering

    a more general prior density.

    • For instance, in our simple example we could assume a Student-t prior density for µ instead of a gaussian density.


  • Prior Elicitation (flat prior)

    • If a parameter, say µ, can take values between −∞ and ∞,

    the flat prior is a uniform density between −∞ and ∞.

    • If a parameter, say σ, can take values between 0 and ∞, the

    flat prior is a uniform density between −∞ and ∞ for log σ:

    p0(log σ) ∝ 1 ⇔ p0(σ) ∝ 1/σ

    • Invariance.

    • Why is this prior non informative?... ∫ p0(µ) dµ is not defined! ⇒ Improper prior.

    • Practical implications for DSGE estimation.

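    For reference, the change of variables behind the σ case (a standard step, not spelled out on the slide):

    p_0(\sigma) = p_0(\log\sigma)\left|\frac{d\log\sigma}{d\sigma}\right| \propto \frac{1}{\sigma}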

  • Prior Elicitation (non informative prior)

    • An alternative, proposed by Jeffreys, is to use the Fisher information matrix:

    p_0(\psi) \propto |\mathcal{I}(\psi)|^{\frac{1}{2}}

    with

    \mathcal{I}(\psi) = \mathbb{E}\left[\left(\frac{\partial \ln p(Y^\star_T \mid \psi)}{\partial \psi}\right)\left(\frac{\partial \ln p(Y^\star_T \mid \psi)}{\partial \psi}\right)'\right]

    • The idea is to mimic the information in the data...

    • Automatic choice of the prior.

    • Invariance to any continuous re-parametrization of the parameters.

    • Very different results (compared to the flat prior) ⇒ unit root controversy.

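    A concrete illustration (not on the slide): in the simple gaussian-mean example the score is \sum_{t=1}^{T}(y_t-\mu), so

    \mathcal{I}(\mu) = \mathbb{E}\left[\left(\sum_{t=1}^{T}(y_t-\mu)\right)^2\right] = T,

    and the Jeffreys prior p_0(\mu) \propto \sqrt{T} is simply flat in µ.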

  • Effective prior mass

    • Dynare excludes parameters such that the steady state does not exist, or such that the Blanchard-Kahn (BK) conditions are not satisfied.

    • The effective prior mass can be less than 1.

    • Comparison of marginal densities of the data is not

    informative if the prior mass is not invariant across models.

    • The estimation of the posterior mode is more difficult if the

    effective prior mass is less than 1.


  • Efficiency (I)

    • If possible, use the option use_dll (needs a compiler, e.g. gcc).

    • Do not let Dynare compute the steady state!

    • Even if there is no closed form solution for the steady state, the static model can be concentrated and reduced to a small nonlinear system of equations, which can be solved with a standard Newton algorithm in the steadystate file. This numerical part can be done in a MEX routine called by the steadystate file.

    • Alternative initialization of the Kalman filter (lik_init=4).


  • Efficiency (II)

                          Fixed point of the     Fixed point of the
                          state equation         Riccati equation
    Execution time        7.28                   3.31
    Likelihood            -1339.034              -1338.915
    Marginal density      1296.336               1296.203
    α                     0.3526                 0.3527
    β                     0.9937                 0.9937
    γa                    0.0039                 0.0039
    γm                    1.0118                 1.0118
    ρ                     0.6769                 0.6738
    ψ                     0.6508                 0.6508
    δ                     0.0087                 0.0087
    σεa                   0.0146                 0.0146
    σεm                   0.0043                 0.0043

    Table 1: Estimation of fs2000. In percentage, the maximum deviation is less than 0.45.
