This version, October 2008

Estimation and Inference by the Method of Projection Minimum Distance: An Application to the New Keynesian Hybrid Phillips Curve

Abstract

In most macroeconomic models, the stability of the solution path implies that the system is covariance-stationary and hence admits a Wold representation. The ability to estimate this Wold representation semi-parametrically by local projections (Jordà, 2005), even when the solution path’s process is unknown or unconventional, can be exploited to estimate the model’s parameters by minimum distance techniques. We label this two-step, least-squares estimation procedure “projection minimum distance” (PMD) and show that: (1) it is consistent and asymptotically normal for a large class of problems; (2) it is efficient even in relatively small samples; and (3) it is asymptotically equivalent to maximum likelihood and nests most applications of the generalized method of moments as a special case. Although PMD is a general method, we investigate its properties in the context of the New Keynesian hybrid Phillips curve, providing ample Monte Carlo evidence and revisiting Fuhrer and Olivei’s (2005) empirical analysis in an illustrative application.

Keywords: impulse response, local projection, minimum chi-square, minimum distance.
JEL Codes: C32, E47, C53.

Òscar Jordà, Department of Economics, University of California, Davis, One Shields Ave., Davis, CA 95616. E-mail: [email protected]

Sharon Kozicki, Research Department, Bank of Canada, 234 Wellington Street, Ottawa, Ontario, Canada K1A 0G9. E-mail: [email protected]

The views expressed herein are solely those of the authors and do not necessarily reflect the views of the Bank of Canada. We thank Colin Cameron, Timothy Cogley, David DeJong, Richard Dennis, Stephen Donald, David Drukker, Jeffrey Fuhrer, James Hamilton, Peter Hansen, Kevin Hoover, Giovanni Olivei, Peter Robinson, Paul Ruud, Frank Schorfheide, Aaron Smith, Harald Uhlig, Frank Wolak and seminar participants at the Bank of Italy, Bocconi University - IGIER, Duke University, the Federal Reserve Bank of Dallas, the Federal Reserve Bank of Philadelphia, the Federal Reserve Bank of New York, the Federal Reserve Bank of San Francisco, the Federal Reserve Bank of St. Louis, Southern Methodist University, Stanford University, the University of California, Berkeley, the University of California, Davis, the University of California, Riverside, the University of Houston, the University of Kansas, the University of Pennsylvania, the University of Texas at Austin, and the 2006 Winter Meetings of the American Economic Association in Boston, the 2006 European Meetings of the Econometric Society in Vienna, and the 3rd Macro Workshop in Vienna, 2006, for useful comments and suggestions. Jordà is thankful for the hospitality of the Federal Reserve Bank of San Francisco during the preparation of this paper.


1 Introduction

Econometric estimation of dynamic stochastic (partial or general) equilibrium models requires that practitioners confront the limits that model tractability imposes on the universe of variables and on the wealth of dynamic interactions observed in reality. Approaches based on model-implied likelihoods, whether classical (e.g., Canova, 2007) or Bayesian (e.g., An and Schorfheide, 2007), are only sensible with sufficiently complex and complete models that narrow this separation from reality, and when suitable sources of exogenous variation are properly ascertained. The inability to conduct controlled experiments in macroeconomics and the capriciousness of natural or quasi-natural experiments often limit a practitioner’s choice to estimation strategies based on appropriate instrumental variables techniques.

This paper introduces a statistical method of parameter estimation in which the economic model’s restrictions are cast against a flexible, semi-parametric representation of the data based on its Wold (or impulse response) representation. The estimation methodology is particularly well-suited for models designed to capture dynamic comovement, such as real business cycle models or New Keynesian specifications, whose performance is often evaluated based on their ability to match the persistence and cross-correlation properties of macroeconomic data. The objective is to obtain parameter estimates that are robust to incomplete characterizations of the dynamics and/or the forcing variables that the behavioral model is trying to explain. The result is a minimum-distance estimator that is computationally simple, whose asymptotic properties we fully derive, and whose relation to maximum likelihood (ML) and other available minimum distance estimators (such as the generalized method of moments or GMM, Sbordone’s 2002 forecast matching estimator, and the impulse response matching estimator in Rotemberg and Woodford, 1997, more recently used in Christiano, Eichenbaum and Evans, 2005) we establish.

Perhaps it is useful to frame our discussion in the context of the voluminous literature that investigates inflation dynamics (e.g., volume 52 of the Journal of Monetary Economics in 2005 was exclusively dedicated to estimation of the Phillips curve). A critical divide in this literature appears to emerge between proponents of limited-information, single-equation, instrumental-variable based methods (primarily, in that issue, Galí, Gertler and López-Salido, 2005) and its critics and proponents of full-information methods (e.g., Kurmann, 2005; Lindé, 2005; Rudd and Whelan, 2005), where oftentimes a complete, New Keynesian formulation of the economy is required. It is not difficult to grasp that central to this line of research is the desire to determine from the data the degree of backward/forward-looking behavior of the Phillips curve, because it is so central in determining optimal monetary policy responses, sacrifice ratios, and the stability of competing policy prescriptions (see, e.g., Levin and Williams, 2003).

Common arguments against GMM have to do with fears about poor small-sample properties and weak-instrument problems. Our estimator addresses some of these issues, but we wish instead to highlight a far more fundamental issue that has been previously neglected. For expository purposes, consider a researcher who is interested in estimating the following generic regression (presumably, a representation of a fundamental relation derived from an economic model):

$$y = Y\beta + u \qquad (1)$$

where $Y$ includes endogenous variables and possibly exogenous or predetermined variables, and where $z$ is a set of proposed instruments (which will include whatever variables in $Y$ are exogenous or predetermined). Suppose instead that the true data generating process (DGP) is characterized by

$$y = Y\beta + x\gamma + \varepsilon \qquad (2)$$

where $x$ is a vector of omitted (exogenous or predetermined) variables. Here the omission of the variables $x$ is motivated by the researcher’s express belief in the structural nature of the relationship in (1), not by their unavailability. Even when the $z$ are valid instruments for (2), they will in general not be valid instruments for (1) if $E(z'x) \neq 0$ and $\gamma \neq 0$, since the validity of $z$ depends on $E(z'u) = 0$ and in this case $E(z'u) = E(z'x)\gamma + E(z'\varepsilon) = E(z'x)\gamma \neq 0$. Thus, $z$ are valid instruments from the perspective of the DGP in (2), not the proposed model (1).

This problem is particularly acute in macroeconomics, given the common practice of estimating Euler equations with GMM using lagged endogenous variables as instruments. As is clear from

    the preceding discussion, lagged endogenous variables can become illegitimate instruments when

    there is omitted feedback and/or omitted variables in the Euler expression.

    A natural solution is to orthogonalize the instrument set z with respect to the possibly omitted

variables $x$. Hence, consider a first-stage regression

$$z = x\delta + v$$

where the residuals of this regression, $v$ (rather than the predicted values, as is commonly done in two-stage least-squares), are proper instruments for the model in expression (1). It turns out that

    the estimator that we propose achieves similar instrument pre-treatment in a manner that can

    be exploited to examine model misspecification and that can be seen as a direct generalization

    of the typical GMM estimator. The paper presents our methods using examples based on the

    Phillips curve as its backdrop but it should be clear from our presentation that our methods are

    not limited to the examples that we provide. In fact, we derive the statistical properties of our

estimator under general assumptions that include possibly nonlinear systems estimation.

    2 Projection Minimum Distance

The dynamics of many macroeconomic models often depend on expectations about the future values of the model’s variables. This is a natural consequence of rational expectations and arises in many models with learning mechanisms as well. Furthermore, the relative significance of forward- versus backward-looking

    terms is of considerable importance in determining optimal policy responses — the stability of the

    solution paths and the economy often depend on this feature. Unfortunately, because expectations

    are based on the same information set that determines backward-looking behavior, it is empirically


difficult to disentangle which type of behavior is dominant. Single-equation, limited-information

    estimation methods therefore require appropriate instrumental variables, while full-information

approaches based on the likelihood (classical or Bayesian) require complete and correctly specified

    models of the economy that describe how available information is allocated.

    This section presents the mechanics of our estimation method using as a backdrop the desire

    to estimate a New Keynesian hybrid Phillips curve. We do this because determining the relative

    degree of forward versus backward looking behavior plays such a pivotal role in designing optimal

    monetary policy (see, e.g., Walsh, 2003). Further, the Phillips curve is one of the pillars on

    which standard New Keynesian DSGE models are erected, and as we will show, our method is

    conveniently scalable to estimate such systems.

    The majority of current Phillips curve specifications are derived by imposing a friction on

    a firm’s ability to adjust its price optimally (see, e.g. Calvo, 1983; Galí and Gertler, 1999;

    Christiano, Eichenbaum and Evans, 2005; Eichenbaum and Fisher, 2007; to cite a few). The usual

    set-up involves a continuum of monopolistically competitive, intermediate goods producing firms

    that rent capital and labor in perfectly competitive factor markets. Depending on the choice of

    friction, optimal price-setting rules depend on expectations of future aggregate prices and marginal

    costs (or, under some further assumptions, the gap between actual and potential output).

    We begin from a less ambitious theoretical vantage point and instead consider a common

    formulation of New Keynesian monetary models, specifically

$$\pi_t = \gamma_f E_t\pi_{t+1} + \gamma_b \pi_{t-1} + \gamma_g g_t + \varepsilon_{\pi,t} \qquad (3)$$
$$g_t = \beta_f E_t g_{t+1} + \beta_b g_{t-1} - \beta_r (R_t - E_t\pi_{t+1}) + \varepsilon_{g,t} \qquad (4)$$
$$R_t = (1-\rho)(\omega_\pi \pi_t + \omega_g g_t) + \rho R_{t-1} + \varepsilon_{R,t} \qquad (5)$$

where the first equation is the New Keynesian hybrid Phillips curve with $\pi_t$ the aggregate inflation rate, $g_t$ the output gap, and where the restriction $\gamma_f + \gamma_b = 1$ is commonly imposed as a result of

the theory; the second equation is the aggregate demand or IS curve with $R_t$ the nominal interest

    rate; and the third equation is the standard Taylor rule with interest rate smoothing. Such a

    formulation has been studied extensively by Clarida, Galí and Gertler (1999) and more recently

    by Lindé (2005) for a comparative study of the properties of GMM versus FIML estimation of

    the Phillips curve in (3). In fact, we will use an extended formulation of this model to generate

    Monte Carlo simulations in section 5.

In what follows we focus our attention on estimation of expression (3) exclusively. We do this because it makes exposition of the mechanics of our estimator easier to understand, but also to highlight some of the properties of our estimator when used in a limited-information context. It should be clear from our presentation how one would instead do full-information systems estimation and, indeed, the formal derivation of the large-sample properties in section 4 is done under this more general assumption.

    The familiar stable solution path of the system of equations (3)-(5) can be expressed as

$$y_t = A y_{t-1} + C\varepsilon_t$$

where $y_t \equiv (\pi_t \;\; g_t \;\; R_t)'$, $\varepsilon_t \equiv (\varepsilon_{\pi,t} \;\; \varepsilon_{g,t} \;\; \varepsilon_{R,t})'$, and $A$ and $C$ are coefficient matrices whose values are nonlinear functions of the structural parameters $\{\gamma_f, \gamma_b, \gamma_g; \beta_f, \beta_b, \beta_r; \rho, \omega_\pi, \omega_g\}$. Define the resulting reduced-form residuals $v_t = C\varepsilon_t$; then this stable solution path admits a reduced-form Wold representation given by

$$y_t = \sum_{h=0}^{\infty} B_h v_{t-h}$$

where $B_0 = I$, the $B_h$ are the reduced-form moving-average or impulse response coefficient matrices, and $E(v_t v_t') = \Omega_v = C\Omega_\varepsilon C'$, where $\Omega_\varepsilon$ is a diagonal matrix. More generally, whether or not the

solution path has this convenient VAR(1) form is not important. What is important is that the stability of the solution (which, for example, in other models has a VARMA form instead; see, e.g., Fernández-Villaverde, Rubio-Ramírez, Sargent and Watson, 2007) ensures the existence of a reduced-form Wold representation.
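For a solution of this VAR(1) form, the reduced-form impulse response matrices are simply $B_h = A^h$. A minimal numerical sketch (the matrix `A` below is an assumed illustrative value, not one implied by the model’s structural parameters):

```python
import numpy as np

# Assumed illustrative companion matrix A for y_t = A y_{t-1} + v_t
A = np.array([[0.50, 0.10, 0.00],
              [0.10, 0.60, -0.10],
              [0.20, 0.10, 0.70]])

# Stability (all eigenvalues inside the unit circle) guarantees that the
# reduced-form Wold representation exists, with B_h = A^h and B_0 = I
assert np.max(np.abs(np.linalg.eigvals(A))) < 1

H = 12
B = [np.linalg.matrix_power(A, h) for h in range(H + 1)]
```

Nothing here depends on the VAR(1) form per se: any stable solution path (VARMA included) yields summable $B_h$, which is all the first stage of PMD requires.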

    We also wish to highlight that we focus on the reduced-form representation because in practice,

    there is usually no formal statistical procedure to verify commonly used structural identification

    assumptions (such as the ubiquitous short-run or long-run recursive schemes). Further Fernández-

    Villaverde et al. (2007) highlight the dangers of imposing incorrect identification assumptions

    when estimating structural parameters with impulse response matching estimators. Our focus on

    the reduced-form representation is a departure from what is common practice in the literature

    (see, e.g. Christiano, Eichenbaum, and Evans, 2005) but a departure that we deem particularly

    advantageous to the extent that the model’s parameters can be estimated from information about

the serial correlation properties of the data (which are unambiguous) rather than from the contemporaneous correlation between the variables in the system, where the direction of causation is

    much harder to establish formally and is prone to generate inconsistent estimates.

The mechanics of our estimation method, which we call projection minimum distance (PMD), are broadly described as follows. First, we obtain estimates of the first $H$ elements $B_h$ of the Wold decomposition with local projections (Jordà, 2005). Second, we substitute the variables in expression (3) by their Wold representations to obtain a mapping between the $B_h$ (for which first-stage estimates by local projections will now be available) and the parameters of interest, $\gamma \equiv (\gamma_f \;\; \gamma_b \;\; \gamma_g)'$, and minimize an appropriately weighted distance function to obtain consistent and asymptotically normal estimates of $\gamma$. We explain these two steps in more detail.

Using matrix notation to facilitate the explanation and practical implementation of the estimator, let $X$ be a $T' \times n$ matrix, where $T' = T - H - k$ and where $n$ is the number of variables in the system (e.g., $n = 3$ in the example of expressions (3)-(5)). This matrix stacks the observations $\{\pi_t \;\; g_t \;\; R_t\}_{t=k+1}^{T-H}$. Let $Y$ be a $T'H \times n$ matrix that stacks the $H$ $T' \times n$ matrices of observations $\{\pi_{t+h} \;\; g_{t+h} \;\; R_{t+h}\}_{t=k+1}^{T-H}$ for $h = 1, \ldots, H$, and let $Z$ collect the $T' \times nk$ observations corresponding to the $k$ lags $\{\pi_{t-1} \;\; g_{t-1} \;\; R_{t-1} \;\; \ldots \;\; \pi_{t-k} \;\; g_{t-k} \;\; R_{t-k}\}_{t=k+1}^{T-H}$. Then, if $B$ is the $nH \times n$ matrix that stacks the $h = 1, \ldots, H$ matrices $B_h$, it is easily estimated with the least-squares formula

$$\hat B_T = \left(I \otimes (X'MX)^{-1}\right)\left(I \otimes (X'MY)\right) \qquad (6)$$

where $M = I - Z(Z'Z)^{-1}Z'$, and where the covariance matrix of $\hat b_T = \mathrm{vec}(\hat B_T)$ can be computed as

$$\hat\Omega_b = \hat\Psi_b \otimes (X'MX)^{-1}, \qquad (7)$$

where

$$\hat\Psi_b = \sum_{h=1}^{H} \Phi_h' \, \frac{\eta'\eta}{T-H-k} \, \Phi_h, \qquad \Phi_h = \begin{pmatrix} 0 & \ldots & 0 & I & B_1 & \ldots & B_{H-h-1} \end{pmatrix},$$

and $\eta$ is the $T \times n$ matrix of residuals of the local projection of $y_{t+1}$ on to $y_t$. In section 4 we will show formally that

$$\sqrt{T-H-k}\,\left(\hat b_T - b_0\right) \xrightarrow{d} N(0, \Omega_b)$$

under rather general assumptions about the underlying data generating process.
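The first stage in (6) amounts to a sequence of least-squares projections of $y_{t+h}$ on $y_t$ after partialling out $k$ lags. A minimal sketch, with data simulated from an assumed stable VAR(1) purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n, k, H = 400, 3, 2, 4

# Simulate a stable VAR(1) as a stand-in DGP (assumed, for illustration only)
A = np.array([[0.5, 0.1, 0.0],
              [0.1, 0.5, 0.1],
              [0.0, 0.1, 0.5]])
y = np.zeros((T, n))
for t in range(1, T):
    y[t] = A @ y[t - 1] + rng.standard_normal(n)

# X stacks y_t for the common sample; Z stacks the k lags y_{t-1}, ..., y_{t-k}
X = y[k:T - H]
Z = np.hstack([y[k - j:T - H - j] for j in range(1, k + 1)])
M = np.eye(X.shape[0]) - Z @ np.linalg.solve(Z.T @ Z, Z.T)

# Local projection estimates for h = 1, ..., H: regress y_{t+h} on y_t,
# partialling out the lags in Z via the annihilator M
XMX_inv = np.linalg.inv(X.T @ M @ X)
B_hat = [XMX_inv @ (X.T @ M @ y[k + h:T - H + h]) for h in range(1, H + 1)]
```

Each element of `B_hat` is an $n \times n$ coefficient matrix, the finite-sample counterpart of one block of $\hat B_T$ in (6).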

The second stage consists of replacing $\pi_t$ and $g_t$ by their Wold expressions in expression (3). This delivers the following mapping with the parameters of interest:

$$B_h i_1 = B_{h+1} i_1 \gamma_f + B_{h-1} i_1 \gamma_b + B_h i_2 \gamma_g \qquad h = 1, \ldots, H \qquad (8)$$

where $i_j$ refers to the $j$th column of the identity matrix $I$. Given first-stage estimates $\hat B_h$ and the linear relation between these and the $\gamma$, formal estimates of the latter can be conveniently calculated by least squares.

More formally, let $S_0$, $S_f$, $S_b$ be appropriate selector matrices such that, using the first-stage estimates $\hat B_h$, expression (8) can be cast simultaneously for every $h = 1, \ldots, H$ as

$$f(\hat b_T; \gamma) = \left[\, S_0 \hat B_T i_1 - \left( S_f \hat B_T i_1 \;\; S_b \hat B_T i_1 \;\; S_0 \hat B_T i_2 \right) \gamma \,\right].$$

Consistent and asymptotically normal estimates of $\gamma$ can then be obtained by minimizing

$$\min_{\gamma} \; Q(\hat b_T; \gamma) = f(\hat b_T; \gamma)'\, \hat W f(\hat b_T; \gamma)$$

where $\hat W = (\hat F_b' \hat\Omega_b \hat F_b)^{-1}$ and $\hat F_b = \partial f(\hat b_T; \hat\gamma_T)/\partial b$. Hence, if one defines $\hat B_Y \equiv S_0 \hat B_T i_1$ and $\hat B_X \equiv (S_f \hat B_T i_1 \;\; S_b \hat B_T i_1 \;\; S_0 \hat B_T i_2)$, the parameters of the Phillips curve in expression (3) can be estimated as

$$\hat\gamma_T = \left(\hat B_X' \hat W \hat B_X\right)^{-1} \left(\hat B_X' \hat W \hat B_Y\right), \qquad (9)$$

with covariance matrix

$$\hat\Omega_\gamma = \left(\hat B_X' \hat W \hat B_X\right)^{-1}. \qquad (10)$$

Section 4 shows formally that for general problems

$$\sqrt{T-H-k}\,(\hat\gamma_T - \gamma_0) \xrightarrow{d} N(0, \Omega_\gamma)$$

where $\Omega_\gamma = (F_\gamma' W F_\gamma)^{-1}$ and $F_\gamma = \partial f(b; \gamma)/\partial\gamma$. In other words, our estimator can be summarized by the following two least-squares steps:

$$\hat B_T = \left(I \otimes (X'MX)^{-1}\right)\left(I \otimes (X'MY)\right)$$
$$\hat\gamma_T = \left(\hat B_X' \hat W \hat B_X\right)^{-1} \left(\hat B_X' \hat W \hat B_Y\right)$$

with the covariance matrix of $\hat\gamma_T$ computed as

$$\hat\Omega_\gamma = \left(\hat B_X' \hat W \hat B_X\right)^{-1}.$$

Several remarks deserve mention. First, the optimal weighting matrix $\hat W$ described above

can be replaced with the identity matrix and consistent estimates of $\gamma$ still obtained. This is called the equal-weights estimator. The minimum distance literature (see Cameron and Trivedi, 2005) suggests that the equal-weights estimator, although less efficient, has lower small-sample bias when the sample size is especially short. Second, the optimal weighting matrix is a function of $\hat\gamma_T$ itself and hence (9) is not directly feasible. Although one could use a continuously updated estimator, a simpler (and asymptotically equivalent) solution is to obtain $\hat\gamma_T^{EW}$ from the feasible equal-weights estimator, use it to construct the optimal weighting matrix, and then obtain the optimal-weights estimator $\hat\gamma_T^{OW}$ and its covariance matrix. Third, when the optimal-weights estimator is used and $\dim(f(\hat b_T; \gamma)) > \dim(\gamma)$, section 4 shows that

$$Q(\hat b_T; \hat\gamma_T) \xrightarrow{d} \chi^2_{\dim(f(\hat b_T;\gamma)) - \dim(\gamma)}$$

which provides a test of overidentifying restrictions (and hence model misspecification) along the same lines as the J-test commonly used in GMM.
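Mechanically, the second stage stacks the mapping (8) and runs least squares. A sketch with hypothetical first-stage estimates (here generated from an assumed stable matrix rather than from actual local projections), using the identity weighting of the equal-weights pass; the optimal-weights pass would substitute $\hat W$ built from (7):

```python
import numpy as np

# Hypothetical first-stage impulse responses B_h for a 3-variable system
# (inflation, output gap, interest rate), generated from an assumed stable
# matrix A purely for illustration
A = np.array([[0.60, 0.10, 0.00],
              [0.05, 0.50, -0.10],
              [0.10, 0.20, 0.40]])
H = 8
B = [np.linalg.matrix_power(A, h) for h in range(H + 2)]
i1, i2 = np.eye(3)[:, 0], np.eye(3)[:, 1]

# Stack (8): B_h i1 = B_{h+1} i1 g_f + B_{h-1} i1 g_b + B_h i2 g_g, h = 1..H
BY = np.concatenate([B[h] @ i1 for h in range(1, H + 1)])
BX = np.vstack([np.column_stack([B[h + 1] @ i1, B[h - 1] @ i1, B[h] @ i2])
                for h in range(1, H + 1)])

# Equal-weights minimum-distance estimate of gamma = (g_f, g_b, g_g)'
gamma_ew = np.linalg.solve(BX.T @ BX, BX.T @ BY)
```

With $3H$ stacked conditions and three parameters, the system is overidentified, which is what makes the chi-square diagnostic above available under optimal weighting.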

Minimum distance approaches are not new in macroeconomics. Although it is very rare to find formal derivations of the statistical properties of these estimators (e.g., minimization of structural impulse response distances as in Rotemberg and Woodford, 1997, and Christiano, Eichenbaum and Evans, 2005; or minimization of VAR forecast distances as in Sbordone, 2002, 2005), this is not where we see our most important contribution. Instead, the semi-parametric nature of the first stage allows us to be quite general and agnostic about the underlying DGP (which, as a consequence, includes VARMA specifications, for example).

    This generality is useful in several respects. Like GMM (and unlike MLE) our method does not

    require solving for the rational-expectations equilibrium and then selecting the appropriate stable


roots (we only require that the solution be stable so that we can invoke the Wold representation

    theorem). Further, when the Euler expressions are linear, our estimator boils down to two simple

    GLS-type steps. In addition, the flexibility of the first stage has several important payoffs with

    respect to GMM.

First, in many covariance-stationary processes, the rate at which $B_h \to 0$ as $h \to \infty$ is quite fast (typically exponential) and hence, although in finite samples we truncate at some horizon $H$,

    our estimator is almost as efficient as MLE (an example of which is provided in our Monte Carlo

    experiments in section 5). The choice of truncation H in practice can be determined conveniently

    with Hall, Inoue, Nason and Rossi’s (2007) information criterion, which is

$$\hat H = \arg\min_{H \in \{h_{\min}, \ldots, h_{\max}\}} \; \ln\left|\hat\Omega_\gamma\right| + h\,\frac{\ln\left(\sqrt{T/k}\right)}{\sqrt{T/k}} \qquad (11)$$

where $h_{\min}$ is such that $\dim(f(\hat b_T; \gamma)) = \dim(\gamma)$. Second, by assuming a Wold representation for $y_t$, we are able to obtain closed-form analytic

expressions for the optimal weighting matrix $\hat W$ rather than having to use a semi-parametric estimate such as Newey-West, as is common in GMM. This results in obvious gains in the efficiency of the estimates, as we shall see in the Monte Carlo experiments of section 5. Third, it turns out that our estimator can be seen as a version of GMM that embeds a recursive pre-treatment of potentially illegitimate instruments due to feedback, a feature that we will exploit to check for model misspecification and that we elaborate on in more detail below. Finally, notice that the method is fully scalable to systems and to nonlinear specifications with little difficulty.
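The horizon selection in (11) is a one-dimensional grid search. A sketch, where the log-determinants stand in for $\ln|\hat\Omega_\gamma|$ recomputed at each candidate horizon (the numerical values are hypothetical):

```python
import numpy as np

def hinr(logdet_omega_gamma, h, T, k):
    # Criterion in the spirit of Hall, Inoue, Nason and Rossi (2007):
    # fit term ln|Omega_gamma| plus a penalty that vanishes as sqrt(T/k) grows
    c = np.sqrt(T / k)
    return logdet_omega_gamma + h * np.log(c) / c

T, k = 200, 4
# Hypothetical ln|Omega_gamma| at candidate horizons H = 2, ..., 6
logdets = {2: -1.00, 3: -1.40, 4: -1.50, 5: -1.52, 6: -1.53}
H_hat = min(logdets, key=lambda h: hinr(logdets[h], h, T, k))
```

The trade-off is transparent: past some horizon the fit term barely improves while the penalty keeps accumulating, so the criterion picks a small $H$.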

    3 Illegitimate Instruments

    Micro-founded models of the macroeconomy distill a rich economic environment with many vari-

    ables and a plethora of interactions into a few key relations that allow us to understand the

fundamental forces that drive the economy. The equilibrium conditions characterized by the resulting Euler equations therefore impose considerable restrictions on the dynamic specifications


and included variables. Further, oftentimes the best (or even the only) instruments available to

    estimate such relations are lags of the endogenous variables specified in these expressions. This

section shows that the validity of these instruments depends on the data, not on the Euler conditions specified by the economic model; as a result, unmodeled dynamics and/or omitted variables generate illegitimate instruments due to feedback, and hence inconsistent GMM parameter estimates.

    One solution would be to enrich the economic model to account more completely for the

    features of the data and certainly many new models (e.g. An and Schorfheide, 2007; Christiano,

    Eichenbaum and Evans, 2005; Smets and Wouters, 2003) have taken this approach while trying to

    preserve enough tractability and the original economic insights of simpler models. However, it is

    difficult to extend this technique as a general (albeit desirable) principle and the fact remains that

    many popular Euler expressions fall well short of properly characterizing the statistical properties

    of the data.

Here we show that a more practical solution consists of projecting the Euler conditions onto the space of likely omitted dynamics/variables or, alternatively, projecting the instruments

    themselves onto this same space. We will show that one of the advantages of our estimation

    method over GMM is due to this feature. Specifically, let us return to the example we presented

    in the introduction, where a researcher is interested in estimating the expression

$$y = Y\beta + u \qquad (12)$$

    where y is the dependent variable, Y are endogenous variables, and z are candidate instruments.

Notice that $Y$ could contain other exogenous or predetermined variables, in which case they would be included directly into $z$, so that expression (12) is quite general.

    As an example, suppose we are interested in estimating a Phillips curve with forward-looking

    terms only (see, e.g. Galí, Gertler, and López-Salido, 2001), say

$$\pi_t = \beta E_t\pi_{t+1} + u_t \qquad (13)$$

where, for reasons that will become clear momentarily, we have omitted the usual term associated with demand (e.g., marginal costs, the output gap, etc.). Instead, suppose the DGP is characterized by

$$y = Y\beta + x\gamma + \varepsilon \qquad (14)$$

where $x$ are exogenous and/or predetermined variables (such as other lags of $y$). Here the key is to realize that $E(z'x) \neq 0$ and $\gamma \neq 0$, so that the $z$ are invalid instruments in expression (12) although they would be perfectly valid for (14).

In terms of the simple Phillips curve example, suppose the DGP is

$$\pi_t = \gamma_f E_t\pi_{t+1} + \gamma_b \pi_{t-1} + \varepsilon_t \qquad (15)$$

instead of that specified in expression (13). Let $M \equiv I - x(x'x)^{-1}x'$ and notice that $E_L(z'Mx) = 0$, where $E_L$ is the linear projection operator. Hence, if one is interested in estimating expression (12) one could pursue two alternatives. One is to run the first-stage regression

$$z = x\phi + \tilde z$$

and use $\tilde z$ (which are the residuals, not the predicted values as in the typical two-stage least-squares procedure) as regular instruments in (12). Equivalently, one can project (12) onto the space of $x$ and estimate $\beta$ from

$$\tilde y = \tilde Y \beta + \varepsilon$$

using $z$ as instruments, where $\tilde y$ and $\tilde Y$ are the residuals of the projections of $y$ and $Y$ onto $x$. In a nonlinear context, of course, the latter projection argument breaks down and the first option is clearly more appropriate even if it is approximate.
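The residual-based pre-treatment of the instruments takes only a few lines. A sketch with synthetic data and assumed coefficients, purely to show the mechanics:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 500
x = rng.standard_normal((T, 2))                        # possibly omitted variables
z = x @ np.array([0.8, 0.4]) + rng.standard_normal(T)  # candidate instrument

# First-stage projection z = x phi + z_tilde: keep the RESIDUALS z_tilde
# (not the fitted values) as the sterilized instrument
phi, *_ = np.linalg.lstsq(x, z, rcond=None)
z_tilde = z - x @ phi

# By construction the sterilized instrument is orthogonal to x
assert np.allclose(x.T @ z_tilde, 0.0, atol=1e-8)
```

This is the opposite of two-stage least squares, where the fitted values are kept: here it is the part of $z$ uncorrelated with the possibly omitted $x$ that remains usable.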

We return now to the link between our discussion, GMM estimation, and estimation by PMD. Consider the running example given by expressions (13) and (15), where in particular a researcher estimates expression (13) using $\pi_{t-h}$ as an instrument. It is easy to see that

$$\hat\beta_{GMM} = \frac{\sum_{t=h}^{T} \pi_{t-h}\pi_t}{\sum_{t=h}^{T} \pi_{t-h}\pi_{t+1}} = \gamma_f + \gamma_b \frac{\sum_{t=h}^{T} \pi_{t-h}\pi_{t-1}}{\sum_{t=h}^{T} \pi_{t-h}\pi_{t+1}} + \frac{\sum_{t=h}^{T} \pi_{t-h}\varepsilon_t}{\sum_{t=h}^{T} \pi_{t-h}\pi_{t+1}}$$

and under typical assumptions

$$\hat\beta_{GMM} \xrightarrow{p} \gamma_f + \gamma_b \frac{\phi_{h-1}}{\phi_{h+1}}$$

where $\phi_h = \mathrm{cov}(\pi_t, \pi_{t-h})$. Hence $\hat\beta$ is an inconsistent estimate of $\gamma_f$ as long as $\gamma_b \neq 0$, and the bias does not disappear by choosing later lags of $\pi_t$ since $\phi_{h-1}/\phi_{h+1}$ becomes indeterminate as $h$ grows.
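To see concretely why later lags do not dilute the bias, consider an AR(1) with parameter $\rho$ (a simpler illustrative process than the DGP in (15)): its autocovariances are $\phi_h = \rho^h \phi_0$, so the bias factor $\phi_{h-1}/\phi_{h+1} = \rho^{-2}$ at every lag:

```python
import numpy as np

rho, phi0 = 0.5, 1.0

def phi(h):
    # Autocovariances of a stationary AR(1) with parameter rho
    return rho ** h * phi0

# The bias factor phi_{h-1}/phi_{h+1} equals rho**-2 = 4 at every h, so
# instrumenting with more distant lags leaves the asymptotic bias intact
ratios = [phi(h - 1) / phi(h + 1) for h in range(1, 6)]
```

In richer processes the ratio need not be constant, but, as the text notes, it becomes numerically indeterminate (a ratio of two vanishing covariances) rather than converging to zero.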

Instead, PMD suggests estimating $\beta$ by choosing $\hat\beta_{PMD}$ such that

$$\hat b_h = \beta_{PMD}\, \hat b_{h+1}$$

where $M_{t-h} = I - X_{t-h}(X_{t-h}'X_{t-h})^{-1}X_{t-h}'$ with $X_{t-h} = (1, \pi_{t-h-1}, \ldots, \pi_{t-h-k})'$,

$$\hat b_h = \frac{\sum_{t=h}^{T} \pi_{t-h} M_{t-h} \pi_t}{\sum_{t=h}^{T} \pi_{t-h} M_{t-h} \pi_{t-h}}$$

and hence

$$\hat\beta_{PMD} = \frac{\sum_{t=h}^{T} \pi_{t-h} M_{t-h} \pi_t}{\sum_{t=h}^{T} \pi_{t-h} M_{t-h} \pi_{t+1}};$$

specifically,

$$\hat\beta_{PMD} = \gamma_f + \gamma_b \frac{\sum_{t=h}^{T} \pi_{t-h} M_{t-h} \pi_{t-1}}{\sum_{t=h}^{T} \pi_{t-h} M_{t-h} \pi_{t+1}} + \frac{\sum_{t=h}^{T} \pi_{t-h} M_{t-h} \varepsilon_t}{\sum_{t=h}^{T} \pi_{t-h} M_{t-h} \pi_{t+1}}$$

so that clearly

$$\hat\beta_{PMD} \xrightarrow{p} \gamma_f + \gamma_b \frac{\delta_{h-1}}{\delta_{h+1}}$$

where $\delta_h$ is the conditional covariance between $\pi_t$ and $\pi_{t-h}$, and therefore $\delta_h \to 0$ as $h \to 0$ (with positively serially correlated data).

In other words, the local projection step automatically projects the instrument $\pi_{t-h}$ onto a sub-space of omitted dynamics (through $X_{t-h}$), thus sterilizing the sources of feedback that make $\pi_{t-h}$ an illegitimate instrument. The smaller $h$ is, the more the instrument is sterilized and the smaller the bias (all the way down to zero in the limit). As a consequence, a natural and complementary way to investigate model misspecification is to plot the estimates $\hat\beta_{PMD}$ as a function of $h$. If the model is correctly specified, the $\hat\beta_{PMD}(h)$ will be approximately the same for any $h$. Otherwise, fluctuations in $\hat\beta_{PMD}(h)$ will be symptomatic of dynamic misspecification, with the $\hat\beta_{PMD}(h)$ estimated at the smallest values of $h$ being the less precise but the more nearly consistent estimates of $\gamma_f$.
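As a concrete version of this diagnostic, one can tabulate $\hat\beta_{PMD}(h) = \hat b_h/\hat b_{h+1}$ across horizons; the local-projection coefficients below are hypothetical values chosen only to contrast the two cases:

```python
import numpy as np

# Hypothetical local-projection coefficients b_h for h = 1, ..., 6
b_correct = np.array([0.66, 0.55, 0.46, 0.38, 0.32, 0.26])  # roughly geometric decay
b_misspec = np.array([0.66, 0.60, 0.50, 0.36, 0.20, 0.05])  # decay rate drifts

# beta_hat(h) = b_h / b_{h+1}: approximately flat under correct specification,
# drifting under dynamic misspecification
beta_correct = b_correct[:-1] / b_correct[1:]
beta_misspec = b_misspec[:-1] / b_misspec[1:]
```

A plot of these ratios against $h$ is the misspecification check described in the text: a flat profile is reassuring, a drifting one is not.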

    We conclude this section by remarking that estimates of the optimal weighting matrix in

GMM (a key element in constructing efficient standard errors) are notoriously problematic: nonparametric spectral density estimators at frequency zero tend to have poor small-sample properties

    (see e.g. Christiano and Den Haan, 1996). In contrast, the assumption that the data has a Wold

    representation allows us to provide a simple, analytic expression for the estimate of this matrix

    with good small sample properties (given the general assumptions in the propositions we present

    below).

    Finally, we comment on the relationship between PMD and MLE by observing that the Wold

    representation, under the common assumption of Gaussianity, is a complete representation of all

    the data’s second order properties and hence, as the truncation horizon H →∞, PMD approaches

MLE. A similar result exists for GMM: if one were to use an infinite set of moment conditions, one would attain the MLE efficiency bound. However, most covariance-stationary processes exhibit serial correlation that decays toward zero at an exponential rate (think of an AR(1) with parameter 0.5, whose impulse response is 0.5, 0.25, 0.125, 0.0625, ...), so in practice one can achieve parameter estimation efficiency similar to MLE with relatively small values of H, as

the small Monte Carlo experiments of section 5 demonstrate.

    4 Statistical Properties of PMD

This section derives the large-sample properties of our estimator in a general setting. For this reason, the notation is slightly different from the notation in section 2. We begin by

    showing that the first-stage local projection estimates are consistent and asymptotically normal

    under general conditions and then show that the second stage estimators are also consistent and

    asymptotically normal.

    4.1 Asymptotic Properties of Local Projections: First Stage

Suppose the n × 1 vector y_t is covariance-stationary with Wold representation given by

    y_t = μ + Σ_{j=0}^{∞} B_j u_{t−j}    (16)

where the u_t are i.i.d., mean zero with finite covariance matrix Σ_u, and the B_j satisfy Σ_{j=0}^{∞} ||B_j|| < ∞ with ||B_j||² = tr(B_j′ B_j) and B_0 = I_n. Further, assume det{B(z)} ≠ 0 for |z| ≤ 1, where B(z) = Σ_{j=0}^{∞} B_j z^j, so that the process can be written in its infinite VAR representation

    y_t = Σ_{j=1}^{∞} A_j y_{t−j} + u_t

with Σ_{j=1}^{∞} ||A_j|| < ∞. Projecting y_{t+h} on the truncated information set (y_t, ..., y_{t−k+1}) gives

    y_{t+h} = A_1^h y_t + ... + A_k^h y_{t−k+1} + v_{k,t+h}

    v_{k,t+h} = Σ_{j=k+1}^{∞} A_j^h y_{t−j} + u_{t+h} + Σ_{j=1}^{h−1} B_j u_{t+h−j}
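The first-stage regressions above are just least squares at each horizon. As a minimal sketch (assuming a univariate AR(1) DGP chosen purely for illustration), the h-step coefficients can be estimated by direct projection and compared with the true impulse response φ^h:

```python
import numpy as np

rng = np.random.default_rng(1)
T, k, H, phi = 2000, 4, 6, 0.5

# Illustrative AR(1) data; any covariance-stationary series would do.
y = np.zeros(T)
for t in range(1, T):
    y[t] = phi * y[t - 1] + rng.normal()

def local_projection(y, h, k):
    """OLS of y_{t+h} on (1, y_t, ..., y_{t-k+1}): the direct h-step projection."""
    t = np.arange(k - 1, len(y) - h)
    X = np.column_stack([np.ones(len(t))] + [y[t - j] for j in range(k)])
    coef, *_ = np.linalg.lstsq(X, y[t + h], rcond=None)
    return coef  # coef[1] estimates the h-step response to y_t

irf_lp = [local_projection(y, h, k)[1] for h in range(1, H + 1)]
irf_true = [phi**h for h in range(1, H + 1)]
```

For this DGP the local projection coefficients on y_t recover φ^h up to sampling error, which is the semi-parametric Wold estimate the minimum distance step takes as input.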

Proposition 1 Consistency. Let {y_t} satisfy (16) and assume that:

(i) E|u_{it} u_{jt} u_{kt} u_{lt}| < ∞ for all 1 ≤ i, j, k, l ≤ n;

(ii) k satisfies k³/T → 0 as T, k → ∞;

(iii) k satisfies √(T − k − H) Σ_{j=k+1}^{∞} ||A_j|| → 0 as T, k → ∞.

Then

    √(T − k − H) vec(B̂_T − B_0) →d N(0, Ω_b)

    Ω_b = (X′MX)^{−1} ⊗ Σ_v;    Σ̂_v = V̂ V̂′ / (T − k − H)

where recall that B̂_T = (X′MX)^{−1}(X′MY); Y is the T × nH matrix of observations for (y_{t+1}, ..., y_{t+H})′; X is the T × n matrix of observations for y_t; M = I − Z(Z′Z)^{−1}Z′, where Z is the T × n(k+1) matrix of observations for (1, y_{t−1}, ..., y_{t−k+1})′; and V̂ = MY − MX B̂_T. The proof is provided in the appendix. Notice that we have modified the dimensions of B̂_T with respect to section 2 to make the derivations here and in the appendix more straightforward, but without loss of generality.
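The matrix quantities just defined can be computed directly by partitioned regression. A minimal univariate sketch (n = 1, on placeholder data; the exact sample alignment and the (T − k − H) degrees-of-freedom scaling are simplified relative to the text):

```python
import numpy as np

rng = np.random.default_rng(2)
T, k, H = 300, 3, 4
y = rng.normal(size=T)  # placeholder series; any stationary data works here

t = np.arange(k, T - H)                          # common estimation sample
X = y[t].reshape(-1, 1)                          # observations of y_t
Z = np.column_stack([np.ones(len(t))] + [y[t - j] for j in range(1, k + 1)])
Y = np.column_stack([y[t + h] for h in range(1, H + 1)])  # (y_{t+1},...,y_{t+H})

M = np.eye(len(t)) - Z @ np.linalg.solve(Z.T @ Z, Z.T)    # annihilator of Z
B_hat = np.linalg.solve(X.T @ M @ X, X.T @ M @ Y)         # (X'MX)^{-1} X'MY
V_hat = M @ Y - M @ X @ B_hat
Sigma_v = V_hat.T @ V_hat / len(t)               # residual covariance estimate
```

This is ordinary least-squares algebra throughout, which is the computational point of the first stage.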

4.2 Statistical Properties of Projection Minimum Distance: Second Step

Given B̂_T (and hence b̂_T = vec(B̂_T)), consider estimating γ as described in section 2 by minimizing

    min_γ Q̂_T(b̂_T; γ) = f(b̂_T; γ)′ Ŵ f(b̂_T; γ)

Let Q_0(γ) denote the objective function at b_0. Then the following lemma shows that the solution of this problem, γ̂_T, is consistent for γ_0.

Lemma 3 Consistency. Given that b̂_T →p b_0 from proposition 1, assume that:

(i) Ŵ →p W, a positive semidefinite matrix;

(ii) Q_0(γ) is uniquely maximized at (b_0, γ_0) = θ_0 ∈ Θ;

(iii) the parameter space Θ is compact;

(iv) f(b_0, γ) is continuous in a neighborhood of γ_0 ∈ Θ;

(v) instrument relevance condition: rank[W F_γ] = dim(γ), where F_γ = ∂f(b_0, γ_0)/∂γ;

(vi) identification condition: dim(f(b̂_T; γ)) ≥ dim(γ).

Then

    γ̂_T →p γ_0

The proof is provided in the appendix, where it is worth remarking that the proof takes H to be finite and given. If instead H → ∞ with the sample size, then b̂_T becomes infinite-dimensional and one would have to appeal to higher-order conditions (such as empirical process theory and stochastic equicontinuity of f(b̂_T; γ) with respect to b̂_T), which would make the proof more general but far less transparent. By taking H to be finite, it is relatively straightforward to show that Q̂_T(γ) →p Q_0 uniformly (see Andrews 1994, 1995).

Lemma 4 Normality. Assume:

(i) Ŵ →p W, where W = (F_b′ Ω_b F_b)^{−1}, a positive definite matrix, and where F_b is defined as in assumption (v) below;

(ii) b̂_T →p b_0 and γ̂_T →p γ_0 from proposition 1 and lemma 3;

(iii) b_0 and γ_0 are in the interior of Θ;

(iv) f(b̂_T; γ) is continuously differentiable in a neighborhood N of θ_0;

(v) there exist F_b and F_γ that are continuous at b_0 and γ_0 respectively, and

    sup_{b,γ∈N} ||∇_b f(b, γ) − F_b|| →p 0
    sup_{b,γ∈N} ||∇_γ f(b, γ) − F_γ|| →p 0

(vi) for F_γ = F_γ(γ_0), F_γ′ W F_γ is invertible.

Then

    √(T − H − k) (γ̂_T − γ_0) →d N(0, Ω_γ)

    Ω_γ = (F_γ′ W F_γ)^{−1}

The proof is provided in the appendix using the same principles required to derive the proof

of asymptotic normality typical of GMM and minimum distance problems (see e.g. Newey and McFadden, 1994; Wooldridge, 1994). We have taken the simpler route here of brushing aside weak instrument conditions/problems, such as those discussed in Bekker (1994), Staiger and Stock (1997), Stock, Wright and Yogo (2002), and many others, with assumption (v) in Lemma 3. We felt it was more useful to provide the foundational results first and, since the weak instrument problems that can arise with projection minimum distance are of a similar nature to those already investigated in the literature in a GMM context, we refer the reader to this literature directly. In practice, we recommend choosing the optimal impulse response horizon using the information criterion in Hall et al. (2007), whose formula appears in expression (11). In finite samples, all asymptotic expressions can be replaced by their usual small sample estimates. Lastly, we note that F_b is a function of γ and hence the expression of the optimal weighting matrix Ŵ = (F_b′ Ω_b F_b)^{−1} cannot be computed directly. However, a consistent estimate of γ can be obtained with the equal-weights matrix Ŵ = I (lemma 3 only requires W to be positive semidefinite to achieve consistency), from which the optimal-weights estimator can be constructed and all the relevant statistics computed. In principle, one can iterate on this procedure to refine the estimates of γ, although asymptotically one iteration is sufficient. Finally, lemma 4 and standard results are all that is needed to show that a test of overidentifying restrictions is easily obtained by realizing that the minimum distance function Q̂_T evaluated at the optimum (b̂_T, γ̂_T) has a chi-square distribution with degrees of freedom dim(f(b̂_T; γ)) − dim(γ).
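The two-step logic just described (equal weights, then optimal weights, then the overidentification statistic) can be sketched with a toy mapping in which the model implies impulse responses b_h = γ^h; the "estimates" b̂, their covariance Ω_b, and all numbers below are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)
H, gamma0 = 5, 0.6
# Pretend first-stage covariance of b_hat, noisier at longer horizons.
Omega_b = np.diag(0.005 * np.arange(1, H + 1))
b_hat = gamma0 ** np.arange(1, H + 1) + rng.multivariate_normal(np.zeros(H), Omega_b)

def f(b, g):
    """Distance between estimated and model-implied impulse responses."""
    return b - g ** np.arange(1, H + 1)

def md_estimate(W):
    """Minimize the quadratic form f'Wf by a simple grid search over gamma."""
    grid = np.linspace(0.01, 0.99, 981)
    return grid[np.argmin([f(b_hat, g) @ W @ f(b_hat, g) for g in grid])]

g_step1 = md_estimate(np.eye(H))                  # step 1: equal weights, consistent
W_opt = np.linalg.inv(Omega_b)                     # here f = b - m(gamma), so F_b = I
g_step2 = md_estimate(W_opt)                       # step 2: optimal weights, efficient
J = f(b_hat, g_step2) @ W_opt @ f(b_hat, g_step2)  # overidentification statistic
```

With W the inverse of the sampling covariance of b̂, the minimized quadratic form J is approximately chi-square with dim(f) − dim(γ) = 4 degrees of freedom, mirroring the test described above.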

5 Small-Sample Properties: Monte Carlo Experiments

This section contains Monte Carlo experiments designed to show that PMD is computationally convenient while not incurring significant efficiency losses relative to MLE in models whose likelihood requires numerical algorithms for its maximization; that PMD provides estimates that are similarly unbiased but more efficient than those of GMM when the specification of the model is correct; and that PMD can be more robust than GMM to certain types of misspecification due to illegitimate instrument problems. We showcase these features with two experiments: the first compares estimation of a simple ARMA(1,1) model by PMD and by MLE; the second generates data from an extended version of the New Keynesian model introduced in section 2 and compares the small sample properties of the New Keynesian hybrid Phillips curve estimates obtained with PMD and with GMM.

    5.1 PMD vs. MLE

    The data for this set of experiments is generated from the univariate ARMA(1,1) model

    yt = ρyt−1 + εt + θεt−1 εt ∼ N(0, 0.5)

    for the following four different pairs of parameter values: (1) ρ = 0.25, θ = 0.50; (2) ρ = 0.50,

θ = 0.25; (3) ρ = 0, θ = 0.5; and (4) ρ = 0.5, θ = 0. The last two cases are a pure MA(1) and a pure AR(1) model, but they will be specified as ARMA(1,1) models in the estimation.

    Each of the 1,000 simulation runs has the following features. We use 500 burn-in replications

    to avoid initialization issues with sample sizes T = 50, 100, and 400. The lag length of the local

    projection step is determined automatically by AICC — a correction to AIC for autoregressive

    models introduced by Hurvich and Tsai (1989) with better small sample properties than alternative

    information criteria. For the minimum distance step, we experiment with fixed values H = 2, 5,

and 10. For H = 2, we have just-identification; otherwise, we have overidentifying restrictions.
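One simulation run's data generation and lag selection can be sketched as follows, assuming the Hurvich-Tsai AICc in its common regression form (the exact variant used in the paper may differ):

```python
import numpy as np

rng = np.random.default_rng(4)
T, burn, rho, theta = 400, 500, 0.5, 0.25

# One ARMA(1,1) run: y_t = rho*y_{t-1} + e_t + theta*e_{t-1}, e ~ N(0, 0.5)
e = rng.normal(scale=np.sqrt(0.5), size=T + burn)
y = np.zeros(T + burn)
for t in range(1, T + burn):
    y[t] = rho * y[t - 1] + e[t] + theta * e[t - 1]
y = y[burn:]                                     # discard burn-in draws

def aicc_ar(y, k):
    """Fit AR(k) by OLS and return AICc (a common form of the
    Hurvich-Tsai small-sample correction to AIC)."""
    t = np.arange(k, len(y))
    X = np.column_stack([y[t - j] for j in range(1, k + 1)])
    coef, *_ = np.linalg.lstsq(X, y[t], rcond=None)
    resid = y[t] - X @ coef
    n = len(t)
    return n * np.log(resid @ resid / n) + n * (n + k) / (n - k - 2)

k_hat = min(range(1, 13), key=lambda k: aicc_ar(y, k))
```

The selected lag k_hat then fixes the first-stage local projections, whose impulse response estimates feed the minimum distance step for (ρ, θ).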

Given our choices of ρ and θ in all 4 cases, for H = 5 the impulse response coefficients are all

    very close to 0 at that horizon. Hence, by including the case where H = 10 we hope to capture

    possible distortions to the parameter estimates of ρ and θ generated by first stage estimates that

    have virtually zero information content (akin to having weak instruments). It is worth remarking

    that while MLE requires numerical optimization routines, PMD for this example requires two very

    simple least-squares steps. Tables 1.1-1.4 summarize the experiments by reporting Monte Carlo

    averages and standard errors of the parameter estimates calculated with the analytic formulas of

    the large-sample approximations. In addition, empirical Monte Carlo standard errors are provided

    as a check that the formulas provide appropriate values.

    The tables show that PMD estimates converge to the true parameter values as the sample size

grows at roughly the same or better speed than MLE estimates. This is true even for the small sample T = 50, although when H = 10 there is, not surprisingly, a clear deterioration of the PMD estimates. What is surprising, though, is that the effect of having a large number of conditions

    with little information value (H = 10 rather than H = 2 or 5) does not appear to distort the

    estimates (or the standard errors) with sample sizes as low as T = 100 observations. For sample

    sizes T = 100 and 400, PMD and MLE standard errors are virtually the same (when H = 5, 10)

    and comparable to the empirical Monte Carlo values. Finally, we remark that in tables 3 and 4

    (the pure MA(1) and AR(1) DGPs) we had difficulty getting convergence of the MLE estimator

    for all the runs. Instead of trying to redo (or disregard) specific runs, we preferred to leave the

    results blank as a way to highlight that although MLE run into numerical difficulties, PMD is

    numerically stable and robust in all the cases. Thus, a fair summary of these experiments suggests

    that PMD has very good small sample properties, converging quickly to the theoretical values and

    with relatively the same efficiency as MLE even though PMD uses simple least squares algebra

    and MLE requires numerical routines to maximize the likelihood.

5.2 PMD vs. GMM

    This set of experiments borrows several elements from the simulation study in Lindé (2005). In

    that paper, the objective was to compare the small sample properties of GMM vs. FIML estimation

    of the New Keynesian hybrid Phillips curve (such as expression (3)). Here we simulate data from

    a slightly modified version of the New Keynesian model discussed in section 2, equations (3)-(5)

    and compare GMM to PMD instead. Specifically, data will be generated from the model

    π_t = γ_f E_t π_{t+1} + γ_b^1 π_{t−1} + γ_b^2 π_{t−2} + γ_g g_t + ε_{π,t}
    g_t = β_f E_t g_{t+1} + β_b^1 g_{t−1} + β_b^2 g_{t−2} − β_r (R_t − E_t π_{t+1}) + ε_{g,t}
    R_t = (1 − ρ)(ω_π π_t + ω_g g_t) + ρ R_{t−1} + ε_{R,t}

with shock processes

    ε_{π,t} = u_{π,t}
    ε_{g,t} = ρ_g ε_{g,t−1} + u_{g,t}
    ε_{R,t} = ρ_R ε_{R,t−1} + u_{R,t}

    u_t ∼ N( 0, diag(0.5², 0.288², 0.252²) )

for different combinations of parameters to be made more explicit shortly. First, however, notice

we modified the Phillips and IS curves slightly to include one more lag than is conventional.

    We provide no theoretical justification for this but use this device as a way to generate small

    distortions to the canonical specification and check the robustness of PMD and GMM to dynamic

misspecification. Hence, some of the simulations are conducted with γ_f + γ_b^1 = 1 and γ_b^2 = 0 (and similarly for the IS curve parameters), which is the standard specification. In other experiments, we simply set γ_f + γ_b^1 + γ_b^2 = 1 and γ_b^1 = γ_b^2 (and similarly for the IS curve parameters) to induce additional serial correlation.

Most of the parameter choices are borrowed from Lindé (2005) and we refer the reader there for a more careful justification of these choices. We investigate three primary combinations of parameters:

1. γ_f = β_f = 0.7; γ_b^1 = β_b^1 = 0.3 or γ_b^1 = γ_b^2 = β_b^1 = β_b^2 = 0.15; γ_g = 0.13; and β_r = 0.09

2. γ_f = β_f = 0.5; γ_b^1 = β_b^1 = 0.5 or γ_b^1 = γ_b^2 = β_b^1 = β_b^2 = 0.25; γ_g = 0.25; and β_r = 0.30

3. γ_f = β_f = 0.3; γ_b^1 = β_b^1 = 0.7 or γ_b^1 = γ_b^2 = β_b^1 = β_b^2 = 0.35; γ_g = 0.40; and β_r = 1

The Taylor rule parameters are the same in all cases, with ρ = 0.5, ω_π = 1.5, and ω_g = 0.5, and the shock processes are allowed to take the two pairs of values ρ_g = 0.5 and ρ_R = 0.8, or ρ_g = ρ_R = 0. The latter case is included as a benchmark since then a standard New Keynesian hybrid Phillips curve specification estimated by GMM using as instruments lagged values of the endogenous variables (including lags of R_t) is correct and should provide estimates

    of the parameters close to the theoretical values. Like Lindé (2005), we experimented by allowing

    Rt to be part of the instrument set originally. However, we found the distortions to the GMM

    estimates to be so considerable with respect to the PMD estimates that we decided to include, for

    completeness, estimates that only use lagged values of Rt.

    1,000 Monte Carlo runs are generated with the different combinations of parameters described

    above, in all 36 different cases summarized in tables 2.1.1-2.3.2. Each run is initialized with 500

    burn-in replications with which a sample of 200 observations (as in Lindé, 2005) is then generated.

    Because we argued in section 3 that dynamic misspecification can be best detected when there is

    variation in the parameter estimates as a function of the choice of impulse response horizon H

    selected, we report estimates based on 2 impulse response horizons and based on impulse response

    horizons optimally selected with Hall et al.’s (2007) information criterion. At the same time, we

    compute GMM estimates based on the same lags for comparison purposes. The lag length of the

    first stage local projections is automatically selected by AICC as in our previous experiments.

    It would be very tedious to comment on each of the numerous cases investigated but some

    general lessons are apparent. First, when the shocks are i.i.d. (so that there are no distortions

    to the internal dynamics of the model) and we examine the case we label “Benchmark” (with

    the traditional dynamic specification), both PMD and GMM provide good estimates although

    estimates of the output gap parameter of the Phillips curve tend to be somewhat downward

biased with GMM but to a much lesser extent with PMD. In virtually all cases, PMD estimates

are more efficient than their GMM counterparts, as we had anticipated. Generally speaking, using R_t as an instrument turns out to be a very bad idea for GMM estimation: it is clearly an invalid instrument, as Tables 2.1.2, 2.2.2, and 2.3.2 make clear. This is less of a problem for PMD because all instruments are essentially orthogonalized with respect to past information, which solves this problem to a great extent. When R_t is excluded from the instrument list, GMM performs much better and we concentrate on these tables next (tables 2.1.1, 2.2.1, and 2.3.1).

Whether the dynamic structure is modified by allowing serial correlation in the shocks, richer dynamics in the Phillips curve, or richer dynamics in the IS curve, both PMD and GMM have more difficulty in obtaining accurate estimates of the parameters, especially the output gap parameter. For example, in table 2.1.1 the additional serial correlation in the structural inflation Euler equation is enough to cause estimates of the output parameter to flip sign, although more generally we simply observed estimates that were downward biased. Distortions to the degree of forward/backward looking behavior of the Phillips curve were, on the other hand, much more muted, although the distortions obtained with GMM tend to be considerably larger than with PMD. In these cases, the parameter estimates changed quite a bit with the number of instruments included, an indication of specification problems along the lines anticipated in our discussion of section 3.

    Overall, while PMD was not a universal panacea for every foreseeable type of misspecification,

    we obtained estimates that had a smaller bias than GMM in the majority of the cases. When

    the model was correctly specified, there was little difference between the methods but even here

    PMD was less biased and provided more efficient estimates. The introduction of relatively small

    distortions in the dynamic behavior of the model was enough to generate considerable distortions

    in the estimation of the output gap parameter, which plays a very prominent role in this literature.

In almost every case considered, the distortion caused the parameter to be downward biased. PMD

mitigates this bias somewhat with respect to GMM but not to the extent that would have been

    desirable.

6 Empirical Application: Fuhrer and Olivei (2005) Revisited

    Estimating the Phillips and IS curves in expressions (3) and (4) by limited-information methods is

    difficult due to the poor small-sample properties of popular estimators. Fuhrer and Olivei (2005)

    discuss the weak instrument problem that characterizes GMM in this type of application and

    then propose a GMM variant where the dynamic constraints of the economic model are imposed

    on the instruments to improve small sample performance. They dub this procedure “optimal

    instruments” GMM (OI−GMM) and explore its properties relative to conventional GMM and

    MLE estimators with Monte Carlo experiments.

    We find it is useful to apply PMD to the same examples Fuhrer and Olivei (2005) analyze to

    provide the reader a context of comparison for our method. The basic specification is (using the

    same notation as in Fuhrer and Olivei, 2005):

    zt = (1− μ) zt−1 + μEtzt+1 + γEtxt + εt (18)

    In the output Euler equation, zt is a measure of the output gap, xt is a measure of the real interest

    rate, and hence, γ < 0. In the inflation Euler version of (18), zt is a measure of inflation, xt is

    a measure of the output gap, and γ > 0 signifying that a positive output gap exerts “demand

    pressure” on inflation.

    Fuhrer and Olivei (2005) experiment with a quarterly sample from 1966:Q1 to 2001:Q4 and use

the following measures for z_t and x_t. The output gap is measured either by the log deviation of real GDP from its Hodrick-Prescott (HP) trend or by the deviation from a segmented time trend (ST) with breaks in 1974 and 1995. Real interest rates are measured by the difference between the federal funds rate

and next period's inflation. Inflation is measured by the log change in the GDP chain-weighted

    price index. In addition, Fuhrer and Olivei (2005) experiment with real unit labor costs (RULC)

    instead of the output gap for the inflation Euler equation. Further details can be found in their

    paper.

    Table 3.1 and figure 1 summarize the empirical estimates of the output Euler equation and

    correspond to the results in table 4 in Fuhrer and Olivei (2005), whereas table 3.2 and figure 2

    summarize the estimates of the inflation Euler equation and correspond to the results in Table 5

    instead. For each Euler equation, we report the original GMM,MLE, andOI−GMM estimates and

    below these, we include the PMD results based on choosing h with Hall et al.’s (2007) information

    criterion. The top panels of figures 1 and 2 display the estimates of μ and γ in (18) as a function

    of h and the associated two-standard error bands. The bottom left panel displays the value of Hall

    et al.’s (2007) information criterion and the bottom right panel, the p-value of the overidentifying

    restrictions misspecification test.

    Since the true model is unknowable, there is no definitive metric by which one method can be

    judged to offer closer estimates to the true parameter values. Rather, we wish to investigate in

    which ways PMD coincides or departs from results that have been well studied in the literature.

    We begin by reviewing the estimates for the output Euler equation reported in table 3.1 and figure

1. PMD estimates of μ are close to GMM estimates, with similar standard errors, and not very different from MLE or OI-GMM. On the other hand, PMD estimates for γ are slightly larger in magnitude, of the correct sign, and statistically significant. This would seem like good news; however, as figure 1 shows, while the estimates of μ appear to be somewhat stable to the choice of h, the estimates of γ are positive for any h < 7. This suggests that estimates of γ should be

    taken with caution as the model is likely dynamically misspecified (although the misspecification

    test does not suggest anything evident).

    Estimates of the inflation Euler equation follow a similar pattern. For all three specifications,

    μ and γ are estimated to be similar to the GMM estimates but in all three specifications, the

misspecification test rejects the model very clearly. Figure 2 shows that while estimates of μ are

relatively stable, estimates of γ for the HP and ST specifications are negative for virtually any h. The RULC specification suggests γ is mostly positive (with γ negative only for h = 3 and 4).

Overall, the results suggest caution since every indication (from the overidentifying restrictions tests to the plots of the parameter estimates as a function of h) is that the model is dynamically misspecified.

    With the exception of the inflation Euler model estimated with RULC, we find that the data

    reject most of the specifications commonly estimated (either outright, as indicated by the overi-

    dentifying restrictions test, or because of the variation of the parameter estimates as a function

of h). The ability to check model specification by these two complementary methods is useful (especially in instances when the data do not reject the model but variation in parameter estimates

    for low values of h is substantial). With some notable exceptions, PMD estimates are often close

    to estimates obtained by other methods but with smaller standard errors so that at a minimum,

    we are able to ascertain that our results are not caused by extreme differences.

    7 Conclusion

    This paper introduces a disarmingly simple and novel method of estimation for macroeconomic

    data. Several features make it appealing: (1) for many models, including some whose likelihood

    would require numerical optimization routines, PMD only requires simple least-squares algebra;

    (2) for many models, PMD approximates the maximum likelihood estimator in relatively small

    samples; (3) however, PMD is efficient in finite samples because it accounts for serial correlation

    in a convenient parametric way; (4) as a consequence, PMD is generally more efficient than GMM;

    (5) PMD provides an unsupervised method of conditioning for unknown omitted dynamics that

in many cases mitigates invalid instrument problems; and (6) PMD provides many natural statistics with which to evaluate estimates of a model, including an overall misspecification test and a way

to assess which parameter estimates are most sensitive to misspecification.

    The paper provides basic but generally applicable asymptotic results and ample Monte Carlo

    evidence in support of our claims. In addition, the empirical application provides a natural

    example of how PMD may be applied in practice. However, there are many research questions

    that space considerations prevented us from exploring. Throughout the paper, we have mentioned

    some of them, such as the need for a more detailed investigation of the power properties of the

    misspecification test in light of the GMM literature; and generalizations of our basic assumptions

    in the main theorems.

    Other natural extensions include nonlinear generalizations of the local projection step to ex-

    tend beyond the Wold assumption. Such generalizations are likely to be very approachable because

    local projections lend themselves well to more complex specifications. Similarly, we have excluded

processes that are not covariance-stationary, mainly because they require slightly different assumptions on their infinite representation and because the non-standard nature of the asymptotics is beyond the scope of this paper. In the end, we hope that the main contribution of the paper is

    to provide applied researchers with a new method of estimation that is simpler than many others

    available, while at the same time more robust and informative.

    8 Appendix

    8.1 Definitions and Notation

    We find it useful to begin by defining and collecting the notation that we use for the proofs of the

    propositions and lemmas introduced above. Specifically:

(i) X_{t,k−1} (kn × 1) = (y_t′, y_{t−1}′, ..., y_{t−k+1}′)′

(ii) Y_{t,H} (Hn × 1) = (y_{t+1}′, ..., y_{t+H}′)′

(iii) M_{t−1,k} (1 × 1) = 1 − Σ_{t=k}^{T−H} X_{t−1,k}′ ( Σ_{t=k}^{T−H} X_{t−1,k} X_{t−1,k}′ )^{−1} X_{t−1,k}

(iv) Γ̂(j) (n × n) = (T − k − H)^{−1} Σ_{t=k}^{T−H} y_t y_{t−j}′

(v) Γ̂(j|1−k) (n × n) = (T − k − H)^{−1} Σ_{t=k}^{T−H} y_t M_{t−1,k} y_{t−j}′

(vi) Γ̂_k (kn × kn) = (T − k − H)^{−1} Σ_{t=k}^{T−H} X_{t,k} X_{t,k}′

(vii) Γ̂_{1−k,h} (kn × n) = (T − k − H)^{−1} Σ_{t=k}^{T−H} X_{t,k} y_{t+h}′;  h = 1, ..., H

(viii) Γ̂_{1−H|1−k} (Hn × n) = (T − k − H)^{−1} Σ_{t=h}^{T−H} Y_{t,H} M_{t−1,k} y_t′

8.2 Proof of Proposition 1

The mean-square error linear predictor of y_{t+h} based on y_t, ..., y_{t−k+1} is Â(k, h) X_{t,k−1}, where Â(k, h) is given by the least-squares formula

    Â(k, h) (n × kn) = (Â_1^h, ..., Â_k^h) = Γ̂_{1−k,h}′ Γ̂_k^{−1}    (19)

Notice that

    Â(k, h) − A(k, h) = Γ̂_{1−k,h}′ Γ̂_k^{−1} − A(k, h) Γ̂_k Γ̂_k^{−1}
                      = { (T − k − h)^{−1} Σ_{t=k}^{T−h} v_{k,t+h} X_{t,k}′ } Γ̂_k^{−1}

where

    v_{k,t+h} = Σ_{j=k+1}^{∞} A_j^h y_{t−j} + u_{t+h} + Σ_{j=1}^{h−1} B_j u_{t+h−j}

Hence,

    Â(k, h) − A(k, h) = { (T − k − h)^{−1} Σ_{t=k}^{T−h} ( Σ_{j=k+1}^{∞} A_j^h y_{t−j} ) X_{t,k}′ } Γ̂_k^{−1}
                      + { (T − k − h)^{−1} Σ_{t=k}^{T−h} u_{t+h} X_{t,k}′ } Γ̂_k^{−1}
                      + { (T − k − h)^{−1} Σ_{t=k}^{T−h} ( Σ_{j=1}^{h−1} B_j u_{t+h−j} ) X_{t,k}′ } Γ̂_k^{−1}

Define the matrix norm ||C||₁² = sup_{l≠0} (l′C′Cl)/(l′l), that is, the largest eigenvalue of C′C. When C is symmetric, this is the square of the largest eigenvalue of C. Then

    ||AB||² ≤ ||A||₁² ||B||²  and  ||AB||² ≤ ||A||² ||B||₁²
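As a quick numerical sanity check of these norm inequalities (spectral norm ‖·‖₁, Frobenius norm ‖·‖), on arbitrary random matrices:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(4, 4))
B = rng.normal(size=(4, 3))

def fro(C):
    """Frobenius norm ||C|| = sqrt(tr(C'C))."""
    return np.sqrt(np.trace(C.T @ C))

def spec(C):
    """Spectral norm ||C||_1 = sqrt(largest eigenvalue of C'C)."""
    return np.sqrt(np.linalg.eigvalsh(C.T @ C).max())

# ||AB|| <= ||A||_1 ||B||  and  ||AB|| <= ||A|| ||B||_1
assert fro(A @ B) <= spec(A) * fro(B) + 1e-12
assert fro(A @ B) <= fro(A) * spec(B) + 1e-12
```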

Hence

    ||Â(k, h) − A(k, h)|| ≤ ||U_{1T}|| ||Γ̂_k^{−1}||₁ + ||U_{2T}|| ||Γ̂_k^{−1}||₁ + ||U_{3T}|| ||Γ̂_k^{−1}||₁

where

    U_{1T} = (T − k − h)^{−1} Σ_{t=k}^{T−h} ( Σ_{j=k+1}^{∞} A_j^h y_{t−j} ) X_{t,k}′

    U_{2T} = (T − k − h)^{−1} Σ_{t=k}^{T−h} u_{t+h} X_{t,k}′

    U_{3T} = (T − k − h)^{−1} Σ_{t=k}^{T−h} ( Σ_{j=1}^{h−1} B_j u_{t+h−j} ) X_{t,k}′

Lewis and Reinsel (1985) show that ||Γ̂_k^{−1}||₁ is bounded; therefore, the next objective is to show that ||U_{1T}|| →p 0, ||U_{2T}|| →p 0, and ||U_{3T}|| →p 0. We begin by showing ||U_{2T}|| →p 0, which is the easiest to see since u_{t+h} and X_{t,k}′ are independent, so that their covariance is zero. Formally, and following similar derivations in Lewis and Reinsel (1985),

    E(||U_{2T}||²) = (T − k − h)^{−2} Σ_{t=k}^{T−h} E(u_{t+h} u_{t+h}′) E(X_{t,k}′ X_{t,k})

by independence. Hence

    E(||U_{2T}||²) = (T − k − h)^{−1} tr(Σ_u) k tr[Γ(0)]

Since k/(T − k − H) → 0 by assumption (ii), E(||U_{2T}||²) → 0, and hence ||U_{2T}|| →p 0.

Next, consider ||U_{3T}|| →p 0. The proof is very similar since u_{t+h−j}, j = 1, ..., h − 1, and X_{t,k}′ are independent. As long as ||B_j|| < ∞ for each j (which follows from Σ_{j=0}^{∞} ||B_j|| < ∞), the same argument delivers the result.

Finally, we show that ||U_{1T}|| →p 0. The objective here is to show that assumption (iii) implies that

    k^{1/2} Σ_{j=k+1}^{∞} ||A_j^h|| → 0 as k, T → ∞

because we will need this condition to hold to complete the proof later. Recall that

    A_j^h = B_{h−1} A_j + A_{j+1}^{h−1};   A_{j+1}^0 = 0;   B_0 = I_r;   h, j ≥ 1, h finite

Hence

    k^{1/2} Σ_{j=k+1}^{∞} ||A_j^h|| = k^{1/2} Σ_{j=k+1}^{∞} ||B_{h−1} A_j + B_{h−2} A_{j+1} + ... + B_1 A_{j+h−2} + A_{j+h−1}||

by recursive substitution. Thus

    k^{1/2} Σ_{j=k+1}^{∞} ||A_j^h|| ≤ k^{1/2} Σ_{j=k+1}^{∞} { ||B_{h−1} A_j|| + ... + ||B_1 A_{j+h−2}|| + ||A_{j+h−1}|| }

Define λ as max{||B_{h−1}||, ..., ||B_1||}. Then, since Σ_{j=0}^{∞} ||B_j|| < ∞, the right-hand side is bounded (up to index shifts) by (1 + λ(h − 1)) k^{1/2} Σ_{j=k+1}^{∞} ||A_j||, which vanishes under assumptions (ii) and (iii), completing the argument.

8.3 Proof of Proposition 2

Notice that

    Â(k, h) − A(k, h) = { (T − k − h)^{−1} Σ_{t=k}^{T−h} v_{k,t+h} X_{t,k}′ } Γ̂_k^{−1}

      = (T − k − h)^{−1} [ Σ_{t=k}^{T−h} { Σ_{j=k+1}^{∞} A_j^h y_{t−j} + u_{t+h} + Σ_{j=1}^{h−1} B_j u_{t+h−j} } X_{t,k}′ ] Γ̂_k^{−1}

      = (T − k − h)^{−1} { Σ_{t=k}^{T−h} ( Σ_{j=k+1}^{∞} A_j^h y_{t−j} ) X_{t,k}′ } { Γ_k^{−1} + (Γ̂_k^{−1} − Γ_k^{−1}) }
      + (T − k − h)^{−1} { Σ_{t=k}^{T−h} ( u_{t+h} + Σ_{j=1}^{h−1} B_j u_{t+h−j} ) X_{t,k}′ } { Γ_k^{−1} + (Γ̂_k^{−1} − Γ_k^{−1}) }

Hence, the strategy of the proof consists in showing that the first term in the sum above vanishes in probability, so that

    (T − k − h)^{1/2} vec[ Â(k, h) − A(k, h) ] →p
    (T − k − h)^{1/2} vec[ (T − k − h)^{−1} { Σ_{t=k}^{T−h} ( u_{t+h} + Σ_{j=1}^{h−1} B_j u_{t+h−j} ) X_{t,k}′ } Γ_k^{−1} ]

and then all we need to do is show that this last term is asymptotically normal. First we prove the convergence in probability result in this last expression. Define

    U_{1T} = (T − k − h)^{−1} Σ_{t=k}^{T−h} ( Σ_{j=k+1}^{∞} A_j^h y_{t−j} ) X_{t,k}′

    U*_{2T} = (T − k − h)^{−1} Σ_{t=k}^{T−h} ( u_{t+h} + Σ_{j=1}^{h−1} B_j u_{t+h−j} ) X_{t,k}′

then

    (T − k − h)^{1/2} vec[ Â(k, h) − A(k, h) ] =
    (T − k − h)^{1/2} { vec[U_{1T} Γ_k^{−1}] + vec[U_{1T} (Γ̂_k^{−1} − Γ_k^{−1})]
                      + vec[U*_{2T} Γ_k^{−1}] + vec[U*_{2T} (Γ̂_k^{−1} − Γ_k^{−1})] }

  • hence

    (T − k − h)1/2 vech bA(k, h)−A(k, h)i− (T − k − h)1/2 vec £U∗2TΓ−1k ¤ =

    (T − k − h)1/2

    ⎧⎪⎪⎨⎪⎪⎩vec

    £U1TΓ

    −1k

    ¤+ vec

    hU1T

    ³bΓ−1k − Γ−1k ´i+vec

    hU∗2T

    ³bΓ−1k − Γ−1k ´i⎫⎪⎪⎬⎪⎪⎭ =

    ¡Γ−1k ⊗ Ir

    ¢vec

    h(T − k − h)1/2 U1T

    i+n³bΓ−1k − Γ−1k ´⊗ Iro vec h(T − k − h)1/2 U1T i+n³bΓ−1k − Γ−1k ´⊗ Iro vec h(T − k − h)1/2 U∗2T i

    Define, with a slight change in the order of the summands,

    W1T =n³bΓ−1k − Γ−1k ´⊗ Iro vec h(T − k − h)1/2 U1T i

    W2T =n³bΓ−1k − Γ−1k ´⊗ Iro vec h(T − k − h)1/2 U∗2T i

    W3T =¡Γ−1k ⊗ Ir

    ¢vec

    h(T − k − h)1/2 U1T

    iThe proof proceeds by showing that W1T

    p→ 0, W2Tp→ 0, W3T

    p→ 0.

    We begin by showing that W1Tp→ 0. Lewis and Reinsel (1985) show that under assumption (ii),

    k1/2°°°bΓ−1k − Γ−1k °°°

    1

    p→ 0 and E³°°°k−1/2 (T − k − h)1/2 U1T°°°´ ≤ s (T − k − h)1/2P∞j=k+1 °°Ahj °° p→

    0; k, T → ∞ from assumption (iii) and using similar derivations as in the proof of consistency

    with s being a generic constant. Hence W1Tp→ 0.

Next, we show \(W_{2T}\xrightarrow{p}0\). Notice that
\[
|W_{2T}| \le k^{1/2}\big\|\widehat{\Gamma}_k^{-1}-\Gamma_k^{-1}\big\|_1\,\big\|k^{-1/2}(T-k-h)^{1/2}U_{2T}^{*}\big\|.
\]
As in the previous step, Lewis and Reinsel (1985) establish that \(k^{1/2}\|\widehat{\Gamma}_k^{-1}-\Gamma_k^{-1}\|_1\xrightarrow{p}0\), and from the proof of consistency we know the second term is bounded in probability, which is all we need to establish the result.
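The convergence of \(\widehat{\Gamma}_k^{-1}\) to \(\Gamma_k^{-1}\) that drives both of the preceding steps is easy to illustrate numerically. The sketch below is our own illustration, not part of the formal argument: the AR(1) design, the lag length, and the use of the 1-norm are assumptions made for concreteness. It forms the sample second-moment matrix of \(X_{t,k}\) and compares its inverse with the population Toeplitz matrix \(\Gamma_k\):

```python
import numpy as np

rng = np.random.default_rng(2)
rho, k = 0.5, 4
sy2 = 1.0 / (1.0 - rho ** 2)                 # stationary variance of the AR(1)
idx = np.arange(k)
Gamma_k = sy2 * rho ** np.abs(np.subtract.outer(idx, idx))  # population Gamma_k (Toeplitz)

def gamma_hat_inv(T, burn=500):
    """Inverse of the sample analogue of Gamma_k from T observations of an AR(1)."""
    e = rng.normal(size=T + burn)
    y = np.zeros(T + burn)
    for t in range(1, T + burn):
        y[t] = rho * y[t - 1] + e[t]
    y = y[burn:]
    # rows are X_{t,k}' = (y_t, ..., y_{t-k+1}) for t = k-1, ..., T-1
    X = np.column_stack([y[k - 1 - j: T - j] for j in range(k)])
    return np.linalg.inv(X.T @ X / X.shape[0])

# the discrepancy shrinks with T (at rate T^{-1/2} in expectation)
errs = {T: np.linalg.norm(gamma_hat_inv(T) - np.linalg.inv(Gamma_k), 1)
        for T in (200, 20_000)}
```
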

Lastly, we need to show \(W_{3T}\xrightarrow{p}0\); the proof of this result is identical to that in Lewis and Reinsel (1985) once one realizes that assumption (iii) implies
\[
(T-k-h)^{1/2}\sum_{j=k+1}^{\infty}\big\|A_j^h\big\| \to 0,
\]
and substitutes this result into their proof.

The asymptotic normality result then follows directly from Lewis and Reinsel (1985) by redefining
\[
A_{Tm} = (T-k-h)^{1/2}\,\mathrm{vec}\Bigg[\Bigg\{(T-k-h)^{-1}\sum_{t=k}^{T-h}\Bigg(u_{t+h}+\sum_{j=1}^{h-1}B_j u_{t+h-j}\Bigg)X_{t,k}(m)'\Bigg\}\Gamma_k^{-1}\Bigg]
\]
for \(m = 1, 2, \ldots\), with \(X_{t,k}(m)\) as defined in Lewis and Reinsel (1985), and using their proof.
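For intuition, the projection coefficients \(\widehat{A}(k,h)\) studied above are simply least-squares coefficients from a regression of \(y_{t+h}\) on \(X_{t,k}\). The following minimal univariate sketch is our own illustration, not the authors' code; the AR(1) data-generating process and the sample size are assumptions chosen so the answer is known in closed form (for an AR(1) with coefficient \(\rho\), \(A(k,h) = (\rho^h, 0, \ldots, 0)\)):

```python
import numpy as np

def a_hat(y, k, h):
    """Least-squares estimate of A(k, h): regress y_{t+h} on X_{t,k} = (y_t, ..., y_{t-k+1})."""
    T = len(y)
    X = np.column_stack([y[k - 1 - j: T - h - j] for j in range(k)])  # lags 0, ..., k-1
    Y = y[k - 1 + h: T]                                               # h-step-ahead values
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coef

rng = np.random.default_rng(0)
rho, T = 0.5, 50_000
y = np.zeros(T)
for t in range(1, T):
    y[t] = rho * y[t - 1] + rng.normal()

b = a_hat(y, k=2, h=3)   # should be close to (rho**3, 0) = (0.125, 0)
```
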

8.4 Proof of Lemma 3

Since \(\widehat{b}_T \xrightarrow{p} b_0\), then
\[
f\big(\widehat{b}_T;\phi\big) \xrightarrow{p} f(b_0;\phi)
\]
by the continuous mapping theorem, since by assumption (iv) \(f(\cdot)\) is continuous. Furthermore, given assumption (i),
\[
\widehat{Q}_T(\phi) = f\big(\widehat{b}_T;\phi\big)'\widehat{W}f\big(\widehat{b}_T;\phi\big) \xrightarrow{p} f(b_0;\phi)'Wf(b_0;\phi) \equiv Q_0(\phi),
\]
which is a quadratic expression that is minimized at \(\phi_0\). Assumption (vi) provides a necessary condition for identification of the parameters (i.e., that there be at least as many matching conditions as parameters) that must be satisfied to establish uniqueness. As a quadratic function, \(Q_0(\phi)\) is continuous. The last thing to show is that
\[
\sup_{\phi\in\Theta}\big|\widehat{Q}_T(\phi) - Q_0(\phi)\big| \xrightarrow{p} 0.
\]
For compact \(\Theta\) and continuous \(Q_0(\phi)\), Lemma 2.8 in Newey and McFadden (1994) provides that this condition holds if and only if \(\widehat{Q}_T(\phi)\xrightarrow{p}Q_0(\phi)\) for all \(\phi\) in \(\Theta\) and \(\widehat{Q}_T(\phi)\) is stochastically

equicontinuous. The former has already been established, so it remains to show stochastic equicontinuity of \(\widehat{Q}_T(\phi)\).\(^1\) Whether \(\widehat{Q}_T(\phi)\) is stochastically equicontinuous depends on each application and, specifically, on the properties and assumptions made on the specific nature of \(f(\cdot)\). In the example used in this paper and presented in section 2, the function \(f(\cdot)\) is linear in the parameters, so that proving uniform convergence is straightforward and stochastic equicontinuity is not required (since we have assumed that \(H\) is finite and not a function of \(T\)). In general, we directly assume here that stochastic equicontinuity holds and refer the reader to Andrews (1994, 1995) for examples and sets of specific conditions that apply even when \(b\) is infinite dimensional and for more general forms of \(f(\cdot)\).
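In the linear case just described, the logic of the lemma can be made concrete: with \(f(b;\phi) = b - M\phi\), the minimizer of \(\widehat{Q}_T\) has a closed form, and consistency of the first-stage estimate immediately delivers consistency of the second-stage estimate. The sketch below is a stylized numeric illustration of our own; the matrix \(M\), the dimensions, and the noise scale are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(1)
M = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])          # hypothetical map from 2 parameters to 3 Wold coefficients
phi0 = np.array([0.3, 0.7])
b0 = M @ phi0                       # probability limit of the first-stage estimate
W = np.eye(3)                       # weighting matrix

def md_argmin(b_hat):
    """Closed-form minimizer of (b_hat - M phi)' W (b_hat - M phi)."""
    return np.linalg.solve(M.T @ W @ M, M.T @ W @ b_hat)

# at the probability limit b0, the objective is minimized exactly at phi0
assert np.allclose(md_argmin(b0), phi0)

# with a root-T-consistent first stage, the estimate inherits consistency
err = {T: np.linalg.norm(md_argmin(b0 + rng.normal(scale=T ** -0.5, size=3)) - phi0)
       for T in (100, 10_000)}
```
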

8.5 Proof of Lemma 4

Under assumption (iii), \(b_0\) and \(\gamma_0\) are in the interior of their parameter spaces, and by assumption (ii) \(\widehat{b}_T\xrightarrow{p}b_0\) and \(\widehat{\gamma}_T\xrightarrow{p}\gamma_0\). Further, by assumption (iv), \(f(\widehat{b}_T;\gamma)\) is continuously differentiable in a neighborhood of \(b_0\) and \(\gamma_0\), and hence \(\widehat{\gamma}_T\) solves the first-order conditions of the minimum-distance problem
\[
\min_{\gamma}\, f(\widehat{b}_T;\gamma)'\widehat{W}f(\widehat{b}_T;\gamma),
\]
which are
\[
F_\gamma\big(\widehat{b}_T;\gamma\big)'\widehat{W}f\big(\widehat{b}_T;\gamma\big) = 0.
\]
By assumption (iv), these first-order conditions can be expanded about \(\gamma_0\) in a mean value expansion,
\[
f(\widehat{b}_T;\widehat{\gamma}_T) = f(\widehat{b}_T;\gamma_0) + F_\gamma\big(\widehat{b}_T;\bar{\gamma}\big)(\widehat{\gamma}_T-\gamma_0),
\]
where \(\bar{\gamma}\in[\widehat{\gamma}_T,\gamma_0]\). Similarly, a mean value expansion of \(f(\widehat{b}_T;\gamma_0)\) around \(b_0\) is
\[
f(\widehat{b}_T;\gamma_0) = f(b_0;\gamma_0) + F_b\big(\bar{b};\gamma_0\big)\big(\widehat{b}_T - b_0\big).
\]

\(^1\) Stochastic equicontinuity: for every \(\epsilon, \eta > 0\) there exist a sequence of random variables \(\widehat{\Delta}_T\) and a sample size \(T_0\) such that for \(T \ge T_0\), \(\mathrm{Prob}(|\widehat{\Delta}_T| > \epsilon) < \eta\), and for each \(\phi\) there is an open set \(N\) containing \(\phi\) with \(\sup_{\tilde{\phi}\in N}\big|\widehat{Q}_T(\tilde{\phi}) - \widehat{Q}_T(\phi)\big| \le \widehat{\Delta}_T\) for \(T \ge T_0\).


Combining both mean value expansions and multiplying by \(\sqrt{T}\), we have
\[
\sqrt{T}f(\widehat{b}_T;\widehat{\gamma}_T) = \sqrt{T}f(b_0;\gamma_0) + F_\gamma\big(\widehat{b}_T;\bar{\gamma}\big)\sqrt{T}(\widehat{\gamma}_T-\gamma_0) + F_b\big(\bar{b};\gamma_0\big)\sqrt{T}\big(\widehat{b}_T-b_0\big).
\]
Since \(\bar{b}\in[\widehat{b}_T,b_0]\), \(\bar{\gamma}\in[\widehat{\gamma}_T,\gamma_0]\), and \(\widehat{b}_T\xrightarrow{p}b_0\), \(\widehat{\gamma}_T\xrightarrow{p}\gamma_0\), then, along with assumption (iv), we have
\[
F_\gamma\big(\widehat{b}_T;\bar{\gamma}\big)\xrightarrow{p}F_\gamma(b_0;\gamma_0) = F_\gamma, \qquad
F_b\big(\bar{b};\gamma_0\big)\xrightarrow{p}F_b(b_0;\gamma_0) = F_b,
\]
and hence
\[
\sqrt{T}f(\widehat{b}_T;\widehat{\gamma}_T) = \sqrt{T}f(b_0;\gamma_0) + F_\gamma\sqrt{T}(\widehat{\gamma}_T-\gamma_0) + F_b\sqrt{T}\big(\widehat{b}_T-b_0\big) + o_p(1).
\]
In addition, by assumption (i) \(\widehat{W}\xrightarrow{p}W\), and notice that \(f(b_0;\gamma_0)=0\), which, combined with the first-order conditions and the mean value expansions above, allows us to write
\[
-F_\gamma'W\Big[F_\gamma\sqrt{T}(\widehat{\gamma}_T-\gamma_0) + F_b\sqrt{T}\big(\widehat{b}_T-b_0\big)\Big] = o_p(1).
\]

Since we know that
\[
\sqrt{T}\big(\widehat{b}_T-b_0\big)\xrightarrow{d}N(0,\Omega_b)
\]
by proposition 2, then
\[
\sqrt{T}(\widehat{\gamma}_T-\gamma_0) = -\big(F_\gamma'WF_\gamma\big)^{-1}\big(F_\gamma'WF_b\big)\sqrt{T}\big(\widehat{b}_T-b_0\big) + o_p(1)
\]
by assumption (vii), which ensures that \(F_\gamma'WF_\gamma\) is invertible. Therefore, from the previous expression we arrive at
\[
\sqrt{T}(\widehat{\gamma}_T-\gamma_0)\xrightarrow{d}N(0,\Omega_\gamma), \qquad
\Omega_\gamma = \big(F_\gamma'WF_\gamma\big)^{-1}\big(F_\gamma'WF_b\,\Omega_b\,F_b'WF_\gamma\big)\big(F_\gamma'WF_\gamma\big)^{-1}.
\]
Notice that since we are using the optimal weighting matrix, \(W=(F_b\Omega_b F_b')^{-1}\), the previous expression simplifies considerably to

\[
\Omega_\gamma = \big(F_\gamma'WF_\gamma\big)^{-1}, \qquad W = \big(F_b\Omega_b F_b'\big)^{-1}.
\]
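The simplification at the end of the proof is easy to verify numerically: plugging \(W = (F_b\Omega_b F_b')^{-1}\) into the sandwich form collapses it to \((F_\gamma' W F_\gamma)^{-1}\), and any other weighting matrix is weakly less efficient. The sketch below is our own illustration; the dimensions and the randomly generated matrices are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
q, p = 5, 2                         # q matching conditions, p structural parameters
Fg = rng.normal(size=(q, p))        # F_gamma (full column rank with probability one)
Fb = rng.normal(size=(q, q))        # F_b (square here, so Fb @ Ob @ Fb.T is invertible)
A = rng.normal(size=(q, q))
Ob = A @ A.T + np.eye(q)            # Omega_b, symmetric positive definite

def omega_gamma(W):
    """Sandwich variance (Fg'WFg)^{-1} (Fg'W Fb Ob Fb' W Fg) (Fg'WFg)^{-1}."""
    bread = np.linalg.inv(Fg.T @ W @ Fg)
    meat = Fg.T @ W @ Fb @ Ob @ Fb.T @ W @ Fg
    return bread @ meat @ bread

W_opt = np.linalg.inv(Fb @ Ob @ Fb.T)

# optimal weighting collapses the sandwich
assert np.allclose(omega_gamma(W_opt), np.linalg.inv(Fg.T @ W_opt @ Fg))

# and dominates, e.g., identity weighting: the difference is positive semi-definite
gap = omega_gamma(np.eye(q)) - omega_gamma(W_opt)
assert np.linalg.eigvalsh(gap).min() > -1e-8
```
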

References

An, Sungbae and Frank Schorfheide (2007) "Bayesian Analysis of DSGE Models," Econometric Reviews, 26(2-4): 113-172.

Andrews, Donald W. K. (1994) "Asymptotics for Semiparametric Econometric Models via Stochastic Equicontinuity," Econometrica, 62(1): 43-72.

Andrews, Donald W. K. (1995) "Non-parametric Kernel Estimation for Semiparametric Models," Econometric Theory, 11(3): 560-596.

Bekker, Paul A. (1994) "Alternative Approximations to the Distribution of Instrumental Variable Estimators," Econometrica, 62: 657-681.

Calvo, Guillermo A. (1983) "Staggered Prices in a Utility Maximizing Framework," Journal of Monetary Economics, 12: 383-398.

Cameron, A. Colin and Pravin K. Trivedi (2005) Microeconometrics: Methods and Applications. Cambridge: Cambridge University Press.

Canova, Fabio (2007) Methods for Applied Macroeconomic Research. Princeton: Princeton University Press.

Christiano, Lawrence J. and Wouter den Haan (1996) "Small-Sample Properties of GMM for Business Cycle Analysis," Journal of Business and Economic Statistics, 14(3): 309-327.

Christiano, Lawrence J., Martin Eichenbaum, and Charles L. Evans (2005) "Nominal Rigidities and the Dynamic Effects of a Shock to Monetary Policy," Journal of Political Economy, 113(1): 1-45.

Clarida, Richard, Jordi Galí and Mark Gertler (1999) "The Science of Monetary Policy: A New Keynesian Perspective," Journal of Economic Literature, 37(4): 1661-1707.

Eichenbaum, Martin and Jonas D. M. Fisher (2007) "Estimating the Frequency of Price Re-optimization in Calvo-style Models," Journal of Monetary Economics, 54(7): 2032-2047.

Fernández-Villaverde, Jesús, Juan F. Rubio-Ramírez, Thomas J. Sargent and Mark W. Watson (2007) "A, B, Cs (and Ds) of Understanding VARs," American Economic Review, 97(3): 1021-1026.

Fuhrer, Jeffrey C. and Giovanni P. Olivei (2005) "Estimating Forward-Looking Euler Equations with GMM Estimators: An Optimal Instruments Approach," in Models and Monetary Policy: Research in the Tradition of Dale Henderson, Richard Porter, and Peter Tinsley. Washington, DC: Board of Governors of the Federal Reserve System, 87-104.

Galí, Jordi and Mark Gertler (1999) "Inflation Dynamics: A Structural Econometric Approach," Journal of Monetary Economics, 44(2): 195-222.

Galí, Jordi, Mark Gertler and David J. López-Salido (2001) "European Inflation Dynamics," European Economic Review, 45(7): 1237-1270 (Erratum, September 2002).

Galí, Jordi, Mark Gertler and David J. López-Salido (2005) "Robustness of the Estimates of the Hybrid New Keynesian Phillips Curve," Journal of Monetary Economics, 52(6): 1107-1118.

Gonçalves, Silvia and Lutz Kilian (2006) "Asymptotic and Bootstrap Inference for AR(∞) Processes with Conditional Heteroskedasticity," Econometric Reviews, forthcoming.

Hall, Alastair, Atsushi Inoue, James M. Nason, and Barbara Rossi (2007) "Information Criteria for Impulse Response Function Matching Estimation of DSGE Models," Duke University, mimeo.

Hurvich, Clifford M. and Chih-Ling Tsai (1989) "Regression and Time Series Model Selection in Small Samples," Biometrika, 76(2): 297-307.

Jordà, Òscar (2005) "Estimation and Inference of Impulse Responses by Local Projections," American Economic Review, 95(1): 161-182.

Kuersteiner, Guido M. (2005) "Automatic Inference for Infinite Order Vector Autoregressions," Econometric Theory, 21: 85-115.

Kurmann, André (2005) "Quantifying the Uncertainty about a Forward-Looking New Keynesian Pricing Model," Journal of Monetary Economics, 52(6): 1119-1134.

Levin, Andrew T. and John C. Williams (2003) "Robust Monetary Policy with Competing Reference Models," Journal of Monetary Economics, 50(5): 945-975.

Lewis, R. A. and Gregory C. Reinsel (1985) "Prediction of Multivariate Time Series by Autoregressive Model Fitting," Journal of Multivariate Analysis, 16: 393-411.

Lindé, Jesper (2005) "Estimating New Keynesian Phillips Curves: A Full Information Maximum Likelihood Approach," Journal of Monetary Economics, 52(6): 1135-1149.

Newey, Whitney K. and Daniel L. McFadden (1994) "Large Sample Estimation and Hypothesis Testing," in Handbook of Econometrics, v. 4, Robert F. Engle and Daniel L. McFadden (eds.). Amsterdam: North Holland.

Newey, Whitney K. and Kenneth D. West (1987) "A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix," Econometrica, 55: 703-708.

Rotemberg, Julio J. and Michael Woodford (1997) "An Optimization-Based Econometric Framework for the Evaluation of Monetary Policy," NBER Macroeconomics Annual, 297-346.

Rudd, Jeremy and Karl Whelan (2005) "New Tests of the New-Keynesian Phillips Curve," Journal of Monetary Economics, 52(6): 1167-1181.

Sbordone, Argia (2002) "Prices and Unit Labor Costs: Testing Models of Pricing Behavior," Journal of Monetary Economics, 49(2): 265-292.

Smets, Frank and Raf Wouters (2003) "An Estimated Dynamic Stochastic General Equilibrium Model of the Euro Area," Journal of the European Economic Association, 1(5): 1123-1175.

Staiger, Douglas and James H. Stock (1997) "Instrumental Variables Regression with Weak Instruments," Econometrica, 65(3): 557-586.

Stock, James H., Jonathan H. Wright and Motohiro Yogo (2002) "A Survey of Weak Instruments and Weak Identification in Generalized Method of Moments," Journal of Business and Economic Statistics, 20(4): 518-529.

Walsh, Carl E. (2003) Monetary Theory and Policy, second edition. Cambridge, Massachusetts: The MIT Press.

TABLE 1.1 – ARMA(1,1) Monte Carlo Experiments

                      h = 2           h = 5           h = 10
                      ρ      θ        ρ      θ        ρ      θ
T = 50
  PMD   Est.         0.23   0.49     0.25   0.44     0.31   0.28
        SE           0.22   0.20     0.20   0.19     0.20   0.18
        SE (MC)      0.31   0.27     0.21   0.20     0.22   0.28
  MLE   Est.         0.22   0.52     0.23   0.52     0.22   0.53
        SE           0.21   0.18     0.20   0.18     0.20   0.18
        SE (MC)      0.27   0.24     0.27   0.23     0.27   0.23
T = 100
  PMD   Est.         0.24   0.50     0.25   0.47     0.27   0.45
        SE           0.15   0.14     0.15   0.13     0.14   0.13
        SE (MC)      0.17   0.15     0.15   0.13     0.15   0.15
  MLE   Est.         0.25   0.51     0.24   0.51     0.24   0.50
        SE           0.14   0.13     0.14   0.13     0.14   0.13
        SE (MC)      0.15   0.13     0.16   0.14     0.14   0.14
T = 400
  PMD   Est.         0.25   0.51     0.25   0.50     0.25   0.50
        SE           0.07   0.07     0.07   0.06     0.07   0.06
        SE (MC)      0.08   0.07     0.07   0.07     0.07   0.07
  MLE   Est.         0.25   0.50     0.25   0.50     0.24   0.51
        SE           0.07   0.06     0.07   0.07     0.07   0.06
        SE (MC)      0.07   0.06     0.07   0.07     0.07   0.06

Notes: 1,000 Monte Carlo replications. First-stage regression lag length chosen automatically by AICc. SE refers to the standard error calculated with the PMD/MLE formula; SE (MC) refers to the Monte Carlo standard error based on the 1,000 estimates of the parameter. 500 burn-in observations are discarded when generating the data.
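The experimental design described in the notes can be reproduced in a few lines. The sketch below is our own illustration, not the authors' code: we take ρ = 0.25 and θ = 0.5 as the presumed true values, consistent with the large-sample estimates in the table, and assume unit innovation variance. It simulates the ARMA(1,1) with a 500-observation burn-in and checks the simulated first autocorrelation against its closed form:

```python
import numpy as np

def sim_arma11(T, rho=0.25, theta=0.5, burn=500, seed=0):
    """Simulate y_t = rho*y_{t-1} + e_t + theta*e_{t-1}, dropping `burn` initial draws."""
    rng = np.random.default_rng(seed)
    e = rng.normal(size=T + burn)
    y = np.zeros(T + burn)
    for t in range(1, T + burn):
        y[t] = rho * y[t - 1] + e[t] + theta * e[t - 1]
    return y[burn:]

# The first autocorrelation of an ARMA(1,1) is
#   (1 + rho*theta)(rho + theta) / (1 + theta**2 + 2*rho*theta) = 0.5625 here,
# which a long simulated sample should match closely.
y = sim_arma11(100_000)
r1 = np.corrcoef(y[:-1], y[1:])[0, 1]
```
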

TABLE 1.2 – ARMA(1,1) Monte Carlo Experiments

                      h = 2           h = 5           h = 10
                      ρ      θ        ρ      θ        ρ      θ
T = 50
  PMD   Est.         0.46   0.23     0.47   0.17     0.49   0.15
        SE           0.19   0.20     0.18   0.19     0.18   0.18
        SE (MC)      0.23   0.23     0.21   0.22     0.20   0.28
  MLE   Est.         0.45   0.29     0.44   0.27     0.45   0.29
        SE           0.20   0.20     0.20   0.21     0.20   0.20
        SE (MC)      0.21   0.23     0.23   0.25     0.19   0.22
T = 100
  PMD   Est.         0.48   0.23     0.47   0.23     0.50   0.23
        SE           0.13   0.14     0.13   0.14     0.12   0.13
        SE (MC)      0.15   0.16     0.14   0.16     0.13   0.18
  MLE   Est.         0.48   0.27     0.47   0.25     0.48   0.26
        SE           0.14   0.14     0.14   0.15     0.13   0.14
        SE (MC)      0.14   0.15     0.13   0.15     0.13   0.14
T = 400
  PMD   Est.         0.50   0.5      0.49   0.26     0.49   0.25
        SE           0.07   0.07     0.06   0.07     0.06   0.07
        SE (MC)      0.07   0.08     0.07   0.08     0.06   0.07
  MLE   Est.         0.50   0.25     0.49   0.26     0.49   0.26
        SE           0.07   0.07     0.07   0.07     0.07   0.07
        SE (MC)      0.06   0.07     0.07   0.07     0.06   0.07

Notes: 1,000 Monte Carlo replications. First-stage regression lag length chosen automatically by AICc. SE refers to the standard error calculated with the PMD/MLE formula; SE (MC) refers to the Monte Carlo standard error based on the 1,000 estimates of the parameter. 500 burn-in observations are discarded when generating the data.

TABLE 1.3 – ARMA(1,1) Monte Carlo Experiments

                      h = 2           h = 5           h = 10
                      ρ      θ        ρ      θ        ρ      θ
T = 50
  PMD   Est.        -0.06   0.56     0.06   0.40     0.16   0.28
        SE           0.36   0.32     0.27   0.25     0.25   0.22
        SE (MC)      0.61   0.55     0.28   0.29     0.31   0.37
  MLE   Est.         -      -        -      -        -      -
        SE           -      -        -      -        -      -
        SE (MC)      -      -        -      -        -      -
T = 100
  PMD   Est.        -0.03   0.54     0.04   0.45     0.09   0.41
        SE           0.24   0.21     0.19   0.18     0.19   0.17
        SE (MC)      0.33   0.30     0.21   0.21     0.22   0.23
  MLE   Est.         -      -        -      -        -      -
        SE           -      -        -      -        -      -
        SE (MC)      -      -        -      -        -      -
T = 400
  PMD   Est.        -0.01   0.51     0.00   0.50     0.02   0.48
        SE           0.11   0.10     0.10   0.09     0.10   0.09
        SE (MC)      0.11   0.10     0.10   0.09     0.09   0.09
  MLE   Est.         0.04   0.50     0.00   0.50     0.00   0.50
        SE           0.10   0.09     0.10   0.09     0.10   0.08
        SE (MC)      0.10   0.09     0.10   0.09     0.09   0.08

Notes: 1,000 Monte Carlo replications. First-stage regression lag length chosen automatically by AICc. SE refers to the standard error calculated with the PMD/MLE formula; SE (MC) refers to the Monte Carlo standard error based on the 1,000 estimates of the parameter. 500 burn-in observations are discarded when generating the data.

TABLE 1.4 – ARMA(1,1) Monte Carlo Experiments

                      h = 2           h = 5           h = 10
                      ρ      θ        ρ      θ        ρ      θ
T = 50
  PMD   Est.         0.47   0.04     0.43   0.03     0.54  -0.10
        SE           0.28   0.30     0.24   0.26     0.21   0.23
        SE (MC)      0.40   0.40     0.24   0.26     0.24   0.30
  MLE   Est.         -      -        -      -        -      -
        SE           -      -        -      -        -      -
        SE (MC)      -      -        -      -        -      -
T = 100
  PMD   Est.         0.49   0.01     0.45   0.03     0.53  -0.04
        SE           0.19   0.20     0.17   0.18     0.15   0.17
        SE (MC)      0.20   0.20     0.19   0.19     0.18   0.21
  MLE   Est.         0.49  -0.02     0.47   0.03     0.47   0.03
        SE           0.17   0.20     0.18   0.20     0.18   0.20
        SE (MC)      0.18   0.20     0.19   0.20     0.18   0.20
T = 400
  PMD   Est.         0.50   0.01     0.50   0.00     0.50   0.00
        SE           0.09   0.10     0.08   0.09     0.08   0.09
        SE (MC)      0.09   0.10     0.10   0.10     0.10   0.11
  MLE   Est.         0.49   0.01     0.49   0.01     0.48   0.02
        SE           0.09   0.10     0.09   0.10     0.09   0.10
        SE (MC)      0.09   0.10     0.09   0.10     0.09   0.10

Notes: 1,000 Monte Carlo replications. First-stage regression lag length chosen automatically by AICc. SE refers to the standard error calculated with the PMD/MLE formula; SE (MC) refers to the Monte Carlo standard error based on the 1,000 estimates of the parameter. 500 burn-in observations are discarded when generating the data.

Table 2.1.1 Monte Carlo Comparison: GMM vs. PMD

Case 1 D.G.P. Instrument list excludes R_t.

              PMD h = 2      PMD h* = 6     GMM h = 2      GMM h* = 6     PMD h = 2      PMD h* = 6     GMM h = 2      GMM h* = 6
Benchmark
  γf          0.703 (0.038)  0.689 (0.030)  0.697 (0.041)  0.678 (0.036)  0.718 (0.026)  0.692 (0.021)  0.705 (0.032)  0.689 (0.030)
  γb          0.297 (0.038)  0.312 (0.071)  0.303 (0.041)  0.322 (0.036)  0.281 (0.026)  0.307 (0.021)  0.295 (0.032)  0.311 (0.030)
  λ           0.127 (0.097)  0.102 (0.030)  0.108 (0.119)  0.098 (0.102)  0.078 (0.036)  0.074 (0.027)  0.078 (0.048)  0.064 (0.045)
Lagged PC
  γf          0.332 (0.411)  0.503 (0.135)  0.507 (0.350)  0.267 (0.219)  0.422 (0.313)  0.453 (0.109)  0.698 (0.295)  0.369 (0.196)
  γb          0.140 (0.100)  0.160 (0.051)  0.080 (0.082)  0.098 (0.075)  0.154 (0.079)  0.192 (0.040)  0.071 (0.072)  0.116 (0.064)
  λ          -0.063 (0.216) -0.071 (0.100) -0.039 (0.184) -0.068 (0.174)  0.107 (0.087)  0.079 (0.042)  0.082 (0.083)  0.118 (0.078)
Lagged IS
  γf          0.708 (0.041)  0.682 (0.030)  0.697 (0.043)  0.680 (0.036)  0.714 (0.026)  0.684 (0.019)  0.704 (0.031)  0.682 (0.029)
  γb          0.291 (0.041)  0.317 (0.030)  0.303 (0.043)  0.320 (0.036)  0.285 (0.026)  0.316 (0.019)  0.296 (0.031)  0.317 (0.029)
  λ           0.082 (0.151)  0.082 (0.101)  0.067 (0.189)  0.043 (0.151)  0.043 (0.052)  0.043 (0.036)  0.040 (0.065)  0.037 (0.062)

Notes: 1,000 Monte Carlo replications. Each run is initialized with 500 burn-in observations that are later discarded. Sample size T = 200. Monte Carlo median values of the parameter estimates and the associated standard errors are reported. Rows report the forward-looking inflation coefficient (γf), the backward-looking inflation coefficient (γb), and the output-gap coefficient (λ). "Lagged PC" refers to a DGP whose Phillips curve contains first- and second-lag inflation terms; "Lagged IS" refers to a DGP whose Phillips curve contains first- and second-lag output-gap terms. h = 2 uses the first two horizons of the impulse response function when estimating with PMD (to obtain over-identification) and corresponds to using the first two lags of the variables as instruments when estimating by GMM. h* refers to the optimal horizon selected by Hall et al.'s (2007) information criterion and varies with the model. The "Benchmark" and "Lagged IS" cases impose the parameter restriction stated in the text; the "Lagged PC" case estimates these parameters unconstrained.

Table 2.1.2 Monte Carlo Comparison: GMM vs. PMD

Case 1 D.G.P. Instrument list includes R_t.

              PMD h = 2      PMD h* = 6     GMM h = 2      GMM h* = 6     PMD h = 2      PMD h* = 6     GMM h = 2      GMM h* = 6
Benchmark
  γf          0.710 (0.037)  0.684 (0.031)  0.969 (0.054)  0.919 (0.046)  0.720 (0.026)  0.694 (0.021)  0.896 (0.037)  0.860 (0.035)
  γb          0.290 (0.037)  0.316 (0.031)  0.031 (0.054)  0.080 (0.046)  0.280 (0.026)  0.306 (0.021)  0.104 (0.037)  0.140 (0.035)
  λ           0.130 (0.094)  0.110 (0.070) -0.427 (0.165) -0.332 (0.134)  0.076 (0.036)  0.076 (0.027) -0.024 (0.059) -0.019 (0.055)
Lagged PC
  γf          0.318 (0.412)  0.491 (0.139)  2.303 (0.690)  1.094 (0.206)  0.410 (0.314)  0.453 (0.113)  2.373 (0.591)  1.172 (0.192)
  γb          0.136 (0.103)  0.159 (0.051) -0.178 (0.194)  0.001 (0.075)  0.166 (0.083)  0.186 (0.041) -0.219 (0.174) -0.013 (0.068)
  λ          -0.062 (0.213) -0.063 (0.099) -0.008 (0.456)  0.022 (0.173)  0.104 (0.091)  0.083 (0.043) -0.133 (0.199)  0.034 (0.080)
Lagged IS
  γf          0.715 (0.041)  0.686 (0.030)  1.040 (0.082)  0.957 (0.058)  0.715 (0.026)  0.686 (0.020)  0.921 (0.040)  0.879 (0.036)
  γb          0.285 (0.041)  0.314 (0.030) -0.044 (0.082)  0.042 (0.058)  0.284 (0.026)  0.314 (0.020)  0.079 (0.040)  0.121 (0.036)
  λ           0.076 (0.154)  0.082 (0.102) -1.180 (0.374) -0.809 (0.245)  0.051 (0.052)  0.051 (0.036) -0.125 (0.088) -0.123 (0.080)

Notes: 1,000 Monte Carlo replications. Each run is initialized with 500 burn-in observations that are later discarded. Sample size T = 200. Monte Carlo median values of the parameter estimates and the associated standard errors are reported. Rows report the forward-looking inflation coefficient (γf), the backward-looking inflation coefficient (γb), and the output-gap coefficient (λ). "Lagged PC" refers to a DGP whose Phillips curve contains first- and second-lag inflation terms; "Lagged IS" refers to a DGP whose Phillips curve contains first- and second-lag output-gap terms. h = 2 uses the first two horizons of the impulse response function when estimating with PMD (to obtain over-identification) and corresponds to using the first two lags of the variables as instruments when estimating by GMM. h* refers to the optimal horizon selected by Hall et al.'s (2007) information criterion and varies with the model. The "Benchmark" and "Lagged IS" cases impose the parameter restriction stated in the text; the "Lagged PC" case estimates these parameters unconstrained.

Table 2.2.1 Monte Carlo Comparison: GMM vs. PMD

Case 2 D.G.P. Instrument list excludes R_t.

              PMD h = 2      PMD h* = 6     GMM h = 2      GMM h* = 6     PMD h = 2      PMD h* = 6     GMM h = 2      GMM h* = 6
Benchmark
  γf          0.509 (0.035)  0.510 (0.028)  0.516 (0.052)  0.523 (0.046)  0.569 (0.027)  0.543 (0.021)  0.557 (0.037)  0.550 (0.035)
  γb          0.491 (0.035)  0.490 (0.028)  0.484 (0.052)  0.476 (0.046)  0.431 (0.027)  0.456 (0.021)  0.443 (0.037)  0.450 (0.035)
  λ           0.251 (0.040)  0.220 (0.032)  0.229 (0.073)  0.197 (0.064)  0.159 (0.027)  0.164 (0.022)  0.159 (0.042)  0.148 (0.040)
Lagged PC
  γf          0.356 (0.385)  0.509 (0.124)  0.619 (0.328)  0.318 (0.205)  0.556 (0.166)  0.560 (0.079)  0.888 (0.207)  0.581 (0.152)
  γb          0.228 (0.113)  0.251 (0.052)  0.084 (0.098)  0.131 (0.078)  0.280 (0.071)  0.345 (0.036)  0.085 (0.078)  0.187 (0.065)
  λ           0.079 (0.119)  0.040 (0.050)  0.028 (0.106)  0.061 (0.088)  0.121 (0.063)  0.088 (0.033)  0.050 (0.086)  0.141 (0.069)
Lagged IS
  γf          0.547 (0.057)  0.535 (0.032)  0.534 (0.054)  0.542 (0.045)  0.571 (0.030)  0.557 (0.019)  0.559 (0.036)  0.554 (0.033)
  γb          0.452 (0.057)  0.465 (0.032)  0.466 (0.054)  0.457 (0.045)  0.429 (0.030)  0.442 (0.019)  0.441 (0.036)  0.446 (0.033)
  λ           0.126 (0.103)  0.145 (0.048)  0.142 (0.093)  0.118 (0.078)  0.093 (0.041)  0.109 (0.025)  0.098 (0.049)  0.090 (0.046)

Notes: 1,000 Monte Carlo replications. Each run is initialized with 500 burn-in observations that are later discarded. Sample size T = 200. Monte Carlo median values of the parameter estimates and the associated standard errors are reported. Rows report the forward-looking inflation coefficient (γf), the backward-looking inflation coefficient (γb), and the output-gap coefficient (λ). "Lagged PC" refers to a DGP whose Phillips curve contains first- and second-lag inflation terms; "Lagged IS" refers to a DGP whose Phillips curve contains first- and second-lag output-gap terms. h = 2 uses the first two horizons of the impulse response function when estimating with PMD (to obtain over-identification) and corresponds to using the first two lags of the variables as instruments when estimating by GMM. h* refers to the optimal horizon selected by Hall et al.'s (2007) information criterion and varies with the model. The "Benchmark" and "Lagged IS" cases impose the parameter restriction stated in the text; the "Lagged PC" case estimates these parameters unconstrained.

Table 2.2.2 Monte Carlo Comparison: GMM vs. PMD

Case 2 D.G.P. Instrument list includes R_t.

              PMD h = 2      PMD h* = 6     GMM h = 2      GMM h* = 6     PMD h = 2      PMD h* = 6     GMM h = 2      GMM h* = 6
Benchmark
  γf          0.509 (0.035)  0.505 (0.028)  0.931 (0.053)  0.887 (0.046)  0.570 (0.027)  0.545 (0.021)  0.793 (0.035)  0.765 (0.0