This version, October 2008
Estimation and Inference by the Method of Projection Minimum
Distance: An Application to the New Keynesian Hybrid Phillips Curve∗
Abstract
In most macroeconomic models, the stability of the solution path implies that the system is covariance-stationary and hence admits a Wold representation. The ability to estimate this Wold representation semi-parametrically by local projections (Jordà, 2005), even when the solution path's process is unknown or unconventional, can be exploited to estimate the model's parameters by minimum distance techniques. We label this two-step, least-squares estimation procedure "projection minimum distance" (PMD) and show that: (1) it is consistent and asymptotically normal for a large class of problems; (2) it is efficient even in relatively small samples; and (3) it is asymptotically equivalent to maximum likelihood and nests most applications of generalized method of moments as a special case. Although PMD is a general method, we investigate its properties in the context of the New Keynesian hybrid Phillips curve, providing ample Monte Carlo evidence and revisiting Fuhrer and Olivei's (2005) empirical analysis as an illustrative application.
• Keywords: impulse response, local projection, minimum chi-square, minimum distance.
• JEL Codes: C32, E47, C53.
Òscar Jordà, Department of Economics, University of California, Davis, One Shields Ave., Davis, CA 95616. e-mail: [email protected]
Sharon Kozicki, Research Department, Bank of Canada, 234 Wellington Street, Ottawa, Ontario, Canada K1A 0G9. e-mail: [email protected]
∗The views expressed herein are solely those of the authors and do not necessarily reflect the views of the Bank of Canada. We thank Colin Cameron, Timothy Cogley, David DeJong, Richard Dennis, Stephen Donald, David Drukker, Jeffrey Fuhrer, James Hamilton, Peter Hansen, Kevin Hoover, Giovanni Olivei, Peter Robinson, Paul Ruud, Frank Schorfheide, Aaron Smith, Harald Uhlig, Frank Wolak and seminar participants at the Bank of Italy, Bocconi University - IGIER, Duke University, the Federal Reserve Bank of Dallas, the Federal Reserve Bank of Philadelphia, the Federal Reserve Bank of New York, the Federal Reserve Bank of San Francisco, the Federal Reserve Bank of St. Louis, Southern Methodist University, Stanford University, the University of California, Berkeley, the University of California, Davis, the University of California, Riverside, the University of Houston, the University of Kansas, the University of Pennsylvania, the University of Texas at Austin, the 2006 Winter Meetings of the American Economic Association in Boston, the 2006 European Meetings of the Econometric Society in Vienna, and the 3rd Macro Workshop in Vienna, 2006 for useful comments and suggestions. Jordà is thankful for the hospitality of the Federal Reserve Bank of San Francisco during the preparation of this paper.
1 Introduction
Econometric estimation of dynamic stochastic (partial or general) equilibrium models requires that practitioners confront the limits that model tractability imposes on the universe of variables and on the wealth of dynamic interactions observed in reality. Approaches based on model-implied likelihoods, whether classical (e.g. Canova, 2007) or Bayesian (e.g. An and Schorfheide, 2007), are only sensible with sufficiently complex and complete models that narrow this separation from reality, and when suitable sources of exogenous variation are properly ascertained. The inability to conduct controlled experiments in macroeconomics and the capriciousness of natural or quasi-natural experiments often limit a practitioner's choice to estimation strategies based on appropriate instrumental variables techniques.
This paper introduces a statistical method of parameter estimation in which the economic model's restrictions are cast against a flexible, semi-parametric representation of the data based on its Wold (or impulse response) representation. The estimation methodology is particularly well-suited for models designed to capture dynamic comovement, such as real business cycle models or new Keynesian specifications, whose performance is often evaluated on their ability to match the persistence and cross-correlation properties of macroeconomic data. The objective is to obtain parameter estimates that are robust to incomplete characterizations of the dynamics and/or the forcing variables that the behavioral model is trying to explain. The result is a minimum-distance estimator that is computationally simple and whose asymptotic properties we fully derive; we also establish its relation to maximum likelihood (ML) and to other available minimum distance estimators, such as the generalized method of moments (GMM), Sbordone's (2002) forecast matching estimator, and the impulse response matching estimator of Rotemberg and Woodford (1997), more recently used in Christiano, Eichenbaum and Evans (2005).
Perhaps it is useful to frame our discussion in the context of the voluminous literature that
investigates inflation dynamics (e.g. volume 52 of the Journal of Monetary Economics in 2005 was
exclusively dedicated to estimation of the Phillips curve). A critical divide in this literature emerges between proponents of limited-information, single-equation, instrumental-variable methods (in that issue, primarily Galí, Gertler and López-Salido, 2005) and their critics, who favor full-information methods (e.g. Kurmann, 2005; Lindé, 2005; Rudd and Whelan, 2005) that often require a complete, New Keynesian formulation of the economy. Central to this line of research is the desire to determine from the data the degree of backward/forward-looking behavior of the Phillips curve, because this degree is pivotal in determining optimal monetary policy responses, sacrifice ratios, and the stability of competing policy prescriptions (see, e.g., Levin and Williams, 2003).
Common arguments against GMM have to do with fears about poor small-sample properties and weak-instrument problems. Our estimator addresses some of these issues, but we wish instead to highlight a far more fundamental issue that has been previously neglected. For expository purposes, consider a researcher who is interested in estimating the following generic regression (presumably, a representation of a fundamental relation derived from an economic model):
y = Y β + u (1)
where Y includes endogenous variables and possibly exogenous or predetermined variables, and where z is a set of proposed instruments (which will include whatever variables in Y are exogenous or predetermined). Suppose instead that the true data generating process (DGP) is characterized by
y = Y β + xγ + ε (2)
where x is a vector of omitted (exogenous or predetermined) variables. Here the omission of
the variables x is motivated by the researcher’s express belief in the structural nature of the
relationship in (1), not by their unavailability. Even when the z are valid instruments for (2), they will not in general be valid instruments for (1) if $E(z'x) \neq 0$ and $\gamma \neq 0$, since the validity of z depends on $E(z'u) = 0$ and in this case $E(z'u) = E(z'x)\gamma + E(z'\varepsilon) = E(z'x)\gamma \neq 0$. Thus,
z are valid instruments from the perspective of the DGP in (2), not the proposed model (1).
This problem is particularly acute in macroeconomics and the common practice of estimating
Euler equations with GMM using lagged endogenous variables as instruments. As is clear from
the preceding discussion, lagged endogenous variables can become illegitimate instruments when
there is omitted feedback and/or omitted variables in the Euler expression.
A natural solution is to orthogonalize the instrument set z with respect to the possibly omitted
variables x. Hence consider a first stage regression
z = xδ + v
where the residuals of this regression, v (rather than the fitted values, as in conventional two-stage least squares), are proper instruments for the model in expression (1). It turns out that the estimator we propose achieves a similar instrument pre-treatment in a manner that can be exploited to examine model misspecification and that can be seen as a direct generalization of the typical GMM estimator. The paper presents our methods using examples based on the Phillips curve as a backdrop, but it should be clear from our presentation that the methods are not limited to these examples. In fact, we derive the statistical properties of our estimator under general assumptions that include possibly nonlinear systems estimation.
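To make the feedback problem concrete, here is a minimal simulation sketch (our own illustration, not taken from the paper; all numerical values are arbitrary). The DGP includes an omitted variable x correlated with the candidate instrument z, so IV with the raw z is inconsistent, while IV with the residual from projecting z on x recovers β:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 50_000
beta, gamma = 0.5, 0.8            # true coefficients in y = Y*beta + x*gamma + e

x = rng.normal(size=T)            # omitted (exogenous) variable
z = 0.7 * x + rng.normal(size=T)  # candidate instrument, correlated with x
e = rng.normal(size=T)            # structural error
Y = 0.6 * z + 0.4 * x + 0.5 * e + rng.normal(size=T)  # endogenous regressor
y = beta * Y + gamma * x + e      # true DGP, as in expression (2)

# IV with the raw instrument: E(z'u) = E(z'x)*gamma != 0 once x is omitted,
# so this estimate is inconsistent for beta
beta_raw = (z @ y) / (z @ Y)

# Pre-treat the instrument: project z on x and keep the residuals v
delta = (x @ z) / (x @ x)
v = z - delta * x                 # E(v'x) = 0 by construction
beta_resid = (v @ y) / (v @ Y)    # consistent for beta
```

With these (arbitrary) parameter values, `beta_raw` is biased upward while `beta_resid` is close to 0.5; the residual-based pre-treatment plays exactly the role of the projection M in the PMD first stage described in the next section.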
2 Projection Minimum Distance
The dynamics of many macroeconomic models often depend on expectations of the future values of their variables. This is the natural consequence of rational expectations or of learning mechanisms. Furthermore, the relative significance of forward- versus backward-looking terms is of considerable importance in determining optimal policy responses: the stability of the solution paths, and of the economy, often depends on this feature. Unfortunately, because expectations are based on the same information set that determines backward-looking behavior, it is empirically
difficult to disentangle which type of behavior is dominant. Single-equation, limited-information
estimation methods therefore require appropriate instrumental variables, while full-information
approaches based on the likelihood (classical or Bayesian) require complete and correctly specified
models of the economy that describe how available information is allocated.
This section presents the mechanics of our estimation method using as a backdrop the desire
to estimate a New Keynesian hybrid Phillips curve. We do this because determining the relative
degree of forward versus backward looking behavior plays such a pivotal role in designing optimal
monetary policy (see, e.g., Walsh, 2003). Further, the Phillips curve is one of the pillars on
which standard New Keynesian DSGE models are erected, and as we will show, our method is
conveniently scalable to estimate such systems.
The majority of current Phillips curve specifications are derived by imposing a friction on
a firm’s ability to adjust its price optimally (see, e.g. Calvo, 1983; Galí and Gertler, 1999;
Christiano, Eichenbaum and Evans, 2005; Eichenbaum and Fisher, 2007; to cite a few). The usual
set-up involves a continuum of monopolistically competitive, intermediate goods producing firms
that rent capital and labor in perfectly competitive factor markets. Depending on the choice of
friction, optimal price-setting rules depend on expectations of future aggregate prices and marginal
costs (or, under some further assumptions, the gap between actual and potential output).
We begin from a less ambitious theoretical vantage point and instead consider a common
formulation of New Keynesian monetary models, specifically
$$\pi_t = \gamma_f E_t\pi_{t+1} + \gamma_b \pi_{t-1} + \gamma_g g_t + \varepsilon_{\pi,t} \quad (3)$$

$$g_t = \beta_f E_t g_{t+1} + \beta_b g_{t-1} - \beta_r (R_t - E_t\pi_{t+1}) + \varepsilon_{g,t} \quad (4)$$

$$R_t = (1-\rho)(\omega_\pi \pi_t + \omega_g g_t) + \rho R_{t-1} + \varepsilon_{R,t} \quad (5)$$
where the first equation is the New Keynesian hybrid Phillips curve with πt the aggregate inflation
rate, gt the output gap, and where the restriction γf + γb = 1 is commonly imposed as a result of
the theory; the second equation is the aggregate demand or IS curve with Rt the nominal interest
rate; and the third equation is the standard Taylor rule with interest rate smoothing. Such a
formulation has been studied extensively by Clarida, Galí and Gertler (1999) and more recently
by Lindé (2005) for a comparative study of the properties of GMM versus FIML estimation of
the Phillips curve in (3). In fact, we will use an extended formulation of this model to generate
Monte Carlo simulations in section 5.
In what follows we focus our attention on estimation of expression (3) exclusively. We do this because it makes the mechanics of our estimator easier to explain, but also to highlight some of its properties when used in a limited-information context. It should be clear from our presentation how one would instead conduct full-information system estimation; indeed, the formal derivation of the large-sample properties in section 4 is done under this more general assumption.
The familiar stable solution path of the system of equations (3)-(5) can be expressed as

$$y_t = A y_{t-1} + C\varepsilon_t$$

where $y_t \equiv (\pi_t \;\; g_t \;\; R_t)'$; $\varepsilon_t \equiv (\varepsilon_{\pi,t} \;\; \varepsilon_{g,t} \;\; \varepsilon_{R,t})'$; and $A$ and $C$ are coefficient matrices whose values are nonlinear functions of the structural parameters $\{\gamma_f, \gamma_b, \gamma_g; \beta_f, \beta_b, \beta_r; \rho, \omega_\pi, \omega_g\}$. Defining the resulting reduced-form residuals $v_t = C\varepsilon_t$, this stable solution path admits a reduced-form Wold representation given by

$$y_t = \sum_{h=0}^{\infty} B_h v_{t-h}$$

where $B_0 = I$; the $B_h$ are the reduced-form moving-average or impulse response coefficient matrices; and $E(v_t v_t') = \Omega_v = C\Omega_\varepsilon C'$, where $\Omega_\varepsilon$ is a diagonal matrix. More generally, whether or not the solution path has this convenient VAR(1) form is not important. What is important is that the stability of the solution (which, for example, in other models has a VARMA form instead, see
e.g. Fernández-Villaverde, Rubio-Ramírez, Sargent and Watson, 2007) ensures the existence of a
reduced-form Wold representation.
We also wish to highlight that we focus on the reduced-form representation because in practice,
there is usually no formal statistical procedure to verify commonly used structural identification
assumptions (such as the ubiquitous short-run or long-run recursive schemes). Further, Fernández-Villaverde et al. (2007) highlight the dangers of imposing incorrect identification assumptions
when estimating structural parameters with impulse response matching estimators. Our focus on
the reduced-form representation is a departure from what is common practice in the literature
(see, e.g. Christiano, Eichenbaum, and Evans, 2005) but a departure that we deem particularly
advantageous to the extent that the model’s parameters can be estimated from information about
the serial correlation properties of the data (which are unambiguous) rather than from the con-
temporaneous correlation between the variables in the system, where the direction of causation is
much harder to establish formally and is prone to generate inconsistent estimates.
The mechanics of our estimation method, which we call projection minimum distance (PMD), are broadly as follows. First, we obtain estimates of the first H elements $B_h$ of the Wold decomposition with local projections (Jordà, 2005). Second, we substitute the variables in expression (3) by their Wold representations to obtain a mapping between the $B_h$ (for which first-stage estimates by local projections are now available) and the parameters of interest, $\gamma \equiv (\gamma_f \;\; \gamma_b \;\; \gamma_g)'$, and then minimize an appropriately weighted distance function to obtain consistent and asymptotically normal estimates of $\gamma$. We explain these two steps in more detail next.
Using matrix notation to facilitate the explanation and practical implementation of the estimator, let $X$ be a $T' \times n$ matrix, where $T' = T - H - k$ and $n$ is the number of variables in the system (e.g. $n = 3$ in the example of expressions (3)-(5)). This matrix stacks the observations $\{\pi_t \;\; g_t \;\; R_t\}_{t=k+1}^{T-H}$. Let $Y$ be a $T'H \times n$ matrix that stacks the $H$ blocks, each $T' \times n$, of observations $\{\pi_{t+h} \;\; g_{t+h} \;\; R_{t+h}\}_{t=k+1}^{T-H}$ for $h = 1, \ldots, H$, and let $Z$ collect the $T' \times nk$ observations corresponding to the $k$ lags $\{\pi_{t-1} \;\; g_{t-1} \;\; R_{t-1} \;\; \ldots \;\; \pi_{t-k} \;\; g_{t-k} \;\; R_{t-k}\}_{t=k+1}^{T-H}$. Then, if $B$ is the $nH \times n$ matrix that stacks the $h = 1, \ldots, H$ matrices $B_h$, it is easily estimated with the least squares formula
$$\hat{B}_T = \left(I \otimes (X'MX)^{-1}\right)\left(I \otimes (X'MY)\right) \quad (6)$$

where $M = I - Z(Z'Z)^{-1}Z'$, and where the covariance matrix of $\hat{b}_T = \mathrm{vec}(\hat{B}_T)$ can be computed as

$$\hat{\Omega}_b = \hat{\Psi}_b \otimes (X'MX)^{-1}, \quad (7)$$

where

$$\hat{\Psi}_b = \sum_{h=1}^{H} \Phi_h' \, \frac{\eta'\eta}{T-H-k} \, \Phi_h, \qquad \Phi_h = \begin{pmatrix} 0 & \ldots & 0 & I & B_1 & \ldots & B_{H-h-1} \end{pmatrix}$$

and $\eta$ is the $T \times n$ matrix of residuals of the local projection of $y_{t+1}$ onto $y_t$. In section 4 we will show formally that

$$\sqrt{T-H-k}\left(\hat{b}_T - b_0\right) \stackrel{d}{\to} N(0, \Omega_b)$$

under rather general assumptions about the underlying data generating process.
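For concreteness, the first stage can be sketched in a few lines of numpy. This is an illustrative implementation of the local projection step (our own code, not the authors'); the VAR(1) used to generate the data is an arbitrary sanity check, under which $B_h$ should approach $A^h$:

```python
import numpy as np

def local_projections(y, H, k):
    """Regress y_{t+h} on y_t for h = 1..H, partialling out a constant and k
    further lags of y (the annihilator M = I - Z(Z'Z)^{-1}Z' in the text).
    Returns [B_1, ..., B_H], each an n x n impulse-response matrix."""
    T, n = y.shape
    Tp = T - H - k                            # usable sample, T' = T - H - k
    X = y[k:k + Tp]                           # observations of y_t
    Z = np.hstack([np.ones((Tp, 1))] +
                  [y[k - j:k - j + Tp] for j in range(1, k + 1)])  # lags
    M = np.eye(Tp) - Z @ np.linalg.pinv(Z)    # projects off the lag space
    XtMX_inv = np.linalg.inv(X.T @ M @ X)
    B = []
    for h in range(1, H + 1):
        Yh = y[k + h:k + h + Tp]              # observations of y_{t+h}
        B.append((XtMX_inv @ (X.T @ M @ Yh)).T)  # LS coefficients = B_h
    return B

# Sanity check on a known VAR(1): the horizon-h response is A**h
rng = np.random.default_rng(1)
A = np.array([[0.5, 0.1], [0.0, 0.4]])
T = 2000
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = A @ y[t - 1] + rng.normal(size=2)
B = local_projections(y, H=4, k=2)
```

Here `B[0]` and `B[1]` estimate $A$ and $A^2$ respectively; in the PMD application these first-stage estimates are then passed to the minimum-distance second step.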
The second stage consists of replacing πt and gt by their Wold expressions in expression (3).
This delivers the following mapping with the parameters of interest:
$$B_h i_1 = B_{h+1} i_1 \gamma_f + B_{h-1} i_1 \gamma_b + B_h i_2 \gamma_g \qquad h = 1, \ldots, H \quad (8)$$

where $i_j$ refers to the $j$th column of the identity matrix $I$. Given first-stage estimates $\hat{B}_h$ and the linear relation between these and $\gamma$, formal estimates of the latter can be conveniently calculated by least squares.
More formally, let $S_0$, $S_f$, $S_b$ be appropriate selector matrices such that, using the first-stage estimates $\hat{B}_h$, expression (8) can be cast simultaneously for every $h = 1, \ldots, H$ as

$$f(\hat{b}_T; \gamma) = \left[ S_0 \hat{B}_T i_1 - \left( S_f \hat{B}_T i_1 \;\; S_b \hat{B}_T i_1 \;\; S_0 \hat{B}_T i_2 \right) \gamma \right].$$

Consistent and asymptotically normal estimates of $\gamma$ can then be obtained by minimizing

$$\min_\gamma Q(\hat{b}_T; \gamma) = f(\hat{b}_T; \gamma)' \, \hat{W} f(\hat{b}_T; \gamma)$$

where $\hat{W} = (\hat{F}_b' \hat{\Omega}_b \hat{F}_b)^{-1}$ and $\hat{F}_b = \partial f(\hat{b}_T; \hat{\gamma}_T)/\partial b$. Hence, if one defines $\hat{B}_Y \equiv S_0 \hat{B}_T i_1$ and $\hat{B}_X \equiv (S_f \hat{B}_T i_1 \;\; S_b \hat{B}_T i_1 \;\; S_0 \hat{B}_T i_2)$, the parameters of the Phillips curve in expression (3) can be estimated as

$$\hat{\gamma}_T = \left(\hat{B}_X' \hat{W} \hat{B}_X\right)^{-1} \left(\hat{B}_X' \hat{W} \hat{B}_Y\right), \quad (9)$$

with covariance matrix

$$\hat{\Omega}_\gamma = \left(\hat{B}_X' \hat{W} \hat{B}_X\right)^{-1}. \quad (10)$$

Section 4 shows formally that for general problems

$$\sqrt{T-H-k}\,(\hat{\gamma}_T - \gamma_0) \stackrel{d}{\to} N(0, \Omega_\gamma)$$

where $\Omega_\gamma = (F_\gamma' W F_\gamma)^{-1}$ and $F_\gamma = \partial f(b; \gamma)/\partial\gamma$. In other words, our estimator can be summarized by the following two least-squares steps:

$$\hat{B}_T = \left(I \otimes (X'MX)^{-1}\right)\left(I \otimes (X'MY)\right)$$
$$\hat{\gamma}_T = \left(\hat{B}_X' \hat{W} \hat{B}_X\right)^{-1} \left(\hat{B}_X' \hat{W} \hat{B}_Y\right)$$

and the covariance matrix of $\hat{\gamma}_T$ computed as
$$\hat{\Omega}_\gamma = \left(\hat{B}_X' \hat{W} \hat{B}_X\right)^{-1}.$$

Several remarks deserve mention. First, the optimal weighting matrix $\hat{W}$ described above
can be replaced with the identity matrix while still obtaining consistent estimates of $\gamma$; this is called the equal-weights estimator. The minimum distance literature (see Cameron and Trivedi, 2005) suggests that the equal-weights estimator, although less efficient, has lower small-sample bias when the sample size is especially short. Second, the optimal weighting matrix is a function of $\hat{\gamma}_T$ itself, and hence (9) is not directly feasible. Although one could use a continuously updated estimator, a simpler (and asymptotically equivalent) solution is to obtain $\hat{\gamma}_T^{EW}$ from the feasible equal-weights estimator, use it to construct the optimal weighting matrix, and then obtain the optimal-weights estimator $\hat{\gamma}_T^{OW}$ and its covariance matrix. Third, when the optimal-weights estimator is used and $\dim(f(\hat{b}_T; \gamma)) > \dim(\gamma)$, section 4 shows that

$$Q(\hat{b}_T; \hat{\gamma}_T) \stackrel{d}{\to} \chi^2_{\dim(f(\hat{b}_T;\gamma)) - \dim(\gamma)}$$

which provides a test of overidentifying restrictions (and hence of model misspecification) along the same lines as the J-test commonly used in GMM.
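The second step is itself just weighted least squares on the stacked impulse-response columns. The sketch below (our own illustration) uses synthetic stand-ins for $\hat{B}_X$ and $\hat{B}_Y$, which in practice would be built from the first-stage local projections via the selector matrices $S_0$, $S_f$, $S_b$, and a crude diagonal stand-in for the optimal weighting matrix; all numbers are fabricated:

```python
import numpy as np

rng = np.random.default_rng(2)
gamma0 = np.array([0.6, 0.4, 0.1])        # "true" (gamma_f, gamma_b, gamma_g)
H = 12                                     # number of stacked horizons

BX = rng.normal(size=(H, 3))               # stand-in for (S_f B i1, S_b B i1, S_0 B i2)
BY = BX @ gamma0 + 0.01 * rng.normal(size=H)  # stand-in for S_0 B i1 plus noise

# Step 1: equal-weights estimator (W = I), consistent but less efficient
g_ew = np.linalg.solve(BX.T @ BX, BX.T @ BY)

# Step 2: optimal-weights estimator; here W is a crude diagonal proxy for
# (F_b' Omega_b F_b)^{-1}, built from the equal-weights residuals
resid = BY - BX @ g_ew
W = np.eye(H) / max(resid @ resid / H, 1e-12)
g_ow = np.linalg.solve(BX.T @ W @ BX, BX.T @ W @ BY)
cov_g = np.linalg.inv(BX.T @ W @ BX)         # covariance, as in eq. (10)
Q = (BY - BX @ g_ow) @ W @ (BY - BX @ g_ow)  # overidentification statistic
```

`Q` would be compared against a chi-square with $H - 3$ degrees of freedom, mirroring the J-test analogy above.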
Minimum distance approaches are not new in macroeconomics. Although it is rare to find formal derivations of the statistical properties of these estimators (e.g. minimization of structural impulse response distances, as in Rotemberg and Woodford, 1997, and Christiano, Eichenbaum and Evans, 2005; or minimization of VAR forecast distances, as in Sbordone, 2002, 2005), this is not where we see our most important contribution. Instead, the semi-parametric nature of the first stage allows us to be quite general and agnostic about the underlying DGP (which, as a consequence, includes VARMA specifications, for example).
This generality is useful in several respects. Like GMM (and unlike MLE) our method does not
require solving for the rational-expectations equilibrium and then selecting the appropriate stable
roots (we only require that the solution be stable so that we can invoke the Wold representation
theorem). Further, when the Euler expressions are linear, our estimator boils down to two simple
GLS-type steps. In addition, the flexibility of the first stage has several important payoffs with
respect to GMM.
First, in many covariance-stationary processes, the rate at which $B_h \to 0$ as $h \to \infty$ is quite fast (typically exponential) and hence, although in finite samples we truncate at some horizon H, our estimator is almost as efficient as MLE (an example is provided in our Monte Carlo experiments in section 5). The choice of truncation H in practice can be determined conveniently with Hall, Inoue, Nason and Rossi's (2007) information criterion,

$$\hat{H} = \arg\min_{h \in \{h_{min}, \ldots, h_{max}\}} \; \ln\left(\left|\hat{\Omega}_\gamma(h)\right|\right) + h\,\frac{\ln\left(\sqrt{T/k}\right)}{\sqrt{T/k}} \quad (11)$$

where $h_{min}$ is such that $\dim(f(\hat{b}_T; \gamma)) = \dim(\gamma)$.

Second, by assuming a Wold representation for $y_t$, we are able to obtain closed-form analytic expressions for the optimal weighting matrix $\hat{W}$, rather than having to use a semi-parametric estimate, such as Newey-West, as is common in GMM. This results in obvious gains in efficiency
of the estimates as we shall see in the Monte Carlo experiments of section 5. Third, it turns out
that our estimator can be seen as a version of GMM that embeds a recursive pre-treatment of
potentially illegitimate instruments due to feedback, a feature that we will exploit to check for
model misspecification and that we elaborate on in more detail below. Finally, notice that the
method is fully scalable to systems and to nonlinear specifications with little difficulty.
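The horizon-selection criterion in (11) is easy to transcribe. The sketch below (our own illustration) uses a fabricated sequence of covariance estimates purely to show the fit/penalty trade-off; in practice $\hat{\Omega}_\gamma(h)$ would come from re-estimating PMD at each candidate horizon:

```python
import numpy as np

def hall_ic(Omega_gamma, h, T, k):
    """Information criterion of eq. (11): fit term ln|Omega_gamma(h)| plus a
    penalty h * ln(sqrt(T/k)) / sqrt(T/k) that grows with the horizon."""
    pen = np.sqrt(T / k)
    return np.log(np.linalg.det(Omega_gamma)) + h * np.log(pen) / pen

T, k = 200, 4
# Fabricated example: precision improves like 1/h as horizons are added, so
# |Omega(h)| = h**(-2) for a 2x2 covariance; the penalty eventually dominates
candidates = range(3, 11)
ic = {h: hall_ic(np.eye(2) / h, h, T, k) for h in candidates}
H_hat = min(ic, key=ic.get)
```

With these fabricated numbers the criterion picks an interior horizon, balancing the efficiency gain from longer impulse responses against the penalty on additional horizons.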
3 Illegitimate Instruments
Micro-founded models of the macroeconomy distill a rich economic environment with many vari-
ables and a plethora of interactions into a few key relations that allow us to understand the
fundamental forces that drive the economy. The equilibrium conditions characterized by the re-
sulting Euler equations therefore impose considerable restrictions on the dynamic specifications
and included variables. Further, oftentimes the best (or even the only) instruments available to estimate such relations are lags of the endogenous variables specified in these expressions. This section shows that the validity of these instruments depends on the data, not on the Euler conditions specified by the economic model; as a result, unmodeled dynamics and/or omitted variables generate instruments that are illegitimate due to feedback, and hence inconsistent GMM parameter estimates.
One solution would be to enrich the economic model to account more completely for the
features of the data and certainly many new models (e.g. An and Schorfheide, 2007; Christiano,
Eichenbaum and Evans, 2005; Smets and Wouters, 2003) have taken this approach while trying to
preserve enough tractability and the original economic insights of simpler models. However, it is
difficult to extend this technique as a general (albeit desirable) principle and the fact remains that
many popular Euler expressions fall well short of properly characterizing the statistical properties
of the data.
Here we show that a more practical solution consists of projecting the Euler conditions onto the space of likely omitted dynamics/variables or, alternatively, projecting the instruments themselves onto this same space. We will show that one of the advantages of our estimation method over GMM is due to this feature. Specifically, let us return to the example we presented
in the introduction, where a researcher is interested in estimating the expression
y = Y β + u (12)
where y is the dependent variable, Y are endogenous variables, and z are candidate instruments.
Notice that Y could contain other exogenous or predetermined variables in which case, they would
be included directly into z so that expression (12) is quite general.
As an example, suppose we are interested in estimating a Phillips curve with forward-looking
terms only (see, e.g. Galí, Gertler, and López-Salido, 2001), say
$$\pi_t = \beta E_t\pi_{t+1} + u_t \quad (13)$$
where for reasons that will become clear momentarily, we have omitted the usual term associated
with demand (e.g. marginal costs, the output gap, etc.). Instead, suppose the DGP is characterized
by
y = Y β + xγ + ε (14)
where x are exogenous and/or predetermined variables (such as other lags of y). Here the key is to realize that if $E(z'x) \neq 0$ and $\gamma \neq 0$, the z are invalid instruments in expression (12), although they would be perfectly valid for (14).
In terms of the simple Phillips curve example, suppose the DGP is
$$\pi_t = \gamma_f E_t\pi_{t+1} + \gamma_b \pi_{t-1} + \varepsilon_t \quad (15)$$

instead of that specified in expression (13). Let $M \equiv I - x(x'x)^{-1}x'$ and notice that $E_L(z'Mx) = 0$, where $E_L$ is the linear projection operator. Hence, if one is interested in estimating expression (12), one could pursue two alternatives. One is to run the first-stage regression

$$z = x\phi + \tilde{z}$$

and use $\tilde{z}$ (the residuals, not the predicted values of the typical two-stage least-squares procedure) as regular instruments in (12). Equivalently, one can project (12) onto the space of $x$ and estimate $\beta$ from

$$\tilde{y} = \tilde{Y}\beta + \varepsilon$$

using $z$ as instruments, where $\tilde{y}$ and $\tilde{Y}$ are the residuals of the projections of $y$ and $Y$ onto $x$. In a nonlinear context, of course, the latter projection argument breaks down and the first option is clearly more appropriate, even if it is approximate.
We now return to the link between this discussion, GMM estimation, and estimation by PMD. Consider the running example given by expressions (13) and (15), where in particular a researcher
estimates expression (13) using $\pi_{t-h}$ as an instrument. It is easy to see that

$$\hat{\beta}_{GMM} = \frac{\sum_t \pi_{t-h}\pi_t}{\sum_t \pi_{t-h}\pi_{t+1}} = \gamma_f + \gamma_b \frac{\sum_t \pi_{t-h}\pi_{t-1}}{\sum_t \pi_{t-h}\pi_{t+1}} + \frac{\sum_t \pi_{t-h}\varepsilon_t}{\sum_t \pi_{t-h}\pi_{t+1}}$$

and under typical assumptions

$$\hat{\beta}_{GMM} \stackrel{p}{\to} \gamma_f + \gamma_b \frac{\phi_{h-1}}{\phi_{h+1}}$$

where $\phi_h = \mathrm{cov}(\pi_t, \pi_{t-h})$. Hence $\hat{\beta}_{GMM}$ is an inconsistent estimate of $\gamma_f$ as long as $\gamma_b \neq 0$, and the bias does not disappear by choosing later lags of $\pi_t$, since $\phi_{h-1}/\phi_{h+1}$ becomes indeterminate as $h$ grows.
Instead, PMD suggests estimating $\beta$ by choosing $\hat{\beta}_{PMD}$ such that

$$\hat{b}_h = \beta_{PMD}\, \hat{b}_{h+1}$$

where $M_{t-h} = I - X_{t-h}(X_{t-h}'X_{t-h})^{-1}X_{t-h}'$ with $X_{t-h} = (1, \pi_{t-h-1}, \ldots, \pi_{t-h-k})'$,

$$\hat{b}_h = \frac{\sum_t \pi_{t-h} M_{t-h} \pi_t}{\sum_t \pi_{t-h} M_{t-h} \pi_{t-h}}$$

and hence

$$\hat{\beta}_{PMD} = \frac{\sum_t \pi_{t-h} M_{t-h} \pi_t}{\sum_t \pi_{t-h} M_{t-h} \pi_{t+1}};$$

specifically,

$$\hat{\beta}_{PMD} = \gamma_f + \gamma_b \frac{\sum_t \pi_{t-h} M_{t-h} \pi_{t-1}}{\sum_t \pi_{t-h} M_{t-h} \pi_{t+1}} + \frac{\sum_t \pi_{t-h} M_{t-h} \varepsilon_t}{\sum_t \pi_{t-h} M_{t-h} \pi_{t+1}}$$

so that clearly

$$\hat{\beta}_{PMD} \stackrel{p}{\to} \gamma_f + \gamma_b \frac{\delta_{h-1}}{\delta_{h+1}}$$
where $\delta_h$ is the conditional covariance between $\pi_t$ and $\pi_{t-h}$, and therefore $\delta_h \to 0$ as $h \to 0$ (with positively serially correlated data).

In other words, the local projection step automatically projects the instrument $\pi_{t-h}$ onto a sub-space of omitted dynamics (through $X_{t-h}$), thus progressively sterilizing the sources of feedback that make $\pi_{t-h}$ an illegitimate instrument. The smaller $h$ is, the more the instrument is sterilized and the smaller the bias (all the way down to zero in the limit). As a consequence, a natural and complementary way to investigate model misspecification is to plot the estimates $\hat{\beta}_{PMD}$ as a function of $h$. If the model is correctly specified, the $\hat{\beta}_{PMD}(h)$ will be approximately the same for any $h$. Otherwise, fluctuations in $\hat{\beta}_{PMD}(h)$ are symptomatic of dynamic misspecification, with the $\hat{\beta}_{PMD}(h)$ estimated at the smallest values of $h$ being the less precise but more nearly consistent estimates of $\gamma_f$.
We conclude this section by remarking that estimates of the optimal weighting matrix in
GMM (a key element in constructing efficient standard errors) are notoriously problematic: non-
parametric spectral density estimators at frequency zero tend to have poor small sample properties
(see e.g. Christiano and Den Haan, 1996). In contrast, the assumption that the data has a Wold
representation allows us to provide a simple, analytic expression for the estimate of this matrix
with good small sample properties (given the general assumptions in the propositions we present
below).
Finally, we comment on the relationship between PMD and MLE by observing that the Wold representation, under the common assumption of Gaussianity, is a complete representation of all the data's second-order properties and hence, as the truncation horizon $H \to \infty$, PMD approaches MLE. A similar result exists for GMM: if one were to use infinitely many moment conditions, one would attain MLE's efficiency bound. However, most covariance-stationary processes exhibit serial correlation that decays exponentially toward zero (think of an AR(1) with parameter 0.5, whose autocorrelations at lags 1, 2, 3, 4 are 0.5, 0.25, 0.125, 0.0625, ...), and in practice one can achieve parameter estimation efficiency similar to MLE with relatively small values of $H$, as
the small Monte Carlo experiments of section 5 demonstrate.
4 Statistical Properties of PMD
This section derives the large-sample properties of our estimator in a general setting. For this reason, the notation differs slightly from that of section 2. We begin by showing that the first-stage local projection estimates are consistent and asymptotically normal under general conditions, and then show that the second-stage estimators are also consistent and asymptotically normal.
4.1 Asymptotic Properties of Local Projections: First Stage
Suppose the $n \times 1$ vector $y_t$ is covariance-stationary with Wold representation given by

$$y_t = \mu + \sum_{j=0}^{\infty} B_j u_{t-j} \quad (16)$$

where the $u_t$ are i.i.d., mean zero, with finite covariance matrix $\Sigma_u$, and the $B_j$ satisfy $\sum_{j=0}^{\infty} ||B_j|| < \infty$, where $||B_j||^2 = \mathrm{tr}(B_j'B_j)$ and $B_0 = I_n$. Further, assume $\det\{B(z)\} \neq 0$ for $|z| \leq 1$, where $B(z) = \sum_{j=0}^{\infty} B_j z^j$, so that the process can be written in its infinite VAR representation

$$y_t = \sum_{j=1}^{\infty} A_j y_{t-j} + u_t$$

with $\sum_{j=1}^{\infty} ||A_j|| < \infty$. The $h$-step-ahead projection of $y_{t+h}$ on $k$ lags and its truncation error can then be written as

$$y_{t+h} = A_1^h y_t + \ldots + A_k^h y_{t-k+1} + v_{k,t+h}$$

$$v_{k,t+h} = \sum_{j=k+1}^{\infty} A_j^h y_{t-j} + u_{t+h} + \sum_{j=1}^{h-1} B_j u_{t+h-j}$$
Proposition 1 (Consistency and Asymptotic Normality). Let $\{y_t\}$ satisfy (16) and assume that:

(i) $E|u_{it} u_{jt} u_{kt} u_{lt}| < \infty$ for all $i, j, k, l$;

(ii) $k$ satisfies $k^3/T \to 0$ as $T, k \to \infty$;

(iii) $k$ satisfies $\sqrt{T-k-H} \sum_{j=k+1}^{\infty} ||A_j|| \to 0$ as $T, k \to \infty$.

Then

$$\sqrt{T-k-H}\;\mathrm{vec}(\hat{B}_T - B_0) \stackrel{d}{\to} N(0, \Omega_b), \qquad \Omega_b = \left[(X'MX)^{-1} \otimes \Sigma_v\right], \qquad \hat{\Sigma}_v = \frac{\hat{V}'\hat{V}}{T-k-H}$$

where, recall, $\hat{B}_T = (X'MX)^{-1}(X'MY)$; $Y$ is the $T \times nH$ matrix of observations for $(y_{t+1}, \ldots, y_{t+H})'$; $X$ is the $T \times n$ matrix of observations for $y_t$; $M = I - Z(Z'Z)^{-1}Z'$, where $Z$ is the $T \times n(k+1)$ matrix of observations for $(1, y_{t-1}, \ldots, y_{t-k+1})'$; and $\hat{V} = MY - MX\hat{B}_T$. The proof is provided in the appendix. Notice that we have modified the dimensions of $\hat{B}_T$ with respect to section 2 to make the derivations here and in the appendix more straightforward, but without loss of generality.
4.2 Statistical Properties of Projection Minimum Distance: Second Step
Given $\hat{B}_T$ (and hence $\hat{b}_T = \mathrm{vec}(\hat{B}_T)$), consider estimating $\gamma$ as described in section 2 by minimizing

$$\min_\gamma \hat{Q}_T(\hat{b}_T; \gamma) = f(\hat{b}_T; \gamma)' \, \hat{W} f(\hat{b}_T; \gamma).$$

Let $Q_0(\gamma)$ denote the objective function at $b_0$. Then the following lemma shows that the solution of this problem, $\hat{\gamma}_T$, is consistent for $\gamma_0$.

Lemma 3 (Consistency). Given that $\hat{b}_T \stackrel{p}{\to} b_0$ from proposition 1, assume that:

(i) $\hat{W} \stackrel{p}{\to} W$, a positive semidefinite matrix;
(ii) $Q_0(\gamma)$ is uniquely minimized at $(b_0, \gamma_0) = \theta_0 \in \Theta$;

(iii) the parameter space $\Theta$ is compact;

(iv) $f(b_0, \gamma)$ is continuous in a neighborhood of $\gamma_0 \in \Theta$;

(v) instrument relevance condition: $\mathrm{rank}[WF_\gamma] = \dim(\gamma)$, where $F_\gamma = \partial f(b_0, \gamma_0)/\partial\gamma$;

(vi) identification condition: $\dim(f(\hat{b}_T; \gamma)) \geq \dim(\gamma)$.

Then

$$\hat{\gamma}_T \stackrel{p}{\to} \gamma_0.$$

The proof is provided in the appendix, where it is worth remarking that the proof takes H to be finite and given. If instead $H \to \infty$ with the sample size, then $\hat{b}_T$ becomes infinite-dimensional and one would have to appeal to higher-order conditions (such as empirical process theory and stochastic equicontinuity of $f(\hat{b}_T; \gamma)$ with respect to $\hat{b}_T$), which would make the proof more general but far less transparent. By taking H to be finite, it is relatively straightforward to show that $\hat{Q}_T(\gamma) \stackrel{p}{\to} Q_0$ uniformly (see Andrews, 1994, 1995).

Lemma 4 (Normality). Assume:
(i) $\hat{W} \stackrel{p}{\to} W$, where $W = (F_b'\Omega_b F_b)^{-1}$, a positive definite matrix, and where $F_b$ is defined as in assumption (v) below;

(ii) $\hat{b}_T \stackrel{p}{\to} b_0$ and $\hat{\gamma}_T \stackrel{p}{\to} \gamma_0$ from proposition 1 and lemma 3;

(iii) $b_0$ and $\gamma_0$ are in the interior of $\Theta$;

(iv) $f(\hat{b}_T; \gamma)$ is continuously differentiable in a neighborhood $N$ of $\theta_0$;

(v) there are $F_b$ and $F_\gamma$, continuous at $b_0$ and $\gamma_0$ respectively, such that

$$\sup_{b,\gamma \in N} ||\nabla_b f(b,\gamma) - F_b|| \stackrel{p}{\to} 0, \qquad \sup_{b,\gamma \in N} ||\nabla_\gamma f(b,\gamma) - F_\gamma|| \stackrel{p}{\to} 0;$$
(vi) for $F_\gamma = F_\gamma(\gamma_0)$, $F_\gamma' W F_\gamma$ is invertible.

Then

$$\sqrt{T-H-k}\,(\hat{\gamma}_T - \gamma_0) \stackrel{d}{\to} N(0, \Omega_\gamma), \qquad \Omega_\gamma = \left(F_\gamma' W F_\gamma\right)^{-1}.$$

The proof is provided in the appendix using the same principles required to derive the proof
of asymptotic normality typical of GMM and minimum distance problems (see, e.g., Newey and McFadden, 1994; Wooldridge, 1994). We have taken the simpler route here of brushing aside weak-instrument conditions and problems, such as those discussed in Bekker (1994), Staiger and Stock (1997), Stock, Wright and Yogo (2002), and many others, with assumption (v) in Lemma 3. We felt it was more useful to provide the foundational results first; since the weak-instrument problems that can arise with projection minimum distance are similar in nature to those already investigated in the literature in a GMM context, we refer the reader to that literature directly. In practice, we recommend choosing the optimal impulse response horizon using the information criterion in Hall et al. (2007), whose formula appears in expression (11). In finite samples, all asymptotic expressions can be replaced by their usual small-sample estimates. Lastly, we note that F_b is a function of γ, and hence the optimal weighting matrix Ŵ = (F_b Ω_b F_b')^{-1} cannot be computed directly. However, a consistent estimate of γ can first be obtained with the equal-weights matrix Ŵ = I (lemma 3 only requires W to be positive semidefinite for consistency), and this preliminary estimate can then be used to construct the optimal-weights estimator and hence compute all the relevant statistics. In principle, one can iterate on this procedure to refine the estimates of γ, although asymptotically one iteration is sufficient. Finally, lemma 4 and standard results are all that is needed to show that a test of overidentifying restrictions is easily obtained by noting that the minimum distance function Q̂_T evaluated at the optimum (b̂_T, γ̂_T) has a chi-square distribution with dim(f(b̂_T; γ)) − dim(γ) degrees of freedom.
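To fix ideas, the two-step procedure just described can be sketched in a few lines when the matching conditions are linear in γ (as in the example of section 2). All names and dimensions below are hypothetical; this is an illustration of the mechanics, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)
dim_f, dim_g = 6, 2                        # matching conditions vs parameters
gamma0 = np.array([0.6, 0.3])              # "true" parameters (made up)
G = rng.normal(size=(dim_f, dim_g))        # stand-in for the Jacobian F_gamma
Omega = 0.0005 * np.eye(dim_f)             # stand-in covariance of the conditions
g = G @ gamma0 + rng.multivariate_normal(np.zeros(dim_f), Omega)

def md_estimate(g, G, W):
    # closed-form minimizer of (g - G @ gamma)' W (g - G @ gamma)
    return np.linalg.solve(G.T @ W @ G, G.T @ W @ g)

gamma_eq = md_estimate(g, G, np.eye(dim_f))     # step 1: equal weights
W_opt = np.linalg.inv(Omega)                    # step 2: optimal weights (in
gamma_opt = md_estimate(g, G, W_opt)            #  practice rebuilt from step 1)

resid = g - G @ gamma_opt
J = resid @ W_opt @ resid                       # chi-square, dim_f - dim_g dof
```

The J statistic in the last line is the overidentification test described above: under correct specification it is chi-square with dim(f) − dim(γ) degrees of freedom.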
5 Small-Sample Properties: Monte Carlo Experiments
This section contains Monte Carlo experiments designed to show that PMD is computationally convenient without incurring significant efficiency losses relative to MLE in models whose likelihood requires numerical algorithms for its maximization; that PMD provides more efficient but similarly unbiased estimates relative to GMM when the specification of the model is correct; and that PMD can be more robust than GMM to certain types of misspecification due to illegitimate instrument problems. We showcase these features with two experiments: the first compares estimates of a simple ARMA(1,1) model obtained by PMD and by MLE; the second generates data from an extended version of the New Keynesian model introduced in section 2 and compares the small-sample properties of New Keynesian hybrid Phillips curve estimates obtained by PMD and by GMM.
5.1 PMD vs. MLE
The data for this set of experiments are generated from the univariate ARMA(1,1) model

y_t = ρ y_{t−1} + ε_t + θ ε_{t−1},  ε_t ∼ N(0, 0.5)

for the following four pairs of parameter values: (1) ρ = 0.25, θ = 0.50; (2) ρ = 0.50, θ = 0.25; (3) ρ = 0, θ = 0.5; and (4) ρ = 0.5, θ = 0. The last two cases are a pure MA(1) and a pure AR(1) model, respectively, but they are specified as ARMA(1,1) models in the estimation.
Each of the 1,000 simulation runs has the following features. We use 500 burn-in replications to avoid initialization issues; sample sizes are T = 50, 100, and 400. The lag length of the local projection step is determined automatically by AICC, a correction to AIC for autoregressive models introduced by Hurvich and Tsai (1989) with better small-sample properties than alternative information criteria. For the minimum distance step, we experiment with fixed values H = 2, 5, and 10. For H = 2 the model is just-identified; otherwise there are overidentifying restrictions.
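A minimal sketch of one such simulation run, assuming a simple least-squares AR fit for the lag-selection step (illustrative only, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_arma11(rho, theta, T, burn=500, sigma2=0.5):
    # y_t = rho*y_{t-1} + e_t + theta*e_{t-1}, discarding `burn` initial draws
    e = rng.normal(0.0, np.sqrt(sigma2), size=T + burn + 1)
    y = np.zeros(T + burn + 1)
    for t in range(1, T + burn + 1):
        y[t] = rho * y[t - 1] + e[t] + theta * e[t - 1]
    return y[-T:]

def aicc_lag(y, max_lag=8):
    # fit AR(p) by least squares for each p and return the AICc-minimizing lag
    best_p, best_crit = 1, np.inf
    for p in range(1, max_lag + 1):
        X = np.column_stack([y[p - j - 1: len(y) - j - 1] for j in range(p)])
        X = np.column_stack([np.ones(len(y) - p), X])
        Y = y[p:]
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        resid = Y - X @ beta
        n, q = len(Y), p + 1
        crit = n * np.log(resid @ resid / n) + 2 * q * n / (n - q - 1)  # AICc
        if crit < best_crit:
            best_p, best_crit = p, crit
    return best_p

y = simulate_arma11(rho=0.25, theta=0.50, T=400)
p_star = aicc_lag(y)
```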
Given our choices of ρ and θ, in all 4 cases the impulse response coefficients are already very close to 0 at horizon H = 5. Hence, by including the case where H = 10 we hope to capture
possible distortions to the parameter estimates of ρ and θ generated by first stage estimates that
have virtually zero information content (akin to having weak instruments). It is worth remarking
that while MLE requires numerical optimization routines, PMD for this example requires two very
simple least-squares steps. Tables 1.1-1.4 summarize the experiments by reporting Monte Carlo
averages and standard errors of the parameter estimates calculated with the analytic formulas of
the large-sample approximations. In addition, empirical Monte Carlo standard errors are provided
as a check that the formulas provide appropriate values.
The tables show that PMD estimates converge to the true parameter values as the sample size grows at roughly the same speed as, or faster than, MLE estimates. This is true even for the small sample T = 50, although when H = 10 there is, not surprisingly, a clear deterioration of the PMD estimates. What is surprising, though, is that the effect of having a large number of conditions with little information value (H = 10 rather than H = 2 or 5) does not appear to distort the estimates (or the standard errors) with sample sizes as low as T = 100 observations. For sample sizes T = 100 and 400, PMD and MLE standard errors are virtually the same (when H = 5, 10) and comparable to the empirical Monte Carlo values. Finally, we remark that in tables 1.3 and 1.4 (the pure MA(1) and AR(1) DGPs) we had difficulty getting the MLE estimator to converge in all runs. Instead of trying to redo (or disregard) specific runs, we preferred to leave those results blank as a way to highlight that, although MLE runs into numerical difficulties, PMD is numerically stable and robust in all cases. Thus, a fair summary of these experiments is that PMD has very good small-sample properties, converging quickly to the theoretical values with roughly the same efficiency as MLE, even though PMD uses simple least-squares algebra while MLE requires numerical routines to maximize the likelihood.
5.2 PMD vs. GMM
This set of experiments borrows several elements from the simulation study in Lindé (2005), whose objective was to compare the small-sample properties of GMM and FIML estimation of the New Keynesian hybrid Phillips curve (such as expression (3)). Here we simulate data from a slightly modified version of the New Keynesian model discussed in section 2, equations (3)-(5), and compare GMM to PMD instead. Specifically, data will be generated from the model

π_t = γ_f E_t π_{t+1} + γ_{1b} π_{t−1} + γ_{2b} π_{t−2} + γ_g g_t + ε_{π,t}
g_t = β_f E_t g_{t+1} + β_{1b} g_{t−1} + β_{2b} g_{t−2} − β_r (R_t − E_t π_{t+1}) + ε_{g,t}
R_t = (1 − ρ)(ω_π π_t + ω_g g_t) + ρ R_{t−1} + ε_{R,t}

with shock processes

ε_{π,t} = u_{π,t}
ε_{g,t} = ρ_g ε_{g,t−1} + u_{g,t}
ε_{R,t} = ρ_R ε_{R,t−1} + u_{R,t}
u_t = (u_{π,t}, u_{g,t}, u_{R,t})' ∼ N(0, diag(0.5², 0.288², 0.252²))

for different combinations of parameters to be made explicit shortly. First, however, notice that we modified the Phillips and IS curves slightly to include one more lag than is conventional. We provide no theoretical justification for this but use this device to generate small distortions to the canonical specification and to check the robustness of PMD and GMM to dynamic misspecification. Hence, some of the simulations are conducted with γ_f + γ_{1b} = 1 and γ_{2b} = 0 (and similarly for the IS curve parameters), which is the standard specification. In other experiments, we simply set γ_f + γ_{1b} + γ_{2b} = 1 and γ_{1b} = γ_{2b} (and similarly for the IS curve parameters) to induce additional serial correlation.
Most of the parameter choices are borrowed from Lindé (2005), to which we refer the reader for a more careful justification. We investigate three primary combinations of parameters:

1. γ_f = β_f = 0.7; γ_{1b} = β_{1b} = 0.3 or γ_{1b} = γ_{2b} = β_{1b} = β_{2b} = 0.15; γ_g = 0.13; and β_r = 0.09

2. γ_f = β_f = 0.5; γ_{1b} = β_{1b} = 0.5 or γ_{1b} = γ_{2b} = β_{1b} = β_{2b} = 0.25; γ_g = 0.25; and β_r = 0.30

3. γ_f = β_f = 0.3; γ_{1b} = β_{1b} = 0.7 or γ_{1b} = γ_{2b} = β_{1b} = β_{2b} = 0.35; γ_g = 0.40; and β_r = 1
The Taylor rule parameters are the same in all cases, with ρ = 0.5, ω_π = 1.5, and ω_g = 0.5, and the shock processes are allowed to take the two pairs of values ρ_g = 0.5 and ρ_R = 0.8, or ρ_g = ρ_R = 0. The latter case is included as a benchmark since in that case a standard New Keynesian hybrid Phillips curve specification estimated by GMM, using as instruments lagged values of the endogenous variables (including lags of R_t), is correct and should provide estimates of the parameters close to the theoretical values. Like Lindé (2005), we originally experimented with allowing R_t itself to be part of the instrument set. However, we found the distortions to the GMM estimates to be so considerable relative to the PMD estimates that we decided to include, for completeness, estimates that use only lagged values of R_t.
1,000 Monte Carlo runs are generated with the different combinations of parameters described above, for a total of 36 different cases summarized in tables 2.1.1-2.3.2. Each run is initialized with 500 burn-in replications, after which a sample of 200 observations (as in Lindé, 2005) is generated. Because we argued in section 3 that dynamic misspecification is best detected through variation in the parameter estimates as a function of the chosen impulse response horizon H, we report estimates based on 2 impulse response horizons and based on impulse response horizons optimally selected with Hall et al.'s (2007) information criterion. At the same time, we compute GMM estimates based on the same lags for comparison purposes. The lag length of the first-stage local projections is automatically selected by AICC, as in our previous experiments.
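The first-stage local projections can be sketched as follows for a univariate series (illustrative; the paper's implementation is multivariate and uses the estimated impulse responses as the b̂_T input to the minimum distance step):

```python
import numpy as np

rng = np.random.default_rng(2)
e = rng.normal(size=300)
y = np.zeros(300)
for t in range(1, 300):                    # AR(1) data for illustration
    y[t] = 0.5 * y[t - 1] + e[t]

def local_projection_irf(y, H, k):
    # for each h, regress y_{t+h} on [1, y_t, ..., y_{t-k+1}]; the coefficient
    # on y_t estimates the impulse response at horizon h (Jordà, 2005)
    T = len(y)
    irf = [1.0]                            # horizon-0 response normalized to one
    for h in range(1, H + 1):
        Y = y[k - 1 + h: T]
        X = np.column_stack([y[k - 1 - j: T - h - j] for j in range(k)])
        X = np.column_stack([np.ones(len(Y)), X])
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        irf.append(beta[1])
    return np.array(irf)

irf = local_projection_irf(y, H=5, k=2)    # roughly 0.5**h for this AR(1)
```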
It would be tedious to comment on each of the numerous cases investigated, but some general lessons are apparent. First, when the shocks are i.i.d. (so that there are no distortions to the internal dynamics of the model) and we examine the case we label "Benchmark" (with the traditional dynamic specification), both PMD and GMM provide good estimates, although estimates of the output gap parameter of the Phillips curve tend to be somewhat downward
biased with GMM, but to a much lesser extent with PMD. In virtually all cases, PMD estimates are more efficient than their GMM counterparts, as we had anticipated. Generally speaking, using R_t as an instrument turns out to be a very bad idea for GMM estimation: as tables 2.1.2, 2.2.2, and 2.3.2 make clear, R_t is an invalid instrument. This is less of a problem for PMD because all instruments are essentially orthogonalized with respect to past information, which largely resolves the issue. When R_t is excluded from the instrument list, GMM performs much better, and we concentrate on these tables next (tables 2.1.1, 2.2.1, and 2.3.1).
Whether the dynamic structure is modified by allowing serial correlation in the shocks, richer dynamics in the Phillips curve, or richer dynamics in the IS curve, both PMD and GMM have more difficulty obtaining accurate estimates of the parameters, especially the output gap parameter. For example, in table 2.1.1 the additional serial correlation in the structural inflation Euler equation is enough to cause estimates of the output parameter to flip sign, although more generally we simply observed estimates that were downward biased. Distortions to the degree of forward/backward-looking behavior of the Phillips curve were, on the other hand, much more muted, although the distortions obtained with GMM tend to be considerably larger than with PMD. In these cases, the parameter estimates changed quite a bit with the number of instruments included, an indication of specification problems along the lines anticipated in our discussion of section 3.
Overall, while PMD was not a universal panacea for every foreseeable type of misspecification, we obtained estimates with smaller bias than GMM in the majority of cases. When the model was correctly specified, there was little difference between the methods, but even here PMD was less biased and provided more efficient estimates. The introduction of relatively small distortions in the dynamic behavior of the model was enough to generate considerable distortions in the estimation of the output gap parameter, which plays a very prominent role in this literature. In almost every case considered, the distortion caused the parameter to be downward biased. PMD
mitigates this bias somewhat with respect to GMM but not to the extent that would have been
desirable.
6 Empirical Application: Fuhrer and Olivei (2005) Revisited
Estimating the Phillips and IS curves in expressions (3) and (4) by limited-information methods is
difficult due to the poor small-sample properties of popular estimators. Fuhrer and Olivei (2005)
discuss the weak instrument problem that characterizes GMM in this type of application and
then propose a GMM variant where the dynamic constraints of the economic model are imposed
on the instruments to improve small sample performance. They dub this procedure “optimal
instruments” GMM (OI−GMM) and explore its properties relative to conventional GMM and
MLE estimators with Monte Carlo experiments.
We find it useful to apply PMD to the same examples Fuhrer and Olivei (2005) analyze, to provide the reader a context of comparison for our method. The basic specification is (using the
same notation as in Fuhrer and Olivei, 2005):
z_t = (1 − μ) z_{t−1} + μ E_t z_{t+1} + γ E_t x_t + ε_t    (18)
In the output Euler equation, z_t is a measure of the output gap, x_t is a measure of the real interest rate, and hence γ < 0. In the inflation Euler version of (18), z_t is a measure of inflation, x_t is a measure of the output gap, and γ > 0, signifying that a positive output gap exerts "demand pressure" on inflation.
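As a point of comparison, the conventional limited-information approach replaces the expectations in (18) with realized leads and instruments them with lagged variables. A minimal 2SLS sketch of that mechanics follows; the data-generating process and instrument list are hypothetical, and this is not the paper's PMD estimator:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 400
x = np.zeros(T)
z = np.zeros(T)
for t in range(1, T):                      # hypothetical data-generating process
    x[t] = 0.8 * x[t - 1] + rng.normal(scale=0.5)
    z[t] = 0.9 * z[t - 1] + 0.1 * x[t] + rng.normal(scale=0.2)

# regressors: constant, z_{t-1}, z_{t+1} (realized lead), x_t
t_idx = np.arange(4, T - 1)
X = np.column_stack([np.ones(len(t_idx)), z[t_idx - 1], z[t_idx + 1], x[t_idx]])
# instruments: constant plus lags 1..3 of z and of x
Z = np.column_stack([np.ones(len(t_idx))]
                    + [z[t_idx - j] for j in (1, 2, 3)]
                    + [x[t_idx - j] for j in (1, 2, 3)])
y = z[t_idx]

# 2SLS: project the regressors on the instruments, then least squares
Xhat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
beta = np.linalg.solve(Xhat.T @ X, Xhat.T @ y)   # (const, 1-mu, mu, gamma)
```

Weak correlation between the instruments and the lead z_{t+1} is precisely the weak-instrument problem Fuhrer and Olivei (2005) document for this class of estimators.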
Fuhrer and Olivei (2005) experiment with a quarterly sample from 1966:Q1 to 2001:Q4 and use the following measures for z_t and x_t. The output gap is measured either by the log deviation of real GDP from its Hodrick-Prescott (HP) trend or by its deviation from a segmented time trend (ST) with breaks in 1974 and 1995. Real interest rates are measured by the difference between the federal funds rate and next period's inflation. Inflation is measured by the log change in the chain-weighted GDP price index. In addition, Fuhrer and Olivei (2005) experiment with real unit labor costs (RULC) instead of the output gap for the inflation Euler equation. Further details can be found in their paper.
Table 3.1 and figure 1 summarize the empirical estimates of the output Euler equation and correspond to the results in table 4 of Fuhrer and Olivei (2005), whereas table 3.2 and figure 2 summarize the estimates of the inflation Euler equation and correspond to the results in their table 5. For each Euler equation, we report the original GMM, MLE, and OI-GMM estimates and, below these, the PMD results based on choosing h with Hall et al.'s (2007) information criterion. The top panels of figures 1 and 2 display the estimates of μ and γ in (18) as a function of h and the associated two-standard-error bands. The bottom left panel displays the value of Hall et al.'s (2007) information criterion and the bottom right panel, the p-value of the overidentifying restrictions misspecification test.
Since the true model is unknowable, there is no definitive metric by which one method can be judged to offer closer estimates to the true parameter values. Rather, we wish to investigate in which ways PMD coincides with or departs from results that have been well studied in the literature. We begin by reviewing the estimates for the output Euler equation reported in table 3.1 and figure 1. PMD estimates of μ are close to GMM estimates, with similar standard errors, and not very different from MLE or OI-GMM. On the other hand, PMD estimates of γ are slightly larger in magnitude, of the correct sign, and statistically significant. This would seem like good news; however, as figure 1 shows, while the estimates of μ appear to be somewhat stable to the choice of h, the estimates of γ are positive for any h < 7. This suggests that estimates of γ should be taken with caution, as the model is likely dynamically misspecified (although the misspecification test does not suggest anything evident).
Estimates of the inflation Euler equation follow a similar pattern. For all three specifications, μ and γ are estimated to be similar to the GMM estimates, but in all three specifications the misspecification test rejects the model very clearly. Figure 2 shows that while estimates of μ are relatively stable, estimates of γ for the HP and ST specifications are negative for virtually any h. The RULC specification suggests γ is mostly positive (with γ negative only for h = 3 and 4). Overall, the results suggest caution, since every indication (from the overidentifying restrictions tests to the plots of the parameter estimates as a function of h) is that the model is dynamically misspecified.
With the exception of the inflation Euler model estimated with RULC, we find that the data reject most of the specifications commonly estimated (either outright, as indicated by the overidentifying restrictions test, or because of the variation of the parameter estimates as a function of h). The ability to check model specification by these two complementary methods is useful (especially in instances when the data do not reject the model but variation in parameter estimates for low values of h is substantial). With some notable exceptions, PMD estimates are often close to estimates obtained by other methods but with smaller standard errors, so that at a minimum we are able to ascertain that our results are not driven by extreme differences.
7 Conclusion
This paper introduces a disarmingly simple and novel method of estimation for macroeconomic data. Several features make it appealing: (1) for many models, including some whose likelihood would require numerical optimization routines, PMD only requires simple least-squares algebra; (2) for many models, PMD approximates the maximum likelihood estimator in relatively small samples; (3) moreover, PMD is efficient in finite samples because it accounts for serial correlation in a convenient parametric way; (4) as a consequence, PMD is generally more efficient than GMM; (5) PMD provides an unsupervised method of conditioning for unknown omitted dynamics that in many cases mitigates invalid instrument problems; and (6) PMD provides many natural statistics with which to evaluate estimates of a model, including an overall misspecification test and a way to assess which parameter estimates are most sensitive to misspecification.
The paper provides basic but generally applicable asymptotic results and ample Monte Carlo
evidence in support of our claims. In addition, the empirical application provides a natural
example of how PMD may be applied in practice. However, there are many research questions
that space considerations prevented us from exploring. Throughout the paper, we have mentioned
some of them, such as the need for a more detailed investigation of the power properties of the
misspecification test in light of the GMM literature; and generalizations of our basic assumptions
in the main theorems.
Other natural extensions include nonlinear generalizations of the local projection step to extend beyond the Wold assumption. Such generalizations are likely to be quite approachable because local projections lend themselves well to more complex specifications. Similarly, we have excluded processes that are not covariance-stationary, mainly because they require slightly different assumptions on their infinite-order representation and because the non-standard nature of the asymptotics is beyond the scope of this paper. In the end, we hope that the main contribution of the paper is to provide applied researchers with a new method of estimation that is simpler than many others available, while at the same time more robust and informative.
8 Appendix

8.1 Definitions and Notation

We find it useful to begin by defining and collecting the notation that we use in the proofs of the propositions and lemmas introduced above. Specifically:

(i) X_{t,k} (kn × 1) = (y_t', y_{t−1}', ..., y_{t−k+1}')'

(ii) Y_{t,H} (Hn × 1) = (y_{t+1}', ..., y_{t+H}')'

(iii) M_{t−1,k} (1 × 1) = 1 − X_{t−1,k}' ( Σ_{t=k}^{T−H} X_{t−1,k} X_{t−1,k}' )^{-1} X_{t−1,k}

(iv) Γ̂(j) (n × n) = (T − k − H)^{-1} Σ_{t=k}^{T−H} y_t y_{t−j}'

(v) Γ̂(j|1−k) (n × n) = (T − k − H)^{-1} Σ_{t=k}^{T−H} y_t M_{t−1,k} y_{t−j}'

(vi) Γ̂_k (kn × kn) = (T − k − H)^{-1} Σ_{t=k}^{T−H} X_{t,k} X_{t,k}'

(vii) Γ̂_{1−k,h} (kn × n) = (T − k − H)^{-1} Σ_{t=k}^{T−H} X_{t,k} y_{t+h}', h = 1, ..., H

(viii) Γ̂_{1−H|1−k} (Hn × n) = (T − k − H)^{-1} Σ_{t=h}^{T−H} Y_{t,H} M_{t−1,k} y_t'
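For concreteness, the sample moment matrices in (i), (vi), and (vii) can be computed as follows for a univariate series (n = 1); this is an illustrative sketch with made-up data, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(4)
T, n = 200, 1
y = rng.normal(size=(T, n))                # stand-in data, T x n
k, H = 4, 3
scale = 1.0 / (T - k - H)

def X_tk(t):
    # X_{t,k} = (y_t', y_{t-1}', ..., y_{t-k+1}')', a (k*n)-vector
    return np.concatenate([y[t - j] for j in range(k)])

# Gamma_k: (kn x kn) sample second-moment matrix of X_{t,k}
Gamma_k = scale * sum(np.outer(X_tk(t), X_tk(t)) for t in range(k, T - H))
# Gamma_{1-k,h}: (kn x n) cross-moments of X_{t,k} with y_{t+h}, h = 1..H
Gamma_1k_h = [scale * sum(np.outer(X_tk(t), y[t + h]) for t in range(k, T - H))
              for h in range(1, H + 1)]
```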
8.2 Proof of Proposition 1

The mean-square error linear predictor of y_{t+h} based on y_t, ..., y_{t−k+1} is Â(k,h) X_{t,k}, where Â(k,h) (n × kn) is given by the least-squares formula

Â(k,h) = (Â_1^h, ..., Â_k^h) = Γ̂_{1−k,h}' Γ̂_k^{-1}.   (19)

Notice that

Â(k,h) − A(k,h) = Γ̂_{1−k,h}' Γ̂_k^{-1} − A(k,h) Γ̂_k Γ̂_k^{-1} = { (T − k − h)^{-1} Σ_{t=k}^{T−h} v_{k,t+h} X_{t,k}' } Γ̂_k^{-1},

where

v_{k,t+h} = Σ_{j=k+1}^∞ A_j^h y_{t−j} + u_{t+h} + Σ_{j=1}^{h−1} B_j u_{t+h−j}.

Hence,

Â(k,h) − A(k,h) = { (T − k − h)^{-1} Σ_{t=k}^{T−h} ( Σ_{j=k+1}^∞ A_j^h y_{t−j} ) X_{t,k}' } Γ̂_k^{-1}
  + { (T − k − h)^{-1} Σ_{t=k}^{T−h} u_{t+h} X_{t,k}' } Γ̂_k^{-1}
  + { (T − k − h)^{-1} Σ_{t=k}^{T−h} ( Σ_{j=1}^{h−1} B_j u_{t+h−j} ) X_{t,k}' } Γ̂_k^{-1}.
Define the matrix norms ||C||² = tr(C'C) and ||C||_1² = sup_{l≠0} l'C'Cl / l'l, the latter being the largest eigenvalue of C'C. When C is symmetric, ||C||_1² is the square of the largest eigenvalue of C. Then

||AB||² ≤ ||A||_1² ||B||²  and  ||AB||² ≤ ||A||² ||B||_1².

Hence

||Â(k,h) − A(k,h)|| ≤ ||U_{1T}|| ||Γ̂_k^{-1}||_1 + ||U_{2T}|| ||Γ̂_k^{-1}||_1 + ||U_{3T}|| ||Γ̂_k^{-1}||_1,

where

U_{1T} = (T − k − h)^{-1} Σ_{t=k}^{T−h} ( Σ_{j=k+1}^∞ A_j^h y_{t−j} ) X_{t,k}'
U_{2T} = (T − k − h)^{-1} Σ_{t=k}^{T−h} u_{t+h} X_{t,k}'
U_{3T} = (T − k − h)^{-1} Σ_{t=k}^{T−h} ( Σ_{j=1}^{h−1} B_j u_{t+h−j} ) X_{t,k}'.

Lewis and Reinsel (1985) show that ||Γ̂_k^{-1}||_1 is bounded; therefore, the next objective is to show that ||U_{1T}|| →p 0, ||U_{2T}|| →p 0, and ||U_{3T}|| →p 0. We begin by showing ||U_{2T}|| →p 0, which is easiest to see since u_{t+h} and X_{t,k} are independent, so that their covariance is zero. Formally, and following similar derivations in Lewis and Reinsel (1985),

E(||U_{2T}||²) = (T − k − h)^{-2} Σ_{t=k}^{T−h} E(u_{t+h} u_{t+h}') E(X_{t,k}' X_{t,k})

by independence. Hence

E(||U_{2T}||²) = (T − k − h)^{-1} tr(Σ_u) k tr[Γ(0)].

Since k/(T − k − H) → 0 by assumption (ii), E(||U_{2T}||²) → 0, and hence ||U_{2T}|| →p 0.

Next, consider ||U_{3T}|| →p 0. The proof is very similar since u_{t+h−j}, j = 1, ..., h − 1, and X_{t,k} are independent. As long as ||B_j||² is finite for each j = 1, ..., h − 1 (which follows from Σ_{j=0}^∞ ||B_j|| < ∞), the same argument as for U_{2T} applies.
Finally, we show that ||U_{1T}|| →p 0. The objective here is to show that assumption (iii) implies that

k^{1/2} Σ_{j=k+1}^∞ ||A_j^h|| → 0 as k, T → ∞,

because we will need this condition to hold to complete the proof later. Recall that

A_j^h = B_{h−1} A_j + A_{j+1}^{h−1};  A_{j+1}^0 = 0;  B_0 = I_r;  h, j ≥ 1, h finite.

Hence

k^{1/2} Σ_{j=k+1}^∞ ||A_j^h|| = k^{1/2} Σ_{j=k+1}^∞ ||B_{h−1} A_j + B_{h−2} A_{j+1} + ... + B_1 A_{j+h−2} + A_{j+h−1}||

by recursive substitution. Thus

k^{1/2} Σ_{j=k+1}^∞ ||A_j^h|| ≤ k^{1/2} Σ_{j=k+1}^∞ { ||B_{h−1} A_j|| + ... + ||B_1 A_{j+h−2}|| + ||A_{j+h−1}|| }.

Define λ = max{||B_{h−1}||, ..., ||B_1||}; then, since Σ_{j=0}^∞ ||B_j|| < ∞, λ is finite, and the right-hand side is bounded by (1 + (h − 1)λ) k^{1/2} Σ_{j=k+1}^∞ ||A_j||, which converges to zero by assumption (iii). This establishes the required condition and completes the proof.
8.3 Proof of Proposition 2

Notice that

Â(k,h) − A(k,h) = { (T − k − h)^{-1} Σ_{t=k}^{T−h} v_{k,t+h} X_{t,k}' } Γ̂_k^{-1}
 = (T − k − h)^{-1} [ Σ_{t=k}^{T−h} { ( Σ_{j=k+1}^∞ A_j^h y_{t−j} ) + u_{t+h} + Σ_{j=1}^{h−1} B_j u_{t+h−j} } X_{t,k}' ] Γ̂_k^{-1}
 = (T − k − h)^{-1} { Σ_{t=k}^{T−h} ( Σ_{j=k+1}^∞ A_j^h y_{t−j} ) X_{t,k}' } { Γ_k^{-1} + (Γ̂_k^{-1} − Γ_k^{-1}) }
  + (T − k − h)^{-1} { Σ_{t=k}^{T−h} ( u_{t+h} + Σ_{j=1}^{h−1} B_j u_{t+h−j} ) X_{t,k}' } { Γ_k^{-1} + (Γ̂_k^{-1} − Γ_k^{-1}) }.

Hence, the strategy of the proof consists in showing that the first term in the sum above vanishes in probability, so that

(T − k − h)^{1/2} vec[ Â(k,h) − A(k,h) ] →p (T − k − h)^{1/2} vec[ (T − k − h)^{-1} { Σ_{t=k}^{T−h} ( u_{t+h} + Σ_{j=1}^{h−1} B_j u_{t+h−j} ) X_{t,k}' } Γ_k^{-1} ],

and then all we need to do is show that this last term is asymptotically normal. First we prove the convergence in probability result in this last expression. Define

U_{1T} = (T − k − h)^{-1} Σ_{t=k}^{T−h} ( Σ_{j=k+1}^∞ A_j^h y_{t−j} ) X_{t,k}'
U*_{2T} = (T − k − h)^{-1} Σ_{t=k}^{T−h} ( u_{t+h} + Σ_{j=1}^{h−1} B_j u_{t+h−j} ) X_{t,k}';

then

(T − k − h)^{1/2} vec[ Â(k,h) − A(k,h) ] = (T − k − h)^{1/2} { vec[U_{1T} Γ_k^{-1}] + vec[U_{1T}(Γ̂_k^{-1} − Γ_k^{-1})] + vec[U*_{2T} Γ_k^{-1}] + vec[U*_{2T}(Γ̂_k^{-1} − Γ_k^{-1})] },

hence

(T − k − h)^{1/2} vec[ Â(k,h) − A(k,h) ] − (T − k − h)^{1/2} vec[U*_{2T} Γ_k^{-1}]
 = (T − k − h)^{1/2} { vec[U_{1T} Γ_k^{-1}] + vec[U_{1T}(Γ̂_k^{-1} − Γ_k^{-1})] + vec[U*_{2T}(Γ̂_k^{-1} − Γ_k^{-1})] }
 = (Γ_k^{-1} ⊗ I_r) vec[(T − k − h)^{1/2} U_{1T}]
  + { (Γ̂_k^{-1} − Γ_k^{-1}) ⊗ I_r } vec[(T − k − h)^{1/2} U_{1T}]
  + { (Γ̂_k^{-1} − Γ_k^{-1}) ⊗ I_r } vec[(T − k − h)^{1/2} U*_{2T}].

Define, with a slight change in the order of the summands,

W_{1T} = { (Γ̂_k^{-1} − Γ_k^{-1}) ⊗ I_r } vec[(T − k − h)^{1/2} U_{1T}]
W_{2T} = { (Γ̂_k^{-1} − Γ_k^{-1}) ⊗ I_r } vec[(T − k − h)^{1/2} U*_{2T}]
W_{3T} = (Γ_k^{-1} ⊗ I_r) vec[(T − k − h)^{1/2} U_{1T}].

The proof proceeds by showing that W_{1T} →p 0, W_{2T} →p 0, and W_{3T} →p 0.

We begin by showing that W_{1T} →p 0. Lewis and Reinsel (1985) show that under assumption (ii), k^{1/2} ||Γ̂_k^{-1} − Γ_k^{-1}||_1 →p 0, and E(||k^{-1/2}(T − k − h)^{1/2} U_{1T}||) ≤ s (T − k − h)^{1/2} Σ_{j=k+1}^∞ ||A_j^h|| → 0 as k, T → ∞ from assumption (iii), using derivations similar to those in the proof of consistency, with s a generic constant. Hence W_{1T} →p 0.

Next, we show W_{2T} →p 0. Notice that

|W_{2T}| ≤ k^{1/2} ||Γ̂_k^{-1} − Γ_k^{-1}||_1 ||k^{-1/2}(T − k − h)^{1/2} U*_{2T}||.

As in the previous step, Lewis and Reinsel (1985) establish that k^{1/2} ||Γ̂_k^{-1} − Γ_k^{-1}||_1 →p 0 and, from the proof of consistency, we know the second term is bounded in probability, which is all we need to establish the result.

Lastly, we need to show W_{3T} →p 0; however, the proof of this result is identical to that in Lewis and Reinsel (1985) once one realizes that assumption (iii) implies

(T − k − h)^{1/2} Σ_{j=k+1}^∞ ||A_j^h|| → 0

and substitutes this result into their proof.

The asymptotic normality result then follows directly from Lewis and Reinsel (1985) by redefining

A_{Tm} = (T − k − h)^{1/2} vec[ { (T − k − h)^{-1} Σ_{t=k}^{T−h} ( u_{t+h} + Σ_{j=1}^{h−1} B_j u_{t+h−j} ) X_{t,k}(m)' } Γ_k^{-1} ]

for m = 1, 2, ..., with X_{t,k}(m) as defined in Lewis and Reinsel (1985), and using their proof.
8.4 Proof of Lemma 3

Since b̂_T →p b_0,

f(b̂_T; φ) →p f(b_0; φ)

by the continuous mapping theorem, since by assumption (iv) f(·) is continuous. Furthermore, and given assumption (i),

Q̂_T(φ) = f(b̂_T; φ)' Ŵ f(b̂_T; φ) →p f(b_0; φ)' W f(b_0; φ) ≡ Q_0(φ),

which is a quadratic expression that is minimized at φ_0. Assumption (vi) provides a necessary condition for identification of the parameters (i.e., that there be at least as many matching conditions as parameters) that must be satisfied to establish uniqueness. As a quadratic function, Q_0(φ) is obviously continuous. The last thing to show is that the convergence is uniform:

sup_{φ∈Θ} | Q̂_T(φ) − Q_0(φ) | →p 0.

For compact Θ and continuous Q_0(φ), Lemma 2.8 in Newey and McFadden (1994) provides that this condition holds if and only if Q̂_T(φ) →p Q_0(φ) for all φ in Θ and Q̂_T(φ) is stochastically equicontinuous (see footnote 1). The former has already been established, so it remains to show stochastic equicontinuity of Q̂_T(φ). Whether Q̂_T(φ) is stochastically equicontinuous depends on each application and, specifically, on the properties of and assumptions made on the specific nature of f(·). In the example that we use in this paper and presented in section 2, the function f(·) is rather trivial (linear in the parameters), so that proving uniform convergence is easy and we do not really require stochastic equicontinuity (since we have assumed that H is finite and not a function of T). In general, we directly assume here that stochastic equicontinuity holds and refer the reader to Andrews (1994, 1995) for examples and sets of specific conditions that apply even when b is infinite-dimensional and for more general forms of f(·).
8.5 Proof of Lemma 4

Under assumption (iii), b_0 and γ_0 are in the interior of their parameter spaces, and by assumption (ii), b̂_T →p b_0 and γ̂_T →p γ_0. Further, by assumption (iv), f(b̂_T; γ) is continuously differentiable in a neighborhood of b_0 and γ_0, and hence γ̂_T solves the first-order conditions of the minimum-distance problem

min_γ f(b̂_T; γ)' Ŵ f(b̂_T; γ),

which are

F_γ(b̂_T; γ)' Ŵ f(b̂_T; γ) = 0.

By assumption (iv), these first-order conditions can be expanded about γ_0 in a mean value expansion

f(b̂_T; γ̂_T) = f(b̂_T; γ_0) + F_γ(b̂_T; γ̄)(γ̂_T − γ_0),

where γ̄ ∈ [γ̂_T, γ_0]. Similarly, a mean value expansion of f(b̂_T; γ_0) around b_0 is

f(b̂_T; γ_0) = f(b_0; γ_0) + F_b(b̄; γ_0)(b̂_T − b_0).

Combining both mean value expansions and multiplying by √T, we have

√T f(b̂_T; γ̂_T) = √T f(b_0; γ_0) + F_γ(b̂_T; γ̄) √T (γ̂_T − γ_0) + F_b(b̄; γ_0) √T (b̂_T − b_0).

Since b̄ ∈ [b̂_T, b_0], γ̄ ∈ [γ̂_T, γ_0], and b̂_T →p b_0, γ̂_T →p γ_0, then, along with assumption (iv), we have

F_γ(b̂_T; γ̄) →p F_γ(b_0; γ_0) = F_γ
F_b(b̄; γ_0) →p F_b(b_0; γ_0) = F_b

and hence

√T f(b̂_T; γ̂_T) = √T f(b_0; γ_0) + F_γ √T (γ̂_T − γ_0) + F_b √T (b̂_T − b_0) + o_p(1).

In addition, by assumption (i), Ŵ →p W; noticing that f(b_0, γ_0) = 0 and combining this with the first-order conditions and the mean value expansions described above, we can write

−F_γ' W [ F_γ √T (γ̂_T − γ_0) + F_b √T (b̂_T − b_0) ] = o_p(1).

Since we know that

√T (b̂_T − b_0) →d N(0, Ω_b)

by proposition 2, then

√T (γ̂_T − γ_0) = −(F_γ' W F_γ)^{-1} (F_γ' W F_b) √T (b̂_T − b_0) + o_p(1)

by assumption (vi), which ensures that F_γ' W F_γ is invertible. Therefore, from the previous expression we arrive at

√T (γ̂_T − γ_0) →d N(0, Ω_γ)

Ω_γ = (F_γ' W F_γ)^{-1} (F_γ' W F_b Ω_b F_b' W F_γ) (F_γ' W F_γ)^{-1}.

Notice that, since we are using the optimal weighting matrix W = (F_b Ω_b F_b')^{-1}, the previous expression simplifies considerably to

Ω_γ = (F_γ' W F_γ)^{-1}.

Footnote 1. Stochastic equicontinuity: for every ε, η > 0 there exist a sequence of random variables Δ̂_T and a sample size t_0 such that for t ≥ t_0, Prob(|Δ̂_T| > ε) < η, and for each φ there is an open set N containing φ with sup_{φ̃∈N} | Q̂_T(φ̃) − Q̂_T(φ) | ≤ Δ̂_T for t ≥ t_0.
ReferencesAn, Sungbae and Frank Schorfheide (2007) “Bayesian Analysis of DSGE Models,” Econo-metric Reviews, 26(2-4): 113-172.
Andrews, Donald W. K. (1994) “Asymptotics for Semiparametric Econometric Models viaStochastic Equicontinuity,” Econometrica, 62(1): 43-72.
Andrews, Donald W. K. (1995) “Non-parametric Kernel Estimation for SemiparametricModels,” Econometric Theory, 11(3): 560-596.
Bekker, Paul A. (1994) “Alternative Approximations to the Distribution of InstrumentalVariable Estimators,” Econometrica, 62: 657-681.
Calvo, Guillermo A. (1983) “Staggered Prices in a Utility Maximizing Framework,” Journalof Monetary Economics, 12: 383-98.
Cameron, A. Colin and Pravin K. Trivedi (2005) Microeconometrics: Methods andApplications. Cambridge: Cambridge University Press.
Canova, Fabio (2007) Methods for Applied Macroeconomic Research. Princeton:Princeton University Press.
Christiano, Lawrence J. and Wouter den Haan (1996) “Small-Sample Properties of GMMfor Business Cycle Analysis,” Journal of Business and Economic Statistics, 14(3): 309-327.
Christiano, Lawrence J., Martin Eichenbaum, and Charles L. Evans (2005) “Nominal Rigidi-ties and the Dynamic Effects of a Shock to Monetary Policy,” Journal of Political Economy,113(1): 1-45.
Clarida, Richard, Jordi Galí and Mark Gertler (1999) “The Science of Monetary Policy: ANew Keynesian Perspective,” Journal of Economic Literature, 37(4): 1661-1707.
Eichenbaum, Martin and Jonas D. M. Fisher (2007) “Estimating the Frequency of PriceRe-optimization in Calvo-style Models,” Journal of Monetary Economics, 54(7): 2032-2047.
Fernández-Villaverde, Jesús, Juan F. Rubio-Ramírez, Thomas J. Sargent and Mark W. Wat-son (2007) “A, B, Cs (and Ds) of Understanding VARs,” American Economic Review,97(3):1021-1026.
Fuhrer, Jeffrey C. and Giovanni P. Olivei (2005) “Estimating Forward-Looking Euler Equa-tions with GMM Estimators: An Optimal Instruments Approach,” in Models and Mon-etary Policy: Research in the Tradition of Dale Henderson, Richard Porter,and Peter Tinsley, Board of Governors of the Federal Reserve System: Washington, DC,87-104.
37
-
Galí, Jordi and Mark Gertler (1999) “Inflation Dynamics: A Structural Econometric Approach,” Journal of Monetary Economics, 44(2): 195-222.
Galí, Jordi, Mark Gertler and David J. López-Salido (2001) “European Inflation Dynamics,” European Economic Review, 45(7): 1237-1270 (Erratum, September 2002).
Galí, Jordi, Mark Gertler and David J. López-Salido (2005) “Robustness of the Estimates of the Hybrid New Keynesian Phillips Curve,” Journal of Monetary Economics, 52(6): 1107-1118.
Gonçalves, Silvia and Lutz Kilian (2006) “Asymptotic and Bootstrap Inference for AR(∞) Processes with Conditional Heteroskedasticity,” Econometric Reviews, forthcoming.
Hall, Alastair, Atsushi Inoue, James M. Nason, and Barbara Rossi (2007) “Information Criteria for Impulse Response Function Matching Estimation of DSGE Models,” Duke University, mimeo.
Hurvich, Clifford M. and Chih-Ling Tsai (1989) “Regression and Time Series Model Selection in Small Samples,” Biometrika, 76(2): 297-307.
Jordà, Òscar (2005) “Estimation and Inference of Impulse Responses by Local Projections,” American Economic Review, 95(1): 161-182.
Kurmann, André (2005) “Quantifying the Uncertainty about a Forward-Looking New Keynesian Pricing Model,” Journal of Monetary Economics, 52(6): 1119-1134.
Kuersteiner, Guido M. (2005) “Automatic Inference for Infinite Order Vector Autoregressions,” Econometric Theory, 21: 85-115.
Lewis, R. A. and Gregory C. Reinsel (1985) “Prediction of Multivariate Time Series by Autoregressive Model Fitting,” Journal of Multivariate Analysis, 16(3): 393-411.
Levin, Andrew T. and John C. Williams (2003) “Robust Monetary Policy with Competing Reference Models,” Journal of Monetary Economics, 50(5): 945-975.
Lindé, Jesper (2005) “Estimating New Keynesian Phillips Curves: A Full Information Maximum Likelihood Approach,” Journal of Monetary Economics, 52(6): 1135-1149.
Newey, Whitney K. and Daniel L. McFadden (1994) “Large Sample Estimation and Hypothesis Testing,” in Handbook of Econometrics, v. 4, Robert F. Engle and Daniel L. McFadden, (eds.). Amsterdam: North Holland.
Newey, Whitney K. and Kenneth D. West (1987) “A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix,” Econometrica, 55: 703-708.
Rotemberg, Julio J. and Michael Woodford (1997) “An Optimization-Based Econometric Framework for the Evaluation of Monetary Policy,” NBER Macroeconomics Annual, 297-346.
Rudd, Jeremy and Karl Whelan (2005) “New Tests of the New Keynesian Phillips Curve,” Journal of Monetary Economics, 52(6): 1167-1181.
Sbordone, Argia (2002) “Prices and Unit Labor Costs: Testing Models of Pricing Behavior,” Journal of Monetary Economics, 49(2): 265-292.
Smets, Frank and Raf Wouters (2003) “An Estimated Dynamic Stochastic General Equilibrium Model of the Euro Area,” Journal of the European Economic Association, 1(5): 1123-1175.
Staiger, Douglas and James H. Stock (1997) “Instrumental Variables Regression with Weak Instruments,” Econometrica, 65(3): 557-586.
Stock, James H., Jonathan H. Wright and Motohiro Yogo (2002) “A Survey of Weak Instruments and Weak Identification in Generalized Method of Moments,” Journal of Business and Economic Statistics, 20(4): 518-529.
Walsh, Carl E. (2003) Monetary Theory and Policy, second edition. Cambridge,Massachusetts: The MIT Press.
TABLE 1.1 – ARMA(1,1) Monte Carlo Experiments
T = 50            h = 2          h = 5          h = 10
                  ρ      θ      ρ      θ       ρ      θ
PMD   Est.       0.23   0.49   0.25   0.44    0.31   0.28
      SE         0.22   0.20   0.20   0.19    0.20   0.18
      SE (MC)    0.31   0.27   0.21   0.20    0.22   0.28
MLE   Est.       0.22   0.52   0.23   0.52    0.22   0.53
      SE         0.21   0.18   0.20   0.18    0.20   0.18
      SE (MC)    0.27   0.24   0.27   0.23    0.27   0.23

T = 100           h = 2          h = 5          h = 10
                  ρ      θ      ρ      θ       ρ      θ
PMD   Est.       0.24   0.50   0.25   0.47    0.27   0.45
      SE         0.15   0.14   0.15   0.13    0.14   0.13
      SE (MC)    0.17   0.15   0.15   0.13    0.15   0.15
MLE   Est.       0.25   0.51   0.24   0.51    0.24   0.50
      SE         0.14   0.13   0.14   0.13    0.14   0.13
      SE (MC)    0.15   0.13   0.16   0.14    0.14   0.14

T = 400           h = 2          h = 5          h = 10
                  ρ      θ      ρ      θ       ρ      θ
PMD   Est.       0.25   0.51   0.25   0.50    0.25   0.50
      SE         0.07   0.07   0.07   0.06    0.07   0.06
      SE (MC)    0.08   0.07   0.07   0.07    0.07   0.07
MLE   Est.       0.25   0.50   0.25   0.50    0.24   0.51
      SE         0.07   0.06   0.07   0.07    0.07   0.06
      SE (MC)    0.07   0.06   0.07   0.07    0.07   0.06

Notes: 1,000 Monte Carlo replications, 1st-stage regression lag length chosen automatically by AICC. SE refers to the standard error calculated with the PMD/MLE formula. SE (MC) refers to the Monte Carlo standard error based on the 1,000 estimates of the parameter. 500 burn-in observations disregarded when generating the data.
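The data-generating process behind these experiments can be sketched in a few lines. The snippet below simulates one Monte Carlo draw of an ARMA(1,1) process with the 500-observation burn-in described in the notes. The recursion y_t = ρ·y_{t-1} + ε_t + θ·ε_{t-1} with Gaussian shocks is a standard ARMA(1,1) form; the parameter values ρ = 0.25, θ = 0.5 are an assumption suggested by the estimates in Table 1.1, and the function name is illustrative.

```python
import numpy as np

def simulate_arma11(rho, theta, T, burn_in=500, seed=0):
    """Simulate y_t = rho*y_{t-1} + e_t + theta*e_{t-1} with N(0,1) shocks,
    discarding the first `burn_in` observations as in the tables' notes."""
    rng = np.random.default_rng(seed)
    n = T + burn_in
    e = rng.standard_normal(n)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = rho * y[t - 1] + e[t] + theta * e[t - 1]
    return y[burn_in:]  # keep only the last T observations

# One replication of the T = 400 design (assumed DGP values).
y = simulate_arma11(rho=0.25, theta=0.5, T=400)
```

In the full experiment this draw would be repeated 1,000 times, with PMD and MLE estimates of (ρ, θ) collected on each replication.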
TABLE 1.2 – ARMA(1,1) Monte Carlo Experiments
T = 50            h = 2          h = 5          h = 10
                  ρ      θ      ρ      θ       ρ      θ
PMD   Est.       0.46   0.23   0.47   0.17    0.49   0.15
      SE         0.19   0.20   0.18   0.19    0.18   0.18
      SE (MC)    0.23   0.23   0.21   0.22    0.20   0.28
MLE   Est.       0.45   0.29   0.44   0.27    0.45   0.29
      SE         0.20   0.20   0.20   0.21    0.20   0.20
      SE (MC)    0.21   0.23   0.23   0.25    0.19   0.22

T = 100           h = 2          h = 5          h = 10
                  ρ      θ      ρ      θ       ρ      θ
PMD   Est.       0.48   0.23   0.47   0.23    0.50   0.23
      SE         0.13   0.14   0.13   0.14    0.12   0.13
      SE (MC)    0.15   0.16   0.14   0.16    0.13   0.18
MLE   Est.       0.48   0.27   0.47   0.25    0.48   0.26
      SE         0.14   0.14   0.14   0.15    0.13   0.14
      SE (MC)    0.14   0.15   0.13   0.15    0.13   0.14

T = 400           h = 2          h = 5          h = 10
                  ρ      θ      ρ      θ       ρ      θ
PMD   Est.       0.50   0.5    0.49   0.26    0.49   0.25
      SE         0.07   0.07   0.06   0.07    0.06   0.07
      SE (MC)    0.07   0.08   0.07   0.08    0.06   0.07
MLE   Est.       0.50   0.25   0.49   0.26    0.49   0.26
      SE         0.07   0.07   0.07   0.07    0.07   0.07
      SE (MC)    0.06   0.07   0.07   0.07    0.06   0.07

Notes: 1,000 Monte Carlo replications, 1st-stage regression lag length chosen automatically by AICC. SE refers to the standard error calculated with the PMD/MLE formula. SE (MC) refers to the Monte Carlo standard error based on the 1,000 estimates of the parameter. 500 burn-in observations disregarded when generating the data.
TABLE 1.3 – ARMA(1,1) Monte Carlo Experiments
T = 50            h = 2          h = 5          h = 10
                  ρ      θ      ρ      θ       ρ      θ
PMD   Est.      -0.06   0.56   0.06   0.40    0.16   0.28
      SE         0.36   0.32   0.27   0.25    0.25   0.22
      SE (MC)    0.61   0.55   0.28   0.29    0.31   0.37
MLE   Est.        -      -      -      -       -      -
      SE          -      -      -      -       -      -
      SE (MC)     -      -      -      -       -      -

T = 100           h = 2          h = 5          h = 10
                  ρ      θ      ρ      θ       ρ      θ
PMD   Est.      -0.03   0.54   0.04   0.45    0.09   0.41
      SE         0.24   0.21   0.19   0.18    0.19   0.17
      SE (MC)    0.33   0.30   0.21   0.21    0.22   0.23
MLE   Est.        -      -      -      -       -      -
      SE          -      -      -      -       -      -
      SE (MC)     -      -      -      -       -      -

T = 400           h = 2          h = 5          h = 10
                  ρ      θ      ρ      θ       ρ      θ
PMD   Est.      -0.01   0.51   0.00   0.50    0.02   0.48
      SE         0.11   0.10   0.10   0.09    0.10   0.09
      SE (MC)    0.11   0.10   0.10   0.09    0.09   0.09
MLE   Est.       0.04   0.50   0.00   0.50    0.00   0.50
      SE         0.10   0.09   0.10   0.09    0.10   0.08
      SE (MC)    0.10   0.09   0.10   0.09    0.09   0.08

Notes: 1,000 Monte Carlo replications, 1st-stage regression lag length chosen automatically by AICC. SE refers to the standard error calculated with the PMD/MLE formula. SE (MC) refers to the Monte Carlo standard error based on the 1,000 estimates of the parameter. 500 burn-in observations disregarded when generating the data.
TABLE 1.4 – ARMA(1,1) Monte Carlo Experiments
T = 50            h = 2          h = 5          h = 10
                  ρ      θ      ρ      θ       ρ      θ
PMD   Est.       0.47   0.04   0.43   0.03    0.54  -0.10
      SE         0.28   0.30   0.24   0.26    0.21   0.23
      SE (MC)    0.40   0.40   0.24   0.26    0.24   0.30
MLE   Est.        -      -      -      -       -      -
      SE          -      -      -      -       -      -
      SE (MC)     -      -      -      -       -      -

T = 100           h = 2          h = 5          h = 10
                  ρ      θ      ρ      θ       ρ      θ
PMD   Est.       0.49   0.01   0.45   0.03    0.53  -0.04
      SE         0.19   0.20   0.17   0.18    0.15   0.17
      SE (MC)    0.20   0.20   0.19   0.19    0.18   0.21
MLE   Est.       0.49  -0.02   0.47   0.03    0.47   0.03
      SE         0.17   0.20   0.18   0.20    0.18   0.20
      SE (MC)    0.18   0.20   0.19   0.20    0.18   0.20

T = 400           h = 2          h = 5          h = 10
                  ρ      θ      ρ      θ       ρ      θ
PMD   Est.       0.50   0.01   0.50   0.00    0.50   0.00
      SE         0.09   0.10   0.08   0.09    0.08   0.09
      SE (MC)    0.09   0.10   0.10   0.10    0.10   0.11
MLE   Est.       0.49   0.01   0.49   0.01    0.48   0.02
      SE         0.09   0.10   0.09   0.10    0.09   0.10
      SE (MC)    0.09   0.10   0.09   0.10    0.09   0.10

Notes: 1,000 Monte Carlo replications, 1st-stage regression lag length chosen automatically by AICC. SE refers to the standard error calculated with the PMD/MLE formula. SE (MC) refers to the Monte Carlo standard error based on the 1,000 estimates of the parameter. 500 burn-in observations disregarded when generating the data.
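The tables' notes state that the first-stage regression lag length is chosen automatically by AICC, the corrected AIC of Hurvich and Tsai (1989). A minimal sketch of that selection step, assuming a univariate AR fitted by OLS and the common form AICc = AIC + 2k(k+1)/(T - k - 1); the function name and the maximum lag `p_max` are illustrative, and the effective sample is allowed to shrink with p for simplicity:

```python
import numpy as np

def ar_aicc(y, p_max=8):
    """Select an AR lag length by the corrected AIC (AICc).
    Fits AR(p) by OLS for p = 1..p_max and returns the order
    minimizing AICc = AIC + 2k(k+1)/(T - k - 1)."""
    best_p, best_crit = None, np.inf
    n = len(y)
    for p in range(1, p_max + 1):
        Y = y[p:]
        # column j holds lag (j+1): y_{t-1}, ..., y_{t-p}
        X = np.column_stack([y[p - j - 1 : n - j - 1] for j in range(p)])
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        resid = Y - X @ beta
        T, k = len(Y), p
        sigma2 = resid @ resid / T
        aic = T * np.log(sigma2) + 2 * k
        crit = aic + 2 * k * (k + 1) / (T - k - 1)  # small-sample correction
        if crit < best_crit:
            best_p, best_crit = p, crit
    return best_p
```

In small samples the correction term penalizes long lags more heavily than plain AIC, which is why the notes use it with T as small as 50.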
Table 2.1.1 Monte Carlo Comparison: GMM vs. PMD.
Case 1:
D.G.P.
Instrument list excludes Rt
                 PMD                 GMM                 PMD                 GMM
            h=2      h*=6      h=2      h*=6      h=2      h*=6      h=2      h*=6
Benchmark
   0.703    0.689    0.697    0.678    0.718    0.692    0.705    0.689
  (0.038)  (0.030)  (0.041)  (0.036)  (0.026)  (0.021)  (0.032)  (0.030)
   0.297    0.312    0.303    0.322    0.281    0.307    0.295    0.311
  (0.038)  (0.071)  (0.041)  (0.036)  (0.026)  (0.021)  (0.032)  (0.030)
   0.127    0.102    0.108    0.098    0.078    0.074    0.078    0.064
  (0.097)  (0.030)  (0.119)  (0.102)  (0.036)  (0.027)  (0.048)  (0.045)
Lagged PC
   0.332    0.503    0.507    0.267    0.422    0.453    0.698    0.369
  (0.411)  (0.135)  (0.350)  (0.219)  (0.313)  (0.109)  (0.295)  (0.196)
   0.140    0.160    0.080    0.098    0.154    0.192    0.071    0.116
  (0.100)  (0.051)  (0.082)  (0.075)  (0.079)  (0.040)  (0.072)  (0.064)
  -0.063   -0.071   -0.039   -0.068    0.107    0.079    0.082    0.118
  (0.216)  (0.100)  (0.184)  (0.174)  (0.087)  (0.042)  (0.083)  (0.078)
Lagged IS
   0.708    0.682    0.697    0.680    0.714    0.684    0.704    0.682
  (0.041)  (0.030)  (0.043)  (0.036)  (0.026)  (0.019)  (0.031)  (0.029)
   0.291    0.317    0.303    0.320    0.285    0.316    0.296    0.317
  (0.041)  (0.030)  (0.043)  (0.036)  (0.026)  (0.019)  (0.031)  (0.029)
   0.082    0.082    0.067    0.043    0.043    0.043    0.040    0.037
  (0.151)  (0.101)  (0.189)  (0.151)  (0.052)  (0.036)  (0.065)  (0.062)

Notes: 1,000 Monte Carlo replications. Each run initialized with 500 burn-in replications later disregarded. Sample size T = 200. Monte Carlo median values of the parameter estimates and the associated standard errors reported. “Lagged PC” refers to when the DGP consists of a Phillips curve with first and second lag inflation terms. Similarly, “Lagged IS” refers to when the DGP consists of a Phillips curve with first and second lag output gap terms. h = 2 uses the first 2 horizons of the impulse response function when estimating with PMD (to obtain over-identification) and corresponds to using the first two lags of the variables as instruments when estimating by GMM. h* refers to the optimal horizon selected by Hall et al.’s (2007) information criterion and varies with the model. “Benchmark” and “Lagged IS” cases impose . “Lagged PC” case estimates these parameters unconstrained (since in the DGP but ).
Table 2.1.2 Monte Carlo Comparison: GMM vs. PMD.
Case 1:
D.G.P.
Instrument list includes Rt
                 PMD                 GMM                 PMD                 GMM
            h=2      h*=6      h=2      h*=6      h=2      h*=6      h=2      h*=6
Benchmark
   0.710    0.684    0.969    0.919    0.720    0.694    0.896    0.860
  (0.037)  (0.031)  (0.054)  (0.046)  (0.026)  (0.021)  (0.037)  (0.035)
   0.290    0.316    0.031    0.080    0.280    0.306    0.104    0.140
  (0.037)  (0.031)  (0.054)  (0.046)  (0.026)  (0.021)  (0.037)  (0.035)
   0.130    0.110   -0.427   -0.332    0.076    0.076   -0.024   -0.019
  (0.094)  (0.070)  (0.165)  (0.134)  (0.036)  (0.027)  (0.059)  (0.055)
Lagged PC
   0.318    0.491    2.303    1.094    0.410    0.453    2.373    1.172
  (0.412)  (0.139)  (0.690)  (0.206)  (0.314)  (0.113)  (0.591)  (0.192)
   0.136    0.159   -0.178    0.001    0.166    0.186   -0.219   -0.013
  (0.103)  (0.051)  (0.194)  (0.075)  (0.083)  (0.041)  (0.174)  (0.068)
  -0.062   -0.063   -0.008    0.022    0.104    0.083   -0.133    0.034
  (0.213)  (0.099)  (0.456)  (0.173)  (0.091)  (0.043)  (0.199)  (0.080)
Lagged IS
   0.715    0.686    1.04     0.957    0.715    0.686    0.921    0.879
  (0.041)  (0.030)  (0.082)  (0.058)  (0.026)  (0.020)  (0.040)  (0.036)
   0.285    0.314   -0.044    0.042    0.284    0.314    0.079    0.121
  (0.041)  (0.030)  (0.082)  (0.058)  (0.026)  (0.020)  (0.040)  (0.036)
   0.076    0.082   -1.180   -0.809    0.051    0.051   -0.125   -0.123
  (0.154)  (0.102)  (0.374)  (0.245)  (0.052)  (0.036)  (0.088)  (0.080)

Notes: 1,000 Monte Carlo replications. Each run initialized with 500 burn-in replications later disregarded. Sample size T = 200. Monte Carlo median values of the parameter estimates and the associated standard errors reported. “Lagged PC” refers to when the DGP consists of a Phillips curve with first and second lag inflation terms. Similarly, “Lagged IS” refers to when the DGP consists of a Phillips curve with first and second lag output gap terms. h = 2 uses the first 2 horizons of the impulse response function when estimating with PMD (to obtain over-identification) and corresponds to using the first two lags of the variables as instruments when estimating by GMM. h* refers to the optimal horizon selected by Hall et al.’s (2007) information criterion and varies with the model. “Benchmark” and “Lagged IS” cases impose . “Lagged PC” case estimates these parameters unconstrained (since in the DGP but ).
Table 2.2.1 Monte Carlo Comparison: GMM vs. PMD.
Case 2:
D.G.P.
Instrument list excludes Rt
                 PMD                 GMM                 PMD                 GMM
            h=2      h*=6      h=2      h*=6      h=2      h*=6      h=2      h*=6
Benchmark
   0.509    0.510    0.516    0.523    0.569    0.543    0.557    0.550
  (0.035)  (0.028)  (0.052)  (0.046)  (0.027)  (0.021)  (0.037)  (0.035)
   0.491    0.490    0.484    0.476    0.431    0.456    0.443    0.450
  (0.035)  (0.028)  (0.052)  (0.046)  (0.027)  (0.021)  (0.037)  (0.035)
   0.251    0.220    0.229    0.197    0.159    0.164    0.159    0.148
  (0.040)  (0.032)  (0.073)  (0.064)  (0.027)  (0.022)  (0.042)  (0.040)
Lagged PC
   0.356    0.509    0.619    0.318    0.556    0.560    0.888    0.581
  (0.385)  (0.124)  (0.328)  (0.205)  (0.166)  (0.079)  (0.207)  (0.152)
   0.228    0.251    0.084    0.131    0.280    0.345    0.085    0.187
  (0.113)  (0.052)  (0.098)  (0.078)  (0.071)  (0.036)  (0.078)  (0.065)
   0.079    0.040    0.028    0.061    0.121    0.088    0.050    0.141
  (0.119)  (0.050)  (0.106)  (0.088)  (0.063)  (0.033)  (0.086)  (0.069)
Lagged IS
   0.547    0.535    0.534    0.542    0.571    0.557    0.559    0.554
  (0.057)  (0.032)  (0.054)  (0.045)  (0.030)  (0.019)  (0.036)  (0.033)
   0.452    0.465    0.466    0.457    0.429    0.442    0.441    0.446
  (0.057)  (0.032)  (0.054)  (0.045)  (0.030)  (0.019)  (0.036)  (0.033)
   0.126    0.145    0.142    0.118    0.093    0.109    0.098    0.090
  (0.103)  (0.048)  (0.093)  (0.078)  (0.041)  (0.025)  (0.049)  (0.046)

Notes: 1,000 Monte Carlo replications. Each run initialized with 500 burn-in replications later disregarded. Sample size T = 200. Monte Carlo median values of the parameter estimates and the associated standard errors reported. “Lagged PC” refers to when the DGP consists of a Phillips curve with first and second lag inflation terms. Similarly, “Lagged IS” refers to when the DGP consists of a Phillips curve with first and second lag output gap terms. h = 2 uses the first 2 horizons of the impulse response function when estimating with PMD (to obtain over-identification) and corresponds to using the first two lags of the variables as instruments when estimating by GMM. h* refers to the optimal horizon selected by Hall et al.’s (2007) information criterion and varies with the model. “Benchmark” and “Lagged IS” cases impose . “Lagged PC” case estimates these parameters unconstrained (since in the DGP but ).
Table 2.2.2 Monte Carlo Comparison: GMM vs. PMD.
Case 2:
D.G.P.
Instrument list includes Rt
                 PMD                 GMM                 PMD                 GMM
            h=2      h*=6      h=2      h*=6      h=2      h*=6      h=2      h*=6
Benchmark
   0.509    0.505    0.931    0.887    0.570    0.545    0.793    0.765
  (0.035)  (0.028)  (0.053)  (0.046)  (0.027)  (0.021)  (0.035)  (0.0