This version, October 2008
Estimation and Inference by the Method of Projection Minimum
Distance: An Application to the New Keynesian Hybrid Phillips Curve∗
Abstract
In most macroeconomic models, the stability of the solution path implies that the system is covariance-stationary and hence admits a Wold representation. The ability to estimate this Wold representation semi-parametrically by local projections (Jordà, 2005), even when the solution path's process is unknown or unconventional, can be exploited to estimate the model's parameters by minimum distance techniques. We label this two-step, least-squares estimation procedure "projection minimum distance" (PMD) and show that: (1) it is consistent and asymptotically normal for a large class of problems; (2) it is efficient even in relatively small samples; and (3) it is asymptotically equivalent to maximum likelihood and nests most applications of generalized method of moments as a special case. Although PMD is a general method, we investigate its properties in the context of the New Keynesian hybrid Phillips curve, providing ample Monte Carlo evidence and revisiting Fuhrer and Olivei's (2005) empirical analysis as an illustrative application.
• Keywords: impulse response, local projection, minimum chi-square, minimum distance.
• JEL Codes: C32, E47, C53.
Òscar Jordà, Department of Economics, University of California, Davis, One Shields Ave., Davis, CA 95616. e-mail: [email protected]
Sharon Kozicki, Research Department, Bank of Canada, 234 Wellington Street, Ottawa, Ontario, Canada K1A 0G9. e-mail: [email protected]
∗The views expressed herein are solely those of the authors and do not necessarily reflect the views of the Bank of Canada. We thank Colin Cameron, Timothy Cogley, David DeJong, Richard Dennis, Stephen Donald, David Drukker, Jeffrey Fuhrer, James Hamilton, Peter Hansen, Kevin Hoover, Giovanni Olivei, Peter Robinson, Paul Ruud, Frank Schorfheide, Aaron Smith, Harald Uhlig, Frank Wolak and seminar participants at the Bank of Italy, Bocconi University - IGIER, Duke University, the Federal Reserve Bank of Dallas, the Federal Reserve Bank of Philadelphia, the Federal Reserve Bank of New York, the Federal Reserve Bank of San Francisco, the Federal Reserve Bank of St. Louis, Southern Methodist University, Stanford University, the University of California, Berkeley, the University of California, Davis, the University of California, Riverside, the University of Houston, the University of Kansas, the University of Pennsylvania, the University of Texas at Austin, the 2006 Winter Meetings of the American Economic Association in Boston, the 2006 European Meetings of the Econometric Society in Vienna, and the 3rd Macro Workshop in Vienna, 2006 for useful comments and suggestions. Jordà is thankful for the hospitality of the Federal Reserve Bank of San Francisco during the preparation of this paper.
1 Introduction
Econometric estimation of dynamic stochastic (partial or general) equilibrium models requires that practitioners confront the limits that model tractability imposes on the universe of variables and on the wealth of dynamic interactions observed in reality. Approaches based on model-implied likelihoods, whether classical (e.g. Canova, 2007) or Bayesian (e.g. An and Schorfheide, 2007), are only sensible with sufficiently complex and complete models that narrow this separation from reality, and when suitable sources of exogenous variation are properly ascertained. The inability to conduct controlled experiments in macroeconomics and the capriciousness of natural or quasi-natural experiments often limit a practitioner's choice to estimation strategies based on appropriate instrumental variables techniques.
This paper introduces a statistical method of parameter estimation in which the economic model's restrictions are cast against a flexible, semi-parametric representation of the data based on its Wold (or impulse response) representation. The estimation methodology is particularly well-suited for models designed to capture dynamic comovement, such as real business cycle models or new Keynesian specifications, whose performance is often evaluated on their ability to match the persistence and cross-correlation properties of macroeconomic data. The objective is to obtain parameter estimates that are robust to incomplete characterizations of the dynamics and/or the forcing variables that the behavioral model is trying to explain. The result is a minimum-distance estimator that is computationally simple and whose asymptotic properties we fully derive; we also establish its relation to maximum likelihood (ML) and to other available minimum distance estimators, such as the generalized method of moments (GMM), Sbordone's (2002) forecast matching estimator, and the impulse response matching estimator of Rotemberg and Woodford (1997), more recently used in Christiano, Eichenbaum and Evans (2005).
Perhaps it is useful to frame our discussion in the context of the voluminous literature that
investigates inflation dynamics (e.g. volume 52 of the Journal of Monetary Economics in 2005 was
exclusively dedicated to estimation of the Phillips curve). A critical divide in this literature emerges between proponents of limited-information, single-equation, instrumental-variable methods (in that issue, primarily Galí, Gertler and López-Salido, 2005) and their critics, who favor full-information methods (e.g. Kurmann, 2005; Lindé, 2005; Rudd and Whelan, 2005) that often require a complete, New Keynesian formulation of the economy. Central to this line of research is the desire to determine from the data the degree of backward/forward-looking behavior of the Phillips curve, because this degree is pivotal in determining optimal monetary policy responses, sacrifice ratios, and the stability of competing policy prescriptions (see, e.g., Levin and Williams, 2003).
Common arguments against GMM have to do with fears about poor small-sample properties and weak-instrument problems. Our estimator addresses some of these issues, but we wish instead to highlight a far more fundamental issue that has been previously neglected. For expository purposes, consider a researcher who is interested in estimating the following generic regression (presumably, a representation of a fundamental relation derived from an economic model):
y = Y β + u (1)
where Y includes endogenous variables and possibly exogenous or predetermined variables, and where z is a set of proposed instruments (which will include whatever variables in Y are exogenous or predetermined). Suppose instead that the true data generating process (DGP) is characterized by
y = Y β + xγ + ε (2)
where x is a vector of omitted (exogenous or predetermined) variables. Here the omission of
the variables x is motivated by the researcher’s express belief in the structural nature of the
relationship in (1), not by their unavailability. Even when the z are valid instruments for (2), they will not in general be valid instruments for (1) if $E(z'x) \neq 0$ and $\gamma \neq 0$, since the validity of z depends on $E(z'u) = 0$ and in this case $E(z'u) = E(z'x)\gamma + E(z'\varepsilon) = E(z'x)\gamma \neq 0$. Thus,
z are valid instruments from the perspective of the DGP in (2), not the proposed model (1).
This problem is particularly acute in macroeconomics and the common practice of estimating
Euler equations with GMM using lagged endogenous variables as instruments. As is clear from
the preceding discussion, lagged endogenous variables can become illegitimate instruments when
there is omitted feedback and/or omitted variables in the Euler expression.
A natural solution is to orthogonalize the instrument set z with respect to the possibly omitted
variables x. Hence consider a first stage regression
z = xδ + v
where the residuals of this regression, v (rather than the fitted values, as in conventional two-stage least squares), are proper instruments for the model in expression (1). It turns out that the estimator we propose achieves a similar instrument pre-treatment in a manner that can be exploited to examine model misspecification and that can be seen as a direct generalization of the typical GMM estimator. The paper presents our methods using examples based on the Phillips curve as a backdrop, but it should be clear from our presentation that the methods are not limited to these examples. In fact, we derive the statistical properties of our estimator under general assumptions that include possibly nonlinear systems estimation.
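To make the feedback problem concrete, here is a minimal simulation sketch (our own illustration, not taken from the paper; all numerical values are arbitrary). The DGP includes an omitted variable x correlated with the candidate instrument z, so IV with the raw z is inconsistent, while IV with the residual from projecting z on x recovers β:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 50_000
beta, gamma = 0.5, 0.8            # true coefficients in y = Y*beta + x*gamma + e

x = rng.normal(size=T)            # omitted (exogenous) variable
z = 0.7 * x + rng.normal(size=T)  # candidate instrument, correlated with x
e = rng.normal(size=T)            # structural error
Y = 0.6 * z + 0.4 * x + 0.5 * e + rng.normal(size=T)  # endogenous regressor
y = beta * Y + gamma * x + e      # true DGP, as in expression (2)

# IV with the raw instrument: E(z'u) = E(z'x)*gamma != 0 once x is omitted,
# so this estimate is inconsistent for beta
beta_raw = (z @ y) / (z @ Y)

# Pre-treat the instrument: project z on x and keep the residuals v
delta = (x @ z) / (x @ x)
v = z - delta * x                 # E(v'x) = 0 by construction
beta_resid = (v @ y) / (v @ Y)    # consistent for beta
```

With these (arbitrary) parameter values, `beta_raw` is biased upward while `beta_resid` is close to 0.5; the residual-based pre-treatment plays exactly the role of the projection M in the PMD first stage described in the next section.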
2 Projection Minimum Distance
The dynamics of many macroeconomic models often depend on expectations of the future values of their variables. This is the natural consequence of rational expectations or of learning mechanisms. Furthermore, the relative significance of forward- versus backward-looking terms is of considerable importance in determining optimal policy responses: the stability of the solution paths, and of the economy, often depends on this feature. Unfortunately, because expectations are based on the same information set that determines backward-looking behavior, it is empirically
difficult to disentangle which type of behavior is dominant. Single-equation, limited-information
estimation methods therefore require appropriate instrumental variables, while full-information
approaches based on the likelihood (classical or Bayesian) require complete and correctly specified
models of the economy that describe how available information is allocated.
This section presents the mechanics of our estimation method using as a backdrop the desire
to estimate a New Keynesian hybrid Phillips curve. We do this because determining the relative
degree of forward versus backward looking behavior plays such a pivotal role in designing optimal
monetary policy (see, e.g., Walsh, 2003). Further, the Phillips curve is one of the pillars on
which standard New Keynesian DSGE models are erected, and as we will show, our method is
conveniently scalable to estimate such systems.
The majority of current Phillips curve specifications are derived by imposing a friction on
a firm’s ability to adjust its price optimally (see, e.g. Calvo, 1983; Galí and Gertler, 1999;
Christiano, Eichenbaum and Evans, 2005; Eichenbaum and Fisher, 2007; to cite a few). The usual
set-up involves a continuum of monopolistically competitive, intermediate goods producing firms
that rent capital and labor in perfectly competitive factor markets. Depending on the choice of
friction, optimal price-setting rules depend on expectations of future aggregate prices and marginal
costs (or, under some further assumptions, the gap between actual and potential output).
We begin from a less ambitious theoretical vantage point and instead consider a common
formulation of New Keynesian monetary models, specifically
$$\pi_t = \gamma_f E_t\pi_{t+1} + \gamma_b \pi_{t-1} + \gamma_g g_t + \varepsilon_{\pi,t} \quad (3)$$

$$g_t = \beta_f E_t g_{t+1} + \beta_b g_{t-1} - \beta_r (R_t - E_t\pi_{t+1}) + \varepsilon_{g,t} \quad (4)$$

$$R_t = (1-\rho)(\omega_\pi \pi_t + \omega_g g_t) + \rho R_{t-1} + \varepsilon_{R,t} \quad (5)$$
where the first equation is the New Keynesian hybrid Phillips curve with πt the aggregate inflation
rate, gt the output gap, and where the restriction γf + γb = 1 is commonly imposed as a result of
the theory; the second equation is the aggregate demand or IS curve with Rt the nominal interest
rate; and the third equation is the standard Taylor rule with interest rate smoothing. Such a
formulation has been studied extensively by Clarida, Galí and Gertler (1999) and more recently
by Lindé (2005) for a comparative study of the properties of GMM versus FIML estimation of
the Phillips curve in (3). In fact, we will use an extended formulation of this model to generate
Monte Carlo simulations in section 5.
In what follows we focus our attention on estimation of expression (3) exclusively. We do this because it makes the mechanics of our estimator easier to explain, but also to highlight some of its properties when used in a limited-information context. It should be clear from our presentation how one would instead conduct full-information system estimation; indeed, the formal derivation of the large-sample properties in section 4 is done under this more general assumption.
The familiar stable solution path of the system of equations (3)-(5) can be expressed as

$$y_t = A y_{t-1} + C\varepsilon_t$$

where $y_t \equiv (\pi_t \;\; g_t \;\; R_t)'$; $\varepsilon_t \equiv (\varepsilon_{\pi,t} \;\; \varepsilon_{g,t} \;\; \varepsilon_{R,t})'$; and $A$ and $C$ are coefficient matrices whose values are nonlinear functions of the structural parameters $\{\gamma_f, \gamma_b, \gamma_g; \beta_f, \beta_b, \beta_r; \rho, \omega_\pi, \omega_g\}$. Defining the resulting reduced-form residuals $v_t = C\varepsilon_t$, this stable solution path admits a reduced-form Wold representation given by

$$y_t = \sum_{h=0}^{\infty} B_h v_{t-h}$$

where $B_0 = I$; the $B_h$ are the reduced-form moving-average or impulse response coefficient matrices; and $E(v_t v_t') = \Omega_v = C\Omega_\varepsilon C'$, where $\Omega_\varepsilon$ is a diagonal matrix. More generally, whether or not the solution path has this convenient VAR(1) form is not important. What is important is that the stability of the solution (which, for example, in other models has a VARMA form instead, see
e.g. Fernández-Villaverde, Rubio-Ramírez, Sargent and Watson, 2007) ensures the existence of a
reduced-form Wold representation.
We also wish to highlight that we focus on the reduced-form representation because in practice,
there is usually no formal statistical procedure to verify commonly used structural identification
assumptions (such as the ubiquitous short-run or long-run recursive schemes). Further, Fernández-Villaverde et al. (2007) highlight the dangers of imposing incorrect identification assumptions
when estimating structural parameters with impulse response matching estimators. Our focus on
the reduced-form representation is a departure from what is common practice in the literature
(see, e.g. Christiano, Eichenbaum, and Evans, 2005) but a departure that we deem particularly
advantageous to the extent that the model’s parameters can be estimated from information about
the serial correlation properties of the data (which are unambiguous) rather than from the con-
temporaneous correlation between the variables in the system, where the direction of causation is
much harder to establish formally and is prone to generate inconsistent estimates.
The mechanics of our estimation method, which we call projection minimum distance (PMD), are broadly as follows. First, we obtain estimates of the first H elements $B_h$ of the Wold decomposition with local projections (Jordà, 2005). Second, we substitute the variables in expression (3) by their Wold representations to obtain a mapping between the $B_h$ (for which first-stage estimates by local projections are now available) and the parameters of interest, $\gamma \equiv (\gamma_f \;\; \gamma_b \;\; \gamma_g)'$, and then minimize an appropriately weighted distance function to obtain consistent and asymptotically normal estimates of $\gamma$. We explain these two steps in more detail next.
Using matrix notation to facilitate the explanation and practical implementation of the estimator, let $X$ be a $T' \times n$ matrix, where $T' = T - H - k$ and $n$ is the number of variables in the system (e.g. $n = 3$ in the example of expressions (3)-(5)). This matrix stacks the observations $\{\pi_t \;\; g_t \;\; R_t\}_{t=k+1}^{T-H}$. Let $Y$ be a $T'H \times n$ matrix that stacks the $H$ blocks, each $T' \times n$, of observations $\{\pi_{t+h} \;\; g_{t+h} \;\; R_{t+h}\}_{t=k+1}^{T-H}$ for $h = 1, \ldots, H$, and let $Z$ collect the $T' \times nk$ observations corresponding to the $k$ lags $\{\pi_{t-1} \;\; g_{t-1} \;\; R_{t-1} \;\; \ldots \;\; \pi_{t-k} \;\; g_{t-k} \;\; R_{t-k}\}_{t=k+1}^{T-H}$. Then, if $B$ is the $nH \times n$ matrix that stacks the $h = 1, \ldots, H$ matrices $B_h$, it is easily estimated with the least squares formula
$$\hat{B}_T = \left(I \otimes (X'MX)^{-1}\right)\left(I \otimes (X'MY)\right) \quad (6)$$

where $M = I - Z(Z'Z)^{-1}Z'$, and where the covariance matrix of $\hat{b}_T = \mathrm{vec}(\hat{B}_T)$ can be computed as

$$\hat{\Omega}_b = \hat{\Psi}_b \otimes (X'MX)^{-1}, \quad (7)$$

where

$$\hat{\Psi}_b = \sum_{h=1}^{H} \Phi_h' \, \frac{\eta'\eta}{T-H-k} \, \Phi_h, \qquad \Phi_h = \begin{pmatrix} 0 & \ldots & 0 & I & B_1 & \ldots & B_{H-h-1} \end{pmatrix}$$

and $\eta$ is the $T \times n$ matrix of residuals of the local projection of $y_{t+1}$ onto $y_t$. In section 4 we will show formally that

$$\sqrt{T-H-k}\left(\hat{b}_T - b_0\right) \stackrel{d}{\to} N(0, \Omega_b)$$

under rather general assumptions about the underlying data generating process.
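For concreteness, the first stage can be sketched in a few lines of numpy. This is an illustrative implementation of the local projection step (our own code, not the authors'); the VAR(1) used to generate the data is an arbitrary sanity check, under which $B_h$ should approach $A^h$:

```python
import numpy as np

def local_projections(y, H, k):
    """Regress y_{t+h} on y_t for h = 1..H, partialling out a constant and k
    further lags of y (the annihilator M = I - Z(Z'Z)^{-1}Z' in the text).
    Returns [B_1, ..., B_H], each an n x n impulse-response matrix."""
    T, n = y.shape
    Tp = T - H - k                            # usable sample, T' = T - H - k
    X = y[k:k + Tp]                           # observations of y_t
    Z = np.hstack([np.ones((Tp, 1))] +
                  [y[k - j:k - j + Tp] for j in range(1, k + 1)])  # lags
    M = np.eye(Tp) - Z @ np.linalg.pinv(Z)    # projects off the lag space
    XtMX_inv = np.linalg.inv(X.T @ M @ X)
    B = []
    for h in range(1, H + 1):
        Yh = y[k + h:k + h + Tp]              # observations of y_{t+h}
        B.append((XtMX_inv @ (X.T @ M @ Yh)).T)  # LS coefficients = B_h
    return B

# Sanity check on a known VAR(1): the horizon-h response is A**h
rng = np.random.default_rng(1)
A = np.array([[0.5, 0.1], [0.0, 0.4]])
T = 2000
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = A @ y[t - 1] + rng.normal(size=2)
B = local_projections(y, H=4, k=2)
```

Here `B[0]` and `B[1]` estimate $A$ and $A^2$ respectively; in the PMD application these first-stage estimates are then passed to the minimum-distance second step.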
The second stage consists of replacing πt and gt by their Wold expressions in expression (3).
This delivers the following mapping with the parameters of interest:
$$B_h i_1 = B_{h+1} i_1 \gamma_f + B_{h-1} i_1 \gamma_b + B_h i_2 \gamma_g \qquad h = 1, \ldots, H \quad (8)$$

where $i_j$ refers to the $j$th column of the identity matrix $I$. Given first-stage estimates $\hat{B}_h$ and the linear relation between these and $\gamma$, formal estimates of the latter can be conveniently calculated by least squares.
More formally, let $S_0$, $S_f$, $S_b$ be appropriate selector matrices such that, using the first-stage estimates $\hat{B}_h$, expression (8) can be cast simultaneously for every $h = 1, \ldots, H$ as

$$f(\hat{b}_T; \gamma) = \left[ S_0 \hat{B}_T i_1 - \left( S_f \hat{B}_T i_1 \;\; S_b \hat{B}_T i_1 \;\; S_0 \hat{B}_T i_2 \right) \gamma \right].$$

Consistent and asymptotically normal estimates of $\gamma$ can then be obtained by minimizing

$$\min_\gamma Q(\hat{b}_T; \gamma) = f(\hat{b}_T; \gamma)' \, \hat{W} f(\hat{b}_T; \gamma)$$

where $\hat{W} = (\hat{F}_b' \hat{\Omega}_b \hat{F}_b)^{-1}$ and $\hat{F}_b = \partial f(\hat{b}_T; \hat{\gamma}_T)/\partial b$. Hence, if one defines $\hat{B}_Y \equiv S_0 \hat{B}_T i_1$ and $\hat{B}_X \equiv (S_f \hat{B}_T i_1 \;\; S_b \hat{B}_T i_1 \;\; S_0 \hat{B}_T i_2)$, the parameters of the Phillips curve in expression (3) can be estimated as

$$\hat{\gamma}_T = \left(\hat{B}_X' \hat{W} \hat{B}_X\right)^{-1} \left(\hat{B}_X' \hat{W} \hat{B}_Y\right), \quad (9)$$

with covariance matrix

$$\hat{\Omega}_\gamma = \left(\hat{B}_X' \hat{W} \hat{B}_X\right)^{-1}. \quad (10)$$

Section 4 shows formally that for general problems

$$\sqrt{T-H-k}\,(\hat{\gamma}_T - \gamma_0) \stackrel{d}{\to} N(0, \Omega_\gamma)$$

where $\Omega_\gamma = (F_\gamma' W F_\gamma)^{-1}$ and $F_\gamma = \partial f(b; \gamma)/\partial\gamma$. In other words, our estimator can be summarized by the following two least-squares steps:

$$\hat{B}_T = \left(I \otimes (X'MX)^{-1}\right)\left(I \otimes (X'MY)\right)$$
$$\hat{\gamma}_T = \left(\hat{B}_X' \hat{W} \hat{B}_X\right)^{-1} \left(\hat{B}_X' \hat{W} \hat{B}_Y\right)$$

and the covariance matrix of $\hat{\gamma}_T$ computed as
$$\hat{\Omega}_\gamma = \left(\hat{B}_X' \hat{W} \hat{B}_X\right)^{-1}.$$

Several remarks deserve mention. First, the optimal weighting matrix $\hat{W}$ described above
can be replaced with the identity matrix while still obtaining consistent estimates of $\gamma$; this is called the equal-weights estimator. The minimum distance literature (see Cameron and Trivedi, 2005) suggests that the equal-weights estimator, although less efficient, has lower small-sample bias when the sample size is especially short. Second, the optimal weighting matrix is a function of $\hat{\gamma}_T$ itself, and hence (9) is not directly feasible. Although one could use a continuously updated estimator, a simpler (and asymptotically equivalent) solution is to obtain $\hat{\gamma}_T^{EW}$ from the feasible equal-weights estimator, use it to construct the optimal weighting matrix, and then obtain the optimal-weights estimator $\hat{\gamma}_T^{OW}$ and its covariance matrix. Third, when the optimal-weights estimator is used and $\dim(f(\hat{b}_T; \gamma)) > \dim(\gamma)$, section 4 shows that

$$Q(\hat{b}_T; \hat{\gamma}_T) \stackrel{d}{\to} \chi^2_{\dim(f(\hat{b}_T;\gamma)) - \dim(\gamma)}$$

which provides a test of overidentifying restrictions (and hence of model misspecification) along the same lines as the J-test commonly used in GMM.
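The second step is itself just weighted least squares on the stacked impulse-response columns. The sketch below (our own illustration) uses synthetic stand-ins for $\hat{B}_X$ and $\hat{B}_Y$, which in practice would be built from the first-stage local projections via the selector matrices $S_0$, $S_f$, $S_b$, and a crude diagonal stand-in for the optimal weighting matrix; all numbers are fabricated:

```python
import numpy as np

rng = np.random.default_rng(2)
gamma0 = np.array([0.6, 0.4, 0.1])        # "true" (gamma_f, gamma_b, gamma_g)
H = 12                                     # number of stacked horizons

BX = rng.normal(size=(H, 3))               # stand-in for (S_f B i1, S_b B i1, S_0 B i2)
BY = BX @ gamma0 + 0.01 * rng.normal(size=H)  # stand-in for S_0 B i1 plus noise

# Step 1: equal-weights estimator (W = I), consistent but less efficient
g_ew = np.linalg.solve(BX.T @ BX, BX.T @ BY)

# Step 2: optimal-weights estimator; here W is a crude diagonal proxy for
# (F_b' Omega_b F_b)^{-1}, built from the equal-weights residuals
resid = BY - BX @ g_ew
W = np.eye(H) / max(resid @ resid / H, 1e-12)
g_ow = np.linalg.solve(BX.T @ W @ BX, BX.T @ W @ BY)
cov_g = np.linalg.inv(BX.T @ W @ BX)         # covariance, as in eq. (10)
Q = (BY - BX @ g_ow) @ W @ (BY - BX @ g_ow)  # overidentification statistic
```

`Q` would be compared against a chi-square with $H - 3$ degrees of freedom, mirroring the J-test analogy above.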
Minimum distance approaches are not new in macroeconomics. Although it is rare to find formal derivations of the statistical properties of these estimators (e.g. minimization of structural impulse response distances, as in Rotemberg and Woodford, 1997, and Christiano, Eichenbaum and Evans, 2005; or minimization of VAR forecast distances, as in Sbordone, 2002, 2005), this is not where we see our most important contribution. Instead, the semi-parametric nature of the first stage allows us to be quite general and agnostic about the underlying DGP (which, as a consequence, includes VARMA specifications, for example).
This generality is useful in several respects. Like GMM (and unlike MLE) our method does not
require solving for the rational-expectations equilibrium and then selecting the appropriate stable
roots (we only require that the solution be stable so that we can invoke the Wold representation
theorem). Further, when the Euler expressions are linear, our estimator boils down to two simple
GLS-type steps. In addition, the flexibility of the first stage has several important payoffs with
respect to GMM.
First, in many covariance-stationary processes, the rate at which $B_h \to 0$ as $h \to \infty$ is quite fast (typically exponential) and hence, although in finite samples we truncate at some horizon H, our estimator is almost as efficient as MLE (an example is provided in our Monte Carlo experiments in section 5). The choice of truncation H in practice can be determined conveniently with Hall, Inoue, Nason and Rossi's (2007) information criterion,

$$\hat{H} = \arg\min_{h \in \{h_{min}, \ldots, h_{max}\}} \; \ln\left(\left|\hat{\Omega}_\gamma(h)\right|\right) + h\,\frac{\ln\left(\sqrt{T/k}\right)}{\sqrt{T/k}} \quad (11)$$

where $h_{min}$ is such that $\dim(f(\hat{b}_T; \gamma)) = \dim(\gamma)$.

Second, by assuming a Wold representation for $y_t$, we are able to obtain closed-form analytic expressions for the optimal weighting matrix $\hat{W}$, rather than having to use a semi-parametric estimate, such as Newey-West, as is common in GMM. This results in obvious gains in efficiency
of the estimates as we shall see in the Monte Carlo experiments of section 5. Third, it turns out
that our estimator can be seen as a version of GMM that embeds a recursive pre-treatment of
potentially illegitimate instruments due to feedback, a feature that we will exploit to check for
model misspecification and that we elaborate on in more detail below. Finally, notice that the
method is fully scalable to systems and to nonlinear specifications with little difficulty.
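The horizon-selection criterion in (11) is easy to transcribe. The sketch below (our own illustration) uses a fabricated sequence of covariance estimates purely to show the fit/penalty trade-off; in practice $\hat{\Omega}_\gamma(h)$ would come from re-estimating PMD at each candidate horizon:

```python
import numpy as np

def hall_ic(Omega_gamma, h, T, k):
    """Information criterion of eq. (11): fit term ln|Omega_gamma(h)| plus a
    penalty h * ln(sqrt(T/k)) / sqrt(T/k) that grows with the horizon."""
    pen = np.sqrt(T / k)
    return np.log(np.linalg.det(Omega_gamma)) + h * np.log(pen) / pen

T, k = 200, 4
# Fabricated example: precision improves like 1/h as horizons are added, so
# |Omega(h)| = h**(-2) for a 2x2 covariance; the penalty eventually dominates
candidates = range(3, 11)
ic = {h: hall_ic(np.eye(2) / h, h, T, k) for h in candidates}
H_hat = min(ic, key=ic.get)
```

With these fabricated numbers the criterion picks an interior horizon, balancing the efficiency gain from longer impulse responses against the penalty on additional horizons.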
3 Illegitimate Instruments
Micro-founded models of the macroeconomy distill a rich economic environment with many vari-
ables and a plethora of interactions into a few key relations that allow us to understand the
fundamental forces that drive the economy. The equilibrium conditions characterized by the re-
sulting Euler equations therefore impose considerable restrictions on the dynamic specifications
and included variables. Further, oftentimes the best (or even the only) instruments available to estimate such relations are lags of the endogenous variables specified in these expressions. This section shows that the validity of these instruments depends on the data, not on the Euler conditions specified by the economic model; as a result, unmodeled dynamics and/or omitted variables generate instruments that are illegitimate due to feedback, and hence inconsistent GMM parameter estimates.
One solution would be to enrich the economic model to account more completely for the
features of the data and certainly many new models (e.g. An and Schorfheide, 2007; Christiano,
Eichenbaum and Evans, 2005; Smets and Wouters, 2003) have taken this approach while trying to
preserve enough tractability and the original economic insights of simpler models. However, it is
difficult to extend this technique as a general (albeit desirable) principle and the fact remains that
many popular Euler expressions fall well short of properly characterizing the statistical properties
of the data.
Here we show that a more practical solution consists of projecting the Euler conditions onto the space of likely omitted dynamics/variables or, alternatively, projecting the instruments themselves onto this same space. We will show that one of the advantages of our estimation method over GMM is due to this feature. Specifically, let us return to the example we presented
in the introduction, where a researcher is interested in estimating the expression
y = Y β + u (12)
where y is the dependent variable, Y are endogenous variables, and z are candidate instruments.
Notice that Y could contain other exogenous or predetermined variables in which case, they would
be included directly into z so that expression (12) is quite general.
As an example, suppose we are interested in estimating a Phillips curve with forward-looking
terms only (see, e.g. Galí, Gertler, and López-Salido, 2001), say
$$\pi_t = \beta E_t\pi_{t+1} + u_t \quad (13)$$
where for reasons that will become clear momentarily, we have omitted the usual term associated
with demand (e.g. marginal costs, the output gap, etc.). Instead, suppose the DGP is characterized
by
y = Y β + xγ + ε (14)
where x are exogenous and/or predetermined variables (such as other lags of y). Here the key is to realize that if $E(z'x) \neq 0$ and $\gamma \neq 0$, the z are invalid instruments in expression (12), although they would be perfectly valid for (14).
In terms of the simple Phillips curve example, suppose the DGP is
$$\pi_t = \gamma_f E_t\pi_{t+1} + \gamma_b \pi_{t-1} + \varepsilon_t \quad (15)$$

instead of that specified in expression (13). Let $M \equiv I - x(x'x)^{-1}x'$ and notice that $E_L(z'Mx) = 0$, where $E_L$ is the linear projection operator. Hence, if one is interested in estimating expression (12), one could pursue two alternatives. One is to run the first-stage regression

$$z = x\phi + \tilde{z}$$

and use $\tilde{z}$ (the residuals, not the predicted values of the typical two-stage least-squares procedure) as regular instruments in (12). Equivalently, one can project (12) onto the space of $x$ and estimate $\beta$ from

$$\tilde{y} = \tilde{Y}\beta + \varepsilon$$

using $z$ as instruments, where $\tilde{y}$ and $\tilde{Y}$ are the residuals of the projections of $y$ and $Y$ onto $x$. In a nonlinear context, of course, the latter projection argument breaks down and the first option is clearly more appropriate, even if it is approximate.
We now return to the link between this discussion, GMM estimation, and estimation by PMD. Consider the running example given by expressions (13) and (15), where in particular a researcher
estimates expression (13) using $\pi_{t-h}$ as an instrument. It is easy to see that

$$\hat{\beta}_{GMM} = \frac{\sum_t \pi_{t-h}\pi_t}{\sum_t \pi_{t-h}\pi_{t+1}} = \gamma_f + \gamma_b \frac{\sum_t \pi_{t-h}\pi_{t-1}}{\sum_t \pi_{t-h}\pi_{t+1}} + \frac{\sum_t \pi_{t-h}\varepsilon_t}{\sum_t \pi_{t-h}\pi_{t+1}}$$

and under typical assumptions

$$\hat{\beta}_{GMM} \stackrel{p}{\to} \gamma_f + \gamma_b \frac{\phi_{h-1}}{\phi_{h+1}}$$

where $\phi_h = \mathrm{cov}(\pi_t, \pi_{t-h})$. Hence $\hat{\beta}_{GMM}$ is an inconsistent estimate of $\gamma_f$ as long as $\gamma_b \neq 0$, and the bias does not disappear by choosing later lags of $\pi_t$, since $\phi_{h-1}/\phi_{h+1}$ becomes indeterminate as $h$ grows.
Instead, PMD suggests estimating $\beta$ by choosing $\hat{\beta}_{PMD}$ such that

$$\hat{b}_h = \beta_{PMD}\, \hat{b}_{h+1}$$

where $M_{t-h} = I - X_{t-h}(X_{t-h}'X_{t-h})^{-1}X_{t-h}'$ with $X_{t-h} = (1, \pi_{t-h-1}, \ldots, \pi_{t-h-k})'$,

$$\hat{b}_h = \frac{\sum_t \pi_{t-h} M_{t-h} \pi_t}{\sum_t \pi_{t-h} M_{t-h} \pi_{t-h}}$$

and hence

$$\hat{\beta}_{PMD} = \frac{\sum_t \pi_{t-h} M_{t-h} \pi_t}{\sum_t \pi_{t-h} M_{t-h} \pi_{t+1}};$$

specifically,

$$\hat{\beta}_{PMD} = \gamma_f + \gamma_b \frac{\sum_t \pi_{t-h} M_{t-h} \pi_{t-1}}{\sum_t \pi_{t-h} M_{t-h} \pi_{t+1}} + \frac{\sum_t \pi_{t-h} M_{t-h} \varepsilon_t}{\sum_t \pi_{t-h} M_{t-h} \pi_{t+1}}$$

so that clearly

$$\hat{\beta}_{PMD} \stackrel{p}{\to} \gamma_f + \gamma_b \frac{\delta_{h-1}}{\delta_{h+1}}$$
where $\delta_h$ is the conditional covariance between $\pi_t$ and $\pi_{t-h}$, and therefore $\delta_h \to 0$ as $h \to 0$ (with positively serially correlated data).

In other words, the local projection step automatically projects the instrument $\pi_{t-h}$ onto a sub-space of omitted dynamics (through $X_{t-h}$), thus progressively sterilizing the sources of feedback that make $\pi_{t-h}$ an illegitimate instrument. The smaller $h$ is, the more the instrument is sterilized and the smaller the bias (all the way down to zero in the limit). As a consequence, a natural and complementary way to investigate model misspecification is to plot the estimates $\hat{\beta}_{PMD}$ as a function of $h$. If the model is correctly specified, the $\hat{\beta}_{PMD}(h)$ will be approximately the same for any $h$. Otherwise, fluctuations in $\hat{\beta}_{PMD}(h)$ are symptomatic of dynamic misspecification, with the $\hat{\beta}_{PMD}(h)$ estimated at the smallest values of $h$ being the less precise but more nearly consistent estimates of $\gamma_f$.
We conclude this section by remarking that estimates of the optimal weighting matrix in
GMM (a key element in constructing efficient standard errors) are notoriously problematic: non-
parametric spectral density estimators at frequency zero tend to have poor small sample properties
(see e.g. Christiano and Den Haan, 1996). In contrast, the assumption that the data has a Wold
representation allows us to provide a simple, analytic expression for the estimate of this matrix
with good small sample properties (given the general assumptions in the propositions we present
below).
Finally, we comment on the relationship between PMD and MLE by observing that the Wold representation, under the common assumption of Gaussianity, is a complete representation of all the data's second-order properties and hence, as the truncation horizon $H \to \infty$, PMD approaches MLE. A similar result exists for GMM: if one were to use infinitely many moment conditions, one would attain MLE's efficiency bound. However, most covariance-stationary processes exhibit serial correlation that decays exponentially toward zero (think of an AR(1) with parameter 0.5, whose autocorrelations at lags 1, 2, 3, 4 are 0.5, 0.25, 0.125, 0.0625, ...), and in practice one can achieve parameter estimation efficiency similar to MLE with relatively small values of $H$, as
the small Monte Carlo experiments of section 5 demonstrate.
4 Statistical Properties of PMD
This section derives the large-sample properties of our estimator in a general setting. For this reason, the notation differs slightly from that of section 2. We begin by showing that the first-stage local projection estimates are consistent and asymptotically normal under general conditions, and then show that the second-stage estimators are also consistent and asymptotically normal.
4.1 Asymptotic Properties of Local Projections: First Stage
Suppose the $n \times 1$ vector $y_t$ is covariance-stationary with Wold representation given by

$$y_t = \mu + \sum_{j=0}^{\infty} B_j u_{t-j} \quad (16)$$

where the $u_t$ are i.i.d., mean zero, with finite covariance matrix $\Sigma_u$, and the $B_j$ satisfy $\sum_{j=0}^{\infty} ||B_j|| < \infty$, where $||B_j||^2 = \mathrm{tr}(B_j'B_j)$ and $B_0 = I_n$. Further, assume $\det\{B(z)\} \neq 0$ for $|z| \leq 1$, where $B(z) = \sum_{j=0}^{\infty} B_j z^j$, so that the process can be written in its infinite VAR representation

$$y_t = \sum_{j=1}^{\infty} A_j y_{t-j} + u_t$$

with $\sum_{j=1}^{\infty} ||A_j|| < \infty$. The $h$-step-ahead projection of $y_{t+h}$ on $k$ lags and its truncation error can then be written as

$$y_{t+h} = A_1^h y_t + \ldots + A_k^h y_{t-k+1} + v_{k,t+h}$$

$$v_{k,t+h} = \sum_{j=k+1}^{\infty} A_j^h y_{t-j} + u_{t+h} + \sum_{j=1}^{h-1} B_j u_{t+h-j}$$
Proposition 1 (Consistency and Asymptotic Normality). Let $\{y_t\}$ satisfy (16) and assume that:

(i) $E|u_{it} u_{jt} u_{kt} u_{lt}| < \infty$ for all $i, j, k, l$;

(ii) $k$ satisfies $k^3/T \to 0$ as $T, k \to \infty$;

(iii) $k$ satisfies $\sqrt{T-k-H} \sum_{j=k+1}^{\infty} ||A_j|| \to 0$ as $T, k \to \infty$.

Then

$$\sqrt{T-k-H}\;\mathrm{vec}(\hat{B}_T - B_0) \stackrel{d}{\to} N(0, \Omega_b), \qquad \Omega_b = \left[(X'MX)^{-1} \otimes \Sigma_v\right], \qquad \hat{\Sigma}_v = \frac{\hat{V}'\hat{V}}{T-k-H}$$

where, recall, $\hat{B}_T = (X'MX)^{-1}(X'MY)$; $Y$ is the $T \times nH$ matrix of observations for $(y_{t+1}, \ldots, y_{t+H})'$; $X$ is the $T \times n$ matrix of observations for $y_t$; $M = I - Z(Z'Z)^{-1}Z'$, where $Z$ is the $T \times n(k+1)$ matrix of observations for $(1, y_{t-1}, \ldots, y_{t-k+1})'$; and $\hat{V} = MY - MX\hat{B}_T$. The proof is provided in the appendix. Notice that we have modified the dimensions of $\hat{B}_T$ with respect to section 2 to make the derivations here and in the appendix more straightforward, but without loss of generality.
4.2 Statistical Properties of Projection Minimum Distance: Second Step
Given $\hat{B}_T$ (and hence $\hat{b}_T = \mathrm{vec}(\hat{B}_T)$), consider estimating $\gamma$ as described in section 2 by minimizing

$$\min_\gamma \hat{Q}_T(\hat{b}_T; \gamma) = f(\hat{b}_T; \gamma)' \, \hat{W} f(\hat{b}_T; \gamma).$$

Let $Q_0(\gamma)$ denote the objective function at $b_0$. Then the following lemma shows that the solution of this problem, $\hat{\gamma}_T$, is consistent for $\gamma_0$.

Lemma 3 (Consistency). Given that $\hat{b}_T \stackrel{p}{\to} b_0$ from proposition 1, assume that:

(i) $\hat{W} \stackrel{p}{\to} W$, a positive semidefinite matrix;
(ii) $Q_0(\gamma)$ is uniquely minimized at $(b_0, \gamma_0) = \theta_0 \in \Theta$;

(iii) the parameter space $\Theta$ is compact;

(iv) $f(b_0, \gamma)$ is continuous in a neighborhood of $\gamma_0 \in \Theta$;

(v) instrument relevance condition: $\mathrm{rank}[WF_\gamma] = \dim(\gamma)$, where $F_\gamma = \partial f(b_0, \gamma_0)/\partial\gamma$;

(vi) identification condition: $\dim(f(\hat{b}_T; \gamma)) \geq \dim(\gamma)$.

Then

$$\hat{\gamma}_T \stackrel{p}{\to} \gamma_0.$$

The proof is provided in the appendix, where it is worth remarking that the proof takes H to be finite and given. If instead $H \to \infty$ with the sample size, then $\hat{b}_T$ becomes infinite-dimensional and one would have to appeal to higher-order conditions (such as empirical process theory and stochastic equicontinuity of $f(\hat{b}_T; \gamma)$ with respect to $\hat{b}_T$), which would make the proof more general but far less transparent. By taking H to be finite, it is relatively straightforward to show that $\hat{Q}_T(\gamma) \stackrel{p}{\to} Q_0$ uniformly (see Andrews, 1994, 1995).

Lemma 4 (Normality). Assume:
(i) $\hat{W} \stackrel{p}{\to} W$, where $W = (F_b'\Omega_b F_b)^{-1}$, a positive definite matrix, and where $F_b$ is defined as in assumption (v) below;

(ii) $\hat{b}_T \stackrel{p}{\to} b_0$ and $\hat{\gamma}_T \stackrel{p}{\to} \gamma_0$ from proposition 1 and lemma 3;

(iii) $b_0$ and $\gamma_0$ are in the interior of $\Theta$;

(iv) $f(\hat{b}_T; \gamma)$ is continuously differentiable in a neighborhood $N$ of $\theta_0$;

(v) there are $F_b$ and $F_\gamma$, continuous at $b_0$ and $\gamma_0$ respectively, such that

$$\sup_{b,\gamma \in N} ||\nabla_b f(b,\gamma) - F_b|| \stackrel{p}{\to} 0, \qquad \sup_{b,\gamma \in N} ||\nabla_\gamma f(b,\gamma) - F_\gamma|| \stackrel{p}{\to} 0;$$
(vi) for $F_\gamma = F_\gamma(\gamma_0)$, $F_\gamma' W F_\gamma$ is invertible.

Then

$$\sqrt{T-H-k}\,(\hat{\gamma}_T - \gamma_0) \stackrel{d}{\to} N(0, \Omega_\gamma), \qquad \Omega_\gamma = \left(F_\gamma' W F_\gamma\right)^{-1}.$$

The proof is provided in the appendix using the same principles required to derive the proof
of asymptotic normality typical of GMM and minimum distance problems (see, e.g., Newey and McFadden, 1994; Wooldridge, 1994). We have taken the simpler route here of brushing aside weak-instrument conditions and problems, such as those discussed in Bekker (1994), Staiger and Stock (1997), Stock, Wright and Yogo (2002), and many others, with assumption (v) in Lemma 3. We felt it was more useful to provide the foundational results first; since the weak-instrument problems that can arise with projection minimum distance are similar in nature to those already investigated in the literature in a GMM context, we refer the reader to that literature directly. In practice, we recommend choosing the optimal impulse response horizon using the information criterion in Hall et al. (2007), whose formula appears in expression (11). In finite samples, all asymptotic expressions can be replaced by their usual small-sample estimates. Lastly, we note that F_b is a function of γ, and hence the optimal weighting matrix Ŵ = (F_b Ω_b F_b')^{-1} cannot be computed directly. However, a consistent estimate of γ can first be obtained with the equal-weights matrix Ŵ = I (lemma 3 only requires W to be positive semidefinite for consistency), and this preliminary estimate can then be used to construct the optimal-weights estimator and hence compute all the relevant statistics. In principle, one can iterate on this procedure to refine the estimates of γ, although asymptotically one iteration is sufficient. Finally, lemma 4 and standard results are all that is needed to show that a test of overidentifying restrictions is easily obtained by noting that the minimum distance function Q̂_T evaluated at the optimum (b̂_T, γ̂_T) has a chi-square distribution with dim(f(b̂_T; γ)) − dim(γ) degrees of freedom.
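To fix ideas, the two-step procedure just described can be sketched in a few lines when the matching conditions are linear in γ (as in the example of section 2). All names and dimensions below are hypothetical; this is an illustration of the mechanics, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)
dim_f, dim_g = 6, 2                        # matching conditions vs parameters
gamma0 = np.array([0.6, 0.3])              # "true" parameters (made up)
G = rng.normal(size=(dim_f, dim_g))        # stand-in for the Jacobian F_gamma
Omega = 0.0005 * np.eye(dim_f)             # stand-in covariance of the conditions
g = G @ gamma0 + rng.multivariate_normal(np.zeros(dim_f), Omega)

def md_estimate(g, G, W):
    # closed-form minimizer of (g - G @ gamma)' W (g - G @ gamma)
    return np.linalg.solve(G.T @ W @ G, G.T @ W @ g)

gamma_eq = md_estimate(g, G, np.eye(dim_f))     # step 1: equal weights
W_opt = np.linalg.inv(Omega)                    # step 2: optimal weights (in
gamma_opt = md_estimate(g, G, W_opt)            #  practice rebuilt from step 1)

resid = g - G @ gamma_opt
J = resid @ W_opt @ resid                       # chi-square, dim_f - dim_g dof
```

The J statistic in the last line is the overidentification test described above: under correct specification it is chi-square with dim(f) − dim(γ) degrees of freedom.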
5 Small-Sample Properties: Monte Carlo Experiments
This section contains Monte Carlo experiments designed to show that PMD is computationally convenient without incurring significant efficiency losses relative to MLE in models whose likelihood requires numerical algorithms for its maximization; that PMD provides more efficient but similarly unbiased estimates relative to GMM when the specification of the model is correct; and that PMD can be more robust than GMM to certain types of misspecification due to illegitimate instrument problems. We showcase these features with two experiments: the first compares estimates of a simple ARMA(1,1) model obtained by PMD and by MLE; the second generates data from an extended version of the New Keynesian model introduced in section 2 and compares the small-sample properties of New Keynesian hybrid Phillips curve estimates obtained by PMD and by GMM.
5.1 PMD vs. MLE
The data for this set of experiments are generated from the univariate ARMA(1,1) model

y_t = ρ y_{t−1} + ε_t + θ ε_{t−1},  ε_t ∼ N(0, 0.5)

for the following four pairs of parameter values: (1) ρ = 0.25, θ = 0.50; (2) ρ = 0.50, θ = 0.25; (3) ρ = 0, θ = 0.5; and (4) ρ = 0.5, θ = 0. The last two cases are a pure MA(1) and a pure AR(1) model, respectively, but they are specified as ARMA(1,1) models in the estimation.
Each of the 1,000 simulation runs has the following features. We use 500 burn-in replications to avoid initialization issues; sample sizes are T = 50, 100, and 400. The lag length of the local projection step is determined automatically by AICC, a correction to AIC for autoregressive models introduced by Hurvich and Tsai (1989) with better small-sample properties than alternative information criteria. For the minimum distance step, we experiment with fixed values H = 2, 5, and 10. For H = 2 the model is just-identified; otherwise there are overidentifying restrictions.
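A minimal sketch of one such simulation run, assuming a simple least-squares AR fit for the lag-selection step (illustrative only, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_arma11(rho, theta, T, burn=500, sigma2=0.5):
    # y_t = rho*y_{t-1} + e_t + theta*e_{t-1}, discarding `burn` initial draws
    e = rng.normal(0.0, np.sqrt(sigma2), size=T + burn + 1)
    y = np.zeros(T + burn + 1)
    for t in range(1, T + burn + 1):
        y[t] = rho * y[t - 1] + e[t] + theta * e[t - 1]
    return y[-T:]

def aicc_lag(y, max_lag=8):
    # fit AR(p) by least squares for each p and return the AICc-minimizing lag
    best_p, best_crit = 1, np.inf
    for p in range(1, max_lag + 1):
        X = np.column_stack([y[p - j - 1: len(y) - j - 1] for j in range(p)])
        X = np.column_stack([np.ones(len(y) - p), X])
        Y = y[p:]
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        resid = Y - X @ beta
        n, q = len(Y), p + 1
        crit = n * np.log(resid @ resid / n) + 2 * q * n / (n - q - 1)  # AICc
        if crit < best_crit:
            best_p, best_crit = p, crit
    return best_p

y = simulate_arma11(rho=0.25, theta=0.50, T=400)
p_star = aicc_lag(y)
```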
Given our choices of ρ and θ, in all 4 cases the impulse response coefficients are already very close to 0 at horizon H = 5. Hence, by including the case where H = 10 we hope to capture
possible distortions to the parameter estimates of ρ and θ generated by first stage estimates that
have virtually zero information content (akin to having weak instruments). It is worth remarking
that while MLE requires numerical optimization routines, PMD for this example requires two very
simple least-squares steps. Tables 1.1-1.4 summarize the experiments by reporting Monte Carlo
averages and standard errors of the parameter estimates calculated with the analytic formulas of
the large-sample approximations. In addition, empirical Monte Carlo standard errors are provided
as a check that the formulas provide appropriate values.
The tables show that PMD estimates converge to the true parameter values as the sample size grows at roughly the same speed as, or faster than, MLE estimates. This is true even for the small sample T = 50, although when H = 10 there is, not surprisingly, a clear deterioration of the PMD estimates. What is surprising, though, is that the effect of having a large number of conditions with little information value (H = 10 rather than H = 2 or 5) does not appear to distort the estimates (or the standard errors) with sample sizes as low as T = 100 observations. For sample sizes T = 100 and 400, PMD and MLE standard errors are virtually the same (when H = 5, 10) and comparable to the empirical Monte Carlo values. Finally, we remark that in tables 1.3 and 1.4 (the pure MA(1) and AR(1) DGPs) we had difficulty getting the MLE estimator to converge in all runs. Instead of trying to redo (or disregard) specific runs, we preferred to leave those results blank as a way to highlight that, although MLE runs into numerical difficulties, PMD is numerically stable and robust in all cases. Thus, a fair summary of these experiments is that PMD has very good small-sample properties, converging quickly to the theoretical values with roughly the same efficiency as MLE, even though PMD uses simple least-squares algebra while MLE requires numerical routines to maximize the likelihood.
5.2 PMD vs. GMM
This set of experiments borrows several elements from the simulation study in Lindé (2005), whose objective was to compare the small-sample properties of GMM and FIML estimation of the New Keynesian hybrid Phillips curve (such as expression (3)). Here we simulate data from a slightly modified version of the New Keynesian model discussed in section 2, equations (3)-(5), and compare GMM to PMD instead. Specifically, data will be generated from the model

π_t = γ_f E_t π_{t+1} + γ_{1b} π_{t−1} + γ_{2b} π_{t−2} + γ_g g_t + ε_{π,t}
g_t = β_f E_t g_{t+1} + β_{1b} g_{t−1} + β_{2b} g_{t−2} − β_r (R_t − E_t π_{t+1}) + ε_{g,t}
R_t = (1 − ρ)(ω_π π_t + ω_g g_t) + ρ R_{t−1} + ε_{R,t}

with shock processes

ε_{π,t} = u_{π,t}
ε_{g,t} = ρ_g ε_{g,t−1} + u_{g,t}
ε_{R,t} = ρ_R ε_{R,t−1} + u_{R,t}
u_t = (u_{π,t}, u_{g,t}, u_{R,t})' ∼ N(0, diag(0.5², 0.288², 0.252²))

for different combinations of parameters to be made explicit shortly. First, however, notice that we modified the Phillips and IS curves slightly to include one more lag than is conventional. We provide no theoretical justification for this but use this device to generate small distortions to the canonical specification and to check the robustness of PMD and GMM to dynamic misspecification. Hence, some of the simulations are conducted with γ_f + γ_{1b} = 1 and γ_{2b} = 0 (and similarly for the IS curve parameters), which is the standard specification. In other experiments, we simply set γ_f + γ_{1b} + γ_{2b} = 1 and γ_{1b} = γ_{2b} (and similarly for the IS curve parameters) to induce additional serial correlation.
Most of the parameter choices are borrowed from Lindé (2005), to which we refer the reader for a more careful justification. We investigate three primary combinations of parameters:

1. γ_f = β_f = 0.7; γ_{1b} = β_{1b} = 0.3 or γ_{1b} = γ_{2b} = β_{1b} = β_{2b} = 0.15; γ_g = 0.13; and β_r = 0.09

2. γ_f = β_f = 0.5; γ_{1b} = β_{1b} = 0.5 or γ_{1b} = γ_{2b} = β_{1b} = β_{2b} = 0.25; γ_g = 0.25; and β_r = 0.30

3. γ_f = β_f = 0.3; γ_{1b} = β_{1b} = 0.7 or γ_{1b} = γ_{2b} = β_{1b} = β_{2b} = 0.35; γ_g = 0.40; and β_r = 1
The Taylor rule parameters are the same in all cases, with ρ = 0.5, ω_π = 1.5, and ω_g = 0.5, and the shock processes are allowed to take the two pairs of values ρ_g = 0.5 and ρ_R = 0.8, or ρ_g = ρ_R = 0. The latter case is included as a benchmark since in that case a standard New Keynesian hybrid Phillips curve specification estimated by GMM, using as instruments lagged values of the endogenous variables (including lags of R_t), is correct and should provide estimates of the parameters close to the theoretical values. Like Lindé (2005), we originally experimented with allowing R_t itself to be part of the instrument set. However, we found the distortions to the GMM estimates to be so considerable relative to the PMD estimates that we decided to include, for completeness, estimates that use only lagged values of R_t.
1,000 Monte Carlo runs are generated with the different combinations of parameters described above, for a total of 36 different cases summarized in tables 2.1.1-2.3.2. Each run is initialized with 500 burn-in replications, after which a sample of 200 observations (as in Lindé, 2005) is generated. Because we argued in section 3 that dynamic misspecification is best detected through variation in the parameter estimates as a function of the chosen impulse response horizon H, we report estimates based on 2 impulse response horizons and based on impulse response horizons optimally selected with Hall et al.'s (2007) information criterion. At the same time, we compute GMM estimates based on the same lags for comparison purposes. The lag length of the first-stage local projections is automatically selected by AICC, as in our previous experiments.
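The first-stage local projections can be sketched as follows for a univariate series (illustrative; the paper's implementation is multivariate and uses the estimated impulse responses as the b̂_T input to the minimum distance step):

```python
import numpy as np

rng = np.random.default_rng(2)
e = rng.normal(size=300)
y = np.zeros(300)
for t in range(1, 300):                    # AR(1) data for illustration
    y[t] = 0.5 * y[t - 1] + e[t]

def local_projection_irf(y, H, k):
    # for each h, regress y_{t+h} on [1, y_t, ..., y_{t-k+1}]; the coefficient
    # on y_t estimates the impulse response at horizon h (Jordà, 2005)
    T = len(y)
    irf = [1.0]                            # horizon-0 response normalized to one
    for h in range(1, H + 1):
        Y = y[k - 1 + h: T]
        X = np.column_stack([y[k - 1 - j: T - h - j] for j in range(k)])
        X = np.column_stack([np.ones(len(Y)), X])
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        irf.append(beta[1])
    return np.array(irf)

irf = local_projection_irf(y, H=5, k=2)    # roughly 0.5**h for this AR(1)
```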
It would be tedious to comment on each of the numerous cases investigated, but some general lessons are apparent. First, when the shocks are i.i.d. (so that there are no distortions to the internal dynamics of the model) and we examine the case we label "Benchmark" (with the traditional dynamic specification), both PMD and GMM provide good estimates, although estimates of the output gap parameter of the Phillips curve tend to be somewhat downward
biased with GMM, but to a much lesser extent with PMD. In virtually all cases, PMD estimates are more efficient than their GMM counterparts, as we had anticipated. Generally speaking, using R_t as an instrument turns out to be a very bad idea for GMM estimation: as tables 2.1.2, 2.2.2, and 2.3.2 make clear, R_t is an invalid instrument. This is less of a problem for PMD because all instruments are essentially orthogonalized with respect to past information, which largely resolves the issue. When R_t is excluded from the instrument list, GMM performs much better, and we concentrate on these tables next (tables 2.1.1, 2.2.1, and 2.3.1).
Whether the dynamic structure is modified by allowing serial correlation in the shocks, richer dynamics in the Phillips curve, or richer dynamics in the IS curve, both PMD and GMM have more difficulty obtaining accurate estimates of the parameters, especially the output gap parameter. For example, in table 2.1.1 the additional serial correlation in the structural inflation Euler equation is enough to cause estimates of the output parameter to flip sign, although more generally we simply observed estimates that were downward biased. Distortions to the degree of forward/backward-looking behavior of the Phillips curve were, on the other hand, much more muted, although the distortions obtained with GMM tend to be considerably larger than with PMD. In these cases, the parameter estimates changed quite a bit with the number of instruments included, an indication of specification problems along the lines anticipated in our discussion of section 3.
Overall, while PMD was not a universal panacea for every foreseeable type of misspecification, we obtained estimates with smaller bias than GMM in the majority of cases. When the model was correctly specified, there was little difference between the methods, but even here PMD was less biased and provided more efficient estimates. The introduction of relatively small distortions in the dynamic behavior of the model was enough to generate considerable distortions in the estimation of the output gap parameter, which plays a very prominent role in this literature. In almost every case considered, the distortion caused the parameter to be downward biased. PMD
mitigates this bias somewhat with respect to GMM but not to the extent that would have been
desirable.
6 Empirical Application: Fuhrer and Olivei (2005) Revisited
Estimating the Phillips and IS curves in expressions (3) and (4) by limited-information methods is
difficult due to the poor small-sample properties of popular estimators. Fuhrer and Olivei (2005)
discuss the weak instrument problem that characterizes GMM in this type of application and
then propose a GMM variant where the dynamic constraints of the economic model are imposed
on the instruments to improve small sample performance. They dub this procedure “optimal
instruments” GMM (OI−GMM) and explore its properties relative to conventional GMM and
MLE estimators with Monte Carlo experiments.
We find it useful to apply PMD to the same examples Fuhrer and Olivei (2005) analyze, to provide the reader a context of comparison for our method. The basic specification is (using the
same notation as in Fuhrer and Olivei, 2005):
z_t = (1 − μ) z_{t−1} + μ E_t z_{t+1} + γ E_t x_t + ε_t    (18)
In the output Euler equation, z_t is a measure of the output gap, x_t is a measure of the real interest rate, and hence γ < 0. In the inflation Euler version of (18), z_t is a measure of inflation, x_t is a measure of the output gap, and γ > 0, signifying that a positive output gap exerts "demand pressure" on inflation.
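As a point of comparison, the conventional limited-information approach replaces the expectations in (18) with realized leads and instruments them with lagged variables. A minimal 2SLS sketch of that mechanics follows; the data-generating process and instrument list are hypothetical, and this is not the paper's PMD estimator:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 400
x = np.zeros(T)
z = np.zeros(T)
for t in range(1, T):                      # hypothetical data-generating process
    x[t] = 0.8 * x[t - 1] + rng.normal(scale=0.5)
    z[t] = 0.9 * z[t - 1] + 0.1 * x[t] + rng.normal(scale=0.2)

# regressors: constant, z_{t-1}, z_{t+1} (realized lead), x_t
t_idx = np.arange(4, T - 1)
X = np.column_stack([np.ones(len(t_idx)), z[t_idx - 1], z[t_idx + 1], x[t_idx]])
# instruments: constant plus lags 1..3 of z and of x
Z = np.column_stack([np.ones(len(t_idx))]
                    + [z[t_idx - j] for j in (1, 2, 3)]
                    + [x[t_idx - j] for j in (1, 2, 3)])
y = z[t_idx]

# 2SLS: project the regressors on the instruments, then least squares
Xhat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
beta = np.linalg.solve(Xhat.T @ X, Xhat.T @ y)   # (const, 1-mu, mu, gamma)
```

Weak correlation between the instruments and the lead z_{t+1} is precisely the weak-instrument problem Fuhrer and Olivei (2005) document for this class of estimators.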
Fuhrer and Olivei (2005) experiment with a quarterly sample from 1966:Q1 to 2001:Q4 and use the following measures for z_t and x_t. The output gap is measured either by the log deviation of real GDP from its Hodrick-Prescott (HP) trend or by its deviation from a segmented time trend (ST) with breaks in 1974 and 1995. Real interest rates are measured by the difference between the federal funds rate and next period's inflation. Inflation is measured by the log change in the chain-weighted GDP price index. In addition, Fuhrer and Olivei (2005) experiment with real unit labor costs (RULC) instead of the output gap for the inflation Euler equation. Further details can be found in their paper.
Table 3.1 and figure 1 summarize the empirical estimates of the output Euler equation and correspond to the results in table 4 of Fuhrer and Olivei (2005), whereas table 3.2 and figure 2 summarize the estimates of the inflation Euler equation and correspond to the results in their table 5. For each Euler equation, we report the original GMM, MLE, and OI-GMM estimates and, below these, the PMD results based on choosing h with Hall et al.'s (2007) information criterion. The top panels of figures 1 and 2 display the estimates of μ and γ in (18) as a function of h and the associated two-standard-error bands. The bottom left panel displays the value of Hall et al.'s (2007) information criterion and the bottom right panel, the p-value of the overidentifying restrictions misspecification test.
Since the true model is unknowable, there is no definitive metric by which one method can be judged to offer closer estimates to the true parameter values. Rather, we wish to investigate in which ways PMD coincides with or departs from results that have been well studied in the literature. We begin by reviewing the estimates for the output Euler equation reported in table 3.1 and figure 1. PMD estimates of μ are close to GMM estimates, with similar standard errors, and not very different from MLE or OI-GMM. On the other hand, PMD estimates of γ are slightly larger in magnitude, of the correct sign, and statistically significant. This would seem like good news; however, as figure 1 shows, while the estimates of μ appear to be somewhat stable to the choice of h, the estimates of γ are positive for any h < 7. This suggests that estimates of γ should be taken with caution, as the model is likely dynamically misspecified (although the misspecification test does not suggest anything evident).
Estimates of the inflation Euler equation follow a similar pattern. For all three specifications, μ and γ are estimated to be similar to the GMM estimates, but in all three specifications the misspecification test rejects the model very clearly. Figure 2 shows that while estimates of μ are relatively stable, estimates of γ for the HP and ST specifications are negative for virtually any h. The RULC specification suggests γ is mostly positive (with γ negative only for h = 3 and 4). Overall, the results suggest caution, since every indication (from the overidentifying restrictions tests to the plots of the parameter estimates as a function of h) is that the model is dynamically misspecified.
With the exception of the inflation Euler model estimated with RULC, we find that the data reject most of the specifications commonly estimated (either outright, as indicated by the overidentifying restrictions test, or because of the variation of the parameter estimates as a function of h). The ability to check model specification by these two complementary methods is useful (especially in instances when the data do not reject the model but variation in parameter estimates for low values of h is substantial). With some notable exceptions, PMD estimates are often close to estimates obtained by other methods but with smaller standard errors, so that at a minimum we are able to ascertain that our results are not driven by extreme differences.
7 Conclusion
This paper introduces a disarmingly simple and novel method of estimation for macroeconomic data. Several features make it appealing: (1) for many models, including some whose likelihood would require numerical optimization routines, PMD only requires simple least-squares algebra; (2) for many models, PMD approximates the maximum likelihood estimator in relatively small samples; (3) moreover, PMD is efficient in finite samples because it accounts for serial correlation in a convenient parametric way; (4) as a consequence, PMD is generally more efficient than GMM; (5) PMD provides an unsupervised method of conditioning for unknown omitted dynamics that in many cases mitigates invalid instrument problems; and (6) PMD provides many natural statistics with which to evaluate estimates of a model, including an overall misspecification test and a way to assess which parameter estimates are most sensitive to misspecification.
The paper provides basic but generally applicable asymptotic results and ample Monte Carlo
evidence in support of our claims. In addition, the empirical application provides a natural
example of how PMD may be applied in practice. However, there are many research questions
that space considerations prevented us from exploring. Throughout the paper, we have mentioned
some of them, such as the need for a more detailed investigation of the power properties of the
misspecification test in light of the GMM literature; and generalizations of our basic assumptions
in the main theorems.
Other natural extensions include nonlinear generalizations of the local projection step to extend beyond the Wold assumption. Such generalizations are likely to be quite approachable because local projections lend themselves well to more complex specifications. Similarly, we have excluded processes that are not covariance-stationary, mainly because they require slightly different assumptions on their infinite-order representation and because the non-standard nature of the asymptotics is beyond the scope of this paper. In the end, we hope that the main contribution of the paper is to provide applied researchers with a new method of estimation that is simpler than many others available, while at the same time more robust and informative.
8 Appendix

8.1 Definitions and Notation

We find it useful to begin by defining and collecting the notation that we use in the proofs of the propositions and lemmas introduced above. Specifically:

(i) X_{t,k} (kn × 1) = (y_t', y_{t−1}', ..., y_{t−k+1}')'

(ii) Y_{t,H} (Hn × 1) = (y_{t+1}', ..., y_{t+H}')'

(iii) M_{t−1,k} (1 × 1) = 1 − X_{t−1,k}' ( Σ_{t=k}^{T−H} X_{t−1,k} X_{t−1,k}' )^{-1} X_{t−1,k}

(iv) Γ̂(j) (n × n) = (T − k − H)^{-1} Σ_{t=k}^{T−H} y_t y_{t−j}'

(v) Γ̂(j|1−k) (n × n) = (T − k − H)^{-1} Σ_{t=k}^{T−H} y_t M_{t−1,k} y_{t−j}'

(vi) Γ̂_k (kn × kn) = (T − k − H)^{-1} Σ_{t=k}^{T−H} X_{t,k} X_{t,k}'

(vii) Γ̂_{1−k,h} (kn × n) = (T − k − H)^{-1} Σ_{t=k}^{T−H} X_{t,k} y_{t+h}', h = 1, ..., H

(viii) Γ̂_{1−H|1−k} (Hn × n) = (T − k − H)^{-1} Σ_{t=h}^{T−H} Y_{t,H} M_{t−1,k} y_t'
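For concreteness, the sample moment matrices in (i), (vi), and (vii) can be computed as follows for a univariate series (n = 1); this is an illustrative sketch with made-up data, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(4)
T, n = 200, 1
y = rng.normal(size=(T, n))                # stand-in data, T x n
k, H = 4, 3
scale = 1.0 / (T - k - H)

def X_tk(t):
    # X_{t,k} = (y_t', y_{t-1}', ..., y_{t-k+1}')', a (k*n)-vector
    return np.concatenate([y[t - j] for j in range(k)])

# Gamma_k: (kn x kn) sample second-moment matrix of X_{t,k}
Gamma_k = scale * sum(np.outer(X_tk(t), X_tk(t)) for t in range(k, T - H))
# Gamma_{1-k,h}: (kn x n) cross-moments of X_{t,k} with y_{t+h}, h = 1..H
Gamma_1k_h = [scale * sum(np.outer(X_tk(t), y[t + h]) for t in range(k, T - H))
              for h in range(1, H + 1)]
```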
8.2 Proof of Proposition 1

The mean-square error linear predictor of y_{t+h} based on y_t, ..., y_{t−k+1} is Â(k,h) X_{t,k}, where Â(k,h) (n × kn) is given by the least-squares formula

Â(k,h) = (Â_1^h, ..., Â_k^h) = Γ̂_{1−k,h}' Γ̂_k^{-1}.   (19)

Notice that

Â(k,h) − A(k,h) = Γ̂_{1−k,h}' Γ̂_k^{-1} − A(k,h) Γ̂_k Γ̂_k^{-1} = { (T − k − h)^{-1} Σ_{t=k}^{T−h} v_{k,t+h} X_{t,k}' } Γ̂_k^{-1},

where

v_{k,t+h} = Σ_{j=k+1}^∞ A_j^h y_{t−j} + u_{t+h} + Σ_{j=1}^{h−1} B_j u_{t+h−j}.

Hence,

Â(k,h) − A(k,h) = { (T − k − h)^{-1} Σ_{t=k}^{T−h} ( Σ_{j=k+1}^∞ A_j^h y_{t−j} ) X_{t,k}' } Γ̂_k^{-1}
  + { (T − k − h)^{-1} Σ_{t=k}^{T−h} u_{t+h} X_{t,k}' } Γ̂_k^{-1}
  + { (T − k − h)^{-1} Σ_{t=k}^{T−h} ( Σ_{j=1}^{h−1} B_j u_{t+h−j} ) X_{t,k}' } Γ̂_k^{-1}.
Define the matrix norms ||C||² = tr(C'C) and ||C||_1² = sup_{l≠0} l'C'Cl / l'l, the latter being the largest eigenvalue of C'C. When C is symmetric, ||C||_1² is the square of the largest eigenvalue of C. Then

||AB||² ≤ ||A||_1² ||B||²  and  ||AB||² ≤ ||A||² ||B||_1².

Hence

||Â(k,h) − A(k,h)|| ≤ ||U_{1T}|| ||Γ̂_k^{-1}||_1 + ||U_{2T}|| ||Γ̂_k^{-1}||_1 + ||U_{3T}|| ||Γ̂_k^{-1}||_1,

where

U_{1T} = (T − k − h)^{-1} Σ_{t=k}^{T−h} ( Σ_{j=k+1}^∞ A_j^h y_{t−j} ) X_{t,k}'
U_{2T} = (T − k − h)^{-1} Σ_{t=k}^{T−h} u_{t+h} X_{t,k}'
U_{3T} = (T − k − h)^{-1} Σ_{t=k}^{T−h} ( Σ_{j=1}^{h−1} B_j u_{t+h−j} ) X_{t,k}'.

Lewis and Reinsel (1985) show that ||Γ̂_k^{-1}||_1 is bounded; therefore, the next objective is to show that ||U_{1T}|| →p 0, ||U_{2T}|| →p 0, and ||U_{3T}|| →p 0. We begin by showing ||U_{2T}|| →p 0, which is easiest to see since u_{t+h} and X_{t,k} are independent, so that their covariance is zero. Formally, and following similar derivations in Lewis and Reinsel (1985),

E(||U_{2T}||²) = (T − k − h)^{-2} Σ_{t=k}^{T−h} E(u_{t+h} u_{t+h}') E(X_{t,k}' X_{t,k})

by independence. Hence

E(||U_{2T}||²) = (T − k − h)^{-1} tr(Σ_u) k tr[Γ(0)].

Since k/(T − k − H) → 0 by assumption (ii), E(||U_{2T}||²) → 0, and hence ||U_{2T}|| →p 0.

Next, consider ||U_{3T}|| →p 0. The proof is very similar since u_{t+h−j}, j = 1, ..., h − 1, and X_{t,k} are independent. As long as ||B_j||² is finite for each j = 1, ..., h − 1 (which follows from Σ_{j=0}^∞ ||B_j|| < ∞), the same argument as for U_{2T} applies.
Finally, we show that ||U_{1T}|| →p 0. The objective here is to show that assumption (iii) implies that

k^{1/2} Σ_{j=k+1}^∞ ||A_j^h|| → 0 as k, T → ∞,

because we will need this condition to hold to complete the proof later. Recall that

A_j^h = B_{h−1} A_j + A_{j+1}^{h−1};  A_{j+1}^0 = 0;  B_0 = I_r;  h, j ≥ 1, h finite.

Hence

k^{1/2} Σ_{j=k+1}^∞ ||A_j^h|| = k^{1/2} Σ_{j=k+1}^∞ ||B_{h−1} A_j + B_{h−2} A_{j+1} + ... + B_1 A_{j+h−2} + A_{j+h−1}||

by recursive substitution. Thus

k^{1/2} Σ_{j=k+1}^∞ ||A_j^h|| ≤ k^{1/2} Σ_{j=k+1}^∞ { ||B_{h−1} A_j|| + ... + ||B_1 A_{j+h−2}|| + ||A_{j+h−1}|| }.

Define λ = max{||B_{h−1}||, ..., ||B_1||}; then, since Σ_{j=0}^∞ ||B_j|| < ∞, λ is finite, and the right-hand side is bounded by (1 + (h − 1)λ) k^{1/2} Σ_{j=k+1}^∞ ||A_j||, which converges to zero by assumption (iii). This establishes the required condition and completes the proof.
8.3 Proof of Proposition 2

Notice that

Â(k,h) − A(k,h) = { (T − k − h)^{-1} Σ_{t=k}^{T−h} v_{k,t+h} X_{t,k}' } Γ̂_k^{-1}
 = (T − k − h)^{-1} [ Σ_{t=k}^{T−h} { ( Σ_{j=k+1}^∞ A_j^h y_{t−j} ) + u_{t+h} + Σ_{j=1}^{h−1} B_j u_{t+h−j} } X_{t,k}' ] Γ̂_k^{-1}
 = (T − k − h)^{-1} { Σ_{t=k}^{T−h} ( Σ_{j=k+1}^∞ A_j^h y_{t−j} ) X_{t,k}' } { Γ_k^{-1} + (Γ̂_k^{-1} − Γ_k^{-1}) }
  + (T − k − h)^{-1} { Σ_{t=k}^{T−h} ( u_{t+h} + Σ_{j=1}^{h−1} B_j u_{t+h−j} ) X_{t,k}' } { Γ_k^{-1} + (Γ̂_k^{-1} − Γ_k^{-1}) }.

Hence, the strategy of the proof consists in showing that the first term in the sum above vanishes in probability, so that

(T − k − h)^{1/2} vec[ Â(k,h) − A(k,h) ] →p (T − k − h)^{1/2} vec[ (T − k − h)^{-1} { Σ_{t=k}^{T−h} ( u_{t+h} + Σ_{j=1}^{h−1} B_j u_{t+h−j} ) X_{t,k}' } Γ_k^{-1} ],

and then all we need to do is show that this last term is asymptotically normal. First we prove the convergence in probability result in this last expression. Define

U_{1T} = (T − k − h)^{-1} Σ_{t=k}^{T−h} ( Σ_{j=k+1}^∞ A_j^h y_{t−j} ) X_{t,k}'
U*_{2T} = (T − k − h)^{-1} Σ_{t=k}^{T−h} ( u_{t+h} + Σ_{j=1}^{h−1} B_j u_{t+h−j} ) X_{t,k}';

then

(T − k − h)^{1/2} vec[ Â(k,h) − A(k,h) ] = (T − k − h)^{1/2} { vec[U_{1T} Γ_k^{-1}] + vec[U_{1T}(Γ̂_k^{-1} − Γ_k^{-1})] + vec[U*_{2T} Γ_k^{-1}] + vec[U*_{2T}(Γ̂_k^{-1} − Γ_k^{-1})] },

hence

(T − k − h)^{1/2} vec[ Â(k,h) − A(k,h) ] − (T − k − h)^{1/2} vec[U*_{2T} Γ_k^{-1}]
 = (T − k − h)^{1/2} { vec[U_{1T} Γ_k^{-1}] + vec[U_{1T}(Γ̂_k^{-1} − Γ_k^{-1})] + vec[U*_{2T}(Γ̂_k^{-1} − Γ_k^{-1})] }
 = (Γ_k^{-1} ⊗ I_r) vec[(T − k − h)^{1/2} U_{1T}]
  + { (Γ̂_k^{-1} − Γ_k^{-1}) ⊗ I_r } vec[(T − k − h)^{1/2} U_{1T}]
  + { (Γ̂_k^{-1} − Γ_k^{-1}) ⊗ I_r } vec[(T − k − h)^{1/2} U*_{2T}].

Define, with a slight change in the order of the summands,

W_{1T} = { (Γ̂_k^{-1} − Γ_k^{-1}) ⊗ I_r } vec[(T − k − h)^{1/2} U_{1T}]
W_{2T} = { (Γ̂_k^{-1} − Γ_k^{-1}) ⊗ I_r } vec[(T − k − h)^{1/2} U*_{2T}]
W_{3T} = (Γ_k^{-1} ⊗ I_r) vec[(T − k − h)^{1/2} U_{1T}].

The proof proceeds by showing that W_{1T} →p 0, W_{2T} →p 0, and W_{3T} →p 0.

We begin by showing that W_{1T} →p 0. Lewis and Reinsel (1985) show that under assumption (ii), k^{1/2} ||Γ̂_k^{-1} − Γ_k^{-1}||_1 →p 0, and E(||k^{-1/2}(T − k − h)^{1/2} U_{1T}||) ≤ s (T − k − h)^{1/2} Σ_{j=k+1}^∞ ||A_j^h|| → 0 as k, T → ∞ from assumption (iii), using derivations similar to those in the proof of consistency, with s a generic constant. Hence W_{1T} →p 0.

Next, we show W_{2T} →p 0. Notice that

|W_{2T}| ≤ k^{1/2} ||Γ̂_k^{-1} − Γ_k^{-1}||_1 ||k^{-1/2}(T − k − h)^{1/2} U*_{2T}||.

As in the previous step, Lewis and Reinsel (1985) establish that k^{1/2} ||Γ̂_k^{-1} − Γ_k^{-1}||_1 →p 0 and, from the proof of consistency, we know the second term is bounded in probability, which is all we need to establish the result.

Lastly, we need to show W_{3T} →p 0; however, the proof of this result is identical to that in Lewis and Reinsel (1985) once one realizes that assumption (iii) implies

(T − k − h)^{1/2} Σ_{j=k+1}^∞ ||A_j^h|| → 0

and substitutes this result into their proof.

The asymptotic normality result then follows directly from Lewis and Reinsel (1985) by redefining

A_{Tm} = (T − k − h)^{1/2} vec[ { (T − k − h)^{-1} Σ_{t=k}^{T−h} ( u_{t+h} + Σ_{j=1}^{h−1} B_j u_{t+h−j} ) X_{t,k}(m)' } Γ_k^{-1} ]

for m = 1, 2, ..., with X_{t,k}(m) as defined in Lewis and Reinsel (1985), and using their proof.
8.4 Proof of Lemma 3

Since b̂_T →p b_0,

f(b̂_T; φ) →p f(b_0; φ)

by the continuous mapping theorem, since by assumption (iv) f(·) is continuous. Furthermore, and given assumption (i),

Q̂_T(φ) = f(b̂_T; φ)' Ŵ f(b̂_T; φ) →p f(b_0; φ)' W f(b_0; φ) ≡ Q_0(φ),

which is a quadratic expression that is minimized at φ_0. Assumption (vi) provides a necessary condition for identification of the parameters (i.e., that there be at least as many matching conditions as parameters) that must be satisfied to establish uniqueness. As a quadratic function, Q_0(φ) is obviously continuous. The last thing to show is that the convergence is uniform:

sup_{φ∈Θ} | Q̂_T(φ) − Q_0(φ) | →p 0.

For compact Θ and continuous Q_0(φ), Lemma 2.8 in Newey and McFadden (1994) provides that this condition holds if and only if Q̂_T(φ) →p Q_0(φ) for all φ in Θ and Q̂_T(φ) is stochastically equicontinuous (see footnote 1). The former has already been established, so it remains to show stochastic equicontinuity of Q̂_T(φ). Whether Q̂_T(φ) is stochastically equicontinuous depends on each application and, specifically, on the properties of and assumptions made on the specific nature of f(·). In the example that we use in this paper and presented in section 2, the function f(·) is rather trivial (linear in the parameters), so that proving uniform convergence is easy and we do not really require stochastic equicontinuity (since we have assumed that H is finite and not a function of T). In general, we directly assume here that stochastic equicontinuity holds and refer the reader to Andrews (1994, 1995) for examples and sets of specific conditions that apply even when b is infinite-dimensional and for more general forms of f(·).
8.5 Proof of Lemma 4

Under assumption (iii), b_0 and γ_0 are in the interior of their parameter spaces, and by assumption (ii), b̂_T →p b_0 and γ̂_T →p γ_0. Further, by assumption (iv), f(b̂_T; γ) is continuously differentiable in a neighborhood of b_0 and γ_0, and hence γ̂_T solves the first-order conditions of the minimum-distance problem

min_γ f(b̂_T; γ)' Ŵ f(b̂_T; γ),

which are

F_γ(b̂_T; γ)' Ŵ f(b̂_T; γ) = 0.

By assumption (iv), these first-order conditions can be expanded about γ_0 in a mean value expansion

f(b̂_T; γ̂_T) = f(b̂_T; γ_0) + F_γ(b̂_T; γ̄)(γ̂_T − γ_0),

where γ̄ ∈ [γ̂_T, γ_0]. Similarly, a mean value expansion of f(b̂_T; γ_0) around b_0 is

f(b̂_T; γ_0) = f(b_0; γ_0) + F_b(b̄; γ_0)(b̂_T − b_0).

Combining both mean value expansions and multiplying by √T, we have

√T f(b̂_T; γ̂_T) = √T f(b_0; γ_0) + F_γ(b̂_T; γ̄) √T (γ̂_T − γ_0) + F_b(b̄; γ_0) √T (b̂_T − b_0).

Since b̄ ∈ [b̂_T, b_0], γ̄ ∈ [γ̂_T, γ_0], and b̂_T →p b_0, γ̂_T →p γ_0, then, along with assumption (iv), we have

F_γ(b̂_T; γ̄) →p F_γ(b_0; γ_0) = F_γ
F_b(b̄; γ_0) →p F_b(b_0; γ_0) = F_b

and hence

√T f(b̂_T; γ̂_T) = √T f(b_0; γ_0) + F_γ √T (γ̂_T − γ_0) + F_b √T (b̂_T − b_0) + o_p(1).

In addition, by assumption (i), Ŵ →p W; noticing that f(b_0, γ_0) = 0 and combining this with the first-order conditions and the mean value expansions described above, we can write

−F_γ' W [ F_γ √T (γ̂_T − γ_0) + F_b √T (b̂_T − b_0) ] = o_p(1).

Since we know that

√T (b̂_T − b_0) →d N(0, Ω_b)

by proposition 2, then

√T (γ̂_T − γ_0) = −(F_γ' W F_γ)^{-1} (F_γ' W F_b) √T (b̂_T − b_0) + o_p(1)

by assumption (vi), which ensures that F_γ' W F_γ is invertible. Therefore, from the previous expression we arrive at

√T (γ̂_T − γ_0) →d N(0, Ω_γ)

Ω_γ = (F_γ' W F_γ)^{-1} (F_γ' W F_b Ω_b F_b' W F_γ) (F_γ' W F_γ)^{-1}.

Notice that, since we are using the optimal weighting matrix W = (F_b Ω_b F_b')^{-1}, the previous expression simplifies considerably to

Ω_γ = (F_γ' W F_γ)^{-1}.

Footnote 1. Stochastic equicontinuity: for every ε, η > 0 there exist a sequence of random variables Δ̂_T and a sample size t_0 such that for t ≥ t_0, Prob(|Δ̂_T| > ε) < η, and for each φ there is an open set N containing φ with sup_{φ̃∈N} | Q̂_T(φ̃) − Q̂_T(φ) | ≤ Δ̂_T for t ≥ t_0.
ReferencesAn, Sungbae and Frank Schorfheide (2007) “Bayesian Analysis of DSGE Models,” Econo-metric Reviews, 26(2-4): 113-172.
Andrews, Donald W. K. (1994) “Asymptotics for Semiparametric Econometric Models viaStochastic Equicontinuity,” Econometrica, 62(1): 43-72.
Andrews, Donald W. K. (1995) “Non-parametric Kernel Estimation for SemiparametricModels,” Econometric Theory, 11(3): 560-596.
Bekker, Paul A. (1994) “Alternative Approximations to the Distribution of InstrumentalVariable Estimators,” Econometrica, 62: 657-681.
Calvo, Guillermo A. (1983) “Staggered Prices in a Utility Maximizing Framework,” Journalof Monetary Economics, 12: 383-98.
Cameron, A. Colin and Pravin K. Trivedi (2005) Microeconometrics: Methods andApplications. Cambridge: Cambridge University Press.
Canova, Fabio (2007) Methods for Applied Macroeconomic Research. Princeton:Princeton University Press.
Christiano, Lawrence J. and Wouter den Haan (1996) “Small-Sample Properties of GMMfor Business Cycle Analysis,” Journal of Business and Economic Statistics, 14(3): 309-327.
Christiano, Lawrence J., Martin Eichenbaum, and Charles L. Evans (2005) “Nominal Rigidi-ties and the Dynamic Effects of a Shock to Monetary Policy,” Journal of Political Economy,113(1): 1-45.
Clarida, Richard, Jordi Galí and Mark Gertler (1999) “The Science of Monetary Policy: ANew Keynesian Perspective,” Journal of Economic Literature, 37(4): 1661-1707.
Eichenbaum, Martin and Jonas D. M. Fisher (2007) “Estimating the Frequency of PriceRe-optimization in Calvo-style Models,” Journal of Monetary Economics, 54(7): 2032-2047.
Fernández-Villaverde, Jesús, Juan F. Rubio-Ramírez, Thomas J. Sargent and Mark W. Wat-son (2007) “A, B, Cs (and Ds) of Understanding VARs,” American Economic Review,97(3):1021-1026.
Fuhrer, Jeffrey C. and Giovanni P. Olivei (2005) “Estimating Forward-Looking Euler Equa-tions with GMM Estimators: An Optimal Instruments Approach,” in Models and Mon-etary Policy: Research in the Tradition of Dale Henderson, Richard Porter,and Peter Tinsley, Board of Governors of the Federal Reserve System: Washington, DC,87-104.
37
-
Galí, Jordi and Mark Gertler (1999) “Inflation Dynamics: A Structural Econometric Approach,” Journal of Monetary Economics, 44(2): 195-222.
Galí, Jordi, Mark Gertler and David J. López-Salido (2001) “European Inflation Dynamics,” European Economic Review, 45(7): 1237-1270 (Erratum, September 2002).
Galí, Jordi, Mark Gertler and David J. López-Salido (2005) “Robustness of the Estimates of the Hybrid New Keynesian Phillips Curve,” Journal of Monetary Economics, 52(6): 1107-1118.
Gonçalves, Silvia and Lutz Kilian (2006) “Asymptotic and Bootstrap Inference for AR(∞) Processes with Conditional Heteroskedasticity,” Econometric Reviews, forthcoming.
Hall, Alastair, Atsushi Inoue, James M. Nason, and Barbara Rossi (2007) “Information Criteria for Impulse Response Function Matching Estimation of DSGE Models,” Duke University, mimeo.
Hurvich, Clifford M. and Chih-Ling Tsai (1989) “Regression and Time Series Model Selection in Small Samples,” Biometrika, 76(2): 297-307.
Jordà, Òscar (2005) “Estimation and Inference of Impulse Responses by Local Projections,” American Economic Review, 95(1): 161-182.
Kurmann, André (2005) “Quantifying the Uncertainty about a Forward-Looking New Keynesian Pricing Model,” Journal of Monetary Economics, 52(6): 1119-1134.
Kuersteiner, Guido M. (2005) “Automatic Inference for Infinite Order Vector Autoregressions,” Econometric Theory, 21: 85-115.
Lewis, R. A. and Gregory C. Reinsel (1985) “Prediction of Multivariate Time Series by Autoregressive Model Fitting,” Journal of Multivariate Analysis, 16(3): 393-411.
Levin, Andrew T. and John C. Williams (2003) “Robust Monetary Policy with Competing Reference Models,” Journal of Monetary Economics, 50(5): 945-975.
Lindé, Jesper (2005) “Estimating New Keynesian Phillips Curves: A Full Information Maximum Likelihood Approach,” Journal of Monetary Economics, 52(6): 1135-1149.
Newey, Whitney K. and Daniel L. McFadden (1994) “Large Sample Estimation and Hypothesis Testing,” in Handbook of Econometrics, v. 4, Robert F. Engle and Daniel L. McFadden, (eds.). Amsterdam: North Holland.
Newey, Whitney K. and Kenneth D. West (1987) “A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix,” Econometrica, 55: 703-708.
Rotemberg, Julio J. and Michael Woodford (1997) “An Optimization-Based Econometric Framework for the Evaluation of Monetary Policy,” NBER Macroeconomics Annual, 297-346.
Rudd, Jeremy and Karl Whelan (2005) “New Tests of the New Keynesian Phillips Curve,” Journal of Monetary Economics, 52(6): 1167-1181.
Sbordone, Argia (2002) “Prices and Unit Labor Costs: Testing Models of Pricing Behavior,” Journal of Monetary Economics, 49(2): 265-292.
Smets, Frank and Raf Wouters (2003) “An Estimated Dynamic Stochastic General Equilibrium Model of the Euro Area,” Journal of the European Economic Association, 1(5): 1123-1175.
Staiger, Douglas and James H. Stock (1997) “Instrumental Variables Regression with Weak Instruments,” Econometrica, 65(3): 557-586.
Stock, James H., Jonathan H. Wright and Motohiro Yogo (2002) “A Survey of Weak Instruments and Weak Identification in Generalized Method of Moments,” Journal of Business and Economic Statistics, 20(4): 518-529.
Walsh, Carl E. (2003) Monetary Theory and Policy, second edition. Cambridge,Massachusetts: The MIT Press.
TABLE 1.1 – ARMA(1,1) Monte Carlo Experiments
T = 50            h = 2          h = 5          h = 10
                  ρ      θ      ρ      θ       ρ      θ
PMD   Est.       0.23   0.49   0.25   0.44    0.31   0.28
      SE         0.22   0.20   0.20   0.19    0.20   0.18
      SE (MC)    0.31   0.27   0.21   0.20    0.22   0.28
MLE   Est.       0.22   0.52   0.23   0.52    0.22   0.53
      SE         0.21   0.18   0.20   0.18    0.20   0.18
      SE (MC)    0.27   0.24   0.27   0.23    0.27   0.23

T = 100           h = 2          h = 5          h = 10
                  ρ      θ      ρ      θ       ρ      θ
PMD   Est.       0.24   0.50   0.25   0.47    0.27   0.45
      SE         0.15   0.14   0.15   0.13    0.14   0.13
      SE (MC)    0.17   0.15   0.15   0.13    0.15   0.15
MLE   Est.       0.25   0.51   0.24   0.51    0.24   0.50
      SE         0.14   0.13   0.14   0.13    0.14   0.13
      SE (MC)    0.15   0.13   0.16   0.14    0.14   0.14

T = 400           h = 2          h = 5          h = 10
                  ρ      θ      ρ      θ       ρ      θ
PMD   Est.       0.25   0.51   0.25   0.50    0.25   0.50
      SE         0.07   0.07   0.07   0.06    0.07   0.06
      SE (MC)    0.08   0.07   0.07   0.07    0.07   0.07
MLE   Est.       0.25   0.50   0.25   0.50    0.24   0.51
      SE         0.07   0.06   0.07   0.07    0.07   0.06
      SE (MC)    0.07   0.06   0.07   0.07    0.07   0.06

Notes: 1,000 Monte Carlo replications, 1st-stage regression lag length chosen automatically by AICC. SE refers to the standard error calculated with the PMD/MLE formula. SE (MC) refers to the Monte Carlo standard error based on the 1,000 estimates of the parameter. 500 burn-in observations disregarded when generating the data.
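The data-generating process behind these experiments can be sketched in a few lines. The snippet below simulates one Monte Carlo draw of an ARMA(1,1) process with the 500-observation burn-in described in the notes. The recursion y_t = ρ·y_{t-1} + ε_t + θ·ε_{t-1} with Gaussian shocks is a standard ARMA(1,1) form; the parameter values ρ = 0.25, θ = 0.5 are an assumption suggested by the estimates in Table 1.1, and the function name is illustrative.

```python
import numpy as np

def simulate_arma11(rho, theta, T, burn_in=500, seed=0):
    """Simulate y_t = rho*y_{t-1} + e_t + theta*e_{t-1} with N(0,1) shocks,
    discarding the first `burn_in` observations as in the tables' notes."""
    rng = np.random.default_rng(seed)
    n = T + burn_in
    e = rng.standard_normal(n)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = rho * y[t - 1] + e[t] + theta * e[t - 1]
    return y[burn_in:]  # keep only the last T observations

# One replication of the T = 400 design (assumed DGP values).
y = simulate_arma11(rho=0.25, theta=0.5, T=400)
```

In the full experiment this draw would be repeated 1,000 times, with PMD and MLE estimates of (ρ, θ) collected on each replication.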
TABLE 1.2 – ARMA(1,1) Monte Carlo Experiments
T = 50            h = 2          h = 5          h = 10
                  ρ      θ      ρ      θ       ρ      θ
PMD   Est.       0.46   0.23   0.47   0.17    0.49   0.15
      SE         0.19   0.20   0.18   0.19    0.18   0.18
      SE (MC)    0.23   0.23   0.21   0.22    0.20   0.28
MLE   Est.       0.45   0.29   0.44   0.27    0.45   0.29
      SE         0.20   0.20   0.20   0.21    0.20   0.20
      SE (MC)    0.21   0.23   0.23   0.25    0.19   0.22

T = 100           h = 2          h = 5          h = 10
                  ρ      θ      ρ      θ       ρ      θ
PMD   Est.       0.48   0.23   0.47   0.23    0.50   0.23
      SE         0.13   0.14   0.13   0.14    0.12   0.13
      SE (MC)    0.15   0.16   0.14   0.16    0.13   0.18
MLE   Est.       0.48   0.27   0.47   0.25    0.48   0.26
      SE         0.14   0.14   0.14   0.15    0.13   0.14
      SE (MC)    0.14   0.15   0.13   0.15    0.13   0.14

T = 400           h = 2          h = 5          h = 10
                  ρ      θ      ρ      θ       ρ      θ
PMD   Est.       0.50   0.5    0.49   0.26    0.49   0.25
      SE         0.07   0.07   0.06   0.07    0.06   0.07
      SE (MC)    0.07   0.08   0.07   0.08    0.06   0.07
MLE   Est.       0.50   0.25   0.49   0.26    0.49   0.26
      SE         0.07   0.07   0.07   0.07    0.07   0.07
      SE (MC)    0.06   0.07   0.07   0.07    0.06   0.07

Notes: 1,000 Monte Carlo replications, 1st-stage regression lag length chosen automatically by AICC. SE refers to the standard error calculated with the PMD/MLE formula. SE (MC) refers to the Monte Carlo standard error based on the 1,000 estimates of the parameter. 500 burn-in observations disregarded when generating the data.
TABLE 1.3 – ARMA(1,1) Monte Carlo Experiments
T = 50            h = 2          h = 5          h = 10
                  ρ      θ      ρ      θ       ρ      θ
PMD   Est.      -0.06   0.56   0.06   0.40    0.16   0.28
      SE         0.36   0.32   0.27   0.25    0.25   0.22
      SE (MC)    0.61   0.55   0.28   0.29    0.31   0.37
MLE   Est.        -      -      -      -       -      -
      SE          -      -      -      -       -      -
      SE (MC)     -      -      -      -       -      -

T = 100           h = 2          h = 5          h = 10
                  ρ      θ      ρ      θ       ρ      θ
PMD   Est.      -0.03   0.54   0.04   0.45    0.09   0.41
      SE         0.24   0.21   0.19   0.18    0.19   0.17
      SE (MC)    0.33   0.30   0.21   0.21    0.22   0.23
MLE   Est.        -      -      -      -       -      -
      SE          -      -      -      -       -      -
      SE (MC)     -      -      -      -       -      -

T = 400           h = 2          h = 5          h = 10
                  ρ      θ      ρ      θ       ρ      θ
PMD   Est.      -0.01   0.51   0.00   0.50    0.02   0.48
      SE         0.11   0.10   0.10   0.09    0.10   0.09
      SE (MC)    0.11   0.10   0.10   0.09    0.09   0.09
MLE   Est.       0.04   0.50   0.00   0.50    0.00   0.50
      SE         0.10   0.09   0.10   0.09    0.10   0.08
      SE (MC)    0.10   0.09   0.10   0.09    0.09   0.08

Notes: 1,000 Monte Carlo replications, 1st-stage regression lag length chosen automatically by AICC. SE refers to the standard error calculated with the PMD/MLE formula. SE (MC) refers to the Monte Carlo standard error based on the 1,000 estimates of the parameter. 500 burn-in observations disregarded when generating the data.
TABLE 1.4 – ARMA(1,1) Monte Carlo Experiments
T = 50            h = 2          h = 5          h = 10
                  ρ      θ      ρ      θ       ρ      θ
PMD   Est.       0.47   0.04   0.43   0.03    0.54  -0.10
      SE         0.28   0.30   0.24   0.26    0.21   0.23
      SE (MC)    0.40   0.40   0.24   0.26    0.24   0.30
MLE   Est.        -      -      -      -       -      -
      SE          -      -      -      -       -      -
      SE (MC)     -      -      -      -       -      -

T = 100           h = 2          h = 5          h = 10
                  ρ      θ      ρ      θ       ρ      θ
PMD   Est.       0.49   0.01   0.45   0.03    0.53  -0.04
      SE         0.19   0.20   0.17   0.18    0.15   0.17
      SE (MC)    0.20   0.20   0.19   0.19    0.18   0.21
MLE   Est.       0.49  -0.02   0.47   0.03    0.47   0.03
      SE         0.17   0.20   0.18   0.20    0.18   0.20
      SE (MC)    0.18   0.20   0.19   0.20    0.18   0.20

T = 400           h = 2          h = 5          h = 10
                  ρ      θ      ρ      θ       ρ      θ
PMD   Est.       0.50   0.01   0.50   0.00    0.50   0.00
      SE         0.09   0.10   0.08   0.09    0.08   0.09
      SE (MC)    0.09   0.10   0.10   0.10    0.10   0.11
MLE   Est.       0.49   0.01   0.49   0.01    0.48   0.02
      SE         0.09   0.10   0.09   0.10    0.09   0.10
      SE (MC)    0.09   0.10   0.09   0.10    0.09   0.10

Notes: 1,000 Monte Carlo replications, 1st-stage regression lag length chosen automatically by AICC. SE refers to the standard error calculated with the PMD/MLE formula. SE (MC) refers to the Monte Carlo standard error based on the 1,000 estimates of the parameter. 500 burn-in observations disregarded when generating the data.
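The tables' notes state that the first-stage regression lag length is chosen automatically by AICC, the corrected AIC of Hurvich and Tsai (1989). A minimal sketch of that selection step, assuming a univariate AR fitted by OLS and the common form AICc = AIC + 2k(k+1)/(T - k - 1); the function name and the maximum lag `p_max` are illustrative, and the effective sample is allowed to shrink with p for simplicity:

```python
import numpy as np

def ar_aicc(y, p_max=8):
    """Select an AR lag length by the corrected AIC (AICc).
    Fits AR(p) by OLS for p = 1..p_max and returns the order
    minimizing AICc = AIC + 2k(k+1)/(T - k - 1)."""
    best_p, best_crit = None, np.inf
    n = len(y)
    for p in range(1, p_max + 1):
        Y = y[p:]
        # column j holds lag (j+1): y_{t-1}, ..., y_{t-p}
        X = np.column_stack([y[p - j - 1 : n - j - 1] for j in range(p)])
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        resid = Y - X @ beta
        T, k = len(Y), p
        sigma2 = resid @ resid / T
        aic = T * np.log(sigma2) + 2 * k
        crit = aic + 2 * k * (k + 1) / (T - k - 1)  # small-sample correction
        if crit < best_crit:
            best_p, best_crit = p, crit
    return best_p
```

In small samples the correction term penalizes long lags more heavily than plain AIC, which is why the notes use it with T as small as 50.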
Table 2.1.1 Monte Carlo Comparison: GMM vs. PMD.
Case 1:
D.G.P.
Instrument list excludes Rt
                 PMD                 GMM                 PMD                 GMM
            h=2      h*=6      h=2      h*=6      h=2      h*=6      h=2      h*=6
Benchmark
   0.703    0.689    0.697    0.678    0.718    0.692    0.705    0.689
  (0.038)  (0.030)  (0.041)  (0.036)  (0.026)  (0.021)  (0.032)  (0.030)
   0.297    0.312    0.303    0.322    0.281    0.307    0.295    0.311
  (0.038)  (0.071)  (0.041)  (0.036)  (0.026)  (0.021)  (0.032)  (0.030)
   0.127    0.102    0.108    0.098    0.078    0.074    0.078    0.064
  (0.097)  (0.030)  (0.119)  (0.102)  (0.036)  (0.027)  (0.048)  (0.045)
Lagged PC
   0.332    0.503    0.507    0.267    0.422    0.453    0.698    0.369
  (0.411)  (0.135)  (0.350)  (0.219)  (0.313)  (0.109)  (0.295)  (0.196)
   0.140    0.160    0.080    0.098    0.154    0.192    0.071    0.116
  (0.100)  (0.051)  (0.082)  (0.075)  (0.079)  (0.040)  (0.072)  (0.064)
  -0.063   -0.071   -0.039   -0.068    0.107    0.079    0.082    0.118
  (0.216)  (0.100)  (0.184)  (0.174)  (0.087)  (0.042)  (0.083)  (0.078)
Lagged IS
   0.708    0.682    0.697    0.680    0.714    0.684    0.704    0.682
  (0.041)  (0.030)  (0.043)  (0.036)  (0.026)  (0.019)  (0.031)  (0.029)
   0.291    0.317    0.303    0.320    0.285    0.316    0.296    0.317
  (0.041)  (0.030)  (0.043)  (0.036)  (0.026)  (0.019)  (0.031)  (0.029)
   0.082    0.082    0.067    0.043    0.043    0.043    0.040    0.037
  (0.151)  (0.101)  (0.189)  (0.151)  (0.052)  (0.036)  (0.065)  (0.062)

Notes: 1,000 Monte Carlo replications. Each run initialized with 500 burn-in replications later disregarded. Sample size T = 200. Monte Carlo median values of the parameter estimates and the associated standard errors reported. “Lagged PC” refers to when the DGP consists of a Phillips curve with first and second lag inflation terms. Similarly, “Lagged IS” refers to when the DGP consists of a Phillips curve with first and second lag output gap terms. h = 2 uses the first 2 horizons of the impulse response function when estimating with PMD (to obtain over-identification) and corresponds to using the first two lags of the variables as instruments when estimating by GMM. h* refers to the optimal horizon selected by Hall et al.’s (2007) information criterion and varies with the model. “Benchmark” and “Lagged IS” cases impose . “Lagged PC” case estimates these parameters unconstrained (since in the DGP but ).
Table 2.1.2 Monte Carlo Comparison: GMM vs. PMD.
Case 1:
D.G.P.
Instrument list includes Rt
                 PMD                 GMM                 PMD                 GMM
            h=2      h*=6      h=2      h*=6      h=2      h*=6      h=2      h*=6
Benchmark
   0.710    0.684    0.969    0.919    0.720    0.694    0.896    0.860
  (0.037)  (0.031)  (0.054)  (0.046)  (0.026)  (0.021)  (0.037)  (0.035)
   0.290    0.316    0.031    0.080    0.280    0.306    0.104    0.140
  (0.037)  (0.031)  (0.054)  (0.046)  (0.026)  (0.021)  (0.037)  (0.035)
   0.130    0.110   -0.427   -0.332    0.076    0.076   -0.024   -0.019
  (0.094)  (0.070)  (0.165)  (0.134)  (0.036)  (0.027)  (0.059)  (0.055)
Lagged PC
   0.318    0.491    2.303    1.094    0.410    0.453    2.373    1.172
  (0.412)  (0.139)  (0.690)  (0.206)  (0.314)  (0.113)  (0.591)  (0.192)
   0.136    0.159   -0.178    0.001    0.166    0.186   -0.219   -0.013
  (0.103)  (0.051)  (0.194)  (0.075)  (0.083)  (0.041)  (0.174)  (0.068)
  -0.062   -0.063   -0.008    0.022    0.104    0.083   -0.133    0.034
  (0.213)  (0.099)  (0.456)  (0.173)  (0.091)  (0.043)  (0.199)  (0.080)
Lagged IS
   0.715    0.686    1.04     0.957    0.715    0.686    0.921    0.879
  (0.041)  (0.030)  (0.082)  (0.058)  (0.026)  (0.020)  (0.040)  (0.036)
   0.285    0.314   -0.044    0.042    0.284    0.314    0.079    0.121
  (0.041)  (0.030)  (0.082)  (0.058)  (0.026)  (0.020)  (0.040)  (0.036)
   0.076    0.082   -1.180   -0.809    0.051    0.051   -0.125   -0.123
  (0.154)  (0.102)  (0.374)  (0.245)  (0.052)  (0.036)  (0.088)  (0.080)

Notes: 1,000 Monte Carlo replications. Each run initialized with 500 burn-in replications later disregarded. Sample size T = 200. Monte Carlo median values of the parameter estimates and the associated standard errors reported. “Lagged PC” refers to when the DGP consists of a Phillips curve with first and second lag inflation terms. Similarly, “Lagged IS” refers to when the DGP consists of a Phillips curve with first and second lag output gap terms. h = 2 uses the first 2 horizons of the impulse response function when estimating with PMD (to obtain over-identification) and corresponds to using the first two lags of the variables as instruments when estimating by GMM. h* refers to the optimal horizon selected by Hall et al.’s (2007) information criterion and varies with the model. “Benchmark” and “Lagged IS” cases impose . “Lagged PC” case estimates these parameters unconstrained (since in the DGP but ).
Table 2.2.1 Monte Carlo Comparison: GMM vs. PMD.
Case 2:
D.G.P.
Instrument list excludes Rt
                 PMD                 GMM                 PMD                 GMM
            h=2      h*=6      h=2      h*=6      h=2      h*=6      h=2      h*=6
Benchmark
   0.509    0.510    0.516    0.523    0.569    0.543    0.557    0.550
  (0.035)  (0.028)  (0.052)  (0.046)  (0.027)  (0.021)  (0.037)  (0.035)
   0.491    0.490    0.484    0.476    0.431    0.456    0.443    0.450
  (0.035)  (0.028)  (0.052)  (0.046)  (0.027)  (0.021)  (0.037)  (0.035)
   0.251    0.220    0.229    0.197    0.159    0.164    0.159    0.148
  (0.040)  (0.032)  (0.073)  (0.064)  (0.027)  (0.022)  (0.042)  (0.040)
Lagged PC
   0.356    0.509    0.619    0.318    0.556    0.560    0.888    0.581
  (0.385)  (0.124)  (0.328)  (0.205)  (0.166)  (0.079)  (0.207)  (0.152)
   0.228    0.251    0.084    0.131    0.280    0.345    0.085    0.187
  (0.113)  (0.052)  (0.098)  (0.078)  (0.071)  (0.036)  (0.078)  (0.065)
   0.079    0.040    0.028    0.061    0.121    0.088    0.050    0.141
  (0.119)  (0.050)  (0.106)  (0.088)  (0.063)  (0.033)  (0.086)  (0.069)
Lagged IS
   0.547    0.535    0.534    0.542    0.571    0.557    0.559    0.554
  (0.057)  (0.032)  (0.054)  (0.045)  (0.030)  (0.019)  (0.036)  (0.033)
   0.452    0.465    0.466    0.457    0.429    0.442    0.441    0.446
  (0.057)  (0.032)  (0.054)  (0.045)  (0.030)  (0.019)  (0.036)  (0.033)
   0.126    0.145    0.142    0.118    0.093    0.109    0.098    0.090
  (0.103)  (0.048)  (0.093)  (0.078)  (0.041)  (0.025)  (0.049)  (0.046)

Notes: 1,000 Monte Carlo replications. Each run initialized with 500 burn-in replications later disregarded. Sample size T = 200. Monte Carlo median values of the parameter estimates and the associated standard errors reported. “Lagged PC” refers to when the DGP consists of a Phillips curve with first and second lag inflation terms. Similarly, “Lagged IS” refers to when the DGP consists of a Phillips curve with first and second lag output gap terms. h = 2 uses the first 2 horizons of the impulse response function when estimating with PMD (to obtain over-identification) and corresponds to using the first two lags of the variables as instruments when estimating by GMM. h* refers to the optimal horizon selected by Hall et al.’s (2007) information criterion and varies with the model. “Benchmark” and “Lagged IS” cases impose . “Lagged PC” case estimates these parameters unconstrained (since in the DGP but ).
Table 2.2.2 Monte Carlo Comparison: GMM vs. PMD.
Case 2:
D.G.P.
Instrument list includes Rt
                 PMD                 GMM                 PMD                 GMM
            h=2      h*=6      h=2      h*=6      h=2      h*=6      h=2      h*=6
Benchmark
   0.509    0.505    0.931    0.887    0.570    0.545    0.793    0.765
  (0.035)  (0.028)  (0.053)  (0.046)  (0.027)  (0.021)  (0.035)  (0.0