Why You Should Never Use the Hodrick-Prescott Filter∗
James D. Hamilton
Department of Economics, UC San Diego
July 30, 2016
Revised: May 13, 2017
ABSTRACT
Here’s why. (1) The HP filter produces series with spurious dynamic relations that have no
basis in the underlying data-generating process. (2) Filtered values at the end of the sample are
very different from those in the middle, and are also characterized by spurious dynamics. (3) A
statistical formalization of the problem typically produces values for the smoothing parameter
vastly at odds with common practice, e.g., a value for λ far below 1600 for quarterly data. (4)
There’s a better alternative. A regression of the variable at date t + h on the four most recent
values as of date t offers a robust approach to detrending that achieves all the objectives sought
by users of the HP filter with none of its drawbacks.
—————————————————————————————————-
∗I thank Daniel Leff for outstanding research assistance on this project and Frank Diebold,
Robert King, James Morley, and anonymous referees for helpful comments on an earlier draft of
this paper .
1 Introduction.
Often economic researchers have a theory that is specified in terms of a stationary environment,
and wish to relate the theory to observed nonstationary data without modeling the nonstationar-
ity. Hodrick and Prescott (1981, 1997) proposed a very popular method for doing this, commonly
interpreted as decomposing an observed variable into trend and cycle. Although drawbacks to
their approach have been known for some time, the method continues today to be very widely
adopted in academic research, policy studies, and analysis by private-sector economists. For
this reason it seems useful to collect and expand on those earlier concerns here and note that
there is a better way to solve this problem.
2 Characterizations of the Hodrick-Prescott filter.
Given T observations on a variable yt, Hodrick and Prescott (1981, 1997) proposed interpreting
the trend component gt as a very smooth series that does not differ too much from the observed
yt.1 It is calculated as
min{gt}Tt=−1
{∑Tt=1(yt − gt)
2 + λ∑T
t=1[(gt − gt−1)− (gt−1 − gt−2)]2}. (1)
When the smoothness penalty λ→ 0, gt would just be the series yt itself, whereas when λ→∞
the procedure amounts to a regression on a linear time trend (that is, produces a series whose
second difference is exactly 0). The common practice is to use a value of λ = 1600 for quarterly
time series.
1 Phillips and Jin (2015) reviewed the rich prior history of generalizations of this approach.
1
A closed-form expression for the resulting series can be written in vector notation by defining
T = T + 2, y(T×1)
= (yT , yT−1, ..., y1)′, g
(T×1)= (gT , gT−1, ..., g−1)
′ and
H(T×T )
=
[IT
(T×T )0
(T×2)
]
Q(T×T )
=
1 −2 1 0 · · · 0 0 0
0 1 −2 1 · · · 0 0 0
......
...... · · · ...
......
0 0 0 0 · · · −2 1 0
0 0 0 0 · · · 1 −2 1
.
The solution to (1) is then given by2
g∗ = (H ′H + λQ′Q)−1H ′y = A∗y. (2)
The inferred trend g∗t for any date t is thus a linear function of the full set of observations on y
for all dates.
As noted by Hodrick and Prescott (1981) and King and Rebelo (1993), the identical inference
can alternatively be motivated from particular assumptions about the time-series behavior of the
trend and cycle components. Suppose our goal was to choose a value for a (T × 1) vector at
such that the estimate gt = a′ty has minimum expected squared difference from the true trend:
minatE(gt − a′ty)2. (3)
2 The appendix provides a derivation of equations (2) and (4). Cornea-Madeira (forthcoming) providedfurther details on A∗ and a convenient algorithm for calculating it.
2
The solution to this problem is the population analog to a sample regression coefficient, and is
a function of the variance of y and its covariance with g:
g = E(gy′) [E(yy′)]−1y = Ay. (4)
As an example of a particular set of assumptions we might make about these covariances, let ct
denote the cyclical component and vt the second difference of the trend component:
yt = gt + ct (5)
gt = 2gt−1 − gt−2 + vt. (6)
Suppose that we believed that vt and ct are uncorrelated white noise processes that are also
uncorrelated with (g0, g−1), and let C0 denote the (2×2) variance of (g0, g−1). These assumptions
imply a particular value for A in (4). As we let the variance of (g0, g−1) become arbitrarily large
(represented as C−10 → 0), then in every sample the inference (4) would be numerically identical
to expression (2).
Proposition 1. For λ = σ2c/σ
2v and any fixed T, under conditions (5)-(6) with ct and vt
white noise uncorrelated with each other and uncorrelated with (g0, g−1), the matrix A in (4)
converges to the matrix A∗ in (2) as C−10 → 0.
The proposition establishes that if researcher 1 sought to identify a trend by solving the
minimization problem (1) while researcher 2 found the optimal linear estimate of a trend process
that was assumed to be characterized by the particular assumption that vt and ct were both
white noise, the two researchers would arrive at the numerically identical series for trend and
cycle provided the ratio of σ2c to σ2
v assumed by researcher 2 was identical to the value of λ used
by researcher 1.
3
The Kalman smoother is an iterative algorithm for calculating the population linear projec-
tion (4) for models where the variance and covariance can be characterized by some recursive
structure.3 In this case, (5) is the observation equation and (6) is the state equation. Thus
as noted by Hodrick and Prescott, applying the Kalman smoother to the above state-space
model starting from a very large initial variance for (g0, g−1)′ offers a convenient algorithm for
calculating the HP filter, and is in fact a way that the HP filter is often calculated in practice.
Nevertheless, this observation should also be a bit troubling for users of the HP filter, in that
they never defend the claim that the particular structure assumed in Proposition 1 is an accurate
representation of the true data-generating process. Indeed, if a researcher did know for certain
that these equations were the true data-generating process, and further knew for certain the
value of the population parameter λ = σ2c/σ
2v, he would probably be unhappy with using (2) to
separate cycle from trend! The reason is that if this state-space structure was the true DGP,
the resulting estimate of the cyclical component ct = yt − gt would be white noise– it would
be random and exhibit no discernible patterns. By contrast, users of the HP filter hope to
see suggestive patterns in plots of the series that is supposed to be interpreted as the cyclical
component of yt.
Premultiplying (2) by H ′H + λQ′Q gives a system of equations whose tth element is
[1 + λ(1− L−1)2(1− L)2]g∗t = yt for t = 1, 2, ..., T − 2 (7)
for L the lag operator (Lkxt = xt−k, L−kxt = xt+k). In other words, F (L)g∗t = yt for
F (L) = 1 + λ(1− L−1)2(1− L)2. (8)
3 See for example Hamilton, 1994, equation [13.6.3].
4
The following proposition establishes some properties of this filter.4
Proposition 2. For any λ : 0 < λ <∞, the inverse of the operator (8) can be written
[F (L)]−1 = C
[1− (φ2
1/4)L
1− φ1L− φ2L2
+1− (φ2
1/4)L−1
1− φ1L−1 − φ2L
−2 − 1
](9)
where
1
1− φ1z − φ2z2
=∑∞
j=0Rj[cos(mj) + cot(m) sin(mj)]zj (10)
1
1− φ1z−1 − φ2z
−2 =∑∞
j=0Rj[cos(mj) + cot(m) sin(mj)]z−j
φ1(1− φ2) = −4φ2 (11)
(1− φ1 − φ2)2 = −φ2/λ (12)
C =−φ2
λ(1− φ21 − φ2
2 + φ31/2)
(13)
R =√−φ2
cos(m) = φ1/(2R). (14)
Roots of (1−φ1z−φ2z2) = 0 are complex and outside the unit circle, φ1 is a real number between
0 and 2, φ2 a real number between −1 and 0, and R a real number between 0 and 1.
Figure 1 plots the values of φ1 and φ2 generated by different values of λ. For λ = 1600,
φ1 = 1.777 and φ2 = −0.7994. These imply R = 0.8941, so that the absolute value of the
weights decay with a half-life of about 6 quarters while R60 = 0.0012.5
4 Related results have been developed by Singleton (1988), King and Rebelo (1989, 1993), Cogley and Nason(1995), and McElroy (2008). Unlike these papers, here I provide simple direct expressions for the values of φ1and φ2, and my analytical expressions of the HP filter entirely in terms of real parameters in (9) and (10) appearto be new.
5 The other parameters for this case are C = 0.056075, m = 0.111687 and cot(m) = 8.9164.
5
Expression (7) means that for t more than 15 years from the start or end of a sample of
quarterly data, the cyclical component ct = yt − g∗t is well approximated by
ct = λ(1− L−1)2(1− L)2g∗t =λ(1− L−1)2(1− L)2
F (L)yt =
λ(1− L)4
F (L)yt+2. (15)
As noted by King and Rebelo, obtaining the cyclical component for these observations thus
amounts to taking fourth differences of the original yt+2 and applying the operator [F (L)]−1 to
the result, so that the HP cycle might be expected to produce a stationary series as long as
fourth-differences of the original series are stationary. However, De Jong and Sakarya (2016)
noted there could still be significant nonstationarity coming from observations near the start or
end of the sample, and Phillips and Jin (2015) concluded that for commonly encountered sample
sizes, the HP filter may not successfully remove the trend even if the true series is only I(1).
3 Drawbacks to the HP filter.
3.1 Appropriateness for typical economic time series.
The presumption by users of the HP filter is that it offers a reasonable approach to detrending
for a range of commonly encountered economic time series. The leading example of a time-series
process for which we would want to be particularly convinced of the procedure’s appropriateness
would be a random walk. Simple economic theory suggests that variables such as stock prices
(Fama, 1965), futures prices (Samuelson, 1965), long-term interest rates (Sargent, 1976; Pesando,
1979), oil prices (Hamilton, 2009), consumption spending (Hall, 1978), inflation, tax rates, and
money supply growth rates (Mankiw, 1987) should all follow martingales or near martingales. To
be sure, hundreds of studies have claimed to find evidence of statistically detectable departures
from pure martingale behavior in all these series. Even so, there is indisputable evidence that
6
a random walk is often extremely hard to beat in out-of-sample forecasting comparisons, as has
been found for example by Meese and Rogoff (1983) and Cheung, Chinn, and Pascual (2005)
for exchange rates, Flood and Rose (2010) for stock prices, Atkeson and Ohanian (2001) for
inflation, or Balcilar, et al. (2015) for GDP, among many others. Certainly if we are not
comfortable with the consequences of applying the HP filter to a random walk, then we should
not be using it as an all-purpose approach to economic time series.
For yt = yt−1+εt, where εt is white noise and (1−L)yt = εt, Cogley and Nason (1995)6 noted
that expression (15) means that when the HP filter is applied to a random walk, the cyclical
component for observations near the middle of the sample will approximately be characterized
by
ct =λ(1− L)3
F (L)εt+2.
For λ = 1600 this is
ct = 89.72{−q0,t+2 +
∑∞j=0(0.8941)j[cos(0.1117j) + 8.916 sin(0.1117j)](q1,t+2−j + q2,t+2+j)
}with q0t = εt − 3εt−1 + 3εt−2 − εt−3, q1t = εt − 3.79εt−1 + 5.37εt−2 − 3.37εt−3 + 0.79εt−4),
7 and
q2t = −0.79εt+1+3.37εt−5.37εt−1+3.79εt−2−εt−3. The underlying innovations εt are completely
random and exhibit no patterns, whereas the series ct is both highly predictable (as a result of
the dependence on lags of εt−j) and will in turn predict the future (as a result of dependence
on future values of εt+j). Since the coefficients that make up [F (L)]−1 are determined solely by
the value of λ, these patterns in the cyclical component are entirely a feature of having applied
6 Harvey and Jaeger (1993) also have a related discussion.
7 The term q1t is the expansion of (1− L)3[1− (φ21/4)L]εt.
7
the HP filter to the data rather than reflecting any true dynamics of the data-generating process
itself.
For example, consider the behavior of stock prices and real consumption spending.8 The
top panels of Figure 2 show the autocorrelation functions for first-differences of these series,
confirming that there is little ability to predict either from its own past values, as we might
have expected from the literature cited at the start of this section. The lower panels show cross
correlations. Consumption has no predictive power for stocks, though stock prices may have a
modest ability to anticipate changes in aggregate consumption.
Figure 3 shows the analogous results if we tried to remove the trend by HP filtering rather
than first-differencing. The HP cyclical components of stock prices and consumption are both
extremely predictable from their own lagged values as well as each other. The rich dynamics in
these series are purely an artifact of the filter itself and tell us nothing about the underlying data-
generating process. Filtering takes us from the very clean understanding of the true properties
of these series that we can easily see in Figure 2 to the artificial set of relations that appear in
Figure 3. The values plotted in Figure 3 summarize the filter, not the data.
3.2 Properties of the one-sided HP filter.
The HP trend and cycle have an artificial ability to “predict” the future because they are by
construction a function of future realizations. One way we might try to get around this would
be to restrict the minimization problem in (3), forcing at to load only on values (yt, yt−1, ..., y1)′
8 Stock prices were measured as 100 times the natural log of the end-of-quarter value for the S&P 500 andconsumption from 100 times the natural log of real personal consumption expenditures from the U.S. NIPAaccounts. All data for this figure are quarterly for the period 1950:1 to 2016:1.
8
that have been observed as of date t, rather than also using future values as was done in the HP
filter (4). The value of this one-sided projection for date t could be calculated by taking the
end-of-sample HP-filtered series for a sample ending at t, repeated for each t.9
The top panel of Figure 4 shows the result of applying the usual two-sided HP filter to stock
prices. The trend is identified to have been essentially flat throughout the 2000s, with the pre-
recession booms and post-recession busts in stock prices viewed entirely as cyclical phenomena.
The bottom panel shows the results of applying a one-sided HP filter to the same data. This
would instead identify the trend component as rising during economic expansions and falling
during recessions. The reason is that a real-time observer would not know in early 2009, for
example, that stock prices were about to appreciate remarkably, and accordingly would have
judged much of the drop observed up to that date to be permanent. It is only with hindsight
that we are tempted to interpret the 2008 stock-market crash as a temporary phenomenon.
Making use of unknowable future values in this way is in fact a fundamental reason that HP-
filtered series exhibit the visual properties that they do, precisely because they impose patterns
that are not a feature of the data-generating process and could not be recognized in real time.
Some researchers might be attracted by the simple picture of the “long-run” component of stock
prices summarized by the top panel of Figure 4. But that picture is just something that their
imagination has imposed on the data. And the end-of-sample value obtained from the procedure
is actually quite different from what the researcher is seeing in the middle of the sample.
Moreover, although a one-sided filter would eliminate the problem of generating a series that
9 An easier way to calculate the one-sided projection is with a single pass of the Kalman filter through theentire sample for the state-space model assumed in Proposition 1 with C0 large and σ2
c/σ2v = 1600. The Kalman
filter gives the one-sided projection while the Kalman smoother gives the usual two-sided HP filter.
9
is artificially able to predict the future, changes in both the one-sided trend and its implied
cycle are readily forecastable from their own lagged values, and likewise by values of any other
variables. Again this is not a feature of the stock prices themselves, but instead is an artifact
of choosing to characterize the cycle and trend in this particular way.
3.3 Data-coherent values for λ.
A separate question is what value we should use for the smoothing parameter λ. Hodrick and
Prescott motivated their choice of λ = 1600 based on the prior belief that a large change in
the cyclical component within a quarter would be around 5%, whereas a large change in the
trend component would be around (1/8)%, suggesting a choice of λ = σ2c/σ
2v = (5/(1/8))2 =
1600. Ravn and Uhlig (2002) showed how to choose the smoothing parameter for data at other
frequencies if indeed it would be correct to use 1600 on quarterly data. These rules of thumb
are almost universally followed.
It’s worth noting that if the state-space representation in Proposition 1 were indeed an
accurate characterization of the trend that we were trying to infer, we would not need to make
up a value for λ but could in fact estimate it from the data. If for example we assumed a Normal
distribution for the innovations (vt, ct)′ we could use the Kalman filter to evaluate the likelihood
function for the observed sample (y1, ...., yT )′ and find the values for σ2v and σ2
c that maximize
the likelihood function.10 This could alternatively be given a quasi-maximum likelihood
interpretation as a GLS minimization of the squared forecast errors weighted by reciprocals of
10 See for example Hamilton (1994, equations [13.4.1]-[13.4.2]). Note that although the inferred value for thetrend gt depends only on the ratio σ2
c/σ2v, the parameters σ2
c and σ2v are separately identifiable because σ2
c canbe inferred from the average observed size of (yt − gt)2.
10
their model-implied variance.
Table 1 reports MLEs of σ2v, σ
2c , and λ for a number of commonly studied macroeconomic
series. For every one of these we would estimate a value for σ2c whose magnitude is similar to,
and in fact often smaller than, σ2v, and certainly not 1600 times as large.11 If we used a value
of λ = 1 instead of λ = 1600, the resulting series for gt would differ little from the original data
yt itself; λ = 1 implies a value for R in expression (10) of 0.48, which decays with a half-life of
less than one quarter.
Thus not only is the HP filter very inappropriate if the true process is a random walk. As
commonly applied with λ = 1600, the HP filter is not even optimal for the only example (namely
(5)-(6)) for which anyone has claimed that it might provide the ideal inference!
4 A better alternative.
Here I suggest an alternative concept of what we might mean by the cyclical component of a
possibly nonstationary series: how different is the value at date t + h from the value that we
would have expected to see based on its behavior through date t?12 This concept of the cyclical
component has several attractive features. First, as noted by den Haan (2000), the forecast
error is stationary for a wide class of nonstationary processes. Second, the primary reason that
we would be wrong in predicting the value of most macro and financial variables at a horizon
11 Nelson and Plosser (1982, pages 157-158) have also made this observation.
12 This idea is related to Beveridge and Nelson’s (1981) definition of the trend component of yt as gt = limh→∞
limp→∞
E(yt+h|yt, yt−1, .., yt−p+1) which limit exists and can be calculated provided that (1 − L)yt is a mean-zero
stationary process. Beveridge and Nelson then (somewhat curiously) interpreted the cyclical component as ct =gt−yt. By contrast, here we keep h and p fixed and take advantage of the fact that gt = E(yt+h|yt, yt−1, .., yt−p+1)exists for a broad range of nonstationary processes. We interpret the cyclical component at date t+ h as ct+h =yt+h − gt.
11
of h = 8 quarters ahead is cyclical factors such as whether a recession occurs over the next two
years and the timing of recovery from any downturn.13
While it might seem that calculating this concept of the cyclical component requires us
already to know the nature of the nonstationarity and to have the correct model for forecasting
the series, neither of these is the case. We can instead always rely on very simple forecasts
within a restricted class, namely, the population linear projection of yt+h on a constant and the
4 most recent values of y as of date t. This object exists and can be consistently estimated for
a wide range of nonstationary processes, as I now show.
4.1 Forecasting when the true process is unknown.
Suppose that the dth difference of yt is stationary for some d. For example, d = 2 would mean
that the growth rate is nonstationary but the change in the growth rate is stationary. Note the
dth difference is also stationary for any series with a deterministic time trend characterized by
a dth-order polynomial in time. For any such process we can write the value of yt+h as a linear
function of initial conditions at time t plus a stationary process. For example, when d = 1,
letting ut = ∆yt we can write
yt+h = yt + w(h)t (16)
where the stationary component is given by w(h)t = ut+1 + · · ·+ ut+h. For d = 2 and ∆2yt = ut,
yt+h = yt + h∆yt + w(h)t (17)
where now w(h)t = ut+h + 2ut+h−1 + · · ·+hut+1. This result holds for general d, as demonstrated
in the following proposition.
13 This same consideration suggests using h = 24 for monthly data and h = 2 for annual data.
12
Proposition 3. If (1− L)dyt is stationary for some d ≥ 1, then for all finite h ≥ 1,
yt+h = κ(1)h yt + κ
(2)h ∆yt + · · ·+ κ
(d)h ∆d−1yt + w
(h)t
with ∆s = (1− L)s, κ(1)` = 1 for ` = 1, 2, .. and κ
(s)j =
∑j`=1 κ
(s−1)` for s = 2, 3, ..., d and w
(h)t is
a stationary process.
It further turns out that if ∆dyt ∼ I(0) and we regress yt+h on a constant and the d most
recent values of y as of date t, the coefficients will be forced to be close to the values implied
by the coefficients κ(j)h in Proposition 3. For example, if ∆2yt is I(0) then in a regression of
yt+h on (yt, yt−1, 1)′, the fitted values will tend to yt + h(yt − yt−1) + µh for µh = E(w(h)t ) as the
sample size gets large; that is, the coefficient on yt will go to 1 + h and the coefficient on yt−1
will go to −h. The implication is that the residuals from a regression of yt+h on (yt, yt−1, 1)′ will
be stationary whenever y itself is I(2). The reason is that any other values for these coefficients
would imply a nonstationary series for the residuals, whose sum of squares become arbitrarily
large relative to those implied by the coefficients 1 + h and −h as the sample size grows large.
If ∆dyt is stationary and we regress yt+h on a constant and the p most recent values of
y as of date t for any p > d, the regression will use d of the coefficients to make sure the
residuals are stationary and the remaining p + 1 − d coefficients will be determined by the
parameters that characterize the population linear projection of the stationary variable w(h)t on
the stationary regressors (∆dyt,∆dyt−1, ...,∆
dyt−p+d+1, 1)′. The following proposition provides a
formal statement of these claims. In the proof of this proposition I have followed Stock (1994,
p. 2756) in defining a series ut to be I(0) if it has fixed mean µ and satisfies a Functional Central
13
Limit Theorem.14 This requires that the sample mean of ut has a Normal distribution as the
sample size T gets large, as does a sample mean that used only Tr observations for 0 < r ≤ 1.
Formally,
T−1/2∑[Tr]
s=1(ut − µ)⇒ ωW (r), (18)
where [Tr] denotes the largest integer less than or equal to Tr, W (r) denotes Standard Brownian
Motion, and “⇒” denotes weak convergence in probability measure. I will show that if either
the dth difference (ut = ∆dyt) satisfies (18) or if the deviation from a dth-order deterministic
polynomial in time (ut = yt − δ0 − δ1t− δ2t2 − · · · − δdtd) satisfies (18), then we can remove the
nonstationary component with the same simple regression.15
Proposition 4. Suppose that either ut = ∆dyt satisfies (18) or that ut = yt−∑d
j=0 δjtj with
δd 6= 0 satisfies (18) for some unknown d. Let xt = (yt, yt−1, ..., yt−p+1, 1)′ for some p ≥ d and
consider OLS estimation of yt+h = x′tβ + vt+h for t = 1, ..., T with estimated coefficient
β =(∑T
j=1 xtx′t
)−1 (∑Tj=1 xtyt+h
). (19)
If p = d, the OLS residuals yt+h− x′tβ converge to the variable w(h)t −E(w
(h)t ) in Proposition 3.
If p > d, the OLS residuals converge to the residuals from a population linear projection of w(h)t
on (∆dyt,∆dyt−1, ...,∆
dyt−p+d+1, 1)′.
14 Stock (1994, p. 2749) demonstrated that an example of sufficient conditions that imply (18) is that ut =µ+
∑∞j=0 ψjηt where ηt is a martingale difference sequence with variance σ2 and finite fourth moment, ψ(1) 6= 0,
and∑∞
j=0 j|ψj | <∞, in which case ω2 in (18) is given by σ2[ψ(1)]2. Alternatively, Phillips (1987, Lemma 2.2)derived (18) from primitive moment and mixing conditions on ut.
15 The reason to state these as two separate possibilities is that if the nonstationarity is purely deterministic,then the dth differences will not satisfy the Functional Central Limit Theorem. For example, if yt = γ0 +γ1t+εtwith εt white noise, then ∆yt = γ1 + ψ(L)εt for ψ(L) = 1 − L and ψ(1) = 0. Of course when ut = ∆dytsatisfies (18) with µ 6= 0, the series yt has both dth-order stochastic as well as dth-order deterministic polynomialtrends, so that case, along with pure stochastic trends (µ = 0) and pure deterministic trends are all allowed byProposition 4.
14
Proposition 4 establishes that if we estimate an OLS regression of yt+h on a constant and the
p = 4 most recent values of y as of date t,
yt+h = β0 + β1yt + β2yt−1 + β3yt−2 + β4yt−3 + vt+h, (20)
the residuals
vt+h = yt+h − β0 − β1yt − β2yt−1 − β3yt−2 − β4yt−3 (21)
offer a reasonable way to construct the transient component for a broad class of underlying
processes. The series is stationary provided that fourth differences of yt are stationary, a goal
that HP intends but does not necessarily achieve. But whereas the approximation to the HP
filter in equation (15) imposes all 4 unit roots, the sample regression would only use 4 differences
if it is warranted by observed features of the data. The proposed procedure has a number of other
advantages over HP. First, any finding that vt+h predicts some other variable xt+h+j represents
a true ability of y to predict x rather than an artifact of the way we chose to detrend y, by virtue
of the fact that vt+h is a one-sided filter16 . Second, unlike the HP cyclical series ct+h, the value
of vt+h will by construction be difficult to predict from variables dated t and earlier.17 If we find
such predictability, it tells us something about the true data-generating process, for example,
that x Granger-causes y. Third, the value of vt+h is a model-free and essentially assumption-
free summary of the data. Regardless of how the data may have been generated, as long as
(1− L)dyt is covariance stationary for some d ≤ 4, there exists a population linear projection of
16 While the OLS coefficients (19) make use of future observations, this influence vanishes asymptotically, incontrast to HP’s first-order dependence on future observations. Expression (22) below offers another alternativethat allows zero inference of future observations for any sample size T.
17 Note however that ct+h by construction can be predicted by variables known at date t+h− 1. The value ofct+h will be correlated with its own lagged values ct+h−1, ct+h−2, ..., ct+1 but likely uncorrelated with ct, ct−1, ...
15
yt+h on (yt, yt−1, yt−2, yt−3, 1)′. That projection is a characteristic of the data-generating process
that can be used to define what we mean by the cyclical component of the process and can be
consistently estimated from the data. Given a dynamic stochastic general equilibrium or any
other theoretical model that would imply an I(d) process, we could calculate this population
characteristic of the model and estimate it consistently from the data.
4.2 Properties in some common settings.
Random walk. Given the literature cited in Section 3.1 it is instructive to examine the conse-
quences if this procedure were applied to a random walk: yt = yt−1 + εt. In this case, d = 1
and w(h)t = εt+h + εt+h−1 + · · ·+ εt+1. For large samples, the OLS estimates of (20) converge to
β1 = 1 and all other βj = 0, and the resulting filtered series would simply be the difference
vt+h = yt+h − yt, (22)
that is, how much the series changes over an h = 8-quarter horizon, or equivalently the sum
of the observed changes over h periods. Note that for h = 8 the filter 1 − Lh wipes out any
cycles with frequency of exactly one year, and thus is taking out both the long-run trend as
well as any strictly seasonal components.18 This also fits with the common understanding of
what we would mean by the cyclical component. Because the simple filter (22) does not require
estimation of any parameters, it can also be used as a quick robustness check for concerns about
the small-sample applicability of the asymptotic claims in Proposition 4, as will be illustrated
in the applications below.
18 As in Hamilton (1994, pp. 171-172), the filter 1 − L8 has power transfer function (1 − e−8iω)(1 − e8iω) =2− 2 cos(8ω) which is zero at ω = 0, π/4, π/2, 3π/4, π and thus eliminates not only cycles at the zero frequencybut also cycles that repeat themselves every 8,4,8/3, or 2 quarters. See also Hamilton (1994, Figures 6.5 and6.6).
16
Deterministic time trend. Another instructive example is a pure deterministic time trend of
order d = 1: yt = δ0 + δ1t+ εt for εt white noise. In this case ∆yt = δ1 + εt − εt−1 is stationary
and w(h)t = ∆yt+1 + · · · + ∆yt+h = δ1h + εt+h − εt is also stationary for any h. I show in the
appendix that for this case the limiting coefficients on yt, .., yt−p+1 described by Proposition 4
are each given by 1/p and the implied trend for yt+h is
δ0 + δ1(t+ h) + p−1(εt + εt−1 + · · ·+ εt−p+1). (23)
Even for p = 1 this is not a bad estimate and for p = 4 should not differ much from the true
trend δ0 + δ1(t + h). Again regardless of the choice of p, the difference between yt+h and (23)
will be stationary.
Interpreting DSGE’s. A third instructive example is when yt is an element of a theoretical
dynamic stochastic general equilibrium model that is stationary around some steady-state value
µ. If the effects of shocks in the theoretical model die out after h periods, then the linear
projection (20) in the theoretical model is characterized by β0 = µ and β1 = β2 = β3 = β4 = 0.
In other words, the component vt+h is exactly the deviation from the steady state. If shocks have
not completely died out after h periods, then part of what is being labeled trend by this method
would include the components of shocks that persist longer than h periods. But for any value
of h, the linear projection is a well-defined population characteristic of the theoretical stationary
model, and there is an exactly analogous object one can calculate in the possibly nonstationary
observed data. The method thus offers a way to make an apples-to-apples comparison of theory
with data of the sort that users of the HP filter often desire, but which the HP filter itself will
always fail to deliver.
17
4.3 Specification of p and h.
One might be tempted to use a richer model than (20) to forecast yt+h, such as using a vector of
variables, more than 4 lags, or even a nonlinear relation. However, such refinements are com-
pletely unnecessary for the goal of extracting a stationary component, and have the significant
drawback that the more parameters we try to estimate by regression, the more the small-sample
results are likely to differ from the asymptotic predictions. The simple univariate regression
(20) is estimating a population object that is well defined regardless of whether the variable is
part of a large vector system with nonlinear dynamics. For this reason, just as the HP filter
is always implemented as a univariate procedure, my recommendation is to follow that same
strategy for the approach here.
A related issue is the choice of h. For any fixed h, there exists a sample size T for which
the results of Proposition 4 hold. However, a bigger sample size T will be needed the bigger is
h. The information in a finite data set about very long-horizon forecasts is quite limited. If
we are interested in business cycles, a 2-year horizon should be the standard benchmark. It is
also desirable with seasonal data to have both p and h be integer multiples of the number of
observations in a year. Hence for quarterly data my recommendation is p = 4 and h = 8.
In other settings the fundamental interest could be in shocks whose effects last substantially
longer than two years but are nevertheless still transient. A leading example would be the recent
interest in debt cycles prompted by datasets such as developed by Jorda, Schularick, and Taylor
(2016). For such an application I would use h = 5 years, with the regression-free implementation
(yt+5 − yt) having particular appeal given the length of datasets available.
18
4.4 Empirical illustrations.
Figure 5 shows the results when this approach is applied to data on U.S. total employment.
The raw seasonally adjusted data (yt) are plotted in the upper left panel. The residuals from
regression (20) estimated for these data are plotted in black in the lower-left panel, while the
8-lag difference (22) is in red. The latter two series behave very similarly in this case, as indeed
I have found for most other applications. The primary difference is that the regression residual
has sample mean zero by construction (by virtue of the inclusion of a constant term in the
regression) whereas the average value of (22) will be the average growth rate over a two-year
period.
One interesting observation is that the cyclical component of employment starts to decline
significantly before the NBER business cycle peak for essentially every recession. Note that this
inference from Figure 5 is summarizing a true feature of the data and is not an artifact of any
forward-looking aspect of the filter.
The right panels of Figure 5 show what happens when the same procedure is applied to
seasonally unadjusted data. The raw data themselves exhibit a very striking seasonal pattern,
as seen in the top right panel. Notwithstanding, the cyclical factor inferred from seasonally un-
adjusted data (bottom right panel) is almost indistinguishable from that derived from seasonally
adjusted data, confirming that this approach is robust to methods of seasonal adjustment.
Figure 6 applies the method to the major components of the U.S. national income and
product accounts. Investment spending is more cyclically volatile than GDP, while consumption
spending is less so. Imports fall significantly during recessions, reflecting lower spending by
19
U.S. residents on imported goods, and exports substantially less so, reflecting the fact that
international downturns are often decoupled from those in the U.S. Detrended government
spending is dominated by war-related expenditures– the Korean War in the early 1950s, the
Vietnam War in the 1970s, and the Reagan military build-up in the 1980s.
Table 2 reports the standard deviation of the cyclical component of each of these and a
number of other series, along with their correlation with the cyclical component of GDP. We
find very little cyclical correlation between output and prices.19 Both the nominal fed funds
rate and the ex ante real fed funds rate (the latter based on the measure in Hamilton, et al.,
2016) are modestly procyclical, whereas the 10-year nominal interest rate is not.
5 Conclusion.
The HP filter is intended to produce a stationary component from an I(4) series, but in practice
it can fail to do so, and invariably imposes a great cost. It introduces spurious dynamic relations
that are purely an artifact of the filter and have no basis in the true data-generating process,
and there exists no plausible data-generating process for which common popular practice would
provide an optimal decomposition into trend and cycle. There is an alternative approach
that can isolate a stationary component from any I(4) series, preserves the underlying dynamic
relations and consistently estimates well-defined population characteristics for a broad class of
possible data-generating processes.
19 Identifying the sign of this correlation was one of the primary interests of den Haan’s (2000) application ofa related methodology. In contrast to the results in Table 2, he found a positive correlation between the cyclicalcomponents of these series. I attribute the difference to differences in sample period.
20
References
Atkeson, Andrew and Lee E. Ohanian. 2001. “Are Phillips Curves Useful for Forecasting
Inflation?,” Quarterly Review, Federal Reserve Bank of Minneapolis, 25(1): 2-11.
Balcilar, Mehmet, Rangan Gupta, Anandamayee Majumdar and Stephen M. Miller. 2015.
“Was the Recent Downturn in US Real GDP Predictable?,” Applied Economics, 47(28): 2985-
3007.
Beveridge, Stephen, and Charles R. Nelson. 1981. “A New Approach to Decomposition of
Economic Time Series into Permanent and Transitory Components with Particular Attention to
Measurement of the ‘Business Cycle’,” Journal of Monetary Economics, 7:151-174.
Cheung, Yin-Wong, Menzie D. Chinn & Antonio Garcia Pascual. 2005. “Empirical exchange
rate models of the nineties: Are any fit to survive?,” Journal of International Money and Finance,
24(7): 1150-1175.
Choi, In. 1993. “Asymptotic Normality of the Least-Squares Estimates for Higher Order
Autoregressive Integrated Processes with Some Applications,” Econometric Theory, 9:263-282.
Cogley, Timothy, and James M. Nason. 1995. “Effects of the Hodrick-Prescott Filter on
Trend and Difference Stationary Time Series: Implications for Business Cycle Research,” Journal
of Economic Dynamics and Control, 19(1-2): 253-278.
Cornea-Madeira, Adriana. Forthcoming. “The Explicit Formula for the Hodrick-Prescott
Filter in Finite Sample,” Review of Economics and Statistics.
De Jong, Robert M., and Neslihan Sakarya. 2016. “The Econometrics of the Hodrick-Prescott
Filter,” Review of Economics and Statistics, 98: 310–317.
Den Haan, Wouter J. 2000. “The Comovement between Output and Prices,” Journal of
21
Monetary Economics 46: 3-30.
Fama, Eugene F. 1965. “The Behavior of Stock-Market Prices,” The Journal of Business,
38(1): 34-105.
Flood, Robert P. and Andrew K. Rose. 2010. “Forecasting International Financial Prices
with Fundamentals: How do Stocks and Exchange Rates Compare?,” Globalization and Eco-
nomic Integration, Chapter 6, Edward Elgar Publishing.
Hall, Robert E. 1978. “Stochastic Implications of the Life Cycle-Permanent Income Hypoth-
esis: Theory and Evidence,” Journal of Political Economy, 86(6): 971-987.
Hamilton, James D. 1994. Time Series Analysis, Princeton: Princeton University Press.
Hamilton, James D. 2009. “Understanding Crude Oil Prices,” Energy Journal, 30(2): 179-
206.
Hamilton, James D., Ethan S. Harris, Jan Hatzius, and Kenneth D. West. 2016. “The
Equilibrium Real Funds Rate: Past, Present and Future,” IMF Economic Review 64: 660-707.
Harvey, Andrew C., and Albert Jaeger. 1993. “Detrending, Stylized Facts and the Business
Cycle,” Journal of Applied Econometrics, 8(3): 231-247.
Hodrick, Robert J. and Edward C. Prescott. 1981. “Postwar U.S. Business Cycles: An
Empirical Investigation,” working paper, Northwestern University.
Hodrick, Robert J. and Edward C. Prescott. 1997. “Postwar U.S. Business Cycles: An
Empirical Investigation,” Journal of Money, Credit and Banking, 29(1): 1-16.
Jorda, Oscar, Moritz Schularick, and Alan M. Taylor. 2016. “Macrofinancial History and
the New Business Cycle Facts,” in NBER Macroeconomics Annual 2016, volume 31.
King, Robert and Sergio T. Rebelo. 1989. “Low Frequency Filtering and Real Business
22
Cycles,” working paper, University of Rochester.
King, Robert and Sergio T. Rebelo. 1993. “Low Frequency Filtering and Real Business
Cycles,” Journal of Economic Dynamics and Control, 17: 207-231.
Mankiw, N. Gregory. 1987. “The Optimal Collection of Seigniorage: Theory and Evidence,”
Journal of Monetary Economics, 20(2): 327-341.
McElroy, Tucker. 2008. “Exact Formulas for the Hodrick-Prescott Filter,” Econometrics
Journal, 11(1): 209-217.
Meese, Richard A. and Kenneth Rogoff. 1983. “Empirical Exchange Rate Models of the
Seventies : Do they Fit Out of Sample?,” Journal of International Economics, 14(1-2): 3-24.
Nelson, Charles R., and Charles R. Plosser. 1982. “Trends and Random Walks in Macroe-
conomic Time Series: Some Evidence and Implications,” Journal of Monetary Economics 10:
139-162.
Park, Joon Y., and Peter C.B. Phillips. 1989. “Statistical Inference in Regressions with
Integrated Processes: Part 2,” Econometric Theory, 5:95-131.
Phillips, Peter C.B. 1987. “Time Series Regression with a Unit Root,” Econometrica, 55:
277-301.
Phillips, Peter C.B., and Sainan Jin. 2015. “Business Cycles, Trend Elimination, and the
HP Filter,” working paper, Yale University.
Pesando, James E. 1979. “On the Random Walk Characteristics of Short- and Long-Term
Interest Rates in an Efficient Market,” Journal of Money, Credit and Banking, Blackwell Pub-
lishing, 11(4): 457-466.
Ravn, Morten O., and Harald Uhlig. 2002. “On Adjusting the Hodrick-Prescott Filter for
23
the Frequency of Observations,” Review of Economics and Statistics, 84(2): 371-380.
Samuelson, P. 1965. “Proof that Properly Anticipated Prices Fluctuate Randomly,” Indus-
trial Management Review, 6(2): 41-49.
Sargent, Thomas J. 1976. “A Classical Macroeconometric Model for the United States,”
Journal of Political Economy, 84(2): 207-237.
Sims, Christopher A., James H. Stock, and Mark W. Watson. 1990. “Inference in Linear
Time Series Models with Some Unit Roots,” Econometrica 58: 113-144.
Singleton, Kenneth J. 1988. “Econometric Issues in the Analysis of Equilibrium Business
Cycle Models,” Journal of Monetary Economics, 21: 361-386.
Stock, James H. 1994. “Unit Roots, Structural Breaks, and Trends,” in Handbook of Econo-
metrics, Volume 4, pp. 2739-2841, edited by Robert F. Engle and Daniel L. McFadden. Ams-
terdam: Elsevier.
24
Appendix.
Derivation of equation (2). The minimization problem (1) can be written
ming{(y −Hg)′(y −Hg) + λ(Qg)′(Qg)} .
The derivative with respect to g is −2H ′(y −Hg) + 2λQ′Qg and setting this to 0 gives (2).
Derivation of equation (4). Define at = [E(yy′)]−1E(ygt). For any at we have
E(gt − a′ty)2 = E(gt − a′ty + a′ty − a′ty)2
= E(gt − a′ty)2 + 2E[(gt − a′ty)y′](at − at) + (at − at)′E(yy′)(at − at).
The middle term equals 0 by the definition of at:
E[(gt − a′ty)y′](at − at) = {E(gty′)− E(gty
′)[E(yy′)]−1E(yy′)}(at − at).
Hence E(gt − a′ty)2 is minimized when at = at. Stacking a′ty into a (T × 1) vector gives (4).
Proof of Proposition 1. The assumptions can be written formally as
E(vt) = E(ct) = 0 (24)
E
vt
ct
[ vt−j ct−j
]=
σ2v 0
0 σ2c
if j = 0
0 otherwise
(25)
E(g0) = E(g−1) = 0 (26)
E
g0
g−1
[ g0 g−1
]= C0 (27)
25
E
vt
ct
[ g0 g−1
]= 0 for t = 1, ...T. (28)
I first establish that under (5)-(6) and (24)-(28),
(Q′Q)E(gg′)H ′ → σ2vH′ (29)
as C−10 → 0. To do so write (6) as Qg = v for v = (vT , vT−1, ..., v1) and Q0g = v0 for v0 a (2× 1)
vector with mean 0 and variance σ2vI2. Also from (28), v0 is uncorrelated with v and
Q0(2×T )
=
[0
(2×T )P−10(2×2)
]
where P0 is the Cholesky factor of C0 (P0P′0 = C0). Stacking these, Q
Q0
g =
v
v0
so
E(gg′) = σ2v
Q
Q0
−1 [
Q′ Q′0
]−1
[Q′ Q′0
] Q
Q0
E(gg′) = (Q′Q+Q′0Q0)E(gg′) = σ2vIT
(Q′Q)E(gg′)H ′ = σ2vH′ − (Q′0Q0)E(gg′)H ′
which goes to σ2vH′ as P−10 → 0, as claimed in (29).
Notice next from
y(T×1)
= H(T×T )
g(T×1)
+ c(T×1)
26
that E(yy′) = HE(gg′)H ′ + σ2cIT and E(gy′) = E(gg′)H ′ + E(gc′) = E(gg′)H ′. Hence
A = E(gy′)[E(yy′)]−1 (30)
= E(gg′)H ′[HE(gg′)H ′ + σ2cIT ]−1.
Combining (2) and (30),
(H ′H + λQ′Q)(A∗ − A)[HE(gg′)H ′ + σ2cIT ]
= H ′[HE(gg′)H ′ + σ2cIT ]− (H ′H + λQ′Q)E(gg′)H ′ (31)
= H ′σ2c − (σ2
c/σ2v)(Q
′Q)E(gg′)H ′
which from (29) goes to 0 as C−10 → 0. Since the matrices premultiplying and postmultiplying
the left side of (31) are of full rank, this establishes that A∗ = A as claimed.
Proof of Proposition 2. Let θ1, θ2, θ3, θ4 be the roots satisfying F (θi) = 0. As noted
by King and Rebelo (1989), since λ > 0, F (z) in (8) is positive for all real z meaning that θi
comprise two pairs of complex conjugates. Since F (z) = F (z−1), if θi is a root, then so is θ−1i .
Thus the values of θi are given by Reim, Re−im, R−1eim, and R−1e−im for some fixed R and m;
one pair is inside the unit circle and the other is outside. Noting that the coefficients on z2 and
z−2 in F (z) are both λ, it follows that F (z) can be written
F (z) = λ(1− θ1z)(1− θ2z)(θ−11 − z−1)(θ−12 − z−1).
From the symmetry of F (z) in z and z−1 we can without loss of generality normalize θ1 and θ2
to be inside the unit circle and write
F (z) =λ
θ1θ2(1− θ1z)(1− θ2z)(1− θ1z−1)(1− θ2z−1).
27
Define (1− φ1z− φ2z2) = (1− θ1z)(1− θ2z), namely φ1 is the real number θ1 + θ2 and φ2 is the
negative real number −θ1θ2. Note also that the roots of (1− φ1z − φ2z2) = 0 are the complex
conjugates θ−11 and θ−12 , which are both outside the unit circle. This gives the bounds on φ1
and φ2 stated in Proposition 2 as in Hamilton (1994, Figure 1.5). Then
F (z) =λ
−φ2
(1− φ1z − φ2z2)(1− φ1z
−1 − φ2z−2). (32)
Evaluating (8) and (32) at z = 1 gives
F (1) = 1 = (1− φ1 − φ2)2λ/(−φ2) (33)
as claimed in (12), Likewise evaluating (8) and (32) at z = −1 gives
F (−1) = 1 + 16λ = (1 + φ1 − φ2)2λ/(−φ2). (34)
Taking the difference between these last two equations establishes (4φ1 − 4φ1φ2)λ/(−φ2) = 16λ
or φ1(1 − φ2) = −4φ2 as claimed in (11). Note that since φ2 < 0 (required by complex roots),
from (11) φ1 > 0.
I next establish that
1
(1− φ1z − φ2z2)(1− φ1z
−1 − φ2z−2)
=C0 + C1z
1− φ1z − φ2z2
+C0 + C1z
−1
1− φ1z−1 − φ2z
−2 +B0. (35)
Combining terms over a common denominator shows that (35) will hold provided
1 = (C0 + C1z)(1− φ1z−1 − φ2z
−2) + (C0 + C1z−1)(1− φ1z
1 − φ2z2)
+B0(1− φ1z − φ2z2)(1− φ1z
−1 − φ2z−2)
= [2C0 − 2C1φ1 +B0(1 + φ21 + φ2
2)]
+[C1 − C0φ1 − C1φ2 −B0φ1 +B0φ1φ2](z + z−1)
−[C0φ2 +B0φ2](z2 + z−2).
28
The coefficient on (z2 + z−2) will be zero provided B0 = −C0, or provided
1 = [C0 − 2C1φ1 − C0φ21 − C0φ
22] + [C1 − C1φ2 − C0φ1φ2](z + z−1). (36)
The coefficient on (z + z−1) will be zero provided
C1 =C0φ1φ2
1− φ2
= −C0φ21/4 (37)
where the last equation made use of (11). Substituting (37) into (36), we see that (35) will be
true provided we set
1 = C0(1− φ21 − φ2
2 + φ31/2).
Combining these results we conclude that
1
(1− φ1z − φ2z2)(1− φ1z
−1 − φ2z−2)
= C0
[1− (φ2
1/4)z
1− φ1z − φ2z2
+1− (φ2
1/4)z−1
1− φ1z−1 − φ2z
−2 − 1
].
From (32) we then obtain (9) with C = −C0φ2/λ as claimed in (13).
To derive (10), recall from Hamilton (1994, pp. 16 and 33) that
1
1− φ1z − φ2z2
=∑∞
j=0Rj[2α cos(mj) + 2β sin(mj)]zj. (38)
We know that the coefficient on zj for j = 0 must be 1, requiring [2α cos(0) + 2β sin(0)] = 1 or
α = 1/2. We likewise know that the coefficient on zj for j = 1 is given by φ1, so R[cos(m) +
2β sin(m)] = φ1, which from (14) gives R2β sin(m) = φ1/2 or 2β sin(m) = cos(m) so 2β =
cot(m). Substituting these values for α and β into (38) gives (10).
Proof of Proposition 3. Recall the identity
yt+h = yt +∑h
j=1 ∆yt+j (39)
29
which immediately gives the result of Proposition 3 for the case d = 1 as stated in (16). We
likewise have the identity
∆yt+j = ∆yt +∑j
s=1 ∆2yt+s. (40)
Substituting (40) into (39) gives
yt+h = yt + ∆yt∑h
j=1 1 + w(h)t
for w(h)t =
∑hj=1
∑js=1 ∆2yt+s as claimed in (17) for the case d = 2. We can proceed recursively
using the identity ∆kyt+s = ∆kyt+∑s
r=1 ∆k+1yt+r and substituting into the preceding expression.
For any d the resulting w(h)t is a finite sum of stationary variables and therefore is itself stationary.
Proof of Proposition 4. Note that the fitted values and residuals implied by the coefficients
in (19) are numerically identical to those if we were to do the (infeasible) regression yt+h =
x′tα + vt+h for
xt = (ut, ut−1, ..., ut−p+d+1, 1,∆d−1yt,∆
d−2yt, ...,∆yt, yt)′
with ut = ∆dyt − µ. The latter regression is infeasible because we do not know the true values
of µ and d. But because the fitted values are the same, once we find the properties of the second
regression, we will also know the properties of the first. For example, for d = 2 and p = 4,
xt =
1 −2 1 0 −µ
0 1 −2 1 µ
0 0 0 0 1
1 −1 0 0 0
1 0 0 0 0
yt
yt−1
yt−2
yt−3
1
≡ Hxt (41)
30
and α = (∑Hxtx
′tH′)−1 (
∑Hxtyt+h) so β = H ′α for every sample. When p = d we define
the (p + 1)× 1 vector as xt = (1,∆d−1yt,∆d−2yt, ...,∆yt, yt)
′, that is, none of the ut−j variables
appear in xt when p = d.
Define q to be the (p+ 1)× 1 vector q = (0, ..., 0, E(w(h)t ), κ
(d)h , κ
(d−1)h , ..., κ
(1)h )′, so that w
(h)t =
w(h)t − E(w
(h)t ) = yt+h − x′tq and
α = (∑xtx′t)−1∑
xt(x′tq + w
(h)t )
= q + (∑xtx′t)−1∑
xtw(h)t . (42)
We first consider the case when (18) holds for ∆dyt when there is further no drift and the initial
value for all of the difference processes is zero, namely, the case when µ = 0 and ∆d−jyt = ξ(j)t
where ξ(1)t =
∑tj=1 uj and ξ
(s)t =
∑tj=1 ξ
(s−1)j for s = 2, 3, ..., d. For this case define
ΥT =
T 1/2Ip−d+1 0 0 · · · 0
0 T 0 · · · 0
0 0 T 2 · · · 0
......
... · · · ...
0 0 0 0 T d
. (43)
Adapting the approach in Sims, Stock and Watson (1990), we have from (42) that
T−1/2ΥT (α− q) = T−1/2ΥT (∑xtx′t)−1∑
xtw(h)t
= T−1/2[Υ−1T
∑xtx′tΥ−1T
]−1Υ−1T
∑xtw
(h)t
=[Υ−1T
∑xtx′tΥ−1T
]−1 [T−1/2Υ−1T
∑xtw
(h)t
]. (44)
31
Consider first the last term in (44):
T−1/2Υ−1T
∑xtw
(h)t =
T−1∑utw
(h)t
...
T−1∑ut−p+d+1w
(h)t
T−1∑w
(h)t
T−3/2∑ξ(1)t w
(h)t
T−5/2∑ξ(2)t w
(h)t
...
T−d−1/2∑ξ(d)t w
(h)t
. (45)
The first p − d terms are just the sample means of stationary variables, which by the Law of
Large Numbers converge in probability to their expectation E(ut−jw(h)t ). Term p−d+1 likewise
converges to E(w(h)t ) = 0. Calculations analogous to those behind Lemma 1(e) in Sims, Stock
and Watson (1990) show that the last d terms in (45) also all converge in probability to zero.20
Turning next to the first term in (44), the upper-left (p−d)× (p−d) block of Υ−1T
∑xtx′tΥ−1T
is characterized byT−1
∑u2t · · · T−1
∑utut−p+d+1
... · · · ...
T−1∑ut−p+d+1ut · · · T−1
∑u2t−p+d+1
p→
γ0 · · · γp−d−1
... · · · ...
γp−d−1 · · · γ0
for γj = E(utut−j). From Sims, Stock and Watson Lemma 1(a) and 1(b), the lower-right
20 That is, before multiplying by T−1/2 the terms are all Op(1); for similar calculations see Lemma 1(b) inChoi (1993) and Proposition 17.3(e) in Hamilton (1994).
32
(d+ 1)× (d+ 1) block satisfies
1 T−3/2∑ξ(1)t T−5/2
∑ξ(2)t · · · T−d−1/2
∑ξ(d)t
T−3/2∑ξ(1)t T−2
∑[ξ
(1)t ]2 T−3
∑ξ(1)t ξ
(2)t · · · T−d−1
∑ξ(1)t ξ
(d)t
T−5/2∑ξ(2)t T−3
∑ξ(2)t ξ
(1)t T−4
∑[ξ
(2)t ]2 · · · T−d−2
∑ξ(2)t ξ
(d)t
......
... · · · ...
T−d−1/2∑ξ(d)t T−d−1
∑ξ(d)t ξ
(1)t T−d−2
∑ξ(d)t ξ
(2)t · · · T−2d
∑[ξ
(d)t ]2
⇒
1 ω∫ 1
0W (1)(r)dr ω
∫ 1
0W (2)(r)dr · · · ω
∫ 1
0W (d)(r)dr
ω∫ 1
0W (1)(r)dr ω2
∫ 1
0[W (1)(r)]2dr ω2
∫ 1
0W (1)(r)W (2)(r)dr · · · ω2
∫ 1
0W (1)(r)W (d)(r)dr
ω∫ 1
0W (2)(r)dr ω2
∫ 1
0W (2)(r)W (1)(r)dr ω2
∫ 1
0[W (2)(r)]2dr · · · ω2
∫ 1
0W (2)(r)W (d)(r)dr
......
... · · · ...
ω∫ 1
0W (d)(r)dr ω2
∫ 1
0W (d)(r)W (1)(r)dr ω2
∫ 1
0W (d)(r)W (2)(r)dr · · · ω2
∫ 1
0[W (d)(r)]2dr
where W (1)(r) denotes Standard Brownian Motion and W (j)(r) =
∫ r
0W (j−1)(s)ds. For the
off-diagonal block of Υ−1T
∑xtx′tΥ−1T we see using calculations analogous to Sims, Stock and
Watson’s Lemma 1(e) that
T−1∑ut · · · T−1
∑ut−p+d+1
T−3/2∑ξ(1)t ut · · · T−3/2
∑ξ(1)t ut−p+d+1
T−5/2∑ξ(2)t ut · · · T−5/2
∑ξ(2)t ut−p+d+1
... · · · ...
T−d−1/2∑ξ(d)t ut · · · T−d−1/2
∑ξ(d)t ut−p+d+1
p→ 0.
Bringing all these results together, it follows that
T−1/2ΥT (α− q) p→
g
0
(46)
33
g =
γ0 · · · γp−d−1
... · · · ...
γp−d−1 · · · γ0
−1 E(utw
(h)t )
...
E(ut−p+d+1w(h)t )
.
Note that g corresponds to the coefficients from a population linear projection of w(h)t on
(ut, ut−1, ..., ut−p+d+1)′.
Writing out (46) explicitly using (43) gives
Ip−d+1 0 0 · · · 0
0 T 1/2 0 · · · 0
0 0 T 3/2 · · · 0
......
... · · · ...
0 0 0 0 T d−1/2
(α− q) p→
g
0
.
This equation shows that the first p − d elements of α converge to the stationary population
projection coefficients g, the p− d+ 1 term to E(w(h)t ), and the last d elements of α converge to
the κ(j)h terms in q. Indeed, the latter estimates are superconsistent– they still converge to the
terms in q even when multiplied by some positive power of T.
Taking again the p = 4 and d = 2 example (41), the coefficients β from the actual regression
34
of yt+h on (yt, yt−1, yt−2, yt−3, 1)′ have plim
βp→
1 0 0 1 1
−2 1 0 −1 0
1 −2 0 0 0
0 1 0 0 0
−µ −µ 1 0 0
g1
g2
µh(h+ 1)/2
h
1
=
g1 + h+ 1
g2 − 2g1 − h
g1 − 2g2
g2
µ{[h(h+ 1)/2]− g1 − g2}
.
The above derivation assumed µ = 0 so that there was no drift in ∆dyt. If instead we had
µ 6= 0, then ∆d−1yt =∑t
s=1 us =∑t
s=1 us + tµ = ξ(1)t + tµ, which is dominated for large t by the
drift term tµ rather than the random walk term ξ(1)t , and ∆d−jyt = ξ
(j)t + (1/j!)tjµ+ op(t
j). In
this case we would simply replace ΥT in the above derivations with
ΥT =
T 1/2Ip−d+1 0 0 · · · 0
0 T 3/2 0 · · · 0
0 0 T 5/2 · · · 0
......
... · · · ...
0 0 0 0 T d+1/2
. (47)
We would then arrive at the identical conclusion (46) this time using results (a), (c), and (g)
from Sims, Stock and Watson Lemma 1.
Alternatively, adding a nonzero initial condition, e.g. replacing ξ(1)t with ξ
(1)t + ξ
(1)0 for ξ
(1)0
any fixed constant produces a term that is still dominated asymptotically by ξ(1)t , and as in Park
and Phillips (1989), the original convergence claims again all go through.
Finally, the derivations are very similar for the case of purely deterministic time trends,
35
yt =∑d
j=0 δjtj + ut. For this case we have µ = E(∆dyt) = δd and
xt = (∆dut − δd, ...,∆dut−p+d+1 − δd, 1,1∑
j=0
δ(d−1)j tj + ∆d−1ut, ...,
d−1∑j=0
δ(1)j tj + ∆ut,
d∑j=0
δjtj + ut)
′
where∑d−s
j=0 δ(s)j tj =
∑d−s+1j=0 δ
(s−1)j tj −
∑d−s+1j=0 δ
(s−1)j (t − 1)j and δ
(0)j = δj. Then for ΥT as in
(47), we again have
T−1/2Υ−1T
∑xtw
(h)t
p→ (E(∆dut − δd)wt+h, ..., E(∆dut−p+d+1 − δd)wt+h, 0, ..., 0)′.
The matrix Υ−1T
∑xtx′tΥ−1T likewise has a block-diagonal plim, so for g the coefficients of the
population linear projection of wt+h on (∆dyt − δd, ...,∆dyt−p+d+1 − δd)′ we have
αp→ (g, E(w
(h)t ), κ
(d)h , ..., κ
(1)h )′. (48)
Derivation of equation (23). For σ2 the variance of εt, w(h)t = εt+h−εt, and vt = εt−εt−1
we have
g =
E(v2t ) E(vtvt−1) E(vtvt−2) · · · E(vtvt−p+2)
E(vt−1vt) E(v2t−1) E(vt−1vt−2) · · · E(vt−1vt−p+2)
E(vt−2vt) E(vt−2vt−1) E(v2t−2) · · · E(vt−2vt−p+2)
......
... · · · ...
E(vt−p+2vt) E(vt−p+2vt−1) E(vt−p+2vt−2) · · · E(v2t−p+2)
−1
E[vt(εt+h − εt)]
E[vt−1(εt+h − εt)]
E[vt−2(εt+h − εt)]
...
E[vt−p+2(εt+h − εt)
=
σ2
2 −1 0 · · · 0
−1 2 −1 · · · 0
0 −1 2 · · · 0
......
... · · · ...
0 0 0 · · · 2
−1
−σ2
0
0
...
0
=
−(p− 1)/p
−(p− 2)/p
−(p− 3)/p
...
−1/p
36
where the last equation can be verified by premultiplying by
2 −1 0 · · · 0
−1 2 −1 · · · 0
0 −1 2 · · · 0
......
... · · · ...
0 0 0 · · · 2
and confirming that the resulting vector is indeed (−1, 0, ..., 0)′. Hence the plim in (48) for this
example is
αp→ (−(p− 1)/p,−(p− 2)/p, ...,−1/p, hδ1, 1)′.
Also for this case we have
H =
1 −1 0 · · · 0 0 −δ1
0 1 −1 · · · 0 0 −δ1
0 0 1 · · · 0 0 −δ1...
...... · · · ...
......
0 0 0 · · · 1 −1 −δ1
0 0 0 · · · 0 0 1
1 0 0 · · · 0 0 0
so β = H ′α has plim (1/p, 1/p, ..., 1/p, δ1[h+ (p− 1)/p+ (p− 2)/p+ · · ·+ 1/p])′ implying a fitted
37
value
β′xt = (1/p)(yt + yt−1 + · · ·+ yt−p+1) + δ1[h+ 1/p+ 2/p+ · · ·+ (p− 1)/p]
= δ1h+ (1/p){yt + [yt−1 + δ1] + [yt−2 + 2δ1] + · · ·+ [yt−p+1 + (p− 1)δ1]}
= δ1h+ (1/p){[δ0 + δ1t+ εt] + [δ0 + δ1t+ εt−1] + · · ·+ [δ0 + δ1t+ εt−p+1]}
= δ0 + δ1(t+ h) + (1/p)(εt + εt−1 + · · ·+ εt−p+1).
38
39
Table 1. Maximum likelihood estimates of parameters of state-space formalization of the HP filter for
assorted quarterly macroeconomic series.
σ2c σ2
v λ
GDP 0.115 0.468 0.245
Consumption 0.163 0.174 0.940
Investment 4.187 12.196 0.343
Exports 5.818 3.341 1.741
Imports 4.423 4.769 0.927
Government spending 0.221 1.160 0.191
Employment 0.006 0.250 0.023
Unemployment rate 0.014 0.092 0.152
GDP Deflator 0.018 0.081 0.216
S&P 500 21.284 15.186 1.402
10-year Treasury yield 0.135 0.054 2.486
Fed Funds Rate 0.633 0.116 5.458
Real Rate 0.875 0.091 9.596
40
Table 2. Standard deviation of cyclical component and correlation with cyclical component of GDP for
assorted macroeconomic series.
Regression Residuals Random walk Sample
St. Dev. GDP Corr. St. Dev. GDP Corr.
GDP 3.38 1.00 3.69 1.00 1947:1-2016:1
Consumption 2.85 0.79 3.04 0.82 1947:1-2016:1
Investment 13.19 0.84 13.74 0.80 1947:1-2016:1
Exports 10.77 0.33 11.33 0.30 1947:1-2016:1
Imports 9.79 0.77 9.98 0.75 1947:1-2016:1
Government spending 7.13 0.31 8.60 0.38 1947:1-2016:1
Employment 3.09 0.85 3.32 0.85 1947:1-2016:2
Unemployment rate 1.44 -0.81 1.72 -0.79 1948:1-2016:2
GDP Deflator 2.99 0.04 4.11 -0.13 1947:1-2016:1
S&P 500 21.80 0.41 22.08 0.38 1950:1-2016:2
10-year Treasury yield 1.46 -0.05 1.51 0.08 1953:2-2016:2
Fed funds rate 2.78 0.33 3.03 0.40 1954:3-2016:2
Real rate 2.25 0.39 2.60 0.42 1958:1-2014:3
Notes to Table 2. Filtered series were based on the full sample available for that variable, while
correlations were calculated using the subsample of overlapping values for the two indicators. Note
that the regression residuals lose the first 11 observations and the random-walk calculations lose the
first 8 observations.
41
Figure 1. Values for φ1 and φ2 implied by different values of λ.
-1.2
-1
-0.8
-0.6
-0.4
-0.2
0
0.2
-0.5 0 0.5 1 1.5 2 2.5
φ2
φ1
λ = 0
λ = ∞
λ = 1600
42
Figure 2. Autocorrelations and cross-correlations for first-difference of stock prices and real
consumption spending.
Notes to Figure 2. Upper left: autocorrelations of log growth rate of end-of-quarter value for S&P 500.
Upper right: autocorrelations of log growth rate of real consumption spending. Lower panels: cross
correlations.
Figure 3. Autocorrelations and cross-correlations for HP cyclical component of stock prices and real
consumption spending.
Notes to Figure 3. Upper left: autocorrelations of HP cycle for log of end-of-quarter value for S&P 500.
Upper right: autocorrelations of HP cycle for log of real consumption spending. Lower panels: cross
correlations.
Lag0 1 2 3 4 5 6 7 8
-0.5
0
0.5
1S&P 500 with own lags
Lag0 1 2 3 4 5 6 7 8
-0.5
0
0.5
1Consumption with own lags
Lag0 1 2 3 4 5 6 7 8
-0.5
0
0.5
1S&P 500 with lags of Consumption
Lag0 1 2 3 4 5 6 7 8
-0.5
0
0.5
1Consumption with lags of S&P 500
Lag0 1 2 3 4 5 6 7 8
-0.5
0
0.5
1S&P 500 with own lags
Lag0 1 2 3 4 5 6 7 8
-0.5
0
0.5
1Consumption with own lags
Lag0 1 2 3 4 5 6 7 8
-0.5
0
0.5
1S&P 500 with lags of Consumption
Lag0 1 2 3 4 5 6 7 8
-0.5
0
0.5
1Consumption with lags of S&P 500
43
Figure 4. Comparison of one-sided and two-sided HP filters.
Notes to Figure 4. Red line in both panels plots 100 times natural log of S&P 500 stock price index. The
black curve in the top panel plots the HP estimate of trend as inferred using the usual two-sided filter
(calculated using the Kalman smoother for the state-space model in Proposition 1), whereas the black
curve in the bottom panel plots trend from a one-sided HP filter (calculated using the Kalman filter for
the same model). Shaded regions denote NBER recession dates.
Two-sided filter
1950 1960 1970 1980 1990 2000 2010200
300
400
500
600
700
800
One-sided filter
1950 1960 1970 1980 1990 2000 2010200
300
400
500
600
700
800
44
Figure 5. Regression and 8-quarter-change filters applied to seasonally adjusted and seasonally
unadjusted employment data.
Notes to Figure 5. Upper left: 100 times the log of end-of-quarter values for seasonally adjusted
nonfarm payrolls. Lower left: black plots 0 1 8 2 9 3 10 4 11ˆ ˆ ˆ ˆ ˆ
t t t t ty y y y yβ β β β β− − − −− − − − − as a function of t
while red plots 8.t ty y −− Right panels show results when the identical procedure is applied instead to
seasonally unadjusted data.
Employment (seasonally adjusted)
1950 1960 1970 1980 1990 2000 20101060
1080
1100
1120
1140
1160
1180
1200
Cyclical component (SA)
1950 1960 1970 1980 1990 2000 2010-10.0
-7.5
-5.0
-2.5
0.0
2.5
5.0
7.5
10.0
12.5RANDOM_WALKREGRESSION
Employment (not seasonally adjusted)
1950 1960 1970 1980 1990 2000 20101060
1080
1100
1120
1140
1160
1180
1200
Cyclical component (NSA)
1950 1960 1970 1980 1990 2000 2010-10.0
-7.5
-5.0
-2.5
0.0
2.5
5.0
7.5
10.0
12.5RANDOM_WALKREGRESSION
45
Figure 6. Results of applying regression (black) and 8-quarter-change (red) filters to 100 times the log of
components of U.S. national income and product accounts.
GDP
1950 1960 1970 1980 1990 2000 2010-10
-5
0
5
10
15
20RANDOM_WALKREGRESSION
Consumption
1950 1960 1970 1980 1990 2000 2010-10
-5
0
5
10
15RANDOM_WALKREGRESSION
Investment
1950 1960 1970 1980 1990 2000 2010-50
-25
0
25
50RANDOM_WALKREGRESSION
Exports
1950 1960 1970 1980 1990 2000 2010-40
-30
-20
-10
0
10
20
30
40RANDOM_WALKREGRESSION
Imports
1950 1960 1970 1980 1990 2000 2010-40
-30
-20
-10
0
10
20
30
40RANDOM_WALKREGRESSION
Government
1950 1960 1970 1980 1990 2000 2010-30
-20
-10
0
10
20
30
40
50
60RANDOM_WALKREGRESSION
Top Related