Download - Why You Should Never Use the Hodrick-Prescott Filterjhamilto/hp.pdf · they never defend the claim that the particular structure assumed in Proposition 1 is an accurate representation

Why You Should Never Use the Hodrick-Prescott Filter∗

James D. Hamilton

[email protected]

Department of Economics, UC San Diego

July 30, 2016

Revised: May 13, 2017

ABSTRACT

Here’s why. (1) The HP filter produces series with spurious dynamic relations that have no

basis in the underlying data-generating process. (2) Filtered values at the end of the sample are

very different from those in the middle, and are also characterized by spurious dynamics. (3) A

statistical formalization of the problem typically produces values for the smoothing parameter

vastly at odds with common practice, e.g., a value for λ far below 1600 for quarterly data. (4)

There’s a better alternative. A regression of the variable at date t + h on the four most recent

values as of date t offers a robust approach to detrending that achieves all the objectives sought

by users of the HP filter with none of its drawbacks.

—————————————————————————————————-

∗I thank Daniel Leff for outstanding research assistance on this project and Frank Diebold,

Robert King, James Morley, and anonymous referees for helpful comments on an earlier draft of

this paper .

1 Introduction.

Often economic researchers have a theory that is specified in terms of a stationary environment,

and wish to relate the theory to observed nonstationary data without modeling the nonstationar-

ity. Hodrick and Prescott (1981, 1997) proposed a very popular method for doing this, commonly

interpreted as decomposing an observed variable into trend and cycle. Although drawbacks to

their approach have been known for some time, the method continues today to be very widely

adopted in academic research, policy studies, and analysis by private-sector economists. For

this reason it seems useful to collect and expand on those earlier concerns here and note that

there is a better way to solve this problem.

2 Characterizations of the Hodrick-Prescott filter.

Given T observations on a variable yt, Hodrick and Prescott (1981, 1997) proposed interpreting

the trend component gt as a very smooth series that does not differ too much from the observed

yt.1 It is calculated as

min{gt}Tt=−1

{∑Tt=1(yt − gt)

2 + λ∑T

t=1[(gt − gt−1)− (gt−1 − gt−2)]2}. (1)

When the smoothness penalty λ→ 0, gt would just be the series yt itself, whereas when λ→∞

the procedure amounts to a regression on a linear time trend (that is, produces a series whose

second difference is exactly 0). The common practice is to use a value of λ = 1600 for quarterly

time series.

1 Phillips and Jin (2015) reviewed the rich prior history of generalizations of this approach.

1

A closed-form expression for the resulting series can be written in vector notation by defining

T = T + 2, y(T×1)

= (yT , yT−1, ..., y1)′, g

(T×1)= (gT , gT−1, ..., g−1)

′ and

H(T×T )

=

[IT

(T×T )0

(T×2)

]

Q(T×T )

=

1 −2 1 0 · · · 0 0 0

0 1 −2 1 · · · 0 0 0

......

...... · · · ...

......

0 0 0 0 · · · −2 1 0

0 0 0 0 · · · 1 −2 1

.

The solution to (1) is then given by2

g∗ = (H ′H + λQ′Q)−1H ′y = A∗y. (2)

The inferred trend g∗t for any date t is thus a linear function of the full set of observations on y

for all dates.

As noted by Hodrick and Prescott (1981) and King and Rebelo (1993), the identical inference

can alternatively be motivated from particular assumptions about the time-series behavior of the

trend and cycle components. Suppose our goal was to choose a value for a (T × 1) vector at

such that the estimate gt = a′ty has minimum expected squared difference from the true trend:

minatE(gt − a′ty)2. (3)

2 The appendix provides a derivation of equations (2) and (4). Cornea-Madeira (forthcoming) providedfurther details on A∗ and a convenient algorithm for calculating it.

2

The solution to this problem is the population analog to a sample regression coefficient, and is

a function of the variance of y and its covariance with g:

g = E(gy′) [E(yy′)]−1y = Ay. (4)

As an example of a particular set of assumptions we might make about these covariances, let ct

denote the cyclical component and vt the second difference of the trend component:

yt = gt + ct (5)

gt = 2gt−1 − gt−2 + vt. (6)

Suppose that we believed that vt and ct are uncorrelated white noise processes that are also

uncorrelated with (g0, g−1), and let C0 denote the (2×2) variance of (g0, g−1). These assumptions

imply a particular value for A in (4). As we let the variance of (g0, g−1) become arbitrarily large

(represented as C−10 → 0), then in every sample the inference (4) would be numerically identical

to expression (2).

Proposition 1. For λ = σ2c/σ

2v and any fixed T, under conditions (5)-(6) with ct and vt

white noise uncorrelated with each other and uncorrelated with (g0, g−1), the matrix A in (4)

converges to the matrix A∗ in (2) as C−10 → 0.

The proposition establishes that if researcher 1 sought to identify a trend by solving the

minimization problem (1) while researcher 2 found the optimal linear estimate of a trend process

that was assumed to be characterized by the particular assumption that vt and ct were both

white noise, the two researchers would arrive at the numerically identical series for trend and

cycle provided the ratio of σ2c to σ2

v assumed by researcher 2 was identical to the value of λ used

by researcher 1.

3

The Kalman smoother is an iterative algorithm for calculating the population linear projec-

tion (4) for models where the variance and covariance can be characterized by some recursive

structure.3 In this case, (5) is the observation equation and (6) is the state equation. Thus

as noted by Hodrick and Prescott, applying the Kalman smoother to the above state-space

model starting from a very large initial variance for (g0, g−1)′ offers a convenient algorithm for

calculating the HP filter, and is in fact a way that the HP filter is often calculated in practice.

Nevertheless, this observation should also be a bit troubling for users of the HP filter, in that

they never defend the claim that the particular structure assumed in Proposition 1 is an accurate

representation of the true data-generating process. Indeed, if a researcher did know for certain

that these equations were the true data-generating process, and further knew for certain the

value of the population parameter λ = σ2c/σ

2v, he would probably be unhappy with using (2) to

separate cycle from trend! The reason is that if this state-space structure was the true DGP,

the resulting estimate of the cyclical component ct = yt − gt would be white noise– it would

be random and exhibit no discernible patterns. By contrast, users of the HP filter hope to

see suggestive patterns in plots of the series that is supposed to be interpreted as the cyclical

component of yt.

Premultiplying (2) by H ′H + λQ′Q gives a system of equations whose tth element is

[1 + λ(1− L−1)2(1− L)2]g∗t = yt for t = 1, 2, ..., T − 2 (7)

for L the lag operator (Lkxt = xt−k, L−kxt = xt+k). In other words, F (L)g∗t = yt for

F (L) = 1 + λ(1− L−1)2(1− L)2. (8)

3 See for example Hamilton, 1994, equation [13.6.3].

4

The following proposition establishes some properties of this filter.4

Proposition 2. For any λ : 0 < λ <∞, the inverse of the operator (8) can be written

[F (L)]−1 = C

[1− (φ2

1/4)L

1− φ1L− φ2L2

+1− (φ2

1/4)L−1

1− φ1L−1 − φ2L

−2 − 1

](9)

where

1

1− φ1z − φ2z2

=∑∞

j=0Rj[cos(mj) + cot(m) sin(mj)]zj (10)

1

1− φ1z−1 − φ2z

−2 =∑∞

j=0Rj[cos(mj) + cot(m) sin(mj)]z−j

φ1(1− φ2) = −4φ2 (11)

(1− φ1 − φ2)2 = −φ2/λ (12)

C =−φ2

λ(1− φ21 − φ2

2 + φ31/2)

(13)

R =√−φ2

cos(m) = φ1/(2R). (14)

Roots of (1−φ1z−φ2z2) = 0 are complex and outside the unit circle, φ1 is a real number between

0 and 2, φ2 a real number between −1 and 0, and R a real number between 0 and 1.

Figure 1 plots the values of φ1 and φ2 generated by different values of λ. For λ = 1600,

φ1 = 1.777 and φ2 = −0.7994. These imply R = 0.8941, so that the absolute value of the

weights decay with a half-life of about 6 quarters while R60 = 0.0012.5

4 Related results have been developed by Singleton (1988), King and Rebelo (1989, 1993), Cogley and Nason(1995), and McElroy (2008). Unlike these papers, here I provide simple direct expressions for the values of φ1and φ2, and my analytical expressions of the HP filter entirely in terms of real parameters in (9) and (10) appearto be new.

5 The other parameters for this case are C = 0.056075, m = 0.111687 and cot(m) = 8.9164.

5

Expression (7) means that for t more than 15 years from the start or end of a sample of

quarterly data, the cyclical component ct = yt − g∗t is well approximated by

ct = λ(1− L−1)2(1− L)2g∗t =λ(1− L−1)2(1− L)2

F (L)yt =

λ(1− L)4

F (L)yt+2. (15)

As noted by King and Rebelo, obtaining the cyclical component for these observations thus

amounts to taking fourth differences of the original yt+2 and applying the operator [F (L)]−1 to

the result, so that the HP cycle might be expected to produce a stationary series as long as

fourth-differences of the original series are stationary. However, De Jong and Sakarya (2016)

noted there could still be significant nonstationarity coming from observations near the start or

end of the sample, and Phillips and Jin (2015) concluded that for commonly encountered sample

sizes, the HP filter may not successfully remove the trend even if the true series is only I(1).

3 Drawbacks to the HP filter.

3.1 Appropriateness for typical economic time series.

The presumption by users of the HP filter is that it offers a reasonable approach to detrending

for a range of commonly encountered economic time series. The leading example of a time-series

process for which we would want to be particularly convinced of the procedure’s appropriateness

would be a random walk. Simple economic theory suggests that variables such as stock prices

(Fama, 1965), futures prices (Samuelson, 1965), long-term interest rates (Sargent, 1976; Pesando,

1979), oil prices (Hamilton, 2009), consumption spending (Hall, 1978), inflation, tax rates, and

money supply growth rates (Mankiw, 1987) should all follow martingales or near martingales. To

be sure, hundreds of studies have claimed to find evidence of statistically detectable departures

from pure martingale behavior in all these series. Even so, there is indisputable evidence that

6

a random walk is often extremely hard to beat in out-of-sample forecasting comparisons, as has

been found for example by Meese and Rogoff (1983) and Cheung, Chinn, and Pascual (2005)

for exchange rates, Flood and Rose (2010) for stock prices, Atkeson and Ohanian (2001) for

inflation, or Balcilar, et al. (2015) for GDP, among many others. Certainly if we are not

comfortable with the consequences of applying the HP filter to a random walk, then we should

not be using it as an all-purpose approach to economic time series.

For yt = yt−1+εt, where εt is white noise and (1−L)yt = εt, Cogley and Nason (1995)6 noted

that expression (15) means that when the HP filter is applied to a random walk, the cyclical

component for observations near the middle of the sample will approximately be characterized

by

ct =λ(1− L)3

F (L)εt+2.

For λ = 1600 this is

ct = 89.72{−q0,t+2 +

∑∞j=0(0.8941)j[cos(0.1117j) + 8.916 sin(0.1117j)](q1,t+2−j + q2,t+2+j)

}with q0t = εt − 3εt−1 + 3εt−2 − εt−3, q1t = εt − 3.79εt−1 + 5.37εt−2 − 3.37εt−3 + 0.79εt−4),

7 and

q2t = −0.79εt+1+3.37εt−5.37εt−1+3.79εt−2−εt−3. The underlying innovations εt are completely

random and exhibit no patterns, whereas the series ct is both highly predictable (as a result of

the dependence on lags of εt−j) and will in turn predict the future (as a result of dependence

on future values of εt+j). Since the coefficients that make up [F (L)]−1 are determined solely by

the value of λ, these patterns in the cyclical component are entirely a feature of having applied

6 Harvey and Jaeger (1993) also have a related discussion.

7 The term q1t is the expansion of (1− L)3[1− (φ21/4)L]εt.

7

the HP filter to the data rather than reflecting any true dynamics of the data-generating process

itself.

For example, consider the behavior of stock prices and real consumption spending.8 The

top panels of Figure 2 show the autocorrelation functions for first-differences of these series,

confirming that there is little ability to predict either from its own past values, as we might

have expected from the literature cited at the start of this section. The lower panels show cross

correlations. Consumption has no predictive power for stocks, though stock prices may have a

modest ability to anticipate changes in aggregate consumption.

Figure 3 shows the analogous results if we tried to remove the trend by HP filtering rather

than first-differencing. The HP cyclical components of stock prices and consumption are both

extremely predictable from their own lagged values as well as each other. The rich dynamics in

these series are purely an artifact of the filter itself and tell us nothing about the underlying data-

generating process. Filtering takes us from the very clean understanding of the true properties

of these series that we can easily see in Figure 2 to the artificial set of relations that appear in

Figure 3. The values plotted in Figure 3 summarize the filter, not the data.

3.2 Properties of the one-sided HP filter.

The HP trend and cycle have an artificial ability to “predict” the future because they are by

construction a function of future realizations. One way we might try to get around this would

be to restrict the minimization problem in (3), forcing at to load only on values (yt, yt−1, ..., y1)′

8 Stock prices were measured as 100 times the natural log of the end-of-quarter value for the S&P 500 andconsumption from 100 times the natural log of real personal consumption expenditures from the U.S. NIPAaccounts. All data for this figure are quarterly for the period 1950:1 to 2016:1.

8

that have been observed as of date t, rather than also using future values as was done in the HP

filter (4). The value of this one-sided projection for date t could be calculated by taking the

end-of-sample HP-filtered series for a sample ending at t, repeated for each t.9

The top panel of Figure 4 shows the result of applying the usual two-sided HP filter to stock

prices. The trend is identified to have been essentially flat throughout the 2000s, with the pre-

recession booms and post-recession busts in stock prices viewed entirely as cyclical phenomena.

The bottom panel shows the results of applying a one-sided HP filter to the same data. This

would instead identify the trend component as rising during economic expansions and falling

during recessions. The reason is that a real-time observer would not know in early 2009, for

example, that stock prices were about to appreciate remarkably, and accordingly would have

judged much of the drop observed up to that date to be permanent. It is only with hindsight

that we are tempted to interpret the 2008 stock-market crash as a temporary phenomenon.

Making use of unknowable future values in this way is in fact a fundamental reason that HP-

filtered series exhibit the visual properties that they do, precisely because they impose patterns

that are not a feature of the data-generating process and could not be recognized in real time.

Some researchers might be attracted by the simple picture of the “long-run” component of stock

prices summarized by the top panel of Figure 4. But that picture is just something that their

imagination has imposed on the data. And the end-of-sample value obtained from the procedure

is actually quite different from what the researcher is seeing in the middle of the sample.

Moreover, although a one-sided filter would eliminate the problem of generating a series that

9 An easier way to calculate the one-sided projection is with a single pass of the Kalman filter through theentire sample for the state-space model assumed in Proposition 1 with C0 large and σ2

c/σ2v = 1600. The Kalman

filter gives the one-sided projection while the Kalman smoother gives the usual two-sided HP filter.

9

is artificially able to predict the future, changes in both the one-sided trend and its implied

cycle are readily forecastable from their own lagged values, and likewise by values of any other

variables. Again this is not a feature of the stock prices themselves, but instead is an artifact

of choosing to characterize the cycle and trend in this particular way.

3.3 Data-coherent values for λ.

A separate question is what value we should use for the smoothing parameter λ. Hodrick and

Prescott motivated their choice of λ = 1600 based on the prior belief that a large change in

the cyclical component within a quarter would be around 5%, whereas a large change in the

trend component would be around (1/8)%, suggesting a choice of λ = σ2c/σ

2v = (5/(1/8))2 =

1600. Ravn and Uhlig (2002) showed how to choose the smoothing parameter for data at other

frequencies if indeed it would be correct to use 1600 on quarterly data. These rules of thumb

are almost universally followed.

It’s worth noting that if the state-space representation in Proposition 1 were indeed an

accurate characterization of the trend that we were trying to infer, we would not need to make

up a value for λ but could in fact estimate it from the data. If for example we assumed a Normal

distribution for the innovations (vt, ct)′ we could use the Kalman filter to evaluate the likelihood

function for the observed sample (y1, ...., yT )′ and find the values for σ2v and σ2

c that maximize

the likelihood function.10 This could alternatively be given a quasi-maximum likelihood

interpretation as a GLS minimization of the squared forecast errors weighted by reciprocals of

10 See for example Hamilton (1994, equations [13.4.1]-[13.4.2]). Note that although the inferred value for thetrend gt depends only on the ratio σ2

c/σ2v, the parameters σ2

c and σ2v are separately identifiable because σ2

c canbe inferred from the average observed size of (yt − gt)2.

10

their model-implied variance.

Table 1 reports MLEs of σ2v, σ

2c , and λ for a number of commonly studied macroeconomic

series. For every one of these we would estimate a value for σ2c whose magnitude is similar to,

and in fact often smaller than, σ2v, and certainly not 1600 times as large.11 If we used a value

of λ = 1 instead of λ = 1600, the resulting series for gt would differ little from the original data

yt itself; λ = 1 implies a value for R in expression (10) of 0.48, which decays with a half-life of

less than one quarter.

Thus not only is the HP filter very inappropriate if the true process is a random walk. As

commonly applied with λ = 1600, the HP filter is not even optimal for the only example (namely

(5)-(6)) for which anyone has claimed that it might provide the ideal inference!

4 A better alternative.

Here I suggest an alternative concept of what we might mean by the cyclical component of a

possibly nonstationary series: how different is the value at date t + h from the value that we

would have expected to see based on its behavior through date t?12 This concept of the cyclical

component has several attractive features. First, as noted by den Haan (2000), the forecast

error is stationary for a wide class of nonstationary processes. Second, the primary reason that

we would be wrong in predicting the value of most macro and financial variables at a horizon

11 Nelson and Plosser (1982, pages 157-158) have also made this observation.

12 This idea is related to Beveridge and Nelson’s (1981) definition of the trend component of yt as gt = limh→∞

limp→∞

E(yt+h|yt, yt−1, .., yt−p+1) which limit exists and can be calculated provided that (1 − L)yt is a mean-zero

stationary process. Beveridge and Nelson then (somewhat curiously) interpreted the cyclical component as ct =gt−yt. By contrast, here we keep h and p fixed and take advantage of the fact that gt = E(yt+h|yt, yt−1, .., yt−p+1)exists for a broad range of nonstationary processes. We interpret the cyclical component at date t+ h as ct+h =yt+h − gt.

11

of h = 8 quarters ahead is cyclical factors such as whether a recession occurs over the next two

years and the timing of recovery from any downturn.13

While it might seem that calculating this concept of the cyclical component requires us

already to know the nature of the nonstationarity and to have the correct model for forecasting

the series, neither of these is the case. We can instead always rely on very simple forecasts

within a restricted class, namely, the population linear projection of yt+h on a constant and the

4 most recent values of y as of date t. This object exists and can be consistently estimated for

a wide range of nonstationary processes, as I now show.

4.1 Forecasting when the true process is unknown.

Suppose that the dth difference of yt is stationary for some d. For example, d = 2 would mean

that the growth rate is nonstationary but the change in the growth rate is stationary. Note the

dth difference is also stationary for any series with a deterministic time trend characterized by

a dth-order polynomial in time. For any such process we can write the value of yt+h as a linear

function of initial conditions at time t plus a stationary process. For example, when d = 1,

letting ut = ∆yt we can write

yt+h = yt + w(h)t (16)

where the stationary component is given by w(h)t = ut+1 + · · ·+ ut+h. For d = 2 and ∆2yt = ut,

yt+h = yt + h∆yt + w(h)t (17)

where now w(h)t = ut+h + 2ut+h−1 + · · ·+hut+1. This result holds for general d, as demonstrated

in the following proposition.

13 This same consideration suggests using h = 24 for monthly data and h = 2 for annual data.

12

Proposition 3. If (1− L)dyt is stationary for some d ≥ 1, then for all finite h ≥ 1,

yt+h = κ(1)h yt + κ

(2)h ∆yt + · · ·+ κ

(d)h ∆d−1yt + w

(h)t

with ∆s = (1− L)s, κ(1)` = 1 for ` = 1, 2, .. and κ

(s)j =

∑j`=1 κ

(s−1)` for s = 2, 3, ..., d and w

(h)t is

a stationary process.

It further turns out that if ∆dyt ∼ I(0) and we regress yt+h on a constant and the d most

recent values of y as of date t, the coefficients will be forced to be close to the values implied

by the coefficients κ(j)h in Proposition 3. For example, if ∆2yt is I(0) then in a regression of

yt+h on (yt, yt−1, 1)′, the fitted values will tend to yt + h(yt − yt−1) + µh for µh = E(w(h)t ) as the

sample size gets large; that is, the coefficient on yt will go to 1 + h and the coefficient on yt−1

will go to −h. The implication is that the residuals from a regression of yt+h on (yt, yt−1, 1)′ will

be stationary whenever y itself is I(2). The reason is that any other values for these coefficients

would imply a nonstationary series for the residuals, whose sum of squares become arbitrarily

large relative to those implied by the coefficients 1 + h and −h as the sample size grows large.

If ∆dyt is stationary and we regress yt+h on a constant and the p most recent values of

y as of date t for any p > d, the regression will use d of the coefficients to make sure the

residuals are stationary and the remaining p + 1 − d coefficients will be determined by the

parameters that characterize the population linear projection of the stationary variable w(h)t on

the stationary regressors (∆dyt,∆dyt−1, ...,∆

dyt−p+d+1, 1)′. The following proposition provides a

formal statement of these claims. In the proof of this proposition I have followed Stock (1994,

p. 2756) in defining a series ut to be I(0) if it has fixed mean µ and satisfies a Functional Central

13

Limit Theorem.14 This requires that the sample mean of ut has a Normal distribution as the

sample size T gets large, as does a sample mean that used only Tr observations for 0 < r ≤ 1.

Formally,

T−1/2∑[Tr]

s=1(ut − µ)⇒ ωW (r), (18)

where [Tr] denotes the largest integer less than or equal to Tr, W (r) denotes Standard Brownian

Motion, and “⇒” denotes weak convergence in probability measure. I will show that if either

the dth difference (ut = ∆dyt) satisfies (18) or if the deviation from a dth-order deterministic

polynomial in time (ut = yt − δ0 − δ1t− δ2t2 − · · · − δdtd) satisfies (18), then we can remove the

nonstationary component with the same simple regression.15

Proposition 4. Suppose that either ut = ∆dyt satisfies (18) or that ut = yt−∑d

j=0 δjtj with

δd 6= 0 satisfies (18) for some unknown d. Let xt = (yt, yt−1, ..., yt−p+1, 1)′ for some p ≥ d and

consider OLS estimation of yt+h = x′tβ + vt+h for t = 1, ..., T with estimated coefficient

β =(∑T

j=1 xtx′t

)−1 (∑Tj=1 xtyt+h

). (19)

If p = d, the OLS residuals yt+h− x′tβ converge to the variable w(h)t −E(w

(h)t ) in Proposition 3.

If p > d, the OLS residuals converge to the residuals from a population linear projection of w(h)t

on (∆dyt,∆dyt−1, ...,∆

dyt−p+d+1, 1)′.

14 Stock (1994, p. 2749) demonstrated that an example of sufficient conditions that imply (18) is that ut =µ+

∑∞j=0 ψjηt where ηt is a martingale difference sequence with variance σ2 and finite fourth moment, ψ(1) 6= 0,

and∑∞

j=0 j|ψj | <∞, in which case ω2 in (18) is given by σ2[ψ(1)]2. Alternatively, Phillips (1987, Lemma 2.2)derived (18) from primitive moment and mixing conditions on ut.

15 The reason to state these as two separate possibilities is that if the nonstationarity is purely deterministic,then the dth differences will not satisfy the Functional Central Limit Theorem. For example, if yt = γ0 +γ1t+εtwith εt white noise, then ∆yt = γ1 + ψ(L)εt for ψ(L) = 1 − L and ψ(1) = 0. Of course when ut = ∆dytsatisfies (18) with µ 6= 0, the series yt has both dth-order stochastic as well as dth-order deterministic polynomialtrends, so that case, along with pure stochastic trends (µ = 0) and pure deterministic trends are all allowed byProposition 4.

14

Proposition 4 establishes that if we estimate an OLS regression of yt+h on a constant and the

p = 4 most recent values of y as of date t,

yt+h = β0 + β1yt + β2yt−1 + β3yt−2 + β4yt−3 + vt+h, (20)

the residuals

vt+h = yt+h − β0 − β1yt − β2yt−1 − β3yt−2 − β4yt−3 (21)

offer a reasonable way to construct the transient component for a broad class of underlying

processes. The series is stationary provided that fourth differences of yt are stationary, a goal

that HP intends but does not necessarily achieve. But whereas the approximation to the HP

filter in equation (15) imposes all 4 unit roots, the sample regression would only use 4 differences

if it is warranted by observed features of the data. The proposed procedure has a number of other

advantages over HP. First, any finding that vt+h predicts some other variable xt+h+j represents

a true ability of y to predict x rather than an artifact of the way we chose to detrend y, by virtue

of the fact that vt+h is a one-sided filter16 . Second, unlike the HP cyclical series ct+h, the value

of vt+h will by construction be difficult to predict from variables dated t and earlier.17 If we find

such predictability, it tells us something about the true data-generating process, for example,

that x Granger-causes y. Third, the value of vt+h is a model-free and essentially assumption-

free summary of the data. Regardless of how the data may have been generated, as long as

(1− L)dyt is covariance stationary for some d ≤ 4, there exists a population linear projection of

16 While the OLS coefficients (19) make use of future observations, this influence vanishes asymptotically, incontrast to HP’s first-order dependence on future observations. Expression (22) below offers another alternativethat allows zero inference of future observations for any sample size T.

17 Note however that ct+h by construction can be predicted by variables known at date t+h− 1. The value ofct+h will be correlated with its own lagged values ct+h−1, ct+h−2, ..., ct+1 but likely uncorrelated with ct, ct−1, ...

15

yt+h on (yt, yt−1, yt−2, yt−3, 1)′. That projection is a characteristic of the data-generating process

that can be used to define what we mean by the cyclical component of the process and can be

consistently estimated from the data. Given a dynamic stochastic general equilibrium or any

other theoretical model that would imply an I(d) process, we could calculate this population

characteristic of the model and estimate it consistently from the data.

4.2 Properties in some common settings.

Random walk. Given the literature cited in Section 3.1 it is instructive to examine the conse-

quences if this procedure were applied to a random walk: yt = yt−1 + εt. In this case, d = 1

and w(h)t = εt+h + εt+h−1 + · · ·+ εt+1. For large samples, the OLS estimates of (20) converge to

β1 = 1 and all other βj = 0, and the resulting filtered series would simply be the difference

vt+h = yt+h − yt, (22)

that is, how much the series changes over an h = 8-quarter horizon, or equivalently the sum

of the observed changes over h periods. Note that for h = 8 the filter 1 − Lh wipes out any

cycles with frequency of exactly one year, and thus is taking out both the long-run trend as

well as any strictly seasonal components.18 This also fits with the common understanding of

what we would mean by the cyclical component. Because the simple filter (22) does not require

estimation of any parameters, it can also be used as a quick robustness check for concerns about

the small-sample applicability of the asymptotic claims in Proposition 4, as will be illustrated

in the applications below.

18 As in Hamilton (1994, pp. 171-172), the filter 1 − L8 has power transfer function (1 − e−8iω)(1 − e8iω) =2− 2 cos(8ω) which is zero at ω = 0, π/4, π/2, 3π/4, π and thus eliminates not only cycles at the zero frequencybut also cycles that repeat themselves every 8,4,8/3, or 2 quarters. See also Hamilton (1994, Figures 6.5 and6.6).

16

Deterministic time trend. Another instructive example is a pure deterministic time trend of

order d = 1: yt = δ0 + δ1t+ εt for εt white noise. In this case ∆yt = δ1 + εt − εt−1 is stationary

and w(h)t = ∆yt+1 + · · · + ∆yt+h = δ1h + εt+h − εt is also stationary for any h. I show in the

appendix that for this case the limiting coefficients on yt, .., yt−p+1 described by Proposition 4

are each given by 1/p and the implied trend for yt+h is

δ0 + δ1(t+ h) + p−1(εt + εt−1 + · · ·+ εt−p+1). (23)

Even for p = 1 this is not a bad estimate and for p = 4 should not differ much from the true

trend δ0 + δ1(t + h). Again regardless of the choice of p, the difference between yt+h and (23)

will be stationary.

Interpreting DSGE’s. A third instructive example is when yt is an element of a theoretical

dynamic stochastic general equilibrium model that is stationary around some steady-state value

µ. If the effects of shocks in the theoretical model die out after h periods, then the linear

projection (20) in the theoretical model is characterized by β0 = µ and β1 = β2 = β3 = β4 = 0.

In other words, the component vt+h is exactly the deviation from the steady state. If shocks have

not completely died out after h periods, then part of what is being labeled trend by this method

would include the components of shocks that persist longer than h periods. But for any value

of h, the linear projection is a well-defined population characteristic of the theoretical stationary

model, and there is an exactly analogous object one can calculate in the possibly nonstationary

observed data. The method thus offers a way to make an apples-to-apples comparison of theory

with data of the sort that users of the HP filter often desire, but which the HP filter itself will

always fail to deliver.

17

4.3 Specification of p and h.

One might be tempted to use a richer model than (20) to forecast yt+h, such as using a vector of

variables, more than 4 lags, or even a nonlinear relation. However, such refinements are com-

pletely unnecessary for the goal of extracting a stationary component, and have the significant

drawback that the more parameters we try to estimate by regression, the more the small-sample

results are likely to differ from the asymptotic predictions. The simple univariate regression

(20) is estimating a population object that is well defined regardless of whether the variable is

part of a large vector system with nonlinear dynamics. For this reason, just as the HP filter

is always implemented as a univariate procedure, my recommendation is to follow that same

strategy for the approach here.

A related issue is the choice of h. For any fixed h, there exists a sample size T for which

the results of Proposition 4 hold. However, a bigger sample size T will be needed the bigger is

h. The information in a finite data set about very long-horizon forecasts is quite limited. If

we are interested in business cycles, a 2-year horizon should be the standard benchmark. It is

also desirable with seasonal data to have both p and h be integer multiples of the number of

observations in a year. Hence for quarterly data my recommendation is p = 4 and h = 8.

In other settings the fundamental interest could be in shocks whose effects last substantially

longer than two years but are nevertheless still transient. A leading example would be the recent

interest in debt cycles prompted by datasets such as developed by Jorda, Schularick, and Taylor

(2016). For such an application I would use h = 5 years, with the regression-free implementation

(yt+5 − yt) having particular appeal given the length of datasets available.

18

4.4 Empirical illustrations.

Figure 5 shows the results when this approach is applied to data on U.S. total employment.

The raw seasonally adjusted data (yt) are plotted in the upper left panel. The residuals from

regression (20) estimated for these data are plotted in black in the lower-left panel, while the

8-lag difference (22) is in red. The latter two series behave very similarly in this case, as indeed

I have found for most other applications. The primary difference is that the regression residual

has sample mean zero by construction (by virtue of the inclusion of a constant term in the

regression) whereas the average value of (22) will be the average growth rate over a two-year

period.

One interesting observation is that the cyclical component of employment starts to decline

significantly before the NBER business cycle peak for essentially every recession. Note that this

inference from Figure 5 is summarizing a true feature of the data and is not an artifact of any

forward-looking aspect of the filter.

The right panels of Figure 5 show what happens when the same procedure is applied to

seasonally unadjusted data. The raw data themselves exhibit a very striking seasonal pattern,

as seen in the top right panel. Notwithstanding, the cyclical factor inferred from seasonally un-

adjusted data (bottom right panel) is almost indistinguishable from that derived from seasonally

adjusted data, confirming that this approach is robust to methods of seasonal adjustment.

Figure 6 applies the method to the major components of the U.S. national income and

product accounts. Investment spending is more cyclically volatile than GDP, while consumption

spending is less so. Imports fall significantly during recessions, reflecting lower spending by

19

U.S. residents on imported goods, and exports substantially less so, reflecting the fact that

international downturns are often decoupled from those in the U.S. Detrended government

spending is dominated by war-related expenditures– the Korean War in the early 1950s, the

Vietnam War in the 1970s, and the Reagan military build-up in the 1980s.

Table 2 reports the standard deviation of the cyclical component of each of these and a

number of other series, along with their correlation with the cyclical component of GDP. We

find very little cyclical correlation between output and prices.19 Both the nominal fed funds

rate and the ex ante real fed funds rate (the latter based on the measure in Hamilton, et al.,

2016) are modestly procyclical, whereas the 10-year nominal interest rate is not.

5 Conclusion.

The HP filter is intended to produce a stationary component from an I(4) series, but in practice

it can fail to do so, and invariably imposes a great cost. It introduces spurious dynamic relations

that are purely an artifact of the filter and have no basis in the true data-generating process,

and there exists no plausible data-generating process for which common popular practice would

provide an optimal decomposition into trend and cycle. There is an alternative approach

that can isolate a stationary component from any I(4) series, preserves the underlying dynamic

relations and consistently estimates well-defined population characteristics for a broad class of

possible data-generating processes.

19 Identifying the sign of this correlation was one of the primary interests of den Haan’s (2000) application ofa related methodology. In contrast to the results in Table 2, he found a positive correlation between the cyclicalcomponents of these series. I attribute the difference to differences in sample period.

20

References

Atkeson, Andrew and Lee E. Ohanian. 2001. “Are Phillips Curves Useful for Forecasting

Inflation?,” Quarterly Review, Federal Reserve Bank of Minneapolis, 25(1): 2-11.

Balcilar, Mehmet, Rangan Gupta, Anandamayee Majumdar and Stephen M. Miller. 2015.

“Was the Recent Downturn in US Real GDP Predictable?,” Applied Economics, 47(28): 2985-

3007.

Beveridge, Stephen, and Charles R. Nelson. 1981. “A New Approach to Decomposition of

Economic Time Series into Permanent and Transitory Components with Particular Attention to

Measurement of the ‘Business Cycle’,” Journal of Monetary Economics, 7:151-174.

Cheung, Yin-Wong, Menzie D. Chinn & Antonio Garcia Pascual. 2005. “Empirical exchange

rate models of the nineties: Are any fit to survive?,” Journal of International Money and Finance,

24(7): 1150-1175.

Choi, In. 1993. “Asymptotic Normality of the Least-Squares Estimates for Higher Order

Autoregressive Integrated Processes with Some Applications,” Econometric Theory, 9:263-282.

Cogley, Timothy, and James M. Nason. 1995. “Effects of the Hodrick-Prescott Filter on

Trend and Difference Stationary Time Series: Implications for Business Cycle Research,” Journal

of Economic Dynamics and Control, 19(1-2): 253-278.

Cornea-Madeira, Adriana. Forthcoming. “The Explicit Formula for the Hodrick-Prescott

Filter in Finite Sample,” Review of Economics and Statistics.

De Jong, Robert M., and Neslihan Sakarya. 2016. “The Econometrics of the Hodrick-Prescott

Filter,” Review of Economics and Statistics, 98: 310–317.

Den Haan, Wouter J. 2000. “The Comovement between Output and Prices,” Journal of

21

Monetary Economics 46: 3-30.

Fama, Eugene F. 1965. “The Behavior of Stock-Market Prices,” The Journal of Business,

38(1): 34-105.

Flood, Robert P. and Andrew K. Rose. 2010. “Forecasting International Financial Prices

with Fundamentals: How do Stocks and Exchange Rates Compare?,” Globalization and Eco-

nomic Integration, Chapter 6, Edward Elgar Publishing.

Hall, Robert E. 1978. “Stochastic Implications of the Life Cycle-Permanent Income Hypoth-

esis: Theory and Evidence,” Journal of Political Economy, 86(6): 971-987.

Hamilton, James D. 1994. Time Series Analysis, Princeton: Princeton University Press.

Hamilton, James D. 2009. “Understanding Crude Oil Prices,” Energy Journal, 30(2): 179-

206.

Hamilton, James D., Ethan S. Harris, Jan Hatzius, and Kenneth D. West. 2016. “The

Equilibrium Real Funds Rate: Past, Present and Future,” IMF Economic Review 64: 660-707.

Harvey, Andrew C., and Albert Jaeger. 1993. “Detrending, Stylized Facts and the Business

Cycle,” Journal of Applied Econometrics, 8(3): 231-247.

Hodrick, Robert J. and Edward C. Prescott. 1981. “Postwar U.S. Business Cycles: An

Empirical Investigation,” working paper, Northwestern University.

Hodrick, Robert J. and Edward C. Prescott. 1997. “Postwar U.S. Business Cycles: An

Empirical Investigation,” Journal of Money, Credit and Banking, 29(1): 1-16.

Jorda, Oscar, Moritz Schularick, and Alan M. Taylor. 2016. “Macrofinancial History and

the New Business Cycle Facts,” in NBER Macroeconomics Annual 2016, volume 31.

King, Robert and Sergio T. Rebelo. 1989. “Low Frequency Filtering and Real Business

22

Cycles,” working paper, University of Rochester.

King, Robert and Sergio T. Rebelo. 1993. “Low Frequency Filtering and Real Business

Cycles,” Journal of Economic Dynamics and Control, 17: 207-231.

Mankiw, N. Gregory. 1987. “The Optimal Collection of Seigniorage: Theory and Evidence,”

Journal of Monetary Economics, 20(2): 327-341.

McElroy, Tucker. 2008. “Exact Formulas for the Hodrick-Prescott Filter,” Econometrics

Journal, 11(1): 209-217.

Meese, Richard A. and Kenneth Rogoff. 1983. “Empirical Exchange Rate Models of the

Seventies : Do they Fit Out of Sample?,” Journal of International Economics, 14(1-2): 3-24.

Nelson, Charles R., and Charles R. Plosser. 1982. “Trends and Random Walks in Macroe-

conomic Time Series: Some Evidence and Implications,” Journal of Monetary Economics 10:

139-162.

Park, Joon Y., and Peter C.B. Phillips. 1989. “Statistical Inference in Regressions with

Integrated Processes: Part 2,” Econometric Theory, 5:95-131.

Phillips, Peter C.B. 1987. “Time Series Regression with a Unit Root,” Econometrica, 55:

277-301.

Phillips, Peter C.B., and Sainan Jin. 2015. “Business Cycles, Trend Elimination, and the

HP Filter,” working paper, Yale University.

Pesando, James E. 1979. “On the Random Walk Characteristics of Short- and Long-Term

Interest Rates in an Efficient Market,” Journal of Money, Credit and Banking, Blackwell Pub-

lishing, 11(4): 457-466.

Ravn, Morten O., and Harald Uhlig. 2002. “On Adjusting the Hodrick-Prescott Filter for

23

the Frequency of Observations,” Review of Economics and Statistics, 84(2): 371-380.

Samuelson, P. 1965. “Proof that Properly Anticipated Prices Fluctuate Randomly,” Indus-

trial Management Review, 6(2): 41-49.

Sargent, Thomas J. 1976. “A Classical Macroeconometric Model for the United States,”

Journal of Political Economy, 84(2): 207-237.

Sims, Christopher A., James H. Stock, and Mark W. Watson. 1990. “Inference in Linear

Time Series Models with Some Unit Roots,” Econometrica 58: 113-144.

Singleton, Kenneth J. 1988. “Econometric Issues in the Analysis of Equilibrium Business

Cycle Models,” Journal of Monetary Economics, 21: 361-386.

Stock, James H. 1994. “Unit Roots, Structural Breaks, and Trends,” in Handbook of Econo-

metrics, Volume 4, pp. 2739-2841, edited by Robert F. Engle and Daniel L. McFadden. Ams-

terdam: Elsevier.

24

Appendix.

Derivation of equation (2). The minimization problem (1) can be written

ming{(y −Hg)′(y −Hg) + λ(Qg)′(Qg)} .

The derivative with respect to g is −2H ′(y −Hg) + 2λQ′Qg and setting this to 0 gives (2).

Derivation of equation (4). Define at = [E(yy′)]−1E(ygt). For any at we have

E(gt − a′ty)2 = E(gt − a′ty + a′ty − a′ty)2

= E(gt − a′ty)2 + 2E[(gt − a′ty)y′](at − at) + (at − at)′E(yy′)(at − at).

The middle term equals 0 by the definition of at:

E[(gt − a′ty)y′](at − at) = {E(gty′)− E(gty

′)[E(yy′)]−1E(yy′)}(at − at).

Hence E(gt − a′ty)2 is minimized when at = at. Stacking a′ty into a (T × 1) vector gives (4).

Proof of Proposition 1. The assumptions can be written formally as

E(vt) = E(ct) = 0 (24)

E

vt

ct

[ vt−j ct−j

]=

σ2v 0

0 σ2c

if j = 0

0 otherwise

(25)

E(g0) = E(g−1) = 0 (26)

E

g0

g−1

[ g0 g−1

]= C0 (27)

25

E

vt

ct

[ g0 g−1

]= 0 for t = 1, ...T. (28)

I first establish that under (5)-(6) and (24)-(28),

(Q′Q)E(gg′)H ′ → σ2vH′ (29)

as C−10 → 0. To do so write (6) as Qg = v for v = (vT , vT−1, ..., v1) and Q0g = v0 for v0 a (2× 1)

vector with mean 0 and variance σ2vI2. Also from (28), v0 is uncorrelated with v and

Q0(2×T )

=

[0

(2×T )P−10(2×2)

]

where P0 is the Cholesky factor of C0 (P0P′0 = C0). Stacking these, Q

Q0

g =

v

v0

so

E(gg′) = σ2v

Q

Q0

−1 [

Q′ Q′0

]−1

[Q′ Q′0

] Q

Q0

E(gg′) = (Q′Q+Q′0Q0)E(gg′) = σ2vIT

(Q′Q)E(gg′)H ′ = σ2vH′ − (Q′0Q0)E(gg′)H ′

which goes to σ2vH′ as P−10 → 0, as claimed in (29).

Notice next from

y(T×1)

= H(T×T )

g(T×1)

+ c(T×1)

26

that E(yy′) = HE(gg′)H ′ + σ2cIT and E(gy′) = E(gg′)H ′ + E(gc′) = E(gg′)H ′. Hence

A = E(gy′)[E(yy′)]−1 (30)

= E(gg′)H ′[HE(gg′)H ′ + σ2cIT ]−1.

Combining (2) and (30),

(H ′H + λQ′Q)(A∗ − A)[HE(gg′)H ′ + σ2cIT ]

= H ′[HE(gg′)H ′ + σ2cIT ]− (H ′H + λQ′Q)E(gg′)H ′ (31)

= H ′σ2c − (σ2

c/σ2v)(Q

′Q)E(gg′)H ′

which from (29) goes to 0 as C−10 → 0. Since the matrices premultiplying and postmultiplying

the left side of (31) are of full rank, this establishes that A∗ = A as claimed.

Proof of Proposition 2. Let θ1, θ2, θ3, θ4 be the roots satisfying F (θi) = 0. As noted

by King and Rebelo (1989), since λ > 0, F (z) in (8) is positive for all real z meaning that θi

comprise two pairs of complex conjugates. Since F (z) = F (z−1), if θi is a root, then so is θ−1i .

Thus the values of θi are given by Reim, Re−im, R−1eim, and R−1e−im for some fixed R and m;

one pair is inside the unit circle and the other is outside. Noting that the coefficients on z2 and

z−2 in F (z) are both λ, it follows that F (z) can be written

F (z) = λ(1− θ1z)(1− θ2z)(θ−11 − z−1)(θ−12 − z−1).

From the symmetry of F (z) in z and z−1 we can without loss of generality normalize θ1 and θ2

to be inside the unit circle and write

F (z) =λ

θ1θ2(1− θ1z)(1− θ2z)(1− θ1z−1)(1− θ2z−1).

27

Define (1− φ1z− φ2z2) = (1− θ1z)(1− θ2z), namely φ1 is the real number θ1 + θ2 and φ2 is the

negative real number −θ1θ2. Note also that the roots of (1− φ1z − φ2z2) = 0 are the complex

conjugates θ−11 and θ−12 , which are both outside the unit circle. This gives the bounds on φ1

and φ2 stated in Proposition 2 as in Hamilton (1994, Figure 1.5). Then

F (z) =λ

−φ2

(1− φ1z − φ2z2)(1− φ1z

−1 − φ2z−2). (32)

Evaluating (8) and (32) at z = 1 gives

F (1) = 1 = (1− φ1 − φ2)2λ/(−φ2) (33)

as claimed in (12), Likewise evaluating (8) and (32) at z = −1 gives

F (−1) = 1 + 16λ = (1 + φ1 − φ2)2λ/(−φ2). (34)

Taking the difference between these last two equations establishes (4φ1 − 4φ1φ2)λ/(−φ2) = 16λ

or φ1(1 − φ2) = −4φ2 as claimed in (11). Note that since φ2 < 0 (required by complex roots),

from (11) φ1 > 0.

I next establish that

1

(1− φ1z − φ2z2)(1− φ1z

−1 − φ2z−2)

=C0 + C1z

1− φ1z − φ2z2

+C0 + C1z

−1

1− φ1z−1 − φ2z

−2 +B0. (35)

Combining terms over a common denominator shows that (35) will hold provided

1 = (C0 + C1z)(1− φ1z−1 − φ2z

−2) + (C0 + C1z−1)(1− φ1z

1 − φ2z2)

+B0(1− φ1z − φ2z2)(1− φ1z

−1 − φ2z−2)

= [2C0 − 2C1φ1 +B0(1 + φ21 + φ2

2)]

+[C1 − C0φ1 − C1φ2 −B0φ1 +B0φ1φ2](z + z−1)

−[C0φ2 +B0φ2](z2 + z−2).

28

The coefficient on (z2 + z−2) will be zero provided B0 = −C0, or provided

1 = [C0 − 2C1φ1 − C0φ21 − C0φ

22] + [C1 − C1φ2 − C0φ1φ2](z + z−1). (36)

The coefficient on (z + z−1) will be zero provided

C1 =C0φ1φ2

1− φ2

= −C0φ21/4 (37)

where the last equation made use of (11). Substituting (37) into (36), we see that (35) will be

true provided we set

1 = C0(1− φ21 − φ2

2 + φ31/2).

Combining these results we conclude that

1

(1− φ1z − φ2z2)(1− φ1z

−1 − φ2z−2)

= C0

[1− (φ2

1/4)z

1− φ1z − φ2z2

+1− (φ2

1/4)z−1

1− φ1z−1 − φ2z

−2 − 1

].

From (32) we then obtain (9) with C = −C0φ2/λ as claimed in (13).

To derive (10), recall from Hamilton (1994, pp. 16 and 33) that

1

1− φ1z − φ2z2

=∑∞

j=0Rj[2α cos(mj) + 2β sin(mj)]zj. (38)

We know that the coefficient on zj for j = 0 must be 1, requiring [2α cos(0) + 2β sin(0)] = 1 or

α = 1/2. We likewise know that the coefficient on zj for j = 1 is given by φ1, so R[cos(m) +

2β sin(m)] = φ1, which from (14) gives R2β sin(m) = φ1/2 or 2β sin(m) = cos(m) so 2β =

cot(m). Substituting these values for α and β into (38) gives (10).

Proof of Proposition 3. Recall the identity

yt+h = yt +∑h

j=1 ∆yt+j (39)

29

which immediately gives the result of Proposition 3 for the case d = 1 as stated in (16). We

likewise have the identity

∆yt+j = ∆yt +∑j

s=1 ∆2yt+s. (40)

Substituting (40) into (39) gives

yt+h = yt + ∆yt∑h

j=1 1 + w(h)t

for w(h)t =

∑hj=1

∑js=1 ∆2yt+s as claimed in (17) for the case d = 2. We can proceed recursively

using the identity ∆kyt+s = ∆kyt+∑s

r=1 ∆k+1yt+r and substituting into the preceding expression.

For any d the resulting w(h)t is a finite sum of stationary variables and therefore is itself stationary.

Proof of Proposition 4. Note that the fitted values and residuals implied by the coefficients

in (19) are numerically identical to those if we were to do the (infeasible) regression yt+h =

x′tα + vt+h for

xt = (ut, ut−1, ..., ut−p+d+1, 1,∆d−1yt,∆

d−2yt, ...,∆yt, yt)′

with ut = ∆dyt − µ. The latter regression is infeasible because we do not know the true values

of µ and d. But because the fitted values are the same, once we find the properties of the second

regression, we will also know the properties of the first. For example, for d = 2 and p = 4,

xt =

1 −2 1 0 −µ

0 1 −2 1 µ

0 0 0 0 1

1 −1 0 0 0

1 0 0 0 0

yt

yt−1

yt−2

yt−3

1

≡ Hxt (41)

30

and α = (∑Hxtx

′tH′)−1 (

∑Hxtyt+h) so β = H ′α for every sample. When p = d we define

the (p + 1)× 1 vector as xt = (1,∆d−1yt,∆d−2yt, ...,∆yt, yt)

′, that is, none of the ut−j variables

appear in xt when p = d.

Define q to be the (p+ 1)× 1 vector q = (0, ..., 0, E(w(h)t ), κ

(d)h , κ

(d−1)h , ..., κ

(1)h )′, so that w

(h)t =

w(h)t − E(w

(h)t ) = yt+h − x′tq and

α = (∑xtx′t)−1∑

xt(x′tq + w

(h)t )

= q + (∑xtx′t)−1∑

xtw(h)t . (42)

We first consider the case when (18) holds for ∆dyt when there is further no drift and the initial

value for all of the difference processes is zero, namely, the case when µ = 0 and ∆d−jyt = ξ(j)t

where ξ(1)t =

∑tj=1 uj and ξ

(s)t =

∑tj=1 ξ

(s−1)j for s = 2, 3, ..., d. For this case define

ΥT =

T 1/2Ip−d+1 0 0 · · · 0

0 T 0 · · · 0

0 0 T 2 · · · 0

......

... · · · ...

0 0 0 0 T d

. (43)

Adapting the approach in Sims, Stock and Watson (1990), we have from (42) that

T−1/2ΥT (α− q) = T−1/2ΥT (∑xtx′t)−1∑

xtw(h)t

= T−1/2[Υ−1T

∑xtx′tΥ−1T

]−1Υ−1T

∑xtw

(h)t

=[Υ−1T

∑xtx′tΥ−1T

]−1 [T−1/2Υ−1T

∑xtw

(h)t

]. (44)

31

Consider first the last term in (44):

T−1/2Υ−1T

∑xtw

(h)t =

T−1∑utw

(h)t

...

T−1∑ut−p+d+1w

(h)t

T−1∑w

(h)t

T−3/2∑ξ(1)t w

(h)t

T−5/2∑ξ(2)t w

(h)t

...

T−d−1/2∑ξ(d)t w

(h)t

. (45)

The first p − d terms are just the sample means of stationary variables, which by the Law of

Large Numbers converge in probability to their expectation E(ut−jw(h)t ). Term p−d+1 likewise

converges to E(w(h)t ) = 0. Calculations analogous to those behind Lemma 1(e) in Sims, Stock

and Watson (1990) show that the last d terms in (45) also all converge in probability to zero.20

Turning next to the first term in (44), the upper-left (p−d)× (p−d) block of Υ−1T

∑xtx′tΥ−1T

is characterized byT−1

∑u2t · · · T−1

∑utut−p+d+1

... · · · ...

T−1∑ut−p+d+1ut · · · T−1

∑u2t−p+d+1

p→

γ0 · · · γp−d−1

... · · · ...

γp−d−1 · · · γ0

for γj = E(utut−j). From Sims, Stock and Watson Lemma 1(a) and 1(b), the lower-right

20 That is, before multiplying by T−1/2 the terms are all Op(1); for similar calculations see Lemma 1(b) inChoi (1993) and Proposition 17.3(e) in Hamilton (1994).

32

(d+ 1)× (d+ 1) block satisfies

1 T−3/2∑ξ(1)t T−5/2

∑ξ(2)t · · · T−d−1/2

∑ξ(d)t

T−3/2∑ξ(1)t T−2

∑[ξ

(1)t ]2 T−3

∑ξ(1)t ξ

(2)t · · · T−d−1

∑ξ(1)t ξ

(d)t

T−5/2∑ξ(2)t T−3

∑ξ(2)t ξ

(1)t T−4

∑[ξ

(2)t ]2 · · · T−d−2

∑ξ(2)t ξ

(d)t

......

... · · · ...

T−d−1/2∑ξ(d)t T−d−1

∑ξ(d)t ξ

(1)t T−d−2

∑ξ(d)t ξ

(2)t · · · T−2d

∑[ξ

(d)t ]2

⇒

1 ω∫ 1

0W (1)(r)dr ω

∫ 1

0W (2)(r)dr · · · ω

∫ 1

0W (d)(r)dr

ω∫ 1

0W (1)(r)dr ω2

∫ 1

0[W (1)(r)]2dr ω2

∫ 1

0W (1)(r)W (2)(r)dr · · · ω2

∫ 1

0W (1)(r)W (d)(r)dr

ω∫ 1

0W (2)(r)dr ω2

∫ 1

0W (2)(r)W (1)(r)dr ω2

∫ 1

0[W (2)(r)]2dr · · · ω2

∫ 1

0W (2)(r)W (d)(r)dr

......

... · · · ...

ω∫ 1

0W (d)(r)dr ω2

∫ 1

0W (d)(r)W (1)(r)dr ω2

∫ 1

0W (d)(r)W (2)(r)dr · · · ω2

∫ 1

0[W (d)(r)]2dr

where W (1)(r) denotes Standard Brownian Motion and W (j)(r) =

∫ r

0W (j−1)(s)ds. For the

off-diagonal block of Υ−1T

∑xtx′tΥ−1T we see using calculations analogous to Sims, Stock and

Watson’s Lemma 1(e) that

T−1∑ut · · · T−1

∑ut−p+d+1

T−3/2∑ξ(1)t ut · · · T−3/2

∑ξ(1)t ut−p+d+1

T−5/2∑ξ(2)t ut · · · T−5/2

∑ξ(2)t ut−p+d+1

... · · · ...

T−d−1/2∑ξ(d)t ut · · · T−d−1/2

∑ξ(d)t ut−p+d+1

p→ 0.

Bringing all these results together, it follows that

T−1/2ΥT (α− q) p→

g

0

(46)

33

g =

γ0 · · · γp−d−1

... · · · ...

γp−d−1 · · · γ0

−1 E(utw

(h)t )

...

E(ut−p+d+1w(h)t )

.

Note that g corresponds to the coefficients from a population linear projection of w(h)t on

(ut, ut−1, ..., ut−p+d+1)′.

Writing out (46) explicitly using (43) gives

Ip−d+1 0 0 · · · 0

0 T 1/2 0 · · · 0

0 0 T 3/2 · · · 0

......

... · · · ...

0 0 0 0 T d−1/2

(α− q) p→

g

0

.

This equation shows that the first p − d elements of α converge to the stationary population

projection coefficients g, the p− d+ 1 term to E(w(h)t ), and the last d elements of α converge to

the κ(j)h terms in q. Indeed, the latter estimates are superconsistent– they still converge to the

terms in q even when multiplied by some positive power of T.

Taking again the p = 4 and d = 2 example (41), the coefficients β from the actual regression

34

of yt+h on (yt, yt−1, yt−2, yt−3, 1)′ have plim

βp→

1 0 0 1 1

−2 1 0 −1 0

1 −2 0 0 0

0 1 0 0 0

−µ −µ 1 0 0

g1

g2

µh(h+ 1)/2

h

1

=

g1 + h+ 1

g2 − 2g1 − h

g1 − 2g2

g2

µ{[h(h+ 1)/2]− g1 − g2}

.

The above derivation assumed µ = 0 so that there was no drift in ∆dyt. If instead we had

µ 6= 0, then ∆d−1yt =∑t

s=1 us =∑t

s=1 us + tµ = ξ(1)t + tµ, which is dominated for large t by the

drift term tµ rather than the random walk term ξ(1)t , and ∆d−jyt = ξ

(j)t + (1/j!)tjµ+ op(t

j). In

this case we would simply replace ΥT in the above derivations with

ΥT =

T 1/2Ip−d+1 0 0 · · · 0

0 T 3/2 0 · · · 0

0 0 T 5/2 · · · 0

......

... · · · ...

0 0 0 0 T d+1/2

. (47)

We would then arrive at the identical conclusion (46) this time using results (a), (c), and (g)

from Sims, Stock and Watson Lemma 1.

Alternatively, adding a nonzero initial condition, e.g. replacing ξ(1)t with ξ

(1)t + ξ

(1)0 for ξ

(1)0

any fixed constant produces a term that is still dominated asymptotically by ξ(1)t , and as in Park

and Phillips (1989), the original convergence claims again all go through.

Finally, the derivations are very similar for the case of purely deterministic time trends,

35

yt =∑d

j=0 δjtj + ut. For this case we have µ = E(∆dyt) = δd and

xt = (∆dut − δd, ...,∆dut−p+d+1 − δd, 1,1∑

j=0

δ(d−1)j tj + ∆d−1ut, ...,

d−1∑j=0

δ(1)j tj + ∆ut,

d∑j=0

δjtj + ut)

′

where∑d−s

j=0 δ(s)j tj =

∑d−s+1j=0 δ

(s−1)j tj −

∑d−s+1j=0 δ

(s−1)j (t − 1)j and δ

(0)j = δj. Then for ΥT as in

(47), we again have

T−1/2Υ−1T

∑xtw

(h)t

p→ (E(∆dut − δd)wt+h, ..., E(∆dut−p+d+1 − δd)wt+h, 0, ..., 0)′.

The matrix Υ−1T

∑xtx′tΥ−1T likewise has a block-diagonal plim, so for g the coefficients of the

population linear projection of wt+h on (∆dyt − δd, ...,∆dyt−p+d+1 − δd)′ we have

αp→ (g, E(w

(h)t ), κ

(d)h , ..., κ

(1)h )′. (48)

Derivation of equation (23). For σ2 the variance of εt, w(h)t = εt+h−εt, and vt = εt−εt−1

we have

g =

E(v2t ) E(vtvt−1) E(vtvt−2) · · · E(vtvt−p+2)

E(vt−1vt) E(v2t−1) E(vt−1vt−2) · · · E(vt−1vt−p+2)

E(vt−2vt) E(vt−2vt−1) E(v2t−2) · · · E(vt−2vt−p+2)

......

... · · · ...

E(vt−p+2vt) E(vt−p+2vt−1) E(vt−p+2vt−2) · · · E(v2t−p+2)

−1

E[vt(εt+h − εt)]

E[vt−1(εt+h − εt)]

E[vt−2(εt+h − εt)]

...

E[vt−p+2(εt+h − εt)

=

σ2

2 −1 0 · · · 0

−1 2 −1 · · · 0

0 −1 2 · · · 0

......

... · · · ...

0 0 0 · · · 2

−1

−σ2

0

0

...

0

=

−(p− 1)/p

−(p− 2)/p

−(p− 3)/p

...

−1/p

36

where the last equation can be verified by premultiplying by

2 −1 0 · · · 0

−1 2 −1 · · · 0

0 −1 2 · · · 0

......

... · · · ...

0 0 0 · · · 2

and confirming that the resulting vector is indeed (−1, 0, ..., 0)′. Hence the plim in (48) for this

example is

αp→ (−(p− 1)/p,−(p− 2)/p, ...,−1/p, hδ1, 1)′.

Also for this case we have

H =

1 −1 0 · · · 0 0 −δ1

0 1 −1 · · · 0 0 −δ1

0 0 1 · · · 0 0 −δ1...

...... · · · ...

......

0 0 0 · · · 1 −1 −δ1

0 0 0 · · · 0 0 1

1 0 0 · · · 0 0 0

so β = H ′α has plim (1/p, 1/p, ..., 1/p, δ1[h+ (p− 1)/p+ (p− 2)/p+ · · ·+ 1/p])′ implying a fitted

37

value

β′xt = (1/p)(yt + yt−1 + · · ·+ yt−p+1) + δ1[h+ 1/p+ 2/p+ · · ·+ (p− 1)/p]

= δ1h+ (1/p){yt + [yt−1 + δ1] + [yt−2 + 2δ1] + · · ·+ [yt−p+1 + (p− 1)δ1]}

= δ1h+ (1/p){[δ0 + δ1t+ εt] + [δ0 + δ1t+ εt−1] + · · ·+ [δ0 + δ1t+ εt−p+1]}

= δ0 + δ1(t+ h) + (1/p)(εt + εt−1 + · · ·+ εt−p+1).

38

39

Table 1. Maximum likelihood estimates of parameters of state-space formalization of the HP filter for

assorted quarterly macroeconomic series.

σ2c σ2

v λ

GDP 0.115 0.468 0.245

Consumption 0.163 0.174 0.940

Investment 4.187 12.196 0.343

Exports 5.818 3.341 1.741

Imports 4.423 4.769 0.927

Government spending 0.221 1.160 0.191

Employment 0.006 0.250 0.023

Unemployment rate 0.014 0.092 0.152

GDP Deflator 0.018 0.081 0.216

S&P 500 21.284 15.186 1.402

10-year Treasury yield 0.135 0.054 2.486

Fed Funds Rate 0.633 0.116 5.458

Real Rate 0.875 0.091 9.596

40

Table 2. Standard deviation of cyclical component and correlation with cyclical component of GDP for

assorted macroeconomic series.

Regression Residuals Random walk Sample

St. Dev. GDP Corr. St. Dev. GDP Corr.

GDP 3.38 1.00 3.69 1.00 1947:1-2016:1

Consumption 2.85 0.79 3.04 0.82 1947:1-2016:1

Investment 13.19 0.84 13.74 0.80 1947:1-2016:1

Exports 10.77 0.33 11.33 0.30 1947:1-2016:1

Imports 9.79 0.77 9.98 0.75 1947:1-2016:1

Government spending 7.13 0.31 8.60 0.38 1947:1-2016:1

Employment 3.09 0.85 3.32 0.85 1947:1-2016:2

Unemployment rate 1.44 -0.81 1.72 -0.79 1948:1-2016:2

GDP Deflator 2.99 0.04 4.11 -0.13 1947:1-2016:1

S&P 500 21.80 0.41 22.08 0.38 1950:1-2016:2

10-year Treasury yield 1.46 -0.05 1.51 0.08 1953:2-2016:2

Fed funds rate 2.78 0.33 3.03 0.40 1954:3-2016:2

Real rate 2.25 0.39 2.60 0.42 1958:1-2014:3

Notes to Table 2. Filtered series were based on the full sample available for that variable, while

correlations were calculated using the subsample of overlapping values for the two indicators. Note

that the regression residuals lose the first 11 observations and the random-walk calculations lose the

first 8 observations.

41

Figure 1. Values for φ1 and φ2 implied by different values of λ.

-1.2

-1

-0.8

-0.6

-0.4

-0.2

0

0.2

-0.5 0 0.5 1 1.5 2 2.5

φ2

φ1

λ = 0

λ = ∞

λ = 1600

42

Figure 2. Autocorrelations and cross-correlations for first-difference of stock prices and real

consumption spending.

Notes to Figure 2. Upper left: autocorrelations of log growth rate of end-of-quarter value for S&P 500.

Upper right: autocorrelations of log growth rate of real consumption spending. Lower panels: cross

correlations.

Figure 3. Autocorrelations and cross-correlations for HP cyclical component of stock prices and real

consumption spending.

Notes to Figure 3. Upper left: autocorrelations of HP cycle for log of end-of-quarter value for S&P 500.

Upper right: autocorrelations of HP cycle for log of real consumption spending. Lower panels: cross

correlations.

Lag0 1 2 3 4 5 6 7 8

-0.5

0

0.5

1S&P 500 with own lags

Lag0 1 2 3 4 5 6 7 8

-0.5

0

0.5

1Consumption with own lags

Lag0 1 2 3 4 5 6 7 8

-0.5

0

0.5

1S&P 500 with lags of Consumption

Lag0 1 2 3 4 5 6 7 8

-0.5

0

0.5

1Consumption with lags of S&P 500

Lag0 1 2 3 4 5 6 7 8

-0.5

0

0.5

1S&P 500 with own lags

Lag0 1 2 3 4 5 6 7 8

-0.5

0

0.5

1Consumption with own lags

Lag0 1 2 3 4 5 6 7 8

-0.5

0

0.5

1S&P 500 with lags of Consumption

Lag0 1 2 3 4 5 6 7 8

-0.5

0

0.5

1Consumption with lags of S&P 500

43

Figure 4. Comparison of one-sided and two-sided HP filters.

Notes to Figure 4. Red line in both panels plots 100 times natural log of S&P 500 stock price index. The

black curve in the top panel plots the HP estimate of trend as inferred using the usual two-sided filter

(calculated using the Kalman smoother for the state-space model in Proposition 1), whereas the black

curve in the bottom panel plots trend from a one-sided HP filter (calculated using the Kalman filter for

the same model). Shaded regions denote NBER recession dates.

Two-sided filter

1950 1960 1970 1980 1990 2000 2010200

300

400

500

600

700

800

One-sided filter

1950 1960 1970 1980 1990 2000 2010200

300

400

500

600

700

800

44

Figure 5. Regression and 8-quarter-change filters applied to seasonally adjusted and seasonally

unadjusted employment data.

Notes to Figure 5. Upper left: 100 times the log of end-of-quarter values for seasonally adjusted

nonfarm payrolls. Lower left: black plots 0 1 8 2 9 3 10 4 11ˆ ˆ ˆ ˆ ˆ

t t t t ty y y y yβ β β β β− − − −− − − − − as a function of t

while red plots 8.t ty y −− Right panels show results when the identical procedure is applied instead to

seasonally unadjusted data.

Employment (seasonally adjusted)

1950 1960 1970 1980 1990 2000 20101060

1080

1100

1120

1140

1160

1180

1200

Cyclical component (SA)

1950 1960 1970 1980 1990 2000 2010-10.0

-7.5

-5.0

-2.5

0.0

2.5

5.0

7.5

10.0

12.5RANDOM_WALKREGRESSION

Employment (not seasonally adjusted)

1950 1960 1970 1980 1990 2000 20101060

1080

1100

1120

1140

1160

1180

1200

Cyclical component (NSA)

1950 1960 1970 1980 1990 2000 2010-10.0

-7.5

-5.0

-2.5

0.0

2.5

5.0

7.5

10.0

12.5RANDOM_WALKREGRESSION

45

Figure 6. Results of applying regression (black) and 8-quarter-change (red) filters to 100 times the log of

components of U.S. national income and product accounts.

GDP

1950 1960 1970 1980 1990 2000 2010-10

-5

0

5

10

15

20RANDOM_WALKREGRESSION

Consumption

1950 1960 1970 1980 1990 2000 2010-10

-5

0

5

10


Investment

1950 1960 1970 1980 1990 2000 2010-50

-25

0

25


Exports

1950 1960 1970 1980 1990 2000 2010-40

-30

-20

-10

0

10

20

30


Imports

1950 1960 1970 1980 1990 2000 2010-40

-30

-20

-10

0

10

20

30


Government

1950 1960 1970 1980 1990 2000 2010-30

-20

-10

0

10

20

30

40

50