Christopher Dougherty EC220 - Introduction to econometrics (chapter 13) Slideshow: tests of...

Christopher Dougherty

EC220 - Introduction to econometrics (chapter 13) Slideshow: tests of nonstationarity: trended data

Original citation:

Dougherty, C. (2012) EC220 - Introduction to econometrics (chapter 13). [Teaching Resource]

© 2012 The Author

This version available at: http://learningresources.lse.ac.uk/139/

Available in LSE Learning Resources Online: May 2012

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License. This license allows the user to remix, tweak, and build upon the work even for commercial purposes, as long as the user credits the author and licenses their new creations under the identical terms. http://creativecommons.org/licenses/by-sa/3.0/

http://learningresources.lse.ac.uk/




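TESTS OF NONSTATIONARITY: TRENDED DATA

General model: $Y_t = \beta_1 + \beta_2 Y_{t-1} + \delta t + \varepsilon_t$

Alternatives: $-1 < \beta_2 < 1$ or $\beta_2 = 1$; $\delta = 0$ or $\delta \neq 0$

Case (a): $-1 < \beta_2 < 1$, $\delta = 0$

Case (b): $\beta_2 = 1$, $\beta_1 = 0$, $\delta = 0$

Case (c): $\beta_2 = 1$, $\beta_1 \neq 0$, $\delta = 0$

Case (d): $-1 < \beta_2 < 1$, $\delta \neq 0$

Case (e): $\beta_2 = 1$, $\delta \neq 0$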

In this slideshow we consider testing for nonstationarity when an inspection of the graph of a process reveals evidence of a trend.


Cases (a) and (b), considered in the previous slideshow, can be eliminated because they do not give rise to trends. Case (e) has been eliminated because it implies a quadratic trend.


So we are left with Cases (c) and (d).


We need to consider whether the process is better characterized as a random walk with drift, as in Case (c), or a deterministic trend, as in Case (d).

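General model: $Y_t = \beta_1 + \beta_2 Y_{t-1} + \delta t + \varepsilon_t$

Case (c): $Y_t = \beta_1 + Y_{t-1} + \varepsilon_t$ (random walk with drift: $\beta_2 = 1$, $\beta_1 \neq 0$, $\delta = 0$)

Case (d): $Y_t = \beta_1 + \beta_2 Y_{t-1} + \delta t + \varepsilon_t$ (deterministic trend: $-1 < \beta_2 < 1$, $\delta \neq 0$)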

To do this, we fit the general model, as in Case (d), with no assumption about the parameters. We can then test $H_0$: $\beta_2 = 1$ using as our test statistic either $T(b_2 - 1)$ or the t statistic for $b_2$, as before.
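As a rough illustration of the two statistics, the sketch below fits the general model by OLS and computes $T(b_2 - 1)$ and the t statistic for $H_0$: $\beta_2 = 1$. The helper names `ols` and `df_trend_stats`, the simulated series, and the choice to take T as the number of observations used in the regression are assumptions made for this example, not part of the original slideshow. The statistics must be compared with the non-standard critical values for the trended case, not with ordinary t tables.

```python
import random

def ols(X, y):
    """OLS by the normal equations. Returns (coefficients, standard errors, RSS)."""
    n, k = len(X), len(X[0])
    # Augmented matrix [X'X | I]; Gauss-Jordan elimination turns it into [I | (X'X)^-1].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        d = A[c][c]
        A[c] = [v / d for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    inv = [row[k:] for row in A]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    b = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    rss = sum((y[r] - sum(X[r][j] * b[j] for j in range(k))) ** 2 for r in range(n))
    s2 = rss / (n - k)
    se = [(s2 * inv[i][i]) ** 0.5 for i in range(k)]
    return b, se, rss

def df_trend_stats(y):
    """Fit Y_t = b1 + b2*Y_{t-1} + d*t and return the statistics for H0: beta2 = 1."""
    X = [[1.0, y[t - 1], float(t)] for t in range(1, len(y))]
    b, se, _ = ols(X, y[1:])
    T = len(X)  # assumption: T is the number of observations used in the regression
    return {"b2": b[1], "T(b2-1)": T * (b[1] - 1.0), "t": (b[1] - 1.0) / se[1]}

# Illustrative data: a deterministic trend plus white noise (so beta2 = 0).
rng = random.Random(0)
series = [1.0 + 0.5 * t + rng.gauss(0.0, 1.0) for t in range(101)]
stats = df_trend_stats(series)
print(stats)
```

For a series like this one, which is stationary about a trend, both statistics should be strongly negative.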



The inclusion of the time trend in the specification causes the critical values under the null hypothesis to be different from those in the untrended case. They are determined by simulation methods, as before.
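A minimal sketch of how such critical values might be obtained by simulation is given below. It generates random walks under the null hypothesis, fits the specification with an intercept and a time trend, and takes the lower 5 percent quantile of the t statistic for $H_0$: $\beta_2 = 1$. The function names, the sample size (T = 50), the number of replications (1000), and the seed are arbitrary choices for illustration. Random walks without drift are used because, once an intercept and a trend are both in the regression, the statistic does not depend on the drift. The result is only a rough Monte Carlo estimate and is not a substitute for published tables.

```python
import random

def ols(X, y):
    """OLS by the normal equations. Returns (coefficients, standard errors, RSS)."""
    n, k = len(X), len(X[0])
    # Augmented matrix [X'X | I]; Gauss-Jordan elimination turns it into [I | (X'X)^-1].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        d = A[c][c]
        A[c] = [v / d for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    inv = [row[k:] for row in A]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    b = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    rss = sum((y[r] - sum(X[r][j] * b[j] for j in range(k))) ** 2 for r in range(n))
    s2 = rss / (n - k)
    se = [(s2 * inv[i][i]) ** 0.5 for i in range(k)]
    return b, se, rss

def tau_trend(y):
    """t statistic for H0: beta2 = 1 in Y_t = b1 + b2*Y_{t-1} + d*t."""
    X = [[1.0, y[t - 1], float(t)] for t in range(1, len(y))]
    b, se, _ = ols(X, y[1:])
    return (b[1] - 1.0) / se[1]

def simulate_critical_value(T=50, reps=1000, level=0.05, seed=12345):
    """Estimate the lower-tail critical value of tau_trend under the null."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        y = [0.0]
        for _ in range(T):
            y.append(y[-1] + rng.gauss(0.0, 1.0))  # random walk: beta2 = 1, delta = 0
        stats.append(tau_trend(y))
    stats.sort()
    return stats[int(level * reps)]

cv = simulate_critical_value()
print(round(cv, 2))
```

The simulated value should lie well below the corresponding standard normal quantile of about -1.645, which is why conventional tables cannot be used.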



We can also perform an F test. We have argued that a process cannot combine a random walk with drift and a time trend, so we can test the composite hypothesis $H_0$: $\beta_2 = 1$, $\delta = 0$. Critical values for the three tests are given in Table A.7 at the end of the text.
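The F statistic for the composite hypothesis can be sketched as a comparison of restricted and unrestricted residual sums of squares. Under $H_0$ the model reduces to $\Delta Y_t = \beta_1 + \varepsilon_t$, so the restricted RSS is just the sum of squared deviations of $\Delta Y_t$ from its mean. The function names and the simulated series below are assumptions for illustration. The statistic has a non-standard distribution under the null, so it should be compared with the tabulated critical values mentioned above, not with ordinary F tables.

```python
import random

def ols_rss(X, y):
    """Residual sum of squares from OLS, solving the normal equations by elimination."""
    n, k = len(X), len(X[0])
    # Augmented system [X'X | X'y], reduced by Gauss-Jordan elimination.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        d = A[c][c]
        A[c] = [v / d for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    b = [A[i][k] for i in range(k)]
    return sum((y[r] - sum(X[r][j] * b[j] for j in range(k))) ** 2 for r in range(n))

def f_stat_unit_root_no_trend(y):
    """F statistic for H0: beta2 = 1 and delta = 0 in the general model."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    n = len(dy)
    # Unrestricted: dY_t = b1 + (b2 - 1)*Y_{t-1} + d*t (same residuals as the levels form).
    X = [[1.0, y[t - 1], float(t)] for t in range(1, len(y))]
    rss_u = ols_rss(X, dy)
    # Restricted under H0: dY_t = b1 + error.
    mean_dy = sum(dy) / n
    rss_r = sum((v - mean_dy) ** 2 for v in dy)
    return ((rss_r - rss_u) / 2.0) / (rss_u / (n - 3))

# Illustrative data: a deterministic trend plus white noise, so H0 is false.
rng = random.Random(1)
series = [1.0 + 0.5 * t + rng.gauss(0.0, 1.0) for t in range(101)]
F = f_stat_unit_root_no_trend(series)
print(round(F, 1))
```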



If the null hypothesis is false, and $Y_t$ is therefore a stationary autoregressive process about a deterministic trend, the OLS estimators of the parameters are √T consistent, and the conventional test statistics are asymptotically valid.



Two special cases should be mentioned, if only as econometric curiosities.



In general, if a plot of the process exhibits a trend, we will not know whether it is caused by a deterministic trend or a random walk with drift, and we have to allow for both by fitting the general case, as in Case (d), with no restriction on the parameters.



But if, for some reason, we know that the process is a deterministic trend or, alternatively, we know that it is a random walk with drift, and if we fit the model appropriately, there is a spectacular improvement in the properties of the OLS estimator of the slope coefficient.

Special case where the process is known to be a deterministic trend


In the special case where $\beta_2 = 0$ and the process is just a simple deterministic trend, we encounter a surprising result.

True model: $Y_t = \beta_1 + \delta t + \varepsilon_t$

Fitted models: $\hat{Y}_t = b_1 + dt$ (left chart) and $\hat{Y}_t = b_1 + b_2 Y_{t-1} + dt$ (right chart)

[Figure: distribution of the OLS estimator of $\delta$ under each fitted model, for T = 25, T = 50, and T = 100; horizontal axis from 0 to 0.4.]

If it is known that there is no autoregressive component, and the regression model is correctly specified with t as the only explanatory variable, the OLS estimator of $\delta$ is hyperconsistent, its variance being inversely proportional to $T^3$.


This is illustrated for the case $\delta = 0.2$ in the left chart in the figure. Since the standard deviation of the distribution is inversely proportional to $T^{3/2}$, the height is proportional to $T^{3/2}$, and so it more than doubles when the sample size is doubled.
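The $T^{3/2}$ rate can be checked directly from the usual OLS formula for the variance of a slope coefficient, $\sigma^2 / \sum (t - \bar{t})^2$, since $\sum_{t=1}^{T} (t - \bar{t})^2 = T(T^2 - 1)/12$. The sketch below assumes a correctly specified regression of $Y_t$ on $t = 1, \dots, T$ with a homoscedastic, non-autocorrelated disturbance term; the function name and $\sigma = 1$ are illustrative choices.

```python
def sd_of_trend_slope(T, sigma=1.0):
    """Standard deviation of the OLS slope d when Y_t is regressed on t = 1..T."""
    t_bar = (T + 1) / 2.0
    sxx = sum((t - t_bar) ** 2 for t in range(1, T + 1))  # equals T(T^2 - 1)/12
    return sigma / sxx ** 0.5

sds = {T: sd_of_trend_slope(T) for T in (25, 50, 100)}
ratio = sds[50] / sds[100]  # close to 2 ** 1.5, about 2.83
print(sds, ratio)
```

Doubling the sample size therefore shrinks the standard deviation by a factor of roughly 2.83, so the peak of the distribution rises by about the same factor.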


If $Y_{t-1}$ is mistakenly included in the regression model, the loss of efficiency is dramatic. The estimator of $\delta$ reverts to being only √T consistent. Further, it is subject to finite-sample bias. This is illustrated in the right chart in the figure.


In this special case, if the regression model is correctly specified, and the disturbance term is normally distributed, OLS t and F tests are valid for finite samples, despite the hyperconsistency of the estimator of $\delta$.


If the disturbance term is not normal, but has constant variance and finite fourth moment, the t and F tests are asymptotically valid.


Special case where the process is a random walk with drift


Similarly, in the special case where the process is a random walk with drift, so that $\beta_2 = 1$ and $\delta = 0$, and the model is correctly specified with $Y_{t-1}$ as the only explanatory variable, the OLS estimator of $\beta_2$ is hyperconsistent.

True model: $Y_t = \beta_1 + Y_{t-1} + \varepsilon_t$

Fitted model: $\hat{Y}_t = b_1 + b_2 Y_{t-1}$

[Figure: distribution of $b_2$ for T = 25, T = 50, T = 100, and T = 200; horizontal axis from 0.6 to 1.1. Red lines: time trend added.]

If a time trend is added to the specification by mistake, there is a loss of efficiency, but it is not as dramatic as in the other special case. The estimator is still superconsistent (variance inversely proportional to $T^2$). The distributions for the various sample sizes for this case are shown as the red lines in the figure.


The conventional t and F tests are asymptotically valid, but not valid for finite samples because the process is autoregressive.


Augmented Dickey–Fuller tests

Second-order autoregressive process


$Y_t = \beta_1 + \beta_2 Y_{t-1} + \beta_3 Y_{t-2} + \varepsilon_t$

We need to generalize the discussion to higher order processes. We will start with the second-order process shown.

Main condition for stationarity: $|\beta_2 + \beta_3| < 1$

To be stationary, the parameters now need to satisfy several conditions. The most important in practice is $|\beta_2 + \beta_3| < 1$. To test this, it is convenient to reparameterize the model.


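Subtract $Y_{t-1}$ from both sides, add and subtract $\beta_3 Y_{t-1}$ on the right side, and group terms together.

$$
\begin{aligned}
Y_t - Y_{t-1} &= \beta_1 + \beta_2 Y_{t-1} - Y_{t-1} + \beta_3 Y_{t-1} - \beta_3 Y_{t-1} + \beta_3 Y_{t-2} + \varepsilon_t \\
&= \beta_1 + (\beta_2 + \beta_3 - 1) Y_{t-1} - \beta_3 (Y_{t-1} - Y_{t-2}) + \varepsilon_t
\end{aligned}
$$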


$$
\Delta Y_t = \beta_1 + (\beta_2^* - 1) Y_{t-1} + \beta_3^* \Delta Y_{t-1} + \varepsilon_t
$$

Thus we obtain a model where $\Delta Y_t = Y_t - Y_{t-1}$ is related to $Y_{t-1}$ and $\Delta Y_{t-1}$, with $\beta_2^* = \beta_2 + \beta_3$ and $\beta_3^* = -\beta_3$.
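The reparameterization is an algebraic identity, which can be confirmed numerically. The sketch below simulates a second-order process $Y_t = \beta_1 + \beta_2 Y_{t-1} + \beta_3 Y_{t-2} + \varepsilon_t$ and checks that $Y_t - Y_{t-1}$ equals $\beta_1 + (\beta_2 + \beta_3 - 1) Y_{t-1} - \beta_3 (Y_{t-1} - Y_{t-2}) + \varepsilon_t$ at every date. The parameter values, starting values, and function names are arbitrary choices for illustration.

```python
import random

def simulate_ar2(beta1, beta2, beta3, n, seed=0):
    """Simulate Y_t = beta1 + beta2*Y_{t-1} + beta3*Y_{t-2} + eps_t from zero starting values."""
    rng = random.Random(seed)
    y, eps = [0.0, 0.0], [0.0, 0.0]
    for _ in range(n):
        e = rng.gauss(0.0, 1.0)
        y.append(beta1 + beta2 * y[-1] + beta3 * y[-2] + e)
        eps.append(e)
    return y, eps

def max_reparam_gap(beta1, beta2, beta3, y, eps):
    """Largest absolute difference between the two sides of the reparameterized model."""
    b2_star = beta2 + beta3   # coefficient sum that determines stationarity
    b3_star = -beta3          # coefficient of the lagged difference
    gap = 0.0
    for t in range(2, len(y)):
        lhs = y[t] - y[t - 1]
        rhs = beta1 + (b2_star - 1.0) * y[t - 1] + b3_star * (y[t - 1] - y[t - 2]) + eps[t]
        gap = max(gap, abs(lhs - rhs))
    return gap

y, eps = simulate_ar2(1.0, 0.5, 0.3, 200)
gap = max_reparam_gap(1.0, 0.5, 0.3, y, eps)
print(gap)  # zero up to floating-point rounding
```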



Under the null hypothesis $H_0$: $\beta_2^* = 1$, the process is nonstationary. Given the reparameterization, $H_0$ may be tested by testing whether the coefficient of $Y_{t-1}$ is significantly different from zero.



One may usually perform a one-sided test with alternative hypothesis $H_1$: $\beta_2^* < 1$, since $\beta_2^* > 1$ implies an explosive process.



Under the null hypothesis, the estimator of $\beta_2^*$ is superconsistent and the test statistics $T(b_2^* - 1)$, t, and F have the same distributions, and therefore critical values, as before.



If a deterministic time trend is suspected, it may be included and the critical values are those for the first-order specification with a time trend.


General autoregressive process


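$$
Y_t = \beta_1 + \beta_2 Y_{t-1} + \dots + \beta_{p+1} Y_{t-p} + \varepsilon_t
$$

Generalizing to the case where $Y_t$ depends on $Y_{t-1}$, ..., $Y_{t-p}$, a condition for stationarity is that $|\beta_2 + \dots + \beta_{p+1}| < 1$ and it is convenient to reparameterize the model as shown, where $\beta_2^* = \beta_2 + \dots + \beta_{p+1}$ and the other $\beta^*$ coefficients are appropriate linear combinations of the original coefficients.

$$
\Delta Y_t = \beta_1 + (\beta_2^* - 1) Y_{t-1} + \beta_3^* \Delta Y_{t-1} + \dots + \beta_{p+1}^* \Delta Y_{t-p+1} + \varepsilon_t
$$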


Under the null hypothesis of non-explosive nonstationarity, the test statistics $T(b_2^* - 1)$, t, and F asymptotically have the same distributions and critical values as before. In practice, the t test is particularly popular and is generally known as the augmented Dickey–Fuller (ADF) test.


There remains the issue of the determination of p. Two main approaches have been proposed and both start by assuming that one can hypothesize some maximum value $p_{\max}$.


In the F test approach, the reparameterized model is fitted with $p = p_{\max}$ and a t test is performed on the coefficient of the longest lagged difference, $\Delta Y_{t-p_{\max}+1}$. If this is not significant, this term may be dropped.


Next, an F test is performed on the joint explanatory power of the two longest lagged differences, $\Delta Y_{t-p_{\max}+1}$ and $\Delta Y_{t-p_{\max}+2}$. If this is not significant, both terms may be dropped.


The process continues, including further lagged differences in the F test until the null hypothesis of no joint explanatory power is rejected.


The last lagged difference included in the test becomes the term with the maximum lag. Higher order lags may be dropped because the previous F test was not significant.


Provided that the disturbance term is iid, the normalized coefficient of $Y_{t-1}$ and its t statistic will have the same (non-standard) distributions as for the Dickey–Fuller test.


The other method is to use an information criterion such as the Bayes Information Criterion (BIC), also known as the Schwarz Information Criterion (SIC). This requires the computation of the BIC statistic shown and choosing p so as to minimize the expression.

$$
\mathrm{BIC} = \log\frac{RSS}{T} + \frac{k}{T}\log T = \log\frac{RSS}{T} + \frac{p+2}{T}\log T
$$

The first term falls as p increases, but the second term increases, and the trade-off is such that asymptotically the criterion will select the true value of p.


A common alternative is the Akaike Information Criterion (AIC) shown. This imposes a smaller penalty on overparameterization and will therefore tend to select a larger value of p, but simulation studies suggest that it may produce better results in practice.

$$
\mathrm{AIC} = \log\frac{RSS}{T} + \frac{2k}{T}
$$
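The two criteria can be compared in a short sketch, using $\mathrm{BIC} = \log(RSS/T) + (k/T)\log T$ and $\mathrm{AIC} = \log(RSS/T) + 2k/T$, where k is the number of parameters in the fitted regression. The RSS values below are invented purely for illustration, and the rule $k = p + 2$ (intercept, $Y_{t-1}$, $p - 1$ lagged differences, and a time trend) is an assumption of this example. The function names are hypothetical.

```python
import math

def bic(rss, T, k):
    """Bayes (Schwarz) information criterion: log(RSS/T) + (k/T) log T."""
    return math.log(rss / T) + k * math.log(T) / T

def aic(rss, T, k):
    """Akaike information criterion: log(RSS/T) + 2k/T."""
    return math.log(rss / T) + 2.0 * k / T

def select_p(rss_by_p, T, criterion, extra_params=2):
    """Return the lag order p that minimizes the criterion, taking k = p + extra_params."""
    return min(rss_by_p, key=lambda p: criterion(rss_by_p[p], T, p + extra_params))

# Hypothetical residual sums of squares for p = 1..4 with T = 100 observations.
rss_by_p = {1: 120.0, 2: 100.0, 3: 97.0, 4: 96.5}
p_bic = select_p(rss_by_p, 100, bic)
p_aic = select_p(rss_by_p, 100, aic)
print(p_bic, p_aic)
```

With these made-up numbers the BIC chooses p = 2 and the AIC chooses p = 3, in line with the smaller penalty that the AIC places on additional lags.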

Whether one uses the F test approach or information criteria, it is necessary to check that the residuals are not subject to autocorrelation, for example, using a Breusch–Godfrey Lagrange multiplier test.


Autocorrelation would provide evidence that there remain dynamics in the model not accounted for by the specification and that the model does not include enough lags.


The 1979 and 1981 Dickey–Fuller papers were truly seminal in that they have given rise to a very extensive research literature devoted to the improvement of testing for nonstationarity and of the representation of nonstationary processes.


The low power of the Dickey–Fuller tests was acknowledged in the original papers and much effort has been directed to the problem of distinguishing between nonstationary processes and highly autoregressive stationary processes.


Remarkably, the original Dickey–Fuller tests, particularly the t test in augmented form, are still widely used, perhaps even dominant.


Other tests with superior asymptotic properties have been proposed, but some underperform in finite samples, as far as this can be established by simulation.


The augmented Dickey–Fuller t test has retained its popularity on account of robustness and, perhaps, theoretical simplicity.


However, a refinement, the ADF–GLS (generalized least squares) test due to Elliott, Rothenberg, and Stock (1996), appears to be gaining in popularity and is implemented in major regression applications.


Simulations indicate that its power to discriminate between a nonstationary process and a stationary autoregressive process is uniformly closer to the theoretical limit than the standard tests, irrespective of the degree of autocorrelation.


Copyright Christopher Dougherty 2011.

These slideshows may be downloaded by anyone, anywhere for personal use. Subject to respect for copyright and, where appropriate, attribution, they may be used as a resource for teaching an econometrics course. There is no need to refer to the author.

The content of this slideshow comes from Section 13.4 of C. Dougherty, Introduction to Econometrics, fourth edition 2011, Oxford University Press. Additional (free) resources for both students and instructors may be downloaded from the OUP Online Resource Centre http://www.oup.com/uk/orc/bin/9780199567089/.

Individuals studying econometrics on their own and who feel that they might benefit from participation in a formal course should consider the London School of Economics summer school course EC212 Introduction to Econometrics http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx or the University of London International Programmes distance learning course 20 Elements of Econometrics www.londoninternational.ac.uk/lse.

11.07.25
