Stata Good

8/3/2019 Stata Good


Unit root tests and Box-Jenkins

Anton Parlow

Lab session Econ710, UWM Econ Department

03/05/2010


Our plan

Introduction to time series

AR and MA-process

Box-Jenkins Method

Unit root tests

Short review of Stata

Finding the proper model

Unit root tests

Arima

Forecasting




Introduction

A time series is the outcome of a variable observed over time, e.g. annually, quarterly, monthly and so on. There are different ways to describe a series: does it have a trend, a drift, or is it a random walk?

Example: Quarterly real GDP from 1947 to 2008

We want to explain GDP today with past values of GDP but have to find the proper model first.



AR and MA-process

If GDP ($y_t$) depends only on its own (= auto) past values (= regressive), we have an autoregressive process:

\[ y_t = \alpha + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \phi_3 y_{t-3} + \dots + \phi_p y_{t-p} + \varepsilon_t \]

In general we call it an AR(p)-model, and if GDP depends only on one past realization (= lag), it is an AR(1)-process:

\[ y_t = \alpha + \phi_1 y_{t-1} + \varepsilon_t \]


AR and MA-process continued

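If a variable depends only on current and past realizations of its own error term, we have a moving average process:

\[ y_t = \alpha + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \theta_3 \varepsilon_{t-3} + \dots + \theta_q \varepsilon_{t-q} \]

In general we call it an MA(q)-model, and if it depends only on one past error term, it is an MA(1)-process:

\[ y_t = \alpha + \varepsilon_t + \theta_1 \varepsilon_{t-1} \]

The error term is sometimes called a white-noise process, or said to be well-behaved: $E[\varepsilon_t] = 0$, $Var(\varepsilon_t) = \sigma^2$, and the errors are iid (= independently and identically distributed).

A bit hard to find examples for this, so let us focus on AR-processes today!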




In general these two models are combined in an ARMA(p,q)-model, where p = order of the AR-process and q = order of the MA-process.

Examples:

ARMA(1,0) = AR(1)-process: $y_t = \alpha + \phi_1 y_{t-1} + \varepsilon_t$

ARMA(0,1) = MA(1)-process: $y_t = \alpha + \varepsilon_t + \theta_1 \varepsilon_{t-1}$

ARMA(1,1) = AR(1) and MA(1) in one model:

$y_t = \alpha + \phi_1 y_{t-1} + \theta_1 \varepsilon_{t-1} + \varepsilon_t$

If you see an ARIMA(p,I,q)-model, the I stands for integrated, i.e. how often the series has to be differenced before it is stationary (see unit root tests). If I=0, or I(0), the time series is already stationary. If I=1, or I(1), it is stationary after first differencing, and so on.




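Sometimes it is convenient to write these models in lag-operator notation, with $L$ for one lag, $L^2$ for two lags and so on.

Example: $y_t = \alpha + \phi_1 y_{t-1} + \varepsilon_t$ becomes $y_t = \alpha + \phi_1 L y_t + \varepsilon_t$

so that $L y_t = y_{t-1}$, $L^2 y_t = y_{t-2}$, $L^3 y_t = y_{t-3}$ and so on

Example ARMA(1,1) in L-notation (no constant; the MA coefficient enters with a minus sign here, which only flips the sign of $\theta_1$ relative to a plus-sign convention):

\[ y_t = \frac{1 - \theta_1 L}{1 - \phi_1 L} \varepsilon_t \]

\[ y_t [1 - \phi_1 L] = [1 - \theta_1 L] \varepsilon_t \]

open the brackets: $y_t - \phi_1 L y_t = \varepsilon_t - \theta_1 L \varepsilon_t$

$y_t = \phi_1 L y_t + \varepsilon_t - \theta_1 L \varepsilon_t$, finally: $y_t = \phi_1 y_{t-1} + \varepsilon_t - \theta_1 \varepsilon_{t-1}$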




How to figure out the process describing a time series? Use the autocorrelation function ACF (= correlation between the series and its own past realizations) and the partial autocorrelation function PACF. See Hamilton, chapter 3, for a very good step-by-step derivation of these.

Take a look at these and decide. Time-series modeling is often referred to as an art (actually, empirical work in general), meaning two economists can tell you different things when they look at the same functions.

Remember the ACF and PACF are pretty much opposite to each other when we talk about AR- and MA-processes. An AR(p)-process has an (exponentially) declining ACF and spikes in the PACF up to lag p. An MA(q)-process has spikes in the ACF up to lag q and an (exponentially) declining PACF. CONFUSED??? See some examples next.




Example AR(1):

Example AR(2):




Example MA(1):

Example MA(2):
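The example patterns can also be reproduced with simulated data. The following is only a minimal sketch: the sample size, the seed, the coefficient 0.7 and all variable names (t, e, y_ar, y_ma) are assumptions made for illustration, and clear wipes whatever data set is in memory.

```stata
* Simulated AR(1) and MA(1) series to look at their ACF and PACF.
* Sample size, seed, coefficient and variable names are illustrative assumptions.
clear
set obs 200
set seed 12345
gen t = _n
tsset t

gen e = rnormal()                              // white-noise errors

gen y_ar = e in 1                              // start value of the AR(1) series
replace y_ar = 0.7*y_ar[_n-1] + e in 2/l       // y_t = 0.7*y_{t-1} + e_t, filled in recursively

gen y_ma = e + 0.7*cond(_n==1, 0, e[_n-1])     // y_t = e_t + 0.7*e_{t-1}, pre-sample error set to 0

ac  y_ar, lags(20)
pac y_ar, lags(20)
ac  y_ma, lags(20)
pac y_ma, lags(20)
```

For the AR(1) series the ACF should die out slowly while the PACF shows one clear spike; for the MA(1) series it should be the other way around, up to sampling noise.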




Much more fun if you have AR- and MA-terms in your model, e.g. ARMA(1,1):

Another way to find the underlying process is to use information criteria like AIC and BIC (the latter also called SIC), which are part of the standard output in EViews but not in Stata (calculating them by hand is a lot of fun). E.g. start with AR(0), then AR(1), AR(2), ... and calculate the information criteria for each. A trick may be to use estat ic.
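A possible way to run through several AR orders and print the information criteria for each is sketched here. It assumes lgdp exists and the data are tsset; using D.lgdp as the dependent variable and stopping at order 4 are illustrative choices, not part of the original lab.

```stata
* Compare AR(1) to AR(4) by information criteria.
* Assumes lgdp exists and the data are tsset; D.lgdp and the
* maximum order 4 are illustrative choices.
forvalues p = 1/4 {
    display "AR(`p')"
    quietly arima D.lgdp, ar(1/`p')   // AR terms 1 through p
    estat ic                          // reports AIC and BIC
}
```

The model with the smallest AIC or BIC would be the preferred one under that criterion; the two criteria need not agree.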



Unit root tests

If a time series is stationary, regression results are not spurious. This means most of the time we want the series to be stationary (not needed if you do error-correction models).

Problem is, most macroeconomic time series like GDP, unemployment, trade and many more are non-stationary (= contain a unit root): they do not return to their mean, and the variance is not constant (it actually increases over time). More formally, a series is stationary when the errors satisfy:

1. $E(\varepsilon_t) = 0$

2. $var(\varepsilon_t) = \sigma^2$, i.e. the variance is constant

3. $E(\varepsilon_t \varepsilon_{t-1}) = 0$, i.e. the error terms are not (serially) correlated

In other words: the errors are well-behaved, or white noise. A non-stationary time series has the opposite properties!



Unit root tests continued

Or, if we use $y_t$ instead, a time series is stationary when:

1. $E(y_t) = \mu$: the mean is constant and does not depend on time

2. $E[(y_t - \mu)(y_{t-j} - \mu)] = \gamma_j$: the autocovariance is independent of time too!

This means we have to test for non-stationarity, which is done using unit root tests, the most common being the Dickey-Fuller test.

To make a non-stationary time series stationary, we can do the following:

1. take the first differences

2. or detrend the time series (we don't do this today)




The Dickey-Fuller test (or augmented Dickey-Fuller test, ADF, if lags of the differenced series are included) uses the following test regressions:

1. $\Delta y_t = \delta y_{t-1} + \varepsilon_t$, note: $\Delta y_t = y_t - y_{t-1}$ and $\delta = \rho - 1$, where $\rho$ is the coefficient on $y_{t-1}$ in the levels equation

if the time series is flat (no trend) and potentially slow-turning around zero

2. $\Delta y_t = \alpha + \delta y_{t-1} + \varepsilon_t$

if the series is flat and potentially slow-turning around a non-zero value (or has a drift, intercept = $\alpha$)

3. $\Delta y_t = \alpha + \delta y_{t-1} + \beta T + \varepsilon_t$

if the series has a trend T (up or down) and a drift (intercept), or is slow-turning around a trend line you would draw through the data

The DF-test has its own critical values, and we want to reject $H_0: \delta = 0$ (unit root) in favor of stationarity. In other words, if we cannot reject $H_0$, the series is non-stationary and has to be first differenced.




How do we choose the lag-length p for the DF-test? Schwert (1989) suggests the following rule of thumb:

\[ p_{max} = \left[ 12 \left( \frac{T}{100} \right)^{1/4} \right] \]

where T = number of periods, e.g. years or quarters, and [.] denotes the integer part
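The rule of thumb can be evaluated directly in Stata. In this small sketch T = 248 is an assumption (quarterly data running from 1947q1 to 2008q4); with a data set in memory, _N can stand in for T.

```stata
* Schwert (1989) rule of thumb for the maximum lag length.
* T = 248 is an assumption: quarterly data from 1947q1 to 2008q4.
display floor(12*(248/100)^(1/4))

* With a data set in memory, _N (the number of observations) can stand in for T.
display floor(12*(_N/100)^(1/4))
```

For T = 248 the expression inside floor() is about 15.06, so the first line displays 15.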

Why should we care? If p is (1) too small, some serial correlation can remain in the errors and bias the test; (2) too large, the power of the test will suffer.

Another test for unit roots is suggested by Phillips and Perron (= PP), which corrects for serial correlation and heteroskedasticity in the errors.

Both the ADF- and PP-tests are not very helpful if the series is close to being stationary. Kwiatkowski, Phillips, Schmidt and Shin (1992) suggest a test for stationarity, the so-called KPSS-test, with $H_0$: the series is stationary.




There are more tests out there, but in general it is not enough to use the Dickey-Fuller test only.

Usually you use some more to be confident about your time series.



Short Stata review

Remember a command in Stata has the following structure:

[command] variable, options

We used gen for generating new variables e.g. gen lgdp=log(gdp) to generate the log of GDP

Remember: if you want to have the residuals after a regression, use predict



Finding the proper model - Step 1

We will work with quarterly GDP data first

1. set mem 50m (only needed in Stata 11 or older; newer versions manage memory automatically)

2. load gdp.dta

3. Stata needs to know it is a time series.

3.1. generate a time-variable: gen time=tq(1947q1)+_n-1

3.2. give it the right format: format time %tq

3.3. tell Stata about it: tsset time

4. graph the series: tsline gdp

5. generate: gen lgdp=log(gdp) and graph it again: tsline lgdp
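Put together as a do-file, Step 1 might look as follows. This is a sketch under assumptions: gdp.dta sits in the working directory, contains one variable called gdp, and its first observation is 1947q1.

```stata
* Sketch of Step 1 as a do-file.
* Assumes gdp.dta is in the working directory, holds a variable gdp,
* and that the first observation is 1947q1.
use gdp.dta, clear

gen time = tq(1947q1) + _n - 1   // quarterly date index, one step per observation
format time %tq                  // display as 1947q1, 1947q2, ...
tsset time                       // declare the data to be a time series

tsline gdp                       // plot the level

gen lgdp = log(gdp)              // natural log of GDP
tsline lgdp                      // plot the log
```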



Finding the proper model - Step 1

Let us play around with the ACF (= ac) and PACF (= pac). lgdp is the variable, and the option sets the lag-length.

1. ac lgdp, lags(10)

2. pac lgdp, lags(10)

or

3. corrgram lgdp, lags(10)

What do we see? Do it again for 20 lags.

Let us do the same for the first-difference version of lgdp. There are two ways:

1. generate a new variable: gen flgdp=D.lgdp

or

2. ac D.lgdp



Finding the proper model continued - Step 2

Assume an AR(1)-model is okay for log of real GDP. We should run following regression:

reg lgdp L.lgdp

note:

Stata uses L. for a lag, L2. for two lags, L3. for three lags

Stata uses D. for taking the first difference

Stata uses F. if you have to forward your series, sometimes called a lead

pretty convenient, because you can use these for generating new variables too.
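A few illustrative uses of the operators, assuming lgdp exists and the data are tsset; the new variable names are made up for this sketch.

```stata
* Time-series operators when generating variables.
* Assumes lgdp exists and the data are tsset; the new names are illustrative.
gen lgdp_lag1 = L.lgdp      // one lag
gen lgdp_lag2 = L2.lgdp     // two lags
gen lgdp_lead = F.lgdp      // one lead
gen dlgdp     = D.lgdp      // first difference

* The operators also work inside a command; L(1/2). expands to L1. and L2.
reg lgdp L(1/2).lgdp
```

The last line would be an AR(2) regression by OLS, without creating the lagged variables first.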


Finding the proper model continued - Step 3



If the AR(1) model is the proper one, the errors should be white noise. There are a couple of ways to test for it:

1. graph the errors

2. do a Breusch-Godfrey test for serial correlation

3. do a Q-test, called white-noise test (or portmanteau test)

Note: The Box-Pierce test is not very common anymore, due to its poor performance in small samples.





1. Graphing the errors

To get the residuals after the regression: predict res, resid

Stata will save the residuals in res

There are two ways to graph them:

1.1. tsline res

plots them against time; there should be no pattern over time

1.2. plot the residuals against past residuals

and there should be no pattern again!

reg res L.res, beta

twoway (scatter res L.res)





2. Breusch-Godfrey-test

again after the regression do the following (no need for predicting errors):

estat bgodfrey, lags(10)

H0 = no serial correlation, if we reject it, then the errors are correlated and not white-noise!

3. White-noise test

run the regression

predict the residuals (predict res, resid) and do the following

wntestq res, lags(10)

H0 = no serial correlation; if we reject it, then the errors are correlated and not white noise!
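The three checks can be strung together after the AR(1) regression. This sketch assumes lgdp exists and the data are tsset; res is just a variable name, and capture drop removes an older copy of it if one exists.

```stata
* Sketch of the Step 3 checks after the AR(1) regression.
* Assumes lgdp exists and the data are tsset; res is an illustrative name.
reg lgdp L.lgdp

* Breusch-Godfrey test, H0: no serial correlation up to lag 10
estat bgodfrey, lags(10)

* Save the residuals, look at them, then run the portmanteau (Q) test
capture drop res
predict res, resid
tsline res
wntestq res, lags(10)
```

Small p-values in either test would speak against white-noise errors, i.e. against the AR(1) specification.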



Unit root tests

Are pretty straightforward in Stata:

load the quarterly data for defense spending, ds.dta, and generate the log of defense spending (ds): gen lds=log(ds)

1. (Augmented) Dickey-Fuller tests

Case 1: no constant, no trend term

dfuller lds, noconstant

Case 2: constant, no trend

dfuller lds

Case 3: constant, trend

dfuller lds, trend

options:

4. include lags for the ADF-test: dfuller lds, lags(10) includes 10 lags

5. if you need the regression output: dfuller lds, regress
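To judge the order of integration, one option is to run the test on the level and then on the first difference. A short sketch, assuming ds.dta is loaded, the data are tsset and lds has been generated; lags(4) is an arbitrary illustrative choice.

```stata
* ADF test on levels and on first differences.
* Assumes the data are tsset and lds = log(ds) exists.
* lags(4) is an arbitrary illustrative choice, not a recommendation.
dfuller lds, trend lags(4)      // levels, with constant and trend
dfuller D.lds, lags(4)          // first differences, constant only
```

If the unit root cannot be rejected for the level but is rejected for the first difference, that would point to an I(1) series.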





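2. Phillips-Perron test

If we don't specify a lag-length, the PP-test chooses one by a rule of thumb (Stata's pperron defaults to int(4(T/100)^(2/9)) Newey-West lags; use lags(#) to set it yourself, e.g. to Schwert's value).

Options are similar to dfuller

pperron lds

Remember: H0 = non-stationary

3. KPSS-test

kpss lds

type help kpss into Stata; the options are a bit different (kpss is a user-written command; if it is missing, install it with ssc install kpss)

Remember: H0 = stationary

If we reject the null, the series is non-stationary. Stata gives you the test values for different lag-lengths.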



ARIMA in Stata

We focused on AR-processes using OLS so far, but more powerful is the following command:

arima

ARIMA-estimation is a maximum likelihood estimation. Remember the notation is in general ARIMA(p,I,q), where I = order of integration, e.g. I=0: the series is already stationary, I=1: you have to take the first differences first.

Examples:

arima ds, ar(1)
AR(1) for defense spending (ds)

arima ds, arima(1,0,0)
still AR(1), for a series that is stationary without first-differencing

arima D.ds, ar(1) = arima ds, arima(1,1,0)
first-difference version of AR(1) on ds

arima ds, ma(1) = arima ds, arima(0,0,1)
would be an MA(1)-process for ds

arima ds, ar(1) ma(1) = arima ds, arima(1,0,1)
would have an AR(1) and an MA(1) component

To get the AIC and BIC for the models, use the following command after the estimation:

estat ic



ARIMA in Stata continued

Residual test

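To test the residuals for autocorrelation, proceed as before (but estat bgodfrey will not work after arima),

e.g. predict the residuals and graph them, then do a white-noise test (wntestq res),

or, if you like, compute a Durbin-Watson statistic, which should be around 2. Note that dwstat runs after regress and takes no variable name, so one way is to regress the residuals on a constant first (reg res) and then type estat dwatson.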



Forecasting

There are different types of forecasting after a regression. We can do an in-sample forecast (using the quarters given) or an out-of-sample forecast (adding quarters).

I will do it for the arima command (OLS is a bit different).

Remember: To check the quality of your forecast, you need to calculate the root mean square error (RMSE). The RMSE uses the forecast error (actual observation minus the forecast), and the formula is the following:

\[ RMSE = \sqrt{\frac{\sum_t (Y_t - forecast_t)^2}{N}} \]

Example AR(1)-model (fgdp is the first difference of gdp, gen fgdp=D.gdp):

arima fgdp, ar(1)

Do a one-step ahead forecast:

predict fgdp1, y

Compare the actual value with the forecast

tsline fgdp fgdp1


Forecasting continued



Calculate the RMSE:

1. Generate the forecast error:

gen ferr=fgdp-fgdp1

2. Generate the square of the forecast error:

gen ferr2=ferr^2

3. Get the mean of the squared errors

sum ferr2

(here the mean is 0.0040)

4. Use it to compute the RMSE.

display "rmse: " (0.0040)^.5

Note there are more ways to measure forecast accuracy.
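Instead of copying the mean by hand, the stored result r(mean) can be used. A sketch, assuming fgdp (actual) and fgdp1 (one-step-ahead forecast) exist; skip the two gen lines if ferr and ferr2 were already created.

```stata
* RMSE without copying numbers by hand.
* Assumes fgdp (actual) and fgdp1 (one-step-ahead forecast) exist.
gen ferr  = fgdp - fgdp1     // forecast error
gen ferr2 = ferr^2           // squared forecast error

* summarize stores the mean in r(mean)
quietly summarize ferr2
display "rmse: " sqrt(r(mean))
```

With a mean squared error of 0.0040 this gives an RMSE of roughly 0.063.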





A dynamic forecast could be done as follows:

predict fgdpd, xb dynamic(.)

Plot the actual value and the forecast

tsline fgdp fgdpd

Out of sample forecast

Do the regression but then you have to extend the time-horizon first:

tsappend, add(24)

adds 24 quarters to the quarterly data-set we have.

Then use the predict command for one-step ahead or dynamic forecasts.
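The out-of-sample steps might be combined as follows. This is a sketch under assumptions: the data are tsset quarterly, fgdp exists, and the sample ends in 2008q4, so the 24 added quarters run from 2009q1 to 2014q4; fgdp_oos is an illustrative name.

```stata
* Out-of-sample forecast after arima.
* Assumes quarterly tsset data, fgdp exists, and the sample ends in 2008q4.
arima fgdp, ar(1)

tsappend, add(24)       // adds 2009q1 to 2014q4

* One-step-ahead inside the sample, dynamic from 2009q1 onwards
predict fgdp_oos, y dynamic(tq(2009q1))

tsline fgdp fgdp_oos
```

From 2009q1 on, the forecast feeds on its own past predictions, since no actual values are available there.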




A simple linear OLS-forecast (don't ask me about the dynamic one; the same command as above does not work. There should be a way to compute it manually in Stata):

reg fgdp L.fgdp

predict fgdp1

(Stata assumes the option xb anyway in this case)

tsline fgdp fgdp1

What else could be done???

There is much more out there, e.g. rolling forecasts, or comparing forecasts of different models, e.g. AR(1) with AR(2), and so on.


How to create the first difference of a series


The simplest way in Stata is:

Let gdp be in levels and we want to create the first difference:

gen fgdp=D.gdp

(same as: $y_t - y_{t-1}$)

or D2 would be $(y_t - y_{t-1}) - (y_{t-1} - y_{t-2})$

As you have seen above, in a regression you can use D, F and L in front of a variable without generating a new variable first!


Setting the time


In our examples we had quarterly data, what if you have annual, monthly, weekly or daily data?

annual data

gen time=1947+_n-1

tsset time

monthly data

gen time=tm(1962m2)+_n-1

format time %tm

tsset time

weekly data

gen time=tw(1962w1)+_n-1

format time %tw

tsset time

daily data

gen time=td(1apr1962)+_n-1

format time %td

tsset time

Note: _n is the observation number, so the first observation gets the start date (start + 1 - 1) and each following observation is one period later.


How to detrend a series? And how to choose the time horizon?

1. Detrending

Sometimes you want to detrend a series, e.g. when there is a trend present, or because, compared to taking the first difference, you save one observation. Imagine you only have 20 years of annual observations.

Steps:

create a trend variable, e.g. a variable increasing with time

gen trend = _n+1

regress your variable of interest on a constant and a trend

reg lgdp trend

use the residuals for the fun stuff you want to do!
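The detrending steps could look like this in one go. The sketch assumes lgdp exists and the data are tsset; trend and lgdp_dt are illustrative names, and starting the trend at 1 rather than 2 only shifts the intercept.

```stata
* Detrending by regression on a linear time trend.
* Assumes lgdp exists and the data are tsset; trend and lgdp_dt are illustrative names.
gen trend = _n             // 1, 2, 3, ... (drop this line if trend already exists)
reg lgdp trend             // the constant is included by default

predict lgdp_dt, resid     // detrended series = residuals
tsline lgdp_dt
```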

2. Choosing the time horizon

There are a couple of ways, e.g. using an if-condition to keep only observations from 1980 onwards, but one neat function is tin (= time in):

reg D.lgdp D2.lgdp if tin(1947q1,1965q4)

so that the observations run from January 1947 (first quarter) to December 1965 (fourth quarter)

