Stata Good


    Unit root tests and Box-Jenkins

    Anton Parlow

    Lab session Econ710, UWM Econ Department

    03/05/2010


    Our plan

    Introduction to time series

    AR and MA-process

    Box-Jenkins Method

    Unit root tests

    Short review of Stata

    Finding the proper model

    Unit root tests

    ARIMA

    Forecasting


    Introduction

    A time series is the outcome of a variable observed over time, e.g. annually, quarterly, monthly and so on. There are different ways to describe a series, e.g. does it have a trend or a drift, or is it a random walk?

    Example: Quarterly real GDP from 1947 to 2008

    We want to explain GDP today with past values of GDP but have to find the proper model first.


    AR and MA-process

    If GDP ($y_t$) depends only on its own (= auto) past values (= regressive), we have an autoregressive process:

    $$y_t = \alpha + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \phi_3 y_{t-3} + \dots + \phi_p y_{t-p} + \varepsilon_t$$

    In general we call it an AR(p)-model, and if GDP depends only on one past realization (= lag), it is an AR(1)-process:

    $$y_t = \alpha + \phi_1 y_{t-1} + \varepsilon_t$$


    AR and MA-process continued

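    If a variable depends only on past realizations of its own error terms, we have a moving average process:

    $$y_t = \alpha + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \theta_3 \varepsilon_{t-3} + \dots + \theta_q \varepsilon_{t-q}$$

    In general we call it an MA(q)-model, and if it depends only on one past error term, it is an MA(1)-process:

    $$y_t = \alpha + \varepsilon_t + \theta_1 \varepsilon_{t-1}$$

    The error term is sometimes called a white noise process: it is well-behaved ($E[\varepsilon_t] = 0$, $Var(\varepsilon_t) = \sigma^2$) and the errors are iid (= independently and identically distributed).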

    A bit hard to find examples for this, so let us focus on AR-processes today!


    AR and MA-process continued

    In general these two models are special cases of an ARMA(p,q)-model, where p = order of the AR-process and q = order of the MA-process.

    Examples:

    ARMA(1,0) = AR(1)-process: $y_t = \alpha + \phi_1 y_{t-1} + \varepsilon_t$

    ARMA(0,1) = MA(1)-process: $y_t = \alpha + \varepsilon_t + \theta_1 \varepsilon_{t-1}$

    ARMA(1,1) = AR(1) and MA(1) in one model: $y_t = \alpha + \phi_1 y_{t-1} + \theta_1 \varepsilon_{t-1} + \varepsilon_t$

    If you see an ARIMA(p,I,q)-model, the I stands for integrated, i.e. how often the series has to be differenced until it is stationary (see unit root tests). If I=0, or I(0), the time series is already stationary. If I=1, or I(1), it is stationary after first differencing, and so on.


    AR and MA-process continued

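    Sometimes it is convenient to write these models in lag-operator notation: $L$ for one lag, $L^2$ for two lags and so on.

    Example: $y_t = \alpha + \phi_1 y_{t-1} + \varepsilon_t$ becomes $y_t = \alpha + \phi_1 L y_t + \varepsilon_t$

    Note that $L y_t = y_{t-1}$, $L^2 y_t = y_{t-2}$, $L^3 y_t = y_{t-3}$ and so on.

    Example ARMA(1,1) in L-notation (no constant, MA term written with a minus sign):

    $$y_t = \frac{[1 - \theta_1 L]}{[1 - \phi_1 L]} \varepsilon_t$$

    $$y_t [1 - \phi_1 L] = [1 - \theta_1 L] \varepsilon_t$$

    Open the brackets:

    $$y_t - \phi_1 L y_t = \varepsilon_t - \theta_1 L \varepsilon_t$$

    $$y_t = \phi_1 L y_t + \varepsilon_t - \theta_1 L \varepsilon_t$$

    Finally:

    $$y_t = \phi_1 y_{t-1} + \varepsilon_t - \theta_1 \varepsilon_{t-1}$$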


    AR and MA-process continued

    How to figure out the process describing a time series? Use the autocorrelation function ACF (= correlation between the series and its own past realizations) and the partial autocorrelation function PACF. See Hamilton, chapter 3, for a very good step-by-step derivation of these.

    Take a look at these and decide. Time-series modeling is often referred to as an art (actually, empirical work in general is), meaning two economists can tell you different things when they look at the same functions.

    Remember the ACF and PACF are pretty much opposite to each other when we talk about AR- and MA-processes. An AR-process has an (exponentially) declining ACF and spikes in the PACF. An MA-process has spikes in the ACF and an (exponentially) declining PACF. Confused? See some examples next.
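The ACF/PACF patterns described above can be checked on simulated data. The following is only a sketch, not part of the original lab: the coefficients (0.7 and 0.5), the seed, the sample size and all variable names are assumptions chosen for illustration. Run it as a do-file, since it uses // comments.

```stata
* Sketch: simulate an AR(1) and an MA(1) and compare their ACF and PACF.
* Coefficients, seed and sample size are illustrative assumptions.
clear
set seed 710
set obs 250
gen time = tq(1947q1) + _n - 1
format time %tq
tsset time

gen e = rnormal()                           // white-noise errors

gen y_ar = e in 1                           // AR(1): y_t = 0.7*y_{t-1} + e_t
replace y_ar = 0.7*y_ar[_n-1] + e in 2/l    // replace works row by row, so this is recursive

gen y_ma = e in 1                           // MA(1): y_t = e_t + 0.5*e_{t-1}
replace y_ma = e + 0.5*e[_n-1] in 2/l

ac  y_ar, lags(10) name(ac_ar, replace)     // expected: gradual decline
pac y_ar, lags(10) name(pac_ar, replace)    // expected: spike at lag 1
ac  y_ma, lags(10) name(ac_ma, replace)     // expected: spike at lag 1
pac y_ma, lags(10) name(pac_ma, replace)    // expected: gradual decline
```

With a finite sample the pictures are noisy, which is exactly why two people can read them differently.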


    AR and MA-process continued

    Example AR(1): [figure not reproduced in this transcript]

    Example AR(2): [figure not reproduced in this transcript]


    AR and MA-process continued

    Example MA(1): [figure not reproduced in this transcript]

    Example MA(2): [figure not reproduced in this transcript]


    AR and MA-process continued

    Much more fun if you have AR- and MA-terms in your model, e.g. ARMA(1,1): [figure not reproduced in this transcript]

    Another way to find the underlying process is to use information criteria like AIC and BIC (the latter is also called SIC). They are part of the standard output in EViews but not in Stata (calculating them by hand is a lot of fun). E.g. start with AR(0), then AR(1), AR(2), ... and calculate the information criteria for each model. A trick: use estat ic after the estimation.
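One possible way to automate the comparison is a small loop around arima and estat ic. This is a sketch on simulated data; the true process (an AR(1) with coefficient 0.6), the seed and the maximum order of 4 are assumptions made for illustration.

```stata
* Sketch: compare AIC/BIC across AR(1) to AR(4) on a simulated AR(1) series.
clear
set seed 710
set obs 200
gen time = tq(1947q1) + _n - 1
format time %tq
tsset time
gen e = rnormal()
gen y = e in 1
replace y = 0.6*y[_n-1] + e in 2/l

forvalues p = 1/4 {
    quietly arima y, ar(1/`p')     // AR(p) by maximum likelihood
    display "AR(`p')"
    estat ic                       // prints AIC and BIC; smaller is better
}
```

Using arima rather than reg keeps the estimation sample the same for every p, so the criteria are comparable across models.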


    Unit root tests

    If a time series is stationary, regression results are not spurious or screwed up. This means most of the time we want to have the series stationary (not needed if you do error-correction models).

    Problem is, most macroeconomic time series like GDP, unemployment, trade and many more are non-stationary (= contain a unit root): they do not go back to their mean and the variance is not constant (actually increasing over time). More formally, a series is stationary when the errors satisfy:

    1. $E(\varepsilon_t) = 0$

    2. $var(\varepsilon_t) = \sigma^2$, i.e. the variance is constant

    3. $E(\varepsilon_t \varepsilon_{t-1}) = 0$, i.e. the error terms are not (serially) correlated

    In other words: the errors are well-behaved or white noise. A non-stationary time series has the opposite properties!


    Unit root tests continued

    Or if we use $y_t$ instead, a time series is stationary when:

    1. $E(y_t) = \mu$: the mean is constant and does not depend on time

    2. $E[(y_t - \mu)(y_{t-j} - \mu)] = \gamma_j$: the autocovariance is independent of time too!

    This means we have to test for non-stationarity, which is done using unit root tests like the most common one, the Dickey-Fuller test.

    To make a non-stationary time series stationary, we can do the following:

    1. take the first differences

    2. or detrend the time series (we will not do this today)


    Unit root tests continued

    The Dickey-Fuller test (or augmented Dickey-Fuller test, ADF, if lagged differences are included) uses the following test regressions:

    1. $\Delta y_t = \delta y_{t-1} + \varepsilon_t$, note: $\Delta y_t = y_t - y_{t-1}$ and $\delta = (\phi_1 - 1)$, with $\phi_1$ the AR(1) coefficient

    if the time series is flat (no trend) and potentially slow-turning around zero

    2. $\Delta y_t = \alpha + \delta y_{t-1} + \varepsilon_t$

    if the series is flat and potentially slow-turning around a non-zero value (or has a drift, intercept = $\alpha$)

    3. $\Delta y_t = \alpha + \delta y_{t-1} + \beta T + \varepsilon_t$

    if the series has a trend $T$ (up or down) and a drift (intercept), or is slow-turning around a trend line you would draw through the data

    The DF-test has its own critical values, and we want to reject $H_0: \delta = 0$ in favor of stationarity. In other words, if we cannot reject $H_0$ the series is non-stationary and it has to be first-differenced.
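As a short worked step, the first test regression follows from an AR(1) without constant by subtracting $y_{t-1}$ on both sides. The symbol names ($\phi_1$ for the AR(1) coefficient, $\delta$ for the test coefficient) are assumptions, since the transcript lost the original Greek letters.

```latex
\begin{align*}
y_t           &= \phi_1 y_{t-1} + \varepsilon_t \\
y_t - y_{t-1} &= (\phi_1 - 1)\, y_{t-1} + \varepsilon_t \\
\Delta y_t    &= \delta\, y_{t-1} + \varepsilon_t, \qquad \delta = \phi_1 - 1
\end{align*}
```

So $\delta = 0$ is the same statement as $\phi_1 = 1$ (a unit root), while $\delta < 0$ corresponds to $\phi_1 < 1$, the stationary alternative the test looks for.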


    Unit root tests continued

    How do we choose the lag length p for the DF-test? Schwert (1989) suggests the following rule of thumb:

    $$p_{max} = 12 \left( \frac{T}{100} \right)^{1/4}$$

    where T = number of periods, e.g. years or quarters

    Why should we care? If p is (1) too small, some serial correlation can remain in the errors and bias the test; (2) too large, the power of the test will suffer.
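The rule of thumb is easy to evaluate with display. In this sketch T = 248 is an assumption (quarterly data from 1947q1 to 2008q4), and rounding down to an integer with floor() is an assumption as well, since the slide shows no rounding.

```stata
* Sketch: Schwert's rule of thumb for an assumed sample of T = 248 quarters.
scalar T = 248
display "p_max = " floor(12*(T/100)^(1/4))
```

The unrounded value is roughly 15.06, so this suggests starting the ADF test with about 15 lags and working downwards.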

    Another test for unit roots is suggested by Phillips and Perron (= PP), which corrects for serial correlation and heteroskedasticity in the errors.

    Both ADF and PP-tests are not very helpful if the series is close to being stationary. Kwiatkowski, Phillips, Schmidt and Shin (1992) suggest a test for stationarity, the so-called KPSS-test, with $H_0$: the series is stationary.


    Unit root tests continued

    There are more tests out there, but in general it is not enough to use the Dickey-Fuller test only.

    Usually you use some more to be confident about your time series.


    Short Stata review

    Remember a command in Stata has the following structure:

    [command] variable, options

    We used gen for generating new variables e.g. gen lgdp=log(gdp) to generate the log of GDP

    Remember: if you want to have the residuals after a regression, use predict


    Finding the proper model - Step 1

    We will work with quarterly GDP data first

    1. set mem 50m (only needed in older Stata versions; Stata 12 and later manage memory automatically)

    2. load the data: use gdp.dta

    3. Stata needs to know it is a time series.

    3.1. generate a time-variable: gen time=tq(1947q1)+_n-1

    3.2. give it the right format: format time %tq

    3.3. tell Stata about it: tsset time

    4. graph the series: tsline gdp

    5. generate the log: gen lgdp=log(gdp) and graph it again: tsline lgdp


    Finding the proper model - Step 1

    Let us play around with the ACF (= ac) and PACF (= pac); lgdp is the variable and the option lags() sets the lag length:

    1. ac lgdp, lags(10)

    2. pac lgdp, lags(10)

    or

    3. corrgram lgdp, lags(10)

    What do we see? Do it again for 20 lags.

    Let us do the same for the first-difference version of lgdp. There are two ways:

    1. generate a new variable: gen flgdp=D.lgdp and use it: ac flgdp

    or

    2. use the difference operator directly: ac D.lgdp


    Finding the proper model continued - Step 2

    Assume an AR(1)-model is okay for the log of real GDP. We should run the following regression:

    reg lgdp L.lgdp

    Note:

    Stata uses L. for one lag, L2. for two lags, L3. for three lags

    Stata uses D. for taking the first difference

    Stata uses F. if you have to forward your series, sometimes called a lead

    Pretty convenient, because you can use these for generating new variables too.


    Finding the proper model continued - Step 3

    If the AR(1) model is the proper one, the errors should be white noise. There are a couple of ways to test for it:

    1. graph the errors

    2. do a Breusch-Godfrey test for serial correlation

    3. do a Q-test, called the white-noise test (or portmanteau test)

    Note: The Box-Pierce test is not very common anymore, due to its poor performance in small samples.


    Finding the proper model continued - Step 3

    1. Graphing the errors

    To get the residuals after the regression: predict res, resid

    Stata will save the residuals in res

    There are two ways to graph them:

    1.1. tsline res

    plots them against time; there should be no pattern over time

    1.2. plot the residuals against past residuals

    and there should be no pattern again!

    reg res L.res, beta

    twoway (scatter res L.res)


    Finding the proper model continued - Step 3

    2. Breusch-Godfrey-test

    Again, after the regression do the following (no need to predict the residuals):

    estat bgodfrey, lags(10)

    H0 = no serial correlation; if we reject it, the errors are correlated and not white noise!

    3. White-noise test

    Run the regression, predict the residuals (predict res, resid) and do the following:

    wntestq res, lags(10)

    H0 = no serial correlation; if we reject it, the errors are correlated and not white noise!
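The three residual checks can be run in one go. The sketch below uses a simulated AR(1) series instead of the lab's gdp.dta; the coefficient 0.7, the seed, the lag choices and the variable names are assumptions. Run it as a do-file.

```stata
* Sketch: residual diagnostics after an AR(1) estimated by OLS (simulated data).
clear
set seed 710
set obs 250
gen time = tq(1947q1) + _n - 1
format time %tq
tsset time
gen e = rnormal()
gen y = e in 1
replace y = 0.7*y[_n-1] + e in 2/l

reg y L.y                    // AR(1) by OLS
estat bgodfrey, lags(4)      // 2. Breusch-Godfrey, H0: no serial correlation
predict res, resid           // residuals are stored in res
tsline res                   // 1. look for patterns over time
wntestq res, lags(10)        // 3. portmanteau (Q) test, H0: white noise
```

Because the data really are generated by an AR(1), neither test should reject in most samples; fitting the same regression to an AR(2) series would typically show leftover serial correlation.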


    Unit root tests

    Unit root tests are pretty straightforward in Stata:

    Load the quarterly data for defense spending (use ds.dta) and generate the log of defense spending (ds): gen lds=log(ds)

    1. (Augmented) Dickey-Fuller tests

    Case 1: no constant, no trend term

    dfuller lds, noconstant

    Case 2: constant, no trend

    dfuller lds

    Case 3: constant, trend

    dfuller lds, trend

    Options:

    Include lags for the ADF: dfuller lds, lags(10) includes 10 lags

    If you need the regression output: dfuller lds, regress
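To see what dfuller does behind the scenes, the test regression can be run by hand. This is a sketch on a simulated random walk (seed, sample size and names are assumptions), not the lab's ds.dta. Run it as a do-file.

```stata
* Sketch: Dickey-Fuller regression by hand versus dfuller (case 2: constant, no trend).
clear
set seed 710
set obs 200
gen time = tq(1947q1) + _n - 1
format time %tq
tsset time
gen y = sum(rnormal())       // random walk: running sum of white noise

dfuller y, regress           // test statistic plus the underlying regression
reg D.y L.y                  // same coefficient and t-value on L.y
dfuller D.y                  // the first difference should look stationary
```

The t-value on L.y from reg equals the Dickey-Fuller statistic, but it must be compared with the Dickey-Fuller critical values reported by dfuller, not with the usual t-table.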


    Unit root tests continued

    2. Phillips-Perron-test

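    If we don't specify a lag length, pperron picks one with a rule of thumb based on the sample size (Stata documents the default as int{4(T/100)^(2/9)} Newey-West lags, in the spirit of Schwert's rule of thumb).

    Options are similar to dfuller

    pperron lds

    Remember: H0 = non-stationary

    3. KPSS-test

    kpss lds

    kpss is a user-written command (install it with ssc install kpss); type help kpss into Stata, the options are a bit different

    Remember: H0 = stationary

    If we reject the null, then the series is non-stationary. Stata gives you the test values for different lag lengths.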


    ARIMA in Stata

    We focused on AR-processes using OLS so far, but the following command is more powerful:

    arima

    ARIMA estimation is maximum likelihood estimation. Remember the general notation is ARIMA(p,I,q), where I = order of integration, e.g. I=0: the series is already stationary; I=1: you have to take the first differences first.

    Examples:

    arima ds, ar(1)
    AR(1) for defense spending (ds)

    arima ds, arima(1,0,0)
    still AR(1), series treated as already stationary without first-differencing

    arima D.ds, ar(1)
    same as arima ds, arima(1,1,0): first-difference version of AR(1) on ds

    arima ds, ma(1)
    same as arima ds, arima(0,0,1): an MA(1)-process for ds

    arima ds, ar(1) ma(1)
    same as arima ds, arima(1,0,1): an AR(1) and an MA(1) component

    To get the AIC and BIC for the models, use the following command after the estimation:

    estat ic


    ARIMA in Stata continued

    Residual test

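    To test the residuals for autocorrelation, proceed as before (but estat bgodfrey will not work after arima),

    e.g. predict the residuals (predict res, resid) and graph them, or do a white-noise test (wntestq res).

    If you like, also look at a Durbin-Watson statistic, which should be around 2. Note that Stata's dwstat (estat dwatson in newer versions) only works after regress and takes no variable name, so after arima it has to be computed from res by hand.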
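A possible way to run these checks after arima, including a Durbin-Watson statistic computed by hand, is sketched below. The data are simulated (AR(1) with an assumed coefficient of 0.7), and the helper variables num and den are hypothetical names. Run it as a do-file.

```stata
* Sketch: residual checks after arima, with a hand-computed Durbin-Watson statistic.
clear
set seed 710
set obs 250
gen time = tq(1947q1) + _n - 1
format time %tq
tsset time
gen e = rnormal()
gen y = e in 1
replace y = 0.7*y[_n-1] + e in 2/l

arima y, ar(1)
predict res, resid
wntestq res, lags(10)            // H0: residuals are white noise

* DW = sum of squared changes in the residuals / sum of squared residuals
gen num = (res - L.res)^2
gen den = res^2
quietly summarize num
scalar s_num = r(sum)
quietly summarize den
display "DW = " s_num / r(sum)
```

A value close to 2 points to no first-order autocorrelation in the residuals.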


    Forecasting

    There are different types of forecasting after a regression. We can do an in-sample forecast (using the quarters given) or we can do an out-of-sample forecast (adding quarters).

    I will do it for the arima command (OLS is a bit different).

    Remember: To check the quality of your forecast, you need to calculate the root mean square error (RMSE). The RMSE uses the forecast error (actual observation minus the forecast) and the formula is the following:

    $$RMSE = \sqrt{\frac{\sum_t (Y_t - \text{forecast}_t)^2}{N}}$$

    Example AR(1)-model (fgdp is the first difference of GDP, gen fgdp=D.gdp):

    arima fgdp, ar(1)

    Do a one-step ahead forecast:

    predict fgdp1, y

    Compare the actual value with the forecast

    tsline fgdp fgdp1


    Forecasting continued

    Calculate the RMSE:

    1. Generate the forecast error:

    gen ferr=fgdp-fgdp1

    2. Generate the square of the forecast error:

    gen ferr2=ferr^2

    3. Get the mean of the squared errors (here: 0.0040)

    sum ferr2

    4. Use it to compute the RMSE:

    display "rmse: " (0.0040)^.5

    Note there are more ways to measure forecast accuracy.
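Instead of copying the mean by hand, the RMSE can be computed from the result that summarize leaves behind in r(mean). The sketch below uses simulated data in place of fgdp; the AR(1) coefficient, the seed and the names yhat and ferr2 are assumptions. Run it as a do-file.

```stata
* Sketch: RMSE of one-step-ahead forecasts without typing the mean by hand.
clear
set seed 710
set obs 250
gen time = tq(1947q1) + _n - 1
format time %tq
tsset time
gen e = rnormal()
gen y = e in 1
replace y = 0.7*y[_n-1] + e in 2/l

arima y, ar(1)
predict yhat, y                  // one-step-ahead forecast
gen ferr2 = (y - yhat)^2         // squared forecast error
quietly summarize ferr2
display "rmse: " sqrt(r(mean))   // root of the mean squared error
```

This avoids rounding the mean before taking the square root.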


    Forecasting continued

    A dynamic forecast could be done as follows:

    predict fgdpd, xb dynamic(.)

    Plot the actual value and the forecast

    tsline fgdp fgdpd

    Out of sample forecast

    Run the estimation, but before predicting you have to extend the time horizon:

    tsappend, add(24)

    adds 24 quarters to the quarterly data set we have.

    Then use the predict command for one-step-ahead or dynamic forecasts.
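Put together, an out-of-sample forecast could look like the sketch below. The data are simulated (200 quarters from 1947q1, so the sample ends in 1996q4), and the horizon of 8 quarters, the coefficient and the variable names are assumptions. Run it as a do-file.

```stata
* Sketch: out-of-sample dynamic forecast after arima (simulated data).
clear
set seed 710
set obs 200
gen time = tq(1947q1) + _n - 1
format time %tq
tsset time
gen e = rnormal()
gen y = e in 1
replace y = 0.7*y[_n-1] + e in 2/l    // sample runs from 1947q1 to 1996q4

arima y, ar(1)
tsappend, add(8)                      // adds 1997q1 to 1998q4 with missing y
predict yf, y dynamic(tq(1997q1))     // from 1997q1 on, forecasts feed on forecasts
tsline y yf if tin(1990q1, 1998q4)
```

Before 1997q1 yf holds one-step-ahead predictions based on actual values; from 1997q1 on it is a pure forecast that moves back towards the estimated mean.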


    Forecasting continued

    A simple linear OLS forecast (don't ask me about the dynamic one; the same command as above is not working. There should be a way to compute it manually in Stata):

    reg fgdp L.fgdp

    predict fgdp1

    (Stata assumes the option xb anyway in this case)

    tsline fgdp fgdp1

    What else could be done???

    There is much more out there, e.g. rolling forecasts, or comparing forecasts of different models, e.g. AR(1) with AR(2), and so on.


    How to create the first difference of a series


    The simplest way in Stata is:

    Let gdp be in levels and we want to create the first difference:

    gen fgdp=D.gdp

    (same as: $y_t - y_{t-1}$)

    or D2 would be $(y_t - y_{t-1}) - (y_{t-1} - y_{t-2})$

    As you have seen above, in a regression you can use D, F and L in front of a variable without generating a new variable first!
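A tiny example makes the difference operators concrete. The four GDP values below are made up for illustration, and f2gdp is a hypothetical variable name.

```stata
* Sketch: first and second differences on four made-up observations.
clear
input gdp
100
104
110
113
end
gen time = tq(1947q1) + _n - 1
format time %tq
tsset time

gen fgdp  = D.gdp     // y_t - y_{t-1}:  ., 4, 6, 3
gen f2gdp = D2.gdp    // difference of the differences:  ., ., 2, -3
list time gdp fgdp f2gdp
```

Each round of differencing costs one observation at the start of the sample.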


    Setting the time


    In our examples we had quarterly data, what if you have annual, monthly, weekly or daily data?

    annual data

    gen time=1947+_n-1

    tsset time

    monthly data

    gen time=tm(1962m2)+_n-1

    format time %tm

    tsset time

    weekly data

    gen time=tw(1962w1)+_n-1

    format time %tw

    tsset time

    daily data

    gen time=td(1apr1962)+_n-1

    format time %td

    tsset time

    Note: _n is the observation number, so start+_n-1 equals the start date in the first observation and moves one period forward with each further observation.


    How to detrend a series? And how to choose the time horizon?

    1. Detrending

    Sometimes you want to detrend a series, e.g. because a trend is present, or because, compared to taking the first difference, you save one observation. Imagine you only have 20 years of annual observations.

    Steps:

    create a trend variable, e.g. a variable increasing with time

    gen trend = _n+1

    regress your variable of interest using a constant and a trend

    reg lgdp trend

    use the residuals for the fun stuff you want to do!
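The detrending steps above can be sketched on simulated data. The series here is trend-stationary by construction; the slope 0.05, the noise level, the seed and the name y_detr are assumptions, and the trend simply starts at 1. Run it as a do-file.

```stata
* Sketch: detrend a series by regressing it on a linear time trend.
clear
set seed 710
set obs 80
gen time = tq(1947q1) + _n - 1
format time %tq
tsset time

gen trend = _n                              // 1, 2, 3, ...
gen y = 2 + 0.05*trend + rnormal(0, 0.2)    // assumed trend-stationary series

reg y trend                                 // constant and trend
predict y_detr, resid                       // residuals = detrended series
tsline y y_detr
```

Unlike first differencing, all 80 observations survive, which is the point made above about short samples.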

    2. Choosing the time horizon

    There are a couple of ways, e.g. use only observations starting with 1980 or so, but one neat function is tin() (= time in), used with if:

    reg D.lgdp D2.lgdp if tin(1947q1,1965q4)

    so that the observations run from January 1947 (first quarter) to December 1965 (fourth quarter)
