STAT 497 LECTURE NOTES 4
MODEL IDENTIFICATION AND NON-STATIONARY TIME SERIES MODELS
MODEL IDENTIFICATION
• We have learned a large class of linear parametric models for stationary time series processes.
• Now, the question is how to find the most suitable model for a given observed series, i.e., how to choose the appropriate orders p and q.
MODEL IDENTIFICATION
• The ACF and PACF show specific properties for specific models. Hence, we can use them as criteria to identify a suitable model.
• Using the patterns of sample ACF and sample PACF, we can identify the model.
MODEL SELECTION THROUGH CRITERIA
• Besides the sACF and sPACF plots, we have other tools for model identification.
• With messy real data, sACF and sPACF plots become complicated and harder to interpret.
• Don’t forget to choose the best model with as few parameters as possible.
• It will be seen that many different models can fit the same data, so we should choose the most appropriate (most parsimonious) one; the information criteria will help us decide.
MODEL SELECTION THROUGH CRITERIA
• The three well-known information criteria are:
– Akaike’s Information Criterion (AIC) (Akaike, 1974)
– Schwarz’s Bayesian Criterion (SBC) (Schwarz, 1978), also known as the Bayesian Information Criterion (BIC)
– Hannan-Quinn Information Criterion (HQIC) (Hannan & Quinn, 1979)
AIC

• Assume that a statistical model with M parameters is fitted to data:

AIC = −2 ln(maximum likelihood) + 2M

• For an ARMA model and n observations, the log-likelihood function is

ln L(φ, θ, σ_a²) = −(n/2) ln(2πσ_a²) − SS_Residual / (2σ_a²),

assuming a_t ~ i.i.d. N(0, σ_a²).
AIC
• Then, the maximized log-likelihood is

ln L(φ̂, θ̂, σ̂_a²) = −(n/2) ln σ̂_a² − (n/2)(1 + ln 2π),

where the last term is constant, so that

AIC = n ln σ̂_a² + 2M
Choose model (or the value of M) with minimum AIC.
SBC

• The Bayesian Information Criterion (BIC) or Schwarz Criterion (also SBC, SBIC) is a criterion for model selection among a class of parametric models with different numbers of parameters.
• When estimating model parameters using maximum likelihood estimation, it is possible to increase the likelihood by adding additional parameters, which may result in overfitting. The BIC resolves this problem by introducing a penalty term for the number of parameters in the model.
SBC
• In SBC, the penalty for additional parameters is stronger than that of the AIC.
• It has superior large-sample properties.
• It is consistent, unbiased and sufficient.
SBC = n ln σ̂_a² + M ln n
HQIC
• The Hannan-Quinn information criterion (HQIC) is an alternative to AIC and SBC.
• It can be shown [see Hannan (1980)] that in the case of common roots in the AR and MA polynomials, the Hannan-Quinn and Schwarz criteria still select the correct orders p and q consistently.
HQIC = n ln σ̂_a² + 2M ln(ln n)
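As a quick numerical sketch of how the three criteria trade fit against parsimony (the function names and example values below are hypothetical, not from the notes), each criterion is computed from n, the residual variance estimate σ̂_a², and the number of parameters M:

```python
import math

def aic(n, sigma2_hat, m):
    # AIC = n ln(sigma_a^2-hat) + 2M
    return n * math.log(sigma2_hat) + 2 * m

def sbc(n, sigma2_hat, m):
    # SBC = n ln(sigma_a^2-hat) + M ln(n): stronger penalty than AIC
    return n * math.log(sigma2_hat) + m * math.log(n)

def hqic(n, sigma2_hat, m):
    # HQIC = n ln(sigma_a^2-hat) + 2M ln(ln n): penalty between AIC and SBC
    return n * math.log(sigma2_hat) + 2 * m * math.log(math.log(n))

# Two hypothetical candidate models fitted to n = 100 observations:
# the larger model fits slightly better but pays a parameter penalty.
n = 100
small = (aic(n, 1.05, 2), sbc(n, 1.05, 2), hqic(n, 1.05, 2))
big = (aic(n, 1.02, 6), sbc(n, 1.02, 6), hqic(n, 1.02, 6))
print(small, big)  # the smaller model wins on all three criteria here
```

Note that for n = 100 the penalty per parameter is 2 for AIC, ln 100 ≈ 4.6 for SBC, and 2 ln ln 100 ≈ 3.1 for HQIC, consistent with SBC penalizing extra parameters hardest.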
THE INVERSE AUTOCORRELATION FUNCTION
• The sample inverse autocorrelation function (SIACF) plays much the same role in ARIMA modeling as the sample partial autocorrelation function (SPACF), but it generally indicates subset and seasonal autoregressive models better than the SPACF.
THE INVERSE AUTOCORRELATION FUNCTION
• Additionally, the SIACF can be useful for detecting over-differencing. If the data come from a nonstationary or nearly nonstationary model, the SIACF has the characteristics of a noninvertible moving average. Likewise, if the data come from a model with a noninvertible moving average, then the SIACF has nonstationary characteristics and therefore decays slowly. In particular, if the data have been over-differenced, the SIACF looks like a SACF from a nonstationary process.
THE INVERSE AUTOCORRELATION FUNCTION
• Let Y_t be generated by the ARMA(p, q) process

φ_p(B) Y_t = θ_q(B) a_t,  where a_t ~ WN(0, σ_a²).

• If θ(B) is invertible, then the model

θ_q(B) Z_t = φ_p(B) a_t

is also a valid ARMA(q, p) model. This model is sometimes referred to as the dual model. The autocorrelation function (ACF) of this dual model is called the inverse autocorrelation function (IACF) of the original model.
THE INVERSE AUTOCORRELATION FUNCTION
• Notice that if the original model is a pure autoregressive model, then the IACF is an ACF that corresponds to a pure moving-average model. Thus, it cuts off sharply when the lag is greater than p; this behavior is similar to the behavior of the partial autocorrelation function (PACF).
• Under certain conditions, the sampling distribution of the SIACF can be approximated by the sampling distribution of the SACF of the dual model (Bhansali, 1980). In the plots generated by ARIMA, the confidence limit marks (.) are located at ±2n^(−1/2). These limits bound an approximate 95% confidence interval for the hypothesis that the data are from a white noise process.
EXAMPLE USING SIMULATED SERIES 1
• Simulated 100 observations from an AR(1) model with φ = 0.5.
• SAS output:
Autocorrelations
Lag Covariance Correlation -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1 Std Error
0 1.498817 1.00000 | |********************| 0
1 0.846806 0.56498 | . |*********** | 0.100000
2 0.333838 0.22273 | . |****. | 0.128000
3 0.123482 0.08239 | . |** . | 0.131819
4 0.039922 0.02664 | . |* . | 0.132333
5 -0.110372 -.07364 | . *| . | 0.132387
6 -0.162723 -.10857 | . **| . | 0.132796
7 -0.301279 -.20101 | .****| . | 0.133680
8 -0.405986 -.27087 | *****| . | 0.136670
9 -0.318727 -.21265 | . ****| . | 0.141937
10 -0.178869 -.11934 | . **| . | 0.145088
11 -0.162342 -.10831 | . **| . | 0.146066
12 -0.180087 -.12015 | . **| . | 0.146867
13 -0.132600 -.08847 | . **| . | 0.147847
14 0.026849 0.01791 | . | . | 0.148375
15 0.175556 0.11713 | . |** . | 0.148397
Inverse Autocorrelations
Lag Correlation -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
1 -0.50606 | **********| . |
2 0.09196 | . |** . |
3 0.06683 | . |* . |
4 -0.14221 | .***| . |
5 0.16250 | . |***. |
6 -0.07833 | . **| . |
7 -0.02154 | . | . |
8 0.10714 | . |** . |
9 -0.03611 | . *| . |
10 0.03881 | . |* . |
11 -0.04858 | . *| . |
12 0.00989 | . | . |
13 0.09922 | . |** . |
14 -0.09950 | . **| . |
15 0.11284 | . |** . |
Partial Autocorrelations
Lag Correlation -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
1 0.56498 | . |*********** |
2 -0.14170 | .***| . |
3 0.02814 | . |* . |
4 -0.01070 | . | . |
5 -0.11912 | . **| . |
6 -0.00838 | . | . |
7 -0.17970 | ****| . |
8 -0.11159 | . **| . |
9 0.02214 | . | . |
10 -0.01280 | . | . |
11 -0.07174 | . *| . |
12 -0.06860 | . *| . |
13 -0.02706 | . *| . |
14 0.07718 | . |** . |
15 0.04869 | . |* . |
EXAMPLE USING SIMULATED SERIES 2
• Simulated 100 observations from an AR(1) model with φ = 0.5, then took the first-order difference.
• SAS output
Autocorrelations
Lag Covariance Correlation -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1 Std Error
0 1.301676 1.00000 | |********************| 0
1 -0.133104 -.10226 | . **| . | 0.100504
2 -0.296746 -.22797 | *****| . | 0.101549
3 -0.131524 -.10104 | . **| . | 0.106593
4 0.080946 0.06219 | . |* . | 0.107557
5 -0.116677 -.08964 | . **| . | 0.107919
6 0.080503 0.06185 | . |* . | 0.108669
7 -0.016109 -.01238 | . | . | 0.109024
8 -0.176930 -.13592 | .***| . | 0.109038
9 -0.055488 -.04263 | . *| . | 0.110736
10 0.136477 0.10485 | . |** . | 0.110902
11 0.022838 0.01754 | . | . | 0.111898
12 -0.067697 -.05201 | . *| . | 0.111926
13 -0.117708 -.09043 | . **| . | 0.112170
14 0.013985 0.01074 | . | . | 0.112904
15 0.0086790 0.00667 | . | . | 0.112914
Inverse Autocorrelations
Lag Correlation -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
1 0.58314 | . |************ |
2 0.60399 | . |************ |
3 0.56860 | . |*********** |
4 0.46544 | . |********* |
5 0.51176 | . |********** |
6 0.43134 | . |********* |
7 0.40776 | . |******** |
8 0.42360 | . |******** |
9 0.36581 | . |******* |
10 0.33397 | . |******* |
11 0.28672 | . |****** |
12 0.27159 | . |***** |
13 0.26072 | . |***** |
14 0.16769 | . |***. |
15 0.17107 | . |***. |
Partial Autocorrelations
Lag Correlation -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
1 -0.10226 | . **| . |
2 -0.24095 | *****| . |
3 -0.16587 | .***| . |
4 -0.03460 | . *| . |
5 -0.16453 | .***| . |
6 0.01299 | . | . |
7 -0.06425 | . *| . |
8 -0.18066 | ****| . |
9 -0.11338 | . **| . |
10 -0.03592 | . *| . |
11 -0.05754 | . *| . |
12 -0.08183 | . **| . |
13 -0.17169 | .***| . |
14 -0.11056 | . **| . |
15 -0.13018 | .***| . |
THE EXTENDED SAMPLE AUTOCORRELATION FUNCTION (ESACF)

• The extended sample autocorrelation function (ESACF) method can tentatively identify the orders of a stationary or nonstationary ARMA process based on iterated least squares estimates of the autoregressive parameters. Tsay and Tiao (1984) proposed the technique.
ESACF
• Consider the ARMA(p, q) model

(1 − φ_1 B − ⋯ − φ_p B^p) Y_t = θ_0 + (1 − θ_1 B − ⋯ − θ_q B^q) a_t,

or

Y_t = θ_0 + φ_1 Y_{t−1} + ⋯ + φ_p Y_{t−p} + a_t − θ_1 a_{t−1} − ⋯ − θ_q a_{t−q};

then

Z_t = (1 − φ_1 B − ⋯ − φ_p B^p) Y_t

follows an MA(q) model:

Z_t = θ_0 + a_t − θ_1 a_{t−1} − ⋯ − θ_q a_{t−q}
ESACF

• Given a stationary or nonstationary time series Y_t, with mean-corrected form Ỹ_t = Y_t − Ȳ, with a true autoregressive order of p + d and a true moving-average order of q, we can use the ESACF method to estimate the unknown orders by analyzing the sample autocorrelation functions associated with filtered series of the form

Z_t^(m,j) = (1 − φ̂_1^(m,j) B − ⋯ − φ̂_m^(m,j) B^m) Ỹ_t = Ỹ_t − Σ_{i=1}^m φ̂_i^(m,j) Ỹ_{t−i},

where the φ̂_i^(m,j)’s are the parameter estimates under the assumption that the series is an ARMA(m, j) process.
ESACF
• It is known that OLS estimators for an ARMA process are not consistent, so an iterated procedure is proposed to overcome this. The estimates are updated recursively:

φ̂_i^(m,j) = φ̂_i^(m+1,j−1) − φ̂_{i−1}^(m,j−1) · φ̂_{m+1}^(m+1,j−1) / φ̂_m^(m,j−1)

• The j-th lag of the sample autocorrelation function of the filtered series Z_t^(m,j) is the extended sample autocorrelation function, denoted ρ̂_j^(m).
ESACF
ESACF TABLE

            MA
AR      0         1         2         3        …
0    ρ̂_1^(0)   ρ̂_2^(0)   ρ̂_3^(0)   ρ̂_4^(0)   …
1    ρ̂_1^(1)   ρ̂_2^(1)   ρ̂_3^(1)   ρ̂_4^(1)   …
2    ρ̂_1^(2)   ρ̂_2^(2)   ρ̂_3^(2)   ρ̂_4^(2)   …
3    ρ̂_1^(3)   ρ̂_2^(3)   ρ̂_3^(3)   ρ̂_4^(3)   …
…       …         …         …         …        …
ESACF

• For an ARMA(p, q) process we have the following convergence in probability: for m = 1, 2, … and j = 1, 2, …,

ρ̂_j^(m) → 0,  if 0 ≤ m − p ≤ j − q,
ρ̂_j^(m) → X ≠ 0,  otherwise.
ESACF
• Thus, the asymptotic ESACF table for the ARMA(1,1) model becomes

AR \ MA
0 1 2 3 4 …
0 X X X X X …
1 X 0 0 0 0 …
2 X X 0 0 0 …
3 X X X 0 0 …
4 X X X X 0 …
… … … … … … …
ESACF

• In practice we have finite samples, and ρ̂_j^(m) for 0 ≤ m − p ≤ j − q may not be exactly zero. However, we can use Bartlett’s approximate formula for the asymptotic variance of ρ̂_j^(m).
• The orders are tentatively identified by finding a right (maximal) triangular pattern with vertices located at (p + d, q) and (p + d, q_max), in which all elements are insignificant (based on asymptotic normality of the autocorrelation function). The vertex (p + d, q) identifies the order.
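The triangle-finding rule follows directly from the convergence result: mark an entry ‘0’ when 0 ≤ m − p ≤ j − q and ‘X’ otherwise. A minimal sketch (the function name is hypothetical):

```python
def esacf_pattern(p, q, max_ar=5, max_ma=5):
    # Asymptotic ESACF table of an ARMA(p, q) process:
    # entry (m, j) is '0' when 0 <= m - p <= j - q, else 'X'.
    table = []
    for m in range(max_ar + 1):
        row = ['0' if 0 <= m - p <= j - q else 'X'
               for j in range(max_ma + 1)]
        table.append(row)
    return table

# For ARMA(1,1) the zeros form a triangle with vertex at (m, j) = (1, 1).
for row in esacf_pattern(1, 1):
    print(' '.join(row))
```

The printed pattern reproduces the asymptotic ARMA(1,1) table: row m = 1 is zero from j = 1 onward, and each later row’s zeros start one lag later.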
EXAMPLE (R CODE)

> x=arima.sim(list(order = c(2,0,0), ar = c(-0.2,0.6)), n = 200)
> par(mfrow=c(2,1))
> par(mfrow=c(1,2))
> acf(x)
> pacf(x)
EXAMPLE (CONTD.)
• After Loading Package TSA in R:
> eacf(x)
AR/MA 0 1 2 3 4 5 6 7 8 9 10 11 12 13
0 x x x x x x x x x x x x x x
1 x x x x x o o o o o o o o o
2 o o o o o o o o o o o o o o
3 x o o o o o o o o o o o o o
4 x x o o o o o o o o o o o o
5 x o x o o o o o o o o o o o
6 x x o x o o o o o o o o o o
7 x x o x o o o o o o o o o o
MINIMUM INFORMATION CRITERION
MINIC TABLE
            MA
AR       0          1          2          3        …
0    SBC(0,0)   SBC(0,1)   SBC(0,2)   SBC(0,3)   …
1    SBC(1,0)   SBC(1,1)   SBC(1,2)   SBC(1,3)   …
2    SBC(2,0)   SBC(2,1)   SBC(2,2)   SBC(2,3)   …
3    SBC(3,0)   SBC(3,1)   SBC(3,2)   SBC(3,3)   …
…       …          …          …          …        …
MINIC EXAMPLE
• Simulated 100 observations from an AR(1) model with φ = 0.5.
• SAS output:
Minimum Information Criterion
Lags MA 0 MA 1 MA 2 MA 3 MA 4 MA 5
AR 0 0.366884 0.074617 0.06748 0.083827 0.11816 0.161974
AR 1 -0.03571 -0.00042 0.038633 0.027826 0.064904 0.097701
AR 2 -0.0163 0.021657 0.064698 0.072834 0.107481 0.140204
AR 3 0.001216 0.034056 0.080065 0.118677 0.152146 0.183487
AR 4 0.037894 0.069766 0.115222 0.14586 0.189454 0.229528
AR 5 0.065179 0.099543 0.143406 0.185604 0.230186 0.272322
Error series model: AR(8)
Minimum Table Value: BIC(1,0) = -0.03571
NON-STATIONARY TIME SERIES MODELS
• Non-constant in mean
• Non-constant in variance
• Both
NON-STATIONARY TIME SERIES MODELS
• Inspection of the ACF serves as a rough indicator of whether a trend is present in a series. A slow decay in the ACF is indicative of a large characteristic root, a true unit root process, or a trend-stationary process.
• Formal tests can help to determine whether a system contains a trend and whether the trend is deterministic or stochastic.
NON-STATIONARITY IN MEAN
• Deterministic trend– Detrending
• Stochastic trend– Differencing
DETERMINISTIC TREND

• A deterministic trend is present when the series is trending because it is an explicit function of time.
• Using a simple linear trend model, the deterministic (global) trend can be estimated. This way of proceeding is very simple and assumes that the pattern represented by the linear trend remains fixed over the observed time span of the series. A simple linear trend model:
Y_t = α + βt + a_t
DETERMINISTIC TREND

• The parameter β measures the average change in Y_t from one period to the next:

ΔY_t = Y_t − Y_{t−1} = β + a_t − a_{t−1},  so  E(ΔY_t) = β.

• The sequence {Y_t} will exhibit only temporary departures from the trend line α + βt. This type of model is called a trend-stationary (TS) model.
EXAMPLE
TREND STATIONARY
• If a series has a deterministic time trend, then we simply regress Y_t on an intercept and a time trend (t = 1, 2, …, n) and save the residuals. The residuals form the detrended series. If Y_t is stochastic, we do not necessarily obtain a stationary series this way.
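A minimal numerical sketch of this detrending step, using pure-Python OLS on a hypothetical simulated trend-stationary series (all names and parameter values are illustrative):

```python
import random
import statistics

random.seed(497)
n = 200
t = list(range(1, n + 1))
# Hypothetical trend-stationary series: Y_t = 2 + 0.5 t + a_t
y = [2 + 0.5 * ti + random.gauss(0, 1) for ti in t]

# OLS of Y on an intercept and time
tbar, ybar = statistics.mean(t), statistics.mean(y)
beta = (sum((ti - tbar) * (yi - ybar) for ti, yi in zip(t, y))
        / sum((ti - tbar) ** 2 for ti in t))
alpha = ybar - beta * tbar

# Residuals = the detrended series; they should hover around zero
resid = [yi - (alpha + beta * ti) for ti, yi in zip(t, y)]
print(round(beta, 3), round(statistics.mean(resid), 9))
```

The fitted slope recovers the true trend coefficient, and the OLS residuals average to zero by construction; they are the stationary series one would then model.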
DETERMINISTIC TREND

• Many economic series exhibit “exponential trend/growth”. They grow over time like an exponential function rather than a linear one.
• For such series, we want to work with the log of the series:

ln Y_t = α + βt + a_t

So the average growth rate is β = E(Δ ln Y_t).
DETERMINISTIC TREND

• A standard regression model can be used to describe the phenomenon. If the deterministic trend can be described by a k-th order polynomial of time, the model of the process is

Y_t = β_0 + β_1 t + β_2 t² + ⋯ + β_k t^k + a_t,  where a_t ~ WN(0, σ_a²).

• Estimate the parameters and obtain the residuals. The residuals give the detrended series.
DETERMINISTIC TREND

• This model has a short memory.
• If a shock hits the series, it returns to the trend level in a short time; hence the best forecasts are not affected.
• A model like this is rarely useful in practice. A more realistic model involves a stochastic (local) trend.
STOCHASTIC TREND
• A more modern approach is to treat the trend in a time series as a variable. A variable trend exists when a trend changes in an unpredictable way; it is therefore considered stochastic.
STOCHASTIC TREND
• Recall the AR(1) model: Y_t = c + φY_{t−1} + a_t.
• As long as |φ| < 1, everything is fine (OLS is consistent, t-stats are asymptotically normal, ...).
• Now consider the extreme case where φ = 1, i.e. Y_t = c + Y_{t−1} + a_t.
• Where is the trend? There is no t term.
STOCHASTIC TREND

• Let us replace recursively the lag of Y_t on the right-hand side:

Y_t = c + Y_{t−1} + a_t
    = c + (c + Y_{t−2} + a_{t−1}) + a_t
    = ⋯
    = Y_0 + ct + Σ_{i=1}^t a_i

Here ct is a deterministic trend.

• This is what we call a “random walk with drift”. If c = 0, it is a “random walk”.
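The repeated substitution can be checked numerically; a small sketch with hypothetical values of c, Y_0 and simulated shocks:

```python
import random

random.seed(42)
c, y0, n = 0.3, 10.0, 50
a = [random.gauss(0, 1) for _ in range(n)]

# Recursive form: Y_t = c + Y_{t-1} + a_t
y = y0
for t in range(n):
    y = c + y + a[t]

# Closed form from repeated substitution: Y_t = Y_0 + c t + sum_i a_i
closed = y0 + c * n + sum(a)
print(abs(y - closed) < 1e-9)  # the two forms agree
```

The drift c accumulates into the deterministic trend ct, while the shocks accumulate into the stochastic trend Σ a_i.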
STOCHASTIC TREND
• Each a_i shock represents a shift in the intercept. Since all values of {a_i} have a coefficient of unity, the effect of each shock on the intercept term is permanent.
• In the time series literature, such a sequence is said to have a stochastic trend since each ai shock imparts a permanent and random change in the conditional mean of the series. To be able to define this situation, we use Autoregressive Integrated Moving Average (ARIMA) models.
DETERMINISTIC VS STOCHASTIC TREND

• They might appear similar, since they both lead to growth over time, but they are quite different.
• To see why, suppose that through some policy you got a bigger Y_t because the noise a_t is big. What will happen next period?
– With a deterministic trend, Y_{t+1} = c + β(t+1) + a_{t+1}. The noise a_t does not affect Y_{t+1}. Your policy had a one-period impact.
– With a stochastic trend, Y_{t+1} = c + Y_t + a_{t+1} = c + (c + Y_{t−1} + a_t) + a_{t+1}. The noise a_t does affect Y_{t+1}. In fact, the policy will have a permanent impact.
DETERMINISTIC VS STOCHASTIC TREND
Conclusions:
– When dealing with trending series, we are always interested in knowing whether the growth is a deterministic or a stochastic trend.
– There are also economic time series that do not grow over time (e.g., interest rates), but we still need to check whether they behave “similarly” to stochastic trends (φ = 1 instead of |φ| < 1, while c = 0).
– A deterministic trend refers to the long-term trend that is not affected by short-term fluctuations in the series. However, some of the occurrences are random and may have a permanent effect on the trend. Therefore the trend must contain both a deterministic and a stochastic component.
DETERMINISTIC TREND EXAMPLE

Simulate data from, let’s say, AR(1):
> x=arima.sim(list(order = c(1,0,0), ar = 0.6), n = 100)
Simulate data with a deterministic trend:
> y=2+time(x)*2+x
> plot(y)
DETERMINISTIC TREND EXAMPLE

> reg=lm(y~time(y))
> summary(reg)

Call:
lm(formula = y ~ time(y))

Residuals:
     Min       1Q   Median       3Q      Max
-2.74091 -0.77746 -0.09465  0.83162  3.27567

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.179968   0.250772   8.693 8.25e-14 ***
time(y)     1.995380   0.004311 462.839  < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.244 on 98 degrees of freedom
Multiple R-squared: 0.9995, Adjusted R-squared: 0.9995
F-statistic: 2.142e+05 on 1 and 98 DF, p-value: < 2.2e-16
DETERMINISTIC TREND EXAMPLE

> plot(y=rstudent(reg),x=as.vector(time(y)), ylab='Standardized Residuals',xlab='Time',type='o')
DETERMINISTIC TREND EXAMPLE
> z=rstudent(reg) > par(mfrow=c(1,2))> acf(z)> pacf(z)
(Figure: ACF and PACF of the de-trended series, suggesting AR(1))
STOCHASTIC TREND EXAMPLE

Simulate data from ARIMA(0,1,1):
> x=arima.sim(list(order = c(0,1,1), ma = -0.7), n = 200)
> plot(x)
> acf(x)
> pacf(x)
AUTOREGRESSIVE INTEGRATED MOVING AVERAGE (ARIMA) PROCESSES

• Consider an ARIMA(p, d, q) process:

φ_p(B) (1 − B)^d Y_t = θ_0 + θ_q(B) a_t,

where φ_p(B) = 1 − φ_1 B − ⋯ − φ_p B^p and θ_q(B) = 1 − θ_1 B − ⋯ − θ_q B^q share no common roots, and a_t ~ WN(0, σ_a²).
ARIMA MODELS
• When d = 0, θ_0 is related to the mean of the process: θ_0 = μ(1 − φ_1 − ⋯ − φ_p).
• When d > 0, θ_0 is a deterministic trend term.
– Non-stationary in mean:

φ_p(B) (1 − B) Y_t = θ_0 + θ_q(B) a_t

– Non-stationary in level and slope:

φ_p(B) (1 − B)² Y_t = θ_0 + θ_q(B) a_t
RANDOM WALK PROCESS
• A random walk is defined as a process where the current value of a variable is composed of its past value plus an error term defined as white noise (a normal variable with zero mean and constant variance).
• ARIMA(0,1,0) PROCESS
Y_t = Y_{t−1} + a_t  ⇔  (1 − B) Y_t = a_t,  where a_t ~ WN(0, σ_a²).
![Page 55: STAT 497 LECTURE NOTES 4](https://reader035.fdocuments.us/reader035/viewer/2022062323/56816835550346895dddedb0/html5/thumbnails/55.jpg)
RANDOM WALK PROCESS
• Behavior of stock markets.
• Brownian motion.
• Movement of a drunken man.
• It is a limiting process of AR(1).
RANDOM WALK PROCESS

• The implication of a process of this type is that the best prediction of Y for the next period is its current value; in other words, the process does not allow us to predict the change (Y_t − Y_{t−1}). That is, the change in Y is absolutely random.
• It can be shown that the mean of a random walk process is constant but its variance is not. Therefore a random walk process is nonstationary, and its variance increases with t.
• In practice, the presence of a random walk process makes forecasting very simple, since the forecast of any future value Y_{t+s}, s > 0, is simply Y_t.
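The claim that the variance grows with t can be checked with a small Monte Carlo sketch (a hypothetical simulation with σ_a² = 1, so the theory gives Var(Y_t) = t):

```python
import random
import statistics

random.seed(1)
reps = 4000

def rw_value(t):
    # One draw of Y_t = Y_0 + a_1 + ... + a_t with Y_0 = 0, a_i ~ N(0, 1)
    return sum(random.gauss(0, 1) for _ in range(t))

var10 = statistics.pvariance([rw_value(10) for _ in range(reps)])
var100 = statistics.pvariance([rw_value(100) for _ in range(reps)])
print(round(var10, 2), round(var100, 2))  # close to the theoretical 10 and 100
```

The sample variances track the theoretical values Var(Y_t) = t σ_a², illustrating why the process is nonstationary in variance.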
RANDOM WALK WITH DRIFT
• Change in Y_t is partially deterministic and partially stochastic:

Y_t = θ_0 + Y_{t−1} + a_t,  i.e.  ΔY_t = θ_0 + a_t.

• It can also be written as

Y_t = Y_0 + tθ_0 + Σ_{i=1}^t a_i,
       (deterministic trend)  (stochastic trend)

a pure model of a trend (no stationary component).
RANDOM WALK WITH DRIFT
E(Y_t) = Y_0 + tθ_0

After t periods, the cumulative change in Y_t is tθ_0.

E(Y_{t+s} − Y_t | Y_t, Y_{t−1}, …) = sθ_0 ≠ 0  (not flat)
Each ai shock has a permanent effect on the mean of Yt.
ARIMA(0,1,1) OR IMA(1,1) PROCESS
• Consider the process

(1 − B) Y_t = (1 − θB) a_t,  where a_t ~ WN(0, σ_a²).

• Letting W_t = (1 − B) Y_t, we get

W_t = (1 − θB) a_t,  which is stationary.
ARIMA(0,1,1) OR IMA(1,1) PROCESS
• This process is characterized by the sample ACF of the original series failing to die out, while the sample ACF of the first-differenced series shows the pattern of an MA(1).
• Inverted form (IF):

Y_t = Σ_{j=1}^∞ π_j Y_{t−j} + a_t,  where π_j = (1 − θ)θ^(j−1),

so that

E(Y_t | Y_{t−1}, Y_{t−2}, …) = (1 − θ) Σ_{j=1}^∞ θ^(j−1) Y_{t−j}.

The π weights decrease exponentially: Y_t is a weighted MA of its past values.
ARIMA(0,1,1) OR IMA(1,1) PROCESS
E(Y_{t+1} | Y_t, Y_{t−1}, …) = (1 − θ) Y_t + θ E(Y_t | Y_{t−1}, Y_{t−2}, …),

where α = 1 − θ is the smoothing constant in the method of exponential smoothing.
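That recursion is exactly simple exponential smoothing with smoothing constant α = 1 − θ; a short sketch (the function name and data are hypothetical):

```python
def ses_forecasts(y, theta):
    # One-step forecasts for an IMA(1,1): the recursion
    # E(Y_{t+1} | Y_t, ...) = (1 - theta) * Y_t + theta * E(Y_t | Y_{t-1}, ...)
    # is simple exponential smoothing with alpha = 1 - theta.
    alpha = 1 - theta
    f = y[0]  # initialize the smoother at the first observation
    out = []
    for yt in y:
        f = alpha * yt + theta * f  # forecast of the next value
        out.append(f)
    return out

print(ses_forecasts([10.0, 12.0, 11.0, 13.0], theta=0.7))
```

With θ = 0.7, the forecast after observing 12.0 is 0.3·12.0 + 0.7·10.0 = 10.6: a geometrically weighted average of all past observations.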
REMOVING THE TREND

• Shocks to a stationary time series are temporary; over time, the series reverts to its long-run mean.
• A series containing a trend will not revert to a long-run mean. The usual methods for eliminating the trend are detrending and differencing.
DETRENDING
• Detrending is used to remove a deterministic trend.
• Regress Y_t on time and save the residuals.
• Then, check whether the residuals are stationary.
DIFFERENCING
• Differencing is used for removing the stochastic trend.
• The d-th difference of an ARIMA(p, d, q) model is stationary. A series containing unit roots can be made stationary by differencing.
• ARIMA(p, d, q) ⇔ d unit roots

Y_t ~ I(d): integrated of order d.
DIFFERENCING
• Random walk:

Y_t = Y_{t−1} + a_t   (non-stationary)
ΔY_t = a_t            (stationary)
DIFFERENCING
• Differencing always causes us to lose observations.
• 1st regular difference (d = 1):

(1 − B) Y_t = Y_t − Y_{t−1} = ΔY_t

• 2nd regular difference (d = 2):

(1 − B)² Y_t = (1 − 2B + B²) Y_t = Y_t − 2Y_{t−1} + Y_{t−2} = Δ²Y_t

(Y_t − Y_{t−2} is not the 2nd difference.)
DIFFERENCING
Y_t    ΔY_t       Δ²Y_t        Y_t − Y_{t−2}
3      *          *            *
8      8−3=5      *            *
5      5−8=−3     −3−5=−8      5−3=2
9      9−5=4      4−(−3)=7     9−8=1
KPSS TEST
• To test whether we have a deterministic trend vs. a stochastic trend, we use the KPSS (Kwiatkowski, Phillips, Schmidt and Shin, 1992) test.
H_0: Y_t ~ I(0)  (level or trend stationary)
H_1: Y_t ~ I(1)  (difference stationary)
KPSS TEST
STEP 1: Regress Yt on a constant and trend and construct the OLS residuals e = (e1, e2, …, en)′.
STEP 2: Obtain the partial sums of the residuals:
St = Σi=1..t ei
STEP 3: Obtain the test statistic
KPSS = n⁻² Σt=1..n St² / σ̂²
where σ̂² is the estimate of the long-run variance of the residuals.
72
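The three steps can be sketched directly in Python (the lecture itself uses R's kpss.test). This is a minimal illustration for the level-stationarity case, not a full implementation; the Newey–West/Bartlett estimator of the long-run variance is an assumed but standard choice.

```python
import numpy as np

def kpss_stat(y, lags=2):
    """Level-stationarity KPSS statistic -- a minimal sketch, not the full test."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    e = y - y.mean()          # STEP 1 residuals (level case: regress on a constant only)
    s = np.cumsum(e)          # STEP 2: partial sums S_t
    # Long-run variance via Newey-West with Bartlett weights (assumed choice)
    lrv = np.sum(e**2) / n
    for k in range(1, lags + 1):
        w = 1.0 - k / (lags + 1.0)
        lrv += 2.0 * w * np.sum(e[k:] * e[:-k]) / n
    return np.sum(s**2) / (n**2 * lrv)   # STEP 3
```

For the trend-stationarity version, the residuals in STEP 1 would come from a regression on a constant and a linear trend instead of the mean alone.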
![Page 73: STAT 497 LECTURE NOTES 4](https://reader035.fdocuments.us/reader035/viewer/2022062323/56816835550346895dddedb0/html5/thumbnails/73.jpg)
KPSS TEST
• STEP 4: Reject H0 when KPSS is large, because a large value is evidence that the series wanders away from its mean.
• Asymptotic distribution of the test statistic uses the standard Brownian bridge.
• It is a powerful test, but if there is a volatility shift it cannot catch this type of non-stationarity.
73
![Page 74: STAT 497 LECTURE NOTES 4](https://reader035.fdocuments.us/reader035/viewer/2022062323/56816835550346895dddedb0/html5/thumbnails/74.jpg)
DETERMINISTIC TREND EXAMPLE
> kpss.test(x, null = c("Level"))
        KPSS Test for Level Stationarity
data:  x
KPSS Level = 3.4175, Truncation lag parameter = 2, p-value = 0.01
Warning message:
In kpss.test(x, null = c("Level")) : p-value smaller than printed p-value
> kpss.test(x, null = c("Trend"))
        KPSS Test for Trend Stationarity
data:  x
KPSS Trend = 0.0435, Truncation lag parameter = 2, p-value = 0.1
Warning message:
In kpss.test(x, null = c("Trend")) : p-value greater than printed p-value
74
Here, level stationarity is rejected but trend stationarity is not: we have a deterministic trend, i.e. a trend-stationary process. Hence, we need de-trending to work with a stationary series.
![Page 75: STAT 497 LECTURE NOTES 4](https://reader035.fdocuments.us/reader035/viewer/2022062323/56816835550346895dddedb0/html5/thumbnails/75.jpg)
STOCHASTIC TREND EXAMPLE
> kpss.test(x, null = "Level")
        KPSS Test for Level Stationarity
data:  x
KPSS Level = 3.993, Truncation lag parameter = 3, p-value = 0.01
Warning message:
In kpss.test(x, null = "Level") : p-value smaller than printed p-value
> kpss.test(x, null = "Trend")
        KPSS Test for Trend Stationarity
data:  x
KPSS Trend = 0.6846, Truncation lag parameter = 3, p-value = 0.01
Warning message:
In kpss.test(x, null = "Trend") : p-value smaller than printed p-value
75
Here, both level and trend stationarity are rejected: we have a stochastic trend, i.e. a difference-stationary process. Hence, we need differencing to work with a stationary series.
![Page 76: STAT 497 LECTURE NOTES 4](https://reader035.fdocuments.us/reader035/viewer/2022062323/56816835550346895dddedb0/html5/thumbnails/76.jpg)
PROBLEM
• When an inappropriate method is used to eliminate the trend, we may create other problems like non-invertibility.
• E.g. Yt = α0 + α1t + xt, where φ(B)xt = θ(B)at and the roots of φ(B) = 0 and θ(B) = 0 are outside the unit circle (trend stationary).
76
![Page 77: STAT 497 LECTURE NOTES 4](https://reader035.fdocuments.us/reader035/viewer/2022062323/56816835550346895dddedb0/html5/thumbnails/77.jpg)
PROBLEM
• But if we misjudge the series as difference stationary, we take a difference when detrending should actually have been applied. The first difference is then:
77
(1 − B)Yt = α1 + (1 − B)xt
Now, we create a non-invertible unit root process in the MA component.
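A quick numerical illustration of this pitfall in Python (a hypothetical simulated series; the trend coefficients and seed are my choices): differencing a trend-stationary series Yt = α0 + α1t + at yields ∇Yt = α1 + at − at−1, an MA(1) with unit root θ = 1, whose lag-1 autocorrelation is −0.5.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
t = np.arange(n)
# Hypothetical trend-stationary series: deterministic line plus white noise
y = 2.0 + 0.5 * t + rng.standard_normal(n)

d = np.diff(y)  # differencing instead of the appropriate detrending
# d_t = 0.5 + a_t - a_{t-1} is a non-invertible MA(1); its lag-1
# autocorrelation is -0.5, the signature of over-differencing.
r1 = np.corrcoef(d[1:], d[:-1])[0, 1]
print(round(r1, 2))  # approximately -0.5
```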
![Page 78: STAT 497 LECTURE NOTES 4](https://reader035.fdocuments.us/reader035/viewer/2022062323/56816835550346895dddedb0/html5/thumbnails/78.jpg)
PROBLEM
• To detect this, look at the sample inverse autocorrelation function. If it shows the same pattern as the ACF of a non-stationary process (i.e., slow decay), this means that we over-differenced the series.
• Go back and de-trend the series instead of differencing.
• There are also smoothing filters to eliminate the trend (Decomposition Methods).
78
![Page 79: STAT 497 LECTURE NOTES 4](https://reader035.fdocuments.us/reader035/viewer/2022062323/56816835550346895dddedb0/html5/thumbnails/79.jpg)
NON-STATIONARITY IN VARIANCE
• Stationarity in mean does not imply stationarity in variance.
• Non-stationarity in mean implies non-stationarity in variance.
• If the mean function is time dependent:
1. The variance Var(Yt) is time dependent.
2. Var(Yt) is unbounded as t → ∞.
3. The autocovariance and autocorrelation functions are also time dependent.
4. If t is large w.r.t. Y0, then ρk → 1.
79
![Page 80: STAT 497 LECTURE NOTES 4](https://reader035.fdocuments.us/reader035/viewer/2022062323/56816835550346895dddedb0/html5/thumbnails/80.jpg)
VARIANCE STABILIZING TRANSFORMATION
• The variance of a non-stationary process changes as its level changes:
Var(Yt) = c · f(μt)
for some positive constant c and a function f.
• Find a function T so that the transformed series T(Yt) has a constant variance.
80
The Delta Method
![Page 81: STAT 497 LECTURE NOTES 4](https://reader035.fdocuments.us/reader035/viewer/2022062323/56816835550346895dddedb0/html5/thumbnails/81.jpg)
VARIANCE STABILIZING TRANSFORMATION
• Generally, we use the power transformation (Box and Cox, 1964):
T(Yt) = (Yt^λ − 1) / λ
81

| λ | Transformation |
|---|----------------|
| −1 | 1/Yt |
| −0.5 | 1/√Yt |
| 0 | ln Yt |
| 0.5 | √Yt |
| 1 | Yt (no transformation) |
![Page 82: STAT 497 LECTURE NOTES 4](https://reader035.fdocuments.us/reader035/viewer/2022062323/56816835550346895dddedb0/html5/thumbnails/82.jpg)
VARIANCE STABILIZING TRANSFORMATION
• Variance stabilizing transformations apply only to positive series. If your series has negative values, add a positive constant to every value so that the whole series becomes positive; then you can check whether a transformation is needed.
• It should be performed before any other analysis such as differencing.
• It not only stabilizes the variance but also improves the approximation of the distribution by the Normal distribution.
82
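The Box–Cox formula and table can be sketched in a few lines of Python (the helper name `box_cox` is hypothetical; the lecture itself uses R). For λ ≠ 0 the transform differs from the table entries only by location and scale constants, which do not affect variance stabilization.

```python
import numpy as np

def box_cox(y, lam):
    """Box-Cox power transform T(Y) = (Y^lam - 1)/lam, with the log limit at lam = 0."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        # the transform is only defined for positive series; shift first if needed
        raise ValueError("series must be strictly positive")
    if lam == 0:
        return np.log(y)        # lam = 0 corresponds to ln Yt
    return (y**lam - 1.0) / lam

print(box_cox([4.0], 0.5))   # (sqrt(4) - 1)/0.5 = [2.]
print(box_cox([1.0], 0.0))   # ln(1) = [0.]
```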
![Page 83: STAT 497 LECTURE NOTES 4](https://reader035.fdocuments.us/reader035/viewer/2022062323/56816835550346895dddedb0/html5/thumbnails/83.jpg)
TRANSFORMATION
install.packages("TSA")
library(TSA)
oil = ts(read.table('c:/oil.txt', header = T), start = 1996, frequency = 12)
BoxCox.ar(y = oil)
83