Seasonal Time Series Zishan Chen, Lijuan Kang, Qichao Sun,
Mengying Wang, Zihao Wang
Slide 2
Seasonality Zihao Wang
Slide 3
What is Seasonality? In statistics, many time series exhibit
cyclic variation known as seasonality, seasonal variation, periodic
variation, or periodic fluctuations. This variation can be either
regular or semi-regular. In short, it is as if the time series
repeats itself after a regular period of time. Seasonal variation is
a component of a time series, defined as the repetitive and
predictable movement around the trend line within one year or less.
It is detected by measuring the quantity of interest over small time
intervals, such as days, weeks, months, or quarters.
Slide 4
What is Seasonality? Seasonality appears in many industries. For
example, retail sales tend to peak during the Christmas season and
then decline after the holidays, so a time series of retail sales
will typically show increasing sales from September through December
and declining sales in January and February. Seasonality is quite
common in economic time series; it is less common in engineering and
scientific data.
Slide 5
What is Seasonality? Organizations affected by seasonal
variation need to identify and measure this seasonality to help
with planning for temporary increases or decreases in labor
requirements, inventory, training, periodic maintenance, and so
forth. Apart from these considerations, organizations need to know
whether the variation they have experienced has been more or less
than expected, given the usual seasonal variation.
Slide 6
Detecting Seasonality Time Plot
Slide 7
Detecting Seasonality Seasonal Subseries Plot monthplot() in
R
Slide 8
Detecting Seasonality Box Plot boxplot() in R
Slide 9
Detecting Seasonality The time plot is a recommended first step
for analyzing any time series. Although seasonality can sometimes
be indicated with this plot, seasonality is shown more clearly by
the seasonal subseries plot or the box plot. Furthermore, for large
data sets, the box plot is usually easier to read than the seasonal
subseries plot.
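As a brief illustration (a sketch using the built-in AirPassengers
monthly series rather than any data set from the slides), the three
detection plots can be produced in R as follows:
# Sketch: detecting seasonality in a monthly series (built-in AirPassengers data)
x <- log(AirPassengers)              # log transform to stabilise the seasonal amplitude
plot(x, ylab = "log passengers")     # time plot: first look at trend and seasonality
monthplot(x)                         # seasonal subseries plot: one subseries per month
boxplot(x ~ cycle(x),                # box plot of the observations grouped by month
        xlab = "Month", ylab = "log passengers")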
Slide 10
Detecting Seasonality Both the seasonal subseries plot and the
box plot assume that the seasonal periods are known. In most cases,
the analyst will in fact know this. For example, for monthly data,
the period is 12 since there are 12 months in a year. However, if
the period is not known, the autocorrelation plot can help. If
there is significant seasonality, the autocorrelation plot should
show spikes at lags that are multiples of the seasonal period. For
example, for monthly data, if there is a seasonal effect, we would
expect to see significant peaks at lags 12, 24, 36, and so on
(although the intensity may decrease the further out we go).
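A minimal sketch of this idea in R (again with AirPassengers standing
in for the data; for a monthly ts object the lag axis of acf() is in
years, so the seasonal spikes appear at 1, 2, 3, ...):
# Sketch: ACF spikes at multiples of the seasonal period
x <- log(AirPassengers)
acf(x, lag.max = 48)         # seasonal spikes at lags 12, 24, 36, 48 (shown as 1, 2, 3, 4 years)
acf(diff(x), lag.max = 48)   # removing the trend first usually makes the seasonal spikes clearer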
Slide 11
Detecting Seasonality ACF
Slide 12
Detecting Seasonality Periodogram
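A hedged sketch of a raw periodogram in base R; for a monthly ts
object the frequency axis is in cycles per year, so annual
seasonality shows up as a peak at frequency 1 and at its harmonics:
# Sketch: raw periodogram of a monthly series
x <- log(AirPassengers)
spec.pgram(x, taper = 0, log = "no")   # peaks near frequencies 1, 2, 3, ... cycles per year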
Slide 13
Seasonal Unit Roots The main advantage of seasonal unit root
tests is that they can be used when the data cannot be seasonally
adjusted, or as a pretest before seasonal adjustment. If a series
has seasonal unit roots, then the standard ADF test statistic does
not have the same distribution as for non-seasonal series.
Slide 14
The Dickey-Hasza-Fuller Test The first test for a seasonal unit
root was developed by Dickey, Hasza and Fuller (DHF) in 1984. It is
an extension of the well-known Dickey-Fuller procedure to seasonal
time series. Assuming that the process is SAR(1), y_t = ρ y_{t-s} + e_t,
the DHF test regression is Δ_s y_t = y_t - y_{t-s} = δ y_{t-s} + e_t,
with δ = ρ - 1. The null hypothesis H0: δ = 0 (a seasonal unit root,
ρ = 1) is tested against the alternative H1: δ < 0.
Slide 15
The Dickey-Hasza-Fuller Test After the OLS estimation, the test
statistic is obtained as the t-ratio t(δ̂) = δ̂ / se(δ̂). Again, the
asymptotic distribution of this test statistic is non-standard. The
critical values were obtained by Monte Carlo simulation for
different sample sizes and seasonal periods.
Slide 16
The Dickey-Hasza-Fuller Test The problem with the DHF test is
that, under the null hypothesis, one has exactly s unit roots,
while under the alternative one has no unit root at all. This is
very restrictive; since one may wish to test for specific seasonal
or non-seasonal unit roots, other tests have been developed.
Slide 17
HEGY Test The HEGY test was proposed by Hylleberg, Engle,
Granger, and Yoo (1990). It has the advantage of testing for a
seasonal unit root at each frequency separately; as a result it is
widely applied and is the most commonly used test.
Slide 18
HEGY Test The HEGY test for seasonal integration is conducted
by estimating the following regression (special case for quarterly
data):
Δ_4 y_t = Σ_{j=1..4} γ_j Q_{jt} + π_1 W_{1,t-1} + π_2 W_{2,t-1} + π_3 W_{3,t-2} + π_4 W_{3,t-1} + ε_t,
where Q_{jt} is a seasonal dummy and the W_{it} are the filtered series
W_{1t} = (1 + B + B^2 + B^3) y_t,
W_{2t} = -(1 - B + B^2 - B^3) y_t,
W_{3t} = -(1 - B^2) y_t,
with B the backshift operator.
Slide 19
HEGY Test After OLS estimation, tests are conducted for π_1 = 0,
for π_2 = 0, and a joint test of the hypothesis π_3 = π_4 = 0. The
HEGY test is a joint test for long-run (zero-frequency) unit roots
and seasonal unit roots. If none of the π_i are equal to zero, then
the series is stationary (at both the seasonal and the nonseasonal
frequencies).
Slide 20
HEGY Test Interpretation of the different π_i is as follows: 1.
If π_1 < 0, then there is no long-run (nonseasonal) unit root; π_1
is the coefficient on W_{1t} = S(B) y_t, which has had all of the
seasonal roots removed. 2. If π_2 < 0, then there is no semi-annual
unit root. 3. If π_3 and π_4 < 0, then there is no unit root in the
annual cycle.
Slide 21
HEGY Test Just as with the ADF test, it is important to ensure
that the residuals from the estimated HEGY equation are white
noise. The power of unit root tests is low; that is, it is not easy
to distinguish between genuine unit roots and near-unit roots. For
this reason, erroneously imposing a unit root is generally seen as
better than failing to impose one when one should.
Slide 22
OCSB Test Osborn, Chui, Smith and Birchenhall (1988) modified
the DHF test by using Δ_1 Δ_s y_t rather than Δ_s y_t as the
dependent variable. The overall F statistic is used to test for the
presence of all the seasonal unit roots, and the t statistic on the
nonseasonal term is used to test the null hypothesis that no
intermediate (nonseasonal) lag is involved in the data generation.
Slide 23
CH Test Canova and Hansen (1995) proposed this test. Unlike all
the seasonal unit root tests shown above, its null hypothesis is
that the process is stationary, while the alternative hypothesis
can be the presence of unit root(s) at a specific seasonal
frequency or at selected frequencies.
Slide 24
CH Test Canova and Hansen use the assumption that both the
process under investigation and the explanatory variables in the
null regression do not contain any non-stationary behavior at the
zero frequency.
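As an illustration only (the slides do not name a package): assuming
the uroot package is installed, it provides hegy.test() and ch.test()
implementations of the two tests discussed above.
# Sketch, assuming the uroot package is available
# install.packages("uroot")
library(uroot)
x <- log(AirPassengers)
hegy.test(x)   # HEGY: t-tests on the individual pi coefficients plus joint F-tests
ch.test(x)     # Canova-Hansen: null hypothesis of stationary (deterministic) seasonality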
Slide 25
Number of differences required for a stationary series (ndiffs)
ndiffs uses a unit root test to determine the number of differences
required for time series x to be made stationary. If test="kpss",
the KPSS test is used with the null hypothesis that x has a
stationary root against a unit-root alternative; the function then
returns the smallest number of differences required to pass the
test at level alpha. We can also change "kpss" to "adf" or "pp"
(Phillips-Perron); in both of these cases, the null hypothesis is
that x has a unit root against a stationary-root alternative.
ndiffs(x, alpha=0.05, test=c("kpss","adf","pp"))
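For example, a minimal sketch assuming the forecast package (which
provides ndiffs()):
# Sketch: number of ordinary differences suggested for a monthly series
library(forecast)
ndiffs(log(AirPassengers), alpha = 0.05, test = "kpss")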
Slide 26
Number of differences required for a stationary series
(nsdiffs) nsdiffs uses seasonal unit root tests to determine the
number of seasonal differences required for time series x to be
made stationary. If test="ch", the Canova-Hansen test is used (with
a null hypothesis of deterministic seasonality), and if test="ocsb",
the Osborn-Chui-Smith-Birchenhall test is used (with the null
hypothesis that a seasonal unit root exists).
nsdiffs(x, m=frequency(x), test=c("ocsb","ch"))
Slide 27
Number of differences required for a stationary series
(nsdiffs) After one difference, the seasonal time series becomes
stationary
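A minimal sketch of taking one seasonal difference in R (the slide's
own series is not shown, so AirPassengers is used as a stand-in):
# Sketch: one seasonal difference of a monthly series
x <- log(AirPassengers)
dx <- diff(x, lag = frequency(x))    # lag = 12 for monthly data
plot(dx, ylab = "Seasonally differenced series")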
Slide 28
Regression model with seasonal variables Zishan Chen
Slide 29
Dummy Variable Usually, a predictor takes numerical values.
When a predictor is a categorical variable taking only two values
(e.g., "yes" and "no"), the situation can still be handled within
the framework of regression models by creating a "dummy variable"
that takes the value 1 for "yes" and 0 for "no". A dummy variable
is also known as an "indicator variable". If there are more than
two categories, then the variable can be coded using several dummy
variables (one fewer than the total number of categories if an
intercept term is included in the model).
Slide 30
Seasonal Dummy Variable Deterministic seasonality S_t can be
written as a function of seasonal dummy variables. Let s be the
seasonal frequency (s = 4 for quarterly data, s = 12 for monthly
data), and let D_{1t}, D_{2t}, D_{3t}, ..., D_{st} be the seasonal
dummies: D_{1t} = 1 if period t falls in season 1 and D_{1t} = 0
otherwise; D_{2t} = 1 if period t falls in season 2 and D_{2t} = 0
otherwise; and so on. At any time period t, exactly one of the
seasonal dummies D_{1t}, D_{2t}, D_{3t}, ..., D_{st} equals 1, and
all the others equal 0.
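A hedged sketch of one way to build such dummies in R (the variable
names are illustrative, not taken from the slides):
# Sketch: seasonal dummy variables for a monthly series
x <- log(AirPassengers)
season <- factor(cycle(x))        # season index 1..12 for each observation
D <- model.matrix(~ season - 1)   # full set of s = 12 dummy columns
head(D)
# with an intercept in the regression, keep only s - 1 of them, e.g. model.matrix(~ season)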
Slide 31
Example of monthly seasonality
Slide 32
Example of weekly seasonality
Slide 33
Other types of seasonality Daily data; high-frequency data;
holiday effects (flower sales are big on Valentine's Day, Mother's
Day, and Easter, yet these days can move around in the calendar);
trading-day/business-day variation (one can divide by the number of
trading days, or include it as a regressor).
Slide 34
Regression Model
Slide 35
Interpreting Coefficients
Slide 36
Seasonal dummy variables and linear time trend
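A sketch of fitting this kind of model, assuming the forecast
package's tslm() convenience function (a plain lm() with hand-built
trend and dummy columns would work equally well):
# Sketch: regression on a linear trend plus seasonal dummies
library(forecast)
x <- log(AirPassengers)
fit <- tslm(x ~ trend + season)    # 'trend' and 'season' are constructed automatically by tslm()
summary(fit)
plot(x); lines(fitted(fit), col = 2)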
Slide 37
Seasonal + AR
Slide 38
Transformation
Slide 39
Seasonal Representation
Slide 40
Redundant But lagged seasonal dummies are redundant with the
original seasonal dummies: the set of lagged dummy variables is
collinear with the current dummy variables. Given that you know
this month is February, there is no information in knowing that
last month was January. The lagged dummies should therefore be
omitted.
Slide 41
AR(p) Case
Slide 42
Trend + Seasonal + AR(p)
Slide 43
Atmospheric CO2 Monthly Mean Concentrations
Slide 44
Fit the data with a dummy-variable model: model fit
Example 1. To get the regression function, first we need to
construct a set of harmonic regressors (sine and cosine terms) at
frequencies 1 to 6 (= s/2); see the sketch below.
2. Regression using all potential variables:
> coef(co2.lm1)/sqrt(diag(vcov(co2.lm1)))
> summary(co2.lm1)
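The assignments that create the harmonic regressors and co2.lm1 are
not shown in full above; the following is only a sketch of one common
construction (assuming the built-in co2 series, Mauna Loa monthly
CO2, which matches the slide title), not necessarily the exact code
used:
# Sketch: sine/cosine regressors at frequencies 1 to 6 (= s/2) for the monthly co2 series
SIN <- COS <- matrix(nrow = length(co2), ncol = 6)
for (i in 1:6) {
  SIN[, i] <- sin(2 * pi * i * time(co2))
  COS[, i] <- cos(2 * pi * i * time(co2))
}
TIME <- (time(co2) - mean(time(co2))) / sd(time(co2))   # standardised time for the trend terms
co2.lm1 <- lm(co2 ~ TIME + I(TIME^2) +
                COS[, 1] + SIN[, 1] + COS[, 2] + SIN[, 2] + COS[, 3] + SIN[, 3] +
                COS[, 4] + SIN[, 4] + COS[, 5] + SIN[, 5] + COS[, 6] + SIN[, 6])
coef(co2.lm1) / sqrt(diag(vcov(co2.lm1)))   # approximate t-ratios
summary(co2.lm1)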
Slide 56
3. Select significant variables. Rule 1: an approximate t-ratio
of magnitude 2 is a common cut-off; this t-ratio is obtained by
dividing the estimated coefficient by the standard error of the
estimate. Rule 2: check the p-value using a significance level of
0.05 (two-tailed).
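A small sketch of Rule 1 (not code from the slides), using the
t-ratios computed from co2.lm1 above:
# Sketch: keep regressors whose approximate |t-ratio| exceeds 2
t.ratio <- coef(co2.lm1) / sqrt(diag(vcov(co2.lm1)))
names(t.ratio)[abs(t.ratio) > 2]   # candidate terms to retain in the reduced model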
Slide 57
4. Regression with the selected variables (the reduced model is
stored as co2.lm2; see the sketch below):
> coef(co2.lm2)
> summary(co2.lm2)
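The assignment for co2.lm2 is likewise not shown in full; a sketch of
what the reduced fit might look like, using the regressors from the
earlier sketch (which harmonics actually survive depends on the
t-ratios above, so the terms below are purely illustrative):
# Sketch only: refit using the subset of terms judged significant
co2.lm2 <- lm(co2 ~ TIME + I(TIME^2) + COS[, 1] + SIN[, 1] + COS[, 2] + SIN[, 2])
coef(co2.lm2)
summary(co2.lm2)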
Slide 58
5. Check the regression model 5.1 Time plot of original data
and fitted values
Slide 59
5.2 Check the residuals (residual plot, ACF, and PACF). ADF test
(adf.test() from the tseries package) to check the stationarity of
the residuals:
> adf.test(resid(co2.lm2))
        Augmented Dickey-Fuller Test
data:  resid(co2.lm2)
Dickey-Fuller = -2.8827, Lag order = 8, p-value = 0.2047
alternative hypothesis: stationary
Slide 60
5.3 Modeling the error term
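The code that creates coerror is not shown; one plausible sketch,
given that an AR(3) model is examined next, is:
# Sketch: fit an AR(3) model to the regression residuals (coerror is the object name used on the slides)
coerror <- ar(resid(co2.lm2), order.max = 3, aic = FALSE)   # forces order 3; AIC selection is another option
coerror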
Slide 61
5.4 Check the AR(3) model by residual plot and ACF
> coerror
> plot(coerror$resid, ylab="Residuals", type="p"); abline(h=0)
> acf(coerror$resid[-(1:3)])
Slide 62
5.4 Check the AR(3) model by residual plot and ACF
Slide 63
6. The final model The fitted harmonic function, where e(t) is
white noise, together with the mean and standard deviation of the
time index t (used in the standardisation of TIME).