Seasonal Time Series Zishan Chen, Lijuan Kang, Qichao Sun,
Mengying Wang, Zihao Wang
Slide 2
Seasonality Zihao Wang
Slide 3
What is Seasonality? In statistics, many time series exhibit
cyclic variation known as seasonality, seasonal variation, periodic
variation, or periodic fluctuations. This variation can be either
regular or semi-regular. In short, it is as if the time series
repeats itself after a regular period of time. Seasonal variation is
a component of a time series, defined as the repetitive and
predictable movement around the trend line within one year or less.
It is detected by measuring the quantity of interest over small time
intervals, such as days, weeks, months, or quarters.
Slide 4
What is Seasonality? Seasonality appears in many industries. For
example, retail sales tend to peak during the Christmas season and
then decline after the holidays, so a time series of retail sales
will typically show increasing sales from September through December
and declining sales in January and February. Seasonality is quite
common in economic time series; it is less common in engineering and
scientific data.
Slide 5
What is Seasonality? Organizations affected by seasonal
variation need to identify and measure this seasonality to help
with planning for temporary increases or decreases in labor
requirements, inventory, training, periodic maintenance, and so
forth. Apart from these considerations, organizations need to know
whether the variation they have experienced has been more or less
than expected, given the usual seasonal variation.
Slide 6
Detecting Seasonality Time Plot
Slide 7
Detecting Seasonality Seasonal Subseries Plot monthplot() in
R
Slide 8
Detecting Seasonality Box Plot boxplot() in R
Slide 9
Detecting Seasonality The time plot is a recommended first step
for analyzing any time series. Although seasonality can sometimes
be indicated with this plot, seasonality is shown more clearly by
the seasonal subseries plot or the box plot. Furthermore, for large
data sets, the box plot is usually easier to read than the seasonal
subseries plot.
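As a brief illustration (a sketch using the built-in AirPassengers
monthly series rather than any data set from the slides), the three
detection plots can be produced in R as follows:
# Sketch: detecting seasonality in a monthly series (built-in AirPassengers data)
x <- log(AirPassengers)              # log transform to stabilise the seasonal amplitude
plot(x, ylab = "log passengers")     # time plot: first look at trend and seasonality
monthplot(x)                         # seasonal subseries plot: one subseries per month
boxplot(x ~ cycle(x),                # box plot of the observations grouped by month
        xlab = "Month", ylab = "log passengers")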
Slide 10
Detecting Seasonality Both the seasonal subseries plot and the
box plot assume that the seasonal periods are known. In most cases,
the analyst will in fact know this. For example, for monthly data,
the period is 12 since there are 12 months in a year. However, if
the period is not known, the autocorrelation plot can help. If
there is significant seasonality, the autocorrelation plot should
show spikes at lags that are multiples of the seasonal period. For
example, for monthly data, if there is a seasonal effect, we would
expect to see significant peaks at lags 12, 24, 36, and so on
(although the intensity may decrease the further out we go).
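A minimal sketch of this idea in R (again with AirPassengers standing
in for the data; for a monthly ts object the lag axis of acf() is in
years, so the seasonal spikes appear at 1, 2, 3, ...):
# Sketch: ACF spikes at multiples of the seasonal period
x <- log(AirPassengers)
acf(x, lag.max = 48)         # seasonal spikes at lags 12, 24, 36, 48 (shown as 1, 2, 3, 4 years)
acf(diff(x), lag.max = 48)   # removing the trend first usually makes the seasonal spikes clearer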
Slide 11
Detecting Seasonality ACF
Slide 12
Detecting Seasonality Periodogram
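A hedged sketch of a raw periodogram in base R; for a monthly ts
object the frequency axis is in cycles per year, so annual
seasonality shows up as a peak at frequency 1 and at its harmonics:
# Sketch: raw periodogram of a monthly series
x <- log(AirPassengers)
spec.pgram(x, taper = 0, log = "no")   # peaks near frequencies 1, 2, 3, ... cycles per year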
Slide 13
Seasonal Unit Roots The main advantage of seasonal unit root
tests is that they can be used when the data cannot be seasonally
adjusted, or as a pretest before seasonal adjustment. If a series
has seasonal unit roots, then the standard ADF test statistic does
not have the same distribution as for non-seasonal series.
Slide 14
The Dickey-Hasza-Fuller Test The first test for a seasonal unit
root was developed by Dickey, Hasza and Fuller (DHF) in 1984. It is
an extension of the well-known Dickey-Fuller procedure to seasonal
time series. Assuming that the process is SAR(1), y_t = ρ y_{t-s} + e_t,
the DHF test regression is Δ_s y_t = y_t - y_{t-s} = δ y_{t-s} + e_t,
with δ = ρ - 1. The null hypothesis H0: δ = 0 (a seasonal unit root,
ρ = 1) is tested against the alternative H1: δ < 0.
Slide 15
The Dickey-Hasza-Fuller Test After the OLS estimation, the test
statistic is obtained as the t-ratio t(δ̂) = δ̂ / se(δ̂). Again, the
asymptotic distribution of this test statistic is non-standard. The
critical values were obtained by Monte Carlo simulation for
different sample sizes and seasonal periods.
Slide 16
The Dickey-Hasza-Fuller Test The problem with the DHF test is
that, under the null hypothesis, one has exactly s unit roots,
while under the alternative one has no unit root at all. This is
very restrictive; since one may wish to test for specific seasonal
or non-seasonal unit roots, other tests have been developed.
Slide 17
HEGY Test The HEGY test was proposed by Hylleberg, Engle,
Granger, and Yoo (1990). It has the advantage of testing for a
seasonal unit root at each frequency separately; as a result it is
widely applied and is the most commonly used test.
Slide 18
HEGY Test The HEGY test for seasonal integration is conducted
by estimating the following regression (special case for quarterly
data):
Δ_4 y_t = Σ_{j=1..4} γ_j Q_{jt} + π_1 W_{1,t-1} + π_2 W_{2,t-1} + π_3 W_{3,t-2} + π_4 W_{3,t-1} + ε_t,
where Q_{jt} is a seasonal dummy and the W_{it} are the filtered series
W_{1t} = (1 + B + B^2 + B^3) y_t,
W_{2t} = -(1 - B + B^2 - B^3) y_t,
W_{3t} = -(1 - B^2) y_t,
with B the backshift operator.
Slide 19
HEGY Test After OLS estimation, tests are conducted for π_1 = 0,
for π_2 = 0, and a joint test of the hypothesis π_3 = π_4 = 0. The
HEGY test is a joint test for long-run (zero-frequency) unit roots
and seasonal unit roots. If none of the π_i are equal to zero, then
the series is stationary (at both the seasonal and the nonseasonal
frequencies).
Slide 20
HEGY Test Interpretation of the different π_i is as follows: 1.
If π_1 < 0, then there is no long-run (nonseasonal) unit root; π_1
is the coefficient on W_{1t} = S(B) y_t, which has had all of the
seasonal roots removed. 2. If π_2 < 0, then there is no semi-annual
unit root. 3. If π_3 and π_4 < 0, then there is no unit root in the
annual cycle.
Slide 21
HEGY Test Just as with the ADF test, it is important to ensure
that the residuals from the estimated HEGY equation are white
noise. The power of unit root tests is low; that is, it is not easy
to distinguish between genuine unit roots and near-unit roots. For
this reason, erroneously imposing a unit root is generally seen as
better than failing to impose one when one should.
Slide 22
OCSB Test Osborn, Chui, Smith and Birchenhall (1988) modified
the DHF test by using Δ_1 Δ_s y_t rather than Δ_s y_t as the
dependent variable. The overall F statistic is used to test for the
presence of all the seasonal unit roots, and the t statistic on the
nonseasonal term is used to test the null hypothesis that no
intermediate (nonseasonal) lag is involved in the data generation.
Slide 23
CH Test Canova and Hansen (1995) proposed this test. Unlike all
the seasonal unit root tests shown above, its null hypothesis is
that the process is stationary, while the alternative hypothesis
can be the presence of unit root(s) at a specific seasonal
frequency or at selected frequencies.
Slide 24
CH Test Canova and Hansen use the assumption that both the
process under investigation and the explanatory variables in the
null regression do not contain any non-stationary behavior at the
zero frequency.
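As an illustration only (the slides do not name a package): assuming
the uroot package is installed, it provides hegy.test() and ch.test()
implementations of the two tests discussed above.
# Sketch, assuming the uroot package is available
# install.packages("uroot")
library(uroot)
x <- log(AirPassengers)
hegy.test(x)   # HEGY: t-tests on the individual pi coefficients plus joint F-tests
ch.test(x)     # Canova-Hansen: null hypothesis of stationary (deterministic) seasonality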
Slide 25
Number of differences required for a stationary series (ndiffs)
ndiffs uses a unit root test to determine the number of differences
required for time series x to be made stationary. If test="kpss",
the KPSS test is used with the null hypothesis that x has a
stationary root against a unit-root alternative; the function then
returns the smallest number of differences required to pass the
test at level alpha. We can also change "kpss" to "adf" or "pp"
(Phillips-Perron); in both of these cases, the null hypothesis is
that x has a unit root against a stationary-root alternative.
ndiffs(x, alpha=0.05, test=c("kpss","adf","pp"))
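For example, a minimal sketch assuming the forecast package (which
provides ndiffs()):
# Sketch: number of ordinary differences suggested for a monthly series
library(forecast)
ndiffs(log(AirPassengers), alpha = 0.05, test = "kpss")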
Slide 26
Number of differences required for a stationary series
(nsdiffs) nsdiffs uses seasonal unit root tests to determine the
number of seasonal differences required for time series x to be
made stationary. If test="ch", the Canova-Hansen test is used (with
a null hypothesis of deterministic seasonality), and if test="ocsb",
the Osborn-Chui-Smith-Birchenhall test is used (with the null
hypothesis that a seasonal unit root exists).
nsdiffs(x, m=frequency(x), test=c("ocsb","ch"))
Slide 27
Number of differences required for a stationary series
(nsdiffs) After one difference, the seasonal time series becomes
stationary
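A minimal sketch of taking one seasonal difference in R (the slide's
own series is not shown, so AirPassengers is used as a stand-in):
# Sketch: one seasonal difference of a monthly series
x <- log(AirPassengers)
dx <- diff(x, lag = frequency(x))    # lag = 12 for monthly data
plot(dx, ylab = "Seasonally differenced series")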
Slide 28
Regression model with seasonal variables Zishan Chen
Slide 29
Dummy Variable Usually, a predictor takes numerical values.
When a predictor is a categorical variable taking only two values
(e.g., "yes" and "no"), the situation can still be handled within
the framework of regression models by creating a "dummy variable"
that takes the value 1 for "yes" and 0 for "no". A dummy variable
is also known as an "indicator variable". If there are more than
two categories, then the variable can be coded using several dummy
variables (one fewer than the total number of categories if an
intercept term is included in the model).
Slide 30
Seasonal Dummy Variable Deterministic seasonality S_t can be
written as a function of seasonal dummy variables. Let s be the
seasonal frequency (s = 4 for quarterly data, s = 12 for monthly
data), and let D_{1t}, D_{2t}, D_{3t}, ..., D_{st} be the seasonal
dummies: D_{1t} = 1 if period t falls in season 1 and D_{1t} = 0
otherwise; D_{2t} = 1 if period t falls in season 2 and D_{2t} = 0
otherwise; and so on. At any time period t, exactly one of the
seasonal dummies D_{1t}, D_{2t}, D_{3t}, ..., D_{st} equals 1, and
all the others equal 0.
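A hedged sketch of one way to build such dummies in R (the variable
names are illustrative, not taken from the slides):
# Sketch: seasonal dummy variables for a monthly series
x <- log(AirPassengers)
season <- factor(cycle(x))        # season index 1..12 for each observation
D <- model.matrix(~ season - 1)   # full set of s = 12 dummy columns
head(D)
# with an intercept in the regression, keep only s - 1 of them, e.g. model.matrix(~ season)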
Slide 31
Example of monthly seasonality
Slide 32
Example of weekly seasonality
Slide 33
Other types of seasonality Daily data; high-frequency data;
holiday effects (flower sales are big on Valentine's Day, Mother's
Day, and Easter, yet these days can move around in the calendar);
trading-day/business-day variation (one can divide by the number of
trading days, or include it as a regressor).
Slide 34
Regression Model
Slide 35
Interpreting Coefficients
Slide 36
Seasonal dummy variables and linear time trend
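A sketch of fitting this kind of model, assuming the forecast
package's tslm() convenience function (a plain lm() with hand-built
trend and dummy columns would work equally well):
# Sketch: regression on a linear trend plus seasonal dummies
library(forecast)
x <- log(AirPassengers)
fit <- tslm(x ~ trend + season)    # 'trend' and 'season' are constructed automatically by tslm()
summary(fit)
plot(x); lines(fitted(fit), col = 2)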
Slide 37
Seasonal + AR
Slide 38
Transformation
Slide 39
Seasonal Representation
Slide 40
Redundant But lagged seasonal dummies are redundant with the
original seasonal dummies: the set of lagged dummy variables is
collinear with the current dummy variables. Given that you know
this month is February, there is no information in knowing that
last month was January. The lagged dummies should therefore be
omitted.
Slide 41
AR(p) Case
Slide 42
Trend + Seasonal + AR(p)
Slide 43
Atmospheric CO2 Monthly Mean Concentrations
Slide 44
Fit the data with a dummy-variable model: model fit
Example 1. To get the regression function, first we need to
construct a set of harmonic regressors (sine and cosine terms) at
frequencies 1 to 6 (= s/2); see the sketch below.
2. Regression using all potential variables:
> coef(co2.lm1)/sqrt(diag(vcov(co2.lm1)))
> summary(co2.lm1)
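The assignments that create the harmonic regressors and co2.lm1 are
not shown in full above; the following is only a sketch of one common
construction (assuming the built-in co2 series, Mauna Loa monthly
CO2, which matches the slide title), not necessarily the exact code
used:
# Sketch: sine/cosine regressors at frequencies 1 to 6 (= s/2) for the monthly co2 series
SIN <- COS <- matrix(nrow = length(co2), ncol = 6)
for (i in 1:6) {
  SIN[, i] <- sin(2 * pi * i * time(co2))
  COS[, i] <- cos(2 * pi * i * time(co2))
}
TIME <- (time(co2) - mean(time(co2))) / sd(time(co2))   # standardised time for the trend terms
co2.lm1 <- lm(co2 ~ TIME + I(TIME^2) +
                COS[, 1] + SIN[, 1] + COS[, 2] + SIN[, 2] + COS[, 3] + SIN[, 3] +
                COS[, 4] + SIN[, 4] + COS[, 5] + SIN[, 5] + COS[, 6] + SIN[, 6])
coef(co2.lm1) / sqrt(diag(vcov(co2.lm1)))   # approximate t-ratios
summary(co2.lm1)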
Slide 56
3. Select significant variables. Rule 1: an approximate t-ratio
of magnitude 2 is a common cut-off; this t-ratio is obtained by
dividing the estimated coefficient by the standard error of the
estimate. Rule 2: check the p-value using a significance level of
0.05 (two-tailed).
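A small sketch of Rule 1 (not code from the slides), using the
t-ratios computed from co2.lm1 above:
# Sketch: keep regressors whose approximate |t-ratio| exceeds 2
t.ratio <- coef(co2.lm1) / sqrt(diag(vcov(co2.lm1)))
names(t.ratio)[abs(t.ratio) > 2]   # candidate terms to retain in the reduced model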
Slide 57
4. Regression with the selected variables (the reduced model is
stored as co2.lm2; see the sketch below):
> coef(co2.lm2)
> summary(co2.lm2)
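The assignment for co2.lm2 is likewise not shown in full; a sketch of
what the reduced fit might look like, using the regressors from the
earlier sketch (which harmonics actually survive depends on the
t-ratios above, so the terms below are purely illustrative):
# Sketch only: refit using the subset of terms judged significant
co2.lm2 <- lm(co2 ~ TIME + I(TIME^2) + COS[, 1] + SIN[, 1] + COS[, 2] + SIN[, 2])
coef(co2.lm2)
summary(co2.lm2)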
Slide 58
5. Check the regression model 5.1 Time plot of original data
and fitted values
Slide 59
5.2 Check the residuals (residual plot, ACF, and PACF). ADF test
(adf.test() from the tseries package) to check the stationarity of
the residuals:
> adf.test(resid(co2.lm2))
        Augmented Dickey-Fuller Test
data:  resid(co2.lm2)
Dickey-Fuller = -2.8827, Lag order = 8, p-value = 0.2047
alternative hypothesis: stationary
Slide 60
5.3 Modeling the error term
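The code that creates coerror is not shown; one plausible sketch,
given that an AR(3) model is examined next, is:
# Sketch: fit an AR(3) model to the regression residuals (coerror is the object name used on the slides)
coerror <- ar(resid(co2.lm2), order.max = 3, aic = FALSE)   # forces order 3; AIC selection is another option
coerror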
Slide 61
5.4 Check the AR(3) model by residual plot and ACF
> coerror
> plot(coerror$resid, ylab="Residuals", type="p"); abline(h=0)
> acf(coerror$resid[-(1:3)])
Slide 62
5.4 Check the AR(3) model by residual plot and ACF
Slide 63
6. The final model The fitted harmonic function, where e(t) is
white noise, together with the mean and standard deviation of the
time index t (used in the standardisation of TIME).