
Time Series Analysis with R

Jinseog Kim

Department of Statistics and Information Science

Dongguk University

E-mail:[email protected]


1 Time series analysis in R

1.1 R packages for time series analysis

• stats: The R Stats Package (part of base R)

• datasets: The R Datasets Package

• forecast: Forecasting Functions for Time Series and Linear Models

• tseries: Time series analysis and computational finance

• fma: Data sets from "Forecasting: methods and applications" by Makridakis, Wheelwright & Hyndman (1998)


> library(datasets)
> class(LakeHuron)
[1] "ts"
> plot(LakeHuron, ylab = "Lake level (in feet)", xlab = "Year")
> lag.plot(LakeHuron, lag = 4, diag.col = "forest green", do.lines = FALSE)
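Each lag.plot panel plots $X_t$ against $X_{t-k}$, so the tighter the cloud around a line, the stronger the lag-$k$ correlation. A minimal sketch that puts numbers on the four panels, assuming plain Pearson correlation of the plotted pairs (the name `lag_cor` is illustrative), and compares them with the sample ACF from `acf()`, which uses a slightly different estimator (one overall mean, divisor $n$):

```r
library(datasets)

x <- as.numeric(LakeHuron)
n <- length(x)

# Pearson correlation of the (X_t, X_{t-k}) pairs drawn in each lag.plot panel
lag_cor <- sapply(1:4, function(k) cor(x[(k + 1):n], x[1:(n - k)]))

# Sample autocorrelations at the same lags (element 1 is lag 0, so drop it)
acf_vals <- acf(LakeHuron, lag.max = 4, plot = FALSE)$acf[2:5]

round(rbind(lag_cor, acf_vals), 3)
```

The two rows should be close but not identical; both shrink as the lag grows, matching the increasingly diffuse panels.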


Figure 1: lag.plot (LakeHuron against its own lags 1 to 4; both axes show the lake level, about 576 to 582 feet)


WWWusage (Internet Usage per Minute): a time series of the number of users connected to the Internet through a server, recorded every minute.

> work <- diff(WWWusage)    # first differences
> par(mfrow = c(2, 1))      # two plots stacked vertically
> plot(WWWusage)
> plot(work)
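`diff()` computes the first difference $X_t - X_{t-1}$, which removes a slowly varying level and shortens the series by one observation. A small check, assuming only base R, that reproduces `diff(WWWusage)` by hand (the name `manual` is illustrative):

```r
x <- WWWusage
work <- diff(x)                       # first difference: X_t - X_{t-1}
manual <- x[-1] - x[-length(x)]       # the same subtraction done by hand

length(x)      # 100 observations
length(work)   # 99: the first observation has no predecessor
isTRUE(all.equal(as.numeric(work), as.numeric(manual)))
```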


lynx: Annual number of lynx trapped in the McKenzie River district of north-west Canada, 1821–1934.

# plot(lynx)
# tsdisplay(lynx)     # tsdisplay() comes from the forecast package
dlynx <- diff(lynx)   # first differences of the lynx series
par(mfrow = c(2, 1))
ts.plot(lynx)
plot(dlynx)
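Differencing loses only the starting value, so it can be undone. A sketch assuming `stats::diffinv()` with `xi` set to the first observation; the name `rebuilt` is illustrative:

```r
dlynx <- diff(lynx)

# Cumulative sums of the differences, anchored at the first observation
rebuilt <- diffinv(dlynx, xi = lynx[1])

tsp(rebuilt)   # same start, end and frequency as tsp(lynx)
isTRUE(all.equal(as.numeric(rebuilt), as.numeric(lynx)))
```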


Figure 2: lynx plot (top: lynx; bottom: the differenced series dlynx; both against time)


Random walk: simulated data

rw <- rbinom(100, 1, prob = 0.5) * 2 - 1   # steps of -1 or +1 with equal probability
rw <- cumsum(rw)
par(mfrow = c(2, 1))
ts.plot(rw, main = "random walk from independent Bernoulli distribution")
rw <- rnorm(100)                           # standard normal steps
rw <- cumsum(rw)
ts.plot(rw, main = "random walk from independent normal distribution")


(Plot: two panels, "random walk from independent Bernoulli distribution" and "random walk from independent normal distribution", each over 100 time steps.)


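rw <- matrix(ncol = 10, nrow = 100)   # 10 paths of length 100
for (i in 1:10) {
  x <- rnorm(100)
  rw[, i] <- cumsum(x)
}
# plot once, after all paths are filled; one colour per path
ts.plot(rw, main = "random walk from independent normal distribution", col = 1:10)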
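The loop fills one column per path. Because `rnorm()` draws numbers sequentially, the same paths can be generated in one vectorised call. A sketch, assuming a fixed seed (42, chosen arbitrarily) so the two versions can be compared; `rw_loop` and `rw_vec` are illustrative names:

```r
# Loop version: one column per path
set.seed(42)
rw_loop <- matrix(ncol = 10, nrow = 100)
for (i in 1:10) {
  rw_loop[, i] <- cumsum(rnorm(100))
}

# Vectorised version: draw all 1000 steps at once, then cumulate each column
set.seed(42)
rw_vec <- apply(matrix(rnorm(100 * 10), nrow = 100), 2, cumsum)

identical(rw_loop, rw_vec)   # same draws in the same column order
ts.plot(rw_vec, col = 1:10)
```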


(Plot: "random walk from independent normal distribution", 10 simulated paths of 100 steps each.)


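filter: Linear filter

filter(x, filter, method = c("convolution", "recursive"), sides = 2, circular = FALSE, init)

With the default method = "convolution", each output is a weighted sum of inputs, $y_t = f_1 x_{t+o} + \cdots + f_p x_{t+o-(p-1)}$, where the offset $o$ is 0 for sides = 1 and centres the filter for sides = 2. With method = "recursive", the output is

$y_t = x_t + f_1 y_{t-1} + \cdots + f_p y_{t-p}$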

plot(dlynx)
lines(filter(dlynx, rep(1, 6)) / 6, col = 2)    # 6-term moving average (red)
lines(filter(dlynx, rep(1, 10)) / 10, col = 3)  # 10-term moving average (green)
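To see what the two `method` options compute, a toy example on short integer series (the inputs `1:10`, `1:5` and the coefficient 0.5 are arbitrary choices for illustration). `stats::filter` is written out explicitly because packages such as dplyr mask `filter`:

```r
# Convolution, sides = 2: a 3-term centred moving average
x <- 1:10
ma3 <- stats::filter(x, rep(1, 3)) / 3
as.numeric(ma3)   # NA 2 3 4 5 6 7 8 9 NA (no full window at either end)

# Recursive: y_t = x_t + 0.5 * y_{t-1}, starting from y_0 = 0
rec <- stats::filter(1:5, 0.5, method = "recursive")
as.numeric(rec)   # 1, 2.5, 4.25, 6.125, 8.0625
```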


Figure 3: linear filter (dlynx with 6-term and 10-term moving averages overlaid)


1.2 Functions for fitting time series models

• ar: Fit Autoregressive Models to Time Series

• arima: ARIMA Modelling of Time Series
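A minimal sketch of the two fitting functions on the LakeHuron series. `ar()` picks the order by AIC; the AR order 2 passed to `arima()` is an illustrative assumption, not a recommendation:

```r
# AR model with order chosen by AIC (Yule-Walker estimation by default)
fit_ar <- ar(LakeHuron)
fit_ar$order
fit_ar$ar

# The same data as an ARIMA(2, 0, 0) model fitted by maximum likelihood
fit_arima <- arima(LakeHuron, order = c(2, 0, 0))
coef(fit_arima)   # ar1, ar2 and the mean (reported as "intercept")
fit_arima$aic
```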

1.3 Functions for smoothing or filtering time series data

• tsSmooth: Use Fixed-Interval Smoothing

• filter: Linear Filtering on a Time Series

• HoltWinters: Holt-Winters Filtering


• KalmanLike: Kalman Filtering

• KalmanRun: Kalman Filtering

• KalmanSmooth: Kalman Filtering
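`HoltWinters()` with `beta = FALSE` and `gamma = FALSE` reduces to simple exponential smoothing, where the smoothed level is updated as $\ell_t = \alpha x_t + (1-\alpha)\ell_{t-1}$. A sketch on the built-in Nile series, chosen here only as an example of a non-seasonal series:

```r
# Simple exponential smoothing: no trend (beta = FALSE), no seasonality (gamma = FALSE)
hw <- HoltWinters(Nile, beta = FALSE, gamma = FALSE)
hw$alpha   # smoothing weight chosen by minimising squared one-step errors
hw$SSE     # sum of squared one-step-ahead errors at that alpha
plot(hw)   # observed series (black) with the smoothed fit (red)
```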

1.4 Functions for forecasting from fitted time series models

• predict.ar: Fit Autoregressive Models to Time Series

• predict.Arima: Forecast from ARIMA fits

• predict.arima0: ARIMA Modelling of Time Series - Preliminary Version


• predict.HoltWinters: prediction function for fitted Holt-Winters models

• KalmanForecast: Kalman Filtering
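`predict()` dispatches to `predict.Arima` for models fitted with `arima()`. A sketch, again assuming an AR(2) fit to LakeHuron purely for illustration, that forecasts five years ahead and forms approximate 95% bands as prediction ± 1.96 standard errors (a normal approximation):

```r
fit <- arima(LakeHuron, order = c(2, 0, 0))
fc <- predict(fit, n.ahead = 5)

fc$pred   # point forecasts, continuing the time base of LakeHuron
fc$se     # forecast standard errors, non-decreasing with the horizon

# Approximate 95% prediction bands
cbind(lower = fc$pred - 1.96 * fc$se, upper = fc$pred + 1.96 * fc$se)
```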

1.5 Functions for exploratory analysis of time series data

• lag: Lag a Time Series

• acf: Auto-Correlation Function Estimation

• ccf: Cross Correlation Function Estimation

• pacf: Partial Autocorrelation Function Estimation
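For an AR(1) process with $|\phi| < 1$, the ACF decays geometrically ($\rho_h = \phi^h$) while the PACF is zero beyond lag 1. A sketch comparing the theoretical values from `ARMAacf()` with sample estimates from a simulated series; $\phi = 0.6$, the seed and $n = 500$ are arbitrary assumptions:

```r
phi <- 0.6

# Theoretical values for X_t = 0.6 X_{t-1} + a_t
rho <- ARMAacf(ar = phi, lag.max = 4)                        # lags 0..4
pacf_theory <- ARMAacf(ar = phi, lag.max = 4, pacf = TRUE)   # lags 1..4
round(rho, 4)
round(pacf_theory, 4)

# Sample estimates from a simulated series
set.seed(123)
x <- arima.sim(model = list(ar = phi), n = 500)
acf(x, lag.max = 4, plot = FALSE)
pacf(x, lag.max = 4, plot = FALSE)
```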


1.6 Functions for exploratory analysis of time series data (plots)

• lag.plot: Time Series Lag Plots

• ts.plot: Plot Multiple Time Series

• tsdiag: Diagnostic Plots for Time-Series Fits

• plot.acf: Plot Autocovariance and Autocorrelation Functions

1.7 Functions for statistical tests in time series analysis

• Box.test: Box-Pierce and Ljung-Box Tests

• PP.test: Phillips-Perron Test for Unit Roots
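A sketch contrasting the two tests on simulated white noise and a simulated random walk (the seed and length 200 are arbitrary). Box.test checks for autocorrelation up to the given lag; PP.test has a unit root as its null hypothesis, so a large p-value means the unit root is not rejected:

```r
set.seed(2)
wn <- rnorm(200)             # white noise
rw <- cumsum(rnorm(200))     # random walk (has a unit root)

# Ljung-Box: H0 = no autocorrelation up to lag 10
p_box_wn <- Box.test(wn, lag = 10, type = "Ljung-Box")$p.value   # usually not small
p_box_rw <- Box.test(rw, lag = 10, type = "Ljung-Box")$p.value   # essentially 0

# Phillips-Perron: H0 = unit root
p_pp_rw <- PP.test(rw)$p.value   # usually large: unit root not rejected
p_pp_wn <- PP.test(wn)$p.value   # small: unit root rejected

c(box_wn = p_box_wn, box_rw = p_box_rw, pp_rw = p_pp_rw, pp_wn = p_pp_wn)
```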


1.8 Functions for creating and handling time series objects

• ts: Time-Series Objects

• ts.intersect: Bind Two or More Time Series

• ts.plot: Plot Multiple Time Series

• ts.union: Bind Two or More Time Series

• tsp: Tsp Attribute of Time-Series-like Objects
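A sketch of these helpers on a hypothetical monthly series `x` (values 1 to 24 starting January 2020, made up for illustration). `stats::lag(x, -1)` shifts the time base so that its value at time $t$ is $x_{t-1}$; `ts.intersect` keeps only the common times and `ts.union` pads with `NA`:

```r
x <- ts(1:24, start = c(2020, 1), frequency = 12)   # Jan 2020 to Dec 2021
tsp(x)                        # start, end, frequency

x_lag <- stats::lag(x, -1)    # runs Feb 2020 to Jan 2022

both <- ts.intersect(x, x_lag)   # Feb 2020 to Dec 2021
un <- ts.union(x, x_lag)         # Jan 2020 to Jan 2022, NA where missing
dim(both)   # 23 rows, 2 columns
dim(un)     # 25 rows, 2 columns
both[1, ]   # x = 2, x_lag = 1

window(x, start = c(2021, 1), end = c(2021, 3))   # Jan to Mar 2021
```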


2 Time series models

Notation: a time series is modeled as a stochastic process

$\{X_t\}$, where $t$ denotes time, $t \ge 0$.

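Moving average (MA) models: let $a_t \sim \text{iid}(0, \sigma^2)$ and consider a model of the form

$X_t = a_t + \phi a_{t-1} + \phi^2 a_{t-2} + \phi^3 a_{t-3} + \cdots$

When $|\phi| < 1$ (so that the infinite sum converges), this process can be written equivalently as

$X_t = \phi X_{t-1} + a_t.$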
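To see why the two forms agree, subtract $\phi X_{t-1}$ from $X_t$: $(a_t + \phi a_{t-1} + \phi^2 a_{t-2} + \cdots) - \phi(a_{t-1} + \phi a_{t-2} + \cdots) = a_t$. A numerical sketch, assuming $\phi = 0.5$, $X_0 = 0$ and an arbitrary seed: `ARMAtoMA()` returns the MA(∞) weights of the AR(1) form, which should equal $\phi^j$, and the AR(1) recursion matches the (finite) MA sum on the same shocks:

```r
phi <- 0.5

# MA(infinity) weights psi_1, ..., psi_6 of X_t = phi X_{t-1} + a_t
psi <- ARMAtoMA(ar = phi, lag.max = 6)
rbind(psi, powers = phi^(1:6))

# Same shocks, two constructions (both start from X_0 = 0)
set.seed(10)
n <- 300
a <- rnorm(n)
x_ar <- stats::filter(a, phi, method = "recursive")              # X_t = phi X_{t-1} + a_t
x_ma <- sapply(1:n, function(t) sum(phi^(0:(t - 1)) * a[t:1]))   # sum_j phi^j a_{t-j}
max(abs(x_ar - x_ma))   # zero up to floating-point rounding
```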


2.1 Basic concepts

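• Stationarity: weak stationarity requires

– $E(X_t) = \mu$, i.e., the mean is constant and does not depend on $t$;

– $\mathrm{Var}(X_t) = \sigma^2$;

– $\mathrm{Cov}(X_t, X_{t-h}) = \sigma(h)$, i.e., the covariance depends only on the lag $h$, not on $t$.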

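Note: if $X_t$ and $Y_t$ are jointly stationary (for example, two independent stationary series), then $aX_t + bY_t$ is also stationary. Stationarity of each series separately is not enough in general.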

Check the stationarity of the following model:

$X_t = e_t + 0.4e_{t-1}$, $e_t \sim WN(0, \sigma^2)$.
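Working the check through with the three conditions above (a worked derivation, using only that the $e_t$ are uncorrelated with mean 0 and variance $\sigma^2$):

```latex
\begin{aligned}
E(X_t) &= E(e_t) + 0.4\,E(e_{t-1}) = 0,\\
\sigma(0) = \mathrm{Var}(X_t) &= \sigma^2 + 0.4^2\sigma^2 = 1.16\,\sigma^2,\\
\sigma(1) = \mathrm{Cov}(X_t, X_{t-1}) &= \mathrm{Cov}(e_t + 0.4e_{t-1},\; e_{t-1} + 0.4e_{t-2}) = 0.4\,\sigma^2,\\
\sigma(h) &= 0 \quad \text{for } |h| \ge 2,\\
\rho_1 &= \sigma(1)/\sigma(0) = 0.4/1.16 \approx 0.345.
\end{aligned}
```

None of these quantities depends on $t$, so the model is weakly stationary.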


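• Autocovariance:

$\mathrm{Var}(X_t) = \sigma(0)$

$\mathrm{Cov}(X_t, X_{t-h}) = \sigma(h)$

$\mathrm{Cov}(X_{t-h}, X_t) = \sigma(-h)$, and since covariance is symmetric, $\sigma(h) = \sigma(-h)$

$|\sigma(h)| \le \sigma(0)$

• Autocorrelation:

$\rho_h = \mathrm{corr}(X_t, X_{t-h}) = \sigma(h)/\sigma(0)$

$\rho_0 = 1$

$\rho_h = \rho_{-h}$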
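A sketch of the usual sample estimator $\hat\sigma(h) = \frac{1}{n}\sum_{t=h+1}^{n}(X_t-\bar X)(X_{t-h}-\bar X)$, checked against `acf()` on the LakeHuron series; the helper name `sample_acov` is hypothetical:

```r
# Hypothetical helper: sample autocovariance at lag h (divisor n, as acf() uses)
sample_acov <- function(x, h) {
  n <- length(x)
  xc <- x - mean(x)
  sum(xc[(h + 1):n] * xc[1:(n - h)]) / n
}

x <- as.numeric(LakeHuron)
manual <- sapply(0:5, function(h) sample_acov(x, h))
builtin <- acf(x, lag.max = 5, type = "covariance", plot = FALSE)$acf[, 1, 1]

isTRUE(all.equal(manual, builtin))
round(manual / manual[1], 3)   # sample autocorrelations; the first is rho_0 = 1
```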


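2.2 Examples of stationary time series

• White noise process: $a_t \sim (0, \sigma^2)$, uncorrelated over time

• AR(1): $X_t = \phi X_{t-1} + a_t$, $|\phi| < 1$

• MA(1): $X_t = \theta a_{t-1} + a_t$ (stationary for any $\theta$; the condition $|\theta| < 1$ makes it invertible)

• Random walk process (a non-stationary contrast):

$X_t = X_{t-1} + a_t$, where $a_t \sim (0, \sigma^2)$.

Taking $X_0 = 0$, $X_t = (X_{t-2} + a_{t-1}) + a_t = \cdots = a_1 + a_2 + \cdots + a_t$, so $\mathrm{Var}(X_t) = t\sigma^2$ grows with $t$ and the random walk is not stationary.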
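A quick simulation check of $\mathrm{Var}(X_t) = t\sigma^2$, assuming standard normal steps ($\sigma^2 = 1$), 5000 paths of 50 steps and an arbitrary seed:

```r
set.seed(7)
n_steps <- 50
n_paths <- 5000

# Each column is one random walk X_1, ..., X_50 with X_0 = 0 and a_t ~ N(0, 1)
paths <- apply(matrix(rnorm(n_steps * n_paths), nrow = n_steps), 2, cumsum)

# Sample variance of X_t across paths, for each t
v <- apply(paths, 1, var)
round(v[c(1, 10, 25, 50)], 2)   # roughly 1, 10, 25, 50, i.e. t * sigma^2
```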


2.3 Fitting ARIMA models

2.3.1 AR models

We are now going to load up some of the datasets and try to fit ARIMA models to them.

• Exercise 1. Load up the dataset beavers from R, and then analyse the temperature data in the data frame beaver1: plot it, then inspect the ACF, then try fitting AR models using the ar command in its various forms: ar.yw, ar.burg, ar.mle. Using the first 80 observations, predict ahead the remaining 34. I had stored the temperatures in a time-series object y, so I did:


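> y <- ts(beaver1$temp)
> new <- y[1:80]
> F2 <- ar.burg(new)   # F2: an AR model fitted to the first 80 values (Burg's method as an example)
> pr <- predict(F2, newdata = new, n.ahead = 34)
> plot(y)
> lines(pr$pred, col = "red")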

Comment on what you have found.

• Exercise 2. Load up the dataset austres, and plot it. What do you see? Take differences, and inspect the ACF. Try fitting various AR models. If da is the difference sequence, try the following commands:

> da <- diff(austres)
> var(da)
> F1 <- ar.yw(da)
> F1$ar
> F1$var.pred
> F2 <- ar.burg(da)
> F2$ar
> F2$var.pred
> acf(F2$resid, na.action = na.omit, lag.max = 30)
> acf(F1$resid, na.action = na.omit, lag.max = 30)
> F3 <- ar.burg(da, aic = FALSE, order.max = 4)
> F3$ar
> F3$var.pred
> acf(F3$resid, na.action = na.omit, lag.max = 30)


There are three models fitted here; which would you prefer to use and why?

• Exercise 3. Load up the dataset treering, and plot it. Does it appear that the data should be transformed? Do there appear to be outlying values? Inspect the ACF. Does this suggest a possible model for the data? Try some of the following.

> var(treering)
> F1 <- ar.yw(treering); F2 <- ar.burg(treering)
> F1$ar; F2$ar
> F1$var.pred; F2$var.pred
> F3 <- ar.burg(treering, aic = FALSE, order.max = 3)
> F3$ar
> F3$var.pred

• Exercise 4. Following the lines of the earlier examples, find an appropriate model for the data in the R dataset lh. If you choose something other than an AR(1) model, compare your choice with an AR(1) and explain why you think your choice is to be preferred.

• Exercise 5. See what you make of the lynx data.

• Exercise 6. And lastly, find some model to fit the sunspot.month data.
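As one way to compare candidate AR fits in these exercises (out-of-sample accuracy is an assumed criterion here, not the only one), a sketch for Exercise 1 that fits each `ar()` method to the first 80 beaver1 temperatures and computes the root mean squared error (RMSE) of the forecasts for the remaining observations:

```r
y <- ts(beaver1$temp)
train <- y[1:80]
test <- y[81:length(y)]

methods <- c("yule-walker", "burg", "mle")
rmse <- sapply(methods, function(m) {
  fit <- ar(train, method = m)   # order chosen by AIC for each method
  pred <- predict(fit, newdata = train, n.ahead = length(test))$pred
  sqrt(mean((as.numeric(pred) - test)^2))   # forecast RMSE on the held-out part
})
round(rmse, 3)
```

The same pattern (fit on an initial stretch, forecast the rest) carries over to lh, lynx and sunspot.month by changing `y` and the split point.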
