Time series modelling Marian Scott SAGES, March 2009.

what is a time series?

• a time series is a sequence of measurements made over time.

• notationally, this would commonly be written as $y_1, y_2, \ldots, y_i, \ldots, y_T$

• the index i denotes the position in the sequence of observations

• for this early session, we will assume that the data are equally spaced, so that i is truly an index

how to plot the data

• a time series plot: choice of the x-axis scale

– occasionally, each observation is indexed by its position in the sequence (OK if equally spaced)

– alternatively, we may use the actual timescale (e.g. years for an annual series, or days 1-365 for a daily series)

– or we may regard time on a continuous scale (time might be recorded in decimal form, e.g. 1986.5, which is the middle of 1986, around the start of July)
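The decimal-year convention is simple arithmetic: the fraction is the share of the year that has elapsed. The sketch below assumes the fraction is (days since 1 January) / (days in that year), which is one common convention; the function names are hypothetical. It shows that 1986.5 lands on 2 July 1986.

```python
from datetime import date, timedelta

def decimal_year(d: date) -> float:
    """Calendar date -> decimal year (fraction = days elapsed / days in that year)."""
    start = date(d.year, 1, 1)
    days_in_year = (date(d.year + 1, 1, 1) - start).days  # 365 or 366
    return d.year + (d - start).days / days_in_year

def date_from_decimal_year(y: float) -> date:
    """Decimal year -> calendar date (fraction of the year truncated to whole days)."""
    year = int(y)
    start = date(year, 1, 1)
    days_in_year = (date(year + 1, 1, 1) - start).days
    return start + timedelta(days=int((y - year) * days_in_year))

print(date_from_decimal_year(1986.5))  # 1986-07-02
print(decimal_year(date(1986, 1, 1)))  # 1986.0
```

Other conventions exist (e.g. placing each month at its midpoint), so it is worth checking how a given data set defines its decimal times.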

How is biodiversity changing (EEA CSI 009)

• Populations of common and widespread farmland bird species in 2003 are only 71% of their 1980 levels.

• an annual indicator

Water quality: freshwater (CSI 020)

• Concentrations of phosphorus (P) have generally decreased

• Nitrate concentrations have remained constant

• What are the rates of change and are they significant?

Example 1: monthly mean CO2 levels

[Figure: two panels, daily mean temperature and daily minimum temperature (degrees Celsius), plotted against date from 01/01/1973 to 12/31/2000]

Example 2: a time series plot (daily values)

the x-axis shows the actual date

Example 3: some typical environmental series, Loch Leven (NERC-CEH)

[Figure: six panels plotted against year, 1970-2000: SRP (µg/l), TP (µg/l), Secchi depth (metres), Daphnia (individuals/l), chlorophyll (µg/l) and water temperature (°C)]

[Figure: SO2 (µg S/m3) monitored at AT02, plotted against observation number]

Example 4: air quality monitored through time (from the EMEP programme)

note the gaps and the rather extreme values; one strategy is to take logs

Time series data features

• patterns over time (both short and long term)

• often missing data, which may cause problems for statistical analysis

• variation that may not be constant over time, so we may need to consider transformations (e.g. logs)

Seasonal patterns (cycles)

• in many environmental time series we would expect some periodicity (e.g. a pattern in temperature that repeats over the months of the year)

• so it is common to produce a “seasonality plot”

• the index (x-axis scale) depends on the period over which the cycle repeats itself.
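A seasonality plot groups the observations by their position in the cycle (day of the year, day of the week, month). A minimal numerical counterpart, assuming equally spaced data with no gaps and a known period, is the mean at each position in the cycle; the function name `seasonal_means` is hypothetical.

```python
from statistics import mean

def seasonal_means(y, period):
    """Mean of y at each position in the cycle (0 .. period-1).

    Assumes equally spaced data with y[0] at position 0 of the cycle,
    e.g. January for monthly data with period=12, or Monday for daily
    data with period=7.
    """
    groups = [[] for _ in range(period)]
    for i, value in enumerate(y):
        groups[i % period].append(value)
    # Positions with no observations get NaN rather than raising an error.
    return [mean(g) if g else float("nan") for g in groups]

# Two cycles of a period-4 pattern; the second cycle is the first shifted up by 2.
print(seasonal_means([1, 2, 3, 4, 3, 4, 5, 6], period=4))
```

With a strong trend, these raw means mix trend and seasonality, which is why the trend is usually removed first.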

[Figure: two panels of smoothed residuals of SO2 against day of the year (0-360), bandwidth 30: site GB02 (p-value = 0) and site AT02 (p-value = 0)]

Example 1: daily observations, so the seasonal curve is plotted over days of the year

[Figure: seasonal analysis for ln(SO2) monitored in GB02, by day of the week (Mon to Sun); panels: seasonal indices, percent variation by seasonal period, original data by seasonal period, residuals by seasonal period]

Example 2: daily data, plotted over the days of the week

[Figure: six panels plotted against month (1-12): log SRP (µg/l), log TP (µg/l), log chlorophyll (µg/l), log Daphnia (individuals/l), log Secchi depth (metres) and water temperature (°C)]

Example 3: Loch Leven monthly data, plotted over the months of the year (with a lowess smooth)

what are the questions of interest?

• we want to know about trends, where a trend is defined to be

– the long-term sweep of the data

• we want to know about possible seasonality (or cycles)

– the seasonal component of a time series describes a regular fluctuation which has a period (the period is the time interval between consecutive peaks or troughs)

a descriptive model

• A useful descriptive model for a time series consists of 3 components:

• X = Trend + Seasonal Component + Irregular Component, or X = T + S + I

• I is the irregular component, which is left over when the trend and seasonal components have been accounted for. It is an irregular or random fluctuation (like the residuals in regression).

smoothing a time series

• In many time series, the seasonal variation can be so strong that it obscures any trend or cyclical component. However, for understanding the process being observed (and for forecasting future values of the series), trends and cycles are of prime importance. Smoothing is a process designed to remove seasonality so that the long-term movements in a time series can be seen more clearly.

smoothing a time series

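• one of the most commonly used smoothing techniques is the moving average

• the difficult choice is the window (how many neighbouring points) over which to smooth

• smoothed series: $\tilde{Y}_i = \sum_{k=-m}^{m} w_k Y_{i+k}$, where the weights $w_k$ sum to 1 (for a simple moving average of length $2m+1$, every $w_k = 1/(2m+1)$)

• other, more modern, smoothing methods in common use include lowess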
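The weighted-sum formula for a moving average can be sketched directly. This is a minimal version, assuming equally spaced data with no gaps; the function names are hypothetical. Odd windows use equal weights; for even windows (such as 12 for monthly or 52 for weekly data) it uses the standard "2 x m" centred average, which puts half weight on the two end points so the window stays centred.

```python
def ma_weights(window):
    """Weights w_k of a centred moving average with the given window length.

    Odd window 2m+1: equal weights 1/window for k = -m..m.
    Even window 2m: the usual "2 x m" centred average, with weight 1/window
    for k = -(m-1)..(m-1) and half that weight at k = -m and k = +m.
    """
    if window < 1:
        raise ValueError("window must be a positive integer")
    m = window // 2
    if window % 2 == 1:
        return {k: 1.0 / window for k in range(-m, m + 1)}
    weights = {k: 1.0 / window for k in range(-m + 1, m)}
    weights[-m] = weights[m] = 0.5 / window
    return weights

def moving_average(y, window):
    """Smoothed series Y~_i = sum_k w_k * Y_{i+k}; None where the window runs off the ends."""
    weights = ma_weights(window)
    m = max(weights)  # half-width of the window
    smooth = [None] * len(y)
    for i in range(m, len(y) - m):
        smooth[i] = sum(w * y[i + k] for k, w in weights.items())
    return smooth

# A period-4 seasonal pattern around a constant level of 10 (illustrative data).
pattern = [1, -1, 2, -2]
y = [10 + pattern[t % 4] for t in range(12)]
print(moving_average(y, 4))
```

When the window equals the seasonal period, each smoothed value averages over exactly one full cycle, so a fixed seasonal pattern cancels and the level (here 10) remains.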

smoothing a time series

• LO(W)ESS is a method known as locally weighted polynomial regression. At each point in the data set, a low-degree polynomial is fitted to a subset of the data: the observations whose explanatory variable values are near the point whose response is being estimated. The polynomial is fitted by weighted least squares, giving more weight to points near the point whose response is being estimated and less weight to points further away.

• Many of the details of this method, such as the degree of the polynomial model and the weights, are flexible.
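To make the description concrete, here is a minimal sketch of the core idea: a degree-1 (local linear) fit with tricube weights. The parameter name `frac` (share of the data used in each local fit) and the function name are assumptions for illustration. This version omits the robustness iterations of the full method; R's `lowess()`, for example, controls the span with `f` and the robustness iterations with `iter`.

```python
def lowess(x, y, frac=0.3):
    """Local linear regression with tricube weights (no robustness iterations).

    For each x0, the nearest r = frac * n points get tricube weights
    (1 - (d/h)^3)^3, where h is the distance to the r-th nearest point;
    a straight line is then fitted by weighted least squares and evaluated at x0.
    """
    n = len(x)
    r = min(n, max(2, round(frac * n)))
    fitted = []
    for x0 in x:
        dists = [abs(xi - x0) for xi in x]
        h = max(sorted(dists)[r - 1], 1e-12)  # guard against h == 0 for tied x
        w = [(1 - (d / h) ** 3) ** 3 if d < h else 0.0 for d in dists]
        sw = sum(w)  # > 0, because x0 itself has d = 0 and weight 1
        xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ybar + slope * (x0 - xbar))
    return fitted
```

A larger `frac` gives a smoother curve; a smaller one follows the data more closely. This is the same trade-off as the window length of a moving average.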

Example 1: water surface temperature, Jan 1981 to Feb 1992 (Piegorsch), with lowess curve

[Figure: time series plot of temp against date]

Example 1: water surface temperature, seasonal pattern

[Figure: scatterplot of temp against month (1-12)]

Example 1: water surface temperature, seasonal pattern by week

[Figure: scatterplot of temp against week of the year]

Example 1: water surface temperature, variability by year

[Figure: boxplots of temp by year, 1981-1992]

Example 1: water surface temperature, variability by month

[Figure: boxplots of temp by month (1-12)]

Example 1: water surface temperature, moving average of length 52

[Figure: moving average plot for temp against date, showing actual values and fits (moving average length 52). Accuracy measures: MAPE 44.8212, MAD 6.1001, MSD 48.9017]

[Figure: two panels of ln(SO2) (ln µg S/m3) against days (0-8030): a) smoothing of the logarithm of SO2, bandwidth = 30; b) smoothing of the logarithm of SO2, bandwidth = 800]

Example 2: smoothing with two different bandwidths applied to air quality data (log-transformed)

harmonic regression

• another way of a) describing and b) hence being able to remove the periodic component is to use what is called harmonic regression

• remember sin and cos from school?

harmonic regression

• build a regression model using the sine function: sin(x) lies between -1 and +1, where the angle x is measured in radians.

• for a periodic time series Yi we can build a regression model

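• $Y_i = \beta_0 + \gamma \sin(2\pi[t_i - \theta]/p) + \varepsilon_i$, where $t_i$ is the time of the ith observation, $p$ the period, $\gamma$ the amplitude and $\theta$ the phase shift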

• to make this simpler, if we assume that p is known, this can be written as a simple multiple linear regression model
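The rewriting step uses the identity sin(a - b) = sin a cos b - cos a sin b, with a = 2πt_i/p and b = 2πθ/p. Here γ denotes the amplitude and θ the phase shift (symbol names chosen for illustration), and c_i = cos(2πt_i/p), s_i = sin(2πt_i/p):

```latex
\begin{aligned}
\gamma \sin\!\left(\frac{2\pi (t_i - \theta)}{p}\right)
  &= \gamma \cos\!\left(\frac{2\pi\theta}{p}\right) \sin\!\left(\frac{2\pi t_i}{p}\right)
   - \gamma \sin\!\left(\frac{2\pi\theta}{p}\right) \cos\!\left(\frac{2\pi t_i}{p}\right) \\
  &= \beta_1 c_i + \beta_2 s_i,
  \qquad \beta_1 = -\gamma \sin\!\left(\frac{2\pi\theta}{p}\right),
  \quad \beta_2 = \gamma \cos\!\left(\frac{2\pi\theta}{p}\right), \\
\gamma &= \sqrt{\beta_1^2 + \beta_2^2},
  \qquad \tan\!\left(\frac{2\pi\theta}{p}\right) = -\frac{\beta_1}{\beta_2}.
\end{aligned}
```

Because c_i and s_i depend only on the known t_i and p, they act as ordinary covariates. β_0, β_1 and β_2 can therefore be estimated by least squares, and the amplitude and phase can be recovered afterwards.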

harmonic regression

• for a periodic time series Yi we can build a regression model

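• $Y_i = \beta_0 + \gamma \sin(2\pi[t_i - \theta]/p) + \varepsilon_i$

• to make this simpler (with p known), rewrite it as a multiple linear regression model

• $Y_i = \beta_0 + \beta_1 c_i + \beta_2 s_i + \varepsilon_i$

• where $c_i = \cos(2\pi t_i/p)$ and $s_i = \sin(2\pi t_i/p)$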
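With p known, the harmonic model is ordinary least squares on the covariates c_i and s_i. The sketch below solves the 3 x 3 normal equations directly so that it has no dependencies. In practice a regression routine (e.g. R's `lm()`) would be used. The function names are hypothetical. The amplitude and phase are recovered from b1 and b2, assuming the form b0 + amplitude * sin(2π(t - phase)/p).

```python
import math

def solve_linear(a, b):
    """Solve a x = b (small dense system) by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def harmonic_fit(t, y, p):
    """Least-squares fit of Y_i = b0 + b1*cos(2*pi*t_i/p) + b2*sin(2*pi*t_i/p) + e_i (p known)."""
    rows = [(1.0, math.cos(2 * math.pi * ti / p), math.sin(2 * math.pi * ti / p)) for ti in t]
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(3)]
    b0, b1, b2 = solve_linear(xtx, xty)
    amplitude = math.hypot(b1, b2)  # gamma = sqrt(b1^2 + b2^2)
    # b1 = -gamma*sin(2*pi*theta/p) and b2 = gamma*cos(2*pi*theta/p), so:
    phase = (p / (2 * math.pi) * math.atan2(-b1, b2)) % p
    return b0, b1, b2, amplitude, phase

# Four years of noise-free monthly values: level 5, amplitude 3, phase shift 2 months.
t = list(range(48))
y = [5 + 3 * math.sin(2 * math.pi * (ti - 2) / 12) for ti in t]
print([round(v, 3) for v in harmonic_fit(t, y, 12)])
```

The fitted seasonal curve b1*c_i + b2*s_i is the periodic component that can then be subtracted to deseasonalise the series.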

[Figure: ln(SO2) in GB02 against fine grid, Model 2; ln(µg S/m3) plotted over 1980-2000]

Example 2: red curve shows the harmonic pattern (superimposed on a declining trend).

correlation through time

• in many situations we expect observations at adjacent time points to be correlated, most likely more strongly the closer together they are; the strength of dependence usually depends on the time separation, or lag

• for regularly spaced data, we typically make use of the autocorrelation function (ACF)

correlation through time

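• for a regularly spaced time series with no missing data, we define the sample mean in the usual way, $\bar{y} = \frac{1}{T}\sum_{t=1}^{T} y_t$

• the sample autocorrelation coefficient at lag k (k ≥ 0) is then $r(k) = \dfrac{\sum_{t=1}^{T-k}(y_t - \bar{y})(y_{t+k} - \bar{y})}{\sum_{t=1}^{T}(y_t - \bar{y})^2}$

• r(k) measures the correlation between the original series and a version of it shifted back k time units

• in ACF plots, the horizontal lines show approximate 95% confidence limits for the individual coefficients.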
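The sample ACF is short to compute by hand. This sketch uses the standard estimator: the lag-k sum of products of deviations from the mean, divided by the sum of squared deviations. It assumes an equally spaced series with no gaps. The ±1.96/√T band is the simple approximate 95% limit for a white-noise series. Some packages instead draw lag-dependent limits based on Bartlett's approximation, so their bands may differ. The function names are hypothetical.

```python
import math

def acf(y, max_lag):
    """Sample autocorrelations r(0), ..., r(max_lag) of an equally spaced series with no gaps."""
    n = len(y)
    ybar = sum(y) / n
    dev = [v - ybar for v in y]
    denom = sum(d * d for d in dev)  # sum of squared deviations over all n points
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]

def white_noise_limit(n):
    """Approximate 95% limit +/- 1.96/sqrt(n) for r(k) when the series is white noise."""
    return 1.96 / math.sqrt(n)

r = acf([1, 2, 3, 4, 5], max_lag=2)
print([round(v, 3) for v in r])  # [1.0, 0.4, -0.1]
```

A series with a strong cycle of period p gives large positive r(k) near k = p, 2p, ..., which produces the cyclical ACF pattern seen for the temperature data.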

Example 1: ACF of water temperature data

[Figure: autocorrelation function for temp, lags 1-60, with 5% significance limits for the autocorrelations]

correlation through time

• the ACF shows a very marked cyclical pattern

• interpretation of the ACF:

– we need to have removed both trend and seasonality first

– we hope (for simplicity in subsequent modelling) that only a few correlation coefficients, at small lags, will be significant

• the ACF is an important diagnostic tool for time series modelling (formal ARIMA models); formal time series models are covered in the later session on trends

• how should we remove the seasonal pattern or the trend?

differencing

• a common way of removing a simple trend (e.g. a linear trend) is by differencing

• define a new series

• $Z_t = Y_t - Y_{t-1}$

• a common way of removing seasonality (if we know the period to be p) is to take differences at lag p (seasonal differencing)

• $Z_t = Y_t - Y_{t-p}$
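Both kinds of differencing can be expressed with a single function. This minimal sketch assumes equally spaced data with no gaps; the function name `difference` is hypothetical. The illustrative data combine a linear trend with a fixed period-4 pattern. Differencing at lag 4 removes the pattern and leaves the trend as a constant, and a further first difference removes that constant.

```python
def difference(y, lag=1):
    """Z_t = Y_t - Y_{t-lag}; the result is `lag` values shorter than y."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    return [y[t] - y[t - lag] for t in range(lag, len(y))]

# Linear trend 0.5*t plus a fixed period-4 seasonal pattern (illustrative data).
pattern = [1, -1, 2, -2]
y = [0.5 * t + pattern[t % 4] for t in range(12)]
print(difference(y, lag=4))              # eight values, all 2.0
print(difference(difference(y, lag=4)))  # seven values, all 0.0
```

Each differencing step shortens the series, and differencing more than needed can introduce spurious correlation. The ACF of the differenced series is therefore worth checking, as in the lag-12 example that follows.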

Example 1: ACF of water temperature data

[Figure: autocorrelation function for monthlymean, lags 1-30, with 5% significance limits for the autocorrelations]

Example 1: ACF of water temperature data after differencing at lag 12

[Figure: autocorrelation function for the lag-12 difference of the monthly mean, up to lag 30, with 5% significance limits for the autocorrelations]

a descriptive model

• A useful descriptive model for a time series consists of 3 components:

• X = Trend + Seasonal Component + Irregular Component, or X = T + S + I

• I is the irregular component, which is left over when the trend and seasonal components have been accounted for. It is an irregular or random fluctuation (like the residuals in regression).

simple algorithm

• obtain a rough estimate of the trend (by smoothing, but with a smoother that is not affected by seasonality)

• subtract the estimated trend

• estimate the seasonal cycle from the detrended series

• what is left is the irregular component

• a good alternative is STL (seasonal-trend decomposition using lowess), the stl() command in R
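The simple algorithm can be sketched as a classical additive decomposition. This is not STL: the trend here is a centred moving average over one full period, which averages out a fixed seasonal pattern. The seasonal component is the mean of the detrended values at each position in the cycle. The sketch assumes equally spaced data, no missing values, a known period and a seasonal pattern that does not change over time; the function name is hypothetical. STL replaces these steps with lowess smoothing and allows the seasonal pattern to evolve.

```python
def classical_decompose(y, period):
    """Additive decomposition y = T + S + I.

    Trend: centred moving average over one full period (a "2 x m" average for
    even periods). Seasonal: mean of the detrended values at each position in
    the cycle, adjusted to sum to zero over the cycle. Irregular: what is left.
    Trend and irregular are None at the ends, where the window does not fit.
    """
    n, m = len(y), period // 2
    if period % 2:
        weights = {k: 1.0 / period for k in range(-m, m + 1)}
    else:
        weights = {k: 1.0 / period for k in range(-m + 1, m)}
        weights[-m] = weights[m] = 0.5 / period
    trend = [None] * n
    for i in range(m, n - m):
        trend[i] = sum(w * y[i + k] for k, w in weights.items())

    # Average the detrended values at each position in the cycle.
    sums, counts = [0.0] * period, [0] * period
    for i, tr in enumerate(trend):
        if tr is not None:
            sums[i % period] += y[i] - tr
            counts[i % period] += 1
    if 0 in counts:
        raise ValueError("series too short: need roughly two full periods")
    index = [s / c for s, c in zip(sums, counts)]
    offset = sum(index) / period
    index = [s - offset for s in index]  # seasonal effects sum to zero

    seasonal = [index[i % period] for i in range(n)]
    irregular = [None if tr is None else y[i] - tr - seasonal[i]
                 for i, tr in enumerate(trend)]
    return trend, seasonal, irregular

# Illustrative data: linear trend plus a fixed period-4 pattern, no noise.
pattern = [1.0, -1.0, 2.0, -2.0]
y = [10 + 0.5 * t + pattern[t % 4] for t in range(24)]
trend, seasonal, irregular = classical_decompose(y, 4)
print([round(s, 3) for s in seasonal[:4]])  # [1.0, -1.0, 2.0, -2.0]
```

On real data the irregular component will not be zero. Its ACF is the natural check of whether trend and seasonality have been adequately removed.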

a couple of examples for you to try

• for the monthly temperature data:

– obtain the ACF

– use the stl() command

• for dissolved oxygen in the River Clyde:

– fit a seasonal regression model

• In the final session on trend detection we will return to regression for time series.