Multivariate Time Series

Notation: I do not use boldface (or anything else) to distinguish vectors from scalars.

Tsay (and many other writers) do.

I denote a multivariate stochastic process in the form “{Xt}”, where, for any t, Xt is a vector of the same order.

We denote the individual elements with two subscripts:

“Xit” denotes the ith element of the multivariate time series at time t.

Warning: I’m not promising I’ll do this!

This is Tsay’s notation, and one of the two obvious notations.

One common introductory-graduate-level textbook on time series is by Shumway and Stoffer, and another is by Brockwell and Davis.

Shumway and Stoffer use “Xti”. (I like this.) Brockwell and Davis use “Xit”.

One further thing on notation: you do not need to use the transpose (as Tsay and many others do) on Xt = (X1t, . . . , Xkt).

It’s a column vector, but you’re not drawing a picture!



Marginal Model Components

In a multivariate time series {Xt : t ∈ . . . ,−1, 0, 1, . . .}, each univariate series {Xit : t ∈ . . . ,−1, 0, 1, . . .} is called a marginal time series.

The model for {Xit : t ∈ . . . ,−1, 0, 1, . . .} is called a marginal model component of the model for {Xt : t ∈ . . . ,−1, 0, 1, . . .}.

This illustrates one of the disadvantages of not having a special notation for vectors!



Stationary Multivariate Time Series

First of all, let’s establish that in time series, “stationary” means “weakly stationary”. (When we mean strict stationarity, we say “strictly stationary” or “strongly stationary”.)

If a time series is stationary, then its first and second moments are time invariant. (This is not the definition.)

For a stationary time series {Xt}, we will generally denote the first moment as the vector µ:

µ = E(Xt)

and the variance-covariance as Γ0:

Γ0 = V(Xt) = E((Xt − µ)(Xt − µ)T).

(Note on notation: “Γ”, with or without serifs, is used to denote the gamma function, where it is a reserved symbol; elsewhere, “Γ” is used to denote various things.)



Stationary Multivariate Time Series

If a time series is stationary, then its first and second moments are time invariant.

Tsay (page 390) states the converse of this. That is true, if “second moments” means “second auto-moments” at fixed lags (including the 0 lag).

I looked back to see how clear Tsay has been on that point. He has not been clear. On page 39 (in the middle), he makes an argument that assumes that the time invariance of the first and second moments and the finiteness of the autocovariance imply stationarity.

In other places, he indicates that time invariance of the autocorrelations is also required, but because he does not write as a mathematician, it is sometimes not clear.



Stationary Multivariate Time Series

So let’s be very clear. Analogously to the autocovariance γs,t, let’s define the cross-autocovariance matrix Γs,t:

Γs,t = E((Xs − µs)(Xt − µt)T).

(“Cross-autocovariance matrix” is quite a mouthful, so we’ll just call it the “cross-covariance matrix”.)

Now, suppose Γs,t is constant if s − t is constant. (Notice I did not say if |s − t| is constant.)

Under that additional condition (together with the time-invariance of the first two moments), we say that the multivariate time series is stationary.



Stationary Multivariate Time Series

In the case of stationarity, we can use the notation Γh, which is consistent with the notation Γ0 used before.

We refer to the h = 0 case as “concurrent” or “contemporaneous”. The matrix Γ0 is the concurrent cross-covariance matrix.

We now introduce another notational form so that we can easily refer to the elements of the matrices. We equate Γ(h) with Γh. Now, we can denote the ijth element of Γ(h) as Γij(h).

(There is an alternative meaning of Γp. We have used it (I think) to refer to the p×p symmetric matrix of autocovariances of orders 0 through p. It is often seen in the Yule-Walker equations.) The meaning is made clear by the context.



Stationary Multivariate Time Series

Notice that stationarity of the multivariate time series implies stationarity of the individual univariate time series. The univariate autocovariance functions are the diagonal elements of Γh.

We sometimes use the phrase “jointly stationary” to refer to a stationary multivariate time series. (This excludes the case of a multivariate time series each of whose components is stationary, but the cross-covariances are not constant at constant lags.)



Cross-Correlation Matrix

The standard deviation of the ith component of a multivariate time series in the standard notation is √Γii(0).

For a k-variate time series, the matrix

D = diag(√Γ11(0), . . . , √Γkk(0))

is very convenient.

All variances are assumed to be positive, so D⁻¹ exists, and

D⁻¹ΓhD⁻¹

is the cross-correlation matrix of lag h. (If h = 0, of course, it is the concurrent cross-correlation matrix.)

Tsay denotes it as ρ0 or ρℓ. I like to use uppercase rho, R0 or Rh. Of course, either way, we may use an alternative notation, ρij(ℓ) or Rij(h). Furthermore, note that in my notation, I may use ρij(h) in place of Rij(h).
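For concreteness, here is a minimal sketch of the D⁻¹ΓhD⁻¹ computation in Python (numpy assumed; the function name and example matrix are mine, purely illustrative):

```python
import numpy as np

def cross_correlation(Gamma_h, Gamma_0):
    """Convert a lag-h cross-covariance matrix to a cross-correlation
    matrix, R_h = D^{-1} Gamma_h D^{-1}, where D holds the marginal
    standard deviations sqrt(Gamma_ii(0))."""
    d = np.sqrt(np.diag(Gamma_0))     # sqrt(Gamma_ii(0))
    D_inv = np.diag(1.0 / d)          # D^{-1} exists if all variances > 0
    return D_inv @ Gamma_h @ D_inv

# At h = 0 this returns the concurrent cross-correlation matrix,
# with ones on the diagonal.
Gamma_0 = np.array([[4.0, 1.2],
                    [1.2, 9.0]])
print(cross_correlation(Gamma_0, Gamma_0))
```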



Properties of the Cross-Covariance and Cross-Correlation Matrices

Notice that Γ0 is symmetric; it’s an ordinary variance-covariance matrix. On the other hand, Γh is not necessarily symmetric; in fact, (Γh)T = Γ−h.

The elements of the matrix Γ(h) have directional meanings, and these have simple interpretations.

First of all, we need to be clear about what kind of relationships covariance or correlation relates to. Covariance or correlation relates to linear relationships. For X distributed symmetrically about 0, the correlation between X and Y = X² is 0, even though Y is completely determined by X.



Properties of the Cross-Covariance and Cross-Correlation Matrices

Consider a given i and j representing the ith and jth marginal component time series. The direction of time is very important in characterizing the relationships between the ith and jth series.

In the following, which is consistent with the nontechnical language in time series analysis, we will use the term “depend” loosely. (We need the “linear” qualifier.)

If Γij(h) = 0 for all h > 0, then the series Xit does not depend on the past values of the series Xjt.

If Γij(h) ≠ 0 for some h > 0, then the series Xit does depend on the past values of the series Xjt. In this case we say Xjt leads Xit, or Xjt is a leading indicator of Xit.

If Γij(h) = Γji(h) = 0 for all h > 0, then neither series depends on the past values of the other, and we say the series are uncoupled.



The Cross-Covariance Matrix in a Strictly Stationary Process

Strict stationarity is a restrictive property. Notice, first of all, that it requires more than just time-invariance of the first two moments; it requires time-invariance of the whole distribution. It also requires stronger time-invariance of auto-properties.

An iid process is obviously strictly stationary, and such a process is often our choice for examples. The following process, however, is also strictly stationary (for a random variable X distributed symmetrically about 0):

. . . , X, −X, X, −X, . . .



The Cross-Covariance Matrix in a Strictly Stationary Process and in a Serially Uncorrelated Process

The cross-covariance matrix alone does not tell us whether the process is stationary; we also need time-invariance of the first two moments.

Given stationarity, it is not possible to tell from the cross-covariance matrix whether or not a process is strictly stationary.

In a serially uncorrelated process, Γh is a hollow matrix for all h ≠ 0.



Sample Cross-Covariance and Cross-Correlation Matrices

Given a realization of a multivariate time series {xt : t = 1, . . . , n}, where each xt is a k-vector, the sample cross-covariance and cross-correlation matrices are formed in the obvious ways.

We use the “hat” notation to indicate that these are sample quantities, and also because they are often used as estimators of the population quantities.

We also use the notation x̄ to denote ∑t xt/n, and σ̂i to denote √(∑t (xit − x̄i)²/n).

(Note the divisor. It’s not really important, of course.)



Sample Cross-Covariance and Cross-Correlation Matrices

The sample cross-covariance matrix is

Γ̂h = (1/n) ∑_{t=h+1}^{n} (xt − x̄)(xt−h − x̄)T.

(Note the divisor.)

Letting D̂ = diag(σ̂1, . . . , σ̂k), the sample cross-correlation matrix is

R̂h = D̂⁻¹Γ̂hD̂⁻¹.
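These two formulas translate almost line for line into code; a sketch (numpy assumed, with the realization stored as an n × k array; the function names are mine):

```python
import numpy as np

def sample_cross_covariance(x, h):
    """Gamma_hat_h = (1/n) sum_{t=h+1}^{n} (x_t - xbar)(x_{t-h} - xbar)^T.
    Note the divisor n, not n - h."""
    n, k = x.shape
    xc = x - x.mean(axis=0)            # center at xbar
    return xc[h:].T @ xc[:n - h] / n

def sample_cross_correlation(x, h):
    """R_hat_h = D_hat^{-1} Gamma_hat_h D_hat^{-1}."""
    d = np.sqrt(np.diag(sample_cross_covariance(x, 0)))   # sigma_hat_i
    return sample_cross_covariance(x, h) / np.outer(d, d)
```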



Sample Cross-Correlation Matrices

There is a nice simplified notation that Tiao and Box introduced to indicate leading and lagging indicators as measured by a sample cross-covariance matrix. Each cell has an indicator of “significant” positive, “significant” negative, or “insignificant” sample correlation.

Here, “significant” is defined with respect to the asymptotic 5% critical value of a sample correlation coefficient under the assumption that the true correlation coefficient is 0, which is roughly twice the asymptotic standard error 1/√n.

The comparison value is 2/√n, and the indicators are “+”, “−”, and “.”; thus, at a specific lag, a correlation matrix for three component time series may be represented as

. + −
+ + .
. − −
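A sketch of how such a display might be produced from a sample cross-correlation matrix (the function name and example values are mine, purely illustrative; the cutoff 2/√n is as above):

```python
import numpy as np

def tiao_box_indicators(R_hat, n):
    """Summarize a sample cross-correlation matrix with '+', '-', and '.'
    using the comparison value 2/sqrt(n)."""
    cutoff = 2.0 / np.sqrt(n)
    symbols = np.where(R_hat > cutoff, "+",
              np.where(R_hat < -cutoff, "-", "."))
    return "\n".join(" ".join(row) for row in symbols)

# Example for three component series with n = 100 (cutoff = 0.2):
R_hat = np.array([[0.05,  0.31, -0.42],
                  [0.28,  0.33,  0.10],
                  [0.01, -0.25, -0.37]])
print(tiao_box_indicators(R_hat, 100))
# . + -
# + + .
# . - -
```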



Multivariate Portmanteau Tests

Recall the test statistic for the portmanteau test of Ljung and Box (first, recall what the portmanteau test tests):

Q(m) = n(n + 2) ∑_{h=1}^{m} ρ̂h²/(n − h).

The multivariate version for a k-variate time series is

Qk(m) = n² ∑_{h=1}^{m} (1/(n − h)) tr(Γ̂hT Γ̂0⁻¹ Γ̂h Γ̂0⁻¹).

Note the similarities and the differences.

Under the null hypothesis and some regularity conditions, this has an asymptotic distribution of chi-squared with k²m degrees of freedom.
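A direct transcription of Qk(m) as a sketch, not a packaged test (the helper recomputes Γ̂h with divisor n as defined earlier, and the p-value uses the asymptotic chi-squared distribution with k²m degrees of freedom):

```python
import numpy as np
from scipy import stats

def sample_gamma(x, h):
    # Gamma_hat_h with divisor n, as defined two slides back.
    n = x.shape[0]
    xc = x - x.mean(axis=0)
    return xc[h:].T @ xc[:n - h] / n

def multivariate_portmanteau(x, m):
    """Q_k(m) = n^2 sum_{h=1..m} tr(G_h^T G_0^{-1} G_h G_0^{-1}) / (n - h)."""
    n, k = x.shape
    G0_inv = np.linalg.inv(sample_gamma(x, 0))
    q = n * n * sum(
        np.trace(sample_gamma(x, h).T @ G0_inv @ sample_gamma(x, h) @ G0_inv)
        / (n - h)
        for h in range(1, m + 1))
    return q, stats.chi2.sf(q, df=k * k * m)   # statistic, p-value
```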



VAR Models

In time series, “VAR” means “vector autoregressive”. In finance generally, “VaR” means “value at risk”.

A VAR(1) model is

Xt = φ0 + ΦXt−1 + At,

where Xt, φ0, and Xt−1 are k-vectors, Φ is a k × k matrix, and {At} is a sequence of serially uncorrelated k-vectors with 0 mean and constant positive definite variance-covariance matrix Σ.

Note that the systematic term may be bigger than it looks. There are k linear terms. The key is that they only go back in time one step.

This form is called the reduced form of a VAR model.
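To make the model concrete, a minimal simulation sketch of a bivariate reduced-form VAR(1) (the particular φ0, Φ, and Σ values are mine, chosen to keep the process stable):

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 2, 500
phi0 = np.array([0.1, -0.2])
Phi = np.array([[0.5, 0.1],
                [0.2, 0.4]])          # eigenvalues inside the unit circle
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])        # positive definite V(A_t)

x = np.zeros((n, k))
for t in range(1, n):
    a_t = rng.multivariate_normal(np.zeros(k), Sigma)
    x[t] = phi0 + Phi @ x[t - 1] + a_t
```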



Structural Equations of VAR Models

The relationships among the component time series arise from the off-diagonal elements of Σ.

To exhibit the concurrent relationships among the component time series, we do a diagonal decomposition of Σ, writing it as Σ = LGLT, where L is a lower triangular matrix whose diagonal entries are all 1 (which means that it is of full rank), and G is a diagonal matrix with positive entries. (Such a decomposition exists for any positive definite matrix.)

Note that G = L⁻¹Σ(L⁻¹)T.
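One way to compute the decomposition is from the Cholesky factor; a sketch (numpy assumed; the function name is mine, and scipy.linalg.ldl would also work for this positive definite case):

```python
import numpy as np

def diagonal_decomposition(Sigma):
    """Factor Sigma = L G L^T with L unit lower triangular and G diagonal,
    via the Cholesky factor C: L = C diag(1/diag(C)), G = diag(diag(C)^2)."""
    C = np.linalg.cholesky(Sigma)      # Sigma = C C^T, C lower triangular
    dc = np.diag(C)
    L = C / dc                         # scale columns so diag(L) = 1
    G = np.diag(dc ** 2)
    return L, G

Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])
L, G = diagonal_decomposition(Sigma)
# Check: G = L^{-1} Sigma (L^{-1})^T is diagonal.
L_inv = np.linalg.inv(L)
print(np.round(L_inv @ Sigma @ L_inv.T, 10))
```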



Structural Equations of VAR Models

Now we transform the reduced-form equation by premultiplying each term by L⁻¹:

L⁻¹Xt = L⁻¹φ0 + L⁻¹ΦXt−1 + L⁻¹At
      = φ∗0 + Φ∗Xt−1 + Bt.

This is one of the most fundamental transformations in statistics. The important result is that the variance-covariance matrix that ties the component series together concurrently, that is, V(At), has been replaced by V(Bt) = G, which is diagonal.

Because of the special structure of L, we can see the concurrent linear dependencies of the kth series on the others. And by rearranging the terms in the series, we can make any component series the “last” one.



Structural Equations of VAR Models

The last row of L⁻¹ has a 1 in the last position. Call the other elements wk1, wk2, etc. Then the last equation in the matrix equation

L⁻¹Xt = φ∗0 + Φ∗Xt−1 + Bt

can be written as

Xkt + ∑_{i=1}^{k−1} wki Xit = φ∗k,0 + ∑_{i=1}^{k} φ∗k,i Xi,t−1 + Bkt.

This shows the concurrent dependence of Xkt on the other series.



Properties of a VAR(1) Process

There are several important properties we can see easily.

Because the {At} are serially uncorrelated, Cov(At, Xt−h) = 0 for all h > 0. Also, we see Cov(At, Xt) = V(At) = Σ.

Also, we see that Xt depends on the jth previous X (and A) through Φ^j. The process would be explosive (i.e., the variance would go to infinity) unless Φ^j → 0 as j → ∞. This will be guaranteed if all eigenvalues of Φ are less than 1 in modulus. (Remember this?)

Also, just as in the univariate case, we have the recursion

Γh = ΦΓh−1,

from which we get

Γh = Φ^h Γ0.
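These properties can be checked numerically. For a stationary VAR(1), Γ0 satisfies Γ0 = ΦΓ0ΦT + Σ, a discrete Lyapunov equation (that identity is standard, though not derived on these slides); a sketch using scipy:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

Phi = np.array([[0.5, 0.1],
                [0.2, 0.4]])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])

# Gamma_0 solves Gamma_0 = Phi Gamma_0 Phi^T + Sigma.
Gamma_0 = solve_discrete_lyapunov(Phi, Sigma)

# Verify the recursion Gamma_h = Phi Gamma_{h-1}, i.e., Gamma_h = Phi^h Gamma_0.
Gamma_1 = Phi @ Gamma_0
Gamma_2 = Phi @ Gamma_1
print(np.allclose(Gamma_2, np.linalg.matrix_power(Phi, 2) @ Gamma_0))  # True
```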



VAR(p) Processes and Models

A VAR(p) model, for p > 0, is

Xt = φ0 + Φ1Xt−1 + · · · + ΦpXt−p + At,

where Xt, φ0, and Xt−i are k-vectors, Φ1, . . . , Φp are k × k matrices, with Φp ≠ 0, and {At} is a sequence of serially uncorrelated k-vectors with 0 mean and constant positive definite variance-covariance matrix Σ.

We also can write this using the back-shift operator as

(I − Φ1B − · · · − ΦpB^p)Xt = φ0 + At,

or

Φ(B)Xt = φ0 + At.

We also can work out the autocovariance for a VAR(p) process:

Γh = Φ1Γh−1 + · · · + ΦpΓh−p.

These are the multivariate Yule-Walker equations.



The Yule-Walker Equations

Let’s just consider an AR(p) model. We have worked out the autocovariance function of an AR(p) model. It is

γ(h) = φ1γ(h − 1) + · · · + φpγ(h − p)

and

σA² = γ(0) − φ1γ(1) − · · · − φpγ(p).



The Yule-Walker Equations

The equations involving the autocovariance function are called the Yule-Walker equations. There are p such equations, for h = 1, . . . , p.

For an AR(p) process that yields the two sets of equations on the previous slide, we can write them in matrix notation as

Γpφ = γp

and

σA² = γ(0) − φTγp.



Yule-Walker Estimation of the AR Parameters

After we compute the sample autocovariance function for a given set of observations, we merely solve the Yule-Walker equations to get our estimators:

φ̂ = Γ̂p⁻¹γ̂p

and

σ̂A² = γ̂(0) − φ̂Tγ̂p.

Instead of using the sample autocovariance function, we usually use the sample ACF:

φ̂ = R̂p⁻¹ρ̂p.
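A sketch of the whole procedure for a univariate series (numpy assumed; the function name is mine, and the divisor n is used as above):

```python
import numpy as np

def yule_walker(x, p):
    """Estimate AR(p) coefficients by solving R_hat_p phi = rho_hat_p,
    where R_hat_p is the p x p Toeplitz matrix of sample autocorrelations."""
    n = len(x)
    xc = x - x.mean()
    # Sample autocovariances gamma_hat(0), ..., gamma_hat(p), divisor n.
    gamma = np.array([xc[h:] @ xc[:n - h] / n for h in range(p + 1)])
    rho = gamma / gamma[0]                              # sample ACF
    R = np.array([[rho[abs(i - j)] for j in range(p)] for i in range(p)])
    phi_hat = np.linalg.solve(R, rho[1:p + 1])
    sigma2_hat = gamma[0] - phi_hat @ gamma[1:p + 1]    # noise variance
    return phi_hat, sigma2_hat
```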



Large Sample Properties of the Yule-Walker Estimators

A result that can be shown for the Yule-Walker estimator φ̂ is

√n (φ̂ − φ) →d Np(0, σA² Γp⁻¹)

and

σ̂A² →p σA².



Yule-Walker Equation for a VAR(p) Model

The multivariate Yule-Walker equations,

Γh = Φ1Γh−1 + · · · + ΦpΓh−p,

can also be used in estimation. They are often expressed in the correlation form

Rh = Υ1Rh−1 + · · · + ΥpRh−p,

where Υi = D^(−1/2) Φi D^(1/2).



Companion Matrix

We can sometimes get a better understanding of a k-dimensional VAR(p) process by writing it as a kp-dimensional VAR(1). Taking φ0 = 0 and stacking Yt = (Xt−p+1, . . . , Xt) and Bt = (0, . . . , 0, At), it is

Yt = Φ∗Yt−1 + Bt,

where

Φ∗ = [ 0     I     0    · · ·   0
       0     0     I    · · ·   0
       ...
       0     0     0    · · ·   I
       Φp   Φp−1  Φp−2  · · ·  Φ1 ].

The matrix Φ∗ is sometimes called the companion matrix.
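A sketch constructing Φ∗ from Φ1, . . . , Φp and checking the eigenvalue condition for stationarity (all eigenvalues of Φ∗ inside the unit circle; the function name and example matrices are mine):

```python
import numpy as np

def companion_matrix(Phis):
    """Stack k x k matrices Phi_1, ..., Phi_p into the kp x kp companion
    matrix with identity blocks on the superdiagonal and
    Phi_p, ..., Phi_1 in the last block row."""
    p = len(Phis)
    k = Phis[0].shape[0]
    C = np.zeros((k * p, k * p))
    C[:k * (p - 1), k:] = np.eye(k * (p - 1))   # superdiagonal identities
    for i, Phi in enumerate(reversed(Phis)):    # Phi_p first, Phi_1 last
        C[k * (p - 1):, k * i:k * (i + 1)] = Phi
    return C

Phis = [np.array([[0.5, 0.1], [0.2, 0.4]]),     # Phi_1
        np.array([[0.1, 0.0], [0.0, 0.1]])]     # Phi_2
C = companion_matrix(Phis)
print(np.max(np.abs(np.linalg.eigvals(C))) < 1) # True: stationary
```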
