Lecture 16: State Space Model and Kalman Filter
Bus 41910, Time Series Analysis, Mr. R. Tsay

    A state space model consists of two equations:

$$S_{t+1} = F S_t + G e_{t+1}, \quad (1)$$
$$Z_t = H S_t + \epsilon_t, \quad (2)$$

where $S_t$ is a state vector of dimension $m$, $Z_t$ is the observed time series, $F, G, H$ are matrices of parameters, and $\{e_t\}$ and $\{\epsilon_t\}$ are iid random vectors satisfying

$$E(e_t) = 0, \quad E(\epsilon_t) = 0, \quad \mathrm{Cov}(e_t) = Q, \quad \mathrm{Cov}(\epsilon_t) = R,$$

and $\{e_t\}$ and $\{\epsilon_t\}$ are independent. In the engineering literature, a state vector denotes the unobservable vector that describes the status of the system. Thus, a state vector can be thought of as a vector that contains the necessary information to predict future observations (i.e., to produce minimum mean squared error forecasts).
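As a concrete illustration, here is a minimal simulation of the model in (1)-(2), assuming scalar innovations $e_t$ and $\epsilon_t$ for simplicity; the function name `simulate_ss` and all parameter values are ours, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_ss(F, G, H, Q, R, n):
    """Simulate S_{t+1} = F S_t + G e_{t+1}, Z_t = H S_t + eps_t,
    with scalar e_t ~ N(0, Q) and eps_t ~ N(0, R)."""
    m = F.shape[0]
    S = np.zeros(m)                                   # initial state
    Z = np.empty(n)
    for t in range(n):
        Z[t] = H @ S + rng.normal(scale=np.sqrt(R))   # observation equation (2)
        S = F @ S + G * rng.normal(scale=np.sqrt(Q))  # state equation (1)
    return Z

# A one-dimensional example; the parameter values are arbitrary.
Z = simulate_ss(np.array([[0.7]]), np.array([1.0]), np.array([1.0]),
                Q=1.0, R=0.25, n=200)
```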

Remark: In some applications, a state space model is written as

$$S_{t+1} = F S_t + G e_t, \quad Z_t = H S_t + \epsilon_t.$$

Is there any difference between the two parameterizations? In this course, $Z_t$ is a scalar and $F, G, H$ are constant. A general state space model in fact allows for vector time series and time-varying parameters. Also, the independence requirement between $e_t$ and $\epsilon_t$ can be relaxed so that $e_{t+1}$ and $\epsilon_t$ are correlated. To appreciate the above state space model, we first consider its relation with ARMA models. The basic relations are:

an ARMA model can be put into a state space form in infinitely many ways;

    for a given state space model in (1)-(2), there is an ARMA model.

A. State space model to ARMA model:
The key here is the Cayley–Hamilton theorem, which says that for any $m \times m$ matrix $F$ with characteristic equation

$$c(\lambda) = |F - \lambda I| = \lambda^m + \alpha_1 \lambda^{m-1} + \alpha_2 \lambda^{m-2} + \cdots + \alpha_{m-1} \lambda + \alpha_m,$$

we have $c(F) = 0$. In other words, the matrix $F$ satisfies its own characteristic equation, i.e.,

$$F^m + \alpha_1 F^{m-1} + \alpha_2 F^{m-2} + \cdots + \alpha_{m-1} F + \alpha_m I = 0.$$
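A quick numerical check of the Cayley–Hamilton theorem (a sketch; the matrix entries are arbitrary, and `np.poly` returns the coefficients of $|\lambda I - F|$, which has the same roots as $|F - \lambda I|$):

```python
import numpy as np

F = np.array([[0.5, 0.3],
              [1.0, 0.0]])                    # arbitrary 2x2 matrix
coef = np.poly(F)                             # [1, alpha_1, alpha_2] of |lambda I - F|
# c(F) = F^2 + alpha_1 F + alpha_2 I should be (numerically) the zero matrix
cF = sum(c * np.linalg.matrix_power(F, len(coef) - 1 - k)
         for k, c in enumerate(coef))
print(np.allclose(cF, np.zeros_like(F)))      # True
```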


    Next, from the state transition equation, we have

$$\begin{aligned}
S_t &= S_t \\
S_{t+1} &= F S_t + G e_{t+1} \\
S_{t+2} &= F^2 S_t + F G e_{t+1} + G e_{t+2} \\
S_{t+3} &= F^3 S_t + F^2 G e_{t+1} + F G e_{t+2} + G e_{t+3} \\
&\vdots \\
S_{t+m} &= F^m S_t + F^{m-1} G e_{t+1} + \cdots + F G e_{t+m-1} + G e_{t+m}.
\end{aligned}$$

Multiplying the above equations by $\alpha_m, \alpha_{m-1}, \ldots, \alpha_1, 1$, respectively, and summing, we obtain

$$S_{t+m} + \alpha_1 S_{t+m-1} + \cdots + \alpha_{m-1} S_{t+1} + \alpha_m S_t = G e_{t+m} + \alpha_1 G e_{t+m-1} + \cdots + \alpha_{m-1} G e_{t+1}. \quad (3)$$

In the above, we have used the fact that $c(F) = 0$. Finally, two cases are possible. First, assume that there is no observational noise, i.e., $\epsilon_t = 0$ for all $t$ in (2). Then, by multiplying equation (3) on the left by $H$ and using $Z_t = H S_t$, we have

$$Z_{t+m} + \alpha_1 Z_{t+m-1} + \cdots + \alpha_{m-1} Z_{t+1} + \alpha_m Z_t = a_{t+m} - \theta_1 a_{t+m-1} - \cdots - \theta_{m-1} a_{t+1},$$

where $a_t = H G e_t$ (so that $\theta_i = -\alpha_i$ here). This is an ARMA($m$, $m-1$) model. The second possibility is that there is observational noise. Then, the same argument gives

$$(1 + \alpha_1 B + \cdots + \alpha_m B^m)(Z_{t+m} - \epsilon_{t+m}) = (1 - \theta_1 B - \cdots - \theta_{m-1} B^{m-1}) a_{t+m}.$$

By combining $\epsilon_t$ with $a_t$, the above equation is an ARMA($m$, $m$) model.

B. ARMA model to state space model:
We begin the discussion with some simple examples. Three general approaches will be given later.

Example 1: Consider the AR(2) model

$$Z_t = \phi_1 Z_{t-1} + \phi_2 Z_{t-2} + a_t.$$

For such an AR(2) process, to compute the forecasts we need $Z_{t-1}$ and $Z_{t-2}$. Therefore, it is easily seen that

$$\begin{pmatrix} Z_{t+1} \\ Z_t \end{pmatrix} = \begin{pmatrix} \phi_1 & \phi_2 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} Z_t \\ Z_{t-1} \end{pmatrix} + \begin{pmatrix} 1 \\ 0 \end{pmatrix} a_{t+1}$$

and

$$Z_t = [1, 0] S_t,$$


where $S_t = (Z_t, Z_{t-1})'$ and there is no observational noise.
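To make Example 1 concrete, here is a short sketch (with arbitrary, stationary coefficient values of our choosing) that simulates the state space form; by construction each simulated $Z_t$ obeys the AR(2) recursion:

```python
import numpy as np

phi1, phi2 = 0.6, 0.3                  # illustrative AR(2) coefficients
F = np.array([[phi1, phi2],
              [1.0,  0.0]])
G = np.array([1.0, 0.0])
H = np.array([1.0, 0.0])

rng = np.random.default_rng(0)
S = np.zeros(2)                        # S_t = (Z_t, Z_{t-1})'
Z, A = [], []
for _ in range(500):
    a = rng.normal()
    S = F @ S + G * a                  # state equation, no observation noise
    Z.append(H @ S)                    # Z_t = [1, 0] S_t
    A.append(a)

# Verify Z_t = phi1*Z_{t-1} + phi2*Z_{t-2} + a_t on the simulated path.
Z, A = np.array(Z), np.array(A)
assert np.allclose(Z[2:], phi1 * Z[1:-1] + phi2 * Z[:-2] + A[2:])
```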

Example 2: Consider the MA(2) model

$$Z_t = a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2}.$$

Method 1: Let $S_t = (a_{t-1}, a_{t-2})'$. Then

$$\begin{pmatrix} a_t \\ a_{t-1} \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} a_{t-1} \\ a_{t-2} \end{pmatrix} + \begin{pmatrix} 1 \\ 0 \end{pmatrix} a_t,$$

$$Z_t = [-\theta_1, -\theta_2] S_t + a_t.$$

Here the innovation $a_t$ shows up in both the state transition equation and the observation equation. The state vector is of dimension 2.

Method 2: For an MA(2) model, we have

$$Z_{t|t} = Z_t, \quad Z_{t+1|t} = -\theta_1 a_t - \theta_2 a_{t-1}, \quad Z_{t+2|t} = -\theta_2 a_t.$$

Let $S_t = (Z_t, -\theta_1 a_t - \theta_2 a_{t-1}, -\theta_2 a_t)'$. Then,

$$S_{t+1} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} S_t + \begin{pmatrix} 1 \\ -\theta_1 \\ -\theta_2 \end{pmatrix} a_{t+1}$$

and

$$Z_t = [1, 0, 0] S_t.$$

    Here the state vector is of dimension 3, but there is no observational noise.
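A sketch of Method 2 in code, with arbitrary $\theta$ values of ours; simulating the 3-dimensional state recovers the MA(2) series exactly:

```python
import numpy as np

th1, th2 = 0.4, 0.2                    # illustrative MA(2) coefficients
F = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])
G = np.array([1.0, -th1, -th2])        # G = (1, -theta_1, -theta_2)'
H = np.array([1.0, 0.0, 0.0])

rng = np.random.default_rng(1)
a = rng.normal(size=300)
S = np.zeros(3)                        # past shocks start at zero
Z = np.empty(300)
for t in range(300):
    S = F @ S + G * a[t]               # S_{t+1} = F S_t + G a_{t+1}
    Z[t] = H @ S                       # Z_t = [1, 0, 0] S_t, no observation noise

# Cross-check against the MA(2) definition Z_t = a_t - th1*a_{t-1} - th2*a_{t-2}.
direct = a.copy()
direct[1:] -= th1 * a[:-1]
direct[2:] -= th2 * a[:-2]
assert np.allclose(Z, direct)
```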

Exercise: Generalize the above result to an MA($q$) model.

Next, we consider three general approaches.

Akaike's approach: For an ARMA($p$, $q$) process, let $m = \max\{p, q+1\}$, $\phi_i = 0$ for $i > p$, and $\theta_j = 0$ for $j > q$. Define $S_t = (Z_t, Z_{t+1|t}, Z_{t+2|t}, \ldots, Z_{t+m-1|t})'$, where $Z_{t+\ell|t}$ is the conditional expectation of $Z_{t+\ell}$ given $F_t = \{Z_t, Z_{t-1}, \ldots\}$. By using the updating equation of forecasts (recall what we discussed before)

$$Z_{t+1}(\ell - 1) = Z_t(\ell) + \psi_{\ell-1} a_{t+1},$$

it is easy to show that

$$S_{t+1} = F S_t + G a_{t+1}, \quad Z_t = [1, 0, \ldots, 0] S_t,$$


where

$$F = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ \phi_m & \phi_{m-1} & \cdots & \phi_2 & \phi_1 \end{pmatrix}, \quad G = \begin{pmatrix} 1 \\ \psi_1 \\ \psi_2 \\ \vdots \\ \psi_{m-1} \end{pmatrix}.$$

The matrix $F$ is called the companion matrix of the polynomial $1 - \phi_1 B - \cdots - \phi_m B^m$.
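The construction translates directly into code. Below is a sketch that builds $F$, $G$, $H$ in Akaike's form for a given ARMA($p$, $q$); the helper name `akaike_ss` is ours, and the $\psi$-weights are computed by the standard recursion $\psi_j = \sum_i \phi_i \psi_{j-i} - \theta_j$:

```python
import numpy as np

def akaike_ss(phi, theta):
    """Companion-form matrices for an ARMA(p, q) model, m = max(p, q+1).
    Model convention: Z_t = sum phi_i Z_{t-i} + a_t - sum theta_j a_{t-j}."""
    p, q = len(phi), len(theta)
    m = max(p, q + 1)
    ph = np.zeros(m); ph[:p] = phi              # phi_i = 0 for i > p
    th = np.zeros(m); th[:q] = theta            # theta_j = 0 for j > q
    psi = np.zeros(m); psi[0] = 1.0             # psi-weights, psi_0 = 1
    for j in range(1, m):
        psi[j] = sum(ph[i - 1] * psi[j - i] for i in range(1, j + 1)) - th[j - 1]
    F = np.zeros((m, m))
    F[:-1, 1:] = np.eye(m - 1)                  # shift rows
    F[-1, :] = ph[::-1]                         # last row: phi_m, ..., phi_1
    G = psi                                     # (1, psi_1, ..., psi_{m-1})'
    H = np.zeros(m); H[0] = 1.0                 # Z_t = [1, 0, ..., 0] S_t
    return F, G, H

F, G, H = akaike_ss([0.6, 0.3], [0.4])          # illustrative ARMA(2,1): m = 3
```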

Aoki's Method: This is a two-step procedure. First, consider the MA($q$) part. Letting $W_t = a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q}$, we have

$$\begin{pmatrix} a_t \\ a_{t-1} \\ \vdots \\ a_{t-q+1} \end{pmatrix} = \begin{pmatrix} 0 & 0 & \cdots & 0 & 0 \\ 1 & 0 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix} \begin{pmatrix} a_{t-1} \\ a_{t-2} \\ \vdots \\ a_{t-q} \end{pmatrix} + \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} a_t,$$

$$W_t = [-\theta_1, -\theta_2, \ldots, -\theta_q] S_t + a_t.$$

In the next step, we use the usual AR($p$) format for

$$Z_t - \phi_1 Z_{t-1} - \cdots - \phi_p Z_{t-p} = W_t.$$

Consequently, define the state vector as

$$S_t = (Z_{t-1}, Z_{t-2}, \ldots, Z_{t-p}, a_{t-1}, \ldots, a_{t-q})'.$$

Then, we have

$$\begin{pmatrix} Z_t \\ Z_{t-1} \\ \vdots \\ Z_{t-p+1} \\ a_t \\ a_{t-1} \\ \vdots \\ a_{t-q+1} \end{pmatrix} = \begin{pmatrix} \phi_1 & \phi_2 & \cdots & \phi_p & -\theta_1 & \cdots & -\theta_q \\ 1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \ddots & & \vdots & \vdots & & \vdots \\ 0 & \cdots & 1 & 0 & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 0 & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & 0 & \cdots & 1 & 0 \end{pmatrix} \begin{pmatrix} Z_{t-1} \\ Z_{t-2} \\ \vdots \\ Z_{t-p} \\ a_{t-1} \\ a_{t-2} \\ \vdots \\ a_{t-q} \end{pmatrix} + \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} a_t$$

and

$$Z_t = [\phi_1, \ldots, \phi_p, -\theta_1, \ldots, -\theta_q] S_t + a_t.$$
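A corresponding sketch for Aoki's form (the helper name `aoki_ss` is ours; it assumes $p \ge 1$ and $q \ge 1$):

```python
import numpy as np

def aoki_ss(phi, theta):
    """Aoki-form matrices with state S_t = (Z_{t-1},...,Z_{t-p}, a_{t-1},...,a_{t-q})'."""
    p, q = len(phi), len(theta)
    top = np.concatenate([phi, -np.asarray(theta, dtype=float)])
    F = np.zeros((p + q, p + q))
    F[0, :] = top                          # Z_t = sum phi_i Z_{t-i} - sum theta_j a_{t-j} (+ a_t via G)
    F[1:p, 0:p - 1] = np.eye(p - 1)        # shift the Z-block down
    F[p + 1:, p:p + q - 1] = np.eye(q - 1) # shift the a-block down (row p is just a_t)
    G = np.zeros(p + q)
    G[0], G[p] = 1.0, 1.0                  # a_t enters the Z-block and the a-block
    H = top                                # Z_t = [phi_1,...,phi_p, -theta_1,...,-theta_q] S_t + a_t
    return F, G, H

F, G, H = aoki_ss([0.6, 0.3], [0.4])       # illustrative ARMA(2,1)
```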

Third approach: The third method is used by some authors, e.g., Harvey and his associates. Consider an ARMA($p$, $q$) model

$$Z_t = \sum_{i=1}^{p} \phi_i Z_{t-i} + a_t - \sum_{j=1}^{q} \theta_j a_{t-j}.$$


Let $m = \max\{p, q\}$. Define $\phi_i = 0$ for $i > p$ and $\theta_j = 0$ for $j > q$. The model can be written as

$$Z_t = \sum_{i=1}^{m} \phi_i Z_{t-i} + a_t - \sum_{i=1}^{m} \theta_i a_{t-i}.$$

Using $\psi(B) = \theta(B)/\phi(B)$, we can obtain the $\psi$-weights of the model by equating coefficients of $B^j$ in the equation

$$(1 - \theta_1 B - \cdots - \theta_m B^m) = (1 - \phi_1 B - \cdots - \phi_m B^m)(\psi_0 + \psi_1 B + \cdots + \psi_m B^m + \cdots),$$

where $\psi_0 = 1$. In particular, considering the coefficient of $B^m$, we have

$$-\theta_m = -\phi_m \psi_0 - \phi_{m-1} \psi_1 - \cdots - \phi_1 \psi_{m-1} + \psi_m.$$

Consequently,

$$\psi_m = \sum_{i=1}^{m} \phi_i \psi_{m-i} - \theta_m. \quad (4)$$
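As a quick check of (4), take an ARMA(1,1) model, so that $m = 1$: the formula gives $\psi_1 = \phi_1 \psi_0 - \theta_1 = \phi_1 - \theta_1$, which agrees with the direct expansion $(1 - \theta_1 B)/(1 - \phi_1 B) = 1 + (\phi_1 - \theta_1) B + \phi_1 (\phi_1 - \theta_1) B^2 + \cdots$.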

Next, from the $\psi$-weight representation

$$Z_{t+m-i} = a_{t+m-i} + \psi_1 a_{t+m-i-1} + \psi_2 a_{t+m-i-2} + \cdots,$$

we obtain

$$Z_{t+m-i|t} = \psi_{m-i} a_t + \psi_{m-i+1} a_{t-1} + \psi_{m-i+2} a_{t-2} + \cdots$$
$$Z_{t+m-i|t-1} = \psi_{m-i+1} a_{t-1} + \psi_{m-i+2} a_{t-2} + \cdots.$$

Consequently,

$$Z_{t+m-i|t} = Z_{t+m-i|t-1} + \psi_{m-i} a_t, \quad m - i > 0. \quad (5)$$

We are ready to set up a state space model. Define $S_t = (Z_{t|t-1}, Z_{t+1|t-1}, \ldots, Z_{t+m-1|t-1})'$. Using $Z_t = Z_{t|t-1} + a_t$, the observational equation is

$$Z_t = [1, 0, \ldots, 0] S_t + a_t.$$

The state-transition equation can be obtained from Equations (5) and (4). First, for the first $m - 1$ elements of $S_{t+1}$, Equation (5) applies. Second, for the last element of $S_{t+1}$, the model implies

$$Z_{t+m|t} = \sum_{i=1}^{m} \phi_i Z_{t+m-i|t} - \theta_m a_t.$$

Using Equation (5), we have

$$\begin{aligned}
Z_{t+m|t} &= \sum_{i=1}^{m} \phi_i (Z_{t+m-i|t-1} + \psi_{m-i} a_t) - \theta_m a_t \\
&= \sum_{i=1}^{m} \phi_i Z_{t+m-i|t-1} + \Big(\sum_{i=1}^{m} \phi_i \psi_{m-i} - \theta_m\Big) a_t \\
&= \sum_{i=1}^{m} \phi_i Z_{t+m-i|t-1} + \psi_m a_t,
\end{aligned}$$


where the last equality uses Equation (4). Consequently, the state-transition equation is

$$S_{t+1} = F S_t + G a_t,$$

where

$$F = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ \phi_m & \phi_{m-1} & \cdots & \phi_2 & \phi_1 \end{pmatrix}, \quad G = \begin{pmatrix} \psi_1 \\ \psi_2 \\ \psi_3 \\ \vdots \\ \psi_m \end{pmatrix}.$$

Note that for this third state space model, the dimension of the state vector is $m = \max\{p, q\}$, which may be lower than that of Akaike's approach. However, the innovations to both the state-transition and observational equations are $a_t$.
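In code, this third form differs from the Akaike sketch above only in the state dimension and in $G$, which now starts at $\psi_1$; the helper name `harvey_ss` is ours:

```python
import numpy as np

def harvey_ss(phi, theta):
    """Harvey-style matrices, m = max(p, q); S_t stacks Z_{t|t-1},...,Z_{t+m-1|t-1}."""
    p, q = len(phi), len(theta)
    m = max(p, q)
    ph = np.zeros(m); ph[:p] = phi
    th = np.zeros(m); th[:q] = theta
    psi = np.zeros(m + 1); psi[0] = 1.0        # psi_0 = 1; recursion behind Eq. (4)
    for j in range(1, m + 1):
        psi[j] = sum(ph[i - 1] * psi[j - i] for i in range(1, j + 1)) - th[j - 1]
    F = np.zeros((m, m))
    F[:-1, 1:] = np.eye(m - 1)                 # Eq. (5): shift the forecasts
    F[-1, :] = ph[::-1]                        # last row: phi_m, ..., phi_1
    G = psi[1:]                                # G = (psi_1, ..., psi_m)'
    H = np.zeros(m); H[0] = 1.0                # Z_t = [1, 0, ..., 0] S_t + a_t
    return F, G, H

F, G, H = harvey_ss([0.6, 0.3], [0.4])         # ARMA(2,1): m = 2 here, versus 3 above
```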

Kalman Filter

The Kalman filter is a set of recursive equations that allows us to update the information in a state space model. It basically decomposes an observation into a conditional mean and a predictive residual sequentially. Thus, it has wide applications in statistical analysis.

The simplest way to derive the Kalman recursion is to use the normality assumption. It should be pointed out, however, that the recursion is a result of the least squares principle (or projection), not normality. Thus, the recursion continues to hold in the non-normal case. The only difference is that the solution obtained is then optimal only within the class of linear solutions. With normality, the solution is optimal among all possible solutions (linear and nonlinear). Under normality, we have

that a normal prior plus a normal likelihood results in a normal posterior,

that if the random vector $(X, Y)'$ is jointly normal,

$$\begin{pmatrix} X \\ Y \end{pmatrix} \sim N\left(\begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}, \begin{pmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{pmatrix}\right),$$

then the conditional distribution of $X$ given $Y = y$ is normal,

$$X \mid Y = y \sim N[\mu_x + \Sigma_{xy} \Sigma_{yy}^{-1} (y - \mu_y),\ \Sigma_{xx} - \Sigma_{xy} \Sigma_{yy}^{-1} \Sigma_{yx}].$$
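The second result is easy to verify numerically; a small Monte Carlo sketch (all numbers are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
draws = rng.multivariate_normal(mu, Sigma, size=200_000)
x, y = draws[:, 0], draws[:, 1]

y0 = -2.5                                 # condition on Y near y0
near = np.abs(y - y0) < 0.05

cond_mean = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (y0 - mu[1])   # mu_x + S_xy S_yy^{-1}(y0 - mu_y)
cond_var = Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1]        # S_xx - S_xy S_yy^{-1} S_yx
print(x[near].mean(), cond_mean)          # both approximately 0.6
print(x[near].var(), cond_var)            # both approximately 1.36
```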

Using these two results, we are ready to derive the Kalman filter. In what follows, let $P_{t+j|t}$ be the conditional covariance matrix of $S_{t+j}$ given $\{Z_t, Z_{t-1}, \ldots\}$ for $j \geq 0$, and let $S_{t+j|t}$ be the conditional mean of $S_{t+j}$ given $\{Z_t, Z_{t-1}, \ldots\}$.

First, by the state space model, we have

$$S_{t+1|t} = F S_{t|t} \quad (6)$$


$$Z_{t+1|t} = H S_{t+1|t} \quad (7)$$
$$P_{t+1|t} = F P_{t|t} F' + G Q G' \quad (8)$$
$$V_{t+1|t} = H P_{t+1|t} H' + R \quad (9)$$
$$C_{t+1|t} = H P_{t+1|t} \quad (10)$$

where $V_{t+1|t}$ is the conditional variance of $Z_{t+1}$ given $\{Z_t, Z_{t-1}, \ldots\}$ and $C_{t+1|t}$ denotes the conditional covariance between $Z_{t+1}$ and $S_{t+1}$. Next, consider the joint conditional distribution of $S_{t+1}$ and $Z_{t+1}$. The above results give

$$\begin{pmatrix} S_{t+1} \\ Z_{t+1} \end{pmatrix} \bigg|\, F_t \sim N\left(\begin{pmatrix} S_{t+1|t} \\ Z_{t+1|t} \end{pmatrix}, \begin{pmatrix} P_{t+1|t} & P_{t+1|t} H' \\ H P_{t+1|t} & H P_{t+1|t} H' + R \end{pmatrix}\right).$$

Finally, when $Z_{t+1}$ becomes available, we may use the property of normality to update the distribution of $S_{t+1}$. More specifically,

$$S_{t+1|t+1} = S_{t+1|t} + P_{t+1|t} H' [H P_{t+1|t} H' + R]^{-1} (Z_{t+1} - Z_{t+1|t}) \quad (11)$$

and

$$P_{t+1|t+1} = P_{t+1|t} - P_{t+1|t} H' [H P_{t+1|t} H' + R]^{-1} H P_{t+1|t}. \quad (12)$$

Obviously,

$$r_{t+1|t} = Z_{t+1} - Z_{t+1|t} = Z_{t+1} - H S_{t+1|t}$$

is the predictive residual for time point $t + 1$. The updating equation in (11) says that when the predictive residual $r_{t+1|t}$ is non-zero, there is new information about the system, so the state vector should be modified. The contribution of $r_{t+1|t}$ to the state vector, of course, needs to be weighted by the variance of $r_{t+1|t}$ and the conditional covariance matrix of $S_{t+1}$. In summary, the Kalman filter consists of the following equations:

    Prediction: (6), (7), (8) and (9)

    Updating: (11) and (12).

In practice, one starts with initial prior information $S_{0|0}$ and $P_{0|0}$, then predicts $Z_{1|0}$ and $V_{1|0}$. Once the observation $Z_1$ is available, one uses the updating equations to compute $S_{1|1}$ and $P_{1|1}$, which in turn serve as the prior for the next observation. This is the Kalman recursion.
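A compact sketch of the full recursion, assuming scalar $Z_t$ and a scalar state innovation $e_t$ with variance $Q$ (so the state noise covariance is $Q\,GG'$); the function name and return convention are ours:

```python
import numpy as np

def kalman_filter(Z, F, G, H, Q, R, S0, P0):
    """Run the Kalman recursion, Eqs. (6)-(12), over a scalar series Z.
    Returns filtered means S_{t|t}, covariances P_{t|t}, and residuals r_{t|t-1}."""
    S, P = S0.copy(), P0.copy()
    S_filt, P_filt, resid = [], [], []
    for z in Z:
        # Prediction step: Eqs. (6)-(9)
        S_pred = F @ S                               # (6)
        P_pred = F @ P @ F.T + Q * np.outer(G, G)    # (8)
        z_pred = H @ S_pred                          # (7)
        V = H @ P_pred @ H + R                       # (9), a scalar here
        # Updating step: Eqs. (11)-(12)
        K = P_pred @ H / V                           # Kalman gain P H' V^{-1}
        r = z - z_pred                               # predictive residual r_{t|t-1}
        S = S_pred + K * r                           # (11)
        P = P_pred - np.outer(K, H @ P_pred)         # (12)
        S_filt.append(S); P_filt.append(P); resid.append(r)
    return np.array(S_filt), np.array(P_filt), np.array(resid)
```

For instance, the AR(2) matrices from Example 1 can be passed directly as `F`, `G`, `H`, with `R` set to a small positive value so that $V$ stays invertible.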

Applications of Kalman Filter

The Kalman filter has many applications. They are often classified as filtering, prediction, and smoothing. Let $F_{t-1}$ be the information available at time $t - 1$, i.e., $F_{t-1} = \sigma\text{-field}\{Z_{t-1}, Z_{t-2}, \ldots\}$.

Filtering: make inference on $S_t$ given $F_t$.

Prediction: draw inference about $S_{t+h}$ with $h > 0$, given $F_t$.


Smoothing: make inference about $S_t$ given the data $F_T$, where $T \geq t$ is the sample size.

We shall briefly discuss some of the applications. A good reference is Chapter 11 of Tsay (2005, 2nd ed.).
