Stochastic Models, Estimation, And Control Volume 3


Transcript of Stochastic Models, Estimation, And Control Volume 3

Page 1: Stochastic Models, Estimation, And Control Volume 3

Stochastic models, estimation, and control

VOLUME 2


This is Volume 141-2 in MATHEMATICS IN SCIENCE AND ENGINEERING, A Series of Monographs and Textbooks, Edited by RICHARD BELLMAN, University of Southern California

"{he "om\llete l\'i.t\t\'g, ot boo¥..'i. \\\ \\\\'1> '1>e!\e'1> \'1> a'la\lab\e hom t\\e \lub\\'!>\\etupon request.


Stochastic models, estimation, and control

VOLUME 2

PETER S. MAYBECK
DEPARTMENT OF ELECTRICAL ENGINEERING

AIR FORCE INSTITUTE OF TECHNOLOGY

WRIGHT-PATTERSON AIR FORCE BASE

OHIO

1982

ACADEMIC PRESS

A Subsidiary of Harcourt Brace Jovanovich, Publishers

New York  London  Paris  San Diego  San Francisco  São Paulo  Sydney  Tokyo  Toronto


COPYRIGHT © 1982, BY ACADEMIC PRESS, INC. ALL RIGHTS RESERVED. NO PART OF THIS PUBLICATION MAY BE REPRODUCED OR TRANSMITTED IN ANY FORM OR BY ANY MEANS, ELECTRONIC OR MECHANICAL, INCLUDING PHOTOCOPY, RECORDING, OR ANY INFORMATION STORAGE AND RETRIEVAL SYSTEM, WITHOUT PERMISSION IN WRITING FROM THE PUBLISHER.

ACADEMIC PRESS, INC.
111 Fifth Avenue, New York, New York 10003

United Kingdom Edition published by ACADEMIC PRESS, INC. (LONDON) LTD.
24/28 Oval Road, London NW1 7DX

Library of Congress Cataloging in Publication Data

Maybeck, Peter S.
Stochastic models, estimation, and control.

(Mathematics in science and engineering)
Includes bibliographies and index.
1. System analysis. 2. Control theory. 3. Estimation theory. 4. Stochastic processes. I. Title. II. Series.
QA402.M37  519.2  78-8836
ISBN 0-12-480702-X (v. 2)  AACR1

PRINTED IN THE UNITED STATES OF AMERICA

82 83 84 85  9 8 7 6 5 4 3 2 1


To Kristen and Keryn . . . the "other women" in my life


Contents

Preface    xi
Notation    xiii

VOLUME 1

Chapter 1 Introduction

Chapter 2 Deterministic system models

Chapter 3 Probability theory and static models

Chapter 4 Stochastic processes and linear dynamic system models

Chapter 5 Optimal filtering with linear system models

Chapter 6 Design and performance analysis of Kalman filters

Chapter 7 Square root filtering

VOLUME 2

Chapter 8 Optimal smoothing

8.1 Introduction    1
8.2 Basic Structure    2
8.3 Three Classes of Smoothing Problems    3
8.4 Fixed-Interval Smoothing    5
8.5 Fixed-Point Smoothing    15
8.6 Fixed-Lag Smoothing    16
8.7 Summary    17
References    18
Problems    19


Chapter 9 Compensation of linear model inadequacies

9.1 Introduction    23
9.2 Pseudonoise Addition and Artificial Lower Bounding of P    24
9.3 Limiting Effective Filter Memory and Overweighting Most Recent Data    28
9.4 Finite Memory Filtering    33
9.5 Linearized and Extended Kalman Filters    39
9.6 Summary    59
References    59
Problems    62

Chapter 10 Parameter uncertainties and adaptive estimation

10.110.210.3lOA10.510.610.710.810.910.1010.11

IntroductionProblem FormulationUncertainties in cit and Bd : Likelihood EquationsUncertainties in cit and Bd : Full-Scale EstimatorUncertainties in cit and Bd : Performance AnalysisUncertainties in cit and Bd : Attaining Online ApplicabilityUncertainties in Qd and RBayesian and Multiple Model Filtering AlgorithmsCorrelation Methods for Self-Tuning: Residual "Whitening"Covariance Matching and Other TechniquesSummaryReferencesProblems

6870748096

lot120129136141143144151

Chapter 11 Nonlinear stochastic system models

11.1 Introduction11.2 Extensions of Linear System Modeling11.3 Markov Process Fundamentals1104 ItO Stochastic Integrals and Differentials11.5 ItO Stochastic Differential Equations11.6 Forward Kolmogorov Equation11.7 Summary

ReferencesProblems

Chapter 12 Nonlinear estimation

159160167175181192202202205

12.1 Introduction    212
12.2 Nonlinear Filtering with Discrete-Time Measurements: Conceptually    213
12.3 Conditional Moment Estimators    215
12.4 Conditional Quasi-Moments and Hermite Polynomial Series    239
12.5 Conditional Mode Estimators    241
12.6 Statistically Linearized Filter    243


12.7 Nonlinear Filtering with Continuous-Time Measurements    245
12.8 Summary    257
References    259
Problems    265

Index    273

VOLUME 3

Chapter 13 Dynamic programming and stochastic control

Chapter 14 Linear stochastic controller design and performance analysis

Chapter 15 Nonlinear stochastic controllers


Preface

As was true of Volume 1, the purpose of this book is twofold. First, it attempts to develop a thorough understanding of the fundamental concepts incorporated in stochastic processes, estimation, and control. Second, and of equal importance, it provides experience and insights into applying the theory to realistic practical problems. Basically, it investigates the theory and derives from it the tools required to reach the ultimate objective of systematically generating effective designs for estimators and stochastic controllers for operational implementation.

Perhaps most importantly, the entire text follows the basic principles of Volume 1 and concentrates on presenting material in the most lucid, best motivated, and most easily grasped manner. It is oriented toward an engineer or an engineering student, and it is intended both to be a textbook from which a reader can learn about estimation and stochastic control and to provide a good reference source for those who are deeply immersed in these areas. As a result, considerable effort is expended to provide graphical representations, physical interpretations and justifications, geometrical insights, and practical implications of important concepts, as well as precise and mathematically rigorous development of ideas. With an eye to practicality and eventual implementation of algorithms in a digital computer, emphasis is maintained on the case of continuous-time dynamic systems with sampled-data measurements available; nevertheless, corresponding results for discrete-time dynamics or for continuous-time measurements are also presented. These algorithms are developed in detail, to the point where the various design trade-offs and performance evaluations involved in achieving an efficient, practical configuration can be understood. Many examples and problems are used throughout the text to aid comprehension of important concepts. Furthermore, there is an extensive set of references in each chapter to allow pursuit of ideas in the open literature once an understanding of both theoretical concepts and practical implementation issues has been established through the text.


This volume builds upon the foundations set in Volume 1. The seven chapters of that volume yielded linear stochastic system models driven by white Gaussian noises and the optimal Kalman filter based upon models of that form. In this volume, Chapters 8-10 extend these ideas to consider optimal smoothing in addition to filtering, compensation of linear model inadequacies while exploiting the basic insights of linear filtering (including an initial study of the important extended Kalman filter algorithm), and adaptive estimation based upon linear models in which uncertain parameters are embedded. Subsequently, Chapter 11 properly develops nonlinear stochastic system models, which then form the basis for the design of practical nonlinear estimation algorithms in Chapter 12.

This book forms a self-contained set with Volume 1, and together with Volume 3 on stochastic control, can provide a fundamental source for studying stochastic models, estimation, and control. In fact, they are an outgrowth of a three-quarter sequence of graduate courses taught at the Air Force Institute of Technology; and thus the text and problems have received thorough class testing. Students had previously taken a basic course in applied probability theory, and many had also taken a first control theory course, linear algebra, and linear system theory; but the required aspects of these disciplines have also been developed in Volume 1. The reader is assumed to have been exposed to advanced calculus, differential equations, and some vector and matrix analysis on an engineering level. Any more advanced mathematical concepts are developed within the text itself, requiring only a willingness on the part of the reader to deal with new means of conceiving a problem and its solution. Although the mathematics becomes relatively sophisticated at times, efforts are made to motivate the need for, and to stress the underlying basis of, this sophistication.

The author wishes to express his gratitude to the many students who have contributed significantly to the writing of this book through their feedback to me, in the form of suggestions, questions, encouragement, and their own personal growth. I regard it as one of God's many blessings that I have had the privilege to interact with these individuals and to contribute to their growth. The stimulation of technical discussions and association with Professors Michael Athans, John Deyst, Nils Sandell, Wallace Vander Velde, William Widnall, and Alan Willsky of the Massachusetts Institute of Technology, Professor David Kleinman of the University of Connecticut, and Professors Jurgen Gobien, James Negro, J. B. Peterson, and Stanley Robinson of the Air Force Institute of Technology has also had a profound effect on this work. I deeply appreciate the continued support provided by Dr. Robert Fontana, Chairman of the Department of Electrical Engineering at AFIT, and the painstaking care with which many of my associates have reviewed the manuscript. Finally, I wish to thank my wife, Beverly, and my children, Kristen and Keryn, without whose constant love and support this effort could not have been fruitful.


Notation

Vectors, Matrices

Scalars are denoted by upper or lower case letters in italic type.

Vectors are denoted by lower case letters in boldface type, as the vector x made up of components xi.

Matrices are denoted by upper case letters in boldface type, as the matrix A made up of elements Aij (ith row, jth column).

Random Vectors (Stochastic Processes), Realizations (Samples), and Dummy Variables

Random vectors are set in boldface sans serif type, as x(·) or frequently just as x, made up of scalar components xi: x(·) is a mapping from the sample space Ω into real Euclidean n-space Rⁿ: for each ωk ∈ Ω, x(ωk) ∈ Rⁿ.

Realizations of the random vector are set in boldface roman type, as x: x(ωk) = x.

Dummy variables (for arguments of density or distribution functions, integrations, etc.) are denoted by the equivalent Greek letter, such as ξ being associated with x: e.g., the density function fx(ξ). The correspondences are (x, ξ), (y, ρ), (z, ζ), and (Z, 𝒵).

Stochastic processes are set in boldface sans serif type, just as random vectors are. The n-vector stochastic process x(·,·) is a mapping from the product space T × Ω into Rⁿ, where T is some time set of interest: for each tj ∈ T and ωk ∈ Ω, x(tj, ωk) ∈ Rⁿ. Moreover, for each tj ∈ T, x(tj, ·) is a random vector, and for each ωk ∈ Ω, x(·, ωk) can be thought of as a particular time function and is called a sample out of the process. In analogy with random vector realizations, such samples are set in boldface roman type: x(·, ωk) = x(·) and x(tj, ωk) = x(tj).

Often the second argument of a stochastic process is suppressed: x(t, ·) is often written as x(t), and this stochastic process evaluated at time t is to be distinguished from a process sample x(t) at that same time.

Subscripts

a: augmented
b: backward running
c: continuous-time
d: discrete-time
f: final time; or filter (shaping filter)
n: nominal
ss: steady state
t: truth model
0: initial time

Superscripts

T: transpose (matrix)
*: complex conjugate transpose; or transformed coordinates
−1: inverse (matrix)
#: pseudoinverse
^: estimate
~: Fourier transform; or steady state solution

Matrix and Vector Relationships

A > 0: A is positive definite.
A ≥ 0: A is positive semidefinite.
x ≤ a: componentwise, x1 ≤ a1, x2 ≤ a2, ..., and xn ≤ an.

Commonly Used Abbreviations and Symbols

E{·}: expectation
E{·|·}: conditional expectation
exp(·): exponential
lim: limit
l.i.m.: limit in mean (square)
ln(·): natural log
m.s.: mean square
max: maximum
min: minimum
Rⁿ: Euclidean n-space
sgn(·): signum (sign of)
tr(·): trace
w.p.1: with probability of one
|·|: determinant of
‖·‖: norm of
√(·): matrix square root of (see Volume 1)
∈: element of
⊂: subset of
{·: ·}: set of; such as {x ∈ X: x ≤ a}, i.e., the set of x ∈ X such that xi ≤ ai for all i

List of symbols and pages where they are defined or first used

[Two-column table of symbols and page references, pages xv-xvi, not legibly reproduced in this transcript.]

CHAPTER 8

Optimal smoothing

8.1 INTRODUCTION

In the previous chapters, we have considered linear system models and optimal filtering, the optimal estimation of the state at time ti, x(ti, ωj) = xi, based upon knowledge of all measurements taken up to time ti:

z(t1, ωj) = z1,   z(t2, ωj) = z2,   ...,   z(ti, ωj) = zi

or equivalently, Z(ti, ωj) = Zi. We have actually considered optimal prediction as well in attempting to estimate x(ti) based on knowledge of Z(ti−1, ωj) = Zi−1. Under our assumptions, the optimal estimate of x(ti), based on knowledge of available measurement information, has been the conditional expectation of x(ti), conditioned on that information:

x̂(ti⁺) = E{x(ti) | Z(ti, ωj) = Zi}      (8-1)

x̂(ti⁻) = E{x(ti) | Z(ti−1, ωj) = Zi−1}      (8-2)

In fact, these values were shown to be optimal with respect to many different criteria.

The Kalman filter, or square root implementation of the same estimator, provides the best estimate of x(ti) based on all measurements through time ti in a recursive manner, and it is thus ideally suited to real-time computations. However, if one were willing (or able) to wait until after time ti to generate an optimal estimate of x(ti), then a better estimate than the x̂(ti⁺) provided by the Kalman filter could be produced in most cases [6, 7, 9, 23, 28]. The additional information contained in the measurements taken after time ti can be exploited to provide this improvement in estimation accuracy. The optimal smoothed estimate [36] (again under many criteria) is

x̂(ti/tj) = E{x(ti) | Z(tj, ωj) = Zj},   j > i      (8-3)

and the subject of optimal smoothing is concerned with developing efficient, practical algorithms for calculating this estimate.

Section 8.2 formulates the smoothing problem and presents a conceptual approach to smoothing of combining the outputs of a filter running forward from initial time t0 to the current time ti, and a separate filter running backward from terminal time tf to ti. Three useful classes of smoothing problems, characterized by the manner in which ti and tj can vary in (8-3), are presented in Section 8.3, and then discussed individually in the ensuing three sections.

8.2 BASIC STRUCTURE

Explicit equations for various forms of optimal smoothers are generally quite complicated. However, the basic smoothing concept and underlying structure can be discerned readily by dividing the estimation problem into two parts, one involving the past and present measurements and the other based on future measurements alone, and combining the results.

Consider a discrete-time model (possibly "equivalent discrete"):

x(ti+1) = Φ(ti+1, ti)x(ti) + Bd(ti)u(ti) + Gd(ti)wd(ti)      (8-4)

z(ti) = H(ti)x(ti) + v(ti)      (8-5)

with the usual assumptions on x(t0), wd(·,·), and v(·,·): Gaussian and independent of each other, initial conditions with mean x̂0 and covariance P0, white and zero-mean processes of strengths Qd(ti) and R(ti), respectively, for all times of interest. Now assume we are trying to estimate x(ti) from measurement data through time tj, with j > i. Put all of the measurements up through time ti into a single composite vector Z(ti), or perhaps more explicitly Z(t1, ti), denoting the fact that its partitions are z(t1), z(t2), ..., z(ti). Similarly, put all "future" measurements, z(ti+1), z(ti+2), ..., z(tj), into a single composite vector Z(ti+1, tj). Conceptually, a three-part procedure can now be employed to estimate x(ti):

(1) Calculate

x̂(ti⁺) = E{x(ti) | Z(t1, ti) = Z1,i}      (8-6)

by means of a filter running forward in time from time t0 to time ti. A priori information about x(t0) is used to initialize this filter.

(2) Independently, calculate

x̂b(ti⁻) = E{x(ti) | Z(ti+1, tj) = Z(i+1),j}      (8-7)

by means of a filter that is run backwards in time from time tj to time ti+1, plus a one-step "prediction" backward to time ti. The notation x̂b(ti⁻) is meant to denote the estimate of x(ti) provided by the backward-running filter (thus the subscript b), just before the measurement at time ti is incorporated (thus the minus superscript on ti⁻). Note that time ti⁻ is to the right of ti⁺ on


a real-time scale for the backward filter, as shown in Fig. 8.1, since minus and plus denote before and after measurement incorporation, respectively. The "initial" condition for the backward-running filter is established by viewing x(tj) as a random vector about which you have no a priori statistical information, i.e., Pb⁻¹(tj⁻) = 0. Thus, an inverse-covariance formulation is appropriate for the backward filter. (This will be developed further in Section 8.4.)

FIG. 8.1 Forward and backward filter operation.

(3) The smoothed estimate of x(ti), x̂(ti/tj) as defined in (8-3), is generated by optimally combining the value of x̂(ti⁺) from the forward filter (incorporating initial condition information about x(t0) and measurement information from z1, z2, ..., zi) and x̂b(ti⁻) from the backward filter (incorporating measurement information from zi+1, zi+2, ..., zj). This combination is accomplished by viewing x̂(ti⁺) and x̂b(ti⁻) as two separate "observations" of x(ti) and assigning relative weighting according to the confidence you have in the precision of each, indicated by P(ti⁺) and Pb(ti⁻), respectively. Another way of thinking of this process is to consider the backward filter output x̂b(ti⁻) as providing an additional "measurement" with which to update the forward filter. Note that we choose to process z(ti, ωk) = zi in the forward filter; we could just as easily have chosen to process it in the backward filter instead, as long as this data does not enter into both filters and thus be counted twice in the smoothed estimate.
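The inverse-covariance weighting described in step (3) can be sketched numerically. This is a present-day illustrative fragment, not from the text; the function name and the example numbers are assumptions.

```python
import numpy as np

def combine_estimates(x_f, P_f, x_b, P_b):
    """Fuse two independent estimates of the same state by weighting
    each with its inverse covariance, as in step (3) above."""
    P_f_inv = np.linalg.inv(P_f)
    P_b_inv = np.linalg.inv(P_b)
    P_s = np.linalg.inv(P_f_inv + P_b_inv)        # combined covariance
    x_s = P_s @ (P_f_inv @ x_f + P_b_inv @ x_b)   # combined estimate
    return x_s, P_s

# With equal confidence in both "observations", the fused estimate is
# their average and the fused covariance is halved.
x_s, P_s = combine_estimates(np.array([1.0]), np.array([[2.0]]),
                             np.array([3.0]), np.array([[2.0]]))
```

Note that neither input is preferred: the weighting is set entirely by the two covariances, exactly the confidence assignment the paragraph describes.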

8.3 THREE CLASSES OF SMOOTHING PROBLEMS

There are many different classes of smoothing problems, each being determined by the manner in which the time parameters ti and tj are allowed to vary in the desired smoothed estimate x̂(ti/tj). However, there are three classes of particular interest because of their applicability to realistic problems, namely, fixed-interval, fixed-point, and fixed-lag smoothing problems [10, 23, 24, 27, 32, 36].


To describe fixed-interval smoothing, let an experiment (system operation, mission, etc.) be conducted, and let measurement data be collected over the interval from initial time t0 to final time tf, [t0, tf]. After all of the data has been collected, it is desired to obtain the optimal estimate of x(ti) for all time ti ∈ [t0, tf], based on all measurements taken in the interval. Offline computations are thus inherently involved in generating the optimal fixed-interval smoothed estimate,

x̂(ti/tf) = E{x(ti) | Z(tf) = Zf}      (8-8)
ti = t0, t1, ..., tf;  tf = fixed final time

Figure 8.2a represents these calculations schematically. This estimation technique is used for post-experiment data reduction to obtain refined state estimates

FIG. 8.2 Three types of smoothers: (a) fixed-interval smoothing, (b) fixed-point smoothing, (c) fixed-lag smoothing.


of better quality than that provided by online filters. It is also possible to use fixed-interval smoothing to estimate values of control inputs as well as states, to assess whether the "deterministic" controls were actually of commanded magnitude. A specific example would be post-flight analysis of a missile, generating smoothed estimates of both trajectory parameters (states) and thrust actually produced by the rocket motors for that flight (controls).

To consider fixed-point smoothing, let there be a certain point (or points) in time at which the value of the system state is considered critical. For example, conditions at engine burnout time are critical to rocket booster problems. Thus, one would desire an estimate of x(ti) for fixed ti, conditioned on more and more data as measurements become available in real time:

x̂(ti/tj) = E{x(ti) | Z(tj) = Zj}      (8-9)
ti fixed;  tj = ti, ti+1, ..., tf

This is the optimal fixed-point smoothed estimate, as depicted in Fig. 8.2b.

Finally, let measurements be taken, but assume that it is admissible for your application to generate an optimal estimate of x(ti), not at time ti, but at time ti+N, where N is a fixed integer. Thus, to estimate x(ti), you have available not only the measurements

z(t1, ωk) = z1,   z(t2, ωk) = z2,   ...,   z(ti, ωk) = zi

but also the N additional measurements

z(ti+1, ωk) = zi+1,   z(ti+2, ωk) = zi+2,   ...,   z(ti+N, ωk) = zi+N

and you are willing to delay the computation of the estimate of x(ti) until ti+N

to take advantage of the additional information in these N measurements. We wish to generate the optimal fixed-lag smoothed estimate,

x̂(ti/ti+N) = E{x(ti) | Z(ti+N) = Zi+N}      (8-10)
ti = t0, t1, ..., tf−N;  N = fixed integer

Such an estimator is depicted in Fig. 8.2c and is particularly applicable to communications and telemetry data reduction.

8.4 FIXED-INTERVAL SMOOTHING

To develop the fixed-interval smoother, we shall exploit the work of Fraser [6, 7], who first showed it to be just a suitable combination of two optimal filters. Let the forward filter recursively produce a state estimate x̂(tk⁻) and error covariance P(tk⁻) before incorporation of measurement zk, and x̂(tk⁺) and P(tk⁺) after incorporation, for k = 1, 2, ..., i. Notationally, let x̂b(tk⁻) and Pb(tk⁻) denote the state estimate and error covariance before incorporating measurement zk into the backward filter, and let x̂b(tk⁺) and Pb(tk⁺) be analogous


quantities after incorporation. Because the backward filter is of inverse covariance formulation [17], it will actually incorporate zk to generate Pb⁻¹(tk⁺) and ŷb(tk⁺) = Pb⁻¹(tk⁺)x̂b(tk⁺), and then propagate backward in time to form Pb⁻¹(tk−1⁻) and ŷb(tk−1⁻) = Pb⁻¹(tk−1⁻)x̂b(tk−1⁻), for k = j, j−1, ..., i+1. Thus, we have the situation depicted in Fig. 8.1, with tj = tf. Subsequently, the optimal fixed-interval smoothed estimate and associated error covariance can be evaluated totally in terms of quantities available from the two filters as [6, 7, 34, 35]:

x̂(ti/tf) = P(ti/tf)[P⁻¹(ti⁺)x̂(ti⁺) + Pb⁻¹(ti⁻)x̂b(ti⁻)]      (8-11a)
         = P(ti/tf)[P⁻¹(ti⁺)x̂(ti⁺) + ŷb(ti⁻)]      (8-11b)

P⁻¹(ti/tf) = P⁻¹(ti⁺) + Pb⁻¹(ti⁻)      (8-12)

There are a number of ways of deriving (8-11) and (8-12), and a demonstration by analogy to previous filter results is presented here because of its simplicity. In Chapter 5, we started from an estimate x̂(ti⁻) and associated error covariance P(ti⁻) just before the measurement z(ti, ωk) = zi was incorporated, and derived the update equations as

x̂(ti⁺) = x̂(ti⁻) + K(ti)[zi − H(ti)x̂(ti⁻)]      (8-13a)
       = [I − K(ti)H(ti)]x̂(ti⁻) + K(ti)zi      (8-13b)

where the gain K(ti) is given by

K(ti) = P(ti⁻)Hᵀ(ti)[H(ti)P(ti⁻)Hᵀ(ti) + R(ti)]⁻¹      (8-14a)
      = P(ti⁺)Hᵀ(ti)R⁻¹(ti)      (8-14b)

and the covariance update as

P(ti⁺) = P(ti⁻) − K(ti)H(ti)P(ti⁻)      (8-15a)
       = [I − K(ti)H(ti)]P(ti⁻)      (8-15b)

or, equivalently,

P⁻¹(ti⁺) = P⁻¹(ti⁻) + Hᵀ(ti)R⁻¹(ti)H(ti)      (8-16)

Because it will be useful to the ensuing derivation, (8-15b) can be rearranged as

[I − K(ti)H(ti)] = P(ti⁺)P⁻¹(ti⁻)      (8-17)

Substituting this and (8-14b) into (8-13b) yields an equivalent expression for x̂(ti⁺) as:

x̂(ti⁺) = [P(ti⁺)P⁻¹(ti⁻)]x̂(ti⁻) + P(ti⁺)Hᵀ(ti)R⁻¹(ti)zi
       = P(ti⁺)[P⁻¹(ti⁻)x̂(ti⁻) + Hᵀ(ti)R⁻¹(ti)zi]      (8-18)

Now consider the analogous problem of having an estimate x̂(ti⁺) and associated P(ti⁺), based on all measurements up to and including zi, and now it is desired to update that estimate with the "measurement" x̂b(ti⁻) with an


associated "measurement error" covariance matrix Pb(t i "), Since Xb(ti- l wouldrepresent a "measurement" of the entire state, the H(til for this update wouldbe the identity matrix. When the update is performed, the resulting state esti-mate will be the optimal estimate of x(ti) based on Zi and Xb(ti - l, and sinceXb(ti-l embodies all the information about x(ti) based (solely) on the dataZi+l,Zi+2, ... ,zf' this is then the optimal estimate ofx(t;) based on ZJ' orx(tjtf)' The associated estimate error covariance after update would be P(tjtfl.Thus, for this analogy, the replacements shown in Table 8.1 can be made.

TABLE 8.1

Original filter update    Smoother relation
x̂(ti⁻)                    x̂(ti⁺)
x̂(ti⁺)                    x̂(ti/tf)
zi                         x̂b(ti⁻)
H(ti)                      I
R(ti)                      Pb(ti⁻)
P(ti⁻)                     P(ti⁺)
P(ti⁺)                     P(ti/tf)

Using these replacements, (8-18) and (8-16) become equal to (8-11) and (8-12), respectively, as desired.
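The Table 8.1 analogy can also be checked numerically: performing a Kalman update with H = I, R = Pb(ti⁻), and "measurement" x̂b(ti⁻) reproduces the direct combination (8-11a)/(8-12). A sketch (not from the text) using randomly generated symmetric positive definite covariances, all values assumed for illustration:

```python
import numpy as np

def rand_spd(rng, n):
    """Random symmetric positive definite matrix (illustrative)."""
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

rng = np.random.default_rng(0)
n = 3
P_f = rand_spd(rng, n)          # P(ti+), forward-filter covariance
P_b = rand_spd(rng, n)          # Pb(ti-), backward-filter covariance
x_f = rng.standard_normal(n)    # forward estimate x(ti+)
x_b = rng.standard_normal(n)    # backward estimate xb(ti-)

# Route 1: Kalman update (8-13)-(8-15) with H = I, R = P_b, "z" = x_b.
K = P_f @ np.linalg.inv(P_f + P_b)
x_upd = x_f + K @ (x_b - x_f)
P_upd = P_f - K @ P_f

# Route 2: direct combination per (8-11a) and (8-12).
P_s = np.linalg.inv(np.linalg.inv(P_f) + np.linalg.inv(P_b))
x_s = P_s @ (np.linalg.inv(P_f) @ x_f + np.linalg.inv(P_b) @ x_b)
```

The two routes agree term by term, which is exactly the claim that (8-18) and (8-16) turn into (8-11) and (8-12) under the replacements of Table 8.1.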

Equation (8-12) indicates that

P(ti/tf) ≤ P(ti⁺)      (8-19)

i.e., that the smoothed estimate is at least as good as the filtered estimate for all time. Generally, it will in fact be better, except at the terminal time tf, as shown graphically in Fig. 8.3 for a typical case. The conditions on the smoother at time tf, and their implications for the "initial conditions" on the backward filter, warrant further attention.

FIG. 8.3 P(ti⁺), Pb(ti⁺), P(ti/tf) for a typical estimation problem.


At terminal time tf, the optimal smoothed estimate is

x̂(tf/tf) ≜ E{x(tf) | Z(tf) = Zf}      (8-20)

By definition, this must be equivalent to the forward filter estimate x̂(tf⁺), since

x̂(tf⁺) ≜ E{x(tf) | Z(tf) = Zf}      (8-21)

Similarly, it must be true that

P(tf/tf) = P(tf⁺)      (8-22)

Since (8-12) is true for all time, let ti = tf and obtain

P⁻¹(tf/tf) = P⁻¹(tf⁺) + Pb⁻¹(tf⁻)      (8-23)

Combining (8-22) and (8-23) yields the starting condition for the backward filter:

Pb⁻¹(tf⁻) = 0      (8-24)

The boundary condition on x̂b, x̂b(tf⁻), is as yet totally unknown. However, it is finite, so

ŷb(tf⁻) = Pb⁻¹(tf⁻)x̂b(tf⁻) = 0      (8-25)

Thus, the backward filter is formulated as an inverse covariance filter, using the starting conditions (8-24) and (8-25). The forward filter may be formulated either as a conventional Kalman filter or an inverse covariance form optimal estimator, the former being more prevalent.

Once x̂(ti⁺) and x̂b(ti⁻) have been calculated in the forward and backward filters, respectively, (8-11) and (8-12) can be used to combine these results. However, it would be computationally more efficient to process P(ti/tf) directly, instead of using (8-12) to generate P⁻¹(ti/tf) and inverting for use in (8-11). Algebraic manipulation of (8-12) yields this desirable form as

P(ti/tf) = [I − W(ti)Pb⁻¹(ti⁻)]P(ti⁺)[I − W(ti)Pb⁻¹(ti⁻)]ᵀ + W(ti)Pb⁻¹(ti⁻)Wᵀ(ti)      (8-26)

where

W(ti) = P(ti⁺){[I + P(ti⁺)Pb⁻¹(ti⁻)]⁻¹}ᵀ      (8-27)

The smoothed estimate relation (8-11) is in a useful computational form if the forward filter is of inverse covariance form, since P⁻¹(ti⁺)x̂(ti⁺) would be available directly as ŷ(ti⁺) and then

x̂(ti/tf) = P(ti/tf)[ŷ(ti⁺) + ŷb(ti⁻)]      (8-28)

However, for the more typical case in which the forward filter is of conventional covariance form, a relation that does not require an inversion of the n-by-n P(ti⁺) matrix would be preferable. To achieve such a form, premultiply (8-12)


by P(ti⁺) to obtain


P(ti⁺)P⁻¹(ti/tf) = I + P(ti⁺)Pb⁻¹(ti⁻)

Inverting this expression yields

P(ti/tf)P⁻¹(ti⁺) = [I + P(ti⁺)Pb⁻¹(ti⁻)]⁻¹      (8-29)

Finally, substitution of (8-29) into (8-11b) produces

x̂(ti/tf) = [I + P(ti⁺)Pb⁻¹(ti⁻)]⁻¹x̂(ti⁺) + P(ti/tf)ŷb(ti⁻)      (8-30)

Although (8-30) requires computation of [I + P(ti⁺)Pb⁻¹(ti⁻)]⁻¹, this is the same inverse used in (8-27), so we have in fact reduced the number of inversions required.
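As a numerical check (an illustrative present-day sketch, not from the text), the reduced-inversion forms (8-26), (8-27), and (8-30) can be verified against the direct forms (8-12) and (8-11b) on randomly generated positive definite matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n))
P_plus = A @ A.T + n * np.eye(n)     # P(ti+), forward covariance
B = rng.standard_normal((n, n))
Pb_inv = B @ B.T + n * np.eye(n)     # Pb^-1(ti-), backward information
x_plus = rng.standard_normal(n)      # x(ti+)
yb = rng.standard_normal(n)          # yb(ti-)

# (8-27): the single inverse needed, [I + P(ti+) Pb^-1(ti-)]^-1
X = np.linalg.inv(np.eye(n) + P_plus @ Pb_inv)
W = P_plus @ X.T
# (8-26): smoothed covariance in a Joseph-like form, no further inverses
Y = np.eye(n) - W @ Pb_inv
P_s = Y @ P_plus @ Y.T + W @ Pb_inv @ W.T
# (8-30): smoothed estimate, reusing the same inverse X
x_s = X @ x_plus + P_s @ yb

# Reference values from the direct forms (8-12) and (8-11b).
P_ref = np.linalg.inv(np.linalg.inv(P_plus) + Pb_inv)
x_ref = P_ref @ (np.linalg.inv(P_plus) @ x_plus + yb)
```

Both pairs agree, while the reduced forms evaluate only one matrix inverse beyond those already required by the two filters.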

Let us now summarize the fixed-interval smoother. First a conventional forward filter is initialized by

x̂(t0) = x̂0      (8-31a)
P(t0) = P0      (8-31b)

Time propagations between sample times are processed by

x̂(tk+1⁻) = Φ(tk+1, tk)x̂(tk⁺) + Bd(tk)u(tk)      (8-32a)
P(tk+1⁻) = Φ(tk+1, tk)P(tk⁺)Φᵀ(tk+1, tk) + Gd(tk)Qd(tk)Gdᵀ(tk)      (8-32b)

Measurement updates are given by

K(tk) = P(tk⁻)Hᵀ(tk)[H(tk)P(tk⁻)Hᵀ(tk) + R(tk)]⁻¹      (8-33a)
x̂(tk⁺) = x̂(tk⁻) + K(tk)[zk − H(tk)x̂(tk⁻)]      (8-33b)
P(tk⁺) = P(tk⁻) − K(tk)H(tk)P(tk⁻)      (8-33c)

Equations (8-32) and (8-33) are applied iteratively for k = 1, 2, ..., i to produce x̂(ti⁺). Note that (8-33c) could be replaced by the Joseph form if desired. Square root forms are also possible, but offline processing is usually done on computers with long enough wordlength that numerics do not dictate such implementation.
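The forward-filter recursion (8-31)-(8-33) can be sketched as follows. This is a minimal present-day implementation, not from the text; holding all model matrices constant over time is an assumption made purely for brevity.

```python
import numpy as np

def forward_filter(x0, P0, Phi, Bd, Gd, Qd, H, R, us, zs):
    """Conventional Kalman filter per (8-31)-(8-33); returns the updated
    pairs (x(tk+), P(tk+)) for k = 1..N."""
    x, P = x0.copy(), P0.copy()            # (8-31) initialization
    hist = []
    for u, z in zip(us, zs):
        # (8-32) time propagation to the next sample time
        x = Phi @ x + Bd @ u
        P = Phi @ P @ Phi.T + Gd @ Qd @ Gd.T
        # (8-33) measurement update
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
        x = x + K @ (z - H @ x)
        P = P - K @ H @ P
        hist.append((x.copy(), P.copy()))
    return hist

# Scalar example (all values assumed): random walk with unit noise strengths.
hist = forward_filter(np.array([0.0]), np.array([[1.0]]),
                      np.eye(1), np.zeros((1, 1)), np.eye(1),
                      np.array([[1.0]]), np.eye(1), np.array([[1.0]]),
                      [np.zeros(1)] * 2, [np.array([1.0])] * 2)
```

As the text notes, (8-33c) could be swapped for the Joseph form without changing the results here.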

The inverse-covariance backward filter is initialized at terminal time tf through (note the use of tf⁻, since a measurement update to incorporate zf precedes the first backward time propagation):

ŷb(tf⁻) = 0      (8-34a)
Pb⁻¹(tf⁻) = 0      (8-34b)

Measurement updates are generated by

ŷb(tk⁺) = ŷb(tk⁻) + Hᵀ(tk)R⁻¹(tk)zk      (8-35a)
Pb⁻¹(tk⁺) = Pb⁻¹(tk⁻) + Hᵀ(tk)R⁻¹(tk)H(tk)      (8-35b)


The estimate is propagated backward in time to the preceding measurement sample time via [6, 7, 12, 15, 33-35]:

J(tk) = Pb⁻¹(tk⁺)Gd(tk−1)[Gdᵀ(tk−1)Pb⁻¹(tk⁺)Gd(tk−1) + Qd⁻¹(tk−1)]⁻¹    (8-36a)
L(tk) = I − J(tk)Gdᵀ(tk−1)    (8-36b)
ŷb(tk−1⁻) = Φᵀ(tk, tk−1)L(tk)[ŷb(tk⁺) − Pb⁻¹(tk⁺)Bd(tk−1)u(tk−1)]    (8-36c)
Pb⁻¹(tk−1⁻) = Φᵀ(tk, tk−1){L(tk)Pb⁻¹(tk⁺)Lᵀ(tk) + J(tk)Qd⁻¹(tk−1)Jᵀ(tk)}Φ(tk, tk−1)    (8-36d)

Note the time indices on the state transition matrices: these are indeed proper for backward propagation of adjoint system relations. Equations (8-35) and (8-36) are applied recursively for k = f, (f − 1), ..., (i + 1) to generate ŷb(ti⁻) and Pb⁻¹(ti⁻).
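The backward-filter steps (8-35) and (8-36) admit an analogous sketch (again illustrative, not from the text; `Ib` stands for Pb⁻¹, and the matrix-inversion-lemma form of (8-36) is used exactly as written):

```python
import numpy as np

def bif_update(yb, Ib, z, H, Rinv):
    """Backward-filter measurement update (8-35); Ib denotes Pb^-1."""
    yb = yb + H.T @ Rinv @ z
    Ib = Ib + H.T @ Rinv @ H
    return yb, Ib

def bif_propagate(yb, Ib, Phi, Bd, u, Gd, Qdinv):
    """Backward time propagation (8-36) from tk+ to t(k-1)-.
    Phi = Phi(tk, tk-1); Bd, u, Gd, Qdinv are evaluated at tk-1."""
    J = Ib @ Gd @ np.linalg.inv(Gd.T @ Ib @ Gd + Qdinv)        # (8-36a)
    L = np.eye(Ib.shape[0]) - J @ Gd.T                         # (8-36b)
    yb_prev = Phi.T @ L @ (yb - Ib @ Bd @ u)                   # (8-36c)
    Ib_prev = Phi.T @ (L @ Ib @ L.T + J @ Qdinv @ J.T) @ Phi   # (8-36d)
    return yb_prev, Ib_prev
```

By the matrix inversion lemma, the bracketed term in (8-36d) equals [Pb(tk⁺) + Gd Qd Gdᵀ]⁻¹, so the propagation never inverts Pb itself; note also that starting from Pb⁻¹ = 0, the propagation correctly preserves zero information.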

At time ti, the smoothed estimate is calculated by combining x̂(ti⁺), P(ti⁺), ŷb(ti⁻), and Pb⁻¹(ti⁻) through

X(ti) = [I + P(ti⁺)Pb⁻¹(ti⁻)]⁻¹    (8-37a)
W(ti) = P(ti⁺)Xᵀ(ti)    (8-37b)
Y(ti) = I − W(ti)Pb⁻¹(ti⁻)    (8-37c)
P(ti/tf) = Y(ti)P(ti⁺)Yᵀ(ti) + W(ti)Pb⁻¹(ti⁻)Wᵀ(ti)    (8-37d)
x̂(ti/tf) = X(ti)x̂(ti⁺) + P(ti/tf)ŷb(ti⁻)    (8-37e)

One effective computational procedure is to compute and store values of Pb⁻¹(tk⁻) and ŷb(tk⁻) from running the backward filter for k = f, (f − 1), ..., 1. Then the forward filter is run across the interval, simultaneously generating the values of x̂(ti/tf) and P(ti/tf) from (8-37) by using the values stored during the backward filter operation.
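The combination step (8-37) can be sketched as follows (illustrative only; as a sanity property, the result must agree with the information-form combination (8-11)-(8-12)):

```python
import numpy as np

def smooth_combine(x_plus, P_plus, yb, Ib):
    """Combine the forward-filter output (x_plus, P_plus) with the
    backward-filter information (yb = yb(ti-), Ib = Pb^-1(ti-))
    via relations (8-37)."""
    n = P_plus.shape[0]
    X = np.linalg.inv(np.eye(n) + P_plus @ Ib)     # (8-37a)
    W = P_plus @ X.T                               # (8-37b)
    Y = np.eye(n) - W @ Ib                         # (8-37c)
    P_s = Y @ P_plus @ Y.T + W @ Ib @ W.T          # (8-37d)
    x_s = X @ x_plus + P_s @ yb                    # (8-37e)
    return x_s, P_s
```

The symmetric form (8-37d) is preferable numerically to inverting [P⁻¹(ti⁺) + Pb⁻¹(ti⁻)] directly, for the same reason the Joseph form is preferred in filtering.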

In some post-experimental data reduction applications, such as trajectory analyses, it is desirable to determine an estimate of the control inputs based on measurement data, to discern whether actual control inputs were equal to nominally commanded values. Let the system dynamics be described by

x(ti+1) = Φ(ti+1, ti)x(ti) + Gd(ti)[u(ti) + wd(ti)]    (8-38)

where u(ti) is the nominal value of the control input at time ti. This is the original problem model (8-4) with the assumption that Bd(ti) ≡ Gd(ti) for all times of interest (this is rather nonrestrictive, since zero entries can be added to u(ti) or wd(ti) as required for compatibility). A smoothed estimate of the actual control applied at time ti based on all measurement data, û(ti/tf), can then be determined, along with an associated covariance for [u(ti) + wd(ti)], U(ti/tf). The two sets of filter relations are altered only to set Bd(tk) ≡ Gd(tk), and then the following

computations are added to smoother relations (8-37):

V(ti) = Qd(ti)Gdᵀ(ti)Φᵀ(ti, ti+1)    (8-39a)
Ku(ti) = V(ti)Xᵀ(ti)    (8-39b)
û(ti/tf) = u(ti) + Ku(ti)[ŷb(ti⁻) − Pb⁻¹(ti⁻)x̂(ti⁺)]    (8-39c)
U(ti/tf) = Qd(ti) − V(ti)Pb⁻¹(ti⁻)Kuᵀ(ti)    (8-39d)

The question naturally arises whether or not smoothing is worth the additional burden beyond that of filtering for a particular application. The smoothability criterion as defined by Fraser [6] is meant to judge whether smoothing will provide a state estimate which is superior to that obtained by the simpler means of extrapolating the final forward filter state estimate backward in time through repeated usage of (8-32a) arranged as

x′(tk) = Φ(tk, tk+1)[x′(tk+1) − Bd(tk)u(tk)]    (8-40a)

starting from

x′(tf) = x̂(tf⁺)    (8-40b)

By this definition, those components of the state which are controllable with respect to the dynamic driving noise are smoothable. Thus, one should not consider smoothing in those cases for which an adequate model is a linear system with no driving noise, i.e., Qd(ti) ≡ 0. Moreover, the larger the elements (eigenvalues) of the stochastic controllability matrix, the greater the relative benefit of smoothing instead of filtering.

EXAMPLE 8.1 Consider application [6] of the discrete-time smoother to an oscillatory system for which an adequate model is a lightly damped, constant coefficient, second order stochastic differential equation; for instance, a mass, spring, and dashpot described by

ẍ(t) + 2ζωn ẋ(t) + ωn² x(t) = w(t)

where ωn is the undamped natural frequency of the system and ζ is the damping ratio, with numerical values

ωn = 6 rad/sec ≈ 1 rev/sec,  ζ = 0.16

Scalar measurements of x1(ti) are taken every 0.1 sec, modeled as

z(ti) = [1  0]x(ti) + v(ti)

Initial condition information about x(t0) is that it can be modeled as a Gaussian random variable with mean and covariance. Let an equivalent discrete-time model be generated, and let Qd = 10⁻² and R = 10⁻⁴.

Figures 8.4 and 8.5 portray the diagonal elements (on log scale) of the resulting forward filter P(ti⁺) and smoother P(ti/tf) for a time interval of 10 sec. Note that the smoother estimate is never worse than that of the filter, this being the consequence of (8-12), as discussed earlier.


FIG. 8.4 One-one element of covariance matrices, R = 10⁻⁴, Qd = 10⁻²: □, forward filter; ○, smoother. From Fraser [6].

FIG. 8.5 Two-two element of covariance matrices, R = 10⁻⁴, Qd = 10⁻²: □, forward filter; ○, smoother. From Fraser [6].

Also note that there are indeed two transients for the smoother: the transient in the beginning being due to the forward filter and the one at the end due to the backward filter. For this case, both filters were essentially in steady state for most of the time interval of interest, and thus so was the smoother.

The improvement of smoothing over filtering is noticeably more significant for estimating x2 than for x1. This is true because the x2 state is more strongly smoothable: x2(·,·) is driven directly by the dynamic driving noise w(·,·), while x1(·,·) is separated from w(·,·) by an integration. ■

FIG. 8.6 Two-two element of covariance matrices, R = 10⁻⁴, Qd = 10: □, forward filter; ○, smoother. From Fraser [6].

EXAMPLE 8.2 Consider the same problem as in Example 8.1, but now investigate the effect of varying Qd. For Fig. 8.6, a plot of log P22 for the forward filter and smoother, Qd was increased from 10⁻² to 10. As Qd is made larger, x2 becomes "more smoothable," and the improvement of smoothing over filtering becomes larger, as seen by comparing this plot to Fig. 8.5.

For Fig. 8.7, Qd was set at zero, and there is virtually no separation between the smoother and filter variances. For the case of no dynamic driving noise, the state vector is not "smoothable," and the smoother result is merely the backward extrapolation of the forward filter final covariance.

FIG. 8.7 Two-two element of covariance matrices, R = 10⁻⁴, Qd = 0: □, forward filter; ○, smoother. From Fraser [6].

EXAMPLE 8.3 The previous examples, cases in which smoothing significantly improves the filter estimate, as shown in Figs. 8.5 and 8.6, are characterized by both the filter and smoother quickly reaching steady state operation. For these cases, the Qd/R ratio is large: the uncertainty in the state propagation is large compared to relatively accurate measurements, so the new estimate is much more heavily dependent upon the new measurement than prior estimates; this, combined with noise stationarity, caused the filter to reach steady state operation quickly.

The steady state values attained by the filter and smoother are independent of the P0 matrix. The filter steady state is reached after the transient effects due to P0 have settled out, and the backward filter is independent of P0 by definition. Since P(ti/tf) is a function of the two filter covariances, the smoother steady state is also independent of P0.

Figure 8.8 exemplifies a case with a long transient compared to the time interval of interest: the same problem as in the previous examples, but with Qd = 10⁻⁴ and R = 10. Note the lack of significant improvement by smoothing. Also note that the variance oscillations occur at a frequency twice that of the system characteristic frequency, typical of estimation problems with oscillatory systems. ■

FIG. 8.8 Two-two element of covariance matrices, R = 10, Qd = 10⁻⁴: □, forward filter; ○, smoother. From Fraser [6].

The smoother algorithm of (8-31)-(8-37) can be shown to be equivalent to the optimal fixed-interval smoother obtained by Meditch [20, 21, 23, 24]. Computationally, this alternate method entails performing the forward filter computations and storing x̂(ti⁻), P(ti⁻), x̂(ti⁺), and P(ti⁺) for all time over the interval [t0, tf]. Then, starting from the boundary condition

x̂(tf/tf) = x̂(tf⁺)    (8-41)

the smoothed estimate is generated backward in time via

x̂(ti/tf) = x̂(ti⁺) + A(ti)[x̂(ti+1/tf) − x̂(ti+1⁻)]    (8-42)

where the "smoothing estimator gain matrix" A(ti) is given by

A(ti) = P(ti⁺)Φᵀ(ti+1, ti)P⁻¹(ti+1⁻)    (8-43)

Again, the indices in Φᵀ in this expression are correct as shown, appropriate to the state transition matrix for propagating adjoint system quantities backward in time. Also note that an n-by-n inversion is required for each recursion of (8-43). The covariance of the zero-mean Gaussian estimation error [x(ti) − x̂(ti/tf)], as given in (8-37d), can be generated backwards from

P(tf/tf) = P(tf⁺)    (8-44)

using the recursion

P(ti/tf) = P(ti⁺) + A(ti)[P(ti+1/tf) − P(ti+1⁻)]Aᵀ(ti)    (8-45)

For the special case of Qd(ti) ≡ 0, A(ti) becomes Φ(ti, ti+1) and

x̂(ti/tf) = Φ(ti, tf)x̂(tf⁺) − Σ_{k=i}^{f−1} Φ(ti, tk+1)Bd(tk)u(tk)    (8-46a)
P(ti/tf) = Φ(ti, tf)P(tf⁺)Φᵀ(ti, tf)    (8-46b)

as would be predicted in view of the smoothability criterion and (8-40).
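The Meditch recursion (8-41)-(8-45) is compact enough to sketch end to end. This is an illustrative sketch, not code from the text; it assumes no deterministic input (Bd u = 0), Gd = I, and a measurement at every sample time:

```python
import numpy as np

def rts_smoother(x0, P0, Phis, H, R, Qds, zs):
    """Fixed-interval smoother in the Meditch form (8-41)-(8-45).
    Phis[k] propagates the post-update estimate of sample k-1 into
    the pre-update estimate of sample k (Phis[0] acts on x0, P0)."""
    xm, Pm, xp, Pp = [], [], [], []          # pre-/post-update histories
    x, P = x0, P0
    for Phi, Qd, z in zip(Phis, Qds, zs):
        x, P = Phi @ x, Phi @ P @ Phi.T + Qd             # propagate
        xm.append(x); Pm.append(P)
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)     # update
        x = x + K @ (z - H @ x)
        P = P - K @ H @ P
        xp.append(x); Pp.append(P)
    N = len(zs)
    xs, Ps = [None] * N, [None] * N
    xs[-1], Ps[-1] = xp[-1], Pp[-1]                      # (8-41), (8-44)
    for i in range(N - 2, -1, -1):
        A = Pp[i] @ Phis[i + 1].T @ np.linalg.inv(Pm[i + 1])   # (8-43)
        xs[i] = xp[i] + A @ (xs[i + 1] - xm[i + 1])            # (8-42)
        Ps[i] = Pp[i] + A @ (Ps[i + 1] - Pm[i + 1]) @ A.T      # (8-45)
    return xs, Ps
```

Two properties of the text can be checked directly on this sketch: the smoothed variance never exceeds the filter variance, and with Qd ≡ 0 the smoothed estimate is exactly the backward extrapolation of the final filter estimate, per (8-46).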

8.5 FIXED-POINT SMOOTHING


Meditch [10, 19-21, 23, 24] has shown that the optimal fixed-point smoother can be expressed in the following form. Starting from the initial condition

x̂(ti/ti) = x̂(ti⁺)    (8-47)

with x̂(ti⁺) obtained from a concurrently running Kalman filter, the relation

x̂(ti/tj) = x̂(ti/tj−1) + W(tj)[x̂(tj⁺) − x̂(tj⁻)]    (8-48)

is solved for fixed ti, letting tj = ti+1, ti+2, ..., tf, with the gain matrix W(tj) evaluated recursively as

W(tj) ≜ Π_{k=i}^{j−1} A(tk) = W(tj−1)A(tj−1)    (8-49a)
A(tk) ≜ P(tk⁺)Φᵀ(tk+1, tk)P⁻¹(tk+1⁻)    (8-49b)

The values for x̂(tj⁺) and x̂(tj⁻) in this expression are obtained from the Kalman filter iterating forward in time. In fact,

x̂(tj⁺) − x̂(tj⁻) = K(tj)[zj − H(tj)x̂(tj⁻)]    (8-50)

so (8-48) can be interpreted as the means of "reflecting back" the information embodied in the filter residual at each time tj about the state value at time ti. Moreover, the vector in (8-50) is directly available from the filter. The error committed by this estimator, [x(ti) − x̂(ti/tj)], is Gaussian and zero mean for j = i, i + 1, ..., f, with covariance

P(ti/tj) = P(ti/tj−1) + W(tj)[P(tj⁺) − P(tj⁻)]Wᵀ(tj)    (8-51a)
         = P(ti/tj−1) − W(tj)K(tj)H(tj)P(tj⁻)Wᵀ(tj)    (8-51b)

solved forward for tj = ti+1, ti+2, ..., tf from the initial condition

P(ti/ti) = P(ti⁺)    (8-52)

To avoid the inversion of P(tk+1⁻) as required for each recursion of (8-49b), the equivalent form due to Fraser (derived in a fashion analogous to that of Section 8.4) can be used [6, 7]. This algorithm instead requires knowledge of R⁻¹(tk), which can be precomputed for all k. As in the Meditch smoother, a Kalman filter is an integral part of the estimator. Letting ti be fixed, x̂(ti/tj) is calculated iteratively for tj = ti+1, ti+2, ..., tf from

x̂(ti/tj) = x̂(ti/tj−1) + W(tj)Hᵀ(tj)R⁻¹(tj)[zj − H(tj)x̂(tj⁻)]    (8-53)

using (8-47) as an initial condition, with W(tj) generated by means of the recursion

W(tj) = W(tj−1)Φᵀ(tj, tj−1)[I − S(tj)P(tj⁺)]    (8-54a)
S(tj) = Hᵀ(tj)R⁻¹(tj)H(tj)    (8-54b)

starting from

W(ti) = P(ti⁺)    (8-54c)

In (8-53), the bracketed term is the residual that is directly available from the filter, and does not require separate computation. The error covariance can be computed in this form as

P(ti/tj) = P(ti/tj−1) − W(tj)[S(tj)P(tj⁻)S(tj) + S(tj)]Wᵀ(tj)    (8-55)

for j = i + 1, i + 2, ... , f, with initial condition given by (8-52).
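The Meditch fixed-point recursion (8-47)-(8-52) can be sketched as follows (illustrative, not from the text; the filter histories are assumed stored as lists indexed by sample number, with `Phis[k]` mapping the post-update estimate of sample k−1 into the pre-update estimate of sample k):

```python
import numpy as np

def fixed_point_smoother(i, xm, Pm, xp, Pp, Phis):
    """Fixed-point smoothed estimate of the state at sample i, (8-47)-(8-52).
    xm, Pm: pre-update filter histories; xp, Pp: post-update histories."""
    x_s, P_s = xp[i].copy(), Pp[i].copy()          # (8-47), (8-52)
    W = np.eye(Pp[i].shape[0])
    hist = []
    for j in range(i + 1, len(xp)):
        A = Pp[j - 1] @ Phis[j].T @ np.linalg.inv(Pm[j])   # (8-49b)
        W = W @ A                                          # (8-49a)
        x_s = x_s + W @ (xp[j] - xm[j])                    # (8-48) via (8-50)
        P_s = P_s + W @ (Pp[j] - Pm[j]) @ W.T              # (8-51a)
        hist.append((x_s.copy(), P_s.copy()))
    return x_s, P_s, hist
```

Since x̂(ti/tf) is unique, the final value of this recursion must coincide with the fixed-interval smoothed estimate at ti generated by (8-41)-(8-45).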

8.6 FIXED-LAG SMOOTHING

The optimal fixed-lag smoother for an N-step time lag can be generated from the relation [10, 21, 23, 24]

x̂(ti+1/ti+N+1) = Φ(ti+1, ti)x̂(ti/ti+N)
    + C(ti+N+1)K(ti+N+1)[zi+N+1 − H(ti+N+1)x̂(ti+N+1⁻)]
    + U(ti+1)[x̂(ti/ti+N) − x̂(ti⁺)]    (8-56)

where the n-by-n gain matrix C(ti+N+1) is given by

C(ti+N+1) = Π_{k=i+1}^{i+N} A(tk) = A⁻¹(ti)C(ti+N)A(ti+N)    (8-57a)

where A(tk) is defined in (8-49b), and the n-by-n U(ti+1) is given by

U(ti+1) = Gd(ti)Qd(ti)Gdᵀ(ti)Φᵀ(ti, ti+1)P⁻¹(ti⁺)    (8-57b)

In (8-56), the first term on the right hand side indicates the time propagation of the previous smoothed estimate, the second term is the correction due to

incorporating the new information available from measurement at time ti+N+1, and the third term is the correction due to dynamic noise smoothing beyond the capabilities of the Kalman filter. The entire quantity {K(ti+N+1)[zi+N+1 − H(ti+N+1)x̂(ti+N+1⁻)]} is available from the Kalman filter which is run simultaneously with the smoother algorithm.

The smoothed estimate is calculated by (8-56) for ti = t0, t1, ..., tf−N from the initial condition x̂(t0/tN). Thus, in order to initialize the fixed-lag smoother, an optimal fixed-point smoother as described in the previous section must be iterated N times to start from x̂(t0/t0) = x̂0 and compute x̂(t0/t1), x̂(t0/t2), ..., x̂(t0/tN) in succession.

The covariance of the zero-mean error committed by this estimator can be computed by

P(ti+1/ti+N+1) = P(ti+1⁻) − C(ti+N+1)K(ti+N+1)H(ti+N+1)P(ti+N+1⁻)Cᵀ(ti+N+1)
    − A⁻¹(ti)[P(ti⁺) − P(ti/ti+N)]A⁻ᵀ(ti)    (8-58)

for i = 0, 1, ..., (f − N − 1), starting from the initial condition P(t0/tN), which is also obtained as the output of N initial iterations of the optimal fixed-point smoother. The computational and storage burden of this algorithm is seen to be considerably greater than that of an optimal filter for the same problem (which is inherently part of the smoother structure itself). Consequently, the performance benefit of incorporating N additional measurements into each estimate must warrant the added burden and the N-step delay in availability of the state estimate, before this estimator would become preferable to a more straightforward filter algorithm.
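The recursion (8-56)-(8-58) is the efficient implementation. Purely for illustration, identical lag-N estimates can also be obtained by carrying one fixed-point recursion (8-48)-(8-49) per active lag position, which makes the structure transparent at the cost of extra gain recursions per step (an illustrative sketch, not the text's algorithm; filter histories are stored as in the previous sketches):

```python
import numpy as np

def fixed_lag_estimates(N, xm, Pm, xp, Pp, Phis):
    """Lag-N smoothed estimates x(ti / t(i+N)), obtained by running one
    fixed-point recursion per active lag position.  Returns a list of
    (i, x_smoothed, P_smoothed) tuples for i = 0, 1, ..."""
    n = Pp[0].shape[0]
    out = []
    active = []   # each entry: [i, x_s, P_s, W] of a fixed-point recursion
    for j in range(len(xp)):
        if j > 0:
            A = Pp[j - 1] @ Phis[j].T @ np.linalg.inv(Pm[j])   # (8-49b)
            for rec in active:
                rec[3] = rec[3] @ A                            # (8-49a)
                rec[1] = rec[1] + rec[3] @ (xp[j] - xm[j])     # (8-48)
                rec[2] = rec[2] + rec[3] @ (Pp[j] - Pm[j]) @ rec[3].T
        active.append([j, xp[j].copy(), Pp[j].copy(), np.eye(n)])
        if active[0][0] == j - N:        # oldest recursion has reached lag N
            i, x_s, P_s, _ = active.pop(0)
            out.append((i, x_s, P_s))
    return out
```

Because A(tk) of (8-49b) does not depend on the fixed point ti, one gain matrix per step serves all active recursions; the single-recursion form (8-56) removes even that remaining redundancy.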

8.7 SUMMARY

This chapter has generated algorithms for the optimal smoothed estimate of the state at time ti, based on measurement data through time tj, where j > i; namely, x̂(ti/tj) = E{x(ti)|Z(tj) = Zj}. Conceptually, the smoothing problem can be decomposed into two filtering problems: one using initial conditions and the "past" history of data {z1, z2, ..., zi} and the other incorporating only "future" measurements {zi+1, zi+2, ..., zj}, with the smoothed estimate being the optimal combination of these two filter outputs. Figure 8.2 portrayed the three useful types of smoothers: the fixed-interval smoother described by (8-31)-(8-37), the fixed-point smoother given by (8-52)-(8-55), and the fixed-lag smoother specified by (8-56)-(8-58). If it is acceptable to make an estimate of x(ti) at some time later than ti itself, and if the additional computational and storage requirements are not prohibitive, a smoother can provide estimates superior to those of a filter. The degree of improvement can be assessed by comparing P(ti/tj) to P(ti⁺) from a filter, so that a tradeoff decision can be made rationally.


Optimal smoothers can also be developed analogously for the cases of nonwhite measurement corruption noise [3, 25, 26] or continuous-time measurements [18-20, 23], and extensions of the concepts in this chapter can be used to generate nonlinear smoothers [1, 2, 4, 5, 8, 13, 22, 29, 34, 35]. Innovations approaches [10, 11, 14-16], maximum likelihood methods [30, 31], least squares [10, 11, 14], and dynamic programming [5] can also be applied to generate smoothing algorithms.

REFERENCES

1. Biswas, K. K., and Mahalanabis, A. K., Suboptimal algorithms for nonlinear smoothing, IEEE Trans. Aerospace Electron. Systems AES-9 (4), 529-534 (1973).
2. Bryson, A. E., and Frazier, M., Smoothing for linear and nonlinear dynamic systems, Proc. Optimum Systems Synthesis Conf., Tech. Rep. ASD-TRD-63-119, pp. 354-364, Wright-Patterson AFB, Ohio (February 1963).
3. Bryson, A. E., Jr., and Henrikson, L. J., Estimation Using Sampled-Data Containing Sequentially Correlated Noise, Tech. Rep. No. 533, Division of Engineering and Applied Physics, Harvard Univ., Cambridge, Massachusetts (June 1967).
4. Cox, H., On the estimation of state variables and parameters for noisy dynamic systems, IEEE Trans. Automat. Control 9 (1), 5-12 (1964).
5. Cox, H., Estimation of state variables via dynamic programming, Proc. Joint Automat. Control Conf., Stanford, California, pp. 376-381 (1964).
6. Fraser, D. C., A New Technique for the Optimal Smoothing of Data, Ph.D. dissertation, MIT, Cambridge, Massachusetts (June 1967).
7. Fraser, D. C., and Potter, J. E., The optimum linear smoother as a combination of two optimum linear filters, IEEE Trans. Automat. Control AC-14 (4), 387-390 (1969).
8. Friedland, B., and Bernstein, I., Estimation of the state of a nonlinear process in the presence of nongaussian noise and disturbances, J. Franklin Inst. 281, 455-480 (1966).
9. Griffin, R. E., and Sage, A. P., Sensitivity analysis of discrete filtering and smoothing algorithms, Proc. AIAA Guidance, Control, and Flight Dynam. Conf., Pasadena, California, Paper No. 68-824 (August 1968); also AIAA J. 7 (10), 1890-1897 (1969).
10. Kailath, T., Supplement to "A Survey of Data Smoothing," Automatica 11, 109-111 (1975).
11. Kailath, T., and Frost, P., An innovations approach to least-squares estimation, Part II: Linear smoothing in additive white noise, IEEE Trans. Automat. Control AC-13 (6), 655-660 (1968).
12. Lainiotis, D. G., General backward Markov models, IEEE Trans. Automat. Control AC-21 (4), 595-598 (1976).
13. Leondes, C. T., Peller, J. B., and Stear, E. B., Nonlinear smoothing theory, IEEE Trans. Syst. Sci. Cybernet. SSC-6 (1), 63-71 (1970).
14. Ljung, L., and Kailath, T., A unified approach to smoothing formulas, Automatica 12 (2), 147-157 (1976).
15. Ljung, L., and Kailath, T., Backwards Markovian models for second-order stochastic processes, IEEE Trans. Inform. Theory IT-22 (4), 488-491 (1976).
16. Ljung, L., and Kailath, T., Formulas for efficient change of initial conditions in least-squares estimation, IEEE Trans. Automat. Control AC-22 (3), 443-447 (1977).
17. Maybeck, P. S., "Stochastic Models, Estimation, and Control," Vol. 1. Academic Press, New York, 1979.
18. Meditch, J. S., Optimal fixed point continuous linear smoothing, Proc. Joint Automat. Control Conf., Univ. of Pennsylvania, Philadelphia, Pennsylvania, pp. 249-257 (June 1967).
19. Meditch, J. S., On optimal fixed-point linear smoothing, Int. J. Control 6, 189 (1967).
20. Meditch, J. S., On optimal linear smoothing theory, Inform. and Control 10, 598-615 (1967).
21. Meditch, J. S., Orthogonal projection and discrete linear smoothing, J. SIAM Control 5 (1), 74-89 (1967).
22. Meditch, J. S., A successive approximation procedure for nonlinear data smoothing, Proc. Symp. Inform. Processing, Purdue Univ., Lafayette, Indiana, pp. 555-568 (April 1969).
23. Meditch, J. S., "Stochastic Optimal Linear Estimation and Control." McGraw-Hill, New York, 1969.
24. Meditch, J. S., A survey of data smoothing, Automatica 9, 151-162 (1973).
25. Mehra, R. K., and Bryson, A. E., Smoothing for time-varying systems using measurements containing colored noise, Proc. Joint Automat. Control Conf., Ann Arbor, Michigan, pp. 871-883 (June 1968).
26. Mehra, R. K., and Bryson, A. E., Linear smoothing using measurements containing correlated noise with an application to inertial navigation, IEEE Trans. Automat. Control AC-13 (5), 496-503 (1968).
27. Mehra, R. K., and Lainiotis, D. G. (eds.), "System Identification: Advances and Case Studies." Academic Press, New York, 1976.
28. Nash, R. A., Jr., Kasper, J. F., Jr., Crawford, B. S., and Levine, S. A., Application of optimal smoothing to the testing and evaluation of inertial navigation systems and components, IEEE Trans. Automat. Control AC-16 (6), 806-816 (1971).
29. Peller, J. B., Nonlinear smoothing techniques, in "Control and Dynamic Systems: Advances in Theory and Applications" (C. T. Leondes, ed.), Vol. 11, pp. 256-388. Academic Press, New York, 1975.
30. Rauch, H. E., Solutions to the linear smoothing problem, IEEE Trans. Automat. Control AC-8 (4), 371-372 (1963).
31. Rauch, H. E., Tung, F., and Striebel, C. T., Maximum likelihood estimates of linear dynamic systems, AIAA J. 3 (8), 1445-1450 (1965).
32. Schweppe, F. C., "Uncertain Dynamic Systems." Prentice-Hall, Englewood Cliffs, New Jersey, 1973.
33. Sidhu, G. S., and Desai, U. B., New smoothing algorithms based on reversed-time lumped models, IEEE Trans. Automat. Control AC-21 (4), 538-541 (1976).
34. Wall, J. E., Jr., Control and Estimation for Large-Scale Systems Having Spatial Symmetry, Ph.D. dissertation, Rep. ESL-TH-842, MIT Electronic Systems Laboratory, Cambridge, Massachusetts (August 1978).
35. Wall, J. E., Jr., Willsky, A. S., and Sandell, N. R., Jr., On the fixed-interval smoothing problem, Proc. IEEE Conf. Decision and Control, Albuquerque, New Mexico, pp. 385-389 (December 1980).
36. Wiener, N., "The Extrapolation, Interpolation and Smoothing of Stationary Time Series." Wiley, New York, 1949.

PROBLEMS

8.1 Consider the example used previously in Chapter 5 of Volume 1 of a gyro on test, described by the scalar dynamics equation

ẋ(t) = −x(t) + w(t)

where x(·,·) is gyro drift rate and w(·,·) is white Gaussian noise of zero mean and variance kernel Qδ(τ), where Q = 2 deg²/hr. The output is sampled every 0.25 hr, and this output is described as

z(ti) = x(ti) + v(ti)

where v(·,·) is a white Gaussian noise of zero mean and with power given by R = 0.5 deg²/hr². For this problem, we assumed initial conditions of x̂0 = 0, P0 = 1 deg²/hr². The state transition matrix was found to be Φ(ti+1, ti) = 0.78.


The error covariance of the optimal filter for a two-hour interval of data sampling was found to be as shown in Table 8.P1.

TABLE 8.P1

ti (hr)    P(ti⁻)    P(ti⁺)
0          1.00      0.33
0.25       0.59      0.27
0.50       0.55      0.26
0.75       0.54      0.25
1.00       0.54      0.25
1.25       0.54      0.25
1.50       0.54      0.25
1.75       0.54      0.25
2.00       0.54      0.25

Now calculate the optimal fixed-interval smoother results for the same data. Write out the state estimate and covariance relations for the backward filter and smoother, and actually compute the smoother error covariance for the two-hour interval.

8.2 Repeat Problem 8.1, but use the alternate fixed-interval smoother results of (8-41)-(8-46). Demonstrate the equivalence of these two means of generating the smoothed estimate.

8.3 Reconsider Problem 8.1, but now assume that the state value at t = 0.75 hr is of specific interest. Generate the fixed-point smoother result to estimate the gyro drift rate x at that time, based on measurements taken sequentially over the entire two-hour interval, using (8-47)-(8-52).

8.4 Repeat the previous problem, using the alternate fixed-point smoother results, (8-53)-(8-55), and demonstrate the equivalence of these two alternate computations of the smoothed estimate.

8.5 Reconsider Problem 8.1, but now compute the fixed-lag smoother result for a 3-step time lag. Note the need of the fixed-point smoother for initialization over the first three measurements. Compare these results to the previous Kalman filter estimate.

8.6 Repeat the preceding problem, but for a 1-step time lag. Compare performance to both the 3-step fixed-lag smoother and the Kalman filter.

8.7 We have attained continuous-time filter equations as

dx̂/dt = Fx̂ + PHᵀR⁻¹[z − Hx̂]
dP/dt = FP + PFᵀ + GQGᵀ − PHᵀR⁻¹HP

Now derive the continuous-time fixed-interval optimal smoother relations.

(a) One means of deriving this result is to express the sampled-data results of (8-31)-(8-37) in the limit as the sample period goes to zero. In Chapter 5 of Volume 1, such a limiting operation was used to derive the filter equations above. Analogously take the limit of the backward filter equations (8-34)-(8-36), and combine these results with those of the forward filter via the analog of (8-37), to produce the smoother for continuous-time measurements.

(b) An alternate derivation of the continuous-measurement fixed-interval smoother has also been suggested. Since the backward filter propagates the estimates backwards in time, let τ = tf − t, and then write the system description,

dx(t)/dt = F(t)x(t) + G(t)w(t)
z(t) = H(t)x(t) + v(t)


in terms of the variable τ. By so doing, the equations for the backward filter in terms of x̂b and Pb can be written as an analog to the forward filter, with (−F) replacing F and (−G) replacing G. Knowing dPb/dτ, an expression can be obtained for dPb⁻¹/dτ, using

dPb⁻¹/dτ = −Pb⁻¹(dPb/dτ)Pb⁻¹

Writing ŷb(τ) = Pb⁻¹(τ)x̂b(τ) and differentiating yields a "state" equation for ŷb(τ). Starting conditions can be specified as for the discrete-time case, and finally the smoother relationships derived from the forward and backward filters.

One must be careful about misinterpreting these results. Proper characterization of properties of reverse-time stochastic processes must be considered, as accomplished in [12, 15, 33-35].

8.8 Consider the system and measurement described by the scalar equations

ẋ = w,  z = x + v

where w and v are zero-mean white Gaussian noises, independent of each other, and of strength Q and R, respectively. Show that, in steady state operation, the optimal smoothed estimate of x is just the average of the forward and backward filter estimates.

8.9 Generate a numerically precise form of fixed-interval smoother by writing the forward filter (8-31)-(8-33) in square root form (see Chapter 7 of Volume 1) and the backward filter (8-34)-(8-36) in square root inverse covariance form, and combining the results according to (8-37) written in factored form.

8.10 The concept of adjoint differential and difference equations appears in this chapter and will reappear later in optimal control problems.

Given a homogeneous linear differential equation ẋ(t) = F(t)x(t), the associated "adjoint" differential equation is the differential equation for the n-vector p(t) such that the inner product of p(t) with x(t) is constant for all time:

pᵀ(t)x(t) = const

(a) Take the derivative of this expression to show that the adjoint equation associated with ẋ(t) = F(t)x(t) is

ṗ(t) = −Fᵀ(t)p(t)

(b) If Φx(t, t0) is the state transition matrix associated with F(t) and Φp(t, t0) is the state transition matrix associated with [−Fᵀ(t)], then show that

Φp(t, t0) = Φxᵀ(t0, t) = [Φxᵀ(t, t0)]⁻¹

To do this, show that [Φpᵀ(t, t0)Φx(t, t0)] and I satisfy the same differential equation and initial condition.

(c) Show that, as a function of its second argument, Φx(t, τ) must satisfy

∂[Φx(t, τ)]/∂τ = −Φx(t, τ)F(τ)

or, in other words,

∂[Φxᵀ(t, τ)]/∂τ = [−F(τ)]ᵀΦxᵀ(t, τ)

(d) For discrete-time models, consider the relationship

xᵀ(ti+1)p(ti+1) = xᵀ(ti)p(ti)

and the homogeneous state difference equation

x(ti+1) = Φ(ti+1, ti)x(ti)

to develop the associated adjoint difference equation.
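A small numeric illustration of part (d): propagating p backward with Φᵀ(ti+1, ti) preserves the inner product xᵀ(ti)p(ti) exactly, which is the defining property of the adjoint sequence (an illustrative sketch; the matrix and vectors are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
Phi = rng.standard_normal((3, 3)) + 3.0 * np.eye(3)   # nonsingular transition
x0 = rng.standard_normal((3, 1))
pN = rng.standard_normal((3, 1))

# Forward state sequence: x(ti+1) = Phi x(ti)
xs = [x0]
for _ in range(5):
    xs.append(Phi @ xs[-1])

# Adjoint sequence, propagated backward: p(ti) = Phi^T p(ti+1)
ps = [pN]
for _ in range(5):
    ps.insert(0, Phi.T @ ps[0])

inner = [float(x.T @ p) for x, p in zip(xs, ps)]   # constant for all ti
```

This is exactly the structure exploited by the backward time propagations (8-36) and by the indices of Φᵀ in (8-43).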


(e) Adjoints can be understood in a more general linear operator sense as well. Let V and W be inner product spaces, i.e., linear vector spaces with inner products defined on them as (·,·)V and (·,·)W, respectively. Let 𝒜 be a linear operator (transformation, mapping, function, etc.) from V into W, i.e., 𝒜 transforms any v ∈ V into a w ∈ W in a manner that can be represented by a simple matrix multiplication, w = Av. If there exists a transformation 𝒜* from W into V such that, for all v ∈ V and w ∈ W,

(𝒜v, w)W = (v, 𝒜*w)V

then 𝒜* is called the adjoint of 𝒜. Whenever 𝒜* exists, it is unique, linear, and has the property that if (𝒜*)* exists, (𝒜*)* = 𝒜. For the previous parts of this problem, V and W are both a finite dimensional Euclidean space Rⁿ, an appropriate inner product is

(x1, x2)Rn = x1ᵀx2

and 𝒜* is guaranteed to exist. For part (a), let 𝒜x(t) = ẋ(t) and find 𝒜* such that

(𝒜x(t), p(t))Rn = (x(t), 𝒜*p(t))Rn

For part (d), let 𝒜x(ti) = x(ti+1) and find 𝒜* such that


CHAPTER 9

Compensation of linear model inadequacies

9.1 INTRODUCTION

Up to this point, we have assumed that linear models for the system dynamics and measurement relations are adequate for developing optimal estimators. No model is perfect, and it is especially true that a linear model is the result of either purposeful approximation and simplification or lack of knowledge about the system being modeled. Thus, although a linear model might depict the predominant aspects of a problem, it is always an erroneous model to some degree. Unfortunately, when an estimator is developed on the basis of such an erroneous model, it can "learn the wrong state too well [22]" when operated for a long enough period of time, especially if the strengths of noises in the assumed model are small. If Qd(ti), the strength of the dynamic driving noise in the model, has small eigenvalues for all time, the filter-computed error covariance matrix and gain matrix are correspondingly "small." Thus, the filter's estimate is highly dependent upon the output of its internal model and not affected dramatically by measurements coming in from the "real world." Too heavy a reliance on the internal model can cause the state estimate and the "true state" values in the "real world" to diverge. In the extreme case, the filter gains are zeroed and the state estimator is totally divorced from the real world, simultaneously indicating through its computed error covariance matrix that ultimate confidence should be placed in the accuracy of its computed estimate.

There is always some discrepancy between the performance indication propagated by the filter (the computed P(ti⁻) and P(ti⁺)) and the actual performance achieved in realistic applications, because the model embodied in the filter cannot be exactly correct. Such discrepancy is termed divergence [12-16, 18, 19, 22, 26, 39, 41, 47, 50-53, 58, 60]. In some instances, often denoted as "apparent divergence [14]," the true estimation errors are larger in magnitude than those indicated by the filter-propagated error covariance matrix, but the


true magnitudes remain bounded. Better filter performance could be achieved through better "tuning" of the filter algorithm, or some other form of model compensation, or through improved numerical precision. "True divergence [14]" is a more critical case in which the filter-computed error covariance remains bounded while the true estimation errors grow unbounded: the filter not only provides inadequate estimates, but is totally unaware of the existence of a problem. True divergence can be caused either by unmodeled or mismodeled effects, or by numerical instabilities. Numerical problems were treated in Chapter 7 (Volume 1), and here we address the problem of model inadequacy. Note that, in order for true divergence to be due to modeling problems, the true system must exhibit a behavior of certain quantities growing without bound (such as position error in an inertial navigation system), since the estimator algorithm itself will be stable if its internal model is stochastically observable and controllable, a nonrestrictive assumption.

This chapter seeks to compensate for the inadequacies of an assumed linear model, thereby exploiting linear estimation concepts and insights as much as possible, rather than to dispose of them entirely in preference of full-scale nonlinear estimation. Compensation techniques that are described in the following sections are:

(1) addition of pseudonoise to the assumed model and artificial lower bounding of error covariance matrix elements,

(2) limiting of effective filter memory and overweighting most recent data,

(3) finite memory filtering,

(4) linearized Kalman filtering,

(5) extended Kalman filtering.

The first of these methods entails telling the filter that it should decrease its confidence in its own model. The second and third are concerned with the case in which a linear model is indeed adequate, but only for a limited length of time propagation. Finally, the last two attempt to exploit linear models and methods in the case in which a substantially better depiction of the true system would be in the form of a nonlinear model.

The special case of state estimation in the face of uncertain parameters in a linear dynamics model and linear measurement model, or in the statistical description of noises entering the system model, will be reserved for the following chapter. Adaptive estimators will also be discussed in general in that chapter. Full-scale nonlinear models and estimation will be developed in detail in ensuing chapters.

9.2 PSEUDONOISE ADDITION AND ARTIFICIAL LOWER BOUNDING OF P

Addition of pseudonoise to the dynamics model, as by increasing the elements of Qd(tᵢ) for all time, was previously discussed in the context of tuning a simplified, reduced order Kalman filter [30]. Essentially, the dominant (linear)


aspects of the dynamics are included in the model, and one accounts for the many neglected effects by introducing additional uncertainty into the model. By adding such fictitious noise, or pseudonoise, to the dynamics model, one "tells" the filter that it cannot neglect the incompleteness or inadequacy of the representation that its internal model provides of the true system.

EXAMPLE 9.1 In Sections 4.11 and 6.5 (Volume 1), alternative models of a bias of ḃ(t) = 0 and ḃ(t) = w(t), with w(·,·) a zero-mean white Gaussian noise, were discussed and resulting filter performance depicted. The latter model can be interpreted as the original ḃ(t) = 0, but with pseudonoise added to reflect a conviction that this basic model is not totally adequate.

Without the pseudonoise addition, the filter would compute a variance of the error in estimating bias that converged to zero. Simultaneously, the filter gain on that state channel would be zeroed, thereby precluding use of further measurement data to maintain a viable estimate. Consequently, if the "bias" parameter were in fact slowly varying, or if it underwent a sudden change of value (failure, etc.), the true estimation error might grow to a nonnegligible magnitude. This problem context will be repeated in examples throughout this chapter. •
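The gain behavior in this example is easy to reproduce numerically. Below is a minimal sketch (not from the text; the values of R, Qd, and the initial variance P0 are arbitrary illustrative choices) of the scalar variance/gain recursion for the bias model, with and without pseudonoise:

```python
# Illustrative sketch of Example 9.1: a scalar bias filter (phi = 1,
# H = 1) run with Qd = 0 and with a small pseudonoise Qd > 0.
# R, Qd, and P0 are arbitrary illustrative values.

def gain_history(P0, R, Qd, steps):
    """Variance/gain recursion of the scalar Kalman filter."""
    P, gains = P0, []
    for _ in range(steps):
        P_minus = P + Qd                 # time propagation (phi = 1)
        K = P_minus / (P_minus + R)      # filter gain
        P = (1.0 - K) * P_minus         # measurement update of variance
        gains.append(K)
    return gains

no_pseudo = gain_history(P0=1.0, R=1.0, Qd=0.0, steps=200)
with_pseudo = gain_history(P0=1.0, R=1.0, Qd=0.01, steps=200)
print("gain, Qd = 0   :", no_pseudo[-1])    # decays toward zero
print("gain, Qd = 0.01:", with_pseudo[-1])  # settles at a positive value
```

With Qd = 0 the gain decays toward zero and new measurements are eventually ignored; any positive Qd holds the gain at a nonzero steady state, so a slowly varying "bias" can still be tracked.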

Pseudonoise can similarly be added to the measurement relations to reflect uncertainty in the adequacy of the assumed measurement model as well.

EXAMPLE 9.2 Suppose that an adequate model for a measurement of some state x₁ were

z(tᵢ) = x₁(tᵢ) + b(tᵢ) + n(tᵢ) + v(tᵢ)

where b(·,·) is a bias, n(·,·) is a time-correlated noise, and v(·,·) is a white Gaussian noise of strength R(tᵢ) for all tᵢ. If a filter were to be based on a reduced-order model that eliminated b and n as state variables, the measurement model would become

z(tᵢ) = x₁(tᵢ) + v(tᵢ)

The strength R(tᵢ) should be increased to account for these neglected effects. •

Many approaches have been suggested as "systematic" means of adding pseudonoises [12, 14, 22, 52]. However, most are basically ad hoc procedures that require iterative adjustment (guided by physical insights where possible) until the filter performance observed in a performance sensitivity analysis (see Chapter 6, Volume 1) is acceptable.

A somewhat more formal result is provided by the concept of a minimum variance reduced order (MVRO) estimator [4, 20, 62]. Suppose that a truth model description of a system were

xₜ(tᵢ₊₁) = Φₜ(tᵢ₊₁, tᵢ)xₜ(tᵢ) + w_dt(tᵢ)    (9-1a)

zₜ(tᵢ) = Hₜ(tᵢ)xₜ(tᵢ) + vₜ(tᵢ)    (9-1b)

where xₜ is of dimension nₜ. Assume we want to develop an n-dimensional filter in terms of states

x(tᵢ) = Cₜ(tᵢ)xₜ(tᵢ)    (9-2)

(in terms of the notation of Chapter 6, the desired quantities are the entire filter state, so that y(tᵢ) = x(tᵢ), C(tᵢ) = I). An estimator is chosen to be of the form

x̂(tᵢ⁺) = x̂(tᵢ⁻) + K(tᵢ)[zₜ(tᵢ) − H(tᵢ)x̂(tᵢ⁻)]    (9-3a)

x̂(tᵢ₊₁⁻) = Φ(tᵢ₊₁, tᵢ)x̂(tᵢ⁺)    (9-3b)

where K, H, and Φ are to be chosen so as to minimize the mean square error function tr[E{eₜ(tᵢ⁺)eₜᵀ(tᵢ⁺)}] for all sample times tᵢ, where the error eₜ(tᵢ⁺) is given by

eₜ(tᵢ⁺) = Cₜ(tᵢ)xₜ(tᵢ) − x̂(tᵢ⁺)    (9-4)

Generation of E{eₜ(tᵢ⁺)eₜᵀ(tᵢ⁺)} by a set of expressions as in the covariance analyses of Chapter 6 and differentiation with respect to K, H, and Φ then yield the general MVRO estimator. Constrained MVRO estimators, in which H and Φ are selected by the designer and optimization is performed only with respect to K, or in which additional assumptions are imposed to simplify the resulting estimator form, have also been applied in practice.

EXAMPLE 9.3 A constrained MVRO estimator of practical importance is one in which Cₜ is of the form [I : 0], such that the estimator state forms the first n variables in the truth model state. Thus, with xₜ partitioned into the n-vector xₜ₁ and the (nₜ − n)-vector xₜ₂, and with Φₜ and Hₜ partitioned conformably into blocks Φₜ₁₁, Φₜ₁₂, Φₜ₂₁, Φₜ₂₂ and [Hₜ₁ : Hₜ₂], the truth model becomes the correspondingly partitioned form of (9-1),

and the estimator provides estimates of xₜ₁. A further constraint is imposed that impulsive feedback correction of all estimated variables is employed. Under these assumptions, a covariance analysis can be performed using the nₜ-dimensional vector [(xₜ₁ − x̂)ᵀ : xₜ₂ᵀ]ᵀ instead of the (nₜ + n)-dimensional vector [xₜᵀ : x̂ᵀ]ᵀ as in Chapter 6, and the optimal choices of H and Φ in the filter are Hₜ₁ and Φₜ₁₁, respectively. The vector [(xₜ₁ − x̂)ᵀ : xₜ₂ᵀ]ᵀ satisfies the same dynamics as [xₜ₁ᵀ : xₜ₂ᵀ]ᵀ and updates at a measurement according to

(xₜ₁ − x̂)(tᵢ⁺) = (xₜ₁ − x̂)(tᵢ⁻) − K(tᵢ){Hₜ₁(tᵢ)[(xₜ₁ − x̂)(tᵢ⁻)] + Hₜ₂(tᵢ)xₜ₂(tᵢ) + vₜ(tᵢ)},    xₜ₂(tᵢ⁺) = xₜ₂(tᵢ)

Letting Pₜ be the associated covariance matrix, partitioned as

Pₜ = [ Pₜ₁₁   Pₜ₁₂ ]
     [ Pₜ₁₂ᵀ  Pₜ₂₂ ]

the MVRO estimator gain, found via ∂{tr Pₜ₁₁(tᵢ⁺)}/∂K = 0, is

K = [I : 0]Pₜ⁻Hₜᵀ{HₜPₜ⁻Hₜᵀ + Rₜ}⁻¹ = Pₜ₁₁⁻Hₜ₁ᵀ{Hₜ₁Pₜ₁₁⁻Hₜ₁ᵀ + Rₜ + R′}⁻¹ + K′


where

R′ = Hₜ₁Pₜ₁₂⁻Hₜ₂ᵀ + Hₜ₂Pₜ₁₂⁻ᵀHₜ₁ᵀ + Hₜ₂Pₜ₂₂⁻Hₜ₂ᵀ

K′ = Pₜ₁₂⁻Hₜ₂ᵀ{Hₜ₁Pₜ₁₁⁻Hₜ₁ᵀ + Rₜ + R′}⁻¹

From the dynamics equation for [(xₜ₁ − x̂)ᵀ : xₜ₂ᵀ]ᵀ, the covariance Pₜ propagates according to

Pₜ(tᵢ₊₁⁻) = Φₜ(tᵢ₊₁, tᵢ)Pₜ(tᵢ⁺)Φₜᵀ(tᵢ₊₁, tᵢ) + Q_dt(tᵢ)

from which the upper left partition can be extracted as

Pₜ₁₁(tᵢ₊₁⁻) = Φₜ₁₁(tᵢ₊₁, tᵢ)Pₜ₁₁(tᵢ⁺)Φₜ₁₁ᵀ(tᵢ₊₁, tᵢ) + Q_dt₁₁ + Q_d′

Q_d′ = Φₜ₁₁Pₜ₁₂⁺Φₜ₁₂ᵀ + Φₜ₁₂Pₜ₁₂⁺ᵀΦₜ₁₁ᵀ + Φₜ₁₂Pₜ₂₂⁺Φₜ₁₂ᵀ

Such an MVRO estimator design was conducted in the design of the reduced-order navigation Kalman filter (in U-D covariance factorization form) for the Offensive Avionics System (OAS) intended for Air Force operational use in the 1980s [62]. •

As seen in the preceding example, if Cₜ in (9-2) is of the form [I : 0], then the MVRO estimator has the structure of the usual Kalman filter state equations and covariance update equations, but with modified covariance propagation and gain relations as

K(tᵢ) = P(tᵢ⁻)Hᵀ(tᵢ)[H(tᵢ)P(tᵢ⁻)Hᵀ(tᵢ) + Rₜ(tᵢ) + R′(tᵢ)]⁻¹ + K′(tᵢ)    (9-5a)

P(tᵢ₊₁⁻) = Φ(tᵢ₊₁, tᵢ)P(tᵢ⁺)Φᵀ(tᵢ₊₁, tᵢ) + CₜQ_dt(tᵢ)Cₜᵀ + Q_d′(tᵢ)    (9-5b)

where the structure of R′(tᵢ) and Q_d′(tᵢ) to describe appropriate pseudonoise addition and the structure of K′(tᵢ) to delineate additional appropriate filter gain compensation are completely defined by the MVRO estimator solution. These results provide a design goal for performance of a compensated filter of given reduced-order dimension and state selection, a more realistic design goal than that provided by the full-order Kalman filter based upon the truth model. Since Pₜ₁₂ and Pₜ₂₂ are not computed explicitly by the online filter, the MVRO estimator result can be used either for implementations employing precomputed gains or for insights into the structure of online approximations to R′, Q_d′, and K′.

Thus, complete compensation for unmodeled errors is seen to require both pseudonoise addition and additional gain compensation. The structure of an added gain compensation will be seen again in Section 9.3, where the Schmidt epsilon technique is discussed.

A technique closely related to pseudonoise addition is the artificial lower bounding of computed error covariance matrix elements. If the calculated value of P(tᵢ⁺) embodies very small entries (or eigenvalues), then this method artificially increases these entries before time propagation rather than artificially increasing the elements of Qd(tᵢ) for the propagation itself as above. The overall effect, though, is essentially the same: subsequent values of P(tᵢ⁻), P(tᵢ⁺), and K(tᵢ) are "larger," and the filter places less relative weighting upon its own internal model. One means of implementing this technique is to preselect the


n minimum admissible values for the diagonal terms of P⁺, denoted as σ²_{1-min}, σ²_{2-min}, …, σ²_{n-min}, and also a maximum correlation coefficient r_max. In online filter operation, the standard calculation of P(tᵢ⁺) is performed, followed immediately by

P*ⱼⱼ(tᵢ⁺) = max{Pⱼⱼ(tᵢ⁺), σ²_{j-min}}    (9-6a)

for j = 1, 2, …, n. In other words, if the computed value falls below σ²_{j-min}, it is artificially increased to the minimum admissible value. The off-diagonal terms of P*(tᵢ⁺) are then generated from

Mⱼₖ(tᵢ) = r²_max P*ⱼⱼ(tᵢ⁺)P*ₖₖ(tᵢ⁺)    (9-6b)

P*ⱼₖ(tᵢ⁺) = Pⱼₖ(tᵢ⁺)  if P²ⱼₖ(tᵢ⁺) < Mⱼₖ(tᵢ);  P*ⱼₖ(tᵢ⁺) = sgn[Pⱼₖ(tᵢ⁺)]√Mⱼₖ(tᵢ)  otherwise    (9-6c)

The P*(tᵢ⁺) matrix is then used in place of P(tᵢ⁺) in the filter. Although this does prevent diagonal terms from becoming too small (or negative, due to numerics) and the correlations from becoming too large (yielding potential singularity problems), this is totally an ad hoc procedure, and P*(tᵢ⁺) generally does not provide any form of optimal approximation to the true estimation error covariance.
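A minimal sketch of the bounding procedure (9-6) follows; the floors σ²_{j-min} and the coefficient r_max are designer-chosen, and the particular numbers below are purely illustrative:

```python
import math

def lower_bound_covariance(P, sigma2_min, r_max):
    """Apply (9-6): floor the diagonal variances, then cap off-diagonal
    terms so no implied correlation coefficient exceeds r_max."""
    n = len(P)
    P_star = [row[:] for row in P]
    for j in range(n):
        P_star[j][j] = max(P[j][j], sigma2_min[j])             # (9-6a)
    for j in range(n):
        for k in range(n):
            if j != k:
                M_jk = r_max**2 * P_star[j][j] * P_star[k][k]   # (9-6b)
                if P_star[j][k]**2 >= M_jk:                     # (9-6c)
                    P_star[j][k] = math.copysign(math.sqrt(M_jk), P_star[j][k])
    return P_star

P = [[1e-8, 0.05],
     [0.05, 2.0]]
P_star = lower_bound_covariance(P, sigma2_min=[1e-4, 1e-4], r_max=0.95)
print(P_star)
```

Symmetry is preserved because the (j, k) and (k, j) entries are treated identically; only entries whose implied correlation exceeds r_max are altered.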

It should be noted that both of these methods are used to circumvent numerical difficulties in a Kalman filter as well as to account for modeling shortcomings.

9.3 LIMITING EFFECTIVE FILTER MEMORY AND OVERWEIGHTING MOST RECENT DATA

Suppose we believe that a given linear system model is adequate over certain lengths of time, but we have low confidence that the same model accurately portrays propagations over long time intervals. For instance, a linear perturbation model might be adequate for a satellite orbit determination problem while the satellite is in view and telemetry data is being processed continuously, but the same model might be inadequate for propagating between intermittent data and will probably be grossly unsatisfactory for propagating between the time the satellite goes below the horizon and when it reappears on the opposite horizon. Conceptually, we would like to eliminate the effect of older data from a current state estimate if that data is thought to be no longer meaningful, due to the erroneous system model degrading the information in measurements from the distant past.

One means of limiting effective memory length is called exponential age-weighting of data [2, 9, 14, 17, 22, 36, 37, 45, 46, 49, 59, 61], as developed originally by Fagin [9]. Basically, as one moves along in time, the assumed strengths of measurement corruption noises for prior measurements are artificially increased before their influence is brought to bear on the current state estimate. Suppose you are at time tᵢ, and a conventional Kalman filter algorithm would provide a state estimate based on a sequence of measurement noise strengths R(t₁), R(t₂), …, R(tᵢ). Let s be some real number greater than one, and generate a new sequence of assumed noise strengths using

R_{j-aged}(tⱼ) = s^(i−j) R(tⱼ)    (9-7)

for j = 1, 2, ... , i. It is convenient to think of the factor s as

s = e^(Δt/Tₐ)    (9-8)

where Δt is the (constant) measurement sample period (tᵢ₊₁ − tᵢ), and Tₐ is the exponential age-weighting time constant: the smaller the value of Tₐ, the faster the prior data will be "aged," and thus "forgotten," by the resulting filter.

Figure 9.1 portrays the result of this technique applied to the case of stationary noise statistics, R(tⱼ) = R for all tⱼ. At time tᵢ, the modified noise strengths

FIG. 9.1 Computed R_{j-aged}(tⱼ) values. (a) At time tᵢ. (b) At time tᵢ₊₁.

would be

R_{j-aged}(tⱼ) = e^((i−j)Δt/Tₐ) R    (9-9a)

for j = 1, 2, …, i, as in plot (a). The estimate of x(tᵢ) would then be generated, conditioned on the measurement history Zᵢ using this description of corruptive noises. At the next measurement update time tᵢ₊₁, a new computation of noise strengths would yield

R_{j-aged}(tⱼ) = e^((i+1−j)Δt/Tₐ) R    (9-9b)

for j = 1, 2, …, (i + 1). The estimate of x(tᵢ₊₁) would then be based upon the same measurement history Zᵢ plus the additional zᵢ₊₁, but with a totally different assumed history of noise strengths.

Such a concept might at first seem to preclude a simple iterative algorithm. However, if the optimal estimates of x(tᵢ) and x(tᵢ₊₁) are generated independently and compared, it can be shown that a recursion almost identical to the conventional Kalman filter can be used to implement the desired estimator. For propagating between measurement times, the computations are

x̂(tᵢ₊₁⁻) = Φ(tᵢ₊₁, tᵢ)x̂(tᵢ⁺) + Bd(tᵢ)u(tᵢ)    (9-10a)

P̄(tᵢ₊₁⁻) = sΦ(tᵢ₊₁, tᵢ)P̄(tᵢ⁺)Φᵀ(tᵢ₊₁, tᵢ) + Gd(tᵢ)Qd(tᵢ)Gdᵀ(tᵢ)    (9-10b)

and for measurement updating,

K(tᵢ) = P̄(tᵢ⁻)Hᵀ(tᵢ)[H(tᵢ)P̄(tᵢ⁻)Hᵀ(tᵢ) + R(tᵢ)]⁻¹    (9-11a)

x̂(tᵢ⁺) = x̂(tᵢ⁻) + K(tᵢ)[zᵢ − H(tᵢ)x̂(tᵢ⁻)]    (9-11b)

P̄(tᵢ⁺) = P̄(tᵢ⁻) − K(tᵢ)H(tᵢ)P̄(tᵢ⁻)    (9-11c)

These relations are identical in form to those of a conventional Kalman filter, except for the age-weighting factor s in the time propagation equation (9-10b). However, the notation P̄ instead of P has been adopted purposely, because it can be shown that P̄ in these relations is not in general the estimation error covariance matrix.

EXAMPLE 9.4 Recall the bias estimation problem previously considered in Example 9.1. Assume that discrete-time measurements of some unknown constant (bias) are available in the form of

z(tᵢ) = b + v(tᵢ)

where b is the bias value and v(·,·) is a zero-mean white Gaussian noise of strength R. Propose a bias model as

ḃ(t) = 0

where b(t₀) is assumed to be a Gaussian random variable with mean b̂₀ and variance P_b0. Thus, for using (9-10) and (9-11), we identify

Φ = 1,    H = 1,    Qd = 0


In the previous example, the gain of a conventional filter based on such a model was shown to converge to a steady state value of zero. Here we have, from (9-10) and (9-11),

P̄(tᵢ₊₁⁻) = sP̄(tᵢ⁺)

P̄(tᵢ₊₁⁺) = P̄(tᵢ₊₁⁻) − P̄²(tᵢ₊₁⁻)/[P̄(tᵢ₊₁⁻) + R]

         = sP̄(tᵢ⁺) − s²P̄²(tᵢ⁺)/[sP̄(tᵢ⁺) + R]

         = sP̄(tᵢ⁺)R/[sP̄(tᵢ⁺) + R]

For steady state performance, we solve P̄(tᵢ₊₁⁺) = P̄(tᵢ⁺) = P̄, which yields

RP̄ + sP̄² = sP̄R  →  sP̄² + RP̄(1 − s) = 0  →  P̄ = R(1 − 1/s)

Thus, the steady state gain is

K_ss = sP̄/(sP̄ + R) = sR(1 − 1/s)/[sR(1 − 1/s) + R] = 1 − 1/s

For any s > 1, i.e., for any finite Tₐ in (9-8), K_ss is greater than zero, as desired. •
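The recursion of this example is easy to check numerically; a minimal sketch (the values of s, R, and the initial variance are illustrative, not from the text):

```python
# Sketch of Example 9.4: the age-weighted variance recursion
# (9-10b)/(9-11) for the bias problem, with phi = 1, H = 1, Qd = 0.

def fading_memory_gain(s, R, P0, steps):
    """Iterate the age-weighted recursion; return the final gain."""
    P, K = P0, 0.0
    for _ in range(steps):
        P_minus = s * P                  # (9-10b): P-bar propagation
        K = P_minus / (P_minus + R)      # (9-11a)
        P = P_minus - K * P_minus        # (9-11c)
    return K

s = 1.25
K_ss = fading_memory_gain(s, R=1.0, P0=1.0, steps=100)
print(K_ss, "predicted 1 - 1/s:", 1.0 - 1.0 / s)
```

The iterated gain settles at 1 − 1/s, matching the closed-form steady state derived above, so the filter never stops weighting new measurements.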

Other forms of "fading memory" filters have also been derived [37, 59] that are algebraically equivalent to (9-10) and (9-11). Moreover, matrix weighting factors have been considered in addition to scalar factors in order to provide different rates of fading for different filter channels [2, 46].

A similar compensation method is the overweighting of the single most recent measurement, again in an effort to discount the effect of previous data information being propagated through an erroneous dynamics model. By overweighting the current measurement (or, relatively, underweighting prior data), the values of P and K calculated by the filter are prevented from becoming unrealistically small. One means of accomplishing this objective for the case of scalar measurements is the Schmidt epsilon technique [14, 22, 26, 54, 55], developed to resolve some of the difficulties encountered in designing the Kalman filter for the C-5A aircraft navigation system.

Let us propose a modification to the state update equation as

x̂(tᵢ⁺) = x̂(tᵢ⁻) + K(tᵢ)[zᵢ − H(tᵢ)x̂(tᵢ⁻)] + ε′Δx̂ᵢ(tᵢ⁺)    (9-12)

where Δx̂ᵢ(tᵢ⁺) is the estimate of [x(tᵢ) − x̂(tᵢ⁻)] based only on the measurement z(tᵢ, ωⱼ) = zᵢ, and ε′ is a scalar scale factor to be determined. The least squares estimate of [x(tᵢ) − x̂(tᵢ⁻)] based only upon the scalar measurement residual [zᵢ − H(tᵢ)x̂(tᵢ⁻)] can be shown to be (with R⁻¹(tᵢ) chosen as the weighting factor in the cost function to be minimized):

Δx̂ᵢ(tᵢ⁺) = [Hᵀ(tᵢ)R⁻¹(tᵢ)H(tᵢ)]# Hᵀ(tᵢ)R⁻¹(tᵢ)[zᵢ − H(tᵢ)x̂(tᵢ⁻)]    (9-13)

where # denotes the Penrose pseudoinverse [43, 44]. Note that the n-by-n matrix [Hᵀ(tᵢ)R⁻¹(tᵢ)H(tᵢ)] is of rank one since H(tᵢ) is 1-by-n, and thus it does


not have an ordinary inverse. Moreover, in this case,

[Hᵀ(tᵢ)R⁻¹(tᵢ)H(tᵢ)]# = H#(tᵢ)R⁻¹#(tᵢ)Hᵀ#(tᵢ)

                      = H#(tᵢ)R(tᵢ)Hᵀ#(tᵢ)    (9-14a)

where

H#(tᵢ) = Hᵀ(tᵢ)[H(tᵢ)Hᵀ(tᵢ)]⁻¹    (9-14b)

since then the 1-by-n H(tᵢ) times the n-by-1 H#(tᵢ) yields the 1-by-1 identity, one, as would be desired of an "inverse" for H(tᵢ). Putting (9-14) into (9-13) produces

Δx̂ᵢ(tᵢ⁺) = Hᵀ(tᵢ)[H(tᵢ)Hᵀ(tᵢ)]⁻¹[zᵢ − H(tᵢ)x̂(tᵢ⁻)]    (9-15)

Note that the vector measurement case does not admit the simple forms of (9-14b) and (9-15) because [H(tᵢ)Hᵀ(tᵢ)] would no longer be a scalar.

Now substitute (9-15) into (9-12), defining the Schmidt ε parameter for convenience as

ε = ε′[H(tᵢ)P(tᵢ⁻)Hᵀ(tᵢ) + R(tᵢ)]/R(tᵢ)

to write the modified update relation as

x̂(tᵢ⁺) = x̂(tᵢ⁻) + {P(tᵢ⁻)Hᵀ(tᵢ) + εR(tᵢ)Hᵀ(tᵢ)/[H(tᵢ)Hᵀ(tᵢ)]}[H(tᵢ)P(tᵢ⁻)Hᵀ(tᵢ) + R(tᵢ)]⁻¹[zᵢ − H(tᵢ)x̂(tᵢ⁻)]    (9-16)

Thus, the effective gain of the modified filter is

K_modified(tᵢ) = {P(tᵢ⁻)Hᵀ(tᵢ) + εR(tᵢ)Hᵀ(tᵢ)/[H(tᵢ)Hᵀ(tᵢ)]}[H(tᵢ)P(tᵢ⁻)Hᵀ(tᵢ) + R(tᵢ)]⁻¹ = K_Kalman(tᵢ) + K_overweight(tᵢ)    (9-17)

where K_overweight(tᵢ) is proportional to ε. The valid range of ε is from 0 to 1. When ε = 0, the conventional Kalman filter results. On the other hand, when ε = 1, premultiplying (9-16) by H(tᵢ) yields

H(tᵢ)x̂(tᵢ⁺) = H(tᵢ)x̂(tᵢ⁻) + [1][zᵢ − H(tᵢ)x̂(tᵢ⁻)] = zᵢ

In other words, the estimate x̂(tᵢ⁺) is computed such that H(tᵢ)x̂(tᵢ⁺) is forced to agree exactly with the measurement data. The appropriate value of ε for a given application must be determined through a performance analysis. It can be shown that the corresponding covariance of the estimation error is

P_modified(tᵢ⁺) = P_Kalman(tᵢ⁺) + ε²R²(tᵢ)Hᵀ(tᵢ)H(tᵢ) / {[H(tᵢ)P(tᵢ⁻)Hᵀ(tᵢ) + R(tᵢ)][H(tᵢ)Hᵀ(tᵢ)]²}    (9-18)


EXAMPLE 9.5 Returning to the bias estimation problem of Example 9.1, the steady state gain of the filter would not converge to zero, but to

K_ss = 0 + ε[(R · 1)/(1 · 1)]/(1 · 0 · 1 + R) = ε •
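A minimal sketch of the modified gain of (9-16)-(9-17) for a scalar measurement follows; the pure-Python vector arithmetic and the particular numbers are illustrative only:

```python
def schmidt_gain(P, H, R, eps):
    """Effective gain {P(-)H' + eps*R*H'/(H H')}/(H P(-) H' + R) of
    (9-16)-(9-17); H is a 1-by-n row stored as a list, P is n-by-n."""
    n = len(H)
    PHt = [sum(P[i][j] * H[j] for j in range(n)) for i in range(n)]
    HPHt = sum(H[i] * PHt[i] for i in range(n))
    HHt = sum(h * h for h in H)
    return [(PHt[i] + eps * R * H[i] / HHt) / (HPHt + R) for i in range(n)]

# eps = 0 recovers the conventional Kalman gain; as P(-) -> 0 the gain
# tends to eps*H'/(H H') instead of zero (cf. Example 9.5).
print(schmidt_gain([[1.0]], [1.0], R=1.0, eps=0.0))   # -> [0.5]
print(schmidt_gain([[0.0]], [1.0], R=1.0, eps=0.3))   # -> [0.3]
```

The second call illustrates the example above: even with a fully "converged" (zero) filter covariance, the effective gain remains ε.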

9.4 FINITE MEMORY FILTERING

All estimators discussed up to this point are of the growing memory type: x̂(tᵢ⁺) is an estimate of x(tᵢ) based on the initial condition information and all measurements to time tᵢ: z₁, z₂, …, zᵢ. As time progresses, the estimator "remembers" an ever-growing number of measurements. However, if a system model were thought to be valid over a time interval equal to N sample periods but inadequate for longer durations, it would be inappropriate to have such a growing memory. For example, slowly varying parameters might be modeled as "essentially constant over the most recent N sample periods," whereas modeling them as constants for all time would mismodel true behavior. Thus, one might seek an estimate of x(tᵢ) that is based only upon the most recent N measurements.

EXAMPLE 9.6 Suppose a parameter of interest is modeled as a constant bias,

x(t) = k = const for all t

and it is desired to estimate its value from discrete-time measurements (with sample period Δt), modeled as the parameter plus zero-mean white Gaussian noise:

z(tᵢ) = x(tᵢ) + v(tᵢ)

One reasonable estimator would be a simple averager:

x̂(tᵢ) = (1/i) Σ_{j=1}^{i} zⱼ

Now let the true parameter be best modeled as a bias plus a ramp,

xₜ(t) = k + at

In this case, the mean estimation error would be

E{xₜ(tᵢ) − x̂(tᵢ)} = E{[k + atᵢ] − (1/i) Σ_{j=1}^{i} [k + atⱼ + v(tⱼ)]} = ½aΔt(i − 1)

This is a quantity which grows without bound. If, instead, the estimator averaged only the most recent N data points,

x̂_N(tᵢ) = (1/N) Σ_{j=i-N+1}^{i} zⱼ


then the mean error due to the erroneous model would be

E{xₜ(tᵢ) − x̂_N(tᵢ)} = ½aΔt(N − 1)

As shown in Fig. 9.2, this does not grow unbounded in time.

FIG. 9.2 True and estimated state values for Example 9.6.
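The two averagers of this example can be compared directly; a minimal noise-free sketch (k cancels out of the mean error, and the values of a and Δt are illustrative):

```python
# Sketch of Example 9.6: mean estimation error of an N-point average of
# z(t_j) = k + a*t_j when the model says "constant". Noise-free data
# isolates the model-error term; setting N = i gives the growing-memory
# simple averager.

def mean_error(i, N, a, dt):
    t_i = i * dt
    window = [a * j * dt for j in range(i - N + 1, i + 1)]  # k cancels
    return a * t_i - sum(window) / N

a, dt = 1.0, 0.1
print([round(mean_error(i, N=i, a=a, dt=dt), 3) for i in (10, 100, 1000)])
print([round(mean_error(i, N=10, a=a, dt=dt), 3) for i in (10, 100, 1000)])
```

The growing-memory error ½aΔt(i − 1) increases without bound, while the N-point moving-average error stays fixed at ½aΔt(N − 1), as Fig. 9.2 depicts.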

This section develops finite memory filters, which provide estimates of the form

E{x(tᵢ) | z(t_{i-N+1}, ωⱼ) = z_{i-N+1}, z(t_{i-N+2}, ωⱼ) = z_{i-N+2}, …, z(tᵢ, ωⱼ) = zᵢ}

where N is some fixed integer chosen for a particular application [21, 22, 56, 57]. Other names for filters of this type are "limited memory filters," and "sliding arc" or "sliding window" filters, since they involve a state estimate based on a sliding arc of the N most recent measurements, denoted as x̂_N(tᵢ⁺).

As in the case of growing length filters, steady state operation is possible if the assumed model entails a time-invariant system and stationary noise statistics. However, a finite memory filter can exhibit instability problems that are not encountered in the corresponding growing memory filter (the system model must now be stochastically observable and controllable with respect to any time interval defined by N consecutive sample times, i.e., of length equal to (N − 1) sample periods, for the filter to be stable) [22, 57].

A filter of this form can always be implemented by maintaining storage of the N most recent measurement values, and performing an N-step recursion (N updates, N − 1 propagations) of an inverse covariance filter (to allow algorithm startup with no assumed "a priori" information about the state) each time the real system progresses one sample period. However, this can place a severe, if not prohibitive, computational load upon the computer. Consequently, an equivalent one-step recursion would be extremely desirable, in the general form [57] of

x̂_N(tᵢ⁺) = C₁(tᵢ)x̂_N(tᵢ₋₁⁺) + C₂(tᵢ)zᵢ + C₃(tᵢ)z_{i-N}    (9-19)

To derive such a result, assume x̂_N(tᵢ₋₁⁺) is known; note that it is based on z_{i-N}, z_{i-N+1}, …, z_{i-1}. Now calculate x̂_{N+1}(tᵢ⁺), based on the N measurements just delineated plus zᵢ, by means of a time propagation and measurement update, expressing the result in terms of x̂_N(tᵢ₋₁⁺) and z(tᵢ, ωⱼ) = zᵢ. Second, assume x̂_N(tᵢ⁺) is known, and calculate x̂_{N+1}(tᵢ⁺) in terms of x̂_N(tᵢ⁺) and z(t_{i-N}, ωⱼ) = z_{i-N}. Finally, equate the two expressions for x̂_{N+1}(tᵢ⁺) to obtain a recursive relationship of the form given by (9-19).

By looking at the inverse covariance form of the filter, this concept can be interpreted in an appealing manner. As one steps from time tᵢ₋₁ to time tᵢ, the information in z(tᵢ, ωⱼ) = zᵢ is added to the propagated state estimate and the information in z(t_{i-N}, ωⱼ) = z_{i-N} is subtracted from it, recalling that Hᵀ(tᵢ)R⁻¹(tᵢ)H(tᵢ) is the "information" embodied in the current measurement. Thus, even though (9-19) is recursive, it is still necessary to store the most recent N measurements for processing, since knowledge of z_{i-N} is required at time tᵢ.

Although (9-19) seems to be an attractive formulation, useful recursion relations of this structure have not been generated for the exact solution to the general case. In fact, the calculations of C₁, C₂, and C₃ are so complex that the straightforward N-step recursion is preferable. However, in the particular case of no dynamic noise (Qd(tᵢ) ≡ 0 for all tᵢ), such a form can be derived, and these have found sufficient applications to warrant further study.

One form of the finite memory filter for Qd(tᵢ) ≡ 0 can be derived [56, 57] as the maximum likelihood estimate that maximizes the likelihood function

L(ξ, Z_{i,i-N+1}) = ln{ f_{z(tᵢ),…,z(t_{i-N+1})|x(tᵢ)}(ζᵢ, …, ζ_{i-N+1} | ξ) }    (9-20)

This is, in fact, an appropriate choice of likelihood function to obtain an estimate of x(tᵢ) based only on information contained in the most recent N measurements, with no "a priori" information about x(t_{i-N+1}). For algebraic simplicity, the deterministic input will be neglected here, but then added later to the final result. By repeated application of Bayes' rule, the density function in (9-20) can be expressed as

f_{z(tᵢ),…,z(t_{i-N+1})|x(tᵢ)} = f_{z(tᵢ)|z(tᵢ₋₁),…,z(t_{i-N+1}),x(tᵢ)} · f_{z(tᵢ₋₁),…,z(t_{i-N+1})|x(tᵢ)}

                              = ∏_{j=i-N+1}^{i} f_{z(tⱼ)|z(tⱼ₋₁),…,z(t_{i-N+1}),x(tᵢ)}    (9-21)


where the term evaluated for j = i − N + 1 is f_{z(t_{i-N+1})|x(tᵢ)}(ζ_{i-N+1} | ξ). For the case of Qd(tᵢ) ≡ 0, conditioning on the realization of x(tᵢ) completely determines the densities appearing in (9-21), independent of z(tⱼ₋₁), …, z(t_{i-N+1}):

f_{z(tⱼ)|z(tⱼ₋₁),…,z(t_{i-N+1}),x(tᵢ)}(ζⱼ | ζⱼ₋₁, …, ζ_{i-N+1}, ξ) = f_{z(tⱼ)|x(tᵢ)}(ζⱼ | ξ)

    = [1/((2π)^(m/2)|R(tⱼ)|^(1/2))] exp{−½[ζⱼ − H(tⱼ)Φ(tⱼ, tᵢ)ξ]ᵀ R⁻¹(tⱼ)[ζⱼ − H(tⱼ)Φ(tⱼ, tᵢ)ξ]}    (9-22)

Therefore, the likelihood function in (9-20) can be written as

L(ξ, Z_{i,i-N+1}) = −(Nm/2)ln(2π) − ½ Σ_{j=i-N+1}^{i} ln|R(tⱼ)|
    − ½ Σ_{j=i-N+1}^{i} [ζⱼ − H(tⱼ)Φ(tⱼ, tᵢ)ξ]ᵀ R⁻¹(tⱼ)[ζⱼ − H(tⱼ)Φ(tⱼ, tᵢ)ξ]    (9-23)

The maximum likelihood estimate is generated as the solution to

∂L(ξ, Z_{i,i-N+1})/∂ξ |_{ξ = x̂_N(tᵢ⁺)} = 0ᵀ    (9-24)

This yields

x̂_N(tᵢ⁺) = ℱ⁻¹(tᵢ, t_{i-N+1})[ Σ_{j=i-N+1}^{i} Φᵀ(tⱼ, tᵢ)Hᵀ(tⱼ)R⁻¹(tⱼ)ζⱼ ]    (9-25)

where ζⱼ is a realized value of z(tⱼ) and ℱ(tᵢ, t_{i-N+1}) is the N-step information matrix, defined as

ℱ(tᵢ, t_{i-N+1}) = Σ_{j=i-N+1}^{i} Φᵀ(tⱼ, tᵢ)Hᵀ(tⱼ)R⁻¹(tⱼ)H(tⱼ)Φ(tⱼ, tᵢ)    (9-26)

This N-step information matrix is a measure of the certainty of the state estimate due only to the N most recent measurement values, analogous to the growing-length information matrix ℱ(tᵢ, t₁) discussed in Section 5.7 (Volume 1). To be assured of the existence of ℱ⁻¹(tᵢ, t_{i-N+1}) in (9-25), ℱ(tᵢ, t_{i-N+1}) must be of full rank for all tᵢ: unlike ℱ(tᵢ, t₁), the rank of ℱ(tᵢ, t_{i-N+1}) is not necessarily a nondecreasing function of tᵢ. For the first N sample times, ℱ(tᵢ, t_{i-N+1}) is replaced by ℱ(tᵢ, t₁), and in general there is some minimum number of sample times greater than one before ℱ(tᵢ, t₁) is of rank n, since there are usually more states than measurements (n > m).


A recursion can be developed for the N-step information matrix, for i > N, by writing ℱ(tᵢ, t_{i-N+1}) and ℱ(tᵢ₋₁, t_{i-N}) through (9-26) and equating like terms, to yield

ℱ(tᵢ, t_{i-N+1}) = Φᵀ(tᵢ₋₁, tᵢ)ℱ(tᵢ₋₁, t_{i-N})Φ(tᵢ₋₁, tᵢ) + Hᵀ(tᵢ)R⁻¹(tᵢ)H(tᵢ)
    − Φᵀ(t_{i-N}, tᵢ)Hᵀ(t_{i-N})R⁻¹(t_{i-N})H(t_{i-N})Φ(t_{i-N}, tᵢ)    (9-27)

Thus, to obtain ℱ(tᵢ, t_{i-N+1}) from the one-step propagated ℱ(tᵢ₋₁, t_{i-N}), add the information obtained from z(tᵢ) and subtract the information due to z(t_{i-N}). Similar algebraic manipulation of (9-25) yields a recursive state estimate relation as

x̂_N(tᵢ⁺) = Φ(tᵢ, tᵢ₋₁)x̂_N(tᵢ₋₁⁺) + ℱ⁻¹(tᵢ, t_{i-N+1})
    × {Hᵀ(tᵢ)R⁻¹(tᵢ)[zᵢ − H(tᵢ)Φ(tᵢ, tᵢ₋₁)x̂_N(tᵢ₋₁⁺)]
    − Φᵀ(t_{i-N}, tᵢ)Hᵀ(t_{i-N})R⁻¹(t_{i-N})[z_{i-N} − H(t_{i-N})Φ(t_{i-N}, tᵢ₋₁)x̂_N(tᵢ₋₁⁺)]}    (9-28)

In (9-28), {H(tᵢ)Φ(tᵢ, tᵢ₋₁)x̂_N(tᵢ₋₁⁺)} is the optimal prediction of zᵢ before that measurement is taken, as propagated from x̂_N(tᵢ₋₁⁺), and {H(t_{i-N})Φ(t_{i-N}, tᵢ₋₁)x̂_N(tᵢ₋₁⁺)} is similarly the best estimate ("backward prediction") of z_{i-N} as generated from x̂_N(tᵢ₋₁⁺). The two bracketed terms in (9-28) are thus residuals at times tᵢ and t_{i-N} as propagated from the best state estimate at time tᵢ₋₁.

Allowing deterministic control inputs extends (9-28) to

x̂_N(tᵢ⁻) = Φ(tᵢ, tᵢ₋₁)x̂_N(tᵢ₋₁⁺) + Bd(tᵢ₋₁)u(tᵢ₋₁)    (9-29a)

x̂_N(t_{i-N}/tᵢ₋₁) = Φ(t_{i-N}, tᵢ₋₁)x̂_N(tᵢ₋₁⁺) − Σ_{j=i-N+1}^{i-1} Φ(t_{i-N}, tⱼ)Bd(tⱼ₋₁)u(tⱼ₋₁)    (9-29b)

x̂_N(tᵢ⁺) = x̂_N(tᵢ⁻) + ℱ⁻¹(tᵢ, t_{i-N+1}){Hᵀ(tᵢ)R⁻¹(tᵢ)[zᵢ − H(tᵢ)x̂_N(tᵢ⁻)]
    − Φᵀ(t_{i-N}, tᵢ)Hᵀ(t_{i-N})R⁻¹(t_{i-N})[z_{i-N} − H(t_{i-N})x̂_N(t_{i-N}/tᵢ₋₁)]}    (9-29c)

Thus, besides requiring storage of the most recent N measurement values, the H, R⁻¹, Φ, Bd, and u values must also be stored and an n-by-n matrix inversion performed each iteration, a significant computer burden.
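For the scalar bias problem (Φ = 1, H = 1, n = 1) the N-step computation (9-25)-(9-26) collapses to a weighted average of the last N measurements; the following sketch (illustrative data and R, not from the text) shows that special case:

```python
def finite_memory_estimate(z, R, N):
    """Scalar case of (9-25)-(9-26) with phi = 1, H = 1: the N-step
    information "matrix" is sum(1/R) over the window, and the estimate
    is the information-weighted average of the last N measurements."""
    window = z[-N:]
    info = len(window) / R                           # F, as in (9-26)
    weighted_sum = sum(zeta / R for zeta in window)  # bracket of (9-25)
    return weighted_sum / info

# A "bias" that jumps partway through: the N = 3 estimate tracks the
# new level, while a growing-memory average would be dragged down.
z = [1.0, 1.2, 0.9, 5.0, 5.1, 4.9]
print(finite_memory_estimate(z, R=0.5, N=3))
```

With constant R the weighting is uniform and the estimate is just the N-point average, exactly the estimator of Example 9.6.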

Another formulation [21, 22] of the finite memory filter for Qd(tᵢ) ≡ 0 is given by

x̂_N(tᵢ⁺) = P_N(tᵢ⁺)[P⁻¹(tᵢ⁺)x̂(tᵢ⁺) − P⁻¹(tᵢ/t_{i-N})x̂(tᵢ/t_{i-N})]    (9-30a)

P_N⁻¹(tᵢ⁺) = P⁻¹(tᵢ⁺) − P⁻¹(tᵢ/t_{i-N})    (9-30b)

where x̂(tᵢ⁺) and P(tᵢ⁺) are the state estimate and error covariance of a conventional growing-memory filter, and x̂(tᵢ/t_{i-N}) and P(tᵢ/t_{i-N}) are the optimal


prediction of x(tᵢ) based on the measurements up through z(t_{i-N}, ωⱼ) = z_{i-N} and covariance of the error [x(tᵢ) − x̂(tᵢ/t_{i-N})], given by

x̂(tᵢ/t_{i-N}) = Φ(tᵢ, t_{i-N})x̂(t_{i-N}⁺) + Σ_{j=i-N+1}^{i} Φ(tᵢ, tⱼ)Bd(tⱼ₋₁)u(tⱼ₋₁)    (9-31a)

P(tᵢ/t_{i-N}) = Φ(tᵢ, t_{i-N})P(t_{i-N}⁺)Φᵀ(tᵢ, t_{i-N})    (9-31b)

For (9-31), the values of x̂(t_{i-N}⁺) and P(t_{i-N}⁺) could be obtained either from storage of these values from the conventional filter or, without storage, from a second conventional filter running simultaneously. This result is expressed as the weighted difference of two growing-memory filter estimates, the structure being readily interpreted in a manner analogous to that used for the smoother results, (8-11) and (8-12), of the previous chapter. Unfortunately, such a structure embodies two growing-length memory filters, each of these being the very set of erroneous and potentially divergent calculations we seek to avoid through finite memory filtering.

Nevertheless, this form provides insights into a practical approximate method [21, 22] that substantially reduces the computational and storage burden of finite memory filters for the case of Qd(tᵢ) ≡ 0. In this method, old data is eliminated in batches of N, so that the memory length varies between N and 2N measurements. The procedure starts by iterating a conventional Kalman filter algorithm N times to generate x̂(t_N⁺) and P(t_N⁺), and these two values are put into storage. Then the conventional filter is run to time t_{2N}, producing x̂(t_{2N}⁺) and P(t_{2N}⁺) based upon the first 2N measurements. Using the quantities previously stored, the prediction x̂(t_{2N}/t_N) and associated covariance P(t_{2N}/t_N) are computed as

x̂(t_{2N}/t_N) = Φ(t_{2N}, t_N)x̂(t_N⁺) + Σ_{j=N+1}^{2N} Φ(t_{2N}, tⱼ)Bd(tⱼ₋₁)u(tⱼ₋₁)    (9-32a)

P(t_{2N}/t_N) = Φ(t_{2N}, t_N)P(t_N⁺)Φᵀ(t_{2N}, t_N)    (9-32b)

Then x̂_N(t_{2N}⁺) and P_N(t_{2N}⁺) are calculated from (9-30) as

P_N⁻¹(t_{2N}⁺) = P⁻¹(t_{2N}⁺) − P⁻¹(t_{2N}/t_N)    (9-33a)

x̂_N(t_{2N}⁺) = P_N(t_{2N}⁺)[P⁻¹(t_{2N}⁺)x̂(t_{2N}⁺) − P⁻¹(t_{2N}/t_N)x̂(t_{2N}/t_N)]    (9-33b)

Now x̂_N(t_{2N}⁺) and P_N(t_{2N}⁺) are put into storage for future use and also used as initial conditions for a conventional Kalman filter algorithm, which is run out to time t_{3N}. At this time, the prediction x̂(t_{3N}/t_{2N}) and covariance P(t_{3N}/t_{2N}) are computed analogously to (9-32), replacing x̂(t_N⁺) by x̂_N(t_{2N}⁺) and P(t_N⁺) by P_N(t_{2N}⁺). These results are combined analogously to (9-33) to approximate x̂_N(t_{3N}⁺) and P_N(t_{3N}⁺). The recursion continues in this fashion. Estimates are in fact available at every measurement sample time, not just every N points, and


these estimates will be based upon (N), (N + 1), …, (2N − 1), (N), (N + 1), … measurements iteratively. Such an approximate filter has been applied with success to orbital determination problems and other applications [21, 22].
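A scalar sketch of the batch-discard procedure (9-32)-(9-33) for the bias problem (Φ = 1, Qd = 0, no control input, so the predictions of (9-32) are trivially the stored values); the data and numbers are illustrative:

```python
def batch_discard_filter(z, R, P0, x0, N):
    """Conventional scalar filter whose memory is cut back every N
    samples via (9-33); memory length varies between N and 2N."""
    x, P = x0, P0
    x_store = P_store = None
    estimates = []
    for i, zi in enumerate(z, start=1):
        K = P / (P + R)                 # conventional update
        x = x + K * (zi - x)
        P = (1.0 - K) * P
        estimates.append(x)
        if i % N == 0:
            if x_store is not None:
                # (9-33): remove the information of the oldest batch;
                # with phi = 1 and u = 0 the (9-32) prediction is just
                # the stored estimate and covariance themselves.
                P_new = 1.0 / (1.0 / P - 1.0 / P_store)
                x = P_new * (x / P - x_store / P_store)
                P = P_new
            x_store, P_store = x, P     # store for the next batch
    return estimates

est = batch_discard_filter([2.0] * 40, R=1.0, P0=100.0, x0=0.0, N=10)
print(est[-1])
```

The division in (9-33a) is safe here because the post-update variance is always strictly smaller than the stored (older) variance.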

9.5 LINEARIZED AND EXTENDED KALMAN FILTERS

Consider an estimation problem involving a continuous-time system with discrete-time measurements. Unlike cases discussed previously, assume that a linear model does not provide a valid description of the problem, i.e., that nonlinearities in the deterministic portion of the state dynamics and measurement models are not negligible. Let the system state be well modeled as satisfying the nonlinear stochastic differential equation

dx(t) = f[x(t), u(t), t] dt + G(t) dβ(t)   (9-34a)

where f(·, ·, ·) is a known n-vector of functions of three arguments, u(·) is an r-vector of deterministic control input functions, and β(·,·) is Brownian motion with diffusion Q(t) for all t ∈ T. Equation (9-34a) is a relation for the stochastic differential dx(·,·) in the sense that integrating the right hand side of the equation yields the manner of time evolution of the x(·,·) process. It can also be written in white noise notation as

ẋ(t) = f[x(t), u(t), t] + G(t)w(t)   (9-34b)

where w(·,·) is a zero-mean white Gaussian noise process with covariance kernel

E{w(t)wᵀ(t + τ)} = Q(t) δ(τ)   (9-35)

In analogy to Section 2.3 (Volume 1), f(·, ·, ·) is assumed Lipschitz in its first argument, continuous in its second argument (and u(·) is assumed piecewise continuous), and piecewise continuous in its third argument. More general nonlinear stochastic differential equations will be developed in detail in Chapter 11, at which time sufficient conditions for the existence of a solution will be discussed in greater detail. At this point, we view (9-34) as a modest generalization of models considered previously, replacing [F(t)x(t) + B(t)u(t)] by f[x(t), u(t), t]. Note that the dynamic driving noise, β(·,·) in (9-34a) or w(·,·) in (9-34b), is still assumed to enter in a linear additive fashion. This is representative of a wide class of practical applications and warrants special consideration, especially since the insights from linear estimation will be directly exploitable.
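Sample paths of a model of the form (9-34) can be generated numerically by the Euler–Maruyama scheme, replacing dβ on each step by an N(0, Q dt) increment. The scalar dynamics f(x, u, t) = −x + u and all numerical values below are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Hedged sketch: generating sample paths of a model of the form (9-34) by
# Euler-Maruyama, with the Brownian increment dbeta ~ N(0, Q dt).  The scalar
# dynamics f(x, u, t) = -x + u and all numbers are illustrative assumptions.

def simulate(f, G, Q, x0, u, t0, dt, n_steps, rng):
    """One sample path of dx = f(x, u(t), t) dt + G dbeta."""
    xs = np.empty(n_steps + 1)
    xs[0] = x0
    t = t0
    for k in range(n_steps):
        dbeta = rng.normal(0.0, np.sqrt(Q * dt))   # N(0, Q dt) increment
        xs[k + 1] = xs[k] + f(xs[k], u(t), t) * dt + G * dbeta
        t += dt
    return xs

rng = np.random.default_rng(0)
xs = simulate(lambda x, u, t: -x + u, G=1.0, Q=0.04, x0=2.0,
              u=lambda t: 0.0, t0=0.0, dt=0.01, n_steps=500, rng=rng)
```

Note how the noise enters linearly and additively, exactly as assumed in (9-34); only the drift term f is nonlinear in general.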

The initial condition x(t₀) for (9-34) is assumed to be a Gaussian random n-vector with mean x̂₀ and covariance P₀. As is true of the dynamic noise, the Gaussian assumption is motivated in part by the desire to exploit linear estimation theory; the x(·,·) process described by (9-34) will not in general be Gaussian.


Let the discrete-time measurements be modeled in general as a known nonlinear function of the state plus linearly additive noise corruption for all t_i ∈ T, as

z(t_i) = h[x(t_i), t_i] + v(t_i)   (9-36)

where h[·,·] is a known m-vector of functions of state and time, and v(·,·) is a white Gaussian noise sequence of mean zero and covariance kernel

E{v(t_i)vᵀ(t_j)} = R(t_i) δ_ij   (9-37)

Again, this is a modest increase in complexity over previous models, replacing [H(t_i)x(t_i)] by h[x(t_i), t_i]. In fact, there are many applications in which either f or h in the adequate system model is in fact a linear function.

Given such a system model, it is desired to generate an "optimal" state estimate. This will be viewed later in the context of estimation with general nonlinear models, but here we seek to apply the previously developed linear estimation results directly. To do so, assume that we can generate a nominal (reference) state trajectory, x_n(·), for all time t ∈ T, starting from the initial condition x_n(t₀) = x_n0 and satisfying the deterministic differential equation:

ẋ_n(t) = f[x_n(t), u(t), t]   (9-38)

Here f(·, ·, ·) and u(·) are identical to those in (9-34). In fact, we could assume that x_n(·) were driven by some nominal deterministic control function u_n(·) different from u(·), but we will confine our attention to u_n(t) ≜ u(t) for all t ∈ T to maintain adequacy of linear perturbation techniques as much as possible. Associated with such a nominal state trajectory would be the sequence of nominal measurements,

z_n(t_i) = h[x_n(t_i), t_i]   (9-39)

where h[·,·] is as given by (9-36).

Now consider the perturbation of the state from the assumed nominal trajectory: [x(t) − x_n(t)] for all t ∈ T. This is a stochastic process satisfying

[ẋ(t) − ẋ_n(t)] = f[x(t), u(t), t] − f[x_n(t), u(t), t] + G(t)w(t)   (9-40)

i.e., a nonlinear stochastic differential equation written in white noise notation, with x_n(t) given for all t ∈ T as the solution to (9-38). Equation (9-40) can be expressed as a series by expanding about x_n(t):

[ẋ(t) − ẋ_n(t)] = ∂f[x, u(t), t]/∂x |_{x=x_n(t)} [x(t) − x_n(t)] + h.o.t. + G(t)w(t)   (9-41)

where the zero-order term in the Taylor series for f[x(t), u(t), t] has been canceled by the second term in (9-40) and where "h.o.t." denotes terms in powers of


[x(t) − x_n(t)] greater than one. A first order approximation to this equation, called the variational or perturbation equation, can be written as

δẋ(t) = F[t; x_n(t)] δx(t) + G(t)w(t)   (9-42)

where δx(·,·) is a first order approximation to the process [x(·,·) − x_n(·)], and F[t; x_n(t)] is the n-by-n matrix of partial derivatives of f with respect to its first argument, evaluated along the nominal trajectory:

F[t; x_n(t)] ≜ ∂f[x, u(t), t]/∂x |_{x=x_n(t)}   (9-43)

The solution to Eq. (9-42) is a viable approximation to the solution of (9-40) as long as the deviations from the nominal state trajectory are small enough for the higher order terms in (9-41) to be negligible (the strength of w(·,·) might be increased somewhat to account for these terms). The appropriate initial condition would thus be to model δx(t₀) as a Gaussian random variable with mean [x̂₀ − x_n0] and covariance P₀; x̂₀ and x_n0 are usually equal, so E{δx(t₀)} is typically 0. Note that, had we allowed u(t) to assume a value different than u_n(t), with δu(t) ≜ [u(t) − u_n(t)], (9-42) could be expanded to

δẋ(t) = F[t; x_n(t), u_n(t)] δx(t) + B[t; x_n(t), u_n(t)] δu(t) + G(t)w(t)   (9-44)

with B[t; x_n(t), u_n(t)] as ∂f/∂u evaluated along the nominal. However, letting δu(t) be nonzero compromises the validity of small perturbation approximations unnecessarily, and we shall therefore assume δu(t) ≡ 0 for all t ∈ T, as mentioned before.
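When analytic partials are inconvenient, the matrix F[t; x_n(t)] of (9-43) (and likewise B of (9-44)) can be approximated by finite differences along the nominal. The sketch below uses central differences on an illustrative pendulum-like model; the helper name jacobian_x is hypothetical, not from the text.

```python
import numpy as np

# Sketch: forming the perturbation-equation matrix of (9-43) by central
# finite differences, useful when analytic partials of f are inconvenient.
# jacobian_x and the pendulum-like model are illustrative assumptions.

def jacobian_x(f, x, u, t, eps=1e-6):
    """n-by-n matrix of partials of f with respect to x at (x, u, t)."""
    n = x.size
    F = np.zeros((n, n))
    for j in range(n):
        dx = np.zeros(n)
        dx[j] = eps
        F[:, j] = (f(x + dx, u, t) - f(x - dx, u, t)) / (2.0 * eps)
    return F

# Example: a pendulum-like nonlinear model, x = [angle, rate].
def f(x, u, t):
    return np.array([x[1], -np.sin(x[0]) + u])

x_n = np.array([0.0, 0.0])          # nominal state
F = jacobian_x(f, x_n, 0.0, 0.0)    # approximates [[0, 1], [-1, 0]] here
```

A central difference is used rather than a one-sided one because its truncation error is second order in the step, which matters when F is re-evaluated along an entire nominal trajectory.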

In a similar manner, we can consider the measurement perturbations for each time t_i, through (9-36) and (9-39), as

[z(t_i) − z_n(t_i)] = h[x(t_i), t_i] − h[x_n(t_i), t_i] + v(t_i)   (9-45)

A linearization of this relation yields the perturbation measurement model of

δz(t_i) = H[t_i; x_n(t_i)] δx(t_i) + v(t_i)   (9-46)

where H[t_i; x_n(t_i)] for each t_i is the m-by-n matrix of partial derivatives of h with respect to its first argument, evaluated along the nominal trajectory:

H[t_i; x_n(t_i)] ≜ ∂h[x, t_i]/∂x |_{x=x_n(t_i)}   (9-47)

Thus, δz(·,·) is an approximation to the difference process [z(·,·) − z_n(·)] that is valid to first order. A realization of the actual measurement process model, z(t_i, ω_j) = z_i, will be different from z_n(t_i) because of two effects, the noise corruption v(t_i, ω_j) and the fact that h[x(t_i, ω_j), t_i] is not in general equal to h[x_n(t_i), t_i]. Additionally, δz(t_i, ω_j) will differ from [z(t_i, ω_j) − z_n(t_i)] because of the approximations inherent in (9-46).


Now one can consider applying linear filtering theory to the system model prescribed by (9-42) and (9-46). Using the a priori nominal trajectory x_n(·), i.e., the (precomputable) solution to (9-38), the required matrices F[t; x_n(t)] and H[t_i; x_n(t_i)] can be evaluated, provided the derivatives involved exist. The input "measurement" for this filter at time t_i is in fact the difference value [z(t_i, ω_j) − z_n(t_i)]. The output of such a filter would be the optimal estimate of δx(t) for all t ∈ T, denoted as δx̂(t), and this could be added to the nominal value x_n(t) to establish an estimate of the total state:

x̂(t) = x_n(t) + δx̂(t)   (9-48)

This form of estimator is called the linearized Kalman filter, or perturbation Kalman filter. It is computationally advantageous compared to an "optimal" nonlinear filter, but it can suffer from large magnitude errors if the "true" and nominal trajectories differ significantly. In fact, the feedforward error state space Kalman filters of Chapter 6 (Volume 1) were examples of this configuration.

The basic idea of the extended Kalman filter is to relinearize about each estimate x̂(t_i⁺) once it has been computed. As soon as a new state estimate is made, a new and better reference state trajectory is incorporated into the estimation process. In this manner, one enhances the validity of the assumption that deviations from the reference (nominal) trajectory are small enough to allow linear perturbation techniques to be employed with adequate results. The extended Kalman filter will now be derived using the insights from the linearized Kalman filter, and then the actual algorithm for implementation will be summarized and applied to realistic problems.

First consider the linear filter used to estimate δx(·) in the development of the linearized Kalman filter above. If we have just processed the measurement z(t_i, ω_j) = z_i to obtain x̂(t_i⁺), let us relinearize about the trajectory starting from that value instead of x_n(t_i). Let x_n(t/t_i) denote the solution to the nominal trajectory differential equation but starting from the "initial" condition of x̂(t_i⁺), i.e., the solution over the interval [t_i, t_{i+1}) of

ẋ_n(t/t_i) = f[x_n(t/t_i), u(t), t]   (9-49a)

x_n(t_i/t_i) = x̂(t_i⁺)   (9-49b)

At time t_i, just after measurement incorporation and relinearization, the best estimate of δx(t_i) is (because of the relinearization process):

δx̂(t_i⁺) = 0   (9-50)

To propagate the state perturbation estimate to the next sample time t_{i+1}, we would employ the relinearized evaluation of F, F[t; x_n(t/t_i)]. If we let δx̂(t/t_i) denote the estimate of δx(t) based on the measurements through z(t_i, ω_j) = z_i,


for t in the interval [t_i, t_{i+1}), then it is the solution to

(d/dt) δx̂(t/t_i) = F[t; x_n(t/t_i)] δx̂(t/t_i)   (9-51a)

subject to the initial condition

δx̂(t_i/t_i) = δx̂(t_i⁺) = 0   (9-51b)

Thus, it can be seen that δx̂(t/t_i) is identically zero over the entire interval [t_i, t_{i+1}), and so

δx̂(t_{i+1}⁻) = 0   (9-52)

The measurement to be presented to the relinearized filter at time t_{i+1} is [z(t_{i+1}, ω_j) − z_n(t_{i+1}/t_i)], where

z_n(t_{i+1}/t_i) ≜ h[x_n(t_{i+1}/t_i), t_{i+1}]   (9-53)

Consequently, the measurement update for the relinearized filter is

δx̂(t_{i+1}⁺) = δx̂(t_{i+1}⁻) + K(t_{i+1})[{z_{i+1} − z_n(t_{i+1}/t_i)} − H(t_{i+1}) δx̂(t_{i+1}⁻)]   (9-54)

In view of (9-52) and (9-53), this becomes

δx̂(t_{i+1}⁺) = K(t_{i+1}){z_{i+1} − h[x_n(t_{i+1}/t_i), t_{i+1}]}

where K(t_{i+1}) is computed using P(t_{i+1}⁻) and H(t_{i+1}) evaluated along the most recent "nominal," x_n(t/t_i) for t ∈ [t_i, t_{i+1}).

To achieve the final form of the extended Kalman filter, consider combining the most recent "nominal" x_n(t/t_i) with the state perturbation estimate δx̂(t/t_i) to generate an estimate of the full state. Assuming that an adequate model of x(t) for all t in [t_i, t_{i+1}) is

x(t) = x_n(t/t_i) + δx(t)   (9-55)

we define the optimal estimate of x(t) in this interval as

x̂(t/t_i) ≜ x_n(t/t_i) + δx̂(t/t_i)   (9-56)

Since δx̂(t/t_i) is zero over the entire interval between measurement times t_i and t_{i+1}, the best estimate of the total state over this interval is obtained as the solution to

(d/dt) x̂(t/t_i) = f[x̂(t/t_i), u(t), t]   (9-57a)

starting from the initial condition

x̂(t_i/t_i) = x̂(t_i⁺)   (9-57b)

Note that this estimate propagation (prediction) employs the nonlinear system model dynamic equations. To incorporate the measurement at time t_{i+1}, we


again invoke the assumed adequacy of (9-55) to write

x̂(t_{i+1}⁺) = x̂(t_{i+1}/t_i) + δx̂(t_{i+1}⁺)
           = x̂(t_{i+1}/t_i) + K(t_{i+1}){z_{i+1} − h[x̂(t_{i+1}/t_i), t_{i+1}]}   (9-58)

where x̂(t_{i+1}/t_i) is obtained from the solution of (9-57).

The extended Kalman filter will now be summarized [5, 6, 22, 26, 53]. Let the system of interest be described by the dynamics model

ẋ(t) = f[x(t), u(t), t] + G(t)w(t)   (9-59)

where x(t₀) is modeled as a Gaussian random n-vector with mean x̂₀ and covariance P₀, u(·) is an r-vector of known input functions, and w(·,·) is a zero-mean white Gaussian s-vector process, independent of x(t₀) and of strength Q(t) for all time t ∈ T. Let the available discrete-time measurements be modeled as the m-vector process z(·,·): for each t_i ∈ T,

z(t_i) = h[x(t_i), t_i] + v(t_i)   (9-60)

where v(·,·) is a zero-mean white Gaussian m-vector process, independent of x(t₀) and w(·,·), and of strength R(t_i) for all t_i of interest.

The extended Kalman filter measurement update incorporates the measurement z(t_i, ω_j) = z_i by means of

K(t_i) = P(t_i⁻) Hᵀ[t_i; x̂(t_i⁻)] {H[t_i; x̂(t_i⁻)] P(t_i⁻) Hᵀ[t_i; x̂(t_i⁻)] + R(t_i)}⁻¹   (9-61)

x̂(t_i⁺) = x̂(t_i⁻) + K(t_i){z_i − h[x̂(t_i⁻), t_i]}   (9-62)

P(t_i⁺) = P(t_i⁻) − K(t_i) H[t_i; x̂(t_i⁻)] P(t_i⁻)   (9-63a)

P(t_i⁺) = {I − K(t_i) H[t_i; x̂(t_i⁻)]} P(t_i⁻) {I − K(t_i) H[t_i; x̂(t_i⁻)]}ᵀ + K(t_i) R(t_i) Kᵀ(t_i)   (9-63b)

where H[t_i; x̂(t_i⁻)] is defined as the m-by-n partial derivative matrix:

H[t_i; x̂(t_i⁻)] ≜ ∂h[x, t_i]/∂x |_{x = x̂(t_i⁻)}   (9-64)

The estimate is propagated forward to the next sample time t_{i+1} by integrating

(d/dt) x̂(t/t_i) = f[x̂(t/t_i), u(t), t]   (9-65)

(d/dt) P(t/t_i) = F[t; x̂(t/t_i)] P(t/t_i) + P(t/t_i) Fᵀ[t; x̂(t/t_i)] + G(t)Q(t)Gᵀ(t)   (9-66)

from time t_i to t_{i+1}, using the initial conditions provided by (9-62) and (9-63):

x̂(t_i/t_i) = x̂(t_i⁺)   (9-67a)

P(t_i/t_i) = P(t_i⁺)   (9-67b)

(For the first interval, from t₀ to t₁, the initial conditions would be x̂₀ and P₀, respectively.) In (9-66), F[t; x̂(t/t_i)] is the n-by-n partial derivative matrix:


F[t; x̂(t/t_i)] ≜ ∂f[x, u(t), t]/∂x |_{x = x̂(t/t_i)}   (9-68)

for all t in the interval [t_i, t_{i+1}). Upon integrating (9-65) and (9-66) to the next sample time, x̂(t_{i+1}⁻) and P(t_{i+1}⁻) are defined as

x̂(t_{i+1}⁻) = x̂(t_{i+1}/t_i)   (9-69a)

P(t_{i+1}⁻) = P(t_{i+1}/t_i)   (9-69b)

for use in the next measurement update.
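One full cycle of (9-61)-(9-69) can be sketched as follows for an illustrative scalar model ẋ = −x³ + w, z = x + v, with simple Euler integration standing in for a proper solution of (9-65)/(9-66); the model, step sizes, and all numbers are assumptions for illustration, not from the text.

```python
import numpy as np

# Sketch of one extended Kalman filter cycle, (9-61)-(9-69), for a scalar
# model xdot = -x^3 + w, z = x + v.  Euler integration of (9-65)/(9-66)
# stands in for a better ODE solver; all names and values are illustrative.

f   = lambda x, u, t: -x**3
F   = lambda x: -3.0 * x**2          # df/dx, cf. (9-68)
h   = lambda x: x
H   = lambda x: 1.0                  # dh/dx, cf. (9-64)
GQG = 0.01                           # G Q G^T (scalar)

def ekf_update(x, P, z, R):
    Hx = H(x)
    K = P * Hx / (Hx * P * Hx + R)            # (9-61)
    x = x + K * (z - h(x))                    # (9-62)
    P = (1.0 - K * Hx) * P                    # (9-63a)
    return x, P

def ekf_propagate(x, P, dt, steps):
    for _ in range(steps):                    # Euler on (9-65)/(9-66)
        x, P = x + f(x, 0.0, 0.0) * dt, P + (2.0 * F(x) * P + GQG) * dt
    return x, P                               # -> x(t-), P(t-) of (9-69)

x, P = 1.0, 1.0                               # x(ti-), P(ti-)
x, P = ekf_update(x, P, z=1.2, R=0.04)
x, P = ekf_propagate(x, P, dt=0.01, steps=10)
```

Note that the gain and covariance depend on the current state estimate through F and H, so, unlike the linear filter, nothing here can be precomputed offline.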

EXAMPLE 9.7 One means of numerically integrating (9-65) and (9-66) from one sample time to the next would be to divide the interval [t_i, t_{i+1}) into N equal subintervals and apply a first order Euler integration technique to each subinterval. For example, let N = 2, and let each subinterval be Δt seconds long. First x̂(t_i/t_i) and t_i are substituted into f in (9-65) to evaluate the derivative, with which x̂(t_i + Δt/t_i) is computed as

x̂(t_i + Δt/t_i) ≅ x̂(t_i/t_i) + f[x̂(t_i/t_i), u(t_i), t_i] Δt

Similarly, F[t_i; x̂(t_i/t_i)] is computed and then (9-66) evaluated to yield (d/dt)P(t_i/t_i), from which is calculated

P(t_i + Δt/t_i) ≅ P(t_i/t_i) + [(d/dt)P(t_i/t_i)] Δt

The second subinterval propagation follows similarly. The computed x̂(t_i + Δt/t_i) and the values of u(t_i + Δt) and (t_i + Δt) are used to evaluate f in (9-65), generating an approximate derivative at (t_i + Δt), and then x̂(t_{i+1}/t_i) is calculated from

x̂(t_{i+1}/t_i) ≜ x̂(t_i + 2Δt/t_i) ≅ x̂(t_i + Δt/t_i) + f[x̂(t_i + Δt/t_i), u(t_i + Δt), t_i + Δt] Δt

Once F[t_i + Δt; x̂(t_i + Δt/t_i)] is computed, (9-66) can yield (d/dt)P(t_i + Δt/t_i), and then

P(t_{i+1}/t_i) ≜ P(t_i + 2Δt/t_i) ≅ P(t_i + Δt/t_i) + [(d/dt)P(t_i + Δt/t_i)] Δt

Integration accuracy is improved if the values assumed by the derivatives at the midpoints of the subintervals are used in place of their values at the beginning of each subinterval. Consider the first subinterval. In (9-65), x̂(t_i + 0.5Δt/t_i) would not be available, but u(t_i + 0.5Δt), or an approximation to it of ½{u(t_i) + u(t_i + Δt)}, is, and so f[x̂(t_i/t_i), u(t_i), t_i] could be replaced by f[x̂(t_i/t_i), u(t_i + 0.5Δt), (t_i + 0.5Δt)]. Once (9-65) is integrated forward, both x̂(t_i/t_i) and x̂(t_i + Δt/t_i) are available for use in (9-66), so F[t_i; x̂(t_i/t_i)] could be supplanted by F[(t_i + 0.5Δt); ½{x̂(t_i/t_i) + x̂(t_i + Δt/t_i)}]. Furthermore, [G(t_i)Q(t_i)Gᵀ(t_i)] could be replaced by its value, or averaged approximation, at the time (t_i + 0.5Δt).

On the other extreme, if computation time is critical, (9-66) can be approximated by evaluating the partial derivative matrix only once over the entire sample period, effectively replacing F[t; x̂(t/t_i)] by F[t_i; x̂(t_i⁺)] in (9-66). A somewhat better approximation for time-varying systems would replace F[t; x̂(t/t_i)] by F[t; x̂(t_i⁺)]. ■
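The two schemes of this example can be compared on a scalar test model ẋ = −x, chosen (as an illustrative assumption, not from the text) because the exact one-period answer e^{−Δt} is known; the midpoint evaluation should show a visibly smaller error than plain first order Euler.

```python
import numpy as np

# Sketch comparing Example 9.7's two schemes on the scalar test model
# xdot = -x, for which the exact propagation over dt is exp(-dt).
# The model and step sizes are illustrative assumptions.

def euler_prop(x, dt, n_sub):
    # First order Euler over n_sub equal subintervals, as in the example.
    h = dt / n_sub
    for _ in range(n_sub):
        x = x + (-x) * h
    return x

def midpoint_prop(x, dt, n_sub):
    # Derivative evaluated at an estimate of the subinterval midpoint state.
    h = dt / n_sub
    for _ in range(n_sub):
        x_mid = x + (-x) * (h / 2.0)
        x = x + (-x_mid) * h
    return x

x0, dt = 1.0, 0.1
exact = np.exp(-dt)
err_euler = abs(euler_prop(x0, dt, 2) - exact)
err_mid = abs(midpoint_prop(x0, dt, 2) - exact)
```

The midpoint variant is second order accurate, so its error shrinks quadratically with the subinterval length while the Euler error shrinks only linearly.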

The time propagation relations can be written equivalently as

x̂(t_{i+1}⁻) = x̂(t_i⁺) + ∫_{t_i}^{t_{i+1}} f[x̂(t/t_i), u(t), t] dt   (9-70)

P(t_{i+1}⁻) = Φ[t_{i+1}, t_i; x̂(τ/t_i)] P(t_i⁺) Φᵀ[t_{i+1}, t_i; x̂(τ/t_i)]
            + ∫_{t_i}^{t_{i+1}} Φ[t_{i+1}, t; x̂(τ/t_i)] G(t)Q(t)Gᵀ(t) Φᵀ[t_{i+1}, t; x̂(τ/t_i)] dt   (9-71)


In (9-71), Φ[t_{i+1}, t; x̂(τ/t_i)] denotes the state transition matrix associated with F[τ; x̂(τ/t_i)] for all τ ∈ [t_i, t_{i+1}).

Note that F (and thus Φ), H, K, and P are evaluated by knowing the most recent estimate of the nominal (reference) state trajectory. In contradistinction to the conventional and linearized Kalman filters, the equations for propagating and updating the estimation error covariance matrix are coupled to the state estimate relations. Therefore the covariance and gain matrices cannot be precomputed without knowledge of the state estimates and thus of the actual measurement values.
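The transition matrix appearing in (9-70) and (9-71) can be generated numerically by integrating Φ̇ = F(t)Φ from Φ = I over the sample period. The sketch below uses a constant illustrative F so that the result can be checked against the exact answer (a rotation matrix, for this F); in an extended Kalman filter F would instead be re-evaluated along x̂(τ/t_i).

```python
import numpy as np

# Sketch: transition matrix for (9-70)/(9-71) by integrating
# Phidot = F(t) Phi from Phi = I.  The constant F below is illustrative;
# for this F the exact Phi over [0, t] is a rotation by angle t.

def transition_matrix(F_of_t, t0, t1, steps):
    n = F_of_t(t0).shape[0]
    Phi = np.eye(n)
    h = (t1 - t0) / steps
    for k in range(steps):
        t = t0 + k * h
        Phi = Phi + F_of_t(t) @ Phi * h      # Euler step of Phidot = F Phi
    return Phi

F = np.array([[0.0, 1.0], [-1.0, 0.0]])      # illustrative constant F
Phi = transition_matrix(lambda t: F, 0.0, 0.1, 1000)
```

With a time-varying F[τ; x̂(τ/t_i)], the same loop applies with F re-evaluated at each step, which is precisely why Φ cannot be precomputed for the extended filter.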

EXAMPLE 9.8 Recall the satellite in planar orbit discussed in Examples 2.7 and 2.8 of Volume 1. A reasonably good model of satellite motion in terms of range r(t) from the earth center and angle θ(t) relative to a fixed coordinate system can be expressed in terms of x₁ = r, x₂ = ṙ, x₃ = θ, and x₄ = θ̇ as ẋ(t) = f[x(t), u(t), t] + G(t)w(t) by adding acceleration-level pseudonoise to the model in Example 2.7:

ẋ₁(t) = x₂(t)
ẋ₂(t) = x₁(t)x₄²(t) − G/x₁²(t) + u₁(t) + w₁(t)
ẋ₃(t) = x₄(t)
ẋ₄(t) = −(2/x₁(t)) x₂(t)x₄(t) + (1/x₁(t)) u₂(t) + (1/x₁(t)) w₂(t)

Assume that discrete-time measurements of both range and angle are available, as

z₁(t_i) = x₁(t_i) + v₁(t_i),   z₂(t_i) = x₃(t_i) + v₂(t_i)

Thus, for this problem, h[x(t_i), t_i] = Hx(t_i), with H composed of the rows [1 0 0 0] and [0 0 1 0]. Based on range and angle measurements from a ground tracking station and the models above, it is desired to generate a state estimate over a given time interval.

A linearized Kalman filter could be based upon an assumed nominal of a circular orbit at radius r₀, with u₁(t) = u₂(t) = 0 and G = r₀³ω². The perturbation state estimate propagation would be based upon (9-42) with

F[t; x_n(t)] ≜ ∂f[x, u(t), t]/∂x |_{x=x_n(t)} =

    [   0        1       0     0    ]
    [  3ω²       0       0    2r₀ω  ]
    [   0        0       0     1    ]
    [   0     −2ω/r₀     0     0    ]

and the update based on (9-46) with H as given above. Figure 9.3a depicts a "true satellite" trajectory as an ellipse, with sampled data measurements and position estimates generated via x̂₁(t_i⁺) = r̂(t_i⁺) and x̂₃(t_i⁺) = θ̂(t_i⁺) from a particular run of this filter. Note that the assumption of small deviations from the assumed nominal is subject to serious doubt, and filter performance suffers.


[Figure: (a) "true" elliptical trajectory versus the assumed circular nominal; (b) successive reference trajectories for [t₀, t₁), [t₁, t₂), and [t₂, t₃).]

FIG. 9.3 (a) Linearized and (b) extended Kalman filters applied to orbit determination. □, sampled data measurement information; ×, position estimate from filter (x̂(t_i⁺)).

On the other hand, an extended Kalman filter for this problem would be given by (9-61)-(9-69), with F[t; x̂(t/t_i)] in (9-66) given by

F[t; x̂(t/t_i)] ≜ ∂f[x, u(t), t]/∂x |_{x=x̂(t/t_i)} =

    [      0             1        0      0     ]
    [ x₄² + 2G/x₁³       0        0    2x₁x₄   ]
    [      0             0        0      1     ]
    [  2x₂x₄/x₁²     −2x₄/x₁      0   −2x₂/x₁  ]


where x̂(t/t_i) for t ∈ [t_i, t_{i+1}) represents a new reference trajectory as a conic arc over each sample period. For the same measurement realizations as considered for the linearized filter, Fig. 9.3b depicts the position estimates and successive reference trajectories generated by an extended Kalman filter. Due to relinearizations, small perturbation assumptions are more valid and performance is better than that achieved in Fig. 9.3a. Unlike the linearized Kalman filter, if a second run of this "experiment" were conducted, yielding a different set of measurement realizations, an entirely different sequence of reference trajectories would result. ■
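The f and F of this example can be coded directly, with the analytic Jacobian cross-checked by finite differences, a useful guard against derivation slips in extended Kalman filter implementations. The value of the gravitational constant (written mu below) and the state are illustrative, and the deterministic controls u₁, u₂ are set to zero.

```python
import numpy as np

# Sketch of the orbit model of Example 9.8: f and its analytic Jacobian F,
# cross-checked by finite differences.  mu (the G of the example) and the
# state values are illustrative; the controls are taken as zero.

mu = 1.0

def f(x):
    x1, x2, x3, x4 = x
    return np.array([x2,
                     x1 * x4**2 - mu / x1**2,
                     x4,
                     -2.0 * x2 * x4 / x1])

def F_analytic(x):
    x1, x2, x3, x4 = x
    return np.array([
        [0.0,                      1.0,            0.0, 0.0],
        [x4**2 + 2.0 * mu / x1**3, 0.0,            0.0, 2.0 * x1 * x4],
        [0.0,                      0.0,            0.0, 1.0],
        [2.0 * x2 * x4 / x1**2,    -2.0 * x4 / x1, 0.0, -2.0 * x2 / x1]])

def F_numeric(x, eps=1e-6):
    # Central-difference Jacobian, column by column.
    n = x.size
    J = np.zeros((n, n))
    for j in range(n):
        dx = np.zeros(n)
        dx[j] = eps
        J[:, j] = (f(x + dx) - f(x - dx)) / (2.0 * eps)
    return J

x = np.array([1.2, 0.1, 0.3, 0.9])   # arbitrary illustrative state
```

Agreement between the two Jacobians at several states gives confidence in the F used inside (9-66).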

Not only is it impossible to precompute covariance and gain time histories because of coupling of second moments to state estimates and measurement realizations, but a priori covariance performance (sensitivity) analyses as described in Chapter 6 (Volume 1) are also precluded as a complete analysis method for the extended Kalman filter. Instead, its performance characteristics must be portrayed through a Monte Carlo analysis [38]. Nevertheless, covariance analyses are exploited to some degree in the design of extended Kalman filters. If a nominal state trajectory is chosen (corresponding to either "typical" or "worst case" situations), then a covariance analysis of a linearized Kalman filter operating over this nominal affords an approximate analysis of an extended Kalman filter with small deviations from this same nominal. In other words, it is assumed that the "true" x(t) is well approximated by some trajectory x_n(t) for all t, but the filter does not really know what these values are in actual operation, so it uses its best estimate x̂(t/t_i) to evaluate the partial derivative matrices F and H. However, an a priori covariance analysis can be performed by evaluating these partial derivatives based on x_n(t) for all t instead. In this manner, an initial filter tuning and local perturbation analysis of the extended filter can be performed efficiently. Subsequently, a Monte Carlo analysis is conducted to investigate filter performance thoroughly, especially the ability of the filter to recover from a condition in which an estimate x̂(t/t_i) is significantly different from the "true" x(t) value at some t ∈ T.

One particular application of extended Kalman filtering is the estimation of state variables in a system well described by a linear model, but in which are embedded some uncertain parameters in the F (or Φ), B, and/or H matrices [28, 63]. These uncertain parameters can be treated as additional state variables, resulting in a nonlinear model, since these new states are multiplied by the original state variables in the model. There are other approaches to the problem of simultaneous estimation of states and parameters in a linear model structure, and the next chapter will study this problem extensively, but the extended Kalman filter has been shown to be a viable candidate for a means of solution.

EXAMPLE 9.9 This example investigates thrust vector control for spacecraft that can undergo significant bending. The angular orientation of a gimbaled rocket engine nozzle (relative to the spacecraft body) is commanded not only to achieve desired guidance corrections, but also to damp out vehicle bending modes, during the thrusting period.

Consider a pitch plane description of the docked configuration of the Apollo Command and Service Module (CSM) with the Lunar Module (LM). This is a vehicle for which bending is especially important because of the narrow docking tunnel between the two modules, as shown in Fig. 9.4.


FIG. 9.4 The Apollo spacecraft: docked configuration of CSM and LM. From Widnall [64], with permission of MIT Press.

The basic vehicle dynamics can be described by

    d   [ ω(t)   ]   [ 0  0  0   0.0815 ] [ ω(t)   ]   [  1.13 ]         [  1.13 ]
    --  [ θ(t)   ] = [ 1  0  0   0      ] [ θ(t)   ] + [  0    ] δ(t)  + [  0    ] w(t)
    dt  [ v_b(t) ]   [ 0  0  0  −ω_b²   ] [ v_b(t) ]   [ −11   ]         [ −11   ]
        [ q(t)   ]   [ 0  0  1   0      ] [ q(t)   ]   [  0    ]         [  0    ]

where ω and θ are rigid body motion angular velocity and attitude (relative to an inertial reference direction), v_b and q are the velocity and position of the generalized bending coordinate, δ is the angle of the gimbaled engine nozzle relative to the vehicle centerline, δ_com is the commanded gimbal angle, and w models the vibrational disturbance due to the rocket engine. Figure 9.5 presents a block diagram portrayal of the system model. Notice that the bending mode is modeled as an undamped second order system; true damping ratio is on the order of 0.005 and so it has been neglected for this problem. Later we will return to this problem to consider design of a feedback controller to provide δ_com(·), but here we concentrate on the estimation problem and consider δ_com(·) simply as some input command signal. The engine gimbal servo system is modeled as a first order lag driven by δ_com; this state is stochastically uncontrollable, and so should not enter the vector of states to be estimated; instead, δ(t) can be treated as a known input, with an auxiliary equation of

δ̇(t) = −10 δ(t) + 10 δ_com(t);   δ(t₀) = 0



FIG. 9.5 System model for thrust vector control problem.

to determine its value once δ_com(·) is known. The dynamic driving noise w(·,·) enters the equations as a zero-mean white Gaussian random thrust vector angle, with power spectral density value of 0.0004 rad²/Hz; such a disturbance can be expected to cause a lateral velocity of about 2 ft/sec in 100 sec of engine burn time.
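Since the servo state is driven only by the known command, δ(t) can be generated exactly at any chosen step size by discretizing the first order lag; the step size and the constant unit command below are illustrative assumptions, not from the text.

```python
import numpy as np

# Sketch: exact discretization of the first order gimbal servo lag
# delta_dot = -10 delta + 10 delta_com, so delta(t) can be produced as a
# known input alongside the filter.  Step size and command are illustrative.

def servo_step(delta, delta_com, dt, a=10.0):
    """Advance delta by dt, assuming delta_com is held over the step."""
    phi = np.exp(-a * dt)
    return phi * delta + (1.0 - phi) * delta_com

delta, dt = 0.0, 0.01
for _ in range(1000):                 # 10 sec of a constant unit command
    delta = servo_step(delta, 1.0, dt)
```

Because the discretization is exact for a zero-order-hold command, no integration error is introduced into the "known input" seen by the filter.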

In the system dynamics model, ω_b is the undamped natural frequency of the bending mode, a slowly varying parameter whose value is known only with some uncertainty (in fact, initial tests in the Apollo program revealed values substantially different from those predicted analytically). Use of a wrong assumed value not only degrades performance of an ensuing feedback controller design, but an error of only a few rad/sec can cause controller-induced instability: the applied control would actually increase the bending mode amplitude rather than damp it out.

The system model is rather simple. A more realistic model would describe the full scale problem by coupling the fundamental yaw plane rigid body and bending modes, the torsional bending mode, fuel slosh, and center-of-gravity shift effects. Higher order modes could be neglected for this application. However, attention is confined to the five states described above for illustrative purposes.

A sampled measurement is available every 0.1 sec from the inertial measurement unit (IMU) of angular orientation of the vehicle at the IMU location with respect to inertial space, modeled as

z(t_i) = θ(t_i) − 0.13 q(t_i) + v(t_i)

where θ(t_i) is the rigid body angular attitude, the second term is the generalized bending coordinate q(t_i) multiplied by the slope of the bending mode at the IMU station per unit displacement of q(t_i), and v(·,·) is a zero-mean white Gaussian noise used to model readout quantization errors and other uncertainties. If quantization error is assumed to be uniformly distributed over the quantization interval of 0.0002 rad associated with the analog-to-digital conversion, then the variance of this error is (1/12)(0.0002)² rad². Since this is the major contribution to measurement uncertainty, R(t_i) is set to this value for all t_i (it could have been increased slightly to account for neglected effects).

An optimal estimate of the state variables ω, θ, v_b, and q and the uncertain parameter ω_b² could be provided by an extended Kalman filter by treating [ω_b²] as an additional state. This "state" could be modeled as a random bias plus pseudonoise (i.e., a random walk) as

(d/dt)[ω_b²(t)] = n(t)

where n(·,·) is a white Gaussian noise of appropriate strength. Thus, the dynamics model for filter development is

    d   [ ω(t)     ]   [ 0.0815 q(t) + 1.13 δ(t)   ]   [  1.13   0 ]
    --  [ θ(t)     ]   [ ω(t)                      ]   [  0      0 ]  [ w(t) ]
    dt  [ v_b(t)   ] = [ −[ω_b²(t)] q(t) − 11 δ(t) ] + [ −11     0 ]  [ n(t) ]
        [ q(t)     ]   [ v_b(t)                    ]   [  0      0 ]
        [ ω_b²(t)  ]   [ 0                         ]   [  0      1 ]

i.e., ẋ(t) = f[x(t), u(t)] + G [w(t)  n(t)]ᵀ

where δ(t) is generated deterministically from the commanded control δ_com(t) by the auxiliary equation

δ̇(t) = −10 δ(t) + 10 δ_com(t);   δ(t₀) = 0

Note the nonlinearity in f₃[x(t), u(t)]. The measurement model is

z(t_i) = [0  1  0  −0.13  0] x(t_i) + v(t_i)

( = H x(t_i) + v(t_i))
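A sketch of this five-state filter model, with δ(t) treated as a known input: the v_b row of the Jacobian F needed in (9-66) carries −ω_b² in the q column and −q in the ω_b² column, which is exactly where the bilinear nonlinearity enters. The state values below are illustrative assumptions.

```python
import numpy as np

# Sketch of the five-state filter dynamics of Example 9.9,
# x = [omega, theta, v_b, q, wb2], with delta(t) a known input.
# The bilinear term -wb2*q makes the model nonlinear; the Jacobian row
# for v_b therefore depends on the current estimates of q and wb2.
# State values below are illustrative.

def f(x, delta):
    omega, theta, v_b, q, wb2 = x
    return np.array([0.0815 * q + 1.13 * delta,
                     omega,
                     -wb2 * q - 11.0 * delta,
                     v_b,
                     0.0])

def F(x):
    omega, theta, v_b, q, wb2 = x
    return np.array([
        [0.0, 0.0, 0.0, 0.0815, 0.0],
        [1.0, 0.0, 0.0, 0.0,    0.0],
        [0.0, 0.0, 0.0, -wb2,   -q],
        [0.0, 0.0, 1.0, 0.0,    0.0],
        [0.0, 0.0, 0.0, 0.0,    0.0]])

H = np.array([0.0, 1.0, 0.0, -0.13, 0.0])   # z = theta - 0.13 q + v

x = np.array([0.01, 0.02, 0.0, 0.005, 100.0])
```

Since the parameter column of F is nonzero only through −q, the parameter ω_b² becomes observable only when the bending coordinate is excited, which is consistent with the tuning sensitivity discussed below.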

Figure 9.6 presents a representative time history (i.e., from one run out of a Monte Carlo analysis) of the error in the filter estimate of [ω_b²]. For this simulation, the "true" value of [ω_b²] was set

FIG. 9.6 Parameter estimate error produced by extended Kalman filter.


equal to 150 rad²/sec², while the filter model assumed [ω_b²(t₀)] to be described as a Gaussian random variable with mean 100 rad²/sec² and variance 1000 rad⁴/sec⁴. This choice of variance, and the selection of 200 rad⁴/sec⁵ for the strength of the white noise n(·,·), are empirically determined "good" values achieved through filter tuning. Thus the filter exhibits an ability to estimate states and parameters simultaneously; the corresponding state estimation precision far exceeds that of a conventional Kalman filter that erroneously assumes [ω_b²] = 100 rad²/sec².

The estimator performance is strongly dependent upon the assumed second order statistics on [ω_b²(t₀)] and n(·,·), larger numerical values causing greater oscillations in the steady state performance, and lower ones causing slower transients. The latter case is depicted in Fig. 9.7, for which the initial variance on ω_b² was reduced from 1000 to 10 rad⁴/sec⁴ and the strength of n(·,·) reduced from 200 to 2 rad⁴/sec⁵. ■


FIG. 9.7 Parameter estimate error when variances for parameter model are reduced.

The preceding example revealed a bias error in the state estimate produced by an extended Kalman filter. This is very characteristic of extended Kalman filters, and it is due to the neglected higher order effects inherent in trying to exploit linear perturbation concepts. The more pronounced the nonlinearities are in a given application, the more seriously one can expect performance to be degraded by this effect [8, 22]. This can be compensated to some degree by "tuning" [30] the filter such that its internally computed error variances match the true mean squared errors (variances plus squares of the mean errors) as indicated in a Monte Carlo performance analysis [38]. This will be discussed more extensively in Chapter 12, and alternative means of compensating for such bias errors (called "bias correction terms") will be developed at that time.

Extended Kalman filters have been used in many practical applications[1,3,7,10,11,23-27,29,31-35,39,40,42,48,53,60,65]. The following examplepresents one such implementation.


EXAMPLE 9.10 In this example [31, 35], an extended Kalman filter is designed to track a long-range object, using outputs from a forward looking infrared (FLIR) sensor as measurements. The FLIR sensor generates outputs of an array of infrared detectors as they are mechanically scanned through a limited field of view, with digitized outputs corresponding to the average intensity over each picture element (pixel; assumed square, of dimension defined by the optics and physical size of the detectors) in a complete picture image (frame of data). A frame of data is available each 1/30 sec, and an 8-by-8 pixel array out of the complete frame, called a "tracking window," is provided to the filter as a measurement (i.e., 64 intensity values).

Long-range targets appear as point sources of infrared radiation. Due to the physics of wave propagation and optics, the resulting radiation intensity pattern (called glint, and assumed to be time invariant) can be modeled as a two-dimensional Gaussian function over the FLIR image plane, centered at [x_peak(t), y_peak(t)] relative to the center of the 8-by-8 pixel array, and having circular equal-intensity contours:

I_target(x, y, t) = I_max exp{(−1/2σ²)([x − x_peak(t)]² + [y − y_peak(t)]²)}

where I_max is the peak intensity value (assumed known) and σ is the dispersion of the Gaussian glint function. Figure 9.8 portrays this graphically.

FIG. 9.8 Apparent target intensity pattern on image plane: equal-intensity contours centered at [xpeak(t), ypeak(t)] within the 8-by-8 array of pixels. From [31], © 1980 IEEE.

The apparent location of the target is actually the sum of effects due to true target dynamics, atmospheric disturbances (jitter), and vibration:

xpeak(t) = xdyn(t) + xatm(t) + xvib(t)

and similarly for ypeak(t). We want to estimate xdyn and ydyn separately from other effects, based on real-time noise-corrupted FLIR sensor data.

To provide a simple, generally applicable target dynamics model that accounts for time-correlated behavior of realistic targets, an independent zero-mean first order Gauss-Markov model in each direction was chosen:

ẋdyn(t) = −(1/Td)xdyn(t) + wxdyn(t)

E{wxdyn(t)wxdyn(t + τ)} = (2σd²/Td)δ(τ)

By suitable choice of the xdyn variance σd² and correlation time Td, amplitude and rate-of-change characteristics of a variety of targets as seen in the image plane can be well represented.
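The discrete-time equivalent of this model, sampled every Δt, is x(ti+1) = e^(−Δt/Td) x(ti) + wd(ti) with Var{wd(ti)} = σd²(1 − e^(−2Δt/Td)), so that the steady-state variance of x is σd². A minimal simulation sketch (Python; the function name and parameter values are illustrative assumptions, with Δt = 1/30 sec matching the frame rate):

```python
import numpy as np

def gauss_markov_samples(sigma_d, tau_d, dt, n, rng):
    """Simulate the discretized first-order Gauss-Markov dynamics model.

    x[k+1] = phi * x[k] + w[k],  phi = exp(-dt/tau_d),
    Var(w[k]) = sigma_d**2 * (1 - phi**2), giving steady-state
    variance sigma_d**2 and correlation time tau_d.
    """
    phi = np.exp(-dt / tau_d)
    q = sigma_d**2 * (1.0 - phi**2)
    x = np.empty(n)
    x[0] = rng.normal(0.0, sigma_d)        # start in steady state
    for k in range(n - 1):
        x[k + 1] = phi * x[k] + rng.normal(0.0, np.sqrt(q))
    return x

rng = np.random.default_rng(1)
x = gauss_markov_samples(sigma_d=1.0, tau_d=1.0, dt=1/30, n=5000, rng=rng)
```

Over many correlation times the sample standard deviation settles near the chosen σd, which is how such a model is matched to the amplitude characteristics of a given target class.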


Through spectral analysis of the atmospheric jitter phenomenon, it was found that xatm and yatm can each be adequately modeled as the output of a third order shaping filter, described by

G(s) = Kω1ω2²/[(s + ω1)(s + ω2)²]

driven by unit strength white Gaussian noise. Because ω2 » ω1 and the significant frequencies of the problem were well below ω2, this was approximated by

G'(s) = Kω1/(s + ω1)

in the eventual filter design with very small degradation in performance. Because the tracker under consideration was ground based, vibration effects were considered negligible compared to other effects.

Thus, the dynamics model for the filter is a four-state (x = [xdyn, ydyn, xatm, yatm]ᵀ) linear time-invariant model. This yields very simple propagations:

x̂(ti+1−) = Φ x̂(ti+)

P(ti+1−) = Φ P(ti+) Φᵀ + Qd

with constant, precomputed Φ and Qd = ∫ Φ G Q Gᵀ Φᵀ dt.

The target intensity pattern is corrupted by background noise (clutter, etc.) and inherent FLIR errors (thermal noise, dark current, etc.) before being available in the measurements. Letting zjk(ti) denote the measurement at time ti of the average intensity over the pixel in the jth row and kth column of the 8-by-8 array,

zjk(ti) = (1/Ap) ∬ (region of jkth pixel) Itarget(x, y, ti) dx dy + vjk(ti)

where Ap is the area of one pixel, and vjk(ti) models the combined intensity effects of background and FLIR errors on the jkth pixel. Arraying 64 such scalar equations into a single measurement vector yields a model of the form

z(ti) = h[x(ti), ti] + v(ti)

Note that spatial correlation of the background and FLIR errors is readily represented by the off-diagonal terms of the R matrix associated with v. In the actual filter implementation, the h function was approximated by replacing the two-dimensional integral terms by the value of Itarget evaluated at the center of the jkth pixel.
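A sketch of this center-of-pixel approximation to h (Python; the function name is invented, and the Imax and σg values, the unit pixel spacing, and the window-centered coordinates are illustrative assumptions):

```python
import numpy as np

def h_flir(x, I_max=1.0, sigma_g=3.0):
    """Approximate measurement function h[x, ti] for the 8-by-8 FLIR
    tracking window, evaluating the Gaussian glint pattern at pixel
    centers rather than integrating over each pixel.

    x = [x_dyn, y_dyn, x_atm, y_atm]; the apparent peak location is the
    sum of the dynamic and atmospheric components (vibration neglected).
    """
    x_peak = x[0] + x[2]
    y_peak = x[1] + x[3]
    # pixel-center coordinates relative to the window center (pixel units)
    c = np.arange(8) - 3.5
    X, Y = np.meshgrid(c, c)
    I = I_max * np.exp(-((X - x_peak)**2 + (Y - y_peak)**2)
                       / (2.0 * sigma_g**2))
    return I.ravel()        # 64-vector of predicted pixel intensities

z_pred = h_flir(np.array([0.5, -0.2, 0.1, 0.0]))
```

The brightest predicted pixel is the one whose center lies nearest the apparent peak, which is what lets the filter localize the target to a fraction of a pixel from the full 64-element intensity pattern.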

Due to the large number of measurements, the usual measurement update given by (9-61)-(9-63) was computationally inefficient, (9-61) requiring the inversion of a 64-by-64 matrix. To circumvent this problem, the algebraically equivalent form

I(ti−) = [P(ti−)]⁻¹

I(ti+) = I(ti−) + Hᵀ(ti)R⁻¹(ti)H(ti)

P(ti+) = [I(ti+)]⁻¹

K(ti) = P(ti+)Hᵀ(ti)R⁻¹(ti)

was employed. This form requires only two 4-by-4 matrix inversions online; R⁻¹ is constant and is generated once offline.
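The algebraic equivalence of the two update forms (an instance of the matrix inversion lemma) can be checked numerically. The sketch below (Python, with random illustrative data; 4 states and 64 measurements as in the example) computes the gain both ways:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 64                    # state and measurement dimensions, as in the example

# illustrative random problem data
P_minus = np.eye(n) * 2.0
H = rng.standard_normal((m, n))
R = np.eye(m) * 0.5
z_res = rng.standard_normal(m)  # residual z - h[xhat(ti-), ti]
x_minus = np.zeros(n)

# standard update (9-61): one m-by-m (64-by-64) inversion online
S = H @ P_minus @ H.T + R
K_std = P_minus @ H.T @ np.linalg.inv(S)
x_std = x_minus + K_std @ z_res

# inverse-covariance form: only n-by-n (4-by-4) inversions online;
# R inverse is constant and would be generated once offline
R_inv = np.linalg.inv(R)
I_plus = np.linalg.inv(P_minus) + H.T @ R_inv @ H
P_plus = np.linalg.inv(I_plus)
K_info = P_plus @ H.T @ R_inv
x_info = x_minus + K_info @ z_res
```

Both routes yield the same gain and the same updated estimate; the inverse-covariance route simply trades the large measurement-space inversion for two small state-space inversions.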

The performance capabilities of this filter were evaluated and compared to those of a previously proposed correlation tracker by means of a Monte Carlo analysis (involving 20 sample simulations for each run). Figures 9.9 and 9.10 portray the sample mean error ± 1 standard deviation committed by the filter in estimating the target horizontal location, and the corresponding correlation tracker


FIG. 9.9 xdyn mean error ± 1σ. S/N = 10, σg = 3 pixels, σd = σa = 1 pixel, Td = 1 sec. From [31], © 1980 IEEE.

FIG. 9.10 Correlator x tracking error mean ± 1σ. S/N = 10, σg = 3 pixels, σd = σa = 1 pixel, Td = 1 sec. From [31], © 1980 IEEE.

error, for a particular representative case. Note that signal-to-noise ratio S/N is defined here as

S/N = Imax/[1σ value of (background + FLIR) noise]

0". specifies the intensity pattern spot size on the FUR image plane relative to pixel size, O"d and Td

describe target dynamics, and 0", is the root mean square atmospheric jitter.Tables 9.1 and 9.2 depict the performance of the two tracking algorithms at the end of 5 sec as

a function ofrealistic SIN (20, 10, and 1) and 0". values (3 pixels and 1 pixel). These results are for


TABLE 9.1

Mean Error and 1σ Error Comparisons with σg = 3 Pixels

        Correlation tracker         Extended Kalman filter
S/N     Mean error    1σ error      Mean error    1σ error
        (pixels)      (pixels)      (pixels)      (pixels)
20      0.5           1.5           0.0           0.2
10      3.0           3.0           0.0           0.2
1       15.0          30.0          0.0           0.8

TABLE 9.2

Mean Error and 1σ Error Comparisons with σg = 1 Pixel

        Correlation tracker         Extended Kalman filter
S/N     Mean error    1σ error      Mean error    1σ error
        (pixels)      (pixels)      (pixels)      (pixels)
20      7.0           8.0           0.0           0.2
10      8.0           10.0          0.0           0.2
1       15.0          30.0          0.0           0.8

Td = 1 sec and averaged over realistic cases of σd/σa = 5, 1, and 0.2. The extended Kalman filter consistently outperforms the correlation tracker. Lowering S/N to 1 degrades the filter performance slightly, and decreasing spot size to 1 pixel (i.e., σg = 1 pixel in Table 9.2) has no noticeable effect, whereas both of these variations seriously affect the correlation tracker.

The filter exploits knowledge unused by the tracker (size, shape, and motion characteristics of the target, and the atmospheric jitter spectral description) to yield the enhanced performance. Because the correlation tracker makes no specific assumptions about the target, it may be a more robust algorithm: able to perform well in the face of large variations in the true characteristics of the system. Nevertheless, the filter is not overly sensitive to small errors in its assumed model of the tracking task. For instance, with S/N = 20, σg = 3 pixels, σd/σa = 5, and Td = 1 sec, the filter's mean error remained at 0, and the 1σ error only increased from 0.2 to 0.3 pixels when its assumed σg was changed from 3 pixels to 1. •

The preceding example considered vector measurement updates. Note that, unlike the linear or linearized Kalman filters, the extended Kalman filter can generate a different state estimate if measurements are incorporated recursively at a sample time instead. Consider (9-61)-(9-65) with R(ti) diagonal. If H[ti; x̂(ti−)] is computed and decomposed into m rows H1, H2, ..., Hm, then vector updating and m iterative scalar updates yield the same value of x̂(ti+). However, if each new x̂(ti+) generated by incorporating a single scalar measurement is used to evaluate the ensuing Hk, to provide a better point about which to


linearize h, as

Hk = ∂hk[x, ti]/∂x evaluated at x = x̂(ti+) incorporating (k − 1) components of z(ti)   (9-72)

then the two forms of updating will differ in their results.

Of practical importance because of significantly reduced online computation are precomputed-gain extended Kalman filters. They provide state estimates according to (9-62) and (9-65), but with gain histories precomputed as in the linearized Kalman filter, using a precomputed nominal state trajectory. Note that this in fact differs from the linearized filter, which uses the same K(ti) time history, but generates state estimates according to

x̂(ti+) = x̂(ti−) + K(ti){zi − h[xn(ti), ti] − H[ti; xn(ti)][x̂(ti−) − xn(ti)]}   (9-73)

dx̂(t/ti)/dt = f[xn(t), t] + F[t; xn(t)][x̂(t/ti) − xn(t)]   (9-74)

Within this class of estimators, the constant-gain extended Kalman filter (CGEKF) [50,51] is especially significant. For a time-invariant nonlinear system description in which the nominal is an equilibrium state trajectory, xn = const, the linear perturbation equations are themselves time invariant, and the K(ti) associated with the linearized filter can reach a steady state value, Kss. Then the constant-gain extended Kalman filter is defined by

x̂(ti+) = x̂(ti−) + Kss{zi − h[x̂(ti−)]}   (9-75)

dx̂(t/ti)/dt = f[x̂(t/ti), u(t)]   (9-76)

Such a filter form will be useful when incorporated into a control system that constantly drives the system state toward the equilibrium condition, thereby maintaining the adequacy of models based on linearizations about that precomputed value. More will be said about this form in Chapter 15 (Volume 3) on stochastic controllers for nonlinear systems.
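A minimal sketch of one cycle of (9-75)-(9-76) (Python; the scalar example system, the gain value, and the single Euler step used to integrate (9-76) are illustrative assumptions, since the text leaves the particular system and integration method open):

```python
import numpy as np

def cgekf_step(x_hat, z, f, h, K_ss, u, dt):
    """One cycle of a constant-gain extended Kalman filter:
    measurement update (9-75) with the precomputed steady-state gain
    K_ss, then propagation of the full nonlinear dynamics (9-76),
    here by a single explicit Euler step for brevity."""
    # (9-75): update with the constant gain
    x_plus = x_hat + K_ss @ (z - h(x_hat))
    # (9-76): propagate the nonlinear model over one sample period
    x_next = x_plus + dt * f(x_plus, u)
    return x_next

# toy scalar system: xdot = -x + u, measurement z = x
f = lambda x, u: -x + u
h = lambda x: x
x_next = cgekf_step(np.array([1.0]), z=np.array([1.2]),
                    f=f, h=h, K_ss=np.array([[0.5]]), u=0.0, dt=0.1)
```

Note that although the gain is frozen, the full nonlinear f and h are still used, which is what distinguishes this filter from the linearized Kalman filter with the same gain history.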

Continuous-time continuous-measurement extended Kalman filters can be based upon state dynamics and measurement models of

dx(t) = f[x(t), u(t), t] dt + G(t) dβ(t)   (9-77a)

dy(t) = h[x(t), t] dt + dβm(t)   (9-77b)

often written in the less rigorous form

ẋ(t) = f[x(t), u(t), t] + G(t)w(t)   (9-78a)

z(t) = h[x(t), t] + vc(t)   (9-78b)

by formally letting w = dβ/dt, vc = dβm/dt, z = dy/dt, where w(·,·) and vc(·,·) are zero-mean white (Gaussian) noises that are uncorrelated with each other


with statistics (using subscript c to denote continuous-time)

E{w(t)wᵀ(t + τ)} = Q(t)δ(τ)   (9-79a)

E{vc(t)vcᵀ(t + τ)} = Rc(t)δ(τ)   (9-79b)

The estimator equation is given by

dx̂(t)/dt = f[x̂(t), u(t), t] + K(t){z(t) − h[x̂(t), t]}   (9-80)

where the gain K(t) is given by

K(t) = P(t)Hᵀ[t; x̂(t)]Rc⁻¹(t)   (9-81)

and P(t) satisfies

Ṗ(t) = F[t; x̂(t)]P(t) + P(t)Fᵀ[t; x̂(t)] + G(t)Q(t)Gᵀ(t) − P(t)Hᵀ[t; x̂(t)]Rc⁻¹(t)H[t; x̂(t)]P(t)   (9-82)

where F[t; x̂(t)] and H[t; x̂(t)] are the partials of f and h, respectively, with respect to x, evaluated at x = x̂(t). Precomputed and constant gain forms of this filter can also be applied in practice [50, 51].
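A sketch of one explicit-Euler step of (9-80)-(9-82) (Python; the scalar example system, the step size, the noise strengths, and the Euler discretization itself are illustrative assumptions, not part of the filter equations):

```python
import numpy as np

def cont_ekf_step(x, P, z, f, h, F, H, G, Q, R_c, dt):
    """One Euler step of the continuous-time extended Kalman filter.
    f, h and their Jacobians F, H are evaluated at the current estimate;
    the step size dt is an implementation choice."""
    Fx, Hx = F(x), H(x)
    K = P @ Hx.T @ np.linalg.inv(R_c)                      # (9-81)
    x_dot = f(x) + K @ (z - h(x))                          # (9-80)
    P_dot = (Fx @ P + P @ Fx.T + G @ Q @ G.T
             - P @ Hx.T @ np.linalg.inv(R_c) @ Hx @ P)     # (9-82)
    return x + dt * x_dot, P + dt * P_dot

# scalar illustration: xdot = -x**3 + w, z = x + v_c
f = lambda x: -x**3
h = lambda x: x
F = lambda x: np.array([[-3.0 * x[0]**2]])
H = lambda x: np.array([[1.0]])
x1, P1 = cont_ekf_step(np.array([1.0]), np.eye(1), np.array([0.9]),
                       f, h, F, H, np.eye(1), np.eye(1) * 0.1,
                       np.eye(1) * 0.5, dt=0.01)
```

In practice a better integration rule and smaller steps would be used; the point of the sketch is only the structure: gain, estimate derivative, and Riccati derivative are all reevaluated along the current estimate trajectory.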

Nonlinear full-order observers can be defined of the form of (9-62) and (9-65), or (9-80) for continuous measurements, with the gain K(ti) or K(t) chosen by other means than used in the extended Kalman filter. For constant-gain observers, pole placement or more extensive eigenvalue/eigenvector assignment techniques can be applied to the system of perturbation equations about an equilibrium solution to achieve desirable observer characteristics (see Chapter 14, Section 4, in Volume 3). Another means of establishing gain sequences in the discrete-measurement form is through stochastic approximations [14], yielding estimators that are not optimal with respect to statistical characteristics but that possess desirable, well-defined convergence properties, even in the face of large parameter uncertainties.

In certain applications, local iterations (over a single sample period) on nominal trajectory redefinition and subsequent relinearization may be warranted for performance improvement and computationally practical. The resulting iterated extended Kalman filter and iterated linearized filter-smoother are capable of providing better performance than the basic extended Kalman filter, especially in the face of significant nonlinearities, because of the improved reference trajectories incorporated into the estimates [8, 22, 52].

Recall the extended Kalman filter algorithm, (9-61)-(9-69). The fundamental idea of the iterated filter is that, once x̂(ti+) is generated by using (9-62), then this value would serve as a better state estimate than x̂(ti−) for evaluating h and H in the measurement update relations. Then the state estimate after measurement incorporation could be recomputed, iteratively if desired. Thus, (9-61)-(9-62) would be replaced by setting x̂0 equal to x̂(ti−) and performing an


iteration on

K(ti) = P(ti−)Hᵀ(ti; x̂k)[H(ti; x̂k)P(ti−)Hᵀ(ti; x̂k) + R(ti)]⁻¹   (9-83)

x̂k+1 = x̂(ti−) + K(ti){zi − h(x̂k, ti) − H(ti; x̂k)[x̂(ti−) − x̂k]}   (9-84)

for k = 0, 1, ..., N − 1, and then setting x̂(ti+) = x̂N (the iteration could be stopped when consecutive values x̂k and x̂k+1 differ by less than a preselected amount). Note that x̂1 is just the x̂(ti+) estimate provided by a simple extended Kalman filter.
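A sketch of the iteration (9-83)-(9-84) (Python; the scalar measurement model, the fixed iteration count, and the choice to compute the updated covariance at the final linearization point are illustrative assumptions, the last being one common convention rather than something specified here):

```python
import numpy as np

def iekf_update(x_minus, P_minus, z, h, H_jac, R, n_iter=3):
    """Iterated extended Kalman filter measurement update.
    h and its Jacobian H_jac are re-evaluated at each iterate x_k;
    the first pass (k = 0, with x_0 = x_minus) reproduces the
    ordinary EKF update."""
    x_k = x_minus.copy()
    for _ in range(n_iter):
        H = H_jac(x_k)
        S = H @ P_minus @ H.T + R
        K = P_minus @ H.T @ np.linalg.inv(S)                    # (9-83)
        x_k = x_minus + K @ (z - h(x_k) - H @ (x_minus - x_k))  # (9-84)
    # covariance update at the final linearization point
    P_plus = P_minus - K @ H @ P_minus
    return x_k, P_plus

# scalar example with a mildly nonlinear measurement z = x + 0.1 x**2 + v
h = lambda x: x + 0.1 * x**2
H_jac = lambda x: np.array([[1.0 + 0.2 * x[0]]])
x_hat, P = iekf_update(np.array([0.0]), np.eye(1), np.array([0.5]),
                       h, H_jac, np.eye(1) * 0.01)
```

With an accurate measurement (small R) the iterates converge in a few passes, pulling the estimate away from the first-pass EKF value toward the point that better accounts for the curvature of h.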

The algorithm just described addresses the problem of significant nonlinearities by reevaluating h and H to achieve a better x̂(ti+). This will also improve estimation over future intervals because of improved succeeding reference trajectories. It is also possible to improve the reference trajectory used for propagation over [ti−1, ti) once the measurement zi is taken, by applying smoothing techniques backward to time ti−1. Incorporating such a local iteration into the extended Kalman filter structure yields what is termed the iterated linearized filter-smoother [22], but because of its limited applicability due to significant computational burden, the algorithm is not presented in explicit detail.

9.6 SUMMARY

This chapter presented means of extending linear models, techniques, and insights further into the problem of state estimation. Fundamentally, these methods entailed (1) decreasing the filter's confidence in the adequacy of the linear model within its structure, (2) telling the filter to discount or totally ignore older data because of the cumulative errors resulting from an erroneous linear model for time propagations, and (3) attempting to incorporate nonlinearities into the model while still exploiting linear estimation techniques. The next chapter investigates estimation based upon a linear system model with uncertain parameters within its structure. Succeeding chapters will then address proper development of general nonlinear stochastic models, and state estimation based upon such extended models.

REFERENCES

1. Asher, R. B., Maybeck, P. S., and Mitchell, R. A. K., Filtering for precision pointing and tracking with application for aircraft to satellite tracking, Proc. IEEE Conf. Decision and Control, Houston, Texas, pp. 439-446 (December 1975).

2. Barker, R., Estimation of Decayed Satellite Reentry Trajectories. Ph.D. dissertation, Air Force Institute of Technology, Wright-Patterson AFB, Ohio (1981).

3. Bryan, R. S., Cooperative Estimation of Targets by Multiple Aircraft. M.S. thesis, Air Force Institute of Technology, Wright-Patterson AFB, Ohio (June 1980).

4. Center, J. L., D'Appolito, J. A., and Marcus, S. I., Reduced-Order Estimators and their Application to Aircraft Navigation, Tech. Rep. TASC TR-316-4-2. The Analytic Sciences Corporation, Reading, Massachusetts (August 1974).


5. Cox, H., On the estimation of state variables and parameters for noisy dynamic systems, IEEE Trans. Automat. Control AC-9 (1), 5-12 (1964).

6. Cox, H., Estimation of state variables via dynamic programming, Proc. Joint Automat. Control Conf., Stanford, California, pp. 376-381 (1964).

7. Cusumano, S. J., and DePonte, M., Jr., An Extended Kalman Filter Fire Control System Against Air-to-Air Missiles. M.S. thesis, Air Force Institute of Technology, Wright-Patterson AFB, Ohio (December 1977).

8. Denham, W. F., and Pines, S., Sequential estimation when measurement function nonlinearity is comparable to measurement error, AIAA J. 4, 1071-1076 (1966).

9. Fagin, S. L., Recursive linear regression theory, optimal filter theory, and error analysis of optimal systems, IEEE Int. Convent. Record 12, 216-240 (1964).

10. Farrell, J. L. et al., Dynamic scaling for air-to-air tracking, Proc. Nat. Aerospace and Electron. Conf., Dayton, Ohio, pp. 157-162 (May 1975).

11. Fitts, J. M., Aided tracking as applied to high accuracy pointing systems, IEEE Trans. Aerospace Electron. Systems AES-9 (3), 350-368 (1973).

12. Fitzgerald, R. J., Divergence of the Kalman filter, IEEE Trans. Automat. Control AC-16 (6), 736-743 (1971).

13. Friedland, B., On the effect of incorrect gain in the Kalman filter, IEEE Trans. Automat. Control AC-12 (5), 610 (1967).

14. Gelb, A. (ed.), "Applied Optimal Estimation." MIT Press, Cambridge, Massachusetts, 1974.

15. Griffin, R. E., and Sage, A. P., Large and small scale sensitivity analysis of optimum estimation algorithms, Proc. Joint Automat. Control Conf., Ann Arbor, Michigan, pp. 977-988 (June 1968).

16. Griffin, R. E., and Sage, A. P., Sensitivity analysis of discrete filtering and smoothing algorithms, Proc. AIAA Guidance, Control and Flight Dynam. Conf., Pasadena, California, Paper No. 68-824 (August 1968).

17. Hampton, R. L. T., and Cooke, J. R., Unsupervised tracking of maneuvering vehicles, IEEE Trans. Aerospace and Electron. Syst. AES-9 (2), 197-207 (1973).

18. Heffes, H., The effect of erroneous error models on the Kalman filter response, IEEE Trans. Automat. Control AC-11 (3), 541-543 (1966).

19. Huddle, J. R., and Wismer, D. A., Degradation of linear filter performance due to modeling error, IEEE Trans. Automat. Control AC-13 (4), 421-423 (1968).

20. Hutchinson, C. E., D'Appolito, J. A., and Roy, K. J., Applications of minimum variance reduced-state estimators, IEEE Trans. Aerospace and Electron. Syst. AES-11 (5), 785-794 (1975).

21. Jazwinski, A. H., Limited memory optimal filtering, Proc. Joint Automat. Control Conf., Ann Arbor, Michigan (1968); also IEEE Trans. Automat. Control AC-13 (5), 558-563 (1968).

22. Jazwinski, A. H., "Stochastic Processes and Filtering Theory." Academic Press, New York, 1970.

23. Kirk, D. E., Evaluation of State Estimators and Predictors for Fire Control Systems, Tech. Rep. NPS-52KI74101. Naval Postgraduate School, Monterey, California (October 1974).

24. Kolibaba, R. L., Precision Radar Pointing and Tracking Using an Adaptive Kalman Filter. M.S. thesis, Air Force Institute of Technology, Wright-Patterson AFB, Ohio (January 1973).

25. Landau, M. I., Radar tracking of airborne targets, Proc. Nat. Aerospace and Electron. Conf., Dayton, Ohio, p. 500 (May 1976).

26. Leondes, C. T. (ed.), Theory and Applications of Kalman Filtering, AGARDograph No. 139. NATO Advisory Group for Aerospace Research and Development, London (February 1970).

27. Lindberg, E. K., A Radar Error Model and Kalman Filter for Predicting Target States in an Air-to-Air Environment. M.S. thesis, Air Force Institute of Technology, Wright-Patterson AFB, Ohio (December 1974).

28. Ljung, L., Asymptotic behavior of the extended Kalman filter as a parameter estimator for linear systems, IEEE Trans. Automat. Control AC-24 (1), 36-50 (1979).


29. Lutter, R. N., Application of an Extended Kalman Filter to an Advanced Fire Control System. M.S. thesis, Air Force Institute of Technology, Wright-Patterson AFB, Ohio (December 1976).

30. Maybeck, P. S., "Stochastic Models, Estimation, and Control," Vol. 1. Academic Press, New York, 1979.

31. Maybeck, P. S., and Mercier, D. E., A target tracker using spatially distributed infrared measurements, IEEE Trans. Automat. Control AC-25 (2), 222-225 (1980).

32. Maybeck, P. S., Negro, J. E., Cusumano, S. J., and DePonte, M., Jr., A new tracker for air-to-air missile targets, IEEE Trans. Automat. Control AC-24 (6), 900-905 (1979).

33. Maybeck, P. S., Reid, J. G., and Lutter, R. N., Application of an extended Kalman filter to an advanced fire control system, Proc. IEEE Conf. Decision and Control, New Orleans, Louisiana, pp. 1192-1195 (December 1977).

34. Mehra, R. K., A comparison of several nonlinear filters for reentry vehicle tracking, IEEE Trans. Automat. Control AC-16 (4), 307-319 (1971).

35. Mercier, D. E., An Extended Kalman Filter for Use in a Shared Aperture Medium Range Tracker. M.S. thesis, Air Force Institute of Technology, Wright-Patterson AFB, Ohio (December 1978).

36. Miller, R. W., Asymptotic behavior of the Kalman filter with exponential aging, AIAA J. 9 (3), 537-538 (1971).

37. Morrison, N., "Introduction to Sequential Smoothing and Prediction." McGraw-Hill, New York, 1969.

38. Musick, S. H., SOFE: A Generalized Digital Simulation for Optimal Filter Evaluation; User's Manual, Tech. Rep. AFWAL-TR-80-1108. Avionics Laboratory, Air Force Wright Aeronautical Laboratories, Wright-Patterson AFB, Ohio, 1980.

39. Myers, K. A., and Tapley, B. D., Dynamic model compensation for near-earth satellite orbit determination, AIAA J. 13 (3), (1975).

40. Nash, J. M., and Samant, V. S., Airborne defensive fire control algorithms, Proc. IEEE Nat. Aerospace and Electron. Conf., Dayton, Ohio, pp. 653-657 (May 1977).

41. Neal, S. R., Linear estimation in the presence of errors in assumed plant dynamics, IEEE Trans. Automat. Control AC-12 (5), 592-594 (1967).

42. Pearson, J. B., III, and Stear, E. B., Kalman filter applications in airborne radar tracking, IEEE Trans. Aerospace Electron. Systems AES-10 (3), 319-329 (1974).

43. Penrose, R., A generalized inverse for matrices, Proc. Cambridge Philos. Soc. 51, Part 3, 406-413 (1955).

44. Penrose, R., On the best approximate solutions of linear matrix equations, Proc. Cambridge Philos. Soc. 52, Part 3, 17-19 (1956).

45. Pon, W., A state covariance matrix computation algorithm for satellite orbit determination sequential filtering, Symp. Nonlinear Estimat. Theory, 3rd (September 1972).

46. Pon, W., Math Theory for the Advanced Orbit Determination and Ephemeris Generation System (AOES), Data Dynamics Rep. (September 1973).

47. Price, C. F., An analysis of the divergence problem in the Kalman filter, IEEE Trans. Automat. Control AC-13 (6), 699-702 (1968).

48. Price, C. F., and Warren, R. S., Performance Evaluation of Homing Guidance Laws for Tactical Missiles, Tech. Rep. TR-170-4. The Analytic Sciences Corp., Reading, Massachusetts (1973).

49. Sacks, J. E., and Sorenson, H. W., Comment on 'A practical nondiverging filter,' AIAA J. 9 (4), 767-768 (1971).

50. Safonov, M. G., Robustness and Stability Aspects of Stochastic Multivariable Feedback System Design, Ph.D. dissertation, Rep. #ESL-R-763. MIT Electronic Systems Lab., Cambridge, Massachusetts (September 1977).

51. Safonov, M. G., and Athans, M., Robustness and computational aspects of nonlinear stochastic estimators and regulators, IEEE Trans. Automat. Control AC-23 (4), 717-725 (1978).


52. Schlee, F. H., Standish, C. J., and Toda, N. F., Divergence in the Kalman filter, AIAA J. 5, 1114-1120 (1967).

53. Schmidt, G. T. (ed.), Practical Aspects of Kalman Filtering Implementation, AGARD-LS-82. NATO Advisory Group for Aerospace Research and Development, London (May 1976).

54. Schmidt, S. F., Compensation for Modeling Errors in Orbit Determination Problems, Rep. No. 67-16. Analytical Mechanics Associates, Seabrook, Maryland (November 1967).

55. Schmidt, S. F., Weinburg, J. D., and Lukesh, J. S., Case study of Kalman filtering in the C-5 aircraft navigation system, Case Studies in System Control, pp. 57-109. IEEE and Univ. of Michigan, Ann Arbor, Michigan (June 1968).

56. Schweppe, F. C., Algorithms for Estimating a Re-Entry Body's Position, Velocity, and Ballistic Coefficient in Real Time or from Post Flight Analysis, Tech. Rep. ESD-TDR-64-583. MIT Lincoln Lab., Lexington, Massachusetts (December 1964).

57. Schweppe, F. C., "Uncertain Dynamic Systems." Prentice-Hall, Englewood Cliffs, New Jersey, 1973.

58. Sorenson, H. W., On the error behavior in linear minimum variance estimation problems, IEEE Trans. Automat. Control AC-12 (5), 557-562 (1967).

59. Sorenson, H. W., and Sacks, J. E., Recursive fading memory filtering, Inform. Sci. 3, 101-109 (1971).

60. Tapley, B. D., and Ingram, D. S., Orbit determination in the presence of unmodeled accelerations, Symp. Nonlinear Estim. Theory, 2nd (September 1971).

61. Tarn, T. J., and Zaborsky, J., A practical, nondiverging filter, AIAA J. 8, 1127-1133 (1970).

62. Vagners, J., Design of numerically stable flight filters from minimum variance reduced order estimators, Proc. IEEE Nat. Aerospace Electron. Conf., Dayton, Ohio, pp. 577-581 (May 1979).

63. Westerlund, T., and Tysso, A., Remarks on 'Asymptotic behavior of the extended Kalman filter as a parameter estimator for linear systems,' IEEE Trans. Automat. Control AC-25 (5), 1011-1012 (1980).

64. Widnall, W. S., "Applications of Optimal Control Theory to Computer Controller Design." MIT Press, Cambridge, Massachusetts, 1968.

65. Worsley, W., Comparison of Three Extended Kalman Filters for Air-to-Air Tracking. M.S. thesis, Air Force Institute of Technology, Wright-Patterson AFB, Ohio (December 1980).

PROBLEMS

9.1 Recall the example of a gyro on test, as described in Problem 8.1, and the Kalman filter developed to estimate the gyro drift rate x(t). Assume that adequate performance is not being achieved, and you suspect that this is caused by modeling inadequacies. We want to explore some of the compensation techniques of this chapter on filter operation. As discussed in Example 5.4 of Volume 1, "tracking" properties of the filter are indicated by the gain K(ti): if K(ti) is approximately one, x̂(ti+) is approximately equal to the most recent measurement zi; if K(ti) is very small, then x̂(ti+) is not "tracking" the measurements closely, but rather is heavily weighting the output of its own internal system model.

(a) Generate the K(ti) time history corresponding to the two-hour interval as depicted in Problem 8.1 and Example 5.4.

(b) Demonstrate the effect of adding "pseudonoise" by increasing Q from 2 deg²/hr to 4 deg²/hr, and computing the time histories of P(ti−), P(ti+), and K(ti) values.

(c) As an alternative, consider putting a lower bound of 0.35 on P(ti+), and generate the corresponding time histories.

(d) Repeat for the case of using age-weighting instead, with an age-weighting time constant T of 1 hr. Repeat with T = 0.5 hr.


(e) Apply the Schmidt ε technique to this problem, with ε = 0.25. Repeat with ε = 0.5, 0.75, and 1.0.

(f) Consider development of a one-state MVRO estimator for this problem, assuming that a better model ("truth model") for the gyro would modify

ẋ(t) = −x(t) + w(t)

with w(·,·) having a power spectral density value of 2 deg²/hr over all frequencies, into the model

ẋ(t) = −x(t) + n(t)

where n(·,·) is a zero-mean first-order Gauss-Markov process with power spectral density value of about 2 at the low frequencies but band-limited with a break frequency of 3 rad/sec:

Ψnn(ω) = 18/(ω² + 9)

or

E{n(t)n(t + τ)} = 6 exp{−3|τ|}

Relate this result to the previous modifications via pseudonoise addition and filter gain compensation.

9.2 Assume that a certain sensor acts as a low-pass filter in providing an output voltage signal s, in response to a variation in the quantity being measured, denoted as u. Due to the effects of parameter variations in the sensor and associated electronics, a slowly drifting bias b is also introduced into the output. Furthermore, instrument noise and quantization error affect the digital readout from the sensor package, this being effectively modeled by white noise v of zero mean and strength R. Thus, a model of the sensor would be as shown in Fig. 9.P1. With the sample period chosen as Δt sec, the slowly drifting bias is essentially a constant for 3 sample periods, but such a model becomes invalid for 4 or more periods.

FIG. 9.P1 Sensor model block diagram: low-pass dynamics driven by u, corrupted by the bias b and, after the sampler, by the white noise v.

(a) Write the equations for the appropriate finite memory filter to estimate x, a "smoothed" version of the input u.

(b) Generate the filter equations to estimate x using

(1) a pseudonoise addition,
(2) the Schmidt ε technique, and
(3) the Fagin age-weighting technique

as ad hoc procedures to maintain a valid estimate of b and x; explain how the value of the "design" parameters, Q, ε, and T, respectively, might be established.

9.3 Derive the age-weighting filter results (9-10) and (9-11) by the means described in the text prior to these relations.

9.4 Show that the age-weighted or fading memory filter described by (9-10) and (9-11) can also be expressed in the following convenient manner if Qd = 0: For measurement update at time ti, either

K(ti) = P′(ti−)Hᵀ(ti)[H(ti)P′(ti−)Hᵀ(ti) + {R(ti)/s}]⁻¹

x̂(ti+) = x̂(ti−) + K(ti)[zi − H(ti)x̂(ti−)]

P′(ti+) = s{P′(ti−) − K(ti)H(ti)P′(ti−)}


or


P′(ti+) = [{sP′(ti−)}⁻¹ + Hᵀ(ti)R⁻¹(ti)H(ti)]⁻¹

K(ti) = P′(ti+)Hᵀ(ti)R⁻¹(ti)

x̂(ti+) = x̂(ti−) + K(ti)[zi − H(ti)x̂(ti−)]

and, for time propagation to the next sample time,

x̂(ti+1−) = Φ(ti+1, ti)x̂(ti+) + Bd(ti)u(ti)

P′(ti+1−) = Φ(ti+1, ti)P′(ti+)Φᵀ(ti+1, ti)

9.5 (a) Generate the finite memory filter for the problem of Examples 9.1, 9.4, and 9.5:

ḃ(t) = 0,   z(ti) = b(ti) + v(ti)

Using (9-25)-(9-29), develop finite memory filters for N = 2, 5, and 10. Compare these to the results of the text example.

(b) Apply the approximate filter discussed at the end of Section 9.4 to this problem (see (9-32), (9-33), etc.), for N of 2, 5, and 10.

(c) Show how the finite memory filters above change if the dynamics model is altered to

ḃ(t) = −[0.1]b(t)

9.6 Derive the recursion relation (9-27) by writing out Φ(ti, ti−N+1) and Φ(ti−1, ti−N) according to (9-26) and equating like terms.

9.7 (a) Write the explicit relations required for the linearized Kalman filter for Example 9.8. Which of these can be precomputed?

(b) Compare these to the detailed relations for implementing the extended Kalman filter for the same problem. Which of these can be precomputed?

(c) Now generate the precomputed-gain extended Kalman filter for this problem. Compare the details of this filter to the two previous algorithms.

(d) How would filter performance change if only range or only angle measurements were available? Consider observability of the linearized model (as linearized about a nominal circular orbit) for each case. What would this indicate about the linearized filter performance? What about the filters of parts (b) and (c) above?

(e) Show how the three types of filters in (a), (b), and (c) change if the available measurements are in the form of pressure and a direction cosine,

z1(ti) = p0 exp{−γx1(ti)} + v1(ti)

z2(ti) = cos{x3(ti)} + v2(ti)

where p0 is sea level pressure and γ is a known decay factor, instead of simple range and angle as in Example 9.8.

9.8 Consider a measurement update of the extended Kalman filter of Example 9.8. Explicitly write out the equations to incorporate the range and angle measurements as a two-dimensional vector update, and compare these to the iterative scalar updating with these two measurements. Note the difference of the two results.

9.9 (a) Write out the explicit equations for the extended Kalman filter for the thrust-vector control problem of Example 9.9. Note that w(t) does not affect the servo: δ(t0) is known exactly and angle commands δcom are the only servo input; what implications does this have on the dimension of the state vector in the filter? In actual practice, δcom is constant over the interval between measurement updates, and δcom(ti) for use from ti to ti+1 is computed as

δcom(ti) = −Gc*(ti)x̂(ti+)

where Gc* is the optimal controller gain matrix, propagated backwards from final time tf using the best estimate of ωb², and x̂(ti+) is the optimal estimate of the five original states in the linear system model. This will be developed in Chapters 13-15 in Volume 3. However, for this problem consider δcom as a deterministic input.

(b) Now generate the precomputed-gain extended Kalman filter for this same problem. Be aware of implications of the assumed equilibrium solution that is used for gain evaluation. Consider a constant-gain extended Kalman filter. How does this compare to the precomputed-gain filter, and why might it be a suitable replacement for the precomputed-gain filter in this application?

9.10 Assume that you want to design a filter to separate a signal s(t) from a signal-plus-noise input, modeled as

i(t) = s(t) + n(t)

where the signal and noise are assumed stationary, zero-mean, and uncorrelated with each other, with power spectral density descriptions, respectively, of

$\Psi_{ss}(\omega) = \dfrac{12}{\omega^2 + 4}$

$\Psi_{nn}(\omega) = \dfrac{12}{\omega^2 + 16}$

(a) To do so, assume that you take sampled-data measurements every 0.2 sec, of the form

$z(t_i) = s(t_i) + n(t_i) + v(t_i)$

where $v(\cdot,\cdot)$ is zero-mean white Gaussian discrete-time noise of variance 0.01, modeling the effect of errors induced by analog-to-digital conversion with a finite wordlength (see Problem 6.2 of Volume 1). Generate the appropriate Kalman filter to estimate s(t).
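A minimal numerical sketch of part (a), under the assumption that each spectrum is factored into a first-order shaping filter ($\Psi_{ss} = 12/(\omega^2+4)$ gives $\dot{x}_s = -2x_s + w_s$ with $Q_s = 12$, and similarly for the noise state); variable names and the simulation length are illustrative, not from the text:

```python
import numpy as np

# Shaping filters implied by the spectra: x' = -a x + w, PSD = Q/(w^2 + a^2)
dt, R = 0.2, 0.01
a = np.array([2.0, 4.0])          # poles for signal and noise states
Q = np.array([12.0, 12.0])        # continuous-time noise strengths

Phi = np.diag(np.exp(-a * dt))                          # state transition
Qd = np.diag(Q / (2 * a) * (1 - np.exp(-2 * a * dt)))   # equivalent discrete noise
H = np.array([[1.0, 1.0]])        # z = s + n + v

P = np.diag(Q / (2 * a))          # stationary prior covariance
x = np.zeros(2)

rng = np.random.default_rng(0)
xt = rng.multivariate_normal(np.zeros(2), P)            # truth states
for _ in range(200):
    # truth propagation and sampled measurement
    xt = Phi @ xt + rng.multivariate_normal(np.zeros(2), Qd)
    z = H @ xt + np.sqrt(R) * rng.standard_normal(1)
    # filter propagate
    x = Phi @ x
    P = Phi @ P @ Phi.T + Qd
    # filter update
    K = P @ H.T / (H @ P @ H.T + R)
    x = x + (K * (z - H @ x)).ravel()
    P = (np.eye(2) - K @ H) @ P

s_hat = x[0]                      # estimate of s(t) = x_s(t)
print(P[0, 0])                    # filter variance on the signal state
```

The steady-state signal-state variance should fall well below the prior value $Q_s/(2a_s) = 3$, since the measurement noise variance is small.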

(b) Now assume that filter performance has been considered inadequate in actual operation. You suspect some unmodeled effects, and so you conduct some further analysis of the signal generator. The initial model has been a zero-mean white Gaussian noise $w(\cdot,\cdot)$ of strength Q; i.e.,

$E\{w(t)w(t + \tau)\} = Q\,\delta(\tau)$

driving a linear equation

$\dot{x}(t) = -2x(t) + w(t)$

with an output of

s(t) = x(t)

After some analysis, a refined model is produced as

$\dot{x}(t) = -2x(t) + 0.05x^2(t) + w(t)$

$s(t) = x(t) - 0.02x^2(t)$

Design the linearized and extended Kalman filters for this application, explicitly writing out all required vectors and matrices.

(c) Generate the precomputed-gain and constant-gain extended Kalman filters for this problem, and compare to the results in (b).

Page 83: Stochastic Models, Estimation, And Control Volume 3

66 9. COMPENSATION OF LINEAR MODEL INADEQUACIES

(d) Describe how one could account for these nonlinearities in ways other than using an extended Kalman filter.

9.11 Reconsider Problem 9.10, but now in the context of continuous-time measurements being available. Assume measurements are available as

$z(t) = s(t) + n(t)$

where $\Psi_{ss}(\omega)$ is as given in Problem 9.10, but

$\Psi_{nn}(\omega) = \dfrac{2}{\omega^4 + 0.1\omega^2 + 16}$

(a) Generate the appropriate continuous-measurement Kalman filter to estimate s(t).

(b) Consider the model modification given in part (b) of Problem 9.10, and design the linearized and extended Kalman filters for this continuous-measurement case.

(c) As in the previous problem, generate the precomputed-gain and constant-gain extended Kalman filters for this problem, and compare these to the results of part (b).

9.12 (a) Explicitly write out the extended Kalman filter equations for Example 9.10.

(b) Assume that a controller can point the center of the field of view toward any commanded location in one sample period, without error. Thus, at time $t_i$, we can propagate the estimate to obtain $\hat{x}(t_{i+1}^-)$, and the first two components, $\hat{x}_{\rm dyn}(t_{i+1}^-)$ and $\hat{y}_{\rm dyn}(t_{i+1}^-)$, can be used as pointing commands. How should the filter's computed $\hat{x}(t_{i+1}^-)$ be modified if this control action is taken? How does this affect the next residuals? Compare to "open-loop" tracking in which the center of the field of view continues to point in one direction for all time.

(c) Show how the filter changes when a dynamics model appropriate for some airborne targets,

$\dot{x}_{\rm dyn}(t) = v_x(t)$

$\dot{v}_x(t) = a_x(t)$

$\dot{a}_x(t) = (-1/T_a)a_x(t) + w_x(t)$

with $w_x(\cdot,\cdot)$ zero-mean white Gaussian noise with $E\{w_x(t)w_x(t + \tau)\} = Q_x\,\delta(\tau)$, and similarly for the y direction, replaces the benign target dynamics model of Example 9.10. Note that $a_x(\cdot,\cdot)$ is a first-order Gauss-Markov process, with autocorrelation kernel of

$E\{a_x(t)a_x(t + \tau)\} = \sigma_{ax}^2\, e^{-|\tau|/T_a}$

and that $\sigma_{ax}^2$ and $T_a$ (and corresponding values for the y axis) can be selected to match acceleration characteristics of many airborne vehicles.
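The exponential autocorrelation of the first-order Gauss-Markov acceleration can be checked numerically from its equivalent discrete-time model; the values $T_a = 2.0$, $\sigma_{ax}^2 = 1.0$, and $\Delta t = 0.1$ below are illustrative, not from the text:

```python
import numpy as np

# Equivalent discrete-time model of a first-order Gauss-Markov process:
#   a[k+1] = exp(-dt/Ta) a[k] + wd[k]
Ta, sigma2, dt = 2.0, 1.0, 0.1
phi = np.exp(-dt / Ta)                 # scalar transition over one step
qd = sigma2 * (1 - phi**2)             # discrete noise variance keeping the process stationary

rng = np.random.default_rng(1)
n = 200_000
wd = rng.normal(0.0, np.sqrt(qd), size=n)
a = np.empty(n)
a[0] = rng.normal(0.0, np.sqrt(sigma2))
for k in range(1, n):
    a[k] = phi * a[k - 1] + wd[k]

# Sample autocorrelation at lag tau = 1.0 s vs. the kernel sigma2*exp(-tau/Ta)
lag = int(1.0 / dt)
emp = np.mean(a[:-lag] * a[lag:])
print(emp, sigma2 * np.exp(-1.0 / Ta))
```

The empirical lag product should agree with $\sigma_{ax}^2 e^{-\tau/T_a} \approx 0.607$ to within sampling error.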

(d) Repeat part (c), but now use the nonlinear model for acceleration that assumes the target will exhibit constant turn-rate trajectories (with pseudonoise added to account for model inadequacies, etc.),

$a(t) = -\omega^2 v(t) + w(t)$

where

$\omega = |v(t) \times a(t)| / |v(t)|^2$

and $|\cdot|$ denotes magnitude, and $\times$ denotes cross-product. Be particularly careful in evaluation of $\partial\mathbf{f}/\partial\mathbf{x}$: recall that, by definition of partial derivatives, $\partial f_i(x_1, x_2, \ldots, x_j, \ldots, x_n)/\partial x_j$ is to be evaluated under the assumption that all other x components besides $x_j$ are to be held constant.

9.13 This is a computer problem that assumes access to a Monte Carlo filter performance evaluation algorithm such as [38], as described in Section 6.8 and Problem 7.14 of Volume 1 for the linear model case; extensions to the nonlinear truth model (and filter) case will be discussed in

Page 84: Stochastic Models, Estimation, And Control Volume 3


Chapter 11 (and Chapter 12). Consider a "truth model" description for Example 9.8 for the case of $u_1 = u_2 = 0$ and let $G = I$, so that one possible nominal solution to the original nonlinear equations would be the circular orbit with $r_0 = 1$ radius unit, $\omega = 1$ rad/(time unit), and $\theta(t) = \omega t$. Note that the radial velocity is zero and tangential velocity is $r_0\omega = 1$ radius unit/(time unit) for this orbit.

If an impulsive velocity change $\Delta v$ is applied tangentially at $t = 0$, then a "Hohmann transfer" ellipse results as in Fig. 9.P2. Here $r_0$ is both the radius of the circular orbit and the perigee of the elliptical orbit, while $r_1$ is the apogee of the ellipse; $v_c$ is the velocity magnitude for the circular orbit (i.e., $r_0\omega$) and $[v_c + \Delta v]$ is the velocity at $r_0$ for the Hohmann ellipse. The values of $v_c$, $\Delta v$, $r_0$, and $r_1$ can be related by

$(\Delta v/v_c)^2 = 3 - [2r_0/(r_0 + r_1)] - 2\sqrt{2r_1/(r_0 + r_1)}$

Consider a highly elliptical trajectory in which $r_1 = 10r_0$. This yields $\Delta v \approx 0.3484v_c$ and thereby the appropriate initial condition on $\dot{\theta}$ at $t = 0$ for the elliptical orbit.
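As a quick check, the stated relation can be evaluated numerically and compared against a direct vis-viva computation of the perigee speed (normalized so that $\mu = r_0 = v_c = 1$, the units of the problem):

```python
import math

# Delta-v relation for the Hohmann transfer, r1 = 10 r0
r0, r1 = 1.0, 10.0
ratio_sq = 3 - 2 * r0 / (r0 + r1) - 2 * math.sqrt(2 * r1 / (r0 + r1))
dv_over_vc = math.sqrt(ratio_sq)

# Direct derivation: perigee speed on the ellipse from vis-viva,
# v_p^2 = mu*(2/r0 - 2/(r0+r1)) = 2 r1/(r0+r1) with mu = r0 = 1
vp = math.sqrt(2 * r1 / (r0 + r1))
print(dv_over_vc, vp - 1.0)   # both ~ 0.3484
```

Both routes give $\Delta v/v_c \approx 0.3484$, confirming the relation and the quoted value.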

FIG. 9.P2 Hohmann transfer ellipse.

Assume measurements of $\theta$ only, corrupted by white measurement noise of variance R, every $\Delta t = 0.5$ time units. Design both linearized and extended Kalman filters based on the model given in Example 9.8, with $u_1 \equiv u_2 \equiv 0$ and $w_1(\cdot,\cdot)$ and $w_2(\cdot,\cdot)$ assumed to be zero-mean white Gaussian noises, independent of each other, and of strengths $Q_1$ and $Q_2$, respectively. Initially assume the nominal circular orbit for both filters.

Compare the performance of linearized and extended Kalman filters over an interval of 5 time units, letting the filter-assumed R and truth model R agree, for the cases of

(a) truth model = circular orbit, $R = 0.01$, $Q_1 = Q_2 = 10^{-6}$,

(b) truth model = circular orbit, $R = 0.0001$, $Q_1 = Q_2 = 10^{-6}$,

(c) truth model = ellipse, $R = 0.01$, $Q_1 = Q_2 = 10^{-6}$,

(d) truth model = ellipse, $R = 0.0001$, $Q_1 = Q_2 = 10^{-6}$,

(e) truth model = ellipse, $R = 0.01$, $Q_1$ and $Q_2$ tuned for best performance,

(f) truth model = ellipse, $R = 0.0001$, $Q_1$ and $Q_2$ tuned for best performance.

Specifically look at the mean $\pm 1\sigma$ (standard deviation) time histories of errors between true and estimated variables.

Page 85: Stochastic Models, Estimation, And Control Volume 3

CHAPTER 10

Parameter uncertainties and adaptive estimation

10.1 INTRODUCTION

For many applications of interest, various techniques can be used to produce an adequate system description in the form of a linear system model driven by known inputs and white Gaussian noises, with which it is possible to develop an optimal state estimator and/or controller. However, the optimality of these devices is dependent upon complete knowledge of the parameters that define the best model for system dynamics, output relations, and statistical description of uncertainties. In any practical application, these quantities are known only with some uncertainty, and the performance degradation that results from improperly assigned values can be severe.

The extent of the model uncertainty and the sensitivity of filter performance to such uncertainty vary substantially from one problem to another. For example, in spacecraft thrust vector control problems, the physics involved is well understood and can lead to a very accurate linear system model, but certain key parameters, such as bending mode natural frequency, are typically known imprecisely. An optimal filter or filter/controller combination is sensitively tuned to this parameter, and even a small deviation in its value can cause inadequate, or even unstable, control. On the other hand, the model is less well defined for the dynamics involved in many process control applications. Distillation towers would be a prime example: a wide range of complex dynamics occur within each tower, but practical experience has shown that adequate control is possible by assuming an approximate model of a second-order system plus a time delay. Although the towers are qualitatively similar, the appropriate parameters for the linear model differ for each particular tower, and vary slowly in time as well. A priori information about these parameters would typically be in the form of only a range of physically admissible values.


Page 86: Stochastic Models, Estimation, And Control Volume 3


Thus, in order to improve the quality of the state estimates, it would be desirable in many instances to estimate a number of uncertain parameters in the dynamics or measurement model simultaneously in an online fashion. This is often termed combined state estimation and system identification [6, 13, 28, 31, 35, 55, 58, 63, 82, 84, 88, 89, 91, 112, 115, 116, 117]. A number of methods have been suggested for developing such capability, such as using an extended Kalman filter to solve the nonlinear estimation problem that results from treating the parameters as additional state variables. However, these techniques usually depend upon a priori parameter statistical information (difficult to provide with confidence in many practical situations), or require a difficult interpretation of physical knowledge about the parameters (i.e., that they are more slowly varying than the states) into the specification of appropriate noise strengths to drive the dynamics model. One objective of this chapter is to provide a feasible state and parameter estimator that (1) does not require complete a priori parameter statistics, but can utilize any such information that is available, (2) allows the engineer to use his knowledge that the parameters are slowly varying (if at all) in a direct and physically meaningful manner, and (3) provides, or yields to approximations that provide, both online capability and adequate performance.

Previous chapters demonstrated the effect of improper filter tuning, i.e., improper selection of parameters to describe noise statistics in a problem formulation, upon the precision of state estimation. One would like to readjust the assumed noise strengths in the filter's internal model, based upon information obtained in real time from the measurements becoming available, so that the filter is continually "tuned" as well as possible. Such an algorithm is often termed an adaptive or self-tuning estimation algorithm, and this class of algorithms will also be investigated in this chapter [5, 6, 59, 88, 91, 103, 141]. Actually, this differs from a state estimation/system identification algorithm only in which parameters are assumed to be uncertain enough to require estimation, and all such algorithms could be properly termed adaptive estimators, or simultaneous estimators of states and uncertain parameters. Because of the interrelation of concepts involved in problem formulation and solution for these two areas, it is fruitful to consider them together [62, 72].

In all cases, the key to adaptation will be the residuals of the state estimator. Since these are the differences between actual measurements and best measurement predictions based upon the filter's internal model, consistent mismatch indicates erroneous model formulation, and particular characteristics of the mismatch can be exploited to perform the needed adaptation.

Section 10.2 formulates the basic problem of parameter uncertainties and adaptive estimation. The next four sections treat uncertainties in $\Phi$ and $B_d$, using maximum likelihood techniques to develop the full-scale estimator, evaluating its performance capabilities, and attaining online applicability without severely degrading performance. Similarly, Section 10.7 develops the

Page 87: Stochastic Models, Estimation, And Control Volume 3

70 10. PARAMETER UNCERTAINTIES AND ADAPTIVE ESTIMATION

maximum likelihood estimator for the case of uncertainties being confined to $Q_d$ and/or R. In subsequent sections, other solution methods are developed: Bayesian and multiple model filtering algorithms, correlation methods, and covariance matching techniques.

10.2 PROBLEM FORMULATION

Assume that the system of interest can be described by means of the linear stochastic difference equation

$x(t_{i+1}) = \Phi(t_{i+1}, t_i)x(t_i) + B_d(t_i)u(t_i) + G_d(t_i)w_d(t_i)$ (10-1)

from which are available discrete-time measurements modeled by the linear relation

$z(t_i) = H(t_i)x(t_i) + v(t_i)$ (10-2)

It is assumed that $w_d(\cdot,\cdot)$ and $v(\cdot,\cdot)$ are independent, zero-mean, white Gaussian noise processes with covariance kernels

$E\{w_d(t_i)w_d^{\rm T}(t_j)\} = Q_d(t_i)\,\delta_{ij}$ (10-3)

$E\{v(t_i)v^{\rm T}(t_j)\} = R(t_i)\,\delta_{ij}$ (10-4)

where $Q_d(t_i)$ is positive semidefinite and $R(t_i)$ is positive definite for all $t_i$. The initial condition on the state is known only with some uncertainty in general, and $x(t_0)$ is described by means of a Gaussian random vector assumed independent of $w_d(\cdot,\cdot)$ and $v(\cdot,\cdot)$, with known mean and covariance:

$E\{x(t_0)\} = \hat{x}_0$ (10-5a)

$E\{[x(t_0) - \hat{x}_0][x(t_0) - \hat{x}_0]^{\rm T}\} = P_0$ (10-5b)

where $P_0$ is positive semidefinite.

In a system identification problem context, some parameters defining the state transition matrix $\Phi(t_{i+1}, t_i)$, input matrix $B_d(t_i)$, or measurement matrix $H(t_i)$ time histories are not determined completely. These parameters might be explicit elements of these matrices, but need not be. For instance, (10-1) would most naturally arise as an equivalent discrete-time model to represent a continuous-time system with sampled-data measurements; then a single uncertain parameter in the F(t) matrix of the continuous-time description will generally affect many elements of both $\Phi(t_{i+1}, t_i)$ and $B_d(t_i)$. Uncertainties in $H(t_i)$ can be deemphasized for two reasons: (1) generally $H(t_i)$ is known more precisely than $\Phi(t_{i+1}, t_i)$ or $B_d(t_i)$ in practical problems and (2) uncertainties in H often cannot be distinguished from uncertainties in $\Phi$ or $B_d$ on the basis of observing time histories of $z(t_i)$ and $u(t_i)$ values, so one can choose a state space description that avoids uncertain parameters in H. For example, in the simple

Page 88: Stochastic Models, Estimation, And Control Volume 3


case of a scalar state system model with no control inputs or driving noise, the ith measurement is $[H(t_i)\Phi(t_i, t_0)x_0 + v(t_i)]$, from which an estimate can be made of the product $H(t_i)\Phi(t_i, t_0)$, but not of both $H(t_i)$ and $\Phi(t_i, t_0)$ separately. Note that uncertainties in $G_d(t_i)$ are not included, but this case can be treated equivalently as uncertain parameters in $Q_d(t_i)$.

In an adaptive estimator context, uncertain parameters are considered to exist in the $Q_d$ and/or R matrices. These are to be estimated simultaneously with the states in a filter algorithm.

The designer often lacks sufficient information to develop valid or complete statistical or probability density function models for these parameters. In fact, he might be able to provide at best a range of possible values and a most probable value for each parameter by examining the physics of the problem. These parameters are distinguished from the state variables in that they will vary significantly more slowly than the states, and may in fact be time invariant. It is this characterization of uncertain parameters that an effective estimation algorithm should exploit in a direct manner.

The objective is to improve the quality of the state estimation (and possibly of a stochastic controller as well, to be discussed later) by simultaneously estimating some of the uncertain parameters in the model structure. From a computational feasibility standpoint, it will be important to perform a sensitivity analysis [82, 83, 85, 100, 101] a priori to identify the parameters that are crucial to estimate, rather than to estimate all whose values are at all uncertain. Moreover, it may be advantageous computationally to attempt to choose the best parameter value from a set of discrete values rather than to consider a continuous range of possible values. Such practical aspects will be considered once the full-scale solution to the problem has been described in detail.

Of the many possible means of formulating this problem, the maximum likelihood technique has been chosen for development in this chapter [65, 82, 108, 115, 121, 147]. This choice was motivated by many considerations. First, some properties of a general maximum likelihood estimator make it especially attractive [25, 107, 121, 137, 145]. The following characteristics have been proven by Cramér [25] for the case of independent, identically distributed measurements, and the corresponding generalizations to the context of the current problem will be developed subsequently. If an efficient estimate exists (i.e., if there exists an unbiased estimate with finite covariance such that no other unbiased estimate has a lower covariance), it can always be found through maximum likelihood methods. Further, if an efficient estimate exists, the likelihood equation will have a unique solution that equals the efficient estimate [127]. Under rather general conditions, the likelihood equation has a solution that converges in probability to the true value of the variables as the number of sample elements grows without bound; i.e., it is consistent [10-12, 109, 139]. This solution is an asymptotically Gaussian and asymptotically efficient

Page 89: Stochastic Models, Estimation, And Control Volume 3


estimate. Kerr [68] further asserts these additional properties. If any single sufficient statistic for the estimated variable exists, the maximum likelihood estimate will be sufficient, and under very general conditions, it will be at least asymptotically sufficient and unbiased. Even though the estimate will generally be biased for small samples, it will provide the unique minimum attainable variance estimate under the existence of sufficient statistics, attaining the Cramér-Rao lower bound [137] if this is possible. Without sufficient statistics, this optimal behavior cannot be proven for small samples, but it will still be asymptotically optimal and usually a good small-sample estimator.

With regard to the bias, maximum likelihood estimates tend to have the true value of the estimated variable near the center of their distributions, so that the bias is often negligible. Levin [75] further states that if the measurement noise variance is small with respect to the actual system output, then the bias in the estimate of the pulse transfer function parameters of the system model will be negligible compared to the standard deviation of the estimate. He also demonstrates that, in the absence of the Gaussian assumption on the noises, the maximum likelihood estimate will at least provide a generalized least squares fit to data.

Besides these desirable properties, other considerations tend to favor maximum likelihood techniques as well. Once a particular conditional density has been propagated, the various possible estimates, such as the mode, median, or mean, will often be very close, especially for unimodal densities concentrated about their mean. Therefore, it is logical to choose the estimate that is easiest to compute or approximate. Schweppe [119-121] and others have noted that estimates based on the peak value of the density often have this characteristic. Furthermore, the mean and median estimate computations are made complex by truncating the density, as by imposing an admissible range of parameter values, whereas the maximum likelihood estimate is not affected at all. In his work toward simultaneously estimating the system state and noise statistics, Abramson [1] chose the maximum likelihood method partially because of severe analytical difficulties in a minimum variance formulation. Especially since one objective of this chapter is to provide online algorithms, the rationale of using the estimator which yields the simplest implementation is important, so the use of maximum likelihood techniques is further substantiated.

A potentially simpler technique might be the method of weighted least squares [52, 54, 143]. With the "proper" choice of weighting factors, this method can derive the Kalman filter and other powerful estimation results. However, the appropriate choice of the weighting factors is usually based upon considerable hindsight gained from comparison to other estimation techniques. The primary conceptual disadvantage to this method is that it does not attempt to propagate statistical or probabilistic information in time, and the knowledge of certain conditional densities provides a complete, rational basis of estimation. Although the available statistical information can often be incorporated

Page 90: Stochastic Models, Estimation, And Control Volume 3


into the weighting factors, the estimation technique itself does not reveal howto do so.

Bayesian estimation is conceptually satisfying in that it propagates the conditional density of the variables to be estimated, conditioned upon the values assumed by the sample elements that are actually observed. However, this requires an a priori specification of a parameter density function, and sufficient statistical information to define such a density adequately is quite often lacking in real applications. A maximum likelihood formulation need not suffer from this drawback. The classical likelihood function is the conditional density of the measurements, conditioned upon the value of the uncertain parameters, and the estimation is then a form of hypothesis testing to find the values that maximize the probability of the events (the measurements) that have actually occurred. However, there are also more general forms of likelihood functions [82, 120], in the form of appropriately defined conditional densities, which can exploit as much, or as little, of the a priori statistical information as is available. In this respect, the maximum likelihood approach is more generally applicable than Bayesian methods. Similarly, Kashyap [65] has asserted that maximum likelihood methods are more generally applicable than least squares, the instrumental variable method, stochastic approximation, or other methods commonly suggested for system identification purposes [2, 51, 71, 114].

Perhaps the single greatest disadvantage to maximum likelihood estimation is the lack of theoretical knowledge about the behavior of the estimates for small sample sizes. Much is known about the asymptotic behavior as the number of samples becomes infinite, however, and a considerable amount of practical experience with the method instills confidence in its viability.

To exploit the maximum likelihood method, one must choose an appropriate likelihood function. This in turn requires a mathematical model for the parameters. An "optimal" estimation of the parameters should exploit the knowledge that their values will be more consistent from measurement sample time to sample time than will any of the states or noises associated with the problem. An adequate and readily applied model is that the parameters are essentially constant over any given interval of N sample periods (where N is chosen by the designer) [29, 82, 121]. That is, at a given time $t_i$, the parameters are modeled as remaining constant over the samples (i - N + 1) to i. At the next sample time $t_{i+1}$, the parameters are again assumed to be constants, though possibly of different magnitude, over the samples (i - N + 2) to (i + 1). A polynomial of any fixed order could be used to model parameters over (longer) fixed-duration intervals, but use of a constant can usually be justified as the best a priori approximation, the assumed form that is easiest to work with as a designer (as, to set the appropriate value of N), and the form that yields the least complex estimation equations (of distinct importance for online applicability).

Note that such a model does not require the designer to make assumptions about the form of a priori probability densities to describe the parameters, a

Page 91: Stochastic Models, Estimation, And Control Volume 3


difficult task leading to rather arbitrary choices, often with erroneously biasing effects. Moreover, there is no contorted stochastic description of the parameters, such as the output of an integrator driven by white Gaussian noise of strength chosen in an iterative manner so as to yield reasonable results. Instead, the designer must determine an adequate value of N for the "essentially constant over N sample periods" model from the physics of the particular problem at hand: a more straightforward application of his physical knowledge. Factors involved in the choice of N are fully discussed later in Section 10.6.

Based upon this model for the parameters, one is naturally led to considering a fixed-length memory type of estimator for the parameters. Conceptually, one can "remember" only the most recent N data points and ask, what constant values for the parameters will fit these N samples best in a maximum likelihood sense? By proper choice of likelihood function, such a parameter estimate can be combined with a state estimate that depends either on the most recent N samples of data or on all data taken since the initial time. The choice of the likelihood function for eventual implementation will depend upon tractability of the resulting equations, physical sensibility of the likelihood function, and overall performance of the estimator.

This formulation will provide an online estimator that would remain sensitive to any slow parameter variations, whereas a growing-length estimator employing a constant-parameter model would become less sensitive to parameter changes occurring later in the interval of interest than those occurring earlier. On the other hand, an estimator with N > 1 will not be overly sensitive to single points of bad data, as would be the case if the parameter estimate were based upon only the current single measurement.

Apparent drawbacks of the fixed-length memory formulation of parameter estimation would be the necessity to "remember" N sets of measurement data at all times and the foreseeable use of "batch processing" of the data collected over an N-period interval each time a parameter estimate is made. With regard to objectionable memory requirements, as the system passes from time $t_{i-1}$ to $t_i$, the information provided by the (i - N)th set of data would be removed, and the data from the ith measurement could be written into these locations of an inexpensive form of memory. Batch processing will not be required online: because the parameters are slowly varying, they can be estimated less frequently than the state variables (for instance, only every N sample periods, at least after an initial transient period of the first N sample times).
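The memory bookkeeping described here amounts to a circular buffer over the most recent N measurement sets; a minimal sketch (class and names are illustrative, not from the text):

```python
from collections import deque

class FixedMemoryWindow:
    """Retain only the most recent N measurement sets; the (i-N)th set
    is discarded automatically as each new sample arrives."""

    def __init__(self, N):
        self.buf = deque(maxlen=N)

    def add(self, z):
        self.buf.append(z)   # overwrites the oldest slot once full

    def window(self):
        # the N samples a constant-parameter ML fit would use at time t_i
        return list(self.buf)

w = FixedMemoryWindow(N=3)
for z in [1.0, 2.0, 3.0, 4.0]:
    w.add(z)
print(w.window())   # -> [2.0, 3.0, 4.0]
```

After the fourth sample arrives, the first has been overwritten, exactly the fixed-length memory behavior described above.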

10.3 UNCERTAINTIES IN $\Phi$ AND $B_d$: LIKELIHOOD EQUATIONS

Assume that an adequate system representation has been generated in the form of Eqs. (10-1)-(10-5), but that p uncertain parameters have been identified in the state transition matrix ($\Phi$) and deterministic input matrix ($B_d$) time

Page 92: Stochastic Models, Estimation, And Control Volume 3


histories. Let these parameters be arrayed as the components of a p-dimensional vector, denoted as a. Further assume that the following matrices (required for parameter estimation) can be evaluated (or approximated) for all times of interest: $\partial\Phi(t_{i+1}, t_i)/\partial a_k$ and $\partial B_d(t_i)/\partial a_k$ for $k = 1, 2, \ldots, p$.

First one must ask if simultaneous state and parameter estimation is possible to accomplish. One cannot really attempt to estimate the "true" values of states and parameters in a system: there are no n-dimensional processes in nature, only processes that can be modeled adequately with an n-state representation. Thus, one attempts to find the values of states and parameters that, when substituted into the assumed model structure, yield a model output behavior that best duplicates the actual system performance in some respect. The ability to perform the estimation will therefore be determined by conditions upon the mathematical model employed as an adequate representation. If the estimation is impossible with the originally formulated problem, one may need to incorporate different measurements, additional measurements, or a modified system model in order to satisfy these conditions.

Basically, of the system modes (quasi-static perhaps) that a particular parameter affects, at least one must be (1) observable, (2) excited by the initial conditions or controllable with respect to the points of entry of the dynamic noise $w_d(\cdot,\cdot)$ or those of the deterministic input (assumed not identically zero), and (3) such that $\Phi$ and $B_d$ do not assume identical values over a range of parameter values or for a number of discrete parameter values (as in an aliasing phenomenon). Refined theoretical "identifiability" criteria have been established in the literature [10, 11, 27, 36, 82, 102, 124, 127-129, 132-134], but they are generally difficult to verify. The generalized ambiguity function concept of Section 10.5 will also indicate whether the estimation of particular parameters is feasible (in a practical manner), and it will further predict the accuracy with which each can be estimated. Thus, adequacy of performance would be evaluated practically through an ambiguity function analysis and simulation results.

Now let us develop the likelihood equations to be solved to generate the state and parameter estimates. For any specified likelihood function $L[\theta(t_i), \mathscr{Z}_i]$, where $\theta(t_i)$ is the vector of variables to be estimated and $\mathscr{Z}_i$ is the set of realized values of the measurements to be used as data, the objective of maximum likelihood estimation is to find that value $\theta^*(t_i)$ that maximizes $L[\theta(t_i), \mathscr{Z}_i]$ as a function of $\theta(t_i)$. When $L[\theta(t_i), \mathscr{Z}_i]$ is differentiable with respect to $\theta(t_i)$ and the maximum actually lies within the admissible range of parameter values, this can be obtained by solving the likelihood equation:

$\left.\dfrac{\partial L[\theta(t_i), \mathscr{Z}_i]}{\partial\theta(t_i)}\right|_{\theta(t_i) = \theta^*(t_i)} = 0^{\rm T}$ (10-6)
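For a toy instance of (10-6), not from the text: with independent scalar samples $z_j \sim N(\theta, 1)$, the log-likelihood gradient is $\partial L/\partial\theta = \sum_j (z_j - \theta)$, and the likelihood equation has the sample mean as its unique root, which can be verified numerically:

```python
import numpy as np

# Simulated data from the assumed density, true theta = 1.5
rng = np.random.default_rng(2)
z = rng.normal(1.5, 1.0, size=1000)

def dL_dtheta(theta):
    # Gradient of ln f(Z | theta) for unit-variance Gaussian samples
    return np.sum(z - theta)

theta_star = z.mean()   # closed-form root of the likelihood equation
print(dL_dtheta(theta_star), theta_star)
```

The gradient vanishes (to floating-point precision) at the sample mean, and the estimate clusters near the true value, illustrating the consistency property cited above.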

Table 10.1 presents the likelihood functions for which results have been obtained [82]. Those in the left numbered column yield growing-length memory algorithms, whereas the right numbered column entries yield fixed-length

Page 93: Stochastic Models, Estimation, And Control Volume 3

TABLE 10.1

Candidate Likelihood Functions

Left column (growing-length memory):

(1) $\ln f_{x(t_i),\,a\,|\,Z(t_i)}(\xi, \alpha \,|\, \mathscr{Z}_i)$
(2) $\ln f_{Z(t_i)\,|\,x(t_i),\,a}(\mathscr{Z}_i \,|\, \xi, \alpha)$
(3) $\ln f_{x(t_i)\,|\,Z(t_i),\,a}(\xi \,|\, \mathscr{Z}_i, \alpha)$
(4) $\ln f_{x(t_i),\,Z(t_i)\,|\,a}(\xi, \mathscr{Z}_i \,|\, \alpha)$

Right column (fixed-length memory):

(1) $\ln f_{x(t_i),\,a\,|\,Z_N(t_i)}(\xi, \alpha \,|\, \mathscr{Z}_{i,\,i-N+1})$
(2a) $\ln f_{Z_N(t_i)\,|\,x(t_i),\,a}(\mathscr{Z}_{i,\,i-N+1} \,|\, \xi, \alpha)$
(2b) $\ln f_{Z_N(t_i)\,|\,Z(t_{i-N}),\,x(t_i),\,a}(\mathscr{Z}_{i,\,i-N+1} \,|\, \mathscr{Z}_{i-N}, \xi, \alpha)$
(3) $\ln f_{x(t_i)\,|\,Z_N(t_i),\,a}(\xi \,|\, \mathscr{Z}_{i,\,i-N+1}, \alpha)$
(4a) $\ln f_{x(t_i),\,Z_N(t_i)\,|\,a}(\xi, \mathscr{Z}_{i,\,i-N+1} \,|\, \alpha)$
(4b) $\ln f_{x(t_i),\,Z_N(t_i)\,|\,Z(t_{i-N}),\,a}(\xi, \mathscr{Z}_{i,\,i-N+1} \,|\, \mathscr{Z}_{i-N}, \alpha)$

where

$x(t_i)$ = state vector at time $t_i$
a = parameter vector
$Z(t_i)$ = measurement history $z(t_1), z(t_2), \ldots, z(t_i)$
$Z_N(t_i)$ = most recent N measurements $z(t_{i-N+1}), z(t_{i-N+2}), \ldots, z(t_i)$
$Z(t_{i-N})$ = measurement history $z(t_1), z(t_2), \ldots, z(t_{i-N})$

memory parameter estimators with either growing-length or fixed-length state estimators. Consequently, the functions on the left also serve to generate starting procedures for the corresponding functions on the right before N samples have been accumulated.

Of these prospective likelihood functions, the fourth entry is probably the best choice for most applications, in that it exploits all a priori information typically available and can yield an effective and computationally feasible estimator. To see this, let us compare the various entries.

The first entry in Table 10.1 would be the most "logical" choice from a Bayesian viewpoint, but it requires specification of a priori parameter statistics (unlike the others, since a appears to the right of the conditioning sign in the other entries). Such statistics are usually difficult, if not impossible, to assess, and a is assumed to be of specific form (Gaussian, uniform, etc.) solely to admit solutions to the problem formulated. In the case of a uniform density between two boundary values, the same results can be achieved more readily by using the fourth entry in the table and limiting the estimates to the specified range.

The second entry in Table 10.1 is the "classical" likelihood function for generating maximum likelihood estimates. However, Bayes' rule (applied to entries in the left column of Table 10.1 for sake of argument) yields

f_{Z(t_i) | x(t_i), a} · f_{x(t_i) | a} = f_{x(t_i), Z(t_i) | a}    (10-7)

In other words, the information contained in the second entry is also contained in the fourth entry, but the latter's dependence upon the propagation of the a priori state statistics, given by f_{x(t_i)|a}(ξ | α), has been removed. In most practical problems, initial state statistics are both available and useful for estimation performance improvement.


Bayes' rule can also be used to depict the difference between the third and fourth entries of Table 10.1:

f_{x(t_i) | Z(t_i), a} · f_{Z(t_i) | a} = f_{x(t_i), Z(t_i) | a}    (10-8)

Because f_{Z(t_i)|a}(Z_i | α) is not an explicit function of the state values, the state estimators found by using either the third or fourth entry in (10-6) will be of the same algorithmic form. However, the third entry does not include the very terms that are highly sensitive to parameter values and thereby yield good parameter estimates.

Thus, the fourth entry is the preferable form. In the right column of the table, entry (4a) will yield a fixed-length memory estimator of both states and parameters, whereas (4b) will combine a growing-length memory state estimator with a fixed-length memory parameter estimator. This latter form will be the most useful, from the viewpoints of both performance and computational burden. Consequently, attention will be concentrated on this form and the corresponding entry in the left column of Table 10.1 for the associated startup procedure.

First, consider ln f_{x(t_i), Z(t_i) | a}(ξ, Z_i | α). Bayes' rule can be applied repeatedly to obtain

f_{x(t_i), Z(t_i) | a} = f_{x(t_i) | Z(t_i), a} · f_{Z(t_i) | a}
                      = f_{x(t_i) | Z(t_i), a} · f_{z(t_i) | Z(t_{i-1}), a} · f_{Z(t_{i-1}) | a}
                      = f_{x(t_i) | Z(t_i), a} · ∏_{j=1}^{i} f_{z(t_j) | Z(t_{j-1}), a}    (10-9)

where, since the first measurement occurs at time t_1, the term in the product for j = 1 is f_{z(t_1)|a}(ζ_1 | α). Each of the separate densities in (10-9) can be written out explicitly as a Gaussian density:

f_{x(t_i) | Z(t_i), a}(ξ | Z_i, α) = (2π)^{-n/2} |P(t_i^+)|^{-1/2} exp{·}
{·} = {-½ [ξ - x̂(t_i^+)]ᵀ P(t_i^+)⁻¹ [ξ - x̂(t_i^+)]}    (10-10)

where x̂(t_i^+) and P(t_i^+) are implicitly dependent upon the particular parameter values upon which the density is conditioned, and

f_{z(t_j) | Z(t_{j-1}), a}(ζ_j | Z_{j-1}, α) = (2π)^{-m/2} |A(t_j)|^{-1/2} exp{·}
{·} = {-½ [ζ_j - H(t_j)x̂(t_j^-)]ᵀ A(t_j)⁻¹ [ζ_j - H(t_j)x̂(t_j^-)]}    (10-11)

where

A(t_j) = H(t_j) P(t_j^-) Hᵀ(t_j) + R(t_j)    (10-12)

where again x̂(t_j^-), P(t_j^-), and A(t_j) are implicitly functions of the given parameter values. By substituting these expressions into (10-9), the likelihood function can be written as

ln f_{x(t_i), Z(t_i) | a}(ξ, Z_i | α)
  = -[(n + im)/2] ln(2π) - ½ ln|P(t_i^+)| - ½ Σ_{j=1}^{i} ln|A(t_j)|
    - ½ [ξ - x̂(t_i^+)]ᵀ P(t_i^+)⁻¹ [ξ - x̂(t_i^+)]
    - ½ Σ_{j=1}^{i} [ζ_j - H(t_j)x̂(t_j^-)]ᵀ A(t_j)⁻¹ [ζ_j - H(t_j)x̂(t_j^-)]    (10-13)
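The factorization (10-9) underlying (10-13) can be checked numerically: for a linear-Gaussian model, the sum of the innovation log-densities must equal the log-density of the whole measurement history evaluated jointly. The following sketch does this for a hypothetical scalar system (all names and numerical values are illustrative, not from the text):

```python
import numpy as np

# Hypothetical scalar system (illustrative values, not from the text):
#   x(t_{j+1}) = a x(t_j) + w_d,  w_d ~ N(0, Q);   z(t_j) = x(t_j) + v,  v ~ N(0, R)
a, Q, R, P0 = 0.9, 0.5, 0.2, 1.0
rng = np.random.default_rng(0)
n = 6

# simulate one measurement history z(t_1), ..., z(t_n), with x(t_1) ~ N(0, P0)
x = rng.normal(0.0, np.sqrt(P0))
zs = []
for _ in range(n):
    zs.append(x + rng.normal(0.0, np.sqrt(R)))
    x = a * x + rng.normal(0.0, np.sqrt(Q))
z = np.array(zs)

# (i) sum of innovation log-densities ln f(z_j | Z_{j-1}, a), as in (10-11)/(10-12)
xm, Pm = 0.0, P0                          # predicted mean/variance for x(t_1)
ll_innov = 0.0
for zj in z:
    A = Pm + R                            # (10-12) with H = 1
    r = zj - xm
    ll_innov += -0.5 * (np.log(2 * np.pi * A) + r * r / A)
    K = Pm / A                            # Kalman update ...
    xp, Pp = xm + K * r, (1 - K) * Pm
    xm, Pm = a * xp, a * Pp * a + Q       # ... and time propagation

# (ii) log-density of the joint Gaussian vector Z(t_n) computed directly
var = np.empty(n)                         # marginal state variances
var[0] = P0
for j in range(1, n):
    var[j] = a * var[j - 1] * a + Q
Px = np.empty((n, n))                     # Cov{x(t_i), x(t_j)} = a^{|i-j|} var[min(i,j)]
for i in range(n):
    for j in range(n):
        Px[i, j] = a ** abs(i - j) * var[min(i, j)]
S = Px + R * np.eye(n)                    # covariance of the measurement history
_, logdet = np.linalg.slogdet(S)
ll_joint = -0.5 * (n * np.log(2 * np.pi) + logdet + z @ np.linalg.solve(S, z))

print(np.isclose(ll_innov, ll_joint))     # the two evaluations agree
```

The same identity holds for vector H and Φ; it is exactly the product form (10-9) evaluated at the realized measurements.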

The desired likelihood equations are found through Eq. (10-6), which can be written as the simultaneous solution to

∂/∂ξ [ln f_{x(t_i), Z(t_i) | a}(ξ, Z_i | α)] |_{ξ=x*(t_i), α=a*(t_i)} = 0ᵀ    (10-14a)

∂/∂α [ln f_{x(t_i), Z(t_i) | a}(ξ, Z_i | α)] |_{ξ=x*(t_i), α=a*(t_i)} = 0ᵀ    (10-14b)

Using (10-13), the first of these becomes

-[ξ - x̂(t_i^+)]ᵀ P(t_i^+)⁻¹ |_{ξ=x*(t_i), α=a*(t_i)} = 0ᵀ

for which the solution is

x*(t_i) = x̂(t_i^+)|_{α=a*(t_i)}    (10-15)

In other words, the maximum likelihood estimate of the system state at time t_i is provided by the Kalman filter algorithm of Chapter 5, with the estimate a*(t_i) replacing the nominal parameter values for the required propagations.

Now consider the partial derivative of the likelihood function (10-13) with respect to the p-dimensional vector of parameters, as in (10-14b). Typical forms to appear in this evaluation are the partials of a log of a determinant and of a quadratic form involving an inverse matrix; such partials can be expressed

as [82]:

∂ln|X|/∂a_k = (∂ln|X|/∂|X|)(∂|X|/∂a_k) = (1/|X|) ∂|X|/∂a_k = tr{X⁻¹ ∂X/∂a_k}    (10-16a)

∂X⁻¹/∂a_k = -X⁻¹ (∂X/∂a_k) X⁻¹    (10-16b)

where X is a square matrix and a_k is the kth component of a. Using these relations, the desired partial derivatives of (10-13) become

-2 ∂/∂a_k {ln f_{x(t_i), Z(t_i) | a}(ξ, Z_i | α)}
  = tr{P(t_i^+)⁻¹ ∂P(t_i^+)/∂a_k}
    - 2 [∂x̂(t_i^+)/∂a_k]ᵀ P(t_i^+)⁻¹ [ξ - x̂(t_i^+)]
    - [ξ - x̂(t_i^+)]ᵀ P(t_i^+)⁻¹ [∂P(t_i^+)/∂a_k] P(t_i^+)⁻¹ [ξ - x̂(t_i^+)]
    + Σ_{j=1}^{i} tr{A(t_j)⁻¹ ∂A(t_j)/∂a_k}
    - 2 Σ_{j=1}^{i} [∂x̂(t_j^-)/∂a_k]ᵀ Hᵀ(t_j) A(t_j)⁻¹ [ζ_j - H(t_j)x̂(t_j^-)]
    - Σ_{j=1}^{i} [ζ_j - H(t_j)x̂(t_j^-)]ᵀ A(t_j)⁻¹ [∂A(t_j)/∂a_k] A(t_j)⁻¹ [ζ_j - H(t_j)x̂(t_j^-)]    (10-17)

From Eq. (10-15), ξ is simultaneously being set equal to x̂(t_i^+)|_{α=a*(t_i)}, so [ξ - x̂(t_i^+)] = 0 in (10-17). Since fᵀg = tr{fgᵀ} = tr{gfᵀ} for general vectors f and g, the likelihood equation for each parameter can be written as (for k = 1, 2, ..., p):

tr{P(t_i^+)⁻¹ ∂P(t_i^+)/∂a_k}
  - 2 Σ_{j=1}^{i} [∂x̂(t_j^-)/∂a_k]ᵀ Hᵀ(t_j) A(t_j)⁻¹ [z_j - H(t_j)x̂(t_j^-)]
  + Σ_{j=1}^{i} tr{[A(t_j)⁻¹ - A(t_j)⁻¹ [z_j - H(t_j)x̂(t_j^-)][z_j - H(t_j)x̂(t_j^-)]ᵀ A(t_j)⁻¹] ∂A(t_j)/∂a_k} |_{α=a*(t_i)} = 0    (10-18)

Unfortunately, there is no general closed-form solution to this equation, and so an iterative procedure will have to be employed to determine a*(t_i). The solution to the p equations of the form of (10-18) and the result of (10-15) yield the simultaneous maximum likelihood estimate of states and parameters.

To provide a fixed-length memory parameter estimator, consider the likelihood function ln f_{x(t_i), Z_N(t_i) | Z(t_{i-N}), a}(ξ, Z_{i,i-N+1} | Z_{i-N}, α), where Z_N(t_i) is an Nm-dimensional random vector composed of the N most recent measurements, z(t_{i-N+1}), z(t_{i-N+2}), ..., z(t_i), a particular realization of which would be denoted as Z_{i,i-N+1}, and the corresponding dummy vector as ζ_{i,i-N+1}. By Bayes' rule,

f_{x(t_i), Z_N(t_i) | Z(t_{i-N}), a} = f_{x(t_i) | Z_N(t_i), Z(t_{i-N}), a} · f_{Z_N(t_i), Z(t_{i-N}) | a} / f_{Z(t_{i-N}) | a}
  = f_{x(t_i) | Z(t_i), a} · f_{Z(t_i) | a} / f_{Z(t_{i-N}) | a}
  = f_{x(t_i) | Z(t_i), a} · ∏_{j=i-N+1}^{i} f_{z(t_j) | Z(t_{j-1}), a}    (10-19)

This is identical in form to (10-9) except that the lower limit on the product terms is (i - N + 1) instead of 1. By an analogous derivation, it can be concluded that the desired estimator is given by (10-15) and (10-18) with the lower limits on the summations changed to (i - N + 1):

x*(t_i) = x̂(t_i^+)|_{α=a*(t_i)}    (10-20)

with the p components a_k*(t_i) satisfying the likelihood equations

tr{P(t_i^+)⁻¹ ∂P(t_i^+)/∂a_k}
  - 2 Σ_{j=i-N+1}^{i} [∂x̂(t_j^-)/∂a_k]ᵀ Hᵀ(t_j) A(t_j)⁻¹ r_j
  + Σ_{j=i-N+1}^{i} tr{[A(t_j)⁻¹ - A(t_j)⁻¹ r_j r_jᵀ A(t_j)⁻¹] ∂A(t_j)/∂a_k} |_{α=a*(t_i)} = 0    (10-21)

for k = 1,2, ... , p, where

A(t_j) = H(t_j) P(t_j^-) Hᵀ(t_j) + R(t_j)    (10-22a)

r_j = z_j - H(t_j) x̂(t_j^-)    (10-22b)

10.4 UNCERTAINTIES IN Φ AND B_d: FULL-SCALE ESTIMATOR

The previous section yielded likelihood equations, (10-15) and (10-18) or (10-20) and (10-21), the solution to which will provide the desired estimates of system states and parameters. Since these equations do not have a general closed-form solution, it is necessary to consider iterative techniques.

The general form of the likelihood equations is given by Eq. (10-6), where θ(t_i) is composed of an n-dimensional partition of states and a p-dimensional partition of parameters. If this equation were to be solved iteratively by the Newton-Raphson method [57, 64], the new estimate of θ(t_i), denoted as θ*(t_i), could be expressed in terms of a trial solution or previous best estimate, denoted as θ̄*(t_i), as (see Problem 10.10):

θ*(t_i) = θ̄*(t_i) - [∂²L[θ̄*(t_i), Z_i]/∂θ²]⁻¹ {∂L[θ̄*(t_i), Z_i]/∂θ}ᵀ    (10-23)

where the notation {∂L[θ̄*(t_i), Z_i]/∂θ}ᵀ is meant to depict the vector quantity {∂L[θ, Z_i]/∂θ}ᵀ evaluated at the point θ = θ̄*(t_i): the gradient or score vector. To use this algorithm, the second derivative matrix, or Hessian, ∂²L[θ̄*(t_i), Z_i]/∂θ² must be of full rank. Local iterations of (10-23) would be possible conceptually, but would not be practical for online applications. Unfortunately, even if only one iteration of (10-23) were to be processed at each time step, the computation and inversion of the required Hessian would put a prohibitive load on an online computer [39, 43].

Rao [107] has suggested an approximation called "scoring" which simplifies the computations substantially while maintaining accuracy over large samples. The approximation made is that

∂²L[θ̄*(t_i), Z_i]/∂θ² ≅ -J[t_i, θ̄*(t_i)]    (10-24)

where the matrix

J[t_i, θ̄*(t_i)] = E{ (∂L[θ, Z(t_i)]/∂θ)ᵀ (∂L[θ, Z(t_i)]/∂θ) | θ = θ̄*(t_i) }    (10-25)

is termed the conditional information matrix by Rao [107]. It can be shown that, in fact [82],

J[t_i, θ̄*(t_i)] = -E{ ∂²L[θ, Z(t_i)]/∂θ² | θ = θ̄*(t_i) }    (10-26)

so that the approximation being made is that the second derivative ∂²L[θ̄*(t_i), Z_i]/∂θ², for a particular realization of Z(t_i, ω_k) = Z_i, can be adequately represented by its ensemble average over all possible measurement time histories. The desirability of such an approximation is that it requires generation of only first order information instead of first and second order, as required in using (10-23) directly. Thus, the first order correction to θ̄*(t_i) becomes

θ*(t_i) = θ̄*(t_i) + J[t_i, θ̄*(t_i)]⁻¹ {∂L[θ̄*(t_i), Z_i]/∂θ}ᵀ    (10-27)


Various other iterative techniques have been investigated, but "scoring" has been found to be a superior algorithm. The error committed by the approximation (10-24) is of order 1/N for large N, and for large N its convergence rate approaches that of the Newton-Raphson method. In general, it converges less rapidly than a Newton-Raphson algorithm near the solution, but it converges from a larger region than Newton-Raphson and has very substantial computational advantages over this method. On the other hand, it converges more rapidly than a conventional gradient algorithm, requiring somewhat more computation to do so. Conjugate gradient methods might also be employed, but the literature reports certain undesirable numerical characteristics of this algorithm for the parameter estimation application.
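The distinction between the Newton-Raphson Hessian (10-23) and the scoring information matrix (10-25)-(10-27) can be seen on a deliberately simple problem: ML estimation of the variance of i.i.d. zero-mean Gaussian samples. This toy likelihood is a hypothetical stand-in for L, not the filter likelihood of this section:

```python
import math, random

# Toy illustration of "scoring" (10-27): ML estimation of the variance v of
# i.i.d. zero-mean Gaussian samples (illustrative stand-in, not from the text).
random.seed(3)
v_true = 4.0
zs = [random.gauss(0.0, math.sqrt(v_true)) for _ in range(2000)]
n = len(zs)
s2 = sum(z * z for z in zs)               # sufficient statistic

# L(v) = -(n/2) ln(2*pi*v) - s2/(2v)
def score(v):                             # dL/dv, the "score"
    return -0.5 * n / v + 0.5 * s2 / v ** 2

def hessian(v):                           # d2L/dv2: depends on the data realization
    return 0.5 * n / v ** 2 - s2 / v ** 3

def fisher(v):                            # J(v) = -E{d2L/dv2} = n/(2 v^2):
    return 0.5 * n / v ** 2               # an ensemble average, data-independent

v = 1.0                                   # poor initial trial value
for _ in range(5):
    v = v + score(v) / fisher(v)          # scoring update, cf. (10-27)

# For this toy likelihood the scoring step lands exactly on the ML estimate
# s2/n in a single iteration; Newton-Raphson (using hessian) would not.
print(abs(v - s2 / n) < 1e-9)             # True
```

Only the first-derivative information is generated; the realization-dependent `hessian` is never inverted, which is exactly the computational appeal of scoring.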

There are some disadvantages to the "scoring" method, but these can be countered in practical usage. First, initial evaluations of J[t_i, θ̄*(t_i)] yield a matrix with small magnitude entries (and not a good approximation to the negative Hessian), so that its inverse has very large entries. Thus, a precomputed J⁻¹ (the value to which J⁻¹ might converge in simulation studies for an average value of θ, for instance) could be used in the recursion (10-27) for the first few sample periods, until J[t_i, θ̄*(t_i)] has grown to a reasonable "magnitude." Another disadvantage might be the need to calculate the value and inverse of J[t_i, θ̄*(t_i)] repeatedly. However, since J[t_i, θ̄*(t_i)] typically will not change substantially after an initial transient, one could retain a fixed computed value for J⁻¹ after a certain stage, periodically verifying whether the corresponding J is still "close" to a currently computed J[t_i, θ̄*(t_i)]. The effect of these ad hoc approaches will be depicted through examples later.

Based on the previous discussion, the recursions to yield a full-scale solution to the state and parameter estimation problem will now be developed. This is not intended to provide online capability, but rather to establish the best performance achievable from maximum likelihood methods. Section 10.6 will develop means of furnishing online applicability based on insights gained from the full-scale estimator, and the performance and computational loading of the resulting estimators can then be compared to the benchmark of the full-scale estimator.

To implement the scoring iteration for equations of the form (10-21), it is necessary to generate the score {∂L[x*(t_i), ā*(t_i), Z_i]/∂a}ᵀ and the conditional information matrix J[t_i, x*(t_i), ā*(t_i)]. The score is a p-dimensional vector whose components are of the form of Eq. (10-21) times -½, but in which the terms are evaluated with the parameter estimate ā*(t_i), rather than the actual but unknown maximum likelihood estimate a*(t_i). To evaluate the required terms, it is convenient to decompose the N-step score into the sum of the N most recent single-measurement scores, s¹[Z_j, ā*(t_i)], and a final term denoted as γ[Z_i, ā*(t_i)]:

∂L/∂a_k [x*(t_i), ā*(t_i), Z_i] = γ_k[Z_i, ā*(t_i)] + Σ_{j=i-N+1}^{i} s_k¹[Z_j, ā*(t_i)]    (10-28)

where

s_k¹[Z_j, ā*(t_i)] = { [∂x̂(t_j^-)/∂a_k]ᵀ Hᵀ(t_j) A(t_j)⁻¹ r_j
    - ½ tr([A(t_j)⁻¹ - A(t_j)⁻¹ r_j r_jᵀ A(t_j)⁻¹] ∂A(t_j)/∂a_k) } |_{α=ā*(t_i)}    (10-29a)

γ_k[Z_i, ā*(t_i)] = -½ tr{P(t_i^+)⁻¹ ∂P(t_i^+)/∂a_k} |_{α=ā*(t_i)}    (10-29b)

The notation is meant to depict that, although s_k¹[Z_j, ā] explicitly contains only the residual at time t_j, this single-measurement score is in fact a function of the entire measurement history Z_j.

With regard to the conditional information matrix, it is possible to obtain a closed-form expression for the case of the parameter value assuming its true (but unknown) value a_t [82, 145]. It can be shown that the klth component of J[t_i, x*(t_i), a_t] can be decomposed into the sum of the N most recent single-sample terms and one additional term [82]:

J_kl[t_i, x*(t_i), a_t] = E{γ_k[Z(t_i), a] γ_l[Z(t_i), a] | a = a_t}
  + Σ_{j=i-N+1}^{i} E{s_k¹[Z(t_j), a] s_l¹[Z(t_j), a] | a = a_t}    (10-30)

where

E{s_k¹[Z(t_j), a] s_l¹[Z(t_j), a] | a = a_t}
  = ½ tr[A(t_j)⁻¹ (∂A(t_j)/∂a_k) A(t_j)⁻¹ (∂A(t_j)/∂a_l)
      + 2 A(t_j)⁻¹ H(t_j) E{(∂x̂(t_j^-)/∂a_k)(∂x̂(t_j^-)/∂a_l)ᵀ | a = a_t} Hᵀ(t_j)]    (10-31a)

E{γ_k[Z(t_i), a] γ_l[Z(t_i), a] | a = a_t}
  = ½ tr[P(t_i^+)⁻¹ (∂P(t_i^+)/∂a_k) P(t_i^+)⁻¹ (∂P(t_i^+)/∂a_l)
      + 2 P(t_i^+)⁻¹ E{(∂x̂(t_i^+)/∂a_k)(∂x̂(t_i^+)/∂a_l)ᵀ | a = a_t}]    (10-31b)

The equations above are exact: no approximations were used to derive them. The approximation to be made is that the same expressions can be used for parameter values other than a_t.

The parameter estimator seeks the value of the parameter vector, constant over the most recent N sample times, that best matches the system model to the measurements taken from the real system. Thus, if the estimate is being made at time t_i, then x̂(t_{i-N}^+) and P(t_{i-N}^+) are considered to be immutable; thus, the initial conditions for the N-step recursions are

x̂(t_{i-N}^+) = previously computed    (10-32a)

P(t_{i-N}^+) = previously computed    (10-32b)

∂x̂(t_{i-N}^+)/∂a_k = 0  (for all k)    (10-32c)

∂P(t_{i-N}^+)/∂a_k = 0  (for all k)    (10-32d)

E{ (∂x̂(t_{i-N}^+)/∂a_k)(∂x̂(t_{i-N}^+)/∂a_l)ᵀ | a = ā*(t_i) } = 0  (for all k and l)    (10-32e)

The computational process will now be described. Let the current time be t_i, and let the parameter estimate most recently computed (at the previous sample time t_{i-1}, assuming a new parameter estimate is generated at every time step) be denoted as ā*(t_i). Using ā*(t_i) and starting from the initial conditions in (10-32), perform an N-step recursion of

(1) x̂(t_{j-1}^+) → x̂(t_j^-) → x̂(t_j^+).
(2) P(t_{j-1}^+) → P(t_j^-) → P(t_j^+).
(3) Compute s¹[Z_j, ā*(t_i)] and add to a running sum.
(4) Compute E{s¹[Z(t_j), a] s¹ᵀ[Z(t_j), a] | a = ā*(t_i)} and add to a running sum.

At the end of the N-step recursion, γ[Z_i, ā*(t_i)] is computed and added to the score running sum to form {∂L[x*(t_i), ā*(t_i), Z_i]/∂a}ᵀ, and similarly E{γ[Z(t_i), a] γᵀ[Z(t_i), a] | a = ā*(t_i)} is computed and added to the conditional information matrix running sum to generate J[t_i, x*(t_i), ā*(t_i)]. Finally, a new parameter estimate is computed by means of

a*(t_i) = ā*(t_i) + J[t_i, x*(t_i), ā*(t_i)]⁻¹ {∂L[x*(t_i), ā*(t_i), Z_i]/∂a}ᵀ    (10-33)

Thus are generated a state estimate x̂(t_i^+), evaluated using ā*(t_i), and a new parameter estimate a*(t_i). Local iterations are possible to improve convergence, but they would require substantial computation. (These are often considered for offline identification.)

The recursions will now be described in detail. To propagate from just after measurement incorporation at time t_{j-1} to just before measurement incorporation at time t_j, for j = (i - N + 1), (i - N + 2), ..., i, the relations (implicitly based on ā*(t_i)) are in three categories. First, the state related equations are:

x̂(t_j^-) = Φ(t_j, t_{j-1}) x̂(t_{j-1}^+) + B_d(t_{j-1}) u(t_{j-1})    (10-34)

P(t_j^-) = Φ(t_j, t_{j-1}) P(t_{j-1}^+) Φᵀ(t_j, t_{j-1}) + G_d(t_{j-1}) Q_d(t_{j-1}) G_dᵀ(t_{j-1})    (10-35)

A(t_j) = H(t_j) P(t_j^-) Hᵀ(t_j) + R(t_j)    (10-36)

K(t_j) = P(t_j^-) Hᵀ(t_j) A⁻¹(t_j)    (10-37)

These are the usual Kalman filter equations, but the Φ and B_d matrices are evaluated using the most recent parameter estimate, ā*(t_i).

The score equations necessary for time propagation are given by the p sets of "sensitivity system" equations

∂x̂(t_j^-)/∂a_k = Φ(t_j, t_{j-1}) ∂x̂(t_{j-1}^+)/∂a_k + [∂Φ(t_j, t_{j-1})/∂a_k] x̂(t_{j-1}^+) + [∂B_d(t_{j-1})/∂a_k] u(t_{j-1})    (10-38)

∂P(t_j^-)/∂a_k = Φ(t_j, t_{j-1}) [∂P(t_{j-1}^+)/∂a_k] Φᵀ(t_j, t_{j-1})
  + [∂Φ(t_j, t_{j-1})/∂a_k] P(t_{j-1}^+) Φᵀ(t_j, t_{j-1}) + Φ(t_j, t_{j-1}) P(t_{j-1}^+) [∂Φ(t_j, t_{j-1})/∂a_k]ᵀ    (10-39)

∂A(t_j)/∂a_k = H(t_j) [∂P(t_j^-)/∂a_k] Hᵀ(t_j)    (10-40)

for k = 1, 2, ..., p. Note that ∂Φ(t_j, t_{j-1})/∂a_k and ∂B_d(t_{j-1})/∂a_k must be known for all k. Note also that ∂u(t_{j-1})/∂a_k = 0 because, when at time t_i, control inputs through time t_{i-1} have already been applied and cannot be changed, even for feedback control.

Conditional information matrix computations for propagating forward in time are (removing time arguments equal to t_{j-1} and shortening "a = ā*(t_i)" to "ā*" in the conditioning):

E{ (∂x̂(t_j^-)/∂a_k)(∂x̂(t_j^-)/∂a_l)ᵀ | a = ā*(t_i) }
  = Φ E{(∂x̂⁺/∂a_k)(∂x̂⁺/∂a_l)ᵀ | ā*} Φᵀ + (∂Φ/∂a_k) E{x̂⁺x̂⁺ᵀ | ā*} (∂Φ/∂a_l)ᵀ
  + Φ E{(∂x̂⁺/∂a_k) x̂⁺ᵀ | ā*} (∂Φ/∂a_l)ᵀ + (∂Φ/∂a_k) E{x̂⁺ (∂x̂⁺/∂a_l)ᵀ | ā*} Φᵀ
  + (∂B_d/∂a_k) E{u uᵀ | ā*} (∂B_d/∂a_l)ᵀ
  + (∂Φ/∂a_k) E{x̂⁺ uᵀ | ā*} (∂B_d/∂a_l)ᵀ + (∂B_d/∂a_k) E{u x̂⁺ᵀ | ā*} (∂Φ/∂a_l)ᵀ
  + Φ E{(∂x̂⁺/∂a_k) uᵀ | ā*} (∂B_d/∂a_l)ᵀ + (∂B_d/∂a_k) E{u (∂x̂⁺/∂a_l)ᵀ | ā*} Φᵀ    (10-41)

E{x̂(t_j^-) x̂ᵀ(t_j^-) | a = ā*(t_i)} = Φ E{x̂⁺x̂⁺ᵀ | ā*} Φᵀ + B_d E{uuᵀ | ā*} B_dᵀ
  + Φ E{x̂⁺uᵀ | ā*} B_dᵀ + B_d E{ux̂⁺ᵀ | ā*} Φᵀ    (10-42)

E{ (∂x̂(t_j^-)/∂a_k) x̂ᵀ(t_j^-) | a = ā*(t_i) }
  = Φ E{(∂x̂⁺/∂a_k) x̂⁺ᵀ | ā*} Φᵀ + Φ E{(∂x̂⁺/∂a_k) uᵀ | ā*} B_dᵀ
  + (∂Φ/∂a_k) E{x̂⁺x̂⁺ᵀ | ā*} Φᵀ + (∂B_d/∂a_k) E{uuᵀ | ā*} B_dᵀ
  + (∂Φ/∂a_k) E{x̂⁺uᵀ | ā*} B_dᵀ + (∂B_d/∂a_k) E{ux̂⁺ᵀ | ā*} Φᵀ    (10-43)

E{s_k¹[Z(t_j), a] s_l¹[Z(t_j), a] | a = ā*(t_i)}
  = ½ tr[A⁻¹(t_j) (∂A(t_j)/∂a_k) A⁻¹(t_j) (∂A(t_j)/∂a_l)
      + 2 A⁻¹(t_j) H(t_j) E{(∂x̂(t_j^-)/∂a_k)(∂x̂(t_j^-)/∂a_l)ᵀ | a = ā*(t_i)} Hᵀ(t_j)]    (10-44)

Equations (10-42) and (10-43) are needed to define certain terms in (10-41), and once (10-43) is computed, both it and its transpose will be used. The initial conditions at t_{i-N} on (10-41) and (10-43) are 0, and E{x̂(t_{i-N}^+) x̂ᵀ(t_{i-N}^+) | a = ā*(t_i)} is approximated by E{x̂(t_{i-N}^+) x̂ᵀ(t_{i-N}^+) | a = ā*(t_{i-1})} as generated at time t_{i-1}.

The expectations involving the control are evaluated in one of two ways. If the u(·) time history is completely precomputed (zero, for instance), then E{u(·)ᵀ | ā*} becomes u E{(·)ᵀ | ā*}, and thus the following recursions must be processed to evaluate the required terms:

E{x̂(t_j^+) | ā*} = Φ E{x̂⁺ | ā*} + B_d u    (10-45a)

E{∂x̂(t_j^+)/∂a_k | ā*} = [I - K(t_j)H(t_j)]
  × [Φ E{∂x̂⁺/∂a_k | ā*} + (∂Φ/∂a_k) E{x̂⁺ | ā*} + (∂B_d/∂a_k) u]    (10-45b)

On the other hand, if the control were computed in feedback form as u(t_j) = -G_c(t_j) x̂(t_j^+), then the expectations of the form E{u(·)ᵀ | ā*} become -G_c E{x̂⁺(·)ᵀ | ā*} and E{uuᵀ | ā*} becomes G_c E{x̂⁺x̂⁺ᵀ | ā*} G_cᵀ, for which the recursions have already been evaluated.


To incorporate the measurement at time t_j, the state relations are:

r_j = z_j - H(t_j) x̂(t_j^-)    (10-46)

D(t_j) = I - K(t_j) H(t_j)    (10-47)

x̂(t_j^+) = x̂(t_j^-) + K(t_j) r_j    (10-48)

P(t_j^+) = P(t_j^-) - K(t_j) H(t_j) P(t_j^-)    (10-49a)
        = D(t_j) P(t_j^-) Dᵀ(t_j) + K(t_j) R(t_j) Kᵀ(t_j)    (10-49b)
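Equality of the short form (10-49a) and the symmetric "Joseph" form (10-49b) relies on K(t_j) being the gain of (10-37); this is quickly confirmed numerically (the matrices below are randomly generated placeholders, not from the text):

```python
import numpy as np

# Numerical check that (10-49a) and (10-49b) agree when K is the Kalman gain.
rng = np.random.default_rng(7)
n, m = 4, 2
H = rng.normal(size=(m, n))
L = rng.normal(size=(n, n))
Pm = L @ L.T + np.eye(n)                # a valid (positive definite) P(t_j^-)
R = np.diag([0.5, 0.2])
A = H @ Pm @ H.T + R                    # (10-36)
K = Pm @ H.T @ np.linalg.inv(A)         # (10-37)
D = np.eye(n) - K @ H                   # (10-47)
P_short = Pm - K @ H @ Pm               # (10-49a)
P_joseph = D @ Pm @ D.T + K @ R @ K.T   # (10-49b)
print(np.allclose(P_short, P_joseph))   # True
```

For a suboptimal gain the two forms differ, and only the Joseph form remains a valid covariance; that robustness is why (10-49b) appears alongside (10-49a).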

After the first time through these relations, the values of x̂(t_{i-N+1}^+) and P(t_{i-N+1}^+) are stored in memory to serve as the initial conditions at the next sample time, t_{i+1}.

The score equations to be processed during a measurement update are

o_j = A⁻¹(t_j) r_j    (10-50)

C(t_j) = A⁻¹(t_j) - o_j o_jᵀ    (10-51)

and, for k = 1, 2, ..., p,

s_k¹[Z_j, ā*(t_i)] = [∂x̂(t_j^-)/∂a_k]ᵀ Hᵀ(t_j) o_j - ½ tr{C(t_j) ∂A(t_j)/∂a_k}    (10-52)

∂x̂(t_j^+)/∂a_k = D(t_j) [∂x̂(t_j^-)/∂a_k + (∂P(t_j^-)/∂a_k) Hᵀ(t_j) o_j]    (10-53)

∂P(t_j^+)/∂a_k = D(t_j) (∂P(t_j^-)/∂a_k) Dᵀ(t_j)    (10-54)

It should be noted that, despite its simple appearance, Eq. (10-54) does account for the variation of K(t_j) with respect to a_k. This can be verified by taking the partial of (10-49b) with respect to a_k, expressing ∂K(t_j)/∂a_k in terms of ∂P(t_j^-)/∂a_k, and collecting like terms. The p scalar results of the computations in (10-52) are added to the running sums for generating the score vector.

The conditional information matrix relations are

E{x̂(t_j^+) x̂ᵀ(t_j^+) | a = ā*(t_i)} = E{x̂(t_j^-) x̂ᵀ(t_j^-) | a = ā*(t_i)} + K(t_j) A(t_j) Kᵀ(t_j)    (10-55)

E{(∂x̂(t_j^+)/∂a_k)(∂x̂(t_j^+)/∂a_l)ᵀ | a = ā*(t_i)}
  = D(t_j) [E{(∂x̂(t_j^-)/∂a_k)(∂x̂(t_j^-)/∂a_l)ᵀ | a = ā*(t_i)}
      + (∂P(t_j^-)/∂a_k) Hᵀ(t_j) A⁻¹(t_j) H(t_j) (∂P(t_j^-)/∂a_l)] Dᵀ(t_j)    (10-56)

E{(∂x̂(t_j^+)/∂a_k) x̂ᵀ(t_j^+) | a = ā*(t_i)}
  = D(t_j) [E{(∂x̂(t_j^-)/∂a_k) x̂ᵀ(t_j^-) | a = ā*(t_i)} + (∂P(t_j^-)/∂a_k) Hᵀ(t_j) Kᵀ(t_j)]    (10-57)

Note that the propagation and update equations simplify substantially for a parameter a_k confined totally to B_d. Not only is ∂Φ/∂a_k = 0, but so are all second order statistics terms of the form ∂P⁻/∂a_k, ∂P⁺/∂a_k, and ∂A/∂a_k.

At the end of the N-step recursion, P(t_i^+), ∂x̂(t_i^+)/∂a_k, ∂P(t_i^+)/∂a_k, and E{[∂x̂(t_i^+)/∂a_k][∂x̂(t_i^+)/∂a_l]ᵀ | a = ā*(t_i)} will have been evaluated for all values of k and l. These are precisely the terms required to form the final components of summations to yield the score vector and conditional information matrix according to (10-29b) and (10-31b). These are computed and added to the appropriate running sums to generate {∂L[x*(t_i), ā*(t_i), Z_i]/∂a}ᵀ and J[t_i, x*(t_i), ā*(t_i)] via (10-28) and (10-30). Subsequently, Eq. (10-33) is processed to yield the new parameter estimate a*(t_i).

EXAMPLE 10.1 To appreciate the magnitude of the computations required by the full-scale estimator, consider a hypothetical problem involving the estimation of a five-dimensional state, two parameters influencing the state transition matrix Φ, and one parameter affecting B_d. The control vector is two dimensional, computed in feedback fashion using the current state estimate (the controller gains are assumed to be precomputed). In addition, the state is driven by a scalar dynamic noise, and two-dimensional measurements are made at each sample time. Finally, it has been found that the parameters can be modeled adequately as constants over an interval of ten sample times.

Table 10.2 portrays the number of multiplications, additions, subtractions, and matrix inversions to be processed in progressing a single sample period with the state and parameter estimation. The totals are separated into the evaluations necessary to propagate and update the state x̂, score s, and conditional information matrix J, and finally into the additional computations to achieve a new parameter estimate. (General results as functions of dimensions n, m, r, and s are in [82].) If local iterations are to be considered, the numbers in the table correspond to the number of computations in each local iteration.

TABLE 10.2

Full-Scale Solution of Hypothetical Problem

Term    Multiplications    Additions    Subtractions    Inversions

x̂             8800            6638           270        10 (2 × 2)
s             25742           20618            60         1 (5 × 5)
J            120437           89765             0         0
a*                9               9             0         1 (3 × 3)

It can be seen from this table that, even for the moderate state dimension of this problem, the computational load is great. In fact, the number of multiplications and additions grows in proportion to the cube of the state dimension n, and the burden on the computer thus becomes overwhelming as n increases. ■

The inordinately large number of calculations required for the conditional information matrix, as evidenced in the preceding example, motivates a search for a means of approximating its value in a simple fashion. Offline precomputation is a possibility, but discussion of this method is deferred until Section 10.6. Another technique would be to employ the approximation [82]

E{ (∂x̂(t_j^-)/∂a_k)(∂x̂(t_j^-)/∂a_l)ᵀ | a = ā*(t_i) } ≅ (∂x̂(t_j^-)/∂a_k)(∂x̂(t_j^-)/∂a_l)ᵀ    (10-58)

This states that the expectation of the matrix {[∂x̂(t_j^-)/∂a_k][∂x̂(t_j^-)/∂a_l]ᵀ} over all possible noise sequences can be adequately represented by the value it would attain due to the particular sequence that is assumed to have generated the measurement data. A heuristic justification for (10-58) is that the scoring approximation to the Newton-Raphson method removed the dependence of the Hessian (second derivative) matrix upon the particular sequence of data taken, and this dependence is now regained in a different, but related, manner. From a practical standpoint, this should reduce sensitivity of the estimates to incorrectly assumed values for the noise statistics.

By incorporating Eq. (10-58) into (10-44), the component terms of the conditional information matrix become

E{s_k¹[Z(t_j), a] s_l¹[Z(t_j), a] | a = ā*(t_i)}
  ≅ ½ tr[A⁻¹(t_j) (∂A(t_j)/∂a_k) A⁻¹(t_j) (∂A(t_j)/∂a_l)
      + 2 A⁻¹(t_j) H(t_j) (∂x̂(t_j^-)/∂a_k)(∂x̂(t_j^-)/∂a_l)ᵀ Hᵀ(t_j)]    (10-59)

E{γ_k[Z(t_i), a] γ_l[Z(t_i), a] | a = ā*(t_i)}
  ≅ ½ tr[P(t_i^+)⁻¹ (∂P(t_i^+)/∂a_k) P(t_i^+)⁻¹ (∂P(t_i^+)/∂a_l)
      + 2 P(t_i^+)⁻¹ (∂x̂(t_i^+)/∂a_k)(∂x̂(t_i^+)/∂a_l)ᵀ]    (10-60)

From these relations can be seen the tremendous benefit of using the proposed approximation: J can be evaluated using only the propagation and update relations necessary for the state and score computations. Equation (10-59) replaces (10-41)-(10-45) and (10-55)-(10-57), and Eq. (10-60) replaces (10-31b).

EXAMPLE 10.2 For the hypothetical problem posed in Example 10.1, the approximation just described reduces the required number of multiplications from 120,437 to 1598, and the number of additions from 89,765 to 1117. Of these, only 790 multiplications and 464 additions are required for the 10-term sum of components given by (10-59), the remainder being for the final term given by (10-60); this is caused by the difference between the state and measurement dimensions. ■

The approximate solution method just described yields vast reductions in required computation. Nevertheless, investigations [82] have shown the corresponding accuracy and rate of convergence to be exceptionally high.
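As a concrete sketch of the recursion, consider a hypothetical scalar system with Φ = a the single uncertain parameter and H = G_d = 1 (all names and values below are illustrative assumptions, and a growing-length memory N = i is used so that the initial conditions (10-32) apply at the start of the data). One pass evaluates the score via the sensitivity relations (10-38)-(10-40), (10-53), (10-54), forms an information value in the (10-59)/(10-60) style, and then the scoring update (10-33) is applied; a finite-difference check of the analytic score against the log-likelihood is included:

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical scalar truth model: x(j+1) = a x(j) + w, z(j) = x(j) + v
a_true, Q, R, P0 = 0.8, 1.0, 0.1, 10.0
zs, x = [], 0.0
for _ in range(300):
    x = a_true * x + rng.normal(0.0, np.sqrt(Q))
    zs.append(x + rng.normal(0.0, np.sqrt(R)))

def score_info_ll(a):
    """Filter pass with sensitivity recursions; returns score, approximate
    information, and the log-likelihood terms for a finite-difference check."""
    xp, Pp, dxp, dPp = 0.0, P0, 0.0, 0.0     # initial conditions, cf. (10-32)
    score = info = ll = 0.0
    for z in zs:
        xm = a * xp                          # (10-34)
        Pm = a * Pp * a + Q                  # (10-35)
        dxm = a * dxp + xp                   # (10-38): (dPhi/da) x-hat(+) term
        dPm = a * dPp * a + 2.0 * a * Pp     # (10-39)
        A = Pm + R                           # (10-36)
        dA = dPm                             # (10-40) with H = 1
        K = Pm / A                           # (10-37)
        D = 1.0 - K                          # (10-47)
        r = z - xm                           # (10-46)
        score += dxm * r / A - 0.5 * (1.0 / A - (r / A) ** 2) * dA   # (10-52)
        info += 0.5 * (dA ** 2 / A ** 2 + 2.0 * dxm ** 2 / A)        # cf. (10-59)
        ll += -0.5 * (np.log(A) + r * r / A)
        dxp = D * (dxm + dPm * r / A)        # (10-53)
        dPp = D * dPm * D                    # (10-54)
        xp = xm + K * r                      # (10-48)
        Pp = D * Pm                          # (10-49a)
    score += -0.5 * dPp / Pp                 # final term, cf. (10-29b)
    info += 0.5 * (dPp ** 2 / Pp ** 2 + 2.0 * dxp ** 2 / Pp)         # cf. (10-60)
    ll += -0.5 * np.log(Pp)
    return score, info, ll

# finite-difference check of the analytic score at a trial parameter value
h = 1e-6
s0, J0, _ = score_info_ll(0.5)
fd = (score_info_ll(0.5 + h)[2] - score_info_ll(0.5 - h)[2]) / (2 * h)
print(np.isclose(s0, fd, rtol=1e-3))         # True

# scoring iterations (10-33); the estimate moves from 0.3 toward a_true
a_est = 0.3
for _ in range(25):
    s, J, _ = score_info_ll(a_est)
    a_est = float(np.clip(a_est + s / J, -0.99, 0.99))
print(abs(a_est - a_true) < abs(0.3 - a_true))
```

Each scoring step here costs only one filter-plus-sensitivity pass, which is the computational point of (10-58)-(10-60); the vector-parameter case adds one sensitivity set per component a_k.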

EXAMPLE 10.3 Consider a system modeled by the discrete-time representation (with sample period of 0.1 sec)

[x₁(t_{j+1})]   [  0    1 ] [x₁(t_j)]   [0]
[x₂(t_{j+1})] = [-a₀  -a₁ ] [x₂(t_j)] + [1] w_d(t_j)

z(t_j) = [1  0] [x₁(t_j)  x₂(t_j)]ᵀ + v(t_j)

Let w_d(·,·) and v(·,·) be zero-mean white Gaussian noises with constant variances Q_d = 10/3 and R = 0.1, and let the a priori statistics on Gaussian x(t₀) be

P₀ = [100    0]
     [  0  100]

For the purpose of simulation, let the true system parameters be a₁ = -1 and a₀ = 0.8, resulting in a damped second order mode (the discrete-time system eigenvalues are given by λ = -½[a₁ ± √(a₁² - 4a₀)] ≅ 0.5 ± j0.74, each of magnitude less than unity; thus the mode is stable). Moreover, let the true initial state value be x(t₀) = [10 10]ᵀ.
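The stability claim is quickly verified (a minimal check, using the numbers above):

```python
import numpy as np

# Eigenvalues of the true state transition matrix of Example 10.3
# with a1 = -1, a0 = 0.8.
a1, a0 = -1.0, 0.8
Phi = np.array([[0.0, 1.0],
                [-a0, -a1]])
lam = np.linalg.eigvals(Phi)
print(lam)           # approximately 0.5 +/- 0.743j
print(np.abs(lam))   # both equal sqrt(0.8) ~ 0.894 < 1: the mode is stable
```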

FIG. 10.1 Typical x₁ state trajectory.


For the system described, a typical state trajectory for x₁ versus sample time i for 200 sample times (i.e., 20 sec) is portrayed in Fig. 10.1. The corresponding x₂ sample trajectory is identical but displaced one time-step ahead. Since the measurement is a rather accurate indication of x₁(t_i), the errors in a state estimate x̂₁(t_i) will be rather small regardless of the parameter estimate, whereas the errors in x̂₂(t_i), a prediction of the next value of x₁, will be more dependent upon the accuracy of the assumed system model. This is borne out in Fig. 10.2: plots (a) and (b) depict the errors in the estimates x̂₁(t_i) and x̂₂(t_i) from a single run of a Kalman filter with a correctly evaluated system model (a₁ = -1, a₀ = 0.8), whereas (c) and (d) plot the same errors for a Kalman filter that assumes

FIG. 10.2 (a) Error in x̂₁ for assumed parameter value of -1. (b) Error in x̂₂ for assumed parameter value of -1.


FIG. 10.2 (c) Error in x̂₁ for assumed parameter value of -0.5. (d) Error in x̂₂ for assumed parameter value of -0.5.

a₁ = -0.5. Thus, plots (a) and (b) represent the best state estimate performance attainable from an estimator of both states and parameters, a performance level approached as the parameter estimate converges to the true value.

The state and parameter estimator using the approximation of (10-58)-(10-60) was implemented to estimate x₁, x₂, and a₁, with the initial guess of a₁ set at -0.5. Figure 10.3 presents typical parameter estimate error trajectories, plot (a) pertaining to an estimator with the number (N) of samples in the fixed-length memory parameter estimator set equal to 10, and plot (b) corresponding to N = 30.

FIG. 10.3 (a) Parameter estimate error trajectory; full-scale estimator with N = 10. (b) Parameter estimate error trajectory; full-scale estimator with N = 30.

The larger N is seen to be very effective in reducing the magnitude of the error fluctuations, approximately halving the average magnitudes.

The results of a Monte Carlo evaluation, using five simulations differing from each other only in the particular realizations of w_d(·,·) and v(·,·) employed, are portrayed in Fig. 10.4 as a plot of mean ± 1 standard deviation values for all time. The initial transient behavior is undesirable in that the average parameter error grows rapidly for a few samples before converging to a small

FIG. 10.4 Mean ± 1σ values of full-scale parameter estimate errors (Monte Carlo evaluation of full-scale estimator with N = 30).

FIG. 10.5 Error in x̂₂ when parameter is estimated simultaneously (estimate provided by full-scale estimator with N = 30).

"steady state" value. The error grows because of the particular state initial conditions (not changed for the Monte Carlo runs), and its rate of change is large partially due to the large magnitude of J⁻¹ for the first few samples. Practical experience has shown that this undesirable behavior can be removed entirely by delaying the first parameter estimate until the average transient settles out, in this case until i = 5, or t_i = 0.5 sec.

For all cases discussed, the state estimate error trajectories do in fact converge to the behavior of the Kalman filter with correctly evaluated parameters. This is readily seen by comparing Fig. 10.5, a plot of the error in x̂₂(t_i) generated by the estimator that yielded the parameter estimate error in Fig. 10.3b, to Fig. 10.2b. ■

EXAMPLE 10.4 Consider the same problem as in the previous example, but now let both a₁ and a₀ be estimated simultaneously with the states. The filter initially assumed erroneous parameter values of a₁ = -0.5 and a₀ = 0.3, with true values maintained at -1 and 0.8, respectively.

Figure 10.6a presents the error in the corresponding estimate of a₁, and Fig. 10.6b pertains to a₀. The only difference between this simulation and that portrayed by Fig. 10.3b is the addition of

FIG. 10.6 (a) Error in estimate of the parameter a₁. (b) Error in estimate of the parameter a₀ (from full-scale estimator of states and parameters a₀ and a₁, N = 30).


the a₀ estimate and the delay of the first parameter estimate to reduce the magnitude of initial transients; identical random numbers were used to simulate the noise inputs for both cases. Therefore, Fig. 10.6a is directly comparable to Fig. 10.3b, and the ability to perform an accurate estimate is seen to be maintained in the face of additional parameter uncertainty.

Direct comparison of the two plots in Fig. 10.6 reveals that the increased error magnitude for a₁ corresponds directly to an error of opposite sign in estimating a₀. This is to be expected, since the parameter estimator is basically fitting the assumed system model to the acquired data. As a result, the errors in the state estimates for the two cases are virtually indistinguishable. •

10.5 UNCERTAINTIES IN Φ AND Bd:

PERFORMANCE ANALYSIS

To be able to use this estimation technique with confidence, the user must be able to predict the performance achievable from the algorithm. This section provides two distinct performance analyses to satisfy this need: characterization of asymptotic properties and ambiguity function portrayal of estimation precision.

Asymptotic properties refer to behavior exhibited by the estimator as the number of measurements processed grows without bound. Besides being of theoretical importance for growing-length memory estimators, these properties also delineate trends of estimator behavior as time progresses, for sufficiently large N associated with the fixed-length memory parameter estimator. Under a set of very nonrestrictive "regularity" conditions [82, 107, 145], the state and parameter estimator described in the previous section yields a parameter estimate a*(tᵢ) that is:

(1) consistent: it converges in probability to the true (but unknown) value aₜ as the number of sample elements processed grows without bound (theoretically, as N → ∞);

(2) asymptotically unbiased;

(3) asymptotically Gaussian distributed with mean aₜ and covariance J[tᵢ, aₜ]⁻¹;

(4) asymptotically efficient: in the limit as the number of sample elements grows without bound, the estimator a*(tᵢ) is unbiased, has finite error covariance, and there is no other unbiased estimator whose error covariance is smaller than that of a*(tᵢ).

Furthermore, since the parameter estimate does converge to the true value aₜ, the behavior of the corresponding state estimator converges to that of the Kalman filter that uses the true value aₜ for the parameters. The proofs of these claims [82], not included herein, are conditioned upon the regularity conditions mentioned previously and the assumption that the model of the parameters as constant over the interval of interest (i.e., over all time) is valid. Thus, to be assured of estimator behavior well described by this theoretical prediction, one should seek as large a value of N for the fixed-length memory parameter


estimator as allowed by the physics of the problem and by computer capacity.

Considerable information about performance to be expected from the estimator can be obtained through the concept of a generalized ambiguity function [82, 121], the average value of the likelihood function upon which the estimator is based. Let L[θ(tᵢ), Zᵢ] denote that likelihood function, and then the ambiguity function 𝒜ᵢ(·,·) is defined as the scalar function such that

𝒜ᵢ(θ, θₜ) ≜ ∫₋∞^∞ L[θ, Zᵢ] f(Zᵢ | θₜ) dZᵢ    (10-61)

where θ is some value of the estimated variables at time tᵢ and θₜ is the true, but unknown, value of these quantities. For a given value of θₜ, this function of θ provides both a global and local performance analysis. If it has multiple peaks, it indicates that individual likelihood functions realized by a particular set of measurement data may well have multiple peaks, which can cause convergence to local maxima rather than global, or cause failure of any convergence at all. In fact, the name "ambiguity function" is derived from the fact that multiple peaks tend to cause ambiguities. Furthermore, the curvature of the ambiguity function in the immediate vicinity of its peak value at θₜ (various chosen values of θₜ would be investigated in practice) conveys the preciseness with which a maximum likelihood estimate can be discerned. In fact, this curvature can be inversely related to the Cramér-Rao lower bound [121, 137] on the estimate error covariance matrix.

To be more specific, if â is an estimator of the scalar parameter a, whose true but unknown value is aₜ, its bias error is

b(aₜ) = E{â − aₜ}    (10-62)

The Cramér-Rao lower bound on maximum likelihood estimate error variance can then be expressed as

E{[â − aₜ]²} ≥ [1 + ∂b(aₜ)/∂aₜ]² [ −∂²𝒜ᵢ(a, aₜ)/∂a² |_{a=aₜ} ]⁻¹    (10-63)

For a vector parameter a, the unbiased version of (10-63) would be

E{[â − aₜ][â − aₜ]ᵀ} ≥ [ −∂²𝒜ᵢ(a, aₜ)/∂a² |_{a=aₜ} ]⁻¹    (10-64)

Although biases are generally present in maximum likelihood estimates, they are often ignored in performance analyses for expediency. The covariance lower bound so derived serves the purpose in design as the highest attainable goal in performance, the standard of comparison for all practical, suboptimal implementations.
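As a numerical illustration of the curvature relation underlying (10-63), the following sketch (an assumed example, not from the text) takes N i.i.d. Gaussian measurements of a constant scalar parameter, for which the ambiguity function is quadratic in a; it differentiates the ambiguity function numerically at its peak and compares the resulting unbiased bound with a Monte Carlo estimate variance. All names and the specific model are illustrative.

```python
import numpy as np

# Assumed scalar example: z_i ~ N(a_t, sigma^2), i = 1..N, so the ambiguity
# function (average log-likelihood over the measurement ensemble) is
#   A(a, a_t) = -(N/2) ln(2*pi*sigma^2) - N*(sigma^2 + (a - a_t)**2)/(2*sigma^2)
# Its curvature at a = a_t gives the unbiased bound of (10-63): sigma^2/N.

rng = np.random.default_rng(0)
a_true, sigma, N, runs = 1.0, 0.5, 30, 5000

def ambiguity(a, a_t):
    # expected log-likelihood for this assumed model
    return (-0.5 * N * np.log(2 * np.pi * sigma**2)
            - N * (sigma**2 + (a - a_t)**2) / (2 * sigma**2))

# curvature at the peak a = a_true via a central difference
h = 1e-4
curv = (ambiguity(a_true + h, a_true) - 2 * ambiguity(a_true, a_true)
        + ambiguity(a_true - h, a_true)) / h**2
cr_bound = -1.0 / curv                      # equals sigma^2 / N here

# the ML estimate (the sample mean) is unbiased and attains the bound
estimates = rng.normal(a_true, sigma, size=(runs, N)).mean(axis=1)
print(cr_bound, estimates.var())            # both near sigma^2/N
```

Here the estimator is efficient, so the Monte Carlo variance sits essentially on the bound; for the fixed-length memory estimator of this section the bound is approached only asymptotically.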


An analogous development can be made for an N-step generalized ambiguity function defined as

𝒜ᵢᴺ(a, aₜ) ≜ ∫₋∞^∞ ⋯ ∫₋∞^∞ L[a, Z_{i,i−N+1}] f(Z_{i,i−N+1} | aₜ) dZ_{i,i−N+1}    (10-65)

This would be instrumental in evaluating the fixed-length memory parameter estimator of basic interest to us. It can be used as a design tool, able to delineate sensitivity of estimation accuracy to

(1) various sizes of the fixed-length memory (N),
(2) form of the assumed system model (particular state variables chosen, state dimension, etc.),
(3) types of measurements taken (the structure of the H(tᵢ) matrices) and their precision (the R(tᵢ) time history),
(4) the magnitude and uncertainty in the initial conditions,
(5) the dynamic driving noise intensities,
(6) control inputs,
(7) use of alternate likelihood functions,
(8) removal of certain terms from the likelihood equations to reduce computational burden.

Thus, the ambiguity function can be an invaluable tool to ensure adequate performance of a state and parameter estimator.

It can be shown [82] that the ambiguity function value 𝒜ᵢᴺ(a, aₜ) for any a and aₜ can be evaluated entirely from the output of a state estimator sensitivity analysis as described in Section 6.8 of Volume 1. Let the "truth model" be identical to the model upon which a Kalman filter is based, except that the parameters are evaluated as aₜ in the former, and as a in the latter. Let the estimation error e(·,·) be defined as the difference between the entire state estimate x̂(·,·) and the truth model state xₜ(·,·). Furthermore, let the covariances of this estimation error before and after measurement incorporation be denoted in accordance with Eq. (6-104) as Pₑ(tⱼ⁻; aₜ, a) and Pₑ(tⱼ⁺; aₜ, a), respectively, where the notation is meant to reflect the dependence of these values on the particular choices of aₜ and a. Then 𝒜ᵢᴺ(a, aₜ) can be evaluated as

𝒜ᵢᴺ(a, aₜ) = Σ_{j=i−N+1}^{i} [ −(m/2) ln(2π) − (1/2) ln|A(tⱼ; a)|
        − (1/2) tr{A⁻¹(tⱼ; a)[H(tⱼ)Pₑ(tⱼ⁻; aₜ, a)Hᵀ(tⱼ) + R(tⱼ)]} ]
    + [ −(n/2) ln(2π) − (1/2) ln|P(tᵢ⁺; a)| − (1/2) tr{P⁻¹(tᵢ⁺; a)Pₑ(tᵢ⁺; aₜ, a)} ]    (10-66)

The separation of terms in the above expression is intended to display the effect of the individual terms in Eq. (10-19) on the N-step ambiguity function.


The last trace term warrants special attention: this corresponds to the likelihood equation terms that contain the vector difference between the state estimate and x̂(tᵢ⁺), all of which vanish, so the effect of this trace term on the ambiguity function (and Cramér-Rao lower bound) should be ignored.

This result is especially convenient. One would normally perform an error sensitivity analysis of a standard Kalman filter for a range of a values for each chosen aₜ, to determine whether simultaneous parameter estimation were warranted and, if so, which parameters might be the most critical to estimate. By so doing, one would also be providing much of the data required to generate an ambiguity function performance analysis of the parameter estimation itself.
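Since Eq. (10-66) is a direct sum over sensitivity-analysis outputs, its evaluation transcribes almost line for line into code. The sketch below assumes hypothetical argument names for the covariances a sensitivity analysis would produce; it is an illustration of the formula, not an implementation from the text.

```python
import numpy as np

def n_step_ambiguity(A_list, Pe_minus_list, H_list, R_list, P_plus, Pe_plus, m, n):
    """Evaluate the N-step ambiguity function of Eq. (10-66).

    Hypothetical argument names:
      A_list[j]        -- filter residual covariance A(t_j; a)
      Pe_minus_list[j] -- true error covariance Pe(t_j-; a_t, a)
      H_list, R_list   -- measurement matrices and noise covariances
      P_plus, Pe_plus  -- filter and true error covariance after the last update
      m, n             -- measurement and state dimensions
    """
    total = 0.0
    for A, Pe, H, R in zip(A_list, Pe_minus_list, H_list, R_list):
        Ainv = np.linalg.inv(A)
        total += (-0.5 * m * np.log(2 * np.pi)
                  - 0.5 * np.log(np.linalg.det(A))
                  - 0.5 * np.trace(Ainv @ (H @ Pe @ H.T + R)))
    # final terms, corresponding to the bracketed expression in (10-66)
    total += (-0.5 * n * np.log(2 * np.pi)
              - 0.5 * np.log(np.linalg.det(P_plus))
              - 0.5 * np.trace(np.linalg.inv(P_plus) @ Pe_plus))
    return total
```

A useful sanity check: when a = aₜ, the truth-model covariances coincide with the filter's, each trace reduces to m (and the final one to n), and the value is the expected log-likelihood at the true parameters.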

EXAMPLE 10.5 Consider the application introduced in Example 10.3. Figure 10.7 presents the ambiguity function for i = 50 and N = 30 plotted as a function of the parameter (−a₁), with sign reversal to correspond to the entry in the transition matrix. The true value, (−a₁ₜ), is 1.0. As seen from the figure, the ambiguity function is unimodal, with sufficient curvature at the peak value to predict successful parameter estimation.

By calculating the ambiguity function in similar fashion for other values of N, Eq. (10-63) can be used to establish the Cramér-Rao lower bound on an unbiased parameter estimate error variance as a function of N, as presented in Fig. 10.8. Since this is a lower bound on achievable performance, it specifies a minimum N required to yield a desired accuracy in parameter estimation for particular values of R, Q_d, and P₀. Moreover, it indicates that, for small N, enlarging the fixed-length memory can substantially improve precision, but for N above approximately 30, this length must be enlarged appreciably to effect a moderate reduction in error variance. This would be invaluable knowledge to aid determination of the N to be used online.

[Figure 10.7 appears here: ambiguity function value plotted against the value of parameter −a₁ from 0 to 2.0, peaking at 1.0.]

FIG. 10.7 Ambiguity function for i = 50 and N = 30.


[Figure 10.8 appears here: Cramér-Rao lower bound obtained from ambiguity function analysis; parameter estimate error variance (0 to 0.06) plotted against N from 0 to 50.]

FIG. 10.8 Lower bound on parameter estimate error variance vs N at i = 50.

Return to Fig. 10.3: As predicted by the ambiguity function analysis, the larger interval (N = 30 versus N = 10) is very effective in reducing the magnitude of the error fluctuations. From Fig. 10.8, the lower bounds on the 1σ values are about 0.16 for N = 10 and 0.09 for N = 30, almost a halving of the standard deviation. This is corroborated by the plots in Fig. 10.3. Moreover, the 1σ value in Fig. 10.4 has converged to the close vicinity of the lower bound provided by the ambiguity function analysis. •

[Figure 10.9 appears here: parameter estimate error vs. time; estimate provided by full-scale estimator with N = 30.]

FIG. 10.9 Parameter estimate error for Example 10.6.


[Figure 10.10 appears here: Cramér-Rao lower bound obtained from ambiguity function analysis; parameter estimate error variance plotted against N from 0 to 50, for several values of R.]

FIG. 10.10 Lower bound on parameter estimate error variance vs N at i = 50.

EXAMPLE 10.6 Figure 10.9 is a typical plot of parameter estimate error for the case of N = 30, R changed from 0.1 to 0.05, and the initial parameter estimate a₁*(t₀) changed from −0.5 to 0. This reveals that an ability to acquire a good parameter estimate is not strongly dependent upon the initial error. Over a large range of initial values of a₁*(t₀), the estimator was able to converge to the immediate neighborhood of the true value within five sample periods.

Comparing this plot to Fig. 10.3b, the effect of reducing the measurement noise variance R is not very apparent in the post-transient, or "tracking," phase. This and other runs confirm the trends discernible from Fig. 10.10, a plot of the Cramér-Rao bound as a function of N, at i = 50, for various values of R. Only for R greater than 0.1 does variation of measurement precision have a significant effect on parameter estimate precision. •

10.6 UNCERTAINTIES IN Φ AND Bd:

ATTAINING ONLINE APPLICABILITY

The full-scale solution algorithm of Section 10.4 requires a substantial amount of computation after each new measurement is incorporated. Consequently, this section seeks to attain online applicability while maintaining adequate estimation accuracy. One means of providing such applicability is through approximations designed to extract the essence of the full-scale solution: simplified iterative solution procedures, inclusion of only the most significant terms in the estimator likelihood equations, and precomputation and curve-fitting of various needed quantities [82]. Methods not inherently involving approximations will also be explored: use of advantageous state space representations, exploitation of symmetry, and modification of measurement incorporation [82].

Before detailing these aspects, it is appropriate to mention some general considerations that will prevail because of the time constraint inherent in


online applications. First of all, parameter estimation should be restricted to the least number of parameters that will provide acceptable state estimation precision and overall system performance. Furthermore, the slowly varying nature of parameters can be exploited, such as by estimating the parameters less frequently than the states. An algorithm that calculates a state estimate immediately upon receiving a measurement, using a previous parameter estimate, and then performs parameter estimation "in the background," is superior in the online context to one that requires an iteration of the parameter calculations before it makes its (mathematically superior) state estimate available. Moreover, if a parameter estimate were processed only every N sample periods, only the running sums for s and J need be stored, and the memory requirements are vastly reduced.

Choice of the fixed-length memory size N affects online feasibility as well. Factors that cause one to seek small N are: (1) the limit imposed by the validity of the model that the system parameters are "essentially constant" over N sample periods, (2) the limit imposed by the capacity of the computer to accomplish the required calculations in an allotted portion of a sample period, (3) memory requirements, and (4) the necessity of a small interval to maintain adequate parameter tracking, since retention of large amounts of "old" data can seriously affect initial convergence and also cause significant lag in response to true parameter variations. On the other hand, reasons for making N large include: (1) large N yields less estimator susceptibility to errors due to single "bad" measured data points and reduces high frequency oscillation of the estimates, (2) larger N values allow greater confidence that the actual estimator will exhibit behavior well described by the theoretical asymptotic properties, (3) the Cramér-Rao lower bound is inversely related to N and in fact establishes a minimum admissible N for possibly providing adequate performance, and (4) certain approximations such as (10-24) and (10-58) become more valid with increasing N. A tradeoff must be analyzed, and the value of N that is most compatible with these diverse factors chosen as the actual size for implementation.

Some simplified iterative solution procedures have already been described, such as using (10-58)-(10-60) to evaluate J[tᵢ, x*(tᵢ), a*(tᵢ)]. To avoid numerical difficulties of inverting a matrix with very small eigenvalues and large condition number, another approximation was also suggested: using a precomputed J⁻¹ for a number of initial sample periods or deferring the first parameter estimate until a prespecified number of samples has been taken, to ensure adequate size of the computed J entries.

Even these approximations do not yield online applicability, however. Every sample period, we are still required to regenerate the state estimates, residuals, and score and J component terms of the parameter vector estimate over an N-step interval, precluding real-time computation in most practical problems. Since the parameters can be assumed to be essentially constant over N periods,


then once a good parameter estimate is made, it should not vary significantly in N steps. Therefore, for some applications, it is adequate to estimate the parameters only every N sample periods (at least after an initial acquisition transient), with no reevaluation of terms over that interval. Let a parameter estimate a*(tⱼ) be made, and use it for the next N sample periods to propagate and update the state, score, and conditional information matrix relations (using (10-59) and (10-60) to simplify J computations). At each sample time tᵢ in this interval, running sum registers with contents denoted as s(tᵢ) and J(tᵢ) (zeroed at time tⱼ) are updated according to

s(tᵢ) = s(tᵢ₋₁) + s¹[Zᵢ, a*(tⱼ)]    (10-67a)

J(tᵢ) = J(tᵢ₋₁) + J¹[Zᵢ, a*(tⱼ)]    (10-67b)

with s¹ and J¹ computed as in (10-52) and (10-59), respectively. At the end of the N steps, (10-29b) and (10-60) are computed and the results added to the running sums to generate s[Z_{j+N}, a*(tⱼ)] and J[Z_{j+N}, a*(tⱼ)], and finally the new parameter estimate is generated by means of

a*(t_{j+N}) = a*(tⱼ) + J[Z_{j+N}, a*(tⱼ)]⁻¹ s[Z_{j+N}, a*(tⱼ)]    (10-68)

Advantages of this simplified iterative solution procedure are low computation and storage load and more rapid availability of the state estimate at each sample time (desirable, and critical if the estimator is to be used in conjunction with a feedback controller). However, a parameter estimate is produced only every N samples, yielding slower initial convergence to a good estimate and an inherent lag in estimating a parameter change. These effects can be alleviated by changing the size of the interval from N to N/l sample periods, with l a small integer by which N is divisible. This effectively reduces the lag in responding to measurement information that the parameter values are different than previously estimated. Then, to incorporate the expectation that the parameters will be essentially constant over N steps, the l most recent parameter estimates can be averaged.
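The running-sum accumulation (10-67), the periodic parameter step (10-68), and the averaging of the l most recent estimates can be sketched as follows. This is an assumed illustration: the function and the callables s1_fn and J1_fn are hypothetical names standing in for the book's one-step score and conditional-information computations of (10-52) and (10-59).

```python
import numpy as np
from collections import deque

def run_parameter_estimator(measurements, a0, s1_fn, J1_fn, N, l=1):
    """Sketch of the every-N-samples estimator: accumulate running sums
    (10-67), step the parameter estimate (10-68) every N/l samples, and
    average the l most recent raw estimates (seeded here with a0)."""
    a_star = a0
    recent = deque(maxlen=l)          # l most recent raw estimates
    recent.append(a0)
    s = np.zeros_like(a0)
    J = np.zeros((a0.size, a0.size))
    for i, z in enumerate(measurements, start=1):
        s += s1_fn(z, a_star)         # running sum, Eq. (10-67a)
        J += J1_fn(z, a_star)         # running sum, Eq. (10-67b)
        if i % (N // l) == 0:         # parameter step every N/l samples
            raw = a_star + np.linalg.solve(J, s)    # Eq. (10-68)
            recent.append(raw)        # store the unaveraged estimate
            a_star = sum(recent) / len(recent)      # averaged value is used
            s[:] = 0.0                # zero the running-sum registers
            J[:] = 0.0
    return a_star
```

For a toy scalar problem in which the score is the residual itself (s¹ = z − a, J¹ = 1), the first step after N samples recovers the sample mean, mirroring the "background" character of the parameter computation: the state path would be produced immediately, with this update done between measurements.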

EXAMPLE 10.7 Recall the hypothetical problem discussed in Examples 10.1 and 10.2. Table 10.3 reveals the reduction in computations afforded by this online conceptualization. These numbers pertain to a sample period in which a parameter estimate is made, the case in which the most

TABLE 10.3

Online Conceptualization for Hypothetical Problem

Term    Multiplications    Additions    Subtractions    Inversions

x       889                671          27              1 (2 × 2)
s       2801               2249         6               1 (5 × 5)
J       887                694          0               0
a*      9                  9            0               1 (3 × 3)


TABLE 10.4

Online Conceptualization: No Parameter Update

Term    Multiplications    Additions    Subtractions    Inversions

x       889                671          27              1 (2 × 2)
s       2549               2041         6               0
J       79                 47           0               0
a*      0                  0            0               0

calculations must be performed within a sample period. Note that the approximation for J discussed in Example 10.2 is employed.

For the case of not updating the parameter estimate, the required calculations are as in Table 10.4. The differences between this and the previous table are due mostly to the removal of the final components of the score and conditional information matrix given by Eqs. (10-29b) and (10-60), requiring a substantial amount of the J computations because n is much greater than m. •

EXAMPLE 10.8 Recall Example 10.3 and the full-scale estimator results portrayed in Figs. 10.3 and 10.4. To provide a direct basis of comparison, identical system simulations (with the same noise samples) were used to analyze the performance of the online estimator just described.

Figure 10.11a depicts the parameter estimate error committed by the online algorithm that estimates the parameter every ten samples (with N = 10). For the first ten sample periods, a parameter estimate is made every period (without any recalculation of s¹ or J¹ values) in order to improve the initial transient characteristics. Tests conducted without the initially increased estimate frequency required about 3 parameter estimates, or 30 sample periods, for the parameter value to be acquired: on the average, five times as long as required for the decay of the transient in Fig. 10.11a. Since the additional computations are minimal, this procedure is an attractive enhancement to performance.

Full-scale estimator and online estimator capabilities are very comparable, as seen by comparing Figs. 10.11a and 10.3a. This is true despite a reduction from 12.95 to 1.10 sec of IBM 360/75 computer time to perform the estimation over 200 sample periods (not including simulation and program setup times). The associated Kalman state estimator required 0.55 sec.

Since an appropriate value of N has been determined as equal to 30, the online estimator could be generated with N = 30 (after an initial 10 samples of more rapid estimation, to avoid a 3-iteration transient equal to 90 sample periods in this case). Or, the estimator with N = 10 can be used, averaging each new parameter estimate with the two previous ones; the original unaveraged estimate is stored for future averaging, while the averaged value is used as the actual current parameter estimate. Figure 10.11b presents the results of this procedure. Here, averaging was started at i = 40 to yield a graph scale identical to that of Fig. 10.11a, facilitating a comparison of the post-transient error magnitudes; averaging two estimates at i = 20 and three at i = 30 would be recommended for actual use. Averaging is not used in the first ten sample periods, since this would degrade initial acquisition capabilities. The accuracy of the estimates following i = 40 in Fig. 10.11b strongly resembles that attained by the full-scale estimator with N = 30, as depicted in Fig. 10.3b. For this case, the reduction in computer time is even more pronounced, from 38.98 to 1.10 sec of IBM 360/75 time.

This method also decreases the lag time in responding to parameter variations from that obtained by simply setting N to 30. If a good estimate is achieved at sample instant i and the "true" parameter value begins to vary slowly, the latter technique will not respond until sample instant (i + 30), whereas the former would start to respond at (i + 10). Moreover, the former would make three parameter estimate iterations by instant (i + 30), with more valid s¹ and J¹ values for instants


[Figure 10.11 appears here: two plots of parameter estimate error vs. time from the online estimator that processes a parameter estimate every N sample periods, N = 10; panel (b) with averaging over a 30-sample interval.]

FIG. 10.11 (a) Parameter estimate error; online estimator with N = 10. (b) Parameter estimate error; online estimator with N = 10 plus averaging.

(i + 11) to (i + 30), and thus would generally converge to a better estimate at instant (i + 30) than the estimator with N = 30.

Figure 10.12 presents the mean ±1σ time histories obtained from a Monte Carlo analysis of the online estimator with N = 10 and averaging over the three most recent estimates, identical to the simulations used to generate Fig. 10.4. Comparison of these plots verifies the fact that the online technique attains performance almost identical to that of the more complex algorithm. •


[Figure 10.12 appears here: results of Monte Carlo evaluation of the online estimator that processes a parameter estimate every N sample periods; N = 10; averaging over 30-sample interval.]

FIG. 10.12 Mean ±1σ values of online parameter estimate errors.

If it is desired to evaluate a parameter estimate more often, a somewhat different online approximate technique can be used. Whether or not a new parameter estimate is to be made at time tᵢ, the one-step terms s¹[Zᵢ, a*(tⱼ)] and J¹[Zᵢ, a*(tⱼ)] are evaluated as real time progresses from tᵢ₋₁ to tᵢ, where tⱼ is the most recent time that a parameter estimate was made. Also, the new running sums for s and J are computed by adding these newest s¹ and J¹ values and subtracting off the oldest, corresponding to time t_{i−N}:

s(tᵢ) = s(tᵢ₋₁) + s¹[Zᵢ, a*(tⱼ)] − s¹[Z_{i−N}, a*(tₖ)]    (10-69a)

J(tᵢ) = J(tᵢ₋₁) + J¹[Zᵢ, a*(tⱼ)] − J¹[Z_{i−N}, a*(tₖ)]    (10-69b)

where a*(tₖ) was the available estimate of a when s and J were computed at time t_{i−N}. Note that this technique thus requires additional computer memory to store the N most recent s¹ and J¹ evaluations. Whenever a new parameter estimate is desired, the results of calculating (10-29b) and (10-60) are added to the appropriate running sums to produce s[Zᵢ, a*(tⱼ)] and J[Zᵢ, a*(tⱼ)], and a*(tᵢ) is computed in a manner analogous to Eq. (10-68). As in the previous form, an interval of less than N periods can be used to improve transient and tracking characteristics. Subsequently, the estimates over an N-period interval can be averaged to incorporate the constant-over-N-step model into the parameter estimate and remove high frequency fluctuations.
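The sliding-window bookkeeping of Eqs. (10-69a,b), including the extra memory for the N most recent one-step terms, can be sketched with a pair of bounded queues. This is an illustrative data structure, not an implementation from the text; the class and attribute names are assumed.

```python
import numpy as np
from collections import deque

class SlidingSums:
    """Sketch of the running sums of Eqs. (10-69a,b): add the newest
    one-step score/information terms and subtract the oldest, which were
    computed N samples ago under the then-current estimate a*(t_k)."""

    def __init__(self, p, N):
        self.s = np.zeros(p)              # running score sum s(t_i)
        self.J = np.zeros((p, p))         # running information sum J(t_i)
        self.s1_hist = deque(maxlen=N)    # N most recent s1 evaluations
        self.J1_hist = deque(maxlen=N)    # N most recent J1 evaluations

    def update(self, s1_new, J1_new):
        if len(self.s1_hist) == self.s1_hist.maxlen:
            self.s -= self.s1_hist[0]     # drop s1[Z_{i-N}, a*(t_k)]
            self.J -= self.J1_hist[0]     # drop J1[Z_{i-N}, a*(t_k)]
        self.s += s1_new                  # add s1[Z_i, a*(t_j)]
        self.J += J1_new                  # add J1[Z_i, a*(t_j)]
        self.s1_hist.append(s1_new)       # deque evicts the oldest entry
        self.J1_hist.append(J1_new)
```

After each update, s and J equal the sums over exactly the last N one-step terms, so a new parameter step analogous to (10-68) can be taken at any sample time without regenerating the whole interval.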

EXAMPLE 10.9 For the hypothetical problem discussed in Examples 10.1, 10.2, and 10.7, the only changes in Tables 10.3 and 10.4 are the incorporation of three more subtractions for s and six more subtractions for J, as seen by comparing Eqs. (10-69) and (10-67). •


EXAMPLE 10.10 Recall the problem in Examples 10.3 and 10.8. Although an appropriate N was found to be 30, the lack of term regeneration requires a smaller interval for adequate performance, and a 5-step size was chosen. Averaging over a 30-sample interval was then incorporated.

Figure 10.13a plots the parameter estimate error from such an estimator that generates a new parameter value every sample period, produced by the same noise sequences as generated Figs. 10.3 and 10.11. Averaging was started at the initial time, causing the initial transient to settle out somewhat more slowly; this aspect can be removed by starting the averaging later. As seen from the

[Figure 10.13 appears here: (a) parameter estimate error vs. time and (b) Monte Carlo mean ±1σ results, for the online estimator that processes a parameter estimate every sample period; N = 5; averaging over 30-sample interval.]

FIG. 10.13 (a) Parameter estimate error; online estimator with N = 5 plus averaging. (b) Mean ±1σ values of online parameter estimate errors.


[Figure 10.14 appears here: three plots of parameter estimate error vs. time with time-varying parameter a₁: (a) full-scale estimator, N = 30; (b) online estimator that processes a parameter estimate every N sample periods, N = 10, averaging over 30-sample interval; (c) online estimator that processes a parameter estimate every sample period, N = 5, averaging over 30-sample interval.]

FIG. 10.14 Parameter estimate error with variable true parameter. (a) Full-scale estimator, (b) and (c) online estimators.


figure, the tracking behavior of this online method is comparable to that of the full-scale estimator. The Monte Carlo results shown in Fig. 10.13b (based on the same noise simulation runs as for Fig. 10.4) further substantiate this fact.

Very similar performance is achievable, with less memory load, by including only every second or third estimate in the averaging, or by making a parameter estimate only every second or third sample period. The undesirable initial deviation can be removed by delaying the time of the first parameter estimate by 5 sample periods. •

EXAMPLE 10.11 The ability of the estimators to track a slowly time-varying parameter value was examined by letting the "true" parameter in the previous problem double its magnitude (linearly) in the 200 sample period interval. Figures 10.14a, b, and c portray the parameter estimate errors committed by the full-scale and two online estimators, respectively. Figure 10.14b displays the slope of the "true" parameter growth, since the simulation computed the error every period while this estimate was made only every ten samples after i = 10. Thus, although the parameter actually changed by 0.075 every 30-step interval, i.e., 15% of its original value, all three estimators were able to track its value very adequately with a model of the parameter as constant over 30 periods. •

To enhance online applicability, not only can the iterative solution procedure be approximated, but the likelihood equations themselves can be approximated as well. Analyses based on ambiguity functions and simulations can indicate the relative sensitivities of individual terms in the likelihood equations to parameter values, and the less sensitive terms can be neglected. Experience has shown that, of the possible implementations of this form, the most successful is the inclusion of only weighted least squares type of terms. Not only does this remove a considerable amount of computation, but it generally provides estimates of the same quality as attained with the more complex algorithm. In fact, certain terms removed in this manner can be shown to contribute a bias to the parameter estimate, further motivating their removal, provided that the remaining terms can effect an adequate estimate.

Basically, if Gaussian models are involved, the maximum likelihood method involves maximizing

−(1/2) ln|Y| − (1/2) ξᵀY⁻¹ξ    (10-70)

as a function of θ, where both ξ and Y are functions of θ, yielding the p simultaneous equations (k = 1, 2, ..., p):

[∂ξᵀ/∂θₖ] Y⁻¹ξ + (1/2) tr{Y⁻¹ ∂Y/∂θₖ} − (1/2) ξᵀY⁻¹ [∂Y/∂θₖ] Y⁻¹ξ |_{θ=θ̂} = 0    (10-71)

If the dependence of Y on θ is neglected, the likelihood equations reduce to

[∂ξᵀ/∂θₖ] Y⁻¹ξ |_{θ=θ̂} = 0    (10-72)

which are also the weighted least squares relations for maximizing (−½ξᵀY⁻¹ξ), or minimizing (½ξᵀY⁻¹ξ), by appropriate choice of θ. In Eq. (10-71), the second term does not depend on the actual sequence of measurement values, and thus



at best it contributes nothing to a valid means of estimating parameters, and at worst it contributes a bias or destabilization. The third term is not as sensitive to parameter variations as the first, and if it can be neglected, then there is no need to generate expressions for ∂Y/∂θₖ in the estimator algorithm.

The implementation equations for this approximation are vastly simplified. To propagate between sample times, the state equations are as given by (10-34)-(10-37). However, the partials of the covariance matrices with respect to each parameter aₖ are now neglected; only (10-38) remains of the score relations, and the conditional information matrix relations reduce to the single computation (see (10-59)):

E{sₖ¹[Z(tᵢ), a] sₗ¹[Z(tᵢ), a] | a = a*(tᵢ)} ≈ [∂x̂ᵀ(tᵢ⁻)/∂aₖ] Hᵀ(tᵢ) A⁻¹(tᵢ) H(tᵢ) [∂x̂(tᵢ⁻)/∂aₗ]    (10-73)

For measurement updates, the state relations remain unchanged from (10-46) to (10-49), but the score relations reduce to (10-50) and

sₖ¹[Zᵢ, a*(tⱼ)] = [∂x̂ᵀ(tᵢ⁻)/∂aₖ] Hᵀ(tᵢ) A⁻¹(tᵢ) rᵢ    (10-74)

∂x̂(tᵢ⁺)/∂aₖ = D(tᵢ) [∂x̂(tᵢ⁻)/∂aₖ]    (10-75)

and no J computations are necessary. Moreover, the final terms added to the score running sums corresponding to (10-29b) go to zero, and the associated J term given by (10-60) need not be added either.

EXAMPLE 10.12 Table 10.5 depicts the required computations for the hypothetical problem described earlier, using the online estimator that provides a parameter estimate more frequently than every N samples. These results are directly comparable to those of Example 10.9, and the reductions are seen to be substantial. If no new parameter estimate is provided, only the last row of the table is affected. •

TABLE 10.5

Online Method Using Weighted Least Squares Terms Only

Term   Multiplications   Additions   Subtractions   Inversions
x      889               671         27             1 (2 × 2)
s      275               189          3             0
J       36                24          6             0
a*       9                 9          0             1 (3 × 3)

EXAMPLE 10.13 Figure 10.15 portrays the performance of an online estimator identical to that described in Example 10.10, except that only the weighted least squares type terms are retained. Plot (a) is the parameter estimate error for the same representative run as shown in Fig. 10.13a,


[Figure 10.15: two plots of parameter estimate error vs. sample number. Conditions: online estimator that processes a parameter estimate every sample period; N = 5; averaging over 30-sample interval; weighted least squares terms only; plot (b) shows the results of a Monte Carlo evaluation.]

FIG. 10.15 (a) Parameter estimate error for online estimator using only WLS terms. (b) Mean ± 1σ values of online parameter estimate errors; WLS terms only.

and plot (b) presents the Monte Carlo simulation results comparable to Fig. 10.13b. Comparison of these plots reveals that the 1σ bands are wider in Fig. 10.15b for the initial stages, but become narrower than those in Fig. 10.13b as time progresses beyond the initial transient period. More importantly, though, the bias is seen to be decreased in Fig. 10.15b.

This performance is gained with a significant reduction in computer time. IBM 360/75 time for estimator implementation over 200 samples was reduced from 6.40 to 2.68 sec, the major difference attributed to the removal of the propagation of ∂P/∂a. •

In order to improve the efficiency of the estimator algorithms, it would be advantageous to precompute and store various terms for online use, rather than calculate all quantities in real time. Curve-fitting of simple functions, such as piecewise-linear functions, to precomputed time histories can be used to


minimize storage requirements and maintain feasibility. Quantities could be precomputed as a function of the parameter values a, and then these required values could be evaluated using either the most recent estimate of a or a nominal a value that provides reasonable performance over a range of possible parameter values.

First of all, J⁻¹ typically reaches a "steady state" value very quickly, and needs updating infrequently, if at all, after an initial transient. In fact, there is considerable advantage to using such a precomputed J⁻¹ before such time as an online computed J⁻¹ would complete the initial transient stage, as discussed previously. By using a precomputed J⁻¹ initially, the estimator would be employing a form of weighted gradient iteration instead of scoring, with better initial convergence properties for the parameter estimate.
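The distinction between scoring and weighted gradient iteration can be sketched on a toy problem. In the hypothetical scalar linear model below (names and values are this sketch's assumptions), the score is s(a) = b − J·a, so a scoring step a ← a + J⁻¹s(a) converges in one iteration, while a fixed precomputed J⁻¹ (here deliberately 20% low) still converges, just geometrically:

```python
import random

# Hypothetical scalar linear model: z_j = h_j*a + v_j, residual variance Y.
random.seed(2)
a_true, Y, N = -1.0, 0.5, 400
h = [1.0 + 0.002 * j for j in range(N)]
z = [h[j] * a_true + random.gauss(0.0, Y ** 0.5) for j in range(N)]
b = sum(h[j] * z[j] / Y for j in range(N))
J = sum(h[j] ** 2 / Y for j in range(N))
a_wls = b / J  # exact solution of the likelihood equation

# Scoring: a <- a + J^{-1} s(a); one step suffices for this linear model.
a = 0.0
a = a + (1.0 / J) * (b - J * a)
assert abs(a - a_wls) < 1e-9

# Weighted gradient iteration: same update with a fixed, precomputed J^{-1}
# that is 20% low; the error shrinks by a factor 0.2 per iteration.
Jinv_pre = 0.8 / J
a = 0.0
for _ in range(30):
    a = a + Jinv_pre * (b - J * a)
print(a, a_wls)
```

A mildly wrong but fixed J⁻¹ thus trades one-step convergence for a cheap, stable geometric contraction, which is the behavior exploited above during the initial transient.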

Matrix propagations that can be precomputed parametrically as a function of a would be the P sequence (and related A and K), the ∂P/∂a_k sequence and associated ∂A/∂a_k for all p values of k, and finally the controller gains if feedback control is used. (Controller design, and specifically assumed certainty equivalence controllers as applicable here, will be discussed in Volume 3.) These sequences could be evaluated over an interval of interest using a number of fixed values of the parameters. Simple but adequate approximating functions would be curve-fitted to express these sequence values as a function of a.

EXAMPLE 10.14 Table 10.6 presents the calculations required for the hypothetical problem, using the online estimator with frequent parameter updating, but with precomputed J⁻¹ and matrix propagations as described above. These results are directly comparable to those of Example 10.9.

TABLE 10.6

Online Method; Precomputed Matrices

Term   Multiplications   Additions   Subtractions   Inversions
x       69                56          2             0
s      309               229          9             0
J        0                 0          0             0
a*       9                 9          0             0

TABLE 10.7

Online Method with WLS Terms Only; Precomputed Matrices

Term   Multiplications   Additions   Subtractions   Inversions
x       69                56          2             0
s      275               189          9             0
J        0                 0          0             0
a*       9                 9          0             0


Similarly, Table 10.7 presents the corresponding results for the case of including only weighted least squares type terms. Comparison to Table 10.5 of Example 10.12 reveals the marked improvement afforded by precomputation.

Note that computations required to evaluate the functions of a for the curve-fitted approximations are not included in these two tables. •

EXAMPLE 10.15 If the Monte Carlo simulations of Example 10.13 are examined, the scalar J⁻¹ is seen to reach a value of approximately 0.05 after the initial transient period. Figure 10.16 plots the parameter estimate error when J⁻¹ is set equal to 0.05 for all time in the estimator whose performance was given by Fig. 10.15a (with the same noise simulations used). The most notable change is the improved transient behavior: the initial rise in error is less, it decreases to 0.05 in about half the time, and yet overshoots zero by less. In certain other simulations that previously required a delay in making the first parameter estimate, the use of precomputed J⁻¹ yields adequate performance without the delay, since the initial values of J⁻¹ are no longer disproportionately large. Substantiating results were obtained with both the other online estimator and the full-scale version.

Precomputed state estimator gains as functions of the parameter value also proved successful for this problem. Steady state gains were attained to four significant figures in four sample periods over the entire range of parameter values, so only steady state gains were used. Least squares curve-fitting techniques yielded an approximation of

    k₁ = 0.971392 − 0.000035 a₁ + 0.000680 a₁²

    k₂ = −0.9505 a₁

that commits an error of, at worst, one part in 10⁴ over a range of a₁ from −2 to 0. Estimation performance remained essentially the same, simulation results agreeing to two significant figures by the end of the second sample period, and being virtually indistinguishable thereafter. •
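The precompute-and-curve-fit procedure of this example can be sketched end to end. The code below is a hypothetical illustration (its scalar system, Q = R = 1, and parameter grid are this sketch's assumptions, not the example's actual problem): steady-state Kalman gains are computed over a grid of parameter values a, a quadratic k(a) ≈ c₀ + c₁a + c₂a² is fitted by least squares, and the fit is evaluated at an off-grid point:

```python
# Scalar system x_{i+1} = a*x_i + w_i, z_i = x_i + v_i, hypothetical Q = R = 1.
def steady_state_gain(a, Q=1.0, R=1.0):
    P = 1.0  # P(t_i^-); iterate the Riccati recursion to steady state
    for _ in range(500):
        K = P / (P + R)
        Pplus = (1.0 - K) * P
        P = a * a * Pplus + Q
    return P / (P + R)

def solve3(A, b):
    # Gauss-Jordan elimination with partial pivoting for 3-by-3 systems.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(3):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [M[r][k] - f * M[c][k] for k in range(4)]
    return [M[i][3] / M[i][i] for i in range(3)]

# Precompute gains over a parameter grid, then fit k(a) ~ c0 + c1*a + c2*a^2
# via the 3-by-3 normal equations of the least squares problem.
grid = [0.5, 0.6, 0.7, 0.8, 0.9]
gains = [steady_state_gain(a) for a in grid]
A = [[sum(a ** (i + j) for a in grid) for j in range(3)] for i in range(3)]
b = [sum(g * a ** i for a, g in zip(grid, gains)) for i in range(3)]
c0, c1, c2 = solve3(A, b)

k_fit = c0 + c1 * 0.65 + c2 * 0.65 ** 2  # cheap online evaluation
print(k_fit, steady_state_gain(0.65))
```

Online, only the three stored coefficients and two multiplications per gain evaluation are needed, rather than a Riccati recursion, which is the storage/computation trade the example describes.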

[Figure 10.16: parameter estimate error vs. sample number. Conditions: online estimator that processes a parameter estimate every sample period; N = 5; averaging over 30-sample interval; weighted least squares terms only; precomputed J⁻¹.]

FIG. 10.16 Parameter estimate error for estimator using precomputed J⁻¹.


One method of enhancing online feasibility that does not inherently involve approximations is the use of advantageous state space representations. Since a single system input-output relation can be represented by an infinite number of state vector models, it is beneficial to choose a particular model that embodies the greatest number of zero matrix elements, thereby requiring the fewest computations in final implementation. If a system can be modeled as

    x(t_{i+1}) = Φ(t_{i+1}, t_i) x(t_i) + B_d(t_i) u(t_i) + G_d(t_i) w_d(t_i)    (10-76a)

    z(t_i) = H(t_i) x(t_i) + v(t_i)    (10-76b)

then it is possible to define a new state vector through use of an invertible transformation matrix T(t_i) as [19]

    x(t_i) = T(t_i) x*(t_i)    (10-77a)

    x*(t_i) = T⁻¹(t_i) x(t_i)    (10-77b)

and obtain a model with an input-output relation identical to that of (10-76) in the form of

    x*(t_{i+1}) = Φ*(t_{i+1}, t_i) x*(t_i) + B_d*(t_i) u(t_i) + G_d*(t_i) w_d(t_i)    (10-78a)

    z(t_i) = H*(t_i) x*(t_i) + v(t_i)    (10-78b)

where the system matrices are generated by

    Φ*(t_{i+1}, t_i) = T⁻¹(t_{i+1}) Φ(t_{i+1}, t_i) T(t_i)    (10-79a)

    B_d*(t_i) = T⁻¹(t_{i+1}) B_d(t_i)    (10-79b)

    G_d*(t_i) = T⁻¹(t_{i+1}) G_d(t_i)    (10-79c)

    H*(t_i) = H(t_i) T(t_i)    (10-79d)
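The claim that (10-78)-(10-79) preserve the input-output relation can be checked numerically. The sketch below uses a hypothetical time-invariant 2-state system and a constant T (so T(t_{i+1}) = T(t_i)); equal Markov parameters H Φᵏ B_d = H* Φ*ᵏ B_d* imply identical deterministic input-output behavior:

```python
def mmul(A, B):
    # Plain nested-list matrix product.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inv2(M):
    # Closed-form inverse of a 2-by-2 matrix.
    d = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]

Phi = [[0.9, 0.1], [0.0, 0.8]]   # hypothetical system matrices
Bd  = [[1.0], [0.5]]
H   = [[1.0, 0.0]]
T   = [[2.0, 1.0], [1.0, 1.0]]   # hypothetical invertible transformation
Tinv = inv2(T)

Phi_s = mmul(mmul(Tinv, Phi), T)  # (10-79a) with constant T
Bd_s  = mmul(Tinv, Bd)            # (10-79b)
H_s   = mmul(H, T)                # (10-79d)

# Compare Markov parameters H Phi^k Bd and H* Phi*^k Bd* for k = 0..3.
Mk, Mk_s = H, H_s
for _ in range(4):
    y  = mmul(Mk, Bd)[0][0]
    ys = mmul(Mk_s, Bd_s)[0][0]
    assert abs(y - ys) < 1e-9
    Mk, Mk_s = mmul(Mk, Phi), mmul(Mk_s, Phi_s)
print("input-output match")
```

Any invertible T passes this check; the practical question addressed in the text is which T yields the most zero entries in the transformed matrices.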

In the case of scalar measurements, a particularly convenient form [82, 144] is given by the discrete-time counterpart of standard observable phase variable form (this form will also be exploited later in controller design). With scalar measurements, H(t_i) becomes a 1-by-n matrix, or vector transpose, so Eq. (10-76b) can be written as

    z(t_i) = hᵀ(t_i) x(t_i) + v(t_i)    (10-80)

If the original system model is completely observable, then an invertible transformation matrix can be generated explicitly through

(10-81)

that will generate a new system state transition matrix as

    Φ*(t_{i+1}, t_i) = [        0        ⋮        I        ]
                       [ ――― φ*ᵀ(t_{i+1}, t_i) ――― ]    (10-82)

where 0 is (n−1)-by-1, I is an (n−1)-by-(n−1) identity matrix, and φ*ᵀ(t_{i+1}, t_i)

is given by

(10-83)

Moreover, B_d*(t_i) and G_d*(t_i) are given by (10-79b) and (10-79c), and the new measurement matrix becomes

    h*ᵀ(t_i) = [1  0  ⋯  0]    (10-84)

From (10-84) it is clear that there are no uncertain parameters in h*ᵀ(t_i), as assumed earlier.
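For a time-invariant model, one explicit construction consistent with (10-82) and (10-84) takes T⁻¹ to be the observability matrix (an assumption of this sketch, since the detailed construction (10-81) is not reproduced here). The hypothetical 2-state example below exhibits the resulting structure:

```python
def mmul(A, B):
    # Plain nested-list matrix product.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inv2(M):
    # Closed-form inverse of a 2-by-2 matrix.
    d = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]

Phi = [[1.0, 0.1], [-0.2, 0.9]]  # hypothetical time-invariant model
hT  = [[1.0, 0.5]]               # scalar-measurement H as a row vector

# Observability matrix [h^T; h^T Phi]; complete observability <=> invertible.
Obs = [hT[0], mmul(hT, Phi)[0]]
T = inv2(Obs)                    # take T^{-1} = Obs, i.e. x = T x*

Phi_star = mmul(Obs, mmul(Phi, T))  # T^{-1} Phi T, cf. (10-79a)
h_star = mmul(hT, T)[0]             # h^T T, cf. (10-79d)

print(h_star)       # approximately [1, 0], as in (10-84)
print(Phi_star[0])  # approximately [0, 1]: the [0 | I] block of (10-82)
```

The first row of Φ* becomes [0, 1] and h*ᵀ becomes [1, 0] exactly (to rounding), regardless of the numbers chosen, as long as the observability matrix is invertible.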

For the case of vector measurements, a row of the measurement matrix H(t_i), with respect to which the system model is completely observable, is denoted as hᵀ(t_i), and the same procedure is followed. The remaining portion of the new measurement matrix is obtained from (10-79d). If no such row of H(t_i) can be found, a modified procedure can be used [82, 144]. In general, vector measurements will yield an H*(t_i) that is no longer devoid of uncertainties. The estimator form can be expanded to admit uncertain parameters in H to address this case, or an equivalent update using m scalar updates can be generated, iteratively applying the previous results (if complete observability is provided by each scalar measurement).

If the system model is time invariant, then either the standard observable form above or the discrete-time modified Jordan canonical form [19], analogous to that discussed in Chapter 2 [83], can be used to advantage. The mode separation provided by the latter form can significantly reduce the amount of computation required to evaluate the ∂x̂/∂a_k and ∂P/∂a_k recursions, since each parameter will enter into only one or a few system modes. Moreover, this form generally allows a shorter computer wordlength to achieve the same precision in system representation as that attained by other state variable forms.

EXAMPLE 10.16 Table 10.8 presents the required number of calculations for the hypothetical problem, using canonical variables and the online estimator formulation including H*(t_i) uncertainties. Table 10.9 relates the corresponding values when only weighted least squares terms are used, and Table 10.10 employs precomputations to reduce the loading further. For all of these tables, it is assumed that there are three first order modes and one second order, that one uncertain parameter affects the second order mode, another a first order mode, and the last is an uncertain parameter in B_d that affects a first order mode. The numbers cited do not include the computation of required functional evaluations for the elements of the system matrices, which would further motivate use of canonical variables. The numbers in parentheses are the portion of the totals due to allowing uncertain parameters in H*; for scalar measurements, there would be no such contribution. •

TABLE 10.8

Online Method; Canonical Form

Term   Multiplications   Additions    Subtractions   Inversions
x       606              473          27             1 (2 × 2)
s      1394 (115)        1136 (95)    69 (60)        1 (5 × 5)
J       887              698 (4)       6             0
a*        9                9           0             1 (3 × 3)

TABLE 10.9

Online Method with WLS Terms Only; Canonical Form

Term   Multiplications   Additions   Subtractions   Inversions
x      606               473         27             1 (2 × 2)
s      162 (25)          137 (11)    13 (10)        0
J       36                28 (4)      6             0
a*       9                 9          0             1 (3 × 3)

TABLE 10.10

Online Method; WLS Terms, Precomputed Matrices, Canonical Form

Term   Multiplications   Additions   Subtractions   Inversions
x       46                43          2             0
s      162 (25)          137 (11)    13 (10)        0
J        0                 0          0             0
a*       9                 9          0             0

Exploiting symmetry can enhance not only storage and computation requirements, by computing only nonredundant elements of symmetric matrices, but also numerical precision, through implementation of square root forms. Modification of measurement incorporation, iteratively updating the partitions of z, as discussed previously, can also increase the online applicability of the estimator.

EXAMPLE 10.17 Recall the thrust vector control problem of Example 9.9, in which an extended Kalman filter was used to estimate both the states and the uncertain parameter (ω_b²). Figure 10.17a presents a representative parameter estimate error trajectory as generated by the full-scale estimator with N = 30. As in previous examples, the initial behavior characteristics are improved by delaying the first estimate, allowing the computed J⁻¹ to attain a reasonable magnitude before using it in the estimation: Fig. 10.17b depicts the result of postponing the first parameter estimate until the sixth sample instant.

The corresponding state estimation accuracy is enhanced considerably, as seen by comparing the errors in the estimates of generalized bending coordinate velocity with and without simultaneous


[Figure 10.17: two plots of ω_b² estimate error vs. sample number for the full-scale estimator with N = 30; in (b) the first parameter estimate is made at i = 6.]

FIG. 10.17 (a) Full-scale parameter estimate error for N = 30. (b) Full-scale parameter estimate error when first estimate is delayed.

parameter estimation, as in Fig. 10.18. The other state estimates display similar improvement, and results are consistent when mean ± 1σ (standard deviation) plots from a Monte Carlo analysis are studied [82]. In fact, because of the precision of the parameter estimate, the state trajectories associated with the full-scale estimator are essentially equivalent to the output of a Kalman filter tuned to the correct value of ω_b².

The online estimator that processes a parameter estimate every N sample periods (N = 10), combined with averaging over the three most recent estimates, performed well, as seen in the

[Figure 10.18: two plots of generalized bending coordinate velocity estimate error. (a) Estimate from a Kalman filter that assumes ω_b² = 100 rad²/sec², with true ω_b² = 150 rad²/sec². (b) Full-scale estimator; N = 30; first parameter estimate made at i = 6.]

FIG. 10.18 (a) v_b error when parameter is not estimated. (b) v_b error when parameter is estimated.

typical parameter estimate error trajectory (generated by the same noise simulations as for Fig. 10.17) in Fig. 10.19a. As in the previous examples, by estimating the parameter more often for the first few instants, convergence to a good value was very rapid, followed by estimation accuracy that equals that of the full-scale estimator [82].

The error trajectory of Fig. 10.19b was generated by the same sequence of noise inputs, but the estimator itself differed in two respects. First, it incorporated only weighted least squares type terms, and the influence of dropping the other terms is minimal. Second, the initial parameter


[Figure 10.19: two plots of ω_b² estimate error vs. sample number for the online estimator that processes a parameter estimate every N sample periods; N = 10; averaging over 30-sample interval; (b) uses weighted least squares terms only.]

FIG. 10.19 (a) Online parameter estimate error. (b) Online parameter estimate error; WLS terms only.

estimation frequency was not increased, and the resulting slower transient response is characteristic; if this response is satisfactory, this form might be preferable because of the reduced computational load.

The online procedure that produces a new parameter estimate every sample period also nearly duplicated the performance of the full-scale technique, similarly with considerably less computational expense. •


10.7 UNCERTAINTIES IN Qd AND R

The concepts of the previous sections can be applied directly to the problem of state estimation in the face of uncertain parameters in the dynamic noise covariance Q_d or the measurement noise covariance R [3, 5, 6, 16, 18, 48, 49, 70, 87, 88, 91, 126, 141]. In fact, the solution to (10-15) and (10-18) will again provide the maximum likelihood estimate of states and parameters assuming constant parameters for all time, whereas the solution to (10-20) and (10-21) will provide the estimates exploiting the constant-over-N-steps model for parameters. The only difference in algorithm form lies in the evaluation of the partial derivatives in these relations [1].

In detail, the modifications to the full-scale estimator equations (10-28)-(10-57) are as follows. The score time propagation equations become

    ∂x̂(t_i⁻)/∂a_k = Φ(t_i, t_{i−1}) [∂x̂(t_{i−1}⁺)/∂a_k]    (10-38')

    ∂P(t_i⁻)/∂a_k = Φ(t_i, t_{i−1}) [∂P(t_{i−1}⁺)/∂a_k] Φᵀ(t_i, t_{i−1})
                      + G_d(t_{i−1}) [∂Q_d(t_{i−1})/∂a_k] G_dᵀ(t_{i−1})    (10-39')

    ∂A(t_i)/∂a_k = H(t_i) [∂P(t_i⁻)/∂a_k] Hᵀ(t_i) + ∂R(t_i)/∂a_k    (10-40a')

    ∂E(t_i)/∂a_k = [∂P(t_i⁻)/∂a_k] Hᵀ(t_i) − K(t_i) [∂A(t_i)/∂a_k]    (10-40b')

where the additional term in (10-40b') is defined for computational efficiency and is related to the partial of the filter gain K by [∂K/∂a_k] = [∂E/∂a_k] A⁻¹. Of Eqs. (10-41)-(10-43), only the first term in (10-41) remains; similarly, (10-45) is not required. The modifications to the score measurement update relations are

    ∂x̂(t_j⁺)/∂a_k = D(t_j) [∂x̂(t_j⁻)/∂a_k] + [∂E(t_j)/∂a_k] r_j    (10-53')

    ∂P(t_j⁺)/∂a_k = D(t_j) [∂P(t_j⁻)/∂a_k] Dᵀ(t_j) + K(t_j) [∂R(t_j)/∂a_k] Kᵀ(t_j)    (10-54')

For the conditional information matrix, (10-55) becomes

    E{[∂x̂(t_j⁺)/∂a_k] [∂x̂ᵀ(t_j⁺)/∂a_l] | a = a*(t_i)}
        = D(t_j) E{[∂x̂(t_j⁻)/∂a_k] [∂x̂ᵀ(t_j⁻)/∂a_l] | a = a*(t_i)} Dᵀ(t_j)
          + [∂E(t_j)/∂a_k] A⁻¹(t_j) [∂Eᵀ(t_j)/∂a_l]    (10-55')

and (10-56) and (10-57) are no longer required.


As previously, the full-scale estimator is not feasible for online computations, so means of simplifying the algorithm are required. This is even more true in the present case, since the number of parameters to be estimated, and thus the amount of computation, is often greater in a practical self-tuning filter context than in a state estimation and system identification application (except "black box" identification). Even if Q_d and R are assumed to be diagonal to limit the number of uncertain parameters, this generally yields more parameters than encountered in many identification adaptations.

The two online conceptualizations of the previous section can be exploited to enhance feasibility, as can the use of precomputed (or infrequently recomputed) J⁻¹. Also, the idea of retaining only the most significant terms is fruitful to pursue, but the dominant terms in this case differ from those in Section 10.6.

Recall the reasoning associated with Eqs. (10-70)-(10-72). Now, rather than neglect the dependence of Y on θ, it is appropriate to neglect the dependence of ξ on θ. This is equivalent to ignoring the second term in Eq. (10-21), the explicit likelihood equation for this problem, to yield

    Σ_{j=i−N+1}^{i} tr{[A(t_j)⁻¹ − A(t_j)⁻¹ r_j r_jᵀ A(t_j)⁻¹] [∂A(t_j)/∂a_k]}
        + tr{P(t_i⁺)⁻¹ [∂P(t_i⁺)/∂a_k]} |_{a=a*(t_i)} = 0    (10-85a)

Moreover, the last term in this expression is often dominated by the first, especially for reasonably sized N, so it too is usually neglected. Note that this is equivalent to removing ln f_{x(t_i)|Z(t_i),a}(ξ | Z_i, α) from the likelihood function when seeking the estimate of parameters:

    Σ_{j=i−N+1}^{i} tr{[A(t_j)⁻¹ − A(t_j)⁻¹ r_j r_jᵀ A(t_j)⁻¹] [∂A(t_j)/∂a_k]} |_{a=a*(t_i)} = 0    (10-85b)

For either of the "pseudo" likelihood equation forms given in (10-85), the computational burden is reduced considerably. There is no longer any need to compute ∂x̂/∂a_k or E{[∂x̂/∂a_k] [∂x̂/∂a_l]ᵀ | a = a*(t_i)}, thereby removing (10-38'), (10-40b'), (10-41), (10-53'), and (10-55'), and simplifying (10-44) and (10-52) to

    E{s_k¹[Z(t_j), a] s_l¹[Z(t_j), a] | a = a*(t_i)}
        = (1/2) tr{A⁻¹(t_j) [∂A(t_j)/∂a_k] A⁻¹(t_j) [∂A(t_j)/∂a_l]}    (10-44')

    s_k¹[Z_j, a*(t_i)] = −(1/2) tr{C(t_j) [∂A(t_j)/∂a_k]}    (10-52')

It should be noted that numerical results [1] have indicated the conditional information matrix based upon (10-44') to exhibit generally smaller eigenvalues than that based on (10-44), with resultant increased numerical difficulty when its inverse is computed.


A further approximation can be made to obtain explicit estimates of R and Q_d. When these approximations are made, the existence of independent and unique solutions for the uncertain parameters and convergence properties are subject to question. If R or Q_d is to be estimated separately, a reasonable solution is usually achievable. Simultaneous estimation of parameters from both Q_d and R is not as well behaved. It is generally true, for all algorithms and not just those based upon the maximum likelihood concept, that R parameter estimates are more precise than Q_d parameter estimates.

Using (10-40a'), the "pseudo" likelihood equation (10-85b) can be written as

    Σ_{j=i−N+1}^{i} tr{[A⁻¹(t_j) − A⁻¹(t_j) r_j r_jᵀ A⁻¹(t_j)] H(t_j) [∂P(t_j⁻)/∂a_k] Hᵀ(t_j)} = 0    (10-86a)

for a_k an unknown parameter in Q_d, and

    Σ_{j=i−N+1}^{i} ( tr{[A⁻¹(t_j) − A⁻¹(t_j) r_j r_jᵀ A⁻¹(t_j)] H(t_j) [∂P(t_j⁻)/∂a_k] Hᵀ(t_j)}
        + [A⁻¹(t_j) − A⁻¹(t_j) r_j r_jᵀ A⁻¹(t_j)]_{kk} ) = 0    (10-86b)

for a_k a diagonal element of R, a_k = R_kk. To obtain an explicit relation for R, assume Q_d to be known completely. If, as in many applications,

    H(t_j) [∂P(t_j⁻)/∂a_k] Hᵀ(t_j) ≪ I R_kk

then (10-86b) can be approximated by its second term alone:

    0 = Σ_{j=i−N+1}^{i} [A⁻¹(t_j) − A⁻¹(t_j) r_j r_jᵀ A⁻¹(t_j)]_{kk}

      = Σ_{j=i−N+1}^{i} [A⁻¹(t_j) {A(t_j) − r_j r_jᵀ} A⁻¹(t_j)]_{kk}

      = Σ_{j=i−N+1}^{i} [A⁻¹(t_j) {H(t_j) P(t_j⁻) Hᵀ(t_j) + R(t_j) − r_j r_jᵀ} A⁻¹(t_j)]_{kk}    (10-87)

which would be satisfied for all k if the term in braces, { }, is zero. If the estimation process is essentially time invariant over the most recent N steps, i.e., A⁻¹(t_j) ≈ const over these steps, then an estimate of R(t_i) can be defined as

    R̂(t_i) = [ (1/N) Σ_{j=i−N+1}^{i} r_j r_jᵀ ] − H(t_i) P(t_i⁻) Hᵀ(t_i)    (10-88a)

            = (1/N) Σ_{j=i−N+1}^{i} [ r_j r_jᵀ − H(t_j) P(t_j⁻) Hᵀ(t_j) ]    (10-88b)


where the r_j and P(t_j⁻) are computed on the basis of previous parameter estimates, and the term in brackets is recognized as an ergodic approximation to the covariance of the zero-mean residuals, A(t_i) = H(t_i) P(t_i⁻) Hᵀ(t_i) + R(t_i). Even for reasonably large N, these expressions can lead to an estimate of R(t_i) that is not positive definite. An alternative, better conditioned estimate can be generated by noting that

    A⁻¹(t_j) r_j = R⁻¹(t_j) [z_j − H(t_j) x̂(t_j⁺)]    (10-89)

so that a development analogous to (10-87) yields

    R̂(t_i) = (1/N) Σ_{j=i−N+1}^{i} { [z_j − H(t_j) x̂(t_j⁺)] [z_j − H(t_j) x̂(t_j⁺)]ᵀ
                + H(t_j) P(t_j⁺) Hᵀ(t_j) }    (10-90)

Although (10-90) is superior to (10-88) numerically [1], it requires a greater amount of computation, since neither [z_j − H(t_j) x̂(t_j⁺)] nor [H(t_j) P(t_j⁺) Hᵀ(t_j)] is already calculated in the estimation algorithm.
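The residual-based estimate (10-88) can be sketched on a simulated scalar system (all values hypothetical; the filter is run at the true parameters here so the ergodic residual statistics can be checked against the known R):

```python
import math
import random

# Hypothetical scalar system: x_{i+1} = phi*x_i + w_i, z_i = x_i + v_i.
random.seed(3)
phi, Q, R, N = 0.9, 1.0, 2.0, 5000

# Steady-state filter quantities for the assumed (here, true) Q and R.
P = 1.0
for _ in range(500):
    P = phi * phi * (P - P * P / (P + R)) + Q  # P(t_i^-) Riccati recursion
K = P / (P + R)

x, xhat = 0.0, 0.0
resid_sq = 0.0
for _ in range(N):
    x = phi * x + random.gauss(0.0, math.sqrt(Q))
    z = x + random.gauss(0.0, math.sqrt(R))
    r = z - phi * xhat             # residual r_j = z_j - H xhat(t_j^-)
    resid_sq += r * r
    xhat = phi * xhat + K * r      # propagation and update combined

# (10-88): R_hat = (1/N) sum r_j r_j^T - H P^- H^T
R_hat = resid_sq / N - P
print(R_hat)  # close to the true R = 2.0
```

With a mismatched filter, the same arithmetic produces the adjusted R̂ that the self-tuning scheme would feed back; the better-conditioned form (10-90) would instead accumulate the post-update residuals [z_j − x̂(t_j⁺)] plus P(t_j⁺).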

Obtaining an explicit estimate of Q_d is less straightforward, and a unique estimate may well not exist without additional constraints. The partial derivative in (10-86a) is evaluated by means of (10-39'), which can be approximated by neglecting the first term. Again assuming that the estimation process is essentially in steady state over the most recent N steps, (10-86a) can be transformed into a set of equations that are satisfied if

    Σ_{j=i−N+1}^{i} [ Φ(t_j, t_{j−1}) P(t_{j−1}⁺) Φᵀ(t_j, t_{j−1}) + G_d(t_{j−1}) Q_d(t_{j−1}) G_dᵀ(t_{j−1}) ]
        = Σ_{j=i−N+1}^{i} [ Δx_j Δx_jᵀ + P(t_j⁺) ]    (10-91)

where

    Δx_j = x̂(t_j⁺) − x̂(t_j⁻) = K(t_j) r_j    (10-92)

If G_d(t_{j−1}) is invertible for all j (as especially when G_d(t_{j−1}) ≡ I), then an estimate of Q_d(t_i) can be defined as

    Q̂_d(t_i) = (1/N) Σ_{j=i−N+1}^{i} { G_d⁻¹(t_{j−1}) [ Δx_j Δx_jᵀ + P(t_j⁺)
                 − Φ(t_j, t_{j−1}) P(t_{j−1}⁺) Φᵀ(t_j, t_{j−1}) ] G_d⁻¹(t_{j−1})ᵀ }    (10-93)

A physical interpretation of this result is developed in Problem 10.17 [61, 86]. If G_d(t_{j−1}) is not invertible, then G_d⁻¹(t_{j−1}) in (10-93) can be replaced by the pseudoinverse calculated as

    G_d#(t_{j−1}) = [G_dᵀ(t_{j−1}) G_d(t_{j−1})]⁻¹ G_dᵀ(t_{j−1})    (10-94)
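The explicit Q_d estimate (10-93) can likewise be sketched for a scalar system with G_d = 1, using Δx_j = K r_j from (10-92) (hypothetical values; the filter again runs at the true parameters so the estimate should recover the known Q):

```python
import math
import random

# Hypothetical scalar system: x_{i+1} = phi*x_i + w_i, z_i = x_i + v_i.
random.seed(4)
phi, Q, R, N = 0.9, 1.0, 2.0, 5000

P = 1.0
for _ in range(500):
    P = phi * phi * (P - P * P / (P + R)) + Q  # P(t_i^-) Riccati recursion
K = P / (P + R)
Pplus = (1.0 - K) * P                          # steady-state P(t_j^+)

x, xhat, acc = 0.0, 0.0, 0.0
for _ in range(N):
    x = phi * x + random.gauss(0.0, math.sqrt(Q))
    z = x + random.gauss(0.0, math.sqrt(R))
    r = z - phi * xhat
    dx = K * r                     # Delta x_j = xhat(t_j^+) - xhat(t_j^-)
    acc += dx * dx + Pplus - phi * phi * Pplus
    xhat = phi * xhat + dx

Q_hat = acc / N                    # (10-93) with G_d = 1
print(Q_hat)  # close to the true Q = 1.0
```

In steady state, E[Δx_j²] = K²A, and K²A + P⁺ − Φ²P⁺ = P⁻ − Φ²P⁺ = Q, which is why the sample average recovers the dynamic noise variance.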

It should be noted that (10-88) or (10-90) is an approximate solution for R assuming Q_d to be known, and (10-93) is a solution for Q_d assuming R


known. The two solutions can be used, with caution, to estimate both matrices simultaneously, knowing that the resulting estimates are not independent. This dependency can cause biased estimates that do not distinguish between errors in R and Q_d.

EXAMPLE 10.18 Consider a second order time-invariant system with

where the undamped natural frequency ω_n is 0.1 rad/sec and the damping ratio ζ is 0.05. Further, let P₀ = 10I, and let measurement samples be taken from the system every second for 200 sec.

[Figure 10.20: scatter plots of estimated vs. true values of R and Q_d, ten simulations per plot.]

FIG. 10.20 (a) Estimating scalar R and Q_d. (b) Estimating scalar R and Q_d; larger true values. From Abramson [1].


This example will indicate the performance for an estimator with N chosen appropriately: for N = 25 and above this precision is essentially constant, whereas smaller N yields increasing error magnitudes.

First consider a single input, single output problem [1] in which the equivalent discrete-time model is described by G_dᵀ = [0 1] and H = [1 0]. Figure 10.20a depicts the results of ten different simulations, in which true values of Q_d and R were each varied between 0.1 and 2, while the estimator assumed initial conditions of Q̂_d = 0.5 and R̂ = 1.0 in all cases. If there were no estimation error, the estimates of R and Q_d would lie along the diagonal R̂ = R and Q̂_d = Q_d. The dispersion about this line is a measure of the estimation error. Figure 10.20b portrays the same results, but letting Q_d vary from 0 to 4 and R from 5 to 35, with Q̂_d = 1 and R̂ = 10 initially. Both plots show good agreement between true and estimated values.

Use of a precomputed J⁻¹, based upon the initially assumed parameter values, yielded results almost identical to those depicted in Fig. 10.20. This is shown in Table 10.11, which presents the mean value (averaged over the ten simulations corresponding to each plot of Fig. 10.20) of the true parameter a_t, and the mean and root mean squared values of the error (â − a_t). In fact, for the cases run, precomputation of J⁻¹ improved estimation performance.

If G_d and H are both changed to 2-by-2 identity matrices, and the a priori value of Q_d is

    Q̂_d = [10   0]
          [ 0   1]

then Fig. 10.21 plots estimation performance data similar to that of Fig. 10.20, again for a 10-run simulation. Increasing the number of quantities to be estimated did not degrade the estimator performance, but substantially more computation is required to estimate four parameters instead of two. Table 10.12 also reveals that a precomputed J⁻¹ is again useful to employ.

The estimator based upon the "pseudo" likelihood equation (10-85b) exhibited severe numerical problems, due to the near singularity of the computed J matrix. When a gradient algorithm was used, however, estimates that agreed well with the solution to the full likelihood equations were achieved.

The explicit suboptimal estimates given by (10-90) and (10-93) performed well when the a priori assumed parameter values (for the state estimation) were close to the true values. However, this performance degraded more significantly than that of the full-scale estimator as the a priori values were made less accurate. Figure 10.22a portrays the full-scale estimator results for a 10-run

TABLE 10.11

Estimation of Scalar Q_d and R

Case                                    Mean parameter value   Mean error   RMS error
Fig. 10.20a  Q_d  Full-scale            0.619                   0.022       0.143
                  Precomputed J⁻¹       0.619                   0.017       0.153
             R    Full-scale            0.756                  −0.056       0.115
                  Precomputed J⁻¹       0.756                  −0.052       0.110
Fig. 10.20b  Q_d  Full-scale            1.329                   0.010       0.523
                  Precomputed J⁻¹       1.329                   0.008       0.339
             R    Full-scale           15.599                  −1.250       3.535
                  Precomputed J⁻¹      15.599                  −1.056       3.186


[Figure 10.21: scatter plots of estimated vs. true diagonal elements for the 2-by-2 case, ten simulations per plot.]

FIG. 10.21 (a) Estimating two diagonal terms of R. (b) Estimating two diagonal terms of Q_d. From Abramson [1].

TABLE 10.12

Estimation of 2-by-2 Diagonal Q_d and R

Case                          Mean parameter value   Mean error   RMS error
Q_d1    Full-scale            11.198                 0.678        2.075
        Precomputed J⁻¹       11.198                 0.841        2.216
Q_d2ᵃ   Full-scale             1.896                 0.082        0.786
        Precomputed J⁻¹        1.896                 0.177        0.776
R_1ᵃ    Full-scale             9.167                 0.191        1.245
        Precomputed J⁻¹        9.167                 0.552        1.639
R_2     Full-scale            11.388                 0.062        1.752
        Precomputed J⁻¹       11.388                 0.257        1.753

ᵃ Correspond to Q_d and R of Table 10.11, for the case of Fig. 10.20b.


[Figure 10.22: plots of R̂ and 10Q̂_d for each of ten trials, compared to the true values R₀ and 10Q_d0.]

FIG. 10.22 (a) Full-scale estimator. (b) Explicit suboptimal estimator; good a priori parameter values. (c) Explicit suboptimal estimator; poor a priori parameter values. From Abramson [1].


TABLE 10.13

Full-Scale vs Explicit Suboptimal Estimators

Case                                True parameter value   Mean error   RMS error
Full-scale estimator          Q_d    1                     −0.140       0.326
                              R     10                      0.443       1.404
Explicit suboptimal,
  good a priori values        Q_d    1                      0.029       0.077
                              R     10                      0.154       0.373
Explicit suboptimal,
  poor a priori values        Q_d    1                      0.141       0.160
                              R     10                      5.585       5.601


simulation involving the single input-single output form of this example, letting the true values of Q_d and R be 1 and 10, respectively, for all cases. Initial values of Q̂_d and R̂ were set equal to these values as well. Figure 10.22b is the corresponding result for the explicit suboptimal estimates. When the initial values of Q̂_d and R̂ are changed to 2 and 20, respectively, the full-scale estimator produces results essentially identical to Fig. 10.22a, whereas the explicit suboptimal estimates are biased significantly towards these erroneous initial conditions, as seen in Fig. 10.22c. Table 10.13 summarizes the 10-run average of these plot values. •

Another adaptive suboptimal filter can be proposed by recalling that the objective of the adaptation is to improve the state estimation performance. As long as the nm elements of the adaptively set filter gain K are appropriate, the "accuracy" in estimating Q_d and R, or the uniqueness of these solutions, is of secondary importance. Again under the assumption of steady state performance over the most recent N sample times, a unique estimate of K and A can be obtained even if a unique estimate of Q_d cannot. The likelihood equations are again written as in (10-21), but ignoring the first term as negligible. If the matrix A is chosen as one of the parameters to be estimated, then for a_k = A, the third term in (10-21) dominates the second, and an expansion as used in (10-87) can be used to generate an estimate of A as

$$\hat A(t_i) = \frac{1}{N}\sum_{j=i-N+1}^{i} r_j r_j^T \tag{10-95}$$

as interpreted physically below (10-88). If uncertain parameters in K are to be estimated simultaneously, (10-95) causes the third term in (10-21) to reduce to zero, yielding

(10-96)

as the equations to be solved iteratively for a_k as elements of K, along with (10-95). Once Â and K̂ are achieved, estimates of R and Qd could be obtained,

Page 146: Stochastic Models, Estimation, And Control Volume 3


if desired, by solving the steady state equations for A, K, P⁻, and P⁺ (as given by (10-35)-(10-37) and (10-49)) for Qd and R as [88, 91]

$$\hat P^- = \hat K\hat A(H^T)^{\#} \tag{10-97a}$$

$$\hat P^+ = \hat P^- - \hat K H\hat P^- \tag{10-97b}$$

$$\hat R = \hat A - H\hat K\hat A \tag{10-97c}$$

$$\hat Q_d = G_d^{\#}[\hat P^- - \Phi\hat P^+\Phi^T](G_d^{\#})^T \tag{10-97d}$$

10.8 BAYESIAN AND MULTIPLE MODEL FILTERING ALGORITHMS

Let a denote the vector of uncertain parameters in a given model, allowing them to affect any or all of Φ, Bd, H, Qd, and R. The purpose of Bayesian estimation is to compute the conditional density function:

$$f_{x(t_i),a|Z(t_i)}(\xi,\alpha\,|\,Z_i) = f_{x(t_i)|a,Z(t_i)}(\xi\,|\,\alpha,Z_i)\, f_{a|Z(t_i)}(\alpha\,|\,Z_i) \tag{10-98}$$

Under the assumed model form of (10-1)-(10-5), the first density on the right hand side of this expression would be Gaussian, with mean x̂(t_i⁺) and covariance P(t_i⁺) as computed by a Kalman filter, for each given value of the parameter vector a.

First assume that a can assume any value in a continuous range A ⊂ Rᵖ. Then the second term in (10-98) can be expressed as

$$f_{a|Z(t_i)}(\alpha\,|\,Z_i) = f_{a|z(t_i),Z(t_{i-1})}(\alpha\,|\,z_i,Z_{i-1}) = \frac{f_{a,z(t_i)|Z(t_{i-1})}(\alpha,z_i\,|\,Z_{i-1})}{f_{z(t_i)|Z(t_{i-1})}(z_i\,|\,Z_{i-1})}$$

$$= \frac{f_{z(t_i)|a,Z(t_{i-1})}(z_i\,|\,\alpha,Z_{i-1})\, f_{a|Z(t_{i-1})}(\alpha\,|\,Z_{i-1})}{\int_A f_{z(t_i)|a,Z(t_{i-1})}(z_i\,|\,\alpha,Z_{i-1})\, f_{a|Z(t_{i-1})}(\alpha\,|\,Z_{i-1})\, d\alpha} \tag{10-99}$$

Since f_{z(t_i)|a,Z(t_{i-1})}(z_i | α, Z_{i-1}) is Gaussian, with mean H(t_i)x̂(t_i⁻) and covariance [H(t_i)P(t_i⁻)Hᵀ(t_i) + R(t_i)], for each value of the parameter vector, (10-99) could conceptually be solved recursively, starting from an a priori density f_a(α). A state estimate would then be generated as the conditional mean

$$E\{x(t_i)\,|\,Z(t_i)=Z_i\} = \int_{-\infty}^{\infty} \xi\, f_{x(t_i)|Z(t_i)}(\xi\,|\,Z_i)\, d\xi$$

$$= \int_{-\infty}^{\infty} \xi\left[\int_A f_{x(t_i),a|Z(t_i)}(\xi,\alpha\,|\,Z_i)\, d\alpha\right] d\xi$$

$$= \int_{-\infty}^{\infty} \xi\left[\int_A f_{x(t_i)|a,Z(t_i)}(\xi\,|\,\alpha,Z_i)\, f_{a|Z(t_i)}(\alpha\,|\,Z_i)\, d\alpha\right] d\xi$$

$$= \int_A \left[\int_{-\infty}^{\infty} \xi\, f_{x(t_i)|a,Z(t_i)}(\xi\,|\,\alpha,Z_i)\, d\xi\right] f_{a|Z(t_i)}(\alpha\,|\,Z_i)\, d\alpha \tag{10-100}$$

Page 147: Stochastic Models, Estimation, And Control Volume 3


which is obtained by use of marginal densities, Bayes' rule, and interchange of order of integration. Note that the expression in brackets on the last line of (10-100) is x̂(t_i⁺) as produced by a Kalman filter for a particular parameter vector value. Unfortunately, the integrations involved in (10-99) and (10-100) make this estimate computationally infeasible for online usage.

To enhance feasibility, let the parameter vector instead assume only a finite number of values. This finite set might be the result of discretizing a continuous parameter space: selecting a set of values {a₁, a₂, ..., a_K} that are dispersed throughout the region of reasonable parameter vector values. Or, a problem of interest might naturally be described by discrete parameter values, as for sensor failure detection in which each a_k would correspond to a particular configuration of some sensors failed and the others operational.

Assuming that the parameter vector in fact assumes one of the values a₁, a₂, ..., a_K, one can seek an algorithm to produce the true conditional mean and covariance of the state simultaneously with identification of the "true" parameter value [80]. However, if a continuous parameter space has been discretized, the "true" parameter value will not be identical to one of the a_k's, but "near" to one [8, 22, 23, 97]. With a sufficiently fine discretization, the approximate solution afforded by such an algorithm will often provide adequate performance. Optimum discretization of a continuous parameter space is a subject of current research.

Conceptually, associated with each a_k is a different system model of the form given by (10-1)-(10-5). Thus, a is considered a discrete random variable, and each realization a_k corresponds to a particular model being selected by nature as the best representation of a given system. For Bayesian estimation, an a priori density function must be specified for a. Letting p_k(t₀) be the probability that a assumes the value a_k at time t₀, this density is

$$f_a(\alpha) = \sum_{k=1}^{K} p_k(t_0)\,\delta(\alpha - a_k) \tag{10-101}$$

Note that the p_k(t₀) values must be such that

$$p_k(t_0) \ge 0 \quad \text{for all } k \tag{10-102a}$$

$$\sum_{k=1}^{K} p_k(t_0) = 1 \tag{10-102b}$$

and that their values reflect one's best guess about which particular models are most likely to be correct. For example, if all models are equally likely, then p_k(t₀) = 1/K for all k.

Now define the hypothesis conditional probability p_k(t_i) as

$$p_k(t_i) \triangleq \mathrm{Prob}\{a = a_k\,|\,Z(t_i) = Z_i\} \tag{10-103}$$

Page 148: Stochastic Models, Estimation, And Control Volume 3


which also satisfies relations analogous to (10-102). It is desired to obtain recursive relations for both p_k (for k = 1, 2, ..., K) and the conditional mean and covariance of the state, given the measurement history. In a development totally analogous to (10-99) and (10-100), we can write, for k = 1, 2, ..., K,

$$p_k(t_i) = \frac{f_{z(t_i)|a,Z(t_{i-1})}(z_i\,|\,a_k,Z_{i-1})\, p_k(t_{i-1})}{\sum_{j=1}^{K} f_{z(t_i)|a,Z(t_{i-1})}(z_i\,|\,a_j,Z_{i-1})\, p_j(t_{i-1})} \tag{10-104}$$

$$\hat x(t_i^+) = E\{x(t_i)\,|\,Z(t_i)=Z_i\} = \int_{-\infty}^{\infty} \xi\left[\sum_{k=1}^{K} f_{x(t_i)|a,Z(t_i)}(\xi\,|\,a_k,Z_i)\, p_k(t_i)\right] d\xi = \sum_{k=1}^{K} \hat x_k(t_i^+)\, p_k(t_i) \tag{10-105}$$

where x̂_k(t_i⁺) is the state estimate produced by a Kalman filter based on the assumption that the parameter vector equals a_k. Thus, the overall state estimate is the probabilistically weighted average of the state estimates generated by each of K separate Kalman filters, using the hypothesis conditional probabilities p_k(t_i) as the appropriate weighting factors. The conditional covariance of x(t_i) is

$$P(t_i^+) = E\{[x(t_i) - \hat x(t_i^+)][x(t_i) - \hat x(t_i^+)]^T\,|\,Z(t_i)=Z_i\}$$

$$= \int_{-\infty}^{\infty} [\xi - \hat x(t_i^+)][\xi - \hat x(t_i^+)]^T f_{x(t_i)|Z(t_i)}(\xi\,|\,Z_i)\, d\xi$$

$$= \sum_{k=1}^{K} p_k(t_i)\left\{\int_{-\infty}^{\infty} [\xi - \hat x(t_i^+)][\xi - \hat x(t_i^+)]^T f_{x(t_i)|a,Z(t_i)}(\xi\,|\,a_k,Z_i)\, d\xi\right\}$$

$$= \sum_{k=1}^{K} p_k(t_i)\left\{P_k(t_i^+) + [\hat x_k(t_i^+) - \hat x(t_i^+)][\hat x_k(t_i^+) - \hat x(t_i^+)]^T\right\} \tag{10-106}$$

where P_k(t_i⁺) is the state error covariance computed by the Kalman filter based upon a_k. From (10-106) it can be seen that P(t_i⁺) cannot be precomputed, since it involves p_k(t_i), x̂_k(t_i⁺), and x̂(t_i⁺), all of which require knowledge of the measurement history.

The adaptive filter that results from this development is known as the multiple model filtering algorithm [7-9, 11, 12, 14, 15, 22-24, 30, 50, 53, 69, 73, 80, 87, 97, 98, 125, 131, 146], and its structure is depicted in Figure 10.23. It is composed of a bank of K separate Kalman filters, each based on a particular value a₁, a₂, ..., or a_K of the parameter vector. When the measurement z_i becomes available at sample time t_i, the residuals r₁(t_i), r₂(t_i), ..., r_K(t_i) are generated in the K filters, and passed on for processing by the hypothesis conditional probability computation, an implementation of (10-104). Specifically, each

Page 149: Stochastic Models, Estimation, And Control Volume 3


[Figure: block diagram. The measurement z_i feeds a bank of K Kalman filters, based on a₁, a₂, ..., a_K respectively; each produces a state estimate x̂_k and residual r_k. The residuals drive the hypothesis conditional probability computation, whose outputs p₁, ..., p_K weight the elemental estimates.]

FIG. 10.23 Multiple model filtering algorithm.

$$f_{z(t_i)|a,Z(t_{i-1})}(z_i\,|\,a_k,Z_{i-1}) = \frac{1}{(2\pi)^{m/2}|A_k(t_i)|^{1/2}}\exp\left\{-\tfrac{1}{2}\, r_k^T(t_i)\, A_k^{-1}(t_i)\, r_k(t_i)\right\} \tag{10-107}$$

where A_k(t_i) is generated in the kth Kalman filter as

$$A_k(t_i) = H_k(t_i)\, P_k(t_i^-)\, H_k^T(t_i) + R_k(t_i) \tag{10-108}$$

These K evaluations, along with memory of the previous p_k(t_{i-1}) values, allow computation of the current hypothesis conditional probabilities p₁(t_i), p₂(t_i), ..., p_K(t_i) according to (10-104). The p_k(t_i) values, in turn, are used as weighting coefficients to generate x̂(t_i⁺) according to (10-105).
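The probability update and combination steps just described can be put in code. The following is a minimal numpy sketch (not from the text; all function names are illustrative): `mmae_update` evaluates the Gaussian density (10-107) for each elemental filter's residual and applies the recursion (10-104), and `mmae_combine` forms the weighted estimate (10-105) and conditional covariance (10-106). The elemental Kalman filters that would supply r_k, A_k, x̂_k, and P_k are assumed to exist elsewhere.

```python
import numpy as np

def mmae_update(residuals, A_list, p_prev):
    """One step of the hypothesis-probability recursion (10-104).

    residuals : list of K residual vectors r_k(t_i), one per elemental filter
    A_list    : list of K residual covariances A_k(t_i) from (10-108)
    p_prev    : array of K probabilities p_k(t_{i-1})
    Returns the updated probabilities p_k(t_i).
    """
    K = len(residuals)
    likelihoods = np.empty(K)
    for k in range(K):
        r = np.atleast_1d(residuals[k])
        A = np.atleast_2d(A_list[k])
        m = r.size
        # Gaussian density (10-107), evaluated at the observed residual
        quad = r @ np.linalg.solve(A, r)
        likelihoods[k] = np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** m * np.linalg.det(A))
    p = likelihoods * p_prev
    return p / p.sum()          # normalization = denominator of (10-104)

def mmae_combine(x_hats, P_hats, p):
    """Weighted state estimate (10-105) and conditional covariance (10-106)."""
    x_hat = sum(pk * xk for pk, xk in zip(p, x_hats))
    P = sum(pk * (Pk + np.outer(xk - x_hat, xk - x_hat))
            for pk, xk, Pk in zip(p, x_hats, P_hats))
    return x_hat, P
```

As the heuristic discussion below suggests, a filter whose residual is small relative to its computed A_k receives an increased probability weight at each step.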

If it is desired to produce an estimate of the parameter vector itself, the conditional mean of a at time t_i is

$$\hat a(t_i) \triangleq E\{a(t_i)\,|\,Z(t_i)=Z_i\} = \int_{-\infty}^{\infty} \alpha\, f_{a|Z(t_i)}(\alpha\,|\,Z_i)\, d\alpha$$

$$= \int_{-\infty}^{\infty} \alpha\left[\sum_{k=1}^{K} p_k(t_i)\,\delta(\alpha - a_k)\right] d\alpha = \sum_{k=1}^{K} a_k\, p_k(t_i) \tag{10-109}$$

Page 150: Stochastic Models, Estimation, And Control Volume 3


An indication of the precision of this estimate would be given by the conditional covariance of a(t_i),

$$E\{[a - \hat a(t_i)][a - \hat a(t_i)]^T\,|\,Z(t_i)=Z_i\} = \sum_{k=1}^{K} [a_k - \hat a(t_i)][a_k - \hat a(t_i)]^T p_k(t_i) \tag{10-110}$$

Note that neither these calculations nor (10-106) need to be processed to obtain x̂(t_i⁺), however.

Heuristically, one would expect that the residuals of the Kalman filter based upon the "correct" model will be consistently smaller than the residuals of the other mismatched filters. If this is true, then Eqs. (10-104) and (10-107) will cause the "correct" probability p_k(t_i), i.e., the one whose index is associated with the "correct" filter model, to increase, while causing the others to decrease. The performance of this algorithm is dependent upon a significant difference between the residual characteristics in the "correct" and the "mismatched model" filters. In fact, if the residuals instead are consistently of the same magnitude, then Eqs. (10-104) and (10-107) result in the growth of the p_k associated with the filter with the smallest value of |A_k|. The |A_k| values are independent not only of the residuals, but also of the "correctness" of the K models, and so such a result would be totally erroneous. It is therefore important not to add too much dynamics pseudonoise during tuning, since this tends to mask differences between good and bad models. Unfortunately, no rigorous general proofs are available concerning the asymptotic properties of the hypothesis conditional probabilities. Partial results do indicate convergence of p_k to unity for the filter based on the true system model, or highest p_k being associated with the "closest to true" system model in the discretized parameter case [8, 22, 23, 97].

EXAMPLE 10.19 Consider the single axis motion of a vehicle affected by control force u and drag a. Letting x₁ denote vehicle position, and x₂ velocity, a continuous-time model of its motion is

$$\begin{bmatrix} \dot x_1(t) \\ \dot x_2(t) \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0 & -a \end{bmatrix}\begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u(t)$$

Let a radar be located at the origin, and take measurements every Δt = 0.1 sec:

$$z(t_i) = \begin{bmatrix} 1 & 0 \end{bmatrix} x(t_i) + v(t_i)$$

with R(t_i) ≡ 1. The equivalent discrete-time model of the system is

$$\begin{bmatrix} x_1(t_{i+1}) \\ x_2(t_{i+1}) \end{bmatrix} = \begin{bmatrix} 1 & \frac{1}{a}(1-e^{-a\Delta t}) \\ 0 & e^{-a\Delta t} \end{bmatrix}\begin{bmatrix} x_1(t_i) \\ x_2(t_i) \end{bmatrix} + \begin{bmatrix} \frac{1}{a}\left[\Delta t - \frac{1}{a}(1-e^{-a\Delta t})\right] \\ \frac{1}{a}(1-e^{-a\Delta t}) \end{bmatrix} u(t_i) + \begin{bmatrix} w_{d1}(t_i) \\ w_{d2}(t_i) \end{bmatrix}$$

where pseudonoise w_d(·,·) of strength Qd is added to reflect uncertainty in the model. The initial states are x₁(t₀) = 100 and x₂(t₀) = 50. A multiple model filter is to be used to estimate x₁ and x₂, and to identify a, where a is assumed to take on only values a₁ = 0, a₂ = 0.5, or a₃ = 1. Note that, for those values, the discrete-time model elements are as shown in the following tabulation:

Page 151: Stochastic Models, Estimation, And Control Volume 3


Parameter value    Φd12 = Bd2    Φd22

a₁ = 0             0.1000        1.000
a₂ = 0.5           0.0975        0.951
a₃ = 1             0.0950        0.900

Three cases were run, holding the parameter a constant in the "true" system, equal to one of the three possible values in each case. For these cases, the filters were initialized with x̂(t₀) set to the correct x(t₀) values, P₀ = I, u ≡ 0, Qd = 0, and the initial hypothesis probabilities uniformly distributed: p_k(t₀) = 1/3 for k = 1, 2, 3. Figure 10.24 presents the time histories of the hypothesis conditional probabilities for these cases: the "true" system is always identified in 10 sample periods or less. This illustrates the ability of the algorithm to identify constant parameters with rapid, well-behaved convergence. •
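The discrete-time elements in the tabulation follow directly from the discretization formulas Φd12 = (1 − e^{−aΔt})/a and Φd22 = e^{−aΔt}. A small sketch (illustrative, not from the text) that reproduces them, taking the limiting value Δt as a → 0:

```python
import math

def discretize(a, dt=0.1):
    """Discrete-time elements for the drag model of Example 10.19:
    phi22 = exp(-a*dt); phi12 = (1 - exp(-a*dt))/a, with limit dt as a -> 0."""
    phi22 = math.exp(-a * dt)
    phi12 = dt if a == 0 else (1.0 - phi22) / a
    return phi12, phi22
```

For a = 0.5 this gives (0.09754..., 0.95123...), matching the tabulated 0.0975 and 0.951 to the printed precision (the a₃ = 1 row agrees only to rounding).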

EXAMPLE 10.20 Consider the same application as in the preceding example, but let the true parameter value undergo jump changes such that it equals a₁ = 0 for t ∈ [0, 2), a₂ = 0.5 for

[Figure: three plots of the hypothesis conditional probabilities p₁, p₂, p₃ versus number of data points (0 to 20), one panel each for true parameter value a₁, a₂, and a₃.]

FIG. 10.24 Hypothesis conditional probabilities for Example 10.19. From Athans and Chang [8].

Page 152: Stochastic Models, Estimation, And Control Volume 3


t ∈ [2, 4), and a₃ = 1 for t ∈ [4, 6]. Further assume that the control force u is equal to 50 for all time, and known to the estimator.

To allow the estimator to adapt to the changing parameter value, two ad hoc compensations are introduced. First, the hypothesis conditional probabilities are artificially bounded below by a small number (0.0005) to prevent any of them from converging to zero, which would make it very difficult for them to change significantly in response to a subsequent change in true parameter value. Second, pseudonoise of appropriate strength (Qd = 1) is added to the filter models. With no such noise addition, once the true parameter is identified, the algorithm can become locked to a single filter output, and mismatched filter estimates can drift significantly from true state values. When the parameter value changes and one of these "mismatched" filters becomes the "correct" one, a very long transient is required to achieve proper identification. The pseudonoise addition causes each Kalman filter to generate state estimates sufficiently close to the true states to allow adaptation to parameter changes.

Figure 10.25 portrays the parameter estimate provided by the algorithm as modified above. It demonstrates the parameter tracking ability provided by the ad hoc modifications. The control input u is also of significance here, in that it provides a persistent excitation of the true system to enhance identifiability. •

[Figure: plot of the true parameter value (jumping from 0 to 0.5 to 1) and the parameter estimate versus time, 0 to 6 sec; the estimate tracks each jump after a short transient.]

FIG. 10.25 Parameter estimates for Example 10.20. From Athans and Chang [8].

The preceding example treated time-varying parameters by adding pseudonoise to the system model and artificially bounding probability descriptions, two compensation techniques discussed in Chapter 9. A "constant-over-N-steps" model of parameters could also be proposed, but an efficient algorithm based upon such a model has yet to be generated. As an alternative, one could explicitly model possible parameter variations. In this case, the optimum state estimate is the weighted sum of estimates generated by filters matched to all

Page 153: Stochastic Models, Estimation, And Control Volume 3


possible parameter histories [8, 22, 23]. If the parameter vector can assume any of K values at each sample time, then Kⁱ elemental estimators are required at time t_i: such exponential growth in memory requirements is impractical. However, if the parameter vector is additionally assumed to be Markov, i.e., the present value depends only on the previous single value, then the number of elemental estimators remains at K². This is certainly more feasible, but for reasonable levels of discretization, K² may well be prohibitive compared to K, and thus the estimator with ad hoc modifications as in Example 10.20 might well be the preferable means of tracking time-varying parameters. This is especially true since, as stated before, parameters are expected to vary significantly more slowly than states, and not exhibit consistently dynamic behavior from one sample time to the next. Another ad hoc modification for tracking slowly varying parameters, allowing for both a large range of possible values and sufficiently fine discretization, is to incorporate a dynamically redefinable (or "moving") bank of filters instead of a statically fixed set of bank members.

The previous example also indicated the usefulness of certain forms of inputs to aid the identifiability of systems. The subject of optimum design of inputs [4, 40, 45, 78, 79, 93, 94, 104, 130, 136, 142, 148] for system identification and other objectives will be pursued in subsequent chapters on controller synthesis (in Volume 3).

10.9 CORRELATION METHODS FOR SELF-TUNING:RESIDUAL "WHITENING"

Correlation methods have been used classically for estimation in time series analysis, deriving equations to relate system parameters to an observed autocorrelation function and then solving these to obtain parameter estimates [17, 60, 88, 90, 91]. Such techniques are most applicable to time-invariant system descriptions and stationary noises. Time-varying, or constant-over-N-step, extensions are possible, but this section will assume system time invariance and noise stationarity.

Although output autocorrelations have been exploited to some degree, estimates based upon autocorrelation of a filter's residuals are more efficient because the residual sequence is less correlated than the output sequence. As shown previously, residuals form a white Gaussian sequence in a "truly optimal" filter, one based upon a complete and perfectly tuned model. However, a suboptimal or mistuned filter will exhibit a time-correlated residual sequence. Thus, an adaptive technique might be developed to adjust the uncertain parameters in Qd and R so as to "whiten" the residuals of the state estimator.

One such algorithm [88, 91] performs a correlation test on observed residuals to determine statistically whether adaptation is required. If so, and the number of unknown elements in Qd is less than nm (the number of states times the number of measurements), then asymptotically Gaussian, unbiased, and consistent estimates of Qd and R are generated. If there are more than nm parameters in Qd, the steady state gain of the filter is estimated without an explicit estimate of Qd.

Page 154: Stochastic Models, Estimation, And Control Volume 3

The basic concepts and algorithm will now be developed. Assuming that a Kalman filter embodies a constant gain K_s, not necessarily equal to the optimal steady state gain, an expression can be developed [88] for the autocorrelation of the stationary Gaussian residual process r(·,·) by writing

$$r(t_i) = z(t_i) - H\hat x(t_i^-) = H[x(t_i) - \hat x(t_i^-)] + v(t_i) = He(t_i^-) + v(t_i) \tag{10-111}$$

Thus, the desired autocorrelation becomes

$$A_k \triangleq E\{r(t_i)r^T(t_{i-k})\} = HE\{e(t_i^-)e^T(t_{i-k}^-)\}H^T + HE\{e(t_i^-)v^T(t_{i-k})\}, \quad k > 0 \tag{10-112a}$$

$$A_0 = HP^-H^T + R, \quad k = 0 \tag{10-112b}$$

where P⁻ is the steady state error covariance matrix found as the solution to [83, Problem 6.7]

$$P^- = \Phi[(I - K_sH)P^-(I - K_sH)^T + K_sRK_s^T]\Phi^T + G_dQ_dG_d^T \tag{10-113}$$

Note the use of suboptimal K_s, but optimal Qd and R, in the two preceding expressions. Equation (10-112a) can be expanded by noting that e(t_i⁻) satisfies

$$e(t_i^-) = \Phi(I - K_sH)e(t_{i-1}^-) - \Phi K_s v(t_{i-1}) + G_d w_d(t_{i-1}) \tag{10-114}$$

By iterating this expression k times, it can be shown that

$$E\{e(t_i^-)e^T(t_{i-k}^-)\} = [\Phi(I - K_sH)]^k P^-$$

$$E\{e(t_i^-)v^T(t_{i-k})\} = -[\Phi(I - K_sH)]^{k-1}\Phi K_s R$$

which, when substituted into (10-112a), yields

$$A_k = H[\Phi(I - K_sH)]^{k-1}\Phi[P^-H^T - K_sA_0], \quad k > 0 \tag{10-115a}$$

$$A_0 = HP^-H^T + R, \quad k = 0 \tag{10-115b}$$

Equations (10-115) and (10-113) describe the desired residual autocorrelation completely, since A₋ₖ = Aₖᵀ. Note that for an optimal filter, in which K_s is set equal to {P⁻Hᵀ(HP⁻Hᵀ + R)⁻¹}, A_k ≡ 0 for k ≠ 0.

In online usage, the value of P⁻ is unknown, since optimal Qd and R values are needed for its evaluation. Basically, (10-115a) can be written for k = 1, 2, ..., n in order to solve for P⁻Hᵀ, in terms of the computed sample autocorrelations (based on the same ergodic assumptions as (10-95))

$$\hat A_k = \frac{1}{N}\sum_{i=k+1}^{N} r_i r_{i-k}^T \tag{10-116}$$

Page 155: Stochastic Models, Estimation, And Control Volume 3


Although this is a biased estimate of A_k (dividing by N − k instead of N removes the bias), it is preferable since it yields lower mean squared error than the corresponding unbiased estimate [88]. Once the estimate of P⁻Hᵀ is obtained, this can be used to obtain asymptotically unbiased and consistent estimates of R and K, and of Qd if desired as well. Setting up (10-115a) for k = 1, 2, ..., n and rearranging yields
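The sample autocorrelation (10-116) is straightforward to compute; a minimal numpy sketch (the function name is illustrative, not from the text):

```python
import numpy as np

def sample_autocorrelation(residuals, k):
    """Biased sample autocorrelation (10-116): (1/N) * sum_{i=k+1}^{N} r_i r_{i-k}^T,
    computed from an (N, m) array of residuals (divides by N, not N - k)."""
    N = residuals.shape[0]
    return sum(np.outer(residuals[i], residuals[i - k]) for i in range(k, N)) / N
```

For k = 0 this reduces to the empirical residual covariance used later in (10-124).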

$$\widehat{P^-H^T} = (M^TM)^{-1}M^T\begin{bmatrix} \hat A_1 + H\Phi K_s\hat A_0 \\ \hat A_2 + H\Phi K_s\hat A_1 + H\Phi^2K_s\hat A_0 \\ \vdots \\ \hat A_n + H\Phi K_s\hat A_{n-1} + \cdots + H\Phi^nK_s\hat A_0 \end{bmatrix} \tag{10-117}$$

where M is the nm-by-n product of the observability matrix and the nonsingular transition matrix Φ:

$$M = \begin{bmatrix} H\Phi \\ H\Phi^2 \\ \vdots \\ H\Phi^n \end{bmatrix} \tag{10-118}$$

Since M is of rank n, [(MᵀM)⁻¹Mᵀ] in (10-117) is an appropriate choice of pseudoinverse of M. Now R can be estimated as

$$\hat R = \hat A_0 - H\widehat{P^-H^T} \tag{10-119}$$

A better filter gain than K_s is evaluated by manipulation of P⁻. Equation (10-113) expresses the true error covariance P⁻ associated with the suboptimal filter, whereas P_*⁻ associated with an optimally tuned filter would satisfy

$$P_*^- = \Phi(P_*^- - K_*HP_*^-)\Phi^T + G_dQ_dG_d^T \tag{10-120}$$

Differencing (10-113) and (10-120) yields an equation for [P_*⁻ − P⁻] ≜ δP⁻:

$$\delta P^- = \Phi[\delta P^- - (P^-H^T + \delta P^-H^T)(A_0 + H\delta P^-H^T)^{-1}(HP^- + H\delta P^-)$$
$$\qquad\qquad + K_sHP^- + P^-H^TK_s^T - K_sA_0K_s^T]\Phi^T \tag{10-121}$$

This is solved for δP⁻ using the computed Â₀ and the estimate of P⁻Hᵀ from (10-116) and (10-117), yielding the estimate of δP⁻. Finally, an estimate of the optimal gain is produced as

$$\hat K_* = \hat P_*^-H^T[H\hat P_*^-H^T + \hat R]^{-1}$$
$$\triangleq [(P^- + \delta P^-)H^T][HP^-H^T + H\delta P^-H^T + R]^{-1}$$
$$\approx [\widehat{P^-H^T} + \widehat{\delta P^-}H^T][\hat A_0 + H\widehat{\delta P^-}H^T]^{-1} \tag{10-122}$$

For non-real-time applications, local iterations of (10-116)-(10-122) on the same set of N measurements {z_i} could be used to improve these estimates.

Page 156: Stochastic Models, Estimation, And Control Volume 3


With each iteration, the residual sequence would become increasingly more white, with resulting better estimates. Two proposed methods of estimating G_dQ_dG_dᵀ, if desired, are to use (10-120) directly with K̂_* given by (10-122) and P̂_*⁻ given either by [P̂⁻ + δP̂⁻] using (10-113) and (10-121), or by

$$\hat P_*^- = \hat K_*\hat A_0(H^T)^{\#} = \hat K_*\hat A_0(HH^T)^{-1}H \tag{10-123}$$

which is obtained by solving K̂_* = P_*⁻HᵀÂ₀⁻¹ for P_*⁻, as in (10-97) [88].

EXAMPLE 10.21 The above correlation technique is applied to a position- and velocity-aided inertial navigation problem in one direction. The inertial system errors are modeled by a damped Schuler loop (position and velocity error states) forced by an exponentially time-correlated stationary input to depict gyro drift. Both measurements are corrupted by exponentially time-correlated as well as white noises.

The augmented system description therefore has five states, and the equivalent discrete-time model for an iteration period of 0.1 sec is:

$$x(t_{i+1}) = \begin{bmatrix} 0.75 & -1.74 & -0.3 & 0 & -0.15 \\ 0.09 & 0.91 & -0.0015 & 0 & -0.008 \\ 0 & 0 & 0.95 & 0 & 0 \\ 0 & 0 & 0 & 0.55 & 0 \\ 0 & 0 & 0 & 0 & 0.905 \end{bmatrix} x(t_i) + \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 24.6 & 0 & 0 \\ 0 & 0.835 & 0 \\ 0 & 0 & 1.83 \end{bmatrix} w_d(t_i)$$

$$z(t_i) = \begin{bmatrix} 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 & 0 \end{bmatrix} x(t_i) + v(t_i)$$

with

$$Q_d = \begin{bmatrix} q_1 & 0 & 0 \\ 0 & q_2 & 0 \\ 0 & 0 & q_3 \end{bmatrix}, \qquad R = \begin{bmatrix} r_1 & 0 \\ 0 & r_2 \end{bmatrix}$$

The actual values of q₁, q₂, q₃, r₁, and r₂ are unity, but they are assumed unknown, with the best a priori guesses as q̂₁ = 0.25, q̂₂ = 0.5, q̂₃ = 0.75, r̂₁ = 0.4, and r̂₂ = 0.6. It is desired to estimate these values adaptively based on measurements z₁, z₂, ..., z_N, where for this case N is set at a large value, 950.

The nonwhiteness of the residuals of the mistuned filter can be demonstrated in the following manner. It can be shown that the diagonal term [Â_k]ᵢᵢ in (10-116) is an asymptotically Gaussian estimator of [A_k]ᵢᵢ with variance converging to [A₀]ᵢᵢ²/N. Thus, the 95%-confidence limits on [Â_k]ᵢᵢ are ±(1.96/N^{1/2})[Â₀]ᵢᵢ. One can look at a set of values [Â_k]ᵢᵢ for k > 0 and check the number of times they lie outside this confidence band. If this number is less than 5% of the total, the residual sequence can be considered white. In view of (10-112), note that A_k = 0 for all k > 0 for a true white noise and that one would desire Â_k magnitudes to lie within an envelope that decays (perhaps exponentially) with k in order to call the noise "essentially white"; however, since autocorrelation estimates for larger k are based on fewer data points (N − k), there is more uncertainty in these estimated values, and so comparison to a fixed confidence band for all k is reasonable. (Other statistical tests are available [17, 38, 60, 96, 140], some being more appropriate for small N.) For this case, the test was applied to [Â_k]ᵢᵢ for k = 1, 2, ..., 40, and four evaluations (10% of the total) exceeded the confidence limits, and so the hypothesis that the residuals are white can be rejected.

The results of estimating the uncertain parameters repeatedly on the same batch of data are shown in Table 10.14. It is seen that most of the adaptation is accomplished during the first iteration.

Page 157: Stochastic Models, Estimation, And Control Volume 3


TABLE 10.14

Estimates of Qd and R Based on 950 Measurements

                                                                Percentage of Â_k values outside
Number of                                       Likelihood      95%-confidence band
iterations   q̂₁    q̂₂    q̂₃     r̂₁     r̂₂     L(Q̂d, R̂, Z)    Measurement 1   Measurement 2

0            0.25  0.5   0.75   0.4    0.6    -7.008          10              10
1            0.73  1.31  0.867  1.444  0.776  -6.514          2.5             5
2            0.87  1.39  0.797  1.537  0.767  -6.511          2.5             5
3            0.91  1.40  0.776  1.565  0.765  -6.510          2.5             5
4            0.92  1.41  0.770  1.573  0.765  -6.509          2.5             5

Check case   1.0   1.0   1.0    1.0    1.0    -6.507          2.5             5

Further iterations do not increase the likelihood function L(Qd, R, Z) much, where

$$L(Q_d,R,Z) = -\frac{1}{2N}\sum_{i=1}^{N} r_i^T(HP^-H^T + R)^{-1}r_i - \frac{1}{2}\ln|HP^-H^T + R| - \frac{m}{2}\ln(2\pi)$$

even though the changes in Qd and R are significant. Moreover, a check case was run with true values of Qd and R in the filter, and the likelihood function is seen to be very close to that achieved

[Figure: plots of the estimates q̂₁, q̂₂, q̂₃, r̂₁, r̂₂ versus batch number (1 to 10), each converging toward its true value of 1.0.]

FIG. 10.26 Online adaptation. From Mehra [88], © 1970 IEEE.

Page 158: Stochastic Models, Estimation, And Control Volume 3


in the first iteration. This indicates that the obtained estimates are quite close to the maximum likelihood estimates and that the ambiguity function has very low curvature.

This table reflects the performance of Qd estimates achieved by manipulating Eq. (10-113) by repeated back substitution of P⁻ and selection of a linearly independent subset of equations. However, including more equations and performing a least squares fit or using one of the methods described in the text above provide substantially the same performance. Furthermore, essentially the same values are achieved when different initial estimates of Qd and R are employed.

After adaptation, [Â_k]₁₁ and [Â_k]₂₂ were compared to the 95%-confidence limits for k = 1, 2, ..., 40. As seen in Table 10.14, only 2.5% (i.e., one point) of the [Â_k]₁₁ values and 5% of the [Â_k]₂₂ points exceeded these limits, supporting the hypothesis that the residuals are white.

Figure 10.26 [88] demonstrates the performance of this estimator in online usage, generating new estimates of Qd and R on ten successive batches of N measurements (N = 950). Simple averaging is performed on the estimated values: if â_N(k) is the parameter estimate based on the kth batch of N measurements, then the parameter estimate ā(k) produced by the algorithm after processing k batches is

$$\bar a(k) = \begin{cases} \hat a_N(k), & k = 1 \\ \bar a(k-1) + \dfrac{1}{k}\left[\hat a_N(k) - \bar a(k-1)\right], & k = 2, 3, \ldots \end{cases}$$

Note the asymptotic convergence of the estimates towards their true values in this figure. •
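The batch averaging recursion above is the standard running mean; a small sketch (illustrative, not from the text):

```python
def running_average(batch_estimates):
    """Recursive batch averaging: abar(1) = ahat(1);
    abar(k) = abar(k-1) + (1/k)[ahat(k) - abar(k-1)].
    Returns the sequence abar(1..K); abar(k) equals the mean of the first k."""
    abar = None
    out = []
    for k, a_hat in enumerate(batch_estimates, start=1):
        abar = a_hat if k == 1 else abar + (a_hat - abar) / k
        out.append(abar)
    return out
```

This form avoids storing past batches: only the current average and the batch count are retained.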

10.10 COVARIANCE MATCHING AND OTHER TECHNIQUES

Covariance matching [91, 111, 122, 123] entails comparing online estimates of residual covariances and their theoretical values as computed by the state filter, and modifying the tuning of Qd and/or R until the two agree. This modification can often be confined to Qd alone, since the uncertainty in the appropriate noise strength is usually greater for dynamics noise than measurement corruption noise.

EXAMPLE 10.22 Consider tracking a vehicle moving along a single axis from a radar located at the origin of the coordinate system. One possible model to use in a tracking filter would be

$$x(t_{i+1}) = x(t_i) + [\Delta t]\,u(t_i)$$

$$u(t_{i+1}) = u(t_i) + [\Delta t]\,w_d(t_i)$$

where x(t_i) is target position at time t_i, u(t_i) is velocity, w_d(t_i) is a white Gaussian noise meant to model the vehicle acceleration, and Δt is the sample period at which the radar generates noise-corrupted position measurements.

If the target is nonmaneuvering, or essentially maintaining constant velocity, then a very small value of Qd will provide good estimation accuracy. However, if the target initiates dynamic maneuvering, this Qd value will not allow the state estimate to track the true oscillations well. An increased Qd, or perhaps even a better acceleration model than a simple white noise, is required to provide precision tracking. On the other hand, these modifications may severely corrupt the tracking performance during more benign segments of the vehicle trajectory.

Throughout the range of these scenarios, a single measurement model might be used:

$$z(t_i) = x(t_i) + v(t_i)$$

with a single R to represent the radar precision regardless of vehicle maneuvering. •

Page 159: Stochastic Models, Estimation, And Control Volume 3


Assume that R is known. Then the empirically generated estimate of residualcovariance,

$$\hat A = \frac{1}{N}\sum_{j=i-N+1}^{i} r_j r_j^T \tag{10-124}$$

with N chosen to provide statistical smoothing, can be compared to its theoretical value [H(t_i)P(t_i⁻)Hᵀ(t_i) + R(t_i)] from the Kalman filter. For instance, if Â exceeds the filter-derived value (on the basis of eigenvalues, diagonal terms, norm, etc.), then Qd should be increased. One means of obtaining a better Qd would be to manipulate

into

$$HG_dQ_dG_d^TH^T = \hat A(t_i) - H\Phi P(t_{i-1}^+)\Phi^TH^T - R \tag{10-125}$$

Even if Â(t_i) and the P(t_{i-1}⁺) erroneously calculated by the filter were used to evaluate the right hand side of this equation, a unique solution for Qd or [G_dQ_dG_dᵀ] generally does not exist because H is seldom of rank n. If the number of unknowns in Qd is restricted, a unique solution can be generated. Otherwise, pseudoinverses can be used. However, the convergence properties of this method are generally subject to question.

If instead Qd is assumed known, then R can be estimated as

$$\hat R = \frac{1}{N}\sum_{j=i-N+1}^{i} r_j r_j^T - HP(t_i^-)H^T \tag{10-126}$$

Notice that this and the Qd result above are identical to the explicit suboptimal solutions for the maximum likelihood estimates, as given in Section 10.7.
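As a concrete illustration of covariance matching with Qd assumed known, the empirical covariance (10-124) and the R estimate (10-126) might be sketched as follows (a minimal numpy sketch; the function name and arguments are illustrative):

```python
import numpy as np

def covariance_match_R(residuals, H, P_minus):
    """Covariance-matching estimate of R, (10-126), with Qd assumed known:
    R_hat = (1/N) * sum_j r_j r_j^T  -  H P^- H^T,
    over the N most recent residuals."""
    N = len(residuals)
    A_hat = sum(np.outer(r, r) for r in residuals) / N   # empirical covariance (10-124)
    return A_hat - H @ P_minus @ H.T
```

N should be large enough to smooth the empirical covariance, since a short window makes R̂ noisy and possibly indefinite.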

There are numerous related ad hoc adaptation techniques. For instance, one might model Qd as

$$Q_d = Q_{d0} + a\,\Delta Q_d \tag{10-127}$$

where Q_{d0} is a nominal value, ΔQ_d is a perturbation to that nominal, and a is a scalar parameter to be adjusted for best agreement between empirical and theoretical residual covariances. Further, the parameter might be constrained to assume only discrete values, resulting in a multiple model filtering algorithm similar in structure to that of Section 10.8, but in which the state estimate is generated from the one filter with the best residual characteristics, rather than as a weighted sum of K filter outputs.

In many proposed algorithms, a statistical test is performed on the observed residuals to determine if and when to undergo adaptation. Optimal tests can be developed from statistical decision theory [121, 137], but more practical

Page 160: Stochastic Models, Estimation, And Control Volume 3


suboptimal tests are usually implemented, such as determining if a certain percentage of the N most recent residuals have surpassed computed 2σ or 3σ bounds, or determining if the current residual is sufficiently beyond a given threshold while simultaneously the previous N residuals have been of the same sign [49, 66, 67, 91]. However, such a test introduces a time delay in adapting to changes in the true system. If this is not tolerable, reprocessing of the N most recent measurements (stored for this eventuality) can be used to enhance transient estimation performance once the need to adapt is declared.

EXAMPLE 10.23 One proposed filter for the tracking problem, as discussed in Example 10.22, is to provide two different values of Qd, or perhaps two different acceleration models altogether, one for the nonmaneuvering case and one for the more dynamic case. When in the nominal nonmaneuvering mode, the filter declares the target to be maneuvering if N (two or more) consecutive residuals are of the same sign and outside the computed 3σ level [26, 49, 66, 67, 87, 99, 131].

However, the target will have been maneuvering for some time before the actual declaration, and the state estimates may well have diverged enough due to the use of the "erroneous" no-maneuver model to cause recovery problems. Therefore, the most recent N raw measurements (the number of residuals used for the maneuver decision) are retrieved from storage and reprocessed using the heavier weighting of the gains generated by the "maneuver" Qd. Thus, the residual behavior is used to discern the inadequacy of the internal model, and then there is a backsliding of the computation "time" to the point at which the model is first thought to become inadequate. All measurements from that time forward are reprocessed, until the algorithm returns to real-time operation.

Of course, multiple model adaptive filtering provides a viable alternative method for this case. •
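The back-up-and-reprocess logic of Example 10.23 can be sketched as follows. All names here are hypothetical: `filter_step` stands for one propagate/update cycle with the indicated Qd, and `detect` is any maneuver-declaration test on the residual.

```python
def track(measurements, filter_step, x0, p0, qd_nominal, qd_maneuver,
          detect, n_store):
    """On each measurement, run one filter cycle with the nominal Qd.
    When `detect` fires on the residual, rewind the computation "time"
    and reprocess the last n_store stored measurements with the
    heavier-weighting maneuver Qd."""
    x, p = x0, p0
    store = []  # (prior x, prior p, z) for the N most recent cycles
    for z in measurements:
        store.append((x, p, z))
        store = store[-n_store:]
        x, p, resid = filter_step(x, p, z, qd_nominal)
        if detect(resid):
            x, p = store[0][0], store[0][1]   # backslide to stored state
            for _, _, zj in store:            # reprocess with maneuver model
                x, p, _ = filter_step(x, p, zj, qd_maneuver)
    return x, p
```

Once the stored measurements are reprocessed, the loop simply continues, which corresponds to the algorithm returning to real-time operation.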

10.11 SUMMARY

This chapter has considered the online simultaneous estimation of states and uncertain parameters. The basic motivation for the adaptive parameter estimation is to improve the filter's internal model, with regard to both structure (Φ, Bd, and H) and assumed noise strengths (Qd and R), thereby enhancing state estimation performance. Offline methods and algorithms devoted to parameter identification without state estimation have not been discussed in detail, but the substantial amount of literature in these areas [5, 6, 20, 21, 32-34, 37, 41, 42, 44-47, 56, 63, 74, 76-78, 81, 92, 95, 105, 106, 110, 113, 118, 130, 135, 138] can be understood in terms of the concepts presented here. Moreover, attention has been confined to applications in which linear models suffice, although the same basic methods can be used for the more general case.

In many applications, the robustness of the Kalman filter or other estimation algorithms yields adequate state estimation precision despite parameter uncertainties. It may be sufficient to add "pseudonoise" to the filter model to cause a heavier weighting of real-world measurements, thereby decreasing sensitivity to erroneously assumed parameter values. However, the contributions of the parameter errors to model uncertainty often cannot be treated adequately as "equivalent" white noise effects. Sensitivity analyses can be used to indicate whether simultaneous parameter estimation is required and, if so, which parameters should be included to improve performance the most.

If uncertain parameters in Φ, Bd, or H are to be estimated, one can treat these as additional state variables, modeled as random constants, and formulate an extended Kalman filter to handle the resulting nonlinear state estimation problem. Although this is attractive computationally, such solutions often exhibit objectionable bias and convergence properties. Maximum likelihood techniques provide perhaps the best estimation performance, but are burdensome. This chapter revealed the use of a constant-over-N-steps parameter model, approximations, and efficiency-improving techniques to derive algorithms with both online practicality and desirable performance. Discretization of the possible parameter values and Bayesian techniques produced the multiple model filtering algorithm, which may become particularly attractive with the increasing use of microprocessors and distributed computation in computer architecture.
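The state-augmentation idea can be made concrete for a scalar system x(i+1) = Φ x(i) + w(i) with uncertain Φ: augment the state with the parameter, modeled as a random constant, and apply one extended Kalman filter cycle to the now-nonlinear augmented dynamics. This is only a sketch under those assumptions; the function and variable names are ours, and (as the text cautions) such a filter can exhibit bias and convergence problems.

```python
import numpy as np

def augmented_ekf_step(xa, P, z, qd, r, h=1.0):
    """One EKF cycle on the augmented state xa = [x, phi], where the
    uncertain transition parameter phi is modeled as a random constant."""
    x, phi = xa
    # propagate: x' = phi*x, phi' = phi  (nonlinear in the augmented state)
    x_pred = np.array([phi * x, phi])
    F = np.array([[phi, x],      # Jacobian row for phi*x w.r.t. [x, phi]
                  [0.0, 1.0]])   # random-constant parameter dynamics
    Q = np.diag([qd, 0.0])       # no pseudonoise added on the parameter
    P = F @ P @ F.T + Q
    # update with scalar measurement z = h*x + v
    H = np.array([h, 0.0])
    s = H @ P @ H + r            # residual variance
    K = P @ H / s
    xa = x_pred + K * (z - h * x_pred[0])
    P = (np.eye(2) - np.outer(K, H)) @ P
    return xa, P
```

Adding a small pseudonoise term on the parameter (the zero in Q above) is the usual ad hoc remedy when the parameter estimate locks up prematurely.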

If uncertainties lie in Qd or R, maximum likelihood techniques and multiple model filtering algorithms can similarly be exploited. More ad hoc methods, such as residual "whitening" and covariance matching, are also available, requiring less computation and storage but generally providing inferior performance.
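For uncertain Qd discretized to a few candidate values, one cycle of the multiple model approach can be sketched as below: each elemental filter assumes one Qd value, the model probabilities are reweighted by the Gaussian likelihoods of the residuals, and the state estimate is the probability-weighted blend. This is a minimal scalar illustration; the class and function names are ours, not the book's.

```python
import numpy as np

class ScalarKF:
    """Elemental scalar Kalman filter with one assumed Qd value."""
    def __init__(self, qd, phi=1.0, h=1.0, r=1.0, x0=0.0, p0=1.0):
        self.qd, self.phi, self.h, self.r = qd, phi, h, r
        self.x, self.p = x0, p0
    def step(self, z):
        self.x = self.phi * self.x
        self.p = self.phi * self.p * self.phi + self.qd
        s = self.h * self.p * self.h + self.r   # residual variance
        nu = z - self.h * self.x                # residual
        k = self.p * self.h / s
        self.x += k * nu
        self.p *= (1.0 - k * self.h)
        return self.x, nu, s

def mmae_step(filters, probs, z):
    """One multiple model cycle: update each elemental filter, reweight
    model probabilities by the residual likelihoods, blend estimates."""
    like = np.empty(len(filters))
    x_hats = np.empty(len(filters))
    for k, f in enumerate(filters):
        x_hats[k], nu, s = f.step(z)
        like[k] = np.exp(-0.5 * nu * nu / s) / np.sqrt(2.0 * np.pi * s)
    probs = probs * like
    probs = probs / probs.sum()
    return float(probs @ x_hats), probs
```

Replacing the weighted blend with the single estimate of the highest-probability filter gives the "best residual characteristics" variant mentioned after (10-127).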

Throughout this chapter, the residuals in the state estimator have played a key role. It is this difference between measurement data and their model-predicted values that allows any model adaptation to take place. The various estimators of this chapter have differed primarily in the means of exploiting the residuals' characteristics in order to provide simultaneous estimation of uncertain model parameters.

The topic of designing control inputs to aid the parameter identification was introduced in this chapter. However, detailed treatment of this aspect will be postponed until Chapter 15 (Volume 3).

REFERENCES

1. Abramson, P. D., Jr., Simultaneous Estimation of the State and Noise Statistics in Linear Dynamic Systems, Rep. TE-25, Sc.D. dissertation, MIT, Cambridge, Massachusetts (May 1968).

2. Albert, A. E., and Gardner, L. A., Jr., "Stochastic Approximation and Nonlinear Regression," Research Monograph No. 42. MIT Press, Cambridge, Massachusetts, 1967.

3. Alspach, D. L., A parallel filtering algorithm for linear systems with unknown time varying statistics, IEEE Trans. Automat. Control AC-19 (5), 552-556 (1974).

4. Arimoto, S., and Kimura, H., Optimum input test signals for system identification-An information-theoretical approach, Internat. J. Syst. Sci. 1, 279-290 (1971).

5. Asher, R. B., Andrisani, D., II, and Dorato, P., Bibliography on adaptive control systems, Proc. IEEE 64 (8), 1226-1240 (1976).

6. Astrom, K. J., and Eykhoff, P., System identification-A survey, Automatica 7 (2), 123-162 (1971).


7. Athans, M., et al., The stochastic control of the F-8C aircraft using a multiple model adaptive control (MMAC) method-Part I: Equilibrium flight, IEEE Trans. Automat. Control AC-22 (5), 768-780 (1977).

8. Athans, M., and Chang, C. B., Adaptive Estimation and Parameter Identification Using Multiple Model Estimation Algorithm, ESD-TR-76-184, Tech. Note 1976-28. Lincoln Lab., Lexington, Massachusetts (June 1976).

9. Athans, M., Whiting, R. H., and Gruber, M., A suboptimal estimation algorithm with probabilistic editing for false measurements with application to target tracking with wake phenomena, IEEE Trans. Automat. Control AC-22 (3), 372-384 (1977).

10. Baram, Y., Information, Consistent Estimation and Dynamic System Identification, Rep. ESL-R-718. Electronic Systems Lab., Dept. of Elec. Eng., MIT, Cambridge, Massachusetts (November 1976).

11. Baram, Y., and Sandell, N. R., Jr., An information theoretic approach to dynamic system modelling and identification, IEEE Trans. Automat. Control AC-23 (1), 61-66 (1978).

12. Baram, Y., and Sandell, N. R., Jr., Consistent estimation of finite parameter sets with application to linear systems identification, IEEE Trans. Automat. Control AC-23 (3), 451-454 (1978).

13. Bar-Shalom, Y., Optimal simultaneous state estimation and parameter identification in linear discrete-time systems, IEEE Trans. Automat. Control AC-17 (3), 308-319 (1972).

14. Bar-Shalom, Y., Tracking methods in a multitarget environment, IEEE Trans. Automat. Control AC-23 (4), 618-626 (1978).

15. Bar-Shalom, Y., and Tse, E., Tracking in a cluttered environment with probabilistic data association, Automatica 11, 451-460 (1975).

16. Belanger, P. R., Estimation of noise covariance matrices for a linear time-varying stochastic process, Automatica 10, 267-275 (1974).

17. Box, G. E. P., and Jenkins, G. M., "Time Series Analysis: Forecasting and Control." Holden-Day, San Francisco, California, 1976.

18. Brewer, H. W., Identification of the noise characteristics in a Kalman filter, "Control and Dynamic Systems: Advances in Theory and Applications" (C. T. Leondes, ed.), Vol. 12, pp. 491-582. Academic Press, New York, 1976.

19. Brockett, R. W., "Finite Dimensional Linear Systems." Wiley, New York, 1970.

20. Caines, P. E., Prediction error identification methods for stationary stochastic processes, IEEE Trans. Automat. Control AC-21 (4), 500-505 (1976).

21. Caines, P. E., and Rissanen, J., Maximum likelihood estimation for multivariable Gaussian stochastic processes, IEEE Trans. Informat. Theory IT-20 (1), 102-104 (1974).

22. Chang, C. B., Whiting, R. H., and Athans, M., The Application of Adaptive Filtering Methods to Maneuvering Trajectory Estimation, Rep. No. TN-1975-59. Lincoln Lab., MIT, Lexington, Massachusetts (November 24, 1975).

23. Chang, C. B., Whiting, R. H., and Athans, M., On the state and parameter estimation for maneuvering re-entry vehicles, IEEE Trans. Automat. Control AC-22 (1), 99-105 (1977).

24. Cox, H., On the estimation of the state variables and parameters for noisy dynamic systems, IEEE Trans. Automat. Control AC-9 (1), 5-12 (1964).

25. Cramer, H., "Mathematical Methods of Statistics." Princeton Univ. Press, Princeton, New Jersey, 1946.

26. Demetry, J. S., and Titus, H. A., Adaptive Tracking of Maneuvering Targets, Technical Rep. NPS-52DE804IA. Naval Postgraduate School, Monterey, California (April 1968).

27. Denham, M. J., Canonical forms for the identification of multivariable linear systems, IEEE Trans. Automat. Control AC-19 (6), 646-656 (1974).

28. Detchmendy, D. M., and Sridhar, R., Sequential estimation of states and parameters in noisy nonlinear dynamical systems, J. Basic Eng. (Trans. ASME), Ser. D 88 (2), 362-368 (1966).


29. Dunn, H. J., and Montgomery, R. C., A moving window parameter adaptive control system for the F8-DFBW aircraft, IEEE Trans. Automat. Control AC-22 (5), 788-795 (1977).

30. Eulrich, B. J., Andrisani, D., II, and Lainiotis, D. G., Partitioning identification algorithms, IEEE Trans. Automat. Control AC-25 (3), 521-528 (1980).

31. Eykhoff, P., Process parameter and state estimation, Automatica 4, 205-233 (1968).

32. Eykhoff, P., "System Identification." Wiley, New York, 1974.

33. Farison, J. B., Graham, R. E., and Shelton, R. C., Identification and control of linear discrete systems, IEEE Trans. Automat. Control AC-12 (4), 438-442 (1967).

34. Galiana, F., A Review of Basic Principles and of Available Techniques in System Identification, Tech. Rep. 20. Power Systems Engineering Group, MIT, Cambridge, Massachusetts (November 1969).

35. Geise, C., and McGhee, R. B., Estimation of nonlinear system states and parameters by regression methods, Preprints of the Joint Automat. Control Conf., Rensselaer Polytechnic Inst., Troy, New York, pp. 46-55 (June 1965).

36. Glover, K., and Willems, J. C., Parametrization of linear dynamical systems-canonical forms and identifiability, IEEE Trans. Automat. Control AC-19 (6), 640-646 (1974).

37. Goodwin, G. C., and Payne, R. L., "Dynamic System Identification: Experimental Design and Data Analysis." Academic Press, New York, 1977.

38. Granger, C. W. J., A quick test for the serial correlation suitable for use with nonstationary time series, Amer. Statist. Assoc. J., 728-736 (September 1963).

39. Gupta, N. K., Efficient computation of gradient and Hessian of likelihood function in linear dynamic systems, IEEE Trans. Automat. Control AC-21 (5), 781-783 (1976).

40. Gupta, N. K., and Hall, W. E., Jr., Input Design for Identification of Aircraft Stability and Control Derivatives, Tech. Rep. NASA-CR-2493 (February 1975).

41. Gupta, N. K., and Hall, W. E., Jr., System identification technology for estimating re-entry vehicle aerodynamic coefficients, AIAA J. Guidance and Control 2 (2), 139-146 (1979).

42. Gupta, N. K., Hall, W. E., Jr., and Trankle, T. L., Advanced methods of model structure determination from test data, AIAA J. Guidance and Control 1 (3), 197-204 (1978).

43. Gupta, N. K., and Mehra, R. K., Computational aspects of maximum likelihood estimation and reduction in sensitivity function calculations, IEEE Trans. Automat. Control AC-19 (6), 774-783 (1974).

44. Gustavsson, I., Survey of application of identification in chemical and physical processes, Automatica 11, 3-24 (1975).

45. Gustavsson, I., Ljung, L., and Soderstrom, T., Survey paper-identification of processes in closed-loop-identifiability and accuracy aspects, Automatica 13, 59-75 (1977).

46. Hall, W. E., Jr., Identification of Aircraft Stability Derivatives for the High Angle-of-Attack Regime, Tech. Rep. No. 1 for ONR Contract No. N00014-72-C-0328. Systems Control, Inc., Palo Alto, California (June 1973).

47. Hall, W. E., Jr., and Gupta, N. K., System identification for nonlinear aerodynamic flight regimes, J. Spacecraft and Rockets 14 (2), 73-80 (1977).

48. Hampton, R. L. T., and Cooke, J. R., Unsupervised tracking of maneuvering vehicles, IEEE Trans. Aerospace Electron. Systems AES-9 (2), 197-207 (1973).

49. Heller, B. J., Adapting an Alpha Beta Tracker in a Maneuvering Target Environment, Tech. Note 304-154. Naval Weapons Center, China Lake, California (July 1967).

50. Hilborn, C. G., Jr., and Lainiotis, D. G., Optimal estimation in the presence of unknown parameters, IEEE Trans. Systems Sci. Cybernet. SSC-5 (1), 38-43 (1969).

51. Ho, Y., On the stochastic approximation method and optimal filtering theory, J. Math. Anal. Appl. 6, 152-154 (1962).

52. Ho, Y., The Method of Least Squares and Optimal Filtering Theory, Memo. RM-3329-PR. The Rand Corp., Santa Monica, California (October 1962).

53. Ho, Y. C., and Lee, R. C. K., A Bayesian approach to problems in stochastic estimation and control, IEEE Trans. Automat. Control AC-9 (5) (1964).


54. Ho, Y., and Lee, R. C. K., Identification of linear dynamic systems, Inform. and Control 8 (1), 93-110 (1965).

55. Ho, Y., and Whalen, B. H., An approach to the identification and control of linear dynamic systems with unknown parameters, IEEE Trans. Automat. Control AC-8 (3), 255-256 (1963).

56. Iliff, K. W., and Maine, R. E., Observations on maximum likelihood estimation of aerodynamic characteristics from flight data, AIAA J. Guidance and Control 2 (3), 228-234 (1979).

57. Isaacson, E., and Keller, H. B., "Analysis of Numerical Methods." Wiley, New York, 1966.

58. Iserman, R., Baur, U., Bamberger, W., Kneppo, P., and Siebert, H., Comparison of six on-line identification and parameter estimation methods, Automatica 10, 81-103 (1974).

59. Jazwinski, A. H., Adaptive filtering, Automatica 5, 975-985 (1969).

60. Jenkins, G. M., and Watts, D. G., "Spectral Analysis and its Applications." Holden-Day, San Francisco, California, 1968.

61. Jensen, R. L., and Harnly, D. A., An Adaptive Distributed-Measurement Extended Kalman Filter for a Short Range Tracker. M.S. thesis, Air Force Institute of Technology, Wright-Patterson AFB, Ohio (December 1979).

62. Johnson, C. R., Jr., The common parameter estimation basis of adaptive filtering, identification and control, Proc. IEEE Conf. Decision and Control, Albuquerque, New Mexico, pp. 447-452 (December 1980).

63. Kailath, T. (ed.), Special issue on system identification and time-series analysis, IEEE Trans. Automat. Control AC-19 (6) (1974).

64. Kale, B. K., On the solution of the likelihood equation by iteration processes, Biometrika 48, 452-456 (1961).

65. Kashyap, R. L., Maximum likelihood identification of stochastic linear systems, IEEE Trans. Automat. Control AC-15 (1), 25-34 (1970).

66. Kendrick, J. D., Estimation of Aircraft Target Motion by Exploiting Pattern Recognition Derivates. Ph.D. dissertation, Air Force Institute of Technology, Wright-Patterson AFB, Ohio (March 1978).

67. Kendrick, J. D., Maybeck, P. S., and Reid, J. G., Estimation of aircraft target motion using orientation measurements, IEEE Trans. Aerospace Electron. Syst. AES-17 (2), 254-260 (1981).

68. Kerr, R. B., Statistical, structural, and regression models in linear process identification, Preprints of the Joint Automat. Control Conf., Rensselaer Polytechnic Inst., Troy, New York, pp. 458-466 (June 1965).

69. Keverian, K. M., and Sandell, N. R., Jr., Multiobject Tracking by Adaptive Hypothesis Testing, Tech. Rep. LIDS-R-959. MIT Lab. for Information and Decision Systems, Cambridge, Massachusetts (December 1979).

70. Kolibaba, R. L., Precision Radar Pointing and Tracking Using an Adaptive Extended Kalman Filter. M.S. thesis, Air Force Institute of Technology, Wright-Patterson AFB, Ohio (June 1973).

71. Kumar, K. S. P., and Sridhar, R., On the identification of control systems by the quasi-linearization method, IEEE Trans. Automat. Control AC-9 (2), 151-154 (1964).

72. Lainiotis, D. G., Optimal adaptive estimation: Structure and parameter adaptation, IEEE Trans. Automat. Control AC-15 (4), 160-170 (1971).

73. Lainiotis, D. G., Partitioning: A unifying framework for adaptive systems, I: Estimation, Proc. IEEE 64 (8) (1976).

74. Lee, R. C. K., "Optimal Estimation, Identification and Control." MIT Press, Cambridge, Massachusetts, 1964.

75. Levin, M. J., Estimation of a system pulse transfer function in the presence of noise, IEEE Trans. Automat. Control AC-9 (3), 229-235 (1964).

76. Ljung, L., On the consistency of prediction error identification methods, "System Identification: Advances and Case Studies" (R. K. Mehra and D. G. Lainiotis, eds.), pp. 121-164. Academic Press, New York, 1976.


77. Ljung, L., Convergence analysis of parametric identification methods, IEEE Trans. Automat. Control AC-23 (5), 770-782 (1978).

78. Ljung, L., Gustavsson, I., and Soderstrom, T., Identification of linear, multivariable systems operating under linear feedback control, IEEE Trans. Automat. Control AC-19 (6), 836-840 (1974).

79. Lopez-Toledo, A. A., Optimal Inputs for Identification of Stochastic Systems. Ph.D. dissertation, MIT, Cambridge, Massachusetts, 1974.

80. Magill, D. T., Optimal adaptive estimation of sampled stochastic processes, IEEE Trans. Automat. Control AC-10 (5), 434-439 (1965).

81. Martin, W. C., and Stubberud, A. R., The innovations process with applications to identification, "Control and Dynamic Systems: Advances in Theory and Applications" (C. T. Leondes, ed.), Vol. 12, pp. 173-258. Academic Press, New York, 1976.

82. Maybeck, P. S., Combined Estimation of States and Parameters for On-Line Applications, Rep. T-557, Ph.D. dissertation, MIT, Cambridge, Massachusetts (February 1972).

83. Maybeck, P. S., "Stochastic Models, Estimation and Control." Vol. 1. Academic Press, New York, 1979.

84. Maybeck, P. S., Cusumano, S. J., DePonte, M., Jr., and Negro, J. E., Enhanced fire control system filtering via refined air-to-air missile acceleration modelling, Proc. 1978 IEEE Conf. Decision and Control, San Diego, California, pp. 80-87 (January 1979).

85. Maybeck, P. S., Harnly, D. A., and Jensen, R. L., Robustness of a new infrared target tracker, Proc. IEEE Nat. Aerospace and Electron. Conf., Dayton, Ohio, pp. 639-644 (May 1980).

86. Maybeck, P. S., Jensen, R. L., and Harnly, D. A., An adaptive extended Kalman filter for target image tracking, IEEE Trans. Aerospace Electron. Syst. AES-17 (2), 172-180 (1981).

87. McAulay, R. J., and Denlinger, E., A decision-directed adaptive tracker, IEEE Trans. Aerospace and Electron. Syst. AES-9 (2), 229-236 (1973).

88. Mehra, R. K., On the identification of variances and adaptive Kalman filtering, IEEE Trans. Automat. Control AC-15 (2), 175-184 (1970).

89. Mehra, R. K., Identification of stochastic linear dynamic systems using Kalman filter representation, AIAA J. 9 (1), 28-31 (1971).

90. Mehra, R. K., On-line identification of linear dynamic systems with applications to Kalman filtering, IEEE Trans. Automat. Control AC-16 (1), 12-21 (1971).

91. Mehra, R. K., Approaches to adaptive filtering, IEEE Trans. Automat. Control AC-17 (5), 693-698 (1972).

92. Mehra, R. K., Identification in control and econometrics: similarities and differences, Ann. Econ. Soc. Measurement 3, 21-48 (1974).

93. Mehra, R. K., Optimal inputs for linear system identification, IEEE Trans. Automat. Control AC-19 (3), 192-200 (1974).

94. Mehra, R. K., Optimal input signals for parameter estimation-Survey and new results, IEEE Trans. Automat. Control AC-19 (6), 753-768 (1974).

95. Mehra, R. K., and Lainiotis, D. G., "System Identification: Advances and Case Studies." Academic Press, New York, 1976.

96. Mehra, R. K., and Peschon, J., An innovations approach to fault detection and diagnosis in dynamic systems, Automatica 7, 637-640 (1971).

97. Moore, J. B., and Hawkes, R. M., Decision methods in dynamic system identification, Proc. IEEE Conf. Decision and Control, Houston, Texas, pp. 645-650 (1975).

98. Moose, R., Applications of adaptive state estimation theory, Proc. IEEE Conf. Decision and Control, Albuquerque, New Mexico, pp. 568-575 (December 1980).

99. Nahi, N. E., and Schaefer, B. M., Decision-directed adaptive recursive estimators: Divergence prevention, IEEE Trans. Automat. Control AC-17 (1), 61-67 (1972).


100. Nash, R. A., and Tuteur, F. B., The effect of uncertainties in the noise covariance matrices on the maximum likelihood estimate of a vector, IEEE Trans. Automat. Control AC-13 (1), 86-88 (1968).

101. Neal, S. R., Linear estimation in the presence of errors in assumed plant dynamics, IEEE Trans. Automat. Control AC-12 (5), 592-594 (1967).

102. Ng, T. S., Goodwin, G. C., and Anderson, B. D. O., Identifiability of MIMO linear dynamic systems operating in closed loop, Automatica 13, 477-485 (1977).

103. Ohap, R. F., and Stubberud, A. R., Adaptive minimum variance estimation in discrete-time linear systems, "Control and Dynamic Systems: Advances in Theory and Applications" (C. T. Leondes, ed.), Vol. 12, pp. 583-624. Academic Press, New York, 1976.

104. Olmstead, D. N., Optimal Feedback Controls for Parameter Identification. Ph.D. dissertation, Air Force Institute of Technology, Wright-Patterson AFB, Ohio (March 1979).

105. Pearson, J. D., Deterministic aspects of the identification of linear discrete dynamic systems, IEEE Trans. Automat. Control AC-12 (6), 764-766 (1967).

106. Phatak, A., Weinert, H., and Segall, I., Identification of the Optimal Control Model for the Human Operator, Tech. Rep., Systems Control (1974).

107. Rao, C. R., "Linear Statistical Inference and its Applications." Wiley, New York, 1968.

108. Rauch, H. E., Tung, F., and Striebel, C. T., Maximum likelihood estimates of linear dynamic systems, AIAA J. 3 (8), 1445-1450 (1965).

109. Roussas, G. G., Extension to Markov processes of a result by A. Wald about the consistency of maximum likelihood estimates, Z. Wahrscheinlichkeitstheorie 4, 69-73 (1975).

110. Rogers, A. E., and Steiglitz, K., Maximum likelihood estimation of rational transfer function parameters, IEEE Trans. Automat. Control AC-12 (5), 594-597 (1967).

111. Sage, A. P., and Husa, G. W., Adaptive filtering with unknown prior statistics, Proc. Joint Automat. Control Conf., Boulder, Colorado, pp. 760-769 (1969).

112. Sage, A. P., and Masters, G. W., On-line estimation of states and parameters for discrete nonlinear dynamic systems, Proc. Nat. Electron. Conf., Chicago, Illinois 22, 677-682 (October 1966).

113. Sage, A. P., and Melsa, J. L., "System Identification." Academic Press, New York, 1971.

114. Sakrison, D. J., The use of stochastic approximation to solve the system identification problem, IEEE Trans. Automat. Control AC-12 (5), 563-567 (1967).

115. Sandell, N. R., Jr., and Yared, K. I., Maximum Likelihood Identification of State Space Models for Linear Dynamic Systems, Tech. Rep. ESL-R-814. MIT Electronic Systems Lab., Cambridge, Massachusetts (April 1978).

116. Saridis, G. N., Comparison of six on-line identification algorithms, Automatica 10, 69-79 (1974).

117. Saridis, G. N., and Lobbia, R. N., Parameter identification and control of linear discrete-time systems, IEEE Trans. Automat. Control AC-17 (1), 52-60 (1972).

118. Schulz, E. R., Estimation of pulse transfer function parameters by quasilinearization, IEEE Trans. Automat. Control AC-13 (4), 424-426 (1968).

119. Schweppe, F. C., Algorithms for Estimating a Re-Entry Body's Position, Velocity and Ballistic Coefficient in Real Time or for Post Flight Analysis, Group Rep. 1964-64. Lincoln Lab., Lexington, Massachusetts (1964).

120. Schweppe, F. C., Evaluation of likelihood functions for Gaussian signals, IEEE Trans. Inform. Theory IT-11 (1), 61-70 (1965).

121. Schweppe, F. C., "Uncertain Dynamic Systems." Prentice-Hall, Englewood Cliffs, New Jersey, 1973.

122. Shellenbarger, J. C., Estimation of covariance parameters for an adaptive Kalman filter, Proc. Nat. Electron. Conf. 22, 698-702 (1966).

123. Shellenbarger, J. C., A multivariance learning technique for improved dynamic system performance, Proc. Nat. Electron. Conf. 23, 146-151 (1967).


124. Siferd, R. E., Observability and Identifiability of Nonlinear Dynamical Systems with an Application to the Optimal Control Model for the Human Operator. Ph.D. dissertation, Air Force Institute of Technology, Wright-Patterson AFB, Ohio (June 1977).

125. Sims, F. L., and Lainiotis, D. G., Recursive algorithm for the calculation of the adaptive Kalman filter weighting coefficients, IEEE Trans. Automat. Control AC-14 (2), 215-217 (1969).

126. Smith, G. L., Sequential estimation of observation error variances in a trajectory estimation problem, AIAA J. 5, 1964-1970 (1967).

127. Soderstrom, T., On the uniqueness of maximum likelihood identification, Automatica 11, 193-197 (1975).

128. Soderstrom, T., Gustavsson, I., and Ljung, L., Identifiability conditions for linear systems operating in closed-loop, Internat. J. Control 21 (2), 243-255 (1975).

129. Soderstrom, T., Ljung, L., and Gustavsson, I., Identifiability conditions for linear multivariable systems operating under feedback, IEEE Trans. Automat. Control AC-21 (6), 837-839 (1976).

130. Stepner, D. E., and Mehra, R. K., Maximum Likelihood Identification and Optimal Input Design for Identifying Aircraft Stability and Control Derivatives, NASA Rep. CR-2200. Systems Control, Inc., Palo Alto, California (March 1973).

131. Tenney, R. R., Hebbert, R. S., and Sandell, N. R., Jr., A tracking filter for maneuvering sources, IEEE Trans. Automat. Control AC-22 (2), 246-261 (1977).

132. Tse, E., Information matrix and local identifiability of parameters, Proc. Joint Automatic Control Conf., Columbus, Ohio, pp. 611-619 (1973).

133. Tse, E., and Anton, J., On identifiability of parameters, IEEE Trans. Automat. Control AC-17 (5), 637-646 (1972).

134. Tse, E., and Weinert, H., Correction and extension of "On the identifiability of parameters," IEEE Trans. Automat. Control AC-18 (6), 687-688 (1973).

135. Tse, E., and Weinert, H., Structure determination and parameter identification for multivariable stochastic linear systems, IEEE Trans. Automat. Control AC-20 (5), 603-612 (1975).

136. Upadhyaya, B. R., and Sorenson, H. W., Synthesis of linear stochastic signals in identification problems, Automatica 13, 615-622 (1977).

137. Van Trees, H. L., "Detection, Estimation and Modulation Theory," Vol. 1. Wiley, New York, 1968.

138. Vincent, J. H., Hall, W. E., Jr., and Bohn, J. G., Analysis of T-2C High Angle of Attack Flight Test Data with Nonlinear System Identification Methodology, Tech. Rep. ONR-CR212-259-IF. Systems Control, Inc., Palo Alto, California (March 1978).

139. Wald, A., Note on the consistency of the maximum likelihood estimate, Ann. Math. Statist. 20, 595-601 (1949).

140. Watson, G. S., and Durbin, J., Exact tests of serial correlation using noncircular statistics, Ann. Math. Statist. 22, 446-451 (1951).

141. Weiss, I. M., A survey of discrete Kalman-Bucy filtering with unknown noise covariances, Proc. AIAA Guidance, Control, and Flight Mech. Conf., Santa Barbara, California, Paper No. 70-955 (August 1970).

142. Wellstead, P. E., Reference signals for closed-loop identification, Internat. J. Control 26, 945-962 (1977).

143. Wellstead, P. E., and Edmunds, J. M., Least-squares identification of closed-loop systems, Internat. J. Control 21 (4), 689-699 (1975).

144. Widnall, W. S., "Applications of Optimal Control Theory to Computer Controller Design," Research Monograph No. 48. MIT Press, Cambridge, Massachusetts, 1968.

145. Wilks, S. S., "Mathematical Statistics." Wiley, New York, 1963.

146. Willsky, A. S., A survey of design methods for failure detection in dynamic systems, Automatica 12, 601-611 (1976).


147. Yared, K. I., On Maximum Likelihood Identification of Linear State Space Models, Ph.D. dissertation, Rep. LIDS-TH-920. MIT Lab. for Inform. and Decision Systems, Cambridge, Massachusetts (July 1979).

148. Zarrop, M. B., and Goodwin, G. C., Comments on optimal inputs for system identification, IEEE Trans. Automat. Control AC-20 (2), 299-300 (1975).

PROBLEMS

10.1 Consider the simple problem formulation of (10-1)-(10-5) in which all variables are scalar. Let scalar Φ and Bd be uncertain parameters to estimate simultaneously with the scalar state x.

(a) Write out the likelihood equations, (10-20) and (10-21), in explicit detail.

(b) Generate the full-scale estimator equations (10-32)-(10-57) for this special case.

(c) Demonstrate the simplification due to using the approximation given by (10-58)-(10-60).

(d) Develop the explicit evaluation of the ambiguity function given by (10-66) by assuming a scalar truth model (so model order reduction effects do not come into play).

(e) Generate the online estimator equations based upon (10-67) and (10-68); compare these to the online estimator based on (10-69).

(f) Demonstrate the simplification due to including only weighted least squares type terms, as discussed in relation to (10-73)-(10-75). Compare the number of computations to that required when all likelihood equation terms are included.

(g) What portion of these estimator equations can be precomputed and stored, perhaps as a function of assumed parameter values?

(h) Assume that possible Φ values are adequately represented by three discrete values, Φ1, Φ2, and Φ3, while Bd is adequately represented by two values, Bd1 and Bd2. Develop the multiple model filtering algorithm for this problem description.

10.2 Write out the equations in explicit detail for the simultaneous state estimation and system identification algorithms of

(a) Example 10.3.

(b) Example 10.4.

(c) Example 10.8.

(d) Example 10.10.

(e) Example 10.13.

(f) Example 10.15.

(g) Example 10.19.

10.3 As in Problem 10.1, consider the simple problem formulation of (10-1)-(10-5) in which all variables are scalar. Now consider either Qd or R being uncertain parameters requiring simultaneous estimation along with the state.

(a) Write the full-scale estimator equations (10-32)-(10-57), as modified by the results of the initial part of Section 10.7, for the case of estimating x and Qd.

(b) Explicitly show the simplification due to using (10-85).

(c) Establish the closed-form approximation (10-93) for this problem.

(d) Assume that possible Qd values are adequately represented by three discrete values, Qd1, Qd2, and Qd3. Develop the multiple model filtering algorithm for this problem.

(e) Repeat (a) for the case of estimating x and R.

Page 169: Stochastic Models, Estimation, And Control Volume 3

152 10. PARAMETER UNCERTAINTIES AND ADAPTIVE ESTIMATION

(f) Repeat (b) for the case of estimating x and R.

(g) Establish the closed-form approximations (10-88) and (10-90) for this problem, and compare them.

(h) Assume that possible R values are well represented by three discrete values, R1, R2, and R3. Develop the multiple model filtering algorithm for these models.

(i) Develop the multiple model filtering algorithm for uncertain Qd and R, discretized as in (d) and (h).

(j) Apply (10-95)-(10-97) to this problem, for cases of uncertainties in Qd, or R, or both.

(k) Develop the self-tuning adaptive filter for this problem, based on correlation methods, as described in Section 10.9. Consider cases of Qd uncertain, R uncertain, and both Qd and R uncertain.

(l) Develop the self-tuning adaptive filter for this problem, based on covariance matching techniques, as described in Section 10.10. Consider cases of Qd uncertain, R uncertain, and both Qd and R uncertain.

10.4 Write out the equations in explicit detail for the self-tuning adaptive estimators of

(a) Example 10.18, single input-single output case, with and without precomputed J⁻¹.

(b) Example 10.18, two input-two output case, with and without precomputed J⁻¹.

(c) Example 10.18, scalar Qd estimated via (10-93) and scalar R via either (10-88) or (10-90).

(d) Example 10.18, 2-by-2 Qd estimated via (10-93) and 2-by-2 R via either (10-88) or (10-90).

(e) Example 10.21.

(f) Example 10.22 by covariance matching.

(g) Example 10.22 by multiple model filtering.

10.5 (a) Consider an alternative to the development of (10-15) and (10-18) that would be useful for the case of Qd ≡ 0. Instead of using (10-9), show that Bayes' rule can also produce

f_{x(t_i), Z(t_i)|a}(ξ, Z_i | α) = f_{x(t_i)|a}(ξ | α) ∏_{j=1}^{i} f_{z(t_j)|Z(t_{j-1}), x(t_i), a}(z_j | Z_{j-1}, ξ, α)

and that each term in this product can be expressed conveniently in Gaussian form if Qd ≡ 0. Generate these forms explicitly and establish the appropriate likelihood equations to be solved simultaneously for x̂*(t_i) and â*(t_i).

(b) Repeat for the case of fixed-length memory parameter estimation and fixed-length memory state estimation by showing Bayes' rule can produce

10.6 (a) Develop the maximum likelihood estimator equations as in Section 10.3 for entries labeled (1) in Table 10.1.

(b) Repeat for entries labeled (2).

(c) Repeat for entries labeled (3).

10.7 Derive the finite-memory maximum likelihood state and parameter estimator based upon the likelihood function ln f_{x(t_i), Z_N(t_i)|a}(ξ, Z_N | α). Show that the estimator is of the same form as in Section 10.4, except that the initial conditions x̂(t⁺_{i-N}), P(t⁺_{i-N}), and E{x(t_{i-N})xᵀ(t_{i-N})} are replaced by x̄(t_{i-N}), P̄(t_{i-N}), and x̄(t_{i-N})x̄ᵀ(t_{i-N}), respectively, where x̄(t_{i-N}) is generated from a one-step propagation from the previous value x̄(t_{i-N-1}),


for i ≥ (N + 1), starting from the initial condition x̄(t₀) = x̂₀, and similarly P̄(t_{i-N}) is obtained from

P̄(t_{i-N}) = Φ(t_{i-N}, t_{i-N-1}; â*(t_i)) P̄(t_{i-N-1}) Φᵀ(t_{i-N}, t_{i-N-1}; â*(t_i)) + G_d(t_{i-N-1}) Q_d(t_{i-N-1}) G_dᵀ(t_{i-N-1})

for i ≥ (N + 1), with initial condition P̄(t₀) = P₀.

10.8 Consider the special case of a single input-single output linear dynamic system

y(t_{i+1}) = φ₀ y(t_i) + ⋯ + φ_{n-1} y(t_{i-n+1}) + b u(t_i) + w_d(t_i)

where φ₀, ..., φ_{n-1}, and b are uncertain, u is a control input, and w_d(·,·) is zero-mean white Gaussian discrete-time noise of constant variance Qd. Note that this can be expressed in state space model form as

x(t_{i+1}) = [ 0 ⋮ I ]            [ 0 ]            [ 0 ]
             [ ······ ] x(t_i) +  [ ⋮ ] u(t_i) +  [ ⋮ ] w_d(t_i)
             [   ψᵀ   ]            [ b ]            [ 1 ]

z(t_i) = [ 0ᵀ ⋮ 1 ] x(t_i)

by letting

xᵀ(t_i) ≜ [y(t_{i-n+1}) ⋯ y(t_{i-1}) y(t_i)]

ψᵀ ≜ [φ_{n-1} ⋯ φ₁ φ₀]

(a) Generate the likelihood equations to be solved to form the parameter estimator based upon a likelihood function of

ln f_{Z_N(t_i)|Z(t_{i-N}), a} = Σ_{j=i-N+1}^{i} ln f_{z(t_j)|Z(t_{j-1}), a}

that would be appropriate for parameters modeled as essentially constant over N steps. Show that each term in the preceding summation is Gaussian, with mean [ψᵀ z_n(t_{j-1}) + b u(t_{j-1})] and variance Qd, where z_n(t_{j-1}) is the vector of the most recent n measurements at time t_{j-1}, with components z(t_{j-n}) to z(t_{j-1}); then take the appropriate partials with respect to ψ and b to yield the desired results.

(b) Show that these equations can be solved in closed form as

where

A₁ = Σ_{j=i-N+1}^{i} z_n(t_{j-1}) z_nᵀ(t_{j-1})

A₂ = Σ_{j=i-N+1}^{i} z_n(t_{j-1}) u(t_{j-1})

A₃ = Σ_{j=i-N+1}^{i} u²(t_{j-1})

(c) Show how this result extends to the case of time-varying Qd(t_i).
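The batch least-squares solution described in part (b) can be sketched numerically. The following is an illustrative example only; the second-order system, parameter values, and data length are hypothetical choices, not taken from the problem:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical true system: y(t_{i+1}) = phi0*y(t_i) + phi1*y(t_{i-1}) + b*u(t_i) + w_d(t_i)
phi1_true, phi0_true, b_true = 0.3, 0.5, 1.2
Qd = 0.01                               # small driving-noise variance

Nsamp = 2000
y = np.zeros(Nsamp)
u = rng.standard_normal(Nsamp)          # persistently exciting input
for i in range(1, Nsamp - 1):
    y[i + 1] = (phi0_true * y[i] + phi1_true * y[i - 1]
                + b_true * u[i] + np.sqrt(Qd) * rng.standard_normal())

# Stack the regressors [z_n(t_{j-1}); u(t_{j-1})] and solve the normal equations
# (built from sums of the A1, A2, A3 type above) in one least-squares call.
H = np.array([[y[j - 2], y[j - 1], u[j - 1]] for j in range(2, Nsamp)])
zvec = y[2:Nsamp]
theta_hat, *_ = np.linalg.lstsq(H, zvec, rcond=None)   # [phi1, phi0, b] estimates
print(theta_hat)
```

With a persistently exciting input, the estimates converge toward the true parameters as the window length grows.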

10.9 In deriving the expressions given by (10-31) for evaluating the scalar elements of the conditional information matrix in (10-30), it is necessary to evaluate the form E{[yᵀW₁y][yᵀW₂y]}, where y is a zero-mean Gaussian random vector with covariance Y, and where W₁ and W₂ are


arbitrary weighting matrices. This is a useful result in and of itself, so we want to show that

E{[yᵀW₁y][yᵀW₂y]} = tr(W₁Y) tr(W₂Y) + 2 tr(W₁YW₂Y)

To show this, first show that

y = √Y x

where x is a zero-mean Gaussian random vector with covariance I. Then show

E{[yᵀW₁y][yᵀW₂y]} = Σ_{i=1}^{n} Σ_{j=1}^{n} λ_{1i} λ_{2j} E{x_i² x_j²}

with A₁ and A₂ appropriately defined, and λ_{1i} and λ_{2j} being the eigenvalues of A₁ and A₂, respectively, for i and j both taking on values from 1 to n. Demonstrate the validity of these equalities, and use them to derive the final result.
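The final result can be checked by Monte Carlo simulation. The sketch below assumes the identity E{[yᵀW₁y][yᵀW₂y]} = tr(W₁Y)tr(W₂Y) + 2 tr(W₁YW₂Y) for symmetric weights; the particular Y, W₁, and W₂ are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
M = rng.standard_normal((n, n))
Y = M @ M.T + n * np.eye(n)            # an arbitrary positive definite covariance
W1 = np.diag([1.0, 2.0, 3.0])          # symmetric weighting matrices (hypothetical)
W2 = np.array([[2.0, 0.5, 0.0],
               [0.5, 1.0, 0.0],
               [0.0, 0.0, 4.0]])

# Claimed identity: E{(y'W1 y)(y'W2 y)} = tr(W1 Y) tr(W2 Y) + 2 tr(W1 Y W2 Y)
analytic = np.trace(W1 @ Y) * np.trace(W2 @ Y) + 2.0 * np.trace(W1 @ Y @ W2 @ Y)

y = rng.multivariate_normal(np.zeros(n), Y, size=400_000)
q1 = np.einsum('ij,jk,ik->i', y, W1, y)     # y' W1 y for each sample
q2 = np.einsum('ij,jk,ik->i', y, W2, y)
empirical = float(np.mean(q1 * q2))
print(analytic, empirical)                  # should agree to within Monte Carlo error
```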

10.10 Consider maximizing a likelihood function L[θ, Z] as a function of θ and for given Z, by numerical means. The solution point is then defined by

∂L[θ, Z]/∂θ = 0ᵀ

(a) Assume you have a trial point θ_k and the gradient evaluated at that point is nonzero:

∂L[θ, Z]/∂θ |_{θ_k} ≠ 0ᵀ

The question is, for a given length of step to the next iterate value θ_{k+1}, in which direction should we step so as to make the greatest change in L? Define this step as

Δθ_k = θ_{k+1} − θ_k

Then, under the constraint that the length of the step is s, i.e., Δθ_kᵀ Δθ_k = s², show that the best step to take is along the gradient, or "steepest ascent," direction, yielding an iteration

where the term in braces is viewed as a scalar step-size control parameter, yielding a step of length s in the gradient direction. The easiest way to show this is by using Lagrange multipliers: maximize the value of

with respect to θ_{k+1} and ν. It is useful to consider a Taylor series for L[θ_{k+1}, Z] carried to first order:

to derive this gradient optimization algorithm.

(b) Consider the case of scalar θ. We want to find the value of θ that yields the stationary point in L, i.e., where

∂L[θ, Z]/∂θ |_{θ*} = 0

FIG. 10.P1 Basic Newton-Raphson concept.

Thus, if we plot ∂L/∂θ versus θ as in Fig. 10.P1, we are searching for the zero crossing of ∂L/∂θ. Suppose you start at some value θ_k. One means of efficiently iterating towards the zero crossing is to approximate the gradient ∂L/∂θ by the local tangent at θ_k, and find the value of θ where this local tangent crosses zero, and call that point θ_{k+1}. Show this yields the iteration

θ_{k+1} = θ_k − [∂²L/∂θ² |_{θ_k}]⁻¹ [∂L/∂θ |_{θ_k}]

known as the Newton-Raphson method.

(c) Repeat part (b) for the case of vector θ, and show that the result generalizes to the form depicted in (10-23).

(d) Note the similarities and differences of the forms of the gradient and Newton-Raphson iterations just derived.
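A minimal sketch of the scalar Newton-Raphson iteration of part (b), using a hypothetical log-likelihood L(θ) = −(θ − 2)⁴ whose gradient vanishes at θ = 2:

```python
# Newton-Raphson on the gradient: theta_{k+1} = theta_k - [d2L/dtheta2]^{-1} [dL/dtheta],
# applied to the hypothetical L(theta) = -(theta - 2)**4.
def dL(theta):
    return -4.0 * (theta - 2.0) ** 3

def d2L(theta):
    return -12.0 * (theta - 2.0) ** 2

theta = 0.5                      # arbitrary starting trial point
for _ in range(60):
    theta = theta - dL(theta) / d2L(theta)
print(theta)                     # converges toward the stationary point at 2.0
```

Each iterate moves to the zero crossing of the local tangent of ∂L/∂θ; unlike the fixed-length gradient step of part (a), the step size here is set automatically by the local curvature.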

10.11 Demonstrate the validity of (10-26) by writing

∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} f_{Z(t_i)|a}(Z | α) dZ = 1

so that

Combining this with the fact that

f = exp{ln f} = exp{L}

yields the desired result.

10.12 Develop the count for the number of multiplications, additions, subtractions, and inversions for state x̂, score s, conditional information matrix J, and new parameter estimate â* as depicted in Examples 10.1, 10.2, 10.7, 10.9, 10.12, 10.14, and 10.16, but written as functions of n (dimension of x), r (dimension of u), s (dimension of w), m (dimension of z), and p (dimension of a). (Let p = p_A + p_B, where p_B is the number of parameters whose effects are confined to B_d only, whereas p_A corresponds to parameters that affect both Φ and B_d in general; express the result in terms of p_A and p_B separately.)

10.13 (a) By applying the transformation of (10-77) to the model equations (10-76), demonstrate the validity of (10-78) and (10-79).

(b) Show the validity of (10-81)-(10-84) by substitution into (10-76)-(10-79).


10.14 Consider the estimator equations developed in Sections 10.3 and 10.4 for the case of uncertain parameters in Φ and B_d. As discussed in the beginning of Section 10.2, often we need not consider uncertainties in H. But in some cases, especially when transformations of state variables are considered (see the discussion following (10-84)), we must consider such uncertain parameters. Generalize the results of Sections 10.3 and 10.4 to allow H to be affected by components of the parameter vector a. Specifically detail the modifications to the full-scale estimator equations (10-32)-(10-57) as a result of this extension.

10.15 Derive relation (10-54) for ∂P(t_i⁺)/∂a_k for use in maximum likelihood estimation of states and parameters in Φ and B_d, according to the method discussed just below that equation in the text.

10.16 Demonstrate that (1/N) Σ_j r_j r_jᵀ is an ergodic approximation to [H(t_j)P(t_j⁻)Hᵀ(t_j) + R(t_j)], as claimed below (10-88).

10.17 (a) This problem develops a physical interpretation of the closed-form approximation for Q̂_d(t_i) given by (10-93). Recall the covariance update and propagation equations for a Kalman filter,

P(t_j⁺) = P(t_j⁻) − K(t_j)H(t_j)P(t_j⁻)

P(t_{j+1}⁻) = Φ(t_{j+1}, t_j)P(t_j⁺)Φᵀ(t_{j+1}, t_j) + G_d(t_j)Q_d(t_j)G_dᵀ(t_j)

By noting that E{Δx_j Δx_jᵀ} = K(t_j)H(t_j)P(t_j⁻) and then approximating the ensemble average by a temporal average over the most recent N samples of data, derive (10-93) as a physically reasonable approximation.

(b) To reduce storage requirements, a fading memory approximation to the finite memory result above is sometimes implemented in actual applications, i.e.,

Q̂_d(t_j) = k Q̂_d(t_{j-1}) + [1 − k] Q_{d1}(t_j)

where Q_{d1}(t_j) is a single term from the summation in (10-93) and k ≈ 0.8 typically. Show how k ∈ (0, 0.8) versus k ∈ (0.8, 1.0) would affect error variance and responsiveness of this estimate, and relate this to N of (10-93).
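The variance/responsiveness trade-off in part (b) can be seen in a quick simulation. Here the single-term sequence Q_{d1}(t_j) is replaced by a hypothetical white sequence with mean 1.0, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

def fading(samples, k):
    """Fading-memory average: est(j) = k*est(j-1) + (1-k)*sample(j)."""
    est = samples[0]
    out = []
    for s in samples[1:]:
        est = k * est + (1.0 - k) * s
        out.append(est)
    return np.array(out)

samples = 1.0 + 0.5 * rng.standard_normal(20_000)   # noisy single-sample terms
slow = fading(samples, 0.95)    # long memory: low variance, sluggish response
fast = fading(samples, 0.5)     # short memory: high variance, quick response
print(np.var(slow[1000:]), np.var(fast[1000:]))
```

For white inputs of variance σ², the steady-state variance of such an estimator is σ²(1 − k)/(1 + k), so k near 1 mimics a large N in (10-93), while small k mimics a short window that tracks changes quickly.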

10.18 Explicitly derive relations (10-104) and (10-105) for the multiple model filtering algorithm by the method described in the text.
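The structure being derived is a Bayesian reweighting of model probabilities by residual likelihoods. The sketch below is not the text's (10-104)-(10-105) themselves, but the same structural idea in a deliberately degenerate case: the measurement is pure noise z = v of true variance R = 1, so the filter residual is z itself, and each of three hypothetical discrete models R_k assigns it a Gaussian density:

```python
import numpy as np

rng = np.random.default_rng(3)

R_models = np.array([0.25, 1.0, 4.0])   # hypothesized discrete values R1, R2, R3
R_true = 1.0
p = np.ones(3) / 3.0                    # equal prior model probabilities

for _ in range(300):
    z = np.sqrt(R_true) * rng.standard_normal()          # observed residual
    dens = np.exp(-0.5 * z * z / R_models) / np.sqrt(2.0 * np.pi * R_models)
    p = dens * p                         # reweight each hypothesis by its likelihood
    p = p / p.sum()                      # renormalize over the model set

print(p)    # probability mass concentrates on the true model, R2
```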

10.19 (a) Derive (10-112).

(b) Iterate (10-114) k times and demonstrate the validity of the two expectation relations below it.


(c) Generate the expression given by (10-115).

(d) Explicitly write out (10-115a) for k = 1, 2, ..., n and rearrange to derive (10-117).

(e) Take the difference of (10-113) and (10-120) to yield (10-121), noting that

K* = P*⁻Hᵀ(HP*⁻Hᵀ + R)⁻¹ = P*⁻Hᵀ(A₀ + HδP⁻Hᵀ)⁻¹

(f) Explicitly generate the two optional methods for providing an estimate of G_dQ_dG_dᵀ, as outlined below (10-122) in the text.

10.20 Assume that you are trying to achieve the "best" linear time-invariant system model to represent the input-output characteristics of an unknown system. This problem explores the issue of identifiability of the system model.

(a) Let each system input point in turn be individually driven by wideband noise with stationary statistics of known power spectral density characteristics, and let each system output be analyzed to yield the power spectral density of each in response to a particular input. Assume perfect measurements. Describe fully how this information might be used to generate a linear frequency domain model of the system. Interpret the idea of identifiability for this application.


(b) Assume that some basic modeling has been completed and that the dimension n of the state vector x to model the system has been established. Consider a free dynamic system modeled as

x(t_{i+1}) = Φ x(t_i)

where x(t₀) is known and all states can be observed. Using the first n values of x(t_i) from a specific initial condition x₀, develop a necessary condition for Φ to be determined completely, and interpret this physically.

(c) If only a single measurement were available for part (b),

z(t_i) = hᵀ x(t_i)

with hᵀ constant, it is desired to "identify" the equivalent system

x*(t_{i+1}) = Φ* x*(t_i),    z*(t_i) = h*ᵀ x*(t_i)

where Φ* and h*ᵀ are of the form

By considering the first 2n measurements, the n unknown elements of Φ* can be evaluated under certain conditions. What are these conditions?

(d) Now consider a general system description as follows. Let x be a state function that maps (Θ × U × I) into Rⁿ, where Θ is the (Hilbert) space of uncertain parameters θ, U is the space of admissible controls, and I is a time interval in R¹. Let Y be the (Hilbert) space of output functions, with elements y(·) ∈ Y defined by

y(t) = h[x, θ, u, t] + v_c(t)

for all t, where v_c(t) is an additive noise (eventually to be described as a sample from a white zero-mean random process with covariance kernel E{v_c(t)v_cᵀ(t + τ)} = R_c δ(τ)). Since we have no dynamics driving noise, a complete state solution exists for any nominal value of θ, once u and x(t₀) are specified. Thus, we can write y(t) = h[θ, t] + v_c(t). The local identifiability of this system model has been related to being able to solve uniquely the least squares problem (neglecting v_c(t) for the moment)

min_{δθ} ‖δy − ℋδθ‖²_Y = min_{δθ} ⟨[δy − ℋδθ], [δy − ℋδθ]⟩_Y

where ℋ is the output sensitivity operator, which can be thought of as the partial derivative of h with respect to θ evaluated at the nominal value of the parameters, θ₀. (It is also the "Fréchet differential" of h at θ₀.) ⟨·,·⟩_Y is the inner product defined on the space for establishing "length" of the vectors y(·) ∈ Y. Thus, if a unique least squares solution δθ can be found, i.e., if the "best" value of the small variation of θ from θ₀ can be determined uniquely from the small variation of y from the value y₀ that it would assume if θ₀ were the true parameter value, then the system is said to be locally identifiable. Under what conditions does such a unique solution exist?

To gain insights, consider this as a generalization to the unweighted least squares problem discussed at the end of Section 3.11 in Volume 1. There, outputs were elements of Y = Rᵐ, with inner product defined as


for all y₁, y₂ ∈ Rᵐ. Here let us assume that Y is the space of square integrable functions that map the interval [t₀, t_f] into Rᵐ (sometimes denoted as L₂{[t₀, t_f], Rᵐ}), with associated inner product

⟨y₁(·), y₂(·)⟩_Y = ∫_{t₀}^{t_f} y₁ᵀ(t) y₂(t) dt

Moreover, simple premultiplication by a matrix H and then by its transpose in the previous case, as in writing HᵀHx̂ = Hᵀy in order to solve for x̂ as [HᵀH]⁻¹Hᵀy, is replaced by sequential applications of the operator ℋ and then its adjoint ℋ*, to form the operator ℋ*ℋ. Letting Θ be Rᵖ and recalling Problem 8.10e, we can define the adjoint ℋ* as the mapping from Y = L₂{[t₀, t_f], Rᵐ} into Θ = Rᵖ that satisfies

⟨ℋδθ, δy⟩_Y = ⟨δθ, ℋ*δy⟩_{Rᵖ}

Explicitly show the form of the adjoint mapping ℋ*, and use it to express the conditions under which the system model is locally identifiable.

(e) Now consider the generalized, or weighted, least squares problem,

min_{δθ} ‖δy − ℋδθ‖²_{ℛc} = min_{δθ} ⟨[δy − ℋδθ], [δy − ℋδθ]⟩_{ℛc}

where ℛ_c is a positive self-adjoint operator from the space Y back into Y, such that ℛ_c⁻¹ exists and is self-adjoint. Thus the generalized inner product can be appropriately defined for any y₁(·), y₂(·) ∈ Y by

⟨y₁, y₂⟩_{ℛc} ≜ ⟨y₁, ℛ_c⁻¹y₂⟩_Y = ⟨ℛ_c⁻¹y₁, y₂⟩_Y

due to the self-adjoint property. Obtain the solution to this generalized least squares problem. The positive, self-adjoint operator (ℋ*ℛ_c⁻¹ℋ) that is inherent in this solution is a mapping from Θ into Θ, and it provides information about the quality of the estimate δθ̂. Show that if Θ is Euclidean p-space Rᵖ and Y is L₂{[t₀, t_f], Rᵐ}, then this operator is representable as a p-by-p matrix, recognized to be the Fisher information matrix at the point θ₀. What are the conditions for local identifiability of the system model, in terms of this matrix?


CHAPTER 11

Nonlinear stochastic system models

11.1 INTRODUCTION

Suppose we are given a physical system that can be subjected to our own known controls and to inputs beyond our control, and that it can provide certain noise-corrupted measurements to us. Then the objective of a mathematical model associated with this real system is to generate an adequate, tractable representation of the behavior of all system outputs of interest. As stated earlier in Volume 1, adequacy and tractability are subjective and are functions of the intended use of the model. Our desire to develop estimators and controllers causes us to impose a certain structure upon our models. It has already been seen that linear system models driven by white Gaussian noise are not only descriptive of many observed phenomena, but also yield a basis of useful estimator algorithms. Controller design based on such a model is also tractable and useful practically, and this will be developed subsequently.

Nevertheless, there is a large and important class of problems for which linear stochastic system models are not adequate. In this chapter we consider extensions of this model to admit nonlinearities explicitly. Because we wish to base estimators and controllers upon the extended model, we will be motivated to preserve the Markov nature of the state stochastic process that had previously been obtained with linear dynamic system models. This motivation and the appropriate model structure are developed more fully in Section 11.2, and the fundamental characteristics of Markov processes are presented in Section 11.3. In order to develop nonlinear stochastic differential equations properly, stochastic integrals and differentials are first defined and characterized in Section 11.4. Then the subsequent section investigates the associated stochastic differential equations. Since the solution processes for these equations are Markov, their characteristics are described completely by transition probability densities,


and the time propagation of these important densities is the subject of the last section. The stochastic process models of this chapter will be exploited by nonlinear filter and controller derivations in the sequel.

11.2 EXTENSIONS OF LINEAR SYSTEM MODELING

Recall the results of the linear system modeling efforts of Chapter 4 (Volume 1). There we attempted to exploit a model of the form of linear state dynamics driven only by known inputs and white Gaussian noise

ẋ(t) = F(t)x(t) + B(t)u(t) + G(t)w(t)    (11-1)

starting from a Gaussian x(t₀) with known mean x̂₀ and covariance P₀, along with a linear measurement corrupted by additive white Gaussian noise of either the discrete-time form

z(t_i) = H(t_i)x(t_i) + v(t_i)    (11-2a)

or of the continuous-time form

z(t) = H(t)x(t) + v_c(t)    (11-2b)

The noise processes in (11-1) and (11-2) were assumed independent of x(t₀) and (at least initially) independent of each other, with mean of zero and correlation kernels given by

E{w(t)wᵀ(t + τ)} = Q(t)δ(τ)    (11-3)

and either

E{v(t_i)vᵀ(t_j)} = R(t_i)δ_{ij}    (11-4a)

or

E{v_c(t)v_cᵀ(t + τ)} = R_c(t)δ(τ)    (11-4b)

White noise inputs were justified on the basis that desired time correlation properties of physically observed phenomena could be produced sufficiently well by passing white Gaussian noise through a linear shaping filter, which could then be augmented to the basic state model to yield an overall model of the form above.

As intuitive and physically motivated as this model seemed, it was not suitable because the solution to (11-1) could not be developed properly using ordinary Riemann integrals. Linear stochastic differential equations and their solutions were properly developed through (Wiener) stochastic integrals and Brownian motion. Brownian motion (or the Wiener process) is a zero-mean vector process β(·,·) that has independent Gaussian increments with

E{[β(t) − β(t')][β(t) − β(t')]ᵀ} = ∫_{t'}^{t} Q(τ) dτ    (11-5)


that was shown to be continuous but nondifferentiable (with probability one or "almost surely," and in the mean square sense). Its hypothetical derivative would be the w(·,·) that appears in (11-1). A (Wiener) stochastic integral [10, 43, 44, 71]

I(t,·) = ∫_{t₀}^{t} A(τ) dβ(τ,·)    (11-6)

could then be defined for known (nonrandom) A(·) by means of a mean square limit

I(t,·) = l.i.m._{N→∞} ∫_{t₀}^{t} A_N(τ) dβ(τ,·)
       = l.i.m._{N→∞} Σ_{i=0}^{N−1} A_N(t_i)[β(t_{i+1},·) − β(t_i,·)]    (11-7)

where N is the number of time cuts made in [t₀, t] and A_N(τ) = A(t_i) for all τ ∈ [t_i, t_{i+1}). Viewed as a stochastic process, this stochastic integral is itself a Brownian motion process with rescaled diffusion: Gaussian, zero-mean, with

E{I(t,·)Iᵀ(t,·)} = ∫_{t₀}^{t} A(τ)Q(τ)Aᵀ(τ) dτ    (11-8)
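The defining sum in (11-7) and its second-moment property can be illustrated numerically. The scalar choices A(τ) = τ and unit diffusion Q = 1 on [0, 1] below are hypothetical, made only for the sketch:

```python
import numpy as np

rng = np.random.default_rng(4)

# Approximate the Wiener stochastic integral of (11-7) by its defining sum
# sum_i A(t_i)[beta(t_{i+1}) - beta(t_i)], with A(tau) = tau and Q = 1 on [0, 1].
t0, t = 0.0, 1.0
N = 1000
tau = np.linspace(t0, t, N + 1)
dt = (t - t0) / N

M = 50_000                                           # number of sample paths
dbeta = np.sqrt(dt) * rng.standard_normal((M, N))    # independent Gaussian increments
A = tau[:-1]                                         # left-endpoint evaluation, as in (11-7)
I = dbeta @ A                                        # one sample of I(t) per path

# The integral should be zero mean with variance equal to the integral of
# A(tau)^2 Q over [0, 1], i.e., 1/3.
print(I.mean(), I.var())
```

The sample mean is near zero and the sample variance near 1/3, consistent with the Gaussian, zero-mean, rescaled-diffusion characterization above.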

Stochastic differentials are properly defined as functions which, when integrated over appropriate limits, yield the associated stochastic integrals: given (11-6), the stochastic differential dI(t,·) of I(t,·) is

dI(t,·) = A(t) dβ(t,·)    (11-9)

Thus, the properly written linear stochastic differential equation directly related to (11-1) is

dx(t) = F(t)x(t) dt + B(t)u(t) dt + G(t) dβ(t)    (11-10)

where β(·,·) is of diffusion strength Q(t) for all t of interest, as given by (11-5) or equivalently as

E{dβ(t) dβᵀ(t)} = Q(t) dt    (11-11)

The solution to (11-10) is the stochastic process x(·,·) given by

x(t) = Φ(t, t₀)x(t₀) + ∫_{t₀}^{t} Φ(t, τ)B(τ)u(τ) dτ + ∫_{t₀}^{t} Φ(t, τ)G(τ) dβ(τ)    (11-12)

with Φ the state transition matrix associated with F. To characterize a stochastic process completely would generally entail specification of the joint probability density (or distribution if the density cannot be assumed to exist) of x(t₁), x(t₂), ..., x(t_N) for any number N of time cuts in the interval T = [t₀, t_f] of interest:

f_{x(t₁),...,x(t_N)}(ξ₁, ..., ξ_N) = ∏_{j=1}^{N} f_{x(t_j)|x(t_{j−1}),...,x(t₁)}(ξ_j | ξ_{j−1}, ..., ξ₁)    (11-13)


by repeated application of Bayes' rule. However, because x(·,·) given by (11-12) can be shown to be a Markov process, each term in the product in (11-13) is equivalent to f_{x(t_j)|x(t_{j−1})}(ξ_j | ξ_{j−1}), so specification of these transition probability densities completely specifies the needed joint density. Thus, for Markov processes, conditional distributions of the form F_{x(t)|x(t')}(ξ | ξ'), or second order joint distributions of the form F_{x(t),x(t')}(ξ, ξ') from which these conditional distributions can be derived via Bayes' rule, are of primary importance because they totally describe the process characteristics. When the density functions f_{x(t)|x(t')}(ξ | ξ') and f_{x(t),x(t')}(ξ, ξ') exist, they similarly fulfill this objective.

Moreover, since x(·,·) given by (11-12) is Gauss-Markov, this complete process specification can also be generated entirely in terms of the mean function and covariance kernel (function): the first two moments of f_{x(t),x(t')}(ξ, ξ'), given by

m_x(t) ≜ E{x(t)} = Φ(t, t₀)m_x(t₀) + ∫_{t₀}^{t} Φ(t, τ)B(τ)u(τ) dτ    (11-14)

P_xx(t', t) ≜ E{[x(t') − m_x(t')][x(t) − m_x(t)]ᵀ}
    = Φ(t', t)P_xx(t),     t' ≥ t    (11-15a)
    = P_xx(t')Φᵀ(t, t'),    t' ≤ t    (11-15b)

where the covariance matrix P_xx(t) is given by

P_xx(t) ≜ E{[x(t) − m_x(t)][x(t) − m_x(t)]ᵀ}
    = Φ(t, t₀)P₀Φᵀ(t, t₀) + ∫_{t₀}^{t} Φ(t, τ)G(τ)Q(τ)Gᵀ(τ)Φᵀ(t, τ) dτ    (11-16)

Note that the last term in (11-12) contributes zero to the mean in (11-14) and the integral term as given previously in Eq. (11-8) to the covariance matrix in (11-16). Equations (11-14) and (11-16) are the solutions to the ordinary differential equations

ṁ_x(t) = F(t)m_x(t) + B(t)u(t)    (11-17)

Ṗ_xx(t) = F(t)P_xx(t) + P_xx(t)Fᵀ(t) + G(t)Q(t)Gᵀ(t)    (11-18)

Together, these two equations define the propagation of the Gaussian transition probability density f_{x(t)|x(t')}(ξ | ξ') with time t, for fixed t' ≤ t, and also of the unconditional density f_{x(t)}(ξ). Even if x(·,·) were not Gaussian, these two moments would provide useful information about the central location and spread of possible sample values about that central location for all time, though not a complete depiction.
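For constant coefficients, the moment equations (11-17) and (11-18) can be integrated numerically and compared with the closed forms (11-14) and (11-16). The scalar values below are hypothetical:

```python
import numpy as np

# Scalar case of (11-17)-(11-18): dx = F x dt + B u dt + G dbeta, with hypothetical
# constants F = -1, B = 1, u = 2, G = 1, Q = 0.5, and x(t0) ~ N(1.0, 0.2).
F, B, u, G, Q = -1.0, 1.0, 2.0, 1.0, 0.5
m, P = 1.0, 0.2
dt, T = 1e-4, 2.0
for _ in range(int(T / dt)):              # Euler integration of the moment ODEs
    m += (F * m + B * u) * dt
    P += (2.0 * F * P + G * Q * G) * dt

# Closed-form solutions (11-14)/(11-16) for constant coefficients:
m_exact = np.exp(F * T) * 1.0 + (B * u / -F) * (1.0 - np.exp(F * T))
P_exact = np.exp(2 * F * T) * 0.2 + (G * Q * G / (-2 * F)) * (1.0 - np.exp(2 * F * T))
print(m, m_exact, P, P_exact)
```

The integrated moments agree with the closed-form expressions to within the Euler discretization error.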

The preceding results were developed properly on the basis of the stochastic integral and differential defined in (11-7) and (11-9). However, it was also shown that these stochastic differentials obey the formal rules associated with deterministic functions, and similarly that all of the foregoing results could be


obtained formally by white noise interpretations and integrals of the (improper) form ∫_{t₀}^{t} A(τ)w(τ) dτ. Furthermore, the same results can also be generated properly with other meaningful choices of the stochastic integral, as for example

(11-19)

instead of (11-7). However, as we consider generalizations of the definitions of stochastic integrals, it is important not to expect all definitions to yield identical results and properties, nor to assume that familiar formal rules will be satisfied.

Note that, just as (11-10) replaces the formal differential equation (11-1), the continuous-time measurement process (11-2b) is properly replaced by

dy(t) = H(t)x(t) dt + dβ_m(t)    (11-20)

Here, dβ_m(t)/dt is formally interpreted as v_c(t), and dy(t)/dt as z(t). We now wish to extend this model to a wider class of problems, but we

still wish to do so in a fruitful manner. Although we will be forced to relinquish Gaussianness once nonlinearities are introduced, it will nonetheless be useful (from the standpoints of both providing good descriptions of phenomena and yielding tractable mathematics for estimator and controller development) to so construct this extension as to ensure generation of Markov processes.

One way in which the previous results can be extended is to consider a more generalized driving process on (11-10) than Brownian motion [22, 45, 63]. To preserve the Markov nature of x(·,·), the driving process should have independent increments that are independent of the initial condition (conventionally set to zero at t₀ with probability one), but the increments need not be Gaussian. Poisson processes fit this characterization [23, 25, 56, 63]. A homogeneous Poisson process is a counting process n_p(·,·), whose components are scalar processes n_{pi}(·,·)

(1) that have independent increments;

(2) all of whose sample functions (except possibly for a set of probability zero) start at zero and are step functions with jump value of unity at each discontinuity;

(3) whose steps occur at Poisson intervals: the probability that the number of jumps between time t and t' equals k is

P[n_{pi}(t') − n_{pi}(t) = k] = [λ_iᵏ |t' − t|ᵏ / k!] exp{−λ_i |t' − t|}    (11-21)

where λ_i > 0 is the rate parameter for the ith component; i.e.,

P[n_{pi}(t + dt) − n_{pi}(t) = 1] = λ_i dt,    P[n_{pi}(t + dt) − n_{pi}(t) = 0] = 1 − λ_i dt


(4) with discontinuities of the first kind; i.e., if t_d is a point of discontinuity, then

n_{pi}(t_d − ε, ω) ≠ n_{pi}(t_d, ω) = n_{pi}(t_d + ε, ω)

for arbitrarily small ε.

Nonhomogeneous Poisson processes have time-varying rate parameters λ_i(t) associated with each component, and then (11-21) is modified to

P[n_{pi}(t') − n_{pi}(t) = k] = [(∫_t^{t'} λ_i(τ) dτ)ᵏ / k!] exp{−∫_t^{t'} λ_i(τ) dτ}    (11-22)
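The counting-process law (11-21) can be demonstrated by simulating exponential inter-arrival times; the rate and interval length below are arbitrary illustrative values:

```python
import math
import numpy as np

rng = np.random.default_rng(5)

# Simulate a homogeneous Poisson counting process via exponential inter-arrival
# times (rate 1.5 over an interval of length 2.0), and compare the empirical
# distribution of the number of jumps with the probability law of (11-21).
lam, interval = 1.5, 2.0
M = 50_000
counts = np.empty(M, dtype=int)
for m in range(M):
    t, k = 0.0, 0
    while True:
        t += rng.exponential(1.0 / lam)   # waiting time to the next jump
        if t > interval:
            break
        k += 1
    counts[m] = k

p_model = [(lam * interval) ** k / math.factorial(k) * math.exp(-lam * interval)
           for k in range(4)]
p_empirical = [float(np.mean(counts == k)) for k in range(4)]
print(p_model)
print(p_empirical)
```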

If the jump magnitudes are not unity and positive but the realizations of random variables, then a generalized Poisson process is created. To be specific, consider the ith component of a simple Poisson process n_p(·,·), namely n_{pi}(·,·) as described previously, and let

n_{GPi}(t) = Σ_{j=1}^{n_{pi}(t)} a_{ij} u_{−1}(t − t_j)    (11-23)

where the random variables a_{ij} are independent of n_p(·,·) and described by a given joint distribution, and u_{−1}(t − t_j) is the unit step function initiating at the jth arrival time of the Poisson counting process n_{pi}(·,·).

Of particular interest is the scalar case of (11-23) in which the amplitudes a_j are independent, identically distributed, zero-mean Gaussian random variables with variance σ_a², and in which n_p(·,·) is homogeneous. It can be shown [45] that the autocorrelation function of n_GP(·,·) is then

E{n_GP(t)n_GP(t')} = λσ_a² t,    t ≤ t'    (11-24)

which is identical to the autocorrelation of a Brownian motion. Moreover, the hypothetical derivative of n_GP(·,·), a process formally composed of Poisson-distributed impulses weighted by Gaussian-distributed coefficients, has an autocorrelation of

E{[dn_GP(t)/dt][dn_GP(t')/dt']} = λσ_a² δ(t − t')    (11-25)

i.e., it is a zero-mean stationary white noise process with Gaussian amplitudes (though not itself a Gaussian process) with flat power spectral density (thereby able to excite all system modes uniformly). In other words, to second moment properties, n_GP(·,·) is identical to scalar constant-diffusion Brownian motion. Direct extensions to the vector and nonstationary cases are also possible.
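The second-moment equivalence (11-24) can be checked by Monte Carlo simulation of (11-23) with i.i.d. zero-mean Gaussian amplitudes; λ, σ_a, and the two time cuts below are hypothetical values:

```python
import numpy as np

rng = np.random.default_rng(6)

# Monte Carlo check that the scalar generalized Poisson process of (11-23), with
# i.i.d. N(0, sigma_a**2) amplitudes, has E{n_GP(t) n_GP(t')} = lam*sigma_a**2*min(t, t').
lam, sigma_a, t, t_prime = 2.0, 0.7, 1.0, 1.5
M = 100_000
n_t = np.zeros(M)
n_tp = np.zeros(M)
for m in range(M):
    k = rng.poisson(lam * t_prime)                    # number of jumps on [0, t']
    arrivals = rng.uniform(0.0, t_prime, size=k)      # given k, arrival times are i.i.d. uniform
    amps = sigma_a * rng.standard_normal(k)           # Gaussian jump amplitudes
    n_t[m] = amps[arrivals <= t].sum()                # process value at the earlier cut
    n_tp[m] = amps.sum()                              # process value at the later cut

print(np.mean(n_t * n_tp), lam * sigma_a ** 2 * min(t, t_prime))
```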

A more conclusive statement can be made about the character of an independent increment process x(·,·). First, if x(·,·) is continuous in time w.p.1 (for almost all ω), then the increments [x(t + Δt,·) − x(t,·)] are Gaussian. Second, if almost all sample paths start from zero and have discontinuities of the first kind that are not fixed (i.e., the limit of P[|x(t) − x(t')| > 0] is zero


as t → t') and have jump magnitude of plus one, then the increments are Poisson [45]. Thus, given an arbitrary independent increment process, a decomposition into Gaussian (continuous) and Poisson (discontinuous) parts may be possible [25, 30, 45]. It is then natural to consider a generalization of the stochastic differential equation (11-10) to the form:

dx(t) = F(t)x(t) dt + B(t)u(t) dt + G₁(t) dβ(t) + G₂(t) dn_GP(t)    (11-26)

Poisson processes (or conditionally Poisson processes, conditioned on the rate function) have been used as models for the outputs of photon detectors (as in optical communication systems) and other applications in which discrete-point discontinuities (as the increasing number of photoconversions in time) are an inherent, important characteristic of the physical phenomenon. In fact, the complete model for optical detectors includes both point processes (detector output due to signal, "dark current," etc.) and Gaussian noise processes (thermal noise, etc.). Snyder and others have developed rather extensive point process models and estimator and controller results based upon them [15, 26, 35, 45, 58, 59, 61-65]. On the other hand, we will focus upon the Gaussian process case. To date, the mixed process case is essentially intractable as far as estimator and controller development, and most designs are based on either one model type or the other, or they use models adequate to second moments only (based on insights from the Gaussian noise model) to treat the mixed process case. Thus, although (11-26) indicates a useful extension to (11-10), we will not pursue it further in this text.

A second extension is to admit nonlinearities in the system dynamics and measurement equations, while retaining the linear additivity of the driving uncertainties, as modifying (11-10) to the more general form

dx(t) = f[x(t), u(t), t] dt + G(t) dβ(t)    (11-27a)

which could be interpreted heuristically as

ẋ(t) = f[x(t), u(t), t] + G(t)w(t)    (11-27b)

again treating white Gaussian noise w(·,·) as the hypothetical derivative of Brownian motion β(·,·). Similarly, discrete-time measurements would be modeled as

z(t_i) = h[x(t_i), t_i] + v(t_i)    (11-28)

and continuous-time measurements as

dy(t) = h[x(t), t] dt + dβ_m(t)    (11-29a)

or heuristically as

z(t) = h[x(t), t] + v_c(t)    (11-29b)

By forcing (11-27) with open-loop (deterministically prespecified) control u(·) and Brownian motion β(·,·) with independent increments, the state


process can readily be shown to be Markov (this will be accomplished later as a special case of a more general model form). Later, when feedback control is introduced, we will have to return to this issue. If u were a memoryless function of state value, there would be no concern, but perfect knowledge of the entire state vector is not typically available to generate a feedback control.

Note that the preceding model was the assumed form upon which the extended Kalman filter of Chapter 9 was based. It is, in fact, descriptive of many realistic problems of interest. For instance, nonlinear system models driven by additive white Gaussian and time-correlated Gaussian noises, generated as the outputs of linear shaping filters driven by white Gaussian noise, are widely applicable.

Nevertheless, useful extensions can be made beyond this model form. Specifically, a nonlinear stochastic differential equation of the form

dx(t) = f[x(t), u(t), t] dt + G[x(t), t] dβ(t)    (11-30)

with G[·,·] now allowed to be a function of x(·,·) as well as time, and β(·,·) being Brownian motion ("Wiener process") as before, is motivated. As previously accomplished in Chapter 4 (Volume 1) for the linear case, to give proper meaning to (11-30) requires a proper definition of a stochastic integral, now of the form

∫_{t0}^{t} A(τ,·) dβ(τ,·)    (11-31)

where A(·,·) denotes a matrix of stochastic processes such that A(t,·) depends at most on the past and present values of β(·,·), i.e., {β(t',·), t_0 ≤ t' ≤ t}, but is independent of future values of β(·,·). Alternative definitions of such a stochastic integral are possible, in analogy with (11-7) and (11-19), yielding results with different characteristics. One in particular, the Ito stochastic integral [27, 30], possesses useful properties not shared by other definitions. One primary result is that, if Ito stochastic integrals are used to give meaning to (11-30), then the solution process x(·,·) will be Markov. Because of this and some other pertinent properties, attention will be concentrated upon the Ito definition of stochastic integrals and stochastic differential equations. Using such a definition, certain formal rules such as the chain rule for differentials will not be obeyed, unlike the case of Wiener stochastic integrals as defined in (11-6). Subtle modeling issues are raised due to this fact, and these will be discussed in the sequel.

Models more general than (11-30) might be proposed, such as the heuristic equation

ẋ(t) = f[x(t), u(t), w(t), t]    (11-32)

However, such a model would be especially difficult to develop rigorously. Moreover, even if the solution process x(·,·) existed and could be characterized,


it generally would not be Markov. Because Markov process models are significantly more tractable analytically than more general processes, combined with the fact that they do provide adequate models of physically observed phenomena, we will confine our attention to the generality of (11-30) and (11-31).

11.3 MARKOV PROCESS FUNDAMENTALS

In this section, the basic characterization of Markov processes [3, 10, 11, 23, 30, 56] is developed. The natural importance of transition probabilities and densities, and the Chapman-Kolmogorov equation associated with them, is a central thought throughout.

Let x(·,·) be a vector stochastic process, and consider

F_{x(t_i)|x(t_{i-1}), x(t_{i-2}), …, x(t_j)}(ξ_i | ξ_{i-1}, ξ_{i-2}, …, ξ_j)

the conditional probability distribution of x(t_i,·) as a function of the n-dimensional vector ξ_i, given that x(t_{i-1}, ω_k) = ξ_{i-1}, x(t_{i-2}, ω_k) = ξ_{i-2}, …, x(t_j, ω_k) = ξ_j. If, for any countable choice of values of i and j, it is true that

F_{x(t_i)|x(t_{i-1}), x(t_{i-2}), …, x(t_j)}(ξ_i | ξ_{i-1}, ξ_{i-2}, …, ξ_j) = F_{x(t_i)|x(t_{i-1})}(ξ_i | ξ_{i-1})    (11-33)

then x(·,·) is a Markov process. Thus, the Markov property for stochastic processes is conceptually analogous to the ability to define a system state for deterministic systems: the value of x at time t_{i-1} provides as much information about x(t_i) as do the values of x at time t_{i-1} and all previous time instants. Said another way, the time history leading up to x(t_{i-1}) is of no consequence, and only knowledge of x(t_{i-1}) itself is required for propagation to the future time t_i.

There are also Markov-2, Markov-3, …, Markov-N processes, where for instance a Markov-2 process is such that

F_{x(t_i)|x(t_{i-1}), x(t_{i-2}), …, x(t_j)}(ξ_i | ξ_{i-1}, ξ_{i-2}, …, ξ_j) = F_{x(t_i)|x(t_{i-1}), x(t_{i-2})}(ξ_i | ξ_{i-1}, ξ_{i-2})    (11-34)

Such processes were encountered in conjunction with smoothing problems in Chapter 8. However, if x(·,·) is an n-dimensional Markov-2 process, then the 2n-dimensional process defined at each t_i by

x'(t_i) = [x^T(t_i)  x^T(t_{i-1})]^T    (11-35)

is Markov-1. Obvious extensions to Markov-N processes can be made. Thus, we will consider only Markov-1 processes when we speak of a Markov process.

A Markov process x(·,·) can evolve as a continuous process in time or as a sequence of random variables at discrete points in time. Also, its probability distribution function can be continuous or have discrete discontinuities. To



start simply, we shall consider the case of discrete probabilities and discrete transition times. Once some fundamental concepts are established for this case, we shall extend to both continuous-state and continuous-time models.

EXAMPLE 11.1 Consider tossing a coin at discrete time instants t_1, t_2, t_3, …, and let x(·,·) be the stochastic process such that x(t_i,·) is a discrete-valued random variable describing the number of heads obtained in i tosses of the coin. Then F_{x(t_i)|x(t_{i-1})}(ξ|ρ) is, by definition, the probability (i.e., of the set of ω ∈ Ω) that the number of heads to appear in i tosses is less than or equal to ξ, given that ρ heads have occurred in i-1 tosses. Let ρ take on an integer value. For ξ < ρ, this is the probability that fewer than ρ heads occur in i tosses, given that ρ heads have occurred in i-1 tosses, and so F_{x(t_i)|x(t_{i-1})}(ξ|ρ) = 0 for this case. For ξ in the interval [ρ, ρ+1), F_{x(t_i)|x(t_{i-1})}(ξ|ρ) is the probability that fewer than ρ heads or exactly ρ (integer) heads occur in i tosses, given that ρ heads occurred in i-1 tosses, i.e., the probability of a tail being thrown at time t_i. Finally, for ξ ≥ ρ+1, F_{x(t_i)|x(t_{i-1})}(ξ|ρ) = 1: the probability that ρ or ρ+1 heads occur in i tosses, knowing ρ heads occurred in i-1 tosses. Moreover, these evaluations are true regardless of what occurred prior to time t_{i-1}, so x(·,·) is Markov. ■

EXAMPLE 11.2 A graphic example of a process modeled as Markov is given by the description of a frog jumping from one lily pad to another in a pond. There are only a finite number of "state" values in the description of the "system," i.e., the index number of the pad currently occupied by the frog. The frog's leap then constitutes a state transition that occurs at discrete points in time. We can think of this as a discrete-time process by indexing the transitions in time.

To study such a process, the probabilistic nature of the state transition must be specified. Generally, the probability that pad j is occupied at time t_{i+1}, given that pad k is occupied at time t_i, will be dependent on the value of k; e.g., the more distant pad j is from pad k, the lower the probability value. If that probability is independent of how the frog got to pad k, i.e., of the previous state history, then the process is Markov.

One can then describe the probability that any particular state value (lily pad) is assumed at a given point in time. This can be generated after a specified number of transitions from an initial state, which itself might be described only probabilistically. ■

The previous examples introduced the concept of a transition probability model. To generalize, suppose that there are N possible discrete state values which a system of interest can assume at any given time, and let x(·,·) be a scalar discrete-valued state process that can assume integer values (index numbers, j) from 1 to N at each discrete time t_i. Associated with each state value j, one can define a state probability p_j(t_i), the probability that the system will be in state j at time t_i. These separate state probabilities can then be arrayed in a vector p(t_i) as

         [p_1(t_i)]   [P({ω: x(t_i,ω) = 1})]
p(t_i) = [p_2(t_i)] = [P({ω: x(t_i,ω) = 2})]    (11-36)
         [   ⋮    ]   [        ⋮           ]
         [p_N(t_i)]   [P({ω: x(t_i,ω) = N})]

Note that, by its definition, the sum of the N components of p(t_i) equals one for any t_i:

Σ_{j=1}^{N} p_j(t_i) = 1    (11-37)


If the system of interest has the Markov property, then the probability of a transition to state j by the next discrete time of interest, given that the system now occupies state k, is a function of j and k and not of any history of the system before its arrival at state k. Therefore, we can define the state transition probabilities for a discrete-state Markov process as

T_jk(t_{i+1}, t_i) ≜ P({ω: x(t_{i+1}, ω) = j} | {ω: x(t_i, ω) = k})    (11-38)

i.e., T_jk(t_{i+1}, t_i) is the conditional probability that the system will be in state j at the next time instant t_{i+1}, given that the state at present time t_i is k. Since the N state values are assumed to be mutually exclusive and collectively exhaustive, it must be true that

Σ_{j=1}^{N} T_jk(t_{i+1}, t_i) = 1,    k = 1, 2, …, N    (11-39)

which simply states that, given that the state is k at time t_i, it must assume some value at time t_{i+1}.

Now the state transition probabilities can be used to compose the state transition probability matrix T(t_{i+1}, t_i):

                 [T_11(t_{i+1},t_i)  T_12(t_{i+1},t_i)  ⋯  T_1N(t_{i+1},t_i)]
T(t_{i+1},t_i) = [T_21(t_{i+1},t_i)  T_22(t_{i+1},t_i)  ⋯  T_2N(t_{i+1},t_i)]    (11-40)
                 [       ⋮                  ⋮                    ⋮          ]
                 [T_N1(t_{i+1},t_i)  T_N2(t_{i+1},t_i)  ⋯  T_NN(t_{i+1},t_i)]

Note that each column k depicts the probabilities of transitioning into state 1, state 2, …, state N from any given state k, and in view of (11-39), each of these columns must sum to one. Each row j relates the probabilities of reaching state j from state 1, state 2, …, state N. Note that the elements of the matrix can be functions of time: T(t_{i+1}, t_i) need not be time invariant. With (11-40), the vector of state probabilities at time t_{i+1}, p(t_{i+1}), can be expressed in terms of the state probabilities at time t_i, p(t_i), as

p(t_{i+1}) = T(t_{i+1}, t_i) p(t_i)    (11-41)

The jth component of this equation is

p_j(t_{i+1}) = Σ_{k=1}^{N} T_jk(t_{i+1}, t_i) p_k(t_i)    (11-42a)

or

P{x(t_{i+1}) = j} = Σ_{k=1}^{N} P{x(t_{i+1}) = j | x(t_i) = k} P{x(t_i) = k}    (11-42b)

Thus, it is readily seen that there are N possible ways x(t_{i+1}) could have reached the value j: by means of a state transition to state j from any one of the N states


at time t_i. Moreover, (11-41) can be applied recursively to obtain

p(t_i) = T(t_i, t_{i-1}) ⋯ T(t_2, t_1) T(t_1, t_0) p(t_0)    (11-43)

If the state transition probability matrix is time invariant, so that T(t_{i+1}, t_i) = T = const for all i, then this simplifies to

p(t_i) = T^i p(t_0)    (11-43')

Sometimes a graphical model known as a transition probability diagram is employed to aid problem formulation. Figure 11.1 depicts such a diagram for a system composed of three possible states. For instance, to get to state 1 at time t_{i+1}, one could currently be in state 1 and undergo no state transition, with probability of occurrence T_11(t_{i+1}, t_i); or one could be in state 2 at time t_i and undergo a state transition into state 1, with probability T_12(t_{i+1}, t_i); or finally one could be in state 3 and transition into state 1, with probability T_13(t_{i+1}, t_i). By knowing the nine state transition probabilities depicted in the figure, i.e., knowing the 3-by-3 state transition probability matrix T(t_{i+1}, t_i), and also knowing the probabilities of being in states 1, 2, and 3 at some initial time, one can project the probability of being in any particular state or states at any point in time by propagating the difference equation (11-41).

FIG. 11.1 Transition probability diagram.
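The propagation (11-41), the normalization conditions (11-37) and (11-39), and the semigroup behavior of repeated transitions can be illustrated numerically; the 3-state matrix T below is a hypothetical example, not the one depicted in Fig. 11.1:

```python
import numpy as np

# Hypothetical 3-state transition probability matrix T(t_{i+1}, t_i);
# T[j, k] = P{x(t_{i+1}) = j+1 | x(t_i) = k+1}, so each COLUMN sums to one,
# per (11-39).  (Illustrative numbers only.)
T = np.array([[0.90, 0.10, 0.05],
              [0.08, 0.85, 0.15],
              [0.02, 0.05, 0.80]])
assert np.allclose(T.sum(axis=0), 1.0)

p = np.array([1.0, 0.0, 0.0])       # initially in state 1 with probability one
for _ in range(5):                   # propagate (11-41): p(t_{i+1}) = T p(t_i)
    p = T @ p
# p remains a valid probability vector, consistent with (11-37)
assert np.isclose(p.sum(), 1.0) and (p >= 0).all()

# Semigroup property: a 5-step transition equals 2 steps followed by 3 steps
assert np.allclose(np.linalg.matrix_power(T, 5),
                   np.linalg.matrix_power(T, 3) @ np.linalg.matrix_power(T, 2))
```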

Now consider an M-step state transition probability T_jk(t_{i+M}, t_i), which is the conditional probability that the process will be in state j at time t_{i+M}, after M transition time instants, given that the present state of the process at time t_i is k:

T_jk(t_{i+M}, t_i) ≜ P({ω: x(t_{i+M}, ω) = j} | {ω: x(t_i, ω) = k})    (11-44)


Let t_{i+l} be some intermediate time instant between t_i and t_{i+M}, i.e., with l < M. One can then write

T_jk(t_{i+M}, t_i) = Σ_{n=1}^{N} T_jn(t_{i+M}, t_{i+l}) T_nk(t_{i+l}, t_i)    (11-45)

This states that the conditional probability that the system will be in state j at time t_{i+M}, given that it is in state k at time t_i, is equal to the summation (over the intermediate state value n) of the N possible terms formed as the products of [the probability that the system will transition from state k at time t_i to state n at time t_{i+l}] and [the probability that it will transition from that state n at time t_{i+l} to state j at time t_{i+M}]. This is the discrete-state version of the Chapman-Kolmogorov equation for Markov processes; it and its continuous-state analog will be of significant value subsequently for nonlinear estimator formulations. Note that (11-45) is also the jkth element of the relation

T(t_{i+M}, t_i) = T(t_{i+M}, t_{i+l}) T(t_{i+l}, t_i)    (11-46)

i.e., the state transition probability matrix (like the state transition matrix Φ(t_i, t_j) associated with linear deterministic state models of dynamic systems) has the semigroup property: transitioning from t_i to t_{i+M} can be achieved as a transition from t_i to t_{i+l} and then from there to t_{i+M}. This property is inherent in the solution form (11-43).

EXAMPLE 11.3 Recall the 3-state transition probability diagram depicted in Fig. 11.1. Consider the term T_31(t_{i+5}, t_i), the conditional probability that the process will be in state 3 at time t_{i+5}, given that the present state at time t_i is state 1. Then, by (11-45),

T_31(t_{i+5}, t_i) = Σ_{n=1}^{3} T_3n(t_{i+5}, t_{i+2}) T_n1(t_{i+2}, t_i)
    = T_31(t_{i+5}, t_{i+2}) T_11(t_{i+2}, t_i) + T_32(t_{i+5}, t_{i+2}) T_21(t_{i+2}, t_i)
      + T_33(t_{i+5}, t_{i+2}) T_31(t_{i+2}, t_i)

In other words, there are three possible "routes" of transition through the intermediate time t_{i+2}, as displayed in Fig. 11.2. The first "route" corresponds to the first term in the preceding summation: the product of [the probability that state 1 is assumed at time t_{i+2}, given that state 1 is assumed at t_i] and [the probability that state 3 is assumed at t_{i+5}, given that the state at time t_{i+2} is indeed state 1]. Similarly, the second and third "routes" correspond to the second and third terms in the equation, respectively. ■

EXAMPLE 11.4 Discrete-state Markov processes provide a useful model for system reliability purposes. Suppose that we have two actuators in parallel, so that the overall actuator system fails only if both actuators fail, and let us investigate the probability of a system failure. For convenience, quantize time into intervals Δt in length.

There are four possible states: state 1 = both actuators are working, state 2 = actuator A has failed but actuator B is working, state 3 = A is operating but B has failed, and state 4 = both actuators have failed. By using a Markov process model, we are assuming that the state transition probabilities depend only on the current state, not on the previous history leading up to that state. For instance, assume that there is a probability of 0.01 that either actuator will fail in the


FIG. 11.2 Transitions through intermediate time t_{i+2} for Example 11.3.

next Δt seconds, given that both are operating at the beginning of that time interval. Failures are assumed to be independent of one another. Furthermore, if one actuator has failed at the beginning of an interval, there is a probability of 0.02 that the other will fail in the ensuing Δt sec: since the single actuator takes the entire load, the failure probability is higher. Note that these are conditional probabilities, conditioned on the system state at the beginning of the interval, and not on any previous history of values.

Figure 11.3 presents the transition probability diagram for this problem. We now evaluate the transition probabilities as depicted there. First, from state 1, it is possible to transition into any of the four states. Since failures are assumed independent of one another, the probability of both A

FIG. 11.3 Transition probability diagram for Example 11.4.


and B failing in the next Δt sec is the product of the separate probabilities of A failing and B failing:

T_41 = P{A fails and B fails | A and B both working to start}
     = P{A fails | both working to start} P{B fails | both working to start}
     = (0.01)(0.01) = 0.0001

To get T_21, we know that P{A fails | both working to start} = 0.01, and that this is the sum

P{A fails, B works | both working to start} + P{A fails, B fails | both working to start}

because these latter events are mutually exclusive and their union is the event {A fails | both working to start}. Thus,

0.01 = P{A fails, B works | both working to start} + 0.0001

so that T_21 = 0.01 - 0.0001 = 0.0099, and similarly for T_31. Finally, T_11 is obtained by the fact that the total probability of reaching some state out of all possible states is one, so

T_11 = 1 - T_21 - T_31 - T_41 = 1 - 0.0099 - 0.0099 - 0.0001 = 0.9801

From state 2, we can transition only into states 2 or 4: we disallow the possibility of "healing" and recertification of an actuator once it fails (also known as "regeneration"); such a possibility can of course be handled mathematically by nonzero state transition probabilities T_12 and T_32.

The value of T_42 was given as 0.02, and so T_22 = 1 - T_42 = 0.98. Analogously, T_43 = 0.02 and T_33 = 0.98.

From state 4, if we disallow "healing" of failed actuators, we can only transition into state 4, and so T_44 = 1. Note this example expresses the transition probabilities as time invariant; time-varying probabilities can also be addressed mathematically.

Thus, the state transition probability matrix is

                 [0.9801  0     0     0]
T(t_{i+1},t_i) = [0.0099  0.98  0     0]
                 [0.0099  0     0.98  0]
                 [0.0001  0.02  0.02  1]

Note for instance that the first row indicates the probabilities of transitions into state 1 from states 1, 2, 3, and 4; the first column depicts probabilities of transitions from state 1 into states 1, 2, 3, and 4. Moreover, each column sums to 1: the total probability of reaching any of the admissible states from a given state must equal unity.

For reliability purposes, we are interested in the probability of reaching state 4, starting from an initial condition of both actuators working. Thus, we have an initial condition on (11-36) as p(t_0) = [1 0 0 0]^T, and (11-41) can be used to propagate the vector of state probabilities in time. We are especially interested in p_4(t_i): using the given T(t_{i+1}, t_i) and (11-41), p_4(t_i) is seen to start at zero and to grow, asymptotically approaching 1 as time progresses. Had "healing" of failed actuators been allowed, a nonunity steady state value of p_4 could have been reached.

In this problem, since T(t_{i+1}, t_i) = T is time invariant (a function only of Δt), the Chapman-Kolmogorov equation (11-46) becomes

T(t_{i+M}, t_i) = T^M = T^{(M-l)} T^l = T(t_{i+M}, t_{i+l}) T(t_{i+l}, t_i)

Thus, p(t_{i+M}) = T^M p(t_i) can also be evaluated as

p(t_{i+M}) = T^{(M-l)} T^l p(t_i) = T(t_{i+M}, t_{i+l}) p(t_{i+l})

in terms of the probabilities at the intermediate time t_{i+l}, p(t_{i+l}). ■
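A sketch of Example 11.4 numerically, propagating (11-41) with the transition matrix given there: p_4(t_i) starts at zero, never decreases (state 4 is absorbing), and approaches 1 asymptotically, as claimed.

```python
import numpy as np

# Transition probability matrix of Example 11.4 (states: 1 = both working,
# 2 = A failed, 3 = B failed, 4 = both failed); each column sums to one.
T = np.array([[0.9801, 0.00, 0.00, 0.0],
              [0.0099, 0.98, 0.00, 0.0],
              [0.0099, 0.00, 0.98, 0.0],
              [0.0001, 0.02, 0.02, 1.0]])

p = np.array([1.0, 0.0, 0.0, 0.0])     # p(t0): both actuators working
p4_history = [p[3]]
for _ in range(1000):                   # propagate (11-41)
    p = T @ p
    p4_history.append(p[3])

assert p4_history[0] == 0.0
assert all(b >= a for a, b in zip(p4_history, p4_history[1:]))  # monotone growth
assert p4_history[-1] > 0.99            # asymptotically approaches 1
```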


Now let us apply the insights gained from the case of discrete states (each with associated discrete probability) to the continuous-state case in which probability distribution functions are continuous. We shall assume henceforth that the conditional density function

f_{x(t)|x(t')}(ξ|ρ) = ∂^n F_{x(t)|x(t')}(ξ|ρ) / (∂ξ_1 ⋯ ∂ξ_n)    (11-47)

exists for all t ≥ t'. Knowledge of this transition probability density allows complete specification of the statistical properties of a Markov process x(·,·), as demonstrated earlier. Common notational conventions for this fundamentally important function include

f_{x(t)|x(t')}(ξ|ρ) ≜ f_x(ξ, t | x(t') = ρ) ≜ f_x(ξ, t | ρ, t')    (11-48)

and we shall use the last and most common of these for convenience.

In the discrete-state case examined so far, one of a finite number of admissible state values can be assumed at any time, including the intermediate time t_{i+l}, resulting in the finite sum of transition probability products in the Chapman-Kolmogorov equation, (11-45). When continuous-state problems are considered, we can expect this sum to be replaced by an integration over the range of values that can be assumed. We now demonstrate the validity of this insight. Consider any three times of interest, t_1 ≤ t_2 ≤ t_3. By the definition of conditional densities and Bayes' rule, we can write in general

f_{x(t3), x(t2)|x(t1)}(ξ, ρ | η) = f_{x(t3)|x(t2), x(t1)}(ξ | ρ, η) f_{x(t2)|x(t1)}(ρ | η)

Since x(·,·) is assumed to be a Markov process, this becomes

f_{x(t3), x(t2)|x(t1)}(ξ, ρ | η) = f_{x(t3)|x(t2)}(ξ | ρ) f_{x(t2)|x(t1)}(ρ | η)

To obtain the conditional marginal density for x(t_3), conditioned on x(t_1) = η, we can integrate over ρ, the process value at the intermediate time t_2:

f_{x(t3)|x(t1)}(ξ | η) = ∫_{-∞}^{∞} f_{x(t3), x(t2)|x(t1)}(ξ, ρ | η) dρ
    = ∫_{-∞}^{∞} ⋯ ∫_{-∞}^{∞} f_{x(t3), x(t2)|x(t1)}(ξ, ρ | η) dρ_1 dρ_2 ⋯ dρ_n
    = ∫_{-∞}^{∞} ⋯ ∫_{-∞}^{∞} f_{x(t3)|x(t2)}(ξ | ρ) f_{x(t2)|x(t1)}(ρ | η) dρ_1 dρ_2 ⋯ dρ_n    (11-49)

Using the notation introduced in (11-48), this becomes the conventional form of the Chapman-Kolmogorov equation:

f_x(ξ, t_3 | η, t_1) = ∫_{-∞}^{∞} ⋯ ∫_{-∞}^{∞} f_x(ξ, t_3 | ρ, t_2) f_x(ρ, t_2 | η, t_1) dρ_1 ⋯ dρ_n    (11-50)

This is directly comparable to (11-45), especially when we note that

(11-51)


By taking the expectation of the Chapman-Kolmogorov equation, i.e., integrating out the dependence on η, we obtain

f_{x(t3)}(ξ) = ∫_{-∞}^{∞} f_{x(t3)|x(t2)}(ξ | ρ) f_{x(t2)}(ρ) dρ
            = ∫_{-∞}^{∞} f_x(ξ, t_3 | ρ, t_2) f_{x(t2)}(ρ) dρ    (11-52)

where f_{x(t2)}(ρ) can be considered an initial condition. This is directly analogous to the discrete-state result (11-42).
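The continuous-state Chapman-Kolmogorov equation (11-50) can be checked numerically in the scalar case. As an illustrative choice of kernel (not a model from the text), take the Gaussian transition density of scalar Brownian motion, for which f(ξ, t | ρ, t') has mean ρ and variance q(t - t'); chaining the kernel over an intermediate time must reproduce the direct kernel:

```python
import numpy as np

def kernel(xi, eta, dt, q=1.0):
    """Transition density of scalar Brownian motion with diffusion q:
    f(xi, t + dt | eta, t) is Gaussian with mean eta and variance q*dt."""
    return np.exp(-(xi - eta)**2 / (2.0 * q * dt)) / np.sqrt(2.0 * np.pi * q * dt)

t1, t2, t3 = 0.0, 0.7, 1.5
eta, xi = 0.3, -0.5

rho = np.linspace(-15.0, 15.0, 20001)   # grid of intermediate values rho
drho = rho[1] - rho[0]

# Right side of (11-50): integrate the product of kernels over rho
chained = np.sum(kernel(xi, rho, t3 - t2) * kernel(rho, eta, t2 - t1)) * drho
# Left side: the direct transition density over [t1, t3]
direct = kernel(xi, eta, t3 - t1)
assert abs(chained - direct) < 1e-6
```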

Further insights about the structural form of propagation relations for transition probability densities can be obtained as well. In the case of discrete states (discrete probabilities, discontinuities in the distribution function) and discrete transition times, the model description for both transition and unconditional probabilities is in the form of a difference equation, i.e., (11-45) and (11-42), respectively. The case of discrete states and continuous time leads to differential equation models for the time evolution of such probabilities. Extending to the case of continuous states and continuous-time Markov processes will result in partial differential equations to describe time evolution of transition probability densities: the forward Kolmogorov equation to be discussed subsequently. An important class of such continuous-time, continuous-state Markov processes is the set of solutions to nonlinear stochastic differential equations defined in the Ito sense. The proper definition of such processes and their characteristics, including time evolution of the associated transition densities, will be developed. To do so, as in Chapter 4 of Volume 1 where linear stochastic differential equations were developed, we first establish the proper definition of Ito stochastic integrals and differentials.

11.4 ITO STOCHASTIC INTEGRALS AND DIFFERENTIALS

Wiener stochastic integrals of the form of (11-6) have previously been developed, and their properties are directly affected by properties of the Brownian motion or Wiener process β(·,·) used to define the integrals properly. Now we wish to consider the extension to (11-31), initially in the scalar case:

I(t,') = i:a(r")d~(r,.) (11-53)

where a(·,·) is an admissible stochastic process, such that a(t,·) depends at most on the past and present values of β(·,·), {β(t'), t_0 ≤ t' ≤ t}, but is independent of future values of β(·,·) [10, 11, 22, 23, 27, 30-32, 39, 46, 47, 60]. The properties of this stochastic integral will also be a direct consequence of Brownian motion characteristics, so we first consider these [9, 12, 32, 44, 45, 55, 71].


Scalar Brownian motion is defined to be a process with independent increments that are Gaussian, such that for any t_1 and t_2 in the time set T of interest,

E{β(t_2) - β(t_1)} = 0    (11-54a)

E{[β(t_2) - β(t_1)]²} = ∫_{t1}^{t2} q(τ) dτ    (11-54b)

and, by convention, such that β(t_0) = 0 with probability 1. From this definition, the following properties were developed in Volume 1: (1) β(·,·) is Markov, which is true of any process with independent increments; (2) β(·,·) is continuous everywhere with probability one (or, "almost surely," i.e., all sample functions are continuous except possibly a set of total probability zero) and also in the mean square sense (or, in "quadratic mean"); (3) β(·,·) is nondifferentiable everywhere with probability one and in the mean square sense; (4) β(·,·) is not of bounded variation with probability one and in the mean square sense [23, 68]; (5) E{β(t_i)β(t_j)} = E{β(t_i)²} for t_j ≥ t_i.

Since β(·,·) is a zero-mean process with independent increments, it is readily shown also to be a martingale [10, 14, 39, 50], a stochastic process x(·,·) for which E{|x(t)|} is finite for all admissible t and

E{x(t_i) | x(t_{i-1}), x(t_{i-2}), …, x(t_0)} = x(t_{i-1})    (11-55a)

for any sequential times t_0, t_1, …, t_i, or, if x(·,·) is defined over some interval T,

E[x(t) | {x(τ), t_0 ≤ τ ≤ t' < t}] = x(t')    (11-55b)

which can be written more rigorously as

E{x(t) | F_{t'}} = x(t')    (11-55c)

where F_{t'} is the minimal σ-algebra generated by {x(τ), t_0 ≤ τ ≤ t' < t}. The martingale property is important in developing the idea of conditional expectations conditioned on knowledge of samples of a continuous-time process (e.g., continuous-time measurements), likelihood ratios for statistical detection theory, and other concepts [10], as well as being pertinent in the discussion of stochastic integrals. Moreover, it can be proven [34, 45] that if x(·,·) is a martingale that is continuous with probability one, with covariance

E{[x(t_2) - x(t_1)]² | F_{t1}} = (t_2 - t_1)    (11-56)

then x(·,·) is a Brownian motion with unit diffusion. Throughout our discussion of (11-53), we could restrict our attention to unit-diffusion Brownian motion without loss of generality since

∫_{t0}^{t} a(τ,·) dβ(τ,·) = ∫_{t0}^{t} [a(τ,·) q^{1/2}(τ)] dW(τ,·) ≜ ∫_{t0}^{t} a'(τ,·) dW(τ,·)    (11-57)

if E{[dβ(t)]²} = q(t) dt and E{[dW(t)]²} = dt.


An additional property of Brownian motion has not been discussed previously, since it was not essential to the development of Wiener stochastic integrals. Namely, Brownian motion has the Levy oscillation property or quadratic variation property [34, 42]: If W(·,·) is unit-diffusion Brownian motion and {t_0, t_1, …, t_N = t_f} is a partition of the interval [t_0, t_f], then

lim_{max|t_{i+1}-t_i| → 0} Σ_{i=0}^{N-1} [W(t_{i+1}) - W(t_i)]² = (t_f - t_0)    (11-58)

where the limit exists both in the mean square sense and with probability one. Written another way,

[dW(t)]² = dt    w.p.1; in m.s.    (11-59a)

The mean square convergence is readily demonstrated:

E{Σ_{i=0}^{N-1} [W(t_{i+1}) - W(t_i)]²} = Σ_{i=0}^{N-1} (t_{i+1} - t_i) = (t_f - t_0)

and the variance of the sum tends to zero in the limit as max|t_{i+1} - t_i| → 0. Thus, not only is it true that E{[dW(t)]²} = dt, but [dW(t)]² itself equals dt for all samples except possibly a set of total probability zero. This readily extends to the nonunit-diffusion case:

[dβ(t)]² = q(t) dt    w.p.1; in m.s.    (11-59b)

and also to the vector Brownian motion case:

dβ(t) dβᵀ(t) = Q(t) dt    w.p.1; in m.s.    (11-59c)
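The quadratic variation property (11-58) can be observed numerically: over a fine partition of [0, t_f], the sum of squared increments of a simulated unit-diffusion W concentrates sharply around t_f - t_0 (a sample sketch with a fixed seed):

```python
import numpy as np

rng = np.random.default_rng(42)
t_f, N = 1.0, 1_000_000
# Increments of unit-diffusion Brownian motion over a uniform partition
dW = rng.normal(0.0, np.sqrt(t_f / N), size=N)

quad_var = np.sum(dW**2)            # sum of squared increments, as in (11-58)
assert abs(quad_var - t_f) < 0.01   # concentrates around t_f - t_0 = 1
```

The standard deviation of the sum is of order sqrt(2/N)·t_f, which is why refining the partition drives the sum toward t_f with probability one.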

This result will have profound effect upon the properties of the stochastic integral (11-53). Unlike the Wiener integral case, evaluating it as

I(t,·) = l.i.m._{N→∞} Σ_{i=0}^{N-1} a(τ_i,·)[β(t_{i+1},·) - β(t_i,·)]    (11-60)

where τ_i is any point in the interval [t_i, t_{i+1}), will cause the value and properties of I(t,·) to be dependent upon the specific choice of τ_i [19, 32, 34, 67]. Heuristically, (11-59) will also impact the evaluation of truncated Taylor series representations of nonlinear functions of β(·,·), invalidating the applicability of formal rules for differentials and the like.

Let a(·,·) be admissible (as defined below Eq. (11-53)) and suppose that ∫_{t0}^{t} E{a(τ,·)²} dτ is finite with probability one. Under these sufficient conditions, Ito [27, 30, 34] showed that the stochastic integral (11-53) can be defined in a manner analogous to that of Chapter 4 for Wiener integrals. First the interval [t_0, t] is partitioned into N steps with t_0 < t_1 < ⋯ < t_N = t, and the stochastic integral can then be defined for a piecewise constant function a_N(·,·), constant over each partition [t_i, t_{i+1}), approximating a(·,·) by letting a_N(τ,·) = a(t_i,·)


for τ ∈ [t_i, t_{i+1}), as

I_N(t,·) = Σ_{i=0}^{N-1} a_N(t_i,·)[β(t_{i+1},·) - β(t_i,·)] ≜ ∫_{t0}^{t} a_N(τ,·) dβ(τ,·)    (11-61)

Now we take finer and finer partitions of [t_0, t], such that max{|t_{i+1} - t_i|} → 0, and consider the sequence of integrals I_N(t,·) as N → ∞. If the sequence of a_N(·,·)'s converges to the random function a(·,·) in the sense that

∫_{t0}^{t} E{[a_N(τ,·) - a(τ,·)]²} q(τ) dτ → 0    as N → ∞    (11-62)

which is assured by the two stated sufficient conditions, then the Ito stochastic integral can be defined as the mean square limit:

I(t,·) = l.i.m._{N→∞} Σ_{i=0}^{N-1} a_N(t_i,·)[β(t_{i+1},·) - β(t_i,·)] ≜ ∫_{t0}^{t} a(τ,·) dβ(τ,·)    (11-63)

EXAMPLE 11.5 Consider evaluation of the stochastic integral [10]

I(t,·) = ∫_{t0}^{t} β(τ,·) dβ(τ,·)

in which the admissible a(·,·) is specifically β(·,·) itself: here a(t,·) depends only on β(t,·) and not the more generally admissible {β(t',·), t_0 ≤ t' ≤ t}. Let β(·,·) have constant diffusion parameter q. First let us interpret this stochastic integral in the Ito sense and use (11-63) to establish the desired evaluation:

I(t,·) = ½β²(t,·) - ½q(t - t_0)

where the second term follows from the Levy oscillation property (11-59b). Note that this does not obey formal integration rules, which would yield simply ½β²(t).

If alternate stochastic integral definitions are considered as in (11-60), with τ_i = t_i + k[t_{i+1} - t_i] and k ∈ [0, 1], then [19]

I(t) = l.i.m._{N→∞} Σ_{i=0}^{N-1} β(t_i + k[t_{i+1} - t_i])[β(t_{i+1}) - β(t_i)]

If k is chosen to be ½, i.e., if we let τ_i be the midpoint between t_i and t_{i+1}, then the formal integration rule is satisfied. This corresponds to the Stratonovich definition of a stochastic integral [16, 17, 32, 66, 67]. Despite this attractive feature in the scalar case, the Stratonovich integral lacks some properties possessed by the Ito integral that are essential to Markov process descriptions and optimal estimator derivation, so we shall concentrate only on the latter definition. ■


Viewed as a stochastic process, I(·,·) is itself admissible, and it is both mean square continuous and continuous with probability one [10, 14, 32, 45]. It is also a martingale of Brownian motion [19, 20, 34]:

E[I(t) | {β(τ), t_0 ≤ τ ≤ t' < t}] = I(t')    (11-64)

and has the properties (for admissible a(·,·) and b(·,·))

E{∫_{t0}^{t} a(τ) dβ(τ)} = 0    (11-65)

E{[∫_{t0}^{t} a(τ) dβ(τ)][∫_{t0}^{t} b(τ) dβ(τ)]} = ∫_{t0}^{t} E{a(τ)b(τ)} q(τ) dτ    (11-66)

Demonstration of (11-65), (11-66), and Mean Square Continuity

These properties follow directly from (11-63). For instance, a_N(t_i,·) is independent of [β(t_{i+1}) - β(t_i)] for admissible a_N, so that the expectation of (11-61) yields

E{I_N(t)} = E{Σ_{i=0}^{N-1} a_N(t_i)[β(t_{i+1}) - β(t_i)]} = Σ_{i=0}^{N-1} E{a_N(t_i)} E{β(t_{i+1}) - β(t_i)} = 0

for all N. Since

E{l.i.m._{N→∞} x_N} = lim_{N→∞} E{x_N}

for any sequence {x_N} with finite second moments (readily demonstrated using the Schwarz inequality; see Problem 11.7), (11-65) is established by

E{I(t)} = E{l.i.m._{N→∞} I_N(t)} = lim_{N→∞} E{I_N(t)} = 0

Similarly, (11-66) is valid for step functions a_N and b_N:

E{[∫_{t0}^{t} a_N(τ) dβ(τ)][∫_{t0}^{t} b_N(τ) dβ(τ)]}
    = E{[Σ_{i=0}^{N-1} a_N(t_i){β(t_{i+1}) - β(t_i)}][Σ_{j=0}^{N-1} b_N(t_j){β(t_{j+1}) - β(t_j)}]}
    = Σ_{i=0}^{N-1} E{a_N(t_i) b_N(t_i)} ∫_{t_i}^{t_{i+1}} q(τ) dτ
    = ∫_{t0}^{t} E{a_N(τ) b_N(τ)} q(τ) dτ

where the single summation in the third line is the result of a_N(t_i), b_N(t_j), and [β(t_{i+1}) - β(t_i)] being uncorrelated with [β(t_{j+1}) - β(t_j)] for t_j > t_i and E{β(t_{j+1}) - β(t_j)} = 0 (and similarly for t_i > t_j); for t_i = t_j, a_N(t_i) and b_N(t_i) are uncorrelated with [β(t_{i+1}) - β(t_i)], so the expectation of the product is E{a_N(t_i)b_N(t_i)} E{[β(t_{i+1}) - β(t_i)]²} as shown. Now assume {a_N(τ)} and {b_N(τ)} converge to a(τ)

Page 197: Stochastic Models, Estimation, And Control Volume 3


and b(τ), t0 ≤ τ ≤ t, in the sense of (11-62); then

lim_{N→∞} ∫_{t0}^{t} E{a_N(τ)b_N(τ)}q(τ)dτ = ∫_{t0}^{t} E{a(τ)b(τ)}q(τ)dτ

which can be shown by appropriate application of the Schwarz inequality (see Problem 11.8). Finally, for any sequences {x_N} and {y_N} with finite second moments,

E{[l.i.m._{N→∞} x_N][l.i.m._{N→∞} y_N]} = lim_{N→∞} E{x_N y_N}

(see Problem 11.7), so that (11-66) is established by

E{[l.i.m._{N→∞} Σ_{i=0}^{N−1} a_N(t_i){β(t_{i+1}) − β(t_i)}][l.i.m._{N→∞} Σ_{j=0}^{N−1} b_N(t_j){β(t_{j+1}) − β(t_j)}]}
  = lim_{N→∞} E{[Σ_{i=0}^{N−1} a_N(t_i){β(t_{i+1}) − β(t_i)}][Σ_{j=0}^{N−1} b_N(t_j){β(t_{j+1}) − β(t_j)}]}
  = ∫_{t0}^{t} E{a(τ)b(τ)}q(τ)dτ

Moreover, since (11-66) is true, E{[∫_{t0}^{t} a(τ)dβ(τ)]²} equals ∫_{t0}^{t} E{a²(τ)}q(τ)dτ and is thus finite if E{a²(τ)} < ∞, t0 ≤ τ ≤ t, and

lim_{Δt→0} E{[∫_{t}^{t+Δt} a(τ)dβ(τ)]²} = lim_{Δt→0} ∫_{t}^{t+Δt} E{a²(τ)}q(τ)dτ = 0

since the last Riemann integral is a continuous function of its upper limit. This demonstrates mean square continuity. •

The properties just described for Ito stochastic integrals were seen to be dependent upon the definition given in (11-63). These properties need not be true for alternative definitions of stochastic integrals as given in (11-60).

Other properties of the Ito integral are also natural extensions of properties of Wiener integrals. For instance, a linear functional of a random process z(·,·) with finite second moments is defined to be a finite sum of the form Σ_{i=1}^{f} k_i z(t_i) or a limit in the mean of such sums, while a nonlinear functional of z(·,·) is defined as a finite sum Σ_{i=1}^{f} k_i z(t_i), a finite product Π_{i=1}^{f} k_i z(t_i), or a limit in the mean of such sums, products, or combinations thereof. It can be shown [10] that any finite-variance linear functional of Brownian motion {β(τ), 0 ≤ τ ≤ t} can be written as a Wiener stochastic integral, as

x(t) = ∫_{0}^{t} a(τ)dβ(τ)     (11-67a)

with ∫_{0}^{t} a²(τ)q(τ)dτ finite. Analogously, any finite-variance nonlinear functional of Brownian motion can be expressed as an Ito stochastic integral, as

x(t) = ∫_{0}^{t} a(τ)dβ(τ)     (11-67b)

with ∫_{0}^{t} E{a²(τ)}q(τ)dτ finite [4, 5, 19, 20, 28, 34, 39, 72]. In fact, once the Ito integral is established, it is useful to consider the Wiener integral as a special case of the Ito integral, with a nonrandom function a(t) used in its definition.
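Properties (11-65) and (11-66) can be checked by Monte Carlo. A sketch (illustrative only; the choice a(τ) = β(τ) with q ≡ 1 and t = 1 is an assumption for the experiment, giving E{I} = 0 and E{I²} = ∫₀¹ E{β²(τ)} dτ = ∫₀¹ τ dτ = 1/2):

```python
import numpy as np

# Monte Carlo check of (11-65) and (11-66) for a(tau) = beta(tau), q = 1, t = 1.
rng = np.random.default_rng(1)
paths, steps, T = 20_000, 200, 1.0
dt = T / steps
db = rng.normal(0.0, np.sqrt(dt), (paths, steps))
beta = np.cumsum(db, axis=1) - db     # beta(t_i): Brownian values at left endpoints
I = np.sum(beta * db, axis=1)         # Ito sums  sum_i beta(t_i)[beta(t_{i+1}) - beta(t_i)]

print(I.mean())        # near 0, consistent with (11-65)
print((I**2).mean())   # near 1/2, consistent with (11-66)
```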

Page 198: Stochastic Models, Estimation, And Control Volume 3


The concepts presented for the scalar case extend directly to the vector Ito stochastic integral,

I(t,·) = ∫_{t0}^{t} A(τ,·)dβ(τ,·)     (11-68)

with A(·,·) a matrix of admissible functions. Then, if the sequence of A_N(·,·)'s, achieved by finer and finer partitions, converges to A(·,·) in the sense that

∫_{t0}^{t} E{[A_ij(τ,·) − (A_N)_ij(τ,·)]²}dτ → 0

as N → ∞ for all i and j, then the stochastic integral is defined as in (11-63):

I(t,·) = l.i.m._{N→∞} Σ_{i=0}^{N−1} A_N(t_i,·)[β(t_{i+1},·) − β(t_i,·)]     (11-69)

with the same properties as already described in the scalar case.

Given the definition of an Ito stochastic integral, the corresponding stochastic differential can be established. If

I(t) = I(t0) + ∫_{t0}^{t} A(τ)dβ(τ)     (11-70)

then the stochastic differential of I(t) is that which would be integrated from time t0 to time t to evaluate [I(t) − I(t0)]:

dI(t) = A(t)dβ(t)     (11-71)

Heuristically, it can be interpreted as an infinitesimal difference

dI(t) = I(t + dt) − I(t)     (11-72)

11.5 ITO STOCHASTIC DIFFERENTIAL EQUATIONS

Consider a dynamical system described by the nonlinear (Ito) stochastic differential equation [1, 10, 18, 22, 27-32]

dx(t) = f[x(t), t]dt + G[x(t), t]dβ(t)     (11-73)

where x(·,·) is the n-dimensional state stochastic process, f[x(t), t] is an n-vector function of x(t) and (possibly) t describing system dynamics, G[x(t), t] is an n-by-s matrix of functions of (possibly) x(t) and t, and β(·,·) is s-dimensional Brownian motion of mean zero and diffusion Q(t):

E{dβ(t)dβᵀ(t)} = Q(t)dt     (11-74a)

E{[β(t2) − β(t1)][β(t2) − β(t1)]ᵀ} = ∫_{t1}^{t2} Q(t)dt     (11-74b)

Equation (11-73) is to be understood in the sense that

x(t) − x(t0) = ∫_{t0}^{t} f[x(τ), τ]dτ + ∫_{t0}^{t} G[x(τ), τ]dβ(τ)     (11-75)

Page 199: Stochastic Models, Estimation, And Control Volume 3


where the first integral on the right hand side is an ordinary integral for a given sample function of the process. The second integral is a specific form of Ito stochastic integral as in (11-68) and (11-69), but in which we restrict attention to G[·, t] being specifically a function of x(t) rather than the entire history {x(τ), t0 ≤ τ ≤ t}. In fact, (11-73) can be generalized to the case of assuming only that f and G are admissible, yielding solution processes known as Ito processes. The motivation for the restriction is that solutions to (11-73) will be Markov, whereas general Ito processes need not be. Moreover, it is only the Ito definition of stochastic integrals that will yield a solution to an equation such as (11-73) that will be Markov.

Ito [29] established the existence and uniqueness of solutions in the mean square sense to (11-73) under sufficient conditions that are very similar to those for ordinary nonlinear differential equations, namely:

(1) f[·,·] and G[·,·] are real functions that are uniformly Lipschitz in their first argument (a continuity condition): there exists a K, independent of t, such that

‖f[x + Δx, t] − f[x, t]‖ ≤ K‖Δx‖
‖G[x + Δx, t] − G[x, t]‖ ≤ K‖Δx‖

for all x and Δx in Rⁿ and all t in the interval [t0, tf] of interest, where the appropriate norm definitions for an n-vector v and an m-by-n matrix M are

‖v‖ = [Σ_{i=1}^{n} v_i²]^{1/2} = [vᵀv]^{1/2} = [tr(vvᵀ)]^{1/2}
‖M‖ = [Σ_{i=1}^{m} Σ_{j=1}^{n} M_ij²]^{1/2} = [tr(MMᵀ)]^{1/2}

(2) f[·,·] and G[·,·] are continuous in their second (time) argument over the interval [t0, tf] of interest;

(3) f[·,·] and G[·,·] are uniformly bounded according to

‖f[x, t]‖² ≤ K(1 + ‖x‖²),  ‖G[x, t]‖² ≤ K(1 + ‖x‖²)

(4) x(t0) is any random vector, with finite second moment E{x(t0)xᵀ(t0)}, which is independent of the Brownian motion process β(·,·).

In fact, Ito's proof is analogous to the standard successive-approximations proof of existence and uniqueness of solutions for ordinary nonlinear differential equations [8]. The solution is generated constructively by assuming the existence of x^k(·,·) with x^k(t0,·) = x(t0,·) and then forming x^{k+1}(·,·) via

x^{k+1}(t) = x(t0) + ∫_{t0}^{t} f[x^k(τ), τ]dτ + ∫_{t0}^{t} G[x^k(τ), τ]dβ(τ)     (11-76)

The sequence {x^k(·,·)} converges (in the mean square sense and with probability one) on any finite interval [t0, tf] to a solution process x(·,·) provided that

Page 200: Stochastic Models, Estimation, And Control Volume 3


the four sufficient conditions are satisfied. Once a solution is so generated, its uniqueness in the mean square sense can also be established [9, 29, 32, 45, 73, 74].

The solution process x(·,·) has the following important properties [3, 9, 30, 32, 34, 45, 76]:

(1) x(·,·) is mean square continuous, i.e.,

l.i.m._{t'→t} x(t') = x(t)     (11-77a)

or

lim_{t'→t} tr E{[x(t') − x(t)][x(t') − x(t)]ᵀ} = 0     (11-77b)

and it is also continuous w.p.1.
(2) [x(t) − x(t0)] and x(t) are both independent of the future increments of β(·,·), i.e., [β(t + τ2) − β(t + τ1)] for τ2 > τ1 ≥ 0.
(3) x(·,·) is Markov. Since, for t ≥ t',

x(t) = x(t') + ∫_{t'}^{t} f[x(τ), τ]dτ + ∫_{t'}^{t} G[x(τ), τ]dβ(τ)     (11-78)

x(t) depends on x(t') and {dβ(τ), t' ≤ τ ≤ t}, and the latter is independent of x(σ), σ ≤ t'. Thus, the conditional probability distribution for x(t) given x(t') and {x(σ), σ < t'} equals the distribution conditioned only on x(t'), establishing the Markov nature.

(4) The mean squared value of each component of x(·,·) is bounded by a finite number,

(11-79a)

for all time, and also

(11-79b)

for any t0 and tf.
(5) The probability of a change in x(t) in a small interval Δt is of higher order than Δt (a form of continuity property):

lim_{Δt→0} (1/Δt) ∫ ⋯ ∫_{‖ξ−ρ‖≥δ} f_x(ξ, t + Δt | ρ, t) dξ_1 ⋯ dξ_n = 0     (11-80)

where the notation means that the integration over ξ_1, …, ξ_n is to be carried out outside the ball of radius δ about ρ. Note that since x(·,·) is Markov, the transition probability is an appropriate means of describing fundamental properties. Figure 11.4 displays this property graphically for scalar x(·,·): as Δt → 0, the transition probability converges to a delta function at ξ = ρ and crowds into the region ρ ± δ faster than Δt → 0.

Page 201: Stochastic Models, Estimation, And Control Volume 3


FIG. 11.4 Illustrating property (5) of solution process x(·,·): the transition density f_x(ξ, t + Δt | ρ, t) concentrates within the ball of radius δ about ρ (the region from ρ − δ to ρ + δ).

(6) The drift of x(·,·) at time t is f[x(t), t]: recalling (11-73),

lim_{Δt→0} (1/Δt) ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} [ξ − ρ] f_x(ξ, t + Δt | ρ, t) dξ_1 ⋯ dξ_n
  = lim_{Δt→0} (1/Δt) E{x(t + Δt) − x(t) | x(t) = ρ}
  = f[ρ, t]     (11-81)

This says that the mean rate of change of x(·,·) in going from t to t + Δt is f[x(t), t] in the limit as Δt → 0.
(7) The diffusion of x(·,·) at time t is {G[x(t), t]Q(t)Gᵀ[x(t), t]}:

lim_{Δt→0} (1/Δt) ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} [ξ − ρ][ξ − ρ]ᵀ f_x(ξ, t + Δt | ρ, t) dξ_1 ⋯ dξ_n
  = lim_{Δt→0} (1/Δt) E{[x(t + Δt) − x(t)][x(t + Δt) − x(t)]ᵀ | x(t) = ρ}
  = G[ρ, t]Q(t)Gᵀ[ρ, t]     (11-82)

This is the covariance of the rate of change of x(·,·), and Q(t) is the Brownian motion diffusion defined in (11-74).
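Properties (6) and (7) suggest a direct numerical check: one-step increments taken from a fixed state ρ should have conditional mean approximately f[ρ, t]Δt and conditional mean square approximately G²Q Δt. A sketch for an assumed scalar example dx = −x dt + dβ with Q = 1 (the example equation is not from the text):

```python
import numpy as np

# For dx = f(x) dt + G d(beta) with f(x) = -x, G = 1, Q = 1, properties (6) and (7)
# give E{dx | x = p}/dt -> f(p) = -p and E{dx^2 | x = p}/dt -> G^2 Q = 1 as dt -> 0.
rng = np.random.default_rng(2)
p, dt, n = 1.0, 1e-3, 200_000
dx = -p * dt + rng.normal(0.0, np.sqrt(dt), n)   # one-step increments from x(t) = p

drift_est = dx.mean() / dt
diff_est = (dx**2).mean() / dt
print(drift_est)   # near f(p) = -1
print(diff_est)    # near G^2 Q = 1
```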

(8) The higher order infinitesimals in the progression established by (11-80)-(11-82) are all zero:

lim_{Δt→0} (1/Δt) ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} (ξ_i − ρ_i)^k f_x(ξ, t + Δt | ρ, t) dξ_1 ⋯ dξ_n = 0     (11-83a)

Page 202: Stochastic Models, Estimation, And Control Volume 3

for k > 2, and a similar relation is true for the general products greater than second degree as well, such as

lim_{Δt→0} (1/Δt) ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} (ξ_i − ρ_i)(ξ_j − ρ_j)(ξ_k − ρ_k) f_x(ξ, t + Δt | ρ, t) dξ_1 ⋯ dξ_n = 0     (11-83b)

This implies that the process does not diffuse "too fast," not that the transition density stays symmetric or Gaussian-like.

Formal Calculation of Statistics of dx(t)

It is instructive to calculate the statistics of the differential dx(t) in (11-73) to understand properties (6) and (7) and the rate of change of x(·,·) as just described. To do this properly would require addressing existence questions, but we present only formal calculations here.

In performing these calculations, we will have need for the fact that, if x and y are scalar functions of random vectors u and v respectively, i.e.,

x = φ[u],  y = θ[v]

then

E{xy | v = v(·)} = E{x | v = v(·)}·θ[v(·)]

This can be established as

E{xy | v = η} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} ξρ f_{x,y|v}(ξ, ρ | η) dξ dρ
  = ∫_{−∞}^{∞} ∫_{−∞}^{∞} ξρ f_{y|x,v}(ρ | ξ, η) f_{x|v}(ξ | η) dξ dρ

by Bayes' rule. But f_{y|x,v}(ρ | ξ, η) = δ(ρ − θ(η)), so the sifting property of the Dirac delta yields

E{xy | v = η} = ∫_{−∞}^{∞} ξ θ(η) f_{x|v}(ξ | η) dξ = [∫_{−∞}^{∞} ξ f_{x|v}(ξ | η) dξ]·θ(η)

as desired.

Now consider the conditional expectation of dx(t). Recalling (11-73),

E{dx(t) | x(t) = x(t,·)} = E{f[x(t), t]dt | x(t) = x(t,·)} + 0 = f[x(t), t]dt

where the last equality follows componentwise from the result just established, with v = x(t), φ ≡ 1, and θ[v] = f_i[x(t), t]. This establishes property (6).

Now consider the conditional second moment:

E{dx(t)dxᵀ(t) | x(t) = x(t,·)} = E{f[x(t), t]fᵀ[x(t), t]dt² + f[x(t), t]dt dβᵀ(t)Gᵀ[x(t), t]
  + G[x(t), t]dβ(t)fᵀ[x(t), t]dt
  + G[x(t), t]dβ(t)dβᵀ(t)Gᵀ[x(t), t] | x(t) = x(t,·)}

The first term is second order in dt; the second and third terms are zero because dβ(t) is independent of x(t) and is zero mean. Using the previously established result componentwise yields

Page 203: Stochastic Models, Estimation, And Control Volume 3

the fourth term as

G[x(t), t]E{dβ(t)dβᵀ(t) | x(t) = x(t,·)}Gᵀ[x(t), t] = G[x(t), t]E{dβ(t)dβᵀ(t)}Gᵀ[x(t), t]
  = G[x(t), t]Q(t)Gᵀ[x(t), t]dt

again using the independence of dβ(t) and x(t). This establishes property (7). •

Formal rules of integration and differentiation are not valid for Ito stochastic integrals or differentials based upon them. As a direct consequence of the Ito integral definition and the Levy oscillation property of Brownian motion (11-59), differentials of functions of x(·,·) satisfy the Ito differential rule [27, 30, 32, 34, 39, 45]. Let x(·,·) be the unique solution to (11-73)-(11-75), and let ψ[·,·] be a scalar real-valued function that has continuous first and second partial derivatives with respect to its first argument x and is continuously differentiable in its second argument t. Then ψ satisfies the stochastic differential equation

dψ[x(t), t] = (∂ψ/∂t)dt + (∂ψ/∂x)dx(t) + ½ tr{G[x(t), t]Q(t)Gᵀ[x(t), t](∂²ψ/∂x²)}dt     (11-84)

where

∂ψ/∂t ≜ ∂ψ[x, t]/∂t |_{x=x(t)},  ∂ψ/∂x ≜ ∂ψ[x, t]/∂x |_{x=x(t)},  ∂²ψ/∂x² ≜ ∂²ψ[x, t]/∂x² |_{x=x(t)}

and tr designates trace. Again, (11-84) is to be understood in the sense that

ψ[x(t), t] − ψ[x(t0), t0] = ∫_{t0}^{t} dψ[x(τ), τ]     (11-85)

It is the last term in (11-84) that causes formal differential rules to be erroneous in this application. The validity of (11-84) can be demonstrated formally by a Taylor series expansion of ψ[x + dx, t + dt]:

ψ[x + dx, t + dt] = ψ[x, t] + (∂ψ/∂t)dt + (∂ψ/∂x)dx(t) + ½(∂²ψ/∂t²)dt²
  + ½ dxᵀ(t)(∂²ψ/∂x²)dx(t) + ⋯

Now (11-73) is substituted into the last term, and recalling the Levy oscillation property (11-59), we retain only terms up to first order in dt and second order

Page 204: Stochastic Models, Estimation, And Control Volume 3


in dβ to get

ψ[x + dx, t + dt] − ψ[x, t] = (∂ψ/∂t)dt + (∂ψ/∂x)dx(t) + ½ dβᵀ(t)Gᵀ(∂²ψ/∂x²)G dβ(t)
  = (∂ψ/∂t)dt + (∂ψ/∂x)dx(t) + ½ tr{G dβ(t)dβᵀ(t)Gᵀ(∂²ψ/∂x²)}
  = (∂ψ/∂t)dt + (∂ψ/∂x)dx(t) + ½ tr{GQ(t)Gᵀ(∂²ψ/∂x²)}dt

where the last equality invokes the Levy property directly. Rigorous derivations of (11-84) can also be developed [30, 60]. Equation (11-84) is often combined with (11-73) and written in the form

dψ[x(t), t] = (∂ψ/∂t)dt + ℒ{ψ[x(t), t]}dt + (∂ψ/∂x)G[x(t), t]dβ(t)     (11-86a)

ℒ{ψ[x(t), t]} = (∂ψ/∂x)f[x(t), t] + ½ tr{G[x(t), t]Q(t)Gᵀ[x(t), t](∂²ψ/∂x²)}     (11-86b)

where ℒ{ψ[x(t), t]} is termed the differential generator of the process.

EXAMPLE 11.6 Consider a simple scalar case of (11-73) with f ≡ 0, G ≡ 1:

dx(t) = dβ(t)

i.e., x(·,·) is scalar Brownian motion itself, and we really have a trivial linear stochastic differential equation, written heuristically as ẋ(t) = w(t) with w(·,·) white Gaussian noise. Let the diffusion (white noise strength) Q be constant. Now consider the nonlinear function

ψ[x(t), t] = e^{x(t)} = e^{β(t)}

From (11-84), this satisfies the stochastic differential equation

dψ = e^{x(t)}dx(t) + ½Q e^{x(t)}dt

or

d[e^{β(t)}] = e^{β(t)}dβ(t) + ½Q e^{β(t)}dt

Note that, because of the last term, this does not satisfy formal rules for differentials. Letting y(t) = e^{β(t)}, this yields

dy(t) = ½Q y(t)dt + y(t)dβ(t);  y(t0) = 1 w.p.1

as the appropriate stochastic differential equation in the form of (11-73) or equivalently (11-75) to yield a solution in the form of e^{β(t)}, since β(t0) = 0 w.p.1. Thus, it can be seen that stochastic differential equations do not obey formal rules of integration either. The differential equation that would have been proposed by formal rules,

dz(t) = z(t)dβ(t);  z(t0) = 1 w.p.1

can be shown via (11-84) to have a solution, for t0 = 0, of

z(t) = e^{[β(t) − (Qt/2)]} •
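Example 11.6 can be checked by simulation: an Euler-Maruyama discretization (a standard scheme, not developed in the text) of dy = ½Qy dt + y dβ should reproduce the mean of the exact solution e^{β(t)}, namely E{y(t)} = e^{Qt/2}. The parameter choices below are assumptions for the experiment:

```python
import numpy as np

# Euler-Maruyama simulation of dy = (Q/2) y dt + y d(beta), y(0) = 1 (Example 11.6),
# whose solution is y(t) = exp(beta(t)), so that E{y(t)} = exp(Q t / 2).
rng = np.random.default_rng(3)
Q, T, steps, paths = 1.0, 1.0, 100, 50_000
dt = T / steps
y = np.ones(paths)
beta = np.zeros(paths)
for _ in range(steps):
    db = rng.normal(0.0, np.sqrt(Q * dt), paths)
    y += 0.5 * Q * y * dt + y * db   # the Ito correction term (Q/2) y dt is essential
    beta += db

print(y.mean())                 # near exp(Q T / 2) = 1.6487...
print(np.exp(beta).mean())      # exact-solution paths; same target mean
```

Dropping the ½Qy dt term (the "formal rules" equation dz = z dβ) would instead produce a martingale with E{z(t)} = 1, illustrating the difference between the two equations in the example.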

Page 205: Stochastic Models, Estimation, And Control Volume 3


EXAMPLE 11.7 A model that has been used for normal acceleration of an aircraft, i.e., acceleration orthogonal to the plane of its wings, is of the form [36, 37]:

a(t) = a0 + b e^{c n(t)}

where the coefficients a0, b, and c are constants associated with a particular class of aircraft, and n(·,·) is an exponentially time-correlated Gauss-Markov process, the solution to

dn(t) = −(1/T)n(t)dt + dβ(t)

with β(·,·) Brownian motion of constant diffusion Q. Using (11-84), a(·,·) could be modeled directly as the solution to the differential equation

da(t) = [bc e^{cn(t)}]dn(t) + ½Q[bc² e^{cn(t)}]dt
  = c[a(t) − a0][−(1/T)n(t)dt + dβ(t)] + ½Qc²[a(t) − a0]dt
  = {[a(t) − a0][(Qc²/2) − (1/T)ln[(a(t) − a0)/b]]}dt + {c[a(t) − a0]}dβ(t)

Although this is of the form of (11-73), the original linear dynamics relation with nonlinear output is more desirable computationally for generating a(·,·). •

EXAMPLE 11.8 In many physical applications, nonlinear dynamical systems are driven by time-correlated noises that can be modeled as the output of linear shaping filters driven by white Gaussian noise. In such applications, the appropriate Ito stochastic differential equations are readily generated. Let a dynamical system be described heuristically by

ẋ(t) = f[x(t), n(t), t]

where f is in fact often linear in n(·,·), and let n be the output of a linear shaping filter,

dx_f(t) = F(t)x_f(t)dt + G(t)dβ(t)
n(t) = H(t)x_f(t)

Then an augmented stochastic differential equation results as

[ dx(t)  ]   [ f[x(t), H(t)x_f(t), t] ]        [   0  ]
[ dx_f(t)] = [ F(t)x_f(t)            ] dt +  [ G(t) ] dβ(t) •

EXAMPLE 11.9 Consider the general scalar stochastic differential equation

dx(t) = f[x(t), t]dt + G[x(t), t]dβ(t)

with x(t0) = 0 w.p.1 and β(·,·) having diffusion Q(t). Now let ψ[x(t), t] = x(t)². Then, by (11-84), x(t)² satisfies

dψ[x(t), t] ≜ d[x²(t)] = 2x(t)dx(t) + ½{2G[x(t), t]²Q(t)}dt
  = {2x(t)f[x(t), t] + G[x(t), t]²Q(t)}dt + {2x(t)G[x(t), t]}dβ(t)

or, in view of (11-75),

x²(t) = ∫_{t0}^{t} {2x(τ)f[x(τ), τ] + G[x(τ), τ]²Q(τ)}dτ + ∫_{t0}^{t} {2x(τ)G[x(τ), τ]}dβ(τ)

Here, the differential generator of the process x² is

ℒ{ψ[x(t), t]} = 2x(t)f[x(t), t] + G[x(t), t]²Q(t) •

In view of Examples 11.5 and 11.6, it might seem appropriate to consider stochastic differential equations based upon the Stratonovich stochastic integral

Page 206: Stochastic Models, Estimation, And Control Volume 3


instead of the Ito definition. The fact is, if we restrict our attention from a(·,·) being admissible, i.e., a(t,·) being dependent on at most {β(t'), t0 ≤ t' ≤ t}, to a(t,·) being an explicit function of only β(t), and if a[β(t), t] has a continuous partial derivative with respect to its first argument and is continuous in its second argument and ∫_{t0}^{t} E{a[β(τ), τ]²}dτ is finite with probability one, then the mean square limit

l.i.m._{N→∞} Σ_{i=0}^{N−1} a[½β(t_i) + ½β(t_{i+1}), t_i][β(t_{i+1}) − β(t_i)]

exists and defines the Stratonovich stochastic integral. Moreover, if this is denoted as {∫_{t0}^{t} a[β(τ), τ]dβ(τ)}_S, then Stratonovich showed its relation to the Ito integral to be

{∫_{t0}^{t} a[β(τ), τ]dβ(τ)}_S = ∫_{t0}^{t} a[β(τ), τ]dβ(τ) + ½∫_{t0}^{t} Q(τ)(∂a[β, τ]/∂β)dτ   w.p.1     (11-87)

As a result, for the restricted class of functions for which the Stratonovich integral can be defined, there is a one-to-one correspondence between the scalar Ito equation

dx(t) = f[x(t), t]dt + G[x(t), t]dβ(t)     (11-88a)

and the scalar Stratonovich equation

dx(t) = {f[x(t), t] − ½Q(t)G[x(t), t] ∂G[x(t), t]/∂x}dt + G[x(t), t]dβ(t)     (11-88b)

since a[β(t), t] ≜ G[x(t), t] and so ∂a/∂β = [∂G/∂x][∂x/∂β]. This can be of use heuristically in generating Ito stochastic differential equations since Stratonovich equations obey formal integration rules.

EXAMPLE 11.10 Recall Example 11.6 in which an Ito stochastic differential equation was sought so as to yield a solution e^{β(t)}. Using formal rules of integration, the Stratonovich equation (11-88b) would be

dx(t) = x(t)dβ(t);  x(t0) = 1 w.p.1

The associated Ito equation (11-88a) is then found as

dx(t) = {½Q x(t)·∂x/∂x}dt + x(t)dβ(t) = ½Q x(t)dt + x(t)dβ(t)

as found previously by applying the Ito differential rule. •

The previous discussion and example, as well as Example 11.8, have impact upon numerical simulation of stochastic differential equations [2, 24, 38, 51, 52, 54, 66, 74, 75], as for conducting Monte Carlo simulations on a digital computer. In essence, our stochastic differential equation models are meant to provide adequate representations of true physical processes, and we must ask how well solutions, or simulations of solutions, to these equations represent the desired physical process. If a physical process can be generated via

ẋ(t) = f[x(t), t] + G[x(t), t]n(t)     (11-89)

or even the more general form as seen in Example 11.8, and n(·,·) is a time-correlated Gaussian process, the appropriate stochastic differential equation and simulation thereof are readily produced via linear shaping filter design and state augmentation.

EXAMPLE 11.11 A digital simulation of the augmented system in Example 11.8 can be generated by first developing the equivalent discrete-time model for the shaping filter:

x_f(t_{i+1}) = Φ(t_{i+1}, t_i)x_f(t_i) + w_d(t_i)

where Φ(t_{i+1}, t_i) is the state transition matrix satisfying

dΦ(t, t_i)/dt = F(t)Φ(t, t_i),  Φ(t_i, t_i) = I

and w_d(·,·) is discrete-time zero-mean white Gaussian noise with E{w_d(t_i)w_dᵀ(t_j)} = Q_d(t_i)δ_ij and

Q_d(t_i) = ∫_{t_i}^{t_{i+1}} Φ(t_{i+1}, τ)G(τ)Q(τ)Gᵀ(τ)Φᵀ(t_{i+1}, τ)dτ

A full discussion of such simulations is provided by Sections 4.9 and 6.8 and Problem 7.14 of Volume 1.

With x_f(·,·) so simulated, x(·,·) can be simulated on a sample-by-sample basis using any numerical integration procedure. For instance, Euler integration would yield

x(t_{i+1}, ω_k) = x(t_i, ω_k) + f[x(t_i, ω_k), H(t_i)x_f(t_i, ω_k), t_i]·{t_{i+1} − t_i} •
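The procedure of Example 11.11 can be sketched for a scalar case. The shaping filter is a first-order Gauss-Markov model as in Example 11.7, for which Φ = e^{−Δt/T} and Q_d = (QT/2)(1 − e^{−2Δt/T}) are the standard first-order results assumed here from Volume 1's development; the driven state uses the assumed dynamics ẋ = −x + n:

```python
import numpy as np

# Scalar sketch of Example 11.11: shaping filter dn = -(1/T) n dt + d(beta), diffusion Q,
# simulated exactly in discrete time, then Euler integration of xdot = f(x, n) = -x + n.
rng = np.random.default_rng(4)
T_corr, Q, dt, steps = 1.0, 2.0, 0.01, 200_000
phi = np.exp(-dt / T_corr)                          # equivalent discrete-time transition
qd = 0.5 * Q * T_corr * (1.0 - np.exp(-2.0 * dt / T_corr))  # discrete noise variance

n, x = 0.0, 0.0
n_hist = np.empty(steps)
for i in range(steps):
    n = phi * n + rng.normal(0.0, np.sqrt(qd))      # exact discrete shaping-filter step
    x = x + (-x + n) * dt                           # Euler step for the driven state
    n_hist[i] = n

print(n_hist.var())   # steady-state variance of n should approach Q*T/2 = 1.0
```

The shaping-filter samples are statistically exact at the sample times regardless of dt; only the Euler step for x introduces integration error.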

This approach really bypassed the modeling and simulation subtleties of Ito differential equations by accounting for the time correlation (finite bandwidth) of n(·,·) in (11-89) and generating an augmented state equation of the form

dx_a(t) = f_a[x_a(t), t]dt + G_a(t)dβ(t)     (11-90)

in which G_a is not a function of x_a(t).

The question still remains: How would we simulate (11-73), heuristically the result of replacing time-correlated n(·,·) with white Gaussian w(·,·) in (11-89)? To gain some insight, consider the scalar case and a polygonal approximation to scalar Brownian motion, β_p(·,·), defined over each interval [t_i, t_{i+1}] by

β_p(t) = β(t_i) + [(t − t_i)/(t_{i+1} − t_i)][β(t_{i+1}) − β(t_i)]     (11-91)

Page 208: Stochastic Models, Estimation, And Control Volume 3

Consider the differential equation

ẋ_p(t) = f[x_p(t), t] + G[x_p(t), t]β̇_p(t)     (11-92)

Assume f[·,·] and G[·,·] satisfy the conditions imposed at the beginning of this section, and further that ∂G[·,·]/∂x be continuous in its arguments and {Q(t)G[x, t] ∂G[x, t]/∂x} be uniformly Lipschitz in its first argument x and continuous in t, and let x_p(t0) = x(t0). Then [74]

l.i.m._{max{t_{i+1} − t_i}→0} x_p(t,·) = x(t,·)     (11-93a)

for all t, where x(t,·) is the solution to the scalar Stratonovich equation

dx(t) = f[x(t), t]dt + G[x(t), t]dβ(t)     (11-93b)

or, in view of (11-88), the scalar Ito equation

dx(t) = {f[x(t), t] + ½Q(t)G[x(t), t] ∂G[x(t), t]/∂x}dt + G[x(t), t]dβ(t)     (11-93c)

Thus, if (11-91) and (11-92) were the basis of a numerical simulation, then as we let the simulation sample period be decreased, the results would converge in the mean square sense to the solution of (11-93b) or (11-93c).

EXAMPLE 11.12 A digital simulation could generate a sample of a sequence of independent random variables [β(t_{i+1},·) − β(t_i,·)] for i = 0, 1, 2, …, by independent calls to a realization generator for random variables with mean zero and variance [∫_{t_i}^{t_{i+1}} Q(τ)dτ]. Then, using (11-91), β_p(·, ω_k) would be a piecewise linear function connecting successive β(t_i, ω_k) values. Then, on a sample-by-sample basis,

ẋ_p(t) = f[x_p(t), t] + G[x_p(t), t]β̇_p(t, ω_k)

becomes an ordinary nonlinear differential equation to be solved numerically by standard Runge-Kutta or predictor-corrector methods. •

Extensions to the vector case can be made if β(·,·) has diffusion Q(t) ≡ I for all t: the appropriate correction term on the ith component of dx(t), corresponding to the scalar term in (11-88b) or (11-93c), is

½ Σ_{j=1}^{n} Σ_{k=1}^{s} G_jk[x(t), t] (∂G_ik[x(t), t]/∂x_j) dt

Page 209: Stochastic Models, Estimation, And Control Volume 3

If β(·,·) has nondiagonal diffusion Q(t), such results are not valid, but this is not a severe restriction in view of a vector extension of (11-57):

∫_{t0}^{t} G[x(τ), τ]dβ(τ) = ∫_{t0}^{t} {G[x(τ), τ]Q^{1/2}(τ)}dβ'(τ) ≜ ∫_{t0}^{t} G'[x(τ), τ]dβ'(τ)     (11-94)

with Q^{1/2}(t) defined as in Chapter 7 of Volume 1, and evaluated for example by Cholesky decompositions described there.

Thus, it might appear that, for modeling and simulation purposes, Stratonovich stochastic integrals and differential equations are more natural than the Ito type. However, the Stratonovich definition is not as useful as the Ito form for many reasons [10, 19, 20, 24, 32-34, 40, 41, 48, 51, 52, 77]. Properties (11-64)-(11-66) are no longer valid, and Stratonovich stochastic differential equations do not model Markov processes, making estimation problems substantially more complicated. In fact, since the Stratonovich integral is defined only for the restricted class of a functions (functions of β(t) only, rather than {β(τ), t0 ≤ τ ≤ t}), it is not applicable to continuous-measurement estimation problems in which (recalling (11-29a)) E{h[x(t), t] | [y(τ), t0 ≤ τ < t]}, i.e., functionals of {β(τ), t0 ≤ τ < t}, appear in integrands. Furthermore, the motivation to use Stratonovich results, the preservation of formal calculus rules, is not fulfilled in the general vector case [6, 7] with β(·,·) of nondiagonal diffusion Q(t) or in extensions to β(·,·) being replaced by a non-Gaussian independent-increment process.

On the other hand, it is fruitful to pursue Ito stochastic integrals, which can be extended to non-Gaussian zero-mean independent-increment processes replacing β(·,·). Properties (11-64)-(11-66) do remain valid, and generalized stochastic differential equations do still yield Markov solutions. The Ito differential rule as stated in (11-84) is not valid, but it can be modified [30, 45, 63] to handle the generalized case as well.

11.6 FORWARD KOLMOGOROV EQUATION

The Ito stochastic differential equations of the previous section provide a means of describing a large and important class of Markov processes known as diffusions. Being Markov, these processes are characterized by their transition probability densities, and it is natural to ask how these transition densities propagate in time. It is shown that the forward Kolmogorov equation, or Fokker-Planck equation, is the partial differential equation that these transition probability densities must satisfy in their propagation forward in time [9, 12, 13, 21, 40, 49, 53, 55, 57, 69, 70]. As might be suspected from the name, there are also backward Kolmogorov equations, which will be important and fully discussed in conjunction with stochastic optimal control problems in Chapter 13 of Volume 3.

Page 210: Stochastic Models, Estimation, And Control Volume 3

In the special case of linear stochastic differential equations driven by Brownian motion (or, heuristically, white Gaussian noise), the transition densities remain Gaussian. Thus, although the forward Kolmogorov equation is valid for this special case, we previously did not exploit it. Instead, we generated expressions for propagating means, covariances, and covariance kernels, which totally defined the time propagation of Gaussian densities. Here, these moments do not define the transition density completely, and in fact we will not even be able to generate proper differential equations for these moments themselves without knowledge of the entire density. Thus, the forward Kolmogorov equations will be of primary importance both for stochastic process characterization in this chapter and for filter time propagation relationships to be derived in the next chapter.

Given the Markov process x(·,·) generated as the solution to the Ito stochastic differential equation

dx(t) = f[x(t), t]dt + G[x(t), t]dβ(t)     (11-95)

with β(·,·) being Brownian motion of diffusion Q(t) for all t ∈ [t0, tf], the transition probability density for x(·,·), f_x(ξ, t | ρ, t') as defined in (11-48), satisfies the forward Kolmogorov equation:

∂f_x(ξ, t | ρ, t')/∂t = −Σ_{i=1}^{n} (∂/∂ξ_i){f_x(ξ, t | ρ, t') f_i[ξ, t]}
  + ½ Σ_{i=1}^{n} Σ_{j=1}^{n} (∂²/∂ξ_i ∂ξ_j)[f_x(ξ, t | ρ, t'){G[ξ, t]Q(t)Gᵀ[ξ, t]}_ij]     (11-96)

assuming the existence of the continuous partial derivatives as indicated. In (11-96), f_i[ξ, t] is the ith component of f[ξ, t] as given in (11-95), not to be confused with the transition density f_x(ξ, t | ρ, t'). Similarly, {GQGᵀ}_ij is the i-j element of the n-by-n matrix GQGᵀ.

Formal proof  Given the Ito stochastic differential equation (11-95), x(·,·) can be described via

x(t) = x(t0) + ∫_{t0}^{t} f[x(τ), τ]dτ + ∫_{t0}^{t} G[x(τ), τ]dβ(τ)

We want to show that f_x(ξ, t | ρ, t'), i.e., the conditional density f_{x(t)|x(t')}(ξ | ρ), for the process x(·,·) does in fact satisfy the forward Kolmogorov equation (11-96). To do so [9, 32], we first assume that the derivatives

∂f_x/∂t,  ∂(f_x f_i)/∂ξ_i,  ∂²(f_x{GQGᵀ}_ij)/∂ξ_i ∂ξ_j

exist and are continuous for all i and j. This proof will be formal in that existence of appropriate limits is assumed rather than proven; rigorous proofs include such existence [29].

This proof will entail integration by parts of certain functions over Rⁿ, i.e., with infinite upper and lower limits. As a general approach to this kind of problem, we first evaluate results for integrands that are nonzero only over a finite region of Rⁿ and that smoothly go to zero at the boundary

Page 211: Stochastic Models, Estimation, And Control Volume 3

of this region, allowing integration by parts to be performed simply. Then, the region over which the integrand can take on nonzero values is expanded to fill as much of Rⁿ as required. To this end, we define a hypercube in Rⁿ:

{ξ: a_1 ≤ ξ_1 ≤ b_1, a_2 ≤ ξ_2 ≤ b_2, …, a_n ≤ ξ_n ≤ b_n}

with an interior defined as the set S

S = {ξ: a_1 < ξ_1 < b_1, a_2 < ξ_2 < b_2, …, a_n < ξ_n < b_n}

and a boundary denoted as B_S. Figure 11.5 depicts this for the two-dimensional case. Further define a nonnegative scalar function r(ξ) such that

(1) r(ξ) > 0 if ξ ∈ S;
(2) r(ξ) = 0 if ξ ∉ S;
(3) r(ξ) has continuous first and second derivatives;
(4) ∂r(ξ)/∂ξ_i = 0 and ∂²r(ξ)/∂ξ_i ∂ξ_j = 0 if ξ ∈ B_S, for all i and j.

In other words, r(ξ) can assume arbitrary positive values in the interior S, is zero on the boundary and outside the hypercube, and goes to zero smoothly at the boundary.

Writing the partial of f_x with respect to t as the appropriate limit, the properties of r(ξ) allow us to write

∫_S [∂f_x(ξ, t | ρ, t')/∂t] r(ξ)dξ = lim_{Δt→0} ∫_{−∞}^{∞} {[f_x(ξ, t + Δt | ρ, t') − f_x(ξ, t | ρ, t')]/Δt} r(ξ)dξ

where we note the integrations are over n-dimensional hypervolumes. Since x(·,·) is Markov (property (3) in Section 11.5), the Chapman-Kolmogorov equation (11-50) is valid and yields

f_x(ξ, t + Δt | ρ, t') = ∫_{−∞}^{∞} f_x(ξ, t + Δt | ζ, t) f_x(ζ, t | ρ, t')dζ

FIG. 11.5 Hypercube interior S, boundary B_S (the boundary of the rectangle), and r(ξ) for the two-dimensional case.

Page 212: Stochastic Models, Estimation, And Control Volume 3


Substituting this into the preceding equation yields

∫_S [∂f_x(ξ, t | ρ, t')/∂t] r(ξ)dξ
  = lim_{Δt→0} (1/Δt){∫_{−∞}^{∞} [∫_{−∞}^{∞} f_x(ξ, t + Δt | ζ, t) f_x(ζ, t | ρ, t')dζ] r(ξ)dξ − ∫_{−∞}^{∞} f_x(ξ, t | ρ, t')r(ξ)dξ}
  = lim_{Δt→0} (1/Δt){∫_{−∞}^{∞} [∫_{−∞}^{∞} f_x(ξ, t + Δt | ζ, t) r(ξ)dξ] f_x(ζ, t | ρ, t')dζ − ∫_{−∞}^{∞} f_x(ξ, t | ρ, t')r(ξ)dξ}

after changing the order of integration. Now reverse the roles of the dummy variables ξ and ζ in the first term to obtain

∫_S [∂f_x(ξ, t | ρ, t')/∂t] r(ξ)dξ = lim_{Δt→0} (1/Δt){∫_{−∞}^{∞} [∫_{−∞}^{∞} f_x(ζ, t + Δt | ξ, t) r(ζ)dζ] f_x(ξ, t | ρ, t')dξ − ∫_{−∞}^{∞} f_x(ξ, t | ρ, t')r(ξ)dξ}

Now r(ζ) can be expanded in a Taylor series about the point ξ to obtain

r(ζ) = r(ξ) + (∂r(ξ)/∂ξ)[ζ − ξ] + ½[ζ − ξ]ᵀ(∂²r(ξ)/∂ξ²)[ζ − ξ] + o{‖ζ − ξ‖²}
  = r(ξ) + Σ_{i=1}^{n} (∂r/∂ξ_i)[ζ_i − ξ_i] + ½ Σ_{i=1}^{n} Σ_{j=1}^{n} (∂²r/∂ξ_i ∂ξ_j)[ζ_i − ξ_i][ζ_j − ξ_j] + o{‖ζ − ξ‖²}

Substituting this into the previous expression provides

∫_S [∂f_x(ξ, t | ρ, t')/∂t] r(ξ)dξ = lim_{Δt→0} (1/Δt) ∫_{−∞}^{∞} f_x(ξ, t | ρ, t'){[∫_{−∞}^{∞} f_x(ζ, t + Δt | ξ, t)dζ] r(ξ) − r(ξ)
  + Σ_{i=1}^{n} (∂r(ξ)/∂ξ_i) ∫_{−∞}^{∞} f_x(ζ, t + Δt | ξ, t)[ζ_i − ξ_i]dζ
  + ½ Σ_{i=1}^{n} Σ_{j=1}^{n} (∂²r(ξ)/∂ξ_i ∂ξ_j) ∫_{−∞}^{∞} f_x(ζ, t + Δt | ξ, t)[ζ_i − ξ_i][ζ_j − ξ_j]dζ
  + ∫_{−∞}^{∞} o{‖ζ − ξ‖²} f_x(ζ, t + Δt | ξ, t)dζ}dξ

But the first two terms (on the right hand side) sum to zero, since ∫_{−∞}^{∞} f_x(ζ, t + Δt | ξ, t)dζ = 1. Now we take the limit as Δt → 0, using properties (6)-(8) of Ito differential equation solutions in Section 11.5, to obtain

∫_S [∂f_x(ξ, t | ρ, t')/∂t] r(ξ)dξ = ∫_{−∞}^{∞} f_x(ξ, t | ρ, t'){Σ_{i=1}^{n} (∂r(ξ)/∂ξ_i) f_i[ξ, t] + ½ Σ_{i=1}^{n} Σ_{j=1}^{n} (∂²r(ξ)/∂ξ_i ∂ξ_j){G[ξ, t]Q(t)Gᵀ[ξ, t]}_ij}dξ

Since the first and second partial derivatives of r with respect to ξ are zero outside the interior set S, the integration over all ξ on the right can be replaced by an integration only over S. The expressions

Page 213: Stochastic Models, Estimation, And Control Volume 3


on the right hand side are now integrated by parts, using the fact that the partials of r are zero on the boundary B_S. For instance, the first term in the first sum is evaluated as

\int_{a_2}^{b_2} \cdots \int_{a_n}^{b_n} \Bigl[ \int_{a_1}^{b_1} \underbrace{f_x f_1}_{u}\, \underbrace{(\partial r / \partial \xi_1)\, d\xi_1}_{dv} \Bigr]\, d\xi_2 \cdots d\xi_n

Integration by parts on the bracketed term, i.e.,

\int_{a_1}^{b_1} u\, dv = uv \Big|_{a_1}^{b_1} - \int_{a_1}^{b_1} v\, du

yields

\int_{a_2}^{b_2} \cdots \int_{a_n}^{b_n} \Bigl[ 0 - \int_{a_1}^{b_1} r\, \frac{\partial (f_x f_1)}{\partial \xi_1}\, d\xi_1 \Bigr]\, d\xi_2 \cdots d\xi_n

The double summation terms require two integrations by parts. Performing the integrations and collecting terms yields

\int_S \Bigl[ \frac{\partial f_x(\xi, t \mid \rho, t')}{\partial t} + \sum_{i=1}^{n} \frac{\partial}{\partial \xi_i}\bigl\{ f_x(\xi, t \mid \rho, t')\, f_i[\xi, t] \bigr\} - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{\partial^2}{\partial \xi_i\, \partial \xi_j}\bigl[ f_x(\xi, t \mid \rho, t')\, \{G[\xi, t] Q(t) G^T[\xi, t]\}_{ij} \bigr] \Bigr]\, r(\xi)\, d\xi = 0

Since r(ξ) is an arbitrary positive function in S, this implies that the term within the outer brackets must be zero, which is in fact the forward Kolmogorov equation. Note that S can be expanded as large as necessary without altering the result. •

An alternative derivation of the forward Kolmogorov equation using characteristic functions is developed in Problem 11.14 [34, 45, 53].

EXAMPLE 11.13 For scalar Brownian motion, the stochastic differential equation is

dx(t) = q^{1/2}(t)\, d\beta'(t)

where the diffusion of β'(·,·) is assumed to be unity and the diffusion of x(·,·) is then q(t); this is a degenerate case of (11-95) with f[x(t), t] ≡ 0 and G[x(t), t] = G(t) = q^{1/2}(t). The forward Kolmogorov equation for the Brownian motion x(·,·) is then

\frac{\partial f_x(\xi, t \mid \rho, t')}{\partial t} = \frac{1}{2}\, q(t)\, \frac{\partial^2 f_x(\xi, t \mid \rho, t')}{\partial \xi^2}

Note that this is also the equation for the diffusion of heat in an infinite length rod, and hence the terminology "diffusion" for q(t). •
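For constant diffusion q the transition density of this Brownian motion is f_x(ξ, t | 0, 0) = (2πqt)^{-1/2} exp{-ξ²/(2qt)}, and it should satisfy the diffusion equation above. A small finite-difference check of that claim (an illustration, not part of the text; the evaluation point and step size are arbitrary choices) can be sketched as:

```python
import math

def f(xi, t, q=1.0):
    """Transition density of Brownian motion started at zero: N(0, q*t)."""
    return math.exp(-xi ** 2 / (2.0 * q * t)) / math.sqrt(2.0 * math.pi * q * t)

def kolmogorov_residual(xi, t, q=1.0, h=1e-4):
    """Central-difference estimate of df/dt - (q/2) d2f/dxi2; should be near zero."""
    dfdt = (f(xi, t + h, q) - f(xi, t - h, q)) / (2.0 * h)
    d2fdxi2 = (f(xi + h, t, q) - 2.0 * f(xi, t, q) + f(xi - h, t, q)) / h ** 2
    return dfdt - 0.5 * q * d2fdxi2

print(kolmogorov_residual(0.7, 1.3))  # ~0 up to discretization error
```

The residual vanishes only up to the truncation and roundoff error of the finite differences, which is the expected behavior for the heat kernel.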

The appropriate initial condition for (11-96) is simply f_x(ξ, t' | ρ, t') = δ(ξ - ρ). But we can view x(t') itself as a random variable with associated dummy variable ρ, instead of considering ρ merely as a given realization of x(t'). Thus, f_x(ξ, t | ρ, t') itself can be a random variable, a function of x(t'). Its expected value is (recalling (11-52), the expectation of the Chapman-Kolmogorov equation):

f_{x(t)}(\xi) = \int_{-\infty}^{\infty} f_x(\xi, t \mid \rho, t')\, f_{x(t')}(\rho)\, d\rho     (11-97)



where now f_{x(t')}(ρ) is considered as an initial condition. Now take (11-96), multiply through by f_{x(t')}(ρ)dρ, and integrate out the dependence on ρ. By so doing, one can conclude that f_{x(t)}(ξ) also satisfies the forward Kolmogorov equation:

\frac{\partial f_{x(t)}(\xi)}{\partial t} = -\sum_{i=1}^{n} \frac{\partial}{\partial \xi_i}\bigl\{ f_{x(t)}(\xi)\, f_i[\xi, t] \bigr\} + \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{\partial^2}{\partial \xi_i\, \partial \xi_j}\bigl[ f_{x(t)}(\xi)\, \{G[\xi, t] Q(t) G^T[\xi, t]\}_{ij} \bigr]     (11-98)

Solutions to the forward Kolmogorov equation are not generally simple to obtain analytically [3]. However, as the next example shows, solutions can be generated for applications involving linear system models.

EXAMPLE 11.14 Consider the scalar linear system model

dx(t) = a(t)x(t)\, dt + g(t)\, d\beta(t)

where β(·,·) is scalar Brownian motion of diffusion q(t). Thus, we can identify

f[\xi, t] = a(t)\xi, \qquad G[\xi, t] Q(t) G^T[\xi, t] = g^2(t) q(t)

The forward Kolmogorov equation becomes

\frac{\partial f_x}{\partial t} = -\Bigl[ a(t)\xi\, \frac{\partial f_x}{\partial \xi} + a(t) f_x \Bigr] + \frac{1}{2}\, g^2(t) q(t)\, \frac{\partial^2 f_x}{\partial \xi^2}

This can be used either to propagate f_x(ξ, t | ρ, t') forward from f_x(ξ, t' | ρ, t') = δ(ξ - ρ), or to propagate f_{x(t)}(ξ) from an initial condition of f_{x(t')}(ρ). To describe either density's time propagation, it is easiest to employ the characteristic function; for instance,

\phi_x(\mu, t) = \int_{-\infty}^{\infty} e^{j\mu\xi}\, f_{x(t)}(\xi)\, d\xi

Multiplying both sides of (11-98) by e^{jμξ} and integrating yields

\frac{\partial \phi_x(\mu, t)}{\partial t} = -a(t)\, \phi_x(\mu, t) + \int_{-\infty}^{\infty} \Bigl\{ -a(t)\xi\, \frac{\partial f_x}{\partial \xi} + \frac{1}{2}\, g^2(t) q(t)\, \frac{\partial^2 f_x}{\partial \xi^2} \Bigr\}\, e^{j\mu\xi}\, d\xi

The left hand side is obtained by writing the original partial as the appropriate limit, multiplying each term by e^{jμξ} and integrating, and then taking the indicated limit. Now perform integration by parts, assuming that f_{x(t)}(ξ) → 0 as ξ → ±∞, i.e., the boundary condition at the edge of the space is that the density has asymptotically approached zero height. First consider the term

-a(t) \int_{-\infty}^{\infty} \xi\, \frac{\partial f_x}{\partial \xi}\, e^{j\mu\xi}\, d\xi



Since \int_{-\infty}^{\infty} u\, dv = uv \big|_{-\infty}^{\infty} - \int_{-\infty}^{\infty} v\, du, this is equal to

-a(t)\Bigl[ 0 - \int_{-\infty}^{\infty} (e^{j\mu\xi} + j\mu\xi e^{j\mu\xi})\, f_x\, d\xi \Bigr] = a(t)\Bigl[ \phi_x(\mu, t) + \mu\, \frac{\partial \phi_x(\mu, t)}{\partial \mu} \Bigr]

Similarly, by two integrations by parts,

\frac{1}{2}\, g^2(t) q(t) \int_{-\infty}^{\infty} \frac{\partial^2 f_x}{\partial \xi^2}\, e^{j\mu\xi}\, d\xi = -\frac{1}{2}\, g^2(t) q(t)\, \mu^2\, \phi_x(\mu, t)

Substituting these results into the expression for ∂φ_x/∂t yields

\frac{\partial \phi_x(\mu, t)}{\partial t} = a(t)\, \mu\, \frac{\partial \phi_x(\mu, t)}{\partial \mu} - \frac{1}{2}\, g^2(t) q(t)\, \mu^2\, \phi_x(\mu, t)

From our previous work, we know that x(·,·) is in fact a Gaussian process, so we will assume that φ_x(μ, t) is the characteristic function for a scalar Gaussian x(·,·), namely

\phi_x(\mu, t) = \exp\{ j\mu m_x(t) - \tfrac{1}{2}\mu^2 P_x(t) \}

so that

\partial \phi_x / \partial t = [ j\mu \dot m_x(t) - \tfrac{1}{2}\mu^2 \dot P_x(t) ]\, \phi_x(\mu, t)

\partial \phi_x / \partial \mu = [ j m_x(t) - \mu P_x(t) ]\, \phi_x(\mu, t)

Substituting these expressions into the equation above yields

[ j\mu \dot m_x(t) - \tfrac{1}{2}\mu^2 \dot P_x(t) ]\, \phi_x(\mu, t) = a(t)\, \mu\, [ j m_x(t) - \mu P_x(t) ]\, \phi_x(\mu, t) - \tfrac{1}{2}\, g^2(t) q(t)\, \mu^2\, \phi_x(\mu, t)

Now divide through by φ_x(μ, t), and equate the real and imaginary parts of the equation separately (since these must be true individually for the total equation to be true), generating

\dot m_x(t) = a(t)\, m_x(t), \qquad \dot P_x(t) = 2 a(t)\, P_x(t) + g^2(t) q(t)

These can be recognized as the scalar form of the equations for propagating the mean and covariance describing f_{x(t)}(ξ), as derived previously, i.e., (11-17) and (11-18). •
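For constant a, g, and q these moment equations integrate in closed form: m_x(t) = m_0 e^{at} and P_x(t) = P_0 e^{2at} + g²q(e^{2at} - 1)/(2a). The short sketch below (an illustration, not from the text, with arbitrary numerical values) propagates ṁ_x = a m_x and Ṗ_x = 2aP_x + g²q with fourth-order Runge-Kutta steps and compares against those exact solutions.

```python
import math

def propagate_moments(m0, p0, a, g, q, t_final, n_steps=2000):
    """RK4 integration of m' = a m and P' = 2 a P + g^2 q."""
    def rhs(m, p):
        return (a * m, 2.0 * a * p + g * g * q)
    h = t_final / n_steps
    m, p = m0, p0
    for _ in range(n_steps):
        k1 = rhs(m, p)
        k2 = rhs(m + 0.5 * h * k1[0], p + 0.5 * h * k1[1])
        k3 = rhs(m + 0.5 * h * k2[0], p + 0.5 * h * k2[1])
        k4 = rhs(m + h * k3[0], p + h * k3[1])
        m += h / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        p += h / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return m, p

a, g, q = -0.5, 1.0, 2.0
m, p = propagate_moments(1.0, 0.3, a, g, q, t_final=2.0)
m_exact = 1.0 * math.exp(a * 2.0)
p_exact = 0.3 * math.exp(2 * a * 2.0) + g * g * q * (math.exp(2 * a * 2.0) - 1.0) / (2 * a)
print(m, m_exact)  # the numerical and exact means agree closely
print(p, p_exact)  # so do the variances
```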

EXAMPLE 11.15 The previous example generalizes readily to the case of vector x(·,·):

dx(t) = F(t)x(t)\, dt + G(t)\, d\beta(t)

where β(·,·) is Brownian motion of diffusion Q(t). The forward Kolmogorov equation becomes

\frac{\partial f_x}{\partial t} = -\Bigl[ \frac{\partial f_x}{\partial \xi}\, F(t)\xi + f_x\, \mathrm{tr}\{F(t)\} \Bigr] + \frac{1}{2}\, \mathrm{tr}\Bigl\{ G(t) Q(t) G^T(t)\, \frac{\partial^2 f_x}{\partial \xi^2} \Bigr\}

and the corresponding time propagation for the characteristic function is

\frac{\partial \phi_x(\mu, t)}{\partial t} = \mu^T F(t)\, \frac{\partial \phi_x(\mu, t)}{\partial \mu} - \frac{1}{2}\, \mu^T G(t) Q(t) G^T(t)\, \mu\, \phi_x(\mu, t)   •

EXAMPLE 11.16 Consider the nonlinear system described by the n-dimensional vector stochastic differential equation

dx(t) = f[x(t), t]\, dt + G(t)\, d\beta(t)

This is the special case of (11-95) upon which the extended Kalman filter was formulated in Chapter 9: G is not a function of x(·,·), but merely a deterministic function of time. In this case, the


forward Kolmogorov equation is

\frac{\partial f_x}{\partial t} = -\sum_{i=1}^{n} \frac{\partial}{\partial \xi_i}\bigl\{ f_x\, f_i[\xi, t] \bigr\} + \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \{G(t) Q(t) G^T(t)\}_{ij}\, \frac{\partial^2 f_x}{\partial \xi_i\, \partial \xi_j}   •

Rather than attempting to propagate the entire density f_x(ξ, t | ρ, t') or f_{x(t)}(ξ), we might try to achieve a partial description of the density by propagating a finite number of its moments [40, 41, 76]. Unlike the case involving linear dynamics driven by white Gaussian noise, we can only expect a partial description, since f_{x(t)}(ξ) will not generally be of Gaussian or other form described completely by the first two, or any finite number, of moments. However, even this partial description cannot be obtained precisely: propagation of the mean and covariance of x(t) for all t will not be feasible without knowledge of the entire density f_{x(t)}(ξ) for all t! In fact, if x(·,·) is the Markov solution to (11-95), then its mean m_x(t) and covariance P_x(t) propagate according to

\dot m_x(t) = E\{f[x(t), t]\}     (11-99)

\dot P_x(t) = \bigl[ E\{f[x(t), t]\, x^T(t)\} - E\{f[x(t), t]\}\, m_x^T(t) \bigr] + \bigl[ E\{x(t)\, f^T[x(t), t]\} - m_x(t)\, E\{f^T[x(t), t]\} \bigr] + E\{G[x(t), t]\, Q(t)\, G^T[x(t), t]\}     (11-100)

as will now be shown.

Mean and Covariance Propagation Equations

Let x(·,·) be the solution to (11-95) and define the mean and covariance of x(t,·) for all t as

m_x(t) = \int_{-\infty}^{\infty} \xi\, f_{x(t)}(\xi)\, d\xi

P_x(t) = \int_{-\infty}^{\infty} \xi \xi^T f_{x(t)}(\xi)\, d\xi - m_x(t)\, m_x^T(t)

Differentiating these expressions with respect to time yields

\dot m_x(t) = \int_{-\infty}^{\infty} \xi\, \frac{\partial f_{x(t)}(\xi)}{\partial t}\, d\xi

\dot P_x(t) = \int_{-\infty}^{\infty} \xi \xi^T\, \frac{\partial f_{x(t)}(\xi)}{\partial t}\, d\xi - \dot m_x(t)\, m_x^T(t) - m_x(t)\, \dot m_x^T(t)

Since f_{x(t)}(ξ) satisfies the forward Kolmogorov equation, we can evaluate \dot m_x(t) as

\dot m_x(t) = -\int_{-\infty}^{\infty} \sum_{i=1}^{n} \frac{\partial}{\partial \xi_i}\bigl\{ f_{x(t)}(\xi)\, f_i[\xi, t] \bigr\}\, \xi\, d\xi + \frac{1}{2} \int_{-\infty}^{\infty} \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{\partial^2}{\partial \xi_i\, \partial \xi_j}\bigl[ f_{x(t)}(\xi)\, \{G[\xi, t] Q(t) G^T[\xi, t]\}_{ij} \bigr]\, \xi\, d\xi



Writing out the first term on the right hand side yields

-\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \Bigl\{ \frac{\partial}{\partial \xi_1}[f_x f_1] + \cdots + \frac{\partial}{\partial \xi_n}[f_x f_n] \Bigr\}\, \xi\, d\xi_1 \cdots d\xi_n

Let us evaluate the first of these n terms by parts to get

-\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \frac{\partial}{\partial \xi_1}[f_x f_1]\, \xi\, d\xi_1\, d\xi_2 \cdots d\xi_n = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \Bigl[ \int_{-\infty}^{\infty} f_x f_1\, \frac{\partial \xi}{\partial \xi_1}\, d\xi_1 \Bigr]\, d\xi_2 \cdots d\xi_n

with u = ξ and dv = (∂[f_x f_1]/∂ξ_1)dξ_1, the boundary term vanishing because f_x → 0 as ξ_1 → ±∞. Treating the other (n - 1) terms similarly yields the first right hand side term of the \dot m_x(t) equation as

\sum_{i=1}^{n} \int_{-\infty}^{\infty} f_{x(t)}(\xi)\, f_i[\xi, t]\, \frac{\partial \xi}{\partial \xi_i}\, d\xi

But note that ∂ξ/∂ξ_1 = [1 0 0 ⋯ 0]^T, ∂ξ/∂ξ_2 = [0 1 0 ⋯ 0]^T, etc., so that the summation term above is just f[ξ, t], yielding

+\int_{-\infty}^{\infty} f_{x(t)}(\xi)\, f[\xi, t]\, d\xi = +E\{f[x(t), t]\}

Integrating the second term in the \dot m_x(t) equation by parts twice results in

\frac{1}{2} \int_{-\infty}^{\infty} \sum_{i=1}^{n} \sum_{j=1}^{n} f_{x(t)}(\xi)\, \{G[\xi, t] Q(t) G^T[\xi, t]\}_{ij}\, \frac{\partial^2 \xi}{\partial \xi_i\, \partial \xi_j}\, d\xi = 0

where equality to zero follows from the fact that

\frac{\partial^2 \xi}{\partial \xi_i\, \partial \xi_j} = \frac{\partial \mathbf{e}_j}{\partial \xi_i} = \mathbf{0}

since e_j is an n-vector of zeros except for a unit jth component. Thus we have obtained (11-99). Similarly, the \dot P_x(t) equation becomes

\dot P_x(t) = -\int_{-\infty}^{\infty} \sum_{i=1}^{n} \frac{\partial}{\partial \xi_i}\bigl\{ f_{x(t)}(\xi)\, f_i[\xi, t] \bigr\}\, \xi \xi^T\, d\xi + \frac{1}{2} \int_{-\infty}^{\infty} \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{\partial^2}{\partial \xi_i\, \partial \xi_j}\bigl[ f_{x(t)}(\xi)\, \{G[\xi, t] Q(t) G^T[\xi, t]\}_{ij} \bigr]\, \xi \xi^T\, d\xi - \dot m_x(t)\, m_x^T(t) - m_x(t)\, \dot m_x^T(t)

By integrating the first term by parts, we obtain

\int_{-\infty}^{\infty} f_{x(t)}(\xi) \sum_{i=1}^{n} f_i[\xi, t]\, \frac{\partial (\xi \xi^T)}{\partial \xi_i}\, d\xi



But

\partial(\xi \xi^T)/\partial \xi_1 = [1\ 0\ 0\ \cdots\ 0]^T \xi^T + \xi\, [1\ 0\ 0\ \cdots\ 0]

\partial(\xi \xi^T)/\partial \xi_2 = [0\ 1\ 0\ \cdots\ 0]^T \xi^T + \xi\, [0\ 1\ 0\ \cdots\ 0], \quad \text{etc.}

So this first term becomes

\int_{-\infty}^{\infty} f_{x(t)}(\xi)\, \{ f[\xi, t]\, \xi^T + \xi\, f^T[\xi, t] \}\, d\xi = E\{f[x(t), t]\, x^T(t)\} + E\{x(t)\, f^T[x(t), t]\}

Performing a double integration by parts on the second term in the \dot P_x(t) relation generates

\frac{1}{2} \int_{-\infty}^{\infty} \sum_{i=1}^{n} \sum_{j=1}^{n} f_{x(t)}(\xi)\, \{G[\xi, t] Q(t) G^T[\xi, t]\}_{ij}\, \frac{\partial^2 (\xi \xi^T)}{\partial \xi_i\, \partial \xi_j}\, d\xi

But

\frac{\partial^2 (\xi \xi^T)}{\partial \xi_i\, \partial \xi_j} = \mathbf{e}_i \mathbf{e}_j^T + \mathbf{e}_j \mathbf{e}_i^T

and similarly for the other such terms, so that this integral becomes

\frac{1}{2} \cdot 2 \int_{-\infty}^{\infty} f_{x(t)}(\xi)\, \{G[\xi, t] Q(t) G^T[\xi, t]\}\, d\xi = E\{G[x(t), t]\, Q(t)\, G^T[x(t), t]\}

Substituting these results into the \dot P_x(t) equation generates (11-100). •

Note that (11-99) and (11-100) are analogous to the results

\dot m_x(t) = F(t)\, m_x(t)     (11-101)

\dot P_x(t) = F(t)\, P_x(t) + P_x(t)\, F^T(t) + G(t)\, Q(t)\, G^T(t)     (11-102)

corresponding to the linear equation

dx(t) = F(t)x(t)\, dt + G(t)\, d\beta(t)     (11-103)

Moreover, (11-99) and (11-100) reduce to these results for the case of f[x(t), t] = F(t)x(t), G[x(t), t] = G(t). However, unlike this special case, to calculate \dot m_x(t) and \dot P_x(t) for propagation purposes, we generally require the entire density f_{x(t)}(ξ) in order to evaluate the expectations in (11-99) and (11-100).

Looking ahead to the optimal filtering problem based upon nonlinear dynamics models, we can gain an appreciation for the unique complexity of the results as compared to linear filtering. An obvious extension of the preceding discussion reveals that the entire density f_{x(t)|Z(t_i)}(ξ | Z_i) would be required merely



to compute the propagations of the conditional mean and conditional covariance of x(t) from sample time t_i to sample time t_{i+1}. The measurement updates will also be more complex, as will estimators for the continuous-measurement case. These topics will be the subject of the next chapter.
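One common way around this difficulty in practice is to approximate the expectations in (11-99) and (11-100) with sample averages over simulated trajectories. The sketch below is an illustration only, not from the text: it applies Euler-Maruyama integration to the scalar model dx = -x³ dt + dβ with unit diffusion, a hypothetical drift chosen purely for demonstration, and lets the sample mean and variance stand in for the expectations that would otherwise require the full density f_{x(t)}(ξ).

```python
import random

def monte_carlo_moments(n_paths=10000, t_final=1.0, n_steps=100, seed=1):
    """Euler-Maruyama ensemble for dx = -x^3 dt + d(beta), unit diffusion."""
    rng = random.Random(seed)
    dt = t_final / n_steps
    # symmetric Gaussian initial density, so E{x(t)} should stay near zero
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_paths)]
    for _ in range(n_steps):
        xs = [x - x ** 3 * dt + rng.gauss(0.0, dt ** 0.5) for x in xs]
    mean = sum(xs) / n_paths                          # sample stand-in for E{x(t)}
    var = sum((x - mean) ** 2 for x in xs) / n_paths  # sample stand-in for P_x(t)
    return mean, var

mc_mean, mc_var = monte_carlo_moments()
print(mc_mean, mc_var)
```

The cubic drift is contractive, so the ensemble variance settles below its initial value of one; only statistical accuracy on the order of one over the square root of the path count should be expected.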

11.7 SUMMARY

In general, a stochastic process x(·,·) is characterized by a joint distribution F_{x(t_1), ..., x(t_N)}(ξ_1, ..., ξ_N) for all times t_i in a time set T of interest, an uncountably infinite number of times for continuous-time processes. If we restrict our attention to Markov processes, this complete description is also provided by the transition probability distribution F_{x(t)|x(t')}(ξ | ρ) or its derivative (assumed here to exist), the transition probability density. Besides being descriptive of a large class of physically observed phenomena, Markov processes thus will furnish substantial benefit in mathematical tractability of process characterization and subsequent estimator and stochastic controller derivations.

Ito nonlinear stochastic differential equations of the form of (11-73)-(11-75) generate solution processes which are in fact Markov. Proper development of these equations is based upon the definition of the Ito stochastic integral as in (11-63) and (11-69), and the associated Ito stochastic differential as in (11-70) and (11-71). The properties of these integrals, differentials, and solutions to differential equations were seen to be affected particularly by the Levy oscillation property of Brownian motion, (11-59). Formal rules of calculus were seen to be invalid, and instead the Ito differential rule, (11-84), is a fundamental result for exploiting Ito stochastic differential equations for nonlinear dynamics models.

Since the solutions to Ito differential equations are Markov, the time propagation of the transition probability density f_x(ξ, t | ρ, t') is of primary importance for characterizing the process itself. The forward Kolmogorov equation (11-96) was shown to be the partial differential equation f_x(ξ, t | ρ, t') satisfies through such time propagation.

Nonlinear state dynamics models in the form of Ito stochastic differential equations can be combined with either sampled-data measurement models of the form in (11-28), or continuous-time measurements as modeled by (11-29), to generate overall nonlinear system models. These will be the basis not only of describing physical phenomena mathematically, but also of estimator and stochastic controller formulations in subsequent chapters.

REFERENCES

1. Arnold, L., "Stochastic Differential Equations: Theory and Applications." Wiley, New York, 1974.

2. Åström, K. J., On a first order stochastic differential equation, Internat. J. Control 1, 301-326 (1965).



3. Bharucha-Reid, A. T., "Elements of the Theory of Markov Processes and Their Applications." McGraw-Hill, New York, 1960.

4. Cameron, R. H., and Martin, W. T., The orthogonal development of nonlinear functionals in series of Fourier-Hermite functions, Ann. of Math. 48, 385-392 (1947).

5. Cameron, R. H., and Martin, W. T., The transformation of Wiener integrals by nonlinear transformations, Trans. Amer. Math. Soc. 66, 253-283 (1949).

6. Clark, J. M. C., The Representation of Nonlinear Stochastic Systems with Applications to Filtering. Ph.D. dissertation, Imperial College, London, England (April 1966).

7. Clark, J. M. C., The representation of functionals of Brownian motion by stochastic integrals, Ann. Math. Statist. 41 (4), 1282-1295 (1970).

8. Desoer, C. A., "Notes for a Second Course on Linear Systems." Van Nostrand-Reinhold, Princeton, New Jersey, 1970.

9. Deyst, J. J., Estimation and Control of Stochastic Processes, unpublished course notes, MIT Dept. of Aeronautics and Astronautics, Cambridge, Massachusetts (1970).

10. Doob, J. L., "Stochastic Processes." Wiley, New York, 1953.

11. Dynkin, E. B., "Markov Processes." Academic Press, New York, 1965.

12. Einstein, A., "Investigations on the Theory of the Brownian Movement." Dover, New York, 1956.

13. Elliot, D., Controllable Nonlinear Systems Driven by White Noise. Ph.D. dissertation, UCLA, Los Angeles, California (1969).

14. Feller, W., "An Introduction to Probability Theory and its Applications," Vol. II. Wiley, New York, 1966.

15. Fishman, P. M., and Snyder, D. L., The statistical analysis of space-time point processes, IEEE Trans. Inform. Theory IT-22 (3), 257-274 (1976).

16. Fisk, D. L., Quasi-Martingales and Stochastic Integrals. Ph.D. dissertation, Michigan State Univ. (1963).

17. Fisk, D. L., Quasi-Martingales, Trans. Amer. Math. Soc. 120, 369-389 (1965).

18. Friedman, H., "Stochastic Differential Equations and Applications," Vols. I and II. Academic Press, New York, 1975 and 1976.

19. Frost, P. A., Nonlinear Estimation in Continuous Time Systems. Ph.D. dissertation, Stanford Univ. (1968).

20. Frost, P. A., and Kailath, T., An innovations approach to least-squares estimation-Part III: Nonlinear estimation in white Gaussian noise, IEEE Trans. Automat. Control AC-16 (3), 217-226 (1971).

21. Fuller, A. T., Analysis of nonlinear stochastic systems by means of the Fokker-Planck equation, Internat. J. Control 9 (6), 603-655 (1969).

22. Gel'fand, I. M., and Vilenkin, N. Y., "Generalized Functions," Vol. 4. Academic Press, New York, 1964.

23. Gikhman, I. I., and Skorokhod, A. V., "Introduction to the Theory of Random Processes." Saunders, Philadelphia, Pennsylvania, 1969.

24. Gray, A. H., Jr., and Caughey, T. K., A controversy in problems involving random parameter excitation, J. Math. Phys. 44, 288-296 (1965).

25. Hida, T., "Stationary Stochastic Processes." Princeton Univ. Press, Princeton, New Jersey, 1970.

26. Hoversten, E. V., Harger, R. O., and Halme, S. J., Communication theory for the turbulent atmosphere, Proc. IEEE 58, 1626-1650 (1970).

27. Ito, K., Stochastic integral, Proc. Imp. Acad. Tokyo 20, 519-524 (1944).

28. Ito, K., Multiple Wiener integral, J. Math. Soc. Japan 3 (1), 157-169 (1951).

29. Ito, K., On stochastic differential equations, Mem. Amer. Math. Soc. 4, 1-51 (1951).

30. Ito, K., Lectures on Stochastic Processes. Tata Inst. Fundamental Research, Bombay, India (1961).



31. Ito, K., and McKean, H. P., "Diffusion Processes and Their Sample Paths." Springer, Berlin and New York, 1965.

32. Jazwinski, A. H., "Stochastic Processes and Filtering Theory." Academic Press, New York, 1970.

33. Kailath, T., Likelihood ratios for Gaussian processes, IEEE Trans. Inform. Theory IT-16 (3), 276-288 (1970).

34. Kailath, T., and Frost, P., Mathematical modeling of stochastic processes, Stochastic Probl. Control, Proc. Joint Automat. Control Conf., Ann Arbor, Michigan, pp. 1-38 (June 1968).

35. Karp, S., O'Neill, E. L., and Gagliardi, R. M., Communication theory for the free-space optical channel, Proc. IEEE 58, 1611-1626 (1970).

36. Kendrick, J. D., Estimation of Aircraft Target Motion Using Pattern Recognition Orientation Measurements. Ph.D. dissertation, Air Force Institute of Technology, Wright-Patterson AFB, Ohio (1978).

37. Kendrick, J. D., Maybeck, P. S., and Reid, J. G., Estimation of aircraft target motion using orientation measurements, IEEE Trans. Aerospace Electron. Systems AES-17, 254-260 (1981).

38. Kulman, N. K., A note on the differential equations of conditional probability density functions, J. Math. Anal. Appl. 14, 301-308 (1966).

39. Kunita, H., and Watanabe, S., On square-integrable martingales, Nagoya Math. J. 30, 209-245 (1967).

40. Kushner, H. J., On the differential equations satisfied by conditional probability densities of Markov processes, with applications, J. SIAM Control 2 (1), 106-119 (1964).

41. Kushner, H. J., Dynamical equations for optimum nonlinear filtering, J. Differential Equations 3, 171-190 (1967).

42. Levy, P., A special problem of Brownian motion, and a general theory of Gaussian random functions, Proc. Berkeley Symp. Math. Statist. and Probability, 3rd, Vol. 2, pp. 133-175. Univ. of Calif. Press, Berkeley, 1956.

43. Loeve, M., "Probability Theory." Van Nostrand-Reinhold, Princeton, New Jersey, 1963.

44. Maybeck, P. S., "Stochastic Models, Estimation and Control," Vol. 1. Academic Press, New York, 1979.

45. McGarty, T. P., "Stochastic Systems and State Estimation." Wiley, New York, 1974.

46. McKean, H. P., Jr., "Stochastic Integrals." Academic Press, New York, 1969.

47. McShane, E. J., Stochastic integrals and stochastic functional equations, SIAM J. Appl. Math. 17 (2), 287-306 (1969).

48. McShane, E. J., Stochastic differential equations and models of random processes, Proc. Berkeley Symp. Math. Statist. and Probab., 6th, pp. 263-294. Univ. of California Press, Berkeley, 1972.

49. Merklinger, K. J., Numerical analysis of non-linear control systems using the Fokker-Planck-Kolmogorov equation, Proc. IFAC Conf., 2nd, August 1963. Butterworth, London, 1965.

50. Meyer, P. A., "Probability and Potentials." Blaisdell, Waltham, Massachusetts, 1966.

51. Mortensen, R. E., Mathematical problems of modeling stochastic nonlinear dynamic systems, J. Statist. Phys. 1, 271-296 (1969).

52. Mortensen, R. E., Balakrishnan's white noise model versus the Wiener process model, Proc. IEEE Conf. Decision and Control, New Orleans, Louisiana, pp. 630-633 (December 1977).

53. Moyal, J. E., Stochastic processes and statistical physics, J. Roy. Statist. Soc. Ser. B 11 (2), 150-210 (1949).

54. Musick, S. H., SOFE: A Generalized Digital Simulation for Optimal Filter Evaluation; User's Manual, Tech. Report AFWAL-TR-80-1108, Avionics Lab., Air Force Wright Aeronautical Laboratories, Wright-Patterson AFB, Ohio (1980).

55. Nelson, E., "Dynamical Theories of Brownian Motion." Princeton Univ. Press, Princeton, New Jersey, 1967.

56. Parzen, E., "Stochastic Processes." Holden-Day, San Francisco, California, 1962.



57. Pawula, R. F., Generalizations and extensions of the Fokker-Planck-Kolmogorov equations, IEEE Trans. Inform. Theory IT-13 (1), 33-41 (1967).

58. Rhodes, I. B., and Snyder, D. L., Estimation and control for space-time point-process observations, IEEE Trans. Automat. Control AC-22 (3), 338-346 (1977).

59. Robinson, S. R., Maybeck, P. S., and Santiago, J. M., Performance evaluation of an estimator based upon space-time point-process measurements, IEEE Trans. Inform. Theory IT-28, 1982.

60. Skorokhod, A. V., "Studies in the Theory of Random Processes." Addison-Wesley, Reading, Massachusetts, 1965.

61. Snyder, D. L., Filtering and detection for doubly stochastic Poisson processes, IEEE Trans. Inform. Theory IT-18 (1), 91-102 (1972).

62. Snyder, D. L., Point process estimation with applications in medicine, communication, and control, Proc. NATO Advanced Study Inst., New Directions in Signal Processing in Commun. and Control (1974).

63. Snyder, D. L., "Random Point Processes." Wiley, New York, 1975.

64. Snyder, D. L., and Fishman, P. M., How to track a swarm of fireflies by observing their flashes, IEEE Trans. Inform. Theory IT-21 (6), 692-695 (1975).

65. Snyder, D. L., Rhodes, I. B., and Hoversten, E. V., A separation theorem for stochastic control problems with point-process observations, Automatica 13, 85-87 (1977).

66. Stratonovich, R. L., "Topics in the Theory of Random Noise," Vols. I and II. Gordon and Breach, New York, 1963 and 1967.

67. Stratonovich, R. L., A new form of representing stochastic integrals and equations, J. SIAM Control 4, 362-371 (1966).

68. Taylor, A. E., "Introduction to Functional Analysis." Wiley, New York, 1961.

69. Uhlenbeck, G. E., and Ornstein, L. S., On the theory of the Brownian motion, Phys. Rev. 36, 823-841 (1930).

70. Wang, M. C., and Uhlenbeck, G. E., On the theory of the Brownian motion II, Rev. Mod. Phys. 17 (2-3), 323-342 (1945).

71. Wiener, N., Generalized harmonic analysis, Acta Math. 55, 117-258 (1930).

72. Wiener, N., "Nonlinear Problems in Random Theory." MIT Press, Cambridge, Massachusetts, 1958.

73. Wong, E., "Stochastic Processes in Information and Dynamical Systems." McGraw-Hill, New York, 1971.

74. Wong, E., and Zakai, M., On the relation between ordinary and stochastic differential equations, Internat. J. Eng. Sci. 3, 213-229 (1965).

75. Wong, E., and Zakai, M., On the convergence of ordinary integrals to stochastic integrals, Ann. Math. Statist. 36, 1560-1564 (1965).

76. Wonham, W. M., Some applications of stochastic differential equations to optimal nonlinear filtering, J. SIAM Control 2, 347-369 (1965).

77. Wonham, W. M., Random differential equations in control theory, in "Probabilistic Methods in Applied Mathematics" (A. T. Bharucha-Reid, ed.), Vol. 2. Academic Press, New York, 1970.

PROBLEMS

11.1 A linear stochastic system satisfies the differential equation

d^2 x(t)/dt^2 + x(t) = n(t)

where n(·,·) is exponentially time-correlated noise, of mean zero and autocorrelation

E\{n(t)\, n(t+\tau)\} = \sigma_n^2\, e^{-\alpha|\tau|}



Is x(·,·) a Markov process? Explain your answer fully, indicating the Markov nature of any and all processes associated with this problem description.

11.2 Show that the solution process (11-12) to the linear stochastic differential equation (11-1) or (11-10) is in fact Markov, as claimed in the text below (11-13).

11.3 Show that if x(·,·) is Markov-2, then y(·,·) defined in (11-35) is Markov-1 as claimed.

11.4 The electronic calculator market is highly competitive. Consider the simplified model of success probability as follows. There are two states: (1) your calculator is in public favor, and (2) your calculator is not in public favor. Suppose that if you are in state 1 at the beginning of a 3-month period, there is a 50% chance of going out of favor by the end of the quarter. If you are in state 2 at the beginning of a quarter, there is only a 40% chance of returning to state 1 by the end of the quarter. What is the long-term (steady state) probability that your calculator will be in the public favor?

Suppose you had an initial advantage in the market, so that your calculator is totally successful: the probability of being in state 1 is 100%. Suppose further that you require the probability of being in public favor to be at least 45% to remain profitable. After how many months will you have to introduce a new model to maintain a competitive edge?
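This two-state chain can be checked numerically: the quarterly update is p1(k+1) = 0.5 p1(k) + 0.4 [1 - p1(k)]. The sketch below (an illustration, not part of the problem statement) iterates it with exact rational arithmetic and also solves for the steady state.

```python
from fractions import Fraction

half, two_fifths = Fraction(1, 2), Fraction(2, 5)

def step(p1):
    """One quarter: stay in favor w.p. 1/2, return to favor w.p. 2/5."""
    return half * p1 + two_fifths * (1 - p1)

# steady state: p = p/2 + (2/5)(1 - p)  =>  p = 4/9
steady = two_fifths / (1 - half + two_fifths)

# start fully in favor and count quarters until p1 drops below 45%
p1, quarters = Fraction(1), 0
while p1 >= Fraction(45, 100):
    p1 = step(p1)
    quarters += 1

print(steady)        # 4/9, about 44.4%
print(quarters * 3)  # 9 months until a new model is needed
```

Exact arithmetic matters here: after two quarters the probability is exactly 9/20 = 45%, still marginally profitable, and it falls below only in the third quarter.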

11.5 Consider the reliability problem in Example 11.4.

(a) Plot the time history of P_4(t_i), the probability of system failure, versus time.

(b) Consider the same problem, but modified by the fact that if actuator A has failed, the probability of actuator B failing in the next Δt sec is 0.03 instead of 0.02; B is no more likely to fail than A if both are working to start, but assume B is less able to operate reliably than A when it is subjected to the full load. Carry through all evaluations to the time history of P_4(t_i).

(c) Now assume that B is generally a less reliable actuator than A. Given that both are working, the probability that A will fail in the next Δt sec is still 0.01, but the probability that B will fail is 0.02. Failures are still independent of one another. Assume that failure probabilities conditioned on one actuator already failed are as in part (b). Again plot P_4(t_i) versus time.

(d) Return to the original problem as in part (a). However, now assume that online repair (or healing) and recertification of the actuator as operational is possible. Mathematically, let T_{12} = T_{13} = 0.25 instead of 0, and rework the problem.

(e) Repeat part (d) but with T_{12} = T_{13} = 0.5, and compare results.

11.6 Explicitly demonstrate the mean square convergence results of (11-59), corresponding to the Levy oscillation property, by the means described in the text.

11.7 (a) Show that the expectation and limit-in-the-mean (l.i.m.) operations are interchangeable; i.e., that for a sequence of scalar random variables x_N for N = 1, 2, ...,

E\{\underset{N \to \infty}{\mathrm{l.i.m.}}\, x_N\} = \lim_{N \to \infty} E\{x_N\}

as claimed in the demonstration of (11-65). To show this, consider the Schwarz inequality for elements x_1 and x_2 of a linear vector space X with inner product ⟨·,·⟩ defined on it (to establish a measure of "length" of elements):

|\langle x_1, x_2 \rangle| \le \langle x_1, x_1 \rangle^{1/2}\, \langle x_2, x_2 \rangle^{1/2}

which, for the space of scalar random variables with finite second moments, becomes

|E\{x_1 x_2\}| \le E\{x_1^2\}^{1/2}\, E\{x_2^2\}^{1/2}

(for real random variables, the magnitude notation |·| can be removed). Let x_1 = [x_N - x] and x_2 ≡ 1 in the preceding expression, where x = l.i.m._{N→∞} x_N.



(b) Similarly show that for sequences {x_N} and {y_N} with finite second moments,

E\{[\underset{N \to \infty}{\mathrm{l.i.m.}}\, x_N][\underset{N \to \infty}{\mathrm{l.i.m.}}\, y_N]\} = \lim_{N \to \infty} E\{x_N y_N\}

used in the demonstration of (11-66). (Note the important special case of x_N ≡ y_N.) To do this, define x = l.i.m._{N→∞} x_N and y = l.i.m._{N→∞} y_N, and write

E\{x_N y_N\} - E\{x y\} = E\{[x_N - x][y_N - y]\} + E\{x [y_N - y]\} + E\{[x_N - x] y\}

and relate this to the sum of three individual magnitude terms, and then apply the Schwarz inequality to these three terms.

11.8 Show that

\lim_{N \to \infty} \int_{t_0}^{t} E\{a_N(\tau)\, b_N(\tau)\}\, q(\tau)\, d\tau = \int_{t_0}^{t} E\{a(\tau)\, b(\tau)\}\, q(\tau)\, d\tau

as claimed in the demonstration of (11-66). To do so, first show explicitly what it means for "{a_N(τ)} to converge to a(τ), t_0 ≤ τ ≤ t, in the sense of (11-62)." Then write

\Bigl| \int_{t_0}^{t} E\{a_N(\tau) b_N(\tau) - a(\tau) b(\tau)\}\, q(\tau)\, d\tau \Bigr| = \Bigl| \int_{t_0}^{t} E\{a_N(\tau)[b_N(\tau) - b(\tau)] + [a_N(\tau) - a(\tau)] b(\tau)\}\, q(\tau)\, d\tau \Bigr| \le \Bigl| \int_{t_0}^{t} E\{a_N(\tau)[b_N(\tau) - b(\tau)]\}\, q(\tau)\, d\tau \Bigr| + \Bigl| \int_{t_0}^{t} E\{[a_N(\tau) - a(\tau)] b(\tau)\}\, q(\tau)\, d\tau \Bigr|

and show that this converges to zero as N → ∞ by applying the Schwarz inequality.

11.9 Evaluate the results of applying the Ito differential rule in Example 11.9 for the specific choice of f ≡ 0 and G ≡ I, and relate this to the evaluation of the Ito stochastic integral given in Example 11.5.

11.10 Let x(·,·) satisfy the scalar version of the general stochastic differential equation described by (11-73) and (11-74). Use the Ito differential rule to show that

d[\ln x(t)] = \frac{dx(t)}{x(t)} - \frac{1}{2}\, \frac{G^2[x(t), t]\, Q(t)}{x^2(t)}\, dt

11.11 (a) Use the Ito differential rule to derive the "fundamental theorem of Ito stochastic calculus" [32]: if ψ[β(t)] is a twice continuously differentiable real scalar function of scalar Brownian motion of diffusion Q(t) for all t, then for all t > t',

\int_{t'}^{t} \frac{\partial \psi[\beta(\tau)]}{\partial \beta(\tau)}\, d\beta(\tau) = \psi[\beta(t)] - \psi[\beta(t')] - \frac{1}{2} \int_{t'}^{t} \frac{\partial^2 \psi[\beta(\tau)]}{\partial \beta(\tau)^2}\, Q(\tau)\, d\tau

(b) Use this result to evaluate \int_{t'}^{t} \beta(\tau)\, d\beta(\tau).

(c) Similarly evaluate \int_{t'}^{t} \beta^2(\tau)\, d\beta(\tau).

(d) By induction, evaluate \int_{t'}^{t} \beta^N(\tau)\, d\beta(\tau).
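Part (b) gives ∫β dβ = ½[β²(t) - β²(t')] - ½∫Q(τ)dτ; for unit diffusion and β(t') = 0 this can be checked pathwise by comparing the left-point (Ito) Riemann sum against the closed form on a simulated Brownian path. The sketch below is an illustration only, with arbitrary seed and step count; the discrepancy shrinks as the partition is refined, in keeping with the mean square convergence of the Ito sum.

```python
import random

def ito_integral_check(n_steps=100000, t_final=1.0, seed=7):
    """Compare the Ito sum of beta d(beta) with (1/2)beta(T)^2 - (1/2)T."""
    rng = random.Random(seed)
    dt = t_final / n_steps
    beta, ito_sum = 0.0, 0.0
    for _ in range(n_steps):
        dbeta = rng.gauss(0.0, dt ** 0.5)
        ito_sum += beta * dbeta  # left-endpoint evaluation, as Ito requires
        beta += dbeta
    closed_form = 0.5 * beta ** 2 - 0.5 * t_final
    return ito_sum, closed_form

ito_sum, closed_form = ito_integral_check()
print(abs(ito_sum - closed_form))  # small; shrinks as n_steps grows
```

Note the -½T correction term: evaluating the same sum at the right endpoints or midpoints would converge to a different answer, which is precisely the Levy oscillation effect discussed in the text.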

11.12 (a) Consider the general coupled scalar stochastic differential equations

dx_1(t) = f_1[x_1(t), x_2(t), t]\, dt + G_1[x_1(t), x_2(t), t]\, d\beta(t)

dx_2(t) = f_2[x_1(t), x_2(t), t]\, dt + G_2[x_1(t), x_2(t), t]\, d\beta(t)



Show that the product [x_1(t) x_2(t)] obeys

d[x_1(t)\, x_2(t)] = x_1(t)\, dx_2(t) + x_2(t)\, dx_1(t) + G_1[x_1(t), x_2(t), t]\, G_2[x_1(t), x_2(t), t]\, Q(t)\, dt

by use of the Ito differential rule. Show that this reduces to the result of Example 11.9 if x_1 = x_2 = x.

(b) Now apply this product rule to derive the covariance propagation equations for the case of linear stochastic differential equations

dx(t) = F(t)x(t)\, dt + G(t)\, d\beta(t)

each component of which can be written as

dx_i(t) = \sum_{k=1}^{n} F_{ik}(t)\, x_k(t)\, dt + \sum_{k=1}^{s} G_{ik}(t)\, d\beta_k(t)

or, equivalently,

d[x_i(t) - E\{x_i(t)\}] = \sum_{k=1}^{n} F_{ik}(t)\,[x_k(t) - E\{x_k(t)\}]\, dt + \sum_{k=1}^{s} G_{ik}(t)\, d\beta_k(t)

First consider the simple case in which the diffusion of β(·,·) is Q(t) ≡ I for all time t. Generate an expression for d([x_i(t) - E\{x_i(t)\}][x_j(t) - E\{x_j(t)\}]). Take the expectation of this to yield

\dot P_{ij}(t) = \sum_{k=1}^{n} \{F_{ik}(t)\, P_{kj}(t) + F_{jk}(t)\, P_{ki}(t)\} + \sum_{k=1}^{s} G_{ik}(t)\, G_{jk}(t)

or

\dot P(t) = F(t)P(t) + P(t)F^T(t) + G(t)G^T(t)

Show how this generalizes to the usual equation, with the last term above becoming G(t)Q(t)G^T(t), for the case of general diffusion Q(t) instead of Q(t) ≡ I.

11.13 Let x_1(·,·) and x_2(·,·) be two n-dimensional solutions to separate Ito stochastic differential equations driven by the same Brownian motion of diffusion Q:

dx_1(t) = f_1[x_1(t), t]\, dt + G_1[x_1(t), t]\, d\beta(t)

dx_2(t) = f_2[x_2(t), t]\, dt + G_2[x_2(t), t]\, d\beta(t)

and consider scalar functions ψ_1[x_1(t), t] and ψ_2[x_2(t), t], each having continuous second partial derivatives with respect to their first argument and each being continuously differentiable in their second arguments. Use the Ito differential rule (as applied to the augmented process [x_1^T \vdots x_2^T]^T) to show

d\{\psi_1[x_1(t), t]\, \psi_2[x_2(t), t]\} = \psi_2\, d\psi_1 + \psi_1\, d\psi_2 + \Bigl(\frac{\partial \psi_1}{\partial x_1}\Bigr)^T G_1[x_1(t), t]\, Q(t)\, G_2^T[x_2(t), t]\, \Bigl(\frac{\partial \psi_2}{\partial x_2}\Bigr)\, dt

11.14 Let x(·,·) satisfy the general scalar stochastic differential equation

dx(t) = f[x(t), t]\, dt + G[x(t), t]\, d\beta(t)



with x(to) = 0 with probability one and with ~(".) as Brownian motion having diffusion Q(t) forall t.

(a) Let ljJ[x(t),t] = eax(t) and use the Ito differential rule to derive d[e ax(')]. Integrate this fromto to t to obtain a useful implicit formula for eax(,) as

ft 1eaX(') = 1 + jto eax(<){af[x(T),T] + 2a'G'[X(T),T]Q(T)) dt

+ s.~ aeaxl<IG[X(T), T] d~(T)

(b) Consider the function

ct(t) = exp{Jtt[x(t) - x(t')]}

and by applying the Ito differential rule and taking the conditional expectation, conditioned onknowledge that x(t') = P, derive an expression for the conditional characteristic function for x(t)conditioned on x(t') = p (i.e., the Fourier transform ofthe transition probability density .(,.(Olx(,')( ~ Ip))as

<PX(Olx(t')(ttlp)';' E{ct(t) Ix(t') = p}

= 1 + s.~ E[ct(T){jttf[X(T),T] - ~tt'G'[X(T),T]Q(T)}IX(t') = PJdT

Show that taking the partial of this with respect to time t and then taking the inverse Fouriertransform yields the forward Kolmogorov equation, the scalar version of (11-96).

(c) As another application of ψ[x(t), t] = exp[a x(t)], consider the special case in which the stochastic differential equation has the form given by f[x(t), t] = −(1/2)k²(t) and G[x(t), t] = k(t), where k(·) is a square integrable deterministic function, and where a = 1, to yield

L(t) = exp{∫_{t_0}^{t} k(τ) dβ(τ) − (1/2) ∫_{t_0}^{t} k²(τ) dτ}

As it turns out, this is the likelihood ratio associated with detecting a signal k(·) in additive white Gaussian noise [33, 34]. By applying the Ito differential rule, show that L(t) can be expressed as the Ito integral

L(t) = 1 + ∫_{t_0}^{t} k(τ) L(τ) dβ(τ)

Note that this demonstrates that L(·,·) is a martingale with E{L(t)} = 1. Also, show by induction that the likelihood ratio can be expressed by the series representation

L(t) = 1 + Σ_{n=1}^{∞} ∫_{t_0}^{t} ∫_{t_0}^{τ_n} ··· ∫_{t_0}^{τ_2} k(τ_1) ··· k(τ_n) dβ(τ_1) ··· dβ(τ_n)

Moreover, from the Ito integral expression, show that

which implies that

since both expressions obey the same initial conditions and differential equations.
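The martingale property E{L(t)} = 1 claimed in this problem is easy to check numerically. The sketch below is illustrative only, not part of the problem: it assumes unit-diffusion Brownian motion (Q ≡ 1) and an arbitrarily chosen square-integrable deterministic k(·), draws Monte Carlo sample paths of the exponent, and averages the resulting likelihood ratios.

```python
import numpy as np

# Monte Carlo check of E{L(t)} = 1 for the likelihood ratio
# L(t) = exp( int_0^t k dbeta - 0.5 int_0^t k^2 dtau ).
# Assumptions: unit diffusion (Q = 1), illustrative k(t).
rng = np.random.default_rng(0)
n_paths, n_steps, T = 20_000, 200, 1.0
dt = T / n_steps
t = np.linspace(0.0, T, n_steps, endpoint=False)   # left endpoints (Ito sums)
k = 1.0 + 0.5 * np.sin(2.0 * np.pi * t)            # square-integrable k(.)

dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
ito_integral = dB @ k                              # int k dbeta, one value per path
compensator = 0.5 * np.sum(k**2) * dt              # 0.5 int k^2 dtau
L = np.exp(ito_integral - compensator)

mean_L = L.mean()                                  # should be near 1
```

Because the Ito sum uses left-endpoint evaluation of k, the discretized exponent is exactly Gaussian with variance matching twice the compensator, so the sample mean of L converges to 1 as the number of paths grows.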


11.15 A scalar Brownian motion (Gaussian) process x(·,·) has the following statistics:

E{x(t)} = 0

E{[x(t) − x(t′)]²} = t − t′   for t ≥ t′

What is the transition probability f_x(ξ, t; 0, 0)? Show that this transition probability satisfies the appropriate forward Kolmogorov equation.
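For unit diffusion, the transition density asked for is the Gaussian f_x(ξ, t; 0, 0) = (2πt)^{-1/2} exp(−ξ²/2t), and the forward Kolmogorov (diffusion) equation it should satisfy is ∂p/∂t = (1/2) ∂²p/∂ξ². The finite-difference spot-check below is an illustrative sketch (not part of the problem statement) confirming the identity at a sample point:

```python
import math

# Transition density of unit-diffusion Brownian motion started at (0, 0).
def p(xi, t):
    return math.exp(-xi**2 / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

# Central differences at an arbitrary sample point (xi0, t0).
xi0, t0, h = 0.7, 1.3, 1e-4
dp_dt = (p(xi0, t0 + h) - p(xi0, t0 - h)) / (2.0 * h)
d2p_dxi2 = (p(xi0 + h, t0) - 2.0 * p(xi0, t0) + p(xi0 - h, t0)) / h**2

# Forward Kolmogorov equation: dp/dt = 0.5 d2p/dxi2, so this should vanish
# up to discretization error.
residual = dp_dt - 0.5 * d2p_dxi2
```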

11.16 Consider the scalar linear system with white dynamic driving noise, w(·,·):

Write the transition probability for x(t) and show that it satisfies the forward Kolmogorov equation.

11.17 A scalar Brownian motion process β(·,·) has the statistics

E{β(t)} = 0,    β(0) = 0 w.p.1

The process y(·,·) is defined as

y(t) = |β(t)|

Obtain the forward Kolmogorov equation for the β(·,·) process. Using this result, obtain an expression for the derivative of the mean of y(t). Solve for the mean of y(t).

11.18 Consider the simulation issues raised at the end of Section 11.5, specifically pertaining to the simulation of the simple case in which G is a function only of time and not of x(·,·):

dx(t) = f[x(t), t] dt + G(t) dβ(t)

This is specifically the class of problems for which extended Kalman filters were developed in Chapter 9, and we wish to explore the truth model simulation for a Monte Carlo performance evaluation tool [54].

(a) First recall the simulation of "equivalent discrete-time models" for linear stochastic differential equations, as discussed in Example 11.11 and Sections 4.9 and 6.8 and Problem 7.14 of Volume 1. Depict such a simulation in detail, specifying all appropriate transformations to be performed on independent scalar unit-variance zero-mean white Gaussian sequences as produced from pseudorandom noise generators.

(b) For sample intervals Δt short compared to system characteristic times, first order approximations

Φ(t_{i+1}, t_i) ≈ I + F(t_i) Δt

∫_{t_i}^{t_{i+1}} Φ(t_{i+1}, τ) G(τ) Q(τ) G^T(τ) Φ^T(t_{i+1}, τ) dτ ≈ G(t_i) Q(t_i) G^T(t_i) Δt

often suffice. Show that these yield approximate simulation equations as

where w_d(·,·) is zero-mean white Gaussian discrete-time noise with

and discuss how to implement such a simulation.

(c) Show that this is equivalent to using Euler integration (first order numerical integration) to write

x(t_{i+1}) = x(t_i) + [ẋ(t_i)] Δt


where specifically ẋ(t_i) is simulated according to

where w_d′(·,·) is zero-mean white Gaussian discrete-time noise with

(d) Show that the approximate method of (c) is directly extendable to simulating the nonlinear differential equation at the beginning of this problem. Discuss implementation of this simulation, including Cholesky square roots or U-D covariance factorizations. In some available software tools [54], the stochastic driving term effects are simulated by such an Euler integration approximation, while the simulation of the effects of the f[x(t), t] term are in fact handled through higher order integration techniques, such as fifth order Runge-Kutta integration.

(e) Compare these simple results to the more complicated issues raised in Examples 11.11 and 11.12 for the more general case of G being a function of both x(t) and t.
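The Euler truth-model simulation discussed in parts (c)-(d) can be sketched as follows. This is an illustrative sketch only: f, G, and Q are generic placeholder callables, and the discrete driving noise w_d is drawn through a Cholesky square root of G Q G^T Δt as the problem suggests. The trailing check uses the linear model dx = −x dt + dβ with Q = 1, whose steady-state variance Q/2 = 0.5 is known in closed form.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(f, G, Q, x0, t0, tf, dt):
    """Euler (Euler-Maruyama) integration of dx = f(x,t) dt + G(t) dbeta.

    The discrete-time driving noise has covariance G(t) Q(t) G(t)^T dt,
    sampled via a Cholesky square root (placeholder model functions)."""
    x, t = np.array(x0, dtype=float), t0
    path = [x.copy()]
    while t < tf - 1e-12:
        Gt = np.atleast_2d(G(t))
        cov = Gt @ np.atleast_2d(Q(t)) @ Gt.T * dt      # G Q G^T dt
        w = np.linalg.cholesky(cov) @ rng.normal(size=cov.shape[0])
        x = x + np.atleast_1d(f(x, t)) * dt + w
        t += dt
        path.append(x.copy())
    return np.array(path)

# Illustrative check on dx = -x dt + dbeta (Q = 1): at t = 3 the variance is
# essentially the steady-state value Q/2 = 0.5 and the mean is 0.
ends = np.array([simulate(lambda x, t: -x, lambda t: [[1.0]], lambda t: [[1.0]],
                          [0.0], 0.0, 3.0, 0.02)[-1, 0] for _ in range(500)])
mean_end, var_end = ends.mean(), ends.var()
```

For the vector case the same code applies unchanged; only the shapes of x0, G(t), and Q(t) grow, with the Cholesky factor supplying the required matrix square root of the discrete noise covariance.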

11.19 A scalar linear system is described by the stochastic differential equation

dx(t) = −2x(t) dt + dβ(t)

where β(·,·) is scalar Brownian motion with statistics

E{β(t)} = 0,    E{[β(t) − β(t′)]²} = S|t − t′|

and the initial value of x(t) is known exactly as

x(0) = 3

(a) Derive an expression for f_{x(t)|x(0)}(ξ|ρ) = f_x(ξ, t | ρ, 0).

(b) At discrete time points t_i, perfect measurements of the square of x(t_i) are available:

z(t_i) = x²(t_i)

Develop recursion formulas for calculating the mean of x(t_i), conditioned on the history of measurements up to time t_i.

(c) Given measurements z(1) = 4, z(2) = 1, calculate the mean of x(2) conditioned on the measurements.


CHAPTER 12

Nonlinear estimation

12.1 INTRODUCTION

This chapter applies the nonlinear stochastic system models of the previous chapter to the problem of estimating the system state, given real-time data available from the actual system and the products of our modeling efforts. Unlike the case of estimation based upon linear system models driven by white Gaussian noises, the full-scale estimator turns out to be generally intractable without imposing simplifying assumptions: propagating and updating either an entire density function for the state conditioned on observed measurements, or an infinite number of describing parameters for that density, is not implementable, as seen in Section 12.2.

Section 12.3 discusses means of generating approximate conditional moment estimators. The appropriateness of approximations to be made is problem dependent. Taylor series representations for dynamics and measurement nonlinearities, as expanded about the current state estimate, are combined with assumptions about the conditional state density to yield the truncated and Gaussian second order filters. Computational and performance considerations lead to both modified forms of these and first order filters with bias correction terms. Assumed density filters and higher order moment filters which do not depend on such Taylor series representations are also developed.

Alternatives to minimum mean square error (MMSE), conditional moment, estimators are discussed in subsequent sections. Conditional quasi-moment, conditional mode, and statistically linearized estimators are considered.

Finally, Section 12.7 presents the estimation problem and solutions for the case of continuous, instead of sampled-data, measurements being available. Analogies to the earlier sections are made in order to gain better insights into the problem and corresponding filter design.


12.2 NONLINEAR FILTERING WITH DISCRETE-TIME MEASUREMENTS: CONCEPTUALLY


Consider a system described by the Ito stochastic differential state equation

dx(t) = f[x(t), t] dt + G[x(t), t] dβ(t)    (12-1)

subject to an initial condition specifying the density function for x(t_0), with β(·,·) being s-dimensional Brownian motion of diffusion Q(t) for all t of interest (see Eq. (11-74)). Note that (12-1) admits systems driven by known deterministic inputs u(t) for all time t as well as by β(·,·) simply by writing f[x(t), t] = f′[x(t), u(t), t]; feedback control in which u is a function of perfectly known x(t) and t is similarly admitted; feedback control as a function of current noise-corrupted measurements, the time history of such measurements, or a state estimate based on that time history, will be discussed in subsequent chapters. Let discrete-time measurements be available from the system, in the form of

z(t_i) = h[x(t_i), t_i] + v(t_i)    (12-2)

with v(·,·) m-dimensional discrete-time white Gaussian noise independent of β(·,·), with mean zero and covariance R(t_i) for all sample times t_i: E{v(t_i)v^T(t_j)} = R(t_i)δ_ij. We desire to establish, if possible, the conditional density for x(t_i), conditioned on the measurements up to and including time t_i: f_{x(t_i)|Z(t_i)}(ξ|Z_i). If we could accomplish this objective, then we could define various estimators that are optimal with respect to some specified criterion, such as the conditional mean (minimum mean square error, or MMSE, estimate) or the conditional mode (maximum a posteriori, or MAP, estimate).

Conceptually, this can be accomplished in the following manner. First consider the time propagation of f_{x(t)|Z(t_{i-1})}(ξ|Z_{i-1}) from sample time t_{i-1} to time t_i. As shown in the previous chapter, the solution to (12-1) is a Markov process, completely characterized by the transition probability density f_x(ξ, t | ρ, t′). If we know f_x(ξ, t | ρ, t_{i-1}) for all t ∈ [t_{i-1}, t_i), then the conditional density f_{x(t)|Z(t_{i-1})}(ξ|Z_{i-1}) can be established for all t in that interval, and thus be propagated through that sample period, via

f_{x(t)|Z(t_{i-1})}(ξ|Z_{i-1}) = ∫_{−∞}^{∞} f_x(ξ, t | ρ, t_{i-1}) f_{x(t_{i-1})|Z(t_{i-1})}(ρ|Z_{i-1}) dρ    (12-3)

using the Chapman-Kolmogorov equation valid for Markov x(·,·) (see (11-50)-(11-52)) and f_{x(t_{i-1})|Z(t_{i-1})}(ρ|Z_{i-1}) from the previous measurement update. Thus, we could obtain f_{x(t_i)|Z(t_{i-1})}(ξ|Z_{i-1}) from f_{x(t_{i-1})|Z(t_{i-1})}(ξ|Z_{i-1}) by using the forward Kolmogorov equation (11-96) to evaluate f_x(ξ, t_i | ρ, t_{i-1}) and then employing (12-3), or by attempting to use these same concepts to derive an expression (and/or suitable approximation) to propagate f_{x(t)|Z(t_{i-1})}(ξ|Z_{i-1}) directly. In fact, it can be shown that f_{x(t)|Z(t_{i-1})}(ξ|Z_{i-1}) itself satisfies the


forward Kolmogorov equation in the interval [t_{i-1}, t_i), starting from the density f_{x(t_{i-1})|Z(t_{i-1})}(ξ|Z_{i-1}) obtained from the previous measurement update.

Now we wish to incorporate the measurement z(t_i, ω_k) = z_i that becomes available at time t_i. As for the case of linear estimation, Bayes' rule can be applied repeatedly to establish

The second numerator term was just discussed, and is provided by the previous propagation. For the first numerator term, if v(·,·) is assumed to be a white Gaussian discrete-time noise of mean zero and covariance R(t_i), then it can be readily shown that

f_{z(t_i)|x(t_i), Z(t_{i-1})}(z_i | ξ, Z_{i-1}) = f_{z(t_i)|x(t_i)}(z_i | ξ)

= [1 / ((2π)^{m/2} |R(t_i)|^{1/2})] exp{−(1/2)[z_i − h(ξ, t_i)]^T R^{−1}(t_i)[z_i − h(ξ, t_i)]}    (12-5)

For the denominator in (12-4), the concept of marginal densities and Bayes' rule can be combined to yield

f_{z(t_i)|Z(t_{i-1})}(z_i | Z_{i-1}) = ∫_{−∞}^{∞} f_{z(t_i), x(t_i)|Z(t_{i-1})}(z_i, ξ | Z_{i-1}) dξ

= ∫_{−∞}^{∞} f_{z(t_i)|x(t_i)}(z_i | ξ) f_{x(t_i)|Z(t_{i-1})}(ξ | Z_{i-1}) dξ    (12-6)

Thus, knowledge of f_{x(t_i)|Z(t_{i-1})}(ξ|Z_{i-1}) from propagation and f_{z(t_i)|x(t_i)}(z_i|ξ) from (12-5) provide sufficient information to evaluate f_{x(t_i)|Z(t_i)}(ξ|Z_i) via (12-4), but the result entails an integration as indicated in Eq. (12-6).

Conceptually, we have all we need, but actual evaluations are not possible. The computation of the entire density function f_{x(t_i)|Z(t_i)}(ξ|Z_i) is generally intractable because time propagations involve partial integro-differential equations (derived by means of the forward Kolmogorov equation) and measurement updates involve functional integral difference equations (derived by means of Bayes' rule). One might attempt to generate a partial or approximate depiction of the conditional density as accurately as possible with a small number of appropriate parameters. The evolution of these parameters would then constitute the nonlinear filter for a given problem. However, as was foretold in Section 6 of the preceding chapter, even the partial description of the conditional density by these parameters will not be feasible without knowledge of the entire density itself to evaluate certain conditional expectations. Said another way, the optimal nonlinear estimator will be infinite dimensional. To obtain practically feasible algorithms, expansions truncated to some low order or other approximations are required both in the time propagation and measurement update of the nonlinear filter. The suitability of approximations to be made is problem dependent, giving rise to many estimator forms, but some general trends can and will be evaluated in this chapter.
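The conceptual recursion (12-3)-(12-6) can nevertheless be illustrated by brute force on a dense grid of state values (a point-mass approximation) in a scalar case where the transition density is available in closed form. The sketch below is illustrative only; the linear model dx = −x dt + dβ with unit diffusion and the linear measurement z = x + v are placeholder choices, picked precisely because the exact Gaussian answer is known for comparison.

```python
import numpy as np

# Point-mass (grid) approximation of the conceptual Bayes recursion:
# propagate the conditional density via the transition density, then
# update it with the measurement likelihood and renormalize.
xi = np.linspace(-6.0, 6.0, 601)
dxi = xi[1] - xi[0]

def gauss(u, var):
    return np.exp(-u**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def propagate(p, dt):
    # Chapman-Kolmogorov step (12-3).  For dx = -x dt + dbeta (Q = 1) the
    # exact transition density from rho is N(exp(-dt)*rho, (1-exp(-2dt))/2).
    m = np.exp(-dt) * xi
    v = (1.0 - np.exp(-2.0 * dt)) / 2.0
    trans = gauss(xi[:, None] - m[None, :], v)   # f_x(xi, t | rho, t - dt)
    return trans @ p * dxi

def update(p, z, R):
    # Bayes update (12-4): multiply by f_{z|x}(z|xi), renormalize via (12-6).
    p = p * gauss(z - xi, R)
    return p / (p.sum() * dxi)

p = gauss(xi - 2.0, 1.0)          # prior at t_{i-1}: N(2, 1)
p = propagate(p, 0.5)             # time propagation over one sample period
p = update(p, 0.3, 0.25)          # incorporate z(t_i) = 0.3, R = 0.25
mean = (xi * p).sum() * dxi       # conditional-mean (MMSE) estimate
```

For this linear-Gaussian placeholder the Kalman filter gives the exact posterior mean (about 0.544 for these numbers), so the grid result can be checked directly; for genuinely nonlinear f, G, or h the same grid machinery applies but, as the text notes, its cost grows rapidly with state dimension.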

There are a number of possible means of parameterizing the conditional density function. Expressing it in terms of a complete orthogonal series and then truncating the series at a specified order is a valid approach, and series of Hermite polynomials will be particularly motivated; approximate filters in terms of the series coefficients, known as quasi-moments, will be developed. Conditional mode filters will also be discussed. However, attention will be concentrated on parameterization via moments, and filters of this form are developed in the next section.

12.3 CONDITIONAL MOMENT ESTIMATORS

Consider the time propagation of the conditional mean and covariance,

x̂(t/t_{i-1}) ≜ E{x(t) | Z(t_{i-1}) = Z_{i-1}}    (12-7)

P(t/t_{i-1}) ≜ E{[x(t) − x̂(t/t_{i-1})][x(t) − x̂(t/t_{i-1})]^T | Z(t_{i-1}) = Z_{i-1}}    (12-8)

as generated for a particular measurement history realization Z(t_{i-1}, ω_k) = Z_{i-1}. (Note that a parallel development can be made by conditioning on Z(t_{i-1}, ·) instead of Z(t_{i-1}, ω_k), viewing the conditional expectation as a random variable mapping itself and thereby defining x̂(t/t_{i-1}) and P(t/t_{i-1}).) Between sample times t_{i-1} and t_i, i.e., for all t ∈ [t_{i-1}, t_i), the densities f_{x(t)}(ξ) and f_{x(t)|Z(t_{i-1})}(ξ|Z_{i-1}) both satisfy the forward Kolmogorov equation, so the propagation of x̂(t/t_{i-1}) and P(t/t_{i-1}) can be obtained from (11-99) and (11-100), replacing the unconditional expectations with expectations conditioned on Z(t_{i-1}, ω_k) = Z_{i-1}:

dx̂(t/t_{i-1})/dt = f̂[x(t), t]    (12-9)

dP(t/t_{i-1})/dt = {(f[x(t), t] x^T(t))^ − f̂[x(t), t] x̂^T(t/t_{i-1})}
    + {(x(t) f^T[x(t), t])^ − x̂(t/t_{i-1}) f̂^T[x(t), t]}
    + (G[x(t), t] Q(t) G^T[x(t), t])^    (12-10)

where the caret denotes expectation conditioned on the measurement history, e.g.,

f̂[x(t), t] ≜ (f[x(t), t])^ ≜ E{f[x(t), t] | Z(t_{i-1}) = Z_{i-1}}    (12-11)


and this notation is to be adopted throughout this section. These are propagated to sample time t_i to yield x̂(t_i⁻) ≜ x̂(t_i/t_{i-1}) and P(t_i⁻) ≜ P(t_i/t_{i-1}). To incorporate the next measurement at time t_i, the conditional mean can be calculated as

x̂(t_i⁺) ≜ x̂(t_i/t_i) = E{x(t_i) | Z(t_i) = Z_i}

= ∫_{−∞}^{∞} ξ f_{x(t_i)|Z(t_i)}(ξ|Z_i) dξ    (12-12)

where f_{x(t_i)|Z(t_i)}(ξ|Z_i) is obtained from (12-4)-(12-6), expressed equivalently as

x̂(t_i⁺) = [∫_{−∞}^{∞} ξ f_{z(t_i)|x(t_i)}(z_i|ξ) f_{x(t_i)|Z(t_{i-1})}(ξ|Z_{i-1}) dξ] / [∫_{−∞}^{∞} f_{z(t_i)|x(t_i)}(z_i|ξ) f_{x(t_i)|Z(t_{i-1})}(ξ|Z_{i-1}) dξ]

= (x(t_i) f_{z(t_i)|x(t_i)}(z_i | x(t_i)))^ / (f_{z(t_i)|x(t_i)}(z_i | x(t_i)))^    (12-13)

using the same notation as in Eq. (12-11). Similarly,

P(t_i⁺) = ∫_{−∞}^{∞} ξ ξ^T f_{x(t_i)|Z(t_i)}(ξ|Z_i) dξ − x̂(t_i⁺) x̂^T(t_i⁺)

= (x(t_i) x^T(t_i) f_{z(t_i)|x(t_i)}(z_i | x(t_i)))^ / (f_{z(t_i)|x(t_i)}(z_i | x(t_i)))^ − x̂(t_i⁺) x̂^T(t_i⁺)    (12-14)

The conditional expectations in Eqs. (12-9)-(12-14) require knowledge of the entire conditional density, i.e., knowledge of all higher order moments as well as the first two. By making certain assumptions about the density and/or higher order moments, approximations to these full-scale estimator equations can be generated that do not require evaluation of an infinite number of moments, yielding computationally feasible algorithms. One might assume that the conditional density is nearly symmetric so that third and higher order odd central moments are essentially zero, and also that it is concentrated sufficiently closely about the mean that the fourth and higher order even central moments are small enough to be neglected: this set of assumptions gives rise to the truncated second order filter. (Scaling of variables may be required to ensure that higher order even moments decrease in relative magnitude instead of increase.) Or, one might assume that the conditional density is nearly Gaussian, so that third and higher order odd central moments are again essentially zero, and the fourth and higher order even central moments can be expressed in terms of the covariance: this yields the Gaussian second order filter. Typically, sixth and higher order even moments are also assumed small enough to be neglected, and fourth order central moments are expressed in terms of second moments. These filters are based upon a further approximation of representing the functions f and h in (12-1) and (12-2) via a truncated Taylor series expanded about the state estimate, but there are also assumed density filters that do not require such series approximations. Finally, there are also higher order moment filters that maintain estimates of moments beyond second order, based upon different types of approximations such as cumulant truncation.


Let us first consider the truncated second order filter [7, 52, 54]. To aid understanding of basic concepts without the overburdening notation inherent in the vector case, we restrict our immediate attention to scalar systems with scalar measurements and driving noise (n = m = s = 1). First consider the approximate evaluation of the conditional expectation of f[x(t), t] in Eq. (12-9). Perform a Taylor series expansion of f[x(t), t] about the conditional mean, i.e., current estimate, x̂(t/t_{i-1}):

f[x(t), t] = f[x̂(t/t_{i-1}), t] + (∂f[x̂(t/t_{i-1}), t]/∂x){x(t) − x̂(t/t_{i-1})}

+ (1/2)(∂²f[x̂(t/t_{i-1}), t]/∂x²){x(t) − x̂(t/t_{i-1})}² + ···    (12-15)

where ∂f[x̂(t/t_{i-1}), t]/∂x denotes ∂f[x, t]/∂x evaluated with x = x̂(t/t_{i-1}), and so on. Taking the conditional expectation of this series carried out to second order yields

f̂[x(t), t] ≈ f[x̂(t/t_{i-1}), t] + (1/2)(∂²f[x̂(t/t_{i-1}), t]/∂x²) P(t/t_{i-1})    (12-16)

When the conditional expectation is taken, recall that x̂(t/t_{i-1}) is a function of Z(t_{i-1}), so that, for example,

(f[x̂(t/t_{i-1})])^ ≜ E{f[x̂(t/t_{i-1})] | Z(t_{i-1}) = Z_{i-1}} = f[x̂(t/t_{i-1})]

where x̂(t/t_{i-1}) is a particular realization of the estimate. Similarly,

E{(∂²f[x̂(t/t_{i-1}), t]/∂x²)[x(t) − x̂(t/t_{i-1})]² | Z(t_{i-1}) = Z_{i-1}}

= (∂²f[x̂(t/t_{i-1}), t]/∂x²) E{[x(t) − x̂(t/t_{i-1})]² | Z(t_{i-1}) = Z_{i-1}}

= (∂²f[x̂(t/t_{i-1}), t]/∂x²) P(t/t_{i-1})

In a similar manner, the expectations in the scalar dP(t/t_{i-1})/dt equation, (12-10), can be shown to equal, to second order,

(f x)^ − f̂ x̂ = (x f)^ − x̂ f̂ ≈ (∂f[x̂(t/t_{i-1}), t]/∂x) P(t/t_{i-1})    (12-17a)

(G²[x(t), t] Q(t))^ ≈ G²[x̂(t/t_{i-1}), t] Q(t) + (∂G[x̂(t/t_{i-1}), t]/∂x)² P(t/t_{i-1}) Q(t)

+ G[x̂(t/t_{i-1}), t](∂²G[x̂(t/t_{i-1}), t]/∂x²) P(t/t_{i-1}) Q(t)    (12-17b)


Therefore, the time propagation of the state estimate between t_{i-1} and t_i is accomplished by solving

dx̂(t/t_{i-1})/dt = f[x̂(t/t_{i-1}), t] + (1/2)(∂²f[x̂(t/t_{i-1}), t]/∂x²) P(t/t_{i-1})    (12-18)

dP(t/t_{i-1})/dt = 2(∂f[x̂(t/t_{i-1}), t]/∂x) P(t/t_{i-1}) + (G²[x(t), t] Q(t))^    (12-19)

where (G²[x(t), t] Q(t))^ is given in (12-17b).

Note the additional term in (12-18) compared to the extended Kalman filter estimate propagation, Eq. (9-65). When nonlinearities are significant, the second order filter yields performance that is superior to that of a first order filter on the average, due primarily to this "bias correction term" [4] and a similar term in residual generation for measurement updating. Ignoring this term does generally result in a more biased estimate, and this will be discussed more fully later. The other difference from the extended Kalman filter is the covariance differential equation driving term, given by (12-17b) rather than simply G²(t)Q(t). In developing the extended Kalman filter, G was assumed to be a function only of t, and not of x(t) and t. Under this assumption, (12-17b) reduces to G²(t)Q(t). Without this assumption, a truncated first order filter would introduce the second of three terms in (12-17b) beyond the computations of the extended Kalman filter, whereas the current second order filter introduces two additional terms.

For measurement updating at sample time t_i, one could attempt to approximate the expectation integrations inherent in (12-13) and (12-14). This requires a series expansion of f_{z(t_i)|x(t_i)}(ζ|ξ), and it turns out that the filter is very sensitive to the particular approximation employed [52, 54, 131]. Kramer [67] and Jazwinski [52, 54] have suggested an alternate, better behaved approximation technique. We assume that the conditional mean and covariance after measurement incorporation can be expressed as a power series in the residual (innovations),

x̂(t_i⁺) = a₀ + a₁[z(t_i) − ẑ(t_i⁻)] + a₂[z(t_i) − ẑ(t_i⁻)]² + ···
P(t_i⁺) = b₀ + b₁[z(t_i) − ẑ(t_i⁻)] + b₂[z(t_i) − ẑ(t_i⁻)]² + ···    (12-20)

For computational tractability, this power series is truncated at first order terms:

x̂(t_i⁺) = a₀ + a₁[z(t_i) − ẑ(t_i⁻)]    (12-21a)

P(t_i⁺) = b₀ + b₁[z(t_i) − ẑ(t_i⁻)]    (12-21b)

The structural form of (12-21a) is motivated in part by the first order filter update, x̂(t_i⁺) = x̂(t_i⁻) + K(t_i)[z(t_i) − ẑ(t_i⁻)], and a₀, a₁, b₀, b₁, and ẑ(t_i⁻) are all random variables that are viewed as explicit functions of Z(t_{i-1}). Note that (12-21b) is a variance equation driven by a random residual forcing function,


thereby admitting negative computed values of P(t_i⁺): this unacceptable attribute will be discussed in more detail subsequently, and will motivate an alternate assumed form with b₁ set identically to zero.

Now the coefficients in (12-21) are determined by manipulation of Bayes' rule into an appropriate form. To do so, consider two scalar functions of random variables x(t_i) and z(t_i), respectively, ψ[·] and θ[·]. Then the desired relation is

E{ψ[x(t_i)] θ[z(t_i)] | Z(t_{i-1}) = Z_{i-1}}

= E{E{ψ[x(t_i)] | Z(t_i) = Z(t_i, ·)} θ[z(t_i)] | Z(t_{i-1}) = Z_{i-1}}    (12-22)

where the inner conditional expectation is itself a random variable since it is conditioned on Z(t_i, ·). The validity of (12-22) can be demonstrated by taking Eq. (12-4), which is equivalently written as

f_{x(t_i)|Z(t_i)}(ξ | Z_i) = f_{x(t_i), z(t_i)|Z(t_{i-1})}(ξ, z_i | Z_{i-1}) / f_{z(t_i)|Z(t_{i-1})}(z_i | Z_{i-1})    (12-23a)

or

f_{x(t_i), z(t_i)|Z(t_{i-1})}(ξ, ζ_i | Z_{i-1}) = f_{x(t_i)|Z(t_i)}(ξ | Z_i) f_{z(t_i)|Z(t_{i-1})}(ζ_i | Z_{i-1})    (12-23b)

and using this result to generate

E{ψ[x(t_i)] θ[z(t_i)] | Z(t_{i-1}) = Z_{i-1}}

= ∫_{−∞}^{∞} ∫_{−∞}^{∞} ψ(ξ) θ(ζ_i) f_{x(t_i), z(t_i)|Z(t_{i-1})}(ξ, ζ_i | Z_{i-1}) dξ dζ_i

= ∫_{−∞}^{∞} [∫_{−∞}^{∞} ψ(ξ) f_{x(t_i)|Z(t_i)}(ξ | Z_i) dξ] θ(ζ_i) f_{z(t_i)|Z(t_{i-1})}(ζ_i | Z_{i-1}) dζ_i

= E{E{ψ[x(t_i)] | Z(t_i) = Z(t_i, ·)} θ[z(t_i)] | Z(t_{i-1}) = Z_{i-1}}

as desired. Evaluation of the coefficient a₀ is now obtained by letting ψ[x(t_i)] = x(t_i) and θ[z(t_i)] = 1 in (12-22):

E{x(t_i) | Z(t_{i-1}) = Z_{i-1}} = E{x̂(t_i⁺) | Z(t_{i-1}) = Z_{i-1}}    (12-24)

Recognizing the left hand side as the realization x̂(t_i⁻) and substituting the assumed form (12-21a) into the right hand side yields

x̂(t_i⁻) = E{a₀ + a₁[z(t_i) − ẑ(t_i⁻)] | Z(t_{i-1}) = Z_{i-1}}

= a₀ + a₁[ẑ(t_i⁻) − ẑ(t_i⁻)] = a₀    (12-25)

Note that conditioning on Z(t_{i-1}, ·) instead of Z_{i-1} would yield x̂(t_i⁻) = a₀. Similarly, a₁ is obtained by letting ψ[x(t_i)] = x(t_i) and θ[z(t_i)] = {z(t_i) − ẑ(t_i⁻)} in (12-22), b₀ by letting ψ[x(t_i)] = {x(t_i) − x̂(t_i⁺)}² and θ[z(t_i)] = 1, and b₁ by letting ψ[x(t_i)] = {x(t_i) − x̂(t_i⁺)}² and θ[z(t_i)] = {z(t_i) − ẑ(t_i⁻)}. In this manner,


the coefficients in (12-21) are evaluated as

a₀ = x̂ ≜ E{x(t_i) | Z(t_{i-1}) = Z_{i-1}} = x̂(t_i⁻)    (12-26a)

a₁ = [(x h)^ − x̂ ĥ][((h − ĥ)²)^ + R]⁻¹    (12-26b)

b₀ = P(t_i⁻) − a₁[(h x)^ − ĥ x̂]    (12-26c)

b₁ = {((x − x̂)² h)^ − 2a₁((h − ĥ)(x − x̂) h)^ + a₁²[((h − ĥ)² h)^ + R ĥ] − b₀ ĥ}

× {((h − ĥ)²)^ + R}⁻¹    (12-26d)

using the notation of (12-11) and dropping time arguments for compactness. Now the various expectations in Eq. (12-26) are approximated by generating Taylor series representations, truncating to second order, and taking conditional expectations of the resulting terms individually, to yield

ẑ(t_i⁻) = ĥ[x(t_i), t_i] ≈ h[x̂(t_i⁻), t_i] + (1/2)(∂²h[x̂(t_i⁻), t_i]/∂x²) P(t_i⁻)    (12-27a)

(x h)^ − x̂ ĥ ≈ (∂h[x̂(t_i⁻), t_i]/∂x) P(t_i⁻)    (12-27b)

((h − ĥ)²)^ ≈ (∂h[x̂(t_i⁻), t_i]/∂x)² P(t_i⁻) − (1/4)(∂²h[x̂(t_i⁻), t_i]/∂x²)² P²(t_i⁻)    (12-27c)

b₁ ≈ −(1/2) P²(t_i⁻)(∂²h/∂x²)[((h − ĥ)²)^ + R]⁻¹ {1 − 3P(t_i⁻)(∂h/∂x)²[((h − ĥ)²)^ + R]⁻¹

− P(t_i⁻)(∂h/∂x)²[((h − ĥ)²)^ + R]⁻²[(1/4) P²(t_i⁻)(∂²h/∂x²)² + R]

+ 2P²(t_i⁻)(∂h/∂x)⁴[((h − ĥ)²)^ + R]⁻²}    (12-27d)

where x̂(t_i⁻) = x̂(t_i/t_{i-1}), P(t_i⁻) = P(t_i/t_{i-1}), and in (12-27d) the expectation ((h − ĥ)²)^ is evaluated via (12-27c) and h, ∂h/∂x, and ∂²h/∂x² are all evaluated at [x̂(t_i⁻), t_i].

Combining these results in the assumed form (12-21) yields the update equations for the truncated second order filter as

A_TS(t_i) = (∂h[x̂(t_i⁻), t_i]/∂x)² P(t_i⁻) − (1/4)(∂²h[x̂(t_i⁻), t_i]/∂x²)² P²(t_i⁻) + R(t_i)    (12-28)

K_TS(t_i) = P(t_i⁻)(∂h[x̂(t_i⁻), t_i]/∂x) A_TS⁻¹(t_i)    (12-29)

x̂(t_i⁺) = x̂(t_i⁻) + K_TS(t_i){z_i − h[x̂(t_i⁻), t_i] − (1/2)(∂²h[x̂(t_i⁻), t_i]/∂x²) P(t_i⁻)}    (12-30)

P(t_i⁺) = P(t_i⁻) − K_TS(t_i)(∂h[x̂(t_i⁻), t_i]/∂x) P(t_i⁻)

+ b₁{z_i − h[x̂(t_i⁻), t_i] − (1/2)(∂²h[x̂(t_i⁻), t_i]/∂x²) P(t_i⁻)}    (12-31)


where b₁ is defined by (12-27c,d). The same relations but with b₁ ≡ 0 in (12-31) yield what is called the modified truncated second order filter update equations [54]. Compare these results to the first order extended Kalman filter update relations, (9-61)-(9-64). First of all, the truncated second order filter gain K_TS(t_i) has an additional term in the matrix being inverted, A_TS(t_i) versus [H(t_i)P(t_i⁻)H^T(t_i) + R(t_i)]. Second, there is an additional term in the residual generation in (12-30), and this is the second "bias correction term" mentioned earlier. Finally, one is motivated to use the modified second order filter from a computational standpoint as well as from a desire to preclude negative values for computed P(t_i⁺): the computational burden of the second order filter above that of the extended Kalman filter is severe (especially in the vector case), whereas the additional computations for the modified second order filter are considerably more manageable. Had the power series (12-21b) been proposed as only zero-order in the residuals, this modified second order filter would have been derived directly.

Now consider the Gaussian second order filter for scalar systems [38, 52, 54, 102]. Whereas the truncated second order filter ignores all central moments of x above second order, the Gaussian second order filter accounts for the fourth central moments as well, by approximating them with the values they would assume if the density were in fact Gaussian:

((x_i − x̂_i)(x_j − x̂_j)(x_k − x̂_k)(x_l − x̂_l))^ = P_ij P_kl + P_ik P_jl + P_il P_jk    (12-32a)

where P is the conditional covariance matrix, or, in the scalar case,

((x − x̂)⁴)^ = 3P²    (12-32b)
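The Gaussian fourth-moment relation (12-32b) is easy to confirm by Monte Carlo; the short check below is illustrative and not part of the text (the mean, variance, and sample size are arbitrary choices):

```python
import numpy as np

# For x ~ N(xbar, P), the fourth central moment E{(x - xbar)^4} equals 3 P^2.
rng = np.random.default_rng(2)
P = 2.5
x = rng.normal(1.0, np.sqrt(P), size=1_000_000)
m4 = np.mean((x - x.mean())**4)   # sample fourth central moment
ratio = m4 / (3.0 * P**2)         # should be close to 1
```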

Following the same reasoning as for the truncated second order filter, the time propagation of the estimate between sample times t_{i-1} and t_i is accomplished by means of integrating the relations

dx̂(t/t_{i-1})/dt = f[x̂(t/t_{i-1}), t] + (1/2)(∂²f[x̂(t/t_{i-1}), t]/∂x²) P(t/t_{i-1})    (12-33)

dP(t/t_{i-1})/dt = 2(∂f[x̂(t/t_{i-1}), t]/∂x) P(t/t_{i-1}) + (G²[x(t), t] Q(t))^    (12-34)

forward from x̂(t_{i-1}/t_{i-1}) = x̂(t_{i-1}⁺), P(t_{i-1}/t_{i-1}) = P(t_{i-1}⁺), using the results of the measurement update at time t_{i-1}. These are identical in form to (12-18)


and (12-19), except that (G²Q)^ is now evaluated as

(G²[x(t), t] Q(t))^ ≈ G²[x̂(t/t_{i-1}), t] Q(t) + (∂G[x̂(t/t_{i-1}), t]/∂x)² P(t/t_{i-1}) Q(t)

+ G[x̂(t/t_{i-1}), t](∂²G[x̂(t/t_{i-1}), t]/∂x²) P(t/t_{i-1}) Q(t)

+ (3/4)(∂²G[x̂(t/t_{i-1}), t]/∂x²)² P²(t/t_{i-1}) Q(t)    (12-35)

where the additional fourth term is due to the difference in assumptions of the two filter formulations. As before, the approximation technique (12-21) suggested by Kramer and Jazwinski is applied to obtain the update equations for measurement incorporation. The expectations involved in (12-26) are again approximated, yielding (12-27a) and (12-27b) as evaluated previously, but (12-27c) and (12-27d) are replaced by

((h − ĥ)²)^ ≈ (∂h[x̂(t_i⁻), t_i]/∂x)² P(t_i⁻) + (1/2)(∂²h[x̂(t_i⁻), t_i]/∂x²)² P²(t_i⁻)    (12-27c′)

b₁ ≈ P²(t_i⁻)(∂²h/∂x²)[((h − ĥ)²)^ + R]⁻¹ {1 − (9/2) P(t_i⁻)(∂h/∂x)²[((h − ĥ)²)^ + R]⁻¹

+ (1/2) P(t_i⁻)(∂h/∂x)²[((h − ĥ)²)^ + R]⁻²[7 P(t_i⁻)(∂h/∂x)² + R]}    (12-27d′)

Note particularly that both the sign and magnitude of the second term in (12-27c′) differ from those of (12-27c). Thus, the measurement update equations for the Gaussian second order filter are:

A_GS(t_i) = (∂h[x̂(t_i⁻), t_i]/∂x)² P(t_i⁻) + (1/2)(∂²h[x̂(t_i⁻), t_i]/∂x²)² P²(t_i⁻) + R(t_i)    (12-36)

K_GS(t_i) = P(t_i⁻)(∂h[x̂(t_i⁻), t_i]/∂x) A_GS⁻¹(t_i)    (12-37)

x̂(t_i⁺) = x̂(t_i⁻) + K_GS(t_i){z_i − h[x̂(t_i⁻), t_i] − (1/2)(∂²h[x̂(t_i⁻), t_i]/∂x²) P(t_i⁻)}    (12-38)

P(t_i⁺) = P(t_i⁻) − K_GS(t_i)(∂h[x̂(t_i⁻), t_i]/∂x) P(t_i⁻)

+ b₁{z_i − h[x̂(t_i⁻), t_i] − (1/2)(∂²h[x̂(t_i⁻), t_i]/∂x²) P(t_i⁻)}    (12-39)

with b₁ defined by (12-27c′,d′). Note that the basic form of the update is the same as before, but with A_GS(t_i) and b₁ evaluated differently. Equations (12-33)-


(12-39) but with b₁ ≡ 0 in (12-39) yield the modified Gaussian second order filter, with no random forcing function on the variance relations [4, 54, 72]. Since these are only approximate equations, such a random forcing function could yield negative computed variances, especially if the dynamic driving noise strength Q(t) is small and operating time is long. For this reason, the modified filters are often the preferred form. Moreover, since its inherent approximations are usually more applicable, the modified Gaussian is often the better of the two possible modified second order filters. Note that any of the four second order filters reduce to the extended Kalman filter if G is a function only of time and if ∂²f/∂x² and ∂²h/∂x² are neglected. Similarly, any of the four reduce to the Kalman filter if dynamics and measurement models are linear.

EXAMPLE 12.1 Consider estimation of the scalar x described by

dx(t) = −[x(t) + a x³(t)] dt + [1 + b x²(t)] dβ(t)

based on sampled data measurements of the form

z(t_i) = sin x(t_i) + v(t_i)

with β(·,·) Brownian motion of diffusion Q, and v(·,·) white Gaussian zero-mean discrete-time noise of variance R, independent of β(·,·), and let x(t₀) be described with mean x̂₀ and variance P₀.

The first order filter for this problem would be described by time propagation from t_{i-1} to t_i as

dx̂(t/t_{i-1})/dt = −[x̂(t/t_{i-1}) + a x̂³(t/t_{i-1})]

dP(t/t_{i-1})/dt = −2[1 + 3a x̂²(t/t_{i-1})] P(t/t_{i-1}) + [1 + b x̂²(t/t_{i-1})]² Q

+ [2b x̂(t/t_{i-1})]² P(t/t_{i-1}) Q

Note that the last term in the dP(t/t_{i-1})/dt equation is [∂G/∂x]² P Q, a term that does not appear in extended Kalman filtering. The measurement update at time t_i would be

K(t_i) = P(t_i⁻)[cos x̂(t_i⁻)] / {P(t_i⁻)[cos² x̂(t_i⁻)] + R}

x̂(t_i⁺) = x̂(t_i⁻) + K(t_i){z_i − sin x̂(t_i⁻)}

P(t_i⁺) = P(t_i⁻) − K(t_i)[cos x̂(t_i⁻)] P(t_i⁻)

The modified truncated second order filter for this problem has a time propagation of (12-18) and (12-19):

dx̂(t/t_{i-1})/dt = −[x̂(t/t_{i-1}) + a x̂³(t/t_{i-1})] − 3a x̂(t/t_{i-1}) P(t/t_{i-1})

dP(t/t_{i-1})/dt = −2[1 + 3a x̂²(t/t_{i-1})] P(t/t_{i-1}) + [1 + b x̂²(t/t_{i-1})]² Q

+ [2b x̂(t/t_{i-1})]² P(t/t_{i-1}) Q + 2b[1 + b x̂²(t/t_{i-1})] P(t/t_{i-1}) Q

and a measurement update of (12-28)-(12-31) with b₁ ≡ 0:

K_TS(t_i) = P(t_i⁻)[cos x̂(t_i⁻)] / {[cos² x̂(t_i⁻)] P(t_i⁻) − (1/4)[sin² x̂(t_i⁻)] P²(t_i⁻) + R}

x̂(t_i⁺) = x̂(t_i⁻) + K_TS(t_i){z_i − sin x̂(t_i⁻) + (1/2)[sin x̂(t_i⁻)] P(t_i⁻)}

P(t_i⁺) = P(t_i⁻) − K_TS(t_i)[cos x̂(t_i⁻)] P(t_i⁻)
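These modified truncated second order filter recursions can be sketched directly in code. The constants a, b, Q, R, the initial mean and variance, and the simple Euler integration of the propagation differential equations are all illustrative choices, not prescribed by the example:

```python
import math

# Modified truncated second order filter for dx = -[x + a x^3] dt + [1 + b x^2] dbeta,
# z(t_i) = sin x(t_i) + v(t_i).  Illustrative constants only.
a, b, Q, R = 0.1, 0.05, 0.2, 0.04

def propagate(x, P, dt, n_sub=50):
    """Euler integration of the (12-18)/(12-19)-type propagation equations."""
    h = dt / n_sub
    for _ in range(n_sub):
        dx = -(x + a * x**3) - 3.0 * a * x * P            # includes bias correction
        dP = (-2.0 * (1.0 + 3.0 * a * x**2) * P
              + (1.0 + b * x**2)**2 * Q
              + (2.0 * b * x)**2 * P * Q
              + 2.0 * b * (1.0 + b * x**2) * P * Q)
        x, P = x + h * dx, P + h * dP
    return x, P

def update(x, P, z):
    """Modified truncated second order measurement update (b1 = 0)."""
    A = math.cos(x)**2 * P - 0.25 * math.sin(x)**2 * P**2 + R
    K = P * math.cos(x) / A
    x_new = x + K * (z - math.sin(x) + 0.5 * math.sin(x) * P)
    P_new = P - K * math.cos(x) * P
    return x_new, P_new

x_est, P_est = 1.0, 0.5                  # illustrative initial mean and variance
x_est, P_est = propagate(x_est, P_est, 0.5)
x_est, P_est = update(x_est, P_est, 0.8)
```

Dropping the two bias correction terms and the A denominator modification recovers the first order filter shown earlier, which makes the incremental cost of the second order corrections easy to see: a few extra scalar terms per step.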


The time propagation for the modified Gaussian second order filter would be the same except for an additional term,

+ 3b² P²(t/t_{i-1}) Q

added to the dP(t/t_{i-1})/dt equation. The update would differ from the preceding result only in the evaluation of the gain denominator:

K_GS(t_i) = P(t_i⁻)[cos x̂(t_i⁻)] / {[cos² x̂(t_i⁻)] P(t_i⁻) + (1/2)[sin² x̂(t_i⁻)] P²(t_i⁻) + R}  •

Second order filters provide performance generally superior to that of first order techniques, such as the extended Kalman filter, especially when nonlinearities are significant and noise strengths Q(t) and R(t_i) associated with the system model, (12-1) and (12-2), are small [4, 22, 31, 54, 91, 112, 114, 130, 145]. Usually, the primary benefit is due to the bias correction terms [4] in state estimate propagation and residual generation:

dx̂(t/t_{i-1})/dt = f[x̂(t/t_{i-1}), t] + b_p(t/t_{i-1})    (12-40a)

ẑ(t_i⁻) = h[x̂(t_i⁻), t_i] + b_m(t_i⁻)    (12-40b)

where the propagation bias term b_p(t/t_{i-1}) and measurement bias term b_m(t_i⁻) are given by

b_p(t/t_{i-1}) = (1/2)(∂²f[x̂(t/t_{i-1}), t]/∂x²) P(t/t_{i-1})    (12-41a)

b_m(t_i⁻) = (1/2)(∂²h[x̂(t_i⁻), t_i]/∂x²) P(t_i⁻)    (12-41b)

The performance enhancement to be gained is related to the size of the second partial derivatives and error variances in these expressions. Intuitively, the greater the magnitude or harshness of the nonlinearity, the greater the benefit of accounting for the higher order terms. Moreover, the improvement is in fact often exhibited in the form of estimate bias reduction. This is as expected since, for example, the residuals {z(t_i) - ẑ(t_i^-)} using (12-40b) are zero mean to second order, whereas {z(t_i) - h[x̂(t_i^-), t_i]} are zero mean only to first order. The importance of the bias correction term b_p(t/t_{i-1}) is reduced if the dynamics equations are driven by a high strength noise, and similarly the significance of b_m(t_i^-) is lowered if there is considerable measurement corruption noise: biases due to neglecting higher order effects are masked by the substantial spreading of the density functions. Looking at (12-28) or (12-36), it can be seen that measurement nonlinearities will be significant if {[∂²h/∂x²]²P²(t_i^-)} is large compared to R(t_i), and empirical results [22, 31, 54, 114, 131] have corroborated that nonlinear effects are most important when noise inputs are small while estimation error variance is relatively large.
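For concreteness, the scalar bias terms (12-41) can be evaluated directly for the scalar example used in this chapter, with f[x, t] = -[x + ax³] and h[x, t_i] = sin x. The sketch below (function names and the numeric value of a are illustrative, not from the text) computes b_p and b_m and forms the bias-corrected quantities of (12-40):

```python
import math

# Scalar example system of this chapter: f(x) = -(x + a*x**3), h(x) = sin(x).
A_COEF = 0.5  # hypothetical value of the parameter "a"

def f_xx(x):
    # Second derivative of f(x) = -(x + a*x**3) is -6*a*x.
    return -6.0 * A_COEF * x

def h_xx(x):
    # Second derivative of h(x) = sin(x) is -sin(x).
    return -math.sin(x)

def b_p(x_hat, P):
    # Propagation bias term (12-41a): (1/2) * f_xx(x_hat) * P
    return 0.5 * f_xx(x_hat) * P

def b_m(x_hat, P):
    # Measurement bias term (12-41b): (1/2) * h_xx(x_hat) * P
    return 0.5 * h_xx(x_hat) * P

def x_hat_dot(x_hat, P):
    # Bias-corrected propagation derivative (12-40a)
    return -(x_hat + A_COEF * x_hat**3) + b_p(x_hat, P)

def z_predicted(x_hat, P):
    # Bias-corrected residual-generation term (12-40b)
    return math.sin(x_hat) + b_m(x_hat, P)
```

Note that both bias terms scale with P, illustrating the point above: when the estimation error variance is small, the corrections contribute little.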

In fact, this is exploited in the tuning of filters by purposely adding "pseudonoise" to account for model inadequacies such as neglected higher order terms, as a frequently viable alternative to accounting explicitly for these terms with substantially more complicated filter computations [88]. Often a first order filter, such as an extended Kalman filter, is designed and tuned for a given application. If performance (as indicated by a Monte Carlo analysis [96, 100]) is not acceptable, first order filters based on better models are considered, as well as higher order filters. When biases due to neglected nonlinearities make it essential to design higher order filters, the modified Gaussian second order filter is often developed first. However, especially in the vector case to follow, the full-scale second order filter may not actually be implemented. Rather, a first order filter with bias correction terms, incorporating (12-40) and (12-41) but without altering the covariance or gain expressions of the first order filter, may be generated to obtain the essential benefit of second order filtering without the computational penalty of additional, time-consuming (matrix) second moment calculations.

EXAMPLE 12.2 The first order filter with bias correction terms for the problem defined in Example 12.1 would have the same dP(t/t_{i-1})/dt, K(t_i), and P(t_i^+) relations as the first order filter results depicted earlier, but with the dx̂(t/t_{i-1})/dt and x̂(t_i^+) equations as for either second order filter.    •

First order filters with bias correction terms can also be established with covariance and gain calculations as given by the extended Kalman filter, or even as given by the linearized Kalman filter. It has also been suggested [3, 79] that (12-40) be incorporated into a first order filter design, with the biases treated as additional states to be estimated. Many of these ideas will be brought out further in the examples to follow the development of the second order filters for the vector case.

The modified truncated second order filter [54, 130] for the vector case updates the state estimate x̂(t_i^-) with the measurement z(t_i, ω_j) = z_i at sample time t_i via

A_TS(t_i) = H[t_i; x̂(t_i^-)] P(t_i^-) Hᵀ[t_i; x̂(t_i^-)] - b_m(t_i^-)b_mᵀ(t_i^-) + R(t_i)    (12-42)

K_TS(t_i) = P(t_i^-) Hᵀ[t_i; x̂(t_i^-)] A_TS⁻¹(t_i)    (12-43)

x̂(t_i^+) = x̂(t_i^-) + K_TS(t_i){z_i - h[x̂(t_i^-), t_i] - b_m(t_i^-)}    (12-44)

P(t_i^+) = P(t_i^-) - K_TS(t_i) H[t_i; x̂(t_i^-)] P(t_i^-)    (12-45)

where H[t_i; x̂(t_i^-)] is defined as the m-by-n partial derivative matrix:

H[t_i; x̂(t_i^-)] ≜ ∂h[x, t_i]/∂x |_{x = x̂(t_i^-)}    (12-46)

and the bias correction term b_m(t_i^-) is the m-vector with kth component given as

b_mk(t_i^-) ≜ ½ tr{(∂²h_k[x̂(t_i^-), t_i]/∂x²) P(t_i^-)}    (12-47)

These relations are direct extensions of (12-28)-(12-31), (12-40b), and (12-41b). The estimate is propagated forward to the next sample time t_{i+1} by using the initial conditions provided by (12-44) and (12-45):

x̂(t_i/t_i) = x̂(t_i^+)    (12-48a)

P(t_i/t_i) = P(t_i^+)    (12-48b)

and integrating

dx̂(t/t_i)/dt = f[x̂(t/t_i), t] + b_p(t/t_i)    (12-49)

dP(t/t_i)/dt = F[t; x̂(t/t_i)]P(t/t_i) + P(t/t_i)Fᵀ[t; x̂(t/t_i)] + \widehat{GQGᵀ}    (12-50)

where the bias correction term b_p(t/t_i) is the n-vector with kth component

b_pk(t/t_i) ≜ ½ tr{(∂²f_k[x̂(t/t_i), t]/∂x²) P(t/t_i)}    (12-51)

F[t; x̂(t/t_i)] is the n-by-n partial derivative matrix

F[t; x̂(t/t_i)] ≜ ∂f[x, t]/∂x |_{x = x̂(t/t_i)}    (12-52)

and the last term in (12-50) is an n-by-n matrix with ij element given as (dropping time and x̂(t/t_i) arguments for convenience):

[\widehat{GQGᵀ}]_{ij} = Σ_{k=1}^{s} Σ_{l=1}^{s} [ G_{ik}Q_{kl}G_{lj} + tr{(∂G_{ik}ᵀ/∂x) Q_{kl} (∂G_{lj}/∂x) P}
    + ½ G_{ik}Q_{kl} tr{(∂²G_{lj}/∂x²) P} + ½ tr{P (∂²G_{ik}/∂x²)} Q_{kl}G_{lj} ]    (12-53)

where s is the dimension of the dynamic driving noise. Upon integrating these to the next sample time, x̂(t_{i+1}^-) and P(t_{i+1}^-) are defined as

x̂(t_{i+1}^-) = x̂(t_{i+1}/t_i)    (12-54a)

P(t_{i+1}^-) = P(t_{i+1}/t_i)    (12-54b)

for use in the next measurement update. These equations are similarly direct extensions of (12-17)-(12-19), (12-40a), and (12-41a). Note the direct comparison between this filter and the extended Kalman filter given by (9-61)-(9-69).

Now consider the modified Gaussian second order filter for the vector case [4, 54, 72, 130]. The measurement update at time t_i is given by

A_GS(t_i) = H[t_i; x̂(t_i^-)] P(t_i^-) Hᵀ[t_i; x̂(t_i^-)] + B_m(t_i^-) + R(t_i)    (12-55)

K_GS(t_i) = P(t_i^-) Hᵀ[t_i; x̂(t_i^-)] A_GS⁻¹(t_i)    (12-56)

x̂(t_i^+) = x̂(t_i^-) + K_GS(t_i){z_i - h[x̂(t_i^-), t_i] - b_m(t_i^-)}    (12-57)

P(t_i^+) = P(t_i^-) - K_GS(t_i) H[t_i; x̂(t_i^-)] P(t_i^-)    (12-58)

where H[t_i; x̂(t_i^-)] and b_m(t_i^-) are as defined in (12-46) and (12-47), and B_m(t_i^-) is an m-by-m matrix with klth element

B_mkl(t_i^-) ≜ ½ tr{(∂²h_k[x̂(t_i^-), t_i]/∂x²) P(t_i^-) (∂²h_l[x̂(t_i^-), t_i]/∂x²) P(t_i^-)}    (12-59)

Note the difference in form between this and the corresponding term in (12-42); both reduce to a scalar times [∂²h/∂x²]²P²(t_i^-) in the scalar case. These results are the vector case of (12-36)-(12-39). The propagation relations corresponding to the scalar case of (12-33)-(12-35) are identical to (12-48)-(12-52) and (12-54), but an additional term appears in \widehat{GQGᵀ} as compared to the previous result (12-53). To express the result conveniently, define

G'[x(t), t] ≜ G[x(t), t] Q^{1/2}(t)    (12-60)

as in (11-94), and then \widehat{GQGᵀ} = \widehat{G'G'ᵀ} is an n-by-n matrix with ij element given as [54; 85, Appendix E] (see Problem 10.9)

[\widehat{GQGᵀ}]_{ij} = Σ_{k=1}^{s} [ G'_{ik}G'_{jk} + tr{(∂G'_{ik}ᵀ/∂x)(∂G'_{jk}/∂x) P}
    + ½ G'_{ik} tr{(∂²G'_{jk}/∂x²) P} + ½ tr{P (∂²G'_{ik}/∂x²)} G'_{jk}
    + ¼ tr{(∂²G'_{ik}/∂x²) P} tr{(∂²G'_{jk}/∂x²) P} + ½ tr{(∂²G'_{ik}/∂x²) P (∂²G'_{jk}/∂x²) P} ]    (12-61)

Note that the first four terms replicate (12-53), and the last two terms correspond to the last term of (12-35) for the scalar case. These filter relations are also directly comparable to the extended Kalman filter, (9-61)-(9-69).
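Since (12-59) involves only traces of Hessian-covariance products, it is compact to implement; a sketch with caller-supplied analytic Hessians (names are illustrative, not from the text):

```python
import numpy as np

def gaussian_second_order_Bm(hessians, P):
    """B_m(t_i^-) of Eq. (12-59): (B_m)_kl = 0.5 * tr{H_k P H_l P},
    where H_k is the Hessian of the kth measurement component,
    evaluated at the current state estimate."""
    m = len(hessians)
    Bm = np.zeros((m, m))
    for k in range(m):
        for l in range(m):
            Bm[k, l] = 0.5 * np.trace(hessians[k] @ P @ hessians[l] @ P)
    return Bm
```

In the scalar case with h = sin x, this reduces to ½ sin²(x̂)P², the term appearing in the Gaussian second order gain denominator quoted earlier.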

EXAMPLE 12.3 To demonstrate the performance difference between a first order (extended Kalman) and the modified Gaussian second order filters, a problem [4] that has significant nonlinearities in both the state dynamics and measurement equations is explored. Consider the estimation of the altitude, velocity, and constant ballistic coefficient of a vertically falling body, as shown in Fig. 12.1. Sampled-data measurements of range (only) are provided by a radar, corrupted by wideband noise assumed to be well modeled as white. Note in the figure that the body is assumed

FIG. 12.1 Geometry of estimation problem (radar at altitude H and horizontal offset M from the body's vertical trajectory; range r(t) to the body at altitude x₁(t)). From [4], © 1968 IEEE.

to be falling vertically to simplify this example, that the horizontal miss distance M between the radar location and the trajectory of the body is assumed to be known (100,000 ft), that the altitude H of the radar is to be fixed at one of three values corresponding to three different scenarios, and that vertical velocity x₂ is defined as positive downward while altitude x₁ is positive upward.

Assuming the effect of gravity to be negligible compared to atmospheric deceleration as a further simplification, the two basic state equations become

ẋ₁(t) = -x₂(t)

ẋ₂(t) = -[C_D A ρ(t)/(2m)] x₂²(t) + w(t)

where C_D is the constant drag coefficient of the body, A is its reference area used for drag evaluation, m is its constant mass, and ρ(t) is the mass density of the atmosphere. The air density is approximated by the exponential function

ρ(t) = ρ₀ exp{-γ x₁(t)}

where ρ₀ is sea level density (1.22 kg/m³ = 0.076 lb/ft³) and γ = 5 × 10⁻⁵ ft⁻¹. Thus, if we define a third state variable x₃ ≜ [C_D A ρ₀]/[2m] as a constant whose value is not known precisely a priori, inversely related to the ballistic coefficient β ≜ mg/(C_D A) as x₃ = gρ₀/(2β), the state equations can be written as

ẋ₁(t) = -x₂(t) ≜ f₁

ẋ₂(t) = -x₂²(t) x₃(t) exp{-γ x₁(t)} + w(t) ≜ f₂ + w

ẋ₃(t) = 0 + w'(t) ≜ f₃ + w'

w(·,·) and w'(·,·) are independent zero-mean white Gaussian noises of strengths Q and Q', respectively, and can be considered "pseudonoises" for tuning purposes (Q = Q' = 0 for the "truth model" in simulations to follow). These state equations can be put into the form

dx(t) = f[x(t)] dt + G dβ(t)

i.e., time invariant and with G not a function of x(·,·). Radar measurements, assumed to be available once a second, are described in terms of the variables of Fig. 12.1 as

z(t_i) = [M² + (x₁(t_i) - H)²]^{1/2} + v(t_i)

with v(·,·) white Gaussian discrete-time noise of mean zero and

E{v(t_i)²} = R = const = 10,000 ft²

The purpose of allowing H to be set at 0, 100K ft, or 200K ft is to alter the effect of the measurement nonlinearity relative to the dynamics nonlinearities. As seen in Fig. 12.1, when the target body passes through the radar location altitude (i.e., x₁ = H), the range data is least useful in updating the state estimates; this is an observability problem in that variations in x₁ are imperceptible from range at x₁ = H, and little information is provided about x₁ from range when x₁ is nearly equal to H. Figure 12.2 plots the altitude and velocity time histories of a body with true x₃ value set at 3 × 10⁻³ ft⁻¹, and thus these time periods of least useful data were at about 5 sec for H = 200K ft, 10 sec for H = 100K ft, and at the end of simulation time for H = 0.

True initial conditions were x₁(0) = 300K ft, x₂(0) = 20K ft/sec, and x₃(0) = 3 × 10⁻³ ft⁻¹, while the filter assumed initial conditions modeled as independent Gaussian random variables, with means and variances

x̂₁(0) = 300K ft,   P₁₁(0) = 10⁶ ft²
x̂₂(0) = 20K ft/sec,   P₂₂(0) = 4 × 10⁶ ft²/sec²
x̂₃(0) = 3 × 10⁻⁵ ft⁻¹,

P₃₃(0) = 10⁻⁴ ft⁻²
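The truth-model dynamics above (with Q = Q' = 0) and the Fig. 12.1 range geometry can be simulated directly; the sketch below uses simple Euler integration (the step size and the explicit range expression are assumptions of this sketch):

```python
import math

GAMMA = 5e-5          # ft^-1, atmospheric decay constant
M = 100_000.0         # ft, horizontal miss distance
H = 100_000.0         # ft, radar altitude (one of the three scenarios)

def f(x):
    """Truth-model dynamics (Q = Q' = 0): x = [altitude, velocity, x3]."""
    x1, x2, x3 = x
    return [-x2, -x2**2 * x3 * math.exp(-GAMMA * x1), 0.0]

def h(x):
    """Radar range following the Fig. 12.1 geometry."""
    return math.sqrt(M**2 + (x[0] - H)**2)

def simulate(x0, t_final=30.0, dt=0.01):
    """Euler-integrate the deterministic trajectory."""
    x = list(x0)
    for _ in range(int(t_final / dt)):
        dx = f(x)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x

# True initial conditions from the text:
x_final = simulate([300_000.0, 20_000.0, 3e-3])
```

Consistent with Fig. 12.2, the body coasts at nearly constant velocity until it reaches the denser atmosphere, and then decelerates sharply.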

FIG. 12.2 Altitude and velocity time histories for true target (x₃ = 3 × 10⁻³). From [4], © 1968 IEEE.

Thus, the initial state estimates of altitude and velocity were exact, while the estimate of x₃ was initially very poor. Simulation results revealed that large initial errors in x₁ and x₂ were reduced substantially after only one or two measurements (respectively), so good initial values were used here to allow reasonable resolution of plotted error statistics. The error in the x₃ estimate corresponded to a light true target (β ≈ 16.6) while the filter assumed it to be heavy (β ≈ 1660). Note from Fig. 12.2 that correct estimation of x₃ cannot take place until the body descends into the denser atmosphere and deceleration characteristics allow the body's ballistic behavior to become apparent: the most notable dynamics nonlinearities started taking effect at about 9 sec, with maximum deceleration occurring between 9 and 16 sec in the simulation. Until this time, one could expect little difference in performance between estimators of different order.

A Monte Carlo analysis involving 50 simulation runs (each having different measurement noise samples) was conducted to evaluate an extended Kalman filter versus a modified Gaussian second order filter (using the same 50 noise history samples for each). To accentuate the differences, Q and Q' were assumed zero in the filters as well as in the true simulations. For each run, the error vector [x(t_i) - x̂(t_i^+)] was generated for all t_i, and true average and rms values computed as sample statistics. These actual rms errors can be compared to the filter-computed P(t_i^+) values as well.

Figures 12.3-12.5 plot the absolute value of the average errors in x₁ (altitude), x₂ (velocity), and x₃, respectively, for the case of the radar located on the ground (H = 0). Here the measurement

FIG. 12.3 Average altitude estimation errors of the two filters (H = 0). From [4], © 1968 IEEE.

FIG. 12.4 Average velocity estimation errors of the two filters (H = 0). From [4], © 1968 IEEE.

FIG. 12.5 Average parameter x₃ estimation errors of the two filters (H = 0). From [4], © 1968 IEEE.

FIG. 12.6 Average altitude estimation errors of the two filters (H = 100K ft). From [4], © 1968 IEEE.

nonlinearity was more pronounced towards the end of the trajectory, after the maximum deceleration region (the period of most significant effect of the dynamics nonlinearities). As expected, the performances were essentially indistinguishable until about 9 sec, after which the second order filter achieved an order of magnitude better estimate of x₃ and better precision in the other estimates as well.

Figure 12.6 is directly comparable to Fig. 12.3, but for the case of H = 100K ft. Here the measurement nonlinearity and observability problem were most severe at about 10 sec, causing estimate degradation. Upon recovery, the second order filter again outperformed the first order significantly; an order of magnitude improvement in x₃ estimation was achieved, as in the H = 0 case.

Figure 12.7 plots the same results for H = 200K ft, for which the measurement information was least useful at 5 sec, resulting in increased estimation errors at this time. Again, an order of magnitude enhancement was realized in x₃ estimation by using the second order filter. Figure 12.8 depicts the corresponding actual rms altitude errors and the average second order filter computation of √P₁₁(t_i^+); the first order filter calculations were within one percent of these computed values. Thus, the marked improvement in performance of the second order filter did not appear to be due to the more complex covariance and gain calculations. To corroborate this, an extended Kalman filter with bias correction terms, as given by (12-44), (12-47), (12-49), and (12-51), was evaluated, and the results were virtually the same as those of the second order filter.

In Fig. 12.8 it can be seen that the filter-computed error covariance was a better representation of true error covariance in the second order filter than in the extended Kalman filter, but both filters underestimated their own errors. This was due to setting Q and Q' to zero in the filters. Because of its better internal model (second order terms included), the second order filter would require less pseudonoise addition for tuning. Nevertheless, both filters would yield improved precision, and

FIG. 12.7 Average altitude estimation errors of the two filters (H = 200K ft). From [4], © 1968 IEEE.

FIG. 12.8 Actual RMS altitude errors and filter-computed values (H = 200K ft). From [4], © 1968 IEEE.

their differences would be reduced, by adding enough pseudonoise to each to accomplish proper tuning (i.e., yielding smallest rms errors; matching computed variances to true rms errors is more conservative than matching to true error variances, but still does not guarantee elimination of biases and serious performance degradation in the lower order filter).    •

EXAMPLE 12.4 Residual monitoring can be used in an approximate fashion for reasonableness checking, sensor failure detection, and adaptation in nonlinear filtering, as discussed in Volume 1 for linear filtering. By monitoring the residuals in real-time operation, one can determine if they are well approximated as white and zero-mean, with covariance given by A_TS(t_i) or A_GS(t_i).


Consider the outputs of an aircraft's flight control system rate gyros and the Euler angle orientation information provided by either the inertial navigation system (INS) or the attitude and heading reference system (AHRS). Inherent functional redundancy among the signals from these different sensors can be exploited, rather than resorting solely to hardware duplication, to achieve fault detection and high reliability in the overall data system [12, 29, 86-88, 92, 93].

If ω_x, ω_y, and ω_z are roll rate, pitch rate, and yaw rate, respectively, as would be measured by the rate gyros, and if ψ, θ, and φ are Euler angles of yaw, pitch, and roll relative to a reference coordinate frame, then the functional relationships among the variables are

θ̇ = ω_y cos φ - ω_z sin φ

φ̇ = ω_x + ω_y tan θ sin φ + ω_z tan θ cos φ

ψ̇ = [ω_y sin φ + ω_z cos φ]/cos θ

These provide the basis of a dynamics model by letting x = [θ, φ, ψ]ᵀ and letting the rate gyro outputs be denoted as

u ≜ [u₁, u₂, u₃]ᵀ = [ω_x, ω_y, ω_z]ᵀ + [w₁, w₂, w₃]ᵀ

where w(·,·) is zero-mean white Gaussian noise with E{w(t)wᵀ(t + τ)} = Qδ(τ). Then ω_x can be replaced by [u₁ - w₁], and so forth, in the previous differential equations, and a realization of u(·,·) would be available as real-time rate gyro data. Thus is generated

ẋ₁ = (u₂ - w₂) cos x₂ - (u₃ - w₃) sin x₂

ẋ₂ = (u₁ - w₁) + (u₂ - w₂) tan x₁ sin x₂ + (u₃ - w₃) tan x₁ cos x₂

ẋ₃ = [(u₂ - w₂) sin x₂ + (u₃ - w₃) cos x₂]/cos x₁

or

dx(t) = f[x(t), u(t)] dt + G[x(t)] dβ(t)

The discrete-time attitude measurements from the INS can be modeled as true values plus white Gaussian noise:

z(t_i) = x(t_i) + v(t_i)

Thus a three-state nonlinear filter can be generated to combine rate gyro and INS outputs. A separate filter of identical structure (but different R) can combine rate gyro and AHRS data. If abnormal residual growth is confined to the first filter, the failure can be isolated as occurring in the INS; if only in the second filter, the failure can be declared in the AHRS; if residuals in both filters are exhibiting abnormal behavior, the fault most likely occurred in the rate gyros, since their outputs feed both filters (simultaneous failures in the INS and AHRS are of much lower probability).
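The kinematic function f[x, u] defined by the relations above can be sketched as follows, with the noise terms w set to zero and state ordering x = [θ, φ, ψ]:

```python
import math

def attitude_f(x, u):
    """Euler angle kinematics driven by rate gyro outputs (w taken as zero).
    x = [theta (pitch), phi (roll), psi (yaw)], u = [u1, u2, u3]."""
    theta, phi, _psi = x
    u1, u2, u3 = u
    theta_dot = u2 * math.cos(phi) - u3 * math.sin(phi)
    phi_dot = (u1 + u2 * math.tan(theta) * math.sin(phi)
               + u3 * math.tan(theta) * math.cos(phi))
    psi_dot = (u2 * math.sin(phi) + u3 * math.cos(phi)) / math.cos(theta)
    return [theta_dot, phi_dot, psi_dot]
```

At level attitude (θ = φ = 0) the Euler angle rates coincide with the body rates, and the cos θ divisor exhibits the familiar singularity as pitch approaches ±90°.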

As discussed in Section 5.4 of Volume 1, a failure can be indicated by a likelihood function, defined in terms of a conditional joint density for the N most recent values of a particular residual component, passing a predetermined threshold. For the kth residual component, this becomes

L_{N_k}(t_i) = Σ_{j=i-N+1}^{i} ln f_{r_k(t_j)|r_k(t_{j-1}),...,r_k(t_1)}(ρ_j | ρ_{j-1}, ..., ρ₁)

FIG. 12.9 Roll state likelihood function during turn: no failures. From [87], © 1976 IEEE.

FIG. 12.10 Roll state likelihood function during turn: INS gyro float leak. From [87], © 1976 IEEE.

using the approximating description of the residuals as zero-mean, white, and Gaussian, with σ_k² as computed through A_TS(t_i) or A_GS(t_i). By appropriately choosing N and the threshold for each L_{N_k}, false and missed alarms can be minimized for "hard failures," those which cause abrupt, large changes in L_{N_k}.

"Soft failures" require more sophisticated detection logic. Figure 12.9 depicts the likelihoodfunction corresponding to the second residual (on the roll state t/J) of the INS/rate gyro filter duringa turn, with no failures, in a simulated flight (actually for a first order filter, but representative ofsecond order filter characteristics as well). The transients seen in the figure correspond to times ofrapid roll rates (snap rolls) at the beginning and end of a coordinated turn. At these times, the trueresidual magnitudes far surpass the statistical characterization ((J/(t;) terms), particularly becauseof the very simple dynamics model and first order integration method employed in the filter forcomputational simplicity. Essentially, the filter's internal model does not provide an adequaterepresentation of the dynamics at these times, and the likelihood function undergoes a transientmagnitude growth. Figure 12.10 portrays the same trajectory simulation, but with an INS gyrofloat leak "soft" failure. Although the failure characteristic is evident, the growth due to this truefailure never exceeds the magnitudes encountered in normal operation.

One solution to this difficulty is "time-to-failure-declaration" parameters, specifying a time interval length over which a threshold must be surpassed consistently before declaring a failure. This allows tighter thresholds, and thereby higher sensitivity to "soft" or hard-to-discern failures, without causing false alarms due to transitory threshold surpassing. Moreover, a second threshold, of magnitude greater than that attained under any normal circumstances, can be associated with each likelihood function to allow immediate declaration of "hard" failures.
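The declaration logic just described — a tight threshold that must be surpassed persistently, plus a larger threshold declared immediately — can be sketched as follows (the window length and threshold values are purely illustrative, not values from the text):

```python
import math
from collections import deque

class ResidualMonitor:
    """Sliding-window log-likelihood monitor for one residual component,
    treating residuals as zero-mean Gaussian with variance sigma2
    (as computed through A_TS or A_GS)."""

    def __init__(self, N=10, soft_threshold=-40.0, persist_steps=5,
                 hard_threshold=-200.0):
        self.window = deque(maxlen=N)      # N most recent log-density values
        self.soft_threshold = soft_threshold
        self.persist_steps = persist_steps  # "time-to-failure-declaration"
        self.hard_threshold = hard_threshold
        self.below_count = 0

    def update(self, residual, sigma2):
        # ln of the N(0, sigma2) density evaluated at the residual
        self.window.append(-0.5 * (math.log(2 * math.pi * sigma2)
                                   + residual**2 / sigma2))
        L = sum(self.window)
        if L < self.hard_threshold:
            return "hard failure"          # declared immediately
        self.below_count = self.below_count + 1 if L < self.soft_threshold else 0
        if self.below_count >= self.persist_steps:
            return "soft failure"          # threshold surpassed persistently
        return "ok"
```

Transitory surpassings of the soft threshold (as during snap rolls) reset nothing permanently: the persistence counter clears as soon as the statistic recovers.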


More sophisticated methods, such as using generalized likelihood ratios [30, 32, 34, 143, 144], have also been applied to this problem.    •

Higher order filters can be generated by including higher order terms from the Taylor series expansions of f and h. However, the severe computational disadvantage makes such filters unattractive, and first or second order filters based on better models are preferable.

In some cases, it is beneficial to utilize Eqs. (12-9)-(12-14), and the vector case equivalent of (12-21) and (12-26), without such series representations for f and h, instead evaluating the needed conditional expectations based on an assumed conditional density function form [44, 68-70]. Typically, f_{x(t)|Z(t_{i-1})}(ξ|Z_{i-1}) is assumed to be Gaussian with mean x̂(t/t_{i-1}) and covariance P(t/t_{i-1}) as computed within the algorithm itself. This is especially convenient when nonlinearities simply involve powers of x (or products of components x_j, x_k, etc., in the vector case). Moreover, since no Taylor series for f and h are required, these functions are not required to be differentiable with respect to x. Thus, this form of assumed density filter is applicable to problems involving such discontinuous nonlinearities as relays and saturations, for which the previous filters are not suitable.

EXAMPLE 12.5 Consider the problem of Examples 12.1 and 12.2. For propagations, (12-9) and (12-10) yield

dx̂/dt = f̂ = -[x̂ + a\widehat{x³}]

dP/dt = 2\widehat{xf} - 2x̂f̂ + \widehat{G²}Q = -2[\widehat{x²} + a\widehat{x⁴}] + 2[x̂ + a\widehat{x³}]x̂ + [1 + 2b\widehat{x²} + b²\widehat{x⁴}]Q

Now expressing \widehat{x³} and \widehat{x⁴} under the assumption that f_{x(t)|Z(t_{i-1})}(ξ|Z_{i-1}) is Gaussian, i.e.,

\widehat{x³} = ∫_{-∞}^{∞} ξ³ f_{x(t)|Z(t_{i-1})}(ξ|Z_{i-1}) dξ = 3x̂P + x̂³

\widehat{x⁴} = ∫_{-∞}^{∞} ξ⁴ f_{x(t)|Z(t_{i-1})}(ξ|Z_{i-1}) dξ = 3P² + 6x̂²P + x̂⁴

results in an estimator given by

dx̂/dt = -x̂ - ax̂³ - 3ax̂P

dP/dt = -2[1 + 3ax̂² + 3aP]P + [(1 + bx̂²)² + 2bP + 3b²P² + 6b²x̂²P]Q

This agrees with the Gaussian second order filter of Example 12.1 except for the appearance of the 3aP in the first bracketed term of dP/dt, the difference being due to retaining versus neglecting a higher order term in that earlier development, and yielding a term that is dominated by the other terms in those brackets.

Measurement updating can be accomplished by using (12-13) and (12-14), and evaluating the conditional expectations under the assumption that f_{x(t_i)|Z(t_{i-1})}(ξ|Z_{i-1}) is Gaussian. However, it is usually better to invoke the residual power series assumption (12-21), with b_i ≡ 0, producing (12-26), in which the conditional expectations can be evaluated under the Gaussian assumption. If h[x(t_i), t_i] had been a simple power of x(t_i) instead of sin x(t_i), these results could be written down without


further approximation, but here such terms as ĥ must be left in the form

ĥ = ∫_{-∞}^{∞} (sin ξ) f_{x(t_i)|Z(t_{i-1})}(ξ|Z_{i-1}) dξ

to be integrated numerically or approximated in some fashion. The results are, using x̂ and P as propagated from the previous relations,

x̂(t_i^+) = x̂ + [\widehat{xh} - x̂ĥ][\widehat{h²} - ĥ² + R]⁻¹{z(t_i) - ĥ}

P(t_i^+) = P - [\widehat{xh} - x̂ĥ][\widehat{h²} - ĥ² + R]⁻¹[\widehat{xh} - x̂ĥ]    •
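The Gaussian-assumption evaluations above are easy to check by brute-force quadrature; the sketch below verifies \widehat{x³} and \widehat{x⁴} against the closed forms, and computes ĥ for h = sin x, which for a Gaussian density also has the known closed form e^{-P/2} sin x̂ (used here only as a cross-check):

```python
import math

def gauss_expect(g, mean, var, half_width=8.0, n=200_000):
    """Midpoint-rule approximation of E{g(x)} for x ~ N(mean, var)."""
    s = math.sqrt(var)
    lo, hi = mean - half_width * s, mean + half_width * s
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        xi = lo + (i + 0.5) * dx
        total += g(xi) * math.exp(-0.5 * ((xi - mean) / s) ** 2)
    return total * dx / (s * math.sqrt(2 * math.pi))

x_hat, P = 0.5, 0.2
m3 = gauss_expect(lambda x: x**3, x_hat, P)  # should equal 3*x_hat*P + x_hat**3
m4 = gauss_expect(lambda x: x**4, x_hat, P)  # should equal 3P^2 + 6 x_hat^2 P + x_hat^4
h_hat = gauss_expect(math.sin, x_hat, P)     # closed form: exp(-P/2) * sin(x_hat)
```

In a real filter implementation, ĥ and \widehat{xh} would be evaluated by such numerical quadrature (or an approximation of it) at every measurement time.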

Higher order moment filters can be generated by writing explicit moment equations as in (12-9)-(12-14), and for higher order moments as well, again using the forward Kolmogorov equation (11-96) or the Ito differential rule (11-84) to create propagation relations and Bayes' rule to establish conditional density update relationships. For instance, by taking the conditional expectation of (11-84) and using the notation of (12-11), we get

\widehat{dψ[x(t), t]/dt} = \widehat{∂ψ/∂t} + \widehat{(∂ψ/∂x) f[x(t), t]} + ½ \widehat{tr{G[x(t), t] Q(t) Gᵀ[x(t), t] ∂²ψ/∂x²}}    (12-62)

and by letting the scalar ψ[x(t), t] be successively set to x_j(t), [x_j(t)x_k(t)], [x_j(t)x_k(t)x_l(t)], etc., individual scalar components of moments of all orders can be generated. Then an assumption about the densities or moments would again be invoked to allow approximate estimator algorithms to be developed in implementable form. One useful technique is cumulant truncation [97, 110, 111, 120], and in fact the Gaussian second order filter can be viewed as a special case of a truncated cumulant filter.

Consider the scalar case. In both second order filters discussed previously, we wrote expressions for various terms assuming that higher order central moments associated with f_{x(t)|Z(t_i)}(ξ|Z_i) could be ignored. These central moments are the coefficients of a Taylor series for the conditional characteristic function for x(t) minus its conditional mean x̂(t),

E{e^{jμ[x(t) - x̂(t)]} | Z(t_i) = Z_i}

while noncentral moments are the coefficients of a Taylor series for the conditional characteristic function

E{e^{jμx(t)} | Z(t_i) = Z_i}

Cumulants are the coefficients of a Taylor series for

ln E{e^{jμx(t)} | Z(t_i) = Z_i}

It is often a better approximation to set higher order cumulants to zero than to ignore higher order moments (especially if [x(t) - x̂(t)] realizations are generally larger than one, so that successive higher order moments are larger rather than decreasing toward zero). If x(t) is Gaussian, then the first two


cumulants are the conditional mean and variance, and all higher order cumulants are identically zero (not just "negligible"): for x(t) Gaussian, the characteristic function is exp{jμx̂(t/t_i) - ½P(t/t_i)μ²}, so that

ln E{e^{jμx(t)} | Z(t_i) = Z_i} = jμx̂(t/t_i) - ½μ²P(t/t_i) = Σ_{k=1}^{∞} c_k (jμ)^k/k!    (12-63)

with c₁ = x̂(t/t_i), c₂ = P(t/t_i), c₃ = c₄ = ··· ≡ 0. Thus, by assuming c_k = 0 for all k > 2, we are in fact making the same Gaussian density assumption used previously. If we truncate at a level higher than k = 2, we obtain higher order non-Gaussian corrections that are appropriate in the sense that cumulants do go to zero with increasing order (though not necessarily monotonically). One is not guaranteed uniformly improved performance by going to higher order, and performance evaluations and comparisons must be accomplished via Monte Carlo simulations in general.

In the general scalar case, the cumulants c_k and noncentral moments m_k can be directly related by exponentiating the series for [ln E{e^{jμx(t)} | Z(t_i) = Z_i}] and equating coefficients to those of like powers in the series for [E{e^{jμx(t)} | Z(t_i) = Z_i}], yielding [26]

c₁ = m₁
c₂ = m₂ - m₁² = m₂c
c₃ = m₃ - 3m₁m₂ + 2m₁³ = m₃c
c₄ = m₄ - 3m₂² - 4m₁m₃ + 12m₁²m₂ - 6m₁⁴ = m₄c - 3m₂c²
c₅ = ··· = m₅c - 10m₂cm₃c
    (12-64)

as expressed also in terms of central moments m_kc. Note that setting c₃ and c₄ to zero equates the third central moment to zero and invokes (12-32b), respectively.
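The relations (12-64) can be checked mechanically: the sketch below maps raw moments to the first four cumulants and confirms that c₃ and c₄ vanish when the moments are those of a Gaussian with mean μ and variance σ²:

```python
def cumulants_from_moments(m1, m2, m3, m4):
    """First four cumulants from noncentral moments, per Eq. (12-64)."""
    c1 = m1
    c2 = m2 - m1**2
    c3 = m3 - 3*m1*m2 + 2*m1**3
    c4 = m4 - 3*m2**2 - 4*m1*m3 + 12*m1**2*m2 - 6*m1**4
    return c1, c2, c3, c4

# Noncentral moments of a Gaussian with mean mu and variance s2:
mu, s2 = 1.3, 0.7
m1 = mu
m2 = mu**2 + s2
m3 = mu**3 + 3*mu*s2
m4 = mu**4 + 6*mu**2*s2 + 3*s2**2

c1, c2, c3, c4 = cumulants_from_moments(m1, m2, m3, m4)
```

As the text notes, for a Gaussian the first two cumulants recover the mean and variance exactly, while c₃ and c₄ are identically zero.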

EXAMPLE 12.6 Reconsider the problem introduced in Example 12.1. Letting the kth conditional moment be denoted as m_k(t), (12-62) yields successively, for ψ = x, x², x³:

ṁ₁ = f̂ = -[m₁ + am₃]

ṁ₂ = 2\widehat{xf} + \widehat{G²}Q = -2[m₂ + am₄] + [1 + 2bm₂ + b²m₄]Q

ṁ₃ = 3\widehat{x²f} + 3\widehat{G²x}Q = -3[m₃ + am₅] + 3[m₁ + 2bm₃ + b²m₅]Q

Expressions for m₄ and m₅ can be generated for a third order truncated cumulant filter by setting c₄ and c₅ to zero in (12-64), i.e.,

m₄ = 3m₂² + 4m₁m₃ - 12m₁²m₂ + 6m₁⁴

m₅ = 10m₂m₃ + 5m₁m₄ - 30m₁m₂² - 20m₁²m₃ + 60m₁³m₂ - 24m₁⁵


Substituting these into the differential equations above defines the propagation relations for the filter. Measurement updates can be generated, for example, by performing power series expansions as in (12-21), using Bayes' rule as in (12-26), and employing cumulant truncations to express the resulting moments.

Note that a second order cumulant truncation filter time propagation would be based on the ṁ₁ and ṁ₂ equations above, with m₄ expressed as already given but with m₃ provided by setting c₃ = 0 in (12-64) as well:

m₃ = 3m₁m₂ - 2m₁³

This yields

ṁ₁ = -m₁ - 3am₁m₂ + 2am₁³

ṁ₂ = -2[m₂ + 3am₂² - 2am₁⁴] + [1 + 2bm₂ + 3b²m₂² - 2b²m₁⁴]Q

or, in terms of x̂ = m₁ and P = m₂ - m₁²,

dx̂/dt = -x̂ - ax̂³ - 3ax̂P

dP/dt = -2[1 + 3ax̂² + 3aP]P + [(1 + bx̂²)² + 2bP + 3b²P² + 6b²x̂²P]Q

which agrees with the result of Example 12.5.    •

Finally, the results of this section can also be applied to the case in which the dynamics are described by a discrete-time (or discretized) model [2, 40, 50], instead of (12-1):

x(t_i) = φ[x(t_{i-1}), u(t_{i-1}), t_{i-1}] + G_d[x(t_{i-1}), t_{i-1}] w_d(t_{i-1})    (12-65)

with w_d(·,·) discrete-time, zero-mean, white Gaussian noise with covariance

E{w_d(t_i)w_dᵀ(t_j)} = Q_d(t_i) δ_ij    (12-66)

The true conditional mean and covariance would propagate as (using the notation of (12-11)):

x̂(t_i^-) = φ̂    (12-67a)

P(t_i^-) = \widehat{φφᵀ} - φ̂φ̂ᵀ + \widehat{G_d Q_d(t_{i-1}) G_dᵀ}    (12-67b)

where

φ ≜ φ[x(t_{i-1}), u(t_{i-1}), t_{i-1}]    (12-67c)

G_d ≜ G_d[x(t_{i-1}), t_{i-1}]    (12-67d)

These would replace (12-9) and (12-10), again giving only a partial description of the conditional density, and again requiring approximation for development of implementable estimation algorithms. Appropriately truncated Taylor series for φ and G_d can be developed, or assumed density filters can be generated that evaluate the required conditional expectations in (12-67) as though the conditional density were Gaussian with mean x̂(t_{i-1}^+) and covariance P(t_{i-1}^+).

12.4 CONDITIONAL QUASI-MOMENTS AND HERMITE POLYNOMIAL SERIES

The previous section discussed parameterizing the conditional density f_{x(t_i)|Z(t_i)}(ξ|Z_i) with moments to attempt to achieve useful approximations for estimation, but there are other possible parameterizations as well. One could generate a complete orthogonal series representation and then truncate the series at a specified order, and perhaps make other approximations as required, to obtain an implementable algorithm. It is particularly appropriate to consider Hermite polynomial series and the associated parameterization via quasi-moments, as will now be presented [20, 26-28, 38, 47, 59, 65, 66, 76-78, 116, 131, 132, 135].

To motivate Hermite polynomials, consider the scalar process \exp\{\beta(t) - \tfrac{1}{2}t\}, as discussed in Examples 11.5 and 11.6, where \beta(\cdot,\cdot) is scalar Brownian motion with unit diffusion. For Ito integrals and stochastic differential equations, this exponential plays the same role as \exp\{\beta(t)\} would if \beta(\cdot) were a deterministic continuous function of bounded variation. By applying the Ito differential rule, it can be shown that

\exp\{\beta(t) - \tfrac{1}{2}t\} = \sum_{i=0}^{\infty} \frac{H_i[t, \beta(t)]}{i!}   (12-68)

where H_i[t, \beta(t)] is the Hermite polynomial defined for i \ge 0 as

H_i[t, \beta] = (-t)^i \exp\{\beta^2/2t\}\, \partial^i[\exp\{-\beta^2/2t\}]/\partial\beta^i   (12-69)

This can be compared to the case involving deterministic \beta(\cdot):

\exp\{\beta(t)\} = \sum_{i=0}^{\infty} \frac{\beta(t)^i}{i!}   (12-70)

It can be seen that in Ito theory, the Hermite polynomials are the counterparts of the ordinary powers for deterministic functions [59]. Thus, when dealing with Ito nonlinear stochastic differential equations, instead of using moments

m_i(t) = E\{x^i(t)\} = \int_{-\infty}^{\infty} \xi^i f_{x(t)}(\xi)\, d\xi   (12-71a)

or central moments

\mu_i(t) = E\{[x(t) - m_1(t)]^i\}   (12-71b)

to parameterize the density f_{x(t)}(\xi), it might be more natural to use "quasi-moments" defined as

q_i(t) = E\{H_i(x(t))\} = \int_{-\infty}^{\infty} H_i(\xi)\, f_{x(t)}(\xi)\, d\xi   (12-72)

where

H_i(\xi) = (-1)^i \exp\{\xi^2/2\}\, d^i[\exp\{-\xi^2/2\}]/d\xi^i

or defined recursively as

H_0(\xi) = 1   (12-73a)

H_1(\xi) = \xi   (12-73b)

H_{i+1}(\xi) = \xi H_i(\xi) - i H_{i-1}(\xi)   (12-73c)
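The recursion H_0(ξ) = 1, H_1(ξ) = ξ, with the standard probabilists' step H_{i+1}(ξ) = ξH_i(ξ) − iH_{i−1}(ξ) (assumed here as the conventional completion of (12-73)), is trivial to evaluate numerically:

```python
def hermite(i, xi):
    """Probabilists' Hermite polynomial H_i(xi) via the recursion
    H_0 = 1, H_1 = xi, H_{k+1} = xi*H_k - k*H_{k-1}."""
    h_prev, h = 1.0, xi
    if i == 0:
        return h_prev
    for k in range(1, i):
        h_prev, h = h, xi * h - k * h_prev
    return h
```

For instance, the recursion reproduces H_2(ξ) = ξ² − 1 and H_3(ξ) = ξ³ − 3ξ, the polynomials used in the moment relations below.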

The extension to conditional quasi-moments would be obvious. Now consider expanding some density of interest, f_{x(t)}(\xi), in a series representation, and let f^0_{x(t)}(\xi) be some reference density. An arbitrary (measurable, square integrable, real) density function f_{x(t)}(\xi) can be expressed in terms of a complete orthonormal set of basis functions \{\phi_i(\xi) p_0(\xi)\}, where p_0^2(\xi) = f^0_{x(t)}(\xi), via a generalized Gram-Charlier series

f_{x(t)}(\xi) = \sum_{i=0}^{\infty} k_i \phi_i(\xi)\, p_0^2(\xi) = \sum_{i=0}^{\infty} k_i \phi_i(\xi)\, f^0_{x(t)}(\xi)   (12-74)

where the coefficients k_i are "general quasi-moments" of f_{x(t)}(\xi):

k_i = \int_{-\infty}^{\infty} \phi_i(\xi)\, f_{x(t)}(\xi)\, d\xi   (12-75)

For scalar x(t) and f^0_{x(t)}(\xi) chosen to be a Gaussian density with zero mean and unit variance,

f^0_{x(t)}(\xi) = (1/\sqrt{2\pi}) \exp\{-\xi^2/2\}   (12-76)

the \phi_i(\xi) functions in (12-74) are the scaled Hermite polynomials

\phi_i(\xi) = \frac{1}{\sqrt{i!}}\, H_i(\xi)   (12-77)

and the coefficients k_i are the conventional quasi-moments q_i defined in (12-72). Moreover, the functions \{\phi_i(\xi) p_0(\xi)\} are indeed orthogonal:

\int_{-\infty}^{\infty} [\phi_i(\xi) p_0(\xi)][\phi_j(\xi) p_0(\xi)]\, d\xi = \int_{-\infty}^{\infty} \phi_i(\xi)\phi_j(\xi)\, p_0^2(\xi)\, d\xi
  = \int_{-\infty}^{\infty} \phi_i(\xi)\phi_j(\xi)\, f^0_{x(t)}(\xi)\, d\xi
  = \frac{1}{\sqrt{i!}\sqrt{j!}} \int_{-\infty}^{\infty} H_i(\xi) H_j(\xi)\, f^0_{x(t)}(\xi)\, d\xi = 0, \quad i \ne j   (12-78)


Said another way, H_i(\xi) and H_j(\xi) are orthogonal relative to the "standardized" Gaussian density (with mean 0 and variance 1).

If we did not truncate the series, we could represent any density perfectly via (12-74) and (12-75). However, as a further motivation for considering Hermite polynomials in particular, we seek good approximations to nearly Gaussian density functions with as few parameters as possible. Quasi-moments are the coefficients in the expansion of the ratio

[f_{x(t)}(\xi) / f^0_{x(t)}(\xi)]

in a series of multidimensional Hermite polynomials, where f^0_{x(t)}(\xi) is a Gaussian zero-mean density of specified covariance. If f_{x(t)}(\xi) were in fact Gaussian with zero mean and the specified covariance, the ratio would be equal to one and all quasi-moments would equal zero. Since the multidimensional Hermite polynomials do form a complete orthogonal set of eigenfunctions relative to the Gaussian density, any ratio [f_{x(t)}(\xi)/f^0_{x(t)}(\xi)] can be expanded in a series of multidimensional Hermite polynomials, provided \int_{-\infty}^{\infty} [f_{x(t)}(\xi)/f^0_{x(t)}(\xi)]^2\, d\xi is finite (i.e., assuming that you are working in a Hilbert space). Additionally, approximation theory then yields that any such ratio can be approximated to any desired precision (in the integrated square error sense) by an appropriate finite number of terms in the series.

Thus, we can approximate f_{x(t_i)|Z(t_i)}(\xi|\mathcal{Z}_i) with a truncated series representation in terms of quasi-moments instead of moments. However, convergence of such expansions is limited; a finite number of given quasi-moments would erroneously produce a unique reconstructed density (an infinite number of densities actually satisfies a finite number of moment or quasi-moment specifications); and density approximations based on finite segments of the series may yield a density function approximation which can assume negative values [41, 72]. Some of these objections can be removed by expanding the square root of the density instead of the density itself [47], but two facts remain. Conventional quasi-moments are directly related to moments; for example, in the scalar case, (12-71)-(12-73) yield q_0 = m_0 = 1, q_1 = m_1, q_2 = m_2 - m_0, q_3 = m_3 - 3m_1, etc. Moreover, for estimation purposes, only the lowest order quasi-moments or moments are approximated (especially the first two), and there is little benefit from considering quasi-moments instead of moments as in the previous section. Algorithmic details are available in the references at the beginning of this section.
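The scalar moment/quasi-moment identities just quoted (q_0 = 1, q_1 = m_1, q_2 = m_2 − m_0, q_3 = m_3 − 3m_1) hold for any density, since they are polynomial identities in the Hermite expansion. The sketch below checks them numerically; the small discrete distribution is purely a hypothetical example chosen here for illustration:

```python
import numpy as np
from numpy.polynomial import hermite_e as He

# Hypothetical discrete distribution, used only to verify the identities.
vals = np.array([-1.0, 0.0, 2.0])
probs = np.array([0.3, 0.4, 0.3])

def moment(i):
    """m_i = E{x^i} under the discrete distribution above."""
    return float(probs @ vals ** i)

def quasi_moment(i):
    """q_i = E{He_i(x)}, He_i the probabilists' Hermite polynomial."""
    coeffs = [0.0] * i + [1.0]          # series with a single He_i term
    return float(probs @ He.hermeval(vals, coeffs))
```

The same construction extends to samples or quadrature: quasi-moments are simply expectations of Hermite polynomials, which is why they vanish (beyond the zeroth) for the reference Gaussian.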

12.5 CONDITIONAL MODE ESTIMATORS

Rather than developing approximate relations for the conditional mean, i.e., the minimum mean square error (MMSE) estimate, one can seek an approximate algorithm for the conditional mode, the maximum a posteriori (MAP) estimate


[54]. The mode during time propagations from t_{i-1} to t_i is defined by

\partial f_{x(t)|Z(t_{i-1})}(\xi|\mathcal{Z}_{i-1})/\partial\xi^{\mathrm T}\,\big|_{\xi = \hat{x}_{MAP}(t/t_{i-1})} \equiv 0   (12-79)

provided the second derivative matrix is negative definite, to assure that the mode is well defined and unique. The time derivative of (12-79) must also be identically zero:

0 = \frac{d}{dt}[\partial f_{x|z}/\partial\xi^{\mathrm T}]
  = \frac{\partial}{\partial t}[(\partial f_{x|z}/\partial\xi)^{\mathrm T}] + [\partial^2 f_{x|z}/\partial\xi^2]\,\dot{\hat{x}}_{MAP}(t/t_{i-1})
  = \left[\frac{\partial}{\partial\xi}(\partial f_{x|z}/\partial t)\right]^{\mathrm T} + [\partial^2 f_{x|z}/\partial\xi^2]\,\dot{\hat{x}}_{MAP}(t/t_{i-1})

or

\dot{\hat{x}}_{MAP}(t/t_{i-1}) = -[\partial^2 f_{x|z}/\partial\xi^2]^{-1}\left[\frac{\partial}{\partial\xi}(\partial f_{x|z}/\partial t)\right]^{\mathrm T}   (12-80)

as the desired conditional mode equation, where \partial f_{x|z}/\partial t is given by the forward Kolmogorov equation and f_{x|z} and its partials are evaluated at \hat{x}_{MAP}(t/t_{i-1}). The right hand side of (12-80) involves various order partials of the density that must be evaluated, and time propagation relations for them involve still higher order partials, again yielding an infinite dimensional estimator: as before, approximations are required to generate an implementable finite dimensional filter. One of the terms that appears in the right hand side of (12-80) is the n-by-n matrix

\Sigma(t/t_{i-1}) \triangleq -f_{x|z}\,[\partial^2 f_{x|z}/\partial\xi^2]^{-1}   (12-81)

For the conditions under which f_{x(t)|Z(t_{i-1})} is Gaussian and the MAP estimator becomes the Kalman filter, \Sigma(t/t_{i-1}) is in fact the conditional covariance, which is also computed in that filter structure. To propagate \Sigma(t/t_{i-1}) here, differentiate (12-81) to yield

d\Sigma(t/t_{i-1})/dt = -[\partial^2 f_{x|z}/\partial\xi^2]^{-1}(df_{x|z}/dt)
  + f_{x|z}\,[\partial^2 f_{x|z}/\partial\xi^2]^{-1}[d(\partial^2 f_{x|z}/\partial\xi^2)/dt][\partial^2 f_{x|z}/\partial\xi^2]^{-1}   (12-82)

where

df_{x|z}/dt = \partial f_{x|z}/\partial t + [\partial f_{x|z}/\partial\xi]\,\dot{\hat{x}}_{MAP} = \partial f_{x|z}/\partial t   (12-83)

in view of (12-79), but no such simplification occurs for [d(\partial^2 f_{x|z}/\partial\xi^2)/dt]; again these are evaluated using the forward Kolmogorov equation.

For the measurement update, the partial of (12-4) with respect to \xi is again set to 0^{\mathrm T}. The denominator of (12-4) is not a function of \xi, so this is equivalent to solving

[\partial f_{z|x}/\partial\xi]\, f_{x|Z} + f_{z|x}\,[\partial f_{x|Z}/\partial\xi] = 0^{\mathrm T}   (12-84)

or

\partial \ln f_{z|x}/\partial\xi + \partial \ln f_{x|Z}/\partial\xi = 0^{\mathrm T}   (12-85)

for \hat{x}_{MAP}(t_i^+), where f_{z|x} denotes f_{z(t_i)|x(t_i)}(z_i|\xi) and f_{x|Z} denotes f_{x(t_i)|Z(t_{i-1})}(\xi|\mathcal{Z}_{i-1}). By differentiating (12-84) again, the update relation for \Sigma becomes

\Sigma^{-1}(t_i^+) = -\big[(\partial^2 f_{z|x}/\partial\xi^2) f_{x|Z} + (\partial f_{z|x}/\partial\xi)^{\mathrm T}(\partial f_{x|Z}/\partial\xi)
  + (\partial f_{x|Z}/\partial\xi)^{\mathrm T}(\partial f_{z|x}/\partial\xi) + (\partial^2 f_{x|Z}/\partial\xi^2) f_{z|x}\big] \div [f_{z|x}\, f_{x|Z}]   (12-86)

The mode estimator is then defined by appropriate approximations to (12-80), (12-82), (12-85), and (12-86). Especially when discrete dynamics are proposed, this formulation has led to useful filter forms, exploiting the simplifying fact that the denominator of (12-4) is not involved in the estimate generation as it was in the previous sections [13, 24, 25, 95, 98, 99].
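As an illustrative sketch (not one of the algorithms of the references just cited), the measurement update amounts to maximizing the logarithm of the numerator of (12-4). For a hypothetical scalar model with Gaussian prior N(x^-, P^-) and measurement z = sin x + v, v ~ N(0, R) (all assumptions made here for the example), this can be done by Newton iteration, with the inverse Hessian of the negative log posterior at the mode playing the role of Σ(t_i^+) in (12-86):

```python
import math

def map_update(x_prior, P_prior, z, R, iters=20):
    """MAP measurement update for the hypothetical scalar model
    z = sin(x) + v, by Newton iteration on
    J(x) = (x - x_prior)^2 / (2 P_prior) + (z - sin x)^2 / (2 R).
    Returns the mode and the inverse Hessian there."""
    x = x_prior
    for _ in range(iters):
        r = z - math.sin(x)                      # measurement residual
        grad = (x - x_prior) / P_prior - r * math.cos(x) / R
        hess = 1.0 / P_prior + (math.cos(x) ** 2 + r * math.sin(x)) / R
        x -= grad / hess                         # Newton step
    return x, 1.0 / hess
```

For an uninformative measurement (large R) the mode collapses to the prior mean, and for a linear h the single Newton step reproduces the Kalman update, mirroring the Gaussian special case discussed above.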

12.6 STATISTICALLY LINEARIZED FILTER

If G in (12-1) is a function of only time instead of x(t) and t, then statistical linearization can be used to develop nonlinear estimators which, like the latter estimators of Section 12.3, do not employ series representations of f and h [44, 82, 138]. For this reason, it is a method that does not require f and h to be differentiable, thereby admitting such important nonlinearities as saturation. But this advantage is gained at the expense of requiring evaluation of conditional expectations, i.e., of knowing the entire density. Typically, the density is approximated as Gaussian, and the resulting implementable algorithm often has better characteristics than those based on truncated series expansions of f and h about the conditional mean approximate estimate.

Consider approximating f[x(t), t] by a linear approximation of the form

f[x(t), t] \approx f_0(t) + \mathcal{F}(t)\, x(t)   (12-87)

that is "best" in the sense that the error in the representation,

e(t) = f[x(t), t] - f_0(t) - \mathcal{F}(t)\, x(t)   (12-88)

has minimum (generalized) mean square error

E\{e^{\mathrm T}(t)\, W\, e(t)\}   (12-89)

for all t \in [t_{i-1}, t_i), where W is a general n-by-n weighting matrix. Note that f_{x(t)|Z(t_{i-1})}(\xi|\mathcal{Z}_{i-1}) would be required to evaluate this expectation. Setting the partials of (12-89) with respect to f_0(t) and \mathcal{F}(t) simultaneously to zero yields,


using the notation of (12-11),

f_0(t) = \widehat{f[x(t), t]} - \mathcal{F}(t)\,\hat{x}(t)   (12-90a)

\mathcal{F}(t) = \{\widehat{f[x(t), t]\, x^{\mathrm T}(t)} - \widehat{f[x(t), t]}\,\hat{x}^{\mathrm T}(t)\}\, P^{-1}(t)   (12-90b)

with P(t) being the conditional covariance of x(t), conditioned on Z(t_{i-1}) = \mathcal{Z}_{i-1}. \mathcal{F}(t) is seen to be intimately related to describing functions: in the scalar zero-mean case, (12-90b) becomes \widehat{xf}/\widehat{x^2}, the describing function gain for an odd-function nonlinearity (such as a symmetric saturation) driven by a zero-mean input [44, 45]. Similarly, h[x(t_i), t_i] can be approximated by the first two terms in a power series

h[x(t_i), t_i] \approx h_0(t_i) + \mathcal{H}(t_i)\, x(t_i)   (12-91)

with the coefficients h_0(t_i) and \mathcal{H}(t_i) statistically optimized to produce

h_0(t_i) = \widehat{h[x(t_i), t_i]} - \mathcal{H}(t_i)\,\hat{x}(t_i^-)   (12-92a)

\mathcal{H}(t_i) = \{\widehat{h[x(t_i), t_i]\, x^{\mathrm T}(t_i)} - \widehat{h[x(t_i), t_i]}\,\hat{x}^{\mathrm T}(t_i^-)\}\, P^{-1}(t_i^-)   (12-92b)

Once the statistically optimized approximations (12-87) and (12-91) are introduced into the problem description, (12-1) with G = G(t) and (12-2), and its solution, (12-9)-(12-14), the statistically linearized filter can be expressed as a time propagation

\dot{\hat{x}}(t) = \widehat{f[x(t), t]}   (12-93)

\dot{P}(t) = \mathcal{F}(t) P(t) + P(t) \mathcal{F}^{\mathrm T}(t) + G(t) Q(t) G^{\mathrm T}(t)   (12-94)

with \mathcal{F}(t) given as in (12-90b), and \widehat{f x^{\mathrm T}}, \hat{f}, and \hat{x} calculated assuming x(t) to be Gaussian with mean \hat{x}(t) and covariance P(t). The measurement update at time t_i is given by

K_{SL}(t_i) = P(t_i^-)\,\mathcal{H}^{\mathrm T}(t_i)[\mathcal{H}(t_i) P(t_i^-) \mathcal{H}^{\mathrm T}(t_i) + R(t_i)]^{-1}   (12-95)

\hat{x}(t_i^+) = \hat{x}(t_i^-) + K_{SL}(t_i)\{z_i - \widehat{h[x(t_i), t_i]}\}   (12-96)

P(t_i^+) = P(t_i^-) - K_{SL}(t_i)\,\mathcal{H}(t_i)\, P(t_i^-)   (12-97)

with \mathcal{H}(t_i) given by (12-92b), and \widehat{h x^{\mathrm T}}, \hat{h}, and \hat{x} computed as though x(t_i) were Gaussian with mean \hat{x}(t_i^-) and covariance P(t_i^-), as generated by the previous propagation. Note that these relations are intimately related to the assumed density filters of Section 12.3 (assuming G to be a function only of t) that directly implement conditional moment relations (12-9)-(12-14), (12-21), and (12-26) and evaluate conditional expectations by assuming a Gaussian conditional density. This is to be expected since the conditional mean does provide the minimum mean square error estimate, as derived fundamentally under similar assumptions in this section.
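To make the describing-function interpretation of (12-90b) concrete, consider a scalar zero-mean saturation f(x) = clip(x, −L, L), one of the nonsmooth nonlinearities mentioned above. For a zero-mean Gaussian input of variance P, the gain E{x f(x)}/P has the closed form erf(L/√(2P)); this closed form follows from Stein's lemma and is an added observation here, not a formula from the text. A sketch, with a quadrature cross-check:

```python
import math
import numpy as np

def sl_gain_saturation(L, P):
    """Statistically linearized gain (12-90b, scalar zero-mean case) for
    f(x) = clip(x, -L, L): E{x f(x)}/P = Pr{|x| < L} = erf(L / sqrt(2 P))."""
    return math.erf(L / math.sqrt(2.0 * P))

def sl_gain_quadrature(f, P, n=60):
    """Same gain evaluated directly as E{x f(x)}/P by Gauss-Hermite
    quadrature, usable for any odd nonlinearity f."""
    t, w = np.polynomial.hermite_e.hermegauss(n)
    x = math.sqrt(P) * t
    return float(w @ (x * f(x))) / (math.sqrt(2.0 * math.pi) * P)
```

The gain is strictly less than the unity slope a first-order Taylor series would give, and it shrinks as P grows, which is exactly the conservative-gain behavior attributed to the statistically linearized filter in the discussion that follows.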


Structurally, the covariance and gain equations, (12-94), (12-95), and (12-97), are the same as those for an extended Kalman filter, but with \mathcal{F}(t) replacing \partial f[\hat{x}(t), t]/\partial x and \mathcal{H}(t_i) replacing \partial h[\hat{x}(t_i^-), t_i]/\partial x. Because of its relation to describing functions, \mathcal{F}(t) accounts for the probability that [x(t) - \hat{x}(t)], approximated as Gaussian with zero mean and covariance P(t), will take on values large enough such that

f[\hat{x}(t), t] + \{\partial f[\hat{x}(t), t]/\partial x\}\{x(t) - \hat{x}(t)\}

is no longer an adequate approximation to f[x(t), t], and similarly for \mathcal{H}(t_i). As a result, gains computed via (12-95) tend to be more conservative, and performance better, than achieved with an extended Kalman filter, especially for cases involving large error covariance magnitudes.

EXAMPLE 12.7 Consider Example 12.1, but let b \equiv 0 so that G is not a function of x(t). Then the time propagation is identical to that of Example 12.5 and the second order cumulant truncation filter of Example 12.6, with b \equiv 0.

For the update, \hat{h} and \widehat{hx} can be evaluated as

\hat{h} = \int_{-\infty}^{\infty} (\sin\xi)\, f_{x(t_i)|Z(t_{i-1})}(\xi|\mathcal{Z}_{i-1})\, d\xi

\widehat{hx} = \int_{-\infty}^{\infty} \xi\,(\sin\xi)\, f_{x(t_i)|Z(t_{i-1})}(\xi|\mathcal{Z}_{i-1})\, d\xi

assuming f_{x(t_i)|Z(t_{i-1})}(\xi|\mathcal{Z}_{i-1}) to be Gaussian with mean \hat{x}(t_i^-) and variance P(t_i^-); the integrals could be evaluated numerically and perhaps curve-fitted with appropriate extrapolation functions of \hat{x}(t_i^-) and P(t_i^-). Once computed, these expectations can be used to evaluate (12-92b) and (12-95)-(12-97). •
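For this particular nonlinearity the two integrals in fact admit closed forms under the Gaussian assumption (an observation added here; the text suggests numerical evaluation): E{sin x} = e^{−P/2} sin m, and by Stein's lemma E{x sin x} = m E{sin x} + P e^{−P/2} cos m, for x ~ N(m, P). A sketch:

```python
import math

def h_hat(m, P):
    """E{sin x} for x ~ N(m, P): exp(-P/2) sin(m)."""
    return math.exp(-P / 2.0) * math.sin(m)

def xh_hat(m, P):
    """E{x sin x} for x ~ N(m, P), via Stein's lemma:
    E{x sin x} = m E{sin x} + P E{cos x}."""
    return math.exp(-P / 2.0) * (m * math.sin(m) + P * math.cos(m))

def script_H(m, P):
    """The statistically linearized gain (12-92b), scalar case:
    (E{x h} - E{x} E{h}) / P, which reduces to exp(-P/2) cos(m)."""
    return (xh_hat(m, P) - m * h_hat(m, P)) / P
```

Note that the resulting gain e^{−P/2} cos m is the extended Kalman filter gain cos m attenuated by e^{−P/2}: the larger the prior variance, the more conservative the statistically linearized update, as discussed above.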

Statistical approximations based on power series of higher order than in (12-87) and (12-91) can be sought, as can higher order moment estimators based on non-Gaussian cumulant truncation forms. However, the algorithm specified by (12-93)-(12-97) is particularly attractive from a computational standpoint.

12.7 NONLINEAR FILTERING WITH CONTINUOUS-TIME MEASUREMENTS

Again let a continuous-time system be described by the Markov solution x(\cdot,\cdot) to the Ito stochastic differential equation

dx(t) = f[x(t), t]\, dt + G[x(t), t]\, d\beta(t)   (12-98)

with \beta(\cdot,\cdot) Brownian motion of diffusion Q(t):

E\{d\beta(t)\, d\beta^{\mathrm T}(t)\} = Q(t)\, dt   (12-99a)

E\{[\beta(t_2) - \beta(t_1)][\beta(t_2) - \beta(t_1)]^{\mathrm T}\} = \int_{t_1}^{t_2} Q(t)\, dt   (12-99b)


However, now consider an m-vector continuous-time measurement process described by

dy(t) = h[x(t), t]\, dt + d\beta_m(t)   (12-100)

where \beta_m(\cdot,\cdot) is a Brownian motion independent of \beta(\cdot,\cdot), and of diffusion R_c(t) for all t \in [t_0, t_f]:

E\{d\beta_m(t)\, d\beta_m^{\mathrm T}(t)\} = R_c(t)\, dt   (12-101)

E\{d\beta_m(t)\, d\beta^{\mathrm T}(t)\} = 0   (12-102)

Heuristically, this corresponds to

\dot{x}(t) = f[x(t), t] + G[x(t), t]\, w(t)   (12-98')

\dot{y}(t) \triangleq z(t) = h[x(t), t] + v_c(t)   (12-100')

with w(\cdot,\cdot) and v_c(\cdot,\cdot) independent zero-mean white Gaussian noises with

E\{w(t)\, w^{\mathrm T}(t + \tau)\} = Q(t)\,\delta(\tau)   (12-99')

E\{v_c(t)\, v_c^{\mathrm T}(t + \tau)\} = R_c(t)\,\delta(\tau)   (12-101')

E\{v_c(t)\, w^{\mathrm T}(t + \tau)\} = 0   (12-102')

The subscript c on R_c(t) and v_c(\cdot,\cdot) denotes continuous-time, to distinguish them from R(t_i) and v(\cdot,\cdot) associated with the discrete-time measurement (12-2). As in the previous sections, we wish to establish the time history of the conditional density of the state x(t,\cdot), conditioned on the entire history of measurements observed up to time t, which in this case becomes f_{x(t)}(\xi|\{y(\tau), t_0 \le \tau \le t\}) or f_{x(t)}(\xi|\{z(\tau), t_0 \le \tau \le t\}). In view of the discussion of conditional densities and expectations in Section 3.7 of Volume 1, this is more rigorously described as f_{x(t)}(\xi|\mathcal{F}_y[t_0, t]), where \mathcal{F}_y[t_0, t] is the minimal \sigma-algebra generated by the measurement process \{y(\tau,\cdot), t_0 \le \tau \le t\}, but we will adopt the less rigorous notation. Once such a conditional density time propagation were established, it could conceptually be used for real-time estimation to generate f_{x(t)}(\xi|\{y(\tau) = y(\tau), t_0 \le \tau \le t\}) or f_{x(t)}(\xi|\{z(\tau) = z(\tau), t_0 \le \tau \le t\}), given the observed sample of the measurement process, y(\tau) or z(\tau), for all \tau \in [t_0, t].

It can be shown that the conditional density satisfies the Kushner equation [14, 38, 54, 68-71, 89, 102, 134, 136, 137, 141, 147], sometimes denoted as the Kushner-Stratonovich equation,

\frac{\partial f_x}{\partial t} = -\sum_{i=1}^{n} \frac{\partial}{\partial\xi_i}\{f_x f_i\} + \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \frac{\partial^2}{\partial\xi_i\,\partial\xi_j}\{f_x [GQG^{\mathrm T}]_{ij}\}
  + \{h[\xi, t] - \widehat{h[x(t), t]}\}^{\mathrm T} R_c^{-1}(t)\{z(t) - \widehat{h[x(t), t]}\}\, f_x   (12-103)

where

\widehat{h[x(t), t]} \triangleq \int_{-\infty}^{\infty} h[\xi, t]\, f_{x(t)}(\xi|\{y(\tau), t_0 \le \tau \le t\})\, d\xi   (12-104)


provided the indicated derivatives exist. The first two terms of (12-103) correspond directly to the forward Kolmogorov equation (11-96), which can be used to propagate the conditional density for discrete-time measurements, f_{x(t)|Z(t_{i-1})}(\xi|\mathcal{Z}_{i-1}), from one sample time to the next. The last term in (12-103) is due to the continuous-time incorporation of measurement information, in contradistinction to the discrete-time updates seen previously. Without continuous-time measurement information, i.e., with R_c^{-1} \equiv 0, the Kushner equation reduces to the forward Kolmogorov equation. As written, (12-103) is a stochastic partial differential equation, yielding a sample solution for each sample of the measurement process, y(\tau, \omega_k) = y(\tau) for all \tau \in [t_0, t] and corresponding z(\tau) = dy(\tau)/d\tau. Arguments on the existence and uniqueness of solutions to such equations are available [36, 89, 108].

This relation can be established from the sampled-data results of the previous sections in the limit as the sample period goes to zero. Basically, (12-4) is written, replacing Z(t_{i-1}) by \{y(\tau), t_0 \le \tau \le t\} and z(t_i) by dy(t), to write

f_{x(t)}(\xi|\{y(\tau), t_0 \le \tau \le t\}, dy(t))
  = \left[\frac{f_{dy(t)}(\eta\,|\,x(t) = \xi, \{y(\tau), t_0 \le \tau \le t\})}{f_{dy(t)}(\eta\,|\,\{y(\tau), t_0 \le \tau \le t\})}\right] f_{x(t)}(\xi|\{y(\tau), t_0 \le \tau \le t\})   (12-105)

and expanding the bracketed term to order dt, recalling that, by the Levy oscillation property (see (11-59)),

dy(t)\, dy^{\mathrm T}(t) = R_c(t)\, dt \qquad \text{w.p.1; in m.s.}   (12-106)

After some algebra [68, 69, 89], the bracketed term in (12-105) can be shown equal to

[\cdot] = 1 + \{h[\xi, t] - \widehat{h[x(t), t]}\}^{\mathrm T} R_c^{-1}(t)\{dy(t) - \widehat{h[x(t), t]}\, dt\} + o(dt)   (12-107)

where \lim_{\Delta t \to 0} o(\Delta t)/\Delta t = 0 and \widehat{h[x(t), t]} is given by (12-104). Rigorous equality in (12-107) is established by noting that x(\cdot,\cdot) is a martingale relative to \{y(\tau), t_0 \le \tau \le t\} (see (11-55) and (11-64)) and then invoking martingale convergence results [35]. Then the partial derivative

\frac{\partial f_{x(t)}(\xi|\{y(\tau), t_0 \le \tau \le t\})}{\partial t}
  = \lim_{\Delta t \to 0} \frac{f_{x(t+\Delta t)}(\xi|\{y(\tau), t_0 \le \tau \le t + \Delta t\}) - f_{x(t)}(\xi|\{y(\tau), t_0 \le \tau \le t\})}{\Delta t}

or the partial derivative for the corresponding characteristic function can be evaluated, using the Chapman-Kolmogorov equation (as in the derivation of (11-96)) and Eq. (12-102), to express the first numerator term conveniently [68, 69, 89]. Carrying this through yields the result (12-103) as claimed. Extensions to the case of independent-increment point-process descriptions of dynamics driving noise and measurement corruptions, instead of Brownian motions as considered here, are also available [39, 89, 104, 106, 107, 121-125].

Once the fundamentally important Kushner equation (12-103) is available to describe the evolution of the conditional density f_{x(t)}(\xi|\{y(\tau), t_0 \le \tau \le t\}), the conditional mean and covariance propagations can be evaluated analogously to (12-9) and (12-10) [70, 115]. They satisfy the stochastic differential equations

d\hat{x}(t) = \widehat{f[x(t), t]}\, dt + \{\widehat{x(t)\, h^{\mathrm T}[x(t), t]} - \hat{x}(t)\,\widehat{h^{\mathrm T}[x(t), t]}\}\, R_c^{-1}(t)\{dy(t) - \widehat{h[x(t), t]}\, dt\}   (12-108)

dP(t) = \{\widehat{[x - \hat{x}]\, f^{\mathrm T}} + \widehat{f\, [x - \hat{x}]^{\mathrm T}} + \widehat{GQG^{\mathrm T}}
  - \widehat{[x - \hat{x}]\, h^{\mathrm T}}\, R_c^{-1}(t)\, \widehat{h\, [x - \hat{x}]^{\mathrm T}}\}\, dt
  + \widehat{[x - \hat{x}][x - \hat{x}]^{\mathrm T}\{[h - \hat{h}]^{\mathrm T} R_c^{-1}(t)[dy(t) - \hat{h}\, dt]\}}   (12-109a)

or, on an element-by-element basis,

dP_{ij}(t) = \{[\widehat{x_i f_j} - \hat{x}_i \hat{f}_j] + [\widehat{f_i x_j} - \hat{f}_i \hat{x}_j] + [\widehat{GQG^{\mathrm T}}]_{ij}
  - [\widehat{x_i h} - \hat{x}_i \hat{h}]^{\mathrm T} R_c^{-1}(t)[\widehat{h x_j} - \hat{h}\hat{x}_j]\}\, dt
  + \{\widehat{x_i x_j h} - \hat{x}_i \widehat{x_j h} - \hat{x}_j \widehat{x_i h} - \widehat{x_i x_j}\,\hat{h} + 2\hat{x}_i \hat{x}_j \hat{h}\}^{\mathrm T} R_c^{-1}(t)\{dy(t) - \hat{h}\, dt\}   (12-109b)

for i, j = 1, 2, \ldots, n, where x_i is the ith component of x, f_j is the jth component of f[x(t), t], h represents h[x(t), t], and all terms written with the caret symbol are conditional expectations of the corresponding functions as in (12-104). Note that (12-108) and (12-109) can be divided through by dt heuristically, identifying dy(t)/dt as z(t). Also note the structure of the filter gain, the term premultiplying the residual in (12-108). The indicated conditional expectations require knowledge of the entire density function, i.e., of all higher moments as well as the first two, as was the case in Section 12.3 [108].
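For linear f and h the conditional expectations in (12-108) and (12-109) close exactly and the equations reduce to the Kalman-Bucy filter. A minimal Euler discretization for a hypothetical scalar model dx = a x dt + dβ, dy = x dt + dβ_m (the parameter values below are illustrative assumptions, not taken from the text) shows the structure:

```python
def kb_step(x_hat, P, dy, dt, a=-1.0, Q=1.0, Rc=0.5):
    """One Euler step of (12-108)-(12-109) for the scalar linear model
    dx = a x dt + d(beta), dy = x dt + d(beta_m), where the equations
    reduce exactly to the Kalman-Bucy filter:
        d x_hat = a x_hat dt + (P / Rc) (dy - x_hat dt)
        d P     = (2 a P + Q - P^2 / Rc) dt
    """
    gain = P / Rc                                   # (x h - x_hat h_hat) / Rc
    x_new = x_hat + a * x_hat * dt + gain * (dy - x_hat * dt)
    P_new = P + (2.0 * a * P + Q - P * P / Rc) * dt
    return x_new, P_new
```

Note that in this linear case the variance recursion is deterministic (the residual-driven term of (12-109a) vanishes), illustrating the decoupling and precomputability claimed for the linear special case below; iterating the step drives P to the steady-state root of 2aP + Q − P²/R_c = 0.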

Various assumptions about the conditional density, higher order moments, or higher order cumulants allow development of approximations to these full-scale equations in the form of finite-dimensional, implementable filter algorithms [72, 74, 113]. In the special case of linear dynamics and measurement models, they reduce without approximation to the continuous-time, continuous-measurement Kalman filter of Chapter 5 (Volume 1). Only in this special case do the conditional covariance equations truly decouple from the conditional mean and measurement residual driving function, thereby becoming precomputable. If G is a function only of time and not of x(t), Taylor series representations for f and h expanded about \hat{x}(t) and truncated at first order terms yield the extended Kalman filter given by (9-80)-(9-82). As in Section 12.3, Taylor series carried to second order can generate either truncated or Gaussian second order filters, depending on the assumptions made about the conditional density


for the state. The truncated second order filter [7, 52, 54, 91, 114] is given by

d\hat{x}(t) = \{f[\hat{x}(t), t] + b_p(t)\}\, dt
  + P(t) H^{\mathrm T}[t; \hat{x}(t)] R_c^{-1}(t)\{dy(t) - h[\hat{x}(t), t]\, dt - b_m(t)\, dt\}   (12-110)

dP(t) = \{F[t; \hat{x}(t)] P(t) + P(t) F^{\mathrm T}[t; \hat{x}(t)] + \widehat{GQG^{\mathrm T}}
  - P(t) H^{\mathrm T}[t; \hat{x}(t)] R_c^{-1}(t) H[t; \hat{x}(t)] P(t)\}\, dt
  - P(t)[b_m^{\mathrm T}(t) R_c^{-1}(t)\{dy(t) - h[\hat{x}(t), t]\, dt - b_m(t)\, dt\}]   (12-111)

where

F[t; \hat{x}(t)] \triangleq \frac{\partial f[x, t]}{\partial x}\bigg|_{x = \hat{x}(t)}   (12-112)

H[t; \hat{x}(t)] \triangleq \frac{\partial h[x, t]}{\partial x}\bigg|_{x = \hat{x}(t)}   (12-113)

and b_p(t) and b_m(t) are defined componentwise by

b_{p_k}(t) \triangleq \frac{1}{2}\,\mathrm{tr}\left\{\frac{\partial^2 f_k[\hat{x}(t), t]}{\partial x^2}\, P(t)\right\}   (12-114)

b_{m_k}(t) \triangleq \frac{1}{2}\,\mathrm{tr}\left\{\frac{\partial^2 h_k[\hat{x}(t), t]}{\partial x^2}\, P(t)\right\}   (12-115)

and \widehat{GQG^{\mathrm T}} is as given in (12-53). Compare these relations to the discrete-time measurement filter of the corresponding form, (12-42)-(12-54), and notice that the filter gain is now P(t) H^{\mathrm T}[t; \hat{x}(t)] R_c^{-1}(t). The Gaussian second order filter [23, 38, 54, 72, 91, 113, 114] is described by the same relations, except for (12-111) being replaced by

dP(t) = \{F[t; \hat{x}(t)] P(t) + P(t) F^{\mathrm T}[t; \hat{x}(t)] + \widehat{GQG^{\mathrm T}}
  - P(t) H^{\mathrm T}[t; \hat{x}(t)] R_c^{-1}(t) H[t; \hat{x}(t)] P(t)\}\, dt
  + \sum_{k=1}^{m} P(t)\,\frac{\partial^2 h_k[\hat{x}(t), t]}{\partial x^2}\, P(t)\, e_k^{\mathrm T} R_c^{-1}(t)\{dy(t) - h[\hat{x}(t), t]\, dt - b_m(t)\, dt\}   (12-116)

where \widehat{GQG^{\mathrm T}} is now given as in (12-61) and e_k is an m-dimensional vector with a one as its kth component and zeros elsewhere. This is comparable to the discrete-time measurement version, (12-48)-(12-52) and (12-54)-(12-61). As discussed for the discrete-time measurement case, the stochastic driving terms on (12-111) and (12-116) can cause significant difficulties, since P(t) is not necessarily positive definite, and modified second order filters (the same filter relations but with these driving terms removed) and first order filters with bias correction terms are therefore motivated as viable alternatives.


EXAMPLE 12.8 Recall the scalar example introduced in Example 12.1, but now assume continuous measurements are available as

dy(t) = \sin x(t)\, dt + d\beta_m(t) \qquad \text{or} \qquad z(t) = \sin x(t) + v_c(t)

with \beta_m(\cdot,\cdot) scalar Brownian motion of diffusion R_c, or v_c(\cdot,\cdot) zero-mean white Gaussian noise of strength R_c.

The truncated second order filter for this application is given by

\dot{\hat{x}}(t) = -[\hat{x}(t) + a\hat{x}^3(t)] - 3a\hat{x}(t)P(t) + P(t)[\cos\hat{x}(t)]\{z(t) - \sin\hat{x}(t) + \tfrac{1}{2}[\sin\hat{x}(t)]P(t)\}/R_c

\dot{P}(t) = -2[1 + 3a\hat{x}^2(t)]P(t) + [1 + b\hat{x}^2(t)]^2 Q + [2b\hat{x}(t)]^2 P(t)Q
  + 2b[1 + b\hat{x}^2(t)]P(t)Q - P^2(t)[\cos\hat{x}(t)]^2/R_c
  - \tfrac{1}{2}P^2(t)[\sin\hat{x}(t)]\{z(t) - \sin\hat{x}(t) + \tfrac{1}{2}[\sin\hat{x}(t)]P(t)\}/R_c

The Gaussian second order filter shares the same \dot{\hat{x}}(t) equation, but the \dot{P}(t) equation has an additional term of [+3b^2 P^2(t)Q] due to the different \widehat{GQG^{\mathrm T}} evaluation, and the last term above is doubled in magnitude and of opposite sign. Modified filters would have this last term on the \dot{P}(t) equation removed, and a first order filter with bias correction would differ only in \widehat{GQG^{\mathrm T}}. •

In actual implementation for cases involving time-invariant systems and stationary noises (or described by slowly varying parameters that are approximated quasi-statically as constant), one often considers the steady state constant gain first order filter with bias correction terms because of the reduced computational loading. This entails an assumed nominal for evaluation of partials in \dot{P}(t) and gain calculations, but full accounting for nonlinearities and new evaluation point in the state dynamics and residual generation expressions of (12-110).

Continuous-measurement assumed density filters can also be derived without series representations for f and h, evaluating required conditional expectations in (12-108) and (12-109) on the basis of assumed conditional density function forms, typically Gaussian with mean \hat{x}(t) and covariance P(t), as in Section 12.3. Moreover, higher order moment filters can be derived by writing explicit moment equations using Kushner's equation in a form analogous to (12-62): for scalar \psi[x(t), t] that is twice differentiable with respect to x(t),

\frac{d\widehat{\psi[x(t), t]}}{dt} = \widehat{\frac{\partial\psi}{\partial t}} + \widehat{\frac{\partial\psi}{\partial x}\, f[x(t), t]}
  + \frac{1}{2}\,\widehat{\mathrm{tr}\left\{G[x(t), t]\, Q(t)\, G^{\mathrm T}[x(t), t]\,\frac{\partial^2\psi}{\partial x^2}\right\}}
  + \{\widehat{\psi\, h[x(t), t]} - \hat{\psi}\,\widehat{h[x(t), t]}\}^{\mathrm T} R_c^{-1}(t)\{z(t) - \widehat{h[x(t), t]}\}   (12-117)

Letting \psi[x(t), t] be successively higher order products of x(t) components yields (12-108), (12-109), and higher order scalar moments. As in the discrete-measurement case, cumulant truncation can be applied to achieve the final


equations for implementation [97, 110, 111, 120]. As in the discrete-time measurement case, conditional mode estimators can also be developed [33, 54, 71].

Another means of deriving continuous-measurement filters rigorously is by application of integral techniques in the form of a representation theorem originally due to Bucy [14-19, 21, 35, 43, 62-64, 89, 94]. For the problem described in the beginning of this section, it can be shown that the conditional density f_{x(t)}(\xi|\{y(\tau); t_0 \le \tau \le t\}) can be expressed by the representation

f_{x(t)}(\xi|\{y(\tau); t_0 \le \tau \le t\}) = \frac{E[\exp(H)\,|\,x(t) = \xi, \{y(\tau); t_0 \le \tau \le t\}]}{E[\exp(H)\,|\,\{y(\tau); t_0 \le \tau \le t\}]}\, f_{x(t)}(\xi)   (12-118)

H = \int_{t_0}^{t} h^{\mathrm T}[x(\tau), \tau]\, R_c^{-1}(\tau)\, dy(\tau) - \frac{1}{2}\int_{t_0}^{t} h^{\mathrm T}[x(\tau), \tau]\, R_c^{-1}(\tau)\, h[x(\tau), \tau]\, d\tau   (12-119)

where the expectations are over \{x(\tau); t_0 \le \tau \le t\} given the minimal \sigma-algebra generated by \{y(\tau); t_0 \le \tau \le t\}. Function space integration and other concepts from functional analysis, as well as martingale properties, are required for rigorous proof [41, 63, 64, 94, 148], though more heuristic approaches [14, 54, 89] provide considerable insight. By explicitly representing the conditional density in this manner, the theorem provides an alternate means of deriving the propagation relations for both the theoretically optimal and the approximate, finite-dimensional filters for the nonlinear estimation problem. An analogous result also exists for problems with discrete-time measurements [14, 89].

As mentioned in Section 12.2, an equation of the form of (12-98) can represent a system driven by deterministic control inputs u(t), or feedback controls as a function of the current perfectly known state, u[x(t), t], by writing

f[x(t), t] = f'[x(t), u(t), t]   (12-120a)

or

f[x(t), t] = f''\{x(t), u[x(t), t], t\}   (12-120b)

The same remarks also pertain to h in (12-100). However, a number of "feedback estimation problems" are well modelled by replacing (12-100) with a stochastic functional differential equation,

dy(t) = h[x(t), \{y(\tau); t_0 \le \tau \le t\}, t]\, dt + d\beta_m(t)   (12-121)

For instance, in many signal tracking applications, such as correlation tracking, the measurement is specifically a function of the difference between some components of the current system state (representable as T_s x(t)) and some components of a filter state estimate (T_f\hat{x}(t)):

h[x(t), \{y(\tau); t_0 \le \tau \le t\}, t] = h'[T_s x(t) - T_f\hat{x}(t), t]   (12-122)


It can be shown [11, 23, 43] that a function \psi[\cdot,\cdot] twice differentiable with respect to its first argument satisfies the Kushner moment equation, (12-117), when h is generalized as in (12-121), and that (12-118) and (12-119) also apply under this generalization. The general stochastic control problem in which f is also allowed to be a function of \{y(\tau); t_0 \le \tau \le t\}, as through a feedback of estimated states, u[\hat{x}(t), t], will be discussed in subsequent chapters.

EXAMPLE 12.9 Consider the tracking of a pseudorandom code having an autocorrelation function as given in Fig. 12.11, by a receiver able to generate the code sequence in order to lock onto an incoming signal of this form [11]. A feedback loop as shown in Fig. 12.12 keeps the receiver's locally generated code in near coincidence with the incoming code. Many current code tracking loops develop a detector output as the difference between an "early" correlation (the result of multiplying the incoming code by a local code advanced by one time unit known as a "chip" relative to the expected arrival time of the input code) and a "late" correlation (using a one-chip-delayed local code). Let s(t) denote the pseudorandom code as generated locally, and model the incoming code i(t) as

i(t) = s(t) + n(t)

where n(\cdot,\cdot) is zero-mean white Gaussian noise with strength N(t): E\{n(t)n(t + \tau)\} = N(t)\,\delta(\tau).

Define the detector error characteristic D_\epsilon(e) as the expected value of the detector output for a

FIG. 12.11 Autocorrelation E\{s(t)s(t+\tau)\} of pseudorandom code.

FIG. 12.12 Code tracking loop.

FIG. 12.13 Detector error characteristic D_\epsilon(e).

phase tracking error of e chips, normalized by the signal power S, assuming N \equiv 0, i.e., i(t) = s(t):

D_\epsilon(e) \triangleq \frac{E\{i(t)[s(t + e - 1) - s(t + e + 1)]\}}{S}\bigg|_{i(t) = s(t)}

This is plotted in Fig. 12.13. Letting N(t) be nonzero does not affect the mean value of the detector output, but contributes only to its covariance:

E\{i(t)[s(t + e - 1) - s(t + e + 1)]/S\} = D_\epsilon(e)

E\{(i(t)[s(t + e - 1) - s(t + e + 1)]S^{-1} - D_\epsilon(e))(i(\tau)[s(\tau + e - 1) - s(\tau + e + 1)]S^{-1} - D_\epsilon(e))\}
  = [2N(t)/S]\,\delta(t - \tau)

Now the incoming signal phase \theta(t) in chips is assumed to be described adequately by the dynamics model

d\theta(t) = d\beta(t)

where \beta(\cdot,\cdot) is scalar Brownian motion with diffusion Q(t). The detector output measurement is given by

dy(t) = D_\epsilon[\theta(t) - \hat\theta(t)]\, dt + d\beta_m(t)

where \beta_m(\cdot,\cdot) is Brownian motion independent of \beta(\cdot,\cdot) and of diffusion 2N(t)/S, or

z(t) \triangleq dy(t)/dt = D_\epsilon[\theta(t) - \hat\theta(t)] + v_c(t)

where v_c(\cdot,\cdot) is zero-mean white Gaussian noise of strength R_c(t) = 2N(t)/S. Then (12-108) and (12-109) yield, for x \triangleq \theta,

d\hat\theta(t) = [\widehat{\theta D_\epsilon} - \hat\theta\hat{D}_\epsilon][2N(t)/S]^{-1}\{dy(t) - \hat{D}_\epsilon\, dt\}

dP(t) = \{Q(t) - [\widehat{\theta D_\epsilon} - \hat\theta\hat{D}_\epsilon]^2[2N(t)/S]^{-1}\}\, dt
  + \widehat{[\theta - \hat\theta]^2[D_\epsilon - \hat{D}_\epsilon]}\,[2N(t)/S]^{-1}\{dy(t) - \hat{D}_\epsilon\, dt\}

Noting that D_\epsilon(\cdot) is an antisymmetric function, if one assumes the conditional density to be symmetric about its mean, then

\hat{D}_\epsilon \triangleq \widehat{D_\epsilon[\theta(t) - \hat\theta(t)]} = 0


and the coefficient premultiplying the residual in dP(t) is also zero. If one further assumes the conditional density is Gaussian with mean \hat\theta(t) and variance P(t), \widehat{\theta D_\epsilon} can be evaluated as

\widehat{\theta D_\epsilon} = \frac{1}{\sqrt{2\pi P(t)}}\int_{-\infty}^{\infty} \xi\, D_\epsilon(\xi)\exp\left\{\frac{-\xi^2}{2P(t)}\right\} d\xi = P(t)\, H[D_\epsilon, P(t)]

where H[D_\epsilon, P(t)] is the describing function gain [45] for the nonlinearity D_\epsilon(\cdot). Thus, the filter equations become

d\hat\theta(t) = P(t)\, H[D_\epsilon, P(t)]\, R_c^{-1}(t)\, dy(t)

\dot{P}(t) = Q(t) - P^2(t)\, H^2[D_\epsilon, P(t)]\, R_c^{-1}(t)

with R_c(t) = 2N(t)/S, which has the structure of a Kalman filter linearized with the describing function gains. Such an interpretation is typical for the Gaussian approximation, except that the stochastic driving term on \dot{P}(t) is generally present.
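The describing function gain H[D_ε, P] can be evaluated numerically. The sketch below assumes the triangular code autocorrelation of Fig. 12.11, so that D_ε(e) = R(e − 1) − R(e + 1) with R(τ) = max(0, 1 − |τ|); this reconstruction of the Fig. 12.13 characteristic is an inference made here for illustration, not an equation from the text. The gain E{ξD_ε(ξ)}/P is computed by Gauss-Hermite quadrature:

```python
import math
import numpy as np

def R_code(tau):
    """Assumed triangular autocorrelation of the pseudorandom code (Fig. 12.11)."""
    return np.maximum(0.0, 1.0 - np.abs(tau))

def D_eps(e):
    """Early-late detector characteristic (Fig. 12.13), reconstructed as the
    difference of the autocorrelation shifted by +/- one chip."""
    return R_code(e - 1.0) - R_code(e + 1.0)

def describing_gain(P, n=80):
    """H[D_eps, P] = E{xi D_eps(xi)} / P for zero-mean Gaussian xi with
    variance P, via Gauss-Hermite quadrature (weight exp(-t^2/2))."""
    t, w = np.polynomial.hermite_e.hermegauss(n)
    xi = math.sqrt(P) * t
    return float(w @ (xi * D_eps(xi))) / (math.sqrt(2.0 * math.pi) * P)
```

For small P the gain approaches the unity slope used by the extended Kalman filter, and it falls off as P grows, reproducing the conservative-gain behavior described in the comparison that follows.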

An extended Kalman filter for this problem would be described by the same form of equationsbut with H[D"P(t)J replaced by

oD,[O - OJ I = oD,[OJ = 100 9~ii ao

The partial is evaluated at 0 = ~, and thus there is no accounting for the shape of D, other than itsslope at e = 0; this is an inherent weakness of methods based on Taylor series representation forf and/or h. Because this is a correlation tracking problem with an assumed linear dynamics model,the variance and gain expressions of the extended Kalman filter are not dependent on the statetrajectory or measurement history, as they generally would be. Therefore, a steady state, constantgain extended Kalman filter can be generated if Q and R; are assumed constant. Empirical tests[11J have shown the filter-computed variance to be a good representation of actual error statisticsup to a standard deviation .,jPof about! chip, above which the filter underestimates its own errorssignificantly and performance degrades unacceptably.

In contrast, the describing function H[Dε, P(t)] in the filter just derived takes into account the probability that [θ − θ̂], assumed Gaussian with mean zero and variance P(t), lies outside the linear region of Dε. For small P(t), this filter approaches the extended Kalman filter form, but as P(t) increases, it tends to have more conservative gains and to provide adequate performance over a larger domain.
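To make the describing function gain concrete, the sketch below (an illustration, not code from the text) numerically evaluates H[Dε, P] = Ê{ξ Dε(ξ)}/P for ξ ~ N(0, P). The detector shape used here is a hypothetical stand-in for Fig. 12.13: unit slope over ±1 chip, tapering to zero at ±2 chips. For small P the gain approaches the slope of Dε at the origin (the extended Kalman filter value of 1) and falls off as P grows, which is the source of the more conservative gains noted above.

```python
import numpy as np

def D_eps(xi):
    # Hypothetical early-late detector characteristic (stand-in for Fig. 12.13):
    # unit slope on [-1, 1] chips, tapering linearly to zero at +/-2 chips.
    return np.where(np.abs(xi) <= 1.0, xi,
                    np.sign(xi) * np.clip(2.0 - np.abs(xi), 0.0, None))

def describing_gain(P, n=20001, span=10.0):
    """H[D_eps, P] = E{xi * D_eps(xi)} / P for xi ~ N(0, P), by quadrature."""
    s = np.sqrt(P)
    xi = np.linspace(-span * s, span * s, n)
    dx = xi[1] - xi[0]
    pdf = np.exp(-xi**2 / (2.0 * P)) / np.sqrt(2.0 * np.pi * P)
    return float(np.sum(xi * D_eps(xi) * pdf) * dx / P)

for P in (0.01, 0.25, 1.0, 4.0):
    print(f"P = {P:5.2f}   H = {describing_gain(P):.4f}")
```

As P grows, H[Dε, P] shrinks toward zero, so the gain P·H[Dε, P]·Rc⁻¹ stops growing with P, unlike the extended Kalman filter gain, which grows without bound.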

EXAMPLE 12.10 As seen in Fig. 12.13, the detector in the previous example provides no information for signal tracking purposes if the magnitude of the phase tracking error exceeds 2 chips. This range can be extended by correlating the incoming code with more advanced and delayed bits, and using a weighted sum of such correlation products to drive the loop filter. Figures 12.14a and 12.14b portray the detector error characteristics of two such methods using K advanced and K delayed bits instead of one of each as in Example 12.9; the associated linear and flat error characteristics are given by

Dε^L(ε) ≜ E{i(t) Σ_(k=−K)^(K) [−k s(t + ε − k)]}/S |_(i(t)=s(t))

Dε^F(ε) ≜ E{i(t) Σ_(k=−K)^(K) [−sgn(k) s(t + ε − k)]}/S |_(i(t)=s(t))

with detector output means and covariance kernels of

{Dε^L(ε), [(2K + 1)(K + 1)K N(t)/3S] δ(t − τ)},  {Dε^F(ε), [2K N(t)/S] δ(t − τ)}

respectively, replacing {Dε(ε), [2N(t)/S] δ(t − τ)} of Example 12.9. The post-detection noise is seen to increase with increasing detector range, more so in the linear extension case than in the flat case.
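The quoted noise coefficients can be checked for consistency with Example 12.9 by a quick computation (an illustration only, using the coefficients stated above): at K = 1 both detectors reduce to the 2N(t)/S level of the previous example, and the linear extension's noise grows roughly as K³ versus the flat detector's linear growth in K.

```python
# Post-detection noise spectral density coefficients (multiples of N(t)/S)
# for the two extended-range detectors, as quoted in the text.
def linear_noise_coeff(K):
    return (2 * K + 1) * (K + 1) * K / 3.0

def flat_noise_coeff(K):
    return 2.0 * K

for K in (1, 2, 5, 10):
    print(K, linear_noise_coeff(K), flat_noise_coeff(K))
# K = 1 recovers the 2 N(t)/S level of Example 12.9 for both detectors.
```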


[Figure 12.14 appears here; horizontal axis is the tracking error ε, with breakpoints at ±1, ±K, and ±(K + 1).]

FIG. 12.14 Extended range detector error characteristics. (a) Linear detector error characteristic Dε^L. (b) Flat detector error characteristic Dε^F. (c) General antisymmetric error characteristic Dε^GA.

Therefore, one may seek a best K within these detector characteristics, or even a best characteristic functional form as in Fig. 12.14c, to yield desired tracking performance for a specified noise and code phase dynamics environment. As long as the detector characteristic is antisymmetric, the filter equations of the previous example are valid, with Dε^L, Dε^F, or Dε^GA replacing Dε. Note that replacing H[Dε, P(t)] by 1 in the extended Kalman filter for all cases (assuming dDε^GA(0)/dε = 1) is particularly inappropriate, since one envisions using the detectors in the regions over which their characteristics differ markedly.


In actual implementation, Q(t) can be established adaptively to account for varying code phase dynamics, as due to transmitting and/or receiving vehicle dynamics, and Rc(t) can be adapted to environmental noise, as due to jamming. Moreover, better dynamics models can be incorporated to reflect knowledge of phase dynamics characteristics. •

Another method that has been used to generate estimator results is the innovations process approach [42, 46, 55, 56, 58-61]. Basically, one seeks to transform the measurement history {y(τ); t₀ ≤ τ ≤ t} into a Brownian motion {u(τ); t₀ ≤ τ ≤ t} by a causal (nonanticipative) and causally invertible transformation; u(·,·) is then the innovations process. Estimation of the solution x(t) to (12-98) and (12-99) given the innovations history is simpler than estimation based upon the original measurement history described by (12-100)-(12-102). Given (12-98) and (12-99), and a unit-diffusion Brownian motion (or Wiener process) u(·,·) such that

dy(t) = ĥ[x(t),t] dt + Rc^(1/2)(t) du(t)    (12-123)

the conditional mean (MMSE estimate) can be shown to satisfy

dx̂(t) = f̂[x(t),t] dt + {Ê{x(t)hᵀ[x(t),t]} − x̂(t)ĥᵀ[x(t),t]} Rc^(−1/2)(t) du(t)    (12-124)

The process u(·,·) described by

du(t) = Rc^(−1/2)(t) {dy(t) − ĥ[x(t),t] dt}    (12-125)

is in fact such a Wiener process, and substitution of (12-125) and (12-100) into (12-124) yields the previous result of (12-108). This also demonstrates that the nonlinear filter residual {z(t) − ĥ[x(t),t]} is zero-mean white Gaussian noise of the same covariance as the measurement noise [42, 59]:

E{(z(t) − ĥ[x(t),t])(z(τ) − ĥ[x(τ),τ])ᵀ} = Rc(t) δ(t − τ)    (12-126)

if ĥ[x(t),t] could be evaluated without approximation. Note that the Gaussianness of the nonlinear filter residual is valid only in the continuous-measurement case. Finding a means of generating innovations processes in a general and/or practical computational manner is then the crux of estimator design by this method, though it remains one of difficult tractability.
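The whitening property (12-126) has a direct discrete-time analog that is easy to demonstrate: for a linear system, Kalman filter innovations normalized by their predicted standard deviation should be zero-mean, unit-variance, and serially uncorrelated. The scalar model below is a hypothetical illustration, not an example from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
phi, Q, R, n = 0.95, 0.1, 0.5, 200_000

x = rng.normal(0.0, 1.0)      # truth, drawn from the filter's assumed prior
xhat, P = 0.0, 1.0            # filter initial conditions
nu_norm = []
for _ in range(n):
    # truth propagation and measurement
    x = phi * x + rng.normal(0.0, np.sqrt(Q))
    y = x + rng.normal(0.0, np.sqrt(R))
    # filter time propagation
    xhat = phi * xhat
    P = phi * P * phi + Q
    # innovation and its predicted variance (normalized -> should be white N(0,1))
    S = P + R
    nu_norm.append((y - xhat) / np.sqrt(S))
    # measurement update
    K = P / S
    xhat = xhat + K * (y - xhat)
    P = (1.0 - K) * P

nu = np.array(nu_norm)
print(nu.mean(), nu.var(), np.corrcoef(nu[:-1], nu[1:])[0, 1])
```

The sample mean and lag-one correlation come out near zero and the sample variance near one, the discrete-time counterpart of (12-126).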

Continuous-time state estimation problems based on both continuous and discrete measurements can be solved by combining the results of this section and Section 12.3. To be more specific, the discrete-time measurement updates given by (12-12)-(12-14) or the vector equivalent of (12-21) and (12-26), or the various approximations thereto, can be combined with the appropriate equations for incorporating the continuous measurements between the discrete-time sample points, namely (12-108) and (12-109), or approximations to these.

In the development of appropriate nonlinear filter structures, the insights gained from simpler linear estimation problems have often been exploited [126-129, 142]. Martingales [35, 51, 119], use of orthogonality in a more general Hilbert space setting [5, 6, 37, 39, 81, 84], reproducing kernel Hilbert space methods [57], invariant embedding [8, 33, 101], least squares [49], and algebraic structure [83] approaches have also been used as alternative means of generating estimator results. Estimators have also been developed for the case of uncertainties described as unknown but bounded or as members of a given set [9, 103, 117, 118], for distributed parameter models of systems [73, 90, 140], and for the cases of smoothing [10, 80] and finite memory filtering [53, 54]. Furthermore, stochastic approximations [1, 44, 48, 75, 105, 109, 139, 146] have been used to produce estimators which are not necessarily optimum in any specific statistical sense, but which have certain desirable convergence characteristics.

12.8 SUMMARY

The nonlinear filtering problem for continuous-time systems with Markov state descriptions (12-1), given discrete-time noise-corrupted measurements (12-2), was described conceptually in Section 12.2. Basically, one attempts to depict the conditional density for x(t), conditioned on measurements available up to that time. A partial description of this density is provided by Eqs. (12-9) and (12-10) for propagating the first two moments between measurement sample times (as derived from the forward Kolmogorov equation), and update equations given by either (12-12)-(12-14) or the vector equivalents of (12-21) and (12-26), based upon Bayes' rule. The latter, better computational, update relations are dependent upon the assumption that the moments after update can be expressed by a truncated power series in the residual. However, even this partial description is an infinite-dimensional filter in general, since the conditional expectations in these relations require knowledge of the entire conditional density, i.e., of moments of all orders.

To develop implementable algorithms, various approximations can be made. One class of nonlinear filter exploits a Taylor series representation for the dynamics f function and the measurement h function. Within this class, the truncated second order filter is described for scalar problems via (12-17)-(12-19) and (12-27)-(12-31); the modified form removes the stochastic driving term from the covariance relation by setting hi = 0 in (12-31), and is described for the vector case by (12-42)-(12-54). Alternately, one can make generally better modeling assumptions to yield the Gaussian second order filter, (12-33)-(12-39), or the modified form thereof by setting hi = 0 in (12-39). The modified Gaussian second order filter for the vector case problem, (12-48)-(12-52) and (12-54)-(12-61), is perhaps the most applicable of this form of filter. Based on a tradeoff of computational loading and performance, a first order filter with bias correction terms, based on first order covariance and gain computations, but with a residual as given by (12-44) and (12-47) and a state propagation as by (12-49) and (12-51), is often a viable alternative to full-scale second order filters.


Assumed density filters do not inherently involve Taylor series for f and h, and therefore are more generally applicable and often outperform filters based on such representations. Fundamentally, these filters implement the full-scale moment equations (12-9)-(12-14) and the vector equivalent of (12-21) and (12-26), evaluating required conditional expectations as though the conditional state density were of an assumed form, especially Gaussian with mean x̂(t/tᵢ₋₁) and covariance P(t/tᵢ₋₁) as computed by the algorithm for all t ∈ [tᵢ₋₁, tᵢ).
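In the scalar Gaussian case, evaluating a required conditional expectation Ê{f(x)} under the assumed density reduces to a one-dimensional Gauss-Hermite quadrature; the snippet below is a generic illustration of that evaluation (not code from the text), checked against closed-form Gaussian moments.

```python
import numpy as np

def gaussian_expectation(f, mean, var, order=20):
    # Gauss-Hermite quadrature for E{f(x)}, x ~ N(mean, var):
    # substituting x = mean + sqrt(2*var)*t gives
    # E{f} = (1/sqrt(pi)) * sum_i w_i * f(x_i), weights absorbing exp(-t^2).
    t, w = np.polynomial.hermite.hermgauss(order)
    x = mean + np.sqrt(2.0 * var) * t
    return float(np.sum(w * f(x)) / np.sqrt(np.pi))

# sanity checks against closed forms for x ~ N(m, P)
m, P = 0.3, 0.8
print(gaussian_expectation(lambda x: x, m, P))     # ~ m
print(gaussian_expectation(lambda x: x**2, m, P))  # ~ P + m^2
print(gaussian_expectation(np.sin, m, P))          # ~ sin(m)*exp(-P/2)
```

The same idea extends to the vector case via products of one-dimensional rules or related sigma-point constructions.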

Higher order moment filters can be generated by writing higher order moment equations explicitly, as via the moment equation (12-62) derived from the forward Kolmogorov equation and expansions of the form of (12-21) and (12-26). Cumulant truncation as described in (12-64) can be used in the approximate evaluation of required conditional expectations.
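For a scalar state, truncating all cumulants above second order forces the third central moment to zero and the fourth to 3P(t)², exactly the Gaussian values; a quick Monte Carlo check of those two identities (an illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)
P = 2.0
# zero-mean Gaussian samples playing the role of the estimation error x - xhat
x_tilde = rng.normal(0.0, np.sqrt(P), 2_000_000)

print(np.mean(x_tilde**3))             # third central moment: ~ 0
print(np.mean(x_tilde**4), 3 * P**2)   # fourth central moment: ~ 3 P^2
```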

The corresponding results for a discrete-time dynamics model, (12-65) and (12-66), were delineated by (12-67) and approximations thereto.

Section 12.4 described conditional quasi-moments as an alternative parameterization to moments of the conditional state density. Hermite polynomial series were motivated as a natural replacement for power series involving moments, and the resulting filter does provide good approximations for nearly Gaussian densities with few parameters required. However, for most problems of interest, there is little benefit to this form over conditional moment estimators, since they are both typically truncated in practice at reasonably low order, under which condition they differ only slightly.

Instead of generating an MMSE estimate as just described, the MAP estimate can be given by the conditional mode. Equations (12-80), (12-82), (12-85), and (12-86), and suitable approximations to these, yield this estimator.

For the special case of G being a function only of time t and not of x(t), a statistically linearized filter as given by (12-90b) and (12-92b)-(12-97) can be employed. Like assumed density filters, it does not depend upon series representations for f and h. Structurally the algorithm has the form of an extended Kalman filter, but with partials of f and h evaluated at the current estimate replaced by statistically optimized matrices that are directly related to describing function gains for the associated nonlinearities. Its performance often surpasses that of an extended Kalman filter by a considerable margin.

Finally, Section 12.7 considered the case of continuous-time state estimation based upon continuous-time measurements being available, as described by (12-98)-(12-102). The Kushner equation (12-103) describes the propagation of the conditional state density, and it forms a fundamental basis for estimation in the same manner as the forward Kolmogorov equation did for propagations between sampled-data measurements. The moment equations (12-108) and (12-109), and the general (12-117) derived from the Kushner equation, provide an infinite-dimensional filter analogous to the discrete-time measurement results (12-9)-(12-14), (12-21), (12-26), and (12-62). As in the discrete-time measurement case,


approximations yield the truncated second order filter ((12-110)-(12-115)), the Gaussian second order filter ((12-110), (12-112)-(12-116)), modified second order filters (removing the stochastic driving term from (12-111)), the first order filter with bias correction terms (additionally removing the second order derivative terms from GQGᵀ), the assumed density filter ((12-108) and (12-109), with a Gaussian density assumption for expectation evaluation particularly), and the higher order moment filter ((12-117) with cumulant truncation). Other fundamental tools for filter generation are the representation theorem, (12-118) and (12-119), and the innovations process approach, (12-123)-(12-126).

REFERENCES

1. Albert, A. E., and Gardner, L. A., Jr., "Stochastic Approximation and Nonlinear Regression." MIT Press, Cambridge, Massachusetts, 1967.

2. Andrade Netto, M. L., Gimeno, L., and Mendes, M. J., On the optimal and suboptimal nonlinear filtering problem for discrete-time systems, IEEE Trans. Automat. Control AC-23 (6), 1062-1067 (1978).

3. Athans, M., The compensated Kalman filter, Proc. Symp. Nonlinear Estimation and Its Appl., 2nd, San Diego, California, pp. 10-22 (September 1971).

4. Athans, M., Wishner, R. P., and Bertolini, A., Suboptimal state estimators for continuous-time nonlinear systems from discrete noisy measurements, IEEE Trans. Automat. Control AC-13 (5), 504-518 (1968); also Proc. Joint Automat. Control Conf., Ann Arbor, Michigan, pp. 364-382 (June 1968).

5. Balakrishnan, A. V., A general theory of nonlinear estimation problems in control systems, J. Math. Anal. Appl. 8, 4-30 (1964).

6. Balakrishnan, A. V., State estimation for infinite dimensional systems, Comput. Syst. Sci. 1, 391-403 (1967).

7. Bass, R. W., Norum, V. D., and Schwartz, L., Optimal multichannel nonlinear filtering, J. Math. Anal. Appl. 16, 152-164 (1966).

8. Bellman, R. E., Kagiwada, H. H., Kalaba, R. E., and Sridhar, R., Invariant imbedding and nonlinear filtering theory, J. Astronaut. Sci. 13, 110-115 (1966).

9. Bertsekas, D. P., and Rhodes, I. B., Recursive state estimation for a set-membership description of uncertainty, IEEE Trans. Automat. Control AC-16 (2), 117-128 (1971).

10. Biswas, K. K., and Mahalanabis, A. K., Suboptimal algorithms for nonlinear smoothing, IEEE Trans. Aerospace Electron. Syst. AES-9 (4), 529-534 (1973).

11. Bowles, W. M., Correlation Tracking. Ph.D. dissertation, MIT, Cambridge, Massachusetts (January 1980).

12. Box, G. E. P., and Jenkins, G. M., "Time Series Analysis: Forecasting and Control." Holden-Day, San Francisco, California, 1976.

13. Bryson, A. E., and Frazier, M., Smoothing for nonlinear dynamic systems, Proc. Optimum Syst. Synthesis Conf., Tech. Rep. ASD-TDR-63-119, pp. 354-364. Wright-Patterson AFB, Ohio (February 1963).

14. Bucy, R. S., Nonlinear filtering theory, IEEE Trans. Automat. Control AC-10 (2), 198 (1965).

15. Bucy, R. S., Recent results in linear and non-linear filtering, Proc. Joint Automat. Control Conf., Stochastic Probl. in Control, Ann Arbor, Michigan, pp. 87-106 (June 1968).

16. Bucy, R. S., Nonlinear filtering with pipeline and array processors, Proc. IEEE Conf. Decision and Control, New Orleans, Louisiana, pp. 626-629 (December 1977).

17. Bucy, R. S., Hecht, C., and Senne, K. D., An Engineer's Guide to Building Nonlinear Filters, Frank J. Seiler Research Laboratory Rep. SRL-TR-72-0004, Colorado Springs, Colorado (1972).


18. Bucy, R. S., and Joseph, P. D., "Filtering for Stochastic Processes with Applications to Guidance." Wiley (Interscience), New York, 1968.

19. Bucy, R. S., and Senne, K. D., Digital synthesis of non-linear filters, Automatica 7, 287-299 (1971).

20. Cameron, R. H., and Martin, W. T., The orthogonal development of nonlinear functionals in series of Fourier-Hermite functionals, Ann. of Math. 48, 385-392 (1947).

21. Cameron, R. H., and Martin, W. T., The transformation of Wiener integrals by nonlinear transformations, Trans. Amer. Math. Soc. 66, 253-283 (1949).

22. Carney, T. M., and Goldwyn, R. M., Numerical experiments with various optimal estimators, J. Optim. Theory Appl. 1, 113-130 (1967).

23. Clark, J. M. C., Two recent results in nonlinear filtering theory, in "Recent Mathematical Developments in Control" (D. J. Bell, ed.). Academic Press, New York, 1973.

24. Cox, H., On the estimation of state variables and parameters for noisy dynamic systems, IEEE Trans. Automat. Control AC-9 (1), 5-12 (1964).

25. Cox, H., Estimation of state variables via dynamic programming, Proc. Joint Automat. Control Conf., Stanford, California, pp. 376-381 (1964).

26. Cramer, H., "Mathematical Methods of Statistics." Princeton Univ. Press, Princeton, New Jersey, 1961.

27. Culver, C. O., Optimal Estimation for Nonlinear Stochastic Systems. Sc.D. dissertation, MIT, Cambridge, Massachusetts (March 1969).

28. Culver, C. O., Optimal estimation for nonlinear systems, Proc. AIAA Conf., Princeton, New Jersey, Paper No. 69-852 (1969).

29. Davis, M. H. A., The application of nonlinear filtering to fault detection in linear systems, IEEE Trans. Automat. Control AC-20 (3), 257-259 (1975).

30. Deckert, J. C., Desai, M. N., Deyst, J. J., and Willsky, A. S., F-8 DFBW sensor failure identification using analytic redundancy, IEEE Trans. Automat. Control AC-22 (5), 795-803 (1977).

31. Denham, W. F., and Pines, S., Sequential estimation when measurement function nonlinearity is comparable to measurement error, AIAA J. 4, 1071-1076 (1966).

32. Desai, M. N., Deckert, J. C., and Deyst, J. J., Jr., Dual-sensor failure identification using analytic redundancy, AIAA J. Guidance and Control 2 (3), 213-220 (1979).

33. Detchmendy, D. M., and Sridhar, R., Sequential estimation of states and parameters in noisy non-linear dynamical systems, Proc. Joint Automat. Control Conf., Troy, New York, pp. 56-63 (1965); also Trans. ASME, J. Basic Eng. 362-368 (1966).

34. Deyst, J. J., Jr., and Deckert, J. C., Application of likelihood ratio methods to failure detection and identification in the NASA F-8 DFBW aircraft, Proc. IEEE Conf. Decision and Control, Houston, Texas, p. 237 (1975).

35. Doob, J. L., "Stochastic Processes." Wiley, New York, 1953.

36. Duncan, T. E., Probability Densities for Diffusion Processes with Applications to Nonlinear Filtering Theory and Detection Theory. Ph.D. dissertation, Stanford Univ., Stanford, California (1967).

37. Falb, P. L., Infinite dimensional filtering: The Kalman-Bucy filter in Hilbert space, Informat. Control 11, 102-107 (1967).

38. Fisher, J. R., Optimal nonlinear filtering, in "Control and Dynamic Systems: Advances in Theory and Applications," Vol. 5, pp. 198-300. Academic Press, New York, 1967.

39. Fisher, J. R., and Stear, E. B., Optimum nonlinear filtering for independent increment processes-Part I, IEEE Trans. Informat. Theory IT-13 (4), 558-578 (1967).

40. Friedland, B., and Bernstein, I., Estimation of the state of a nonlinear process in the presence of nongaussian noise and disturbances, J. Franklin Inst. 281, 455-480 (1966).

41. Friedman, A., "Stochastic Differential Equations and Applications," Vols. 1 and 2. Academic Press, New York, 1975, 1976.


42. Frost, P. A., and Kailath, T., An innovations approach to least-squares estimation-Part III: Nonlinear estimation in white Gaussian noise, IEEE Trans. Automat. Control AC-16 (3), 217-226 (1971).

43. Fujisaki, M., Kallianpur, G., and Kunita, H., Stochastic differential equations for the nonlinear filtering problem, Osaka J. Math. 9 (1), 19-40 (1972).

44. Gelb, A. (ed.), "Applied Optimal Estimation." MIT Press, Cambridge, Massachusetts, 1974.

45. Gelb, A., and VanderVelde, W. E., "Multiple-Input Describing Functions and Nonlinear System Design." McGraw-Hill, New York, 1968.

46. Gevers, M., and Kailath, T., An innovations approach to least-squares estimation-Part VI: Discrete-time innovations representation and recursive estimation, IEEE Trans. Automat. Control AC-18 (6), 588-600 (1973).

47. Hempel, P. R., General expansion of the density for nonlinear filtering, AIAA J. Guidance and Control 3 (2), 166-171 (1980).

48. Ho, Y. C., On the stochastic approximation method and optimal filtering theory, J. Math. Anal. Appl. 6, 152-154 (1962).

49. Ho, Y. C., The Method of Least Squares and Optimal Filtering Theory, Memo RM-3329-PR. Rand Corp., Santa Monica, California (October 1962).

50. Ho, Y. C., and Lee, R. C. K., A Bayesian approach to problems in stochastic estimation and control, IEEE Trans. Automat. Control AC-9 (4), 333-339 (1964).

51. Hsu, K., and Marcus, S. I., A general martingale approach to discrete-time stochastic control and estimation, IEEE Trans. Automat. Control AC-24 (4), 580-583 (1979).

52. Jazwinski, A. H., Filtering for nonlinear dynamical systems, IEEE Trans. Automat. Control AC-11 (5), 765-766 (1966).

53. Jazwinski, A. H., Limited memory optimal filtering, IEEE Trans. Automat. Control AC-13 (5), 558-563 (1968).

54. Jazwinski, A. H., "Stochastic Processes and Filtering Theory." Academic Press, New York, 1970.

55. Kailath, T., An innovations approach to least squares estimation-Part I: Linear filtering in additive white noise, IEEE Trans. Automat. Control AC-13 (6), 646-654 (1968).

56. Kailath, T., The innovations approach to detection and estimation theory, Proc. IEEE 58 (5), 680-695 (1970).

57. Kailath, T., RKHS approach to detection and estimation problems-Part I: Deterministic signals in Gaussian noise, IEEE Trans. Informat. Theory IT-17 (5), 530-549 (1971).

58. Kailath, T., A note on least-squares estimation by the innovations method, SIAM J. Control 10, 477-486 (1972).

59. Kailath, T., and Frost, P. A., Mathematical modeling of stochastic processes, Proc. Joint Automat. Control Conf., Stochastic Probl. Control, Ann Arbor, Michigan, pp. 1-38 (June 1968).

60. Kailath, T., and Frost, P. A., An innovations approach to least squares estimation-Part II: Linear smoothing in additive white noise, IEEE Trans. Automat. Control AC-13 (6), 655-661 (1968).

61. Kailath, T., and Geesey, R., An innovations approach to least squares estimation-Part IV: Recursive estimation given lumped covariance functions, IEEE Trans. Automat. Control AC-16 (6), 720-727 (1971).

62. Kallianpur, G., and Striebel, C., Arbitrary system process with additive white noise observation errors, Ann. Math. Statist. 39 (3), 785-801 (1968).

63. Kallianpur, G., and Striebel, C., Stochastic differential equations occurring in the estimation of continuous parameter stochastic processes, Theory Probab. Appl. 14 (4), 567-594 (1969).

64. Kallianpur, G., and Striebel, C., Stochastic differential equations in statistical estimation problems, in "Multivariate Analysis" (P. R. Krishnaiah, ed.), Vol. II. Academic Press, New York, 1969.


65. Kendall, M., and Stuart, A., "The Advanced Theory of Statistics," Vol. 1. Hafner, New York, 1958.

66. Kizner, W., Optimal Nonlinear Estimation Based on Orthogonal Expansions, Tech. Rep. 32-1366. Jet Propulsion Laboratory, Pasadena, California, 1969.

67. Kramer, J. D. R., Partially Observable Markov Processes. Ph.D. dissertation, MIT, Cambridge, Massachusetts, 1964.

68. Kushner, H. J., On the differential equations satisfied by conditional probability densities of Markov processes, SIAM J. Control Ser. A 2, 106-119 (1964).

69. Kushner, H. J., On the dynamical equations of conditional probability density functions, with applications to optimal stochastic control theory, J. Math. Anal. Appl. 8, 332-344 (1964).

70. Kushner, H. J., Dynamical equations for optimal nonlinear filtering, J. Differential Equations 3, 179-190 (1967).

71. Kushner, H. J., Nonlinear filtering: The exact dynamical equations satisfied by the conditional mode, IEEE Trans. Automat. Control AC-12 (3), 262-267 (1967).

72. Kushner, H. J., Approximations to optimal nonlinear filters, IEEE Trans. Automat. Control AC-12 (5), 546-556 (1967).

73. Kushner, H. J., Filtering for linear distributed parameter systems, SIAM J. Control 8 (3), 346-359 (1970).

74. Kushner, H. J., A robust discrete state approximation to the optimal nonlinear filter for a diffusion, Stochastics 3, 75-83 (1979).

75. Kushner, H. J., and Clark, D. S., "Stochastic Approximation Methods for Constrained and Unconstrained Systems." Springer-Verlag, Berlin and New York, 1978.

76. Kuznetsov, P. I., Stratonovich, R. L., and Tikhonov, V. I., Quasi-moment functions in the theory of random processes, Theory Probab. Appl. 5 (1), 80-97 (1960).

77. Kuznetsov, P. I., Stratonovich, R. L., and Tikhonov, V. I., Some problems with conditional probability and quasi-moment functions, Theory Probab. Appl. 6 (4), 422-427 (1961).

78. Kuznetsov, P. I., Stratonovich, R. L., and Tikhonov, V. I., "Non-Linear Transformations of Stochastic Processes." Pergamon, Oxford, 1965.

79. Lee, W. H., and Athans, M., The Discrete-Time Compensated Kalman Filter, Rep. ESL-P-791. MIT Electronic Systems Laboratory, Cambridge, Massachusetts (December 1977).

80. Leondes, C. T., Peller, J. B., and Stear, E. B., Nonlinear smoothing theory, IEEE Trans. Syst. Sci. and Cybernet. SSC-6 (1), 63-71 (1970).

81. Luenberger, D. G., "Optimization by Vector Space Methods." Wiley, New York, 1969.

82. Mahalanabis, A. K., and Farooq, M., A second-order method for state estimation for nonlinear dynamical systems, Internat. J. Control 14 (4), 631-639 (1971).

83. Marcus, S. I., and Willsky, A. S., Algebraic structure and finite dimensional nonlinear estimation, SIAM J. Math. Anal. 9 (2), 312-327 (1978).

84. Masani, P., and Wiener, N., Nonlinear prediction, in "Probability and Statistics" (U. Grenander, ed.), pp. 190-212. Wiley, New York, 1959.

85. Maybeck, P. S., Combined Estimation of States and Parameters for On-Line Applications. Ph.D. dissertation, MIT, Cambridge, Massachusetts (February 1972).

86. Maybeck, P. S., Failure Detection Through Functional Redundancy, Tech. Rep. AFFDL-TR-74-3. Air Force Flight Dynamics Laboratory, Wright-Patterson AFB, Ohio (January 1974).

87. Maybeck, P. S., Failure detection without excessive hardware redundancy, Proc. IEEE Nat. Aerospace Electron. Conf., Dayton, Ohio, pp. 315-322 (May 1976).

88. Maybeck, P. S., "Stochastic Models, Estimation and Control," Vol. 1. Academic Press, New York, 1979.

89. McGarty, T. P., "Stochastic Systems and State Estimation." Wiley, New York, 1974.

90. Meditch, J. S., On the state estimation for distributed parameter systems, J. Franklin Inst. 290 (1), 49-59 (1970).


91. Mehra, R., A comparison of several nonlinear filters for re-entry vehicle tracking, IEEE Trans. Automat. Control AC-16 (4), 307-319 (1971).

92. Mehra, R. K., and Peschon, J., An innovations approach to fault detection and diagnosis in dynamic systems, Automatica 7, 637-640 (1971).

93. Meier, L., Ross, D. W., and Glaser, M. B., Evaluation of the Feasibility of Using Internal Redundancy to Detect and Isolate On-board Control Data Instrumentation Failures, Tech. Rep. AFFDL-TR-70-172. Air Force Flight Dynamics Lab., Wright-Patterson AFB, Ohio (January 1971).

94. Mortensen, R. E., The representation theorem, Proc. Symp. Nonlinear Estimation (1970).

95. Mowery, V. O., Least squares recursive differential-correction estimation in nonlinear problems, IEEE Trans. Automat. Control AC-10 (4), 399-407 (1965).

96. Musick, S. H., SOFE: A Generalized Digital Simulation for Optimal Filter Evaluation; User's Manual, Tech. Rep. AFWAL-TR-80-1108. Avionics Laboratory, Air Force Wright Aeronautical Laboratories, Wright-Patterson AFB, Ohio (1980).

97. Nakamizo, T., On the state estimation for nonlinear dynamical systems, Internat. J. Control 11 (4), 683-695 (1970).

98. Neal, S. R., Linear estimation in the presence of errors in the assumed plant dynamics, IEEE Trans. Automat. Control AC-12 (5), 592-594 (1967).

99. Neal, S. R., Nonlinear estimation techniques, IEEE Trans. Automat. Control AC-13 (6), 705-708 (1968).

100. Park, S. K., and Lainiotis, D. G., Monte Carlo study of the optimal non-linear estimator, Internat. J. Control 16 (6), 1029-1040 (1972).

101. Pearson, J. B., On nonlinear least-squares filtering, Automatica 4, 97-105 (1967).

102. Pearson, J. O., Estimation of uncertain systems, in "Control and Dynamic Systems: Advances in Theory and Applications," Vol. 10, pp. 256-343. Academic Press, New York, 1972.

103. Rhodes, I. B., and Gilman, A. S., Cone-bounded nonlinearities and mean-square bounds-Estimation lower bounds, IEEE Trans. Automat. Control AC-20 (5), 632-642 (1975).

104. Rhodes, I. B., and Snyder, D. L., Estimation and control for space-time point-process observations, IEEE Trans. Automat. Control AC-22 (3), 338-346 (1977).

105. Robbins, H., and Monro, S., A stochastic approximation method, Ann. Math. Statist. 22, 400-407 (1951).

106. Robinson, S. R., Maybeck, P. S., and Santiago, J. M., Jr., Tracking a swarm of fireflies in the presence of stationary stragglers, Proc. Internat. Symp. Informat. Theory, Grignano, Italy (June 1979).

107. Robinson, S. R., Maybeck, P. S., and Santiago, J. M., Jr., Performance evaluation of an estimator based upon space-time point-process measurements, IEEE Trans. Informat. Theory IT-28 (to appear, 1982).

108. Rozovskii, B. L., and Shiryaev, A. N., On infinite order systems of stochastic differential equations arising in the theory of optimal non-linear filtering, Theory Probab. Appl. 17 (2), 218-226 (1972).

109. Saridis, G. N., Nikolic, Z. J., and Fu, K. S., Stochastic approximation algorithms for system identification, estimation and decomposition of mixtures, IEEE Trans. Syst. Sci. Cybernet. SSC-5 (1), 8-15 (1969).

110. Sawaragi, Y., and Sugai, M., Accuracy considerations of the equivalent linearization technique for the analysis of a non-linear control system with a Gaussian random input, Memoirs Faculty of Eng., Univ. of Kyoto, Kyoto, Japan 23, Part 3 (July 1961).

111. Sawaragi, Y., Sunahara, Y., and Nakamizo, T., Statistical Behaviors of the Response of Non-Linear Control Systems Subjected to Random Inputs, Rep. 78. Engineering Research Inst., Univ. of Kyoto (March 1961).

112. Schmidt, G. T., Linear and nonlinear filtering techniques, in "Control and Dynamic Systems: Advances in Theory and Applications" (C. T. Leondes, ed.), Vol. 12, pp. 63-98. Academic Press, New York, 1976.


113. Schwartz, L., Approximate continuous nonlinear minimum-variance filtering, "Control andDynamic Systems: Advances in Theory and Applications," Vol. 7, pp. 32-72. AcademicPress, New York, 1969.

114. Schwartz, L., and Stear, E. 8., A computational comparison of several nonlinear filters,IEEE Trans. Automat. Control AC-13, (I), 83-86 (1968).

115. Schwartz, L., and Stear, E. 8., A valid mathematical model for approximate nonlinearminimal variance filtering, J. Math. Anal. Appl. 21, 1-6 (1968).

116. Schwartz, S. C; Estimation of probability densities by an orthogonal series, Ann. Math.Statist. 38, 1261-1265 (1967).

117. Schweppe, F. C., Recursive state estimation: Unknown but bounded errors and systeminputs, IEEE Trans. Automat. Control AC-13 (1),22-28 (1968).

118. Schweppe, F. c., "Uncertain Dynamic Systems." Prentice-Hall, Englewood Cliffs, NewJersey, 1973.

119. Segall, A, Stochastic processes in estimation theory, IEEE Trans. Informal. Theory IT-22(3),275-286 (1976).

120. Smith, H. W., "Approximate Analysis of Randomly Excited Nonlinear Controls." MITPress, Cambridge, Massachusetts, 1966.

121. Snyder, D. L., Filtering and detection for doubly stochastic Poisson processes, IEEE Trans.Informat. Theory IT-18 (I), 91-102 (1971).

122. Snyder, D. L., Information processing for observed jump processes, Informat. Control 22,69-78 (1973).

123. Snyder, D. L., Point process estimation with applications in medicine, communication, andcontrol, Proc. NATO Advanced Study Inst., New Direct. Signal Processing Commun. Control(1974).

124. Snyder, D. L., "Random Point Processes." Wiley, New York, 1975.

125. Snyder, D. L., and Fishman, P. M., How to track a swarm of fireflies by observing their flashes, IEEE Trans. Informat. Theory IT-21 (6), 692-695 (1975).

126. Sorenson, H. W., On the development of practical nonlinear filters, Informat. Sci. 7 (3/4), 253-270 (1974).

127. Sorenson, H. W., An overview of filtering and control in dynamic systems, "Control and Dynamic Systems: Advances in Theory and Applications" (C. T. Leondes, ed.), Vol. 12, pp. 1-61. Academic Press, New York, 1976.

128. Sorenson, H. W., Approximate solutions of the nonlinear filtering problem, Proc. IEEE Conf. Decision and Control, New Orleans, Louisiana, pp. 620-625 (December 1977).

129. Sorenson, H. W., and Alspach, D. L., Recursive Bayesian estimation using Gaussian sums, Automatica 7, 465-479 (1971).

130. Sorenson, H. W., and Stubberud, A. R., Recursive filtering for systems with small but nonnegligible nonlinearities, Internat. J. Control 7, 271-280 (1968).

131. Sorenson, H. W., and Stubberud, A. R., Non-linear filtering by approximation of the a posteriori density, Internat. J. Control 8, 33-51 (1968).

132. Srinivasan, K., State estimation by orthogonal expansion of probability distribution, IEEE Trans. Automat. Control AC-15 (1), 3-10 (1970).

133. Stratonovich, R. L., On the theory of optimal nonlinear filtering of random functions, Theor. Probab. Appl. 4, 223-225 (1959).

134. Stratonovich, R. L., Conditional Markov processes, Theor. Probab. Appl. 5, 156-178 (1960).

135. Stratonovich, R. L., "Topics in the Theory of Random Noise." Gordon and Breach, New York, 1963.

136. Stratonovich, R. L., A new representation for stochastic integrals and equations, J. SIAM Control Ser. A 4 (2), 362-371 (1966).

137. Striebel, C. T., Partial differential equations for the conditional distribution of a Markov process given noisy observations, J. Math. Anal. Appl. 11, 151-159 (1965).

Page 282: Stochastic Models, Estimation, And Control Volume 3


138. Sunahara, Y., An Approximation Method of State Estimation for Nonlinear Dynamical Systems, Tech. Rep. 67-8. Brown Univ., Center for Dynamical Systems, Providence, Rhode Island, December 1967; also Proc. Joint Automat. Control Conf., Univ. of Colorado (1969).

139. Tarn, T. J., and Rasis, Y., Observers for nonlinear stochastic systems, IEEE Trans. Automat. Control AC-21 (4), 441-448 (1976).

140. Tzafestas, S. G., and Nightingale, J. M., Maximum-likelihood approach to the optimal filtering of distributed-parameter systems, Proc. IEE 116 (6), 1085-1093 (1969).

141. Ventzel, A. D., On the equations of the theory of conditional Markov processes, Theory Probab. Appl. 10, 357-361 (1965).

142. Wiener, N., "Nonlinear Problems in Random Theory." MIT Press, Cambridge, Massachusetts, 1958.

143. Willsky, A. S., A survey of design methods for failure detection in dynamic systems, Automatica 12, 601-611 (1976).

144. Willsky, A. S., and Jones, H. L., A generalized likelihood ratio approach to the detection and estimation of jumps in linear systems, IEEE Trans. Automat. Control AC-21 (1), 108-112 (1976).

145. Wishner, R. P., Tabaczynski, J. A., and Athans, M., A comparison of three non-linear filters, Automatica 5, 487-496 (1969).

146. Wolfowitz, J., On the stochastic approximation method of Robbins and Monro, Ann. Math. Statist. 23 (3), 457-466 (1952).

147. Wonham, W. M., Some applications of stochastic differential equations to optimal non-linear filtering, SIAM J. Control 2, 347-369 (1965).

148. Zakai, M., On the optimal filtering of a diffusion process, Z. Wahrsch. Verw. Gebiete 11, 230-243 (1969).

PROBLEMS

12.1 (a) Show that f_x(t)(ξ) satisfies the forward Kolmogorov equation.

(b) Show that f_x(t)|Z(t_{i−1})(ξ | Z_{i−1}) satisfies the forward Kolmogorov equation in the interval [t_{i−1}, t_i), starting from the density f_x(t_{i−1})|Z(t_{i−1})(ξ | Z_{i−1}) obtained from the measurement update at sample time t_{i−1} (as claimed below (12-3)).

12.2 (a) Show that, for any scalar random variables y and z, and scalar functions g(·) and h(·),

E{g(z)h(y) | z = ζ} = g(ζ) E{h(y) | z = ζ}

and

E{g(z)h(y) | z = z(·)} = g(z) E{h(y) | z = z(·)}

(b) Demonstrate how this extends to the vector case, as needed for (12-16) and many other results of this chapter.
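The identity in part (a) can be spot-checked by Monte Carlo for a discrete-valued z; the joint model below (z = ±1 equally likely, y = z plus uniform noise) and the functions g, h are illustrative choices, not from the text.

```python
import random

random.seed(0)

# Spot check of E{g(z)h(y) | z = zeta} = g(zeta) E{h(y) | z = zeta}
# for an illustrative discrete joint model: z = +-1 equally likely,
# y = z + uniform(-1, 1) noise.
g = lambda z: z ** 2 + 1.0
h = lambda y: y ** 3

samples = [(z, z + random.uniform(-1.0, 1.0))
           for z in (random.choice((-1.0, 1.0)) for _ in range(50000))]

zeta = 1.0
cond = [(zv, yv) for zv, yv in samples if zv == zeta]
lhs = sum(g(zv) * h(yv) for zv, yv in cond) / len(cond)
rhs = g(zeta) * sum(h(yv) for _, yv in cond) / len(cond)
print(abs(lhs - rhs) < 1e-9)
```

Since g(z) is constant over the conditioning set {z = ζ}, it factors out of the sample average exactly, which is the content of the identity.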

12.3 Obtain the vector equivalent of Eqs. (12-21) and (12-26).

12.4 (a) In the scalar case, show that the truncated second order, modified truncated second order, Gaussian second order, and modified Gaussian second order filters all reduce to the extended Kalman filter if G is a function only of t (and not of x(t) as well) and if ∂²f/∂x² and ∂²h/∂x² are neglected.

(b) Show that both modified second order filters in the vector case similarly reduce to the extended Kalman filter under these assumptions.

(c) Show that all four second order filters of part (a) reduce to the standard Kalman filter if dynamics and measurement models are linear.

Page 283: Stochastic Models, Estimation, And Control Volume 3


(d) Show that both modified second order filters of part (b) reduce to the standard Kalman filter if dynamics and measurement models are linear.

12.5 (a) Derive the expression (12-53) for GQGᵀ in the modified truncated second order filter, and show that it reduces to (12-17b) in the scalar case.

(b) Derive the expression (12-61) for GQGᵀ in the modified Gaussian second order filter by writing the appropriate terms in summation notation and invoking the form developed in Problem 10.9. Show that this reduces to (12-35) in the scalar case.

12.6 Consider a system that can be modeled by means of the scalar equation

ẋ(t) = t⁻¹ u(t) cos x(t) + [u(t)w(t) − t] x³(t) + 4 e^{−x(t)} [u(t) + w(t)] + 2 w(t)

where u(·) is a prespecified open-loop control function of time, and w(·,·) is a white Gaussian noise of mean zero and strength Q(t).

A scalar measurement is available at discrete time instants t_i, of the form

z(t_i) = [2 sin t_i] / [1 + x(t_i)] + v(t_i)

where v(·,·) is white Gaussian discrete-time noise of mean zero and variance R(t_i).

Write out the explicit equations for the modified truncated and modified Gaussian second order filters for this problem. Explain the assumptions inherent in each of these filters. Can you also generate an extended Kalman filter for this problem?

12.7 Show that the relationships of (12-64) between cumulants κ_i and noncentral moments m_i are correct. Show that the relationships to central moments m_i′ are also correct.

12.8 Consider state estimation based on a scalar model of the form

based on sampled-data measurements described by

where β(·,·) is Brownian motion of constant diffusion Q, x(t₀) is described by mean x̂₀ and variance P₀, and v(·,·) is a zero-mean white Gaussian discrete-time noise of variance R. Let x(t₀), β(·,·), and v(·,·) be independent of each other. For this problem, generate and compare

(a) the linearized Kalman filter.

(b) the first order nonlinear filter. Under what conditions on model coefficients is this the extended Kalman filter?

(c) the first order filter with precomputed gains; the first order filter with constant gains. (What is the difference between these two, and again when does this become the extended Kalman filter with precomputed or constant gains?)

(d) the first order filter with bias correction terms.

(e) the precomputed-gain or constant-gain first order filter with bias correction terms.

(f) the truncated second order filter.

(g) the modified truncated second order filter.

(h) the Gaussian second order filter. Under what conditions on model coefficients does this result have the same time propagation relations as in part (f)? Under what conditions are the measurement update equations the same as in (f)?

(i) the modified Gaussian second order filter.

(j) the moment propagation equations given by (12-9), (12-10), and (12-62) for the full-scale (infinite-dimensional) filter, and measurement updates generated by (12-13) and (12-14).

Page 284: Stochastic Models, Estimation, And Control Volume 3


(k) the second order Gaussian assumed density filter (not involving Taylor series approximations as in (a)-(i)) based on the first two moment propagation relations from (j), and a measurement update based upon (12-21) with b_i ≡ 0 and (12-26) instead of (12-13) and (12-14).

(l) the second order assumed density filter based on cumulants truncation; express in terms of central moments as well as noncentral moments. When is this equivalent to the results in (k)?

(m) the third order assumed density filter based on cumulants truncation; express in terms of central moments as well as noncentral moments.

(n) the full-scale conditional mode estimator equations; compare these to the results in (j).

(o) the statistically linearized filter; compare these results to those in (a)-(e), and (k) and (l). Specifically compare the describing function gains to the corresponding terms in the first order filters.

By explicitly generating each of these filters, both similarities and differences of the various approximate solutions should become more evident. Under what conditions on the model coefficients do the distinguishing characteristics disappear for this particular scalar problem?

12.9 (a) Write out the explicit equations for the extended Kalman filter and modified Gaussian second order filter for Example 12.3. The purpose of this is to gain an appreciation for the computational differences between the filters. Note that f and h are both nonlinear here, but that G is a function only of time t.

(b) Write out the extended Kalman filter with bias correction terms and compare to the results in part (a).

(c) Evaluate the linearized Kalman filter for this problem.

(d) Evaluate the precomputed-gain and constant-gain extended Kalman filters for this problem, and compare to the previous results.

(e) Consider the filters of part (d) with bias correction terms.

(f) Generate the equations to define the modified truncated second order filter.

(g) Evaluate second order and third order assumed density filters for this problem.

(h) Evaluate the statistically linearized filter for this problem, and compare to the previous results. Especially compare the describing function gains here to the corresponding gains of the extended Kalman filter.

12.10 Repeat parts (a)-(o) of Problem 12.8, but for the following vector-case problems (in parts (f)-(i), consider only the modified filters):

(a) the satellite orbit determination problem described by Example 9.8 and Problems 9.7 and 9.13. Here f is nonlinear, h is linear, and G is a function only of time t.

(b) the uncertain parameter estimation problem of spacecraft thrust-vector controlling, as discussed in Example 9.9 and Problem 9.9. With the uncertain parameter treated as an additional state, f, h, and G have the same structural form as in part (a).

(c) the pointing and tracking problem portrayed in Example 9.10, with f linear, h nonlinear, and G a function only of t. Compare these filter propagation equations to those for the higher dimensional linear dynamics model in Problem 9.12c, and the nonlinear dynamics model of Problem 9.12d.

(d) the residual monitoring/sensor failure detection problem in Example 12.4. Note specifically that here, unlike the previous cases, G is a function of x(t) as well as t.

12.11 In air-to-air tracking with radar measurements, one can choose to express state variables in an inertial coordinate system, yielding linear dynamics and nonlinear measurement models. Or, one can use tracker line-of-sight coordinates and achieve another description with nonlinear dynamics and linear measurement models. If tracking performance were comparable for filters based on these two different models, which would be preferable from a computational standpoint?

Page 285: Stochastic Models, Estimation, And Control Volume 3


12.12 Assume that a discrete-time dynamics model, as in (12-65) and (12-66), and discrete-time measurement model (12-2), adequately describe a system of interest. Develop the filters of parts (a)-(o) of Problem 12.8 for this problem formulation.

12.13 Apply the results of the previous problem to the scalar problem defined by models

x(t_{i+1}) = [a₁ x(t_i) + a₂ x²(t_i) + a₃ x³(t_i)] + [b₀ + b₁ x(t_i) + b₂ x²(t_i)] w_d(t_i)

z(t_i) = [c₁ x(t_i) + c₂ x²(t_i) + c₃ x³(t_i)] + v(t_i)

where w_d(·,·) and v(·,·) are independent zero-mean white Gaussian noises of variances Q_d and R, respectively, and both are independent of the initial x(t₀) of mean x̂₀ and variance P₀.

12.14 (a) A scalar linear system is described by the stochastic differential equation

dx(t) = −2x(t) dt + dβ(t)

where β(·,·) is scalar Brownian motion with statistics

E{β(t)} = 0,    E{[β(t) − β(t′)]²} = S|t − t′|

The initial value of the state is known exactly: x(0) = 1. Derive an expression for the transition probability density for x(t). Why is this function of primary interest in estimation?
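Since the model is linear, the transition density is Gaussian with mean e^{−2t} x(0) and variance (S/4)(1 − e^{−4t}), which a short Monte Carlo experiment can confirm; S = 1 is an assumed value for the check.

```python
import math
import random

random.seed(2)

# Monte Carlo check of the transition density for dx = -2x dt + dbeta,
# x(0) = 1: x(t) ~ N(exp(-2t), (S/4)(1 - exp(-4t))).  S = 1 assumed.
S, dt, n_steps = 1.0, 0.01, 100          # propagate to t = 1

def sample_x1():
    x = 1.0
    for _ in range(n_steps):
        x += -2.0 * x * dt + random.gauss(0.0, math.sqrt(S * dt))
    return x

xs = [sample_x1() for _ in range(20000)]
mean = sum(xs) / len(xs)
var = sum((v - mean) ** 2 for v in xs) / len(xs)

print(round(mean, 2), round(math.exp(-2.0), 2))              # empirical vs exact
print(round(var, 2), round(0.25 * (1.0 - math.exp(-4.0)), 2))
```

The empirical moments agree with the closed form up to Euler discretization bias and Monte Carlo noise.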

(b) At time t = 1, a measurement of the state is available. The measurement is corrupted by an error which is independent of the past history. The measurement is

z(1) = x(1) + v

The error v can only assume the discrete values +1, 0, and −1, with the following probabilities:

Prob{v = −1} = 1/4,    Prob{v = 0} = 1/2,    Prob{v = 1} = 1/4

Develop an expression for the mean of x(1), conditioned on the measurement z(1) = ζ. Calculate this conditional mean of x(1) if the measured value is

z(1) = ζ = 2
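A sketch of the part (b) computation: given v, x(1) = z(1) − v exactly, so Bayes' rule weights each candidate by p(v) times the prior density of x(1) at z − v. The prior N(e^{−2}, (S/4)(1 − e^{−4})) comes from part (a); S = 1 is assumed, and the noise probabilities are read here as 1/4, 1/2, 1/4.

```python
import math

# Conditional mean of x(1) given z(1) = x(1) + v with discrete-valued v.
# Prior from part (a): x(1) ~ N(exp(-2), (S/4)(1 - exp(-4))); S = 1 assumed,
# and the probabilities of v are read here as 1/4, 1/2, 1/4.
S = 1.0
m = math.exp(-2.0)
P = (S / 4.0) * (1.0 - math.exp(-4.0))

def gauss_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def conditional_mean(z, v_vals=(-1.0, 0.0, 1.0), v_probs=(0.25, 0.5, 0.25)):
    # Given v, x(1) = z - v exactly, so each candidate is weighted by
    # p(v) * f_x(z - v) and the posterior is discrete over three points.
    w = [p * gauss_pdf(z - v, m, P) for v, p in zip(v_vals, v_probs)]
    return sum(wi * (z - v) for wi, v in zip(w, v_vals)) / sum(w)

print(round(conditional_mean(2.0), 3))
```

For z = 2 the candidate x = 1 (i.e., v = +1) dominates, since the prior places negligible mass near 2 or 3, so the conditional mean lands just above 1.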

(c) How would your approach and/or results to this estimation problem change if (consider the 5 parts independently)

(1) the Brownian motion were replaced by a non-Gaussian driving process?

(2) the system description were nonlinear?

(3) v could assume a value from a continuous range of values, with the probability of its magnitude being defined by a Gaussian density function; v independent of x(1)?

(4) same as (3) but v correlated with x(1) as E{x(1)v} = σ_xv?

(5) same as (3) but with a non-Gaussian density function describing v?

(d) Other estimates of x(1) exist besides the conditional mean. What other logical definitions of an "optimal" estimate of x(1) could be made? Under what conditions do some of these alternatives yield the same value for the "optimal" estimate?

12.15 Consider the scalar system model

dx(t) = a x²(t) dt + b x(t) dβ(t)

dy(t) = x(t) dt + dβ_m(t)

where β(·,·) and β_m(·,·) are independent unit-diffusion Brownian motions, or, in terms of white noise notation,

ẋ(t) = a x²(t) + b x(t) w(t),    z(t) = x(t) + v_c(t)

where w(·,·) and v_c(·,·) are independent zero-mean unit-strength white Gaussian noises.

Page 286: Stochastic Models, Estimation, And Control Volume 3


(a) Show that the conditional kth order noncentral moments m_k(t), for k = 1, 2, 3, ..., based on Kushner's equation can be written with use of (12-117) as

ṁ₁(t) = a m₂(t) + [m₂(t) − m₁²(t)] {z(t) − m₁(t)}

ṁ₂(t) = 2a m₃(t) + b² m₂(t) + [m₃(t) − m₁(t)m₂(t)] {z(t) − m₁(t)}

ṁ₃(t) = 3a m₄(t) + 3b² m₃(t) + [m₄(t) − m₁(t)m₃(t)] {z(t) − m₁(t)}

Express this also in terms of conditional central moments.
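These equations are not closed, since ṁ₃ calls on m₄. A hedged numerical sketch of the Gaussian closure asked for in parts (b)/(c) replaces m₄ by m₁⁴ + 6m₁²P + 3P² with P = m₂ − m₁²; all numerical values (a, b, initial moments, the synthetic measurement record) are illustrative choices, not from the text.

```python
import math
import random

random.seed(3)

# Euler integration of the moment equations of part (a), closed with the
# Gaussian assumption m4 = m1^4 + 6 m1^2 P + 3 P^2, P = m2 - m1^2.
# a, b, the initial moments, and the synthetic z(t) record are assumed
# illustrative values, not from the text.
a, b, dt = -1.0, 0.2, 1e-3
m1, m2, m3 = 1.0, 1.2, 1.8
x = 1.0                              # "truth", used only to synthesize z(t)

for _ in range(1000):                # integrate over one time unit
    z = x + random.gauss(0.0, 1.0 / math.sqrt(dt))  # unit-strength noise
    P = m2 - m1 * m1
    m4 = m1 ** 4 + 6.0 * m1 ** 2 * P + 3.0 * P * P  # Gaussian closure
    res = (z - m1) * dt
    dm1 = a * m2 * dt + (m2 - m1 * m1) * res
    dm2 = (2.0 * a * m3 + b * b * m2) * dt + (m3 - m1 * m2) * res
    dm3 = (3.0 * a * m4 + 3.0 * b * b * m3) * dt + (m4 - m1 * m3) * res
    m1, m2, m3 = m1 + dm1, m2 + dm2, m3 + dm3
    x += a * x * x * dt + b * x * random.gauss(0.0, math.sqrt(dt))

print(round(m1, 3), round(m2, 3), round(m3, 3))
```

This is only a feasibility sketch of the closure; part (b) asks for the resulting filter in closed form, and part (e) for the same relations in central moments.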

(b) Generate the second order assumed density filter based on a Gaussian assumed density.

(c) Using the fundamental relationships given by (12-64), generate the second order filter that results from cumulant truncation: letting κ_k = 0 for all k ≥ 3. Show that this is the same as the result obtained in (b).

(d) Similarly generate the third order filter based on cumulant truncation above third order: letting κ_k = 0 for all k ≥ 4.

(e) Express the results of (b)-(d) in terms of conditional central moments as well as noncentral moments.

(f) Compare these results to those of filters based upon Taylor series approximations:

(1) truncated second order,
(2) Gaussian second order,
(3) first order with bias correction terms,
(4) first order without bias correction terms,
(5) linearized Kalman filter.

(g) Why are modified truncated second order filters and modified Gaussian second order filters not included in the list in part (f)?

12.16 Consider the basic signal phase tracking problem described as follows. A sinusoidal signal of carrier frequency ω_c, as corrupted by additive zero-mean white Gaussian noise v_c(·,·) of strength R_c, is available from a receiver:

z(t) = cos[ω_c t + θ(t)] + v_c(t)

where the uncertain phase θ(·,·) to be tracked is modeled as a first order Gauss-Markov process:

θ̇(t) = −b θ(t) + w(t),    E{w(t)w(t + τ)} = [2bσ²] δ(τ)

where σ and b are the rms value and bandwidth, respectively, of θ(·,·), i.e., [1/b] is the correlation time of the phase process.

(a) Show that the extended Kalman filter for this problem has the form

dθ̂(t)/dt = −b θ̂(t) − {P(t)/R_c} sin[ω_c t + θ̂(t)] {z(t) − cos[ω_c t + θ̂(t)]}

dP(t)/dt = −2b P(t) + Q(t) − P²(t) sin²[ω_c t + θ̂(t)]/R_c

This can be implemented as a phase-locked loop: the current θ̂(t) is used to generate two local oscillator outputs, cos[ω_c t + θ̂(t)] and sin[ω_c t + θ̂(t)]. The first of these is subtracted from the current measurement, and this residual is multiplied by sin[ω_c t + θ̂(t)] (to demodulate [θ(t) − θ̂(t)]) and by the [P(t)/R_c] term, and finally passed through a low-pass first order lag filter, 1/(s + b), to produce the filtered output θ̂(t) itself. Show that, in fact, this product does demodulate [θ(t) − θ̂(t)], i.e., that

sin[ω_c t + θ̂(t)] {z(t) − cos[ω_c t + θ̂(t)]} ≅ θ̂(t) − θ(t) + v_c′(t) + {sin[ω_c t + θ̂(t)] cos[ω_c t + θ̂(t)] + other high frequency filterable terms}

where v_c′(t) is modulated white noise.
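A simulation sketch of this phase-locked-loop interpretation, with all numerical values (b, σ, ω_c, R_c, step size) assumed for illustration, and the continuous measurement discretized by sampling the white noise with variance R_c/Δt per step:

```python
import math
import random

random.seed(4)

# Extended Kalman filter / phase-locked loop of part (a) for
#   theta_dot = -b theta + w,   Q = 2 b sigma^2,
#   z(t) = cos(wc t + theta) + vc,   vc of strength Rc.
# All numerical values are assumed for illustration.
b, sigma, wc, Rc, dt = 1.0, 0.2, 20.0, 0.01, 1e-3
Q = 2.0 * b * sigma * sigma

theta = random.gauss(0.0, sigma)      # true phase process
th_hat, P, t = 0.0, sigma * sigma, 0.0

for _ in range(10000):                # ten correlation times of the phase
    z = math.cos(wc * t + theta) + random.gauss(0.0, math.sqrt(Rc / dt))
    s = math.sin(wc * t + th_hat)
    resid = z - math.cos(wc * t + th_hat)
    th_hat += (-b * th_hat - (P / Rc) * s * resid) * dt
    P += (-2.0 * b * P + Q - P * P * s * s / Rc) * dt
    theta += -b * theta * dt + random.gauss(0.0, math.sqrt(Q * dt))
    t += dt

print(round(th_hat - theta, 3), round(P, 4))
```

The sin[·] multiplication of the residual is the demodulation step discussed above; P(t) settles near the positive root of 2bP + P²⟨sin²⟩/R_c = Q.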

Page 287: Stochastic Models, Estimation, And Control Volume 3


(b) As an alternative to extended Kalman filtering, design the following filters for this application, and compare them to the extended Kalman filter:

(1) linearized Kalman filter, and steady state version thereof.
(2) precomputed-gain extended Kalman filter.
(3) constant-gain extended Kalman filter.
(4) extended Kalman filter with bias correction terms; consider precomputed-gain and constant-gain approximations.
(5) truncated second order filter.
(6) Gaussian second order filter.
(7) Gaussian second order assumed density filter (without Taylor series approximations for f and h).
(8) assumed density filter based on cumulants truncation, letting κ_k = 0 for all k ≥ 3.
(9) assumed density filter based on cumulants truncation, letting κ_k = 0 for all k ≥ 4.
(10) statistically linearized filter; identify the describing function gains and compare directly to those of the extended Kalman filter and assumed density filters in (7) and (8) above; these gains tend to give better performance than the extended Kalman filter gains, which are based on truncated series representations of f and h, and so this filter is less prone to the "cycle slipping" that is typical of phase-locked loops based on filters embodying first order series truncation assumptions.

(c) Show the modification to filter structure in (a) when the phase process is modeled as the output of the first order lag 1/(s + b) driven by Brownian motion of diffusion Q, instead of driven by white Gaussian noise of strength Q as in (a); note that the structure of the appropriate phase-locked loop tracker is intimately linked to the model for the phase process itself.

(d) Repeat part (c), but with Brownian motion β(·,·) described by

E{[β(t) − β(t′)]²} = ∫_{t′}^{t} (10 + sin τ) dτ

(e) Repeat part (a), but assuming that two scalar measurements are available as

z₁(t) = cos[ω_c t + θ(t)] + v_c1(t)

z₂(t) = sin[ω_c t + θ(t)] + v_c2(t)

where v_c1(·,·) and v_c2(·,·) are assumed independent, and each a zero-mean white Gaussian noise of strength R_c. Develop the extended Kalman filter, and show that, in this very special case, the variance equation is not dependent on the state estimate θ̂(t).

(f) Repeat part (b) for the model in part (c).

(g) Repeat part (b) for the model in part (d).

(h) Repeat part (b) for the model in part (e).

12.17 Consider the sampled-data problem analogous to the previous continuous-measurement problem, in order to generate a digital phase-locked loop tracker. Let the dynamics model be as in Problem 12.16, but let measurements be taken every Δt sec of the form

z(t_i) = cos[ω_c t_i + θ(t_i)] + v(t_i)

where v(·,·) is zero-mean white Gaussian discrete-time noise of variance R. Generate the "equivalent discrete-time system model" for this problem.

(a) Repeat (a) of the previous problem, including a demodulation interpretation.

(b) Repeat (b) of the previous problem, adding modified truncated and modified Gaussian second order filters to the list of designs to consider.

(c) Repeat (c) of the previous problem, with sampled-data measurements.

(d) Repeat (d) of the previous problem, with sampled-data measurements.

Page 288: Stochastic Models, Estimation, And Control Volume 3


(e) Show how the results of (e) of the previous problem are altered for sampled-data measurements

z₁(t_i) = cos[ω_c t_i + θ(t_i)] + v₁(t_i)

z₂(t_i) = sin[ω_c t_i + θ(t_i)] + v₂(t_i)

where v₁(·,·) and v₂(·,·) are independent of each other, and each a zero-mean white Gaussian discrete-time noise of variance R.

(f) Repeat part (b) for the model in part (c).

(g) Repeat part (b) for the model in part (d).

(h) Repeat part (b) for the model in part (e).

12.18 Using the moment equations (12-108) and (12-109) based upon the Kushner equations, derive (a) the Kalman filter, (b) the extended Kalman filter, (c) the truncated second order filter, (d) the Gaussian second order filter, and (e) the Gaussian assumed density filter, based on suitable assumptions.

12.19 Using (12-117), derive the third order assumed density filter based upon cumulants truncation for the scalar case.

12.20 (a) Show that the mean and variance of D,(e) as given in Example 12.9 are correct.

(b) Demonstrate the validity of Fig. 12.13 for the detector error characteristic for this example.

(c) Show that the expressions for D8(e) and Dt(e) of Example 12.10 do have the means and variance kernels as claimed.

12.21 (a) Use the representation theorem of (12-118) and (12-119) to derive the continuous-time measurement Kalman filter for the linear models

dx(t) = F(t)x(t) dt + G(t) dβ(t)

dy(t) = H(t)x(t) dt + dβ_m(t)

where β(·,·) and β_m(·,·) are independent Brownian motions of diffusions Q(t) and R_c(t), for all t, respectively.

(b) Use the innovations approach of (12-123)-(12-126) to derive this same result.

12.22 Derive the estimator results for the case of both continuous-time and discrete-time measurements being available. Specifically, let the state evolve according to (12-1), and let both certain discrete-time measurements z_d(t_i) as described by (12-2) and certain other continuous-time measurements z_c(t) as described by (12-100) or (12-100′) be available from the system. Develop estimators of the forms depicted in parts (a)-(o) of Problem 12.8 for this formulation.

Page 289: Stochastic Models, Estimation, And Control Volume 3

This page intentionally left blank

Page 290: Stochastic Models, Estimation, And Control Volume 3

Index

A

Accuracy, maximally achievable, 72, see also Performance analysis

Adaptive filtering, 69, 120, 129, 136, 141, 232, 256

Adaptive noise estimation, 69, 120, 129, 136, 141, 156

Additive, see Linearly additive noise
Adequate model, 24, 25, 160, 190, 224, 225, see also Models
Adjoint, 10, 15, 21
Admissible, 175, 179, 181
Age-weighting of data, 28, 63, see also Fading memory filter
Algebra, σ-algebra, 176
Algebraic operations, see Constant gain
Algebraic Riccati equation, see Constant gain
Algebraic structure, 257
Algorithms, see Filter; Performance analysis; Software programs for design
Aliasing, 75
Almost everywhere, see Convergence
Almost surely, see Convergence
Ambiguity function, 97, 99, 141
A posteriori probability, 213, 241, see also MAP estimate; Measurement, update
Approximate nonlinear filter, 214, see also Filter
Approximations, 39, 81, 83, 89, 101, 109, 123, 214, 218, 234, 241, 242
A priori probability, 35, 73, 76, 130, see also Initial conditions; Initial covariance matrix; Mean
Artificial lower bounding of covariance, 27, 135, see also Compensation


Assumed density filter, 216, 235, 244, 250, 254
Asymptotic, 71, 73, 96
Asymptotic behavior, 71, 73, 96, 102, 133, 136
Asymptotic efficiency, 71, 96
Asymptotic normality, 71, 96, 136, 139
Asymptotic unbiasedness, 71, 96, 136, 138
Augmentation, see State, augmented
Autocorrelation, see Correlation
Average, see Expectation; Time average
    ensemble, 81
Averaging, 103, 141

B

Backward filter, 2, 5, 9
Backward Kolmogorov equation, 192
Basis functions, 240
Batch processing, 74
Bayes' rule, 35, 76, 77, 80, 130, 162, 214, 219, 236
Bayesian estimation, 73, 129, 214, 236
Benchmark of performance, 27, 82
Best, see Optimal criteria
Bias, 52, 72, 109, 138, 224, see also Random bias
Bias correction terms, 218, 221, 224-226, 231, 249, 250
Block diagram of multiple model filtering algorithm, 132
Bounded, 182, 183
Bounded matrix, 27
Bounded variation, 176
Brownian motion (Wiener process), 39, 51, 160, 175-176, 179-181


Page 291: Stochastic Models, Estimation, And Control Volume 3


C

Canonical form, 114, 115
Causal, 256
Central moments, 215, 236, 237, see also Covariance
Chapman-Kolmogorov equation, 171, 174, 194, 213, 247
Characteristic function, 196, 209, 236
    conditional, 209, 236, 247
Characteristic polynomial, see Eigenvalue
Characterization of stochastic process, 161, 174, 202
Cholesky square root, 21, 192, 210-211, see also Factorization
Coefficient, correlation, 28
Combining states, see Order reduction
Compensation for model inadequacies, 23, 24, 27, 224-225, see also Tuning
Complete, 239
Completely observable, 114
Computational aspects, see Computer
Computed covariance, 23, 27, 52, 231, see also Conditional covariance matrix; Tuning
Computer
    loading, 10, 17, 35, 37-38, 73, 74, 81, 88, 101-103, 110, 114-116, 216, 221, 225, 234, 245
    memory, 17, 37-38, 74, 103, 111, 116
    programs for design, see Software programs for design
Condition number, 102
Conditional characteristic function, 209, 236, 247
Conditional covariance matrix
    bounds on, 25, 27, 72, 97, 99, 101, 102, see also Benchmark of performance
    for continuous-time states with continuous measurements, 58, 248, 249
    with discrete measurements, 9-10, 15-17, 30, 32, 44-45, 131, 133, 215-216, 225-227, 238, 244
    for discrete-time state with discrete measurements, 238

Conditional density function
    for Gauss-Markov process, 162
    for Markov process, 162, 213-214, 248, 251
Conditional distribution function, 162
Conditional expectation, 1, 176, 215, 220, see also Conditional moments; Filter
Conditional Gaussian density, see Assumed density filter; Gaussian second order filter; Kalman filter; Statistically linearized filter

Conditional information matrix, 81, 85-88, 120
Conditional mean, 213, 215-216, 248, 256
    for continuous-time state with continuous measurements, 58, 248-249
    with discrete measurements, 213, 215
    for discrete-time state with discrete measurements, 238
Conditional mode, 72, 213, 241, 251
    for continuous-time state with continuous measurements, 251
    with discrete measurements, 241
    for discrete-time state with discrete measurements, 243
Conditional moments
    central, 215, see also Conditional covariance matrix
    for continuous-time state with continuous measurements, 58, 248-249
    with discrete measurements, 215
    for discrete-time state with discrete measurements, 238
    noncentral, 236, see also Correlation
Conditional probability
    density, 162, 213-214, 248, 251
    distribution, 162
Conditional quasi-moments, 239
Conditional variance, see Conditional covariance matrix
Confidence, 139, see also Information; Uncertainties
Conservative filter, see Robustness; Tuning
Consistent estimator, 71, 96, 136, 138
Constant-gain
    approximation, 57, 58, 250, 266, 267, 270
    extended Kalman filter, 57, 58, 266, 267, 270
    steady state filter, 57, 58, 137, 250, 266, 267, 270
Continuity
    mean square, 161, 176, 179, 183
    with probability one, 161, 176, 179, 183
Continuous, 39, 161, 176, 179, 182, 183, 191
Continuous time
    control, 39, 160, 161, 213, 251
    estimation, 20-21, 57, 245
    filtering, 57, 245
    linear dynamic system model, 39-41, 160-161, 165

Page 292: Stochastic Models, Estimation, And Control Volume 3


    measurement, 57, 160, 165, 246, 251
    nonlinear dynamic system model, 39, 165, 166, 181, 213
Control, 2, 10, 37, 39, 86, 160, 161, 213, 251
Control input estimation in smoothing, 10
Controllability, stochastic, 24
Convergence, 58, 82, 133, 241, 247, 257
    in the mean (mean square), 161, 176, 177, 182, 183
    in probability, 71
    with probability one (almost sure), 161, 176, 182, 183
    rate, 82
Coordinates, shifted, see Perturbation; Taylor series
Correlated noise
    spatial, 54
    temporal, see Shaping filter
Correlation, see also Covariance
    autocorrelation, 137, 252
    coefficient, 28
    cross correlation, 28, 136, 185, 256
    estimate of, 120, 123, 137
    function, see Correlation kernel
    kernel, 39, 70, 137, 160, 162, 164, 256
    matrix, 40, 70, 120, 137, 160, 162, 176, 181, 185
    methods for self tuning, 136
    spatial, 54
    temporal, see Shaping filter
    time, see Shaping filter
Corruption, measurement, see Measurement noise
Covariance
    analysis, 48
    bounds, 25, 27, 72, 97, 99, 101, 102, see also Benchmark of performance
    conditional, see Conditional covariance matrix
    cross covariance, 28, 136, 256
    error, 137, 224, 231, see also Conditional covariance matrix
    estimate of, 120, 123, 129, 136, 141, see also Monte Carlo
    factorization, 1, 9, 21, 27, 116, 192, 210, 211
    filter-computed, 23, 27, 52, 231, see also Conditional covariance matrix; Tuning
    function, see Covariance kernel
    Kalman filter, 9, 44-45, 58, 84-87, 131, 248
    kernel, 39, 40, 70, 160, 162, 164
    matching for self tuning, 141
    matrix, 39, 70, 120, 160, 162, 176, 181


    matrix for continuous linear system model, 162, 201
    matrix for continuous nonlinear system model, 199-201
    matrix for discrete linear system model, 238
    notation, xv
    true error, see Performance analysis; Truth model
Cramer-Rao lower bound, 72, 97, 99, 101, 102
Criteria, optimality, 1, 213, see also Estimate
Cross correlation, 28, 136, 185, 256
Cross covariance, 28, 136, 256
Cumulant, 236, 237
Cumulant truncation, 236, 237, 250
Curve fitting, 111

D

Data
    processing, see Measurement update
    spurious, 232
Delta function, 185
Density function
    conditional, see Conditional density function; Estimate; Filter
    for continuous linear system, 162, 201
    for continuous nonlinear system, 192, 197, 199-201
    for discrete linear system, 84
    for discrete nonlinear system, 238
    Gaussian, 129, 132
    joint, 161, 162, 202, 233
Density, power spectral, 54
Describing function, 244, 254
Design
    filter, see Adequate model; Performance analysis; Tuning
    software, see Monte Carlo; Performance analysis
Design model, see Models
Deterministic control input, 2, 10, 37, 39, 86, 160, 161, 213, 251
Difference equations, see also Propagation
    stochastic, 70, 238
Differentiable, 161, 176, 191, 235, 243
Differential, 161, 181
Differential equations, see also Propagation
    stochastic, 160, 165, 166, 181
Differential generator, 187, 188
Differential rule, Itô, 186, 236

Page 293: Stochastic Models, Estimation, And Control Volume 3


Diffusion, 161, 181, 184, 187, 196, see also Kolmogorov's equations
Digital computation, see Computer
Dirac delta, 185
Discontinuity, 164
Discrete-time
    control, 2, 10, 37, 86
    dynamics, 2, 70, 238, 243
    equivalent, 2, 70, 210
    estimation, see Filter; Smoothing
    filtering, see Filter
    Markov process, 167
    measurement, 2, 40, 70, 160, 213
    model, 70, 238
    model, equivalent, 2, 70, 210
    process, 70, 167, 238
    system, see Discrete-time dynamics and measurement
Discrete-valued random vector, 71, 130, 167, 171
Discretization, 2, 70, 130, 136
Distance measure, 178, 181
Distributed computation, 144
Distributed parameter system, 256
Distribution, 161, 162, see also Density function
Divergence, 23, 24, 38, see also Compensation for model inadequacies; Tuning
Drift, 184
Dummy variable, xiii
Dynamic programming, 18
Dynamic system model, see System model (linear); System model (nonlinear)
    continuous-time linear, 41, 160-162, 165
    continuous-time nonlinear, 39, 165, 166, 181, 199, 213
    discrete-time linear, 2, 70
    discrete-time nonlinear, 238
Dynamics noise, 2, 39, 70, 120, 160, 213

E

Edgeworth series, 239
Efficient estimate, 71
Eigenvalue, 58, 75, 115
    assignment or placement, 58
Eigenvector, 58, 75, 115
    assignment or placement, 58
Ensemble average, 81
Epsilon technique of Schmidt, 31


Equilibrium state, 57
Equivalent discrete-time model, 2, 70, 210, 211
Ergodic, 123
Erroneous model, see Adequate model; Compensation for model inadequacies
Error
    analysis, see Performance analysis
    bounds, 72, 82, 97, 99, 101, 102
    compensation, see Compensation for model inadequacies
    covariance, 137, 224, 231, see also Conditional covariance
    covariance, computed, 23, 27, 52, 231, see also Conditional covariance; Tuning
    model, see Models
    sensitivity, see Sensitivity; Robustness
    states, see Perturbation
    tracking, 54-56, 109
    true, see Performance analysis
Estimate, see also Filter
    asymptotically efficient, 71, 96
    asymptotically unbiased, 71, 96, 136
    Bayes, 73, 129, 213-215, 236
    best linear, see Kalman filter
    bias, 52, 72, 109, 138, 224
    computed, 23, 27, 52, 231, see also Performance analysis; Reduced-order; Tuning
    conditional mean, see Bayesian estimation; Conditional mean
    consistent, 71, 96, 136, 138
    continuous-time, continuous-measurement, 20-21, 58, 245
    continuous-time, discrete-measurement, see Filter
    covariance, 120, 123, 128, 129, 136, 141, see also Conditional covariance matrix; Monte Carlo
    discrete-time, 9, 84-88, 120-123, 238, 243
    error, see Error
    least squares, 18, 72, 156-158, 257
    linear, see Kalman filter
    for Markov process, 162, 163-165, 213
    maximum a posteriori (MAP), 213, 241, 251
    maximum likelihood, 18, 35, 71, 74, 80, 96, 101, 120
    of mean, see Conditional mean; Monte Carlo
    minimum error variance, unbiased, 72, 213, 243, 256, see also Conditional mean


  minimum mean squared error, see Conditional mean
  minimum variance reduced order, 25
  of moments, 123, 128, see also Conditional moments; Monte Carlo
  MVRO, 25
  notation, xvi
  parameter, 48, 68, 84, 120, 129, 141
  predicted, see Propagation
  recursive, 37, 131, 213
  smoothed, 1, 58, 257
  of statistical parameters, see Monte Carlo
  unbiased linear, 72
  unknown but bounded, 71, 72, 257
  of variance, see Estimate of covariance
  weighted least squares, 18, 72, 156-158, 257
Estimator, see Estimate
Euclidean norm, 22
Euclidean space, xiii, 22
Existence, 178, 181, 182, 247, 251
Expansion point, see Equilibrium state; Linearization; Nominal; Relinearizations; Taylor series
Expectation, 199
  conditional, 1, 176, 215, 220, see also Conditional expectation
Expected value, see Expectation
Exponential age weighting of data, 28, 63, see also Fading memory filter
Exponentially time-correlated process, 53, 139, 188, 205
Extended Kalman filter, 44-59, 198, 218, 227, 231, 245, 254
  constant-gain steady state, 57, 58, 254
  continuous-time measurements, 57
  iterated, 58
  precomputed gain, 57, 58

F

Factorization, 1, 9, 21, 27, 116, 192, 210, 211, 227
Fading memory filter, 28, 31, 63, 156
Failure detection, 232
Feedback control, 86, 213, 251
Feedback gain, 86
Filter,* see also Estimate
  adaptive, 69, 120, 129, 136, 141, 232, 256

*Numbers in parentheses are equation numbers.


  age-weighting, 28, 31, 63, 156, (9-10)-(9-11)
  approximate nonlinear, 214
  assumed density, 216, 235, 244, 250, 254, (12-9)-(12-14), (12-21), (12-26), (12-108)-(12-109)
  bias correction terms included, 218, 221, 224-226, 231, 249, 250, (12-44), (12-47), (12-49), (12-51)
  conditional mode, 213, 241, 251, (12-80)-(12-86)
  conditional moment, 215, 225-227, 238, 248-249
  conditional quasi-moment, 239
  constant-gain, 57, 58, 137, 250, 266, 267, 270
  continuous-time measurements, 58, 245, 248, 249
  control inputs included, 9, 37, 44, 58, 213, 233, 238, 251
  cumulant truncation, 236, 250, (12-64)
  discrete-time dynamics, 9, 238, 243, (8-31)-(8-33), (12-67)
  extended Kalman, 44-59, 198, 218, 227, 231, 245, 254, (9-61)-(9-69), (9-80)-(9-82)
  fading memory, 28, 31, 63, 156, (9-10)-(9-11)
  finite memory, 33, 74, 80, 257, (9-25)-(9-29)
  first order, 42, 218, 225, 249, see also Extended Kalman filter
  first order filter with bias correction terms, 218, 224, 225, 249, (12-40)-(12-41)
  Gaussian second order, 216, 221, 226, 249, (12-33)-(12-39), (12-110), (12-112)-(12-116)
  higher order moments, 216, 235, 236, 250, (12-21), (12-26), (12-62), (12-117)
  infinite dimensional, 213, 215, 238, 242, 248, (12-9), (12-10), (12-12)-(12-14), (12-21), (12-26), (12-62), (12-117)
  inverse covariance, 3, 6, 9, 35, 54, see also Information
  Kalman, see Kalman filter
  Kalman-Bucy, 248
  limited memory, 33, 74, 80, 257, (9-25)-(9-29)
  linearized Kalman, 42
  mode, 213, 241, 251
  modified Gaussian second order, 223, 224, 226, 227, 249, (12-48)-(12-61)


Filter (cont.)
  modified truncated second order, 221, 223, 225, 249, (12-42)-(12-54)
  moment, 215, 238, 248
  multiple model adaptive, 129, 131, 142, (10-104)-(10-108)
  MVRO, 25
  quasi-moment, 239
  second order: Gaussian, 216, 221, 226, 249, (12-33)-(12-39), (12-110), (12-112)-(12-116)
  second order: modified Gaussian, 223, 224, 226, 227, 249, (12-48)-(12-61)
  second order: modified truncated, 221, 223, 225, 249, (12-42)-(12-54)
  second order: truncated, 217, 218, 220, 225, 249, (12-17)-(12-19), (12-27)-(12-31)
  statistically linearized, 243, (12-90)-(12-97)
  truncated second order, 217, 218, 220, 225, 249, (12-17)-(12-19), (12-27)-(12-31)
Filter-computed covariance, 23, 27, 52, 231, see also Conditional covariance matrix; Tuning
Filter design, see Design
Filter divergence, 23, 24, 38, see also Compensation for model inadequacies; Tuning
Filter error, see Error
Filter gain, 9, 27, 28, 30, 44, 57, 58, 128, 220, 222, 225, 226, 244, 249, 266, 267, 270
  compensation of, 23, 24, 27, 28, 30, 32, 69, 120, 128, 136, 141, see also Tuning
Filter model, see System model; Reduced-order; Simplified; Design model
Filter robustness, 143, see also Divergence; Tuning
Filter stability, 24, 34, see also Numerical stability
Filter tuning, see Tuning
Finite dimensional, 214-216, 248
Finite memory filter, 33, 74, 80, 257
First moment, see Expectation; Mean
First order approximation, 41, see also First order filter; Modified second order filter
First order density, 162
First order filter, 42, 218, 225, 249, see also Extended Kalman filter
First order filter with bias correction terms, 218, 224, 225, 249
First order lag, 49, 53
First order Markov model, 53, 139, 188, 205


First variation, see Linear perturbation
Fisher information matrix, 36, 158, see also Inverse covariance form filter
Fixed-gain approximation, see Constant gain
Fixed-interval smoother, 4, 5, 9
Fixed-lag smoother, 5, 16
Fixed-length memory filter, 33, 74, 80, 257
Fixed-point smoother, 5, 15, 17
Fokker-Planck equation, see also Kolmogorov's equations
Formal rules, 166, 177, 178, 186, 187, 189, 192
Forward filter, 2, 5, 9
Forward Kolmogorov equation, 192, 193, 197, 209, 213, 215, 236, 242
Fourier transform, 209, see also Characteristic function
Frequency domain, 54, 63, 65, see also Stationary; Time-invariant system model
Function
  characteristic, 196
  density, see Density function
  distribution, see Distribution
  probability, see Probability
Function of random vector, 185, 218, 219
Functional analysis, 178, 180, 251
Functional redundancy, 233
Fundamental theorem of differential equations, 182
Fundamental theorem of Itô stochastic calculus, 207

G

Gain matrix, see Filter gain
Gaussian probability density, 77, 132, 241
  conditional, 77, 132, 216, 221, 226, 235, 241, 243, 244, 249, 250, 254
Gaussian filter
  assumed density, 216, 235, 244, 250, 254
  second order, 216, 221, 226, 249
Gaussian process, 161, 164, 176, see also Brownian motion; Gauss-Markov process model
  white, see White Gaussian noise
Gaussian random vector, 39, 77, 132, 161, 164, 176, 241
Gaussian second order filter, 216, 221, 226, 249


Gaussian stochastic process, see Gaussian process
Gaussian white noise, see White Gaussian noise
Gauss-Markov process model, 53, 139, 160, 162, 188, see also System model (linear)
Generalized ambiguity function, 97, 141
  N-step, 98
Generalized inverse, 31, 123, 138, 142
Generalized quadratic cost, 72, 158
Gradient, 81, 154
Gram-Charlier series, 240

H

Half-power frequency, 54
Hermite polynomial, 239
Hessian, 81
Higher order filters, 216, 235, 236, 250
Hilbert space, 157, 241, 257
History
  measurement, 2, 83, 160, 165, 213, 215, 246
  residual, see Residual
Hypercube, 194
Hypothesis conditional probability, 130
Hypothesis testing, 73, 130, 139, 232
Hypothetical derivative, 164, 165, 246

I

Identifiability, 75, 156
Identification, see System identification; Parameter
Implementation, see Computer
Impulse, 185
Impulsive corrections, 26
Inaccuracies, model, see Adequate model; Numerical precision; System model (linear); System model (nonlinear)
Increment, see Independent increments
Independent, 44, 70
  in time, see White noise
  and uncorrelated, see Correlation
Independent increments, 160, 163, 176, 183, 247, see also Brownian motion; Poisson process
Independent processes, 44, 53, 70


Independent random vectors, 44, 70
Infinite dimensional optimal filter, 213, 215, 238, 242, 248
Infinitesimal, 184
Information, 35, see also Information matrix; Inverse covariance form filter
Information matrix, 36, 158, see also Conditional information matrix
Initial conditions, 2, 39, 41, 70, 76, 84, 130, 160, 196, 213
Initial covariance matrix, 2, 39, 41, 70, 160, 213
Initial transient, 43, 103, 104, 112, 113, 135
Inner product, 22, 157, 206
Innovations, 18, 256, see also Residual
Input
  deterministic control, 2, 10, 37, 39, 86, 160, 161, 213, 251
  optimal design, 136
Integral
  Itô stochastic, 166, 175, 178, 180, 181, 189
  Riemann, 160
  stochastic, see Itô stochastic integral; Stratonovich integral; Wiener integral
  Stratonovich stochastic, 163, 178, 188, 191-193
  Wiener, 161, 175, 180
Integral equation, 181
Integration by parts, 196
Integration methods, 45, 189, 190-192, 210-211
Invariant embedding, 257
Inverse, 3, 6, 8-10, 35, 54, 102
  generalized, 31, 123, 138, 142
Inverse covariance form filter, 3, 6, 9, 35, 54, see also Information
Inversion lemma, matrix, 6
Invertibility, 114, 123, 256
Iterated extended Kalman filter, 58
Iterated linear filter-smoother, 58, 59
Iterative scalar measurement updating, 56, 64, 116, see also Measurement update
Iterative solutions, 80, 81, 102, 154, 155
Itô differential, 181
Itô differential rule, 186, 207, 236
Itô differential equations, 181-183, 191, 213
  existence and uniqueness of solutions, 182
  Markov solution, 182, 183
  properties of solution, 182, 183
Itô integral, 166, 175, 178, 180, 181
  evaluation, 178, 189


Itô integral (cont.)
  relation to Stratonovich integral, 189, 191, 192
Itô stochastic integral, see Itô integral

J

Jacobian, 41, 44, 45, see also Partial derivative
Joint density and distribution, 161, 162, 202, 233
Jointly Gaussian, 162
Jordan canonical form, 115
Joseph form, 10, 137

K

Kalman filter
  continuous-time, continuous-measurement, 248
  discrete-measurement, 2, 9, 15, 16, 17, 24, 42, 44, 78, 85, 87, 131, 223, 265, 266
  discrete-time, discrete-measurement, 2, 9, 15, 16, 17, 78, 85, 87, 131
  divergence, 23, 24, 38, see also Compensation for model inadequacies; Tuning
  extended, 44-59, 198, 218, 227, 245, 254
  gain, see Filter gain
  linearized, 42
  optimal smoother based on, 2, 5, 9-10, 16, 17
  perturbation, 42
  robustness, see Robustness; Tuning
  sensitivity, see Sensitivity
  stability, 24, 34
  steady state, 57, 58, 137, 250
  tuning, see Tuning
Kalman gain, see Filter gain
  approximations, see Curve fitting
  linearized, 42
  relinearized, 44, see also Extended Kalman filter
  steady state, 57, 58, 137, 250
  tuning, see Tuning
Kalman-Bucy filter, 248
Kernel
  correlation, 39, 40
  covariance, 39, 40
  cross-correlation, 256
  cross-covariance, 256


Kolmogorov's equations
  backward, 192
  forward, 192, 193, 197, 209, 213, 215, 236, 242
Kushner equations, 246
Kushner moment equation, 248, 250

L

Laplace transform, 54
Least squares estimation, 18, 72, 156-158, 257
  weighted, 18, 72, 156-158, 257
Least squares fit, 72, 141
Lévy oscillation property, 177, 186, 247
Lévy process, see Brownian motion
Likelihood equation, 75, 78-80, 120, 242
Likelihood function, 35, 73, 75-78, 109, 120-122, 140, 233, 242
Likelihood ratio, 176, 209, 235
Limit in mean, 161, 176, 177, 182, 183, 206
Limited memory filter, 33, 74, 80, 257
Linear controller, 86
Linear filter, see Filter; Kalman filter
Linear function, 180, see also Filter; System model (linear)
Linear functional, 180
Linear minimum variance filter, see Kalman filter
Linear models, see System model (linear)
Linear operations, 22, 157, see also Linear function; Linearization
Linear perturbation, 40-42, 48, 52, see also Perturbation
Linear stochastic difference equation, 70
Linear stochastic differential equation, 160, 161
Linear system model, 70, 160, 161, see also System model (linear)
Linear transformation, 22, 114, 180, see also Linear function
Linear unbiased minimum variance estimate, see Kalman filter
Linearization, 40, 41
Linearized Kalman filter, 42
Linearly additive noise, 39, 40, 160, 165
Lipschitz, 182, 191
Local iterations, 58, 81, 84, 138
Lower bound
  artificial bounding of covariance, 27
  Cramér-Rao, 72, 97, 99, 101, 102
  performance, see Benchmark of performance; Ambiguity function; Information matrix
Lower triangular, 21, 192, 210-211, see also Factorization
Low-pass filter, 49, 53
Luenberger observer, 58

M

MAP estimate, 213, 241, 251
Mapping, xiii, see also Function
Marginal probability
  density, 130, 162, 214
  distribution, 162, 202
Markov process, 136, 159, 162, 163, 166, 167, 176, 183, 192, 213
Martingale, 176, 179, 209, 247, 251, 256
Matrix
  block diagonal, see Cross-correlation
  Cholesky square root, 21, 192, 210-211, see also Factorization
  conditional information (J), 81, 85-88, 120
  control input (B; Bd), 2, 37, 41, 70, 75, 160
  controller gain (Gc), see Controller gain
  correlation (Ψ), see Correlation
  covariance (P), see Covariance
  cross-correlation, see Cross-correlation
  cross-covariance, see Cross-covariance
  diagonal, 191, 192
  dynamics noise strength (Q; Qd), 2, 39, 70, 120, 160, 181, 213
  factorization, 1, 9, 21, 27, 116, 192, 210, 211, 227
  filter gain (K), see Filter gain
  fundamental (F), 41, 160
  gain, see Filter gain
  Hessian, 81
  information, 36, 37, 158, see also Conditional information matrix; Inverse covariance form filter
  initial covariance (P0), 2, 39, 70, 160, 213
  inverse, 3, 6, 8-10, 35, 54, 102
  inversion lemma, 6
  invertible, 114, 123, 256
  Jacobian, 41, 44, 45, see also Partial derivative
  Kalman gain, see Kalman gain
  lower triangular, 21, 192, 210-211, see also Factorization
  measurement (H), 2, 41, 70, 156, 160
  measurement noise strength (or covariance) (R; Rd), 2, 40, 70, 120, 160, 213, 246
  noise input (G; Gd), 2, 39, 70, 71, 160, 181, 238
  nonsingular, see Invertibility
  operations, see Linear transformation
  partition, 81
  positive definite, see Positive definite
  positive semidefinite, see Positive semidefinite
  Riccati, see Riccati equation
  similarity transformation, 114
  singular, 8, 70, 123, see also Positive semidefinite
  square root, 1, 9, 21, 116, 192, 210-211, 227, see also Factorization
  state transition (Φ), 2, 15, 70, 74, 115, 161, 190
  state transition probability, 169
  symmetric, 116
  system dynamics (F), 41, 160
  trace of, 181
  transformation, 114
  unitary, 27, 211
  upper triangular, 27, 211
  weighting, 18, 72, 156-158, 257
Maximum a posteriori (MAP), 213, 241, 251
Maximum likelihood (ML), 18, 35, 71, 74, 80, 96, 101, 120
Mean, see also Expectation
  conditional, see Conditional mean; Bayesian estimation
Mean square
  continuity, 161, 176, 179, 183
  convergence, 161, 176, 177, 178, 182
  derivative, 160, 161, 164, 165, 246
  limit, 161, 176, 177, 178, 182, 183
Mean value, 72, see also Mean
Mean vector, 72, see also Mean
  for continuous linear system models, 162
  for continuous nonlinear models, 199
  for discrete-time linear models, 84
  for discrete-time nonlinear models, 238
Measurable, 176, 246, 251
Measurement
  continuous-time, 57, 160, 165, 246
  as controller input, 213, 251
  discrete-time, 40, 70, 160, 163, 165
  equation, see Measurement, discrete-time; Measurement, continuous-time
  error, see Measurement noise
  history, 2, 83, 160, 165, 213, 215, 246
  iterative scalar update, 56, 64, 116, see also Measurement update
  linear continuous-time model, 160, 163


Measurement (cont.)
  linear discrete-time model, 2, 70, 160
  matrix, 2, 41, 70, 156, 160
  noise, 2, 40, 70, 120, 160, 165, 213, 246
  nonlinear continuous-time model, 57, 165, 246, 251
  nonlinear discrete-time model, 40, 165, 213
  point-process model, 163-165, 247-248
  Poisson model, 163-165, 247-248
  residual, see Residual
  sampled-data, 2, 40, 70, 160, 165, 213
  statistical description, 2, 40, 70, 120, 160, 213, 246
  spurious, 232
  time-correlated noise, see Shaping filter
  update, 9, 27, 28, 30, 32, 44, 87, 131, 220, 222, 225, 226, 242, 244, see also Filter
  vector, 2, 40, 57, 70, 160, 163, 165, 213, 246, 251
Median, 72
Memory requirements, 17, 37-38, 74, 103, 111, 116
Microprocessor, 144
Minimal σ-algebra, 176, 246, 251
Minimum mean square error (MMSE), 213, 243, 256, see also Conditional mean
Minimum variance estimate, 72, 213, 243, 256, see also Conditional mean
Minimum variance reduced order (MVRO), 25
Mismodeling, 69, 133, see also Adequate model; Compensation for model inadequacies; Divergence; Robustness; Tuning
ML estimate, see Maximum likelihood
MMSE, see Minimum mean square error
Mode
  conditional, 72, 213, 241, 251
  of system response, 75, 115
Models, see also System model (linear); System model (nonlinear); Measurement
  dynamics, see Dynamic system model
  effect of errors, 23, 24, 38, see also Error; Mismodeling
  equivalent discrete-time linear, 2, 70, 210, 211
  error models, see Error; System model (linear); System model (nonlinear)
  linear system models, see System model (linear)
  measurement, see Measurement
  nonlinear system models, see System model (nonlinear)
  process, see Dynamic system model; Stochastic process; System model (linear); System model (nonlinear)
  reduced-order, 23, 25, see also Adequate model
  simplified, 109, 121, 128, 216, see also Adequate model; Approximations; Reduced-order
  simulation, 189, 210, see also Monte Carlo
  system, see System model (linear); System model (nonlinear)
  time invariant, see Time-invariant system model
Modified canonical form, 115
Modified Gaussian second order filter, 223, 224, 226, 227, 249
Modified Jordan canonical form, 115
Modified second order filter, 221-227, 249
Modified truncated second order filter, 221, 223, 225, 249
Moment
  central, 215, 236, 237, see also Covariance
  conditional, 58, 215, 236, 238, 248-249, see also Conditional covariance matrix
  for continuous linear system models, 162
  for continuous nonlinear system models, 199
  for discrete-time linear models, 84
  for discrete-time nonlinear models, 238
  estimate of, 123, 128, see also Conditional moments; Monte Carlo
  first, see Mean
  noncentral, 236, see also Correlation
  second, see Correlation; Covariance
Moment generating function, see Characteristic function
Monitoring, residual, 69, 131, 133, 232
Monte Carlo analysis, 48, 51, 54, 93, 105, 190, 210, 225, 229, 231, 237
Moving window, 33, 74, 80, 257
Multiple model filtering algorithm, 129, 131, 142
Multiplicative noise, 166, 181, 213, 238, 245
Mutually exclusive, 169
MVRO estimator, 25

N

Neglected terms, 109, 121, 128, 216, see also Adequate model; Approximations; Reduced-order
Newton-Raphson, 81, 155


Noise
  correlated (in time), see Shaping filter
  Gaussian, 161, 164, 176, see also Brownian motion; White Gaussian noise
  generalized Poisson, 164, 165
  point process, 163-165, 247
  Poisson, 163-165, 247
  white, 39, 40, 70, 160, 164-166, 256
Noise input matrix, 2, 39, 70, 71, 160, 181, 238
Nominal, 40, see also Linearization
Nonanticipative, see Causal
Noncentral moment, 236, see also Correlation
Nondifferentiability, 161, 164, 165, 176, 235, 243, 246
Nonlinear effects, 39, 159, 212, see also Adequate model; Neglected terms; Nonlinear models
Nonlinear estimation, 18, 39, 212
Nonlinear filter, 212, see also Filter
  continuous-time, continuous-measurement, 57, 58, 245
  continuous-time, discrete-measurement, 39, 213, 215, 239, 241, 243
  discrete-time, discrete-measurement, 238
Nonlinear functionals, 180
Nonlinear models, see Dynamic system model; Measurement; System model (nonlinear)
Nonlinear observers, 58
Norm, see Inner product
Normal, see Gaussian random vector
Notation, xiii, xvi
N-step information matrix, 36, 37, see also Conditional information matrix
Numerical precision, 9, 28, 45, 102, 115, 116, 121, 123, 210, 211, see also Factorization; Wordlength
Numerical stability, 24, 28, 102, 116, 121, see also Factorization

O

Observability, 114, 138
  stochastic, 24
Observations, see Measurements
Observer, 58
Off-diagonal, see Cross-correlation; Factorization
Offline calculations, see Precomputable
Off-nominal, see Nominal; Perturbation; Robustness; Sensitivity
Omega (ω), xiii, 63, 65, see also Realization of random vector
Online, 69, 101, see also Offline calculations
Optimal criteria, 1, 213, see also Estimate
Optimal estimate, see Estimate; Filter; Smoothing
Optimal filter, 213-216, 246-248, see Filter
Optimal input design, 136
Optimal prediction, see Filter; Propagation
Optimal smoothing, see Smoothing
Order reduction, 23, 25, see also Adequate model; Approximations; Simplified
Orthogonal, 239, 256
Orthogonal projection, 256
Orthonormal, 240
Overweighting most recent data, 28, 31, 63, 156

P

Parallel architecture, 144
Parameter
  estimation, 48, 68, 84, 120, 129, 136, 141, 143
  identification, 48, 68, 69, 84, 129, 153
  sensitivity, 68, 98, 143, see also Robustness
  uncertainties, 48, 68, 143, see also Models; Noise
Parameterization of density function, 214, 215, 239
Partial derivative, 41, 66, 75, 79, 85, 193, 194, 217, 224, 225, 226, see also Expansion point; Taylor series
Partition, 81
Penrose pseudoinverse, 31, 123, 138, 142
Perfect knowledge of entire state, 213, 251
Performance analysis, 48
  benchmark, 27, 82
  covariance, 48
  error, 48, see also Error
  filter tuning, see Tuning
  Monte Carlo, 48, 51, 54, 93, 105, 190, 210, 225, 229, 231, 237
  sensitivity, 48, 68, 71, 98, 99, 143, see also Parameter; Robustness
  software, see Software programs for design
  truth model, see Truth model
Performance criteria, see Criteria, optimality; Estimate
Perturbation
  analysis, see Robustness; Sensitivity
  Kalman filter, 42


Perturbation (cont.)
  linear, 40-42, 48, 52
  measurement, 41
  model, 41, see also Taylor series
  state, 41
Phase-locked loop, 269-270
Phase variable form, 114
Physically realizable, 256
Point process, 163-165, 247-248
Poisson process, 163-165, 247-248
  generalized, 164, 165
  homogeneous, 163
  nonhomogeneous, 164
Poles of a system, 58, 75, 115
  placement, 58
Polygonal approximation, 190
Positive definite, xiv, 70, see also Matrix, measurement noise strength
Positive semidefinite, xiv, 70, see also Matrix, dynamics noise strength; Initial covariance matrix
Power series in the residual, 218, 222, 235, 238
Power spectral density, 54, 63, 65, see also Correlation; Fourier transform
Practical aspects, see Computer; Design; Implementation; Reduced-order; Robustness; Tuning
Precision
  estimation, 68-69, 71, 72, 97, 101, see also Performance analysis
  knowledge, 68-71, see also Covariance
  numerical, 9, 28, 45, 102, 115, 116, 121, 123, 210, 211, see also Factorization; Wordlength
Precomputable, 41-42, 46, 54, 57, 58, 82, 86, 102, 111-113, 121, 131, 248, 266, 267, 270
Precomputed-gain extended Kalman filter, 57, 58, 266, 267, 270
Prediction, see Propagation
Principal axes, see Eigenvector
Prior knowledge, see Prior statistics; System model (linear); System model (nonlinear)
Prior statistics, see A priori probability; Initial conditions; Initial covariance matrix; Mean
Probability
  a posteriori, 213, 241, see also MAP estimate; Measurement update
  a priori, 35, 70, 73, 76, 130, see also Prior statistics
  conditional, see Conditional probability
  density, see Density function
  distribution, see Distribution
  joint, 161, 162, 202, 233
  law, see Forward Kolmogorov equation
  marginal, 130, 162, 202, 214
  model, see Dynamic system model; Measurement; Models; System model (linear); System model (nonlinear)
Procedure, design, see Design
Process, stochastic, see Stochastic process
Process noise, see Dynamics noise
Product space, xiii, 157
Projection, see Orthogonal projection
Propagation, see also Filter
  covariance, 44, 84, 162, 199-201, 238, 248
  filter equations, 44, 84, 213, 215, 218, 221, 226, 238, 242, 244, 246, 248
  mean, 44, 84, 162, 199-200, 238, 248
  model, 70, 160, 165, 166, 175, 181, 193, 238, see also Dynamic system model
Pseudoinverse, 31, 123, 138, 142
Pseudonoise, 24, 27, 133, 135

Q

Quadratic
  cost, 18, 72, 141, 156-158, 257
  generalized cost, 18, 72, 141, 156-158, 257
  variation property, 177, 186, 247
Quantization, 50, see also Discretization; Wordlength
Quasi-moments, 239
Quasi-static, 71, 73, 74, 102, 122, 123, 250, see also Stationary; Steady state; Time-invariant system model

R

Random bias, 25, 51, see also Brownian motion
Random constant, see Random bias
Random process, see Stochastic process
Random sequence, 70, 238, see also Discrete-time
Random variable, see Random vector
Random vector
  conditional mean of, 213, 215-216, 248, 256
  continuous, 71, 129, 168, 174, 175
  correlated, 28, 40, 70, 120, 136, 137, 160, 162, 176, 181, 185, 256
  covariance of, 28, 39, 70, 120, 136, 160, 162, 176, 181, 256
  definition of, xiii


  discrete, 71, 130, 167, 171
  expected value of function of, 185, 219
  function of, 185, 218, 219
  Gaussian, 39, 77, 132, 161, 164, 176, 241
  independent, 44, 70
  jointly Gaussian, 162
  mean vector of, 72, 84, 162, 199, 238, see also Mean
  normal, see Gaussian random vector
  orthogonal, 239
  realization, xiii, 36, 41, 130, 213, 214, 219
  uncorrelated, 44, 70, see also Correlation
  uniform, 65, 72, 130
Random walk, 25, 51, see also Brownian motion
Rate, convergence, 82
Rate parameter, 163
Realizable, 256, see also Causal
Realization of random vector, xiii, 36, 41, 130, 213, 214, 219
Reasonableness checking, 232
Recertification, 173
Reconstructibility, see also Observability
Recursive estimation, 37, 131, 213
Reduced-order, 23, 25, see also Adequate model; Approximation; Simplified
Reference trajectory, see Nominal
Regularity conditions, 96
Relinearizations, 42, 44
Representation theorem, 251
Reproducing kernel Hilbert space, 257
Residual, 15, 18, 37, 69, 131, 133, 136, 144, 218, 232, 234, 256
  monitoring, 69, 131, 133, 232
  whitening, 136
Riccati equation
  algebraic, see Constant-gain
  differential, 58
  difference, 9
  steady state solution, see Constant-gain
Riemann integral, 160
RMS, see Root mean square
Robustness, 143, see also Divergence; Tuning
Root mean square (rms), see Correlation; Standard deviation; Variance
Roundoff errors, see Numerical precision

S

Sample, stochastic process, xiii, see also Realization of random vector
Sample frequency, see Sample time
Sample instant, see Sample time
Sample rate, see Sample time
Sample space, xiii
Sample statistics, 123, 128, see also Monte Carlo
Sample period, see Sample time
Sample time, measurement, 2, 40, 70, 160, 165, 213
Sample value, see Realization
Sampled data, 2, 40, 70, 160, 165, 213
Saturation, 235, 243
Scalar measurement updating, iterative, 56, 64, 116, see also Measurement update
Schmidt epsilon technique, 31
Schwartz inequality, 179, 180, 206, 207
Score, 81, 82, 85, 87, 120
Scoring, 81, 120
Second order
  density, 162, see also Markov process
  filter, see Filter
  statistics, see Correlation; Covariance
Self-tuning, 69, 120, 129, 136, 141
Semi-group property, 171
Sensitivity, 56, 68, 71, 74, 98, 99, 143, see also Error
  analysis, 48, 68, 71, 98, 99, 143, see also Parameter; Robustness
  system, 85
Sensor, 232, see also Measurement
Sequence, see also Discrete-time
  control, 2, 10, 37, 86
  Gaussian, 2, 70, 77, 132
  Gauss-Markov, 2, 70, 77
  innovations, see Residual
  Markov, 2, 70, 167, 238
  measurement, 2, 40, 70, 160, 213
  noise, 2, 40, 70, 160, 165, 213, 238
  residual, 15, 37, 69, 132, 136, 144, 218, 232
  state vector, 2, 70, 77, 167, 238
Set theoretic, 71, 72, 257
Shaping filter, 54, 160, 166, 188, 190
Shifted coordinates, see Perturbation; Taylor series
Sifting property of Dirac delta, 185
Sigma (σ), see Standard deviation
σ-algebra, 176, 246, 251
Similarity transformation, 114
Simple function, 177, 178
Simplified (controller, filter, model), 109, 121, 128, 216, see also Adequate model; Approximations; Model; Neglected terms; Reduced-order
Simulation, 189, 210, see also Monte Carlo


Singular, 8, 70, 123, see also Positive semidefinite
Singular value decomposition, 102
Smoothability, 11
Smoothing, 1, 58, 59, 257
  continuous measurement, 18, 20
  fixed-interval, 4, 5, 9
  fixed-lag, 5, 16
  fixed-point, 5, 15, 17
  Fraser form, 5, 16
  Meditch form, 14, 15
  nonlinear, 18
Snyder's equation, 247, 248
Software programs for design, 48, 61 (Ref. 38), 97, 99, 190, 191, 210, see also Performance analysis; Tuning
Solution
  deterministic differential equation, 182
  nominal, 40
  Riccati equation, see Riccati equation
  stochastic differential equation, 161, 181-186
Space
  Euclidean, 22
  Hilbert, 157, 241, 257
  reproducing kernel Hilbert space, 257
  state, see State
Spectral analysis, 54, 63, 65, see also Eigenvalue
Spectral density, 54, 63, 65
Spectrum, see Eigenvalue; Spectral analysis
Spurious data, 232
Square root matrix, 1, 9, 21, 116, 192, 210-211, 227, see also Factorization
Stability
  filter, 24
  numerical, 24, 28, 102, 116, 121, see also Factorization
  robustness, 143
Standard controllable form, 114
Standard deviation, 123, 128, see also Monte Carlo; Variance
Standard observable form, 114, 115
State
  augmented, 160
  concept, 167
  controllable, 24, 114
  difference equation, see Dynamic system model; Discrete-time
  differential equation, see Dynamic system model; Continuous-time; Stochastic differential equations
  equation, see Dynamic system model
  equilibrium, see Equilibrium state; Nominal
  error, see Error; Perturbation
  estimate, see Filter
  feedback, 213, 251
  notation, xiii, xvi
  observable, see Observability; Standard observable form
  propagation, see Dynamic system model
  space, 114, 115, 167
  steady state, see Steady state
  transition, 2, 15, 70, 74, 115, 161, 169, 190, 238
  variables, choice, 114, 115
  vector, see Dynamic system model; System model (linear); System model (nonlinear)
State space representations, 114, 115
State transition matrix, 2, 15, 70, 74, 115, 161, 190
State transition probabilities, 169
State transition probability matrix, 169
Static model, see Measurement; Random vector
Stationary, 54, 63, 65, 123, 136
Statistical approximation, 245
Statistical linearization, 243
Statistical tests for whiteness, 139, 142
Statistically linearized filter, 243
Statistics
  estimates of, 120, 123, 128, 129, 136, 141, see also Monte Carlo
  first order, 162, see also First order density; Mean
  partial description of density, 214, 215, 239
  second order, 162, see also Correlation; Covariance; Monte Carlo
  sufficient, 72
Steady state, see also Constant-gain
  error, see Error
  filter, 14, 52, 57, 58, 112, 137, 250, 266, 267, 270
  Riccati equation solution, 9, 58
  gain, 57, 58, 137, 250, 266, 267, 270
Stochastic approximations, 58, 73, 257
Stochastic controllability, 24
Stochastic difference equations, see also Discrete-time
  linear, 2, 70
  nonlinear, 238
Stochastic differential, 39, 161, 175, 181


Stochastic differential equations, see also Stochastic integral
  linear, 41, 160-162, 165
  nonlinear, 39, 165, 166, 181, 191, 199, 213
Stochastic integral
  Itô, 166, 175, 178, 180, 181, 189
  Stratonovich, 163, 178, 188, 191-193
  Wiener, 160, 161, 175, 180
Stochastic model, see Dynamic system model; Measurement; Model; Stochastic process; System model (linear); System model (nonlinear)
Stochastic observability, 24
Stochastic process, see also Dynamic system model; Measurement; Model
  admissible, 175, 179, 181
  bias, 25, 51, 52, 72, 109, 138, 224
  Brownian motion, 39, 51, 160, 175-176, 179-181
  characterization, 161, 174, 202
  continuous-time, see Continuous-time
  correlated, 53, 54, see also Correlation
  correlation kernel of, 39, 70, 137, 160, 162, 164, 256
  correlation matrix of, 40, 70, 120, 137, 160, 162, 176, 181, 185
  covariance kernel of, 39, 40, 70, 160, 162, 164
  covariance matrix of, 39, 70, 84, 120, 160, 162, 176, 181, 199-201, 238
  cross-correlation of, 28, 136, 185, 256
  cross-covariance of, 28, 136, 185, 256, see also Covariance
  definition, xiii
  description of, 161, 174, 202
  discrete-time, 2, 70, 167, 210, 238
  exponentially time-correlated, 53, 139, 188, 205
  first order Gauss-Markov, 53, 139, 188, 205, see also System model (linear)
  Gaussian, 161, 164, 176, see also Brownian motion; Gauss-Markov process model; White Gaussian noise
  Gauss-Markov, 53, 139, 160, 162, 188, 205, see also System model (linear)
  independent, 44, 53, 70
  independent increment, 160, 163, 176, 183, 247, see also Brownian motion; Poisson process
  Markov, 136, 159, 162, 163, 166, 167, 176, 183, 192, 213
  martingale, 176, 179, 209, 247, 251, 256


  mean of, see Mean vector
  nonstationary, see Stationary
  normal, see Gaussian stochastic process
  Poisson, 163-165, 247-248
  power spectral density, 54, 63, 65, see also Correlation; Fourier transform
  probability laws for, see Forward Kolmogorov equation
  random bias, 25, 51, see also Brownian motion
  random constant, see Random bias
  random walk, 25, 51, see also Brownian motion
  samples, xiii, see also Realization
  simulation, 189, 210, see also Monte Carlo
  stationary, 54, 63, 65, 123, 136
  time-correlated, 54, see also Shaping filter
  uncorrelated, see White noise; Correlated noise
  white, see White noise
  white Gaussian, 39, 40, 70, 160, 164-166, 256
  Wiener, see Brownian motion
Storage, 17, 37-38, 74, 103, 111, 116
Stratonovich integral, 163, 178, 188, 191-193
Strength of white noise, 39, 57, 160, 181, 187, 213, 246
Structural model, see Dynamic system model; Measurement; Models
Subintervals, integration, 45, 190, 210
Suboptimal filter, see Approximations; Reduced-order; Filter
Sufficient statistic, 72
Superposition, see System model (linear)
Symmetric, 116, 216
System dynamics matrix, 41, 160
System identification
  offline, 84, 153, 156
  online, 69, 153, 156
  simultaneously with state estimation, 48, 68, 74-119, 129-136
System model (linear), see also Stochastic model; Dynamic system model; Measurement; Models
  adequate, see Adequate model
  constant coefficient, see Time-invariant system model
  continuous-time, 39, 41, 160, 161, 165, see also Continuous-time measurement
  discrete-time, 70
  dynamics equation, 2, 39, 41, 70, 160, 161, 165

Page 305: Stochastic Models, Estimation, And Control Volume 3


System model (linear) (cont.)
  equivalent discrete-time, 2, 70, 210, 211
  frequency domain, 54, 63, 65, see also Stationary; Time-invariant system model
  matrices, see Matrix
  measurement equation, 2, 70, 160, 163
  noise, 70, 160, see also Dynamics noise; Measurement noise; White Gaussian
  pseudonoise, see Pseudonoise; Tuning
  state, 70, 160, see also State
  time domain, 70, 160, see also State space
  time-invariant, see Time-invariant system model
  time-varying, see Time-varying system
  transfer function, 54, 63, 65, see also Time-invariant system model
  uncertainty, 70, 160, see also Dynamics noise; Measurement noise; Compensation for model inadequacies
System model (nonlinear), see also Stochastic model; Dynamic system model; Measurement; Models
  adequate, see Adequate model
  constant coefficient, see Time-invariant system model
  continuous-time, 39, 165, 166, 181, 191, 199, 213
  discrete-time, 238
  dynamics equation, 39, 165, 166, 181, 191, 199, 213, 238
  measurement equation, 40, 57, 165, 213, 246, 251
  noise, 39, 40, 165, 166, 213, 238, 246, 251
  pseudonoise, see Pseudonoise; Tuning
  state, 39, 165, 166, 181, 191, 199, 213, 238, see also State
  time domain, see System model (nonlinear), state
  time-invariant, see Time-invariant system model
  time-varying, see Time-varying system
  uncertainty, 39, 40, 165, 166, 213, 238, 246, 251, see also Dynamics noise; Measurement noise; Compensation for model inadequacies

T

Taylor series, 40, 177, 186, 195, 216, 217, 220, 235, 248

Terminal condition, 8, 9


Terminal time, 8
Time average, 103, 141
Time-correlated noise models, see Shaping filter
Time domain analysis, 160, see also State; System model (linear); System model (nonlinear)
Time history, 2, 83, 160, 165, 213, 215, 246
Time-invariant system model, 54, 57, 58, 71, 115, 136, 170
Time propagation, see Propagation
Time series analysis, 120, 123, 129, 133, 136, 137, 139, 141, see also Adaptive; Residual monitoring
Time-varying system, see Models; System model (linear); System model (nonlinear)
Trace, xiv, 181
Tracking error, 54-56, 109
Tradeoff, see Adequate model; Computer; Performance analysis; Tuning
Transfer function, 54, 63, 65, see also Time-invariant system model
Transformation, 22, 114, 180, 256
  matrix, 114
  similarity, 114
Transient, 12, 93, 103, 104, 112, 113, 135
Transition density, 162, 174, 183, 192, 213
Transition matrix, 2, 15, 70, 74, 115, 161, 169, 190
Transition probability, 162, 168, 174, 183, 192, 213
Transition probability diagram, 170
Transpose, xiv
Triangular matrix, 21, 27, 192, 210-211, see also Factorization
True errors, see Monte Carlo; Performance analysis; Truth model
Truncated second order filter, 217, 218, 220, 225, 249
Truncation errors, 9, 28, 45, 115, 121, 123, see also Factorization; Numerical precision; Wordlength
Truth model, 25, 48, 98, 228, see also Monte Carlo; Performance analysis
Tuning, 23, 24, 27, 48, 52, 69, 133-135, 224, 225, 228

U

U-D covariance factorization, 27, 211, see also Factorization

Page 306: Stochastic Models, Estimation, And Control Volume 3


Unbiased estimate, 72, 96
Unbounded variation, 176
Uncertain parameters, 48, 68-144, see also Parameter
Uncertainties, model for, see Models; Noise; Random vector; Shaping filter; Stochastic process; System model (linear); System model (nonlinear)
Uncontrollable, 24
Uncorrelated random vectors, 44, 70, see also Correlation
Uncorrelated stochastic processes, 44, 70, see also Correlation
Uniformly distributed, 65, 72, 130
Unimodal, 47, 216, 241
Uniqueness, 71, 97, 128, 182
Unitary matrix, 27, 211
Unknown but bounded, 71, 72, 257
Unmodelled effects, see Neglected terms
Unobservable, 24
Unreconstructible, see Unobservable
Unstable, see Stability
Update, filter, see Measurement
Upper triangular matrix, 27, 211, see also Factorization

V

Variable, random, see Random variable
Variance, see also Covariance
  cross-variance, 28, 136, 256, see also Covariance
  estimator of, 120, 123, 129, 136, 141, see also Monte Carlo
Variation, first, see Linear perturbation
Variation, unbounded, 176
Vector
  control input (u), 2, 10, 37, 39, 86, 160, 161, 213, 251


  dynamics noise (w), 2, 39, 70, 120, 160, 213
  Itô differential, 39, 181
  measurement (z), 2, 40, 57, 70, 160, 163, 165, 213, 246, 251
  measurement noise (v), 2, 40, 70, 120, 160, 165, 213, 246
  process, see Stochastic process
  random, see Random vector
  residual (r), see Residual
  state (x), see Dynamic system model; State; System model (linear); System model (nonlinear)

W

Weight, probability, 72, 109, 131, see also Filter gain; Weighted least squares
Weighted least squares, 18, 72, 156-158, 257
Weighting matrix, see Weighted least squares
White Gaussian noise, 39, 40, 70, 160, 164-166, 256
White noise
  continuous-time, 39, 160, 164-166, 256
  discrete-time, 40, 70, 160, 165, 256
Whiteness tests, 139, 142
Whitening of residuals, 136
Wiener integral, 160, 161, 175, 180
Wiener process, see Brownian motion
Wiener stochastic integral, 160, 161, 175, 180
With probability one, see Convergence, with probability one
Wordlength, 1, 9, 27, 65, 115, 116, see also Factorization; Numerical precision

Z

Zero-mean noise, 70, see also Dynamics noise; Measurement noise

Page 307: Stochastic Models, Estimation, And Control Volume 3

This page intentionally left blank