SAAB2001_A Discrete Time Stochastic Learning Algorithm



Transcript of SAAB2001_A Discrete Time Stochastic Learning Algorithm

Page 1: SAAB2001_A Discrete Time Stochastic Learning Algorithm

IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 46, NO. 6, JUNE 2001 877

A Discrete-Time Stochastic Learning Control Algorithm

Samer S. Saab, Senior Member, IEEE

Abstract—One of the problems associated with iterative learning control algorithms is the selection of a "proper" learning gain matrix for every discrete-time sample and for all successive iterations. This problem becomes more difficult in the presence of random disturbances such as measurement noise, reinitialization errors, and state disturbance. In this paper, the learning gain, for a selected learning algorithm, is derived based on minimizing the trace of the input error covariance matrix for linear time-varying systems. It is shown that, if the product of the input/output coupling matrices is full-column rank, then the input error covariance matrix converges uniformly to zero in the presence of uncorrelated random disturbances, whereas the state error covariance matrix converges uniformly to zero in the presence of measurement noise. Moreover, it is shown that, if a certain condition is met, then knowledge of the state coupling matrix is not needed to apply the proposed stochastic algorithm. The proposed algorithm is shown to suppress a class of nonlinear and repetitive state disturbances. The application of this algorithm to a class of nonlinear systems is also considered. A numerical example is included to illustrate the performance of the algorithm.

Index Terms—Iterative learning control, optimal control, stochastic control.

I. INTRODUCTION

ITERATIVE learning control algorithms have been reported and applied to many fields, most notably robotics [4]–[7], [23]–[36]; other examples include precision speed control of servomotors and its application to a VCR [1], cyclic production processes with application to extruders [2], and coil-to-coil control in rolling [3]. Several approaches for updating the control law over repeated trials on identical tasks, making use of stored preceding inputs and output errors, have been proposed and analyzed [4]–[38]. Several algorithms assume that the initial condition is fixed; that is, at each iteration the state is always initialized at the same point. For a learning procedure that automatically accomplishes this task, the reader may refer to [28] and [17]. For algorithms that employ more than one iteration of history data, refer to [15] and [16]. A majority of these methodologies are based on contraction mapping requirements to develop sufficient conditions for convergence and/or boundedness of trajectories through repeated iterations. Consequently, a condition on the learning gain matrix K is assumed in order to satisfy a necessary condition for contraction. Based on the nature of the control update law, conditions on K, such as ||I - KH|| < 1 or a spectral-radius bound on (I - KH), are imposed (where the matrix H is a function of the input/output coupling functions of the system)

Manuscript received September 1, 1999; revised April 25, 2000. Recommended by Associate Editor M. Polycarpou. This work was supported by the University Research Council at the Lebanese American University.

The author is with the Lebanese American University, Byblos 48328, Lebanon (e-mail: [email protected]).

Publisher Item Identifier S 0018-9286(01)05129-7.

[4]–[20]. Several methods have been proposed to find an "appropriate" value of the gain matrix K. For example, in one method the value of K is found by gradient methods that minimize a quadratic cost error function between successive trials [35]; another provides a steepest-descent minimization of the error at each iteration [37]; yet another chooses larger values of the gain matrix until a convergence property is attained [38]. A further methodology, based on a frequency response technique, is presented in [25]. On the other hand, for task-level learning control, an on-line numerical multivariate parametric nonlinear least-squares optimization problem is formulated and applied to a two-link flexible arm [24]. Then again, these methods are based on a deterministic approach. To the best of the author's knowledge, this paper presents the first attempt to formulate a computational algorithm for the learning gain matrix which stochastically minimizes, in a least-squares sense, trajectory errors in the presence of random disturbances.

This paper presents a novel stochastic optimization approach, for a selected learning control algorithm, in which the learning gain minimizes the trace of the input error covariance matrix. A linear discrete time-varying system, with random state reinitialization, additive random state disturbance, and biased measurement noise, is initially considered. The selected control update law, at a sampled time t and iterative index k, consists of the previously stored control input added to a weighted difference of the stored output error, that is, u_{k+1}(t) = u_k(t) + K_k(t) e_k(t+1). Although this update law possesses a derivative action, the proposed algorithm is shown to become insensitive to measurement noise throughout the iterations. The problem can be interpreted as follows: assuming that the random disturbances are white Gaussian noise and uncorrelated, then, at time t and iterative index k, find a learning gain matrix K_k(t) such that the trace of the input error covariance matrix at iteration k+1 is minimized. It is shown in [11] that if the gain is chosen such that ||I - K_k(t)C(t+1)B(t)|| < 1, then boundedness of trajectories under this control law is guaranteed. A choice of this learning gain matrix requires the matrix C(t+1)B(t) to be full-column rank. In addition, in the absence of all disturbances, uniform convergence of all trajectories is also guaranteed if and only if C(t+1)B(t) is full-column rank. This latter scenario is not realistic because there are always random disturbances in the system in addition to measurement noise. Unfortunately, deterministic learning algorithms cannot learn random behaviors. That is, after a finite number of iterations, the repetitive errors are attenuated and the dominating errors will be due to those random disturbances and measurement noise, which cannot be learned or mastered in a deterministic approach. In this paper, we show that if C(t+1)B(t) is


Page 2: SAAB2001_A Discrete Time Stochastic Learning Algorithm


full-column rank, then the algorithm that generates the learning gain matrix minimizing the trace of the input error covariance matrix automatically assures the contraction condition, that is, ||I - K_k(t)C(t+1)B(t)|| < 1, which ensures the suppression of repeatable or deterministic errors [11], [14]. Moreover, it is shown that the input error covariance matrix converges uniformly to zero in the presence of random disturbances, whereas the state error covariance matrix converges uniformly to zero in the presence of biased measurement noise. Obviously, if the input error is zero and the reinitialization errors are nonzero, then one should not expect this type of learning control to suppress such errors. Furthermore, if zero convergence of the state error covariance matrix is desired in the presence of state disturbance, then the controller (if any) is expected to "heavily" employ full knowledge of the system parameters, which makes the use of learning algorithms unattractive. On the other hand, it is shown that if a certain condition is met, then knowledge of the state matrix is not required. Application of this algorithm to a class of nonlinear systems is also examined. In one case, where the nonlinear state function is not required to satisfy a Lipschitz condition, unlike the results in [5]–[9], [12]–[15], and [24], it is shown that the input error covariance matrix is nonincreasing. In addition, if a certain condition is met, then the results for this class of systems become similar to the case of linear time-varying systems. A class of nonlinear systems satisfying a Lipschitz condition is also considered.

The outline of the paper is as follows. In Section II, we formulate the problem and present the assumptions on the disturbance characteristics. In Section III, we develop the learning gain that minimizes the trace of the input error covariance matrix, and present the propagation formula of this covariance matrix through iterations. In Section IV, we assume that C(t+1)B(t) is full-column rank and present the convergence results. In Section V, we present a pseudocode for computer application, and give a numerical example. We conclude in Section VI. In Appendix A, we show that if a certain condition is met, then knowledge of the state matrix is not required. In Appendix B, we apply this type of stochastic algorithm to a class of nonlinear systems.

II. PROBLEM FORMULATION

Consider a discrete time-varying linear system described by the following difference equation:

x_k(t+1) = A(t) x_k(t) + B(t) u_k(t) + w_k(t)
y_k(t) = C(t) x_k(t) + v_k(t)    (1)

where x_k(t) ∈ R^n is the state, u_k(t) ∈ R^r is the control input, y_k(t) ∈ R^m is the output, w_k(t) is the state disturbance, and v_k(t) is the measurement noise; A(t), B(t), and C(t) are real matrices of appropriate dimensions; t denotes the time sample and k the iteration index. The learning update is given by

u_{k+1}(t) = u_k(t) + K_k(t) e_k(t+1)    (2)

where K_k(t) is the learning control gain matrix, and e_k(t) is the output error; i.e., e_k(t) = y_d(t) - y_k(t), where y_d(t) is a realizable desired output trajectory. It is assumed that for any realizable output trajectory and an appropriate initial condition x_d(0), there exists a unique control input u_d(t) generating the trajectory for the nominal plant. That is, the following difference equation is satisfied:

x_d(t+1) = A(t) x_d(t) + B(t) u_d(t)
y_d(t) = C(t) x_d(t)    (3)

Note that if C(t+1)B(t) is full-column rank and y_d(t) is a realizable output trajectory, then the unique input generating the output trajectory is given by

u_d(t) = [(C(t+1)B(t))^T C(t+1)B(t)]^{-1} (C(t+1)B(t))^T [y_d(t+1) - C(t+1)A(t) x_d(t)]

Define the state and the input error vectors as x̃_k(t) = x_d(t) - x_k(t) and ũ_k(t) = u_d(t) - u_k(t), respectively.

Assumptions: Denote E_t and E_k to be the expectation operators with respect to the time domain and the iteration domain, respectively. Let {w_k(t)} and {v_k(t)} be sequences of zero-mean white Gaussian noise such that E[w_k(t) w_k^T(t)] = Q(t) is a positive-semidefinite matrix, E[v_k(t) v_k^T(t)] = R(t) is a positive-definite matrix for all t, and E[w_k(t) v_k^T(s)] = 0 for all indices t and s; i.e., the measurement error sequence has zero cross-correlation with the state disturbance at all times. The initial state error x̃_k(0) and the initial input error ũ_0(t) are also assumed to be zero-mean white noise, such that E[x̃_k(0) x̃_k^T(0)] is a positive-semidefinite matrix and P_0(t) = E[ũ_0(t) ũ_0^T(t)] is a symmetric positive-definite matrix. One simple scenario is to set P_0(t) to a scaled identity matrix. Moreover, x̃_k(0) is uncorrelated with ũ_0(t), w_k(t), and v_k(t). The main target is to find a proper learning gain matrix K_k(t) such that the input error variance is minimized. Note that since the disturbances are assumed to be Gaussian processes, uncorrelated disturbances are equivalent to independent disturbances.
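To fix ideas, the following minimal Python sketch propagates the plant (1) for one trial and applies the update law (2). All matrices, dimensions, noise levels, and helper names (run_trial, ilc_update) are illustrative assumptions for this sketch, not quantities taken from the paper.

```python
# A minimal sketch of the plant (1) and the learning update (2).
import numpy as np

rng = np.random.default_rng(0)
n, r, m, T = 2, 1, 1, 50                   # state/input/output dims, horizon (assumed)

A = np.array([[1.0, 0.1],
              [0.0, 1.0]])                 # state coupling A(t) (time-invariant here)
B = np.array([[0.0],
              [0.1]])                      # input coupling B(t)
C = np.array([[1.0, 0.0]])                 # output coupling C(t)
Q = 1e-6 * np.eye(n)                       # covariance of state disturbance w_k(t)
R = 1e-4 * np.eye(m)                       # covariance of measurement noise v_k(t)

def run_trial(u, x0):
    """Propagate (1) over one iteration; return the measured outputs y_k(t)."""
    x, y = x0.copy(), np.zeros((T + 1, m))
    y[0] = C @ x                           # y_k(0); unused by the update below
    for t in range(T):
        w = rng.multivariate_normal(np.zeros(n), Q)
        x = A @ x + B @ u[t] + w
        v = rng.multivariate_normal(np.zeros(m), R)
        y[t + 1] = C @ x + v
    return y

def ilc_update(u, e, K):
    """Learning update (2): u_{k+1}(t) = u_k(t) + K_k(t) e_k(t+1)."""
    return np.array([u[t] + K[t] @ e[t + 1] for t in range(T)])
```

Here K is stored as one r-by-m gain per time sample, which is the shape the algorithm of Section III produces.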

III. STOCHASTIC LEARNING CONTROL ALGORITHM DEVELOPMENT

In this section, we develop the appropriate learning gain matrix, which minimizes the trace of the input/state error covariance matrix. In the following, we combine the system given by (1) and (2) in a regressor form. We first write expressions for the state and input errors. The state error at t+1 is given by

x̃_k(t+1) = A(t) x̃_k(t) + B(t) ũ_k(t) - w_k(t)    (4)

and the input error for iterate k+1 is given by

ũ_{k+1}(t) = ũ_k(t) - K_k(t) e_k(t+1) = ũ_k(t) - K_k(t)[C(t+1) x̃_k(t+1) - v_k(t+1)]

Page 3: SAAB2001_A Discrete Time Stochastic Learning Algorithm


Substituting the value of the state error into the previous equation, we have

ũ_{k+1}(t) = ũ_k(t) - K_k(t)C(t+1)[A(t) x̃_k(t) + B(t) ũ_k(t) - w_k(t)] + K_k(t) v_k(t+1)

Collecting terms, the input error reduces to

ũ_{k+1}(t) = [I - K_k(t)C(t+1)B(t)] ũ_k(t) - K_k(t)C(t+1)A(t) x̃_k(t) + K_k(t)C(t+1) w_k(t) + K_k(t) v_k(t+1)    (5)

Combining (4) and (5) in the two-dimensional Roesser model [39] gives

[ x̃_k(t+1)  ]   [ A(t)                 B(t)                 ] [ x̃_k(t) ]   [ -I             0      ] [ w_k(t)   ]
[ ũ_{k+1}(t) ] = [ -K_k(t)C(t+1)A(t)   I - K_k(t)C(t+1)B(t) ] [ ũ_k(t) ] + [ K_k(t)C(t+1)  K_k(t) ] [ v_k(t+1) ]    (6)

The development that follows is justified because the input vector of difference equation (6) has zero mean. Note that if the disturbances are nonstationary or colored noise, then the system can easily be augmented to cover this class of disturbances. Writing (6) in a compact form, we have

z_k(t) = Γ_k(t) z̄_k(t) + Λ_k(t) n_k(t)    (7)

where z_k(t) = [x̃_k^T(t+1), ũ_{k+1}^T(t)]^T and z̄_k(t) = [x̃_k^T(t), ũ_k^T(t)]^T are (n+r)-dimensional vectors, n_k(t) = [w_k^T(t), v_k^T(t+1)]^T is an (n+m)-dimensional vector, Γ_k(t) is an (n+r) × (n+r) matrix, and Λ_k(t) is an (n+r) × (n+m) matrix, with the blocks displayed in (6).
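As a quick consistency check on the block form reconstructed in (6), the sketch below builds Γ_k(t) and Λ_k(t) for random matrices of the assumed dimensions and verifies that one step of (6) reproduces (4) and (5); everything here is illustrative.

```python
# Consistency check: one step of the block recursion (6) versus (4) and (5).
import numpy as np

rng = np.random.default_rng(1)
n, r, m = 3, 2, 2                          # assumed dimensions
A, B = rng.standard_normal((n, n)), rng.standard_normal((n, r))
C, K = rng.standard_normal((m, n)), rng.standard_normal((r, m))
D, F = C @ B, C @ A                        # D(t) = C(t+1)B(t), F(t) = C(t+1)A(t)

Gamma = np.block([[A,      B                 ],
                  [-K @ F, np.eye(r) - K @ D]])
Lam = np.block([[-np.eye(n), np.zeros((n, m))],
                [K @ C,      K               ]])

x, u = rng.standard_normal(n), rng.standard_normal(r)   # x~_k(t), u~_k(t)
w, v = rng.standard_normal(n), rng.standard_normal(m)   # w_k(t), v_k(t+1)

z_next = Gamma @ np.concatenate([x, u]) + Lam @ np.concatenate([w, v])
x_next = A @ x + B @ u - w                                          # (4)
u_next = (np.eye(r) - K @ D) @ u - K @ F @ x + K @ C @ w + K @ v    # (5)
assert np.allclose(z_next, np.concatenate([x_next, u_next]))
```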

Next, we attempt to find a learning gain matrix K_k(t) such that the trace of the error covariance matrix E[z_k(t) z_k^T(t)] is minimized. It is implicitly assumed that the input and state errors have zero mean, so it is proper to refer to E[z_k(t) z_k^T(t)] as a covariance matrix. Note that the mean-square error is used as the performance criterion because the trace is the summation of the error variances of the elements of z_k(t). Toward this end, we first form the error covariance matrix

E[z_k z_k^T] = Γ_k E[z̄_k z̄_k^T] Γ_k^T + Γ_k E[z̄_k n_k^T] Λ_k^T + Λ_k E[n_k z̄_k^T] Γ_k^T + Λ_k E[n_k n_k^T] Λ_k^T

Since the initial error state is uncorrelated with the disturbances w_k(t) and v_k(t+1), then at iterate k, E[x̃_k(t) w_k^T(t)] = 0 and E[x̃_k(t) v_k^T(t+1)] = 0. Thus, E[ũ_k(t) w_k^T(t)] = 0 and E[ũ_k(t) v_k^T(t+1)] = 0, so that E[z̄_k(t) n_k^T(t)] = 0. The last equation is now reduced to

E[z_k(t) z_k^T(t)] = Γ_k(t) E[z̄_k(t) z̄_k^T(t)] Γ_k^T(t) + Λ_k(t) E[n_k(t) n_k^T(t)] Λ_k^T(t)    (8)

Let

N(t) = E[n_k(t) n_k^T(t)] = [ Q(t)   0
                              0      R(t+1) ]

where

P_k(t) = E[ũ_k(t) ũ_k^T(t)],   P_{x̃,k}(t) = E[x̃_k(t) x̃_k^T(t)]

and

M_k(t) = E[x̃_k(t) ũ_k^T(t)]

As a consequence, N(t) and E[z̄_k(t) z̄_k^T(t)] are symmetric and positive-semidefinite matrices, where the zero submatrices in N(t) are due to the zero cross-correlation between w_k(t) and v_k(t+1). For compactness, we denote D(t) = C(t+1)B(t) and F(t) = C(t+1)A(t), and define

S_k(t) = F(t) P_{x̃,k}(t) F^T(t) + C(t+1) Q(t) C^T(t+1) + R(t+1)

Expanding the terms of (8), the lower-right (input error) block yields P_{k+1}(t). Consequently, the trace of (8) is equivalent to the following:

trace E[z_k(t) z_k^T(t)] = trace P_{x̃,k}(t+1) + trace P_{k+1}(t)

with

trace P_{k+1}(t) = trace{[I - K_k(t)D(t)] P_k(t) [I - K_k(t)D(t)]^T} - 2 trace{K_k(t) F(t) M_k(t) [I - K_k(t)D(t)]^T} + trace{K_k(t) S_k(t) K_k^T(t)}

Page 4: SAAB2001_A Discrete Time Stochastic Learning Algorithm

880 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 46, NO. 6, JUNE 2001

Expanding and rearranging the terms on the right-hand side of the last equation, we get

trace P_{k+1}(t) = trace P_k(t) - 2 trace{K_k(t)[D(t)P_k(t) + F(t)M_k(t)]} + trace{K_k(t)[D(t)P_k(t)D^T(t) + F(t)M_k(t)D^T(t) + D(t)M_k^T(t)F^T(t) + S_k(t)]K_k^T(t)}    (9)

Setting

Ω_k(t) = D(t)P_k(t)D^T(t) + F(t)M_k(t)D^T(t) + D(t)M_k^T(t)F^T(t) + S_k(t)

and

Θ_k(t) = P_k(t)D^T(t) + M_k^T(t)F^T(t)

the above equation is reduced to

trace P_{k+1}(t) = trace P_k(t) - 2 trace{K_k(t) Θ_k^T(t)} + trace{K_k(t) Ω_k(t) K_k^T(t)}

Since the learning gain K_k(t) is employed to update the control law at iteration k+1 and, in addition, this gain does not affect the state error within iteration k, that is,

∂ trace P_{x̃,k}(t+1) / ∂ K_k(t) = 0

it is reasonable to find a proper gain minimizing the trace of P_{k+1}(t) instead of the trace of (8). In order to find K_k(t) such that the trace of P_{k+1}(t) is minimized, we set the derivative of the trace of P_{k+1}(t) with respect to K_k(t) to zero as follows:

∂ trace P_{k+1}(t) / ∂ K_k(t) = -2 Θ_k(t) + 2 K_k(t) Ω_k(t) = 0

Note that since D(t)P_k(t)D^T(t) + F(t)M_k(t)D^T(t) + D(t)M_k^T(t)F^T(t) + F(t)P_{x̃,k}(t)F^T(t) = E{[D(t)ũ_k(t) + F(t)x̃_k(t)][D(t)ũ_k(t) + F(t)x̃_k(t)]^T} and C(t+1)Q(t)C^T(t+1) are positive-semidefinite matrices, and R(t+1) is positive-definite, then Ω_k(t) is nonsingular. The following solution of K_k(t) is the optimal learning gain which minimizes the trace of P_{k+1}(t):

K_k(t) = Θ_k(t) Ω_k^{-1}(t)    (10)

Next, we show that M_k(t) = E[x̃_k(t) ũ_k^T(t)] = 0. This is accomplished by expressing the state error x̃_k(t) as a function of the initial state error x̃_k(0), w_k(j), and ũ_k(j), j = 0, 1, ..., t-1, and, similarly, the input error ũ_k(t) as a function of ũ_0(t), x̃_j(t), w_j(t), and v_j(t+1), j = 0, 1, ..., k-1, and then correlating x̃_k(t) and ũ_k(t). Iterating the argument of (4), we get

x̃_k(t) = Φ(t, 0) x̃_k(0) + Σ_{j=0}^{t-1} Φ(t, j+1) [B(j) ũ_k(j) - w_k(j)]    (11)

Define

Φ(t, j) = A(t-1) A(t-2) ··· A(j) for t > j

and

Φ(t, t) = I

Page 5: SAAB2001_A Discrete Time Stochastic Learning Algorithm


Similarly, iterating the argument of (5), we get for k ≥ 1

ũ_k(t) = Π_{j=0}^{k-1} [I - K_j(t)D(t)] ũ_0(t) + Σ_{j=0}^{k-1} Π_{i=j+1}^{k-1} [I - K_i(t)D(t)] K_j(t) [-F(t) x̃_j(t) + C(t+1) w_j(t) + v_j(t+1)]    (12)

(with the empty product taken as I). Correlating the right-hand terms of (11) and (12), it can readily be concluded that x̃_k(0), ũ_0(t), the disturbances w_k(j), j < t, entering (11), and the disturbances w_j(t) and v_j(t+1), j < k, entering (12), are all mutually uncorrelated. In addition, these terms are also uncorrelated with the error terms appearing in the opposite expansion. At this point, the correlation of (11) and (12) is equivalent to correlating

Σ_{j=0}^{t-1} Φ(t, j+1) B(j) ũ_k(j)   and   Σ_{j=0}^{k-1} Π_{i=j+1}^{k-1} [I - K_i(t)D(t)] K_j(t) F(t) x̃_j(t)

where j = 0, 1, ..., t-1 in the first sum and j = 0, 1, ..., k-1 in the second. Since the terms ũ_k(j), j < t, and x̃_j(t), j < k, cannot be represented as functions of one another, these terms are considered uncorrelated. Therefore, E[x̃_k(t) ũ_k^T(t)] = 0, and consequently M_k(t) = 0. Then, (10) is reduced to

K_k(t) = P_k(t) D^T(t) [D(t) P_k(t) D^T(t) + S_k(t)]^{-1}    (13)

and the input error covariance matrix becomes

P_{k+1}(t) = [I - K_k(t)D(t)] P_k(t) [I - K_k(t)D(t)]^T + K_k(t) S_k(t) K_k^T(t)    (14)
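The gain (13) and the covariance recursion (14)/(16) translate directly into a few lines of numerical code. The following sketch assumes the notation reconstructed above (D = C(t+1)B(t), F = C(t+1)A(t)) and is not the author's implementation.

```python
# The optimal gain (13) and the covariance recursions (14)/(16).
import numpy as np

def optimal_gain(P_u, P_x, D, F, CQC, R):
    """K_k(t) = P_k D^T [D P_k D^T + S_k]^{-1}, S_k = F P_x F^T + C Q C^T + R."""
    S = F @ P_x @ F.T + CQC + R
    K = P_u @ D.T @ np.linalg.inv(D @ P_u @ D.T + S)
    return K, S

def covariance_update(P_u, K, D, S):
    """Joseph-style form (14); equals the short form (16) when K is optimal."""
    I = np.eye(P_u.shape[0])
    P_14 = (I - K @ D) @ P_u @ (I - K @ D).T + K @ S @ K.T
    P_16 = (I - K @ D) @ P_u
    return P_14, P_16
```

For the optimal gain the two returned matrices agree, which is exactly the simplification established in (15)-(16) below.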

IV. CONVERGENCE

In this section, we show that "stochastic" convergence is guaranteed in the presence of random disturbances if C(t+1)B(t) is full-column rank. In particular, it is shown that the input error covariance matrix converges uniformly to zero in the presence of random disturbances (Theorem 3), whereas the state error covariance matrix converges uniformly to zero in the presence of biased measurement noise (Theorem 5). In the following, some useful results are first derived.

Proposition 1: If D(t) = C(t+1)B(t) is a full-column rank matrix, then K_k(t) = 0 if and only if P_k(t) = 0.

Proof: The sufficient condition is the trivial case [set P_k(t) = 0 in (13)]. Necessary condition: since D(t) is full-column rank, D^T(t) is full-row rank, and D^T(t)D(t) is nonsingular. Multiplying both sides of (13), from the right-hand side, by [D(t)P_k(t)D^T(t) + S_k(t)] and letting K_k(t) = 0 implies that P_k(t)D^T(t) = 0; multiplying in turn by D(t) yields P_k(t)D^T(t)D(t) = 0, which implies that P_k(t) = 0.

The input error covariance matrix associated with the learning gain (13) may now be derived. Note that if the initial condition P_0(t) is symmetric, then, by examining (14), P_k(t) is symmetric for all k. For compactness, we write D = D(t) and S = S_k(t) in the remainder of this section.

Expanding (14), we have

P_{k+1}(t) = P_k(t) - K_k(t) D P_k(t) - P_k(t) D^T K_k^T(t) + K_k(t) [D P_k(t) D^T + S] K_k^T(t)    (15)

Substitution of (13) into (15) yields

P_{k+1}(t) = [I - K_k(t) D] P_k(t)    (16)

Claim 1: Assuming that P_k(t) is symmetric positive-definite, then P_{k+1}(t) is symmetric positive-definite, and

P_{k+1}(t) = [P_k^{-1}(t) + D^T S^{-1} D]^{-1}    (17)

Proof: Substituting (13) in (16), we get

P_{k+1}(t) = P_k(t) - P_k(t) D^T [D P_k(t) D^T + S]^{-1} D P_k(t)

One way to show the claimed equality is to multiply the left-hand side of (17) by the inverse of the right-hand side, and show that the result is nothing but the identity matrix:

{P_k(t) - P_k(t) D^T [D P_k(t) D^T + S]^{-1} D P_k(t)} [P_k^{-1}(t) + D^T S^{-1} D] = I    (18)

Using a well-known matrix inversion lemma [40], we get

[D P_k(t) D^T + S]^{-1} = S^{-1} - S^{-1} D [P_k^{-1}(t) + D^T S^{-1} D]^{-1} D^T S^{-1}    (19)

Substituting (19) into the right-hand side of (18), we have

Page 6: SAAB2001_A Discrete Time Stochastic Learning Algorithm


an expanded expression in which two terms cancel; rearranging the others, the left-hand side of (18) collapses to the identity matrix, which establishes (18) and hence (17). Positive-definiteness of P_{k+1}(t) then follows from (17), since the inverse of the symmetric positive-definite matrix P_k^{-1}(t) + D^T S^{-1} D is itself symmetric positive-definite.
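Claim 1 and the eigenvalue assertion of the upcoming Lemma 2 are easy to spot-check numerically. The sketch below draws a random full-column-rank D and random positive-definite P_k and S_k, and verifies that (16) matches the inverse form (17) and that the eigenvalues of I - K_k D fall in (0, 1); this is a sanity check under assumed data, not a proof.

```python
# Numerical spot-check of Claim 1 / Lemma 2 on random data.
import numpy as np

rng = np.random.default_rng(2)
r, m = 2, 3                                # m >= r so D can be full-column rank
D = rng.standard_normal((m, r))            # full-column rank almost surely
G = rng.standard_normal((r, r)); P = G @ G.T + np.eye(r)   # P_k > 0
H = rng.standard_normal((m, m)); S = H @ H.T + np.eye(m)   # S_k > 0

K = P @ D.T @ np.linalg.inv(D @ P @ D.T + S)               # gain (13)
P16 = (np.eye(r) - K @ D) @ P                              # form (16)
P17 = np.linalg.inv(np.linalg.inv(P) + D.T @ np.linalg.inv(S) @ D)  # form (17)
assert np.allclose(P16, P17)                               # Claim 1

eig = np.linalg.eigvals(np.eye(r) - K @ D)                 # Lemma 2: 0 < eig < 1
assert np.all(eig.real > 0) and np.all(eig.real < 1)
```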

Lemma 2: If D(t) = C(t+1)B(t) is full-column rank, then the learning algorithm, presented by (2), (13), and (16), guarantees that P_k(t) is a symmetric positive-definite matrix for all k and t. Moreover, the eigenvalues of I - K_k(t)D(t) are positive and strictly less than one, i.e., 0 < λ_i[I - K_k(t)D(t)] < 1 for all k and t.

Proof: The proof proceeds by induction with respect to the iteration index k. For k = 0: by examining (14), since P_0(t) is assumed to be a symmetric positive-definite matrix, P_1(t) is symmetric and nonnegative definite. Equation (17) implies that P_1(t) = [P_0^{-1}(t) + D^T S^{-1} D]^{-1}. Since D is full-column rank and S is symmetric positive-definite, D^T S^{-1} D is symmetric positive-definite. In addition, having P_0(t) symmetric and positive-definite implies that all eigenvalues of P_0(t) D^T S^{-1} D are positive. Therefore, the eigenvalues of I + P_0(t) D^T S^{-1} D are strictly greater than one, which is equivalent to concluding that the eigenvalues of I - K_0(t)D(t) = [I + P_0(t) D^T S^{-1} D]^{-1} are positive and strictly less than one. This implies that I - K_0(t)D(t) is nonsingular; (16) then implies that P_1(t) is nonsingular, and thus P_1(t) is a symmetric positive-definite matrix. We may now assume that P_k(t) is symmetric positive-definite. Equation (14) implies that P_{k+1}(t) is symmetric and nonnegative definite. Using the same argument as for k = 0, with P_0(t) replaced by P_k(t), it follows that the eigenvalues of I - K_k(t)D(t) are strictly positive and strictly less than one; hence I - K_k(t)D(t) is nonsingular and P_{k+1}(t) is symmetric positive-definite.

Remark: Since all the eigenvalues of I - K_k(t)D(t) are strictly positive and strictly less than one for all k and t (Lemma 2), there exists a consistent norm ||·|| and a constant ρ such that

||I - K_k(t)D(t)|| ≤ ρ < 1 for all k and t    (20)

Theorem 3: If D(t) = C(t+1)B(t) is full-column rank, then the learning algorithm, presented by (2), (13), and (16), guarantees that lim_{k→∞} trace P_k(t) = 0. In addition, P_k(t) → 0 and K_k(t) → 0 uniformly in t as k → ∞.

Proof: Equation (16) along with (20) implies

||P_{k+1}(t)|| ≤ ρ ||P_k(t)||

where again ρ < 1. We now show that lim_{k→∞} trace P_k(t) = 0 by a contradiction argument. Since {trace P_k(t)} is a strictly decreasing sequence and bounded from below by zero, the limit exists. Assume that lim_{k→∞} trace P_k(t) = c > 0. Consider the following ratio test for the sequence {trace P_k(t)}:

trace P_{k+1}(t) / trace P_k(t) = 1 - trace{P_k(t)D^T(t)[D(t)P_k(t)D^T(t) + S_k(t)]^{-1}D(t)P_k(t)} / trace P_k(t)

Note that S_k(t) depends on k only through the state error covariance matrix P_{x̃,k}(t). Since P_k(j) is bounded in k for every j, P_{x̃,k}(t) is also bounded in k; this can be seen by employing (11). Define

β(t) = ||Φ(t, 0)|| [trace P_{x̃}(0)]^{1/2}

and

γ(t) = Σ_{j=0}^{t-1} ||Φ(t, j+1)|| { ||B(j)|| [trace P_0(j)]^{1/2} + [trace Q(j)]^{1/2} }

then

trace P_{x̃,k}(t) ≤ [β(t) + γ(t)]^2    (21)

where the bound follows from (11), the triangle inequality in the mean-square sense, and the fact that P_k(j) ≤ P_0(j) for all k [by (17)]. Since P_{x̃,k}(t) is bounded, S_k(t) is symmetric positive-definite and bounded. Therefore, D(t)P_k(t)D^T(t) + S_k(t) is symmetric, positive-definite, and bounded. The latter, together with the assumption that trace P_k(t) ≥ c > 0 and the full-column rank of D(t), implies that there exists ε > 0, independent of k, such that

trace{P_k(t)D^T(t)[D(t)P_k(t)D^T(t) + S_k(t)]^{-1}D(t)P_k(t)} ≥ ε

or trace P_{k+1}(t) ≤ trace P_k(t) - ε. Since ε > 0, iterating this inequality would eventually make trace P_k(t) negative, a contradiction. Hence lim_{k→∞} trace P_k(t) = 0, which is equivalent to lim_{k→∞} P_k(t) = 0. Employing Proposition 1, we get lim_{k→∞} K_k(t) = 0. Since t takes on finitely many values (t ∈ {0, 1, ..., T}), uniform convergence is equivalent to pointwise convergence.

Lemma 4: If lim_{k→∞} E[a_k a_k^T] = 0 and lim_{k→∞} E[b_k b_k^T] = 0 for zero-mean random vectors a_k and b_k of equal dimension, then lim_{k→∞} E[a_k b_k^T] = 0.

Proof: Let X_k = E[(a_k + b_k)(a_k + b_k)^T] and Y_k = E[(a_k - b_k)(a_k - b_k)^T]. Then X_k ≥ 0 (a positive-semidefinite matrix), and E[a_k b_k^T] + E[b_k a_k^T] = X_k - E[a_k a_k^T] - E[b_k b_k^T]. Taking the limit as k → ∞, we have lim{E[a_k b_k^T] + E[b_k a_k^T]} ≥ 0. Sim-

Page 7: SAAB2001_A Discrete Time Stochastic Learning Algorithm


ilarly, Y_k ≥ 0 yields the reverse inequality. Moreover, element-wise, |E[a_{k,i} b_{k,j}]| ≤ (E[a_{k,i}^2])^{1/2} (E[b_{k,j}^2])^{1/2} → 0 by the Cauchy–Schwarz inequality. Therefore, lim_{k→∞} E[a_k b_k^T] = 0.

Theorem 5: If D(t) = C(t+1)B(t) is a full-column rank matrix, then, in the absence of state disturbance and reinitialization errors (biased measurement noise remains; R(t) is positive definite), the learning algorithm, presented by (2), (13), and (16), guarantees that lim_{k→∞} P_k(t) = 0. In addition, lim_{k→∞} K_k(t) = 0, and the state error covariance matrix P_{x̃,k}(t) → 0 uniformly in t as k → ∞.

Proof: Obviously, the results of the previous theorem still apply. Setting w_k(t) = 0 and x̃_k(0) = 0, (11) is reduced to

x̃_k(t) = Σ_{j=0}^{t-1} Φ(t, j+1) B(j) ũ_k(j)

Consequently, taking the expected value of x̃_k(t) x̃_k^T(t) and defining Φ_B(t, j) = Φ(t, j+1) B(j), we have

P_{x̃,k}(t) = Σ_{i=0}^{t-1} Σ_{j=0}^{t-1} Φ_B(t, i) E[ũ_k(i) ũ_k^T(j)] Φ_B^T(t, j)

Taking the limit as k → ∞, we get

lim_{k→∞} P_{x̃,k}(t) = Σ_{i=0}^{t-1} Σ_{j=0}^{t-1} Φ_B(t, i) { lim_{k→∞} E[ũ_k(i) ũ_k^T(j)] } Φ_B^T(t, j)

From previous results, lim_{k→∞} E[ũ_k(j) ũ_k^T(j)] = lim_{k→∞} P_k(j) = 0 for all j. Therefore, by Lemma 4, lim_{k→∞} E[ũ_k(i) ũ_k^T(j)] = 0 for all i and j. Consequently, lim_{k→∞} P_{x̃,k}(t) = 0. Again, uniform convergence in t is straightforward.

V. APPLICATION

In this section, we go over the steps involved in the application of the proposed algorithm. A pseudocode is presented for both cases, where the state-error term F(t)P_{x̃,k}(t)F^T(t) in the gain is considered and where it is neglected. In addition, a numerical example is presented to illustrate the performance of the proposed algorithm.

A. Computer Algorithm

In order to apply the proposed learning control algorithm, the state error covariance matrix needs to be available. From Section III, one can extract the following propagation formula:

P_{x̃,k}(t+1) = A(t) P_{x̃,k}(t) A^T(t) + B(t) P_k(t) B^T(t) + Q(t)    (22)

TABLE I: PERFORMANCE OF THE PROPOSED ALGORITHM

We assume that A(t), B(t), C(t), Q(t), R(t), P_0(t), and P_{x̃,k}(0) are all available. Starting with k = 0, the pseudocode is as follows:

Step 1) For t = 0, 1, ..., T - 1:

a) apply u_k(t) to the system described by (1), and find the output error e_k(t+1);

b) employing (22), compute P_{x̃,k}(t+1);

c) using (13), compute the learning gain K_k(t);

d) using (2), update the control u_{k+1}(t);

e) using (16), update P_{k+1}(t).

Step 2) Set k := k + 1, go to Step 1).

In the case where the state-error term may be neglected, the pseudocode is the same as the previous one excluding item b), and using (24) instead of (13) to compute the learning gain K_k(t). A worked rendering of these steps is sketched after this list.
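Read literally, the pseudocode maps to a double loop over iterations and time. The following Python rendering assumes time-invariant A, B, C for brevity and a caller-supplied trial function that applies an input sequence to the plant (1) (for instance, the run_trial helper sketched in Section II); it is one possible reading of the steps above, not the author's code.

```python
# Sketch of the Section V pseudocode: steps a)-e) inside a time loop.
import numpy as np

def stochastic_ilc(A, B, C, Q, R, y_d, u0, P0, Px0, n_iter, trial):
    """`trial` maps an input sequence of shape (T, r) to outputs of shape (T+1, m)."""
    T, r = u0.shape
    u = u0.copy()
    P = np.stack([P0.copy() for _ in range(T)])      # P_k(t), one r-by-r block per t
    D, F = C @ B, C @ A                              # D(t) = C(t+1)B(t), F(t) = C(t+1)A(t)
    for k in range(n_iter):
        y = trial(u)                                 # a) apply u_k to the plant (1)
        e = y_d - y                                  #    output error e_k(t)
        Px = Px0.copy()                              # state-error covariance at t = 0
        for t in range(T):
            S = F @ Px @ F.T + C @ Q @ C.T + R       # disturbance covariance S_k(t)
            K = P[t] @ D.T @ np.linalg.inv(D @ P[t] @ D.T + S)  # c) gain (13)
            Px = A @ Px @ A.T + B @ P[t] @ B.T + Q   # b) propagate (22) with P_k(t)
            u[t] = u[t] + K @ e[t + 1]               # d) control update (2)
            P[t] = (np.eye(r) - K @ D) @ P[t]        # e) covariance update (16)
    return u, P
```

Note that (22) is propagated with the pre-update P_k(t), matching the ordering in which the gain at time t uses the state-error covariance of the current iteration.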

B. Numerical Example

System Description: In this section, we apply the proposed learning algorithm to a discrete time-varying linear system and compare its performance with that of a similar algorithm whose gain is computed in a deterministic approach; the latter is referred to as the "generic" algorithm. The system considered is the discrete time-varying, single-input model

(23)

obtained with a small integration period. The reinitialization errors, the state disturbances, and the measurement noise are normally distributed white random processes with prescribed variances. All the disturbances are assumed to be unbiased except for the measurement noise, which has a nonzero mean. The desired trajectories, with domain a 1-second interval, are the trajectories associated with a chosen realizable input. The initial input is the same for the generic and proposed algorithms, and the initial input error covariance scalar (single input) P_0(t) is set to the same constant for all t. Next, the generic algorithm requires the control gain K to satisfy |1 - K C(t+1)B(t)| < 1. It is noted that if K is chosen such that |1 - K C(t+1)B(t)| is close to zero, then, in the iterative domain, a fast transient response is noticed but at the cost of larger "steady-state" errors.

Page 8: SAAB2001_A Discrete Time Stochastic Learning Algorithm


Fig. 1. State errors and input error at k = 100, for t ∈ [0, 100].

If, instead, K is chosen such that |1 - K C(t+1)B(t)| is close to one, then smaller steady-state errors are noticed at the cost of a very slow transient response. Consequently, the "generic" gain is chosen such that |1 - K C(t+1)B(t)| takes an intermediate value for all t.

Performance: In order to quantify the statistical and deterministic size of the input and state errors after a fixed number of iterations, we use three different measures. For statistical evaluation, we measure the variance and the mean-square error of each variable. For deterministic examination, the infinity norm is measured. Since the convergence rate as the number of iterations increases is also of importance, the accumulation of each of these measures (variance, mean square, and infinity norm) is also computed. The superiority of the proposed algorithm can be detected by examining both Table I and Fig. 1. In Fig. 1, the solid and dashed lines represent the employment of the proposed and generic gains, respectively. The top plot of Fig. 2 shows the evolution of the learning gain, whereas the bottom plot shows the confirmation of Proposition 1. In addition, an examination of Table I and Fig. 2 (top) shows the close conformity of the computed input error variance and the estimated one generated by the proposed algorithm; in particular, at k = 100 the two essentially coincide.

VI. CONCLUSION

This paper has formulated a computational algorithm for the learning gain matrix which stochastically minimizes, in a least-squares sense, trajectory errors in the presence of random disturbances. It is shown that if C(t+1)B(t) is full-column rank, then the algorithm automatically assures the contraction condition, that is, ||I - K_k(t)C(t+1)B(t)|| < 1, which ensures the suppression of repeatable errors and boundedness of trajectories in the presence of bounded disturbances. The proposed algorithm is shown to stochastically reject uncorrelated disturbances from the control end. Consequently, state trajectory tracking becomes stochastically insensitive to measurement noise. In particular, it is shown that, if the matrix C(t+1)B(t) is full-column rank, then the input error covariance matrix converges uniformly to zero in the presence of uncorrelated random state disturbance, reinitialization errors, and measurement noise. Consequently, the state error covariance matrix converges uniformly to zero in the presence of measurement noise. Moreover, it is shown that, if a certain condition is met, then knowledge of the state coupling matrix is not needed to apply the proposed stochastic algorithm. Application of this algorithm to a class of nonlinear systems is also examined. A numerical example is added to conform with the theoretical results.

APPENDIX A
MODEL UNCERTAINTY

Unfortunately, the determination of the proposed learning gain matrix, specified by (13), requires the update of the state error covariance matrix, which in turn requires full knowledge of the system model. This makes the application of the proposed learning control algorithm somewhat unattractive. Note that in many applications only the continuous-time statistical and deterministic models are available; furthermore, the discrete matrices A and B, and the system and measurement noise

Page 9: SAAB2001_A Discrete Time Stochastic Learning Algorithm


Fig. 2. Evolution of the learning gain function K_k(t), t ∈ [0, 100], for k = 1, 2, 5, and 10 (top); avg K_k and avg P_k versus k, 1 ≤ k ≤ 100 (bottom).

covariance matrices depend on the size of the sampling period (while discretizing). Suppose that a continuous-time linear time-invariant system is given by dx/dτ = A_c x(τ) + B_c u(τ) and y(τ) = C x(τ), where τ is the time argument in the continuous domain. In order to evaluate the discrete state matrix, we may use A = exp(A_c T_s) (with T_s the sampling period). For a small sampling period, one may approximate A ≈ I + A_c T_s. There are several other applications where C A = C and the sampling period does not have to be small, e.g., if the rows of the state matrix corresponding to the outputs are equal to the associated rows of the output coupling matrix C. For example, if C selects the second state and only the second row of A is equal to C, then C A = C. Therefore, the learning gain matrix is reduced to

K_k(t) = P_k(t) D^T(t) [D(t) P_k(t) D^T(t) + C(t+1) Q(t) C^T(t+1) + R(t+1)]^{-1}    (24)

Note that, by using the learning gain of (24) [instead of (13)], all the results in Sections IV and V still apply as presented. Consequently, the computer algorithm does not require the update of the state error covariance matrix; hence, knowledge of the state matrix is no longer needed. In the case where C A ≠ C, and in the absence of plant knowledge, a parameter identification scheme, similar to the one presented in [34], may be integrated with the proposed algorithm.
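Under the condition of this appendix, the gain computation reduces to quantities that do not involve A(t). The sketch below implements (24) as reconstructed above; the exact composition of the remaining noise term is my reading of the text and should be treated as an assumption.

```python
# The simplified gain (24): no state-error covariance, hence no A(t) needed.
import numpy as np

def simplified_gain(P_u, D, CQC, R):
    """K_k(t) = P_k D^T [D P_k D^T + C Q C^T + R]^{-1}, per (24) as read above."""
    return P_u @ D.T @ np.linalg.inv(D @ P_u @ D.T + CQC + R)
```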

APPENDIX B
APPLICATION TO NONLINEAR SYSTEMS

A. Application to a Class of Nonlinear Systems

Consider the state-space description for a class of nonlinear discrete time-varying systems given by the following difference equation:

x_k(t+1) = f(t, x_k(t)) + B(t) u_k(t) + w_k(t)
y_k(t) = C(t) x_k(t) + v_k(t)    (25)

where f(t, ·) is a vector-valued function with range in R^n. Define D(t) = C(t+1)B(t) and δf_k(t) = f(t, x_d(t)) - f(t, x_k(t)). We assume that the initial state x_d(0) and a desired trajectory y_d(t) are given and assumed to be realizable; that is, there exist a bounded state trajectory x_d(t) and a unique input u_d(t) such that x_d(t+1) = f(t, x_d(t)) + B(t) u_d(t) and y_d(t) = C(t) x_d(t).

The restrictions on the noise inputs w_k(t) and v_k(t) are as assumed in Section II. Again, we define the output error e_k(t) = y_d(t) - y_k(t), the covariances Q(t) and R(t), the input error covariance P_k(t), the state error vector x̃_k(t) = x_d(t) - x_k(t), and the input error vector

Page 10: SAAB2001_A Discrete Time Stochastic Learning Algorithm


ũ_k(t) = u_d(t) - u_k(t). Using (25) and the learning algorithm presented in (2), we obtain

ũ_{k+1}(t) = [I - K_k(t)D(t)] ũ_k(t) - K_k(t)C(t+1) δf_k(t) + K_k(t)C(t+1) w_k(t) + K_k(t) v_k(t+1)

Consequently, the input error covariance matrix becomes

P_{k+1}(t) = [I - K_k(t)D(t)] P_k(t) [I - K_k(t)D(t)]^T + K_k(t) S_k(t) K_k^T(t)

where S_k(t) = C(t+1)[Σ_k(t) + Q(t)]C^T(t+1) + R(t+1) with Σ_k(t) = E[δf_k(t) δf_k^T(t)], and since δf_k(t) is a function of x̃_k(t), then E[ũ_k(t) δf_k^T(t)] = 0. Borrowing the results from Section III, we find that the learning gain that minimizes the trace of P_{k+1}(t) is similar to the one presented by (13). In particular, with S_k(t) defined as above,

K_k(t) = P_k(t) D^T(t) [D(t) P_k(t) D^T(t) + S_k(t)]^{-1}    (26)

Therefore, borrowing some of the results presented in Sections IV and V, one can conclude the following.

Theorem 6: Consider the system presented in (25). If D(t) = C(t+1)B(t) is full-column rank and Σ_k(t) is bounded in k, then the learning algorithm, presented by (2), (26), and (16), guarantees that lim_{k→∞} trace P_k(t) = 0. In addition, P_k(t) → 0 and K_k(t) → 0 uniformly in t as k → ∞.
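For the nonlinear class (25), the only change relative to (13) is that the disturbance covariance S_k(t) absorbs the covariance of the nonlinear difference δf_k(t) = f(t, x_d(t)) - f(t, x_k(t)). The sketch below takes that covariance as a caller-supplied estimate (cov_df), which is an assumption of this illustration; obtaining it in practice would itself require a model or a bound.

```python
# Gain (26) for the nonlinear class (25), with an assumed estimate cov_df of
# E[delta-f_k(t) delta-f_k^T(t)].
import numpy as np

def nonlinear_gain(P_u, D, C, cov_df, Q, R):
    """K_k(t) = P_k D^T [D P_k D^T + C (cov_df + Q) C^T + R]^{-1}."""
    S = C @ (cov_df + Q) @ C.T + R
    return P_u @ D.T @ np.linalg.inv(D @ P_u @ D.T + S)
```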

A Lipschitz requirement on f is not necessary for this machinery: even without it, Lemma 2 applies and the input error covariance matrix is nonincreasing; in the case where Σ_k(t) is bounded in k, the results become similar to those for linear time-varying systems (Theorem 6). If the function f is uniformly globally Lipschitz in x for t ∈ [0, T], then there exists a positive constant c such that ||f(t, x_d(t)) - f(t, x_k(t))|| ≤ c ||x̃_k(t)||.

Theorem 7: Consider the system presented in (25). If D(t) = C(t+1)B(t) is full-column rank and f is uniformly globally Lipschitz in x for t ∈ [0, T], then the learning algorithm, presented by (2), (26), and (16), guarantees that lim_{k→∞} trace P_k(t) = 0. In addition, P_k(t) → 0 and K_k(t) → 0 uniformly in t as k → ∞.

Proof: The proof is the same as the corresponding part of the proof of Theorem 3, except for the nonsingularity and boundedness of D(t)P_k(t)D^T(t) + S_k(t). Since f is Lipschitz, there exists a positive constant c such that trace Σ_k(t) ≤ c^2 E||x̃_k(t)||^2. Borrowing the results given in [14, Th. 1], the infinity norms of the state errors are bounded. Therefore, Σ_k(t) is bounded for all k; consequently, D(t)P_k(t)D^T(t) + S_k(t) is nonsingular and bounded.

B. Nonlinear Deterministic Disturbance

In this section, a nonlinear repetitive state disturbance is added to the linear system as follows:

x_k(t+1) = A(t) x_k(t) + B(t) u_k(t) + f(t, x_k(t)) + w_k(t)
y_k(t) = C(t) x_k(t) + v_k(t)    (27)

where f(t, x_k(t)) may represent the repetitive state disturbance. It is assumed that the function f is uniformly globally Lipschitz in x for t ∈ [0, T]. Let δf_k(t) = f(t, x_d(t)) - f(t, x_k(t)); then δf_k is also uniformly globally Lipschitz in x̃ for t ∈ [0, T]. The error equation corresponding to (27) is given by

x̃_k(t+1) = A(t) x̃_k(t) + B(t) ũ_k(t) + δf_k(t) - w_k(t)    (28)

Substituting A(t) x̃_k(t) + δf_k(t) in place of the nonlinear difference term in the definition of the matrix S_k(t) of (26), the following results are concluded.

Theorem 8: If D(t) = C(t+1)B(t) is a full-column rank matrix, then, for any positive constant ρ < 1, the learning algorithm, presented by (2), (26), and (16), will generate a sequence of inputs such that the infinity norms of the input, output, and state errors given by (28) decrease exponentially for k sufficiently large, in t ∈ [0, T]. In addition, the input error covariance matrix is nonincreasing.

Proof: Define ρ_k(t) = ||I - K_k(t)D(t)||. Since 0 < λ_i[I - K_k(t)D(t)] < 1 for all k and t, there exists, in particular, a k_0 such that ρ_k(t) ≤ ρ < 1 for k ≥ k_0 and t ∈ [0, T]; thus, ||I - K_k(t)D(t)|| ≤ ρ < 1 for k ≥ k_0 and t ∈ [0, T].

Borrowing results of [14, Corollary 2], the infinity norms of the input, output, and state errors given by (28) decrease exponentially for k ≥ k_0, in t ∈ [0, T]. The results related to the input error covariance matrix can be concluded from the previous subsection.

REFERENCES

[1] Y.-H. Kim and I.-J. Ha, "A learning approach to precision speed control of servomotors and its application to a VCR," IEEE Trans. Contr. Syst. Technol., vol. 7, pp. 466–477, July 1999.

[2] M. Pandit and K.-H. Buchheit, "Optimizing iterative learning control of cyclic production processes with application to extruders," IEEE Trans. Contr. Syst. Technol., vol. 7, pp. 382–390, May 1999.

[3] S. S. Garimella and K. Srinivasan, "Application of iterative learning control to coil-to-coil control in rolling," IEEE Trans. Contr. Syst. Technol., vol. 6, pp. 281–293, Mar. 1998.

[4] S. Arimoto, S. Kawamura, and F. Miyazaki, "Bettering operation of robots by learning," J. Robot. Syst., vol. 1, pp. 123–140, 1984.

[5] S. Arimoto, "Learning control theory for robotic motion," Int. J. Adaptive Control Signal Processing, vol. 4, pp. 544–564, 1990.

[6] S. Arimoto, T. Naniwa, and H. Suzuki, "Robustness of P-type learning control theory with a forgetting factor for robotic motions," in Proc. 29th IEEE Conf. Decision Control, Honolulu, HI, Dec. 1990, pp. 2640–2645.

[7] S. Arimoto, S. Kawamura, and F. Miyazaki, "Convergence, stability, and robustness of learning control schemes for robot manipulators," in Int. Symp. Robot Manipulators: Modeling, Control, Education, Albuquerque, NM, 1986, pp. 307–316.

[8] G. Heinzinger, D. Fenwick, B. Paden, and F. Miyazaki, "Stability of learning control with disturbances and uncertain initial conditions," IEEE Trans. Automat. Contr., vol. 37, pp. 110–114, Jan. 1992.

[9] J. Hauser, "Learning control for a class of nonlinear systems," in Proc. 26th Conf. Decision Control, Los Angeles, CA, Dec. 1987, pp. 859–860.

[10] A. Hac, "Learning control in the presence of measurement noise," in Proc. American Control Conf., 1990, pp. 2846–2851.

Page 11: SAAB2001_A Discrete Time Stochastic Learning Algorithm


[11] S. S. Saab, "A discrete-time learning control algorithm for a class of linear time-invariant systems," IEEE Trans. Automat. Contr., vol. 40, pp. 1138–1142, June 1995.

[12] S. S. Saab, "On the P-type learning control," IEEE Trans. Automat. Contr., vol. 39, pp. 2298–2302, Nov. 1994.

[13] S. S. Saab, W. Vogt, and M. Mickle, "Learning control algorithms for tracking slowly varying trajectories," IEEE Trans. Syst., Man, Cybern., vol. 27, pp. 657–670, Aug. 1997.

[14] S. S. Saab, "Robustness and convergence rate of a discrete-time learning control algorithm for a class of nonlinear systems," Int. J. Robust Nonlinear Control, vol. 9, no. 9, July 1999.

[15] C.-J. Chien, "A discrete iterative learning control for a class of nonlinear time-varying systems," IEEE Trans. Automat. Contr., vol. 43, pp. 748–752, May 1998.

[16] Z. Bien and K. M. Huh, "Higher-order iterative learning control algorithm," in Proc. Inst. Elec. Eng., vol. 136, 1989, pp. 105–112.

[17] Y. Chen, C. Wen, Z. Gong, and M. Sun, "An iterative learning controller with initial state learning," IEEE Trans. Automat. Contr., vol. 44, pp. 371–376, Feb. 1999.

[18] Z. Geng, R. Carroll, and J. Xie, "Two-dimensional model and algorithm analysis for a class of iterative learning control systems," Int. J. Control, vol. 52, pp. 833–862, 1990.

[19] Z. Geng, M. Jamshidi, and R. Carroll, "An adaptive learning control approach," in Proc. 30th Conf. Decision Control, Dec. 1991, pp. 1221–1222.

[20] J. Kurek and M. Zaremba, "Iterative learning control synthesis based on 2-D system theory," IEEE Trans. Automat. Contr., vol. 38, pp. 121–125, Jan. 1993.

[21] T. Y. Kuc, J. S. Lee, and K. Nam, "An iterative learning control theory for a class of nonlinear dynamic systems," Automatica, vol. 28, no. 6, pp. 1215–1221, 1992.

[22] K. L. Moore, "Iterative learning control for deterministic systems," in Advances in Industrial Control. London, U.K.: Springer-Verlag, 1993.

[23] P. Bondi, G. Casalino, and L. Gambardella, "On the iterative learning control schemes for robot manipulators," IEEE J. Robot. Automat., vol. 4, pp. 14–22, Aug. 1988.

[24] D. Gorinevsky, "An approach to parametric nonlinear least square optimization and application to task-level learning control," IEEE Trans. Automat. Contr., vol. 42, pp. 912–927, July 1997.

[25] R. Longman and S. L. Wirkander, "Automated tuning concepts for iterative learning and repetitive control laws," in Proc. 37th Conf. Decision Control, Dec. 1998, pp. 192–198.

[26] J. Craig, "Adaptive control of manipulators through repeated trials," in Proc. Amer. Control Conf., San Diego, CA, June 1984, pp. 1566–1573.

[27] C. Atkeson and J. McIntyre, "Robot trajectory learning through practice," in Proc. IEEE Int. Conf. Robotics Automation, San Francisco, CA, 1986, pp. 1737–1742.

[28] P. Lucibello, "Output zeroing with internal stability by learning," Automatica, vol. 31, no. 11, pp. 1665–1672, 1995.

[29] P. Lucibello, "On the role of high-gain feedback in P-type learning control of robot arms," IEEE Trans. Robot. Automat., vol. 12, pp. 602–605, Aug. 1996.

[30] A. De Luca and S. Panzieri, "End-effector regulation of robots with elastic elements by an iterative scheme," Int. J. Adapt. Control Signal Processing, vol. 10, no. 4–5, pp. 379–393, 1996.

[31] R. Horowitz, W. Messner, and J. B. Moore, "Exponential convergence of a learning controller for robot manipulators," IEEE Trans. Automat. Contr., vol. 36, pp. 890–894, July 1991.

[32] N. Amann, D. Owens, E. Rogers, and A. Wahl, "An H-infinity approach to linear iterative learning control design," Int. J. Adaptive Control Signal Processing, vol. 10, no. 6, pp. 767–781, 1996.

[33] S. Hara, Y. Yamamoto, T. Omata, and M. Nakano, "Repetitive control systems: A new type servo system for periodic exogenous signals," IEEE Trans. Automat. Contr., vol. 33, pp. 659–667, July 1988.

[34] S. R. Oh, Z. Bien, and I. H. Suh, "An iterative learning control method with application for the robot manipulators," IEEE J. Robot. Automat., vol. 4, pp. 508–514, Oct. 1988.

[35] M. Togai and O. Yamano, "Analysis and design of an optimal learning control scheme for industrial robots: A discrete system approach," in Proc. 24th Conf. Decision Control, Ft. Lauderdale, FL, Dec. 1985, pp. 1399–1404.

[36] N. Sadegh and K. Guglielmo, "Design and implementation of adaptive and repetitive controllers for mechanical manipulators," IEEE Trans. Robot. Automat., vol. 8, pp. 395–400, June 1992.

[37] K. Furuta and M. Yamakita, "The design of a learning control system for multivariable systems," in Proc. IEEE Int. Symp. Intelligent Control, Philadelphia, PA, Jan. 1987, pp. 371–376.

[38] K. L. Moore, M. Dahleh, and S. P. Bhattacharyya, "Adaptive gain adjustment for a learning control method for robotics," in Proc. 1990 IEEE Int. Conf. Robotics Automation, Cincinnati, OH, May 1990, pp. 2095–2099.

[39] R. P. Roesser, "A discrete state-space model for linear image processing," IEEE Trans. Automat. Contr., vol. AC-20, pp. 1–10, Feb. 1975.

[40] F. L. Lewis, Applied Optimal Control and Estimation: Digital Design and Implementation. Upper Saddle River, NJ: Prentice-Hall, 1992.

Samer S. Saab (S'92–M'93–SM'98) received the B.S., M.S., and Ph.D. degrees in electrical engineering in 1988, 1989, and 1992, respectively, and the M.A. degree in applied mathematics in 1990, all from the University of Pittsburgh, Pittsburgh, PA.

He is an Assistant Professor of Electrical and Computer Engineering at the Lebanese American University, Byblos, Lebanon. Since 1993, he has been a Consulting Engineer at Union Switch and Signal, Pittsburgh, PA. From 1995 to 1996, he was a System Engineer at ABB Daimler-Benz Transportation, Inc., Pittsburgh, PA, where he was involved in the design of automatic train control and positioning systems. His research interests include iterative learning control, Kalman filtering, inertial navigation systems, nonlinear control, and the development of map-matching algorithms.