
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 22, NO. 12, DECEMBER 2011 2363

Application of IFT and SPSA to Servo System Control

Mircea-Bogdan Radac, Radu-Emil Precup, Senior Member, IEEE, Emil M. Petriu, Fellow, IEEE, and Stefan Preitl, Senior Member, IEEE

Abstract: This paper treats the application of two data-based model-free gradient-based stochastic optimization techniques, i.e., iterative feedback tuning (IFT) and simultaneous perturbation stochastic approximation (SPSA), to servo system control. The representative case of controlled processes modeled by second-order systems with an integral component is discussed. New IFT and SPSA algorithms are suggested to tune the parameters of the state feedback controllers with an integrator in the linear-quadratic-Gaussian (LQG) problem formulation. An implementation case study concerning the LQG-based design of an angular position controller for a direct current servo system laboratory equipment is included to highlight the pros and cons of IFT and SPSA from an application's point of view. The comparison of IFT and SPSA algorithms is focused on an insight into their implementation.

Index Terms: Iterative feedback tuning, performance indices, servo systems, simultaneous perturbation stochastic approximation, state feedback control.

I. INTRODUCTION

SERVO systems are important because they must exhibit very good control system (CS) performance in many applications [1]–[6]. A convenient control strategy for servo systems is represented by state feedback CSs, which are widely used due to the advantages offered by state-space mathematical modeling [7]–[9]. The CS performance is normally improved by optimization, i.e., by minimizing cost functions expressed as integral quadratic performance indices [10]–[17], which also provides a convenient way to deal with the degrees of freedom associated with the pole placement design of multi-input–multi-output (MIMO) systems.

An important feature of data-based control techniques is the use of additional information on the process. This information should be obtained from data collected from the real-world CSs through experiments that do not affect their normal operating regimes. This idea narrows the general gap between the theory and the practice of control design, especially if these informative experiments affect the CS behavior as little as possible.

Manuscript received January 14, 2011; revised October 22, 2011; accepted October 22, 2011. Date of publication November 10, 2011; date of current version December 13, 2011. This work was supported in part by a grant of the Romanian National Authority for Scientific Research, Consiliul National al Cercetarii Stiintifice-UEFISCDI, under Project PN-II-ID-PCE-2011-3-0109. The authors contributed equally to this paper.

M.-B. Radac, R.-E. Precup (corresponding author), and S. Preitl are with the Department of Automation and Applied Informatics, Politehnica University of Timisoara, Timisoara 300223, Romania (e-mail: [email protected]; [email protected]; [email protected]).

E. M. Petriu is with the School of Information Technology and Engineering, University of Ottawa, Ottawa, ON K1N 6N5, Canada (e-mail: [email protected]).

Digital Object Identifier 10.1109/TNN.2011.2173804

The most frequently used data-based control techniques are iterative feedback tuning (IFT), virtual reference feedback tuning (VRFT), correlation-based tuning, frequency domain tuning, iterative regression tuning, and simultaneous perturbation stochastic approximation (SPSA). Two of the representative techniques, viz., IFT and SPSA, are based on stochastic approximation (SA) results that are used in the general context of stochastic optimization. This is important because the stochastic effects should be considered when data-based control techniques are applied to real-world processes. IFT uses the Robbins–Monro SA together with an unbiased estimate of the gradient of the cost function (c.f.) obtained through experiments. SPSA starts from the Kiefer–Wolfowitz SA algorithm, where an estimate of the gradient of the c.f. is obtained via finite differences. IFT and SPSA were developed for slightly different purposes, i.e., IFT was developed within the area of CS design and SPSA was developed for more general-purpose optimization applications.

Drawing the complete connections of these data-based algorithms with all related disciplines of control engineering is a tremendous effort, and it is not the intended aim of this paper. The main aim of this paper is to reveal the applicability of two data-based model-free gradient-based stochastic optimization techniques, viz., IFT and SPSA, to servo system control in a linear-quadratic-Gaussian (LQG) formulation. A special emphasis is given to the insight learned from the application of IFT and SPSA algorithms to tune the parameters of state feedback controllers with an integrator for the representative case of controlled processes modeled by second-order systems with an integral component.

IFT offers a direct data-based offline-adaptive controller tuning approach. IFT performs a gradient-based minimization of the c.f., and it provides an efficient way to deal with some of the specific problems of nonlinear or ill-defined processes. The c.f. minimization algorithm uses data obtained from the real-time experiments conducted with the real-world CS.

A good overview of the standard IFT is given in [18]. The extension of IFT according to [19] provides additional steps to improve the convergence properties of IFT while rejecting the disturbances. The input–output signals of the process are employed in [20] to identify a linear time-varying model of the process which is further used in IFT. IFT applications to industrial control problems are reported in the literature, for example, for the control of chemical processes [21] and for servo drive control [22], [23]. Discussions of the IFT approach to nonlinear process control are given in [24]–[26].

1045–9227/$26.00 © 2011 IEEE


SPSA was introduced in [27] and [28] as an efficient alternative to the finite difference stochastic approximation algorithm, in which the number of evaluations of the c.f. per iteration is equal to the number of variables of the c.f., viz. the number of tuning parameters in the case of optimal control. SPSA uses only two c.f. evaluations per iteration. This reduces the costs, which is an advantage when the measurements associated with the evaluations are conducted on real-world processes.

Many attractive applications of SPSA algorithms are reported in the literature in relation to parameter estimation of neural networks [29], [30], drive systems [31], model predictive control [32], intelligent control [33], neural network-based fault detection and isolation [34], filter design [35], and motion planning for mobile robots [36]. The reduction of the number of evaluations of the c.f. per iteration to only one is suggested in [37].

IFT and SPSA can be viewed in the general framework of optimal control strategies for nonlinear systems with uncertainties, where the popular approaches include adaptive dynamic programming (ADP), neuro-dynamic programming, or adaptive critics. A mathematical formulation of ADP derived from reinforcement learning is discussed in [38]. The neuro-fuzzy ADP with eligibility traces is proposed in [39] to deal with the problems of multiple ramps metering. The metro line train control application presented in [40] shows that neuro-dynamic programming with recurrent critic is able to find a near-optimal solution more rapidly and accurately than that with forward critic design. An approximate dynamic programming approach implemented with an adaptive critic neural network structure to solve optimal control problems is analyzed in [41]. An ADP approach based on genetic algorithms for global searching and fast learning speed is suggested in [42]. A neuro-dynamic programming approach to refine a heuristic solution of a logistic control node problem is presented in [43]. An iterative ADP algorithm that obtains the optimal control law which makes the performance index function close to the greatest lower bound of all performance indices within a certain error bound is given in [44] and applied to solve the finite-horizon optimal control problem for discrete-time nonlinear systems.

The new contributions of this paper are:

1) the application of IFT and SPSA to the LQG problem formulation for state feedback controllers with an integrator using the process plus fixed state estimator;

2) new IFT and SPSA algorithms based on an original setup to conduct the experiments in order to calculate the gradients of the c.f.s. Our new state-space formulation is attractive as IFT has been applied in a similar setting with state feedback plus estimator in [45] but for a transfer function formulation;

3) the comparison of IFT and SPSA by simulation and experimental results that correspond to the angular position of a DC servo system laboratory equipment.

The internal stability of the state feedback CS throughout the tuning process is checked after each iteration in the normal experiment, since it is a requirement to ensure the convergence of the algorithms. If the current set of parameters does not stabilize the closed-loop system, we go back to the previous set.

However, it is not the purpose of this paper to focus on the tools that are already available for automated stability check. One such tool is based on the calculation of the generalized stability margin and of the Vinnicombe metric (ν-gap) between the current and the next controller [46]. On the other hand, when small enough steps are made in the gradient direction, it is safe to experiment on the closed loop since it will remain stable. In order to avoid non-robust stability, IFT or SPSA should not be used to minimize c.f.s where only the output is weighted and the only driving input is the noise, without a reference input. This leads to a minimum variance controller, which is non-robust.

This paper is structured as follows. The LQG servo controller problem is discussed in Section II. Sections III and IV focus on IFT and SPSA and on the formulation of the new algorithms to solve the LQG servo controller problem. Section V is dedicated to the implementation of our IFT and SPSA algorithms in a case study. Digital simulation and experimental results concerning the optimal state-space control of the angular position of a DC servo system laboratory equipment are included. A discussion on IFT and SPSA is carried out in Section VI. The conclusions are drawn in Section VII.

II. PROBLEM SETTING

Let the controlled process as part of servo systems be characterized by the following discrete time linear time-invariant (LTI) single input–single output (SISO) state-space model:

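$$
\begin{aligned}
x_1(k+1) &= x_1(k) + T_\Sigma\left[1 - \exp\left(-\frac{T_s}{T_\Sigma}\right)\right]x_2(k) + k_P\left[T_s + T_\Sigma \exp\left(-\frac{T_s}{T_\Sigma}\right) - T_\Sigma\right]u(k) \\
x_2(k+1) &= \left[\exp\left(-\frac{T_s}{T_\Sigma}\right)\right]x_2(k) + k_P\left[1 - \exp\left(-\frac{T_s}{T_\Sigma}\right)\right]u(k) \\
y(k) &= x_1(k)
\end{aligned}
\tag{1}
$$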

obtained through discretization from a typical servo system transfer function

$$
P(s) = \frac{k_P}{s(1 + T_\Sigma s)} \tag{2}
$$

where $u$ is the control signal, $x_1$ is the angular position and also the controlled output $y$, $x_2$ is the angular velocity, $T_s$ is the sampling period, $k$, $k \in \mathbb{N}$, is the discrete time argument, and $k_P$ and $T_\Sigma$ are the process gain and time constant, respectively.
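As an illustration of how the coefficients of (1) follow from the parameters of (2), the following minimal sketch computes the matrices of the discrete time model. The function name `discretize_servo` and the numerical values of the gain, time constant, and sampling period are assumptions chosen only for the example and are not taken from the case study.

```python
import math


def discretize_servo(k_p, t_sigma, t_s):
    """Return (A, B, C) of model (1) for P(s) = k_p / (s (1 + t_sigma s)).

    k_p is the process gain, t_sigma the time constant and t_s the
    sampling period; all names are illustrative.
    """
    decay = math.exp(-t_s / t_sigma)
    # State matrix: x1 is the angular position, x2 the angular velocity.
    A = [[1.0, t_sigma * (1.0 - decay)],
         [0.0, decay]]
    # Input vector, taken term by term from (1).
    B = [k_p * (t_s + t_sigma * decay - t_sigma),
         k_p * (1.0 - decay)]
    # Output row: y(k) = x1(k).
    C = [1.0, 0.0]
    return A, B, C


# Hypothetical parameter values, used only to exercise the function.
A, B, C = discretize_servo(k_p=2.0, t_sigma=0.5, t_s=0.1)
```

The first row of `A` keeps the unit entry that reflects the integral component of the process, while the second row contains only the exponential decay of the velocity.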

The LQG problem is formulated for processes in a stochastic framework that are described by the more general state-space model

$$
\begin{aligned}
x(k+1) &= Ax(k) + Bu(k) + \bar{B}w(k) \\
y(k) &= Cx(k) + Du(k) + \bar{D}w(k) + v(k)
\end{aligned}
\tag{3}
$$

where $u$ is the control signal, $x = [\,x_1 \;\; x_2\,]^T \in \mathbb{R}^2$ is the state vector, $A \in \mathbb{R}^{2\times 2}$, $B \in \mathbb{R}^{2\times 1}$, $\bar{B} \in \mathbb{R}^{2\times 2}$, $C \in \mathbb{R}^{1\times 2}$, and $\bar{D} \in \mathbb{R}^{1\times 2}$ are constant matrices, $D = \text{const} \in \mathbb{R}$, and $w \in \mathbb{R}^2$ and $v \in \mathbb{R}$ are the uncorrelated process state noise vector and measurement noise, respectively, that include the normal independent identically distributed random variables with zero means and the variances $\sigma_w^2$ and $\sigma_v^2$, respectively. Zero initial conditions are assumed throughout this paper for the process dynamics without loss of generality. It is accepted that the controlled output is measurable and the process is controllable and observable. For our strictly causal second-order positioning system modeled in (1), $D = 0$ and $\bar{D} = 0$ are considered.

The LQG problem deals with c.f.s expressed as

$$
I(K, L) = E\left\{\sum_{k=0}^{\infty}\left[x^T(K(k), L(k))\,Q\,x(K(k), L(k)) + \lambda u^2(K(k), L(k))\right]\right\} \tag{4}
$$

subject to the process dynamics (3), where the expectation $E\{\cdot\}$ is taken with respect to the stochastic disturbances $w$ and $v$, the superscript $T$ indicates the matrix transposition, and $Q$ and $\lambda$ are the weights

$$
Q \ge 0,\quad Q = [q_{ij}]_{i,j=1,\dots,n},\quad q_{ij} = q_{ji},\quad i, j = 1, 2,\quad \lambda > 0. \tag{5}
$$

It is well known that the LQG problem consists of two problems that can be solved independently, the optimal control problem and the optimal estimation problem. $K(k) \in \mathbb{R}^{1\times 2}$ in (4) is the time-varying state feedback gain matrix in the state feedback optimal control law

$$
u(k) = -K(k)x(k) \tag{6}
$$

and $L(k) \in \mathbb{R}^{2\times 1}$ is the optimal estimation gain of the Kalman filter which is employed to estimate the state variables of the process. The optimal control sub-problem is in fact the linear-quadratic regulator (LQR) problem. The corresponding steady-state solutions are often used in practice.

In order to design the optimal filter (i.e., the Kalman filter), the noise magnitudes have to be supplied, that is, the covariance matrices of the noises $Q_N$ and $R_N$

$$
Q_N = E\{w(k)w^T(k)\} \in \mathbb{R}^{2\times 2},\quad R_N = E\{v^2(k)\} \in \mathbb{R} \tag{7}
$$

and the cross-covariance matrix $N_N$ as well

$$
N_N = E\{w(k)v(k)\} \in \mathbb{R}^{2\times 1}. \tag{8}
$$

The stochastic properties of the measurement noise acting on the output can be calculated relatively easily. However, the properties of the state noise are more difficult to estimate, and $w(k)$ is usually considered to be white noise in order to account for a large class of possible disturbances and model uncertainties, but also to simplify the optimal estimation solution.

The resulting state estimate $\hat{x}(k)$ minimizes the steady-state error covariance $P$

$$
P = \lim_{k\to\infty} E\left\{(x(k) - \hat{x}(k))(x(k) - \hat{x}(k))^T\right\}. \tag{9}
$$

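The discrete time steady-state Kalman filter equations are

$$
\begin{aligned}
\hat{x}(k+1|k) &= A\hat{x}(k|k-1) + Bu(k) + L\left(y(k) - C\hat{x}(k|k-1)\right) \\
\begin{bmatrix} \hat{y}(k|k) \\ \hat{x}(k|k) \end{bmatrix} &= \begin{bmatrix} C(I - MC) \\ I - MC \end{bmatrix}\hat{x}(k|k-1) + \begin{bmatrix} CM \\ M \end{bmatrix} y(k)
\end{aligned}
\tag{10}
$$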

Fig. 1. State feedback control system structure with an integrator to ensure zero steady-state control error.

where the first equation in (10) is the time update, the second set of equations is the measurement update, $L$ is the estimation gain, and $M$ is the innovation gain. The notations $\hat{x}(k|k)$ and $\hat{x}(k|k-1)$ denote the estimate of the state vector at time $k$, given measurements up to time $k$ and up to time $k-1$, respectively. The state vector $\hat{x}(k|k)$ is used for feedback in the optimal control law of type (6) as if it were the true state vector.
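The two updates in (10) can be sketched for the second-order process with scalar output as follows. This is a minimal illustration, not the implementation used in the case study: the function name `kalman_step` is hypothetical, and the gains `L` and `M` below are arbitrary placeholder values rather than optimal Kalman gains.

```python
def kalman_step(A, B, C, L, M, x_pred, u, y):
    """One step of the steady-state filter (10) for a scalar output.

    x_pred is the one-step prediction x(k|k-1). Returns the filtered
    estimate x(k|k) and the next prediction x(k+1|k).
    """
    # Innovation y(k) - C x(k|k-1).
    innovation = y - sum(c * x for c, x in zip(C, x_pred))
    # Measurement update: (I - MC) x(k|k-1) + M y(k).
    x_filt = [x + m * innovation for x, m in zip(x_pred, M)]
    # Time update: A x(k|k-1) + B u(k) + L (y(k) - C x(k|k-1)).
    x_next = [
        sum(a * x for a, x in zip(row, x_pred)) + b * u + l * innovation
        for row, b, l in zip(A, B, L)
    ]
    return x_filt, x_next


# Hypothetical model matrices and placeholder gains.
A = [[1.0, 0.09], [0.0, 0.8]]
B = [0.02, 0.4]
C = [1.0, 0.0]
L = [0.5, 0.1]
M = [0.4, 0.05]
x_filt, x_next = kalman_step(A, B, C, L, M, x_pred=[0.0, 0.0], u=0.0, y=1.0)
```

With a zero prediction and zero control signal, the filtered estimate reduces to `M` times the measurement and the next prediction to `L` times the measurement, which makes the roles of the two gains easy to check.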

The LQG problem allows an alternative (optimal) design for state feedback controllers with an integrator according to the CS structure presented in Fig. 1 that can be used in servo system control. Another controller design solution is the popular pole placement technique applied to both the state observer and the state feedback control law.

The advantages of this CS structure can be weighed taking into account the fact that the integrator in our process makes it difficult to alleviate the overshoot in the dynamic response for one-degree-of-freedom or two-degree-of-freedom CSs when there is also an integrator in the controller and when a step reference input is considered. An integrator is required in the controller to ensure zero steady-state control error and the rejection of input and/or output load disturbances. The process integrator can ensure only the regulation behavior (unit static gain from reference to output). The proposed state feedback CS can eliminate one integrator in the open-loop system, thus leaving the regulation problem and the load disturbance rejection to the integrator. The use of the LQG problem to design the state feedback gains and the integrator gain in Fig. 1 will be referred to in the following as the LQG servo controller problem. In the model-based paradigm the LQG design is conducted in an iterative manner, in which proper values for the weights $Q$ and $\lambda$ are tested in order to meet the design specifications.

Several situations that can occur in practice are:

1) the discrepancy between the process model and the reality is large enough to compromise the design specifications expressed as imposed performance requirements;

2) the process changes in time and the CS needs retuning to meet the design specifications;

3) a change in the design specifications is required or needed.

For these situations the solution is retuning, and in this paper we study the opportunity to implement two techniques that do not require a process model in the tuning procedure, namely IFT and SPSA. The purpose is to adapt these data-based techniques to the LQG optimization problem formulation.

In practical situations it is desired to drive the state vector to a desired point in the state space, so reference inputs have to be introduced for reference tracking. In addition, zero steady-state control error is targeted, hence an integrator is used as shown in Fig. 1, where an additional state variable (viz., the integrator state variable) $x_I$ is added to the dynamics defined in (3).

The dynamics of the integrator is expressed as follows using Fig. 1 and (3):

$$
\begin{aligned}
x_I(k+1) &= x_I(k) + e(k+1) = x_I(k) + r(k+1) - y(k+1) \\
&= x_I(k) + r(k+1) - C\left(Ax(k) + Bu(k) + \bar{B}w(k)\right) - v(k+1)
\end{aligned}
\tag{11}
$$

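where $r$ is the reference input and $e$ is the control error.

The dynamics of the state feedback CS results from the combination of (3) and (11) and of the optimal control law obtained with LQR

$$
\begin{aligned}
x_a(k+1) &= Gx_a(k) + Hu(k) + \begin{bmatrix} 0 \\ 1 \end{bmatrix} r(k+1) + \begin{bmatrix} \bar{B} \\ -C \end{bmatrix} w(k) + \begin{bmatrix} 0 \\ -1 \end{bmatrix} v(k+1), \\
x_a(0) &= [\,0 \;\dots\; 0\,]^T \in \mathbb{R}^{(2+1)\times 1}, \\
u(k) &= -K_a x_a(k)
\end{aligned}
\tag{12}
$$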

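where the matrices are

$$
x_a(k) = \begin{bmatrix} x(k) \\ x_I(k) \end{bmatrix},\quad G = \begin{bmatrix} A & 0 \\ -CA & 1 \end{bmatrix},\quad H = \begin{bmatrix} B \\ -CB \end{bmatrix},\quad K_a = [\,K \;\; K_I\,] \tag{13}
$$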

and the subscript $a$ stands for the augmentation of the state vector and state feedback gain matrix.
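The construction of the augmented matrices in (13) can be sketched as below for a single-input, single-output process. The helper name `augment_with_integrator` and the numerical matrices are assumptions made for the illustration only.

```python
def augment_with_integrator(A, B, C):
    """Build G and H of (13) from A (n x n), B (length n) and C (length n)."""
    n = len(A)
    # Row vector C A and scalar C B.
    CA = [sum(C[i] * A[i][j] for i in range(n)) for j in range(n)]
    CB = sum(C[i] * B[i] for i in range(n))
    # G = [[A, 0], [-CA, 1]]: the last row is the integrator dynamics.
    G = [row[:] + [0.0] for row in A] + [[-v for v in CA] + [1.0]]
    # H = [B; -CB].
    H = list(B) + [-CB]
    return G, H


# Hypothetical second-order model with y = x1.
A = [[1.0, 0.09], [0.0, 0.8]]
B = [0.02, 0.4]
C = [1.0, 0.0]
G, H = augment_with_integrator(A, B, C)
```

The last row of `G` and the last entry of `H` reproduce the terms $-CAx(k)$ and $-CBu(k)$ of the integrator dynamics (11).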

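The steady-state analysis of (12) for the step reference input of magnitude $r(\infty)$ that respects

$$
r(1) = \dots = r(k+1) = r(\infty) = \text{const} \tag{14}
$$

leads to

$$
\begin{aligned}
x_a(\infty) &= Gx_a(\infty) + Hu(\infty) + \begin{bmatrix} 0 \\ 1 \end{bmatrix} r(\infty) + \begin{bmatrix} I_n \\ -C \end{bmatrix} w(\infty) + \begin{bmatrix} 0 \\ -1 \end{bmatrix} v(\infty) \\
u(\infty) &= -K_a x_a(\infty)
\end{aligned}
\tag{15}
$$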

where the argument $\infty$ associated with a variable points out the steady-state value of that variable. The stochastic character with respect to $w$ and $v$ is preserved.

We next define the state error vector $\varepsilon(k)$ (which consists of the state errors) with respect to the steady-state value $x_a(\infty)$ and the control signal error $u_\varepsilon(k)$ with respect to the steady-state value $u(\infty)$

$$
\varepsilon(k) = x_a(k) - x_a(\infty),\quad u_\varepsilon(k) = u(k) - u(\infty). \tag{16}
$$

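The subtraction of (15) from (12) and the use of (16) lead to the following error dynamics:

$$
\begin{aligned}
\varepsilon(k+1) &= G\varepsilon(k) + Hu_\varepsilon(k) + \begin{bmatrix} I_n \\ -C \end{bmatrix} w(k) + \begin{bmatrix} 0 \\ -1 \end{bmatrix} v(k+1), \\
\varepsilon(0) &\ne [\,0 \;\dots\; 0\,]^T \in \mathbb{R}^{(2+1)\times 1}, \\
u_\varepsilon(k) &= -K_a\varepsilon(k).
\end{aligned}
\tag{17}
$$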

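The LQG servo controller problem for this dynamical system can be formulated as the minimization of the c.f.

$$
I_\varepsilon(K_a, L) = E\left\{\sum_{k=0}^{\infty}\left[\varepsilon^T(K_a, L, k)\,Q\,\varepsilon(K_a, L, k) + \lambda u_\varepsilon^2(K_a, L, k)\right]\right\} \tag{18}
$$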

where the weights $Q$ and $\lambda$ are defined similarly to the ones in (5) and with appropriate dimensions. The solution is expressed as the optimal estimator gain $L$ and the optimal state feedback gain matrix $K_a$.

With a view to applying data-based optimization to the aforementioned problems, one has to be able to evaluate the c.f.s for finite time horizons using the estimated states when measurements are not available. A suitable c.f. used in this context is

$$
J(K_a, L) = E\left\{\sum_{k=0}^{N}\left[\hat{\varepsilon}^T(K_a, L, k)\,Q\,\hat{\varepsilon}(K_a, L, k) + \lambda u_\varepsilon^2(K_a, L, k)\right]\right\} \tag{19}
$$

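where we use the state estimates to define their state errors [which belong to the estimated state error vector $\hat{\varepsilon}(k)$], except for the integrator state which does not need to be estimated. The c.f. defined in (19) is not the original LQG criterion (18) but rather an approximation based on the estimated state variables. If the c.f. defined in (19) is minimized by data-based optimization and the Kalman filter is already designed such that the filter gain $L$ is fixed, then the system dynamics is augmented once more, this time with the filter dynamics. Equations (10) and (12) are expressed as the following fifth-order system:

$$
\begin{aligned}
\begin{bmatrix} x(k+1) \\ \hat{x}(k+1|k) \\ x_I(k+1) \end{bmatrix} &= \begin{bmatrix} A & 0 & 0 \\ LC & A - LC & 0 \\ -CA & 0 & 1 \end{bmatrix} \begin{bmatrix} x(k) \\ \hat{x}(k|k-1) \\ x_I(k) \end{bmatrix} + \begin{bmatrix} B \\ B \\ -CB \end{bmatrix} u(k) + \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} r(k+1) \\
&\quad + \begin{bmatrix} \bar{B} \\ 0 \\ -C\bar{B} \end{bmatrix} w(k) + \begin{bmatrix} 0 & 0 \\ L & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} v(k) \\ v(k+1) \end{bmatrix}, \\
\hat{x}(k|k) &= [\,MC \;\; I - MC \;\; 0\,] \begin{bmatrix} x(k) \\ \hat{x}(k|k-1) \\ x_I(k) \end{bmatrix} + Mv(k), \\
u(k) &= -[\,0^T \;\; K \;\; K_I\,] \begin{bmatrix} x(k) \\ \hat{x}(k|k) \\ x_I(k) \end{bmatrix}.
\end{aligned}
\tag{20}
$$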

The following notation is introduced to highlight the parameterization of the optimization problem to be solved by IFT and SPSA:

$$
\rho = (K_a)^T \in \mathbb{R}^{(2+1)\times 1}. \tag{21}
$$


For c.f.s defined in accordance with (19), the use of the argument vector defined in (21) leads to the new expression

$$
J(\rho) = E\left\{\sum_{k=0}^{N}\left[\hat{\varepsilon}^T(\rho, L, k)\,Q\,\hat{\varepsilon}(\rho, L, k) + \lambda u_\varepsilon^2(\rho, L, k)\right]\right\}. \tag{22}
$$

The IFT and the SPSA algorithms will be employed in the next sections to find a solution $\rho^*$ to the optimization problem

$$
\rho^* = \arg\min_{\rho \in SD} J(\rho) \tag{23}
$$

where $SD$ stands for the stability domain of all state feedback gain matrices that ensure a stable CS.
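In a data-based setting, the c.f. in (22) is evaluated from recorded signals. The sketch below computes the finite-horizon sum for one experiment, which is a single-realization approximation of the expectation in (22). The function name `finite_horizon_cost`, the weights, and the two-sample trajectory are hypothetical and serve only as an example.

```python
def finite_horizon_cost(errors, controls, Q, lam):
    """Sum over k of eps(k)^T Q eps(k) + lam * u_eps(k)^2 for one experiment.

    errors is a list of state error vectors, controls the matching list of
    control signal errors. The expectation in (22) is replaced by this
    single realization.
    """
    total = 0.0
    for eps, u in zip(errors, controls):
        n = len(eps)
        # Quadratic form eps^T Q eps.
        quad = sum(eps[i] * Q[i][j] * eps[j]
                   for i in range(n) for j in range(n))
        total += quad + lam * u * u
    return total


# Hypothetical weights and a two-sample recorded trajectory.
Q = [[1.0, 0.0, 0.0],
     [0.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
errors = [[1.0, 0.0, 0.5], [0.5, 0.0, 0.25]]
controls = [2.0, 1.0]
J = finite_horizon_cost(errors, controls, Q, lam=0.1)
```

Averaging the returned value over several experiments would give a closer approximation of the expectation, at the price of more experiments on the process.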

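In order to solve the optimization problem defined in (23), a parameter vector $\rho$ has to be found such that

$$
\frac{\partial J}{\partial \rho} = \left[\frac{\partial J}{\partial \rho_1} \;\cdots\; \frac{\partial J}{\partial \rho_3}\right]^T = [\,0 \;\cdots\; 0\,]^T \tag{24}
$$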

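which, for the c.f. $J$ defined in (22), becomes

$$
\frac{\partial J}{\partial \rho_l} = 2\sum_{k=0}^{N}\left\{\left[\sum_{\substack{i,j=1 \\ i \ge j}}^{n}\left(q_{ij}\,\hat{\varepsilon}_i\,\frac{\partial \hat{\varepsilon}_j}{\partial \rho_l}\right)\right] + \lambda u_\varepsilon \frac{\partial u_\varepsilon}{\partial \rho_l}\right\} = 0,\quad l = 1, 2, 3. \tag{25}
$$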

The cases of constrained optimization problems use Karush–Kuhn–Tucker optimality conditions instead of the null gradient given by (24). These constraints account for technological and/or economical conditions related to the operation of the real-world processes [47]–[52].

III. IFT ALGORITHM

IFT is a gradient-based SA technique meant to find the minimum of a c.f. that can only be known through noisy measurements. It was developed to cope with LQG-like performance criteria in a variety of problems such as combinations of reference model tracking, control effort penalty, noise rejection, and optimal tracking.

The partial derivatives $\partial\hat{\varepsilon}_i/\partial\rho_l$ and $\partial u_\varepsilon/\partial\rho_l$ need to be calculated first in order to obtain the derivatives $\partial J/\partial\rho_l$, $l = 1, 2, 3$, in the gradient of the c.f. What can be obtained, however, are estimates of these derivatives, $\text{est}[\partial J/\partial\rho_l]$, $l = 1, 2, 3$, by obtaining estimates of the derivatives involved in the right side of (25). Once this gradient estimate is calculated, the minimum of the c.f. can be approached through iterative steps in the gradient direction

$$
\rho^{i+1} = \rho^i - \gamma^i (R^i)^{-1}\,\text{est}\left[\frac{\partial J}{\partial \rho}(\rho^i)\right],\quad R^i > 0 \tag{26}
$$

where the superscript $i$, $i \in \mathbb{N}$, is the current iteration/experiment index, $\gamma^i$, $\gamma^i > 0$, is the step size, $\text{est}[(\partial J/\partial\rho)(\rho^i)]$ is the unbiased estimate of the gradient, and the regular matrix $R^i$ can be the estimate of the Hessian matrix, the Gauss–Newton approximation of the Hessian, or the identity matrix in the case of less demanding and slower convergent computations.

The step size sequence $\{\gamma^i\}_{i\in\mathbb{N}}$ should evolve in time such that it satisfies some bounds. In this regard, the conditions to ensure the convergence of the stochastic algorithm are [18], [19]

$$
\sum_{i=0}^{\infty} \gamma^i = \infty,\quad \sum_{i=0}^{\infty} (\gamma^i)^2 < \infty. \tag{27}
$$

A good choice of the step size sequence that ensures the divergence of the first series in (27) and also the convergence of the second series in (27) is

$$
\gamma^i = \frac{\gamma^0}{i^\alpha},\quad i \in \mathbb{N},\; i \ge 1,\; 0.5 < \alpha \le 1 \tag{28}
$$

where the initial step size $\gamma^0$, $\gamma^0 > 0$, is set such that it ensures a compromise between the numerical stability and the convergence speed.
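The step size sequence (28) and one iteration of the update law (26) can be sketched as follows. The sketch assumes the simplest admissible choice, namely the identity matrix for $R^i$, and uses hypothetical function names and numerical values. The gradient estimate is passed in as data because its calculation requires the gradient experiments.

```python
def ift_step_size(i, gamma0, alpha):
    """Step size (28): gamma0 / i**alpha for i >= 1 and 0.5 < alpha <= 1."""
    if i < 1 or not 0.5 < alpha <= 1.0:
        raise ValueError("require i >= 1 and 0.5 < alpha <= 1")
    return gamma0 / i ** alpha


def ift_update(rho, grad_est, gamma):
    """Update law (26) with R taken as the identity matrix (an assumption)."""
    return [r - gamma * g for r, g in zip(rho, grad_est)]


# Hypothetical current parameters and gradient estimate.
rho = [1.0, 2.0, 3.0]
grad_est = [0.5, -1.0, 0.0]
rho_next = ift_update(rho, grad_est, ift_step_size(1, gamma0=0.1, alpha=1.0))
```

With a Gauss–Newton or Hessian estimate for $R^i$, the product `gamma * g` would be replaced by the solution of a linear system with matrix $R^i$, which is left out here for brevity.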

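A biased estimate of the Hessian matrix can be employed in the update law (26) as the Gauss–Newton approximation

$$
R^i = \sum_{k=1}^{N}\left\{\text{est}\left[\frac{\partial \hat{\varepsilon}}{\partial \rho}(\rho^i)\right]^T Q\;\text{est}\left[\frac{\partial \hat{\varepsilon}}{\partial \rho}(\rho^i)\right] + \lambda\,\text{est}\left[\frac{\partial u_\varepsilon}{\partial \rho}(\rho^i)\right]^T \text{est}\left[\frac{\partial u_\varepsilon}{\partial \rho}(\rho^i)\right]\right\} \tag{29}
$$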

where the estimates of the gradients are used when the stochastic environment is accepted. An example of an unbiased estimator is given in [53].

In order to apply IFT to the c.f. defined in (22) using the dynamics defined in (20) with fixed $L$, the derivatives of the estimated state error vectors $\hat{\varepsilon}(K_a, k)$ have to be calculated. The definition of these errors is

$$
\hat{\varepsilon}(K_a, k) = \begin{bmatrix} \hat{x}(K_a, k|k) - \hat{x}(K_a, \infty|\infty) \\ x_I(K_a, k) - x_I(K_a, \infty) \end{bmatrix} \tag{30}
$$

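and their derivatives with respect to one parameter $K_l$, $l = 1, 2, 3$, in the matrix $K_a = [\,K_1 \;\; K_2 \;\; K_3 = K_I\,]$ are

$$
\frac{\partial \hat{\varepsilon}(K_a, k)}{\partial K_l} = \begin{bmatrix} \dfrac{\partial \hat{x}(K_a, k|k)}{\partial K_l} - \dfrac{\partial \hat{x}(K_a, \infty|\infty)}{\partial K_l} \\[2ex] \dfrac{\partial x_I(K_a, k)}{\partial K_l} - \dfrac{\partial x_I(K_a, \infty)}{\partial K_l} \end{bmatrix}. \tag{31}
$$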

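Since the partial derivatives of the state estimates are needed together with the derivative of the integrator state variable, and taking into account that the derivatives of $r$, $w$, and $v$ with respect to the parameter $K_l$ are zero, the differentiation of (20) with respect to $K_l$ leads to

$$
\begin{aligned}
\frac{\partial}{\partial K_l}\left(\begin{bmatrix} x(k+1) \\ \hat{x}(k+1|k) \\ x_I(k+1) \end{bmatrix}\right) &= \begin{bmatrix} A & 0 & 0 \\ LC & A - LC & 0 \\ -CA & 0 & 1 \end{bmatrix} \frac{\partial}{\partial K_l}\left(\begin{bmatrix} x(k) \\ \hat{x}(k|k-1) \\ x_I(k) \end{bmatrix}\right) + \begin{bmatrix} B \\ B \\ -CB \end{bmatrix} \frac{\partial}{\partial K_l}u(k), \\
\frac{\partial}{\partial K_l}\hat{x}(k|k) &= [\,MC \;\; I - MC \;\; 0\,]\, \frac{\partial}{\partial K_l}\left(\begin{bmatrix} x(k) \\ \hat{x}(k|k-1) \\ x_I(k) \end{bmatrix}\right), \\
\frac{\partial}{\partial K_l}u(k) &= -\frac{\partial}{\partial K_l}\left([\,0^T \;\; K \;\; K_I\,]\right) \begin{bmatrix} x(k) \\ \hat{x}(k|k) \\ x_I(k) \end{bmatrix} - [\,0^T \;\; K \;\; K_I\,]\, \frac{\partial}{\partial K_l}\left(\begin{bmatrix} x(k) \\ \hat{x}(k|k) \\ x_I(k) \end{bmatrix}\right).
\end{aligned}
\tag{32}
$$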

Equations (32) represent the deterministic dynamics (since the derivatives of the noise inputs with respect to the parameters are zero) with the state feedback gain, with zero reference input, and with an additive artificial disturbance fed to the control signal $u(k)$. The derivative of the gain matrix in the last equation in (32), calculated with respect to one of its parameters, is a gain matrix with 1 on the position of the respective parameter and 0 otherwise. Therefore, by injecting the recorded state of a normal experiment (obtained with a reference input different from zero) into the state feedback scheme with zero reference input, we obtain the derivatives of the state variables in (31) that are needed in order to evaluate the gradient of the c.f.

If $l$ as a superscript denotes the $l$-th gradient experiment and as a subscript the $l$-th state variable, $l = 1, 2, 3$, then all state variables of the new dynamic system are in fact the estimates of the derivatives of the initial state variables with respect to the $l$-th parameter in the parameter vector $K_a$ or $\rho$ [via (21)]. We talk about estimates because at each real-time experiment the dynamics are subject to the random disturbances $w$ and $v$. In this context, to express the system dynamics in the gradient experiment, the first two equations in (32) are transformed into the first two equations in (20) but with $r = 0$, and all variables are replaced by their values in the $l$-th gradient experiment (pointed out by the superscript $l$). The third equation in (32) expressed in the $l$-th gradient experiment is obtained using the same variables as in the first two equations but, since the derivative of the gain matrix with respect to one of its parameters is a gain matrix with 1 on the position of the respective parameter and 0 otherwise, this leads to

$$
\frac{\partial}{\partial K_l}\left([\,0^T \;\; K \;\; K_I\,]\right) \begin{bmatrix} x(k) \\ \hat{x}(k|k) \\ x_I(k) \end{bmatrix} = x_l. \tag{33}
$$

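Consequently, the dynamics of the system in the $l$-th gradient experiment is characterized by

$$
\begin{aligned}
\begin{bmatrix} x^l(k+1) \\ \hat{x}^l(k+1|k) \\ x_I^l(k+1) \end{bmatrix} &= \begin{bmatrix} A & 0 & 0 \\ LC & A - LC & 0 \\ -CA & 0 & 1 \end{bmatrix} \begin{bmatrix} x^l(k) \\ \hat{x}^l(k|k-1) \\ x_I^l(k) \end{bmatrix} + \begin{bmatrix} B \\ B \\ -CB \end{bmatrix} u^l(k) \\
&\quad + \begin{bmatrix} \bar{B} \\ 0 \\ -C\bar{B} \end{bmatrix} w^l(k) + \begin{bmatrix} 0 & 0 \\ L & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} v^l(k) \\ v^l(k+1) \end{bmatrix}, \\
\hat{x}^l(k|k) &= [\,MC \;\; I - MC \;\; 0\,] \begin{bmatrix} x^l(k) \\ \hat{x}^l(k|k-1) \\ x_I^l(k) \end{bmatrix} + Mv^l(k), \\
u^l(k) &= -x_l - [\,0^T \;\; K \;\; K_I\,] \begin{bmatrix} x^l(k) \\ \hat{x}^l(k|k) \\ x_I^l(k) \end{bmatrix}.
\end{aligned}
\tag{34}
$$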

The corresponding experimental setup is presented in Fig. 2. Proceeding this way we obtain the estimates of the gradients of the state errors. Using the unbiased estimate of the gradient of the c.f., several steps can be carried out in the gradient direction toward the solution.

The IFT algorithm consists of the following steps.

Step 0: Set the step size, the initial parameter vector $\rho^0$, and the weights in the c.f. The vector $\rho^0$ is obtained as the solution to the LQR servo controller problem applied to (18) in a deterministic framework (that is, as if the estimated state variables were the true state variables).

Step 1: Conduct the initial (normal) experiment making use of the CS structure presented in Fig. 1 and record the evolution of all state variables.

Step 2: Conduct the three gradient experiments making use of the experimental setup presented in Fig. 2 to obtain all partial derivatives $\partial\hat{\varepsilon}_i/\partial\rho_l$ and $\partial u_\varepsilon/\partial\rho_l$, $l = 1, 2, 3$.

Step 3: Conduct the normal experiment again such that the states contain realizations of noise that differ from the noise at Step 2 to ensure the unbiased estimate of the gradient.

Step 4: Calculate the estimates of the gradient of the c.f. according to (25).

Step 5: Calculate $\rho^{i+1}$ in terms of the update law (26).

Step 6: If no significant decrease in the c.f. with the new set of parameters is obtained, stop the algorithm, otherwise go to Step 1.

The parameter vector obtained by this IFT algorithm, referred to as the optimal parameter vector $\rho^*$, corresponds to the optimal state feedback gain matrix $(K_a)^*$, expressed as [via (21)]

$$
(K_a)^* = (\rho^*)^T \in \mathbb{R}^{1\times(2+1)}. \tag{35}
$$

The noises $w$ and $v$ are not the same in the normal and in the gradient experiments since they come from different realizations of stochastic processes. The following assumptions are required to guarantee the convergence of the search algorithm.

1) From the point of view of the noise realizations, the disturbances should be zero mean bounded discrete time stochastic processes in any experiment.

2) The second-order statistics of the disturbances should be the same in all experiments, but the disturbances are not required to be stationary within one experiment.

3) The disturbance sequences in different experiments are mutually independent [18].

IV. SPSA ALGORITHM

SPSA is, like IFT, a gradient-based SA technique that uses the update law

$$
\rho^{i+1} = \rho^i - a^i\,\text{est}\left[\frac{\partial J}{\partial \rho}(\rho^i)\right]. \tag{36}
$$


Fig. 2. Setup for the gradient experiment where a disturbance is added to the control signal.

In IFT, it is possible to calculate the gradients by using data from the real-time experiments. However, when such schemes cannot be employed, according to the Kiefer–Wolfowitz SA algorithm the gradients have to be estimated on the basis of the noisy measurements of the c.f. in terms of the calculation of finite difference approximations around the current point. Under specific conditions regarding the existence of a minimum of the c.f., the differentiability with respect to the parameters, and a suitable selection of $\{a^i\}_{i\in\mathbb{N}}$, the Robbins–Monro and Kiefer–Wolfowitz SA algorithms state that the sequence of parameter vectors $\{\rho^i\}_{i\in\mathbb{N}}$ converges to the parameter vector $\rho^*$ that minimizes the c.f. $J$.

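In SPSA, the approximations of the gradient are calculated as follows using finite differences:

$$
\text{est}\left[\frac{\partial J}{\partial \rho}(\rho^i)\right] = \begin{bmatrix} \dfrac{J(\rho^i + c^i\Delta^i) - J(\rho^i - c^i\Delta^i)}{2c^i\Delta_1^i} \\ \vdots \\ \dfrac{J(\rho^i + c^i\Delta^i) - J(\rho^i - c^i\Delta^i)}{2c^i\Delta_3^i} \end{bmatrix} \tag{37}
$$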

where $\Delta^i = [\,\Delta_1^i \;\dots\; \Delta_3^i\,]^T$ is the random perturbation vector, $c^i$ is the difference magnitude coefficient, and $J$ in (37) represents noisy measurements of the c.f. The sequences $\{a^i\}_{i\in\mathbb{N}}$ and $\{c^i\}_{i\in\mathbb{N}}$ are degrees of freedom in the SPSA algorithm. The numerator in (37) is the same for all components in the gradient vector, but the denominator is different and proportional to the variation of the corresponding parameter in the set. The standard condition imposed on the elements $\Delta_l^i$, $l = 1, 2, 3$, is that they should be independent, identically distributed with symmetric distribution around zero, and of bounded magnitude. In addition, there is a condition related to the inverse moments of these random variables, so that a suitable distribution that respects all these requirements is a Bernoulli distribution. A common choice is that the random variables $\Delta_l^i$, $l = 1, 2, 3$, take the values $\pm 1$ with probability 0.5. A normal or uniform distribution is shown to reduce the performance of the algorithm.
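The gradient approximation (37) with Bernoulli $\pm 1$ perturbations can be sketched as follows. The function name `spsa_gradient`, the quadratic test cost, and the numerical values are hypothetical. On the real process the two cost values would come from two experiments rather than from a function call.

```python
import random


def spsa_gradient(cost, rho, c, rng):
    """Simultaneous perturbation estimate (37) of the gradient of cost at rho.

    cost maps a parameter list to a (possibly noisy) scalar, c is the
    difference magnitude coefficient and rng a random.Random instance.
    """
    # Bernoulli perturbation: each component is +1 or -1 with probability 0.5.
    delta = [rng.choice((-1.0, 1.0)) for _ in rho]
    # Only two cost evaluations, whatever the number of parameters.
    j_plus = cost([r + c * d for r, d in zip(rho, delta)])
    j_minus = cost([r - c * d for r, d in zip(rho, delta)])
    diff = j_plus - j_minus
    # Same numerator for all components, component-specific denominator.
    return [diff / (2.0 * c * d) for d in delta]


def quadratic_cost(rho):
    """Hypothetical noise-free cost used only to exercise the estimator."""
    return sum(r * r for r in rho)


grad_est = spsa_gradient(quadratic_cost, [1.0, -2.0, 0.5], c=0.1,
                         rng=random.Random(0))
```

For a single parameter and a linear cost the estimate coincides with the true slope. For several parameters each component also contains contributions of the other partial derivatives, which average out over the iterations because the perturbations are independent and symmetric around zero.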

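The SPSA-based estimate is biased due to the noise, and the convergence to $\rho^{*}$ is ensured if the following conditions are fulfilled [54]:

$$a_i > 0, \quad c_i > 0, \quad a_i \to 0, \quad c_i \to 0, \quad \sum_{i=0}^{\infty} a_i = \infty, \quad \sum_{i=0}^{\infty} \left(\frac{a_i}{c_i}\right)^{2} < \infty. \quad (38)$$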

A suitable selection of the sequences $\{a_i\}_{i \in \mathbb{N}}$ and $\{c_i\}_{i \in \mathbb{N}}$ is [54]

$$a_i = \frac{a_0}{(i + A)^{\alpha}}, \quad c_i = \frac{c_0}{i^{\gamma}} \quad (39)$$

where $a_0 > 0$, $c_0 > 0$, $A > 0$, $0 < \alpha \le 1$, and $\gamma > 0$.

Only two evaluations of the c.f. defined in (22) are needed in the application of SPSA algorithms to the LQG servo controller problem. The design is started with the LQR solution accounting for the deterministic dynamics of the process augmented with the integrator, and the c.f. defined in (22) is next minimized using SPSA algorithms.
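As a brief worked check, offered as a sketch rather than as part of the original derivation, the gain sequences (39) can be substituted into the summation conditions (38). The comparison with a p-series below is a standard argument; it ignores the constant offset $A$ and the first terms of the sums, which do not affect convergence or divergence.

```latex
\sum_{i} a_i \sim a_0 \sum_{i} i^{-\alpha} = \infty
  \iff \alpha \le 1,
\qquad
\sum_{i} \left( \frac{a_i}{c_i} \right)^{2}
  \sim \left( \frac{a_0}{c_0} \right)^{2} \sum_{i} i^{-2(\alpha - \gamma)} < \infty
  \iff \alpha - \gamma > \tfrac{1}{2}.
```

Under this reading, the asymptotic conditions in (38) hold for sequences of the form (39) when $0 < \alpha \le 1$, $\gamma > 0$, and $\alpha$ exceeds $\gamma$ by more than 1/2.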

Our SPSA algorithm consists of the following steps.

Step 0: Set the parameters $a_0 > 0$, $c_0 > 0$, $A > 0$, $\alpha > 0$, and $\gamma > 0$, the initial parameter vector $\rho^{0}$, and the weights in the c.f. The vector $\rho^{0}$ is obtained as the solution to the LQR servo controller problem applied to (19) in a deterministic framework.

Step 1: Calculate $\Delta^{i}$, $a_i$, and $c_i$.

Step 2: Evaluate $J(\rho^{i} + c_i \Delta^{i})$ and $J(\rho^{i} - c_i \Delta^{i})$, and find an estimate of the gradient according to (37).

Step 3: Calculate $\rho^{i+1}$ in terms of the update law (36).

Step 4: Test the decrease of the c.f. using one of the two evaluations of the c.f. with the corresponding disturbed parameters. If no significant decrease is revealed then stop the algorithm, otherwise go to Step 1. This is valid if the disturbed parameter vector is close to the current parameter vector.

In other words, the parameter vector is randomly disturbed only two times per iteration to evaluate the gradient in the SPSA algorithm. The parameter vector $\rho^{*}$ obtained by this SPSA algorithm leads to the optimal state feedback gain matrix $(K_a)^{*}$ expressed in (35).
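Steps 0 to 4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the update law (36) is taken to be the usual stochastic approximation step $\rho^{i+1} = \rho^{i} - a_i \cdot \operatorname{est}[\partial J / \partial \rho]$, the iteration index starts at 1 so that $c_i$ in (39) is defined, and `toy_cost`, `spsa_minimize`, and all numerical values are hypothetical stand-ins for the closed-loop experiments that evaluate the c.f. (22).

```python
import numpy as np


def toy_cost(rho):
    # Hypothetical noise-free stand-in for one evaluation of the c.f.
    return float(np.sum((rho - np.array([1.0, -2.0, 0.5])) ** 2))


def spsa_minimize(cost, rho0, a0, c0, A, alpha, gamma, n_iter, tol=None, seed=0):
    rng = np.random.default_rng(seed)
    rho = np.asarray(rho0, dtype=float).copy()   # Step 0: initial parameter vector
    history = []
    for i in range(1, n_iter + 1):               # index starts at 1 (assumption)
        # Step 1: random perturbation and gains according to (39)
        delta = rng.choice([-1.0, 1.0], size=rho.shape)
        a_i = a0 / (i + A) ** alpha
        c_i = c0 / i ** gamma
        # Step 2: two evaluations of the c.f. and gradient estimate as in (37)
        j_plus = cost(rho + c_i * delta)
        j_minus = cost(rho - c_i * delta)
        grad = (j_plus - j_minus) / (2.0 * c_i * delta)
        # Step 3: assumed form of the update law (36)
        rho = rho - a_i * grad
        # Step 4: stop when one of the two evaluations shows no significant decrease
        history.append(j_plus)
        if tol is not None and len(history) >= 2 and history[-2] - history[-1] < tol:
            break
    return rho, history


rho0 = np.zeros(3)
rho_final, history = spsa_minimize(toy_cost, rho0, a0=0.1, c0=0.01, A=1.0,
                                   alpha=0.602, gamma=0.101, n_iter=200)
print(toy_cost(rho0), toy_cost(rho_final))
```

Passing `tol=None` disables the Step 4 test so that the full number of iterations is run; with noisy evaluations a finite `tol` should be chosen with care, since a single noisy measurement can trigger a premature stop.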

V. CASE STUDY: SIMULATION AND EXPERIMENTAL RESULTS

The case study aims at the design of a CS dedicated to the angular position control of a modular DC servo system with an integral component. The process is characterized approximately by the discrete-time LTI SISO state-space model defined in (3) with the matrices

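$$A = \begin{bmatrix} 1 & 0.0487 \\ 0 & 0.9471 \end{bmatrix}, \quad B = \begin{bmatrix} 0.1867 \\ 7.3993 \end{bmatrix}, \quad B = I_2, \quad C = [1 \;\; 0], \quad D = 0, \quad D = [0 \;\; 0] \quad (40)$$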

where $I_2$ is the second-order identity matrix, and the angular position and the angular speed are the state variables $x_1$ and $x_2$, respectively.

The model defined in (3) with the matrices according to (40) is a simplified model of the process that corresponds to an experimental setup built around an INTECO DC servo system laboratory equipment. The main features of the experimental setup are [55]: the rated amplitude of 24 V, the rated current of 3.1 A, the rated torque of 15 N cm, the rated speed of 3000 rpm, and the weight of the inertial load of 2.03 kg. The angular speed can be measured by a tachogenerator, but only the position is measured here and the angular speed is estimated via a Kalman filter.

The values of the process parameters were obtained as $k_P = 139.88$ and $T_\Sigma = 0.92$ s by the parameter identification of the first-principle model of the equipment given in (1), resulting in the simplified process transfer function (2). A sampling period of $T_s = 0.05$ s was next set.
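The link between these identified parameters and the matrices in (40) can be checked with a short Python sketch. It assumes, since (2) is not repeated here, that the simplified transfer function has the second-order-with-integrator form $k_P / (s(1 + T_\Sigma s))$ with the position and the speed as states, and that a zero-order hold is used for the discretization; the function name `zoh_servo_model` is hypothetical.

```python
import numpy as np


def zoh_servo_model(k_p, t_sigma, t_s):
    """Zero-order-hold discretization of the assumed continuous-time model
    dx1/dt = x2, dx2/dt = (-x2 + k_p * u) / t_sigma."""
    e = np.exp(-t_s / t_sigma)
    # Closed-form matrix exponential of [[0, 1], [0, -1/t_sigma]] over one period.
    a_d = np.array([[1.0, t_sigma * (1.0 - e)],
                    [0.0, e]])
    # Closed-form integral of the matrix exponential times the input vector.
    b_d = np.array([[k_p * (t_s - t_sigma * (1.0 - e))],
                    [k_p * (1.0 - e)]])
    return a_d, b_d


A_d, B_d = zoh_servo_model(k_p=139.88, t_sigma=0.92, t_s=0.05)
print(np.round(A_d, 4))
print(np.round(B_d, 4))
```

Under these assumptions the rounded result agrees with the numerical values of A and the input matrix in (40) to four decimals, which supports the reading of (40) as the sampled version of the identified model.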


Fig. 3. Evolution of the c.f. over 30 iterations (vertical axis: c.f. J, ×10^7; horizontal axis: iteration number; curves: SPSA and IFT).

As is usually the case, the model-based design makes use of a model that is different from the real-world process. It is assumed that an initial LQR servo design is desired for the deterministic process augmented with the integrator in (18). Since the quadratic c.f. has to be convergent, the differences between the state variables and their steady-state values are weighted. Since the position measurement is available but affected by noise, and the integrator state variable is already available, an estimation of the state variables is required in the LQR design. An optimal estimation design is carried out in order to obtain a Kalman filter. With the filter's parameters fixed, and because the estimated states are available to the user, an attempt is made to minimize the LQG-like c.f. defined in (19) over a finite time horizon of 10 s.

A rather crude model is used to design the LQR controller and the Kalman filter, which starts from the process parameters $k_P = 150$ and $T_\Sigma = 1.2$ s. We used the following weights in the LQR design:

$$Q = \begin{bmatrix} 100 & 0 & 0 \\ 0 & 200 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad \lambda = 300. \quad (41)$$

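A white noise disturbance is acting on the state with the state noise intensity matrix $Q_N$ and the measurement noise intensity matrix $R_N$

$$Q_N = \begin{bmatrix} \sigma_{w1}^{2} = 2 & 0 \\ 0 & \sigma_{w2}^{2} = 1 \end{bmatrix}, \quad R_N = \sigma_{v}^{2} = 0.06, \quad N_N = [0 \;\; 0]^{T}. \quad (42)$$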

Therefore the noise effect on both estimated states is alleviated. In this setup, we account for the additional estimator dynamics in the process model, so we are sure that the LQR-based initial solution is not optimal as far as the minimization of the c.f. defined in (19) is concerned. Next, the two data-based techniques, viz., IFT and SPSA, are employed in the minimization of the c.f. defined in (22). The estimator and the innovation gains for the Kalman filter are $L = M = [0.0157 \;\; 0.0025]^{T}$.

A number of 30 iterations were conducted for the IFT algorithm presented in Section III for N = 1000 samples. The initial step size in the IFT algorithm was set to $\gamma^{0} = 10^{-10}$, and the subsequent step sizes were set in terms of (28), with $\alpha = 0.51$, so as to satisfy (27); $R_i = I_3$ was used.

Fig. 4. Evolution of the state feedback controller parameters $K_1$, $K_2$, and $K_I$ versus the iteration number: IFT (solid) and SPSA (dotted).

The SPSA implemented here is characterized by the same N and by the same number of iterations. The parameters in the SPSA were set to the values $a_0 = 10^{-10}$, $c_0 = 0.005$, $A = 0.1$, $\alpha = 0.4$, and $\gamma = 0.05$, which fulfill the conditions related to (39).
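To give a feeling for these settings, the gain sequences (39) can be tabulated over the 30 iterations with a few lines of Python. The helper name `spsa_gains` is hypothetical, and the iteration index is assumed to start at 1 so that $c_i$ is defined.

```python
def spsa_gains(i, a0=1e-10, c0=0.005, A=0.1, alpha=0.4, gamma=0.05):
    """Gains a_i and c_i from (39) for the parameter values quoted above."""
    return a0 / (i + A) ** alpha, c0 / i ** gamma


gains = [spsa_gains(i) for i in range(1, 31)]
a_first, c_first = gains[0]
a_last, c_last = gains[-1]
print(a_last / a_first, c_last / c_first)
```

With these values the step gain $a_i$ falls to roughly a quarter of its first value over the 30 iterations, while the perturbation magnitude $c_i$ decreases by about 16%, so both stay within the same order of magnitude throughout the run.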

In both cases the starting point in the parameter space, as designed via LQR, was

$$K_a = (\rho^{0})^{T} = [K_1 = 1.9229 \quad K_2 = 0.5163 \quad K_I = -0.0348]. \quad (43)$$

A step reference input of r = 20 rad was chosen for the position. The final set of parameters obtained by the IFT algorithm is

$$(K_a)^{*} = (\rho^{*})^{T} = [K_1 = 1.9249 \quad K_2 = 0.5174 \quad K_I = -0.0778]. \quad (44)$$

The final set of parameters obtained by the SPSA algorithm is

$$(K_a)^{*} = (\rho^{*})^{T} = [K_1 = 1.8212 \quad K_2 = 0.4155 \quad K_I = -0.0645]. \quad (45)$$

The evolution of the c.f. versus the iteration number is presented in Fig. 3. The evolutions of the state feedback controller parameters are presented in Fig. 4.

The difference in the initial value of the c.f. is due to the stochastic noise. For the same reason, the value of the c.f. at a given parameter vector varies from one evaluation to another. The decrease is nevertheless obvious. The evolutions versus time of four variables of the state feedback CS are shown in Fig. 5 in three situations corresponding to the initial set of parameters, the final set of parameters after tuning with IFT, and the final set of parameters after tuning with SPSA.


Fig. 5. Responses of the state feedback CS recorded from simulated results. Estimated position (rad), estimated angular speed (rad/s), integrator state (rad s), and control signal u (PWM duty cycle) versus time (s): initial response (line-dot), after tuning with IFT (solid), after tuning with SPSA (dotted).

The evolutions versus time of the same variables of the state feedback CS in the same three situations are presented in Fig. 6, but they correspond to the experiments conducted with the state feedback CS. The differences between the time responses in Figs. 5 and 6 are due to the difference between the linear process model used in the design and tuning and the real-world process, and also to the different noise properties that act on the simulated process and the real-world process. The former difference influences the optimal controller gains and the latter influences the Kalman filter design and correspondingly the dynamics of the process plus estimator.

On the other hand, our process is nonlinear although the theory behind the derivation of the IFT technique is based upon linearity assumptions. The actuator itself includes a dead-zone, and it has a saturation which constrains the control signal to the interval −1 ≤ u ≤ 1. Although the tuning has been carried out on a simulated model, it is of interest to include some constraints on the control signal and/or on some other signals. Then the problem becomes an LQG-based constrained optimization one, and an idea to include constraints is presented in [56].
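The actuator nonlinearity mentioned above can be sketched in Python for use in a simulation model. This is only an illustration: the dead-zone half-width `dead_zone` is a hypothetical value (it is not given here), the saturation limit follows the interval −1 ≤ u ≤ 1, and applying the saturation before the dead-zone is an assumption about the ordering.

```python
import numpy as np


def actuator(u, dead_zone=0.05, u_max=1.0):
    """Saturate the commanded signal to [-u_max, u_max], then apply a dead-zone."""
    u = np.clip(np.asarray(u, dtype=float), -u_max, u_max)
    # Inside the dead-zone the actuator does not respond; outside, the
    # effective input is reduced by the dead-zone half-width.
    return np.where(np.abs(u) <= dead_zone, 0.0, u - np.sign(u) * dead_zone)


print(actuator([2.0, 0.02, -0.5]))
```

Inserting such a block between the controller output and the linear model lets the tuning experiments be repeated in simulation with the nonlinearity present, which gives a rough indication of how far the linearity assumption is stretched.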

VI. DISCUSSION

Both IFT and SPSA are data-based stochastic optimization techniques. First, they therefore represent more than gradient-based search algorithms using sensitivity functions of the quantities in the control structure with respect to some design parameters. Second, they make no use of the process model in tuning. IFT uses a successive-experiment method for obtaining the gradients of the variables of the closed-loop CS and next the estimate of the gradient of the c.f., whereas SPSA starts with the finite-difference approximation to find the gradient of the c.f. directly.

The initial starting point in the search space needs to be provided for both IFT and SPSA. In general this is not possible without using a process model, but techniques such as Ziegler–Nichols tuning or VRFT could be used for this purpose. Also, there is no automated way of finding the initial parameters of the search algorithms without using information from the process, so they are chosen by trial and error. Moreover, throughout the iterations, although the convergence of the algorithm is ensured by the choice of the corresponding sequences, the stability of the control structure is not guaranteed. A mechanism devoted to this purpose (e.g., the ν-gap metric) can be used in the case where a process model is available, as is the case in our approaches. Otherwise, a new mechanism should be developed. Since we deal with numerical algorithms, it is possible that the global minimum is never reached, so the algorithms could get stuck in local extremum points. Different starting points in the search space may not always be available.

Fig. 6. Responses of the state feedback CS recorded from experimental results on the real-world process. Estimated position (rad), estimated angular speed (rad/s), integrator state (rad s), and control signal u (PWM duty cycle) versus time (s): initial response (line-dot), after tuning with IFT (solid), after tuning with SPSA (dotted).

In our paper, the analysis was done for a step reference input. However, different deterministic and stochastic excitation signals obviously influence the CS behavior as well. By expressing the time-domain c.f. in the frequency domain, via Parseval's theorem, we can see that the reference input spectrum and the noise input spectra act as frequency-domain weights. Therefore, the dynamic behavior is enhanced in those frequency regions where the reference spectrum and the noise spectrum are significant. In the case where the chosen reference input is different from a step input, we can no longer talk about the optimal regulation problem that is specific to our formulation but rather about the optimal tracking problem. This analysis can be used whenever the frequency-domain analysis of the c.f. is carried out [57], [58]. In addition, a frequency-domain expression of the c.f. shows that three objectives are pursued: the minimization of the energy transfer from the reference input to the state variables, the minimization of the control effort, and the minimization of the energy transfer from the noise inputs to the state variables (noise rejection).
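The time-domain to frequency-domain rewriting relies on Parseval's identity, which can be checked numerically in Python for a sampled signal. The signal `error` below is a hypothetical tracking error (a decaying transient plus noise), not data from the case study; only the identity itself is being illustrated.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000                       # number of samples
t = 0.05 * np.arange(n)        # time axis for a 0.05 s sampling period
# Hypothetical error signal: decaying transient plus measurement noise.
error = 20.0 * np.exp(-2.0 * t) + 0.1 * rng.standard_normal(n)

time_energy = np.sum(error ** 2)                    # quadratic term, time domain
spectrum = np.fft.fft(error)
freq_energy = np.sum(np.abs(spectrum) ** 2) / n     # same quantity, frequency domain
print(time_energy, freq_energy)
```

Because the two sums coincide, each quadratic term of a c.f. can be read as a sum over frequency bins, and the bins where the excitation spectrum is large dominate the value that the tuning minimizes.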

The sensitivity of the tuning techniques to the initial conditions does not represent an issue if the length of the experiments over time is increased, since the transients then weigh less in the c.f. It is of course desirable to avoid as much as possible the experiments that affect the normal operating regimes of the CS, thus keeping them as short as possible. If the process is almost linear then the initial conditions are no longer a problem, whereas in the nonlinear case the switching from one operating point to another may cause the algorithm not to converge. A typical potential application of the techniques is in robotics, where tasks are repeated in similar conditions, and from this point of view a connection can be made to the iterative learning control technique, where experiments are required to be similar from one iteration to another [15], [48], [59], [60]. IFT has proven to be successful in smooth nonlinear systems applications although the background theory relies on linearity assumptions. On the other hand, SPSA can work very well on nonlinear systems around certain operating points.

The derivation of the gradient experiment equations can be very laborious in IFT. This is not the case with SPSA. Only two evaluations of the c.f. are needed with SPSA for a p-dimensional parameter vector, whereas with IFT the number of experiments in this setting is p + 2 (one gradient experiment for each parameter and two normal experiments) in order to obtain an unbiased estimate of the gradient of the c.f., which is critical for the performance of the algorithm.
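The experiment counts quoted above can be turned into a small Python helper for budgeting a tuning campaign. The function name is hypothetical; the values p = 3 and 30 iterations are taken from the case study (three gains, 30 iterations per algorithm).

```python
def experiments_per_run(p, n_iter):
    """Closed-loop experiments needed per tuning run, from the counts above."""
    return {"IFT": (p + 2) * n_iter,   # p gradient + 2 normal experiments per iteration
            "SPSA": 2 * n_iter}        # two c.f. evaluations per iteration


budget = experiments_per_run(p=3, n_iter=30)
print(budget)  # {'IFT': 150, 'SPSA': 60}
```

The gap widens linearly with the number of tuned parameters, since the SPSA count does not depend on p.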

IFT assumes linear processes, but SPSA is not constrained by this and could also be employed on nonlinear processes as long as the c.f. is smooth as a function of the parameters, allowing higher order derivatives. In the same view, the c.f. for IFT can only be used in LQG-like form, but with SPSA, performance indices of different nature could be aggregated together.

SPSA can be employed in the minimization of various c.f.s in relation with nonlinear systems, and it is not constrained just to LQG-type c.f.s. These two advantages over IFT make it a very useful tool. However, it can only be used for the further improvement and tuning of an initially designed CS. This means that we have to start with a fixed stabilizing CS structure. The same problems related to the convergence speed of the algorithm and the stability of the CS during iterations need to be addressed. Although it is model-free in the tuning step, asserting the robust stability and performance still needs a process model. For example, using the ν-gap distance according to [46], the stability can be checked at each step. A combination of SPSA and IFT is also suggested in [61].

IFT evaluates the c.f., an estimate of the gradient, and the constraints, and it transfers them to a constrained optimization algorithm that works offline, i.e., in the iteration domain. This solution can lead to an increased number of iterations. However, this number of iterations can be reduced at the designer's option because the improvement of the CS behavior is targeted rather than reaching the minimum at any cost. The constrained versions of the problem formulation are given in [56], and they can be developed relatively easily.

VII. CONCLUSION

This paper has conducted a comparison of IFT and SPSA on an LQG servo controller problem using an LQG-like c.f., where the initialization for the two algorithms was provided by the corresponding deterministic LQR problem, in addition to which a Kalman filter was designed. The advocated data-based algorithms can be used for processes plus state estimators by means of general LQG-like c.f.s. The performance exhibited by IFT and SPSA is very similar, but SPSA outperforms IFT in terms of complexity of implementation and degrees of freedom in the formulation of the c.f. A performance analysis of the two techniques is not extremely relevant since they both share the same type of gradient SA algorithms with the corresponding degrees of freedom in the choice of the parameters in the update law. However, when it comes to simplicity of implementation, SPSA appears to be more attractive. In addition, SPSA does not affect too much the nominal operating regimes of the CSs during the informative experiments.

The advantage of our data-based approaches is that truly optimal CSs can be found with no process model in use, on a variety of design problems that are formulated as optimization problems. Our algorithms are generic, and they can be applied widely [62]–[67].

A limitation of our approaches is that they use the estimated states instead of the true state variables because of the need for practical evaluations of the c.f. Since this paper has carried out the simulation-based implementation of the IFT and SPSA algorithms and tested the final solutions on the real-world process, future research will deal with the implementation of the algorithms so that they run directly on real-world processes.

The tuning of the Kalman filter together with the state feedback gain matrix is also targeted. Future research will also deal with the extension of the proposed data-based approaches to MIMO CSs, to the inclusion of constraints, and to the tuning of state feedback fuzzy CSs. Further study of the convergence and stability throughout tuning of the data-based algorithms is needed for all applications including the nonlinear processes.

REFERENCES

[1] R. Abdullah, A. Hussain, K. Warwick, and A. Zayed, “Autonomous intelligent cruise control using a novel multiple-controller framework incorporating fuzzy-logic-based switching and tuning,” Neurocomputing, vol. 71, nos. 13–15, pp. 2727–2741, Aug. 2008.

[2] W. Zuo, Y. Zhu, and L. Cai, “Fourier-neural-network-based learning control for a class of nonlinear systems with flexible components,” IEEE Trans. Neural Netw., vol. 20, no. 1, pp. 139–151, Jan. 2009.

[3] D. Hladek, J. Vašcak, and P. Sincák, “Multirobot control system for pursuit-evasion problem,” J. Electr. Eng., vol. 60, no. 3, pp. 143–148, Jun. 2009.

[4] D. Huang, J.-X. Xu, and Z. Hou, “A discrete-time periodic adaptive control approach for parametric-strict-feedback systems,” in Proc. Joint 48th IEEE Conf. Decision Control 28th Chin. Control Conf., Shanghai, China, Dec. 2009, pp. 6620–6625.

[5] X. Yang, J. Cao, Y. Long, and W. Rui, “Adaptive lag synchronization for competitive neural networks with mixed delays and uncertain hybrid perturbations,” IEEE Trans. Neural Netw., vol. 21, no. 10, pp. 1656–1667, Oct. 2010.

[6] O. Linda and M. Manic, “Fuzzy force-feedback augmentation for manual control of multirobot system,” IEEE Trans. Ind. Electron., vol. 58, no. 8, pp. 3213–3220, Aug. 2011.

[7] J.-H. Park, S.-H. Kim, and C.-J. Moon, “Adaptive neural control for strict-feedback nonlinear systems without backstepping,” IEEE Trans. Neural Netw., vol. 20, no. 7, pp. 1204–1209, Jul. 2009.

[8] T. Chai, “Optimal operation and feedback control for complex industrial process,” in Proc. IEEE Int. Conf. Netw. Sensing Control, Okayama, Japan, Mar. 2009, pp. 4–5.

[9] D. Qi, M. Liu, M. Qiu, and S. Zhang, “Exponential H∞ synchronization of general discrete-time chaotic neural networks with or without time delays,” IEEE Trans. Neural Netw., vol. 21, no. 8, pp. 1358–1365, Aug. 2010.

[10] I. Skrjanc, S. Blazic, S. Oblak, and J. Richalet, “An approach to predictive control of multivariable time-delayed plant: Stability and design issues,” ISA Trans., vol. 43, no. 4, pp. 585–595, Oct. 2004.


[11] Y. Fu and T. Chai, “Nonlinear multivariable adaptive control using multiple models and neural networks,” Automatica, vol. 43, no. 6, pp. 1101–1110, Jun. 2007.

[12] D. Vrabie, O. Pastravanu, M. AbuKhalaf, and F. L. Lewis, “Adaptive optimal control for continuous-time linear systems based on policy iteration,” Automatica, vol. 45, no. 2, pp. 477–484, Feb. 2009.

[13] H. Zhang, Y. Luo, and D. Liu, “Neural-network-based near-optimal control for a class of discrete-time affine nonlinear systems with control constraints,” IEEE Trans. Neural Netw., vol. 20, no. 9, pp. 1490–1503, Sep. 2009.

[14] C. Yin, J.-X. Xu, and Z. Hou, “Iterative learning control design with high-order internal model for nonlinear systems,” in Proc. Joint 48th IEEE Conf. Decision Control 28th Chin. Control Conf., Shanghai, China, Dec. 2009, pp. 434–439.

[15] Y. Hayakawa and K. Nakajima, “Design of the inverse function delayed neural network for solving combinatorial optimization problems,” IEEE Trans. Neural Netw., vol. 21, no. 2, pp. 224–237, Feb. 2010.

[16] Z. C. Johanyák, “Student evaluation based on fuzzy rule interpolation,” Int. J. Artif. Intell., vol. 5, no. 10, pp. 37–55, Sep. 2010.

[17] Z. Liu, H. Zhang, and Q. Zhang, “Novel stability analysis for recurrent neural networks with multiple delays via line integral-type L-K functional,” IEEE Trans. Neural Netw., vol. 21, no. 11, pp. 1710–1718, Nov. 2010.

[18] H. Hjalmarsson, “Iterative feedback tuning-an overview,” Int. J. Adapt. Control Signal Process., vol. 16, pp. 373–395, Jun. 2002.

[19] J. K. Huusom, N. K. Poulsen, and S. B. Jørgensen, “Improving convergence of iterative feedback tuning,” J. Process Control, vol. 19, pp. 570–578, Apr. 2009.

[20] J. Sjöberg, P.-O. Gutman, M. Agarwal, and M. Bax, “Nonlinear controller tuning based on a sequence of identifications of linearized time-varying models,” Control Eng. Pract., vol. 17, no. 2, pp. 311–321, Feb. 2009.

[21] J. K. Huusom, N. K. Poulsen, and S. B. Jørgensen, “Data driven tuning of state space control loops with unknown state information and model uncertainty,” Comput. Aided Chem. Eng., vol. 26, pp. 441–446, Dec. 2009.

[22] R.-E. Precup, S. Preitl, I. J. Rudas, M. L. Tomescu, and J. K. Tar, “Design and experiments for a class of fuzzy controlled servo systems,” IEEE/ASME Trans. Mechatronics, vol. 13, no. 1, pp. 22–35, Feb. 2008.

[23] S. Kissling, P. Blanc, P. Myszkorowski, and I. Vaclavik, “Application of iterative feedback tuning (IFT) to speed and position control of a servo drive,” Control Eng. Pract., vol. 17, pp. 834–840, Jul. 2009.

[24] H. Hjalmarsson, “Control of nonlinear systems using iterative feedback tuning,” in Proc. Amer. Control Conf., Philadelphia, PA, 1998, pp. 2083–2087.

[25] J. Sjöberg, F. D. Bruyne, M. Agarwal, B. D. O. Anderson, M. Gevers, F. J. Kraus, and N. Linard, “Iterative controller optimization for nonlinear systems,” Control Eng. Pract., vol. 11, pp. 1079–1086, Sep. 2003.

[26] A. J. McDaid, K. C. Aw, S. Q. Xie, and E. Haemmerle, “Gain scheduled control of IPMC actuators with model-free iterative feedback tuning,” Sens. Actuators A: Phys., vol. 164, nos. 1–2, pp. 137–147, Dec. 2010.

[27] J. C. Spall, “A stochastic approximation algorithm for large-dimensional systems in the Kiefer-Wolfowitz setting,” in Proc. 27th IEEE Conf. Decision Control, Austin, TX, 1988, pp. 1544–1548.

[28] J. C. Spall, “Multivariate stochastic approximation using a simultaneous perturbation gradient approximation,” IEEE Trans. Autom. Control, vol. 37, no. 3, pp. 332–341, Mar. 1992.

[29] J. I. M. Martinez, K. Nakano, and K. Higuchi, “Parameter estimation in neural networks by improved version of simultaneous perturbation stochastic approximation algorithm,” in Proc. ICCAS-SICE Conf., Fukuoka, Japan, Aug. 2009, pp. 4567–4572.

[30] Y.-Y. Hong, H.-L. Chang, and C.-S. Chiu, “Hour-ahead wind power and speed forecasting using simultaneous perturbation stochastic approximation (SPSA) algorithm and neural network with fuzzy inputs,” Energy, vol. 35, no. 9, pp. 3870–3876, Sep. 2010.

[31] H. Zhang, J. Zhao, T. Geng, and R. Wang, “The improved convergence of SPSA and its application in drive system,” in Proc. 4th IEEE Conf. Ind. Electron. Applicat., Xi’an, China, May 2009, pp. 662–666.

[32] H. Dong, X.-H. Tang, Y. Tong, and Y.-P. Li, “Research on model predictive control for inventory management in decentralized supply chain system,” in Proc. 2009 Int. Conf. Inform. Manag. Innovation Manag. Ind. Eng., Xi’an, China, Dec. 2009, pp. 250–253.

[33] O. Granichin, L. Gurevich, and A. Vakhitov, “SPSA with a fixed gain for intelligent control in tracking applications,” in Proc. IEEE Conf. Syst. Control, Saint Petersburg, Russia, Jul. 2009, pp. 1415–1420.

[34] C. C. Hyun, J. Knowles, M. S. Fadali, and S. L. Kwon, “Fault detection and isolation of induction motors using recurrent neural networks and dynamic Bayesian modeling,” IEEE Trans. Control Syst. Technol., vol. 18, no. 2, pp. 430–437, Mar. 2010.

[35] Y.-Y. Hong and C.-S. Chiu, “Passive filter planning using simultaneous perturbation stochastic approximation,” IEEE Trans. Power Delivery, vol. 25, no. 2, pp. 939–946, Apr. 2010.

[36] M. Kumon, K. Fukushima, S. Kunimatsu, and M. Ishitobi, “Motion planning based on simultaneous perturbation stochastic approximation for mobile auditory robots,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., Taipei, Taiwan, Oct. 2010, pp. 431–436.

[37] A. H. Alhabsi, “Improved SPSA optimization algorithm requiring a single measurement per iteration,” in Proc. 10th Int. Conf. Inform. Sci. Signal Process. Applicat., Kuala Lumpur, Malaysia, May 2010, pp. 263–265.

[38] F. L. Lewis, “Adaptive dynamic programming for feedback control,” in Proc. 7th Asian Control Conf., Hong Kong, China, 2009, pp. 10–11.

[39] X. Bai, D. Zhao, and J. Yi, “Coordinated multiple ramps metering based on neuro-fuzzy adaptive dynamic programming,” in Proc. Int. Joint Conf. Neural Netw., Atlanta, GA, 2009, pp. 241–248.

[40] W.-S. Lin and J.-W. Sheu, “Neuro-dynamic programming with recurrent critic for automatic train regulation of metro line,” in Proc. Int. Joint Conf. Neural Netw., Atlanta, GA, Jun. 2009, pp. 1807–1813.

[41] J. Ding, S. N. Balakrishnan, and F. L. Lewis, “A cost function based single network adaptive critic architecture for optimal control synthesis for a class of nonlinear systems,” in Proc. Int. Joint Conf. Neural Netw., Barcelona, Spain, Jul. 2010, pp. 1–8.

[42] Z. Wang, Y. Dai, and Y. Yao, “Research of a parallel learning adaptive dynamic programming based on genetic algorithms,” in Proc. 2nd Int. Conf. Commun. Syst. Netw. Applicat., Hong Kong, China, 2010, pp. 350–353.

[43] M. Boccadoro and F. Martinelli, “Control of a logistic node via neuro-dynamic programming,” in Proc. 49th IEEE Conf. Decision Control, Atlanta, GA, Dec. 2010, pp. 4896–4901.

[44] F.-Y. Wang, N. Jin, D. Liu, and Q. Wei, “Adaptive dynamic programming for finite-horizon optimal control of discrete-time nonlinear systems with ε-error bound,” IEEE Trans. Neural Netw., vol. 22, no. 1, pp. 24–36, Jan. 2011.

[45] J. K. Huusom, N. K. Poulsen, and S. B. Jørgensen, “Iterative feedback tuning of uncertain state space systems,” Braz. J. Chem. Eng., vol. 27, no. 3, pp. 461–472, Jul.–Sep. 2010.

[46] L. C. Kammer, R. R. Bitmead, and P. L. Bartlett, “Direct iterative tuning via spectral analysis,” Automatica, vol. 36, pp. 1301–1307, Sep. 2000.

[47] L. Horváth and I. J. Rudas, Modeling and Problem Solving Methods for Engineers. Burlington, MA: Academic, 2004.

[48] R.-E. Precup, S. Preitl, J. K. Tar, M. L. Tomescu, M. Takács, P. Korondi, and P. Baranyi, “Fuzzy control system performance enhancement by iterative learning control,” IEEE Trans. Ind. Electron., vol. 55, no. 9, pp. 3461–3475, Sep. 2008.

[49] D. Vrabie and F. L. Lewis, “Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems,” Neural Netw., vol. 22, pp. 237–246, Apr. 2009.

[50] R. E. Haber, R. Haber-Haber, A. Jiménez, and R. Galán, “An optimal fuzzy control system in a network environment based on simulated annealing: An application to a drilling process,” Appl. Soft Comput., vol. 9, pp. 889–895, Jun. 2009.

[51] G. Hulko, C. Belavy, P. Bucek, K. Ondrejkovic, and P. Zajicek, “Engineering methods and software support for control of distributed parameter systems,” in Proc. 7th Asian Control Conf., Hong Kong, China, Aug. 2009, pp. 1432–1438.

[52] A. Skoglund, B. Iliev, and R. Palm, “Programming-by-demonstration of reaching motions: A next-state-planner approach,” Robot. Auton. Syst., vol. 58, pp. 607–621, May 2010.

[53] G. Solari and M. Gevers, “Unbiased estimation of the Hessian for iterative feedback tuning (IFT),” in Proc. 43rd IEEE Conf. Decision Control, Dec. 2004, pp. 1759–1760.

[54] J. C. Spall, “Implementation of the simultaneous perturbation algorithm for stochastic optimization,” IEEE Trans. Aerosp. Electron. Syst., vol. 34, no. 3, pp. 817–823, Jul. 1998.

[55] R. Precup, C. Borchescu, M. Radac, S. Preitl, C.-A. Dragos, E. M. Petriu, and J. K. Tar, “Implementation and signal processing aspects of iterative regression tuning,” in Proc. IEEE Int. Symp. Ind. Electron., Bari, Italy, Jul. 2010, pp. 1657–1662.

[56] M. Akerblad, A. Hansson, and B. Wahlberg, “Automatic tuning for classical step-response specifications using iterative feedback tuning,” in Proc. 39th IEEE Conf. Decision Control, Sydney, Australia, 2000, pp. 3347–3348.


[57] A. S. Bazanella, M. Gevers, L. Miskovic, and B. D. O. Anderson, “Iterative minimization of H2 control performance criteria,” Automatica, vol. 44, pp. 2549–2559, Oct. 2008.

[58] H. Hjalmarsson, M. Gevers, S. Gunnarsson, and O. Lequin, “Iterative feedback tuning: Theory and applications,” IEEE Control Syst. Mag., vol. 18, no. 4, pp. 26–41, Aug. 1998.

[59] D. A. Bristow, M. Tharayil, and A. G. Alleyne, “A survey of iterative learning control,” IEEE Control Syst. Mag., vol. 26, no. 3, pp. 96–114, Jun. 2006.

[60] W. Zuo, Y. Zhu, and L. Cai, “Fourier-neural-network-based learning control for a class of nonlinear systems with flexible components,” IEEE Trans. Neural Netw., vol. 20, no. 1, pp. 139–151, Jan. 2009.

[61] L. Gerencsér, Z. Vágó, and H. Hjalmarsson, “Randomized iterative feedback tuning,” in Proc. 15th IFAC World Congr., Barcelona, Spain, 2002, pp. 1–6.

[62] Z. Petres, P. Baranyi, P. Korondi, and H. Hashimoto, “Trajectory tracking by TP model transformation: Case study of a benchmark problem,” IEEE Trans. Ind. Electron., vol. 54, no. 3, pp. 1654–1663, Jun. 2007.

[63] K. N. Gurney, A. Hussain, J. M. Chambers, and R. Abdullah, “Controlled and automatic processing in animals and machines with application to autonomous vehicle control,” Lect. Notes Comput. Sci., vol. 5768, pp. 198–207, Sep. 2009.

[64] W. L. Tung and C. Quek, “eFSM-A novel online neural-fuzzy semantic memory model,” IEEE Trans. Neural Netw., vol. 21, no. 1, pp. 136–157, Jan. 2010.

[65] A. Rodan and P. Tino, “Minimum complexity echo state network,” IEEE Trans. Neural Netw., vol. 22, no. 1, pp. 131–144, Jan. 2011.

[66] H. Zhang, Q. Wei, and D. Liu, “An iterative adaptive dynamic programming method for solving a class of nonlinear zero-sum differential games,” Automatica, vol. 47, pp. 207–214, Jan. 2011.

[67] A. Garcia, A. Luviano-Juarez, I. Chairez, A. Poznyak, and T. Poznyak, “Projectional dynamic neural network identifier for chaotic systems: Application to Chua’s circuit,” Int. J. Artif. Intell., vol. 6, no. 11, pp. 1–18, Mar. 2011.

Mircea-Bogdan Radac received the Dipl.Ing. degree in systems and computer engineering and the Ph.D. degree in systems engineering from the Politehnica University of Timisoara (PUT), Timisoara, Romania, in 2008 and 2011, respectively. He has been pursuing the Doctoral degree with the same university since 2008.

He is currently a Post-Doctoral Researcher with the Department of Automation and Applied Informatics, PUT. He is the co-author of more than 20 papers published in scientific journals, refereed conference proceedings, and contributions to books. His current research interests include control structures and algorithms with focus on iterative methods in control design and optimization.

Mr. Radac is a member of the Romanian Society of Control Engineering and Technical Informatics.

Radu-Emil Precup (M’03–SM’07) received the Dipl.Ing. (with honors) degree in automation and computers from the Traian Vuia Polytechnic Institute of Timisoara, Timisoara, Romania, the Diploma degree in mathematics from the West University of Timisoara, Timisoara, and the Ph.D. degree in automatic systems from the Politehnica University of Timisoara (PUT), Timisoara, in 1987, 1993, and 1996, respectively.

He was with Infoservice S.A., Timisoara, from 1987 to 1991. He is currently with PUT, where he became a Professor with the Department of Automation and Applied Informatics in 2000, and is currently a Doctoral Supervisor of systems engineering. He is an Honorary Professor of the Doctoral School of Applied Informatics, Óbuda University, Budapest, Hungary. He is the author or co-author of more than 150 papers published in scientific journals, refereed conference proceedings, and contributions to books. His current research interests include intelligent control systems, data-based controls, and nature-inspired algorithms for optimization.

Prof. Precup is a member of the Subcommittee on Computational Intelligence as part of the IEEE Industrial Electronics Society, the International Federation of Automatic Control Technical Committee on Computational Intelligence in Control, the Hungarian Fuzzy Association, the Romanian Society of Control Engineering and Technical Informatics, and the Doctoral School of Applied Informatics.

Emil M. Petriu (M’86–SM’88–F’01) received the Dipl.Eng. and Dr.Eng. degrees from the Polytechnic Institute of Timisoara, Timisoara, Romania.

He has been a Faculty Member of the University of Ottawa, Ottawa, ON, Canada, since 1985, where he is currently a Professor and the University Research Chair with the School of Information Technology and Engineering. He has published more than 100 refereed journal papers, ten book chapters, more than 200 papers in refereed conference proceedings, authored two books, edited three books, and received two patents. His current research interests include robot sensing and perception, interactive virtual environments, human-computer symbiosis, soft computing, and digital integrated circuit testing.

Prof. Petriu served as a member of the Administrative Committee from 1996 to 2005 and the Vice-President of the IEEE Instrumentation and Measurement Society from 2000 to 2002. He served as the Chair of the IEEE Joseph F. Keithley Award Committee from 2007 to 2010 and a member of the IEEE Technical Field Awards Council from 2007 to 2010. He is serving as the Chair of the TC-15 on Virtual Systems in Measurements, the Co-Chair of the TC-30 Security and Contraband Detection of the IEEE Instrumentation and Measurement Society, and the Chair of the Virtual Reality Task Force of the Intelligent Systems Applications Technical Committee of the IEEE Computational Intelligence Society.

Stefan Preitl (M’03–SM’07) received the Dipl.Ing. degree in electrical engineering and the Ph.D. degree in measurement techniques from the Traian Vuia Polytechnic Institute of Timisoara, Timisoara, Romania, in 1966 and 1983, respectively.

He was with Electromotor S.A., Timisoara, from 1967 to 1972. He is currently with the Politehnica University of Timisoara, Timisoara, where he became a Professor with the Department of Automation and Applied Informatics in 1992, and is currently a Doctoral Supervisor of systems engineering. He is an Honorary Professor of the Doctoral School of Applied Informatics, Óbuda University, Budapest, Hungary. He is the author or co-author of more than 200 papers published in various scientific journals, refereed conference proceedings, and books in the field of automatic control. His current research interests include conventional and advanced structures and algorithms for automatic control applied to power or servo systems, control systems of electrical drives, methodical aspects of teaching, and development of computer-assisted education.

Prof. Preitl is a Board Member of the Romanian Society of Control Engineering and Technical Informatics, the Hungarian Fuzzy Association, the International Federation of Automatic Control Technical Committee on Control Design, and the Doctoral School of Applied Informatics.
