
Proceedings of the 45th IEEE Conference on Decision & Control, WeIP11.3, Manchester Grand Hyatt Hotel, San Diego, CA, USA, December 13-15, 2006

Nash Equilibrium for 2nd-Order Two-Player Non-Zero Sum LQ Games with Executable Decentralized Control Strategies

Xu Wang and Jose B. Cruz, Jr., Life Fellow, IEEE

Abstract: Riccati equations are the basis for constructing Nash equilibrium solutions of linear quadratic (LQ) non-cooperative games when players apply linear state feedback strategies. Diagonal-form solutions of the Riccati equation associated with a second-order two-player non-zero sum linear quadratic game are discussed, and it is shown that the Nash equilibrium can be implemented by decentralized control strategies. The result is also extended to a high order two-player linear quadratic game.

I. INTRODUCTION

In 1960, Kalman [1] introduced his theory (later named the Linear Quadratic Regulator) for time-varying multiple-input multiple-output linear systems with an integral performance index of quadratic form. Using calculus of variations, Kalman showed that the optimal control is a linear feedback of the system states and that the time varying state feedback gains can be obtained by solving a Riccati equation backward in time to steady-state. There are various kinds of Riccati equations (e.g. matrix / operator Riccati algebraic / differential / difference equations) which are associated with optimization problems such as linear optimal control, optimal filtering, and singular perturbation theory. For linear dynamic games with quadratic performance indices, if each player applies a control strategy of linear feedback of system states, the Nash equilibrium solution also depends on the solution of associated Riccati equations.

The Riccati equations associated with game problems are coupled nonlinear equations. Compared with the Riccati equations associated with optimal control problems, the Riccati equations of linear quadratic dynamic games are more complicated because of cross terms, i.e. products involving different players' Riccati equation solutions. There are no general results pertaining to the existence and uniqueness of the solutions of Riccati equations associated with game problems. There are many papers in which the conclusions are given in terms of assumptions of the existence of the solutions of Riccati equations [2-5]. Although sufficient conditions are provided in [6] for the two-person non-cooperative linear quadratic game, the result is difficult to apply because the implicit condition already involves the solutions of the Riccati equation. Some conclusions were obtained for simple cases, e.g. Schumacher and Engwerda [7] studied a sufficient condition such that the solution to the Riccati equation associated with a scalar system exists. Papavassilopoulos and Olsder [8] studied for the first time the global existence of solutions of closed loop Nash matrix Riccati equations, but the game studied in that paper is of special form: the two players must have the same control matrices in the system equation and the same control weighting matrices in the performance indices. In [9], Cruz and Chen provided a series solution to the asymmetric Riccati equations associated with open-loop non-cooperative games. Papavassilopoulos, Cruz and Medanic [10][11] studied sufficient conditions for local existence of solutions of coupled symmetric Riccati equations in nonzero-sum linear-quadratic games, applying Brouwer's fixed-point theorem and a comparison theorem. To the best of our knowledge, there are no general results on the existence of solutions of nth-order M-player Riccati equations associated with game problems.

Xu Wang and Jose B. Cruz, Jr. are with the Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH 43210 USA. (Corresponding author: cruz, phone: 614-292-1588; fax: 614-292-1588; e-mail: cruz wece.osu.edu).

In this paper, necessary and sufficient conditions are provided for the existence of diagonal-form solutions of Riccati equations associated with a linear quadratic second-order two-player non-cooperative game, where each player has complete information structure and applies control strategies of linear feedback of system states. Consequently the Nash equilibrium solution can be constructed when each player uses only a decentralized control strategy. The structure of this paper is arranged as follows: the next section describes a second-order linear quadratic game which involves two players, and the Riccati equation associated with this game is also introduced; then in section III, existence conditions for diagonal-form Riccati equation solutions are provided and consequently the Nash equilibrium solution implemented by decentralized control is obtained explicitly in terms of the system coefficients. In section IV, the result of the second-order system is extended to a 2n-order (n>1) linear system, and section V is the conclusion.

II. PROBLEM STATEMENT

A. A 2nd order 2-player Nonzero-sum Game

A second-order 2-player nonzero-sum linear-quadratic game is described by (1)

$$\dot{x}_1 = a_1 x_1 + a_2 x_2 + b_1 u_1 \quad (x_1(t_0) = x_{10})$$
$$\dot{x}_2 = a_3 x_1 + a_4 x_2 + b_2 u_2 \quad (x_2(t_0) = x_{20}) \tag{1.a}$$

$$J_i(x_0, u_1, u_2) = x^T(t_f) C_i x(t_f) + \int_{t_0}^{t_f} \left( x^T Q_i x + u_i^2 + r_{ij} u_j^2 \right) dt \quad (i, j = 1, 2;\ i \neq j) \tag{1.b}$$

where $x_i \in R$ (i=1,2) are the two system states, which form the state vector $x(t) = (x_1(t), x_2(t))^T$ with $x(t_0) = x_0 = (x_{10}, x_{20})^T$; $a_i \in R$ (i=1,...,4) and $b_i \in R$ (i=1,2) are the scalar system coefficients with $a_i \neq 0$ (i=2,3) and $b_i \neq 0$ (i=1,2). In (1.b), $(\cdot)^T$ means the transpose of a matrix; the state and control weighting matrices are $Q_i, C_i \in R^{2 \times 2}$, $r_{ij} \in R$ (i, j=1,2; i≠j) and

$$Q_i = \begin{pmatrix} q_{i1} & q_{i3} \\ q_{i3} & q_{i2} \end{pmatrix} \ (q_{ii} > 0), \qquad C_i = \begin{pmatrix} c_{i1} & c_{i3} \\ c_{i3} & c_{i2} \end{pmatrix} \ (c_{ii} > 0) \quad (i = 1, 2)$$

Define

$$A = \begin{pmatrix} a_1 & a_2 \\ a_3 & a_4 \end{pmatrix}, \qquad B_1 = \begin{pmatrix} b_1 \\ 0 \end{pmatrix}, \qquad B_2 = \begin{pmatrix} 0 \\ b_2 \end{pmatrix}$$

Then $(A, B_1)$ and $(A, B_2)$ are both controllable. The definiteness of $Q_i$ and $C_i$ (i=1,2) is not mandatory. The requirement of $q_{ii} > 0$ and $c_{ii} > 0$ reflects the fact that player i cares more about the system state $x_i$ in his/her own subsystem, i.e. the ith equation in (1.a) (i=1,2). Suppose, in Game (1), that each player has perfect information about the game model, the other player's performance index structure and the system states. Each player applies a linear state feedback strategy (2) to minimize his/her own performance index (1.b) independently.

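$$u_i(t) = -K_i(t) x(t), \ \text{where } K_i(t) = (k_{i1}(t) \ \ k_{i2}(t)) \quad (k_{ij}(t): [t_0, t_f] \to R \in C^1[t_0, t_f],\ i, j = 1, 2) \tag{2}$$

Here $C^i_{m \times n}$ (i>0) denotes the set of all matrix valued functions whose m×n entries are ith order continuously differentiable functions of an argument within the domain $[t_0, t_f]$. For scalar functions, the subscript m×n, which should be 1×1, will be dropped in the notation $C^i_{m \times n}$. The closed-loop system of (1.a) under (2) is (3).

$$\dot{x}_1 = (a_1 - b_1 k_{11}) x_1 + (a_2 - b_1 k_{12}) x_2 \quad (x_1(t_0) = x_{10})$$
$$\dot{x}_2 = (a_3 - b_2 k_{21}) x_1 + (a_4 - b_2 k_{22}) x_2 \quad (x_2(t_0) = x_{20}) \tag{3}$$

It should be understood that if the game termination time $t_f$ is infinite, then $C_i = 0$ (i=1,2) and (2) should be a constant linear state feedback strategy, i.e. $k_{ij}(t) = k_{ij}$ ($\forall t \in [t_0, \infty)$) (i, j=1,2).

B. Nash Equilibrium Solution

A control strategy pair $(u_1^*, u_2^*)$ is called a Nash equilibrium solution to Game (1) if the following inequalities hold.

$$J_i(x_0, u_i^*, u_j^*) \le J_i(x_0, u_i, u_j^*) \quad (i, j = 1, 2;\ i \neq j)$$

Nash equilibrium solutions are desired for non-cooperative games because they prevent players' unilateral deviations from equilibrium solutions. By Theorem 4 in [12], we have

Theorem 1: (2) will be a Nash equilibrium solution of Game (1) if and only if there is a solution $(\theta_1, \theta_2)$ of algebraic Riccati equation (4.a) for the infinite horizon case and there is a solution $(\theta_1, \theta_2)$ of differential Riccati equation (5) for the finite horizon case. Moreover the Nash equilibrium solution can be constructed as (6). For the infinite horizon case, additional condition (4.b) is required in order to guarantee the stability of the closed-loop system (3).

$$0 = A^T \theta_1 + \theta_1 A + Q_1 - \theta_1 B_1 B_1^T \theta_1 - \theta_1 B_2 B_2^T \theta_2 - \theta_2 B_2 B_2^T \theta_1 + \theta_2 B_2 r_{12} B_2^T \theta_2$$
$$0 = A^T \theta_2 + \theta_2 A + Q_2 - \theta_2 B_2 B_2^T \theta_2 - \theta_2 B_1 B_1^T \theta_1 - \theta_1 B_1 B_1^T \theta_2 + \theta_1 B_1 r_{21} B_1^T \theta_1 \tag{4.a}$$

$$\lambda \in C^-, \quad \forall \lambda \in \Big\{ \text{eigenvalues of } A - \sum_{j=1}^{2} B_j B_j^T \theta_j \Big\} \tag{4.b}$$

$$-\dot{\theta}_1 = A^T \theta_1 + \theta_1 A + Q_1 - \theta_1 B_1 B_1^T \theta_1 - \theta_1 B_2 B_2^T \theta_2 - \theta_2 B_2 B_2^T \theta_1 + \theta_2 B_2 r_{12} B_2^T \theta_2$$
$$-\dot{\theta}_2 = A^T \theta_2 + \theta_2 A + Q_2 - \theta_2 B_2 B_2^T \theta_2 - \theta_2 B_1 B_1^T \theta_1 - \theta_1 B_1 B_1^T \theta_2 + \theta_1 B_1 r_{21} B_1^T \theta_1$$
$$\theta_1(t_f) = C_1, \quad \theta_2(t_f) = C_2 \quad (\forall t \in [t_0, t_f]) \tag{5}$$

$$u_i = -B_i^T \theta_i x \quad (i = 1, 2) \tag{6}$$

Remark 1: In Theorem 1, necessary and sufficient conditions are provided, while in other similar results (e.g. [5]) only sufficient or necessary conditions are discussed.

Denote $\theta_1$ and $\theta_2$ in (4-5) as

$$\theta_1 = \begin{pmatrix} \phi_1 & \phi_3 \\ \phi_3 & \phi_2 \end{pmatrix}, \qquad \theta_2 = \begin{pmatrix} \psi_1 & \psi_3 \\ \psi_3 & \psi_2 \end{pmatrix}$$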
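The coupled equations (4.a) and the feedback law (6) can be evaluated mechanically for any candidate pair $(\theta_1, \theta_2)$. The following is a minimal sketch in plain Python, not the authors' code: the helper names (`matmul`, `lincomb`, `are_residual`, `feedback_gain`) and all numerical values are illustrative assumptions, chosen so that the candidate pair happens to be an exact solution.

```python
# Minimal list-based matrix helpers so the sketch needs no third-party packages.
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def lincomb(terms):
    """Sum of (coefficient, matrix) pairs of equal shape."""
    rows, cols = len(terms[0][1]), len(terms[0][1][0])
    return [[sum(c * M[i][j] for c, M in terms) for j in range(cols)]
            for i in range(rows)]

def are_residual(A, Bi, Bj, Qi, rij, Ti, Tj):
    """Right-hand side of player i's equation in (4.a); zero at a solution."""
    Si = matmul(Bi, transpose(Bi))  # B_i B_i^T
    Sj = matmul(Bj, transpose(Bj))  # B_j B_j^T
    return lincomb([
        (1.0, matmul(transpose(A), Ti)),
        (1.0, matmul(Ti, A)),
        (1.0, Qi),
        (-1.0, matmul(matmul(Ti, Si), Ti)),
        (-1.0, matmul(matmul(Ti, Sj), Tj)),
        (-1.0, matmul(matmul(Tj, Sj), Ti)),
        (rij, matmul(matmul(Tj, Sj), Tj)),
    ])

def feedback_gain(Bi, Ti):
    """Row vector K_i = B_i^T theta_i, so that u_i = -K_i x as in (6)."""
    return matmul(transpose(Bi), Ti)[0]

# Illustrative data (assumed for this sketch, not taken from the paper).
A = [[1.0, 2.0], [-1.0, -1.0]]
B1, B2 = [[1.0], [0.0]], [[0.0], [2.0]]
Q1 = [[3.0, 0.5], [0.5, 38.0]]
Q2 = [[-16.0, -0.5], [-0.5, 2.0]]
r12, r21 = 1.0, 2.0
T1 = [[3.0, 0.0], [0.0, 6.5]]  # candidate theta_1
T2 = [[0.5, 0.0], [0.0, 0.5]]  # candidate theta_2

res1 = are_residual(A, B1, B2, Q1, r12, T1, T2)  # player 1 equation
res2 = are_residual(A, B2, B1, Q2, r21, T2, T1)  # player 2 equation
K1, K2 = feedback_gain(B1, T1), feedback_gain(B2, T2)
```

For this data both residual matrices vanish, and because the candidate matrices are diagonal each gain has a single nonzero entry, so each control uses only its own state.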

III. NASH EQUILIBRIUM IMPLEMENTED BY DECENTRALIZED CONTROL

From part II, we know that Nash equilibrium solutions depend on the solutions of Riccati equations. Observing (6) and the structure of $B_i$ (i=1,2), if $\theta_i$ (i=1,2) is diagonal, then $u_i$ will be a strategy only involving state $x_i$ (i=1,2) and the resulting closed-loop system is a decentralized control system. One of the merits of decentralized systems is the reduction of system complexity. Sometimes players may have incomplete information structure due to physical constraints, e.g. player i only has access to state $x_i$ (i=1,2), so that the Nash equilibrium solution can only be formed by partial state feedback strategies. This is another motivation to study the possibility of constructing Nash equilibrium solutions by decentralized control. In this part, conditions on system coefficients to guarantee the diagonal-form solution of Riccati equations (4.a) and (5) are discussed first. Then on the basis of these Riccati equation solutions, the Nash equilibrium solution implemented by decentralized control is provided.

A. Diagonal-Form Riccati Equation Solution

1) Algebraic Riccati Equation

For algebraic Riccati equation (4.a), we have

Theorem 2: For algebraic Riccati equation (4.a), there is a diagonal-form solution (7) if and only if condition (8) is satisfied.

$$\phi_1 = (a_1 + d_1)/b_1^2, \quad \phi_2 = -[a_2 (a_1 + d_1) + q_{13} b_1^2]/(a_3 b_1^2), \quad \phi_3 = 0 \tag{7.a}$$

$$\psi_1 = -[a_3 (a_4 + d_2) + q_{23} b_2^2]/(a_2 b_2^2), \quad \psi_2 = (a_4 + d_2)/b_2^2, \quad \psi_3 = 0 \tag{7.b}$$

$$q_{12} = -r_{12} (a_4 + d_2)^2 / b_2^2 - 2 [a_2 (a_1 + d_1) + q_{13} b_1^2]\, d_2 / (a_3 b_1^2) \tag{8.a}$$

$$q_{21} = -r_{21} (a_1 + d_1)^2 / b_1^2 - 2 [a_3 (a_4 + d_2) + q_{23} b_2^2]\, d_1 / (a_2 b_2^2) \tag{8.b}$$

where $d_1 = \sqrt{a_1^2 + b_1^2 q_{11}}$ and $d_2 = \sqrt{a_4^2 + b_2^2 q_{22}}$.

Proof of Theorem 2: After substituting all the coefficient matrices into (4.a), it is equivalent to (9)

$$0 = 2 a_1 \phi_1 + 2 a_3 \phi_3 + q_{11} - b_1^2 \phi_1^2 - 2 b_2^2 \phi_3 \psi_3 + r_{12} b_2^2 \psi_3^2$$
$$0 = a_2 \phi_1 + a_3 \phi_2 + (a_1 + a_4) \phi_3 + q_{13} - b_1^2 \phi_1 \phi_3 - b_2^2 (\phi_3 \psi_2 + \phi_2 \psi_3) + r_{12} b_2^2 \psi_2 \psi_3$$
$$0 = 2 a_4 \phi_2 + 2 a_2 \phi_3 + q_{12} - b_1^2 \phi_3^2 - 2 b_2^2 \phi_2 \psi_2 + r_{12} b_2^2 \psi_2^2 \tag{9.a-9.c}$$

$$0 = 2 a_1 \psi_1 + 2 a_3 \psi_3 + q_{21} - b_2^2 \psi_3^2 - 2 b_1^2 \phi_1 \psi_1 + r_{21} b_1^2 \phi_1^2$$
$$0 = a_2 \psi_1 + a_3 \psi_2 + (a_1 + a_4) \psi_3 + q_{23} - b_2^2 \psi_2 \psi_3 - b_1^2 (\phi_3 \psi_1 + \phi_1 \psi_3) + r_{21} b_1^2 \phi_1 \phi_3$$
$$0 = 2 a_4 \psi_2 + 2 a_2 \psi_3 + q_{22} - b_2^2 \psi_2^2 - 2 b_1^2 \phi_3 \psi_3 + r_{21} b_1^2 \phi_3^2 \tag{9.d-9.f}$$

When condition (8) holds, substituting (7) into the right hand side of the equations in (9), we will get six zeros. So (7) is a solution to (9) under condition (8). Thus sufficiency is proved.

Note that if $\phi_3 = \psi_3 = 0$, (9.a) and (9.f) are one-variable second-degree polynomials of the Riccati equation solutions $\phi_1$ and $\psi_2$ respectively; (9.b) is a two-variable first-degree polynomial of $\phi_1$ and $\phi_2$, and (9.e) is a two-variable first-degree polynomial of $\psi_1$ and $\psi_2$. Solve (9.a) and (9.f) and select the nonnegative solution to guarantee the negative feedback (6) for each player when the effect of $b_i$ (i=1,2) in (1) is already considered. The corresponding solution can be described by (7). But the coupled equations (9.c) and (9.d), which simultaneously involve the entries of the two matrices $\theta_1$ and $\theta_2$, should be satisfied too. Substitute the solution (7) into (9.c) and (9.d) and we know that they can be satisfied if and only if condition (8) holds. Thus necessity is proved too.
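The algebra in the proof of Theorem 2 can be replayed numerically. The sketch below (plain Python; the function names and the numerical coefficients are assumptions made for illustration) evaluates the diagonal solution (7), computes the weights $q_{12}$ and $q_{21}$ that condition (8) demands, and then checks the six scalar equations in (9).

```python
import math

def diagonal_solution(a, b, q11, q13, q22, q23, r12, r21):
    """Evaluate (7) and the values of q12, q21 required by condition (8)."""
    a1, a2, a3, a4 = a
    b1, b2 = b
    d1 = math.sqrt(a1**2 + b1**2 * q11)
    d2 = math.sqrt(a4**2 + b2**2 * q22)
    phi1 = (a1 + d1) / b1**2
    phi2 = -(a2 * (a1 + d1) + q13 * b1**2) / (a3 * b1**2)
    psi1 = -(a3 * (a4 + d2) + q23 * b2**2) / (a2 * b2**2)
    psi2 = (a4 + d2) / b2**2
    # (8.a) and (8.b), written with phi2 and psi1 substituted in.
    q12 = -r12 * (a4 + d2)**2 / b2**2 + 2.0 * d2 * phi2
    q21 = -r21 * (a1 + d1)**2 / b1**2 + 2.0 * d1 * psi1
    return {"phi": (phi1, phi2, 0.0), "psi": (psi1, psi2, 0.0),
            "q12": q12, "q21": q21}

def residuals_9(a, b, Q1, Q2, r12, r21, phi, psi):
    """Right-hand sides of (9.a)-(9.f); Qi is given as (q_i1, q_i2, q_i3)."""
    a1, a2, a3, a4 = a
    b1s, b2s = b[0]**2, b[1]**2
    q11, q12, q13 = Q1
    q21, q22, q23 = Q2
    f1, f2, f3 = phi
    g1, g2, g3 = psi
    return [
        2*a1*f1 + 2*a3*f3 + q11 - b1s*f1**2 - 2*b2s*f3*g3 + r12*b2s*g3**2,
        a2*f1 + a3*f2 + (a1 + a4)*f3 + q13 - b1s*f1*f3
            - b2s*(f3*g2 + f2*g3) + r12*b2s*g2*g3,
        2*a4*f2 + 2*a2*f3 + q12 - b1s*f3**2 - 2*b2s*f2*g2 + r12*b2s*g2**2,
        2*a1*g1 + 2*a3*g3 + q21 - b2s*g3**2 - 2*b1s*f1*g1 + r21*b1s*f1**2,
        a2*g1 + a3*g2 + (a1 + a4)*g3 + q23 - b2s*g2*g3
            - b1s*(f3*g1 + f1*g3) + r21*b1s*f1*f3,
        2*a4*g2 + 2*a2*g3 + q22 - b2s*g2**2 - 2*b1s*f3*g3 + r21*b1s*f3**2,
    ]

# Illustrative coefficients (assumed): a = (a1, a2, a3, a4), b = (b1, b2).
a, b = (1.0, 2.0, -1.0, -1.0), (1.0, 2.0)
q11, q13, q22, q23, r12, r21 = 3.0, 0.5, 2.0, -0.5, 1.0, 2.0

sol = diagonal_solution(a, b, q11, q13, q22, q23, r12, r21)
Q1 = (q11, sol["q12"], q13)
Q2 = (sol["q21"], q22, q23)
res = residuals_9(a, b, Q1, Q2, r12, r21, sol["phi"], sol["psi"])
worst = max(abs(v) for v in res)
```

Choosing $q_{12}$ or $q_{21}$ differently from the values returned here leaves (9.c) or (9.d) with a nonzero residual, which is the necessity part of the theorem.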

2) Differential Riccati Equation

For the differential Riccati equation (5), Theorems 3 and 4 describe the constant and time varying diagonal-form solutions respectively.

Theorem 3: For the differential Riccati equation (5), there is a unique time-invariant diagonal-form solution (7) if and only if the conditions (8) and (10) are satisfied.

$$c_{11} = (a_1 + d_1)/b_1^2, \quad c_{12} = -[a_2 (a_1 + d_1) + q_{13} b_1^2]/(a_3 b_1^2), \quad c_{13} = 0 \tag{10.a}$$

$$c_{21} = -[a_3 (a_4 + d_2) + q_{23} b_2^2]/(a_2 b_2^2), \quad c_{22} = (a_4 + d_2)/b_2^2, \quad c_{23} = 0 \tag{10.b}$$

Proof of Theorem 3: Again, after substituting all the coefficient matrices into (5), we have (11)

$$-\dot{\phi}_1 = 2 a_1 \phi_1 + 2 a_3 \phi_3 + q_{11} - b_1^2 \phi_1^2 - 2 b_2^2 \phi_3 \psi_3 + r_{12} b_2^2 \psi_3^2$$
$$-\dot{\phi}_3 = a_2 \phi_1 + a_3 \phi_2 + (a_1 + a_4) \phi_3 + q_{13} - b_1^2 \phi_1 \phi_3 - b_2^2 (\phi_3 \psi_2 + \phi_2 \psi_3) + r_{12} b_2^2 \psi_2 \psi_3$$
$$-\dot{\phi}_2 = 2 a_4 \phi_2 + 2 a_2 \phi_3 + q_{12} - b_1^2 \phi_3^2 - 2 b_2^2 \phi_2 \psi_2 + r_{12} b_2^2 \psi_2^2$$
$$\phi_i(t_f) = c_{1i} \quad (i = 1, 2, 3) \quad (\forall t \in [t_0, t_f]) \tag{11.a-11.c}$$

$$-\dot{\psi}_1 = 2 a_1 \psi_1 + 2 a_3 \psi_3 + q_{21} - b_2^2 \psi_3^2 - 2 b_1^2 \phi_1 \psi_1 + r_{21} b_1^2 \phi_1^2$$
$$-\dot{\psi}_3 = a_2 \psi_1 + a_3 \psi_2 + (a_1 + a_4) \psi_3 + q_{23} - b_2^2 \psi_2 \psi_3 - b_1^2 (\phi_3 \psi_1 + \phi_1 \psi_3) + r_{21} b_1^2 \phi_1 \phi_3$$
$$-\dot{\psi}_2 = 2 a_4 \psi_2 + 2 a_2 \psi_3 + q_{22} - b_2^2 \psi_2^2 - 2 b_1^2 \phi_3 \psi_3 + r_{21} b_1^2 \phi_3^2$$
$$\psi_i(t_f) = c_{2i} \quad (i = 1, 2, 3) \quad (\forall t \in [t_0, t_f]) \tag{11.d-11.f}$$

Note that solution (7) is an equilibrium for the differential equation (11) if condition (8) is satisfied. If the boundary condition in (11) satisfies (10), then the termination value of differential equation (11) will be the equilibrium. Also note that (11) is an autonomous ordinary differential equation (its right-hand side does not depend explicitly on t), so that if the boundary condition is the equilibrium then the solution will be time-invariant and identical with the equilibrium for any value t within $[t_0, t_f]$. Due to the constraint of $c_{ii} > 0$ in section II, the non-positive equilibrium for (11.a) and (11.f) cannot be selected. Thus (7) is the only time-invariant solution. So Theorem 3 is proved.
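The argument of Theorem 3 can be illustrated by integrating (11) backward from the terminal time. The sketch below is only an illustration under assumed data: it uses a hand-written classical Runge-Kutta scheme in the reversed time variable $s = t_f - t$ (so that $d\theta/ds$ equals the right-hand side of (11)), and the names `riccati_rhs` and `integrate_backward` are hypothetical. Starting from the terminal weights (10) the solution should stay constant; starting from a different $c_{11}$ it should move.

```python
def riccati_rhs(p, y):
    """Right-hand sides of (11); y = (phi1, phi2, phi3, psi1, psi2, psi3)."""
    f1, f2, f3, g1, g2, g3 = y
    a1, a2, a3, a4 = p["a"]
    b1s, b2s = p["b"][0]**2, p["b"][1]**2
    q11, q12, q13 = p["Q1"]          # stored as (q_i1, q_i2, q_i3)
    q21, q22, q23 = p["Q2"]
    r12, r21 = p["r12"], p["r21"]
    return (
        2*a1*f1 + 2*a3*f3 + q11 - b1s*f1**2 - 2*b2s*f3*g3 + r12*b2s*g3**2,
        2*a4*f2 + 2*a2*f3 + q12 - b1s*f3**2 - 2*b2s*f2*g2 + r12*b2s*g2**2,
        a2*f1 + a3*f2 + (a1 + a4)*f3 + q13 - b1s*f1*f3
            - b2s*(f3*g2 + f2*g3) + r12*b2s*g2*g3,
        2*a1*g1 + 2*a3*g3 + q21 - b2s*g3**2 - 2*b1s*f1*g1 + r21*b1s*f1**2,
        2*a4*g2 + 2*a2*g3 + q22 - b2s*g2**2 - 2*b1s*f3*g3 + r21*b1s*f3**2,
        a2*g1 + a3*g2 + (a1 + a4)*g3 + q23 - b2s*g2*g3
            - b1s*(f3*g1 + f1*g3) + r21*b1s*f1*f3,
    )

def integrate_backward(p, y_tf, horizon, steps):
    """RK4 in s = tf - t, where d(theta)/ds equals the right side of (11)."""
    h = horizon / steps
    y = tuple(y_tf)
    path = [y]
    for _ in range(steps):
        k1 = riccati_rhs(p, y)
        k2 = riccati_rhs(p, tuple(v + 0.5*h*k for v, k in zip(y, k1)))
        k3 = riccati_rhs(p, tuple(v + 0.5*h*k for v, k in zip(y, k2)))
        k4 = riccati_rhs(p, tuple(v + h*k for v, k in zip(y, k3)))
        y = tuple(v + h*(s1 + 2*s2 + 2*s3 + s4)/6.0
                  for v, s1, s2, s3, s4 in zip(y, k1, k2, k3, k4))
        path.append(y)
    return path

def max_drift(path):
    """Largest departure of any entry from its terminal value."""
    return max(abs(v - v0) for y in path for v, v0 in zip(y, path[0]))

# Illustrative data (assumed); Q1 and Q2 already satisfy condition (8).
p = {"a": (1.0, 2.0, -1.0, -1.0), "b": (1.0, 2.0),
     "Q1": (3.0, 38.0, 0.5), "Q2": (-16.0, 2.0, -0.5),
     "r12": 1.0, "r21": 2.0}
terminal_10 = (3.0, 6.5, 0.0, 0.5, 0.5, 0.0)   # terminal weights from (10)
drift_equilibrium = max_drift(integrate_backward(p, terminal_10, 1.0, 200))

terminal_other = (2.0, 6.5, 0.0, 0.5, 0.5, 0.0)  # c11 violates (10.a)
drift_other = max_drift(integrate_backward(p, terminal_other, 0.1, 100))
```

In the second run $\phi_3$ also leaves zero, so the solution is no longer of diagonal form, which matches the role of condition (10) in the theorem.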

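Theorem 4: Riccati differential equation (5) will have a unique diagonal-form time-varying solution (12) if and only if condition (13) holds.

$$\phi_1(t) = \frac{c_{11}}{1 + c_{11} b_1^2 (t_f - t)}, \quad \phi_2(t) = \frac{-a_2 c_{11}}{a_3 [1 + c_{11} b_1^2 (t_f - t)]}, \quad \phi_3(t) = 0 \quad (\forall t \in [t_0, t_f]) \tag{12.a}$$

$$\psi_1(t) = \frac{-a_3 c_{22}}{a_2 [1 + c_{22} b_2^2 (t_f - t)]}, \quad \psi_2(t) = \frac{c_{22}}{1 + c_{22} b_2^2 (t_f - t)}, \quad \psi_3(t) = 0 \quad (\forall t \in [t_0, t_f]) \tag{12.b}$$

$$a_1 = a_4 = c_{13} = c_{23} = 0, \quad Q_1 = Q_2 = 0, \quad b_1^2/b_2^2 = -a_2 r_{21}/a_3,$$
$$c_{11} = -c_{22} a_3 r_{12}/a_2, \quad c_{12} = -a_2 c_{11}/a_3, \quad c_{21} = -a_3 c_{22}/a_2, \quad r_{12} r_{21} = 1 \tag{13}$$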
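Condition (13) can be sanity-checked by substituting the time-varying solution (12) into the scalar equations (11) with $a_1 = a_4 = 0$, $Q_1 = Q_2 = 0$ and $\phi_3 = \psi_3 = 0$. The sketch below does this with central finite differences for the time derivatives; the function names, the step size `h` and the numerical values are assumptions chosen for illustration.

```python
def theta_entries(t, tf, a2, a3, b1, b2, c11, c22):
    """Entries of (12) at time t: returns (phi1, phi2, psi1, psi2)."""
    phi1 = c11 / (1.0 + c11 * b1**2 * (tf - t))
    psi2 = c22 / (1.0 + c22 * b2**2 * (tf - t))
    return phi1, -a2 * phi1 / a3, -a3 * psi2 / a2, psi2

def residuals_11(t, tf, a2, a3, b1, b2, r12, r21, c11, c22, h=1e-5):
    """Residuals of (11) when a1 = a4 = 0, Q1 = Q2 = 0, phi3 = psi3 = 0."""
    args = (tf, a2, a3, b1, b2, c11, c22)
    phi1, phi2, psi1, psi2 = theta_entries(t, *args)
    fwd = theta_entries(t + h, *args)
    bwd = theta_entries(t - h, *args)
    # Central differences approximate the time derivatives of (12).
    dphi1, dphi2, dpsi1, dpsi2 = [(f - b) / (2.0 * h) for f, b in zip(fwd, bwd)]
    return [
        -dphi1 - (-b1**2 * phi1**2),                                    # phi1
        a2 * phi1 + a3 * phi2,                                          # phi3
        -dphi2 - (-2*b2**2 * phi2 * psi2 + r12 * b2**2 * psi2**2),      # phi2
        -dpsi1 - (-2*b1**2 * phi1 * psi1 + r21 * b1**2 * phi1**2),      # psi1
        a2 * psi1 + a3 * psi2,                                          # psi3
        -dpsi2 - (-b2**2 * psi2**2),                                    # psi2
    ]

# Illustrative data (assumed) built to satisfy (13).
a2, a3, b1, b2 = 1.0, -1.0, 2.0, 1.0
r21 = 4.0                      # b1^2 / b2^2 = -a2 * r21 / a3 = 4
r12 = 1.0 / r21                # r12 * r21 = 1
c22 = 1.0
c11 = -c22 * a3 * r12 / a2     # = 0.25
tf = 1.0

worst = max(abs(v)
            for t in (0.0, 0.5, 0.9)
            for v in residuals_11(t, tf, a2, a3, b1, b2, r12, r21, c11, c22))

# Breaking r12 * r21 = 1 while keeping everything else violates the phi2 equation.
broken = residuals_11(0.0, tf, a2, a3, b1, b2, 0.5, r21, c11, c22)
```

With the data above $c_{11} b_1^2 = c_{22} b_2^2$, so both denominators in (12) coincide; this is what allows the coupled equations for $\phi_2$ and $\psi_1$ to hold for every t.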

Proof of Theorem 4: When $a_1 = a_4 = 0$, $Q_1 = Q_2 = 0$ and $\phi_3 = \psi_3 = 0$, from (11.a), (11.b), (11.e) and (11.f), then

$$\phi_1(t) = \frac{c_{11}}{1 + c_{11} b_1^2 (t_f - t)}, \quad \phi_2(t) = -a_2 \phi_1(t)/a_3 \quad (\forall t \in [t_0, t_f]) \tag{14.a}$$

$$\psi_1(t) = -a_3 \psi_2(t)/a_2, \quad \psi_2(t) = \frac{c_{22}}{1 + c_{22} b_2^2 (t_f - t)} \quad (\forall t \in [t_0, t_f]) \tag{14.b}$$

Substituting (14) into (11.c) and (11.d), the necessary and sufficient conditions such that (11.c) and (11.d) can be satisfied with $\phi_3(t) = \psi_3(t) = 0$ are

$$r_{12} r_{21} = 1, \quad b_1^2/b_2^2 = -a_2 r_{21}/a_3, \quad c_{11} = -c_{22} a_3 r_{12}/a_2,$$
$$c_{12} = -a_2 c_{11}/a_3, \quad c_{21} = -a_3 c_{22}/a_2 \tag{15}$$

If $\phi_3(t) = \psi_3(t) = 0$, then the boundary condition should be $c_{13} = c_{23} = 0$. There are no other diagonal-form time-varying solutions to (11). The proof is as follows. If the nonzero termination state weighting matrices $C_1$ and $C_2$ in (11) are some values other than those in (13), and of the two parameters $a_1$ and $q_{11}$ at least one is nonzero (similarly for parameters $a_4$ and $q_{22}$), $\phi_1(t)$ and $\psi_2(t)$ can be analytically solved as (16) from (11.a) and (11.f) respectively.

$$\phi_1(t) = \frac{a_1 + d_1 + (d_1 - a_1) \dfrac{c_{11} b_1^2 - a_1 - d_1}{c_{11} b_1^2 - a_1 + d_1} \exp(2 d_1 (t - t_f))}{b_1^2 \left[ 1 - \dfrac{c_{11} b_1^2 - a_1 - d_1}{c_{11} b_1^2 - a_1 + d_1} \exp(2 d_1 (t - t_f)) \right]} \quad (\forall t \in [t_0, t_f]) \tag{16.a}$$

$$\psi_2(t) = \frac{a_4 + d_2 + (d_2 - a_4) \dfrac{c_{22} b_2^2 - a_4 - d_2}{c_{22} b_2^2 - a_4 + d_2} \exp(2 d_2 (t - t_f))}{b_2^2 \left[ 1 - \dfrac{c_{22} b_2^2 - a_4 - d_2}{c_{22} b_2^2 - a_4 + d_2} \exp(2 d_2 (t - t_f)) \right]} \quad (\forall t \in [t_0, t_f]) \tag{16.b}$$

From (11.b), (11.e) and (16), $\phi_2(t)$ and $\psi_1(t)$ can be solved correspondingly. In order to satisfy (11.c) and (11.d), two of the necessary and sufficient conditions are

$$\sqrt{a_1^2 + b_1^2 q_{11}} = -\sqrt{a_1^2 + b_1^2 q_{11}}, \qquad \sqrt{a_4^2 + b_2^2 q_{22}} = -\sqrt{a_4^2 + b_2^2 q_{22}} \tag{17}$$

According to the assumptions, $b_i \neq 0$ (i=1,2), $a_1 \neq 0$ and / or $q_{11} \neq 0$, $a_4 \neq 0$ and / or $q_{22} \neq 0$. So (17) cannot be satisfied. Theorem 4 is proved.

The Riccati equations associated with game problems are very difficult to solve. The above discussion illustrates this difficulty because even for this 2nd order 2-player game problem extensive calculation is involved in order to derive Riccati equation solutions.

B. Nash Equilibrium by Decentralized Control

Based on the analysis of the above subsection, we can construct the Nash equilibrium strategy implemented by decentralized control for Game (1).

1) Infinite Horizon Game

From Theorems 1 and 2, we naturally have Theorem 5, which provides the decentralized Nash equilibrium solution for Game (1) for the infinite horizon case.

Theorem 5: When $t_f \to \infty$, Game (1) has a decentralized Nash equilibrium solution (18) if and only if conditions (8) and (19) are satisfied.

$$u_1 = -\frac{a_1 + \sqrt{a_1^2 + b_1^2 q_{11}}}{b_1} x_1 \tag{18.a}$$

$$u_2 = -\frac{a_4 + \sqrt{a_4^2 + b_2^2 q_{22}}}{b_2} x_2 \tag{18.b}$$

$$\sqrt{a_1^2 + b_1^2 q_{11}} \sqrt{a_4^2 + b_2^2 q_{22}} - a_2 a_3 > 0 \tag{19}$$

Proof of Theorem 5: Under condition (8), (18) is a direct result after substituting (7) into (6). And consequently the closed-loop system becomes (20).

$$\dot{x}_1 = -\sqrt{a_1^2 + b_1^2 q_{11}}\, x_1 + a_2 x_2 \quad (x_1(t_0) = x_{10})$$
$$\dot{x}_2 = a_3 x_1 - \sqrt{a_4^2 + b_2^2 q_{22}}\, x_2 \quad (x_2(t_0) = x_{20}) \tag{20}$$

Notice that (19) is required by condition (4.b). Suppose the two eigenvalues of (20) are $\lambda_1$ and $\lambda_2$. Then we have

$$\lambda_1 + \lambda_2 = -\left( \sqrt{a_1^2 + b_1^2 q_{11}} + \sqrt{a_4^2 + b_2^2 q_{22}} \right) < 0 \tag{21.a}$$

$$\lambda_1 \lambda_2 = \sqrt{a_1^2 + b_1^2 q_{11}} \sqrt{a_4^2 + b_2^2 q_{22}} - a_2 a_3 \tag{21.b}$$

The inequality in (21.a) is evident. If the two eigenvalues are a pair of complex conjugate numbers, then (21.a) guarantees that they must have negative real parts and (20) is stable. If the two eigenvalues are real numbers, by (19), their product is positive. Hence (21) tells us that both of the two eigenvalues must be negative and (4.b) is satisfied. This completes the proof of Theorem 5.

2) Finite Horizon Game

For the finite horizon game, we have

Theorem 6: For Game (1) with finite horizon, 1) there is a unique Nash equilibrium solution (18) implemented by decentralized control with constant state feedback gains if and only if conditions (8) and (10) are satisfied; 2) there is a unique Nash equilibrium solution (22) implemented by decentralized control with time varying state feedback gains if and only if condition (13) is satisfied.

$$u_1 = -\frac{b_1 c_{11} x_1}{1 + c_{11} b_1^2 (t_f - t)} \quad (\forall t \in [t_0, t_f]) \tag{22.a}$$

$$u_2 = -\frac{b_2 c_{22} x_2}{1 + c_{22} b_2^2 (t_f - t)} \quad (\forall t \in [t_0, t_f]) \tag{22.b}$$

Proof of Theorem 6: The proof of Theorem 6 is straightforward. Note that the state in (22) is that of the closed-loop system (23).

$$\dot{x}_1 = \left( a_1 - \frac{b_1^2 c_{11}}{1 + c_{11} b_1^2 (t_f - t)} \right) x_1 + a_2 x_2$$
$$\dot{x}_2 = a_3 x_1 + \left( a_4 - \frac{b_2^2 c_{22}}{1 + c_{22} b_2^2 (t_f - t)} \right) x_2 \tag{23}$$
$$(x_1(t_0) = x_{10},\ x_2(t_0) = x_{20})$$
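The eigenvalue argument used in the proof of Theorem 5 is easy to reproduce for concrete numbers. The following sketch (plain Python; function names and coefficient values are illustrative assumptions) computes the eigenvalues of the closed-loop system (20) from its trace and determinant, as in (21), and compares the outcome with condition (19).

```python
import cmath
import math

def closed_loop_eigenvalues(a1, a2, a3, a4, b1, b2, q11, q22):
    """Eigenvalues of the closed-loop matrix [[-d1, a2], [a3, -d2]] of (20)."""
    d1 = math.sqrt(a1**2 + b1**2 * q11)
    d2 = math.sqrt(a4**2 + b2**2 * q22)
    trace = -(d1 + d2)           # lambda1 + lambda2, see (21.a)
    det = d1 * d2 - a2 * a3      # lambda1 * lambda2, see (21.b)
    root = cmath.sqrt(trace**2 - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0

def condition_19(a1, a2, a3, a4, b1, b2, q11, q22):
    """True when inequality (19) holds."""
    d1 = math.sqrt(a1**2 + b1**2 * q11)
    d2 = math.sqrt(a4**2 + b2**2 * q22)
    return d1 * d2 - a2 * a3 > 0.0

# Two illustrative parameter sets (assumed): (a1, a2, a3, a4, b1, b2, q11, q22).
case_ok = (1.0, 2.0, -1.0, -1.0, 1.0, 2.0, 3.0, 2.0)    # a2*a3 < 0
case_bad = (1.0, 4.0, 4.0, -1.0, 1.0, 2.0, 3.0, 2.0)    # a2*a3 exceeds d1*d2

eig_ok = closed_loop_eigenvalues(*case_ok)
eig_bad = closed_loop_eigenvalues(*case_bad)
stable_ok = all(ev.real < 0.0 for ev in eig_ok)
stable_bad = all(ev.real < 0.0 for ev in eig_bad)
```

In the first case the eigenvalues form a complex pair with negative real part; in the second the determinant is negative, so one real eigenvalue is positive and (4.b) fails.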

Remark 2

1) Each player's performance index corresponding to the Nash equilibrium solution can be computed as

$$J_i(x_0, u_1^*, u_2^*) = (x_{10}, x_{20})\, \theta_i(t_0)\, (x_{10}, x_{20})^T \quad (i = 1, 2) \tag{24}$$

2) In the infinite horizon game, if $a_2 a_3 < 0$, then condition (19) will be satisfied automatically and the closed-loop system (20) is stable. Under this assumption, the closed-loop system (20) has a negative feedback loop connection structure. The assumption $a_2 a_3 < 0$ is a special case for (19). If $a_2$ and $a_3$ have the same sign, as long as $\sqrt{a_1^2 + b_1^2 q_{11}} \sqrt{a_4^2 + b_2^2 q_{22}}$ is large enough such that (19) is satisfied, the closed-loop system will still be stable.

3) Solutions to algebraic Riccati equation (4.a) do not guarantee stability of the closed-loop system. We still need specific explicit conditions for each concrete game, e.g. (19) is required for infinite horizon Game (1) to have a stabilizing decentralized Nash equilibrium solution (18).

4) Because we do not add any constraints on definiteness of $Q_i$ and $r_{ij}$ (i, j=1,2; i≠j), Riccati equation solutions (7) and / or (12) may be indefinite. But note that $\phi_1$ and $\psi_2$ in (7) and (12) are nonnegative. This is consistent with the facts that $q_{ii} > 0$, $c_{ii} > 0$ and $u_i^2$ has a positive coefficient (which is actually normalized to be 1) in $J_i$ (i=1,2).

5) Riccati equations (4.a) and (5) may have multiple solutions. So there may exist multiple Nash equilibria for Game (1). In this section, only the diagonal-form Riccati equation solution is discussed. The Nash equilibria constructed on the basis of diagonal-form Riccati equation solutions are of special partial linear state feedback structure. The resultant closed-loop system is a decentralized control system.

6) By the definition of strong time consistency in [13], the strong time consistency property of Nash equilibrium (18) and (22) is guaranteed by the dynamic programming method used in the proof of Theorem 4 in [12].

7) In condition (13), if $r_{12} < 0$ and consequently $r_{21} < 0$, then $a_2 a_3 > 0$, $c_{12} < 0$ and $c_{21} < 0$. This implies that the two players have conflicts in their performance indices and we obtain an adversarial game. If $r_{12} > 0$ and consequently $r_{21} > 0$, then we have $a_2 a_3 < 0$, $c_{12} > 0$, $c_{21} > 0$ and the diagonal-form Riccati equation solution (12) is positive definite for any t within $[t_0, t_f]$. This implies that the effort that one player made to improve his/her performance index will also benefit the other player so that the two players actually have a cooperative relation.
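Item 1) of Remark 2 can be illustrated for the infinite horizon case by simulating the closed-loop system under (18) and accumulating both performance indices along the trajectory. This is a rough sketch under assumed data: the integrator is a hand-written fourth-order Runge-Kutta scheme, the horizon is truncated at a finite time long enough for the state to decay, and the name `simulate_costs` is hypothetical.

```python
import math

def simulate_costs(p, x0, horizon=10.0, steps=10000):
    """Integrate (1.a) under the decentralized strategies (18) and return (J1, J2)."""
    a1, a2, a3, a4 = p["a"]
    b1, b2 = p["b"]
    q11, q12, q13 = p["Q1"]          # stored as (q_i1, q_i2, q_i3)
    q21, q22, q23 = p["Q2"]
    r12, r21 = p["r12"], p["r21"]
    k1 = (a1 + math.sqrt(a1**2 + b1**2 * q11)) / b1   # u1 = -k1 * x1
    k2 = (a4 + math.sqrt(a4**2 + b2**2 * q22)) / b2   # u2 = -k2 * x2

    def deriv(s):
        x1, x2, _, _ = s
        u1, u2 = -k1 * x1, -k2 * x2
        xq1x = q11*x1*x1 + 2.0*q13*x1*x2 + q12*x2*x2
        xq2x = q21*x1*x1 + 2.0*q23*x1*x2 + q22*x2*x2
        return (a1*x1 + a2*x2 + b1*u1,
                a3*x1 + a4*x2 + b2*u2,
                xq1x + u1*u1 + r12*u2*u2,    # running cost of player 1
                xq2x + u2*u2 + r21*u1*u1)    # running cost of player 2

    h = horizon / steps
    s = (x0[0], x0[1], 0.0, 0.0)
    for _ in range(steps):
        d1 = deriv(s)
        d2 = deriv(tuple(v + 0.5*h*d for v, d in zip(s, d1)))
        d3 = deriv(tuple(v + 0.5*h*d for v, d in zip(s, d2)))
        d4 = deriv(tuple(v + h*d for v, d in zip(s, d3)))
        s = tuple(v + h*(e1 + 2*e2 + 2*e3 + e4)/6.0
                  for v, e1, e2, e3, e4 in zip(s, d1, d2, d3, d4))
    return s[2], s[3]

# Illustrative data (assumed); Q1 and Q2 satisfy condition (8) for these coefficients.
p = {"a": (1.0, 2.0, -1.0, -1.0), "b": (1.0, 2.0),
     "Q1": (3.0, 38.0, 0.5), "Q2": (-16.0, 2.0, -0.5),
     "r12": 1.0, "r21": 2.0}
x0 = (1.0, -1.0)
J1, J2 = simulate_costs(p, x0)

# Predicted values x0^T theta_i x0 with the diagonal solution (7):
# theta_1 = diag(3, 6.5) and theta_2 = diag(0.5, 0.5) for this data.
J1_pred = 3.0 * x0[0]**2 + 6.5 * x0[1]**2
J2_pred = 0.5 * x0[0]**2 + 0.5 * x0[1]**2
```

Note that $Q_2$ is indefinite in this example, in line with item 4), yet the simulated cost of player 2 still agrees with the quadratic form in (24).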

IV. EXTENSION TO 2N ORDER LINEAR QUADRATIC GAME

Now extend the results in section III to a particular multi-dimension linear quadratic game. A 2n-order (n>1) system is described by (25)

$$\dot{X}_1 = A_1 X_1 + A_2 X_2 + B_1 U_1 \quad (X_1(t_0) = X_{10})$$
$$\dot{X}_2 = A_3 X_1 + A_4 X_2 + B_2 U_2 \quad (X_2(t_0) = X_{20}) \tag{25.a}$$

$$J_i(X_0, U_1, U_2) = X^T(t_f) C_i X(t_f) + \int_{t_0}^{t_f} \left( X^T Q_i X + U_i^T U_i + U_j^T R_{ij} U_j \right) dt \quad (i, j = 1, 2;\ i \neq j) \tag{25.b}$$

where $X_1, X_2, U_1$ and $U_2 \in R^n$; $X = (X_1^T, X_2^T)^T$ with $X(t_0) = X_0 = (X_{10}^T, X_{20}^T)^T$; $A_i$ and $B_j$ are all real n×n diagonal matrices with $A_2, A_3, B_1$ and $B_2$ nonsingular. The state and control weighting matrices in (25.b) are $Q_i, C_i \in R^{2n \times 2n}$ and $R_{ij} \in R^{n \times n}$ (i, j=1,2; i≠j). Denote

$$Q_i = \begin{pmatrix} Q_{i1} & Q_{i3} \\ Q_{i3} & Q_{i2} \end{pmatrix}, \qquad C_i = \begin{pmatrix} C_{i1} & C_{i3} \\ C_{i3} & C_{i2} \end{pmatrix} \quad (i = 1, 2)$$

$Q_{ij}$, $C_{ij}$ and $R_{ik}$ are all n×n diagonal matrices and $Q_{ii} > 0$ and $C_{ii} > 0$ (i, k=1,2; i≠k; j=1,2,3). The detectability assumption and controllability conclusion are similar to those in Game (1).

Player i (i=1,2) uses linear state feedback control strategy (26) to minimize his/her own performance index (25.b).

$$U_i = -K_i(t) X = -[K_{i1}(t) \ \ K_{i2}(t)] \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \tag{26}$$

As in the second-order system, if $t_f \to +\infty$, the terminal state weighting matrices in (25.b) are $C_i = 0$ (i=1,2) and (26) should be a constant linear state feedback strategy. Denote $D_1 = \sqrt{A_1^2 + B_1^2 Q_{11}}$ and $D_2 = \sqrt{A_4^2 + B_2^2 Q_{22}}$. We have

Corollary 1: When $t_f \to \infty$, Game (25) has a decentralized Nash equilibrium solution (27) if and only if conditions (28) and (29) are satisfied.

$$U_1 = -\left( A_1 + \sqrt{A_1^2 + B_1^2 Q_{11}} \right) B_1^{-1} X_1 \tag{27.a}$$

$$U_2 = -\left( A_4 + \sqrt{A_4^2 + B_2^2 Q_{22}} \right) B_2^{-1} X_2 \tag{27.b}$$

$$Q_{12} = -R_{12} (A_4 + D_2)^2 B_2^{-2} - 2 [A_2 (A_1 + D_1) + Q_{13} B_1^2] D_2 A_3^{-1} B_1^{-2} \tag{28.a}$$

$$Q_{21} = -R_{21} (A_1 + D_1)^2 B_1^{-2} - 2 [A_3 (A_4 + D_2) + Q_{23} B_2^2] D_1 A_2^{-1} B_2^{-2} \tag{28.b}$$

$$\sqrt{A_1^2 + B_1^2 Q_{11}} \sqrt{A_4^2 + B_2^2 Q_{22}} - A_2 A_3 > 0 \tag{29}$$

Corollary 2: For Game (25) with finite horizon, 1) there exists a unique Nash equilibrium solution (27) implemented by decentralized control with constant state feedback gains if and only if conditions (28) and (30) are satisfied; 2) there exists a unique Nash equilibrium solution (31) implemented by decentralized control with time varying state feedback gains if and only if condition (32) is satisfied.

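$$C_{11} = (A_1 + D_1) B_1^{-2}, \quad C_{12} = -[A_2 (A_1 + D_1) + Q_{13} B_1^2] A_3^{-1} B_1^{-2}, \quad C_{13} = 0 \tag{30.a}$$

$$C_{21} = -[A_3 (A_4 + D_2) + Q_{23} B_2^2] A_2^{-1} B_2^{-2}, \quad C_{22} = (A_4 + D_2) B_2^{-2}, \quad C_{23} = 0 \tag{30.b}$$

$$U_1 = -B_1 C_{11} [I + C_{11} B_1^2 (t_f - t)]^{-1} X_1 \quad (\forall t \in [t_0, t_f]) \tag{31.a}$$

$$U_2 = -B_2 C_{22} [I + C_{22} B_2^2 (t_f - t)]^{-1} X_2 \quad (\forall t \in [t_0, t_f]) \tag{31.b}$$

$$A_k = 0, \quad C_{i3} = 0, \quad Q_i = 0, \quad B_1^2 B_2^{-2} = -A_2 A_3^{-1} R_{21},$$
$$C_{11} = -A_3 A_2^{-1} R_{12} C_{22}, \quad C_{12} = -A_2 C_{11} A_3^{-1}, \quad C_{21} = -A_3 C_{22} A_2^{-1}, \quad R_{12} R_{21} = I \quad (i = 1, 2;\ k = 1, 4) \tag{32}$$

The proof of Corollaries 1 and 2 is very similar to those of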

Theorems 5 and 6.

Remark 3

1) If conditions (28) and (29) are satisfied for Game (25) with infinite horizon and conditions (28) and (30) are satisfied for Game (25) with finite horizon, under the Nash equilibrium solution (27), each player's performance index can be computed by (33). If condition (32) is satisfied for Game (25) with finite horizon, under the Nash equilibrium solution (31), each player's performance index can be computed by (34). Actually, (35) and (36) are the solutions of the Riccati equations associated with Game (25) under different conditions.

$$J_i(X_0, U_1^*, U_2^*) = X_0^T \theta_i X_0 \quad (i = 1, 2) \tag{33}$$

$$J_i(X_0, U_1^*, U_2^*) = X_0^T \theta_i(t_0) X_0 \quad (i = 1, 2) \tag{34}$$

where

$$\theta_1 = \begin{pmatrix} \phi_1 & \phi_3 \\ \phi_3 & \phi_2 \end{pmatrix}, \qquad \theta_2 = \begin{pmatrix} \psi_1 & \psi_3 \\ \psi_3 & \psi_2 \end{pmatrix}$$

and $\theta_1(t_0)$ and $\theta_2(t_0)$ are described by (35) and (36).

$$\phi_1 = (A_1 + D_1) B_1^{-2}, \quad \phi_2 = -[A_2 (A_1 + D_1) + Q_{13} B_1^2] A_3^{-1} B_1^{-2}, \quad \phi_3 = 0 \tag{35.a}$$

$$\psi_1 = -[A_3 (A_4 + D_2) + Q_{23} B_2^2] A_2^{-1} B_2^{-2}, \quad \psi_2 = (A_4 + D_2) B_2^{-2}, \quad \psi_3 = 0 \tag{35.b}$$

$$\phi_1(t) = C_{11} [I + C_{11} B_1^2 (t_f - t)]^{-1}, \quad \phi_2(t) = -A_2 C_{11} [A_3 (I + C_{11} B_1^2 (t_f - t))]^{-1}, \quad \phi_3(t) = 0 \quad (\forall t \in [t_0, t_f]) \tag{36.a}$$

$$\psi_1(t) = -A_3 C_{22} [A_2 (I + C_{22} B_2^2 (t_f - t))]^{-1}, \quad \psi_2(t) = C_{22} [I + C_{22} B_2^2 (t_f - t)]^{-1}, \quad \psi_3(t) = 0 \quad (\forall t \in [t_0, t_f]) \tag{36.b}$$

2) In this section, a matrix (say $A_{m \times m}$) being larger than zero means that this matrix is positive definite, i.e. $x^T A x \ge 0$ ($\forall x \in R^m$) and $x^T A x = 0$ if and only if x=0. The square root in this section means that if a diagonal matrix (say $A_{m \times m}$) is positive, then $A = Q^2$ or $Q = \sqrt{A}$ for some Q>0.

V. CONCLUSION

In this paper, the diagonal-form solution of Riccati algebraic/differential equations associated with a second order linear quadratic game, where each player can only apply a linear state feedback control strategy, is derived and expressed explicitly in terms of the system coefficients. Based on the Riccati equation solution, the Nash equilibrium solution is obtained. It happens that the resultant closed-loop system is a decentralized system, which implies that the Nash equilibrium solution can be constructed by executable decentralized control. The results for the second order linear quadratic game are also extended to a 2n (n>1) order linear quadratic game.

The discussion in this paper is about the relationships among the three problems in [14]: the general problem (given the system model and the performance index, derive the optimal control or Nash strategy), the inverse problem (given the system model and the optimal control or Nash strategy, find the performance index such that the optimal control or Nash strategy is valid) and the converse problem (given the performance index and the optimal control or Nash strategy, find the system model such that the optimal control or Nash strategy is valid). Hopefully the discussion in this paper will provide some additional insights into the design degrees of freedom for two-player linear quadratic non-cooperative second-order games.

REFERENCES

[1] R. E. Kalman, "Contributions to the theory of optimal control," Bol. de Soc. Math. Mexicana, 1960, pp. 102.

[2] X. Chen and K. Zhou, "Multi-objective filtering design," Proceedings of IEEE Conf. Electrical and Computer Engineering, Canada, May 1999, pp. 708-713.

[3] O. L. V. Costa and E. F. Tuesta, "Finite horizon quadratic optimal control and a separation principle for Markovian jump linear systems," IEEE Trans. Automat. Contr., vol. 48, no. 10, pp. 1836-1842, 2003.

[4] B. S. Chen and W. Zhang, "Stochastic H2/H∞ control with state-dependent noise," IEEE Trans. Automat. Contr., vol. 49, no. 1, pp. 45-57, 2004.

[5] T. Basar and G. J. Olsder, Dynamic non-cooperative game theory. SIAM edition, 1999.

[6] H. Abou-Kandil, G. Freiling, V. Ionescu and G. Jank, Matrix Riccati equations: in control and systems theory. Birkhauser Verlag, 2003.

[7] A. J. T. M. Weeren, J. M. Schumacher and J. C. Engwerda, "Asymptotic analysis of Nash equilibrium in nonzero-sum linear-quadratic differential games," Research Memorandum FEW 634, Tilburg University, 1995.

[8] G. P. Papavassilopoulos and G. J. Olsder, "On the linear-quadratic closed-loop no-memory Nash game," J. Optim. Theory Appl., vol. 42, pp. 551-560, 1984.

[9] J. B. Cruz Jr. and C. I. Chen, "Series Nash solution of two person nonzero-sum linear-quadratic games," J. Optim. Theory Appl., vol. 7, pp. 240-257, 1971.

[10] G. P. Papavassilopoulos and J. B. Cruz Jr., "On the existence of solutions to coupled matrix Riccati differential equations in linear-quadratic Nash games," IEEE Trans. Automat. Contr., vol. AC-24, no. 1, pp. 127-129, 1979.

[11] G. P. Papavassilopoulos, J. V. Medanic and J. B. Cruz Jr., "On the existence of Nash strategies and solutions to coupled Riccati equations in linear-quadratic games," J. Optim. Theory Appl., vol. 28, no. 1, pp. 49-76, 1979.

[12] X. Wang and J. B. Cruz Jr., "When is linear state feedback a Nash Equilibrium solution for linear-quadratic game," IEEE Trans. Automatic Control, submitted for publication.

[13] T. Basar and P. Bernhard, H∞ optimal control and related minimax design problems: a dynamic game approach. Birkhauser, 1991, pp. 22-22.

[14] J. Doyle, J. A. Primbs, B. Shapiro and V. Nevistic, "Non-linear games: examples and counterexamples," Proceedings of the 35th Conference on Decision and Control, Kobe, Japan, Dec. 1997, pp. 3915-3920.
