
HYPERSONIC VEHICLE TRAJECTORY

OPTIMIZATION AND CONTROL

S. N. Balakrishnan

J. Shen

J. R. Grohs

Dept. of Mechanical & Aerospace Engineering

and Engineering Mechanics

University of Missouri-Rolla

Rolla, MO 65409-0050

FINAL REPORT

July 1997

Grant Number: NAG 1 1728

Hypersonic Vehicles Office

NASA Langley Research Center

Hampton, VA 23681-0001

https://ntrs.nasa.gov/search.jsp?R=19980017705 2018-06-30T06:29:37+00:00Z

TABLE OF CONTENTS

EXECUTIVE SUMMARY
BACKGROUND
WHY NEURAL NETWORKS
LITERATURE REVIEW
PROBLEM FORMULATION
OPTIMIZATION/CONTROL
NUMERICAL RESULTS
CONCLUSIONS
ACKNOWLEDGMENT
BIBLIOGRAPHY
FIGURES
APPENDIX

HYPERSONIC VEHICLE TRAJECTORY OPTIMIZATION AND CONTROL

EXECUTIVE SUMMARY

Two classes of neural networks have been developed for the study of hypersonic vehicle

trajectory optimization and control.

The first is called an 'adaptive critic'. The distinguishing features of this approach are that
1) it needs no external training, 2) it allows variability of initial conditions, and 3) it can
serve as a feedback control. It is used to solve a 'free final time' two-point boundary value

problem that maximizes the mass at the rocket burn-out while satisfying the pre-specified burn-out

conditions in velocity, flightpath angle, and altitude.

The second neural network is a recurrent network. An interesting feature of this network

formulation is that when its inputs are the coefficients of the dynamics and control matrices, the

network outputs are the Kalman sequences (with a quadratic cost function); the same network is also

used for identifying the coefficients of the dynamics and control matrices. Consequently, we can use

it to control a system whose parameters are uncertain.

Numerical results are presented which illustrate the potential of these methods.

I. BACKGROUND

For the United States to maintain its leadership in space technology, cheaper means of space
transportation - alternatives to the space shuttle - must be developed. In order to develop such an
alternative, different configurations of hypersonic vehicles must be studied from the perspective of
cost-effective performance. A major part of such a study involves optimal trajectory design for the
mission and control of the vehicles. Since the current state of knowledge of hypersonic vehicles (in
atmospheric flight, especially) is limited, it is imperative that any tool that is developed for trajectory
optimization and control be usable with variations in flight parameters. There are quite a few
methods - direct and indirect - available in the existing literature which deal with trajectory
optimization and optimal control. However, they are either ill-suited for design or do not consider
the design phase of a vehicle. First, for each scenario, typically, a two-point boundary value problem
needs to be solved. This process could take an enormous amount of time when several
combinations of scenarios are considered. Second, many trajectory optimization methods do not
directly yield a feedback form of control that can be used in flight.

In this study, two new neural network based approaches have been formulated which address
the two problems mentioned. The resulting design technique enables the user to study optimal
trajectories of hypersonic vehicles with a set of predetermined neural networks. For an envelope of
scenarios, this approach is expected to yield near optimal trajectories. We formulate the
problem in such a way as to produce a feedback control directly. In the case of recurrent networks,
[...] the gains of the matrices used in a linearized control.


II. WHY NEURAL NETWORKS

Use of direct or indirect methods of optimization necessitates having to solve a problem for

each set of initial conditions. This requires determining a separate solution for each possible initial

condition for a given system. Dynamic programming is also a method of determining optimal control
for a family of initial conditions. However, the usual method of solution becomes very difficult to
apply to higher-dimensional and nonlinear systems. These methods of solution do not
usually yield a feedback form of control in terms of the states either.

Other methods of solution also have their advantages and disadvantages. Neighboring

optimal control is beneficial in that the solution of a single two-point boundary value problem

(TPBVP) allows an approximate solution over a limited range of initial conditions. The disadvantage

is that approximation methods such as neighboring optimal control can fail at a distance from the

original TPBVP solution.

Currently, there is no unified mathematical formalism under which a controller can be

designed for nonlinear systems. Techniques like feedback linearization have been used for a few

nonlinear problems under limited conditions, such as equal number of inputs and outputs. More

rigorous and general solutions are available with linearized models; however, they are restricted by

the assumption of linear models. Other available solutions for nonlinear controllers are highly

problem oriented. Consequently, we propose a formulation with neural networks which: 1) solves
a nonlinear control problem directly without any approximation to the system model (in the absence
of a good model this approach can synthesize a nonlinear model of the states), 2) yields a control law
in a feedback form as a function of the current states, and 3) maintains the same structure regardless
of the type of problem (it handles linear problems as well). Such a formulation is afforded by the field
of neural networks. In the following sections, we trace the development of neural networks and the
development of learning control in particular.

III. LITERATURE REVIEW

The development of intelligent control system design techniques has a long and rich history,
as does the field of control systems engineering in general. Neural network techniques have also been
used in control systems for quite a long time but recently have become very popular. This section
contains a brief survey of the history of control ranging from cybernetics in the 1940's through
learning control systems and the beginning of neural control in the 1960's. The next important
landmark occurred with the use of critic architectures in reinforcement learning systems. We
conclude the section with a brief survey of current literature in neural control organized in the areas
of system identification, nonlinear, adaptive, and optimal neural control.

1. Cybernetics, Neural Networks and Learning Control. Norbert Wiener is recognized
as the father of cybernetics, a field which he describes as "the control and communication in the
animal and in the machine" [1]. Cybernetics also provided some of the motivation for the
development of control theory and neural networks during the 1950's and 1960's. For example,
Ashby contributed two complementary monographs in cybernetics, Design for a Brain [2] and An
Introduction to Cybernetics [3], which discussed control and communication in biological systems.
In the former, Ashby gave an early implementation of an artificial neural network called the homeostat.

The latter contribution was a careful development of cybernetics intended to popularize the

technology. Topics discussed include feedback, stability, a black box theory for large systems,

regulation and control in biological systems, and hierarchical control.


K. S. Fu gives one of the first formal descriptions of learning control in [4]. A learning
control system is a control system capable of modifying its behavior based on experience in order to

maintain acceptable performance in the presence of uncertainties. Possible measures of performance

include the amount of time required to adapt to changes and the evaluation of suitable performance

indices. A learning control system is distinguished from adaptive control systems through its ability

to recognize familiar patterns in a situation and, based on past experience, to adjust in order to

improve performance. Adaptive control systems emphasize a control system's ability to react to new

situations.

Sklansky gives an early survey of learning control [5]. According to Sklansky, learning in the

automatic control literature is associated with a hierarchical arrangement of three feedback loops.

These are the controller, a system identifier or pattern recognizer, and a teacher. The pattern

recognizer transforms observable quantities in the system into a fixed set of categories, each of which

corresponds to a set of controller parameters. Categories are represented by fixed regions in an

intermediate feature space. The teacher provides information to the pattern recognizer for adjusting

the boundaries between categories in the feature space so that improved control system performance

results. An adaptive control system uses only the first two loops. The learning loop, which

distinguishes a learning control system from an adaptive control system, sends reinforcement signals

in the form of a reward or a punishment to the pattern recognizer based on an assessment of current

control system performance.

The advantage of the use of the learning loop is that it provides a means of training the pattern

recognizer on-line. Sklansky describes five techniques for the design of learning control systems and

notes their interrelationships and pattern classification. These techniques are decision theory,

trainable threshold logic, hill climbing, sample set construction, and Markov chains. In the decision
theoretic approach, the boundaries between classes are determined by estimating joint probability
densities using measurements taken from the system during operation. The trainable threshold logic
method which Sklansky describes is actually a precursor to the use of neural networks for control.
In this method, category boundaries are moved by adjustment of weighted sums of components in
a feature vector; this weighted sum is then passed through a threshold function to produce a bipolar
control signal. The teacher in a threshold logic learning system provides information for adjusting
weights in the categorizer. The sample set construction technique breaks categories into
subcategories based on distances measured in the feature space. During training a fixed set of
prototype feature vectors is developed with the subcategories given by open balls surrounding the
prototype. We then form the category regions as the unions of subcategories. The boundary between
categories is formed as a sequence of hyperplanes perpendicular to hyperplanes joining prototypes
from each category.
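
The trainable threshold logic idea can be illustrated with a short sketch. This is only an illustration under stated assumptions, not Sklansky's formulation: the error-correcting update rule, the learning rate, the function names, and the sample data are all hypothetical. A weighted sum of feature components is passed through a threshold to give a bipolar control signal, and the teacher's desired signal is used to move the category boundary.

```python
def threshold_control(weights, bias, features):
    """Weighted sum of feature components passed through a threshold (bipolar output)."""
    total = bias + sum(w * f for w, f in zip(weights, features))
    return 1 if total >= 0 else -1


def teacher_update(weights, bias, features, desired, rate=0.5):
    """Assumed error-correcting rule: adjust weights only when the output disagrees
    with the teacher's desired bipolar signal."""
    if threshold_control(weights, bias, features) != desired:
        weights = [w + rate * desired * f for w, f in zip(weights, features)]
        bias = bias + rate * desired
    return weights, bias


def train(samples, n_features, epochs=20):
    """Repeatedly present (feature vector, desired signal) pairs to the unit."""
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        for features, desired in samples:
            weights, bias = teacher_update(weights, bias, features, desired)
    return weights, bias


# Hypothetical training set: two categories, one per bipolar control value.
SAMPLES = [([1.0, 1.0], 1), ([2.0, 0.5], 1), ([-1.0, -1.0], -1), ([-0.5, -2.0], -1)]
WEIGHTS, BIAS = train(SAMPLES, n_features=2)
```

After training on this linearly separable toy set, the unit reproduces the teacher's signal for every sample; the category boundary is the hyperplane where the weighted sum is zero.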

Ideas from decision theory, trainable threshold logic, and sample set construction are
prominent in the development of neural network theory. In 1966 Nikolic and Fu [6] describe an
algorithm based on decision theory for on-line learning control of an unknown discrete time plant
without an external teacher. Control actions are chosen from a finite set. The performance index is
the conditional expectation of the instantaneous performance evaluations with respect to observed
states and allowable control actions. The model used by Nikolic and Fu is very similar to Sklansky's
general learning control system and they include provisions for the case when the teacher does not
have perfect knowledge of the plant being controlled. This work provides the foundation for later
critic based schemes.


Tsypkin also makes contributions in learning control systems based on decision theory and
optimization. In an article about 'self-learning' [7], Tsypkin distinguishes between three methods for
determining decision rules in the pattern recognizer. The first method assumes that statistical
information is available in advance. In this case statistical decision theory can be used to determine
the decision rule. In the second method, the designer assumes that a sequence of correctly classified
patterns exists. In this case, the decision rule is determined based on data in the training set and the
method is called learning with reinforcement. In the third case, no information is assumed initially
and the decision rule is found using observed but unclassified patterns from the system. Tsypkin calls
this third case self-learning. Extensions of the idea of self-learning in automatic systems applied to
pattern recognition, identification, dual control, and the allocation of resources are discussed in a later
work [8] and compiled into a text [9].

The improvement in performance with respect to given performance objectives and based on
experience is a common theme in learning control. There are three components related to
performance in the control system: 1) the specification of optimal performance objectives, 2) the
assessment of the system's level of performance, and 3) a means for improving performance over

time. Cybernetics and learning control are based on the use of pattern recognition, optimization, and

control of uncertain dynamic systems using biologically inspired models of intelligent behavior.

Rudimentary neural networks in the form of linear threshold logic units have been used as an

implementation medium for learning control systems cited above. We now turn to a discussion of

a subclass of learning control systems called reinforcement learning systems which build in methods

for assessing and improving control system performance.


2. Learning with a Critic. The groundbreaking work on learning control in the 1960's,
along with studies in cybernetics, led to two decades of study of critic-based systems, and this
study has been revived in the current decade. In 1970, Mendel and McLaren introduced
a concept in learning control which they call reinforcement learning [10]. Reinforcement learning

control is developed as a subclass of learning control discussed above with the addition of

performance assessment and a method for modifying controller actions. The idea is to provide a

means of control for unstructured environments where the plant model may not be known or where

a complex performance measure is used [11]. In reinforcement learning systems, a critic is used to

monitor plant inputs and outputs and to provide an evaluation signal which represents an indication

of current performance to the controller.

Widrow, Gupta and Maitra [12] describe the concept of the critic for adaptation of neural

networks. Widrow et al. delineate three separate modes of learning. A supervised learning system,
also known as learning with a teacher, modifies the parameters of the neural network using the error

between network output signals and the desired output signals. The assumption here is that the

desired output signals corresponding to each input signal are known at the time that learning is taking

place. In an unsupervised learning procedure, also called learning without a teacher or decision-

directed learning, the parameter adjustments are not guided by knowledge of a desired output signal.

Learning with a critic bridges the gap between the two previous methods. Learning with a critic does

not assume that desired output signals are known for each input signal but rather that some indication

can be made with respect to network performance over a series of trials.

Barto, Sutton and Anderson [13] extend the idea of learning with a critic through the

development of a learning system which includes both an adaptive critic element and an adaptive

search element. As in the learning with a critic approach, explicit desired control actions are
unknown. The objective is to provide control signals which tend to optimize a performance index.
The purpose of the adaptive search element is to implement a trial-and-error procedure to associate
control vectors with respective observations of the state of the system being controlled. The adaptive
critic element receives a success/failure signal from an outside source as a result of a series of control
actions. This signal is called an external reinforcement signal. The adaptive critic element also
receives weighted signals from each of the state variables of the controlled system. The external
reinforcement signal provides feedback for modifying the strengths of these connections. The
adaptive critic element uses the external reinforcement signal and weighted state signals to provide
a continuous evaluation of performance to help guide the search for appropriate control actions.
Sutton calls the critic based adaptation algorithm the "Adaptive Heuristic Critic" and develops its
application in credit assignment problems in his Ph.D. dissertation [14].

The implementation of the adaptive critic is based on Widrow's method for learning with a
critic but provides a higher level of feedback to the control system. Two sets of connection weights
connecting two processing elements are adjusted during the learning procedure. This, in conjunction
with the active search, distinguishes the adaptive critic architecture from previous work. The adaptive
critic architecture is capable of learning to balance a pole mounted on a movable cart by applying
control signals to the cart with no prior knowledge of the system to be controlled. This ability
to determine control actions assuming no previous knowledge is a great strength of the adaptive critic
architecture. The disadvantage of the architecture is that many failed trials occur before a successful
run is completed. The cart-pole solution also depends on the partitioning of the problem state space
into a finite number of regions. This partitioning may not be practical in problems where finer control


is required. In this case the number of regions required may be too large for effective results.

Examples of such problems include those with time-varying dynamics, tracking problems, and some

nonlinear problems.
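
The growth in the number of regions can be made concrete with a small sketch. The cut points, the four-variable cart-pole state ordering, and the function names below are hypothetical and chosen only for illustration; they are not taken from [13]. Each state variable is split into intervals, and a region is one combination of intervals, so the region count is the product of the per-variable interval counts.

```python
from bisect import bisect_right


def box_index(state, edges):
    """Map a continuous state vector to the index of the region that contains it."""
    index = 0
    for value, cuts in zip(state, edges):
        # len(cuts) cut points split one axis into len(cuts) + 1 intervals.
        index = index * (len(cuts) + 1) + bisect_right(cuts, value)
    return index


def region_count(edges):
    """Total number of regions is the product of the interval counts per variable."""
    count = 1
    for cuts in edges:
        count *= len(cuts) + 1
    return count


# Hypothetical coarse partition of (cart position, cart velocity, pole angle, pole rate).
COARSE = [[-0.8, 0.8], [-0.5, 0.5], [-0.1, -0.02, 0.0, 0.02, 0.1], [-0.9, 0.9]]

# Hypothetical finer partition: 19 evenly spaced cut points (20 intervals) per variable.
FINE = [[-1.0 + 0.1 * k for k in range(1, 20)] for _ in range(4)]

N_COARSE = region_count(COARSE)   # 3 * 3 * 6 * 3 regions
N_FINE = region_count(FINE)       # 20 ** 4 regions
```

With these assumed partitions the coarse grid has 162 regions while the finer one has 160,000, which illustrates why a table indexed by region becomes impractical as the required resolution or the state dimension grows.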

Barto et al. [13] distinguish between supervised learning paradigms and reinforcement

learning used in the adaptive critic approach. In the supervised learning approach training proceeds

in several steps. First an input pattern is presented to a neural network. An output response is

produced based on the current parameters embedded within the network. The response is then

compared with a desired response and the error is used to modify the neural network parameters to

improve its mapping. Reinforcement learning is based on an evaluation of the current network output

in relationship with current external factors (states in a system for example). This evaluation may be

as simple as a binary decision indicating a reward for proper response or punishment for inappropriate

response. The quality of feedback for a system using reinforcement learning is lower than that

available in a supervised learning system. This property makes reinforcement learning methods useful

for situations when a quantitative answer is not available.

Werbos [15] defends the use of neural networks for control applications. He suggests that

neural networks will be able to solve difficult problems faced by modern controls engineers including

the real-time control of nonlinear possibly unknown systems with high noise levels and high

throughput. Werbos describes five dominant paradigms for use in neural control systems. These are

Supervised Control, Inverse Dynamics, Stabilization Systems, Backpropagation Through Time and

Adaptive Critics with Reinforcement Learning. The Supervised Control architecture uses a neural

network trained to map current state vectors to corresponding control vectors. In the inverse

dynamics approach, observed system state is assumed to be a function of the current control and


previous system state. The neural network is trained to invert the plant in order to provide control
actions which lead to desired states. Stabilization systems are designed to provide stable control in
tracking and regulator problems. Backpropagation through time depends on a plant model and a
performance index written in terms of control and state actions. The neural network predicts a
sequence of states given a sequence of control actions. The backpropagation algorithm then provides
derivatives of the performance index which can be used to update control actions at each step along
the way. Adaptive critic architectures and reinforcement learning are the focal point in [33]. Werbos
describes systems based on the adaptive critic as an approximation to dynamic programming and
presents the notion of the backpropagated critic.

Jameson [16] claims to be the first to publish results using a backpropagated critic. The
primary difference between the adaptive critic architecture of Barto, Sutton, and Anderson and the
backpropagated critic is the maximization of the critic output providing gradient information via a
plant model network to the controller so that future control actions can be improved. The purpose
of the critic network in this architecture is to predict future reinforcement signals from the
environment. The critic network and a model of the plant are used to calculate derivatives of the
predicted reinforcement signal with respect to control actions. The control actions are then modified
to improve performance. The prediction provided by the critic network is also improved by
comparing the actual reinforcement signal with previously stored predictions. The backpropagated
critic, like previous critic designs, assumes no knowledge of the plant and results are improved by
making multiple attempts at a solution.

Sofge and White [17] advocate the development of neural control architectures which can be
adapted on-line for stable operation of unknown, nonlinear plants which may include noise in the


feedback loop. They suggest that adaptive critic architectures may be used in manufacturing
process control applications to provide flexibility and efficient adaptability through changes which
occur during the life-cycle of equipment. The authors use an adaptive critic architecture based on
Albus' CMAC neural network [18] to do process control in a thermoplastic composite manufacturing
process. According to Sofge and White, "the goal of on-line learning is the real-time optimization
of a large scale non-linear process at minimal computational cost." The authors have designed and
built an adaptive critic system for control of manufacturing processes.

Watkins gives a recent implementation of reinforcement learning called Q-learning in his
dissertation [19]. Q-learning is based on the approximation of a real valued function, called the
Q-function by Watkins. The Q-function is a function that maps current plant state and control into an
estimation of the future performance of the system. This estimate is based on the assumption that
optimal control is applied to the plant from the next time instant forward. A Q-learning algorithm
is an algorithm which iteratively improves the estimation for the Q-function. There is a close
correspondence between Q-learning and dynamic programming used in the control of dynamical
systems [20]. Bradtke [21] distinguishes between two types of Q-learning algorithms. Bradtke calls
the form described above an optimizing Q-learning algorithm because it tries to learn the Q-function
directly. A slightly modified form called the policy-based Q-learning algorithm tries to learn an
optimal sequence of plant control inputs (the control policy).
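
A tabular version of the optimizing form can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the formulation of [19] or [21]: the five-state chain plant, the reward of 1 at the goal, the discount factor, the step size, and the purely random exploration are all hypothetical choices. The update target uses the best Q-value at the next state, which reflects the assumption that optimal control is applied from the next time instant forward.

```python
import random

N_STATES = 5        # hypothetical plant states 0..4; state 4 is the goal
ACTIONS = (-1, 1)   # hypothetical controls: step left or step right


def step(state, action):
    """Toy plant: move along the chain; reward 1.0 on reaching the goal state."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Iteratively improve a table of Q-values from observed transitions."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(200):                  # cap on episode length
            action = rng.choice(ACTIONS)      # random exploration of controls
            nxt, reward, done = step(state, action)
            # Target: immediate reward plus discounted value of acting optimally next.
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
            if done:
                break
    return q


Q = q_learning()
# Greedy control policy read off the learned Q-function.
POLICY = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)}
```

Because the update target does not depend on how the exploratory controls were chosen, the learned table can still be read greedily to obtain a control policy; on this toy chain that policy steps toward the goal from every state.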

Many recent control system applications of the ideas of reinforcement learning and adaptive
critic architectures exist. Gullapalli describes a reinforcement learning algorithm for learning control.
This method uses radial basis functions and the adjustable parameters of the network are means and
variances of normal distribution functions. The method is applied to a simulated 3 degree-of-freedom


robotic arm [22]. Stamenkovich uses adaptive critic and adaptive search elements for learning to

guide a ship through a channel [23]. Shelton [24] demonstrates an adaptive critic design for

controlling a track with a CMAC (Cerebellar Model Articulation Controller [18]). Tham and Prager
compare the adaptive heuristic critic algorithm with the Q-Learning algorithm for obstacle avoidance
and control in multi-linked robotic manipulators [25]. Gachet et al. present an adaptive heuristic
critic based control system for learning goal based behavior for autonomous robot control. The three
types of behavior discussed are: 1) move to a goal state, 2) do surveillance, and 3) follow a specified
path [26].

3. Neural Identification and Control. There has been an explosion of reported research

in the use of neural networks in control systems in recent years. Bavarian [27] gives an introduction

to the use of neural networks for intelligent control. Several monographs have been compiled

including a well known work edited by Miller, Sutton, and Werbos [28]. White and Sofge have

compiled a book which includes several chapters dealing with the use of neural networks in intelligent

control systems [29]. Hunt et al. have produced a comprehensive survey of the field [30].

Psaltis, Sideris and Yamamura describe three possible architectures for neural control systems

[31]. The indirect learning architecture attempts to invert the plant in order to provide control signals

which track a given input signal. In the generalized learning architecture the desired plant input signal

is assumed known and the neural network is trained to produce input signals for the next sampling

interval given the current plant output. The result is an output feedback control. The third

architecture is called the specialized learning architecture where the neural network is trained to

provide control to track an input function by minimizing the tracking error.


Levin and Narendra [32] present a theory for the design of neural control systems which
stabilize nonlinear dynamic systems about an equilibrium point. This theory is based on nonlinear
control theory. The article contains necessary background information in nonlinear control theory
and many examples illustrating the interaction between nonlinear theory and the use of neural
networks for stable regulation. Possible control methods for nonlinear systems include: 1) the use
of a linear controller which assumes that the plant can be linearized about the operating point, 2)
stabilizing control using feedback stabilization where a change in state variables and a feedback
control law are used to transform a system into one which is linear about an operating point, and 3)
direct stabilization through the use of a nonlinear control law. Neural control designs are given for
the feedback stabilization and direct stabilization methods.

As stated above, adaptive and learning control systems depend on the ability to identify plant
dynamics. There have been a number of contributions in the use of neural networks for system
identification. Narendra and Parthasarathy discuss feedforward and recurrent neural network
structures for identification and control of systems [33]. The authors present a method for training
recurrent neural networks and describe necessary assumptions for well posed neural control problems.
Fernandez, Parlos, and Tsai investigate nonlinear system identification with neural networks by using
a recurrent network to identify nonlinear dynamic systems in discrete time based on input-output
measurements. The results are applied to the identification of boiler dynamics [34]. Polycarpou and
Ioannou present a stability theory approach to synthesis and analysis of identification and control
schemes in nonlinear systems using neural networks [35]. Both gradient and Lyapunov synthesis
approaches are applied.


Applications of neural networks in adaptive control have also been investigated by several

researchers. Guez, Eilbert, and Kam [36] propose a neural network architecture for neural model

reference adaptive control. This system adjusts feedback gains so that the closed loop time response

matches a desired time response of a given reference model. Hoskins, Hwang and Vagners [37] use

iterative inversion of a neural plant model to provide control signals to the plant. The method is

applied to a problem in redundant manipulator kinematics, a model reference adaptive control system,

and a linear mass-spring-damper system. Hoskins and Himmelblau use similar techniques with an

emphasis on reinforcement learning applied to process control [38].

Goldenthal and Farrell [39] backpropagate the error between the actual plant and a reference

model through a neural network model of the plant and then continue the backpropagation procedure

through the controller network to update controller weights. The technique is demonstrated in a

model reference neural adaptive control system applied to the cart-pole problem. To accomplish this,

the backpropagation algorithm is extended so that the network can function as a closed-loop

controller and to force the closed loop system to match desired reference response.

Lan and Chand also investigate the discrete time linear quadratic regulator problem [40].

They point out that the conventional solution of the problem is an off-line solution. The computed

control history is stored and used later in an open loop control. The disadvantage to this approach

is that it is not robust and does not work for time-varying systems. Lan and Chand formulate an

augmented performance index with the linear constraint equations of the controlled system embedded.

The augmented performance index is then related to parameters in the energy function of a Hopfield

network [41]. The Hopfield network then minimizes the performance index in an iterative fashion

producing the required optimal control.
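
For reference, the conventional off-line solution mentioned above can be sketched for a scalar plant. This is a minimal illustration under stated assumptions and is not Lan and Chand's Hopfield formulation: the plant x[k+1] = a x[k] + b u[k], the cost sum(q x[k]^2 + r u[k]^2) + s x[N]^2, and every numerical value are hypothetical. A backward Riccati sweep from the final time yields the gain sequence, and the resulting control history is then stored and replayed.

```python
def lqr_gain_sequence(a, b, q, r, s_final, n_steps):
    """Backward Riccati sweep for a scalar discrete-time LQR problem."""
    p = s_final                      # cost-to-go weight at the final time
    gains = [0.0] * n_steps
    for k in reversed(range(n_steps)):
        gain = (b * p * a) / (r + b * p * b)
        p = q + a * p * a - a * p * b * gain
        gains[k] = gain
    return gains, p                  # p is now the cost-to-go weight at k = 0


def simulate(a, b, gains, x0):
    """Apply u[k] = -gain[k] * x[k] and record the state and control histories."""
    xs, us = [x0], []
    for gain in gains:
        u = -gain * xs[-1]
        us.append(u)
        xs.append(a * xs[-1] + b * u)
    return xs, us


def cost(xs, us, q, r, s_final):
    """Quadratic performance index evaluated along a trajectory."""
    running = sum(q * x * x for x in xs[:-1]) + sum(r * u * u for u in us)
    return running + s_final * xs[-1] ** 2


# Hypothetical unstable plant and weights, used only for illustration.
A, B, Q_W, R_W, S_W, N, X0 = 1.1, 0.5, 1.0, 0.1, 1.0, 20, 2.0
GAINS, P0 = lqr_gain_sequence(A, B, Q_W, R_W, S_W, N)
XS, US = simulate(A, B, GAINS, X0)   # US is the stored control history
COST = cost(XS, US, Q_W, R_W, S_W)
```

The sweep must finish before the first control can be applied, and the stored history `US` is tied to the plant parameters used in the sweep, which is the limitation the paragraph above points out.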


Iiguni, Sakai and Tokumaru [42] report a nonlinear regulator design which uses feedforward
neural networks to augment a linear quadratic regulator design for a nonlinear plant with parameter
uncertainties. The authors assume that the nonlinear plant can be modeled using a known linear state
space model. This linear model is then used as the basis for a linear quadratic regulator (LQR)
design. The LQR design procedure yields gains for plant state feedback which minimize a linear
quadratic performance index. The result is a regulator design which may be used with the actual
plant; however, the range of optimal control operation is limited.

Bouzerdoum and Pattison give a method for mapping a class of optimization problems onto

a recurrent neural network architecture [43].

The method minimizes a static quadratic performance index,

\[ J(x) = \frac{1}{2} x^T Q x + x^T y \tag{1} \]

with respect to vectors $x \in \mathbb{R}^n$ subject to bound constraints

\[ g_i \le x_i \le v_i, \quad i = 1, \ldots, n \tag{2} \]

where the subscripts indicate components of the respective vectors. This static optimization problem

has a known solution. However, a matrix inversion is necessary and this is computationally intensive

for large dimensional spaces and difficult for ill-conditioned weighting matrices. The recurrent neural

network solution provides a parallel implementation for solving the problem.
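
An inversion-free iteration for this kind of problem can be sketched as follows. This is not the network of [43]; it is a plain projected gradient iteration used here as a stand-in, and it assumes the index has the form (1/2) x^T Q x + x^T y with a symmetric positive definite Q. The matrix, vector, bounds, and step size are hypothetical. Each component update needs only one row of Q, which is what makes a parallel implementation natural.

```python
def gradient(q_mat, y, x):
    """Gradient Q x + y of the assumed index (1/2) x^T Q x + x^T y."""
    n = len(x)
    return [sum(q_mat[i][j] * x[j] for j in range(n)) + y[i] for i in range(n)]


def clip(x, lower, upper):
    """Project each component back onto its bound interval."""
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]


def projected_gradient(q_mat, y, lower, upper, step=0.1, iters=500):
    """Repeat a gradient step followed by projection onto the bounds."""
    x = clip([0.0] * len(y), lower, upper)
    for _ in range(iters):
        g = gradient(q_mat, y, x)
        x = clip([xi - step * gi for xi, gi in zip(x, g)], lower, upper)
    return x


# Hypothetical two-variable problem; the unconstrained minimizer (2, -1)
# lies outside the bounds, so both constraints become active.
Q_MAT = [[2.0, 0.0], [0.0, 1.0]]
Y = [-4.0, 1.0]
LOWER = [0.0, 0.0]
UPPER = [1.0, 1.0]
X_STAR = projected_gradient(Q_MAT, Y, LOWER, UPPER)
```

For this toy problem the iteration settles at the corner of the feasible box nearest the unconstrained minimizer, without ever forming the inverse of Q.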

Antony and Acar develop algorithms for real-time optimal control of discrete systems with

respect to a quadratic performance index over a finite time interval [44]. Problem formulations based

on the discrete time Hamiltonian for linear and partially unknown nonlinear systems are given. The

method depends on a model of the plant dynamics using a feedforward neural network. Two distinct

methods are given. For the first method, control vectors at each sample instant are modified during


every iteration of the algorithm. The second method develops the optimal control by a backward

sweep beginning at the final time. The second method has slower convergence rates but requires less

storage and fewer computations during each iteration.

In this research, we have formulated two types of neural networks. The first one is called an
'Adaptive Critic' architecture. The reasons for choosing this structure for formulating the hypersonic
vehicle optimal control problems are: 1) this structure obtains an optimal controller through solving
dynamic programming equations, 2) this approach (see Figure 1) has a supervisor (critic) which
critiques the outputs of the controller network and a neural network controller, and therefore
has a built-in fault tolerance, 3) this approach needs NO external training as in other forms
of neurocontrollers, 4) this is not an open loop optimal controller but a feedback controller, and 5)
it preserves the same structure regardless of the problem (linear or nonlinear).

The adaptive critic method determines an optimal control law for a system by successively

adapting two networks, an action and a critic network. The control law does not need to be

determined a priori mathematically. This method simultaneously computes and adapts the neural

networks to the optimal control policy for both linear and nonlinear systems. In addition, it is

important to know that the form of control does not need to be known in order to use this method.

Since the control law is computed for a range of initial conditions, this approach is ideal for design

studies.

The second approach is to formulate a neural network for simultaneous identification and control. This uses a modified form of Hopfield neural networks. The need for this network arose after the customer indicated that there is a large level of uncertainty in the system parameters. We anticipated the need for this during the second year and formulated the network while awaiting the POST3D program and inputs. Research and development based on this approach are presented as a conference paper at the end. This paper was presented at the 1996 Atmospheric Flight Mechanics Conference in July 1996 at San Diego, CA. This paper is enclosed in the Appendix.

The first part of the rest of this report deals with the adaptive critic approach, problem formulation, algorithm development and results.

IV. PROBLEM FORMULATION

1. Statement of the General Problem

This study considers a problem of the form (finite-time with terminal constraints) with a cost function, J, given by

$$J = \phi(x(t_f)) + \int_0^{t_f} \psi(x(t), u(t))\, dt \qquad (3)$$

subject to differential constraints

$$\dot{x} = f(x, u) \qquad (4)$$

$$t_f \text{ given}, \quad x_0 \text{ given} \qquad (5)$$

Here x is an n-dimensional state vector, u is an m-dimensional control vector, and $\phi(\,)$, $\psi(\,)$, and $f(\,)$ are linear or nonlinear functions of state and/or control. $x_0$ are the initial conditions and $t_f$ is the final time.

2. Dynamic Programming Background

We can rewrite Eq. (3) as

$$J(x(t)) = U(x(t), u(x(t))) + \langle J(x(t+1)) \rangle \qquad (6)$$

Here, $J(x(t))$ is the cost associated with going from time t to the final time. $U(x(t), u(x(t)))$ is the utility, which is the cost of going from time t to time t+1. $\langle J(x(t+1)) \rangle$ is assumed to be the minimum cost associated with going from time t+1 to the final time. If both sides of the equation are differentiated and we define

$$\lambda(x(t)) \equiv \frac{\partial J(x(t))}{\partial x(t)} \qquad (7)$$

then

$$\lambda(x(t)) = \frac{\partial U(x(t),u(t))}{\partial x(t)} + \frac{\partial U(x(t),u(t))}{\partial u(t)} \frac{\partial u(x(t))}{\partial x(t)} + \langle \lambda(x(t+1)) \rangle \frac{\partial x(t+1)}{\partial x(t)} + \langle \lambda(x(t+1)) \rangle \frac{\partial x(t+1)}{\partial u(t)} \frac{\partial u(x(t))}{\partial x(t)} \qquad (8)$$

From this it can be seen that if $\langle \lambda(x(t+1)) \rangle$, $U(x(t),u(t))$ and the system model derivatives are known then $\lambda(x(t))$ can be found.

Next, the optimality equation is defined as

$$\frac{\partial J(x(t))}{\partial u(t)} = 0 \qquad (9)$$

Dynamic programming uses these equations to aid in solving an infinite horizon policy or to determine the control policy for a finite horizon problem.
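The recursion in Eq. (8) can be checked numerically on a toy problem. The sketch below is only an illustration under stated assumptions: a scalar linear model x(t+1) = a x + b u, a fixed (not optimal) feedback u = -k x, a quadratic utility U = 0.5 (q x^2 + r u^2), and no terminal cost. None of these choices or values come from the report.

```python
def cost_to_go(x, t, N, a, b, k, q, r):
    """J(x(t)) of Eq. (6): sum of utilities from stage t to N-1 under u = -k*x."""
    J = 0.0
    for _ in range(t, N):
        u = -k * x
        J += 0.5 * (q * x ** 2 + r * u ** 2)
        x = a * x + b * u
    return J

def costate(x, t, N, a, b, k, q, r):
    """lambda(x(t)) from the backward recursion of Eq. (8); lambda = 0 at stage N."""
    if t == N:
        return 0.0
    u = -k * x
    du_dx = -k                                   # derivative of the policy
    x_next = a * x + b * u                       # model: dx(t+1)/dx = a, dx(t+1)/du = b
    lam_next = costate(x_next, t + 1, N, a, b, k, q, r)
    dU_dx, dU_du = q * x, r * u
    return dU_dx + dU_du * du_dx + lam_next * a + lam_next * b * du_dx

# Hypothetical parameters, for illustration only.
params = dict(N=10, a=0.9, b=0.5, k=0.4, q=1.0, r=0.1)
x0, eps = 2.0, 1e-6
lam = costate(x0, 0, **params)
finite_diff = (cost_to_go(x0 + eps, 0, **params)
               - cost_to_go(x0 - eps, 0, **params)) / (2 * eps)
```

Here `lam` and `finite_diff` should agree closely, since Eq. (8) is the exact derivative of the cost-to-go in Eq. (6) along the closed-loop trajectory.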

3. Training Methods (Approximation Techniques)

This study uses Eqns. (8) and (9) in order to determine the optimal control policy. The basic training takes place in two stages: the training of the action network (the network modeling $u(x(t))$) and the training of the critic network (the network modeling, or approximating, $\lambda(x(t))$). Both networks are assumed to be feedforward multiple layer perceptron networks.

The schematics of the controller (action) and critic networks are presented in Figures 2 and 3. To train the action network for time step t, first $x(t)$ is randomized and the action network outputs $u(t)$. The system model is then used to find $x(t+1)$ and $\partial x(t+1)/\partial u(t)$. Next, the critic from t+1 is used to find $\lambda(x(t+1))$. This information is used to update the action network. This process is continued until a predetermined level of convergence is reached.

In order to train the critic network for the time step t, $x(t)$ is randomized and the output of the critic $\lambda(x(t))$ is found. The action network from step t calculates $u(t)$ and $\partial u(t)/\partial x(t)$. The model is then used to find $\partial x(t+1)/\partial x(t)$, $\partial x(t+1)/\partial u(t)$ and $x(t+1)$. The critic from step t+1 is then used to find $\lambda(x(t+1))$. After this, Eq. (8) is used to find $\lambda^*(x(t))$, the target value for the critic. This process is continued until a predetermined level of convergence is reached. In an infinite-horizon problem, the training ends with one stage; however, for a finite-horizon problem, such as this study, this series of steps is used at each stage. This process will be explained in detail in the next section.
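The alternation between action and critic updates can be sketched on a scalar linear-quadratic problem, where both "networks" reduce to a single gain and the result can be checked against the discrete Riccati equation. This is a simplified infinite-horizon illustration, not the multilayer perceptron training used in this study; the model, the utility U = 0.5 (q x^2 + r u^2), and the numbers are assumptions.

```python
def train_adaptive_critic(a, b, q, r, sweeps=200):
    """Alternate action and critic updates for x(t+1) = a*x + b*u.

    Action 'network': u = -k*x.  Critic 'network': lambda(x) = p*x.
    """
    p, k = 0.0, 0.0
    for _ in range(sweeps):
        # Action update: Eq. (9), dU/du + lambda(x(t+1)) * dx(t+1)/du = 0,
        # i.e. r*u + p*(a*x + b*u)*b = 0, solved for u = -k*x.
        k = a * b * p / (r + b * b * p)
        # Critic update: target from Eq. (8) with du/dx = -k,
        # lambda*(x) = [q + r*k^2 + p*(a - b*k)^2] * x.
        p = q + r * k * k + p * (a - b * k) ** 2
    return k, p

# Hypothetical open-loop unstable plant, for illustration only.
a, b, q, r = 1.1, 1.0, 1.0, 1.0
k_opt, p_opt = train_adaptive_critic(a, b, q, r)
riccati_residual = p_opt - (q + a * a * p_opt * r / (r + b * b * p_opt))
```

At convergence the critic gain satisfies the scalar discrete Riccati equation, so `riccati_residual` is essentially zero and the closed loop a - b k is stable, without the form of the control law having been specified in advance.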

V. OPTIMIZATION/CONTROL

Motivation for the formulation in this section comes from the need of the customer to study the trajectories from the scramjet turn-off to the rocket burn-out conditions of a certain vehicle. The reason for this is the uncertainties in the parameters of the earlier stage designs. Consequently, there will be an envelope of conditions from which the rocket will have to start and yet carry the payload to the pre-specified burn-out conditions. It is assumed that the rocket burn-out conditions will ensure a proper apogee through the coasting period.

The cost function is given by

$$J = -S_1 m_f + \tfrac{1}{2} S_2 (v_f - v_{fD})^2 + \tfrac{1}{2} S_3 (\gamma_f - \gamma_{fD})^2 + \tfrac{1}{2} S_4 (h_f - h_{fD})^2 \qquad (10)$$

where

J = cost function to be minimized
m = mass
v = velocity
$\gamma$ = flightpath angle
h = altitude
$S_i$ = weights on the final conditions

Subscripts

f = final
fD = desired final

Note that this cost function maximizes the final payload while ensuring that the velocity, the flightpath angle and the altitude at the final time are as close to the desired final burn-out conditions as possible.
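A direct transcription of Eq. (10) is sketched below. The function name and the mass, angle, and weight values are illustrative assumptions; only the structure of the cost follows the equation.

```python
def terminal_cost(m_f, v_f, gamma_f, h_f, v_fd, gamma_fd, h_fd, weights):
    """Evaluate Eq. (10); weights = (S1, S2, S3, S4)."""
    s1, s2, s3, s4 = weights
    return (-s1 * m_f
            + 0.5 * s2 * (v_f - v_fd) ** 2
            + 0.5 * s3 * (gamma_f - gamma_fd) ** 2
            + 0.5 * s4 * (h_f - h_fd) ** 2)

# Hypothetical final states, for illustration only.
unit_weights = (1.0, 1.0, 1.0, 1.0)
on_target = terminal_cost(9.5e4, 7617.0, 0.29, 243600.0,
                          7617.0, 0.29, 243600.0, unit_weights)
off_target = terminal_cost(9.5e4, 7600.0, 0.29, 243600.0,
                           7617.0, 0.29, 243600.0, unit_weights)
```

When the final state matches the desired conditions only the mass term remains, so minimizing J maximizes the final mass; any miss in velocity, flightpath angle, or altitude adds a quadratic penalty.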

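The equations of motion are given by

$$\dot{m} = -\frac{T}{g I_{sp}} \qquad (11)$$

$$\dot{h} = v \sin\gamma \qquad (12)$$

$$\dot{v} = \frac{T\cos\alpha - D}{m} - \frac{\mu \sin\gamma}{r^2} \qquad (13)$$

$$\dot{\gamma} = \frac{T\sin\alpha + L}{m v} + \left(\frac{v}{r} - \frac{\mu}{r^2 v}\right)\cos\gamma \qquad (14)$$

where

T = thrust
$L = k_1 \alpha \tfrac{1}{2}\rho v^2 S$ = lift
$D = C_D \tfrac{1}{2}\rho v^2 S$ = drag
$\mu$ = gravitational constant
r = radial distance from the center of the earth
$R_e$ = radius of the earth
$I_{sp}$ = specific impulse
$\alpha$ = angle of attack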

A schematic of the scenario is presented in Figure 4. The final time is unknown; that is, this is a 'free-final time' problem. There is no solution in the current literature for solving the 'free-final time' problem for an envelope of initial conditions (other than the general method of dynamic programming).

In order to solve this problem with neural networks, we transform it to one where altitude is the independent variable. Through this step, we can break the problem down into several segments of altitude; this also allows us to reach the final desired altitude in all cases. The initial conditions for this scenario are the possible final conditions from the termination of the scramjet.

This is a two-point boundary value problem where the initial conditions are known but the final conditions are unknown. Usually, it is solved for a given set of initial conditions; however, in this project we develop an adaptive critic-based solution which will solve the problem for an envelope of initial conditions. By reformulating the model, we are able to remove altitude from the cost function since the final condition in altitude is satisfied exactly.

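The reformulated equations of motion with altitude, h, as the independent variable are given by

$$\frac{dm}{dh} = -\frac{T}{g I_{sp}} \cdot \frac{1}{v \sin\gamma} \qquad (15)$$

$$\frac{dv}{dh} = \frac{T\cos\alpha - D}{m v \sin\gamma} - \frac{g}{v} \qquad (16)$$

$$\frac{d\gamma}{dh} = \frac{T\sin\alpha + L}{m v^2 \sin\gamma} + \left(\frac{1}{r} - \frac{g}{v^2}\right)\cot\gamma \qquad (17)$$

In Eqns. (15-17), the lift coefficient $C_L$ has been approximated with a linear least squares fit, $C_L = k_1 \alpha$, and the drag coefficient has been approximated with a parabolic drag polar with a least squares fit, $C_D = C_{D0} + k_2 \alpha^2$, where

$C_{D0}$, $k_2$, $k_1$ = constants
g = local acceleration due to gravity

In order to calculate the flight time, a fourth equation is added as

$$\frac{dt}{dh} = \frac{1}{v \sin\gamma} \qquad (18)$$

For solutions with neural networks, we convert these nonlinear differential equations to single-step discrete equations as:

$$m_{k+1} = m_k - \frac{T_k}{g_k I_{sp}} \cdot \frac{1}{v_k \sin\gamma_k} \Delta h \qquad (19)$$

$$v_{k+1} = v_k + \left[\frac{T_k\cos\alpha_k - D_k}{m_k v_k \sin\gamma_k} - \frac{g_k}{v_k}\right]\Delta h \qquad (20)$$

$$\gamma_{k+1} = \gamma_k + \left[\frac{T_k\sin\alpha_k + L_k}{m_k v_k^2 \sin\gamma_k} + \left(\frac{1}{r_k} - \frac{g_k}{v_k^2}\right)\cot\gamma_k\right]\Delta h \qquad (21)$$

$$t_{k+1} = t_k + \frac{1}{v_k \sin\gamma_k}\Delta h \qquad (22)$$

where

$\Delta h$ = step size in altitude
k = stage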
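The single-step equations (19)-(22) translate almost line for line into code. The sketch below is an illustration only: the function name, the keyword arguments, and every numerical value are assumptions, and the lift and drag models are taken as L = k1 alpha q S and D = (C_D0 + k2 alpha^2) q S with q the dynamic pressure.

```python
import math

def step_in_altitude(m, v, gamma, t, alpha, dh, *, T, Isp, g, r, rho, S, k1, k2, cd0):
    """Advance mass, velocity, flightpath angle and time by one altitude step dh."""
    q = 0.5 * rho * v ** 2                      # dynamic pressure
    lift = k1 * alpha * q * S                   # linear lift fit
    drag = (cd0 + k2 * alpha ** 2) * q * S      # parabolic drag polar
    climb = v * math.sin(gamma)                 # dh/dt, must be positive
    m_next = m - T / (g * Isp * climb) * dh                                   # Eq. (19)
    v_next = v + ((T * math.cos(alpha) - drag) / (m * climb) - g / v) * dh    # Eq. (20)
    gamma_next = gamma + ((T * math.sin(alpha) + lift) / (m * v * climb)
                          + (1.0 / r - g / v ** 2) / math.tan(gamma)) * dh    # Eq. (21)
    t_next = t + dh / climb                                                   # Eq. (22)
    return m_next, v_next, gamma_next, t_next

# Hypothetical unpowered, drag-free step (T = 0, rho = 0), for illustration only.
coast = step_in_altitude(1000.0, 2000.0, 0.3, 0.0, 0.0, 100.0,
                         T=0.0, Isp=300.0, g=9.8, r=6.4e6, rho=0.0,
                         S=10.0, k1=1.0, k2=1.0, cd0=0.02)
```

Because altitude is the independent variable, the step is only meaningful while v sin(gamma) stays positive, that is, while the vehicle is climbing.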

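The corresponding Hamiltonian of the optimized problem is

$$H_k = \lambda_{m_{k+1}} m_{k+1} + \lambda_{v_{k+1}} v_{k+1} + \lambda_{\gamma_{k+1}} \gamma_{k+1} \qquad (23)$$

where

$\lambda_{x_{k+1}}$ = Lagrangian multiplier for variable x at stage (k+1).

The propagation equations for the Lagrange multipliers are obtained by partial differentiation of the Hamiltonian with respect to the states. They are:

$$\lambda_{x_k} = \frac{\partial H_k}{\partial x_k}, \quad x = [m, v, \gamma]^T \qquad (24)$$

$$\lambda_{m_k} = \lambda_{m_{k+1}} - \frac{\Delta h}{m_k^2}\left[\frac{T_k\cos\alpha_k - D_k}{v_k\sin\gamma_k}\lambda_{v_{k+1}} + \frac{T_k\sin\alpha_k + L_k}{v_k^2\sin\gamma_k}\lambda_{\gamma_{k+1}}\right] \qquad (25)$$

$$\lambda_{v_k} = \lambda_{v_{k+1}} + \frac{\Delta h}{v_k\sin\gamma_k}\left[\frac{T_k}{g_k I_{sp} v_k}\lambda_{m_{k+1}} + \left(\frac{g_k\sin\gamma_k}{v_k} - \frac{T_k\cos\alpha_k + D_k}{m_k v_k}\right)\lambda_{v_{k+1}} + \left(\frac{2 g_k\cos\gamma_k}{v_k^2} - \frac{2 T_k\sin\alpha_k}{m_k v_k^2}\right)\lambda_{\gamma_{k+1}}\right] \qquad (26)$$

$$\lambda_{\gamma_k} = \lambda_{\gamma_{k+1}} + \frac{\Delta h}{v_k\sin\gamma_k}\left[\frac{T_k\cot\gamma_k}{g_k I_{sp}}\lambda_{m_{k+1}} - \frac{(T_k\cos\alpha_k - D_k)\cot\gamma_k}{m_k}\lambda_{v_{k+1}} - \left(\frac{(T_k\sin\alpha_k + L_k)\cot\gamma_k}{m_k v_k} + \left(\frac{1}{r_k} - \frac{g_k}{v_k^2}\right)\frac{v_k}{\sin\gamma_k}\right)\lambda_{\gamma_{k+1}}\right] \qquad (27)$$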

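Note that $\lambda_{k+1}$ is needed to solve for $\lambda_k$. The boundary conditions for the multiplier equations are

$$\lambda_{x_N} = \frac{\partial J}{\partial x_N} \qquad (28)$$

Optimal control is obtained by partially differentiating the Hamiltonian with respect to the control. In our case, angle of attack, $\alpha$, is the control variable. We get

$$\frac{\partial H_k}{\partial \alpha_k} = 0 \qquad (29)$$

This gives

$$-\lambda_{v_{k+1}}\left(T_k\sin\alpha_k + 2 k_2 \alpha_k \tfrac{1}{2}\rho_k v_k^2 S\right) + \lambda_{\gamma_{k+1}}\frac{T_k\cos\alpha_k + k_1 \tfrac{1}{2}\rho_k v_k^2 S}{v_k} = 0 \qquad (30)$$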

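First, we solve for the control at the (N-1)th stage, where N is the preselected number of stages. That is, after using the small angle ($\alpha$) assumption,

$$-\lambda_{v_N}\left(T_{N-1} + 2 k_2 \tfrac{1}{2}\rho_{N-1} v_{N-1}^2 S\right)\alpha_{N-1} + \lambda_{\gamma_N}\frac{T_{N-1} + k_1 \tfrac{1}{2}\rho_{N-1} v_{N-1}^2 S}{v_{N-1}} = 0 \qquad (31)$$

Note that

$$\lambda_{v_N} = S_2 (v_N - v_{fD}) \qquad (32)$$

$$\lambda_{\gamma_N} = S_3 (\gamma_N - \gamma_{fD}) \qquad (33)$$

By substituting for $\lambda_{v_N}$ and $\lambda_{\gamma_N}$ in Eq. (31), we get

$$-S_2 (v_N - v_{fD})\left(T_{N-1} + 2 k_2 \tfrac{1}{2}\rho_{N-1} v_{N-1}^2 S\right)\alpha_{N-1} + S_3 (\gamma_N - \gamma_{fD})\frac{T_{N-1} + k_1 \tfrac{1}{2}\rho_{N-1} v_{N-1}^2 S}{v_{N-1}} = 0 \qquad (34)$$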

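We substitute for $v_N$ and $\gamma_N$ in Eq. (34) in terms of $v_{N-1}$ and $\gamma_{N-1}$ by using the state propagation equations, Eqs. (20) and (21):

$$S_2\left[v_{N-1} + \left(\frac{T_{N-1} - (C_{D0} + k_2\alpha_{N-1}^2)\, q_{N-1} S}{m_{N-1} v_{N-1}\sin\gamma_{N-1}} - \frac{g_{N-1}}{v_{N-1}}\right)\Delta h - v_{fD}\right]\left[-(T_{N-1} + 2 k_2 q_{N-1} S)\,\alpha_{N-1}\right] + S_3\left[\gamma_{N-1} + \left(\frac{(T_{N-1} + k_1 q_{N-1} S)\,\alpha_{N-1}}{m_{N-1} v_{N-1}^2\sin\gamma_{N-1}} + \left(\frac{1}{r_{N-1}} - \frac{g_{N-1}}{v_{N-1}^2}\right)\cot\gamma_{N-1}\right)\Delta h - \gamma_{fD}\right]\frac{T_{N-1} + k_1 q_{N-1} S}{v_{N-1}} = 0 \qquad (35)$$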

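where the dynamic pressure $q_{N-1}$ is

$$q_{N-1} = \tfrac{1}{2}\rho_{N-1} v_{N-1}^2 \qquad (36)$$

This leads to a cubic equation in $\alpha_{N-1}$ as

$$T_2 T_3\, \alpha_{N-1}^3 + (T_1 T_3 + T_5 T_6)\, \alpha_{N-1} + T_4 T_5 = 0 \qquad (37)$$

where

$$T_1 = S_2\left[v_{N-1} + \left(\frac{T_{N-1} - C_{D0}\, q_{N-1} S}{m_{N-1} v_{N-1}\sin\gamma_{N-1}} - \frac{g_{N-1}}{v_{N-1}}\right)\Delta h - v_{fD}\right] \qquad (38)$$

$$T_2 = -\frac{S_2 k_2 q_{N-1} S}{m_{N-1} v_{N-1}\sin\gamma_{N-1}}\Delta h \qquad (39)$$

$$T_3 = -(T_{N-1} + 2 k_2 q_{N-1} S) \qquad (40)$$

$$T_4 = S_3\left[\gamma_{N-1} + \left(\frac{1}{r_{N-1}} - \frac{g_{N-1}}{v_{N-1}^2}\right)\cot\gamma_{N-1}\,\Delta h - \gamma_{fD}\right] \qquad (41)$$

$$T_5 = \frac{T_{N-1} + k_1 q_{N-1} S}{v_{N-1}} \qquad (42)$$

$$T_6 = \frac{S_3 T_5\, \Delta h}{m_{N-1} \sin\gamma_{N-1}\, v_{N-1}} \qquad (43)$$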

We can observe that all quantities are known in terms of quantities at N-1. That is, $\alpha_{N-1}$ is available as a feedback control based on states at N-1.
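Once the coefficients $T_1$ to $T_6$ have been evaluated from the states at N-1, the cubic in Eq. (37) can be solved numerically. The sketch below uses `numpy.roots`; the rule for choosing among real roots (smallest magnitude within a bound, in keeping with the small angle assumption) is an assumption of this illustration, since the report does not state one, and the coefficient values are made up.

```python
import numpy as np

def alpha_from_cubic(T1, T2, T3, T4, T5, T6, alpha_max=0.3):
    """Solve T2*T3*a^3 + (T1*T3 + T5*T6)*a + T4*T5 = 0 for the control a."""
    coeffs = [T2 * T3, 0.0, T1 * T3 + T5 * T6, T4 * T5]
    roots = np.roots(coeffs)
    real = roots[np.abs(roots.imag) < 1e-9].real          # keep real roots only
    feasible = real[np.abs(real) <= alpha_max]            # assumed small-angle bound (rad)
    if feasible.size == 0:
        raise ValueError("no real root within the small-angle bound")
    return float(feasible[np.argmin(np.abs(feasible))])  # assumed selection rule

# Illustrative coefficients only: a^3 + a - 0.2 = 0.
alpha = alpha_from_cubic(T1=0.0, T2=1.0, T3=1.0, T4=-0.2, T5=1.0, T6=1.0)
```

Repeating this solve over a grid of states at N-1 would give the input-output pairs on which a controller network for the last stage could be trained.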

For all other stages, k, we obtain the expression for control in terms of the Lagrangian multipliers at k+1:

$$\alpha_k = \frac{\lambda_{\gamma_{k+1}}}{\lambda_{v_{k+1}}} \cdot \frac{1}{v_k} \cdot \frac{T_k + k_1 q_k S}{T_k + 2 k_2 q_k S} \qquad (44)$$
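Eq. (44) is a closed-form feedback expression once the multipliers at stage k+1 are available. A minimal sketch follows; the function name and the numbers are illustrative assumptions, and the residual shown is the small-angle form of Eq. (30).

```python
def alpha_small_angle(lam_v_next, lam_gamma_next, v, T, q, S, k1, k2):
    """Eq. (44): control at stage k from the multipliers at stage k+1."""
    return (lam_gamma_next / lam_v_next) * (T + k1 * q * S) / (v * (T + 2.0 * k2 * q * S))

# Hypothetical values, for illustration only.
lam_v, lam_g = 2.0, 50.0
v, T, q, S, k1, k2 = 7000.0, 1.0e5, 100.0, 50.0, 3.0, 2.0
alpha_k = alpha_small_angle(lam_v, lam_g, v, T, q, S, k1, k2)

# Small-angle stationarity condition (Eq. 30 with sin(a) ~ a, cos(a) ~ 1).
residual = -lam_v * (T + 2.0 * k2 * q * S) * alpha_k + lam_g * (T + k1 * q * S) / v
```

The residual vanishes up to rounding, which confirms that the expression is the stationary point of the Hamiltonian under the small angle assumption.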

How do we construct the neural networks to solve this problem?

1. Solve for $\alpha_{N-1}$ in terms of $m_{N-1}$, $v_{N-1}$, $\gamma_{N-1}$. Generate various $\alpha_{N-1}$ by changing $m_{N-1}$, $v_{N-1}$, $\gamma_{N-1}$. Use a neural network to output $\alpha_{N-1}$ for $m_{N-1}$, $v_{N-1}$, $\gamma_{N-1}$; this is called the $\alpha_{N-1}$ network.
[We have optimal $\alpha_{N-1}$ now.]

2. In order to solve for $\alpha_k$ (k = 0, 1, 2, ..., N-2) as a function of $m_k$, $v_k$, $\gamma_k$, we need $\lambda_{m_{k+1}}$, $\lambda_{v_{k+1}}$, $\lambda_{\gamma_{k+1}}$. So, use $\lambda_N$, $m_{N-1}$, $v_{N-1}$, $\gamma_{N-1}$ and $\alpha_{N-1}$ from step 1 to solve for $\lambda_{m_{N-1}}$, $\lambda_{v_{N-1}}$, and $\lambda_{\gamma_{N-1}}$ using the $\lambda$ (backward) propagation equations, Eq. (25-27). Train a neural network with $m_{N-1}$, $v_{N-1}$, $\gamma_{N-1}$ as inputs and $\lambda_{N-1}$ as output. Call this the $\lambda_{N-1}$ network.
[We have optimal $\lambda_{N-1}$ now.]

How do we construct other networks?

3. Assume different values of $m_{N-2}$, $v_{N-2}$, $\gamma_{N-2}$ and use a neural network to output $\alpha_{N-2}$. This will not be optimal. Use $m_{N-2}$, $v_{N-2}$, $\gamma_{N-2}$, $\alpha_{N-2}$ in the state equations, Eq. (19-21), to obtain m, v, $\gamma$ at N-1. Use these states in the $\lambda_{N-1}$ network to output $\lambda_{N-1}$. Use these $\lambda_{N-1}$ in the optimal $\alpha$ equation, Eq. (44), to compute $(\alpha_{N-2})_{target}$. Continue this process till convergence.
[We have optimal $\alpha_{N-2}$ now.]

4. Assume different values of $m_{N-2}$, $v_{N-2}$, $\gamma_{N-2}$ and use them to get $\alpha_{N-2}$ from the $\alpha_{N-2}$ network. Use all these in the state propagation equations to calculate states at N-1. Input these states in the $\lambda_{N-1}$ network to get $\lambda_{N-1}$. Use this $\lambda_{N-1}$ and the states and control at N-2 to find $\lambda_{N-2}$ from the $\lambda$ propagation equations. Construct a $\lambda_{N-2}$ network to output $\lambda_{N-2}$ with $m_{N-2}$, $v_{N-2}$, $\gamma_{N-2}$ as inputs.
[We have optimal $\lambda_{N-2}$ now.]

5. Assume different values of $m_{N-3}$, $v_{N-3}$, $\gamma_{N-3}$ and construct an $\alpha_{N-3}$ network similar to the $\alpha_{N-2}$ network in step 3.

6. Construct a $\lambda_{N-3}$ network similar to the $\lambda_{N-2}$ network in step 4.

Continue this process for k = N-1, N-2, ..., 0.

How do we use these networks to generate optimal trajectory from given initial conditions?

Assume any $m_0$, $v_0$, $\gamma_0$ and $h_0$ [within the trained range]. Use the $\alpha_0$ neural network to find the optimal $\alpha_0$ and integrate till $h_1$ for the $\alpha_1$ network is reached. Use the $m_1$, $v_1$, $\gamma_1$ values to find $\alpha_1$ from the $\alpha_1$ neural network and integrate till $h_2$ is reached, and so on, till $h_f$ is reached.

Note that the forward integration can be done in terms of time, and note that the Lagrange multiplier network, used in the controller synthesis, is not needed now.
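The forward pass described above can be sketched as a loop over stages in which each stage's controller is queried once and the state is then integrated to the next altitude breakpoint. Everything below is a hypothetical stand-in: `toy_step` replaces the vehicle model, the lambda controllers replace the trained $\alpha_k$ networks, and holding the control fixed over a stage is an assumption of this illustration.

```python
def fly_trajectory(controllers, step, state, stage_dh, substeps=10):
    """Apply one controller per altitude stage and integrate to the next breakpoint."""
    dh = stage_dh / substeps
    history = [state]
    for controller in controllers:
        alpha = controller(*state)           # control from this stage's network
        for _ in range(substeps):            # integrate with the control held fixed
            state = step(state, alpha, dh)
        history.append(state)
    return history

def toy_step(state, alpha, dh):
    """Placeholder dynamics for (m, v, gamma); not the vehicle model."""
    m, v, gamma = state
    return (m - 0.01 * dh, v + 0.1 * dh, gamma + alpha * dh)

# Two stages with the same placeholder feedback law.
toy_controllers = [lambda m, v, gamma: 0.5 * (1.0 - gamma)] * 2
history = fly_trajectory(toy_controllers, toy_step, (100.0, 10.0, 0.0), stage_dh=1.0)
```

Only the controllers appear in this loop; the multiplier networks are used during synthesis and play no role in the forward integration.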

VI. NUMERICAL RESULTS

In order to verify the applicability of the adaptive critic approach to flexible trajectory optimization, we used the rocket vehicle contained in a test case sent by the customer. We present the results corresponding to two stages of neural-controlled trajectories from the burn-out of the rocket in Figures 5-15. The desired end conditions are $v_f$ = 7617 ft/sec, $\gamma_f$ = 16.636 deg., $h_f$ = 243,600 ft. In trying to match the final conditions, the values of $S_1$, $S_2$, and $S_3$ are chosen to be 1, 1, and 106. This means that we desire to match the final flightpath angle more closely, relative to maximization of final weight and matching the desired final velocity. Effects of changes to the initial flightpath angle are presented in Figures 5-7. We have fixed the initial velocity and mass and changed the initial flightpath angles. It can be observed from Figure 5 that after following different paths of velocity in Stage 1 for the first 12.2 seconds, all the 10 paths try to converge; the same trend can be seen in Figure 6, which shows the flightpath angle histories. Due to the relative emphasis on the flightpath angle, we can observe that the flightpath angles are more convergent to the desired final value than the velocities. The weight history is almost invariant since the thrust is almost constant.

Figures 8-11 represent the mass, flightpath angle, velocity, and altitude histories with time where we change the initial mass in steps. The effectiveness of this formulation is clear from the flightpath angle history presented in Figure 9. Even though the initial step (due to changes in mass) leads to different flightpath angles, the control from the last stage brings them very close. Although velocities appear divergent, it should be observed that they are scattered close to the desired final value. The altitude history is very nearly the same in all the cases, as expected, and satisfies the final condition.

Figures 12-15 represent the state variable histories due to changes in the velocities. Due to the divergence of the flightpath angle value at the end of the first stage, the second stage velocities show apparent deviations from the desired value so that the resulting second stage flightpath angles can be closer to the desired value. The slight variations in the final altitude are due to the forward integration in time, which we limited to 20.4 seconds.

VII. CONCLUSIONS

An approach to solving 'free final time' problems with an envelope of initial conditions has been proposed. This approach, called 'the adaptive critic', consists of two neural networks at each stage developed in a backward sweep. After development, only the controller is used in forward integration of trajectories. Numerical results from the last stage of a launch vehicle trajectory (provided by the customer) show that this approach works well and can be used in design. Further work will involve integration with POST3D, consideration of the other phases of flight, etc.

VIII. ACKNOWLEDGMENT

We gratefully acknowledge the partial support provided by NASA grant NAG-1-1728.

BIBLIOGRAPHY

[1] N. Wiener, Cybernetics: or Control and Communication in the Animal and the Machine. Cambridge, MA: MIT Press, 1949.

[2] W. Ashby, Design for a Brain. New York, NY: Wiley & Sons, 1952.

[3] W. Ashby, An Introduction to Cybernetics. New York, NY: Chapman & Hall, Ltd., 1957.

[4] K. S. Fu, "Learning Control Systems," in Computer and Information Sciences (J. T. Tou and R. H. Wilcox, eds.), pp. 318-343, Washington D.C.: Spartan Books, 1964.

[5] J. Sklansky, "Learning Systems for Automatic Control," IEEE Transactions on Automatic Control, Vol. AC-11, pp. 6-19, January 1966.

[6] Z. Nikolic and K. Fu, "An Algorithm for Learning without External Supervision and Its Application to Learning Control Systems," IEEE Transactions on Automatic Control, Vol. AC-11, pp. 414-422, July 1966.

[7] Y. Tsypkin, "Optimization, Adaptation, and Learning in Automatic Systems," in Computer and Information Sciences-II (J. T. Tou, ed.), pp. 15-32, New York, NY: Academic Press, 1967. Proceedings of the Second Symposium on Computer and Information Sciences held at Battelle Memorial Institute, August 22-24, 1966.

[8] Y. Tsypkin, "Self-Learning--What is it?," IEEE Transactions on Automatic Control, Vol. AC-13, pp. 608-612, December 1968.

[9] Y. Tsypkin, Adaptation and Learning in Automatic Systems. New York, NY: Academic Press, 1971.

[10] J. Mendel and R. McLaren, "Reinforcement Learning Control and Pattern Recognition Systems," in Adaptive, Learning, and Pattern Recognition Systems: Theory and Applications (J. Mendel and K. S. Fu, eds.), pp. 287-318, New York, NY: Academic Press, 1970.

[11] A. Barto, "Connectionist Learning for Control: An Overview," in Neural Networks for Control (W. T. Miller III, R. Sutton, and P. Werbos, eds.), ch. 1, pp. 5-58, Cambridge, MA: MIT Press, 1990.

[12] B. Widrow, N. Gupta, and S. Maitra, "Punish/Reward: Learning with a Critic in Adaptive Threshold Systems," IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-3, pp. 455-465, September 1973.

[13] A. G. Barto, R. S. Sutton, and C. W. Anderson, "Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems," IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-13, pp. 834-846, September/October 1983.

[14] R. Sutton, Temporal Credit Assignment in Reinforcement Learning. PhD thesis, University of Massachusetts, Amherst, 1984. This is the origin of the Adaptive Heuristic Critic algorithm.

[15] P. Werbos, "Backpropagation and Neurocontrol: A Review and Prospectus," in Proceedings of the International Joint Conference on Neural Networks, Vol. I, (Washington D.C.), pp. 209-216, June 18-22, 1989.

[16] J. Jameson, "A Neurocontroller Based on Model Feedback and the Adaptive Heuristic Critic," in Proceedings of the International Joint Conference on Neural Networks, (San Diego, CA), pp. II37-II44, IEEE, June 1990.

[17] D. A. Sofge and D. A. White, "Neural Network Based Process Optimization and Control," in Proceedings of the 29th Conference on Decision and Control, (New York, NY), pp. 3270-3276, IEEE, December 1990.

[18] J. Albus, "A New Approach to Manipulator Control: The Cerebellar Model Articulated Controller (CMAC)," Transactions of the ASME Journal of Dynamic Systems, Measurement and Control, Vol. 97, pp. 220-227, September 1975.

[19] C. Watkins, Learning with Delayed Rewards. PhD thesis, Cambridge University, 1989.

[20] C. Watkins and P. Dayan, "Q-learning," Machine Learning, Vol. 8, No. 3-4, pp. 279-292, 1992.

[21] S. Bradtke, "Reinforcement Learning Applied to Linear Quadratic Regulation," in Advances in Neural Information Processing Systems, pp. 295-302, San Mateo, CA: Morgan Kaufmann Publishers, 1993.

[22] V. Gullapalli, "A Stochastic Reinforcement Learning Algorithm for Learning Real-Valued Functions," Neural Networks, Vol. 3, pp. 671-992, 1990.

[23] M. Stamenkovich, "An Application of Artificial Neural Networks for Autonomous Ship Navigation Through a Channel," in Proceedings of the Vehicle Navigation & Information Systems Conference, (Dearborn, MI), pp. 475-481, October 20-23, 1991.

[24] R. Shelton and J. Peterson, "Controlling a Truck with an Adaptive Critic CMAC Design," Simulation, Vol. 58, pp. 319-326, May 1992.

[25] C. Tham and R. Prager, "Reinforcement Learning Methods for Multi-Linked Manipulator Obstacle Avoidance and Control," Tech. Rep., Cambridge University Engineering Department, Trumpington Street, Cambridge CB2 1PZ, UK, March 25, 1993.

[26] D. Gachet, M. Salichs, and J. Pimentel, "Learning Emergent Tasks for an Autonomous Mobile Robot," Tech. Rep., Dpto. Ingenieria, Universidad Carlos III de Madrid, Spain, 1994.

[27] B. Bavarian, "Introduction to Neural Networks for Intelligent Control," IEEE Control Systems Magazine, Vol. 8, pp. 3-7, April 1988.

[28] W. Miller, R. Sutton, and P. Werbos (Eds.), Neural Networks for Control. Cambridge, MA: The MIT Press, 1990.

[29] D. White and D. Sofge (Eds.), Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, New York, NY: Van Nostrand Reinhold, 1992.

[30] K. Hunt, D. Sbarbaro, R. Zbikowski, and P. Gawthrop, "Neural Networks for Control Systems-A Survey," Automatica, Vol. 28, No. 6, pp. 1083-1112, 1992.

[31] D. Psaltis, A. Sideris, and A. Yamamura, "A Multilayered Neural Network Controller," IEEE Control Systems Magazine, Vol. 8, pp. 17-21, April 1988.

[32] A. Levin and K. Narendra, "Control of Nonlinear Dynamic Systems Using Neural Networks: Controllability and Stabilization," IEEE Transactions on Neural Networks, Vol. 4, pp. 192-206, March 1993.

[33] K. Narendra and K. Parthasarathy, "Identification and Control of Dynamical Systems Using Neural Networks," IEEE Transactions on Neural Networks, Vol. 1, pp. 4-27, March 1990.

[34] B. Fernandez, A. Parlos, and W. Tsai, "Nonlinear Dynamic System Identification Using Artificial Neural Networks (ANNs)," in Proceedings of the International Joint Conference on Neural Networks, Vol. 2, (New York, NY), pp. 133-141, IEEE, June 17-21, 1990.

[35] M. Polycarpou and P. A. Ioannou, "Identification and Control of Nonlinear Systems Using Neural Network Models: Design and Stability Analysis," Tech. Rep. Report 91-09-01, Electrical Engineering, University of Southern California, Sept. 1991.

[36] A. Guez, J. Eilbert, and M. Kam, "Neural Network Architecture for Control," IEEE Control Systems Magazine, Vol. 8, pp. 22-25, April 1988.

[37] D. Hoskins, J. Hwang, and J. Vagners, "Iterative Inversion of Neural Networks and Its Application to Adaptive Control," IEEE Transactions on Neural Networks, Vol. 3, pp. 292-301, March 1992.

[38] J. Hoskins and D. Himmelblau, "Process Control Via Artificial Neural Networks and Reinforcement Learning," Computers and Chemical Engineering, Vol. 16, pp. 241-251, April 1992.

[39] W. Goldenthal and J. Farrell, "Application of Neural Networks to Automatic Control," in Proceedings of the AIAA Conference on Guidance, Navigation, and Control, (Portland, OR), pp. 1108-1112, August 20-22, 1990.

[40] M.-S. Lan and S. Chand, "Solving Linear Quadratic Discrete-time Optimal Controls Using Neural Networks," in Proceedings of the 29th Conference on Decision and Control, (Honolulu, HI), pp. 2770-2772, December 1990.

[41] D. Tank and J. Hopfield, "Simple Neural Optimization Networks: An A/D Converter, Signal Decision Circuit and a Linear Programming Circuit," IEEE Transactions on Circuits and Systems, Vol. CAS-33, pp. 533-541, May 1986.

[42] Y. Iguni, H. Sakai, and H. Tokumaru, "A Nonlinear Regulator Design in the Presence of System Uncertainties using Multilayered Neural Networks," IEEE Transactions on Neural Networks, Vol. 2, No. 4, pp. 410-417, 1991.

[43] A. Bouzerdoum and T. Pattison, "Neural Network for Quadratic Optimization with Bound Constraints," IEEE Transactions on Neural Networks, Vol. 4, pp. 293-304, March 1993.

[44] J. Antony and L. Acar, "Real Time Nonlinear Optimal Control Using Neural Networks," in Proceedings of the American Control Conference, Vol. 3, (Baltimore, MD), pp. 2926-2930, June 29-July 1, 1994.

Figure 1: Adaptive Critic for Control

Figures 2 and 3: Schematics of the controller (action) and critic networks

Figure 4: Schematic of the Trajectory Optimization Scenario

Figure 5: Velocity History (with changes in initial gamma)

Figure 6: Gamma History

Figure 7: Weight History

Figure 8: Weight History (with changes in initial mass)

Figure 9: Gamma History

Figure 10: Velocity History

Figure 11: Altitude History

Figure 12: Gamma History (with changes in initial velocity)

Figure 13: Weight History (with changes in initial velocity)

Figure 14: Velocity History (with changes in initial velocity)

Figure 15: Altitude History (with changes in initial velocity)

All histories in Figures 5-15 are plotted against Time (sec).

APPENDIX

A Class of Modified Hopfield Networks

for Aircraft Identification and Control

Jie Shen    S. N. Balakrishnan*

Department of Mechanical and Aerospace Engineering

and Engineering Mechanics

University of Missouri-Rolla

Rolla, MO 65401

(573)341-4675

Abstract

This paper presents a class of modified Hopfield neural networks and their use in solving aircraft optimal control and identification problems. This class of networks consists of parallel recurrent networks which have variable dimensions that can be changed to fit the problems under consideration. It has a structure to implement an inverse transformation that is essential for embedding optimal control gain sequences. Equilibrium solutions are discussed. Energy minimization of the networks leads to identification of the system parameters. Numerical results are provided to identify the dynamics of an aircraft, and the corresponding optimal control is calculated on-line. Comparison of the neural network solutions with point-wise optimal control using LQR formulation for this multivariable control problem shows near identical results throughout the trajectories.

1 Introduction

There has been a spurt of activities in the area of artificial neural networks (ANN) during the last ten years. For a survey of the ANN work done in the areas of identification and control, see bibliography. There are two types of networks used in almost all ANN applications. The first is the more widespread feedforward network and the second is a less understood recurrent network. The feedforward networks, where data flow is unidirectional, are essentially static; the recurrent networks, on the other hand, are based on feedback connections. Due to feedback connections, the recurrent networks are better suited for control problems which are based on closed-loop solutions.

In this paper, a variation of the Hopfield network is proposed. Compared to the classic Hopfield network, it keeps the characteristic of energy minimization, which is used to minimize the identification errors. The mean-square error is used as a performance criterion in system identification, and is formulated in an energy form to utilize the network functionality. Based on the equilibrium analysis, these networks can perform an inverse transformation on matrices and other auxiliary mathematical operations. This feature allows the networks to give out optimal control gain sequences based on the identified system parameters. In addition, this class of networks has more degrees of freedom than the classic Hopfield networks. The network architecture can be augmented according to the problems at hand.

The modified Hopfield network is analyzed in section 2. Its identification application is presented in section 3, while the control application is in section 4. Both the principles and examples are given in each individual section. Conclusions are presented in section 5.

*Associate Fellow, AIAA (to whom all correspondence should be sent)

American Institute of Aeronautics and Astronautics

2 Nlodified Hopfietd iN-etworks

2.1 Stability

The modified Hopfield network is a variant of the classical Hopfield network. Fig (1) shows its basic features.

We will demonstrate its stability by analyzing its dynamics and using an energy function. The network has two clusters of neurons. The right part of the network is characterized by outputs $q_j$, which are nonlinear functions $f$ of their states $u_j$,

$$q_j = f(u_j) \tag{1}$$

where

$$u_j = \sum_{i=1}^{n} w_{ji} v_i - b_j, \qquad j = 1, 2, \ldots, m \tag{2}$$

with $b_j$ the exogenous input current, and $v_i$ the output of the left cluster of amplifiers. Conductance $w_{ij}$ connects the output of the $j$th neuron to the input of the $i$th neuron, as indicated in Fig (1).

The left part of the network is characterized by the dynamics. The amplifiers have input conductances and capacitances denoted as $g_i$ and $C_i$, respectively. They both represent the amplifiers' input impedance and are responsible for the appropriate time-domain behavior of the entire network. At the same time, we assume that the response time of $f(u_j)$ is negligibly small compared to that of the amplifiers $g(u_i)$.

Under these assumptions, Kirchhoff's law gives us

$$C_i \frac{du_i}{dt} = -a_i - G_i u_i - \sum_{j=1}^{m} w_{ji} q_j, \qquad i = 1, 2, \ldots, n \tag{3}$$

where $G_i$ denotes the sum of all conductances connected to the input of the $i$th neuron and is equal to

$$G_i = g_i + \sum_{j=1}^{m} w_{ji} \tag{4}$$

and $a_i$ is the exogenous input current.

Using Equation (1), the above formula can be expressed as follows

$$C_i \frac{du_i}{dt} = -a_i - G_i u_i - \sum_{j=1}^{m} w_{ji}\, f\!\left(\sum_{k=1}^{n} w_{jk} v_k - b_j\right), \qquad i = 1, 2, \ldots, n \tag{5}$$

We now define the following Liapunov function as an energy function $E$ for the modified Hopfield networks

$$E(v) = \sum_{i=1}^{n} a_i v_i + \sum_{j=1}^{m} F\!\left(\sum_{k=1}^{n} w_{jk} v_k - b_j\right) + \sum_{i=1}^{n} G_i \int_0^{v_i} g^{-1}(v)\,dv \tag{6}$$

Define

$$f(u) = \frac{dF(u)}{du} \tag{7}$$

The components of the gradient vector of the assumed energy function (6) can be expressed by finding its derivatives as follows

$$\frac{\partial E(v)}{\partial v_i} = a_i + G_i u_i + \sum_{j=1}^{m} w_{ji}\, f\!\left(\sum_{k=1}^{n} w_{jk} v_k - b_j\right) \tag{8}$$

The time derivative of the energy function can now be expressed using the above equations

$$\frac{dE}{dt} = \sum_{i=1}^{n}\left[a_i + u_i G_i + \sum_{j=1}^{m} w_{ji} f(u_j)\right]\frac{dv_i}{dt} = -\sum_{i=1}^{n} C_i \frac{dv_i}{dt}\frac{du_i}{dt} = -\sum_{i=1}^{n} C_i \frac{d g^{-1}(v_i)}{dv_i}\left(\frac{dv_i}{dt}\right)^2 \tag{9}$$

Since $C_i > 0$, and $g^{-1}(v_i)$ is a monotonically increasing function, the sum on the right side of (9) is nonnegative, and therefore we have $dE/dt \le 0$, with $dE/dt = 0$ only when $dv_i/dt = 0$ for all $i$. This means that the evolution of the dynamic system (5) in state space always seeks the minima of the energy surface $E$. Integration of Eqs. (5) and (6) shows that the outputs $v_i$ do follow gradient descent paths on the $E$ surface.
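The descent property can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes $f = g = \tanh$ (so $F(s) = \ln\cosh s$ and $g^{-1} = \operatorname{arctanh}$), a forward-Euler integrator, and a small hypothetical network whose weights and inputs are invented for illustration. It integrates the dynamics $C_i\,du_i/dt = -a_i - G_i u_i - \sum_j w_{ji} f(\sum_k w_{jk} v_k - b_j)$ and records the energy $E(v) = \sum_i a_i v_i + \sum_j F(\sum_k w_{jk} v_k - b_j) + \sum_i G_i \int_0^{v_i} g^{-1}(v)\,dv$ at every step.

```python
import numpy as np

def simulate_network(W, a, b, G, C, dt=0.01, steps=2000):
    """Euler-integrate the network dynamics and record the energy at each step.

    W has shape (m, n): n dynamic (left) neurons, m algebraic (right) neurons.
    """
    u = np.zeros(W.shape[1])              # states of the left cluster
    energies = []
    for _ in range(steps):
        v = np.tanh(u)                    # left-cluster outputs, v_i = g(u_i)
        s = W @ v - b                     # right-cluster states
        # Energy with F(s) = log cosh(s) and
        # integral_0^v arctanh(x) dx = v*arctanh(v) + 0.5*log(1 - v^2)
        E = (a @ v + np.sum(np.log(np.cosh(s)))
             + G @ (v * np.arctanh(v) + 0.5 * np.log1p(-v**2)))
        energies.append(E)
        du = (-a - G * u - W.T @ np.tanh(s)) / C
        u = u + dt * du
    return np.tanh(u), np.array(energies)

# Hypothetical example values (not from the paper).
W = np.array([[0.5, -0.3], [0.2, 0.4], [-0.1, 0.6]])
a = np.array([0.2, -0.1])
b = np.array([0.1, -0.2, 0.3])
G = np.array([1.0, 1.0])
C = np.array([1.0, 1.0])

v_final, energies = simulate_network(W, a, b, G, C)
print("energy at start and end:", energies[0], energies[-1])
print("largest step-to-step energy change:", np.max(np.diff(energies)))
```

With a sufficiently small step size the recorded energy should never increase, and the gradient of $E$ at the final output should be close to zero.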


Figure 1: Modified Hopfield Networks

2.2 Solution

In order to get the analytic expression for the converged value of the networks, we assume small signals and that they work in the linear region of the amplifier. Note that in the above derivation, there is no difference if we denote the connection matrices in the left and right adjoint subnets separately. These connection matrices are nothing but the weights $w_{ij}$. Let the right connection matrix be $D_1$, and the left connection matrix be $D_2$; the stability conclusion still holds. Under these mild assumptions, and with Kirchhoff's law, we can have a relation in matrix form as

$$C \frac{dU}{dt} = -a - GU - D_1^T Q \tag{10}$$

$$Q = K_2 (D_2 V - b) = K_2 (K_1 D_2 U - b) \tag{11}$$

where $a$ and $b$ are the exogenous inputs of the adjoint networks $G$ and $F$ respectively. $U$ is the input to $G$ and $V$ is the output of $G$. We also assume that all amplifier gains $K_1$ in $G$ are equal. Similarly the gains of amplifiers in $F$ are $K_2$. $K_1$ and $K_2$ are scalars. Substitute Equation (11) into Equation (10) to get

$$C \frac{dU}{dt} = -a - GU - D_1^T K_2 (K_1 D_2 U - b) \tag{12}$$

$$\phantom{C \frac{dU}{dt}} = -(G + K_1 K_2 D_1^T D_2)\,U + K_2 D_1^T b - a \tag{13}$$

When the networks reach equilibrium, $dU/dt = 0$, and

$$V = K_1 U = \left(D_1^T D_2 + \frac{G}{K_1 K_2}\right)^{-1}\left(D_1^T b - \frac{a}{K_2}\right) \tag{14}$$
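As a numerical check of the equilibrium analysis, the sketch below integrates the linearized network and compares the settled output with a closed-form expression. It assumes the linearized dynamics take the form $C\,dU/dt = -a - GU - D_1^T Q$ with $Q = K_2(K_1 D_2 U - b)$ and $V = K_1 U$, so that the equilibrium is $V = (D_1^T D_2 + G/(K_1K_2))^{-1}(D_1^T b - a/K_2)$. All numerical values, the forward-Euler integrator, and the choice $D_1 = D_2$ (which keeps the dynamics stable) are hypothetical.

```python
import numpy as np

def equilibrium_output(D1, D2, G, a, b, K1, K2):
    """Closed-form equilibrium output V under the assumed linear model."""
    M = D1.T @ D2 + G / (K1 * K2)
    return np.linalg.solve(M, D1.T @ b - a / K2)

def integrate_network(D1, D2, G, C, a, b, K1, K2, dt=1e-3, steps=20000):
    """Forward-Euler integration of C dU/dt = -a - G U - D1^T Q."""
    U = np.zeros(D2.shape[1])
    for _ in range(steps):
        Q = K2 * (K1 * (D2 @ U) - b)          # output of the algebraic subnet
        U = U + dt * np.linalg.solve(C, -a - G @ U - D1.T @ Q)
    return K1 * U                              # V = K1 U

# Hypothetical parameters (not from the paper).
D2 = np.array([[1.0, 0.5], [0.0, 1.0], [0.3, -0.2]])
D1 = D2.copy()
G = np.diag([0.5, 0.8])
C = np.eye(2)
a = np.array([0.1, -0.2])
b = np.array([1.0, 0.5, -0.3])
K1, K2 = 2.0, 3.0

V_closed = equilibrium_output(D1, D2, G, a, b, K1, K2)
V_settled = integrate_network(D1, D2, G, C, a, b, K1, K2)
print("closed form:", V_closed)
print("settled ODE:", V_settled)
```

The two printed vectors should agree to within the integration tolerance, which illustrates that the settled network output performs the matrix inversion implicitly.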

2.3 Discussion

Equation (14) gives the general solution for the modified Hopfield networks. Compared with the classical Hopfield networks, an obvious feature is that it involves more parameters. We may find some applications in which these parameters can be taken advantage of. Also some of them can be avoided depending upon the desired objective.

Note that we get two factors involved in the inverse operation. As a result, the structure of this kind of recurrent network is quite flexible. While the classical Hopfield network is self-recurrent, that is, it feeds back its own output, the variation is mutually recurrent, that is, it feeds back the outputs of its two adjoint parts. This architecture can be expanded further, with ease, to three or four subnets or to several layers as needed. Some special applications may need that computational relationship, but it is not needed for the application considered here.

The dimensions of parameters $a$, $b$, $D_1$ and $D_2$ depend on the applications. $K_1$ and $K_2$ also can be designed to provide appropriate magnitudes. If $K_2$ is large, then $G$ and $a$ will both have less effect on the output $V$, or become negligible. If we want $a$ to have reasonable influence in the expression while $G$ should not, then we design $K_1$ large, and determine $K_2$ according to the requirements on $a$.

3 System Identification

3.1 Problem Formulation

The proposed structure for system identification in the time domain is shown in Fig (2). The dynamics of a linear plant (to be identified) are defined by the usual equations, where $A_p$ and $B_p$ are unknown matrices and $x$ and $u$ are the state and control respectively.

$$\dot{x} = A_p x + B_p u \tag{15}$$

The dynamic equation of the system model depends on $e$, which is the error vector between the actual system states $x$ and the estimated values $y$.

$$\dot{y} = A_s(e, t)\,x + B_s(e, t)\,u - Ke \tag{16}$$

Therefore, the error dynamics equation is a function of state and control.

$$\dot{e} = (A_p - A_s)\,x + (B_p - B_s)\,u + Ke \tag{17}$$

The goal is to minimize simultaneously the square-error rates of all states utilizing a Hopfield network. To ensure global convergence of the parameters, the energy function of the network must be quadratic in terms of the parameter errors, $(A_p - A_s)$ and $(B_p - B_s)$. However, the error rates $\dot{e}$ in Eq. (17) are functions of the parameter errors and the state errors. The state error depends on $y$, which, in turn, is influenced by $A_s$ and $B_s$. Hence, an energy function based on $\dot{e}$ will have a recurrent relation with $A_s$ and $B_s$. To avoid this, we use the following energy function, where tr defines the trace of a matrix, and $(\cdot)^T$ is the transpose of a matrix (see Raol, Bibliography).

$$E = \frac{1}{T}\int_0^T \dot{e}_q^T(t)\,\dot{e}_q(t)\,dt = \frac{1}{T}\int_0^T (\dot{x} - A_s x - B_s u)^T (\dot{x} - A_s x - B_s u)\,dt \tag{18}$$

In order to facilitate the derivation, we expand the items in the factors of the energy function, and utilize the trace identities to simplify.

$$E = \frac{1}{T}\left[\mathrm{tr}\!\left(\int_0^T \dot{x}\dot{x}^T dt\right) + \mathrm{tr}\!\left(A_s \int_0^T x x^T dt\, A_s^T\right) + \mathrm{tr}\!\left(B_s \int_0^T u u^T dt\, B_s^T\right) + 2\,\mathrm{tr}\!\left(A_s \int_0^T x u^T dt\, B_s^T\right) - 2\,\mathrm{tr}\!\left(A_s \int_0^T x \dot{x}^T dt\right) - 2\,\mathrm{tr}\!\left(B_s \int_0^T u \dot{x}^T dt\right)\right] \tag{19}$$

Equation (19) is quadratic in terms of $A_s$ and $B_s$. Substituting $A_p x + B_p u$ for $\dot{x}$ in Eq. (19) indicates that $E$ is also a quadratic function of the parameter errors. Based on Eq. (19), we can program a Hopfield network that has neurons with their states representing different elements of the $A_s$ and $B_s$ matrices. From the convergence properties of the Hopfield network, the equilibrium state is achieved when the partial derivatives $\partial E/\partial A_s$ and $\partial E/\partial B_s$ are zero. We use the following identities to find the partial derivatives of $E$ (the first holds for symmetric $B$).

$$\frac{\partial}{\partial A}\,\mathrm{tr}(A B A^T) = 2AB \tag{20}$$

$$\frac{\partial}{\partial A}\,\mathrm{tr}(A B D) = D^T B^T \tag{21}$$

This results in the following, where $A_s$ and $B_s$ are the optimum solutions of the estimation problem.
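As a worked sketch of this step (a derivation under stated assumptions, not a transcription of the paper's own equations), suppose the energy has the equation-error form $E = \frac{1}{T}\int_0^T (\dot{x} - A_s x - B_s u)^T(\dot{x} - A_s x - B_s u)\,dt$. Differentiating the expanded trace form term by term and setting the gradients to zero gives stationarity conditions that are linear in $A_s$ and $B_s$:

```latex
\begin{aligned}
\frac{\partial E}{\partial A_s}
  &= \frac{2}{T}\left[ A_s \int_0^T x x^T\,dt + B_s \int_0^T u x^T\,dt
     - \int_0^T \dot{x}\, x^T\,dt \right] = 0, \\
\frac{\partial E}{\partial B_s}
  &= \frac{2}{T}\left[ A_s \int_0^T x u^T\,dt + B_s \int_0^T u u^T\,dt
     - \int_0^T \dot{x}\, u^T\,dt \right] = 0, \\
\begin{bmatrix} A_s & B_s \end{bmatrix}
\begin{bmatrix} \int_0^T x x^T\,dt & \int_0^T x u^T\,dt \\[2pt]
                \int_0^T u x^T\,dt & \int_0^T u u^T\,dt \end{bmatrix}
  &= \begin{bmatrix} \int_0^T \dot{x}\, x^T\,dt & \int_0^T \dot{x}\, u^T\,dt \end{bmatrix}.
\end{aligned}
```

The last line stacks the two conditions into one linear system; the network's equilibrium solves a system of this kind without forming the inverse explicitly, provided the input is rich enough for the block matrix on the left to be invertible.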

Figure 3: Schematic of Longitudinal Flight (α: angle of attack, γ: flight path angle, V: velocity, θ: pitch angle, q: pitch rate)

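Define

$$[w_{ij}] = \int_0^T \begin{bmatrix}
xx^T & 0 & 0 & 0 & xu & 0 & 0 & 0 \\
0 & xx^T & 0 & 0 & 0 & xu & 0 & 0 \\
0 & 0 & xx^T & 0 & 0 & 0 & xu & 0 \\
0 & 0 & 0 & xx^T & 0 & 0 & 0 & xu \\
ux^T & 0 & 0 & 0 & u^2 & 0 & 0 & 0 \\
0 & ux^T & 0 & 0 & 0 & u^2 & 0 & 0 \\
0 & 0 & ux^T & 0 & 0 & 0 & u^2 & 0 \\
0 & 0 & 0 & ux^T & 0 & 0 & 0 & u^2
\end{bmatrix} dt$$

with $[a_i]$ the associated bias vector. (29)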

With these as weights and biases of the networks, $a_{ij}$ and $b_j$ can be solved through Eqs. (27) and (28). The derivation of $[w_{ij}]$ and $[a_i]$ assumes that the neuron input conductance, $G_i$, is low enough that the second term in Eq. (3) can be neglected.

3.2 Numerical Example

We present a representative numerical example to validate the capacities of the modified Hopfield networks. The orientation of an aircraft involving longitudinal dynamics is shown in Fig (3). The linearized equations of motion of an aircraft in a vertical plane are given by

$$\dot{x} = Ax + Bu \tag{30}$$

where the elements of the state vector $x$ are

$$x = [\,u' \;\; \alpha \;\; \theta \;\; q\,]^T \tag{31}$$

The matrix $A$ represents the dynamic stability derivatives and is given by

$$A_p = \begin{bmatrix}
-0.0148 & -13.88 & -32.2 & 0 \\
-0.00019 & -0.34 & 0 & 1 \\
0 & 0 & 0 & 1 \\
0.00005 & -4.8 & 0 & -0.5
\end{bmatrix}$$

The matrix $B$ represents the control derivatives and is given by

$$B_p = \begin{bmatrix} -1.1 \\ -0.11 \\ 0 \\ -8.74 \end{bmatrix}$$

The control variable $u$ represents elevator deflection.

Fig (4) shows the simulation results of the system identification. These figures show the histories of only four elements, two from $A_s$ and two from $B_s$; similar results can be obtained for the other elements of the $A_s$ and $B_s$ matrices. From the numerical results shown in Fig (4), it is clear that the network is able to identify the system parameters very well.
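For comparison with the network result, the same estimation problem can be solved in batch form. The sketch below is not the Hopfield network itself: it simulates the plant with a hypothetical multi-sine elevator input and a hypothetical initial state, then finds the $A_s$ and $B_s$ that minimize the summed squared equation error $\|\dot{x} - A_s x - B_s u\|^2$ by ordinary least squares. The plant entries are the values as read from the example above; the RK4 integrator, sampling step, and excitation are assumptions for illustration.

```python
import numpy as np

# Plant matrices as read from the example (state x = [u', alpha, theta, q]).
A_p = np.array([[-0.0148, -13.88, -32.2, 0.0],
                [-0.00019, -0.34, 0.0, 1.0],
                [0.0, 0.0, 0.0, 1.0],
                [0.00005, -4.8, 0.0, -0.5]])
B_p = np.array([[-1.1], [-0.11], [0.0], [-8.74]])

def elevator(t):
    """Hypothetical multi-sine excitation in radians (not from the paper)."""
    return 0.02 * (np.sin(0.5 * t) + np.sin(1.3 * t) + np.sin(2.9 * t))

def simulate_plant(T=20.0, dt=0.01):
    """RK4 simulation of x_dot = A_p x + B_p u; returns samples of x, u, x_dot."""
    f = lambda t, x: A_p @ x + B_p[:, 0] * elevator(t)
    x = np.array([1.0, 0.01, 0.0, 0.0])      # hypothetical initial state
    xs, us, xdots = [], [], []
    for k in range(int(T / dt)):
        t = k * dt
        xs.append(x); us.append(elevator(t)); xdots.append(f(t, x))
        k1 = f(t, x)
        k2 = f(t + dt / 2, x + dt / 2 * k1)
        k3 = f(t + dt / 2, x + dt / 2 * k2)
        k4 = f(t + dt, x + dt * k3)
        x = x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return np.array(xs), np.array(us), np.array(xdots)

def identify(xs, us, xdots):
    """Least-squares estimate of [A_s B_s] from sampled x, u and x_dot."""
    Z = np.column_stack([xs, us])             # regressor rows [x^T, u]
    theta, *_ = np.linalg.lstsq(Z, xdots, rcond=None)
    return theta[:4].T, theta[4:].T           # A_s (4x4), B_s (4x1)

xs, us, xdots = simulate_plant()
A_s, B_s = identify(xs, us, xdots)
print("max |A_s - A_p|:", np.max(np.abs(A_s - A_p)))
print("max |B_s - B_p|:", np.max(np.abs(B_s - B_p)))
```

Because the state derivative is taken directly from the model here, the fit is essentially exact; with measured data the derivative would have to be estimated, and the accuracy would then depend on noise and on how rich the excitation is.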

Figure 4: Identification History

4 Optimal Control Application

4.1 Problem Formulation

Let the plant to be controlled be described by the linear equation

$$x_{k+1} = A_k x_k + B_k u_k \tag{32}$$

with $x_k \in R^n$ and $u_k \in R^m$. The associated performance index is the quadratic function

$$J_i = \frac{1}{2} x_N^T S_N x_N + \frac{1}{2}\sum_{k=i}^{N-1}\left(x_k^T Q_k x_k + u_k^T R_k u_k\right) \tag{33}$$

defined over the time interval of interest $[i, N]$. Note that both the plant and the cost-weighting matrices can be time-varying. The initial plant state is given as $x_i$. We assume that $Q_k$, $R_k$ and $S_N$ are symmetric positive semidefinite matrices, and in addition that $|R_k| \ne 0$ for all $k$.

The objective is to find the control sequence $u_i, u_{i+1}, \ldots, u_{N-1}$ that minimizes $J_i$.

To solve this linear quadratic regulator (LQR) problem, we begin with the Hamiltonian function

$$H^k = \frac{1}{2}\left(x_k^T Q_k x_k + u_k^T R_k u_k\right) + \lambda_{k+1}^T\left(A_k x_k + B_k u_k\right) \tag{34}$$

Then we can get the state and costate equations

$$x_{k+1} = \frac{\partial H^k}{\partial \lambda_{k+1}} = A_k x_k + B_k u_k \tag{35}$$

$$\lambda_k = \frac{\partial H^k}{\partial x_k} = Q_k x_k + A_k^T \lambda_{k+1} \tag{36}$$

and the stationarity condition

$$0 = \frac{\partial H^k}{\partial u_k} = R_k u_k + B_k^T \lambda_{k+1} \tag{37}$$

This procedure will finally lead to the control $u_k = -K_k x_k$, where the Kalman gain $K_k$ is given by

$$K_k = \left(B_k^T S_{k+1} B_k + R_k\right)^{-1} B_k^T S_{k+1} A_k \tag{38}$$

In terms of the Riccati variable $S_k$, now

$$S_k = A_k^T S_{k+1}\left(A_k - B_k K_k\right) + Q_k \tag{39}$$

Figure 5: Simulation plot

In the application where the control interval is finite, $S_N$ will be given. Using Equations (38) and (39) alternately, we get a series of gains $K_k$. The gain matrix $K_k$ will generally be time-varying even when the matrices $A_k$, $B_k$, $Q_k$ and $R_k$ are all constant. If the control interval is infinite, however, the above formulation needs to be changed a little.
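The alternating use of the gain and Riccati updates can be sketched in a few lines. This is a minimal illustration assuming the standard discrete-time LQR recursion $K_k = (B^T S_{k+1} B + R)^{-1} B^T S_{k+1} A$ and $S_k = A^T S_{k+1}(A - B K_k) + Q$ with constant matrices; the function name and the double-integrator example values are hypothetical.

```python
import numpy as np

def lqr_gain_sequence(A, B, Q, R, S_N, N):
    """Backward recursion for the gains K_0..K_{N-1} and matrices S_0..S_N."""
    S = [None] * (N + 1)
    K = [None] * N
    S[N] = S_N                                   # terminal weight is given
    for k in range(N - 1, -1, -1):
        BtS = B.T @ S[k + 1]
        K[k] = np.linalg.solve(BtS @ B + R, BtS @ A)     # gain update
        S[k] = A.T @ S[k + 1] @ (A - B @ K[k]) + Q       # Riccati update
    return K, S

# Hypothetical example: a double integrator sampled at 0.1 s.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])
K, S = lqr_gain_sequence(A, B, Q, R, S_N=np.zeros((2, 2)), N=200)
print("gain near the final time:", K[-1])
print("gain far from the final time:", K[0])

# Scalar check with A = B = Q = R = 1 and a long horizon.
one = np.array([[1.0]])
K_s, S_s = lqr_gain_sequence(one, one, one, one, S_N=np.zeros((1, 1)), N=60)
print("scalar gain far from the final time:", K_s[0][0, 0])
```

Even with constant matrices the gains differ near the final time, and far from it they settle to a constant value, which is the behavior the infinite-interval case relies on.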

4.2 Network Solution/Implementation

We briefly discuss the recurrent network solution for the optimal gain sequence.

Based on the recursions in Equations (38) and (39), the most commonly encountered operations are scalar and outer-product vector multiplications and matrix-vector multiplication. But the crucial operation here is the inverse needed to get the Kalman gain.

The modified Hopfield networks contain both invariant and variable parameters. Invariant parameters are fixed in the neuron-computing model, while variable parameters can be modified. By comparing Eqs. (38) and (39) with the stable output of the network, Eq. (14), if we set $D_1^T = B_k^T S_{k+1}$, $D_2 = B_k$, $G = R_k$, $b = A_k$ and $a = 0$, the network will give us the Kalman sequence. As we know, it is not difficult for the circuits to achieve the multiplication of two signals. However, since $D_1$ and $D_2$ are connection conductances, can they be changed by other signals like $S_k$, $S_{k+1}$ and $B_k$?
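The parameter mapping can be illustrated numerically. The sketch below assumes the linearized network settles to $V = (D_1^T D_2 + G/(K_1 K_2))^{-1}(D_1^T b - a/K_2)$, takes unit gains $K_1 = K_2 = 1$ and unit capacitances, and uses hypothetical values for $A$, $B$, $R$ and $S_{k+1}$. It compares the network equilibrium, a forward-Euler relaxation of the network dynamics, and the directly computed gain $(B^T S_{k+1} B + R)^{-1} B^T S_{k+1} A$.

```python
import numpy as np

def network_equilibrium(D1, D2, G, a, b, K1=1.0, K2=1.0):
    """Assumed equilibrium output of the linearized modified Hopfield network."""
    return np.linalg.solve(D1.T @ D2 + G / (K1 * K2), D1.T @ b - a / K2)

def gain_via_network(A, B, R, S_next):
    """Set D1^T = B^T S_{k+1}, D2 = B, G = R, b = A, a = 0 and read out V."""
    D1 = S_next.T @ B
    a = np.zeros((B.shape[1], A.shape[1]))
    return network_equilibrium(D1, B, R, a, A)

def gain_via_relaxation(A, B, R, S_next, dt=0.01, steps=5000):
    """Euler relaxation of dU/dt = -(R + B^T S B) U + B^T S A (unit C, K1, K2)."""
    BtS = B.T @ S_next
    U = np.zeros((B.shape[1], A.shape[1]))
    for _ in range(steps):
        U = U + dt * (-(R + BtS @ B) @ U + BtS @ A)
    return U

# Hypothetical one-step data (not from the paper).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
R = np.array([[1.0]])
S_next = np.array([[2.0, 0.5], [0.5, 1.0]])

K_direct = np.linalg.solve(B.T @ S_next @ B + R, B.T @ S_next @ A)
K_network = gain_via_network(A, B, R, S_next)
K_relaxed = gain_via_relaxation(A, B, R, S_next)
print("direct :", K_direct)
print("network:", K_network)
print("relaxed:", K_relaxed)
```

All three results should coincide, since with these settings the network's equilibrium condition is the same linear system that defines the gain.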

The answer is a voltage-controlled switch. A voltage-controlled switch can be implemented using a single field-effect or MOS transistor operating in the resistive (ohmic, also called linear) region. So, all the signals are preferred to be voltage signals. The system parameters $A_k$ and $B_k$ are generally the outputs of identification modules, which can conveniently be given out as voltages. The optimal control formulation does not limit the $A_k$, $B_k$, $Q_k$ and $R_k$ matrices to be constants, and the modified Hopfield network does not limit its capacities either. Time-varying $A_k$, $B_k$, etc. are easy to feed into the net as voltage signals to be used in the computations.

4.3 Numerical Example

We consider the synthesis of an optimal longitudinal autopilot in this section. The performance index in this application is an infinite-time quadratic cost function. The minimizing control is expected to drive the deviations of the longitudinal dynamics in pitch angle $\theta$, pitch rate $q$, forward velocity $u'$, and angle of attack $\alpha$ to zero.

Figure 6: Control History

The system parameters are the same as in the identification example. The performance index has the form

$$J = \int_0^\infty \left(x^T Q x + u^T R u\right) dt \tag{40}$$

where $Q$ and $R$ are appropriate weighting matrices. We select $R = 91.32$ and

$$Q = \begin{bmatrix}
10.37 & 0 & 0 & 0 \\
0 & 0.0004 & 0.0016 & 0 \\
0 & 0.0016 & 7.25 & 0 \\
0 & 0 & 0 & 14.84
\end{bmatrix}$$

The simulation plot is shown in Fig (5). The controls calculated by the networks are compared with the LQR results in Fig (6). The state trajectories are shown in Fig (7). The controls are applied at 2 seconds.
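An LQR baseline of the kind used for comparison can be sketched as follows. This is not the network solution: it computes the steady-state continuous-time gain from the stable invariant subspace of the Hamiltonian matrix, a standard technique swapped in here for illustration. The matrix entries are the values as read from this example and from the identification example, so they should be treated as assumptions, and the function name `care_gain` is hypothetical.

```python
import numpy as np

# Values as read from the examples (state x = [u', alpha, theta, q]).
A = np.array([[-0.0148, -13.88, -32.2, 0.0],
              [-0.00019, -0.34, 0.0, 1.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.00005, -4.8, 0.0, -0.5]])
B = np.array([[-1.1], [-0.11], [0.0], [-8.74]])
Q = np.array([[10.37, 0.0, 0.0, 0.0],
              [0.0, 0.0004, 0.0016, 0.0],
              [0.0, 0.0016, 7.25, 0.0],
              [0.0, 0.0, 0.0, 14.84]])
R = np.array([[91.32]])

def care_gain(A, B, Q, R):
    """Steady-state LQR gain K and Riccati solution P via the Hamiltonian matrix."""
    n = A.shape[0]
    Rinv_Bt = np.linalg.solve(R, B.T)             # R^{-1} B^T
    H = np.block([[A, -B @ Rinv_Bt], [-Q, -A.T]])
    w, V = np.linalg.eig(H)
    stable = V[:, w.real < 0]                     # eigenvectors with Re(lambda) < 0
    X1, X2 = stable[:n], stable[n:]
    P = np.real(X2 @ np.linalg.inv(X1))           # P = X2 X1^{-1}
    return Rinv_Bt @ P, P                         # K = R^{-1} B^T P

K, P = care_gain(A, B, Q, R)
residual = A.T @ P + P @ A - P @ B @ K + Q        # algebraic Riccati residual
print("LQR gain:", K)
print("Riccati residual norm:", np.linalg.norm(residual))
print("closed-loop eigenvalues:", np.linalg.eigvals(A - B @ K))
```

The control law is then $u = -Kx$; a small Riccati residual and closed-loop eigenvalues in the left half plane indicate that the computed gain is a valid steady-state solution for these assumed values.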

5 Conclusion

A class of modified Hopfield networks has been presented to solve parameter identification and optimal control problems. The architectures are designed to suit an energy minimization for system identification and a typical optimal control algorithm for system control. Similar to the Hopfield network, the stability of these modified networks is guaranteed. But they provide more degrees of freedom and flexibility to accommodate different applications. A four-dimensional aircraft control problem is identified and optimal control is obtained as illustrations of these approaches. Future work on this topic will investigate the robustness of such network controllers and the use of these methods for other relevant applications.

ACKNOWLEDGMENT

This study was partially funded by NSF Grant ECS-9313946, the Missouri Department of Economic Development Center for Advanced Technology Program and by NASA Grant NAG1-1728.

Figure 7: States Trajectories

Bibliography

1. Balakrishnan, S.N. and Weil, R.D., "Neurocontrol: A Literature Survey," Mathl. Comput. Modelling, Vol. 23, No. 1/2, pp. 101-117, 1996.

2. Hunt, K.J., Sbarbaro, D., Zbikowski, R. and Gawthrop, P.J., "Neural Networks for Control Systems - A Survey," Automatica, Vol. 28, No. 6, pp. 1083-1112, 1992.

3. Hopfield, J.J. and Tank, D.W., "Computing with Neural Circuits: A Model," Science 233: 625-633, 1986.

4. Miller, W.T., Sutton, R.S. and Werbos, P.J., Neural Networks for Control, MIT Press, Cambridge, MA, 1990.

5. White, D.A. and Sofge, D.A., Handbook of Intelligent Control - Neural, Fuzzy, and Adaptive Approaches, Van Nostrand Reinhold, New York, 1992.

6. Raol, J.R., "Parameter estimation of state space models by recurrent neural networks," IEE Proc.-Control Theory Appl., Vol. 142, No. 2, pp. 114-118, March 1995.

7. Narendra, K.S. and Parthasarathy, K., "Identification and Control of Dynamical Systems Using Neural Networks," IEEE Trans. on Neural Networks, Vol. 1, No. 1, pp. 4-27, March 1990.

8. DARPA Neural Network Study, Fairfax, Virginia: AFCEA Int. Press, 1988.

9. Hopfield, J.J., "The Effectiveness of Analogue 'Neural Network' Hardware," Network 1: 27-40, 1990.

10. Hopfield, J.J., "Neurons with Graded Response Have Collective Computational Properties Like Those of Two State Neurons," Proc. National Academy of Sciences 81: 3088-3092, 1984.

11. Hopfield, J.J., "Neural Networks and Physical Systems with Emergent Collective Computational Abilities," Proc. National Academy of Sciences 79: 2554-2558, 1982.

12. Special Section on Neural Networks for Control Systems, IEEE Cont. Sys. Mag., Vol. 9, No. 3, pp. 25-59, April 1989.

13. Special Issue on Neural Networks in Control Systems, IEEE Cont. Sys. Mag., Vol. 10, No. 3, pp. 3-87, April 1990.

14. Kamp, Y. and Hasler, M., Recursive Neural Networks for Associative Memory, Chichester, U.K.: John Wiley & Sons, 1990.

15. Friedland, B., Control System Design, McGraw-Hill Book Company, 1986.

16. Simulink, Dynamic System Simulation Software for the X Window System, The MathWorks Inc., 1993.
