
Analysis and Design of Hybrid AI/Control Systems

Carl Glen Henshaw and Robert M. Sanner
Space Systems Laboratory

University of Maryland, College Park
machine@ssl.umd.edu

Abstract

This paper discusses the nature of interactions between high-level intelligent agents and low-level control algorithms. Control algorithms offer the promise of simplifying a dynamic system's apparent behavior as perceived by an intelligent agent, thus making the agent's task much easier. However, the coupled dynamics of such a hybrid system can be difficult to predict and may lead to undesirable behavior. We demonstrate that it is possible for a rational intelligent agent acting on a well-controlled dynamical system to cause undesirable behavior when coupled, and present a method for analyzing the resulting dynamics of such coupled, hybrid systems. A technique for alleviating these behaviors using newly developed control algorithms is then suggested. These controllers, which are adaptive in nature, also suggest the possibility of "distributing" learning and intelligence between the high and low levels of authority. A new architecture for hybrid systems encapsulating these ideas is then suggested.

Introduction

The deployment of dynamically complex autonomous systems such as NASA's New Millennium spacecraft series and the military's AUVs and UAVs has created the need for "high level" intelligent software agents which will interact with "low level" hardware control systems. The low level control algorithms are designed to simplify the input/output behavior of a complex dynamical system as perceived by a high-level intelligent agent, while the high level "intelligence" is responsible for specifying the desired system behavior which will satisfy top level goals and constraints. The dynamical simplification accomplished by the low level controller potentially makes the high level agent's task much easier, reducing the scope of potential behaviors which must be accommodated by goal directed plans.

Often, the lower-level software modules are treated as "black boxes" by the higher-level software; however, the dynamic behavior of a control algorithm coupled with a physical system can still be quite complex. Therefore, when intelligent agents and control algorithms are linked in a feedback system, undesirable behavior may result even though each module can be verified to function correctly in isolation. Some control algorithms are more successful than others in simplifying the apparent dynamics of the physical system; careful design of the control algorithm is therefore one key to producing desirable overall system performance.

This paper discusses both the problems incurred by unwanted AI agent/control system coupled dynamic interactions, and also the possibility for improved system performance through careful design of the coupled dynamics. Our principal interest is in automating the task of docking two space vehicles, as illustrated by the (terrestrial) research prototype in Figure 3, which is currently human piloted. We use a well-known problem in flight dynamics -- pilot-induced oscillations, or PIO -- as an example of the undesirable, possibly catastrophic problems that can occur through the interaction of higher- and lower-level systems. Methods for analyzing PIO-like behavior in a prototype hybrid agent/control system are presented, thus possibly providing a bridge between the AI and controls communities.

We also describe possible strategies for mitigating the effects of unwanted PIO-like behaviors. We discuss new nonlinear control technologies which can be shown to be less susceptible to these behaviors. Unlike more standard linear controllers, though, these nonlinear controllers require an accurate mathematical model of the physical system. This motivates a discussion of adaptive controllers which can learn system models in situ. Adaptive controllers further allow for the possibility of distributing learning and intelligence through each of the two layers of software so as to maintain desirable coupled dynamics. Finally, we suggest a new architecture for hybrid AI agents/control software, along with challenges that this architecture presents for researchers.

Hybrid AI/control systems

While it is difficult to describe a "typical" autonomous system, a sufficiently general block diagram might be similar to Figure 1. The box labeled "controlling agent" can have many instantiations, depending on design philosophy, available resources, and the exact requirements for the system's performance. A common -- indeed prototypical -- instance of this agent might be a skilled (or unskilled) human, flying an aircraft or docking a spacecraft; the goal of the considerations in this paper is to replace this most "canonical" of agents with fully autonomous software of comparable (or superior) capability. The agent may be given a set of goals or be capable of generating goals internally, or it may be designed to instinctively perform actions which embody the goals of the designer. It may also internally generate plans


From: AAAI Technical Report SS-01-06. Compilation copyright © 2001, AAAI (www.aaai.org). All rights reserved.


Figure 1: Block diagram of a generic autonomous system

intended to fulfill these goals and desired trajectories to carry out the plans. Regardless of the nature of the agent or the sources of the goals which drive it, its outputs are the actuator commands intended to cause the dynamical system to follow trajectories consistent with its goals. In general, we assume that the agent receives feedback from the environment which includes information about the performance of the physical system.

In attempting to automate the process of mapping goals to physical inputs, it is helpful to break the problem into two subtasks: the problem of translating goals into sequences of desired actions, and the problem of translating desired actions into the physical inputs required to force the system to carry out these actions. This decomposition is shown schematically in Figure 2. The output of both of these subtasks may vary continuously in response to the evolution of the actual physical state of the system, as indicated by the nested feedback loops.

While this division of labor may be artificial in terms of how the performance of the agent of Figure 1 is achieved in biological systems, it is nonetheless a useful intermediate abstraction, amenable to rigorous analysis. In fact, one could argue that the decomposition between Figure 1 and Figure 2 represents a current distinction between two classes of engineering expertise: control theory deals with the design of the second block of this diagram and analysis of the performance of the inner feedback loop, while artificial intelligence theory is often broadly directed towards design of the first block and analysis of the performance of the outer feedback loop. Moreover, as we shall discuss at the end of this paper, even if the indicated partition is non-biological, it may serve as a useful framework within which biological capabilities may be closely mimicked.
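The two-loop decomposition described above can be sketched as nested software loops: a slow outer loop translating a goal into desired states, and a fast inner loop translating desired states into actuator commands. The following is a minimal illustration only; the function names, gains, loop rates, and the double-integrator plant are all invented for this sketch, not taken from the paper.

```python
# Illustrative sketch of the Figure 2 decomposition as two nested loops.
# All names and numbers here are assumptions made for the example.

def planner(goal, x):
    """Outer loop: translate a goal into a desired state (here, trivially the goal)."""
    return goal

def controller(x_des, x, v, kp=4.0, kd=4.0):
    """Inner loop: PD control law mapping desired state to an actuator command."""
    return kp * (x_des - x) - kd * v

def plant_step(x, v, u, dt):
    """Physical system: a double integrator, x'' = u, advanced by Euler integration."""
    return x + v * dt, v + u * dt

goal, x, v, dt = 1.0, 0.0, 0.0, 0.01
for k in range(2000):                  # 20 s of simulated time
    if k % 100 == 0:                   # planner runs at 1 Hz
        x_des = planner(goal, x)
    u = controller(x_des, x, v)        # controller runs at 100 Hz
    x, v = plant_step(x, v, u, dt)

print(round(x, 3))                     # the state should have converged near the goal
```

The separation of rates (slow deliberation, fast regulation) is the essential point: the inner loop hides the plant's continuous dynamics from the outer loop.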

While the box labeled "control algorithm" can have many instantiations, control laws as a class are usually mathematically derived algorithms which accept as their input a desired trajectory through the physical system's state space, and output commands to the system's actuators which attempt to cause it to follow the trajectory. Control algorithms have proven successful in controlling a wide range of physical systems, often in the face of dynamic, changing environments, noisy sensors and actuators, and poorly modeled target systems. Control algorithms may be used as shown in Figure 2 to simplify the apparent dynamics of the physical system as seen by the intelligent agent; indeed, there are aircraft which are difficult or impossible for a human agent to pilot without a continuous controller which simplifies the dynamics ("handling qualities") perceived by the pilot.

Ideally, this arrangement should make the agent's task much easier in that it can rely on the control algorithm

Figure 2: Block diagram of a hybrid AI/control system. The dashed box shows components encapsulated by the "controlling agent" block of Figure 1.

to force the physical system to carry out desired actions, even if the physical system has very complex dynamics and complicated mappings between actuator inputs and system outputs. However, even in the very simple case of a physical system which can be accurately modeled by linear differential equations and a linear proportional-integral-derivative (PID) control algorithm, the coupled control algorithm/physical system can exhibit nontrivial dynamic behavior. In particular, control algorithms require nonzero time to bring the system near a desired state or to respond to system disturbances. In addition, the manner in which the system transitions from the initial state to the desired state often entails a damped oscillatory trajectory with an overshoot that can be a significant fraction of the difference between the initial and desired states. Systems that must be modeled by nonlinear equations -- the more common case, which includes such physical systems as robotic arms, spacecraft, aircraft, and underwater vehicles -- can exhibit even more complex dynamic behavior (Slotine & Li 1991, pp. 4-12).
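The damped oscillatory transition and overshoot mentioned above are easy to reproduce numerically. The sketch below (gains chosen for illustration, not MPOD's) closes a PD loop around a double integrator with damping ratio 0.5, for which second-order theory predicts roughly 16% overshoot:

```python
# Step response of a PD-controlled double integrator, x'' = kp*(r - x) - kd*x'.
# kp, kd are assumed example gains: natural frequency 2 rad/s, damping ratio 0.5.

kp, kd = 4.0, 2.0
x, v, dt, r = 0.0, 0.0, 0.001, 1.0
peak = 0.0
for _ in range(15000):                 # 15 s of simulated time
    u = kp * (r - x) - kd * v
    x, v = x + v * dt, v + u * dt
    peak = max(peak, x)

overshoot = (peak - r) / r
print(f"overshoot = {overshoot:.1%}") # theory: exp(-pi*0.5/sqrt(0.75)), about 16%
```

Even this well-behaved linear loop thus takes nonzero time to settle and overshoots by a significant fraction of the commanded step, which is exactly the residual dynamics a higher-level agent must cope with.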

A related, but more subtle, challenge involves the unexpected behaviors that can result from the coupled dynamics of a rational intelligent agent commanding a well-controlled vehicle. At first glance, the challenge of mathematically analyzing the dynamics of a system like the one in Figure 2 may seem intractable due to the inherently non-mathematical nature of most intelligent agents. For insight on how this might be accomplished, one may take as inspiration research performed on what has been referred to as "the senior flying qualities problem": pilot-induced oscillations, or PIO (McRuer 1995, p. 2). PIO is a rapid oscillatory motion which can beset a piloted aircraft under certain circumstances, often with disastrous results. PIO is most often observed with very experienced, well-trained pilots flying a wide variety of aircraft, which have included the M2-F2, X-15, YF-22, MD-11, C-17, and the Space Shuttle (McRuer 1995, p. 2).

The PIO phenomenon is not caused by a poorly behaved dynamical system; in fact, PIOs have been observed on systems which are quite well behaved in isolation. Instead, PIO is known to result from the coupled interaction of the pilot and the underlying system dynamics (McRuer 1995, p. 3); that is, it is an instability phenomenon introduced by the closing of the outer feedback loop of Figure 2. Humans are often considered the "gold standard" for intelligent agents, and aircraft dynamics share many physical characteristics with the many complex systems that are now the subject of automation research. If a well-trained, highly experienced and



Figure 3: MPOD performing a docking task

supremely rational human pilot can interact with a mechanical system in such a way as to cause sudden, violent behavior, the possibility that an intelligent agent acting on an underlying dynamical system could result in undesirable behavior should at least be entertained.

Analysis of PIO in a hybrid system

The Space Systems Laboratory (SSL) at the University of Maryland, College Park has a long history of studying the operation of human-piloted spacecraft. The primary research tool of the SSL is the Neutral Buoyancy Research Facility. Neutral buoyancy, the use of water to simulate weightlessness, is the principal method used by NASA and other space agencies to practice end-to-end space operations. The SSL historically has concentrated on developing robotic systems to assist humans or, in some cases, perform space operations tasks which humans cannot or do not desire to perform. In general SSL research utilizes these robots operating in neutral buoyancy to analyze human/robotic interaction, demonstrate advanced spacecraft control algorithms, and develop autonomous robotic systems.

The specific target of the research described below is MPOD (the Multimode Proximity Operations Device), a neutral buoyancy version of an orbital maneuvering vehicle. Orbital maneuvering vehicles have been proposed for ferrying cargo and personnel between spacecraft in different orbits, for positioning astronauts at worksites during extravehicular operations, and for retrieving malfunctioning space hardware from remote locations. As such, OMVs must actively maneuver with great precision and reliability while in close proximity to other spacecraft and astronauts.

To render the apparent dynamics of this complex vehicle as simple as possible to the pilot, a number of different control strategies have been explored. Only the vehicle's attitude is currently controlled automatically, as the neutral buoyancy tank does not currently provide the sensors (GPS or equivalent) necessary to implement a translational feedback algorithm. The attitude control algorithms implemented on MPOD include a standard linear trajectory-tracking attitude controller and a family of new nonlinear trajectory-tracking attitude controllers. These latter are derived from an underlying PD component, with additional

Figure 4: Example stiff PD controller step response. Dashed line is the requested trajectory; solid is the actual trajectory.

nonlinear terms in the control algorithms designed to directly compensate for the nonlinearities in the vehicle's dynamics.

The PD controllers were tuned with three different gainsets -- a "loose" set, a "medium" set, and a "stiff" set -- each resulting in closed-loop dynamics with similar overshoot characteristics to a step change in orientation, but with increasing bandwidth. Theoretically, these controllers would be expected to exhibit more accurate tracking and faster responses as the gainsets increase in stiffness. The loose and medium gainsets were also used for the PD components of the nonlinear controller; the action of the added nonlinear terms used sufficient additional control authority that the stiff gainset could not be employed without saturating the actuators on the vehicle.
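The "similar overshoot, increasing bandwidth" tuning strategy can be illustrated on a simple second-order loop: holding the damping ratio fixed while raising the natural frequency preserves the overshoot but shortens the rise time. The gains and the double-integrator plant below are illustrative stand-ins, not MPOD's actual gainsets.

```python
# Three example "gainsets" with the same damping ratio (same overshoot)
# but increasing natural frequency (increasing bandwidth).

def step_response(kp, kd, dt=0.001, T=20.0):
    """Euler-integrate x'' = kp*(r - x) - kd*x'; return (overshoot, 10-90% rise time)."""
    x, v, r = 0.0, 0.0, 1.0
    peak, t10, t90 = 0.0, None, None
    for k in range(int(T / dt)):
        u = kp * (r - x) - kd * v
        x, v = x + v * dt, v + u * dt
        peak = max(peak, x)
        if t10 is None and x >= 0.1 * r:
            t10 = k * dt
        if t90 is None and x >= 0.9 * r:
            t90 = k * dt
    return (peak - r) / r, t90 - t10

zeta = 0.7                             # fixed damping ratio across all gainsets
results = [step_response(wn * wn, 2 * zeta * wn) for wn in (1.0, 2.0, 4.0)]
overshoots = [o for o, _ in results]   # should be nearly identical
rises = [tr for _, tr in results]      # should shrink as the gains stiffen
print(overshoots, rises)
```

This mirrors the loose/medium/stiff progression: the stiffer set tracks faster for the same qualitative transient shape, at the price of larger control effort.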

Figure 4 shows two step responses for the stiff PD controller. Notice that the controller appears to be quite well behaved. There is no oscillatory behavior present, with an overshoot of only 7%. The system exhibits little or no steady-state offset, with the deviations from the desired trajectory being caused by noise and disturbances largely due to currents and eddies in the neutral buoyancy tank. The loose and medium gainsets produced responses with similar overshoot, longer settling times, and slightly higher steady-state errors, in accordance with theory. Following the step response evaluations, the pointing accuracy of each of the controllers was evaluated on a more demanding task by commanding MPOD to track a sinusoidally varying ("tumbling") attitude maneuver. The mean-square pointing error in each control mode is shown in Figure 5. The tracking metric here is expressed in quaternion units, and may roughly be converted to degrees by multiplying by 120. Notice that the mean-square tracking error decreases as the gains increase, and that the nonlinear controllers are more accurate than the linear controllers with the same PD gainset; this again is as predicted by theory. Notice, also, that the stiff PD controller is the most accurate of any of the controllers tested.
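The rough "multiply by 120" conversion follows from quaternion geometry: a rotation of angle theta has an error quaternion whose vector part has magnitude sin(theta/2), which for small angles is about theta/2, so the degrees-per-quaternion-unit factor is about 2 x (180/pi), close to 120. A quick check (the 5-degree sample angle is just an illustration):

```python
import math

# Small-angle conversion from quaternion-unit error to degrees:
# |q_vec| = sin(theta/2) ~ theta/2, so theta[deg] ~ 2*(180/pi)*|q_vec|.
factor = 2.0 * 180.0 / math.pi
print(round(factor, 1))                # about 114.6, i.e. "roughly 120"

# Sanity check against the exact relation for a small angle (5 degrees):
theta = math.radians(5.0)
q_vec = math.sin(theta / 2.0)          # quaternion-unit error for that rotation
approx_deg = factor * q_vec
print(round(approx_deg, 2))            # recovers approximately 5 degrees
```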

Thus, by the generally accepted metrics used by control engineers, each of the controllers appears to result in a successful design of the inner loop of Figure 2, with the high-gain PD controller producing the best overall response in the isolated task of tracking a specified reference attitude. Next, the coupled performance of a human pilot performing a docking task with each of the different controller modes



Figure 5: Pointing accuracy of linear and nonlinear controllers (loose PD, medium PD, stiff PD, loose nonlinear, and medium nonlinear). Metric is quaternion error.

was investigated.

The setup for this task is illustrated in Figure 6: the pilot commands the desired motion of MPOD using two 3DOF joysticks (one for attitude, one for translation). Pilots were instructed to fly MPOD from a point in the middle of the NBRF tank to a hard dock with a target affixed to the tank wall. During testing, completion time and pilot hand controller usage (a measure of physical workload) were recorded. In addition, after using each controller the pilots were asked to rate the system performance using a Cooper-Harper questionnaire, a popular method for determining pilot mental workload (Sanders & McCormick 1993).

Although it was expected that human pilots would perform better with more accurate controllers, this proved not to be the case. Figure 7 shows a graph indicating the performance of four experienced pilots after six docking attempts with each controller. The metric in this graph factors in both mental and physical effort multiplied by task completion time as described in (Henshaw 1998, pp. 76-78). Clearly, lower values for this metric are indicative of better performance. The individual metrics also broadly tracked the trends seen in Figure 7. Notice that while performance was uniform with the nonlinear controllers, it varied quite considerably across the three PD controllers, with the worst performance being exhibited with the stiff PD controller.

Somewhat counter-intuitively, then, the controller which provides the best performance in isolation appears to produce the worst performance when utilized with a goal-driven human pilot. Moreover, video observation of MPOD and its pilots when using this controller showed that MPOD could exhibit undamped oscillatory behavior, with the pilot "fighting" with the vehicle to try to stop the oscillations, as shown in Figure 8. From the previous graphs, however, it is clear that this behavior was not caused by poor performance on the part of the controller. Instead, the oscillatory behavior appeared to be due to a coupling of the pilot and the closed-loop vehicle system. This is a classic symptom

Figure 6: Schematic of NBRF tank (50 ft diameter, 25 ft deep) showing the MPOD docking task (not to scale).

Figure 7: Pilot performance during docking tasks. Letters indicate statistically different groups.

of pilot-induced oscillations (Blakelock 1991).

In order to mathematically analyze a PIO such as that observed on MPOD, a mathematical model of human pilot behavior is needed. One widely accepted method for accomplishing this is called the "crossover model", which models the dynamics of a human pilot as a linear PD controller with a time delay.

Figure 8: System behavior during a PIO event. Solid line is the requested trajectory from the pilot; dashed line is the actual trajectory of the vehicle.

The crossover model attempts to represent the behavior of an experienced human pilot performing a set-point maintenance task such as maintaining straight-and-level flight in an airplane or hovering in a helicopter. The crossover model has been used within the aerospace community for at least thirty years, and has proven successful in modeling pilot behavior across a very wide range of controlled elements. For a synopsis of the crossover model and justification for its use as a human pilot model, see (Blakelock 1991) or (McRuer 1995).

Specifically, the crossover model states that the pilot's behavior (the mapping between the perceived error between the actual and goal state and the resulting commanded motion) can be modeled as

G_{pilot}(s) = S_p e^{-\tau_h s} (T_L s + 1)    (1)

where G_{pilot}(s) is the pilot transfer function. The pilot time delay, e^{-\tau_h s}, represents processing delays and neuromuscular lag and is normally on the order of 0.2 to 0.4 seconds. The crossover model states that the pilot's effective proportional gain S_p and derivative gain S_p T_L change depending on the dynamics of the element being controlled, so that the slope of the Bode plot of the open-loop pilot/plant system is approximately -20 dB/decade in the crossover region. The gain S_p in particular is task dependent, generally increasing when the required precision to complete a task increases, as during the final phases of an attempted docking of a spacecraft or landing of an aircraft. An implied condition is that the pilot zero must be minimum phase, i.e. the pilot zero must lie in the left half-plane (Blakelock 1991, pp. 459-465).
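The defining -20 dB/decade property can be checked numerically. In the sketch below the pilot parameters (S_p = 2, T_L = 1) and a pure double-integrator plant are assumed round numbers chosen for illustration; since the delay term has unit magnitude, it does not affect the Bode slope.

```python
import math

# Magnitude of the open-loop pilot/plant system |G_pilot(jw) * 1/(jw)^2|
# for an assumed pilot (S_p = 2, T_L = 1) controlling a double integrator.

S_p, T_L = 2.0, 1.0

def open_loop_mag(w):
    pilot = S_p * math.hypot(T_L * w, 1.0)  # |S_p (T_L jw + 1)|; |e^{-jw tau}| = 1
    plant = 1.0 / w**2                      # |1/(jw)^2|
    return pilot * plant

# Bracket the crossover frequency (|G| passes through 1 near w = 2.2 rad/s here)
w1, w2 = 2.0, 4.0
slope = 20 * math.log10(open_loop_mag(w2) / open_loop_mag(w1)) / math.log10(w2 / w1)
print(f"slope near crossover: {slope:.1f} dB/decade")
```

For this plant the pilot's lead term T_L s + 1 is exactly what bends the -40 dB/decade double-integrator slope toward the -20 dB/decade crossover shape the model postulates.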

Mathematical models of the rest of the system are also needed. From physical principles and extensive flight testing, a single-axis, linear model of MPOD was determined to be

G_{MPOD}(s) = \frac{2.8 \times 10^{-4}}{s(0.\ldots s + 1)}    (2)

Similarly, the action of the PD controller is given by

\tau_{PD} = -K_d \tilde{\omega} - K_p \tilde{q} = -K_d \sigma

where \tilde{q} is a configuration error (here the attitude error) and \tilde{\omega} is the corresponding velocity tracking error (here the angular velocity error). An alternate description of the actions of the PD controller (useful in the discussion below) employs the auxiliary variable \sigma = \tilde{\omega} + \lambda \tilde{q}, with complete equivalence to the first form by taking K_p = \lambda K_d. For the stiff PD controller, K_d = 160 and \lambda = 8. Finally, there is an input shaping (trajectory generation) filter between the pilot and the controller which smooths the transition between new attitudes. The filter can be represented by the transfer function

G_{filter}(s) = \frac{K}{s(s + 2)}    (3)

where, in this instance, K = 5 \times 10^{-2}.

Figure 9: Block diagram of the MPOD system

The linearized dynamics of the controlled MPOD system

as perceived by the pilot is then computed to be

G_{CL}(s) = \frac{K}{s(s + 2)} \times \frac{\omega_n^2 (T s + 1)}{s^2 + 2\zeta\omega_n s + \omega_n^2}    (4)

and the transfer function for the coupled open-loop dynamics of the pilot/vehicle system is

Spe-’hs (TLs + 1) x GCL (s) (5)

A reprise of Figure 2 illustrating the model just developed is shown in Figure 9.

The root locus plot is a tool that is used to examine the location of the poles (modes) of a feedback system as a function of the overall gain of the feedback loop. A linear system becomes unstable (exhibits "runaway" behavior) when the real part of any one of its poles becomes positive. The gain in the feedback system of Figure 9 is controlled by the gain of the pilot S_p, and Figure 10 shows a root locus plot for the coupled system as a function of this gain. The arrows indicate the direction of migration of the closed-loop poles as the pilot gain increases.
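The pole-migration argument can be reproduced numerically. In the sketch below the plant, filter, and pilot parameters are assumed round numbers, not the identified MPOD values, and the pilot delay is replaced by a first-order Pade approximation so the loop reduces to polynomial arithmetic:

```python
import numpy as np

# Closed-loop poles of a pilot/vehicle loop of the form of equation (5),
# as a function of the pilot gain S_p.  All parameter values are assumed.

tau, T_L = 0.3, 1.0                   # pilot delay and lead time constant
wn, zeta, K, T = 2.0, 0.7, 0.1, 0.25  # closed-loop vehicle model parameters

# Open-loop numerator and denominator polynomials (highest degree first).
# The delay e^{-tau s} is approximated by (1 - tau*s/2) / (1 + tau*s/2).
num = np.polymul([T_L, 1.0], np.polymul([T, 1.0], [-tau / 2, 1.0]))
den = np.polymul([1.0, 0.0],                      # integrator
      np.polymul([1.0, 2.0],                      # filter pole at s = -2
      np.polymul([1.0, 2 * zeta * wn, wn**2],     # vehicle quadratic
                 [tau / 2, 1.0])))                # Pade pole

def max_real_pole(S_p):
    """Largest real part among the closed-loop poles for pilot gain S_p."""
    closed = np.polyadd(den, S_p * K * wn**2 * num)
    return max(np.roots(closed).real)

print(max_real_pole(0.01))  # low pilot gain: all poles stay in the left half-plane
print(max_real_pole(1e4))   # very high pilot gain: a pole migrates into the RHP
```

This reproduces the qualitative prediction of Figure 10: the loop is stable for modest pilot gains, but sufficiently aggressive pilot behavior drives a closed-loop pole across the imaginary axis.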

Notice that the system can become unstable if the pilot gain becomes large enough, as may happen in the final stages of a docking maneuver when accuracy is critical. This immediately suggests that PIO may be a problem for this system. Indeed, simulations using this model predict the oscillatory behavior experimentally observed in Figure 8 and correctly forecast the frequency of the oscillations.

This analysis illustrates that a rational "high-level" intelligent agent commanding a well-controlled system can result in unsatisfactory or even unstable system behavior. It is important to realize that the linear control methodology which produced the observed instability is virtually ubiquitous, being found in nearly all commercially available robotic systems, many flight control systems, and most



Figure 10: Root locus of the coupled pilot/stiff PD controller system. Asterisks show the estimated position of the system's closed-loop poles.

spacecraft attitude control systems. Moreover, the dynamics of the pilot model are quite simple, and in fact might be a reasonable representation for the actions of a rational intelligent agent attempting to perform a task such as tracking a moving target.

What can be done about undesirable interactions?

In recent years there has been considerable progress in developing control algorithms which have superior performance to the linear controller analyzed above. One alternative, which has also been implemented and tested on MPOD, is a nonlinear controller related to the computed torque controller that is beginning to gain acceptance in the control of robotic manipulators (Slotine & Li 1991), (Egeland & Godhavn 1994), (Sanner 1996).

The form of the controller is

\tau = \tau_{PD} + H(q)\dot{\omega}_r + C(q, \omega)\omega_r + E(q, \omega), \quad \omega_r = \omega - \sigma    (6)

where \sigma is the auxiliary variable introduced in the above discussion of the PD controller. The additional nonlinear terms employed in this design utilize H, the positive definite symmetric inertia matrix of the dynamics; C, containing the contribution of Coriolis and centripetal forces on the dynamics; and E, representing environmental forces. Many physical systems, including aircraft, spacecraft, underwater vehicles, wheeled vehicles, and robotic manipulators, have dynamics to which this algorithm can be successfully applied. Indeed, for each of these systems, it can be (nontrivially) proven that the above control algorithm produces closed-loop dynamics (that is, the inner loop of Figure 2) which are globally stable, with tracking errors which decay exponentially to zero for any smooth desired trajectory (Fossen 1994), (Henshaw 1998, pp. 15-30), (Egeland & Godhavn 1994).
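The exponential convergence property can be demonstrated on a one-degree-of-freedom specialization of this control structure. In the sketch below the plant parameters (m, d) and gains are invented for illustration; the point is that the model-based feedforward terms cancel the plant dynamics so that the auxiliary variable, and hence the tracking error, decays exponentially:

```python
import math

# 1-DOF sketch of the equation (6) structure: PD term -K_d*sigma plus
# model-based feedforward.  Plant m*q'' + d*q' = tau; all numbers assumed.

m, d = 2.0, 0.5          # stand-ins for the H and E/C model terms
Kd, lam = 5.0, 2.0       # PD gain and auxiliary-variable weight (K_p = lam*K_d)

q, w, dt = 0.5, 0.0, 1e-3
for k in range(10000):                        # 10 s, tracking q_d(t) = sin(t)
    t = k * dt
    qd, wd, ad = math.sin(t), math.cos(t), -math.sin(t)
    e, edot = q - qd, w - wd                  # configuration / velocity errors
    sigma = edot + lam * e                    # auxiliary variable
    w_r, a_r = wd - lam * e, ad - lam * edot  # reference velocity / acceleration
    tau = m * a_r + d * w_r - Kd * sigma      # control law: feedforward + PD
    a = (tau - d * w) / m                     # plant dynamics
    q, w = q + w * dt, w + a * dt

print(abs(q - math.sin(10.0)))                # tracking error after 10 s: tiny
```

Substituting the control law into the plant gives m*sigma' + (d + Kd)*sigma = 0, so sigma decays exponentially and drags the tracking error to zero with it, which is the mechanism behind the cited global stability results.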

In terms of the linear analysis above, this convergence also implies that the controller causes the closed-loop vehicle/controller dynamics to become:

G_CL(s) = K_a / (s (s + 2))   (7)

Figure 11: Root locus plot of coupled nonlinear controller/pilot system.

Note in particular that, although the action of the controller is nonlinear, its effect is to put a particularly simple linear "face" on the closed-loop dynamics, as perceived by a higher-level agent. If the PIO analysis from above is now repeated using the nonlinear controller, the root locus plot for the outer feedback loop (representing a prediction of the coupled agent/controlled system interaction) shown in Figure 11 results. The structure of the interaction dynamics predicted by this plot is quite different from that obtained with the PD controller. Disregarding the effects of pilot time delay, this system will always be stable regardless of the position of the pilot zero or the value of the pilot gain, and in fact will be critically damped for a wide range of zero locations and gains. In theory, therefore, the system should exhibit PIO only when the pilot gain becomes enormously large; in fact, an order of magnitude larger than the gains found to match the simulated responses with the observed data (Henshaw 1998, p. 107).
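The always-stable prediction can be checked numerically. The sketch below forms the outer-loop characteristic polynomial from equation 7 together with a simplified gain-plus-zero pilot model K_p(s + z), with the pilot time delay disregarded as in the text; the specific gains and zero locations swept are assumptions for illustration, not values from the paper.

```python
import numpy as np

# Outer-loop stability check for the nonlinear-controller case.
# Inner loop: G_CL(s) = K_a / (s (s + 2)) per eq. (7).
# Pilot (illustrative assumption only): P(s) = Kp * (s + z), no delay.

def closed_loop_poles(Kp, z, Ka=1.0):
    # Characteristic equation: s(s + 2) + Ka*Kp*(s + z) = 0
    #   -> s^2 + (2 + Ka*Kp) s + Ka*Kp*z = 0
    return np.roots([1.0, 2.0 + Ka * Kp, Ka * Kp * z])

# Sweep a wide range of pilot gains and zero locations: every coefficient
# of the second-order polynomial stays positive, so all poles remain in
# the left half plane -- the loop is predicted stable for any such pilot.
stable = all(
    np.all(closed_loop_poles(Kp, z).real < 0)
    for Kp in np.logspace(-2, 2, 25)
    for z in [0.1, 0.5, 1.0, 5.0]
)
```

For a second-order characteristic polynomial, positive coefficients are both necessary and sufficient for stability, which is why no choice of positive gain or zero location can destabilize this outer loop.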

Notably, the nonlinear controller/vehicle system appears to be far less susceptible to pilot-induced oscillations. This does not appear to be because the controller is more accurate or has faster response times. Instead, the resistance to PIO-type behaviors appears to arise because the nonlinear controller presents a much simpler dynamical system to the agent than does the corresponding PD controller. The difference in perceived dynamic complexity can easily be seen by directly comparing equation 4, which represents the linear controller/vehicle dynamic behavior as seen by the pilot, with equation 7, the dynamic behavior of the nonlinear controller/vehicle system. It appears that systems with more complex dynamics may be more susceptible to PIO regardless of how well they perform in terms of traditional (isolated) controller metrics such as accuracy, overshoot, or bandwidth.



Desirable Hybrid System Interactions

Adaptation: low-level learning

Clearly, one challenge to using more advanced controllers such as the one described above is the problem of obtaining an accurate mathematical model of the physical system being controlled, including good estimates for the mass and rotational inertias, environmental effects, and thruster characteristics. While this can sometimes be accomplished, it may be quite challenging to completely characterize actuators, rotational inertias, or environmental forces such as drag and friction. The situation can be further complicated when the physical system changes, as with an aircraft dropping cargo or a robotic arm experiencing a partial actuator failure.

In order to alleviate this problem, adaptive versions of these controllers have been developed. The exact forms and capabilities of adaptive controllers can be quite varied, depending on the exact nature of the parameters or forces being learned. One possible instantiation of such a controller is

τ = τ_PD + Y â   (8)

where Ĥ, Ĉ, and Ê represent approximations of H, C, and E respectively. Under the further (and not very restrictive) assumption that the estimated nonlinear terms can be factored as

Ĥ(θ) ω̇_r + Ĉ(θ, ω) ω_r + Ê(θ, ω) = Y â, where Y is a matrix of known nonlinear functions and â is a vector of estimated parameters, then the "adaptation law"

dâ/dt = −Γ Yᵀ α   (9)

guarantees global stability and asymptotically converging tracking errors, even if the "true" physical parameters a are completely unknown when the system is initialized (Egeland & Godhavn 1994). The positive definite matrix Γ is the learning rate for the system.
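A minimal simulation illustrates the flavor of this kind of adaptation law on a deliberately simple, hypothetical system: a mass m with dynamics m·dv/dt = u, where m is unknown to the controller and the goal is to track a sinusoidal velocity. Here the regressor reduces to the scalar Y = desired acceleration, the parameter estimate is m̂, and the composite error reduces to the velocity error; none of these numbers come from the paper.

```python
import math

# Hypothetical 1-DOF adaptive control example: m * dv/dt = u, m unknown.
# Control: u = m_hat * Y - k * alpha  (feedforward plus PD-like feedback)
# Adaptation: d(m_hat)/dt = -gamma * Y * alpha, mirroring the form of eq. (9).

m, m_hat = 2.0, 0.5          # true mass (hidden from controller) and estimate
k, gamma = 2.0, 2.0          # feedback gain and learning rate
dt, T = 1e-3, 30.0           # Euler step and simulation horizon
v, t = 0.0, 0.0

while t < T:
    v_d, a_d = math.sin(t), math.cos(t)   # desired velocity and acceleration
    alpha = v - v_d                        # tracking (composite) error
    Y = a_d                                # scalar regressor
    u = m_hat * Y - k * alpha              # adaptive control law
    v += dt * u / m                        # plant integration (Euler)
    m_hat += dt * (-gamma * Y * alpha)     # gradient adaptation law
    t += dt

# After 30 s the velocity tracks the sinusoid and, because the regressor
# cos(t) is persistently exciting, m_hat also converges near the true mass.
```

A Lyapunov argument with V = (m/2)·alpha² + (m̂ − m)²/(2γ) shows V̇ = −k·alpha² ≤ 0 for this loop, which is the scalar analogue of the global stability claim in the text.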

System parametric uncertainties such as mass, rotational inertias, and drag coefficients can be handled this way. More complex structural uncertainties, such as the exact nature of drag or frictional forces, can be dealt with by neural-network based adaptive components, as can complex actuator and structural dynamics. Global asymptotic stability can also be shown for neural-network based adaptive controllers and for combinations of different kinds of adaptation mechanisms (Sanner 1996).

Cooperative learning in hybrid systems

Many AI designs for robotic learning model the learning process at the top level (the first box in Figure 2), assuming "dumb" low-level servo controllers which mindlessly push the system to the configurations requested by the (adaptive) high level agent. The above discussion shows that control theory has its own notions of learning, which enable "smart" servos that can labor to maintain the fiction of dynamic simplicity for a top level agent, despite uncertainty in the physical properties of the systems they control.

This suggests that learning can be profitably distributed between the two modules of Figure 2, with the low-level controller becoming incrementally better at its goal of tracking specified system configurations in a manner relatively transparent to the top level agent, while simultaneously this agent may also be incrementally learning to specify motions which better achieve its top level goals. In fact, the analysis above suggests that such a strategy might be necessary for the success of the hybrid system, as PIO-type instabilities may set in if the low-level controller does not successfully maintain the appearance of dynamic simplicity for the high level agent.

While rigorous mathematical analysis of this cooperation, in the spirit of the analysis above, will be quite complex, it is encouraging to note from Figure 7 that when the adaptive nonlinear controller was paired with a human pilot (who is presumably learning to better perform the docking task at each iteration), the overall performance metric was among the best seen in these experiments. More extensive data must be collected quantifying the interaction of a learning agent (like a human) with a learning controller like that described above to fully support the advantages of this cooperation. This is the subject of current experimentation in the Space Systems Lab.

Biological Considerations

Since human and other biological agents are, in a sense, the current "gold standard" against which potential autonomous software architectures are evaluated, it is interesting to inquire whether there is any biological support for the hierarchical interaction and distributed learning paradigm suggested above. While the data is limited, there do appear to be indications that the actions of biological intelligences might at least be profitably modeled in this fashion.

It has been hypothesized that humans may generate "desired trajectories" to perform certain tasks which are invariant regardless of the muscles or joint actions required to carry them out. For instance, it has been noted that a person's handwriting remains nearly the same regardless of whether the writing takes place on a piece of paper or a chalkboard, indicating a characteristic pattern of hand motions used to describe each letter. However, writing on paper requires a very different set of actuations of arm and hand muscles than does writing on a chalkboard, and a different set of environmental forces is encountered (Winter 1990). While one could attempt to explain this behavior in terms of the "black box" agent seen in Figure 1, it seems more easily interpreted in the model of Figure 2: the brain may have characteristic trajectories for the hand in its goal to trace out each letter (which are subgoals of spelling a word, which are subgoals of sentences, etc.), while a low-level, adaptive control strategy works to execute these motions.

More concretely, consider a planar reaching task; that is, when the goal is to move the hand to a new position on a horizontal surface. In this situation, humans appear to adopt an invariant strategy of moving the hand in a straight line between the new and old positions, with a velocity profile that assumes a characteristic bell shape. This



Figure 12: Suggested architecture for hybrid autonomous agent/control systems. (The blocks include the intelligent agent, its trajectory and actuator commands, the actual trajectory, flight parameter/envelope estimates, and a confidence estimator.)

is the so-called "minimum jerk" trajectory, in that it corresponds to a hand trajectory which minimizes the derivative of acceleration (jerk) as the hand transitions between the two endpoints (Flash & Hogan 1985). Additionally, if a deterministic external force is applied to the hand as it moves in the plane, disturbing this trajectory, with practice human subjects learn to account for the presence of this force until they can again execute the minimum jerk trajectory, despite the disturbance (Shadmehr & Mussa-Ivaldi 1994). Interestingly, rejection of the perturbation is not accomplished by "stiffening" control of the arm (just as stiffer gains in a PD controller will produce lower tracking errors), but rather by learning a model of the forces which will be applied to the arm at each point in the workspace. This was shown by suddenly removing the disturbances in subjects who had already learned to accommodate them, and observing that the resulting arm motion was perturbed away from the minimum jerk trajectory just as though the "negative" of the original disturbance was applied (Shadmehr & Mussa-Ivaldi 1994).
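For a point-to-point reach the minimum jerk trajectory has a well-known closed form (Flash & Hogan 1985): x(t) = x0 + (xf − x0)(10τ³ − 15τ⁴ + 6τ⁵) with τ = t/T. A short sketch of this profile follows; the 20 cm, 1 s reach is an arbitrary illustration.

```python
import numpy as np

# Closed-form minimum-jerk trajectory (Flash & Hogan 1985) for a reach
# from x0 to xf over duration T; tau = t/T is normalized time. The
# velocity profile is the characteristic bell shape, zero at both ends.

def min_jerk(t, T, x0, xf):
    tau = t / T
    s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5
    x = x0 + (xf - x0) * s
    v = (xf - x0) * (30 * tau**2 - 60 * tau**3 + 30 * tau**4) / T
    return x, v

t = np.linspace(0.0, 1.0, 101)
x, v = min_jerk(t, 1.0, 0.0, 0.2)   # a 20 cm reach over 1 second
```

The polynomial satisfies zero velocity and acceleration at both endpoints, and peak speed occurs at the midpoint with magnitude 1.875·(xf − x0)/T, which is the bell shape reported for human reaches.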

One possibility for modeling these observations is again in the framework of Figure 2, with the top level agent computing a minimum jerk trajectory between current and goal states, and the low level agent acting as an adaptive nonlinear controller in the fashion described in the previous section. Indeed, (Kosha & Sanner 1999) performed a simulation of the perturbed reaching task in this spirit, additionally modeling the adaptive nonlinear component of the arm control dynamics using neuron-like elements, and produced results which qualitatively matched many of the reported features of the human responses.

These observations are not meant to suggest that biological signal processing is segregated into the subtasks and components discussed herein, but rather that this framework may be useful for producing systems which can mimic the important observed features of biological systems. Moreover, as demonstrated above, this framework appears quite amenable to theoretical analysis which can both predict the dangers of PIO-like behavior and provide the designer with insights which may lead to the creation of more robust adaptive autonomous systems.

Directions for Further Work

Control systems do not have unlimited "command authority": there are practical limits to the magnitude of the forces, torques, voltages, etc. which can be manipulated in the physical system to influence its actions. This means that the low-level controller can maintain the desired apparent dynamic simplicity to the high-level agent only over a finite range of configurations; in aerospace this range is sometimes known as the "flight envelope". Accurate knowledge of this envelope is an important requirement for the high-level agent, lest, in pursuit of its goals, it command motions of which the system is incapable.

The flight envelope inherently depends on the exact dynamics of the system (the mass, rotational inertias, external forces, and so on) as well as the characteristics of the actuators. To compute the envelope of feasible trajectories, then, the high level agent must have knowledge of, or a means to learn about, the same physical parameters that an adaptive low level controller requires. Extending the cooperative learning notion suggested above, might the two modules actually share this information, for example by the adaptive controller feeding back its parameter estimates to the high level agent?

This appears to be an attractive idea at first glance, but it is complicated by a well-known feature of adaptive control algorithms: although they can guarantee accurate tracking, they cannot guarantee that their parameter estimates will converge to the true physical values without additional conditions. A simple example to illustrate this point might be an attitude controller attempting to estimate rotational inertias. If the vehicle is only asked to rotate around one axis, the adaptive controller does not need accurate estimates of the rotational inertias of the other axes in order to guarantee good tracking, and in fact there is no way for the adaptive mechanism to accurately estimate a rotational inertia without experiencing rotations around the corresponding axis.

One could think of this as the need for the controller to continually "practice" throughout its envelope in order to guarantee that all of its parameter estimates are exact. Mathematically, convergence of the estimated parameters can be guaranteed if the desired trajectories given to the controller are sufficiently complex to allow the controller to "explore" enough of the state space. This requirement is known as the persistency of excitation criterion (Slotine & Li 1991, pp. 331-332), and can be exactly quantified for a given physical system.
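One common way to quantify this criterion numerically is to check that the Gram matrix of the regressor, integrated over a time window, is uniformly positive definite. The sketch below contrasts a "rich" trajectory that excites both parameter directions with a single-axis motion that does not; the regressors are illustrative assumptions, not from the paper.

```python
import numpy as np

# Numerical persistency-of-excitation check: a regressor Y(t) is
# persistently exciting over a window [0, T] if the Gram matrix
# G = integral of Y(t) Y(t)^T dt has a strictly positive minimum
# eigenvalue. Both example regressors below are hypothetical.

def gram_min_eig(Y_func, T=2 * np.pi, n=2000):
    dt = T / n
    t = np.arange(n) * dt
    Y = np.array([Y_func(ti) for ti in t])   # shape (n, p)
    G = Y.T @ Y * dt                          # Riemann sum for the Gram integral
    return float(np.linalg.eigvalsh(G).min())

rich = lambda t: np.array([np.cos(t), np.sin(t)])   # excites both directions
poor = lambda t: np.array([np.cos(t), np.cos(t)])   # single-axis "motion"

# rich: G -> pi * I, so the minimum eigenvalue is about pi (exciting);
# poor: G is rank deficient, so the minimum eigenvalue is about zero.
```

In the rich case the estimator "sees" every parameter direction, so convergence is guaranteed; in the poor case one direction of parameter error is invisible to the data, which is exactly the single-axis-rotation situation described above.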

Unfortunately, it may be that the need to fulfill the persistency of excitation requirement might conflict with the system's goals. For instance, exploring large parts of the state space might involve large motions, fast velocities, or complex "wiggles" which, by themselves, do not aid in accomplishing the system's objectives and could actually violate safety constraints during certain operations, like docking a spacecraft or landing an airplane. Conversely, though, generating an accurate flight envelope is necessary to accomplish future goals. The tension between optimally accomplishing near-term goals and fulfilling the persistency of excitation requirement entails a high level of cooperation between the agent and the controller. One possible approach to solving this problem might be for the agent to take "time out" between goals, for instance during transit between worksites, to specifically command trajectories which satisfy the persistency of excitation criterion. Exactly how this could be done is presently an unsolved problem, but it has an intriguing parallel with children practicing in order to learn the abilities and parameters of their physical systems: their bodies.

Similarly, it may be useful for the agent to have some knowledge of the accuracy of the flight envelope estimate. If an upcoming task required rotation around a specific axis, for instance, the agent might use an estimate of the accuracy of the corresponding rotational inertia to decide whether to "practice" rotational maneuvers around that axis in order to ensure an accurate estimate before attempting the task. Because the persistency of excitation criterion is a mathematical requirement on the desired trajectories, it should be possible to use it to produce confidence intervals on the accuracy of the physical parameter estimates. Methods for generating confidence intervals, and more importantly for their use by an agent, are also matters of current research.

The considerations discussed here suggest an architecture for hybrid systems as shown in Figure 12. This new architecture allows for distributed adaptation, with the controller both simplifying the apparent system dynamics for the intelligent agent and generating estimates of the system's physical parameters. The agent uses these estimates to produce desired trajectories which fall within the flight envelope and, in addition, are sufficiently complex to allow the controller's adaptation mechanism to accurately update its parameter estimates. Much work remains to make the ideas encapsulated by this architecture rigorous, and especially to analyze the implications of the additional feedback loops.

Conclusions

We have demonstrated that undesirable behavior can result from the interaction of rational intelligent agents and well-designed control algorithms. Mathematical methods of analysis can predict this behavior, however. These methods also lead to new design goals for the controllers in these systems, favoring controllers which provide simpler closed-loop system dynamics. We have also presented advanced controller designs which, although more complex than standard linear controllers, are more successful at simplifying the closed-loop dynamics. This appears to make them less susceptible to PIO.

We discussed adaptive versions of these controllers which eliminate the need for the designer to accurately model the dynamics of the physical system. Furthermore, these controllers naturally lead to systems with distributed adaptation, which may allow for improved performance of intelligent agents in the face of changing system parameters or changing environmental factors. Preliminary but intriguing evidence indicates that biological systems may work in a similar manner.

As more complex autonomous systems are deployed, intelligent agents will be faced with tasks which will require human-level control abilities. This work is hopefully a small step in that direction.

References

Blakelock, J. H. 1991. Automatic Control of Aircraft and Missiles. John Wiley & Sons.

Egeland, O., and Godhavn, J.-M. 1994. Passivity-based adaptive attitude control of a rigid spacecraft. IEEE Transactions on Automatic Control 39(4).

Flash, T., and Hogan, N. 1985. The coordination of arm movements: an experimentally confirmed mathematical model. The Journal of Neuroscience 5.

Fossen, T. I. 1994. Guidance and Control of Ocean Vehicles. John Wiley & Sons.

Henshaw, C. G. 1998. Nonlinear and adaptive control strategies for improving the performance of teleoperated spacecraft and underwater vehicles. Master's thesis, University of Maryland, College Park.

Kosha, M., and Sanner, R. M. 1999. A mathematical model of the adaptive control of human arm motions. Biological Cybernetics 80.

McRuer, D. T. 1995. Pilot-induced oscillations and human dynamic behavior. Contractor Report N96-24466, National Aeronautics and Space Administration.

Sanders, M. S., and McCormick, E. J. 1993. Human Factors in Engineering and Design. McGraw-Hill.

Sanner, R. M. 1996. Adaptive attitude control using fixed and dynamically structured neural networks. In Proceedings of the 1996 AIAA Guidance, Navigation and Control Conference.

Shadmehr, R., and Mussa-Ivaldi, F. A. 1994. Adaptive representation of dynamics during learning of a motor task. The Journal of Neuroscience 14(5).

Slotine, J.-J. E., and Li, W. 1991. Applied Nonlinear Control. Prentice Hall.

Winter, D. A. 1990. Biomechanics and Motor Control of Human Movement. Wiley Interscience.
