
Increasing UAV Intelligence Through Learning

Dr J T Platts*, S E Howell†

QinetiQ Ltd, Bedford, Beds, MK44 2FQ, UK

E C Peeling, Dr C Thie, Z Lock‡

QinetiQ Ltd, Malvern, Worcs, WR14 3PS, UK

and

Dr P R Smith§

Blue Bear Systems Research Ltd, Bedford, Beds, MK41 6AE, UK

For Unmanned Air Vehicles to be successfully deployed in strike missions, system intelligence must be increased to robustly replace the planning, decision-making and conflict resolution functionality normally carried out by the crew in similarly deployed strike aircraft. The process of accurately capturing the requirements for such vehicles depends on evolutions from current manned aircraft operation thinking and synthetic environment based experimentation. The process is also used for technical development and refinement of intelligent machine based code and accordingly requires a significant investment in the fidelity of the synthetic environment. This work addresses the feasibility of capitalizing on the investment in the synthetic environment and using it to reduce the lengthy knowledge acquisition process. By using machine learning capability the aim is to reduce the length of the total knowledge acquisition and code development process. Initial experimentation with both deductive and inductive learning techniques is described with results presented. The paper concludes by postulating the likely benefits of such an approach as well as discussing the evident limitations.

I. Nomenclature

AG14 = Action Group 14 (GARTEUR)
AI = Artificial Intelligence
ATR = Aided Target Recognition
EBL = Explanation Based Learning
FM = Flight Mechanics (GARTEUR)
GARTEUR = Group for Aeronautical Research and Technology in EURope
ILP = Inductive Logic Programming
ML = Machine Learning
ROE = Rules of Engagement
SAM = Surface to Air Missile
SAR = Synthetic Aperture Radar
SCA = Symbolic Concept Acquisition
SE = Synthetic Environment

* Technical Manager, Intelligent Dynamic Systems, B109, QinetiQ, Bedford Technology Center, AIAA Member
† Project Manager, Intelligent Dynamic Systems, B109, QinetiQ, Bedford Technology Center
‡ Researcher, Advanced Signal and Information Processing, E Bldg, QinetiQ, Malvern Technology Center
§ Principal Scientist, Bldg 32, BBSR, Twinwoods Business Park, Twinwoods Road, Clapham, AIAA Member

AIAA 3rd "Unmanned Unlimited" Technical Conference, Workshop and Exhibit, 20-23 September 2004, Chicago, Illinois

AIAA 2004-6413

Copyright © 2004 by QinetiQ Ltd. Published by the American Institute of Aeronautics and Astronautics, Inc., with permission.


Figure 1. Flexibility and Capability as a function of Certainty

SEAD = Suppression of Enemy Air Defenses
UAV = Unmanned Air Vehicle
UCAV = Unmanned Combat Air Vehicle

II. Introduction

Autonomous vehicles may fulfil a developing requirement in many applications and domains. Of particular interest in the defense sector is the Unmanned Air Vehicle (UAV). UAVs applied in combat roles (UCAVs)

have the potential to reduce significantly the risk to aircrew in military operations and improve mission effectiveness. Moreover, in addition to the reduced risk to aircrew, the attraction of relatively low cost, highly reliable, readily available assets that are not subject to the physical, physiological and training constraints of human pilots, is self-evident. This promise has prompted studies of future capability providing systems to be broadened to research UAV deployment and consider the possibility of using UCAVs in isolation, in swarms and in packages of mixed unmanned and manned platforms.

Existing UAV systems place a large reliance on remotely situated operators with the larger vehicles requiring a pilot in the loop for mission phases such as landing and take-off. High levels of operator interaction with the vehicle increase the vulnerability of the UAV due to detection of electromagnetic emissions and the increased likelihood of control datalink jamming. In addition the already intense competition for scarce bandwidth will be exacerbated. Moreover, reliance on a critical communications link has safety implications with regard to the need for system redundancy to protect against single point failure with concomitant cost implications. Consequently, to reduce this interaction, many of the air vehicle sub-systems must be largely autonomous. With military aspirations towards deploying UAVs in combat roles, such vehicles must be able to operate with minimal interaction from remote operators, communicating only when human decisions are a necessity to meet constraints such as those resulting from Rules of Engagement (ROE).

Achieving sufficient autonomy represents a significant challenge and is a key area of research in the UAV community1. Whilst autonomy does not depend on intelligence2, the need for autonomy coupled with flexibility drives the need for intelligent machines. Consequently, it might be concluded that considerable system flexibility, robustness and significantly increased effectiveness might be gained by the use of UAV systems exploiting intelligent behavior derived from knowledge.

Studies addressing UAV autonomy thus far have historically proposed essentially reactive systems whereby the UAV actions are determined by “hard-coded” knowledge based behavior stimulated by events anticipated by the system designer. This makes the “knowledge acquisition” of the autonomy implementation phase very lengthy and costly. Moreover, with current levels of machine intelligence, pre-mission contingency planning is extremely lengthy and costly (as of 8 Aug 02, Global Hawk was using a 24 hour planning cycle and was working towards an 8 hour mission planning time). Time implications severely impact operational planning tempos and ultimately may limit the deployability of UAVs. Software that is sufficiently robust and intelligent to reduce the time taken in contingency planning by humans is extremely complex, costly to produce and difficult to specify. The specification of software functionality is a knowledge acquisition bottleneck and therefore becomes a critical issue within the UAV domain. The aims of this work are to reduce software development times through the use of learning algorithms and to increase UAV intelligence.

This paper uses an idealized Suppression of Enemy Air Defenses (SEAD) mission as a backdrop to explore learnt UAV behaviors. The behaviors are learnt within a desktop simulation environment. Initial work has been carried out using the Soar cognitive modeling language to represent the a priori knowledge3. The work has also examined symbolic machine learning methods to allow the augmentation of prior knowledge with that automatically gained from exercising within a simulated environment.

Figure 2. The Autonomy Building Process

Section III of this paper will address the motivation for intelligent UAVs. Section IV will describe one way in which intelligent UAVs may be achieved, to illustrate the complexity of the process. Section V will briefly describe the nature of machine learning, whilst the sections that follow will describe experimentation with the use of inductive and deductive learning methods in turn. The paper will summarize with initial conclusions as to the efficacy of using learning methods to reduce both the knowledge acquisition and code development times, and indicate the possible future direction of the research.

III. What are Intelligent UAVs?

Autonomy and intelligence are terms often confused, but Clough4 makes the point that autonomy and

intelligence are not necessarily linked. For example, the amoeba is a very simple organism that is autonomous but low on the intelligence scale. So is high intelligence required of UAVs?

White5,6 introduces the following argument. Figure 1 shows a number of systems plotted on a graph of flexibility and capability versus certainty. In essence, as one travels down the page and to the right, the assumption is that we are more certain of the environment and situation, such that less sophisticated weapons or systems can be deployed. Conversely, as we move up the graph and to the left, we are less certain of the target and situation and so must field a more intelligent system able to deal robustly with the level of uncertainty; hence the addition of the crew, to provide what humans currently do best. In the case of the UAV, the intelligence to cope with uncertainty is now shared between the on-the-spot vehicle and the remotely situated human operator. White goes on to describe how the system autonomy is therefore achieved via an appropriate partnership between the UAV operator and the intelligent vehicle system. This conclusion is implied in the following statement taken from Lin7 defining intelligent control:

“control that replaces the human mind in making decisions, planning control strategies, and learning new functions whenever the environment does not allow or does not justify the presence of a human operator.”

The work described here is principally concerned with the elements of system intelligence embodied within the UAV platform itself.

IV. One Intelligent UAV Design Process

Having differentiated between autonomy and intelligence, this section will describe the process used to accurately capture the user’s requirements for autonomous UCAVs using a Synthetic Environment (SE).

QinetiQ have been addressing UCAV

autonomy issues since 1997 when an SE was assembled to simulate various UCAV based concepts and in particular their interaction with manned assets. Since then the tools, process and program aims have all matured to the extent that the current imperative is to develop flight-trial-ready software whilst still addressing the broader de-risking aims of the research program. Specifically, the process has evolved in a series of SE based trials conducted annually since 19988,9,10 covering a range of missions (SEAD, Deep Strike, Attack of High Value Mobile Targets, Air to Air) and force mix combinations of UAV and manned platforms. Modern sensors, communications systems and weapons have been modeled. A critical part of the SE has been the implementation of autonomy options, the Human Machine Interface and the human and vehicle intelligence partnership required to achieve airborne control of unmanned strike systems.


Latterly the increased focus on a narrower range of mission types has highlighted the highly sensitive nature of the knock-on effects resulting from apparently small changes in system specification. For example, the field of regard of sensors can impact the required turning performance of the air vehicle, with concomitant effects on mission timelines and the required sophistication of the task execution system software.

Figure 2 shows the process by which the autonomy level is increased in the work described. The process has evolved as a result of the trials described above. The process begins (1) with the definition of the likely system capabilities in terms of both performance and composition. For example the airframe may be of limited maneuver performance, different sensors may be distributed across different members of the package, sensor performance will be defined and sensors selected, weapon capabilities such as stand-off and targeting requirements will be defined and self-defense capabilities listed.

The operational analysis teams (2) then define concepts of use. This involves developing appropriate timelines and distributing the work share across the resources such that the appropriate platforms are in position to carry out their tasks at the correct time. For example, two platforms will stand off and use Synthetic Aperture Radar (SAR) to search an area whilst the other two penetrate to carry out any required attack.

At this point (3) the required behaviors must be articulated down to individual component behavior. For example, sensor coverage is achieved by a swathe of a predefined size, dictated by a racetrack pattern carried out at a certain slant range from the target. This sensor swathe must be established before other assets penetrate the target area and require the sensor information to carry out their part of the mission. Accordingly we can describe the optimum position and duration of any platform racetrack requirements. Further examples might be: how do we wish the UAVs to loiter in specific phases of the flight (figure-of-eight or circular patterns)? How do we position and launch the weapon to optimize weapon success probabilities? And do we want to position the non-attacking UAVs in support, to carry out battle damage assessment and then re-attack if necessary?

Having completed stage 3 of the process it is now necessary to decide the system capabilities to make decisions (4). In this discussion the term decision making is loosely applied and used to encompass a number of capabilities. For example, the flight management system of the vehicle can produce a range of loiter patterns but may not have the intelligence to decide which is most applicable to any given situation; it relies on an external decision making authority to decide which to use. The sensor may have the sophistication to gather, or attempt to gather, data on a particular target location and to continue to do so until data of sufficient quality is obtained, and could be said to have a decision making capability in that it knows when it can move on to the next target. As a further example, reliance on and trust in specific Aided Target Recognition (ATR) algorithms may allow us to delegate image sorting and gathering to decision making within the sensor management components. Clearly one should note that decision making capability will reside at a number of command abstraction levels and across the system.

In the light of knowledge of the system decision making capabilities the level of trust we can place in the system to make the correct decision will become apparent. Accordingly, armed with knowledge of constraints for reasons of legality, ROE and safety clearance, we can mandate which of the decisions are critical and must be made by the human being in the loop (5). For example, association of track information to the target, identification of the target from suitable imagery and weapon release would almost certainly be mandatory human decisions.

Having obtained the critical decisions it is now possible to allocate the autonomy levels to each of them (6) and decide on the appropriate technology to achieve each level. For example, it may require decision support technology for the lower levels but the higher levels can rely on the task execution components2.

Steps 7 and 8 constitute a lower level iteration process as they require the agent based task execution software to be developed in full knowledge of, and integrated with, the variable autonomy interface. Depending on the autonomy level set, different actions are permissible. Some decisions can be carried out entirely by the agents, whereas other decisions require operator interaction and a delay in task execution until the decision is made. This delay requires the agents to make the UAVs do something sensible whilst they await the human decision, such that they are able to continue to address mission aims.

The intimacy of the relationship between the variable autonomy interface and the agent development raises certain issues regarding the flexibility of the resultant UCAV based capability. As can be seen from the process, seemingly simple changes to the system can result in major changes to the software induced UAV behaviors, with corresponding implementation and clearance overheads. This issue must be addressed to successfully achieve the aims.

Given the investment required to enable targeted increases in autonomy as argued in this section, it is interesting to postulate the idea of using the SE as a tool for educating UAVs and thus reducing the knowledge acquisition burden. This idea raises many questions from both the technical and the ethical viewpoints. Will we really trust a UAV that alters its behavior as a function of experience? The remaining sections of this paper discuss machine learning in the round and describe early work in answering some of the fundamental questions about how to address the “learning UAV” problem by examining two types of learning philosophy.

V. Machine Learning

Machine Learning (ML) has been one attempt to reduce the burden of knowledge acquisition11. ML is a

well-established branch of Artificial Intelligence (AI). It is concerned with the ability of a machine to improve its performance at a particular task in response to experience.

Knowledge representation is a key issue in ML, and the choice of representation can impact heavily on the success of learning. ML techniques can be divided into two categories with respect to their representations: symbolic or sub-symbolic. Symbolic ML techniques represent knowledge and learned theories, often as sets of IF-THEN rules, which are understandable and transparent to the user. This gives an audit trail for the solution; the comprehensible output of these techniques can provide justification for any decisions made and allows further insight into the problem and domain. Examples of such techniques are inductive logic programming and decision trees. In contrast, sub-symbolic ML techniques are opaque, or ‘black-box’, and not particularly intuitive to the user. The user cannot easily see how or why the output has been produced. However, sub-symbolic techniques have been used successfully in a number of domains, such as image processing. Examples of such techniques are artificial neural networks and some binary genetic algorithms.

The knowledge representation choice depends heavily on the domain. For safety- or mission-critical systems, it is important to be able to interrogate a concept or control rule in order to judge its future effectiveness and safety within the domain. For this reason, symbolic techniques are preferable for systems such as those to control UAV tactical behaviors.
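The transparency argument above can be illustrated with a minimal sketch of a symbolic rule set. The rule names, conditions and actions here are hypothetical, invented for illustration; the point is only that every decision carries a human-readable justification, which is the audit trail symbolic techniques provide.

```python
# Minimal sketch of a symbolic, auditable rule base (all names hypothetical).
RULES = [
    # (rule name, condition on the state, resulting action)
    ("evade",  lambda s: s["threat_detected"] and s["missile_inbound"], "break_turn"),
    ("engage", lambda s: s["threat_detected"] and s["weapon_available"], "fire"),
    ("loiter", lambda s: not s["threat_detected"], "racetrack"),
]

def decide(state):
    """Return (action, justification): every decision is traceable to a
    named IF-THEN rule, unlike a sub-symbolic 'black box'."""
    for name, cond, action in RULES:
        if cond(state):
            return action, f"rule '{name}' fired"
    return "continue", "no rule fired"

action, why = decide({"threat_detected": True,
                      "missile_inbound": False,
                      "weapon_available": True})
print(action, "-", why)  # fire - rule 'engage' fired
```

A safety reviewer can read such rules directly, which is precisely what cannot be done with the weights of a neural network.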

ML can employ deductive or inductive reasoning approaches. A useful working definition of deductive and inductive learning is**: reasoning from the general to the specific is called deductive (“All mice like cheese; this is a mouse; therefore this mouse likes cheese”), while reasoning from the specific to the general is called inductive (“The mice I know like cheese; these mice are typical; therefore all mice must like cheese”).

Both approaches use a set of training examples or observations from the real world of a target concept and attempt to construct a general theory or hypothesis to describe the target concept.
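The two reasoning directions can be caricatured in a few lines of code, using the mouse example from the working definition above. The function names are ours, not part of any ML library; this is an illustrative sketch only.

```python
# Deduction vs. induction, caricatured (illustrative code, not the paper's).

def deduce(general_rule, instance):
    """Deductive: apply a general rule to a specific instance."""
    return general_rule(instance)

def induce(examples):
    """Inductive: hypothesize that a property holding for every observed
    example holds in general (assuming the examples are typical)."""
    return all(likes_cheese for _, likes_cheese in examples)

# Deduction: "all mice like cheese" + "this is a mouse" -> it likes cheese.
is_mouse = lambda animal: animal == "mouse"
print(deduce(is_mouse, "mouse"))  # True

# Induction: every observed mouse liked cheese -> hypothesize all mice do.
observed = [("mouse_1", True), ("mouse_2", True), ("mouse_3", True)]
print(induce(observed))           # True (a hypothesis, not a guarantee)
```

The key asymmetry is that the deductive conclusion is certain given the premises, whereas the inductive one is a hypothesis that further observations could refute.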

For example, consider the problem of a robot learning to drive on a public road using vision sensors. A target concept for the robot to learn might be ‘turn corner’. The training set of examples might consist of sequences of images and steering commands recorded while observing a human driver. For this study, training examples for ML have been derived from a simulation model described in the next section.

VI. Desktop Simulation Model

The simulation model requirements derive from two user communities: firstly the ML experts addressing the

research theme described in this paper, and secondly the GARTEUR FM AG14 group who are addressing the more general topic of autonomy in UAVs12. In the latter case, the purpose of the model is primarily to provide a common design challenge against which the performance of a wide variety of techniques, representing different branches of artificial intelligence, can be compared.

Since the ML techniques under investigation within the work described here are themselves considered to be a form of AI, there is considerable commonality in the requirements. An overview of the GARTEUR model will be given to offer insight into the desktop tool used for the work described in this paper.

The objective of GARTEUR FM AG14 is defined as “Development and comparison of largely autonomous planning and decision making techniques to enable a number of UAVs, of pre-specified dynamics and sensor fit, cooperating in a highly uncertain environment to achieve a goal that has no unique solution.”

The significance of some of the key words within this objective is expanded on below:

1) A number of UAVs implies that 2 or more co-operating together (to achieve the desired goal) are to be considered.

** http://www.bartleby.com/68/81/1681.html


2) The UAVs are defined primarily by the characteristics of their flight management and sensor systems. The characteristics of the co-operating UAVs may be different and variables such as weapon load will vary as a mission proceeds and will need to be taken into account.

3) The techniques under investigation should enable the group of UAVs to co-operate and change their planning on a mission/navigation level.

4) A highly uncertain environment implies that the group of UAVs will encounter unexpected external events or entities.

5) No unique solution implies that some level of onboard ‘system intelligence’ is required to assess the situation, resolve conflicts and decide on the ‘best’ course of action.

A. Mission Objective

In order to make the lessons learned within AG14 as pertinent to the real world as possible, the goal identified

for the UAVs was based on the objective of a SEAD mission. The SEAD mission objective is to generate a safe corridor through enemy air defenses. The corridor generated allows high-value platforms, potentially manned aircraft, to pass through what was previously an area of high-risk and to perform their mission.

B. Mission

The specific mission13 specified by AG14 involves an attack by four UCAVs conducting a SEAD mission to clear a route for a fifth strike UCAV to attack a high value target. The enemy air defense consists of up to ten Surface-to-Air Missile (SAM) units clustered in an area of 400 x 300 km.
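The scenario described above can be summarized as a small data structure, which may help fix the scale of the problem. The field names below are purely illustrative and are not taken from the AG14 model; only the vehicle counts and area dimensions come from the text.

```python
# Illustrative data sketch of the AG14 scenario: four SEAD UCAVs plus one
# strike UCAV against up to ten SAM units in a 400 x 300 km area.
# Field names are hypothetical, not from the actual model.
import random

AREA_KM = (400, 300)

def make_scenario(n_sams=10, seed=0):
    rng = random.Random(seed)
    sams = [{"id": i,
             "pos_km": (rng.uniform(0, AREA_KM[0]), rng.uniform(0, AREA_KM[1])),
             # some sites radiate continuously, others act as pop-up threats
             "pop_up": rng.random() < 0.5}
            for i in range(n_sams)]
    ucavs = [{"id": i, "role": "SEAD"} for i in range(4)]
    ucavs.append({"id": 4, "role": "strike"})
    return {"ucavs": ucavs, "sams": sams}

scenario = make_scenario()
print(len(scenario["ucavs"]), len(scenario["sams"]))  # 5 10
```

Seeding the random generator makes each generated scenario reproducible, which matters when many example runs must be regenerated for learning experiments.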

C. Model Fidelity

It was recognized at the outset that it would be unrealistic to develop a synthetic environment in which each

element was represented by high-fidelity model components. The development of such models is an expensive and lengthy process typically culminating in a complex distributed simulation. Facilities of this nature exist and have been used extensively in research into UAV autonomy2,8,9,10. In those studies, simulation fidelity is of critical importance since a primary trial objective is the validation of the results of operational analysis studies.

Use of an SE of this complexity was dismissed for the following reasons:

1) It would be impractical to distribute to the AI practitioners;
2) It would represent an inefficient approach to the generation of numerous example cases;
3) Use of the GARTEUR model would enable comparison of the performance of the ML techniques with that achieved by a variety of alternative approaches;
4) Use of higher complexity models could result in rules for which the origin was not readily apparent.

The last point is particularly important given the relatively low level of understanding of the application of ML in this domain. At this stage of the research, the priority is to gain confidence in the ability of the learning schemes to infer properties of the environment, rather than to represent the environment with high validity. It was recognized that to gain that confidence, the behavior of the model components would have to be completely transparent.

Thus the requirement for both AG14 and the work described here was for a simple model that captured some essential properties of the SEAD mission and could be used for developing algorithms for collaborative behavior. These essential properties were the following:

1) Some of the SAM units would be active, while others would act as pop-up threats when a UCAV came within range;
2) Both UCAVs and SAM units would be equipped with sensors to detect one another and missiles to engage one another;
3) The UCAVs could employ various tactics to locate, avoid, or kill the SAM units.

The overall objective of this work is to learn rules for the UAVs to operate in the full SEAD mission described by the simulation model above. Work so far has been successful and has concentrated on learning “local” (or singleton vehicle) UAV behaviors. Further work will focus on learning “global” (or collective vehicle) behaviors, such as collaborative UAV behaviors for threat suppression in the SEAD scenario. The results described below were taken from a single-UCAV-versus-single-threat sub-set of the overall AG14 five vehicle, ten threat model.


VII. Deductive Learning

A deductive reasoning approach works from existing knowledge and deduces conclusions, or ‘new’ knowledge, that follows logically from the existing input information. For example, given the facts A = B and B = C, we can deduce with confidence that A = C.

Explanation Based Learning (EBL) is a deductive ML technique. EBL takes as input a small number of training examples, a domain theory (a set of facts & rules that describe relationships between objects and actions in the domain), and the target concept. The output is a proof, or general explanation, of the target concept.

In EBL, the domain theory is used to find a proof for each example. In other words, an attempt is made to ‘explain’ the existence of each example. Depending on which parts of the domain theory are used, irrelevant facts and rules can be ‘pruned’, and the relevant features can be used to define the target concept. An alternative description of this technique is that the proof or explanation of a single example can be used as a template for a general concept proof: the example proof can be generalized by a process that replaces constants with variables. EBL can be a successful learning technique, but to function properly it requires that the data provided (both the examples and the domain theory) be complete and correct.
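The constants-to-variables generalization step at the heart of EBL can be sketched in a few lines. The token-list representation and the example proof below are ours, invented for illustration; a real EBL system would operate over a full logical proof structure.

```python
# Sketch of EBL's generalization step: a ground explanation built for one
# concrete example becomes a general template by replacing its constants
# with variables. Representation and example are illustrative only.

def generalize(proof_step, constants):
    """Replace each named constant in a ground proof step with a fresh variable."""
    mapping = {c: f"?x{i}" for i, c in enumerate(constants)}
    return [mapping.get(token, token) for token in proof_step]

# Ground explanation for one hypothetical training example:
# "ucav1 fired at sam3 while sam3 was inside ucav1's weapon range"
ground = ["fired", "ucav1", "sam3", "in_range", "ucav1", "sam3"]
general = generalize(ground, ["ucav1", "sam3"])
print(general)  # ['fired', '?x0', '?x1', 'in_range', '?x0', '?x1']
```

Note that every occurrence of the same constant maps to the same variable, which is what lets the generalized proof apply to any UCAV/SAM pair while preserving the relationships the explanation established.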

Soar†† is derived from a production system architecture. The system works from a set of states and state operators. State operators take the system from one state to another and are represented as production rules: IF conditions THEN actions. Operator selection is Soar’s central problem: often many operators will apply in a given situation and there will be no available knowledge to decide which operator should be used. Such an event is known as an impasse.

The learning capacity of Soar is limited. It relies mainly on a process known as chunking. Chunking is related to the deductive learning method EBL. Existing knowledge is used to make a decision when impasses occur, and this knowledge is stored as a new operator (a chunk) so that that particular impasse will not occur again. Chunking can be successful but it performs little generalization over the existing knowledge. Therefore, it fits the training data well, but not unseen, future data. This also means that the data provided must be both correct and complete, which is of course rare in the real, uncertain, world. The work here used a refinement to Soar’s learning capabilities known as Symbolic Concept Acquisition (SCA).

A. Symbolic Concept Acquisition

SCA is an approach to learning that has been modeled on the learning processes of human beings: learning by experience. The specific approach used here was developed by Wray and Chong14,15, and the work has made use of code available from the Soar website††.

Research on human subjects presented with sets of information has shown that people can automatically track the acceptability of categories and properties once an appropriate memory structure is present. Essentially, when strong associations link examples or properties to a category, learned information accrues automatically for the category each time an example or property occurs.

The code is actually designed to provide learning as a result of experience, as an ongoing real-time process, like the human beings being modeled. For the purposes of this work, it has been configured to learn from a series of files logged from a time series simulation of a UCAV attacking a SAM site. The functionality of the code, the problem area considered and the results produced are now discussed.

B. Problem Area Addressed

This study has applied the SCA code, mentioned above, to data generated from the desktop simulation model described above. 766 example runs were generated for use in the inductive learning work (see below). This same data has been used to exercise the SCA code. There are variations on the theme, but the data essentially describes cases where:

1. the UCAV fires a missile and destroys the SAM system
2. the UCAV fires a missile and misses the SAM system
3. the UCAV is destroyed by the SAM

Mutual kills have not been considered at this stage.
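The logged outcomes amount to a labeled data set over the varied flight-path features. A minimal sketch of how such records might be represented follows; the field names and values are assumptions for illustration, not the actual log format:

```python
from dataclasses import dataclass

@dataclass
class Example:
    range_m: float      # UCAV-to-SAM range at launch (or at kill)
    bearing_deg: float  # bearing of the SAM site relative to the UCAV
    altitude_m: float   # UCAV altitude
    outcome: str        # 'kill', 'miss', or 'shot_down'

# Three illustrative records, one per case class:
examples = [
    Example(75000, 120, 2000, "kill"),
    Example(0, 0, 1000, "miss"),        # directly above the SAM
    Example(45000, 40, 2000, "shot_down"),
]

positives = [e for e in examples if e.outcome == "kill"]
```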

†† http://sitemaker.umich.edu/soar


The aim then is to generate learned rules that determine successful (1) and unsuccessful (2) firing conditions, and also conditions under which the UCAV is shot down. Whilst something of an academic exercise, the process establishes the general principles and illustrates what can be achieved. The basic concept is that rules learned in a synthetic environment could then be scrutinized and applied to an actual vehicle.

The features varied in the training examples in the cases identified above are as follows:

1. range from the UCAV to the SAM site
2. bearing of the SAM site relative to the UCAV
3. UCAV altitude

Note that in order to avoid the generation of over-specific learned rules, range is rounded to the nearest 5000 m, height to the nearest 1000 m and bearing to the nearest 10°. These classifications are entirely under the control of the user, but prevent the inclusion of real numbers such as height = 5,398.7365… metres. A similar treatment would need to be applied when applying such a rule, rather than just when learning it.
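The discretization described above can be done with a single helper; this is a sketch under the stated rounding steps, not the code used in the study:

```python
def round_to(value, step):
    """Round value to the nearest multiple of step, so that learned
    rules match bands of conditions rather than exact real numbers."""
    return round(value / step) * step

# Illustrative raw values discretized as the text describes:
launch = {
    "range_m": round_to(73218.4, 5000),     # nearest 5000 m
    "height_m": round_to(5398.7365, 1000),  # nearest 1000 m
    "bearing_deg": round_to(117.2, 10),     # nearest 10 degrees
}
```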

A typical rule generated for a successful missile launch case using the SCA code is as follows:

sp {chunk-4*d145*opnochange*1
    :chunk
    (state <s1> ^task predict ^object <o1> ^problem-space <p1>)
    (<o1> ^count 3 ^description <d1>)
    (<d1> ^launch-height 2000 ^launch-bearing 120 ^launch-range 75000)
    (<p1> ^name predict)
    -->
    (<s1> ^operator <o2> +)
    (<o2> ^category accept + ^name prediction +)}

Rather than describe the Soar syntax in any detail, consider that the bracketed expressions before the “-->” represent the conditions (the IF part of the rule) and those after it the actions (the THEN parts). Here we can see that the decision to launch a missile from a height of 2,000, a bearing of 120, and a range of 75,000 would be categorized as acceptable. On the other hand, the proposal to fire a missile under the following conditions:

sp {chunk-3*d54*opnochange*1
    :chunk
    (state <s1> ^task predict ^object <o1> ^problem-space <p1>)
    (<o1> ^count 3 ^description <d1>)
    (<d1> ^launch-height 1000 ^launch-bearing 0 ^launch-range 0)
    (<p1> ^name predict)
    -->
    (<s1> ^operator <o2> +)
    (<o2> ^category reject + ^name prediction +)}

would be rejected. Note that the UCAV in this case is directly above the SAM.

Cases where the UCAV is shot down by the SAM system are, for the purposes of the work at this stage, classified for rejection:

sp {chunk-3*d53*opnochange*1
    :chunk
    (state <s1> ^task predict ^object <o1> ^problem-space <p1>)
    (<o1> ^count 3 ^description <d1>)
    (<d1> ^killed-height 2000 ^killed-bearing 40 ^killed-range 45000)
    (<p1> ^name predict)
    -->
    (<s1> ^operator <o2> +)
    (<o2> ^category reject + ^name prediction +)}

Considering these types of cases alone it can be seen how a vehicle would be able to learn by experience, and that this experience could then be used to modify the platform behavior. Note that even though a large number of data runs have been collected there is no guarantee that all possible cases will have been captured, and indeed, there are likely to be many duplicate cases. Further analysis is required to determine the completeness of the captured data.


VIII. Inductive Learning

Inductive ML involves finding a general solution to a given problem that has been defined by a set of specific examples. Within the autonomous UAV application, the task will either be to define a concept or to learn a control rule. For example, having observed many individual target attacks, a general characterization of the concept ‘successful target attack’ may be induced. Given examples of both successful and unsuccessful attacks, an inductive learning technique can be used to identify the general characteristics that differentiate successful attacks from unsuccessful ones; those characteristics could then be used to shape future attack strategies. Inductive ML is a general capability and therefore has wide applicability within science and technology, in tasks such as planning, control and scheduling.

Most inductive learning research has focused on classification problems in static domains rather than the dynamic UAV problem proposed here. Initial successful work by the group involved learning aircraft control actions based on Computer Assisted Pilot Training data15. As a further stepping stone, initial work enhanced the learning engine to allow the learning of rules to control the classic ‘steam boiler’16.

This part of the work involved using Inductive Logic Programming (ILP)18, a symbolic inductive ML technique, to learn rules for UAV behaviors for a SEAD mission. This complex task was approached by decomposing the problem into sub-tasks and learning hierarchical behaviors. Possible types or levels of behavior were categorized as local (individual) or global (collaborative). Local behaviors include actions by individual UAVs such as ‘destroy threat’ or ‘force threat shutdown’. Global behaviors refer to collaborative actions and team working between the UAVs to accomplish tasks such as ‘clear corridor through threats’.

The learnt local behaviors facilitate the learning of higher level behaviors. The lower-level local behaviors can be used to assist in generating training examples and provide the features used in the learning process for higher-level tasks. Essentially, the basic, lower-level, local behaviors can be combined through learning both directly (so the behaviors are executed concurrently) and temporally (executed sequentially) to produce the higher-level behaviors19. This approach of hierarchical task decomposition can be referred to as layered learning20.

A. Learning local UAV behaviors

So far, work has concentrated on learning local UAV behaviors. The following sections describe experiments carried out for learning rules that characterize threat destruction by individual UAVs. As previously stated, inductive ML techniques learn rules by generalizing over a set of specific examples. A simulation of the SEAD scenario was used to extract examples of good and bad behaviors by individual UAVs for threat destruction. The scenario involved one threat and one UAV. During each simulation run, the UAV would launch a missile towards the threat at a random point. Over the whole example set, different features of the UAV flight path were varied, such as the UAV altitude and the heading deviation from the direct flight path over the threat center. ‘Good’ behavior was defined as actions that resulted in threat destruction; ‘bad’ behavior accounted for any simulation runs where the threat was not killed. In total, 767 examples were generated using the SEAD simulation. In 104 (13.5%) of the examples, the UAV exhibited ‘good’ (positive) behavior and the threat was destroyed. In the remaining 663 (86.5%) examples the threat was not destroyed and the UAV behavior was classed as ‘bad’ (negative).

The learning process was performed using 10-fold cross-validation. This is where the example set is divided into 10 subsets of (approximately) equal size. Rules are learnt over the example set, each time leaving out one of the subsets, and using the omitted subset for computing test accuracy. The results can then be averaged over the 10 learning runs for a reliable validation of results.
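The fold bookkeeping for 10-fold cross-validation can be sketched as follows; this is an illustrative stand-in, with the actual ILP learning step omitted:

```python
def ten_fold_splits(examples, k=10):
    """Partition the examples into k roughly equal subsets and yield
    (training_set, test_set) pairs, one per held-out fold. Rules are
    learnt on each training set and scored on the omitted fold."""
    folds = [examples[i::k] for i in range(k)]
    for i in range(k):
        train = [e for j, fold in enumerate(folds) if j != i
                 for e in fold]
        yield train, folds[i]

data = list(range(767))  # stand-in for the 767 SEAD examples
splits = list(ten_fold_splits(data))
```

Averaging the per-fold test accuracies then gives the overall validation figure.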


B. Learning results

Training sets 1-2: rule set induced:

destroy_threat(Attack) :-
    missile_launched(Attack, T),
    heading_deviation(Attack, T, Dev),
    lteq(-0.38, Dev), lteq(Dev, 0.23),
    threat_range(Attack, T, Range),
    lteq(62182, Range), lteq(Range, 77360).

destroy_threat(Attack) :-
    missile_launched(Attack, T),
    heading_deviation(Attack, T, Dev),
    lteq(-0.86, Dev), lteq(Dev, 0.02),
    threat_range(Attack, T, Range),
    lteq(Range, 37316).

Translated into natural language this rule set has the following meaning:

IF   a missile is launched by the UCAV at time step T
AND  the distance to the threat at time T is between 62182 m and 77360 m
AND  the heading deviation from a flight path that passes directly over the threat center is between -0.38 and 0.23 radians
THEN the threat will be destroyed

OR

IF   a missile is launched by the UCAV at time step T
AND  the distance to the threat at time T is less than 37316 m
AND  the heading deviation from a flight path that passes directly over the threat center is between -0.86 and 0.02 radians
THEN the threat will be destroyed

Training accuracy (% correctly classified examples): 94.8. Test accuracy: 96.1.

Training sets 3-10: rule set induced:

destroy_threat(Attack) :-
    missile_launched(Attack, T),
    heading_deviation(Attack, T, Dev),
    lteq(-0.36, Dev), lteq(Dev, 0.23),
    threat_range(Attack, T, Range),
    lteq(62182, Range), lteq(Range, 77360).

Translated into natural language this has the following meaning:

IF   a missile is launched by the UCAV at time step T
AND  the distance to the threat at time T is between 62182 m and 77360 m
AND  the heading deviation from a flight path that passes directly over the threat center is between -0.36 and 0.23 radians
THEN the threat will be destroyed

Training accuracy: 92.9. Test accuracy: 93.8.

Averages: training accuracy 93.3, test accuracy 93.9.

Table 1: Learning results for ‘destroy threat’. The rules are shown in both the first-order predicate logic program form, as induced by the ILP algorithm, and translated into IF-THEN rules.
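Once translated out of their first-order form, learnt rule sets such as the one induced over training sets 1-2 are straightforward to execute. A sketch in Python follows; the function name and argument conventions are ours, but the thresholds are taken directly from the induced clauses:

```python
def destroy_threat(launched, heading_dev, range_m):
    """Direct translation of the two induced clauses: the attack is
    predicted successful if a missile is launched and either the
    long-range band or the short-range band on heading deviation
    (radians) and threat range (metres) holds."""
    if not launched:
        return False
    clause1 = -0.38 <= heading_dev <= 0.23 and 62182 <= range_m <= 77360
    clause2 = -0.86 <= heading_dev <= 0.02 and range_m <= 37316
    return clause1 or clause2

r1 = destroy_threat(True, 0.1, 70000)   # satisfies clause 1
r2 = destroy_threat(True, -0.5, 30000)  # satisfies clause 2
r3 = destroy_threat(True, 0.1, 50000)   # falls between the two bands
```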


The results of learning are shown in Table 1. These results are extremely promising, exhibiting an average classification accuracy (percentage of correctly classified examples) of 93.9% over the unseen test examples. The rules learnt are both accurate and robust.

Consider the rule that was induced over training sets 3-10; an illustration of the rule appears in Fig. 3. Over the whole training set (767 examples), this rule covered 82/104 positive examples and 39/663 negative examples, giving 93.9% classification accuracy. The UCAV approaches the SAM threat with a deviation from the direct path over the threat center of between -0.36 and 0.23 radians. A missile is launched when the distance from the UCAV to the threat center is between 62182 m and 77360 m. This rule is a general characterization of a successful attack on a threat that could be used to determine when UAVs should attack threats in order to be successful. The results indicate that if an attack is carried out under the conditions stated in the IF part of the rule then the attack will be successful 82 times out of the 121 covered cases, or 68% of the time.
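The 68% figure is the rule's precision over the examples it covers; a quick arithmetic check, using the coverage counts quoted above:

```python
# Coverage counts for the rule induced over training sets 3-10:
covered_pos, covered_neg = 82, 39   # examples matched by the rule

# Precision: of all attacks satisfying the rule's conditions,
# the fraction that actually destroyed the threat.
precision = covered_pos / (covered_pos + covered_neg)
success_rate = round(100 * precision)   # 82/121, roughly 68%
```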

The rules themselves also offer insight into threat suppression by highlighting the important factors of an attack. In this case, the key features of the UCAV flight path that influence the success of an attack are the range (or distance) to the threat center from the UCAV, and the UCAV bearing relative to the threat. Other features, such as the altitude of the UCAV, have not been selected as significant by the learning algorithm.

IX. Discussion

Previous UAV work has explored the use of Soar to endow UAVs with high-level decision-making capabilities. Soar was chosen in that application due to its demonstrated application to Computer Generated Forces for SEs and its symbolic nature, which renders its knowledge transparent to interrogation. Soar allows the construction of agents to carry out task execution by employing coded knowledge from subject matter experts. This background knowledge can then be enhanced or augmented with Soar's learning mechanism.

However, the learning capacity of Soar is limited. Soar relies mainly on a deductive learning technique called chunking. Chunking can be successful but it generalizes little over the existing knowledge; it therefore fits the training data well, but not unseen, future data which may differ from the training data. Deductive learning techniques generally require domain knowledge that is both correct and complete, which is of course rare in the real, uncertain world. At the start of this work it was proposed that an inductive ML technique might be employed to increase Soar’s learning capacity and produce more suitable solutions. Inductive techniques are as a rule more tolerant of noisy data and tend to produce solutions that are more general and robust to future data.

The previous section outlined results obtained from applying an ILP algorithm to the problem of learning local UAV behaviors in a SEAD mission. However, ILP is a batch learning process hitherto applied predominantly in off-line data analysis applications. Induced rules could not be applied without their being translated into some kind of execution engine.

The remainder of this program of research will address the efficacy of a hybrid learning scheme where the relative strengths and weaknesses of both SCA and ILP will be used in a synergistic way. There are a number of options as to how this will be carried out and a full discussion of these and any results will form the basis of a future paper.

X. Conclusion

The overall aim of the work has been to examine whether ML can be exploited by UCAV system designers to reduce development cycles and increase system effectiveness. In pursuing this initial work a number of issues have become apparent. This paper has described early work in applying ML to induce tactical behaviors of UCAVs in a SEAD

Figure 3: Illustration of a learnt rule


scenario. Two learning methods have been compared and the potential benefits of a combination of ILP with Soar posited.

An initial aim of this work was to assess whether the use of ML could ease the knowledge acquisition task and thus reduce development cycles. At this stage of the work any reduction is masked by the extra work required to obtain the data needed for training the algorithms. Moreover, significant background knowledge is required to judge how best to present the data and which conditions (in the IF-THEN rule sense) to present to the algorithm. On the basis of this work it is not yet clear how the background knowledge required in the absence of an ML capability compares with that required with one, nor what level and choice of command abstraction the learning method is best applied at. Clearly a desired outcome is that the UAVs can learn more robust behaviors, making them more intelligent in the face of uncertainty, and achieve this with a reduction in the overall software development task.

At the highest level the work here would appear to use the same simulation model to evaluate the learnt behaviors as was used to train them; the cross-validation scheme does, however, ensure that the same data is not used for both training and validation. In the longer term the learnt behaviors should be assessed in a higher fidelity SE, confirming that the effects observed at the desktop level are reflected at the higher fidelity. Once this transition is better understood it remains to make the more complex transition from SE to the real world.

For ML algorithms such as ILP, the rules are only as good as the data over which they are trained. If the data does not truly represent the causal effects then the rules learnt will be inadequate. Consequently, great care must be exercised when developing simulation facilities for rule development and training, such that they can easily be used to generate many examples of positive and negative behaviors.

Nothing has been said regarding the practical implementation of such a learning capability, but the results do allow some thought in this area. Setting aside the issue of whether or not these algorithms can be cleared, in a safety critical sense, to the satisfaction of the authorities, there is the practical matter of on-line versus off-line implementation. Perhaps the safest implementation would be to apply the algorithms post flight, on data recovered from a vehicle; learnt behaviors could then be examined and cleared through SE experimentation. However, ILP would need a significant amount of data reflecting mission instances in order to learn anything sensible. The other extreme would have the algorithms learning in flight during a mission, with the behavior loop closed on the new rules during the mission. Clearly at this stage of the work it is too early to propose a practical implementation scheme.

Overall the work here motivates further work to pursue the goals of:
1) realizing the potential benefits of widening the “knowledge acquisition” bottleneck,
2) understanding where and how best to apply learning,
3) inducing appropriate behaviors at the local and global levels, and
4) increasing the capability of more intelligent UAVs.

XI. Acknowledgments

This work was carried out under the UK Ministry of Defense Corporate Research Program on behalf of the Weapons Platforms and Energy Technical Area. Their support is gratefully acknowledged.

XII. References

1 Platts, J. T., “Application of a Variable Autonomy Framework to the Control of Multiple Air Launched UAVs”, Proceedings of 16th Association of Unmanned Vehicle Systems International Conference, Orlando, FL, July 2002.
2 Platts, J. T., “The Use of Synthetic Environments for Autonomous UAV Development”, Proceedings of 18th Association of Unmanned Vehicle Systems International Conference, Anaheim, CA, Aug 2004.
3 Smith, P. R., “Marvin – Smart Algorithms for Combat UAVs”, Blue Bear Systems Ltd, Aug 2001.
4 Clough, B., “Metrics, Schmetrics! How Do You Track a UAV's Autonomy?”, 1st AIAA UAV Systems, Technologies, and Operations Conference and Workshop, Portsmouth, VA, May 2002.
5 White, A., “The Human-machine Partnership in UCAV Operations”, 17th Unmanned Air Vehicle Systems Conference, Bristol, UK, 2002.
6 White, A., “The Role of the Operator in UCAV Operations”, Proceedings of 16th Association of Unmanned Vehicle Systems International Conference, Disney's Coronado Springs Resort, FL, 2002.
7 Lin, Ching-Fang, Advanced Control Systems Design, Prentice Hall Series in Advanced Navigation, Guidance, and Control, and their Applications, Prentice-Hall Inc, 1994.
8 Smith, P. R. S., Mayo, E., O'Hara, J., Griffith, D., “Combat UAV Real-Time SEAD Mission Simulation”, AIAA Flight Mechanics Conference, AIAA-99-4185, Baltimore, 1999.
9 Howitt, S. L., Mayo, E., Platts, J. T., “Simulating the Attack of High Value Mobile Targets Using Combat UAVs”, Bristol RPV Conference, Bristol, UK, 2001.
10 Howitt, S. L., Platts, J. T., “Real-Time Deep Strike Mission Simulation Using Air-Launched UAVs”, AIAA Conference on UAVs, Portsmouth, VA, 2002.
11 Mitchell, T. M., Machine Learning, McGraw-Hill, 1997.
12 Platts, J. T., “Terms of Reference for GARTEUR Flight Mechanics Action Group FM AG14, Autonomy in UAVs”, ver 1.4, Feb 2003.
13 Platts, J. T., Ogren, P., Howell, S. E., McCallum, A., “Supporting Documentation for the GARTEUR FM AG14 Design Challenge”, ver 1, Aug 2004.
14 Wray, R. E., “A Brief Overview of Symbolic Concept Acquisition (SCA)”, http://sitemaker.umich.edu/soar
15 Wray, R. E., Chong, R. S., “Quantitative Explorations of Category Learning with Symbolic Concept Acquisition”, 5th International Conference on Cognitive Modelling, Bamberg, Germany, April 2003.
16 Camacho, R., “Learning Stage Transition Rules with IndLog”, Proceedings of the 4th International Conference on Inductive Logic Programming (S. Wrobel, ed.), Bonn, Germany, September 1994.
17 Abrial, J.-R., Borger, E., and Langmaack, H., Formal Methods for Industrial Applications: Specifying and Programming the Steam Boiler Control, LNCS 1165, Springer-Verlag, October 1996.
18 Lavrac, N. and Dzeroski, S., Inductive Logic Programming: Techniques and Applications, Ellis Horwood, New York, 1994.
19 Stone, P. and Veloso, M., “Layered Learning”, Proceedings of the 11th European Conference on Machine Learning, Barcelona, Catalonia, Spain, June 2000.
20 Matarić, M. J., Interaction and Intelligent Behavior, PhD Thesis, 1994.