Archetypical Motion: Supervised Game Behavior Learning with Archetypal Analysis

Rafet Sifa, Game Analytics and University of Bonn

[email protected]

Christian Bauckhage, Fraunhofer IAIS and University of Bonn, [email protected]

Abstract—The problem of creating believable game AI poses numerous challenges for computational intelligence research. A particular challenge consists in creating human-like behaving game bots by means of applying machine learning to game-play data recorded by human players. In this paper, we propose a novel, biologically inspired approach to behavior learning for video games. Our model is based on the idea of movement primitives and we use Archetypal Analysis to determine elementary movements from data in order to represent any player action in terms of convex combinations of archetypal motions. Given these representations, we use supervised learning in order to create a system that is able to synthesize appropriate motion behavior during a game. We apply our model to teach a first person shooter game bot how to navigate in a game environment. Our results indicate that the model is able to simulate human-like behavior at lower computational costs than previous approaches.

I. INTRODUCTION

In addition to being a popular source of entertainment and revenues, video games have spawned considerable research in human-level artificial intelligence (AI). Modern video games not only provide realistic environments that simulate the real world but also allow for investigating task-specific behaviors of players, for instance by means of mining the network traffic between game clients and game servers.

Over the years, different computational intelligence techniques have been considered as a means of bridging the gap between human behavior and video game AI. Recent examples include analyzing social behavior in MMORPGs [1], analyzing players' lifetime values [2] or modeling their experience [3], generating game content [4], or creating believable Non-Player Characters (NPCs) [5]–[7]. For a comprehensive review of the state of the art regarding the latter, we refer to [8].

One of the many challenging tasks in modeling believable NPC behavior is the problem of generating autonomously navigating game bots. Recent games use techniques like finite state machines or the A* algorithm to implement navigation behaviors. Such techniques create an illusion of intelligence that vanishes quickly once players get used to a game, because the actions of game bots tend to become stale and predictable.

In this paper, we address this problem and study a behavior learning model that uses recorded human game-play data to create NPCs that act human-like. Our focus is on learning how to navigate the game environment. Similar to how movement is realized in biological organisms, we model player actions in terms of mixtures of primitives. That is, we describe actions as displacement vectors that move game agents from one location to another and assume a finite number of action primitives that are linearly combined to represent any possible action that the game bot can perform.

Fig. 1: Unreal Tournament 2004® with a Pogamut 3 Bot

Our computational model is composed of two parts: first, we extract action primitives from large amounts of in-game data, and second, we learn how to mix action primitives based on the state history of a game bot. For the first part, we identify action primitives using Archetypal Analysis [9] to represent motions as convex mixtures of primitives. For the second part, we build a predictive model that learns mixing coefficients in a supervised manner so that, during the game, our system can determine suitable motions and synthesize them automatically. To the best of our knowledge, this is the first study that applies archetypal analysis to game-play data in order to create movement behavior for game bots. We test our behavior learning model in the context of Unreal Tournament 2004® (UT 2004), a fast-paced first person shooter game by Epic Games (see Fig. 1). For the work reported here, we access the decision making modules via the middleware library Pogamut 3 [10].

Next, we review earlier work on supervised learning in video games and briefly discuss action primitives. Then, in Section III, we provide an introduction to Archetypal Analysis and explain the Simplex Volume Maximization algorithm to efficiently determine archetypes from data. In Section IV, we elaborate on our model; we present experimental evaluations and comparisons to other behavior learning techniques in Section V. Finally, we conclude our contribution in Section VI.


II. MOTIVATION AND RELATED WORK

In this section, we briefly review previous approaches that used Machine Learning and Pattern Recognition in order to create more believable NPCs in video games. Furthermore, we introduce the notion of movement primitives from the fields of biology and neuroscience and discuss examples of their use in robotics and game AI.

Learning the behavior of a video game character can be cast as estimating a function f that maps from a player's state space to the corresponding action space [5]. In particular, given a history of player states st, st−1, . . ., where t denotes the current time, and the current state of the game environment et, the function f is supposed to determine an appropriate action at+1, that is

at+1 = f(st, st−1, . . . , st−q, et)   (1)
st+1 = st + at+1.   (2)
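As a minimal illustration, the update rule of Eqs. (1)–(2) amounts to the following loop; the function `f` used here is a toy, hypothetical stand-in (it merely repeats the last observed displacement), not a learned model:

```python
import numpy as np

def step(f, state_history, env_state):
    """One step of the model in Eqs. (1)-(2): f maps a state
    history (and the environment state) to the next action;
    the next state is the current state displaced by it."""
    a_next = f(state_history, env_state)       # Eq. (1)
    s_next = state_history[-1] + a_next        # Eq. (2)
    return a_next, s_next

# Toy illustration with a hypothetical f that simply repeats
# the most recent displacement.
f = lambda hist, env: hist[-1] - hist[-2]
history = [np.array([0.0, 0.0]), np.array([1.0, 0.0])]
a, s = step(f, history, env_state=None)        # a = [1, 0], s = [2, 0]
```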

The above model has been adapted in many studies on learning human-like behavior in video games. Using recorded tournament data sets, Zanetti and Rhalibi [11] realized the function f by means of Multilayer Perceptrons (MLPs) to learn aiming, shooting, weapon selection, and navigation. Thurau et al. [6] used a hybrid, neural network based method that combines Self-Organizing Maps and MLPs to learn human-like behavior from previously recorded demo data for the FPS game Quake II®. The same authors also considered Bayesian learning methods based on state transition probabilities in a waypoint map created using a Neural Gas algorithm [12]. Similar learning approaches have been adapted to other game genres as well. Chaperot and Fyfe [13] used MLPs in a motocross game to teach game bots how to ride a motorcycle from recorded human game-play data. Gemine et al. [7] used MLPs to learn production strategies in Starcraft II®.

Though generally successful, all these approaches are rather ad hoc in that they did not systematically address aspects of information representation or coding. In particular, questions pertaining to suitable topologies or parameterizations of neural architectures have not been studied in the above contributions and, consequently, issues such as computational efficiency or memory consumption have been largely ignored.

Aspects of efficient neural representations are studied in biology and neuroscience. With respect to the learning of movement behaviors in video games, we may thus assume another point of view on how to model the generation of behaviors. One of the currently dominant theories addressing the production of motion behavior relies on the notion of action primitives. Action primitives are understood as finite sets of elementary action generators whose outputs are combined in order to solve complex tasks. For instance, having analyzed the forces generated in the limbs of frogs through stimulating their spinal cords, Bizzi et al. [14] showed that certain force fields can act as primitives which, when linearly combined, create other force fields that are in the limb's workspace. Considering this mechanism as a weighting scheme, the weights are believed to be determined by the nervous system [15]. Contributions like these support the idea that elementary motor primitives are not unique for every action the body performs but rather that combinations of general motor primitives provide the set of behaviors (or repertoires) that biological bodies can perform [15].

Fig. 2: Didactic examples of Archetypal Analysis using three or four archetypes (shown as squares) to represent a sample of 2D data points. The archetypes characterize extreme tendencies in the data. Using equation (3), points inside of the archetypal convex hull (shown in green) can be reconstructed as a convex combination of archetypes; points on the outside (shown in red) are mapped to the closest point on the hull. Best viewed in color.

The notion of movement primitives has already been used in robotics, for instance to learn human arm movements. Taking four principal components of an arm trajectory data set as primitives and linearly combining them, Park and Jo [16] described an optimization method to calculate the mixture coefficients that realize complex motions using the primitives. A similar representation has also been studied in the context of video games. Thurau et al. [17] proposed to learn human-like movement behavior in Quake II® and extracted elementary movement vectors using Principal Component Analysis (PCA). They then determined action primitives as the most frequently occurring actions and used probabilistic models to select appropriate action vectors based on a player's state history.

With the work reported here, we extend these previous contributions. In particular, we introduce convexity constraints in the determination of primitives. This causes the extracted motion primitives to be extreme motions rather than average ones and, as we will show, leads to more efficient representations for behavior synthesis.

III. ARCHETYPAL ANALYSIS AND SIMPLEX VOLUME MAXIMIZATION

Given a data set, the main goal of Archetypal Analysis [9] is to determine extreme entities in the data called archetypes. Once archetypes have been determined, every available data entity can be represented as a convex combination of these extremes (see Fig. 2).

Algorithm 1 Simplex Volume Maximization

  Select xi randomly from X
  Choose the first simplex vertex:
    w1 = argmax_l dist(xl, argmax_z dist(xi, xz))
  for index i ∈ [2, k] do
    Let Si−1 be the current simplex with i − 1 vertices.
    Find the vertex that maximizes the simplex volume:
      wi = argmax_q Vol(Si−1 ∪ {xq})
    Update Si = Si−1 ∪ {wi}.
  end for

Formally, the problem is as follows: given an m-dimensional data set of n entities X = {xi ∈ Rm | i = 1, . . . , n}, Archetypal Analysis attempts to find k ≪ n archetypes W = {wj ∈ Rm | j = 1, . . . , k} and n stochastic coefficient vectors H = {hi ∈ Rk | i = 1, . . . , n; ‖hi‖1 = 1} to approximate each data vector as a convex combination of the archetypes. Thus, once archetypes are available, we may write

xi ≈ x′i = ∑_{j=1}^k wj hij   where   hij ≥ 0 ∧ ∑_{j=1}^k hij = 1.   (3)
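To make Eq. (3) concrete, here is a small numeric sketch; the 2D archetypes and the coefficient vector below are hypothetical values chosen for illustration:

```python
import numpy as np

# Three hypothetical archetypes in R^2, stored as the columns of W,
# and a stochastic coefficient vector h_i as in Eq. (3).
W = np.array([[0.0, 4.0, 2.0],
              [0.0, 0.0, 3.0]])
h_i = np.array([0.2, 0.5, 0.3])   # h_ij >= 0 and sums to one

# The convex combination of archetypes reconstructs a data point
# lying inside the archetypal convex hull.
x_approx = W @ h_i                # = [2.6, 0.9]
```

Any point inside the hull spanned by the archetypes can be written this way; points outside are mapped to the closest point on the hull (cf. Fig. 2).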

Various ways have been proposed to determine suitable archetypes. Cutler and Breiman [9] proposed an alternating least squares algorithm to simultaneously identify archetypes and coefficients. Unfortunately, this constrained convex optimization problem scales cubically with the number of data points and hence does not allow for efficient learning from large data sets.

Thurau et al. [18] therefore introduced several methods to speed up archetype computation. In particular, they proposed the Simplex Volume Maximization (SIVM) algorithm as a provably linear time approach to determine archetypes by means of fitting a simplex of maximum volume to a given data set. In contrast to the approach in [9], SIVM constrains the archetypes wj to coincide with certain extremal data points xi. The approach is based on notions from distance geometry and uses the Cayley-Menger determinant to measure simplex volumes when selecting suitable archetypes; the essential steps of the procedure are summarized in Algorithm 1.
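The greedy selection of Algorithm 1 can be sketched as follows. This is an illustrative, quadratic-time version using a Cayley-Menger determinant as the volume term, not the optimized linear-time implementation of [18]; all names are our own:

```python
import numpy as np

def cm_det(points):
    """Cayley-Menger determinant of a point set; its magnitude
    grows with the squared volume of the spanned simplex."""
    n = len(points)
    D = np.zeros((n + 1, n + 1))
    D[0, 1:] = 1.0
    D[1:, 0] = 1.0
    for i in range(n):
        for j in range(n):
            D[i + 1, j + 1] = np.sum((points[i] - points[j]) ** 2)
    return abs(np.linalg.det(D))

def sivm(X, k, seed=0):
    """Greedy simplex volume maximization: choose a first vertex
    far from a point that is far from a random seed point, then
    repeatedly add the data point maximizing the simplex volume."""
    rng = np.random.default_rng(seed)
    x0 = X[rng.integers(len(X))]
    far = X[np.argmax(((X - x0) ** 2).sum(axis=1))]
    idx = [int(np.argmax(((X - far) ** 2).sum(axis=1)))]
    for _ in range(1, k):
        best, best_vol = -1, -1.0
        for q in range(len(X)):
            if q in idx:
                continue
            vol = cm_det([X[j] for j in idx] + [X[q]])
            if vol > best_vol:
                best, best_vol = q, vol
        idx.append(best)
    return X[np.array(idx)]

# Corners of a square plus interior points: the selected
# archetypes should all be extreme points (corners).
X = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0],
              [5.0, 5.0], [4.0, 6.0], [6.0, 4.0]])
P = sivm(X, 3)
```

Note that, in line with the SIVM constraint, the selected archetypes are actual data points rather than arbitrary points in space.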

IV. ARCHETYPICAL MOTION

In this section we describe our hybrid, biologically inspired model to learn movement behavior in video games. We adapt the idea of Mussa-Ivaldi and Bizzi [15] to video games; instead of directly mapping from the game state space to the action space, we aim to represent actions of players as weighted combinations of previously determined action primitives. Our goal is to obtain a learning model that is not only biologically adequate but also efficient in terms of learning.

Our model consists of two parts. First, we compute a representation of primitives that closely matches biological mechanisms. Second, we generalize this model in a supervised manner. Fig. 3 shows an overview of the workflow of our system for learning action primitives and building a model to generalize the mixing coefficients.

A. Archetypal Analysis to Identify Action Primitives

With respect to the learning of motion behaviors, we define a primitive as an archetypical action that moves the body of the game player from one location to another. Having recorded n displacement vectors D = {di ∈ R3 | i = 1, . . . , n} from the game environment as our training action vectors, we determine action primitives by means of extracting k ≪ n archetypes P = {pj ∈ D | j = 1, . . . , k} from D. Suitable mixing coefficients hij are then determined by minimizing

min_{hij} ∑_{i=1}^n ‖di − ∑_{j=1}^k pj hij‖²   s.t.   hij ≥ 0 ∧ ∑_{j=1}^k hij = 1   (4)

Fig. 3: Overview of the architecture of our approach towards learning movement primitives for game bot motion generation.

which can be accomplished using common solvers [19]. Subsequently, we can represent each displacement vector di as a convex combination of archetypal movements

di = ∑_{l=1}^k hil pl.   (5)

This representation directly matches the notion of action primitives described in [15]. Additionally, since we consider convex combinations, our model also copes with the fact that action primitives cannot be combined arbitrarily but have to comply with limitations (i.e. the workspaces of body parts [15]) when generating movements. In the next step, we show how we can generalize our model and learn the mixing coefficients in order to control the navigation behavior of an Unreal Tournament 2004® game bot.
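The paper solves (4) with off-the-shelf solvers [19]. As an illustrative alternative, note that the constraint set in (4) is the probability simplex, so a simple projected-gradient scheme also works; the sketch below (function names are our own) uses the standard Euclidean simplex projection:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {h : h >= 0, sum(h) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def mixing_coefficients(P, d, iters=2000):
    """Minimize ||d - P h||^2 s.t. h in the simplex, i.e. the
    per-vector subproblem of Eq. (4), via projected gradient."""
    k = P.shape[1]
    h = np.full(k, 1.0 / k)
    eta = 1.0 / np.linalg.norm(P, 2) ** 2      # safe step size
    for _ in range(iters):
        h = project_simplex(h - eta * P.T @ (P @ h - d))
    return h

# Toy check: a displacement that truly is a convex mixture of
# three hypothetical primitives (columns of P) is recovered.
P = np.array([[1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 1.0]])
h_true = np.array([0.2, 0.3, 0.5])
h = mixing_coefficients(P, P @ h_true)
```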

B. Learning Mixing Coefficients

In order to use the above model to generate actions from human action data in real time, we need to determine how to find the mixing coefficients to combine the primitives.

Formally, using the primitives P, the coefficient vectors H, and histories of state vectors S, our main goal is to learn a function

fmixture : S → H   (6)

where H denotes the domain of all possible mixing vectors h. Considering the state history as in (1), our task is to find the mixing coefficients ht at each time step t and then use them to weight the combination of the primitives to determine the next action at+1. That is, we compute ht and at+1 as follows

ht = fmixture(st, st−1, . . . , st−q)   (7)

at+1 = ∑_{j=1}^k htj pj.   (8)

Any function approximation technique can be used to learn fmixture. We choose Multilayer Perceptrons (MLPs) as they are universal function approximators. Additionally, MLPs are in line with our idea of biologically plausible motion generation since they compute mixing coefficients using neural mechanisms.
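A minimal sketch of fmixture as a one-hidden-layer MLP is given below, with random, untrained weights for illustration only. The paper trains the network on coefficient targets directly; the softmax output layer used here is our own addition, shown as one convenient way to guarantee that predicted coefficients are stochastic (non-negative and summing to one):

```python
import numpy as np

def f_mixture(sigma, W1, b1, W2, b2):
    """Sketch of Eq. (7): map a flattened state history sigma
    to a stochastic coefficient vector h_t via a small MLP."""
    z = np.tanh(W1 @ sigma + b1)               # hidden layer
    logits = W2 @ z + b2
    e = np.exp(logits - logits.max())          # stable softmax
    return e / e.sum()

rng = np.random.default_rng(1)
q, k, hidden = 3, 6, 10                        # hypothetical sizes
sigma = rng.standard_normal(3 * (q + 1))       # history of 3D states
W1 = rng.standard_normal((hidden, sigma.size)); b1 = np.zeros(hidden)
W2 = rng.standard_normal((k, hidden));          b2 = np.zeros(k)
P = rng.standard_normal((3, k))                # k primitives in R^3

h_t = f_mixture(sigma, W1, b1, W2, b2)         # Eq. (7)
a_next = P @ h_t                               # Eq. (8)
```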

Since our training data consist of n consecutive samples containing pairs of states and actions, we can train our MLP using histories of states as input. Following the model in (1), we thus set a time delay parameter q to denote the length of the sequence of states and build an input data set

S = {σt = (st, st−1, . . . , st−q) | t = n, n − 1, . . . , q}   (9)

for training. Similarly, we define the output as the set that is composed of the corresponding actions observed in our recordings

A = {at | t = n, n − 1, . . . , q}.   (10)

Since our aim is to interpret every action as a combination of the previously determined primitives, we replace the actions in A with the corresponding coefficients that map to those actions. Namely, we map the actions aj ∈ A to their corresponding coefficient vectors hj ∈ H and obtain

O = {ht | t = n, n − 1, . . . , q}.   (11)

Finally, after training an MLP with the input data set S and output data O, we can generate mixing coefficients at run time. Combining the primitives weighted by the coefficients output by the trained MLP provides a game bot with the action to be taken at the current state of the game.
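The construction of the training sets in Eqs. (9)–(11) can be sketched as a sliding-window pass over a recorded session; variable names below are our own:

```python
import numpy as np

def build_training_data(states, coeffs, q):
    """Build inputs sigma_t = (s_t, ..., s_{t-q}) as in Eq. (9)
    and coefficient targets h_t as in Eq. (11)."""
    S, O = [], []
    for t in range(q, len(states)):
        S.append(np.concatenate([states[t - i] for i in range(q + 1)]))
        O.append(coeffs[t])
    return np.array(S), np.array(O)

# Toy recording: n 3D states with k = 4 mixing coefficients each.
n, q, k = 20, 3, 4
states = np.arange(n * 3, dtype=float).reshape(n, 3)
coeffs = np.full((n, k), 1.0 / k)
S, O = build_training_data(states, coeffs, q)  # shapes (17, 12), (17, 4)
```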

V. RESULTS AND EVALUATIONS

In this section we present results of experiments conducted to evaluate our model. Here, we focus on our model's learning capability in two baseline settings and consider the following imitation tasks to be learned: running a circular or a crossed path. Then, having shown that our model is capable of learning these tasks, we compare it to different previous models to demonstrate that our model facilitates learning and that it generates plausible movements from only a rather small number of action primitives.

A. Experiment Settings

In order to determine mixing coefficients during the runtime of a game, we used Time Delayed Neural Networks (TDNNs), a type of MLP whose input vectors contain a history of previous states. We trained our model with data extracted at 10 Hz from the UT 2004 map DM-TrainingDay. Figure 4 shows examples of two-dimensional projections of recorded in-game data.

(a) Circular Motion

(b) Crossed Path Motion

Fig. 4: Exemplary 2D projections of recordings of paths traversed by a human player (green dots). Cluster centroids (blue dots) obtained from a Neural Gas algorithm represent the topology of the corresponding game map.

As the number of states in the history vectors is a crucial parameter for the complexity of the learning task, we evaluated its impact on performance. Extensive experiments showed that histories of 3 and 8 state vectors for circular and crossed motion, respectively, yielded the best results in terms of smoothness of the motion performed by the bot. Therefore, we confine our presentation of results to these settings.

Similar to [12], we clustered the data set using a Neural Gas algorithm to learn the topology of the map, to provide a way to discretize the state space, and to be able to directly deal with motion vectors in the interfacing library.

(a) Circular Motion Data

(b) Crossed Path Motion Data

Fig. 5: Simplex projection of the action vectors in our training data. Extracted motion primitives are shown in boxes; green dots visualize how the remaining training data are obtained as convex mixtures of these archetypes.

We used fully connected TDNNs with 2 hidden layers containing 50 and 25 neurons in the first and the second hidden layer, respectively. Additionally, we used the sigmoid function as the transfer function and trained the TDNNs using the Resilient Propagation algorithm [20] for 2000 epochs. Finally, we measured the learning performance: we used 70% of the data (sampled randomly without replacement) for training and 30% for testing and compared the Mean Squared Error (MSE) values obtained in the training and testing phases.

(a) Training Error (b) Testing Error

Fig. 6: MSE statistics for increasing numbers of Neural Gas centroids for the task of learning to run a circular path.

(a) Training Error (b) Testing Error

Fig. 7: MSE statistics for increasing numbers of Neural Gas centroids for the task of learning to run a crossed path.

B. Learning Circular and Crossed Motion

As a starting point, we trained our model with data sets containing 1493 and 5727 samples from tours of a human player running along a circular and a crossed path, respectively. In a first step, we extracted the movement primitives using SIVM. Figure 5 shows a two-dimensional visualization of how the actions observed in the corresponding data are distributed over six automatically determined archetypal motion vectors. Note that the archetypal motion vectors shown in the figure are in fact three dimensional. Variations along the x and y directions are immediately obvious; variations along the z direction are encoded using colors: cyan represents a downwards displacement, for instance observed whenever the player ran down a flight of stairs; red represents an upward motion, for instance observed during jumps or when walking up stairs; magenta vectors represent planar motion only.

Having trained MLPs to determine mixture coefficients for activity synthesis, we measured an MSE of 0.0015 for the training data and an MSE of 0.0016 for the test data on circular motions. For motions along a crossed path, we observed a training MSE of 0.0018 and a test MSE of 0.0019. In either case, using the resulting MLPs in the decision making module of UT 2004 to synthesize motion within the game environment, we visually observed the resulting game bot to behave very similarly to the human player whose data had been used for training.

In order to analyze the effect of the number of Neural Gas neurons used for topology representation on behavior learning, we conducted experiments with varying numbers of Neural Gas centroids. For the configurations described above, we repeated the training process 30 times for an increasing number of Neural Gas centroids and analyzed the training and testing errors for circular and crossed paths. Figures 6 and 7 show our results for the two tasks considered. (Note that here and in the following figures, the y axis is scaled to a maximum of 0.01 to ease comparisons.) The results summarized in these figures suggest that the appropriate number of Neural Gas clusters is likely to be 60 for the task of circular motion, as this yields the lowest average testing error. Additionally, we observe that choosing centroid numbers larger than about 20 does not significantly impact the performance of our behavior learning module. For the task of learning to run along a crossed path, we observe similar behavior; the error values do not differ significantly if the number of Neural Gas centroids is increased beyond 80. Yet, the smallest average mean error is found for 160 centroids.

Fig. 8: Comparison of Testing Errors for Circular Motion

C. Evaluations

Having tested the learning capability of our model, we conducted two additional sets of experiments in order to investigate the impact of representing actions as combinations of primitives and the merits of using Archetypal Analysis to determine primitives.

In the first type of experiment, we examine the benefits of adding the intermediate step of learning primitives and describing a player's actions as combinations of action primitives. To this end, we compare the learning results of our model to those of a model that learns a direct mapping from the state space to the action space.

Figure 8 shows the resulting average errors on our test data for circular motion, where the whiskers indicate the corresponding standard deviations. We compared the direct mapper to our model with 6 and 8 action primitives and observe that, when using only six action primitives, our approach always yields better results than the direct mapper. We performed the same type of experiments to learn crossed motion using either a direct mapper or our model with 6, 8, 10, or 12 action primitives. The resulting average errors are shown in Fig. 9. Again we find that our model, which learns stochastic coefficient vectors to synthesize actions, outperforms the approach where state histories are mapped to actions directly.

Therefore, action primitives determined by means of Archetypal Analysis seem to simplify the problem of context-sensitive learning of complex activities. This is indeed well in line with biological findings and corresponding theories as to the utility of action primitives. In other words, our results suggest that the task of learning to perform appropriate actions is simplified if complex actions are being synthesized from elementary ones rather than being represented explicitly.

(a) 6 and 8 Action Primitives

(b) 10 and 12 Action Primitives

Fig. 9: Comparison of Testing Errors for Crossed Motion

In the second set of experiments, we attempt to validate the usefulness of our representation of action primitives in terms of extreme data points. To this end, we compare our model to an approach where action primitives are assumed to be average data points. That is, we compare our model to a behavior learning method that defines action vectors in terms of primitives that occur frequently. The latter strategy has been proposed by Thurau et al. [17], and we follow their suggestion in that we cluster our training set of player actions using k-means clustering [21] in order to determine action primitives. Subsequently, we used MLPs with the same structure and parametrization as above to learn mappings from the state space to a space of coefficient vectors which are used to synthesize actions.

Using the same color coding as above, Fig. 10 displays the results of k-means clustering of our action data sets for circular and crossed motions. Comparing these results to the ones in Fig. 5, we note that primitives resulting from k-means tend to represent shorter displacements and appear to cover a smaller variety of displacement angles. Using k-means to determine 6 or 8 action primitives, we obtained a bot that manages to run the circular path but shows rather abrupt changes of direction. This apparently artificial, i.e. less believable, behavior disappeared after increasing the number of k-means centers to 20.

(a) Primitives from Circular Motion

(b) Primitives from Crossed Motion

Fig. 10: Action primitives resulting from k-means clustering where we set k = 6.

For the scenario where the bot was supposed to learn to run a crossed path, we obtained an even less encouraging result. While our model was perfectly capable of synthesizing complex motion from only a small number of primitives, the approach based on k-means clustering was not. Having trained the k-means based model to learn crossed motion with 6, 8, 10, or 12 primitives, we did not obtain working bots but bots that frequently ran into walls or deviated from their path. As stated in [17], increasing the number of k-means centers has an obvious impact on the type of the motion and, in our case, increasing the number of primitives helped smooth the generated motion. Yet, even though for models with 50, 100, 150, and 200 action primitives the motion performed by the bot was noticeably improved, the bot still hit the walls from time to time or got stuck in curves. These results indicate that the more complex the task is, the larger the number of primitives we need if we use k-means to represent action primitives.

Finally, we note that the results of the last set of experiments illustrate the impact of the complexity of a task to be learned from human generated data. While our example of learning to perform circular motion around a map can serve as a baseline test-bed to assess the capabilities of a learning architecture, the slightly more difficult task of learning to move along a crossed path already poses challenges that overly simplistic learning frameworks cannot cope with.

VI. CONCLUSION AND FUTURE WORK

In this paper, we addressed the problem of learning believable game-bot behavior from recordings of human game play. To this end, we proposed a model that adapts the notion of action primitives from biology and neuroscience. In contrast to earlier related work, we explored the merits of using extreme data points rather than average ones to represent action primitives. Our resulting model thus consists of two stages.

In the first stage, we use Archetypal Analysis by means of the SIVM algorithm in order to extract action primitives from a data set of observed human activities. In the second stage, we apply supervised learning using Multilayer Perceptrons in order to map histories of game states to coefficient vectors which then allow us to synthesize actions in terms of convex combinations of archetypal actions. The resulting framework was shown to produce believable, i.e. human-like, behaviors for NPCs.
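The synthesis step of the second stage reduces to a single matrix-vector product once the coefficient vector is available. The sketch below illustrates this; the archetype matrix and the coefficient vector are made up for illustration (in the full model, the coefficients come from the MLP fed with the bot's state history).

```python
import numpy as np

def synthesize_action(h, archetypes):
    # Synthesize a displacement as a convex combination of
    # archetypal action primitives. In the paper, h is produced
    # by an MLP over the bot's state history; here it is given
    # directly (illustrative sketch).
    h = np.asarray(h, dtype=float)
    assert np.all(h >= 0.0) and np.isclose(h.sum(), 1.0), "h must lie on the simplex"
    return h @ archetypes  # (k,) @ (k, d) -> (d,)

# four hypothetical archetypal 2-D displacements (extreme moves)
Z = np.array([[1.0,  0.0],
              [-1.0, 0.0],
              [0.0,  1.0],
              [0.0, -1.0]])
move = synthesize_action([0.5, 0.0, 0.5, 0.0], Z)  # a diagonal step
```

Because the coefficients are convex, the synthesized displacement always stays inside the convex hull of the archetypes, i.e. within the range of motions actually observed in the human data.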

In addition to being biologically plausible, our approach exhibits the following beneficial properties, as revealed by extensive experiments using Unreal Tournament 2004 as a test-bed:

• Convex combinations of archetypal actions are able to mimic a variety of more complex activities which are necessary to allow a game bot to perform extended tasks.

• Convex combinations of archetypal actions show very good quantitative and qualitative performance; quantitatively, our approach yields consistently lower error rates on test data than related approaches that attempt to learn actions directly; qualitatively, our approach was observed to produce in-game behavior that closely resembles that of human players.

• Convex combinations of archetypal actions require fewer primitives in order to produce appropriate behaviors than approaches based on linear combinations of average or frequent actions; we attribute this to the fact that the MLP which learns to associate state histories with coefficient vectors has to deal with a constrained domain; that is, in contrast to learning architectures that have to learn arbitrary combinations of primitives, the mixing coefficients in our model obey the convexity constraints h_ij ≥ 0 and ∑_j h_ij = 1. In other words, the coefficient vectors h_i reside in a k-dimensional simplex so that the range of the output of our MLP is naturally bounded. This yields higher numerical stability in training, faster convergence to solutions, and apparently good generalization capabilities.
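The paper does not spell out how the MLP output is kept on the simplex; one standard way to enforce h_ij ≥ 0 and ∑_j h_ij = 1 by construction is a softmax output layer, sketched below under that assumption.

```python
import numpy as np

def softmax(z):
    # Map raw MLP outputs onto the simplex: every entry becomes
    # non-negative and all entries sum to one, so the convexity
    # constraints h_ij >= 0 and sum_j h_ij = 1 hold automatically.
    z = z - z.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

h = softmax(np.array([2.0, 1.0, 0.1]))  # a valid coefficient vector
```

With such an output layer, no post-hoc clipping or renormalization is needed during training, which is consistent with the numerical stability and convergence benefits noted above.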

To the best of our knowledge, synthesizing behaviors from archetypal action primitives has not been attempted before, neither in biological research, nor in robotics or game AI. Given the favorable performance of the proposed model in rather simple tasks, the obvious next step is to apply the approach to more demanding settings. This will include capabilities for, say, aiming and shooting and will likely require more elaborate hybrid architectures such as in [6], [11], [13]. In future work we will thus explore to what extent the approach proposed in this paper can be tailored towards hierarchical systems consisting of dedicated expert networks that are activated in a context-dependent manner.

In fact, this idea is akin to mixture of experts architectures [22], in which the output layer of a gating network weights the contributions of individual experts; we recognize a clear similarity to behavior synthesis from action primitives. We therefore have reason to be optimistic that this approach will yield useful behaviors for game bots.
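The structural analogy to mixtures of experts can be made concrete: a gating network produces simplex weights that blend expert outputs, formally the same convex blending used for action primitives. The sketch below uses hypothetical constant experts and a fixed gate purely for illustration.

```python
import numpy as np

def mixture_of_experts(x, experts, gate):
    # Jacobs et al. [22]: a gating network yields simplex weights
    # that blend the experts' outputs for input x -- the same
    # convex-combination mechanism used for action primitives.
    # (experts and gate are hypothetical callables.)
    w = gate(x)                          # one weight per expert
    outputs = np.stack([e(x) for e in experts])
    return w @ outputs                   # convex blend of expert outputs

# toy example: two constant "experts" and a fixed gate
experts = [lambda x: np.array([1.0, 0.0]),
           lambda x: np.array([0.0, 1.0])]
gate = lambda x: np.array([0.25, 0.75])
blended = mixture_of_experts(None, experts, gate)
```

In a hierarchical bot architecture, each expert would be a dedicated behavior network and the gate would select among them in a context-dependent manner.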


REFERENCES

[1] C. Thurau and C. Bauckhage, “Analyzing the Evolution of Social Groups in World of Warcraft,” in Proc. IEEE CIG, 2010.

[2] C. Bauckhage, K. Kersting, R. Sifa, C. Thurau, A. Drachen, and A. Canossa, “How Players Lose Interest in Playing a Game: An Empirical Study Based on Distributions of Total Playing Times,” in Proc. IEEE CIG, 2012.

[3] G. Yannakakis and J. Togelius, “Experience-Driven Procedural Content Generation,” IEEE Trans. Affective Computing, vol. 2, no. 3, pp. 147–161, 2011.

[4] C. Pedersen, J. Togelius, and G. Yannakakis, “Modeling Player Experience in Super Mario Bros,” in Proc. IEEE CIG, 2009.

[5] C. Bauckhage, C. Thurau, and G. Sagerer, “Learning Human-Like Opponent Behavior for Interactive Computer Games,” in Pattern Recognition, ser. LNCS, vol. 2781. Springer, 2003.

[6] C. Thurau, C. Bauckhage, and G. Sagerer, “Combining Self Organizing Maps and Multilayer Perceptrons to Learn Bot-Behavior for a Commercial Computer Game,” in Proc. GAME-ON NA, 2003.

[7] Q. Gemine, F. Safadi, R. Fonteneau, and D. Ernst, “Imitative Learning for Real-Time Strategy Games,” in Proc. IEEE CIG, 2012.

[8] G. Yannakakis, “Game AI Revisited,” in Proc. ACM Conf. on Computing Frontiers, 2012.

[9] A. Cutler and L. Breiman, “Archetypal Analysis,” Technometrics, vol. 36, no. 4, pp. 338–347, 1994.

[10] J. Gemrot, R. Kadlec, M. Bída, O. Burkert, R. Píbil, J. Havlíček, L. Zemčák, J. Šimlovič, R. Vansa, M. Štolba, T. Plch, and C. Brom, “Pogamut 3 Can Assist Developers in Building AI (Not Only) for Their Videogame Agents,” in Agents for Games and Simulations, ser. LNCS, vol. 5920. Springer, 2009.

[11] S. Zanetti and A. Rhalibi, “Machine Learning Techniques for FPS in Q3,” in Proc. ACM ACE, 2004.

[12] C. Thurau, T. Paczian, and C. Bauckhage, “Is Bayesian Imitation Learning the Route to Believable Gamebots?” in Proc. GAME-ON NA, 2005.

[13] B. Chaperot and C. Fyfe, “Improving Artificial Intelligence in a Motocross Game,” in Proc. IEEE CIG, 2006.

[14] E. Bizzi, S. F. Giszter, E. Loeb, F. A. Mussa-Ivaldi, and P. Saltiel, “Modular Organization of Motor Behavior in the Frog’s Spinal Cord,” Trends in Neurosciences, vol. 18, no. 10, pp. 442–446, 1995.

[15] F. A. Mussa-Ivaldi and E. Bizzi, “Motor Learning Through the Combination of Primitives,” Philosophical Trans. of the Royal Soc.: Biological Sciences, vol. 355, no. 1404, pp. 1755–1769, 2000.

[16] F. Park and K. Jo, “Movement Primitives and Principal Component Analysis,” in Advances in Robot Kinematics, J. Lenarcic and C. Galletti, Eds. Springer Netherlands, 2004.

[17] C. Thurau, C. Bauckhage, and G. Sagerer, “Synthesizing Movements for Computer Game Characters,” in Pattern Recognition, ser. LNCS, vol. 3175. Springer, 2004.

[18] C. Thurau, K. Kersting, M. Wahabzada, and C. Bauckhage, “Descriptive Matrix Factorization for Sustainability: Adopting the Principle of Opposites,” Data Mining and Knowledge Discovery, vol. 24, no. 2, pp. 325–354, 2012.

[19] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.

[20] M. Riedmiller and H. Braun, “A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm,” in Proc. IEEE Int. Conf. on Neural Networks, 1993, pp. 586–591.

[21] J. MacQueen, “Some Methods for Classification and Analysis of Multivariate Observations,” in Proc. 5th Berkeley Symp. on Mathematical Statistics and Probability, 1967.

[22] R. Jacobs, M. Jordan, S. Nowlan, and G. Hinton, “Adaptive Mixtures of Local Experts,” Neural Computation, vol. 3, no. 1, pp. 79–87, 1991.