
Fast Game Content Adaptation Through Bayesian-based Player Modelling

Miguel González-Duque
Creative AI Lab
IT University of Copenhagen
Copenhagen, Denmark
[email protected]

Rasmus Berg Palm
Creative AI Lab
IT University of Copenhagen
Copenhagen, Denmark
[email protected]

Sebastian Risi
Creative AI Lab
IT University of Copenhagen
Copenhagen, Denmark
[email protected]

Fig. 1: Fast Bayesian Content Adaption (FBCA). Our Bayesian Optimization approach can adapt the level of a simple Roguelike game to a user, such that it takes the player approximately tg = 10 seconds to solve. Our approach models the player's completion time t(x) using Gaussian Process Regression and a modified acquisition function in a Bayesian Optimization scheme, starting with a simple prior. Shown here is the session of a single player: the acquisition function first suggests a level with 5 enemies (leniency) which an A* agent can solve in 11 steps (reachability). This level takes the player roughly 5 seconds to solve. We update the prior with this information and query the acquisition function for the next level to show. After 8 levels, our system finds a level that takes this player 9.6 seconds to solve.

Abstract—In games, as well as in many user-facing systems, adapting content to users' preferences and experience is an important challenge. This paper explores a novel method to realize this goal in the context of dynamic difficulty adjustment (DDA). Here the aim is to constantly adapt the content of a game to the skill level of the player, keeping them engaged by avoiding states that are either too difficult or too easy. Current systems for DDA rely on expensive data mining, or on hand-crafted rules designed for particular domains, and usually adapt to keep players in the flow, leaving no room for the designer to present content that is purposefully easy or difficult. This paper presents Fast Bayesian Content Adaption (FBCA), a system for DDA that is agnostic to the domain and that can target particular difficulties. We deploy this framework in two different domains: the puzzle game Sudoku, and a simple Roguelike game. By modifying the acquisition function's optimization, we are reliably able to present content with a bespoke difficulty to players with different skill levels in less than five iterations for Sudoku and fifteen iterations for the simple Roguelike. Our method significantly outperforms simpler DDA heuristics with the added benefit of maintaining a model of the user. These results point towards a promising alternative for content adaption in a variety of different domains.

Index Terms—Dynamic Difficulty Adjustment, Bayesian Optimization, Gaussian Processes

I. INTRODUCTION

The problem of creating interactive media that adapts to the user has several applications, ranging from increasing the engagement of visitors of web applications to creating tailored experiences for students in academic settings [1]. One of these applications is Dynamic Difficulty Adjustment (DDA) [2], which consists of adapting the contents of a video game to match the skill level of the player. If the game presents tasks that are too difficult or too easy, it might risk losing the player due to frustration or boredom.

Current approaches to DDA focus on specific domains (e.g. MOBAs [3], Role-Playing games [4] or fighting games [5]), and use agents and techniques that either do not generalize (such as planning agents requiring forward models [6]), or rely on an expensive process of gathering data from players before the optimization can take place [7], [8]. Also relevant is the fact that most of these approaches focus on maximizing engagement and achieving flow states. However, sometimes the designer's intent might be to purposefully present content that is difficult (i.e. out-of-flow) for a particular player [9].

Bayesian Optimization has recently been proposed as a promising approach to DDA, since it does not rely on previously gathered information about either the user or the domain, and can be deployed online with only minimal specifications about the game in question [10]. However, so far this approach has only been tested with AI agents and in a single domain, while only allowing for one possible difficulty target.

arXiv:2105.08484v2 [cs.AI] 29 Jun 2021

We propose a new Bayesian-based method for Fast Bayesian Content Adaption and test it on DDA with human players. The method maintains a simple model of the player and leverages it for optimizing content towards a target difficulty on-the-fly in a domain-agnostic fashion. Fig. 1 illustrates how the proposed approach works: (1) players are presented with levels that are predicted to have the right difficulty by the underlying probabilistic model; (2) the model is updated once new data about the player's performance arrives; (3) steps 1–2 are repeated until a level with the desired target difficulty is found.

We test this novel approach on two domains: the puzzle game Sudoku, and levels for a simple Roguelike game. Our results show that Bayesian Optimization is a promising alternative for domain-agnostic automatic difficulty adjustment.

II. METHODS AND RELATED WORK

A. Related Work on Difficulty Adjustment

Dynamic Difficulty Adjustment (DDA) consists of adapting the difficulty of a game to the skill level of the player, trying to keep a flow state (i.e. a psychological state in which users solve tasks that match their ability). DDA algorithms work by predicting and intervening [11]. They model a so-called challenge function that stands as a proxy for difficulty (e.g. win rate, health lost, hits received, completion time) and intervene in the game so as to match a particular target for this challenge function [12].

Hunicke [13] points out that DDA can help players retain a "sense of agency and accomplishment", presenting a system called Hamlet that leveraged inventory theory to present content to players in Half-Life [2], [13]. Since then, several other approaches and methods have been presented and studied in this context, including alternating between different-performing AI agents in MOBA games [3], player modelling via data gathering or meta-learning techniques [7], [8], and artificially restricting NPCs that are implemented as planning agents such as MCTS [5], [6]. Other DDA methods model the player using probabilities, such as the probabilities that a player would re-try, churn, or win a particular level in a mobile game [11]; the resulting probabilistic graph is then used to maximize engagement.

Dynamic Difficulty Adjustment is a particular instance of the larger field of automatic content creation and adaption. Examples in this area include work on evolving and evaluating racetracks for different player models [14]. In another example, Shaker et al. optimize the game design features of platformer games towards fun levels, where fun is defined using player models trained on questionnaires [15]. Similar efforts have been made in the Experience Management community (with the goal of creating interactive storytelling games) [16].

These approaches, however, either rely on gathering data beforehand and leveraging it to create a player model that is then used for optimization, or are domain-specific (e.g. storytellers or platformers). Bayesian Optimization serves as an alternative that is data-efficient and flexible. Moreover, in contrast to AI-based approaches (like adjusting NPC performance), Bayesian Optimization of game parameters does not rely on forward models.

Bayesian Optimization has been used as a tool for automatic playtesting and DDA. Zook et al. use Active Learning to fine-tune low-level parameters based on human playtests [17]. Khajah et al. [18] apply Bayesian Optimization to find game parameters in Flappy Bird and Spring Ninja (e.g. distance between pipes and gap size) that maximize engagement, measured as volunteered time. Our work differs in that we build a model of player performance instead of player engagement. With our system, designers have the affordance to target content that is difficult for a given player in a bespoke fashion.

To the best of our knowledge, our contribution is the first example of a Bayesian Optimization-based system that models levels of difficulty for particular players and dynamically presents bespoke content according to this model.

B. Bayesian Optimization (B.O.) using Gaussian Process Regression

Bayesian Optimization is frequently used for optimization problems in which the objective function has no closed form and can only be measured through expensive and noisy queries (e.g. optimizing hyperparameters of Machine Learning algorithms [19], active learning [20], or finding compensatory behaviors in damaged robots [21]). The problem we tackle in this paper is indeed black-box: we have no closed analytical form for the time it would take a player to solve a given level, having them play a level is expensive time-wise, and a player's performance on a single level may vary if we serve it repeatedly.

There are two main components in B.O. schemes [22]: a surrogate probabilistic model that approximates the objective function, and an acquisition function that uses this probabilistic information to decide where to query next in order to maximize said objective function. A common choice for the underlying probabilistic model is Gaussian Processes [23], and two frequently used acquisition functions are Expected Improvement (EI) and the Upper Confidence Bound (UCB).

1) Gaussian Process Regression: Practically speaking, a Gaussian process GP defines a Normal distribution for every point-wise approximation of a function t(x) using a prior µ0(x) and a kernel function k(x, x′) (which governs the covariance matrix).

If we assume that the observations of the function for a set of points $\mathbf{x} = [x_i]_{i=1}^n$, denoted by $\mathbf{t} = [t(x_i)]_{i=1}^n = [t_i]_{i=1}^n$, are normally distributed with mean $\boldsymbol{\mu}_0 = [\mu_0(x_i)]_{i=1}^n$ and covariance matrix $K = [k(x_i, x_j)]_{i,j=1}^n$, we can approximate


t(x) at a new point $x_*$ by leveraging the fact that Gaussian distributions are closed under marginalization [23]:

$$t_* \mid x_*, \mathbf{x}, \mathbf{t} \sim \mathcal{N}\big(\mu_0 + \mathbf{k}_*^T (K + \sigma_{\text{noise}} I)^{-1}\mathbf{t},\;\; k(x_*, x_*) - \mathbf{k}_*^T (K + \sigma_{\text{noise}} I)^{-1}\mathbf{k}_*\big), \qquad (1)$$

where $\mathbf{k}_* = [k(x_i, x_*)]_{i=1}^n$ and $\sigma_{\text{noise}} \in \mathbb{R}^+$ is a hyperparameter.

Two standard choices for kernel functions are the anisotropic radial basis function (RBF) $k_{\text{RBF}}(\mathbf{x}, \mathbf{x}') = \exp\big(-(\mathbf{x} - \mathbf{x}')\,\Theta\,(\mathbf{x} - \mathbf{x}')^T\big)$, where $\Theta$ is a diagonal matrix of length-scale hyperparameters, and the linear (or Dot Product) kernel $k_{\text{Linear}}(\mathbf{x}, \mathbf{x}') = \sigma_0 + \mathbf{x}^T\mathbf{x}'$. These kernels, alongside Gaussian Process Regression as described by Eq. (1), are implemented in the open-source library sklearn [24], which we use for this work.

One important detail in our experimental setup is that, since the function that we plan to regress (t(x)) will always be positive empirically, we choose to model log(t(x)) instead. We thus assume that log(t) (and not t) is normally distributed. This is a common trick for modeling positive functions.
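The log-time GP regression described above can be sketched from scratch with numpy, following Eq. (1). This is a minimal illustration, not the paper's sklearn-based implementation: the length scale, noise level, linear prior, and toy observations below are all made up for the example.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=5.0):
    """Isotropic RBF kernel: k(x, x') = exp(-|x - x'|^2 / (2 l^2))."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2 * length_scale**2))

def gp_posterior(x_train, log_t_train, x_star, mu0, noise_var=0.1):
    """Posterior mean/std of log(t) at x_star, in the spirit of Eq. (1).

    mu0 is a callable prior mean; observations are centered around it.
    """
    K = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_star = rbf_kernel(x_train, x_star)            # shape (n, m)
    alpha = np.linalg.solve(K, log_t_train - mu0(x_train))
    mean = mu0(x_star) + k_star.T @ alpha
    cov = rbf_kernel(x_star, x_star) - k_star.T @ np.linalg.solve(K, k_star)
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return mean, std

# Toy usage: model log completion time over numbers of prefilled digits,
# with a linear-in-log prior from (17, 600 s) to (80, 3 s) as in the paper.
mu0 = lambda x: np.interp(x, [17, 80], [np.log(600), np.log(3)])
x_obs = np.array([65.0, 63.0, 53.0])               # hypothetical served Sudokus
t_obs = np.array([92.0, 111.0, 216.0])             # hypothetical times (seconds)
mean, std = gp_posterior(x_obs, np.log(t_obs), np.array([55.0]), mu0)
```

Exponentiating the posterior mean gives a positive time estimate, which is the point of modeling log(t) rather than t.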

2) Acquisition Functions: Let t(x) be the objective function in a Bayesian Optimization scheme. If we model t(x) using Gaussian Processes, we have access to point estimates and uncertainties (in the form of the updated mean and standard deviation described in Eq. (1)). An acquisition function α(x) uses this probabilistic model to make informed guesses about which new point x∗ might produce the best outcome when maximizing t(x). These functions balance exploration (querying points in parts of the domain that have not been explored yet) and exploitation (querying promising parts of the landscape, i.e. areas in which previous experiments have had high values of t).

In our experiments, we use two acquisition functions: Expected Improvement, defined as αEI(x) = E[max(0, t(x) − tbest)], where tbest is the best performance seen so far, and the Upper Confidence Bound αUCB,κ(x) = µ(x) + κσ(x), where µ(x) and σ(x) are the posterior mean and standard deviation of the Gaussian Process with which we model t(x). The hyperparameter κ controls the tradeoff between exploration and exploitation.

III. FAST CONTENT ADAPTION THROUGH B.O.

Our approach, called Fast Bayesian Content Adaption (FBCA), uses Bayesian Optimization to select the best contents to present to a player in an online fashion. At a high level, our approach works as follows (Fig. 1): (1) start a B.O. scheme with a hand-crafted prior over a set of levels/tasks¹; (2) present the player an initial guess of what might be a level with the right difficulty and record the player's interaction with it; (3) update our estimates of the player and continue presenting levels that have ideal difficulty according to the internal model.

In typical applications of B.O., the approximated function is precisely the one to be optimized. In our approach, however, we separate the optimization from the modeling using a modified acquisition function.

¹Having a handcrafted prior is optional, since the system would also work with a non-informative one.

procedure FastBayesianContentAdaption(tg, D, µ0, k, β):
    X = ∅ ⊆ D    // list of served contents x
    T = ∅ ⊆ R    // the log-times log(t) they took
    while True:
        // Start the GP and optimize its hyperparameters
        initialize GP(µ0, k) and fit it with (x, log(t)) ∈ X × T
        // Maximize the modified acquisition function;
        // this task is the most likely to have time tg
        xnext = argmax_{x ∈ D} βtg(x)
        present task xnext and record time t
        add xnext to X and log(t) to T

Algorithm 1: Pseudocode for FastBayesianContentAdaption. Our approach takes a goal time tg, a design space D, a prior µ0, a kernel function k, and a modified acquisition function β. The algorithm iteratively presents contents x from the design space, measures the interaction time t with the user, updates µ0 using a Gaussian Process with kernel k, and uses this model to query new contents xnext that likely have performance close to the goal tg.
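Algorithm 1 can be sketched as a self-contained Python toy. Everything below is illustrative rather than the deployed system: the inline RBF GP, the linear prior, the simulated player, the fixed iteration budget, and the κ = 0.05 UCB-style acquisition are all stand-ins chosen for the example.

```python
import numpy as np

def rbf(a, b, l=5.0):
    """Isotropic RBF kernel over 1-D inputs."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * l * l))

def fbca(t_g, D, mu0, present_task, beta, n_iters=8, noise_var=0.1):
    """One session of the FBCA loop (Algorithm 1, toy version).

    present_task(x) plays content x and returns a measured time t;
    beta(mean, std) scores candidates (higher = t(x) likelier near t_g).
    """
    X, T = [], []                              # served contents, log-times
    for _ in range(n_iters):
        if X:                                  # GP posterior over log(t)
            xs, ts = np.array(X), np.array(T)
            K = rbf(xs, xs) + noise_var * np.eye(len(xs))
            k_star = rbf(xs, D)
            alpha = np.linalg.solve(K, ts - mu0(xs))
            mean = mu0(D) + k_star.T @ alpha
            var = 1.0 - np.sum(k_star * np.linalg.solve(K, k_star), axis=0)
            std = np.sqrt(np.clip(var, 0.0, None))
        else:                                  # no data yet: prior only
            mean, std = mu0(D), np.ones_like(D)
        x_next = D[np.argmax(beta(mean, std))]  # maximize acq. over D
        t = present_task(x_next)
        X.append(x_next)
        T.append(np.log(t))
    return X, T

# Toy run against a simulated player whose time grows as hints shrink.
t_g = 180.0
D = np.arange(17.0, 81.0)                      # Sudoku design space
mu0 = lambda x: np.interp(x, [17, 80], [np.log(600), np.log(3)])
player = lambda x: float(np.exp(np.interp(x, [17, 80], [np.log(500), np.log(5)])))
beta_ucb = lambda m, s: -(np.exp(m + 0.05 * s) - t_g) ** 2  # modified UCB
X, T = fbca(t_g, D, mu0, player, beta_ucb)
```

With a noiseless simulated player the loop quickly settles near the hint count whose completion time matches the goal.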

A. Modeling variables of interest using Gaussian Processes

Given a design space D (e.g. a collection of levels in a video game, or a set of possible web sites with different layouts), it is of interest to model a function t : D → R. The first step of our approach is to approximate this variable of interest t(x) using Gaussian Process Regression.

In the context of DDA, we choose to model the logarithm of the time log(t(x)) it takes a player to solve task x using Gaussian Process Regression, and we optimize it towards a certain goal time tg using Bayesian Optimization. We model log-time instead of time to ensure that our estimates of t(x) are always positive. Instead of using time as a proxy for difficulty, other metrics such as score, win rate or kill/death ratio could be chosen, depending on the nature of the game.

B. Separating optimization from modeling

Once a model of the player t(x) has been built, we can optimize it towards a certain goal tg. Since the acquisition function in a B.O. scheme is built for finding maxima, we separate modeling from optimization to be able to target tg: once we regress t(x), we consider t̄(x) = −(t(x) − tg)² as the objective function. After this transformation, the function t̄(x) has its optimum exactly at tg, and the acquisition function seeks an improvement in t̄ instead of t, which translates to values of t(x) that are close to tg.

In other words, we modify the typical acquisition functions (Sec. II) in the following way: Expected Improvement becomes β^{tg}_{EI} = E_{log(t(x)) ∼ GP}[max(0, −(t(x) − tg)² − t̄best)], where t̄best is the value of {−(t(xi) − tg)²} for i = 1, . . . , n closest to 0 (i.e. t ≈ tg), and the Upper Confidence Bound becomes β^{tg}_{UCB,κ} = −(exp(µ(x) + κσ(x)) − tg)², where µ(x) and σ(x) are the posterior mean and standard deviation according to the Gaussian Process. Since we need to compute expectations (in the case of EI), we perform ancestral sampling and Monte Carlo integration: we compute several samples of log(t(x)) ∼ GP, and average the argument of the expectation inside β^{tg}_{EI}.
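The two modified acquisitions can be sketched in numpy as follows. The posterior mean/std arrays, the candidate predicted times, and the t̄best value below are made up for illustration; the sketch only assumes access to a Gaussian posterior over log(t(x)) per candidate.

```python
import numpy as np

rng = np.random.default_rng(0)

def beta_ucb(mean, std, t_g, kappa=0.05):
    """Modified UCB: -(exp(mu + kappa*sigma) - t_g)^2, maximal when t ~= t_g."""
    return -(np.exp(mean + kappa * std) - t_g) ** 2

def beta_ei(mean, std, t_g, tbar_best, n_samples=512):
    """Modified EI via ancestral sampling and Monte Carlo integration.

    Draws log(t(x)) ~ N(mean, std), maps samples to tbar(x) = -(t(x) - t_g)^2,
    and averages max(0, tbar(x) - tbar_best) per candidate point.
    """
    log_t = rng.normal(mean, std, size=(n_samples, len(mean)))
    tbar = -(np.exp(log_t) - t_g) ** 2
    return np.maximum(0.0, tbar - tbar_best).mean(axis=0)

# Example: three candidate levels with predicted times 60 s, 180 s, 400 s.
mean = np.log(np.array([60.0, 180.0, 400.0]))
std = np.array([0.2, 0.2, 0.2])
i_ucb = int(np.argmax(beta_ucb(mean, std, t_g=180.0)))
i_ei = int(np.argmax(beta_ei(mean, std, 180.0, tbar_best=-40.0**2)))
```

Both acquisitions pick the middle candidate, whose predicted time sits on the goal.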

Our approach is summarized in pseudocode in Algorithm 1. Once a player starts interacting with our system, we maintain pairs (x, log(t)) and apply B.O. to serve levels which will likely result in t(x) ≈ tg. Notice that, by maintaining all pairs of tasks and performance, we assume that the player is not improving over time. This could be addressed with a sliding window in which we forget early data that is no longer representative of the player's current skill. In our experiments, however, we update our prior on the player's performance with the entire playtrace of a given player.

IV. EXPERIMENTAL SETUP

Our approach for Fast Bayesian Content Adaption promises to be able to adjust contents in a design space D according to the behavior of the player. To test this approach we chose two domains: the puzzle game Sudoku, and a simple Roguelike game.

A. Sudoku

A Sudoku puzzle (size 9×9) requires filling in the digits between 1 and 9 such that no number appears more than once in its 3×3 grid, row, or column. The difficulty of a Sudoku can be measured in terms of the number of prefilled digits, which ranges from 17 (resulting in a very difficult puzzle) to 80 (which is trivial). Thus, we settled on D = {17, 18, . . . , 80} as the design space for this domain.

To test our approach, we deployed a web app² with an interface that serves Sudokus with varying amounts of prefilled digits x ∈ D. We settled on a goal of tg = 180 sec, instructed users to solve the puzzles as fast as they could, and initiated the player model with a linear prior µ0(x) that interpolates the points (80, 3) and (17, 600) (see Fig. 2a). For this experiment, we used the RBF kernel kRBF and the modified Expected Improvement acquisition β^{tg} = β^{tg}_{EI}, and we computed this expectation using ancestral sampling (since log(t(x)) ∼ GP) and Monte Carlo integration (see Sec. III).

We compare our FBCA approach with a binary search baseline which explores D by halving it, jumping to the easy/difficult half according to the performance of the player. We also compared our approach with a linear regression policy for selecting the next Sudoku: we approximate the player's completion time starting with the same prior as in our approach and, once new data arrives, we fit a linear model to it in log-time space and present the Sudoku that the model predicts will take a time closest to tg. Unfortunately, we were not able to gather enough playtraces to make statistically significant comparisons in the linear regression case, so our analysis will focus on the comparison between our approach and simple binary search. However, we present a short comparison between leveraging this linear regression model and deploying our FBCA system.
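The binary search baseline can be sketched as follows. The toy player model (a linear time-vs-hints rule) and the iteration budget are our own illustrative assumptions; only the halving policy over D = {17, . . . , 80} and the 180-second threshold come from the text.

```python
def binary_search_policy(solve, t_g=180.0, lo=17, hi=80, n_iters=4):
    """Binary-search DDA baseline: halve the hint range depending on
    whether the player was faster or slower than the goal time t_g."""
    history = []
    while n_iters and lo <= hi:
        mid = (lo + hi) // 2          # number of prefilled digits to serve
        t = solve(mid)                # measured completion time (seconds)
        history.append((mid, t))
        if t < t_g:                   # too easy: jump to the harder half
            hi = mid - 1
        else:                         # too hard: jump to the easier half
            lo = mid + 1
        n_iters -= 1
    return history

# Toy player: completion time shrinks linearly with more prefilled digits.
toy = lambda hints: 600 - 7.5 * (hints - 17)
trace = binary_search_policy(toy)
```

Note the jump size: a player who solves the first (48-hint) Sudoku quickly is next served roughly 33 hints, the large difficulty jump the paper criticizes.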

B. Roguelike

We also tested our approach in a simple Roguelike game, in which the player must navigate a level, grab a key and proceed to a final objective while avoiding enemies that move randomly. The player can move in all directions and, when facing an enemy, can kill it (Fig. 3).

²https://adaptivesudoku.herokuapp.com/

This domain differs significantly from Sudoku since difficulty is not as straightforward to define and model, and it is also an instance of a Roguelike game that players have not played before. Two variables that influence the difficulty of a level are leniency l, defined as the number of enemies in the level, and reachability r, defined as the sum of the lengths of the A* paths from the player's avatar to the key and from the key to the goal. Using these, we define the design space D = {(l, r)} ⊆ R².

Users are presented with levels of the Roguelike game online³ and are instructed to solve them as quickly as they can. In our optimizations, we aim to find levels that take tg = 10 seconds.

We construct a prior µ0(x) by computing a corpus L of 399 randomly-generated levels in the intervals l ∈ [0, 24] and r ∈ [4, 50]. Levels were generated in an iterative process: first by sampling specifications such as height, width, and number of enemies and randomly placing assets such that the A* paths between the player and the goals are preserved, and then by mutating (adding/removing rows, columns, and assets) previously generated levels. We defined µ0(x) by interpolating a plane in which a level with l = 0, r = 4 takes 1 second to solve, and a level with l = 14, r = 50 takes 20 seconds to solve. Fig. 3 shows this prior, as well as example levels and their place in the design space.

For the GP, we chose a kernel k that combines a linear assumption with local interactions via the RBF kernel, k = kRBF + kLinear, together with the modified Upper Confidence Bound acquisition function β^{tg} = β^{tg}_{UCB,κ} with κ = 0.05. These hyperparameters (kernel shape and κ) were selected after several trial-and-error iterations of the experiment. Hyperparameters inside the kernels themselves were obtained by maximizing the likelihood w.r.t. the data.

We compare our approach with two other methods: selecting levels completely at random from the corpus L, and a Noisy Hill-climbing algorithm that starts in the center of D, takes a random step (governed by normal noise) arriving at a new level x with performance t, and consolidates x as the new center for exploration if t is closer to the goal tg. When users enter the website and game, they are assigned one of these three approaches at random. These baselines were selected because the DDA methods discussed in Sec. II are either domain-specific or optimize for engagement, and thus do not apply in this context.
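The Noisy Hill-climbing baseline can be sketched as follows. The starting center, step size, iteration count, and toy time model are our own illustrative assumptions; the paper only specifies the normal-noise step, the consolidate-if-closer rule, and the (l, r) bounds.

```python
import random

def noisy_hill_climbing(play, goal=10.0, start=(12.0, 27.0),
                        step=3.0, n_iters=10, seed=0):
    """Noisy Hill-climbing in the (leniency, reachability) space.

    Takes a Gaussian step from the current center and keeps the new
    point as center only if its measured time lands closer to the goal."""
    rng = random.Random(seed)
    center = start
    best_gap = abs(play(*center) - goal)
    served = [center]
    for _ in range(n_iters):
        cand = (min(24.0, max(0.0, center[0] + rng.gauss(0, step))),
                min(50.0, max(4.0, center[1] + rng.gauss(0, step))))
        gap = abs(play(*cand) - goal)
        served.append(cand)
        if gap < best_gap:              # consolidate the new center
            center, best_gap = cand, gap
    return served, center

# Toy player: time grows with both leniency and reachability.
toy = lambda l, r: 1.0 + 0.5 * l + 0.25 * (r - 4)
served, center = noisy_hill_climbing(toy)
```

Because steps are undirected, this baseline has no model of the player; it only remembers the single best point found so far.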

V. RESULTS AND DISCUSSION

A. Sudoku

We received a total of 288 unique playtraces for FBCA, and 94 unique playtraces using binary search. This discrepancy, however, is accounted for in the statistical tests we perform.

Figure 2 shows, on the one hand, the prior µ0(x) we used for the fast adaption and, on the other, the approximation t(x) ∼ GP after a player was presented with 4 different Sudokus, following Algorithm 1. The first Sudoku had 65

³https://adaptive-roguelike.herokuapp.com/


(a) Prior for the Sudoku experiment. This prior encodes the assumption that a Sudoku with 80 prefilled digits would be trivial to solve, and that the difficulty and uncertainty increase as there are fewer prefilled digits.

(b) 4th iteration of a FastBayesianContentAdaption playtrace.

Fig. 2: A Sudoku playtrace. Once FBCA is deployed with an appropriate prior (a), the player is shown a Sudoku with 65 digits. In this particular playtrace, the player took 92 seconds to complete it. Our approach then runs an iteration of the Bayesian Optimization, serving a level with 63 hints. Since this level is still too easy (111 seconds), the modified acquisition function β^{180}_{EI} suggests a Sudoku with 53 digits, which takes the player 216 seconds to solve. Finally, a Sudoku with 55 hints is presented that takes the player 175 seconds to solve. The resulting model of the player at this fourth iteration of FBCA is presented in (b).

Fig. 3: Prior creation for the Roguelike experiment. We pre-computed a corpus of levels for our Roguelike game and, for each one of them, we computed their leniency l (number of enemies) and reachability r (sum of the lengths of the A* paths from the avatar to the key and from the key to the final goal). All optimizations of the Roguelike experiment take place in the (l, r) space this figure illustrates. We associated a prior to each level, built on the assumption that levels with low leniency and reachability could be solved in 1 second, and levels with high leniency and reachability in 20 seconds.

prefilled digits and was too easy for the player, who solved it in only 92 seconds. In the next iteration, the modified acquisition function β^{180}_{EI} achieved its maximum over D = {17, . . . , 80} at 63 hints. This Sudoku, however, also proved too easy for the player, who solved it in 111 seconds. The next Sudoku suggested by the system had 53 prefilled digits and took the player over 3 minutes to solve and, finally, at the 4th iteration, the system presented a Sudoku that took the player 175 seconds to solve, showing that our method was able to find (for this particular player) a puzzle with difficulty close to the target tg = 180 in only 4 iterations.

Fig. 4: The average Sudoku player. This figure presents the result of fitting a Gaussian Process to all the playtraces we gathered for Sudoku when testing the FBCA algorithm. The prior µ0(x) presented in Fig. 2a was updated using an RBF kernel with data from 598 correctly-solved Sudokus, resulting in an approximation t(x) of how long it would take an average player to solve a Sudoku with x hints.

To measure which approach performed better (FBCA vs. binary search), we use the mean absolute error metric, defined as

$$\mu_e(T; t_g) = \frac{1}{|T|} \sum_{t \in T} |t - t_g|, \qquad (2)$$

which measures how far, on average, the recorded times were from the goal time tg. For brevity of presentation, we write µ^{FBCA}_e for the mean absolute error of the times collected in the Bayesian experiment T_FBCA with goal tg = 180, and likewise µ^{bin}_e for the binary search data.
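Eq. (2) is straightforward to compute; as a small sanity check, here is a Python sketch evaluated on the three-Sudoku playtrace [145, 186, 184] reported in Fig. 5a (the function name is ours):

```python
def mean_abs_error(times, t_g):
    """Eq. (2): average distance of recorded times to the goal time t_g."""
    return sum(abs(t - t_g) for t in times) / len(times)

# Playtrace from Fig. 5a, goal of 180 seconds:
err = mean_abs_error([145, 186, 184], 180)   # (35 + 6 + 4) / 3 = 15.0
```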


(a) Example: linear regression of log-time is enough. (b) Example: linear regression fails to show appropriate Sudokus.

Fig. 5: Gaussian Process Regression vs. Linear regression in Sudoku. Shown is a preliminary comparison of our approach against a greedy policy that models the player's time using linear regression in log(t) space. (a) shows an example in which this greedy policy was able to find a Sudoku with the target time (with a playtrace of [145, 186, 184] seconds), while (b) shows an example in which the linear regression fails to capture an adequate model of difficulty for the player (with a playtrace of [169, 132, 125] seconds). In both examples, our approach (which relies on Gaussian Process Regression with an RBF kernel) captures a model of the player in which completion time decreases with increasing hints. This seems to be a sensible model of difficulty, both intuitively and according to the result of fitting our approach with all (solved) Sudoku playtraces (see Fig. 4).

Fig. 6: Mean absolute error for the Roguelike experiment. Shown is the average distance to the goal tg = 10 for three different approaches (presenting levels randomly, using Fast Bayesian Content Adaption, and using a baseline of Noisy Hill-climbing). Overall, we can see that the Bayesian Optimization's mean absolute error is decreasing, albeit with noise. The difference between the Bayesian Optimization and the other approaches is small (in the ballpark of 1 or 2 seconds) but significant. The blue shaded region represents ±σ, and it disappears after around level 45 in the FBCA case since, from that point onwards, we only have one playtrace.

Table I shows the mean absolute error for the Sudoku experiment, segmented by the i-th Sudoku presented to the user. We only consider puzzles that were correctly solved, ignoring Sudokus solved in over 3000 seconds. According to the data for the first iteration, Sudokus with 65 hints (the starting point for FastBayesianContentAdaption) are, on average, closer to the goal of 3 minutes than the starting point of binary search. Both approaches appear to get closer to the goal the more Sudokus the player plays, ending with an average distance to the goal of about 30 seconds for FBCA in the best case (i = 8) and of about 40 seconds for binary search (i = 4).

While these results have high variance, two-tailed t-tests that assume different group variances allow us to reject the null hypothesis H0: µ^{FBCA}_e = µ^{bin}_e (with a p-value of 0.01, less than the significance level of 0.05) when including all the playtraces for both experiments. FBCA's mean absolute error is almost half that of binary search (with statistical significance). We hypothesize that this happens because the search in FBCA takes place around the easier levels, as shown in Fig. 4. We assume this preference for easier puzzles is the result of a prior that favors easy levels and is more uncertain about harder levels. The same is not the case for binary search: if a player solves the first Sudoku in less than 3 minutes, they are next presented with a puzzle with only 33 prefilled digits (which is significantly harder).
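The test used here corresponds to Welch's unequal-variances t-test. Below is a stdlib-only sketch of the t statistic and Welch–Satterthwaite degrees of freedom (the function name and toy samples are ours; the paper's actual per-player error samples are not reproduced, and turning t into a p-value additionally needs the t-distribution CDF, e.g. from scipy):

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's two-sample t statistic and Welch-Satterthwaite dof,
    matching a two-tailed t-test that assumes different group variances."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = variance(sample_a) / na, variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    dof = (va + vb) ** 2 / (va**2 / (na - 1) + vb**2 / (nb - 1))
    return t, dof

# Toy absolute-error samples (illustrative, not the experimental data):
t_stat, dof = welch_t([64.0, 70.0, 58.0], [110.0, 120.0, 101.0])
```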


Iteration   µ_e^FBCA ± σ     µ_e^bin ± σ       H0 rejected
1           72.5 ± 47.5      135.9 ± 266.4     yes (p = 0.04)
2           76.9 ± 94.6      122.3 ± 202.6     no (p = 0.17)
3           49.6 ± 33.8      132.7 ± 361.8     no (p = 0.22)
4           47.6 ± 66.7       39.5 ± 33.2      no (p = 0.52)
5           60.4 ± 114.5      44.1 ± 35.6      no (p = 0.45)
6           36.5 ± 34.9       72.6 ± 106.5     no (p = 0.35)
7           37.3 ± 36.3       49.1 ± 40.1      no (p = 0.50)
8           29.9 ± 29.6       79.8 ± 79.4      no (p = 0.19)
All         63.9 ± 67.3      110.5 ± 237.1     yes (p = 0.01)

TABLE I: Mean absolute errors for the Sudoku experiment. This table presents the mean errors µe and their respective standard deviations. At each iteration, we consider all Sudokus solved by users and their respective times. We compute the distance to the goal of the optimization (180 seconds) and average over the total number of Sudokus presented at that iteration. The mean error for FBCA, µ_e^FBCA, is significantly less than that for binary search, µ_e^bin, which means that the null hypothesis H0: µ_e^FBCA = µ_e^bin is rejected according to a two-tailed t-test with acceptance rate of 0.05.

Levels        µ_e^FBCA      µ_e^NH        µ_e^rand      H0^NH rejected    H0^rand rejected
1 ≤ i < 5     3.66 ± 3.48   5.76 ± 5.63   7.09 ± 6.73   yes (p = 0.01)    yes (p < 0.01)
5 ≤ i < 10    3.65 ± 3.94   4.88 ± 4.98   6.28 ± 6.03   no (p = 0.10)     yes (p < 0.01)
10 ≤ i < 15   3.97 ± 5.15   4.39 ± 6.76   4.65 ± 3.80   no (p = 0.72)     no (p = 0.47)
15 ≤ i < 20   4.76 ± 6.62   3.69 ± 3.09   6.39 ± 6.39   no (p = 0.41)     no (p = 0.32)
20 ≤ i < 25   3.30 ± 3.68   4.36 ± 4.91   5.13 ± 5.45   no (p = 0.42)     no (p = 0.20)
25 ≤ i < 30   2.23 ± 1.68   5.20 ± 6.17   3.97 ± 3.15   no (p = 0.07)     no (p = 0.08)
30 ≤ i < 35   2.58 ± 2.44   4.00 ± 4.05   4.55 ± 1.91   no (p = 0.29)     no (p = 0.06)
All           3.78 ± 4.59   4.73 ± 5.32   5.94 ± 5.74   yes (p = 0.01)    yes (p < 0.01)

TABLE II: Mean absolute errors for the Roguelike experiment. This table presents the mean absolute error for Fast Bayesian Content Adaption (FBCA), µ_e^FBCA, for the Noisy Hill-climbing (NH) baseline, µ_e^NH, and for serving random levels, µ_e^rand. This data supports two claims: our approach FBCA improves its understanding of the player over time (which can be seen from the fact that µ_e^FBCA decreases the more levels a player solves), and FBCA performs better than NH and serving random levels on average (since µ_e^FBCA is less than µ_e^NH and µ_e^rand in almost all intervals). While our approach performs significantly better when taking all iterations into account, in some intervals this performance increase, while observable, is not significant. These results are also reported visually in Fig. 6.

While we were not able to gather enough playtraces to perform significant statistical comparisons between our approach and a greedy policy based on linear regression, we can anecdotally report cases in which a linear model of log-time failed to capture the player's performance. Fig. 5 shows two example playtraces gathered for the linear regression policy experiment. In the first (Fig. 5a), linear regression was enough to capture a model of the player and accurately present Sudokus that targeted the appropriate difficulty. On the other hand, Fig. 5b shows a critical example in which simple linear regression predicts that Sudokus with 17 hints are extremely easy. This seems to indicate that Gaussian Process Regression (with RBF kernels) is a better model for predicting completion log-time in this particular experiment.
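The failure mode above can be reproduced with a short sketch. The playtrace below is hypothetical, constructed so that the player happened to solve Sudokus with fewer hints faster; an ordinary least-squares fit of log-time then extrapolates the hardest puzzles (17 hints) as the easiest, exactly the kind of misprediction Fig. 5b illustrates.

```python
import math

def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

# Hypothetical playtrace: (number of hints, completion time in seconds).
# This player solved puzzles with fewer hints faster, so log-time DECREASES
# as difficulty increases -- a trend the linear model extrapolates blindly.
hints = [65, 55, 45, 35]
times = [200.0, 140.0, 95.0, 65.0]
log_t = [math.log(t) for t in times]

a, b = fit_line(hints, log_t)
pred_17 = math.exp(a + b * 17)  # implausibly short predicted time at 17 hints
```

A GP with an RBF kernel would instead revert toward its prior away from the data and report high uncertainty at 17 hints, rather than confidently extrapolating the linear trend.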

B. Roguelike

Players were assigned one of the three experiments discussed in Sec. IV-B at random when using our web application and were instructed to solve the levels they were presented as quickly as possible. We recorded a total of 18 unique playtraces for FBCA, 21 for Noisy Hill-climbing, and 30 for the completely random approach. As in the Sudoku experiment, our statistical tests are resilient to this discrepancy in sample sizes. In preparation for the data analysis, we removed all levels solved in over 60 seconds, since the levels are designed to be short, and a solve time of over a minute may indicate that the player got distracted.

Table II shows the mean absolute error (see Eq. (2)) for all three approaches, dividing the iterations into smaller intervals for ease of presentation and smoothing. We see that the mean absolute error for FBCA (µ_e^FBCA) decreases the more levels a player solves. This effect is to be expected, since the underlying player model better approximates the player's actual performance over time. Although the results have high variance, we can say with statistical confidence (using a two-tailed t-test with an acceptance rate of 0.05) that FBCA performs better than Noisy Hill-climbing and serving completely random levels, though by a small margin: less than a second vs. NH, and about two seconds vs. serving random levels. Fig. 6 shows the mean absolute error for all three methods, including one standard deviation and the average over all levels solved. Notice how the mean absolute error over all playtraces is lower for FBCA than for serving random levels or Noisy Hill-climbing. We argue that the position of this mean error also indicates that the high spikes visible in all approaches are outliers that could be explained by temporary distractions of the players.
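The analysis pipeline behind Table II can be sketched as follows: filter out solves over the 60-second cutoff, then bin the per-iteration errors |t - tg| into intervals of five levels. The input format and the example playtraces are hypothetical; only the cutoff, target, and binning scheme come from the text.

```python
from statistics import mean

TARGET = 10.0   # goal time tg in seconds
CUTOFF = 60.0   # discard solves longer than a minute (player likely distracted)

def binned_mae(playtraces, bin_size=5):
    """Mean absolute error |t - tg| per iteration interval, as in Table II.
    `playtraces` is a list of per-player lists of completion times
    (a hypothetical representation of the recorded data)."""
    bins = {}
    for trace in playtraces:
        for i, t in enumerate(trace, start=1):
            if t > CUTOFF:
                continue  # the 60 s filter described in the text
            bins.setdefault((i - 1) // bin_size, []).append(abs(t - TARGET))
    return {b: mean(v) for b, v in sorted(bins.items())}

# Two placeholder playtraces; the 75 s solve in the first one gets filtered out.
traces = [[14.2, 7.5, 11.8, 9.1, 10.4, 8.8, 75.0, 10.2],
          [20.3, 16.0, 12.5, 11.1, 9.7, 10.6, 9.9, 10.1]]
errors = binned_mae(traces)  # {interval index: mean absolute error}
```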

VI. CONCLUSIONS AND FUTURE WORK

This paper presented a new approach, called Fast Bayesian Content Adaption (FBCA), with a focus on automatic difficulty adjustment in games. Our method maintains a simple model of the player, updates it using Gaussian Processes as soon as data about the interaction between the user and the game arrives, and uses a modified acquisition function to identify the next best level to present.
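The loop just summarized can be illustrated with a minimal, self-contained sketch. The GP with an RBF kernel is standard; the acquisition shown here (pick the candidate whose predicted log-time is closest to the target) is a simplified stand-in for the paper's modified acquisition function, and the level encoding, length scale, and noise level are all hypothetical choices.

```python
import math

def rbf(x1, x2, ls=10.0):
    """Squared-exponential (RBF) kernel with length scale ls."""
    return math.exp(-0.5 * ((x1 - x2) / ls) ** 2)

def solve(A, b):
    """Gaussian elimination with partial pivoting (fine for tiny systems)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gp_mean(X, y, x_star, noise=0.1):
    """GP posterior mean at x_star, given observations (X, y)."""
    n = len(X)
    K = [[rbf(X[i], X[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = solve(K, y)  # K^{-1} y
    return sum(rbf(x, x_star) * a for x, a in zip(X, alpha))

# One adaptation step. Levels are encoded by a single hypothetical feature
# (e.g. number of enemies); completion times are modelled in log-space.
target = math.log(10.0)                    # tg = 10 seconds
levels = [5.0, 20.0]                       # levels shown so far
times = [math.log(4.0), math.log(25.0)]    # observed completion times
best = min(range(1, 41),
           key=lambda x: abs(gp_mean(levels, times, float(x)) - target))
```

After the player solves the chosen level, its (level, log-time) pair is appended to the observations and the loop repeats, which is how the model in Fig. 1 is refined between iterations.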

We tested FBCA in two domains: finding a Sudoku that takes 3 minutes to solve, and a level for a Roguelike game that takes 10 seconds to solve. Experiments comparing FBCA with simpler baselines in both domains show that our approach is able to find levels with the target difficulty quickly and with significantly smaller distance to the target goal. While the results have high variance, the mean error of FBCA is significantly lower than that of the baselines we compare to.

Our method thus provides an alternative for automatic difficulty adjustment that can, in only a few trials, learn which content in a design space elicits a particular target value (e.g. completion time) for a given player.

There are several avenues for future research. First, enforcing stronger assumptions about the monotonicity of the function being modeled in the Bayesian Optimization [25] might be beneficial if the content of the game is encoded in features that relate directly to difficulty (e.g. number of enemies). Moreover, approaches in which the prior is automatically learned using artificial agents as proxies for humans could be explored [26]. The fact that our model assumes that the player does not improve over time could also be tackled in future research, by forgetting the initial parts of the playtrace using a sliding window. Finally, the features that make the contents of the design space difficult could be automatically learned by training generative methods (e.g. GANs or VAEs) and exploring the latent space of content they define [27]–[30].
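The sliding-window idea mentioned above is straightforward to sketch: keep only the most recent observations, so a player's early (pre-improvement) solves stop influencing the model. The window size and the observation format below are hypothetical choices, not part of the method as published.

```python
from collections import deque

WINDOW = 10  # hypothetical window size
# Each entry is a (level_feature, completion_time) pair; deque(maxlen=...)
# silently drops the oldest observation once the window is full.
observations = deque(maxlen=WINDOW)

for i in range(25):  # placeholder playtrace: 25 solved levels
    observations.append((i, 10.0 + i * 0.1))

# The Gaussian Process would then be refit on `recent` at every iteration,
# i.e. on at most WINDOW of the latest (level, time) pairs.
recent = list(observations)
```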

ACKNOWLEDGEMENTS

This work was funded by a Google Faculty Award 2019, a Sapere Aude DFF grant (9063-00046B), the Danish Ministry of Education and Science, Digital Pilot Hub and Skylab Digital. We would also like to thank the testers of our two prototypes, and the reviewers for their helpful comments on our manuscript.

REFERENCES

[1] O. Pastushenko, "Gamification in education: dynamic difficulty adjustment and learning analytics," in CHI PLAY 2019 - Extended Abstracts of the Annual Symposium on Computer-Human Interaction in Play, 2019.

[2] R. Hunicke and V. Chapman, "AI for dynamic difficulty adjustment in games," AAAI Workshop - Technical Report, vol. WS-04-04, pp. 91–96, 2004.

[3] M. P. Silva, V. do Nascimento Silva, and L. Chaimowicz, "Dynamic difficulty adjustment on MOBA games," Entertainment Computing, vol. 18, pp. 103–123, 2017.

[4] A. E. Zook and M. O. Riedl, "A temporal data-driven player model for dynamic difficulty adjustment," Proceedings of the 8th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, AIIDE 2012, pp. 93–98, 2012.

[5] S. Demediuk, M. Tamassia, W. L. Raffe, F. Zambetta, X. Li, and F. Mueller, "Monte Carlo tree search based algorithms for dynamic difficulty adjustment," 2017 IEEE Conference on Computational Intelligence and Games, CIG 2017, pp. 53–59, 2017.

[6] Y. Hao, S. He, J. Wang, X. Liu, J. Yang, and W. Huang, "Dynamic difficulty adjustment of game AI by MCTS for the game Pac-Man," in 2010 Sixth International Conference on Natural Computation, vol. 8, 2010, pp. 3918–3922.

[7] M. Jennings-Teats, G. Smith, and N. Wardrip-Fruin, "Polymorph: Dynamic Difficulty Adjustment through level generation," Workshop on Procedural Content Generation in Games, PC Games 2010, co-located with the 5th International Conference on the Foundations of Digital Games, pp. 2–6, 2010.

[8] H.-S. Moon and J. Seo, "Dynamic difficulty adjustment via fast user adaptation," in Adjunct Publication of the 33rd Annual ACM Symposium on User Interface Software and Technology, 2020, pp. 13–15.

[9] A. Anthropy and N. Clark, A Game Design Vocabulary: Exploring the Foundational Principles Behind Good Game Design, 1st ed. Addison-Wesley Professional, 2014.

[10] M. Gonzalez-Duque, R. B. Palm, D. Ha, and S. Risi, "Finding game levels with the right difficulty in a few trials through intelligent trial-and-error," in 2020 IEEE Conference on Games (CoG), 2020, pp. 503–510.

[11] S. Xue, M. Wu, J. Kolen, N. Aghdaie, and K. A. Zaman, "Dynamic difficulty adjustment for maximized engagement in digital games," 26th International World Wide Web Conference 2017, WWW 2017 Companion, pp. 465–471, 2017.

[12] M. Zohaib, "Dynamic difficulty adjustment (DDA) in computer games: A review," Advances in Human-Computer Interaction, vol. 2018, 2018.

[13] R. Hunicke, "The case for dynamic difficulty adjustment in games," ACM International Conference Proceeding Series, vol. 265, pp. 429–433, 2005.

[14] J. Togelius, R. De Nardi, and S. M. Lucas, "Towards automatic personalised content creation for racing games," Proceedings of the 2007 IEEE Symposium on Computational Intelligence and Games, CIG 2007, pp. 252–259, 2007.

[15] N. Shaker, G. Yannakakis, and J. Togelius, "Towards automatic personalized content generation for platform games," Proceedings of the 6th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, AIIDE 2010, pp. 63–68, 2010.

[16] D. Thue, V. Bulitko, M. Spetch, and W. Eric, "Interactive storytelling: A player modelling approach," Proceedings of the 3rd Artificial Intelligence and Interactive Digital Entertainment Conference, AIIDE 2007, pp. 43–48, 2007.

[17] A. Zook, E. Fruchter, and M. O. Riedl, "Automatic playtesting for game parameter tuning via active learning," in Proceedings of the 9th International Conference on the Foundations of Digital Games, FDG 2014, Liberty of the Seas, Caribbean, April 3-7, 2014, M. Mateas, T. Barnes, and I. Bogost, Eds. Society for the Advancement of the Science of Digital Games, 2014. [Online]. Available: http://www.fdg2014.org/papers/fdg2014 paper 39.pdf

[18] M. M. Khajah, B. D. Roads, R. V. Lindsey, Y.-E. Liu, and M. C. Mozer, "Designing engaging games using Bayesian optimization," in Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, ser. CHI '16. New York, NY, USA: Association for Computing Machinery, 2016, pp. 5571–5582. [Online]. Available: https://doi.org/10.1145/2858036.2858253

[19] J. Snoek, H. Larochelle, and R. P. Adams, "Practical Bayesian optimization of machine learning algorithms," in Advances in Neural Information Processing Systems, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, Eds., vol. 25. Curran Associates, Inc., 2012, pp. 2951–2959. [Online]. Available: https://proceedings.neurips.cc/paper/2012/file/05311655a15b75fab86956663e1819cd-Paper.pdf

[20] E. Brochu, V. M. Cora, and N. de Freitas, "A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning," 2010. [Online]. Available: http://arxiv.org/abs/1012.2599

[21] A. Cully, J. Clune, D. Tarapore, and J.-B. Mouret, "Robots that can adapt like animals," Nature, vol. 521, no. 7553, pp. 503–507, May 2015. [Online]. Available: https://doi.org/10.1038/nature14422

[22] B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, and N. de Freitas, "Taking the human out of the loop: A review of Bayesian optimization," Proceedings of the IEEE, vol. 104, no. 1, pp. 148–175, 2016.

[23] C. Rasmussen and C. Williams, Gaussian Processes for Machine Learning, ser. Adaptive Computation and Machine Learning. Cambridge, MA, USA: MIT Press, Jan. 2006.

[24] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, "Scikit-learn: Machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.

[25] J. Riihimaki and A. Vehtari, "Gaussian processes with monotonicity information," Journal of Machine Learning Research, vol. 9, pp. 645–652, 2010.

[26] J. T. Kristensen, A. Valdivia, and P. Burelli, "Estimating player completion rate in mobile puzzle games using reinforcement learning," in 2020 IEEE Conference on Games (CoG), 2020, pp. 636–639.

[27] V. Volz, J. Schrum, J. Liu, S. M. Lucas, A. Smith, and S. Risi, "Evolving Mario levels in the latent space of a deep convolutional generative adversarial network," in Proceedings of the Genetic and Evolutionary Computation Conference, ser. GECCO '18. New York, NY, USA: Association for Computing Machinery, 2018, pp. 221–228. [Online]. Available: https://doi.org/10.1145/3205455.3205517

[28] R. Rodriguez Torrado, A. Khalifa, M. Cerny Green, N. Justesen, S. Risi, and J. Togelius, "Bootstrapping conditional GANs for video game level generation," in 2020 IEEE Conference on Games (CoG), 2020, pp. 41–48.

[29] M. C. Fontaine, R. Liu, A. Khalifa, J. Modi, J. Togelius, A. K. Hoover, and S. Nikolaidis, "Illuminating Mario scenes in the latent space of a generative adversarial network," 2020.

[30] A. Sarkar, Z. Yang, and S. Cooper, "Conditional level generation and game blending," in Proceedings of the Experimental AI in Games (EXAG) Workshop at AIIDE, 2020.