Conceptual Game Expansion


Matthew Guzdial (1), Mark O. Riedl (2)

(1) Department of Computing Science, University of Alberta, Edmonton, AB T6G 0T3, Canada
(2) School of Interactive Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA

emails: [email protected], [email protected]

Automated game design is the problem of automatically producing games through computational processes. Traditionally these methods have relied on the authoring of search spaces by a designer, defining the space of all possible games for the system to author. In this paper we instead learn representations of existing games and use these to approximate a search space of novel games. In a human subject study we demonstrate that these novel games are indistinguishable from human games for certain measures.

Index Terms—Computational and artificial intelligence, Machine learning, Procedural Content Generation, Knowledge representation, Pattern analysis, Electronic design methodology, Design tools

I. INTRODUCTION

Video game design and development requires a large amount of expert knowledge. This skill requirement serves as a barrier that restricts those who might most benefit from the ability to make games, such as educators, activists, and those from marginalized backgrounds. Researchers have suggested automated game design as a potential solution to this issue, in which computational systems produce games without major human intervention. The promise of automated game design is that it could democratize game design and allow for games currently infeasible given the resource requirements of modern game development. However, automated game design has relied upon encoding human design knowledge via authoring parameterized game design spaces, game components, or entire games for a system to remix. This authoring work requires expert knowledge and is time intensive to author and debug, which limits the applications of these automated game design systems [1].

Machine learning (ML) has recently shown great success at many tasks. One might consider applying machine learning to solve the automated game design knowledge authoring problem, learning the required knowledge from existing games instead of requiring human authoring. However, modern machine learning requires large datasets with thousands of instances [2] and often underperforms or fails when confronted with contexts outside its original training sets or knowledge bases. Thus, it is insufficient to naively apply machine learning to solve this problem.

As an alternative to traditional AI and ML approaches, we explore the application of combinational creativity to automated game design. Combinational creativity is a particular type of creativity employed when one recombines old knowledge to make something new, like combining the wings of a bird and the body of a horse for a flying horse, or Pegasus. In this way we can apply machine learning to learn representations of existing games and remix these representations to produce new games.

In this paper we draw on novel machine learning methods to derive models of level design and game mechanics, and combine them in a representation we call a game graph. We employ a combinational creativity algorithm called conceptual expansion to combine game graphs from three different games. These combinations represent novel game graphs, which can then be transformed into playable games. We evaluate these games in a human subject study comparing them to human-developed games and baseline combinational creativity-developed games, with results that support the appropriateness of this approach to automated game design.

II. RELATED WORK

In this section we summarize prior related work on combinational creativity, machine learning focused on games, and procedural content generation. Notably we focus on the most relevant prior work, and especially prior work that relates to our own in technique or subject matter.

A. Combinational Creativity

Combinational creativity algorithms attempt to approximate the human ability to recombine old knowledge to produce something new. There are two core parts of traditional combinational approaches: mapping and constructing the final combination. Mapping determines what parts of the old knowledge might be relevant in producing the new knowledge and what parts of the old knowledge might be combined. Mapping typically requires some knowledge base to allow these algorithms to reason over common-sense knowledge. How the final combination is constructed or chosen is the major difference among combinational creativity approaches.



Combinational creativity algorithms tend to have many possible valid outputs. This is typically viewed as undesirable, with general heuristics or constraints designed to pick a single correct combination from this set [3], [4]. This limits the potential output of these approaches; we instead employ a domain-specific heuristic to find an optimal combination from a space of possible combinations.

We identify three specific existing combinational creativity techniques for deeper investigation, because they are well-formed in domain-independent terms: conceptual blending, amalgams, and compositional adaptation. It is worth noting that the final combination and the process for combining are often referred to with the same words (e.g. conceptual blending produces conceptual blends). We employ these three approaches as baselines in the human subject study.

B. Conceptual Blending

Fauconnier and Turner [5] formalized the "four space" theory of conceptual blending. In this theory they describe four spaces that make up a blend: two input spaces represent the unblended elements, input space points (or features) are projected into a common generic space to identify equivalence, and these equivalent points are projected into a blend space. In the blend space, novel structure and patterns arise from the projection of equivalent points. Fauconnier and Turner [5], [6] argued this was a ubiquitous process, occurring in discourse, problem solving, and general meaning making.

C. Amalgamation

Ontañón designed amalgams as a formal unification function between multiple cases [4]. Similar to conceptual blending, amalgamation requires a knowledge base that specifies when two components of a case share a general form; for example, French and German can both share the more general form nationality. Unlike conceptual blending, this shared generalization does not lead to merging of components, but requires that only one of the two be present in a final amalgam. For example, a red French car and an old German car could lead to an old red French car or an old red German car.

D. Compositional Adaptation

Compositional adaptation arose as a case-based reasoning (CBR) adaptation approach [7], [8], but has found significant applications in adaptive software [9], [10]. The intuition behind compositional adaptation is that individual component cases can be broken apart into pieces and reconnected based on their connections. In adaptive software these are sets of functions with given inputs and outputs that can be strung together to achieve various effects, which makes compositional adaptation similar to planning or a grammar when it includes a goal state or output. However, it can also be applied in a goal-less way to generate sequences or graphs of components. Unlike amalgamation and conceptual blending, compositional adaptation does not require an explicit knowledge base by default. However, it is common to make use of a knowledge base to generalize components and their relationships in order to expand the set of possible combinations.

E. Procedural Content Generation

Procedural content generation (PCG) is the umbrella term for systems that take in some design knowledge and output new content (game assets) from this knowledge. Approaches include evolutionary search, rule-based systems, and instantiating content from probability tables [11], [12].

F. PCGML

This work fits into the field of procedural content generation via machine learning (PCGML) [13], which encompasses a range of techniques for generating content from models trained on prior content. Super Mario Bros. level generation has been the most consistent domain for this research, with researchers applying such techniques as Markov chains [14], Monte Carlo tree search [15], long short-term memory recurrent neural networks [16], autoencoders [17], generative adversarial networks [18], and genetic algorithms with learned evaluation functions [19]. Closest to our own work in terms of combining computational creativity and machine learning, Hoover et al. [20] adapt computational creativity techniques originally developed to generate music to create Super Mario Bros. levels.

The major differences between our research and other work in the field of PCGML are (1) the source of training data and (2) the focus on producing entirely new games. In terms of the first, by focusing on gameplay video as the source of training data we avoid dependence on existing corpora of game content, opening up a wider set of potential training data. Second, our work focuses on the creative generation of novel games, while most prior PCGML techniques attempt to recreate content in the same style as their training data.

G. Automated Game Design

In this section we cover examples of automated game design: PCG used to produce entire games.

Treanor et al. [21] introduced Game-o-matic, a system for automatically designing games to match certain arguments or micro-rhetorics [22]. This process created complete, if small, video games based on a procedure for transforming these arguments into games. Cook et al. produced the ANGELINA system for automated game design and development [23]. There have been a variety of ANGELINA system versions, each typically focusing on a particular game genre, with each employing grammars and genetic algorithms to create game structure and/or rules [24].

Nelson and Mateas [25] proposed a system to swap rules in and out of game representations to create new experiences. Gow and Corneli [26] proposed a similar system explicitly drawing on combinational creativity. Nielsen et al. [27] introduced an approach to mutate existing games expressed in the video game description language (VGDL) [28]. Nelson et al. [29] defined a novel parameterized space of games and randomly altered subsets of parameters to explore this design space.

More recent work has applied variational autoencoders to produce noisy replications of existing games, called World Models or Neural Renderings [30], [31]. By its nature this work does not attempt to create new games, but to recreate subsections of existing games for training automated game playing agents.

Osborn et al. [32] proposed automated game design learning, an approach that makes use of emulated games to learn a representation of the game's structure [33] and rules [34]. This approach is most similar to our work in terms of deriving a complete model of game structure and rules. However, this approach depends upon access to a game emulator and has no existing process for creating novel games.

III. GAME GRAPH REPRESENTATION

To generate full new video games we employ a novel representation we call a game graph, which encodes both level design information and game ruleset information. In this section we discuss how we learn the two constituent parts and construct a game graph for a given game. We will combine multiple game graphs via conceptual expansion to create new games.

The process for learning the input game graphs is as follows: we take as input gameplay video and a spritesheet. A spritesheet is a collection of all of the images or sprites in a game, including all background art, animation frames, and components of level structure. We run image processing on the video with the spritesheet to determine where and what sprites occur in each frame. Then, we learn a model of level design and a ruleset for the game from this input. We then use the models of level design and game ruleset to construct a game graph. In the following subsections we review the pertinent aspects of the techniques we use to learn models of level design and game rulesets, and how we create game graphs from the output of these techniques. We ran this process on three games to derive game graphs: Super Mario Bros., Mega Man, and Kirby's Adventure.

Figure 1: A visualization of the model (left) and basic building blocks from a subsection of the NES game Mega Man.

A. Level Design Learning

In this section we summarize our process for learning level design knowledge, which is covered in greater detail in [35], [36]. At a high level this technique derives a hierarchical graphical model or Bayesian network that represents probabilities over local level structure. We visualize the abstract model and two base components in Figure 1 for the game Mega Man. There are a total of five types of nodes in the network:

• G: Distribution over geometric shapes of sprite type t. In Figure 1 each box contains a G node value.

• D: Distribution over relative positions of sprite type t. In Figure 1 all the purple lines represent D node values.

• N: Distribution of numbers of sprite types in a particular level chunk. For example, in Figure 1 there are eleven spikes, three bars, one flying shell, etc.

• S: The first hidden value, on which G and D nodes depend. S is the distribution of sprite styles for sprite type t, in terms of the distribution of geometric shapes and relative positions; that is, categories of sprites. For example, in Figure 1 there are three bars, but they all have the same S node value.

• L: Distribution over level chunks.

L nodes are learned by a process of parsing gameplay video and iterative, hierarchical clustering. A probabilistic ordering of L nodes is learned to generate full levels. This ordering is represented as a directed graph we call a level graph, with probabilistic weights on its edges. These edges can be walked to determine a sequence of types of level chunks, which can then be filled in with the associated L node.
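To make this level graph walk concrete, here is a minimal sketch of sampling a sequence of level chunk types from a weighted directed graph. The graph contents and function names are our own illustration, not the authors' implementation.

```python
import random

# Hypothetical level graph: each chunk type maps to a list of
# (next_chunk_type, probability) edges learned from gameplay video.
level_graph = {
    "flat_ground": [("gap", 0.4), ("platforms", 0.6)],
    "gap":         [("flat_ground", 0.7), ("platforms", 0.3)],
    "platforms":   [("flat_ground", 0.5), ("gap", 0.2), ("end", 0.3)],
    "end":         [],  # no outgoing edges: the level ends here
}

def walk_level_graph(start="flat_ground", max_chunks=20):
    """Walk the weighted edges to produce a sequence of chunk types,
    which would then be filled in with the associated L nodes."""
    sequence = [start]
    current = start
    while level_graph[current] and len(sequence) < max_chunks:
        nodes, weights = zip(*level_graph[current])
        current = random.choices(nodes, weights=weights)[0]
        sequence.append(current)
    return sequence

print(walk_level_graph())  # e.g. ['flat_ground', 'platforms', 'end']
```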


Figure 2: A visualization of two pairs of frames and an associated engine modification.

B. Ruleset Learning

In this section we summarize the approach to learning rules from gameplay video, as originally described in [37]. This technique re-represents each gameplay video frame as a list of conditional facts that are true in that frame. The fact types are as follows:

• Animation: Each animation fact tracks a particular sprite seen in a frame by its name, width, and height.

• Spatial: Spatial facts track spatial information: the x and y locations of sprites on the screen.

• RelationshipX/RelationshipY: The RelationshipX and RelationshipY facts track the relative positions of sprites to one another in their respective dimensions.

• VelocityX/VelocityY: The VelocityX and VelocityY facts track the velocity of entities in their respective dimensions.

• CameraX: Tracks the camera's x position.
• CameraY: Tracks the camera's y position.

The algorithm iterates through pairs of frames, using its current (initially empty) ruleset to attempt to predict the next frame. When a prediction fails, it begins a process of iteratively updating the ruleset by adding, removing, and modifying rules to minimize the distance between the predicted and actual next frame. We visualize adding and later modifying a rule in Figure 2. The rules are constructed with conditions and effects, where conditions are a set of facts that must exist in one frame for the rule to fire, and effects are a pair of facts where the second fact replaces the first when the rule fires.
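As a rough illustration of this loop, the sketch below shows one way the outer frame-prediction cycle could be structured, under our own simplified assumptions about how rules, frames (sets of facts), and the distance function are represented; the actual engine search in [37] is considerably more involved.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    conditions: frozenset  # facts that must all hold in a frame to fire
    effect: tuple          # (pre_fact, post_fact): post replaces pre

def predict(frame, ruleset):
    """Apply every rule whose conditions hold to produce the next frame."""
    predicted = set(frame)
    for rule in ruleset:
        if rule.conditions <= frame:
            pre, post = rule.effect
            predicted.discard(pre)
            predicted.add(post)
    return predicted

def learn_rules(frames, distance, propose_updates, max_updates=100):
    """Iterate over frame pairs, revising the ruleset on prediction failure."""
    ruleset = frozenset()
    for current, actual_next in zip(frames, frames[1:]):
        for _ in range(max_updates):  # bound the per-pair revision search
            if distance(predict(current, ruleset), actual_next) == 0:
                break
            # propose_updates yields candidate rulesets (rules added, removed,
            # or modified); greedily keep whichever predicts most closely
            ruleset = min(propose_updates(ruleset, current, actual_next),
                          key=lambda rs: distance(predict(current, rs),
                                                  actual_next))
    return ruleset
```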

C. Game Graph Construction

The output of the level design model learning process is a probabilistic graphical model and a probabilistic sequence of nodes. The output of the ruleset learning process is a sequence of formal logic rules. We take knowledge from both of these to construct a game graph, which represents a minimal graphical representation for an entire game.

Figure 3: A subset of the game graph for one Waddle Doo enemy sprite from Kirby's Adventure and a relevant gameplay frame.

The construction of a game graph is straightforward. Each sprite in a spritesheet for a particular existing game becomes a node in an initially unconnected graph. Then all the information from the level design model and ruleset representations is added as edges on this graph. There are nine distinct types of edges that contain particular types of values:

• G: Stores the value of a G node from the level design model, represented as the x and y positions of the shape of sprites, the shape of sprites (represented as a matrix), and a unique identifier for the S and L nodes that this G node value depends on. This edge is cyclic, pointing to the same component it originated from. We note one might instead store this information as a value on the node itself, but treating it as an edge allows us to compare this and other cyclic edges to edges pointing between nodes.

• D: Stores the value of a D node connection, represented as a vector with the relative position, the probability, and a unique identifier for the S and L nodes that this D node value depends on. This edge points to the equivalent node for the sprite this D node connection connected to.

• N: Stores the value of an N node, which is a value representing a particular number of this component that can co-exist in the same level chunk and a unique identifier for its L node. This edge is cyclic.

• Rule condition: Stores the value of a particular fact in a particular rule condition, which includes the information discussed for the relevant fact type and a unique identifier for the rule it is part of. This edge can be cyclic or can point to another component (as with Relationship facts).

• Rule effect: Stores the value of a particular rule effect, which includes the information for both the pre and post facts and a unique identifier for its rule. This edge can be cyclic or can point to another component.

We note that a prior game graph representation did not include the level graph data: the probabilistic sequencing data that allows entire levels, rather than just chunks of levels, to be constructed [38]. We add a node for each L node and the following four edges in order to capture this information.

• Level Chunk Type: Simply stores the id of this level chunk category, which is an arbitrary but unique id generated during the level chunk categorization process. Notably this id is related to the information stored in D, G, and N edges to identify their associated L node. Thus generated L nodes can be associated with this level chunk sequence information when generating levels from game graphs. This edge is cyclic, pointing to the same component it originated from.

• Level Chunk Repeats: Stores a minimum and maximum number of times this level chunk type can repeat in a sequence. This edge type is cyclic.

• Level Chunk Position: Stores a float value specifying the average, normalized position this level chunk tends to be found in within a sequence of level chunks. This information is not used during level generation, but is a helpful feature for mapping similar level chunk category nodes. This edge is cyclic.

• Level Chunk Transition: Stores a pointer to another level chunk category node and the probability of taking that transition. Notably, this edge stores insufficient data to recreate the level graph. Instead it solves the level chunk sequence generation problem with a Markov chain-like representation, sampling from possible transitions probabilistically until it hits a node without any outgoing transitions. This edge points from this node to the associated level chunk category node.

We visualize a small subsection of a final game graph for the Waddle Doo enemy from Kirby's Adventure in Figure 3. The cyclic blue arrows represent rule effects that impact velocity (we do not include the full rule for visibility). The orange dotted arrows represent rule effects that impact animation state. The dashed purple arrows represent D node connection edges. The actual game graph is much larger, with dozens of nodes and more than a hundred thousand edges; it contains all information from the learned level design model and game rulesets. That is, a playable game can be reconstructed from this graph. This is a large, dense representation, with each of the three game graphs we constructed having hundreds of thousands of edges. The graph structure can be manipulated in order to create new games by adding, deleting, or altering the values on edges. Small changes allow for small variations (e.g. doubling Mario's jump height by tweaking the values of two edges, one for the starting y velocity of the jump and another for the start of the jump's deceleration) and larger changes allow for entirely distinct games.
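As a concrete picture of this representation, below is a minimal sketch of how the nodes and nine typed edges could be encoded. The field names and edge enumeration are our own illustration rather than the authors' data structures.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Any, Optional

class EdgeType(Enum):
    G = auto()
    D = auto()
    N = auto()
    RULE_CONDITION = auto()
    RULE_EFFECT = auto()
    LEVEL_CHUNK_TYPE = auto()
    LEVEL_CHUNK_REPEATS = auto()
    LEVEL_CHUNK_POSITION = auto()
    LEVEL_CHUNK_TRANSITION = auto()

@dataclass
class Edge:
    edge_type: EdgeType
    value: Any                    # e.g. a D vector, a rule fact, a float
    target: Optional[str] = None  # None marks a cyclic (self-pointing) edge

@dataclass
class Node:
    sprite_id: str                # one node per sprite (or per L node)
    is_player: bool = False
    edges: list = field(default_factory=list)

# A game graph is then a collection of such nodes, e.g. a cyclic N edge
# recording that up to three of this sprite co-exist in level chunk "L17":
goomba = Node("goomba")
goomba.edges.append(Edge(EdgeType.N, value=(3, "L17")))
```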

IV. CONCEPTUAL EXPANSION

The size, complexity, and lack of uniformity of the game graph representation make it ill-suited to generation approaches that rely on statistical machine learning. Instead we apply conceptual expansion to these game graphs. Conceptual expansion is a parameterized function that defines a space of possible output combinations. We define the conceptual expansion function as:

CE^X(F, A) = a_1 * f_1 + a_2 * f_2 + ... + a_n * f_n    (1)

where F = {f_1, ..., f_n} is the set of all mapped features and A = {a_1, ..., a_n} is a filter representing what of, and what amount of, each mapped feature f_i should be represented in the final conceptual expansion. X is the concept that we are attempting to represent with the conceptual expansion. In this paper the final CE^X value is some novel game graph node that is being created via the combination of existing game graph nodes F, with the A values informing the formula above on what to take from each mapped existing game graph node f.

As an example, imagine that we do not have Goombas (the basic Mario enemy seen in Figure 2) as part of an existing game graph. Imagine we are trying to recreate a Goomba node with conceptual expansion (CE^Goomba) and we have the Waddle Doo node seen in Figure 3 mapped to this Goomba node. In this case there may be some edges from this node we want to take for the Goomba node, such as the velocity rule effect to move left and fall, but not the ability to jump. One can encode this with an a_waddledoo that filters out jump (e.g. a_waddledoo = [1, 0, ...]). One can alter a_waddledoo to further specify that one wants the Goomba to move half as fast as the Waddle Doo. One might imagine other f and corresponding a values to incorporate other information from other game graph nodes for a final CE^Goomba(F, A) that reasonably approximates a true Goomba node. We demonstrated that conceptual expansion could be employed to more successfully recreate existing games than other standard automated game generation approaches in [38]. However, we had not yet determined how human players would react to games produced by conceptual expansion, which we explore in this paper.
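To ground Equation 1, here is a minimal sketch of a conceptual expansion over nodes whose edges carry numeric values. Real game graph edges are far richer, and the node contents below are invented purely for illustration.

```python
def conceptual_expansion(mapped_features, filters):
    """Combine mapped nodes f_i (each a dict of edge -> value),
    weighted per edge by the corresponding filter a_i."""
    expansion = {}
    for f, a in zip(mapped_features, filters):
        for edge, value in f.items():
            # an a entry of 0 drops the edge; fractions scale it
            expansion[edge] = expansion.get(edge, 0.0) + a.get(edge, 0.0) * value
    return expansion

# Taking the Waddle Doo's leftward movement at half speed but not its
# jump (edge names are purely illustrative):
waddle_doo = {"velocity_left": -2.0, "jump_velocity": 8.0}
a_waddledoo = {"velocity_left": 0.5, "jump_velocity": 0.0}
print(conceptual_expansion([waddle_doo], [a_waddledoo]))
# {'velocity_left': -1.0, 'jump_velocity': 0.0}
```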

At a high level, conceptual expansion creates a parameterized search space from an arbitrary number of input game graphs. One can think of each aspect of a game's level design or ruleset design as a dimension in a high-dimensional space.


Algorithm 1: Goal-Driven Conceptual Expansion Search

input: a partially-specified goal graph goalGraph, and a mapping m
output: the maximum expansion found according to the heuristic

1:  maxE ← ExpansionFromGoal(goalGraph) + m
2:  maxScore ← 0
3:  improving ← 0
4:  while improving < 10 do
5:      n ← maxE.GetNeighbor()
6:      s ← Heuristic(n, goalGraph)
7:      oldMax ← maxScore
8:      maxScore, maxE ← max([maxScore, maxE], [s, n])
9:      if oldMax < maxScore then
10:         improving ← 0
11:     else
12:         improving ← improving + 1
13: return maxE

Changing the value of a single dimension results in a new game design different from the original in some small way. The input game graphs provide a learned schematic for what dimensions exist and how they relate to each other. With two or more game graphs, one can describe new games as interpolations between existing games, extrapolations along different dimensions, or alterations in complexity. One can then search this space to optimize for a particular goal or heuristic, creating new games, and re-represent the resulting game graphs as level design models and rulesets to create new, playable games.

Conceptual expansion over game graphs has two steps. First, a mapping is determined, which gives us the initial a and f values for each expanded node. This is accomplished by determining the best n nodes in the knowledge base according to some distance function. For the second step we make use of a greedy hill-climbing approach we call goal-driven conceptual expansion search.

We include the pseudocode for the goal-driven conceptual expansion search algorithm in Algorithm 1. The input to this process is a partially-specified goal graph (goalGraph), which defines the basic structure of the initial conceptual expansion (e.g. the number of nodes and some subset of the edges on those nodes), and a mapping (m) derived as discussed above. On line one the function "ExpansionFromGoal" creates an initial set of nodes from the goalGraph, and for each expanded node fills in all its a and f values according to a normalized mapping in which the best mapped component has a values of 1.0, and the a values of the rest follow based upon their relative distance. We then begin a greedy search process with the operators from [38], searching ten randomly sampled neighbors at each step, and stopping when a local maximum is reached. We discuss our heuristic later in this section.
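Read as plain Python, the search might look like the following sketch, where expansion_from_goal, get_neighbor, and heuristic stand in for the paper's operators and are assumed rather than given:

```python
def goal_driven_expansion_search(goal_graph, mapping, expansion_from_goal,
                                 get_neighbor, heuristic):
    """Greedy hill climbing: sample neighbors of the best expansion found
    so far, stopping after 10 consecutive non-improving samples."""
    max_e = expansion_from_goal(goal_graph, mapping)
    max_score = 0.0
    stale = 0
    while stale < 10:
        neighbor = get_neighbor(max_e)  # a randomly sampled neighbor
        score = heuristic(neighbor, goal_graph)
        if score > max_score:
            max_score, max_e = score, neighbor
            stale = 0                   # improvement: reset the counter
        else:
            stale += 1
    return max_e
```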

Figure 4: The GrafxKid "Arcade Platformer Assets" spritesheet used in this paper.

V. SPRITESHEET-BASED GAME GENERATION

One issue at this point is how to construct new visuals for any generated games, as the given approach will output novel game graphs in which each entity is just a rectangle with a unique id [38]. It is common practice for artists to make spritesheets for games that do not exist. If we could alter the conceptual expansion generation of game graphs to target a particular spritesheet and make a game to match it, that would solve this issue. Further, this narrows the problem from creating any possible video game to creating games that match a particular spritesheet, making the evaluation much simpler.

In this section we cover the pipeline for the spritesheet-based conceptual expansion game generator: the construction of what we call a proto-game graph from a spritesheet, the mapping from the three existing game graphs onto that spritesheet, and the heuristic we designed to represent value, surprise, and novelty. We note that for the purposes of this paper we used the spritesheet in Figure 4.

A. Proto-Game Graph Construction

Just by looking at a spritesheet, a human designer can imagine possible games it could represent. It stands to reason that constructing a game graph-like representation from a spritesheet could prove helpful as an analogue to this process. Towards this end we developed a procedure to construct what we call a "proto-game graph" from the spritesheet. It is common practice for visually similar sprites in a particular spritesheet to be related in terms of game mechanics and/or level design. Given this intuition, we employ one round of single-linkage clustering on the individual sprites of a spritesheet in an attempt to automatically group visually similar sprites. First, we represent each sprite as a bag of all of its 3x3 pixel features, given that bag-of-features strategies tend to perform well on image processing tasks even compared to state-of-the-art methods, and we lack the training data for these methods [39]. We then construct a simple distance function represented as the size of the disjoint set of these features, normalized to the sprite sizes.

We treat each cluster as a single game graph node. This gives an initial, underspecified game graph that we call a proto-game graph. The proto-game graph technically has some mechanical (animation fact) and level design (G node) information, but it does not represent a playable game and could not be used to generate levels. However, by having it in the same representation as a true game graph, we can treat it as one for the purposes of constructing a mapping.
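A minimal sketch of this featurization and distance, assuming 2D grayscale sprite arrays, sets rather than true multisets of features, and our own normalization choice (dividing by the two sprites' combined feature counts):

```python
import numpy as np

def sprite_features(sprite):
    """Collect every 3x3 pixel patch of a 2D sprite array as a feature."""
    h, w = sprite.shape
    return {tuple(sprite[y:y + 3, x:x + 3].flatten())
            for y in range(h - 2) for x in range(w - 2)}

def sprite_distance(a, b):
    """Size of the disjoint (symmetric-difference) feature set,
    normalized to the sprite sizes."""
    fa, fb = sprite_features(a), sprite_features(b)
    return len(fa ^ fb) / (len(fa) + len(fb))
```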

We also identify the game graph nodes that should be used as the player of the game, represented as a simple boolean value on each node. This is necessary as the player is identified for the three existing game graphs. However, in order to minimize the potential for this choice to impact the creativity of the output, all the baselines (including human-designed games) were also informed what sprites to treat as the player.

B. Proto-Game Graph Mapping

The first step of any combinational creativity approach is to determine a mapping. In this case the nodes of the existing game graphs need to be mapped onto the nodes of the proto-game graph. To accomplish this, each node of all three existing game graphs is mapped onto the closest node in the proto-game graph. We use an asymmetrical Chamfer distance metric as in [38], which means that it did not matter that the proto-game graph nodes were under-specified. This distance function varies from [0, 1] and we only allow mappings where the distance between the two nodes is <1, ensuring they have some similarity. In the case that this mapping leaves some nodes in the proto-game graph without any existing game graph nodes mapped to them, which would leave them empty and mean that they had no chance to appear in any generated games, we map each of these empty proto-game graph nodes to its closest existing game graph node (flipping the direction of the distance function).
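Sketching this two-pass mapping in Python, with node_distance standing in for the asymmetric Chamfer distance from [38]:

```python
def build_mapping(existing_nodes, proto_nodes, node_distance):
    """node_distance(a, b) is the asymmetric distance from a to b in [0, 1]."""
    mapping = {proto: [] for proto in proto_nodes}
    # Pass 1: map each existing game graph node onto its closest proto node.
    for node in existing_nodes:
        closest = min(proto_nodes, key=lambda p: node_distance(node, p))
        if node_distance(node, closest) < 1.0:  # require some similarity
            mapping[closest].append(node)
    # Pass 2: backfill empty proto nodes, flipping the distance direction,
    # so every proto node has a chance to appear in generated games.
    for proto, mapped in mapping.items():
        if not mapped:
            mapped.append(min(existing_nodes,
                              key=lambda n: node_distance(proto, n)))
    return mapping
```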

We separately handle the L nodes, the nodes pertaining to the camera, and the concept of nothing ("None") used to represent empty game entities in the ruleset learning process, as these are not derivable from a spritesheet. We add nodes for the Camera and None entities to the proto-game graph, because all of the existing game graphs also have one of each of these specialized nodes. For the remaining unmapped existing game graph nodes we run K-medians clustering, with k estimated via the distortion ratio [40]. For the spritesheet proto-game graph used in this paper this led to five clusters of level chunk category nodes. We map each cluster to a single node, meaning a final five level chunk category nodes appeared for this spritesheet proto-game graph.

This mapping is used to derive an initial conceptual expansion for the conceptual expansion search process. We also used it as the mapping for the three combinational creativity baselines (amalgamation, conceptual blending, and compositional adaptation) in the human subject study described below.

C. Heuristic

Our goal for the output games was that they be considered creative, unique new games. Thus for the heuristic we drew on the three aspects of creativity identified by Boden [41]: novelty, surprise, and value.

For novelty we employ the minimum Chamfer distance comparing the current game graph to all existing game graphs in a knowledge base. The knowledge base in this case includes the three original existing game graphs and any generated game graphs created by the conceptual expansion game generator. Intuitively this can be understood as a measure of how dissimilar a particular game graph is compared to the most similar thing the system has seen before.

Surprise is difficult to represent computationally given that it relies on some audience's expectations being contradicted [41]. We also wanted a metric that would not simply correlate with novelty, but would represent a distinct measure. Thus we first built a representation of audience expectations by constructing what we call Normalized Mapping Magnitude Vectors. One of these vectors is constructed for each of the three original game graphs, by mapping each node of each game graph to its closest node in the two remaining graphs. From this we collected the number of times each of the remaining two graphs' nodes had been mapped, sorted these values, and then normalized them. This gives us three one-tailed distributions. These essentially reflect how a person seeing one of these games would relate it to the two games they had already seen, and thus this hypothetical person's expectations. To determine the surprise for a novel game graph, the system constructs the same distribution by finding the mapping counts from the current graph to the existing graphs (including any graphs produced by the system). We compare the distributions by cutting the tails so that they are of equal length, normalizing once again, and then directly measuring the distance between the two values at each position. Given that these are normalized vectors, the maximum difference between the two would be 2, and thus we divide this number by 2.0 to determine the surprise value compared to each existing game. We then take the minimum of these values as the final surprise value.
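A rough sketch of this surprise computation, under our reading of the construction above (the function names and normalization details are ours):

```python
import numpy as np

def magnitude_vector(mapping_counts):
    """Sorted, normalized vector of how often each node of another graph
    was mapped to: a Normalized Mapping Magnitude Vector."""
    v = np.sort(np.asarray(mapping_counts, dtype=float))[::-1]
    return v / v.sum()

def surprise(new_counts, existing_counts_per_game):
    """Minimum over existing games of the scaled distance between the new
    graph's vector and that game's expectation vector."""
    new_v = magnitude_vector(new_counts)
    scores = []
    for counts in existing_counts_per_game:
        old_v = magnitude_vector(counts)
        k = min(len(new_v), len(old_v))  # cut the tails to equal length
        a = new_v[:k] / new_v[:k].sum()  # renormalize after cutting
        b = old_v[:k] / old_v[:k].sum()
        scores.append(np.abs(a - b).sum() / 2.0)  # max difference is 2
    return min(scores)
```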

Value is the final creativity component to represent in this heuristic. Value in games is difficult to evaluate objectively [42], [43]. Given that the focus of this paper is not on this unsolved problem, we instead focused on only one aspect of a video game's value that is more easily represented computationally: challenge. Even so, automatically reflecting human experience across a level is another non-trivial problem and a current research area in terms of automated playtesting [44], [45], [46].

We decided on an A* agent simulation approach. Specifically, we looked at how an A* agent using the generated rules performed. However, instead of only using the coarse measure of whether or not the agent could complete a level chunk, we collect three summarizing statistics from the A* agent's search across a given level chunk: the max distance the A* agent traversed, normalized to the width of the level chunk (0 meaning the agent never moved, 1 meaning the agent made it to the end); the number of times the neighbors of a current node did not include the A* agent (the agent dying); and the number of times the A* agent fell below the level chunk's lowest point (falling off screen).

We collected these summarizing statistics across one hundred sampled level chunks from each of the original three game graphs. This allowed us to derive a distribution over these values for the original games. However, given the time cost of simulating an A* agent, we did not wish to simulate one hundred level chunks for each game graph during the search process, during which hundreds of game graphs would be evaluated. Instead we simulate only five level chunks and compare the median, first quartile, and third quartile of this five-sample distribution with those of the existing one-hundred-sample distributions in terms of the collected metrics. The goal for the system is to minimize the distance between this representation of value and the original games (as we assume the original games have appropriate challenge). Thus if the median, first quartile, and third quartile exactly match any of the three original existing games this returns 1.0, with the linear distance taken otherwise. Notably the value metric is the only metric that does not involve any other generated games previously output by the system, as there is no way to ensure the previously generated games are well-designed in terms of challenge.
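A condensed sketch of this comparison for a single collected statistic, assuming "linear distance" means mean absolute difference between the quartile summaries (both assumptions ours):

```python
import numpy as np

def quartile_summary(samples):
    """(Q1, median, Q3) of a batch of A* agent statistics."""
    return np.percentile(np.asarray(samples, dtype=float), [25, 50, 75])

def value_score(candidate_samples, original_samples_per_game):
    """1.0 on an exact quartile match with any original game,
    otherwise 1.0 minus the smallest linear distance."""
    cand = quartile_summary(candidate_samples)  # five-sample distribution
    distances = [np.abs(cand - quartile_summary(orig)).mean()
                 for orig in original_samples_per_game]
    return max(0.0, 1.0 - min(distances))
```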

Taken together, the maximum heuristic value is 3.0, with a maximum of 1.0 coming from each of the novelty, surprise, and value metrics. An ideal game would then be nothing like any of the original input games or previously output games, while still being roughly as challenging.

VI. HUMAN SUBJECT STUDY

In this section we discuss the human subject study that we designed to evaluate this spritesheet-based, conceptual expansion game generator. We first discuss the nine games created for the study in terms of the three types of game generators (human game designers, our conceptual expansion approach, and the three existing combinational creativity approaches). The first screen of each game is visualized in Figure 5, where each row represents one of these three types, in order. The games are discussed in more detail below. After the games, we discuss the process participants went through in the study.

Figure 5: The first screen of all nine games in the human subject study.

A. Human Games

We contacted seven human game designers to make games for this study. We asked these designers to create a game under the same constraints as the automated approaches, in terms of using the same sprites and producing a platformer game. Three designers agreed to take part, each producing one game. Due to our involvement in this process, a separate video game playing expert without knowledge of this study outside of these contracts verified that the games were sufficient and met our required constraints. Notably, we did not see the human-made games until the study launched. We highlight each designer and their game below, in the order in which the games were completed. Information about the game designers and games was collected after the designers submitted their games and had them approved by the third party.

Kise (K): The first game designer goes by Kise and described herself as a hobbyist game developer. She developed her game in javascript, and took inspiration from the game Super Meat Boy. The player retains acceleration in the x-dimension, leading to a "slidey" feeling. The game is playable at http://guzdial.com/StudyGames/0/.

Chris DeLeon (CD): The second game designer was Chris DeLeon. He described himself as an "online educator". His javascript game is much more of a puzzle game, with the goal being to collect all of the coins (some hidden in blocks) without falling off the screen and without a blue creature touching you. One can find more from Chris DeLeon at ChrisDeLeon.com. The game is playable at http://guzdial.com/StudyGames/1/.

Trenton Pegeas (TP): The third game designer was Trenton Pegeas. When asked to describe himself he stated he would "like to classify as indie, but closer to hobbyist". He created his game in Unity. He also created by far the most complex game, both in terms of creating the largest level and including the most sprites and entities. The player follows a fairly linear path winding through the level from their starting position, interacting with different entities until they make their way to a goal. One can find more from Trenton Pegeas at https://twitter.com/Xist3nce_Dev. The game is playable at http://guzdial.com/StudyGames/2/.

B. Conceptual Expansion Games

We generated three conceptual expansion games (shortened to "expansion games") for this study. After each conceptual expansion game graph was generated, it was added to the knowledge base of existing games for the system, which impacted the novelty and surprise portions of the heuristic. Thus the second game had the first game in its knowledge base, and the third game had both the first and second games in its knowledge base. Below we briefly describe all three games in the order they were generated. The expansion games outperformed the combinational creativity baseline games described below in terms of the final heuristic values for the novelty metric, which is surprising given that they had additional points of comparison. This is likely due to the larger output space of conceptual expansion.

Expansion Game 1 (E1): The first expansion game generated by the system, and therefore notably the only one based solely on the existing games, is not much of a platformer. Instead the system produced something like a surreal driving or flying game, with the player moving constantly to the right and needing to avoid almost all in-game entities, since the player dies if it touches them. The game is playable at http://guzdial.com/StudyGames/3/.

Expansion Game 2 (E2): The second expansion game, which included the prior game in its knowledge base for the novelty and surprise measures, was also not much of a platformer. If the player moved left or right they would slowly go in that direction while gently falling along the y-axis. If the player went up they shot up quickly, meaning that the player has to balance going up to a set rhythm, somewhat like the game Flappy Bird. If the player goes too high or too low they die, and if they stay still they die. There is nothing to dodge in this game, meaning that once the player gets the rhythm of hitting the up arrow or space bar it is a simple game. The game is playable at http://guzdial.com/StudyGames/4/.

Expansion Game 3 (E3): The third expansion game, which included the prior two games in its knowledge base for the novelty and surprise measures, was even more like Flappy Bird. The player normally falls at a constant speed. The avatar moves slowly left or right when the player hits the left or right arrow. If the player hits the up arrow or space bar the avatar shoots up, with the inverse happening if the player hits the down arrow. The game is largely the same as the second game except for the more surreal level architecture and an area of "anti-gravity" (as dubbed by a study participant) whenever the player is above or below a patch of heart sprites. The player stretches when they go up or down, but nothing animates otherwise. The game is playable at http://guzdial.com/StudyGames/5/.

C. Baseline Games

To determine the extent to which conceptual expansion was better suited to this task than existing combinational creativity approaches, we included one game made by each of the three combinational creativity approaches highlighted in the related work (amalgamation, conceptual blending, and compositional adaptation).

In all cases we employed a process similar to conceptual expansion search, only instead of sampling from the space of possible conceptual expansions we sampled from the particular output spaces of each approach. The only change we made to these existing approaches was in allowing for more than two inputs, but given that all three approaches employed the same mapping as the conceptual expansion, we largely side-stepped this issue. We chose to include a single game from each approach, as each approach has historically been designed to produce only one output for every input. We briefly highlight each game below.

Amalgamation (A): The amalgamation process meant that whole existing game graph nodes were used for each node of the proto-game graph. This led to perhaps the second most platformer-like of the generated games, with the player largely having the physics of Kirby from Kirby's Adventure, though slightly broken due to some nodes from the Kirby game graph not being included. Because of the Kirby-esque physics it is almost impossible to lose this game. The game is playable at http://guzdial.com/StudyGames/6/.

Conceptual Blending (B): The conceptual blend combined as much of the mapped existing game graph nodes as possible for each node of the proto-game graph. This led to a game strikingly similar to the second and third expansion games, except with the notable difference that at the end of every frame the player transformed into a crab arm sprite. Thanks to a serendipitous mapping that led to treating a crab sprite as ground, this resulted in a surprisingly semantically coherent (if strange) game. The game is playable at http://guzdial.com/StudyGames/7/.

Compositional Adaptation (C): The compositional adaptation approach led to a game that we believe was the most like a standard platformer, given that it was essentially a smaller version of the amalgam game. The game again used the system's understanding of Kirby physics, but without the amalgamation requirement to incorporate as much information as possible it is more streamlined. The game is essentially a series of small, very simple platforming challenges. The only strange thing was the random patches of numbers floating in the air, which had had decorative elements from other games mapped onto them. The game is playable at http://guzdial.com/StudyGames/8/.

D. Human Subject Study Method

The study was advertised through social media, in particular Twitter, Facebook, Discord, and Reddit. Participants were directed to a landing page where they were asked to review a consent form, and then press a button to retrieve an ID, the links to the three games they would play, and a link to the post-study survey. Each player played one human game, one expansion game, and one baseline game in a random order. Players were asked to give each game "a few tries or until you feel you understand it". After this the player took part in the post-study survey.

The participants were asked to rank the games across the same set of experiential features employed in [36]: fun, frustration, challenge, surprise, "most liked", and "most creative". Once the participants finished the ranking questions they were asked to indicate which of the games they felt was made by a human and which by an AI. Participants had been informed that "some" of the games were made by humans and "some" by AI, but not which was which. We note this was made somewhat easier given that all the AI-generated games were made in Unity, while two of the three human games were made in javascript. Following these questions participants were asked a series of demographic questions concerning how often they played games, how often they played platformer games, how familiar they were with video game development, how familiar they were with artificial intelligence, their gender, and their age.

E. Human Subject Study Results

One hundred and fifty-six participants took part in this online study. Of these, one hundred and seventeen identified as male, twenty-seven as female, six as non-binary, and the remaining six did not answer the gender question. Eighty-eight of the participants placed themselves in the 23-33 age range, forty-one in the 18-22 age range, and twenty-five in the 34-55 age range. There was a strong bias towards participants familiar with video game design and development, with one hundred and eight participants ranking their familiarity as three or more on a five-point scale. Similarly, one hundred and twelve of the participants played games at least weekly, though only eighty-two played platformer games at least monthly. These results suggest that this population should have been well-suited to evaluating video games. There was also a strong bias towards the participants being AI experts, with eighty-one selecting four or more out of five, and an additional thirty-two selecting three. Thus these participants should have been well-suited to evaluating AI-generated video games, though perhaps too skilled at differentiating between human and AI-generated games.

Our principal hypothesis was that the conceptual expansion games would do better in terms of value and surprise compared to the baseline games, with the human games considered a gold standard. Notably we do not include novelty, as we have an explicit, objective measure for novelty, while both the value and surprise metrics depend on approximations of human perception. We can break this into the following hypotheses for the purposes of interpreting the results.

H1. The expansion games will be more similar to the human games in terms of challenge, in comparison to the baseline games.

H2. The expansion games will be overall the most surprising.

H3. The expansion games will be more like the human games in terms of the other attributes of game value.

H4. The expansion games will be more creative than the baseline games.

Hypothesis H1 essentially mirrors the measure we designed for value: creating games that are challenging in the ways that the existing, human-designed games are challenging. H2 in turn mirrors the measure we designed for surprise, which aims to be as or more surprising compared to the human-designed games. H3 is meant to determine the importance of the choice of heuristic when it comes to conceptual expansions: whether the approach carries over aspects of the input data not represented in the heuristic (e.g. all of the input games are fun, but the heuristic does not attempt to measure that). H4 is then meant to determine whether conceptual expansion on its own reflects the human creative process better than the existing combinational creativity approaches. In the following sections we break down different aspects of the results in terms of these hypotheses.


Table I: Median experiential ranking across the different games according to author type.

             Human   Expansion   Baseline
Fun          1st     3rd         2nd
Frustrating  2nd     2nd         2nd
Challenging  1st     2nd         3rd
Surprising   3rd     1st         2nd
Liked        1st     3rd         2nd
Creative     2nd     2nd         2nd

Participants had no issue differentiating between the human and artificially generated games, outside of a few outliers. We therefore do not include these results in our analysis.

F. Quantitative Results By Author Type

The first step was to determine the aggregate results of people's experience with the games, broken into the three types we identified above (human, expansion, baseline). One can consider this as separating the games into different groups according to the nature of their source or author, as long as one treats the baseline combinational creativity approaches as part of the same class of approach, as in [47]. By bucketing the games into one of three author types, there were essentially six possible conditions participants could be placed in, based upon the order in which they interacted with games from these three types. Because participants were assigned randomly, the one hundred and fifty-six participants were not evenly spread across these six conditions. The smallest number assigned was to the category in which participants played a human game first, an expansion game second, and a baseline game third, with only nineteen participants. Thus we randomly selected nineteen participants from each of the six categories, and use these one hundred and fourteen results for the analysis in this section.

We summarize the results according to the median value of each ranking in Table I. We report median values as the rankings are ordinal but not necessarily numeric. We discuss the results more fully below and give the results of a number of statistical tests. Notably, due to the number of hypotheses and associated tests, we use a significance threshold of p < 0.01. For all comparisons we ran the paired Wilcoxon Mann Whitney U test, given that these are non-normal distributions.

Hypothesis H1 posits that the expansion games should be ranked more like the human challenge rankings than the baseline challenge rankings. The median values in Table I give some evidence for this. Further, there are significant differences between the human and baseline challenge rankings (p = 9.62e-06), and between the expansion and baseline challenge rankings (p = 7.61e-09). However, we are unable to reject that the human and expansion challenge rankings come from the same underlying distribution (p = 0.041). H1 is thus supported by these results.

H2 suggests that the expansion games should be more surprising than either of the other two types of games. The median ranking being first, and therefore most surprising, for the expansion games lends credence to this. Statistically, we found that the expansion games were ranked significantly higher in terms of surprise than both the human (p = 1.95e-13) and baseline (p = 4.83e-06) rankings. Further, the baseline games were ranked as more surprising than the human games (p = 3.81e-09). H2 is thus supported. These results further suggest that combinational creativity approaches may generally produce more surprising results than human-designed games for this particular problem. Automated approaches can produce output unlike that which a human is likely to produce; even so, the conceptual expansion approach performed the best.

H3 suggested that the expansion games would be ranked more like the human games for other elements of game value, which we investigate in terms of the fun, frustrating, and most liked results. In terms of fun the median rankings do not support this hypothesis, with expansion ranked third more frequently than the baseline games. However, the statistical test was unable to reject the null hypothesis that the baseline and expansion fun rankings came from the same distribution (p = 0.39). The human rankings were significantly higher than both the baseline (p = 1.03e-08) and expansion rankings (p = 1.23e-09). For the frustrating question all of the games' median rankings were exactly the same, all of them having a median rank of second. However, the statistical tests allowed us to compare the distributions, with the expansion games ranked as significantly more frustrating overall in comparison to both the human (p = 1.61e-03) and baseline (p = 1.84e-04) rankings. The human and baseline frustrating rankings were too similar to reject that they arose from the same distribution (p = 0.25). Finally, the most liked results parallel the fun results, with the human games ranked significantly higher than the baseline (p = 1.24e-09) and expansion (p = 1.32e-09) rankings, but with the baseline and expansion rankings unable to reject the null hypothesis that they arose from the same distribution (p = 0.66). This evidence does not support H3, with the expansion games only found to be about as fun and liked as the baseline games, and more frustrating than those games.

H4 states that the expansion games will be viewed as more creative than the baseline games. However, as with frustration, the median ranking of all of these games suggests that the participants evaluated the games equivalently on this measure. Unlike frustration, though, the statistical tests are unable to pick apart these results, with no comparison able to reject that they derive from the same distribution (p > 0.7). Thus H4 is not supported by these results.

G. Quantitative Results By Game

There are several reasons why taking an aggregate view of the games may not be ideal. First, in the case of both the human and baseline games there are three very different authors, with the human games created by three different humans and the baseline games created by three distinct combinational creativity approaches. Second, the conceptual expansion games build off one another in terms of surprise and novelty, which cannot be captured in aggregate. Third, humans were ranking individual games, not the sources of these games.

For this section we instead analyze the results in terms of each game individually. Considered this way there are 27 distinct conditions, corresponding to a selection of one human, one expansion, and one baseline game (3 × 3 × 3 = 27), without taking into account the ordering of these games. Given the random assignment, the results are too skewed to run statistical tests over these conditions (for example, only two participants interacted with Kise's game (K), Expansion game 2 (E2), and the amalgamation game (A)). Thus we instead treat these rankings as ratings, given that a multi-way ANOVA cannot find any significant impact from ordering on any of the experiential factors (p > 0.01). In this case we determine the game with the fewest results (the blend game, with only forty-three participants), and randomly select this number of results for each of the nine games, for a total of three hundred and eighty-seven ratings per experiential measure (compare this to the four hundred and sixty-eight total ratings available). We employ these ratings for the analysis in this section; due to the number of comparisons we use the same conservative significance threshold of p < 0.01, and we use the unpaired Wilcoxon-Mann-Whitney U test. We present the results in Table II in terms of median rating per game.
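The per-game procedure can be sketched in the same way, under the same caveats (hypothetical column names, not the study's code): downsample every game's ratings to the size of the smallest group, then compare two games with the unpaired Mann-Whitney U test.

```python
import pandas as pd
from scipy import stats

def per_game_ratings(df: pd.DataFrame, seed: int = 0) -> pd.DataFrame:
    """Downsample so every game has as many ratings as the smallest one.

    Assumes one row per (participant, game) pair with `game` and
    `rating` columns, where `rating` is the 1-3 rank treated as a rating.
    """
    n_min = df.groupby("game").size().min()   # 43 in the study's data
    return (df.groupby("game", group_keys=False)
              .apply(lambda g: g.sample(n=n_min, random_state=seed)))

def compare_games(df: pd.DataFrame, a: str, b: str) -> float:
    """p-value of the unpaired Mann-Whitney U test between two games."""
    x = df.loc[df["game"] == a, "rating"]
    y = df.loc[df["game"] == b, "rating"]
    return stats.mannwhitneyu(x, y).pvalue

# Example: compare_games(per_game_ratings(df), "E1", "B") < 0.01
```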

The ratings give more nuanced results in terms of hypothesis H1. Hypothesis H1 states that the expansion games (E1, E2, and E3) should be rated more like the human games (K, CD, and TP) than the baseline games (A, B, and C) in terms of challenge. For E2 and E3 there was no significant difference between their challenge ratings and the challenge ratings of any of the human games, but there were significant differences between their challenge ratings and those of the amalgamation (A) and blend (B) games. However, E2 and E3's challenge ratings did not differ significantly from the challenge ratings of the composition game (C). E1's challenge ratings only differed significantly from Kise's game (K); they were too similar to the challenge ratings of the other games to differ significantly. Thus H1 is supported.

Hypothesis H2 suggests that the expansion games (E1, E2, and E3) should be rated more surprising than either the human games or baseline games. E2 and E3 both had significantly higher surprise ratings than all but the amalgamation (A) and blend (B) games. However, while not significant according to the unpaired Wilcoxon-Mann-Whitney U test, their median surprise ratings were still higher than those of these two games, as demonstrated in Table II. Comparatively, E1 only had significantly higher surprise ratings than the three human games; there was no significant difference between E1's surprise ratings and the surprise ratings of any of the baseline games. This follows from the way the surprise portion of the heuristic functioned, which would have pushed E2 and E3 to be considered more surprising than the expansion games output before them. Thus H2 is supported.

Taken in aggregate, the results in the prior section did not support H3, which stated that the expansion games would be evaluated more like the human games than the baseline games in terms of other measures of game value outside of challenge. Looking at the results by game, the picture becomes more nuanced.

In terms of the fun ratings, while the games largely follow their aggregate rankings, E1 stands out as an outlier from the other two expansion games. In fact E1 is found to be rated significantly more fun compared to E2 (p = 4.22e−08), E3 (p = 3.601e−05), and the blend game (p = 2.49e). However, we cannot reject that this rating distribution comes from the same distribution as the amalgam (p = 0.03) and composition (p = 0.27) games. Similarly, there is no significant difference between E1 and CD (p = 0.16), though both of the remaining human games are rated significantly more fun (p < 0.001). This suggests that E1 is more fun than the blend game and roughly as fun as Chris DeLeon's game (CD), the amalgam game (A), and the composition game (C).

For the frustration ratings E2 and E3 stand out as clear outliers from the remaining games, being rated as the most frustrating the majority of the time. These games are clearly the cause of the expansion games being found significantly more frustrating in aggregate compared to the other types of games, which individual tests confirm (p < 0.01). However, there is no significant difference comparing the E1 frustration ratings to those of any of the other games. This suggests E1 is about as frustrating as all of the non-expansion games.

Table II: Median ratings across each measure for each game.

              K   CD  TP  E1  E2  E3  A   B   C
Fun           1   1   1   2   3   3   2   2   2
Frustrating   2   2   3   2   1   1   2   2   2
Challenging   1   2   2   2   2   2   3   3   2
Surprising    3   3   3   2   1   1   2   2   2
Liked         1   1   1   2   3   2   2   3   2
Creative      2   2   2   2   2   2   2   2   2

For the liked rating, both E1 and E3's ratings differ from the aggregate expansion games' median rating of 3. While E2 is rated significantly lower than all other games on this measure (p < 0.01), both E1 and E3 are not. E1 is rated significantly lower on this measure than Trenton Pegeas' game (TP), and significantly higher than the blend game, with no significant difference from any of the other non-expansion liked ratings. E3, on the other hand, is rated significantly lower than all three human games, with no other significant differences when comparing any of the other liked rating distributions.
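As an aside on reproducibility, a median table like the one above can be derived in one line from long-form ratings; the DataFrame below is a hypothetical stand-in for the study's data.

```python
import pandas as pd

# Hypothetical long-form ratings: one row per (participant, game, measure).
ratings = pd.DataFrame({
    "game":    ["K", "K", "E1", "E1"],
    "measure": ["Fun", "Liked", "Fun", "Liked"],
    "rating":  [1, 1, 2, 2],
})

# Rows are measures, columns are games, cells are median ratings.
medians = ratings.pivot_table(index="measure", columns="game",
                              values="rating", aggfunc="median")
print(medians)
```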

Overall these results suggest more support for H3, given that at least E1 seemed to be more like at least one of the human games in terms of measures of game value. This points to a potential issue with the heuristic, which overly biased games towards surprise and novelty at the cost of other positive elements of the input games that impacted these experiential features. The effect would have been compounded for E2 and compounded again for E3, given the changing knowledge bases associated with these games that impacted the novelty and surprise measures. We discuss this further in the next section.

Given H4, we would expect to find the expansion games rated as more creative compared to the other games. However, breaking apart the games again leads to equal median values, pointing to the same issue identified in [36]: untrained participants appear to have difficulty consistently evaluating creativity. The underlying distributions are not all identical, however. In particular, both E3 and the amalgam game are rated significantly more creative than Kise's game (p < 0.01), but they do not differ significantly from any of the other games' ratings. Thus we can only consider there to be mixed support for H4.

H. Discussion

The results suggest strong support for H1 and H2: the games output by the conceptual expansion game generator closely relate to human-made games in terms of challenge, and are more surprising than human-made games or other combinational creativity games. Further, the output conceptual expansion games differed from the input games more than the output of existing combinational creativity approaches. This seems to follow from our choice of heuristic, which involved very specific measures for value, surprise, and novelty. Notably, all the combinational creativity approaches used the same heuristic to search their spaces of potential output, but the space of conceptual expansions seemed to have more options that scored highly on these measures. This had both positive and negative consequences. While the expansion games outperformed the baselines in terms of challenge and all games in terms of surprise, the latter two expansion games (E2 and E3) seemed to gain surprise at the expense of other measures of game value besides challenge (specifically being less fun, more frustrating, and less liked). This could in part be due to the nature of the heuristic, where novelty and surprise made up the majority of the heuristic value, or due to the heuristic only focusing on challenge as a representation of value. Either way, future applications of conceptual expansion should likely avoid this compounding effect in order to retain positive features of the inputs.
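To make this concern concrete, one hypothetical mitigation is to gate novelty and surprise by value when scoring candidate expansions, so that a candidate cannot score highly purely by being strange. The weighting scheme below is our own illustrative sketch, not the heuristic used in this work; value, novelty, and surprise stand in for whatever normalized measures a given system uses.

```python
def gated_heuristic(value: float, novelty: float, surprise: float,
                    w_value: float = 0.5, w_novelty: float = 0.25,
                    w_surprise: float = 0.25) -> float:
    """Hypothetical re-weighted heuristic; all inputs assumed in [0, 1].

    Novelty and surprise are multiplied by value, so low-value but
    highly surprising candidates score poorly, targeting the
    compounding effect discussed above.
    """
    return w_value * value + value * (w_novelty * novelty
                                      + w_surprise * surprise)

# A low-value, high-surprise candidate no longer wins:
print(gated_heuristic(value=0.2, novelty=0.9, surprise=0.9))  # ≈ 0.19
print(gated_heuristic(value=0.8, novelty=0.5, surprise=0.5))  # ≈ 0.60
```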

VII. CONCLUSIONS

In this paper we presented an approach employing conceptual expansion over game graphs for automated game generation. These results indicate the importance of heuristics or other types of content selection strategies when it comes to conceptual expansion. Conceptual expansion outperformed existing combinational approaches in terms of the particular heuristic we employed, creating games with challenge like that of human-designed games when we rewarded that, and individual outputs outperformed certain human games on general measures of game quality. These results indicate the appropriateness of conceptual expansion for automated game design.

REFERENCES

[1] M. Cook, S. Colton, and J. Gow, “The ANGELINA videogame design system, part II,” IEEE Transactions on Computational Intelligence and AI in Games, vol. 9, no. 3, pp. 254–266, 2016.

[2] F. Pereira, P. Norvig, and A. Halevy, “The unreasonable effectiveness of data,” IEEE Intelligent Systems, vol. 24, pp. 8–12, 2009.

[3] G. Fauconnier, “Conceptual blending and analogy,” The Analogical Mind: Perspectives from Cognitive Science, pp. 255–286, 2001.

[4] S. Ontañón and E. Plaza, “Amalgams: A formal approach for combining multiple case solutions,” in Case-Based Reasoning. Research and Development. Springer, 2010, pp. 257–271.

[5] G. Fauconnier and M. Turner, “Conceptual integration networks,” Cognitive Science, vol. 22, no. 2, pp. 133–187, 1998.

[6] ——, The Way We Think: Conceptual Blending and the Mind's Hidden Complexities. Basic Books, 2002.

[7] J. H. Holland, Induction: Processes of Inference, Learning, and Discovery. MIT Press, 1989.

[8] J. Fox and S. Clarke, “Exploring approaches to dynamic adaptation,” in Proceedings of the 3rd International DisCoTec Workshop on Middleware-Application Interaction. ACM, 2009, pp. 19–24.

[9] P. K. McKinley, S. M. Sadjadi, E. P. Kasten, and B. H. Cheng, “A taxonomy of compositional adaptation,” Technical Report MSU-CSE-04-17, 2004.

[10] S. Eisenbach, C. Sadler, and D. Wong, “Component adaptation in contemporary execution environments,” in Distributed Applications and Interoperable Systems. Springer, 2007, pp. 90–103.

[11] M. Hendrikx, S. Meijer, J. V. D. Velden, and A. Iosup, “Procedural content generation for games: A survey,” ACM Transactions on Multimedia Computing, Communications, and Applications, vol. 9, no. 1, pp. 1:1–1:22, Feb. 2013.

[12] J. Togelius, G. Yannakakis, K. Stanley, and C. Browne, “Search-based procedural content generation: A taxonomy and survey,” IEEE Transactions on Computational Intelligence and AI in Games, vol. 3, no. 3, pp. 172–186, 2011.

[13] A. Summerville, S. Snodgrass, M. Guzdial, C. Holmgård, A. K. Hoover, A. Isaksen, A. Nealen, and J. Togelius, “Procedural content generation via machine learning (PCGML),” arXiv preprint arXiv:1702.00539, 2017.

[14] S. Snodgrass and S. Ontañón, “Learning to generate video game maps using Markov models,” IEEE Transactions on Computational Intelligence and AI in Games, 2016.

[15] A. J. Summerville, S. Philip, and M. Mateas, “MCMCTS PCG 4 SMB: Monte Carlo tree search to guide platformer level generation,” in Eleventh Artificial Intelligence and Interactive Digital Entertainment Conference. AIIDE, 2015.

[16] A. Summerville and M. Mateas, “Super Mario as a string: Platformer level generation via LSTMs,” in Proceedings of the First International Conference of DiGRA and FDG, 2016.

[17] R. Jain, A. Isaksen, C. Holmgård, and J. Togelius, “Autoencoders for level generation, repair, and recognition,” in ICCC Workshop on Computational Creativity and Games, vol. 306, 2016.

[18] V. Volz, J. Schrum, J. Liu, S. M. Lucas, A. Smith, and S. Risi, “Evolving Mario levels in the latent space of a deep convolutional generative adversarial network,” in Proceedings of the Genetic and Evolutionary Computation Conference. ACM, 2018, pp. 221–228.

[19] S. Dahlskog and J. Togelius, “A multi-level level generator,” in Computational Intelligence and Games (CIG), 2014 IEEE Conference on. IEEE, Aug. 2014, pp. 1–8.

[20] A. K. Hoover, J. Togelius, and G. N. Yannakakis, “Composing video game levels with music metaphors through functional scaffolding,” in First Computational Creativity and Games Workshop. ACC, 2015.

[21] M. Treanor, B. Blackford, M. Mateas, and I. Bogost, “Game-O-Matic: Generating videogames that represent ideas,” in Proceedings of the 3rd FDG Workshop on Procedural Content Generation, 2012.

[22] M. Treanor, B. Schweizer, I. Bogost, and M. Mateas, “The micro-rhetorics of Game-O-Matic,” in Seventh International Conference on the Foundations of Digital Games, 2012.

[23] M. Cook, S. Colton, and J. Gow, “The ANGELINA videogame design system, part I,” IEEE Transactions on Computational Intelligence and AI in Games, 2017.

[24] M. Cook, S. Colton, A. Raad, and J. Gow, “Mechanic Miner: Reflection-driven game mechanic discovery and level design,” in European Conference on the Applications of Evolutionary Computation. Springer, 2013, pp. 284–293.

[25] M. J. Nelson and M. Mateas, “Recombinable game mechanics for automated design support,” in Fourth Artificial Intelligence and Interactive Digital Entertainment Conference, 2008.

[26] J. Gow and J. Corneli, “Towards generating novel games using conceptual blending,” in Proceedings of the 11th AIIDE Conference, 2015.

[27] T. S. Nielsen, G. A. Barros, J. Togelius, and M. J. Nelson, “General video game evaluation using relative algorithm performance profiles,” in European Conference on the Applications of Evolutionary Computation. Springer, 2015, pp. 369–380.

[28] T. Schaul, “A video game description language for model-based or interactive learning,” in Computational Intelligence in Games, 2013, pp. 1–8.

[29] M. Nelson, S. Colton, E. Powley, S. Gaudl, P. Ivey, R. Saunders, B. Perez Ferrer, and M. Cook, “Mixed-initiative approaches to on-device mobile game design,” in CHI Workshop on Mixed-Initiative Creative Interfaces, 2016.

[30] D. Ha and J. Schmidhuber, “World models,” arXiv preprint arXiv:1803.10122, 2018.

[31] S. A. Eslami, D. J. Rezende, F. Besse, F. Viola, A. S. Morcos, M. Garnelo, A. Ruderman, A. A. Rusu, I. Danihelka, K. Gregor et al., “Neural scene representation and rendering,” Science, vol. 360, no. 6394, pp. 1204–1210, 2018.

[32] J. C. Osborn, A. Summerville, and M. Mateas, “Automated game design learning,” in Computational Intelligence and Games, 2017, pp. 240–247.

[33] J. Osborn, A. Summerville, and M. Mateas, “Automatic mapping of NES games with Mappy,” in Proceedings of the 12th International Conference on the Foundations of Digital Games. ACM, 2017, p. 78.

[34] A. Summerville, J. Osborn, and M. Mateas, “CHARDA: Causal hybrid automata recovery via dynamic analysis,” 26th International Joint Conference on Artificial Intelligence, 2017.

[35] M. Guzdial and M. Riedl, “Toward game level generation from gameplay videos,” arXiv preprint arXiv:1602.07721, 2016.

[36] ——, “Game level generation from gameplay videos,” in Twelfth Artificial Intelligence and Interactive Digital Entertainment Conference, 2016.

[37] M. Guzdial, B. Li, and M. Riedl, “Game engine learning from video,” in Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017.

[38] M. Guzdial and M. Riedl, “Automated game design via conceptual expansion,” in Fourteenth Artificial Intelligence and Interactive Digital Entertainment Conference, 2018.

[39] W. Brendel and M. Bethge, “Approximating CNNs with bag-of-local-features models works surprisingly well on ImageNet,” arXiv preprint arXiv:1904.00760, 2019.

[40] D. T. Pham, S. S. Dimov, and C. D. Nguyen, “Selection of k in k-means clustering,” Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science, vol. 219, no. 1, pp. 103–119, 2005. [Online]. Available: http://pic.sagepub.com/content/219/1/103.abstract

[41] M. A. Boden, “Creativity and artificial intelligence,” Artificial Intelligence, vol. 103, no. 1-2, pp. 347–356, 1998.

[42] P. Sweetser and P. Wyeth, “GameFlow: A model for evaluating player enjoyment in games,” Computers in Entertainment (CIE), vol. 3, no. 3, pp. 3–3, 2005.

[43] R. Bernhaupt, M. Eckschlager, and M. Tscheligi, “Methods for evaluating games: How to measure usability and user experience in games?” in Proceedings of the International Conference on Advances in Computer Entertainment Technology. ACM, 2007, pp. 309–310.

[44] A. Zook and M. O. Riedl, “Automatic game design via mechanic generation,” in AAAI, 2014, pp. 530–537.

[45] A. Zook, B. Harrison, and M. O. Riedl, “Monte-Carlo tree search for simulation-based strategy analysis,” in Proceedings of the 10th Conference on the Foundations of Digital Games, 2015.

[46] C. Holmgård, M. C. Green, A. Liapis, and J. Togelius, “Automated playtesting with procedural personas through MCTS with evolved heuristics,” arXiv preprint arXiv:1802.06881, 2018.

[47] M. Guzdial and M. O. Riedl, “Combinatorial meta search,” in Proceedings of the 1st Knowledge Extraction from Games Workshop, 2018.