Enhancing Student Models in Game-based Learning

with Facial Expression Recognition

Robert Sawyer Department of Computer Science

North Carolina State University

[email protected]

Andy Smith Department of Computer Science

North Carolina State University

[email protected]

Jonathan Rowe Department of Computer Science

North Carolina State University

[email protected]

Roger Azevedo Department of Psychology

North Carolina State University

[email protected]

James Lester Department of Computer Science

North Carolina State University

[email protected]

ABSTRACT

Recent years have seen a growing recognition of the role that

affect plays in learning. Because game-based learning

environments elicit a wide range of student affective states,

affect-enhanced student modeling for game-based learning holds

considerable promise. This paper introduces an affect-enhanced

student modeling framework that leverages facial expression

tracking for game-based learning. The affect-enhanced student

modeling framework was used to generate predictive models of

student learning and student engagement for students who

interacted with CRYSTAL ISLAND, a game-based learning

environment for microbiology education. Findings from the study

reveal that the affect-enhanced student models significantly

outperform baseline predictive student models that utilize the

same gameplay traces but do not use facial expression tracking.

The study also found that models based on individual facial action

coding units are more effective than composite emotion models.

The findings suggest that introducing facial expression tracking

can improve the accuracy of student models, both for predicting

student learning gains and also for predicting student

engagement.

KEYWORDS

Student Modeling, Affect, Game-based Learning

1 INTRODUCTION

Affect plays a central role in learning [3, 8, 9, 48]. Positive

emotions, such as curiosity and joy, can lead students to engage

more deeply in learning, while emotions such as boredom and

frustration can lead students to resort to unproductive behaviors

including hint abuse and gaming the system, or even to

completely disengage [37, 48]. Students exhibit a range

of emotions when interacting with adaptive learning

environments [13]. Building on recent advances in our

understanding of affect, which stem from both manually coded

protocols [15, 35] and automated techniques [7, 10, 36], affect-

enhanced student models that track and dynamically respond to

affect on a moment-by-moment basis offer significant potential

for enhancing adaptive learning environments.

While student modeling has long been a goal of the user

modeling community [19, 33], student modeling work to date has

focused primarily on cognitive aspects of learning [1]. Recent

student modeling work has begun to consider how to augment

student models with affect [12], as well as to tailor support based

on students’ affective states, such as confusion, frustration, and

boredom [14, 17]. Models of affect have also been used to predict

students’ reactions to virtual agents [25, 29] and better understand

student motivation and off-task behavior [40].

Game-based learning has been studied in a broad range of

subject matters and student populations [11, 20, 42, 49]. Because

game-based learning environments are designed to provide

students with engaging learning experiences, they offer a rich

“laboratory” for investigating affect-enhanced student

models. This paper reports on an investigation of affect-enhanced

student modeling for game-based learning. Specifically, using a

game-based learning environment for microbiology education,

CRYSTAL ISLAND [38], we compare two approaches to student

modeling: an affect-enhanced student model that uses facial

expression tracking in addition to gameplay data, and a baseline

student model that uses only gameplay data. We compare the

accuracy of the models for predicting both student learning

(measured with normalized learning gain) and student presence

(a facet of engagement measured with participant perception of

transportation into a virtual environment through a standard self-

report instrument [4]). In addition, we compare two competing

approaches to affect recognition—composite emotion models and

models based on individual facial action coding units—to explore

how different types of facial expression tracking can support

affect-enhanced student modeling.


2 RELATED WORK

Learning and affect are inextricably linked. It has been found, for

example, that positively valenced emotions tend to support

student learning [37]. Confusion has been shown to be beneficial

to learning in certain contexts [16], yet students experiencing

anxiety or anger tend to be less efficient learners [37]. Student

affect correlates with problem-solving performance [39] and is

predictive of user interactions with virtual agents [25].

Recognizing the potential for affect to inform student modeling

in learning environments, recent work with automated affect

detectors has found that integrating detectors into Bayesian

knowledge tracing models can improve predictive accuracy of

student performance [12].

Although there are many open questions about which

emotions or sets of emotions offer the greatest potential for

adaptive learning environments, as well as which sets of

multimodal data streams can best support adaptivity in learning

environments, facial expression tracking has emerged as a

promising means for inferring learning-centric emotions [10, 28].

Arroyo et al. [2] used facial expressions in combination with other

sensors to trigger interventions, which appeared to increase on-

task behavior and reduce hint-abuse. Bosch et al. [7] created

models that accurately predict student affect with facial action

units in a classroom setting that generalized across multiple days

and multiple classrooms. Grafsgaard et al. [24] found an additive

relationship between facial expressions, dialogue acts, and task

performance in a tutoring system, and more recent work in this

line of investigation found that male and female students exhibit

significant differences in facial expressions [43]. Collectively,

this work suggests that facial expressions provide a window into

affect and learning that can be utilized by student models to

support adaptive learning environments.

Upon detecting student affective states, adaptive learning

environments can deliver a broad range of possible supports for

student learning and emotion regulation. For example, D’Mello

et al. [14, 17] devised a modified version of AutoTutor that

delivered affect-sensitive dialogue moves, facial displays of

emotion, and synthesized speech patterns in response to student

affect. Results indicated that the affect-sensitive interventions had

a positive impact on student learning among low prior knowledge

students, but a neutral or even harmful impact on student learning

in other conditions. DeFalco et al. [18] found that self-efficacy-

oriented feedback messages, which were delivered in response to

detected learner frustration, helped to enhance student learning in

a simulation-based training environment for military combat

medics. However, the feedback messages were not found to

mitigate occurrences of frustration itself during training.

Although student modeling has been investigated for decades

in the intelligent tutoring systems community [44], only recently

has student modeling research turned to game-based

learning. Baker and Clarke-Midura [6] analyzed game-trace logs in the Virtual

Performance Assessments system to create assessment models of

students’ ability to design controlled experiments and apply

inquiry skills. Using evidence-centered design, Bayesian

network-based stealth assessments of student knowledge have

been explored in both the Physics Playground and a modified

version of the popular game, Plants vs. Zombies [26, 41], and

other work in this vein has explored deep learning-based models

of stealth assessment for game-based learning [31]. Student goal

recognition has also been investigated in game-based learning,

with approaches spanning Bayesian networks [34], Markov logic

networks [5], and long short-term memory networks [32]. The

affect-enhanced student modeling approach presented here

extends earlier work in student modeling for game-based learning

to exploit facial expression data to improve student models’

predictive accuracy.

Figure 1. Screenshots from CRYSTAL ISLAND illustrating a virtual textbook, non-player characters, the virtual

scanner, and game environment.


3 CRYSTAL ISLAND

We explore affect-enhanced student modeling in a study

conducted with an intelligent game-based learning environment

for science problem solving in microbiology, CRYSTAL

ISLAND [38]. Intelligent game-based learning environments

integrate the personalized learning supports of intelligent tutoring

systems [27, 45] with the engaging interactions afforded by game

technologies. In CRYSTAL ISLAND, students take on the role of a

medical field agent tasked with investigating an epidemic on a

remote island research station. The student must discover the

source and identity of the disease as well as recommend a

treatment plan to aid the infected members of the island’s research

team.

In CRYSTAL ISLAND, students engage in a range of problem-

solving actions to gather information, test hypotheses, and

diagnose a solution. Examples of student activities in the game

include exploring the island environment, conversing with non-

player characters, running tests on potentially contaminated items

in a virtual laboratory, and completing an in-game diagnosis

worksheet to report findings and solve the mystery. The open-

world game environment features several buildings, including an

infirmary, dining hall, laboratory, and various residences where

the student can gather information by talking to characters.

Relevant microbiology concepts about viruses, bacteria,

immunization, and how diseases spread are found in virtual books

and articles scattered throughout the buildings of the island.

Students use knowledge gained from both the informational texts

and dialogue with non-player characters to diagnose the illness

and solve the mystery. Students successfully complete the game

by submitting a worksheet to the camp nurse with the correct

diagnosis, identified transmission object, and treatment solution.

4 METHODS AND DATA

To investigate the potential contributions of affect to student

modeling, we compare two approaches. First, we created an

affect-enhanced student model that uses facial expression

tracking in combination with gameplay. As students played

CRYSTAL ISLAND, video was captured of their facial expressions.

We created predictive student models that used both the facial

expression data stream and the gameplay data stream to predict

students’ learning and students’ sense of presence, a core

component of engagement. We then created a baseline student

model that also predicts students’ learning and presence but only

uses the gameplay data stream. We compare the performance of

these two models and explore the impact of two approaches to

facial expression tracking for affect-enhanced student modeling:

(1) composite emotion models and (2) models based on individual

facial action coding units.

4.1 Participants and Experimental Setup

The study involved 37 college-age students who interacted with the CRYSTAL ISLAND game-based learning environment in a lab setting. Four students were removed due to partial or missing data, resulting in 33 students (M = 19.9 years old, SD = 1.30), of which 17 (51.5%) were female. All remaining students played the game to completion, and time spent playing the game ranged from 26.4 to 105.1 minutes (M = 63.7, SD = 18.4). Prior to interacting with the game, students completed a 20-question multiple-choice test assessing their conceptual and application-based understanding of microbiology. At the conclusion of gameplay, students completed a Presence Questionnaire [46] and again completed the same microbiology assessment.

4.2 Survey Measures

In order to assess student learning, we examined students’

performance on the microbiology content test administered

before and after students’ interactions with CRYSTAL ISLAND.

Using the difference between the pre-test score (Pre-Test) and

post-test score (Post-Test), Normalized Learning Gain was

calculated for each student participating in the study. Normalized

Learning Gain is the difference between Post-Test and Pre-Test,

standardized by the total amount of improvement or decline

possible from the Pre-Test. Students achieved positive

Normalized Learning Gain on average (M = 0.226, SD = 0.315),

with 25 of the 33 (75.8%) students achieving positive Normalized

Learning Gain (Post-Test greater than Pre-Test).

\[
\text{Normalized Learning Gain} =
\begin{cases}
\dfrac{\text{Post} - \text{Pre}}{1 - \text{Pre}}, & \text{Post} > \text{Pre} \\[6pt]
\dfrac{\text{Post} - \text{Pre}}{\text{Pre}}, & \text{Post} \le \text{Pre}
\end{cases}
\]
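A minimal sketch of this computation (the helper is ours; it assumes test scores normalized to [0, 1] with 0 < Pre < 1):

```python
def normalized_learning_gain(pre: float, post: float) -> float:
    """Normalized Learning Gain for test scores scaled to [0, 1].

    Gains are standardized by the maximum possible improvement
    (1 - pre); declines by the maximum possible decline (pre).
    """
    if post > pre:
        return (post - pre) / (1 - pre)
    return (post - pre) / pre

# Example: 11/20 correct on the pre-test, 16/20 on the post-test.
print(normalized_learning_gain(0.55, 0.80))  # 0.555...
```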

We use presence as a proxy for engagement in the game

environment. In order to assess student presence, the Presence

Questionnaire was used to measure players’ self-reported

perceptions of presence [47], which refers to a participant’s

perception of transportation into a virtual environment. The

questionnaire contains a series of 30 questions that students

answer on a 7-point scale to characterize their in-game experience

with a virtual environment. The questionnaire consists of several

subscales for measuring presence, such as involvement, sensory

fidelity, adaptation/immersion, and interface quality. To measure

presence, we summed the numeric responses to all 30 questions

to obtain a presence score for each student, which ranged from

116 to 209 (M = 155, SD = 23.4).

4.3 Facial Expression Recognition

Facial expression features were extracted automatically through iMotions, a video-based facial expression tracking system previously available commercially as FACET and as the research-focused toolbox CERT [30]. The iMotions system extracts facial features that correspond to the Facial Action Coding System (FACS) [21, 22]. It uses an

objective three-phase framework of facial detection from image

input, feature detection (i.e., locating position of facial

landmarks), and feature classification to report an evidence score

representing the likelihood of a particular affective measure being

present. A separate classifier is used for each affective measure,

ranging from low-level facial expression features such as Action

Units (AUs) to more complex, composite affective measures,

such as Surprise and Joy. We used the facial expression tracking

system to monitor students during gameplay to provide real-time


evidence scores for 20 AUs and 9 composite affective measures.

The 9 composite affective measures include Anger, Surprise,

Frustration, Joy, Confusion, Fear, Disgust, Sadness, and

Contempt. Table 1 displays the relationships between composite

affective measures and AUs as defined by iMotions.

Table 1. Breakdown of Composites by contributing AUs.

Composite Measure    Contributing AUs
Joy                  6 + 12
Sadness              1 + 4 + 15
Surprise             1 + 2 + 5 + 26
Fear                 2 + 4 + 5 + 7 + 20 + 26
Anger                4 + 5 + 7 + 23
Disgust              9 + 15 + 16
Contempt             12 + 14
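Table 1's mapping can be written as a simple lookup, which also makes visible one source of the multicollinearity discussed in Section 5: several composites share contributing AUs. A sketch (Frustration and Confusion are not broken down in Table 1 and are omitted):

```python
from itertools import combinations

# Composite affective measures and their contributing FACS Action
# Units, from Table 1 (Frustration and Confusion are not listed there).
COMPOSITE_AUS = {
    "Joy":      {6, 12},
    "Sadness":  {1, 4, 15},
    "Surprise": {1, 2, 5, 26},
    "Fear":     {2, 4, 5, 7, 20, 26},
    "Anger":    {4, 5, 7, 23},
    "Disgust":  {9, 15, 16},
    "Contempt": {12, 14},
}

# AUs shared between composites, e.g., AU1 contributes to both
# Sadness and Surprise.
for a, b in combinations(COMPOSITE_AUS, 2):
    shared = COMPOSITE_AUS[a] & COMPOSITE_AUS[b]
    if shared:
        print(a, b, sorted(shared))
```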

Predictive models require higher-level features than the raw evidence scores provide. We performed feature engineering to transform the evidence scores into a representation that could be used effectively by the student models, applying relative thresholding by amplitude and summing over durations to minimize the effect of micro-expressions. First, the evidence scores were standardized for each student by subtracting the mean evidence score and dividing by the standard deviation of the evidence score over the student's entire episode. While iMotions calibrates the evidence scores over the first few observations, it does not account for the potential variability of evidence across students; dividing by the standard deviation of a student's evidence score over the episode accounts for this variability in the expressiveness of individuals. Events were added to the affect log for the duration that the standardized evidence scores rose above a threshold of one. These events represent moments when evidence scores rose one standard deviation above the mean evidence score for a particular student. The duration of these events in each affective measure (i.e., the 20 AUs and 9 Composites) was summed over a student's gameplay episode. The final feature used in student modeling is this sum (i.e., the duration for which the evidence score was more than one standard deviation above the mean) divided by the total time the student spent playing the game. These affective features thus represent the proportion of gameplay during which the respective affective evidence score was one standard deviation above the mean for a particular student.
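A sketch of this feature construction, assuming per-student evidence scores sampled at a fixed interval and stored in a pandas DataFrame (the schema and names here are our own illustration, not the iMotions API):

```python
import pandas as pd

def affect_features(evidence: pd.DataFrame, dt: float,
                    total_time: float) -> pd.Series:
    """Proportion of gameplay during which each affective evidence
    score was more than one SD above the student's own mean.

    evidence:   one row per sampled video frame, one column per
                affective measure (20 AUs + 9 composites), for a
                single student's episode (hypothetical schema).
    dt:         sampling interval in seconds.
    total_time: total gameplay duration in seconds.
    """
    # Standardize each measure per student over the whole episode.
    z = (evidence - evidence.mean()) / evidence.std()
    # Flag samples more than one SD above the mean, sum their
    # duration, and normalize by total play time.
    return (z > 1.0).sum() * dt / total_time
```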

4.4 Gameplay Behavior Logging

Gameplay interactions with CRYSTAL ISLAND were recorded in a

granular timestamped game log for each student. From these

game interaction logs, several measures were calculated to

summarize student behavior in the game-based learning

environment. The actions recorded include how students gather

information (e.g., reading books and articles, speaking with non-

player characters), select and organize information (e.g., editing

the diagnosis worksheet), and test their hypotheses (e.g., scanning

objects, submitting the worksheet as the final solution). The

measures used as features in predictive modeling include both the

number of actions performed and the duration of actions with

varying lengths. The set of actions reported includes

conversations with non-player characters (ConversationCount

and ConversationDuration), items scanned in the virtual

laboratory (ScannerCount), books and articles read

(BooksReadCount and ReadingDuration), edits to the diagnosis

worksheet (WorksheetCount and WorksheetDuration), and the

number of times the worksheet was submitted for final evaluation

(SubmitCount). These measures were also standardized by the

total time the student spent playing the game to represent rate-

per-minute for counts and proportion of time spent performing

each of the actions over a given duration (Table 2). The units for

counts are shown in counts-per-minute, while the units for

durations are in seconds-per-minute, representing a proportion

out of 60.

Table 2. Means and standard deviations of gameplay behaviors.

Gameplay Behavior       Mean (per minute)   Standard Deviation
ConversationCount       0.797               0.216
BooksReadCount          0.352               0.119
WorksheetCount          0.352               0.159
SubmitCount             0.029               0.017
ScannerCount            0.389               0.223
ConversationDuration    7.900               2.070
ReadingDuration         24.100              5.280
WorksheetDuration       5.150               1.970
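The counts and durations in Table 2 could be derived from a timestamped log along the following lines (a sketch that assumes a simple event schema we invent for illustration; the actual CRYSTAL ISLAND log format is richer):

```python
import pandas as pd

def gameplay_features(log: pd.DataFrame, total_minutes: float) -> dict:
    """Per-minute counts and durations from a timestamped game log.

    log: one row per action, with columns 'action' (one of
         'Conversation', 'Read', 'WorksheetEdit', 'Scan', 'Submit';
         a hypothetical schema), 'start', and 'end' (seconds).
    """
    counts = log["action"].value_counts()
    seconds = (log["end"] - log["start"]).groupby(log["action"]).sum()
    count_names = {"ConversationCount": "Conversation",
                   "BooksReadCount": "Read",
                   "WorksheetCount": "WorksheetEdit",
                   "SubmitCount": "Submit",
                   "ScannerCount": "Scan"}
    duration_names = {"ConversationDuration": "Conversation",
                      "ReadingDuration": "Read",
                      "WorksheetDuration": "WorksheetEdit"}
    feats = {f: counts.get(a, 0) / total_minutes
             for f, a in count_names.items()}
    # Durations come out in seconds per minute of gameplay,
    # i.e., a proportion out of 60 (as in Table 2).
    feats.update({f: seconds.get(a, 0.0) / total_minutes
                  for f, a in duration_names.items()})
    return feats
```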

5 RESULTS

We used three different subsets of the full feature set in order to

compare predictive student models of Normalized Learning Gain

and Presence. The first feature subset (Gameplay) consisted of

only the 8 gameplay features (actions and durations of select

actions) to represent a baseline model without affect information.

Figure 2. Examples of AUs used: AU-7 Lid Tightener, AU-12 Lip Corner Puller, AU-14 Dimpler, AU-15 Lip Corner Depressor.

The second feature subset consisted of the gameplay actions and the 9 different composite affective features

(Gameplay+Composites). The third feature subset consisted of

the gameplay actions with the 20 different AU affective features

(Gameplay+AUs). The composite features and AUs are treated

separately due to multicollinearity issues stemming from the fact

that composites are combinations of the AUs. The AUs provide a

more granular measure while the composites provide a higher

level, more interpretable measure of affect. We compare the

performance of each feature subset predicting Normalized

Learning Gain and Presence for a total of 6 models (3 feature

subsets x 2 response variables), with 2 being baseline models

using only gameplay features, and the other 4 being affect-

enhanced models of student learning and presence.

Variables for a linear model were chosen using a forward

stepwise linear regression where features were selected from their

respective subset to optimize leave-one-out cross-validation

(LOOCV) R2. The reported models are the result of using the

selected features in an ordinary least squares linear regression

model.
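A sketch of this selection procedure with scikit-learn (helper names are ours): the LOOCV R2 is computed from the pooled held-out predictions, and features are added greedily while they improve it.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

def loocv_r2(X: pd.DataFrame, y: pd.Series) -> float:
    """R^2 computed from held-out leave-one-out predictions."""
    preds = cross_val_predict(LinearRegression(), X, y, cv=LeaveOneOut())
    return r2_score(y, preds)

def forward_stepwise(X: pd.DataFrame, y: pd.Series) -> list:
    """Greedily add the feature that most improves LOOCV R^2."""
    selected, best = [], -np.inf
    improved = True
    while improved:
        improved, best_feature = False, None
        for f in X.columns.difference(selected):
            score = loocv_r2(X[selected + [f]], y)
            if score > best:
                best, best_feature, improved = score, f, True
        if improved:
            selected.append(best_feature)
    return selected
```

The reported models then refit an ordinary least squares regression on the selected features.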

5.1 Baseline Models

Baseline predictive models for Normalized Learning Gain and Presence were created by applying the stepwise feature selection process to the subset of 8 gameplay features. In addition to standard linear regression measures (coefficients, standard errors of coefficients, standardized coefficients, and the t-statistic for the significance of each feature), the leave-one-out cross-validation R2 is reported as the measure of the accuracy of each model.

The highest performing Normalized Learning Gain model using gameplay features under leave-one-out cross-validation R2 is shown in Table 3. This linear model uses two gameplay features

to predict Normalized Learning Gain: ScannerCount and

ReadingDuration. The model achieves a leave-one-out cross

validation R2 of 0.268. The negative coefficient of ScannerCount

indicates that the more often students scan items in the virtual

laboratory, the lower Normalized Learning Gain they achieve.

Conversely, the longer students read, the higher their Normalized

Learning Gain, an intuitive result since much of the microbiology

assessment focuses on content found in the game-based learning

environment’s virtual books and articles.

Table 3. Resulting linear model predicting Normalized Learning Gain from gameplay features.

Baseline Model for Normalized Learning Gain
Feature            B         SE B      β         p-value
ScannerCount       -0.318    0.242     -0.224    0.198
ReadingDuration    0.0279    0.0102    0.466     0.010*
Leave-One-Out CV R2 = 0.268
Note: ** p < 0.01, * p < 0.05

The optimal Presence model using gameplay features under

leave-one-out cross validation R2 yields ConversationCount and

WorksheetCount as the only two gameplay features used to

predict Presence. This model achieves a leave-one-out cross

validation R2 of 0.147. The negative coefficients for the selected features suggest that more conversations with non-player characters and more edits of the worksheet lead to lower Presence outcomes. The LOOCV R2 for the Presence model is lower than that of the gameplay-only model for Normalized Learning Gain, revealing that, relative to each response's total sum of squares, gameplay-only predictions are worse for Presence than for Normalized Learning Gain. This indicates that Presence is more challenging to predict from gameplay features than Normalized Learning Gain is.

Table 4. Resulting linear model for predicting Presence from gameplay features.

Baseline Model for Presence
Feature              B        SE B     β         p-value
ConversationCount    -31.9    17.5     -0.294    0.078
WorksheetCount       -52.6    23.6     -0.359    0.034*
Leave-One-Out CV R2 = 0.147
Note: ** p < 0.01, * p < 0.05

5.2 Affect-Enhanced Composite Models

Next, we explored the potential of affect-enhanced predictive

models for Normalized Learning Gain and Presence by

introducing the 9 composite affective measures with the previous

8 gameplay features denoted Affect-Enhanced Composite

models. From this set of 17 features, we performed the same

forward stepwise feature selection technique optimizing for

LOOCV R2 and report the resulting models.

Table 5. Resulting linear model for predicting Normalized Learning Gain from gameplay and composite-affect features.

Affect-Enhanced Composite Model for Normalized Learning Gain
Feature            B          SE B      β         p-value
ScannerCount       -0.369     0.237     -0.261    0.130
ReadingDuration    0.0212     0.0106    0.364     0.049*
Surprise           -0.0183    0.0112    -0.246    0.111
Leave-One-Out CV R2 = 0.324
Note: ** p < 0.01, * p < 0.05

The optimal Affect-Enhanced Composite model for predicting

Normalized Learning Gain uses three features and achieves a

LOOCV R2 of 0.324. Of the three features selected from a

potential 17, two are gameplay features (ScannerCount and

ReadingDuration), and one is a Composite feature (Surprise). It

is important to note that these two gameplay features are the same


for the Baseline model for Normalized Learning Gain, indicating

that adding Surprise to that model improves the predictive

accuracy for Normalized Learning Gain from an LOOCV R2 of

0.268 to 0.324. The coefficients of the two gameplay features are also similar to those of the Baseline model, indicating that incorporating Surprise improves the model without drastically changing the original parameter estimates.

The negative coefficient of Surprise suggests that, holding other

variables constant, more surprise leads to reduced Normalized

Learning Gain.

The optimal Affect-Enhanced Composite model for Presence achieves a LOOCV R2 of 0.245, improving on the Baseline LOOCV R2 of 0.147 (Table 6). This model uses five features: the same two gameplay features (ConversationCount and WorksheetCount) as the Baseline model and three features from the set of Composite measures (Disgust, Confusion, and Sadness). The negative coefficients of Disgust and Confusion indicate that students who spent a greater proportion of time with elevated standardized evidence scores for these composite emotions exhibited lower Presence. Conversely, students with a higher proportion of expressed Sadness reported higher Presence. This Affect-Enhanced Composite model also indicates an additive effect in the sense that the same gameplay features were maintained while the predictive accuracy improved by incorporating several Composite features.

Table 6. Resulting linear model for predicting Presence from gameplay and composite-affect features.

Affect-Enhanced Composite Model for Presence
Feature              B        SE B     β         p-value
ConversationCount    -39.3    16.4     -0.362    0.024*
WorksheetCount       -64.0    21.6     -0.437    0.006**
Disgust              -4.56    1.95     -0.369    0.027*
Confusion            -2.20    1.29     -0.325    0.099
Sadness              3.44     1.20     0.560     0.008**
Leave-One-Out CV R2 = 0.245
Note: ** p < 0.01, * p < 0.05

5.3 Affect-Enhanced AU Models

While the Affect-Enhanced Composite models improve upon the

Baseline models, the Action Units (AUs) measured by iMotions

provide low-level feature evidence scores that offer a more

granular level of facial expression features than the Composite

measures. Creating predictive models from this more granular

measure of affect may produce a more accurate affect-enhanced

model at the cost of interpretability since the AUs constitute a

lower-level facial expression feature than the Composite

measures. There are 20 AUs measured by iMotions, yielding a

total feature set of 28 when including the gameplay features that

can potentially be used in modeling students’ normalized learning

gain and presence.

Using the forward stepwise feature selection optimizing

LOOCV R2 for predicting Normalized Learning Gain from the

feature set of 20 AUs and 8 Gameplay features generates a model

with six features and a LOOCV R2 of 0.475. The two gameplay

features selected (ScannerCount and ReadingDuration) are the

same as the Affect-Enhanced Composite model and Baseline

model for Normalized Learning Gain. The other four features

were AUs: AU1 Inner Brow Raiser, AU14 Dimpler, AU15 Lip Corner Depressor, and AU43 Eyes Closed. Because AU1 Inner Brow Raiser partially composes the Surprise measure (as indicated by Table 1), and Surprise was included in the Affect-Enhanced Composite Normalized Learning Gain model, its inclusion shows the similarity between the two affect-enhanced models (Table 7). The coefficients for both Gameplay

features remain similar to both the Affect-Enhanced Composite

Normalized Learning Gain model and the Baseline Normalized

Learning Gain model. The addition of the AUs into this linear

model provides a greater LOOCV R2 than either the baseline

Normalized Learning Gain model or Affect-Enhanced Composite

Normalized Learning Gain model. Thus, the Affect-Enhanced

AU Normalized Learning Gain model exemplifies the additive

value of more granular affective measures in the predictive

models for Normalized Learning Gain, compared to the Affect-

Enhanced Composite Normalized Learning Gain model.

Table 7. Resulting linear model for predicting Normalized Learning Gain from gameplay and AU-affect features.

Affect-Enhanced AU Model for Normalized Learning Gain
Feature            B          SE B      β         p-value
ScannerCount       -0.294     0.203     -0.207    0.159
ReadingDuration    0.0353     0.0087    0.590     < 0.001**
AU1                -0.0290    0.0085    -0.413    0.002**
AU14               -0.0548    0.0157    -0.534    0.002**
AU15               0.0470     0.0186    0.390     0.018*
AU43               -0.0340    0.0163    -0.265    0.048*
Leave-One-Out CV R2 = 0.475
Note: ** p < 0.01, * p < 0.05

An Affect-Enhanced AU Presence model was created using

the feature set of 20 AUs and 8 gameplay features and the same

feature selection techniques as the previous models. This resulted

in five selected features, the same two gameplay features from the

previous Presence models (ConversationCount and

WorksheetCount) along with three AUs. The included AUs were

AU7 Lid Tightener, AU12 Lip Corner Puller, and AU28 Lip Suck.

While the gameplay features are the same as the Affect-Enhanced

Composite and Baseline Presence models, none of the AUs

included in the Affect-Enhanced AU Presence model contribute

to the composite features included in the Affect-Enhanced

Composite model. The Affect-Enhanced AU Presence model

results in an increased LOOCV R2 over both the Affect-Enhanced

Composite model and Baseline model. Since this model preserves

similar coefficients for the same features used in the Baseline

Presence model, it appears the AUs improve the predictive

accuracy of the Baseline in terms of LOOCV R2.



Table 8. Resulting linear model for predicting Presence from gameplay and AU-affect features.

Affect-Enhanced AU Model for Presence
Feature              B        SE B     β         p-value
ConversationCount    -26.5    15.9     -0.244    0.107
WorksheetCount       -58.3    21.5     -0.398    0.011*
AU7                  -4.89    1.58     -0.581    0.005**
AU12                 3.88     1.61     0.440     0.023*
AU28                 -2.01    0.822    -0.367    0.022*
Leave-One-Out CV R2 = 0.296
Note: ** p < 0.01, * p < 0.05

The predictive accuracy of the three families of student

models is shown in Figure 3. The results show that including

facial expression features improved the predictive power of the

models for both learning gains and presence. Similarly, models

utilizing action unit features achieved higher predictive accuracy

than models using the composite emotion labels. This superior

performance was achieved for both learning gains and

presence. While the models of learning gains outperform their presence counterparts for each model structure, the gains differ: moving from the Baseline to the Composite model yields a larger improvement for Presence than for Normalized Learning Gain, whereas adding AU features yields a larger improvement for Normalized Learning Gain than for Presence.

The Affect-Enhanced models use the same gameplay features as the Baseline models, with the addition of several affective features. Thus, the Baseline models are nested within their respective Affect-Enhanced Composite and Affect-Enhanced AU models, with the Affect-Enhanced models acting as "full models" and the Baselines as "reduced models." An F-test therefore provides a measure of the contributions of the additional features included in the full models over the reduced models. Results from this test show that the addition of a single feature (Surprise) in the Affect-Enhanced Composite Normalized Learning Gain model does not yield a significant improvement over the Baseline Normalized Learning Gain model (F(1, 29) = 2.70, p = 0.11). The Affect-Enhanced AU Normalized Learning Gain model's addition of four features shows significant improvement over the Baseline Normalized Learning Gain model (F(4, 26) = 5.12, p = 0.004) in terms of reduction in residual sum of squares. For the Presence models, results indicate that both the Affect-Enhanced Composite (F(3, 27) = 3.37, p = 0.033) and Affect-Enhanced AU (F(3, 27) = 3.95, p = 0.019) models' incorporation of three affective features shows significant improvement over the Baseline Presence model in terms of reduction in sum of squared errors.
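These statistics can be computed directly from the residual sums of squares of the fitted reduced and full models (a sketch; with n = 33, a reduced model with 2 predictors and a full model with 6 gives the F(4, 26) reported above):

```python
from scipy import stats

def nested_f_test(rss_reduced: float, rss_full: float,
                  p_reduced: int, p_full: int, n: int):
    """Partial F-test for a reduced model nested in a full model.

    rss_*: residual sums of squares; p_*: numbers of predictors
    (excluding the intercept); n: number of observations.
    """
    df1 = p_full - p_reduced        # extra parameters in the full model
    df2 = n - p_full - 1            # residual df of the full model
    f = ((rss_reduced - rss_full) / df1) / (rss_full / df2)
    return f, stats.f.sf(f, df1, df2)  # F statistic and p-value
```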

6 DISCUSSION

The results of the study suggest that affect-enhanced student

models can improve upon predictive accuracy compared to

models based exclusively on student behavior in the learning

environment. By augmenting gameplay data with facial

expression data, affect-enhanced student models significantly

outperformed models using only gameplay

features. Additionally, the models using the more granular action

unit data significantly outperformed models using the composite

emotions. These results highlight the importance of decisions

about the granularity of affect data representation.

Across multiple feature sets, predictive models for

Normalized Learning Gain outperformed models for Presence, as

measured by leave-one-out cross validation R2. In all of the

Normalized Learning Gain models, ReadingDuration is included

with high significance. The virtual books and articles contain

information relevant to questions presented in the microbiology

assessment, making it a strong candidate for predicting the

learning gain assessed by the microbiology test. For both learning

Figure 3. Summary of leave-one-out cross-validation R2 for Baseline, Affect-Enhanced Composite, and Affect-Enhanced AU models of Normalized Learning Gain (NLG) and Presence.

gains and presence, the models utilizing the more granular AUs

achieved significantly higher predictive accuracy than those

utilizing composite emotions. This could be due to a variety of

factors. First, the automatic tagging of AUs may be more accurate, which is plausible since action units represent more concrete phenomena than the composite emotions.

In addition, the facial expression tracking framework underlying

the emotion recognition system was trained on a library of posed

faces, which are potentially more expressive than emotions that

appear in a learning context. It could also be the case that

important information is being captured by AUs that is not

captured by the composite emotion labels. For example, an AU

may be conveying a subtle expression of an emotion not captured

by the composite labeling. It is also possible that the AUs are

representing a contextually-specific response not captured by the

composite labels, such as signaling increased cognitive load.

Analysis of the specific emotions present in the composite models shows that less Surprise is associated with higher learning gains. This result highlights the need to investigate where in the gameplay activity the periods of Surprise tend to occur.

Similarly, for presence measures, the model shows that less Disgust, less Confusion, and increased Sadness are predictive of presence. While the relationships for Disgust and Confusion align with what one would expect for increased presence, increased Sadness does not appear to align, though it could potentially be

related to students’ empathy for the sick patients in the mystery

or emotional investment in solving the mystery.

Many of the AUs found to be significant in this work have previously been identified as commonly occurring AUs with predictive significance for tutoring outcomes [23]. The negative association of AU14 Dimpler with learning was also observed in a study of affect in tutoring [24], which found AU14 Dimpler to be an indicator of lower engagement and higher frustration. AU1 Inner Brow Raiser is associated with Surprise, which was the only emotion present in the composite model, further strengthening the need to investigate which game events trigger the reaction. For the presence models, AU7

Lid Tightener and AU28 Lip Suck were negatively associated with

presence. These AUs have been associated with increased mental

effort, which may negatively impact student perceptions of

“transportation” into the virtual environment.

While AUs may produce better models, using them sacrifices

some of the interpretability found in the composite models. For

example, one can imagine designing an adaptive intervention to

induce confusion, or perhaps reduce frustration, but it is not as

clear what the design implications would be for inducing or

decreasing a specific AU. While some AUs can be related to

composite emotions, others may be more difficult to categorize or

require additional information to contextualize. On the other

hand, composite emotions may not always be more interpretable,

as shown by the features in the presence model. The trade-off

between predictive ability and interpretability is one that requires

further investigation, and may depend on the types of adaptivity

that a game-based learning environment is designed to

provide. One approach to improving interpretability would be to

investigate how these models differ as students move through a

game. While the data presented here was accumulated through

complete playthroughs of a game, examining how facial

expressions evolve as students transition between affective states

throughout different phases of the gameplay session may help

identify specific game events or contexts that produce the most

desirable reactions. For example, expressions of frustration may

be distributed evenly through the session, or they may be

triggered by specific events. A more granular analysis of the data

may help identify such scenarios, allowing for targeted and

effective supports to be incorporated into the game-based learning

environment.

7 CONCLUSION

Affect is integral to learning and problem solving in adaptive

learning environments. Creating affect-enhanced student models

has the potential to provide the foundation for the next generation

of adaptive learning environments that are more effective and

more engaging. Because game-based learning environments are

specifically designed for the dual goals of increasing learning

effectiveness and increasing student engagement, the prospect of

introducing affect-enhanced student models into game-based

learning environments may be a particularly fruitful line of

investigation. Creating affect-enhanced student models that

leverage facial expression analysis could potentially yield student

models that can more accurately predict student learning and

engagement and, ultimately, contribute to improved student

experiences through more informed adaptation.

In this work we have investigated affect-enhanced student

models with a game-based learning environment. Baseline

models using gameplay features were created to predict

normalized learning gain and presence using a forward stepwise

feature selection methodology. Two sets of affect-enhanced

student models, one based on composite emotion models and one

based on action units, were created. Both affect-enhanced student

models augmented gameplay data with these feature sets.

Results show that the affect-enhanced student models improve

upon the predictive accuracy of baseline models. The additional

features incorporated in the affect-enhanced models were also

found to significantly contribute to predictive accuracy,

suggesting that the affective features provide additive value. The

action-unit-based affect-enhanced models performed better than

the composite-based affect-enhanced models, achieving

predictive accuracy improvements for both student learning (as

measured by normalized learning gain) and student engagement

(as measured by presence). In future work, it will be important to

investigate the integration of affect-enhanced student models into

runtime game-based learning environments. Explorations of run-

time integration should include early prediction modeling and

game-based adaptation that uses the affect-enhanced student

models to guide both cognitive and affective scaffolding and to

understand their impact on learning processes and outcomes.

Acknowledgements

We thank our collaborators in the Center for Educational

Informatics and the SMART Lab at North Carolina State

University. This study was supported by funding from the Social

Sciences and Humanities Research Council of Canada. Any

opinions, findings, and conclusions or recommendations

expressed in this material are those of the author(s) and do not

necessarily reflect the views of the Social Sciences and

Humanities Research Council of Canada.


REFERENCES

[1] Aleven, V., McLaughlin, E., Glenn, R. and Koedinger, K. 2016. Instruction Based on Adaptive Learning Technologies. In Handbook of Research on Learning and Instruction, R. Mayer and P. Alexander, eds. Routledge, London, UK.
[2] Arroyo, I., Ferguson, K., Johns, J., Dragon, T., Mehranian, H., Fisher, D., Barto, A., Mahadevan, S. and Woolf, B. 2007. Repairing Disengagement with Non-Invasive Interventions. In Proceedings of the 13th International Conference on Artificial Intelligence in Education. Los Angeles, CA, 195–202.
[3] Azevedo, R. and Aleven, V., eds. 2013. International Handbook of Metacognition and Learning Technologies. Springer, New York, NY.
[4] Azevedo, R., Taub, M., Mudrick, N.V., Martin, S.A. and Grafsgaard, J. 2017. Using Multi-Channel Trace Data to Infer and Foster Self-Regulated Learning Between Humans and Advanced Learning Technologies. In Handbook of Self-Regulation of Learning and Performance, D.H. Schunk and J.A. Greene, eds. Routledge, London, UK.
[5] Baikadi, A., Rowe, J., Mott, B. and Lester, J. 2014. Generalizability of Goal Recognition Models in Narrative-Centered Learning Environments. In Proceedings of the 22nd Conference on User Modeling, Adaptation, and Personalization. Springer, Aalborg, Denmark, 278–289.
[6] Baker, R. and Clarke-Midura, J. 2013. Predicting Successful Inquiry Learning in a Virtual Performance Assessment for Science. In Proceedings of the 21st International Conference on User Modeling, Adaptation and Personalization. Springer, Rome, Italy, 203–214.
[7] Bosch, N., D'Mello, S., Ocumpaugh, J., Baker, R. and Shute, V. 2016. Using Video to Automatically Detect Learner Affect in Computer-Enabled Classrooms. ACM Transactions on Interactive Intelligent Systems, 6(2), 17:1–17:26.
[8] Calvo, R. and D'Mello, S. 2011. New Perspectives on Affect and Learning Technologies. Springer, New York, NY.
[9] Calvo, R., D'Mello, S., Gratch, J. and Kappas, A. 2015. The Oxford Handbook of Affective Computing. Oxford University Press, New York, NY.
[10] Calvo, R. and D'Mello, S. 2010. Affect Detection: An Interdisciplinary Review of Models, Methods, and Their Applications. IEEE Transactions on Affective Computing, 1(1), 18–37.
[11] Clark, D., Tanner-Smith, E. and Killingsworth, S. 2016. Digital Games, Design, and Learning: A Systematic Review and Meta-Analysis. Review of Educational Research, 86(1), 79–122.
[12] Corrigan, S., Barkley, T. and Pardos, Z. 2015. Dynamic Approaches to Modeling Student Affect and Its Changing Role in Learning and Performance. In Proceedings of the 23rd International Conference on User Modeling, Adaptation and Personalization. Springer, Dublin, Ireland, 92–103.
[13] D'Mello, S. 2013. A Selective Meta-Analysis on the Relative Incidence of Discrete Affective States during Learning with Technology. Journal of Educational Psychology, 105(4), 1082–1099.
[14] D'Mello, S. and Graesser, A. 2012. AutoTutor and Affective AutoTutor: Learning by Talking with Cognitively and Emotionally Intelligent Computers that Talk Back. ACM Transactions on Interactive Intelligent Systems, 2(4), 23:1–23:39.
[15] D'Mello, S. and Graesser, A. 2014. Confusion and Its Dynamics during Device Comprehension with Breakdown Scenarios. Acta Psychologica, 151, 106–116.
[16] D'Mello, S., Lehman, B., Pekrun, R. and Graesser, A. 2014. Confusion Can Be Beneficial for Learning. Learning and Instruction, 29, 153–170.
[17] D'Mello, S., Lehman, B., Sullins, J., Daigle, R., Combs, R., Vogt, K., Perkins, L. and Graesser, A. 2010. A Time for Emoting: When Affect-Sensitivity Is and Isn't Effective at Promoting Deep Learning. In Proceedings of the International Conference on Intelligent Tutoring Systems. Springer, Pittsburgh, PA, 245–254.
[18] DeFalco, J., Georgoulas-Sherry, V., Paquette, L., Baker, R., Rowe, J., Mott, B. and Lester, J. 2016. Motivational Feedback Messages as Interventions to Frustration in GIFT. In Proceedings of the Fourth GIFT User Symposium (GIFTSym4). Princeton, NJ, 25–35.
[19] Desmarais, M. and Baker, R. 2011. A Review of Recent Advances in Learner and Skill Modeling in Intelligent Learning Environments. User Modeling and User-Adapted Interaction, 22(1–2), 9–38.
[20] Easterday, M., Aleven, V., Scheines, R. and Carver, S. 2016. Using Tutors to Improve Educational Games: A Cognitive Game for Policy Argument. Journal of the Learning Sciences, 26(2), 226–276.
[21] Ekman, P. and Friesen, W. 1977. Facial Action Coding System.
[22] Ekman, P. and Rosenberg, E. 1997. What the Face Reveals: Basic and Applied Studies of Spontaneous Expression Using the Facial Action Coding System (FACS). Oxford University Press, New York, NY.
[23] Grafsgaard, J., Wiggins, J., Boyer, K., Wiebe, E. and Lester, J. 2013. Automatically Recognizing Facial Expression: Predicting Engagement and Frustration. In Proceedings of the 6th International Conference on Educational Data Mining. Memphis, TN, 43–50.
[24] Grafsgaard, J., Wiggins, J., Vail, A., Boyer, K., Wiebe, E. and Lester, J. 2014. The Additive Value of Multimodal Features for Predicting Engagement, Frustration, and Learning during Tutoring. In Proceedings of the 16th International Conference on Multimodal Interaction (ICMI '14). Istanbul, Turkey, 42–49.
[25] Harley, J., Carter, C., Papaioannou, N., Bouchet, F., Landis, R., Azevedo, R. and Karabachian, L. 2016. Examining the Predictive Relationship between Personality and Emotion Traits and Students' Agent-Directed Emotions: Towards Emotionally-Adaptive Agent-Based Learning Environments. User Modeling and User-Adapted Interaction, 26(2–3), 177–219.
[26] Kim, Y., Almond, R. and Shute, V. 2016. Applying Evidence-Centered Design for the Development of Game-Based Assessments in Physics Playground. International Journal of Testing, 16(2), 142–163.
[27] Koedinger, K. and Aleven, V. 2016. An Interview Reflection on "Intelligent Tutoring Goes to School in the Big City." International Journal of Artificial Intelligence in Education, 26(1), 13–24.
[28] De la Torre, F. and Cohn, J. 2011. Facial Expression Analysis. Springer, London, UK.
[29] Lallé, S., Mudrick, N., Taub, M., Grafsgaard, J., Conati, C. and Azevedo, R. 2016. Impact of Individual Differences on Affective Reactions to Pedagogical Agents Scaffolding. In Proceedings of the International Conference on Intelligent Virtual Agents. Springer, Los Angeles, CA, 269–282.
[30] Littlewort, G., Whitehill, J., Wu, T., Fasel, I., Frank, M., Movellan, J. and Bartlett, M. 2011. The Computer Expression Recognition Toolbox (CERT). In Proceedings of the 11th International Conference on Automatic Face & Gesture Recognition and Workshops. IEEE, Santa Barbara, CA, 298–305.
[31] Min, W., Frankosky, M., Mott, B., Rowe, J., Wiebe, E., Boyer, K. and Lester, J. 2015. DeepStealth: Leveraging Deep Learning Models for Stealth Assessment in Game-Based Learning Environments. In Proceedings of the Seventeenth International Conference on Artificial Intelligence in Education. Springer, Madrid, Spain, 277–286.
[32] Min, W., Mott, B., Rowe, J., Liu, B. and Lester, J. 2016. Player Goal Recognition in Open-World Digital Games with Long Short-Term Memory Networks. In Proceedings of the International Joint Conference on Artificial Intelligence. AAAI Press, Burlingame, CA, 197–203.
[33] Mitrovic, A. 2012. Fifteen Years of Constraint-Based Tutors: What We Have Achieved and Where We Are Going. User Modeling and User-Adapted Interaction, 22(1–2), 39–72.
[34] Mott, B., Lee, S. and Lester, J. 2006. Probabilistic Goal Recognition in Interactive Narrative Environments. In Proceedings of the Twenty-First National Conference on Artificial Intelligence. AAAI Press, Boston, MA, 187–192.
[35] Ocumpaugh, J., Baker, R. and Rodrigo, M. 2015. Baker Rodrigo Ocumpaugh Monitoring Protocol (BROMP) 2.0 Technical and Training Manual.
[36] Paquette, L., Rowe, J., Baker, R., Mott, B., Lester, J., DeFalco, J., Brawner, K., Sottilare, R. and Georgoulas, V. 2015. Sensor-Free or Sensor-Full: A Comparison of Data Modalities in Multi-Channel Affect Detection. In Proceedings of the 8th International Conference on Educational Data Mining. Madrid, Spain, 93–100.
[37] Pekrun, R. 2011. Emotions as Drivers of Learning and Cognitive Development. In New Perspectives on Affect and Learning Technologies, R.A. Calvo and S.K. D'Mello, eds. Springer, 23–39.
[38] Rowe, J., Shores, L., Mott, B. and Lester, J. 2011. Integrating Learning, Problem Solving, and Engagement in Narrative-Centered Learning Environments. International Journal of Artificial Intelligence in Education, 21(1–2), 115–133.
[39] Sabourin, J. and Lester, J. 2014. Affect and Engagement in Game-Based Learning Environments. IEEE Transactions on Affective Computing, 5(1), 45–56.
[40] Sabourin, J., Rowe, J., Mott, B. and Lester, J. 2013. Considering Alternate Futures to Classify Off-Task Behavior as Emotion Self-Regulation: A Supervised Learning Approach. Journal of Educational Data Mining, 5(1), 9–38.
[41] Shute, V., Wang, L., Greiff, S., Zhao, W. and Moore, G. 2016. Measuring Problem Solving Skills via Stealth Assessment in an Engaging Video Game. Computers in Human Behavior, 63, 106–117.
[42] Taub, M., Mudrick, N.V., Azevedo, R., Millar, G.C., Rowe, J. and Lester, J. 2017. Using Multi-Channel Data with Multi-Level Modeling to Assess In-Game Performance during Gameplay with Crystal Island. Computers in Human Behavior. DOI: https://doi.org/10.1016/j.chb.2017.01.038.
[43] Vail, A.K., Grafsgaard, J.F., Boyer, K.E., Wiebe, E.N. and Lester, J.C. 2016. Gender Differences in Facial Expressions of Affect During Learning. In Proceedings of the 24th Conference on User Modeling, Adaptation and Personalization (UMAP 2016). ACM, Halifax, Canada, 65–73.
[44] VanLehn, K. 2006. The Behavior of Tutoring Systems. International Journal of Artificial Intelligence in Education, 16(3), 227–265.
[45] VanLehn, K. 2011. The Relative Effectiveness of Human Tutoring, Intelligent Tutoring Systems, and Other Tutoring Systems. Educational Psychologist, 46(4), 197–221.
[46] Witmer, B., Jerome, C. and Singer, M. 2005. The Factor Structure of the Presence Questionnaire. Presence: Teleoperators and Virtual Environments, 14(3), 298–312.
[47] Witmer, B. and Singer, M. 1998. Measuring Presence in Virtual Environments: A Presence Questionnaire. Presence: Teleoperators and Virtual Environments, 7(3), 225–240.
[48] Woolf, B., Burleson, W., Arroyo, I., Dragon, T., Cooper, D. and Picard, R. 2009. Affect-Aware Tutors: Recognising and Responding to Student Affect. International Journal of Learning Technology, 4(3–4), 129–164.
[49] Wouters, P., van Nimwegen, C., van Oostendorp, H. and van der Spek, E. 2013. A Meta-Analysis of the Cognitive and Motivational Effects of Serious Games. Journal of Educational Psychology, 105(2), 249–265.