-
Assessing the Quality of a Sword-Fighting
Agent
J. Dehesa, A. Vidler, C. Lutteroth, J. Padget
July 2, 2019
University of Bath / Ninja Theory Ltd
-
Sword Fighting in VR
We want to make engaging sword fighting in VR.
Like this.
“Game of Thrones” (HBO, 2013)
-
Sword Fighting in VR
This is hard for several reasons.
Complex control
HTC Vive (HTC, 2016)
Harder to animate
Unreal Engine 4 (Epic Games, 2014)
-
Our proposal
We came to a solution using machine learning.
User input + Current pose → Neural network → Next pose
The model is trained on motion capture data.
-
Our proposal
It looks like this.
Dehesa et al., 2019
-
The question
Is it good?
-
Wishlist
Reliability
Fidelity
Model behaviour resembles mocap data.
Generality
Works in situations beyond training data.
Stability
Similar inputs produce similar outputs.
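One way to make the stability wish measurable is to perturb an input slightly and see how much the output moves. The sketch below is illustrative only: `sensitivity`, its parameters and the toy model are hypothetical names, not part of the presented system, and the model is assumed to be any function from a list of floats to a list of floats.

```python
import math
import random
from typing import Callable, List, Sequence

# Assumption: a model is any callable mapping an input vector to an output vector.
Model = Callable[[Sequence[float]], List[float]]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sensitivity(model: Model, inputs: Sequence[float],
                epsilon: float = 1e-3, trials: int = 100, seed: int = 0) -> float:
    """Largest observed ratio of output change to input change
    under small random perturbations of `inputs`."""
    rng = random.Random(seed)  # fixed seed so the measurement is repeatable
    base = model(inputs)
    worst = 0.0
    for _ in range(trials):
        noisy = [x + rng.uniform(-epsilon, epsilon) for x in inputs]
        d_in = distance(noisy, inputs)
        if d_in == 0.0:
            continue  # no perturbation was applied, nothing to measure
        worst = max(worst, distance(model(noisy), base) / d_in)
    return worst


# Toy stand-in for a pose model: scales each input component.
def toy_model(v: Sequence[float]) -> List[float]:
    return [2.0 * v[0], 3.0 * v[1]]


toy_sensitivity = sensitivity(toy_model, [0.5, -0.2])
```

A low ratio means nearby inputs give nearby outputs; a large one flags a region where the model may jitter.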
-
Wishlist
Viability
Performance
Must run at 90 fps on a VR-ready PC.
Data efficiency
Requires a reasonable amount of data.
Training time
Fast training facilitates iterative development.
-
Wishlist
User satisfaction
Players
Experience is enjoyable, realistic, engaging.
Designers
Overall quality is acceptable.
Methodology is usable.
-
Wanted
ACTIONABLE METRICS
-
Evaluation plan
Measuring reliability
Pose error
Custom metric to measure similarity to mocap.
[Figure: pose error per model. Horizontal axis: Model (NN, PFNN 6, PFNN 8, GFNN 3×3, GFNN 4×4, GFNN 2×3×3, GFNN 2×4×4). Vertical axis: Error (m²), from 0.00 to 1.75.]
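The slides do not define the custom pose error metric. As a rough illustration only, the sketch below assumes it is the mean squared Euclidean distance between matching joint positions of a predicted pose and a mocap pose, which would be consistent with the m² unit on the error axis. The function names and the joint representation are hypothetical.

```python
from typing import Sequence, Tuple

Joint = Tuple[float, float, float]  # assumed: joint position in metres (x, y, z)
Pose = Sequence[Joint]


def pose_error(predicted: Pose, reference: Pose) -> float:
    """Mean squared distance (m^2) between matching joints of two poses."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("poses must be non-empty and have the same number of joints")
    total = 0.0
    for (px, py, pz), (rx, ry, rz) in zip(predicted, reference):
        total += (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
    return total / len(predicted)


def sequence_error(predicted: Sequence[Pose], reference: Sequence[Pose]) -> float:
    """Average pose error over a sequence of frames of equal length."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("sequences must be non-empty and have the same length")
    return sum(pose_error(p, r) for p, r in zip(predicted, reference)) / len(predicted)


# Two-joint example: the second joint is 2 m off along z.
mocap_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
example_error = pose_error(model_pose, mocap_pose)
```

Averaging per joint keeps the number comparable between skeletons with different joint counts; a real metric might weight joints such as the sword hand more heavily.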
-
Evaluation plan
Measuring viability
Model complexity
Big-O analysis for time and memory.
Benchmarks
Referenced to specific hardware.
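A benchmark for the 90 fps target could look like the sketch below. It is a minimal example under stated assumptions: `step` stands in for one model evaluation, and `frame_budget_ms` and `benchmark_ms` are hypothetical helper names. At 90 fps a whole frame lasts about 11.1 ms, and the model can only use part of that, since rendering and game logic share the same frame.

```python
import time
from statistics import mean
from typing import Callable, List


def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at the given frame rate."""
    return 1000.0 / fps


def benchmark_ms(step: Callable[[], object], warmup: int = 10, runs: int = 100) -> List[float]:
    """Time `step` repeatedly and return one duration per run, in milliseconds."""
    for _ in range(warmup):
        step()  # warm-up runs are not recorded
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        step()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


# Placeholder workload standing in for one model evaluation (assumption).
def dummy_step() -> float:
    return sum(i * 0.5 for i in range(1000))


samples = benchmark_ms(dummy_step, warmup=5, runs=50)
budget = frame_budget_ms(90)
print(f"mean {mean(samples):.3f} ms, worst {max(samples):.3f} ms, frame budget {budget:.2f} ms")
```

Reporting the worst case alongside the mean matters in VR, because a single slow frame is already noticeable to the player.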
-
Evaluation plan
Measuring viability
Data requirements
Size of data set, cost of mocap and pre-processing, equipment.
Training time
Referenced to specific hardware and experiment settings.
-
Evaluation plan
Measuring user satisfaction
User study
Measure enjoyment and quality.
Designer study
Measure quality and usability.
Methods?
-
User study
Enjoyment questionnaires
Did the user like the experience?
Flow: Challenge–skill balance.
Sweetser and Wyeth, 2005 (“GameFlow”)
Presence: Sense of “being there”.
Witmer and Singer, 1998
Immersion: Believing the world.
Jennett et al., 2008
+ Interactive
+ Standard
− Too broad?
− Confounding?
− Baseline?
-
User study
Side-by-side comparisons
Does the model look as good as the mocap data?
+ Simple
+ Non-confounding
+ Baseline
− Non-interactive
-
Designer study
Usability questionnaires
Is this methodology useful?
SUS: Very broadly used.
Brooke, 1996
SUMI: Exhaustive but expensive.
Kirakowski and Corbett, 1993
PSSUQ: More task-oriented.
Lewis, 1995
+ Standard
− Applicability?
− Which? Why?
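For reference, this is how a single SUS questionnaire is conventionally scored: ten items answered on a 1 to 5 scale, where odd-numbered items contribute (answer − 1), even-numbered items contribute (5 − answer), and the sum is multiplied by 2.5 to give a score from 0 to 100. The function name `sus_score` is chosen here for illustration.

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Score one SUS questionnaire (ten answers, each from 1 to 5) on a 0-100 scale."""
    if len(responses) != 10:
        raise ValueError("SUS has exactly 10 items")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")
    total = 0
    for item, answer in enumerate(responses, start=1):
        if item % 2 == 1:
            total += answer - 1   # odd items are positively worded
        else:
            total += 5 - answer   # even items are negatively worded
    return total * 2.5


neutral = sus_score([3] * 10)   # every item answered with the midpoint
best = sus_score([5, 1] * 5)    # full agreement with positive items, none with negative
```

The score is a single usability number per respondent; it is not a percentage and is usually averaged over all participants.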
-
Open questions
• Are our quantitative metrics enough?
• Which are the best tools for our qualitative studies?
• What kind of baseline can we compare to?
-
Thank you