
  • Assessing the Quality of a Sword-Fighting Agent

    J. Dehesa, A. Vidler, C. Lutteroth, J. Padget

    July 2, 2019

    University of Bath / Ninja Theory Ltd

  • Sword Fighting in VR

    We want to make engaging sword fighting in VR.

    Like this.

    “Game of Thrones” (HBO, 2013)


  • Sword Fighting in VR

    This is hard for several reasons.

    Complex control [HTC Vive (HTC, 2016)]

    Harder to animate [Unreal Engine 4 (Epic Games, 2014)]


  • Our proposal

    We propose a solution based on machine learning.

    [Diagram: user input + current pose → neural network → next pose]

    The model is trained on motion capture data.
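    To make the data flow concrete, below is a minimal sketch in PyTorch. The plain feed-forward structure and all layer sizes are illustrative assumptions, not the paper's architecture (the evaluation later compares PFNN and GFNN variants):

    import torch
    import torch.nn as nn

    class PosePredictor(nn.Module):
        # Hypothetical sketch: maps (current pose, user input) to the next pose.
        def __init__(self, pose_dim: int, input_dim: int, hidden: int = 256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(pose_dim + input_dim, hidden),
                nn.ELU(),
                nn.Linear(hidden, hidden),
                nn.ELU(),
                nn.Linear(hidden, pose_dim),  # predicted next pose
            )

        def forward(self, pose, user_input):
            # Condition the next pose on both the current pose and the
            # user's controller input.
            return self.net(torch.cat([pose, user_input], dim=-1))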


  • Our proposal

    It looks like this.

    Dehesa et al., 2019


  • The question

    Is it good?


  • Wishlist

    Reliability

    Fidelity: model behaviour resembles the mocap data.

    Generality: works in situations beyond the training data.

    Stability: similar inputs produce similar outputs (a simple probe is sketched below).
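    The stability criterion is directly testable. As an illustration only (this probe is ours, not a method from the slides): perturb the user input slightly and check that the predicted pose barely moves.

    import numpy as np

    def stability_probe(model, pose, user_input, eps=1e-3, n=100):
        # Crude stability check: `model` is any callable mapping
        # (pose, user_input) arrays to a next-pose array.
        base = model(pose, user_input)
        worst = 0.0
        for _ in range(n):
            noisy = user_input + eps * np.random.randn(*user_input.shape)
            worst = max(worst, float(np.linalg.norm(model(pose, noisy) - base)))
        return worst  # small => similar inputs give similar outputs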


  • Wishlist

    Viability

    Performance: must run at 90 fps on a VR-ready PC, i.e. within a budget of roughly 11 ms per frame.

    Data efficiency: requires a reasonable amount of data.

    Training time: fast training facilitates iterative development.


  • Wishlist

    User satisfaction

    Players: the experience is enjoyable, realistic, engaging.

    Designers: overall quality is acceptable; the methodology is usable.


  • Wanted


    ACTIONABLE METRICS

  • Evaluation plan

    Measuring reliability

    Pose error: a custom metric to measure similarity to mocap.

    [Bar chart: pose error, Error (m²), per model: PFNN 6, PFNN 8, GFNN 3×3, GFNN 4×4, GFNN 2×3×3, GFNN 2×4×4]
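    The slide does not define the metric; as a hedged illustration, a mean squared joint-position error would produce values in m² as on the chart. A minimal sketch, assuming poses are arrays of 3D joint positions in metres:

    import numpy as np

    def pose_error(predicted, mocap):
        # Illustrative stand-in for the custom metric. Both arrays have
        # shape (frames, joints, 3); the result is in m^2, matching the
        # chart's y-axis.
        assert predicted.shape == mocap.shape
        sq_dist = np.sum((predicted - mocap) ** 2, axis=-1)  # per joint
        return float(sq_dist.mean())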


  • Evaluation plan

    Measuring viability

    Model complexity: big-O analysis for time and memory.

    Benchmarks: referenced to specific hardware.
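    For the benchmark side, a minimal timing sketch (a hypothetical harness; the 90 fps target from the wishlist implies a budget of about 11.1 ms per frame):

    import time

    FRAME_BUDGET_MS = 1000.0 / 90.0  # 90 fps => ~11.1 ms per frame

    def benchmark_ms(step, n_warmup=100, n_runs=1000):
        # Average wall-clock time per call of `step`, in milliseconds.
        for _ in range(n_warmup):  # warm up caches, lazy allocations
            step()
        start = time.perf_counter()
        for _ in range(n_runs):
            step()
        return 1000.0 * (time.perf_counter() - start) / n_runs

    # Hypothetical usage, timing one forward pass of the pose model:
    # ms = benchmark_ms(lambda: model(pose, user_input))
    # print(f"{ms:.2f} ms per frame (budget {FRAME_BUDGET_MS:.1f} ms)")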


  • Evaluation plan

    Measuring viability

    Data requirements: size of the data set; cost of mocap, pre-processing, and equipment.

    Training time: referenced to specific hardware and experiment settings.


  • Evaluation plan

    Measuring user satisfaction

    User study: measure enjoyment and quality.

    Designer study: measure quality and usability.

    Methods?

  • User study

    Enjoyment questionnaires

    Did the user like the experience?

    Flow: challenge–skill balance. (Sweetser and Wyeth, 2005, “GameFlow”)

    Presence: sense of “being there”. (Witmer and Singer, 1998)

    Immersion: believing the world. (Jennett et al., 2008)

    + Interactive
    + Standard
    − Too broad?
    − Confounding?
    − Baseline?


  • User study

    Side-by-side comparisons

    Does the model look as good as the mocap data?

    + Simple

    + Non-confounding

    + Baseline

    − Non-interactive


  • Designer study

    Usability questionnaires

    Is this methodology useful?

    SUS: very broadly used. (Brooke, 1996)

    SUMI: exhaustive but expensive. (Kirakowski and Corbett, 1993)

    PSSUQ: more task-oriented. (Lewis, 1995)

    + Standard
    − Applicability?
    − Which? Why?
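    SUS scoring is standard enough to show concretely (Brooke, 1996): odd-numbered items score response − 1, even-numbered items score 5 − response, and the sum is scaled by 2.5 to a 0–100 score. The helper name below is ours:

    def sus_score(responses):
        # `responses`: the 10 Likert answers (1-5), items 1-10 in order.
        if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
            raise ValueError("SUS needs 10 responses on a 1-5 scale")
        total = sum((r - 1) if i % 2 == 0 else (5 - r)  # i=0 is item 1 (odd)
                    for i, r in enumerate(responses))
        return 2.5 * total  # 0-100

    # Example: sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]) -> 75.0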


  • Open questions

    • Are our quantitative metrics enough?

    • Which are the best tools for our qualitative studies?

    • What kind of baseline can we compare to?


  • Thank you