Multimodal Aspects of Stochastic Interaction Managementplison/pdfs/thesis/triallecture2014.pdf ·...

29
Multimodal Aspects of Stochastic Interaction Management Pierre Lison Language Technology Group Trial Lecture 21st February 2014

Transcript of Multimodal Aspects of Stochastic Interaction Managementplison/pdfs/thesis/triallecture2014.pdf ·...

Page 1: Multimodal Aspects of Stochastic Interaction Managementplison/pdfs/thesis/triallecture2014.pdf · Trial Lecture! 21st February 2014. Outline of the lecture 1. What is a multimodal

Multimodal Aspects of Stochastic Interaction Management

Pierre Lison Language Technology Group

!Trial Lecture

21st February 2014

Page 2: Multimodal Aspects of Stochastic Interaction Managementplison/pdfs/thesis/triallecture2014.pdf · Trial Lecture! 21st February 2014. Outline of the lecture 1. What is a multimodal

Outline of the lecture

1. What is a multimodal system?

2. Multimodal architectures

3. Interaction management

4. Conclusion

���2

Page 3: Multimodal Aspects of Stochastic Interaction Managementplison/pdfs/thesis/triallecture2014.pdf · Trial Lecture! 21st February 2014. Outline of the lecture 1. What is a multimodal

Outline of the lecture

1. What is a multimodal system?

2. Multimodal architectures

3. Interaction management

4. Conclusion

���3

Page 4: Multimodal Aspects of Stochastic Interaction Managementplison/pdfs/thesis/triallecture2014.pdf · Trial Lecture! 21st February 2014. Outline of the lecture 1. What is a multimodal

Multimodal interfaces

A multimodal interface is a computer interface that provides the user with more than one “path of communication”

���4

Now turn left …

Select that object!

Page 5: Multimodal Aspects of Stochastic Interaction Managementplison/pdfs/thesis/triallecture2014.pdf · Trial Lecture! 21st February 2014. Outline of the lecture 1. What is a multimodal

What is a modality?

In human-computer interaction, a “modality” is a channel of communication between the user and the machine

• Relation to human senses: vision, audition, touch, etc.

• Includes both the system inputs and outputs

���5

Input

Output

Page 6: Multimodal Aspects of Stochastic Interaction Managementplison/pdfs/thesis/triallecture2014.pdf · Trial Lecture! 21st February 2014. Outline of the lecture 1. What is a multimodal

Multimodal inputs

Why use multiple input modalities?

• Increased usability and accessibility

• More meaningful and reliable interpretations

• Better grasp of the user’s current state (i.e. intention, attention, affect)

���6

Inputs

Page 7: Multimodal Aspects of Stochastic Interaction Managementplison/pdfs/thesis/triallecture2014.pdf · Trial Lecture! 21st February 2014. Outline of the lecture 1. What is a multimodal

Multimodal outputs

Why use multiple output modalities?

• Tailor the system outputs to the situation

• Enrich generated content

• Increase user engagement

���7

Outputs…

Page 8: Multimodal Aspects of Stochastic Interaction Managementplison/pdfs/thesis/triallecture2014.pdf · Trial Lecture! 21st February 2014. Outline of the lecture 1. What is a multimodal

Types of inputs/outputs

Major modalities:

• Vision

• Audition

• Touch

���8

Page 9: Multimodal Aspects of Stochastic Interaction Managementplison/pdfs/thesis/triallecture2014.pdf · Trial Lecture! 21st February 2014. Outline of the lecture 1. What is a multimodal

Modalities in human communication

Human face-to-face communication is fundamentally multimodal

• Speech, gaze, gestures, body pose

• Continuous, bidirectional exchange of information

• Interactive alignment of behaviour���9

Page 10: Multimodal Aspects of Stochastic Interaction Managementplison/pdfs/thesis/triallecture2014.pdf · Trial Lecture! 21st February 2014. Outline of the lecture 1. What is a multimodal

Hand gestures

���10

Symbolic Iconic Metaphoric

Deictic Beat

[D. McNeill (2008), “Gesture and Thought”, University of Chicago Press]

Page 11: Multimodal Aspects of Stochastic Interaction Managementplison/pdfs/thesis/triallecture2014.pdf · Trial Lecture! 21st February 2014. Outline of the lecture 1. What is a multimodal

Non-verbal signals

• More than gestures!

• Gaze, facial expressions & body posture also convey important signals

���11

• Used to control turn-taking, attention, grounding, and affect

[K. Jokinen, H. Furukawa, M. Nishida and S. Yamamoto (2013), "Gaze and turn-taking behavior in casual conversational interactions", ACM TiiS.]

Page 12: Multimodal Aspects of Stochastic Interaction Managementplison/pdfs/thesis/triallecture2014.pdf · Trial Lecture! 21st February 2014. Outline of the lecture 1. What is a multimodal

Outline of the lecture

1. What is a multimodal system?

2. Multimodal architectures

3. Interaction management

4. Conclusion

���12

Page 13: Multimodal Aspects of Stochastic Interaction Managementplison/pdfs/thesis/triallecture2014.pdf · Trial Lecture! 21st February 2014. Outline of the lecture 1. What is a multimodal

Classical dialogue architecture

���13User

Speech recognition

input speech signal (user utterance)

Understanding

Recognition hypotheses

Speech synthesis

Utterance to synthesise

output speech signal (machine utterance)

GenerationIntended response

Interpreted utterance

Dialogue management

Page 14: Multimodal Aspects of Stochastic Interaction Managementplison/pdfs/thesis/triallecture2014.pdf · Trial Lecture! 21st February 2014. Outline of the lecture 1. What is a multimodal

Multimodal architecture

���14User

Interpreted inputs

Interaction management

input signals

Input recognition

… …

Recognition hypotheses

Input understanding

… …

Behaviour to synthesise

output signals

Behaviour execution

… …

Intended response

Output generation

… …

Page 15: Multimodal Aspects of Stochastic Interaction Managementplison/pdfs/thesis/triallecture2014.pdf · Trial Lecture! 21st February 2014. Outline of the lecture 1. What is a multimodal

Input fusion

• Merge information arising from different sources

• Content may be redundant, complementary, or conflicting

• Fusion stages:

• Early fusion: combine coupled signals at feature level

• Late fusion: construct cross-modal semantic content

���15

input signals

Input recognition

… …

Recognition hypotheses

Input understanding

… …

Page 16: Multimodal Aspects of Stochastic Interaction Managementplison/pdfs/thesis/triallecture2014.pdf · Trial Lecture! 21st February 2014. Outline of the lecture 1. What is a multimodal

Output fission

• Distribute a given output over the set of available modalities

• Find the best way to convey the content or behaviour

• Processing steps:

• Message construction

• Modality selection

• Output coordination

���16

output signals

Output generation

… …

Behaviour to synthesise

Behaviour execution

… …

Page 17: Multimodal Aspects of Stochastic Interaction Managementplison/pdfs/thesis/triallecture2014.pdf · Trial Lecture! 21st February 2014. Outline of the lecture 1. What is a multimodal

Outline of the lecture

1. What is a multimodal system?

2. Multimodal architectures

3. Interaction management

4. Conclusion

���17

Page 18: Multimodal Aspects of Stochastic Interaction Managementplison/pdfs/thesis/triallecture2014.pdf · Trial Lecture! 21st February 2014. Outline of the lecture 1. What is a multimodal

Multimodal architecture

Interpreted inputs

Interaction management

input signal

Input

… …

Recognition hypotheses

Input

… …

Intended response

Output generation

… …

Behavioursynthesis

output signal

Behaviour execution

… …

Page 19: Multimodal Aspects of Stochastic Interaction Managementplison/pdfs/thesis/triallecture2014.pdf · Trial Lecture! 21st February 2014. Outline of the lecture 1. What is a multimodal

Interaction management

���19

State tracking

Action selection

Interaction management

Current state

Interpreted inputs

Intended response

Tasks:

1. Track the current state of the interaction given the inputs and past history

2. Decide on the best action(s) to perform

Page 20: Multimodal Aspects of Stochastic Interaction Managementplison/pdfs/thesis/triallecture2014.pdf · Trial Lecture! 21st February 2014. Outline of the lecture 1. What is a multimodal

Challenges for multimodal systems

1. Tracking the interaction state

2. Deciding when to talk

3. Deciding what to do/say

4. End-to-end evaluation

���20

Page 21: Multimodal Aspects of Stochastic Interaction Managementplison/pdfs/thesis/triallecture2014.pdf · Trial Lecture! 21st February 2014. Outline of the lecture 1. What is a multimodal

Tracking the interaction state

The interaction state can be difficult to track:

• Numerous state variables (user & task models, history, external environment)

• Multiple, asynchronous streams of observations

• High levels of uncertainty

• Stochastic action effects

���21

State tracking

Action selection

Interaction management

Current state

Interpreted inputs

Page 22: Multimodal Aspects of Stochastic Interaction Managementplison/pdfs/thesis/triallecture2014.pdf · Trial Lecture! 21st February 2014. Outline of the lecture 1. What is a multimodal

State tracking methods

• Allow state variables to be partially observable (e.g. POMDP models)

• Rely on structural assumptions and abstraction methods to avoid combinatorial explosion of state space

• Use approximate inference to ensure state tracking can be done in real-time

���22

[J. Hoey et al. (2005), “POMDP models for assistive technology”, AAAI]

[J. Williams (2007), “Using Particle Filters to Track Dialogue State”, ASRU]

Page 23: Multimodal Aspects of Stochastic Interaction Managementplison/pdfs/thesis/triallecture2014.pdf · Trial Lecture! 21st February 2014. Outline of the lecture 1. What is a multimodal

Deciding when to talk

• When should the machine take the turn in face-to-face interaction?

• Combination of both verbal (syntax, prosody) and non-verbal factors (gaze, gestures, etc.)

• Statistical models to predict when the current speaker will end its turn

• Sequential probabilistic modelling (e.g. CRFs) with multimodal features

���23

[I. de Kok & D. Heylen (2009), “Multimodal End-of-Turn Prediction in Multi-Party Meetings”, ICMI]

Page 24: Multimodal Aspects of Stochastic Interaction Managementplison/pdfs/thesis/triallecture2014.pdf · Trial Lecture! 21st February 2014. Outline of the lecture 1. What is a multimodal

Deciding what to do/say

• Multimodal systems must coordinate multiple tasks in parallel

• Engagement, communicative behaviour, physical actions, etc.

• Tasks may be decomposed in a hierarchical manner

• How to decide on the best behaviour to execute?

���24

State tracking

Action selection

Interaction management

Current state

Intended response

[Simon Keizer et al. (2013), “Training and evaluation of an MDP model for social multi-user human-robot interaction”, SIGDIAL]

Page 25: Multimodal Aspects of Stochastic Interaction Managementplison/pdfs/thesis/triallecture2014.pdf · Trial Lecture! 21st February 2014. Outline of the lecture 1. What is a multimodal

Action selection methods

• Optimisation of multimodal policies via reinforcement learning

• Temporal abstraction can be used to capture hierarchical tasks

• Reward function can be harder to design in multimodal settings

• Exploit social signals to infer rewards?

���25

[H. Cuayáhuitl & N. Dethlefs (2012), "Spatially-Aware Dialogue Control Using Hierarchical Reinforcement Learning”. In ACM Transactions on Speech and Language Processing]

[V. Rieser & O. Lemon (2009), “Learning Human Multimodal Dialogue Strategies”, NLE]

Page 26: Multimodal Aspects of Stochastic Interaction Managementplison/pdfs/thesis/triallecture2014.pdf · Trial Lecture! 21st February 2014. Outline of the lecture 1. What is a multimodal

End-to-end evaluation

• For applications with clear-cut tasks, standard metrics of task success & efficiency can be extended to multimodal settings

• But the empirical effects of each modality on the interaction are often hard to measure

• However, many interaction domains do not have a single, predefined task

• Naturalness & likability may be more important

���26

[F. Schiel (2006), “Evaluation of Multimodal Dialogue Systems”, SmartKom. Springer][D.Traum et al. (2004), “Evaluation of multi-party virtual reality dialogue interaction”, LREC]

Page 27: Multimodal Aspects of Stochastic Interaction Managementplison/pdfs/thesis/triallecture2014.pdf · Trial Lecture! 21st February 2014. Outline of the lecture 1. What is a multimodal

Outline of the lecture

1. What is a multimodal system?

2. Multimodal architectures

3. Interaction management

4. Conclusion

���27

Page 28: Multimodal Aspects of Stochastic Interaction Managementplison/pdfs/thesis/triallecture2014.pdf · Trial Lecture! 21st February 2014. Outline of the lecture 1. What is a multimodal

Take-home messages

1. Multimodal systems provide users with more than one communication channel

2. They offer many advantages in terms of robustness, usability, and adaptivity

3. But they need to address non-trivial engineering challenges:

• Multimodal fusion and fission

• Complex interaction models

���28

Page 29: Multimodal Aspects of Stochastic Interaction Managementplison/pdfs/thesis/triallecture2014.pdf · Trial Lecture! 21st February 2014. Outline of the lecture 1. What is a multimodal

Questions?

���29