Why and how to model multi-modal interaction for a mobile robot companion

Shuyin Li, Julia Peltason and Britta Wrede
Bielefeld University, Germany
March 2007

Page 2

Outline

- Introduction to Human-Robot Interaction (HRI)
- Observations in a user study
- A multi-modal interaction framework
- Summary

Pages 3-7

Introduction: HRI with a personal robot

Robot characteristics → Requirements for the system:
- Robots should be aware of environmental changes: situated → situation-awareness
- Users expect human-like behaviors: anthropomorphic → social behaviors
- Both users and robots have visual access to their interaction partner's body: embodied → multi-modal interaction

Page 8

Outline

- Introduction to Human-Robot Interaction (HRI)
- Observations in a user study
  - BIRON@Home
  - Quiet speakers
  - Meta-commentators
- A multi-modal interaction framework
- Summary

Page 9

The user study: BIRON@Home

Page 10

The user study: BIRON@Home

Experimental setup with BIRON

Non-task behaviors of BIRON:
1. situation awareness
2. social behavior

14 subjects, each interaction about 7 min.

Only output modality of BIRON: speech

Page 11

The user study: situation awareness

- Face recognition
- Sound source detection
- Human leg detection
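The three perceptual cues listed above can be combined into a coarse estimate of whether a person is present and attending. The following is a minimal illustrative sketch, not the authors' implementation; all names and the fusion rule are assumptions.

```python
# Hypothetical fusion of the three percepts into an attention state.
from dataclasses import dataclass

@dataclass
class Percepts:
    legs_detected: bool   # from human leg detection (laser)
    face_detected: bool   # from face recognition
    speaking: bool        # from sound source detection

def attention_state(p: Percepts) -> str:
    """Map raw percepts to a coarse interaction state."""
    if p.face_detected and p.speaking:
        return "addressed"       # user faces the robot and talks to it
    if p.face_detected:
        return "attending"       # user faces the robot silently
    if p.legs_detected:
        return "person_present"  # someone is nearby but not engaged
    return "alone"

print(attention_state(Percepts(True, False, False)))  # person_present
```

A priority ordering like this lets the robot react differently to a passer-by than to an addressed utterance, which is exactly the distinction the quiet-speaker observation below turns on.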

Page 12

Observation I: quiet speakers

Page 13

Observation I: quiet speakers

The problem: no means to communicate pre-interaction attention

Possible reason: inappropriateness of the speech modality
("legs detected, face detected, face lost again, face detected, ...")

Solution: use non-verbal modalities
(because they are suitable to represent static information which is only occasionally updated)

Page 14

The user study: social behavior

...

User: Follow me.

BIRON: OK, I'm following you.

User: This is a cup.

BIRON: It's nice. You are really doing very well! (performance remark)

User: (laugh)

BIRON: Come here.

Page 15

Observation II: meta-commentators

Page 16

Observation II: meta-commentators

Problem: users reply to social comments using out-of-vocabulary words

Possible reason: reciprocity and obtrusiveness of the speech modality

Solution: making remarks using non-verbal modalities
(because they are unobtrusive and do not impose a strong obligation to reply)

Page 17

Outline

- Introduction to Human-Robot Interaction (HRI)
- Observations in a user study
- A multi-modal interaction framework
  - Currently popular approaches
  - Our approach
- Summary

Page 18

Interaction framework: existing works

Currently popular approaches address differences between multi-modal information by grouping it into categories and handling the different categories separately.

Page 19

Interaction framework: existing works

Cassell: generic architecture for embodied conversational agents

Page 20

Interaction framework: existing works

Traum: dialog model for multi-modal, multi-party dialog in a virtual world

(based on information-state theory)

Page 21

Interaction framework: multi-modal grounding

Our approach addresses a common feature of multi-modal information: evocative functions.

Page 22

Interaction framework: multi-modal grounding

Evocative functions of conversational behaviors (CBs)

Definition: CBs evoke a reaction from the interaction partner.

Validity: holds for both propositional and interactional information.

Page 23

Interaction framework: multi-modal grounding

Grounding

Definition: the process of establishing shared understanding during a conversation.

Basic idea: for each account (Presentation) issued in a conversation, there needs to be feedback (Acceptance) from the interaction partner.

Application area: traditionally adopted to model evocative functions of propositional information→ to be extended!
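The Presentation/Acceptance principle can be sketched as a small bookkeeping structure: a contribution stays pending until the partner produces feedback for it. This is an illustrative sketch of the general grounding idea, not the authors' model; the class and method names are assumptions.

```python
# Minimal sketch of grounding: Presentation -> (awaits) -> Acceptance.
class GroundingTracker:
    def __init__(self):
        self.pending = []   # presentations still awaiting feedback
        self.grounded = []  # contributions with established shared understanding

    def present(self, contribution: str) -> None:
        """A speaker issues an account (Presentation)."""
        self.pending.append(contribution)

    def accept(self, contribution: str) -> None:
        """The partner gives feedback (Acceptance), grounding the contribution."""
        if contribution in self.pending:
            self.pending.remove(contribution)
            self.grounded.append(contribution)

g = GroundingTracker()
g.present("This is a cup")
g.accept("This is a cup")
print(g.grounded)  # ['This is a cup']
```

The extension proposed on the following slides applies this same Presentation/Acceptance mechanics not only to propositional content but also to interactional (e.g. attentional and social) contributions.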

Page 24

Interaction framework: multi-modal grounding

Our approach: extending grounding

- Modeling both propositional and interactional contributions with Interaction Units (IUs)
- Organizing IUs based on the principle of grounding

Page 25

Interaction framework: multi-modal grounding

Interaction Unit (IU)

Motivation Layer

Behavior Layer
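The two-layer IU can be pictured as a record with a motivation (why the contribution is made) and a set of modality-specific realizations (how it surfaces). This is an assumed, illustrative data structure, not the authors' API; field names are hypothetical.

```python
# Sketch of a two-layer Interaction Unit: motivation layer + behavior layer.
from dataclasses import dataclass, field

@dataclass
class InteractionUnit:
    motivation: str                 # motivation layer, e.g. "provides acceptance"
    behaviors: dict = field(default_factory=dict)  # behavior layer: modality -> realization

# Example in the spirit of the later slides: BIRON acknowledges a user
# non-verbally, leaving the speech channel uninstantiated.
u2 = InteractionUnit(
    motivation="provides acceptance to user IU",
    behaviors={"speech": None, "gaze": "opens eyes"},
)
print(u2.behaviors["gaze"])  # opens eyes
```

Separating the layers is what lets the same motivation be realized on whichever modality is least obtrusive in context.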

Page 26

Interaction framework: multi-modal grounding

Interaction Unit (IU)

[Diagram: a motivation conception feeding a verbal generator and a non-verbal generator]

Page 27

Interaction framework: multi-modal grounding

[Diagram: grounding across modalities; each Motivation can be realized as speech, gesture, gaze, facial expression, ...]

Page 28

Interaction framework: multi-modal grounding

Grounding models: [Clark92], [Traum94], [Cahn&Brennan99]

Our approach: [Li2006]

S. Li, B. Wrede, and G. Sagerer. A computational model of multi-modal grounding. In Proc. ACL SIGdial Workshop on Discourse and Dialogue, in conjunction with COLING/ACL 2006, pages 153-160. ACL Press, 2006.

[Diagram: a discourse organized as IUs (IU 1, IU 2, IU 3, IU 4) containing exchanges Ex1, Ex2, Ex3, ..., with relations R(Ex4, Ex2) = support and R(Ex3, Ex2) = default]
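The typed relations between exchanges shown on this slide can be represented very simply, for example as a map keyed by exchange pairs. This is an assumption-based reading of the diagram, not a reimplementation of [Li2006].

```python
# Sketch of typed relations between exchanges in the discourse structure.
relations = {}  # (later_exchange, earlier_exchange) -> relation type

def relate(new_ex: str, old_ex: str, rel: str = "default") -> None:
    """Record how a new exchange relates to an earlier one."""
    relations[(new_ex, old_ex)] = rel

relate("Ex3", "Ex2")             # R(Ex3, Ex2) = default
relate("Ex4", "Ex2", "support")  # R(Ex4, Ex2) = support
print(relations[("Ex4", "Ex2")])  # support
```

A "support" relation marks an exchange that helps ground an earlier one (e.g. a clarification), while "default" marks an ordinary continuation.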


Page 30

Interaction framework: multi-modal grounding

Pre-interaction attention: solving the quiet-speaker problem

User: (shows legs)
BIRON: (opens eyes)

U1: uninstantiated "shows legs"; unintentional motivation
U2: uninstantiated "opens eyes"; provides acceptance to the user IU

Page 31

Interaction framework: multi-modal grounding

Pre-interaction attention: solving the quiet-speaker problem

User: (shows face)
BIRON: (raises head)

U3: uninstantiated "shows face and legs"; unintentional motivation
U4: uninstantiated "raises head"; provides acceptance to the user IU
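The quiet-speaker solution sketched on these slides amounts to answering each unintentional user presentation with a non-verbal acceptance instead of speech. The mapping below is purely illustrative and not taken from the BIRON code base.

```python
# Hypothetical mapping from unintentional user presentations to
# non-verbal acceptances (pre-interaction attention feedback).
NONVERBAL_ACCEPTANCE = {
    "shows legs": "opens eyes",
    "shows face and legs": "raises head",
}

def acknowledge(user_behavior: str) -> str:
    # Fall back to no reaction rather than producing an obtrusive
    # spoken status report ("legs detected, face detected, ...").
    return NONVERBAL_ACCEPTANCE.get(user_behavior, "no reaction")

print(acknowledge("shows legs"))  # opens eyes
```

Because the acceptance is non-verbal, it conveys the robot's attention state continuously without imposing any obligation on the user to reply.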



Page 34

Interaction framework: multi-modal grounding

Making social comments: solving the meta-commentator problem

User: This is a cup.
BIRON: I beg your pardon?
BIRON: (looking embarrassed)

U5: "This is a cup" + deictic gesture; shows BIRON a cup
U6: "I beg your pardon", non-verbal channel uninstantiated; initiates conversational repair
U7: uninstantiated "looking embarrassed"; shows social awareness

Page 35

Interaction framework: multi-modal grounding

Implemented on the robot systems BIRON and BARTHOC

[Photos: BIRON and BARTHOC]

Page 36

Summary

Two case studies revealing the importance of multi-modality for the situatedness and social behaviors of a robot

A multi-modal interaction framework that addresses the evocative function of conversational behaviors