Language acquisition framework for robots: From grounded language acquisition to spoken dialogues


Page 1: Language acquisition framework for robots: From grounded language acquisition to spoken dialogues

LCore: A Language Acquisition Framework for Robots
From Grounded Language Acquisition to Spoken Dialogues

Komei Sugiura and Naoto Iwahashi
National Institute of Information and Communications Technology, Japan
[email protected]

2013/12/13

Page 2: Language acquisition framework for robots: From grounded language acquisition to spoken dialogues

Open problem: grounded language processing

• Language processing based on non-verbal information (vision, motion, context, experience, …) is still very difficult
  – e.g. “Put the blue cup away”, “Give me the usual”

“blue cup”: multiple candidates; “the usual”: umbrella, remote, drink, …

• What is missing in dialog processing for robots?
  – Physical situatedness / symbol grounding
  – Shared experience


Page 3: Language acquisition framework for robots: From grounded language acquisition to spoken dialogues

Spoken dialogue system + Robot ≠ Robot dialogue

• Robot dialogue
  – Categorization/prediction of real-world information
  – Handling real-world properties
  – Linguistic interaction

• Why is this difficult?
  – Machine learning, CV, manipulation, the symbol grounding problem, speech recognition, …

[Figure: example object category hierarchy, e.g. Tableware covering Cutlery (Fork, Knife), Tea cup, Cup, and Plate]

Page 4: Language acquisition framework for robots: From grounded language acquisition to spoken dialogues

Robot Language Acquisition Framework [Iwahashi 10, “Robots That Learn to Communicate: A Developmental Approach…”]

• Task: Object manipulation dialogues
• Key features
  – Fully grounded vocabulary
  – Imitation learning
  – Incremental & interactive learning
  – Language independent


Page 5: Language acquisition framework for robots: From grounded language acquisition to spoken dialogues

LCore functions

• Phoneme learning
• Word learning
• Grammar learning
• Disambiguation of word ellipsis
• Utterance understanding
• Robot-directed utterance detection
• Learning question answering
• Visual feature learning
• Affordance learning
• Imitation learning
• Role reversal imitation
• Active-learning-based dialogue


Page 6: Language acquisition framework for robots: From grounded language acquisition to spoken dialogues

Learning modules: Word, Grammar, Motion-object relationship

• Learning verbs
• Estimation of related objects
• Learning trajectories
• Learning phoneme sequences

• Learning nouns/adjectives
• Learning probabilistic distributions of visual features
• Learning phoneme sequences

Page 7: Language acquisition framework for robots: From grounded language acquisition to spoken dialogues

Symbol grounding: Learning nouns and adjectives

• Visual features modeled by Gaussians
  – Input: visual features of objects
• Out-of-vocabulary word = phoneme sequence + waveform
  – Voice conversion (Eigenvoice GMM) to robot voice

[Figure: generative models for “BLUE” and “RED” applied to an unknown object]
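To make this concrete, here is a minimal sketch (not LCore's implementation) of modeling each word's visual meaning with a single Gaussian over object feature vectors and scoring an unknown object; the class, the feature values, and all names are my own illustrative assumptions.

```python
# Minimal sketch: one Gaussian per word over visual feature vectors.
import numpy as np
from scipy.stats import multivariate_normal

class WordVisualModel:
    """Fits one Gaussian per word over visual features (e.g. color, shape)."""
    def __init__(self):
        self.models = {}  # word -> (mean, covariance)

    def fit(self, word, features):
        X = np.asarray(features)                 # shape: (num_examples, feature_dim)
        mean = X.mean(axis=0)
        cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])  # regularize
        self.models[word] = (mean, cov)

    def log_likelihood(self, word, feature):
        mean, cov = self.models[word]
        return multivariate_normal.logpdf(feature, mean=mean, cov=cov)

# Usage: decide which learned word best describes an unknown object's features.
model = WordVisualModel()
model.fit("BLUE", [[0.10, 0.20, 0.90], [0.15, 0.25, 0.85]])
model.fit("RED",  [[0.90, 0.10, 0.10], [0.85, 0.15, 0.12]])
unknown = [0.12, 0.22, 0.88]
best_word = max(model.models, key=lambda w: model.log_likelihood(w, unknown))
```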

Page 8: Language acquisition framework for robots: From grounded language acquisition to spoken dialogues

Imitation learning of object manipulation [Sugiura+ 07]

• Difficulty: Clustering trajectories in the world coordinate system does not work
• Proposed method
  – Input: position sequences of all objects
  – Estimation of the reference point and coordinate system by the EM algorithm
  – The number of states is optimized by cross-validation

[Figure: example task “Place A on B”]

Page 9: Language acquisition framework for robots: From grounded language acquisition to spoken dialogues

Imitation learning using reference-point-dependent HMMs [Sugiura+ 07][Sugiura+ 11]

• Delta parameters: from the position x_t at time t, velocity and acceleration features are computed as differences, e.g. Δx_t = x_t − x_{t−1} and ΔΔx_t = Δx_t − Δx_{t−1}
• Searching the optimal coordinate system: the reference object ID, the coordinate system type, and the HMM parameters are estimated jointly

* Sugiura, K. et al, “Learning, Recognition, and Generation of Motion by …”, Advanced Robotics, Vol.25, No.17, 2011
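A rough sketch of the search described above, under simplifying assumptions: instead of the EM-based estimation and the cross-validated number of states used in the paper, it simply trains one HMM per candidate (reference object, coordinate system type) pair and keeps the highest-likelihood combination. The function names, the hmmlearn dependency, and the coordinate-system handling are illustrative assumptions, not the authors' code.

```python
# Sketch: pick the reference object and coordinate system that best explain the demos.
import numpy as np
from hmmlearn.hmm import GaussianHMM  # assumed dependency for illustration

def to_reference_frame(trajectory, reference_pos, cs_type):
    """Express a trajectory relative to a candidate reference point.
    cs_type is a hypothetical label, e.g. 'reference' vs. 'world'."""
    if cs_type == "world":
        return trajectory
    return trajectory - reference_pos  # translate the origin to the reference object

def search_reference_and_cs(trajectories, object_positions, cs_types, n_states=5):
    best = (None, None, None, -np.inf)  # (ref_id, cs_type, hmm, log-likelihood)
    for ref_id, ref_pos in enumerate(object_positions):
        for cs_type in cs_types:
            seqs = [to_reference_frame(t, ref_pos, cs_type) for t in trajectories]
            X, lengths = np.vstack(seqs), [len(s) for s in seqs]
            hmm = GaussianHMM(n_components=n_states, covariance_type="diag").fit(X, lengths)
            ll = hmm.score(X, lengths)
            if ll > best[3]:
                best = (ref_id, cs_type, hmm, ll)
    return best
```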

Page 10: Language acquisition framework for robots: From grounded language acquisition to spoken dialogues

Results: motion learning

• Learned verbs: place-on, move-closer, raise, rotate, jump-over, move-away, move-down

[Figure: training-set log likelihood (position and velocity components) for the motion “place-on”]

• No verb is estimated to have the world coordinate system (WCS) -> reference-point-dependent verbs

Page 11: Language acquisition framework for robots: From grounded language acquisition to spoken dialogues

Transformation of reference-point-dependent HMMs [Sugiura+ 11]

• What is the problem?
  – Simple HMMs do not generate continuous trajectories
  – Trajectories are situation dependent
• Reference-point-dependent HMM
  – Input: (motion ID, object IDs), e.g. <place-on, Object 1, Object 3>
  – Output: maximum likelihood trajectory

[Figure: the HMM for “place-on” (“Place X on Y”) is transformed into the world CS according to the current situation]

* Sugiura, K. (2011), “Learning, Generation, and Recognition of Reference-Point-Dependent Probabilistic…”

Page 12: Language acquisition framework for robots: From grounded language acquisition to spoken dialogues

Generating continuous trajectory using delta parameters [Tokuda+ 00]

• Notation
  – q: state sequence, λ: HMM parameters
  – o: time series of (position, velocity, acceleration)
  – c: time series of positions
  – W: filter (window) matrix such that o = W c
  – μ: vector of mean vectors, Σ: matrix of covariance matrices of each output PDF along q
• Maximum likelihood trajectory: maximize P(o | q, λ) under the constraint o = W c, which gives
  c = (Wᵀ Σ⁻¹ W)⁻¹ Wᵀ Σ⁻¹ μ

* Tokuda, K. et al, “Speech parameter generation algorithms for HMM-based speech synthesis”, 2000
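A minimal sketch of this generation step, simplified to 1-D positions with static and delta (velocity) features only; the window matrix construction and the toy statistics are my own assumptions, not taken from the slides.

```python
# Sketch: maximum likelihood trajectory from per-frame (position, velocity) statistics.
import numpy as np

def build_window_matrix(T):
    """W maps positions c (length T) to stacked [position; velocity] features (length 2T)."""
    W = np.zeros((2 * T, T))
    for t in range(T):
        W[2 * t, t] = 1.0                  # static feature: x_t
        W[2 * t + 1, t] = 1.0              # delta feature: x_t - x_{t-1} (x_{-1} treated as 0)
        if t > 0:
            W[2 * t + 1, t - 1] = -1.0
    return W

def ml_trajectory(means, variances):
    """means, variances: per-frame [position, velocity] statistics along the chosen
    state sequence; returns c = (W' S^-1 W)^-1 W' S^-1 mu."""
    T = len(means)
    W = build_window_matrix(T)
    mu = np.asarray(means).reshape(-1)                          # stacked mean vector
    S_inv = np.diag(1.0 / np.asarray(variances).reshape(-1))    # diagonal precision
    A = W.T @ S_inv @ W
    b = W.T @ S_inv @ mu
    return np.linalg.solve(A, b)

# Usage: positions move smoothly from 0 toward 1 following the requested velocities.
means = [[0.0, 0.0], [0.5, 0.5], [1.0, 0.5], [1.0, 0.0]]
variances = [[0.01, 0.1]] * 4
print(ml_trajectory(means, variances))
```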

Page 13: Language acquisition framework for robots: From grounded language acquisition to spoken dialogues

Quantitative results

• Evaluation measure
  – Euclidean distance between the trajectory generated by the proposed method and the trajectory demonstrated by the subject
  – Normalized by the number of frames T
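A hedged reading of this measure (the symbols are mine, not from the slide): with generated positions \hat{x}_t and the subject's positions x_t over T frames,

E = \frac{1}{T} \sum_{t=1}^{T} \lVert \hat{x}_t - x_t \rVert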

Page 14: Language acquisition framework for robots: From grounded language acquisition to spoken dialogues

SPOKEN LANGUAGE UNDERSTANDING USING NON-LINGUISTIC INFORMATION

Page 15: Language acquisition framework for robots: From grounded language acquisition to spoken dialogues

Utterance understanding in LCore (1)

• User utterances are understood by using multimodal information learned in a statistical learning framework
• Shared belief integrates five modules:
  – Speech (HMM)
  – Motion (HMM)
  – Vision (Bayesian learning of a Gaussian)
  – Motion-object relationship (Bayesian learning of a Gaussian)
  – Context (MCE learning)

Page 16: Language acquisition framework for robots: From grounded language acquisition to spoken dialogues

Integration of multimodal information

• Shared belief Ψ: weighted sum of five modules
  – Speech
  – Motion
  – Vision
  – Motion-object relationship
  – Context

[Figure: the five module scores are combined over the utterance, action, scene, and context]
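A minimal sketch of the weighted-sum integration, assuming each module exposes a log-likelihood-style score for a candidate interpretation; the weights, module interfaces, and function names are hypothetical placeholders, not LCore's actual API.

```python
# Sketch: shared belief as a weighted sum of per-module scores.
import numpy as np

def shared_belief(candidate, scene, context, modules, weights):
    """candidate: one interpretation (action + objects) of the utterance.
    modules: dict of name -> function returning a log-likelihood-style score."""
    return sum(weights[name] * modules[name](candidate, scene, context)
               for name in modules)

def understand(candidates, scene, context, modules, weights):
    """Pick the interpretation that maximizes the shared belief Psi."""
    scores = [shared_belief(c, scene, context, modules, weights) for c in candidates]
    return candidates[int(np.argmax(scores))], scores

# Toy usage with two fake modules (purely illustrative scoring functions).
modules = {"speech": lambda c, s, ctx: -len(c),
           "vision": lambda c, s, ctx: 1.0 if "red" in c else 0.0}
weights = {"speech": 0.1, "vision": 1.0}
best, _ = understand(["place red cup", "place cup"], None, None, modules, weights)
```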

Page 17: Language acquisition framework for robots: From grounded language acquisition to spoken dialogues

Inter-module learning

• Multimodal understanding
• Confidence learning
• Utterance/Motion generation

[Figure: the user intention “Place Elmo on box” may be expressed as “Place Elmo” or “Place it”]

Page 18: Language acquisition framework for robots: From grounded language acquisition to spoken dialogues

Grounded utterance disambiguation

• Simple dialog systems
  U: “Place the cup (on the table).”
  R: “You said place the cup.”
  -> Risk of motion failure (Which “cup”? Where to?)
• Generating confirmation utterances using physical information
  R: “I’ll place the red cup on the table, is it OK?”

Page 19: Language acquisition framework for robots: From grounded language acquisition to spoken dialogues

Multimodal utterance understanding

[Figure: candidate interpretations of the command “Place-on Elmo”, ranked 1st, 2nd, …, 30th]

Sugiura, K. et al, "Situated Spoken Dialogue with Robots Using Active Learning", Advanced Robotics, 2011

Page 20: Language acquisition framework for robots: From grounded language acquisition to spoken dialogues

Multimodal utterance understanding

[Figure: candidate interpretations of “Place-on Elmo”, ranked 1st, 2nd, …, 30th, with the margin between the top-ranked candidates' scores]

Sugiura, K. et al, "Situated Spoken Dialogue with Robots Using Active Learning", Advanced Robotics, 2011

Page 21: Language acquisition framework for robots: From grounded language acquisition to spoken dialogues

Confirmation by paraphrasing user’s utterance


• Learning phase
  – Bayesian logistic regression
  – Input: margin d, Output: probability
• Execution phase
  – Decision-making on responses based on expected utility

[Figure: learned curve mapping the margin to a probability]
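A minimal sketch of this two-phase scheme, with two stated simplifications: plain (non-Bayesian) logistic regression stands in for Bayesian logistic regression, and the margins, labels, and utility values are invented toy numbers.

```python
# Sketch: map the margin d to a probability, then choose a response by expected utility.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Learning phase: probability that the top interpretation is correct, given the margin.
margins = np.array([[0.1], [0.3], [0.5], [1.2], [2.0], [2.5]])   # toy data
correct = np.array([0, 0, 1, 1, 1, 1])
clf = LogisticRegression().fit(margins, correct)

def respond(margin, utilities):
    """Execution phase: choose the response with the highest expected utility.
    utilities[action] = (utility if understanding is correct, utility if it is wrong)."""
    p = clf.predict_proba([[margin]])[0, 1]
    expected = {a: p * u_ok + (1 - p) * u_ng for a, (u_ok, u_ng) in utilities.items()}
    return max(expected, key=expected.get), expected

# Example: executing a wrong action is costly, confirming has a small fixed cost.
action, eu = respond(0.4, {"execute": (10, -50), "confirm": (5, 5), "reject": (-5, 0)})
```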

Page 22: Language acquisition framework for robots: From grounded language acquisition to spoken dialogues

Quantitative result: Risk reduction

[Figure: baseline vs. proposed on failure rate, rejection rate, confirmation rate, and number of confirmation utterances; the failure rate decreased to 1/4]

Page 23: Language acquisition framework for robots: From grounded language acquisition to spoken dialogues

Reduction of motion failure in learning phase [Sugiura+ 11]

• So far…
  – Learning utterance understanding probabilities
• Idea: Learning-by-asking

Phase                 Operator   Motion executor
Active learning       Robot      User
(Passive) learning    User       Robot
Execution             User       Robot

Sugiura, K. et al, "Situated Spoken Dialogue with Robots Using Active Learning", Advanced Robotics, Vol. 25, No. 17, 2011

Page 24: Language acquisition framework for robots: From grounded language acquisition to spoken dialogues

Reduction of motion failure in learning phase

• Problem:
  – Motion failure is required in the learning phase to avoid over-fitting

[Figure: motion successes and failures in the learning, active learning, and execution phases; the active learning phase provides “safe” training data]

Page 25: Language acquisition framework for robots: From grounded language acquisition to spoken dialogues

What kind of commands are effective for learning?

Target action          Robot utterance             Loss
Act=A, Objs=<1,3>      “Place-on Elmo blue box”    35.8
Act=A, Objs=<1,3>      “Place-on Elmo”             12.3
Act=A, Objs=<1,2>      “Place-on Elmo”             28.1
:                      :                           :
Act=B, Objs=<2>        “Raise box”                 332.3
:                      :                           :

• Proposed method: Active-learning-based command generation
• Objective: Reduce the number of interactions
• Input = image, Output = utterance
• Expected Log Loss Reduction (ELLR [Roy, 2001]) is used to select the optimal utterance (see the sketch below)
• Active learning: a form of supervised learning in which the inputs can be selected by the algorithm
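A rough sketch of ELLR-style utterance selection as described above: for each candidate utterance, the expected log loss over an unlabeled pool is estimated after hypothetically adding each predicted outcome, and the utterance with the lowest expected loss (i.e. the largest reduction) is chosen. The model interface (predict_proba, predict_outcomes, clone_and_update) is a hypothetical placeholder, not the LCore implementation.

```python
# Sketch: choose the next robot utterance by Expected Log Loss Reduction.
import numpy as np

def expected_log_loss(model, unlabeled_pool):
    """Average negative log probability of the model's own predictions over the pool."""
    losses = []
    for x in unlabeled_pool:
        probs = model.predict_proba(x)            # distribution over possible actions
        losses.append(-np.sum(probs * np.log(probs + 1e-12)))
    return float(np.mean(losses))

def select_utterance(model, candidate_utterances, unlabeled_pool):
    """Pick the candidate whose expected outcome reduces the expected log loss the most."""
    best_utt, best_loss = None, np.inf
    for utt in candidate_utterances:
        # Expectation over the outcomes the model currently predicts for this utterance.
        loss = 0.0
        for outcome, p in model.predict_outcomes(utt):
            hypothetical = model.clone_and_update(utt, outcome)  # retrain with one more label
            loss += p * expected_log_loss(hypothetical, unlabeled_pool)
        if loss < best_loss:
            best_utt, best_loss = utt, loss
    return best_utt
```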

Page 26: Language acquisition framework for robots: From grounded language acquisition to spoken dialogues

Utterance generation by ELLR

Page 27: Language acquisition framework for robots: From grounded language acquisition to spoken dialogues

Reduction of motion failure in learning phase

[Figure: test-set likelihood vs. number of episodes for (1) proposed and (2) baseline, and number of motion failures for proposed vs. baseline]

• Fast convergence
• Motion failure risk reduced