Language acquisition framework for robots: From grounded language acquisition to spoken dialogues


LCore: A Language Acquisition Framework for Robots
From Grounded Language Acquisition to Spoken Dialogues

Komei Sugiura and Naoto Iwahashi
National Institute of Information and Communications Technology, Japan
komei.sugiura@nict.go.jp

2013/12/13

Open problem: grounded language processing

• Language processing based on non-verbal information (vision, motion, context, experience, …) is still very difficult
  – e.g. “Put the blue cup away”, “Give me the usual”

“blue cup”: multiple candidates; “the usual”: umbrella, remote, drink, …

• What is missing in dialogue processing for robots?
  – Physical situatedness / symbol grounding
  – Shared experience


Spoken dialogue system + Robot ≠ Robot dialogue

• Robot dialogue
  – Categorization/prediction of real-world information
  – Handling real-world properties
  – Linguistic interaction

• Why is this difficult?
  – Machine learning, CV, manipulation, the symbol grounding problem, speech recognition, …

[Figure: object category hierarchy, e.g. Tableware -> Cutlery (Fork, Knife), Tea cup, Cup, Plate]

Robot Language Acquisition Framework [Iwahashi 10, “Robots That Learn to Communicate: A Developmental Approach…”]

• Task: object manipulation dialogues
• Key features
  – Fully grounded vocabulary
  – Imitation learning
  – Incremental & interactive learning
  – Language independent


LCore functions

• Phoneme learning
• Word learning
• Grammar learning
• Disambiguation of word ellipsis
• Utterance understanding
• Robot-directed utterance detection
• Learning question answering
• Visual feature learning
• Affordance learning
• Imitation learning
• Role reversal imitation
• Active-learning-based dialogue

Learning modules

• Word module
  – Learning nouns/adjectives
  – Learning probability distributions of visual features
  – Learning phoneme sequences
• Grammar module
• Motion-object relationship module
  – Learning verbs
  – Estimation of related objects
  – Learning trajectories
  – Learning phoneme sequences

Symbol grounding: Learning nouns and adjectives

• Visual features modeled by Gaussians
  – Input: visual features of objects
• Out-of-vocabulary word = phoneme sequence + waveform
  – Voice conversion (eigenvoice GMM) to the robot's voice

[Figure: generative models for “BLUE” and “RED” applied to an unknown object]
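To make the Gaussian word models concrete, here is a minimal sketch (not LCore's actual code): each word's meaning is a Gaussian over visual features, and an unknown object is grounded to the word whose model explains it best. The (hue, saturation) features and all numbers are hypothetical.

```python
# Minimal sketch: a word's meaning as a Gaussian over visual features.
import numpy as np
from scipy.stats import multivariate_normal

def fit_word_model(features):
    """Fit a Gaussian to the visual features observed with a word."""
    features = np.asarray(features)
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False) + 1e-6 * np.eye(features.shape[1])
    return multivariate_normal(mean=mean, cov=cov)

# Hypothetical training data: (hue, saturation) features of labeled objects
models = {
    "blue": fit_word_model([[0.60, 0.8], [0.62, 0.7], [0.58, 0.9]]),
    "red":  fit_word_model([[0.02, 0.9], [0.04, 0.8], [0.01, 0.7]]),
}

unknown = np.array([0.59, 0.75])
# Ground the unknown object to the word whose Gaussian explains it best
best = max(models, key=lambda w: models[w].logpdf(unknown))
print(best)
```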

Imitation learning of object manipulation [Sugiura+ 07]

• Difficulty: clustering trajectories in the world coordinate system does not work
• Proposed method
  – Input: position sequences of all objects
  – Estimation of the reference point and coordinate system by the EM algorithm
  – Number of states optimized by cross-validation

[Figure: example task “Place A on B”]
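The cross-validation step for the number of states can be sketched as follows; hmmlearn's GaussianHMM and the `trajectories` variable are stand-ins assumed for illustration, not the paper's implementation.

```python
# Sketch: choosing the number of HMM states by cross-validation.
import numpy as np
from hmmlearn.hmm import GaussianHMM

def cv_loglik(trajectories, n_states, n_folds=3):
    """Held-out log likelihood for a given number of states."""
    folds = [trajectories[i::n_folds] for i in range(n_folds)]
    total = 0.0
    for i in range(n_folds):
        train = [t for j, f in enumerate(folds) if j != i for t in f]
        model = GaussianHMM(n_components=n_states, n_iter=50)
        model.fit(np.vstack(train), lengths=[len(t) for t in train])
        total += sum(model.score(t) for t in folds[i])
    return total

# best_n = max(range(2, 8), key=lambda n: cv_loglik(trajectories, n))
```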

Imitation learning using reference-point-dependent HMMs [Sugiura+ 07][Sugiura+ 11]

• Delta parameters: with $x_t$ the position at time $t$, the observation is augmented as
  $\Delta x_t = x_t - x_{t-1}, \quad \Delta^2 x_t = \Delta x_t - \Delta x_{t-1}$
• Searching the optimal coordinate system: the reference object ID $k$ and the coordinate system type $c$ are chosen to maximize the likelihood under the HMM parameters $\lambda$:
  $(\hat{k}, \hat{c}) = \mathop{\mathrm{argmax}}_{k,c} \; p(Y^{(k,c)} \mid \lambda)$
  where $Y^{(k,c)}$ denotes the trajectory expressed in the coordinate system of type $c$ attached to reference object $k$.

* Sugiura, K. et al, “Learning, Recognition, and Generation of Motion by …”, Advanced Robotics, Vol.25, No.17, 2011
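A sketch of this search under simplifying assumptions (hmmlearn as the HMM implementation, only two candidate coordinate-system types, 2-D positions): the pair (k, c) with the highest trajectory likelihood is selected.

```python
# Sketch of the likelihood-based reference-point / coordinate-system
# search; the candidate transforms are illustrative, not the paper's
# exact formulation.
import numpy as np
from hmmlearn.hmm import GaussianHMM

def to_candidate_frame(traj, ref_pos, cs_type):
    """Express a world-frame trajectory in a candidate coordinate system."""
    if cs_type == "translated":   # origin moved onto the reference object
        return traj - ref_pos
    if cs_type == "world":        # world coordinate system, unchanged
        return traj
    raise ValueError(cs_type)

def best_reference(model: GaussianHMM, traj, object_positions):
    """Return (reference object ID, CS type) maximizing the HMM likelihood."""
    scored = [(k, cs, model.score(to_candidate_frame(traj, pos, cs)))
              for k, pos in enumerate(object_positions)
              for cs in ("translated", "world")]
    k, cs, _ = max(scored, key=lambda t: t[2])
    return k, cs
```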

Results: motion learning

[Figure: training-set log likelihood (position and velocity) for the motion “place-on”, and learned motions: place-on, move-closer, raise, rotate, jump-over, move-away, move-down; HMM “Place-on” learned from “Place X on Y”]

• No verb is estimated to have a world coordinate system (WCS) -> verbs are reference-point-dependent

Transformation of reference-point-dependent HMMs [Sugiura+ 11]

• What is the problem?
  – Simple HMMs do not generate continuous trajectories
  – Trajectories are situation dependent
• Reference-point-dependent HMM
  – Input: (motion ID, object IDs), e.g. <place-on, Object 1, Object 3>
  – Output: maximum likelihood trajectory

[Figure: HMM “Place-on” transformed into the world coordinate system of the current situation, “Place X on Y”]

* Sugiura, K. (2011), “Learning, Generation, and Recognition of Reference-Point-Dependent Probabilistic…”

Generating continuous trajectory using delta parameters [Tokuda+ 00]

Let $q$ be the state sequence, $\lambda$ the HMM parameters, $o$ the time series of (position, velocity, acceleration), and $c$ the time series of positions, related through the filter $W$ by $o = Wc$. The maximum likelihood trajectory is

$\hat{c} = \mathop{\mathrm{argmax}}_{c} \; p(Wc \mid q, \lambda)$

which is obtained by solving

$W^{\top} \Sigma^{-1} W \hat{c} = W^{\top} \Sigma^{-1} \mu$

where $\mu$ is the vector of mean vectors and $\Sigma$ the matrix of covariance matrices of each OPDF (output probability density function).

* Tokuda, K. et al, “Speech parameter generation algorithms for HMM-based speech synthesis”, 2000
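The normal equations above can be solved directly. The following minimal sketch does so for 1-D positions with a position+velocity window only, using illustrative means and variances rather than values from a trained HMM.

```python
# Minimal sketch of ML trajectory generation with delta parameters
# ([Tokuda+ 00] style); all numbers below are illustrative.
import numpy as np

T = 5
mu = np.zeros(2 * T)      # stacked (position, velocity) means per frame
var = np.ones(2 * T)      # stacked diagonal variances per frame
mu[0::2] = [0.0, 0.5, 1.0, 1.0, 1.0]   # position means along the states
mu[1::2] = [0.0, 0.5, 0.5, 0.0, 0.0]   # velocity means (delta features)

# Filter W maps positions c to observations o = (c_t, c_t - c_{t-1})
W = np.zeros((2 * T, T))
for t in range(T):
    W[2 * t, t] = 1.0                  # position row
    W[2 * t + 1, t] = 1.0              # velocity row: c_t - c_{t-1}
    if t > 0:
        W[2 * t + 1, t - 1] = -1.0

Sinv = np.diag(1.0 / var)              # Sigma^{-1} (diagonal here)
# Solve the normal equations  W^T Sigma^{-1} W c = W^T Sigma^{-1} mu
c = np.linalg.solve(W.T @ Sinv @ W, W.T @ Sinv @ mu)
print(c)   # smooth position trajectory consistent with the deltas
```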

Quantitative results

• Evaluation measure
  – Euclidean distance between the trajectory generated by the proposed method and the trajectory demonstrated by the subject
  – Normalized by the number of frames $T$

[Figure: trajectory by the proposed method vs. trajectory by the subject]
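A sketch of this measure as stated on the slide (the paper's exact normalization may differ):

```python
# Frame-normalized Euclidean distance between two trajectories.
import numpy as np

def normalized_error(generated, reference):
    generated, reference = np.asarray(generated), np.asarray(reference)
    T = len(reference)
    return np.linalg.norm(generated - reference, axis=1).sum() / T
```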

SPOKEN LANGUAGE UNDERSTANDING USING NON-LINGUISTIC INFORMATION

Utterance understanding in LCore (1)

• User utterances are understood using multimodal information learned in a statistical learning framework
• Five modules feed a shared belief:
  – Speech (HMM)
  – Motion (HMM)
  – Vision (Bayesian learning of a Gaussian)
  – Motion-object relationship (Bayesian learning of a Gaussian)
  – Context (MCE learning)

Integration of multimodal information

• Shared belief $\Psi$: weighted sum of the five modules, evaluated for an utterance $s$, action $a$, scene $O$, and context $q$:
  $\Psi(s, a, O, q) = \gamma_1 \log p_{\mathrm{speech}} + \gamma_2 \log p_{\mathrm{motion}} + \gamma_3 \log p_{\mathrm{vision}} + \gamma_4 \log p_{\mathrm{rel}} + \gamma_5 \log p_{\mathrm{context}}$
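As a toy illustration of this integration (the weights and all module scores are made-up numbers, not learned values from the system):

```python
# Illustrative sketch of the shared belief as a weighted sum of the
# five modules' log-probability scores.
WEIGHTS = {"speech": 1.0, "motion": 0.8, "vision": 0.9,
           "motion_object": 0.7, "context": 0.5}

def shared_belief(log_probs):
    """Psi(s, a, O, q): weighted sum over the five modules."""
    return sum(WEIGHTS[m] * lp for m, lp in log_probs.items())

# Two candidate interpretations of one utterance, scored per module
candidates = [
    {"speech": -12.1, "motion": -3.2, "vision": -1.0,
     "motion_object": -0.4, "context": -0.7},
    {"speech": -12.3, "motion": -3.0, "vision": -4.2,
     "motion_object": -0.5, "context": -0.9},
]
ranked = sorted(candidates, key=shared_belief, reverse=True)
# The margin (used on the following slides) is the gap between the top two
margin = shared_belief(ranked[0]) - shared_belief(ranked[1])
```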

Inter-module learning

[Figure: inter-module loop of multimodal understanding, confidence learning, and utterance/motion generation; the user intention “Place Elmo on box” may be uttered elliptically as “Place Elmo” or “Place it”]

Grounded utterance disambiguation

• Simple dialogue systems
  U: “Place the cup (on the table).”
  R: “You said place the cup.”
  -> Risk of motion failure: which “cup”? where to?
• Generating confirmation utterances using physical information
  R: “I’ll place the red cup on the table, is it OK?”

Multimodal utterance understanding

[Figure: candidate interpretations of “Place-on Elmo”, ranked 1st to 30th by shared belief]

Sugiura, K. et al, “Situated Spoken Dialogue with Robots Using Active Learning”, Advanced Robotics, 2011

[Figure: the margin is the gap in shared belief between the 1st- and 2nd-ranked interpretations of “Place-on Elmo” among the 30 candidates]

Confirmation by paraphrasing the user's utterance

• Learning phase
  – Bayesian logistic regression
  – Input: margin $d$; output: probability that the utterance was understood correctly

[Figure: logistic curve mapping margin to probability]

• Execution phase
  – Decision-making on responses based on expected utility
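A sketch of this decision rule, using a point-estimate logistic model in place of full Bayesian logistic regression; the weights and the utility table are illustrative assumptions, not values from the paper.

```python
# Map the margin d to a success probability, then choose the response
# (execute / confirm / reject) with the highest expected utility.
import math

W0, W1 = -1.0, 2.5                      # assumed learned logistic weights

def p_correct(margin: float) -> float:
    return 1.0 / (1.0 + math.exp(-(W0 + W1 * margin)))

UTILITY = {                             # utility given (response, correct?)
    "execute": {True: 10.0, False: -20.0},
    "confirm": {True:  5.0, False:   2.0},
    "reject":  {True: -5.0, False:   0.0},
}

def choose_response(margin: float) -> str:
    p = p_correct(margin)
    expected = {r: p * u[True] + (1 - p) * u[False]
                for r, u in UTILITY.items()}
    return max(expected, key=expected.get)

print(choose_response(0.1))   # low margin -> "confirm"
print(choose_response(3.0))   # high margin -> "execute"
```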

Quantitative result: risk reduction

[Table: failure rate, rejection rate, confirmation rate, and number of confirmation utterances for the baseline vs. the proposed method; the failure risk decreased to 1/4]

Reduction of motion failure in learning phase [Sugiura+ 11]

• So far: learning utterance understanding probabilities
• Idea: learning-by-asking

Phase               | Operator | Motion executor
Active learning     | Robot    | User
(Passive) learning  | User     | Robot
Execution           | User     | Robot

[Figure: motion success vs. motion failure]

Sugiura, K. et al, “Situated Spoken Dialogue with Robots Using Active Learning”, Advanced Robotics, Vol. 25, No. 17, 2011

Reduction of motion failure in learning phase

• Problem: motion failure is required in the learning phase to avoid over-fitting
• In the active learning phase, the robot asks the user for demonstrations, yielding “safe” training data

[Figure: motion success/failure across the learning, active learning, and execution phases]

Target action Robot utterance Loss

Act=A, Objs = <1,3> “Place-on Elmo blue box” 35.8

Act=A, Objs = <1,3> “Place-on Elmo” 12.3

Act=A, Objs= <1, 2> “Place-on Elmo” 28.1: : :

Act=B, Objs=<2> “Raise box” 332.3: : :

• Proposed method: Active Learning-based command generation• Objective: Reduce the number of interactions• [Input = image], [Output = utterance]• Expected Log Loss Reduction(ELLR[Roy, 2001]) is used to select

the optimal utteranceActive Learning : A form of supervised learning in which inputs can be selected by the algorithm

Utterance generation by ELLR
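A toy sketch of the ELLR idea, with a Beta-Bernoulli stand-in for the utterance understanding model (all modeling details here are assumptions for illustration, not LCore's implementation): the robot asks the command whose hypothetical outcome is expected to reduce the model's log loss the most.

```python
# Toy ELLR: pick the utterance whose expected post-observation log loss
# drops the most relative to the current log loss.
import math

class BetaModel:
    """Toy model: probability that a command type is understood correctly."""
    def __init__(self, a=1.0, b=1.0):
        self.a, self.b = a, b
    def p(self):
        return self.a / (self.a + self.b)
    def updated(self, success: bool):
        return BetaModel(self.a + success, self.b + (not success))
    def log_loss(self):
        p = self.p()
        return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def expected_future_loss(model: BetaModel) -> float:
    """Expected log loss after one more (hypothetical) observation."""
    p = model.p()
    return (p * model.updated(True).log_loss()
            + (1 - p) * model.updated(False).log_loss())

# One model per candidate utterance; ask the command with the largest
# expected loss reduction (= smallest expected_future - current)
models = {"Place-on Elmo": BetaModel(2, 1), "Raise box": BetaModel(1, 3)}
best = min(models,
           key=lambda u: expected_future_loss(models[u]) - models[u].log_loss())
print(best)
```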

Reduction of motion failure in learning phase

[Figure: test-set likelihood vs. number of episodes for (1) proposed and (2) baseline, and number of motion failures for each]

• Fast convergence; motion failure risk reduced