
Learning Interaction Protocols through imitation

A data mining approach

Yasser Mohammad

Nishida Lab.

Artificial Intelligence, Adv (E) (2013) Do not distribute beyond this class

Situated Modules

Used in many systems until now mainly with Robovie

Situated modules are executed in serial

[Ishiguro et al. 1999]

Route Guidance Listener (2006)

Analyze Human-Human Interactions

Implement Model

Evaluate Model

Tune/Adapt (Supervised)

Redesign

Model Controller

Parameter Adjustment

Structure Adjustment

2 WOZ experiments using motion captured data

[Kanda et al. 2007]

Engineering vs. Learning Approaches

Standard Engineering Approach

Analyze Human-Human Interactions

Implement Model

Evaluate Model

Redesign

Model Controller

Parameter Adjustment

Structure Adjustment

Learning/Imitation Approach

Collect Human-Human Interactions

Develop

Interact

Adapt (Unsupervised)

Controller

Training Data

Parameter & Structure Adjustment

Example Scenarios

Gaze Control During Listening (Implicit)

Guided Navigation (Explicit)

Bird’s Eye View

Learner

Watch External Behavior

Learn Actions’ Model

Learn Commands’ Model

Learn Communication Protocol

Main Insights

1. Learning by watching is ubiquitous in humans

2. Learning actions and learning commands are related

3. Change in behavior is what matters

Operator Actor

Commands

Feedback

Actions

Interaction

Model of Commands

Model of Actions

Communication Protocol

Shared ground

Co-action

Primordial Knowledge Model

models and protocol

action

models and protocol

action

Our Long Term Model

Learner Robot

Watch Mimic

Interact Adapt

Learned models and protocol

Learned action

Adapted models and protocol

Adapted actions

Basic Architecture

Execution Time

Activation Level

Behavioral Influence

Design Procedure

Analyze Human-Human Interactions

Implement Model

Evaluate Model

Redesign

Model Controller

Parameter Adjustment

Structure Adjustment

Analyze Task & Required Basic Actions

Decide Required Behavior (H-H Interactions)

Learn Parameters (FPGA)

Evaluate

Intentions

Structure Adjustment

Processes

Redesign

Example: Gaze Control during Listening

Sensors

Perception Processes

Behavior Processes

Intentions

Floating Point Genetic Algorithm

Crossover

• Select 2 individuals and generate 4
• Calculate the probability of passing

Mutation

1. Calculate probabilities over 1~m
2. Calculate P(mutation @ k)
3. Select the mutation site according to P(mutation @ k)
4. Mutate the parameter

Stages: Elitism, Crossover, Mutation, Tournament

FPGA – Preliminary Evaluation

Fitness function:

• 100 generations
• 100 individuals
• Two comparison algorithms (A1, A2)

Proposed > A1: p = 0.0133

Proposed > A2: p = 0.0032

[Mohammad & Nishida 2010d]

Applications – Gaze Control

• Fixed Structure Gaze Controller (18 parameters)
• Dynamic Structure Gaze Controller (7 parameters)

Applications – Gaze Control: Fixed vs. Dynamic Structure GC

• Six novel sessions
• Four control GCs

Follow, Stare, Random


Discovery Phase: Constrained Motif Discovery

Association Phase: Bayesian Network Induction

Controller Generation: Piecewise Linear Controller Gen.

Learning by watching/imitation/mimicry

Operator Actor

Learner

Commands

Feedback

Actions

Operator Learned Actor

1. Watch

2. Learn

3. Act

Commands

Feedback

Actions

Command stream

Action stream

Discrete Commands

Discrete Actions

Behavior Generation Model

online Robot/Agent Controller

Interaction Protocol

Learned Interaction Protocol

offline

Feedback Controller

23340003204402 320000310

Building Blocks

• Behavior Discovery: Motif Discovery, Change Point Detection
• Behavior Association: Bayesian Network Induction, Causality Analysis
• Behavior Generation: Piecewise Linear Controller Generation
• Behavior Adaptation: Bayesian Network Combination

Gaze Control: Data Collection Experiment

• 44 participants, ages 19-37 (27% females), untrained to interact with robots
• Two objects (chair/stepper): easily assembled (7 steps each), but not trivially so (2 ordering steps each)
• Two roles:
  ▫ Instructor: explains a single object three times, to a good listener, a bad listener, and a robot
  ▫ Listener: listens to two explanations about two objects, once as a good listener and once as a bad listener

Gaze Control: Evaluation Experiment

• Internet poll, 35 subjects
• Each subject watches 2 videos: the ISL learned controller and a carefully designed controller
• Age: ranged from 24 to 43, with an average of 31.16 years
• Gender: 8 females and 30 males
• Experience in dealing with robots: ranged from "I never saw one before" to "I program robots routinely"
• Expectation of robot attention, on a scale from 1 to 7: 4 ± 1.376
• Expectation of the naturalness of the robot's behavior, on a scale from 1 to 7: 3.2 ± 1.255
• Expectation of the human-likeness of the robot's behavior, on a scale from 1 to 7: 3.526 ± 1.52

Gaze Control:Example Session




Causality Analysis



(1) Behavior Discovery (Proposed)

The command stream and the action stream each pass through the same blocks:

• Discover Change Points (Robust Singular Spectrum Transform)
• Discover Motifs (Constrained Motif Discovery)
• Remove Irrelevant Dimensions

Natural Delay Discovery (Granger-Causality Maximization) relates the two streams.

Advantages

• Utilizes the relation between actions and commands
• Removes irrelevant dimensions
• No need for a separate clustering step
• No predefined model

Signals in the diagram: $A(t)$, $G(t)$, $GC(t)$, $AC(t)$, $\hat{AC}$, $\hat{GC}$

Motif Discovery

Given a time series (an ordered list of real numbers), find approximately recurring subsequences [Chiu 2003].

Motif Discovery

• Given a time series X(t) find recurring patterns of length L using distance function D
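The definition above can be illustrated with a brute-force search. This is only a sketch of the problem statement, not one of the algorithms discussed in these slides: the function name `best_motif_pair`, the choice of Euclidean distance for D, and the toy series are all assumptions made for illustration.

```python
import numpy as np

def best_motif_pair(x, L):
    """Return (distance, i, j) for the closest pair of non-overlapping length-L windows."""
    x = np.asarray(x, dtype=float)
    n_windows = len(x) - L + 1
    best = (np.inf, -1, -1)
    for i in range(n_windows):
        # Starting j at i + L skips overlapping windows (trivial matches).
        for j in range(i + L, n_windows):
            d = np.linalg.norm(x[i:i + L] - x[j:j + L])  # Euclidean distance as D
            if d < best[0]:
                best = (d, i, j)
    return best

# Toy series (hypothetical): a strictly convex background, so no two background
# windows coincide, with the same pattern planted at positions 10 and 40.
series = 0.1 * np.arange(60) ** 1.5
pattern = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
series[10:15] = pattern
series[40:45] = pattern

dist, i, j = best_motif_pair(series, L=5)
print(dist, i, j)
```

The search is quadratic in the series length, which is the cost that the faster and constrained algorithms on the following slides try to avoid.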

Catalano’s Algorithm

[Figure labels: candidate, noise, comparison, motif, compare]

[Catalano 2006]

Keep top k rather than best only

Constrained Motif Discovery

• Given a time series X(t), find recurring patterns of length between L1 and L2 using distance function D, subject to the constraint P(t), where P(t) is an estimate of the probability that a motif occurrence exists near time step t.

A motif is likely near here

DGCMD

Signal

Constraint

Distance matrix:

    0   33.4   14.5   34.5
 33.4      0  22.43   2.31
 14.5  22.43      0  17.43
 34.5   2.31  17.43      0

After thresholding with Tcon:

    0      0      0      0
    0      0      0   2.31
    0      0      0      0
    0   2.31      0      0

[Mohammad & Nishida 2009]
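The thresholding step suggested by the two matrices on this slide can be sketched as follows. The distance values are the ones shown on the slide; the numeric threshold `T_con = 5.0` is a hypothetical value chosen only so that the single surviving entry matches the slide, and reading the rows and columns as candidate subsequences is an assumption.

```python
import numpy as np

# Pairwise distance matrix as shown on the slide.
D = np.array([
    [0.0, 33.4, 14.5, 34.5],
    [33.4, 0.0, 22.43, 2.31],
    [14.5, 22.43, 0.0, 17.43],
    [34.5, 2.31, 17.43, 0.0],
])

T_con = 5.0  # hypothetical threshold; the slide does not give its value

# Keep only off-diagonal distances that fall under the threshold.
linked = np.where((D > 0) & (D <= T_con), D, 0.0)

# Index pairs that remain linked after thresholding.
pairs = [(r, c) for r in range(len(D)) for c in range(r + 1, len(D)) if linked[r, c] > 0]

print(linked)
print(pairs)
```

Only the pair with distance 2.31 survives, so only those two entries would be treated as occurrences of the same motif.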

DGCMD

Advantages:

• Controlled exhaustiveness (# candidates)
• Controlled sensitivity (Tc)
• No random subwindow, as needed by some MD algorithms
• No upper bound on motif size, as needed by most MD algorithms

Disadvantages:

• Can become quadratic if # candidates is large
• Sensitive to outlier segments (long subwindows of outliers)

DGCMD – Evaluation

• 50440 time series
• Variable length ($10^2$ to $10^6$)
• Variable noise level (0~20% PP)
• Variable motif types
• Variable number of occurrences
• Motif discovery algorithms: Projections (most accurate), Catalano et al. (fastest)
• Constrained motif discovery algorithms: MCFull, MCInc, DGCMD

[Mohammad & Nishida 2009]

How good is the constraint?

Probability of discovering a motif, not using the constraint and using the constraint, in terms of:

• Time series length
• Number of motif occurrences
• Window length
• Average motif length
• Relative entropy between constraint and motif locations
• Entropy of the constraint

[Mohammad & Nishida 2010a]

How to get the constraint?

Main insight

• The generating dynamics change near the beginning and end of motifs.
• We need to find the points in the time series where the generating dynamics change.







Change Point Discovery

Given a time series X(t), find for every time step the probability that X(t) is changing form, that is, that the underlying generating dynamics are changing.

Available Techniques

• CUSUM: detects only changes in the mean
• Inflection Point Detection: assumes any variation is a change
• Autoregressive Modeling: assumes a specific generating model
• Mixtures of Gaussians: assumes a specific generating model
• Discrete Cosine Transform: finds only global changes
• Wavelet Analysis: tons of parameters
• Singular Spectrum Transform (SST) [Ide et al. 2005]: most general, no ad-hoc adjustment

Main idea

At every point:

1. Use a few values before it to represent the past: H

2. Use a few values after it to represent the future: G

3. Compare the past with the future. The more dissimilar they are, the higher the score.

Past: H is a hyperplane

Future: G is a set of eigenvectors

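Singular Spectrum Transform

Past (H):

$seq(t) = [x(t-w+1), \ldots, x(t)]^T$

$H(t) = [seq(t-n); \ldots; seq(t-1)]$

$H(t) = U(t) S(t) V(t)^T$, keeping the first $l$ left singular vectors $U_l(t)$

Future (G):

$G(t) = [seq(t+g); \ldots; seq(t+g+m-1)]$

$G(t) G(t)^T v_g = \mu_g v_g$, with $\beta(t) = v_g$ for the largest eigenvalue

Change angle:

$\alpha(t) = \dfrac{U_l(t)^T \beta(t)}{\lVert U_l(t)^T \beta(t) \rVert}$

$C(t) = 1 - \alpha(t)^T \beta(t)$

Parameters: w, n, g, m, l

[Ide et al. 2005]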
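The past/future comparison can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the CPMD toolbox code: the function name `sst_scores`, the 0-based index conventions, and the use of $1 - \lVert U_l^T \beta \rVert$ as the score (the value of $1 - \alpha^T \beta$ when $\alpha$ is taken as the normalized projection of $\beta$ onto the span of $U_l$) are choices made here for illustration.

```python
import numpy as np

def sst_scores(x, w=4, n=2, g=4, m=2, l=1):
    """Change score per time step, normalized by the maximum score (sketch)."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    scores = np.zeros(T)

    def seq(t):
        # The w values ending at index t (inclusive).
        return x[t - w + 1:t + 1]

    first = w + n - 1      # earliest t for which seq(t - n) is fully defined
    last = T - g - m       # latest t for which seq(t + g + m - 1) is fully defined
    for t in range(first, last + 1):
        H = np.column_stack([seq(t - k) for k in range(n, 0, -1)])  # past
        G = np.column_stack([seq(t + g + k) for k in range(m)])     # future
        U, _, _ = np.linalg.svd(H, full_matrices=False)
        U_l = U[:, :l]                       # l dominant past directions
        U_g, _, _ = np.linalg.svd(G, full_matrices=False)
        beta = U_g[:, 0]                     # dominant future direction
        # Small when beta lies in the past subspace, close to 1 when orthogonal.
        scores[t] = max(0.0, 1.0 - np.linalg.norm(U_l.T @ beta))
    peak = scores.max()
    return scores / peak if peak > 0 else scores

X = [-4, -3, -2, -1, 0, 1, 2, 3, 4, -1, 1, -1, 1, -1, 1, -1, 1]
scores = sst_scores(X)
print(np.round(scores, 3))
```

Time steps too close to either end of the series have no complete past or future window and are left at zero in this sketch.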

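Numeric Example

X(t) = {-4,-3,-2,-1,0,1,2,3,4,-1,1,-1,1,-1,1,-1,1}

Parameters: w=g=4, n=m=2, l=1

At t=5:

$H = \begin{pmatrix} -4 & -3 \\ -3 & -2 \\ -2 & -1 \end{pmatrix}$, SVD: $u = \begin{pmatrix} 0.8219 & -0.5696 \\ 0.5696 & 0.8219 \end{pmatrix}$, $s = (6.5468, 0.3742)$

$G = \begin{pmatrix} 0 & 1 \\ 1 & 2 \\ 2 & 3 \end{pmatrix}$, SVD: $u = \begin{pmatrix} 0.5048 & -0.8632 \\ 0.8632 & 0.5048 \end{pmatrix}$, $s = (4.3219, 0.5668)$

Resulting change score, computed from the leading vectors $(0.8219, 0.5696)$ and $(0.5048, 0.8632)$: 0.3991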
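The singular values quoted for t=5 can be checked numerically. The sketch below assumes that the rows of H are the windows (-4,-3), (-3,-2), (-2,-1) taken from X(t) before t=5, that the rows of G are (0,1), (1,2), (2,3), and that the quoted u vectors correspond to what NumPy returns as right singular vectors; the sign of a singular vector is arbitrary, so only magnitudes are compared.

```python
import numpy as np

# Past and future matrices for the t=5 example (rows are short windows of X(t)).
H = np.array([[-4.0, -3.0], [-3.0, -2.0], [-2.0, -1.0]])
G = np.array([[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]])

# np.linalg.svd returns (U, s, Vt); the rows of Vt are the right singular vectors.
_, s_H, Vt_H = np.linalg.svd(H)
_, s_G, Vt_G = np.linalg.svd(G)

v_H = Vt_H[0]  # leading direction of the past
v_G = Vt_G[0]  # leading direction of the future

print(np.round(s_H, 4), np.round(np.abs(v_H), 4))
print(np.round(s_G, 4), np.round(np.abs(v_G), 4))
```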

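Numeric Example

X(t) = {-4,-3,-2,-1,0,1,2,3,4,-1,1,-1,1,-1,1,-1,1}

Parameters: w=g=5, n=m=2, l=1

At t=8:

$H = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 2 \end{pmatrix}$, SVD: $u = \begin{pmatrix} 0.5696 & -0.8219 \\ 0.8219 & 0.5696 \end{pmatrix}$, $s = (6.5468, 0.3742)$

$G = \begin{pmatrix} 3 & 4 \\ 4 & 1 \\ 1 & 1 \end{pmatrix}$, SVD: $u = \begin{pmatrix} 0.7071 & 0.7071 \\ -0.7071 & 0.7071 \end{pmatrix}$, $s = (2.4495, 0)$

Resulting change score, computed from the vectors $(0.5696, 0.8219)$ and $(0.7071, 0.7071)$: 0.5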

Final Result

[Figure: top panel, the time series X(t) (values from -4 to 4) for t = 0 to 18; bottom panel, the normalized change score (0 to 1) over the same range]

Scores are always normalized by dividing them by max(scores)

Singular Spectrum Transform

• Advantages:
  ▫ No predefined generation model
  ▫ Comparably few parameters (5)
  ▫ PCA using SVD works for ANY matrix, so no ad-hoc preprocessing is needed
  ▫ Linear in the length of the time series
• Disadvantages:
  ▫ There are still 5 parameters that are hard to select
  ▫ Specificity degrades very fast with increased noise level
  ▫ Inadequate for time series with no background signal

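Robust Singular Spectrum Transform

Past:

$H(t) = [seq(t-n); \ldots; seq(t-1)]$

$H(t) = U(t) S(t) V(t)^T$; find the optimal $l_p$

Future:

$G(t) = [seq(t+1); \ldots; seq(t+n)]$

$G(t) G(t)^T u_g = \mu_g u_g$; find the optimal $l_f$ and take the directions $\beta_i(t) = u_i$, $i \le l_f$

Change angles:

$\alpha_i(t) = \dfrac{U_{l_p}(t)^T \beta_i(t)}{\lVert U_{l_p}(t)^T \beta_i(t) \rVert}$, $i \le l_f$

The per-direction change scores for $i = 1, \ldots, l_f$ are combined into a single score $\hat{x}(t)$, which is then rescaled to give the final score $x(t)$.

Parameters: w, n

[Mohammad & Nishida 2009b]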

RSST vs. SST – Effect of noise

RSST vs. SST – Real world data

Explanation scenario:

• 22 participants
• 3 conditions: natural listening, unnatural listening, robot
• Physiological sensors: respiration, skin conductance, pulse

RSST vs. SST – Physio-psychological data analysis


Example Discovered Behaviors

Stop

Come Here







Behavior Association

After discovering the basic motifs in both actions and commands, and detecting their occurrences in all time series (Command 1 and Action 1 in the graph), use the natural delay between commands and actions calculated during the discovery phase.

For every command-action pair, calculate their joint activation as the number of occurrences of the action within the natural delay interval of the command. Use the joint-activation values to induce a Bayesian Network describing the relation between actions and commands.

[Mohammad & Nishida 2009]
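The joint-activation count can be sketched as below. The function name `joint_activation`, the representation of occurrences as lists of start times, and the interval bounds `delay_min` and `delay_max` are assumptions made for illustration; the slides do not specify these details.

```python
def joint_activation(command_times, action_times, delay_min, delay_max):
    """Count action occurrences that fall inside the delay interval of some command."""
    count = 0
    for a in action_times:
        # The action counts if at least one command precedes it by a delay
        # inside [delay_min, delay_max].
        if any(c + delay_min <= a <= c + delay_max for c in command_times):
            count += 1
    return count

# Hypothetical occurrence times (in time steps) for one command and one action.
commands = [10, 50, 90]
actions = [13, 55, 70, 120]

activation = joint_activation(commands, actions, delay_min=2, delay_max=6)
print(activation)  # 2: the actions at 13 and 55 follow commands at 10 and 50
```

Repeating this count for every command-action pair gives the table of joint activations from which the network is induced.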

Causality Based Delay Estimation

To find the delay between $\hat{AC}$ and $\hat{GC}$:

1. Regress actions using actions & gestures
2. Regress actions using actions only
3. Compare residues
4. Calculate the g-causality statistic
5. Find the delay that maximizes g-causality

[Mohammad & Nishida 2009c]
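The delay search can be sketched with ordinary least squares. This is an illustrative sketch under stated assumptions rather than the published procedure: the function names, the autoregressive order, the log ratio of residual sums of squares used as the g-causality statistic, and the synthetic one-dimensional streams are all choices made here.

```python
import numpy as np

def g_causality(action, gesture, delay, order, start):
    """Log ratio of residual sums of squares: actions-only model over actions + gestures."""
    t = np.arange(start, len(action))
    y = action[t]
    past_a = np.column_stack([action[t - k] for k in range(1, order + 1)])
    past_g = np.column_stack([gesture[t - delay - k] for k in range(order)])
    ones = np.ones((len(t), 1))
    X_restricted = np.hstack([ones, past_a])      # regress actions using actions only
    X_full = np.hstack([X_restricted, past_g])    # regress actions using actions & gestures

    def rss(X):
        coef = np.linalg.lstsq(X, y, rcond=None)[0]
        return np.sum((y - X @ coef) ** 2)

    return np.log(rss(X_restricted) / rss(X_full))  # compare residues

def estimate_delay(action, gesture, max_delay, order=1):
    action = np.asarray(action, dtype=float)
    gesture = np.asarray(gesture, dtype=float)
    start = max_delay + order  # same samples for every delay, so statistics are comparable
    stats = {d: g_causality(action, gesture, d, order, start)
             for d in range(1, max_delay + 1)}
    return max(stats, key=stats.get), stats

# Synthetic streams (hypothetical): the action follows the gesture after 7 steps.
rng = np.random.default_rng(0)
T, true_delay = 500, 7
gesture = rng.standard_normal(T)
action = np.zeros(T)
for t in range(true_delay, T):
    action[t] = 0.5 * action[t - 1] + gesture[t - true_delay] + 0.01 * rng.standard_normal()

best_delay, stats = estimate_delay(action, gesture, max_delay=12)
print(best_delay)
```

Because the two regression models are nested, the statistic is never negative; it peaks at the delay where the gesture stream explains most of what the action history alone cannot.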

Example: Associating Actions and Gestures Guided Navigation Scenario

Correct prediction 95.2%.








Behavior Controller Generation

Convert the learned Bayesian Network into an L0EICA controller

Command Process & Action Process & Link Effect Channel

Motor Babbling

PLGC


Motor Babbling

$F_i$: generate a straight line in one dimension while minimizing disturbance to all others

Mohammad & Nishida 2010c

PLGC








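Accumulation Phase

[Figure: two networks, one with nodes a, b, c, d, e, f and one with nodes 1, 2, 3, 4, 5, 6, are combined into a network with nodes a1, b2, c3, d4, f5, e, 6]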

ABN Combination

Main assumption: action nodes are more compatible than gesture nodes

Algorithm

1. Associate action nodes with similar stored patterns
   Output: set of action node association links

2. Associate gesture nodes with similar stored patterns
   Output: set of gesture node association links

3. Calculate the Link Competence Index (LCI) for association links
   Output: set of LCIs for gestures and actions

4. Resolve association link conflicts using LCIs
   Output: final ABN

1 12 2,i ij jla v la

1 12 2lg , lgi ij jv

1 12 2, lgi ij jLCI la LCI

Associating action/gesture nodes Compile AN1 and AN2 lists {every action node} Calculate Calculate for all nodes and order

them Create a link iff

for any

Set

Gesture association links are calculated the same way

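Symbols on this slide: the association link $la^{21}_{ji}$ and its value $v(la^{21}_{ji})$; the DTW distances $d_{DTW}(m^1_i, m^2_j)$ and $d_{DTW}(m^1_i, m^2_k)$ between stored patterns $m$ of the two networks; and a per-node minimum DTW distance taken over $k$.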
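The node association relies on a dynamic time warping (DTW) distance between stored patterns. The sketch below shows the textbook DTW recurrence on one-dimensional patterns with an absolute-difference local cost; the function name `dtw_distance` and the example patterns are assumptions, and the slides do not state which DTW variant or local cost is used.

```python
import numpy as np

def dtw_distance(a, b):
    """Textbook DTW distance between two 1-D sequences (absolute-difference cost)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, step in both.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

# Hypothetical stored patterns: the second is a time-stretched copy of the first.
pattern_1 = [0, 1, 2, 1, 0]
pattern_2 = [0, 1, 1, 2, 1, 0]
pattern_3 = [2, 2, 0, 0, 2]

print(dtw_distance(pattern_1, pattern_2))  # 0.0: stretching alone costs nothing
print(dtw_distance(pattern_1, pattern_3))
```

Because DTW ignores differences in timing, two nodes whose patterns differ only in speed are treated as close, which suits patterns recorded from different people.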


LCI Calculation


Putting It All Together

Guided Navigation

Guided Navigation

• Task oriented
• Explicit protocol
• 1-way interaction

Roles: Actor & Operator

Protocol: explicit

Nonverbal behaviors: operator's gesture, actor's motion

Sensors: accelerometers (BPACK), motion capture (PhaseSpace)

Procedure

1. Offline experiment
2. Online experiment

[Mohammad and Nishida 2009c]

Guided Navigation – Online experiment

• 18 subjects (6 days)
• Task: operator in GN scenario
• Procedure:
  ▫ WOZ session (training & familiarization)
  ▫ 3 sessions under these conditions: WOZ, per-participant learner, accumulating learner


Experimental Setup


Action and Gesture Streams

Action Stream (5D), including $a_x$ and $a_y$

Gesture Stream (6D): $a_{lx}, a_{ly}, a_{lz}, a_{rx}, a_{ry}, a_{rz}$

Guided Navigation – Online Examples

Number of failures:

• Per-Participant: 1/18
• Accumulating: 4/17

[Figure: average score (1 to 7) versus participant order (0 to 18) for the Accumulating, WOZ, and Per-Participant conditions]

Conclusions

• Unsupervised learning of interaction protocols is possible using three main data mining technologies:
  ▫ Motif discovery <The Heart>
  ▫ Change point discovery <The speeding engine>
  ▫ Causality analysis <Natural delays>
• Several algorithms for solving these three problems were introduced and are available (among others) in source code as a MATLAB toolbox called CPMD.
• We have shown that it is possible to learn both implicit interaction protocols (gaze control) and explicit interaction protocols (guided navigation) without explicit modeling.
• By manipulating learned BNs, it is possible to improve the interactive behavior of agents over time based on interactions with multiple people.

References

[Ishiguro et al. 1999] Ishiguro, H.; Kanda, T.; Kimoto, K.; Ishida, T., "A robot architecture based on situated modules," Proceedings of the 1999 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '99), vol. 3, pp. 1617-1624, 1999.

[Catalano 2006] Joe Catalano, Tom Armstrong, and Tim Oates. Discovering patterns in real-valued time series. In Knowledge Discovery in Databases: PKDD 2006, pages 462–469, 2006.

[Chiu 2003] B. Chiu, E. Keogh, and S. Lonardi, “Probabilistic discovery of time series motifs,” in KDD ’03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. New York, NY, USA: ACM, 2003, pp. 493–498.

[Ide 2005] T. Ide and K. Inoue, “Knowledge discovery from heterogeneous dynamic systems using change-point correlations,” in Proc. SIAM Intl. Conf. Data Mining, 2005.

[Kanda et al. 2007] Kanda, T., Kamasima, M., Imai, M. et al. “A Humanoid Robot That Pretends to Listen to Route Guidance from a Human”, Auton. Robots, Vol. 22, Number 1, pages 87-100, 2007.

[Mohammad and Nishida 2009a] Yasser Mohammad, Toyoaki Nishida, Shogo Okada, “Unsupervised Simultaneous Learning of Gestures, Actions and their Associations for Human-Robot Interaction," . IEEE/RSJ International Conference on Intelligent Robots and Systems, 2009. IROS 2009, pp.2537-2544, 11-15 Oct. 2009

[Mohammad and Nishida 2009b] Yasser Mohammad and Toyoaki Nishida, Robust Singular Spectrum Transform, The Twenty Second International Conference on Industrial, Engineering & Other Applications of Applied Intelligent Systems (IEA/AIE 2009), June 2009, Taiwan, pp 123-132.

[Mohammad and Nishida 2009c] Yasser Mohammad and Toyoaki Nishida, Measuring Naturalness During Close Encounters Using Physiological Signal Processing, The Twenty Second International Conference on Industrial, Engineering & Other Applications of Applied Intelligent Systems (IEA/AIE 2009), June 2009, Taiwan, pp. 281-290

[Mohammad and Nishida 2009d] Yasser Mohammad and Toyoaki Nishida, Using Physiological Signals to Detect Natural Interactive Behavior, Applied Intelligence, 13(1) 79-92

[Mohammad 2009 PhDThesis] Yasser Mohammad, Autonomous Development of Natural Interactive Behavior for Robots and Embodied Agents, PhD Thesis, Kyoto University, September 2009

[Mohammad and Nishida 2010a] Yasser Mohammad, Toyoaki Nishida, Learning Interaction Protocols using Augmented Bayesian Networks Applied to Guided Navigation, Taipei, Taiwan, IROS 2010.

[Mohammad and Nishida 2010c] Yasser Mohammad, Toyoaki Nishida, Learning Interaction Protocols using Augmented Bayesian Networks Applied to Guided Navigation, Taipei, Taiwan, IROS 2010.

[Mohammad and Nishida 2010d] Yasser Mohammad and Toyoaki Nishida, Controlling Gaze with an Embodied Interactive Control Architecture, Applied Intelligence, Vol. 32, No. 2, 2010, pp 148-163
