Learning Interaction Protocols through imitation
A data mining approach
Yasser Mohammad
Nishida Lab.
Artificial Intelligence, Adv (E) (2013) Do not distribute beyond this class
Situated Modules
Used in many systems until now mainly with Robovie
Situated modules are executed in serial
[Ishiguro et al. 1999]
Route Guidance Listener (2006)
Analyze Human Human
Interactions
Implement Model
Evaluate Model
Tune/Adapt (Supervised)
Redesign
Model Controller
Parameter Adjustment
Structure Adjustment
2 WOZ experiments using motion captured data
[Kanda et al. 2007]
Engineering vs. Learning Approaches
Analyze Human Human
Interactions
Implement Model
Evaluate Model
Tune/Adapt (Supervised)
Redesign
Collect Human Human
Interactions
Develop
Interact
Adapt (Unsupervised)
Standard Engineering Approach
Learning/Imitation Approach
Model Controller
Controller
Training Data
Parameter Adjustment
Structure Adjustment
Parameter & Structure Adjustment
Example Scenarios
Gaze Control During Listening
Guided NavigationExplicit
Implicit
Bird’s Eye View
Learner
Watch External Behavior
Learn Actions’ Model
Learn Commands’ Model
Learn Communication Protocol
Main Insights
1. Learning By Watching is Ubiquitous in humans
2. Learning Actions and Commands are related
3. Change in Behavior is what matters
Operator Actor
Commands
Feedback
Actions
Interaction
Model of Commands
Model of Actions
Communication Protocol
Shared ground
Co-action
Primordial Knowledge Model
models and protocol
action
models and protocol
action
Our Long Term Model
Learner Robot
Watch Mimic
Interact Adapt
Learned models and protocol
Learned action
Adapted models and protocol
Adapted actions
Basic Architecture
Execution Time
Activation Level
Behavioral Influence
Design Procedure
Analyze Human Human
Interactions
Implement Model
Evaluate Model
Tune/Adapt (Supervised)
Redesign
Model Controller
Parameter Adjustment
Structure Adjustment
Analyze Task & Required
Basic Actions
Decide Required Behavior
(H-H Interactions)
Learn Parameters (FPGA)
Evaluate
Intentions
Structure Adjustment
Processes
Redesign
Example: Gaze Control during Listening
Sensors
Perception Processes
Behavior Processes
Intentions
Floating Point Genetic Algorithm
Crossover
Select 2 individuals and generate 4:
Calculate probability of passing:
Mutation
1. Calculate probabilities over 1~m:
2. Calculate P(mutation @ k) as:
3. Select mutation site according to P(mutation @ k)
4. Mutate parameter using:
Eliting
Cross Over
Mutation
Tournament
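The operator formulas of the FPGA did not survive in this transcript, so the following is only a generic sketch of a floating-point (real-coded) genetic algorithm with the four stages named above. The arithmetic crossover, the Gaussian mutation at a single site, and every parameter value are assumptions made for illustration, not the authors' operators.

```python
import random

def evolve(fitness, n_params, pop_size=30, generations=60, seed=0):
    """Maximize `fitness` over real-valued parameter vectors."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(n_params)]
           for _ in range(pop_size)]

    def tournament():
        # Tournament selection: the fitter of two random individuals wins.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        # Elitism: the best individual always survives unchanged.
        children = [max(pop, key=fitness)[:]]
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            w = rng.random()
            # Arithmetic crossover (assumed operator).
            child = [w * x + (1.0 - w) * y for x, y in zip(p1, p2)]
            # Gaussian mutation at one randomly chosen site (assumed operator).
            k = rng.randrange(n_params)
            child[k] += rng.gauss(0.0, 0.1)
            children.append(child)
        pop = children
    return max(pop, key=fitness)

def example_fitness(p):
    # Hypothetical fitness: best when every parameter equals 0.5.
    return -sum((v - 0.5) ** 2 for v in p)

best = evolve(example_fitness, n_params=3)
```

Because of elitism, the best fitness in the population can never decrease from one generation to the next.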
FPGA – Preliminary Evaluation
Fitness function:
100 generations
100 individuals
Two comparison algorithms
Proposed > A1: p = 0.0133
Proposed > A2: p = 0.0032
[Mohammad & Nishida 2010d]
Applications – Gaze Control
• Fixed Structure Gaze Controller (18 parameters)
• Dynamic Structure Gaze Controller (7 parameters)
[Mohammad & Nishida 2010d]
Applications – Gaze Control
Fixed vs. Dynamic Structure GC
• Six novel sessions
• Four control GCs
Follow, Stare, Random
[Mohammad & Nishida 2010d]
Discovery Phase
Association Phase
Controller Generation
Piecewise Linear Controller Gen.
Bayesian Network Induction
Constrained Motif Discovery
Learning by watching/imitation/mimicry
Operator Actor
Learner
Commands
Feedback
Actions
Operator Learned Actor
1. Watch
2. Learn
3. Act
Commands
Feedback
Actions
Command stream Action stream
Discrete Commands
Discrete Actions
Behavior Generation Model
online Robot/Agent Controller
Interaction Protocol
learned Interaction Protocol
offline
Feedback Controller
23340003204402 320000310
Building Blocks
• Behavior Discovery: Motif Discovery, Change Point Detection
• Behavior Association: Bayesian Network Induction, Causality Analysis
• Behavior Generation: Piecewise Linear Controller Generation
• Behavior Adaptation: Bayesian Network Combination
Gaze Control: Data Collection Experiment
• 44 participants, ages 19-37 (27% females), untrained to interact with robots
• Two objects (chair/stepper): easily assembled (7 steps both), not so easy (2 ordering steps both)
• Two roles:
  Instructor: explains about a single object three times: good listener, bad listener, robot
  Listener: listens to two explanations about two objects: good listener, bad listener
Gaze Control: Evaluation Experiment
• Internet poll, 35 subjects
• Watch 2 videos: ISL learned controller, carefully designed controller
• Age: ranged from 24 to 43, with an average of 31.16 years
• Gender: 8 females and 30 males
• Experience in dealing with robots: ranged from "I never saw one before" to "I program robots routinely"
• Expectation of robot attention, in a range from 1 to 7: 4 ± 1.376
• Expectation of robot's behavior naturalness, in a range from 1 to 7: 3.2 ± 1.255
• Expectation of robot's behavior human-likeness, in a range from 1 to 7: 3.526 ± 1.52
Gaze Control:Example Session
(1) Behavior Discovery
Proposed
Command Stream Action Stream
Discover Change Points
Discover Change Points
X
Discover Motifs Discover Motifs
Remove Irrelevant
Dimensions
Remove Irrelevant
Dimensions
Robust Singular Spectrum Transform
Granger-Causality Maximization
Advantages
• Utilizes relation between actions and commands
• removes irrelevant dimensions
• No need for separate clustering step
• No predefined model
Natural Delay Discovery
Constrained Motif Discovery
$A(t)$, $G(t)$
$G_C(t)$, $A_C(t)$
$\hat{A}_C$, $\hat{G}_C$
Motif Discovery
Given a time series (an ordered list of real numbers), find approximately recurring subsequences
[Chiu 2003]
Motif Discovery
• Given a time series X(t) find recurring patterns of length L using distance function D
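A minimal brute-force sketch of this definition is given here. It assumes D is the Euclidean distance and that overlapping subsequences do not count as recurrences; the function name `find_motif` is hypothetical. It is quadratic in the series length, which is exactly the cost that the algorithms discussed in these slides try to avoid.

```python
import numpy as np

def find_motif(x, L):
    """Return (i, j, d): start indices of the closest pair of
    non-overlapping length-L subsequences and their distance d."""
    x = np.asarray(x, dtype=float)
    n = len(x) - L + 1
    best = (None, None, np.inf)
    for i in range(n):
        for j in range(i + L, n):  # skip overlapping (trivial) matches
            d = float(np.linalg.norm(x[i:i + L] - x[j:j + L]))
            if d < best[2]:
                best = (i, j, d)
    return best

# Usage: plant the same pattern twice in a noisy series.
rng = np.random.default_rng(0)
series = rng.normal(0.0, 1.0, 100)
pattern = 5.0 * np.sin(np.linspace(0.0, 2.0 * np.pi, 15))
series[10:25] = pattern
series[60:75] = pattern
i, j, d = find_motif(series, 15)
```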
Catalano’s Algorithm
candidate
noise
comparison
Motif
Compare
[Catalano 2006]
Keep top k rather than best only
Constrained Motif Discovery
• Given a time series X(t) find recurring patterns of length between L1 and L2 using distance function D
subject to the constraint P(t), where P(t) is an estimation of the probability that a motif occurrence exists near time step t.
A motif is likely near here
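To illustrate how a constraint P(t) shrinks the search, the sketch below compares only subsequences that start at local maxima of P(t) above a threshold. This is not MCFull, MCInc, or DGCMD; the candidate rule, the threshold, and the length normalization by sqrt(L) are all assumptions chosen to keep the example short.

```python
import numpy as np

def candidate_starts(P, threshold=0.5):
    """Time steps where P(t) is a local maximum above the threshold."""
    P = np.asarray(P, dtype=float)
    return [t for t in range(1, len(P) - 1)
            if P[t] >= threshold and P[t] >= P[t - 1] and P[t] > P[t + 1]]

def constrained_motif(x, P, L1, L2, threshold=0.5):
    """Return (i, j, L, d) for the closest pair of subsequences that start
    at candidate points, with length L between L1 and L2."""
    x = np.asarray(x, dtype=float)
    starts = candidate_starts(P, threshold)
    best = (None, None, None, np.inf)
    for a, i in enumerate(starts):
        for j in starts[a + 1:]:
            for L in range(L1, L2 + 1):
                if i + L > j or j + L > len(x):
                    continue  # overlapping or out of range
                # Divide by sqrt(L) so different lengths are comparable.
                d = float(np.linalg.norm(x[i:i + L] - x[j:j + L]) / np.sqrt(L))
                if d < best[3]:
                    best = (i, j, L, d)
    return best

# Usage: two planted occurrences plus one misleading constraint peak.
rng = np.random.default_rng(1)
series = rng.normal(0.0, 1.0, 100)
pattern = 5.0 * np.sin(np.linspace(0.0, 2.0 * np.pi, 15))
series[10:25] = pattern
series[60:75] = pattern
P = np.zeros(100)
P[10], P[30], P[60] = 1.0, 0.8, 1.0
result = constrained_motif(series, P, L1=10, L2=15)
```

Only three candidate start points are compared here, instead of every pair of positions in the series.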
DGCMD
Distance matrix:
0     33.4   14.5   34.5
33.4  0      22.43  2.31
14.5  22.43  0      17.43
34.5  2.31   17.43  0
Signal
Constraint
Tcon
Distance matrix with only the 2.31 entries kept:
0  0     0  0
0  0     0  2.31
0  0     0  0
0  2.31  0  0
[Mohammad & Nishida 2009]
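The two matrices on this slide are consistent with a simple thresholding step: distances below some threshold are kept and all others are zeroed. The sketch below reproduces that step on the slide's numbers. The threshold value 5.0 and the helper name `keep_close_pairs` are assumptions; the slide does not state the threshold used.

```python
import numpy as np

# Distance matrix copied from the slide.
D = np.array([[0.0, 33.4, 14.5, 34.5],
              [33.4, 0.0, 22.43, 2.31],
              [14.5, 22.43, 0.0, 17.43],
              [34.5, 2.31, 17.43, 0.0]])

def keep_close_pairs(D, threshold):
    """Zero every entry whose distance is not below the threshold."""
    return np.where(D < threshold, D, 0.0)

kept = keep_close_pairs(D, threshold=5.0)  # threshold is an assumed value

# Index pairs (upper triangle only) that survive the threshold.
rows, cols = np.nonzero(np.triu(kept))
pairs = [(int(r), int(c)) for r, c in zip(rows, cols)]
```

With this threshold only the pair (1, 3), at distance 2.31, survives, which matches the second matrix.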
DGCMD
Advantages:
• Controlled exhaustiveness (# candidates).
• Controlled sensitivity (Tc).
• No random subwindow, as needed by some MD algorithms.
• No upper bound on motif size, as needed by most MD algorithms.
Disadvantages:
• Can become quadratic if # candidates is large.
• Sensitive to outlier segments (long subwindows of outliers).
DGCMD – Evaluation
• 50440 time series
• Variable length (10^2 ~ 10^6)
• Variable noise level (0~20% PP)
• Variable motif types
• Variable # of occurrences
Motif Discovery Algorithms:
• Projections (most accurate)
• Catalano et al. (fastest)
Constrained Motif Discovery Algorithms:
• MCFull
• MCInc
• DGCMD
[Mohammad & Nishida 2009]
How good is the constraint?
Time series length
Number of motif occurrences
Probability of discovering a motif - not using the constraint - using the constraint
Window length
Average motif length
Relative entropy between constraint and motif locations
Entropy of the constraint
[Mohammad & Nishida 2010a]
How to get the constraint?
Main insight
• The generating dynamics change near the beginning and end of motifs.
• We need to find points in the time series where the generating dynamics change.
Change Point Discovery
Given a time series X(t), find for every time step the probability that X(t) is changing form (that is, that the underlying dynamics are changing)
Available Techniques
• CUSUM: detects only mean change
• Inflection Point Detection: assumes any variation is a change
• Autoregressive Modeling: assumes a specific generating model
• Mixtures of Gaussians: assumes a specific generating model
• Discrete Cosine Transform: finds only global changes
• Wavelet Analysis: tons of parameters
• Singular Spectrum Transform (SST) [Ide et al. 2005]: most general, no ad-hoc adjustment
Main idea
At every point:
1. Use a few values before it to represent the past: H
2. Use a few values after it to represent the future: G
3. Compare the past with the future. The more dissimilar they are, the higher the score
Past: H is a hyperplane
Future: G is a set of eigenvectors
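Singular Spectrum Transform
Past: H(t). Future: G(t). The score is the change angle between them.
$seq(t) = [x(t-w+1), \ldots, x(t)]^T$
$H(t) = [seq(t-n); \ldots; seq(t-1)]$
$H(t) = U(t) S(t) V(t)^T$, with $U_l(t)$ the first $l$ columns of $U(t)$
$G(t) = [seq(t+g); \ldots; seq(t+g+m-1)]$
$G(t) G(t)^T v_g = \lambda_g v_g$, with $\beta(t) = v_g$ for the largest $\lambda_g$
$\alpha(t) = \dfrac{U_l(t)^T \beta(t)}{\lVert U_l(t)^T \beta(t) \rVert}$
$C(t) = 1 - \alpha(t)^T \beta(t)$
Parameters
w
n
g, m
l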
[Ide et al. 2005]
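The transform can be sketched in NumPy as follows. This is an illustrative reading of the slide, not the CPMD toolbox code: it treats α(t) as the normalized projection of β(t) onto the span of U_l(t), so that the score reduces to 1 − ‖U_l(t)ᵀβ(t)‖, it uses zero-based time indices, and the parameter values in the usage lines are chosen for the example rather than taken from the slides.

```python
import numpy as np

def sst(x, w, n, g, m, l):
    """Change score per time step, normalized by the maximum score."""
    x = np.asarray(x, dtype=float)
    T = len(x)

    def seq(t):
        # Window of length w ending at time t (inclusive).
        return x[t - w + 1:t + 1]

    scores = np.zeros(T)
    for t in range(w + n - 1, T - g - m + 1):
        # Past matrix: columns seq(t-n), ..., seq(t-1).
        H = np.column_stack([seq(t - k) for k in range(n, 0, -1)])
        # Future matrix: columns seq(t+g), ..., seq(t+g+m-1).
        G = np.column_stack([seq(t + g + k) for k in range(m)])
        U_l = np.linalg.svd(H, full_matrices=False)[0][:, :l]
        # First left singular vector of G = top eigenvector of G G^T.
        beta = np.linalg.svd(G, full_matrices=False)[0][:, 0]
        scores[t] = max(0.0, 1.0 - float(np.linalg.norm(U_l.T @ beta)))
    peak = scores.max()
    return scores / peak if peak > 0 else scores

# Usage on the series from the numeric example (assumed parameter values).
X = [-4, -3, -2, -1, 0, 1, 2, 3, 4, -1, 1, -1, 1, -1, 1, -1, 1]
scores = sst(X, w=3, n=2, g=1, m=2, l=2)
change_at = int(np.argmax(scores))
```

With these settings the score is essentially zero inside the ramp and inside the alternating part, and it peaks where the windows straddle the switch between them.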
Numeric Example
X(t) = {-4,-3,-2,-1,0,1,2,3,4,-1,1,-1,1,-1,1,-1,1}
Parameters: w=g=4, n=m=2, l=1
At t=5:
H =
-4 -3
-3 -2
-2 -1
G =
0 1
1 2
2 3
SVD of H: u = [0.8219 -0.5696; 0.5696 0.8219], s = (6.5468, 0.3742)
SVD of G: u = [0.5048 -0.8632; 0.8632 0.5048], s = (4.3219, 0.5668)
Change score computed from the first singular vectors (0.8219, 0.5696) and (0.5048, 0.8632): 0.3991
Numeric Example
X(t) = {-4,-3,-2,-1,0,1,2,3,4,-1,1,-1,1,-1,1,-1,1}
Parameters: w=g=5, n=m=2, l=1
At t=8:
H =
-1 0
0 1
1 2
G =
3 4
4 -1
-1 1
SVD of H: u = [0.5696 -0.8219; 0.8219 0.5696], s = (6.5468, 0.3742)
SVD of G: u = [0.7071 0.7071; -0.7071 0.7071], s = (2.4495, 0)
Change score computed from the first singular vectors (0.5696, 0.8219) and (0.7071, 0.7071): 0.5
Final Result
[Figure: the example series X(t) (top panel, values from -4 to 4) and its change score (bottom panel, values from 0 to 1), both over time steps 0 to 18]
Scores are always normalized by dividing them by max(scores)
Singular Spectrum Transform
• Advantages:
▫ No predefined generation model.
▫ Comparably few parameters (5).
▫ PCA using SVD works for ANY matrix, so no ad-hoc preprocessing is needed.
▫ Linear in the length of the time series.
• Disadvantages:
▫ There are still 5 parameters that are hard to select.
▫ Specificity degrades very fast with increased noise level.
▫ Inadequate for time series with no background signal.
Robust Singular Spectrum Transform
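Past: H(t). Future: G(t). The score combines several change angles.
$H(t) = [seq(t-n); \ldots; seq(t-1)]$
$H(t) = U(t) S(t) V(t)^T$; find optimal $l_p$
$G(t) = [seq(t+1); \ldots; seq(t+n)]$
Parameters: w, n
$G(t) G(t)^T u_i^g = \lambda_i u_i^g$; find optimal $l_f$
$\beta_i(t) = u_i^g, \quad i \le l_f$
$\alpha_i(t) = \dfrac{U_{l_p}(t)^T \beta_i(t)}{\lVert U_{l_p}(t)^T \beta_i(t) \rVert}, \quad i \le l_f$
$cs_i(t) = 1 - \alpha_i(t)^T \beta_i(t)$
$\hat{x}(t) = \dfrac{\sum_{i=1}^{l_f} \lambda_i \, cs_i(t)}{\sum_{i=1}^{l_f} \lambda_i}$
$x(t) = \hat{x}(t) \times |\mu_a(t) - \mu_b(t)| \times |\sigma_a(t) - \sigma_b(t)|$
[Mohammad & Nishida 2009b]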
RSST vs. SST – Effect of noise
RSST vs. SST – Real world data
Explanation Scenario
• 22 participants
• 3 conditions: natural listening, unnatural listening, robot
• Physiological sensors: respiration, skin conductance, pulse
RSST vs. SST – Physio-psychological data analysis
[Mohammad & Nishida 2009d]
Example Discovered Behaviors
Stop
Come Here
Behavior Association
After discovering basic motifs in both actions and commands, and detecting their occurrences in all time series as in this graph:
Command 1
Action 1
use the natural delay between commands and actions calculated during the discovery phase.
For every command-action pair, calculate their joint activation as the number of occurrences of the action within the natural delay interval of the command. Use the joint-activation values to induce a Bayesian network describing the relation between actions and commands.
Mohammad & Nishida 2009
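The joint-activation count described above can be sketched as follows. Occurrences are represented by their onset times, and the natural delay interval is given as a (lower, upper) pair; the motif names, onset times, and delay interval are made up for illustration, and the final division into a conditional frequency is only one possible way to feed the counts into network induction.

```python
def joint_activation(command_onsets, action_onsets, delay_interval):
    """Count action occurrences that start within the natural delay
    interval after some occurrence of the command."""
    lo, hi = delay_interval
    return sum(1 for a in action_onsets
               if any(lo <= a - c <= hi for c in command_onsets))

def activation_table(commands, actions, delay_interval):
    """Joint activation for every command-action pair."""
    return {(c, a): joint_activation(commands[c], actions[a], delay_interval)
            for c in commands for a in actions}

# Hypothetical onset times (in time steps) of discovered motifs.
commands = {"stop": [10, 50, 90], "come here": [30, 70]}
actions = {"halt": [12, 53, 91], "approach": [33, 72]}

table = activation_table(commands, actions, delay_interval=(1, 5))

# Relative frequency of each action following each command.
frequency = {(c, a): table[(c, a)] / len(commands[c]) for (c, a) in table}
```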
Causality Based Delay Estimation
To find the delay between $\hat{A}_C$ and $\hat{G}_C$:
1. Regress actions using actions & gestures
2. Regress actions using actions only
3. Compare residuals
4. Calculate g-causality statistic
5. Find the delay that maximizes g-causality
[Mohammad & Nishida 2009c]
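The delay search can be sketched with ordinary least squares. The sketch makes several simplifying assumptions that are not taken from the slides: the streams are one-dimensional, the full model uses a single gesture term at the tested delay, and the statistic is the log ratio of the residual sums of squares of the two regressions. Function names are hypothetical.

```python
import numpy as np

def residual_ss(y, X):
    """Residual sum of squares of the least-squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ coef
    return float(r @ r)

def g_causality(action, gesture, delay, order, start):
    """log(RSS without gesture / RSS with gesture at the given delay)."""
    T = len(action)
    y = action[start:]
    ones = np.ones((T - start, 1))
    own = np.column_stack([action[start - k:T - k]
                           for k in range(1, order + 1)])
    ges = gesture[start - delay:T - delay].reshape(-1, 1)
    rss_actions_only = residual_ss(y, np.hstack([ones, own]))
    rss_with_gestures = residual_ss(y, np.hstack([ones, own, ges]))
    return float(np.log(rss_actions_only / rss_with_gestures))

def natural_delay(action, gesture, max_delay, order=2):
    """Delay in 1..max_delay that maximizes the g-causality statistic."""
    action = np.asarray(action, dtype=float)
    gesture = np.asarray(gesture, dtype=float)
    start = max_delay + order  # same samples for every tested delay
    stats = {d: g_causality(action, gesture, d, order, start)
             for d in range(1, max_delay + 1)}
    return max(stats, key=stats.get), stats

# Usage: a synthetic action stream that follows the gesture after 7 steps.
rng = np.random.default_rng(0)
gesture = rng.normal(size=400)
action = np.zeros(400)
action[7:] = 0.8 * gesture[:-7]
action += 0.1 * rng.normal(size=400)
delay, stats = natural_delay(action, gesture, max_delay=15)
```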
Example: Associating Actions and Gestures Guided Navigation Scenario
Correct prediction 95.2%.
[Mohammad & Nishida 2009c]
Behavior Controller Generation
Convert the learned Bayesian network into an L0 EICA controller
Command Process & Action Process & Link Effect Channel
Motor Babbling
PLGC
[Mohammad & Nishida 2010c]
Motor Babbling
$F_i$: generate a straight line in one dimension while minimizing disturbance to all others
Mohammad & Nishida 2010c
PLGC
[Mohammad & Nishida 2010c]
Accumulation Phase
[Figure: nodes a to f of one network are paired with nodes 1 to 6 of another: a-1, b-2, c-3, d-4, f-5, e-6]
ABN Combination
Main assumption: action nodes are more compatible than gesture nodes
Algorithm
1. Associate action nodes with similar stored pattern. Result: set of action node association links
2. Associate gesture nodes with similar stored pattern. Result: set of gesture node association links
3. Calculate Link Competence Index for association links. Result: set of LCIs for gestures and actions
4. Resolve association link conflicts using LCIs. Result: final ABN
1 12 2,i ij jla v la
1 12 2lg , lgi ij jv
1 12 2, lgi ij jLCI la LCI
Associating action/gesture nodes Compile AN1 and AN2 lists {every action node} Calculate Calculate for all nodes and order
them Create a link iff
for any
Set
Gesture association links are calculated the same way
21jila
1 2,DTW i kd m m 1 1min ,i DTW i kd m m
1 2,DTW i j id m m
1 2 1 2, ,DTW i j DTW i k id m m d m m
2 2 21: existsl
k l im m la
2 1 21 ,ji DTW i jv la d m m
[Mohammad & Nishida 2010c]
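The node association above compares stored patterns with a DTW (dynamic time warping) distance. The sketch below is the textbook DTW recurrence for one-dimensional patterns with absolute difference as the local cost, plus a hypothetical helper that picks the closest stored pattern. The exact link-creation rule from the slide is not reproduced, since its formulas did not survive in this transcript.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D patterns."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            cost[i, j] = step + min(cost[i - 1, j],
                                    cost[i, j - 1],
                                    cost[i - 1, j - 1])
    return float(cost[n, m])

def nearest_pattern(pattern, stored):
    """Index of, and distance to, the stored pattern closest to `pattern`."""
    distances = [dtw_distance(pattern, s) for s in stored]
    k = int(np.argmin(distances))
    return k, distances[k]

# Usage: a time-stretched copy is at distance 0, unlike a flat pattern.
k, d = nearest_pattern([1, 2, 3], [[5, 5, 5], [1, 2, 2, 3]])
```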
LCI Calculation
[Mohammad & Nishida 2010c]
Putting It All Together
Guided Navigation
Guided Navigation
o Task Oriented
o Explicit Protocol
o 1 way Interaction
Roles: Actor & Operator
Protocol: explicit
Nonverbal Behaviors:
o Operator’s Gesture
o Actor’s motion
Sensors:
o Accelerometers (BPACK)
o Motion Capture (PhaseSpace)
Procedure
1. Offline experiment
2. Online experiment
[Mohammad and Nishida 2009c]
Guided Navigation – Online experiment
• 18 subjects (6 days)
• Task: operator in GN scenario
• Procedure:
  WOZ session (training & familiarization)
  3 sessions on these conditions: WOZ, per-participant learner, accumulating learner
[Mohammad and Nishida 2010c]
Experimental Setup
[Mohammad and Nishida 2009c]
Action and Gesture Streams
ax
ay
0r0
a
Action Stream (5D)
Gesture Stream (6D): $a_{lx}, a_{ly}, a_{lz}, a_{rx}, a_{ry}, a_{rz}$
Guided Navigation – Online Examples
Number of failures:
• Per-Participant: 1/18
• Accumulating: 4/17
[Figure: average score (scale 1 to 7) against participant order for the Accumulating, WOZ, and Per-Participant conditions]
Conclusions
• Unsupervised learning of interaction protocols is possible using three main data mining technologies:
  Motif discovery <The Heart>
  Change point discovery <The speeding engine>
  Causality analysis <Natural delays>
• Several algorithms for solving these three problems were introduced and are available (among others) in source code as a MATLAB toolbox called CPMD
• We have shown that it is possible to learn both implicit interaction protocols (gaze control) and explicit interaction protocols (guided navigation) without explicit modeling
• By manipulating learned BNs, it is possible to improve the interactive behavior of agents over time based on interactions with multiple people
References
[Ishiguro et al. 1999] Ishiguro, H.; Kanda, T.; Kimoto, K.; Ishida, T., "A robot architecture based on situated modules," Intelligent Robots and Systems, 1999. IROS '99. Proceedings. 1999 IEEE/RSJ International Conference on, vol. 3, pp. 1617-1624, 1999
[Catalano 2006] Joe Catalano, Tom Armstrong, and Tim Oates. Discovering patterns in real-valued time series. In Knowledge Discovery in Databases: PKDD 2006, pages 462–469, 2006.
[Chiu 2003] B. Chiu, E. Keogh, and S. Lonardi, “Probabilistic discovery of time series motifs,” in KDD ’03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. New York, NY, USA: ACM, 2003, pp. 493–498.
[Ide 2005] T. Ide and K. Inoue, “Knowledge discovery from heterogeneous dynamic systems using change-point correlations,” in Proc. SIAM Intl. Conf. Data Mining, 2005.
[Kanda et al. 2007] Kanda, T., Kamasima, M., Imai, M. et al. “A Humanoid Robot That Pretends to Listen to Route Guidance from a Human”, Auton. Robots, Vol. 22, Number 1, pages 87-100, 2007.
[Mohammad and Nishida 2009a] Yasser Mohammad, Toyoaki Nishida, Shogo Okada, “Unsupervised Simultaneous Learning of Gestures, Actions and their Associations for Human-Robot Interaction," . IEEE/RSJ International Conference on Intelligent Robots and Systems, 2009. IROS 2009, pp.2537-2544, 11-15 Oct. 2009
[Mohammad and Nishida 2009b] Yasser Mohammad and Toyoaki Nishida, Robust Singular Spectrum Transform, The Twenty Second International Conference on Industrial, Engineering & Other Applications of Applied Intelligent Systems (IEA/AIE 2009), June 2009, Taiwan, pp 123-132.
[Mohammad and Nishida 2009c] Yasser Mohammad and Toyoaki Nishida, Measuring Naturalness During Close Encounters Using Physiological Signal Processing, The Twenty Second International Conference on Industrial, Engineering & Other Applications of Applied Intelligent Systems (IEA/AIE 2009), June 2009, Taiwan, pp. 281-290
[Mohammad and Nishida 2009d] Yasser Mohammad and Toyoaki Nishida, Using Physiological Signals to Detect Natural Interactive Behavior, Applied Intelligence, 13(1) 79-92
[Mohammad 2009 PhDThesis] Yasser Mohammad, Autonomous Development of Natural Interactive Behavior for Robots and Embodied Agents, PhD Thesis, Kyoto University, September 2009
[Mohammad and Nishida 2010a] Yasser Mohammad, Toyoaki Nishida, Learning Interaction Protocols using Augmented Baysian Networks Applied to Guided Navigation, Taipei, Taiwan, IROS 2010.
[Mohammad and Nishida 2010c] Yasser Mohammad, Toyoaki Nishida, Learning Interaction Protocols using Augmented Baysian Networks Applied to Guided Navigation, Taipei, Taiwan, IROS 2010.
[Mohammad and Nishida 2010d] Yasser Mohammad and Toyoaki Nishida, Controlling Gaze with an Embodied Interactive Control Architecture, Applied Intelligence, Vol. 32, No. 2, 2010, pp 148-163