Representing hierarchical POMDPs as DBNs for multi-scale
robot localization
G. Theocharous, K. Murphy, L. Kaelbling
Presented by: Hannaneh Hajishirzi
Outline
• Define H-HMM
– Flattening H-HMM
• Define H-POMDP
– Flattening H-POMDP
• Approximate H-POMDP with DBN
• Inference and Learning in H-POMDP
Introduction
• H-POMDPs represent the state space at multiple levels of abstraction
– Scale much better to larger environments
– Simplify planning
  • Abstract states are more deterministic
– Simplify learning
  • Number of free parameters is reduced
Hierarchical HMMs
• A generalization of the HMM for modeling domains with hierarchical structure
– Application: NLP
• Concrete states: emit a single observation
• Abstract states: emit strings of observations
• Strings emitted by abstract states are governed by sub-HMMs
Example
• HHMM representing a(xy)+b | c(xy)+d
When the sub-HHMM is finished, control is returned to wherever it was called from
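A minimal sketch of how such an HHMM generates strings, assuming a single (xy)+ sub-HMM that both top-level branches call. The function names and the 0.5 probabilities are hypothetical, chosen only for illustration; they are not taken from the slides.

```python
import random
import re

def sample_xy_sub_hmm(rng, p_repeat=0.5):
    """Sub-HMM: emits (xy)+ and then returns control to its caller."""
    out = []
    while True:
        out.append("x")  # concrete state emitting x
        out.append("y")  # concrete state emitting y
        if rng.random() >= p_repeat:
            return out   # end state reached: hand control back

def sample_hhmm(rng, p_first_branch=0.5):
    """Top level: pick the a...b or the c...d branch, calling the sub-HMM in between."""
    first, last = ("a", "b") if rng.random() < p_first_branch else ("c", "d")
    return "".join([first] + sample_xy_sub_hmm(rng) + [last])

rng = random.Random(0)
samples = [sample_hhmm(rng) for _ in range(20)]
pattern = re.compile(r"a(xy)+b|c(xy)+d")
```

Every sampled string should match the regular expression, which can be checked with `pattern.fullmatch`.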
HHMM to HMM
• Create a state for every leaf in the HHMM
• Flat transition probability between two leaves = sum of the probabilities of all paths between them in the HHMM
• Disadvantages:
– Flattening loses modularity
– Learning requires more samples
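The path-sum rule can be sketched for a two-level HHMM. This is an illustrative toy, not the construction from the paper: the dictionary layout, the state names and all probabilities are assumptions. A leaf-to-leaf move either stays inside the same sub-HMM, or exits, moves at the top level and re-enters.

```python
def flatten_two_level_hhmm(top, entry, horiz, exit_p):
    """Flatten a two-level HHMM into one transition table over leaf states.

    top[a][b]      : top-level transition between abstract states
    entry[b][j]    : probability of entering sub-HMM b at leaf j
    horiz[a][i][j] : transition inside sub-HMM a (without exiting)
    exit_p[a][i]   : probability of leaving sub-HMM a from leaf i
    """
    leaves = [(a, i) for a in horiz for i in horiz[a]]
    flat = {}
    for (a, i) in leaves:
        flat[(a, i)] = {}
        for (b, j) in leaves:
            # Path 1: horizontal move inside the same sub-HMM
            p = horiz[a][i].get(j, 0.0) if a == b else 0.0
            # Path 2: exit a from i, move a -> b at the top level, enter b at j
            p += exit_p[a][i] * top[a].get(b, 0.0) * entry[b].get(j, 0.0)
            flat[(a, i)][(b, j)] = p
    return leaves, flat

# Hypothetical toy model: abstract states A (two leaves) and B (one leaf)
top = {"A": {"B": 1.0}, "B": {"A": 1.0}}
entry = {"A": {"a1": 1.0}, "B": {"b1": 1.0}}
horiz = {"A": {"a1": {"a2": 1.0}, "a2": {"a2": 0.5}}, "B": {"b1": {"b1": 0.3}}}
exit_p = {"A": {"a1": 0.0, "a2": 0.5}, "B": {"b1": 0.7}}

leaves, flat = flatten_two_level_hhmm(top, entry, horiz, exit_p)
```

The flat table has one row per leaf and each row sums to 1, but the shared structure of the sub-models is no longer visible in it.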
Representing HHMMs as DBNs
$Q^d_t$ : state at level d
$F^d_t = 1$ : if the HMM at level d has finished
H-POMDPs
• HHMMs with inputs and reward function
• Problems:
– Planning: Find mapping from belief states to actions
– Filtering: Compute the belief state online
– Smoothing: Compute $P(x_t \mid y_{1:T}, u_{1:T})$ offline
– Learning: Find MLE of model parameters
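Online filtering can be sketched for a flat, discrete POMDP as a predict-then-correct update of the belief state. The table layout (`trans[u][x][x2]`, `obs[x2][y]`) and the three-cell corridor numbers are assumptions made for this example only.

```python
def filter_step(belief, action, observation, trans, obs):
    """One online filtering step: returns P(x_t | y_1:t, u_1:t) as a list.

    trans[u][x][x2] = P(x2 | x, u),  obs[x2][y] = P(y | x2)
    """
    n = len(belief)
    # Predict: push the previous belief through the transition model
    predicted = [sum(belief[x] * trans[action][x][x2] for x in range(n))
                 for x2 in range(n)]
    # Correct: weight by the observation likelihood, then normalize
    unnorm = [obs[x2][observation] * predicted[x2] for x2 in range(n)]
    z = sum(unnorm)
    if z == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [p / z for p in unnorm]

# Hypothetical 3-cell corridor: "east" advances with probability 0.9
trans = {"east": [[0.1, 0.9, 0.0],
                  [0.0, 0.1, 0.9],
                  [0.0, 0.0, 1.0]]}
# Binary observation: index 1 means "opening seen", most likely in cell 2
obs = [[0.9, 0.1], [0.9, 0.1], [0.2, 0.8]]

belief = [1 / 3, 1 / 3, 1 / 3]
new_belief = filter_step(belief, "east", 1, trans, obs)
```

After moving east and seeing an opening, most of the belief mass should sit on the last cell.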
H-POMDP for Robot Navigation
Flat model
Hierarchical model
* Abstract state: $X^1_t$ (1..4)
* Concrete state: $X^2_t$ (1..3)
* Observation: $Y_t$ (4 bits)
* Robot position: $X_t$ (1..10)
The paper ignores the problem of how to choose the actions.
State Transition Diagram for 2-H-POMDP
[Figure: state transition diagram with a sample path]
State Transition Diagram for Corridor Environment
[Figure legend: abstract states, entry states, exit states, concrete states]
Flattening H-POMDPs
• Advantages of H-POMDP over corresponding POMDP:
– Learning is easier: Learn sub-models
– Planning is easier: Reason in terms of “macro” actions
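Dynamic Bayesian Networks
[Figure: STATE POMDP (transition diagram with edge probabilities 0.7, 0.08, 0.05, 0.01) next to FACTORED DBN POMDP (nodes $U_t$, $L_t$, $\theta_t$ over two consecutive time slices)]
# of parameters: 40
# of parameters: 12 × 9 = 108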
Representing H-POMDPs as DBNs
[Figure: STATE H-POMDP (corridor states labelled EAST and WEST) next to FACTORED DBN H-POMDP (nodes $U_t$, $E_t$, $L^1_t$, $L^2_t$, $\theta_t$, $Y_t$ over two consecutive time slices)]
H-POMDPs as DBNs
$L^1_t$ : Abstract location
$L^2_t$ : Concrete location
$\theta_t$ : Orientation
$E_t$ : Exit node (5 values), representing no-exit, s-exit, n-exit, l-exit, r-exit
$Y_t$ : Observation
$U_t$ : Action node
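Transition Model
(In these equations superscript 2 marks the abstract level and superscript 1 the concrete level.)
Abstract horizontal transition matrix $H^2$:
$$P(L^2_t = j \mid L^2_{t-1} = i, E_{t-1} = e) = \begin{cases} \delta(i, j) & \text{if } e = \text{no-exit} \\ H^2(i, e, j) & \text{otherwise} \end{cases}$$
Concrete horizontal transition matrix $H^1$ and concrete vertical entry vector $V$:
$$P(L^1_t = j \mid L^1_{t-1} = i, E_{t-1} = e, \theta_{t-1}, L^2_t = a) = \begin{cases} H^1(i, a, \theta_{t-1}, j) & \text{if } e = \text{no-exit} \\ V(e, a, j) & \text{otherwise} \end{cases}$$
Probability of entering exit state e:
$$P(E_t = e \mid L^1_t = j, L^2_t = a, \theta_t) = X(j, a, \theta_t, e)$$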
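The case split in the transition model can be sketched as two lookup functions: the abstract location stays put while the exit node reads no-exit and otherwise follows the abstract horizontal matrix; the concrete location follows its horizontal matrix on no-exit and is otherwise re-drawn from the vertical entry vector. Orientation and actions are left out for brevity, and the table layouts and numbers are assumptions for illustration.

```python
NO_EXIT = "no-exit"

def abstract_transition(i, e, H2, n_abstract):
    """Distribution over the next abstract location, as a list indexed by j."""
    if e == NO_EXIT:
        # No exit fired: the abstract location cannot change (delta(i, j))
        return [1.0 if j == i else 0.0 for j in range(n_abstract)]
    return list(H2[i][e])

def concrete_transition(i, e, a, H1, V):
    """Distribution over the next concrete location inside abstract state a."""
    if e == NO_EXIT:
        return list(H1[a][i])  # horizontal move within the current sub-model
    return list(V[e][a])       # re-entry through the vertical entry vector

# Hypothetical two-corridor example with three cells per corridor
H2 = {0: {"r-exit": [0.0, 1.0], "l-exit": [1.0, 0.0]},
      1: {"r-exit": [0.0, 1.0], "l-exit": [1.0, 0.0]}}
corridor = [[0.1, 0.9, 0.0], [0.0, 0.1, 0.9], [0.0, 0.0, 1.0]]
H1 = {0: corridor, 1: corridor}
V = {"r-exit": {0: [1.0, 0.0, 0.0], 1: [1.0, 0.0, 0.0]},
     "l-exit": {0: [0.0, 0.0, 1.0], 1: [0.0, 0.0, 1.0]}}

stay = abstract_transition(0, NO_EXIT, H2, 2)
move = abstract_transition(0, "r-exit", H2, 2)
inside = concrete_transition(1, NO_EXIT, 0, H1, V)
enter = concrete_transition(2, "r-exit", 1, H1, V)
```

Keeping the exit node explicit is what lets the two levels be parameterized separately instead of through one flat matrix.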
Observation Model
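• Probability of seeing a wall or opening on each of 4 sides of the robot
• Naïve Bayes assumption: $P(Y_t \mid X_t) = \prod_{i=1}^{4} P(Y^i_t \mid X_t)$, where $X_t = (L^1_t, L^2_t, \theta_t)$
• Map global coordinate frame to robot’s local coordinate frame
– Learn the appearance of the cell in all directions: $B(a, j, y) = P(Y_t = y \mid L^1_t = j, L^2_t = a)$ for the robot facing North
– Then, for an orientation $\theta_t$ rotated 180° from North: $P(Y_t = y \mid L^1_t = j, L^2_t = a, \theta_t) = B(a, j, R_{180}(y))$, where $R_{180}$ rotates the observation by 180°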
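A small sketch of the observation model under the Naïve Bayes assumption: the robot-centred 4-bit observation is first rotated into world order, then the likelihood is a product over the four sides. The bit order (front, right, back, left), the heading encoding and the wall probabilities are assumptions for this example.

```python
SIDES = 4  # robot frame: front, right, back, left; world frame: N, E, S, W

def rotate_to_world(y_local, heading):
    """Reorder a robot-centred observation into world order (N, E, S, W).

    heading: 0 = North, 1 = East, 2 = South, 3 = West.
    """
    y_world = [0] * SIDES
    for k, bit in enumerate(y_local):
        y_world[(k + heading) % SIDES] = bit
    return tuple(y_world)

def observation_likelihood(y_local, heading, p_wall):
    """Naive Bayes likelihood; p_wall[d] = P(wall seen on world side d) in this cell."""
    y_world = rotate_to_world(y_local, heading)
    lik = 1.0
    for d in range(SIDES):
        lik *= p_wall[d] if y_world[d] == 1 else 1.0 - p_wall[d]
    return lik

# Hypothetical east-west corridor cell: walls to the North and South
p_wall = (0.9, 0.1, 0.9, 0.1)
y_local = (0, 1, 0, 1)  # walls seen on the robot's right and left

lik_east = observation_likelihood(y_local, 1, p_wall)
lik_north = observation_likelihood(y_local, 0, p_wall)
```

Only one world-centred appearance table per cell needs to be learned; the rotation supplies the likelihood for every orientation.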
Example
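$$H^2 = \begin{pmatrix} .1 & .9 \\ .9 & .1 \end{pmatrix}, \qquad H^{1,1} = \begin{pmatrix} .1 & .9 & 0 \\ 0 & .1 & .9 \\ 0 & 0 & 1 \end{pmatrix}$$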
Inference
• Online filtering: $P(X_t \mid y_{1:t}, u_{1:t})$
– Input of controller: MLE of the abstract and concrete states
• Offline smoothing: $P(X_t \mid y_{1:T}, u_{1:T})$
– $O(D K^{1.5D} T)$, where D: # of dimensions, K: # of states in each level
– 1.5D: size of largest clique in DBN = the state nodes at t-1 + half of the state nodes at t
– Approximation (belief propagation): $O(DKT)$
Learning
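• Maximum likelihood parameter estimate using EM
• In E step, compute for each node $V_t$ with parents $\mathrm{Pa}(V_t)$: $P(V_t, \mathrm{Pa}(V_t) \mid y_{1:T}, u_{1:T})$
• In M step, normalize the matrix of expected counts, here for the abstract horizontal transition matrix, with $O$ denoting the observed data:
$$H^2(t, i, e, j) = P(L^2_t = j, L^2_{t-1} = i, E_{t-1} = e \mid O)$$
$$H^2(i, e, j) = \frac{\sum_{t=2}^{T} H^2(t, i, e, j)}{\sum_{j'} \sum_{t=2}^{T} H^2(t, i, e, j')}$$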
Learning (Cont.)
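Concrete horizontal transition matrix:
$$H^1(t, i, a, \theta_{t-1}, j) = P(L^1_{t-1} = i, E_{t-1} = \text{no-exit}, \theta_{t-1}, L^2_t = a, L^1_t = j \mid O)$$
Exit probabilities:
$$X(t, j, a, \theta_t, e) = P(L^1_t = j, L^2_t = a, \theta_t, E_t = e \mid O)$$
Vertical transition vector:
$$V(t, e, a, j) = P(E_{t-1} = e, L^2_t = a, L^1_t = j \mid O), \quad e \neq \text{no-exit}$$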
Estimating Observation Model
• Map local observations into the world-centered coordinate frame
Probability of observing y, facing North
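Hierarchical Model Localizes Better
Belief state: $b_t(s) = P(X_t = s \mid y_{1:t}, u_{1:t})$
Localization accuracy $= \sum_{t=1}^{T} b_t(s)$
[Figure: localization accuracy of the Factored DBN H-POMDP, the H-POMDP and the STATE POMDP, compared with the models before training]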
Conclusions
• Represent H-POMDPs with DBNs
– Learn large models with less data
• Difference with SLAM:
– SLAM is harder to generalize
Complexity of Inference
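Number of states: $S = K^D$
STATE H-POMDP: $O(S T^3)$
FACTORED DBN H-POMDP: $O(S^{1.5} T)$
[Figure: STATE H-POMDP (corridor states labelled EAST and WEST) next to FACTORED DBN H-POMDP (nodes $A_t$, $E_t$, $L^1_t$, $L^2_t$, $\theta_t$, $O_t$ over two consecutive time slices)]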
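A short worked step connecting the two ways the smoothing cost is written in these slides. The values K = 10 and D = 2 are made up for illustration.

```latex
S = K^{D}
\;\Longrightarrow\;
K^{1.5D} = \left(K^{D}\right)^{1.5} = S^{1.5}
\qquad\text{so}\qquad
O\!\left(D\,K^{1.5D}\,T\right) = O\!\left(D\,S^{1.5}\,T\right).

\text{Example: } K = 10,\; D = 2
\;\Rightarrow\; S = 100,\quad S^{1.5} = 1000 .
```

The DBN cost is linear in the sequence length T, while the cost quoted for the state H-POMDP grows as T cubed.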