
Representing hierarchical POMDPs as DBNs for multi-scale robot localization

G. Theocharous, K. Murphy, L. Kaelbling

Presented by: Hannaneh Hajishirzi

Outline

• Define H-HMM
  – Flattening H-HMM

• Define H-POMDP
  – Flattening H-POMDP

• Approximate H-POMDP with DBN

• Inference and Learning in H-POMDP

Introduction

• H-POMDPs represent the state space at multiple levels of abstraction
  – Scale much better to larger environments
  – Simplify planning
    • Abstract states are more deterministic
  – Simplify learning
    • Number of free parameters is reduced

Hierarchical HMMs

• A generalization of the HMM for modeling domains with hierarchical structure
  – Application: NLP

• Concrete states: emit a single observation

• Abstract states: emit strings of observations

• Strings emitted by abstract states are governed by sub-HMMs

Example

• HHMM representing a(xy)+b | c(xy)+d

When the sub-HHMM is finished, control is returned to wherever it was called from
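To make the call-and-return behavior concrete, here is a minimal Python sketch that samples strings from a hand-built HHMM for a(xy)+b | c(xy)+d. The function names, the shared `sample_xy_loop` sub-model, and the repeat probability of 0.5 are illustrative assumptions; a real HHMM would draw these choices from its transition, entry, and exit probabilities.

```python
import random
import re


def sample_xy_loop(rng, p_repeat=0.5):
    """Hypothetical sub-HMM shared by both branches: emits "xy" one or more times."""
    out = []
    while True:
        out.extend(["x", "y"])        # each concrete state emits one symbol
        if rng.random() >= p_repeat:  # the sub-HMM reaches its end state...
            return out                # ...and control returns to the caller


def sample_hhmm(rng):
    """Root level: choose branch a...b or c...d and call the sub-HMM in between."""
    prefix, suffix = rng.choice([("a", "b"), ("c", "d")])
    return "".join([prefix] + sample_xy_loop(rng) + [suffix])


if __name__ == "__main__":
    rng = random.Random(0)
    pattern = re.compile(r"a(xy)+b|c(xy)+d")
    for _ in range(5):
        s = sample_hhmm(rng)
        assert pattern.fullmatch(s)
        print(s)
```

Because both branches call the same `sample_xy_loop`, the (xy)+ sub-model is written once and reused, which is the modularity that an HHMM preserves.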


HHMM to HMM

• Create a state for every leaf in the HHMM

• Flat transition probability between two leaves = sum of the probabilities of all paths between them in the HHMM

• Disadvantages:
  – Flattening loses modularity
  – Learning requires more samples
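For a two-level HHMM whose root never terminates (an assumption made to keep the sketch short), the sum over paths has only two terms: stay inside the current sub-HMM, or finish it, take an abstract transition, and enter a leaf of the next sub-HMM, i.e. $T[(i,a),(j,b)] = \delta(i,j)\,A_i(a,b) + \mathrm{end}_i(a)\,H(i,j)\,\pi_j(b)$. The parameter names below are hypothetical.

```python
def flatten_2level_hhmm(H, A, end, pi):
    """Flatten a 2-level HHMM into a flat HMM over its leaves.

    Hypothetical parameterization (plain lists of floats):
      H[i][j]    : abstract transition i -> j (each row sums to 1)
      A[i][a][b] : horizontal transition a -> b inside sub-HMM i
      end[i][a]  : probability that sub-HMM i finishes from leaf a
                   (sum(A[i][a]) + end[i][a] == 1)
      pi[j][b]   : probability of entering leaf b when sub-HMM j is called
    Returns (leaves, T): leaves[k] = (i, a), T[k][m] = flat transition prob.
    """
    leaves = [(i, a) for i in range(len(A)) for a in range(len(A[i]))]
    T = [[0.0] * len(leaves) for _ in leaves]
    for k, (i, a) in enumerate(leaves):
        for m, (j, b) in enumerate(leaves):
            # Path 1: stay inside the same sub-HMM (only possible if i == j).
            stay = A[i][a][b] if i == j else 0.0
            # Path 2: finish sub-HMM i, move to abstract state j, enter leaf b.
            leave = end[i][a] * H[i][j] * pi[j][b]
            T[k][m] = stay + leave
    return leaves, T


# Tiny example: sub-HMM 0 has two leaves, sub-HMM 1 has one.
H = [[0.0, 1.0], [1.0, 0.0]]
A = [[[0.0, 0.8], [0.0, 0.0]], [[0.5]]]
end = [[0.2, 1.0], [0.5]]
pi = [[1.0, 0.0], [1.0]]
leaves, T = flatten_2level_hhmm(H, A, end, pi)
```

Every flat entry mixes sub-HMM parameters ($A_i$, $\mathrm{end}_i$) with abstract ones ($H$, $\pi_j$), so a sub-model can no longer be learned or reused on its own; this is the loss of modularity listed above.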

Representing HHMMs as DBNs

$Q_t^d$: state at level $d$

$F_t^d = 1$ if the HMM at level $d$ has finished
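The slide only names the variables. One common way to fill in the conditional distributions, following Murphy and Paskin's DBN encoding of HHMMs (assumed here, not taken from this talk), is: level $d$ keeps its state while the level below is still running, makes a horizontal transition when the level below has finished but level $d$ has not, and is re-entered through the entry vector of its parent when level $d$ itself has finished. `A_tilde`, `A_end`, and `pi` are hypothetical containers.

```python
def q_cpd(j, i, f_below, f_self, k, A_tilde, pi):
    """P(Q_t^d = j | Q_{t-1}^d = i, F_{t-1}^{d+1} = f_below,
    F_{t-1}^d = f_self, parent context k) for one level d.

    A_tilde[k][i][j]: horizontal transition i -> j of the sub-HMM called in
        context k, renormalized so that the 'finish' event is excluded.
    pi[k][j]: vertical entry probability of state j in context k.
    f_below == 0 with f_self == 1 cannot occur: a level can only finish
    after the level below it has finished.
    """
    if f_below == 0:
        # The child sub-HMM is still running, so level d stays put.
        return 1.0 if i == j else 0.0
    if f_self == 0:
        # The child finished but level d did not: horizontal transition.
        return A_tilde[k][i][j]
    # Level d itself finished: it is re-entered from the level above.
    return pi[k][j]


def f_cpd(i, k, f_below, A_end):
    """P(F_t^d = 1 | Q_t^d = i, context k, F_t^{d+1} = f_below).

    A_end[k][i]: probability that the sub-HMM in context k finishes from i.
    For the bottom level, treat f_below as always 1.
    """
    return A_end[k][i] if f_below == 1 else 0.0
```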

H-POMDPs

• HHMMs with inputs (actions) and a reward function

• Problems:
  – Planning: find a mapping from belief states to actions
  – Filtering: compute the belief state online
  – Smoothing: compute $P(X_t \mid y_{1:T}, u_{1:T})$ offline
  – Learning: find the MLE of the model parameters
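The filtering problem can be illustrated on a flat POMDP (for example, one obtained by flattening the hierarchy). This is a minimal sketch of the standard discrete Bayes filter with a hypothetical list-of-lists layout, not the DBN inference used in the paper.

```python
def belief_update(b, u, y, T, O):
    """One filtering step: P(X_t | y_{1:t}, u_{1:t}) from P(X_{t-1} | ...).

    b: current belief, b[s] = P(X_{t-1} = s | y_{1:t-1}, u_{1:t-1})
    u: action index at time t;  y: observation index at time t
    T[u][s][s2] = P(X_t = s2 | X_{t-1} = s, U_t = u)   (hypothetical layout)
    O[s2][y]    = P(Y_t = y | X_t = s2)
    """
    n = len(b)
    # Predict: push the belief through the transition model of action u.
    predicted = [sum(b[s] * T[u][s][s2] for s in range(n)) for s2 in range(n)]
    # Correct: weight by the observation likelihood, then renormalize.
    unnorm = [predicted[s2] * O[s2][y] for s2 in range(n)]
    z = sum(unnorm)
    if z == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [p / z for p in unnorm]


b = belief_update([0.5, 0.5], u=0, y=1,
                  T=[[[0.2, 0.8], [0.0, 1.0]]],
                  O=[[0.9, 0.1], [0.2, 0.8]])
```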

H-POMDP for Robot Navigation

Flat model

Hierarchical model

* Abstract state: $X_t^1$ (1..4)

* Concrete state: $X_t^2$ (1..3)

* Observation: $Y_t$ (4 bits)

* Robot position: $X_t$ (1..10)

This paper ignores the problem of how to choose actions.

State Transition Diagram for 2-H-POMDP

Sample path:

State Transition Diagram for Corridor Environment

Legend: abstract states, entry states, exit states, concrete states

Flattening H-POMDPs

• Advantages of an H-POMDP over the corresponding flat POMDP:
  – Learning is easier: learn sub-models
  – Planning is easier: reason in terms of “macro” actions

Figure: STATE POMDP vs. FACTORED DBN POMDP (DBN nodes over two time slices: action $U_t$, location $L_t$, and orientation)

Dynamic Bayesian Networks

# of parameters: 40

# of parameters: 12 × 9 = 108

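Figure: STATE H-POMDP vs. FACTORED DBN H-POMDP (DBN nodes over two time slices: action $U_t$, exit node $E_t$, abstract location $L_t^1$, concrete location $L_t^2$, orientation, observation $Y_t$; corridor directions labeled EAST and WEST)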

Representing H-POMDPs as DBNs

H-POMDPs as DBNs

$L_t^1$: Abstract location

$\Theta_t$: Orientation

$L_t^2$: Concrete location

$E_t$: Exit node (5 values)

$Y_t$: Observation

$U_t$: Action node

The 5 exit values represent no-exit and the s-, n-, l-, and r-exits

Transition Model

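$$P(L_t^1 = j \mid L_{t-1}^1 = i, E_{t-1} = e) = \begin{cases} \delta(i, j) & \text{if } e = \text{no-exit} \\ H^1(i, e, j) & \text{otherwise} \end{cases}$$

$H^1(i, e, j)$: abstract horizontal transition matrix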

Transition Model


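$$P(L_t^2 = j \mid L_{t-1}^2 = i, E_{t-1} = e, L_t^1 = a, \Theta_t = \theta) = \begin{cases} H^2(i, \theta, a, j) & \text{if } e = \text{no-exit} \\ V^2(e, a, j) & \text{otherwise} \end{cases}$$

$H^2(i, \theta, a, j)$: concrete horizontal transition matrix

$V^2(e, a, j)$: concrete vertical entry vector

$$P(E_t = e \mid L_t^2 = j, L_t^1 = a, \Theta_t = \theta) = X(j, a, \theta, e)$$

$X(j, a, \theta, e)$: probability of entering exit state $e$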
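A minimal sketch of how the exit node switches between the two regimes of the transition model: with no exit the abstract location is frozen and the concrete location moves horizontally; with an exit the abstract location changes and the concrete location is re-drawn from the vertical entry vector. It assumes the abstract matrix is indexed by (i, e, j) and the concrete one by (i, θ, a, j); the dict layout and all example values are hypothetical.

```python
NO_EXIT = "no-exit"


def abstract_transition(j, i, e, H1):
    """P(L_t^1 = j | L_{t-1}^1 = i, E_{t-1} = e).

    H1[(i, e)][j]: abstract horizontal transition (hypothetical dict layout).
    """
    if e == NO_EXIT:
        # No exit was taken, so the abstract location cannot change.
        return 1.0 if i == j else 0.0
    return H1[(i, e)][j]


def concrete_transition(j, i, e, a, theta, H2, V2):
    """P(L_t^2 = j | L_{t-1}^2 = i, E_{t-1} = e, L_t^1 = a, Theta_t = theta).

    H2[(i, theta, a)][j]: concrete horizontal move inside abstract state a.
    V2[(e, a)][j]: vertical entry vector when a is entered through exit e.
    """
    if e == NO_EXIT:
        return H2[(i, theta, a)][j]
    return V2[(e, a)][j]


# Hypothetical values: leaving corridor 0 through its n-exit leads to corridor 1.
H1 = {(0, "n"): [0.0, 1.0]}
H2 = {(0, "N", 0): [0.1, 0.9, 0.0]}   # facing N in corridor 0: usually advance
V2 = {("n", 1): [1.0, 0.0, 0.0]}      # entering corridor 1 lands in cell 0
```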

Observation Model

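• Probability of seeing a wall or an opening on each of the 4 sides of the robot

• Naïve Bayes assumption: $P(Y_t \mid X_t) = \prod_{i=1}^{4} P(Y_t^i \mid X_t)$, where $X_t = (L_t^1, L_t^2, \Theta_t)$

• Map the global coordinate frame to the robot’s local coordinate frame

Then, learn the appearance of the cell in all directions:

$$B(a, j, \mathrm{N}, y) = P(Y_t = y \mid L_t^2 = j, L_t^1 = a, \Theta_t = \mathrm{N})$$

$$P(Y_t = y \mid L_t^2 = j, L_t^1 = a, \Theta_t = \mathrm{S}) = B(a, j, \mathrm{N}, R^{180}(y))$$

where $R^{180}$ rotates the observation $y$ by 180°.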
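A minimal sketch of the naive Bayes observation model with a single world-frame appearance table per cell. It generalizes the 180° rotation to all four orientations; the sensor ordering (front, right, back, left), the table layout `B[cell][side]`, and the example probabilities are hypothetical.

```python
# World directions in clockwise order. The robot's four sensors are assumed
# (hypothetically) to be ordered front, right, back, left.
DIRS = ["N", "E", "S", "W"]


def obs_likelihood(y, cell, theta, B):
    """P(Y_t = y | X_t = (cell, theta)) under the naive Bayes assumption.

    y:     4 bits in the robot frame (1 = wall, 0 = opening)
    cell:  (a, j) = (abstract location, concrete location)
    theta: robot orientation, one of DIRS
    B:     B[cell][d] = P(wall on world side d of the cell), d in DIRS;
           learned once in the world frame and shared by all orientations.
    """
    offset = DIRS.index(theta)
    p = 1.0
    for r, bit in enumerate(y):
        # Rotate robot-frame sensor r into the world frame.
        p_wall = B[cell][DIRS[(offset + r) % 4]]
        # Product over the four sides: the naive Bayes factorization.
        p *= p_wall if bit == 1 else 1.0 - p_wall
    return p


# A north-south corridor cell: walls east and west, openings north and south.
B = {(0, 1): {"N": 0.1, "E": 0.9, "S": 0.1, "W": 0.9}}
print(obs_likelihood((0, 1, 0, 1), (0, 1), "N", B))  # facing North
print(obs_likelihood((0, 1, 0, 1), (0, 1), "S", B))  # facing South (180° turn)
```

Sharing one table across orientations is what lets the model learn the appearance of a cell in all directions from observations gathered while facing any direction.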

Example

Figure: example transition matrices $H^1$ and $H^2$

Inference

• Online filtering: $P(X_t \mid y_{1:t}, u_{1:t})$
  – Input to the controller: MLE of the abstract and concrete states

• Offline smoothing: $P(X_t \mid y_{1:T}, u_{1:T})$
  – $O(D K^{1.5D} T)$, where D: # of dimensions, K: # of states in each level
  – 1.5D: size of the largest clique in the DBN = the state nodes at t-1 + half of the state nodes at t
  – Approximation (belief propagation): $O(DKT)$
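Offline smoothing can be sketched with the textbook forward-backward algorithm on a flattened model. This is a swapped-in flat technique costing $O(T n^2)$ for $n$ flat states, not the clique-based or belief-propagation inference on the factored DBN whose costs are quoted on the slide; the list layout is hypothetical.

```python
def smooth(b0, us, ys, T, O):
    """Forward-backward smoothing on a flat (flattened) POMDP.

    b0:     prior belief over the n flat states before the first step
    us, ys: action and observation index sequences, same length
    T[u][s][s2] = P(X_t = s2 | X_{t-1} = s, U_t = u)   (hypothetical layout)
    O[s2][y]    = P(Y_t = y | X_t = s2)
    Returns gammas[t][s] = P(X_t = s | y_{1:T}, u_{1:T}).
    """
    n = len(b0)
    # Forward pass: normalized filtered beliefs P(X_t | y_{1:t}, u_{1:t}).
    alphas, b = [], b0
    for u, y in zip(us, ys):
        pred = [sum(b[s] * T[u][s][s2] for s in range(n)) for s2 in range(n)]
        un = [pred[s2] * O[s2][y] for s2 in range(n)]
        z = sum(un)
        b = [p / z for p in un]
        alphas.append(b)
    # Backward pass: beta[s] is proportional to P(future observations | X_t = s).
    gammas = [None] * len(ys)
    beta = [1.0] * n
    for t in range(len(ys) - 1, -1, -1):
        g = [alphas[t][s] * beta[s] for s in range(n)]
        zg = sum(g)
        gammas[t] = [x / zg for x in g]
        if t > 0:
            # Step back from time t to t-1 using action us[t], observation ys[t].
            u, y = us[t], ys[t]
            beta = [sum(T[u][s][s2] * O[s2][y] * beta[s2] for s2 in range(n))
                    for s in range(n)]
            zb = sum(beta)
            beta = [x / zb for x in beta]  # rescale to avoid underflow
    return gammas


gammas = smooth([0.5, 0.5], us=[0, 0, 0], ys=[0, 1, 1],
                T=[[[0.9, 0.1], [0.1, 0.9]]],
                O=[[0.8, 0.2], [0.3, 0.7]])
```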

Learning

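• Maximum-likelihood parameter estimation using EM

• In the E step, compute the family marginals $P(V_t, Pa(V_t) \mid y_{1:T}, u_{1:T})$ for every node $V_t$ and its parents $Pa(V_t)$

• In the M step, normalize the matrices of expected counts, e.g.:

$$\hat{H}^1(i, e, j) = \frac{\sum_{t=2}^{T} H^1(t, i, e, j)}{\sum_{j'} \sum_{t=2}^{T} H^1(t, i, e, j')}, \qquad H^1(t, i, e, j) = P(L_t^1 = j, L_{t-1}^1 = i, E_{t-1} = e \mid O)$$

where $O$ denotes the observed sequence.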
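A minimal sketch of the M step for the abstract horizontal matrix, assuming the E step has already produced the two-slice marginals $P(L_t^1 = j, L_{t-1}^1 = i, E_{t-1} = e \mid O)$; the dict container keyed by `(t, i, e, j)` is hypothetical.

```python
from collections import defaultdict


def m_step_abstract(expected):
    """Normalize E-step expected counts into estimates of H^1(i, e, j).

    expected: dict mapping (t, i, e, j) -> P(L_t^1 = j, L_{t-1}^1 = i,
              E_{t-1} = e | O), a hypothetical container for the E-step output.
    Returns a dict H[(i, e)][j]; each (i, e) row sums to 1.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for (t, i, e, j), p in expected.items():
        totals[(i, e)][j] += p                # numerator: sum over t
    H = {}
    for (i, e), row in totals.items():
        z = sum(row.values())                 # denominator: sum over j'
        if z > 0:
            H[(i, e)] = {j: c / z for j, c in row.items()}
    return H


expected = {(2, 0, "n", 1): 0.6, (3, 0, "n", 1): 0.3, (3, 0, "n", 0): 0.3}
H = m_step_abstract(expected)
```

The other parameter tables are re-estimated the same way: accumulate the matching expected counts over time, then normalize over the child variable.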

Learning (Cont.)

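Concrete horizontal transition matrix:

$$H^2(t, i, \theta, a, j) = P(L_{t-1}^2 = i, E_{t-1} = \text{no-exit}, L_t^1 = a, \Theta_t = \theta, L_t^2 = j \mid O)$$

Exit probabilities:

$$X(t, j, a, \theta, e) = P(L_t^2 = j, L_t^1 = a, \Theta_t = \theta, E_t = e \mid O)$$

Vertical transition vector:

$$V^2(t, e, a, j) = P(E_{t-1} = e, L_t^1 = a, L_t^2 = j \mid O), \qquad e \neq \text{no-exit}$$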

Estimating Observation Model

• Map local observations into a world-centered frame

Probability of observing y when facing North

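Hierarchical models localize better

$$b_t(s) = P(X_t = s \mid y_{1:t}, u_{1:t})$$

$$\text{localization accuracy} = \sum_{t=1}^{T} b_t(s_t)$$

where $s_t$ is the robot's true state at time $t$.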
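A minimal sketch of the accuracy measure: add up, over the run, the belief mass $b_t(s_t) = P(X_t = s_t \mid y_{1:t}, u_{1:t})$ that the filter places on the true state $s_t$. The argument names and the ground-truth list are hypothetical.

```python
def localization_accuracy(beliefs, true_states):
    """Sum over t of b_t(s_t): belief mass placed on the true state.

    beliefs[t][s] = b_t(s) = P(X_t = s | y_{1:t}, u_{1:t})
    true_states[t] = s_t, the robot's true state (hypothetical ground truth)
    Dividing the result by len(beliefs) gives a per-step average.
    """
    if len(beliefs) != len(true_states):
        raise ValueError("need exactly one true state per belief")
    return sum(b[s] for b, s in zip(beliefs, true_states))


acc = localization_accuracy([[0.5, 0.5], [0.1, 0.9]], [0, 1])
```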

Figure: localization results for the Factored DBN H-POMDP, the H-POMDP, and the STATE POMDP, including the models before training

Conclusions

• Represent H-POMDPs with DBNs
  – Learn large models with less data

• Difference from SLAM:
  – SLAM is harder to generalize

Complexity of Inference

• STATE H-POMDP: $O(S T^3)$

• FACTORED DBN H-POMDP: $O(S^{1.5} T)$


Number of states: $S = K^D$
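The gap between the two inference costs is easy to see numerically. This sketch evaluates $S = K^D$, $S T^3$ (state H-POMDP), $S^{1.5} T$ (factored DBN), and the $DKT$ cost of approximate belief propagation for hypothetical sizes, ignoring constant factors.

```python
def inference_costs(K, D, T):
    """Asymptotic operation counts (constants dropped) for sequence length T.

    K: states per level, D: number of dimensions; S = K**D flat states.
    """
    S = K ** D
    return {
        "S": S,
        "state_hpomdp": S * T ** 3,     # O(S T^3)
        "factored_dbn": S ** 1.5 * T,   # O(S^1.5 T) = O(K^(1.5 D) T)
        "approx_bp": D * K * T,         # O(D K T), belief propagation
    }


costs = inference_costs(K=4, D=2, T=100)
print(costs)
```

For these hypothetical sizes (S = 16, T = 100) the counts are 16,000,000 for the state H-POMDP, 6,400 for the factored DBN, and 800 for belief propagation: the cubic dependence on sequence length dominates.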
