Finding Approximate POMDP Solutions through Belief Compression


Based on slides by Nicholas Roy, MIT


Reliable Navigation

Conventional trajectories may not be robust to localisation error

[Figure: estimated robot position, robot position distribution, true robot position, and goal position]

Perception and Control

[Diagram: world state → perception → control algorithms]

Perception and Control

Assumed full observability: world state → probabilistic perception model P(x) → argmax P(x) → control. Brittle.

Exact POMDP planning: world state → probabilistic perception model P(x) → control. Intractable.

Perception and Control

Assume full observability: brittle. Exact POMDP planning: intractable.

[Diagram: world state → probabilistic perception model P(x) → compressed P(x) → control]

Main Insight

[Diagram: world state → probabilistic perception model P(x) → low-dimensional P(x) → control]

Good policies for real-world POMDPs can be found by planning over low-dimensional representations of the belief space.

Belief Space Structure

The controller may be globally uncertain... but not usually.

Coastal Navigation

Represent beliefs using the maximum-likelihood state and the belief entropy:

$\tilde{b} = \langle \arg\max_s b(s),\ H(b) \rangle$

Discretise into a low-dimensional belief-space MDP.
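As a minimal sketch of this representation, assuming the belief is a discrete probability vector over map states (the function name and numpy usage are illustrative):

```python
import numpy as np

def coastal_features(b):
    """Compress a belief to (maximum-likelihood state, entropy).

    b: 1-D array of state probabilities summing to 1.
    Returns the pair used as the low-dimensional belief b~.
    """
    ml_state = int(np.argmax(b))             # argmax_s b(s)
    p = b[b > 0]                             # drop zeros to avoid log(0)
    entropy = float(-np.sum(p * np.log(p)))  # H(b)
    return ml_state, entropy
```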

Coastal Navigation

A Hard Navigation Problem

[Bar chart: average distance to goal (distance in m) for the Maximum Likelihood and AMDP heuristics]

Dimensionality Reduction

Principal Components Analysis

[Diagram: original beliefs decomposed into weights × characteristic beliefs]

Principal Components Analysis

Given a belief $b \in \mathbb{R}^n$, we want $\tilde{b} \in \mathbb{R}^m$, $m \ll n$.

Collection of beliefs drawn from a 200-state problem.

[Plot: probability of being in each state vs. state; one sample distribution and its reconstruction with m = 9]
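A minimal sketch of this plain-PCA baseline, assuming the sampled beliefs are stacked as columns of a matrix B; the mean-centering and the use of numpy's SVD are implementation choices, not prescribed by the slides:

```python
import numpy as np

def pca_compress(B, m):
    """Compress sampled beliefs with ordinary PCA.

    B: (n_states, n_beliefs) matrix, one sampled belief per column.
    m: number of characteristic beliefs (bases) to keep.
    Returns the bases U_m, the weights W, and the reconstruction B_hat,
    so that B is approximately mean + U_m @ W.
    """
    mean = B.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(B - mean, full_matrices=False)
    U_m = U[:, :m]              # characteristic beliefs
    W = U_m.T @ (B - mean)      # low-dimensional weights
    B_hat = mean + U_m @ W      # reconstruction
    return U_m, W, B_hat
```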

Principal Components Analysis

Given a belief $b \in \mathbb{R}^n$, we want $\tilde{b} \in \mathbb{R}^m$, $m \ll n$.

[Plot: probability of being in each state vs. state]

Principal Components Analysis

Many real-world POMDP distributions are characterised by large regions of low probability.

Idea: create a fitting criterion that is (exponentially) stronger in low-probability regions (E-PCA).

[Panels: reconstructions using 1 basis, 2 bases, 3 bases, and 4 bases]
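A sketch of the E-PCA idea under an exponential link: the reconstruction is exp(U V), and the loss sum(exp(UV) − B∘UV) penalises misfit in low-probability regions far more strongly than squared error. The plain gradient descent, step size, and iteration count here are illustrative assumptions; the original work fits this with a more efficient iterative scheme.

```python
import numpy as np

def epca_fit(B, m, iters=20000, lr=1e-3, seed=0):
    """Fit E-PCA bases: B ~= exp(U @ V) under a Poisson-style loss.

    B: (n_states, n_beliefs) matrix of sampled beliefs (columns).
    m: number of bases to keep.
    Loss: sum(exp(U @ V) - B * (U @ V)), minimised by gradient descent.
    """
    n, k = B.shape
    rng = np.random.default_rng(seed)
    U = 0.01 * rng.standard_normal((n, m))   # bases
    V = 0.01 * rng.standard_normal((m, k))   # per-belief weights
    for _ in range(iters):
        R = np.exp(U @ V) - B                # d(loss)/d(U @ V)
        gU, gV = R @ V.T, U.T @ R            # chain rule
        U -= lr * gU
        V -= lr * gV
    return U, V, np.exp(U @ V)               # bases, weights, reconstruction
```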

Example E-PCA

[Plot: probability of being in each state vs. state for the E-PCA reconstruction]

Example Reduction

Finding Dimensionality

E-PCA will indicate the appropriate number of bases, depending on the beliefs encountered.

Planning

[Diagram: original POMDP (states s1, s2, s3) → E-PCA → low-dimensional belief space $\tilde{B}$ → discretise → discrete belief-space MDP]
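Once the low-dimensional belief space has been discretised, the result is an ordinary MDP, so standard value iteration applies. A minimal sketch, assuming a tabular transition model T (one matrix per action, as constructed below) and an expected-reward vector R over the grid beliefs; the discount factor is illustrative:

```python
import numpy as np

def value_iteration(T, R, gamma=0.95, tol=1e-6):
    """Solve the discrete belief-space MDP.

    T: dict mapping each action to an (N x N) matrix of
       transition probabilities between grid beliefs.
    R: length-N vector of expected rewards R(b~).
    Returns the value function and a greedy policy over grid beliefs.
    """
    actions = list(T)
    V = np.zeros(len(R))
    while True:
        Q = np.stack([R + gamma * (T[a] @ V) for a in actions])
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, [actions[i] for i in Q.argmax(axis=0)]
        V = V_new
```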

Model Parameters

Reward function $R(\tilde{b})$: back-project the low-dimensional belief to the high-dimensional belief $b$ over states $s_1, s_2, s_3, \ldots$ with probabilities $p(s)$, then compute the expected reward from that belief:

$R(\tilde{b}) = E_b[R(s)] = \sum_{s \in S} b(s)\, R(s)$
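A sketch of this reward computation, assuming the E-PCA exponential link from the earlier sketch (so back-projection is exp(U v) followed by renormalisation); the names are illustrative:

```python
import numpy as np

def belief_reward(v, U, R_s):
    """Expected reward of a compressed belief b~ with weights v.

    v: low-dimensional weight vector for one belief.
    U: E-PCA bases (n_states x m).
    R_s: length-n vector of per-state rewards R(s).
    """
    b = np.exp(U @ v)       # back-project to the full belief
    b /= b.sum()            # renormalise to a distribution
    return float(b @ R_s)   # R(b~) = sum_s b(s) R(s)
```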

Model Parameters

Transition function (moving between the low-dimensional and full-dimensional spaces):

1. For each belief $\tilde{b}_i$ and action $a$
2. Recover the full belief $b_i$
3. Propagate according to the action
4. Propagate according to the observation
5. Recover $\tilde{b}_j$
6. Set $T(\tilde{b}_i, a, \tilde{b}_j)$ to the probability of the observation:

$T(\tilde{b}_i, a, \tilde{b}_j) = \sum_{k=1}^{|Z|} \sum_{l=1}^{|S|} \sum_{m=1}^{|S|} p(z_k \mid s_l)\; p(s_l \mid s_m, a)\; b_i(s_m)$

with the sum over $z_k$ restricted to observations whose updated belief compresses to $\tilde{b}_j$.
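A sketch of steps 1-6, assuming tabular dynamics p(s'|s,a) and observation model p(z|s'), an E-PCA recover/compress pair as sketched earlier, and a nearest-cell lookup that maps a compressed belief to its grid index; all names are illustrative:

```python
import numpy as np

def build_transition_model(grid, recover, compress, nearest,
                           T_sas, O_zs, actions):
    """Steps 1-6: build T(b~_i, a, b~_j) for the belief-space MDP.

    grid: list of low-dimensional grid beliefs b~_i.
    recover/compress: E-PCA back-projection and projection.
    nearest: maps a compressed belief to the index of its grid cell.
    T_sas[a][s, s']: p(s' | s, a); O_zs[z, s']: p(z | s').
    """
    N = len(grid)
    T = {a: np.zeros((N, N)) for a in actions}
    for i, b_low in enumerate(grid):              # 1. each belief and action
        b = recover(b_low)                        # 2. recover full belief b_i
        for a in actions:
            b_a = T_sas[a].T @ b                  # 3. propagate through action
            for z in range(O_zs.shape[0]):
                p_z = float(O_zs[z] @ b_a)        # probability of observing z
                if p_z == 0.0:
                    continue
                b_az = O_zs[z] * b_a / p_z        # 4. condition on observation
                j = nearest(compress(b_az))       # 5. recover b~_j on the grid
                T[a][i, j] += p_z                 # 6. accumulate p(z | b_i, a)
    return T
```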

Robot Navigation Example

[Figure: initial distribution, true (hidden) robot position, goal position, and goal state]

Robot Navigation Example

[Figure: true robot position and goal position]

Policy Comparison

[Bar chart: average distance to goal (distance in m) for the Maximum Likelihood, AMDP, and E-PCA (6 bases) policies]

People Finding

People Finding as a POMDP

Fully observable robot; position of the person unknown.

[Figure: robot position and true person position]

Finding and Tracking People

[Figure: robot position and true person position]

People Finding as a POMDP

Factored belief space: 2 dimensions for the fully-observable robot position, 6 dimensions for the distribution over person positions.

A regular grid gives $\approx 10^{16}$ states.

Variable Resolution

Non-regular grid using sampled beliefs $\tilde{b}_1, \ldots, \tilde{b}_5$.

[Diagram: sampled grid beliefs with transitions $T(\tilde{b}_1, a_1, \tilde{b}_2)$ and $T(\tilde{b}_1, a_2, \tilde{b}_5)$]

Compute model parameters using nearest-neighbour.

Refining the Grid

1. Sample beliefs according to the current policy
2. Construct a new model
3. Keep a new belief $\tilde{b}'_1$ if $V(\tilde{b}'_1) > V(\tilde{b}_1)$
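A hedged sketch of this refinement loop; build_model, solve, and simulate_policy are hypothetical helpers standing in for the model construction and value iteration sketched above, and the comparison at the initial grid belief mirrors the slide's acceptance test:

```python
def refine_grid(grid, simulate_policy, build_model, solve):
    """Grow the belief grid only where it improves the value estimate."""
    V, policy = solve(build_model(grid))
    for b_new in simulate_policy(policy):       # sample beliefs under policy
        candidate = grid + [b_new]
        V_new, policy_new = solve(build_model(candidate))
        if V_new[0] > V[0]:                     # keep if V(b~'_1) > V(b~_1)
            grid, V, policy = candidate, V_new, policy_new
    return grid
```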

The Optimal Policy

[Figure: original distribution vs. reconstruction using E-PCA with 6 bases; robot position and true person position shown]

Policy Comparison

Average time to find the person:

[Bar chart: average number of actions to find the person for the Closest, Densest, Maximum Likelihood, E-PCA, and Refined E-PCA heuristics, with the fully observable MDP as a baseline]

E-PCA: 72 states; Refined E-PCA: 260 states.

Nick’s Thesis Contributions

Good policies for real-world POMDPs can be found by planning over a low-dimensional representation of the belief space, using E-PCA.

POMDPs can scale to bigger, more complicated real-world problems.

POMDPs can be used for real, deployed robots.