Finding Approximate POMDP Solutions through Belief Compression


Based on slides by Nicholas Roy, MIT


Reliable Navigation

Conventional trajectories may not be robust to localisation error

[Figure: estimated robot position, robot position distribution, true robot position, and goal position]

Perception and Control

[Diagram: world state → perception → control algorithms]

Perception and Control

Assumed full observability: world state → probabilistic perception model P(x) → argmax P(x) → control. Brittle.

Exact POMDP planning: world state → probabilistic perception model P(x) → control. Intractable.

Perception and Control

Assume full observability: brittle. Exact POMDP planning: intractable.

[Diagram: world state → probabilistic perception model P(x) → compressed P(x) → control]

Main Insight

[Diagram: world state → probabilistic perception model P(x) → low-dimensional P(x) → control]

Good policies for real-world POMDPs can be found by planning over low-dimensional representations of the belief space.

Belief Space Structure

The controller may be globally uncertain... but not usually.

Coastal Navigation

Represent beliefs using the maximum-likelihood state and the belief entropy:

$\tilde{b} = \langle \arg\max_s b(s),\ H(b) \rangle$

Discretise into a low-dimensional belief-space MDP.
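As a minimal sketch of this representation, assuming the belief is a discrete probability vector over map states (the function name and numpy usage are illustrative):

```python
import numpy as np

def coastal_features(b):
    """Compress a belief to (maximum-likelihood state, entropy).

    b: 1-D array of state probabilities summing to 1.
    Returns the pair used as the low-dimensional belief b~.
    """
    ml_state = int(np.argmax(b))             # argmax_s b(s)
    p = b[b > 0]                             # drop zeros to avoid log(0)
    entropy = float(-np.sum(p * np.log(p)))  # H(b)
    return ml_state, entropy
```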

Coastal Navigation

A Hard Navigation Problem

[Bar chart: average distance to goal (distance in m) for the Maximum Likelihood and AMDP heuristics]

Dimensionality Reduction

Principal Components Analysis

[Diagram: original beliefs decomposed into weights × characteristic beliefs]

Principal Components Analysis

Given a belief $b \in \mathbb{R}^n$, we want $\tilde{b} \in \mathbb{R}^m$, $m \ll n$.

Collection of beliefs drawn from a 200-state problem.

[Plot: probability of being in each state vs. state; one sample distribution and its reconstruction with m = 9]
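A minimal sketch of this plain-PCA baseline, assuming the sampled beliefs are stacked as columns of a matrix B; the mean-centering and the use of numpy's SVD are implementation choices, not prescribed by the slides:

```python
import numpy as np

def pca_compress(B, m):
    """Compress sampled beliefs with ordinary PCA.

    B: (n_states, n_beliefs) matrix, one sampled belief per column.
    m: number of characteristic beliefs (bases) to keep.
    Returns the bases U_m, the weights W, and the reconstruction B_hat,
    so that B is approximately mean + U_m @ W.
    """
    mean = B.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(B - mean, full_matrices=False)
    U_m = U[:, :m]              # characteristic beliefs
    W = U_m.T @ (B - mean)      # low-dimensional weights
    B_hat = mean + U_m @ W      # reconstruction
    return U_m, W, B_hat
```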

Principal Components Analysis

Given a belief $b \in \mathbb{R}^n$, we want $\tilde{b} \in \mathbb{R}^m$, $m \ll n$.

[Plot: probability of being in each state vs. state]

Principal Components Analysis

Many real-world POMDP distributions are characterised by large regions of low probability.

Idea: create a fitting criterion that is (exponentially) stronger in low-probability regions (E-PCA).

[Panels: reconstructions using 1 basis, 2 bases, 3 bases, and 4 bases]
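A sketch of the E-PCA idea under an exponential link: the reconstruction is exp(U V), and the loss sum(exp(UV) − B∘UV) penalises misfit in low-probability regions far more strongly than squared error. The plain gradient descent, step size, and iteration count here are illustrative assumptions; the original work fits this with a more efficient iterative scheme.

```python
import numpy as np

def epca_fit(B, m, iters=20000, lr=1e-3, seed=0):
    """Fit E-PCA bases: B ~= exp(U @ V) under a Poisson-style loss.

    B: (n_states, n_beliefs) matrix of sampled beliefs (columns).
    m: number of bases to keep.
    Loss: sum(exp(U @ V) - B * (U @ V)), minimised by gradient descent.
    """
    n, k = B.shape
    rng = np.random.default_rng(seed)
    U = 0.01 * rng.standard_normal((n, m))   # bases
    V = 0.01 * rng.standard_normal((m, k))   # per-belief weights
    for _ in range(iters):
        R = np.exp(U @ V) - B                # d(loss)/d(U @ V)
        gU, gV = R @ V.T, U.T @ R            # chain rule
        U -= lr * gU
        V -= lr * gV
    return U, V, np.exp(U @ V)               # bases, weights, reconstruction
```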

Example E-PCA

[Plot: probability of being in each state vs. state for the E-PCA reconstruction]

Example Reduction

Finding Dimensionality

E-PCA will indicate the appropriate number of bases, depending on the beliefs encountered.

Planning

[Diagram: original POMDP (states s1, s2, s3) → E-PCA → low-dimensional belief space $\tilde{B}$ → discretise → discrete belief-space MDP]
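Once the low-dimensional belief space has been discretised, the result is an ordinary MDP, so standard value iteration applies. A minimal sketch, assuming a tabular transition model T (one matrix per action, as constructed below) and an expected-reward vector R over the grid beliefs; the discount factor is illustrative:

```python
import numpy as np

def value_iteration(T, R, gamma=0.95, tol=1e-6):
    """Solve the discrete belief-space MDP.

    T: dict mapping each action to an (N x N) matrix of
       transition probabilities between grid beliefs.
    R: length-N vector of expected rewards R(b~).
    Returns the value function and a greedy policy over grid beliefs.
    """
    actions = list(T)
    V = np.zeros(len(R))
    while True:
        Q = np.stack([R + gamma * (T[a] @ V) for a in actions])
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, [actions[i] for i in Q.argmax(axis=0)]
        V = V_new
```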

Model Parameters

Reward function $R(\tilde{b})$: back-project the low-dimensional belief to the high-dimensional belief $b$ over states $s_1, s_2, s_3, \ldots$ with probabilities $p(s)$, then compute the expected reward from that belief:

$R(\tilde{b}) = E_b[R(s)] = \sum_{s \in S} b(s)\, R(s)$
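A sketch of this reward computation, assuming the E-PCA exponential link from the earlier sketch (so back-projection is exp(U v) followed by renormalisation); the names are illustrative:

```python
import numpy as np

def belief_reward(v, U, R_s):
    """Expected reward of a compressed belief b~ with weights v.

    v: low-dimensional weight vector for one belief.
    U: E-PCA bases (n_states x m).
    R_s: length-n vector of per-state rewards R(s).
    """
    b = np.exp(U @ v)       # back-project to the full belief
    b /= b.sum()            # renormalise to a distribution
    return float(b @ R_s)   # R(b~) = sum_s b(s) R(s)
```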

Model Parameters

Transition function (moving between the low-dimensional and full-dimensional spaces):

1. For each belief $\tilde{b}_i$ and action $a$
2. Recover the full belief $b_i$
3. Propagate according to the action
4. Propagate according to the observation
5. Recover $\tilde{b}_j$
6. Set $T(\tilde{b}_i, a, \tilde{b}_j)$ to the probability of the observation:

$T(\tilde{b}_i, a, \tilde{b}_j) = \sum_{k=1}^{|Z|} \sum_{l=1}^{|S|} \sum_{m=1}^{|S|} p(z_k \mid s_l)\; p(s_l \mid s_m, a)\; b_i(s_m)$

with the sum over $z_k$ restricted to observations whose updated belief compresses to $\tilde{b}_j$.
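A sketch of steps 1-6, assuming tabular dynamics p(s'|s,a) and observation model p(z|s'), an E-PCA recover/compress pair as sketched earlier, and a nearest-cell lookup that maps a compressed belief to its grid index; all names are illustrative:

```python
import numpy as np

def build_transition_model(grid, recover, compress, nearest,
                           T_sas, O_zs, actions):
    """Steps 1-6: build T(b~_i, a, b~_j) for the belief-space MDP.

    grid: list of low-dimensional grid beliefs b~_i.
    recover/compress: E-PCA back-projection and projection.
    nearest: maps a compressed belief to the index of its grid cell.
    T_sas[a][s, s']: p(s' | s, a); O_zs[z, s']: p(z | s').
    """
    N = len(grid)
    T = {a: np.zeros((N, N)) for a in actions}
    for i, b_low in enumerate(grid):              # 1. each belief and action
        b = recover(b_low)                        # 2. recover full belief b_i
        for a in actions:
            b_a = T_sas[a].T @ b                  # 3. propagate through action
            for z in range(O_zs.shape[0]):
                p_z = float(O_zs[z] @ b_a)        # probability of observing z
                if p_z == 0.0:
                    continue
                b_az = O_zs[z] * b_a / p_z        # 4. condition on observation
                j = nearest(compress(b_az))       # 5. recover b~_j on the grid
                T[a][i, j] += p_z                 # 6. accumulate p(z | b_i, a)
    return T
```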

Robot Navigation Example

[Figure: initial distribution, true (hidden) robot position, goal position, and goal state]

Robot Navigation Example

[Figure: true robot position and goal position]

Policy Comparison

[Bar chart: average distance to goal (distance in m) for the Maximum Likelihood, AMDP, and E-PCA (6 bases) policies]

People Finding

People Finding as a POMDP

Fully observable robot; position of the person unknown.

[Figure: robot position and true person position]

Finding and Tracking People

[Figure: robot position and true person position]

People Finding as a POMDP

Factored belief space: 2 dimensions for the fully-observable robot position, 6 dimensions for the distribution over person positions.

A regular grid gives $\approx 10^{16}$ states.

Variable Resolution

Non-regular grid using sampled beliefs $\tilde{b}_1, \ldots, \tilde{b}_5$.

[Diagram: sampled grid beliefs with transitions $T(\tilde{b}_1, a_1, \tilde{b}_2)$ and $T(\tilde{b}_1, a_2, \tilde{b}_5)$]

Compute model parameters using nearest-neighbour.

Refining the Grid

1. Sample beliefs according to the current policy
2. Construct a new model
3. Keep a new belief $\tilde{b}'_1$ if $V(\tilde{b}'_1) > V(\tilde{b}_1)$
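A hedged sketch of this refinement loop; build_model, solve, and simulate_policy are hypothetical helpers standing in for the model construction and value iteration sketched above, and the comparison at the initial grid belief mirrors the slide's acceptance test:

```python
def refine_grid(grid, simulate_policy, build_model, solve):
    """Grow the belief grid only where it improves the value estimate."""
    V, policy = solve(build_model(grid))
    for b_new in simulate_policy(policy):       # sample beliefs under policy
        candidate = grid + [b_new]
        V_new, policy_new = solve(build_model(candidate))
        if V_new[0] > V[0]:                     # keep if V(b~'_1) > V(b~_1)
            grid, V, policy = candidate, V_new, policy_new
    return grid
```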

The Optimal Policy

[Figure: original distribution vs. reconstruction using E-PCA with 6 bases; robot position and true person position shown]

Policy Comparison

Average time to find the person:

[Bar chart: average number of actions to find the person for the Closest, Densest, Maximum Likelihood, E-PCA, and Refined E-PCA heuristics, with the fully observable MDP as a baseline]

E-PCA: 72 states; Refined E-PCA: 260 states.

Nick’s Thesis Contributions

Good policies for real-world POMDPs can be found by planning over a low-dimensional representation of the belief space, using E-PCA.

POMDPs can scale to bigger, more complicated real-world problems.

POMDPs can be used for real, deployed robots.