Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004
Tractable Planning for Real-World Robotics:
The promises and challenges of dealing with uncertainty
Joelle Pineau
Robotics Institute
Carnegie Mellon University
Stanford University
May 3, 2004
Joelle Pineau Tractable Planning for Real-World Robotics 2
Robots in unstructured environments
A vision for robotic-assisted health-care
• Moving things around
• Supporting inter-personal communication
• Calling for help in emergencies
• Monitoring Rx adherence & safety
• Providing information
• Reminding to eat, drink, & take meds
• Providing physical assistance
What is hard about planning for real-world robotics?
• Generating a plan that has high expected utility.
• Switching between different representations of the world.
• Resolving conflicts between interfering jobs.
• Accomplishing jobs in changing, partly unknown environments.
• Handling percepts that are incomplete, ambiguous, outdated, or incorrect.
* Highlights from a proposed robot grand challenge
Talk outline
• Uncertainty in plan-based robotics
• Partially Observable Markov Decision Processes (POMDPs)
• POMDP solver #1: Point-based value iteration (PBVI)
• POMDP solver #2: Policy-contingent abstraction (PolCA+)
Why use a POMDP?
• POMDPs provide a rich framework for sequential decision-making, which can model:
– Effect uncertainty
– State uncertainty
– Varying rewards across actions and goals
POMDP applications in the last decade
Robotics:
• Robust mobile robot navigation [Simmons & Koenig, 1995; + many more]
• Autonomous helicopter control [Bagnell & Schneider, 2001; Ng et al., 2003]
• Machine vision [Bandera et al., 1996; Darrell & Pentland, 1996]
• High-level robot control [Pineau et al., 2003]
• Robust dialogue management [Roy, Pineau & Thrun, 2000; Paek & Horvitz, 2000]
Others:
• Machine maintenance [Puterman, 1994]
• Network troubleshooting [Thiébaux et al., 1996]
• Circuits testing [correspondence, 2004]
• Preference elicitation [Boutilier, 2002]
• Medical diagnosis [Hauskrecht, 1997]
POMDP model
A POMDP is an n-tuple { S, A, Z, T, O, R }:
S = state set
A = action set
Z = observation set
T: Pr(s'|s,a) = state-to-state transition probabilities
O: Pr(z|s,a) = observation generation probabilities
R(s,a) = reward function
What goes on: states s_{t-1}, s_t; actions a_{t-1}, a_t; rewards r_{t-1}, r_t
What we see: observations z_{t-1}, z_t
What we infer: beliefs b_{t-1}, b_t
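The "what we infer" row can be made concrete with a Bayes-filter belief update. The sketch below is illustrative only: it assumes the common convention that the observation depends on the resulting state s' (the slide writes Pr(z|s,a)), and the function name, array layout, and two-state numbers are hypothetical.

```python
def belief_update(b, a, z, T, O):
    """Return b'(s') proportional to O[a][s'][z] * sum_s T[a][s][s'] * b[s].

    Assumed layout (not from the talk): T[a][s][s'] and O[a][s'][z]
    are nested lists of probabilities.
    """
    n = len(b)
    unnormalized = []
    for s_next in range(n):
        # Prediction step: probability of landing in s_next after action a.
        predicted = sum(T[a][s][s_next] * b[s] for s in range(n))
        # Correction step: weight by the likelihood of observing z.
        unnormalized.append(O[a][s_next][z] * predicted)
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [p / total for p in unnormalized]


# Hypothetical two-state model with a single "listen" action that
# leaves the state unchanged and is correct 85% of the time.
T = [[[1.0, 0.0], [0.0, 1.0]]]
O = [[[0.85, 0.15], [0.15, 0.85]]]
b0 = [0.5, 0.5]
b1 = belief_update(b0, a=0, z=0, T=T, O=O)
print(b1)
```

Starting from the uniform belief, a single observation shifts the probability mass toward the state that best explains it.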
Examples of robot beliefs
[Figure: robot and particle sets illustrating a uniform belief and a bi-modal belief]
Understanding the belief state
• A belief is a probability distribution over states, where Dim(B) = |S|-1
– E.g. let S={s1, s2}: one axis, P(s1), ranging from 0 to 1
– E.g. let S={s1, s2, s3}: two axes, P(s1) and P(s2), each ranging from 0 to 1
– E.g. let S={s1, s2, s3, s4}: three axes, P(s1), P(s2), and P(s3)
POMDP solving
Objective: Find the sequence of actions that maximizes the expected sum of rewards.
$$V(b) = \max_{a \in A} \Big[ R(b,a) + \gamma \sum_{b' \in B} T(b,a,b') \, V(b') \Big]$$
where V(b) is the value function, R(b,a) is the immediate reward, and the sum over b', discounted by γ, is the future reward.
POMDP value function
• Represent V(b) as the upper surface of a set of vectors.
– Each vector is a piece of the control policy (= action sequences).
– Dim(vector) = number of states.
• Modify / add vectors to update value fn (i.e. refine policy).
[Figure: V(b) plotted against the belief b = P(s1) for a 2-state problem]
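Reading the value function as the upper surface of a set of vectors means that V(b) is the largest dot product between b and any vector. A minimal sketch, with made-up vectors and hypothetical function names:

```python
def value(b, vectors):
    """V(b) = max over vectors of the dot product with b."""
    return max(sum(v_s * b_s for v_s, b_s in zip(vec, b)) for vec in vectors)


def best_vector_index(b, vectors):
    """Index of the vector that attains the maximum at b."""
    scores = [sum(v_s * b_s for v_s, b_s in zip(vec, b)) for vec in vectors]
    return scores.index(max(scores))


# Illustrative 2-state example: each vector has one entry per state.
vectors = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]]
print(value([0.5, 0.5], vectors))              # the flat vector wins in the middle
print(best_vector_index([0.9, 0.1], vectors))  # the first vector wins near s1
```

Because each vector is tied to a piece of the policy, the index of the winning vector also identifies what to do at that belief.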
Optimal POMDP solving
• Simple problem: 2 states, 3 actions, 3 observations
Plan length   # vectors
0             1
1             3
2             27
3             2187
4             14,348,907
[Figure: value functions V0(b) through V4(b) plotted over b = P(break-in), with actions Call-911, Investigate, and Go-to-bed]
The curse of history
Policy size grows exponentially with the planning horizon:
$$|\Gamma_n| = O\big(|A| \, |\Gamma_{n-1}|^{|Z|}\big)$$
where Γ = policy size, n = planning horizon, A = # actions, Z = # observations
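The growth rate can be tabulated directly from the worst-case recurrence, in which each new vector pairs one action with a choice of previous vector per observation. This is a sketch with a hypothetical function name, not a solver. Under this recurrence, the sequence 1, 3, 27, 2187, 14,348,907 corresponds to 3 actions with a branching factor of 2 per step.

```python
def vector_counts(num_actions, num_observations, horizon):
    """Worst-case number of vectors per plan length, without pruning."""
    counts = [1]  # plan length 0: a single vector
    for _ in range(horizon):
        counts.append(num_actions * counts[-1] ** num_observations)
    return counts


print(vector_counts(num_actions=3, num_observations=2, horizon=4))
print(vector_counts(num_actions=3, num_observations=3, horizon=2))
```

Either way the count is doubly exponential in the horizon, which is why exact solving stalls after a handful of steps.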
How many vectors for this problem?
10^4 (navigation) x 10^3 (dialogue) states
1000+ observations
100+ actions
Talk outline
• Uncertainty in plan-based robotics
• Partially Observable Markov Decision Processes (POMDPs)
• POMDP solver #1: Point-based value iteration (PBVI)
• POMDP solver #2: Policy-contingent abstraction (PolCA+)
Exact solving assumes all beliefs are equally likely
[Figure: robot and particle sets illustrating a uniform belief, a bi-modal belief, and an N-modal belief]
INSIGHT: No sequence of actions and observations can produce this N-modal belief.
A new algorithm: Point-based value iteration
Approach:
• Select a small set of belief points
– Use well-separated, reachable beliefs
• Plan for those belief points only
– Learn value and its gradient
• Pick action that maximizes value: $V(b) = \max_{\alpha \in \Gamma} \alpha \cdot b$
[Figure: V(b) plotted over P(s1), with belief points b1, b0, b2; b1 and b2 are reached from b0 via (a,z) pairs]
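Planning "for those belief points only" amounts to a point-based backup: at each selected belief, build one new vector and keep only the best one for that point. The sketch below follows the standard point-based backup from the PBVI literature. The array layout, names, and the tiny model are assumptions, and the observation is conditioned on the resulting state.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def point_based_backup(b, Gamma, T, O, R, gamma):
    """Best new vector at belief b after one backup of the vector set Gamma.

    Assumed layout: T[a][s][s2], O[a][s2][z], R[s][a].
    """
    n_states, n_actions, n_obs = len(b), len(T), len(O[0][0])
    best_vec, best_val = None, float("-inf")
    for a in range(n_actions):
        vec = [R[s][a] for s in range(n_states)]  # immediate reward
        for z in range(n_obs):
            # Project every existing vector back through (a, z).
            projected = [
                [gamma * sum(T[a][s][s2] * O[a][s2][z] * alpha[s2]
                             for s2 in range(n_states))
                 for s in range(n_states)]
                for alpha in Gamma
            ]
            # Keep only the projection that is best at this belief.
            best_proj = max(projected, key=lambda g: dot(g, b))
            vec = [v + g for v, g in zip(vec, best_proj)]
        val = dot(vec, b)
        if val > best_val:
            best_vec, best_val = vec, val
    return best_vec


def pbvi_update(B, Gamma, T, O, R, gamma):
    """One value update: exactly one vector per belief point in B."""
    return [point_based_backup(b, Gamma, T, O, R, gamma) for b in B]


# Hypothetical 2-state, 2-action, 2-observation model.
T = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
O = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
R = [[1.0, 0.0], [0.0, 1.0]]
B = [[0.9, 0.1], [0.1, 0.9]]
Gamma0 = [[0.0, 0.0]]
Gamma1 = pbvi_update(B, Gamma0, T, O, R, gamma=0.9)
Gamma2 = pbvi_update(B, Gamma1, T, O, R, gamma=0.9)
print(Gamma1, Gamma2)
```

The number of vectors never exceeds the number of belief points, which is what removes the exponential growth of the exact update.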
The anytime PBVI algorithm
• Alternate between:
1. Growing the set of belief points
2. Planning for those belief points
• Terminate when you run out of time or have a good policy.
Complexity of value update
                    Exact update          PBVI
Time: projection    S^2 A Z Γ_n           S^2 A Z Γ_n
Time: sum           S A Γ_n^Z             S A Z Γ_n B
Size (# vectors)    A Γ_n^Z               B
where: S = # states, A = # actions, Z = # observations, Γ_n = # vectors at iteration n, B = # belief points
Theoretical properties of PBVI
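Theorem: For any set of belief points B and planning horizon n, the error of the PBVI algorithm is bounded by:
$$\|V_n^B - V_n^*\|_\infty \le \frac{(R_{\max} - R_{\min})}{(1-\gamma)^2} \, \max_{b' \in \bar{\Delta}} \min_{b \in B} \|b - b'\|_1$$
where $\bar{\Delta}$ is the set of reachable beliefs and B is the set of belief points.
[Figure: V(b) plotted over P(s1) with belief points b1, b0, b2; the error (Err) is marked between the points]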
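The bound depends on how densely B covers the reachable beliefs. As a rough illustration, the sketch below evaluates the standard PBVI bound, (R_max - R_min) / (1 - γ)^2 times the largest L1 distance from any reachable belief to its nearest point in B. It assumes a finite sample stands in for the reachable set, which in general cannot be enumerated; all names and numbers are hypothetical.

```python
def l1_distance(p, q):
    return sum(abs(x - y) for x, y in zip(p, q))


def belief_set_density(B, reachable_sample):
    """Largest L1 distance from a sampled reachable belief to its nearest point in B."""
    return max(min(l1_distance(b, bp) for b in B) for bp in reachable_sample)


def pbvi_error_bound(B, reachable_sample, r_max, r_min, gamma):
    """(r_max - r_min) * density / (1 - gamma)^2."""
    density = belief_set_density(B, reachable_sample)
    return (r_max - r_min) * density / (1.0 - gamma) ** 2


# Illustrative 2-state example: one selected point, three sampled beliefs.
B = [[0.5, 0.5]]
sample = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
bound = pbvi_error_bound(B, sample, r_max=1.0, r_min=0.0, gamma=0.5)
print(bound)
```

Adding points that sit far from the existing ones shrinks the density term fastest, which is the motivation for choosing well-separated beliefs.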
The anytime PBVI algorithm
• Alternate between:
1. Growing the set of belief points
2. Planning for those belief points
• Terminate when you run out of time or have a good policy.
PBVI’s belief selection heuristic
1. Leverage insight from policy search methods:
– Focus on reachable beliefs.
2. Focus on high probability beliefs:
– Consider all actions, but stochastic observation choice.
3. Use the error bound on point-based value updates:
– Select well-separated beliefs, rather than near-by beliefs.
[Figure: beliefs along the P(s1) axis; from b, the pairs (a1,z1), (a1,z2), (a2,z1), (a2,z2) lead to successors b_{a1,z1}, b_{a1,z2}, b_{a2,z1}, b_{a2,z2}; the final build keeps b_{a1,z2} and b_{a2,z1}]
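The three heuristics can be combined into one expansion step: for each belief already in the set, simulate one step per action with a sampled observation, then keep the successor that lies farthest (in L1 distance) from the current set. The sketch below is an interpretation under stated assumptions: the array layout, names, and toy model are hypothetical, and the observation is conditioned on the resulting state.

```python
import random


def l1_distance(p, q):
    return sum(abs(x - y) for x, y in zip(p, q))


def belief_update(b, a, z, T, O):
    """Bayes filter; assumed layout T[a][s][s2], O[a][s2][z]."""
    n = len(b)
    unnorm = [O[a][s2][z] * sum(T[a][s][s2] * b[s] for s in range(n))
              for s2 in range(n)]
    total = sum(unnorm)
    return [p / total for p in unnorm]


def sample_index(probs, rng):
    """Draw an index according to the given probabilities."""
    r, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1


def expand_beliefs(B, T, O, rng):
    """Add at most one new, well-separated successor per belief in B."""
    new_points = []
    for b in B:
        candidates = []
        for a in range(len(T)):           # consider all actions
            s = sample_index(b, rng)      # stochastic state ...
            s2 = sample_index(T[a][s], rng)
            z = sample_index(O[a][s2], rng)  # ... and observation choice
            candidates.append(belief_update(b, a, z, T, O))
        known = B + new_points
        farthest = max(candidates,
                       key=lambda c: min(l1_distance(c, p) for p in known))
        if min(l1_distance(farthest, p) for p in known) > 0.0:
            new_points.append(farthest)
    return B + new_points


# Hypothetical two-state model with a single noisy "listen" action.
T = [[[1.0, 0.0], [0.0, 1.0]]]
O = [[[0.85, 0.15], [0.15, 0.85]]]
B1 = expand_beliefs([[0.5, 0.5]], T, O, random.Random(0))
print(B1)
```

Each call at most doubles the set, so alternating expansion with value updates gives the anytime behaviour described in the talk.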
Classes of value function approximations
1. No belief [Littman et al., 1995]
2. Grid over belief [Lovejoy, 1991; Brafman, 1997; Hauskrecht, 2000; Zhou & Hansen, 2001]
3. Compressed belief [Poupart & Boutilier, 2002; Roy & Gordon, 2002]
4. Sample belief points [Poon, 2001; Pineau et al., 2003]
Performance on well-known POMDPs
Method                              Reward                  Time                    # Belief points
                                    Maze1   Maze2   Maze3   Maze1   Maze2   Maze3   Maze1   Maze2   Maze3
No belief [Littman et al., 1995]    0.20    0.11    0.26    0.19    1.44    0.51    -       -       -
Grid [Brafman, 1997]                0.94    -       -       -       -       -       174     337     -
Compressed [Poupart et al., 2003]   0.00    0.07    0.11    24hrs   24hrs   24hrs   -       -       -
Sample [Poon, 2001]                 2.30    0.35    0.53    12166   27898   450     660     1840    300
PBVI [Pineau et al., 2003]          2.25    0.34    0.53    3448    360     288     470     95      86
Maze1: 36 states; Maze2: 92 states; Maze3: 60 states
PBVI in the Nursebot domain
Objective: Find the patient!
State space = RobotPosition × PatientPosition
Observation space = RobotPosition + PatientFound
Action space = {North, South, East, West, Declare}
PBVI performance on find-the-patient domain
No Belief: patient found in 17% of trials
PBVI: patient found in 90% of trials
Validation of PBVI’s belief expansion heuristic
Find-the-patient domain: 870 states, 5 actions, 30 observations
[Figure: curves labeled Greedy, PBVI, No Belief, and Random]
Policy assuming full observability
You find some: (25%)
You lose some: (75%)
PBVI Policy with 3141 belief points
You find some: (81%)
You lose some: (19%)
PBVI Policy with 643 belief points
You find some: (22%)
You lose some: (78%)
Highlights of the PBVI algorithm
• Algorithmic:
– New belief sampling algorithm.
– Efficient heuristic for belief point selection.
– Anytime performance.
• Experimental:
– Outperforms previous value approximation algorithms on known problems.
– Solves new larger problem (1 order of magnitude increase in problem size).
• Theoretical:
– Bounded approximation error.
[ Pineau, Gordon & Thrun, IJCAI 2003. Pineau, Gordon & Thrun, NIPS 2003. ]
Back to the grand challenge
How can we go from 10^3 states to real-world problems?
Talk outline
• Uncertainty in plan-based robotics
• Partially Observable Markov Decision Processes (POMDPs)
• POMDP solver #1: Point-based value iteration (PBVI)
• POMDP solver #2: Policy-contingent abstraction (PolCA+)
Structured POMDPs
Many real-world decision-making problems exhibit structure inherent to the problem domain.
[Figure: a high-level controller over Navigation, Cognitive support, and Social interaction; actions shown include Move, AskWhere, Left, Right, Forward, and Backward]
Structured POMDP approaches
Factored models [Boutilier & Poole, 1996; Hansen & Feng, 2000; Guestrin et al., 2001]
– Idea: Represent state space with multi-valued state features.
– Insight: Independencies between state features can be leveraged to overcome the curse of dimensionality.
Hierarchical POMDPs [Wiering & Schmidhuber, 1997; Theocharous et al., 2000; Hernandez-Gardiol & Mahadevan, 2000; Pineau & Thrun, 2000]
– Idea: Exploit domain knowledge to divide one POMDP into many smaller ones.
– Insight: Smaller action sets further help overcome the curse of history.
A hierarchy of POMDPs
[Figure: task hierarchy rooted at Act, with nodes ExamineHealth, Navigate, Move, VerifyFluids, VerifyMeds, ClarifyGoal, North, South, East, and West. Legend: subtask, abstract action, primitive action.]
PolCA+: Planning with a hierarchy of POMDPs
[Figure: the Navigate subtask with children Move and ClarifyGoal; Move has children North, South, East, and West]
ACTIONS: North, South, East, West, ClarifyGoal, VerifyFluids, VerifyMeds
STATE FEATURES: X-position, Y-position, X-goal, Y-goal, HealthStatus
Step 1: Select the action set, e.g. A_Move = {N,S,E,W}
Step 2: Minimize the state set, e.g. S_Move = {s1,s2}
Step 3: Choose parameters {b_h, T_h, O_h, R_h}
Step 4: Plan task h (PLAN: π_h)
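Step 1 and the ordering of the per-subtask planning can be illustrated with a small data structure. This sketch rests on two assumptions that the slides do not state outright: that VerifyFluids and VerifyMeds sit under ExamineHealth, and that subtasks are planned bottom-up so a child's policy exists before its parent treats it as an abstract action. All names are hypothetical, and no POMDP is solved here.

```python
# Assumed reading of the hierarchy figure: subtask -> its action set.
HIERARCHY = {
    "Act": ["ExamineHealth", "Navigate"],
    "ExamineHealth": ["VerifyFluids", "VerifyMeds"],
    "Navigate": ["Move", "ClarifyGoal"],
    "Move": ["North", "South", "East", "West"],
}


def is_subtask(node):
    """Subtasks have children; primitive actions do not."""
    return node in HIERARCHY


def action_set(subtask):
    """Step 1: the actions (abstract or primitive) available in a subtask."""
    return list(HIERARCHY[subtask])


def planning_order(root):
    """Post-order traversal: every subtask appears after its child subtasks."""
    order = []

    def visit(node):
        for child in HIERARCHY[node]:
            if is_subtask(child):
                visit(child)
        order.append(node)

    visit(root)
    return order


print(action_set("Move"))
print(planning_order("Act"))
```

Each subtask sees only its own action set, which is how the hierarchy shrinks the branching factor that drives the curse of history.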
PolCA+ in the Nursebot domain
• Goal: A robot is deployed in a nursing home, where it provides reminders to elderly users and accompanies them to appointments.
Performance measure
[Figure: cumulative reward (-2000 to 14000) against time steps (0 to 1200) for PolCA+, PolCA, and QMDP; annotation: "Hierarchy + Belief"]
Comparing user performance
[Figure: chart comparing the POMDP Policy with the No Belief Policy; values shown: 0.1, 0.1, 0.18]
Visit to the nursing home
Highlights of the PolCA+ algorithm
• Algorithmic:
– New hierarchical approach for POMDP framework.
– POMDP-specific state and observation abstraction methods.
• Experimental:
– First instance of POMDP-based high-level robot controller.
– Novel application of POMDPs to robust dialogue management.
• Theoretical:
– For special case (fully observable), guarantees recursive optimality.
[ Pineau, Gordon & Thrun, UAI 2003. Pineau et al., RAS 2003. Roy, Pineau & Thrun, ACL 2001 ]
Future work
PBVI:
• How can we handle domains with multi-valued state features?
• Can we leverage dimensionality reduction?
• Can we find better ways to pick belief points?
PolCA+:
• Can we automatically learn hierarchies?
• How can we learn (or do without) pseudo-reward functions?
Questions?
Project information: www.cs.cmu.edu/~nursebot
Navigation software: www.cs.cmu.edu/~carmen
Papers and more: www.cs.cmu.edu/~jpineau
Collaborators: Geoffrey Gordon, Judith Matthews, Michael Montemerlo, Martha Pollack, Nicholas Roy, Sebastian Thrun
Two types of uncertainty
Effect → Stochastic action effects
State → Partial and noisy sensor information
Example: Effect uncertainty
[Figure: a motion action takes the robot from its start position to a distribution over possible next-step positions]
Two types of uncertainty
Effect → Stochastic action effects
State → Partial and noisy sensor information
Example: State uncertainty
Effect → Stochastic action effects
State → Partial and noisy sensor information
Model → Inaccurate parameterization of the environment
Agent → Unknown behaviour of other agents
[Figure: start position and distribution over possible next-step positions]
Validation of PBVI’s belief expansion heuristic
Hallway domain: 60 states, 5 actions, 20 observations
[Figure: results plot; one curve is labeled No Belief]
State space reduction
[Figure: bar chart of # states (0 to 4500) for NoAbs, PolCA, and PolCA+, broken down by subtask: subInform, subMove, subContact, subRest, subAssist, subRemind, act. Annotations: No hierarchy, PolCA+.]
Future directions
• Improving POMDP planning
– sparser belief space sampling, ordered value updating, dimensionality reduction, continuous / hybrid domains
• Addressing two more types of uncertainty:
1. Effect
2. State
3. Model
4. Agent
• Exploring new applications