Post on 23-Feb-2016
Sungwook Yoon – Probabilistic Planning via Determinization
Probabilistic Planning via Determinization in Hindsight
FF-Hindsight
Sungwook Yoon
Joint work with Alan Fern, Bob Givan and Rao Kambhampati
Probabilistic Planning Competition
Client: participants, send actions
Server: competition host, simulates actions
The Winner was ……
• FF-Replan
  – A replanner that uses FF
  – The probabilistic domain is determinized
• Interesting contrast
  – Many probabilistic planning techniques
    • Work in theory but do not work in practice
  – FF-Replan
    • No theory
    • Works in practice
The Paper’s Objective
Better determinization approach (Determinization in Hindsight)
Theoretical consideration of the new determinization (in Hindsight)
New view on FF-Replan
Experimental studies with determinization in Hindsight (FF-Hindsight)
Probabilistic Planning (goal-oriented)
[Figure: decision tree rooted at the initial state I. State nodes branch on actions A1 and A2, action nodes branch on probabilistic outcomes, over Time 1 and Time 2; leaves are goal states or dead ends. Left outcomes are more likely. Objective: maximize goal achievement.]
All Outcome Replanning (FFRA), ICAPS-07
[Figure: a probabilistic action with Effect 1 (Probability1) and Effect 2 (Probability2) is split into two deterministic actions, Action1 with Effect 1 and Action2 with Effect 2.]
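The splitting step can be sketched in a few lines of Python. This is only an illustration under assumptions: the class names, the `all_outcome_determinize` helper and the example probabilities are hypothetical and do not come from the slides or from the FF-Replan code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbabilisticAction:
    name: str
    # Each outcome is a (probability, effect) pair; effects are opaque labels here.
    outcomes: tuple


@dataclass(frozen=True)
class DeterministicAction:
    name: str
    effect: str


def all_outcome_determinize(action):
    """Split one probabilistic action into one deterministic action per outcome.

    The outcome probabilities are discarded, which is why this static
    determinization does not respect probabilities.
    """
    return [
        DeterministicAction(name=f"{action.name}-{i}", effect=effect)
        for i, (_prob, effect) in enumerate(action.outcomes, start=1)
    ]


# Hypothetical example action with two outcomes.
a1 = ProbabilisticAction("A1", ((0.8, "Effect 1"), (0.2, "Effect 2")))
for det in all_outcome_determinize(a1):
    print(det.name, "->", det.effect)
```

A deterministic planner can then be run on the resulting action set; whenever the executed outcome differs from the planned one, the agent replans.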
Probabilistic Planning: All Outcome Determinization
[Figure: the same tree rooted at the initial state I, with each probabilistic action replaced by deterministic actions A1-1, A1-2, A2-1 and A2-2, one per outcome. Leaves are goal states or dead ends. Objective: find the goal.]
Problem of FF-Replan and a better alternative: sampling
FF-Replan’s static determinizations don’t respect probabilities.
We need “Probabilistic and Dynamic Determinization”
• Sample future outcomes and determinize in hindsight
• Each future sample becomes a known-future deterministic problem
Start Sampling
Note: sampling will reveal which action is better at state I, A1 or A2.
Hindsight Samples 1 to 4
[Figure, repeated for each sample: the goal-oriented tree rooted at the initial state I with one sampled future fixed; leaves are goal states or dead ends. Left outcomes are more likely. Objective: maximize goal achievement.]
Tallies displayed after each sample:
Sample   A1   A2
1        1    0
2        2    1
3        2    1
4        3    1
Summary of the Idea: The Decision Process
(Estimating the Q-Value, Q(s,a))
1. For each action A, draw future samples
2. Solve the deterministic problems
3. Aggregate the solutions for each action
4. Select the action with the best aggregation
S: current state, A(S) → S’
Each sample is a deterministic planning problem
The solution length is used for goal-oriented problems, Q(s,A)
max_A Q(s,A)
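The four steps above can be sketched as a small Python program. This is a minimal sketch under stated assumptions, not the FF-Hindsight implementation: the `TRANSITIONS` toy domain is hypothetical, a breadth-first search stands in for FF, and Q(s,a) is aggregated as the fraction of sampled futures that are solvable rather than from solution lengths.

```python
import random
from collections import deque

# Hypothetical toy domain: TRANSITIONS[state][action] = [(probability, next_state), ...]
TRANSITIONS = {
    "I": {"A1": [(0.9, "S1"), (0.1, "dead")], "A2": [(0.5, "S2"), (0.5, "dead")]},
    "S1": {"A1": [(0.9, "goal"), (0.1, "dead")], "A2": [(0.5, "goal"), (0.5, "dead")]},
    "S2": {"A1": [(0.9, "goal"), (0.1, "dead")], "A2": [(0.5, "goal"), (0.5, "dead")]},
    "dead": {},
    "goal": {},
}


def sample_future(horizon, rng):
    """Draw one future: an outcome for every (state, action, time), sampled independently."""
    future = {}
    for state, actions in TRANSITIONS.items():
        for action, outcomes in actions.items():
            probs = [p for p, _ in outcomes]
            nexts = [s for _, s in outcomes]
            for t in range(horizon):
                future[(state, action, t)] = rng.choices(nexts, weights=probs)[0]
    return future


def solvable(state, t, horizon, future):
    """Breadth-first search on the known-future deterministic problem (stand-in for FF)."""
    frontier = deque([(state, t)])
    seen = {(state, t)}
    while frontier:
        s, k = frontier.popleft()
        if s == "goal":
            return True
        if k >= horizon:
            continue
        for action in TRANSITIONS[s]:
            nxt = (future[(s, action, k)], k + 1)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False


def hindsight_action(state, horizon, num_samples, rng):
    """Estimate Q(s, a) for every action and pick the best one."""
    q = {}
    for action in TRANSITIONS[state]:
        solved = 0
        for _ in range(num_samples):
            future = sample_future(horizon, rng)      # step 1: fresh sample per action
            first = future[(state, action, 0)]
            solved += solvable(first, 1, horizon, future)  # step 2: solve it
        q[action] = solved / num_samples               # step 3: aggregate
    best = max(q.values())
    # Step 4: select the best action, breaking ties at random.
    return rng.choice([a for a, v in q.items() if v == best]), q


action, q_values = hindsight_action("I", horizon=2, num_samples=500, rng=random.Random(0))
print(action, q_values)
```

Drawing a fresh future for each action keeps the samples independent across actions, and the random choice among equally valued actions is the tie-breaking rule.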
Mathematical Summary of the Algorithm
• H-horizon future $F_H$ for $M = [S, A, T, R]$
  – Mapping of state, action and time ($h < H$) to a state
  – $S \times A \times h \to S$
• Value of a policy $\pi$ for $F_H$: $R(s, F_H, \pi)$
• $V_{HS}(s,H) = E_{F_H}[\max_\pi R(s, F_H, \pi)]$
• Compare this with the real value
• $V^*(s,H) = \max_\pi E_{F_H}[R(s, F_H, \pi)]$
• $V_{FFRa}(s) = \max_F V(s,F) \ge V_{HS}(s,H) \ge V^*(s,H)$
• $Q(s,a,H) = R(a) + E_{F_{H-1}}[\max_\pi R(a(s), F_{H-1}, \pi)]$
  – In our proposal, the computation of $\max_\pi R(s, F_{H-1}, \pi)$ is approximately done by FF [Hoffmann and Nebel ’01]
  – Each future is a deterministic problem
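A tiny worked example may help show why the hindsight value is optimistic. The sketch below assumes a hypothetical one-step problem (H = 1) with two actions that each reach the goal with probability 0.5, independently; a future fixes the outcome of every action, and a policy is just a choice of action. The function name and the numbers are illustrative assumptions.

```python
from itertools import product

# Hypothetical one-step problem: probability that each action reaches the goal (reward 1).
SUCCESS_PROB = {"a": 0.5, "b": 0.5}


def hindsight_vs_optimal(success_prob):
    """Return (V_FFRa, V_HS, V*) by enumerating every future exactly."""
    actions = list(success_prob)
    v_star = max(success_prob.values())   # max over policies of the expected reward
    v_hs = 0.0                             # expected reward of the best policy per future
    v_ffra = 0.0                           # reward of the best policy in the best future
    for outcome in product([0, 1], repeat=len(actions)):
        prob = 1.0
        for action, reward in zip(actions, outcome):
            p = success_prob[action]
            prob *= p if reward else 1.0 - p
        best_in_future = max(outcome)      # with the future known, pick the action that works
        v_hs += prob * best_in_future
        if prob > 0.0:
            v_ffra = max(v_ffra, best_in_future)
    return v_ffra, v_hs, v_star


print(hindsight_vs_optimal(SUCCESS_PROB))  # (1, 0.75, 0.5)
```

The three numbers follow the ordering of the inequality: taking the best future gives 1, averaging the best-in-hindsight reward over futures gives 0.75, and committing to one action before the future is revealed gives 0.5.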
Key Technical Results
The importance of independent sampling of states, actions and time
The necessity of random tie breaking in decision making
Theorem 1: When there is a policy that can achieve the goal with probability 1 within the horizon, the hindsight decision-making algorithm will reach the goal with probability 1.
Theorem 2: The number of samples needed is polynomial in the horizon, the number of actions and the inverse of the minimum Q-value advantage.
We characterize FF-Replan in terms of hindsight decision making: $V_{FFRa}(s) = \max_F V(s,F)$
Empirical Results
IPPC-04 problems; numbers are solved trials.

Problem           FFRa   FF-Hindsight
Blocksworld       270    158
Boxworld          150    100
Fileworld         29     14
R-Tireworld       30     30
ZenoTravel        30     0
Exploding BW      5      28
G-Tireworld       7      18
Tower of Hanois   11     17

For ZenoTravel, when we used importance sampling, the number of solved trials improved to 26.
Empirical Results

Planners    Climber  River  Bus-Fare  Tire1  Tire2  Tire3  Tire4  Tire5  Tire6
FFRa        60%      65%    1%        50%    0%     0%     0%     0%     0%
Paragraph   100%     65%    100%      100%   100%   100%   3%     1%     0%
FPG         100%     65%    22%       100%   92%    60%    35%    19%    13%
FF-HS       100%     65%    100%      100%   100%   100%   100%   100%   100%

These domains were developed just to beat FF-Replan. Unsurprisingly, FF-Replan did not do well.
But FF-Hindsight did very well, showing probabilistic reasoning ability while achieving scalability.
Conclusion
[Figure: Determinization connects Deterministic Planning (Classic Planning, Machine Learning for Planning, Net Benefit Optimization, Temporal Planning) with Probabilistic Planning (Markov Decision Processes, Machine Learning for MDP, Temporal MDP); the diagram is annotated with "scalability" labels.]
Conclusion
• Devised an algorithm that takes advantage of the significant advances in deterministic planning in the context of probabilistic planning
• Made many of the deterministic planning techniques available to probabilistic planning
  – Most of the learning-for-planning techniques were developed solely for deterministic planning
    • Now, these techniques are relevant to probabilistic planning too
  – Advanced net-benefit style planners can be used for reward-maximization style probabilistic planning problems
Discussion
• Mercier and Van Hentenryck provided the analysis of the difference between
  – $V^*(s,H) = \max_\pi E_{F_H}[R(s, F_H, \pi)]$
  – $V_{HS}(s,H) = E_{F_H}[\max_\pi R(s, F_H, \pi)]$
• Ng and Jordan provided the analysis of the difference between
  – $V^*(s,H) = \max_\pi E_{F_H}[R(s, F_H, \pi)]$
  – $\hat{V}(s,H) = \max_\pi \sum [R(s, F_H, \pi)] / m$, where $m$ is the number of samples
IPPC-2004 Results
Numbers: successful runs

            NMRC  J1    Classy  NMR  mGPT  C    FFRS  FFRA
BW          252   270   255     30   120   30   210   270
Box         134   150   100     0    30    0    150   150
File        -     -     -       3    30    3    14    29
Zeno        -     -     -       30   30    30   0     30
Tire-r      -     -     -       30   30    30   30    30
Tire-g      -     -     -       9    16    30   7     7
TOH         -     -     -       15   0     0    0     11
Exploding   -     -     -       0    0     0    3     5

Slide annotations: Human Control Knowledge; Learned Knowledge; 2nd Place Winners; Winner of IPPC-04: FFRs

NMR: Non-Markovian Reward Decision Process Planner
Classy: Approximate Policy Iteration with a Policy Language Bias
mGPT: Heuristic Search Probabilistic Planning
C: Symbolic Heuristic Search
IPPC-2006 Results
Numbers: percentage of successful runs

             FFRA  FPG  FOALP  sfDP  Paragraph  FFRS
BW           86    63   100    29    0          77
Zenotravel   100   27   0      7     7          7
Random       100   65   0      0     5          73
Elevator     93    76   100    0     0          93
Exploding    52    43   24     31    31         52
Drive        71    56   0      0     9          0
Schedule     51    54   0      0     1          0
PitchCatch   54    23   0      0     0          0
Tire         82    75   82     0     91         69

FPG: Factored Policy Gradient Planner
FOALP: First Order Approximate Linear Programming
sfDP: Symbolic Stochastic Focused Dynamic Programming with Decision Diagrams
Paragraph: A Graphplan Based Probabilistic Planner

Unofficial winner of IPPC-06: FFRa
Sampling Problem: Time dependency issue
[Figure: states Start, S1, S2, S3, Goal and Dead End. Actions A and B leave Start; action C has outcomes with probability p and 1-p; action D has outcomes with probability 1-p and p.]
S3 is a worse state than S1, but it looks as if there is always a path to the Goal.
Need to sample independently across actions.
Action Selection Problem: Random tie breaking is essential
[Figure: states Start, S1 and Goal. A: always stays in Start. B: outcomes with probability p and 1-p. C: outcomes with probability 1-p and p.]
In the Start state, action C is definitely better, but A can be used to wait until the outcome of C that leads to the Goal is realized.
Sampling Problem: Importance Sampling (IS)
[Figure: states Start, S1 and Goal. Action B has one outcome with extremely low probability and one outcome with very high probability.]
- Sampling uniformly would find the problem unsolvable.
- Use importance sampling.
- Identifying the region that needs importance sampling is left for further study.
- In the benchmark, Zenotravel needs the IS idea.
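The general importance-sampling idea can be sketched as follows. This is a generic illustration, not the scheme used in FF-Hindsight, which the slides do not specify: the rare outcome is drawn with an inflated proposal probability and each draw is reweighted by its likelihood ratio. The function name and the values of `p_rare` and `q_rare` are assumptions chosen for the example.

```python
import random


def is_estimate(p_rare, q_rare, num_samples, rng):
    """Estimate the probability of a rare outcome with importance sampling.

    The outcome is sampled with proposal probability q_rare instead of its
    true probability p_rare; each hit is weighted by p_rare / q_rare.
    """
    total = 0.0
    for _ in range(num_samples):
        if rng.random() < q_rare:
            # Rare outcome drawn: reward 1 times the likelihood ratio.
            total += p_rare / q_rare
        # The common outcome has reward 0, so it contributes nothing.
    return total / num_samples


rng = random.Random(0)
estimate = is_estimate(p_rare=0.001, q_rare=0.5, num_samples=10_000, rng=rng)
print(estimate)
```

With the true probability, most small sample sets would contain no successful future at all; the proposal makes such futures common while the weights keep the estimate unbiased.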
Theoretical Results
• Theorem 1
  – For goal-achieving probabilistic planning problems, if there is a policy that solves the problem with probability 1 within a bounded horizon, then hindsight planning solves the problem with probability 1. If there is no such policy, hindsight planning returns a success ratio less than 1.
  – If there is a future in which no plan can achieve the goal, that future can be sampled
• Theorem 2
  – The number of future samples $w$ needed to correctly identify the best action: $w > 4\Delta^{-2}\, T \ln(|A|H/\delta)$
  – $\Delta$: the minimum Q-advantage of the best action over the other actions; $\delta$: confidence parameter
  – From the Chernoff bound
Probabilistic Planning: Expecti-max solution
[Figure: the tree over Time 1 and Time 2 leading to goal states, with Max nodes at states (choosing between actions) and Exp nodes at actions (expectation over probabilistic outcomes). Objective: maximize goal achievement.]
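The alternation of Max and Exp nodes can be written as a short recursion. The sketch below assumes a hypothetical tree encoding: a state is a dict from action names to lists of (probability, child) pairs, and a leaf is a number (1.0 for a goal state, 0.0 for a dead end). The `TREE` values are made up for illustration.

```python
def expectimax(node):
    """Value of a state: max over actions of the expected value of the outcomes."""
    if isinstance(node, (int, float)):
        return float(node)                       # leaf: goal (1.0) or dead end (0.0)
    return max(
        sum(p * expectimax(child) for p, child in outcomes)   # Exp node
        for outcomes in node.values()                          # Max node
    )


# Hypothetical two-step tree; the same subtree is reached after either first action.
SECOND_STEP = {"A1": [(0.9, 1.0), (0.1, 0.0)], "A2": [(0.5, 1.0), (0.5, 0.0)]}
TREE = {
    "A1": [(0.9, SECOND_STEP), (0.1, 0.0)],
    "A2": [(0.5, SECOND_STEP), (0.5, 0.0)],
}

print(expectimax(TREE))
```

This exact computation visits every outcome of every action, so its cost grows exponentially with the horizon, which is the scalability problem that sampling futures is meant to avoid.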