Probabilistic Planning via Determinization in Hindsight

FF-Hindsight

Sungwook Yoon

Joint work with Alan Fern, Bob Givan and Rao Kambhampati

Probabilistic Planning Competition

Client: Participants, send action

Server: Competition Host, simulates actions

The Winner was ……

• FF-Replan
  – A replanner that uses FF
  – The probabilistic domain is determinized
• Interesting contrast
  – Many probabilistic planning techniques
    • Work in theory but do not work in practice
  – FF-Replan
    • No theory
    • Works in practice

The Paper’s Objective

Better determinization approach (Determinization in Hindsight)

Theoretical consideration of the new determinization (in Hindsight)

New view on FF-Replan

Experimental studies with determinization in Hindsight (FF-Hindsight)

Probabilistic Planning (goal-oriented)

[Figure: two-step tree (Time 1, Time 2) alternating state and action nodes. Actions A1 and A2 have probabilistic outcomes that lead to goal states or dead ends. Objective: maximize goal achievement. Left outcomes are more likely.]

All Outcome Replanning (FFRa)

[Figure: an action with Effect 1 (Probability 1) and Effect 2 (Probability 2) is split into Action1 with Effect 1 and Action2 with Effect 2. ICAPS-07.]
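The split shown in the figure can be sketched in a few lines. This is a minimal illustration, not FF-Replan's actual code: the class names, the string effects and the example probabilities are all hypothetical, and the `A1-1`, `A1-2` naming simply follows the labels used on the next slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbabilisticAction:
    name: str
    outcomes: tuple  # (probability, effect) pairs; hypothetical representation


@dataclass(frozen=True)
class DeterministicAction:
    name: str
    effect: str


def all_outcome_determinize(action):
    """Create one deterministic action per outcome and drop the probabilities."""
    return [
        DeterministicAction(name=f"{action.name}-{i}", effect=effect)
        for i, (_prob, effect) in enumerate(action.outcomes, start=1)
    ]


# Hypothetical action with a likely and an unlikely effect.
a1 = ProbabilisticAction("A1", ((0.9, "effect-1"), (0.1, "effect-2")))
determinized = all_outcome_determinize(a1)
```

Note that the probabilities 0.9 and 0.1 do not survive the conversion, which is the weakness the later slides address.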

Probabilistic Planning: All Outcome Determinization

[Figure: the same two-step tree (Time 1, Time 2), with each action A1, A2 replaced by its deterministic outcome actions A1-1, A1-2, A2-1, A2-2. Objective: find the goal. Dead ends are marked.]


Problem of FF-Replan and better alternative sampling

FF-Replan’s Static Determinizations don’t respect probabilities.

We need “Probabilistic and Dynamic Determinization”

Sample Future Outcomes and Determinization in Hindsight

Each Future Sample Becomes a Known-Future Deterministic Problem

Probabilistic Planning (goal-oriented)

[Figure: the two-step tree (Time 1, Time 2) with actions A1 and A2, goal states and dead ends, before sampling starts.]

Start Sampling

Note: Sampling will reveal which is better at state I, A1 or A2.

Hindsight Sample 1

[Figure sequence: the same tree under four successively sampled futures, each slide showing a running tally for A1 and A2.]

Tallies after each sample: A1: 1, A2: 0; A1: 2, A2: 1; A1: 2, A2: 1; A1: 3, A2: 1.

Summary of the Idea: The Decision Process

(Estimating Q-Value, Q(s,a))

1. For Each Action A, Draw Future Samples

2. Solve The Deterministic Problems

3. Aggregate the solutions for each action

4. Select the action with best aggregation

S: Current State, A(S) → S’

Each Sample is a Deterministic Planning Problem

The solution length is used for goal-oriented problems, Q(s,A)

$\max_A Q(s,A)$
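The four steps above can be sketched on a toy problem. Everything here is an assumption made for illustration: the `MODEL` domain and its probabilities are hypothetical, breadth-first search stands in for FF as the deterministic solver, and the aggregation counts solved futures (as the tallies on the sample slides suggest) rather than using solution length.

```python
import random
from collections import deque

# Hypothetical toy domain: MODEL[state][action] = [(probability, next_state), ...]
MODEL = {
    "I": {"A1": [(0.8, "S1"), (0.2, "dead")], "A2": [(0.3, "S1"), (0.7, "dead")]},
    "S1": {"A1": [(0.9, "goal"), (0.1, "dead")], "A2": [(0.5, "goal"), (0.5, "dead")]},
    "dead": {},
    "goal": {},
}
GOAL = "goal"


def sample_future(horizon, rng):
    """Sample one outcome independently for every (state, action, time) triple."""
    future = {}
    for state, actions in MODEL.items():
        for action, outcomes in actions.items():
            probs = [p for p, _ in outcomes]
            nexts = [s for _, s in outcomes]
            for t in range(horizon):
                future[(state, action, t)] = rng.choices(nexts, weights=probs)[0]
    return future


def solve_deterministic(state, t0, horizon, future):
    """Breadth-first search in a known future (stand-in for FF).

    Returns the plan length, or None if the goal is unreachable within the horizon.
    """
    frontier = deque([(state, t0)])
    seen = {(state, t0)}
    while frontier:
        s, t = frontier.popleft()
        if s == GOAL:
            return t - t0
        if t >= horizon:
            continue
        for action in MODEL[s]:
            nxt = (future[(s, action, t)], t + 1)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return None


def hindsight_q(state, horizon, num_samples, rng):
    """Steps 1-3: per action, draw futures, solve each, and aggregate successes."""
    q = {}
    for action in MODEL[state]:
        solved = 0
        for _ in range(num_samples):
            future = sample_future(horizon, rng)
            next_state = future[(state, action, 0)]
            if solve_deterministic(next_state, 1, horizon, future) is not None:
                solved += 1
        q[action] = solved / num_samples
    return q


def choose_action(state, horizon=2, num_samples=200, rng=None):
    """Step 4: pick an action with the best aggregate, breaking ties at random."""
    if rng is None:
        rng = random.Random(0)
    q = hindsight_q(state, horizon, num_samples, rng)
    best = max(q.values())
    return rng.choice([a for a, v in q.items() if v == best]), q


action, q_values = choose_action("I")
```

In this toy domain A1 reaches S1 far more often than A2, so its estimate should come out clearly higher. Swapping the success count for average plan length would only change `hindsight_q`.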

Mathematical Summary of the Algorithm

• $H$-horizon future $F^H$ for $M = [S, A, T, R]$
  – Mapping of state, action and time ($h < H$) to a state
  – $S \times A \times h \to S$
• Value of a policy $\pi$ for $F^H$
  – $R(s, F^H, \pi)$
• $V_{HS}(s,H) = E_{F^H}[\max_\pi R(s, F^H, \pi)]$
• Compare this with the real value
  – $V^*(s,H) = \max_\pi E_{F^H}[R(s, F^H, \pi)]$
  – $V_{FFRa}(s) = \max_F V(s,F) \ge V_{HS}(s,H) \ge V^*(s,H)$
• $Q(s,a,H) = R(a) + E_{F^{H-1}}[\max_\pi R(a(s), F^{H-1}, \pi)]$
  – In our proposal, computation of $\max_\pi R(s, F^{H-1}, \pi)$ is approximately done by FF [Hoffmann and Nebel ’01]

Callouts: each future is a deterministic problem; the maximization is done by FF.
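The ordering of the three values can be checked by exact enumeration on a tiny case. The one-step problem below is hypothetical (two actions that each independently reach the goal with the stated probability, reward 1 for success), so a policy is just a choice of action and a future fixes one outcome per action.

```python
from fractions import Fraction
from itertools import product

# Hypothetical one-step problem: success probability of each action.
SUCCESS_PROB = {"a": Fraction(3, 5), "b": Fraction(1, 2)}


def futures():
    """Enumerate every future (one outcome per action) with its probability."""
    actions = list(SUCCESS_PROB)
    for outcome in product([True, False], repeat=len(actions)):
        prob = Fraction(1)
        for action, ok in zip(actions, outcome):
            prob *= SUCCESS_PROB[action] if ok else 1 - SUCCESS_PROB[action]
        yield dict(zip(actions, outcome)), prob


def reward(future, action):
    """Reward 1 if the chosen action succeeds in this future, else 0."""
    return 1 if future[action] else 0


# V*: commit to one action before the future is revealed.
v_star = max(sum(p * reward(f, a) for f, p in futures()) for a in SUCCESS_PROB)

# V_HS: choose the best action separately inside each future, then average.
v_hs = sum(p * max(reward(f, a) for a in SUCCESS_PROB) for f, p in futures())

# V_FFRa: value of the single most favourable future.
v_ffra = max(max(reward(f, a) for a in SUCCESS_PROB) for f, _p in futures())
```

Here the three values are 3/5, 4/5 and 1: hindsight is optimistic relative to the true value, and the all-outcome view is more optimistic still.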

Key Technical Results

The importance of independent sampling across states, actions and time

The necessity of random tie breaking in decision making

Theorem 1: When there is a policy that can achieve the goal with probability 1 within the horizon, the hindsight decision making algorithm will reach the goal with probability 1.

Theorem 2: A polynomial number of samples is needed with respect to the horizon, the number of actions, and the minimum Q-value advantage.

We characterize FF-Replan in terms of hindsight decision making: $V_{FFRa}(s) = \max_F V(s,F)$

Empirical Results

IPPC-04 problems. Numbers are solved trials.

Problem          FFRa   FF-Hindsight
Blocksworld      270    158
Boxworld         150    100
Fileworld         29     14
R-Tireworld       30     30
ZenoTravel        30      0
Exploding BW       5     28
G-Tireworld        7     18
Tower of Hanoi    11     17

For ZenoTravel, using importance sampling improved the number of solved trials to 26.

Empirical Results

Planner     Climber  River  Bus-Fare  Tire1  Tire2  Tire3  Tire4  Tire5  Tire6
FFRa        60%      65%    1%        50%    0%     0%     0%     0%     0%
Paragraph   100%     65%    100%      100%   100%   100%   3%     1%     0%
FPG         100%     65%    22%       100%   92%    60%    35%    19%    13%
FF-HS       100%     65%    100%      100%   100%   100%   100%   100%   100%

These domains were developed just to beat FF-Replan. Obviously, FF-Replan did not do well.

But FF-Hindsight did very well, showing probabilistic reasoning ability while achieving scalability.

Conclusion

[Figure: two groups of topics. Deterministic planning: classic planning, machine learning for planning, net benefit optimization, temporal planning. Probabilistic planning: Markov decision processes, machine learning for MDPs, temporal MDPs. Labels: scalability, Determinization.]

Conclusion

• Devised an algorithm that can take advantage of the significant advances in deterministic planning in the context of probabilistic planning

• Made many of the deterministic planning techniques available to probabilistic planning
  – Most of the learning-for-planning techniques are developed solely for deterministic planning
    • Now, these techniques are relevant to probabilistic planning too
  – Advanced net-benefit style planners can be used for the reward maximization style of probabilistic planning problems

Discussion

• Mercier and Van Hentenryck provided an analysis of the difference between
  – $V^*(s,H) = \max_\pi E_{F^H}[R(s, F^H, \pi)]$
  – $V_{HS}(s,H) = E_{F^H}[\max_\pi R(s, F^H, \pi)]$
• Ng and Jordan provided an analysis of the difference between
  – $V^*(s,H) = \max_\pi E_{F^H}[R(s, F^H, \pi)]$
  – $\hat{V}(s,H) = \max_\pi \sum_{i=1}^{m} R(s, F^H_i, \pi) / m$, where $m$ is the number of samples

IPPC-2004 Results

Domain      NMRC   J1    Classy  NMR   mGPT  C     FFRs  FFRa
BW          252    270   255     30    120   30    210   270
Box         134    150   100     0     30    0     150   150
File        -      -     -       3     30    3     14    29
Zeno        -      -     -       30    30    30    0     30
Tire-r      -      -     -       30    30    30    30    30
Tire-g      -      -     -       9     16    30    7     7
TOH         -      -     -       15    0     0     0     11
Exploding   -      -     -       0     0     0     3     5

Numbers: successful runs

Slide annotations: Human Control Knowledge, 2nd Place Winners, Learned Knowledge

NMR: Non-Markovian Reward Decision Process Planner
Classy: Approximate Policy Iteration with a Policy Language Bias
mGPT: Heuristic Search Probabilistic Planning
C: Symbolic Heuristic Search

Winner of IPPC-04: FFRs

IPPC-2006 Results

Domain       FFRa  FPG  FOALP  sfDP  Paragraph  FFRs
BW           86    63   100    29    0          77
Zenotravel   100   27   0      7     7          7
Random       100   65   0      0     5          73
Elevator     93    76   100    0     0          93
Exploding    52    43   24     31    31         52
Drive        71    56   0      0     9          0
Schedule     51    54   0      0     1          0
PitchCatch   54    23   0      0     0          0
Tire         82    75   82     0     91         69

Numbers: percentage of successful runs

FPG: Factored Policy Gradient Planner
FOALP: First Order Approximate Linear Programming
sfDP: Symbolic Stochastic Focused Dynamic Programming with Decision Diagrams
Paragraph: A Graphplan Based Probabilistic Planner

Unofficial winner of IPPC-06: FFRa


Sampling Problem: Time dependency issue

[Figure: state diagram with a dead end. Edge labels: B; C (with probability p); C (with probability 1-p); D (with probability 1-p); D (with probability p).]

S3 is a worse state than S1, but it looks as if there is always a path to the goal. Need to sample independently across actions.

Action Selection Problem: Random tie breaking is essential

[Figure: states Start, S1 and Goal. A: always stays in Start. B: with probability p, and with probability 1-p. C: with probability p, and with probability 1-p.]

In the Start state, action C is definitely better, but A can be used to wait until the C-to-Goal effect is realized.
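One reading of this slide is that, in hindsight, waiting with A can score as well as C, so a selection rule that always prefers the first of several tied actions could stay in Start forever. A minimal sketch of random tie breaking follows; the Q-values are hypothetical and `select_action` is an illustrative helper, not part of FF-Hindsight.

```python
import random


def select_action(q_values, rng):
    """Return an action with maximal Q-value, choosing uniformly among ties."""
    best = max(q_values.values())
    tied = [action for action, q in q_values.items() if q == best]
    return rng.choice(tied)


rng = random.Random(0)
# Hypothetical hindsight estimates at Start: A and C tie, B is worse.
q_values = {"A": 1.0, "B": 0.4, "C": 1.0}
picks = [select_action(q_values, rng) for _ in range(1000)]
```

Over repeated decisions both A and C get chosen, so the agent eventually tries C instead of waiting indefinitely.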

Sampling Problem: Importance Sampling (IS)

[Figure: states Start, S1 and Goal. B: with extremely low probability, and with very high probability.]

- Sampling uniformly would find the problem unsolvable.
- Use importance sampling.
- Identifying the region that needs importance sampling is left for further study.
- In the benchmarks, Zenotravel needs the IS idea.
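The general idea of importance sampling can be illustrated as follows. This is a generic textbook sketch, not the procedure used in the Zenotravel runs: the function name, the proposal probability of 0.5 and the rare-outcome probability of 1e-4 are all assumptions. The rare outcome is sampled more often than it really occurs, and each hit is reweighted by the ratio of true to proposal probability.

```python
import random


def estimate_rare_outcome(p_true, p_proposal, num_samples, rng):
    """Estimate P(rare outcome) by sampling it with p_proposal and reweighting.

    Each sampled occurrence contributes the weight p_true / p_proposal,
    which keeps the estimate unbiased.
    """
    weight = p_true / p_proposal
    total = 0.0
    for _ in range(num_samples):
        if rng.random() < p_proposal:
            total += weight
    return total / num_samples


rng = random.Random(0)
p_true = 1e-4  # hypothetical "extremely low probability" outcome
weighted = estimate_rare_outcome(p_true, 0.5, 1000, rng)
```

With the true distribution as the proposal, 1000 samples would usually contain no occurrence of the rare outcome at all, which is how a solvable problem can look unsolvable.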

Theoretical Results

• Theorem 1
  – For goal-achieving probabilistic planning problems, if there is a policy that solves the problem with probability 1 within a bounded horizon, then hindsight planning solves the problem with probability 1. If there is no such policy, hindsight planning returns a success ratio less than 1.
  – If there is a future where no plan can achieve the goal, that future can be sampled
• Theorem 2
  – The number of future samples $w$ needed to correctly identify the best action
  – $w > 4\Delta^{-2} T \ln(|A| H / \delta)$
  – $\Delta$: the minimum Q-advantage of the best action over the other actions, $\delta$: confidence parameter
  – From the Chernoff bound
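To get a feel for the size of the bound in Theorem 2, the formula can be evaluated directly. The function name and the example values are hypothetical, and the factor `T` is passed through exactly as it appears in the slide's formula, which does not define it.

```python
import math


def samples_needed(min_advantage, T, num_actions, horizon, delta):
    """Smallest integer w with w > 4 * min_advantage**-2 * T * ln(|A| * H / delta)."""
    bound = 4 * min_advantage ** -2 * T * math.log(num_actions * horizon / delta)
    return math.floor(bound) + 1


# Hypothetical values: advantage 0.1, T = 1, two actions, horizon 10, delta 0.05.
w = samples_needed(min_advantage=0.1, T=1, num_actions=2, horizon=10, delta=0.05)
```

The dependence on the advantage is quadratic: halving `min_advantage` roughly quadruples the number of samples, while the number of actions and the horizon enter only through the logarithm.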

Probabilistic Planning: Expecti-max solution

[Figure: the two-step tree (Time 1, Time 2) with Max nodes at states and Exp (E) nodes at action outcomes, leading to goal states.]

