Post on 27-May-2020
CSC2542 Contingent Planning
Sheila McIlraithDepartment of Computer ScienceWinter, 2019
Contingent PlanningThe slides that follow describe the following publication:
Christian J. Muise, Vaishak Belle, Sheila A. McIlraith: Computing Contingent Plans via Fully Observable Non-Deterministic Planning. AAAI 2014: 2322-2329
Slides largely prepared by Christian Muise.
Computing Contingent Plans viaFully Observable Non-Deterministic Planning
Christian Muise Vaishak Belle Sheila A. McIlraith
Department of Computer Science,University of Toronto, Toronto, Canada.{cjmuise,vaishak,sheila}@cs.toronto.edu
June 23, 2014
Overview
PRP (2012)
Planner for Fully ObservableNon-Deterministic (FOND).
Powerful techniques for deadenddetection, state relevance, etc.
Recently updated to handleconditional effects.
PO-PRP (2014)
Computes conditional plans forsimple contingent problems.
Modifies the computational coreof PRP to solve encodings.
Novel approach for constructingcompact conditional plans.
Main Contribution
Leverage modern FOND planners and POND encodings toeffectively compile a compact conditional plan offline.
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 1 / 24
Overview
PRP (2012)
Planner for Fully ObservableNon-Deterministic (FOND).
Powerful techniques for deadenddetection, state relevance, etc.
Recently updated to handleconditional effects.
PO-PRP (2014)
Computes conditional plans forsimple contingent problems.
Modifies the computational coreof PRP to solve encodings.
Novel approach for constructingcompact conditional plans.
Main Contribution
Leverage modern FOND planners and POND encodings toeffectively compile a compact conditional plan offline.
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 1 / 24
Overview
PRP (2012)
Planner for Fully ObservableNon-Deterministic (FOND).
Powerful techniques for deadenddetection, state relevance, etc.
Recently updated to handleconditional effects.
PO-PRP (2014)
Computes conditional plans forsimple contingent problems.
Modifies the computational coreof PRP to solve encodings.
Novel approach for constructingcompact conditional plans.
Main Contribution
Leverage modern FOND planners and POND encodings toeffectively compile a compact conditional plan offline.
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 1 / 24
Example Plan: doors-5
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 2 / 24
Example Plan: doors-5
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 2 / 24
Outline
1 Background & Notation
2 PO-PRPEncodingComputingRepresenting
3 Evaluation
4 Conclusion
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 3 / 24
Outline
1 Background & Notation
2 PO-PRP
3 Evaluation
4 Conclusion
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 4 / 24
Contingent Planning (1/2)
Planning Task: 〈F , I,G,O,A〉F : Finite set of fluents.
I: Set of clauses over F that determines the initial state.
G: Conjunction of atoms over F that determines the goal.
O: Set of observations.
A: Set of actions.
States
Belief state b represents a set of states.
I corresponds to a belief state bI .
b satisfies a formula φ if φ holds in every state s ∈ b.
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 4 / 24
Contingent Planning (2/2)
Actions
Pre(a): Conjunction of atoms that must hold to execute a.
Eff(a): Set of pairs 〈C , L〉 to capture conditional effects.
Action progression: ba = {Prog(s, a) | s ∈ b}
Observations
Observation o consists of the condition Pre(o) that the agentmust know before observing the truth of fluent Sense(o).
Belief update with observation: boa = {s | s ∈ ba and L ∈ s},where o = 〈C , L〉 and every state in ba satisfies C .
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 5 / 24
Contingent Planning (2/2)
Actions
Pre(a): Conjunction of atoms that must hold to execute a.
Eff(a): Set of pairs 〈C , L〉 to capture conditional effects.
Action progression: ba = {Prog(s, a) | s ∈ b}
Observations
Observation o consists of the condition Pre(o) that the agentmust know before observing the truth of fluent Sense(o).
Belief update with observation: boa = {s | s ∈ ba and L ∈ s},where o = 〈C , L〉 and every state in ba satisfies C .
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 5 / 24
Simple Contingent Problems (Bonet & Geffner, 2011)
Effective Characterization
The non-unary clauses in I are invariant. E.g.,:
Location of the agent (always in exactly one spot).
Uncertainty of a fluent (can only ever be true or false).
Monotonicity
Uncertainty of a fluent cannot propagate to other fluents. E.g.,:
Ok: The effect dropping a vase when we know its status:〈{fragile vase}, broken vase〉
Not Ok: Unknowingly moving into a trap:〈{at wumpus loc3}, dead〉
Corollary: Once known, a fluent remains known.
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 6 / 24
Simple Contingent Problems (Bonet & Geffner, 2011)
Effective Characterization
The non-unary clauses in I are invariant. E.g.,:
Location of the agent (always in exactly one spot).
Uncertainty of a fluent (can only ever be true or false).
Monotonicity
Uncertainty of a fluent cannot propagate to other fluents. E.g.,:
Ok: The effect dropping a vase when we know its status:〈{fragile vase}, broken vase〉
Not Ok: Unknowingly moving into a trap:〈{at wumpus loc3}, dead〉
Corollary: Once known, a fluent remains known.
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 6 / 24
Simple Contingent Problems (Bonet & Geffner, 2011)
Effective Characterization
The non-unary clauses in I are invariant. E.g.,:
Location of the agent (always in exactly one spot).
Uncertainty of a fluent (can only ever be true or false).
Monotonicity
Uncertainty of a fluent cannot propagate to other fluents. E.g.,:
Ok: The effect dropping a vase when we know its status:〈{fragile vase}, broken vase〉
Not Ok: Unknowingly moving into a trap:〈{at wumpus loc3}, dead〉
Corollary: Once known, a fluent remains known.
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 6 / 24
FOND Planning
Planning Task: 〈F , I,G,A〉F : Finite set of fluents.
I: Complete setting of the fluents for the initial state.
G: Conjunction of atoms over F that determines the goal.
A: Set of (possibly non-deterministic) actions.
Actions
Pre(a) is the set of fluents that must be true to execute a.
Eff(a) is a finite set of non-deterministic effects.
A non-deterministic effect consists of a set of conditionaleffects, each being a pair 〈C , L〉 (C possibly being >)
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 7 / 24
FOND Planning
Planning Task: 〈F , I,G,A〉F : Finite set of fluents.
I: Complete setting of the fluents for the initial state.
G: Conjunction of atoms over F that determines the goal.
A: Set of (possibly non-deterministic) actions.
Actions
Pre(a) is the set of fluents that must be true to execute a.
Eff(a) is a finite set of non-deterministic effects.
A non-deterministic effect consists of a set of conditionaleffects, each being a pair 〈C , L〉 (C possibly being >)
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 7 / 24
FOND Solutions and Representation
Weak Plan
Strong Plan
Strong Cyclic Plan
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 8 / 24
FOND Solutions and Representation
Weak Plan
Strong Plan
Strong Cyclic Plan
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 8 / 24
FOND Solutions and Representation
Weak Plan
Strong Plan
Strong Cyclic Plan
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 8 / 24
FOND Solutions and Representation
Policy
P(s): Given a set of tuples of the form 〈p, a, c〉 where p is a partialstate, a an action, and c a cost, return the action a in state s fromthe tuple with the lowest cost c such that s |= p.
Strong Cyclic Plan
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 8 / 24
Outline
1 Background & Notation
2 PO-PRP
3 Evaluation
4 Conclusion
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 9 / 24
Approach
1 Convert the contingent problem to a FOND one.
2 Solve with a tailored version of PRP.
3 Compile the final conditional plan.
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 9 / 24
Outline
2 PO-PRPEncodingComputingRepresenting
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 10 / 24
FOND Encoding (based on Bonet & Geffner, 2011)
Planning Task: 〈F ′, I ′,G ′,A′〉F ′: {KL | L ∈ F} ∪ {K¬L | L ∈ F}I ′: {KL | L ∈ I}G′: {KL | L ∈ G}A′: A′
A ∪ A′O ∪ A′
V
A′A: a′ for every a ∈ A where, Pre(a′) = {KL | L ∈ Pre(a)}and Eff(a′) = {〈KC ,KL〉 | 〈C , L〉 ∈ Eff(a)}.
A′O: for o = 〈C , L〉 ∈ O, there is an action a′ ∈ A′O such thatPre(a′) = KC ∧ ¬KL ∧ ¬K¬L with two possiblenon-deterministic effects: {〈>,KL〉} and {〈>,K¬L〉}.A′V : for every (C ⊃ L) ∈ I∗, there is an action a′ ∈ A′V suchthat Pre(a′) = KC and Eff(a′) = {〈>,KL〉}.
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 10 / 24
FOND Encoding (based on Bonet & Geffner, 2011)
Planning Task: 〈F ′, I ′,G ′,A′〉F ′: {KL | L ∈ F} ∪ {K¬L | L ∈ F}I ′: {KL | L ∈ I}G′: {KL | L ∈ G}A′: A′
A ∪ A′O ∪ A′
VA′A: a′ for every a ∈ A where, Pre(a′) = {KL | L ∈ Pre(a)}and Eff(a′) = {〈KC ,KL〉 | 〈C , L〉 ∈ Eff(a)}.
A′O: for o = 〈C , L〉 ∈ O, there is an action a′ ∈ A′O such thatPre(a′) = KC ∧ ¬KL ∧ ¬K¬L with two possiblenon-deterministic effects: {〈>,KL〉} and {〈>,K¬L〉}.A′V : for every (C ⊃ L) ∈ I∗, there is an action a′ ∈ A′V suchthat Pre(a′) = KC and Eff(a′) = {〈>,KL〉}.
Note: Do not need to include {¬K¬C ,¬K¬L | 〈C , L〉 ∈ Eff(a)}
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 10 / 24
FOND Encoding (based on Bonet & Geffner, 2011)
Planning Task: 〈F ′, I ′,G ′,A′〉F ′: {KL | L ∈ F} ∪ {K¬L | L ∈ F}I ′: {KL | L ∈ I}G′: {KL | L ∈ G}A′: A′
A ∪ A′O ∪ A′
VA′A: a′ for every a ∈ A where, Pre(a′) = {KL | L ∈ Pre(a)}and Eff(a′) = {〈KC ,KL〉 | 〈C , L〉 ∈ Eff(a)}.A′O: for o = 〈C , L〉 ∈ O, there is an action a′ ∈ A′O such thatPre(a′) = KC ∧ ¬KL ∧ ¬K¬L with two possiblenon-deterministic effects: {〈>,KL〉} and {〈>,K¬L〉}.
A′V : for every (C ⊃ L) ∈ I∗, there is an action a′ ∈ A′V suchthat Pre(a′) = KC and Eff(a′) = {〈>,KL〉}.
Note: Can only observe a fluent once due to monotonicity.
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 10 / 24
FOND Encoding (based on Bonet & Geffner, 2011)
Planning Task: 〈F ′, I ′,G ′,A′〉F ′: {KL | L ∈ F} ∪ {K¬L | L ∈ F}I ′: {KL | L ∈ I}G′: {KL | L ∈ G}A′: A′
A ∪ A′O ∪ A′
VA′A: a′ for every a ∈ A where, Pre(a′) = {KL | L ∈ Pre(a)}and Eff(a′) = {〈KC ,KL〉 | 〈C , L〉 ∈ Eff(a)}.A′O: for o = 〈C , L〉 ∈ O, there is an action a′ ∈ A′O such thatPre(a′) = KC ∧ ¬KL ∧ ¬K¬L with two possiblenon-deterministic effects: {〈>,KL〉} and {〈>,K¬L〉}.A′V : for every (C ⊃ L) ∈ I∗, there is an action a′ ∈ A′V suchthat Pre(a′) = KC and Eff(a′) = {〈>,KL〉}.
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 10 / 24
Key Considerations
Fairness Assumption
How do we get away with assuming that the non-deterministicoutcomes of a sensing action are “fair”?
Reachable Incoherent States
How do we handle belief states that will never occur in practice?(e.g., considering a sensing outcome that is not possible)
Solution Correspondence
How do we interpret the policy that PRP generates for the originalcontingent planning problem?
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 11 / 24
Key Considerations
Fairness Assumption
How do we get away with assuming that the non-deterministicoutcomes of a sensing action are “fair”? Answer: Monotonicity!
Reachable Incoherent States
How do we handle belief states that will never occur in practice?(e.g., considering a sensing outcome that is not possible)
Solution Correspondence
How do we interpret the policy that PRP generates for the originalcontingent planning problem?
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 11 / 24
Outline
2 PO-PRPEncodingComputingRepresenting
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 12 / 24
General PO-PRP Algorithm
Simple Contingent Problem → FOND Problem → Contingent Plan
1 Initialize Open and Closed lists of states handled by theincumbent policy P (Open initially contains only I ′);
2 Select and move a state s from Open to Closed such that,1 If P(s) = ⊥, run UpdatePolicy(〈F ′, s,G′,A′〉,P);2 If P(s) = a, and a 6= ⊥, add to Open every state in{Prog(s, a,Effi ) | a = 〈Prea,Eff1, . . . ,Effk〉}\ Closed ;
3 If UpdatePolicy failed in 2.2, process s as a deadend ;
3 If Open is empty, return P. Else, repeat from step 2;
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 12 / 24
General PO-PRP Algorithm
Simple Contingent Problem → FOND Problem ⇒ Contingent Plan
1 Initialize Open and Closed lists of states handled by theincumbent policy P (Open initially contains only I ′);
2 Select and move a state s from Open to Closed such that,1 If P(s) = ⊥, run UpdatePolicy(〈F ′, s,G′,A′〉,P);2 If P(s) = a, and a 6= ⊥, add to Open every state in{Prog(s, a,Effi ) | a = 〈Prea,Eff1, . . . ,Effk〉}\ Closed ;
3 If UpdatePolicy failed in 2.2, process s as a deadend ;
3 If Open is empty, return P. Else, repeat from step 2;
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 12 / 24
General PO-PRP Algorithm
Simple Contingent Problem → FOND Problem ⇒ Contingent Plan
1 Initialize Open and Closed lists of states handled by theincumbent policy P (Open initially contains only I ′);
2 Select and move a state s from Open to Closed such that,1 If P(s) = ⊥, run UpdatePolicy(〈F ′, s,G′,A′〉,P);2 If P(s) = a, and a 6= ⊥, add to Open every state in{Prog(s, a,Effi ) | a = 〈Prea,Eff1, . . . ,Effk〉}\ Closed ;
3 If UpdatePolicy failed in 2.2, process s as a deadend ;
3 If Open is empty, return P. Else, repeat from step 2;
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 12 / 24
UpdatePolicy(〈F ′, s,G ′,A′〉,P)
1 A′′ = Determinize(A′); // Using all-outcomes
2 [a1, · · · an] = ComputePlan(〈F ′, s,G′,A′′〉);
3 For every suffix [ai , · · · an] of the plan,1 pi = Regress(G′, [ai , · · · an]);2 ci = cost([ai , · · · an]);3 Add 〈pi , ai , ci 〉 to P;
Invariants left as actions (and not axioms).
Priority is given to useful invariants.
Arbitrary ordering of invariants enforced.
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 13 / 24
UpdatePolicy(〈F ′, s,G ′,A′〉,P)
1 A′′ = Determinize(A′); // Using all-outcomes
2 [a1, · · · an] = ComputePlan(〈F ′, s,G′,A′′〉);
3 For every suffix [ai , · · · an] of the plan,1 pi = Regress(G′, [ai , · · · an]);2 ci = cost([ai , · · · an]);3 Add 〈pi , ai , ci 〉 to P;
Invariants left as actions (and not axioms).
Priority is given to useful invariants.
Arbitrary ordering of invariants enforced.
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 13 / 24
Outline
2 PO-PRPEncodingComputingRepresenting
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 14 / 24
Contingent Plan Representations
Option 1: State-Action Mapping
Uses the default output of PRP.
Requires maintaining the compiled belief.
Difficult to see the “bigger picture”.
Option 2: Conditional Plan
Expressed in terms of the original problem.
Standard for (offline) contingent planning.
Typically takes the form of an exponentially large tree.
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 14 / 24
Contingent Plan Representations
Option 1: State-Action Mapping
Uses the default output of PRP.
Requires maintaining the compiled belief.
Difficult to see the “bigger picture”.
Option 2: Conditional Plan
Expressed in terms of the original problem.
Standard for (offline) contingent planning.
Typically takes the form of an exponentially large tree.
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 14 / 24
Strong Cyclic Detection
(PO-)PRP identifies a subset of the tuples in a policy P that aredetermined to be marked as strong cyclic.
Strong Cyclic Property 1
If a tuple 〈p, a, c〉 ∈ P is marked as strong cyclic, then for anystate s such that s |= p, the policy P is a strong cyclic plan for s.
Strong Cyclic Property 2
If a tuple 〈p, a, c〉 ∈ P is marked as strong cyclic, then for anystate s such that s |= p, s ′ = Prog(s, a), and P(s ′) = 〈p′, a′, c ′〉,either p′ is G′ or the tuple 〈p′, a′, c ′〉 is marked as strong cyclic.
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 15 / 24
Strong Cyclic Detection
(PO-)PRP identifies a subset of the tuples in a policy P that aredetermined to be marked as strong cyclic.
Strong Cyclic Property 1
If a tuple 〈p, a, c〉 ∈ P is marked as strong cyclic, then for anystate s such that s |= p, the policy P is a strong cyclic plan for s.
Strong Cyclic Property 2
If a tuple 〈p, a, c〉 ∈ P is marked as strong cyclic, then for anystate s such that s |= p, s ′ = Prog(s, a), and P(s ′) = 〈p′, a′, c ′〉,either p′ is G′ or the tuple 〈p′, a′, c ′〉 is marked as strong cyclic.
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 15 / 24
Strong Cyclic Detection
(PO-)PRP identifies a subset of the tuples in a policy P that aredetermined to be marked as strong cyclic.
Strong Cyclic Property 1
If a tuple 〈p, a, c〉 ∈ P is marked as strong cyclic, then for anystate s such that s |= p, the policy P is a strong cyclic plan for s.
Strong Cyclic Property 2
If a tuple 〈p, a, c〉 ∈ P is marked as strong cyclic, then for anystate s such that s |= p, s ′ = Prog(s, a), and P(s ′) = 〈p′, a′, c ′〉,either p′ is G′ or the tuple 〈p′, a′, c ′〉 is marked as strong cyclic.
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 15 / 24
Exporting a Conditional Plan
General Intuition
Enumerate the reachable state space of the policy P, and use thetuples in P marked as strong cyclic to replace complete states.
1 Let Open = {I ′}, Cls = ∅, and 〈N,E 〉 = 〈{I ′}, ∅〉;2 Select and move a state s from Open to Cls such that,
(i) Let P(s) = 〈p, a, c〉 and if the tuple is marked as beingstrong cyclic, let s ′ = p, otherwise let s ′ = s;
(ii) Assuming that Eff(a) = {Eff1 · · ·Effn}, letSucc = {Prog(s ′, a,Eff1), · · · ,Prog(s ′, a,Effn)} ;
(iii) Add Succ \ Cls to N and Open;
(iv) Add (s, s ′′) to E for every s ′′ in Succ;
3 If Open is empty, return 〈N,E 〉. Else repeat from 2.
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 16 / 24
Exporting a Conditional Plan
General Intuition
Enumerate the reachable state space of the policy P, and use thetuples in P marked as strong cyclic to replace complete states.
1 Let Open = {I ′}, Cls = ∅, and 〈N,E 〉 = 〈{I ′}, ∅〉;
2 Select and move a state s from Open to Cls such that,
(i) Let P(s) = 〈p, a, c〉 and if the tuple is marked as beingstrong cyclic, let s ′ = p, otherwise let s ′ = s;
(ii) Assuming that Eff(a) = {Eff1 · · ·Effn}, letSucc = {Prog(s ′, a,Eff1), · · · ,Prog(s ′, a,Effn)} ;
(iii) Add Succ \ Cls to N and Open;
(iv) Add (s, s ′′) to E for every s ′′ in Succ;
3 If Open is empty, return 〈N,E 〉. Else repeat from 2.
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 16 / 24
Exporting a Conditional Plan
General Intuition
Enumerate the reachable state space of the policy P, and use thetuples in P marked as strong cyclic to replace complete states.
1 Let Open = {I ′}, Cls = ∅, and 〈N,E 〉 = 〈{I ′}, ∅〉;2 Select and move a state s from Open to Cls such that,
(i) Let P(s) = 〈p, a, c〉 and if the tuple is marked as beingstrong cyclic, let s ′ = p, otherwise let s ′ = s;
(ii) Assuming that Eff(a) = {Eff1 · · ·Effn}, letSucc = {Prog(s ′, a,Eff1), · · · ,Prog(s ′, a,Effn)} ;
(iii) Add Succ \ Cls to N and Open;
(iv) Add (s, s ′′) to E for every s ′′ in Succ;
3 If Open is empty, return 〈N,E 〉. Else repeat from 2.
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 16 / 24
Exporting a Conditional Plan
General Intuition
Enumerate the reachable state space of the policy P, and use thetuples in P marked as strong cyclic to replace complete states.
1 Let Open = {I ′}, Cls = ∅, and 〈N,E 〉 = 〈{I ′}, ∅〉;2 Select and move a state s from Open to Cls such that,
(i) Let P(s) = 〈p, a, c〉 and if the tuple is marked as beingstrong cyclic, let s ′ = p, otherwise let s ′ = s;
(ii) Assuming that Eff(a) = {Eff1 · · ·Effn}, letSucc = {Prog(s ′, a,Eff1), · · · ,Prog(s ′, a,Effn)} ;
(iii) Add Succ \ Cls to N and Open;
(iv) Add (s, s ′′) to E for every s ′′ in Succ;
3 If Open is empty, return 〈N,E 〉. Else repeat from 2.
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 16 / 24
Exporting a Conditional Plan
General Intuition
Enumerate the reachable state space of the policy P, and use thetuples in P marked as strong cyclic to replace complete states.
1 Let Open = {I ′}, Cls = ∅, and 〈N,E 〉 = 〈{I ′}, ∅〉;2 Select and move a state s from Open to Cls such that,
(i) Let P(s) = 〈p, a, c〉 and if the tuple is marked as beingstrong cyclic, let s ′ = p, otherwise let s ′ = s;
(ii) Assuming that Eff(a) = {Eff1 · · ·Effn}, letSucc = {Prog(s ′, a,Eff1), · · · ,Prog(s ′, a,Effn)} ;
(iii) Add Succ \ Cls to N and Open;
(iv) Add (s, s ′′) to E for every s ′′ in Succ ;
3 If Open is empty, return 〈N,E 〉. Else repeat from 2.
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 16 / 24
Exporting a Conditional Plan
General Intuition
Enumerate the reachable state space of the policy P, and use thetuples in P marked as strong cyclic to replace complete states.
1 Let Open = {I ′}, Cls = ∅, and 〈N,E 〉 = 〈{I ′}, ∅〉;2 Select and move a state s from Open to Cls such that,
(i) Let P(s) = 〈p, a, c〉 and if the tuple is marked as beingstrong cyclic, let s ′ = p, otherwise let s ′ = s;
(ii) Assuming that Eff(a) = {Eff1 · · ·Effn}, letSucc = {Prog(s ′, a,Eff1), · · · ,Prog(s ′, a,Effn)} ;
(iii) Add Succ \ Cls to N and Open;
(iv) Add (s, s ′′) to E for every s ′′ in Succ ;
3 If Open is empty, return 〈N,E 〉. Else repeat from 2.
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 16 / 24
Outline
1 Background & Notation
2 PO-PRP
3 Evaluation
4 Conclusion
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 17 / 24
Setup
Used one fabricated and three existing domains to comparePO-PRP with CLG (Albore, Palacios, and Geffner 2009).
All domains are simple contingent planning problems.
Resources given: 30min and 1Gb time and memory limit.
Uses a modified version of the kreplanner translator.
Fast Downward’s invariant synthesis limited to 3 seconds.
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 17 / 24
Testbed
Domains
Doors (doors)
Coloured Balls (cballs)
Simple Wumpus (wumpus)
Canadian Traveller Problem (ctp)
Metrics
Size of contingent plan (action + sensing nodes)
Time to compute offline plan (sec)
PO-PRP Policy size (# of tuples)
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 18 / 24
Results: Time and Size (1/2)
ProblemTime (seconds) Size (actions + sensing)
CLG PO-PRP CLG PO-PRP
cballs-4-1 0.20 0.02 343 261
cballs-4-2 19.86 0.67 22354 13887
cballs-4-3 1693.02 171.28 1247512 671988
cballs-10-1 211.66 1.57 4829 4170
cballs-10-2 T M T M
doors-5 0.12 0.01 169 82
doors-7 3.50 0.04 2492 1295
doors-9 187.60 1.07 50961 28442
doors-11 T M T M
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 19 / 24
Results: Time and Size (2/2)
ProblemTime (seconds) Size (actions + sensing)
CLG PO-PRP CLG PO-PRP
wumpus-5 0.44 0.16 854 233
wumpus-7 9.28 1.54 7423 770
wumpus-10 1379.62 11.17 362615 2669
wumpus-15 T 86.16 T 15628
wumpus-20 T M T M
ctp-ch-1 0.00 0.00 5 4
ctp-ch-5 0.02 0.00 125 16
ctp-ch-10 2.2 0.02 4093 31
ctp-ch-15 133.24 0.07 131069 46
ctp-ch-20 T 0.22 T 61
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 20 / 24
Results: PO-PRP Policy -vs- Contingent Plan Size
ProblemSize PO-PRP Policy Size
CLG PO-PRP all strong used
cballs-4-1 343 261 150 24 125
cballs-4-2 22354 13887 812 46 507
cballs-4-3 1247512 671988 3753 81 1689
cballs-10-1 4829 4170 1139 20 815
wumpus-5 854 233 706 160 331
wumpus-7 7423 770 2974 241 992
wumpus-10 362615 2669 8122 281 2482
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 21 / 24
Outline
1 Background & Notation
2 PO-PRP
3 Evaluation
4 Conclusion
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 22 / 24
Summary
Introduced PO-PRP: an extension to the FOND planner,PRP, capable of solving simple contingent planning problems.
Identified some of the key considerations involved when usingthe POND-as-FOND methodology.
Demonstrated the performance gains to be had over theprevious state of the art in offline contingent planning.
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 22 / 24
Future Work
Extend PO-PRP to handle width-1 problems.
Handle a richer set of invariants for new compilations.
Improve the strong cyclic detection for more compact policies.
Session PM 2b: Planning Under Uncertainty (Tues. 2:50pm)
Non-Deterministic Planning With Conditional Effects Muise,C.; McIlraith, S. A.; and Bell, V. In the 24th InternationalConference on Automated Planning and Scheduling, 2014.
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 23 / 24
Future Work
Extend PO-PRP to handle width-1 problems.
Handle a richer set of invariants for new compilations.
Improve the strong cyclic detection for more compact policies.
Session PM 2b: Planning Under Uncertainty (Tues. 2:50pm)
Non-Deterministic Planning With Conditional Effects Muise,C.; McIlraith, S. A.; and Bell, V. In the 24th InternationalConference on Automated Planning and Scheduling, 2014.
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 23 / 24
Any Questions?
Benchmarks, code, and slides available (in July) at:http://www.haz.ca/research/poprp/
Computing Contingent Plans via Fully Observable Non-Deterministic Planning 24 / 24