csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle...

CSC2542 Contingent Planning

Sheila McIlraithDepartment of Computer ScienceWinter, 2019

Contingent PlanningThe slides that follow describe the following publication:

Christian J. Muise, Vaishak Belle, Sheila A. McIlraith: Computing Contingent Plans via Fully Observable Non-Deterministic Planning. AAAI 2014: 2322-2329

Slides largely prepared by Christian Muise.

Computing Contingent Plans viaFully Observable Non-Deterministic Planning

Christian Muise Vaishak Belle Sheila A. McIlraith

Department of Computer Science,University of Toronto, Toronto, Canada.{cjmuise,vaishak,sheila}@cs.toronto.edu

June 23, 2014

Overview

PRP (2012)

Planner for Fully ObservableNon-Deterministic (FOND).

Powerful techniques for deadenddetection, state relevance, etc.

Recently updated to handleconditional effects.

PO-PRP (2014)

Computes conditional plans forsimple contingent problems.

Modifies the computational coreof PRP to solve encodings.

Novel approach for constructingcompact conditional plans.

Main Contribution

Leverage modern FOND planners and POND encodings toeffectively compile a compact conditional plan offline.

Computing Contingent Plans via Fully Observable Non-Deterministic Planning 1 / 24

Overview

PRP (2012)

PO-PRP (2014)

Main Contribution

Overview

PRP (2012)

PO-PRP (2014)

Main Contribution

Example Plan: doors-5

Outline

1 Background & Notation

2 PO-PRPEncodingComputingRepresenting

3 Evaluation

4 Conclusion

Outline

2 PO-PRP

3 Evaluation

4 Conclusion

Contingent Planning (1/2)

Planning Task: 〈F , I,G,O,A〉F : Finite set of fluents.

I: Set of clauses over F that determines the initial state.

G: Conjunction of atoms over F that determines the goal.

O: Set of observations.

A: Set of actions.

States

Belief state b represents a set of states.

I corresponds to a belief state bI .

b satisfies a formula φ if φ holds in every state s ∈ b.

Actions

Pre(a): Conjunction of atoms that must hold to execute a.

Eff(a): Set of pairs 〈C , L〉 to capture conditional effects.

Action progression: ba = {Prog(s, a) | s ∈ b}

Observations

Observation o consists of the condition Pre(o) that the agentmust know before observing the truth of fluent Sense(o).

Belief update with observation: boa = {s | s ∈ ba and L ∈ s},where o = 〈C , L〉 and every state in ba satisfies C .

Actions

Pre(a): Conjunction of atoms that must hold to execute a.

Eff(a): Set of pairs 〈C , L〉 to capture conditional effects.

Action progression: ba = {Prog(s, a) | s ∈ b}

Observations

Observation o consists of the condition Pre(o) that the agentmust know before observing the truth of fluent Sense(o).

Belief update with observation: boa = {s | s ∈ ba and L ∈ s},where o = 〈C , L〉 and every state in ba satisfies C .

Simple Contingent Problems (Bonet & Geffner, 2011)

Effective Characterization

The non-unary clauses in I are invariant. E.g.,:

Location of the agent (always in exactly one spot).

Uncertainty of a fluent (can only ever be true or false).

Monotonicity

Uncertainty of a fluent cannot propagate to other fluents. E.g.,:

Ok: The effect dropping a vase when we know its status:〈{fragile vase}, broken vase〉

Not Ok: Unknowingly moving into a trap:〈{at wumpus loc3}, dead〉

Corollary: Once known, a fluent remains known.

Monotonicity

FOND Planning

Planning Task: 〈F , I,G,A〉F : Finite set of fluents.

I: Complete setting of the fluents for the initial state.

A: Set of (possibly non-deterministic) actions.

Actions

Pre(a) is the set of fluents that must be true to execute a.

Eff(a) is a finite set of non-deterministic effects.

A non-deterministic effect consists of a set of conditionaleffects, each being a pair 〈C , L〉 (C possibly being >)

FOND Planning

Planning Task: 〈F , I,G,A〉F : Finite set of fluents.

I: Complete setting of the fluents for the initial state.

A: Set of (possibly non-deterministic) actions.

Actions

Pre(a) is the set of fluents that must be true to execute a.

Eff(a) is a finite set of non-deterministic effects.

A non-deterministic effect consists of a set of conditionaleffects, each being a pair 〈C , L〉 (C possibly being >)

FOND Solutions and Representation

Weak Plan

Strong Plan

Strong Cyclic Plan

Weak Plan

Strong Plan

Strong Cyclic Plan

Weak Plan

Strong Plan

Strong Cyclic Plan

Policy

P(s): Given a set of tuples of the form 〈p, a, c〉 where p is a partialstate, a an action, and c a cost, return the action a in state s fromthe tuple with the lowest cost c such that s |= p.

Strong Cyclic Plan

Outline

2 PO-PRP

3 Evaluation

4 Conclusion

Approach

1 Convert the contingent problem to a FOND one.

2 Solve with a tailored version of PRP.

3 Compile the final conditional plan.

Outline

FOND Encoding (based on Bonet & Geffner, 2011)

Planning Task: 〈F ′, I ′,G ′,A′〉F ′: {KL | L ∈ F} ∪ {K¬L | L ∈ F}I ′: {KL | L ∈ I}G′: {KL | L ∈ G}A′: A′

A ∪ A′O ∪ A′

A′A: a′ for every a ∈ A where, Pre(a′) = {KL | L ∈ Pre(a)}and Eff(a′) = {〈KC ,KL〉 | 〈C , L〉 ∈ Eff(a)}.

A′O: for o = 〈C , L〉 ∈ O, there is an action a′ ∈ A′O such thatPre(a′) = KC ∧ ¬KL ∧ ¬K¬L with two possiblenon-deterministic effects: {〈>,KL〉} and {〈>,K¬L〉}.A′V : for every (C ⊃ L) ∈ I∗, there is an action a′ ∈ A′V suchthat Pre(a′) = KC and Eff(a′) = {〈>,KL〉}.

A ∪ A′O ∪ A′

VA′A: a′ for every a ∈ A where, Pre(a′) = {KL | L ∈ Pre(a)}and Eff(a′) = {〈KC ,KL〉 | 〈C , L〉 ∈ Eff(a)}.

A′O: for o = 〈C , L〉 ∈ O, there is an action a′ ∈ A′O such thatPre(a′) = KC ∧ ¬KL ∧ ¬K¬L with two possiblenon-deterministic effects: {〈>,KL〉} and {〈>,K¬L〉}.A′V : for every (C ⊃ L) ∈ I∗, there is an action a′ ∈ A′V suchthat Pre(a′) = KC and Eff(a′) = {〈>,KL〉}.

Note: Do not need to include {¬K¬C ,¬K¬L | 〈C , L〉 ∈ Eff(a)}

A ∪ A′O ∪ A′

VA′A: a′ for every a ∈ A where, Pre(a′) = {KL | L ∈ Pre(a)}and Eff(a′) = {〈KC ,KL〉 | 〈C , L〉 ∈ Eff(a)}.A′O: for o = 〈C , L〉 ∈ O, there is an action a′ ∈ A′O such thatPre(a′) = KC ∧ ¬KL ∧ ¬K¬L with two possiblenon-deterministic effects: {〈>,KL〉} and {〈>,K¬L〉}.

A′V : for every (C ⊃ L) ∈ I∗, there is an action a′ ∈ A′V suchthat Pre(a′) = KC and Eff(a′) = {〈>,KL〉}.

Note: Can only observe a fluent once due to monotonicity.

A ∪ A′O ∪ A′

VA′A: a′ for every a ∈ A where, Pre(a′) = {KL | L ∈ Pre(a)}and Eff(a′) = {〈KC ,KL〉 | 〈C , L〉 ∈ Eff(a)}.A′O: for o = 〈C , L〉 ∈ O, there is an action a′ ∈ A′O such thatPre(a′) = KC ∧ ¬KL ∧ ¬K¬L with two possiblenon-deterministic effects: {〈>,KL〉} and {〈>,K¬L〉}.A′V : for every (C ⊃ L) ∈ I∗, there is an action a′ ∈ A′V suchthat Pre(a′) = KC and Eff(a′) = {〈>,KL〉}.

Key Considerations

Fairness Assumption

How do we get away with assuming that the non-deterministicoutcomes of a sensing action are “fair”?

Reachable Incoherent States

How do we handle belief states that will never occur in practice?(e.g., considering a sensing outcome that is not possible)

Solution Correspondence

How do we interpret the policy that PRP generates for the originalcontingent planning problem?

Key Considerations

Fairness Assumption

How do we get away with assuming that the non-deterministicoutcomes of a sensing action are “fair”? Answer: Monotonicity!

Reachable Incoherent States

How do we handle belief states that will never occur in practice?(e.g., considering a sensing outcome that is not possible)

Solution Correspondence

How do we interpret the policy that PRP generates for the originalcontingent planning problem?

Outline

General PO-PRP Algorithm

Simple Contingent Problem → FOND Problem → Contingent Plan

1 Initialize Open and Closed lists of states handled by theincumbent policy P (Open initially contains only I ′);

2 Select and move a state s from Open to Closed such that,1 If P(s) = ⊥, run UpdatePolicy(〈F ′, s,G′,A′〉,P);2 If P(s) = a, and a 6= ⊥, add to Open every state in{Prog(s, a,Effi ) | a = 〈Prea,Eff1, . . . ,Effk〉}\ Closed ;

3 If UpdatePolicy failed in 2.2, process s as a deadend ;

3 If Open is empty, return P. Else, repeat from step 2;

Simple Contingent Problem → FOND Problem ⇒ Contingent Plan

UpdatePolicy(〈F ′, s,G ′,A′〉,P)

1 A′′ = Determinize(A′); // Using all-outcomes

2 [a1, · · · an] = ComputePlan(〈F ′, s,G′,A′′〉);

3 For every suffix [ai , · · · an] of the plan,1 pi = Regress(G′, [ai , · · · an]);2 ci = cost([ai , · · · an]);3 Add 〈pi , ai , ci 〉 to P;

Invariants left as actions (and not axioms).

Priority is given to useful invariants.

Arbitrary ordering of invariants enforced.

UpdatePolicy(〈F ′, s,G ′,A′〉,P)

1 A′′ = Determinize(A′); // Using all-outcomes

2 [a1, · · · an] = ComputePlan(〈F ′, s,G′,A′′〉);

3 For every suffix [ai , · · · an] of the plan,1 pi = Regress(G′, [ai , · · · an]);2 ci = cost([ai , · · · an]);3 Add 〈pi , ai , ci 〉 to P;

Invariants left as actions (and not axioms).

Priority is given to useful invariants.

Arbitrary ordering of invariants enforced.

Outline

Contingent Plan Representations

Option 1: State-Action Mapping

Uses the default output of PRP.

Requires maintaining the compiled belief.

Difficult to see the “bigger picture”.

Option 2: Conditional Plan

Expressed in terms of the original problem.

Standard for (offline) contingent planning.

Typically takes the form of an exponentially large tree.

Contingent Plan Representations

Option 1: State-Action Mapping

Uses the default output of PRP.

Requires maintaining the compiled belief.

Difficult to see the “bigger picture”.

Option 2: Conditional Plan

Expressed in terms of the original problem.

Standard for (offline) contingent planning.

Typically takes the form of an exponentially large tree.

Strong Cyclic Detection

(PO-)PRP identifies a subset of the tuples in a policy P that aredetermined to be marked as strong cyclic.

Strong Cyclic Property 1

If a tuple 〈p, a, c〉 ∈ P is marked as strong cyclic, then for anystate s such that s |= p, the policy P is a strong cyclic plan for s.

If a tuple 〈p, a, c〉 ∈ P is marked as strong cyclic, then for anystate s such that s |= p, s ′ = Prog(s, a), and P(s ′) = 〈p′, a′, c ′〉,either p′ is G′ or the tuple 〈p′, a′, c ′〉 is marked as strong cyclic.

Exporting a Conditional Plan

General Intuition

Enumerate the reachable state space of the policy P, and use thetuples in P marked as strong cyclic to replace complete states.

1 Let Open = {I ′}, Cls = ∅, and 〈N,E 〉 = 〈{I ′}, ∅〉;2 Select and move a state s from Open to Cls such that,

(i) Let P(s) = 〈p, a, c〉 and if the tuple is marked as beingstrong cyclic, let s ′ = p, otherwise let s ′ = s;

(ii) Assuming that Eff(a) = {Eff1 · · ·Effn}, letSucc = {Prog(s ′, a,Eff1), · · · ,Prog(s ′, a,Effn)} ;

(iii) Add Succ \ Cls to N and Open;

(iv) Add (s, s ′′) to E for every s ′′ in Succ;

3 If Open is empty, return 〈N,E 〉. Else repeat from 2.

General Intuition

1 Let Open = {I ′}, Cls = ∅, and 〈N,E 〉 = 〈{I ′}, ∅〉;

2 Select and move a state s from Open to Cls such that,

General Intuition

(iv) Add (s, s ′′) to E for every s ′′ in Succ ;

General Intuition

(iv) Add (s, s ′′) to E for every s ′′ in Succ ;

Outline

2 PO-PRP

3 Evaluation

4 Conclusion

Used one fabricated and three existing domains to comparePO-PRP with CLG (Albore, Palacios, and Geffner 2009).

All domains are simple contingent planning problems.

Resources given: 30min and 1Gb time and memory limit.

Uses a modified version of the kreplanner translator.

Fast Downward’s invariant synthesis limited to 3 seconds.

Testbed

Domains

Doors (doors)

Coloured Balls (cballs)

Simple Wumpus (wumpus)

Canadian Traveller Problem (ctp)

Metrics

Size of contingent plan (action + sensing nodes)

Time to compute offline plan (sec)

PO-PRP Policy size (# of tuples)

Results: Time and Size (1/2)

ProblemTime (seconds) Size (actions + sensing)

CLG PO-PRP CLG PO-PRP

cballs-4-1 0.20 0.02 343 261

cballs-4-2 19.86 0.67 22354 13887

cballs-4-3 1693.02 171.28 1247512 671988

cballs-10-1 211.66 1.57 4829 4170

cballs-10-2 T M T M

doors-5 0.12 0.01 169 82

doors-7 3.50 0.04 2492 1295

doors-9 187.60 1.07 50961 28442

doors-11 T M T M

Results: Time and Size (2/2)

ProblemTime (seconds) Size (actions + sensing)

CLG PO-PRP CLG PO-PRP

wumpus-5 0.44 0.16 854 233

wumpus-7 9.28 1.54 7423 770

wumpus-10 1379.62 11.17 362615 2669

wumpus-15 T 86.16 T 15628

wumpus-20 T M T M

ctp-ch-1 0.00 0.00 5 4

ctp-ch-5 0.02 0.00 125 16

ctp-ch-10 2.2 0.02 4093 31

ctp-ch-15 133.24 0.07 131069 46

ctp-ch-20 T 0.22 T 61

Results: PO-PRP Policy -vs- Contingent Plan Size

ProblemSize PO-PRP Policy Size

CLG PO-PRP all strong used

cballs-4-1 343 261 150 24 125

cballs-4-2 22354 13887 812 46 507

cballs-4-3 1247512 671988 3753 81 1689

cballs-10-1 4829 4170 1139 20 815

wumpus-5 854 233 706 160 331

wumpus-7 7423 770 2974 241 992

wumpus-10 362615 2669 8122 281 2482

Outline

2 PO-PRP

3 Evaluation

4 Conclusion

Summary

Introduced PO-PRP: an extension to the FOND planner,PRP, capable of solving simple contingent planning problems.

Identified some of the key considerations involved when usingthe POND-as-FOND methodology.

Demonstrated the performance gains to be had over theprevious state of the art in offline contingent planning.

Future Work

Extend PO-PRP to handle width-1 problems.

Handle a richer set of invariants for new compilations.

Improve the strong cyclic detection for more compact policies.

Session PM 2b: Planning Under Uncertainty (Tues. 2:50pm)

Non-Deterministic Planning With Conditional Effects Muise,C.; McIlraith, S. A.; and Bell, V. In the 24th InternationalConference on Automated Planning and Scheduling, 2014.

Future Work

Extend PO-PRP to handle width-1 problems.

Handle a richer set of invariants for new compilations.

Improve the strong cyclic detection for more compact policies.

Session PM 2b: Planning Under Uncertainty (Tues. 2:50pm)

Non-Deterministic Planning With Conditional Effects Muise,C.; McIlraith, S. A.; and Bell, V. In the 24th InternationalConference on Automated Planning and Scheduling, 2014.

Any Questions?

Benchmarks, code, and slides available (in July) at:http://www.haz.ca/research/poprp/

csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle...

Documents

Transcript of csc2542w19 ContingentPlanning - Department of Computer ...€¦ · Christian Muise Vaishak Belle...

VAISHAK VENUGOPAL SRUTHI GOPALAN 1174114 … · manikoth po kanhangad kasargod dt kerala appln. no. 1125966 income rs. 20000/ ... ravi nikethan karatvayal po balla kanhangad kasaragod

Causality Aspects of Modi ed Kerr-Newman spacetimes · Causality Aspects of Modi ed Kerr-Newman spacetimes Vaishak Prasad Inter-University Center for Astronomy and Astrophysics (IUCAA),

Toronto Police officers - Toronto Police Service

InterContinental Toronto Hotel InterContinental Toronto Hotel.

Toronto Jewellery - Custom Jewelry Toronto Engagement Rings

Table of Contents...can affect individuals’ romantic relationship satisfaction. For instance, females score higher than males on both Facebook usage and romantic jealousy (Muise

NEWSLETTER · • Vaishak S and Purnanand V. Bhale. “Effectof dust deposition on performance characteristics of a refrigerant based photovoltaic/thermal system”,Sustainable Energy

Advanced Thermal Barrier Coatings for Operation in High ......Advanced Thermal Barrier Coatings for Operation in High Hydrogen Content Gas Turbines Gopal Dwivedi, Vaishak Viswanathan,Yang

Male Strip Club Toronto | Best Toronto Strippers

Learning Implicitly with Noisy Data in Linear Arithmetic · 2021. 1. 2. · Ionela Georgiana Mocanu University of Edinburgh i.g.mocanu@ed.ac.uk Vaishak Belle University of Edinburgh

Remote Defense Turret Courtney Mann(EE) Szu-Yu Fairen Huang (EE) Brad Clymer (EE) Mentor: Dr. Robert Muise, Senior Member, IEEE, Lockheed Martin Sponsored.

VAISHAK -AMUL

TORONTO / INDUSTRIA V3 TORONTO V3 TORONTO / INDUSTRIA …pdf.lowes.com/installationguides/816793011631_install.pdf · TORONTO / INDUSTRIA V3 Ø 155 cm Notice de montage Installation

csc2542w19 EpistemicPlanning - Department of Computer ...€¦ · Vaishak Belle Sheila McIlraith Paolo Felli Tim Miller Adrian Pearce Liz Sonenberg Planning over Multi-Agent Epistemic

Toronto International - Toronto District School Board...Toronto District School Board (TDSB) Getting to Toronto Just a 20-minute train ride from Union Station in downtown, Toronto

Configuring & Extending Web App Templates€¦ · Configuring & Extending Web App Templates. Allison Muise. Mike Tschudi. Scott Oppmann. ArcGIS as a Platform. ArcGIS. Executive Access.

Specifier The Toronto - CSC Toronto Chaptertoronto.csc-dcc.ca/img/content/Toronto Chapter/Toronto Specifiers... · The Toronto Specifier CSC ... slide! As the struggle to ... 6. Use

UNIVERSITY OF PENNSYLVANIA · Seniors Vaishak Kumar, Melanie Mariano and Kriya Patel have been named recipients of the 2016 President’s Engagement Prize at the University of Pennsylvania.

Toronto Transit Commission / City of Toronto …thecrosstown.ca/.../appendixn-travel-demand-forecasting-report.pdf · toronto transit commission / city of toronto eglinton crosstown

Regional Housing Market Tables City of Toronto: Toronto ... · Toronto Real Estate Board City of Toronto, Toronto W01 3 Regional Housing Market Tables City of Toronto, Toronto W01