Towards Model-lite Planning A Proposal For Learning & Planning with Incomplete Domain Models

Sungwook Yoon, Subbarao Kambhampati

Supported by DARPA Integrated Learning Program

A Planning Problem

Suppose you have a super-fast planner and a target application. What is the first problem you have to solve? Is it a problem from the application?

Domain engineering is hard: model-lite planning

Snapshot of the talk

• This is a proposal. We formulate learning and planning problems and solution methods for them. We tested our ideas on some problems, but verification is still ongoing.

• We propose
– Representation for model-lite planning
  • Probabilistic logic; incompleteness is quantified
  • Explicit consideration of domain invariants
– Learning of the domain model
  • Updating the probabilities and finding new axioms
– Planning with the model
  • A deterministic planning domain needs probabilistic planning
  • Find the most plausible plan that respects the current domain model


Representation

• Precondition Axiom: p_Ai, A → pre_i

• Uncertainty is quantified as a probability

• Effect Axiom: e_Ai, A → effect_i

• Facilitates learning


Domain Model - Blocksworld

Precondition Axiom: relates actions with current-state facts
• 0.9, Pickup(x) -> armempty()
• 1, Pickup(x) -> clear(x)
• 1, Pickup(x) -> ontable(x)

Effect Axiom: relates actions with next-state facts
• 0.8, Pickup(x) -> holding(x)
• 0.8, Pickup(x) -> not armempty()
• 0.8, Pickup(x) -> not ontable(x)
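One way to hold such weighted axioms in code is sketched below. This is only an illustrative data structure, not the authors' implementation: the `Axiom` class, the `{x}` templating of facts, and the `unmet_preconditions` helper are all assumptions introduced for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Axiom:
    prob: float     # degree of belief in the axiom
    action: str     # action schema name, e.g. "Pickup"
    fact: str       # fact template, e.g. "holding({x})" (hypothetical encoding)
    positive: bool  # False encodes "not fact"
    kind: str       # "pre" (current state) or "eff" (next state)

# The Pickup axioms from the Blocksworld domain model.
PICKUP_AXIOMS = [
    Axiom(0.9, "Pickup", "armempty()", True, "pre"),
    Axiom(1.0, "Pickup", "clear({x})", True, "pre"),
    Axiom(1.0, "Pickup", "ontable({x})", True, "pre"),
    Axiom(0.8, "Pickup", "holding({x})", True, "eff"),
    Axiom(0.8, "Pickup", "armempty()", False, "eff"),
    Axiom(0.8, "Pickup", "ontable({x})", False, "eff"),
]

def unmet_preconditions(axioms, action, x, state):
    """Return the precondition axioms of `action` that `state` does not satisfy."""
    unmet = []
    for ax in axioms:
        if ax.kind != "pre" or ax.action != action:
            continue
        holds = ax.fact.format(x=x) in state
        if holds != ax.positive:
            unmet.append(ax)
    return unmet

# Block a is clear and on the table, but the arm is not empty.
state = {"clear(a)", "ontable(a)"}
unmet = unmet_preconditions(PICKUP_AXIOMS, "Pickup", "a", state)
```

Because each axiom carries its own probability, an unmet precondition does not forbid the action outright; it only makes plans that use it less plausible.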


Representation

• One modeling problem

• A conjunction of effects has different semantics if the probability of each effect is specified independently

• Fix: add a hidden variable O with axiom (e, A → O), then add a deterministic axiom for each effect: (1, O → eff1), (1, O → eff2), …

• We can also alleviate this problem with explicit domain invariant properties

• Writing explicit domain invariant properties is easier than writing an initial state generator and a set of operators that respect such properties


Effect Axiom: relates actions with next-state facts
• 0.8, Pickup(x) -> holding(x)
• 0.8, Pickup(x) -> not armempty()
• 0.8, Pickup(x) -> not ontable(x)

Static Property: relates facts within a state
• 1, holding(x) -> not armempty()
• 1, holding(x) -> not ontable(x)
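The modeling problem can be made concrete with a little arithmetic. The sketch below assumes (as an interpretation, not something stated on the slides) that independently specified effect axioms are read as independent events, so the chance that all three Pickup effects hold together is the product of their probabilities.

```python
import math

# Three effect axioms of Pickup, each specified independently at 0.8.
effect_probs = [0.8, 0.8, 0.8]

# Read as independent events, the full conjunction of effects is less
# likely than any single effect.
independent_joint = math.prod(effect_probs)

# With a hidden variable O (0.8, Pickup -> O) and deterministic axioms
# (1, O -> eff_i), all effects occur together whenever O holds.
hidden_var_joint = 0.8 * 1.0 * 1.0 * 1.0

print(f"independent effects: {independent_joint:.3f}")
print(f"hidden variable O:   {hidden_var_joint:.3f}")
```

Under the same assumption, the static properties play a similar role: once holding(x) is achieved with probability 0.8, the probability-1 invariants force not armempty() and not ontable(x) along with it.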

Learning the domain model

• Given a trajectory of states and actions, S1, A1, S2, A2, …, Sn, An, Sn+1:
– We can learn precondition axioms from (S1,A1), (S2,A2), …, (Sn,An)
– We can learn effect axioms from (A1,S2), (A2,S3), …, (An,Sn+1)
– We can learn domain invariant properties from each state (S1), …, (Sn+1)
– The weights (probabilities) of the axioms can be updated with a simple perceptron update

• There are readily available packages for weighted logic learning
– Alchemy (MLN)
– ProbLog

• Structure learning
– Alchemy provides structure learning too
– We can also enumerate all the possible axioms (very costly for planning)
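The pairing of trajectory elements with axiom types can be sketched as follows. The slides do not spell out the exact update rule, so the error-driven step `w += lr * (observed - w)` used here is an assumption in the spirit of a perceptron update; the function name, the `weights` dictionary layout, and the learning rate are likewise hypothetical. Only positive axioms are handled, to keep the sketch short.

```python
def update_weights(weights, trajectory, lr=0.1):
    """Update axiom weights from a trajectory [S1, A1, S2, ..., Sn, An, Sn+1].

    States are sets of ground facts, actions are strings, and `weights`
    maps (kind, action, fact) -> probability, with kind "pre" or "eff".
    """
    states = trajectory[0::2]
    actions = trajectory[1::2]
    for t, action in enumerate(actions):
        # Precondition axioms see (S_t, A_t); effect axioms see (A_t, S_t+1).
        for kind, state in (("pre", states[t]), ("eff", states[t + 1])):
            for key in weights:
                k, a, fact = key
                if k == kind and a == action:
                    observed = 1.0 if fact in state else 0.0
                    weights[key] += lr * (observed - weights[key])
    return weights

weights = {
    ("pre", "pickup_a", "armempty"): 0.5,
    ("eff", "pickup_a", "holding_a"): 0.5,
    ("eff", "pickup_a", "ontable_a"): 0.5,  # an incorrect candidate axiom
}
trajectory = [
    {"armempty", "ontable_a", "clear_a"},  # S1
    "pickup_a",                            # A1
    {"holding_a", "clear_a"},              # S2
]
update_weights(weights, trajectory)
```

After this single step, the weights of the two axioms that agree with the observation move up, and the weight of the incorrect candidate moves down.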


Model-lite Planning: Probabilistic Planning

• As stated before, with incomplete domain knowledge, a deterministic planning domain should be treated as a probabilistic domain

• The resulting plan should be maximally consistent with the current domain model

• We develop a planning technique for this purpose
– A plan that is maximally plausible, given the probabilistic axioms, initial state and goal
  • MPE solution to a Bayes net problem
– Builds on the plangraph


Comment (Subbarao Kambhampati): Up front, people may think that this might bias the plan to inherit the incorrectness/incompleteness of the current model.

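Probabilistic Plangraph

[Figure: probabilistic plangraph for a two-block (A, B) Blocksworld problem. Edges carry probabilities (0.8 shown); red lines indicate mutexes.]

• Proposition level 0: clear_a, clear_b, armempty, ontable_a, ontable_b
• Action level 1: pickup_a, pickup_b, noop_clear_a, noop_clear_b, noop_armempty, noop_ontable_a, noop_ontable_b
• Proposition level 1: clear_a, clear_b, armempty, ontable_a, ontable_b, holding_a, holding_b
• Action level 2: pickup_a, pickup_b, stack_a_b, stack_b_a, noop_clear_a, noop_clear_b, noop_armempty, noop_ontable_a, noop_ontable_b, noop_holding_a, noop_holding_b
• Proposition level 2: clear_a, clear_b, armempty, ontable_a, ontable_b, holding_a, holding_b, on_a_b, on_b_a

• How do we generate a weighted clause? 0.95, pickup_b’ v holding_b
• Domain invariant properties can be asserted too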


Can we view the probabilistic plangraph as a Bayes net?

[Figure: the two-block plangraph (pickup_a, pickup_b, …) viewed as a Bayes net, with evidence variables marked and edge probabilities 0.8, 0.5, 0.8.]

• How do we find a solution? MPE (most probable explanation). There are some solvers out there.
• Domain invariant properties can be asserted too, with probability 0.9


MPE as MaxSat

• Prior work: James D. Park, AAAI 2002

• Set –log(P) as the weight of each clause

A  B  P
T  T  0.7
F  T  0.3
T  F  0.2
F  F  0.8

Weighted clauses:
–log 0.7: –A v –B
–log 0.3: A v –B
–log 0.2: –A v B
–log 0.8: A v B

Intuitive explanation: violating a clause is cheaper for high-probability instantiations, because the penalty –log(P) is small when P is large. Thus the MaxSat solution gives the highest-probability instantiation.

For A -> B: P(T T) = 1 and P(T F) = 0, so the clause –A v B gets infinite weight (this complies with our intuitive understanding).
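The encoding can be checked on the two-variable table with a brute-force search. This is a minimal sketch, not a real MaxSat solver: the function names and the (variable, required value) literal representation are assumptions made for illustration. Each table row yields one clause that negates that row's instantiation, weighted by –log(P), and the search minimizes the total weight of violated clauses.

```python
import itertools
import math

# The table from the slide: (A, B) -> P
TABLE = {
    (True, True): 0.7,
    (False, True): 0.3,
    (True, False): 0.2,
    (False, False): 0.8,
}

def clauses_from_table(table):
    """One clause per row: the negation of the row, with weight -log(P)."""
    clauses = []
    for (a, b), p in table.items():
        # A literal (var, value) is satisfied when var takes that value.
        clause = (("A", not a), ("B", not b))
        weight = math.inf if p == 0 else -math.log(p)
        clauses.append((weight, clause))
    return clauses

def cost(assignment, clauses):
    """Total weight of the clauses that the assignment violates."""
    return sum(
        w for w, clause in clauses
        if not any(assignment[var] == val for var, val in clause)
    )

def solve_maxsat(clauses, variables=("A", "B")):
    """Enumerate all assignments and keep the one with the lowest cost."""
    candidates = (
        dict(zip(variables, values))
        for values in itertools.product([True, False], repeat=len(variables))
    )
    return min(candidates, key=lambda asg: cost(asg, clauses))

best = solve_maxsat(clauses_from_table(TABLE))
```

Each assignment violates exactly the clause built from its own row, so minimizing the violated weight –log(P) is the same as maximizing P; here the search settles on A = F, B = F, the row with P = 0.8.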


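Probabilistic Plangraph to MaxSat

[Figure: the two-block plangraph (pickup_a, pickup_b, …) with each probability replaced by a clause weight: –log0.8, –log0.5, –log0.8. Evidence variables are marked.]

• For each probabilistic weight p, we give the clause the weight –log(1-p). That's it.
• Domain invariant properties can be asserted too, with weight –log0.9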
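The –log(1-p) rule can be sketched as a small helper that turns one weighted axiom into one weighted clause. The function name and the string encoding of literals (a leading "-" for negation) are assumptions for this example, and the treatment of p = 1 as an infinite-weight hard clause follows the A -> B case discussed under MPE as MaxSat.

```python
import math

def axiom_to_clause(p, antecedent, consequent):
    """Encode the axiom 'p, antecedent -> consequent' as a weighted clause.

    The clause is (not antecedent) v consequent. It is violated only when
    the antecedent holds and the consequent fails, which the axiom says
    happens with probability 1 - p, so the penalty is -log(1 - p).
    """
    weight = math.inf if p >= 1.0 else -math.log(1.0 - p)
    return weight, ["-" + antecedent, consequent]

# A probabilistic effect axiom and a deterministic invariant.
soft_weight, soft_clause = axiom_to_clause(0.8, "pickup_a", "holding_a")
hard_weight, hard_clause = axiom_to_clause(1.0, "holding_a", "-armempty")
```

A more trusted axiom gets a larger penalty for being violated, so the solver prefers plans that break only the axioms the current model is least sure about.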


Exploding Blocksworld


Current Status (ongoing)

• Learning test
– Generated Blocksworld random-wandering data and fed it to Alchemy with correct and incorrect axioms
– Alchemy assigned higher weights to the correct axioms and lower weights to the incorrect axioms

• Planning test
– Tested on probabilistic planning problems
– Hand tested on a couple of instances of the Slippery Gripper domain
  • Hand encoded the clauses and assigned the weights
  • Passed the resulting clauses to a MaxSat solver
  • Got the desired results
– On Exploding Blocksworld
  • Implemented a generic MaxSat encoder for probabilistic planning problems
  • Tested on a couple of problems from Exploding Blocksworld
  • Finds the desired output frequently (not always)


Summary

• We can learn precondition axioms and effect axioms separately
– A -> Prec, A -> Effect
– This facilitates learning

• Domain axioms or invariant properties can be provided, learned, and used explicitly
– This is easier for the domain modeler

• For planning, we can apply a probabilistic plangraph approach
– We proposed using MaxSat to solve probabilistic planning problems
– Interesting parallel to solving deterministic planning as SAT


Domain Learning – Related Work

• Logical Filtering (Chang & Eyal, ICAPS’06)
– Update belief state and domain transition model
– Experiments involved planning

• Probabilistic operator learning (Zettlemoyer, Pasula and Kaelbling, AAAI’05)
– Experiments involved planning

• ARMS (Yang, Wu and Jiang, ICAPS’05)
– No observation besides initial state and goal


Probabilistic Planning in Plangraph – Related Work

• PGraphplan, Paragraph

• Both search for plans in the Graphplan framework

• PGraphplan searches for a consistent plan that maximizes the goal-reaching probability
– Forward probability propagation

• Paragraph searches for a plan that minimizes the cost to reach the goal
– Backward plan search

