Expressive and Efficient Frameworks for Partial Satisfaction Planning
Subbarao Kambhampati, Arizona State University
(Proposal submitted for consideration to Behzad Kamgar-Parsi/ONR)
Partial Satisfaction/Over-Subscription Planning
• Traditional planning problems
  – Find the (lowest-cost) plan that satisfies all the given goals
• PSP planning
  – Find the highest-utility plan given the resource constraints
  – Goals have utilities and actions have costs
• …arises naturally in many real-world planning scenarios
  – Mars rovers attempting to maximize scientific return, given resource constraints
  – UAVs attempting to maximize reconnaissance returns, given fuel and other constraints
  – Logistics problems with resource constraints
• …due to a variety of reasons
  – Constraints on the agent’s resources
  – Conflicting goals, with complex inter-dependencies between goal utilities
  – Soft constraints
  – Limited time
Supporting PSP planning
• PSP planning changes planning from a “satisficing” to an “optimizing” problem
  – It is trivial to find a plan; hard to find a good one!
  – Rich connections to OR (IP) and MDPs
• Requires selecting “objectives” in addition to “actions”
  – Which subset of goals to achieve
  – To what degree to satisfy individual goals
    • E.g. collect as much soil sample as possible; get done as close to 2pm as possible
• Currently, objective selection is left to humans
  – Leads to highly suboptimal plans, since objective selection cannot be done independently of planning
• We propose to develop scalable methods for synthesizing plans in such over-subscribed scenarios
Proposal Overview
• Preliminary work
  – Simple formal model: PSP Net Benefit
  – MDP-based, IP-based, and heuristic-planning-based approaches
• Proposed directions
  – Improving expressiveness of PSP planners
    • Handling goals needing degree of satisfaction (e.g. numeric goals)
    • Handling goals with soft deadlines (where the utility of delayed goals is reduced)
    • Handling complex interactions between objectives: interactions between the plans of the goals, and interactions between the utilities of the goals
  – Improving search in PSP planners
    • More powerful heuristics for PSP planning (which take interactions into account)
    • More flexible search frameworks: non-combinable costs and utilities, multi-objective search
• Applications
  – Replanning as a PSP planning problem
Formulation
PSP Net Benefit: Given a planning problem $P = (F, A, I, G)$, for each action $a$ a “cost” $c_a \ge 0$, for each goal fluent $f \in G$ a “utility” $u_f \ge 0$, and a positive number $k$: is there a finite sequence of actions $\Delta = (a_1, a_2, \ldots, a_n)$ that, starting from $I$, leads to a state $S$ that has net benefit $\sum_{f \in (S \cap G)} u_f - \sum_{a \in \Delta} c_a \ge k$?
[Figure: hierarchy of related decision problems: PLAN EXISTENCE, PLAN LENGTH, PSP GOAL, PSP GOAL LENGTH, PLAN COST, PSP UTILITY, PSP UTILITY COST, PSP NET BENEFIT]
Maximize the Net Benefit
Actions have execution costs, goals have utilities, and the objective is to find the plan that has the highest net benefit. The model is easy to extend to a mixture of soft and hard goals.
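The net benefit of a candidate plan follows directly from the definition: sum the utilities of the goals that hold in the final state and subtract the costs of the executed actions. The sketch below is only an illustration; the rover-style fluent names, costs, and utilities are assumptions made up for the example, not data from the proposal.

```python
from typing import NamedTuple


class Action(NamedTuple):
    name: str
    pre: frozenset      # fluents that must hold before execution
    add: frozenset      # fluents made true
    delete: frozenset   # fluents made false
    cost: float         # c_a >= 0


def net_benefit(init, plan, utility):
    """Utility of the goals holding in the final state minus total action cost.

    Raises ValueError if an action's preconditions do not hold when it is applied.
    """
    state = set(init)
    total_cost = 0.0
    for a in plan:
        if not a.pre <= state:
            raise ValueError(f"{a.name} is not applicable")
        state = (state - a.delete) | a.add
        total_cost += a.cost
    achieved = sum(u for f, u in utility.items() if f in state)
    return achieved - total_cost


# Hypothetical toy instance (all names and numbers are illustrative assumptions).
navigate = Action("navigate", frozenset({"at_x"}), frozenset({"at_y"}),
                  frozenset({"at_x"}), 4.0)
sample = Action("sample_soil", frozenset({"at_y"}), frozenset({"have_soil"}),
                frozenset(), 3.0)
picture = Action("take_picture", frozenset({"at_y"}), frozenset({"have_image"}),
                 frozenset(), 6.0)
utility = {"have_soil": 10.0, "have_image": 5.0}
init = {"at_x"}

empty_plan = net_benefit(init, [], utility)
soil_only = net_benefit(init, [navigate, sample], utility)
both_goals = net_benefit(init, [navigate, sample, picture], utility)
```

In this toy instance the plan that achieves both goals has a lower net benefit than the plan that achieves only the soil goal, which is why the choice of objectives cannot be separated from planning.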
A spectrum of approaches for PSP-Net Benefit
EXACT METHODS
• Deterministic MDPs
  – Model the problem as a deterministic MDP with action costs, where a state has a reward equal to the utility of the goals that hold in it
  – A special action “Done” takes the agent from any state S to a sink state Sd
  – Guaranteed optimal, but very slow (using SPUDD, a state-of-the-art MDP solver)
• Optiplan
  – Integer-programming-based STRIPS planner
  – Optimal for a given plan length
  – Equivalent to a bounded-horizon MDP
HEURISTIC METHODS
• AltAlt^ps
  – Heuristic planner that selects the “objectives” up front heuristically
  – Novel use of planning-graph-based reachability analysis to pick objectives
  – Not optimal, but quite fast
• Sapa^ps
  – Models PSP as heuristic search; can be optimal given admissible heuristics
  – Can be thought of as a search-based solution to the deterministic MDP
[AAAI 2004; KBCS 2004]
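The deterministic-MDP view can be sketched with a bounded-horizon recursion in which “Done” collects the reward of the current state and stops. This is a brute-force illustration over a made-up toy domain, assuming a hypothetical tuple encoding of actions; it is not SPUDD or Optiplan, and it only shows the shape of the value computation.

```python
from functools import lru_cache

# Hypothetical toy domain: (name, preconditions, add effects, delete effects, cost).
ACTIONS = (
    ("navigate", frozenset({"at_x"}), frozenset({"at_y"}), frozenset({"at_x"}), 4.0),
    ("sample_soil", frozenset({"at_y"}), frozenset({"have_soil"}), frozenset(), 3.0),
    ("take_picture", frozenset({"at_y"}), frozenset({"have_image"}), frozenset(), 6.0),
)
UTILITY = {"have_soil": 10.0, "have_image": 5.0}


def reward(state):
    """Reward of a state: total utility of the goals that hold in it."""
    return sum(u for f, u in UTILITY.items() if f in state)


@lru_cache(maxsize=None)
def value(state, horizon):
    """Best net benefit obtainable from `state` using at most `horizon` actions."""
    best = reward(state)          # "Done": collect the reward and move to the sink
    if horizon == 0:
        return best
    for _name, pre, add, delete, cost in ACTIONS:
        if pre <= state:
            successor = (state - delete) | add
            best = max(best, value(successor, horizon - 1) - cost)
    return best


start = frozenset({"at_x"})
values_by_horizon = [value(start, h) for h in range(4)]
```

With these assumed numbers the value stays at 0 until the horizon is long enough to navigate and sample, then settles at 3; the picture goal is never worth its cost.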
Source of Strength: Planning graph based Reachability Heuristics for PSP
Comparison of approaches
[AAAI 2004]
Exact algorithms based on MDPs don’t scale at all
Adapting PG heuristics for PSP
• Challenges:
  – Need to propagate costs on the planning graph
  – The exact set of goals is not clear
    • Interactions between goals
    • The obvious approach of considering all $2^n$ goal subsets is infeasible
• Idea: Select a subset of the top-level goals up front
• Challenge: Goal interactions
  – Approach: Estimate the net benefit of each goal in terms of its utility minus the cost of its relaxed plan
  – Bias the relaxed plan extraction to (re)use the actions already chosen for other goals
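The goal-selection idea above can be sketched as a greedy loop: estimate each goal's net benefit as its utility minus the cost of the relaxed-plan actions not already chosen, add the best goal, and repeat while the gain is positive. This is a simplified illustration, not the AltAlt^ps implementation. It assumes the relaxed plans are given as sets of action names (in a real planner they would be extracted from the cost-sensitive planning graph), and every name and number is hypothetical.

```python
def select_goals(utility, relaxed_plan, action_cost):
    """Greedily pick goals whose marginal net benefit is positive.

    utility:      goal -> utility
    relaxed_plan: goal -> set of action names in a relaxed plan for that goal
    action_cost:  action name -> cost
    Returns (selected goals in pick order, estimated total net benefit).
    """
    selected, used_actions, total = [], set(), 0.0
    remaining = set(utility)

    def marginal(goal):
        # Actions already chosen for other goals are reused at no extra cost.
        new_actions = relaxed_plan[goal] - used_actions
        return utility[goal] - sum(action_cost[a] for a in new_actions)

    while remaining:
        best = max(sorted(remaining), key=marginal)
        gain = marginal(best)
        if gain <= 0:
            break
        selected.append(best)
        used_actions |= relaxed_plan[best]
        total += gain
        remaining.remove(best)
    return selected, total


# Hypothetical data for illustration only.
action_cost = {"navigate": 4.0, "sample_soil": 3.0, "take_picture": 2.0, "drill_rock": 9.0}
relaxed_plan = {
    "have_soil": {"navigate", "sample_soil"},
    "have_image": {"navigate", "take_picture"},
    "have_rock": {"navigate", "drill_rock"},
}
utility = {"have_soil": 10.0, "have_image": 5.0, "have_rock": 8.0}

selected, estimate = select_goals(utility, relaxed_plan, action_cost)
```

Taken alone, the image goal looks unprofitable here (5 minus 6), but once the navigation action is shared with the soil goal its marginal benefit becomes positive. That is the effect the reuse bias is meant to capture.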
[Figure: planner architecture. Components: Action Templates; Problem Spec (Init, Goal state); Graphplan Plan Extension Phase (based on STAN) + Cost Propagation; Cost-sensitive Planning Graph; Extraction of Heuristics; Heuristics; Actions in the Last Level; Goal Set Selection Algorithm; Cost-sensitive Search; Solution Plan]
[Figure: planning graph with numeric annotations at levels l=0, l=1, l=2]
[optional]
Sapa^ps: A forward A* Approach for PSP
A*: f(S) = g(S) + h(S)
Example actions: A1: Navigate(X,Y); A2: SampleSoil(Y); A3: TakePicture; A4: Navigate(Y,Z); A5: SampleRock(Y)
• g(S) is the net benefit of the plan that got us from the initial state to S: the difference between the utility of the goals holding in S and the cost of the actions that took us from I to S
• h*(S) is the additional net benefit of the best plan P starting from S (if S’ is the result of applying P to S, then we want to maximize [U(S’) – U(S)] – C(P))
• h(S) is the estimate of h*(S)
• Anytime A* algorithm: search through the most beneficial nodes first
[optional]
Sapa^ps: Modeling A* search for PSP
Many state-of-the-art planners use best-first A* search. How to model A* search for PSP Net Benefit?
Classical planning:
• Search node evaluation (f = g+h): lowest expected total number of actions
• Candidate plans: qualifying plans, which achieve all goals
• Search termination criterion: achieving all goals
PSP Net Benefit:
• Search node evaluation (f = g+h): highest expected total “benefit” (goal utility – action cost)
• Candidate plans: “beneficial” plans, whose total achieved goal utility > total action cost
• Search termination criterion: no search node appears to be extendable to be more beneficial than the best beneficial plan found
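The termination criterion for net-benefit search can be illustrated with a small anytime best-first search. In the sketch below, g is the net benefit so far, h is an optimistic bound (the utility of all goals not yet achieved, ignoring action costs), and the search stops once no open node has f above the best plan found. The heuristic, the tuple encoding of actions, and the toy domain are assumptions chosen for brevity; they are not the planning-graph heuristics used in Sapa^ps.

```python
import heapq
from itertools import count


def anytime_best_first(init, actions, utility):
    """Forward best-first search maximizing net benefit.

    actions: list of (name, pre, add, delete, cost) with frozenset fields.
    Returns (best plan as a list of action names, its net benefit).
    """
    def u(state):   # utility of the goals holding in the state
        return sum(v for f, v in utility.items() if f in state)

    def h(state):   # optimistic estimate: every unachieved goal for free
        return sum(v for f, v in utility.items() if f not in state)

    start = frozenset(init)
    best_plan, best_value = [], u(start)    # the empty plan is always a candidate
    tie = count()                           # tie-breaker so the heap never compares states
    open_list = [(-(u(start) + h(start)), next(tie), 0.0, start, [])]
    cheapest = {start: 0.0}                 # lowest cost found so far for each state
    while open_list:
        neg_f, _, cost, state, plan = heapq.heappop(open_list)
        if -neg_f <= best_value:
            break                           # no open node can beat the incumbent
        g = u(state) - cost
        if g > best_value:
            best_plan, best_value = plan, g
        for name, pre, add, delete, c in actions:
            if pre <= state:
                nxt = (state - delete) | add
                new_cost = cost + c
                if new_cost < cheapest.get(nxt, float("inf")):
                    cheapest[nxt] = new_cost
                    f = u(nxt) - new_cost + h(nxt)
                    heapq.heappush(open_list, (-f, next(tie), new_cost, nxt, plan + [name]))
    return best_plan, best_value


# Hypothetical toy domain for illustration only.
actions = [
    ("navigate", frozenset({"at_x"}), frozenset({"at_y"}), frozenset({"at_x"}), 4.0),
    ("sample_soil", frozenset({"at_y"}), frozenset({"have_soil"}), frozenset(), 3.0),
    ("take_picture", frozenset({"at_y"}), frozenset({"have_image"}), frozenset(), 6.0),
]
utility = {"have_soil": 10.0, "have_image": 5.0}

result = anytime_best_first({"at_x"}, actions, utility)
```

Because the incumbent starts as the empty plan, the search can be interrupted at any point and still return a beneficial plan, which is the anytime behaviour described above.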
[optional]
Search & Heuristic Improvements
• Make objective selection more sensitive to goal (achievement) interactions
  – Consider group interactions
  – Consider negative interactions
  – Preliminary work in ICAPS 2005 (with Sanchez Nigenda)
• Consider faster techniques for exact methods
  – Leverage our recent work on novel IP encodings
    • Based on loosely coupled network flow problems
    • Highly competitive with SAT methods
    • ICAPS 2005 (with van den Briel)
  – Consider adapting directed and anytime MDP techniques
Example: state change flow network
[Figure: state-change flow networks for Package1 and Truck1 over t = 1, t = 2, t = 3, with nodes I, G, AT_LOC1, AT_LOC2, and IN_TRUCK1 and locations LOC1 and LOC2. Actions shown: LOAD(Package1), DRIVE(Truck1,Loc1,Loc2), UNLOAD(Package1)]
Action effects link multiple networks together
Degree & Delay of Satisfaction
• In metric temporal domains, PSP will involve
  – Partial degree of satisfaction
    • If you can’t give me $1000, give me half at least
    • Need to track costs for various intervals of a numeric quantity
  – Delayed satisfaction
    • If you submit the homework past the deadline, you will get penalty points
Preliminary work on degree of satisfaction in [IJCAI 2005]
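Both notions can be read as utility functions over a numeric quantity or a finish time. The sketch below assumes linear shapes purely for illustration; the proposal does not fix a functional form, and the function names and parameters are hypothetical.

```python
def degree_utility(achieved, requested, full_utility):
    """Utility grows linearly with the achieved fraction of a numeric goal (assumed shape)."""
    fraction = max(0.0, min(achieved / requested, 1.0))
    return full_utility * fraction


def delayed_utility(finish_time, deadline, full_utility, penalty_per_unit):
    """Full utility up to the deadline, then a linear penalty, floored at zero (assumed shape)."""
    lateness = max(0.0, finish_time - deadline)
    return max(0.0, full_utility - penalty_per_unit * lateness)


half_payment = degree_utility(achieved=500, requested=1000, full_utility=10.0)
on_time = delayed_utility(finish_time=13, deadline=14, full_utility=10.0, penalty_per_unit=3.0)
two_late = delayed_utility(finish_time=16, deadline=14, full_utility=10.0, penalty_per_unit=3.0)
```

A planner tracking such functions has to cost each interval of the quantity separately, since the marginal utility of further progress or further delay depends on where the plan currently stands.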
Utility interactions between goals
• PSP Net Benefit considers goal achievement interactions
• ..but assumes an additive model of goal utilities
  – U(G1,G2) = U(G1) + U(G2)
• The additive utility model is often unrealistic
  – The utility of having two shoes is much more than the sum of the utilities of having either one of them
  – The utility of having two cars is less than the sum of the utilities of having either one of them
• Challenges:
  – Elicit utility models (preference elicitation)
  – Model utility interactions
    • Adapt and extend CP-nets for modeling goal utilities
    • Can also consider qualitative preference models
  – Extend the reachability heuristics to consider both plan interactions and goal interactions
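The shoe and car examples can be made concrete with a small non-additive utility model. The sketch assumes a hypothetical representation with per-goal base utilities plus bonus or penalty terms for groups of goals achieved together. It is an illustrative stand-in, not the CP-net extension proposed above, and all numbers are made up.

```python
def set_utility(achieved, base, interactions):
    """Utility of a goal set: base utilities plus terms for fully achieved groups.

    base:         goal -> utility of achieving that goal alone
    interactions: frozenset of goals -> bonus (positive) or penalty (negative)
    """
    achieved = frozenset(achieved)
    total = sum(base[g] for g in achieved)
    for group, adjustment in interactions.items():
        if group <= achieved:          # the term applies only if the whole group holds
            total += adjustment
    return total


# Hypothetical numbers for illustration only.
base = {"left_shoe": 1.0, "right_shoe": 1.0, "car_a": 10.0, "car_b": 10.0}
interactions = {
    frozenset({"left_shoe", "right_shoe"}): 8.0,    # a pair is worth more than its parts
    frozenset({"car_a", "car_b"}): -6.0,            # a second car adds less than the first
}

pair_of_shoes = set_utility({"left_shoe", "right_shoe"}, base, interactions)
two_cars = set_utility({"car_a", "car_b"}, base, interactions)
one_shoe = set_utility({"left_shoe"}, base, interactions)
```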
Non-combinable costs/utilities
• PSP Net Benefit assumes costs and utilities are in the same units
• …which often does not hold
  – E.g. different types of resource costs (fuel, manpower); different types of utilities
• Solution: Multi-objective search
  – Either elicit utility models
    • Alpha * manpower + Beta * mission utility
  – ..or search for the highest-utility plans given a specific resource bound
  – ..or provide the Pareto (non-dominated) set of solution plans and let the user choose
• Challenge: Need to adapt reachability heuristics to separately track the various types of costs and utilities
  – We plan to build on our work on multi-objective temporal planning in SAPA
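Two of the options above, scalarizing with elicited weights and returning the Pareto set, can be sketched over a list of candidate plans. The plan tuples, the weights, and the sign convention (cost is subtracted, utility is added) are assumptions made for this example.

```python
def pareto_front(plans):
    """Keep the non-dominated plans.

    plans: list of (name, cost, utility); lower cost and higher utility are better.
    """
    def dominates(p, q):
        no_worse = p[1] <= q[1] and p[2] >= q[2]
        strictly_better = p[1] < q[1] or p[2] > q[2]
        return no_worse and strictly_better

    return [p for p in plans if not any(dominates(q, p) for q in plans)]


def best_weighted(plans, alpha, beta):
    """Pick the plan maximizing beta * utility - alpha * cost (assumed scalarization)."""
    return max(plans, key=lambda p: beta * p[2] - alpha * p[1])


# Hypothetical candidate plans: (name, resource cost, mission utility).
plans = [("A", 10.0, 20.0), ("B", 15.0, 30.0), ("C", 15.0, 25.0), ("D", 30.0, 32.0)]

front = pareto_front(plans)
balanced_choice = best_weighted(plans, alpha=1.0, beta=1.0)
cost_averse_choice = best_weighted(plans, alpha=3.0, beta=1.0)
```

Changing the weights moves the chosen plan along the Pareto front, which is why presenting the whole non-dominated set is a reasonable fallback when the weights cannot be elicited.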
[Figure: plots titled “Cost variation” and “Makespan variation”; x-axis: Alpha (0.1 to 1), y-axis: Total Cost (0 to 60)]
Combining uncertainty and partial satisfaction
• Time permitting, we hope to extend our PSP framework to handle stochastic domains
• Planning in stochastic domains already has many natural affinities to PSP
  – If the planner wants to ensure that its plan reaches goals with higher probability, it often needs to go for longer (costlier) plans
• ..Many challenges remain in selecting objectives in stochastic domains
  – We expect to leverage our significant work in extending reachability heuristics for stochastic and non-deterministic domains
  – [UAI 2005; AAAI 2005; ICAPS 2004; JAIR in review]
Filtered Unioned (Labeled) Graph [SLUG]
[Figure: example SLUG over nodes 1 to 7 and goal G, with operators o12, o23, o34, o45, o56, o67, and oG]
Don’t let the name fool you!
Ignore irrelevant labels
Largest LUG == all LUGs
Optimized single graph
W. Cushing and D. Bryce, “State Agnostic Planning Graphs”, In AAAI, 2005.
Note: Not in the proposal draft
Explaining the planner’s decisions in mixed initiative scenarios
• In mixed-initiative scenarios, humans would like to get explanations of the selected objectives
  – Anecdotal evidence suggests that in military planning applications, human users are not willing to accept a plan when the objectives selected by the planner do not match the human’s intuition
• Challenge: Explaining the “optimality” of the planner’s decisions is technically hard
  – In contrast, explaining correctness is much simpler
• Proposed approach: We will modify the reachability heuristic computations to leave a trace of their reasoning
  – The intent would be to explain at least the Pareto-optimality of the selected set of objectives
    1. When a subgoal cannot be included because of cost-based or preference-based interactions with other selected subgoals, annotate this fact
    2. Summarize the Pareto set (in multi-objective optimization cases) in terms of conditional plans explaining which member of the set is “optimal” under what conditions
    3. Support sensitivity analysis on the stability of the selected objectives (i.e., under what conditions will they no longer be optimal)
Modeling Replanning as a PSP problem
• Traditionally, replanning has been cast as a “procedure” rather than a problem
  – Modify the old plan to handle the new situation
• ..we take the stance that replanning is a “problem”
  – Achieve the original goals of the agent from the current initial situation
  – Subject to various constraints that were imposed by the partial execution of the original plan
    • Reservations, commitments
    • These are, however, soft constraints
• ..Replanning can be best modeled as a PSP problem!
  – We propose to do this..
Three Replanning Scenarios
..that differ in their assumptions about other agents
• Either no other agents, or the agents are neutral
  – E.g. replanning in robot path planning
  – Can focus on going from the current state to the goal state (any differences are for computational savings)
• Other agents are collaborative
  – E.g. travel planning where we broadcast our plans to our friends
  – Must consider commitments made by the announcement/execution of the plan
• Other agents are adversarial
  – E.g. a naughty child pushing all red block stacks
  – Must consider and plan around the disruptions that the other agents can cause
[Figure: replanning example panels labeled Phoenix, LA, and Flagstaff, with captions: “I won’t fall into that trap again!”; “Can’t miss this meeting! I’ll go back for the car later…”; “2:00 pm: Meet with Romeo”; “Where’s Romeo?”; “Cancelled! That’s inconvenient!”]
Summary and Impact
• PSP planning problems are ubiquitous and extend the modeling power of planning frameworks
  – ..by foregrounding user preferences among different objectives
• They pose interesting technical challenges to the state of the art
  – ..by emphasizing plan-quality considerations
• We have already made significant progress in handling PSP problems
  – AAAI 2004; ICAPS 2005 (2); IJCAI 2005
• ..and propose to extend our framework significantly
• ..as well as demonstrate its power through applications