Planning Under Uncertainty


Page 1: Planning Under Uncertainty

Planning Under Uncertainty

Page 2: Planning Under Uncertainty

Sensing error

Partial observability

Unpredictable dynamics

Other agents

Page 3: Planning Under Uncertainty

Three Strategies for Handling Uncertainty

Reactive strategies: replanning

Proactive strategies: anticipating future uncertainty; shaping future uncertainty (active sensing)

Page 4: Planning Under Uncertainty

Uncertainty in Dynamics

This class: the robot has uncertain dynamics (calibration errors, wheel slip, actuator error, unpredictable obstacles) but perfect (or good enough) sensing

Page 5: Planning Under Uncertainty

Modeling imperfect dynamics

Probabilistic dynamics model: P(x'|x,u), a probability distribution over successors x' given state x and control u (Markov assumption)

Nondeterministic dynamics model: f(x,u) -> a set of possible successors
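As a rough illustration of the two models (the 4-connected grid robot, the slip probability, and all names below are assumptions made for this sketch, not part of the slides):

```python
import random

# Hypothetical grid robot: state x = (i, j), control u is one of the moves below.
MOVES = {"stop": (0, 0), "up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def probabilistic_model(x, u, p_slip=0.1):
    """P(x'|x,u): a probability distribution over successors x', given state x and control u.
    Markov assumption: the distribution depends only on the current x and u.
    Here the commanded move succeeds with probability 1 - p_slip; otherwise the wheel slips
    and the robot stays in place."""
    i, j = x
    di, dj = MOVES[u]
    dist = {}
    dist[(i + di, j + dj)] = dist.get((i + di, j + dj), 0.0) + (1.0 - p_slip)
    dist[(i, j)] = dist.get((i, j), 0.0) + p_slip
    return dist

def nondeterministic_model(x, u):
    """f(x,u) -> the set of possible successors, with no probabilities attached."""
    return set(probabilistic_model(x, u))

def sample_successor(x, u):
    """Draw one successor x' ~ P(.|x,u), e.g. when simulating an execution run."""
    dist = probabilistic_model(x, u)
    states = list(dist)
    return random.choices(states, weights=[dist[s] for s in states])[0]

print(probabilistic_model((2, 3), "right"))    # {(3, 3): 0.9, (2, 3): 0.1}
print(nondeterministic_model((2, 3), "right")) # {(3, 3), (2, 3)}
```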

Page 6: Planning Under Uncertainty

Target Tracking

[Figure: robot and target moving among obstacles]

The robot must keep a target in its field of view

The robot has a prior map of the obstacles

But it does not know the target’s trajectory in advance

Page 7: Planning Under Uncertainty

Target-Tracking Example

Time is discretized into small steps of unit duration

At each time step, each of the two agents moves by at most one increment along a single axis

The two moves are simultaneous

The robot senses the new position of the target at each step

The target is not influenced by the robot (non-adversarial, non-cooperative target)

[Figure: robot and target on the grid]

Page 8: Planning Under Uncertainty

Time-Stamped States (no cycles possible)

State = (robot-position, target-position, time)

In each state, the robot can execute 5 possible actions: {stop, up, down, right, left}

Each action has 5 possible outcomes (one for each possible action of the target), with some probability distribution

[Potential collisions are ignored to simplify the presentation]

Example: applying the action right in state ([i,j], [u,v], t) leads to one of:
([i+1,j], [u,v], t+1), ([i+1,j], [u-1,v], t+1), ([i+1,j], [u+1,v], t+1), ([i+1,j], [u,v-1], t+1), ([i+1,j], [u,v+1], t+1)
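A minimal sketch of this successor expansion (the uniform distribution over the target's five moves is an illustrative assumption; the slide only says "some probability distribution"):

```python
# State = (robot_pos, target_pos, t); positions are grid cells (i, j).
STEPS = {"stop": (0, 0), "up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def successors(state, robot_action, target_dist=None):
    """Return {successor_state: probability} for one robot action.
    The five outcomes correspond to the five possible target moves; collisions are
    ignored, as on the slide. target_dist defaults to uniform (illustrative assumption)."""
    (ri, rj), (ti, tj), t = state
    dri, drj = STEPS[robot_action]
    if target_dist is None:
        target_dist = {a: 1.0 / len(STEPS) for a in STEPS}
    out = {}
    for target_action, p in target_dist.items():
        dti, dtj = STEPS[target_action]
        nxt = ((ri + dri, rj + drj), (ti + dti, tj + dtj), t + 1)
        out[nxt] = out.get(nxt, 0.0) + p
    return out

# The "right" action from ([i,j], [u,v], t) with i=j=0 and u=v=5 yields the
# five successors listed above, each with probability 0.2 under the uniform model.
print(successors(((0, 0), (5, 5), 0), "right"))
```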

Page 9: Planning Under Uncertainty

Rewards and Costs

The robot must keep seeing the target as long as possible

Each state where it does not see the target is terminal

The reward collected in every non-terminal state is 1; it is 0 in each terminal state

[The sum of the rewards collected in an execution run is exactly the amount of time the robot sees the target]

No cost for moving vs. not moving
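A sketch of this reward structure, where visible() is a hypothetical test of whether the target lies in the robot's field of view given the obstacle map:

```python
def reward(state, visible):
    """Reward 1 while the robot still sees the target, 0 once visibility is lost.
    Summing the rewards along an execution run therefore counts the number of
    time steps during which the target stays visible."""
    robot_pos, target_pos, _t = state
    return 1 if visible(robot_pos, target_pos) else 0

def is_terminal(state, visible):
    """A state is terminal as soon as the robot no longer sees the target."""
    robot_pos, target_pos, _t = state
    return not visible(robot_pos, target_pos)
```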

Page 10: Planning Under Uncertainty

Expanding the state/action tree

[Figure: state/action tree expanded from the current state, from horizon 1 out to horizon h]

Page 11: Planning Under Uncertainty

Assigning rewards


Terminal states: states where the target is not visible

Rewards: 1 in non-terminal states; 0 in others

But how to estimate the utility of a leaf at horizon h?

Page 12: Planning Under Uncertainty

Estimating the utility of a leaf


Compute the shortest distance d for the target to escape the robot’s current field of view

If the maximal velocity v of the target is known, estimate the utility of the state as d/v [a conservative estimate]

[Figure: the shortest escape distance d from the target to the boundary of the robot's field of view]
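A sketch of this conservative leaf estimate; escape_distance() is a hypothetical helper returning the target's shortest path length out of the current field of view:

```python
def leaf_utility(state, escape_distance, v_max):
    """Conservative utility estimate for a leaf at horizon h: the target needs at
    least d / v_max time steps to leave the field of view, so the robot is
    guaranteed at least that much extra visibility beyond the horizon."""
    d = escape_distance(state)  # shortest escape distance for the target (hypothetical helper)
    return d / v_max
```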

Page 13: Planning Under Uncertainty

Selecting the next action


Compute the optimal policy over the state/action tree using estimated utilities at leaf nodes

Execute only the first step of this policy

Repeat everything again at t+1… (sliding horizon)

Real-time constraint: h is chosen so that a decision can be returned in unit time [a larger h may result in a better decision that arrives too late!]
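Putting the pieces together, a sketch of the sliding-horizon loop; expand_tree, best_first_action, execute, and sense_target are hypothetical helpers standing in for the steps described above:

```python
def track_target(state, h, expand_tree, best_first_action, execute, sense_target, is_terminal):
    """Sliding-horizon loop: plan over a depth-h state/action tree, execute only
    the first action of the resulting policy, sense the target's new position,
    and replan from the updated state at t+1."""
    while not is_terminal(state):
        tree = expand_tree(state, h)     # expand the state/action tree out to horizon h
        u = best_first_action(tree)      # first step of the optimal policy over the tree
        robot_pos = execute(u)           # one control step, taking unit time
        target_pos = sense_target()      # the robot senses the target's new position
        state = (robot_pos, target_pos, state[2] + 1)
    return state
```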

Page 14: Planning Under Uncertainty

Pure Visual Servoing

Page 15: Planning Under Uncertainty

Computing and Using a Policy

Page 16: Planning Under Uncertainty

Markov Decision Process

Finite state space X, action space U

Reward function R(x)

Optimal value function V(x)

Bellman equations

If x is terminal:

V(x) = R(x)

If x is non-terminal:

V(x) = R(x) + max_{u ∈ U} Σ_{x' ∈ X} P(x'|x,u) V(x')

π*(x) = argmax_{u ∈ U} Σ_{x' ∈ X} P(x'|x,u) V(x')
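A sketch of a single Bellman backup over a tabular model; representing P[x][u] as a dict {x': P(x'|x,u)} and R, V as dicts keyed by state is an assumption made for this illustration:

```python
def bellman_backup(x, P, R, V, terminal):
    """One Bellman backup: return the backed-up value of x and the greedy action.
    Implements V(x) = R(x) + max_u sum_x' P(x'|x,u) V(x') and its argmax."""
    if x in terminal:
        return R[x], None
    best_u, best_q = None, float("-inf")
    for u, dist in P[x].items():
        q = sum(p * V[xp] for xp, p in dist.items())
        if q > best_q:
            best_u, best_q = u, q
    return R[x] + best_q, best_u
```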

Page 17: Planning Under Uncertainty

Solving Finite MDPs

Value iteration: improve V(x) by repeatedly "backing up" the value of the best action

Policy iteration: improve π(x) by repeatedly computing the best action

Linear programming
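A sketch of value iteration built on the bellman_backup sketch above (the convergence threshold eps is an arbitrary choice for illustration):

```python
def value_iteration(X, P, R, terminal, eps=1e-6):
    """Tabular value iteration: repeatedly back up V(x) with the best action
    until the values stop changing, then read off the greedy policy."""
    V = {x: R[x] if x in terminal else 0.0 for x in X}
    delta = float("inf")
    while delta >= eps:
        delta = 0.0
        for x in X:
            if x in terminal:
                continue
            new_v, _ = bellman_backup(x, P, R, V, terminal)
            delta = max(delta, abs(new_v - V[x]))
            V[x] = new_v
    policy = {x: bellman_backup(x, P, R, V, terminal)[1] for x in X if x not in terminal}
    return V, policy
```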

Page 18: Planning Under Uncertainty

Continuous Markov Decision Processes

Discretize the state and action space (e.g., grids, random samples) => perform value/policy iteration or the LP formulation on the discrete space

Or discretize the value function (basis functions) => iterative least-squares approaches
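A sketch of the grid-discretization idea: map a continuous coordinate onto a finite set of cells, then run the tabular machinery above on the cells (the bounds and resolution below are arbitrary illustrative choices):

```python
import math

def make_grid_index(lo, hi, n_cells):
    """Map a continuous coordinate in [lo, hi] to one of n_cells bins (clamped to the grid)."""
    width = (hi - lo) / n_cells
    def index(value):
        k = int(math.floor((value - lo) / width))
        return min(max(k, 0), n_cells - 1)
    return index

# Example: discretize a 2-D state (x, y) in [0, 10] x [0, 10] into a 20 x 20 grid,
# then run value/policy iteration or the LP formulation on the resulting cells.
ix = make_grid_index(0.0, 10.0, 20)
iy = make_grid_index(0.0, 10.0, 20)
print((ix(3.7), iy(9.2)))   # (7, 18)
```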

Page 19: Planning Under Uncertainty

Extra Credit for Open House

+5% of final project grade

Preliminary demo & description by next Tuesday

Open House on April 16, 4-7pm

Page 20: Planning Under Uncertainty

Upcoming Subjects

Sensorless planning (Goldberg, 1994)

An example of a nondeterministic uncertainty model

Planning to sense (Gonzalez-Banos and Latombe 2002)

Page 21: Planning Under Uncertainty

Comments on Uncertainty

Reasoning with uncertainty requires representing and transforming sets of states

This is challenging in high-D spaces

Hard to get accurate, sparse representations

Multi-modal, nonlinear, degenerate distributions

Breakthroughs would have dramatic impact