
Transcript of Making Simple Decisions

Page 1: Making Simple Decisions

Making Simple Decisions

Chapter 16

Page 2: Making Simple Decisions

Topics

• Decision making under uncertainty

– Expected utility

– Utility theory and rationality

– Utility functions

– Decision networks

– Value of information

Page 3: Making Simple Decisions

Uncertain Outcome of Actions

• Some actions may have uncertain outcomes
– Action: spend $10 to buy a lottery ticket that pays $10,000 to the winner
– Outcome: {win, not-win}

• Each outcome is associated with some merit (utility)
– Win: gain $9,990
– Not-win: lose $10

• There is a probability distribution associated with the outcomes of this action: (0.0001, 0.9999)

• Should I take this action?
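As a preview of the expected-utility calculation defined on the next slide: if utility is measured in dollars alone, EU = 0.0001 × 9990 + 0.9999 × (−10) = 0.999 − 9.999 ≈ −9, so buying the ticket loses about $9 in expectation and a purely money-maximizing agent would decline.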

Page 4: Making Simple Decisions

Expected Utility

• Random variable X with n values x1,…,xn and distribution (p1,…,pn)
– X is the outcome of performing action A (i.e., the state reached after A is taken)

• Function U of X
– U is a mapping from states to numerical utilities (values)

• The expected utility of performing action A is

  EU[A] = ∑i=1,…,n p(xi | A) U(xi)

where p(xi | A) is the probability of each outcome and U(xi) is the utility of that outcome.
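The formula is a one-liner in code. A minimal Python sketch (the helper name and data layout are illustrative, not from the slides):

    def expected_utility(outcomes):
        """EU of an action, given (probability, utility) pairs,
        one per possible resulting state; probabilities sum to 1."""
        return sum(p * u for p, u in outcomes)

    # Lottery from the previous slide: win $9,990 with probability
    # 0.0001, otherwise lose the $10 stake.
    print(expected_utility([(0.0001, 9990), (0.9999, -10)]))  # ≈ -9.0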

Page 5: Making Simple Decisions

One State/One Action Example

From state s0, action A1 leads to one of three states:

  state   probability   utility
  s1      0.2           100
  s2      0.7           50
  s3      0.1           70

EU(A1) = 100 × 0.2 + 50 × 0.7 + 70 × 0.1 = 20 + 35 + 7 = 62

Page 6: Making Simple Decisions

One State/Two Actions Example

From state s0 there are now two candidate actions:

  A1 → s1 (0.2, U = 100), s2 (0.7, U = 50), s3 (0.1, U = 70)
  A2 → s2 (0.2, U = 50), s4 (0.8, U = 80)

• EU(A1) = 100 × 0.2 + 50 × 0.7 + 70 × 0.1 = 62
• EU(A2) = 50 × 0.2 + 80 × 0.8 = 74
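With the expected_utility sketch from the Expected Utility slide, the comparison (and the choice previewed by the MEU principle on the next slide) is direct:

    A1 = [(0.2, 100), (0.7, 50), (0.1, 70)]   # s1, s2, s3
    A2 = [(0.2, 50), (0.8, 80)]               # s2, s4

    print(expected_utility(A1))  # 62.0
    print(expected_utility(A2))  # 74.0

    # Pick the action with the higher expected utility:
    actions = {"A1": A1, "A2": A2}
    print(max(actions, key=lambda a: expected_utility(actions[a])))  # A2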

Page 7: Making Simple Decisions

MEU Principle

• Decision theory: a rational agent should choose the action that maximizes the agent’s expected utility

• Maximizing expected utility (MEU) is a normative criterion for rational choices of actions

• Must have a complete model of:
– Actions

– Utilities

– States

Page 8: Making Simple Decisions

Decision networks

• Extend Bayesian nets to handle actions and utilities
– a.k.a. influence diagrams

• Make use of Bayesian net inference

• Useful application: Value of Information

Page 9: Making Simple Decisions

Decision network representation

• Chance nodes: random variables, as in Bayesian nets

• Decision nodes: actions that decision maker can take

• Utility/value nodes: the utility of the outcome state.
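One possible in-memory representation of the three node types, as a Python sketch (the class and field names are assumptions, not from the slides):

    from dataclasses import dataclass

    @dataclass
    class ChanceNode:        # random variable with a CPT, as in a Bayesian net
        name: str
        values: list
        cpt: dict            # parent-value tuple -> distribution over values

    @dataclass
    class DecisionNode:      # action under the decision maker's control
        name: str
        choices: list

    @dataclass
    class UtilityNode:       # utility of the outcome state
        name: str
        table: dict          # parent-value tuple -> numeric utility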

Page 10: Making Simple Decisions

Airport example

Page 11: Making Simple Decisions

Airport example II

Page 12: Making Simple Decisions

Evaluating decision networks

• Set the evidence variables for the current state.

• For each possible value of the decision node (assume there is just one decision node):
– Set the decision node to that value.

– Calculate the posterior probabilities for the parent nodes of the utility node, using BN inference.

– Calculate the resulting expected utility for the action.

• Return the action with the highest expected utility.
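With a single decision node, the procedure reduces to a small loop. A minimal Python sketch, with a posterior function standing in for Bayesian-net inference (all names are illustrative):

    def best_action(actions, posterior, utility):
        """Evaluate a decision network with one decision node.

        actions:   possible values of the decision node
        posterior: action -> {state: p(state | evidence, action)},
                   standing in for BN inference
        utility:   (action, state) -> numeric utility
        Returns (best action, its expected utility).
        """
        def eu(a):
            return sum(p * utility(a, s) for s, p in posterior(a).items())
        return max(((a, eu(a)) for a in actions), key=lambda pair: pair[1])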

Page 13: Making Simple Decisions

Exercise: Umbrella network

[Decision network: chance nodes Weather and Forecast, decision node Umbrella (take / don’t take), chance node Lug umbrella, utility node Happiness]

P(rain) = 0.4

Forecast model, p(f | w):

  f      w        p(f | w)
  sunny  rain     0.3
  rainy  rain     0.7
  sunny  no rain  0.8
  rainy  no rain  0.2

P(lug | take) = 1.0
P(~lug | ~take) = 1.0

U(lug, rain) = -25
U(lug, ~rain) = 0
U(~lug, rain) = -100
U(~lug, ~rain) = 100
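One way to work the exercise, plugging these numbers into the best_action sketch from the previous page (with no forecast observed, the posterior over Weather is just the prior, and taking the umbrella means lugging it with certainty):

    U = {("lug", "rain"): -25, ("lug", "no rain"): 0,
         ("no lug", "rain"): -100, ("no lug", "no rain"): 100}

    def posterior(action):
        # Weather is independent of the decision; no evidence yet.
        return {"rain": 0.4, "no rain": 0.6}

    def utility(action, state):
        lug = "lug" if action == "take" else "no lug"
        return U[(lug, state)]

    print(best_action(["take", "don't take"], posterior, utility))
    # ('don't take', 20.0) -- EU(take) = -10, EU(don't take) = 20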

Page 14: Making Simple Decisions

Value of Perfect Information (VPI)

• How much is it worth to observe (with certainty) a random variable X?

• Suppose the agent’s current knowledge is E. The value of the current best action is:

  EU(α | E) = maxA ∑i U(Resulti(A)) p(Resulti(A) | E, Do(A))

• The value of the new best action after observing the value of X is:

  EU(α′ | E, X) = maxA ∑i U(Resulti(A)) p(Resulti(A) | E, X, Do(A))

• …But we don’t know the value of X yet, so we have to sum over its possible values

• The value of perfect information for X is therefore:

  VPI(X) = ( ∑k p(xk | E) EU(αxk | xk, E) ) − EU(α | E)

Here p(xk | E) is the probability of each value of X, EU(αxk | xk, E) is the expected utility of the best action given that value of X, and EU(α | E) is the expected utility of the best action if we don’t know X (i.e., currently).
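The formula translates directly into code. A minimal sketch (names are illustrative), where meu_given_x returns EU(αxk | xk, E) and meu_now is EU(α | E):

    def vpi(x_values, p_x, meu_given_x, meu_now):
        """Value of perfect information about variable X.

        x_values:    possible values x_k of X
        p_x:         {x_k: p(x_k | E)}
        meu_given_x: x_k -> expected utility of the best action
                     once X = x_k has been observed
        meu_now:     expected utility of the current best action
        """
        return sum(p_x[x] * meu_given_x(x) for x in x_values) - meu_now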

Page 15: Making Simple Decisions

VPI exercise: Umbrella network

[Same decision network as the previous exercise: chance nodes Weather and Forecast, decision node Umbrella (take / don’t take), chance node Lug umbrella, utility node Happiness, with P(rain) = 0.4, the forecast table p(f | w) above, P(lug | take) = 1.0, P(~lug | ~take) = 1.0, and the same utilities]

What’s the value of knowing the weather forecast before leaving home?
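A worked answer using the vpi sketch above (a solution sketch, with the numbers derived from the CPT by Bayes’ rule):

    # p(f) = sum over w of p(f | w) p(w)
    p_f = {"sunny": 0.3 * 0.4 + 0.8 * 0.6,   # 0.6
           "rainy": 0.7 * 0.4 + 0.2 * 0.6}   # 0.4

    # p(rain | f) by Bayes' rule
    p_rain = {"sunny": 0.3 * 0.4 / p_f["sunny"],   # 0.2
              "rainy": 0.7 * 0.4 / p_f["rainy"]}   # 0.7

    def meu_given_f(f):
        pr = p_rain[f]
        eu_take = pr * -25 + (1 - pr) * 0       # lug the umbrella
        eu_dont = pr * -100 + (1 - pr) * 100    # leave it at home
        return max(eu_take, eu_dont)            # 60 if sunny, -17.5 if rainy

    meu_now = 20.0  # EU(don't take) with no forecast, from the earlier exercise
    print(vpi(["sunny", "rainy"], p_f, meu_given_f, meu_now))  # ≈ 9.0

So knowing the forecast is worth about 9 units of utility; it pays to observe it whenever doing so costs less than that.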

Page 16: Making Simple Decisions

Information gathering agent

• Using VPI we can design an agent that gathers information (greedily)

function INFORMATION-GATHERING-AGENT(percept) returns an action
  persistent: D, a decision network
  integrate percept into D
  j ← the value that maximizes VPI(Ej) / Cost(Ej)   // or VPI(Ej) − Cost(Ej)
  if VPI(Ej) > Cost(Ej) then
    return REQUEST(Ej)
  else
    return the best action from D
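A rough Python rendering of one greedy step of the same agent (REQUEST, the candidate evidence variables Ej, and their costs are placeholders):

    def information_gathering_step(candidates, vpi_of, cost_of, best_act):
        """Request the most cost-effective observation, or act if no
        remaining observation is worth its cost.

        candidates: evidence variables E_j not yet observed
        vpi_of:     E_j -> VPI(E_j)
        cost_of:    E_j -> Cost(E_j), assumed positive
        best_act:   current best action from the decision network
        """
        if candidates:
            ej = max(candidates, key=lambda e: vpi_of(e) / cost_of(e))
            if vpi_of(ej) > cost_of(ej):
                return ("request", ej)   # corresponds to REQUEST(E_j)
        return ("act", best_act)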