
Decision Making Under Uncertainty

Russell and Norvig: ch 16

CMSC421 – Fall 2006

Utility-Based Agent

[Diagram: an agent interacting with its environment, receiving percepts through sensors and acting through actuators; the "?" marks the agent's decision component]

Non-deterministic vs. Probabilistic Uncertainty

• Non-deterministic model: the action's outcome is one of {a, b, c}, with no probabilities attached; choose the decision that is best for the worst case (~ adversarial search).
• Probabilistic model: the outcomes occur with probabilities {a (pa), b (pb), c (pc)}; choose the decision that maximizes expected utility.

Expected Utility

• Random variable X with n values x1, …, xn and distribution (p1, …, pn). E.g., Xi is Result_i(A) | Do(A), E: the state reached after doing action A, given E, what we know about the current state.
• Function U of X. E.g., U is the utility of a state.
• The expected utility of A is
  EU[A | E] = Σ_{i=1..n} P(x_i | A) U(x_i)
            = Σ_{i=1..n} P(Result_i(A) | Do(A), E) U(Result_i(A))

One State/One Action Example

[Diagram: from state s0, action A1 reaches s1, s2, s3 with probabilities 0.2, 0.7, 0.1 and utilities 100, 50, 70]

U(S0) = 100 x 0.2 + 50 x 0.7 + 70 x 0.1 = 20 + 35 + 7 = 62

One State/Two Actions Example

[Diagram: as before, plus a second action A2 from s0 that reaches s2 and s4 with probabilities 0.2 and 0.8, where U(s4) = 80]

• U1(S0) = 62
• U2(S0) = 50 x 0.2 + 80 x 0.8 = 74
• U(S0) = max{U1(S0), U2(S0)} = 74
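A minimal Python sketch of this computation (not from the slides, just the same numbers): compute each action's expected utility and take the maximum.

```python
# Expected utility of each action from s0, then MEU over the actions.
def expected_utility(outcomes):
    """outcomes: list of (probability, utility-of-resulting-state) pairs."""
    return sum(p * u for p, u in outcomes)

actions = {
    "A1": [(0.2, 100), (0.7, 50), (0.1, 70)],  # reaches s1, s2, s3
    "A2": [(0.2, 50), (0.8, 80)],              # reaches s2, s4
}

eu = {a: expected_utility(out) for a, out in actions.items()}
best = max(eu, key=eu.get)
print(eu, best)   # A1 -> 62, A2 -> 74 (up to float rounding), so pick A2
```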

Introducing Action Costs

[Same diagram, now with a cost attached to each action: 5 for A1, 25 for A2]

• U1(S0) = 62 - 5 = 57
• U2(S0) = 74 - 25 = 49
• U(S0) = max{U1(S0), U2(S0)} = 57
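Extending the sketch above is a one-line change: subtract each action's cost before taking the max (a self-contained version with the slide's numbers):

```python
# Net utility = expected utility of the action minus its cost.
eu   = {"A1": 62, "A2": 74}
cost = {"A1": 5,  "A2": 25}
net  = {a: eu[a] - cost[a] for a in eu}   # {'A1': 57, 'A2': 49}
best = max(net, key=net.get)
print(best, net[best])                    # A1 57
```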

MEU Principle

• A rational agent should choose the action that maximizes the agent's expected utility.
• This is the basis of the field of decision theory.
• It is the normative criterion for rational choice of action.

Not quite…

• Must have a complete model of: actions, utilities, states.
• Even with a complete model, the computation will be intractable.
• In fact, a truly rational agent takes into account the utility of reasoning as well (bounded rationality).
• Nevertheless, great progress has been made in this area recently, and we are able to solve much more complex decision-theoretic problems than ever before.

We’ll look at

Decision-theoretic reasoning:
• Simple decision making (ch. 16)
• Sequential decision making (ch. 17)

Preferences

An agent chooses among prizes (A, B, etc.) and lotteries, i.e., situations with uncertain prizes

Lottery L = [p, A; (1 – p), B]

Notation:
• A > B: A preferred to B
• A ~ B: indifference between A and B
• A ≥ B: B not preferred to A

Rational Preferences

Idea: preferences of a rational agent must obey constraints.

Axioms of Utility Theory:
1. Orderability: (A > B) ∨ (B > A) ∨ (A ~ B)
2. Transitivity: (A > B) ∧ (B > C) ⇒ (A > C)
3. Continuity: A > B > C ⇒ ∃p [p, A; 1-p, C] ~ B
4. Substitutability: A ~ B ⇒ [p, A; 1-p, C] ~ [p, B; 1-p, C]
5. Monotonicity: A > B ⇒ (p ≥ q ⇔ [p, A; 1-p, B] ≥ [q, A; 1-q, B])

Rational Preferences

Violating the constraints leads to irrational behavior

E.g.: an agent with intransitive preferences can be induced to give away all its money

if B > C, then an agent who has C would pay some amount, say $1, to get B

if A > B, then an agent who has B would pay, say, $1 to get A

if C > A, then an agent who has A would pay, say, $1 to get C

….oh, oh!
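A tiny sketch of that money pump (illustrative only; the cyclic preferences and the $1 trades are the ones described above):

```python
# Money-pump sketch: with the cycle A > B, B > C, C > A the agent keeps
# paying $1 to "upgrade" around the cycle and steadily loses money.
prefers = {("A", "B"), ("B", "C"), ("C", "A")}  # (x, y): x is preferred to y

holding, money = "C", 10
for _ in range(10):                  # offer ten $1 "upgrade" trades
    for better, worse in prefers:
        if worse == holding:         # the agent strictly prefers `better`...
            holding, money = better, money - 1   # ...so it pays $1 to swap
            break
print(holding, money)                # still somewhere in the cycle, $10 poorer
```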

Rational Preferences → Utility

Theorem (Ramsey, 1931; von Neumann and Morgenstern, 1944): Given preferences satisfying the constraints, there exists a real-valued function U such that

U(A) ≥ U(B) ⇔ A ≥ B
U([p1, S1; …; pn, Sn]) = Σ_i p_i U(S_i)

MEU principle: Choose the action that maximizes expected utility
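The utility-of-a-lottery equation above transcribes directly into a small sketch:

```python
def lottery_utility(lottery, U):
    """U([p1, S1; ...; pn, Sn]) = sum_i p_i * U(S_i)."""
    return sum(p * U(s) for p, s in lottery)

# e.g. with U(x) = x and the $3,000,000 lottery used a few slides later:
print(lottery_utility([(0.5, 0), (0.5, 3_000_000)], U=lambda x: x))  # 1500000.0
```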

Utility Assessment

Standard approach to assessment of human utilities: compare a given state A to a standard lottery Lp that has
• the best possible prize with prob. p
• the worst possible catastrophe (e.g., instant death) with prob. (1 - p)

Adjust the lottery probability p until A ~ Lp; continue as before.

[Diagram: A ~ Lp, where Lp yields the best prize with probability p and instant death with probability 1 - p]
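A sketch of that adjustment loop, assuming a hypothetical prefers_state_to_lottery(p) oracle (e.g., a person answering comparison questions); the bisection and the oracle name are illustrative, not from the slides.

```python
def assess_utility(prefers_state_to_lottery, tol=1e-3):
    """Find p with A ~ Lp; on a 0-1 scale this p is U(A)."""
    lo, hi = 0.0, 1.0                    # U(worst) = 0, U(best) = 1
    while hi - lo > tol:
        p = (lo + hi) / 2
        if prefers_state_to_lottery(p):  # A still beats the lottery Lp
            lo = p                       # so make the lottery more attractive
        else:
            hi = p
    return (lo + hi) / 2

# Hypothetical respondent who switches preference at p = 0.98:
print(round(assess_utility(lambda p: p < 0.98), 3))   # ~0.98
```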

Aside: Money Utility function

Given a lottery L with expected monetary value EMV(L), usually U(L) < U(EMV(L)), i.e., people are risk-averse.

Would you rather have $1,000,000 for sure, or a lottery with [0.5, $0; 0.5, $3,000,000]?
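For instance (a sketch; the square-root utility below is just an assumed concave curve, not from the slides), the lottery's EMV exceeds the sure $1,000,000, yet a risk-averse utility still prefers the sure thing:

```python
import math

sure = 1_000_000
lottery = [(0.5, 0), (0.5, 3_000_000)]

emv = sum(p * x for p, x in lottery)            # 1,500,000 > 1,000,000
U = math.sqrt                                   # assumed concave (risk-averse) utility
eu_lottery = sum(p * U(x) for p, x in lottery)  # about 866
print(emv, eu_lottery, U(sure))                 # U(sure) = 1000 > 866: take the sure $1M
```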

Decision Networks

• Extend BNs to handle actions and utilities
• Also called influence diagrams
• Make use of BN inference
• Can do Value of Information calculations

Decision Networks cont.

• Chance nodes: random variables, as in BNs
• Decision nodes: actions that the decision maker can take
• Utility/value nodes: the utility of the outcome state

R&N example

Prenatal Testing Example

Umbrella Network

[Diagram: decision node "Take Umbrella" (take / don't take) feeds chance node "umbrella"; "umbrella" and chance node "rain" together determine the utility node "happiness"]

• P(rain) = 0.4
• P(umb | take) = 1.0, P(~umb | ~take) = 1.0
• U(~umb, ~rain) = 100; U(~umb, rain) = -100; U(umb, ~rain) = 0; U(umb, rain) = -25

Evaluating Decision Networks

1. Set the evidence variables for the current state.
2. For each possible value of the decision node:
   - Set the decision node to that value.
   - Calculate the posterior probability of the parent nodes of the utility node, using BN inference.
   - Calculate the resulting utility for the action.
3. Return the action with the highest utility.
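A minimal sketch of this loop for the umbrella network that follows (plain enumeration stands in for general BN inference, which this two-variable network doesn't need):

```python
# Evaluate the umbrella network by enumerating the decision node.
P_rain = 0.4
P_umb_given = {"take": 1.0, "dont": 0.0}         # umbrella CPT (deterministic version)
U = {(0, 0): 100, (0, 1): -100,                  # U[(umb, rain)]
     (1, 0): 0,   (1, 1): -25}

def expected_utility(decision):
    eu = 0.0
    for umb in (0, 1):
        p_umb = P_umb_given[decision] if umb else 1 - P_umb_given[decision]
        for rain in (0, 1):
            p_rain = P_rain if rain else 1 - P_rain
            eu += p_umb * p_rain * U[(umb, rain)]   # P(utility node's parents) x utility
    return eu

best = max(P_umb_given, key=expected_utility)       # iterate over the two decisions
print({d: expected_utility(d) for d in P_umb_given}, best)
```

With the deterministic CPT on the next slide (P(umb | take) = 1.0, P(umb | ~take) = 0) this gives EU(take) = -10 and EU(don't take) = 20; swapping in the noisier CPT (0.8 / 0.1) from the later slides reproduces the "???" calculations below.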

Umbrella Network

[Same network as before]

• P(rain) = 0.4
• P(umb | take) = 1.0, P(umb | ~take) = 0
• U(~umb, ~rain) = 100; U(~umb, rain) = -100; U(umb, ~rain) = 0; U(umb, rain) = -25

Umbrella Network

[Same network, with a noisier umbrella CPT]

• P(rain) = 0.4
• P(umb | take) = 0.8, P(umb | ~take) = 0.1
• U(~umb, ~rain) = 100; U(~umb, rain) = -100; U(umb, ~rain) = 0; U(umb, rain) = -25

#1: joint distribution of (umb, rain) given take:

umb  rain  P(umb, rain | take)
0    0     0.2 x 0.6 = 0.12
0    1     0.2 x 0.4 = 0.08
1    0     0.8 x 0.6 = 0.48
1    1     0.8 x 0.4 = 0.32

#1: EU(take) = 100 x 0.12 + (-100) x 0.08 + 0 x 0.48 + (-25) x 0.32 = ???

Umbrella Network

[Same network and CPTs as on the previous slide]

#2: joint distribution of (umb, rain) given ~take:

umb  rain  P(umb, rain | ~take)
0    0     0.9 x 0.6 = 0.54
0    1     0.9 x 0.4 = 0.36
1    0     0.1 x 0.6 = 0.06
1    1     0.1 x 0.4 = 0.04

#2: EU(~take) = 100 x 0.54 + (-100) x 0.36 + 0 x 0.06 + (-25) x 0.04 = ???

So, in this case I would…?
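A small self-contained check of the two expected utilities left as "???" above (same joint probabilities as in the tables):

```python
# EU(take) and EU(~take) with P(umb|take) = 0.8, P(umb|~take) = 0.1, P(rain) = 0.4.
U = {(0, 0): 100, (0, 1): -100, (1, 0): 0, (1, 1): -25}   # U[(umb, rain)]
joint_take = {(0, 0): 0.2*0.6, (0, 1): 0.2*0.4, (1, 0): 0.8*0.6, (1, 1): 0.8*0.4}
joint_skip = {(0, 0): 0.9*0.6, (0, 1): 0.9*0.4, (1, 0): 0.1*0.6, (1, 1): 0.1*0.4}

eu_take = sum(p * U[k] for k, p in joint_take.items())   # 12 - 8 + 0 - 8  = -4
eu_skip = sum(p * U[k] for k, p in joint_skip.items())   # 54 - 36 + 0 - 1 = 17
print(eu_take, eu_skip)   # so here the better decision is not to take the umbrella
```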

Value of Information

Idea: compute the expected value of acquiring possible evidence.

Example: buying oil drilling rights
• Two blocks, A and B; exactly one of them has oil, worth k.
• Prior probability 0.5 for each.
• Current price of a block is k/2.

What is the value of getting a survey of A done? The survey will say 'oil in A' or 'no oil in A', each with prob. 0.5.

Compute the expected value of information (VOI): the expected value of the best action given the information, minus the expected value of the best action without the information.

VOI(Survey) = [0.5 x value of buying A given oil in A] + [0.5 x value of buying B given no oil in A] - 0
            = ??
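A sketch of the arithmetic behind that "??" (working in units of k; the trailing zero is the slide's expected value of buying without the survey):

```python
k = 1.0                                  # work in units of k

# With the survey: buy whichever block the survey says has the oil.
value_if_oil_in_A = k - k/2              # buy A for k/2, it is worth k
value_if_no_oil_A = k - k/2              # buy B for k/2, it is worth k
value_with_info = 0.5 * value_if_oil_in_A + 0.5 * value_if_no_oil_A

# Without the survey: either block has expected value 0.5*k - k/2 = 0.
value_without_info = 0.0

print(value_with_info - value_without_info)   # VOI(Survey) = k/2 = 0.5
```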

Value of Information (VOI)

Suppose the agent's current knowledge is E. The value of the current best action α is

  EU(α | E) = max_A Σ_i P(Result_i(A) | Do(A), E) U(Result_i(A))

The value of the new best action α_e' (after new evidence E' = e' is obtained) is

  EU(α_e' | E, E' = e') = max_A Σ_i P(Result_i(A) | Do(A), E, E' = e') U(Result_i(A))

The value of information for E' is

  VOI_E(E') = [ Σ_k P(E' = e_k | E) EU(α_e_k | E, E' = e_k) ] - EU(α | E)

Umbrella Network

[Diagram: as before, plus a chance node "forecast" that depends on rain and is observed before the Take Umbrella decision]

• P(rain) = 0.4
• P(umb | take) = 0.8, P(umb | ~take) = 0.1
• U(~umb, ~rain) = 100; U(~umb, rain) = -100; U(umb, ~rain) = 0; U(umb, rain) = -25

R (rain)  P(F = rainy | R)
0         0.2
1         0.7

VOI

VOI(forecast) = P(rainy) EU(best action | rainy) + P(~rainy) EU(best action | ~rainy) - EU(best action with no forecast)

Tables to fill in, for umb, rain ∈ {0, 1}:

#1: P(umb, rain | take, rainy)    → EU(take | rainy)
#2: P(umb, rain | ~take, rainy)   → EU(~take | rainy)
#3: P(umb, rain | take, ~rainy)   → EU(take | ~rainy)
#4: P(umb, rain | ~take, ~rainy)  → EU(~take | ~rainy)

Umbrella Network

[Diagram: same network, now parameterized the other way around, with the forecast as the root]

• P(F = rainy) = 0.4
• P(umb | take) = 0.8, P(umb | ~take) = 0.1
• U(~umb, ~rain) = 100; U(~umb, rain) = -100; U(umb, ~rain) = 0; U(umb, rain) = -25

F (forecast)  P(R = rain | F)
0             0.2
1             0.7

Tables to fill in, as above:

#1: P(umb, rain | take, rainy)    → EU(take | rainy)
#2: P(umb, rain | ~take, rainy)   → EU(~take | rainy)
#3: P(umb, rain | take, ~rainy)   → EU(take | ~rainy)
#4: P(umb, rain | ~take, ~rainy)  → EU(~take | ~rainy)

VOI

VOI(forecast) = P(rainy) EU(best action | rainy) + P(~rainy) EU(best action | ~rainy) - EU(best action with no forecast)
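Putting the pieces together, a self-contained sketch for this network (using the forecast-as-root numbers from the last network slide: P(F = rainy) = 0.4, P(rain | rainy) = 0.7, P(rain | ~rainy) = 0.2):

```python
# VOI(forecast) = sum_f P(f) * max_d EU(d | f)  -  max_d EU(d)
P_forecast = {"rainy": 0.4, "sunny": 0.6}
P_rain_given_f = {"rainy": 0.7, "sunny": 0.2}
P_umb_given_d = {"take": 0.8, "skip": 0.1}
U = {(0, 0): 100, (0, 1): -100, (1, 0): 0, (1, 1): -25}   # U[(umb, rain)]

def eu(decision, p_rain):
    """Expected utility of a decision when rain has probability p_rain."""
    p_umb = P_umb_given_d[decision]
    return sum((p_umb if umb else 1 - p_umb) *
               (p_rain if rain else 1 - p_rain) * U[(umb, rain)]
               for umb in (0, 1) for rain in (0, 1))

p_rain_prior = sum(P_forecast[f] * P_rain_given_f[f] for f in P_forecast)   # 0.4
eu_without = max(eu(d, p_rain_prior) for d in P_umb_given_d)                # 17 (don't take)
eu_with = sum(P_forecast[f] * max(eu(d, P_rain_given_f[f]) for d in P_umb_given_d)
              for f in P_forecast)
print(eu_with - eu_without)   # VOI(forecast), roughly 6.3
```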

Summary: Simple Decision Making

• Decision Theory = Probability Theory + Utility Theory
• A rational agent operates by MEU
• Decision Networks
• Value of Information