Uncertainty and Utilities (sniekum/classes/343-S20/..., 2019. 8. 29.)
CS343: Artificial Intelligence
Uncertainty and Utilities

Prof. Scott Niekum
The University of Texas at Austin
[These slides are based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]
Uncertain Outcomes
Worst-Case vs. Average Case
[Figure: a game tree with leaves 10, 10, 9, 100: a max root over min nodes (worst case)]
Idea: Uncertain outcomes controlled by chance, not an adversary!
[Figure: the same tree with chance nodes in place of the min nodes]
Expectimax Search
▪ Why wouldn't we know what the result of an action will be?
  ▪ Explicit randomness: rolling dice
  ▪ Unpredictable opponents: the ghosts respond randomly
  ▪ Actions can fail: when moving a robot, wheels might slip
▪ Values should now reflect average-case (expectimax) outcomes, not worst-case (minimax) outcomes
▪ Expectimax search: compute the average score under optimal play
  ▪ Max nodes as in minimax search
  ▪ Chance nodes are like min nodes but the outcome is uncertain
  ▪ Calculate their expected utilities
  ▪ I.e. take weighted average (expectation) of children
▪ Later, we'll learn how to formalize the underlying uncertain-result problems as Markov Decision Processes
[Figure: an expectimax tree: a max root over chance nodes with values 10, 4, 5, 7, over leaves including 10, 10, 9, 100]
Minimax vs Expectimax (Min)
"End your misery!"

Minimax vs Expectimax (Exp)
"Hold on to hope, Pacman!"
Expectimax Pseudocode

def value(state):
    if the state is a terminal state: return the state's utility
    if the next agent is MAX: return max-value(state)
    if the next agent is EXP: return exp-value(state)

def exp-value(state):
    initialize v = 0
    for each successor of state:
        p = probability(successor)
        v += p * value(successor)
    return v

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v
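The pseudocode above can be made runnable. Below is a minimal Python sketch; the tuple encoding of trees is an illustrative assumption, not the course's data structures:

```python
# Minimal runnable version of the expectimax pseudocode.
# Nodes are tuples: ("leaf", utility), ("max", [children]),
# or ("exp", [(probability, child), ...]) -- an illustrative encoding.

def value(node):
    kind = node[0]
    if kind == "leaf":        # terminal state: return its utility
        return node[1]
    if kind == "max":         # next agent is MAX
        return max_value(node)
    if kind == "exp":         # next agent is EXP (chance)
        return exp_value(node)
    raise ValueError("unknown node type: %r" % kind)

def max_value(node):
    v = float("-inf")
    for child in node[1]:
        v = max(v, value(child))
    return v

def exp_value(node):
    v = 0.0
    for p, child in node[1]:  # each successor carries its probability
        v += p * value(child)
    return v

# A max node over three uniform lotteries: averages 8, 4, and 7.
t = 1.0 / 3.0
tree = ("max", [
    ("exp", [(t, ("leaf", 3)), (t, ("leaf", 12)), (t, ("leaf", 9))]),
    ("exp", [(t, ("leaf", 2)), (t, ("leaf", 4)), (t, ("leaf", 6))]),
    ("exp", [(t, ("leaf", 15)), (t, ("leaf", 6)), (t, ("leaf", 0))]),
])
print(value(tree))            # root takes the best average: ~8
```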
Expectimax Pseudocode

def exp-value(state):
    initialize v = 0
    for each successor of state:
        p = probability(successor)
        v += p * value(successor)
    return v

[Figure: a chance node whose successors have values 8, 24, -12, reached with probabilities 1/2, 1/3, 1/6]

v = (1/2)(8) + (1/3)(24) + (1/6)(-12) = 10
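The slide's computation can be checked with exact rational arithmetic:

```python
# Verify the expected value above with exact fractions.
from fractions import Fraction as F

children = [(F(1, 2), 8), (F(1, 3), 24), (F(1, 6), -12)]
v = sum(p * x for p, x in children)
print(v)  # 10
```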
Expectimax Example
[Figure: a max root over three chance nodes with uniform probabilities; leaf groups (3, 12, 9), (2, 4, 6), (15, 6, 0) give chance-node values 8, 4, 7, so the root value is 8]
Expectimax Pruning?
[Figure: a partially expanded expectimax tree with leaves 12, 9, 3, 2]
Depth-Limited Expectimax
[Figure: a tree cut off at a fixed depth; values 492 and 362 at chance nodes, with evaluation results 400 and 300 standing in for the unexpanded subtrees]
Estimate of true expectimax value (which would require a lot of work to compute)
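A sketch of the depth-limited variant, assuming the same kind of tuple-encoded tree as the pseudocode above and a stand-in evaluation function (both illustrative assumptions):

```python
# Depth-limited expectimax: below the cutoff, an evaluation function
# estimates the true expectimax value instead of searching further.

def dl_value(node, depth, eval_fn):
    kind = node[0]
    if kind == "leaf":
        return node[1]
    if depth == 0:                 # cutoff reached: estimate, don't recurse
        return eval_fn(node)
    if kind == "max":
        return max(dl_value(c, depth - 1, eval_fn) for c in node[1])
    # chance node: probability-weighted average of children
    return sum(p * dl_value(c, depth - 1, eval_fn) for p, c in node[1])

estimate = lambda node: 300.0      # toy stand-in for a real evaluation function

tree = ("max", [
    ("exp", [(0.5, ("leaf", 400)),
             (0.5, ("max", [("leaf", 492), ("leaf", 362)]))]),
])
print(dl_value(tree, 2, estimate)) # the unexpanded max subtree is estimated
```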
Probabilities

Reminder: Probabilities
▪ A random variable represents an event whose outcome is unknown
▪ A probability distribution is an assignment of weights to outcomes
▪ Example: Traffic on freeway
  ▪ Random variable: T = whether there's traffic
  ▪ Outcomes: T in {none, light, heavy}
  ▪ Distribution: P(T = none) = 0.25, P(T = light) = 0.50, P(T = heavy) = 0.25
▪ Some laws of probability (more later):
  ▪ Probabilities are always non-negative
  ▪ Probabilities over all possible outcomes sum to one
▪ As we get more evidence, probabilities may change:
  ▪ P(T = heavy) = 0.25, P(T = heavy | Hour = 8am) = 0.60
  ▪ We'll talk about methods for reasoning and updating probabilities later
[Figure: bar chart of the distribution 0.25, 0.50, 0.25]
Reminder: Expectations
▪ The expected value of a function of a random variable is the average, weighted by the probability distribution over outcomes
▪ Example: How long to get to the airport?
  Time:        20 min   30 min   60 min
  Probability: 0.25     0.50     0.25
  Expected time = (0.25)(20) + (0.50)(30) + (0.25)(60) = 35 min
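The airport example as arithmetic, for a quick check:

```python
# Expectation = sum over outcomes of value * probability.
times = [20, 30, 60]                 # minutes
probs = [0.25, 0.50, 0.25]
expected = sum(p * t for p, t in zip(probs, times))
print(expected)  # 35.0
```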
What Probabilities to Use?
▪ In expectimax search, we have a probabilistic model of how the opponent (or environment) will behave in any state
  ▪ Model could be a simple uniform distribution (roll a die)
  ▪ Model could be sophisticated and require a great deal of computation
  ▪ We have a chance node for any outcome out of our control: opponent or environment
  ▪ The model might say that adversarial actions are likely!
▪ For now, assume each chance node magically comes along with probabilities that specify the distribution over its outcomes

Having a probabilistic belief about another agent's action does not mean that the agent is flipping any coins!
What are Probabilities?
▪ Objectivist / frequentist answer:
  ▪ Averages over repeated experiments
  ▪ E.g. empirically estimating P(rain) from historical observation
  ▪ Assertion about how future experiments will go (in the limit)
  ▪ New evidence changes the reference class
  ▪ Makes one think of inherently random events, like rolling dice
▪ Subjectivist / Bayesian answer:
  ▪ Degrees of belief about unobserved variables
  ▪ E.g. an agent's belief that it's raining, given the temperature
  ▪ E.g. Pacman's belief that the ghost will turn left, given the state
  ▪ Often learn probabilities from past experiences (more later)
  ▪ New evidence updates beliefs (more later)
Quiz: Informed Probabilities
▪ Let's say you know that your opponent is actually running a depth 2 minimax, using the result 80% of the time, and moving randomly otherwise
▪ Question: What tree search should you use?
▪ Answer: Expectimax!
  ▪ To figure out EACH chance node's probabilities, you have to run a simulation of your opponent
  ▪ This kind of thing gets very slow very quickly
  ▪ Even worse if you have to simulate your opponent simulating you…
  ▪ … except for minimax, which has the nice property that it all collapses into one game tree
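One way to picture the simulation step: each chance node's distribution comes from running the opponent's own search. The game interface below (`legal_moves`, `minimax_move`) is a hypothetical stand-in, not a real API:

```python
# Chance-node probabilities from an opponent model: the opponent plays
# its depth-2 minimax move 80% of the time, and uniformly at random
# otherwise. legal_moves / minimax_move are hypothetical placeholders.

def opponent_distribution(state, legal_moves, minimax_move, p_optimal=0.8):
    moves = legal_moves(state)
    best = minimax_move(state, depth=2)        # simulate the opponent's search
    uniform_share = (1.0 - p_optimal) / len(moves)
    return {m: uniform_share + (p_optimal if m == best else 0.0)
            for m in moves}

# Toy stand-ins for a real game:
legal = lambda s: ["Left", "Right", "Up"]
best_move = lambda s, depth: "Left"
dist = opponent_distribution(None, legal, best_move)
print(dist)
```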
Modeling Assumptions

The Dangers of Optimism and Pessimism
Dangerous Optimism: Assuming chance when the world is adversarial
Dangerous Pessimism: Assuming the worst case when it's not likely
Assumptions vs. Reality

                     Adversarial Ghost      Random Ghost
Minimax Pacman       Won 5/5                Won 5/5
                     Avg. Score: 483        Avg. Score: 493
Expectimax Pacman    Won 1/5                Won 5/5
                     Avg. Score: -303       Avg. Score: 503

Results from playing 5 games
Pacman used depth 4 search with an eval function that avoids trouble
Ghost used depth 2 search with an eval function that seeks Pacman
Video of Demo World Assumptions: Random Ghost – Expectimax Pacman
Video of Demo World Assumptions: Random Ghost – Minimax Pacman
Video of Demo World Assumptions: Adversarial Ghost – Minimax Pacman
Video of Demo World Assumptions: Adversarial Ghost – Expectimax Pacman
Other Game Types

Mixed Layer Types
▪ E.g. Backgammon
▪ Expectiminimax
  ▪ Environment is an extra "random agent" player that moves after each min/max agent
  ▪ Each node computes the appropriate combination of its children
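A sketch of expectiminimax with all three layer types (the tuple tree encoding is an illustrative assumption):

```python
# Expectiminimax: max and min layers as in minimax, plus chance layers
# for the environment's "random agent" moves.

def expectiminimax(node):
    kind = node[0]
    if kind == "leaf":
        return node[1]
    if kind == "max":
        return max(expectiminimax(c) for c in node[1])
    if kind == "min":
        return min(expectiminimax(c) for c in node[1])
    # chance node: expectation over the environment's outcomes
    return sum(p * expectiminimax(c) for p, c in node[1])

# Max moves, a coin-flip chance layer, then min replies.
tree = ("max", [
    ("chance", [(0.5, ("min", [("leaf", 4), ("leaf", 8)])),
                (0.5, ("min", [("leaf", 6), ("leaf", 2)]))]),
    ("chance", [(0.5, ("leaf", 3)), (0.5, ("leaf", 5))]),
])
print(expectiminimax(tree))  # max(0.5*4 + 0.5*2, 0.5*3 + 0.5*5) = 4.0
```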
Multi-Agent Utilities
▪ What if the game is not zero-sum, or has multiple players?
▪ Generalization of minimax:
  ▪ Terminals have utility tuples
  ▪ Node values are also utility tuples
  ▪ Each player maximizes its own component
  ▪ Can give rise to cooperation and competition dynamically…
[Figure: a three-player game tree with terminal utility tuples 1,6,6; 7,1,2; 6,1,2; 7,2,1; 5,1,7; 1,5,2; 7,7,1; 5,2,5]
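A sketch of this generalization (often called max-n), assuming a tuple-encoded tree where each internal node records whose turn it is:

```python
# Multi-agent utilities: values are tuples, and the player to move
# picks the child whose tuple is best in its own component.

def maxn_value(node):
    if node[0] == "leaf":
        return node[1]                    # a utility tuple
    _, who, children = node               # ("turn", player_index, children)
    return max((maxn_value(c) for c in children),
               key=lambda tup: tup[who])

# Three players; leaf tuples taken from the slide's example.
tree = ("turn", 0, [
    ("turn", 1, [("leaf", (1, 6, 6)), ("leaf", (7, 1, 2))]),
    ("turn", 1, [("leaf", (6, 1, 2)), ("leaf", (7, 2, 1))]),
])
print(maxn_value(tree))  # (7, 2, 1)
```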
Utilities

Maximum Expected Utility
▪ Why should we average utilities? Why not minimax?
▪ Principle of maximum expected utility:
  ▪ A rational agent should choose the action that maximizes its expected utility, given its knowledge
▪ Questions:
  ▪ Where do utilities come from?
  ▪ How do we know such utilities even exist that represent our preferences?
  ▪ How do we know that averaging even makes sense?
  ▪ What if our behavior (preferences) can't be described by utilities?
What Utilities to Use?
▪ For worst-case minimax reasoning, terminal function scale doesn't matter
  ▪ We just want better states to have higher evaluations (get the ordering right)
  ▪ We call this insensitivity to monotonic transformations
▪ For average-case expectimax reasoning, we need magnitudes to be meaningful
[Figure: leaf values 0, 40, 20, 30 transformed by x² to 0, 1600, 400, 900]
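A quick check of this point, using the slide's leaf values and uniform chance probabilities (squaring is monotonic but not linear):

```python
# Monotonic transforms preserve minimax decisions but can change
# expectimax decisions. Two lotteries with the slide's leaf values:
left, right = [0, 40], [20, 30]

def minimax_pick(a, b):
    return "left" if min(a) > min(b) else "right"

def expectimax_pick(a, b):             # uniform probabilities over leaves
    return "left" if sum(a) / len(a) > sum(b) / len(b) else "right"

squared = lambda xs: [x * x for x in xs]

print(minimax_pick(left, right), minimax_pick(squared(left), squared(right)))
print(expectimax_pick(left, right),
      expectimax_pick(squared(left), squared(right)))
# minimax picks the same branch either way; expectimax flips its choice
```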
Utilities
▪ Utilities are functions from outcomes (states of the world) to real numbers that describe an agent's preferences
▪ Where do utilities come from?
  ▪ In a game, may be simple (+1/-1)
  ▪ Utilities summarize the agent's goals
  ▪ Theorem: any "rational" preferences can be summarized as a utility function
▪ We hard-wire utilities and let behaviors emerge
  ▪ Why don't we let agents pick utilities?
  ▪ Why don't we prescribe behaviors?
Utilities: Uncertain Outcomes
[Figure: getting ice cream: choose Get Single or Get Double; Get Double may end in "Oops" or "Whew!"]
Preferences
▪ An agent must have preferences among:
  ▪ Prizes: A, B, etc.
  ▪ Lotteries: situations with uncertain prizes, L = [p, A; (1-p), B]
▪ Notation:
  ▪ Preference: A ≻ B
  ▪ Indifference: A ∼ B
[Figure: a lottery (chance node with probabilities p and 1-p over A and B) vs. a prize (A for certain)]
Rationality

Rational Preferences
▪ We want some constraints on preferences before we call them rational, such as:
  ▪ Axiom of Transitivity: (A ≻ B) ∧ (B ≻ C) ⇒ (A ≻ C)
▪ For example: an agent with intransitive preferences can be induced to give away all of its money
  ▪ If B ≻ C, then an agent with C would pay (say) 1 cent to get B
  ▪ If A ≻ B, then an agent with B would pay (say) 1 cent to get A
  ▪ If C ≻ A, then an agent with A would pay (say) 1 cent to get C
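The money-pump argument can be simulated directly. The preference table below encodes the intransitive cycle A ≻ B ≻ C ≻ A from the example:

```python
# An agent with intransitive preferences A > B > C > A pays one cent
# for each "upgrade" and can be cycled forever.
prefers = {("A", "B"): True, ("B", "C"): True, ("C", "A"): True}

def accepts(holding, offered):
    return prefers.get((offered, holding), False)

holding, cents = "C", 100
for offered in ["B", "A", "C"] * 10:       # 30 offers around the cycle
    if accepts(holding, offered):
        holding, cents = offered, cents - 1

print(holding, cents)  # back where it started, 30 cents poorer
```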
Rational Preferences
Theorem: Rational preferences imply behavior describable as maximization of expected utility
The Axioms of Rationality

MEU Principle
▪ Theorem [Ramsey, 1931; von Neumann & Morgenstern, 1944]:
  ▪ Given any preferences satisfying these constraints, there exists a real-valued function U such that:
    U(A) ≥ U(B) ⇔ A ≽ B
    U([p1, S1; … ; pn, Sn]) = Σi pi U(Si)
  ▪ I.e. values assigned by U preserve preferences of both prizes and lotteries!
▪ Maximum expected utility (MEU) principle:
  ▪ Choose the action that maximizes expected utility
  ▪ Note: an agent can be entirely rational (consistent with MEU) without ever representing or manipulating utilities and probabilities
  ▪ E.g., a lookup table for perfect tic-tac-toe, a reflex vacuum cleaner
Human Utilities

Utility Scales
▪ Normalized utilities: u+ = 1.0, u- = 0.0
▪ Micromorts: one-millionth chance of death, useful for paying to reduce product risks, etc.
▪ QALYs: quality-adjusted life years, useful for medical decisions involving substantial risk
▪ Note: behavior is invariant under positive linear transformation
▪ With deterministic prizes only (no lottery choices), only ordinal utility can be determined, i.e., total order on prizes. To determine magnitudes, must ask questions about lottery preferences.
Human Utilities
▪ Utilities map states to real numbers. Which numbers?
▪ Standard approach to assessment (elicitation) of human utilities:
  ▪ Compare a prize A to a standard lottery Lp between
    ▪ "best possible prize" u+ with probability p
    ▪ "worst possible catastrophe" u- with probability 1-p
  ▪ Adjust lottery probability p until indifference: A ~ Lp
  ▪ Resulting p is a utility in [0,1]
[Figure: Pay $30 compared to a lottery with No pay (probability 0.999999) and Instant death (probability 0.000001)]
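The adjust-until-indifference procedure can be sketched as a bisection on p. The "subject" here is simulated by a hidden utility value; that oracle is an illustrative assumption (real elicitation would ask a person):

```python
# Elicit a utility in [0, 1] by adjusting the standard-lottery
# probability p until indifference. prefers_lottery(p) answers whether
# the subject prefers [p, best; 1-p, worst] to the fixed prize A.

def elicit(prefers_lottery, tol=1e-6):
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        p = (lo + hi) / 2
        if prefers_lottery(p):
            hi = p            # lottery too attractive: try a smaller p
        else:
            lo = p
    return (lo + hi) / 2      # the indifference point: A's utility

hidden_utility = 0.7          # simulated subject (an assumption)
subject = lambda p: p > hidden_utility
print(round(elicit(subject), 4))  # 0.7
```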
Money
▪ Money does not behave as a utility function, but we can talk about the utility of having money (or being in debt)
▪ Given a lottery L = [p, $X; (1-p), $Y]
  ▪ The expected monetary value EMV(L) is p*X + (1-p)*Y
  ▪ U(L) = p*U($X) + (1-p)*U($Y)
  ▪ Typically, U(L) < U(EMV(L))
  ▪ In this sense, people are risk-averse
  ▪ When deep in debt, people are risk-prone
Example: Insurance
▪ Consider the lottery [0.5, $1000; 0.5, $0]
  ▪ What is its expected monetary value? ($500)
  ▪ What is its certainty equivalent?
    ▪ Monetary value acceptable in lieu of lottery
    ▪ $400 for most people
  ▪ Difference of $100 is the insurance premium
    ▪ There's an insurance industry because people will pay to reduce their risk
    ▪ If everyone were risk-neutral, no insurance needed!
▪ It's win-win: you'd rather have the $400 and the insurance company would rather have the lottery (their utility curve is flat and they have many lotteries)
Example: Human Rationality?
▪ Famous example of Allais (1953)
  ▪ A: [0.8, $4k; 0.2, $0]
  ▪ B: [1.0, $3k; 0.0, $0]
  ▪ C: [0.2, $4k; 0.8, $0]
  ▪ D: [0.25, $3k; 0.75, $0]
▪ Most people prefer B > A, C > D
▪ But if U($0) = 0, then
  ▪ B > A ⇒ U($3k) > 0.8 U($4k)
  ▪ C > D ⇒ 0.2 U($4k) > 0.25 U($3k) ⇒ 0.8 U($4k) > U($3k) (multiplying both sides by 4; linear transforms are OK)
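The inconsistency can be checked mechanically: with U($0) = 0, B ≻ A requires U($3k) > 0.8·U($4k), while C ≻ D requires 0.25·U($3k) < 0.2·U($4k), i.e. U($3k) < 0.8·U($4k). A brute-force scan over candidate utilities confirms no assignment satisfies both:

```python
# Search positive utility pairs (U3, U4) = (U($3k), U($4k)) for one
# consistent with the popular preferences B > A and C > D. None exists.
consistent = [(u3, u4)
              for u3 in range(1, 101)
              for u4 in range(1, 101)
              if u3 > 0.8 * u4            # B > A
              and 0.25 * u3 < 0.2 * u4]   # C > D
print(len(consistent))  # 0
```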
Next Time: MDPs!