Uncertainty and Utilities (sniekum/classes/343-S20/..., 2019. 8. 29.)
CS343: Artificial Intelligence
Uncertainty and Utilities

Prof. Scott Niekum
The University of Texas at Austin
[These slides are based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]
Uncertain Outcomes
Worst-Case vs. Average Case
[Figure: a game tree with leaves 10, 10, 9, 100: a max root over min nodes (worst case)]
Idea: Uncertain outcomes controlled by chance, not an adversary!
[Figure: the same tree with chance nodes in place of the min nodes]
Expectimax Search
▪ Why wouldn't we know what the result of an action will be?
  ▪ Explicit randomness: rolling dice
  ▪ Unpredictable opponents: the ghosts respond randomly
  ▪ Actions can fail: when moving a robot, wheels might slip
▪ Values should now reflect average-case (expectimax) outcomes, not worst-case (minimax) outcomes
▪ Expectimax search: compute the average score under optimal play
  ▪ Max nodes as in minimax search
  ▪ Chance nodes are like min nodes but the outcome is uncertain
  ▪ Calculate their expected utilities
  ▪ I.e. take weighted average (expectation) of children
▪ Later, we'll learn how to formalize the underlying uncertain-result problems as Markov Decision Processes
[Figure: an expectimax tree: a max root over chance nodes with values 10, 4, 5, 7, over leaves including 10, 10, 9, 100]
Minimax vs Expectimax (Min)
"End your misery!"

Minimax vs Expectimax (Exp)
"Hold on to hope, Pacman!"
Expectimax Pseudocode

def value(state):
    if the state is a terminal state: return the state's utility
    if the next agent is MAX: return max-value(state)
    if the next agent is EXP: return exp-value(state)

def exp-value(state):
    initialize v = 0
    for each successor of state:
        p = probability(successor)
        v += p * value(successor)
    return v

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v
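The pseudocode above can be made runnable. Below is a minimal Python sketch; the tuple encoding of trees is an illustrative assumption, not the course's data structures:

```python
# Minimal runnable version of the expectimax pseudocode.
# Nodes are tuples: ("leaf", utility), ("max", [children]),
# or ("exp", [(probability, child), ...]) -- an illustrative encoding.

def value(node):
    kind = node[0]
    if kind == "leaf":        # terminal state: return its utility
        return node[1]
    if kind == "max":         # next agent is MAX
        return max_value(node)
    if kind == "exp":         # next agent is EXP (chance)
        return exp_value(node)
    raise ValueError("unknown node type: %r" % kind)

def max_value(node):
    v = float("-inf")
    for child in node[1]:
        v = max(v, value(child))
    return v

def exp_value(node):
    v = 0.0
    for p, child in node[1]:  # each successor carries its probability
        v += p * value(child)
    return v

# A max node over three uniform lotteries: averages 8, 4, and 7.
t = 1.0 / 3.0
tree = ("max", [
    ("exp", [(t, ("leaf", 3)), (t, ("leaf", 12)), (t, ("leaf", 9))]),
    ("exp", [(t, ("leaf", 2)), (t, ("leaf", 4)), (t, ("leaf", 6))]),
    ("exp", [(t, ("leaf", 15)), (t, ("leaf", 6)), (t, ("leaf", 0))]),
])
print(value(tree))            # root takes the best average: ~8
```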
Expectimax Pseudocode

def exp-value(state):
    initialize v = 0
    for each successor of state:
        p = probability(successor)
        v += p * value(successor)
    return v

[Figure: a chance node whose successors have values 8, 24, -12, reached with probabilities 1/2, 1/3, 1/6]

v = (1/2)(8) + (1/3)(24) + (1/6)(-12) = 10
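The slide's computation can be checked with exact rational arithmetic:

```python
# Verify the expected value above with exact fractions.
from fractions import Fraction as F

children = [(F(1, 2), 8), (F(1, 3), 24), (F(1, 6), -12)]
v = sum(p * x for p, x in children)
print(v)  # 10
```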
Expectimax Example
[Figure: a max root over three chance nodes with uniform probabilities; leaf groups (3, 12, 9), (2, 4, 6), (15, 6, 0) give chance-node values 8, 4, 7, so the root value is 8]
Expectimax Pruning?
[Figure: a partially expanded expectimax tree with leaves 12, 9, 3, 2]
Depth-Limited Expectimax
[Figure: a tree cut off at a fixed depth; values 492 and 362 at chance nodes, with evaluation results 400 and 300 standing in for the unexpanded subtrees]
Estimate of true expectimax value (which would require a lot of work to compute)
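A sketch of the depth-limited variant, assuming the same kind of tuple-encoded tree as the pseudocode above and a stand-in evaluation function (both illustrative assumptions):

```python
# Depth-limited expectimax: below the cutoff, an evaluation function
# estimates the true expectimax value instead of searching further.

def dl_value(node, depth, eval_fn):
    kind = node[0]
    if kind == "leaf":
        return node[1]
    if depth == 0:                 # cutoff reached: estimate, don't recurse
        return eval_fn(node)
    if kind == "max":
        return max(dl_value(c, depth - 1, eval_fn) for c in node[1])
    # chance node: probability-weighted average of children
    return sum(p * dl_value(c, depth - 1, eval_fn) for p, c in node[1])

estimate = lambda node: 300.0      # toy stand-in for a real evaluation function

tree = ("max", [
    ("exp", [(0.5, ("leaf", 400)),
             (0.5, ("max", [("leaf", 492), ("leaf", 362)]))]),
])
print(dl_value(tree, 2, estimate)) # the unexpanded max subtree is estimated
```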
Probabilities

Reminder: Probabilities
▪ A random variable represents an event whose outcome is unknown
▪ A probability distribution is an assignment of weights to outcomes
▪ Example: Traffic on freeway
  ▪ Random variable: T = whether there's traffic
  ▪ Outcomes: T in {none, light, heavy}
  ▪ Distribution: P(T = none) = 0.25, P(T = light) = 0.50, P(T = heavy) = 0.25
▪ Some laws of probability (more later):
  ▪ Probabilities are always non-negative
  ▪ Probabilities over all possible outcomes sum to one
▪ As we get more evidence, probabilities may change:
  ▪ P(T = heavy) = 0.25, P(T = heavy | Hour = 8am) = 0.60
  ▪ We'll talk about methods for reasoning and updating probabilities later
[Figure: bar chart of the distribution 0.25, 0.50, 0.25]
Reminder: Expectations
▪ The expected value of a function of a random variable is the average, weighted by the probability distribution over outcomes
▪ Example: How long to get to the airport?
  Time:        20 min   30 min   60 min
  Probability: 0.25     0.50     0.25
  Expected time = (0.25)(20) + (0.50)(30) + (0.25)(60) = 35 min
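The airport example as arithmetic, for a quick check:

```python
# Expectation = sum over outcomes of value * probability.
times = [20, 30, 60]                 # minutes
probs = [0.25, 0.50, 0.25]
expected = sum(p * t for p, t in zip(probs, times))
print(expected)  # 35.0
```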
What Probabilities to Use?
▪ In expectimax search, we have a probabilistic model of how the opponent (or environment) will behave in any state
  ▪ Model could be a simple uniform distribution (roll a die)
  ▪ Model could be sophisticated and require a great deal of computation
  ▪ We have a chance node for any outcome out of our control: opponent or environment
  ▪ The model might say that adversarial actions are likely!
▪ For now, assume each chance node magically comes along with probabilities that specify the distribution over its outcomes

Having a probabilistic belief about another agent's action does not mean that the agent is flipping any coins!
What are Probabilities?
▪ Objectivist / frequentist answer:
  ▪ Averages over repeated experiments
  ▪ E.g. empirically estimating P(rain) from historical observation
  ▪ Assertion about how future experiments will go (in the limit)
  ▪ New evidence changes the reference class
  ▪ Makes one think of inherently random events, like rolling dice
▪ Subjectivist / Bayesian answer:
  ▪ Degrees of belief about unobserved variables
  ▪ E.g. an agent's belief that it's raining, given the temperature
  ▪ E.g. Pacman's belief that the ghost will turn left, given the state
  ▪ Often learn probabilities from past experiences (more later)
  ▪ New evidence updates beliefs (more later)
Quiz: Informed Probabilities
▪ Let's say you know that your opponent is actually running a depth 2 minimax, using the result 80% of the time, and moving randomly otherwise
▪ Question: What tree search should you use?
▪ Answer: Expectimax!
  ▪ To figure out EACH chance node's probabilities, you have to run a simulation of your opponent
  ▪ This kind of thing gets very slow very quickly
  ▪ Even worse if you have to simulate your opponent simulating you…
  ▪ … except for minimax, which has the nice property that it all collapses into one game tree
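One way to picture the simulation step: each chance node's distribution comes from running the opponent's own search. The game interface below (`legal_moves`, `minimax_move`) is a hypothetical stand-in, not a real API:

```python
# Chance-node probabilities from an opponent model: the opponent plays
# its depth-2 minimax move 80% of the time, and uniformly at random
# otherwise. legal_moves / minimax_move are hypothetical placeholders.

def opponent_distribution(state, legal_moves, minimax_move, p_optimal=0.8):
    moves = legal_moves(state)
    best = minimax_move(state, depth=2)        # simulate the opponent's search
    uniform_share = (1.0 - p_optimal) / len(moves)
    return {m: uniform_share + (p_optimal if m == best else 0.0)
            for m in moves}

# Toy stand-ins for a real game:
legal = lambda s: ["Left", "Right", "Up"]
best_move = lambda s, depth: "Left"
dist = opponent_distribution(None, legal, best_move)
print(dist)
```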
Modeling Assumptions

The Dangers of Optimism and Pessimism
Dangerous Optimism: Assuming chance when the world is adversarial
Dangerous Pessimism: Assuming the worst case when it's not likely
Assumptions vs. Reality

                     Adversarial Ghost      Random Ghost
Minimax Pacman       Won 5/5                Won 5/5
                     Avg. Score: 483        Avg. Score: 493
Expectimax Pacman    Won 1/5                Won 5/5
                     Avg. Score: -303       Avg. Score: 503

Results from playing 5 games
Pacman used depth 4 search with an eval function that avoids trouble
Ghost used depth 2 search with an eval function that seeks Pacman
Video of Demo World Assumptions: Random Ghost – Expectimax Pacman
Video of Demo World Assumptions: Random Ghost – Minimax Pacman
Video of Demo World Assumptions: Adversarial Ghost – Minimax Pacman
Video of Demo World Assumptions: Adversarial Ghost – Expectimax Pacman
Other Game Types

Mixed Layer Types
▪ E.g. Backgammon
▪ Expectiminimax
  ▪ Environment is an extra "random agent" player that moves after each min/max agent
  ▪ Each node computes the appropriate combination of its children
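A sketch of expectiminimax with all three layer types (the tuple tree encoding is an illustrative assumption):

```python
# Expectiminimax: max and min layers as in minimax, plus chance layers
# for the environment's "random agent" moves.

def expectiminimax(node):
    kind = node[0]
    if kind == "leaf":
        return node[1]
    if kind == "max":
        return max(expectiminimax(c) for c in node[1])
    if kind == "min":
        return min(expectiminimax(c) for c in node[1])
    # chance node: expectation over the environment's outcomes
    return sum(p * expectiminimax(c) for p, c in node[1])

# Max moves, a coin-flip chance layer, then min replies.
tree = ("max", [
    ("chance", [(0.5, ("min", [("leaf", 4), ("leaf", 8)])),
                (0.5, ("min", [("leaf", 6), ("leaf", 2)]))]),
    ("chance", [(0.5, ("leaf", 3)), (0.5, ("leaf", 5))]),
])
print(expectiminimax(tree))  # max(0.5*4 + 0.5*2, 0.5*3 + 0.5*5) = 4.0
```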
Multi-Agent Utilities
▪ What if the game is not zero-sum, or has multiple players?
▪ Generalization of minimax:
  ▪ Terminals have utility tuples
  ▪ Node values are also utility tuples
  ▪ Each player maximizes its own component
  ▪ Can give rise to cooperation and competition dynamically…
[Figure: a three-player game tree with terminal utility tuples 1,6,6; 7,1,2; 6,1,2; 7,2,1; 5,1,7; 1,5,2; 7,7,1; 5,2,5]
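A sketch of this generalization (often called max-n), assuming a tuple-encoded tree where each internal node records whose turn it is:

```python
# Multi-agent utilities: values are tuples, and the player to move
# picks the child whose tuple is best in its own component.

def maxn_value(node):
    if node[0] == "leaf":
        return node[1]                    # a utility tuple
    _, who, children = node               # ("turn", player_index, children)
    return max((maxn_value(c) for c in children),
               key=lambda tup: tup[who])

# Three players; leaf tuples taken from the slide's example.
tree = ("turn", 0, [
    ("turn", 1, [("leaf", (1, 6, 6)), ("leaf", (7, 1, 2))]),
    ("turn", 1, [("leaf", (6, 1, 2)), ("leaf", (7, 2, 1))]),
])
print(maxn_value(tree))  # (7, 2, 1)
```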
Utilities

Maximum Expected Utility
▪ Why should we average utilities? Why not minimax?
▪ Principle of maximum expected utility:
  ▪ A rational agent should choose the action that maximizes its expected utility, given its knowledge
▪ Questions:
  ▪ Where do utilities come from?
  ▪ How do we know such utilities even exist that represent our preferences?
  ▪ How do we know that averaging even makes sense?
  ▪ What if our behavior (preferences) can't be described by utilities?
What Utilities to Use?
▪ For worst-case minimax reasoning, terminal function scale doesn't matter
  ▪ We just want better states to have higher evaluations (get the ordering right)
  ▪ We call this insensitivity to monotonic transformations
▪ For average-case expectimax reasoning, we need magnitudes to be meaningful
[Figure: leaf values 0, 40, 20, 30 transformed by x² to 0, 1600, 400, 900]
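A quick check of this point, using the slide's leaf values and uniform chance probabilities (squaring is monotonic but not linear):

```python
# Monotonic transforms preserve minimax decisions but can change
# expectimax decisions. Two lotteries with the slide's leaf values:
left, right = [0, 40], [20, 30]

def minimax_pick(a, b):
    return "left" if min(a) > min(b) else "right"

def expectimax_pick(a, b):             # uniform probabilities over leaves
    return "left" if sum(a) / len(a) > sum(b) / len(b) else "right"

squared = lambda xs: [x * x for x in xs]

print(minimax_pick(left, right), minimax_pick(squared(left), squared(right)))
print(expectimax_pick(left, right),
      expectimax_pick(squared(left), squared(right)))
# minimax picks the same branch either way; expectimax flips its choice
```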
Utilities
▪ Utilities are functions from outcomes (states of the world) to real numbers that describe an agent's preferences
▪ Where do utilities come from?
  ▪ In a game, may be simple (+1/-1)
  ▪ Utilities summarize the agent's goals
  ▪ Theorem: any "rational" preferences can be summarized as a utility function
▪ We hard-wire utilities and let behaviors emerge
  ▪ Why don't we let agents pick utilities?
  ▪ Why don't we prescribe behaviors?
Utilities: Uncertain Outcomes
[Figure: getting ice cream: choose Get Single or Get Double; Get Double may end in "Oops" or "Whew!"]
Preferences
▪ An agent must have preferences among:
  ▪ Prizes: A, B, etc.
  ▪ Lotteries: situations with uncertain prizes, L = [p, A; (1-p), B]
▪ Notation:
  ▪ Preference: A ≻ B
  ▪ Indifference: A ∼ B
[Figure: a lottery (chance node with probabilities p and 1-p over A and B) vs. a prize (A for certain)]
Rationality

Rational Preferences
▪ We want some constraints on preferences before we call them rational, such as:
  ▪ Axiom of Transitivity: (A ≻ B) ∧ (B ≻ C) ⇒ (A ≻ C)
▪ For example: an agent with intransitive preferences can be induced to give away all of its money
  ▪ If B ≻ C, then an agent with C would pay (say) 1 cent to get B
  ▪ If A ≻ B, then an agent with B would pay (say) 1 cent to get A
  ▪ If C ≻ A, then an agent with A would pay (say) 1 cent to get C
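The money-pump argument can be simulated directly. The preference table below encodes the intransitive cycle A ≻ B ≻ C ≻ A from the example:

```python
# An agent with intransitive preferences A > B > C > A pays one cent
# for each "upgrade" and can be cycled forever.
prefers = {("A", "B"): True, ("B", "C"): True, ("C", "A"): True}

def accepts(holding, offered):
    return prefers.get((offered, holding), False)

holding, cents = "C", 100
for offered in ["B", "A", "C"] * 10:       # 30 offers around the cycle
    if accepts(holding, offered):
        holding, cents = offered, cents - 1

print(holding, cents)  # back where it started, 30 cents poorer
```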
Rational Preferences
Theorem: Rational preferences imply behavior describable as maximization of expected utility
The Axioms of Rationality

MEU Principle
▪ Theorem [Ramsey, 1931; von Neumann & Morgenstern, 1944]:
  ▪ Given any preferences satisfying these constraints, there exists a real-valued function U such that:
    U(A) ≥ U(B) ⇔ A ≽ B
    U([p1, S1; … ; pn, Sn]) = Σi pi U(Si)
  ▪ I.e. values assigned by U preserve preferences of both prizes and lotteries!
▪ Maximum expected utility (MEU) principle:
  ▪ Choose the action that maximizes expected utility
  ▪ Note: an agent can be entirely rational (consistent with MEU) without ever representing or manipulating utilities and probabilities
  ▪ E.g., a lookup table for perfect tic-tac-toe, a reflex vacuum cleaner
Human Utilities

Utility Scales
▪ Normalized utilities: u+ = 1.0, u- = 0.0
▪ Micromorts: one-millionth chance of death, useful for paying to reduce product risks, etc.
▪ QALYs: quality-adjusted life years, useful for medical decisions involving substantial risk
▪ Note: behavior is invariant under positive linear transformation
▪ With deterministic prizes only (no lottery choices), only ordinal utility can be determined, i.e., total order on prizes. To determine magnitudes, must ask questions about lottery preferences.
Human Utilities
▪ Utilities map states to real numbers. Which numbers?
▪ Standard approach to assessment (elicitation) of human utilities:
  ▪ Compare a prize A to a standard lottery Lp between
    ▪ "best possible prize" u+ with probability p
    ▪ "worst possible catastrophe" u- with probability 1-p
  ▪ Adjust lottery probability p until indifference: A ~ Lp
  ▪ Resulting p is a utility in [0,1]
[Figure: Pay $30 compared to a lottery with No pay (probability 0.999999) and Instant death (probability 0.000001)]
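The adjust-until-indifference procedure can be sketched as a bisection on p. The "subject" here is simulated by a hidden utility value; that oracle is an illustrative assumption (real elicitation would ask a person):

```python
# Elicit a utility in [0, 1] by adjusting the standard-lottery
# probability p until indifference. prefers_lottery(p) answers whether
# the subject prefers [p, best; 1-p, worst] to the fixed prize A.

def elicit(prefers_lottery, tol=1e-6):
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        p = (lo + hi) / 2
        if prefers_lottery(p):
            hi = p            # lottery too attractive: try a smaller p
        else:
            lo = p
    return (lo + hi) / 2      # the indifference point: A's utility

hidden_utility = 0.7          # simulated subject (an assumption)
subject = lambda p: p > hidden_utility
print(round(elicit(subject), 4))  # 0.7
```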
Money
▪ Money does not behave as a utility function, but we can talk about the utility of having money (or being in debt)
▪ Given a lottery L = [p, $X; (1-p), $Y]
  ▪ The expected monetary value EMV(L) is p*X + (1-p)*Y
  ▪ U(L) = p*U($X) + (1-p)*U($Y)
  ▪ Typically, U(L) < U(EMV(L))
  ▪ In this sense, people are risk-averse
  ▪ When deep in debt, people are risk-prone
Example: Insurance
▪ Consider the lottery [0.5, $1000; 0.5, $0]
  ▪ What is its expected monetary value? ($500)
  ▪ What is its certainty equivalent?
    ▪ Monetary value acceptable in lieu of lottery
    ▪ $400 for most people
  ▪ Difference of $100 is the insurance premium
    ▪ There's an insurance industry because people will pay to reduce their risk
    ▪ If everyone were risk-neutral, no insurance needed!
▪ It's win-win: you'd rather have the $400 and the insurance company would rather have the lottery (their utility curve is flat and they have many lotteries)
Example: Human Rationality?
▪ Famous example of Allais (1953)
  ▪ A: [0.8, $4k; 0.2, $0]
  ▪ B: [1.0, $3k; 0.0, $0]
  ▪ C: [0.2, $4k; 0.8, $0]
  ▪ D: [0.25, $3k; 0.75, $0]
▪ Most people prefer B > A, C > D
▪ But if U($0) = 0, then
  ▪ B > A ⇒ U($3k) > 0.8 U($4k)
  ▪ C > D ⇒ 0.2 U($4k) > 0.25 U($3k) ⇒ 0.8 U($4k) > U($3k) (multiplying both sides by 4; linear transforms are OK)
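The inconsistency can be checked mechanically: with U($0) = 0, B ≻ A requires U($3k) > 0.8·U($4k), while C ≻ D requires 0.25·U($3k) < 0.2·U($4k), i.e. U($3k) < 0.8·U($4k). A brute-force scan over candidate utilities confirms no assignment satisfies both:

```python
# Search positive utility pairs (U3, U4) = (U($3k), U($4k)) for one
# consistent with the popular preferences B > A and C > D. None exists.
consistent = [(u3, u4)
              for u3 in range(1, 101)
              for u4 in range(1, 101)
              if u3 > 0.8 * u4            # B > A
              and 0.25 * u3 < 0.2 * u4]   # C > D
print(len(consistent))  # 0
```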
Next Time: MDPs!