7/28/2019 chapter06-Game playing-Russel.pdf
Game playing
Chapter 6
Outline
• Games
• Perfect play
    - minimax decisions
    - α-β pruning
• Resource limits and approximate evaluation
• Games of chance
• Games of imperfect information
Games vs. search problems
Unpredictable opponent ⇒ solution is a strategy specifying a move for every possible opponent reply
Time limits ⇒ unlikely to find goal, must approximate
Plan of attack:
• Computer considers possible lines of play (Babbage, 1846)
• Algorithm for perfect play (Zermelo, 1912; Von Neumann, 1944)
• Finite horizon, approximate evaluation (Zuse, 1945; Wiener, 1948; Shannon, 1950)
• First chess program (Turing, 1951)
• Machine learning to improve evaluation accuracy (Samuel, 1952-57)
• Pruning to allow deeper search (McCarthy, 1956)
Types of games
                        deterministic                   chance
perfect information     chess, checkers, go, othello    backgammon, monopoly
imperfect information   battleships, blind tictactoe    bridge, poker, scrabble, nuclear war
Game tree (2-player, deterministic, turns)
[Figure: partial game tree for tic-tac-toe. Levels alternate between MAX (X) and MIN (O) from the empty board down to TERMINAL states, each labelled with a utility of -1, 0, or +1.]
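Minimax
Perfect play for deterministic, perfect-information games
Idea: choose move to position with highest minimax value = best achievable payoff against best play
E.g., 2-ply game:
[Figure: MAX root with actions A1, A2, A3, each leading to a MIN node. Leaf utilities: A11 = 3, A12 = 12, A13 = 8; A21 = 2, A22 = 4, A23 = 6; A31 = 14, A32 = 5, A33 = 2. MIN node values are 3, 2, 2, so the root value is 3.]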
Minimax algorithm
function Minimax-Decision(state) returns an action
   inputs: state, current state in game
   return the a in Actions(state) maximizing Min-Value(Result(a, state))

function Max-Value(state) returns a utility value
   if Terminal-Test(state) then return Utility(state)
   v ← −∞
   for a, s in Successors(state) do v ← Max(v, Min-Value(s))
   return v

function Min-Value(state) returns a utility value
   if Terminal-Test(state) then return Utility(state)
   v ← +∞
   for a, s in Successors(state) do v ← Min(v, Max-Value(s))
   return v
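The pseudocode can be sketched in Python on the 2-ply example, using its leaf values (3, 12, 8; 2, 4, 6; 14, 5, 2). The nested-list tree encoding and the function names are assumptions made for this illustration, not part of the slides.

```python
def minimax(node, maximizing=True):
    """Minimax value of a game tree given as nested lists.

    A number is a terminal utility (from MAX's point of view);
    a list holds the successor nodes, with the players alternating by level.
    """
    if not isinstance(node, list):          # plays the role of Terminal-Test / Utility
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def minimax_decision(tree):
    """Index of the root action whose MIN-node value is highest."""
    return max(range(len(tree)), key=lambda i: minimax(tree[i], maximizing=False))


# Root actions A1, A2, A3, each followed by three possible MIN replies.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree))            # minimax value of the root
print(minimax_decision(tree))   # index 0 corresponds to A1
```

The three MIN nodes evaluate to 3, 2 and 2, so MAX picks A1 and the root is worth 3.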
Properties of minimax
Complete?? Yes, if tree is finite (chess has specific rules for this). NB a finite strategy can exist even in an infinite tree!
Optimal?? Yes, against an optimal opponent. Otherwise??
Time complexity?? $O(b^m)$
Space complexity?? $O(bm)$ (depth-first exploration)
For chess, b ≈ 35, m ≈ 100 for reasonable games
⇒ exact solution completely infeasible
But do we need to explore every path?
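To get a feel for why an exact solution is infeasible, the short calculation below evaluates the node count for the chess figures quoted above (b = 35, m = 100). It is only a back-of-the-envelope sketch; the variable names are illustrative.

```python
import math

b, m = 35, 100                 # branching factor and game length quoted for chess
nodes = b ** m                 # exact value: Python integers have arbitrary precision
digits = len(str(nodes))       # how many decimal digits the node count has
exponent = round(m * math.log10(b), 1)   # the same magnitude as a power of ten

print(digits)
print(exponent)
```

The count has 155 decimal digits, i.e. it is on the order of 10^154.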
α-β pruning example
[Figure, built up over five slides, on the same 2-ply tree as the minimax example:
 1. The first MIN node sees leaves 3, 12, 8 and returns 3, so the MAX root is at least 3.
 2. The second MIN node sees leaf 2, so its value is at most 2; its two remaining leaves are pruned (marked X X).
 3. The third MIN node sees leaf 14, so its value is at most 14.
 4. It then sees leaf 5, so its value is at most 5.
 5. It finally sees leaf 2 and returns 2; the root value is 3.]
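Why is it called α-β?
[Figure: a path through alternating MAX and MIN levels, ending at a MIN node with value V.]
α is the best value (to max) found so far off the current path
If V is worse than α, max will avoid it ⇒ prune that branch
Define β similarly for min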
The α-β algorithm
function Alpha-Beta-Decision(state) returns an action
   return the a in Actions(state) maximizing Min-Value(Result(a, state), −∞, +∞)

function Max-Value(state, α, β) returns a utility value
   inputs: state, current state in game
           α, the value of the best alternative for max along the path to state
           β, the value of the best alternative for min along the path to state
   if Terminal-Test(state) then return Utility(state)
   v ← −∞
   for a, s in Successors(state) do
      v ← Max(v, Min-Value(s, α, β))
      if v ≥ β then return v
      α ← Max(α, v)
   return v

function Min-Value(state, α, β) returns a utility value
   same as Max-Value but with roles of α, β reversed
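A minimal Python sketch of the same idea, run on the 2-ply example tree (leaves 3, 12, 8; 2, 4, 6; 14, 5, 2). The nested-list encoding, the single recursive function with a `maximizing` flag, and the `visited` list for recording evaluated leaves are assumptions made for illustration.

```python
import math


def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True, visited=None):
    """Minimax value of `node`, skipping branches that cannot change the result.

    A node is either a number (terminal utility) or a list of child nodes.
    `visited`, if given, collects the leaves that were actually evaluated.
    """
    if not isinstance(node, list):              # terminal test
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False, visited))
            if v >= beta:                       # MIN above would never allow this
                return v
            alpha = max(alpha, v)
        return v
    v = math.inf
    for child in node:
        v = min(v, alphabeta(child, alpha, beta, True, visited))
        if v <= alpha:                          # MAX above already has something better
            return v
        beta = min(beta, v)
    return v


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
seen = []
value = alphabeta(tree, visited=seen)
print(value)   # same value as plain minimax
print(seen)    # leaves 4 and 6 never appear: they were pruned
```

Seven of the nine leaves are evaluated. The second MIN node is abandoned after its first leaf (2), because MAX already has a move worth 3.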
Properties of α-β
Pruning does not affect final result
Good move ordering improves effectiveness of pruning
With perfect ordering, time complexity = $O(b^{m/2})$
⇒ doubles solvable depth
A simple example of the value of reasoning about which computations are relevant (a form of metareasoning)
Unfortunately, $35^{50}$ is still impossible!
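The "doubles solvable depth" claim follows directly from the exponent. The short derivation below spells it out, using b = 35 for chess as elsewhere in these slides; the symbol m' (the depth reachable with α-β on the same node budget) is introduced here for illustration.

```latex
% With perfect ordering, alpha-beta examines on the order of b^{m/2} nodes:
\[
  b^{m/2} = \left(\sqrt{b}\right)^{m},
  \qquad \sqrt{35} \approx 5.9 ,
\]
% so the effective branching factor drops from b to about sqrt(b).
% On a fixed budget of b^m nodes, alpha-beta reaches depth m' where
\[
  b^{m'/2} = b^{m} \quad\Longrightarrow\quad m' = 2m .
\]
% A full chess game (m = 100) still needs
\[
  35^{50} = 10^{\,50 \log_{10} 35} \approx 10^{77} \text{ nodes.}
\]
```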
Resource limits
Standard approach:
• Use Cutoff-Test instead of Terminal-Test
    e.g., depth limit (perhaps add quiescence search)
• Use Eval instead of Utility
    i.e., evaluation function that estimates desirability of position
Suppose we have 100 seconds, explore $10^4$ nodes/second
⇒ $10^6$ nodes per move ≈ $35^{8/2}$
⇒ α-β reaches depth 8 ⇒ pretty good chess program
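The depth estimate can be checked with a few lines of arithmetic. The sketch below assumes the figures from the slide (100 seconds, 10^4 nodes per second, b = 35) and the idealized node counts b^d for plain minimax and b^(d/2) for α-β with perfect ordering; the variable names are illustrative.

```python
import math

seconds, nodes_per_second, b = 100, 10**4, 35

budget = seconds * nodes_per_second          # nodes that can be examined per move
minimax_depth = math.log(budget, b)          # solve b**d = budget for d
alphabeta_depth = 2 * math.log(budget, b)    # solve b**(d/2) = budget for d

print(budget)
print(round(minimax_depth, 1), round(alphabeta_depth, 1))
```

The budget of 10^6 nodes corresponds to a depth just under 4 for plain minimax and just under 8 for α-β, which is where the "depth 8" figure comes from.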
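Evaluation functions
[Figure: two chess positions. Left: Black to move, White slightly better. Right: White to move, Black winning.]
For chess, typically linear weighted sum of features
$\mathrm{Eval}(s) = w_1 f_1(s) + w_2 f_2(s) + \dots + w_n f_n(s)$
e.g., $w_1 = 9$ with
$f_1(s)$ = (number of white queens) − (number of black queens), etc.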
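A minimal sketch of such a weighted sum, restricted to material features. Only the queen weight of 9 comes from the slide; the other weights are the conventional material values, and the string-based position encoding is a hypothetical representation chosen to keep the example short.

```python
# Only the queen weight (9) is from the slide; the rest are conventional
# material values, assumed here for illustration.
WEIGHTS = {"Q": 9, "R": 5, "B": 3, "N": 3, "P": 1}


def features(position):
    """f_i(s): white count minus black count for each piece type.

    `position` is a hypothetical encoding: a string of piece letters,
    uppercase for White and lowercase for Black.
    """
    return {p: position.count(p) - position.count(p.lower()) for p in WEIGHTS}


def eval_position(position):
    """Eval(s) = sum over i of w_i * f_i(s), from White's point of view."""
    f = features(position)
    return sum(WEIGHTS[p] * f[p] for p in WEIGHTS)


# White: queen and two pawns. Black: rook and three pawns.
# Kings are omitted because both sides always have exactly one.
score = eval_position("QPP" + "rppp")
print(score)
```

Here White is up 9 for the queen, down 5 for the rook and down 1 pawn, for a total of +3.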
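Digression: Exact values don't matter
[Figure: two MAX-over-MIN trees with the same shape. Left: leaves (1, 2) and (2, 4), MIN values 1 and 2, root value 2. Right: leaves (1, 20) and (20, 400), MIN values 1 and 20, root value 20. MAX chooses the same move in both.]
Behaviour is preserved under any monotonic transformation of Eval
Only the order matters:
payoff in deterministic games acts as an ordinal utility function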
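This can be checked on a small tree with leaves (1, 2) and (2, 4). The sketch below applies two order-preserving transformations to the leaves, one a lookup table (1 → 1, 2 → 20, 4 → 400) and one a cube, and confirms that the chosen move does not change. The tree encoding and helper names are assumptions for illustration.

```python
def minimax(node, maximizing=True):
    """Minimax value of a nested-list tree whose leaves are numbers."""
    if not isinstance(node, list):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def best_action(tree):
    """Index of the root move with the highest MIN-node value."""
    return max(range(len(tree)), key=lambda i: minimax(tree[i], False))


def transform(node, f):
    """Apply f to every leaf, keeping the shape of the tree."""
    if isinstance(node, list):
        return [transform(child, f) for child in node]
    return f(node)


tree = [[1, 2], [2, 4]]
table = {1: 1, 2: 20, 4: 400}                      # monotonic but far from linear
rescaled = transform(tree, lambda x: table[x])
cubed = transform(tree, lambda x: x ** 3)

print(best_action(tree), best_action(rescaled), best_action(cubed))
print(minimax(tree), minimax(rescaled), minimax(cubed))
```

The root values differ (2, 20 and 8), but the same move, index 1, is chosen each time.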
Deterministic games in practice
Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. Used an endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 443,748,401,247 positions.
Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue searches 200 million positions per second, uses very sophisticated evaluation, and undisclosed methods for extending some lines of search up to 40 ply.
Othello: human champions refuse to compete against computers, who are too good.
Go: human champions refuse to compete against computers, who are too bad. In go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves.
Nondeterministic games: backgammon
[Figure: backgammon board with points numbered 1 to 12 along one side and 24 down to 13 along the other, plus positions 0 and 25.]
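Nondeterministic games in general
In nondeterministic games, chance introduced by dice, card-shuffling
Simplified example with coin-flipping:
[Figure: MAX root over two CHANCE nodes, each with two branches of probability 0.5 leading to MIN nodes. MIN nodes have leaves (2, 4), (7, 4), (6, 0), (5, −2) and values 2, 4, 0, −2. CHANCE node values are 3 and −1, so the MAX value is 3.]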
Algorithm for nondeterministic games
Expectiminimax gives perfect play
Just like Minimax, except we must also handle chance nodes:
   ...
   if state is a Max node then
      return the highest ExpectiMinimax-Value of Successors(state)
   if state is a Min node then
      return the lowest ExpectiMinimax-Value of Successors(state)
   if state is a chance node then
      return average of ExpectiMinimax-Value of Successors(state)
   ...
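A minimal Python sketch of the three cases, where the "average" at a chance node is weighted by each outcome's probability. The tuple-based tree encoding, the `coin` helper, and the leaf values in the example tree are assumptions made for illustration.

```python
def expectiminimax(node):
    """Value of a game tree containing MAX, MIN and chance nodes.

    A number is a terminal utility. Any other node is a pair (kind, children)
    with kind in {"max", "min", "chance"}; the children of a chance node are
    (probability, node) pairs.
    """
    if not isinstance(node, tuple):
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(child) for child in children)
    if kind == "min":
        return min(expectiminimax(child) for child in children)
    if kind == "chance":
        return sum(p * expectiminimax(child) for p, child in children)
    raise ValueError(f"unknown node kind: {kind!r}")


def coin(heads, tails):
    """Chance node for a fair coin flip (hypothetical helper)."""
    return ("chance", [(0.5, heads), (0.5, tails)])


# MAX chooses between two coin flips; each outcome leads to a MIN node.
tree = ("max", [
    coin(("min", [2, 4]), ("min", [7, 4])),
    coin(("min", [6, 0]), ("min", [5, -2])),
])
value = expectiminimax(tree)
print(value)
```

The MIN nodes are worth 2, 4, 0 and −2, the two chance nodes 3 and −1, so MAX takes the first move for an expected value of 3.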
Nondeterministic games in practice
Dice rolls increase b: 21 possible rolls with 2 dice
Backgammon ≈ 20 legal moves (can be 6,000 with 1-1 roll)
depth 4 = $20 \times (21 \times 20)^3 \approx 1.5 \times 10^9$
As depth increases, probability of reaching a given node shrinks
⇒ value of lookahead is diminished
α-β pruning is much less effective
TD-Gammon uses depth-2 search + very good Eval
≈ world-champion level
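Digression: Exact values DO matter
[Figure: two trees with the same shape: a MAX root over two DICE (chance) nodes, each with branches of probability .9 and .1 leading to MIN nodes.
 Left: leaves (2, 2), (3, 3), (1, 1), (4, 4); MIN values 2, 3, 1, 4; chance values 2.1 and 1.3; MAX prefers the left move.
 Right: leaves (20, 20), (30, 30), (1, 1), (400, 400); MIN values 20, 30, 1, 400; chance values 21 and 40.9; MAX prefers the right move.]
Behaviour is preserved only by positive linear transformation of Eval
Hence Eval should be proportional to the expected payoff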
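The effect is easy to reproduce. The sketch below takes MIN-node values (2, 3) and (1, 4) under dice nodes with probabilities 0.9 and 0.1, then compares an order-preserving but nonlinear rescaling (1 → 1, 2 → 20, 3 → 30, 4 → 400) with a positive linear one (10x + 5, chosen arbitrarily). The function and variable names are assumptions for illustration.

```python
def expected_values(min_values, probs=(0.9, 0.1)):
    """Expected value of each MAX move: one dice node per move."""
    return [sum(p * v for p, v in zip(probs, values)) for values in min_values]


def best_action(min_values):
    """Index of the move with the highest expected value."""
    expected = expected_values(min_values)
    return expected.index(max(expected))


original = [(2, 3), (1, 4)]                       # MIN values under each dice node

table = {1: 1, 2: 20, 3: 30, 4: 400}              # keeps the order, not the spacing
nonlinear = [tuple(table[v] for v in values) for values in original]
linear = [tuple(10 * v + 5 for v in values) for values in original]

print(best_action(original), expected_values(original))
print(best_action(nonlinear), expected_values(nonlinear))
print(best_action(linear), expected_values(linear))
```

The original and the linearly rescaled values both select move 0, while the nonlinear rescaling flips the choice to move 1, even though the ordering of the leaf values is unchanged.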
Games of imperfect information
E.g., card games, where opponent's initial cards are unknown
Typically we can calculate a probability for each possible deal
Seems just like having one big dice roll at the beginning of the game
Idea: compute the minimax value of each action in each deal, then choose the action with highest expected value over all deals
Special case: if an action is optimal for all deals, it's optimal.
GIB, current best bridge program, approximates this idea by
1) generating 100 deals consistent with bidding information
2) picking the action that wins most tricks on average
Example
Four-card bridge/whist/hearts hand, Max to play first
[Figure: the four-card hands and the resulting lines of play; the card layout is not recoverable from the extracted text.]
Commonsense example
Road A leads to a small heap of gold pieces
Road B leads to a fork:
   take the left fork and you'll find a mound of jewels;
   take the right fork and you'll be run over by a bus.

Road A leads to a small heap of gold pieces
Road B leads to a fork:
   take the left fork and you'll be run over by a bus;
   take the right fork and you'll find a mound of jewels.

Road A leads to a small heap of gold pieces
Road B leads to a fork:
   guess correctly and you'll find a mound of jewels;
   guess incorrectly and you'll be run over by a bus.
Proper analysis
Intuition that the value of an action is the average of its values in all actual states is WRONG
With partial observability, value of an action depends on the information state or belief state the agent is in
Can generate and search a tree of information states
Leads to rational behaviors such as
• Acting to obtain information
• Signalling to one's partner
• Acting randomly to minimize information disclosure
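The road example makes the failure concrete. In the sketch below the payoffs (gold = 1, jewels = 10, bus = −100) and the equal probabilities of the two worlds are assumptions chosen for illustration. "Averaging over worlds" scores Road B as if the fork could be chosen with knowledge of the world, whereas a real agent must commit to a fork in the same information state in both worlds.

```python
GOLD, JEWELS, BUS = 1, 10, -100          # payoffs assumed for illustration
WORLDS = {"left": 0.5, "right": 0.5}     # which fork hides the jewels, equally likely
PLANS = ["A", "B-left", "B-right"]       # complete plans the agent can commit to


def payoff(plan, jewels_side):
    """Outcome of following `plan` when the jewels are on `jewels_side`."""
    if plan == "A":
        return GOLD
    chosen_fork = plan.split("-")[1]
    return JEWELS if chosen_fork == jewels_side else BUS


def averaged_over_worlds(first_move):
    """Flawed estimate: solve each world with full knowledge, then average."""
    plans = [p for p in PLANS if p.startswith(first_move)]
    return sum(prob * max(payoff(p, side) for p in plans)
               for side, prob in WORLDS.items())


def true_expected(plan):
    """Correct estimate: the plan is fixed before the world is known."""
    return sum(prob * payoff(plan, side) for side, prob in WORLDS.items())


flawed = {move: averaged_over_worlds(move) for move in ("A", "B")}
correct = {plan: true_expected(plan) for plan in PLANS}
print(flawed)    # Road B looks better than Road A
print(correct)   # every plan through Road B is far worse than Road A
```

Averaging over worlds rates Road B at 10 against 1 for Road A, but any actual plan through Road B has an expected payoff of −45, so the agent should take Road A unless it can first find out which fork is safe.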
Summary
Games are fun to work on! (and dangerous)
They illustrate several important points about AI
• perfection is unattainable ⇒ must approximate
• good idea to think about what to think about
• uncertainty constrains the assignment of values to states
• optimal decisions depend on information state, not real state
Games are to AI as grand prix racing is to automobile design