7/28/2019 chapter06-Game playing-Russel.pdf

Game playing

Chapter 6


Outline

Games

Perfect play
   minimax decisions
   α-β pruning

Resource limits and approximate evaluation

Games of chance

Games of imperfect information


Games vs. search problems

Unpredictable opponent ⇒ solution is a strategy specifying a move for every possible opponent reply

Time limits ⇒ unlikely to find goal, must approximate

Plan of attack:

Computer considers possible lines of play (Babbage, 1846)

Algorithm for perfect play (Zermelo, 1912; Von Neumann, 1944)

Finite horizon, approximate evaluation (Zuse, 1945; Wiener, 1948; Shannon, 1950)

First chess program (Turing, 1951)

Machine learning to improve evaluation accuracy (Samuel, 1952-57)

Pruning to allow deeper search (McCarthy, 1956)


Types of games

                        deterministic                    chance

perfect information     chess, checkers, go, othello     backgammon, monopoly

imperfect information   battleships, blind tic-tac-toe   bridge, poker, scrabble, nuclear war


Game tree (2-player, deterministic, turns)

[Figure: partial tic-tac-toe game tree. Levels alternate between MAX (X) and MIN (O) moves, from the empty board down to TERMINAL positions with Utility -1, 0, or +1.]


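Minimax

Perfect play for deterministic, perfect-information games

Idea: choose move to position with highest minimax value = best achievable payoff against best play

E.g., 2-ply game:

[Figure: 2-ply game tree. The MAX root has moves A1, A2, A3 leading to MIN nodes; their leaves are 3, 12, 8 (A11, A12, A13), 2, 4, 6 (A21, A22, A23), and 14, 5, 2 (A31, A32, A33). The MIN values are 3, 2, 2 and the root value is 3.]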


Minimax algorithm

function Minimax-Decision(state) returns an action
   inputs: state, current state in game
   return the a in Actions(state) maximizing Min-Value(Result(a, state))

function Max-Value(state) returns a utility value
   if Terminal-Test(state) then return Utility(state)
   v ← -∞
   for a, s in Successors(state) do v ← Max(v, Min-Value(s))
   return v

function Min-Value(state) returns a utility value
   if Terminal-Test(state) then return Utility(state)
   v ← ∞
   for a, s in Successors(state) do v ← Min(v, Max-Value(s))
   return v
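The pseudocode above can be sketched in Python roughly as follows. The nested-dict encoding of the tree, the names `TREE`, `is_terminal`, and `minimax_decision`, and the assignment of leaf values to moves are assumptions made for this illustration; the leaf values follow the 2-ply example.

```python
import math

# Hypothetical encoding (not from the slides): an internal node is a dict
# mapping a move name to its subtree; a leaf is a number (utility for MAX).
TREE = {
    "A1": {"A11": 3, "A12": 12, "A13": 8},
    "A2": {"A21": 2, "A22": 4, "A23": 6},
    "A3": {"A31": 14, "A32": 5, "A33": 2},
}

def is_terminal(state):
    return not isinstance(state, dict)

def max_value(state):
    if is_terminal(state):
        return state
    v = -math.inf
    for child in state.values():
        v = max(v, min_value(child))
    return v

def min_value(state):
    if is_terminal(state):
        return state
    v = math.inf
    for child in state.values():
        v = min(v, max_value(child))
    return v

def minimax_decision(state):
    # Pick the move whose resulting position has the highest minimax value.
    return max(state, key=lambda a: min_value(state[a]))

print(minimax_decision(TREE), max_value(TREE))  # A1 3
```

The three MIN nodes evaluate to 3, 2, and 2, so MAX chooses A1.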


Properties of minimax

Complete?? Yes, if the tree is finite (chess has specific rules for this). NB a finite strategy can exist even in an infinite tree!

Optimal?? Yes, against an optimal opponent. Otherwise??

Time complexity?? $O(b^m)$

Space complexity?? $O(bm)$ (depth-first exploration)

For chess, $b \approx 35$, $m \approx 100$ for reasonable games ⇒ exact solution completely infeasible

But do we need to explore every path?


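α-β pruning example

[Figure, built up over five slides: the 2-ply tree from the minimax example, searched left to right. The first MIN node evaluates leaves 3, 12, 8 and returns 3, so the MAX root is worth at least 3. The second MIN node sees the leaf 2, so it is worth at most 2, and its two remaining leaves (marked X) are pruned. The third MIN node sees 14, then 5, then 2, so its bound drops from 14 to 5 to 2. The root value is 3.]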


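Why is it called α-β?

[Figure: a path from the root through alternating MAX and MIN levels down to a MIN node with value V.]

α is the best value (to max) found so far off the current path

If V is worse than α, max will avoid it ⇒ prune that branch

Define β similarly for min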


The α-β algorithm

function Alpha-Beta-Decision(state) returns an action
   return the a in Actions(state) maximizing Min-Value(Result(a, state))

function Max-Value(state, α, β) returns a utility value
   inputs: state, current state in game
           α, the value of the best alternative for max along the path to state
           β, the value of the best alternative for min along the path to state
   if Terminal-Test(state) then return Utility(state)
   v ← -∞
   for a, s in Successors(state) do
      v ← Max(v, Min-Value(s, α, β))
      if v ≥ β then return v
      α ← Max(α, v)
   return v

function Min-Value(state, α, β) returns a utility value
   same as Max-Value but with roles of α, β reversed
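A minimal Python sketch of the same idea, under some assumptions: the tree is encoded as nested lists (a hypothetical representation), MAX and MIN are folded into one function with a `maximizing` flag, and a `visited` list is added only so the pruning can be observed. The leaf values are those of the 2-ply example.

```python
import math

# Hypothetical encoding: nested lists are internal nodes, numbers are leaves.
TREE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]

def alphabeta(node, alpha, beta, maximizing, visited):
    """Return the minimax value of node; record evaluated leaves in visited."""
    if not isinstance(node, list):
        visited.append(node)
        return node
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False, visited))
            if v >= beta:
                return v  # MIN higher up would never allow this branch
            alpha = max(alpha, v)
        return v
    v = math.inf
    for child in node:
        v = min(v, alphabeta(child, alpha, beta, True, visited))
        if v <= alpha:
            return v  # MAX higher up already has something at least as good
        beta = min(beta, v)
    return v

visited = []
value = alphabeta(TREE, -math.inf, math.inf, True, visited)
print(value, visited)  # 3 [3, 12, 8, 2, 14, 5, 2]
```

Leaves 4 and 6 are never evaluated. If the third branch were ordered 2, 5, 14 instead, it would be cut off after its first leaf, which is the move-ordering effect in miniature.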


Properties of α-β

Pruning does not affect final result

Good move ordering improves effectiveness of pruning

With perfect ordering, time complexity = $O(b^{m/2})$ ⇒ doubles solvable depth

A simple example of the value of reasoning about which computations are relevant (a form of metareasoning)

Unfortunately, $35^{50}$ is still impossible!
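To get a feel for the magnitudes, the following short calculation compares $b^m$ with $b^{m/2}$ using the chess figures quoted earlier in the deck (b of about 35, m of about 100). It is only an order-of-magnitude illustration; the variable names are chosen for this sketch.

```python
import math

b, m = 35, 100  # approximate chess branching factor and game length

plain = m * math.log10(b)         # log10 of b^m (plain minimax)
pruned = (m / 2) * math.log10(b)  # log10 of b^(m/2) (perfect ordering)

print(f"b^m     ~ 10^{plain:.0f}")   # b^m     ~ 10^154
print(f"b^(m/2) ~ 10^{pruned:.0f}")  # b^(m/2) ~ 10^77
```

Halving the exponent is a very large saving, but about $10^{77}$ nodes is still far beyond any realistic search budget.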


Resource limits

Standard approach:

Use Cutoff-Test instead of Terminal-Test
   e.g., depth limit (perhaps add quiescence search)

Use Eval instead of Utility
   i.e., evaluation function that estimates desirability of position

Suppose we have 100 seconds, explore $10^4$ nodes/second
⇒ $10^6$ nodes per move ≈ $35^{8/2}$
⇒ α-β reaches depth 8 ⇒ pretty good chess program
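The sketch below first checks the budget arithmetic and then shows one possible shape of a depth-limited search. The stone-taking game, the heuristic in `eval_fn`, and all function names are hypothetical stand-ins chosen for this example; they are not taken from the slides. True terminal states still use the exact utility, and Eval is applied only at the depth cutoff.

```python
# Budget arithmetic from the slide.
nodes_per_move = 100 * 10**4           # 100 s at 10^4 nodes/s
print(nodes_per_move, 35 ** (8 // 2))  # 1000000 1500625

# Toy game (hypothetical): players alternately take 1-3 stones; whoever
# takes the last stone wins. state = (stones, max_to_move).
def successors(state):
    stones, max_to_move = state
    return [(stones - k, not max_to_move) for k in (1, 2, 3) if k <= stones]

def terminal_test(state):
    return state[0] == 0

def utility(state):
    # At a terminal state the player NOT to move took the last stone and won.
    return -1 if state[1] else 1

def eval_fn(state):
    # Hypothetical heuristic: a multiple of 4 is bad for the player to move.
    stones, max_to_move = state
    mover_score = -0.5 if stones % 4 == 0 else 0.5
    return mover_score if max_to_move else -mover_score

def cutoff_test(state, depth, limit):
    return terminal_test(state) or depth >= limit

def h_minimax(state, depth, limit):
    if cutoff_test(state, depth, limit):
        return utility(state) if terminal_test(state) else eval_fn(state)
    values = [h_minimax(s, depth + 1, limit) for s in successors(state)]
    return max(values) if state[1] else min(values)

print(h_minimax((5, True), 0, 2))   # 0.5 (estimate at depth limit 2)
print(h_minimax((5, True), 0, 10))  # 1   (exact value, full search)
```

With a depth limit of 2 the search returns an estimate (0.5) whose sign agrees with the exact value (1) found by searching to the end of the game.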


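Evaluation functions

[Figure: two chess positions, captioned "Black to move, White slightly better" and "White to move, Black winning".]

For chess, typically linear weighted sum of features

$\mathrm{Eval}(s) = w_1 f_1(s) + w_2 f_2(s) + \ldots + w_n f_n(s)$

e.g., $w_1 = 9$ with

$f_1(s) = (\text{number of white queens}) - (\text{number of black queens})$, etc.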
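A small sketch of such a weighted sum, restricted to material features. Only the queen weight of 9 comes from the slide; the other weights are the conventional material values and, like the dictionary encoding of a position and the name `material_eval`, are assumptions made for this illustration.

```python
# Assumed weights: only "queen": 9 is given in the slide.
WEIGHTS = {"queen": 9, "rook": 5, "bishop": 3, "knight": 3, "pawn": 1}

def material_eval(white, black, weights=WEIGHTS):
    """Eval(s) = sum_i w_i * f_i(s), with f_i = white count - black count."""
    return sum(w * (white.get(p, 0) - black.get(p, 0))
               for p, w in weights.items())

# Hypothetical position: piece counts per side.
white = {"queen": 1, "rook": 2, "pawn": 6}
black = {"queen": 0, "rook": 2, "bishop": 1, "pawn": 7}

# 9*(1-0) + 5*(2-2) + 3*(0-1) + 3*(0-0) + 1*(6-7) = 5
print(material_eval(white, black))  # 5
```

A positive score favours White. Swapping the two arguments negates the score, as expected for a difference-of-counts feature set.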


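Digression: Exact values don't matter

[Figure: two 2-ply trees with a MAX root over two MIN nodes. Left: leaves 1, 2 and 2, 4 give MIN values 1 and 2 and root value 2. Right: the same tree with leaves 1, 20 and 20, 400 gives MIN values 1 and 20 and root value 20.]

Behaviour is preserved under any monotonic transformation of Eval

Only the order matters:

payoff in deterministic games acts as an ordinal utility function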
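The claim can be checked on a small example. The leaf values below (1, 2, 2, 4 and their relabelling to 1, 20, 20, 400) are read from the figure residue and should be treated as an assumption, as should the helper name `minimax_choice`.

```python
def minimax_choice(tree):
    """tree: one list of leaf values per MIN node.
    Returns (index of MAX's best move, its minimax value)."""
    mins = [min(leaves) for leaves in tree]
    best = max(range(len(tree)), key=lambda i: mins[i])
    return best, mins[best]

original = [[1, 2], [2, 4]]
relabel = {1: 1, 2: 20, 4: 400}  # order-preserving relabelling
transformed = [[relabel[x] for x in leaves] for leaves in original]

print(minimax_choice(original))     # (1, 2)
print(minimax_choice(transformed))  # (1, 20)
```

The value at the root changes, but the chosen move (index 1) does not, because min and max depend only on the ordering of the leaf values.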


Deterministic games in practice

Checkers: Chinook ended 40-year reign of human world champion Marion Tinsley in 1994. Used an endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 443,748,401,247 positions.

Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue searches 200 million positions per second, uses very sophisticated evaluation, and undisclosed methods for extending some lines of search up to 40 ply.

Othello: human champions refuse to compete against computers, who are too good.

Go: human champions refuse to compete against computers, who are too bad. In go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves.


Nondeterministic games: backgammon

[Figure: backgammon board with points numbered 1 to 24, plus positions 0 and 25.]


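Nondeterministic games in general

In nondeterministic games, chance introduced by dice, card-shuffling

Simplified example with coin-flipping:

[Figure: MAX root over two CHANCE nodes with values 3 and -1. Each chance node has two outcomes of probability 0.5 leading to MIN nodes with values 2, 4 and 0, -2 respectively; the leaves are 2, 4, 7, 4, 6, 0, 5, -2.]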


Algorithm for nondeterministic games

Expectiminimax gives perfect play

Just like Minimax, except we must also handle chance nodes:

...
if state is a Max node then
   return the highest ExpectiMinimax-Value of Successors(state)
if state is a Min node then
   return the lowest ExpectiMinimax-Value of Successors(state)
if state is a chance node then
   return average of ExpectiMinimax-Value of Successors(state)
...
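A minimal Python sketch of the three cases, assuming a hypothetical tuple encoding of nodes and taking "average" to mean the probability-weighted average over outcomes. The tree is the coin-flipping example, with leaf values as read from that figure.

```python
def expectiminimax(node):
    """node is a number (leaf) or a tuple (kind, children).
    For 'chance' nodes, children are (probability, child) pairs."""
    if not isinstance(node, tuple):
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    if kind == "chance":
        return sum(p * expectiminimax(c) for p, c in children)
    raise ValueError(f"unknown node kind: {kind}")

# Hypothetical encoding of the coin-flipping example.
LEFT = ("chance", [(0.5, ("min", [2, 4])), (0.5, ("min", [7, 4]))])
RIGHT = ("chance", [(0.5, ("min", [6, 0])), (0.5, ("min", [5, -2]))])
TREE = ("max", [LEFT, RIGHT])

print(expectiminimax(LEFT), expectiminimax(RIGHT))  # 3.0 -1.0
print(expectiminimax(TREE))                         # 3.0
```

The left chance node averages MIN values 2 and 4, the right one averages 0 and -2, so MAX prefers the left move.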


Nondeterministic games in practice

Dice rolls increase b: 21 possible rolls with 2 dice

Backgammon ≈ 20 legal moves (can be 6,000 with 1-1 roll)

depth 4 = $20 \times (21 \times 20)^3 \approx 1.5 \times 10^9$

As depth increases, probability of reaching a given node shrinks ⇒ value of lookahead is diminished

α-β pruning is much less effective

TDGammon uses depth-2 search + very good Eval ≈ world-champion level


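Digression: Exact values DO matter

[Figure: two trees with a MAX root, two DICE (chance) nodes whose outcomes have probabilities .9 and .1, and MIN nodes below. Left: MIN values 2, 3 and 1, 4 give chance values 2.1 and 1.3, so the first move is best. Right: after an order-preserving change to 20, 30 and 1, 400, the chance values are 21 and 40.9, so the second move is best.]

Behaviour is preserved only by positive linear transformation of Eval

Hence Eval should be proportional to the expected payoff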
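The effect can be reproduced with a few lines of Python. The probabilities and values are those read from the figure; the list encoding and the name `best_move` are assumptions for this sketch.

```python
def best_move(chance_nodes):
    """chance_nodes: one list of (probability, value) pairs per MAX move.
    Returns (index of best move, list of expected values)."""
    expected = [sum(p * v for p, v in outcomes) for outcomes in chance_nodes]
    best = max(range(len(expected)), key=lambda i: expected[i])
    return best, expected

left = [[(0.9, 2), (0.1, 3)], [(0.9, 1), (0.1, 4)]]
right = [[(0.9, 20), (0.1, 30)], [(0.9, 1), (0.1, 400)]]

# A positive linear transformation (here 10*v + 5) of the left tree.
linear = [[(p, 10 * v + 5) for p, v in outcomes] for outcomes in left]

print(best_move(left)[0], best_move(right)[0], best_move(linear)[0])  # 0 1 0
```

The order-preserving but nonlinear relabelling flips the decision from the first move to the second, while the positive linear transformation leaves it unchanged.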


Games of imperfect information

E.g., card games, where opponent's initial cards are unknown

Typically we can calculate a probability for each possible deal

Seems just like having one big dice roll at the beginning of the game

Idea: compute the minimax value of each action in each deal, then choose the action with highest expected value over all deals*

Special case: if an action is optimal for all deals, it's optimal.

GIB, current best bridge program, approximates this idea by
   1) generating 100 deals consistent with bidding information
   2) picking the action that wins most tricks on average
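The averaging idea can be sketched as follows. This is not GIB's implementation: the deal labels, action names, trick counts, and the `value` callback standing in for a per-deal minimax solver are all hypothetical. Note that the deck itself later marks this averaging intuition as not a proper analysis.

```python
from collections import defaultdict

def choose_action(deals, actions, value):
    """deals: list of (probability, deal) pairs.
    value(action, deal): minimax value of action if deal were known."""
    expected = defaultdict(float)
    for p, deal in deals:
        for a in actions:
            expected[a] += p * value(a, deal)
    return max(actions, key=lambda a: expected[a]), dict(expected)

# Hypothetical numbers: tricks won by each action in each of three deals.
TRICKS = {
    ("lead_spade", "d1"): 3, ("lead_spade", "d2"): 1, ("lead_spade", "d3"): 2,
    ("lead_heart", "d1"): 2, ("lead_heart", "d2"): 3, ("lead_heart", "d3"): 3,
}
deals = [(0.5, "d1"), (0.25, "d2"), (0.25, "d3")]

best, expected = choose_action(
    deals, ["lead_spade", "lead_heart"], lambda a, d: TRICKS[(a, d)]
)
print(best, expected)  # lead_heart {'lead_spade': 2.25, 'lead_heart': 2.5}
```

In a sampling approach such as the one described for GIB, `deals` would be a set of generated deals with equal weights rather than an exact distribution.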


Example

Four-card bridge/whist/hearts hand, Max to play first

[Figure: card-play diagram for the four-card hand. The individual cards and trick counts are not recoverable from the extraction.]


Commonsense example

Road A leads to a small heap of gold pieces
Road B leads to a fork:
   take the left fork and you'll find a mound of jewels;
   take the right fork and you'll be run over by a bus.

Road A leads to a small heap of gold pieces
Road B leads to a fork:
   take the left fork and you'll be run over by a bus;
   take the right fork and you'll find a mound of jewels.

Road A leads to a small heap of gold pieces
Road B leads to a fork:
   guess correctly and you'll find a mound of jewels;
   guess incorrectly and you'll be run over by a bus.


Proper analysis

* Intuition that the value of an action is the average of its values in all actual states is WRONG

With partial observability, value of an action depends on the information state or belief state the agent is in

Can generate and search a tree of information states

Leads to rational behaviors such as
   Acting to obtain information
   Signalling to one's partner
   Acting randomly to minimize information disclosure
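The road example makes the gap concrete. In the sketch below the payoffs (gold 1, jewels 10, bus -100) and the 50/50 prior over which fork holds the jewels are assumptions chosen for illustration; the slides give no numbers.

```python
GOLD, JEWELS, BUS = 1, 10, -100  # hypothetical payoffs
WORLDS = {"jewels_left": 0.5, "jewels_right": 0.5}  # assumed prior
FORKS = ("left", "right")

def fork_payoff(world, fork):
    good = "left" if world == "jewels_left" else "right"
    return JEWELS if fork == good else BUS

def value_b_averaging_over_worlds():
    # Solve each world as if it were fully observable, then average.
    return sum(p * max(fork_payoff(w, f) for f in FORKS)
               for w, p in WORLDS.items())

def value_b_belief_state():
    # One fork must be chosen for the whole belief state.
    return max(sum(p * fork_payoff(w, f) for w, p in WORLDS.items())
               for f in FORKS)

print(value_b_averaging_over_worlds())  # 10.0
print(value_b_belief_state())           # -45.0
```

Averaging over worlds rates road B at 10, above the gold on road A, because it assumes the right fork will be known when it matters. Evaluated from the agent's actual belief state, road B is worth -45 under these numbers, so road A is the better choice.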


Summary

Games are fun to work on! (and dangerous)

They illustrate several important points about AI
   perfection is unattainable ⇒ must approximate
   good idea to think about what to think about
   uncertainty constrains the assignment of values to states
   optimal decisions depend on information state, not real state

Games are to AI as grand prix racing is to automobile design
