Game Playing ECE457 Applied Artificial Intelligence Spring 2007 Lecture #5.

ECE457 Applied Artificial Intelligence R. Khoury (2007) Page 2

Outline
- Types of games
- Playing a perfect game
  - Minimax search
  - Alpha-beta pruning
- Playing an imperfect game
  - Real-time
  - Imperfect information
  - Chance

Russell & Norvig, chapter 6

Game Problems
Games are well-defined search problems…
- Well-defined board configurations (states)
- Limited set of well-defined moves (actions)
- Well-defined victory conditions (goal)
- Values assigned to pieces, moves, outcomes (cost)

…that are hard to solve by searching
- A search tree for chess has an average branching factor of about 35
- An average chess game lasts about 50 moves per player, i.e. 100 ply (a ply is one move by one player)
- The full search tree therefore has about 35^100 (roughly 10^154) nodes!
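A quick check of the tree-size arithmetic above, written in Python (chosen for the examples in these notes; the slides contain no code). Python integers have arbitrary precision, so 35^100 can be computed exactly and its decimal digits counted.

```python
# A uniform tree with branching factor b and depth m (in ply) has b**m nodes
# on its bottom level alone.
b, m = 35, 100  # chess figures from the slide: b ~ 35, 50 moves per player = 100 ply
leaves = b ** m

# The number of decimal digits gives the order of magnitude of the tree size.
digits = len(str(leaves))
print(f"35^100 has {digits} digits, i.e. roughly 10^{digits - 1} nodes")
```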

Game Problems
The opponent
- Wants to win and make our agent lose
- We have no control over his actions
  - He prevents us from reaching the optimal solution
  - He introduces uncertainty into the search: we don't know what moves he will make
- We will assume "perfect play" behaviour

Types of Games

Game type                Deterministic           Chance
Perfect information      Chess, Checkers, Go     Backgammon, Monopoly
Imperfect information    Stratego, Battleship    Bridge, Poker, Scrabble

Game-Playing Strategy
- Our agent and the opponent play sequentially
- We assume the opponent plays perfectly
- Our agent cannot get to the optimal goal: the opponent won't allow it
- Our agent must find the best achievable goal

Minimax Algorithm
- A payoff (utility) function assigns a value to each leaf node in the tree
- The value then propagates up to the non-leaf nodes
- Two players
  - MAX wants to maximise the payoff
  - MIN wants to minimise the payoff
  - MAX is the player currently looking for a move (i.e. at the root of the tree)
- Payoff function
  - Simple: 1 = win / 0 = draw / -1 = lose
  - Complex for different victory conditions
  - Win/lose is always measured for MAX

Minimax Algorithm

[Figure: part of the game tree for tic-tac-toe, with X and O moving alternately]

Minimax Algorithm

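[Figure: a two-ply minimax tree. The MAX root has three MIN children whose leaves have payoffs (3, 18, 5), (1, 15, 42) and (56, -12, -5); the MIN nodes take the values 3, 1 and -12, and the root takes the value 3.]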

Minimax Algorithm
Game of Nim
- Initial state: 7 matches in a pile
- Each player must divide a pile into two non-empty, unequal piles
- The player who can't do that loses
- Payoff: +1 win, -1 loss

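Minimax Algorithm

[Figure: the complete game tree for 7-match Nim, with levels alternating MAX, MIN, MAX, MIN, MAX, MIN from the root 7. Level 1: 6-1, 5-2, 4-3. Level 2: 5-1-1, 4-2-1, 3-2-2, 3-3-1. Level 3: 4-1-1-1, 3-2-1-1, 2-2-2-1. Level 4: 3-1-1-1-1, 2-2-1-1-1. Level 5: 2-1-1-1-1-1. Terminal positions are scored +1 (MAX wins) or -1 (MAX loses), and these values are backed up to the root, whose value is -1 (MAX loses).]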

The value of each node is the value of the best leaf the current player (MAX or MIN) can reach.
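The tree above can be generated and evaluated mechanically. A minimal Python sketch, assuming a state is represented as a sorted tuple of pile sizes (a representation chosen for this example, not taken from the slides); `minimax_nim` returns +1 if MAX wins with perfect play by both sides and -1 otherwise.

```python
from functools import lru_cache

def moves(state):
    """All states reachable by splitting one pile into two non-empty unequal piles."""
    result = set()
    for i, pile in enumerate(state):
        rest = state[:i] + state[i + 1:]
        # small < pile - small guarantees two unequal, non-empty piles.
        for small in range(1, (pile + 1) // 2):
            result.add(tuple(sorted(rest + (small, pile - small))))
    return result

@lru_cache(maxsize=None)
def minimax_nim(state, max_to_move):
    """Payoff for MAX (+1 win, -1 loss) under perfect play by both sides."""
    children = moves(state)
    if not children:
        # The player to move cannot split any pile and therefore loses.
        return -1 if max_to_move else +1
    values = [minimax_nim(child, not max_to_move) for child in children]
    return max(values) if max_to_move else min(values)

print(minimax_nim((7,), True))  # -1
```

The printed root value agrees with the figure: whichever split MAX makes first, MIN can force a win.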

Minimax Algorithm
- Generate the entire game tree
- Compute the payoff of the leaf nodes
- For each non-leaf node, from the lowest in the tree to the root:
  - If MAX level, then assign the value of the child with maximum payoff
  - If MIN level, then assign the value of the child with minimum payoff
- At the root, select the action with maximum payoff
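The bottom-up procedure above is usually implemented as a depth-first recursion. A minimal Python sketch, assuming for illustration that a tree is written as nested lists whose leaves are payoffs for MAX; the example is the two-ply tree used on these slides.

```python
def minimax(node, max_to_move=True):
    """Minimax value of a game tree given as nested lists (leaves are payoffs for MAX)."""
    if not isinstance(node, list):  # leaf: return its payoff
        return node
    values = [minimax(child, not max_to_move) for child in node]
    return max(values) if max_to_move else min(values)

def best_move(root):
    """At the MAX root, return the index of the best child and all child values."""
    values = [minimax(child, max_to_move=False) for child in root]
    return max(range(len(root)), key=values.__getitem__), values

tree = [[3, 18, 5], [1, 15, 42], [56, -12, -5]]
print(minimax(tree))    # 3
print(best_move(tree))  # (0, [3, 1, -12])
```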

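Minimax Algorithm
- Complete, if the tree is finite
- Optimal against a perfect opponent
- Time complexity = O(b^m)
- Space complexity = O(bm), since the tree is explored depth-first
- Here b is the branching factor and m the maximum depth of the tree
- But remember, b and m can be huge: for chess, b ≈ 35 and m ≈ 100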

Alpha-Beta Pruning
- MAX takes the max of its children
- Each of those children (a MIN node) takes the min of its own children
  max(min(3,18,5), min(1,15,42), min(56,-12,-5)) = max(3, 1, -12) = 3
- We don't need to compute the values of all the grandchildren! For each MIN node, we only need to continue until we find a value lower than the best value MAX already has (here 3)
  max(min(3,18,5), min(1,?,?), min(56,-12,?))

Alpha-Beta Pruning
- Maintain values α and β
  - α is the maximum value that MAX is assured of at any point in the search
  - β is the minimum value that MIN is assured of at any point in the search
  - Both are computed using the payoffs propagated through the tree
- Start with α = -∞ and β = +∞
- As the search goes on, the range of possible values [α, β] narrows
- When α ≥ β, the current path is not the result of best play by both players, so there is no need to explore it further

Alpha-Beta Pruning

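[Figure: alpha-beta search of the same two-ply tree, with each node annotated with its [α, β] interval as the search proceeds, starting from [-∞, +∞] at the root. Leaves 3, 18, 5, 1, 56 and -12 are evaluated; leaves 15, 42 and -5 are pruned (marked X). The MIN nodes take the values 3, 1 and -12, and the root value is 3.]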

Alpha-Beta Pruning
Called as "rootvalue = Evaluate(root, -∞, +∞)"

Evaluate(node, α, β)
  If node is a leaf
    Return payoff
  If node is MAX
    v = -∞
    For each child of node
      v = max(v, Evaluate(child, α, β))
      Break if v ≥ β
      α = max(α, v)
    Return v
  If node is MIN
    v = +∞
    For each child of node
      v = min(v, Evaluate(child, α, β))
      Break if v ≤ α
      β = min(β, v)
    Return v
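A direct Python translation of the pseudocode above, assuming for illustration that the tree is written as nested lists whose leaves are payoffs for MAX. The `evaluated` list is an addition for this sketch: it records which leaves were actually visited, so the pruning can be checked against the figure.

```python
import math

def evaluate(node, alpha, beta, max_to_move, evaluated):
    """Alpha-beta search; returns the minimax value of `node`."""
    if not isinstance(node, list):      # leaf: record it and return its payoff
        evaluated.append(node)
        return node
    if max_to_move:
        v = -math.inf
        for child in node:
            v = max(v, evaluate(child, alpha, beta, False, evaluated))
            if v >= beta:               # MIN above will never allow this line: prune
                break
            alpha = max(alpha, v)
        return v
    v = math.inf
    for child in node:
        v = min(v, evaluate(child, alpha, beta, True, evaluated))
        if v <= alpha:                  # MAX above already has something better: prune
            break
        beta = min(beta, v)
    return v

tree = [[3, 18, 5], [1, 15, 42], [56, -12, -5]]
evaluated = []
root_value = evaluate(tree, -math.inf, math.inf, True, evaluated)
print(root_value, evaluated)  # 3 [3, 18, 5, 1, 56, -12]
```

The three leaves that never appear in `evaluated` (15, 42 and -5) are the ones crossed out in the figure.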

Alpha-Beta Pruning
- Efficiency depends on the ordering of children
  - Will check each of MAX's children until finding one with a value of at least β
  - Will check each of MIN's children until finding one with a value of at most α
- Use heuristics to order the nodes to check
  - Check the highest-value children first for MAX
  - Check the lowest-value children first for MIN
- For a search depth d:
  - Good ordering can reduce time complexity to O(b^(d/2))
  - Random ordering gives roughly O(b^(3d/4))
  - Minimax is O(b^d)
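The effect of ordering is visible even on the small example tree. This Python sketch, assuming a tree written as nested lists whose leaves are payoffs for MAX, counts leaf evaluations when the same tree is searched with its children in a good order and in the worst order. Ordering by the true values, as done here, is only possible because the tree is tiny; a real program would rely on a cheap ordering heuristic, which is not shown.

```python
import math

def alphabeta(node, alpha, beta, max_to_move, counter):
    """Alpha-beta search on nested lists; counter[0] counts leaf evaluations."""
    if not isinstance(node, list):
        counter[0] += 1
        return node
    v = -math.inf if max_to_move else math.inf
    for child in node:
        score = alphabeta(child, alpha, beta, not max_to_move, counter)
        if max_to_move:
            v = max(v, score)
            if v >= beta:
                break
            alpha = max(alpha, v)
        else:
            v = min(v, score)
            if v <= alpha:
                break
            beta = min(beta, v)
    return v

def leaves_evaluated(tree):
    """Return (root value, number of leaves evaluated) for a MAX root."""
    counter = [0]
    value = alphabeta(tree, -math.inf, math.inf, True, counter)
    return value, counter[0]

# Same leaves, two orderings.
good = [[3, 5, 18], [1, 15, 42], [-12, -5, 56]]  # highest MIN node first, lowest leaf first
bad = [[56, -5, -12], [42, 15, 1], [18, 5, 3]]   # lowest MIN node first, highest leaf first
print(leaves_evaluated(good))  # (3, 5)
print(leaves_evaluated(bad))   # (3, 9)
```

Both orderings return the same root value; only the amount of work changes, from 5 leaves to all 9.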

Imperfect Play
- Real-time or time constraints
- Chance
- Hidden information

Real-Time Games
- Sometimes we can't search the entire tree
  - Real-time games
  - Time constraints (playing against a clock)
  - Tree too big (e.g. chess)

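Real-Time Games
- Evaluation function
  - Estimates the value of a non-leaf node in the tree
  - Lets us cut off the search at a given level
- Chess: count the value of pieces, available moves, board configurations, …

[Figure: two tic-tac-toe positions compared by an evaluation function, one rated lower (<) than the other]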
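As an illustration of the simplest chess term listed above (the value of pieces), here is a material-count evaluation in Python. The representation is an assumption made for this sketch: the board is reduced to a string of piece letters, uppercase for MAX and lowercase for the opponent (as in FEN notation), and the weights are the conventional 1/3/3/5/9 piece values. A practical evaluation function would add the other terms mentioned on the slide.

```python
# Conventional material values (pawn, knight, bishop, rook, queen); the king
# is never captured, so it carries no material value here.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}

def material_eval(pieces: str) -> int:
    """Estimated value for MAX: MAX's material minus the opponent's.

    `pieces` lists the pieces on the board, uppercase for MAX and lowercase
    for MIN (a simplified, hypothetical representation for this sketch).
    """
    score = 0
    for piece in pieces:
        value = PIECE_VALUES[piece.lower()]
        score += value if piece.isupper() else -value
    return score

start = "RNBQKBNR" + "P" * 8 + "rnbqkbnr" + "p" * 8
print(material_eval(start))                        # 0: equal material
print(material_eval(start.replace("q", "", 1)))    # 9: MIN has lost its queen
```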

Real-Time Minimax Algorithm
- Generate the game tree down to the maximum number of ply
- Evaluate the lowest nodes with the evaluation function
- For each non-leaf node, from the lowest in the tree to the root:
  - If MAX level, then assign the value of the child with maximum payoff
  - If MIN level, then assign the value of the child with minimum payoff
- At the root, select the action with maximum payoff
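A minimal Python sketch of the cutoff version, under an assumed representation: each node is a pair (estimate, children), where `estimate` stands in for the evaluation function and an empty child list marks a terminal position whose estimate is its true payoff. The estimates 4, 9 and 2 below are made up for the example. Compared with full minimax, the only change is the `depth == 0` test.

```python
def minimax_cutoff(node, depth, max_to_move=True):
    """Minimax that stops `depth` ply below `node` and uses the evaluation there."""
    estimate, children = node
    if depth == 0 or not children:
        return estimate          # evaluation at the cutoff, or true payoff at a leaf
    values = [minimax_cutoff(child, depth - 1, not max_to_move) for child in children]
    return max(values) if max_to_move else min(values)

def leaf(payoff):
    """Terminal node: its estimate is its true payoff."""
    return (payoff, [])

# Hypothetical estimates on the MIN nodes; the leaves are the example payoffs.
tree = (0, [
    (4, [leaf(3), leaf(18), leaf(5)]),
    (9, [leaf(1), leaf(15), leaf(42)]),
    (2, [leaf(56), leaf(-12), leaf(-5)]),
])
print(minimax_cutoff(tree, depth=1))  # 9: relies on the estimates one ply down
print(minimax_cutoff(tree, depth=2))  # 3: searches all the way to the leaves
```

With a one-ply cutoff the optimistic estimate for the second move wins; one more ply of search corrects it, which is why the quality of the evaluation function matters.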

Real-Time Alpha-Beta Pruning
Called as "rootvalue = Evaluate(root, -∞, +∞)"

Evaluate(node, α, β)
  If node is at the lowest level
    Return evaluation
  If node is MAX
    v = -∞
    For each child of node
      v = max(v, Evaluate(child, α, β))
      Break if v ≥ β
      α = max(α, v)
    Return v
  If node is MIN
    v = +∞
    For each child of node
      v = min(v, Evaluate(child, α, β))
      Break if v ≤ α
      β = min(β, v)
    Return v
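The same depth cutoff applied to the alpha-beta pseudocode above, as a Python sketch. It assumes nodes are (estimate, children) pairs, with the estimate standing in for the evaluation function (hypothetical values 4, 9 and 2 on the MIN nodes); the remaining depth is passed down and decremented at each level.

```python
import math

def evaluate(node, depth, alpha, beta, max_to_move):
    """Alpha-beta with a depth cutoff; returns the backed-up evaluation."""
    estimate, children = node
    if depth == 0 or not children:        # lowest level: use the evaluation
        return estimate
    if max_to_move:
        v = -math.inf
        for child in children:
            v = max(v, evaluate(child, depth - 1, alpha, beta, False))
            if v >= beta:
                break
            alpha = max(alpha, v)
        return v
    v = math.inf
    for child in children:
        v = min(v, evaluate(child, depth - 1, alpha, beta, True))
        if v <= alpha:
            break
        beta = min(beta, v)
    return v

def leaf(payoff):
    return (payoff, [])

tree = (0, [(4, [leaf(3), leaf(18), leaf(5)]),
            (9, [leaf(1), leaf(15), leaf(42)]),
            (2, [leaf(56), leaf(-12), leaf(-5)])])
for depth in (1, 2):
    print(depth, evaluate(tree, depth, -math.inf, math.inf, True))  # 1 9, then 2 3
```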

Real-Time Games: Problems
- Non-quiescent positions
  - Some board configurations cause the value to change wildly
  - Solved with quiescence search: expand non-quiescent boards deeper, until you reach stable "quiescent" boards
- Horizon effect
  - A "singular" move is considerably better than all others
  - But a damaging, unavoidable move is (or can be pushed) just beyond the search depth limit (the "horizon")
  - Solved with singular extension: expand the singular state deeper
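A minimal sketch of quiescence search in Python. The node representation is an assumption for this example: (estimate, quiet, children), where `quiet` is whatever stability test the program uses. At the depth limit, non-quiescent nodes keep being expanded until quiet ones are reached. For simplicity the sketch expands every child of a noisy node, whereas chess programs usually extend only captures and similar forcing moves.

```python
def search(node, depth, max_to_move=True, quiescence=True):
    """Depth-limited minimax; with quiescence=True, noisy nodes at the horizon are expanded."""
    estimate, quiet, children = node
    if not children:
        return estimate                      # terminal position: true payoff
    if depth <= 0 and (quiet or not quiescence):
        return estimate                      # at the horizon: trust the evaluation
    # Either depth remains, or the position is non-quiescent: keep searching.
    values = [search(child, depth - 1, not max_to_move, quiescence) for child in children]
    return max(values) if max_to_move else min(values)

# Hypothetical tree of (estimate, quiet, children) nodes; MAX moves at the root.
tree = (0, True, [
    (2, True, [(2, True, [])]),          # move A: calm position worth 2
    (8, False, [(-5, True, []),          # move B: looks like +8 (say, a capture),
                (8, True, [])]),         #   but MIN's best reply is worth -5
])
print(search(tree, depth=1, quiescence=False))  # 8: fooled at the horizon
print(search(tree, depth=1))                    # 2: quiescence sees the reply
```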

Games of Chance
- Minimax requires planning for upcoming moves
- If moves depend on dice rolls, random draws, etc., planning is impossible
- We need to add all possible outcomes to the tree!

Recall

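[Figure: the two-ply minimax tree from earlier. MIN nodes with leaves (3, 18, 5), (1, 15, 42) and (56, -12, -5) take the values 3, 1 and -12, and the MAX root takes the value 3.]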

Expectiminimax

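[Figure: MAX root with three moves, each leading to a chance node for MIN's roll with three outcomes of probability 0.8, 0.15 and 0.05. The values after each outcome are 3, 16 and -7 for the first move, 1, 25 and -8 for the second, and -12, -25 and 58 for the third (in the order of the probabilities), giving chance-node values 4.45, 4.15 and -10.45. The root value is 4.45.]
- MAX has already rolled the dice and has three possible moves
- Then, MIN rolls the dice
- There are three possible outcomes to the roll
- MIN picks an action based on the roll result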
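Each chance-node value in the figure is a probability-weighted average of its outcomes. A minimal Python sketch that reproduces the numbers, assuming the values listed under each chance node are already the values of MIN's best reply to that roll. The second call changes one unlikely outcome to 800, the variation examined under Problems with Expectiminimax.

```python
def chance_value(outcomes):
    """Expected value of a chance node from (probability, value) pairs."""
    return sum(p * v for p, v in outcomes)

def best_move(moves):
    """MAX picks the move whose chance node has the highest expected value."""
    values = [chance_value(outcomes) for outcomes in moves]
    return max(range(len(values)), key=values.__getitem__), values

PROBS = (0.8, 0.15, 0.05)  # outcome probabilities of MIN's roll, as in the figure

def build(values_per_move):
    """Pair each move's outcome values with the roll probabilities."""
    return [list(zip(PROBS, vals)) for vals in values_per_move]

best, values = best_move(build([(3, 16, -7), (1, 25, -8), (-12, -25, 58)]))
print(best, [round(v, 2) for v in values])          # 0 [4.45, 4.15, -10.45]

# One unlikely outcome (probability 0.05) of the third move becomes 800.
best_800, values_800 = best_move(build([(3, 16, -7), (1, 25, -8), (-12, -25, 800)]))
print(best_800, [round(v, 2) for v in values_800])  # 2 [4.45, 4.15, 26.65]
```

Move indices are 0-based, so 0 is the first move and 2 the third.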

Problems with Expectiminimax

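[Figure: the same tree, except that the unlikely (probability 0.05) outcome of the third move is now worth 800 instead of 58. The third chance node's value rises to 26.65, above 4.45 and 4.15, so the root value becomes 26.65 and MAX now chooses the third move.]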

Problems with Expectiminimax
- Time complexity: O(b^m n^m)
  - n is the number of possible outcomes of a chance node
  - Recall: minimax is O(b^m)
- Trees can grow very large very quickly
- Minimax & pruning limit the search to likely sequences of actions given perfect play
- With randomness, there is no likely sequence of actions

Imperfect Information
- The algorithms so far require knowing everything about the game
- In some games, information about the opponent is hidden
  - Cards in poker, pieces in Stratego, etc.
- We could approximate hidden information as random events
  - The probability that the opponent has a flush, the probability that a piece is a bomb, etc.
- Then use expectiminimax to get the best action

Imperfect Information
- List all possible outcomes, then average the best action over all of them
- This can lead to irrational behaviour! Possible cases:
  - Road 1 leads to money, road 2-a leads to gold, road 2-b leads to death (rational action is road 2, then a)
  - Road 1 leads to money, road 2-a leads to death, road 2-b leads to gold (rational action is road 2, then b)
  - Averaged over both cases, road 2 looks best
- But the real situation is: road 1 leads to money, road 2 leads to gold or death, and we don't know which branch is which (rational action is road 1)

[Figure: a fork between roads 1 and 2; road 2 then splits into branches a and b]
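The flaw can be made concrete with numbers. In this Python sketch the payoffs are hypothetical (money = 1, gold = 10, death = -100) and the two possible worlds are assumed equally likely. It compares averaging the best action over the possible worlds with evaluating road 2 as it really is, where a or b must be chosen at the fork without knowing which leads to gold.

```python
MONEY, GOLD, DEATH = 1, 10, -100          # hypothetical payoffs
worlds = [                                # road 2 outcomes (a, b) in each possible world
    {"a": GOLD, "b": DEATH},
    {"a": DEATH, "b": GOLD},
]
p = 1 / len(worlds)                       # assume both worlds are equally likely

# Averaging over worlds: in each world we act as if we knew it, then average.
avg_road2 = sum(p * max(w["a"], w["b"]) for w in worlds)

# Real situation: we must commit to a or b at the fork without knowing the world.
real_road2 = max(sum(p * w[branch] for w in worlds) for branch in ("a", "b"))

print(avg_road2, real_road2, MONEY)  # 10.0 -45.0 1
```

Averaging rates road 2 at 10, above road 1's 1, but the honest value of road 2 is -45, so road 1 is the rational choice.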

Imperfect Information
- It's a useful approximation, but it's not exact!
- Hidden information is not the same as random events
- Need to handle information
  - Gather information
  - Plan based on what information we will have at a given point in the future
- Leads to more rational behaviour
  - Acting to gain information
  - Acting to give information to partners
  - Acting to conceal information from the opponents
- We will learn to do that later in the course

IBM Deep Blue
- First chess computer to defeat a reigning world champion (Garry Kasparov) under normal chess tournament constraints, in 1997
- Relied on brute-force hardware search power
  - 30 processors for the search
  - 480 custom VLSI chess processors for move generation and ordering, and leaf node evaluation

IBM Deep Blue
- Searched a minimax tree
  - 100-200M states per second, maximum 330M
  - Average 6 to 16 ply, maximum 40 ply
  - Decided which moves are worth expanding, giving priority to singular extensions and chess threats
- Null-window alpha-beta pruning
  - Alpha-beta pruning with a minimal ("null") window of values [α, β] rather than the full [-∞, +∞] range
  - Faster and easier to implement in hardware
  - Approximate: can only return bounds on the minimax value
- Allows for a highly non-uniform, more selective and human-like search of the tree
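The bound-only behaviour of a null window can be illustrated with the alpha-beta procedure described earlier in this lecture, restated here in Python so the block stands alone (trees as nested lists whose leaves are payoffs for MAX, an assumption of this sketch). Assuming integer payoffs, searching with the window (x - 1, x) answers a yes/no question, "is the minimax value at least x?". This illustrates the general idea only; it is not a description of Deep Blue's hardware search.

```python
import math

def alphabeta(node, alpha, beta, max_to_move=True):
    """Fail-soft alpha-beta on nested-list trees whose leaves are payoffs for MAX."""
    if not isinstance(node, list):
        return node
    if max_to_move:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False))
            if v >= beta:
                break
            alpha = max(alpha, v)
        return v
    v = math.inf
    for child in node:
        v = min(v, alphabeta(child, alpha, beta, True))
        if v <= alpha:
            break
        beta = min(beta, v)
    return v

def value_at_least(tree, x):
    """Null-window test: with integer payoffs, is the minimax value >= x?"""
    # The open window (x - 1, x) contains no integer, so the search either fails
    # low (result <= x - 1, true value <= x - 1) or fails high (result >= x,
    # true value >= x): it yields a bound, not the exact value.
    return alphabeta(tree, x - 1, x) >= x

tree = [[3, 18, 5], [1, 15, 42], [56, -12, -5]]
print(value_at_least(tree, 3), value_at_least(tree, 4))  # True False
```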

IBM Deep Blue
- Two board evaluation heuristics
  - Fast evaluation to get a quick approximate value
    - Considers piece position value
  - Slow evaluation to get a more exact value
    - Considers 8,000 features
    - Includes common chess concepts and specific Kasparov strategies
    - Features have programmable weights, learned automatically from 700,000 grandmaster games and fine-tuned manually by a chess grandmaster

Assumptions
- Utility-based agent
- Environment
  - Fully observable
  - Deterministic
  - Sequential
  - Static
  - Discrete / Continuous
  - Single agent

Assumptions Updated
- Utility-based agent
- Environment
  - Fully observable / Partially observable (approximation)
  - Deterministic / Strategic / Stochastic
  - Sequential
  - Static / Semi-dynamic
  - Discrete / Continuous
  - Single agent / Multi-agent