
double AlphaBeta(state, depth, alpha, beta)
begin
    if depth <= 0 then return evaluation(state)    // from the point of view of the player to move
    for each action "a" possible from state
        nextstate = performAction(a, state)
        rval = -AlphaBeta(nextstate, depth-1, -beta, -alpha);
        if (rval >= beta) return rval;
        if (rval > alpha) alpha = rval;
    endfor
    return alpha;
end
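The pseudocode above can be sketched as a runnable Python function. This is a minimal negamax-style version, assuming a toy state representation (a nested list is an internal node, a bare number is a leaf value from the point of view of the player to move) in place of the slide's performAction and evaluation helpers:

```python
import math

def evaluation(state):
    # Toy evaluation: leaves store their value directly,
    # from the point of view of the player to move there.
    return state

def alpha_beta(state, depth, alpha, beta):
    # Depth cut-off or leaf: evaluate statically.
    if depth <= 0 or not isinstance(state, list):
        return evaluation(state)
    for nextstate in state:  # children stand in for performAction(a, state)
        # Negamax recursion: the opponent's best score, with the
        # (alpha, beta) window negated and swapped.
        rval = -alpha_beta(nextstate, depth - 1, -beta, -alpha)
        if rval >= beta:    # fail-high: the opponent will avoid this line
            return rval
        if rval > alpha:    # new best move found so far
            alpha = rval
    return alpha

# Classic three-way example tree: MIN values are 3, 2, 2, so MAX gets 3.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alpha_beta(tree, 2, -math.inf, math.inf))  # -> 3
```

Note the fail-soft behaviour: on a cutoff the function returns `rval` rather than `beta`, exactly as in the pseudocode.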

Meta-Reasoning for Search
• problem: with only one legal move (or one clear favourite), alpha-beta search will still generate a (possibly large) search tree
• similar: symmetrical situations
• idea: compute the utility of expanding a node before expanding it
• meta-reasoning (reasoning about reasoning): reason about how to spend computing time

Where We Are: Chess

Technology | Search depth | Level of play
minimax, evaluation function, cut-off test with quiescence search, large transposition table, speed: 1 million nodes/sec | 5 ply | novice
as above, but with alpha-beta search | 10 ply | expert
as above, plus additional pruning, database of openings and end games, supercomputer | 14 ply | grand master

Deep Blue
• algorithm:
– iterative-deepening alpha-beta search, transposition table, databases incl. openings, grandmaster games (700,000), endgames (all with 5 pieces, many with 6)
• hardware:
– 30 IBM RS/6000 processors for the software search (at the high levels of the tree)
– 480 custom chess processors for the hardware search: searching deep in the tree, move generation and ordering, position evaluation (8,000 features)
• average performance:
– 126 million nodes/sec., 30 billion positions generated per move, search depth: 14 plies (but up to 40 plies)

Samuel’s Checkers Program (1952)
• learns an evaluation function by self-play (see: machine learning)
• beat its creator after several days of self-play
• hardware: IBM 704
– 10 kHz processor
– 10,000 words of memory
– magnetic tape for long-term storage

Chinook: Checkers World Champion
• simple alpha-beta search (running on PCs)
• database of 444 billion positions with eight or fewer pieces
• problem: Marion Tinsley
– world checkers champion for over 40 years
– lost only three games in all this time
• 1990: Tinsley vs. Chinook: 20.5–18.5
– Chinook won two games!
• 1994: Tinsley retires (for health reasons)

Backgammon
• TD-GAMMON
– searches only to depth 2 or 3
– evaluation function:
• machine learning techniques (see Samuel’s Checkers Program)
• neural network
– performance:
• ranked amongst the top three players in the world
• the program’s opinions have altered received wisdom

Go
• most popular board game in Asia
• 19×19 board: initial branching factor 361
– too much for search methods
• best programs: Goemate / Go4++
– pattern recognition techniques (rules)
– limited search (only locally)
• performance: 10 kyu (weak amateur)

A Dose of Reality: Chance
• unpredictability:
– in real life: normal; often caused by external events that are not predictable
– in games: introduced by adding a random element, e.g. throwing dice or shuffling cards
• games with an element of chance are less like “toy problems”

Example: Backgammon

a move:

• roll a pair of dice

• move the pieces according to the result

Search Trees with Chance Nodes
• problem:
– MAX knows its own legal moves
– MAX does not know MIN’s possible responses
• solution: introduce chance nodes
– between all MIN and MAX levels
– with n children if there are n possible outcomes of the random element, each labelled with:
• the result of the random element
• the probability of this outcome
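As a concrete illustration of “n children, each labelled with result and probability”, the chance-node branches for a single backgammon dice roll can be enumerated in a few lines of Python (the pair-plus-probability tuples are just an illustrative encoding):

```python
from itertools import combinations_with_replacement

# Each unordered pair (d1, d2) with d1 <= d2 is one distinct outcome:
# doubles occur with probability 1/36, non-doubles with 2/36 (two orderings).
outcomes = [((d1, d2), 1/36 if d1 == d2 else 2/36)
            for d1, d2 in combinations_with_replacement(range(1, 7), 2)]

print(len(outcomes))                # -> 21 distinct rolls (the n on this slide)
print(sum(p for _, p in outcomes))  # probabilities sum to 1
```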

Example: Search Tree for Backgammon

[Figure: a backgammon search tree alternating MAX, CHANCE, MIN, CHANCE layers; each chance node’s branches are labelled with a dice outcome and its probability, e.g. 1/36 for 1-1, 1/18 for 1-2, 1/18 for 5-6, 1/36 for 6-6]

Optimal Decisions for Games with Chance Elements

• aim: pick the move that leads to the best position

• idea: calculate the expected value over all possible outcomes of the random element

→ the expectiminimax value
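A minimal sketch of the expectiminimax computation, assuming a hypothetical tuple encoding of the tree ("max"/"min"/"chance" plus children) rather than any particular game:

```python
def expectiminimax(node):
    # Leaves carry a utility value for MAX directly.
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    if kind == "chance":
        # Expected value over all outcomes of the random element.
        return sum(p * expectiminimax(c) for p, c in children)
    raise ValueError(f"unknown node kind: {kind}")

# The two chance nodes of the Simple Tree example on the next slide:
left = ("chance", [(0.9, 2), (0.1, 3)])   # expected value 2.1
right = ("chance", [(0.9, 1), (0.1, 4)])  # expected value 1.3
root = ("max", [left, right])             # MAX picks the left move: 2.1
```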

Example: Simple Tree

[Figure: a MAX node over two chance nodes; the left chance node has leaves 2 (probability 0.9) and 3 (probability 0.1), the right has leaves 1 (probability 0.9) and 4 (probability 0.1). Expected values: 0.9 × 2 + 0.1 × 3 = 2.1 and 0.9 × 1 + 0.1 × 4 = 1.3, so MAX picks the left move, value 2.1]

Complexity of Expectiminimax
• time complexity: O(b^m × n^m)
– b: maximal number of possible moves
– n: number of possible outcomes for the random element
– m: maximal search depth
• example: backgammon
– average b is around 20 (but can be up to 4000 for doubles)
– n = 21
– about three ply of search depth is feasible
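Plugging the backgammon figures above into the O(b^m × n^m) bound shows why roughly three ply is the practical limit (b = 20 and n = 21 as on this slide):

```python
b, n = 20, 21   # average moves per position, distinct dice outcomes

for m in (2, 3, 4):
    # Each extra ply multiplies the tree size by b * n = 420.
    print(m, (b * n) ** m)
# already at m = 3 the tree has 420**3 = 74,088,000 nodes
```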