Game tree search

As ever, slides drawn from Andrew Moore's Machine Learning Tutorials, as well as from Fahiem Bacchus.

In this lecture we will cover some basics of two-player, zero-sum, discrete, finite, deterministic games of perfect information.

More on the meaning of zero-sum

We will focus on zero-sum games. Zero-sum games are fully competitive: if one player wins, the other player loses. More specifically, the amount of money I win (lose) at poker is the amount of money you lose (win). More general games can be cooperative: some outcomes are preferred by both of us, or at least our values aren't diametrically opposed. Is tic-tac-toe zero-sum?

Rock-paper-scissors (RPS): scissors cut paper, paper covers rock, rock smashes scissors. The game can be represented as a matrix in which Player I chooses a row and Player II chooses a column. Each cell gives the payoff to each player (Pl. I / Pl. II), where 1 is a win, 0 a tie and -1 a loss. The two payoffs in every cell sum to zero, so the game is zero-sum.

| Player I \ Player II | R    | P    | S    |
|----------------------|------|------|------|
| R                    | 0/0  | -1/1 | 1/-1 |
| P                    | 1/-1 | 0/0  | -1/1 |
| S                    | -1/1 | 1/-1 | 0/0  |

Is the prisoner's dilemma zero-sum? Two prisoners are held in separate cells, and the DA doesn't have enough evidence to convict them.
- If one confesses and the other doesn't, the confessor goes free and the other is sentenced to 4 years.
- If both confess (both defect), both are sentenced to 3 years.
- If neither confesses (both cooperate), both are sentenced to 1 year on a minor charge.

Payoff: 4 minus the sentence.

| Player I \ Player II | Coop | Def |
|----------------------|------|-----|
| Coop                 | 3/3  | 0/4 |
| Def                  | 4/0  | 1/1 |

With a search space defined for II-Nim, we can define a game tree. A game tree looks like a search tree, but:
- Layers reflect alternating moves between A and B.
- Player A doesn't decide where to go alone: after Player A moves to a state, B decides which of the state's children to move to.

Thus A must have a strategy: A must know what to do for each possible move of B. What to do will depend on how B plays.

Questions: What happens if there are loops in the tree? How would looping influence your determination of the minimax value for a node?
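The following sketch shows how game-theoretic (minimax) values can be computed bottom-up. It makes several assumptions:
- The tree is written as nested Python lists.
- A leaf is an int giving the payoff to A.
- A maximizes, B minimizes, and A moves at the root.

The helper names `minimax` and `rational_path` are hypothetical, and the example tree is made up. It is not the tree from the lecture figure.

```python
from typing import Union

# Hypothetical representation: a leaf is an int (the payoff to A),
# an internal node is a list of child subtrees.
Tree = Union[int, list]


def minimax(tree: Tree, a_to_move: bool) -> int:
    """Game-theoretic value of `tree` for A, who maximizes; B minimizes."""
    if isinstance(tree, int):  # terminal state: payoff is known
        return tree
    # Layers alternate, so each child is evaluated with the other player to move.
    child_values = [minimax(child, not a_to_move) for child in tree]
    return max(child_values) if a_to_move else min(child_values)


def rational_path(tree: Tree, a_to_move: bool = True) -> list[int]:
    """Indices of the children chosen when both players play rationally."""
    path = []
    while not isinstance(tree, int):
        values = [minimax(child, not a_to_move) for child in tree]
        best = max(values) if a_to_move else min(values)
        choice = values.index(best)  # ties go to the leftmost child
        path.append(choice)
        tree = tree[choice]
        a_to_move = not a_to_move
    return path


# Hypothetical tree: A moves at the root, B at depth 1, A at depth 2.
example = [[[3, 5], [2, 9]], [[0, 1], [7, 4]]]
print(minimax(example, a_to_move=True))  # 5
print(rational_path(example))            # [0, 0, 1]
```

`rational_path` recomputes subtree values at every step, which is wasteful but keeps the sketch short. Because `minimax` recurses until it reaches a leaf, it assumes a finite tree with no loops. On a state graph containing a cycle the recursion would never bottom out, which is one concrete way to see why the question about loops matters.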
Let's practice by computing all the game-theoretic values for nodes in this tree.

[Figure: example game tree whose layers alternate between A and B nodes]

Question: if both players play rationally, what path will be followed through this tree?

A note on complexity

Imagine you have a game with N states, where each state has b successors and the length of the game is usually D moves. Minimax will expand O(b^D) states, which is both the BEST and the WORST case. This is different from regular DFS, which can stop as soon as it reaches a goal; minimax has to visit every node to back up the values. Tricks exist, however, to reduce this complexity to O(N) using a tool called dynamic programming. Many move sequences lead to the same state, so each state's value need only be computed once.

Alpha-beta pruning, from Russell & Norvig

    function max-value(state, game, alpha, beta) returns minimax value of state
      inputs: state (current game state)
              game (game description)
              alpha (best score for MAX on path, i.e., highest)
              beta (best score for MIN on path, i.e., lowest)
      if state is terminal, return eval(state)
      else for every successor s of state:
          alpha = max(alpha, min-value(s, game, alpha, beta))
          if (alpha >= beta) return beta   % we're done here: MIN will never go this way, so cut!
      end
      return alpha

    function min-value(state, game, alpha, beta) returns minimax value of state
      if state is terminal, return eval(state)
      else for every successor s of state:
          beta = min(beta, max-value(s, game, alpha, beta))
          if (beta <= alpha) return alpha  % MAX will never go this way, so cut!
      end
      return beta
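The following is a runnable sketch of the max-value/min-value pseudocode. It makes several assumptions:
- It uses a hypothetical nested-list tree representation in which leaves are ints giving the payoff to MAX.
- The `game` argument is dropped, because the tree itself describes the game.
- The module-level `leaves_seen` counter is hypothetical instrumentation, added only to show how many terminal states are evaluated.

```python
import math
from typing import Union

# Hypothetical representation: leaves are payoffs for MAX,
# internal nodes are lists of children.
Tree = Union[int, list]

leaves_seen = 0  # hypothetical counter: terminal states evaluated


def max_value(state: Tree, alpha: float, beta: float) -> float:
    global leaves_seen
    if isinstance(state, int):  # terminal: return eval(state)
        leaves_seen += 1
        return state
    for s in state:
        alpha = max(alpha, min_value(s, alpha, beta))
        if alpha >= beta:  # MIN will never let play reach here: cut
            return beta
    return alpha


def min_value(state: Tree, alpha: float, beta: float) -> float:
    global leaves_seen
    if isinstance(state, int):  # terminal: return eval(state)
        leaves_seen += 1
        return state
    for s in state:
        beta = min(beta, max_value(s, alpha, beta))
        if beta <= alpha:  # MAX already has something better: cut
            return alpha
    return beta


def alpha_beta(root: Tree) -> float:
    """Minimax value of `root` with MAX to move, using alpha-beta."""
    return max_value(root, -math.inf, math.inf)


example = [[[3, 5], [2, 9]], [[0, 1], [7, 4]]]
print(alpha_beta(example), leaves_seen)  # 5 6
```

On this example the root value is 5, the same as plain minimax. Only 6 of the 8 leaves are evaluated, because the subtree `[7, 4]` is cut. A cut returns the bound (`beta` or `alpha`) rather than the node's exact value, so values reported for pruned interior nodes are only bounds. Starting from (-inf, +inf), the root value is still exact.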
