game_ai4.ppt

AI in game (IV) Oct. 11, 2006

Transcript of game_ai4.ppt

  • AI in game (IV), Oct. 11, 2006

  • So far: Artificial Intelligence: A Modern Approach, Stuart Russell and Peter Norvig, Prentice Hall, 2nd ed.

    Chapter 1: AI taxonomy; Chapter 2: agents; Chapter 3: uninformed search; Chapter 4: informed search

  • From now on: Artificial Intelligence: A Modern Approach, Chapter 4 (continued) and Chapter 6: adversarial search

    Network part; learning (maybe from the same textbook); game AI techniques

  • Outline: Ch 4. informed search (online search); Ch 6. adversarial search (optimal decisions, α-β pruning, imperfect real-time decisions)

  • Offline search vs. online search. Offline search agents compute a complete solution before setting foot in the real world.

    Online search agents interleave computation and action: e.g., take an action, then observe the environment, then compute the next action. This is necessary for exploration problems, where states and actions are unknown in advance, e.g., a robot in a new building, or a labyrinth.

  • Online search problems. Agents are assumed to know only: Actions(s), which returns a list of actions allowed in state s; the step cost c(s, a, s'), which cannot be used until the agent knows that s' is the outcome; and Goal-Test(s). The agent cannot access the successors of a state except by actually trying all the actions in that state. Assumptions: the agent can recognize a state that it has visited before, and actions are deterministic. Optionally, an admissible heuristic function is available.
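
    As an illustration (not from the slides), here is a minimal Python sketch of such an online problem interface. The class and method names (OnlineMazeProblem, actions, cost, goal_test) are assumptions, and this grid has no interior walls, unlike the maze used in the worked example below.

    class OnlineMazeProblem:
        """A plain 3x3 grid (no walls) where outcomes are learned only by acting."""

        def __init__(self, goal=(3, 3)):
            self.goal = goal

        def actions(self, s):
            """ACTIONS(s): moves allowed in state s (known in advance)."""
            x, y = s
            moves = []
            if x < 3: moves.append("RIGHT")
            if x > 1: moves.append("LEFT")
            if y < 3: moves.append("UP")
            if y > 1: moves.append("DOWN")
            return moves

        def cost(self, s, a, s_next):
            """c(s, a, s'): usable only after the agent has observed the outcome s'."""
            return 1

        def goal_test(self, s):
            return s == self.goal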

  • Online search problems: if some actions are irreversible, the agent may reach a dead end.

    If some goal state is reachable from every reachable state, the state space is safely explorable

  • Online search agents: an online algorithm can expand only a node that it physically occupies, whereas offline algorithms can expand any node in the fringe. Expanding the node the agent currently occupies is the same principle as DFS.

  • Online DFS

    function ONLINE-DFS-AGENT(s') returns an action
      inputs: s', a percept identifying the current state
      static: result, a table of the next state, indexed by action and state, initially empty
              unexplored, a table that lists, for each visited state, the actions not yet tried
              unbacktracked, a table that lists, for each visited state, the predecessor states to which the agent has not yet backtracked
              s, a, the previous state and action, initially null

      if GOAL-TEST(s') then return stop
      if s' is a new state then unexplored[s'] ← ACTIONS(s')
      if s is not null then do
          result[a, s] ← s'
          add s to the front of unbacktracked[s']
      if unexplored[s'] is empty then
          if unbacktracked[s'] is empty then return stop
          else a ← an action b such that result[b, s'] = POP(unbacktracked[s'])
      else a ← POP(unexplored[s'])
      s ← s'
      return a
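
    A rough Python rendering of this agent (a sketch of the pseudocode above, not code from the course), assuming the hypothetical OnlineMazeProblem-style interface sketched earlier (goal_test and actions):

    class OnlineDFSAgent:
        def __init__(self, problem):
            self.problem = problem
            self.result = {}         # result[(s, a)] -> observed next state s'
            self.unexplored = {}     # state -> actions not yet tried (used as a stack)
            self.unbacktracked = {}  # state -> predecessors not yet backtracked to
            self.s = None            # previous state
            self.a = None            # previous action

        def __call__(self, s_prime):
            """Take the current percept s' and return the next action (or 'stop')."""
            if self.problem.goal_test(s_prime):
                return "stop"
            if s_prime not in self.unexplored:               # s' is a new state
                self.unexplored[s_prime] = list(self.problem.actions(s_prime))
            if self.s is not None:
                self.result[(self.s, self.a)] = s_prime      # record the observed outcome
                self.unbacktracked.setdefault(s_prime, []).insert(0, self.s)
            if not self.unexplored[s_prime]:
                if not self.unbacktracked.get(s_prime):
                    return "stop"                            # dead end
                back_to = self.unbacktracked[s_prime].pop(0)
                # pick an action b whose recorded outcome from s' is the popped predecessor
                self.a = next(b for (st, b), nxt in self.result.items()
                              if st == s_prime and nxt == back_to)
            else:
                self.a = self.unexplored[s_prime].pop()
            self.s = s_prime
            return self.a

    To drive it, repeatedly call the agent with the current state, apply the returned action in the environment, and pass the newly observed state back in on the next call.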

  • Online DFS, example. Assume a maze problem on a 3x3 grid. s' = (1,1) is the initial state; result, unexplored (UX), and unbacktracked (UB) are empty; s and a are also null.

  • Online DFS, example. GOAL-TEST((1,1))? (1,1) ≠ G, thus false. (1,1) a new state? True, so ACTIONS((1,1)) → UX[(1,1)] = {RIGHT, UP}. s is null? True (initially). UX[(1,1)] empty? False, so a ← POP(UX[(1,1)]) = UP; s ← (1,1). Return a.

  • Online DFS, example. GOAL-TEST((1,2))? (1,2) ≠ G, thus false. (1,2) a new state? True, so ACTIONS((1,2)) → UX[(1,2)] = {DOWN}. s is null? False (s = (1,1)), so result[UP, (1,1)] ← (1,2) and UB[(1,2)] = {(1,1)}. UX[(1,2)] empty? False, so a = DOWN, s ← (1,2). Return a.

  • Online DFS, example. GOAL-TEST((1,1))? (1,1) ≠ G, thus false. (1,1) a new state? False. s is null? False (s = (1,2)), so result[DOWN, (1,2)] ← (1,1) and UB[(1,1)] = {(1,2)}. UX[(1,1)] empty? False, so a = RIGHT, s ← (1,1). Return a.

  • Online DFS, example. GOAL-TEST((2,1))? (2,1) ≠ G, thus false. (2,1) a new state? True, so UX[(2,1)] = {RIGHT, UP, LEFT}. s is null? False (s = (1,1)), so result[RIGHT, (1,1)] ← (2,1) and UB[(2,1)] = {(1,1)}. UX[(2,1)] empty? False, so a = LEFT, s ← (2,1). Return a.

  • Online DFS, example. GOAL-TEST((1,1))? (1,1) ≠ G, thus false. (1,1) a new state? False. s is null? False (s = (2,1)), so result[LEFT, (2,1)] ← (1,1) and UB[(1,1)] = {(2,1), (1,2)}. UX[(1,1)] empty? True. UB[(1,1)] empty? False, so a ← an action b such that result[b, (1,1)] = POP(UB[(1,1)]) = (2,1); b = RIGHT, so a = RIGHT, s ← (1,1). Return a. And so on.

  • Online DFS: in the worst case each node is visited twice. An agent can go on a long walk even when it is close to the solution; an online iterative deepening approach solves this problem. Online DFS works only when actions are reversible.

  • Online local search: hill-climbing is already online, since only one state is stored. It suffers from bad performance due to local maxima, and random restarts are impossible. Solution 1: a random walk introduces exploration by selecting one of the available actions at random, with preference for not-yet-tried actions, but it can take exponentially many steps.

  • Online local search. Solution 2: add memory to the hill climber. Store a current best estimate H(s) of the cost to reach the goal; H(s) is initially the heuristic estimate h(s) and is afterward updated with experience (see below). This gives Learning Real-Time A* (LRTA*).

  • Learning real-time A* (LRTA*)

    function LRTA*-COST(s, a, s', H) returns a cost estimate
      if s' is undefined then return h(s)
      else return c(s, a, s') + H[s']

    function LRTA*-AGENT(s') returns an action
      inputs: s', a percept identifying the current state
      static: result, a table of next state, indexed by action and state, initially empty
              H, a table of cost estimates indexed by state, initially empty
              s, a, the previous state and action, initially null

      if GOAL-TEST(s') then return stop
      if s' is a new state (not in H) then H[s'] ← h(s')
      unless s is null
          result[a, s] ← s'
          H[s] ← min over b in ACTIONS(s) of LRTA*-COST(s, b, result[b, s], H)
      a ← an action b in ACTIONS(s') that minimizes LRTA*-COST(s', b, result[b, s'], H)
      s ← s'
      return a
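
    Again as an illustrative sketch (not the course's code), a Python version of this agent, assuming the same hypothetical problem interface (actions, cost, goal_test) plus a heuristic function h:

    class LRTAStarAgent:
        def __init__(self, problem, h):
            self.problem = problem
            self.h = h
            self.result = {}   # result[(s, a)] -> observed next state s'
            self.H = {}        # learned cost-to-goal estimates
            self.s = None
            self.a = None

        def _cost(self, s, a, s_next):
            """LRTA*-COST: optimistic h(s) while the outcome s' is still unknown."""
            if s_next is None:
                return self.h(s)
            return self.problem.cost(s, a, s_next) + self.H[s_next]

        def __call__(self, s_prime):
            if self.problem.goal_test(s_prime):
                return "stop"
            if s_prime not in self.H:
                self.H[s_prime] = self.h(s_prime)
            if self.s is not None:
                self.result[(self.s, self.a)] = s_prime
                # update H[s] from the best-looking action out of s
                self.H[self.s] = min(
                    self._cost(self.s, b, self.result.get((self.s, b)))
                    for b in self.problem.actions(self.s))
            # move toward the apparently best neighbour
            self.a = min(self.problem.actions(s_prime),
                         key=lambda b: self._cost(s_prime, b,
                                                  self.result.get((s_prime, b))))
            self.s = s_prime
            return self.a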

  • Outline: Ch 4. informed search; Ch 6. adversarial search (optimal decisions, α-β pruning, imperfect real-time decisions)

  • Games vs. search problems: the problem-solving agent is not alone any more (multiagent, conflict). Default assumptions: deterministic, turn-taking, two-player, zero-sum game of perfect information (as opposed to imperfect information or games with chance). Against an "unpredictable" opponent, a solution is a strategy specifying a move for every possible opponent reply.

    Time limits mean it is unlikely to find the goal exactly; we must approximate

    * Environments with very many agents are best viewed as economies rather than games

  • Game formalization: an initial state; a successor function, which returns a list of (move, state) pairs; a terminal test, identifying terminal states; and a utility function (or objective function), giving a numeric value for the terminal states. The game tree is the resulting state space.

  • Tic-tac-toe: Game tree (2-player, deterministic, turns)

  • Minimax: perfect play for deterministic games, i.e., the optimal strategy

    Idea: choose move to position with highest minimax value = best achievable payoff against best play

    E.g., 2-ply game: only two half-moves

  • Minimax algorithm
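
    The slide shows the algorithm as a figure; as a stand-in, here is a minimal Python sketch of minimax, assuming a hypothetical game object with successors(state) returning (move, state) pairs, plus terminal_test(state) and utility(state) as in the formalization above:

    def minimax_decision(state, game):
        """Choose the move leading to the successor with the highest minimax value."""
        return max(game.successors(state),
                   key=lambda ms: min_value(ms[1], game))[0]

    def max_value(state, game):
        if game.terminal_test(state):
            return game.utility(state)
        return max(min_value(s, game) for _, s in game.successors(state))

    def min_value(state, game):
        if game.terminal_test(state):
            return game.utility(state)
        return min(max_value(s, game) for _, s in game.successors(state))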

  • Problem of minimax search: the number of game states is exponential in the number of moves. Solution: do not examine every node ==> alpha-beta pruning, which removes branches that do not influence the final decision. Revisit the example.

  • Alpha-Beta Example: do DF-search until the first leaf. Range of possible values at each node: [-∞, +∞], [-∞, +∞]

  • Alpha-Beta Example (continued): [-∞, 3], [-∞, +∞]

  • Alpha-Beta Example (continued): [-∞, 3], [-∞, +∞]

  • Alpha-Beta Example (continued): [3, +∞], [3, 3]

  • Alpha-Beta Example (continued): [-∞, 2], [3, +∞], [3, 3]. This node is worse for MAX.

  • Alpha-Beta Example (continued): [-∞, 2], [3, 14], [3, 3], [-∞, 14]

  • Alpha-Beta Example (continued): [-∞, 2], [3, 5], [3, 3], [-∞, 5]

  • Alpha-Beta Example (continued): [2, 2], [-∞, 2], [3, 3], [3, 3]

  • Alpha-Beta Example (continued): [2, 2], [-∞, 2], [3, 3], [3, 3]

  • Properties of α-β: pruning does not affect the final result

    Good move ordering improves the effectiveness of pruning

    With "perfect ordering," time complexity = O(b^(m/2)), which doubles the depth of search that can be handled

  • Why is it called α-β? α is the value of the best (i.e., highest-value) choice found so far at any choice point along the path for MAX

    If v is worse than α, MAX will avoid it

    ==> prune that branch

    Define β similarly for MIN

  • The α-β pruning algorithm
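
    The slides present the algorithm as figures; an illustrative Python sketch, using the same hypothetical game interface as the minimax sketch above, is:

    import math

    def alpha_beta_decision(state, game):
        best_move, best_val = None, -math.inf
        alpha, beta = -math.inf, math.inf
        for move, s in game.successors(state):
            v = ab_min_value(s, game, alpha, beta)
            if v > best_val:
                best_move, best_val = move, v
            alpha = max(alpha, best_val)
        return best_move

    def ab_max_value(state, game, alpha, beta):
        if game.terminal_test(state):
            return game.utility(state)
        v = -math.inf
        for _, s in game.successors(state):
            v = max(v, ab_min_value(s, game, alpha, beta))
            if v >= beta:          # MIN already has a better option elsewhere
                return v           # prune the remaining successors
            alpha = max(alpha, v)
        return v

    def ab_min_value(state, game, alpha, beta):
        if game.terminal_test(state):
            return game.utility(state)
        v = math.inf
        for _, s in game.successors(state):
            v = min(v, ab_max_value(s, game, alpha, beta))
            if v <= alpha:         # MAX already has a better option elsewhere
                return v
            beta = min(beta, v)
        return v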

  • Resource limits: in reality, imperfect and real-time decisions are required. Suppose we have 100 secs and can explore 10^4 nodes/sec ==> 10^6 nodes per move.

    Standard approach:

    cutoff test: e.g., depth limit

    evaluation function = estimated desirability of position

  • Evaluation functions: for chess, typically a linear weighted sum of features, Eval(s) = w1·f1(s) + w2·f2(s) + … + wn·fn(s)

    e.g., w1 = 9 for a queen, w2 = 5 for a rook, wn = 1 for a pawn; f1(s) = (number of white queens) − (number of black queens), etc. A toy example follows.
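
    As an illustration of such a weighted sum (the function name and piece-count inputs are assumptions; only the weights 9, 5 and 1 come from the slide):

    # Material-only evaluation: Eval(s) = sum_i w_i * f_i(s),
    # where f_i(s) = (# white pieces of type i) - (# black pieces of type i).
    WEIGHTS = {"queen": 9, "rook": 5, "pawn": 1}

    def material_eval(white_counts, black_counts):
        return sum(w * (white_counts.get(p, 0) - black_counts.get(p, 0))
                   for p, w in WEIGHTS.items())

    # Example: white has a queen and 3 pawns, black has a rook and 5 pawns:
    # material_eval({"queen": 1, "pawn": 3}, {"rook": 1, "pawn": 5})  ->  9 - 5 + (3 - 5) = 2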

  • Cutting off search: MinimaxCutoff is identical to MinimaxValue except that Terminal-Test is replaced by Cutoff-Test and Utility is replaced by Eval (a short code sketch follows below).

    Does it work in practice?

    b^m = 10^6, b = 35 ==> m ≈ 4

    4-ply lookahead is a hopeless chess player!

    4-ply ≈ human novice; 8-ply ≈ typical PC, human master; 12-ply ≈ Deep Blue, Kasparov
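
    A sketch of how the replacement looks in code (illustrative only; depth_limit and eval_fn are caller-supplied assumptions, and the game interface is the same hypothetical one used above):

    def minimax_cutoff_value(state, game, eval_fn, depth, depth_limit, maximizing):
        # Terminal-Test is replaced by Cutoff-Test (here: a simple depth limit),
        # and Utility is replaced by Eval.
        if game.terminal_test(state) or depth >= depth_limit:
            return eval_fn(state)
        values = (minimax_cutoff_value(s, game, eval_fn, depth + 1, depth_limit,
                                       not maximizing)
                  for _, s in game.successors(state))
        return max(values) if maximizing else min(values)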

  • Games that include chance. Backgammon: move all one's pieces off the board. Branches leading from each chance node denote the possible dice rolls, each labeled with the roll and its probability.

  • Games that include chance: the doubles [1,1] through [6,6] each have chance 1/36; all other rolls have chance 1/18. Possible moves: (5-10, 5-11), (5-11, 19-24), (5-10, 10-16) and (5-11, 11-16). We cannot calculate a definite minimax value, only an expected value.

  • Expected minimax value

    EXPECTED-MINIMAX-VALUE(n) =
      UTILITY(n)                                                       if n is a terminal node
      max over s in successors(n) of EXPECTED-MINIMAX-VALUE(s)         if n is a MAX node
      min over s in successors(n) of EXPECTED-MINIMAX-VALUE(s)         if n is a MIN node
      sum over s in successors(n) of P(s) · EXPECTED-MINIMAX-VALUE(s)  if n is a chance node

    These equations can be backed-up recursively all the way to the root of the game tree.
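
    A sketch of this recursion in Python (illustrative; node_type and the successor conventions below are assumptions: chance nodes return (probability, child) pairs, other nodes return (move, child) pairs):

    def expected_minimax_value(n, game):
        kind = game.node_type(n)   # 'terminal', 'max', 'min', or 'chance'
        if kind == "terminal":
            return game.utility(n)
        if kind == "max":
            return max(expected_minimax_value(s, game) for _, s in game.successors(n))
        if kind == "min":
            return min(expected_minimax_value(s, game) for _, s in game.successors(n))
        # chance node: probability-weighted average over the dice outcomes
        return sum(p * expected_minimax_value(s, game) for p, s in game.successors(n))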

  • Position evaluation with chance nodes: in the left tree, A1 is best; in the right tree, A2 is best. The outcome of the evaluation function (and hence the agent's behavior) may change when values are scaled differently. Behavior is preserved only under a positive linear transformation of EVAL.