game_ai4.ppt

AI in game (IV) Oct. 11, 2006

Transcript of game_ai4.ppt

  • AI in game (IV), Oct. 11, 2006

  • So far: Artificial Intelligence: A Modern Approach, Stuart Russell and Peter Norvig, Prentice Hall, 2nd ed.

    Chapter 1: AI taxonomy; Chapter 2: agents; Chapter 3: uninformed search; Chapter 4: informed search

  • From now on: Artificial Intelligence: A Modern Approach, Chapter 4 (continued) and Chapter 6: adversarial search

    Network part; learning (maybe from the same textbook); game AI techniques

  • Outline: Ch 4. informed search (online search); Ch 6. adversarial search (optimal decisions, α-β pruning, imperfect real-time decisions)

  • Offline search vs. online search. Offline search agents compute a complete solution before setting foot in the real world.

    Online search agents interleave computation and action: e.g., take an action, then observe the environment, then compute the next action. This is necessary for exploration problems, where states and actions are unknown in advance, e.g., a robot in a new building, or a labyrinth.

  • Online search problems. Agents are assumed to know only: Actions(s), which returns a list of actions allowed in state s; the step cost c(s, a, s'), which cannot be used until the agent knows that s' is the outcome; and Goal-Test(s). The agent cannot access the successors of a state except by actually trying all the actions in that state. Assumptions: the agent can recognize a state that it has visited before, and actions are deterministic. Optionally, an admissible heuristic function is available.
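
    As an illustration (not from the slides), here is a minimal Python sketch of such an online problem interface. The class and method names (OnlineMazeProblem, actions, cost, goal_test) are assumptions, and this grid has no interior walls, unlike the maze used in the worked example below.

    class OnlineMazeProblem:
        """A plain 3x3 grid (no walls) where outcomes are learned only by acting."""

        def __init__(self, goal=(3, 3)):
            self.goal = goal

        def actions(self, s):
            """ACTIONS(s): moves allowed in state s (known in advance)."""
            x, y = s
            moves = []
            if x < 3: moves.append("RIGHT")
            if x > 1: moves.append("LEFT")
            if y < 3: moves.append("UP")
            if y > 1: moves.append("DOWN")
            return moves

        def cost(self, s, a, s_next):
            """c(s, a, s'): usable only after the agent has observed the outcome s'."""
            return 1

        def goal_test(self, s):
            return s == self.goal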

  • Online search problems: if some actions are irreversible, the agent may reach a dead end.

    If some goal state is reachable from every reachable state, the state space is safely explorable

  • Online search agents: an online algorithm can expand only a node that it physically occupies, whereas offline algorithms can expand any node in the fringe. Expanding the node the agent currently occupies is the same principle as DFS.

  • Online DFS

    function ONLINE-DFS-AGENT(s') returns an action
      inputs: s', a percept identifying the current state
      static: result, a table of the next state, indexed by action and state, initially empty
              unexplored, a table that lists, for each visited state, the actions not yet tried
              unbacktracked, a table that lists, for each visited state, the predecessor states to which the agent has not yet backtracked
              s, a, the previous state and action, initially null

      if GOAL-TEST(s') then return stop
      if s' is a new state then unexplored[s'] ← ACTIONS(s')
      if s is not null then do
          result[a, s] ← s'
          add s to the front of unbacktracked[s']
      if unexplored[s'] is empty then
          if unbacktracked[s'] is empty then return stop
          else a ← an action b such that result[b, s'] = POP(unbacktracked[s'])
      else a ← POP(unexplored[s'])
      s ← s'
      return a
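
    A rough Python rendering of this agent (a sketch of the pseudocode above, not code from the course), assuming the hypothetical OnlineMazeProblem-style interface sketched earlier (goal_test and actions):

    class OnlineDFSAgent:
        def __init__(self, problem):
            self.problem = problem
            self.result = {}         # result[(s, a)] -> observed next state s'
            self.unexplored = {}     # state -> actions not yet tried (used as a stack)
            self.unbacktracked = {}  # state -> predecessors not yet backtracked to
            self.s = None            # previous state
            self.a = None            # previous action

        def __call__(self, s_prime):
            """Take the current percept s' and return the next action (or 'stop')."""
            if self.problem.goal_test(s_prime):
                return "stop"
            if s_prime not in self.unexplored:               # s' is a new state
                self.unexplored[s_prime] = list(self.problem.actions(s_prime))
            if self.s is not None:
                self.result[(self.s, self.a)] = s_prime      # record the observed outcome
                self.unbacktracked.setdefault(s_prime, []).insert(0, self.s)
            if not self.unexplored[s_prime]:
                if not self.unbacktracked.get(s_prime):
                    return "stop"                            # dead end
                back_to = self.unbacktracked[s_prime].pop(0)
                # pick an action b whose recorded outcome from s' is the popped predecessor
                self.a = next(b for (st, b), nxt in self.result.items()
                              if st == s_prime and nxt == back_to)
            else:
                self.a = self.unexplored[s_prime].pop()
            self.s = s_prime
            return self.a

    To drive it, repeatedly call the agent with the current state, apply the returned action in the environment, and pass the newly observed state back in on the next call.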

  • Online DFS, example. Assume a maze problem on a 3x3 grid. s' = (1,1) is the initial state; result, unexplored (UX), and unbacktracked (UB) are empty; s and a are also null.

  • Online DFS, example. GOAL-TEST((1,1))? (1,1) ≠ G, thus false. (1,1) a new state? True, so ACTIONS((1,1)) → UX[(1,1)] = {RIGHT, UP}. s is null? True (initially). UX[(1,1)] empty? False, so a ← POP(UX[(1,1)]) = UP; s ← (1,1). Return a.

  • Online DFS, example. GOAL-TEST((1,2))? (1,2) ≠ G, thus false. (1,2) a new state? True, so ACTIONS((1,2)) → UX[(1,2)] = {DOWN}. s is null? False (s = (1,1)), so result[UP, (1,1)] ← (1,2) and UB[(1,2)] = {(1,1)}. UX[(1,2)] empty? False, so a = DOWN, s ← (1,2). Return a.

  • Online DFS, example. GOAL-TEST((1,1))? (1,1) ≠ G, thus false. (1,1) a new state? False. s is null? False (s = (1,2)), so result[DOWN, (1,2)] ← (1,1) and UB[(1,1)] = {(1,2)}. UX[(1,1)] empty? False, so a = RIGHT, s ← (1,1). Return a.

  • Online DFS, example. GOAL-TEST((2,1))? (2,1) ≠ G, thus false. (2,1) a new state? True, so UX[(2,1)] = {RIGHT, UP, LEFT}. s is null? False (s = (1,1)), so result[RIGHT, (1,1)] ← (2,1) and UB[(2,1)] = {(1,1)}. UX[(2,1)] empty? False, so a = LEFT, s ← (2,1). Return a.

  • Online DFS, example. GOAL-TEST((1,1))? (1,1) ≠ G, thus false. (1,1) a new state? False. s is null? False (s = (2,1)), so result[LEFT, (2,1)] ← (1,1) and UB[(1,1)] = {(2,1), (1,2)}. UX[(1,1)] empty? True. UB[(1,1)] empty? False, so a ← an action b such that result[b, (1,1)] = POP(UB[(1,1)]) = (2,1); b = RIGHT, so a = RIGHT, s ← (1,1). Return a. And so on.

  • Online DFS: in the worst case each node is visited twice. An agent can go on a long walk even when it is close to the solution; an online iterative deepening approach solves this problem. Online DFS works only when actions are reversible.

  • Online local search: hill-climbing is already online, since only one state is stored. It suffers from bad performance due to local maxima, and random restarts are impossible. Solution 1: a random walk introduces exploration by selecting one of the available actions at random, with preference for not-yet-tried actions, but it can take exponentially many steps.

  • Online local search. Solution 2: add memory to the hill climber. Store a current best estimate H(s) of the cost to reach the goal; H(s) is initially the heuristic estimate h(s) and is afterward updated with experience (see below). This gives Learning Real-Time A* (LRTA*).

  • Learning real-time A* (LRTA*)

    function LRTA*-COST(s, a, s', H) returns a cost estimate
      if s' is undefined then return h(s)
      else return c(s, a, s') + H[s']

    function LRTA*-AGENT(s') returns an action
      inputs: s', a percept identifying the current state
      static: result, a table of next state, indexed by action and state, initially empty
              H, a table of cost estimates indexed by state, initially empty
              s, a, the previous state and action, initially null

      if GOAL-TEST(s') then return stop
      if s' is a new state (not in H) then H[s'] ← h(s')
      unless s is null
          result[a, s] ← s'
          H[s] ← min over b in ACTIONS(s) of LRTA*-COST(s, b, result[b, s], H)
      a ← an action b in ACTIONS(s') that minimizes LRTA*-COST(s', b, result[b, s'], H)
      s ← s'
      return a
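
    Again as an illustrative sketch (not the course's code), a Python version of this agent, assuming the same hypothetical problem interface (actions, cost, goal_test) plus a heuristic function h:

    class LRTAStarAgent:
        def __init__(self, problem, h):
            self.problem = problem
            self.h = h
            self.result = {}   # result[(s, a)] -> observed next state s'
            self.H = {}        # learned cost-to-goal estimates
            self.s = None
            self.a = None

        def _cost(self, s, a, s_next):
            """LRTA*-COST: optimistic h(s) while the outcome s' is still unknown."""
            if s_next is None:
                return self.h(s)
            return self.problem.cost(s, a, s_next) + self.H[s_next]

        def __call__(self, s_prime):
            if self.problem.goal_test(s_prime):
                return "stop"
            if s_prime not in self.H:
                self.H[s_prime] = self.h(s_prime)
            if self.s is not None:
                self.result[(self.s, self.a)] = s_prime
                # update H[s] from the best-looking action out of s
                self.H[self.s] = min(
                    self._cost(self.s, b, self.result.get((self.s, b)))
                    for b in self.problem.actions(self.s))
            # move toward the apparently best neighbour
            self.a = min(self.problem.actions(s_prime),
                         key=lambda b: self._cost(s_prime, b,
                                                  self.result.get((s_prime, b))))
            self.s = s_prime
            return self.a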

  • Outline: Ch 4. informed search; Ch 6. adversarial search (optimal decisions, α-β pruning, imperfect real-time decisions)

  • Games vs. search problems: the problem-solving agent is not alone any more (multiagent, conflict). Default assumptions: deterministic, turn-taking, two-player, zero-sum game of perfect information (as opposed to imperfect information or games with chance). Against an "unpredictable" opponent, a solution is a strategy specifying a move for every possible opponent reply.

    Time limits mean it is unlikely to find the goal exactly; we must approximate

    * Environments with very many agents are best viewed as economies rather than games

  • Game formalization: an initial state; a successor function, which returns a list of (move, state) pairs; a terminal test, identifying terminal states; and a utility function (or objective function), giving a numeric value for the terminal states. The game tree is the resulting state space.

  • Tic-tac-toe: Game tree (2-player, deterministic, turns)

  • Minimax: perfect play for deterministic games, i.e., the optimal strategy

    Idea: choose move to position with highest minimax value = best achievable payoff against best play

    E.g., 2-ply game: only two half-moves

  • Minimax algorithm
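
    The slide shows the algorithm as a figure; as a stand-in, here is a minimal Python sketch of minimax, assuming a hypothetical game object with successors(state) returning (move, state) pairs, plus terminal_test(state) and utility(state) as in the formalization above:

    def minimax_decision(state, game):
        """Choose the move leading to the successor with the highest minimax value."""
        return max(game.successors(state),
                   key=lambda ms: min_value(ms[1], game))[0]

    def max_value(state, game):
        if game.terminal_test(state):
            return game.utility(state)
        return max(min_value(s, game) for _, s in game.successors(state))

    def min_value(state, game):
        if game.terminal_test(state):
            return game.utility(state)
        return min(max_value(s, game) for _, s in game.successors(state))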

  • Problem of minimax search: the number of game states is exponential in the number of moves. Solution: do not examine every node ==> alpha-beta pruning, which removes branches that do not influence the final decision. Revisit the example.

  • Alpha-Beta Example: do DF-search until the first leaf. Range of possible values at each node: [-∞, +∞], [-∞, +∞]

  • Alpha-Beta Example (continued): [-∞, 3], [-∞, +∞]

  • Alpha-Beta Example (continued): [-∞, 3], [-∞, +∞]

  • Alpha-Beta Example (continued): [3, +∞], [3, 3]

  • Alpha-Beta Example (continued): [-∞, 2], [3, +∞], [3, 3]. This node is worse for MAX.

  • Alpha-Beta Example (continued): [-∞, 2], [3, 14], [3, 3], [-∞, 14]

  • Alpha-Beta Example (continued): [-∞, 2], [3, 5], [3, 3], [-∞, 5]

  • Alpha-Beta Example (continued): [2, 2], [-∞, 2], [3, 3], [3, 3]

  • Alpha-Beta Example (continued): [2, 2], [-∞, 2], [3, 3], [3, 3]

  • Properties of α-β: pruning does not affect the final result

    Good move ordering improves the effectiveness of pruning

    With "perfect ordering," time complexity = O(b^(m/2)), which doubles the depth of search that can be handled

  • Why is it called α-β? α is the value of the best (i.e., highest-value) choice found so far at any choice point along the path for MAX

    If v is worse than α, MAX will avoid it

    ==> prune that branch

    Define β similarly for MIN

  • The α-β pruning algorithm
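
    The slides present the algorithm as figures; an illustrative Python sketch, using the same hypothetical game interface as the minimax sketch above, is:

    import math

    def alpha_beta_decision(state, game):
        best_move, best_val = None, -math.inf
        alpha, beta = -math.inf, math.inf
        for move, s in game.successors(state):
            v = ab_min_value(s, game, alpha, beta)
            if v > best_val:
                best_move, best_val = move, v
            alpha = max(alpha, best_val)
        return best_move

    def ab_max_value(state, game, alpha, beta):
        if game.terminal_test(state):
            return game.utility(state)
        v = -math.inf
        for _, s in game.successors(state):
            v = max(v, ab_min_value(s, game, alpha, beta))
            if v >= beta:          # MIN already has a better option elsewhere
                return v           # prune the remaining successors
            alpha = max(alpha, v)
        return v

    def ab_min_value(state, game, alpha, beta):
        if game.terminal_test(state):
            return game.utility(state)
        v = math.inf
        for _, s in game.successors(state):
            v = min(v, ab_max_value(s, game, alpha, beta))
            if v <= alpha:         # MAX already has a better option elsewhere
                return v
            beta = min(beta, v)
        return v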

  • Resource limits: in reality, imperfect and real-time decisions are required. Suppose we have 100 secs and can explore 10^4 nodes/sec ==> 10^6 nodes per move.

    Standard approach:

    cutoff test: e.g., depth limit

    evaluation function = estimated desirability of position

  • Evaluation functions: for chess, typically a linear weighted sum of features, Eval(s) = w1·f1(s) + w2·f2(s) + … + wn·fn(s)

    e.g., w1 = 9 for a queen, w2 = 5 for a rook, wn = 1 for a pawn; f1(s) = (number of white queens) − (number of black queens), etc. A toy example follows.
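
    As an illustration of such a weighted sum (the function name and piece-count inputs are assumptions; only the weights 9, 5 and 1 come from the slide):

    # Material-only evaluation: Eval(s) = sum_i w_i * f_i(s),
    # where f_i(s) = (# white pieces of type i) - (# black pieces of type i).
    WEIGHTS = {"queen": 9, "rook": 5, "pawn": 1}

    def material_eval(white_counts, black_counts):
        return sum(w * (white_counts.get(p, 0) - black_counts.get(p, 0))
                   for p, w in WEIGHTS.items())

    # Example: white has a queen and 3 pawns, black has a rook and 5 pawns:
    # material_eval({"queen": 1, "pawn": 3}, {"rook": 1, "pawn": 5})  ->  9 - 5 + (3 - 5) = 2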

  • Cutting off search: MinimaxCutoff is identical to MinimaxValue except that Terminal-Test is replaced by Cutoff-Test and Utility is replaced by Eval (a short code sketch follows below).

    Does it work in practice?

    b^m = 10^6, b = 35 ==> m ≈ 4

    4-ply lookahead is a hopeless chess player!

    4-ply ≈ human novice; 8-ply ≈ typical PC, human master; 12-ply ≈ Deep Blue, Kasparov
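
    A sketch of how the replacement looks in code (illustrative only; depth_limit and eval_fn are caller-supplied assumptions, and the game interface is the same hypothetical one used above):

    def minimax_cutoff_value(state, game, eval_fn, depth, depth_limit, maximizing):
        # Terminal-Test is replaced by Cutoff-Test (here: a simple depth limit),
        # and Utility is replaced by Eval.
        if game.terminal_test(state) or depth >= depth_limit:
            return eval_fn(state)
        values = (minimax_cutoff_value(s, game, eval_fn, depth + 1, depth_limit,
                                       not maximizing)
                  for _, s in game.successors(state))
        return max(values) if maximizing else min(values)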

  • Games that include chance. Backgammon: move all one's pieces off the board. Branches leading from each chance node denote the possible dice rolls, each labeled with the roll and its probability.

  • Games that include chance: the doubles [1,1] through [6,6] each have chance 1/36; all other rolls have chance 1/18. Possible moves: (5-10, 5-11), (5-11, 19-24), (5-10, 10-16) and (5-11, 11-16). We cannot calculate a definite minimax value, only an expected value.

  • Expected minimax value

    EXPECTED-MINIMAX-VALUE(n) =
      UTILITY(n)                                                       if n is a terminal node
      max over s in successors(n) of EXPECTED-MINIMAX-VALUE(s)         if n is a MAX node
      min over s in successors(n) of EXPECTED-MINIMAX-VALUE(s)         if n is a MIN node
      sum over s in successors(n) of P(s) · EXPECTED-MINIMAX-VALUE(s)  if n is a chance node

    These equations can be backed-up recursively all the way to the root of the game tree.
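
    A sketch of this recursion in Python (illustrative; node_type and the successor conventions below are assumptions: chance nodes return (probability, child) pairs, other nodes return (move, child) pairs):

    def expected_minimax_value(n, game):
        kind = game.node_type(n)   # 'terminal', 'max', 'min', or 'chance'
        if kind == "terminal":
            return game.utility(n)
        if kind == "max":
            return max(expected_minimax_value(s, game) for _, s in game.successors(n))
        if kind == "min":
            return min(expected_minimax_value(s, game) for _, s in game.successors(n))
        # chance node: probability-weighted average over the dice outcomes
        return sum(p * expected_minimax_value(s, game) for p, s in game.successors(n))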

  • Position evaluation with chance nodes: in the left tree, A1 is best; in the right tree, A2 is best. The outcome of the evaluation function (and hence the agent's behavior) may change when values are scaled differently. Behavior is preserved only under a positive linear transformation of EVAL.