Page 1:

AI from a game player perspective

Eleonora Giunchiglia

Page 2:

Why?

David Churchill, professor at Memorial University of Newfoundland:

“From a scientific point of view, the properties of StarCraft are very much like the properties of real life. [...] We’re making a test bed for technologies we can use in the real world.”

This concept can be extended to every game and justifies research on AI in games.

Page 3:

Connections with MAS

A multiagent system is one composed of multiple interacting software components known as agents, which are typically capable of cooperating to solve problems that are beyond the abilities of any individual member.

This represents a far more complex setting than traditional 1v1 games → only in recent years have researchers been able to study games with multiple agents.

In this project we traced the path that led from AI applied to 1v1 games to many-vs-many games.

Page 4:

The very first attempts

The very first attempts were made even before the concept of Artificial Intelligence was born:

•  1912: Leonardo Torres y Quevedo developed an electro-mechanical device, El Ajedrecista, able to checkmate a human opponent’s king using only its own king and rook.

•  1948: Alan Turing wrote the chess algorithm Turochamp. He never managed to run it on a real computer.

Page 5:

The very first attempts

•  1950: Claude Shannon proposes applying the Minimax algorithm to chess.

Shannon proposed two different ways of deciding the next move (a toy sketch of the first option follows below):

1.  performing a brute-force search of the complete game tree and taking the optimal move, or

2.  examining only a small subset of moves at each level of the tree and taking the “likely optimal” move.
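
A toy sketch of the brute-force option (the game tree and scores below are a made-up example, not Shannon's chess formulation):

    def minimax(node, maximizing=True):
        """Toy minimax: leaves are static evaluation scores, internal
        nodes are lists of positions reachable in one move."""
        if isinstance(node, (int, float)):    # leaf: static evaluation of the position
            return node
        values = [minimax(child, not maximizing) for child in node]
        return max(values) if maximizing else min(values)

    # Depth-2 toy tree: the maximizing player chooses among three moves,
    # the minimizing opponent then replies; the optimal value here is 3.
    tree = [[3, 5], [2, 9], [0, 7]]
    print(minimax(tree))   # -> 3

Shannon's second, selective option corresponds to recursing only over a few promising children of each node instead of all of them.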

Page 6:

Why Chess?

Chess represented the ideal challenge because of its:

•  discrete structure

•  well-defined problem

•  zero-sum nature with perfect information

•  Markov property

Page 7:

The Dartmouth Workshop

Organized by John McCarthy in 1956. Proposal:

“We propose that a 2 month, 10 man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College in Hanover, New Hampshire. The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves...”

Page 8:

Projects developed after the workshop:

•  Arthur Samuel implemented an algorithm able to play Checkers. His algorithm was essentially based on Minimax, but with one key addition: learning.

•  Nathaniel Rochester, together with the mathematician Alex Bernstein, fully implemented a chess algorithm on the IBM 701. The program only achieved very basic chess play.

Page 9:

Deep Blue

•  Developed 40 years after the Dartmouth workshop

•  Based on the Minimax algorithm

•  Two versions, both of which played against the chess world champion Garry Kasparov:

1.  1996 Deep Blue: it lost, with 1 win, 2 draws and 3 losses.

2.  1997 Deep Blue: it won, with 2 wins, 3 draws and 1 loss.

Page 10:

Reasons behind the victory

•  Massively parallel hardware, able to examine 200 million chess positions per second

•  256 special-purpose chips orchestrated by a 32-processor supercomputer → each chip evaluated a position and found all legal moves from it in a single electronic flash

•  Chess players helped write the library of opening moves

•  Finely designed evaluation function

Page 11:

The evaluation function

Values taken into account:

•  Material

•  Position

•  King safety

•  Tempo

Based on these factors, the algorithm selected the legal move whose resulting position received the highest value.
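
A hedged sketch of what such a hand-crafted evaluation could look like; the factor names mirror the list above, but the weights and scores are invented for illustration and are not Deep Blue's actual values:

    # Illustrative hand-crafted evaluation: a weighted sum of material,
    # position, king safety and tempo. Weights and scores are made up.
    WEIGHTS = {"material": 1.0, "position": 0.3, "king_safety": 0.5, "tempo": 0.1}

    def evaluate(features):
        """features: dict mapping factor name -> score from the engine's viewpoint."""
        return sum(WEIGHTS[name] * value for name, value in features.items())

    def best_move(legal_moves, features_after):
        """Pick the legal move whose resulting position scores highest."""
        return max(legal_moves, key=lambda move: evaluate(features_after[move]))

    # Toy usage: two candidate moves with hand-assigned feature scores.
    moves = ["Nf3", "h4"]
    after = {
        "Nf3": {"material": 0, "position": 2.0, "king_safety": 0.5, "tempo": 1.0},
        "h4":  {"material": 0, "position": -0.5, "king_safety": -1.0, "tempo": 0.0},
    }
    print(best_move(moves, after))   # -> Nf3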

Page 12:

The game Go

Game rules:

•  The game is played by two opponents taking turns

•  Played on a 19 × 19 board

•  One player controls the black stones, the other the white stones

•  In each turn a player places one stone

•  Stones may be “captured” → captured stones are removed from the board

Page 13:

The game Go

Game aim: each player’s aim is to control as much of the board as possible

Stop condition: the game proceeds until neither player wishes to make another move

Game complexity: the number of legal board positions in Go has a lower bound of about 2 × 10^170

Page 14:

First attempt to tackle Go

1962: H. Remus built a machine able to play a simplified version of Go → 11 × 11 board.

Idea: the initial moves were decided mainly by the random generator, then increasingly by the heuristic component, and eventually by the lexicon.

Page 15:

Later advancements

•  1968: Albert Zobrist managed to implement an algorithm able to beat a wholly inexperienced amateur → the algorithm was based not on tree search but on visual patterns

•  1993: Bernd Brügmann implemented a program based on tree search. Instead of using a complex evaluation function, it simulated many random games and picked the move that led to the best outcome on average.

Page 16:

Later advancements

•  2006: Rémi Coulom implemented CrazyStone, a program combining Monte Carlo evaluation with tree search (MCTS) → winner of the 2006 KGS computer-Go tournament for the 9 × 9 variant of Go. MCTS is based on many playouts → in each playout, the game is played out to the very end by selecting moves at random.

Page 17:

MCTS: details

Each round of Monte Carlo tree search consists of 4 steps (a compact sketch follows below):

1) Selection

2) Expansion

3) Simulation / playout

4) Backpropagation
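
A self-contained sketch of these four steps on a toy subtraction game (take 1 or 2 counters, whoever takes the last counter wins); the game and helper names are illustrative, not CrazyStone's implementation:

    import math, random

    class Node:
        """Tree node: `state` counters remain, `player` (+1/-1) is about to move."""
        def __init__(self, state, player, parent=None, move=None):
            self.state, self.player = state, player
            self.parent, self.move = parent, move
            self.children = []
            self.untried = [m for m in (1, 2) if m <= state]   # legal, not yet expanded moves
            self.visits, self.wins = 0, 0.0

    def uct(node, c=1.4):
        # Upper Confidence Bound used during selection.
        return node.wins / node.visits + c * math.sqrt(math.log(node.parent.visits) / node.visits)

    def rollout(state, player):
        # 3) Simulation: play random legal moves to the very end of the game.
        while state > 0:
            state -= random.choice([m for m in (1, 2) if m <= state])
            player = -player
        return -player   # the player who took the last counter wins

    def mcts(root_state, root_player=+1, iterations=3000):
        root = Node(root_state, root_player)
        for _ in range(iterations):
            node = root
            # 1) Selection: descend through fully expanded nodes using UCT.
            while not node.untried and node.children:
                node = max(node.children, key=uct)
            # 2) Expansion: add one new child for an untried move.
            if node.untried:
                move = node.untried.pop()
                child = Node(node.state - move, -node.player, parent=node, move=move)
                node.children.append(child)
                node = child
            # 3) Simulation / playout from the new node (terminal nodes skip it).
            winner = rollout(node.state, node.player) if node.state > 0 else -node.player
            # 4) Backpropagation: update statistics along the path back to the root.
            while node is not None:
                node.visits += 1
                if winner == -node.player:   # win for the player who moved into this node
                    node.wins += 1.0
                node = node.parent
        return max(root.children, key=lambda n: n.visits).move

    print(mcts(root_state=5))   # -> 2 (leaves the opponent with a multiple of 3)

Each node's statistics are kept from the viewpoint of the player who moved into it, which is what the UCT selection rule expects.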

Page 18:

Later advancements

•  2014: two unrelated groups of scientists, one supervised by Maddison C. and one supervised by Clark C., trained a CNN to play Go and published their results only two weeks apart.

Good results, but not enough to compete with MCTS algorithms operating at full brute-force capacity.

Page 19:

AlphaGo

Represents the result obtained by Maddison’s group → they followed the idea of using neural networks together with MCTS

Basic intuition:

•  Neural networks serve as intuition, to recognize possibly good moves

•  MCTS serves as reasoning, to evaluate how good these moves actually are

Page 20:

AlphaGo: characteristics

•  Policy network: used to predict the best move for a given position → this network has been trained on a dataset containing 30 million position-move pairs from games played by humans, plus reinforcement learning by playing against older versions of itself

•  Rollout network: a smaller network used in the rollout phase of MCTS → better than random play but faster than using the policy network

•  Value network: evaluates a position based on the probability of winning the game from that position.

Page 21:

AlphaGo: steps

Every time AlphaGo makes a move (a schematic sketch follows the list):

1.  Policy net suggests promising moves to evaluate

2.  These moves are evaluated through a combination of MCTS rollouts and the value network prediction
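
A schematic sketch of that combined evaluation of a candidate position: the value-network prediction is mixed with the outcome of a fast rollout. The function arguments are placeholders, and the 0.5 mixing weight is a neutral default used here for illustration:

    def evaluate_leaf(position, value_network, rollout_result, mixing=0.5):
        # Mix the value-network prediction with a fast-rollout outcome.
        v = value_network(position)     # estimated probability of winning from this position
        z = rollout_result(position)    # outcome of a quick playout with the rollout network
        return (1 - mixing) * v + mixing * z

    # Toy usage with stub functions standing in for the real networks.
    print(evaluate_leaf("toy-position",
                        value_network=lambda p: 0.7,
                        rollout_result=lambda p: 1.0))   # -> 0.85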

Page 22:

AlphaGo: implementation

AlphaGo ran 40 search threads on 48 CPUs, aided by 8 GPUs for the neural-network computations.

AlphaGo was also implemented in a distributed version which could scale to more than a thousand CPUs and close to 200 GPUs.

Page 23:

AlphaGo: results

•  October 2015: AlphaGo won its first match against the reigning 3-time European Champion, Mr Fan Hui → the first time a machine beat a professional Go player.

•  March 2016: AlphaGo won with a score of 4 - 1 against Mr Lee Sedol, winner of 18 world titles and widely considered to be the greatest player of the last ten years.

Page 24:

AI plays Atari games

2013: DeepMind published an article in which they presented a new way of training neural networks with reinforcement learning

As a demonstration of effectiveness, they presented a neural network, called the Q-network, able to play 7 Atari 2600 games without any adjustment to the architecture or the learning algorithm.

Page 25:

Atari games: challenges

•  High-dimensional input: the agent could only take raw pixel values as input (210 × 160 RGB video at 60 Hz)

•  No Markov property: the algorithm could not make decisions based only on the current frame; it also had to take the previous frames into account.

Page 26:

Atari games: the Q-network

DeepMind trained the Q-network with a variation of Q-learning. The team made the Q-network play multiple consecutive games and minimize a loss function that depends on (a schematic sketch follows the list):

•  a sequence of past actions performed by the neural network itself

•  the currently performed action
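
A hedged, single-transition sketch of the Q-learning error behind that loss (plain NumPy, for illustration rather than DeepMind's actual training code): the target is the observed reward plus the discounted value of the best action in the next state, and this kind of error is minimized over many stored past transitions.

    import numpy as np

    def q_loss(q_current, action, reward, q_next, done, gamma=0.99):
        """Squared TD error for one transition (state, action, reward, next state)."""
        # Target: observed reward plus discounted value of the best next action.
        target = reward if done else reward + gamma * np.max(q_next)
        return (target - q_current[action]) ** 2

    # Toy transition with 4 possible actions; action 2 was taken and rewarded.
    q_current = np.array([0.1, 0.4, 0.3, 0.0])   # Q(s, .) from the network
    q_next    = np.array([0.2, 0.5, 0.1, 0.0])   # Q(s', .) from the network
    print(q_loss(q_current, action=2, reward=1.0, q_next=q_next, done=False))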

Page 27:

Results in Atari Breakout

•  After 10 minutes of training, the algorithm tries to hit the ball, but it is still too clumsy to manage it.

•  After 120 minutes of training, the algorithm already plays at a good level.

•  After 240 minutes of training, the algorithm discovers that digging a tunnel through the wall is the most effective technique to beat the game.

Page 28:

Open challenges: StarCraft

The game:

The most common games are 1v1 matches played between human players, but team games (2v2, 3v3 or 4v4) are possible as well, and there are also more difficult games with unbalanced teams or more than two teams.

In the full 1v1 game of StarCraft II, two opposing races of aliens are generated on a map which contains resources and other elements.

Page 29:

StarCraft: rules

To win a game, a player must:

1.  accumulate resources

2.  build structures for production

3.  accumulate soldiers in order to build an army

4.  destroy all of the enemy’s buildings.

Page 30:

StarCraft: challenges

Why is StarCraft a difficult game to tackle?

•  Multi-agent problem with multiple players interacting

•  Imperfect-information game

•  Large action space involving the selection and control of hundreds of units

•  Large state space observable only from raw input

•  Need for long-term plans

Page 31:

StarCraft: agent inputs

All the agents must take as input (a minimal bundling sketch follows the list):

•  minimap: a coarse representation of the state of the entire world,

•  screen: a detailed view of a subsection of the world corresponding to the player’s on-screen view,

•  non-spatial features: information regarding the status of the game available to each player
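
A minimal sketch of how these three input streams might be bundled for an agent; the field names and shapes are assumptions made for this example, not the exact observation format exposed by the StarCraft II learning environment:

    from dataclasses import dataclass
    import numpy as np

    # Illustrative container for the three input streams described above.
    @dataclass
    class Sc2Observation:
        minimap: np.ndarray        # coarse whole-map feature layers, e.g. (channels, 64, 64)
        screen: np.ndarray         # detailed on-screen feature layers, e.g. (channels, 84, 84)
        non_spatial: np.ndarray    # flat vector: resources, supply, available actions, ...

    obs = Sc2Observation(
        minimap=np.zeros((7, 64, 64), dtype=np.float32),
        screen=np.zeros((17, 84, 84), dtype=np.float32),
        non_spatial=np.zeros(11, dtype=np.float32),
    )
    print(obs.minimap.shape, obs.screen.shape, obs.non_spatial.shape)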

Page 32:

StarCraft: first agents

The first proposed agents are (a rough sketch of the FullyConv idea follows below):

•  Atari-net Agent: developed by adapting the architecture that had been used in the Atari paper

•  FullyConv Agent: an agent that “predicts spatial actions directly through a sequence of resolution-preserving convolutional layers”

•  FullyConv LSTM Agent: keeps the advantages of the FullyConv agent while introducing an element of memory thanks to its recurrent structure.
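
A rough sketch of the “resolution-preserving” idea, written in PyTorch with invented layer sizes (not the published architecture): stride-1 convolutions with matching padding keep the spatial resolution, so a spatial action score can be read off for every screen location.

    import torch
    import torch.nn as nn

    class FullyConvSketch(nn.Module):
        """Illustrative resolution-preserving trunk with per-location spatial logits."""
        def __init__(self, in_channels=17):
            super().__init__()
            self.trunk = nn.Sequential(
                nn.Conv2d(in_channels, 16, kernel_size=5, stride=1, padding=2),
                nn.ReLU(),
                nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1),
                nn.ReLU(),
            )
            # One logit per screen location: where to target a spatial action.
            self.spatial_logits = nn.Conv2d(32, 1, kernel_size=1)

        def forward(self, screen):
            features = self.trunk(screen)            # same height/width as the input
            return self.spatial_logits(features)     # (batch, 1, H, W) spatial action scores

    logits = FullyConvSketch()(torch.zeros(1, 17, 84, 84))
    print(logits.shape)   # torch.Size([1, 1, 84, 84])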

Page 33:

StarCraft: first agents

•  Random agents:

1.  Random policy agent: an agent that uniformly at random picks an action from the set of valid ones.

2.  Random search agent: takes many independent, randomly initialized policy networks, evaluates each of them for 20 episodes, and keeps the one with the highest mean score (a sketch follows below).
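
A hedged sketch of that random-search baseline; make_random_policy and run_episode are placeholders introduced purely for illustration:

    import random

    def random_search(make_random_policy, run_episode, n_policies=20, n_episodes=20):
        """Sample several random policies, score each over n_episodes, keep the best."""
        best_policy, best_mean = None, float("-inf")
        for _ in range(n_policies):
            policy = make_random_policy()
            mean_score = sum(run_episode(policy) for _ in range(n_episodes)) / n_episodes
            if mean_score > best_mean:
                best_policy, best_mean = policy, mean_score
        return best_policy, best_mean

    # Toy usage: a "policy" is just a number, and an episode scores it with noise.
    policy, score = random_search(
        make_random_policy=lambda: random.uniform(0, 1),
        run_episode=lambda p: p + random.gauss(0, 0.1),
    )
    print(round(score, 2))   # mean score of the best policy found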

Page 34:

Thank you!