Page 1:

AI from a game player perspective

Eleonora Giunchiglia

Page 2:

Why?

David Churchill, professor at Memorial University of Newfoundland:

“From a scientific point of view, the properties of StarCraft are very much like the properties of real life. [...] We’re making a test bed for technologies we can use in the real world.”

This concept can be extended to every game and justifies research on AI in games.

Page 3:

Connections with MAS

A multiagent system is one composed of multiple interacting software components known as agents, which are typically capable of cooperating to solve problems that are beyond the abilities of any individual member.

This represents a far more complex setting than traditional 1v1 games → only in recent years have researchers been able to study games with multiple agents.

In this project we traced the path that led from AI applied to 1v1 games to many-vs-many games.

Page 4:

The very first attempts

The very first attempts were made even before the concept of Artificial Intelligence was born:

•  1912: Leonardo Torres y Quevedo developed an electro-mechanical device, El Ajedrecista, able to checkmate a human opponent’s king using only its own king and rook.

•  1948: Alan Turing wrote the chess algorithm Turochamp. He never managed to run it on a real computer.

Page 5:

The very first attempts

•  1950: Claude Shannon proposes applying the Minimax algorithm to chess.

Shannon proposed two different ways of deciding the next move (a toy sketch of the first option follows below):

1.  performing a brute-force search of the complete game tree and taking the optimal move, or

2.  examining only a small subset of moves at each level of the tree and taking the “likely optimal” move.
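
A toy sketch of the brute-force option (the game tree and scores below are a made-up example, not Shannon's chess formulation):

    def minimax(node, maximizing=True):
        """Toy minimax: leaves are static evaluation scores, internal
        nodes are lists of positions reachable in one move."""
        if isinstance(node, (int, float)):    # leaf: static evaluation of the position
            return node
        values = [minimax(child, not maximizing) for child in node]
        return max(values) if maximizing else min(values)

    # Depth-2 toy tree: the maximizing player chooses among three moves,
    # the minimizing opponent then replies; the optimal value here is 3.
    tree = [[3, 5], [2, 9], [0, 7]]
    print(minimax(tree))   # -> 3

Shannon's second, selective option corresponds to recursing only over a few promising children of each node instead of all of them.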

Page 6:

Why Chess?

Chess represented the ideal challenge because of its:

•  discrete structure

•  well-defined problem

•  zero-sum nature with perfect information

•  Markov property

Page 7:

The Dartmouth Workshop

Organized by John McCarthy in 1956. Proposal:

“We propose that a 2 month, 10 man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College in Hanover, New Hampshire. The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves...”

Page 8:

Projects developed after the workshop:

•  Arthur Samuel implemented an algorithm able to play Checkers. His algorithm was essentially based on Minimax, but with one key addition: learning.

•  Nathaniel Rochester, together with the mathematician Alex Bernstein, fully implemented a chess algorithm on the IBM 701. The program only achieved very basic chess play.

Page 9:

Deep Blue

•  Developed 40 years after the Dartmouth workshop

•  Based on the Minimax algorithm

•  Two versions, both of which played against the chess world champion Garry Kasparov:

1.  1996 Deep Blue: it lost, with 1 win, 2 draws and 3 losses.

2.  1997 Deep Blue: it won, with 2 wins, 3 draws and 1 loss.

Page 10:

Reasons behind the victory

•  Massively parallel hardware, able to examine 200 million chess positions per second

•  256 special-purpose chips orchestrated by a 32-processor supercomputer → each chip evaluated a position and found all legal moves from it in a single electronic flash

•  Chess players helped write the library of opening moves

•  Finely designed evaluation function

Page 11:

The evaluation function

Values taken into account:

•  Material

•  Position

•  King safety

•  Tempo

Based on these factors, the algorithm selected the legal move whose resulting position received the highest value.
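
A hedged sketch of what such a hand-crafted evaluation could look like; the factor names mirror the list above, but the weights and scores are invented for illustration and are not Deep Blue's actual values:

    # Illustrative hand-crafted evaluation: a weighted sum of material,
    # position, king safety and tempo. Weights and scores are made up.
    WEIGHTS = {"material": 1.0, "position": 0.3, "king_safety": 0.5, "tempo": 0.1}

    def evaluate(features):
        """features: dict mapping factor name -> score from the engine's viewpoint."""
        return sum(WEIGHTS[name] * value for name, value in features.items())

    def best_move(legal_moves, features_after):
        """Pick the legal move whose resulting position scores highest."""
        return max(legal_moves, key=lambda move: evaluate(features_after[move]))

    # Toy usage: two candidate moves with hand-assigned feature scores.
    moves = ["Nf3", "h4"]
    after = {
        "Nf3": {"material": 0, "position": 2.0, "king_safety": 0.5, "tempo": 1.0},
        "h4":  {"material": 0, "position": -0.5, "king_safety": -1.0, "tempo": 0.0},
    }
    print(best_move(moves, after))   # -> Nf3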

Page 12:

The game Go

Game rules:

•  The game is played by two opponents taking turns

•  Played on a 19 × 19 board

•  One player controls the black stones, the other the white stones

•  In each turn a player places one stone

•  Stones may be “captured” → captured stones are removed from the board

Page 13:

The game Go

Game aim: each player’s aim is to control as much of the board as possible

Stop condition: the game proceeds until neither player wishes to make another move

Game complexity: the number of legal board positions in Go has a lower bound of about 2 × 10^170

Page 14:

First attempt to tackle Go

1962: H. Remus built a machine able to play a simplified version of Go → 11 × 11 board.

Idea: the initial moves were decided mainly by the random generator, then increasingly by the heuristic component, and eventually by the lexicon.

Page 15:

Later advancements

•  1968: Albert Zobrist managed to implement an algorithm able to beat a wholly inexperienced amateur → the algorithm was based not on tree search but on visual patterns

•  1993: Bernd Brügmann implemented a program based on tree search. Instead of using a complex evaluation function, it simulated many random games and picked the move that led to the best outcome on average.

Page 16:

Later advancements

•  2006: Rémi Coulom implemented CrazyStone, a program combining Monte Carlo evaluation with tree search (MCTS) → winner of the 2006 KGS computer-Go tournament for the 9 × 9 variant of Go. MCTS is based on many playouts → in each playout, the game is played out to the very end by selecting moves at random.

Page 17:

MCTS: details

Each round of Monte Carlo tree search consists of 4 steps (a compact sketch follows below):

1) Selection

2) Expansion

3) Simulation / playout

4) Backpropagation
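
A self-contained sketch of these four steps on a toy subtraction game (take 1 or 2 counters, whoever takes the last counter wins); the game and helper names are illustrative, not CrazyStone's implementation:

    import math, random

    class Node:
        """Tree node: `state` counters remain, `player` (+1/-1) is about to move."""
        def __init__(self, state, player, parent=None, move=None):
            self.state, self.player = state, player
            self.parent, self.move = parent, move
            self.children = []
            self.untried = [m for m in (1, 2) if m <= state]   # legal, not yet expanded moves
            self.visits, self.wins = 0, 0.0

    def uct(node, c=1.4):
        # Upper Confidence Bound used during selection.
        return node.wins / node.visits + c * math.sqrt(math.log(node.parent.visits) / node.visits)

    def rollout(state, player):
        # 3) Simulation: play random legal moves to the very end of the game.
        while state > 0:
            state -= random.choice([m for m in (1, 2) if m <= state])
            player = -player
        return -player   # the player who took the last counter wins

    def mcts(root_state, root_player=+1, iterations=3000):
        root = Node(root_state, root_player)
        for _ in range(iterations):
            node = root
            # 1) Selection: descend through fully expanded nodes using UCT.
            while not node.untried and node.children:
                node = max(node.children, key=uct)
            # 2) Expansion: add one new child for an untried move.
            if node.untried:
                move = node.untried.pop()
                child = Node(node.state - move, -node.player, parent=node, move=move)
                node.children.append(child)
                node = child
            # 3) Simulation / playout from the new node (terminal nodes skip it).
            winner = rollout(node.state, node.player) if node.state > 0 else -node.player
            # 4) Backpropagation: update statistics along the path back to the root.
            while node is not None:
                node.visits += 1
                if winner == -node.player:   # win for the player who moved into this node
                    node.wins += 1.0
                node = node.parent
        return max(root.children, key=lambda n: n.visits).move

    print(mcts(root_state=5))   # -> 2 (leaves the opponent with a multiple of 3)

Each node's statistics are kept from the viewpoint of the player who moved into it, which is what the UCT selection rule expects.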

Page 18:

Later advancements

•  2014: two unrelated groups of scientists, one supervised by Maddison C. and one supervised by Clark C., trained a CNN to play Go and published their results only two weeks apart.

Good results, but not enough to compete with MCTS algorithms operating at full brute-force capacity.

Page 19:

AlphaGo

Represents the result obtained by Maddison’s group → they followed the idea of using neural networks together with MCTS

Basic intuition:

•  Neural networks serve as intuition, to recognize possibly good moves

•  MCTS serves as reasoning, to evaluate how good these moves actually are

Page 20:

AlphaGo: characteristics

•  Policy network: used to predict the best move for a given position → this network has been trained on a dataset containing 30 million position-move pairs from games played by humans, plus reinforcement learning by playing against older versions of itself

•  Rollout network: a smaller network used in the rollout phase of MCTS → better than random play but faster than using the policy network

•  Value network: evaluates a position based on the probability of winning the game from that position.

Page 21:

AlphaGo: steps

Every time AlphaGo makes a move (a schematic sketch follows the list):

1.  Policy net suggests promising moves to evaluate

2.  These moves are evaluated through a combination of MCTS rollouts and the value network prediction
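
A schematic sketch of that combined evaluation of a candidate position: the value-network prediction is mixed with the outcome of a fast rollout. The function arguments are placeholders, and the 0.5 mixing weight is a neutral default used here for illustration:

    def evaluate_leaf(position, value_network, rollout_result, mixing=0.5):
        # Mix the value-network prediction with a fast-rollout outcome.
        v = value_network(position)     # estimated probability of winning from this position
        z = rollout_result(position)    # outcome of a quick playout with the rollout network
        return (1 - mixing) * v + mixing * z

    # Toy usage with stub functions standing in for the real networks.
    print(evaluate_leaf("toy-position",
                        value_network=lambda p: 0.7,
                        rollout_result=lambda p: 1.0))   # -> 0.85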

Page 22:

AlphaGo: implementation

AlphaGo ran 40 search threads on 48 CPUs, aided by 8 GPUs for the neural-network computations.

AlphaGo was also implemented in a distributed version which could scale to more than a thousand CPUs and close to 200 GPUs.

Page 23:

AlphaGo: results

•  October 2015: AlphaGo won its first match against the reigning 3-time European Champion, Mr Fan Hui → the first time a machine beat a professional Go player.

•  March 2016: AlphaGo won with a score of 4 - 1 against Mr Lee Sedol, winner of 18 world titles and widely considered to be the greatest player of the last ten years.

Page 24:

AI plays Atari games

2013: DeepMind published an article in which they presented a new way of training neural networks with reinforcement learning

As a demonstration of effectiveness, they presented a neural network, called the Q-network, able to play 7 Atari 2600 games without any adjustment to the architecture or the learning algorithm.

Page 25:

Atari games: challenges

•  High-dimensional input: the agent could only take raw pixel values as input (210 × 160 RGB video at 60 Hz)

•  No Markov property: the algorithm could not make decisions based only on the current frame; it also had to take the previous frames into account.

Page 26:

Atari games: the Q-network

DeepMind trained the Q-network with a variation of Q-learning. The team made the Q-network play multiple consecutive games and minimize a loss function that depends on (a schematic sketch follows the list):

•  a sequence of past actions performed by the neural network itself

•  the currently performed action
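
A hedged, single-transition sketch of the Q-learning error behind that loss (plain NumPy, for illustration rather than DeepMind's actual training code): the target is the observed reward plus the discounted value of the best action in the next state, and this kind of error is minimized over many stored past transitions.

    import numpy as np

    def q_loss(q_current, action, reward, q_next, done, gamma=0.99):
        """Squared TD error for one transition (state, action, reward, next state)."""
        # Target: observed reward plus discounted value of the best next action.
        target = reward if done else reward + gamma * np.max(q_next)
        return (target - q_current[action]) ** 2

    # Toy transition with 4 possible actions; action 2 was taken and rewarded.
    q_current = np.array([0.1, 0.4, 0.3, 0.0])   # Q(s, .) from the network
    q_next    = np.array([0.2, 0.5, 0.1, 0.0])   # Q(s', .) from the network
    print(q_loss(q_current, action=2, reward=1.0, q_next=q_next, done=False))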

Page 27:

Results in Atari Breakout

•  After 10 minutes of training, the algorithm tries to hit the ball, but it is still too clumsy to manage it.

•  After 120 minutes of training, the algorithm already plays at a good level.

•  After 240 minutes of training, the algorithm discovers that digging a tunnel through the wall is the most effective technique to beat the game.

Page 28:

Open challenges: StarCraft

The game:

The most common games are 1v1 matches played between human players, but team games (2v2, 3v3 or 4v4) are possible as well, and there are also more difficult games with unbalanced teams or more than two teams.

In the full 1v1 game of StarCraft II, two opposing races of aliens are generated on a map which contains resources and other elements.

Page 29:

StarCraft: rules

To win a game, a player must:

1.  accumulate resources

2.  build structures for production

3.  accumulate soldiers in order to build an army

4.  destroy all of the enemy’s buildings.

Page 30:

StarCraft: challenges

Why is StarCraft a difficult game to tackle?

•  Multi-agent problem with multiple players interacting

•  Imperfect-information game

•  Large action space involving the selection and control of hundreds of units

•  Large state space observable only from raw input

•  Need for long-term plans

Page 31:

StarCraft: agent inputs

All the agents must take as input (a minimal bundling sketch follows the list):

•  minimap: a coarse representation of the state of the entire world,

•  screen: a detailed view of a subsection of the world corresponding to the player’s on-screen view,

•  non-spatial features: information regarding the status of the game available to each player
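
A minimal sketch of how these three input streams might be bundled for an agent; the field names and shapes are assumptions made for this example, not the exact observation format exposed by the StarCraft II learning environment:

    from dataclasses import dataclass
    import numpy as np

    # Illustrative container for the three input streams described above.
    @dataclass
    class Sc2Observation:
        minimap: np.ndarray        # coarse whole-map feature layers, e.g. (channels, 64, 64)
        screen: np.ndarray         # detailed on-screen feature layers, e.g. (channels, 84, 84)
        non_spatial: np.ndarray    # flat vector: resources, supply, available actions, ...

    obs = Sc2Observation(
        minimap=np.zeros((7, 64, 64), dtype=np.float32),
        screen=np.zeros((17, 84, 84), dtype=np.float32),
        non_spatial=np.zeros(11, dtype=np.float32),
    )
    print(obs.minimap.shape, obs.screen.shape, obs.non_spatial.shape)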

Page 32:

StarCraft: first agents

The first proposed agents are (a rough sketch of the FullyConv idea follows below):

•  Atari-net Agent: developed by adapting the architecture that had been used in the Atari paper

•  FullyConv Agent: an agent that “predicts spatial actions directly through a sequence of resolution-preserving convolutional layers”

•  FullyConv LSTM Agent: keeps the advantages of the FullyConv agent while introducing an element of memory thanks to its recurrent structure.
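
A rough sketch of the “resolution-preserving” idea, written in PyTorch with invented layer sizes (not the published architecture): stride-1 convolutions with matching padding keep the spatial resolution, so a spatial action score can be read off for every screen location.

    import torch
    import torch.nn as nn

    class FullyConvSketch(nn.Module):
        """Illustrative resolution-preserving trunk with per-location spatial logits."""
        def __init__(self, in_channels=17):
            super().__init__()
            self.trunk = nn.Sequential(
                nn.Conv2d(in_channels, 16, kernel_size=5, stride=1, padding=2),
                nn.ReLU(),
                nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1),
                nn.ReLU(),
            )
            # One logit per screen location: where to target a spatial action.
            self.spatial_logits = nn.Conv2d(32, 1, kernel_size=1)

        def forward(self, screen):
            features = self.trunk(screen)            # same height/width as the input
            return self.spatial_logits(features)     # (batch, 1, H, W) spatial action scores

    logits = FullyConvSketch()(torch.zeros(1, 17, 84, 84))
    print(logits.shape)   # torch.Size([1, 1, 84, 84])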

Page 33:

StarCraft: first agents

•  Random agents:

1.  Random policy agent: an agent that uniformly at random picks an action from the set of valid ones.

2.  Random search agent: takes many independent, randomly initialized policy networks, evaluates each of them for 20 episodes, and keeps the one with the highest mean score (a sketch follows below).
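
A hedged sketch of that random-search baseline; make_random_policy and run_episode are placeholders introduced purely for illustration:

    import random

    def random_search(make_random_policy, run_episode, n_policies=20, n_episodes=20):
        """Sample several random policies, score each over n_episodes, keep the best."""
        best_policy, best_mean = None, float("-inf")
        for _ in range(n_policies):
            policy = make_random_policy()
            mean_score = sum(run_episode(policy) for _ in range(n_episodes)) / n_episodes
            if mean_score > best_mean:
                best_policy, best_mean = policy, mean_score
        return best_policy, best_mean

    # Toy usage: a "policy" is just a number, and an episode scores it with noise.
    policy, score = random_search(
        make_random_policy=lambda: random.uniform(0, 1),
        run_episode=lambda p: p + random.gauss(0, 0.1),
    )
    print(round(score, 2))   # mean score of the best policy found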

Page 34:

Thank you!