Monte-Carlo Tree Search (MCTS) for Computer...

103
Monte-Carlo Tree Search (MCTS) for Computer Go Bruno Bouzy [email protected] Université Paris Descartes AOA class

Transcript of Monte-Carlo Tree Search (MCTS) for Computer...

  • Monte-Carlo Tree Search (MCTS) for Computer Go

    Bruno [email protected]

    Université Paris Descartes

    AOA class

  • MCTS for Computer Go 2

    Outline

    ● The game of Go: a 9x9 game● The « old » approach (*-2002)● The Monte-Carlo approach (2002-2005)● The MCTS approach (2006-today)● Conclusion

  • MCTS for Computer Go 3

    The game of Go

  • MCTS for Computer Go 4

    The game of Go

    ● 4000 years● Originated from China● Developed by Japan (20th century)● Best players in Korea, Japan, China● 19x19: official board size● 9x9: beginners' board size

  • MCTS for Computer Go 5

    A 9x9 game● The board has 81 « intersections ». Initially,it

    is empty.

  • MCTS for Computer Go 6

    A 9x9 game● Black moves first. A « stone » is played on

    an intersection.

  • MCTS for Computer Go 7

    A 9x9 game● White moves second.

  • MCTS for Computer Go 8

    A 9x9 game● Moves alternate between Black and White.

  • MCTS for Computer Go 9

    A 9x9 game● Two adjacent stones of the same color

    builds a « string » with « liberties ».● 4-adjacency

  • MCTS for Computer Go 10

    A 9x9 game● Strings are created.

  • MCTS for Computer Go 11

    A 9x9 game● A white stone is in « atari » (one liberty).

  • MCTS for Computer Go 12

    A 9x9 game● The white string has five liberties.

  • MCTS for Computer Go 13

    A 9x9 game● The black stone is « atari ».

  • MCTS for Computer Go 14

    A 9x9 game● White « captures » the black stone.

  • MCTS for Computer Go 15

    A 9x9 game● For advised players, the game is over.

    – Hu?

    – Why?

  • MCTS for Computer Go 16

    A 9x9 game● What happens if White contests black

    « territory »?

  • MCTS for Computer Go 17

    A 9x9 game● White has invaded. Two strings are atari!

  • MCTS for Computer Go 18

    A 9x9 game● Black captures !

  • MCTS for Computer Go 19

    A 9x9 game● White insists but its string is atari...

  • MCTS for Computer Go 20

    A 9x9 game● Black has proved is « territory ».

  • MCTS for Computer Go 21

    A 9x9 game● Black may contest white territory too.

  • MCTS for Computer Go 22

    A 9x9 terminal position● The game is over for computers.

    – Hu?

    – Who won ?

  • MCTS for Computer Go 23

    A 9x9 game● The game ends when both players pass.● One black (resp. white) point for each black

    (resp. white) stone and each black (resp. white) « eye » on the board.

    ● One black (resp. white) eye = an empty intersection surrounded by black (resp. white) stones.

  • MCTS for Computer Go 24

    A 9x9 game● Scoring:

    – Black = 44 – White = 37– Komi = 7.5– Score = -0.5

    ● White wins!

  • MCTS for Computer Go 25

    Go ranking: « kyu » and « dan »

    Top professional players

    Average playersl

    Very beginners

    Beginners

    Strong players

    Very strong players

    Pro ranking Amateur ranking

    9 dan

    1 dan

    30 kyu

    20 kyu

    10 kyu

    1 kyu1 dan

    6 dan

    9 dan

  • MCTS for Computer Go 26

    Computer Go (old history)● First go program (Lefkovitz 1960)● Zobrist hashing (Zobrist 1969)● Interim2 (Wilcox 1979)● Life and death model (Benson 1988)● Patterns: Goliath (Boon 1990)● Mathematical Go (Berlekamp 1991)● Handtalk (Chen 1995)

  • MCTS for Computer Go 27

    The old approach● Evaluation of non terminal positions

    – Knowledge-based– Breaking-down of a position into sub-

    positions● Fixed-depth global tree search

    – Depth = 0 : action with the best value– Depth = 1: action leading to the position

    with the best evaluation– Depth > 1: alfa-beta or minmax

  • MCTS for Computer Go 28

    The old approachCurrent position

    Evaluation of non terminal positions

    Terminal positions

    Bounded depth Tree search

    361

    2 or 3

    Huhu?

  • MCTS for Computer Go 29

    Position evaluation● Break-down

    – Whole game (win/loss or score)– Goal-oriented sub-game

    ● String capture● Connections, dividers, eyes, life and death

    ● Local searches– Alpha-beta and enhancements– Proof-number search

  • MCTS for Computer Go 30

    A 19x19 middle-game position

  • MCTS for Computer Go 31

    A possible black break-down

  • MCTS for Computer Go 32

    A possible white break-down

  • MCTS for Computer Go 33

    Possible local evaluations (1)

    Alive and territory

    unstable

    alive

    dead

    alive

    Not important

  • MCTS for Computer Go 34

    Possible local evaluations (2)alive

    unstable

    unstablealive + big territory

    unstable

  • MCTS for Computer Go 35

    Position evaluation● Local results

    – Obtained with local tree search– Result if white plays first (resp. black)– Combinatorial game theory (Conway)– Switches {a|b}, >,

  • MCTS for Computer Go 36

    Position evaluation

  • MCTS for Computer Go 37

    Drawbacks (1/2)● The break-down is not unique● Performing a (wrong) local tree search on a

    (possibly irrelevant) local position● Misevaluating the size of the local position● Different kinds of local information

    – Symbolic (group: dead alive unstable)– Numerical (territory size, reduction,

    increase)

  • MCTS for Computer Go 38

    Drawbacks (2/2)● Local positions interact● Complicated● Domain-dependent knowledge● Need of human expertise● Difficult to program and maintain● Holes of knowledge● Erratic behaviour

  • MCTS for Computer Go 39

    Upsides

    ● Feasible on 1990's computers● Execution is fast

    ● Some specific local tree searches are accurate and fast

  • MCTS for Computer Go 40

    The old approach

    Top professional players

    Average playersl

    Very beginners

    Beginners

    Strong players

    Very strong players

    Pro ranking Amateur ranking

    9 dan

    1 dan

    30 kyu

    20 kyu

    10 kyu10 kyu

    1 kyu1 dan

    6 dan

    9 dan

    Old approach

  • MCTS for Computer Go 41

    End of part one!● Next: the Monte-Carlo approach...

  • MCTS for Computer Go 42

    The Monte-Carlo (MC) approach● Games containing chance

    – Backgammon (Tesauro 1989)

    ● Games with hidden information– Bridge (Ginsberg 2001)– Poker (Billings & al. 2002)– Scrabble (Sheppard 2002)

  • MCTS for Computer Go 43

    The Monte-Carlo approach● Games with complete information

    – A general model (Abramson 1990)

    ● Simulated annealing Go – (Brügmann 1993)– 2 sequences of moves– « all moves as first » heuristic– Gobble on 9x9

  • MCTS for Computer Go 44

    The Monte-Carlo approach● Position evaluation:

    Launch N random gamesEvaluation = mean value of outcomes

    ● Depth-one MC algorithm:For each move m {

    Play m on the ref positionLaunch N random gamesMove value (m) = mean value

    }

  • MCTS for Computer Go 45

    Depth-one Monte-Carlo

    ... ... ...

    ...

  • MCTS for Computer Go 46

    Progressive pruning● (Billings 2002, Sheppard 2002, Bouzy &

    Helmstetter 2003)

    Current best move

    Second best

    Pruned

    Still explored

  • MCTS for Computer Go 47

    Upper bound● Optimism in face of uncertainty

    – Intestim (Kaelbling 1993), – UCB multi-armed bandit (Auer & al 2002)

    Current best promising move

    Second best promising

    Current best proven move

  • MCTS for Computer Go 48

    All-moves-as-first heuristic (1/3)

    A B C D

    rA

    B

    D

    C

  • MCTS for Computer Go 49

    All-moves-as-first heuristic (2/3)

    Actual simulation

    A D

    CB

    A B C D

    r

    o

    A

    C

    B

    D

    A

    B

    D

    C

  • MCTS for Computer Go 50

    All-moves-as-first heuristic (3/3)

    Actual simulation

    Virtual simulation = actual simulation assuming c is played« as first »

    A D

    CB

    A B C D

    r

    o o

    CA

    C A

    B B

    D D

    A

    B

    D

    C

  • MCTS for Computer Go 51

    The Monte-Carlo approach● Upsides

    – Robust evaluation– Global search– Move quality increases with computing power

    ● Way of playing– Good strategical sense but weak tactically

    ● Easy to program– Follow the rules of the game– No break-down problem

  • MCTS for Computer Go 52

    Monte-Carlo and knowledge● Pseudo-random simulations using Go

    knowledge (Bouzy 2003)– Moves played with a probability depending on

    specific domain-dependent knowledge● 2 basic concepts

    – string capture 3x3 shapes

  • MCTS for Computer Go 53

    Monte-Carlo and knowledge

    ● Results are impressive– MC(random)

  • MCTS for Computer Go 54

    Monte-Carlo and knowledge● Pseudo-random player

    – 3x3 pattern urgency table with 38 patterns– Few dizains of relevant patterns only– Patterns gathered by

    ● Human expertise● Reinforcement Learning (Bouzy & Chaslot

    2006)● Warning

    – p1 better than p2 does not mean MC(p1) better than MC(p2)

  • MCTS for Computer Go 55

    Monte-Carlo Tree Search (MCTS)● How to integrate MC and TS ?● UCT = UCB for Trees

    – (Kocsis & Szepesvari 2006)– Superposition of UCB (Auer & al 2002)

    ● MCTS– Selection, expansion, updating (Chaslot & al)

    (Coulom 2006)– Simulation (Bouzy 2003) (Wang & Gelly

    2006)

  • MCTS for Computer Go 56

    MCTS (1/2)while (hasTime) {

    playOutTreeBasedGame()expandTree()outcome = playOutRandomGame()updateNodes(outcome)

    }then choose the node with...

    ... the best mean value

    ... the highest visit number

  • MCTS for Computer Go 57

    MCTS (2/2)PlayOutTreeBasedGame() {

    node = getNode(position)while (node) {

    move=selectMove(node)play(move)node = getNode(position)

    }}

  • MCTS for Computer Go 58

    UCT move selection● Move selection rule to browse the tree:

    move=argmax (s*mean + C*sqrt(log(t)/n)) ● Mean value for exploitation

    – s (=+-1): color to move● UCT bias for exploration

    – C: constant term set up by experiments– t: number of visits of the parent node – n: number of visits of the current node

  • MCTS for Computer Go 59

    Example

    ● 1 iteration

    1

    1/1

  • MCTS for Computer Go 60

    Example● 2 iterations

    0

    1/2

    0/1

  • MCTS for Computer Go 61

    Example● 3 iterations

    1

    2/3

    0/11/1

  • MCTS for Computer Go 62

    Example

    ● 4 iterations2/4

    0/11/10/1

  • MCTS for Computer Go 63

    Example

    ● 5 iterations3/5

    0/11/10/1 1/1

  • MCTS for Computer Go 64

    Example● 6 iterations

    0/11/20/1 1/1

    0/1

    3/6

  • MCTS for Computer Go 65

    Example● 7 iterations 3/7

    0/11/20/1 1/2

    0/1 0/1

  • MCTS for Computer Go 66

    Example● 8 iterations 4/8

    0/12/30/1 1/2

    0/1 0/11/1

  • MCTS for Computer Go 67

    Example● 9 iterations 4/9

    0/10/1 1/2

    0/1 0/11/1

    2/4

    0/1

  • MCTS for Computer Go 68

    Example● 10 iterations

    5/10

    0/12/40/1 2/3

    0/1 0/11/10/1 1/1

  • MCTS for Computer Go 69

    Example● 11 iterations 6/11

    0/12/40/1 3/4

    0/1 0/11/10/1 1/1 1/1

  • MCTS for Computer Go 70

    Example● 12 iterations

    7/12

    0/12/40/1 4/5

    0/1 1/21/10/1 1/1 1/1

    1/1

  • MCTS for Computer Go 71

    Example● Clarity

    – C = 0● Notice

    – with C != 0 a node cannot stay unvisited– min or max rule according to the node depth– not visited children have an infinite mean

    ● Practice– Mean initialized optimistically

  • MCTS for Computer Go 72

    MCTS enhancements● The raw version can be enhanced

    – Tuning UCT C value– Outcome = score or win loss info (+1/-1)– Doubling the simulation number– RAVE – Using Go knowledge

    ● In the tree or in the simulations– Speed-up

    ● Optimizing, pondering, parallelizing

  • MCTS for Computer Go 73

    Assessing an enhancement● Self-play

    – The new version vs the reference version– % wins with few hundred games – 9x9 (or 19x19 boards)

    ● Against differently designed programs– GTP (Go Text Protocol)– CGOS (Computer Go Operating System)

    ● Competitions

  • MCTS for Computer Go 74

    Move selection formula tuning

    ● Using UCB– Best value for C ?– 60-40%

    ● Using « UCB-tuned » (Auer & al 2002)– C replaced by min(1/4,variance)– 55-45%

  • MCTS for Computer Go 75

    Exploration vs exploitation

    ● General idea: explore at the beginning and exploit in the end of thinking time

    ● Diminishing C linearly in the remaining time– (Vermorel & al 2005)– 55-45%

    ● At the end:– Argmax over the mean value or over the

    number of visits ?– 55-45%

  • MCTS for Computer Go 76

    Kind of outcome

    ● 2 kinds of outcomes– Score (S) or win loss information (WLI) ?– Probability of winning or expected score ?– Combining both (S+WLI) (score +45 if win)

    ● Results – WLI vs S 65-35%– S+WLI vs S 65-35%

  • MCTS for Computer Go 77

    Doubling the number of simulations

    ● N = 100,000

    ● Results – 2N vs N 60-40%– 4N vs 2N 58-42%

  • MCTS for Computer Go 78

    Tree management

    ● Transposition tables– Tree -> Directed Acyclic Graph (DAG)– Different sequences of moves may lead to

    the same position– Interest for MC Go: merge the results– Result: 60-40%

    ● Keeping the tree from one move to the next– Result: 65-35%

  • MCTS for Computer Go 79

    RAVE (1/3)

    ● Rapid Action Value Estimation– Mogo 2007– Use the AMAF heuristic (Brugmann 1993)– There are « many » virtual sequences that

    are transposed from the actually played sequence

    ● Result: – 70-30%

  • MCTS for Computer Go 80

    RAVE (2/3)● AMAF heuristic● Which nodes to update?● Actual

    – Sequence ACBD– Nodes

    ● Virtual– BCAD, ADBC, BDAC– Nodes

    B

    A B

    B

    DD

    D C

    CC

    AA

  • MCTS for Computer Go 81

    RAVE (3/3)● 3 variables

    – Usual mean value Mu– AMAF mean value Mamaf– M = β Mamaf + (1-β) Mu– β = sqrt(k/(k+3N))– K set up experimentally

    ● M varies from Mamaf

    to Mu

  • MCTS for Computer Go 82

    Knowledge in the simulations

    ● High urgency for...– capture/escape 55-45%– 3x3 patterns 60-40%– Proximity rule 60-40%

    ● Mercy rule– Interrupt the game when the difference of

    captured stones is greater than a threshold (Hillis 2006)

    – 51-49%

  • MCTS for Computer Go 83

    Knowledge in the tree

    ● Virtual wins for good looking moves● Automatic acquisition of patterns of pro

    games (Coulom 2007) (Bouzy & Chaslot 2005)● Matching has a high cost● Progressive widening (Chaslot & al 2008)● Interesting under strong time constraints ● Result: 60-40%

  • MCTS for Computer Go 84

    Speeding up the simulations

    ● Fully random simulations (2007)– 50,000 game/second (Lew 2006)– 20,000 (commonly eared)– 10,000 (my program)

    ● Pseudo-random– 5,000 (my program in 2007)

    ● Rough optimization is worthwhile

  • MCTS for Computer Go 85

    Pondering

    ● Think on the opponent time– 55-45%– Possible doubling of thinking time– The move of the opponent may not be

    the planned move on which you think– Side effect: play quickly to think on the

    opponent time

  • MCTS for Computer Go 86

    Summing up the enhancements● MCTS with all enhancements vs raw MCTS

    – Exploration and exploitation: 60-40%– Win/loss outcome: 65-35%– Rough optimization of simulations 60-40%– Transposition table 60-40%– RAVE 70-30%– Knowledge in the simulations 70-30%– Knowledge in the tree 60-40%– Pondering 55-45%– Parallelization 70-30%

    ● Result: 99-1%

  • MCTS for Computer Go 87

    Parallelization● Computer Chess: Deep Blue● Multi-core computer

    – Symmetric MultiProcessor (SMP)– one thread per processor– shared memory, low latency– mutual exclusion (mutex) mechanism

    ● Cluster of computers– Message Passing Information (MPI)

  • MCTS for Computer Go 88

    Parallelization

    while (hasTime) {playOutTreeBasedGame()expandTree()outcome = playOutRandomGame()updateNodes(outcome)

    }

  • MCTS for Computer Go 89

    Leaf parallelization

  • MCTS for Computer Go 90

    Leaf parallelization● (Cazenave Jouandeau 2007)● Easy to program● Drawbacks

    – Wait for the longest simulation– When part of the simulation outcomes is a

    loss, performing the remaining may not be a relevant strategy.

  • MCTS for Computer Go 91

    Root parallelization

  • MCTS for Computer Go 92

    Root parallelization

    ● (Cazenave Jouandeau 2007)● Easy to program● No communication● At completion, merge the trees● 4 MCTS for 1sec > 1 MCTS for 4 sec● Good way for low time settings and a small

    number of threads

  • MCTS for Computer Go 93

    Tree parallelization – global mutex

  • MCTS for Computer Go 94

    Tree parallelization – local mutex

  • MCTS for Computer Go 95

    Tree parallelization● One shared tree, several threads● Mutex

    – Global: the whole tree has a mutex– Local: each node has a mutex

    ● « Virtual loss »– Given to a node browsed by a thread– Removed at update stage– Preventing threads from similar simulations

  • MCTS for Computer Go 96

    Computer-computer results● Computer Olympiads

    19x19 9x9– 2010 Erica, Zen, MFGo MyGoFriend– 2009 Zen, Fuego, Mogo Fuego– 2008 MFGo, Mogo, Leela MFGo– 2007 Mogo, CrazyStone, GNU Go Steenvreter– 2006 GNU Go, Go Intellect, Indigo CrazyStone– 2005 Handtalk, Go Intellect, Aya Go Intellect– 2004 Go Intellect, MFGo, Indigo Go Intellect

  • MCTS for Computer Go 97

    Human-computer results● 9x9

    – 2009: Mogo won a pro with black– 2009: Fuego won a pro with white

    ● 19x19:– 2008: Mogo won a pro with 9 stones

    Crazy Stone won a pro with 8 stones Crazy Stone won a pro with 7 stones

    – 2009: Mogo won a pro with 6 stones

  • MCTS for Computer Go 98

    MCTS and the old approach

    Top professional players

    Average players

    Very beginners

    Beginners

    Strong players

    Very strong players

    Pro ranking Amateur ranking

    9 dan

    1 dan

    30 kyu

    20 kyu

    10 kyu

    1 kyu1 dan

    6 dan

    9 dan

    MCTS

    Old approach

    9x9 go

    19x19 go

  • MCTS for Computer Go 99

    Computer Go (MC history)

    ● Monte-Carlo Go (Brugmann 1993)● MCGo devel. (Bouzy & Helmstetter 2003)● MC+knowledge (Bouzy 2003)● UCT (Kocsis & Szepesvari 2006)● Crazy Stone (Coulom 2006)● Mogo (Wang & Gelly 2006)

  • MCTS for Computer Go 100

    Conclusion● Monte-Carlo brought a Big

    improvement in Computer Go over the last decade!

    – No old approach based program anymore!– All go programs are MCTS based!– Professional level on 9x9!– Dan level on 19x19!

    ● Unbelievable 10 years ago!

  • MCTS for Computer Go 101

    Some references

    ● PhD, MCTS and Go (Chaslot 2010)● PhD, Reinf. Learning and Go (Silver 2010)● PhD, R. Learning: applic. to Go (Gelly 2007)● UCT (Kocsis & Szepesvari 2006)● 1st MCTS go program (Coulom 2006)

  • MCTS for Computer Go 102

    Web links

    ● http://www.grappa.univ-lille3.fr/icga/● http://cgos.boardspace.net/● http://www.gokgs.com/● http://www.lri.fr/~gelly/MoGo.htm● http://remi.coulom.free.fr/CrazyStone/● http://fuego.sourceforge.net/● ...

    http://www.grappa.univ-lille3.fr/icga/http://cgos.boardspace.net/http://www.gokgs.com/http://www.lri.fr/~gelly/MoGo.htmhttp://remi.coulom.free.fr/CrazyStone/http://fuego.sourceforge.net/

  • MCTS for Computer Go 103

    Thank you for your attention!

    Diapo 1Diapo 2Diapo 3Diapo 4Diapo 5Diapo 6Diapo 7Diapo 8Diapo 9Diapo 10Diapo 11Diapo 12Diapo 13Diapo 14Diapo 15Diapo 16Diapo 17Diapo 18Diapo 19Diapo 20Diapo 21Diapo 22Diapo 23Diapo 24Diapo 25Diapo 26Diapo 27Diapo 28Diapo 29Diapo 30Diapo 31Diapo 32Diapo 33Diapo 34Diapo 35Diapo 36Diapo 37Diapo 38Diapo 39Diapo 40Diapo 41Diapo 42Diapo 43Diapo 44Diapo 45Diapo 46Diapo 47Diapo 48Diapo 49Diapo 50Diapo 51Diapo 52Diapo 53Diapo 54Diapo 55Diapo 56Diapo 57Diapo 58Diapo 59Diapo 60Diapo 61Diapo 62Diapo 63Diapo 64Diapo 65Diapo 66Diapo 67Diapo 68Diapo 69Diapo 70Diapo 71Diapo 72Diapo 73Diapo 74Diapo 75Diapo 76Diapo 77Diapo 78Diapo 79Diapo 80Diapo 81Diapo 82Diapo 83Diapo 84Diapo 85Diapo 86Diapo 87Diapo 88Diapo 89Diapo 90Diapo 91Diapo 92Diapo 93Diapo 94Diapo 95Diapo 96Diapo 97Diapo 98Diapo 99Diapo 100Diapo 101Diapo 102Diapo 103