Meta Monte-Carlo Tree Search

- Go is not solved beyond 6x6.
- We build an opening book for 7x7, to approximately solve 7x7 Go.
- Widely believed: perfect play is a draw with komi 9.

Our tools
1) Monte-Carlo Tree Search
2) Meta-Monte-Carlo Tree Search
3) Senseis' partial solution (using in particular Davies' work)

Monte-Carlo Tree Search (Coulom 2006; Kocsis & Szepesvari 2006)

= combining tree search and Monte-Carlo evaluation

Coulom (06)
Chaslot, Saito & Bouzy (06)
Kocsis & Szepesvari (06)
UCT (Upper Confidence Trees)

UCT

Kocsis & Szepesvari (06)

Exploitation ...

SCORE = 5/7 + k.sqrt( log(10)/7 )

... or exploration?

SCORE = 0/2 + k.sqrt( log(10)/2 )
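
The two scores above can be compared numerically. The sketch below is only an illustration: it assumes the natural logarithm, and the function name `uct_score` and the values tried for the constant k are not from the slides.

```python
import math

def uct_score(wins, visits, parent_visits, k):
    """Win rate plus an exploration bonus weighted by k (natural log assumed)."""
    return wins / visits + k * math.sqrt(math.log(parent_visits) / visits)

# The two candidate moves from the slides: 5 wins out of 7, and 0 wins out of 2,
# with 10 simulations at the parent node.
for k in (1.0, 2.0):  # hypothetical values of k
    exploit = uct_score(5, 7, 10, k)
    explore = uct_score(0, 2, 10, k)
    choice = "exploitation" if exploit > explore else "exploration"
    print(f"k={k}: {exploit:.3f} vs {explore:.3f} -> {choice}")
```

Under these assumptions, a small k favours the well-explored move (exploitation) and a larger k favours the rarely tried one (exploration).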

Meta-Monte-Carlo Tree Search
= Monte-Carlo Tree Search with Monte-Carlo replaced by MCTS

I.e.:
MCTS = MC play-outs + Tree Search
Meta-MCTS = MCTS play-outs + Tree Search
Meta-Meta-MCTS = Meta-MCTS play-outs + Tree Search
...
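
The layering can be sketched in code. This is a minimal illustration, not the authors' implementation: it assumes a toy game (take 1 to 3 stones, taking the last one wins) in place of 7x7 Go, lets both sides of a play-out be driven by MCTS, and all function names and parameters are hypothetical.

```python
import math
import random

def legal(stones):
    """Toy game (an assumption): take 1 to 3 stones; taking the last one wins."""
    return [m for m in (1, 2, 3) if m <= stones]

def play_game(stones, choose, rng):
    """Play to the end with `choose`; return 1 if the player to move first wins."""
    turn = 0
    while stones > 0:
        stones -= choose(stones, rng)
        turn ^= 1
    return 1 if turn == 1 else 0

def random_playout(stones, rng):
    """Plain Monte-Carlo play-out: both sides move uniformly at random."""
    return play_game(stones, lambda s, r: r.choice(legal(s)), rng)

def uct_search(stones, n_sims, playout, rng, k=1.0):
    """UCT over the toy game; new leaves are evaluated by `playout`."""
    visits, wins = {}, {}  # statistics per (stones, move), from the mover's side

    def simulate(s):
        if s == 0:
            return 0  # the previous player took the last stone
        moves = legal(s)
        untried = [m for m in moves if (s, m) not in visits]
        if untried:
            m = rng.choice(untried)
            result = 1 - playout(s - m, rng)
        else:
            total = sum(visits[(s, a)] for a in moves)
            m = max(moves, key=lambda a: wins[(s, a)] / visits[(s, a)]
                    + k * math.sqrt(math.log(total) / visits[(s, a)]))
            result = 1 - simulate(s - m)
        visits[(s, m)] = visits.get((s, m), 0) + 1
        wins[(s, m)] = wins.get((s, m), 0) + result
        return result

    for _ in range(n_sims):
        simulate(stones)
    return max(legal(stones), key=lambda m: visits.get((stones, m), 0))

def mcts_playout(stones, rng, inner_sims=30):
    """Meta-level play-out: a whole game in which every move is chosen by MCTS."""
    return play_game(
        stones, lambda s, r: uct_search(s, inner_sims, random_playout, r), rng)

def meta_mcts(stones, n_sims, rng):
    """The same tree search, with MC play-outs replaced by MCTS play-outs."""
    return uct_search(stones, n_sims, mcts_playout, rng)

print(meta_mcts(5, 15, random.Random(0)))
```

The only difference between the two levels is the `playout` argument, which is the point of the definition above: passing `meta_mcts`-style play-outs into `uct_search` once more would give Meta-Meta-MCTS.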

Senseis' partial solution (using in particular Davies' work)

A variation which is not in Senseis' file.
Left: black E3 should be black E5.
Right: corrected version.

EXPERIMENTAL RESULTS

Meta-MCTS learns against an MCTS sparring partner.

We introduce Senseis' variations into this sparring partner during the Meta-MCTS run.
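
One way to picture a sparring partner that accepts injected variations is sketched below. This is an assumption about the mechanism, not a description of MoGoTW: the class `SparringPartner`, its methods, and the move names are all hypothetical, and the fallback stands in for an ordinary MCTS move chooser.

```python
class SparringPartner:
    """Plays injected book lines when they apply, otherwise falls back to search."""

    def __init__(self, fallback):
        self.fallback = fallback  # e.g. an MCTS move chooser (stand-in here)
        self.book = {}            # move-sequence prefix -> forced reply

    def add_variation(self, moves, partner_is_black):
        """Register a line; black plays the even indices, white the odd ones."""
        start = 0 if partner_is_black else 1
        for i in range(start, len(moves), 2):
            self.book[tuple(moves[:i])] = moves[i]

    def choose(self, history):
        """Follow the book if the current history is a known prefix."""
        forced = self.book.get(tuple(history))
        return forced if forced is not None else self.fallback(history)

# Hypothetical usage: the partner plays white and is taught one line mid-run.
partner = SparringPartner(fallback=lambda history: "pass")
partner.add_variation(["D4", "C3", "E5", "E4"], partner_is_black=False)
print(partner.choose(["D4"]))              # book reply
print(partner.choose(["D4", "D3", "E5"]))  # outside the book: fallback
```

Because the book is consulted at every move, lines can be added while the learner keeps running, which matches the idea of introducing variations during the run.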

Learning curve of black by Meta-MCTS:
- X-axis = log2(number of playouts)
- Y-axis = moving average (window size 55) of winning rate in playouts

Playouts = MCTS (it's Meta-MCTS)
Komi = 8.5 (winning with komi 8.5 ensures at least a draw with komi 9)
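
The axes of such a learning curve can be computed as in the sketch below. It assumes each playout outcome is recorded as 1 (win) or 0 (loss) and that points are emitted only once the window is full; the function name `learning_curve` is hypothetical.

```python
import math
from collections import deque

def learning_curve(outcomes, window=55):
    """Return (log2(n), moving-average win rate) pairs once the window is full."""
    recent = deque(maxlen=window)
    total = 0
    points = []
    for n, won in enumerate(outcomes, start=1):
        if len(recent) == window:
            total -= recent[0]  # the oldest outcome is about to be dropped
        recent.append(won)
        total += won
        if len(recent) == window:
            points.append((math.log2(n), total / window))
    return points

# Hypothetical data: 55 wins followed by 55 losses.
points = learning_curve([1] * 55 + [0] * 55)
print(points[0], points[-1])
```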

Learning curve of white by Meta-MCTS:
- X-axis = log2(number of playouts)
- Y-axis = moving average (window size 55) of winning rate in playouts

Playouts = MCTS (it's Meta-MCTS)
Komi = 9.5 (winning with komi 9.5 ensures at least a draw with komi 9)
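
The komi argument for both colours can be checked by brute force. The sketch assumes the board margin (Black's points minus White's) is an integer between -49 and 49; the function name `outcome` is hypothetical.

```python
def outcome(margin, komi):
    """Game result given the board margin (Black minus White) and the komi."""
    diff = margin - komi
    if diff > 0:
        return "black wins"
    if diff < 0:
        return "white wins"
    return "draw"

for margin in range(-49, 50):
    # A Black win at komi 8.5 means margin >= 9, so Black does not lose at komi 9.
    if outcome(margin, 8.5) == "black wins":
        assert outcome(margin, 9) != "white wins"
    # A White win at komi 9.5 means margin <= 9, so White does not lose at komi 9.
    if outcome(margin, 9.5) == "white wins":
        assert outcome(margin, 9) != "black wins"
```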

Decreasing points in the curve = introduction of Senseis' variations in the opponent.

Conclusion = the algorithm did not find all these variations on its own ==> human input needed.

Games against pros.

MoGoTW is black.

MoGoTW is white.

With komi 9.5, MoGoTW won every game as White.
With komi 8.5, MoGoTW won every game as Black.

Exciting!

Were all MoGoTW's moves perfect?

No :-(

In one game (at least) the human might have won.

Left: this game was won by MoGoTW as black. Chun-Yen Lin (2P) made a mistake.

Right: how Chun-Yen Lin (2P) might have won the game.

So there is still at least one variation on which the bot does not play correctly.

We did not manually introduce a correction, but we introduced the variation played by the pro into the sparring partner.

We then see whether the bot can find a solution by itself.

Learning curve as black, after introducing the dangerous variation in the sparring partner.

Time still logarithmic.

CONCLUSIONS

We used Meta-MCTS for building an opening book for 7x7 Go.

We are not aware of any remaining bad moves, which does not mean that none remain.

Meta-MCTS did a good job by itself, but human inputs (= Senseis + games against pros) have been helpful.

Towards exact solving?
= collecting all leaves of the opening book (OB) + solving all of them...
= huge work.
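
The "collecting" half of that plan is simple to sketch; the "solving" half is where the work lies. The example assumes the opening book is stored as nested dictionaries mapping a move to the sub-book that follows it, which is an illustrative layout and not the actual format used; the move names are hypothetical.

```python
def collect_leaves(book, prefix=()):
    """Return every move sequence that ends at a leaf of the opening book."""
    if not book:
        return [prefix]
    leaves = []
    for move, sub_book in book.items():
        leaves.extend(collect_leaves(sub_book, prefix + (move,)))
    return leaves

# Hypothetical three-line book.
book = {"D4": {"C3": {}, "D3": {"E5": {}, "C5": {}}}}
for line in collect_leaves(book):
    print(" ".join(line))
```

Each returned sequence is a position that would still have to be solved exactly, so the number of leaves gives a rough measure of the remaining effort.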

Other conclusions:

- 7x7 can be very hard, even for pros (pros made mistakes).

- MCTS alone is not enough for very strong play in 7x7.