![Page 1: Monte Carlo Tree Search A Tutorial Peter Cowling University of York, UK peter.cowling@york.ac.uk Simon Lucas University of Essex, UK sml@essex.ac.uk Supported.](https://reader035.fdocuments.us/reader035/viewer/2022062511/5518a3c8550346b31f8b499b/html5/thumbnails/1.jpg)
Monte Carlo Tree Search: A Tutorial
Peter Cowling, University of York, UK
Simon Lucas, University of Essex, UK
Supported by the Engineering and Physical Sciences Research Council (EPSRC)
Further Reading
Browne C., Powley E., Whitehouse D., Lucas S., Cowling P.I., Rohlfshagen P., Tavener S., Perez D., Samothrakis S. and Colton S. (2012): "A Survey of Monte Carlo Tree Search Methods", IEEE Transactions on Computational Intelligence and AI in Games, 4 (1): 1-43.
Monte Carlo Tree Search (MCTS): Why know more?
Go:
Strong/World Champion MCTS players in: General Game Playing, Hex, Amazons, Arimaa, Khet, Shogi, Mancala, Morpion, Blokus, Focus, Chinese Checkers, Yavalath, Connect Four, Gomoku
Over 250 papers since 2006 - about one per week on average.
Promise for planning, real-time games, nondeterministic games, single-player puzzles, optimisation, simulation, PCG, …
Tutorial Structure
• Monte Carlo search
  – The multi-armed bandit problem
• Monte Carlo Tree Search (MCTS)
  – Selection, expansion, simulation, backpropagation
• Enhancements to the basic algorithm
• Handling uncertainty in MCTS
• MCTS for real-time games
• Conclusions, future research directions
Monte Carlo Search for Decision Making in Games
No trees yet …
The Multi-Armed Bandit Problem
At each step pull one arm
Noisy/random reward signal
In order to:
• Find the best arm
• Minimise regret
• Maximise expected return
Which Arm to Pull?
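• Flat Monte Carlo: share trials uniformly between arms
• ε-Greedy: with probability 1 − ε pull the best arm so far; with probability ε pull a random arm
• UCB1 (Auer et al (2002)): choose arm j so as to maximise

$$\bar{X}_j + \sqrt{\frac{2 \ln n}{n_j}}$$

where $\bar{X}_j$ is the mean reward of arm j so far, $n_j$ is the number of times arm j has been pulled and $n$ is the total number of pulls. The first term is the mean so far; the second is an upper bound on variance.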
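As a hedged illustration, the sketch below runs UCB1 on three simulated arms. The win probabilities, the `run_bandit` helper and the Bernoulli (win/loss) reward model are assumptions made for this example; only the selection rule, mean so far plus sqrt(2 ln n / n_j), comes from UCB1 itself.

```python
import math
import random


def ucb1_choose(counts, sums):
    """Return the index of the arm UCB1 would pull next."""
    # Pull every arm once first, so no count is zero in the formula below.
    for j, n_j in enumerate(counts):
        if n_j == 0:
            return j
    n = sum(counts)
    # Mean so far plus the exploration bonus sqrt(2 ln n / n_j).
    return max(
        range(len(counts)),
        key=lambda j: sums[j] / counts[j]
        + math.sqrt(2.0 * math.log(n) / counts[j]),
    )


def run_bandit(win_probs, pulls, rng):
    """Play `pulls` steps on Bernoulli arms (assumed) and return pull counts."""
    counts = [0] * len(win_probs)
    sums = [0.0] * len(win_probs)
    for _ in range(pulls):
        j = ucb1_choose(counts, sums)
        reward = 1.0 if rng.random() < win_probs[j] else 0.0
        counts[j] += 1
        sums[j] += reward
    return counts


# Hypothetical arms: the third one is the best.
counts = run_bandit([0.2, 0.5, 0.8], pulls=2000, rng=random.Random(0))
```

Most pulls should end up on the best arm, while the weaker arms still receive an occasional trial because their bonus term grows as they are neglected.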
Game Decisions
[Figure: the current position with moves A, B and C leading to the position after each move, mapped onto a multi-armed bandit with arms A, B and C. Simulation result: +1 = win, 0 = loss.]
Discussion
• Anytime – stop whenever you like
• UCB1 formula minimises regret
  – Grows like log(n)
• Needs only game rules:
  – Move generation
  – Terminal state evaluation
• Surprisingly effective, but …
  … doesn’t look ahead
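To make the "needs only game rules" point concrete, here is a minimal sketch of flat Monte Carlo move selection. The toy game (a Nim variant: take 1 to 3 stones, whoever takes the last stone wins) and all function names are assumptions chosen for illustration, not part of the tutorial.

```python
import random


def legal_moves(stones):
    """Move generation: take 1, 2 or 3 stones (toy Nim rules, assumed)."""
    return [k for k in (1, 2, 3) if k <= stones]


def random_playout(stones, player_to_move, rng):
    """Play uniformly random moves to the end; return the winner (0 or 1).

    Requires stones > 0 on entry.
    """
    while True:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return player_to_move  # whoever takes the last stone wins
        player_to_move = 1 - player_to_move


def flat_monte_carlo(stones, player, trials_per_move, rng):
    """Share trials uniformly between moves; return (best move, its mean)."""
    best_move, best_mean = None, -1.0
    for move in legal_moves(stones):
        remaining = stones - move
        wins = 0
        for _ in range(trials_per_move):
            if remaining == 0:
                wins += 1  # this move already took the last stone
            elif random_playout(remaining, 1 - player, rng) == player:
                wins += 1
        mean = wins / trials_per_move
        if mean > best_mean:
            best_move, best_mean = move, mean
    return best_move, best_mean


move, mean = flat_monte_carlo(3, 0, 50, random.Random(1))
```

Each move is scored only by random playouts, so the opponent is implicitly assumed to play randomly; that is the sense in which this approach does not look ahead.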
MCTS / UCT Basics
Attractive Features
• Anytime
• Scalable
  – Tackle complex games better than before
  – May be logarithmically better with increased CPU
• No need for heuristic function
  – Though usually better with one
• Next we’ll look at:
  – General MCTS
  – UCT in particular
MCTS: the main idea
• Tree policy: choose which node to expand (not necessarily leaf of tree)
• Default (simulation) policy: random playout until end of game
MCTS Algorithm
• Decompose into 6 parts:
• MCTS main algorithm
  – Tree policy
    • Expand
    • Best Child (UCT Formula)
  – Default Policy
  – Back-propagate
• We’ll run through these then show demos
MCTS Main Algorithm
• BestChild simply picks the best child node of the root according to some criterion, e.g. best mean value
• In our pseudo-code BestChild is called from TreePolicy and from MctsSearch, but different versions can be used
  – E.g. final selection can be the max value child or the most frequently visited one
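The two final-selection rules mentioned above can give different answers. A small sketch, assuming a hypothetical `ChildStats` record with a visit count and a total reward for each child of the root:

```python
from dataclasses import dataclass


@dataclass
class ChildStats:
    """Statistics for one child of the root (hypothetical record type)."""
    move: str
    visits: int
    total_reward: float


def max_child(children):
    """Final selection by highest mean value (assumes visits > 0)."""
    return max(children, key=lambda c: c.total_reward / c.visits)


def most_visited_child(children):
    """Final selection by highest visit count."""
    return max(children, key=lambda c: c.visits)


# Made-up numbers: B has the best mean but very few visits.
children = [
    ChildStats("A", visits=100, total_reward=60.0),
    ChildStats("B", visits=5, total_reward=4.0),
    ChildStats("C", visits=40, total_reward=18.0),
]
```

Here `max_child` returns B (mean 0.8 from only 5 visits) while `most_visited_child` returns A, which is one reason the visit-count rule is often treated as the more cautious choice.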
TreePolicy
• Note that the node selected for expansion does not need to be a leaf of the tree
  – The nonterminal test refers to the game state
Expand
• Note: need to choose an appropriate data structure for the children (array, ArrayList, Map)
Best Child (UCT)
• This is the standard UCT equation
  – Used in the tree
• Higher values of c lead to more exploration
• Other terms can be added, and usually are
  – More on this later
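In the form given in the Browne et al. (2012) survey, the UCT value of a child v' of node v is $Q(v')/N(v') + c\sqrt{2 \ln N(v) / N(v')}$, with Q the total reward and N the visit count. The sketch below evaluates it on made-up statistics; the tuple layout and the numbers are assumptions for illustration.

```python
import math


def uct_value(child_reward, child_visits, parent_visits, c):
    """Mean reward plus c * sqrt(2 ln N(parent) / N(child))."""
    exploit = child_reward / child_visits
    explore = c * math.sqrt(2.0 * math.log(parent_visits) / child_visits)
    return exploit + explore


def best_child(children, parent_visits, c):
    """children: list of (move, total_reward, visits), each with visits > 0."""
    best = max(children, key=lambda ch: uct_value(ch[1], ch[2], parent_visits, c))
    return best[0]


# Made-up statistics: A looks best on average, B is barely explored.
children = [("A", 60.0, 100), ("B", 2.0, 5), ("C", 18.0, 40)]
greedy_pick = best_child(children, 145, c=0.0)
exploring_pick = best_child(children, 145, c=1 / math.sqrt(2))
```

With c = 0 the choice is purely greedy and picks A; with c = 1/√2 the rarely visited B wins, which is the exploration effect described above.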
DefaultPolicy
• Each time a new node is added to the tree, the default policy randomly rolls out from the current state until a terminal state of the game is reached
• The standard is to do this uniformly randomly
  – But better performance may be obtained by biasing with knowledge
Backup
• Note that v is the new node added to the tree by the tree policy
• Back up the values from the added node up the tree to the root
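Putting the six parts together, the following is a compact sketch of UCT, not the tutorial's own pseudo-code. The function names mirror the ones used above; the toy game (Nim: take 1 to 3 stones, last stone wins), the `Node` fields, the default c = 1/√2 and the most-visited final selection are all assumptions made to keep the example self-contained.

```python
import math
import random


class Node:
    def __init__(self, stones, player_to_move, parent=None, move=None):
        self.stones = stones                  # game state: stones left
        self.player_to_move = player_to_move  # 0 or 1
        self.parent = parent
        self.move = move                      # move that led to this node
        self.children = []
        self.untried = [k for k in (1, 2, 3) if k <= stones]
        self.visits = 0
        self.reward = 0.0  # wins for the player who moved INTO this node


def expand(node):
    """Add one child for an untried move and return it."""
    move = node.untried.pop()
    child = Node(node.stones - move, 1 - node.player_to_move, node, move)
    node.children.append(child)
    return child


def best_child(node, c):
    """UCT: mean reward plus c * sqrt(2 ln N(parent) / N(child))."""
    return max(
        node.children,
        key=lambda ch: ch.reward / ch.visits
        + c * math.sqrt(2.0 * math.log(node.visits) / ch.visits),
    )


def tree_policy(node, c):
    while node.stones > 0:  # nonterminal test on the game state
        if node.untried:    # not fully expanded: add a new node here
            return expand(node)
        node = best_child(node, c)
    return node


def default_policy(stones, player_to_move, rng):
    """Uniformly random playout; returns the winner (0 or 1)."""
    last_mover = 1 - player_to_move  # if already terminal, previous mover won
    while stones > 0:
        stones -= rng.choice([k for k in (1, 2, 3) if k <= stones])
        last_mover = player_to_move
        player_to_move = 1 - player_to_move
    return last_mover


def backup(node, winner):
    """Propagate the result from the new node up to the root."""
    while node is not None:
        node.visits += 1
        if winner != node.player_to_move:  # the mover into this node won
            node.reward += 1.0
        node = node.parent


def mcts_search(stones, player, iterations, rng, c=1 / math.sqrt(2)):
    root = Node(stones, player)
    for _ in range(iterations):
        v = tree_policy(root, c)
        winner = default_policy(v.stones, v.player_to_move, rng)
        backup(v, winner)
    # Final selection: most frequently visited child of the root.
    return max(root.children, key=lambda ch: ch.visits).move


chosen = mcts_search(5, 0, 3000, random.Random(0))
```

From 5 stones the winning move in this toy game is to take 1 (leaving a multiple of 4), and the search should settle on it. Storing each node's reward from the point of view of the player who moved into it lets `best_child` maximise at every level without a separate min step.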
MCTS Builds Asymmetric Trees (demo)
Connect 4 Demo
Enhancements to the Basic MCTS Algorithm
Tree Policy
Default Policy
Domain: independent or specific
Selection/Expansion Enhancements (just a sample …)
• AMAF / RAVE
• First Play Urgency
• Learning
  – E.g. Temporal Difference Learning to bias tree policy or default policy
• Bandit enhancements
  – E.g. Bayesian selection
• Parameter Tuning
All Moves As First (AMAF), Rapid Action Value Estimation (RAVE)
• Additional term in UCT equation:
  – Treat actions / moves the same independently of where they occur in the move sequence
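One common way to write the combined estimate, following the RAVE formulation of Gelly and Silver and offered here only as an assumed illustration of the idea, blends the ordinary mean $Q(s,a)$ with the AMAF mean $\tilde{Q}(s,a)$, which is collected from every playout in which move a was played at any later point:

$$Q_{\mathrm{RAVE}}(s,a) = (1-\beta)\,Q(s,a) + \beta\,\tilde{Q}(s,a), \qquad 0 \le \beta \le 1$$

The weight $\beta$ starts near 1, when few direct samples of a exist, and is reduced towards 0 as the node is visited more often, so the fast but biased AMAF estimate gives way to the ordinary one.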
Simulation Enhancements
• Nudge the simulation to be more realistic
  – Without introducing bias
  – At very low CPU cost
• Examples:
  – Move-Average Sampling Technique (MAST)
  – Last Good Reply (LGR)
  – Patterns e.g. Bridge Completion
  – Heuristic value functions
• Paradoxically, a better simulation policy sometimes yields weaker play
Hex Demo
Parallelisation
Handling Uncertainty and Incomplete Information in MCTS
Probabilistic (stochastic) nodes
[Figure: game tree with MAX, MIN, MAX and CHANCE levels]
Probabilistic (stochastic) planning
Decision-making/optimisation under uncertainty
Hidden Information
E.g. Poker, Bridge, Security
[Figure: an actual state, the corresponding observation, and the information set { …, …, … } of states consistent with that observation]
Effects of Uncertainty and Hidden Information on the Game Tree
4 possible plays by me
50 possible random card draws
$^{40}C_3 = 9880$ different opponent plays
...
[Figure labels: choose subset, choose subset]
Reduced Branching Through Determinization
4 possible plays by me
1 possible random card draw
$^{4}C_3 = 4$ different opponent plays
...
[Yoon, Fern & Givan 2007]
Use average results over many determinizations
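The size of the reduction is easy to check. The short sketch below recomputes the two binomial coefficients and, under the assumption that the three layers (my play, the card draw, the opponent's play) simply multiply, compares the number of branches per round before and after determinization.

```python
import math

# Opponent plays: choose 3 cards from 40 unknown ones, or from a known 4.
opponent_plays_full = math.comb(40, 3)
opponent_plays_det = math.comb(4, 3)

# Branches over one round, assuming the three layers multiply.
full_tree = 4 * 50 * opponent_plays_full
determinized = 4 * 1 * opponent_plays_det
```

Under that assumption the full tree has 1,976,000 branches per round against 16 for a single determinization, which is why averaging over many determinizations can still be much cheaper than searching the full tree.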
Determinization
• Sample states from the information set
• Analyse the individual perfect information games
• Combine the results at the end
• Successes
  – Bridge (Ginsberg), Scrabble (Sheppard)
  – Klondike solitaire (Bjarnason, Fern and Tadepalli)
  – Probabilistic planning (Yoon, Fern and Givan)
• Problems
  – Never tries to gather or hide information
  – Suffers from strategy fusion and non-locality (Frank and Basin)

M.L. Ginsberg, “GIB: Imperfect information in a computationally challenging game,” Journal of Artificial Intelligence Research, vol. 14, 2002, pp. 313–368.
R. Bjarnason, A. Fern, and P. Tadepalli, “Lower bounding Klondike solitaire with Monte-Carlo planning,” Proc. ICAPS-2009, pp. 26–33.
S. Yoon, A. Fern, and R. Givan, “FF-Replan: A Baseline for Probabilistic Planning,” Proc. ICAPS-2007, pp. 352–359.
I. Frank and D. Basin, “Search in games with incomplete information: a case study using Bridge card play,” Artificial Intelligence, vol. 100, 1998, pp. 87–123.
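The sample / analyse / combine steps of determinization can be sketched as follows. Everything concrete here is an assumption for illustration: the information set is a list of possible hidden opponent cards, and `evaluate` is a stand-in for a full perfect-information search (minimax or UCT) of the sampled state.

```python
import random
from collections import defaultdict


def determinized_choice(information_set, moves, evaluate, samples, rng):
    """Sample states, score every move in each one, average the scores."""
    totals = defaultdict(float)
    for _ in range(samples):
        state = rng.choice(information_set)        # Sample
        for move in moves:
            totals[move] += evaluate(state, move)  # Analyse
    return max(moves, key=lambda m: totals[m] / samples)  # Combine


# Toy setting (assumed): the opponent holds one of these cards, and my
# card wins if it is higher than theirs.
possible_hidden_cards = [2, 3, 7, 9]
my_cards = [4, 8]


def beats(hidden_card, my_card):
    return 1.0 if my_card > hidden_card else 0.0


choice = determinized_choice(
    possible_hidden_cards, my_cards, beats, samples=200, rng=random.Random(0)
)
```

Because each sampled state is analysed as if it were fully known, this scheme can never prefer a move whose only benefit is to gather or hide information, which is the first problem listed above.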
Cheating
• Easiest approach to AI for games with imperfect information: cheat and look at the hidden information and outcomes of future chance events
• This gives a deterministic game of perfect information, which can be searched with standard techniques (minimax, UCT, …)
• Obviously not a reasonable approach in practice, but gives a useful bound on performance
• Much used in commercial games
Information Set Monte Carlo Tree Search (ISMCTS)
Collect information about the value of a decision in the corresponding information set.
Maintain a single tree
Use each determinization only once
Determinizations restrict search to a subtree of the information set tree.
Lord of the Rings: The Confrontation
• Board game designed by Reiner Knizia
• Hidden information: can’t see the identities of opponent’s pieces
• Simultaneous moves: combat is resolved by both players simultaneously choosing a card
Also Phantom 4,4,4, Dou Di Zhu and several other card games…
MCTS for Real-Time Games
• Limited roll-out budget
  – Heuristic knowledge becomes important
• Action space may be too fine-grained
  – Take Macro-actions
  – Otherwise planning will be very short-term
• May be no terminal node in sight
  – Again, use heuristic
  – Tune simulation depth
• Next-state function may be expensive
  – Consider making a simpler abstraction
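Two of these ideas, macro-actions and a depth-limited roll-out scored by a heuristic, fit in a few lines. This is a sketch under stated assumptions: `step`, `heuristic` and the one-dimensional toy world (a position on a line with a goal at 10) are hypothetical stand-ins for a real game's next-state function and evaluation.

```python
import random


def limited_rollout(state, step, actions, heuristic, depth, repeat, rng):
    """Roll out `depth` macro-actions, then score the state reached."""
    for _ in range(depth):
        action = rng.choice(actions)
        for _ in range(repeat):  # macro-action: repeat one low-level action
            state = step(state, action)
    return heuristic(state)      # no terminal state is required


def step(position, action):
    """Toy next-state function (assumed): move along a line."""
    return position + action


def heuristic(position):
    """Toy heuristic (assumed): negative distance to a goal at 10."""
    return -abs(10 - position)


value = limited_rollout(0, step, [1], heuristic, depth=2, repeat=3,
                        rng=random.Random(0))
```

With a single available action the result is deterministic: two macro-actions of three steps each reach position 6, so the heuristic value is -4. In a real-time setting `depth` and `repeat` are the knobs to tune against the roll-out budget.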
PTSP Demo
Summary
• MCTS: exciting area of research
• Many impressive achievements already
  – With many more to come
• Some interesting directions:
  – Applications beyond games
  – Comparisons with other search methods
  – Heuristic knowledge is important – more work needed on learning it
  – Efficient parameter tuning
  – POMDPs
  – Uncertainty
  – Macro-actions
  – Hybridisation with other CI methods e.g. Evolution
Questions?
Advertising Break
• Sep 28: Essex workshop on AI and Games
  – Strong MCTS Theme
  – Organised by Essex Game Intelligence Group
• Next week: AIGAMEDEV Vienna AI summit
  – Find out what games industry people think
  – Resistance competition and MCTS tutorial (!)