Project Ideas
Algorithmic Evaluations/Comparisons
- Compare variants of (nested) policy rollout using different bandit algorithms
- Compare some variants of Monte-Carlo tree search
- Implement an algorithm from the literature and attempt to replicate results, e.g.
  - Forward Search Sparse Sampling (a type of Monte-Carlo tree search algorithm)
  - Anytime AO*
  - Least-Squares Policy Iteration
  - I could give other pointers depending on interests
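
As a starting point for the first idea, here is a minimal sketch of policy rollout in which the per-action simulation budget is allocated by a pluggable bandit rule. Everything in it is an assumption made for illustration: the toy walk domain, the `step` simulator, the random base policy, and the names `uniform_bandit`, `ucb1_bandit`, and `rollout_action` are hypothetical and are not taken from the course framework.

```python
import math
import random

# Hypothetical toy domain: a walk on positions 0..N; reaching N pays reward 1.
N = 5
ACTIONS = (-1, +1)

def step(state, action, rng):
    """Simulator: the move succeeds with probability 0.8, else it is reversed."""
    move = action if rng.random() < 0.8 else -action
    nxt = min(max(state + move, 0), N)
    done = nxt == N
    return nxt, (1.0 if done else 0.0), done

def base_policy(state, rng):
    """Base policy to be improved by rollout: uniformly random moves."""
    return rng.choice(ACTIONS)

def rollout_return(state, action, horizon, rng):
    """Return of taking `action` once, then following the base policy."""
    state, reward, done = step(state, action, rng)
    total = reward
    for _ in range(horizon - 1):
        if done:
            break
        state, reward, done = step(state, base_policy(state, rng), rng)
        total += reward
    return total

def uniform_bandit(counts, means, t):
    """Pull the arms round-robin (equal budget per action)."""
    return t % len(counts)

def ucb1_bandit(counts, means, t):
    """UCB1: try each arm once, then maximize mean plus exploration bonus."""
    for i, c in enumerate(counts):
        if c == 0:
            return i
    return max(range(len(counts)),
               key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))

def rollout_action(state, bandit, budget, horizon, rng):
    """Spend `budget` rollouts as directed by `bandit`; return best action."""
    counts = [0] * len(ACTIONS)
    means = [0.0] * len(ACTIONS)
    for t in range(budget):
        i = bandit(counts, means, t)
        g = rollout_return(state, ACTIONS[i], horizon, rng)
        counts[i] += 1
        means[i] += (g - means[i]) / counts[i]  # incremental average
    best = max(range(len(ACTIONS)), key=lambda i: means[i])
    return ACTIONS[best], counts

if __name__ == "__main__":
    rng = random.Random(0)
    for bandit in (uniform_bandit, ucb1_bandit):
        print(bandit.__name__, rollout_action(N - 1, bandit, 200, 10, rng))
```

Swapping in another bandit rule only requires a new function with the same `(counts, means, t)` signature, which makes it easy to compare variants under an identical simulation budget.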
Algorithmic Comparisons
- Compare some reinforcement learning algorithms across some interesting problems
  - E.g. compare TD-based vs. policy-gradient-based methods
  - You could use the domains I have in the Java framework for evaluation
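
To make the TD-based vs. policy-gradient comparison concrete, the sketch below trains one representative of each family on the same small problem and evaluates both with the same harness. It is illustrative only: the chain domain, the hyperparameters, and the function names (`q_learning`, `reinforce`, `evaluate`) are assumptions, Q-learning and REINFORCE are just one possible choice per family, and this REINFORCE variant omits a baseline and the per-step discount weighting.

```python
import math
import random

N = 4                # hypothetical chain: states 0..N, start at 0, goal at N
ACTIONS = (-1, +1)
GAMMA = 0.95

def step(state, a_idx, rng):
    """The chosen move succeeds with probability 0.9, else it is reversed."""
    move = ACTIONS[a_idx] if rng.random() < 0.9 else -ACTIONS[a_idx]
    nxt = min(max(state + move, 0), N)
    done = nxt == N
    return nxt, (1.0 if done else 0.0), done

def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def q_learning(episodes, rng, alpha=0.1, eps=0.1, max_steps=50):
    """TD-based: one-step Q-learning with epsilon-greedy exploration."""
    q = [[0.0, 0.0] for _ in range(N + 1)]
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            if rng.random() < eps or q[s][0] == q[s][1]:
                a = rng.randrange(2)          # explore, or break ties randomly
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s2, r, done = step(s, a, rng)
            target = r if done else r + GAMMA * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return [0 if row[0] > row[1] else 1 for row in q]  # greedy policy

def reinforce(episodes, rng, alpha=0.1, max_steps=50):
    """Policy-gradient-based: REINFORCE with a tabular softmax policy."""
    theta = [[0.0, 0.0] for _ in range(N + 1)]
    for _ in range(episodes):
        s, traj = 0, []
        for _ in range(max_steps):
            probs = softmax(theta[s])
            a = 0 if rng.random() < probs[0] else 1
            s2, r, done = step(s, a, rng)
            traj.append((s, a, r))
            s = s2
            if done:
                break
        # Accumulate return-weighted log-probability gradients, then update.
        g = 0.0
        delta = [[0.0, 0.0] for _ in range(N + 1)]
        for st, a, r in reversed(traj):
            g = r + GAMMA * g                  # discounted return from st
            probs = softmax(theta[st])
            for b in range(2):
                delta[st][b] += g * ((1.0 if b == a else 0.0) - probs[b])
        for st in range(N + 1):
            for b in range(2):
                theta[st][b] += alpha * delta[st][b]
    return [0 if row[0] > row[1] else 1 for row in theta]  # most likely action

def evaluate(policy, episodes, rng, max_steps=50):
    """Average undiscounted return of a deterministic policy from state 0."""
    total = 0.0
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            s, r, done = step(s, policy[s], rng)
            total += r
            if done:
                break
    return total / episodes

if __name__ == "__main__":
    rng = random.Random(0)
    for learner in (q_learning, reinforce):
        print(learner.__name__, evaluate(learner(2000, rng), 500, rng))
```

A real comparison would repeat this over several seeds and domains and report learning curves rather than a single final score.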
Solve a Particular Problem
- Pick a challenging sequential decision making problem
  - Apply one or more of our planning/learning approaches to it and evaluate
- Problems from past projects:
  - Games
    - Tetris
    - Pokemon
    - Blokus
    - Chess
    - Backgammon
    - Othello
    - Clue
    - Space Wars (Galcon Fusion)
    - Starcraft
    - Pac Man
Solve a Particular Problem
- Problems from past projects:
  - Compiler scheduling
  - Adaptive Java program optimization
  - Forest Fire Management
  - Crop Management
  - Optimizing Policies for Network Protocols
  - Controllers for Real-Time Strategy Games
    - Subproblems of the game
  - Optimizing file sharing policies
- Reinforcement learning and Monte-Carlo were the most commonly applied solution approaches