Bachelor's Thesis (Extended Abstract), Faculty of Computer ...
Supervisor: Prof. Hiroshi Hosobe
Applying Q-Learning to a Dot-Eat Game
Takahiro Morita
E-mail: [email protected]
Abstract
This paper proposes UCB fuzzy Q-learning, which combines fuzzy Q-learning with the UCBQ algorithm, and applies it to a dot-eat game. The UCBQ algorithm adapts the UCB algorithm, an action selection method, to Q-learning. The UCB algorithm selects the action with the highest UCB value rather than the highest value estimate. Because the algorithm presupposes that every unselected action is tried so that its value estimate is obtained, few actions remain unselected, which helps prevent convergence to a local solution. The proposed method aims to make learning more efficient by reducing the number of unselected actions and preventing the Q values from settling into a local solution when fuzzy Q-learning is applied to a dot-eat game. This paper applies the proposed method to the dot-eat game Ms. PacMan and presents an experiment to find optimal values for the parameters used in the method. The method is evaluated by comparing its game score with the score obtained by the AI of a previous study. The results show that the proposed method significantly reduced the number of unselected actions.
1. AI AI
2017AI Ponanza AI
AI AlphaGo AI 3AI
Q [1] QQ
QQ [2]
Q Q
Q UCBQ [3]
UCB Q
UCBQUCB [4] Q
UCBUCB
Ms. PacMan
[5] AI
2. Q AI
[2] Q
[6] Q2
1
Ms. PacMan
AI DeLooze [5]
Q2017 Microsoft AI [7] Ms. PacMan
AI Hybrid Reward Architecture
150
Bachelor's Thesis (Extended Abstract), Faculty of Computer and Information Sciences, Hosei University
3. Ms. PacMan Ms.PacMan 1980 PacMan
( 1)
200 400 800 1600
Figure 1. Ms. PacMan
PacMan ( )AI
Ms. PacMan
IEEE CEC
4.
4.1. Q-Learning
Q-learning was proposed by Watkins [1] in 1992.
()
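For orientation, the standard Q-learning update of [1] can be sketched as follows. This is a minimal tabular illustration, not the implementation used in this paper; the names `make_table`, `q_update`, `alpha`, and `gamma`, and the default parameter values, are assumptions introduced here.

```python
from collections import defaultdict


def make_table(num_actions):
    """Return a Q table that maps each state to a list of action values (all 0.0 at first)."""
    return defaultdict(lambda: [0.0] * num_actions)


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one Q-learning step and return the new Q(state, action).

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    alpha (learning rate) and gamma (discount factor) are illustrative defaults.
    """
    best_next = max(q[next_state])  # greedy value of the successor state
    td_error = reward + gamma * best_next - q[state][action]
    q[state][action] += alpha * td_error
    return q[state][action]


if __name__ == "__main__":
    table = make_table(num_actions=2)
    print(q_update(table, "s0", 1, reward=1.0, next_state="s1", alpha=0.5))
```

The table grows lazily, so states never visited cost no memory; this matters little in the toy case but is convenient when the state space is large.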
4.2. Q Q [2] 2008
Q
DeLooze [5] Q Ms. PacMan
Q Q
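One common way to formulate fuzzy Q-learning is to keep one Q vector per fuzzy rule and blend them by membership degree. The sketch below illustrates that general idea only; it is not confirmed to match the formulation of [2] or the agent of DeLooze [5], and the triangular membership function, the function names, and all parameter values are assumptions.

```python
def triangular(x, left, peak, right):
    """Triangular membership degree of x, in [0, 1] (assumes left < peak < right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzy_q(memberships, rule_q, action):
    """Membership-weighted average of the per-rule Q values for one action."""
    total = sum(memberships)
    if total == 0:
        return 0.0
    return sum(m * q[action] for m, q in zip(memberships, rule_q)) / total


def fuzzy_q_update(memberships, rule_q, action, target, alpha=0.1):
    """Move each rule's Q value toward the target in proportion to its membership."""
    total = sum(memberships)
    if total == 0:
        return
    error = target - fuzzy_q(memberships, rule_q, action)
    for m, q in zip(memberships, rule_q):
        q[action] += alpha * (m / total) * error


if __name__ == "__main__":
    # Two hypothetical rules ("near", "far") over a distance input, two actions.
    distance = 7.0
    mu = [triangular(distance, -10, 0, 10), triangular(distance, 0, 10, 20)]
    rules = [[1.0, 0.0], [3.0, 0.0]]
    fuzzy_q_update(mu, rules, action=0, target=4.0)
    print(fuzzy_q(mu, rules, 0))
```

Here `target` stands for the usual Q-learning target (reward plus discounted best next value), computed by the caller.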
4.3. UCB UCB [4] Q Q
UCB
QUCB
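The selection rule of the UCB algorithm can be illustrated with the UCB1 policy of [4]: every untried action is selected once, and afterwards the action with the largest mean reward plus exploration bonus is chosen. The function name `ucb1_select` and the argument layout are assumptions made for this sketch.

```python
import math


def ucb1_select(counts, means):
    """Return the index of the action chosen by UCB1.

    counts[i] is how often action i has been selected so far;
    means[i] is its average observed reward.
    """
    # Any action that has never been selected is tried first.
    for action, n in enumerate(counts):
        if n == 0:
            return action
    total = sum(counts)
    # UCB value: mean reward plus sqrt(2 ln(total) / n_i).
    ucb = [m + math.sqrt(2.0 * math.log(total) / n) for m, n in zip(means, counts)]
    return max(range(len(ucb)), key=ucb.__getitem__)


if __name__ == "__main__":
    print(ucb1_select(counts=[100, 1], means=[0.5, 0.4]))
```

In the example call, the second action has a lower mean but has been tried only once, so its bonus term dominates and it is selected; this is the mechanism that keeps the number of unselected or rarely selected actions small.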
4.4. UCBQ [3] UCB Q
UCBQQ UCB
-greedy
Q
UCB
Q UCBQQ UCB Q
Ms. PacMan
UCB
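A hedged sketch of how UCB-style selection can be layered on a Q table, in the spirit of the UCBQ algorithm [3], is given below. The exact UCB value used in [3] and in this paper is not reproduced here; the bonus term, the class name `UCBQAgent`, the exploration weight `c`, and the other defaults are assumptions.

```python
import math
from collections import defaultdict


class UCBQAgent:
    """Tabular Q-learning agent that picks actions by Q value plus a UCB-style bonus."""

    def __init__(self, num_actions, c=1.0, alpha=0.1, gamma=0.9):
        self.num_actions = num_actions
        self.c = c          # weight of the exploration bonus (assumed parameter)
        self.alpha = alpha  # learning rate
        self.gamma = gamma  # discount factor
        self.q = defaultdict(lambda: [0.0] * num_actions)
        self.n = defaultdict(lambda: [0] * num_actions)  # per-state selection counts

    def select(self, state):
        counts = self.n[state]
        # Actions never selected in this state are tried first.
        for action, k in enumerate(counts):
            if k == 0:
                return action
        total = sum(counts)
        ucb = [
            self.q[state][a] + self.c * math.sqrt(2.0 * math.log(total) / counts[a])
            for a in range(self.num_actions)
        ]
        return max(range(self.num_actions), key=ucb.__getitem__)

    def update(self, state, action, reward, next_state):
        # Count the selection, then apply the ordinary Q-learning update.
        self.n[state][action] += 1
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])

    def unselected(self):
        """Number of (state, action) pairs in visited states that were never selected."""
        return sum(k == 0 for counts in self.n.values() for k in counts)


if __name__ == "__main__":
    agent = UCBQAgent(num_actions=2)
    action = agent.select("s")
    agent.update("s", action, reward=1.0, next_state="t")
    print(action, agent.unselected())
```

The `unselected` helper mirrors the quantity the abstract reports on, under the assumption that it is counted per visited state.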
1. ( ) ( ) ( )( )
2. UCB
3.
Q 2
5~20 1 UCB Q
6UCB
11
1415
Q 17s a
18 Q UCB
procedure UCB fuzzy Q-learning
 1 begin
 2   initialize , UCB, , , numEpisode
 3   for cycle := 1 to numEpisode
 4     #
 5     while (not done) do
 6       if
 7
 8       else
 9
10       end
11
12
13       for cycle := 0 to len(Q)
14
15
16       end
17
18
19
20     end
21   end
22 end
Figure 2. UCB fuzzy Q-learning
6. Ms. PacMan OpenAIGym
OpenAIGym OpenAI
OpenAIGym
ATARI Ms. PacMan
OpenCV OpenCV
OpenAIGym Ms. PacMan
( )
3 427
3
50
3
4 ( )
UCBQ
Q 1
00 UCBQ
Q
[Figure residue: two plots with vertical axis from 0 to 1; horizontal axis ticks 100, 150, 200 and 20, 60, 100]
2 6 UCB Q2 4 2
5 6
1
0    0       0      3360
1    0       0.2    3480
2    0.01    0.2    3560
3    0       0.2    3120
4    0.01    0.2    3300
5    0.1     0.2    3280
6    3.0     0.2    2460
0 Q 50
2 4 5 62 4 5 6 Q
50Q
8.
2 5 6 4.3
6
0.012
Q
02
0.03 QMs. PacMan Q
Q 2 Q1 Q
2
100
2
3
9. Q UCBQ
UCB QMs. PacMan UCBQ
Ms. PacMan 3
UCB Q
[1] C. J. Watkins and P. Dayan, "Q-Learning," Machine Learning, vol. 8, pp. 279-292, Kluwer Academic Publishers, 1992.
[2] , , , " Q," , vol. 15, no. 6, pp. 702-707, 2003.
[3] , , , " UCB," 30, pp. 174-179, 2014.
[4] P. Auer, N. Cesa-Bianchi and P. Fischer, "Finite-time Analysis of the Multiarmed Bandit Problem," Machine Learning, vol. 47, pp. 235-256, 2002.
[5] L. L. DeLooze and W. R. Viner, "Fuzzy Q-Learning in a Nondeterministic Environment: Developing an Intelligent Ms. Pac-Man Agent," Proc. IEEE Symposium on Computational Intelligence and Games, pp. 162-169, 2009.
[6] , , , "Q ," 29, pp. 1006-1011, 2013.
[7] H. van Seijen, M. Fatemi, J. Romoff, R. Laroche, T. Barnes and J. Tsang, "Hybrid Reward Architecture for Reinforcement Learning," arXiv:1706.04208v2, 2017.