COMP 4180: Intelligent Mobile Robotics Reinforcement Learning
![Page 1: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/1.jpg)
COMP 4180: Intelligent Mobile Robotics
Reinforcement Learning
Jacky Baltes
Department of Computer Science
University of Manitoba
Email: [email protected]
http://www4.cs.umanitoba.ca/~jacky/...Teaching/Courses/COMP_4180-IntelligentMobileRobotics/current/index.php
![Page 2: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/2.jpg)
Outline
● Reinforcement Learning Problem
– Dynamic Programming
– Control learning
– Control policies that choose optimal actions
– Q Learning
– Convergence
● Monte-Carlo Methods
● Temporal Difference Learning
![Page 3: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/3.jpg)
Control Learning
![Page 4: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/4.jpg)
Example: TD-Gammon
![Page 5: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/5.jpg)
Reinforcement Learning Problem
![Page 6: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/6.jpg)
Markov Decision Processes
![Page 7: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/7.jpg)
Agent's Learning Task
![Page 8: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/8.jpg)
State Value Function
![Page 9: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/9.jpg)
Bellman Equation (Deterministic Case)
![Page 10: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/10.jpg)
Example
![Page 11: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/11.jpg)
Example
![Page 12: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/12.jpg)
Iterative Policy Evaluation
![Page 13: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/13.jpg)
Iterative Policy Evaluation
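A minimal sketch of iterative policy evaluation for the deterministic case, repeatedly applying the Bellman backup V(s) ← r(s, a) + γ V(δ(s, a)) until the values stop changing. The three-state corridor, the reward of 100, and the names `evaluate_policy`, `delta`, and `reward` are illustrative assumptions, not taken from the slides.

```python
def evaluate_policy(states, policy, delta, reward, gamma=0.9, tol=1e-8):
    """Iterative policy evaluation for a deterministic MDP.

    policy maps a state to its action (terminal states are absent),
    delta(s, a) gives the next state, reward(s, a) the immediate reward.
    """
    V = {s: 0.0 for s in states}
    while True:
        change = 0.0
        for s in states:
            a = policy.get(s)
            if a is None:  # terminal state: its value stays 0
                continue
            new_v = reward(s, a) + gamma * V[delta(s, a)]
            change = max(change, abs(new_v - V[s]))
            V[s] = new_v
        if change < tol:
            return V


# Hypothetical corridor 0 -> 1 -> 2, where state 2 is the terminal goal.
states = [0, 1, 2]
policy = {0: "right", 1: "right"}
delta = lambda s, a: s + 1
reward = lambda s, a: 100.0 if s + 1 == 2 else 0.0

V = evaluate_policy(states, policy, delta, reward)
print(V)
```

With γ = 0.9 the state next to the goal is worth 100 and the one before it 0.9 × 100 = 90.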
![Page 14: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/14.jpg)
What to learn?
![Page 15: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/15.jpg)
Q (Action-Value) Function
![Page 16: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/16.jpg)
Q (Action-Value) Function
![Page 17: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/17.jpg)
![Page 18: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/18.jpg)
Bellman Equation (Deterministic Case)
![Page 19: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/19.jpg)
Optimal Value Functions
![Page 20: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/20.jpg)
Policy Improvement
![Page 21: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/21.jpg)
Example
![Page 22: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/22.jpg)
Example
![Page 23: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/23.jpg)
Generalized Policy Iteration
![Page 24: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/24.jpg)
Value Iteration: Q-Learning
![Page 25: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/25.jpg)
Non-deterministic Case
![Page 26: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/26.jpg)
Bellman Equations (Non-deterministic Case)
![Page 27: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/27.jpg)
Value Iteration: Q-Learning
![Page 28: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/28.jpg)
Example
![Page 29: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/29.jpg)
Example
![Page 30: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/30.jpg)
Reinforcement Learning
![Page 31: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/31.jpg)
Monte-Carlo Methods: Policy Evaluation
![Page 32: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/32.jpg)
Monte Carlo Method: Policy Evaluation
![Page 33: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/33.jpg)
Temporal Difference (TD) Learning
![Page 34: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/34.jpg)
TD(0): Policy Evaluation
![Page 35: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/35.jpg)
TD(0): Policy Evaluation
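A minimal sketch of tabular TD(0) policy evaluation, using the standard update V(s) ← V(s) + α [r + γ V(s') − V(s)] after every observed transition. The episode format `(s, r, s_next)`, the corridor example, and the step size are assumptions made for illustration.

```python
def td0_episode(V, episode, alpha=0.5, gamma=0.9):
    """Apply the TD(0) update to each (s, r, s_next) transition in order."""
    for s, r, s_next in episode:
        target = r + gamma * V[s_next]      # one-step bootstrapped target
        V[s] += alpha * (target - V[s])     # move V(s) toward the target
    return V


# Hypothetical corridor 0 -> 1 -> 2; entering the terminal state 2 pays 100.
V = {0: 0.0, 1: 0.0, 2: 0.0}
episode = [(0, 0.0, 1), (1, 100.0, 2)]
for _ in range(200):
    td0_episode(V, episode)
print(V)
```

Because the example is deterministic, the estimates settle at the same values that dynamic programming would give (about 90 and 100), but they are learned from sampled transitions only.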
![Page 36: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/36.jpg)
ε-Greedy Policy
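A minimal sketch of ε-greedy action selection: with probability ε pick a random action (explore), otherwise pick the action with the highest current Q estimate (exploit). The dictionary layout `Q[(state, action)]` and the function name are illustrative assumptions.

```python
import random


def epsilon_greedy(Q, state, actions, epsilon=0.1, rng=random):
    """Return a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    # Unseen (state, action) pairs default to a value of 0.
    return max(actions, key=lambda a: Q.get((state, a), 0.0))


Q = {("s0", "left"): 1.0, ("s0", "right"): 5.0}
greedy = epsilon_greedy(Q, "s0", ["left", "right"], epsilon=0.0)
explore = epsilon_greedy(Q, "s0", ["left", "right"], epsilon=1.0)
print(greedy, explore)
```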
![Page 37: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/37.jpg)
SARSA Policy Iteration
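A minimal sketch of the SARSA update, assuming a tabular Q stored as a dictionary. SARSA is on-policy: the target uses the action a' that the agent actually takes in s', giving Q(s, a) ← Q(s, a) + α [r + γ Q(s', a') − Q(s, a)]. The function name and the numbers in the example are illustrative.

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """One SARSA step on the quintuple (s, a, r, s', a')."""
    old = Q.get((s, a), 0.0)
    target = r + gamma * Q.get((s_next, a_next), 0.0)
    Q[(s, a)] = old + alpha * (target - old)
    return Q


Q = {(1, "right"): 10.0}
sarsa_update(Q, s=0, a="right", r=1.0, s_next=1, a_next="right", alpha=0.5)
print(Q[(0, "right")])
```

Here the target is 1 + 0.9 × 10 = 10, so with α = 0.5 the new estimate moves halfway from 0 to 10.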
![Page 38: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/38.jpg)
SARSA Example
![Page 39: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/39.jpg)
SARSA Example V(s)
![Page 40: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/40.jpg)
SARSA Example Q(s,a)
![Page 41: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/41.jpg)
Rotational Inverted Pendulum
Rotational Inverted Pendulum Stabilization Demo, Tor Aamodt
http://www.eecg.utoronto.ca/~aamodt/BAScThesis/RLsim.htm
![Page 42: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/42.jpg)
Q-Learning (Off-Policy TD)
![Page 43: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/43.jpg)
Q-Learning (Off Policy Iteration)
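A minimal sketch of tabular Q-learning, assuming a hypothetical four-state corridor in which reaching the right-hand end pays 100. The agent behaves ε-greedily, but the update bootstraps from the best next action, max over a' of Q(s', a'), which is what makes the method off-policy. All names and parameter values are illustrative.

```python
import random


def q_learning(n_states=4, episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.2, seed=0):
    """Learn Q for a corridor 0..n_states-1 whose last state is the goal."""
    rng = random.Random(seed)
    actions = (-1, +1)                      # step left or step right
    goal = n_states - 1
    Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Behaviour policy: epsilon-greedy on the current estimates.
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda b: Q[(s, b)])
            s_next = min(max(s + a, 0), goal)   # walls clamp the move
            r = 100.0 if s_next == goal else 0.0
            # Off-policy target: value of the greedy action in s_next.
            best_next = max(Q[(s_next, b)] for b in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q


Q = q_learning()
greedy = {s: max((-1, +1), key=lambda a: Q[(s, a)]) for s in range(3)}
print(greedy)
```

The learned greedy policy steps right in every non-goal state, and Q for the rightward moves approaches 81, 90, and 100 along the corridor.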
![Page 44: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/44.jpg)
TD vs Monte Carlo
![Page 45: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/45.jpg)
Temporal Difference Learning
![Page 46: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/46.jpg)
Monte Carlo Method
![Page 47: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/47.jpg)
N-Step return
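A minimal sketch of the n-step return, which sums n discounted rewards and then bootstraps from the value estimate of the state reached: r₁ + γ r₂ + … + γⁿ⁻¹ rₙ + γⁿ V(sₙ). With n = 1 it is the TD(0) target; taking every reward to the end of the episode with a bootstrap value of 0 gives the Monte Carlo return. The function name and example numbers are illustrative.

```python
def n_step_return(rewards, bootstrap_value, gamma=0.9):
    """Discounted sum of the n given rewards plus gamma**n * bootstrap_value."""
    g = bootstrap_value
    for r in reversed(rewards):   # fold from the last reward backwards
        g = r + gamma * g
    return g


g2 = n_step_return([1.0, 2.0], bootstrap_value=10.0, gamma=0.5)
print(g2)
```

For this example the result is 1 + 0.5 × 2 + 0.25 × 10 = 4.5.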
![Page 48: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/48.jpg)
TD(λ) Learning
![Page 49: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/49.jpg)
Eligibility Traces
![Page 50: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/50.jpg)
On-line TD(λ)
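A minimal sketch of on-line tabular TD(λ) with accumulating eligibility traces. Each step computes the TD error δ = r + γ V(s') − V(s), bumps the trace of the current state, updates every state in proportion to its trace, and then decays all traces by γλ. The episode format and the parameter values are assumptions made for illustration.

```python
def td_lambda_episode(V, episode, alpha=0.1, gamma=0.9, lam=0.8):
    """Run one episode of on-line TD(lambda) over (s, r, s_next) transitions."""
    e = {s: 0.0 for s in V}                 # eligibility trace per state
    for s, r, s_next in episode:
        delta = r + gamma * V[s_next] - V[s]
        e[s] += 1.0                         # accumulating trace
        for x in V:
            V[x] += alpha * delta * e[x]    # credit recently visited states
            e[x] *= gamma * lam             # traces fade over time
    return V


episode = [(0, 0.0, 1), (1, 100.0, 2)]      # hypothetical corridor 0 -> 1 -> 2
V_td0 = td_lambda_episode({0: 0.0, 1: 0.0, 2: 0.0}, episode, alpha=0.5, lam=0.0)
V_tdl = td_lambda_episode({0: 0.0, 1: 0.0, 2: 0.0}, episode, alpha=0.5, lam=1.0)
print(V_td0, V_tdl)
```

With λ = 0 only the state next to the goal learns from the first episode, exactly as in TD(0); with λ = 1 the trace carries part of the reward back to state 0 in the same episode.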
![Page 51: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/51.jpg)
Function Approximation
![Page 52: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/52.jpg)
Function Approximation
![Page 53: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/53.jpg)
Stochastic Gradient Descent
![Page 54: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/54.jpg)
Convergence
![Page 55: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/55.jpg)
Subtleties and Ongoing Research
● Replace Q̂ table with neural net or other generalizer
● Handle cases where the state is only partially observable
● Design optimal exploration strategies
● Extend to continuous action, state
● Learn and use δ̂: S × A → S
● Relationship to dynamic programming
![Page 56: COMP 4180: Intelligent Mobile Robotics Reinforcement Learning](https://reader030.fdocuments.us/reader030/viewer/2022012503/617cc8d5d2a73838b4210df9/html5/thumbnails/56.jpg)
References
● Reinforcement Learning: An Introduction. Richard S. Sutton, Andrew G. Barto. MIT Press, 1998. http://www-anw.cs.umass.edu/~rich/book/the-book.html
● Neuro-Dynamic Programming. Dimitri Bertsekas, John Tsitsiklis. Athena Scientific, 1996.
● Reinforcement Learning: A Tutorial. M. Harmon, S. Harmon.
● Reinforcement Learning: A Survey. L. Kaelbling et al. Journal of Artificial Intelligence Research, Vol. 4, pp. 237-285.
● How to Make Software Agents Do the Right Thing: An Introduction to Reinforcement Learning. S. Singh, P. Norvig, D. Cohn.
● Reinforcement Learning Software:
– http://www-anw.cs.umass.edu/~rich/software.html
– http://www.cse.msu.edu/rlr/domains.html
● Reinforcement Learning for Humanoid Robots–
● Frank Hoffman. http://www.nada.kth.se/kurser/kth/2D1431/02/index.html