Introduction Course Format, Schedule & Grading Overview...
Transcript of Introduction Course Format, Schedule & Grading Overview...
Introduction
Course Format, Schedule & Grading
Overview
What’s it All About . . . ?
The Big Picture
Wrap-up
References
Lecture 1: Introduction
Robot Control
Mar. 29, 2016
Robert Krug
Örebro University
About Me . . .
Robert Krug
Postdoc at the Mobile Robotics & Olfaction Lab, Örebro University
Research domains:
Autonomous robot grasping/manipulation
Online motion planning/control
Office: T1220
Phone: (+46/0) 19 - 30 3499
Email: [email protected]
Agenda
1 Course Format, Schedule & Grading
2 Overview
    What’s it All About . . . ?
    The Big Picture
    Wrap-up
Disclaimer
Dimitar Dimitrov, Manipulation and Control course, 2011. www.aass.oru.se/Research/Learning/drdv_dir/course_mc_2011.html
Magnus Egerstedt, Control of Mobile Robots course, 2015. www.coursera.org/learn/mobile-robot
B. Siciliano, L. Sciavicco, L. Villani and G. Oriolo, 2009, Robotics: Modelling, Planning and Control, Springer Science & Business Media, ISBN: 978-1-84628-642-1, 632 pages
Course Webpage
http://www.aass.oru.se/Research/mro/rkg_dir/course_rc_2016.html
Weekly updated lecture slides
Weekly updated exercises
Support material
Schedule
Course format & grading specification
Literature list
Course Format
A (hybrid) Flipped Classroom approach
https://learningsciences.utexas.edu/teaching/flipping-a-class
Lectures 2h
Demos & Discussions 2h
Exercises 2h
Weekly schedule: 9 weekly sessions in total, 6h each
Lectures → intuition
Self-study of the “hard” concepts using reading lists
Q & A → addressing submitted questions, walking through practical examples
Exercises → students solve given assignments
Last 3 weeks → mini-project
Content & Schedule
http://www.aass.oru.se/Research/mro/rkg_dir/rc_2016/schedule.pdf
The greater danger for most of us lies not in setting our aim too high and falling short; but in setting our aim too low, and achieving our mark.
- Michelangelo -
What’s Not in the Course?
Classical Control [1] → Bode methods, root locus, Laplace/Fourier transforms, analysis in the frequency domain (we use Modern Control based on ODEs)
System Identification
Signal processing
Stochastic control
Adaptive control
(Full) optimal control
Examination and Grading
Course grades: Fail (U), Pass (G) or Pass with Distinction (VG)
1 Written “filter” exam → U or G
2 Weekly exercises → U, G or VG
    Matlab code + brief written report
    7 exercises × 22 points/exercise = 154 points
    Each can be resubmitted once after feedback
3 “Publication-style” final project report, demo and presentation → U, G or VG
    150 points according to grading rubric
Grading: VG = 308 - 230 pts, G = 229 - 150 pts, U = 149 - 0 pts
Exam, exercises and project need to be passed separately
http://www.aass.oru.se/Research/mro/rkg_dir/rc_2016/grading.pdf
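A minimal sketch of how the point thresholds above could be turned into a course grade. The function name and the three boolean pass flags are assumptions made for illustration; the per-component pass criteria are defined in the grading rubric and are not reproduced here. The listed components add up to 154 + 150 = 304 points, so the sketch relies only on the lower bounds 230 and 150.

```python
def course_grade(total_points, exam_passed, exercises_passed, project_passed):
    """Map total points to U / G / VG using the thresholds from the slide.

    Hypothetical helper: the pass flags stand in for the rubric's
    per-component criteria, which are not specified here.
    """
    # Exam, exercises and project need to be passed separately.
    if not (exam_passed and exercises_passed and project_passed):
        return "U"
    if total_points >= 230:
        return "VG"
    if total_points >= 150:
        return "G"
    return "U"


if __name__ == "__main__":
    exercise_points = 7 * 22      # full marks on all exercises
    project_points = 100          # example project score out of 150
    print(course_grade(exercise_points + project_points, True, True, True))
```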
Control Theory is about . . . ?
System: Something that changes over time
Control: Influence to achieve desired behavior
Control system ingredients
State x: Representation of what the system is currently doing
Dynamics ẋ: Description of the state’s change
Control u: Influence on the system
Reference r: Representation of what the system should do
Output y: (Partial) measurements of the system
Feedback: Mapped output
[Figure: block diagram of the feedback loop with Controller, Process and Feedback mapping blocks and a +/- summing junction]
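The ingredients above can be sketched as a small closed-loop simulation. Everything specific here is an assumption made for illustration: a scalar integrator process (ẋ = u), a proportional controller, an identity feedback mapping, explicit Euler integration, and the name `simulate_loop`.

```python
def simulate_loop(r, x0, gain=2.0, dt=0.01, steps=1000):
    """Closed-loop simulation of a scalar integrator process dx/dt = u."""
    x = x0
    history = [x]
    for _ in range(steps):
        y = x                # output: here the full state is measured
        feedback = y         # feedback mapping: identity (assumption)
        e = r - feedback     # summing junction: + reference, - feedback
        u = gain * e         # controller: proportional law (assumption)
        x_dot = u            # process dynamics of the integrator
        x = x + dt * x_dot   # explicit Euler step
        history.append(x)
    return history


if __name__ == "__main__":
    states = simulate_loop(r=1.0, x0=0.0)
    print(f"final state: {states[-1]:.6f}")
```

With these values the state approaches the reference r, because each step shrinks the error r - x by the factor 1 - gain * dt.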
A Bit of Context . . .
Artificial Intelligence (AI): make a sequence of decisions over time so as to achieve certain goals [8]
fits the bill of a control problem . . .
. . . but AI is traditionally more focused on problem solving
How does Control Theory relate to / differ from:
AI / Planning,
Machine Learning?
Control Theory vs AI
What do the gurus (Peter Norvig & Sebastian Thrun) say?
https://www.udacity.com/course/intro-to-artificial-intelligence--cs271
According to Sutton [8]:
AI
    Solve hard problems
    “Perfect model disease” (toy domains, block worlds, . . . )
    No understanding of the controlled system
Control Theory
    Clear, rigorous
    Based on proven concepts
    Guarantees (convergence, stability, . . . )
    Limited
A (hopefully) Illustrative Exemplary Problem
Robot Motion Planning: Find a path that moves a point-robot gradually from start to goal while avoiding obstacles
1st (of many) AI options: Path Planning via Search
World represented on discrete grid
Breadth First Search: explore space in layers . . .
. . . and label cells according to a distance metric
Once the goal cell is explored . . .
. . . trace back to find the minimum-distance path
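The steps above can be sketched as follows. This is a minimal illustration, not the course's reference code: the grid encoding (0 = free, 1 = obstacle), 4-connected moves, step count as the distance metric, and the name `bfs_path` are all assumptions.

```python
from collections import deque


def bfs_path(grid, start, goal):
    """Breadth-first search on a grid; returns a list of cells or None."""
    rows, cols = len(grid), len(grid[0])

    def neighbours(cell):
        r, c = cell
        return ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1))

    # Explore in layers and label each reached cell with its step distance.
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            break
        for nr, nc in neighbours(cell):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[cell] + 1
                queue.append((nr, nc))

    if goal not in dist:
        return None  # goal is unreachable

    # Trace back: always step to a neighbour whose label is one smaller.
    path = [goal]
    while path[-1] != start:
        current = path[-1]
        for nb in neighbours(current):
            if dist.get(nb) == dist[current] - 1:
                path.append(nb)
                break
    return path[::-1]


if __name__ == "__main__":
    world = [
        [0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 0],
    ]
    print(bfs_path(world, (0, 0), (2, 0)))
```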
Do you see any issues with this approach?
Always finds optimal solution
World discretization error
What about the robot’s physical properties (kinematic/dynamic constraints)?
Computational effort
Another option: Find a Random Solution via Sampling
Grow a tree by random sampling (RRT [6])
Valid samples can be connected (collision checking)
Invalid samples are rejected
Once the goal node can be connected . . .
. . . a path is extracted from the tree
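A minimal sketch of the sampling procedure above for a 2D point robot. It is a simplified illustration rather than the full algorithm from [6]: circular obstacles given as (x, y, radius), a fixed step size, a small goal bias, a seeded random generator, and the name `rrt` are all assumptions.

```python
import math
import random


def rrt(start, goal, obstacles, bounds, step=0.5, goal_tol=0.5,
        goal_bias=0.1, max_iters=5000, seed=0):
    """Grow a tree from start by random sampling; return a path or None."""
    rng = random.Random(seed)
    (xmin, xmax), (ymin, ymax) = bounds

    def collision_free(p, q, n_checks=10):
        # Check evenly spaced points on the segment p-q against all obstacles.
        for k in range(n_checks + 1):
            t = k / n_checks
            x = p[0] + t * (q[0] - p[0])
            y = p[1] + t * (q[1] - p[1])
            if any(math.hypot(x - cx, y - cy) <= r for cx, cy, r in obstacles):
                return False
        return True

    nodes = [start]
    parent = {0: None}
    for _ in range(max_iters):
        # Sample a random point (occasionally the goal itself).
        if rng.random() < goal_bias:
            sample = goal
        else:
            sample = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        # Extend the nearest tree node by at most one step towards the sample.
        i_near = min(range(len(nodes)),
                     key=lambda i: math.dist(nodes[i], sample))
        near = nodes[i_near]
        d = math.dist(near, sample)
        if d == 0.0:
            continue
        scale = min(step, d) / d
        new = (near[0] + scale * (sample[0] - near[0]),
               near[1] + scale * (sample[1] - near[1]))
        if not collision_free(near, new):
            continue  # invalid sample is rejected
        nodes.append(new)
        parent[len(nodes) - 1] = i_near
        # Once the goal can be connected, extract the path from the tree.
        if math.dist(new, goal) <= goal_tol and collision_free(new, goal):
            if new != goal:
                nodes.append(goal)
                parent[len(nodes) - 1] = len(nodes) - 2
            path, i = [], len(nodes) - 1
            while i is not None:
                path.append(nodes[i])
                i = parent[i]
            return path[::-1]
    return None


if __name__ == "__main__":
    found = rrt((1.0, 1.0), (9.0, 9.0), [(5.0, 5.0, 1.5)],
                ((0.0, 10.0), (0.0, 10.0)))
    print("no path found" if found is None else f"{len(found)} waypoints")
```

Changing `seed` yields a different path, which is the randomness the slide refers to.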
Where do you see the pros/cons of this approach?
Finds a solution if given enough time
Solution is random
What about the robot’s physical properties (kinematic/dynamic constraints)?
http://aass.oru.se/~fll/videos/box_middle2.mpg [5]
A Control Approach to Motion Planning
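Assume our robot is governed by the (super-unrealistic for any robot) dynamics:
\[
\dot{x} = f(x, t), \qquad
\underbrace{\begin{bmatrix} \dot{q}_1 \\ \dot{q}_2 \end{bmatrix}}_{\dot{x}}
=
\underbrace{\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}}_{A}
\underbrace{\begin{bmatrix} q_1 \\ q_2 \end{bmatrix}}_{x}
\tag{1}
\]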
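Add (constant) control u:
\[
\dot{x} = f(x, u, t), \qquad
\underbrace{\begin{bmatrix} \dot{q}_1 \\ \dot{q}_2 \end{bmatrix}}_{\dot{x}}
=
\underbrace{\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}}_{A}
\underbrace{\begin{bmatrix} q_1 \\ q_2 \end{bmatrix}}_{x}
+
\underbrace{\begin{bmatrix} -\lambda_1 g_1 \\ -\lambda_2 g_2 \end{bmatrix}}_{u}
\tag{1}
\]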
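As a worked step, assume that g = (g1, g2) denotes the goal position and that λ1, λ2 < 0; neither is stated explicitly on the slide. Under these assumptions the controlled dynamics decouple per coordinate and can be solved in closed form:

```latex
\[
\dot{q}_i = \lambda_i q_i - \lambda_i g_i = \lambda_i \,(q_i - g_i)
\quad\Longrightarrow\quad
q_i(t) = g_i + \bigl(q_i(0) - g_i\bigr)\, e^{\lambda_i t},
\qquad i = 1, 2 .
\]
```

Every initial state then converges exponentially to g, with the rate in each coordinate set by |λ_i|.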
Recall: x = what the system is doing; ẋ = description of its change
State evolution is governed by the map ẋ = f(x, u, t)
The dynamical system ẋ = f(x, u, t) covers the whole state space → built-in disturbance rejection
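A small simulation can illustrate the disturbance-rejection claim for the linear example with the constant control u = (-λ1 g1, -λ2 g2). The sketch rests on assumptions not stated on the slides: λ1, λ2 < 0, g read as the goal position, explicit Euler integration, and a hypothetical one-off "kick" that displaces the state mid-run.

```python
def simulate(q0, goal, lam, dt=0.01, steps=2000, kick_step=None,
             kick=(0.0, 0.0)):
    """Integrate q_dot = A q + u with A = diag(lam) and u = -A goal."""
    q = list(q0)
    u = [-lam[i] * goal[i] for i in range(2)]  # constant control term
    trajectory = [tuple(q)]
    for n in range(steps):
        if n == kick_step:
            # Disturbance: push the robot away from its current state.
            q = [q[i] + kick[i] for i in range(2)]
        # The same vector field is defined everywhere in the state space,
        # so no replanning is needed after the kick.
        q = [q[i] + dt * (lam[i] * q[i] + u[i]) for i in range(2)]
        trajectory.append(tuple(q))
    return trajectory


if __name__ == "__main__":
    traj = simulate((0.0, 0.0), (3.0, 2.0), (-1.0, -2.0),
                    kick_step=500, kick=(1.0, -1.0))
    print("final state:", traj[-1])
```

The robot still ends up at the goal after the kick because the control law depends only on the current state, not on a precomputed path.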
You cheated!!! What about obstacles? → e.g. add repelling potential fields [4] or constraints [3]
Control can be seen as real-time, reactive, continuous planning
Local, greedy approach → remedied by global optimal control/trajectory optimization (costly!)
Wrap-up
AI
    Classically based on problem-solving:
    Solving puzzles
    Playing board games
Control Theory
    Classically based on theorem-proof math approach:
    Low-level actuator control (e.g. DC-motor velocity control)
Often no clear boundaries between control, AI, Learning!
Cross-disciplinary:
    Motion planning via control
    Learning controllers from data [2, 7]
    Reinforcement Learning (= adaptive control [9])
    . . .
Where Are We Today?
DARPA Robotics Challenge 2015
https://www.youtube.com/watch?v=g0TaYhjpOfo
here . . . . . . or here?
“I think that the control viewpoint is now much more profitable than the problem solving one, and that control should be the centerpiece of AI and machine learning research.”¹
Bottom line: A robot is a physical system acting in a physical world → at some (ideally across all) architecture levels you have to think about control.
¹ Richard S. Sutton. “Artificial intelligence as a control problem: Comments on the relationship between machine learning and intelligent control”. In: Proc. IEEE Int. Symp. on Intell. Control. 1988, pp. 500–507.
Reading List
Dimitar Dimitrov, Manipulation and Control, Handout 1: http://www.aass.oru.se/Research/Learning/drdv_dir/mc2011/Lecture_1.pdf
Supplemental:
Nathan Ratliff, Mathematics for Intelligent Systems, Linear Algebra I: Matrix Representations of Linear Transforms: http://www.nathanratliff.com/pedagogy/mathematics-for-intelligent-systems
You don’t get away that easy . . .
Exit Ticket - one minute response
Most important thing you learned today?
Main unanswered question?
Muddiest point?
[1] Gene F. Franklin, J. David Powell, and Abbas Emami-Naeini. Feedback Control of Dynamic Systems. 4th ed. Prentice Hall, 2002.
[2] Auke J. Ijspeert, Jun Nakanishi, and Stefan Schaal. “Movement imitation with nonlinear dynamical systems in humanoid robots”. In: Proc. IEEE Int. Conf. Rob. Autom. (ICRA). Vol. 2. 2002, pp. 1398–1403.
[3] Oussama Kanoun, Florent Lamiraux, and Pierre-Brice Wieber. “Kinematic control of redundant manipulators: Generalizing the task-priority framework to inequality task”. In: IEEE Trans. Rob. (T-RO) 27.4 (2011), pp. 785–792.
[4] Oussama Khatib. “Real-time obstacle avoidance for manipulators and mobile robots”. In: Int. J. Rob. Res. (IJRR) 5.1 (1986), pp. 90–98.
[5] Fabien Lagriffoul et al. “Efficiently combining task and motion planning using geometric constraints”. In: Int. J. Rob. Res. (IJRR) 33.14 (2014), pp. 1726–1747.
[6] Steven M. LaValle. Planning Algorithms. Cambridge University Press, 2006.
[7] Sergey Levine, Nolan Wagener, and Pieter Abbeel. “Learning contact-rich manipulation skills with guided policy search”. In: Proc. IEEE Int. Conf. Rob. Autom. (ICRA). 2015, pp. 156–163.
[8] Richard S. Sutton. “Artificial intelligence as a control problem: Comments on the relationship between machine learning and intelligent control”. In: Proc. IEEE Int. Symp. on Intell. Control. 1988, pp. 500–507.
[9] Richard S. Sutton, Andrew G. Barto, and Ronald J. Williams. “Reinforcement learning is direct adaptive optimal control”. In: IEEE Control Systems 12.2 (1992), pp. 19–22.