Adaptive Intelligent Mobile Robotics Leslie Pack Kaelbling Artificial Intelligence Laboratory MIT
DARPA Mobile Autonomous Robot Software. Leslie Pack Kaelbling; January 2000
Adaptive Intelligent Mobile Robotics
Leslie Pack Kaelbling
Artificial Intelligence Laboratory
MIT
Progress to Date
• Erik the Red
• Video game environment
• Optical flow implementation
• Fast bootstrapped reinforcement learning
Erik the Red
RWI B21 robot
• camera, sonars, laser range-finder, infrareds
• 3 Linux machines
• ported our framework for writing debuggable code
Crystal Space
Public-domain video-game environment
• complex graphics
• other agents
• highly modifiable
Optical Flow
Get range information visually by computing optical flow field
• nearer objects cause flow of higher magnitude
• expansion pattern means you’re going to hit
• rate of expansion tells you when
• elegant control laws based on center and rate of expansion (derived from human and fly behavior)
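The link between rate of expansion and time to contact can be sketched in a few lines. This is a minimal illustration, not the implementation from the project: it assumes pure forward translation toward a flat surface facing the camera, so that the flow at image point (x, y) is ((x - x0) / tau, (y - y0) / tau), where (x0, y0) is the center of expansion and tau is the time to contact in frames. The function names `fit_line` and `expansion_from_flow` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys ~ slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def expansion_from_flow(points, flows):
    """Estimate center of expansion, expansion rate, and time to contact.

    points: list of (x, y) image positions.
    flows:  list of (u, v) flow vectors at those positions, per frame.
    Assumes a pure expansion field (hypothetical simplification).
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    us = [f[0] for f in flows]
    vs = [f[1] for f in flows]
    ax, bx = fit_line(xs, us)  # u = ax * x + bx, with ax = 1 / tau
    ay, by = fit_line(ys, vs)  # v = ay * y + by, with ay = 1 / tau
    rate = 0.5 * (ax + ay)     # mean expansion rate, in 1 / frames
    if ax <= 0 or ay <= 0:
        # No expansion pattern: nothing is approaching head-on.
        return None, rate, float("inf")
    center = (-bx / ax, -by / ay)  # where the fitted flow is zero
    return center, rate, 1.0 / rate


# Synthetic flow field: expansion centered at (10, -5), contact in 20 frames.
points = [(x, y) for x in range(-20, 21, 10) for y in range(-20, 21, 10)]
flows = [((x - 10.0) / 20.0, (y + 5.0) / 20.0) for x, y in points]
center, rate, frames_left = expansion_from_flow(points, flows)
```

A control law in the spirit of the last bullet could steer away from `center` and slow down as `frames_left` shrinks; real flow fields also contain rotation and noise, which this sketch ignores.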
Optical Flow in Crystal Space
Making RL Really Work
Typical RL methods require far too much data to be practical in an online setting. Address the problem by
• strong generalization techniques
• using human input to bootstrap
JAQL
Learning a value function in a continuous state and action space
• based on locally weighted regression (fancy version of nearest neighbor)
• algorithm knows what it knows
• use meta-knowledge to be conservative about dynamic-programming updates
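One way to read these bullets is sketched below. It is an assumption-laden illustration, not JAQL itself: the slides do not give the kernel, the confidence test, or the update rule. Here a one-dimensional locally weighted linear regression uses a Gaussian kernel, the sum of kernel weights stands in for "knowing what it knows", and a dynamic-programming backup is skipped whenever the next state's value is unknown. The names `lwr_predict`, `conservative_target`, `bandwidth`, and `min_support` are all hypothetical.

```python
import math


def lwr_predict(query, xs, ys, bandwidth=0.5, min_support=1.0):
    """Locally weighted linear regression at a single 1-D query point.

    Returns (prediction, support). The prediction is None when the total
    kernel weight near the query is below min_support, i.e. the stored
    data says too little about this region to trust an estimate.
    """
    weights = [math.exp(-((x - query) / bandwidth) ** 2 / 2) for x in xs]
    support = sum(weights)
    if support < min_support:
        return None, support
    mean_x = sum(w * x for w, x in zip(weights, xs)) / support
    mean_y = sum(w * y for w, y in zip(weights, ys)) / support
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y)
              for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx if sxx > 1e-12 else 0.0
    return mean_y + slope * (query - mean_x), support


def conservative_target(reward, next_state, xs, ys, gamma=0.9):
    """One-step backup target, or None if the next value is unknown."""
    next_value, _ = lwr_predict(next_state, xs, ys)
    if next_value is None:
        return None  # skip the update rather than bootstrap from a guess
    return reward + gamma * next_value


# Stored value estimates on [0, 1]; queries far outside are "unknown".
xs = [i / 10 for i in range(11)]
ys = [2 * x + 1 for x in xs]
known, _ = lwr_predict(0.5, xs, ys)
unknown, _ = lwr_predict(5.0, xs, ys)
```

With this data, `known` is an interpolated value while `unknown` is `None`, so a backup that lands at state 5.0 would simply be skipped.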
Incorporating Human Input
Humans can help a lot, even if they can’t perform the task very well.
• Provide some initial successful trajectories through the space
• Trajectories are not used for supervised learning, but to guide the reinforcement-learning methods through useful parts of the space
• Learn models of the dynamics of the world and of the reward structure
• Once learned models are good, use them to update the value function and policy as well.
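The difference between imitating a human and learning from the human's experience can be shown on a toy problem. The sketch below is an assumption, not the method from the project: it uses a tabular Q-function on a hypothetical five-state corridor instead of a continuous space, and it omits the learned dynamics and reward models. A clumsy human trajectory is replayed as ordinary Q-learning updates, so the recorded actions are treated as experience rather than as labels to copy.

```python
from collections import defaultdict

ACTIONS = (-1, +1)  # move left or right along the corridor


def replay(trajectories, sweeps=20, alpha=0.5, gamma=0.9):
    """Run Q-learning updates over recorded transitions.

    Each step is (state, action, reward, next_state, done). Steps are
    replayed in reverse so reward information propagates back quickly.
    """
    q = defaultdict(float)
    for _ in range(sweeps):
        for trajectory in trajectories:
            for state, action, reward, next_state, done in reversed(trajectory):
                best_next = 0.0 if done else max(
                    q[(next_state, a)] for a in ACTIONS)
                target = reward + gamma * best_next
                q[(state, action)] += alpha * (target - q[(state, action)])
    return q


def greedy(q, state):
    """Action with the highest learned value in this state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


# An inefficient but successful human run: it backtracks once before
# reaching the goal state 4, which pays reward 1.
human_run = [
    (0, +1, 0.0, 1, False),
    (1, -1, 0.0, 0, False),
    (0, +1, 0.0, 1, False),
    (1, +1, 0.0, 2, False),
    (2, +1, 0.0, 3, False),
    (3, +1, 1.0, 4, True),
]
q = replay([human_run])
policy = [greedy(q, s) for s in range(4)]
```

Supervised imitation would have to reconcile the human's two different actions in state 1; the value-based replay instead prefers moving right there, because that action leads to the reward sooner.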
Simple Experiment
The “hill-car” problem in two continuous dimensions
• Regular RL methods take thousands of trials to learn a reasonable policy
• JAQL takes 11 inefficient but eventually successful trials generated by humans to get 80% performance
• 10 more subsequent trials generate high-quality performance in the whole space
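For readers unfamiliar with the task, the sketch below simulates the hill-car problem in its common textbook form (often called mountain car). The constants and bounds are the textbook ones and are an assumption here; the slides do not state which variant was used. The state is the pair (position, velocity), the action is a push of -1, 0, or +1, and a trial is cut off after 200 steps. The `pump` policy is a hypothetical hand-written baseline, not a learned one.

```python
import math

MIN_X, MAX_X = -1.2, 0.5  # left wall and goal position (textbook values)
MAX_SPEED = 0.07
MAX_STEPS = 200           # trial cutoff


def step(state, action):
    """Advance the car one time step; action is -1, 0, or +1."""
    x, v = state
    v += 0.001 * action - 0.0025 * math.cos(3 * x)  # engine plus gravity
    v = max(-MAX_SPEED, min(MAX_SPEED, v))
    x += v
    if x <= MIN_X:
        x, v = MIN_X, 0.0  # inelastic stop at the left wall
    return (min(x, MAX_X), v)


def run_trial(policy, start=(-0.5, 0.0)):
    """Run one trial; return (steps taken, whether the goal was reached)."""
    state = start
    for steps in range(1, MAX_STEPS + 1):
        state = step(state, policy(state))
        if state[0] >= MAX_X:
            return steps, True
    return MAX_STEPS, False


def pump(state):
    """Push in the direction of travel to build up energy."""
    return 1 if state[1] >= 0 else -1


steps, reached = run_trial(pump)
```

The engine is too weak to climb the hill directly, so a successful policy has to rock back and forth first; that delayed payoff is what makes the problem slow for standard RL methods.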
Success Percentage
[Figure: success percentage (0 to 100) against subsequent training runs, for JAQL and QL]
Trial Length (200 max)
[Figure: trial length (0 to 180) against subsequent training runs, for JAQL and QL, with the 54-step optimum marked]
Next Steps
• Implement optical-flow control algorithms on robot
• Apply RL techniques to tune parameters in control algorithms on robot in real time
  • corridor following using sonar and laser
  • obstacle avoidance using optical flow
• Build highly complex simulated environment
• Integrate planning and learning in multi-layer system