Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
Kurtland Chua, Roberto Calandra, Rowan McAllister, Sergey Levine
University of California, Berkeley
How Long Does Learning Take?
~800,000 grasp attempts [Levine et al. 2017]
~21 million games [Silver et al. 2017]
~50 million frames [Mnih et al. 2015]
Can we speed this up?
Model-Based Reinforcement Learning
Optimize Policy
Execute Policy
Train Dynamics Model
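
The three boxes on this slide form a loop. A minimal sketch of that loop follows, assuming a hypothetical 1-D toy environment, a one-parameter linear model, and a random-shooting planner standing in for policy optimization. None of these names or choices come from the slides; they are only there to make the cycle concrete.

```python
import random

def true_step(state, action):
    # Hypothetical toy environment: the true gain (0.5) is unknown to the agent.
    return state + 0.5 * action

def train_dynamics_model(data):
    # Least-squares fit of the gain g in s' = s + g * a from (s, a, s') tuples.
    num = sum(a * (s2 - s) for s, a, s2 in data)
    den = sum(a * a for s, a, s2 in data)
    return num / den if den > 0 else 0.0

def optimize_policy(gain, state, goal, rng, n_candidates=200):
    # Random shooting: keep the candidate action whose predicted
    # next state (under the learned model) lands closest to the goal.
    candidates = [rng.uniform(-1.0, 1.0) for _ in range(n_candidates)]
    return min(candidates, key=lambda a: abs(state + gain * a - goal))

def run(n_trials=5, steps_per_trial=10, goal=3.0, seed=0):
    rng = random.Random(seed)
    data, gain, final_errors = [], 0.0, []
    for trial in range(n_trials):
        state = 0.0
        for _ in range(steps_per_trial):
            if trial == 0:
                # No model yet: explore with random actions.
                action = rng.uniform(-1.0, 1.0)
            else:
                action = optimize_policy(gain, state, goal, rng)
            # Execute the chosen action in the real environment.
            next_state = true_step(state, action)
            data.append((state, action, next_state))
            state = next_state
        # Retrain the dynamics model on all data collected so far.
        gain = train_dynamics_model(data)
        final_errors.append(abs(state - goal))
    return gain, final_errors
```

Calling `run()` returns the learned gain and the distance to the goal at the end of each trial. In this toy setting the model is essentially correct after the first random trial, which is the sample-efficiency argument in miniature.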
Comparative Performance on HalfCheetah
Deterministic Neural Nets as Models
Probabilistic Neural Nets as Models
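
As a rough illustration, assume "probabilistic" here means the network outputs the mean and log-variance of a Gaussian over the next state and is trained by negative log-likelihood. The slide itself does not spell this out, so treat the sketch below as an assumption; the function names are hypothetical.

```python
import math

def gaussian_nll(target, mean, log_var):
    # Negative log-likelihood of `target` under N(mean, exp(log_var)).
    var = math.exp(log_var)
    return 0.5 * (math.log(2.0 * math.pi) + log_var + (target - mean) ** 2 / var)

def squared_error(target, mean):
    # Loss a deterministic model would typically use: no notion of uncertainty.
    return (target - mean) ** 2

# A prediction that is wrong by 2.0:
# claiming low variance is penalized far more than admitting uncertainty.
overconfident = gaussian_nll(2.0, 0.0, math.log(0.01))
calibrated = gaussian_nll(2.0, 0.0, math.log(4.0))
```

Unlike the squared error, the likelihood loss rewards the model for reporting a variance that matches its actual error, which is what lets it express noise in the data.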
Probabilistic Ensembles as Models
Trajectory Sampling for State Propagation
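
One way to picture trajectory sampling is sketched below, assuming each ensemble member is a stochastic one-step model and each particle is propagated through one member. The toy linear-Gaussian members, the helper names, and the `resample_each_step` flag are illustrative assumptions, not the authors' implementation.

```python
import random

def make_member(gain, noise_std):
    # Hypothetical ensemble member: samples s' ~ N(s + gain * a, noise_std^2).
    def sample_next(state, action, rng):
        return rng.gauss(state + gain * action, noise_std)
    return sample_next

def propagate_particles(ensemble, init_state, actions, n_particles, rng,
                        resample_each_step=False):
    # Assign every particle to one ensemble member.
    assignment = [rng.randrange(len(ensemble)) for _ in range(n_particles)]
    states = [init_state] * n_particles
    trajectory = [list(states)]
    for action in actions:
        if resample_each_step:
            # Optionally redraw the member used by each particle at every step.
            assignment = [rng.randrange(len(ensemble)) for _ in range(n_particles)]
        # Each particle samples its own next state from its assigned member.
        states = [ensemble[m](s, action, rng) for m, s in zip(assignment, states)]
        trajectory.append(list(states))
    return trajectory

def expected_cost(trajectory, goal):
    # Average squared distance to the goal over the final particle states.
    final = trajectory[-1]
    return sum((s - goal) ** 2 for s in final) / len(final)
```

For example, `propagate_particles([make_member(0.4, 0.05), make_member(0.5, 0.05), make_member(0.6, 0.05)], 0.0, [1.0] * 5, 20, random.Random(0))` returns six lists of 20 particle states. The spread across particles reflects both the disagreement between members and the noise each member predicts.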
Experimental Results
Code: https://github.com/kchua/handful-of-trials
Website: https://sites.google.com/view/drl-in-a-handful-of-trials
Data efficient
Competitive asymptotic performance
Easy to implement
Poster #165