
Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
Kurtland Chua, Roberto Calandra, Rowan McAllister, Sergey Levine
University of California, Berkeley


Transcript (slides, NIPS 2018)

Page 1:

Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models

Kurtland Chua, Roberto Calandra, Rowan McAllister, Sergey Levine
University of California, Berkeley

Page 2:

How Long Does Learning Take?

~800,000 grasp attempts [Levine et al. 2017]

~21 million games [Silver et al. 2017]

~50 million frames [Mnih et al. 2015]

Page 3:

Can we speed this up?

Page 4:

Model-Based Reinforcement Learning

Optimize Policy

Execute Policy

Train Dynamics Model
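
The three boxes on this slide form a loop: train a dynamics model on the data so far, optimize a policy against that model, execute the policy, and repeat. The following is a minimal sketch of that loop, not the authors' implementation. It assumes a hypothetical 1-D environment (`true_step`), a linear least-squares model in place of a neural network, and a random-shooting planner as the policy optimizer; all function names and parameters are illustrative.

```python
import numpy as np

def true_step(s, a):
    # Hypothetical environment (an assumption): a 1-D point moved by the action.
    return s + a

def train_model(data):
    # "Train Dynamics Model": fit s' ~ w[0] * s + w[1] * a by least squares.
    X = np.array([[s, a] for s, a, _ in data])
    y = np.array([s_next for _, _, s_next in data])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def optimize_action(w, s, goal, rng, n_candidates=500, horizon=3):
    # "Optimize Policy": random shooting. Sample action sequences, roll them
    # out through the learned model, and return the first action of the
    # sequence with the lowest summed squared distance to the goal.
    seqs = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon))
    states = np.full(n_candidates, s, dtype=float)
    cost = np.zeros(n_candidates)
    for t in range(horizon):
        states = w[0] * states + w[1] * seqs[:, t]
        cost += (states - goal) ** 2
    return seqs[np.argmin(cost), 0]

def run(n_trials=5, steps=20, goal=3.0, seed=0):
    rng = np.random.default_rng(seed)
    data = []
    # Initial trial with random actions to seed the dataset.
    s = 0.0
    for _ in range(steps):
        a = rng.uniform(-1.0, 1.0)
        s_next = true_step(s, a)
        data.append((s, a, s_next))
        s = s_next
    for _ in range(n_trials):
        w = train_model(data)                      # train dynamics model
        s = 0.0
        for _ in range(steps):
            a = optimize_action(w, s, goal, rng)   # optimize policy
            s_next = true_step(s, a)               # execute policy
            data.append((s, a, s_next))
            s = s_next
    return w, s
```

Calling `run()` returns the last fitted model weights and the final state of the last trial. Because the toy environment is exactly linear, the fitted weights recover the true dynamics after the first random trial.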

Pages 5-6:

Comparative Performance on HalfCheetah

Pages 7-11:

Deterministic Neural Nets as Models

Page 12:

Probabilistic Neural Nets as Models
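
A common way to make a network probabilistic is to have it output a mean and a log-variance per state dimension and train it with the Gaussian negative log-likelihood. The sketch below shows only that loss, in NumPy; the function name and the diagonal-Gaussian assumption are illustrative, and the slide itself does not specify the loss.

```python
import numpy as np

def gaussian_nll(mean, log_var, target):
    # Negative log-likelihood of `target` under a diagonal Gaussian with the
    # predicted mean and log-variance, summed over state dimensions and
    # averaged over the batch.
    inv_var = np.exp(-log_var)
    per_dim = inv_var * (target - mean) ** 2 + log_var + np.log(2.0 * np.pi)
    return 0.5 * np.mean(np.sum(per_dim, axis=-1))

target = np.zeros((4, 2))
# A perfect mean with unit variance.
perfect = gaussian_nll(np.zeros((4, 2)), np.zeros((4, 2)), target)
# The same wrong mean, predicted with small versus honest variance.
overconfident = gaussian_nll(np.ones((4, 2)), np.full((4, 2), -4.0), target)
calibrated = gaussian_nll(np.ones((4, 2)), np.zeros((4, 2)), target)
```

Unlike a squared-error loss, this one penalizes a confident wrong prediction (`overconfident`) more than the same error reported with a larger variance (`calibrated`), which lets the model express noise in the data.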

Pages 13-14:

Probabilistic Ensembles as Models
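
An ensemble captures uncertainty about the model itself: members trained on different bootstrap resamples agree where data is plentiful and disagree where it is scarce. The sketch below illustrates that effect under loud assumptions. It swaps in cubic polynomial fits (`np.polyfit`) for neural networks to stay short, and the data, ensemble size, and function names are hypothetical.

```python
import numpy as np

def fit_bootstrap_ensemble(x, y, n_models=5, degree=3, seed=0):
    # Each member is fit on its own resample (with replacement) of the data.
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_models):
        idx = rng.integers(0, len(x), size=len(x))
        members.append(np.polyfit(x[idx], y[idx], degree))
    return members

def ensemble_predict(members, x_query):
    # Mean prediction and member disagreement (standard deviation) per query.
    preds = np.array([np.polyval(coef, x_query) for coef in members])
    return preds.mean(axis=0), preds.std(axis=0)

# Toy data (an assumption): noisy sine observed only on [-1, 1].
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=60)
y = np.sin(3.0 * x) + 0.1 * rng.standard_normal(60)

members = fit_bootstrap_ensemble(x, y)
# Query once inside the data range (0.0) and once far outside it (3.0).
_, spread = ensemble_predict(members, np.array([0.0, 3.0]))
```

Here `spread[0]` is the disagreement inside the training range and `spread[1]` the disagreement far outside it; a planner can use the latter as a signal that the model should not be trusted there.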

Pages 15-24:

Trajectory Sampling for State Propagation
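
Trajectory sampling propagates a set of particles through the ensemble: each particle is assigned an ensemble member, and its next state is sampled from that member's predicted distribution. The sketch below is an illustration under stated assumptions, not the authors' code. The ensemble members are hypothetical 1-D linear-Gaussian models (`gains`, `noise_std`), and the `resample_each_step` flag switches between keeping a particle's member fixed for the whole rollout and re-drawing it at every step.

```python
import numpy as np

def propagate(s0, actions, gains, noise_std, n_particles, rng,
              resample_each_step=False):
    # Member b (hypothetical) predicts s' ~ N(gains[b] * s + a, noise_std**2).
    n_models = len(gains)
    member = rng.integers(0, n_models, size=n_particles)
    states = np.empty((len(actions) + 1, n_particles))
    states[0] = s0
    for t, a in enumerate(actions):
        if resample_each_step:
            # Re-draw which ensemble member each particle follows.
            member = rng.integers(0, n_models, size=n_particles)
        mean = gains[member] * states[t] + a
        states[t + 1] = mean + noise_std * rng.standard_normal(n_particles)
    return states

rng = np.random.default_rng(0)
# Two members that disagree about the dynamics; no observation noise.
gains = np.array([0.5, 1.0])
states = propagate(1.0, np.zeros(3), gains, 0.0, n_particles=10, rng=rng)
final_values = set(np.round(states[-1], 6).tolist())
```

With fixed member assignments and zero noise, every particle ends at one of the two members' deterministic predictions, so the particle cloud represents disagreement between models. Averaging a cost over `states` gives a planner an estimate of expected cost under both noise and model uncertainty.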

Page 25:

Experimental Results

Page 26:

Code: https://github.com/kchua/handful-of-trials

Website: https://sites.google.com/view/drl-in-a-handful-of-trials

Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models

Kurtland Chua, Roberto Calandra, Rowan McAllister, Sergey Levine

Data efficient

Competitive asymptotic performance

Easy to implement

Poster #165
