Chelsea Finn

In collaboration with Sergey Levine and Ian Goodfellow

Large-Scale Self-Supervised Robotic Learning

Generalization in Reinforcement Learning

to object instances to tasks and environments

Pinto & Gupta ‘16

Levine et al. ‘16

Oh et al. ‘16

Mnih et al. ‘15

need data scale up

First lesson: human supervision doesn’t scale (providing rewards, resetting the environment, etc.)

where does the supervision come from?

most deep RL algorithms learn a single-purpose policy

self-supervision

learn general-purpose model

Evaluating unsupervised methods? Lacking task-driven metrics for unsupervised learning

Data collection - 50k sequences (1M+ frames)

data publicly available for download: sites.google.com/site/brainrobotdata

test set with novel objects

convolutional LSTMs

action-conditioned stochastic flow prediction

- feed back model’s predictions for multi-frame prediction
- trained with l2 loss
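A minimal sketch of what feeding predictions back in can look like, under stated assumptions: `step_model` is a hypothetical one-step predictor (not the ConvLSTM from the talk), and `toy_step_model` is an invented stand-in that shifts the image by the action. Each predicted frame becomes the input for the next step, and the l2 loss is averaged over the predicted frames.

```python
import numpy as np

def rollout_l2_loss(step_model, first_frame, actions, target_frames):
    """Roll a one-step predictor forward on its own outputs; return the mean l2 loss."""
    frame = first_frame
    losses = []
    for action, target in zip(actions, target_frames):
        # The prediction, not the ground-truth frame, is the next input.
        frame = step_model(frame, action)
        losses.append(np.mean((frame - target) ** 2))
    return float(np.mean(losses))

def toy_step_model(frame, action):
    # Hypothetical stand-in for a learned model: shift columns by `action` pixels.
    return np.roll(frame, int(action), axis=1)

# Toy usage: a single bright pixel pushed one column per step.
frame0 = np.zeros((4, 4))
frame0[0, 0] = 1.0
targets = [np.roll(frame0, 1, axis=1), np.roll(frame0, 2, axis=1)]
loss = rollout_l2_loss(toy_step_model, frame0, [1, 1], targets)
print(loss)  # 0.0 here, because the targets were generated by the same shift
```

In this setup, errors made early in the rollout carry into later frames, which is why the loss is taken over the whole predicted sequence rather than a single step.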

Train predictive model

stochastic flow prediction

[Figure: model diagram with a stacked ConvLSTM, transform parameters, transformed images, and the prediction Î_t]

evaluate on held-out objects

Are these predictions good?

Finn et al., ‘16

Are these predictions good? accurate? useful?

Kalchbrenner et al., ‘16

Train predictive model

action magnitude: 0x, 0.5x, 1x, 1.5x

What is prediction good for?

1. Sample N potential action sequences
2. Predict the future for each action sequence
3. Pick best future & execute corresponding action
4. Repeat 1-3 to replan in real time
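The four planning steps can be sketched as a simple sampling-based loop. This is an illustration under assumptions, not the talk's implementation: `predict`, `score`, and `step` are hypothetical callables supplied by the user, actions are sampled uniformly from [-1, 1], and a 2-D point stands in for the image state.

```python
import numpy as np

def plan_action(predict, score, state, rng, num_samples=64, horizon=5, action_dim=2):
    """One planning step: sample, predict, pick the best, return its first action."""
    # 1. Sample N candidate action sequences (uniform sampling is an assumption).
    candidates = rng.uniform(-1.0, 1.0, size=(num_samples, horizon, action_dim))
    # 2. Predict the future for each action sequence.
    futures = [predict(state, seq) for seq in candidates]
    # 3. Pick the best future; only its first action is executed.
    best = int(np.argmax([score(future) for future in futures]))
    return candidates[best, 0]

def run_mpc(predict, score, step, state, rng, num_steps=10, **plan_kwargs):
    """4. Repeat steps 1-3 after every executed action (replanning)."""
    for _ in range(num_steps):
        action = plan_action(predict, score, state, rng, **plan_kwargs)
        state = step(state, action)
    return state

# Toy usage: the "model" adds up the actions, and the score is closeness to a goal.
goal = np.array([5.0, 5.0])
toy_predict = lambda state, seq: state + seq.sum(axis=0)
toy_score = lambda future: -np.linalg.norm(future - goal)
toy_step = lambda state, action: state + action
final_state = run_mpc(toy_predict, toy_score, toy_step, np.zeros(2),
                      np.random.default_rng(0), num_steps=5, horizon=1)
```

Executing only the first action and then replanning lets the loop correct for model mispredictions at each step, at the cost of running the model N times per step.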

Visual MPC: Planning with Visual Foresight

Specify goal by selecting where pixels should move.

Select future with maximal probability of pixels reaching their respective goals.
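One way to make this selection rule concrete, as a sketch under assumptions: suppose that for each of K candidate futures and each of P designated pixels the model yields an H×W probability map over where that pixel ends up. The array layout, the name `best_future`, and the choice to combine several pixels by summing log-probabilities (treating them as independent) are all assumptions made for illustration.

```python
import numpy as np

def best_future(goal_prob_maps, goals):
    """Pick the candidate future most likely to move each pixel to its goal.

    goal_prob_maps: array of shape (K, P, H, W); entry [k, p] is a hypothetical
        predicted distribution over the final location of designated pixel p
        under candidate action sequence k.
    goals: list of P (row, col) goal locations, one per designated pixel.
    """
    maps = np.asarray(goal_prob_maps, dtype=float)
    rows = np.array([g[0] for g in goals])
    cols = np.array([g[1] for g in goals])
    pixel_idx = np.arange(len(goals))
    # Probability that pixel p lands on its goal under candidate k: shape (K, P).
    probs = maps[:, pixel_idx, rows, cols]
    # Assumption: pixels are scored independently, so log-probabilities add.
    log_scores = np.log(probs + 1e-12).sum(axis=1)
    return int(np.argmax(log_scores)), probs

# Toy usage: two candidate futures, one designated pixel, goal at (1, 1).
maps = np.zeros((2, 1, 3, 3))
maps[0, 0, 0, 0], maps[0, 0, 1, 1] = 0.8, 0.2
maps[1, 0, 1, 1], maps[1, 0, 2, 2] = 0.9, 0.1
best, probs = best_future(maps, goals=[(1, 1)])
print(best)  # 1: the second candidate puts the most mass on the goal
```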

Which future is the best one?

output is the mean of a probability distribution over pixel motion predictions

We can predict how pixels will move based on the robot’s actions

How it works

- evaluation on short pushes of novel objects

- translation & rotation

Only human involvement during training is: programming initial motions and providing objects to play with.

Does it work?

Outperforms naive baselines

Benefits of this approach
- learn for a wide variety of tasks
- scalable
- requires minimal human involvement
- a good way to evaluate video prediction models

Limitations
- can’t [yet] learn complex skills
- compute-intensive at test time
- some planning methods susceptible to adversarial examples

Takeaways

unlabeled video experience

train model

indicated goal

visual foresight

Future challenges in large-scale self-supervised learning

better predictive models

learn visual reward functions

task-driven exploration, attention

long-term planning
- hierarchy
- stochasticity

Collaborators

Sergey Levine, Ian Goodfellow

Thanks to… Vincent Vanhoucke, Peter Pastor, Ethan Holly, Jon Barron

Bibliography

Finn, C., Goodfellow, I., & Levine, S. Unsupervised Learning for Physical Interaction through Video Prediction. NIPS 2016.

Finn, C. & Levine, S. Deep Visual Foresight for Planning Robot Motion. Under Review, arXiv 2016.

Questions?

cbfinn@eecs.berkeley.edu

All data and code linked at: people.eecs.berkeley.edu/~cbfinn

Thanks!

Takeaway:

Acquiring a cost function is important! (and challenging)

Sources of failure:
- model mispredictions
- more compute needed
- occlusions
- pixel tracking

This is just the beginning…

Can we design the right model?
- stochastic?
- longer sequences?
- hierarchical?
- deeper?

Can we handle long-term planning?

Collecting data with a purpose.
