Meta-Learning...

62
Meta-Learning Frontiers: Universal, Uncertain, and Unsupervised Sergey Levine Chelsea Finn UC Berkeley Google Brain

Transcript of Meta-Learning...

Page 1: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Meta-Learning Frontiers:Universal, Uncertain, and Unsupervised

Sergey LevineChelsea Finn

UC Berkeley Google Brain

Page 2: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Levine*, Finn*, Darrell, Abbeel, ‘16

Page 3: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Yahya, Li, Kalakrishnan, Chebotar, Levine, ‘16

Page 4: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Generalizable model-based RL via video prediction

Designated PixelGoal Pixel

Ebert, Finn, Lee, Levine. Self-Supervised Visual Planning via Temporal Skip Connections. CoRL 2017.

Page 5: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Ebert, Finn, Lee, Levine. Self-Supervised Visual Planning via Temporal Skip Connections. CoRL 2017.

Page 6: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Kalashnikov, Irpan, Pastor, Ibarz, Herzong, Jang, Quillen, Holly, Kalakrishnan, Vanhoucke, Levine. QT-Opt: Scalable Deep Reinforcement Learning of Vision-Based Robotic Manipulation Skills

Page 7: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Kalashnikov, Irpan, Pastor, Ibarz, Herzong, Jang, Quillen, Holly, Kalakrishnan, Vanhoucke, Levine. QT-Opt: Scalable Deep Reinforcement Learning of Vision-Based Robotic Manipulation Skills

Page 8: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

what to transfer?people can learn new skills extremely quickly

about four hours about four weeks, nonstop

how?

we never learn from scratch!

representations?models?can we just optimize for what we really want?can we learn to learn?

Page 9: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Outline

- Meta-Learning Problem Statement- Model-Agnostic Meta-Learning (MAML)- Probabilistic Interpretation of MAML- Meta-Learning with Automated Task Proposals- Extensions to Robot Imitation & Intent Inference

Page 10: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Outline

- Meta-Learning Problem Statement- Model-Agnostic Meta-Learning (MAML)- Probabilistic Interpretation of MAML- Meta-Learning with Automated Task Proposals- Extensions to Robot Imitation & Intent Inference

Page 11: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

The Meta-Learning Problem

Inputs: Outputs:Supervised Learning:

Meta-Supervised Learning:Inputs: Outputs:

Data:

Data:

Why is this view useful?Reduces the problem to the design & optimization of f.

{

Finn, Levine. Meta-learning and Universality: Deep Representation… ICLR 2018

Page 12: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Example: Few-Shot Classification

diagram adapted from Ravi & Larochelle ‘17

Given 1 example of 5 classes: Classify new examples

meta-training

training data test set

trainingclasses

… …

Page 13: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Outline

- Meta-Learning Problem Statement

- Model-Agnostic Meta-Learning (MAML)

- Probabilistic Interpretation of MAML

- Meta-Learning with Automated Task Proposals

- Extensions to Robot Imitation & Intent Inference

Page 14: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Design of f ?Recurrent network

(LSTM, NTM, Conv)Santoro et al. ’16, Duan et al. ’17, Wang et al. ’17, Munkhdalai & Yu ’17, Mishra et al. ’17, …

— complex model for complex task of learning— impractical data requirements

Page 15: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Key idea: Train over many tasks, to learn parameter vector θ that transfers

Fine-tuning

Learning Few-Shot Adaptation

training datafor new task

pretrained parameters

Our method

[test-time]

Finn, Abbeel, Levine ICML ’17

Page 16: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Learning Few-Shot Adaptation

Finn, Abbeel, Levine ICML ’17

optimal parameter vector for task i

parameter vectorbeing meta-learned

Model-Agnostic Meta-Learning

Page 17: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

…and the results keep getting better

MiniImagenet few-shot benchmark: 5-shot 5-way

Finn et al. ‘17: 63.11%

Kim et al. ‘18 (AutoMeta): 76.29%Li et al. ‘17: 64.03%

MiniImagenet Few-shot Classification

Page 18: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Design of f ?Recurrent network MAML

network implements the“learned learning procedure”

Does it converge?

What does it converge to?

What to do if not good enough?

Does it converge?- Yes (it’s gradient descent…)

What does it converge to?- A local optimum (it’s gradient descent…)

What to do if not good enough?- Keep taking gradient steps (it’s gradient descent..)

- Who knows…

- Nothing

- Sort of?

Does this structure come at a cost?

Page 19: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Does this structure come at a cost?

For a sufficiently deep f, MAML function can approximate any function of

Finn & Levine, ICLR 2018

Why is this interesting?MAML has benefit of inductive bias without losing expressive power.

Recurrent network MAML

Assumptions:- nonzero - loss function gradient does not lose information about the label- datapoints in are unique

Is this structure useful?Finn, Levine. Meta-learning and Universality: Deep Representation… ICLR 2018

Page 20: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

How well can methods generalize to similar, but extrapolated tasks?MAML TCML, MetaNetworks

task variability

perf

orm

ance

Omniglot image classification

The world is non-stationary.

Finn, Levine. Meta-learning and Universality: Deep Representation… ICLR 2018

Page 21: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

MAML TCMLer

ror

Sinusoid curve regression

task variabilityTakeaway: Strategies learned with MAML consistently generalize better to out-of-distribution tasks

How well can methods generalize to similar, but extrapolated tasks?

The world is non-stationary.

Finn, Levine. Meta-learning and Universality: Deep Representation… ICLR 2018

Page 22: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Outline

- Meta-Learning Problem Statement

- Model-Agnostic Meta-Learning (MAML)

- Probabilistic Interpretation of MAML

- Meta-Learning with Automated Task Proposals

- Extensions to Robot Imitation & Intent Inference

Page 23: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

A probabilistic interpretation

that’s nice but is it… Bayesian?kind of… but it can be more Bayesian

Grant, Finn, Levine, Darrell, Griffiths. “Recasting gradient-based meta-learning as hierarchical Bayes.”

(Santos, 1996)

Page 24: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

A probabilistic interpretation

Grant, Finn, Levine, Darrell, Griffiths. “Recasting gradient-based meta-learning as hierarchical Bayes.”

MAP inference in this model

can we do better than MAP?can use Laplace estimate:

estimate Hessian for neural nets with KFAC

Page 25: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Modeling ambiguity

Important for:

- learning to actively learn

- safety-critical few-shot learning (e.g. medical imaging)

- learning to explore in meta-RL

Can we learn to generate hypotheses about the underlying function??

?

Page 26: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Modeling ambiguity

Can we sample classifiers?

Intuition: we want to learn a prior where a random kick can put us in different modes

smiling, hatsmiling, young

Page 27: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Meta-learning with ambiguity

Finn*, Xu*, Levine. Probabilistic Model-Agnostic Meta-Learning

Page 28: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Sampling parameter vectors

approximate with MAPthis is extremely crudebut extremely convenient!

Finn*, Xu*, Levine. Probabilistic Model-Agnostic Meta-Learning

Page 29: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Sampling parameter vectors

What does ancestral sampling look like?

smiling, hatsmiling, young

Finn*, Xu*, Levine. Probabilistic Model-Agnostic Meta-Learning

Page 30: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Training the model – not so simple!

Can’t do ancestral sampling anymore!

Page 31: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Training the model with amortized inference

Page 32: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Summary

During meta-training:

During meta-testing:

Page 33: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

PLATIPUSProbabilistic LATent model for Incorporating Priors and Uncertainty in few-Shot learning

Finn*, Xu*, Levine. Probabilistic Model-Agnostic Meta-Learning

Ambiguous regression:

Ambiguous classification:

Page 34: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

PLATIPUSProbabilistic LATent model for Incorporating Priors and Uncertainty in few-Shot learning

Finn*, Xu*, Levine. Probabilistic Model-Agnostic Meta-Learning

Page 35: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Outline

- Meta-Learning Problem Statement- Model-Agnostic Meta-Learning (MAML)- Probabilistic Interpretation of MAML- Meta-Learning with Automated Task Proposals- Extensions to Robot Imitation & Intent Inference

Page 36: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Let’s Talk about Meta-Overfitting

•Meta learning requires task distributions•When there are too few meta-

training tasks, we can meta-overfit• Specifying task distributions is

hard, especially for meta-RL!• Can we propose tasks automatically?

after MAML training after 1 gradient step

Page 37: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

A General Recipe for Unsupervised RL

environmentUnsupervised Meta-RL

Meta-learned environment-specific

RL algorithm

reward-maximizing policy

reward function

Unsupervised Task Acquisition Meta-RL

Fast Adaptation

Gupta, Eysenbach, Finn, Levine. Unsupervised Meta-Learning for Reinforcement Learning.

Page 38: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Random Task Proposals

n Use randomly initialize discriminators for reward functions

n Important: Random functions over state space, not random policies

1 2 3 … nRandom policy – exponentialRandom reward – polynomial

R(s, z) = log pD(z|s)D à randomly initialized network

Page 39: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Diversity-Driven Proposals

Policy(Agent)

Discriminator(D)

Skill (z)

Environment

Action State

Predict Skill

n Policy à visit states which are discriminable

n Discriminator à predict skill from state

Task Reward for UML: R(s, z) = log pD(z|s)

Eysenbach, Gupta, Ibarz, Levine. Diversity is All You Need.

Page 40: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Examples of Acquired Tasks

Eysenbach, Gupta, Ibarz, Levine. Diversity is All You Need.

Cheetah Ant

Page 41: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Does it work?

Gupta, Eysenbach, Finn, Levine. Unsupervised Meta-Learning for Reinforcement Learning.

2D Navigation CheetahAnt

Meta-test performance with rewards

Page 42: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Clavera*, Nagabandi*, Abbeel, Levine, Finn, arxiv 2018

What about varying transition models?

online adaptation = few-shot learningtasks are temporal

Model-Based RL Only

Task distribution is more-or-less free!

Page 43: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Meta-learning using either:Dynamics RNN (RBAC)

MAML (GBAC)

Page 44: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

with MAML

Clavera*, Nagabandi*, Abbeel, Levine, Finn, arxiv 2018

Online Adaptation via Meta-Learning

Page 45: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Outline

- Meta-Learning Problem Statement- Model-Agnostic Meta-Learning (MAML)- Probabilistic Interpretation of MAML- Meta-Learning with Automated Task Proposals- Extensions to Robot Imitation & Intent Inference

Page 46: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Visual imitation is expensive.

Goal: Given one visual demonstration of a new task, learn a policy

learns from raw pixels, but requires many demonstrations

Zhang et al. ‘17Rahmanizadeh et al. ‘17

Through meta-learning: reuse data from other tasks/objects/environments

One-Shot Visual Imitation Learning

Finn*, Yu*, Zhang, Abbeel, Levine CoRL ‘17

Page 47: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

How? Learn to learn the policy from one demonstrationtraining validation

meta-trainingtasks}task 1

task 2… …

Meta-test time:Learn policy from one demo.

One-Shot Visual Imitation Learning

Finn*, Yu*, Zhang, Abbeel, Levine CoRL ‘17

Page 48: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

real-world placing from pixels

subset of training objects

held-out test objects

input demo resulting policy

Finn*, Yu*, Zhang, Abbeel, Levine CoRL ‘17

Page 49: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Few-Shot Learning

Chelsea Finn, UC Berkeley

from Weak Supervision

Given 1 example of 5 classes: Classify new examplesGiven 1 example of 5 classes:Given 1 positive example:

Given one teleoperated demonstration:Given one teleoperated demonstration:Given a video of a human:

Learn a policy.

Grant, Finn, Peterson, Abbott, Levine, Darrell, Griffiths NIPS CIAI Workshop ’17

Yu*, Finn*, Xie, Dasari, Zhang, Abbeel, Levine RSS ’18

Page 50: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Learning to Learn from Weak Supervision

fully supervised

meta-testmeta-training

weakly supervised weakly supervised

Chelsea Finn, UC Berkeley

imitation lossWhat if the weakly supervised loss is unavailable?

Yu*, Finn*, Xie, Dasari, Zhang, Abbeel, Levine RSS ‘18

Page 51: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

placing from pixels

Yu*, Finn*, Xie, Dasari, Zhang, Abbeel, Levine RSS ‘18

input demo resulting policy

Page 52: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

pick-and-place from pixels

2x real time

Yu*, Finn*, Xie, Dasari, Zhang, Abbeel, Levine RSS ‘18

input demo resulting policy

Page 53: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

pick-and-place from pixels

Error in inferring the goal or executing on it?2x real time

input demo resulting policy

Can we decouple goal inference & policy learning?

What should the robot do to improve?

Page 54: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

How? Learn to learn the policy from one demonstrationMeta-Learning Approach

training validation

meta-trainingtasks}task 1

task 2… …

Meta-test time:Learn policy from one demo.

Page 55: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

How? Learn to infer the objective from one demonstrationMeta-Learning Approach

training validation

meta-trainingtasks}task 1

task 2… …

Meta-test time:Learn objective from one demo.Learn policy by optimizing reward.

Page 56: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

- -+ +task 1

task 2

task N

Given a few observations of reaching the goal:

Infer an objective:

meta-trainingtasks}

Xie, Singh, Levine, Finn ’18 (under review)

(then run RL or planning)

Page 57: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Given a few observations of reaching the goal:

Infer an objective:

Xie, Singh, Levine, Finn ’18 (under review)

(then run RL or planning)

only positive examplesboth positive & negatives

MAML with supervised loss:

Test time: only need positive examples

another form of weak supervision!

Page 58: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Given 5 examples of success Visual MPC with learned objective

Xie, Singh, Levine, Finn ’18 (under review)

Novel Object Positioning via Visual Planning

infer goal classifier

visual MPC w.r.t. goal classifier

Page 59: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Xie, Singh, Levine, Finn ’18 (under review)

Visual MPC with learned objective

Novel Object Positioning via Visual Planning

Given 5 examples of success

Page 60: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Given 5 examples of success RL policy with learned objective

Xie, Singh, Levine, Finn ’18 (under review)

Rope Manipulation via Reinforcement Learning

infer goal classifier

visual RL w.r.t. goal classifier

Page 61: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Given 5 examples of success

RL policy with learned objective

Xie, Singh, Levine, Finn ’18 (under review)

Rope Manipulation via Reinforcement Learning

Page 62: Meta-Learning Frontierspeople.eecs.berkeley.edu/~cbfinn/_files/metalearning_frontiers_2018_small.pdf · Finn & Levine, ICLR 2018 Why is this interesting? MAML has benefit of inductive

Questions?

Meta-Learning FoundationsFinn, Abbeel, Levine, Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. ICML ’17Finn & Levine, Meta-Learning and Universality: Deep Representations and Gradient Descent can Approximate any Learning Algorithm. ICLR ’18Grant, Finn, Levine, Darrell, Griffiths, Recasting Gradient-Based Meta-Learning as Hierarchical Bayes. ICLR ’18Finn*, Xu*, Levine, Probabilistic Model-Agnostic Meta-Learning. under review ’18

Meta-Learning for ControlFinn*, Yu*, Zhang, Abbeel, Levine, One-Shot Visual Imitation Learning via Meta-Learning. CoRL ’17Yu*, Finn*, Xie, Dasari, Zhang, Abbeel, Levine, One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning. RSS ’18Clavera*, Nagabandi*, Fearing, Abbeel, Levine, Finn, Learning to Adapt: Meta-Learning for Model-Based Control. under review ’18Xu*, Ratner, Dragan, Levine, Finn, Learning a Prior over Intent via Meta-Inverse Reinforcement Learning. under review ’18Gupta, Eysenbach, Finn, Levine, Unsupervised Meta-Learning for Reinforcement Learning. under review ’18Xie, Singh, Levine, Finn, Few-Shot Goal Inference for Visuomotor Learning and Planning. under review ’18