Meta-Learning Frontiers: Universal, Uncertain, and Unsupervised
Sergey Levine and Chelsea Finn
UC Berkeley / Google Brain
Levine*, Finn*, Darrell, Abbeel, ‘16
Yahya, Li, Kalakrishnan, Chebotar, Levine, ‘16
Generalizable model-based RL via video prediction
(figure: designated pixel and goal pixel)
Ebert, Finn, Lee, Levine. Self-Supervised Visual Planning via Temporal Skip Connections. CoRL 2017.
Kalashnikov, Irpan, Pastor, Ibarz, Herzog, Jang, Quillen, Holly, Kalakrishnan, Vanhoucke, Levine. QT-Opt: Scalable Deep Reinforcement Learning of Vision-Based Robotic Manipulation Skills
What to transfer? People can learn new skills extremely quickly (about four hours), while the robot needs about four weeks, nonstop.
How? We never learn from scratch!
Representations? Models? Can we just optimize for what we really want? Can we learn to learn?
Outline
- Meta-Learning Problem Statement
- Model-Agnostic Meta-Learning (MAML)
- Probabilistic Interpretation of MAML
- Meta-Learning with Automated Task Proposals
- Extensions to Robot Imitation & Intent Inference
The Meta-Learning Problem
Supervised learning: inputs x, outputs y, data D = {(x_k, y_k)}; learn f(x) = y.
Meta-supervised learning: inputs (D^tr, x_test), outputs y_test, data {D_i}; learn f(D^tr, x_test) = y_test.
Why is this view useful? Reduces the problem to the design & optimization of f.
Finn, Levine. Meta-learning and Universality: Deep Representation… ICLR 2018
Example: Few-Shot Classification
diagram adapted from Ravi & Larochelle ‘17
Given 1 example of 5 classes: Classify new examples
(figure: meta-training tasks, each built from the training classes with its own training data and test set)
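To make the setup concrete, here is a minimal sketch of how such 1-shot, 5-way episodes are typically constructed during meta-training; the data layout and names are illustrative assumptions, not a specific codebase.

```python
import numpy as np

def sample_episode(images_by_class, n_way=5, k_shot=1, n_query=15, rng=None):
    """Build one few-shot episode: a tiny training set (support) and a
    test set (query) over the same n_way randomly chosen classes."""
    rng = rng or np.random.default_rng()
    classes = rng.choice(len(images_by_class), size=n_way, replace=False)
    support, query = [], []
    for label, c in enumerate(classes):
        idx = rng.permutation(len(images_by_class[c]))[:k_shot + n_query]
        imgs = [images_by_class[c][i] for i in idx]
        support += [(x, label) for x in imgs[:k_shot]]  # 1 example per class
        query += [(x, label) for x in imgs[k_shot:]]    # new examples to classify
    return support, query  # meta-learning trains f so that f(support, x_query) = y_query
```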
Outline
- Meta-Learning Problem Statement
- Model-Agnostic Meta-Learning (MAML)
- Probabilistic Interpretation of MAML
- Meta-Learning with Automated Task Proposals
- Extensions to Robot Imitation & Intent Inference
Design of f? Recurrent network (LSTM, NTM, Conv)
Santoro et al. ’16, Duan et al. ’17, Wang et al. ’17, Munkhdalai & Yu ’17, Mishra et al. ’17, …
- complex model for the complex task of learning
- impractical data requirements
Key idea: train over many tasks to learn a parameter vector θ that transfers.
Learning Few-Shot Adaptation
Fine-tuning [test-time]: φ ← θ − α ∇_θ L(θ, D^tr), starting from pretrained parameters θ with training data D^tr for the new task.
Our method: min_θ Σ_i L(θ − α ∇_θ L(θ, D_i^tr), D_i^test)
Finn, Abbeel, Levine ICML ’17
Learning Few-Shot Adaptation
Finn, Abbeel, Levine ICML ’17
(figure: the meta-learned parameter vector θ is adapted by a gradient step toward the optimal parameter vector φ_i* for each task i)
Model-Agnostic Meta-Learning
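The objective above is compact enough to sketch directly. Below is a minimal MAML loop in JAX for sinusoid-style regression; the architecture, step sizes, and names are illustrative assumptions, not the paper's exact settings.

```python
import jax
import jax.numpy as jnp

def init_params(key, sizes=(1, 40, 40, 1)):
    # Small MLP: list of (weight, bias) pairs
    keys = jax.random.split(key, len(sizes) - 1)
    return [(0.1 * jax.random.normal(k, (m, n)), jnp.zeros(n))
            for k, m, n in zip(keys, sizes[:-1], sizes[1:])]

def net(params, x):
    for W, b in params[:-1]:
        x = jnp.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

def loss(params, x, y):
    return jnp.mean((net(params, x) - y) ** 2)

def inner_update(theta, x_tr, y_tr, alpha=0.01):
    # phi = theta - alpha * grad_theta L(theta, D_tr): one adaptation step
    g = jax.grad(loss)(theta, x_tr, y_tr)
    return jax.tree_util.tree_map(lambda p, gi: p - alpha * gi, theta, g)

def maml_loss(theta, x_tr, y_tr, x_ts, y_ts):
    # Outer objective: loss of the adapted parameters on held-out task data
    return loss(inner_update(theta, x_tr, y_tr), x_ts, y_ts)

meta_grad_fn = jax.grad(maml_loss)
```

Meta-training averages meta_grad_fn over a batch of tasks and feeds it to any standard optimizer; jax.grad differentiates through the inner gradient step, second-order terms included.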
…and the results keep getting better
MiniImagenet few-shot classification benchmark (5-shot, 5-way accuracy):
- Finn et al. ’17 (MAML): 63.11%
- Li et al. ’17: 64.03%
- Kim et al. ’18 (AutoMeta): 76.29%
Design of f? Recurrent network vs. MAML
With a recurrent network, the network itself implements the “learned learning procedure”; with MAML, the learned procedure is gradient descent from a meta-learned initialization.
Recurrent network:
- Does it converge? Who knows…
- What does it converge to? Nothing
- What to do if not good enough? Sort of?
MAML:
- Does it converge? Yes (it’s gradient descent…)
- What does it converge to? A local optimum (it’s gradient descent…)
- What to do if not good enough? Keep taking gradient steps (it’s gradient descent…)
Does this structure come at a cost?
For a sufficiently deep f, the MAML function can approximate any function of (D^tr, x_test).
Finn & Levine, ICLR 2018
Why is this interesting? MAML has the benefit of inductive bias without losing expressive power.
Assumptions:
- nonzero learning rate α
- the loss function gradient does not lose information about the label
- datapoints in D^tr are unique
Is this structure useful?
Finn, Levine. Meta-learning and Universality: Deep Representation… ICLR 2018
How well can methods generalize to similar, but extrapolated tasks?
(plot: performance vs. task variability on Omniglot image classification, comparing MAML with TCML and MetaNetworks)
The world is non-stationary.
Finn, Levine. Meta-learning and Universality: Deep Representation… ICLR 2018
How well can methods generalize to similar, but extrapolated tasks?
(plot: error vs. task variability on sinusoid curve regression, MAML vs. TCML)
Takeaway: strategies learned with MAML consistently generalize better to out-of-distribution tasks
The world is non-stationary.
Finn, Levine. Meta-learning and Universality: Deep Representation… ICLR 2018
Outline
- Meta-Learning Problem Statement
- Model-Agnostic Meta-Learning (MAML)
- Probabilistic Interpretation of MAML
- Meta-Learning with Automated Task Proposals
- Extensions to Robot Imitation & Intent Inference
A probabilistic interpretation
That’s nice, but is it… Bayesian? Kind of… but it can be more Bayesian.
Grant, Finn, Levine, Darrell, Griffiths. “Recasting gradient-based meta-learning as hierarchical Bayes.”
(Santos, 1996)
A probabilistic interpretation
Grant, Finn, Levine, Darrell, Griffiths. “Recasting gradient-based meta-learning as hierarchical Bayes.”
MAP inference in this model
Can we do better than MAP? We can use a Laplace estimate, estimating the Hessian for neural nets with K-FAC.
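Spelling this out (notation assumed here, following the general hierarchical-Bayes recipe rather than the exact slide): each task has parameters φ_i drawn from a prior p(φ_i | θ), and the Laplace approximation around the MAP estimate gives

$$
\log p(\mathcal{D}_i \mid \theta)
  = \log \int p(\mathcal{D}_i \mid \phi_i)\, p(\phi_i \mid \theta)\, d\phi_i
  \approx \log p(\mathcal{D}_i \mid \hat{\phi}_i)
        + \log p(\hat{\phi}_i \mid \theta)
        - \tfrac{1}{2}\log\det H_i
$$

where φ̂_i is the MAP estimate (the MAML inner update) and H_i is the Hessian of the negative log posterior at φ̂_i, the quantity estimated with K-FAC.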
Modeling ambiguity
Important for:
- learning to actively learn
- safety-critical few-shot learning (e.g. medical imaging)
- learning to explore in meta-RL
Can we learn to generate hypotheses about the underlying function?
Modeling ambiguity
Can we sample classifiers?
Intuition: we want to learn a prior where a random kick can put us in different modes
(figure: the one-shot task is ambiguous, e.g. “smiling, hat” vs. “smiling, young”)
Meta-learning with ambiguity
Finn*, Xu*, Levine. Probabilistic Model-Agnostic Meta-Learning
Sampling parameter vectors
Approximate with MAP: this is extremely crude, but extremely convenient!
Finn*, Xu*, Levine. Probabilistic Model-Agnostic Meta-Learning
Sampling parameter vectors
What does ancestral sampling look like?
(figure: different samples capture different hypotheses, e.g. “smiling, hat” vs. “smiling, young”)
Finn*, Xu*, Levine. Probabilistic Model-Agnostic Meta-Learning
Training the model – not so simple!
Can’t do ancestral sampling anymore!
Training the model with amortized inference
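A sketch of the resulting objective (notation assumed): with an amortized inference network q_ψ that sees both the training and test sets of task i, maximize the variational lower bound

$$
\mathbb{E}_{\theta_i \sim q_\psi(\theta_i \mid \mathcal{D}^{\text{tr}}_i, \mathcal{D}^{\text{test}}_i)}
  \big[ \log p(\mathcal{D}^{\text{test}}_i \mid \phi_i) \big]
  - D_{\mathrm{KL}}\!\big( q_\psi(\theta_i \mid \mathcal{D}^{\text{tr}}_i, \mathcal{D}^{\text{test}}_i)
  \,\big\|\, p(\theta_i \mid \mathcal{D}^{\text{tr}}_i) \big),
\qquad
\phi_i = \theta_i - \alpha \nabla_{\theta_i} \mathcal{L}(\theta_i, \mathcal{D}^{\text{tr}}_i)
$$

so the extra test-set information is only needed at meta-training time; at meta-test time the learned prior alone is used.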
Summary
During meta-training: amortized variational inference.
During meta-testing: ancestral sampling from the learned prior, followed by a gradient-based adaptation step.
PLATIPUS: Probabilistic LATent model for Incorporating Priors and Uncertainty in few-Shot learning
Finn*, Xu*, Levine. Probabilistic Model-Agnostic Meta-Learning
(figures: ambiguous regression and ambiguous classification results)
PLATIPUS: Probabilistic LATent model for Incorporating Priors and Uncertainty in few-Shot learning
Finn*, Xu*, Levine. Probabilistic Model-Agnostic Meta-Learning
Outline
- Meta-Learning Problem Statement
- Model-Agnostic Meta-Learning (MAML)
- Probabilistic Interpretation of MAML
- Meta-Learning with Automated Task Proposals
- Extensions to Robot Imitation & Intent Inference
Let’s Talk about Meta-Overfitting
- Meta-learning requires task distributions
- When there are too few meta-training tasks, we can meta-overfit
- Specifying task distributions is hard, especially for meta-RL!
- Can we propose tasks automatically?
(figure: after MAML training vs. after 1 gradient step)
A General Recipe for Unsupervised Meta-RL
(diagram: environment → unsupervised task acquisition → meta-RL → meta-learned environment-specific RL algorithm; at meta-test time, a reward function → fast adaptation → reward-maximizing policy)
Gupta, Eysenbach, Finn, Levine. Unsupervised Meta-Learning for Reinforcement Learning.
Random Task Proposals
- Use randomly initialized discriminators as reward functions
- Important: random functions over the state space, not random policies
(figure: states 1, 2, 3, …, n; random policy: exponential, random reward: polynomial)
R(s, z) = log p_D(z|s), where D is a randomly initialized network
Diversity-Driven Proposals
(diagram: the policy (agent), conditioned on a skill z, takes actions in the environment; the discriminator D observes the resulting state and predicts the skill)
- Policy → visit states which are discriminable
- Discriminator → predict the skill from the state
Task reward for UML (unsupervised meta-learning): R(s, z) = log p_D(z|s)
Eysenbach, Gupta, Ibarz, Levine. Diversity is All You Need.
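To make the discriminator game concrete, here is a minimal sketch of the two objectives; the toy two-layer discriminator, layer sizes, and names are illustrative assumptions, not the DIAYN implementation.

```python
import jax
import jax.numpy as jnp

def disc_logits(params, s):
    # Tiny discriminator q_D(z|s): one hidden layer, logits over skills
    (W1, b1), (W2, b2) = params
    h = jnp.tanh(s @ W1 + b1)
    return h @ W2 + b2

def disc_loss(params, states, skills):
    # Discriminator objective: predict the skill z from the visited state s
    logp = jax.nn.log_softmax(disc_logits(params, states))
    return -jnp.mean(jnp.take_along_axis(logp, skills[:, None], axis=1))

def task_reward(params, state, skill):
    # R(s, z) = log p_D(z|s): the policy is rewarded for reaching states
    # from which its skill is easy to identify
    logp = jax.nn.log_softmax(disc_logits(params, state[None]))[0]
    return logp[skill]
```

The random task proposals above correspond to using this same reward with a randomly initialized, untrained discriminator.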
Examples of Acquired Tasks
Eysenbach, Gupta, Ibarz, Levine. Diversity is All You Need.
(videos: Cheetah, Ant)
Does it work?
Gupta, Eysenbach, Finn, Levine. Unsupervised Meta-Learning for Reinforcement Learning.
(plots: meta-test performance with rewards on 2D Navigation, Cheetah, and Ant)
Clavera*, Nagabandi*, Abbeel, Levine, Finn, arXiv 2018
What about varying transition models?
Online adaptation = few-shot learning; tasks are temporal.
Task distribution is more-or-less free!
Meta-learning using either: a dynamics RNN (RBAC) or MAML (GBAC)
(videos: model-based RL only vs. with MAML)
Clavera*, Nagabandi*, Abbeel, Levine, Finn, arXiv 2018
Online Adaptation via Meta-Learning
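A sketch of the gradient-based variant (GBAC): treat the most recent transitions as the training set for the current “task” and adapt the meta-trained dynamics model on the fly before planning. The linear model and function names below are illustrative stand-ins, not the paper's architecture.

```python
import jax
import jax.numpy as jnp

def predict_next(theta, s, a):
    # Stand-in dynamics model: s' ≈ [s, a] @ W + b (a neural net in practice)
    W, b = theta
    return jnp.concatenate([s, a], axis=-1) @ W + b

def dynamics_loss(theta, s, a, s_next):
    return jnp.mean((predict_next(theta, s, a) - s_next) ** 2)

def adapt_online(theta, recent_s, recent_a, recent_s_next, alpha=0.01):
    # Few-shot adaptation on the K most recent transitions; meta-training
    # chooses theta so one step corrects for the current dynamics
    # (broken leg, new terrain, ...)
    g = jax.grad(dynamics_loss)(theta, recent_s, recent_a, recent_s_next)
    return jax.tree_util.tree_map(lambda p, gi: p - alpha * gi, theta, g)

# The adapted model is then handed to the planner (e.g. MPC) for the next action.
```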
Outline
- Meta-Learning Problem Statement
- Model-Agnostic Meta-Learning (MAML)
- Probabilistic Interpretation of MAML
- Meta-Learning with Automated Task Proposals
- Extensions to Robot Imitation & Intent Inference
Visual imitation is expensive.
Goal: Given one visual demonstration of a new task, learn a policy
Prior work learns from raw pixels, but requires many demonstrations (Zhang et al. ’17, Rahmatizadeh et al. ’17)
Through meta-learning: reuse data from other tasks/objects/environments
One-Shot Visual Imitation Learning
Finn*, Yu*, Zhang, Abbeel, Levine CoRL ‘17
How? Learn to learn the policy from one demonstration.
(diagram: meta-training tasks 1, 2, …, each with a training demo and a validation demo)
Meta-test time: learn the policy from one demo.
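Concretely, this is the same MAML objective with a behavior-cloning loss in both loops, adapting on one demo and evaluating on a second demo of the same task (a sketch in the earlier notation):

$$
\min_\theta \sum_{\text{task } i}
  \mathcal{L}_{\text{BC}}\big(\theta - \alpha \nabla_\theta
  \mathcal{L}_{\text{BC}}(\theta, d_i^{\text{train}}),\; d_i^{\text{val}}\big)
$$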
One-Shot Visual Imitation Learning
Finn*, Yu*, Zhang, Abbeel, Levine CoRL ‘17
real-world placing from pixels
(videos: input demo → resulting policy, on a subset of training objects and on held-out test objects)
Finn*, Yu*, Zhang, Abbeel, Levine CoRL ‘17
Few-Shot Learning from Weak Supervision
Given 1 example of 5 classes: classify new examples. Given 1 positive example: classify new examples.
Given one teleoperated demonstration: learn a policy. Given a video of a human: learn a policy.
Grant, Finn, Peterson, Abbott, Levine, Darrell, Griffiths NIPS CIAI Workshop ’17
Yu*, Finn*, Xie, Dasari, Zhang, Abbeel, Levine RSS ’18
Learning to Learn from Weak Supervision
(diagram: meta-training adapts with a weakly supervised loss and computes the outer imitation loss with full supervision; at meta-test time, only weak supervision is available)
What if the weakly supervised loss is unavailable?
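The answer here is to meta-learn the adaptation loss itself. A sketch (notation assumed), where L_ψ is a learned inner loss applied to the human video and the outer loss is ordinary behavior cloning on a paired teleoperated robot demo:

$$
\min_{\theta, \psi} \sum_{\text{task } i}
  \mathcal{L}_{\text{BC}}\big(\theta - \alpha \nabla_\theta
  \mathcal{L}_{\psi}(\theta, d_i^{\text{human}}),\; d_i^{\text{robot}}\big)
$$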
Yu*, Finn*, Xie, Dasari, Zhang, Abbeel, Levine RSS ‘18
placing from pixels
Yu*, Finn*, Xie, Dasari, Zhang, Abbeel, Levine RSS ‘18
(video: input demo → resulting policy)
pick-and-place from pixels (2x real time)
Yu*, Finn*, Xie, Dasari, Zhang, Abbeel, Levine RSS ‘18
(video: input demo → resulting policy)
pick-and-place from pixels (2x real time)
Error in inferring the goal, or in executing on it?
Can we decouple goal inference & policy learning?
What should the robot do to improve?
Meta-Learning Approach
How? Learn to learn the policy from one demonstration.
(diagram: meta-training tasks with training and validation demos)
Meta-test time: learn the policy from one demo.
Meta-Learning Approach
How? Learn to infer the objective from one demonstration.
(diagram: meta-training tasks with training and validation demos)
Meta-test time: learn the objective from one demo, then learn the policy by optimizing the reward.
(diagram: meta-training tasks 1, 2, …, N, each with negative (−) and positive (+) example observations)
Given a few observations of reaching the goal: infer an objective
Xie, Singh, Levine, Finn ’18 (under review)
(then run RL or planning)
Meta-training: both positive & negative examples, via MAML with a supervised loss.
Test time: only need positive examples; another form of weak supervision!
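A minimal sketch of this asymmetric scheme (a toy linear success classifier on image features; names are illustrative assumptions): the inner update fits to a few positives only, while meta-training's outer loss includes negatives so the adapted classifier stays discriminative.

```python
import jax
import jax.numpy as jnp

def classify(theta, x):
    # Stand-in linear classifier producing success logits
    W, b = theta
    return x @ W + b

def bce_loss(theta, x, labels):
    logits = classify(theta, x)
    return -jnp.mean(labels * jax.nn.log_sigmoid(logits)
                     + (1 - labels) * jax.nn.log_sigmoid(-logits))

def adapt_on_positives(theta, pos_x, alpha=0.01):
    # Inner update: only positive examples are available (weak supervision)
    g = jax.grad(bce_loss)(theta, pos_x, jnp.ones(pos_x.shape[0]))
    return jax.tree_util.tree_map(lambda p, gi: p - alpha * gi, theta, g)

def meta_loss(theta, pos_x, test_x, test_labels):
    # Outer loss: positives AND negatives keep the adapted classifier honest
    return bce_loss(adapt_on_positives(theta, pos_x), test_x, test_labels)
```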
Given 5 examples of success → visual MPC with learned objective
Xie, Singh, Levine, Finn ’18 (under review)
Novel Object Positioning via Visual Planning
infer goal classifier
visual MPC w.r.t. goal classifier
Xie, Singh, Levine, Finn ’18 (under review)
Given 5 examples of success → RL policy with learned objective
Xie, Singh, Levine, Finn ’18 (under review)
Rope Manipulation via Reinforcement Learning
infer goal classifier
visual RL w.r.t. goal classifier
Questions?
Meta-Learning Foundations
- Finn, Abbeel, Levine. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. ICML ’17
- Finn, Levine. Meta-Learning and Universality: Deep Representations and Gradient Descent can Approximate any Learning Algorithm. ICLR ’18
- Grant, Finn, Levine, Darrell, Griffiths. Recasting Gradient-Based Meta-Learning as Hierarchical Bayes. ICLR ’18
- Finn*, Xu*, Levine. Probabilistic Model-Agnostic Meta-Learning. under review ’18
Meta-Learning for Control
- Finn*, Yu*, Zhang, Abbeel, Levine. One-Shot Visual Imitation Learning via Meta-Learning. CoRL ’17
- Yu*, Finn*, Xie, Dasari, Zhang, Abbeel, Levine. One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning. RSS ’18
- Clavera*, Nagabandi*, Fearing, Abbeel, Levine, Finn. Learning to Adapt: Meta-Learning for Model-Based Control. under review ’18
- Xu*, Ratner, Dragan, Levine, Finn. Learning a Prior over Intent via Meta-Inverse Reinforcement Learning. under review ’18
- Gupta, Eysenbach, Finn, Levine. Unsupervised Meta-Learning for Reinforcement Learning. under review ’18
- Xie, Singh, Levine, Finn. Few-Shot Goal Inference for Visuomotor Learning and Planning. under review ’18