Meta-Learning Frontiers: Universal, Uncertain, and Unsupervised
Sergey Levine and Chelsea Finn
UC Berkeley / Google Brain
Levine*, Finn*, Darrell, Abbeel, ‘16
Yahya, Li, Kalakrishnan, Chebotar, Levine, ‘16
Generalizable model-based RL via video prediction
(figure: designated pixel and goal pixel)
Ebert, Finn, Lee, Levine. Self-Supervised Visual Planning via Temporal Skip Connections. CoRL 2017.
Kalashnikov, Irpan, Pastor, Ibarz, Herzog, Jang, Quillen, Holly, Kalakrishnan, Vanhoucke, Levine. QT-Opt: Scalable Deep Reinforcement Learning of Vision-Based Robotic Manipulation Skills
What to transfer? People can learn new skills extremely quickly (about four hours), while the robot needs about four weeks, nonstop.
How? We never learn from scratch!
Representations? Models? Can we just optimize for what we really want? Can we learn to learn?
Outline
- Meta-Learning Problem Statement
- Model-Agnostic Meta-Learning (MAML)
- Probabilistic Interpretation of MAML
- Meta-Learning with Automated Task Proposals
- Extensions to Robot Imitation & Intent Inference
The Meta-Learning Problem
Supervised learning: inputs x, outputs y, data D = {(x_k, y_k)}; learn f(x) = y.
Meta-supervised learning: inputs (D^tr, x_test), outputs y_test, data {D_i}; learn f(D^tr, x_test) = y_test.
Why is this view useful? Reduces the problem to the design & optimization of f.
Finn, Levine. Meta-learning and Universality: Deep Representation… ICLR 2018
Example: Few-Shot Classification
diagram adapted from Ravi & Larochelle ‘17
Given 1 example of 5 classes: Classify new examples
(figure: meta-training tasks, each built from the training classes with its own training data and test set)
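To make the setup concrete, here is a minimal sketch of how such 1-shot, 5-way episodes are typically constructed during meta-training; the data layout and names are illustrative assumptions, not a specific codebase.

```python
import numpy as np

def sample_episode(images_by_class, n_way=5, k_shot=1, n_query=15, rng=None):
    """Build one few-shot episode: a tiny training set (support) and a
    test set (query) over the same n_way randomly chosen classes."""
    rng = rng or np.random.default_rng()
    classes = rng.choice(len(images_by_class), size=n_way, replace=False)
    support, query = [], []
    for label, c in enumerate(classes):
        idx = rng.permutation(len(images_by_class[c]))[:k_shot + n_query]
        imgs = [images_by_class[c][i] for i in idx]
        support += [(x, label) for x in imgs[:k_shot]]  # 1 example per class
        query += [(x, label) for x in imgs[k_shot:]]    # new examples to classify
    return support, query  # meta-learning trains f so that f(support, x_query) = y_query
```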
Outline
- Meta-Learning Problem Statement
- Model-Agnostic Meta-Learning (MAML)
- Probabilistic Interpretation of MAML
- Meta-Learning with Automated Task Proposals
- Extensions to Robot Imitation & Intent Inference
Design of f? Recurrent network (LSTM, NTM, Conv)
Santoro et al. ’16, Duan et al. ’17, Wang et al. ’17, Munkhdalai & Yu ’17, Mishra et al. ’17, …
- complex model for the complex task of learning
- impractical data requirements
Key idea: train over many tasks to learn a parameter vector θ that transfers.
Learning Few-Shot Adaptation
Fine-tuning [test-time]: φ ← θ − α ∇_θ L(θ, D^tr), starting from pretrained parameters θ with training data D^tr for the new task.
Our method: min_θ Σ_i L(θ − α ∇_θ L(θ, D_i^tr), D_i^test)
Finn, Abbeel, Levine ICML ’17
Learning Few-Shot Adaptation
Finn, Abbeel, Levine ICML ’17
(figure: the meta-learned parameter vector θ is adapted by a gradient step toward the optimal parameter vector φ_i* for each task i)
Model-Agnostic Meta-Learning
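The objective above is compact enough to sketch directly. Below is a minimal MAML loop in JAX for sinusoid-style regression; the architecture, step sizes, and names are illustrative assumptions, not the paper's exact settings.

```python
import jax
import jax.numpy as jnp

def init_params(key, sizes=(1, 40, 40, 1)):
    # Small MLP: list of (weight, bias) pairs
    keys = jax.random.split(key, len(sizes) - 1)
    return [(0.1 * jax.random.normal(k, (m, n)), jnp.zeros(n))
            for k, m, n in zip(keys, sizes[:-1], sizes[1:])]

def net(params, x):
    for W, b in params[:-1]:
        x = jnp.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

def loss(params, x, y):
    return jnp.mean((net(params, x) - y) ** 2)

def inner_update(theta, x_tr, y_tr, alpha=0.01):
    # phi = theta - alpha * grad_theta L(theta, D_tr): one adaptation step
    g = jax.grad(loss)(theta, x_tr, y_tr)
    return jax.tree_util.tree_map(lambda p, gi: p - alpha * gi, theta, g)

def maml_loss(theta, x_tr, y_tr, x_ts, y_ts):
    # Outer objective: loss of the adapted parameters on held-out task data
    return loss(inner_update(theta, x_tr, y_tr), x_ts, y_ts)

meta_grad_fn = jax.grad(maml_loss)
```

Meta-training averages meta_grad_fn over a batch of tasks and feeds it to any standard optimizer; jax.grad differentiates through the inner gradient step, second-order terms included.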
…and the results keep getting better
MiniImagenet few-shot classification benchmark (5-shot, 5-way accuracy):
- Finn et al. ’17 (MAML): 63.11%
- Li et al. ’17: 64.03%
- Kim et al. ’18 (AutoMeta): 76.29%
Design of f? Recurrent network vs. MAML
With a recurrent network, the network itself implements the “learned learning procedure”; with MAML, the learned procedure is gradient descent from a meta-learned initialization.
Recurrent network:
- Does it converge? Who knows…
- What does it converge to? Nothing
- What to do if not good enough? Sort of?
MAML:
- Does it converge? Yes (it’s gradient descent…)
- What does it converge to? A local optimum (it’s gradient descent…)
- What to do if not good enough? Keep taking gradient steps (it’s gradient descent…)
Does this structure come at a cost?
For a sufficiently deep f, the MAML function can approximate any function of (D^tr, x_test).
Finn & Levine, ICLR 2018
Why is this interesting? MAML has the benefit of inductive bias without losing expressive power.
Assumptions:
- nonzero learning rate α
- the loss function gradient does not lose information about the label
- datapoints in D^tr are unique
Is this structure useful?
Finn, Levine. Meta-learning and Universality: Deep Representation… ICLR 2018
How well can methods generalize to similar, but extrapolated tasks?
(plot: performance vs. task variability on Omniglot image classification, comparing MAML with TCML and MetaNetworks)
The world is non-stationary.
Finn, Levine. Meta-learning and Universality: Deep Representation… ICLR 2018
How well can methods generalize to similar, but extrapolated tasks?
(plot: error vs. task variability on sinusoid curve regression, MAML vs. TCML)
Takeaway: strategies learned with MAML consistently generalize better to out-of-distribution tasks
The world is non-stationary.
Finn, Levine. Meta-learning and Universality: Deep Representation… ICLR 2018
Outline
- Meta-Learning Problem Statement
- Model-Agnostic Meta-Learning (MAML)
- Probabilistic Interpretation of MAML
- Meta-Learning with Automated Task Proposals
- Extensions to Robot Imitation & Intent Inference
A probabilistic interpretation
That’s nice, but is it… Bayesian? Kind of… but it can be more Bayesian.
Grant, Finn, Levine, Darrell, Griffiths. “Recasting gradient-based meta-learning as hierarchical Bayes.”
(Santos, 1996)
A probabilistic interpretation
Grant, Finn, Levine, Darrell, Griffiths. “Recasting gradient-based meta-learning as hierarchical Bayes.”
MAP inference in this model
Can we do better than MAP? We can use a Laplace estimate, estimating the Hessian for neural nets with K-FAC.
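Spelling this out (notation assumed here, following the general hierarchical-Bayes recipe rather than the exact slide): each task has parameters φ_i drawn from a prior p(φ_i | θ), and the Laplace approximation around the MAP estimate gives

$$
\log p(\mathcal{D}_i \mid \theta)
  = \log \int p(\mathcal{D}_i \mid \phi_i)\, p(\phi_i \mid \theta)\, d\phi_i
  \approx \log p(\mathcal{D}_i \mid \hat{\phi}_i)
        + \log p(\hat{\phi}_i \mid \theta)
        - \tfrac{1}{2}\log\det H_i
$$

where φ̂_i is the MAP estimate (the MAML inner update) and H_i is the Hessian of the negative log posterior at φ̂_i, the quantity estimated with K-FAC.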
Modeling ambiguity
Important for:
- learning to actively learn
- safety-critical few-shot learning (e.g. medical imaging)
- learning to explore in meta-RL
Can we learn to generate hypotheses about the underlying function?
Modeling ambiguity
Can we sample classifiers?
Intuition: we want to learn a prior where a random kick can put us in different modes
(figure: the one-shot task is ambiguous, e.g. “smiling, hat” vs. “smiling, young”)
Meta-learning with ambiguity
Finn*, Xu*, Levine. Probabilistic Model-Agnostic Meta-Learning
Sampling parameter vectors
Approximate with MAP: this is extremely crude, but extremely convenient!
Finn*, Xu*, Levine. Probabilistic Model-Agnostic Meta-Learning
Sampling parameter vectors
What does ancestral sampling look like?
(figure: different samples capture different hypotheses, e.g. “smiling, hat” vs. “smiling, young”)
Finn*, Xu*, Levine. Probabilistic Model-Agnostic Meta-Learning
Training the model – not so simple!
Can’t do ancestral sampling anymore!
Training the model with amortized inference
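A sketch of the resulting objective (notation assumed): with an amortized inference network q_ψ that sees both the training and test sets of task i, maximize the variational lower bound

$$
\mathbb{E}_{\theta_i \sim q_\psi(\theta_i \mid \mathcal{D}^{\text{tr}}_i, \mathcal{D}^{\text{test}}_i)}
  \big[ \log p(\mathcal{D}^{\text{test}}_i \mid \phi_i) \big]
  - D_{\mathrm{KL}}\!\big( q_\psi(\theta_i \mid \mathcal{D}^{\text{tr}}_i, \mathcal{D}^{\text{test}}_i)
  \,\big\|\, p(\theta_i \mid \mathcal{D}^{\text{tr}}_i) \big),
\qquad
\phi_i = \theta_i - \alpha \nabla_{\theta_i} \mathcal{L}(\theta_i, \mathcal{D}^{\text{tr}}_i)
$$

so the extra test-set information is only needed at meta-training time; at meta-test time the learned prior alone is used.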
Summary
During meta-training: amortized variational inference.
During meta-testing: ancestral sampling from the learned prior, followed by a gradient-based adaptation step.
PLATIPUS: Probabilistic LATent model for Incorporating Priors and Uncertainty in few-Shot learning
Finn*, Xu*, Levine. Probabilistic Model-Agnostic Meta-Learning
(figures: ambiguous regression and ambiguous classification results)
PLATIPUS: Probabilistic LATent model for Incorporating Priors and Uncertainty in few-Shot learning
Finn*, Xu*, Levine. Probabilistic Model-Agnostic Meta-Learning
Outline
- Meta-Learning Problem Statement
- Model-Agnostic Meta-Learning (MAML)
- Probabilistic Interpretation of MAML
- Meta-Learning with Automated Task Proposals
- Extensions to Robot Imitation & Intent Inference
Let’s Talk about Meta-Overfitting
- Meta-learning requires task distributions
- When there are too few meta-training tasks, we can meta-overfit
- Specifying task distributions is hard, especially for meta-RL!
- Can we propose tasks automatically?
(figure: after MAML training vs. after 1 gradient step)
A General Recipe for Unsupervised Meta-RL
(diagram: environment → unsupervised task acquisition → meta-RL → meta-learned environment-specific RL algorithm; at meta-test time, a reward function → fast adaptation → reward-maximizing policy)
Gupta, Eysenbach, Finn, Levine. Unsupervised Meta-Learning for Reinforcement Learning.
Random Task Proposals
- Use randomly initialized discriminators as reward functions
- Important: random functions over the state space, not random policies
(figure: states 1, 2, 3, …, n; random policy: exponential, random reward: polynomial)
R(s, z) = log p_D(z|s), where D is a randomly initialized network
Diversity-Driven Proposals
(diagram: the policy (agent), conditioned on a skill z, takes actions in the environment; the discriminator D observes the resulting state and predicts the skill)
- Policy → visit states which are discriminable
- Discriminator → predict the skill from the state
Task reward for UML (unsupervised meta-learning): R(s, z) = log p_D(z|s)
Eysenbach, Gupta, Ibarz, Levine. Diversity is All You Need.
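To make the discriminator game concrete, here is a minimal sketch of the two objectives; the toy two-layer discriminator, layer sizes, and names are illustrative assumptions, not the DIAYN implementation.

```python
import jax
import jax.numpy as jnp

def disc_logits(params, s):
    # Tiny discriminator q_D(z|s): one hidden layer, logits over skills
    (W1, b1), (W2, b2) = params
    h = jnp.tanh(s @ W1 + b1)
    return h @ W2 + b2

def disc_loss(params, states, skills):
    # Discriminator objective: predict the skill z from the visited state s
    logp = jax.nn.log_softmax(disc_logits(params, states))
    return -jnp.mean(jnp.take_along_axis(logp, skills[:, None], axis=1))

def task_reward(params, state, skill):
    # R(s, z) = log p_D(z|s): the policy is rewarded for reaching states
    # from which its skill is easy to identify
    logp = jax.nn.log_softmax(disc_logits(params, state[None]))[0]
    return logp[skill]
```

The random task proposals above correspond to using this same reward with a randomly initialized, untrained discriminator.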
Examples of Acquired Tasks
Eysenbach, Gupta, Ibarz, Levine. Diversity is All You Need.
(videos: Cheetah, Ant)
Does it work?
Gupta, Eysenbach, Finn, Levine. Unsupervised Meta-Learning for Reinforcement Learning.
(plots: meta-test performance with rewards on 2D Navigation, Cheetah, and Ant)
Clavera*, Nagabandi*, Abbeel, Levine, Finn, arXiv 2018
What about varying transition models?
Online adaptation = few-shot learning; tasks are temporal.
Task distribution is more-or-less free!
Meta-learning using either: a dynamics RNN (RBAC) or MAML (GBAC)
(videos: model-based RL only vs. with MAML)
Clavera*, Nagabandi*, Abbeel, Levine, Finn, arXiv 2018
Online Adaptation via Meta-Learning
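A sketch of the gradient-based variant (GBAC): treat the most recent transitions as the training set for the current “task” and adapt the meta-trained dynamics model on the fly before planning. The linear model and function names below are illustrative stand-ins, not the paper's architecture.

```python
import jax
import jax.numpy as jnp

def predict_next(theta, s, a):
    # Stand-in dynamics model: s' ≈ [s, a] @ W + b (a neural net in practice)
    W, b = theta
    return jnp.concatenate([s, a], axis=-1) @ W + b

def dynamics_loss(theta, s, a, s_next):
    return jnp.mean((predict_next(theta, s, a) - s_next) ** 2)

def adapt_online(theta, recent_s, recent_a, recent_s_next, alpha=0.01):
    # Few-shot adaptation on the K most recent transitions; meta-training
    # chooses theta so one step corrects for the current dynamics
    # (broken leg, new terrain, ...)
    g = jax.grad(dynamics_loss)(theta, recent_s, recent_a, recent_s_next)
    return jax.tree_util.tree_map(lambda p, gi: p - alpha * gi, theta, g)

# The adapted model is then handed to the planner (e.g. MPC) for the next action.
```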
Outline
- Meta-Learning Problem Statement
- Model-Agnostic Meta-Learning (MAML)
- Probabilistic Interpretation of MAML
- Meta-Learning with Automated Task Proposals
- Extensions to Robot Imitation & Intent Inference
Visual imitation is expensive.
Goal: Given one visual demonstration of a new task, learn a policy
Prior work learns from raw pixels, but requires many demonstrations (Zhang et al. ’17, Rahmatizadeh et al. ’17)
Through meta-learning: reuse data from other tasks/objects/environments
One-Shot Visual Imitation Learning
Finn*, Yu*, Zhang, Abbeel, Levine CoRL ‘17
How? Learn to learn the policy from one demonstration.
(diagram: meta-training tasks 1, 2, …, each with a training demo and a validation demo)
Meta-test time: learn the policy from one demo.
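Concretely, this is the same MAML objective with a behavior-cloning loss in both loops, adapting on one demo and evaluating on a second demo of the same task (a sketch in the earlier notation):

$$
\min_\theta \sum_{\text{task } i}
  \mathcal{L}_{\text{BC}}\big(\theta - \alpha \nabla_\theta
  \mathcal{L}_{\text{BC}}(\theta, d_i^{\text{train}}),\; d_i^{\text{val}}\big)
$$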
One-Shot Visual Imitation Learning
Finn*, Yu*, Zhang, Abbeel, Levine CoRL ‘17
real-world placing from pixels
(videos: input demo → resulting policy, on a subset of training objects and on held-out test objects)
Finn*, Yu*, Zhang, Abbeel, Levine CoRL ‘17
Few-Shot Learning from Weak Supervision
Given 1 example of 5 classes: classify new examples. Given 1 positive example: classify new examples.
Given one teleoperated demonstration: learn a policy. Given a video of a human: learn a policy.
Grant, Finn, Peterson, Abbott, Levine, Darrell, Griffiths NIPS CIAI Workshop ’17
Yu*, Finn*, Xie, Dasari, Zhang, Abbeel, Levine RSS ’18
Learning to Learn from Weak Supervision
(diagram: meta-training adapts with a weakly supervised loss and computes the outer imitation loss with full supervision; at meta-test time, only weak supervision is available)
What if the weakly supervised loss is unavailable?
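The answer here is to meta-learn the adaptation loss itself. A sketch (notation assumed), where L_ψ is a learned inner loss applied to the human video and the outer loss is ordinary behavior cloning on a paired teleoperated robot demo:

$$
\min_{\theta, \psi} \sum_{\text{task } i}
  \mathcal{L}_{\text{BC}}\big(\theta - \alpha \nabla_\theta
  \mathcal{L}_{\psi}(\theta, d_i^{\text{human}}),\; d_i^{\text{robot}}\big)
$$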
Yu*, Finn*, Xie, Dasari, Zhang, Abbeel, Levine RSS ‘18
placing from pixels
Yu*, Finn*, Xie, Dasari, Zhang, Abbeel, Levine RSS ‘18
(video: input demo → resulting policy)
pick-and-place from pixels (2x real time)
Yu*, Finn*, Xie, Dasari, Zhang, Abbeel, Levine RSS ‘18
(video: input demo → resulting policy)
pick-and-place from pixels (2x real time)
Error in inferring the goal, or in executing on it?
Can we decouple goal inference & policy learning?
What should the robot do to improve?
Meta-Learning Approach
How? Learn to learn the policy from one demonstration.
(diagram: meta-training tasks with training and validation demos)
Meta-test time: learn the policy from one demo.
Meta-Learning Approach
How? Learn to infer the objective from one demonstration.
(diagram: meta-training tasks with training and validation demos)
Meta-test time: learn the objective from one demo, then learn the policy by optimizing the reward.
(diagram: meta-training tasks 1, 2, …, N, each with negative (−) and positive (+) example observations)
Given a few observations of reaching the goal: infer an objective
Xie, Singh, Levine, Finn ’18 (under review)
(then run RL or planning)
Meta-training: both positive & negative examples, via MAML with a supervised loss.
Test time: only need positive examples; another form of weak supervision!
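A minimal sketch of this asymmetric scheme (a toy linear success classifier on image features; names are illustrative assumptions): the inner update fits to a few positives only, while meta-training's outer loss includes negatives so the adapted classifier stays discriminative.

```python
import jax
import jax.numpy as jnp

def classify(theta, x):
    # Stand-in linear classifier producing success logits
    W, b = theta
    return x @ W + b

def bce_loss(theta, x, labels):
    logits = classify(theta, x)
    return -jnp.mean(labels * jax.nn.log_sigmoid(logits)
                     + (1 - labels) * jax.nn.log_sigmoid(-logits))

def adapt_on_positives(theta, pos_x, alpha=0.01):
    # Inner update: only positive examples are available (weak supervision)
    g = jax.grad(bce_loss)(theta, pos_x, jnp.ones(pos_x.shape[0]))
    return jax.tree_util.tree_map(lambda p, gi: p - alpha * gi, theta, g)

def meta_loss(theta, pos_x, test_x, test_labels):
    # Outer loss: positives AND negatives keep the adapted classifier honest
    return bce_loss(adapt_on_positives(theta, pos_x), test_x, test_labels)
```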
Given 5 examples of success → visual MPC with learned objective
Xie, Singh, Levine, Finn ’18 (under review)
Novel Object Positioning via Visual Planning
infer goal classifier
visual MPC w.r.t. goal classifier
Xie, Singh, Levine, Finn ’18 (under review)
Given 5 examples of success → RL policy with learned objective
Xie, Singh, Levine, Finn ’18 (under review)
Rope Manipulation via Reinforcement Learning
infer goal classifier
visual RL w.r.t. goal classifier
Questions?
Meta-Learning Foundations
- Finn, Abbeel, Levine. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. ICML ’17
- Finn, Levine. Meta-Learning and Universality: Deep Representations and Gradient Descent can Approximate any Learning Algorithm. ICLR ’18
- Grant, Finn, Levine, Darrell, Griffiths. Recasting Gradient-Based Meta-Learning as Hierarchical Bayes. ICLR ’18
- Finn*, Xu*, Levine. Probabilistic Model-Agnostic Meta-Learning. under review ’18
Meta-Learning for Control
- Finn*, Yu*, Zhang, Abbeel, Levine. One-Shot Visual Imitation Learning via Meta-Learning. CoRL ’17
- Yu*, Finn*, Xie, Dasari, Zhang, Abbeel, Levine. One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning. RSS ’18
- Clavera*, Nagabandi*, Fearing, Abbeel, Levine, Finn. Learning to Adapt: Meta-Learning for Model-Based Control. under review ’18
- Xu*, Ratner, Dragan, Levine, Finn. Learning a Prior over Intent via Meta-Inverse Reinforcement Learning. under review ’18
- Gupta, Eysenbach, Finn, Levine. Unsupervised Meta-Learning for Reinforcement Learning. under review ’18
- Xie, Singh, Levine, Finn. Few-Shot Goal Inference for Visuomotor Learning and Planning. under review ’18