Probabilistic Model-Agnostic Meta-Learning

Probabilistic Model-Agnostic Meta-Learning Chelsea Finn*, Kelvin Xu*, Sergey Levine *equal contribution

Transcript of Probabilistic Model-Agnostic Meta-Learning (cbfinn/_files/icml2018_pgm_platipus.pdf)

Page 1:

Probabilistic Model-Agnostic Meta-Learning
Chelsea Finn*, Kelvin Xu*, Sergey Levine

*equal contribution

Page 2:

Deep learning requires a lot of data.

Few-shot learning / meta-learning: incorporate prior knowledge from previous tasks for fast learning of new tasks

Given 1 example of each of 5 classes: classify new examples

few-shot image classification

Page 3:

Ambiguity in Few-Shot Learning

Important for:
- learning to actively learn
- safety-critical few-shot learning (e.g. medical imaging)
- learning to explore in meta-RL

Can we learn to generate hypotheses about the underlying function?


Page 4:

Key idea: Train over many tasks, to learn parameter vector θ that transfers

Meta-test time:

Background: Model-Agnostic Meta-Learning
Optimize for pretrained parameters

MAML:

Finn, Abbeel, Levine ICML ’17
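The MAML recipe sketched above (adapt θ with a gradient step on each task's support set, then meta-optimize the query-set loss of the adapted parameters) can be illustrated numerically. This is a minimal first-order sketch for a linear regression model, assuming illustrative names and hyperparameters, not the paper's actual implementation:

```python
# First-order MAML sketch on linear regression (illustrative, not the paper's code).
# Model: f(x) = x @ theta; loss: mean squared error.
import numpy as np

def loss(theta, X, y):
    return np.mean((X @ theta - y) ** 2)

def grad(theta, X, y):
    # Gradient of the mean-squared-error loss w.r.t. theta.
    return 2 * X.T @ (X @ theta - y) / len(y)

def inner_update(theta, X, y, alpha=0.05):
    # One gradient step on the task's support set: theta' = theta - alpha * grad.
    return theta - alpha * grad(theta, X, y)

def maml_step(theta, tasks, alpha=0.05, beta=0.05):
    # Outer (meta) update: accumulate query-set gradients at the adapted
    # parameters theta'. Evaluating the gradient at theta' (rather than
    # backpropagating through the inner step) is the first-order approximation.
    meta_grad = np.zeros_like(theta)
    for (X_support, y_support, X_query, y_query) in tasks:
        theta_prime = inner_update(theta, X_support, y_support, alpha)
        meta_grad += grad(theta_prime, X_query, y_query)
    return theta - beta * meta_grad / len(tasks)
```

Repeatedly calling `maml_step` over a distribution of tasks moves θ toward an initialization from which one inner gradient step fits each task well.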

How to reason about ambiguity?

Page 5:

Modeling ambiguity

Can we sample classifiers?

Intuition of our approach: learn a prior where a random kick can put us in different modes

smiling, hat
smiling, young

Page 6:

Meta-learning with ambiguity

Page 7:

Sampling parameter vectors

Approximate with MAP: this is extremely crude, but extremely convenient!

Training is harder. We use amortized variational inference.

(Santos ’92, Grant et al. ICLR ’18)
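The recipe on this slide (sample a parameter vector from a learned Gaussian prior, the "random kick", then take a gradient step on the support set as a crude MAP-style approximation to the adapted posterior) can be sketched as follows. All function names and hyperparameters here are hypothetical, for illustration only:

```python
# Sketch of sampling adapted parameter vectors from a learned prior
# (illustrative only; PLATIPUS itself trains mu/log_sigma with amortized
# variational inference, which is not shown here).
import numpy as np

rng = np.random.default_rng(0)

def loss_grad(theta, X, y):
    # Gradient of mean-squared error for a linear model f(x) = x @ theta.
    return 2 * X.T @ (X @ theta - y) / len(y)

def sample_adapted_params(mu, log_sigma, X_support, y_support,
                          alpha=0.05, n_samples=5):
    """Sample theta ~ N(mu, diag(sigma^2)) -- the random kick -- then adapt
    each sample with one gradient step on the support set (MAP-style
    approximation to the posterior over task parameters)."""
    sigma = np.exp(log_sigma)
    samples = []
    for _ in range(n_samples):
        theta = mu + sigma * rng.normal(size=mu.shape)
        theta_prime = theta - alpha * loss_grad(theta, X_support, y_support)
        samples.append(theta_prime)
    return samples
```

Each returned sample is one hypothesis about the underlying function; disagreement among the samples is a direct measure of task ambiguity.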

Page 8:

PLATIPUS: Probabilistic LATent model for Incorporating Priors and Uncertainty in few-Shot learning

Ambiguous 5-shot regression:

Ambiguous 1-shot classification:

Page 9:

PLATIPUS: Probabilistic LATent model for Incorporating Priors and Uncertainty in few-Shot learning

Page 10:

Collaborators

Questions?

Sergey Levine, Kelvin Xu

Takeaways

We can use hybrid inference in a hierarchical Bayesian model for scalable, uncertainty-aware meta-learning.

Handling ambiguity is particularly relevant in few-shot learning.

Page 11:

Related concurrent work

Kim et al. “Bayesian Model-Agnostic Meta-Learning”: uses Stein variational gradient descent for sampling parameters

Gordon et al. “Decision-Theoretic Meta-Learning”: unifies a number of meta-learning algorithms under a variational inference framework