Meta-Learning with Implicit Gradients
Aravind Rajeswaran, Chelsea Finn, Sham Kakade, Sergey Levine (2019)
Presented by Mayank Mittal

Transcript (18 pages)

Page 1:

Meta-Learning with Implicit Gradients
Aravind Rajeswaran, Chelsea Finn, Sham Kakade, Sergey Levine (2019)

Mayank Mittal

Page 2:

Model-Agnostic Meta-Learning

Images from: Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, Finn et al. (2017)

▪ Learn a general-purpose internal representation that is transferable across different tasks

Meta-learning objective:

min_θ  E_{Tᵢ ~ p(T)} [ L_{Tᵢ}(φᵢ) ]

where p(T) is the distribution over tasks, Tᵢ is a task drawn from p(T), L is the loss function, and φᵢ are the task-specific parameters adapted from θ.

Page 3:

Model-Agnostic Meta-Learning

Images from: Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, Finn et al. (2017)

Inner Level / Adaptation Step

Outer Level / Meta-Optimization Step
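The two levels above can be sketched on toy least-squares tasks. Everything below (the task form (A, b), step sizes, and step counts) is illustrative, not from the slides; for brevity the sketch uses the first-order approximation to the meta-gradient, whereas full MAML would differentiate through the inner loop.

```python
import numpy as np

def task_loss_grad(phi, A, b):
    # Gradient of a toy per-task loss L(phi) = 0.5 * ||A @ phi - b||^2
    return A.T @ (A @ phi - b)

def maml_meta_step(theta, tasks, inner_lr=0.1, inner_steps=5, outer_lr=0.01):
    """One MAML meta-update. Full MAML would backpropagate through the
    inner loop; this sketch uses the first-order approximation
    (it ignores d phi / d theta) to stay short."""
    meta_grad = np.zeros_like(theta)
    for A, b in tasks:
        # Inner level / adaptation step: a few gradient steps from theta
        phi = theta.copy()
        for _ in range(inner_steps):
            phi = phi - inner_lr * task_loss_grad(phi, A, b)
        # Outer level / meta-optimization step: accumulate gradient at phi
        meta_grad += task_loss_grad(phi, A, b)
    return theta - outer_lr * meta_grad / len(tasks)

rng = np.random.default_rng(0)
tasks = [(rng.normal(size=(4, 3)), rng.normal(size=4)) for _ in range(5)]
theta = maml_meta_step(np.zeros(3), tasks)
```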

Page 4:

Images from: Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, Finn et al. (2017)

Model-Agnostic Meta-Learning

Page 5:

Images from: Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, Finn et al. (2017)

Model-Agnostic Meta-Learning

Source of the problem: gradients flow back through the iterative updates of the inner level!
● Need to store the optimization path
● Computationally expensive

Hence, heuristics such as first-order MAML and Reptile are used.

Page 6:

Motivation

Can we make the meta-gradient independent of the optimization path?

Page 7:

Implicit MAML (iMAML)

▪ In MAML, the inner level is an iterative gradient descent procedure
▪ The dependence of the model parameters on the meta-parameters shrinks and vanishes as the number of gradient steps increases
▪ Instead, iMAML uses an optimal solution of the regularized loss:

φ*(θ) = argmin_φ  L(φ) + (λ/2) ||φ - θ||²

The proximal regularization term encourages dependence between the model parameters φ and the meta-parameters θ.
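A minimal sketch of this inner problem, assuming a toy quadratic task loss (the matrices and hyperparameters below are illustrative, not from the slides). Gradient descent on the regularized objective converges to φ*(θ), which remains dependent on θ through the proximal term no matter how many inner steps are taken:

```python
import numpy as np

def inner_solve(theta, grad_loss, lam=1.0, lr=0.05, steps=200):
    """Approximately solve
        phi*(theta) = argmin_phi  L(phi) + (lam/2) * ||phi - theta||^2
    by gradient descent on the regularized objective."""
    phi = theta.copy()
    for _ in range(steps):
        phi = phi - lr * (grad_loss(phi) + lam * (phi - theta))
    return phi

# Toy quadratic loss L(phi) = 0.5 * ||A @ phi - b||^2 (assumed for illustration)
A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 1.0])
grad_loss = lambda phi: A.T @ (A @ phi - b)

phi_star = inner_solve(np.zeros(2), grad_loss)
```

For this quadratic, the exact minimizer at θ = 0 is (AᵀA + λI)⁻¹ Aᵀb = (0.4, 0.5), which the sketch converges to.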

Page 8:

Implicit MAML (iMAML)

Issue: the meta-gradient still requires computing the Jacobian dφ*(θ)/dθ

Solution: use the implicit Jacobian!

Page 9:

Implicit Jacobian Lemma

If φ*(θ) = argmin_φ L(φ) + (λ/2) ||φ - θ||², then

dφ*(θ)/dθ = (I + (1/λ) ∇²L(φ*))⁻¹

where ∇²L(φ*) is the Hessian of the task loss at φ*.

Proof. Apply the stationary point condition ∇L(φ*) + λ(φ* - θ) = 0 and differentiate with respect to θ:

∇²L(φ*) (dφ*/dθ) + λ (dφ*/dθ - I) = 0

so dφ*/dθ = λ (λI + ∇²L(φ*))⁻¹ = (I + (1/λ) ∇²L(φ*))⁻¹.
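The lemma can be checked numerically on a quadratic loss, where φ*(θ) has a closed form (the matrices, λ, and base point below are assumed for illustration): the finite-difference Jacobian of φ*(θ) matches (I + (1/λ)H)⁻¹ with H the loss Hessian.

```python
import numpy as np

# Numerical check of the implicit Jacobian lemma on a quadratic loss
# L(phi) = 0.5 * ||A @ phi - b||^2, whose Hessian is H = A.T @ A.
A = np.array([[2.0, 0.0], [1.0, 1.0]])
b = np.array([1.0, 0.0])
H = A.T @ A
lam = 2.0
I = np.eye(2)

def phi_star(theta):
    # Closed-form minimizer of L(phi) + (lam/2) * ||phi - theta||^2
    return np.linalg.solve(H + lam * I, A.T @ b + lam * theta)

# Finite-difference Jacobian d phi* / d theta, column by column
eps = 1e-6
theta0 = np.array([0.3, -0.2])
J_fd = np.stack(
    [(phi_star(theta0 + eps * I[i]) - phi_star(theta0 - eps * I[i])) / (2 * eps)
     for i in range(2)],
    axis=1,
)

# Lemma: d phi* / d theta = (I + H / lam)^{-1}
J_lemma = np.linalg.inv(I + H / lam)
print(np.allclose(J_fd, J_lemma, atol=1e-5))  # prints True
```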

Page 10:

Practical Algorithm

Issues:
▪ obtaining an exact solution to the inner-level optimization problem may not be feasible in practice
▪ computing the Jacobian by explicitly forming and inverting the matrix may be intractable for large networks

Meta-gradient: ∇F(θ) = (1/M) Σᵢ (I + (1/λ) ∇²Lᵢ(φᵢ))⁻¹ ∇Lᵢ(φᵢ)

Solution: use approximations!

Page 11:

Practical Algorithm

1. Approximate algorithm solution: run the inner optimizer until it is close to the exact solution, instead of solving exactly

2. Approximate the matrix-inversion product in the meta-gradient: solved using the Conjugate Gradient (CG) method, which needs only Hessian-vector products
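Step 2 can be sketched with a plain CG solver that needs only the matrix-vector product v ↦ (I + (1/λ)H)v, i.e. a Hessian-vector product, so the matrix is never formed or inverted. The toy H, λ, and right-hand side g below are assumed for illustration:

```python
import numpy as np

def conjugate_gradient(matvec, g, iters=20, tol=1e-10):
    """Solve matvec(x) = g for a symmetric positive-definite map,
    here matvec(v) = (I + (1/lam) H) @ v, using only matrix-vector
    products as iMAML does with Hessian-vector products."""
    x = np.zeros_like(g)
    r = g - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Toy setup (assumed): Hessian H of a quadratic loss, lam = 1
H = np.array([[3.0, 1.0], [1.0, 2.0]])
lam = 1.0
matvec = lambda v: v + (H @ v) / lam
g = np.array([1.0, -1.0])

x = conjugate_gradient(matvec, g)                 # CG approximation
exact = np.linalg.solve(np.eye(2) + H / lam, g)   # direct solve, for comparison
```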

Page 12:

Practical Algorithm

Implicit gradient accuracy: (valid under certain regularity assumptions)

1. Approximate algorithm solution: run the inner optimizer until it is close to the exact solution

2. Approximate matrix-inversion product: solved using the Conjugate Gradient (CG) method

Page 13:

Implicit MAML Algorithm

Inner Level / Adaptation Step

Outer Level / Meta-Optimization Step
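Putting the pieces together, one iMAML meta-step on toy least-squares tasks might look like the sketch below. All tasks, losses, and hyperparameters are illustrative; the Hessian is formed explicitly only because the toy problem is tiny, whereas in practice CG would use Hessian-vector products.

```python
import numpy as np

def cg_solve(matvec, g, iters=10, tol=1e-12):
    # Conjugate Gradient for matvec(x) = g, symmetric positive-definite map
    x = np.zeros_like(g)
    r = g - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def imaml_meta_step(theta, tasks, lam=1.0, inner_lr=0.05, inner_steps=100,
                    outer_lr=0.1):
    """One iMAML meta-update on toy tasks (A, b) with per-task loss
    L(phi) = 0.5 * ||A @ phi - b||^2 (assumed setup)."""
    meta_grad = np.zeros_like(theta)
    for A, b in tasks:
        grad = lambda phi: A.T @ (A @ phi - b)
        # Inner level / adaptation step: approximately solve the
        # regularized problem by gradient descent
        phi = theta.copy()
        for _ in range(inner_steps):
            phi = phi - inner_lr * (grad(phi) + lam * (phi - theta))
        # Outer level / meta-optimization step: per-task meta-gradient
        # term (I + H/lam)^{-1} grad L(phi), computed via CG
        H = A.T @ A  # tiny toy problem: formed explicitly for clarity
        meta_grad += cg_solve(lambda v: v + (H @ v) / lam, grad(phi))
    return theta - outer_lr * meta_grad / len(tasks)

rng = np.random.default_rng(0)
tasks = [(rng.normal(size=(3, 2)), rng.normal(size=3)) for _ in range(4)]
theta = imaml_meta_step(np.zeros(2), tasks)
```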

Page 14:

Computation Bounds

Theory: iMAML computes the meta-gradient up to a small error in a memory-efficient way.

Experiment: (memory and compute comparison plots not captured in the transcript)

Page 15:

Results on few-shot learning

Page 16:

Future Work

▪ Broader classes of inner-loop optimization
  ▪ dynamic programming
  ▪ energy-based models
  ▪ actor-critic RL

▪ More flexible regularizers
  ▪ learn a vector- or matrix-valued λ (regularization coefficient)

Page 17:

References

▪ Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, Finn et al. (2017)

▪ Meta-Learning with Implicit Gradients, Rajeswaran et al. (2019)

▪ Notes on iMAML, Ferenc Huszar (2019)

Page 18:

Thank you!