
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

Chelsea Finn, Pieter Abbeel, Sergey Levine

Presenter: Siavash Khodadadeh

Overview

● One-shot learning
● Meta-learning
● Model-Agnostic Meta-Learning
● Supervised learning
  ○ Experiments
● Reinforcement learning
  ○ Experiments
● Conclusions

● Approaches
  ○ Transfer learning
  ○ Meta-learning
    ■ Learning to learn
● Humans
  ○ Learn very quickly
  ○ From only a few examples

One Shot Learning

[Figure: one example character per class, labeled 1 and 2; given new instances, a query character (?) must be classified]

Meta Learning Approaches

● One-shot Learning with Memory-Augmented Neural Networks
● Optimization as a Model for Few-Shot Learning
● Model-Agnostic Meta-Learning

Memory-Augmented Neural Networks

● Use recurrent networks
● Add an external memory
● Example
  ○ Character recognition (3 labels)

[Figure: a stream of characters presented sequentially with labels 1, 2, 1, 2, 3]

Optimization as a Model

Model Agnostic Meta Learning

● Intuition
  ○ Some internal representations are transferable among tasks
● Transfer learning
  ○ Start from good parameters trained on lots of data
● Meta-learning
  ○ Find parameters that are sensitive to small changes
  ○ So that a small update yields a large improvement on any task's loss

Problem Definition

● Model $f_\theta$ parameterized by $\theta$
  ○ Maps observations $x$ to outputs $a$
  ○ $p(\mathcal{T})$: distribution over tasks
  ○ Each task: $\mathcal{T} = \{\mathcal{L}(x_1, a_1, \ldots, x_H, a_H),\ q(x_1),\ q(x_{t+1} \mid x_t, a_t),\ H\}$
    ■ Supervised learning: $H = 1$
  ○ K-shot learning: $K$ samples drawn from $q_i$

Model Agnostic Meta Learning

● Method
  ○ For task $\mathcal{T}_i$, the model's parameters become $\theta_i' = \theta - \alpha \nabla_\theta \mathcal{L}_{\mathcal{T}_i}(f_\theta)$
  ○ This extends to multiple inner gradient updates as well (see the sketch below)
● Objective
  ○ $\min_\theta \sum_{\mathcal{T}_i \sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_i}(f_{\theta_i'})$, optimized by the meta-update $\theta \leftarrow \theta - \beta \nabla_\theta \sum_{\mathcal{T}_i \sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_i}(f_{\theta_i'})$
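To make the nested updates concrete, here is a minimal PyTorch sketch of one meta-step (not from the slides: the toy one-parameter linear model, step sizes, and task batch are illustrative assumptions). Passing `create_graph=True` is what lets the meta-gradient differentiate through the inner update:

```python
import torch

# Toy 1-D linear model f_theta(x) = theta * x, so the full meta-gradient
# (including second-order terms) stays easy to follow.
alpha, beta = 0.01, 0.001          # inner and outer (meta) step sizes
theta = torch.tensor(0.5, requires_grad=True)

def task_loss(theta, x, y):
    return ((theta * x - y) ** 2).mean()

def meta_step(theta, task_batch):
    meta_grad = torch.zeros_like(theta)
    for (x_train, y_train, x_test, y_test) in task_batch:
        # Inner update: theta_i' = theta - alpha * grad L_train(theta).
        # create_graph=True keeps the inner graph differentiable so the
        # outer gradient can flow through it (second-order terms).
        g = torch.autograd.grad(task_loss(theta, x_train, y_train),
                                theta, create_graph=True)[0]
        theta_i = theta - alpha * g
        # Outer gradient: d L_test(theta_i') / d theta
        meta_grad += torch.autograd.grad(task_loss(theta_i, x_test, y_test),
                                         theta)[0]
    # Meta-update: theta <- theta - beta * sum over tasks of the outer gradient
    return (theta - beta * meta_grad).detach().requires_grad_()

# Hypothetical task batch: each task is a different linear target y = c * x.
tasks = []
for c in (1.0, -2.0, 3.0):
    x = torch.randn(10, 1)
    tasks.append((x[:5], c * x[:5], x[5:], c * x[5:]))
theta = meta_step(theta, tasks)
```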

Intuition

[Figure: the meta-learned parameters θ sit where a single gradient step reaches each task's optimal parameters θ1*, θ2*, θ3*]

Model Agnostic Meta Learning for Supervised Learning

● Regression (mean squared error):
  $\mathcal{L}_{\mathcal{T}_i}(f_\theta) = \sum_{x^{(j)}, y^{(j)} \sim \mathcal{T}_i} \big\| f_\theta(x^{(j)}) - y^{(j)} \big\|_2^2$
● Classification (cross-entropy):
  $\mathcal{L}_{\mathcal{T}_i}(f_\theta) = -\sum_{x^{(j)}, y^{(j)} \sim \mathcal{T}_i} \big[\, y^{(j)} \log f_\theta(x^{(j)}) + (1 - y^{(j)}) \log\big(1 - f_\theta(x^{(j)})\big) \,\big]$
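In PyTorch these two losses correspond to the built-in criteria (a small sketch; the tensor shapes are illustrative):

```python
import torch
import torch.nn.functional as F

# Regression: mean squared error between predictions and targets
pred, target = torch.randn(5, 1), torch.randn(5, 1)
regression_loss = F.mse_loss(pred, target)

# Classification: cross-entropy between class logits and integer labels
logits, labels = torch.randn(5, 3), torch.randint(0, 3, (5,))
classification_loss = F.cross_entropy(logits, labels)
```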

Experiments

● Can MAML enable fast learning?
● Can MAML be used in different domains?
  ○ Supervised regression
  ○ Classification
  ○ Reinforcement learning
● Does it keep improving with more data?

● Sine wave experiments
  ○ Meta-training (700,000)
    ■ Amplitude sampled from [0.1, 5.0]
    ■ Phase sampled from [0, π]
    ■ K input points sampled from [−5.0, 5.0]
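A minimal NumPy sketch of this task distribution (the phase convention y = A·sin(x + φ) is an assumption; the slide only gives the sampling ranges):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_sine_task(rng):
    """Sample one regression task from p(T): a sine wave with random
    amplitude and phase, per the ranges on this slide."""
    amplitude = rng.uniform(0.1, 5.0)
    phase = rng.uniform(0.0, np.pi)
    return lambda x: amplitude * np.sin(x + phase)

def sample_k_shot(task, k, rng):
    """Draw K input points uniformly from [-5, 5] and label them."""
    x = rng.uniform(-5.0, 5.0, size=(k, 1))
    return x, task(x)

f = sample_sine_task(rng)
x_train, y_train = sample_k_shot(f, k=5, rng=rng)
```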

Regression Experiments

○ Network architecture
  ■ 2 fully connected hidden layers (40 neurons each) with ReLU
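As a PyTorch sketch of that regressor (layer sizes from the slide; the rest is standard boilerplate):

```python
import torch.nn as nn

# 1-D input -> two hidden layers of 40 ReLU units -> 1-D output
regressor = nn.Sequential(
    nn.Linear(1, 40), nn.ReLU(),
    nn.Linear(40, 40), nn.ReLU(),
    nn.Linear(40, 1),
)
```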

Regression Experiments

○ Meta-testing
  ■ K samples drawn from a held-out sine wave
○ Evaluation
  ■ Mean squared error over 600 points

Regression Experiments

○ Baselines
  ■ Pretrained model
    ● Trained on samples from all tasks jointly
    ● Fine-tuned on the given sine wave at test time
    ● Evaluated on 600 datapoints
  ■ Oracle (receives the true task parameters, i.e. amplitude and phase, as input)

Regression Experiments

[Figure: few-shot regression fits for K = 5 and K = 10, comparing MAML, the pretrained baseline, and the oracle]


Classification Examples

● N-way classification
  ○ Classify among N classes at test time, given K examples per class (K-shot)
● Network architecture
  ○ 4 convolutional modules, each with (a PyTorch sketch follows below)
    ■ 3 × 3 convolutions with 64 filters
    ■ ReLU nonlinearity
    ■ 2 × 2 max-pooling
  ○ Non-convolutional variant: fully connected layers of 256, 128, 64, and 64 units with ReLU
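A hedged PyTorch sketch of the 4-module convolutional classifier; the batch-normalization layers and 28×28 grayscale input size come from the original paper rather than the slide, and `n_way` is an illustrative variable:

```python
import torch.nn as nn

def conv_module(in_channels, out_channels=64):
    # One module: 3x3 conv with 64 filters, ReLU, 2x2 max-pool
    # (the paper also places batch normalization in each module)
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(),
        nn.MaxPool2d(2),
    )

n_way = 5  # number of classes per task (illustrative)
classifier = nn.Sequential(
    conv_module(1),    # 1 input channel for grayscale Omniglot
    conv_module(64),
    conv_module(64),
    conv_module(64),   # 28x28 input -> 1x1x64 after four 2x2 pools
    nn.Flatten(),
    nn.Linear(64, n_way),
)
```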

Classification

● Few-shot learning benchmarks
  ○ Omniglot
    ■ 1623 characters from 50 alphabets
      ● 20 instances of each character, each drawn by a different person
    ■ Training: 1200 characters
    ■ Testing: 423 characters

Classification

● Few-shot learning benchmarks
  ○ MiniImagenet
    ■ 80 training classes
    ■ 20 test classes

First Order Approximation

Update step (requires second derivatives, since $\theta_i'$ depends on $\theta$ through a gradient step):
$\theta \leftarrow \theta - \beta \nabla_\theta \sum_{\mathcal{T}_i \sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_i}(f_{\theta_i'})$

First-order approximation: drop the second-derivative terms, i.e. take the gradient of the test loss with respect to the adapted parameters $\theta_i'$ and apply it directly to $\theta$ (sketched below).
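Relative to the full MAML sketch earlier, the first-order variant changes only how the inner step is differentiated (same toy model, still an illustrative sketch):

```python
import torch

alpha, beta = 0.01, 0.001
theta = torch.tensor(0.5, requires_grad=True)

def task_loss(theta, x, y):
    return ((theta * x - y) ** 2).mean()

x_tr, y_tr = torch.randn(5, 1), torch.randn(5, 1)
x_te, y_te = torch.randn(5, 1), torch.randn(5, 1)

# Inner update WITHOUT create_graph: the graph of the inner step is
# discarded, so the second-derivative terms are dropped.
g = torch.autograd.grad(task_loss(theta, x_tr, y_tr), theta)[0]
theta_i = (theta - alpha * g).detach().requires_grad_()

# First-order meta-gradient: grad of the test loss w.r.t. theta_i',
# applied directly to theta.
meta_grad = torch.autograd.grad(task_loss(theta_i, x_te, y_te), theta_i)[0]
theta = (theta - beta * meta_grad).detach().requires_grad_()
```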

Classification Omniglot

Classification MiniImagenet

Reinforcement Learning

Loss function (negative expected reward):
$\mathcal{L}_{\mathcal{T}_i}(f_\theta) = -\,\mathbb{E}_{x_t, a_t \sim f_\theta,\, q_{\mathcal{T}_i}} \Big[ \sum_{t=1}^{H} R_i(x_t, a_t) \Big]$

Reinforcement Learning

● 2D navigation
  ○ A point agent must move to different goal positions
  ○ Goal randomly chosen from within a unit square
  ○ Success: the agent comes within 0.01 of the goal
  ○ Reward: negative distance to the goal
  ○ H = 100 episode horizon limit
  ○ Meta-training: 100 iterations with batches of 20 tasks
  ○ Meta-test batch size: 40
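A minimal Python sketch of one such task (the class name `PointNav2D`, the action clipping, and the per-step motion bound are illustrative assumptions; the goal sampling, reward, success radius, and horizon come from the slide):

```python
import numpy as np

class PointNav2D:
    """Toy 2D navigation task: a point agent moves toward a fixed goal."""
    def __init__(self, rng, horizon=100):
        self.goal = rng.uniform(0.0, 1.0, size=2)  # goal from the unit square
        self.horizon = horizon                     # H = 100 episode limit

    def reset(self):
        self.pos = np.zeros(2)
        self.t = 0
        return self.pos.copy()

    def step(self, action):
        self.pos += np.clip(action, -0.1, 0.1)     # bounded motion (assumption)
        self.t += 1
        dist = np.linalg.norm(self.pos - self.goal)
        reward = -dist                             # negative distance to goal
        done = dist < 0.01 or self.t >= self.horizon
        return self.pos.copy(), reward, done
```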

Reinforcement Learning

Vanilla Policy Gradient

● Randomly initialize the policy network parameters θ
● Perform K rollouts with the current policy
● Update the weights using the collected rewards
● Repeat
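A generic REINFORCE-style sketch of this loop in PyTorch (an illustration of vanilla policy gradient, not the authors' training code; the Gaussian policy, fixed action noise, and learning rate are assumptions; it can be run against the PointNav2D sketch above):

```python
import torch
import torch.nn as nn

# Gaussian policy over 2-D actions: state -> mean action
policy = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def rollout(env, policy):
    """Collect one episode; return log-probs of sampled actions and rewards."""
    log_probs, rewards = [], []
    state, done = env.reset(), False
    while not done:
        mean = policy(torch.as_tensor(state, dtype=torch.float32))
        dist = torch.distributions.Normal(mean, 0.1)  # fixed noise (assumption)
        action = dist.sample()
        log_probs.append(dist.log_prob(action).sum())
        state, reward, done = env.step(action.numpy())
        rewards.append(reward)
    return torch.stack(log_probs), rewards

def vpg_update(env, policy, k_rollouts=20):
    """One policy-gradient step: maximize total reward over K rollouts."""
    loss = 0.0
    for _ in range(k_rollouts):
        log_probs, rewards = rollout(env, policy)
        ret = sum(rewards)                 # undiscounted return (assumption)
        loss = loss - log_probs.sum() * ret
    optimizer.zero_grad()
    (loss / k_rollouts).backward()
    optimizer.step()
```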


Reinforcement Learning

● Locomotion
  ○ Two different simulated robots in MuJoCo
    ■ Planar cheetah
    ■ 3D quadruped (ant)
  ○ Tasks: run in a particular direction or at a particular speed

Reinforcement Learning Results


Conclusions

Model Agnostic Meta Learning

● Applicable to diverse models
  ○ Any model that has parameters and a smooth-enough loss function
● Adaptation can be done with any amount of data
● Future research
  ○ Multi-task initialization
    ■ "Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments" (ICLR 2018)

Questions

Thank you!