Transcript of slides: Optimization as a model for few-shot learning (presented by Beilun Wang, 2017/11/02)

Page 1:

Optimization as a model for few-shot learning

Sachin Ravi¹, Hugo Larochelle¹
¹Twitter
ICLR, 2017. Presenter: Beilun Wang


Page 2:

Outline

1. Introduction: Motivation, Previous Solutions, Contributions

2. Proposed Methods: Gradient Descent and LSTM, The Proposed Method

3. Summary



Page 4:

Motivation


Deep learning has shown great success in a variety of tasks with large amounts of labeled data.

However, it performs poorly on few-shot learning tasks, where only a handful of labeled examples per class are available.

This paper uses an LSTM-based meta-learner model to learn the exact optimization algorithm used to train another learner neural network in the few-shot regime.


Page 5:

Problem Setting:

Input: a meta-set $\mathcal{D}$. Each dataset $D \in \mathcal{D}$ has a split into $D_{\text{train}}$ and $D_{\text{test}}$.

Target: an LSTM-based meta-learner.

Output: a trained learner neural network (the classifier adapted to each task).
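To make this episodic setup concrete, here is a minimal sketch of how one dataset $D = (D_{\text{train}}, D_{\text{test}})$ could be sampled for N-way, k-shot classification. The function and field names are hypothetical, and `data_by_class` (a dict mapping class labels to lists of examples) is an assumed input format:

```python
import random
from dataclasses import dataclass

@dataclass
class Episode:
    """One dataset D from the meta-set, with its train/test split."""
    d_train: list  # the few labeled "shots": k examples per class
    d_test: list   # held-out examples for evaluating the adapted learner

def sample_episode(data_by_class, n_way=5, k_shot=1, n_test=15):
    """Sample one N-way, k-shot episode from a pool of labeled classes."""
    classes = random.sample(list(data_by_class), n_way)
    d_train, d_test = [], []
    for label, cls in enumerate(classes):
        examples = random.sample(data_by_class[cls], k_shot + n_test)
        d_train += [(x, label) for x in examples[:k_shot]]
        d_test += [(x, label) for x in examples[k_shot:]]
    return Episode(d_train, d_test)
```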



Page 7:

Previous Solutions

Gradient-based optimization: momentum, Adagrad, Adadelta, Adam. These update rules are hand-designed; a reference sketch of one of them follows below.

Learning to learn: no strong guarantees on speed of convergence, which matters when only a few update steps are possible.

Meta-learning: quick acquisition of knowledge within each separate task presented, paired with slower extraction of information learned across all the tasks.
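For reference, here is a minimal NumPy sketch of one of these hand-designed rules (textbook Adam with its usual default hyperparameters; not code from this paper):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: bias-corrected moving averages of the gradient and its
    square yield a per-parameter step size."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment (t starts at 1)
    v_hat = v / (1 - beta2 ** t)   # bias-corrected second moment
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```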



Page 9:

Contributions

An LSTM-based meta-learner model.

It achieves better performance on few-shot learning tasks.



Page 11:

Gradient-based method

$\theta_t = \theta_{t-1} - \alpha_t \nabla_{\theta_{t-1}} \mathcal{L}_t$

where $\theta_t$ are the learner's parameters at step $t$, $\alpha_t$ is the learning rate, and $\mathcal{L}_t$ is the loss.
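In code, this update is a single line (a trivial sketch, kept for comparison with the LSTM cell update on the next slide):

```python
def sgd_step(theta, grad, alpha_t):
    """Vanilla gradient descent: move the parameters against the loss gradient."""
    return theta - alpha_t * grad
```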


Page 12:

The update for the cell state in an LSTM:

$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$

If $f_t = 1$, $c_{t-1} = \theta_{t-1}$, $i_t = \alpha_t$, and $\tilde{c}_t = -\nabla_{\theta_{t-1}} \mathcal{L}_t$, then this cell update equals the gradient-based update above.
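A quick NumPy check of this correspondence (an illustrative sketch, not the authors' code):

```python
import numpy as np

def lstm_cell_update(c_prev, f_t, i_t, c_tilde):
    """LSTM cell-state update: c_t = f_t * c_{t-1} + i_t * c~_t (elementwise)."""
    return f_t * c_prev + i_t * c_tilde

theta_prev = np.array([0.5, -1.2])
grad = np.array([0.1, -0.3])   # stand-in for the loss gradient at theta_prev
alpha = 0.01

# With f_t = 1, c_{t-1} = theta_{t-1}, i_t = alpha_t, and c~_t = -grad,
# the cell update reproduces the plain gradient-descent step:
assert np.allclose(lstm_cell_update(theta_prev, 1.0, alpha, -grad),
                   theta_prev - alpha * grad)
```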



Page 14:

The formulation of the meta-learner

Input gate $i_t$, which plays the role of the learning rate:

$i_t = \sigma(W_I \cdot [\nabla_{\theta_{t-1}} \mathcal{L}_t, \mathcal{L}_t, \theta_{t-1}, i_{t-1}] + b_I)$

Forget gate $f_t$:

$f_t = \sigma(W_F \cdot [\nabla_{\theta_{t-1}} \mathcal{L}_t, \mathcal{L}_t, \theta_{t-1}, f_{t-1}] + b_F)$
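A minimal NumPy sketch of the two gates (illustrative shapes, not the authors' implementation; per the next slide, the weights $W_I$, $W_F$ are shared across all $d$ parameter coordinates, so each is a length-4 vector here):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def meta_learner_gates(grad, loss, theta_prev, i_prev, f_prev,
                       W_I, b_I, W_F, b_F):
    """Per-coordinate gates from features [grad, loss, theta, previous gate].
    grad, theta_prev, i_prev, f_prev: shape (d,); loss: scalar; W_I, W_F: shape (4,)."""
    loss_vec = np.full_like(grad, loss)  # broadcast the scalar loss to all coordinates
    feats_i = np.stack([grad, loss_vec, theta_prev, i_prev], axis=-1)  # (d, 4)
    feats_f = np.stack([grad, loss_vec, theta_prev, f_prev], axis=-1)
    i_t = sigmoid(feats_i @ W_I + b_I)  # learning-rate-like input gate in (0, 1)
    f_t = sigmoid(feats_f @ W_F + b_F)  # forget gate; near 1 keeps theta_{t-1}
    return i_t, f_t

# The learner update then follows the cell-state form:
# theta_t = f_t * theta_prev + i_t * (-grad)
```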


Page 15:

Parameter sharing and Normalization

Share parameters across the coordinates of the learner gradient.

Each dimension has its own hidden and cell state values, but the LSTM parameters are the same across all coordinates.

Normalize the gradients and the losses, whose magnitudes vary widely across dimensions, with the following preprocessing:

$x \rightarrow \begin{cases} \left(\frac{\log(|x|)}{p}, \operatorname{sign}(x)\right) & \text{if } |x| \geq e^{-p} \\ (-1, e^{p} x) & \text{otherwise} \end{cases} \qquad (1)$
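A minimal NumPy sketch of equation (1). Here $p$ is a hyperparameter; $p = 10$ is assumed as the default below, the value used in the earlier learning-to-learn work this preprocessing comes from:

```python
import numpy as np

def preprocess(x, p=10.0):
    """Map gradients/losses of very different magnitudes to a comparable range.
    Returns two features per input value, following equation (1)."""
    x = np.asarray(x, dtype=float)
    big = np.abs(x) >= np.exp(-p)
    safe_abs = np.where(big, np.abs(x), 1.0)            # avoid log(0) on the small branch
    first = np.where(big, np.log(safe_abs) / p, -1.0)   # magnitude channel
    second = np.where(big, np.sign(x), np.exp(p) * x)   # sign / rescaled-raw channel
    return np.stack([first, second], axis=-1)

# e.g. preprocess([1e-3, -2.0, 1e-12]) keeps tiny and large values on similar scales
```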


Page 16:

Summary Figure [figure not reproduced in this transcript]


Page 17:

Experiment Results: average classification accuracy [results table not reproduced in this transcript]


Page 18:

Summary

This paper proposes an LSTM-based meta-learner model.

It improves the performance of training deep neural networks on few-shot learning tasks.
