Optimization as a model for few-shot learning
Sachin Ravi¹, Hugo Larochelle¹
¹Twitter
ICLR, 2017
Presenter: Beilun Wang
Outline
1 Introduction
    Motivation
    Previous Solutions
    Contributions
2 Proposed Methods
    Gradient Descent and LSTM
    The Proposed Method
3 Summary
Motivation
Deep learning has shown great success in a variety of tasks with large amounts of labeled data.
It performs poorly on few-shot learning tasks, where only a handful of labeled examples are available per class.
This paper uses an LSTM-based meta-learner model to learn the exact optimization algorithm used to train another learner neural network in the few-shot regime.
Problem Setting
Input: meta-sets 𝒟. Each dataset D ∈ 𝒟 comes with a split into Dtrain and Dtest.
Target: an LSTM-based meta-learner.
Output: a learner neural network trained on each task's Dtrain (episode sampling is sketched below).
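A minimal sketch of this episodic setup, assuming 5-class episodes and a dict mapping class names to example lists; the function name sample_episode and all default values are illustrative, not taken from the paper.

import random

def sample_episode(dataset, n_classes=5, n_shot=1, n_query=15):
    """Sample one few-shot task D = (Dtrain, Dtest) from a meta-set.

    `dataset` maps each class name to a list of examples (hypothetical layout).
    """
    classes = random.sample(list(dataset), n_classes)
    d_train, d_test = [], []
    for label, cls in enumerate(classes):
        examples = random.sample(dataset[cls], n_shot + n_query)
        d_train += [(x, label) for x in examples[:n_shot]]   # Dtrain: n_shot examples per class
        d_test += [(x, label) for x in examples[n_shot:]]    # Dtest: held-out queries
    return d_train, d_test

The meta-learner is then trained over many such episodes drawn from the meta-training set.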
Previous Solutions
Gradient-based optimization
    momentum, Adagrad, Adadelta, ADAM
Learning to learn
    no strong guarantees on the speed of convergence
Meta-learning
    quick acquisition of knowledge within each separate task presented, paired with slower extraction of information learned across all the tasks
Contributions
An LSTM-based meta-learner model that learns the optimization procedure itself.
Achieves better performance on few-shot learning tasks.
Gradient-based method
θ_t = θ_{t−1} − α_t ∇_{θ_{t−1}} L_t
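As a worked example, this update is a single vectorized subtraction; the values below are illustrative.

import numpy as np

def sgd_step(theta, grad, lr):
    # theta_t = theta_{t-1} - alpha_t * gradient of L_t at theta_{t-1}
    return theta - lr * grad

theta_next = sgd_step(np.array([0.5, -1.0]), np.array([0.2, 0.1]), lr=0.01)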
The update for the cell state in an LSTM
c_t = f_t ⊙ c_{t−1} + i_t ⊙ c̃_t
If f_t = 1, c_{t−1} = θ_{t−1}, i_t = α_t, and c̃_t = −∇_{θ_{t−1}} L_t, then this cell-state update is exactly the gradient-descent step above.
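A short numerical check of this correspondence under the stated substitutions; all numbers are illustrative.

import numpy as np

theta_prev = np.array([0.5, -1.0])   # theta_{t-1}
grad = np.array([0.2, 0.1])          # gradient of L_t at theta_{t-1}
alpha = 0.01                         # learning rate alpha_t

theta_gd = theta_prev - alpha * grad        # plain gradient-descent step
c_t = 1.0 * theta_prev + alpha * (-grad)    # LSTM cell update: f_t=1, i_t=alpha_t, c~_t=-grad

assert np.allclose(theta_gd, c_t)           # the two updates coincide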
The formulation of the meta-learner
Input gate i_t, which plays the role of the learning rate:
i_t = σ(W_I · [∇_{θ_{t−1}} L_t, L_t, θ_{t−1}, i_{t−1}] + b_I)
Forget gate f_t, which controls how much of θ_{t−1} is kept:
f_t = σ(W_F · [∇_{θ_{t−1}} L_t, L_t, θ_{t−1}, f_{t−1}] + b_F)
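A per-coordinate sketch of these two gate equations; the helper name meta_learner_gates, the weight shapes, and the sample values are assumptions for illustration (in the paper these gates live inside the meta-learner LSTM).

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def meta_learner_gates(grad, loss, theta_prev, i_prev, f_prev,
                       W_I, b_I, W_F, b_F):
    # Inputs for one learner coordinate: [gradient, loss, parameter, previous gate].
    x_i = np.array([grad, loss, theta_prev, i_prev])
    x_f = np.array([grad, loss, theta_prev, f_prev])
    i_t = sigmoid(W_I @ x_i + b_I)   # learned learning rate
    f_t = sigmoid(W_F @ x_f + b_F)   # learned forget factor on theta_{t-1}
    return i_t, f_t

# Illustrative call for a single coordinate, with random weights.
rng = np.random.default_rng(0)
i_t, f_t = meta_learner_gates(0.2, 1.3, 0.5, 0.5, 1.0,
                              rng.normal(size=4), 0.0,
                              rng.normal(size=4), 0.0)

Because i_t and f_t depend on the current gradient and loss, the meta-learner can adapt its step size and shrinkage as optimization proceeds.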
Parameter sharing and Normalization
Share the meta-learner's parameters across the coordinates of the learner gradient.
Each dimension has its own hidden and cell state values, but the LSTM parameters are the same across all coordinates.
Normalize the gradients and the losses, whose magnitudes vary widely across dimensions, with the preprocessing below:
x → (log(|x|)/p, sign(x))   if |x| ≥ e^{−p}
    (−1, e^{p} x)           otherwise        (1)
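A sketch of this preprocessing (Eq. 1), which the paper borrows from Andrychowicz et al. (2016); p = 10 is a typical choice there, and the max() clamp only avoids a log(0) warning in the branch that np.where discards.

import numpy as np

def preprocess(x, p=10.0):
    # Map each value to (scaled log-magnitude, sign); very small values
    # fall back to (-1, e^p * x) so the output stays bounded and smooth.
    x = np.asarray(x, dtype=float)
    large = np.abs(x) >= np.exp(-p)
    first = np.where(large, np.log(np.maximum(np.abs(x), 1e-300)) / p, -1.0)
    second = np.where(large, np.sign(x), np.exp(p) * x)
    return np.stack([first, second], axis=-1)

For example, preprocess([0.5, 1e-8]) keeps the sign channel at 1.0 for the large value but switches the small value into the linear regime.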
Summary Figure
[Figure omitted in this transcript.]
Experiment Results: average classification accuracy
[Table omitted in this transcript.]
Summary
This paper proposes an LSTM-based meta-learner model.
It improves the performance of training deep neural networks on few-shot learning tasks.