Optimization as a model for few-shot learning
Sachin Ravi¹, Hugo Larochelle¹
¹Twitter
ICLR, 2017
Presenter: Beilun Wang
Outline
1 Introduction
    Motivation
    Previous Solutions
    Contributions
2 Proposed Methods
    Gradient Descent and LSTM
    The Proposed Method
3 Summary
Motivation
Deep learning has shown great success in a variety of tasks with large amounts of labeled data.
It performs poorly on few-shot learning tasks, where only a handful of labeled examples are available per class.
This paper uses an LSTM-based meta-learner model to learn the exact optimization algorithm used to train another learner neural network in the few-shot regime.
Problem Setting
Input: meta-sets 𝒟. Each dataset D ∈ 𝒟 comes with a split into Dtrain and Dtest.
Target: an LSTM-based meta-learner.
Output: a learner neural network trained on each task's Dtrain (episode sampling is sketched below).
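A minimal sketch of this episodic setup, assuming 5-class episodes and a dict mapping class names to example lists; the function name sample_episode and all default values are illustrative, not taken from the paper.

import random

def sample_episode(dataset, n_classes=5, n_shot=1, n_query=15):
    """Sample one few-shot task D = (Dtrain, Dtest) from a meta-set.

    `dataset` maps each class name to a list of examples (hypothetical layout).
    """
    classes = random.sample(list(dataset), n_classes)
    d_train, d_test = [], []
    for label, cls in enumerate(classes):
        examples = random.sample(dataset[cls], n_shot + n_query)
        d_train += [(x, label) for x in examples[:n_shot]]   # Dtrain: n_shot examples per class
        d_test += [(x, label) for x in examples[n_shot:]]    # Dtest: held-out queries
    return d_train, d_test

The meta-learner is then trained over many such episodes drawn from the meta-training set.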
Previous Solutions
Gradient-based optimization
    momentum, Adagrad, Adadelta, ADAM
Learning to learn
    no strong guarantees on the speed of convergence
Meta-learning
    quick acquisition of knowledge within each separate task presented, paired with slower extraction of information learned across all the tasks
Contributions
An LSTM-based meta-learner model that learns the optimization procedure itself.
Achieves better performance on few-shot learning tasks.
Gradient-based method
θ_t = θ_{t−1} − α_t ∇_{θ_{t−1}} L_t
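As a worked example, this update is a single vectorized subtraction; the values below are illustrative.

import numpy as np

def sgd_step(theta, grad, lr):
    # theta_t = theta_{t-1} - alpha_t * gradient of L_t at theta_{t-1}
    return theta - lr * grad

theta_next = sgd_step(np.array([0.5, -1.0]), np.array([0.2, 0.1]), lr=0.01)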
The update for the cell state in an LSTM
c_t = f_t ⊙ c_{t−1} + i_t ⊙ c̃_t
If f_t = 1, c_{t−1} = θ_{t−1}, i_t = α_t, and c̃_t = −∇_{θ_{t−1}} L_t, then this cell-state update is exactly the gradient-descent step above.
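A short numerical check of this correspondence under the stated substitutions; all numbers are illustrative.

import numpy as np

theta_prev = np.array([0.5, -1.0])   # theta_{t-1}
grad = np.array([0.2, 0.1])          # gradient of L_t at theta_{t-1}
alpha = 0.01                         # learning rate alpha_t

theta_gd = theta_prev - alpha * grad        # plain gradient-descent step
c_t = 1.0 * theta_prev + alpha * (-grad)    # LSTM cell update: f_t=1, i_t=alpha_t, c~_t=-grad

assert np.allclose(theta_gd, c_t)           # the two updates coincide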
The formulation of the meta-learner
Input gate i_t, which plays the role of the learning rate:
i_t = σ(W_I · [∇_{θ_{t−1}} L_t, L_t, θ_{t−1}, i_{t−1}] + b_I)
Forget gate f_t, which controls how much of θ_{t−1} is kept:
f_t = σ(W_F · [∇_{θ_{t−1}} L_t, L_t, θ_{t−1}, f_{t−1}] + b_F)
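A per-coordinate sketch of these two gate equations; the helper name meta_learner_gates, the weight shapes, and the sample values are assumptions for illustration (in the paper these gates live inside the meta-learner LSTM).

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def meta_learner_gates(grad, loss, theta_prev, i_prev, f_prev,
                       W_I, b_I, W_F, b_F):
    # Inputs for one learner coordinate: [gradient, loss, parameter, previous gate].
    x_i = np.array([grad, loss, theta_prev, i_prev])
    x_f = np.array([grad, loss, theta_prev, f_prev])
    i_t = sigmoid(W_I @ x_i + b_I)   # learned learning rate
    f_t = sigmoid(W_F @ x_f + b_F)   # learned forget factor on theta_{t-1}
    return i_t, f_t

# Illustrative call for a single coordinate, with random weights.
rng = np.random.default_rng(0)
i_t, f_t = meta_learner_gates(0.2, 1.3, 0.5, 0.5, 1.0,
                              rng.normal(size=4), 0.0,
                              rng.normal(size=4), 0.0)

Because i_t and f_t depend on the current gradient and loss, the meta-learner can adapt its step size and shrinkage as optimization proceeds.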
Parameter sharing and Normalization
Share the meta-learner's parameters across the coordinates of the learner gradient.
Each dimension has its own hidden and cell state values, but the LSTM parameters are the same across all coordinates.
Normalize the gradients and the losses, whose magnitudes vary widely across dimensions, with the preprocessing below:
x → (log(|x|)/p, sign(x))   if |x| ≥ e^{−p}
    (−1, e^{p} x)           otherwise        (1)
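A sketch of this preprocessing (Eq. 1), which the paper borrows from Andrychowicz et al. (2016); p = 10 is a typical choice there, and the max() clamp only avoids a log(0) warning in the branch that np.where discards.

import numpy as np

def preprocess(x, p=10.0):
    # Map each value to (scaled log-magnitude, sign); very small values
    # fall back to (-1, e^p * x) so the output stays bounded and smooth.
    x = np.asarray(x, dtype=float)
    large = np.abs(x) >= np.exp(-p)
    first = np.where(large, np.log(np.maximum(np.abs(x), 1e-300)) / p, -1.0)
    second = np.where(large, np.sign(x), np.exp(p) * x)
    return np.stack([first, second], axis=-1)

For example, preprocess([0.5, 1e-8]) keeps the sign channel at 1.0 for the large value but switches the small value into the linear regime.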
Summary Figure
[Figure omitted in this transcript.]
Experiment Results: average classification accuracy
[Table omitted in this transcript.]
Summary
This paper proposes an LSTM-based meta-learner model.
It improves the performance of training deep neural networks on few-shot learning tasks.