Optimization as a model for few-shot learning

Sachin Ravi, Hugo Larochelle (Twitter)

ICLR, 2017
Presenter: Beilun Wang

Outline

1 Introduction
    Motivation
    Previous Solutions
    Contributions

2 Proposed Methods
    gradient descent and LSTM
    The Proposed Method

3 Summary

Motivation

Deep learning has shown great success in a variety of tasks with large amounts of labeled data.

However, it performs poorly on few-shot learning tasks, where only a handful of labeled examples per class are available.

This paper uses an LSTM-based meta-learner model to learn the exact optimization algorithm used to train another learner neural network in the few-shot regime.

Problem Setting

Input: a meta-set 𝒟; each dataset D ∈ 𝒟 is split into D_train and D_test.

Target: an LSTM-based meta-learner.

Output: a neural-network learner trained for each task.
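
To make the split concrete, here is a minimal sketch, in NumPy and with hypothetical names (sample_episode, images_by_class), of drawing one N-way, k-shot episode with its D_train/D_test split; the paper's actual data pipeline is not reproduced here.

import numpy as np

def sample_episode(images_by_class, n_way=5, k_shot=1, n_query=15, rng=None):
    """Sample one few-shot episode: D_train holds k_shot labeled examples per
    class and D_test holds n_query examples per class, from n_way random classes."""
    rng = rng or np.random.default_rng()
    classes = rng.choice(len(images_by_class), size=n_way, replace=False)
    d_train, d_test = [], []
    for label, c in enumerate(classes):
        idx = rng.permutation(len(images_by_class[c]))
        support, query = idx[:k_shot], idx[k_shot:k_shot + n_query]
        d_train += [(images_by_class[c][i], label) for i in support]
        d_test  += [(images_by_class[c][i], label) for i in query]
    return d_train, d_test  # (D_train, D_test) for one dataset D in the meta-set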

Previous Solutions

gradient-based optimization
    momentum, Adagrad, Adadelta, ADAM

learning to learn
    no strong guarantees of speed of convergence

meta-learning
    quick acquisition of knowledge within each separate task presented
    slower extraction of information learned across all the tasks

Contributions

An LSTM-based meta-learner model.

Achieves better performance on few-shot learning tasks.

Gradient-based method

\theta_t = \theta_{t-1} - \alpha_t \nabla_{\theta_{t-1}} \mathcal{L}_t
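
For reference, a minimal NumPy sketch of this update (names are illustrative):

import numpy as np

def sgd_step(theta_prev, grad, alpha):
    """Plain gradient descent: theta_t = theta_{t-1} - alpha_t * grad."""
    return theta_prev - alpha * grad

# Example: one step with a fixed learning rate
theta = np.array([0.5, -1.0])
theta = sgd_step(theta, grad=np.array([0.2, -0.4]), alpha=0.1)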

The update for the cell state in an LSTM

c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t

If f_t = 1, c_{t-1} = \theta_{t-1}, i_t = \alpha_t, and \tilde{c}_t = -\nabla_{\theta_{t-1}} \mathcal{L}_t, then this cell-state update is exactly the gradient-based update above.
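
A small numerical check of this equivalence (NumPy, illustrative names): with the stated substitutions, the cell-state update reproduces the gradient-descent step exactly.

import numpy as np

def lstm_cell_update(c_prev, f_t, i_t, c_tilde):
    """c_t = f_t * c_{t-1} + i_t * c~_t (element-wise)."""
    return f_t * c_prev + i_t * c_tilde

theta_prev = np.array([0.5, -1.0])
grad = np.array([0.2, -0.4])
alpha = 0.1

# Substitute f_t = 1, c_{t-1} = theta_{t-1}, i_t = alpha_t, c~_t = -grad
as_lstm = lstm_cell_update(c_prev=theta_prev, f_t=1.0, i_t=alpha, c_tilde=-grad)
as_sgd = theta_prev - alpha * grad
assert np.allclose(as_lstm, as_sgd)  # the two updates coincide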

The formulation of the meta-learner

learning rate i_t:

i_t = \sigma(W_I \cdot [\nabla_{\theta_{t-1}} \mathcal{L}_t, \mathcal{L}_t, \theta_{t-1}, i_{t-1}] + b_I)

forget gate f_t:

f_t = \sigma(W_F \cdot [\nabla_{\theta_{t-1}} \mathcal{L}_t, \mathcal{L}_t, \theta_{t-1}, f_{t-1}] + b_F)
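
A minimal NumPy sketch of these gate equations (W_I, b_I, W_F, b_F are illustrative names for the shared weights; the full meta-learner also maintains LSTM hidden state and feeds in the preprocessed inputs described on the next slide, both omitted here):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def meta_learner_step(theta_prev, grad, loss, i_prev, f_prev, W_I, b_I, W_F, b_F):
    """One parameter update by the meta-learner, applied coordinate-wise."""
    # Gate inputs per coordinate: [gradient, loss, previous parameter, previous gate]
    x_i = np.stack([grad, np.full_like(grad, loss), theta_prev, i_prev], axis=-1)
    x_f = np.stack([grad, np.full_like(grad, loss), theta_prev, f_prev], axis=-1)
    i_t = sigmoid(x_i @ W_I + b_I)               # learning-rate-like input gate
    f_t = sigmoid(x_f @ W_F + b_F)               # forget gate: how much of theta_{t-1} to keep
    theta_t = f_t * theta_prev + i_t * (-grad)   # cell-state update = new learner parameters
    return theta_t, i_t, f_t

Because W_I and b_I act on a small per-coordinate input vector, the same weights are applied to every coordinate, which is the parameter sharing discussed on the next slide.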

Parameter sharing and Normalization

Share parameters across the coordinates of the learner gradient

Each dimension has its own hidden and cell state values, but the LSTM parameters are the same across all coordinates.

Normalize the gradients and the losses across different dimensions:

x \rightarrow \begin{cases} \left( \frac{\log(|x|)}{p}, \operatorname{sign}(x) \right) & \text{if } |x| \ge e^{-p} \\ (-1, e^{p} x) & \text{otherwise} \end{cases} \qquad (1)
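
A minimal NumPy sketch of the preprocessing in Eq. (1), which maps each gradient or loss value x to a two-dimensional representation with comparable magnitudes across coordinates; p = 10 is shown as a typical choice and is an assumption here, not necessarily the paper's exact setting.

import numpy as np

def preprocess(x, p=10.0):
    """Eq. (1): x -> (log|x|/p, sign(x)) if |x| >= e^{-p}, else (-1, e^p * x)."""
    x = np.asarray(x, dtype=float)
    big = np.abs(x) >= np.exp(-p)
    # Guard against log(0); the guarded branch is only used where big is False.
    first = np.where(big, np.log(np.maximum(np.abs(x), 1e-300)) / p, -1.0)
    second = np.where(big, np.sign(x), np.exp(p) * x)
    return np.stack([first, second], axis=-1)  # shape (..., 2)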

Summary Figure

Experiment Results: average classification accuracy

Summary

This paper proposes an LSTM-based meta-learner model.

It improves the performance of training deep neural networks in few-shot learning tasks.
