Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·...

89
Deep Residual Networks Deep Learning Gets Way Deeper 8:30-10:30am, June 19 ICML 2016 tutorial Kaiming He Facebook AI Research* *as of July 2016. Formerly affiliated with Microsoft Research Asia 1x1 conv, 64 3x3 conv, 64 1x1 conv, 256 1x1 conv, 64 3x3 conv, 64 1x1 conv, 256 1x1 conv, 64 3x3 conv, 64 1x1 conv, 256 1x1 conv, 128, /2 3x3 conv, 128 1x1 conv, 512 1x1 conv, 128 3x3 conv, 128 1x1 conv, 512 1x1 conv, 128 3x3 conv, 128 1x1 conv, 512 1x1 conv, 128 3x3 conv, 128 1x1 conv, 512 1x1 conv, 128 3x3 conv, 128 1x1 conv, 512 1x1 conv, 128 3x3 conv, 128 1x1 conv, 512 1x1 conv, 128 3x3 conv, 128 1x1 conv, 512 1x1 conv, 128 3x3 conv, 128 1x1 conv, 512 1x1 conv, 256, /2 3x3 conv, 256 1x1 conv, 1024 1x1 conv, 256 3x3 conv, 256 1x1 conv, 1024 1x1 conv, 256 3x3 conv, 256 1x1 conv, 1024 1x1 conv, 256 3x3 conv, 256 1x1 conv, 1024 1x1 conv, 256 3x3 conv, 256 1x1 conv, 1024 1x1 conv, 256 3x3 conv, 256 1x1 conv, 1024 1x1 conv, 256 3x3 conv, 256 1x1 conv, 1024 1x1 conv, 256 3x3 conv, 256 1x1 conv, 1024 1x1 conv, 256 3x3 conv, 256 1x1 conv, 1024 1x1 conv, 256 3x3 conv, 256 1x1 conv, 1024 1x1 conv, 256 3x3 conv, 256 1x1 conv, 1024 1x1 conv, 256 3x3 conv, 256 1x1 conv, 1024 1x1 conv, 256 3x3 conv, 256 1x1 conv, 1024 1x1 conv, 256 3x3 conv, 256 1x1 conv, 1024 1x1 conv, 256 3x3 conv, 256 1x1 conv, 1024 1x1 conv, 256 3x3 conv, 256 1x1 conv, 1024 1x1 conv, 256 3x3 conv, 256 1x1 conv, 1024 1x1 conv, 256 3x3 conv, 256 1x1 conv, 1024 1x1 conv, 256 3x3 conv, 256 1x1 conv, 1024 1x1 conv, 256 3x3 conv, 256 1x1 conv, 1024 1x1 conv, 256 3x3 conv, 256 1x1 conv, 1024 1x1 conv, 256 3x3 conv, 256 1x1 conv, 1024 1x1 conv, 256 3x3 conv, 256 1x1 conv, 1024 1x1 conv, 256 3x3 conv, 256 1x1 conv, 1024 1x1 conv, 256 3x3 conv, 256 1x1 conv, 1024 1x1 conv, 256 3x3 conv, 256 1x1 conv, 1024 1x1 conv, 256 3x3 conv, 256 1x1 conv, 1024 1x1 conv, 256 3x3 conv, 256 1x1 conv, 1024 1x1 conv, 256 3x3 conv, 256 1x1 conv, 1024 1x1 conv, 256 3x3 conv, 256 1x1 conv, 1024 1x1 conv, 256 3x3 conv, 256 1x1 conv, 1024 1x1 conv, 256 3x3 conv, 256 1x1 conv, 1024 1x1 conv, 256 3x3 conv, 256 1x1 conv, 1024 1x1 conv, 256 3x3 conv, 256 1x1 conv, 1024 1x1 conv, 256 3x3 conv, 256 1x1 conv, 1024 1x1 conv, 256 3x3 conv, 256 1x1 conv, 1024 1x1 conv, 512, /2 3x3 conv, 512 1x1 conv, 2048 1x1 conv, 512 3x3 conv, 512 1x1 conv, 2048 1x1 conv, 512 3x3 conv, 512 1x1 conv, 2048 ave pool, fc 1000 7x7 conv, 64, /2, pool/2

Transcript of Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·...

Page 1: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

DeepResidualNetworksDeepLearningGetsWayDeeper

8:30-10:30am,June19ICML2016tutorial

KaimingHeFacebookAIResearch*

*asofJuly2016.FormerlyaffiliatedwithMicrosoftResearchAsia

1x1conv,64

3x3conv,64

1x1conv,256

1x1conv,64

3x3conv,64

1x1conv,256

1x1conv,64

3x3conv,64

1x1conv,256

1x1conv,128

,/2

3x3conv,128

1x1conv,512

1x1conv,128

3x3conv,128

1x1conv,512

1x1conv,128

3x3conv,128

1x1conv,512

1x1conv,128

3x3conv,128

1x1conv,512

1x1conv,128

3x3conv,128

1x1conv,512

1x1conv,128

3x3conv,128

1x1conv,512

1x1conv,128

3x3conv,128

1x1conv,512

1x1conv,128

3x3conv,128

1x1conv,512

1x1conv,256

,/2

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,512

,/2

3x3conv,512

1x1conv,2048

1x1conv,512

3x3conv,512

1x1conv,2048

1x1conv,512

3x3conv,512

1x1conv,2048

avepool,fc1

000

7x7conv

,64,/2,pool/2

Page 2: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

Overview

• Introduction• Background• Fromshallowtodeep

• DeepResidualNetworks• From10layersto100layers• From100layersto1000layers

• Applications• Q&A

Page 3: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

Introduction

Page 4: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

Introduction

DeepResidualNetworks(ResNets)• “DeepResidualLearningforImageRecognition”.CVPR2016(nextweek)

• Asimpleandcleanframeworkoftraining“very”deepnets

• State-of-the-artperformancefor• Imageclassification• Objectdetection• Semanticsegmentation• andmore…

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“DeepResidualLearningforImageRecognition”.CVPR2016.

Page 5: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

ResNets@ILSVRC&COCO2015Competitions

• 1stplacesinallfivemaintracks• ImageNetClassification:“Ultra-deep”152-layer nets• ImageNetDetection: 16% betterthan2nd• ImageNetLocalization: 27% betterthan2nd• COCODetection: 11% betterthan2nd• COCOSegmentation: 12% betterthan2nd

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“DeepResidualLearningforImageRecognition”.CVPR2016.

*improvementsarerelativenumbers

Page 6: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

RevolutionofDepth

3.57

6.7 7.3

11.7

16.4

25.828.2

ILSVRC'15ResNet

ILSVRC'14GoogleNet

ILSVRC'14VGG

ILSVRC'13 ILSVRC'12AlexNet

ILSVRC'11 ILSVRC'10

ImageNetClassificationtop-5error(%)

shallow8layers

19layers22layers

152layers

8layers

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“DeepResidualLearningforImageRecognition”.CVPR2016.

Page 7: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

RevolutionofDepth11x11conv,96,/4,pool/2

5x5conv,256,pool/2

3x3conv,384

3x3conv,384

3x3conv,256,pool/2

fc,4096

fc,4096

fc,1000

AlexNet,8layers(ILSVRC2012)

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“DeepResidualLearningforImageRecognition”.CVPR2016.

Page 8: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

RevolutionofDepth11x11conv,96,/4,pool/2

5x5conv,256,pool/2

3x3conv,384

3x3conv,384

3x3conv,256,pool/2

fc,4096

fc,4096

fc,1000

AlexNet,8layers(ILSVRC2012)

3x3conv,64

3x3conv,64,pool/2

3x3conv,128

3x3conv,128,pool/2

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256,pool/2

3x3conv,512

3x3conv,512

3x3conv,512

3x3conv,512,pool/2

3x3conv,512

3x3conv,512

3x3conv,512

3x3conv,512,pool/2

fc,4096

fc,4096

fc,1000

VGG,19layers(ILSVRC2014)

input

Conv7x7+ 2(S)

MaxPool 3x3+ 2(S)

LocalRespNorm

Conv1x1+ 1(V)

Conv3x3+ 1(S)

LocalRespNorm

MaxPool 3x3+ 2(S)

Conv Conv Conv Conv1x1+ 1(S) 3x3+ 1(S) 5x5+ 1(S) 1x1+ 1(S)

Conv Conv MaxPool 1x1+ 1(S) 1x1+ 1(S) 3x3+ 1(S)

Dept hConcat

Conv Conv Conv Conv1x1+ 1(S) 3x3+ 1(S) 5x5+ 1(S) 1x1+ 1(S)

Conv Conv MaxPool 1x1+ 1(S) 1x1+ 1(S) 3x3+ 1(S)

Dept hConcat

MaxPool 3x3+ 2(S)

Conv Conv Conv Conv1x1+ 1(S) 3x3+ 1(S) 5x5+ 1(S) 1x1+ 1(S)

Conv Conv MaxPool 1x1+ 1(S) 1x1+ 1(S) 3x3+ 1(S)

Dept hConcat

Conv Conv Conv Conv1x1+ 1(S) 3x3+ 1(S) 5x5+ 1(S) 1x1+ 1(S)

Conv Conv MaxPool 1x1+ 1(S) 1x1+ 1(S) 3x3+ 1(S)

AveragePool 5x5+ 3(V)

Dept hConcat

Conv Conv Conv Conv1x1+ 1(S) 3x3+ 1(S) 5x5+ 1(S) 1x1+ 1(S)

Conv Conv MaxPool 1x1+ 1(S) 1x1+ 1(S) 3x3+ 1(S)

Dept hConcat

Conv Conv Conv Conv1x1+ 1(S) 3x3+ 1(S) 5x5+ 1(S) 1x1+ 1(S)

Conv Conv MaxPool 1x1+ 1(S) 1x1+ 1(S) 3x3+ 1(S)

Dept hConcat

Conv Conv Conv Conv1x1+ 1(S) 3x3+ 1(S) 5x5+ 1(S) 1x1+ 1(S)

Conv Conv MaxPool 1x1+ 1(S) 1x1+ 1(S) 3x3+ 1(S)

AveragePool 5x5+ 3(V)

Dept hConcat

MaxPool 3x3+ 2(S)

Conv Conv Conv Conv1x1+ 1(S) 3x3+ 1(S) 5x5+ 1(S) 1x1+ 1(S)

Conv Conv MaxPool 1x1+ 1(S) 1x1+ 1(S) 3x3+ 1(S)

Dept hConcat

Conv Conv Conv Conv1x1+ 1(S) 3x3+ 1(S) 5x5+ 1(S) 1x1+ 1(S)

Conv Conv MaxPool 1x1+ 1(S) 1x1+ 1(S) 3x3+ 1(S)

Dept hConcat

AveragePool 7x7+ 1(V)

FC

Conv1x1+ 1(S)

FC

FC

Soft maxAct ivat ion

soft max0

Conv1x1+ 1(S)

FC

FC

Soft maxAct ivat ion

soft max1

Soft maxAct ivat ion

soft max2

GoogleNet,22layers(ILSVRC2014)

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“DeepResidualLearningforImageRecognition”.CVPR2016.

Page 9: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

1x1conv,64

3x3conv,64

1x1conv,256

1x1conv,64

3x3conv,64

1x1conv,256

1x1conv,64

3x3conv,64

1x1conv,256

1x1conv,128,/2

3x3conv,128

1x1conv,512

1x1conv,128

3x3conv,128

1x1conv,512

1x1conv,128

3x3conv,128

1x1conv,512

1x1conv,128

3x3conv,128

1x1conv,512

1x1conv,128

3x3conv,128

1x1conv,512

1x1conv,128

3x3conv,128

1x1conv,512

1x1conv,128

3x3conv,128

1x1conv,512

1x1conv,128

3x3conv,128

1x1conv,512

1x1conv,256,/2

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,256

3x3conv,256

1x1conv,1024

1x1conv,512,/2

3x3conv,512

1x1conv,2048

1x1conv,512

3x3conv,512

1x1conv,2048

1x1conv,512

3x3conv,512

1x1conv,2048

avepool,fc1000

7x7conv,64,/2,pool/2

AlexNet,8layers(ILSVRC2012)

RevolutionofDepthResNet,152layers(ILSVRC2015)

3x3conv,64

3x3conv,64,pool/2

3x3conv,128

3x3conv,128,pool/2

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256,pool/2

3x3conv,512

3x3conv,512

3x3conv,512

3x3conv,512,pool/2

3x3conv,512

3x3conv,512

3x3conv,512

3x3conv,512,pool/2

fc,4096

fc,4096

fc,1000

11x11conv,96,/4,pool/2

5x5conv,256,pool/2

3x3conv,384

3x3conv,384

3x3conv,256,pool/2

fc,4096

fc,4096

fc,1000

VGG,19layers(ILSVRC2014)

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“DeepResidualLearningforImageRecognition”.CVPR2016.

Page 10: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

RevolutionofDepth

34

5866

86

HOG,DPM AlexNet(RCNN)

VGG(RCNN)

ResNet(FasterRCNN)*

PASCALVOC2007ObjectDetectionmAP (%)

shallow8layers

16layers

101layers

*w/otherimprovements&moredata

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“DeepResidualLearningforImageRecognition”.CVPR2016.

Enginesofvisualrecognition

Page 11: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

ResNet’s objectdetectionresultonCOCO*theoriginalimageisfromtheCOCOdataset

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“DeepResidualLearningforImageRecognition”.CVPR2016.

Page 12: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

Verysimple,easytofollow

• Manythird-partyimplementations(listinhttps://github.com/KaimingHe/deep-residual-networks)• FacebookAIResearch’sTorchResNet:• Torch,CIFAR-10,withResNet-20toResNet-110,trainingcode,andcurves:code• Lasagne,CIFAR-10,withResNet-32andResNet-56andtrainingcode:code• Neon,CIFAR-10,withpre-trainedResNet-32toResNet-110models,trainingcode,andcurves: code• Torch,MNIST,100layers:blog,code• AwinningentryinKaggle's rightwhalerecognitionchallenge:blog,code• Neon,Place2(mini),40layers:blog,code• …

• Easilyreproducedresults(e.g.TorchResNet:https://github.com/facebook/fb.resnet.torch)

• Aseriesofextensionsandfollow-ups• >200citationsin6monthsafterpostedonarXiv (Dec.2015)

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“DeepResidualLearningforImageRecognition”.CVPR2016.

Page 13: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

Background

Fromshallowtodeep

Page 14: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

Traditionalrecognition

edges classifier “bus”?

pixelsclassifier “bus”?

histogram classifier “bus”?edges

SIFT/HOG

histogram classifier “bus”?edges K-means/sparsecode

shallower

deeper

Butwhat’snext?

Page 15: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

DeepLearning

histogram classifier “bus”?edges K-means/sparsecode

Specializedcomponents, domainknowledge required

“bus”?

Genericcomponents (“layers”),lessdomainknowledge

“bus”?

Repeatelementarylayers=>Goingdeeper

• End-to-endlearning• Richersolutionspace

Page 16: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

SpectrumofDepth

shallower deeper

5layers:easy

>10layers:initialization,BatchNormalization

>30layers:skipconnections

>100layers:identityskipconnections

>1000layers:?

Page 17: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

Initialization

LeCun etal1998“EfficientBackprop”Glorot &Bengio 2010“Understandingthedifficultyoftrainingdeepfeedforwardneuralnetworks”

input𝑋

output𝑌 = 𝑊𝑋

weight𝑊

1-layer:𝑉𝑎𝑟 𝑦 = (𝑛+,𝑉𝑎𝑟 𝑤 )𝑉𝑎𝑟[𝑥]

Multi-layer:

𝑉𝑎𝑟 𝑦 = (2𝑛3+,𝑉𝑎𝑟 𝑤33

)𝑉𝑎𝑟[𝑥]

If:• Linearactivation• 𝑥, 𝑦,𝑤:independentThen:

𝑛+, 𝑛567

Page 18: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

Initialization

LeCun etal1998“EfficientBackprop”Glorot &Bengio 2010“Understandingthedifficultyoftrainingdeepfeedforwardneuralnetworks”

1 3 5 7 9 11 13 15depth

exploding

vanishing

ideal

Forward:

𝑉𝑎𝑟 𝑦 = (2𝑛3+,𝑉𝑎𝑟 𝑤33

)𝑉𝑎𝑟[𝑥]

Backward:

𝑉𝑎𝑟𝜕𝜕𝑥 = (2𝑛3567𝑉𝑎𝑟 𝑤3

3

)𝑉𝑎𝑟[𝜕𝜕𝑦]

Bothforward(response)andbackward(gradient)signalcanvanish/explode

Page 19: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

Initialization

• Initializationunderlinear assumption

LeCun etal1998“EfficientBackprop”Glorot &Bengio 2010“Understandingthedifficultyoftrainingdeepfeedforwardneuralnetworks”

∏ 𝑛3+,𝑉𝑎𝑟 𝑤33 = 𝑐𝑜𝑛𝑠𝑡>? (healthyforward)and

∏ 𝑛3567𝑉𝑎𝑟 𝑤33 = 𝑐𝑜𝑛𝑠𝑡@?(healthybackward)

𝑛3+,𝑉𝑎𝑟 𝑤3 = 1or*

𝑛3567𝑉𝑎𝑟 𝑤3 = 1

*:𝑛3567 = 𝑛3BC+, ,soD5,E7FGD5,E7HG= ,IJKL

MNO

,HPQKLRS < ∞.

Itissufficienttouseeitherform.

“Xavier”init inCaffe

Page 20: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

Initialization

• InitializationunderReLU activation

∏ CV𝑛3

+,𝑉𝑎𝑟 𝑤33 = 𝑐𝑜𝑛𝑠𝑡>? (healthyforward)and

∏ CV𝑛3

567𝑉𝑎𝑟 𝑤33 = 𝑐𝑜𝑛𝑠𝑡@?(healthybackward)

12𝑛3

+,𝑉𝑎𝑟 𝑤3 = 1or

12𝑛3

567𝑉𝑎𝑟 𝑤3 = 1

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“DelvingDeepintoRectifiers:SurpassingHuman-LevelPerformanceonImageNetClassification”.ICCV2015.

With𝐷 layers,afactorof2 perlayerhasexponentialimpactof2Y

“MSRA”init inCaffe

Page 21: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

Initialization

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“DelvingDeepintoRectifiers:SurpassingHuman-LevelPerformanceonImageNetClassification”.ICCV2015.

ours

Xavier

22-layerReLU net:goodinit convergesfaster

𝑛𝑉𝑎𝑟 𝑤 = 1oursXavier

30-layerReLU net:goodinit isabletoconverge

12𝑛𝑉𝑎𝑟 𝑤 = 1

12𝑛𝑉𝑎𝑟 𝑤 = 1

𝑛𝑉𝑎𝑟 𝑤 = 1

*Figuresshowthebeginningoftraining

Page 22: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

BatchNormalization(BN)

• Normalizinginput(LeCun etal1998“EfficientBackprop”)

• BN:normalizingeachlayer,foreachmini-batch

• Greatlyacceleratetraining

• Lesssensitivetoinitialization

• Improveregularization

S.Ioffe &C.Szegedy.Batchnormalization:Acceleratingdeepnetworktrainingbyreducing internalcovariateshift.ICML2015

Page 23: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

BatchNormalization(BN)

S.Ioffe &C.Szegedy.Batchnormalization:Acceleratingdeepnetworktrainingbyreducing internalcovariateshift.ICML2015

layer 𝑥 𝑥Z =𝑥 − 𝜇𝜎 𝑦 = 𝛾𝑥Z + 𝛽

• 𝜇:meanof𝑥 inmini-batch• 𝜎:std of𝑥 inmini-batch• 𝛾:scale• 𝛽:shift

• 𝜇,𝜎:functionsof𝑥,analogoustoresponses

• 𝛾, 𝛽:parameterstobelearned,analogoustoweights

Page 24: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

BatchNormalization(BN)

S.Ioffe &C.Szegedy.Batchnormalization:Acceleratingdeepnetworktrainingbyreducing internalcovariateshift.ICML2015

layer 𝑥 𝑥Z =𝑥 − 𝜇𝜎 𝑦 = 𝛾𝑥Z + 𝛽

2modesofBN:• Trainmode:• 𝜇,𝜎 arefunctionsof𝑥;backprop gradients

• Testmode:• 𝜇,𝜎 arepre-computed*ontrainingset

*:byrunning average,orpost-processing aftertraining

Caution:makesureyourBNisinacorrectmode

Page 25: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

BatchNormalization(BN)

S.Ioffe &C.Szegedy.Batchnormalization:Acceleratingdeepnetworktrainingbyreducing internalcovariateshift.ICML2015

Figure takenfrom[S.Ioffe &C.Szegedy]

w/oBNbestofw/BNaccuracy

iter.

Page 26: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

DeepResidualNetworks

From10layersto100layers

Page 27: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

GoingDeeper

• Initializationalgorithms✓• BatchNormalization✓

• Islearningbetternetworksassimpleasstackingmorelayers?

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“DeepResidualLearningforImageRecognition”.CVPR2016.

Page 28: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

Simplystackinglayers?

0 1 2 3 4 5 60

10

20

iter. (1e4)

trainerror(%)

0 1 2 3 4 5 60

10

20

iter. (1e4)

testerror(%)CIFAR-10

56-layer

20-layer

56-layer

20-layer

• Plain nets:stacking3x3convlayers…• 56-layernethashighertrainingerror andtesterrorthan20-layernet

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“DeepResidualLearningforImageRecognition”.CVPR2016.

Page 29: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

Simplystackinglayers?

0 1 2 3 4 5 60

5

10

20

iter. (1e4)

erro

r (%

)

plain-20plain-32plain-44plain-56

CIFAR-10

20-layer32-layer44-layer56-layer

0 10 20 30 40 5020

30

40

50

60

iter. (1e4)

erro

r (%

)

plain-18plain-34

ImageNet-1000

34-layer

18-layer

• “Overlydeep”plainnetshavehighertrainingerror• Ageneralphenomenon,observedinmanydatasets

solid:test/valdashed:train

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“DeepResidualLearningforImageRecognition”.CVPR2016.

Page 30: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

7x7conv,64,/2

3x3conv,64

3x3conv,64

3x3conv,64

3x3conv,64

3x3conv,128,/2

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,256,/2

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,512,/2

3x3conv,512

3x3conv,512

3x3conv,512

fc1000

ashallowermodel

(18layers)

adeepercounterpart(34layers)

7x7conv,64,/2

3x3conv,64

3x3conv,64

3x3conv,64

3x3conv,64

3x3conv,64

3x3conv,64

3x3conv,128,/2

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,256,/2

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,512,/2

3x3conv,512

3x3conv,512

3x3conv,512

3x3conv,512

3x3conv,512

fc1000

“extra”layers

• Richersolutionspace

• Adeepermodelshouldnothavehighertrainingerror

• Asolutionbyconstruction:• originallayers:copiedfroma

learnedshallowermodel• extralayers:setasidentity• atleastthesametrainingerror

• Optimizationdifficulties:solverscannotfindthesolutionwhengoingdeeper…

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“DeepResidualLearningforImageRecognition”.CVPR2016.

Page 31: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

DeepResidualLearning

• Plaintnet

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“DeepResidualLearningforImageRecognition”.CVPR2016.

anytwostackedlayers

𝑥

𝐻(𝑥)

weightlayer

weightlayer

relu

relu

𝐻 𝑥 isanydesiredmapping,

hopethe2weightlayersfit𝐻(𝑥)

Page 32: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

DeepResidualLearning

• Residual net

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“DeepResidualLearningforImageRecognition”.CVPR2016.

𝐻 𝑥 isanydesiredmapping,

hopethe2weightlayersfit𝐻(𝑥)

hope the2weightlayersfit𝐹(𝑥)

let𝐻 𝑥 = 𝐹 𝑥 + 𝑥weightlayer

weightlayer

relu

relu

𝑥

𝐻 𝑥 = 𝐹 𝑥 + 𝑥

identity𝑥

𝐹(𝑥)

Page 33: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

DeepResidualLearning

• 𝐹 𝑥 isaresidual mappingw.r.t.identity

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“DeepResidualLearningforImageRecognition”.CVPR2016.

• Ifidentitywereoptimal,easytosetweightsas0

• Ifoptimalmappingisclosertoidentity,easiertofindsmallfluctuations

weightlayer

weightlayer

relu

relu

𝑥

𝐻 𝑥 = 𝐹 𝑥 + 𝑥

identity𝑥

𝐹(𝑥)

Page 34: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

RelatedWorks– ResidualRepresentations

• VLAD&FisherVector[Jegou etal2010],[Perronnin etal2007]

• Encodingresidual vectors;powerfulshallowerrepresentations.

• ProductQuantization(IVF-ADC)[Jegou etal2011]

• Quantizingresidual vectors;efficientnearest-neighborsearch.

• MultiGrid&HierarchicalPrecondition[Briggs,etal2000],[Szeliski1990,2006]• Solvingresidual sub-problems;efficientPDEsolvers.

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“DeepResidualLearningforImageRecognition”.CVPR2016.

Page 35: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

7x7conv,64,/2

pool,/2

3x3conv,64

3x3conv,64

3x3conv,64

3x3conv,64

3x3conv,64

3x3conv,64

3x3conv,128,/2

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,256,/2

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,512,/2

3x3conv,512

3x3conv,512

3x3conv,512

3x3conv,512

3x3conv,512

avgpool

fc1000

7x7conv,64,/2

pool,/2

3x3conv,64

3x3conv,64

3x3conv,64

3x3conv,64

3x3conv,64

3x3conv,64

3x3conv,128,/2

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,256,/2

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,512,/2

3x3conv,512

3x3conv,512

3x3conv,512

3x3conv,512

3x3conv,512

avgpool

fc1000

Network“Design”

• Keepitsimple

• Ourbasicdesign (VGG-style)• all3x3conv(almost)

• spatialsize/2=>#filtersx2(~samecomplexityperlayer)

• Simpledesign;justdeep!

• Otherremarks:• nohiddenfc• nodropout

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“DeepResidualLearningforImageRecognition”.CVPR2016.

plainnet ResNet

Page 36: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

Training

• Allplain/residualnetsaretrainedfromscratch

• Allplain/residualnetsuseBatchNormalization

• Standardhyper-parameters&augmentation

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“DeepResidualLearningforImageRecognition”.CVPR2016.

Page 37: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

CIFAR-10experiments

0 1 2 3 4 5 60

5

10

20

iter. (1e4)

erro

r (%

)

plain-20plain-32plain-44plain-56

20-layer32-layer44-layer56-layer

CIFAR-10plainnets

0 1 2 3 4 5 60

5

10

20

iter. (1e4)

erro

r (%

)

ResNet-20ResNet-32ResNet-44ResNet-56ResNet-110

CIFAR-10ResNets

56-layer44-layer32-layer20-layer

110-layer

• DeepResNetscanbetrainedwithoutdifficulties• DeeperResNetshavelowertrainingerror,andalsolowertesterror

solid:testdashed:train

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“DeepResidualLearningforImageRecognition”.CVPR2016.

Page 38: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

ImageNetexperiments

0 10 20 30 40 5020

30

40

50

60

iter. (1e4)

erro

r (%

)

ResNet-18ResNet-34

0 10 20 30 40 5020

30

40

50

60

iter. (1e4)

erro

r (%

)

plain-18plain-34

ImageNetplainnets ImageNetResNets

solid:testdashed:train

34-layer

18-layer

18-layer

34-layer

• DeepResNetscanbetrainedwithoutdifficulties• DeeperResNetshavelowertrainingerror,andalsolowertesterror

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“DeepResidualLearningforImageRecognition”.CVPR2016.

Page 39: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

ImageNetexperiments

• Apracticaldesignofgoingdeeper

3x3,64

3x3,64

relu

relu

64-d

3x3,64

1x1,64relu

1x1,256relu

relu

256-d

all-3x3

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“DeepResidualLearningforImageRecognition”.CVPR2016.

bottleneck(forResNet-50/101/152)

similarcomplexity

Page 40: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

ImageNetexperiments7.4

6.7

6.15.7

4

5

6

7

8

ResNet-34ResNet-50ResNet-101ResNet-15210-crop testing,top-5val error(%)

thismodelhaslowertimecomplexity

thanVGG-16/19

• Deeper ResNetshavelower error

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“DeepResidualLearningforImageRecognition”.CVPR2016.

Page 41: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

ImageNetexperiments

3.57

6.7 7.3

11.7

16.4

25.828.2

ILSVRC'15ResNet

ILSVRC'14GoogleNet

ILSVRC'14VGG

ILSVRC'13 ILSVRC'12AlexNet

ILSVRC'11 ILSVRC'10

ImageNetClassificationtop-5error(%)

shallow8layers

19layers22layers

152layers

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“DeepResidualLearningforImageRecognition”.CVPR2016.

8layers

Page 42: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

DiscussionsRepresentation,Optimization,Generalization

Page 43: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

Issuesonlearningdeepmodels

•Representation ability

•Optimization ability

•Generalization ability

• Abilityofmodeltofittrainingdata,ifoptimumcouldbefound

• IfmodelA’ssolutionspaceisasupersetofB’s,Ashouldbebetter.

• Feasibilityoffindinganoptimum

• Notallmodelsareequallyeasytooptimize

• Oncetrainingdataisfit,howgoodisthetestperformance

Page 44: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

HowdoResNetsaddresstheseissues?

•Representation ability

•Optimization ability

•Generalization ability

• Noexplicitadvantageonrepresentation(onlyre-parameterization),but

• Allowmodelstogodeeper

• Enableverysmoothforward/backwardprop

• Greatlyeaseoptimizingdeeper models

• Notexplicitlyaddressgeneralization,but

• Deeper+thinner isgoodgeneralization

Page 45: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

OntheImportanceofIdentityMapping

From100layersto1000layers

Page 46: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

Onidentitymappingsforoptimization

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“IdentityMappingsinDeepResidualNetworks”.arXiv2016.

𝑥cBC = 𝑓(ℎ 𝑥c + 𝐹 𝑥c )

𝑥𝑙

ℎ(𝑥c)𝐹(𝑥𝑙)layer

layer

• shortcutmapping:ℎ =identity

• after-addmapping:𝑓 =ReLU

Page 47: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

Onidentitymappingsforoptimization

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“IdentityMappingsinDeepResidualNetworks”.arXiv2016.

𝑥cBC = 𝑓(ℎ 𝑥c + 𝐹 𝑥c )

𝑥𝑙

ℎ(𝑥c)𝐹(𝑥𝑙)layer

layer

• shortcutmapping:ℎ =identity

• after-addmapping:𝑓 =ReLU

• Whatif𝑓 =identity?

Page 48: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

Onidentitymappingsforoptimization

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“IdentityMappingsinDeepResidualNetworks”.arXiv2016.

𝑥cBC = 𝑓(ℎ 𝑥c + 𝐹 𝑥c )

𝑥𝑙

ℎ(𝑥c)𝐹(𝑥𝑙)layer

layer

• shortcutmapping:ℎ =identity

• after-addmapping:𝑓 =ReLU

• Whatif𝑓 =identity?

Page 49: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

Verysmoothforwardpropagation

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“IdentityMappingsinDeepResidualNetworks”.arXiv2016.

𝑥cBC = 𝑥c + 𝐹 𝑥c

𝑥cBV = 𝑥cBC + 𝐹 𝑥cBC

Page 50: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

Verysmoothforwardpropagation

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“IdentityMappingsinDeepResidualNetworks”.arXiv2016.

𝑥cBC = 𝑥c + 𝐹 𝑥c

𝑥cBV = 𝑥c + 𝐹 𝑥c + 𝐹 𝑥cBC

𝑥cBV = 𝑥cBC + 𝐹 𝑥cBC

Page 51: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

Verysmoothforwardpropagation

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“IdentityMappingsinDeepResidualNetworks”.arXiv2016.

𝑥cBC = 𝑥c + 𝐹 𝑥c

𝑥cBV = 𝑥c + 𝐹 𝑥c + 𝐹 𝑥cBC

𝑥cBV = 𝑥cBC + 𝐹 𝑥cBC

𝑥g = 𝑥c +h𝐹 𝑥+

giC

+jc

Page 52: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

Verysmoothforwardpropagation

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“IdentityMappingsinDeepResidualNetworks”.arXiv2016.

𝑥g = 𝑥c +h𝐹 𝑥+

giC

+jc

7x7conv,64,/2

pool,/2

3x3conv,64

3x3conv,64

3x3conv,64

3x3conv,64

3x3conv,64

3x3conv,64

3x3conv,128,/2

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,256,/2

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,512,/2

3x3conv,512

3x3conv,512

3x3conv,512

3x3conv,512

3x3conv,512

avgpool

fc1000

7x7conv,64,/2

pool,/2

3x3conv,64

3x3conv,64

3x3conv,64

3x3conv,64

3x3conv,64

3x3conv,64

3x3conv,128,/2

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,256,/2

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,512,/2

3x3conv,512

3x3conv,512

3x3conv,512

3x3conv,512

3x3conv,512

avgpool

fc1000

𝑥g

𝑥c• Any𝑥c isdirectly forward-proptoany𝑥g,plus residual.• Any𝑥g isanadditiveoutcome.• incontrastto multiplicative:𝑥g = ∏ 𝑊+𝑥cgiC

+jc

Page 53: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

Verysmoothbackwardpropagation

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“IdentityMappingsinDeepResidualNetworks”.arXiv2016.

𝑥g = 𝑥c +h𝐹 𝑥+

giC

+jc

7x7conv,64,/2

pool,/2

3x3conv,64

3x3conv,64

3x3conv,64

3x3conv,64

3x3conv,64

3x3conv,64

3x3conv,128,/2

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,256,/2

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,512,/2

3x3conv,512

3x3conv,512

3x3conv,512

3x3conv,512

3x3conv,512

avgpool

fc1000

7x7conv,64,/2

pool,/2

3x3conv,64

3x3conv,64

3x3conv,64

3x3conv,64

3x3conv,64

3x3conv,64

3x3conv,128,/2

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,256,/2

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,512,/2

3x3conv,512

3x3conv,512

3x3conv,512

3x3conv,512

3x3conv,512

avgpool

fc1000

𝜕𝐸𝜕𝑥g

𝜕𝐸𝜕𝑥c

𝜕𝐸𝜕𝑥c

=𝜕𝐸𝜕𝑥g

𝜕𝑥g𝜕𝑥c

=𝜕𝐸𝜕𝑥g

(1 +𝜕𝜕𝑥c

h𝐹 𝑥+

giC

+jC

)

Page 54: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

Verysmoothbackwardpropagation

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“IdentityMappingsinDeepResidualNetworks”.arXiv2016.

7x7conv,64,/2

pool,/2

3x3conv,64

3x3conv,64

3x3conv,64

3x3conv,64

3x3conv,64

3x3conv,64

3x3conv,128,/2

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,256,/2

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,512,/2

3x3conv,512

3x3conv,512

3x3conv,512

3x3conv,512

3x3conv,512

avgpool

fc1000

7x7conv,64,/2

pool,/2

3x3conv,64

3x3conv,64

3x3conv,64

3x3conv,64

3x3conv,64

3x3conv,64

3x3conv,128,/2

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,128

3x3conv,256,/2

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,256

3x3conv,512,/2

3x3conv,512

3x3conv,512

3x3conv,512

3x3conv,512

3x3conv,512

avgpool

fc1000

𝜕𝐸𝜕𝑥g

𝜕𝐸𝜕𝑥c

𝜕𝐸𝜕𝑥c

=𝜕𝐸𝜕𝑥g

(1 +𝜕𝜕𝑥c

h𝐹 𝑥+

giC

+jC

)

• Any lmlno

isdirectly back-proptoanylmlnp,

plus residual.

• Anylmlnp

is additive;unlikelytovanish

• incontrastto multiplicative:lmlnp = ∏ 𝑊+lmlno

giC+jc

Page 55: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

Residualforeverylayer

𝑥g = 𝑥c +h𝐹 𝑥+

giC

+jc

forward:

𝜕𝐸𝜕𝑥c

=𝜕𝐸𝜕𝑥g

(1 +𝜕𝜕𝑥c

h𝐹 𝑥+

giC

+jC

)backward:

Enabledby:

• shortcutmapping:ℎ =identity

• after-addmapping:𝑓 =identity

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“IdentityMappingsinDeepResidualNetworks”.arXiv2016.

Page 56: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

Experiments

• Set1:whatifshortcutmappingℎ ≠identity

• Set2:whatifafter-addmapping𝑓 isidentity

• ExperimentsonResNetswithmorethan100layers• deepermodelssuffermorefromoptimizationdifficulty

Page 57: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

ExperimentSet1:whatifshortcutmappingℎ ≠identity?

Page 58: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“IdentityMappingsinDeepResidualNetworks”.arXiv2016.

(f) dropout shortcut(e) conv shortcut

3x3conv

3x3conv

additionReLU

1x1convReLU

3x3conv

3x3conv

addition

dropoutReLU

ReLU

(d) shortcut-only gating(c) exclusive gating

3x3conv

3x3conv

addition

1x1convsigmoid

1-

ReLU

ReLU

3x3conv

3x3conv

addition

1x1convsigmoid

1-

ReLU

ReLU

(a) original (b) constant scaling

3x3conv

3x3conv

additionReLU

ReLU

3x3conv

3x3conv

addition

0.5 0.5

ReLU

ReLU

ℎ 𝑥 = 𝑥error:6.6%

ℎ 𝑥 = 0.5𝑥error:12.4%

ℎ 𝑥 = gate ·𝑥error:8.7%

ℎ 𝑥 = gate ·𝑥error:12.9%

ℎ 𝑥 = conv(𝑥)error:12.2%

ℎ 𝑥 = dropout(𝑥)error:>20%

*ResNet-110onCIFAR-10

*similarto“HighwayNetwork”

Page 59: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“IdentityMappingsinDeepResidualNetworks”.arXiv2016.

(f) dropout shortcut(e) conv shortcut

3x3conv

3x3conv

additionReLU

1x1convReLU

3x3conv

3x3conv

addition

dropoutReLU

ReLU

(d) shortcut-only gating(c) exclusive gating

3x3conv

3x3conv

addition

1x1convsigmoid

1-

ReLU

ReLU

3x3conv

3x3conv

addition

1x1convsigmoid

1-

ReLU

ReLU

(a) original (b) constant scaling

3x3conv

3x3conv

additionReLU

ReLU

3x3conv

3x3conv

addition

0.5 0.5

ReLU

ReLU

ℎ 𝑥 = 𝑥error:6.6%

ℎ 𝑥 = 0.5𝑥error:12.4%

ℎ 𝑥 = gate ·𝑥error:8.7%

ℎ 𝑥 = gate ·𝑥error:12.9%

ℎ 𝑥 = conv(𝑥)error:12.2%

ℎ 𝑥 = dropout(𝑥)error:>20%

shortcutsblockedby

multiplications

*ResNet-110onCIFAR-10

Page 60: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

Ifℎ ismultiplicative,e.g.ℎ 𝑥 = λ𝑥

𝑥g = λgic𝑥c +h𝐹u 𝑥+

giC

+jc

forward:

𝜕𝐸𝜕𝑥c

=𝜕𝐸𝜕𝑥g

(λgic +𝜕𝜕𝑥c

h𝐹u 𝑥+

giC

+jC

)backward:

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“IdentityMappingsinDeepResidualNetworks”.arXiv2016.

• ifℎ ismultiplicative,shortcutsareblocked

• directpropagationisdecayed

*assuming𝑓 =identity

Page 61: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

3x3conv

3x3conv

addition

1x1convsigmoid

1-

ReLU

ReLU

3x3conv

3x3conv

additionReLU

ReLU

ℎ isidentity

ℎ isgating

• gatingshouldhavebetterrepresentationability(identityisaspecialcase),but

• optimizationdifficultydominatesresultsKaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“IdentityMappingsinDeepResidualNetworks”.arXiv2016.

solid:testdashed:train

Page 62: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

ExperimentSet2:whatifafter-addmapping𝑓 isidentity

Page 63: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

BN

ReLU

weight

BN

weight

addition

ReLU

xl

xl+1

BN

ReLU

weight

BN

weight

addition

ReLU

xl

xl+1

ReLU

weight

BN

ReLU

weight

BN

addition

xl

xl+1

𝑓 isReLU(originalResNet)

𝑓 isBN+ReLU 𝑓 isidentity(pre-activation ResNet)

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“IdentityMappingsinDeepResidualNetworks”.arXiv2016.

Page 64: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

BN

ReLU

weight

BN

weight

addition

ReLU

xl

xl+1

BN

ReLU

weight

BN

weight

addition

ReLU

xl

xl+1

𝑓 = ReLU 𝑓 = BN+ReLU

𝑓 = ReLU

𝑓 = BN+ReLU

• BNcouldblockprop• Keeptheshortestpassas

smoothaspossible

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“IdentityMappingsinDeepResidualNetworks”.arXiv2016.

solid:testdashed:train

Page 65: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

BN

ReLU

weight

BN

weight

addition

ReLU

xl

xl+1

𝑓 = ReLU 𝑓 = identity

ReLU

weight

BN

ReLU

weight

BN

addition

xl

xl+1

1001-layer ResNetsonCIFAR-10

𝑓 = ReLU𝑓 = identity

• ReLUcouldblockpropwhenthereare1000layers

• pre-activationdesigneasesoptimization(andimprovesgeneralization;seepaper)

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“IdentityMappingsinDeepResidualNetworks”.arXiv2016.

solid:testdashed:train

Page 66: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

method error (%)

NIN 8.81

DSN 8.22

FitNet 8.39

Highway 7.72

ResNet-110 (1.7M) 6.61

ResNet-1202 (19.4M) 7.93

ResNet-164, pre-activation (1.7M) 5.46

ResNet-1001, pre-activation (10.2M) 4.92 (4 .89±0 .14 )

method error (%)NIN 35.68

DSN 34.57

FitNet 35.04

Highway 32.39

ResNet-164(1.7M) 25.16

ResNet-1001(10.2M) 27.82

ResNet-164, pre-activation (1.7M) 24.33

ResNet-1001, pre-activation (10.2M) 22.71 (22 .68±0 .22 )

ComparisonsonCIFAR-10/100CIFAR-10 CIFAR-100

*allbasedonmoderateaugmentation

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“IdentityMappingsinDeepResidualNetworks”.arXiv2016.

Page 67: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

ImageNetExperiments

method data augmentation top-1error(%) top-5error(%)ResNet-152, original scale 21.3 5.5ResNet-152, pre-activation scale 21.1 5.5ResNet-200, original scale 21.8 6.0ResNet-200, pre-activation scale 20.7 5.3ResNet-200, pre-activation scale+aspectratio 20.1* 4.8*

*independentlyreproducedby:https://github.com/facebook/fb.resnet.torch/tree/master/pretrained#notes

trainingcodeandmodelsavailable.

ImageNetsingle-crop(320x320)val error

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“IdentityMappingsinDeepResidualNetworks”.arXiv2016.

Page 68: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

Summaryofobservations

• Keeptheshortestpathassmoothaspossible• bymakingℎ and𝑓 identity• forward/backwardsignalsdirectlyflowthroughthispath

• Featuresofanylayersareadditiveoutcomes

• 1000-layer ResNetscanbeeasilytrainedandhavebetteraccuracy

ReLU

weight

BN

ReLU

weight

BN

addition

xl

xl+1

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“IdentityMappingsinDeepResidualNetworks”.arXiv2016.

Page 69: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

FutureWorks

• Representation• skipping1layervs.multiplelayers?• Flatvs.Bottleneck?• Inception-ResNet [Szegedy etal2016]• ResNet inResNet [Targ etal2016]• Widthvs.Depth[Zagoruyko &Komodakis 2016]

• Generalization• DropOut,MaxOut,DropConnect,…• DropLayer(StochasticDepth)[Huangetal2016]

• Optimization• Withoutresidual/shortcut?

ReLU

weight

BN

ReLU

weight

BN

addition

xl

xl+1

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“IdentityMappingsinDeepResidualNetworks”.arXiv2016.

Page 70: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

Applications

“Featuresmatter”

Page 71: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

“Featuresmatter.”(quote[Girshicketal.2014],theR-CNNpaper)

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“DeepResidualLearningforImageRecognition”.CVPR2016.

task 2nd-placewinner ResNets margin

(relative)

ImageNetLocalization(top-5error) 12.0 9.0 27%

ImageNetDetection([email protected]) 53.6 62.1 16%

COCO Detection([email protected]:.95) 33.5 37.3 11%

COCOSegmentation([email protected]:.95) 25.1 28.2 12%

• OurresultsareallbasedonResNet-101• Deeperfeaturesarewelltransferrable

absolute8.5%better!

Page 72: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

RevolutionofDepth

34

5866

86

HOG,DPM AlexNet(RCNN)

VGG(RCNN)

ResNet(FasterRCNN)*

PASCALVOC2007ObjectDetectionmAP (%)

shallow8layers

16layers

101layers

*w/otherimprovements&moredata

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“DeepResidualLearningforImageRecognition”.CVPR2016.

Enginesofvisualrecognition

Page 73: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

DeepLearningforComputerVision

backbonestructure

ImageNetdata

classificationnetwork

pre-train features

detectionnetwork

(e.g.R-CNN)

segmentationnetwork(e.g.FCN)

…...

humanposeestimationnetwork

depthestimationnetwork

targetdata

fine-tune

Page 74: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

Example:ObjectDetection

ü boatü person

ImageClassification(what?)

ObjectDetection(what+where?)

Page 75: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

ObjectDetection:R-CNN

regionproposals~2,000

1CNNforeachregion

Region-based CNNpipeline

figurecredit:R.Girshicketal.

aeroplane? no.

..person? yes.

tvmonitor? no.

warped region..

CNN

inputimage classifyregions

Girshick,Donahue, Darrell,Malik.RichFeatureHierarchiesforAccurateObjectDetectionandSemanticSegmentation.CVPR2014

Page 76: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

ObjectDetection:R-CNN

• R-CNN

Girshick,Donahue, Darrell,Malik.RichFeatureHierarchiesforAccurateObjectDetectionandSemanticSegmentation.CVPR2014

CNN

feature

image

CNN

feature

CNN

feature

CNN

feature

pre-computedRegions-of-Interest

(RoIs)

End-to-Endtraining

Page 77: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

pre-computedRegions-of-Interest

(RoIs)

image

CNN

feature

featurefeature

ObjectDetection:FastR-CNN

• FastR-CNN

Girshick.FastR-CNN.ICCV2015

End-to-Endtraining

sharedconvlayers

RoI pooling

Page 78: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

ObjectDetection:FasterR-CNN

• FasterR-CNN• SolelybasedonCNN• Noexternalmodules• Eachstepisend-to-end

End-to-Endtraining

image

CNN

featuremap

RegionProposalNet

proposals

features

RoI pooling

Shaoqing Ren,KaimingHe,RossGirshick,&JianSun.“FasterR-CNN:TowardsReal-TimeObjectDetectionwithRegionProposalNetworks”.NIPS2015.

Page 79: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

ObjectDetection

backbonestructure

ImageNetdata

classificationnetwork

pre-train features

detectionnetwork

detectiondata

fine-tune

• AlexNet• VGG-16• GoogleNet• ResNet-101• …

• R-CNN• FastR-CNN• FasterR-CNN• MultiBox• SSD• …

“plug-in”features detectors

independentlydeveloped

Page 80: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

ObjectDetection

• Simply“FasterR-CNN+ResNet”

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“DeepResidualLearningforImageRecognition”.CVPR2016.ShaoqingRen,KaimingHe,RossGirshick,&JianSun.“FasterR-CNN:TowardsReal-TimeObjectDetectionwithRegionProposalNetworks”.NIPS2015.

image

CNN

featuremap

RegionProposalNet

proposals

classifier

RoI pooling

FasterR-CNNbaseline [email protected] [email protected]:.95

VGG-16 41.5 21.5ResNet-101 48.4 27.2

COCOdetection resultsResNet-101has28%relativegain

vsVGG-16

Page 81: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

ObjectDetection

• RPNlearns proposalsbyextremelydeepnets• Weuseonly300proposals(nohand-designedproposals)

• Addcomponents:• Iterativelocalization• Contextmodeling• Multi-scaletesting

• AllarebasedonCNNfeatures;allareend-to-end

• Allbenefitmore fromdeeper features– cumulativegains!

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“DeepResidualLearningforImageRecognition”.CVPR2016.ShaoqingRen,KaimingHe,RossGirshick,&JianSun.“FasterR-CNN:TowardsReal-TimeObjectDetectionwithRegionProposalNetworks”.NIPS2015.

Page 82: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

ResNet’s objectdetectionresultonCOCO*theoriginalimageisfromtheCOCOdataset

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“DeepResidualLearningforImageRecognition”.arXiv2015.ShaoqingRen,KaimingHe,RossGirshick,&JianSun.“FasterR-CNN:TowardsReal-TimeObjectDetectionwithRegionProposalNetworks”.NIPS2015.

Page 83: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“DeepResidualLearningforImageRecognition”.arXiv2015.ShaoqingRen,KaimingHe,RossGirshick,&JianSun.“FasterR-CNN:TowardsReal-TimeObjectDetectionwithRegionProposalNetworks”.NIPS2015.

*theoriginalimageisfromtheCOCOdataset

Page 84: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“DeepResidualLearningforImageRecognition”.arXiv2015.ShaoqingRen,KaimingHe,RossGirshick,&JianSun.“FasterR-CNN:TowardsReal-TimeObjectDetectionwithRegionProposalNetworks”.NIPS2015.

*theoriginalimageisfromtheCOCOdataset

Page 85: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

Resultsonrealvideo.ModelstrainedonMSCOCO(80categories).(frame-by-frame;notemporalprocessing)

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“DeepResidualLearningforImageRecognition”.arXiv2015.ShaoqingRen,KaimingHe,RossGirshick,&JianSun.“FasterR-CNN:TowardsReal-TimeObjectDetectionwithRegionProposalNetworks”.NIPS2015.

thisvideoisavailableonline:https://youtu.be/WZmSMkK9VuA

Page 86: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

MoreVisualRecognitionTasks

ResNet-basedmethodsleadonthesebenchmarks(incompletelist):• ImageNetclassification,detection,localization• MSCOCOdetection,segmentation• PASCALVOCdetection,segmentation

• Humanposeestimation[Newelletal2016]• Depthestimation[Laina etal2016]• Segmentproposal[Pinheiro etal2016]• …

PASCALdetectionleaderboard

PASCALsegmentationleaderboard

ResNet-101

ResNet-101

Page 87: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

PotentialApplications

ResNetshaveshownoutstandingorpromisingresultson:

VisualRecognition

ImageGeneration(PixelRNN,NeuralArt,etc.)

NaturalLanguageProcessing(VerydeepCNN)

SpeechRecognition(preliminaryresults)

Advertising,userprediction(preliminaryresults)

Page 88: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

ConclusionsoftheTutorial

• DeepResidualLearning:• Ultradeepnetworkscanbeeasytotrain• Ultradeepnetworkscangainaccuracyfromdepth• Ultradeeprepresentationsarewelltransferrable• Now200 layersonImageNetand1000layersonCIFAR!

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“DeepResidualLearningforImageRecognition”.CVPR2016.KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“IdentityMappingsinDeepResidualNetworks”.arXiv2016.

Page 89: Deep Residual Networks - Kaiming He - FAIRkaiminghe.com/icml16tutorial/icml2016_tutorial_deep... ·  · 2017-01-22• Image classification • Object detection ... , & Jian Sun.

Resources

• ModelsandCode• OurImageNetmodelsinCaffe:https://github.com/KaimingHe/deep-residual-networks

• Manyavailableimplementation(listinhttps://github.com/KaimingHe/deep-residual-networks)

• FacebookAIResearch’sTorchResNet:https://github.com/facebook/fb.resnet.torch

• Torch,CIFAR-10,withResNet-20toResNet-110,trainingcode,andcurves:code• Lasagne,CIFAR-10,withResNet-32andResNet-56andtrainingcode:code• Neon,CIFAR-10,withpre-trainedResNet-32toResNet-110models,trainingcode,andcurves:code• Torch,MNIST,100layers:blog,code• AwinningentryinKaggle's rightwhalerecognitionchallenge:blog,code• Neon,Place2(mini),40layers:blog,code• …....

KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“DeepResidualLearningforImageRecognition”.CVPR2016.KaimingHe,XiangyuZhang,ShaoqingRen,&JianSun.“IdentityMappingsinDeepResidualNetworks”.arXiv2016.