cvpr2016 deep residual learning kaiminghekaiminghe.com/cvpr16resnet/cvpr2016_deep_residual... ·...


Page 1: cvpr2016 deep residual learning kaiminghekaiminghe.com/cvpr16resnet/cvpr2016_deep_residual... · 2017-01-22 · Deep Residual Learning for Image Recognition Kaiming He, Xiangyu Zhang,

Deep Residual Learning for Image Recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

work done at Microsoft Research Asia

[Figure: the 152-layer ResNet drawn layer by layer, listed here from input to output]

7x7 conv, 64, /2, pool/2
(1x1 conv, 64; 3x3 conv, 64; 1x1 conv, 256) x 3
(1x1 conv, 128; 3x3 conv, 128; 1x1 conv, 512) x 8, first 1x1 conv /2
(1x1 conv, 256; 3x3 conv, 256; 1x1 conv, 1024) x 36, first 1x1 conv /2
(1x1 conv, 512; 3x3 conv, 512; 1x1 conv, 2048) x 3, first 1x1 conv /2
ave pool, fc 1000
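The "152 layers" figure can be checked against the layer listing on this slide. The sketch below is only an illustration: the variable and function names are made up here, and it assumes the usual convention of counting the stem convolution, every convolution inside the blocks, and the final fc layer, while leaving out pooling layers.

```python
# Block counts per stage as read off the slide (labels are illustrative only).
RESNET152_STAGES = [
    ("64-64-256", 3),
    ("128-128-512", 8),
    ("256-256-1024", 36),
    ("512-512-2048", 3),
]
CONVS_PER_BLOCK = 3  # each block stacks a 1x1, a 3x3, and a 1x1 conv


def count_weight_layers(stages, convs_per_block):
    """Count the stem conv, all convs inside the blocks, and the fc layer."""
    stem = 1  # 7x7 conv, 64, /2
    fc = 1    # fc 1000
    body = sum(blocks for _, blocks in stages) * convs_per_block
    return stem + body + fc


depth = count_weight_layers(RESNET152_STAGES, CONVS_PER_BLOCK)
print(depth)
```

With 3 + 8 + 36 + 3 = 50 blocks of three convolutions each, plus the stem and the fc layer, the count comes to 152.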

Page 2

ResNet @ ILSVRC & COCO 2015 Competitions

1st places in all five main tracks
• ImageNet Classification: “Ultra-deep” 152-layer nets
• ImageNet Detection: 16% better than 2nd
• ImageNet Localization: 27% better than 2nd
• COCO Detection: 11% better than 2nd
• COCO Segmentation: 12% better than 2nd

*improvements are relative numbers

Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Deep Residual Learning for Image Recognition”. CVPR 2016.

Page 3

Revolution of Depth

[Figure: bar chart, ImageNet Classification top-5 error (%)]

ILSVRC'15 ResNet      3.57   152 layers
ILSVRC'14 GoogleNet   6.7    22 layers
ILSVRC'14 VGG         7.3    19 layers
ILSVRC'13             11.7   8 layers
ILSVRC'12 AlexNet     16.4   8 layers
ILSVRC'11             25.8   shallow
ILSVRC'10             28.2   shallow

Page 4

Revolution of Depth

[Figure: bar chart, PASCAL VOC 2007 Object Detection mAP (%)]

HOG, DPM                34   shallow
AlexNet (RCNN)          58   8 layers
VGG (RCNN)              66   16 layers
ResNet (Faster RCNN)*   86   101 layers

*w/ other improvements & more data

Engines of visual recognition

Page 5

Revolution of Depth

AlexNet, 8 layers (ILSVRC 2012)

11x11 conv, 96, /4, pool/2
5x5 conv, 256, pool/2
3x3 conv, 384
3x3 conv, 384
3x3 conv, 256, pool/2
fc, 4096
fc, 4096
fc, 1000

Page 6

Revolution of Depth

AlexNet, 8 layers (ILSVRC 2012)

11x11 conv, 96, /4, pool/2
5x5 conv, 256, pool/2
3x3 conv, 384
3x3 conv, 384
3x3 conv, 256, pool/2
fc, 4096
fc, 4096
fc, 1000

VGG, 19 layers (ILSVRC 2014)

3x3 conv, 64
3x3 conv, 64, pool/2
3x3 conv, 128
3x3 conv, 128, pool/2
3x3 conv, 256 (x3)
3x3 conv, 256, pool/2
3x3 conv, 512 (x3)
3x3 conv, 512, pool/2
3x3 conv, 512 (x3)
3x3 conv, 512, pool/2
fc, 4096
fc, 4096
fc, 1000

GoogleNet, 22 layers (ILSVRC 2014)

[Figure: GoogleNet layer diagram: input, Conv 7x7, MaxPool, LocalRespNorm, Conv 1x1, Conv 3x3, LocalRespNorm, MaxPool, then nine modules of parallel 1x1, 3x3, and 5x5 Conv and MaxPool branches joined by DepthConcat, AveragePool 7x7, FC, and softmax2, with two side branches (AveragePool 5x5, Conv 1x1, FC, FC) ending in softmax0 and softmax1]

Page 7

Revolution of Depth

AlexNet, 8 layers (ILSVRC 2012)

11x11 conv, 96, /4, pool/2
5x5 conv, 256, pool/2
3x3 conv, 384
3x3 conv, 384
3x3 conv, 256, pool/2
fc, 4096
fc, 4096
fc, 1000

VGG, 19 layers (ILSVRC 2014)

3x3 conv, 64
3x3 conv, 64, pool/2
3x3 conv, 128
3x3 conv, 128, pool/2
3x3 conv, 256 (x3)
3x3 conv, 256, pool/2
3x3 conv, 512 (x3)
3x3 conv, 512, pool/2
3x3 conv, 512 (x3)
3x3 conv, 512, pool/2
fc, 4096
fc, 4096
fc, 1000

ResNet, 152 layers (ILSVRC 2015)

7x7 conv, 64, /2, pool/2
(1x1 conv, 64; 3x3 conv, 64; 1x1 conv, 256) x 3
(1x1 conv, 128; 3x3 conv, 128; 1x1 conv, 512) x 8, first 1x1 conv /2
(1x1 conv, 256; 3x3 conv, 256; 1x1 conv, 1024) x 36, first 1x1 conv /2
(1x1 conv, 512; 3x3 conv, 512; 1x1 conv, 2048) x 3, first 1x1 conv /2
ave pool, fc 1000

Page 8

Is learning better networks as simple as stacking more layers?

Page 9

Simply stacking layers?

[Figure: CIFAR-10, train error (%) and test error (%) vs. iter. (1e4), for a 20-layer and a 56-layer plain net]

• Plain nets: stacking 3x3 conv layers…
• 56-layer net has higher training error and test error than 20-layer net

Page 10

Simply stacking layers?

[Figure: CIFAR-10, error (%) vs. iter. (1e4), for plain-20, plain-32, plain-44, and plain-56]

[Figure: ImageNet-1000, error (%) vs. iter. (1e4), for plain-18 and plain-34]

solid: test/val; dashed: train

• “Overly deep” plain nets have higher training error
• A general phenomenon, observed in many datasets

Page 11

a shallower model (18 layers)

7x7 conv, 64, /2
3x3 conv, 64 (x4)
3x3 conv, 128, /2
3x3 conv, 128 (x3)
3x3 conv, 256, /2
3x3 conv, 256 (x3)
3x3 conv, 512, /2
3x3 conv, 512 (x3)
fc 1000

a deeper counterpart (34 layers), with “extra” layers

7x7 conv, 64, /2
3x3 conv, 64 (x6)
3x3 conv, 128, /2
3x3 conv, 128 (x7)
3x3 conv, 256, /2
3x3 conv, 256 (x11)
3x3 conv, 512, /2
3x3 conv, 512 (x5)
fc 1000

• Richer solution space
• A deeper model should not have higher training error
• A solution by construction:
  • original layers: copied from a learned shallower model
  • extra layers: set as identity
  • at least the same training error
• Optimization difficulties: solvers cannot find the solution when going deeper…
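The "solution by construction" argument can be illustrated with a toy sketch. Everything below is hypothetical: the two lambda "layers" merely stand in for a learned shallower model and are not taken from the talk. The point is only that appending identity layers leaves every output, and therefore the training error, unchanged.

```python
def compose(layers):
    """Return a function that applies the given layers in order."""
    def network(x):
        for layer in layers:
            x = layer(x)
        return x
    return network


def identity(x):
    """An 'extra' layer set as identity."""
    return x


# Hypothetical stand-ins for the layers of a learned shallower model.
shallow_layers = [lambda x: 2.0 * x + 1.0, lambda x: max(0.0, x - 3.0)]
shallow = compose(shallow_layers)

# Deeper counterpart: the original layers copied over, plus 16 identity layers.
deeper = compose(shallow_layers + [identity] * 16)

inputs = [-2.0, 0.0, 0.5, 1.0, 4.0]
same_outputs = all(shallow(x) == deeper(x) for x in inputs)
print(same_outputs)
```

Since such a deeper network exists, a higher training error in a trained deeper plain net points at the optimizer rather than at the capacity of the model, which is the slide's last bullet.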

Page 12

Deep Residual Learning

• Plain net

[Figure: any two stacked layers: input $x$, weight layer, relu, weight layer, relu, output $H(x)$]

$H(x)$ is any desired mapping,
hope the 2 weight layers fit $H(x)$

Page 13

Deep Residual Learning

• Residual net

[Figure: residual block: input $x$, weight layer, relu, weight layer producing $F(x)$, an identity shortcut carrying $x$ around both layers, addition, relu, output $H(x) = F(x) + x$]

$H(x)$ is any desired mapping;
rather than hope the 2 weight layers fit $H(x)$,
hope the 2 weight layers fit $F(x)$,
and let $H(x) = F(x) + x$
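A minimal sketch of the two blocks on pages 12 and 13, assuming that each "weight layer" can be stood in for by a small matrix-vector product (the slides use convolutions) and using made-up weights; the function names are illustrative and not from the authors' code.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]


def linear(weights, v):
    """Matrix-vector product; a stand-in for one conv 'weight layer'."""
    return [sum(w * a for w, a in zip(row, v)) for row in weights]


def plain_block(x, w1, w2):
    """Two stacked layers that must fit H(x) directly."""
    return relu(linear(w2, relu(linear(w1, x))))


def residual_block(x, w1, w2):
    """The same two layers fit F(x); the shortcut adds x back: H(x) = F(x) + x."""
    f = linear(w2, relu(linear(w1, x)))
    return relu([fi + xi for fi, xi in zip(f, x)])


# Made-up example weights and input.
x = [1.0, 2.0]
w1 = [[0.5, 0.0], [0.0, -1.0]]
w2 = [[1.0, 1.0], [0.0, 2.0]]

print(plain_block(x, w1, w2))     # F(x) passed through relu
print(residual_block(x, w1, w2))  # relu(F(x) + x)
```

The shortcut adds no parameters: both blocks use the same `w1` and `w2`, and only the addition of `x` differs.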

Page 14

Deep Residual Learning

• $F(x)$ is a residual mapping w.r.t. identity
  • If identity were optimal, easy to set weights as 0
  • If optimal mapping is closer to identity, easier to find small fluctuations

[Figure: residual block: input $x$, weight layer, relu, weight layer producing $F(x)$, an identity shortcut carrying $x$ around both layers, addition, relu, output $H(x) = F(x) + x$]
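The "set weights as 0" point can be written out in one line. The symbols $W_1$, $W_2$ (the two weight layers) and $\sigma$ (relu) are notation assumed here, biases are omitted, and the final relu after the addition is left out for brevity; it changes nothing when $x$ is already non-negative, for instance when it is the output of a previous relu.

```latex
\begin{aligned}
F(x) &= W_2\,\sigma(W_1 x), \qquad H(x) = F(x) + x \\
W_1 = W_2 = 0 \;&\Longrightarrow\; F(x) = 0 \;\Longrightarrow\; H(x) = x
\end{aligned}
```

In a plain block the same two layers would instead have to fit the identity itself through a nonlinearity, which is the harder target the slide contrasts against.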

Page 15

Network “Design”

• Keep it simple
• Our basic design (VGG-style)
  • all 3x3 conv (almost)
  • spatial size /2 => # filters x2
• Simple design; just deep!

[Figure: plain net and ResNet drawn side by side; both columns list the same layers]

7x7 conv, 64, /2
pool, /2
3x3 conv, 64 (x6)
3x3 conv, 128, /2
3x3 conv, 128 (x7)
3x3 conv, 256, /2
3x3 conv, 256 (x11)
3x3 conv, 512, /2
3x3 conv, 512 (x5)
avg pool
fc 1000
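The rule "spatial size /2 => # filters x2" can be traced through the four stages with a short sketch. The 224x224 input size is an assumption (the slide does not state it), the helper names are invented for illustration, and the cost formula counts only a 3x3 conv whose input and output channel counts are equal, so it ignores the first layer of each stage.

```python
def stage_plan(input_size=224, base_filters=64, num_stages=4):
    """Return (feature-map size, # filters) for each 3x3 conv stage."""
    size = input_size // 2   # 7x7 conv, 64, /2
    size = size // 2         # pool, /2
    filters = base_filters
    plan = []
    for stage in range(num_stages):
        if stage > 0:
            size //= 2       # spatial size /2 ...
            filters *= 2     # ... => # filters x2
        plan.append((size, filters))
    return plan


def mult_adds_3x3(size, filters):
    """Multiply-adds of one 3x3 conv with `filters` input and output channels."""
    return size * size * filters * filters * 9


plan = stage_plan()
costs = [mult_adds_3x3(size, filters) for size, filters in plan]
for (size, filters), cost in zip(plan, costs):
    print(f"{size}x{size} map, {filters} filters: {cost} multiply-adds per layer")
```

Under these assumptions every stage comes out at the same cost per layer: halving the map divides the number of positions by four, and doubling the filters multiplies the work per position by four.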

Page 16

CIFAR-10 experiments

[Figure: CIFAR-10 plain nets, error (%) vs. iter. (1e4), for plain-20, plain-32, plain-44, and plain-56]

[Figure: CIFAR-10 ResNets, error (%) vs. iter. (1e4), for ResNet-20, ResNet-32, ResNet-44, ResNet-56, and ResNet-110]

solid: test; dashed: train

• Deep ResNets can be trained without difficulties
• Deeper ResNets have lower training error, and also lower test error

Page 17

ImageNet experiments

[Figure: ImageNet plain nets, error (%) vs. iter. (1e4), for plain-18 and plain-34]

[Figure: ImageNet ResNets, error (%) vs. iter. (1e4), for ResNet-18 and ResNet-34]

solid: test; dashed: train

• Deep ResNets can be trained without difficulties
• Deeper ResNets have lower training error, and also lower test error

Page 18

ImageNet experiments

[Figure: bar chart, 10-crop testing, top-5 val error (%)]

ResNet-34    7.4
ResNet-50    6.7
ResNet-101   6.1
ResNet-152   5.7

this model has lower time complexity than VGG-16/19 (chart annotation)

• Deeper ResNets have lower error

Page 19

Beyond classification

A treasure from ImageNet is on learning features.

Page 20

“Features matter.” (quote [Girshick et al. 2014], the R-CNN paper)

task                                  2nd-place winner   ResNets   margin (relative)
ImageNet Localization (top-5 error)   12.0               9.0       27%
ImageNet Detection (mAP@.5)           53.6               62.1      16%
COCO Detection (mAP@.5:.95)           33.5               37.3      11%
COCO Segmentation (mAP@.5:.95)        25.1               28.2      12%

ImageNet Detection: absolute 8.5% better!

• Our results are all based on ResNet-101
• Our features are well transferable
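The relative margins can be recomputed from the two result columns. This is a sketch under one assumption: the margin is the gap divided by the 2nd-place value, with the sign flipped for error metrics where lower is better. From the rounded values shown, the localization row comes out at 25% rather than 27%; the slide's figure was presumably computed from unrounded errors.

```python
# (task, 2nd-place value, ResNet value, higher_is_better), as listed on the slide.
ROWS = [
    ("ImageNet Localization (top-5 error)", 12.0, 9.0, False),
    ("ImageNet Detection (mAP@.5)", 53.6, 62.1, True),
    ("COCO Detection (mAP@.5:.95)", 33.5, 37.3, True),
    ("COCO Segmentation (mAP@.5:.95)", 25.1, 28.2, True),
]


def relative_margin(second, ours, higher_is_better):
    """Improvement over the 2nd-place entry, as a percentage of its value."""
    gain = ours - second if higher_is_better else second - ours
    return 100.0 * gain / second


margins = [round(relative_margin(s, o, hib)) for _, s, o, hib in ROWS]
for (task, _, _, _), margin in zip(ROWS, margins):
    print(f"{task}: {margin}%")
```

The detection row also reproduces the slide's absolute figure, since 62.1 minus 53.6 is 8.5 points.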

Page 21

Object Detection (brief)

• Simply “Faster R-CNN + ResNet”

[Figure: Faster R-CNN pipeline: image, CNN, feature map, Region Proposal Net, proposals, RoI pooling, classifier]

Faster R-CNN baseline   mAP@.5   mAP@.5:.95
VGG-16                  41.5     21.5
ResNet-101              48.4     27.2

COCO detection results (ResNet has 28% relative gain)

Shaoqing Ren, Kaiming He, Ross Girshick, & Jian Sun. “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”. NIPS 2015.

Page 22

Our results on MS COCO

*the original image is from the COCO dataset

Page 23

Results on real video. Model trained on MS COCO w/ 80 categories. (frame-by-frame; no temporal processing)

this video is available online: https://youtu.be/WZmSMkK9VuA

Page 24

More Visual Recognition Tasks

ResNets lead on these benchmarks (incomplete list):
• ImageNet classification, detection, localization
• MS COCO detection, segmentation
• PASCAL VOC detection, segmentation
• VQA challenge 2016
• Human pose estimation [Newell et al 2016]
• Depth estimation [Laina et al 2016]
• Segment proposal [Pinheiro et al 2016]
• …

[Figures: PASCAL detection leaderboard and PASCAL segmentation leaderboard, each labeled ResNet-101]

Page 25

Potential Applications

ResNets have shown outstanding or promising results on:

Visual Recognition

Image Generation (Pixel RNN, Neural Art, etc.)

Natural Language Processing (Very deep CNN)

Speech Recognition (preliminary results)

Advertising, user prediction (preliminary results)

Page 26

Conclusions

• Deep Residual Networks:
  • Easy to train
  • Simply gain accuracy from depth
  • Well transferable
• Follow-up [He et al. arXiv 2016]
  • 200 layers on ImageNet, 1000 layers on CIFAR

Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Identity Mappings in Deep Residual Networks”. arXiv 2016.

Page 27

Resources

• Models and Code
  • Our ImageNet models in Caffe: https://github.com/KaimingHe/deep-residual-networks
• Many available implementations (list in https://github.com/KaimingHe/deep-residual-networks):
  • Facebook AI Research’s Torch ResNet: https://github.com/facebook/fb.resnet.torch
  • Torch, CIFAR-10, with ResNet-20 to ResNet-110, training code, and curves: code
  • Lasagne, CIFAR-10, with ResNet-32 and ResNet-56 and training code: code
  • Neon, CIFAR-10, with pre-trained ResNet-32 to ResNet-110 models, training code, and curves: code
  • Torch, MNIST, 100 layers: blog, code
  • A winning entry in Kaggle's right whale recognition challenge: blog, code
  • Neon, Place2 (mini), 40 layers: blog, code
  • …