Lecture 8: CNN Architectures

Justin Johnson, September 30, 2019

Transcript of Lecture 8: CNN Architectures (justincj/slides/eecs498/498_FA2019_lecture08.pdf)

Page 1: Lecture 8: CNN Architectures

Page 2: Reminder: A2 due today!

Due at 11:59pm. Remember to run the validation script!

Page 3: Soon: Assignment 3!

Modular API for backpropagation:
- Fully-connected networks
- Dropout
- Update rules: SGD+Momentum, RMSprop, Adam
- Convolutional networks
- Batch normalization

Will be released today or tomorrow. Will be due two weeks from the day it is released.

Page 4: Last Time: Components of Convolutional Networks

Convolution layers, pooling layers, fully-connected layers, activation functions, normalization.

Page 5: ImageNet Classification Challenge

[Bar chart: ImageNet top-5 error rate by year.
2010: 28.2, Lin et al (shallow)
2011: 25.8, Sanchez & Perronnin (shallow)
2012: 16.4, Krizhevsky et al (AlexNet), 8 layers
2013: 11.7, Zeiler & Fergus, 8 layers
2014: 7.3, Simonyan & Zisserman (VGG), 19 layers
2014: 6.7, Szegedy et al (GoogLeNet), 22 layers
2015: 3.6, He et al (ResNet), 152 layers
2016: 3.0, Shao et al, 152 layers
2017: 2.3, Hu et al (SENet), 152 layers
Human (Russakovsky et al): 5.1]

Page 6: ImageNet Classification Challenge

[Same bar chart as Page 5.]

Page 7: AlexNet

Figure copyright Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, 2012. Reproduced with permission.

227x227 inputs, 5 convolutional layers, max pooling, 3 fully-connected layers, ReLU nonlinearities.

Page 8: AlexNet

Figure copyright Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, 2012. Reproduced with permission.

227x227 inputs, 5 convolutional layers, max pooling, 3 fully-connected layers, ReLU nonlinearities.

Used "local response normalization"; not used anymore.

Trained on two GTX 580 GPUs with only 3GB of memory each! The model was split over the two GPUs.

Page 9: AlexNet

Figure copyright Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, 2012. Reproduced with permission.

[AlexNet architecture figure.]

Page 10: AlexNet

[Bar chart: AlexNet citations per year, as of 9/30/2019. 2013: 284; 2014: 942; 2015: 2672; 2016: 5955; 2017: 10173; 2018: 14951; 2019: 11533.]

Total citations: 46,510

Page 11: AlexNet

[Same citation chart as Page 10.]

Citation counts for comparison:
Darwin, "On the Origin of Species", 1859: 50,007
Shannon, "A Mathematical Theory of Communication", 1948: 69,351
Watson and Crick, "Molecular Structure of Nucleic Acids", 1953: 13,111
ATLAS Collaboration, "Observation of a new particle in the search for the Standard Model Higgs boson with the ATLAS detector at the LHC", 2012: 14,424
AlexNet total citations: 46,510

Page 12: AlexNet

Layer   | In C | In H/W | filters | kernel | stride | pad | Out C | Out H/W | memory (KB) | params (k) | flop (M)
conv1   | 3    | 227    | 64      | 11     | 4      | 2   | 64    | 56      | 784         | 23         | 73
pool1   | 64   | 56     |         | 3      | 2      | 0   | 64    | 27      | 182         |            |
conv2   | 64   | 27     | 192     | 5      | 1      | 2   | 192   | 27      | 547         | 307        | 224
pool2   | 192  | 27     |         | 3      | 2      | 0   | 192   | 13      | 127         |            |
conv3   | 192  | 13     | 384     | 3      | 1      | 1   | 384   | 13      | 254         | 664        | 112
conv4   | 384  | 13     | 256     | 3      | 1      | 1   | 256   | 13      | 169         | 885        | 145
conv5   | 256  | 13     | 256     | 3      | 1      | 1   | 256   | 13      | 169         | 590        | 100
pool5   | 256  | 13     |         | 3      | 2      | 0   | 256   | 6       | 36          |            |
flatten | 256  | 6      |         |        |        |     | 9216  |         | 36          |            |
fc6     | 9216 |        | 4096    |        |        |     | 4096  |         | 16          | 37,749     | 38
fc7     | 4096 |        | 4096    |        |        |     | 4096  |         | 16          | 16,777     | 17
fc8     | 4096 |        | 1000    |        |        |     | 1000  |         | 4           | 4,096      | 4

? (how are the output size, memory, params, and flop columns computed?)

Page 13: AlexNet

[AlexNet layer table, as on Page 12.]

Recall: output channels = number of filters.

Page 14: AlexNet

[AlexNet layer table, as on Page 12.]

Recall: W' = (W - K + 2P) / S + 1 = (227 - 11 + 2*2) / 4 + 1 = 220/4 + 1 = 56

Page 15: AlexNet

[AlexNet layer table, as on Page 12.]

? (how is the memory column computed?)

Page 16: AlexNet

[AlexNet layer table, as on Page 12.]

Number of output elements = C * H' * W' = 64 * 56 * 56 = 200,704

Bytes per element = 4 (for 32-bit floating point)

KB = (number of elements) * (bytes per elem) / 1024 = 200704 * 4 / 1024 = 784

Page 17: AlexNet

[AlexNet layer table, as on Page 12.]

? (how is the params column computed?)

Page 18: AlexNet

[AlexNet layer table, as on Page 12.]

Weight shape = Cout x Cin x K x K = 64 x 3 x 11 x 11
Bias shape = Cout = 64
Number of weights = 64*3*11*11 + 64 = 23,296

Page 19: AlexNet

[AlexNet layer table, as on Page 12.]

? (how is the flop column computed?)

Page 20: AlexNet

[AlexNet layer table, as on Page 12.]

Number of floating point operations (multiply+add) = (number of output elements) * (ops per output elem) = (Cout x H' x W') * (Cin x K x K) = (64 * 56 * 56) * (3 * 11 * 11) = 200,704 * 363 = 72,855,552
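Putting the memory, parameter, and flop counts together, a minimal sketch (plain Python, assumed helper name) that reproduces the conv1 row:

```python
def conv_stats(c_in, h, c_out, k, s, p):
    h_out = (h - k + 2 * p) // s + 1
    n_out = c_out * h_out * h_out              # number of output elements
    memory_kb = n_out * 4 / 1024               # float32: 4 bytes per element
    params = c_out * c_in * k * k + c_out      # weight tensor plus bias vector
    flops = n_out * (c_in * k * k)             # one multiply-add per filter tap
    return memory_kb, params, flops

print(conv_stats(3, 227, 64, 11, 4, 2))
# -> (784.0, 23296, 72855552): the 784 KB / 23k params / 73 MFLOP of the conv1 row
```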

Page 21: AlexNet

[AlexNet layer table, as on Page 12.]

? (what is the output size of pool1?)

Page 22: AlexNet

[AlexNet layer table, as on Page 12.]

For the pooling layer:

#output channels = #input channels = 64

W' = floor((W - K) / S + 1) = floor(53/2 + 1) = floor(27.5) = 27

Page 23: AlexNet

[AlexNet layer table, as on Page 12.]

#output elems = Cout x H' x W'
Bytes per elem = 4
KB = Cout * H' * W' * 4 / 1024 = 64 * 27 * 27 * 4 / 1024 = 182.25

? (how many parameters does the pooling layer have?)

Page 24: AlexNet

[AlexNet layer table, as on Page 12, now with 0 params filled in for pool1.]

Pooling layers have no learnable parameters!

? (how many flops does the pooling layer take?)

Page 25: AlexNet

[AlexNet layer table, as on Page 12, now with 0 params / 0 flop for pool1.]

Floating-point ops for pooling layer = (number of output positions) * (flops per output position) = (Cout * H' * W') * (K * K) = (64 * 27 * 27) * (3 * 3) = 419,904 = 0.4 MFLOP
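The same bookkeeping for a pooling layer, as a sketch (hypothetical helper name); the params slot is always zero:

```python
def pool_stats(c, h, k, s):
    h_out = (h - k) // s + 1                   # floor((W - K) / S + 1)
    memory_kb = c * h_out * h_out * 4 / 1024   # float32 outputs
    flops = (c * h_out * h_out) * (k * k)      # K*K comparisons per output position
    return h_out, memory_kb, flops             # and zero learnable parameters

print(pool_stats(64, 56, 3, 2))
# -> (27, 182.25, 419904): the pool1 row; ~0.4 MFLOP is negligible next to the convs
```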

Page 26: AlexNet

[AlexNet layer table, as on Page 12, with 0 params / 0 flop for all pooling and flatten layers.]

Flatten output size = Cin x H x W = 256 * 6 * 6 = 9216

Page 27: AlexNet

[AlexNet layer table, as on Page 12.]

FC params = Cin * Cout + Cout = 9216 * 4096 + 4096 = 37,752,832

FC flops = Cin * Cout = 9216 * 4096 = 37,748,736
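And the fully-connected case, sketched the same way (illustrative helper name):

```python
def fc_stats(c_in, c_out):
    params = c_in * c_out + c_out   # weight matrix plus bias vector
    flops = c_in * c_out            # one multiply-add per weight
    return params, flops

print(fc_stats(9216, 4096))
# -> (37752832, 37748736): fc6's ~37.7M params and ~38 MFLOP
```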

Page 28: AlexNet

[Complete AlexNet layer table, as on Page 12, with all columns filled in.]

Page 29: AlexNet

[AlexNet layer table, as on Page 12.]

How to choose this architecture? Trial and error =(

Page 30: AlexNet

[AlexNet layer table, as on Page 12.]

Interesting trends here!

Page 31: AlexNet

[Three bar charts: per-layer memory (KB), params (K), and MFLOP for AlexNet.]

Most of the memory usage is in the early convolution layers.

Nearly all parameters are in the fully-connected layers.

Most floating-point ops occur in the convolution layers.

Page 32: ImageNet Classification Challenge

[Same bar chart as Page 5.]

Page 33: ImageNet Classification Challenge

[Same bar chart as Page 5.]

Page 34: ZFNet: A Bigger AlexNet

AlexNet, but:
CONV1: change from (11x11 stride 4) to (7x7 stride 2)
CONV3,4,5: instead of 384, 384, 256 filters use 512, 1024, 512
More trial and error =(

ImageNet top-5 error: 16.4% -> 11.7%

Zeiler and Fergus, "Visualizing and Understanding Convolutional Networks", ECCV 2014

Page 35: ImageNet Classification Challenge

[Same bar chart as Page 5.]

Page 36: ImageNet Classification Challenge

[Same bar chart as Page 5.]

Page 37: VGG: Deeper Networks, Regular Design

[Diagram: AlexNet, VGG16, and VGG19 shown side by side as stacks of conv/pool/FC layers.]

Simonyan and Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition", ICLR 2015

VGG design rules:
All conv are 3x3 stride 1 pad 1
All max pool are 2x2 stride 2
After pool, double #channels

Page 38: VGG: Deeper Networks, Regular Design

[Same AlexNet / VGG16 / VGG19 diagram as Page 37.]

VGG design rules: all conv are 3x3 stride 1 pad 1; all max pool are 2x2 stride 2; after pool, double #channels.

The network has 5 convolutional stages:
Stage 1: conv-conv-pool
Stage 2: conv-conv-pool
Stage 3: conv-conv-conv-[conv]-pool
Stage 4: conv-conv-conv-[conv]-pool
Stage 5: conv-conv-conv-[conv]-pool
(VGG-19 has 4 conv in stages 3, 4, and 5)

Page 39: VGG: Deeper Networks, Regular Design

[Same AlexNet / VGG16 / VGG19 diagram as Page 37.]

Why 3x3 convolutions everywhere?

Option 1: Conv(5x5, C->C)
Params: 25C^2
FLOPs: 25C^2HW

Page 40: VGG: Deeper Networks, Regular Design

[Same AlexNet / VGG16 / VGG19 diagram as Page 37.]

Option 1: Conv(5x5, C->C)
Params: 25C^2
FLOPs: 25C^2HW

Option 2: Conv(3x3, C->C), Conv(3x3, C->C)
Params: 18C^2
FLOPs: 18C^2HW

Page 41: VGG: Deeper Networks, Regular Design

[Same AlexNet / VGG16 / VGG19 diagram as Page 37.]

Option 1: Conv(5x5, C->C)
Params: 25C^2
FLOPs: 25C^2HW

Option 2: Conv(3x3, C->C), Conv(3x3, C->C)
Params: 18C^2
FLOPs: 18C^2HW

Two 3x3 conv layers have the same receptive field as a single 5x5 conv, but have fewer parameters and take less computation!
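A quick numeric check of that claim, as a sketch with illustrative values for C, H, W:

```python
def conv_stack_cost(kernels, c, h, w):
    # params and flops for a stack of KxK convs with C -> C channels (biases ignored)
    params = sum(k * k * c * c for k in kernels)
    flops = sum((c * h * w) * (c * k * k) for k in kernels)
    return params, flops

C, H, W = 64, 56, 56
print(conv_stack_cost([5], C, H, W))     # one 5x5:  25*C^2 params, 25*C^2*H*W flops
print(conv_stack_cost([3, 3], C, H, W))  # two 3x3:  18*C^2 params, 18*C^2*H*W flops
```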

Page 42: VGG: Deeper Networks, Regular Design

[Same AlexNet / VGG16 / VGG19 diagram as Page 37.]

What happens at a stage boundary, where pooling halves the resolution and the channels double?

Input: C x 2H x 2W
Layer: Conv(3x3, C->C)
Memory: 4HWC
Params: 9C^2
FLOPs: 36HWC^2

Page 43: VGG: Deeper Networks, Regular Design

[Same AlexNet / VGG16 / VGG19 diagram as Page 37.]

Input: C x 2H x 2W
Layer: Conv(3x3, C->C)
Memory: 4HWC
Params: 9C^2
FLOPs: 36HWC^2

Input: 2C x H x W
Layer: Conv(3x3, 2C->2C)
Memory: 2HWC
Params: 36C^2
FLOPs: 36HWC^2

Page 44: VGG: Deeper Networks, Regular Design

[Same AlexNet / VGG16 / VGG19 diagram as Page 37.]

Input: C x 2H x 2W
Layer: Conv(3x3, C->C)
Memory: 4HWC
Params: 9C^2
FLOPs: 36HWC^2

Input: 2C x H x W
Layer: Conv(3x3, 2C->2C)
Memory: 2HWC
Params: 36C^2
FLOPs: 36HWC^2

Conv layers at each spatial resolution take the same amount of computation!
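A sketch verifying that balance for two adjacent stages (illustrative sizes):

```python
def conv3x3_cost(c, h, w):
    # memory = output elements, params = 3x3 weights, flops = multiply-adds (no bias)
    return dict(memory=c * h * w, params=9 * c * c, flops=(c * h * w) * (9 * c))

C, H, W = 64, 56, 56
print(conv3x3_cost(C, 2 * H, 2 * W))  # earlier stage: C channels at 2H x 2W
print(conv3x3_cost(2 * C, H, W))      # next stage: 2C channels at H x W
# memory halves, params quadruple, and flops come out identical (36*H*W*C^2)
```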

Page 45: AlexNet vs VGG-16: Much bigger network!

[Three bar charts comparing AlexNet and VGG-16 on memory, parameters, and FLOPs.]

Memory: AlexNet total 1.9 MB, VGG-16 total 48.6 MB (25x)

Params: AlexNet total 61M, VGG-16 total 138M (2.3x)

Compute: AlexNet total 0.7 GFLOP, VGG-16 total 13.6 GFLOP (19.4x)

Page 46: ImageNet Classification Challenge

[Same bar chart as Page 5.]

Page 47: ImageNet Classification Challenge

[Same bar chart as Page 5.]

Page 48: GoogLeNet: Focus on Efficiency

Szegedy et al, "Going deeper with convolutions", CVPR 2015

Many innovations for efficiency: reduce parameter count, memory usage, and computation.

Page 49: GoogLeNet: Aggressive Stem

Szegedy et al, "Going deeper with convolutions", CVPR 2015

A stem network at the start aggressively downsamples the input. (Recall that in VGG-16, most of the compute was at the start.)

Page 50: GoogLeNet: Aggressive Stem

Szegedy et al, "Going deeper with convolutions", CVPR 2015

Layer    | In C | In H/W | filters | kernel | stride | pad | Out C | Out H/W | memory (KB) | params (K) | flop (M)
conv     | 3    | 224    | 64      | 7      | 2      | 3   | 64    | 112     | 3136        | 9          | 118
max-pool | 64   | 112    |         | 3      | 2      | 1   | 64    | 56      | 784         | 0          | 2
conv     | 64   | 56     | 64      | 1      | 1      | 0   | 64    | 56      | 784         | 4          | 13
conv     | 64   | 56     | 192     | 3      | 1      | 1   | 192   | 56      | 2352        | 111        | 347
max-pool | 192  | 56     |         | 3      | 2      | 1   | 192   | 28      | 588         | 0          | 1

Total from 224 to 28 spatial resolution: Memory: 7.5 MB; Params: 124K; MFLOP: 418

Page 51: GoogLeNet: Aggressive Stem

[GoogLeNet stem table, as on Page 50.]

Total from 224 to 28 spatial resolution: Memory: 7.5 MB; Params: 124K; MFLOP: 418

Compare VGG-16: Memory: 42.9 MB (5.7x); Params: 1.1M (8.9x); MFLOP: 7485 (17.8x)

Page 52: GoogLeNet: Inception Module

Szegedy et al, "Going deeper with convolutions", CVPR 2015

Inception module: a local unit with parallel branches. This local structure is repeated many times throughout the network.

Page 53: GoogLeNet: Inception Module

Szegedy et al, "Going deeper with convolutions", CVPR 2015

Inception module: a local unit with parallel branches, repeated many times throughout the network.

Uses 1x1 "bottleneck" layers to reduce the channel dimension before expensive conv (we will revisit this with ResNet!).

Page 54: GoogLeNet: Global Average Pooling

No large FC layers at the end! Instead it uses global average pooling to collapse the spatial dimensions, and one linear layer to produce class scores. (Recall VGG-16: most parameters were in the FC layers!)

Layer    | In C | In H/W | filters | kernel | stride | pad | Out C | Out H/W | memory (KB) | params (k) | flop (M)
avg-pool | 1024 | 7      |         | 7      | 1      | 0   | 1024  | 1       | 4           | 0          | 0
fc       | 1024 |        | 1000    |        |        |     | 1000  |         | 4           | 1025       | 1

Compare with VGG-16:

Layer   | In C  | In H/W | filters | kernel | stride | pad | Out C | Out H/W | memory (KB) | params (K) | flop (M)
flatten | 512   | 7      |         |        |        |     | 25088 |         | 98          |            |
fc6     | 25088 |        | 4096    |        |        |     | 4096  |         | 16          | 102760     | 103
fc7     | 4096  |        | 4096    |        |        |     | 4096  |         | 16          | 16777      | 17
fc8     | 4096  |        | 1000    |        |        |     | 1000  |         | 4           | 4096       | 4
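A sketch of this GoogLeNet-style head in PyTorch (shapes taken from the slide's table; this is an illustration, not the lecture's own code):

```python
import torch
import torch.nn as nn

# Global average pooling collapses 1024x7x7 features to a 1024-vector,
# then a single linear layer produces the 1000 class scores (~1.03M params).
head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),  # 1024 x 7 x 7 -> 1024 x 1 x 1
    nn.Flatten(),             # -> 1024
    nn.Linear(1024, 1000),    # -> class scores
)

print(head(torch.randn(1, 1024, 7, 7)).shape)  # torch.Size([1, 1000])
```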

Page 55: GoogLeNet: Global Average Pooling

[Same content as Page 54.]

Page 56: GoogLeNet: Auxiliary Classifiers

Training using a loss only at the end of the network didn't work well: the network is too deep, and gradients don't propagate cleanly.

As a hack, attach "auxiliary classifiers" at several intermediate points in the network that also try to classify the image and receive loss.

GoogLeNet was before batch normalization! With BatchNorm we no longer need to use this trick.

Page 57: ImageNet Classification Challenge

[Same bar chart as Page 5.]

Page 58: ImageNet Classification Challenge

[Same bar chart as Page 5.]

Page 59: Residual Networks

He et al, "Deep Residual Learning for Image Recognition", CVPR 2016

Once we have batch normalization, we can train networks with 10+ layers. What happens as we go deeper?

Page 60: Residual Networks

He et al, "Deep Residual Learning for Image Recognition", CVPR 2016

Once we have batch normalization, we can train networks with 10+ layers. What happens as we go deeper?

[Plot: test error vs iterations; the 56-layer network has higher error than the 20-layer network.]

The deeper model does worse than the shallow model!

Initial guess: the deep model is overfitting, since it is much bigger than the other model.

Page 61: Residual Networks

He et al, "Deep Residual Learning for Image Recognition", CVPR 2016

[Plots: training error and test error vs iterations; the 56-layer network is worse than the 20-layer network on both.]

In fact the deep model performs worse than the shallow model on the training set as well. It is actually underfitting!

Page 62: Residual Networks

He et al, "Deep Residual Learning for Image Recognition", CVPR 2016

A deeper model can emulate a shallower model: copy the layers from the shallower model and set the extra layers to the identity.

Thus deeper models should do at least as well as shallow models.

Hypothesis: this is an optimization problem. Deeper models are harder to optimize, and in particular don't learn identity functions to emulate shallow models.

Page 63: Residual Networks

He et al, "Deep Residual Learning for Image Recognition", CVPR 2016

A deeper model can emulate a shallower model: copy the layers from the shallower model and set the extra layers to the identity. Thus deeper models should do at least as well as shallow models.

Hypothesis: this is an optimization problem. Deeper models are harder to optimize, and in particular don't learn identity functions to emulate shallow models.

Solution: Change the network so learning identity functions with extra layers is easy!

Page 64: Residual Networks

He et al, "Deep Residual Learning for Image Recognition", CVPR 2016

["Plain" block: input x -> conv -> relu -> conv -> relu -> output H(x).
Residual block: input x -> conv -> relu -> conv gives F(x); an additive "shortcut" adds x, so the output is relu(F(x) + x).]

Solution: Change the network so learning identity functions with extra layers is easy!
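A minimal PyTorch sketch of the residual computation (BatchNorm omitted for brevity; an illustration, not the paper's exact block):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        f = self.conv2(F.relu(self.conv1(x)))  # F(x): conv -> relu -> conv
        return F.relu(f + x)                   # additive shortcut, then relu

block = ResidualBlock(64)
print(block(torch.randn(1, 64, 56, 56)).shape)  # shape preserved: (1, 64, 56, 56)
```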

Page 65: Residual Networks

He et al, "Deep Residual Learning for Image Recognition", CVPR 2016

[Same "plain" block vs residual block diagram as Page 64.]

Solution: Change the network so learning identity functions with extra layers is easy!

If you set these [conv layers] to 0, the whole block will compute the identity function!

Page 66: Residual Networks

He et al, "Deep Residual Learning for Image Recognition", CVPR 2016

[Diagram: full ResNet as a stack of residual blocks: input -> 7x7 conv, 64, /2 -> pool -> stages of 3x3-conv residual blocks (64, 128, 256, 512 channels, each stage starting with a stride-2 conv) -> pool -> FC 1000 -> softmax.]

A residual network is a stack of many residual blocks.

Regular design, like VGG: each residual block has two 3x3 conv.

The network is divided into stages: the first block of each stage halves the resolution (with stride-2 conv) and doubles the number of channels.

Page 67: Residual Networks

He et al, "Deep Residual Learning for Image Recognition", CVPR 2016

[ResNet diagram, as on Page 66.]

Uses the same aggressive stem as GoogLeNet to downsample the input 4x before applying residual blocks:

Layer    | In C | In H/W | filters | kernel | stride | pad | Out C | Out H/W | memory (KB) | params (k) | flop (M)
conv     | 3    | 224    | 64      | 7      | 2      | 3   | 64    | 112     | 3136        | 9          | 118
max-pool | 64   | 112    |         | 3      | 2      | 1   | 64    | 56      | 784         | 0          | 2

Page 68: Residual Networks

He et al, "Deep Residual Learning for Image Recognition", CVPR 2016

[ResNet diagram, as on Page 66.]

Like GoogLeNet, no big fully-connected layers: instead use global average pooling and a single linear layer at the end.

Page 69: Residual Networks

He et al, "Deep Residual Learning for Image Recognition", CVPR 2016. Error rates are 224x224 single-crop testing, reported by torchvision.

[ResNet diagram, as on Page 66.]

ResNet-18: stem (1 conv layer); Stage 1 (C=64): 2 res. blocks = 4 conv; Stage 2 (C=128): 2 res. blocks = 4 conv; Stage 3 (C=256): 2 res. blocks = 4 conv; Stage 4 (C=512): 2 res. blocks = 4 conv; linear.

ImageNet top-5 error: 10.92; GFLOP: 1.8

Page 70: Residual Networks

He et al, "Deep Residual Learning for Image Recognition", CVPR 2016. Error rates are 224x224 single-crop testing, reported by torchvision.

[ResNet diagram, as on Page 66.]

ResNet-18: stem (1 conv layer); Stage 1 (C=64): 2 res. blocks = 4 conv; Stage 2 (C=128): 2 res. blocks = 4 conv; Stage 3 (C=256): 2 res. blocks = 4 conv; Stage 4 (C=512): 2 res. blocks = 4 conv; linear.
ImageNet top-5 error: 10.92; GFLOP: 1.8

ResNet-34: stem (1 conv layer); Stage 1: 3 res. blocks = 6 conv; Stage 2: 4 res. blocks = 8 conv; Stage 3: 6 res. blocks = 12 conv; Stage 4: 3 res. blocks = 6 conv; linear.
ImageNet top-5 error: 8.58; GFLOP: 3.6

Page 71: Residual Networks

He et al, "Deep Residual Learning for Image Recognition", CVPR 2016. Error rates are 224x224 single-crop testing, reported by torchvision.

[ResNet diagram, as on Page 66.]

ResNet-18: ImageNet top-5 error: 10.92; GFLOP: 1.8

ResNet-34: ImageNet top-5 error: 8.58; GFLOP: 3.6

VGG-16: ImageNet top-5 error: 9.62; GFLOP: 13.6

Page 72: Residual Networks: Basic Block

He et al, "Deep Residual Learning for Image Recognition", CVPR 2016

"Basic" residual block:
Conv(3x3, C->C)
Conv(3x3, C->C)

Page 73: Residual Networks: Basic Block

He et al, "Deep Residual Learning for Image Recognition", CVPR 2016

"Basic" residual block:
Conv(3x3, C->C): FLOPs 9HWC^2
Conv(3x3, C->C): FLOPs 9HWC^2
Total FLOPs: 18HWC^2

Page 74: Residual Networks: Bottleneck Block

He et al, "Deep Residual Learning for Image Recognition", CVPR 2016

"Basic" residual block (total FLOPs: 18HWC^2):
Conv(3x3, C->C): FLOPs 9HWC^2
Conv(3x3, C->C): FLOPs 9HWC^2

"Bottleneck" residual block:
Conv(1x1, 4C->C)
Conv(3x3, C->C)
Conv(1x1, C->4C)

Page 75: Residual Networks: Bottleneck Block

He et al, "Deep Residual Learning for Image Recognition", CVPR 2016

"Basic" residual block (total FLOPs: 18HWC^2):
Conv(3x3, C->C): FLOPs 9HWC^2
Conv(3x3, C->C): FLOPs 9HWC^2

"Bottleneck" residual block (total FLOPs: 17HWC^2):
Conv(1x1, 4C->C): FLOPs 4HWC^2
Conv(3x3, C->C): FLOPs 9HWC^2
Conv(1x1, C->4C): FLOPs 4HWC^2

More layers, less computational cost!
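A PyTorch sketch of the three bottleneck convolutions (shortcut and BatchNorm omitted; channel convention from the slide; the function name is illustrative):

```python
import torch.nn as nn

def bottleneck_convs(c):
    # e.g. bottleneck_convs(64) maps a 256-channel input back to 256 channels
    return nn.Sequential(
        nn.Conv2d(4 * c, c, kernel_size=1),         # 1x1 squeeze:  4*H*W*C^2 flops
        nn.ReLU(),
        nn.Conv2d(c, c, kernel_size=3, padding=1),  # 3x3 conv:     9*H*W*C^2 flops
        nn.ReLU(),
        nn.Conv2d(c, 4 * c, kernel_size=1),         # 1x1 expand:   4*H*W*C^2 flops
    )
```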

Page 76: Residual Networks

He et al, "Deep Residual Learning for Image Recognition", CVPR 2016. Error rates are 224x224 single-crop testing, reported by torchvision.

Model      | Block type | Stem layers | Stage 1 (blocks/layers) | Stage 2 | Stage 3  | Stage 4 | FC layers | GFLOP | ImageNet top-5 error
ResNet-18  | Basic      | 1           | 2 / 4                   | 2 / 4   | 2 / 4    | 2 / 4   | 1         | 1.8   | 10.92
ResNet-34  | Basic      | 1           | 3 / 6                   | 4 / 8   | 6 / 12   | 3 / 6   | 1         | 3.6   | 8.58
ResNet-50  | Bottleneck | 1           | 3 / 9                   | 4 / 12  | 6 / 18   | 3 / 9   | 1         | 3.8   | 7.13
ResNet-101 | Bottleneck | 1           | 3 / 9                   | 4 / 12  | 23 / 69  | 3 / 9   | 1         | 7.6   | 6.44
ResNet-152 | Bottleneck | 1           | 3 / 9                   | 8 / 24  | 36 / 108 | 3 / 9   | 1         | 11.3  | 5.94


Residual Networks

Lecture 8 - 77

He et al, “Deep Residual Learning for Image Recognition”, CVPR 2016. Error rates are 224x224 single-crop testing, reported by torchvision.

[Architecture diagram and table repeated from the previous slide]

ResNet-50 is the same as ResNet-34, but replaces Basic blocks with Bottleneck blocks. This is a great baseline architecture for many tasks even today!
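
Since torchvision ships these models (it is the source of the error rates quoted above), using ResNet-50 as a baseline is a one-liner. A minimal sketch with the circa-2019 API (newer torchvision versions take a weights= argument instead of pretrained=):

    import torch
    import torchvision

    model = torchvision.models.resnet50(pretrained=True)
    model.eval()
    scores = model(torch.randn(1, 3, 224, 224))  # shape (1, 1000): ImageNet class scores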


Residual Networks

Lecture 8 - 78

He et al, “Deep Residual Learning for Image Recognition”, CVPR 2016. Error rates are 224x224 single-crop testing, reported by torchvision.

[Architecture diagram and table repeated from the previous slide]

Deeper ResNet-101 and ResNet-152 models are more accurate, but also more computationally heavy.


Residual Networks

Lecture 8 - 79

He et al, “Deep Residual Learning for Image Recognition”, CVPR 2016

- Able to train very deep networks
- Deeper networks do better than shallow networks (as expected)
- Swept 1st place in all ILSVRC and COCO 2015 competitions
- Still widely used today!


Improving Residual Networks: Block Design

Lecture 8 - 80

Original ResNet block: Conv -> BatchNorm -> ReLU -> Conv -> BatchNorm -> add shortcut -> ReLU

“Pre-Activation” ResNet block: BatchNorm -> ReLU -> Conv -> BatchNorm -> ReLU -> Conv -> add shortcut

He et al, “Identity Mappings in Deep Residual Networks”, ECCV 2016

Note the ReLU after the residual add: the original block cannot actually learn the identity function, since its outputs are nonnegative!

Note the ReLU inside the residual branch: the pre-activation block can learn a true identity function by setting the Conv weights to zero!
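
A sketch of the pre-activation block, where zeroing the conv weights makes the whole block an exact identity (illustrative implementation):

    import torch.nn as nn

    class PreActBlock(nn.Module):
        # BatchNorm and ReLU come *before* each conv, and there is no ReLU
        # after the add, so the block can learn a true identity mapping.
        def __init__(self, C):
            super().__init__()
            self.branch = nn.Sequential(
                nn.BatchNorm2d(C), nn.ReLU(inplace=True),
                nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False),
                nn.BatchNorm2d(C), nn.ReLU(inplace=True),
                nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False),
            )
        def forward(self, x):
            return x + self.branch(x)  # no ReLU after the residual add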


Improving Residual Networks: Block Design

Lecture 8 - 81

Original ResNet block: Conv -> BatchNorm -> ReLU -> Conv -> BatchNorm -> add shortcut -> ReLU

“Pre-Activation” ResNet block: BatchNorm -> ReLU -> Conv -> BatchNorm -> ReLU -> Conv -> add shortcut

He et al, “Identity Mappings in Deep Residual Networks”, ECCV 2016

Slight improvement in accuracy (ImageNet top-1 error):
ResNet-152: 21.3 vs 21.1
ResNet-200: 21.8 vs 20.7

Not actually used that much in practice.


Comparing Complexity

Lecture 8 - 82

Canziani et al, “An Analysis of Deep Neural Network Models for Practical Applications”, 2017

[Figure: top-1 accuracy vs. number of operations for many architectures; circle size indicates parameter count]


Comparing Complexity

Lecture 8 - 83

Canziani et al, “An Analysis of Deep Neural Network Models for Practical Applications”, 2017

Inception-v4: ResNet + Inception!


Comparing Complexity

Lecture 8 - 84

Canziani et al, “An Analysis of Deep Neural Network Models for Practical Applications”, 2017

VGG: highest memory, most operations


Comparing Complexity

Lecture 8 - 85

Canziani et al, “An Analysis of Deep Neural Network Models for Practical Applications”, 2017

GoogLeNet: very efficient!


Comparing Complexity

Lecture 8 - 86

Canziani et al, “An Analysis of Deep Neural Network Models for Practical Applications”, 2017

AlexNet: low compute, lots of parameters


Comparing Complexity

Lecture 8 - 87

Canziani et al, “An Analysis of Deep Neural Network Models for Practical Applications”, 2017

ResNet: simple design, moderate efficiency, high accuracy


ImageNet Classification Challenge

Lecture 8 - 88

[Bar chart: ImageNet classification error rate by year and entry]
2010: Lin et al (shallow), 28.2
2011: Sanchez & Perronnin (shallow), 25.8
2012: Krizhevsky et al, AlexNet (8 layers), 16.4
2013: Zeiler & Fergus (8 layers), 11.7
2014: Simonyan & Zisserman, VGG (19 layers), 7.3
2014: Szegedy et al, GoogLeNet (22 layers), 6.7
2015: He et al, ResNet (152 layers), 3.6
2016: Shao et al (152 layers), 3.0
2017: Hu et al, SENet (152 layers), 2.3
Human (Russakovsky et al): 5.1


ImageNet Classification Challenge

Lecture 8 - 89

[Bar chart repeated from the previous slide]


ImageNet 2016 winner: Model Ensembles

Lecture 8 - 90

Multi-scale ensemble of Inception, Inception-ResNet, ResNet, and Wide ResNet models

Shao et al, 2016


Improving ResNets

Lecture 8 - 91

“Bottleneck” residual block:
Conv(1x1, 4C->C)   FLOPs: 4HWC^2
Conv(3x3, C->C)    FLOPs: 9HWC^2
Conv(1x1, C->4C)   FLOPs: 4HWC^2
Total FLOPs: 17HWC^2


Improving ResNets: ResNeXt

Lecture 8 - 92

“Bottleneck” residual block:
Conv(1x1, 4C->C)   FLOPs: 4HWC^2
Conv(3x3, C->C)    FLOPs: 9HWC^2
Conv(1x1, C->4C)   FLOPs: 4HWC^2
Total FLOPs: 17HWC^2

ResNeXt block: G parallel pathways, each its own bottleneck with inner width c:
Conv(1x1, 4C->c)
Conv(3x3, c->c)
Conv(1x1, c->4C)
(the G pathway outputs are summed together with the shortcut)

Xie et al, “Aggregated Residual Transformations for Deep Neural Networks”, CVPR 2017


Improving ResNets: ResNeXt

Lecture 8 - 93

“Bottleneck” residual block:
Conv(1x1, 4C->C)   FLOPs: 4HWC^2
Conv(3x3, C->C)    FLOPs: 9HWC^2
Conv(1x1, C->4C)   FLOPs: 4HWC^2
Total FLOPs: 17HWC^2

ResNeXt block: G parallel pathways, each:
Conv(1x1, 4C->c)   FLOPs: 4HWCc
Conv(3x3, c->c)    FLOPs: 9HWc^2
Conv(1x1, c->4C)   FLOPs: 4HWCc
Total FLOPs: (8Cc + 9c^2)HWG

Xie et al, “Aggregated Residual Transformations for Deep Neural Networks”, CVPR 2017


Improving ResNets: ResNeXt

Lecture 8 - 94

“Bottleneck” residual block:
Conv(1x1, 4C->C)   FLOPs: 4HWC^2
Conv(3x3, C->C)    FLOPs: 9HWC^2
Conv(1x1, C->4C)   FLOPs: 4HWC^2
Total FLOPs: 17HWC^2

ResNeXt block: G parallel pathways, each:
Conv(1x1, 4C->c)   FLOPs: 4HWCc
Conv(3x3, c->c)    FLOPs: 9HWc^2
Conv(1x1, c->4C)   FLOPs: 4HWCc
Total FLOPs: (8Cc + 9c^2)HWG

Equal cost when 9Gc^2 + 8GCc - 17C^2 = 0.
Example: C=64, G=4, c=24; or C=64, G=32, c=4.

Xie et al, “Aggregated Residual Transformations for Deep Neural Networks”, CVPR 2017
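
A quick numeric check of the equal-cost condition (counting multiply-adds per spatial position and dropping the shared H*W factor; the listed (G, c) pairs are approximate integer solutions, so the costs match only roughly):

    def bottleneck_flops(C):
        return 17 * C * C                  # 4C^2 + 9C^2 + 4C^2

    def resnext_flops(C, G, c):
        return G * (8 * C * c + 9 * c * c)

    print(bottleneck_flops(64))            # 69632
    print(resnext_flops(64, G=4, c=24))    # 69888, roughly equal
    print(resnext_flops(64, G=32, c=4))    # 70144, roughly equal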


Grouped Convolution

Lecture 8 - 95

Convolution with groups=1: normal convolution

Input: Cin x H x W
Weight: Cout x Cin x K x K
Output: Cout x H' x W'
FLOPs: Cout * Cin * K^2 * H * W

All convolutional kernels touch all Cin channels of the input.


Grouped Convolution

Lecture 8 - 96

Convolution with groups=1: normal convolution
Input: Cin x H x W; Weight: Cout x Cin x K x K; Output: Cout x H' x W'
FLOPs: Cout * Cin * K^2 * H * W
All convolutional kernels touch all Cin channels of the input.

Convolution with groups=2: two parallel convolution layers that each work on half the channels
Split input: Cin x H x W -> Group 1: (Cin/2) x H x W and Group 2: (Cin/2) x H x W
Conv(KxK, Cin/2 -> Cout/2) applied to each group
Out 1: (Cout/2) x H' x W'; Out 2: (Cout/2) x H' x W'
Concat -> Output: Cout x H' x W'


Grouped Convolution

Lecture 8 - 97

Convolution with groups=1: normal convolution
Input: Cin x H x W; Weight: Cout x Cin x K x K; Output: Cout x H' x W'
FLOPs: Cout * Cin * K^2 * H * W
All convolutional kernels touch all Cin channels of the input.

Convolution with groups=G: G parallel conv layers; each “sees” Cin/G input channels and produces Cout/G output channels
Input: Cin x H x W, split to G x [(Cin/G) x H x W]
Weight: G x (Cout/G) x (Cin/G) x K x K (G parallel convolutions)
Output: G x [(Cout/G) x H' x W'], concatenated to Cout x H' x W'
FLOPs: Cout * Cin * K^2 * H * W / G


Grouped Convolution

Lecture 8 - 98

Convolution with groups=1: normal convolution
Input: Cin x H x W; Weight: Cout x Cin x K x K; Output: Cout x H' x W'
FLOPs: Cout * Cin * K^2 * H * W
All convolutional kernels touch all Cin channels of the input.

Convolution with groups=G: G parallel conv layers; each “sees” Cin/G input channels and produces Cout/G output channels
Input: Cin x H x W, split to G x [(Cin/G) x H x W]
Weight: G x (Cout/G) x (Cin/G) x K x K (G parallel convolutions)
Output: G x [(Cout/G) x H' x W'], concatenated to Cout x H' x W'
FLOPs: Cout * Cin * K^2 * H * W / G

Depthwise Convolution
Special case: G = Cin and Cout = n*Cin. Each input channel is convolved with n different KxK filters to produce n output channels.


Grouped Convolution in PyTorch

Lecture 8 - 99

PyTorch convolution gives an option for groups!
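
For example, the groups argument of nn.Conv2d covers all three cases above:

    import torch
    import torch.nn as nn

    x = torch.randn(1, 64, 56, 56)

    conv = nn.Conv2d(64, 128, kernel_size=3, padding=1, groups=1)    # normal
    gconv = nn.Conv2d(64, 128, kernel_size=3, padding=1, groups=8)   # grouped
    dwconv = nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=64)  # depthwise

    print(conv.weight.shape)    # (128, 64, 3, 3)
    print(gconv.weight.shape)   # (128, 8, 3, 3): 1/8 the params and FLOPs
    print(dwconv.weight.shape)  # (64, 1, 3, 3)
    print(gconv(x).shape)       # (1, 128, 56, 56), same shape as conv(x)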


Improving ResNets: ResNeXt

Lecture 8 - 100

ResNeXt block with G parallel pathways (each: Conv(1x1, 4C->c), Conv(3x3, c->c), Conv(1x1, c->4C); FLOPs 4HWCc + 9HWc^2 + 4HWCc per pathway; total (8Cc + 9c^2)HWG; equal cost when 9Gc^2 + 8GCc - 17C^2 = 0; example: C=64, G=4, c=24 or C=64, G=32, c=4)

Equivalent formulation with grouped convolution:
Conv(1x1, 4C->Gc)
Conv(3x3, Gc->Gc, groups=G)
Conv(1x1, Gc->4C)

Xie et al, “Aggregated Residual Transformations for Deep Neural Networks”, CVPR 2017
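
Putting it together, a sketch of a ResNeXt block using the grouped-convolution formulation (illustrative implementation; C, G, c as defined above):

    import torch.nn as nn

    class ResNeXtBlock(nn.Module):
        # Grouped-conv formulation of the G-pathway bottleneck block.
        def __init__(self, C, G, c):
            super().__init__()
            inner = G * c  # total inner width across all G pathways
            self.branch = nn.Sequential(
                nn.Conv2d(4 * C, inner, kernel_size=1, bias=False),
                nn.BatchNorm2d(inner), nn.ReLU(inplace=True),
                nn.Conv2d(inner, inner, kernel_size=3, padding=1,
                          groups=G, bias=False),
                nn.BatchNorm2d(inner), nn.ReLU(inplace=True),
                nn.Conv2d(inner, 4 * C, kernel_size=1, bias=False),
                nn.BatchNorm2d(4 * C),
            )
            self.relu = nn.ReLU(inplace=True)
        def forward(self, x):
            return self.relu(self.branch(x) + x)

    block = ResNeXtBlock(C=64, G=32, c=4)  # the "32 groups, width 4" configuration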


ResNeXt: Maintain computation by adding groups!

Lecture 8 - 101

Model        Groups  Group width  Top-1 error
ResNet-50    1       64           23.9
ResNeXt-50   2       40           23.0
ResNeXt-50   4       24           22.6
ResNeXt-50   8       14           22.3
ResNeXt-50   32      4            22.2

Model        Groups  Group width  Top-1 error
ResNet-101   1       64           22.0
ResNeXt-101  2       40           21.7
ResNeXt-101  4       24           21.4
ResNeXt-101  8       14           21.3
ResNeXt-101  32      4            21.2

Xie et al, “Aggregated Residual Transformations for Deep Neural Networks”, CVPR 2017

Adding groups improves performance with the same computational complexity!
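
torchvision also ships pretrained ResNeXt models (available since torchvision 0.4; the model name encodes the group count and group width):

    import torchvision

    # 32 groups, group width 4: the last row of the ResNeXt-50 table above
    model = torchvision.models.resnext50_32x4d(pretrained=True)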


ImageNet Classification Challenge

Lecture 8 - 102

[Bar chart of ImageNet error rates by year, repeated from slide 88]


Squeeze-and-Excitation Networks

Lecture 8 - 103

Hu et al, “Squeeze-and-Excitation Networks”, CVPR 2018

Adds a “squeeze-and-excite” branch to each residual block that performs global pooling and fully-connected layers, then multiplies the result back onto the feature map.

Adds global context to each residual block!

Won ILSVRC 2017 with ResNeXt-152-SE
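
A sketch of the squeeze-and-excite branch (illustrative implementation; the reduction ratio r=16 follows the paper's default):

    import torch.nn as nn

    class SEBlock(nn.Module):
        # Squeeze: global average pool to an N x C vector.
        # Excite: two FC layers produce per-channel weights in (0, 1),
        # which rescale the feature map, injecting global context.
        def __init__(self, C, r=16):
            super().__init__()
            self.fc = nn.Sequential(
                nn.Linear(C, C // r), nn.ReLU(inplace=True),
                nn.Linear(C // r, C), nn.Sigmoid(),
            )
        def forward(self, x):
            N, C, H, W = x.shape
            w = self.fc(x.mean(dim=(2, 3))).view(N, C, 1, 1)
            return x * w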


ImageNet Classification Challenge

Lecture 8 - 104

[Bar chart of ImageNet error rates by year, repeated from slide 88]

Completion of the challenge: the annual ImageNet competition was no longer held after 2017; it has since moved to Kaggle.


Densely Connected Neural Networks

Lecture 8 - 105

[Diagram: a Dense Block, where the input and each conv layer's output are concatenated and fed to every later layer; full network: Input -> Conv -> Dense Block 1 -> Conv -> Pool -> Dense Block 2 -> Conv -> Pool -> Dense Block 3 -> Pool -> FC -> Softmax]

Huang et al, “Densely Connected Convolutional Networks”, CVPR 2017

Dense blocks, where each layer is connected to every other layer in a feedforward fashion.

Alleviates vanishing gradients, strengthens feature propagation, encourages feature reuse.
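
A minimal sketch of one dense block (illustrative; assumes the BN-ReLU-Conv ordering and a fixed per-layer growth rate, as in the paper):

    import torch
    import torch.nn as nn

    class DenseBlock(nn.Module):
        # Each layer consumes the concatenation of the block input and all
        # previous layers' outputs, and contributes `growth` new channels.
        def __init__(self, C_in, growth, num_layers):
            super().__init__()
            self.layers = nn.ModuleList()
            C = C_in
            for _ in range(num_layers):
                self.layers.append(nn.Sequential(
                    nn.BatchNorm2d(C), nn.ReLU(inplace=True),
                    nn.Conv2d(C, growth, kernel_size=3, padding=1, bias=False),
                ))
                C += growth
        def forward(self, x):
            feats = [x]
            for layer in self.layers:
                feats.append(layer(torch.cat(feats, dim=1)))
            return torch.cat(feats, dim=1)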


MobileNets: Tiny Networks (For Mobile Devices)

Lecture 8 - 106

Standard convolution block. Total cost: 9C^2HW
Conv(3x3, C->C)   [9C^2HW]
BatchNorm
ReLU

Depthwise Separable Convolution. Total cost: (9C + C^2)HW
Conv(3x3, C->C, groups=C)   [9CHW]   “depthwise convolution”
BatchNorm
ReLU
Conv(1x1, C->C)   [C^2HW]   “pointwise convolution”
BatchNorm
ReLU

Howard et al, “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications”, 2017

Speedup = 9C^2 / (9C + C^2) = 9C / (9 + C) -> 9 (as C -> infinity)
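
As a sketch, the depthwise separable block is just two stock PyTorch layers (the helper name is illustrative):

    import torch.nn as nn

    def depthwise_separable(C):
        # Depthwise 3x3 (groups=C): 9*C*H*W FLOPs.
        # Pointwise 1x1: C^2*H*W FLOPs.
        return nn.Sequential(
            nn.Conv2d(C, C, kernel_size=3, padding=1, groups=C, bias=False),
            nn.BatchNorm2d(C), nn.ReLU(inplace=True),
            nn.Conv2d(C, C, kernel_size=1, bias=False),
            nn.BatchNorm2d(C), nn.ReLU(inplace=True),
        )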


MobileNets: Tiny Networks (For Mobile Devices)

Lecture 8 - 107

Depthwise Separable Convolution. Total cost: (9C + C^2)HW
Conv(3x3, C->C, groups=C)   [9CHW]   “depthwise convolution”
BatchNorm
ReLU
Conv(1x1, C->C)   [C^2HW]   “pointwise convolution”
BatchNorm
ReLU

Howard et al, “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications”, 2017

Also related:
ShuffleNet: Zhang et al, CVPR 2018
MobileNetV2: Sandler et al, CVPR 2018
ShuffleNetV2: Ma et al, ECCV 2018


Neural Architecture Search

Lecture 8 - 108

Zoph and Le, “Neural Architecture Search with Reinforcement Learning”, ICLR 2017

Designing neural network architectures is hard; let's automate it!

- One network (the controller) outputs network architectures
- Sample child networks from the controller and train them
- After training a batch of child networks, make a gradient step on the controller network (using policy gradient)
- Over time, the controller learns to output good architectures!


Neural Architecture Search

Lecture 8 - 109

Zoph and Le, “Neural Architecture Search with Reinforcement Learning”, ICLR 2017

Designing neural network architectures is hard; let's automate it!

- One network (the controller) outputs network architectures
- Sample child networks from the controller and train them
- After training a batch of child networks, make a gradient step on the controller network (using policy gradient)
- Over time, the controller learns to output good architectures!
- VERY EXPENSIVE!! Each gradient step on the controller requires training a batch of child models!
- The original paper trained on 800 GPUs for 28 days!
- Follow-up work has focused on efficient search


Neural Architecture Search

Lecture 8 - 110

Zoph et al, “Learning Transferable Architectures for Scalable Image Recognition”, CVPR 2018

Neural architecture search can be used to find efficient CNN architectures!


CNN Architectures Summary

Lecture 8 - 111

Early work (AlexNet -> ZFNet -> VGG) shows that bigger networks work better.

GoogLeNet was one of the first to focus on efficiency (aggressive stem, 1x1 bottleneck convolutions, global average pooling instead of FC layers).

ResNet showed us how to train extremely deep networks, limited only by GPU memory! Started to show diminishing returns as networks got bigger.

After ResNet: efficient networks became central: how can we improve accuracy without increasing complexity?

Lots of tiny networks aimed at mobile devices: MobileNet, ShuffleNet, etc.

Neural Architecture Search promises to automate architecture design.


Which Architecture Should I Use?

Lecture 8 - 112

Don't be a hero. For most problems you should use an off-the-shelf architecture; don't try to design your own!

If you just care about accuracy, ResNet-50 or ResNet-101 are great choices.

If you want an efficient network (real-time, runs on mobile, etc.), try MobileNets and ShuffleNets.


Next Time: Deep Learning Hardware and Software

Lecture 8 - 113