CAP 6412 Advanced Computer Vision
http://www.cs.ucf.edu/~bgong/CAP6412.html
Boqing Gong, Jan 19, 2016

Today
• Administrivia
• Neural networks & backpropagation (Part II)
• Deep residual learning, by Dustin

Assignment 1 due at 3pm, 01/21 (Thursday)
• Review the following paper
[Visualization] Zeiler, Matthew D., and Rob Fergus. “Visualizing and understanding convolutional networks.” In Computer Vision – ECCV 2014, pp. 818-833. Springer International Publishing, 2014.
Template for paper review: http://www.cs.ucf.edu/~bgong/CAP6412/Review.docx

Use “late homework policy” wisely
- Three late days in total for all reports and projects
- Counting at the granularity of 12 hours
- No additional late days
• Some are late for the one-point assignment “Topic Preference List”
• To lose 1 point? (Default)
• OR, to earn 1 point and to trigger the late homework policy? (Send me email)

Email: the best way to reach me
• [email protected] (preferred)
• DO NOT leave messages under my announcements
• Put [CAP6412] in subject line
• Summarize message in subject line
• Ex: [CAP6412] Meeting request: Thursday (Jan 14) 4:30pm?

Office hours of this week
• Tuesday: 4:30-5:30pm → Thursday: 4:30-5:30pm
• HEC 214

This week: CNN visualization & object recognition
Tuesday (01/19)
Dustin Morley
[ILSVRC] Russakovsky, Olga, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang et al. “Imagenet large scale visual recognition challenge.” International Journal of Computer Vision (2014): 1-42.
[152 layers] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Deep Residual Learning for Image Recognition.” arXiv preprint arXiv:1512.03385 (2015).
Thursday (01/21)
Jason Tiller
[Visualization] Zeiler, Matthew D., and Rob Fergus. “Visualizing and understanding convolutional networks.” In Computer Vision – ECCV 2014, pp. 818-833. Springer International Publishing, 2014.
Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. “Object detectors emerge in deep scene cnns.” arXiv preprint arXiv:1412.6856 (2014).

Next week: CNN & object localization
Tuesday (01/26)
Samer Iskander
J. Hosang, R. Benenson, and B. Schiele. How good are detection proposals, really? BMVC 2014.
{Major} J. Hosang, R. Benenson, P. Dollár, and B. Schiele. What makes for effective detection proposals? PAMI 2015.
{Major} [Faster R-CNN] Ren, Shaoqing, Kaiming He, Ross Girshick, and Jian Sun. “Faster R-CNN: Towards real-time object detection with region proposal networks.” In Advances in Neural Information Processing Systems, pp. 91-99. 2015.
Thursday (01/28)
Syed Ahmed
{Major} [R-CNN] Girshick, Ross, Jeff Donahue, Trevor Darrell, and Jitendra Malik. “Rich feature hierarchies for accurate object detection and semantic segmentation.” In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pp. 580-587. IEEE, 2014.
[Fast R-CNN] Girshick, Ross. “Fast R-CNN.” arXiv preprint arXiv:1504.08083 (2015).

Link has been sent to your UCF emails

Today
• Administrivia
• Neural networks & backpropagation (Part II)
• Fundamentals of Convolutional Neural Networks (CNN), by Fareeha

Review: biological neurons
• Human brains have about 10 billion neurons
• Each connected to 10K other neurons
• A neuron fires if the sum of electrochemical inputs exceeds some threshold
Image credit: cs.stanford.edu/people/eroberts

Review: artificial neurons/perceptrons
• A neuron fires if the sum of weighted inputs exceeds some threshold
Image credit: www.hiit.fi/u/ahonkela/dippa/node41.html
$$y = \varphi\left(\sum_{i=1}^{n} w_i x_i + b\right) = \varphi(w^T x + b)$$
$\varphi(\cdot)$: activation function
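
As a minimal sketch of the formula above, the snippet below evaluates a single neuron in plain Python. The function names (`neuron`, `step`) and the example weights, which happen to implement a logical AND, are illustrative assumptions and do not come from the slides.

```python
def neuron(x, w, b, phi):
    """Compute y = phi(w^T x + b) for one artificial neuron."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return phi(s)


def step(s):
    # Binary step activation: the neuron "fires" (1) when the sum exceeds 0.
    return 1.0 if s > 0 else 0.0


# Hypothetical weights and bias chosen so the neuron computes AND on {0, 1} inputs.
w, b = [1.0, 1.0], -1.5
outputs = [neuron([a, c], w, b, step) for a in (0, 1) for c in (0, 1)]
print(outputs)
```

Only the input (1, 1) pushes the weighted sum above the threshold, so only that input makes the neuron fire.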

Constructing neural networks from neurons

Basic network structures
• Feed-forward networks
• Recurrent neural networks
Image credit: http://mesin-belajar.blogspot.com/2016/01/a-brief-history-of-neural-nets-and-deep_84.html

Imposing desired properties
• To tune it towards desired properties
• E.g., for binary classification
• Output between 0 and 1
• Tells the probability of the input x belonging to either class +1/-1
Image credit: Farid E Ahmed

A case study
• Binary classification
• Output between 0 and 1
• Tells the probability of the input x belonging to either class +1/-1
• Step 1: choose network structure
• Step 2: choose activation function
• Step 3: determine the model parameters 𝚯, to meet desired properties
Image credit: Farid E Ahmed
[Figure: four activation functions plotted for x from -10 to 10: Binary step, Logistic, TanH, Rectified Linear Unit (ReLU)]
$$\varphi(x) = \frac{1}{1 + \exp(-x)}$$
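
The four activation functions named on this slide can be sketched in a few lines of Python. The function names are illustrative, and the value of the binary step at exactly 0 is a convention assumed here, not something the slide states.

```python
import math


def binary_step(x):
    # Assumed convention: output 1 at x == 0.
    return 1.0 if x >= 0 else 0.0


def logistic(x):
    # phi(x) = 1 / (1 + exp(-x)); output lies strictly between 0 and 1.
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    # Output lies strictly between -1 and 1.
    return math.tanh(x)


def relu(x):
    # Rectified Linear Unit: zero for negative inputs, identity otherwise.
    return max(0.0, x)


for x in (-10, -5, 0, 5, 10):
    print(x, binary_step(x), round(logistic(x), 4), round(tanh(x), 4), relu(x))
```

For the binary classification case study, the logistic function is the natural fit because its output can be read as a probability.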

Learning the model parameters 𝚯 (1)
• Is equivalent to
Choose one hypothesis $h \in \mathcal{H}$ to approximate concept $c$
• where,
Binary classification concept: $c : \mathcal{X} \mapsto \mathcal{Y} = \{0, 1\}$
Hypotheses: $\mathcal{H} = \{\mathrm{net}(\Theta) \mid \Theta \in \mathbb{R}^d\}$
• Questions:
$c$ is unknown
$c \in \mathcal{H}$?
Empirical Risk Minimization (ERM)

Learning the model parameters 𝚯 (2)
• Is equivalent to
Choose one hypothesis $h \in \mathcal{H}$ to approximate concept $c$
• Can be implemented by
$$\Theta^\star \leftarrow \arg\min_{\Theta} R(\Theta)$$
$$R(\Theta) = \Pr(\mathrm{net}(x;\Theta) \neq y) = \mathbb{E}_{(x,y)\sim P_{XY}}[\mathrm{net}(x;\Theta) \neq y]$$
← Called the generalization risk
$P_{XY}$ is the underlying distribution of $(x, y)$
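
Since $P_{XY}$ is unknown, the generalization risk cannot be computed directly. A common stand-in is the empirical risk, the average 0-1 loss over a finite sample. The sketch below illustrates that idea on a toy problem; the concept `toy_concept`, the one-parameter model `toy_net`, and the grid search are all hypothetical choices made for illustration, not the procedure from the lecture.

```python
import random


def empirical_risk(predict, samples):
    """Fraction of (x, y) pairs with predict(x) != y, i.e. the average 0-1 loss."""
    return sum(predict(x) != y for x, y in samples) / len(samples)


def toy_concept(x):
    # Hypothetical ground-truth concept c: label 1 iff x > 0.3.
    return 1 if x > 0.3 else 0


def toy_net(x, theta):
    # Hypothetical one-parameter "network": thresholds the input at theta.
    return 1 if x > theta else 0


rng = random.Random(0)
samples = [(x, toy_concept(x)) for x in (rng.random() for _ in range(1000))]

# Choose the parameter with the lowest empirical risk from a small grid.
grid = [i / 10 for i in range(11)]
theta_star = min(grid, key=lambda t: empirical_risk(lambda x: toy_net(x, t), samples))
print(theta_star, empirical_risk(lambda x: toy_net(x, theta_star), samples))
```

Here the concept lies inside the hypothesis class, so a zero-risk parameter exists; in general that is exactly the open question "c ∈ H?" raised on the previous slide.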

Next class
$$\Theta^\star \leftarrow \arg\min_{\Theta} \mathbb{E}_{(x,y)\sim P_{XY}}[\mathrm{net}(x;\Theta) \neq y]$$

Today
• Administrivia
• Neural networks & backpropagation (Part I)
• Deep residual learning, by Dustin

Deep Residual Learning for Image Recognition
Authors: Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun (Microsoft Research)
Presented by Dustin Morley

About the paper
• NOT peer-reviewed – published on arXiv (Dec. 2015)
• Well supported claim: “We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.”
• Questionable claim: “…residual nets with a depth of up to 152 layers – 8x deeper than VGG nets but still having lower complexity”
  - Claim of lower complexity is not convincingly supported
• My rating: 2
  - Great innovation with high significance, but claims and experimental data presentation are not organized that well.

Main Contributions
• Proposed a novel approach to resolve the issue of performance degradation with increased depth
• Obtained excellent object recognition and localization results
  - Ensemble network on ImageNet dataset – 3.57% top-5 classification error
  - 101-layer ResNet on COCO validation set (object detection) – 27.2% mAP@[0.5, 0.95]
• Won 1st place in several tracks in ILSVRC and COCO 2015 competitions
  - ImageNet detection
  - ImageNet localization
  - COCO detection
  - COCO segmentation

Outline
• Background – theoretical and experimental
• Problem – NN scalability with added layers
• Solution – Residual Learning via Identity Mapping “Shortcuts”
• Experimental Results
• Conclusion, evaluation, and future directions

Background
• Convolutional Neural Networks
  - Layers – Conv., pool, Conv., pool, Conv., pool…
• Conv./pool results propagated forward
• Classification error propagated backward
  - Each layer computes error derivatives with respect to its parameters
Image credit: Oxford Visual Geometry Group

Background – ImageNet 2012
• Dataset for image classification
• 1000 classes
• 1.28 million training images
• 50k validation images
• 100k test images (final results)
• Top-1 and top-5 error rates
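
Top-k error counts an example as wrong only when its true label is missing from the k highest-scoring classes. Below is a minimal sketch on a made-up three-class example, so it uses k = 2 rather than k = 5; the scores, labels, and the function name `top_k_error` are illustrative assumptions.

```python
def top_k_error(scores, labels, k):
    """Fraction of examples whose true label is not among the k highest scores."""
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)


# Hypothetical class scores for four images over three classes.
scores = [
    [0.10, 0.60, 0.30],
    [0.50, 0.20, 0.30],
    [0.20, 0.30, 0.50],
    [0.40, 0.35, 0.25],
]
labels = [1, 2, 2, 1]

top1 = top_k_error(scores, labels, 1)
top2 = top_k_error(scores, labels, 2)
print(top1, top2)
```

Two of the four top predictions are wrong, but in both cases the true label is the runner-up, so the error drops to zero once the second guess is allowed.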

Background – CIFAR-10 Testing
• Dataset for image classification
• Images are small (32x32, color)
• 10 classes
• 50k training images
• 10k test images (final result)

Background – MS COCO Testing
• Dataset for object detection
• 80 object categories
• 80k training images
• 40k test images
• Detailed manual segmentations of images
• Evaluation metrics revolve around mean average precision (mAP) and intersection over union (IoU)
  - Evaluate detections at IoU thresholds 0.5, 0.55, …, 0.95
  - Compute average precision at each threshold
  - Compute the mean of the average precisions over all thresholds
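
Intersection over union measures how well a predicted box overlaps a ground-truth box. The sketch below computes it for axis-aligned boxes and lists the IoU levels from 0.5 to 0.95 in steps of 0.05 at which a detection would count as a match. The `(x1, y1, x2, y2)` box format and the function name `iou` are assumptions for illustration; this is not the official COCO evaluation code.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


# IoU levels 0.50, 0.55, ..., 0.95.
thresholds = [0.5 + 0.05 * i for i in range(10)]

predicted = (0.0, 0.0, 10.0, 10.0)
ground_truth = (1.0, 0.0, 11.0, 10.0)
overlap = iou(predicted, ground_truth)

# Number of IoU levels at which this detection would still count as a match.
matched_levels = sum(overlap >= t for t in thresholds)
print(round(overlap, 4), matched_levels)
```

A box shifted by one tenth of its width already drops to an IoU of about 0.82, which shows how demanding the stricter levels are.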

Background – PASCAL VOC Testing
• Dataset for object detection
• 16k training images from VOC 2012
• First test set – 5k test images from VOC 2007
• Second test set – 10k test images from VOC 2007
• Evaluation metric – similar to MS COCO (but not exactly the same)

Problem – about adding layers…
• Theory – overfitting should be the only issue
• Practice – multiple issues
  - Convergence (mostly solved by normalization layers)
  - Accuracy degradation (training accuracy degrades!)

Too many layers?
• Theory – more layers should never harm training performance
  - Take the solution for m layers. Add more layers configured such that they only perform the identity operation – same performance.
  - Thus, an equivalent or better solution always exists when more layers are added
• Implication – optimization methods cannot handle too many layers
• Need a reformulation of extra layers that makes optimization easier

Solution: Residual Network
• Conjecture: difficult for optimization to deduce “unneeded layers”
  - Equivalently: difficult to determine that a layer should be an identity mapping
• Recast initial condition so that result under identity mapping is visible
• Use “shortcuts” to go “around” layers in addition to going “through” them
• Mathematically: the block outputs F(x) + x instead of just F(x), so the stacked layers only need to fit the residual F(x) = H(x) − x of the desired mapping H(x)
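
To make the shortcut idea concrete, here is a small sketch of a two-layer block with and without the identity shortcut, using plain Python lists as vectors. Biases, convolutions, and batch normalization are omitted, and all names and weights are illustrative assumptions rather than the authors' implementation.

```python
def relu(v):
    return [max(0.0, a) for a in v]


def linear(W, v):
    # Matrix-vector product with W given as a list of rows.
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def plain_block(x, W1, W2):
    """Two stacked layers without a shortcut: relu(W2 * relu(W1 * x))."""
    return relu(linear(W2, relu(linear(W1, x))))


def residual_block(x, W1, W2):
    """Same layers plus an identity shortcut: relu(F(x) + x)."""
    f = linear(W2, relu(linear(W1, x)))
    return relu([fi + xi for fi, xi in zip(f, x)])


x = [1.0, 2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]

# With all weights at zero, F(x) = 0: the residual block passes x through,
# while the plain block wipes the signal out.
print(residual_block(x, zeros, zeros), plain_block(x, zeros, zeros))
```

With the shortcut, "do nothing" corresponds to driving the weights toward zero, which is an easy target for the optimizer. Without it, the layers would have to learn an exact identity mapping.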

Residual Network Comparison

Implementation Details - ImageNet
• Image resized with 1 randomized dimension for scale augmentation
• Fixed size crop randomly sampled from an image or its horizontal flip
• Per-pixel mean subtracted
• Color augmentation
  - According to: A. Krizhevsky et al., Imagenet classification with deep convolutional neural networks, NIPS, 2012
• Batch normalization right after each convolution, before activation
• Learning rate divided by 10 when error plateaus
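
The last bullet, dividing the learning rate by 10 when the error plateaus, can be sketched as a simple schedule. The starting rate, the `patience` of two epochs, and the `min_delta` tolerance below are assumptions chosen for illustration; the slide does not specify how a plateau is detected.

```python
def step_lr_on_plateau(errors, lr=0.1, patience=2, min_delta=1e-3):
    """Return the learning rate used at each epoch, dividing by 10 on plateaus."""
    best = float("inf")
    stale = 0  # consecutive epochs without a meaningful improvement
    schedule = []
    for err in errors:
        schedule.append(lr)
        if err < best - min_delta:
            best, stale = err, 0
        else:
            stale += 1
            if stale >= patience:
                lr /= 10.0
                stale = 0
    return schedule


# Hypothetical validation errors per epoch.
errors = [0.9, 0.7, 0.6, 0.6, 0.6, 0.5, 0.5, 0.5]
schedule = step_lr_on_plateau(errors)
print(schedule)
```

In this toy run the error stalls at 0.6 for two epochs, so the rate drops from 0.1 to 0.01 starting with the sixth epoch.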

ImageNet Results – Direct Comparison
• ImageNet
• “Plain” network top-1 error: 27.94% for 18 layers, 28.54% for 34 layers
• Residual network top-1 error: 27.88% for 18 layers, 25.03% for 34 layers

ImageNet Results – High Scalability
Configurations A, B, and C differ in how the “shortcuts” handle changes in dimensionality (A = zero padding, B = projection applied for increasing dimensions only, C = projection always applied)
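
The difference between a zero-padding shortcut and a projection shortcut can be sketched on plain vectors. In the paper these operate on feature-map channels; the vector form, the function names, and the example matrix here are simplifying assumptions.

```python
def shortcut_zero_pad(x, out_dim):
    """Option A: keep x as-is and append zeros; adds no parameters."""
    return list(x) + [0.0] * (out_dim - len(x))


def shortcut_projection(x, W):
    """Options B and C: linear projection with a learned matrix W (out_dim x in_dim)."""
    return [sum(w * a for w, a in zip(row, x)) for row in W]


x = [1.0, 2.0]

# Hypothetical 3x2 projection matrix; in practice its entries are learned.
W = [[1.0, 0.0],
     [0.0, 1.0],
     [0.5, 0.5]]

print(shortcut_zero_pad(x, 3))
print(shortcut_projection(x, W))
```

Zero padding is parameter-free, while the projection adds `out_dim * in_dim` weights per shortcut, which is the trade-off separating the three configurations.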

ImageNet Results - Ensemble
• ResNet ensemble built from 6 models of different depth (only 2 of the models are depth 152)

CIFAR-10 Results

Object Detection Results
• PASCAL VOC
• MS COCO

Conclusion
• Blindly increasing depth of CNNs can lead (counterintuitively) to decreases in performance rather than increases
• The residual “shortcut” approach allows benefit to be gained from increasing depth of CNNs
• Networks built by the authors on this principle performed very well, winning 1st place in several competitions

Evaluation
Strengths
• Novel idea
• Solves interesting and important problem
• Approach should be able to be “dropped in” to virtually any CNN design
• Authors obtained very good performance
Weaknesses
• Questionable statements about complexity
• Some parts in the presentation of results are confusing
• Certain direct comparisons of results didn’t seem particularly meaningful

Future Directions
• Are there other types of shortcuts in addition to the identity mapping shortcut that could further improve performance?
  - Could these be inferred by studying the nonlinear mappings output by successful small-depth networks?
• Insert the identity mapping shortcuts into other neural networks.
  - Results section pitted ResNet “against” VGG and other networks. This seems like a tyranny of either/or.