YOLO: Unified, Real-Time Object Detection · Ren, Shaoqing, et al. "Faster R-CNN: towards real-time...
Transcript of YOLO: Unified, Real-Time Object Detection · Ren, Shaoqing, et al. "Faster R-CNN: towards real-time...
Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 1 / 25
YOLO: Unified, Real-Time Object DetectionJoseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi
Anup Deshmukh, Pratheeksha Nair
March 28, 2018
Outline
1 Discussion leading towards YOLO!
2 What is YOLO?IntroductionUnified Object DetectionThe design of cost function
3 Experimental analysisComparisons
4 Limitations of YOLO
5 Summary
Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 2 / 25
The Infographic!
Powered by TCPDF (www.tcpdf.org)
Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 3 / 25
Little bit of History
• Object Detection using Hog Features (2005)
• Region-based Convolutional Neural Networks(R-CNN) (Oct2014)
• Fast(R-CNN) (Sept 2015)
• Faster(R-CNN) (Jan 2016)
Powered by TCPDF (www.tcpdf.org)
Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 4 / 25
Next Subsection
1 Discussion leading towards YOLO!
2 What is YOLO?IntroductionUnified Object DetectionThe design of cost function
3 Experimental analysisComparisons
4 Limitations of YOLO
5 Summary
Powered by TCPDF (www.tcpdf.org)
Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 5 / 25
Introduction
• Modelling the Regression problem
• Using bounding boxes with associated class prob.
• Single DNN - end to end optimization
• Full image as input (Understanding of context)
• Extremely fast (45fps)
Powered by TCPDF (www.tcpdf.org)
Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 6 / 25
Next Subsection
1 Discussion leading towards YOLO!
2 What is YOLO?IntroductionUnified Object DetectionThe design of cost function
3 Experimental analysisComparisons
4 Limitations of YOLO
5 Summary
Powered by TCPDF (www.tcpdf.org)
Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 7 / 25
Unified Object Detection
• YOLO sees entire image during both training and testing
• Single regression problem (image pixels to bounding boxes andclasses)
Powered by TCPDF (www.tcpdf.org)
Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 8 / 25
Unified Object Detection
• YOLO sees entire image during both training and testing
• Single regression problem (image pixels to bounding boxes andclasses)
Powered by TCPDF (www.tcpdf.org)
Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 8 / 25
Unified Object Detection - Workflow
• YOLO divides input image into SxS grid
• If the center of an object falls inside a grid cell, that grid cell isresponsible for detecting the object
Powered by TCPDF (www.tcpdf.org)
Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 9 / 25
Unified Object Detection - Workflow
• Each grid cell predicts B bounding boxes and confidence scoresfor these boxes
• Confidence of box being accurate and box containing an object
Confidence = Pr(object) * IOUtruthpred
• Each bounding box is represented as [x , y , w , h, conf ]
Powered by TCPDF (www.tcpdf.org)
Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 10 / 25
Unified Object Detection - Workflow
• Each grid cell predicts B bounding boxes and confidence scoresfor these boxes
• Confidence of box being accurate and box containing an object
Confidence = Pr(object) * IOUtruthpred
• Each bounding box is represented as [x , y , w , h, conf ]
Powered by TCPDF (www.tcpdf.org)
Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 10 / 25
Unified Object Detection - Workflow
• Each grid cell predicts B bounding boxes and confidence scoresfor these boxes
• Confidence of box being accurate and box containing an object
Confidence = Pr(object) * IOUtruthpred
• Each bounding box is represented as [x , y , w , h, conf ]
Powered by TCPDF (www.tcpdf.org)
Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 10 / 25
Unified Object Detection - Workflow
• Each grid cell calculates Ci = Pr(Classi |Object) for all classes ofobjects
• In each grid cell, one box is chosen and Ci is calculated
• This is encoded as a tensor of size [S x S x (B*5 + C)]
Powered by TCPDF (www.tcpdf.org)
Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 11 / 25
Unified Object Detection - Workflow
• Each grid cell calculates Ci = Pr(Classi |Object) for all classes ofobjects
• In each grid cell, one box is chosen and Ci is calculated
• This is encoded as a tensor of size [S x S x (B*5 + C)]
Powered by TCPDF (www.tcpdf.org)
Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 11 / 25
Unified Object Detection - Testing• The conditional class probabilities and confidence of predicted
bounding boxes are multiplied
Pr(Classi |Object) ∗ Pr(Object) ∗ IOUtruthpred = Pr(Classi ) ∗ IOUtruth
pred
• For each predicted box, class specific confidence scores• Confidence of predicted box fitting the object and object
belonging to that class
Powered by TCPDF (www.tcpdf.org)
Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 12 / 25
Quick overview of YOLO
Powered by TCPDF (www.tcpdf.org)
Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 13 / 25
Next Subsection
1 Discussion leading towards YOLO!
2 What is YOLO?IntroductionUnified Object DetectionThe design of cost function
3 Experimental analysisComparisons
4 Limitations of YOLO
5 Summary
Powered by TCPDF (www.tcpdf.org)
Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 14 / 25
The design
• The authors have used sum-squared error because it is easy tooptimize.
• Sum-squared loss should not equally weight errors with objectequally with error when there is no object.
• Sum-squared loss also should not equally weight errors in largeboxes and small boxes.
Powered by TCPDF (www.tcpdf.org)
Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 15 / 25
Cost function
Term 1
Term 2
Powered by TCPDF (www.tcpdf.org)
Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 16 / 25
Cost function
Term 1
Term 2
Powered by TCPDF (www.tcpdf.org)
Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 16 / 25
Next Subsection
1 Discussion leading towards YOLO!
2 What is YOLO?IntroductionUnified Object DetectionThe design of cost function
3 Experimental analysisComparisons
4 Limitations of YOLO
5 Summary
Powered by TCPDF (www.tcpdf.org)
Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 17 / 25
DPM - The only other model that runs in realtime
Table:
DPM YOLO
processes 30fps processes 45fpsmAP is 26.1% mAP is 63.4%
Powered by TCPDF (www.tcpdf.org)
Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 18 / 25
R-CNN’s
Table:
RCNN YOLO
processes 5-18fps processes 45fps3x times better at localizing lags in localizing accuracyfar more background errors analyzing image as a whole
Powered by TCPDF (www.tcpdf.org)
Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 19 / 25
The Generalization
• PICASSO and People-Art Dataset
• YOLO generalizes better to new domains better than others
• RCNN drops considerably (Selective search is tuned only fornatural images)
• YOLO has good performance (models shape, size andrelationship between objects and their locations)
Powered by TCPDF (www.tcpdf.org)
Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 20 / 25
Limitations
• Errors are treated same in small bounding boxes versus largebounding boxes.
• Small objects that appear in groups
Powered by TCPDF (www.tcpdf.org)
Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 21 / 25
Summary
YOLO is modelled as a super-fast regression problem which takes aninput image and learns the class probabilities and bounding box
coordinates.
Powered by TCPDF (www.tcpdf.org)
Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 22 / 25
References
• Girshick, Ross, et al. ”Rich feature hierarchies for accurate objectdetection and semantic segmentation.” Proceedings of the IEEEconference on computer vision and pattern recognition. 2014.
• Girshick, Ross. ”Fast r-cnn.” arXiv preprint arXiv:1504.08083(2015).
• Ren, Shaoqing, et al. ”Faster R-CNN: towards real-time objectdetection with region proposal networks.” IEEE transactions onpattern analysis and machine intelligence 39.6 (2017):1137-1149.
• Redmon, Joseph, et al. ”You only look once: Unified, real-timeobject detection.” Proceedings of the IEEE conference oncomputer vision and pattern recognition. 2016.
• The Infographics: https://medium.com/@nikasa1889/the-modern-history-of-object-recognition-infographic-aea18517c318
Powered by TCPDF (www.tcpdf.org)
Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 23 / 25
Questions/Comments?
Powered by TCPDF (www.tcpdf.org)
Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 24 / 25