YOLO: Unified, Real-Time Object Detection · Ren, Shaoqing, et al. "Faster R-CNN: towards real-time...

30
Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 1 / 25 YOLO: Unified, Real-Time Object Detection Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi Anup Deshmukh, Pratheeksha Nair March 28, 2018

Transcript of YOLO: Unified, Real-Time Object Detection · Ren, Shaoqing, et al. "Faster R-CNN: towards real-time...

Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 1 / 25

YOLO: Unified, Real-Time Object DetectionJoseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi

Anup Deshmukh, Pratheeksha Nair

March 28, 2018

Outline

1 Discussion leading towards YOLO!

2 What is YOLO?IntroductionUnified Object DetectionThe design of cost function

3 Experimental analysisComparisons

4 Limitations of YOLO

5 Summary

Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 2 / 25

The Infographic!

Powered by TCPDF (www.tcpdf.org)

Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 3 / 25

Little bit of History

• Object Detection using Hog Features (2005)

• Region-based Convolutional Neural Networks(R-CNN) (Oct2014)

• Fast(R-CNN) (Sept 2015)

• Faster(R-CNN) (Jan 2016)

Powered by TCPDF (www.tcpdf.org)

Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 4 / 25

Next Subsection

1 Discussion leading towards YOLO!

2 What is YOLO?IntroductionUnified Object DetectionThe design of cost function

3 Experimental analysisComparisons

4 Limitations of YOLO

5 Summary

Powered by TCPDF (www.tcpdf.org)

Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 5 / 25

Introduction

• Modelling the Regression problem

• Using bounding boxes with associated class prob.

• Single DNN - end to end optimization

• Full image as input (Understanding of context)

• Extremely fast (45fps)

Powered by TCPDF (www.tcpdf.org)

Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 6 / 25

Next Subsection

1 Discussion leading towards YOLO!

2 What is YOLO?IntroductionUnified Object DetectionThe design of cost function

3 Experimental analysisComparisons

4 Limitations of YOLO

5 Summary

Powered by TCPDF (www.tcpdf.org)

Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 7 / 25

Unified Object Detection

• YOLO sees entire image during both training and testing

• Single regression problem (image pixels to bounding boxes andclasses)

Powered by TCPDF (www.tcpdf.org)

Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 8 / 25

Unified Object Detection

• YOLO sees entire image during both training and testing

• Single regression problem (image pixels to bounding boxes andclasses)

Powered by TCPDF (www.tcpdf.org)

Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 8 / 25

Unified Object Detection - Workflow

• YOLO divides input image into SxS grid

• If the center of an object falls inside a grid cell, that grid cell isresponsible for detecting the object

Powered by TCPDF (www.tcpdf.org)

Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 9 / 25

Unified Object Detection - Workflow

• Each grid cell predicts B bounding boxes and confidence scoresfor these boxes

• Confidence of box being accurate and box containing an object

Confidence = Pr(object) * IOUtruthpred

• Each bounding box is represented as [x , y , w , h, conf ]

Powered by TCPDF (www.tcpdf.org)

Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 10 / 25

Unified Object Detection - Workflow

• Each grid cell predicts B bounding boxes and confidence scoresfor these boxes

• Confidence of box being accurate and box containing an object

Confidence = Pr(object) * IOUtruthpred

• Each bounding box is represented as [x , y , w , h, conf ]

Powered by TCPDF (www.tcpdf.org)

Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 10 / 25

Unified Object Detection - Workflow

• Each grid cell predicts B bounding boxes and confidence scoresfor these boxes

• Confidence of box being accurate and box containing an object

Confidence = Pr(object) * IOUtruthpred

• Each bounding box is represented as [x , y , w , h, conf ]

Powered by TCPDF (www.tcpdf.org)

Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 10 / 25

Unified Object Detection - Workflow

• Each grid cell calculates Ci = Pr(Classi |Object) for all classes ofobjects

• In each grid cell, one box is chosen and Ci is calculated

• This is encoded as a tensor of size [S x S x (B*5 + C)]

Powered by TCPDF (www.tcpdf.org)

Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 11 / 25

Unified Object Detection - Workflow

• Each grid cell calculates Ci = Pr(Classi |Object) for all classes ofobjects

• In each grid cell, one box is chosen and Ci is calculated

• This is encoded as a tensor of size [S x S x (B*5 + C)]

Powered by TCPDF (www.tcpdf.org)

Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 11 / 25

Unified Object Detection - Testing• The conditional class probabilities and confidence of predicted

bounding boxes are multiplied

Pr(Classi |Object) ∗ Pr(Object) ∗ IOUtruthpred = Pr(Classi ) ∗ IOUtruth

pred

• For each predicted box, class specific confidence scores• Confidence of predicted box fitting the object and object

belonging to that class

Powered by TCPDF (www.tcpdf.org)

Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 12 / 25

Quick overview of YOLO

Powered by TCPDF (www.tcpdf.org)

Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 13 / 25

Next Subsection

1 Discussion leading towards YOLO!

2 What is YOLO?IntroductionUnified Object DetectionThe design of cost function

3 Experimental analysisComparisons

4 Limitations of YOLO

5 Summary

Powered by TCPDF (www.tcpdf.org)

Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 14 / 25

The design

• The authors have used sum-squared error because it is easy tooptimize.

• Sum-squared loss should not equally weight errors with objectequally with error when there is no object.

• Sum-squared loss also should not equally weight errors in largeboxes and small boxes.

Powered by TCPDF (www.tcpdf.org)

Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 15 / 25

Cost function

Term 1

Term 2

Powered by TCPDF (www.tcpdf.org)

Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 16 / 25

Cost function

Term 1

Term 2

Powered by TCPDF (www.tcpdf.org)

Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 16 / 25

Next Subsection

1 Discussion leading towards YOLO!

2 What is YOLO?IntroductionUnified Object DetectionThe design of cost function

3 Experimental analysisComparisons

4 Limitations of YOLO

5 Summary

Powered by TCPDF (www.tcpdf.org)

Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 17 / 25

DPM - The only other model that runs in realtime

Table:

DPM YOLO

processes 30fps processes 45fpsmAP is 26.1% mAP is 63.4%

Powered by TCPDF (www.tcpdf.org)

Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 18 / 25

R-CNN’s

Table:

RCNN YOLO

processes 5-18fps processes 45fps3x times better at localizing lags in localizing accuracyfar more background errors analyzing image as a whole

Powered by TCPDF (www.tcpdf.org)

Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 19 / 25

The Generalization

• PICASSO and People-Art Dataset

• YOLO generalizes better to new domains better than others

• RCNN drops considerably (Selective search is tuned only fornatural images)

• YOLO has good performance (models shape, size andrelationship between objects and their locations)

Powered by TCPDF (www.tcpdf.org)

Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 20 / 25

Limitations

• Errors are treated same in small bounding boxes versus largebounding boxes.

• Small objects that appear in groups

Powered by TCPDF (www.tcpdf.org)

Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 21 / 25

Summary

YOLO is modelled as a super-fast regression problem which takes aninput image and learns the class probabilities and bounding box

coordinates.

Powered by TCPDF (www.tcpdf.org)

Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 22 / 25

References

• Girshick, Ross, et al. ”Rich feature hierarchies for accurate objectdetection and semantic segmentation.” Proceedings of the IEEEconference on computer vision and pattern recognition. 2014.

• Girshick, Ross. ”Fast r-cnn.” arXiv preprint arXiv:1504.08083(2015).

• Ren, Shaoqing, et al. ”Faster R-CNN: towards real-time objectdetection with region proposal networks.” IEEE transactions onpattern analysis and machine intelligence 39.6 (2017):1137-1149.

• Redmon, Joseph, et al. ”You only look once: Unified, real-timeobject detection.” Proceedings of the IEEE conference oncomputer vision and pattern recognition. 2016.

• The Infographics: https://medium.com/@nikasa1889/the-modern-history-of-object-recognition-infographic-aea18517c318

Powered by TCPDF (www.tcpdf.org)

Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 23 / 25

Questions/Comments?

Powered by TCPDF (www.tcpdf.org)

Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 24 / 25

Thank You!

Powered by TCPDF (www.tcpdf.org)

Anup Deshmukh, Pratheeksha Nair Seminar March 28, 2018 25 / 25