Source: web.cs.ucdavis.edu/~yjlee/teaching/ecs289g-winter2018/YOLO.pdf
YOLO: You Only Look Once
Unified Real-Time Object Detection
Presenters: Liyang Zhong, Quan Zou
Outline
1. Review: R-CNN
2. YOLO:
   -- Detection Procedure
   -- Network Design
   -- Training Part
   -- Experiments
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
Proposal + Classification
Shortcomings of R-CNN:
1. Slow, impossible for real-time detection
2. Hard to optimize
WHAT’S NEW
Regression
YOLO Features:
1. Extremely fast (45 frames per second)
2. Reasons globally on the entire image
3. Learns generalizable representations
Detection Procedure
Ref: https://docs.google.com/presentation/d/1kAa7NOamBt4calBU9iHgT8a86RRHz9Yz2oh4-GTdX6M/edit#slide=id.g151008b386_0_44
We split the image into an S × S grid (here S = 7, i.e. a 7 × 7 grid)
Each cell predicts B boxes (x, y, w, h) and a confidence for each box: P(Object)
P(Object): probability that the box contains an object
B = 2
Each cell also predicts class probabilities.
Figure labels: Dog, Bicycle, Car, Dining Table
Conditioned on object: P(Car | Object)
E.g. Dog = 0.8, Cat = 0, Bike = 0
Then we combine the box and class predictions.
P(class | Object) × P(Object) = P(class)
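As a rough illustration of the product above, the sketch below multiplies one box's P(Object) by the cell's conditional class probabilities. The function name `class_scores` and the value 0.5 for P(Object) are made up for this example; only the formula and the Dog/Cat/Bike probabilities come from the slides.

```python
def class_scores(p_object, class_probs):
    """Return P(class) = P(class | Object) * P(Object) for every class."""
    return {name: p * p_object for name, p in class_probs.items()}

# P(class | Object) for one cell, taken from the example on the slide.
cell_class_probs = {"Dog": 0.8, "Cat": 0.0, "Bike": 0.0}

# P(Object) for one of the cell's boxes. Hypothetical value, not from the slides.
box_confidence = 0.5

scores = class_scores(box_confidence, cell_class_probs)
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Each of the B boxes in a cell would get its own set of scores this way, since the class probabilities are shared per cell while P(Object) is per box.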
Finally, we threshold the detections and apply non-maximum suppression (NMS)
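A minimal sketch of this last step, assuming boxes are given as corner coordinates (x1, y1, x2, y2) and that NMS is run greedily per class. The threshold values 0.2 and 0.5 and the sample detections are illustrative assumptions, not values from the slides.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def threshold_and_nms(detections, score_thresh=0.2, iou_thresh=0.5):
    """detections: list of (score, box) for one class. Returns kept detections."""
    # Step 1: drop low-scoring boxes, then sort the rest by descending score.
    candidates = sorted(
        (d for d in detections if d[0] >= score_thresh),
        key=lambda d: d[0],
        reverse=True,
    )
    # Step 2: greedily keep a box only if it does not overlap a kept box too much.
    kept = []
    for score, box in candidates:
        if all(iou(box, kept_box) <= iou_thresh for _, kept_box in kept):
            kept.append((score, box))
    return kept

dets = [
    (0.9, (10, 10, 50, 50)),
    (0.8, (12, 12, 52, 52)),      # overlaps heavily with the first box
    (0.7, (100, 100, 140, 140)),  # separate object
    (0.1, (0, 0, 5, 5)),          # below the score threshold
]
kept = threshold_and_nms(dets)
print(kept)
```

With these sample inputs, the second box is suppressed by the first and the fourth is removed by the score threshold.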
S × S × (B × 5 + C) tensor
Ref: https://docs.google.com/presentation/d/1kAa7NOamBt4calBU9iHgT8a86RRHz9Yz2oh4-GTdX6M/edit#slide=id.g151008b386_0_44
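To make the tensor size concrete, the short calculation below plugs in S = 7 and B = 2 from the earlier slides. C = 20 is an assumption based on the 20 object classes of PASCAL VOC, the dataset used in the experiments.

```python
S = 7    # grid size, from the slides
B = 2    # boxes per cell, from the slides
C = 20   # number of classes; assumes PASCAL VOC

# Each box contributes 5 numbers: x, y, w, h and its confidence.
values_per_cell = B * 5 + C
output_shape = (S, S, values_per_cell)
total_values = S * S * values_per_cell

print(output_shape, total_values)
```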
Network
Ref: https://zhuanlan.zhihu.com/p/24916786?refer=xiaoleimlnote
Figure labels: pretrain, stride = 2
Ref: https://zhuanlan.zhihu.com/p/24916786?refer=xiaoleimlnote
Train
During training, match each example to the right cell
Ref: https://docs.google.com/presentation/d/1kAa7NOamBt4calBU9iHgT8a86RRHz9Yz2oh4-GTdX6M/edit#slide=id.g151008b386_0_44
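One way to picture the matching step: the cell that contains the center of a ground-truth box is the one the example is matched to. The sketch below assumes this center-based rule and a 448 × 448 input image; the function name `responsible_cell` and the sample coordinates are hypothetical.

```python
def responsible_cell(cx, cy, img_w, img_h, S=7):
    """Return (row, col, x_offset, y_offset) for a box center (cx, cy) in pixels.

    row/col index the grid cell containing the center; the offsets give the
    center position inside that cell, each in [0, 1).
    """
    gx = cx / img_w * S   # center in grid units along x
    gy = cy / img_h * S   # center in grid units along y
    col = min(int(gx), S - 1)   # clamp so a center on the right edge stays in range
    row = min(int(gy), S - 1)
    return row, col, gx - col, gy - row

# Hypothetical ground-truth box center in a 448 x 448 image (assumed resolution).
row, col, x_off, y_off = responsible_cell(cx=224, cy=100, img_w=448, img_h=448)
print(row, col, x_off, y_off)
```

Dividing by the image width and height is also where the normalization of x and y enters: the result depends only on the relative position of the center, not on the pixel resolution.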
Adjust that cell’s class prediction:
Dog = 1, Cat = 0, Bike = 0, ...
Look at that cell’s predicted boxes
Find the best one, adjust it, increase the confidence
Decrease the confidence of the other box
Some cells don’t have any ground truth detections!
Decrease the confidence of these boxes
Don’t adjust the class probabilities or coordinates
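The per-cell training steps above can be sketched as a small target-assignment routine. This is an illustration under assumptions, not the authors' implementation: "best" is taken to mean the predicted box with the highest IOU against the ground truth, boxes use (x1, y1, x2, y2) corners, and the confidence targets 1.0 and 0.0 simply stand in for "increase" and "decrease".

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def confidence_targets(pred_boxes, gt_box=None):
    """Return (best_index, targets) for the B predicted boxes of one cell.

    gt_box is the ground-truth box matched to this cell, or None when the
    cell has no ground-truth detection.
    """
    if gt_box is None:
        # Empty cell: push every confidence down; classes and coordinates
        # are left alone, so no index is marked as responsible.
        return None, [0.0] * len(pred_boxes)
    ious = [iou(p, gt_box) for p in pred_boxes]
    best = max(range(len(pred_boxes)), key=lambda i: ious[i])
    targets = [0.0] * len(pred_boxes)   # decrease confidence of the other box(es)
    targets[best] = 1.0                 # increase confidence of the best box (assumed target)
    return best, targets

preds = [(0, 0, 10, 10), (2, 2, 12, 12)]   # hypothetical predictions, B = 2
best, targets = confidence_targets(preds, gt_box=(2, 2, 11, 11))
empty_best, empty_targets = confidence_targets(preds, gt_box=None)
print(best, targets, empty_best, empty_targets)
```

Only the box at `best` would also have its coordinates adjusted toward the ground truth; that part is omitted here to keep the sketch short.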
Ref: https://www.slideshare.net/TaegyunJeon1/pr12-you-only-look-once-yolo-unified-realtime-object-detection?from_action=save
Ref: https://zhuanlan.zhihu.com/p/24916786?refer=xiaoleimlnote
Experiments
• Datasets
  • PASCAL VOC 2007 & VOC 2012
Accurate object detection is slow!
Pascal 2007:

| Method | mAP | Speed | Time per image | Slide annotation |
|---|---|---|---|---|
| DPM v5 | 33.7 | 0.07 FPS | 14 s/img | |
| R-CNN | 66.0 | 0.05 FPS | 20 s/img | ⅓ mile, 1760 feet |
| Fast R-CNN | 70.0 | 0.5 FPS | 2 s/img | 176 feet |
| Faster R-CNN | 73.2 | 7 FPS | 140 ms/img | 12 feet, 8 feet |
| YOLO | 63.4 | 45 FPS | 22 ms/img | 2 feet |

Ref: https://pjreddie.com/publications/
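The FPS figures and the per-image times in the comparison are two views of the same measurement, since time per image is the reciprocal of frames per second. The small check below recomputes the per-image times from the FPS values on the slides; the helper name `seconds_per_image` is made up for this example.

```python
def seconds_per_image(fps):
    """Time spent on one image, in seconds, at the given frame rate."""
    return 1.0 / fps

# FPS values as reported on the slides.
speeds_fps = {
    "DPM v5": 0.07,
    "R-CNN": 0.05,
    "Fast R-CNN": 0.5,
    "Faster R-CNN": 7,
    "YOLO": 45,
}

latency = {name: seconds_per_image(fps) for name, fps in speeds_fps.items()}
for name, seconds in latency.items():
    print(f"{name}: {seconds:.3f} s/img")
```

The recomputed values agree with the slides up to rounding; for example, 7 FPS gives about 143 ms, which the slides report as 140 ms/img.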
Error Analysis
Loc (Localization Error): correct class, 0.1 < IOU < 0.5
Background: IOU < 0.1
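The two error categories above can be written as a simple rule on the predicted class and the IOU with the ground truth. In this sketch, anything that fits neither category is returned as "other", which is a placeholder for this example and not a category from the slide; the function name is also hypothetical.

```python
def classify_detection(correct_class, iou_value):
    """Label a detection using the two categories defined on the slide."""
    if iou_value < 0.1:
        return "background"      # Background: IOU < 0.1
    if correct_class and 0.1 < iou_value < 0.5:
        return "localization"    # Loc: correct class, 0.1 < IOU < 0.5
    return "other"               # placeholder for everything else (assumption)

examples = [
    (True, 0.3),    # right class, poorly localized
    (False, 0.05),  # barely overlaps any object
    (True, 0.8),    # right class, well localized
]
labels = [classify_detection(c, v) for c, v in examples]
print(labels)
```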
YOLO generalizes well to new domains (like art)
Ref: https://pjreddie.com/publications/
It outperforms methods like DPM and R-CNN when generalizing to person detection in artwork
S. Ginosar, D. Haas, T. Brown, and J. Malik. Detecting people in cubist art. In Computer Vision-ECCV 2014 Workshops, pages 101–116. Springer, 2014.
H. Cai, Q. Wu, T. Corradi, and P. Hall. The cross-depiction problem: Computer vision algorithms for recognising objects in artwork and in photographs.
Demo: https://youtu.be/VOC3huqHrss
Strengths and Weaknesses
● Strengths:
  ○ Fast: 45 fps; the smaller version reaches 155 fps
  ○ End-to-end training
  ○ Background error is low
● Weaknesses:
  ○ Performance is lower than the state of the art
  ○ Makes more localization errors
Open Questions
● How should the number of cells, the number of bounding boxes, and the box sizes be determined?
● Why normalize x, y, w, h even when all the input images have the same resolution?