Post on 20-May-2020
SSD: Single Shot MultiBox Detector
Presented by Hongyan Wang and Nathan Watts
Wei Liu(1), Dragomir Anguelov(2), Dumitru Erhan(3), Christian Szegedy(3), Scott Reed(4), Cheng-Yang Fu(1), Alexander C. Berg(1)
UNC Chapel Hill(1), Zoox Inc.(2), Google Inc.(3), University of Michigan(4)
Original slides are from http://www.cs.unc.edu/~wliu/papers/ssd_eccv2016_slide.pdf
Classical Object Detection
• Region selection: Sliding Window
• Feature extraction: SIFT, HOG
• Classification: SVM, Adaboost
Original slides are from http://cs231n.stanford.edu/slides/winter1516_lecture8.pdf
R-CNN Problems
• Slow at test time: a full CNN forward pass must be run for each region proposal.
• Complex training pipeline.
Original slides are from http://cs231n.stanford.edu/slides/winter1516_lecture8.pdf
Fast R-CNN Problem
• Region proposal becomes the bottleneck.
Original slides are from http://cs231n.stanford.edu/slides/winter1516_lecture8.pdf
Faster R-CNN Problem
• Still too slow for real-time detection.
Original slides are from http://cs231n.stanford.edu/slides/winter1516_lecture8.pdf
YOLO Problems
• Accuracy is much worse than Faster R-CNN's.
• Poor performance on small objects.
Original slides are from http://cs231n.stanford.edu/slides/winter1516_lecture8.pdf
Speed vs. Accuracy on VOC2007 test

[Figure: scatter plot of VOC2007 test mAP (70-80) against speed in fps (10-50). All with VGGNet pretrained on ImageNet, batch_size = 1, on a Titan X (Pascal).]

Method                                 VOC2007 test mAP   Speed (fps)
R-CNN (Girshick 2014)                        66%              0.02
Fast R-CNN (Girshick 2015)                   70%              0.4
Faster R-CNN (Ren 2015)                      73%              7
YOLO (Redmon 2016)                           66%             21
SSD300                                       74%             46
SSD512                                       77%             19
SSD300 (improved data augmentation)          77%             46
SSD512 (improved data augmentation)          80%             19

SSD300 is 6.6x faster than Faster R-CNN (46 vs. 7 fps) while being more accurate, and SSD512 is 11 mAP points better than YOLO (77 vs. 66).
Original slides are from http://www.cs.unc.edu/~wliu/papers/ssd_eccv2016_slide.pdf
Two-Stage vs. Single Shot
• Two-stage: box proposal + post-classify (R-CNN family, MultiBox).
• Single shot: one network predicts boxes and classes in a single forward pass (YOLO, SSD).
Original slides are from http://www.cs.unc.edu/~wliu/papers/ssd_eccv2016_slide.pdf
Bounding Box Prediction
• Classical sliding windows: discretize the box space densely and run a binary classifier at every window ("Is it a cat? No").
• SSD and other deep approaches: discretize the box space more coarsely, predict multiclass probabilities for each box (e.g. cat: 0.8, dog: 0.1), and refine the coordinates of each box (decoding sketched below).
Original slides are from http://www.cs.unc.edu/~wliu/papers/ssd_eccv2016_slide.pdf
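To make "refine the coordinates of each box" concrete, here is the standard center-size offset decoding used by SSD-style detectors, as a minimal Python sketch (the function name is ours; the network predicts offsets (dcx, dcy, dw, dh) relative to a default box):

```python
import math

def decode(default_box, offsets):
    """default_box = (cx, cy, w, h); offsets = (dcx, dcy, dw, dh).
    The regression output refines a default box into a detection box."""
    cx, cy, w, h = default_box
    dcx, dcy, dw, dh = offsets
    return (cx + dcx * w,        # shift the center by a fraction of box size
            cy + dcy * h,
            w * math.exp(dw),    # rescale width/height log-linearly
            h * math.exp(dh))
```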
SSD Output Layer
• A ConvNet produces a feature map.
• A small (e.g. 3x3) conv kernel slides over the feature map.
• For each default box at each location, it outputs box regression offsets and multiclass probabilities (sketched below).
Original slides are from http://www.cs.unc.edu/~wliu/papers/ssd_eccv2016_slide.pdf
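A minimal PyTorch sketch of one such output layer (names like SSDHead are ours, not from the official code): a 3x3 convolution slides over the feature map and, for each of k default boxes per location, emits 4 box-regression values and one score per class.

```python
import torch
import torch.nn as nn

class SSDHead(nn.Module):
    """One SSD output layer: a 3x3 conv over a feature map predicting,
    for each of k default boxes per location, 4 offsets and class scores."""
    def __init__(self, in_channels, k, num_classes):
        super().__init__()
        self.num_classes = num_classes
        self.loc = nn.Conv2d(in_channels, k * 4, kernel_size=3, padding=1)
        self.conf = nn.Conv2d(in_channels, k * num_classes, kernel_size=3, padding=1)

    def forward(self, fmap):
        b = fmap.size(0)
        # (B, k*4, H, W) -> (B, H*W*k, 4), and likewise for class scores
        loc = self.loc(fmap).permute(0, 2, 3, 1).reshape(b, -1, 4)
        conf = self.conf(fmap).permute(0, 2, 3, 1).reshape(b, -1, self.num_classes)
        return loc, conf

# e.g. a 19x19 feature map, k = 6 boxes per location, 21 VOC classes
head = SSDHead(in_channels=1024, k=6, num_classes=21)
loc, conf = head(torch.randn(1, 1024, 19, 19))  # loc: (1, 2166, 4)
```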
SSD Training
• Match default boxes to ground-truth boxes to determine true and false positives.
• Loss = SmoothL1(box parameters) + Softmax(class probabilities), sketched below.
Original slides are from http://www.cs.unc.edu/~wliu/papers/ssd_eccv2016_slide.pdf
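The loss above in PyTorch, under the assumption that matching (covered on later slides) has already assigned each default box a class index (0 = background) and encoded offsets. For brevity all boxes contribute to the confidence term here; the actual training uses hard negative mining to restrict which negatives count.

```python
import torch
import torch.nn.functional as F

def multibox_loss(loc_pred, conf_pred, loc_target, cls_target):
    """Loss = SmoothL1 over matched boxes' parameters
            + softmax cross-entropy over class probabilities.
    loc_pred:   (N, num_boxes, 4) predicted offsets
    conf_pred:  (N, num_boxes, C) class logits
    loc_target: (N, num_boxes, 4) encoded ground-truth offsets
    cls_target: (N, num_boxes)    matched class index, 0 = background
    """
    pos = cls_target > 0                       # boxes matched to a GT box
    num_pos = pos.sum().clamp(min=1)

    loc_loss = F.smooth_l1_loss(loc_pred[pos], loc_target[pos], reduction="sum")
    conf_loss = F.cross_entropy(conf_pred.flatten(0, 1),
                                cls_target.flatten(), reduction="sum")
    # Normalize by the number of matched (positive) default boxes.
    return (loc_loss + conf_loss) / num_pos
```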
Related Work
• MultiBox [Erhan et al. CVPR14]: fully connected layers predict P(objectness) and offsets for K boxes, + post-classify the boxes.
• YOLO [Redmon et al. CVPR16]: fully connected layers predict multiclass probabilities and offsets for K boxes.
• Faster R-CNN [Ren et al. NIPS15]: convolutional layers predict P(objectness) and box offsets, + post-classify the boxes.
• SSD: convolutional layers predict multiclass probabilities and box offsets directly; no post-classification stage.
Original slides are from http://www.cs.unc.edu/~wliu/papers/ssd_eccv2016_slide.pdf
Contribution #1: Multi-Scale Feature Maps
• A stride-2 convolution halves the resolution of the ConvNet's feature map.
• Box regression and multiclass scores are predicted from every scale (sketch after this slide).
Original slides are from http://www.cs.unc.edu/~wliu/papers/ssd_eccv2016_slide.pdf
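A sketch of how the prediction sources could be chained, reusing the hypothetical SSDHead from earlier: a stride-2 convolution shrinks the map between scales, and every scale gets its own box-regression/class head.

```python
import torch.nn as nn

def make_downsample(cin, cmid, cout):
    """Stride-2 block that halves the feature map, e.g. 10x10 -> 5x5."""
    return nn.Sequential(
        nn.Conv2d(cin, cmid, kernel_size=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(cmid, cout, kernel_size=3, stride=2, padding=1),
        nn.ReLU(inplace=True),
    )

def predict_all_scales(fmap, heads, downsamples):
    """heads has one more entry than downsamples: predict, shrink, repeat."""
    outputs = [heads[0](fmap)]
    for head, down in zip(heads[1:], downsamples):
        fmap = down(fmap)
        outputs.append(head(fmap))
    return outputs  # one (loc, conf) pair per scale
```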
Multi-Scale Feature Maps
• SSD predicts from multiple feature maps of different resolutions (e.g. 8×8 and 4×4).
• In contrast, the Faster R-CNN objectness proposal (Ren 2015) predicts from a single feature map (e.g. 8×8).
Original slides are from http://www.cs.unc.edu/~wliu/papers/ssd_eccv2016_slide.pdf
Multi-Scale Feature Maps Experiment

Prediction source layers from:                mAP: use boundary boxes?
38×38  19×19  10×10  5×5  3×3  1×1               Yes      No     # Boxes
  ✓      ✓      ✓     ✓    ✓    ✓               74.3    63.4      8732
  ✓      ✓      ✓                                70.7    69.2      9864
         ✓                                       62.4    64.0      8664

Using more prediction source layers improves mAP.
Original slides are from http://www.cs.unc.edu/~wliu/papers/ssd_eccv2016_slide.pdf
Contribution #2: Splitting the Region Space

SSD300
include {1/2, 2} box?                 ✓       ✓
include {1/3, 3} box?                         ✓
number of boxes             3880    7760    8732
VOC2007 test mAP            71.6    73.7    74.3

• More default box shapes per location improve accuracy (see the default-box sketch below).
• Using the 38×38 feature map (conv4_3): +2.5 mAP.
Original slides are from http://www.cs.unc.edu/~wliu/papers/ssd_eccv2016_slide.pdf
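A sketch of default-box generation for one feature map, following the paper's scheme (the function and the scale values are illustrative): each location gets a square box of scale s, an extra square box of scale sqrt(s * s'), and a pair of boxes for every aspect ratio ar (both ar and 1/ar). That yields the 4 or 6 boxes per location in the table above.

```python
import math

def default_boxes(fmap_size, scale, next_scale, aspect_ratios):
    """(cx, cy, w, h) default boxes, normalized to [0, 1], for one square
    feature map. aspect_ratios=(2,) -> 4 boxes/location; (2, 3) -> 6."""
    extra = math.sqrt(scale * next_scale)   # extra square box between scales
    boxes = []
    for i in range(fmap_size):
        for j in range(fmap_size):
            cx, cy = (j + 0.5) / fmap_size, (i + 0.5) / fmap_size
            boxes.append((cx, cy, scale, scale))
            boxes.append((cx, cy, extra, extra))
            for ar in aspect_ratios:        # each ratio adds ar and 1/ar
                w, h = scale * math.sqrt(ar), scale / math.sqrt(ar)
                boxes.append((cx, cy, w, h))
                boxes.append((cx, cy, h, w))
    return boxes

print(len(default_boxes(38, 0.10, 0.20, aspect_ratios=(2,))))  # 5776 = 38*38*4
```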
Why So Many Default Boxes?

                  Faster R-CNN     YOLO     SSD300    SSD512
# Default Boxes       6000           98       8732     24564
Resolution          1000×600      448×448   300×300   512×512

[Figure: ground truth (GT) box vs. detection.]
• A SmoothL1 or L2 loss on box shape averages among likely hypotheses.
• Enough default boxes (discrete bins) are needed to do accurate regression within each bin (see the count check below).
• This is a general principle for regressing complex continuous outputs with deep nets.
Original slides are from http://www.cs.unc.edu/~wliu/papers/ssd_eccv2016_slide.pdf
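The box counts in the table follow directly from the feature map sizes and boxes per location; a quick check, assuming the standard SSD configuration of 4 or 6 boxes per layer:

```python
# (grid size, boxes per location) for each prediction source layer
ssd300 = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]
ssd512 = [(64, 4), (32, 6), (16, 6), (8, 6), (4, 6), (2, 4), (1, 4)]

print(sum(s * s * k for s, k in ssd300))  # 8732
print(sum(s * s * k for s, k in ssd512))  # 24564
```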
Handling Many Default Boxes
[Figure: GT boxes and default boxes on the image, marked TP, FP, or ?]
• Matching ground truth and default boxes:
‣ Match each GT box to the closest default box.
‣ Also match each GT box to all unassigned default boxes with IoU > 0.5.
• Hard negative mining (sketched below):
‣ Training is unbalanced: 1-30 TPs vs. 8k-25k FPs per image.
‣ Keep the TP:FP ratio fixed at 1:3, using the worst-misclassified FPs.
Original slides are from http://www.cs.unc.edu/~wliu/papers/ssd_eccv2016_slide.pdf
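Both tricks as a PyTorch sketch, assuming a precomputed pairwise IoU matrix (the function names are ours): matching applies the two rules above, and hard negative mining keeps only the worst-misclassified negatives at a fixed 1:3 positive:negative ratio.

```python
import torch

def match(iou, threshold=0.5):
    """iou: (num_gt, num_default) pairwise IoU.
    Returns, for each default box, a matched GT index or -1 (negative)."""
    matched = torch.full((iou.size(1),), -1, dtype=torch.long)
    # Rule 2: any default box with IoU > threshold to some GT is a positive.
    best_iou, best_gt = iou.max(dim=0)
    matched[best_iou > threshold] = best_gt[best_iou > threshold]
    # Rule 1: each GT box always claims its single closest default box.
    matched[iou.argmax(dim=1)] = torch.arange(iou.size(0))
    return matched

def hard_negative_mining(cls_loss, positive, ratio=3):
    """cls_loss: (num_default,) per-box confidence loss; positive: bool mask.
    Returns a mask of negatives to train on (worst-misclassified first)."""
    cls_loss = cls_loss.clone()
    cls_loss[positive] = 0.0                          # ignore positives here
    num_neg = ratio * int(positive.sum())
    keep = cls_loss.argsort(descending=True)[:num_neg]
    mask = torch.zeros_like(positive)
    mask[keep] = True
    return mask
```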
SSD Architecture
[Figure: the SSD300 model. A 300×300 input image passes through a VGG16 base network, then extra convolutional feature maps of decreasing resolution (38×38, 19×19, 10×10, 5×5, 3×3, 1×1). Each map feeds a classifier of the form Conv: 3×3×(k×(Classes+4)), with k default boxes per location (4 or 6), producing 8732 detections per class that are filtered by non-maximum suppression. Overall: 74.3 mAP at 46 FPS.]
Original slides are from http://www.cs.unc.edu/~wliu/papers/ssd_eccv2016_slide.pdf
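The non-maximum suppression step referenced above, as a plain reference sketch (an optimized version such as torchvision.ops.nms would be used in practice): greedily keep the highest-scoring box and drop any remaining box that overlaps it too much.

```python
import torch

def nms(boxes, scores, iou_threshold=0.45):
    """boxes: (N, 4) as (x1, y1, x2, y2); returns indices of kept boxes."""
    order = scores.argsort(descending=True)
    keep = []
    while order.numel() > 0:
        i = order[0].item()
        keep.append(i)
        rest = order[1:]
        if rest.numel() == 0:
            break
        # IoU of the kept box against all remaining candidates
        x1 = torch.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = torch.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = torch.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = torch.minimum(boxes[i, 3], boxes[rest, 3])
        inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        order = rest[inter / (area_i + area_r - inter) <= iou_threshold]
    return keep
```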
Contribution #3: The Devil is in the Details
Original slides are from http://www.cs.unc.edu/~wliu/papers/ssd_eccv2016_slide.pdf
Data Augmentation
[Figure: examples of randomly cropped and randomly expanded training images.]
• Horizontal flip, random crop & color distortion, random expansion.
• Random expansion creates more small training examples (sketch below).

data augmentation                      SSD300
horizontal flip                  ✓       ✓       ✓
random crop & color distortion           ✓       ✓
random expansion                                 ✓
VOC2007 test mAP                65.5    74.3    77.2
Original slides are from http://www.cs.unc.edu/~wliu/papers/ssd_eccv2016_slide.pdf
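A NumPy sketch of the random-expansion step (names and default values are illustrative): the image is pasted at a random position on a mean-filled canvas up to several times larger, so objects shrink relative to the network input, which is what creates the extra small training examples.

```python
import random
import numpy as np

def random_expand(image, boxes, max_ratio=4.0, mean=(104, 117, 123)):
    """image: HxWx3 uint8 array; boxes: (N, 4) pixel (x1, y1, x2, y2)."""
    h, w, c = image.shape
    ratio = random.uniform(1.0, max_ratio)
    canvas = np.empty((int(h * ratio), int(w * ratio), c), dtype=image.dtype)
    canvas[...] = mean                       # fill with the dataset mean color
    top = random.randint(0, canvas.shape[0] - h)
    left = random.randint(0, canvas.shape[1] - w)
    canvas[top:top + h, left:left + w] = image
    return canvas, boxes + np.array([left, top, left, top])  # shift GT boxes
```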
Results on VOC2007 test

Method                  mAP    FPS   batch size   # Boxes   Input resolution
Faster R-CNN (VGG16)    73.2     7        1        ~6000       ~1000×600
Fast YOLO               52.7   155        1           98         448×448
YOLO (VGG16)            66.4    21        1           98         448×448
SSD300                  74.3    46        1         8732         300×300
SSD512                  76.8    19        1        24564         512×512
SSD300                  74.3    59        8         8732         300×300
SSD512                  76.8    22        8        24564         512×512

SSD300 is 6.6x faster than Faster R-CNN, and SSD512 is about 10 mAP points better than YOLO. With the improved data augmentation, SSD300 reaches 77.2 mAP and SSD512 reaches 79.8 mAP.
Original slides are from http://www.cs.unc.edu/~wliu/papers/ssd_eccv2016_slide.pdf
Results on More Datasets

Method          VOC2007 test   VOC2012 test   MS COCO test-dev   ILSVRC2014 val2
Fast R-CNN          70.0           68.4             19.7              N/A
Faster R-CNN        73.2           70.4             21.9              N/A
YOLO                63.4           57.9             N/A               N/A
SSD300              74.3           72.4             23.2              43.4
SSD512              76.8           74.9             26.8              46.4
SSD300*             77.2           75.8             25.1              N/A
SSD512*             79.8           78.5             28.8              N/A

(* = with the improved data augmentation)
Original slides are from http://www.cs.unc.edu/~wliu/papers/ssd_eccv2016_slide.pdf
COCO Bounding Box Precision

mAP @ IoU        0.5    0.75    0.5:0.95
Faster R-CNN    45.3    23.5      24.2
SSD512*         48.5    30.3      28.8
gain            +3.2    +6.8      +4.6
Original slides are from http://www.cs.unc.edu/~wliu/papers/ssd_eccv2016_slide.pdf
Future Work
• Object detection + pose estimation [Poirson et al., 3DV 2016]

Excerpt from the paper:

Figure 2. Two-stage vs. Proposed. (a) The two-stage approach separates the detection and pose estimation steps. After object detection, the detected objects are cropped and then processed by a separate network for pose estimation. This requires resampling the image at least three times: once for region proposals, once for detection, and once for pose estimation. (b) The proposed method, in contrast, requires no resampling of the image and instead relies on convolutions for detecting the object and its pose in a single forward pass. This offers a large speed up because the image is not resampled, and computation for detection and pose estimation is shared.

3. Model
For an input RGB image, a single evaluation of the model network is performed and produces scores for category, bounding box offset directions, and pose, for a constant number of boxes. These are filtered by non-max suppression to produce the final output. The network is a variant of the single shot detection (SSD) network from [10] with additional outputs for pose. Here we present the network's design choices, structure of the outputs, and training.

An SSD-style detector [10] works by adding a sequence of feature maps of progressively decreasing spatial resolution to an image classification network such as VGG [17]. These feature layers replace the last few layers of the image classification network, and 3x3 and 1x1 convolutional filters are used to transform one feature map to the next along with max-pooling. See Fig. 3 for a depiction of the model.

Predictions for a regularly spaced set of possible detections are computed by applying a collection of 3x3 filters to channels in one of the feature layers. Each 3x3 filter produces one value at each location, where the outputs are either classification scores, localization offsets, and, in our case, discretized pose predictions for the object (if any) in a box. See Fig. 1. Note that different sized detections are produced by different feature layers instead of taking the more traditional approach of resizing the input image or predicting different sized detections from a single feature layer.

We take one of two different approaches for pose predictions, either sharing outputs for pose across all the object categories (share) or having separate pose outputs for each object category (separate). One output is added for each of N_θ possible poses. With N_c categories of objects, there are N_c × N_θ pose outputs for the separate model and N_θ pose outputs for the share model. While we do add a 3x3 filter for each of the pose outputs, this added cost is relatively small and the original SSD pipeline is quite fast, so the result is still faster than two-stage approaches that rely on a (often slower) detector followed by a separate pose classification stage. See Fig. 2 (a).

3.1. Pose Estimation Formulation
There are a number of design choices for a joint detection and pose estimation method. This section details three particular design choices, and Sec. 4.1.1 shows justifications for them through experimental results.

One important choice is in how the pose estimation task is formulated. A possibility is to train for continuous pose estimation and formulate the problem as a regression. However, in this work we discretize the pose space into N_θ disjoint bins and formulate the task as a classification problem. Doing so not only makes the task feasible (since both the quantity and consistency of pose labels is not high enough for continuous pose estimation), but also allows us to measure the confidence of our pose prediction. Furthermore, discrete pose estimation still presents a very challenging problem.

Another design choice is whether to predict poses separately for the N_c object classes or to use the same weights to predict poses for all classes. Sec. 4.1.1 assesses these options.

The final design choice is the resolution of the input image. Specifically, we consider two resolutions for input: …

[Poirson et al., 3DV 2016]
Original slides are from http://www.cs.unc.edu/~wliu/papers/ssd_eccv2016_slide.pdf
Future Work
• Single shot 3D bounding box detection.
• Joint object detection + tracking model: https://arxiv.org/abs/1609.05590
Original slides are from http://www.cs.unc.edu/~wliu/papers/ssd_eccv2016_slide.pdf
Future Work
Original slides are from http://www.cs.unc.edu/~wliu/papers/ssd_eccv2016_slide.pdf
• DSSD: Deconvolutional Single-Shot Detector
Future Work
Original slides are from http://www.cs.unc.edu/~wliu/papers/ssd_eccv2016_slide.pdf
• DSSD: Deconvolutional Single-Shot Detector
Future Work
• Deconvolution layers added to the end
Original slides are from http://www.cs.unc.edu/~wliu/papers/ssd_eccv2016_slide.pdf
• DSSD: Deconvolutional Single-Shot Detector
Future Work
• Deconvolution layers added to the end
• Improves performance even further: 81.5% mAP
Original slides are from http://www.cs.unc.edu/~wliu/papers/ssd_eccv2016_slide.pdf
• DSSD: Deconvolutional Single-Shot Detector
Future Work
• Deconvolution layers added to the end
• Improves performance even further: 81.5% mAP
• https://arxiv.org/abs/1701.06659
Original slides are from http://www.cs.unc.edu/~wliu/papers/ssd_eccv2016_slide.pdf
Check out the code/models
https://github.com/weiliu89/caffe/tree/ssd
Original slides are from http://www.cs.unc.edu/~wliu/papers/ssd_eccv2016_slide.pdf