CSCE 496/896 Lecture 9: Object...

CSCE496/896

Lecture 9:Object

Detection

Stephen Scott

Introduction

PerformanceMeasures

SPP-net

Fast R-CNN

CSCE 496/896 Lecture 9:Object Detection

Stephen Scott

sscott@cse.unl.edu

1 / 21

CSCE496/896

Lecture 9:Object

Detection

Stephen Scott

Introduction

PerformanceMeasures

SPP-net

Fast R-CNN

Introduction

We know that CNNs are useful in image classificationNow consider object detection

Given an input image, identify what objects (plural) arein it and where they areOutput bounding box of each object

2 / 21

CSCE496/896

Lecture 9:Object

Detection

Stephen Scott

Introduction

PerformanceMeasures

SPP-net

Fast R-CNN

Outline

Performance measuresRCNNSPP-netFast RCNNYOLO

3 / 21

CSCE496/896

Lecture 9:Object

Detection

Stephen Scott

Introduction

PerformanceMeasures

SPP-net

Fast R-CNN

Performance MeasuresMean Average Precision

Mean average precision (mAP) to measure how wellobjects are identified

Recall from Lecture 3

Precision isfraction of thoselabeled positivethat are positiveRecall is fractionof the truepositives that arelabeled positivePrecision-recallcurve plotsprecision vs recall

4 / 21

CSCE496/896

Lecture 9:Object

Detection

Stephen Scott

Introduction

PerformanceMeasures

SPP-net

Fast R-CNN

Performance MeasuresMean Average Precision (2)

Given a ranking (by confidence values) of n items,average precision at n (AP@n) is average of precisionvalues at each position in the ranking:

n∑k=1

P(k)∆r(k) ,

where P(k) is precision at position k and ∆r(k) ischange in recall: r(k)− r(k − 1) (= 0 if instance k isnegative, = 1/Np if k is one of Np positives)

E.g., if ranking = 〈+,+,−,+,−〉, AP@5= (1)(1/3) + (1)(2/3) + (2/3)(2/3) + (3/4)(1) + (3/5)(1)Larger as more positives ranked above negatives

mAP is mean of average precision across all classes

5 / 21

CSCE496/896

Lecture 9:Object

Detection

Stephen Scott

Introduction

PerformanceMeasures

SPP-net

Fast R-CNN

Performance MeasuresIntersection Over Union

Intersection over union (IoU) to measure quality ofbounding boxes

Divide the size of the two boxes’ intersection by the sizeof their union

6 / 21

CSCE496/896

Lecture 9:Object

Detection

Stephen Scott

Introduction

PerformanceMeasures

SPP-net

Fast R-CNN

Basic Idea of Object Detection

Split input image into regions and classify each region witha CNN and other machinery

Region boundary isbounding boxObject detected inregion is object in BB

Issues:

Limited to bounding boxes of fixed sizes and locationsAn object could span regions

7 / 21

CSCE496/896

Lecture 9:Object

Detection

Stephen Scott

Introduction

PerformanceMeasures

SPP-net

Fast R-CNN

Region CNN (Girshick et al. 2014)

R-CNN proposes collection of 2000 regions in imageWarps each region to match input dimensions(227× 227× 3) of CNN to get 4096-dimensionalembedded representationClassifies each embedded vector with class-specificbinary SVMsApply class-specific regressors to fine-tune boundingboxes

8 / 21

CSCE496/896

Lecture 9:Object

Detection

Stephen Scott

Introduction

PerformanceMeasures

SPP-net

Fast R-CNN

Region CNN (Girshick et al. 2014)Example from Girshick (2015)

9 / 21

CSCE496/896

Lecture 9:Object

Detection

Stephen Scott

Introduction

PerformanceMeasures

SPP-net

Fast R-CNN

Region CNN (Girshick et al. 2014)Selective Search

Popular method to propose RoIs: selective search1 Segment the image2 Compute bounding

boxes of segments3 Iteratively merge

adjacent segmentsbased on similarity

Linearcombination ofsimilarities of:color, texture, size,shape

4 Goto 2

10 / 21

CSCE496/896

Lecture 9:Object

Detection

Stephen Scott

Introduction

PerformanceMeasures

SPP-net

Fast R-CNN

Region CNN (Girshick et al. 2014)Issues

Training and detection are slowDetection: 13s/image on GPU, 53s/image on CPUDue to large number of regions proposed, each runthrough CNN and classified

11 / 21

CSCE496/896

Lecture 9:Object

Detection

Stephen Scott

Introduction

PerformanceMeasures

SPP-net

Fast R-CNN

Spatial Pyramid Pooling (He et al. 2015)

Part of R-CNN’s slowdown at test time is running eachRoI through ConvNet separatelyTo speed up test time, instead put entire image throughsingle ConvNet

Choose RoIs from ConvNetoutput and run throughspatial pyramid pooling(SPP) layer

Max/avg pooling withfixed number of binsProduces fixed-lengthvector regardless of inputsize

Fixed-length vectors feed tofully connected layers, thenSVMs12 / 21

CSCE496/896

Lecture 9:Object

Detection

Stephen Scott

Introduction

PerformanceMeasures

SPP-net

Fast R-CNN

Spatial Pyramid Pooling (He et al. 2015)Example from Girshick (2015)

13 / 21

CSCE496/896

Lecture 9:Object

Detection

Stephen Scott

Introduction

PerformanceMeasures

SPP-net

Fast R-CNN

Spatial Pyramid Pooling (He et al. 2015)Drawbacks

While training is faster then R-CNN, is still slow anddisk-intensiveCannot efficiently update ConvNet parameters, so keptfrozen

Each RoI’s receptive field covers most of entire image,so forward pass expensive across all images ofmini-batch

14 / 21

CSCE496/896

Lecture 9:Object

Detection

Stephen Scott

Introduction

PerformanceMeasures

SPP-net

Fast R-CNN

Fast R-CNN (Girshick 2015)Hierarchical Sampling

Similar architecture to SPP-net

Mini-batches constructedvia hierarchical sampling:Sample a similar number ofRoIs over a smaller numberof images

15 / 21

CSCE496/896

Lecture 9:Object

Detection

Stephen Scott

Introduction

PerformanceMeasures

SPP-net

Fast R-CNN

Fast R-CNN (Girshick 2015)Example from Girshick (2015)

16 / 21

CSCE496/896

Lecture 9:Object

Detection

Stephen Scott

Introduction

PerformanceMeasures

SPP-net

Fast R-CNN

Fast R-CNN (Girshick 2015)Example from Girshick (2015)

17 / 21

CSCE496/896

Lecture 9:Object

Detection

Stephen Scott

Introduction

PerformanceMeasures

SPP-net

Fast R-CNN

You Only Look Once (Redmon et al. 2016)

A single, unified networkCan process 45 frames per second on a GPU (155 fpsfor Fast YOLO)Lower mAP than some R-CNN variants, but much fasterHighest mAP of real-time detectors (≥ 30 fps)

18 / 21

CSCE496/896

Lecture 9:Object

Detection

Stephen Scott

Introduction

PerformanceMeasures

SPP-net

Fast R-CNN

You Only Look Once (Redmon et al. 2016)Idea

Divides image into S× SgridEach grid cell predicts Bbounding boxes, each as(x, y,w, h) (coordinates,width, height), and aconfidence (five totalpredictions)

x, y,w, h ∈ [0, 1] (relative to image dimensions and gridcell location)Each cell also predicts C class probabilitiesOutput is S× S× (5B + C) tensor

19 / 21

CSCE496/896

Lecture 9:Object

Detection

Stephen Scott

Introduction

PerformanceMeasures

SPP-net

Fast R-CNN

You Only Look Once (Redmon et al. 2016)Architecture

Leaky ReLU for all layers except output, which is linear

20 / 21

CSCE496/896

Lecture 9:Object

Detection

Stephen Scott

Introduction

PerformanceMeasures

SPP-net

Fast R-CNN

You Only Look Once (Redmon et al. 2016)Training

Pretrained 20 convolutional layers on ImageNet 1000Added 4 convolutional layers and 2 connected layersTrained to optimize weighted square loss functionλcoord = 5 times more weight on (x, y,w, h) predictions

21 / 21

CSCE 496/896 Lecture 9: Object...

Documents

Transcript of CSCE 496/896 Lecture 9: Object...

04PO, p. 896

An application of Residual Network and Faster - RCNN for Medico ...ceur-ws.org/Vol-2283/MediaEval_18_paper_11.pdf · An application of Residual Network and Faster - RCNN for Medico:

A-Fast-RCNN: Hard Positive Generation via …openaccess.thecvf.com/content_cvpr_2017/papers/Wang_A...A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection Xiaolong

Introduction CSCE 496/896 Lecture 6: Recurrent ...cse.unl.edu/~sscott/teach/Classes/cse496S18/slides/6-Recurrent-6up.pdf · CSCE 496/896 Lecture 6: Recurrent Architectures Stephen

CSCE Skills List

CS7015 (Deep Learning) : Lecture 12miteshk/CS7015/Slides/Teaching/pdf/Lecture12.pdf · Some images borrowed from Ross Girshick’s original slides on RCNN, Fast RCNN, etc. Some ideas

Multiple Scale Faster-RCNN Approach to Driver's Cell-Phone ... · Multiple Scale Faster-RCNN Approach to Driver’s Cell-phone Usage and Hands on Steering Wheel Detection ... detector

Object Detection Based on Fast/Faster RCNN Employing Fully ...downloads.hindawi.com/journals/mpe/2018/3598316.pdf · ResearchArticle Object Detection Based on Fast/Faster RCNN Employing

NOTE-RCNN: NOise Tolerant Ensemble RCNN for Semi ...€¦ · training samples using the detectors and then train detec-tors with the updated training samples. The detector can be

D E - docs.rs-online.com · C Round Bezel: 896-6933, 896-6939, 896-6942 D Round 3 hole Bezel: 896-6917, 896-6911, 896-6920 E Square Bezel: 896-6894, 896-6898. 72x72mm Bezel for 896-6894

Pr057 mask rcnn

CSCE 522 - Farkas1 CSCE 522 Network Security. Reading Pfleeger and Pfleeger: Chapter 6 CSCE 522 - Farkas2.

CLTS Handbook for Facilitators in English_WASH-RCNN 2009_Nepal

CSCE 896-004, Fall 2009 cse.unl/~choueiry/F09-896-004/ Questions : cse421@cse.unl

CSCE 496/896 Lecture 9: word2vec and node2veccse.unl.edu/~sscott/teach/Classes/cse496S18/slides/9-xxx2vec.pdf · CSCE 496/896 Lecture 9: word2vec and node2vec Stephen Scott Introduction

CSCE Oct Newsletter

CSCE 510 - Systems Programming Lecture 17 Sockets CSCE March 20, 2013.

CSCE 496/896 Lecture 5: Autoencoderscse.unl.edu/~sscott/teach/Classes/cse496S18/slides/5-Autoencoder… · Denoising AE Sparse AE Contractive AE Variational AE GAN Generative Adversarial

Faster rcnn

Inspiration 896.doc