Object Detection - Through Machine Learningmachinelearning.math.rs/Stegic-ObjectDetection.pdf ·...

Post on 07-Jul-2020

2 views 0 download

Transcript of Object Detection - Through Machine Learningmachinelearning.math.rs/Stegic-ObjectDetection.pdf ·...

Object DetectionThrough Machine Learning

Machine Learning and Applications Group, 2018.

Uroš Stegić

urosstegic@gmx.com

TRADITIONAL COMPUTER VISION

General Overview

Convolution Operator

Filters

Convolutions Over Volume

Traditional Computer Vision Convolutional Neural Networks Object Detection

Description

Process & analyze visual signalExtract information from visual signalPerform on raw signal (pixel intensities values)

3

Traditional Computer Vision Convolutional Neural Networks Object Detection

Enhance Intuition

Computer Graphics

(x0, y0) = (3, 4)r = 6↓

Computer Vision

↓(x0, y0) = (3, 4)

r = 6

4

Traditional Computer Vision Convolutional Neural Networks Object Detection

Tasks in Computer Vision

Object RecognitionImage RetrievalObject DetectionOCRPose Estimation...

TrackingScene ReconstructionOptical FlowSemantic SegmentationImage Reconstuction...

5

Traditional Computer Vision Convolutional Neural Networks Object Detection

Tasks in Computer Vision

Object RecognitionImage RetrievalObject DetectionOCRPose Estimation...

TrackingScene ReconstructionOptical FlowSemantic SegmentationImage Reconstuction...

6

Traditional Computer Vision Convolutional Neural Networks Object Detection

Convolution Operator - Definition

DefinitionLet A,B ∈ D ⊆ Rn×n. Convolution operator, denoted as ∗maps the spaceD×D to a field of real numbers and is definedas follows:

A ∗ B =

n∑i=1

n∑j=1

AijBij

7

Traditional Computer Vision Convolutional Neural Networks Object Detection

Convolution Operator - Example

1 2 34 5 67 8 9

0 1 01 0 10 1 0

8

Traditional Computer Vision Convolutional Neural Networks Object Detection

Convolution Operator - Example

1 2 34 5 67 8 9

0 1 01 0 10 1 0

= 2 ∗ 1 + 4 ∗ 1 + 6 ∗ 1 + 8 ∗ 1 = 20

9

Traditional Computer Vision Convolutional Neural Networks Object Detection

Convolution Operator - Example

1 2 34 5 67 8 9

0 1 01 0 10 1 0

= 2 ∗ 1 + 4 ∗ 1 + 6 ∗ 1 + 8 ∗ 1 = 20

10

Traditional Computer Vision Convolutional Neural Networks Object Detection

Convolution Operator - Example

1 2 34 5 67 8 9

0 1 01 0 10 1 0

= 2 ∗ 1 + 4 ∗ 1 + 6 ∗ 1 + 8 ∗ 1 = 20

11

Traditional Computer Vision Convolutional Neural Networks Object Detection

Convolution Operator - Example

1 2 34 5 67 8 9

0 1 01 0 10 1 0

= 2 ∗ 1 + 4 ∗ 1 + 6 ∗ 1 + 8 ∗ 1 = 20

12

Traditional Computer Vision Convolutional Neural Networks Object Detection

Convolution Operator - Example

1 2 34 5 67 8 9

0 1 01 0 10 1 0

= 2 ∗ 1 + 4 ∗ 1 + 6 ∗ 1 + 8 ∗ 1 = 20

13

Traditional Computer Vision Convolutional Neural Networks Object Detection

Filters

211 39 200 102 174 25 90 144138 44 184 110 193 30 92 136151 73 190 114 189 41 105 128129 101 123 181 201 169 117 191140 122 153 231 209 157 124 113221 115 77 244 198 149 156 247

0 1 01 0 10 1 0

14

Traditional Computer Vision Convolutional Neural Networks Object Detection

Filters - Examples

Vertical Edge ExtractorHorizontal Edge ExtractorSobel filterSharpenGaussian Blur

15

Traditional Computer Vision Convolutional Neural Networks Object Detection

Filters - Edge Extractor

1 0 −11 0 −11 0 −1

=

16

Traditional Computer Vision Convolutional Neural Networks Object Detection

Filters - Edge Extractor

1 0 −11 0 −11 0 −1

=

17

Traditional Computer Vision Convolutional Neural Networks Object Detection

Filters - Sobel

1 0 −12 0 −21 0 −1

=

18

Traditional Computer Vision Convolutional Neural Networks Object Detection

Filters - Gaussian Blur

∗ 1

256

1 4 6 4 14 16 24 16 46 24 36 24 64 16 24 16 41 4 6 4 1

=

19

Traditional Computer Vision Convolutional Neural Networks Object Detection

Multiple Input Channels

Figure: Convolution of multichannel image

20

Traditional Computer Vision Convolutional Neural Networks Object Detection

Multiple Filters

Figure: Convolution of multichannel image with two filters21

CONVOLUTIONAL NEURALNETWORKS

Parameter Learning

Basic CNNs

Residual Networks

Inception Networks

Traditional Computer Vision Convolutional Neural Networks Object Detection

Basic Concepts

Figure: Convolutional layers stacked

23

Traditional Computer Vision Convolutional Neural Networks Object Detection

Basic Concepts - Takeaway

Image ClassificationParameters (filters) Learning [LBD+89]Weight SharingFeature Extraction

24

Traditional Computer Vision Convolutional Neural Networks Object Detection

Basic Concepts - Feature Abstractions

Figure: Feature Visualization [ZF13]

25

Traditional Computer Vision Convolutional Neural Networks Object Detection

Basic Concepts - Pooling Layers

Sampling important FeaturesReduce Computation TimeMake Features Robust

26

Traditional Computer Vision Convolutional Neural Networks Object Detection

Basic Concepts - Pooling Layers (Example)

Pooling Layer - Max Pooling

9 2 4 13 1 8 24 5 9 25 6 0 1

−→[9 86 9

]

27

Traditional Computer Vision Convolutional Neural Networks Object Detection

Basic Concepts - Architecture

Figure: Convolutional Neural Network - Example

28

Traditional Computer Vision Convolutional Neural Networks Object Detection

CNN Architecture - Lenet-5

Figure: Lenet-5 Architecture [LBBH98]

29

Traditional Computer Vision Convolutional Neural Networks Object Detection

CNN Architecture - VGG

Figure: VGG Architecture

30

Traditional Computer Vision Convolutional Neural Networks Object Detection

CNN Architecture - AlexNet

Figure: AlexNet Architecture [KSH12]

31

Traditional Computer Vision Convolutional Neural Networks Object Detection

CNN - Problems

Vanishing GradientExploding GradientComputational Complexity

32

Traditional Computer Vision Convolutional Neural Networks Object Detection

Residual Block

Figure: Residual Block (Skip Connection) [HZRS15]

33

Traditional Computer Vision Convolutional Neural Networks Object Detection

Residual Network

Figure: CNN Architecture - ResNet-34 [HZRS15]

34

Traditional Computer Vision Convolutional Neural Networks Object Detection

1x1 Convolution

Figure: 1x1 Convolution [LCY13]

35

Traditional Computer Vision Convolutional Neural Networks Object Detection

Inception Module - Idea

Figure: Inception Module Naive Version [SLJ+14]

36

Traditional Computer Vision Convolutional Neural Networks Object Detection

Inception Module - Redone

Figure: Inception Module With Dimension Reduction [SLJ+14]

37

Traditional Computer Vision Convolutional Neural Networks Object Detection

Inception Network

Figure: Inception Network (GoogLeNet) [SLJ+14]

38

OBJECT DETECTION

Task Outline

YOLO

RCNN Family

Other Influental Models

Speed/Accuracy Trade-Off

Traditional Computer Vision Convolutional Neural Networks Object Detection

Visualizing the Task

Figure: Object Detection Task40

Traditional Computer Vision Convolutional Neural Networks Object Detection

Understanding the Bounding Box Error

Figure: Bounding Box Missmatch

41

Traditional Computer Vision Convolutional Neural Networks Object Detection

Defining the IoU

Figure: Intersection over Union

42

Traditional Computer Vision Convolutional Neural Networks Object Detection

Gaining Intuition on IoU

Figure: Intersection over Union - Example

43

Traditional Computer Vision Convolutional Neural Networks Object Detection

Similar Bounding Boxes Problem

Figure: Elimination of Multiple Bounding Boxes

44

Traditional Computer Vision Convolutional Neural Networks Object Detection

Non-Maximum Suppression

Threshold every bounding boxSort bounding boxes by detection probability indecresing orderFor each bounding box bi remove all bounding boxesbj(j = i) such that IoU(bi,bj) ≥ t for some fixed t

45

Traditional Computer Vision Convolutional Neural Networks Object Detection

YOLO - Introduction

Figure: Grid for YOLO

y =

pcbxbybwbhc1c2...cn

0You Only Look Once: Unified, Real-Time Object Detection [RDGF15]

46

Traditional Computer Vision Convolutional Neural Networks Object Detection

Limitations (already?)

Problem: Multiple objects centered in same cell

47

Traditional Computer Vision Convolutional Neural Networks Object Detection

Anchor Boxes

Choose a number of anchors (predefined bboxes)Select a ratio (width and height) for each of themModify the output to include this anchors...Profit

48

Traditional Computer Vision Convolutional Neural Networks Object Detection

Anchor Boxes - Example

y1 =

pc1bx1by1bw1bh1c11...cn1

, y2 =

pc2bx2by2bw2bh2c12...cn2

, y =

[y1y2

]

49

Traditional Computer Vision Convolutional Neural Networks Object Detection

YOLO - Loss Fucntion

L(y, y) = λcoord

s2∑i=0

B∑j=0

1objij [(xi − xi)2 + (yi − yi)2]

+ λcoords2∑i=0

B∑j=0

1objij [(

√wi −√

wi)2 + (

√hi −

√hi)

2]

+s2∑i=0

B∑j=0

1objij (Ci − Ci)

2

+ λnoobjs2∑i=0

B∑j=0

1objij (Ci − Ci)

2

+s2∑i=0

1obji

∑c∈classes

(pi(c)− pi(c))2

50

Traditional Computer Vision Convolutional Neural Networks Object Detection

Region Based Approach

Propose Regions of InterestClassify each RoIRegress Bounding Box Coordinates

51

Traditional Computer Vision Convolutional Neural Networks Object Detection

Region Models

Regions with CNN (R-CNN) [GDDM13]Fast R-CNN [Gir15]Faster R-CNN [RHGS15]Mask R-CNN [HGDG17]

52

Traditional Computer Vision Convolutional Neural Networks Object Detection

Region Proposals - Selective Search

Figure: Selective Search Algorithm Visualized

53

Traditional Computer Vision Convolutional Neural Networks Object Detection

R-CNN

Figure: R-CNN Pipeline

54

Traditional Computer Vision Convolutional Neural Networks Object Detection

Fast R-CNN

Convolution Based Sliding WindowROI PoolingSoftmax Classification

55

Traditional Computer Vision Convolutional Neural Networks Object Detection

Fast R-CNN - Sliding Window

Figure: Sliding Window - CNN Implementation

56

Traditional Computer Vision Convolutional Neural Networks Object Detection

Fast R-CNN - Visualized

Figure: Fast R-CNN Pipeline

57

Traditional Computer Vision Convolutional Neural Networks Object Detection

Fast R-CNN - Loss

L(p,u, tu, v) = Lcls(p,u) + λ[u ≥ 1]Lloc(tu, v)

Lcls(p,u) = − logpu

Lloc(tu, v) =∑

i∈{x,y,w,h}smoothL1(tui − vi)

smoothL1(x) ={0.5x2, if x ≤ 1

x− 0.5, otherwise

58

Traditional Computer Vision Convolutional Neural Networks Object Detection

Faster R-CNN

Bottleneck: Region Proposals by Selective Search (2s)Solution: Region Proposals by CNN (0.01s)

59

Traditional Computer Vision Convolutional Neural Networks Object Detection

Region Proposal Network

Figure: Region Proposal Network for Faster R-CNN60

Traditional Computer Vision Convolutional Neural Networks Object Detection

RPN - Loss

L(pi, ti) =1

Ncls

∑iLcls(pi,p∗i ) + λ

1

Nreg

∑ip∗i Lreg(ti, t∗i )

61

Traditional Computer Vision Convolutional Neural Networks Object Detection

Faster R-CNN - Architecture

Figure: Model Scheme of Faster R-CNN

62

Traditional Computer Vision Convolutional Neural Networks Object Detection

Mask R-CNN

Figure: Model Scheme of Faster R-CNN

63

Traditional Computer Vision Convolutional Neural Networks Object Detection

Other Influential Models

RetinaNet (Focal Loss) [LGG+17]Single Shot Detector [LAE+15]

64

Traditional Computer Vision Convolutional Neural Networks Object Detection

RetinaNet

Figure: Retina Net - Overview

65

Traditional Computer Vision Convolutional Neural Networks Object Detection

Speed vs. Precision

Figure: GPU Time vs. Precision [HRS+16]

66

Traditional Computer Vision Convolutional Neural Networks Object Detection

Lecture Pronouncement

CONVERGENCE

67

Traditional Computer Vision Convolutional Neural Networks Object Detection

References I

Ross B. Girshick, Jeff Donahue, Trevor Darrell, andJitendra Malik, Rich feature hierarchies for accurate objectdetection and semantic segmentation, CoRRabs/1311.2524 (2013).Ross B. Girshick, Fast R-CNN, CoRR abs/1504.08083(2015).Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross B.Girshick, Mask R-CNN, CoRR abs/1703.06870 (2017).

68

Traditional Computer Vision Convolutional Neural Networks Object Detection

References II

Jonathan Huang, Vivek Rathod, Chen Sun, MenglongZhu, Anoop Korattikara, Alireza Fathi, Ian Fischer,Zbigniew Wojna, Yang Song, Sergio Guadarrama, andKevin Murphy, Speed/accuracy trade-offs for modernconvolutional object detectors, CoRR abs/1611.10012(2016).Kaiming He, Xiangyu Zhang, Shaoqing Ren, and JianSun, Deep residual learning for image recognition, CoRRabs/1512.03385 (2015).

69

Traditional Computer Vision Convolutional Neural Networks Object Detection

References III

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton,Imagenet classification with deep convolutional neuralnetworks, Advances in Neural Information ProcessingSystems 25 (F. Pereira, C. J. C. Burges, L. Bottou, andK. Q. Weinberger, eds.), Curran Associates, Inc., 2012,pp. 1097–1105.

Wei Liu, Dragomir Anguelov, Dumitru Erhan, ChristianSzegedy, Scott E. Reed, Cheng-Yang Fu, andAlexander C. Berg, SSD: single shot multibox detector,CoRR abs/1512.02325 (2015).Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner,Gradient-based learning applied to document recognition,IEEE (1998), 2278–2324.

70

Traditional Computer Vision Convolutional Neural Networks Object Detection

References IV

Yann Lecun, Bernhard Boser, John Denker, DonHenderson, R E. Howard, W.E. Hubbard, and LarryJackel, Backpropagation applied to handwritten zip coderecognition, Neural Computation 1 (1989), 541–551.

Min Lin, Qiang Chen, and Shuicheng Yan, Network innetwork, CoRR abs/1312.4400 (2013).Tsung-Yi Lin, Priya Goyal, Ross B. Girshick, Kaiming He,and Piotr Dollár, Focal loss for dense object detection,CoRR abs/1708.02002 (2017).Joseph Redmon, Santosh Kumar Divvala, Ross B.Girshick, and Ali Farhadi, You only look once: Unified,real-time object detection, CoRR abs/1506.02640 (2015).

71

Traditional Computer Vision Convolutional Neural Networks Object Detection

References V

Shaoqing Ren, Kaiming He, Ross B. Girshick, and JianSun, Faster R-CNN: towards real-time object detection withregion proposal networks, CoRR abs/1506.01497 (2015).

Christian Szegedy, Wei Liu, Yangqing Jia, PierreSermanet, Scott E. Reed, Dragomir Anguelov, DumitruErhan, Vincent Vanhoucke, and Andrew Rabinovich,Going deeper with convolutions, CoRR abs/1409.4842(2014).Matthew D. Zeiler and Rob Fergus, Visualizing andunderstanding convolutional networks, CoRRabs/1311.2901 (2013).

72