Presented by Mohammad Nayeem Teli · 2019-12-04 · Mohammad Nayeem Teli. Convolutional Neural...

Post on 05-Jul-2020

Convolutional Neural Networks

Presented by

Mohammad Nayeem Teli

Convolutional Neural Networks
■ ConvNets have been around for a long time.
■ One of the seminal works came from LeCun et al. (1989).
■ MNIST digit recognition and OCR
■ Most recent version: LeNet-5

ImageNet
■ Approx. 14 M labeled images, 20 K classes
■ Images gathered from the internet
■ Human labels via Amazon Mechanical Turk
■ For the object recognition challenge: 1.2 M training images and 1000 classes
■ http://image-net.org/

Performance on ImageNet

Source: https://paperswithcode.com/sota/image-classification-on-imagenet

Face recognition and verification system using FaceNet and DeepFace Algorithms - Implementation

FaceNet: A Unified Embedding for Face Recognition and Clustering

Florian Schroff, Dmitry Kalenichenko and James Philbin (2015), in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2015.

DeepFace: Closing the Gap to Human-Level Performance in Face Verification

Yaniv Taigman, Ming Yang, Marc'Aurelio Ranzato and Lior Wolf (2014), in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2014.

Object Detection for autonomous car driving application - YOLO Algorithm Implementation

You Only Look Once: Unified, Real-Time Object Detection

Joseph Redmon, Santosh Divvala, Ross Girshick and Ali Farhadi (2016), in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2016.

■ You Only Look Once (YOLO) is an object detection model.
■ It runs an input image through a convolutional neural network.

Deep Learning & Art: Neural Style Transfer

A Neural Algorithm of Artistic Style

Leon A. Gatys, Alexander S. Ecker and Matthias Bethge (2015), arXiv:1508.06576.

■ Implementation of the neural style transfer algorithm
■ Generate novel artistic images

Cross-Correlation and Convolution
■ Cross-correlation is a similarity measure between image I and kernel K. Convolution is similar, although one signal is reversed.
■ They have two key features:
  ■ Shift invariance: the same operation is performed at every point in the image.
  ■ Linearity: every pixel is replaced with a linear combination of its neighbors.

Cross-correlation:

$$S(i, j) = (I \star K)(i, j) = \sum_m \sum_n I(i + m, j + n)\, K(m, n)$$

Convolution:

$$S(i, j) = (K * I)(i, j) = \sum_m \sum_n I(i - m, j - n)\, K(m, n)$$

or,

$$S(i, j) = (K * I)(i, j) = \sum_m \sum_n I(m, n)\, K(i - m, j - n)$$
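
The two definitions above differ only in whether the kernel is reversed. A minimal NumPy sketch, assuming a "valid" output region (no padding) and zero-based indices; the function names and the small test arrays are illustrative and do not come from the slides:

```python
import numpy as np

def cross_correlate2d(image, kernel):
    """Valid cross-correlation: S(i, j) = sum_m sum_n I(i+m, j+n) K(m, n)."""
    f_rows, f_cols = kernel.shape
    rows = image.shape[0] - f_rows + 1
    cols = image.shape[1] - f_cols + 1
    out = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            out[i, j] = np.sum(image[i:i + f_rows, j:j + f_cols] * kernel)
    return out

def convolve2d(image, kernel):
    """Valid convolution: cross-correlation with the kernel flipped on both axes."""
    return cross_correlate2d(image, kernel[::-1, ::-1])

image = np.arange(16, dtype=float).reshape(4, 4)   # hypothetical 4 x 4 image
kernel = np.array([[1.0, 2.0], [3.0, 4.0]])        # hypothetical asymmetric kernel

corr = cross_correlate2d(image, kernel)
conv = convolve2d(image, kernel)
```

For a kernel that is symmetric under a 180 degree rotation the two results coincide, which is why the distinction often goes unnoticed in practice.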

Convolution

Gaussian

Cross-Correlation and Convolution

Image (n × n):
 5 15  4  0 -1
10  1  5  1  0
 6  9 11  1 -1
 5 15  4  0 -1
10  1  5  1  0

Filter (f × f):
-1  0  1
-2  0  2
-1  0  1

Cross-Correlation and Convolution

Image * filter = output:

 -6 -23 -27
  4 -31 -34
 -2 -38 -27

Top-left entry:
5*(-1) + 15*0 + 4*1 + 10*(-2) + 1*0 + 5*2 + 6*(-1) + 9*0 + 11*1 = -6

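Cross-Correlation and Convolution

Image: n × n, filter: f × f
Output image: (n − f + 1) × (n − f + 1)

The filter slides over the image one position at a time; each output entry is the sum of the element-wise products of the filter and the window it covers.

Output image:
 -6 -23 -27
  4 -31 -34
 -2 -38 -27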
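
The sliding-window computation on this slide can be sketched in NumPy. The image and filter values are read off the slide; the last two image rows are reassembled from split fragments, so they should be treated as an assumption. The output has size (n − f + 1) × (n − f + 1).

```python
import numpy as np

# Image and filter as read off the slide (rows 4 and 5 are an assumption,
# reassembled from fragments of the extracted text).
image = np.array([
    [5, 15, 4, 0, -1],
    [10, 1, 5, 1, 0],
    [6, 9, 11, 1, -1],
    [5, 15, 4, 0, -1],
    [10, 1, 5, 1, 0],
])
kernel = np.array([
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
])

n, f = image.shape[0], kernel.shape[0]
output = np.zeros((n - f + 1, n - f + 1), dtype=int)
for i in range(n - f + 1):
    for j in range(n - f + 1):
        # Element-wise product of the f x f window with the filter, then sum.
        output[i, j] = np.sum(image[i:i + f, j:j + f] * kernel)

print(output.shape)
print(output)
```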

Cross-Correlation and Convolution

Vertical edge detector:
-1  0  1
-2  0  2
-1  0  1

Horizontal edge detector:
-1 -2 -1
 0  0  0
 1  2  1

[Figure: input image * each filter = edge map]
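
To see why the two filters are named this way, the sketch below applies both to a synthetic image containing a single vertical edge. The 6 × 6 test image and the helper name are hypothetical and chosen only for illustration.

```python
import numpy as np

def cross_correlate2d(image, kernel):
    """Valid cross-correlation of a 2-D image with a square kernel."""
    f = kernel.shape[0]
    rows, cols = image.shape[0] - f + 1, image.shape[1] - f + 1
    out = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            out[i, j] = np.sum(image[i:i + f, j:j + f] * kernel)
    return out

vertical = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
horizontal = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])

# Hypothetical image: dark left half, bright right half (one vertical edge).
image = np.zeros((6, 6))
image[:, 3:] = 10

vertical_response = cross_correlate2d(image, vertical)
horizontal_response = cross_correlate2d(image, horizontal)
```

The vertical detector responds only in the columns that straddle the edge, while the horizontal detector returns zero everywhere because the image does not change from row to row.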

Cross-Correlation and Convolution

-1 0 1

-2 0 2

-1 0 1

1 2 1

0 0 0

-1 -2 -1

Cross-correlation

Convolution

Neural Network

Inputs: x0, x1, x2
Weights: w0 = -10, w1 = +20, w2 = +20
g - activation function

$$y = g(w_0 x_0 + w_1 x_1 + w_2 x_2)$$

$$g(z) = \frac{1}{1 + e^{-z}}$$
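
A minimal sketch of this single unit, using the weights from the slide. Setting x0 = 1 so that w0 acts as a bias is an assumption, since the slide does not state the value of x0; the function names are illustrative.

```python
import math

def sigmoid(z):
    """Logistic activation g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x1, x2, w0=-10.0, w1=20.0, w2=20.0, x0=1.0):
    # x0 = 1 is an assumption: it makes w0 behave as a bias term.
    z = w0 * x0 + w1 * x1 + w2 * x2
    return sigmoid(z)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, neuron(x1, x2))
```

Under that assumption the unit outputs a value near 0 only for (0, 0) and near 1 otherwise, so on binary inputs it behaves like a logical OR.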

Neural Network

Inputs: x1, x2, x3
Layer 2 activations: a21, a22, a23
Layer 3 activations: a31, a32
Output: a41

$$a_{21} = g(w_{211} x_1 + w_{212} x_2 + w_{213} x_3)$$

Activation Functions
■ Sigmoid
■ ReLU
■ Hyperbolic tangent
■ Leaky ReLU
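
The four activation functions listed above can be written in a few lines of NumPy. This is a sketch using their standard definitions; the leaky ReLU slope `alpha = 0.01` is a commonly used value assumed here, not one given on the slide.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def tanh(z):
    return np.tanh(z)

def leaky_relu(z, alpha=0.01):
    # alpha is the slope for negative inputs (assumed value).
    return np.where(z > 0, z, alpha * z)

z = np.array([-2.0, 0.0, 2.0])
for fn in (sigmoid, relu, tanh, leaky_relu):
    print(fn.__name__, fn(z))
```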

CNN

Image: n × n, filter: f × f

Image (n = 5):
 5 15  4  0 -1
10  1  5  1  0
 6  9 11  1 -1
 5 15  4  0 -1
10  1  5  1  0

Hand-designed filter (f = 3):
-1  0  1
-2  0  2
-1  0  1

Learned filter (f = 3):
w1 w2 w3
w4 w5 w6
w7 w8 w9

# of parameters to learn: 3 × 3 = 9

CNN

Image: 5 × 5, filter: 3 × 3

Convolutional layer (one shared filter w1 ... w9):
# of parameters to learn: 3 × 3 = 9

Fully connected layer (inputs x1 ... x25, activations a1 ... a9, output ŷ):
# of parameters to learn: 9 × 25 = 225
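
The two parameter counts on this slide follow directly from the layer sizes. A short sketch of the arithmetic, ignoring bias terms as the slide does; the variable names are illustrative:

```python
n, f = 5, 3                      # image and filter sizes from the slide

conv_params = f * f              # one shared f x f filter
outputs = (n - f + 1) ** 2       # 3 x 3 = 9 output activations
dense_params = outputs * n * n   # every output connected to every pixel

print(conv_params, dense_params)
```

Weight sharing is what keeps the convolutional count independent of the image size, whereas the fully connected count grows with n².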

CNN over multiple channels

Image: n × n × 3 (5 × 5 × 3)

Channel 1:
 1  5  7  1 -2
10  1  5  1  6
 6  9 11  1  1
 5 15 -3  0 -1
10  1  2  1  0

Channel 2:
 5 15  4  0 -1
10  1  5  1  0
 6  9 11  1 -1
 5 15  4  0 -1
10  1  5  1  0

Channel 3:
 2  1 -3  8 13
10  1  5  1 15
 6  9 11  1 -1
 5 15 -6  0 -1
10  1  0  1  0

Filter: f × f × 3 (3 × 3 × 3), weights w1 ... w27, one 3 × 3 slice per channel:
w1 w2 w3     w10 w11 w12     w19 w20 w21
w4 w5 w6     w13 w14 w15     w22 w23 w24
w7 w8 w9     w16 w17 w18     w25 w26 w27

Output: (n − f + 1) × (n − f + 1) (3 × 3 × 1):
o1 o2 o3
o4 o5 o6
o7 o8 o9
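
With several input channels, each output entry sums the element-wise products over the f × f window and over all channels, which is why the output has a single channel. A minimal sketch; the random image and filter values are placeholders, not the numbers on the slide:

```python
import numpy as np

def conv_multichannel(image, kernel):
    """Valid cross-correlation of an (n, n, c) image with an (f, f, c) filter."""
    n, f = image.shape[0], kernel.shape[0]
    out = np.zeros((n - f + 1, n - f + 1))
    for i in range(n - f + 1):
        for j in range(n - f + 1):
            # Sum over the f x f window and over all channels at once.
            out[i, j] = np.sum(image[i:i + f, j:j + f, :] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.integers(-5, 16, size=(5, 5, 3)).astype(float)  # placeholder pixels
kernel = rng.standard_normal((3, 3, 3))                     # placeholder for w1 ... w27

output = conv_multichannel(image, kernel)
print(output.shape)
```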

CNN using multiple filters

Image: n × n × 3 (5 × 5 × 3), the same three channels as on the previous slide

Horizontal edge filter: f × f × 3, weights w1 ... w27
Image * horizontal edge filter = o1 ... o9 (3 × 3)

Vertical edge filter: f × f × 3, weights v1 ... v27
Image * vertical edge filter = o10 ... o18 (3 × 3)

Stacked output: (n − f + 1) × (n − f + 1) × 2

CNN layer

Image: n × n × 3
Horizontal edge filter: f × f × 3 (weights w1 ... w27)
Vertical edge filter: f × f × 3 (weights v1 ... v27)

Image * horizontal edge filter = [o1 ... o9], then ReLU([o1 ... o9] + b1)
Image * vertical edge filter = [o10 ... o18], then ReLU([o10 ... o18] + b2)

Output: (n − f + 1) × (n − f + 1) × 2
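
Putting the pieces of this slide together, a convolutional layer applies each filter, adds that filter's bias, and passes the result through ReLU. The sketch below assumes stride 1 and no padding; the function name and the random values standing in for the image, filters and biases are hypothetical.

```python
import numpy as np

def conv_layer(image, filters, biases):
    """image: (n, n, c); filters: (k, f, f, c); biases: (k,).
    Returns ReLU activations of shape (n - f + 1, n - f + 1, k)."""
    n, k, f = image.shape[0], filters.shape[0], filters.shape[1]
    size = n - f + 1
    out = np.zeros((size, size, k))
    for idx in range(k):
        for i in range(size):
            for j in range(size):
                out[i, j, idx] = np.sum(image[i:i + f, j:j + f, :] * filters[idx])
        out[:, :, idx] += biases[idx]      # one bias per filter (b1, b2)
    return np.maximum(out, 0.0)            # ReLU

rng = np.random.default_rng(0)
image = rng.standard_normal((5, 5, 3))        # placeholder n x n x 3 image
filters = rng.standard_normal((2, 3, 3, 3))   # placeholders for the w and v filters
biases = np.array([0.1, -0.2])                # placeholder b1, b2

activations = conv_layer(image, filters, biases)
print(activations.shape)
```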

Padding

Image: n × n, filter: f × f, padding: p

Output image: (n + 2p − f + 1) × (n + 2p − f + 1)

Image padded with p = 1 ring of zeros:
0  0  0  0  0  0  0
0  5 15  4  0 -1  0
0 10  1  5  1  0  0
0  6  9 11  1 -1  0
0  5 15  4  0 -1  0
0 10  1  5  1  0  0
0  0  0  0  0  0  0

For the output to stay n × n:

$$n + 2p - f + 1 = n \quad\Rightarrow\quad p = \frac{f - 1}{2}$$

With n = 5 and f = 3: p = 1 and 5 + 2 × 1 − 3 + 1 = 5.
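
Zero padding and the resulting output size can be checked with `numpy.pad`. This is a sketch for an odd filter size f; the image values are the ones read off the slides, with the last two rows reassembled from fragments and therefore an assumption.

```python
import numpy as np

image = np.array([
    [5, 15, 4, 0, -1],
    [10, 1, 5, 1, 0],
    [6, 9, 11, 1, -1],
    [5, 15, 4, 0, -1],   # assumed row, reassembled from fragments
    [10, 1, 5, 1, 0],    # assumed row, reassembled from fragments
])

f = 3
p = (f - 1) // 2                       # padding that keeps the size for odd f
padded = np.pad(image, p, mode="constant", constant_values=0)

n = image.shape[0]
out_size = n + 2 * p - f + 1           # output size after an f x f filter

print(padded.shape, out_size)
```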

Strides

Image: n × n, filter: f × f, padding: p, stride: s

Output image:

$$\left\lfloor \frac{n + 2p - f}{s} + 1 \right\rfloor \times \left\lfloor \frac{n + 2p - f}{s} + 1 \right\rfloor$$

stride = 1: the filter moves one position at a time and the output is (n + 2p − f + 1) × (n + 2p − f + 1).

stride = 2: the filter moves two positions at a time. With n = 5, p = 0, f = 3:

$$\left\lfloor \frac{5 + 2 \times 0 - 3}{2} + 1 \right\rfloor \times \left\lfloor \frac{5 + 2 \times 0 - 3}{2} + 1 \right\rfloor = 2 \times 2$$
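
The output-size formula and a strided cross-correlation can be sketched as follows. The helper names are illustrative; the image is the one read off the slides (last two rows reassembled from fragments, so an assumption) and no padding is applied.

```python
import numpy as np

def output_size(n, f, p=0, s=1):
    """floor((n + 2p - f) / s + 1) for non-negative n + 2p - f."""
    return (n + 2 * p - f) // s + 1

def strided_cross_correlate2d(image, kernel, s=1):
    n, f = image.shape[0], kernel.shape[0]
    size = output_size(n, f, p=0, s=s)
    out = np.zeros((size, size), dtype=int)
    for i in range(size):
        for j in range(size):
            r, c = i * s, j * s          # top-left corner of the window
            out[i, j] = np.sum(image[r:r + f, c:c + f] * kernel)
    return out

image = np.array([
    [5, 15, 4, 0, -1],
    [10, 1, 5, 1, 0],
    [6, 9, 11, 1, -1],
    [5, 15, 4, 0, -1],   # assumed row
    [10, 1, 5, 1, 0],    # assumed row
])
kernel = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])

strided = strided_cross_correlate2d(image, kernel, s=2)
print(strided.shape)
```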

CNN layer

Image: n × n × 3
Horizontal edge filter: f × f × 3 (weights w1 ... w27)
Vertical edge filter: f × f × 3 (weights v1 ... v27)

Image * horizontal edge filter = [o1 ... o9], then ReLU([o1 ... o9] + b1)
Image * vertical edge filter = [o10 ... o18], then ReLU([o10 ... o18] + b2)

Output size per filter, with padding p and stride s:

$$\left\lfloor \frac{n + 2p - f}{s} + 1 \right\rfloor \times \left\lfloor \frac{n + 2p - f}{s} + 1 \right\rfloor$$

Convolutional Neural Network

Input: 32 × 32 × 3
Layer 1: f = 3, s = 1, p = 0, 10 filters, output 30 × 30 × 10
Layer 2: f = 5, s = 2, p = 0, 20 filters, output 13 × 13 × 20
Layer 3: f = 5, s = 2, p = 0, 40 filters, output 5 × 5 × 40
Flattened: 5 × 5 × 40 = 1000 values
Output: logistic output (0 / 1) or softmax output

Spatial size after each layer:

$$\left\lfloor \frac{n + 2p - f}{s} + 1 \right\rfloor \times \left\lfloor \frac{n + 2p - f}{s} + 1 \right\rfloor$$
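
The layer shapes on this slide can be traced by applying the output-size formula layer by layer. A short sketch; the dictionary layout and names are illustrative:

```python
def output_size(n, f, p=0, s=1):
    """floor((n + 2p - f) / s + 1)."""
    return (n + 2 * p - f) // s + 1

layers = [
    {"f": 3, "s": 1, "p": 0, "filters": 10},
    {"f": 5, "s": 2, "p": 0, "filters": 20},
    {"f": 5, "s": 2, "p": 0, "filters": 40},
]

n, channels = 32, 3
shapes = [(n, n, channels)]
for layer in layers:
    n = output_size(n, layer["f"], layer["p"], layer["s"])
    channels = layer["filters"]          # one output channel per filter
    shapes.append((n, n, channels))

flattened = n * n * channels             # length of the vector fed to the output unit
print(shapes, flattened)
```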

Max Pooling

Input (4 × 4):
 1  5  6 -1
10  8  2  3
 4  1  2 -5
 0  9 -1  0

Output (maximum of each 2 × 2 block, stride 2):
10  6
 9  2
Max Pooling

f = 3, s = 1, p = 0

Input (5 × 5):
 1  5  6 -1 10
10  8  2  3  1
 4  1  2 -5  6
 1  9 -1  6  1
 5  8 -1  2 -7

Output size:

$$\left\lfloor \frac{n + 2p - f}{s} + 1 \right\rfloor \times \left\lfloor \frac{n + 2p - f}{s} + 1 \right\rfloor = 3 \times 3$$

Output (maximum of each 3 × 3 window):
10  8 10
10  9  6
 9  9  6
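
Max pooling uses the same sliding window as convolution but takes the maximum instead of a weighted sum, so it has no parameters to learn. A minimal sketch with no padding; the function name is illustrative and the input is the 5 × 5 grid reassembled from the slide.

```python
import numpy as np

def max_pool2d(x, f, s):
    """Max pooling over f x f windows with stride s, no padding."""
    size = (x.shape[0] - f) // s + 1
    out = np.zeros((size, size), dtype=x.dtype)
    for i in range(size):
        for j in range(size):
            out[i, j] = x[i * s:i * s + f, j * s:j * s + f].max()
    return out

x = np.array([
    [1, 5, 6, -1, 10],
    [10, 8, 2, 3, 1],
    [4, 1, 2, -5, 6],
    [1, 9, -1, 6, 1],
    [5, 8, -1, 2, -7],
])

pooled = max_pool2d(x, f=3, s=1)
print(pooled)
```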

Average Pooling

Input (4 × 4):
 1  5  6 -1
10  8  2  3
 4  1  2 -5
 0  9 -1  0

Output (mean of each 2 × 2 block, stride 2):
6    2.5
3.5  -1
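
Average pooling replaces the maximum with the mean of each window. A sketch under the same assumptions as for max pooling (square input, no padding); the function name is illustrative.

```python
import numpy as np

def avg_pool2d(x, f, s):
    """Average pooling over f x f windows with stride s, no padding."""
    size = (x.shape[0] - f) // s + 1
    out = np.zeros((size, size))
    for i in range(size):
        for j in range(size):
            out[i, j] = x[i * s:i * s + f, j * s:j * s + f].mean()
    return out

x = np.array([
    [1, 5, 6, -1],
    [10, 8, 2, 3],
    [4, 1, 2, -5],
    [0, 9, -1, 0],
])

averaged = avg_pool2d(x, f=2, s=2)
print(averaged)
```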

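LeNet - 5

Layer 1:
■ Convolution: f = 5, s = 1, p = 0, 6 filters
■ Max pooling: f = 2, s = 2

Layer 2:
■ Convolution: f = 5, s = 1, p = 0, 16 filters
■ Max pooling: f = 2, s = 2

Output size of each step:

$$\left\lfloor \frac{n + 2p - f}{s} + 1 \right\rfloor \times \left\lfloor \frac{n + 2p - f}{s} + 1 \right\rfloor$$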
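
The shapes produced by these two layers can be traced with the output-size formula. The 32 × 32 × 1 input is an assumption (the slide does not state the input size), and only the two layers shown on the slide are included.

```python
def output_size(n, f, p=0, s=1):
    """floor((n + 2p - f) / s + 1)."""
    return (n + 2 * p - f) // s + 1

# (name, f, s, p, filters); filters is None for pooling, which keeps the channel count.
steps = [
    ("conv1", 5, 1, 0, 6),
    ("pool1", 2, 2, 0, None),
    ("conv2", 5, 1, 0, 16),
    ("pool2", 2, 2, 0, None),
]

n, channels = 32, 1      # assumed input size, not given on the slide
trace = []
for name, f, s, p, filters in steps:
    n = output_size(n, f, p, s)
    if filters is not None:
        channels = filters
    trace.append((name, n, n, channels))

for row in trace:
    print(row)
```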