IIT Patna 1

Introduction to Deep Learning

Arijit Mondal
Dept. of Computer Science & Engineering

Indian Institute of Technology Patna

[email protected]

IIT Patna 2

Convolutional Neural Network

IIT Patna 3

Introduction

• Specialized neural network for processing data that has a grid-like topology

• Time series data (one dimensional)
• Image (two dimensional)

• Found to be reasonably suitable for certain classes of problems, e.g. computer vision

• Instead of matrix multiplication, it uses convolution in at least one of the layers

IIT Patna 4

Convolution operation

• Consider the scenario of locating a spaceship with a laser sensor

• Suppose the sensor is noisy

• Accurate estimation is not possible

• A weighted average of the location readings can provide a good estimate: s(t) = ∫ x(a) w(t − a) da

• x(a) — location reported by the sensor at time a, t — current time, w — weighting function
• This operation is known as convolution
• Usually denoted as s(t) = (x ∗ w)(t)

• In neural network terminology, x is the input, w is the kernel, and the output is referred to as the feature map

• Discrete convolution can be represented as s(t) = (x ∗ w)(t) = ∑_{a=−∞}^{∞} x(a) w(t − a) (see the sketch below)
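
A minimal numpy sketch of this discrete convolution, using a made-up noisy position trace x and a small averaging kernel w (all values are illustrative only):

```python
import numpy as np

# Hypothetical noisy position readings x(a) and a 3-tap averaging kernel w
x = np.array([0.0, 1.1, 1.9, 3.2, 3.9, 5.1])
w = np.array([0.25, 0.5, 0.25])   # weights summing to 1

# np.convolve implements s(t) = sum_a x(a) w(t - a), i.e. it flips the kernel
s = np.convolve(x, w, mode='valid')
print(s)   # smoothed position estimates
```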

IIT Patna 5

Convolution operation (contd)

• In a neural network the input is multidimensional, and so is the kernel

• These are referred to as tensors

• Two-dimensional convolution can be defined as

  s(i, j) = (I ∗ K)(i, j) = ∑_m ∑_n I(m, n) K(i − m, j − n) = ∑_m ∑_n I(i − m, j − n) K(m, n)

• Commutative

• Many neural network libraries implement it as cross-correlation:

  s(i, j) = (I ∗ K)(i, j) = ∑_m ∑_n I(i + m, j + n) K(m, n)

• Here the kernel is not flipped (see the sketch below)
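
A short numpy sketch of the 2-D cross-correlation above (a naive reference implementation, not how frameworks actually compute it); flipping the kernel recovers true convolution:

```python
import numpy as np

def cross_correlate2d(I, K):
    """Valid cross-correlation: s(i, j) = sum_m sum_n I(i+m, j+n) K(m, n)."""
    H, W = I.shape
    kH, kW = K.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(I[i:i + kH, j:j + kW] * K)
    return out

def convolve2d(I, K):
    """True convolution is cross-correlation with the kernel flipped in both axes."""
    return cross_correlate2d(I, np.flip(K))
```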

IIT Patna 6

2D convolution

Input (3×4):
  a b c d
  e f g h
  i j k l

Kernel (2×2):
  w x
  y z

Output (2×3):
  aw+bx+ey+fz   bw+cx+fy+gz   cw+dx+gy+hz
  ew+fx+iy+jz   fw+gx+jy+kz   gw+hx+ky+lz
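
The output map above can be checked numerically by replacing the symbols a–l and w, x, y, z with arbitrary numbers and running scipy's cross-correlation (the same operation as on the previous slide):

```python
import numpy as np
from scipy.signal import correlate2d

I = np.arange(1.0, 13.0).reshape(3, 4)   # stands in for [[a b c d], [e f g h], [i j k l]]
K = np.array([[1.0, 2.0],                # stands in for [[w x],
              [3.0, 4.0]])               #                [y z]]

out = correlate2d(I, K, mode='valid')    # valid cross-correlation
print(out.shape)                         # (2, 3): a 3x4 input and a 2x2 kernel give a 2x3 map
print(out[0, 0])                         # 1*1 + 2*2 + 5*3 + 6*4 = 44, i.e. aw + bx + ey + fz
```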

IIT Patna 7

Advantages

• Convolution can exploit the following properties

• Sparse interaction (also known as sparse connectivity or sparse weights)
• Parameter sharing
• Equivariant representation

IIT Patna 8

Sparse interaction

• Traditional neural network layers use matrix multiplication to describe how outputs and inputs are related

• Convolution uses a smaller kernel

• Significant reduction in the number of parameters
• Computing the output requires fewer operations

• For example, if there are m inputs and n outputs, a traditional neural network will require m × n parameters

• If each output is connected to at most k input units, the number of parameters will be k × n (see the sketch below)
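
A back-of-the-envelope comparison of the two parameter counts, with hypothetical layer sizes:

```python
# Hypothetical sizes, for illustration only
m = 320 * 280          # number of inputs (e.g. pixels of a 320x280 image)
n = 320 * 280          # number of outputs, same spatial size
k = 3 * 3              # each output connected to at most k inputs (a 3x3 kernel)

dense_params = m * n   # fully connected: about 8.0e9 weights
sparse_params = k * n  # sparsely connected: about 8.1e5 weights
print(dense_params, sparse_params, dense_params // sparse_params)
```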

[Figure: sparse connectivity viewed from below. With a kernel of width 3, input x3 affects only outputs h2, h3, h4; in a fully connected network, x3 affects every output h1–h5]

IIT Patna 9

Sparse connectivity

[Figure: sparse connectivity viewed from above. With a kernel of width 3, output h3 depends only on inputs x2, x3, x4; in a fully connected network it depends on all of x1–x5]

IIT Patna 10

Sparse connectivity

[Figure: the receptive field grows with depth. g3 is directly connected to h2, h3, h4, which together depend on all inputs x1–x5]

IIT Patna 11

Receptive field

IIT Patna 12

Parameter sharing

• The same parameters are used for more than one function in a model

• In a traditional neural network, each weight is used only once when the output is computed

• Each member of the kernel is used at every position of the input

• As k ≪ m, the number of parameters is reduced significantly

• It also requires less memory (see the sketch below)
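
A small PyTorch sketch of the effect (layer sizes are arbitrary): the convolutional layer's parameter count depends only on the kernel, while a fully connected layer over the same image grows with the image size:

```python
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
fc = nn.Linear(in_features=3 * 32 * 32, out_features=16 * 30 * 30)

conv_params = sum(p.numel() for p in conv.parameters())   # 16*3*3*3 + 16 = 448
fc_params = sum(p.numel() for p in fc.parameters())       # roughly 44 million
print(conv_params, fc_params)
```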

Image source: Deep Learning Book

IIT Patna 13

Edge detection

IIT Patna 14

Equivariance

• If the input changes, the output changes in the same way

• Specifically, a function f(x) is equivariant to a function g if f(g(x)) = g(f(x)) (see the check after this list)

• Example: g is a translation
• Let I be a function giving image brightness at integer coordinates, and let g be a function mapping one image to another such that I′ = g(I) with I′(x, y) = I(x − 1, y)

• There are cases where sharing parameters across the entire image is not a good idea
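
A small numpy check of translation equivariance for 1-D convolution (the signal and kernel are arbitrary): shifting the input and then convolving gives the same result as convolving and then shifting the output:

```python
import numpy as np

x = np.array([1.0, 4.0, 2.0, 5.0, 3.0])
w = np.array([0.5, 0.25, 0.25])

shifted_then_conv = np.convolve(np.concatenate(([0.0], x)), w)   # f(g(x))
conv_then_shifted = np.concatenate(([0.0], np.convolve(x, w)))   # g(f(x))

print(np.allclose(shifted_then_conv, conv_then_shifted))   # True
```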

IIT Patna 15

Pooling

• A typical convolutional network layer has three stages (see the sketch after this list)

• Convolution — several convolutions are performed to produce linear activations
• Detector stage — each linear activation is run through a nonlinearity such as ReLU
• Pooling — the output is replaced with a summary statistic of the nearby outputs

• Max pooling reports the maximum output within a rectangular neighbourhood
• Average of a rectangular neighbourhood
• Weighted average based on the distance from the central pixel

• Pooling helps to make the representation invariant to small translations of the input

• Whether a feature is present is more important than exactly where it is

• Pooling also helps with handling inputs of variable size
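
A hedged PyTorch sketch of the three stages as one convolutional layer (channel counts and sizes are arbitrary):

```python
import torch
import torch.nn as nn

layer = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1),  # convolution stage (affine)
    nn.ReLU(),                                                           # detector stage
    nn.MaxPool2d(kernel_size=2, stride=2),                               # pooling stage
)

x = torch.randn(1, 1, 28, 28)   # a dummy single-channel image
print(layer(x).shape)           # torch.Size([1, 8, 14, 14])
```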

IIT Patna 16

Typical CNN

[Figure: two terminologies for the same structure.
Complex-layer view: Input → convolution stage (affine transformation) → detector stage (ReLU) → pooling stage → next layer, all grouped as one convolutional layer.
Simple-layer view: Input → convolution layer (affine transformation) → detector layer (ReLU) → pooling layer → next layer]

[Figure: max pooling over a width-3 window.
Detector stage: 0.1 1.0 0.2 0.3, pooling stage: 1.0 1.0 1.0 0.3.
Input shifted right by one pixel, detector stage: 0.2 0.1 1.0 0.2, pooling stage: 0.2 1.0 1.0 1.0.
Every detector value changes, but only two of the four pooled values change]

IIT Patna 17

Invariance of maxpooling
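
The numbers in the figure can be reproduced with a few lines of numpy (max over a width-3 window centred on each detector unit):

```python
import numpy as np

def max_pool_width3(d):
    """Max over a width-3 window centred on each unit (edge windows use what is available)."""
    return np.array([d[max(i - 1, 0):i + 2].max() for i in range(len(d))])

print(max_pool_width3(np.array([0.1, 1.0, 0.2, 0.3])))   # [1.  1.  1.  0.3]
print(max_pool_width3(np.array([0.2, 0.1, 1.0, 0.2])))   # [0.2 1.  1.  1. ]
# Shifting the input by one pixel changes every detector value,
# but only two of the four pooled values change.
```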

Image source: Deep Learning Book

IIT Patna 18

Learned invariances

[Figure: max pooling with a width-3 window and stride 2. Detector outputs 0.1 1.0 0.2 1.0 0.3 0.0 are reduced to 1.0 1.0 0.3]

IIT Patna 19

Pooling with downsampling
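
The same figure in numpy (max pooling with window width 3 and stride 2, a partial window allowed at the right edge):

```python
import numpy as np

def max_pool(d, width=3, stride=2):
    return np.array([d[i:i + width].max() for i in range(0, len(d), stride)])

print(max_pool(np.array([0.1, 1.0, 0.2, 1.0, 0.3, 0.0])))   # [1.  1.  0.3]
```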

[Figure: strided convolution. Inputs x1–x5 are mapped directly to outputs s1, s2, s3, computing the convolution only at every other position]

IIT Patna 20

Strided convolution

[Figure: the equivalent two-step implementation. An ordinary convolution over x1–x5 followed by down-sampling produces the same outputs s1, s2, s3, but computes intermediate values that are then discarded]

IIT Patna 21

Strided convolution (contd)
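
A numpy check (arbitrary signal and kernel) that strided convolution is the same as an ordinary convolution followed by down-sampling, except that the discarded intermediate values are never computed:

```python
import numpy as np

x = np.arange(1.0, 8.0)            # a 1-D input signal
w = np.array([0.2, 0.5, 0.3])      # a width-3 kernel

# Two-step view: stride-1 convolution, then keep every second output
two_step = np.convolve(x, w, mode='valid')[::2]

# Strided view: compute the convolution only at every second position
stride_2 = np.array([np.dot(x[i:i + 3], w[::-1]) for i in range(0, len(x) - 2, 2)])

print(np.allclose(two_step, stride_2))   # True
```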

IIT Patna 22

Zero padding
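
The figure for this slide is not reproduced here; as a minimal sketch of the idea, numpy's convolution modes show how zero padding controls the output size for a length-8 input and a width-3 kernel:

```python
import numpy as np

x = np.random.randn(8)
w = np.random.randn(3)

valid = np.convolve(x, w, mode='valid')   # no padding: output shrinks to length 6
same = np.convolve(x, w, mode='same')     # zero padding keeps the output at length 8
full = np.convolve(x, w, mode='full')     # enough padding for every overlap: length 10
print(valid.shape, same.shape, full.shape)
```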

[Figure: connection patterns between inputs x1–x5 and outputs s1–s5. In a convolutional layer the kernel weights a, b are shared across positions; in a locally connected layer every edge has its own weight a–i]

IIT Patna 23

Connections

[Figure: connections from Input to Output in a local convolution]

IIT Patna 24

Local convolution

[Figure: recurrent convolutional network unrolled over three steps. The input x feeds hidden states H1, H2, H3 through U, the hidden states are chained through W, and each hidden state produces an output y1, y2, y3 through V]

IIT Patna 25

Recurrent convolution network

Image source: https://worksheets.codalab.org

IIT Patna 26

AlexNet

Image source: http://joelouismarino.github.io

IIT Patna 27

GoogleNet

IIT Patna 28

Naive inception

[Figure: naive Inception module. The Previous Layer feeds parallel branches of 1x1 convolutions, 3x3 convolutions, and 3x3 max pooling, whose outputs are joined by filter concatenation]
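
A hedged PyTorch sketch of a naive Inception block, assuming the parallel 1x1 / 3x3 / 5x5 convolution and 3x3 max-pooling branches of the GoogLeNet paper; the channel counts are arbitrary:

```python
import torch
import torch.nn as nn

class NaiveInception(nn.Module):
    """Parallel branches applied to the previous layer, joined by filter concatenation."""
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 16, kernel_size=1)
        self.b3 = nn.Conv2d(in_ch, 16, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(in_ch, 16, kernel_size=5, padding=2)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        # Concatenate the branch outputs along the channel dimension
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.pool(x)], dim=1)

block = NaiveInception(in_ch=64)
print(block(torch.randn(1, 64, 32, 32)).shape)   # torch.Size([1, 112, 32, 32])
```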

IIT Patna 29

Inception

[Figure: Inception module with dimension reduction. 1x1 convolutions are inserted before the larger 3x3 convolutions and after the 3x3 max pooling; the Previous Layer feeds these parallel branches and their outputs are joined by filter concatenation]
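
A matching sketch of the Inception block with 1x1 dimension reduction (again with arbitrary channel counts): 1x1 convolutions shrink the channel count before the larger convolutions and after the max pooling:

```python
import torch
import torch.nn as nn

class Inception(nn.Module):
    """Inception block with 1x1 reductions before the 3x3/5x5 branches and after the pool."""
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 16, kernel_size=1)
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, 8, kernel_size=1),
                                nn.Conv2d(8, 16, kernel_size=3, padding=1))
        self.b5 = nn.Sequential(nn.Conv2d(in_ch, 8, kernel_size=1),
                                nn.Conv2d(8, 16, kernel_size=5, padding=2))
        self.bp = nn.Sequential(nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
                                nn.Conv2d(in_ch, 16, kernel_size=1))

    def forward(self, x):
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)

block = Inception(in_ch=64)
print(block(torch.randn(1, 64, 32, 32)).shape)   # torch.Size([1, 64, 32, 32])
```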