Deep Learning for Computer Vision Pr. Jenny Benois-Pineau ...benois-p/DeepLearning... · input data...

Deep Learning for Computer Vision Pr. Jenny Benois-Pineau LABRI UMR 5800/Université Bordeaux Chapter 2. From Shallow to Deep


Chapter 2

Summary
1. Kinds of machine learning
1.1. Unsupervised learning
1.2. Supervised learning, main formulations
2. Artificial Neural Networks
3. Multi-Layered Perceptron (MLP)


1. Kinds of Machine learning

➔  Once more:
➔  Machine learning teaches computers to do what comes naturally to humans and animals: learn from experience.
➔  Machine learning algorithms use computational methods to “learn” information directly from data without …
➔  The algorithms adaptively improve their performance as the number of samples available for learning increases.

Oge Marques, IPTA’2017


Types of Learning


Evaluation Metrics


Confusion Matrix


Quality Metrics

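➔  ACC = (TP + TN)/(TP + FP + TN + FN) (accuracy)
➔  BACC = (TP/P + TN/N)/2* (balanced accuracy; in this formula P = TP + FN and N = TN + FP are the numbers of positive and negative samples)
➔  TPR = TP/(TP + FN), or recall (R)
➔  TNR = TN/(TN + FP)
➔  P = TP/(TP + FP) (precision)
➔  F-score = 2/(1/R + 1/P)*
➔  FPR = FP/(FP + TN)
➔  FNR = FN/(FN + TP)
➔  * - suited to unbalanced classes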
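As a minimal sketch, the quality metrics can be computed directly from the four confusion-matrix counts. The function name `quality_metrics` and the example counts are illustrative assumptions, not part of the slides, and the sketch assumes all denominators are non-zero.

```python
def quality_metrics(tp, fp, tn, fn):
    """Compute the quality metrics from confusion-matrix counts (all denominators assumed non-zero)."""
    n_pos = tp + fn  # number of actual positives (P in the BACC formula)
    n_neg = tn + fp  # number of actual negatives (N in the BACC formula)
    recall = tp / n_pos          # TPR, also called recall (R)
    tnr = tn / n_neg
    precision = tp / (tp + fp)   # P in the F-score formula
    return {
        "ACC": (tp + tn) / (tp + fp + tn + fn),
        "BACC": (recall + tnr) / 2,
        "TPR": recall,
        "TNR": tnr,
        "P": precision,
        "F": 2 / (1 / recall + 1 / precision),
        "FPR": fp / n_neg,
        "FNR": fn / n_pos,
    }

# Hypothetical unbalanced example: 10 positives, 90 negatives
m = quality_metrics(tp=8, fp=2, tn=88, fn=2)
```

With such unbalanced counts, ACC is dominated by the large negative class, whereas BACC and the F-score give more weight to the small positive class.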


ROC curve

➔  Consider a binary classification problem.
➔  If the classifier depends on a threshold, then different values of the threshold give different (FPR, TPR) pairs; the ROC curve plots TPR against FPR.
➔  In a multi-class classification problem: one vs. all, and we plot a curve for each of the N classes.
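The threshold sweep can be sketched as follows, assuming the classifier outputs a score per sample and predicts "positive" when the score is at least the threshold. The function `roc_points` and the toy scores and labels are hypothetical, chosen only for illustration.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs obtained by sweeping the decision threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # A threshold above every score predicts nothing as positive.
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

# Toy data: classifier scores and ground-truth binary labels
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
```

For the one-vs-all multi-class case, the same function would be called once per class, with the labels of that class set to 1 and all others set to 0.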


1.1. Unsupervised learning(1)

➔  Unsupervised learning finds hidden patterns or intrinsic structures in data.
➔  It is used to draw inferences from datasets consisting of input data without labelled responses.
➔  Clustering is the most common unsupervised learning technique.
➔  It is used for exploratory data analysis to find hidden patterns or groupings in data.
➔  Applications in computer vision: visual data summarization (collections of images, video)


K-means clustering

➔  J. MacQueen, “Some methods for classification and analysis of multivariate observations”, Proc. of the Fifth Berkeley Symposium on Math. Stat. and Prob., pp. 281–296, 1967
➔  Principle: unsupervised classification with an a priori known number of clusters.
➔  Parameter: the number k of clusters
➔  Input data: a sample of M descriptor vectors x1, ..., xM.
➔  (1) Choose k initial centers c1, ..., ck
➔  (2) Assign each of the M vectors to the i-th cluster whose center ci is the closest in the sense of the chosen metric
➔  (3) If no vector changes its class, then stop.
➔  (4) Compute new centers: for each i, ci is the mean of the vectors of class i
➔  (5) Go to (2)
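The five steps can be sketched in NumPy as below. This is an illustrative sketch, not the reference implementation: the Euclidean metric, the explicit `init_centers` argument, the `max_iter` safeguard and the toy data are all assumptions made for the example.

```python
import numpy as np

def k_means(x, init_centers, max_iter=100):
    """K-means on the rows of x, starting from the given initial centers."""
    centers = np.array(init_centers, dtype=float)    # (1) k initial centers
    labels = np.full(len(x), -1)
    for _ in range(max_iter):
        # (2) assign each vector to the closest center (Euclidean metric assumed)
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dist.argmin(axis=1)
        # (3) stop if no vector changes its class
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # (4) each center becomes the mean of the vectors of its class
        for i in range(len(centers)):
            members = x[labels == i]
            if len(members) > 0:  # an empty cluster keeps its previous center
                centers[i] = members.mean(axis=0)
        # (5) go back to step (2)
    return centers, labels

# Toy sample: two well-separated groups of 2-D descriptors
x = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
centers, labels = k_means(x, init_centers=x[[0, 3]])
```

The result depends on the initial centers; here they are taken from the sample itself, which is one common choice among several.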


Application example(1)

➔  Lifelogging (C. Gurrin, A. Smeaton, DCU)

http://www.slideshare.net/cgurrin/biohackers-summit-2015-lifelogging-a-new-era-of-personal-data


Application Example (2)

➔  Grouping of similar images
➔  Selection of the cluster center: “Hyper-scenes”

[Figure: hyper-scenes H1, H2, H3, H4]

H. Nicolas, A. Manoury, J. Benois-Pineau, Wi. Dupuis, D. Barba: Grouping video shots into scenes based on 1D mosaic descriptors. IEEE ICIP 2004: 637-640


Hierarchical agglomerative clustering (HAG)

➔  Principle:
➔  (1) At initialisation, each descriptor vector in the sample forms its own cluster
➔  (2) While the number of clusters is larger than k (limit: k = 1)
    ›  Merge the two closest clusters in the sense of a distance d between clusters

Distance between clusters:

Max-link: $d_{\max}(C_i, C_j) = \max_{x \in C_i,\, y \in C_j} d(x, y)$

Min-link: $d_{\min}(C_i, C_j) = \min_{x \in C_i,\, y \in C_j} d(x, y)$

Mean-link: $d_{mean}(C_i, C_j) = \frac{1}{n_i \times n_j} \sum_{l=1}^{n_i} \sum_{p=1}^{n_j} d(x_l, y_p)$, with $x_l \in C_i$, $y_p \in C_j$
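A naive sketch of the agglomerative procedure with the three inter-cluster distances is given below. The Euclidean point distance, the function names `linkage_distance` and `agglomerate`, and the 1-D toy data are assumptions for illustration; the exhaustive pair search is written for readability, not efficiency.

```python
import numpy as np

def linkage_distance(ci, cj, mode="mean"):
    """Distance between two clusters given as arrays of row vectors."""
    d = np.linalg.norm(ci[:, None, :] - cj[None, :, :], axis=2)  # all pairwise d(x, y)
    if mode == "max":
        return d.max()   # max-link
    if mode == "min":
        return d.min()   # min-link
    return d.mean()      # mean-link: sum of d(x_l, y_p) divided by n_i * n_j

def agglomerate(x, k, mode="mean"):
    """Merge the two closest clusters until only k clusters remain."""
    clusters = [[i] for i in range(len(x))]  # (1) each vector forms its own cluster
    while len(clusters) > k:                 # (2) loop while more than k clusters
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage_distance(x[clusters[a]], x[clusters[b]], mode)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge cluster b into cluster a
        del clusters[b]
    return clusters

# Toy 1-D descriptors forming two groups
x = np.array([[0.0], [0.1], [0.3], [5.0], [5.2]])
clusters = agglomerate(x, k=2, mode="min")
```

Recording the sequence of merges and their distances, instead of only the final partition, would give the information needed to draw a dendrogram.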


Dendrogram

S. Benini et al. Extraction of Significant Video Summaries by Dendrogram Analysis. IEEE ICIP 2006: 133-136


Supervised learning(1)

➔  Problem statement:
➔  Let us consider a set of pairs $\{(x_1, y_1), \dots, (x_n, y_n), \dots, (x_N, y_N)\}$
➔  $x_n \in \mathbb{R}^K = X$ - feature vectors
➔  $y_n \in Y$ - labels (of classes)
➔  Let us consider a function $g(x, \alpha): X \to Y$, $g(x_n, \alpha) = \hat{y}_n$
➔  Let us now consider a function $L(y_n, \hat{y}_n)$ - the loss of predicting $\hat{y}_n$


Supervised learning(2)

➔  Empirical risk minimization: find a function $g$ which minimizes $R_{emp}(g) = \frac{1}{N} \sum_{n=1}^{N} L(y_n, g(x_n, \alpha))$
➔  Structural risk minimization: consider a penalty $C(g): G \to \mathbb{R}^+$ and minimize $J(g) = R_{emp}(g) + \lambda C(g)$, $\lambda \ge 0$
➔  If the variable y is discrete: classification; otherwise: regression
➔  For a given form of g, the problem consists in finding the optimal parameters $\alpha^*$
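To make the two criteria concrete, here is a small sketch that evaluates the empirical risk and the structural risk for one choice of g. The linear model, the squared loss and the penalty C(g) = ||α||² are assumptions chosen for the example; the slides leave g, L and C generic.

```python
import numpy as np

def g(x, alpha):
    """Hypothetical linear model g(x, alpha) = x . alpha."""
    return x @ alpha

def empirical_risk(x, y, alpha):
    """R_emp(g): mean over the N samples of the squared loss L(y, y_hat) = (y - y_hat)^2."""
    return np.mean((y - g(x, alpha)) ** 2)

def structural_risk(x, y, alpha, lam):
    """J(g) = R_emp(g) + lambda * C(g), with the assumed penalty C(g) = ||alpha||^2."""
    return empirical_risk(x, y, alpha) + lam * np.sum(alpha ** 2)

# Toy regression sample that the parameters alpha = (1, 2) fit exactly
x = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
alpha = np.array([1.0, 2.0])
r_emp = empirical_risk(x, y, alpha)
j = structural_risk(x, y, alpha, lam=0.1)
```

Even when the empirical risk is zero, the structural risk stays positive because of the penalty, which is what discourages overly large parameters.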


Quality of prediction

[Figure: Type I error (false positive) and Type II error (false negative)]


2. Artificial Neural Networks

➔  Biological inspiration
➔  The basic computational unit of the brain is a neuron. Approximately 86 billion neurons can be found in the human nervous system, and they are connected with approximately $10^{14}$ to $10^{15}$ synapses.

$g(x, W, b) = f\left(\sum_i w_i x_i + b\right)$

McCulloch and Pitts, 1943, activation function:

$f(t) = \begin{cases} 1, & t > 0 \\ 0, & \text{otherwise} \end{cases}$


Commonly used non-linear functions

➔  Sigmoid: $f(t) = \frac{1}{1 + e^{-t}}$
➔  Tanh: $f(t) = \frac{e^{t} - e^{-t}}{e^{t} + e^{-t}}$
➔  Sigmoids saturate and kill gradients(!)
➔  In practice, the tanh non-linearity is preferred to the sigmoid non-linearity (its output is zero-centred).

[Figure: a) Sigmoid b) Tanh]
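The saturation effect can be checked numerically with a short sketch. The derivative formulas used here (s(1 − s) for the sigmoid and 1 − tanh² for tanh) are standard results rather than something stated on the slide, and the function names are illustrative.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def tanh(t):
    return (np.exp(t) - np.exp(-t)) / (np.exp(t) + np.exp(-t))

def sigmoid_grad(t):
    s = sigmoid(t)
    return s * (1.0 - s)       # derivative of the sigmoid

def tanh_grad(t):
    return 1.0 - tanh(t) ** 2  # derivative of tanh

# Gradients at the origin and far from it
t = np.array([-10.0, 0.0, 10.0])
sig_g = sigmoid_grad(t)
tanh_g = tanh_grad(t)
```

Both gradients are largest at t = 0 and become very small for large |t|, which is the "saturation" that slows down gradient-based training.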


ReLU non-linearity

➔  Rectified Linear Unit: $f(t) = \max(t, 0)$
➔  Lower computational cost w.r.t. sigmoid and tanh
➔  Faster convergence
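A brief sketch of ReLU and its gradient is shown below. Setting the gradient to 0 at t = 0 is a convention assumed here, since the function is not differentiable at that point.

```python
import numpy as np

def relu(t):
    return np.maximum(t, 0.0)      # f(t) = max(t, 0)

def relu_grad(t):
    return (t > 0).astype(float)   # 1 for t > 0, else 0 (value at t = 0 is a convention)

t = np.array([-2.0, 0.0, 3.0, 100.0])
out = relu(t)
grad = relu_grad(t)
```

Only a comparison is needed, with no exponential, and the gradient does not shrink for large positive inputs.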


A simple neuron

$x = (x_1, x_2)^T$, $w = (w_1, w_2)^T$

$x^T w = x_1 w_1 + x_2 w_2$

$y = f(x_1 w_1 + x_2 w_2)$

Our simplest function f: the Heaviside step function (figure source: Wikipedia)

$f(t) = \begin{cases} 1, & t > 0 \\ 0, & \text{otherwise} \end{cases}$

Decision: $y = 1$ if $x_1 w_1 + x_2 w_2 > 0$, $y = 0$ otherwise

How to determine the weights $w_i$?
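The simple neuron can be sketched in a few lines. The weight values below are hypothetical and chosen by hand, only to show how the sign of the weighted sum decides the output.

```python
import numpy as np

def heaviside(t):
    """Heaviside step function: 1 if t > 0, 0 otherwise."""
    return 1 if t > 0 else 0

def neuron(x, w):
    """y = f(x1*w1 + x2*w2) with f the Heaviside step function."""
    return heaviside(float(x @ w))

w = np.array([1.0, -1.0])  # hypothetical weights: fires when x1 > x2
y_a = neuron(np.array([2.0, 1.0]), w)
y_b = neuron(np.array([1.0, 2.0]), w)
y_c = neuron(np.array([1.0, 1.0]), w)
```

With these weights the neuron separates the plane along the line x1 = x2; points on the line give a weighted sum of 0 and therefore output 0.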


Training a neuron

➔  The artificial neuron can be trained to perform as an elementary linear classifier, i.e. we can determine the weights $w_i$ which minimize the empirical (or structural) risk of our classifier.
➔  The elementary training algorithm, the “Perceptron”, was proposed by Rosenblatt (1958).
➔  Consider the “training set” $\{(x_1, y_1), \dots, (x_n, y_n), \dots, (x_N, y_N)\}$, $y_i \in \{0; 1\}$ (in our case of a simple neuron, $y_i$ is a binary label)
➔  Initialize the weights $w^0$ randomly
➔  Then at each iteration t the weights are updated as

$w_i^{t+1} = w_i^t + \eta\,(y_n - \hat{y}_n^t)\,x_{i,n}$

where $\hat{y}_n^t$ is the output of the neuron for $x_n$ at iteration t, $x_{i,n}$ is the i-th component of $x_n$, and $\eta$ is the learning rate.
➔  “Back propagation”
➔  Limitations: the set of functions (classifiers) which can be simulated by a Perceptron is narrow (cf. Minsky and Papert (1969), XOR)
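A minimal sketch of the Perceptron training rule is given below, tried on the logical AND (linearly separable) and on XOR (not linearly separable). The constant input 1 acting as a bias, the learning rate, the random seed and the epoch limit are assumptions made for the example.

```python
import numpy as np

def add_bias(x):
    """Append a constant input 1 so that one weight acts as a bias (assumption)."""
    return np.hstack([x, np.ones((len(x), 1))])

def predict(x, w):
    return (add_bias(x) @ w > 0).astype(int)  # Heaviside step on the weighted sum

def train_perceptron(x, y, eta=0.1, max_epochs=1000, seed=0):
    rng = np.random.default_rng(seed)
    xb = add_bias(x)
    w = rng.normal(scale=0.1, size=xb.shape[1])  # random initial weights
    for _ in range(max_epochs):
        errors = 0
        for xn, yn in zip(xb, y):
            y_hat = int(xn @ w > 0)
            if y_hat != yn:
                w += eta * (yn - y_hat) * xn     # w <- w + eta * (y_n - y_hat_n) * x_n
                errors += 1
        if errors == 0:  # every training sample is classified correctly
            break
    return w

x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([0, 0, 0, 1])
y_xor = np.array([0, 1, 1, 0])
w_and = train_perceptron(x, y_and)
w_xor = train_perceptron(x, y_xor)
```

On AND the loop stops once all four samples are correct; on XOR it runs until `max_epochs` because no single line separates the two classes.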


3. Multi-Layered Perceptron (MLP)

➔  Perceptron (1958) -> MLP (1961), Rosenblatt
➔  Let us consider a binary classification problem
➔  The input layer is just our data
➔  Hidden layers produce more abstract features
➔  Hidden layers are fully connected
➔  The output of hidden layers is usually not binary (ReLU, Tanh, Sigmoid)

http://neuralnetworksanddeeplearning.com/chap1.html


Example : Recognition of Handwritten digits

➔  Input is a binarised image
➔  Each image is a 28x28 matrix
➔  10-class classification problem


The architecture of an MLP with 1 hidden layer


http://neuralnetworksanddeeplearning.com/chap1.html


Training of MLP

➔  As in the case of the Perceptron, training an MLP consists in finding the optimal configuration of weights at all layers
➔  Principle: the same, i.e. to minimize the loss L between predictions and ground truth labels
➔  To stress the multilayer architecture, let us denote by $w_{ij}^{l,(l+1)}$ the weight between the i-th neuron of layer l and the j-th neuron of layer l+1:

$w_{ij}^{(t+1),\,l,(l+1)} = w_{ij}^{(t),\,l,(l+1)} - \eta \frac{\partial L}{\partial w_{ij}^{l,(l+1)}}$

➔  $\eta$ is the learning rate
➔  I.e. each parameter of the MLP is updated in the direction opposite to the gradient of the loss function L: gradient descent. To compute the derivatives of the loss function at each layer we use the “chain rule”.


Back-propagation and chain rule(1)

➔  Let us consider a trivial Neural Network:

$x$ → [$w_h$] → $x \cdot w_h$ → $f(x \cdot w_h)$ → [$w_o$] → $w_o \cdot f(x \cdot w_h)$ → $\hat{y} = f(w_o \cdot f(x \cdot w_h))$, compared with $y$

➔  x is the input
➔  h stands for hidden layer
➔  o stands for output layer
➔  f is a non-linear transformation
➔  w are synaptic weights
➔  y is the known output
➔  $\hat{y}$ is the predicted output


Back-propagation(2)

➔  Let us consider the Error/Loss function $L = \frac{1}{2}(\hat{y} - y)^2$
➔  The goal is to find $(w_o^*, w_h^*)^T = \arg\min L(w_o, w_h)$
➔  Method: gradient descent

$w_o^{(t+1)} = w_o^{(t)} - \eta\, L'_{w_o}(w_o^{(t)}, w_h^{(t)})$

$w_h^{(t+1)} = w_h^{(t)} - \eta\, L'_{w_h}(w_o^{(t)}, w_h^{(t)})$

➔  $\eta$ is the “learning rate”, for simplicity the same for all layers


Back-propagation(3)

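➔  How to compute the partial derivatives $L'_{w_o} = ?$ and $L'_{w_h} = ?$ of $L = \frac{1}{2}(\hat{y} - y)^2$, where $\hat{y} = f(w_o \cdot f(x \cdot w_h))$
➔  Chain rule: $\big(a(b(c(x)))\big)' = a'(b) \cdot b'(c) \cdot c'(x)$

$L'_{w_o} = (\hat{y} - y) \cdot \hat{y}'_{w_o} = (\hat{y} - y) \cdot f' \cdot f(x \cdot w_h)$

$L'_{w_h} = (\hat{y} - y) \cdot \hat{y}'_{w_h} = (\hat{y} - y) \cdot f' \cdot w_o \cdot f' \cdot x$

Here the first $f'$ is evaluated at $w_o \cdot f(x \cdot w_h)$ and the second at $x \cdot w_h$.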
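The two chain-rule derivatives can be checked numerically on the trivial network. The sketch below assumes f is the sigmoid and uses arbitrary scalar values for x, y and the weights; it compares the analytic gradients with central finite differences.

```python
import numpy as np

def f(t):
    return 1.0 / (1.0 + np.exp(-t))  # sigmoid, assumed as the non-linearity f

def f_prime(t):
    s = f(t)
    return s * (1.0 - s)

def forward(x, w_h, w_o):
    return f(w_o * f(x * w_h))       # y_hat = f(w_o * f(x * w_h))

def loss(x, y, w_h, w_o):
    return 0.5 * (forward(x, w_h, w_o) - y) ** 2

def gradients(x, y, w_h, w_o):
    a_h = x * w_h                    # input of the hidden non-linearity
    y_h = f(a_h)                     # hidden layer output
    a_o = w_o * y_h                  # input of the output non-linearity
    y_hat = f(a_o)
    err_o = (y_hat - y) * f_prime(a_o)   # output-layer error
    grad_w_o = err_o * y_h               # L'_{w_o} = (y_hat - y) * f' * f(x * w_h)
    err_h = err_o * w_o * f_prime(a_h)   # error propagated back to the hidden layer
    grad_w_h = err_h * x                 # L'_{w_h} = (y_hat - y) * f' * w_o * f' * x
    return grad_w_o, grad_w_h

# Arbitrary scalar example
x, y, w_h, w_o = 0.5, 1.0, 0.3, -0.2
g_o, g_h = gradients(x, y, w_h, w_o)

# Central finite differences as an independent check
eps = 1e-6
num_o = (loss(x, y, w_h, w_o + eps) - loss(x, y, w_h, w_o - eps)) / (2 * eps)
num_h = (loss(x, y, w_h + eps, w_o) - loss(x, y, w_h - eps, w_o)) / (2 * eps)
```

The error term of the output layer is reused when computing the hidden-layer gradient, which is the reason the derivatives are computed from the output backwards.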


Back-propagation(4)

➔  How to compute the partial derivatives: each one is the product of a layer error and a layer input

$L'_{w_o} = \underbrace{(\hat{y} - y) \cdot f'}_{\text{layer error}} \cdot \underbrace{f(x \cdot w_h)}_{\text{layer input } y_h}$

$L'_{w_h} = \underbrace{(\hat{y} - y) \cdot f' \cdot w_o \cdot f'}_{\text{layer error}} \cdot \underbrace{x}_{\text{layer input}}$


MLP- conclusion

➔  MLP is a fully connected network: each output of a previous layer is connected with all inputs of the next layer
➔  MLP is a feed-forward neural network: at the test step the input data pass directly from the input layer to the output layer
➔  MLP trained with back-propagation proved to be a very effective supervised learning algorithm
➔  It was widely used for character recognition, face recognition, etc.
➔  Nevertheless, if we work with high resolution images, the number of parameters to train becomes very high, which degrades performance.
➔  Solution: Convolutional Neural Networks (CNN).
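The growth in the number of parameters can be illustrated with a short count for a fully connected network. The hidden layer of 30 neurons and the 1920x1080 input resolution are hypothetical values chosen for the comparison.

```python
def mlp_param_count(layer_sizes):
    """Weights plus biases of a fully connected network with the given layer sizes."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# 28x28 input image, one hidden layer of 30 neurons (assumed), 10 output classes
small = mlp_param_count([28 * 28, 30, 10])

# Same architecture fed with a 1920x1080 single-channel image (assumed resolution)
large = mlp_param_count([1920 * 1080, 30, 10])
```

The first layer dominates the count, since every input pixel is connected to every hidden neuron.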