Post on 22-May-2020
Institute of Visual Computing
Introduction to Deep Learning
February 17, 2020
For slides credits we thank Vagia Tsiminaki.
Martin Oswald
Institute of Visual Computing
Outline:
What is Deep Learning?
Artificial Neural Networks
Convolutional Neural Networks
Training Deep Learning Architectures
Applications in Computer Vision
What is Deep Learning?
Figure Credit: Adam Gibson, Josh Patterson “Deep Learning”
What is Deep Learning?
Machine Learning:
Input Data
Train model
Use trained model for new prediction
Towards Deep Learning
Hand-crafted Features
E.g. Canny edges, Harris corners, SIFT
Feature Learning
Automatically extract patterns (features)
Deep Learning
Learning hierarchical representations of data
End-to-end learning
Image Classification
Chihuahua or Muffin?
f([image]) = "Muffin"    f([image]) = "Chihuahua"
Image Classification
Machine Learning:
Input Data: Training set of images and labels
Train model (Image classifier)
Use trained model for new prediction
Image Classification
[Diagram]
Training: training images → image features → image classifier, trained with image labels
Testing: image → image features → learned image classifier → "Chihuahua"
Slide Credit: CS 131, Lecture 1, 2016
Image Classification
[Diagram: the training/testing pipeline from the previous slide, annotated "Feature Engineering"]
Slide Credit: CS 131, Lecture 1, 2016
Image Classification
[Diagram: Feature Learning]
Training: training images and image labels → image features → image classifier, giving a learned model (learned features, learned classifier)
Testing: learned model (learned features, learned classifier) → "Chihuahua"
Slide Credit: CS 131, Lecture 1, 2016
Image Classification
[Diagram: Deep Learning]
Training: training images and image labels → low-level features → mid-level features → high-level features → image classifier
Learned model: low-level features → mid-level features → high-level features → image classifier
Artificial Neural Networks
Figure Credit: Artificial Intelligence Techniques for Modelling of Temperature in the Metal Cutting Process
Input Layer
Hidden Layer
Output Layer
Artificial Neuron
x1, x2,…, xN: Inputs to the neuron
w0, w1, w2, …, wN: Weights on each input
f: Activation function
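The neuron described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the lecture's own code: it assumes the common convention that w0 acts as a bias (it has no matching input), and the names `neuron` and `step` are hypothetical.

```python
import numpy as np

def neuron(x, w, f):
    """Return f(w0 + w1*x1 + ... + wN*xN).

    Assumption: w[0] is a bias term and w[1:] are the weights on x1..xN.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    return f(w[0] + np.dot(w[1:], x))

def step(z):
    """A simple threshold activation, used here only for illustration."""
    return 1.0 if z > 0 else 0.0

# Weighted sum: -1.0 + 0.5*1.0 + 0.5*2.0 = 0.5, which is above the threshold.
out = neuron(x=[1.0, 2.0], w=[-1.0, 0.5, 0.5], f=step)
```

Any of the activation functions on the next slide could be passed as `f` instead of `step`.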
Activation Function
Sigmoid Function
Tanh Activation
ReLU (Rectified Linear Unit)
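The three activation functions named on this slide have standard definitions, sketched here with NumPy so they apply element-wise to arrays. The function names are illustrative.

```python
import numpy as np

def sigmoid(z):
    """Squash values into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """Squash values into the range (-1, 1)."""
    return np.tanh(z)

def relu(z):
    """Keep positive values and replace negative values with zero."""
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])
activations = {"sigmoid": sigmoid(z), "tanh": tanh(z), "relu": relu(z)}
```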
Convolutional Neural Networks
Convolution Layer
Rectified Linear Unit (ReLU)
Pooling Layer
Fully Connected Layer
Convolutional Neural Networks
Image Source: http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution
[Diagram: Image, Filter or Kernel, Convolution, Feature Map (also called Activation Map or Convolved Feature)]
Convolution layer to extract features from input image
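To make the filter-over-image idea concrete, here is a minimal sketch of a single-channel convolution with stride 1 and no padding. Like most deep learning libraries, it slides the kernel without flipping it (strictly a cross-correlation); the input values are an arbitrary illustrative example, and `conv2d` is a hypothetical helper name.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            # Element-wise product of the kernel with the patch under it, then sum.
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]], dtype=float)
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]], dtype=float)

feature_map = conv2d(image, kernel)
# feature_map is 3x3:
# [[4, 3, 4],
#  [2, 4, 3],
#  [2, 3, 4]]
```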
Convolutional Neural Networks
Image Source: https://ujjwalkarn.me/2016/08/11/intuitive-explanationconvnets/
Convolution layer to extract features from input image
Convolutional Neural Networks
Image Source: https://cs.nyu.edu/~fergus/tutorials/deep_learning_cvpr12/
Convolution layer to extract features from input image
Convolutional Neural Networks
[Diagram: Image → Feature Map]
Size of Feature Map
Depth: Number of filters
Stride
Zero-padding
Convolution layer to extract features from input image
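How depth, stride, and zero-padding determine the feature map size can be sketched with the standard output-size formula, (W - F + 2P) / S + 1 per spatial dimension, with the depth equal to the number of filters. The function and parameter names below are illustrative, and the sketch assumes square filters and the same stride and padding in both dimensions.

```python
def feature_map_size(input_size, filter_size, stride=1, padding=0):
    """Spatial size of the output along one dimension: (W - F + 2P) // S + 1."""
    return (input_size + 2 * padding - filter_size) // stride + 1

def feature_map_shape(height, width, num_filters, filter_size, stride=1, padding=0):
    """Return (out_height, out_width, depth); depth equals the number of filters."""
    return (
        feature_map_size(height, filter_size, stride, padding),
        feature_map_size(width, filter_size, stride, padding),
        num_filters,
    )

# A 5x5 image with one 3x3 filter, stride 1, no padding gives a 3x3x1 feature map.
small = feature_map_shape(5, 5, num_filters=1, filter_size=3)
# Zero-padding of 2 keeps a 32x32 input at 32x32 for a 5x5 filter.
padded = feature_map_shape(32, 32, num_filters=16, filter_size=5, padding=2)
# A stride of 2 roughly halves the spatial size.
strided = feature_map_shape(7, 7, num_filters=8, filter_size=3, stride=2)
```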
Convolutional Neural Networks
Rectified Linear Unit (ReLU) to introduce non-linearity
Most real-world data is non-linear
Convolution is a linear operation
Image Source: http://mlss.tuebingen.mpg.de/2015/slides/fergus/Fergus_1.pdf
Output = max(0, Input)
Convolutional Neural Networks
Pooling Layer
Max
Average
Sum
Image Source: http://mlss.tuebingen.mpg.de/2015/slides/fergus/Fergus_1.pdf
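The three pooling variants listed above differ only in how each window is reduced to a single value. The sketch below assumes non-overlapping windows (stride equal to the window size), which is a common choice but not stated on the slide; `pool2d` is a hypothetical helper name.

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Pool non-overlapping size x size windows using max, average, or sum."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size  # drop rows/columns that do not fill a window
    windows = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    reduce = {"max": np.max, "average": np.mean, "sum": np.sum}[mode]
    return reduce(windows, axis=(1, 3))

fm = np.array([[1, 1, 2, 4],
               [5, 6, 7, 8],
               [3, 2, 1, 0],
               [1, 2, 3, 4]], dtype=float)

max_pooled = pool2d(fm, mode="max")      # [[6, 8], [3, 4]]
avg_pooled = pool2d(fm, mode="average")  # [[3.25, 5.25], [2, 2]]
sum_pooled = pool2d(fm, mode="sum")      # [[13, 21], [8, 8]]
```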
Convolutional Neural Networks
Image Source: http://cs231n.github.io/convolutional-networks/
Pooling Layer to:
Reduce the dimension of the input
Reduce the number of parameters and computations (helps control overfitting)
Make the network invariant to small transformations, distortions, and translations
Get an almost scale-invariant representation of the input
Convolutional Neural Networks
Fully Connected Layer
Each node is connected to each node in the adjacent layer
Input: High-level features of the input image from the convolutional and pooling layers
Goal:
Classification
Regression
Segmentation
Convolutional Neural Networks
Fully Connected Layer
Each node is connected to each node in the adjacent layer
Input: High-level features of the input image from the convolutional and pooling layers
Goal:
Classification
Multi-Layer Perceptron with a softmax activation function in the output layer
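A fully connected output layer followed by softmax can be sketched as a matrix product plus a normalization step. The feature vector and weights below are random stand-ins chosen only for illustration; in a real network they would come from the preceding layers and from training.

```python
import numpy as np

def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    z = z - np.max(z)  # shift for numerical stability; the result is unchanged
    e = np.exp(z)
    return e / np.sum(e)

def fully_connected(x, W, b):
    """Every output node is connected to every input node: W @ x + b."""
    return W @ x + b

rng = np.random.default_rng(0)
features = rng.normal(size=8)      # stand-in for flattened high-level features
W = rng.normal(size=(2, 8)) * 0.1  # 2 output classes, 8 input features
b = np.zeros(2)

probs = softmax(fully_connected(features, W, b))
```

The largest entry of `probs` would be taken as the predicted class.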
Training Deep Learning Architectures
Classes = {"Chihuahua", "Muffin"}
Input = "Chihuahua"
Target Vector = [1, 0]
Output nodes: Chihuahua (0), Muffin (1)
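The target vector above is a one-hot encoding of the class label: a 1 at the index of the true class and 0 elsewhere. A minimal sketch, with `one_hot` as a hypothetical helper name:

```python
import numpy as np

classes = ["Chihuahua", "Muffin"]  # index 0 and index 1, as on the slide

def one_hot(label, classes):
    """Return a vector with 1.0 at the label's class index and 0.0 elsewhere."""
    target = np.zeros(len(classes))
    target[classes.index(label)] = 1.0
    return target

target = one_hot("Chihuahua", classes)  # [1, 0]
```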
Training Deep Learning Architectures
Initialize weights of filters
Forward propagation pass: Convolution → ReLU → Pooling → Fully connected layer
Output nodes: Chihuahua (0), Muffin (1)
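The forward pass listed above chains the layers in order. The following is a deliberately tiny sketch with one filter, one channel, and random weights, so the output probabilities are meaningless until the network is trained. The sizes, names, and the softmax at the end are assumptions made for illustration.

```python
import numpy as np

def conv2d(image, kernel):
    """Single-channel convolution, stride 1, no padding."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def relu(x):
    return np.maximum(0.0, x)

def max_pool(x, size=2):
    """Max over non-overlapping size x size windows."""
    h, w = x.shape[0] // size * size, x.shape[1] // size * size
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def forward(image, kernel, W, b):
    fm = conv2d(image, kernel)          # convolution
    fm = relu(fm)                       # ReLU
    pooled = max_pool(fm)               # pooling
    scores = W @ pooled.ravel() + b     # fully connected layer
    return softmax(scores)              # class probabilities

rng = np.random.default_rng(0)
image = rng.random((6, 6))              # stand-in for an input image
kernel = rng.normal(size=(3, 3))        # randomly initialized filter
W = rng.normal(size=(2, 4))             # 6x6 -> conv 4x4 -> pool 2x2 -> 4 features
b = np.zeros(2)

probs = forward(image, kernel, W, b)    # [P(Chihuahua), P(Muffin)]
```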
Training Deep Learning Architectures
Output nodes: Chihuahua (0), Muffin (1)
Initialize weights of filters
Forward propagation pass
Calculate the total error: $E = \sum \frac{1}{2}(\mathrm{target} - \mathrm{output})^2$
Backward propagation pass
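The total error is half the squared difference between target and output, summed over the output nodes. A short sketch with made-up output probabilities:

```python
import numpy as np

def total_error(target, output):
    """E = sum of 1/2 * (target - output)^2 over all output nodes."""
    target = np.asarray(target, dtype=float)
    output = np.asarray(output, dtype=float)
    return np.sum(0.5 * (target - output) ** 2)

# Target is "Chihuahua" = [1, 0]; the network outputs are illustrative values.
E = total_error([1, 0], [0.4, 0.6])  # 0.5*0.36 + 0.5*0.36 = 0.36
```

A perfect prediction gives an error of zero, and the error grows as the output moves away from the target.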
Training Deep Learning Architectures
Output nodes: Chihuahua (0), Muffin (1)
Calculate the total error: $E = \sum \frac{1}{2}(\mathrm{target} - \mathrm{output})^2$
Backward propagation pass
Iterate forward and backward propagation over all training data
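The forward/backward iteration can be sketched on a much smaller model than a CNN. The example below assumes a single fully connected layer with sigmoid outputs, trained by plain gradient descent on the squared error above. The toy data, learning rate, and epoch count are illustrative choices, not values from the lecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, T, epochs=200, lr=0.5, seed=0):
    """Repeat forward and backward passes over all training samples."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(T.shape[1], X.shape[1]))  # initialize weights
    b = np.zeros(T.shape[1])
    history = []
    for _ in range(epochs):
        total = 0.0
        for x, t in zip(X, T):
            o = sigmoid(W @ x + b)                 # forward propagation pass
            total += np.sum(0.5 * (t - o) ** 2)    # total error for this sample
            delta = (o - t) * o * (1.0 - o)        # backward pass: dE/dz via the chain rule
            W -= lr * np.outer(delta, x)           # gradient descent update
            b -= lr * delta
        history.append(total)
    return W, b, history

# Toy 2-feature data: first two samples are class 0, last two are class 1.
X = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
T = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)

W, b, history = train(X, T)
predictions = np.argmax(sigmoid(X @ W.T + b), axis=1)
```

`history` records the summed error per pass over the data, which should shrink as training proceeds.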
Applications in Computer Vision
Super-Resolution
Figure Credit: Ledig et al. “Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network”
Image Segmentation & Classification
Figure Credit: Dai, He and Sun “Instance-aware Semantic Segmentation via Multi-task Network Cascades”
Style Transfer
Figure Credit: Zhu et al. “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”
Semantic 3D Reconstruction
Figure Credit: Dai et al. “3DMV: Joint 3D-Multi-View Prediction for 3D Semantic Scene Segmentation”, ECCV 2018