An (very brief) introduction - ESO€¦ · Prof. Dr. Laura Leal-Taixé An (very brief)...
Transcript of An (very brief) introduction - ESO€¦ · Prof. Dr. Laura Leal-Taixé An (very brief)...
Prof. Dr. Laura Leal-Taixé
www.dvl.in.tum.de
An (very brief) introduction to Deep Learning
for Computer
Vision
Computer Vision
Give eyes to a computer
Computer Vision
Understand every pixel of an image
Computer Vision
road
tree
Semantic segmentation
personcar
Understand every pixel of an image
Computer Vision
person 1
tree
Semantic segmentation
person 2
person 3
Instance-based
segmentation
road
car
Understand every pixel of an image
Computer Vision
Understand every pixel of a video
Semantic segmentation
Instance-based
segmentation
Multiple object
tracking
Dynamic Scene Understanding
Understand every pixel of a video
Semantic segmentation
Instance-based
segmentation
Multiple object
tracking
Artificial Intelligence
Data, examples
Computational models
Perform a given task on new unseen data
LEARNING, TRAINING
Artificial Intelligence
Data, examples
Computational models
Perform a given task on new unseen data
Expert eye
Self-driving cars
AI nowadays
What is Deep Learning, really?
CAT
What is Deep Learning, really?
Each node is a small classifier
What is Deep Learning, really?
Each node is a small classifier
Each classifier makes tiny decision
Furry
Has two eyes
Has a tail
Has paws
Has two ears These are learned
from data!
Convolutions on Images
-5 3 2 -5 34 3 2 1 -31 0 3 3 5-2 0 1 4 45 6 7 9 -1
0 -1 0-1 5 -10 -1 0
Ima
ge
5x5
Ke
rne
l 3
x3
6
Ou
tpu
t 3
x3
5 ⋅ 3 + −1 ⋅ 3 + −1 ⋅ 2 + −1 ⋅ 0 + −1 ⋅ 4 =15 − 9 = 6
-5 3 2 -5 34 3 2 1 -31 0 3 3 5-2 0 1 4 45 6 7 9 -1
Convolutions on Images
0 -1 0-1 5 -10 -1 0
Ima
ge
5x5
Ke
rne
l 3
x3
6 1
Ou
tpu
t 3
x3
5 ⋅ 2 + −1 ⋅ 2 + −1 ⋅ 1 + −1 ⋅ 3 + −1 ⋅ 3 =10 − 9 = 1
-5 3 2 -5 34 3 2 1 -31 0 3 3 5-2 0 1 4 45 6 7 9 -1
Convolutions on Images
0 -1 0-1 5 -10 -1 0
Ima
ge
5x5
Ke
rne
l 3
x3
6 1 8
Ou
tpu
t 3
x3
5 ⋅ 1 + −1 ⋅ (−5) + −1 ⋅ (−3) + −1 ⋅ 3 + −1 ⋅ 2 =5 + 3 = 1
-5 3 2 -5 34 3 2 1 -31 0 3 3 5-2 0 1 4 45 6 7 9 -1
Convolutions on Images
0 -1 0-1 5 -10 -1 0
Ima
ge
5x5
Ke
rne
l 3
x3
6 1 8-7
Ou
tpu
t 3
x3
5 ⋅ 0 + −1 ⋅ 3 + −1 ⋅ 0 + −1 ⋅ 1 + −1 ⋅ 3 =0 − 7 = −7
-5 3 2 -5 34 3 2 1 -31 0 3 3 5-2 0 1 4 45 6 7 9 -1
Convolutions on Images
0 -1 0-1 5 -10 -1 0
Ima
ge
5x5
Ke
rne
l 3
x3
6 1 8-7 9
Ou
tpu
t 3
x3
5 ⋅ 3 + −1 ⋅ 2 + −1 ⋅ 3 + −1 ⋅ 1 + −1 ⋅ 0 =15 − 6 = 9
-5 3 2 -5 34 3 2 1 -31 0 3 3 5-2 0 1 4 45 6 7 9 -1
Convolutions on Images
0 -1 0-1 5 -10 -1 0
Ima
ge
5x5
Ke
rne
l 3
x3
6 1 8-7 9 2
Ou
tpu
t 3
x3
5 ⋅ 3 + −1 ⋅ 1 + −1 ⋅ 5 + −1 ⋅ 4 + −1 ⋅ 3 =15 − 13 = 2
-5 3 2 -5 34 3 2 1 -31 0 3 3 5-2 0 1 4 45 6 7 9 -1
Convolutions on Images
0 -1 0-1 5 -10 -1 0
Ima
ge
5x5
Ke
rne
l 3
x3
6 1 8-7 9 2-5
Ou
tpu
t 3
x3
5 ⋅ 0 + −1 ⋅ 0 + −1 ⋅ 1 + −1 ⋅ 6+ −1 ⋅ (−2) = −5
-5 3 2 -5 34 3 2 1 -31 0 3 3 5-2 0 1 4 45 6 7 9 -1
Convolutions on Images
0 -1 0-1 5 -10 -1 0
Ima
ge
5x5
Ke
rne
l 3
x3
6 1 8-7 9 2-5 -9
Ou
tpu
t 3
x3
5 ⋅ 1 + −1 ⋅ 3 + −1 ⋅ 4 + −1 ⋅ 7 + −1 ⋅ 0 =5 − 14 = −9
-5 3 2 -5 34 3 2 1 -31 0 3 3 5-2 0 1 4 45 6 7 9 -1
Convolutions on Images
0 -1 0-1 5 -10 -1 0
Ima
ge
5x5
Ke
rne
l 3
x3
6 1 8-7 9 2-5 -9 3
Ou
tpu
t 3
x3
5 ⋅ 4 + −1 ⋅ 3 + −1 ⋅ 4 + −1 ⋅ 9 + −1 ⋅ 1 =20 − 17 = 3
Image filters
• Each kernel gives us a different image filter
Prof. Leal-Taixé and Prof. Niessner24
Input
Edge detection
−1 −1 −1−1 8 −1−1 −1 −1
Sharpen
0 −1 0−1 5 −10 −1 0
Box mean
191 1 11 1 11 1 1
Gaussian blur
116
1 2 12 4 21 2 1
LET’S L
EARN T
HESE FIL
TERS!
Convolutions on RGB Images
32
323
3 5
5
32×32×3 image (pixels %)
5×5×3 filter (weights &)
128
28
activation map(also feature map)
Convolve
Convolution Layer
32
323
3 5
5
32×32×3 image
5×5×3 filter
128
28
activation maps
Convolve
1
Let’s apply a different filterwith different weights!
Convolution Layer
32
323
32×32×3 image
528
28
activation maps
Convolve
Let’s apply **five** filters,each with different weights!
Convolution “Layer”
Visualizing a CNN
Low-level
features
Mid-level
features
Object
parts
Big Data
Models know where
to learn from
Hardware
Models are
trainable
Deep
Models are
complex
What made Deep Learning possible?
Big data
ImageNet: Goal 10.000 images per 100.000 words
Deep Learning: what is it good at?
Input A Response B
English sentence
Picture
Audio clip
French sentence
Photo tagging
Transcript
Machine translation
Face recognition
Speech recognition
Supervised learning
How to obtain the model?
Data points
Model parameters
Labels (ground truth)
Estimation
Loss function
Optimization
Deep Learning for Image Classification
Classification
Classification
+
Localization
Object
detection
Instance
segmentation
Instance segmentation with Mask-RCNN
K. He, G. Gkioxari, P. Dollar, R. Girshick. Mask R-CNN. ICCV 2017.
Instance segmentation with Mask-RCNN
K. He, G. Gkioxari, P. Dollar, R. Girshick. Mask R-CNN. ICCV 2017.
Manipulating images
Jun-Yan Zhu*, Taesung Park*, Phillip Isola, and Alexei A. Efros. "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks", ICCV, 2017.
Manipulating images
Jun-Yan Zhu*, Taesung Park*, Phillip Isola, and Alexei A. Efros. "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks", ICCV, 2017.
The world is dynamic!
• Deep Learning pushed single-image analysis to a point where results are usable in real-world scenarios
• The world is not static
What we still need in ML: good memory models
Multiple object tracking
Goal: detect and track all objects in a
scene
Video object segmentation
TecoGANLowRes
M. Chu, Y. Xie, L. Leal-Taixé and N. Thuerey. „Temporally Coherent GANs for Video Super-Resolution (TecoGAN)“
Video super resolution
TecoGANLowRes
M. Chu, Y. Xie, L. Leal-Taixé and N. Thuerey. „Temporally Coherent GANs for Video Super-Resolution (TecoGAN)“
Video super resolution
M. Chu, Y. Xie, L. Leal-Taixé and N. Thuerey. „Temporally Coherent GANs for Video Super-Resolution (TecoGAN)“
TecoGAN results
Photo
MapObtain the camera
pose with respect to
the map
Visual localization
Visual Localization: applications
• Robot navigation
• Augmented reality
[Middelberg, Sattler, Untzelmann, Kobbelt, Scalable 6-DOF Localization on Mobile Devices. ECCV 2014]
The challenge: Big data
• We need data, lots of data!
• We might not have the luxury of data for some applications such as medical diagnosis
• Data is biased
Big data
Data bias• Increase diversity in
the data
• Increase diversity in the AI community which is building the algorithms
Data bias
The generalization problem
• Neural networks are GREAT at finding patterns in data they have seen, but not so great at generalizing to new scenarios
True intelligence is still far away
Prof. Dr. Laura Leal-Taixé
www.dvl.in.tum.de
Thank you
https://dvl.in.tum.deIf you have images, contact us!