ITK Deformable Registration Demons Methods. Deformable Registration.
Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang...
Transcript of Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang...
![Page 1: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/1.jpg)
Introduction to Deep Learning
Xiaogang Wang
Department of Electronic Engineering, The Chinese University of Hong Kong
![Page 2: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/2.jpg)
Outline
• Historical review of deep learning
• Introduction to classical deep models
• Why does deep learning work?
![Page 3: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/3.jpg)
Machine Learning
)(xFx y
Class label(Classification)Real-valued Vector
(Estimation)
{dog, cat, horse, flower, …}Object recognition
Super resolution
Low-resolution image
High-resolution image
![Page 4: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/4.jpg)
Neural networkBack propagation
1986
• Solve general learning problems
• Tied with biological system
Nature
![Page 5: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/5.jpg)
Neural networkBack propagation
1986
x1 x2 x3
w1 w2w3
g(x)
f(net)
Nature
![Page 6: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/6.jpg)
Neural networkBack propagation
1986
• Solve general learning problems
• Tied with biological system
But it is given up…
• Hard to train
• Insufficient computational resources
• Small training sets
• Does not work well
Nature
![Page 7: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/7.jpg)
Neural networkBack propagation
1986 2006
• SVM
• Boosting
• Decision tree
• KNN
• …
• Flat structures
• Loose tie with biological systems
• Specific methods for specific tasks
– Hand crafted features (GMM-HMM, SIFT, LBP, HOG)
Kruger et al. TPAMI’13
Nature
![Page 8: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/8.jpg)
Neural networkBack propagation
1986 2006
Deep belief netScience
… …
… …
… …
… … • Unsupervised & Layer-wised pre-training
• Better designs for modeling and training (normalization, nonlinearity, dropout)
• New development of computer architectures– GPU
– Multi-core computer systems
• Large scale databases
Big Data !
Nature
![Page 9: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/9.jpg)
Machine Learning with Big Data
• Machine learning with small data: overfitting, reducing model complexity (capacity)
• Machine learning with big data: underfitting, increasing model complexity, optimization, computation resource
![Page 10: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/10.jpg)
How to increase model capacity?
Curse of dimensionality
D. Chen, X. Cao, F. Wen, and J. Sun. Blessing of dimensionality: Highdimensional feature and its efficient compression for face verification. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, 2013.
Blessing of dimensionality
Learning hierarchical feature transforms (Learning features with deep structures)
![Page 11: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/11.jpg)
Neural networkBack propagation
1986
• Solve general learning problems
• Tied with biological system
But it is given up…
2006
Deep belief netScience
deep learning results
Speech
2011
Nature
![Page 12: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/12.jpg)
Rank Name Error rate
Description
1 U. Toronto 0.15315 Deep learning
2 U. Tokyo 0.26172 Hand-crafted features and learning models.Bottleneck.
3 U. Oxford 0.26979
4 Xerox/INRIA 0.27058
Object recognition over 1,000,000 images and 1,000 categories (2 GPU)
Neural networkBack propagation
1986 2006
Deep belief netScience Speech
2011 2012
Nature
A. Krizhevsky, L. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” NIPS, 2012.
![Page 13: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/13.jpg)
Examples from ImageNet
![Page 14: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/14.jpg)
Top5 Image Classification Error on ImageNet
14
AlexNet15.375%
ILSVRC 2012 ILSVRC 2013
Clarifai11.197%
ILSVRC 2014
GoogLeNet6.656%
Human5.1%
MSRA4.9%
![Page 15: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/15.jpg)
• How to label the data, or define the evaluation ask is challenging and important
![Page 16: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/16.jpg)
ImageNet Object Detection Task (2013)
16
• 200 object classes
• 40,000 test images
![Page 17: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/17.jpg)
Mean Average Precision (mAP)
17
UvA-Euvision22.581%
ILSVRC 2013
RCNN31.4%
ILSVRC 2014
GoogLeNet43.9%
DeepID-Net50.3%
W. Ouyang and X. Wang, et al. “DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection,” CVPR 2015
![Page 18: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/18.jpg)
Neural networkBack propagation
1986 2006
Deep belief netScience Speech
2011 2012
• ImageNet 2014 – object detection challenge
W. Ouyang and X. Wang et al. “DeepID-Net: deformable deep convolutional neural networks for object detection”, CVPR, 2015
RCNN(Berkley)
Berkley vision
UvA-Euvision
DeepInsight GooLeNet(Google)
DeepID-Net (CUHK)
Model average n/a n/a n/a 40.5 43.9 50.3
Single model 31.4 34.5 35.4 40.2 38.0 47.9
Wanli Ouyang
![Page 19: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/19.jpg)
Labeled Faces in the Wild (2007)
Best results without deep learning
![Page 20: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/20.jpg)
![Page 21: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/21.jpg)
Design Cycle start
Collect data
Preprocessing
Feature design
Choose and design model
Train classifier
Evaluation
end
Domain knowledge Interest of people working on computer vision, speech recognition, medical image processing,…
Interest of people working on machine learning
Interest of people working on machine learning and computer vision, speech recognition, medical image processing,…
Preprocessing and feature design may lose useful information and not be optimized, since they are not parts of an end-to-end learning system
Preprocessing could be the result of another pattern recognition system
![Page 22: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/22.jpg)
Person re-identification pipeline
Pedestrian detection
Pose estimation
Body parts segmentation
Photometric & geometric
transform
Feature extraction
Classification
Face recognition pipeline
Face alignment
Geometric rectification
Photometricrectification
Feature extraction
Classification
![Page 23: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/23.jpg)
Design Cycle with Deep Learning
start
Collect data
Preprocessing(Optional)
Design network
Feature learning
Classifier
Train network
Evaluation
end
• Learning plays a bigger role in the design circle
• Feature learning becomes part of the end-to-end learning system
• Preprocessing becomes optional means that several pattern recognition steps can be merged into one end-to-end learning system
• Feature learning makes the key difference
• We underestimated the importance of data collection and evaluation
![Page 24: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/24.jpg)
What makes deep learning successful in computer vision?
Deep learning
Li Fei-Fei Geoffrey Hinton
Data collection Evaluation task
One million images with labels
Predict 1,000 image categories
CNN is not new
Design network structure
New training strategies
Feature learned from ImageNet can be well generalized to other tasks and datasets!
![Page 25: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/25.jpg)
Learning features and classifiers separately
• Not all the datasets and prediction tasks are suitable for learning features with deep models
Dataset A
feature transform
Classifier 1 Classifier 2...
Prediction on task 1
...Prediction on task 2
Deep learning
Training stage A Dataset B
feature transform
Classifier B
Prediction on task B (Our target task)
Training stage B
![Page 26: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/26.jpg)
Deep learning can be treated as a language to described the world with great flexibility
Collect data
Preprocessing 1
Feature design
Classifier
Evaluation
Preprocessing 2
…
Collect data
Feature transform
Feature transform
…
Classifier
Deep neural network
Evaluation
Connection
![Page 27: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/27.jpg)
Introduction to Deep Learning
• Historical review of deep learning
• Introduction to classical deep models
• Why does deep learning work?
![Page 28: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/28.jpg)
Introduction on Classical Deep Models
• Convolutional Neural Networks (CNN)– Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based Learning Applied to
Document Recognition,” Proceedings of the IEEE, Vol. 86, pp. 2278-2324, 1998.
• Deep Belief Net (DBN)– G. E. Hinton, S. Osindero, and Y. Teh, “A Fast Learning Algorithm for Deep Belief Nets,”
Neural Computation, Vol. 18, pp. 1527-1544, 2006.
• Auto-encoder– G. E. Hinton and R. R. Salakhutdinov, “Reducing the Dimensionality of Data with Neural
Networks,” Science, Vol. 313, pp. 504-507, July 2006.
![Page 29: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/29.jpg)
Classical Deep Models
• Convolutional Neural Networks (CNN)– First proposed by Fukushima in 1980
– Improved by LeCun, Bottou, Bengio and Haffner in 1998
Convolution PoolingLearned filters
![Page 30: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/30.jpg)
Backpropagation
W is the parameter of the network; J is the objective function
Output layer
Hidden layers
Input layer
Target values
Feedforwardoperation
Back error propagation
D. E. Rumelhart, G. E. Hinton, R. J. Williams, “Learning Representations by Back-propagation Errors,” Nature, Vol. 323, pp. 533-536, 1986.
![Page 31: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/31.jpg)
Classical Deep Models
• Deep belief net
– Hinton’06
P(x,h1,h2) = p(x|h1) p(h1,h2)
1
1
1
hx
hx
hx
1hx
,
),(
),(
),(E
E
e
eP
E(x,h1)=b' x+c' h1+h1' Wx
Initial pointPre-training:• Good initialization point• Make use of unlabeled data
![Page 32: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/32.jpg)
Classical Deep Models
• Auto-encoder
– Hinton and Salakhutdinov 2006
x
1h
2h
1h~
x~
1W b1
2W b2
2W' b3
1W' b4Encoding: h1 = σ(W1x+b1)
h2 = σ(W2h1+b2)
Decoding: = σ(W’2h2+b3)
= σ(W’1h1+b4)
1h~
x~
![Page 33: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/33.jpg)
Introduction to Deep Learning
• Historical review of deep learning
• Introduction to classical deep models
• Why does deep learning work?
• Properties of deep feature representations
![Page 34: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/34.jpg)
Feature Learning vs Feature Engineering
![Page 35: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/35.jpg)
Feature Engineering
• The performance of a pattern recognition system heavily depends on feature representations
• Manually designed features dominate the applications of image and video understanding in the past– Reply on human domain knowledge much more than data
– If handcrafted features have multiple parameters, it is hard to manually tune them
– Feature design is separate from training the classifier
– Developing effective features for new applications is slow
![Page 36: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/36.jpg)
Handcrafted Features for Face Recognition
1980s
Geometric features
1992
Pixel vector
1997
Gabor filters
2 parameters
2006
Local binary patterns
3 parameters
![Page 37: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/37.jpg)
Feature Learning
• Learning transformations of the data that make it easier to extract useful information when building classifiers or predictors– Learn the values of a huge number of parameters in feature
representations
– Make better use of big data
– Jointly learning feature transformations and classifiers makes their integration optimal
– Faster to get feature representations for new applications
![Page 38: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/38.jpg)
Deep Learning Means Feature Learning
• Deep learning is about learning hierarchical feature representations
• Good feature representations should be able to disentangle multiple factors coupled in the data
Trainab
le Feature
Transfo
rm
Trainab
le Feature
Transfo
rm
Trainab
le Feature
Transfo
rm
Trainab
le Feature
Transfo
rm
Data …
Classifier
Pixel 1
Pixel n
Pixel 2 Ideal Feature
Transform
view
expression
![Page 39: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/39.jpg)
Deep Learning Means Feature Learning
• How to effectively learn features with deep models
– With challenging tasks
– Predict high-dimensional vectors
Pre-train on classifying 1,000
categories
Fine-tune on classifying 201
categories
Feature representation
SVM binary classifier for each
categoryDetect 200 object classes on ImageNet
W. Ouyang and X. Wang et al. “DeepID-Net: deformable deep convolutional neural networks for object detection”, CVPR, 2015
![Page 40: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/40.jpg)
Dataset A
feature transform
Classifier A
Distinguish 1000 categories
Training stage A
Dataset B
feature transform
Classifier B
Distinguish 201 categories
Training stage B
Dataset C
feature transform
SVM
Distinguish one object class from all the negatives
Training stage C
Fixed
![Page 41: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/41.jpg)
Pedestrian Detection aided by Deep Learning Semantic Tasks
FemaleBagright
Femaleright
MaleBackpackBack
VehicleHorizontal
VehicleVertical
TreeVertical
Y. Tian, P. Luo, X. Wang, and X. Tang, “Pedestrian Detection aided by Deep Learning Semantic Tasks,” CVPR 2015
![Page 42: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/42.jpg)
42
Pedestrian Background
(a) Data Generation
patches160
Bb: Stanford Bkg.
bldg.
hard negatives
64
7
7
16
40
8 4
3248
6496
220 10 5
5
5 33
3
500
100
200
3
(b) TA-CNN
conv1conv2
conv3conv4
fc5
fc6
3
pedestrian classifier:
pedestrian attributes:
… shared bkg. attributes:
unshared bkg. attributes:
Ba: CamVid Bc: LM+SUNP: Caltech
hard negatives
D
……
SPV:x y
h(L)
z
y
as
ap
au
sky tree road traffic light
WL
Wz
h(L-1) Wm
Wap
Was
Wau
hard negatives
sky tree road vertical horizontal sky bldg. tree road vehiclebldg.
Figure 4:The proposed pipeline forpedestrian detection (B estview ed in color).
convolution and m ax-pooling,w hich are form ulated as
h v(l)n = relu(bv(l) +X
u
k vu (l)⇤h u (l− 1)n ), (1)
hv(l)n (i,j) = m ax
8 (p,q)2⌦(i,j ){h v(l)n (p,q)}. (2)
In Eqn.(1),relu(x) = m ax(0,x)isthe rectified linearfunc-tion [19]and ⇤denotes the convolution operatorapplied on
every pixelofthe feature m ap hu (l− 1)n ,w here h
u (l− 1)n and
hv(l)n stand forthe u-th inputchannelatthe l− 1 layerandthe v-th outputchannelatthe l layer,respectively. k vu (l)
and bv(l) denote the filters and bias. In Eqn.(2),the feature
m ap hv(l)n is partitioned into grid w ith overlapping cells,
each of w hich is denoted as⌦(i,j), w here (i,j) indicatesthe cell index. The m ax-pooling com pares value at eachlocation (p,q) ofa celland outputs the m axim um value ofeach cell.
Each hidden layerin fc5 and fc6 is obtained by
h (l)n = relu(W(l)T h (l− 1)n + b (l)), (3)
w here the higher levelrepresentation is transform ed fromlow er levelw ith a non-linear m apping. W (l) and b (l) arethe w eightm atrixesand bias vectoratthe l-th layer.
TA -C N N can be form ulated asm inim izing the log poste-riorprobability w ith respectto a setofnetw ork param eters
W
W ⇤= arg m inW
−
NX
n = 1
log p(yn ,opn ,o
sn ,o
un |x n ;W ), (4)
w here E = −P Nn = 1 log p(yn ,o
pn ,o
sn ,o
un |x n ) is a com -
plete loss function regarding the entire training set. H ere,w e illustrate that the shared attributes osn in Eqn.(4) arecrucialto learn shared representation across m ultiple scenedatasets B ’s.
For clarity,w e keep only the unshared scene attributesoun in the loss function, w hich then becom es E =
−P Nn = 1 log p(o
un |x n ). Let x
a denote the sam ple of B a.A shared representation can be learned if and only if allthe sam ples share atleastone target(attribute). Since thesam ples are independent,the lossfunction can be expandedasE = −
P Ii= 1 log p(o
u 1i |x
ai)−P Jj= 1 log p(o
u 2j ,o
u 3j |x
bj)−
P Kk= 1 log p(o
u 4k |x
ck),w here I + J + K = N ,im plying that
each datasetis only used to optim ize its corresponding un-shared attribute,although allthe datasets and attributes aretrained in a single TA -C N N .Forinstance,the classificationm odel of ou 1 is learned by using B a w ithout leveragingthe existence of the other datasets. In other w ords, theprobability of p(ou 1|x a,x b,x c) = p(ou 1|x a) because ofm issing labels. The above form ulation is not sufficientto learn shared features am ong datasets, especially w henthe data have large differences. To bridge m ultiple scenedatasets B ’s, w e introduce the shared attributes os, the
![Page 43: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/43.jpg)
Pedestrian Detection on Caltech (average miss detection rates)
43
HPG+SVM68% DPM
63%
Joint DL39%
DL aided by semantic tasks
17%
W. Ouyang and X. Wang, “Joint Deep Learning for Pedestrian Detection,” ICCV 2013.
Y. Tian, P. Luo, X. Wang, and X. Tang, “Pedestrian Detection aided by Deep Learning Semantic Tasks,” CVPR 2015.
![Page 44: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/44.jpg)
Example 1: deep learning generic image features
• Hinton group’s groundbreaking work on ImageNet– They did not have much experience on general image classification on
ImageNet
– It took one week to train the network with 60 Million parameters
– The learned feature representations are effective on other datasets (e.g. Pascal VOC) and other tasks (object detection, segmentation, tracking, and image retrieval)
![Page 45: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/45.jpg)
PASCAL VOC (SIFT, HOG, DPM…)
45
![Page 46: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/46.jpg)
PSCAL VOC (CNN features)
46
![Page 47: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/47.jpg)
PSCAL VOL (CNN features)
47
DeepID-Net 64.1%
Our current result 73.9%
![Page 48: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/48.jpg)
Deep learning is feature learning
Image classification Object detection Tracking
SegmentationFeatures learned on ImageNet
![Page 49: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/49.jpg)
Deep Structures vs Shallow Structures(Why deep?)
![Page 50: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/50.jpg)
Shallow Structures
• A three-layer neural network (with one hidden layer) can approximate any classification function
• Most machine learning tools (such as SVM, boosting, and KNN) can be approximated as neural networks with one or two hidden layers
• Shallow models divide the feature space into regions and match templates in local regions. O(N) parameters are needed to represent N regions
SVM
![Page 51: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/51.jpg)
Deep Machines are More Efficient for Representing Certain Classes of Functions
• Theoretical results show that an architecture with insufficient depth can require many more computational elements, potentially exponentially more (with respect to input size), than architectures whose depth is matched to the task (Hastad 1986, Hastad and Goldmann 1991)
• It also means many more parameters to learn
![Page 52: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/52.jpg)
• Take the d-bit parity function as an example
• d-bit logical parity circuits of depth 2 have exponential size (Andrew Yao, 1985)
• There are functions computable with a polynomial-size logic gates circuits of depth k that require exponential size when restricted to depth k -1 (Hastad, 1986)
(X1, . . . , Xd) Xi is even
![Page 53: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/53.jpg)
• Architectures with multiple levels naturally provide sharing and re-use of components
Honglak Lee, NIPS’10
![Page 54: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/54.jpg)
Humans Understand the World through Multiple Levels of Abstractions
• We do not interpret a scene image with pixels– Objects (sky, cars, roads, buildings, pedestrians) -> parts (wheels,
doors, heads) -> texture -> edges -> pixels
– Attributes: blue sky, red car
• It is natural for humans to decompose a complex problem into sub-problems through multiple levels of representations
![Page 55: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/55.jpg)
Humans Understand the World through Multiple Levels of Abstractions
• Humans learn abstract concepts on top of less abstract ones
• Humans can imagine new pictures by re-configuring these abstractions at multiple levels. Thus our brain has good generalization can recognize things never seen before.– Our brain can estimate shape, lighting and pose from a face image and
generate new images under various lightings and poses. That’s why we have good face recognition capability.
![Page 56: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/56.jpg)
Local and Global Representations
![Page 57: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/57.jpg)
• The way these regions carve the input space still depends on few parameters: this huge number of regions are not placed independently of each other
• We can thus represent a function that looks complicated but actually has (global) structures
![Page 58: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/58.jpg)
Joint Learning vs Separate Learning
Data collection
Preprocessing step 1
Preprocessing step 2
… Feature extraction
Training or manual design
Classification
Manual design
Training or manual design
Data collection
Feature transform
Feature transform
… Feature transform
Classification
End-to-end learning
? ? ?
Deep learning is a framework/language but not a black-box model
Its power comes from joint optimization and increasing the capacity of the learner
![Page 59: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/59.jpg)
What if we treat an existing deep model as a black box in pedestrian detection?
ConvNet−U−MS
– Sermnet, K. Kavukcuoglu, S. Chintala, and LeCun, “Pedestrian Detection with Unsupervised Multi-Stage Feature Learning,” CVPR 2013.
![Page 60: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/60.jpg)
Results on Caltech Test Results on ETHZ
![Page 61: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/61.jpg)
• N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. CVPR, 2005. (6000 citations)
• P. Felzenszwalb, D. McAlester, and D. Ramanan. A Discriminatively Trained, Multiscale, Deformable Part Model. CVPR, 2008. (2000 citations)
• W. Ouyang and X. Wang. A Discriminative Deep Model for Pedestrian Detection with Occlusion Handling. CVPR, 2012.
![Page 62: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/62.jpg)
Our Joint Deep Learning Model
W. Ouyang and X. Wang, “Joint Deep Learning for Pedestrian Detection,” Proc. ICCV, 2013.
![Page 63: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/63.jpg)
Modeling Part Detectors
• Design the filters in the second convolutional layer with variable sizes
Part models Learned filtered at the second convolutional layer
Part models learned from HOG
![Page 64: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/64.jpg)
Deformation Layer
![Page 65: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/65.jpg)
Visibility Reasoning with Deep Belief Net
Correlates with part detection score
![Page 66: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/66.jpg)
Experimental Results
• Caltech – Test dataset (largest, most widely used)
2000 2002 2004 2006 2008 2010 2012 201430
40
50
60
70
80
90
100
Ave
rage
mis
s ra
te (
%)
![Page 67: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/67.jpg)
Experimental Results
• Caltech – Test dataset (largest, most widely used)
2000 2002 2004 2006 2008 2010 2012 201430
40
50
60
70
80
90
100 95%
Ave
rage
mis
s ra
te (
%)
![Page 68: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/68.jpg)
Experimental Results
• Caltech – Test dataset (largest, most widely used)
2000 2002 2004 2006 2008 2010 2012 201430
40
50
60
70
80
90
100 95%68%
Ave
rage
mis
s ra
te (
%)
![Page 69: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/69.jpg)
Experimental Results
• Caltech – Test dataset (largest, most widely used)
2000 2002 2004 2006 2008 2010 2012 201430
40
50
60
70
80
90
100 95%68%
63% (state-of-the-art)
Ave
rage
mis
s ra
te (
%)
![Page 70: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/70.jpg)
Experimental Results
• Caltech – Test dataset (largest, most widely used)
2000 2002 2004 2006 2008 2010 2012 201430
40
50
60
70
80
90
100 95%68%
63% (state-of-the-art)
53%
39% (best performing)Improve by ~ 20%
W. Ouyang, X. Zeng and X. Wang, "Modeling Mutual Visibility Relationship in Pedestrian Detection ", CVPR 2013.W. Ouyang, Xiaogang Wang, "Single-Pedestrian Detection aided by Multi-pedestrian Detection ", CVPR 2013.X. Zeng, W. Ouyang and X. Wang, ” A Cascaded Deep Learning Architecture for Pedestrian Detection,” ICCV 2013.W. Ouyang and Xiaogang Wang, “Joint Deep Learning for Pedestrian Detection,” IEEE ICCV 2013.
W. Ouyang and X. Wang, "A Discriminative Deep Model for Pedestrian Detection with Occlusion Handling,“ CVPR 2012.
Ave
rage
mis
s ra
te (
%)
![Page 71: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/71.jpg)
DN-HOGUDN-HOGUDN-HOGCSSUDN-CNNFeatUDN-DefLayer
![Page 72: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/72.jpg)
Deformation layer for general object detection
![Page 73: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/73.jpg)
Deformation layer for repeated patterns
Pedestrian detection General object detection
Assume no repeated pattern Repeated patterns
![Page 74: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/74.jpg)
Deformation layer for repeated patterns
Pedestrian detection General object detection
Assume no repeated pattern Repeated patterns
Only consider one object class Patterns shared across different object classes
![Page 75: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/75.jpg)
Deformation constrained pooling layer
Can capture multiple patterns simultaneously
![Page 76: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/76.jpg)
Deep model with deformation layer
Training scheme Cls+Det Loc+Det Loc+Det
Net structure AlexNet Clarifai Clarifai+Def layer
Mean AP on val2 0.299 0.360 0.385
Patterns shared across different classes
![Page 77: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/77.jpg)
Large learning capacity makes high dimensional data transforms possible, and makes better use
of contextual information
![Page 78: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/78.jpg)
• How to make use of the large learning capacity of deep models?
– High dimensional data transform
– Hierarchical nonlinear representations
?
SVM + feature
smoothness, shape prior…Output
Input
High-dimensional data transform
![Page 79: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/79.jpg)
Face Parsing
• P. Luo, X. Wang and X. Tang, “Hierarchical Face Parsing via Deep Learning,” CVPR 2012
![Page 80: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/80.jpg)
Motivations
• Recast face segmentation as a cross-modality data transformation problem
• Cross modality autoencoder
• Data of two different modalities share the same representations in the deep model
• Deep models can be used to learn shape priors for segmentation
![Page 81: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/81.jpg)
Training Segmentators
![Page 82: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/82.jpg)
![Page 83: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/83.jpg)
Fully convolutional neural network
Convolution-pooling layers
Fully connected layers “Fusion” convolutional layers implemented by 1 x 1 kernel
![Page 84: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/84.jpg)
Big data
Rich information
Challenging supervision task with rich predictions
How to make use of it?
Capacity
Go deeper
Joint optimization
Hierarchical feature learning
Go wider
Take large input
Capture contextual information
Domain knowledge
Make learning more efficient
Reduce capacity
![Page 85: Introduction to Deep Learning - ShanghaiTechssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang Wang... · Introduction to Deep Learning ... deformable deep convolutional neural](https://reader030.fdocuments.us/reader030/viewer/2022021501/5ac1231e7f8b9aca388cb5a0/html5/thumbnails/85.jpg)
Deep learning = ?
Machine learning with big data
Feature learning
Joint learning
Contextual learning