Deep Learning for Vision
Part II-CNN and Recognition
Associate Prof. Bingbing Ni (倪冰冰)
Shanghai Jiao Tong University
Convolutional Neural Network
Alpha Go
Consider an image classification problem
Input image: 200x200 → “face”
Fully-connected, 400000 hidden units, 16 billion parameters!
Idea: local connection
Locally-connected, 400000 hidden units, 40 million parameters!
Input image: 200x200
1. Captures a local 10x10 region (100 weights 𝒘)
2. Weight sharing
3. Like “convolution”
4. Can have different local filters to generate different responses
Leads to Conv Filter!
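The parameter counts quoted for the fully-connected and locally-connected cases can be checked with a few lines of arithmetic. The sketch below assumes one weight per connection and ignores biases; the variable names are illustrative, and the shared-filter count assumes a single 10x10 filter.

```python
# Parameter counts for a 200x200 input with 400000 hidden units.
# Assumption: one weight per connection, biases ignored.
input_pixels = 200 * 200      # 40000 input values
hidden_units = 400_000

# Fully connected: every hidden unit sees every pixel.
fc_params = input_pixels * hidden_units

# Locally connected: every hidden unit sees only its own 10x10 patch.
patch_weights = 10 * 10
local_params = hidden_units * patch_weights

# Weight sharing: all hidden units reuse one 10x10 filter.
shared_params = patch_weights

print(fc_params)      # 16000000000
print(local_params)   # 40000000
print(shared_params)  # 100
```

Weight sharing is what turns the locally-connected layer into a convolutional filter: the count drops from 40 million to the 100 weights of a single filter.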
Evidence: biological inspiration
Hubel and Wiesel, 1959
Convolutional filter
Convolve the filter with the image, i.e., “slide over the image spatially, computing dot products”
Filters always extend the full depth of the input volume
Convolutional filter
Output a single number:
- The result of taking a dot product between the filter and a small 5x5x3 chunk of the image (i.e., a 5*5*3 = 75-dimensional dot product + bias)
- Called “convolution” for historical reasons; the operation is in fact “correlation” (the filter is not flipped)
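A minimal NumPy sketch of the single-number output described above. The 32x32x3 image size, the random values, and the helper name `filter_response` are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 3))  # assumed input size
filt = rng.standard_normal((5, 5, 3))     # filter spans the full input depth
bias = 0.1

def filter_response(image, filt, bias, row, col):
    """Dot product between the filter and the chunk whose top-left corner is (row, col)."""
    fh, fw, _ = filt.shape
    chunk = image[row:row + fh, col:col + fw, :]
    # Element-wise product then sum: a 75-dimensional dot product, plus bias.
    # The filter is not flipped, so this is correlation.
    return float(np.sum(chunk * filt) + bias)

value = filter_response(image, filt, bias, 0, 0)
print(filt.size)  # 75
```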
Convolutional layer
Convolve (slide) over all spatial locations
- With six 5x5x3 filters we get 6 activation maps
- Stack these maps to get a new “image” of size 28x28x6 (e.g., from a 32x32x3 input with stride 1 and no padding)
- The set of six 5x5x3 filters is called a “convolutional layer”
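The stacking of activation maps can be sketched with a deliberately naive loop. The 32x32x3 input is an assumption (it is the size that yields 28x28 maps with a 5x5 filter, stride 1, and no padding), and `conv_layer` is a hypothetical helper, not a library function.

```python
import numpy as np

def conv_layer(image, filters, biases, stride=1):
    """Naive convolutional layer: one activation map per filter.

    image:   (H, W, C)
    filters: (K, F, F, C)
    biases:  (K,)
    """
    n_filters, fh, fw, _ = filters.shape
    h, w, _ = image.shape
    out_h = (h - fh) // stride + 1
    out_w = (w - fw) // stride + 1
    out = np.zeros((out_h, out_w, n_filters))
    for k in range(n_filters):
        for i in range(out_h):
            for j in range(out_w):
                r, c = i * stride, j * stride
                chunk = image[r:r + fh, c:c + fw, :]
                out[i, j, k] = np.sum(chunk * filters[k]) + biases[k]
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 3))      # assumed input size
filters = rng.standard_normal((6, 5, 5, 3))   # six 5x5x3 filters
biases = np.zeros(6)

maps = conv_layer(image, filters, biases)
print(maps.shape)  # (28, 28, 6)
```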
An example
Image 6x6
Conv filter 3x3
We set stride = 1
Output map 4x4
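Another example
Image 7x7, Filter 3x3
- If stride = 1, output map size 5x5
- If stride = 2, output map size 3x3
Formula for output size: (N − F)/stride + 1, where N is the input width and F is the filter width
What happens when stride = 3? (7 − 3)/3 + 1 is not an integer, so the filter does not fit.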
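The output-size formula can be wrapped in a small helper. The function name and the choice to return `None` when the filter does not tile the input evenly are assumptions made for this sketch.

```python
def conv_output_size(n, f, stride=1):
    """Output width for input width n and filter width f: (n - f) / stride + 1.

    Returns None when the stride does not fit the input evenly.
    """
    span = n - f
    if span < 0 or span % stride != 0:
        return None
    return span // stride + 1

print(conv_output_size(7, 3, stride=1))  # 5
print(conv_output_size(7, 3, stride=2))  # 3
print(conv_output_size(7, 3, stride=3))  # None
print(conv_output_size(6, 3, stride=1))  # 4
```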
Zero padding
- In practice, it is common to pad the border with zeros
- In this case N = 7 + 2, F = 3, stride = 3, so the output map size is 3 by the formula
- In general, it is common to see CONV layers with stride = 1, filters of size FxF, and zero-padding of (F − 1)/2
(N + 2 × (F − 1)/2 − F)/1 + 1 = N, which preserves the size!
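A short sketch of the padding arithmetic, using `np.pad` for the zero border. The helper name `padded_output_size` is hypothetical, and the check over F = 3, 5, 7 assumes odd filter sizes so that (F − 1)/2 is an integer.

```python
import numpy as np

def padded_output_size(n, f, stride, pad):
    """Output width with `pad` zeros added on each side: (n + 2*pad - f) / stride + 1."""
    return (n + 2 * pad - f) // stride + 1

image = np.arange(49, dtype=float).reshape(7, 7)
padded = np.pad(image, pad_width=1, mode="constant")  # zero border of width 1
print(padded.shape)                                   # (9, 9)

# The case from the text: N = 7 + 2, F = 3, stride = 3.
print(padded_output_size(7, 3, stride=3, pad=1))      # 3

# Stride 1 with padding (F - 1) / 2 preserves the input size for odd F.
sizes = [padded_output_size(7, f, stride=1, pad=(f - 1) // 2) for f in (3, 5, 7)]
print(sizes)                                          # [7, 7, 7]
```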
In Caffe, convolution is done via vector/matrix operations
First convert the image to columns (im2col), then calculate 𝒘𝒙 + 𝒃
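The image-to-column idea can be sketched in NumPy as follows. This is only an illustration of the technique, not Caffe's actual C++/CUDA code; the function names, array layouts, and toy sizes are assumptions.

```python
import numpy as np

def im2col(image, fh, fw, stride=1):
    """Unroll every (fh x fw x C) patch of the image into one column."""
    h, w, c = image.shape
    out_h = (h - fh) // stride + 1
    out_w = (w - fw) // stride + 1
    cols = np.empty((fh * fw * c, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, s = i * stride, j * stride
            cols[:, i * out_w + j] = image[r:r + fh, s:s + fw, :].ravel()
    return cols, out_h, out_w

def conv_as_matmul(image, filters, biases, stride=1):
    """Convolution as a single matrix product w x + b."""
    n_filters, fh, fw, _ = filters.shape
    cols, out_h, out_w = im2col(image, fh, fw, stride)
    w = filters.reshape(n_filters, -1)   # one row per filter
    out = w @ cols + biases[:, None]     # (K, out_h * out_w)
    return out.reshape(n_filters, out_h, out_w)

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8, 3))
filters = rng.standard_normal((4, 3, 3, 3))
biases = rng.standard_normal(4)

out = conv_as_matmul(image, filters, biases)
print(out.shape)  # (4, 6, 6)
```

The trade-off is memory for speed: overlapping patches are duplicated in the column matrix, but the whole layer becomes one large matrix multiplication that optimized linear algebra routines handle well.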
Compose the network
A ConvNet is a sequence of conv layers interspersed with activation functions
Need to shrink the image step by step to extract higher-level information
The receptive field should become larger and larger in deeper layers
Max pooling
Max pooling with 2x2 filter and stride = 2
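Max pooling with a 2x2 filter and stride 2 keeps the largest value in each non-overlapping 2x2 block. A minimal NumPy sketch, assuming a single-channel map with even height and width (the input values are made up for illustration):

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a single-channel map."""
    h, w = x.shape
    assert h % 2 == 0 and w % 2 == 0, "this sketch assumes even height and width"
    # Split into (h/2, 2, w/2, 2) blocks and take the max inside each block.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])

pooled = max_pool_2x2(x)
print(pooled)
# [[6 8]
#  [3 4]]
```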
Connect conv activation maps to fully connected layers (FC)
Fully connected layer (FC)
FC layers can also be converted to CONV layers by setting the filter size to exactly the size of the input volume
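The FC-to-CONV equivalence can be checked numerically. In this sketch the 7x7x16 input volume and the 10 output units are assumed sizes chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
volume = rng.standard_normal((7, 7, 16))      # assumed input volume
n_out = 10
W = rng.standard_normal((n_out, 7 * 7 * 16))  # FC weight matrix
b = rng.standard_normal(n_out)

# Fully connected layer on the flattened volume.
fc_out = W @ volume.ravel() + b

# The same weights viewed as n_out filters, each the size of the whole volume.
# A filter that covers the entire input has exactly one valid position,
# so each filter yields a single number (a 1x1 output map).
filters = W.reshape(n_out, 7, 7, 16)
conv_out = np.array([np.sum(volume * filters[k]) + b[k] for k in range(n_out)])

print(np.allclose(fc_out, conv_out))  # True
```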
Local Contrast Normalization
- Also performed across features and in the higher layers
- Improves invariance, optimization, and sparsity
Local Contrast Normalization Layer
Implementation of LeNet
Training Deep CNN
Batch Normalization (BN)
Troubleshooting the training
AlexNet
GoogLeNet
LeNet
VGGNet
In practice: small scale, novel classes
- Often a small problem, e.g., hundreds of categories and thousands of samples
- Training a CNN from scratch is not stable
Idea: knowledge transfer via CNN
[Figure: deep CNN model → domain adaptation → fine-tuned deep CNN model; the general low-level features are shared]
Idea: knowledge transfer via CNN
- Take a pre-trained model from a model zoo
- Remove the last fully connected layer and connect a new objective
- Fine-tune the new network with a higher learning rate on the FC layers and a lower learning rate on the early CONV layers
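The recipe above can be sketched on a toy two-layer network in NumPy. Everything here is a stand-in: the "pre-trained" early layer is just random weights (a real one would be loaded from a model zoo), the squared-error loss, the learning rates, and the function names are all assumptions for illustration.

```python
import numpy as np

def fine_tune(W_early, W_head, X, Y, lr_early, lr_head, steps):
    """Gradient descent with a separate learning rate per layer."""
    W_early, W_head = W_early.copy(), W_head.copy()
    for _ in range(steps):
        H = np.maximum(0.0, X @ W_early.T)   # early (pre-trained) features, ReLU
        out = H @ W_head.T                   # new last layer
        d_out = (out - Y) / len(X)           # gradient of the mean squared error
        grad_head = d_out.T @ H
        d_H = (d_out @ W_head) * (H > 0)     # back through the ReLU
        grad_early = d_H.T @ X
        W_head -= lr_head * grad_head        # higher learning rate on the new layer
        W_early -= lr_early * grad_early     # lower learning rate on the early layer
    return W_early, W_head

def mse(W_early, W_head, X, Y):
    out = np.maximum(0.0, X @ W_early.T) @ W_head.T
    return 0.5 * np.mean(np.sum((out - Y) ** 2, axis=1))

rng = np.random.default_rng(0)
W_pre = rng.standard_normal((8, 4)) * 0.5           # stand-in for pre-trained weights
W_new = np.zeros((3, 8))                            # fresh last layer for 3 new classes
X = rng.standard_normal((32, 4))                    # toy data
Y = np.eye(3)[rng.integers(0, 3, size=32)]          # one-hot toy labels

loss_before = mse(W_pre, W_new, X, Y)
W_early_ft, W_head_ft = fine_tune(W_pre, W_new, X, Y,
                                  lr_early=1e-3, lr_head=5e-2, steps=200)
loss_after = mse(W_early_ft, W_head_ft, X, Y)
print(loss_after < loss_before)
```

The small learning rate on the early layer keeps the shared low-level features close to their pre-trained values, while the new layer is free to adapt to the new objective.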
Application: image retrieval
Application: OCR and logo
Application: texture
Application: object detection
Application: scene parsing
Application: action recognition
DCNN Object Detection
R-CNN
How to extend the CNN classification results to object detection?
- Localization: apply CNNs to region proposals
- Scarce data: fine-tune the pre-trained model
SSD: Single Shot MultiBox Detector
Default boxes and aspect ratios
Each feature map cell has a set of default bounding boxes, and the position of each box relative to its corresponding cell is fixed.
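A simplified sketch of how default boxes could be laid out over a square feature map. The width and height parameterization (scale times or divided by the square root of the aspect ratio) follows the common SSD convention as an assumption; the scale value, the aspect ratios, and the function name are illustrative, and details of the real detector such as per-layer scales are left out.

```python
import math

def default_boxes(fmap_size, scale, aspect_ratios):
    """Default boxes (cx, cy, w, h) in relative image coordinates.

    Every cell gets one box per aspect ratio, centred on the cell,
    so the box position relative to its cell is fixed.
    """
    boxes = []
    for i in range(fmap_size):          # rows
        for j in range(fmap_size):      # columns
            cx = (j + 0.5) / fmap_size
            cy = (i + 0.5) / fmap_size
            for ar in aspect_ratios:
                w = scale * math.sqrt(ar)
                h = scale / math.sqrt(ar)
                boxes.append((cx, cy, w, h))
    return boxes

boxes = default_boxes(fmap_size=4, scale=0.3, aspect_ratios=(1.0, 2.0, 0.5))
print(len(boxes))  # 48
print(boxes[0])    # (0.125, 0.125, 0.3, 0.3)
```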
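Recurrent Neural Network
Deep RNN for sequence
[Figure: an RNN cell with input xt, hidden state 𝐡𝒕, and output yt, shown equal to its unrolled form over time steps 0, 1, 2, …, t, with inputs x0, x1, x2, …, xt, hidden states 𝐡𝟎, 𝐡𝟏, 𝐡𝟐, …, 𝐡𝒕, and outputs y0, y1, y2, …, yt]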
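Training: backpropagation through time (BPTT)
[Figure: unrolled RNN with inputs $x_{t-1}, x_t, x_{t+1}$, hidden states $h_{t-1}, h_t, h_{t+1}$, and per-step errors $\varepsilon_{t-1}, \varepsilon_t, \varepsilon_{t+1}$; each error contributes a gradient $\partial \varepsilon_t / \partial h_t$, and gradients flow backward through time via $\partial h_{t+1} / \partial h_t$ (the figure shows the chain from $\partial h_{t+2} / \partial h_{t+1}$ down to $\partial h_{t-1} / \partial h_{t-2}$)]
No different from backpropagation in a vanilla neural network!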
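Backpropagation through time can be sketched for a small tanh RNN. Several things here are assumptions made only to keep the example short: the recurrence h_t = tanh(W_hh h_{t-1} + W_xh x_t), a squared error placed directly on each hidden state, and all function names and sizes.

```python
import numpy as np

def rnn_forward(x_seq, h0, W_hh, W_xh):
    """Return [h0, h_1, ..., h_T] for h_t = tanh(W_hh h_{t-1} + W_xh x_t)."""
    hs = [h0]
    for x_t in x_seq:
        hs.append(np.tanh(W_hh @ hs[-1] + W_xh @ x_t))
    return hs

def total_error(x_seq, targets, h0, W_hh, W_xh):
    """Sum of per-step errors (assumed squared error on the hidden state)."""
    hs = rnn_forward(x_seq, h0, W_hh, W_xh)
    return 0.5 * sum(np.sum((h - y) ** 2) for h, y in zip(hs[1:], targets))

def bptt(x_seq, targets, h0, W_hh, W_xh):
    """Gradients of the total error with respect to W_hh and W_xh."""
    hs = rnn_forward(x_seq, h0, W_hh, W_xh)
    dW_hh = np.zeros_like(W_hh)
    dW_xh = np.zeros_like(W_xh)
    dh_next = np.zeros_like(h0)              # gradient arriving from later steps
    for t in reversed(range(len(x_seq))):
        h_t, h_prev = hs[t + 1], hs[t]
        dh = (h_t - targets[t]) + dh_next    # this step's error plus future errors
        da = dh * (1.0 - h_t ** 2)           # back through tanh
        dW_hh += np.outer(da, h_prev)
        dW_xh += np.outer(da, x_seq[t])
        dh_next = W_hh.T @ da                # pass the gradient to the previous state
    return dW_hh, dW_xh

rng = np.random.default_rng(0)
H, D, T = 3, 2, 4
W_hh = rng.standard_normal((H, H)) * 0.5
W_xh = rng.standard_normal((H, D)) * 0.5
xs = [rng.standard_normal(D) for _ in range(T)]
ys = [rng.standard_normal(H) for _ in range(T)]
h0 = np.zeros(H)

dW_hh, dW_xh = bptt(xs, ys, h0, W_hh, W_xh)
print(dW_hh.shape, dW_xh.shape)  # (3, 3) (3, 2)
```

Unrolled over time, the network is an ordinary feed-forward graph with shared weights, which is why the gradients of the shared matrices are simply accumulated across time steps.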
Image Captioning
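$h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t)$
Now: $h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t + W_{vh} v)$
[Figure: the image feature V is fed into the RNN; $x_0$ = START, $h_0$ outputs “straw”; $x_1$ = “straw”, $h_1$ outputs “hat”; $x_2$ = “hat”, $h_2$ outputs END]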
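The image-conditioned recurrence can be sketched as a single step function. The one-hot word encoding, the tiny vocabulary, the sizes, and the random weights below are all assumptions; the output layer that would actually predict "straw", "hat", and END from each hidden state is omitted.

```python
import numpy as np

def caption_rnn_step(h_prev, x_t, v, W_hh, W_xh, W_vh):
    """One step of h_t = tanh(W_hh h_{t-1} + W_xh x_t + W_vh v)."""
    return np.tanh(W_hh @ h_prev + W_xh @ x_t + W_vh @ v)

rng = np.random.default_rng(0)
vocab = ["START", "straw", "hat", "END"]   # toy vocabulary
H, V_DIM = 5, 6                            # assumed hidden and image-feature sizes
W_hh = rng.standard_normal((H, H)) * 0.1
W_xh = rng.standard_normal((H, len(vocab))) * 0.1
W_vh = rng.standard_normal((H, V_DIM)) * 0.1
v = rng.standard_normal(V_DIM)             # stand-in for a CNN image feature

def one_hot(word):
    x = np.zeros(len(vocab))
    x[vocab.index(word)] = 1.0
    return x

h = np.zeros(H)
states = []
for word in ["START", "straw", "hat"]:     # inputs x0, x1, x2 from the figure
    h = caption_rnn_step(h, one_hot(word), v, W_hh, W_xh, W_vh)
    states.append(h)

print(len(states), states[-1].shape)  # 3 (5,)
```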
Attention Model
Thank you!