All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep...
Transcript of All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep...
![Page 1: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/1.jpg)
All You Want To Know About CNNsYukun Zhu
![Page 2: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/2.jpg)
Deep Learning
![Page 3: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/3.jpg)
Deep Learning
Image from http://imgur.com/
![Page 4: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/4.jpg)
Deep Learning
Image from http://imgur.com/
![Page 5: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/5.jpg)
Deep Learning
Image from http://imgur.com/
![Page 6: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/6.jpg)
Deep Learning
Image from http://imgur.com/
![Page 7: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/7.jpg)
Deep Learning
Image from http://imgur.com/
![Page 8: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/8.jpg)
Deep Learning
Image from http://imgur.com/
![Page 9: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/9.jpg)
Deep Learning in Vision
33.4
DPM(2010)
Object detection performance, PASCAL VOC 2010
![Page 10: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/10.jpg)
Deep Learning in Vision
33.4
DPM(2010)
40.4
segDPM(2014)
Object detection performance, PASCAL VOC 2010
![Page 11: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/11.jpg)
Deep Learning in Vision
33.4
DPM(2010)
40.4
segDPM(2014)
53.7
RCNN(2014)
Object detection performance, PASCAL VOC 2010
![Page 12: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/12.jpg)
Deep Learning in Vision
33.4
DPM(2010)
40.4
segDPM(2014)
53.7
RCNN(2014)
Object detection performance, PASCAL VOC 2010
62.9
RCNN*(Oct 2014)
![Page 13: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/13.jpg)
Deep Learning in Vision
33.4
DPM(2010)
40.4
segDPM(2014)
53.7
RCNN(2014)
67.2
segRCNN(Jan 2015)
Object detection performance, PASCAL VOC 2010
62.9
RCNN*(Oct 2014)
![Page 14: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/14.jpg)
Deep Learning in Vision
33.4
DPM(2010)
40.4
segDPM(2014)
53.7
RCNN(2014)
67.2
segRCNN(Jan 2015)
Object detection performance, PASCAL VOC 2010
62.9
RCNN*(Oct 2014)
70.8
Fast RCNN(Jun 2015)
![Page 15: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/15.jpg)
A Neuron
Image from http://cs231n.github.io/neural-networks-1/
![Page 16: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/16.jpg)
A Neuron in Neural Network
Image from http://cs231n.github.io/neural-networks-1/
![Page 17: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/17.jpg)
Activation Functions● Sigmoid: f(x) = 1 / (1 + e
-x
)
● ReLU: f(x) = max(0, x)
● Leaky ReLU: f(x) = max(ax, x)
● Maxout: f(x) = max(w
0
x + b
0
, w
1
x + b
1
)
● and many others…
![Page 18: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/18.jpg)
Neural Network (MLP)
Image modified from http://cs231n.github.io/neural-networks-1/
The network simulates a function y = f(x; w)
![Page 19: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/19.jpg)
Forward Computation
Image and code modified from http://cs231n.github.io/optimization-2/
f(x
0
, x
1
) = 1 / (1 + exp(-(w
0
x
0
+ w
1
x
1
+ w
2
)))
x
0
x
1
1
sigmoid
2.00
-1.00
-3.00
-2.00
-3.00
![Page 20: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/20.jpg)
Forward Computation
Image and code modified from http://cs231n.github.io/optimization-2/
f(x
0
, x
1
) = 1 / (1 + exp(-(w
0
x
0
+ w
1
x
1
+ w
2
)))
x
0
x
1
1
sigmoid
2.00
-1.00
-2.00
-3.00
-2.00
-3.00
![Page 21: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/21.jpg)
Forward Computation
Image and code modified from http://cs231n.github.io/optimization-2/
f(x
0
, x
1
) = 1 / (1 + exp(-(w
0
x
0
+ w
1
x
1
+ w
2
)))
x
0
x
1
1
sigmoid
2.00
-1.00
-2.00
-3.00
-2.00
-3.00
6.00
![Page 22: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/22.jpg)
Forward Computation
Image and code modified from http://cs231n.github.io/optimization-2/
f(x
0
, x
1
) = 1 / (1 + exp(-(w
0
x
0
+ w
1
x
1
+ w
2
)))
x
0
x
1
1
sigmoid
2.00
-1.00
-2.00
-3.00
-2.00
-3.00
6.00
4.00
![Page 23: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/23.jpg)
Forward Computation
Image and code modified from http://cs231n.github.io/optimization-2/
f(x
0
, x
1
) = 1 / (1 + exp(-(w
0
x
0
+ w
1
x
1
+ w
2
)))
x
0
x
1
1
sigmoid
2.00
-1.00
-2.00
-3.00
-2.00
-3.00
6.00
4.00
1.00
![Page 24: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/24.jpg)
Forward Computation
Image and code modified from http://cs231n.github.io/optimization-2/
f(x
0
, x
1
) = 1 / (1 + exp(-(w
0
x
0
+ w
1
x
1
+ w
2
)))
x
0
x
1
1
sigmoid
2.00
-1.00
-2.00
-3.00
-2.00
-3.00
6.00
4.00
1.00 -1.00
![Page 25: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/25.jpg)
Forward Computation
Image and code modified from http://cs231n.github.io/optimization-2/
f(x
0
, x
1
) = 1 / (1 + exp(-(w
0
x
0
+ w
1
x
1
+ w
2
)))
x
0
x
1
1
sigmoid
2.00
-1.00
-2.00
-3.00
-2.00
-3.00
6.00
4.00
1.00 -1.00 0.37
![Page 26: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/26.jpg)
Forward Computation
Image and code modified from http://cs231n.github.io/optimization-2/
f(x
0
, x
1
) = 1 / (1 + exp(-(w
0
x
0
+ w
1
x
1
+ w
2
)))
x
0
x
1
1
sigmoid
2.00
-1.00
-2.00
-3.00
-2.00
-3.00
6.00
4.00
1.00 -1.00 0.37 1.37
![Page 27: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/27.jpg)
Forward Computation
Image and code modified from http://cs231n.github.io/optimization-2/
f(x
0
, x
1
) = 1 / (1 + exp(-(w
0
x
0
+ w
1
x
1
+ w
2
)))
x
0
x
1
1
sigmoid
2.00
-1.00
-2.00
-3.00
-2.00
-3.00
6.00
4.00
1.00 -1.00 0.37 1.37 0.73
![Page 28: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/28.jpg)
Loss FunctionLoss function measures how well prediction matches true value
Commonly used loss function:
● Squared loss: (y - y’)
2
● Cross-entropy loss: -sum
i
(y
i
’ * log(y
i
))
● and many others
![Page 29: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/29.jpg)
Loss FunctionDuring training, we would like to minimize the total loss on a set
of training data
● We want to find w* = argmin{sum
i
[loss(f(x
i
; w), y
i
)]}
![Page 30: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/30.jpg)
Loss FunctionDuring training, we would like to minimize the total loss on a set
of training data
● We want to find w* = argmin{sum
i
[loss(f(x
i
; w), y
i
)]}
● Usually we use gradient based approach
○ w
t+1
= w
t
- a∇w
![Page 31: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/31.jpg)
Backward Computation
Image and code modified from http://cs231n.github.io/optimization-2/
f(x
0
, x
1
) = 1 / (1 + exp(-(w
0
x
0
+ w
1
x
1
+ w
2
)))
x
0
x
1
1
sigmoid
2.00
-1.00
-2.00
-3.00
-2.00
-3.00
6.00
4.00
1.00 -1.00 0.37 1.37 0.73
1.00
![Page 32: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/32.jpg)
Backward Computation
Image and code modified from http://cs231n.github.io/optimization-2/
f(x
0
, x
1
) = 1 / (1 + exp(-(w
0
x
0
+ w
1
x
1
+ w
2
)))
x
0
x
1
1
sigmoid
2.00
-1.00
-2.00
-3.00
-2.00
-3.00
6.00
4.00
1.00 -1.00 0.37 1.37 0.73
1.00-0.53
f = 1/xdf/dx = -1/x2
![Page 33: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/33.jpg)
Backward Computation
Image and code modified from http://cs231n.github.io/optimization-2/
f(x
0
, x
1
) = 1 / (1 + exp(-(w
0
x
0
+ w
1
x
1
+ w
2
)))
x
0
x
1
1
sigmoid
2.00
-1.00
-2.00
-3.00
-2.00
-3.00
6.00
4.00
1.00 -1.00 0.37 1.37 0.73
1.00-0.53-0.53
f = x + 1df/dx = 1
![Page 34: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/34.jpg)
Backward Computation
Image and code modified from http://cs231n.github.io/optimization-2/
f(x
0
, x
1
) = 1 / (1 + exp(-(w
0
x
0
+ w
1
x
1
+ w
2
)))
x
0
x
1
1
sigmoid
2.00
-1.00
-2.00
-3.00
-2.00
-3.00
6.00
4.00
1.00 -1.00 0.37 1.37 0.73
1.00-0.53-0.53-0.20
f = ex
df/dx = ex
![Page 35: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/35.jpg)
Backward Computation
Image and code modified from http://cs231n.github.io/optimization-2/
f(x
0
, x
1
) = 1 / (1 + exp(-(w
0
x
0
+ w
1
x
1
+ w
2
)))
x
0
x
1
1
sigmoid
2.00
-1.00
-2.00
-3.00
-2.00
-3.00
6.00
4.00
1.00 -1.00 0.37 1.37 0.73
1.00-0.53-0.53-0.200.20
f = -xdf/dx = -1
![Page 36: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/36.jpg)
Backward Computation
Image and code modified from http://cs231n.github.io/optimization-2/
f(x
0
, x
1
) = 1 / (1 + exp(-(w
0
x
0
+ w
1
x
1
+ w
2
)))
x
0
x
1
1
sigmoid
2.00
-1.00
-2.00
-3.00
-2.00
-3.00
6.00
4.00
1.00 -1.00 0.37 1.37 0.73
1.00-0.53-0.53-0.200.20
0.20
0.20
f = x + adf/dx = 1
![Page 37: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/37.jpg)
Backward Computation
Image and code modified from http://cs231n.github.io/optimization-2/
f(x
0
, x
1
) = 1 / (1 + exp(-(w
0
x
0
+ w
1
x
1
+ w
2
)))
x
0
x
1
1
sigmoid
2.00
-1.00
-2.00
-3.00
-2.00
-3.00
6.00
4.00
1.00 -1.00 0.37 1.37 0.73
1.00-0.53-0.53-0.200.20
0.20
0.20
0.20
0.20
![Page 38: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/38.jpg)
Backward Computation
Image and code modified from http://cs231n.github.io/optimization-2/
f(x
0
, x
1
) = 1 / (1 + exp(-(w
0
x
0
+ w
1
x
1
+ w
2
)))
x
0
x
1
1
sigmoid
2.00
-1.00
-2.00
-3.00
-2.00
-3.00
6.00
4.00
1.00 -1.00 0.37 1.37 0.73
1.00-0.53-0.53-0.200.20
0.20
0.20
0.200.40
-0.20
-0.40
-0.60
0.20
f = axdf/dx = a
![Page 39: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/39.jpg)
Why NNs?
![Page 40: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/40.jpg)
Universal Approximation TheoremA feed-forward network with a single hidden layer containing a
finite number of neurons, can approximate continuous functions
on compact subsets of R
n
, under mild assumptions on the
activation function.
https://en.wikipedia.org/wiki/Universal_approximation_theorem
![Page 41: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/41.jpg)
Stone’s Theorem● Suppose X is a compact Hausdorff space and B is a
subalgebra in C(X, R) such that:
○ B separates points.
○ B contains the constant function 1.
○ If f ∈ B then af ∈ B for all constants a ∈ R.
○ If f, g ∈ B, then f + g, max{f, g} ∈ B.
● Then every continuous function defined on C(X, R) can be
approximated as closely as desired by functions in B
![Page 42: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/42.jpg)
Why CNNs?
![Page 43: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/43.jpg)
Problems of MLP in VisionFor input as a 10 * 10 image:
● A 3 layer MLP with 200 hidden units contains ~100k
parameters
For input as a 100 * 100 image:
● A 1 layer MLP with 20k hidden units contains ~200m
parameters
![Page 44: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/44.jpg)
Can We Do Better?
![Page 45: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/45.jpg)
Can We Do Better?
![Page 46: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/46.jpg)
Can We Do Better?
![Page 47: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/47.jpg)
Can We Do Better?
![Page 48: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/48.jpg)
Can We Do Better?Based on such observation, MLP can be improved in two ways:
● Locally connected instead of fully connected
● Sharing weights between neurons
We achieve those by using convolution neurons
![Page 49: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/49.jpg)
Convolutional Layers
Image from http://cs231n.github.io/convolutional-networks/
![Page 50: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/50.jpg)
Convolutional Layers
Image from http://cs231n.github.io/convolutional-networks/. See this page for an excellent example of convolution.
depth
height
width
![Page 51: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/51.jpg)
Pooling Layers
Image from http://cs231n.github.io/convolutional-networks/
![Page 52: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/52.jpg)
Pooling Layers Example: Max Pooling
Image from http://cs231n.github.io/convolutional-networks/
![Page 53: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/53.jpg)
Pooling LayersCommonly used pooling layers:
● Max pooling
● Average pooling
Why pooling layers?
● Reduce activation dimensionality
● Robust against tiny shifts
![Page 54: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/54.jpg)
CNN Architecture: An Example
Image from http://cs231n.github.io/convolutional-networks/
![Page 55: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/55.jpg)
Layer Activations for CNNs
Image modified from http://cs231n.github.io/convolutional-networks/
Conv:1 ReLU:1 Conv:2 ReLU:2 MaxPool:1 Conv:3
![Page 56: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/56.jpg)
Layer Activations for CNNs
Image modified from http://cs231n.github.io/convolutional-networks/
MaxPool:2 Conv:5 ReLU:5 Conv:6 ReLU:6 MaxPool:3
![Page 57: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/57.jpg)
Learnt Weights for CNNs: First Conv Layer of AlexNet
Image from http://cs231n.github.io/convolutional-networks/
![Page 58: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/58.jpg)
Why CNNs Work Now?
![Page 59: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/59.jpg)
Convolutional Neural Networks● Faster heterogeneous parallel computing
○ CPU clusters, GPUs, etc.
● Large dataset
○ ImageNet: 1.2m images of 1,000 object classes
○ CoCo: 300k images of 2m object instances
● Improvements in model architecture
○ ReLU, dropout, inception, etc.
![Page 60: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/60.jpg)
AlexNet
Krizhevsky, Alex, et al. "Imagenet classification with deep convolutional neural networks." NIPS 2012
![Page 61: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/61.jpg)
GoogLeNet
Szegedy, Christian, et al. "Going deeper with convolutions." arXiv preprint arXiv:1409.4842 (2014).
![Page 62: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/62.jpg)
Quiz# of parameters for the first conv layer of AlexNet?
![Page 63: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/63.jpg)
Quiz# of parameters if the first layer is fully-connected?
![Page 64: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/64.jpg)
QuizGiven a convolution operation written as
f(x
3x3
; w
3x3
, b) = sum
i,j
(x
i,j
w
i,j
) + b
Can you derive its gradients (df/dx, df/dw, df/db)?
![Page 65: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/65.jpg)
Ready to Build Your Own Networks?
![Page 66: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/66.jpg)
Tips and Tricks for CNNs ● Know your data, clean your data, and normalize your data
○ A common trick: subtract the mean and divide by its std.
Image from http://cs231n.github.io/neural-networks-2/
![Page 67: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/67.jpg)
Tips and Tricks for CNNs ● Augment your data
![Page 68: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/68.jpg)
Tips and Tricks for CNNs ● Organize your data:
○ Keep training data balanced
○ Shuffle data before batching
● Feed your data in the correct way
○ Image channel order
○ Tensor storage order
![Page 69: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/69.jpg)
Tips and Tricks for CNNs
First Order, in order. First Order, out of order.
![Page 70: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/70.jpg)
Tips and Tricks for CNNs Common tensor storage order:
● BDRC
○ Used in Caffe, Torch, Theano, supported by CuDNN
○ Pros: faster for convolution (FFT, memory access)
● BRCD
○ Used in TensorFlow, limited support by CuDNN
○ Pros: Fast batch normalization, easier batching
![Page 71: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/71.jpg)
Tips and Tricks for CNNs Designing model architecture
● Convolution, max pooling, then fully connected layers
● Nonlinearity
○ Stay away from sigmoid (except for output)
○ ReLU preferred
○ Leaky ReLU after
○ Use Maxout if most ReLU units die (have zero activation)
![Page 72: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/72.jpg)
Tips and Tricks for CNNs Setting parameters
● Weights
○ Random initialization with proper variance
● Biases
○ For ReLU we prefer a small positive bias to activate ReLU
![Page 73: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/73.jpg)
Tips and Tricks for CNNs Setting hyperparameters
● Learning Rate / Momentum (Δw
t*
= Δw
t
+ mΔw
t-1
)
○ Decrease learning rate while training
○ Setting momentum to 0.8 - 0.9
● Batch Size
○ For large dataset: set to whatever fits your memory
○ For smaller dataset: find a tradeoff between instance
randomness and gradient smoothness
![Page 74: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/74.jpg)
Tips and Tricks for CNNs Monitoring your training:
● Split your dataset to training, validation and test
○ Optimize your hyperparameter in val and evaluate on test
○ Keep track of training and validation loss during training
○ Do early stopping if training and validation loss diverge
○ Loss doesn’t tell you all. Try precision, class-wise precision, and
more
![Page 75: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/75.jpg)
Tips and Tricks for CNNs Borrow knowledge from another dataset
● Pre-train your CNN on a large dataset (e.g. ImageNet)
● Remove / reshape the last a few layers
● Fix the parameters of first a few layers, or make the learning
rate small for them
● Fine-tune the parameters on your own dataset
![Page 76: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/76.jpg)
Tips and Tricks for CNNs Debugging
● import unittest, not import pdb
● Check your gradient [**deprecated**]
● Make your model large enough, and try overfitting training
● Check gradient norms, weight norms, and activation norms
![Page 77: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/77.jpg)
Talk is Cheap, Show Me Some Code
![Page 78: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/78.jpg)
Image from http://www.linkresearchtools.com/
![Page 79: All You Want To Know About CNNsfidler/teaching/2015/slides/CSC2523/... · 2016-01-21 · Deep Learning in Vision 33.4 DPM (2010) 40.4 segDPM (2014) 53.7 RCNN (2014) Object detection](https://reader033.fdocuments.us/reader033/viewer/2022041705/5e447e26fe7aa42d7d2346b4/html5/thumbnails/79.jpg)
Fully Convolutional Networks
Long, Jonathan, et al. "Fully convolutional networks for semantic segmentation." arXiv preprint arXiv:1411.4038 (2014).