Case Study of Convolutional Neural Network
Case Study of CNN from LeNet to ResNet
NamHyuk Ahn @ Ajou Univ. 2016. 03. 09
Convolutional Neural Network
Convolution Layer
- Convolves the image with a filter (a 3-dim dot product at each location)
- Stacks several filters in one layer (see the blue and green outputs; each filter's output is called a channel)
Convolution Layer: Local Connectivity
• Instead of connecting all pixels to every neuron, connect each neuron only to a local region of the input (called its receptive field)
• This greatly reduces the number of parameters
- Parameter Sharing
• To reduce parameters further, all neurons in an output channel share the same filter (# of filters == # of output channels)
Convolution Layer: Example, the 1st conv layer in AlexNet
• Input: [224, 224, 3]; filters: 96 of size [11, 11, 3]; output: [55, 55, 96]
- Each filter extracts a different feature (e.g. horizontal edges, vertical edges, …)
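As a sanity check on these sizes, here is a minimal sketch of the standard output-size formula. The stride of 4 is taken from the AlexNet paper, not stated on the slide; note the well-known quirk that the paper says 224x224 input, but only 227 makes the arithmetic work out to 55.

```python
def conv_output_size(w, f, stride, pad=0):
    """Spatial output size of a conv layer: (W - F + 2P) / S + 1."""
    return (w - f + 2 * pad) // stride + 1

# AlexNet's first conv layer: 11x11 filters, stride 4, no padding.
print(conv_output_size(227, 11, 4))  # 55
```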
Pooling Layer
- Downsamples the feature map to reduce computation and parameters in later layers
- Usually max pooling (take the maximum value in each region)
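A minimal sketch of 2x2 max pooling with stride 2 on a single feature map (the example values are hypothetical):

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an [H, W] feature map."""
    h, w = x.shape
    # Group into 2x2 blocks, then take the max of each block.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 6, 1, 1],
                 [0, 2, 9, 7],
                 [1, 2, 3, 8]])
print(max_pool_2x2(fmap))
# [[6 2]
#  [2 9]]
```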
ReLU, FC Layer
- ReLU
• A kind of activation function (others: sigmoid, tanh, …)
- Fully-connected Layer
• Same as in an ordinary neural network
Training CNN
1. Calculate the loss with forward propagation
2. Optimize the parameters w.r.t. the loss with back-propagation
• Use a gradient descent method (e.g. SGD)
• Gradients of the weights are computed with the chain rule of partial derivatives
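A toy sketch of one such update on a single linear neuron with squared loss; all numbers are hypothetical, chosen only to make the chain-rule step concrete:

```python
import numpy as np

x, y_true = np.array([1.0, 2.0]), 1.0   # one training example
w = np.array([0.5, -0.5])               # current weights
lr = 0.1                                # learning rate

y = w @ x                      # forward prop: y = w . x
loss = 0.5 * (y - y_true) ** 2 # squared loss
grad_w = (y - y_true) * x      # chain rule: dL/dw = dL/dy * dy/dw
w -= lr * grad_w               # gradient descent update
print(w)  # [ 0.65 -0.2 ]
```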
ILSVRC trend
AlexNet (ILSVRC 2012 winner)
AlexNet
- ReLU
- Data augmentation
- Dropout
- Ensemble of CNNs (1 CNN: 18.2%, 7 CNNs: 15.4% top-5 error)
AlexNet
- Other methods (not covered today)
• SGD with momentum (and mini-batches)
• Multiple GPUs
• Weight decay
• Local Response Normalization
Problems of Sigmoid
- Vanishing gradients
• When a gradient passes through a sigmoid it can vanish, because the sigmoid's local gradient can be almost zero
- Output is not zero-centered
• This makes gradient updates inefficient, hurting performance
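The vanishing-gradient claim is easy to verify numerically: the sigmoid's local gradient peaks at 0.25 and collapses toward zero away from the origin, so stacked sigmoids shrink the backpropagated signal layer by layer.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """Local gradient of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1 - s)

print(sigmoid_grad(0.0))   # 0.25, the maximum possible value
print(sigmoid_grad(10.0))  # ~4.5e-05: nearly zero for large |x|
```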
ReLU
- SGD converges faster than with sigmoid-like activations
- Computationally cheap
Data Augmentation
- Randomly crop [256, 256] images to [224, 224]
- At test time, take 5 crops (4 corners + center) and average the predictions
Dropout
- Similar to bagging (an approximation of bagging)
- Acts as a regularizer (reduces overfitting)
- Instead of using all neurons, randomly "drop out" some neurons (usually with probability 0.5)
Dropout
• At test time, do not drop neurons; instead scale the activations (usually by 0.5)
• The scaled activation is the expected value of each neuron under dropout
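A minimal sketch of the train/test asymmetry described above (keep probability 0.5 assumed, as on the slide):

```python
import numpy as np

def dropout_train(x, p, rng):
    """Training: zero each activation independently with probability 1 - p."""
    mask = rng.random(x.shape) < p
    return x * mask

def dropout_test(x, p):
    """Test time: keep every neuron, scale by the keep probability p."""
    return x * p

rng = np.random.default_rng(0)
x = np.ones(10000)
# On average the train-time output matches the test-time expectation.
print(dropout_train(x, 0.5, rng).mean())  # ~0.5
print(dropout_test(x, 0.5).mean())        # 0.5
```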
Architecture
- conv - pool - … - fc - softmax (similar to LeNet)
- Uses large filters (e.g. 11x11)
Architecture
- Weights must be initialized randomly
• If not, all neurons compute the same output and receive the same gradient
• Usually a gaussian distribution with std = 0.01
- Uses mini-batch SGD with momentum to update the weights
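A sketch of this AlexNet-style initialization for the first conv layer's weight matrix (96 filters of size 11x11x3, flattened); the small gaussian noise breaks the symmetry so neurons receive different gradients:

```python
import numpy as np

rng = np.random.default_rng(0)

# 96 filters, each 11*11*3 weights, drawn from N(0, 0.01^2).
w = rng.normal(loc=0.0, scale=0.01, size=(96, 11 * 11 * 3))

print(round(float(w.std()), 4))  # ~0.01
```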
VGGNet (ILSVRC 2014 runner-up)
VGGNet
- Uses small kernels (always 3x3)
• A stack of 3x3 convs applies more non-linearities (e.g. ReLU) than a single large conv
• Fewer weights to train
- Heavier data augmentation than AlexNet
- Ensemble of 7 models (ILSVRC submission: 7.3% top-5 error)
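The "fewer weights" point can be checked with simple arithmetic. Two stacked 3x3 convs cover the same 5x5 receptive field as one 5x5 conv, but with fewer parameters and one extra ReLU in between (the channel count of 64 is an illustrative assumption; biases ignored):

```python
C = 64  # input and output channels, chosen for illustration

one_5x5 = 5 * 5 * C * C        # a single 5x5 conv
two_3x3 = 2 * (3 * 3 * C * C)  # two stacked 3x3 convs, same receptive field

print(one_5x5, two_3x3)  # 102400 73728
```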
Architecture
- Most of the memory is consumed by the early conv layers; most of the parameters sit in the fc layers
GoogLeNet / Inception v1 (ILSVRC 2014 winner)
Inception Module
- Uses 1x1, 3x3 and 5x5 convs simultaneously to capture structure at a variety of scales
- Dense structure is captured by the 1x1 conv; more spread-out structure by the 3x3 and 5x5 convs
- Computationally expensive
• 1x1 conv layers are used to reduce dimensionality (details later, with ResNet)
Auxiliary Classifiers
- A very deep network raises concern about how effectively gradients propagate during backprop
- The auxiliary losses are added to the total loss (weighted by 0.3), and the classifiers are removed at test time
Average Pooling
- Proposed in Network in Network (also used in GoogLeNet)
- Problem with fc layers
• They need many parameters and overfit easily
- Replace the fc layers with average pooling
Average Pooling
- Make the number of channels in the last conv layer equal to the number of classes
- Average each channel and pass the results to softmax
- Reduces overfitting
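A minimal sketch of this global average pooling step: one channel per class, each channel averaged into a single class score (3 classes and the feature-map values are hypothetical):

```python
import numpy as np

def global_avg_pool(x):
    """Reduce a [C, H, W] feature map to one value per channel."""
    return x.mean(axis=(1, 2))

# Hypothetical last conv output: 3 channels (one per class), 4x4 each.
fmap = np.arange(3 * 4 * 4, dtype=float).reshape(3, 4, 4)
scores = global_avg_pool(fmap)  # fed to softmax instead of fc layers
print(scores)  # [ 7.5 23.5 39.5]
```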
ResNet by MSRA (ILSVRC 2015 winner)
Before ResNet…
- Things to know first:
• PReLU
• Xavier initialization
• Batch Normalization
PReLU
- An adaptive version of ReLU
- The slope of the function for x < 0 is trained
- Slightly more parameters (# of layers x # of channels)
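A minimal sketch of the activation itself, with a fixed illustrative slope standing in for the learned parameter:

```python
import numpy as np

def prelu(x, alpha):
    """PReLU: identity for x >= 0, learned slope alpha for x < 0."""
    return np.where(x >= 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 3.0])
print(prelu(x, alpha=0.25))  # [-0.5   -0.125  0.     3.   ]
```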
Xavier Initialization
- With a plain gaussian init, the outputs of neurons shrink toward zero when the network is deep
- If the std is increased (e.g. to 1.0), the outputs saturate to -1 or 1
- Xavier init chooses the initial scale from the number of input neurons
- Looks fine, but this method assumes a linear activation, so it cannot be used in ReLU-like networks
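A sketch of the two scalings discussed in this deck: Xavier's std of sqrt(1/n_in), and the halved-fan-in variant for ReLU networks ("Xavier / 2", i.e. std sqrt(2/n_in)) used later for ResNet. The fan-in value is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in = 11 * 11 * 3  # fan-in of the layer (illustrative)

# Xavier: variance 1/n_in, derived assuming a linear activation.
w_xavier = rng.normal(0.0, np.sqrt(1.0 / n_in), size=(96, n_in))
# "Xavier / 2" (He et al.): variance 2/n_in, corrected for ReLU.
w_he = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(96, n_in))

print(round(float(w_xavier.std()), 3), round(float(w_he.std()), 3))
```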
(Figures: with a large std the outputs saturate; with a small std the outputs vanish.)
(Figures: activation distributions under Xavier initialization vs. Xavier / 2, the variant adapted for ReLU.)
Batch Normalization
- Makes layer outputs roughly gaussian, but exact normalization is costly
• Compute the mean and variance per dimension (assuming the dimensions are uncorrelated)
• Compute the mean and variance over the mini-batch (not the entire set)
- Normalization constrains the non-linearity, and the uncorrelated-dimensions assumption constrains the network
• So the output is linearly transformed, with the scale and shift as learned parameters
![Page 43: Case Study of Convolutional Neural Network](https://reader030.fdocuments.us/reader030/viewer/2022020203/58f9ada4760da3da068b9a6a/html5/thumbnails/43.jpg)
Batch Normalization
- At test time, use the mean and variance of the entire set (via a moving average)
- BN acts as a regularizer (Dropout is no longer needed)
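A minimal sketch of the training-time forward pass described above: normalize per dimension over the mini-batch, then apply the learned linear transform (gamma, beta); the batch values are hypothetical.

```python
import numpy as np

def batchnorm_train(x, gamma, beta, eps=1e-5):
    """Per-dimension mini-batch normalization followed by a learned
    scale (gamma) and shift (beta)."""
    mu = x.mean(axis=0)    # mini-batch mean, per dimension
    var = x.var(axis=0)    # mini-batch variance, per dimension
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(5.0, 3.0, size=(32, 4))        # a mini-batch of 32
out = batchnorm_train(x, gamma=1.0, beta=0.0)
# Each dimension is now approximately zero-mean, unit-variance.
print(out.mean(axis=0).round(6), out.std(axis=0).round(3))
```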
ResNet
Problem of Degradation
- More depth gives more accuracy, but deep networks suffer from vanishing/exploding gradients
• BN, Xavier init and Dropout can handle this (up to ~30 layers)
- Going deeper still, a degradation problem occurs
• It is not just overfitting: the training error also increases
Deep Residual Learning
- Element-wise addition of F(x) and the shortcut connection, followed by a ReLU non-linearity
- If the dims of x and F(x) are unequal (the channel count changes), x is linearly projected to match (done by a 1x1 conv)
- Similar in spirit to LSTM
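A minimal sketch of this block structure, with the residual branch F and the optional projection reduced to plain matrix operations (the branch and the numbers are hypothetical):

```python
import numpy as np

def residual_block(x, f, w_proj=None):
    """y = relu(F(x) + shortcut). If dims differ, x is linearly
    projected (a 1x1 conv in the paper; a matrix here)."""
    shortcut = x if w_proj is None else x @ w_proj
    return np.maximum(0.0, f(x) + shortcut)

# Hypothetical residual branch F; identity-shortcut case.
f = lambda x: 0.1 * x
x = np.array([1.0, -2.0, 3.0])
print(residual_block(x, f))  # [1.1 0.  3.3]
```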
Deeper Bottleneck
- To reduce training time, the block is modified into a bottleneck design (purely for economical reasons)
• 3x3x64x64 + 3x3x64x64 = 73,728 weights (left)
• 1x1x256x64 + 3x3x64x64 + 1x1x64x256 = 69,632 weights (right)
• The right block is wider (more channels) yet has a similar parameter count
• A similar method is used in GoogLeNet
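The weight counts above can be checked directly (biases ignored; each filter's depth equals the number of input channels):

```python
# Plain block: two 3x3 convs on 64 channels.
plain = 2 * (3 * 3 * 64 * 64)
# Bottleneck: 1x1 reduces 256 -> 64, 3x3 on 64, 1x1 restores 64 -> 256.
bottleneck = 1 * 1 * 256 * 64 + 3 * 3 * 64 * 64 + 1 * 1 * 64 * 256

print(plain, bottleneck)  # 73728 69632
```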
ResNet
- Data augmentation as in AlexNet
- Batch Normalization (no Dropout)
- Xavier / 2 initialization
- Average pooling
- The structure follows the VGGNet style
Conclusion
ILSVRC top-5 error by model:

| Model | Top-5 Error |
| --- | --- |
| AlexNet (2012) | 15.31% |
| VGGNet (2014) | 7.32% |
| Inception-V1 (2014) | 6.66% |
| Human | 5.1% |
| PReLU-net (2015) | 4.94% |
| BN-Inception (2015) | 4.82% |
| ResNet-152 (2015) | 3.57% |
| Inception-ResNet (2016) | 3.1% |
Conclusion
- Dropout, BN
- ReLU-like activations (e.g. PReLU, ELU, …)
- Xavier initialization
- Average pooling
- Use pre-trained models :)
References
- Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems. 2012.
- Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
- Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." arXiv preprint arXiv:1312.4400 (2013).
- He, Kaiming, et al. "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification." Proceedings of the IEEE International Conference on Computer Vision. 2015.
- He, Kaiming, et al. "Deep residual learning for image recognition." arXiv preprint arXiv:1512.03385 (2015).
- Szegedy, Christian, Sergey Ioffe, and Vincent Vanhoucke. "Inception-v4, Inception-ResNet and the impact of residual connections on learning." arXiv preprint arXiv:1602.07261 (2016).
- Gu, Jiuxiang, et al. "Recent advances in convolutional neural networks." arXiv preprint arXiv:1512.07108 (2015). (good as a tutorial)
- Thanks also to CS231n; some figures are from the CS231n lecture slides. See http://cs231n.stanford.edu/index.html