Session 5: CNNs Overloaded
Varun Sundar, 1st October 2018
Outline

Review:
1. Building blocks of a CNN

Today's:
2. Backprop in CNNs
3. BatchNorm
4. CNN architectures
5. CNNs in libraries
CNN Building Blocks
CNN vs MLP
● CNNs are MLPs with two constraints:
○ Local connectivity
○ Parameter sharing
Generic Overview
CNN Blocks
● Convolutional
● Activations
● Pooling
● Flattening
● Unpooling (recent)
● Deconvolution (more accurately, transposed convolution)
CNN Blocks Overview
Convolution Layer
● Similar to convolution in signal processing
● Inspired by classical filtering and image signal processing (ISP)
● Actually computes cross-correlation: the kernel is not flipped (sketched below)
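Not on the slides, but as a minimal NumPy sketch of the "valid" cross-correlation a conv layer actually computes (sizes are illustrative):

```python
import numpy as np

def corr2d(x, k):
    """'Valid' 2-D cross-correlation: slide the kernel over x
    WITHOUT flipping it, unlike true signal-processing convolution."""
    H, W = x.shape
    D, _ = k.shape
    out = np.zeros((H - D + 1, W - D + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (x[i:i + D, j:j + D] * k).sum()
    return out

x = np.arange(16.0).reshape(4, 4)
k = np.ones((3, 3))
print(corr2d(x, k).shape)  # (2, 2), i.e. (4 - 3 + 1, 4 - 3 + 1)
```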
Conv Layer: Variations - Padding
Conv Layer: Variations
Multiple Channels
● Consider an input at layer l of size H × W × C
● Kernel of size D × D × C
● Output is (H − D + 1) × (W − D + 1) × 1
● Stack K such filters to get (H − D + 1) × (W − D + 1) × K
● Why?
○ Transforms spatial correspondence into channels
○ Reduces the number of parameters; K is your choice (a shape check follows)
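A quick PyTorch check of this shape arithmetic (a sketch; all sizes are illustrative):

```python
import torch
import torch.nn as nn

H, W, C, D, K = 32, 32, 3, 5, 8   # illustrative sizes
conv = nn.Conv2d(in_channels=C, out_channels=K, kernel_size=D)  # stride 1, no padding
x = torch.randn(1, C, H, W)       # PyTorch layout is N x C x H x W
print(conv(x).shape)  # torch.Size([1, 8, 28, 28]) = K x (H - D + 1) x (W - D + 1)
```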
Pooling
Pooling Layer
1. Consider:
● an input volume of size W1 × H1 × D1
● the spatial extent of the filter, F
● the stride, S
● the amount of zero padding, P (commonly P = 0)

2. Produces an output volume of size W2 × H2 × D2 (checked in the sketch below), where:
W2 = (W1 − F + 2P)/S + 1
H2 = (H1 − F + 2P)/S + 1
D2 = D1

3. Introduces zero parameters, since it computes a fixed function of the input.
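As a sanity check of the formula, a small sketch (sizes are illustrative):

```python
import torch
import torch.nn as nn

W1, H1, D1, F, S = 8, 8, 16, 2, 2            # illustrative sizes, P = 0
pool = nn.MaxPool2d(kernel_size=F, stride=S)
x = torch.randn(1, D1, H1, W1)
print(pool(x).shape)  # torch.Size([1, 16, 4, 4]): (8 - 2)/2 + 1 = 4, depth unchanged
```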
Backprop in CNNs
Notation

- l indexes the layer, l = 1, 2, ..., L
- w^l_{i,j} are the weights connecting layer l to layer l+1
- b^l is the bias at layer l
- x^l_{i,j} is the pre-activation at layer l, x^l_{i,j} = \sum_a \sum_b w^l_{a,b} \, o^{l-1}_{i+a,\, j+b} + b^l
- o^l_{i,j} is the output at layer l after the non-linearity, o^l_{i,j} = f(x^l_{i,j})
- f(·) is the non-linearity
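The derivation slides here are images, so as a minimal NumPy sketch of the result for the standard single-channel "valid" case (an assumption; it reuses corr2d and numpy from the earlier sketch):

```python
def conv_backward(x, k, dout):
    """Gradients for out = corr2d(x, k).
    dL/dk: cross-correlate the input with the upstream gradient.
    dL/dx: 'full' correlation of the upstream gradient with the
           180-degree-rotated kernel (pad dout by D - 1 on each side)."""
    D = k.shape[0]
    dk = corr2d(x, dout)
    dx = corr2d(np.pad(dout, D - 1), k[::-1, ::-1])
    return dx, dk
```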
BatchNorm
The need for normalisation
● Normalisation in general, even with correlated features, speeds up training
● Training is complicated by the fact that the inputs to each layer are affected by the parameters of all preceding layers
● Small changes to the network parameters amplify as the network becomes deeper
● This is called Internal Covariate Shift
Solutions
● Whiten the inputs (LeCun, 1998):
○ Costly to do for each input (to each layer)
○ Needs the covariance matrix to be computed
● Also, if the normalisation is computed outside the gradient step, the model could blow up.
● Even with mini-batches, we don't want to compute a covariance matrix
Batch Norm algorithm
Credits: the BatchNorm paper, Ioffe & Szegedy.
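The algorithm slide itself is an image; as a minimal NumPy sketch of the training-time transform from the paper (the ε value is illustrative):

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Normalise a mini-batch x of shape (N, features), then scale and shift."""
    mu = x.mean(axis=0)                    # per-feature mini-batch mean
    var = x.var(axis=0)                    # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalise
    return gamma * x_hat + beta            # gamma, beta are learned parameters
```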
Advantages of BN
- Improves gradient flow through the network
- Allows higher learning rates
- Reduces the strong dependence on initialization
- Acts as a form of regularization
- Accelerates training
During Inference

● Use the population statistics: running averages of the batch mean and variance collected during training (the learned γ and β stay fixed).
● Caveat: do not use BN with a batch size of 1, or with very little data
● The batch statistics can be stochastic and unstable.
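In a library such as PyTorch this switch is handled by the train/eval modes (a sketch; sizes are illustrative):

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(64)
bn.train()                         # uses per-batch statistics, updates running averages
_ = bn(torch.randn(32, 64, 8, 8))  # a training step's worth of activations
bn.eval()                          # uses stored running mean/var; gamma, beta stay fixed
y = bn(torch.randn(1, 64, 8, 8))   # stable even with batch size 1 in eval mode
```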
Summary
CNN Architectures
ConvNet architectures
LeNet5

● Implemented in 1994, one of the very first convolutional neural networks, and one that propelled the field of Deep Learning. This pioneering work by Yann LeCun was named LeNet5 after many previous successful iterations since 1988.
LeNet5
● uses a sequence of three layers: convolution, pooling, non-linearity
● uses convolution to extract spatial features
● non-linearity in the form of tanh or sigmoid (no ReLUs back then)
● multi-layer perceptron (MLP) as the final classifier
AlexNet
● Brought DL back to the mainstream in 2012, when Alex Krizhevsky released AlexNet, a deeper and much wider version of LeNet that won the difficult ImageNet competition by a large margin.
AlexNet
● use of rectified linear units (ReLU) as non-linearities
● use of the dropout technique (Hinton et al.) to selectively ignore single neurons during training, a way to avoid overfitting the model
● overlapping max pooling, avoiding the averaging effects of average pooling
● use of GPUs (NVIDIA GTX 580) to reduce training time

The success of AlexNet started a small revolution. Convolutional neural networks were now the workhorse of Deep Learning, which became the new name for "large neural networks that can now solve useful tasks".
VGG

● first to use much smaller 3×3 filters in each layer
● insight that multiple 3×3 convolutions can replace 5×5 and 7×7 convolutions (see the parameter count below)
● fewer parameters than AlexNet, while three times as deep
● variants: VGG-16, VGG-19
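A back-of-the-envelope sketch of why stacked 3×3 filters are cheaper (channel count is illustrative; biases ignored):

```python
C = 256                            # illustrative channel count
params_5x5 = 5 * 5 * C * C         # one 5x5 conv layer
params_3x3_x2 = 2 * 3 * 3 * C * C  # two stacked 3x3 layers, same 5x5 receptive field
print(params_5x5, params_3x3_x2)   # 1638400 vs 1179648: ~28% fewer parameters,
                                   # plus an extra non-linearity in between
```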
Different VGG Architectures
GoogLeNet and Inception
● Christian Szegedy and team at Google
● aimed at reducing the computational burden of deep neural networks
● devised GoogLeNet in 2014
● won ImageNet that year
Inception Block

● Combination of 1×1, 3×3, and 5×5 convolutional filters
● Emulates Network in Network (NiN)
● 1×1 convolutions save parameters (see the sketch below)
● Called a bottleneck
GoogLeNet and Inception
Why multiple softmaxes?

● 22 layers deep, so there is a danger of vanishing gradients during training
● Auxiliary softmaxes added at inception blocks 4a and 4d
● These blocks may learn meaningful representations
● Discarded at inference
Inception V3 (and V2)

December 2015

● BatchNorm added (Inception v2)
● maximize information flow into the network by carefully constructing networks that balance depth and width; before each pooling, increase the feature maps
● when depth is increased, the number of features (the width of the layer) is also increased systematically
● use the width increase at each layer to increase the combination of features
● use only 3×3 convolutions where possible, given that 5×5 and 7×7 filters can be decomposed into multiple 3×3
Inception V3

The Inception module shown uses strided convolutions to decrease the size of the data.
Complete Inception_v3 architecture
ResNet

- December 2015 (around the same time as Inception v3)
- Simple ideas (see the sketch below):
  - Feed the output of two successive convolutional layers
  - Bypass the input to the next layers
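A minimal PyTorch sketch of a basic residual block (channel count and the BN/ReLU placement follow the common pattern, not necessarily the exact paper variant):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convs whose output is added to the (bypassed) input."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # skip connection: bypass the input
```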
ResNet architecture
Inception v4 or Inception_Resnet_v2
● Added residual connections.
SqueezeNet
SqueezeNet can be 3 times faster and 500 times smaller than AlexNet with the same accuracy.

● Uses 1×1 filters to replace 3×3 filters
● Uses 1×1 filters as a bottleneck layer to reduce depth, and hence the computation of the following 3×3 filters
● Downsamples late to keep a big feature map

The building brick of SqueezeNet is called the fire module, which contains two layers: a squeeze layer and an expand layer. A SqueezeNet stacks a bunch of fire modules and a few pooling layers.
Fire Modules
The squeeze layer and the expand layer keep the same feature map size: the former reduces the depth to a smaller number, the latter increases it. This squeezing (bottleneck) and expansion behaviour is common in neural architectures. Another common pattern is increasing depth while reducing feature map size to obtain high-level abstract features. A sketch follows.
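A minimal PyTorch sketch of a fire module mirroring the squeeze-then-expand description above (channel counts are arguments; values at call time are up to you):

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    """A 1x1 'squeeze' layer that shrinks depth, then an 'expand'
    layer of parallel 1x1 and 3x3 filters, concatenated on channels."""
    def __init__(self, c_in, c_squeeze, c_expand):
        super().__init__()
        self.squeeze = nn.Conv2d(c_in, c_squeeze, 1)
        self.expand1 = nn.Conv2d(c_squeeze, c_expand, 1)
        self.expand3 = nn.Conv2d(c_squeeze, c_expand, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        s = self.relu(self.squeeze(x))
        return self.relu(torch.cat([self.expand1(s), self.expand3(s)], dim=1))
```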
MobileNets

The core layers that MobileNet is built on are depthwise separable filters (factorised filters). A depthwise separable convolution is made up of two layers: a depthwise convolution and a pointwise convolution. The depthwise convolution applies a single filter to each input channel (the input depth). The pointwise convolution, a simple 1×1 convolution, then creates a linear combination of the outputs of the depthwise layer. MobileNets use both BatchNorm and ReLU non-linearities for both layers.

● Also uses width and resolution multipliers to save on computation
● Even more effective than SqueezeNet
Depthwise convolutions

● a form of factorized convolution
● factorize a standard convolution into a depthwise convolution and a 1×1 convolution called a pointwise convolution, as in the sketch below
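A minimal PyTorch sketch of one depthwise-separable block as described above (sizes illustrative):

```python
import torch.nn as nn

def depthwise_separable(c_in, c_out):
    """Per-channel 3x3 depthwise conv (groups=c_in) followed by a 1x1
    pointwise conv, each with BN + ReLU, as in MobileNet."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_in, 3, padding=1, groups=c_in),  # depthwise: one filter per channel
        nn.BatchNorm2d(c_in), nn.ReLU(),
        nn.Conv2d(c_in, c_out, 1),                         # pointwise: 1x1 linear combination
        nn.BatchNorm2d(c_out), nn.ReLU(),
    )
```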