Deep Learning: Theory, History, State of the Art & Practical Tools
by Ilya Kuzovkin, [email protected]
Machine Learning Estonia, 2016
http://neuro.cs.ut.ee
Where it started
How it evolved
What is the state now
How can you use it
How it learns
Where it started: the Artificial Neuron

McCulloch and Pitts, “A Logical Calculus of the Ideas Immanent in Nervous Activity”, 1943
Where it started: the Perceptron

Frank Rosenblatt, 1957

“[The Perceptron is] the embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence.” (The New York Times)
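The perceptron itself is tiny: a weighted sum followed by a step function, trained with a simple error-driven weight update. A minimal sketch (the AND-gate data, learning rate, and epoch count are illustrative, not from the slides):

```python
# Rosenblatt's perceptron: step(w . x + b), trained with the perceptron rule.
def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train(data, lr=0.1, epochs=20):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in data:
            error = target - predict(w, b, x)   # -1, 0 or +1
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error
    return w, b

# Learn the (linearly separable) AND function
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train(data)
print([predict(w, b, x) for x, _ in data])  # → [0, 0, 0, 1]
```

The perceptron convergence theorem guarantees this loop finds a separating line whenever one exists; XOR, famously, has none.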
Where it started: Sigmoid & Backpropagation

(Werbos, 1974) Rumelhart, Hinton, Williams, 1986: “Learning representations by back-propagating errors” (Nature)

• Measure how small changes in the weights affect the output
• Can apply NNs to regression
• Multilayer neural networks, etc.
Where it started: why didn't the DL revolution happen in 1986?

• Not enough data (datasets were 1,000 times too small)
• Computers were too slow (1,000,000 times)
• Not enough attention to network initialization
• Wrong non-linearity

(From a talk by Geoffrey Hinton)
How it learns
How it learns: Backpropagation

A step-by-step worked example: http://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example

Given inputs 0.05 and 0.10, we want the neural network to output 0.01 and 0.99. Each neuron computes a weighted sum (NET) and passes it through the sigmoid (OUT).

1. The Forward Pass: calculating the total error
• Compute NET and OUT for the first hidden neuron; repeat for h2 = 0.596, o1 = 0.751, o2 = 0.773.
• With o1 and o2 in hand, compute the total error over both outputs.

2. The Backwards Pass: updating the weights
• Use the chain rule to measure how the total error changes with respect to each weight, starting from w5.
• Apply the gradient descent update rule, scaled by the learning rate.
• Repeat for w6, w7, w8, and in an analogous way for w1, w2, w3, w4.
• Calculate the total error again: it was 0.298371109, and after one round of updates it is 0.291027924.
• Repeat 10,000 times: the total error falls to 0.0000351085.
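The whole walkthrough fits in a few dozen lines of plain Python. This sketch reproduces the linked Mazur example; the initial weights (0.15…0.55), biases (0.35, 0.60), and learning rate (0.5) are taken from that article, not from the slides:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Inputs, targets, initial weights and learning rate from the linked article
i1, i2 = 0.05, 0.10
t1, t2 = 0.01, 0.99
w1, w2, w3, w4, b1 = 0.15, 0.20, 0.25, 0.30, 0.35
w5, w6, w7, w8, b2 = 0.40, 0.45, 0.50, 0.55, 0.60
lr = 0.5

def forward():
    h1 = sigmoid(w1 * i1 + w2 * i2 + b1)   # NET, then OUT, for each neuron
    h2 = sigmoid(w3 * i1 + w4 * i2 + b1)
    o1 = sigmoid(w5 * h1 + w6 * h2 + b2)
    o2 = sigmoid(w7 * h1 + w8 * h2 + b2)
    return h1, h2, o1, o2

for _ in range(10000):
    h1, h2, o1, o2 = forward()
    # Output deltas: dE/d(net_o) for E = 0.5*(t - out)^2 with sigmoid outputs
    d_o1 = (o1 - t1) * o1 * (1 - o1)
    d_o2 = (o2 - t2) * o2 * (1 - o2)
    # Hidden deltas: error flowing back through w5..w8 (pre-update values)
    d_h1 = (d_o1 * w5 + d_o2 * w7) * h1 * (1 - h1)
    d_h2 = (d_o1 * w6 + d_o2 * w8) * h2 * (1 - h2)
    # Gradient descent update rule: w := w - lr * dE/dw
    w5 -= lr * d_o1 * h1; w6 -= lr * d_o1 * h2
    w7 -= lr * d_o2 * h1; w8 -= lr * d_o2 * h2
    w1 -= lr * d_h1 * i1; w2 -= lr * d_h1 * i2
    w3 -= lr * d_h2 * i1; w4 -= lr * d_h2 * i2

h1, h2, o1, o2 = forward()
error = 0.5 * (t1 - o1) ** 2 + 0.5 * (t2 - o2) ** 2
print(o1, o2, error)  # outputs ≈ 0.016 and 0.984, error ≈ 3.5e-5
```

Note that the hidden-layer deltas are computed with the old values of w5…w8, exactly as in the article: all weights are updated simultaneously from one forward pass.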
How it learns: Optimization methods

Alec Radford, “Introduction to Deep Learning with Python”
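The animations in that talk compare how different update rules traverse the loss surface. As a sketch of the two simplest rules, plain SGD versus SGD with momentum, minimizing a toy 1-D quadratic (the learning rate and momentum constants here are illustrative):

```python
# Minimize f(x) = x^2 (gradient: 2x) with the two simplest update rules.
def sgd(x, lr=0.1, steps=100):
    for _ in range(steps):
        x -= lr * 2 * x              # x := x - lr * grad
    return x

def momentum(x, lr=0.1, mu=0.9, steps=100):
    v = 0.0
    for _ in range(steps):
        v = mu * v - lr * 2 * x      # velocity accumulates past gradients
        x += v
    return x

print(sgd(5.0), momentum(5.0))       # both end up near the minimum at 0
```

On this well-conditioned toy problem plain SGD wins; momentum (and adaptive methods like RMSProp or Adam) pays off on the ill-conditioned, ravine-like surfaces the animations show.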
How it evolved
How it evolved: 1-layer NN

INPUT → OUTPUT (Alec Radford, “Introduction to Deep Learning with Python”)

92.5% on the MNIST test set
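A 1-layer NN of this kind is just a softmax layer: a single 784×10 weight matrix mapping pixel values straight to class scores (trained on real MNIST it reaches roughly the quoted accuracy). A numpy sketch of one gradient step, with random arrays standing in for a batch of MNIST images:

```python
import numpy as np

rng = np.random.default_rng(0)
W = np.zeros((784, 10))                # the single layer: input -> output
b = np.zeros(10)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))   # shift for stability
    return e / e.sum(axis=1, keepdims=True)

# Random stand-ins for a batch of 64 MNIST images and labels
X = rng.random((64, 784))
y = rng.integers(0, 10, size=64)

# One step of gradient descent on the cross-entropy loss
probs = softmax(X @ W + b)             # (64, 10) class probabilities
probs[np.arange(64), y] -= 1           # dL/dlogits for cross-entropy
W -= 0.1 * (X.T @ probs) / 64
b -= 0.1 * probs.mean(axis=0)
```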
How it evolved: one hidden layer

98.2% on the MNIST test set

Activity of 100 hidden neurons (out of 625). (Alec Radford, “Introduction to Deep Learning with Python”)
How it evolved: Overfitting
How it evolved: Dropout

Srivastava, Hinton, Krizhevsky, Sutskever, Salakhutdinov, “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”, 2014
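Dropout is a one-line idea: during training, zero each hidden activation with probability p and rescale the survivors so the expected activation is unchanged (this is "inverted" dropout, a common way to implement the paper's scheme); at test time, use all units. A sketch:

```python
import numpy as np

rng = np.random.default_rng(42)

def dropout(activations, p=0.5, train=True):
    if not train:
        return activations                      # test time: keep everything
    mask = rng.random(activations.shape) >= p   # keep each unit with prob 1-p
    return activations * mask / (1 - p)         # rescale to preserve the mean

h = np.ones(1000)
dropped = dropout(h, p=0.5)
print(dropped.mean())  # ≈ 1.0 on average, with about half the units zeroed
```

Because each forward pass samples a different mask, the network behaves like an ensemble of thinned sub-networks, which is what prevents co-adaptation and overfitting.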
How it evolved: ReLU

X. Glorot, A. Bordes, Y. Bengio, “Deep Sparse Rectifier Neural Networks”, 2011
How it evolved: the “modern” ANN

• Several hidden layers
• ReLU activation units
• Dropout

99.0% on the MNIST test set
How it evolved: Convolution

A convolution slides a small filter over the image. The Prewitt edge detector is such a filter:

+1 +1 +1
 0  0  0
-1 -1 -1

Applied to an image patch where rows of 40s sit above rows of 10s:

40 40 40
40 40 40
40 40 40
10 10 10
10 10 10
10 10 10

the filter responds with 0 inside the uniform regions and 90 where the bright rows meet the dark ones (responses down the patch: 0, 90, 90, 0).

An edge detector is a handcrafted feature detector.
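The worked example can be checked directly: sliding the Prewitt kernel over the 40s-over-10s patch gives exactly the 0, 90, 90, 0 responses above. A numpy sketch:

```python
import numpy as np

# Prewitt horizontal-edge kernel from the slide
kernel = np.array([[ 1,  1,  1],
                   [ 0,  0,  0],
                   [-1, -1, -1]])

# Image patch: three rows of 40s on top of three rows of 10s
img = np.array([[40] * 3] * 3 + [[10] * 3] * 3)

# Valid cross-correlation: slide the 3x3 kernel over every 3x3 window
out = np.array([
    [(img[i:i + 3, j:j + 3] * kernel).sum()
     for j in range(img.shape[1] - 2)]
    for i in range(img.shape[0] - 2)
])
print(out.ravel())  # responses: 0, 90, 90, 0
```

A convolutional layer computes exactly this, except the kernel entries are learned weights rather than a handcrafted ±1 pattern.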
The idea of a convolutional layer is to learn the feature detectors instead of using handcrafted ones.

99.50% on the MNIST test set; current best: 99.77% by a committee of 35 conv. nets (http://yann.lecun.com/exdb/mnist/)
How it evolved: More layers

C. Szegedy et al., “Going Deeper with Convolutions”, 2014

ILSVRC 2015 winner: 152 (!) layers. K. He et al., “Deep Residual Learning for Image Recognition”, 2015
How it evolved: Hyperparameters

• Network: architecture, number of layers, number of units (in each layer), type of the activation function, weight initialization
• Convolutional layers: size, stride, number of filters
• Optimization method: learning rate, other method-specific constants, …

How to tune them:
• Grid search :(
• Random search :/
• Bayesian optimization :)
• Informal parameter search :)

Snoek, Larochelle, Adams, “Practical Bayesian Optimization of Machine Learning Algorithms”
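Random search is a few lines of code; grid search differs only in how candidates are enumerated. A sketch over illustrative hyperparameter ranges (the objective function here is a made-up stand-in for "train the network and return validation accuracy"):

```python
import random

random.seed(0)

def validation_accuracy(lr, n_units, dropout_p):
    # Stand-in: pretend accuracy peaks at lr=0.01, 512 units, dropout 0.5
    return 1.0 - abs(lr - 0.01) - abs(n_units - 512) / 5000 - abs(dropout_p - 0.5)

best = None
for _ in range(50):
    params = {
        "lr": 10 ** random.uniform(-4, -1),   # sample learning rate log-scale
        "n_units": random.randrange(32, 1025),
        "dropout_p": random.uniform(0.0, 0.8),
    }
    score = validation_accuracy(**params)
    if best is None or score > best[0]:
        best = (score, params)

print(best)
```

Random search tends to beat grid search when only a few hyperparameters matter; Bayesian optimization goes further by using past evaluations to decide which configuration to try next.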
How it evolved: Major Types of ANNs

feedforward, convolutional, recurrent, autoencoder
What is the state now
What is the state now: Computer vision

http://cs.stanford.edu/people/karpathy/deepimagesent/
Kaiming He et al., “Deep Residual Learning for Image Recognition”, 2015
What is the state now: Natural Language Processing

Speech recognition + translation; Facebook bAbI dataset: question answering
http://smerity.com/articles/2015/keras_qa.html
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
What is the state now: AI

DeepMind’s DQN
Sukhbaatar et al., “MazeBase: A Sandbox for Learning from Games”, 2015
What is the state now: Neuroscience

Güçlü and van Gerven, “Deep Neural Networks Reveal a Gradient in the Complexity of Neural Representations across the Ventral Stream”, 2015
How can you use it
How can you use it: Pre-trained models

• Go to https://github.com/BVLC/caffe/wiki/Model-Zoo and pick a model…
• … and use it in your application
• Or use part of it as the starting point for your own model
How can you use it: Zoo of Frameworks

Low-level frameworks; high-level frameworks & wrappers
Further reading:

• A Step by Step Backpropagation Example: http://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example
• Online book by Michael Nielsen: http://neuralnetworksanddeeplearning.com
• CS231n: Convolutional Neural Networks for Visual Recognition: http://cs231n.stanford.edu/