Artificial Neural Networks: Intro
CSC411: Machine Learning and Data Mining, Winter 2018
Michael Guerzhoy and Lisa Zhang
“Making Connections” by Filomena Booth (2013)
Slides from Andrew Ng, Geoffrey Hinton, and Tom Mitchell
1
Non-Linear Decision Surfaces
(Scatter plot of the two classes over features $x_1$ and $x_2$.)
• There is no linear decision boundary
2
Car Classification
Testing: What is this?
(Example images labelled “Cars” and “Not a car”.)
3
You see this:
But the camera sees this:
4
Learning Algorithm
(Raw images are mapped to points in “pixel 1” × “pixel 2” feature space; classes “Cars” and “Non-Cars” are plotted.)
(Same feature-space plot: raw images mapped to “pixel 1”, “pixel 2”; the learning algorithm separates “Cars” from “Non-Cars”.)
(Feature-space plot: “pixel 1” vs. “pixel 2”, classes “Cars” and “Non-Cars”; a raw image feeds the learning algorithm.)
50 × 50 pixel images → 2500 pixels (7500 if RGB)
Feature vector: $x = (\text{pixel 1 intensity},\ \text{pixel 2 intensity},\ \ldots,\ \text{pixel 2500 intensity})$
Quadratic features ($x_i x_j$): ≈3 million features
Simple Non-Linear Classification Example
(Two plots over $x_1$, $x_2$: data whose class boundary is non-linear.)
8
Inspiration: The Brain
9
Inspiration: The Brain
10
Linear Neuron
(Diagram: inputs $x_1, x_2, x_3$ with weights $w_1, w_2, w_3$ and bias $w_0$.)
Output: $w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3$
11
Linear Neuron: Cost Function
• There are many possible choices. The one used for linear regression is
$$\sum_{i=1}^{m} \left( y^{(i)} - w^T x^{(i)} \right)^2$$
• Can minimize this using gradient descent to obtain the best weights $w$ for the training set
12
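The squared-error minimization above can be sketched numerically. This is a minimal illustration, not course code: the data, learning rate, and iteration count are all invented, and the bias $w_0$ is handled by a column of ones.

```python
import numpy as np

# Minimize sum_i (y_i - w^T x_i)^2 by gradient descent (all values invented).
rng = np.random.default_rng(0)
X = np.hstack([np.ones((50, 1)), rng.normal(size=(50, 2))])  # x_0 = 1 gives the bias w_0
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                       # noiseless targets for illustration

w = np.zeros(3)
lr = 1e-3
for _ in range(2000):
    residual = y - X @ w
    grad = -2 * X.T @ residual       # gradient of the sum of squared errors
    w -= lr * grad

# After training, w recovers w_true (the data are noiseless).
```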
Logistic Neuron
(Diagram: inputs $x_1, x_2, x_3$ with weights $w_1, w_2, w_3$ and bias $w_0$.)
Output: $\sigma(w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3)$, where $\sigma(t) = \dfrac{1}{1 + \exp(-t)}$
13
Logistic Neuron: Cost Function
• Could use the quadratic cost function again
• Could use the “log-loss” function to make the neuron perform logistic regression
$$-\sum_{i=1}^{m} \left[ y^{(i)} \log \frac{1}{1 + \exp\left(-w^T x^{(i)}\right)} + \left(1 - y^{(i)}\right) \log \frac{\exp\left(-w^T x^{(i)}\right)}{1 + \exp\left(-w^T x^{(i)}\right)} \right]$$
(Note: we derived this cost function by saying we want to maximize the likelihood of the data under a certain model, but there’s nothing stopping us from just making up a loss function)
14
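The log-loss above can be written directly in code. A minimal sketch, with invented weights and data; note that $\frac{\exp(-w^Tx)}{1+\exp(-w^Tx)} = 1 - \sigma(w^Tx)$, which is the form used below.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def log_loss(w, X, y):
    p = sigmoid(X @ w)               # P(y = 1 | x) under the model
    # Each term heavily penalizes confident wrong predictions.
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Invented example: two points, one of each class.
w = np.array([0.0, 1.0])
X = np.array([[1.0, 2.0], [1.0, -1.0]])
y = np.array([1.0, 0.0])
loss = log_loss(w, X, y)             # ≈ 0.440
```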
Logistic Regression Cost Function: Another Look
$$\mathrm{Cost}(h_w(x), y) = \begin{cases} -\log h_w(x), & y = 1 \\ -\log\left(1 - h_w(x)\right), & y = 0 \end{cases}$$
• If $y = 1$, want the cost to be small if $h_w(x)$ is close to 1 and large if $h_w(x)$ is close to 0
  • $-\log(t)$ is 0 for $t = 1$ and tends to infinity as $t \to 0$
• If $y = 0$, want the cost to be small if $h_w(x)$ is close to 0 and large if $h_w(x)$ is close to 1
• Note: $0 < \sigma(t) < 1$, where $\sigma(t) = \dfrac{1}{1 + \exp(-t)}$
15
Multilayer Neural Networks
• $h_{i,j} = g\left(W_{i,j} \cdot x\right) = g\left(\sum_k W_{i,j,k}\, x_k\right)$
• $x_0 = 1$ always
• $W_{i,j,0}$ is the “bias”
• $g$ is the activation function
  • Could be $g(t) = t$
  • Could be $g(t) = \sigma(t)$
  • Nobody uses those anymore…
(Diagram: input units $x_1, x_2, x_3$, hidden units $h_{i,j}$, output units $o_1, o_2$.)
16
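The per-unit computation $h_{i,j} = g(\sum_k W_{i,j,k} x_k)$ can be sketched numerically. Weights and inputs here are invented, and ReLU is used as an example activation (anticipating the next slides; the slide's $g(t)=t$ or $\sigma(t)$ would work the same way):

```python
import numpy as np

def relu(t):
    return np.maximum(0.0, t)

# One hidden unit: x[0] = 1 is the always-on bias input,
# so W_row[0] plays the role of the bias W_{i,j,0}.
x = np.array([1.0, 0.5, -0.3])
W_row = np.array([0.2, 1.0, 2.0])        # invented weights W_{i,j,k}
h = relu(W_row @ x)                      # g(0.2*1 + 1.0*0.5 + 2.0*(-0.3)) = g(0.1)
```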
Why do we need activation functions?
17
Activation functions?
18
Exercise:
• Hand-code a neural network to compute:
  • AND
  • XOR
• Use only sigmoid activations
• Use only ReLU activations
19
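One possible solution sketch for the exercise (the weights below are one choice among many; the slide does not prescribe them). AND needs only a single sigmoid neuron; XOR needs a hidden layer, here two ReLU units with a linear output:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def and_net(x1, x2):
    # Single sigmoid neuron: output ≈ 1 only when both inputs are 1.
    return sigmoid(-30 + 20 * x1 + 20 * x2)

def xor_net(x1, x2):
    # Two ReLU hidden units, linear output h1 - 2*h2 computes XOR exactly.
    h1 = max(0.0, x1 + x2)
    h2 = max(0.0, x1 + x2 - 1)
    return h1 - 2 * h2
```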
Multilayer Neural Network: Speech Recognition Example (multi-class classification)
20
Universal Approximator
• Neural networks with at least one hidden layer (and enough neurons) are universal approximators
  • Can represent any (smooth) function
• The capacity (ability to represent different functions) increases with more hidden layers and more neurons
• Why go deeper? One hidden layer might need a lot of neurons. Deeper and narrower networks are more compact
21
Computation in Neural Networks
• Forward pass
  • Making predictions: plug in the input $x$, get the output $y$
• Backward pass
  • Compute the gradient of the cost function with respect to the weights
22
Multilayer Neural Network for Classification:
(Diagram: input vector $x$, hidden layer $h_1, \ldots, h_5$, outputs $o_1, o_2, o_3$; example weights $W^{(1,1,1)}, W^{(1,3,5)}, W^{(2,1,1)}, W^{(2,3,3)}$.)
• $o_i$ is large if the probability that the correct class is $i$ is high
• A possible cost function:
$$C(o, y) = \sum_{i=1}^{m} \left| y^{(i)} - o^{(i)} \right|^2$$
• $y^{(i)}$’s and $o^{(i)}$’s are encoded using one-hot encoding
23
Forward Pass (vectorized)
$$\mathbf{h} = g\left( (W^{(1)})^T \mathbf{x} + b^{(1)} \right)$$
$$\mathbf{o} = g\left( (W^{(2)})^T \mathbf{h} + b^{(2)} \right)$$
…etc., if there are more layers
(Diagram: input units $x_1, x_2, x_3$, hidden units $h_{i,j}$, output units $o_1, o_2$.)
24
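The vectorized forward pass is two matrix-vector products. A minimal sketch, assuming the 3–5–2 network shape from the diagrams; weights are random placeholders and sigmoid is used as the activation $g$:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(1)
x = rng.normal(size=3)                          # 3 input units
W1, b1 = rng.normal(size=(3, 5)), np.zeros(5)   # 5 hidden units
W2, b2 = rng.normal(size=(5, 2)), np.zeros(2)   # 2 output units

h = sigmoid(W1.T @ x + b1)    # h = g(W1^T x + b1)
o = sigmoid(W2.T @ h + b2)    # o = g(W2^T h + b2), shape (2,)
```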
Backwards Pass (training)
Need to find
$$W = \operatorname*{argmin}_W \sum_{i=1}^{m} \mathrm{loss}\left(\mathbf{o}^{(i)}, \mathbf{y}^{(i)}\right)$$
where:
• $\mathbf{o}^{(i)}$ is the output of the neural network
• $\mathbf{y}^{(i)}$ is the ground truth
• $W$ is all the weights in the neural network
• $\mathrm{loss}$ is a continuous loss function
Use gradient descent to find a good $W$
But how to compute gradient?
• To optimize the weights/parameters of the neural network, we need to compute the gradient of the cost function $C(\mathbf{o}, \mathbf{y}) = \sum_{i=1}^{m} \mathrm{loss}(\mathbf{o}^{(i)}, \mathbf{y}^{(i)})$ with respect to every weight in the neural network.
• Need to compute, for every layer and weight $(l, j, i)$: $\dfrac{\partial C}{\partial W^{(l,j,i)}}$
• How to do this? How to do this efficiently?
26
Review: Chain Rule
• Univariate Chain Rule:
$$\frac{d}{dt}\, g(f(t)) = \frac{dg}{df} \cdot \frac{df}{dt}$$
• Multivariate Chain Rule:
$$\frac{\partial g}{\partial x} = \sum_i \frac{\partial g}{\partial f_i} \frac{\partial f_i}{\partial x}$$
(Diagram: $x$ feeds into $f_1(x), f_2(x), f_3(x)$, which feed into $g(f_1(x), f_2(x), f_3(x))$.)
27
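The multivariate chain rule can be sanity-checked numerically. The functions below are invented for illustration: $g(f_1, f_2, f_3) = f_1 f_2 + f_3$ with $f_1 = x^2$, $f_2 = \sin x$, $f_3 = e^x$, compared against a central finite difference:

```python
import numpy as np

def g_of_x(x):
    # g(f1, f2, f3) = f1*f2 + f3, composed with f1 = x^2, f2 = sin x, f3 = exp x
    return x**2 * np.sin(x) + np.exp(x)

def dg_dx_chain(x):
    f1, f2 = x**2, np.sin(x)
    # sum_i (dg/df_i) * (df_i/dx)
    return f2 * 2 * x + f1 * np.cos(x) + np.exp(x)

x0, eps = 0.7, 1e-6
fd = (g_of_x(x0 + eps) - g_of_x(x0 - eps)) / (2 * eps)
# fd agrees with dg_dx_chain(x0) up to finite-difference error
```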
Gradient of Single Weight (last layer)
• We need the partial derivatives of the cost function $C(o, y)$ w.r.t. all the $W$ and $b$.
• $o_i = g\left( \sum_j W^{(2,j,i)} h_j + b^{(2,i)} \right)$
• Let $z_i = \sum_j W^{(2,j,i)} h_j + b^{(2,i)}$, so that $o_i = g(z_i)$
• Partial derivative of $C(o, y)$ with respect to $W^{(2,j,i)}$, all evaluated at $(x, y, W, b, h, o)$:
$$\frac{\partial C}{\partial W^{(2,j,i)}} = \frac{\partial o_i}{\partial W^{(2,j,i)}} \frac{\partial C}{\partial o_i} = \frac{\partial z_i}{\partial W^{(2,j,i)}} \frac{\partial g}{\partial z_i} \frac{\partial C}{\partial o_i} = h_j\, \frac{\partial g}{\partial z_i} \frac{\partial C}{\partial o_i} = h_j\, g'(z_i)\, \frac{\partial}{\partial o_i} C(o, y)$$
(Diagram: inputs $x_1, x_2, x_3$, hidden units $h_{i,j}$, outputs $o_1, o_2$ feeding cost $C$; weight $W^{(2,2,1)}$ highlighted.)
28
Gradient of Single Weight (last layer)
• For example, if we use:
  • sigmoid activation: $g = \sigma$, with $\sigma'(t) = \sigma(t)\left(1 - \sigma(t)\right)$
  • MSE loss: $C(\mathbf{o}, \mathbf{y}) = \sum_{i=1}^{N} (o_i - y_i)^2$
• Then:
$$\frac{\partial C}{\partial W^{(2,j,i)}} = h_j\, g'(z_i)\, \frac{\partial C}{\partial o_i} = h_j\, g(z_i)\left(1 - g(z_i)\right) 2(o_i - y_i) = h_j\, o_i (1 - o_i)\, 2(o_i - y_i)
$$
29
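The last-layer formula can be verified against a finite difference. A minimal sketch with invented hidden activations, weights, and targets, using the slide's sigmoid-plus-squared-loss setting (bias omitted for brevity):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(2)
h = rng.normal(size=4)           # hidden activations (invented)
W2 = rng.normal(size=(4, 3))     # W2[j, i] connects h_j to o_i
y = np.array([1.0, 0.0, 1.0])

def cost(W):
    o = sigmoid(W.T @ h)
    return np.sum((o - y) ** 2)

o = sigmoid(W2.T @ h)
j, i = 1, 2
# dC/dW_{2,j,i} = h_j * o_i(1 - o_i) * 2(o_i - y_i)
analytic = h[j] * o[i] * (1 - o[i]) * 2 * (o[i] - y[i])

eps = 1e-6
Wp, Wm = W2.copy(), W2.copy()
Wp[j, i] += eps; Wm[j, i] -= eps
numeric = (cost(Wp) - cost(Wm)) / (2 * eps)
# analytic and numeric agree up to finite-difference error
```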
Vectorization
• For a single weight, we had:
$$\frac{\partial C}{\partial W^{(2,j,i)}} = h_j\, o_i (1 - o_i)\, 2(o_i - y_i)$$
• Vectorizing, we get:
$$\frac{\partial C}{\partial W^{(2)}} = 2\,\mathbf{h} \left( \mathbf{o} \odot (1 - \mathbf{o}) \odot (\mathbf{o} - \mathbf{y}) \right)^T$$
where $\odot$ denotes elementwise multiplication
• Note this is for sigmoid activation and square loss
30
What about earlier layers?
(Diagram: input vector $x$, hidden layer $h_1, \ldots, h_5$, outputs $o_1, o_2, o_3$; weights $W^{(1,1,1)}, W^{(1,3,5)}, W^{(2,1,1)}, W^{(2,3,3)}$.)
Use the multivariate chain rule:
$$\frac{\partial C}{\partial h_i} = \sum_k \frac{\partial C}{\partial o_k} \frac{\partial o_k}{\partial h_i}$$
$$\frac{\partial C}{\partial W^{(1,j,i)}} = \frac{\partial C}{\partial h_i} \frac{\partial h_i}{\partial W^{(1,j,i)}}$$
31
Backpropagation
(Diagram: hidden layers hidB → hidA → outputs; weight $W^{(1,3,5)}$ shown.)
$$\frac{\partial C}{\partial hidB_i} = \sum_k \frac{\partial C}{\partial hidA_k} \frac{\partial hidA_k}{\partial hidB_i}$$
$$\frac{\partial C}{\partial W^{(1,j,i)}} = \frac{\partial C}{\partial hidB_i} \frac{\partial hidB_i}{\partial W^{(1,j,i)}}$$
32
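Backpropagating one layer down can be sketched end-to-end for the sigmoid-plus-squared-loss case. Everything here is invented for illustration (a 3–5–2 network, no biases), and the analytic gradient for a first-layer weight is checked against a finite difference:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(4)
x = rng.normal(size=3)
W1 = rng.normal(size=(3, 5))
W2 = rng.normal(size=(5, 2))
y = np.array([1.0, 0.0])

def forward(W1):
    h = sigmoid(W1.T @ x)
    o = sigmoid(W2.T @ h)
    return h, o

h, o = forward(W1)
dC_do = 2 * (o - y)                        # dC/do_k for squared loss
dC_dh = W2 @ (dC_do * o * (1 - o))         # sum_k dC/do_k * do_k/dh_i
dC_dW1 = np.outer(x, dC_dh * h * (1 - h))  # dh_i/dW_{1,j,i} = x_j * h_i(1 - h_i)

# Finite-difference check of one first-layer entry
eps, j, i = 1e-6, 0, 1
Wp, Wm = W1.copy(), W1.copy()
Wp[j, i] += eps; Wm[j, i] -= eps
num = (np.sum((forward(Wp)[1] - y) ** 2)
       - np.sum((forward(Wm)[1] - y) ** 2)) / (2 * eps)
```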
(Diagram: input vector → hidden layers → outputs. Compare outputs with the correct answer to get an error signal; back-propagate the error signal to get derivatives for learning.)
33
Training Summary
34
Why is training neural networks so hard?
• Hard to optimize:
  • Not convex
  • Local minima, saddle points, etc.
  • Can take a long time to train
• Architecture choices:
  • How many layers? How many units in each layer?
  • What activation function to use?
• Choice of optimizer:
  • We talked about gradient descent, but there are techniques that improve upon gradient descent
35
Demo
http://playground.tensorflow.org/
36