ESS391H – The Application of Machine Learning and Artificial Neural Networks in Geosciences
Lecture 10 – Deep Learning: Neural Networks

Transcript of lecture slides

Page 1: Title

ESS391H
The Application of Machine Learning and Artificial Neural Networks in Geosciences
Lecture 10 – Deep Learning: Neural Networks
University of Toronto, Department of Earth Sciences
Hosein Shahnas

Page 2: One-vs.-All (OvA)

One-vs.-All (OvA)

The perceptron algorithm can be extended to multi-class classification, for example through the One-vs.-All (OvA) technique: one binary classifier is trained per class, treating that class as label 1 and all other classes as label 0.

[Figure: OvA for a 3-class problem – three binary classifiers, each separating one class (Class 1) from the remaining classes (Class 0).]
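As a minimal sketch of OvA (assuming scikit-learn is available; the digits dataset and default settings are purely illustrative):

from sklearn.datasets import load_digits
from sklearn.linear_model import Perceptron
from sklearn.multiclass import OneVsRestClassifier

X, y = load_digits(return_X_y=True)      # 10-class digit data
ova = OneVsRestClassifier(Perceptron())  # one binary perceptron per class
ova.fit(X, y)
print(ova.predict(X[:5]))                # predicted class labels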

Multiclass Problems

[Figure: sample handwritten digits 0–9, a 10-class problem.]

Page 3: Multiclass Problems – Softmax

Multiclass Problems

Softmax

The softmax is a generalization of the logistic function that computes meaningful class probabilities in multi-class settings (multinomial logistic regression).

Logistic function: $F(z_j) = \dfrac{1}{1 + e^{-z_j}}$

Softmax: $\mathrm{Softmax}(z_j) = \dfrac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}$, where $K$ is the number of classes.

[Figure: a three-output network producing probabilities $P_1$, $P_2$, $P_3$ via softmax, for $j = 1, 2, 3$.]
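A minimal NumPy sketch of the softmax (subtracting the maximum is a standard numerical-stability trick, not part of the slide):

import numpy as np

def softmax(z):
    """Map scores z to probabilities that sum to 1."""
    e = np.exp(z - np.max(z))  # subtract max for numerical stability
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # e.g. [0.659 0.242 0.099]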

Page 4: Image Recognition – 2D to 1D Array

2D to 1D array

Each 28×28-pixel image (2D) is flattened into a 1D array of 784 values; the n sample images are stored as n rows, one image per row.

Image Recognition – Handwritten Digits Classification

Learning the digits 0–9.
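A minimal NumPy sketch of this flattening (the array shapes are illustrative):

import numpy as np

images = np.random.rand(1000, 28, 28)   # n = 1000 images of 28x28 pixels
flat = images.reshape(len(images), -1)  # each row is one flattened image
print(flat.shape)                       # (1000, 784)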

Page 5: Image Recognition – Output Probabilities

Image Recognition

The network outputs the class probabilities $(P_0, P_1, \dots, P_9)$. A quantizer $Q$ maps this probability vector to a one-hot vector, e.g.

$Q(P_0, \dots, P_9) = (0, 0, 0, 0, 0, 0, 0, 1, 0, 0)$

when $P_7$ is the largest probability (the image is classified as the digit 7).

$\mathrm{Softmax}(z_j) = \dfrac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}, \quad j = 1, 2, \dots, 10, \quad K = 10, \quad \sum_i P_i = 1$

$Q$: quantizer
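A minimal sketch of such a quantizer (assuming NumPy; the name quantize is illustrative):

import numpy as np

def quantize(p):
    """One-hot encode the most probable class."""
    q = np.zeros_like(p)
    q[np.argmax(p)] = 1
    return q

p = np.array([0.01, 0.02, 0.03, 0.04, 0.05, 0.05, 0.05, 0.70, 0.03, 0.02])
print(quantize(p))  # [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]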

Page 6: Deep Learning – Combining Perceptrons

Deep Learning – Combining Perceptrons

Let's explore what we can do with combinations of perceptrons rather than single ones. Consider the classification problem for which the perceptron algorithm failed: it cannot be solved by a single perceptron. But what about with two perceptrons?

[Figure: the non-separable data in the $(x_1, x_2)$ plane, and a network in which $x_1, x_2$ feed two hidden perceptrons $h_1, h_2$.]

Page 7: Deep Learning – Combining Perceptrons (XOR)

Deep Learning – Combining Perceptrons

[Figure: the XOR pattern – two '+' points and two '−' points that no single line can separate.]

Step activation: $\sigma \equiv \phi(z) = \begin{cases} 0, & z < 0 \\ 1, & z \ge 0 \end{cases}, \qquad z = \sum_i w_i x_i$

Three perceptrons solve the problem: $h_1$ is an OR gate ($w_0 = -10$, $w_1 = 20$, $w_2 = 20$), $h_2$ is a NAND gate ($w_0 = 30$, $w_1 = -20$, $w_2 = -20$), and the output $y$ is an AND gate over $h_1, h_2$ ($w_0 = -30$, $w_1 = 20$, $w_2 = 20$):

x1  x2 | h1 = φ(20x1 + 20x2 − 10) | h2 = φ(−20x1 − 20x2 + 30) | y = φ(20h1 + 20h2 − 30)
 0   0 | φ(−10) = 0               | φ(30) = 1                 | φ(−10) = 0
 0   1 | φ(10) = 1                | φ(10) = 1                 | φ(10) = 1
 1   0 | φ(10) = 1                | φ(10) = 1                 | φ(10) = 1
 1   1 | φ(30) = 1                | φ(−10) = 0                | φ(−10) = 0

The output $y$ is exactly XOR$(x_1, x_2)$.

[Figure: the two-layer network – $x_1, x_2 \to h_1$ (OR) and $h_2$ (NAND) $\to y$ (AND).]
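A minimal Python sketch of this construction (the function names are illustrative):

def phi(z):
    """Hard-threshold step activation."""
    return 1 if z >= 0 else 0

def xor_net(x1, x2):
    h1 = phi(20 * x1 + 20 * x2 - 10)    # OR gate
    h2 = phi(-20 * x1 - 20 * x2 + 30)   # NAND gate
    return phi(20 * h1 + 20 * h2 - 30)  # AND gate

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))  # prints the XOR truth table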

Page 8: Combining Perceptrons – Multilayer Perceptron

[Figure: a striped +/− pattern requiring multiple decision boundaries – regions realized with 8 perceptrons vs. 16 perceptrons.]

We solved this problem using feature transformation before. Alternatively, we can use neural networks.

Combining Perceptrons – Multilayer Perceptron

Page 9: Optimization

There are many perceptrons (and therefore many parameters), so optimization might be a problem (recall that for a single perceptron, when the data were not linearly separable, we had a convergence problem).

Solution:

1) Choose a soft threshold (tanh) rather than a hard threshold (step function).
2) Use SGD (stochastic gradient descent).
3) Use an efficient way to find the weight factors w (backpropagation).

$\phi(z) = \tanh(z) = \dfrac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$

[Figure: step function vs. tanh; a network with input layer ($l = 0$), hidden layers ($1 \le l < L$), and output layer ($l = L$).]

Optimization
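A minimal NumPy illustration of why the soft threshold helps: tanh has a well-defined gradient everywhere, which gradient-based optimization needs (the sample points are arbitrary):

import numpy as np

z = np.linspace(-3, 3, 7)
step = (z >= 0).astype(float)  # hard threshold: gradient is zero or undefined
soft = np.tanh(z)              # soft threshold, differentiable everywhere
grad = 1 - np.tanh(z) ** 2     # d tanh/dz, usable by SGD
print(step, soft, grad, sep="\n")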

Page 10: Remark – Learned Nonlinear Transform

Remark

Neural networks can be thought of as a learned nonlinear transform. Note that fixed nonlinear feature transformations (e.g., polynomial, RBF, etc.) are not learned transformations. Since the features in the hidden layers are higher-order (learned) features, better learning can be achieved: the network searches for the weights of a transformation that fits the data.

[Figure: raw input (features of dimension d) → hidden layers (higher-order, learned features) → output.]

Page 11: Dropout

Dropout

Dropout is a regularization technique for neural network models (Srivastava et al., 2014). It is a simple way to prevent neural networks from overfitting.

Some key points:
1) Use a dropout rate of 20%–50%.
2) Dropout with a larger network generally provides better performance, giving the model more of an opportunity to learn independent representations.
3) Dropout can be used on the visible (input) layer as well as on hidden layers.

[Figure: a standard neural network vs. the same network with dropout (some units randomly removed).]
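As a minimal Keras sketch (the layer sizes are illustrative), dropout is inserted between layers and is active only during training:

import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(20,)),
    tf.keras.layers.Dropout(0.3),  # drop 30% of the units at training time
    tf.keras.layers.Dense(1, activation='sigmoid'),
])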

Page 12: Deep Learning – A Simple Network (Ex. 1)

Deep Learning – A Simple Network

Ex. 1: MNIST Database – Handwritten digits

A simple sequential deep learning model for handwritten-digit recognition using Keras and TensorFlow:

[Figure: 28×28 input flattened to 784 → Dense(512, ReLU) → Dropout(0.2) → Dense(10, softmax) output.]

import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(512, activation=tf.nn.relu),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
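Continuing from the model above, a minimal sketch of training and evaluation (the optimizer, loss, and epoch count are illustrative choices, not specified on the slide):

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)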

Page 13: Deep Learning – A Simple Network (Ex. 2)

Deep Learning – A Simple Network

Ex. 2: Pima Indians onset-of-diabetes dataset

A simple sequential deep learning model for predicting the onset of diabetes using Keras:

[Figure: 8 input features $x_1, \dots, x_8$ → Dense(8, ReLU) → Dense(12, ReLU) → Dense(1, sigmoid) output.]

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(8, input_dim=8, activation='relu'))
model.add(Dense(12, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

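Continuing from the model above, a minimal sketch of loading the data and fitting (the CSV filename and the hyperparameters are assumptions for illustration):

import numpy as np

data = np.loadtxt('pima-indians-diabetes.csv', delimiter=',')  # assumed file
X, y = data[:, :8], data[:, 8]  # 8 features per sample, binary label

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=150, batch_size=10)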

Page 14: Deep Learning

Deep Learning

[Figure: progression from a single neuron (binary class, shallow learning) to a multilayer network (multi-class, shallow learning) to a deep neural network (multi-class, deep learning).]

Page 15: Deep Learning – Error Metrics

Some error metrics

MSE (Mean Squared Error): The average error based on the square of the difference between the original and predicted values.

MAE (Mean Absolute Error): The average error based on the absolute value of the difference between the original and predicted values.

RMSE (Root Mean Squared Error): The square root of the MSE.

MSLE (Mean Squared Logarithmic Error): The mean squared error over the log-transformed original and predicted values.

R-squared (Coefficient of determination): A measure of how well the predicted values fit the original values (0–1); a higher value indicates higher accuracy.

$\mathrm{MSE} = \dfrac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2$

$\mathrm{MAE} = \dfrac{1}{N}\sum_{i=1}^{N}\lvert y_i - \hat{y}_i \rvert$

$\mathrm{RMSE} = \sqrt{\mathrm{MSE}}$

$\mathrm{MSLE} = \dfrac{1}{N}\sum_{i=1}^{N}\left(\log(y_i + 1) - \log(\hat{y}_i + 1)\right)^2$

$R^2 = 1 - \dfrac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$

where $y_i$ are the original values, $\hat{y}_i$ the predicted values, and $\bar{y}$ the mean of the original values.
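A minimal NumPy sketch of these metrics (the array values are illustrative):

import numpy as np

y = np.array([3.0, 5.0, 2.5, 7.0])      # original values
y_hat = np.array([2.5, 5.0, 4.0, 8.0])  # predicted values

mse = np.mean((y - y_hat) ** 2)
mae = np.mean(np.abs(y - y_hat))
rmse = np.sqrt(mse)
msle = np.mean((np.log(y + 1) - np.log(y_hat + 1)) ** 2)
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)
print(mse, mae, rmse, msle, r2)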

Deep Learning

Page 16: Deep Learning – Why More Layers?

Deep Learning – Why more layers?

Page 17: Deep Learning – Large Number of Parameters

Deep Learning – Large Number of Parameters

Page 18: Convolutional Networks (CNN)

I: Image
K: 'Filter' or 'Kernel' or 'Feature Detector'
I∗K: Convolved Feature (activation map)

$W_c = \dfrac{W - F_W}{S_W} + 1$: the size of the convolved field (I∗K), where $W$ is the input width, $F_W$ the filter width, and $S_W$ the stride.

Ex.: $W_c = (7 - 3)/1 + 1 = 5$

[Figure: a volume image of width W convolved with a filter of width F yields a $W_c$-wide activation map.]

Convolutional Networks (CNN)

Continuous convolution: $(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau$

Discrete 2D convolution: $(I * K)_{xy} = \sum_{i=1}^{h}\sum_{j=1}^{w} K_{ij}\, I_{x+i-1,\, y+j-1}$
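A minimal NumPy sketch of this discrete 2D convolution with stride 1 and no padding (a direct, unoptimized loop):

import numpy as np

def conv2d(I, K):
    h, w = K.shape
    H, W = I.shape
    out = np.zeros((H - h + 1, W - w + 1))  # output shrinks: (W - F)/1 + 1
    for x in range(out.shape[0]):
        for y in range(out.shape[1]):
            out[x, y] = np.sum(K * I[x:x + h, y:y + w])
    return out

I = np.random.rand(7, 7)
print(conv2d(I, np.ones((3, 3))).shape)  # (5, 5), matching Wc = (7-3)/1+1 = 5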

Page 19: Convolutional Networks (CNN) – Filters


https://medium.com/@ageitgey/machine-learning-is-fun-part-3-deep-learning-and-convolutional-neural-networks-f40359318721

Convolutional Networks (CNN) - Filters

Page 20: Convolutional Networks (CNN) – Filters

Identity:
[[0, 0, 0],
 [0, 1, 0],
 [0, 0, 0]]

Edge detection:
[[-1, -1, -1],
 [-1,  8, -1],
 [-1, -1, -1]]

Sharpen:
[[ 0, -1,  0],
 [-1,  5, -1],
 [ 0, -1,  0]]

Box blur (×1/9):
[[1, 1, 1],
 [1, 1, 1],
 [1, 1, 1]]

Gaussian blur (×1/16):
[[1, 2, 1],
 [2, 4, 2],
 [1, 2, 1]]
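A minimal sketch of applying one of these filters to an image (assuming SciPy is available; the random image stands in for real data):

import numpy as np
from scipy.signal import convolve2d

sharpen = np.array([[0, -1, 0],
                    [-1, 5, -1],
                    [0, -1, 0]])
image = np.random.rand(28, 28)
filtered = convolve2d(image, sharpen, mode='same')  # keep the 28x28 size
print(filtered.shape)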

Convolutional Networks (CNN) – Filters

Page 21: Stride and Padding

Stride and Padding

In convolving an image:
1) The output shrinks.
2) The information at the corners of the image is lost.

This can be prevented by padding.

[Figure: a 32×32×3 image padded to 36×36 so that the convolution output remains 32×32.]

Page 22: Stride and Padding – Color Image

Stride and Padding – Color Image

$H_c = \dfrac{H - F_H + P}{S_H} + 1, \qquad W_c = \dfrac{W - F_W + P}{S_W} + 1$

http://machinelearninguru.com/computer_vision/basics/convolution/convolution_layer.html
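A minimal helper for these formulas (the function name is illustrative):

def conv_output_size(W, F, P, S):
    """Output width/height of a convolution: (W - F + P) / S + 1."""
    return (W - F + P) // S + 1

print(conv_output_size(W=7, F=3, P=0, S=1))   # 5, as in the earlier example
print(conv_output_size(W=32, F=5, P=4, S=1))  # 32: padding preserves the size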

Page 23: Stride and Padding

Stride and Padding

W: input width, H: input height, F: filter size, P: padding, S: stride.

[Figure: a filter sliding over the input with stride = 1 vs. stride = 2.]

Page 24: Convolutional Networks – Max Pooling

Convolutional Networks – Max Pooling / Downsampling with CNNs

A typical CNN pipeline: [Conv → ReLU → Pool] → [Conv → ReLU → Pool] → … → Classification

Rectified Linear Unit (ReLU): $F_{\mathrm{ReLU}}(z) = \max(0, z)$

Analytic approximation (softplus): $F_{\mathrm{SReLU}}(z) = \log(1 + e^{z})$

Neuron pre-activation: $z = \mathbf{x} \cdot \mathbf{w} = \sum_{i=1}^{n} w_i x_i = w_1 x_1 + w_2 x_2 + \dots + w_n x_n$
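A minimal NumPy sketch of 2×2 max pooling with stride 2 (a direct loop version):

import numpy as np

def max_pool(a, size=2, stride=2):
    H, W = a.shape
    out = np.zeros((H // stride, W // stride))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # take the maximum value in each pooling window
            out[i, j] = a[i*stride:i*stride+size, j*stride:j*stride+size].max()
    return out

a = np.arange(16).reshape(4, 4)
print(max_pool(a))  # [[ 5.  7.] [13. 15.]]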

Page 25: Convolutional Networks – Softmax

Softmax function: $\mathrm{Softmax}(z_j) = \dfrac{e^{z_j}}{\sum_{m=1}^{k} e^{z_m}}$, where $k$ is the number of classes.

Rectified Linear Unit (ReLU): $F_{\mathrm{ReLU}}(z) = \max(0, z)$; analytic approximation: $F_{\mathrm{SReLU}}(z) = \log(1 + e^{z})$.

Convolutional Networks

[Figure: a CNN with six filters, ReLU after the convolutional layers, and a softmax output.]

https://medium.com/data-science-group-iitr/building-a-convolutional-neural-network-in-python-with-tensorflow-d251c3ca8117

Page 26: Convolutional Networks – Network Architecture

Network Architecture

Convolution, Filter shape: (5, 5, 6), Stride = 1, Padding = 'SAME'
Max pooling (2×2), Window shape: (2, 2), Stride = 2, Padding = 'SAME'
ReLU
Convolution, Filter shape: (5, 5, 16), Stride = 1, Padding = 'SAME'
Max pooling (2×2), Window shape: (2, 2), Stride = 2, Padding = 'SAME'
ReLU
Fully Connected Layer (128)
ReLU
Fully Connected Layer (10)
Softmax

Convolutional Networks
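A minimal Keras sketch of this architecture (the 28×28×1 input shape is an assumption; the slide does not state it):

import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(6, (5, 5), strides=1, padding='same',
                           input_shape=(28, 28, 1)),  # assumed input shape
    tf.keras.layers.MaxPooling2D((2, 2), strides=2, padding='same'),
    tf.keras.layers.Activation('relu'),
    tf.keras.layers.Conv2D(16, (5, 5), strides=1, padding='same'),
    tf.keras.layers.MaxPooling2D((2, 2), strides=2, padding='same'),
    tf.keras.layers.Activation('relu'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])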

Page 27: A Simple CNN Model (Ex. 3)

A Simple CNN Model

Ex. 3: MNIST Database – Handwritten digits

A simple CNN deep learning model for handwritten-digit recognition using Keras.

Max pooling: take the maximum value in each pooling window.

[Figure: 28×28 input → 3×3 convolution → 2×2 max pooling → flatten → Dense(128, ReLU) → Dropout(0.2) → Dense(10, softmax) output; the last two Dense layers are the fully connected layers.]
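A minimal Keras sketch of this model (the number of convolution filters, 32 here, is an assumption; the slide does not state it):

import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu',
                           input_shape=(28, 28, 1)),  # filter count assumed
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax'),
])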