Supervised Machine Learning and Learning Theory

Page 1

DS 5220

Alina Oprea
Associate Professor, CCIS
Northeastern University

November 13, 2019

Supervised Machine Learning and Learning Theory

Page 2

Logistics

• Project milestone
  – Due on Nov. 18 in Gradescope

• Template
  – Problem description
  – Dataset
  – Approach and methodology
  – Remaining work
  – Team member contribution

• Last homework will be released this week

Page 3

Outline

• Feed-Forward architectures
  – Multi-class classification (softmax unit)
  – Lab in Keras

• Convolutional Neural Networks
  – Convolution layer
  – Max pooling layer
  – Examples of famous architectures

Page 4

References

• Deep Learning books
  – https://www.deeplearningbook.org/
  – http://d2l.ai/

• Stanford notes on deep learning
  – http://cs229.stanford.edu/notes/cs229-notes-deep_learning.pdf

• History of Deep Learning
  – https://beamandrew.github.io/deeplearning/2017/02/23/deep_learning_101_part1.html

Page 5

Neural Network Architectures

Feed-Forward Networks
• Neurons from each layer connect to neurons from the next layer

Convolutional Networks
• Include a convolution layer for feature reduction
• Learn hierarchical representations

Recurrent Networks
• Keep hidden state
• Have cycles in the computational graph

Page 6

Feed-Forward Networks

(Figure: a four-layer network, Layer 0 → Layer 1 → Layer 2 → Layer 3)

Page 7

Feed-Forward Neural Network

Training example: x = (x_1, x_2, x_3)

(Figure: Layer 0 is the input; Layer 1 holds hidden units a_1^[1], a_2^[1], a_3^[1], a_4^[1]; Layer 2 holds the output a_1^[2])

• W^[1], b^[1] map layer 0 to layer 1; W^[2], b^[2] map layer 1 to layer 2
• No cycles
• θ = (b^[1], W^[1], b^[2], W^[2])

Page 8

Vectorization

z^[1] = W^[1] x + b^[1]   (Linear)        a^[1] = g(z^[1])   (Non-linear)

Page 9

Vectorization

Output layer:

z^[2] = W^[2] a^[1] + b^[2]   (Linear)        a^[2] = g(z^[2])   (Non-linear)
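Behind the notation, each layer is a matrix multiply plus a bias, followed by an elementwise non-linearity. A minimal NumPy sketch of the two-layer forward pass; the layer sizes and the ReLU/sigmoid choices here are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def relu(z):
    # Elementwise non-linearity commonly used in hidden layers
    return np.maximum(0.0, z)

def sigmoid(z):
    # Output non-linearity for binary classification
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros((4, 1))  # layer 1: 3 inputs -> 4 units
W2, b2 = rng.normal(size=(1, 4)), np.zeros((1, 1))  # layer 2: 4 units -> 1 output

x = np.array([[0.5], [-1.0], [2.0]])  # one training example, shape (3, 1)
z1 = W1 @ x + b1        # linear step, layer 1
a1 = relu(z1)           # non-linear step, layer 1
z2 = W2 @ a1 + b2       # linear step, output layer
a2 = sigmoid(z2)        # non-linear step, output: a probability in (0, 1)
```

The same two lines per layer repeat for any depth; only the weight shapes change.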

Page 10

Hidden Units
• Layer 1
  – First hidden unit:
    • Linear: z_1^[1] = (W_1^[1])^T x + b_1^[1]
    • Non-linear: a_1^[1] = g(z_1^[1])
  – …
  – Fourth hidden unit:
    • Linear: z_4^[1] = (W_4^[1])^T x + b_4^[1]
    • Non-linear: a_4^[1] = g(z_4^[1])
• Terminology
  – a_i^[j]: activation of unit i in layer j
  – g: activation function
  – W^[j]: weight matrix controlling the mapping from layer j−1 to layer j
  – b^[j]: bias vector from layer j−1 to j

Page 11

Forward Propagation

(Figure: input x flows forward through the layers to produce the prediction)

Page 12

Activation Functions

(Figure: which activation function to use for binary classification, for intermediary layers, and for regression)

Page 13


Page 14


Page 15

(Figure: sigmoid vs. softmax)

Page 16

Softmax classifier

• Predict the class with the highest probability
• Generalization of sigmoid/logistic regression to the multi-class setting
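As a sketch of what the softmax unit computes (the class scores below are made-up logits, not from the slides):

```python
import numpy as np

def softmax(z):
    # Subtracting the max is a standard numerical-stability trick;
    # it does not change the resulting probabilities.
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # per-class scores from the last linear layer
p = softmax(scores)                  # probabilities that sum to 1
predicted_class = int(np.argmax(p))  # predict the class with highest probability
```

With two classes, softmax reduces to the sigmoid of the score difference, which is the sense in which it generalizes logistic regression.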

Page 17

Multi-class classification


Page 18

Review: Feed-Forward Neural Networks
• Simplest architecture of NN
• Neurons from one layer are connected to neurons from the next layer
  – Input layer has the feature-space dimension
  – Output layer has the number of classes
  – Hidden layers use linear operations, followed by a non-linear activation function
  – Multi-Layer Perceptron (MLP): fully connected layers
• Activation functions
  – Sigmoid for binary classification in the last layer
  – Softmax for multi-class classification in the last layer
  – ReLU for hidden layers
• Forward propagation is the computation of the network output given an input
• Back propagation is the training of a network
  – Determine weights and biases at every layer

Page 19

FFNN Architectures

• The input and output layers are completely specified by the problem domain
• In the hidden layers, the number of neurons in layer i+1 is typically smaller than the number of neurons in layer i

Page 20

MNIST: Handwritten digit recognition

Predict the digit: a multi-class classifier

Page 21

Image Representation

• A color image is a 3D "tensor": height, width, color channel (RGB)
• Black-and-white images are 2D matrices: height, width
  – Each value is a pixel intensity
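A quick NumPy illustration of these two representations (the image sizes are arbitrary):

```python
import numpy as np

rgb = np.zeros((32, 32, 3), dtype=np.uint8)   # color image: height x width x channel
gray = np.zeros((28, 28), dtype=np.uint8)     # black-and-white image: height x width
# Each entry of `gray` is a pixel intensity in 0..255.
```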

Page 22

Lab – Feed Forward NN

• Import modules
• Load MNIST data
• Processing
• Vector representation
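The load-and-process steps above can be sketched as follows; random arrays stand in for the output of `keras.datasets.mnist.load_data()` so the snippet runs offline, and the one-hot step mirrors what `keras.utils.to_categorical` does:

```python
import numpy as np

# Stand-ins for (x_train, y_train) returned by keras.datasets.mnist.load_data()
x_train = np.random.randint(0, 256, size=(100, 28, 28)).astype(np.uint8)
y_train = np.random.randint(0, 10, size=(100,))

# Vector representation: flatten each 28x28 image into a 784-dim vector
x_train = x_train.reshape(-1, 28 * 28).astype("float32")
# Scale pixel intensities from [0, 255] to [0, 1]
x_train /= 255.0
# One-hot encode the labels (10 classes, digits 0-9)
y_train = np.eye(10)[y_train]
```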

Page 23

Neural Network Architecture

Hidden layer: 10 hidden units, ReLU activation
Output layer: softmax activation

Feed-Forward Neural Network architecture:
• 1 hidden layer ("Dense", i.e., fully connected)
• 10 neurons
• Output layer uses softmax activation
• Compiled with a loss function and an optimizer
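A minimal Keras sketch of this architecture; the slide does not show which loss and optimizer the lab uses, so `categorical_crossentropy` and `sgd` below are assumptions:

```python
import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(784,)),               # flattened 28x28 MNIST image
    keras.layers.Dense(10, activation="relu"),      # 1 hidden ("Dense") layer, 10 units
    keras.layers.Dense(10, activation="softmax"),   # output layer: one unit per digit
])
model.compile(loss="categorical_crossentropy",      # assumed loss function
              optimizer="sgd",                      # assumed optimizer
              metrics=["accuracy"])

probs = model(np.zeros((1, 784), dtype="float32"))  # forward pass on a dummy input
```

The softmax output row sums to 1, so each entry can be read as the probability of one digit class.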

Page 24

Train and evaluate


Page 25

Epoch output

Metrics:
• Loss
• Accuracy
Reported on both training and validation data

Training/testing results

Page 26

Changing Number of Neurons


500 hidden units

Page 27

Two Layers

Layer 1 → Layer 2 → Output softmax layer

Page 28

Monitor Loss


Page 29

Model Parameters


Page 30

Neural Network Architectures

Feed-Forward Networks
• Neurons from each layer connect to neurons from the next layer

Convolutional Networks
• Include a convolution layer for feature reduction
• Learn hierarchical representations

Recurrent Networks
• Keep hidden state
• Have cycles in the computational graph

Page 31

Convolutional Neural Networks


Page 32

Convolutional Nets

– Object recognition
– Steering angle prediction
– Assist drivers in making decisions
– Image captioning

Page 33

Convolutional Nets

• A particular type of Feed-Forward Neural Net
  – Invented by [LeCun 89]
• Applicable to data with natural grid topology
  – Time series
  – Images
• Use convolutions on at least one layer
  – Convolution is a linear operation that uses local information
  – Also use a pooling operation
  – Used for dimensionality reduction and for learning hierarchical feature representations

Page 34

Convolutional Nets


Page 35

Convolutional Nets


Page 36

Convolutions


Page 37

Example


Input Filter Output
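The input/filter/output picture can be reproduced with a short NumPy sketch; the 3x3 input and 2x2 filter below are made up for illustration (deep-learning "convolution" is typically implemented as cross-correlation, i.e., the filter is not flipped):

```python
import numpy as np

def conv2d_valid(image, kernel):
    # "Valid" convolution: slide the filter over every position where it
    # fits entirely inside the image; each output value is the weighted
    # sum over a local patch (the local, linear operation from slide 33).
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.array([[1., 2., 3.],
                  [4., 5., 6.],
                  [7., 8., 9.]])
kernel = np.array([[1., 0.],
                   [0., 1.]])
result = conv2d_valid(image, kernel)
# result == [[ 6.,  8.],
#            [12., 14.]]
```

Note how the 3x3 input shrinks to a 2x2 output: a valid convolution with a k x k filter reduces each spatial dimension by k − 1.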

Page 38

Acknowledgements

• Slides made using resources from:
  – Andrew Ng
  – Eric Eaton
  – David Sontag
  – Andrew Moore
  – Yann LeCun

• Thanks!
