Supervised Machine Learning and Learning Theory

Page 1

DS 5220

Alina Oprea
Associate Professor, CCIS
Northeastern University

November 13, 2019

Supervised Machine Learning and Learning Theory

Page 2

Logistics

• Project milestone
  – Due on Nov. 18 in Gradescope

• Template
  – Problem description
  – Dataset
  – Approach and methodology
  – Remaining work
  – Team member contribution

• Last homework will be released this week

Page 3

Outline

• Feed-Forward architectures
  – Multi-class classification (softmax unit)
  – Lab in Keras

• Convolutional Neural Networks
  – Convolution layer
  – Max pooling layer
  – Examples of famous architectures

Page 4

References

• Deep Learning books
  – https://www.deeplearningbook.org/
  – http://d2l.ai/

• Stanford notes on deep learning
  – http://cs229.stanford.edu/notes/cs229-notes-deep_learning.pdf

• History of Deep Learning
  – https://beamandrew.github.io/deeplearning/2017/02/23/deep_learning_101_part1.html

Page 5

Neural Network Architectures

Feed-Forward Networks
• Neurons from each layer connect to neurons from the next layer

Convolutional Networks
• Include a convolution layer for feature reduction
• Learn hierarchical representations

Recurrent Networks
• Keep hidden state
• Have cycles in the computational graph

Page 6

Feed-Forward Networks

(Figure: a four-layer network, Layer 0 → Layer 1 → Layer 2 → Layer 3)

Page 7

Feed-Forward Neural Network

Training example: x = (x_1, x_2, x_3)

(Figure: Layer 0 is the input; Layer 1 holds hidden units a_1^[1], a_2^[1], a_3^[1], a_4^[1]; Layer 2 holds the output a_1^[2])

• W^[1], b^[1] map layer 0 to layer 1; W^[2], b^[2] map layer 1 to layer 2
• No cycles
• θ = (b^[1], W^[1], b^[2], W^[2])

Page 8

Vectorization

z^[1] = W^[1] x + b^[1]   (Linear)        a^[1] = g(z^[1])   (Non-linear)

Page 9

Vectorization

Output layer:

z^[2] = W^[2] a^[1] + b^[2]   (Linear)        a^[2] = g(z^[2])   (Non-linear)
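Behind the notation, each layer is a matrix multiply plus a bias, followed by an elementwise non-linearity. A minimal NumPy sketch of the two-layer forward pass; the layer sizes and the ReLU/sigmoid choices here are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def relu(z):
    # Elementwise non-linearity commonly used in hidden layers
    return np.maximum(0.0, z)

def sigmoid(z):
    # Output non-linearity for binary classification
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros((4, 1))  # layer 1: 3 inputs -> 4 units
W2, b2 = rng.normal(size=(1, 4)), np.zeros((1, 1))  # layer 2: 4 units -> 1 output

x = np.array([[0.5], [-1.0], [2.0]])  # one training example, shape (3, 1)
z1 = W1 @ x + b1        # linear step, layer 1
a1 = relu(z1)           # non-linear step, layer 1
z2 = W2 @ a1 + b2       # linear step, output layer
a2 = sigmoid(z2)        # non-linear step, output: a probability in (0, 1)
```

The same two lines per layer repeat for any depth; only the weight shapes change.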

Page 10

Hidden Units
• Layer 1
  – First hidden unit:
    • Linear: z_1^[1] = (W_1^[1])^T x + b_1^[1]
    • Non-linear: a_1^[1] = g(z_1^[1])
  – …
  – Fourth hidden unit:
    • Linear: z_4^[1] = (W_4^[1])^T x + b_4^[1]
    • Non-linear: a_4^[1] = g(z_4^[1])
• Terminology
  – a_i^[j]: activation of unit i in layer j
  – g: activation function
  – W^[j]: weight matrix controlling the mapping from layer j−1 to layer j
  – b^[j]: bias vector from layer j−1 to j

Page 11

Forward Propagation

(Figure: input x flows forward through the layers to produce the prediction)

Page 12

Activation Functions

(Figure: which activation function to use for binary classification, for intermediary layers, and for regression)

Page 13


Page 14


Page 15

(Figure: sigmoid vs. softmax)

Page 16

Softmax classifier

• Predict the class with the highest probability
• Generalization of sigmoid/logistic regression to the multi-class setting
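As a sketch of what the softmax unit computes (the class scores below are made-up logits, not from the slides):

```python
import numpy as np

def softmax(z):
    # Subtracting the max is a standard numerical-stability trick;
    # it does not change the resulting probabilities.
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # per-class scores from the last linear layer
p = softmax(scores)                  # probabilities that sum to 1
predicted_class = int(np.argmax(p))  # predict the class with highest probability
```

With two classes, softmax reduces to the sigmoid of the score difference, which is the sense in which it generalizes logistic regression.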

Page 17

Multi-class classification


Page 18

Review: Feed-Forward Neural Networks
• Simplest architecture of NN
• Neurons from one layer are connected to neurons from the next layer
  – Input layer has the feature-space dimension
  – Output layer has the number of classes
  – Hidden layers use linear operations, followed by a non-linear activation function
  – Multi-Layer Perceptron (MLP): fully connected layers
• Activation functions
  – Sigmoid for binary classification in the last layer
  – Softmax for multi-class classification in the last layer
  – ReLU for hidden layers
• Forward propagation is the computation of the network output given an input
• Back propagation is the training of a network
  – Determine weights and biases at every layer

Page 19

FFNN Architectures

• The input and output layers are completely specified by the problem domain
• In the hidden layers, the number of neurons in layer i+1 is typically smaller than the number of neurons in layer i

Page 20

MNIST: Handwritten digit recognition

Predict the digit: a multi-class classifier

Page 21

Image Representation

• A color image is a 3D "tensor": height, width, color channel (RGB)
• Black-and-white images are 2D matrices: height, width
  – Each value is a pixel intensity
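A quick NumPy illustration of these two representations (the image sizes are arbitrary):

```python
import numpy as np

rgb = np.zeros((32, 32, 3), dtype=np.uint8)   # color image: height x width x channel
gray = np.zeros((28, 28), dtype=np.uint8)     # black-and-white image: height x width
# Each entry of `gray` is a pixel intensity in 0..255.
```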

Page 22

Lab – Feed Forward NN

• Import modules
• Load MNIST data
• Processing
• Vector representation
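The load-and-process steps above can be sketched as follows; random arrays stand in for the output of `keras.datasets.mnist.load_data()` so the snippet runs offline, and the one-hot step mirrors what `keras.utils.to_categorical` does:

```python
import numpy as np

# Stand-ins for (x_train, y_train) returned by keras.datasets.mnist.load_data()
x_train = np.random.randint(0, 256, size=(100, 28, 28)).astype(np.uint8)
y_train = np.random.randint(0, 10, size=(100,))

# Vector representation: flatten each 28x28 image into a 784-dim vector
x_train = x_train.reshape(-1, 28 * 28).astype("float32")
# Scale pixel intensities from [0, 255] to [0, 1]
x_train /= 255.0
# One-hot encode the labels (10 classes, digits 0-9)
y_train = np.eye(10)[y_train]
```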

Page 23

Neural Network Architecture

Hidden layer: 10 hidden units, ReLU activation
Output layer: softmax activation

Feed-Forward Neural Network architecture:
• 1 hidden layer ("Dense", i.e., fully connected)
• 10 neurons
• Output layer uses softmax activation
• Compiled with a loss function and an optimizer
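A minimal Keras sketch of this architecture; the slide does not show which loss and optimizer the lab uses, so `categorical_crossentropy` and `sgd` below are assumptions:

```python
import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(784,)),               # flattened 28x28 MNIST image
    keras.layers.Dense(10, activation="relu"),      # 1 hidden ("Dense") layer, 10 units
    keras.layers.Dense(10, activation="softmax"),   # output layer: one unit per digit
])
model.compile(loss="categorical_crossentropy",      # assumed loss function
              optimizer="sgd",                      # assumed optimizer
              metrics=["accuracy"])

probs = model(np.zeros((1, 784), dtype="float32"))  # forward pass on a dummy input
```

The softmax output row sums to 1, so each entry can be read as the probability of one digit class.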

Page 24

Train and evaluate


Page 25

Epoch output

Metrics:
• Loss
• Accuracy
Reported on both training and validation data

Training/testing results

Page 26

Changing Number of Neurons


500 hidden units

Page 27

Two Layers

Layer 1 → Layer 2 → Output softmax layer

Page 28

Monitor Loss


Page 29

Model Parameters


Page 30

Neural Network Architectures

Feed-Forward Networks
• Neurons from each layer connect to neurons from the next layer

Convolutional Networks
• Include a convolution layer for feature reduction
• Learn hierarchical representations

Recurrent Networks
• Keep hidden state
• Have cycles in the computational graph

Page 31

Convolutional Neural Networks


Page 32

Convolutional Nets

– Object recognition
– Steering angle prediction
– Assist drivers in making decisions
– Image captioning

Page 33

Convolutional Nets

• A particular type of Feed-Forward Neural Net
  – Invented by [LeCun 89]
• Applicable to data with natural grid topology
  – Time series
  – Images
• Use convolutions on at least one layer
  – Convolution is a linear operation that uses local information
  – Also use a pooling operation
  – Used for dimensionality reduction and for learning hierarchical feature representations

Page 34

Convolutional Nets


Page 35

Convolutional Nets


Page 36

Convolutions


Page 37

Example


Input Filter Output
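The input/filter/output picture can be reproduced with a short NumPy sketch; the 3x3 input and 2x2 filter below are made up for illustration (deep-learning "convolution" is typically implemented as cross-correlation, i.e., the filter is not flipped):

```python
import numpy as np

def conv2d_valid(image, kernel):
    # "Valid" convolution: slide the filter over every position where it
    # fits entirely inside the image; each output value is the weighted
    # sum over a local patch (the local, linear operation from slide 33).
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.array([[1., 2., 3.],
                  [4., 5., 6.],
                  [7., 8., 9.]])
kernel = np.array([[1., 0.],
                   [0., 1.]])
result = conv2d_valid(image, kernel)
# result == [[ 6.,  8.],
#            [12., 14.]]
```

Note how the 3x3 input shrinks to a 2x2 output: a valid convolution with a k x k filter reduces each spatial dimension by k − 1.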

Page 38

Acknowledgements

• Slides made using resources from:
  – Andrew Ng
  – Eric Eaton
  – David Sontag
  – Andrew Moore
  – Yann LeCun

• Thanks!
