Transcript of Introduction to Machine Learning (41 slides)
Amo G. Tong, udel.edu/~amotong/teaching/machine learning...

Page 1

Introduction to Machine Learning
Amo G. Tong

Page 2

Lecture 8: Supervised Learning

• Artificial Neural Networks

• Backpropagation Algorithm

• Some materials are courtesy of Vibhave Gogate, Eric Xing and Tom Mitchell.

• All pictures belong to their creators.

Page 3

Supervised Learning

• Data: pairs ⟨𝑥, 𝑦⟩.

• Estimate Pr[𝑦|𝑥] by Bayes' theorem: Pr[𝑦|𝑥] = Pr[𝑥|𝑦] Pr[𝑦] / Pr[𝑥]

• Naïve Bayes.

• Estimate Pr[𝑦|𝑥] directly by assuming a certain form.

• Logistic regression.

• Assume the form of 𝑦 = 𝑓(𝑥).

• Error-driven.

What if Pr[𝑥|𝑦] and Pr[𝑦] are hard to compute, there is no good form for Pr[𝑦|𝑥], and there is no prior knowledge on 𝑓(𝑥)?

You can try Neural Networks.

Page 4

Neural Networks

• Input 𝑥, output 𝑦

• Given 𝑥, how to compute 𝑦?

• 𝑥 and 𝑦 are generally real vectors.

[Diagram: a perceptron and a sigmoid unit.]

Page 5

Neural Networks

• Input 𝑥, output 𝑦

• Given 𝑥, how to compute 𝑦?

• 𝑥 and 𝑦 are generally real vectors.

Page 6

Neural Networks

• Input 𝑥, output 𝑦

• Given 𝑥, how to compute 𝑦?

• 𝑥 and 𝑦 are generally real vectors.

• Many neuron-like units (perceptron, sigmoid).

• Weighted interconnections between units.

• Parallel distributed processing.

Page 7

Neural Networks

• Input 𝑥, output 𝑦

• Given 𝑥, how to compute 𝑦?

• 𝑥 and 𝑦 are generally real vectors.

• Design a neural network:

• What is the function within each unit?

• How are the units connected?

Page 8

Neural Networks

• Input 𝑥, output 𝑦

• Given 𝑥, how to compute 𝑦?

• 𝑥 and 𝑦 are generally real vectors.

• Units:

• Input units

• Processing units

• Receive input from other units

• Output results to other units

• Edges:

• Weights

Page 9

Examples

• Linear function 𝑦 = 𝜔1𝑥1 +⋯+𝜔𝑛𝑥𝑛

• Input (𝑥1, … , 𝑥𝑛), output 𝑦.

[Diagram: inputs 𝑥1, 𝑥2, …, 𝑥𝑛 feed one sum unit through weights 𝜔1, 𝜔2, …, 𝜔𝑛.]
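Such a unit is easy to express directly. A minimal sketch (illustrative; the name `linear_unit` is mine):

```python
def linear_unit(x, w):
    """Weighted-sum unit: y = w1*x1 + ... + wn*xn."""
    return sum(wi * xi for wi, xi in zip(w, x))

# e.g. 0.5*1.0 + (-1.0)*2.0 + 2.0*3.0 = 4.5
y = linear_unit([1.0, 2.0, 3.0], [0.5, -1.0, 2.0])
```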

Page 10

Examples

• Quadratic function 𝑦 = (𝜔11𝑥1 + 𝜔12𝑥2)(𝜔21𝑥1 + 𝜔22𝑥2)

• Input (𝑥1, 𝑥2), output 𝑦.

[Diagram: two sum units compute 𝜔11𝑥1 + 𝜔12𝑥2 and 𝜔21𝑥1 + 𝜔22𝑥2; a multiply unit outputs their product 𝑦.]
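The same network written as code, to show how composing units yields the quadratic function (illustrative sketch; `quadratic_unit` is my name):

```python
def quadratic_unit(x1, x2, w):
    """Two sum units feed one multiply unit:
    y = (w[0][0]*x1 + w[0][1]*x2) * (w[1][0]*x1 + w[1][1]*x2)."""
    s1 = w[0][0] * x1 + w[0][1] * x2   # first sum unit
    s2 = w[1][0] * x1 + w[1][1] * x2   # second sum unit
    return s1 * s2                     # multiply unit

# e.g. (1 + 2) * (3 + 4) = 21
y = quadratic_unit(1.0, 1.0, [[1.0, 2.0], [3.0, 4.0]])
```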

Page 11

Train a Neural Network

• Define the error 𝐸

• Update the weight 𝜔 of each edge to minimize 𝐸

• 𝜔 ← 𝜔 − 𝜂 ∂𝐸/∂𝜔 (delta rule)

• Batch Mode

• E: the total error among training data.

• Consider all the training data each time.

• Incremental Mode

• E: the error for one training instance.

• Consider training instances one by one.

Key step: calculate ∂𝐸/∂𝜔.
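Incremental mode for a single linear unit 𝑦 = 𝜔𝑥 looks like this (an illustrative sketch; batch mode would instead sum the gradients over all of the training data before each update):

```python
def incremental_train(data, w=0.0, eta=0.1, epochs=50):
    """Incremental-mode delta rule for y = w*x with per-instance
    error E = 0.5*(t - o)^2, so dE/dw = -(t - o)*x."""
    for _ in range(epochs):
        for x, t in data:              # one training instance at a time
            o = w * x
            w = w + eta * (t - o) * x  # w <- w - eta * dE/dw
    return w

# fits w ~ 2 for data generated by t = 2x
w = incremental_train([(1.0, 2.0), (2.0, 4.0)])
```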

Page 12

Train a Neural Network

• Define the error 𝐸

• Update the weight 𝜔 of each edge to minimize 𝐸

• 𝜔 ← 𝜔 − 𝜂 ∂𝐸/∂𝜔 (delta rule)

• Incremental Mode

• E: the error for one training instance.

• Consider training instances one by one.

Key step: calculate ∂𝐸/∂𝜔.

Page 13

Neural Networks

• One layer of sigmoid with one output

• Two layers of single functions

• Two layers of multiple sigmoid units with one output.

• Two layers of multiple sigmoid units with multiple outputs.

• Goal: how to find the parameters that can minimize the error.

Page 14

Neural Networks

• One layer of sigmoid with one output

[Diagram: inputs 𝑥1, 𝑥2, …, 𝑥𝑛 feed one sigmoid unit with weights 𝜔1, 𝜔2, …, 𝜔𝑛, producing output 𝑜𝑥.]

𝑜𝑥 = 𝑜(𝑥1, … , 𝑥𝑛) = 1/(1 + 𝑒^(−𝝎⋅𝒙))

∂𝐸𝑥/∂𝜔𝑖 = −(𝑡𝑥 − 𝑜𝑥) ∂𝑜𝑥/∂𝜔𝑖 = −(𝑡𝑥 − 𝑜𝑥) ⋅ 𝑥𝑖 ⋅ 𝑜𝑥(1 − 𝑜𝑥)
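This gradient can be sketched and checked against a finite-difference approximation, with per-instance error 𝐸 = (1/2)(𝑡 − 𝑜)² as in the slides (illustrative snippet; `sigmoid_unit_grad` is my name):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_unit_grad(x, w, t):
    """Gradient of E = 0.5*(t - o)^2 for o = sigmoid(w . x):
    dE/dw_i = -(t - o) * x_i * o * (1 - o)."""
    o = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [-(t - o) * xi * o * (1 - o) for xi in x]
```

A quick numerical check (central differences) confirms the formula.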

Page 15

Neural Networks

• Multilayers

• Update 𝜔1 or 𝜔2 first?

[Diagram: 𝑥 →(𝜔1)→ 𝑓 →(𝜔2)→ 𝑔; the network computes 𝑔(𝜔2 𝑓(𝜔1𝑥)).]

𝐸 = (1/2)(𝑡𝑥 − 𝑔)² for the instance (𝑥, 𝑡𝑥).

Outer weight:

∂𝐸/∂𝜔2 = −(𝑡𝑥 − 𝑔) ∂𝑔/∂𝜔2

∂𝑔/∂𝜔2 = [∂𝑔/∂(𝜔2𝑓)] ⋅ ∂(𝜔2𝑓)/∂𝜔2 = [∂𝑔/∂(𝜔2𝑓)] ⋅ 𝑓

Inner weight:

∂𝐸/∂𝜔1 = −(𝑡𝑥 − 𝑔) ∂𝑔/∂𝜔1

∂𝑔/∂𝜔1 = [∂𝑔/∂(𝜔2𝑓)] ⋅ ∂(𝜔2𝑓)/∂𝜔1
        = [∂𝑔/∂(𝜔2𝑓)] ⋅ 𝜔2 ⋅ ∂𝑓/∂𝜔1
        = [∂𝑔/∂(𝜔2𝑓)] ⋅ 𝜔2 ⋅ [∂𝑓/∂(𝜔1𝑥)] ⋅ ∂(𝜔1𝑥)/∂𝜔1
        = [∂𝑔/∂(𝜔2𝑓)] ⋅ 𝜔2 ⋅ [∂𝑓/∂(𝜔1𝑥)] ⋅ 𝑥
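The chain rule above can be verified numerically. A minimal sketch (illustrative, not from the slides) takes both 𝑓 and 𝑔 to be sigmoids, so ∂𝑔/∂(𝜔2𝑓) = 𝑔(1 − 𝑔) and ∂𝑓/∂(𝜔1𝑥) = 𝑓(1 − 𝑓):

```python
import math

def sig(z):
    return 1.0 / (1.0 + math.exp(-z))

def dE_dw1(x, t, w1, w2):
    """Chain rule for E = 0.5*(t - g)^2, g = sig(w2*f), f = sig(w1*x):
    dE/dw1 = -(t - g) * g*(1-g) * w2 * f*(1-f) * x."""
    f = sig(w1 * x)
    g = sig(w2 * f)
    return -(t - g) * g * (1 - g) * w2 * f * (1 - f) * x
```

Comparing against a central-difference estimate of ∂𝐸/∂𝜔1 confirms each factor in the chain.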

Page 16

Neural Networks

• Two layers of multiple sigmoid units with one output.

[Diagram: inputs 𝑥1, …, 𝑥𝑖 feed a layer of sigmoid units, whose outputs feed one sigmoid output unit producing 𝑜𝑥. Example: 𝑥 = today's weather, target 𝑡𝑥 = 𝑓(𝑥) = tomorrow's temperature (normalized); the network computes 𝑓(𝑥, 𝜔), where 𝜔 is the weight on each edge.]

For each 𝑥 in 𝐷: 𝐸𝑥 = (1/2)(𝑡𝑥 − 𝑜𝑥)²

Page 17

Neural Networks

• Two layers of multiple sigmoid units with one output.

[Diagram: inputs 𝑥1, …, 𝑥𝑖 feed hidden sigmoid units with outputs ℎ𝑥1, …, ℎ𝑥𝑗, which feed one sigmoid output unit 𝑜𝑥 through weights 𝜔𝑗. Here ℎ𝑥𝑗 denotes the output of hidden unit 𝑗 on input 𝑥.]

For each 𝑥 in 𝐷: 𝐸𝑥 = (1/2)(𝑡𝑥 − 𝑜𝑥)²

Output-layer weights:

∂𝐸𝑥/∂𝜔𝑗 = −(𝑡𝑥 − 𝑜𝑥) ∂𝑜𝑥/∂𝜔𝑗 = −(𝑡𝑥 − 𝑜𝑥) ⋅ ℎ𝑥𝑗 ⋅ 𝑜𝑥(1 − 𝑜𝑥)

Page 18

Neural Networks

• Two layers of multiple sigmoid units with one output.

[Diagram: same network; 𝜔𝑖,𝑗 is the weight from input 𝑖 to hidden unit 𝑗, and 𝜔𝑗 the weight from hidden unit 𝑗 to the output.]

For each 𝑥 in 𝐷: 𝐸𝑥 = (1/2)(𝑡𝑥 − 𝑜𝑥)²

𝑜𝑥 = 1/(1 + 𝑒^(−∑𝑗 𝜔𝑗 ℎ𝑥𝑗))

Input-layer weights:

∂𝐸𝑥/∂𝜔𝑖,𝑗 = −(𝑡𝑥 − 𝑜𝑥) ∂𝑜𝑥/∂𝜔𝑖,𝑗 = −(𝑡𝑥 − 𝑜𝑥) ⋅ [∂𝑜𝑥/∂ℎ𝑥𝑗] ⋅ [∂ℎ𝑥𝑗/∂𝜔𝑖,𝑗]

Page 19

Neural Networks

• Two layers of multiple sigmoid units with one output.

[Diagram: same network.]

For each 𝑥 in 𝐷: 𝐸𝑥 = (1/2)(𝑡𝑥 − 𝑜𝑥)²

𝑜𝑥 = 1/(1 + 𝑒^(−∑𝑗 𝜔𝑗 ℎ𝑥𝑗))

∂𝑜𝑥/∂ℎ𝑥𝑗 = 𝜔𝑗 ⋅ 𝑜𝑥(1 − 𝑜𝑥)

∂ℎ𝑥𝑗/∂𝜔𝑖,𝑗 = 𝑥𝑖 ⋅ ℎ𝑥𝑗(1 − ℎ𝑥𝑗)

Page 20

Neural Networks

• Two layers of multiple sigmoid units with one output.

[Diagram: input 𝑥𝑖 →(𝜔𝑖,𝑗)→ hidden sigmoid ℎ𝑥𝑗 →(𝜔𝑗)→ output sigmoid 𝑜𝑥.]

Putting the pieces together:

∂𝐸𝑥/∂𝜔𝑗 = −(𝑡𝑥 − 𝑜𝑥) ⋅ 𝑜𝑥(1 − 𝑜𝑥) ⋅ ℎ𝑥𝑗

∂𝐸𝑥/∂𝜔𝑖,𝑗 = −(𝑡𝑥 − 𝑜𝑥) ⋅ 𝜔𝑗 ⋅ 𝑜𝑥(1 − 𝑜𝑥) ⋅ 𝑥𝑖 ⋅ ℎ𝑥𝑗(1 − ℎ𝑥𝑗)

Define the error terms:

𝛿𝑥𝑜 = (𝑡𝑥 − 𝑜𝑥) 𝑜𝑥(1 − 𝑜𝑥)

𝛿𝑥ℎ,𝑗 = ℎ𝑥𝑗(1 − ℎ𝑥𝑗) 𝜔𝑗 𝛿𝑥𝑜

Then ∂𝐸𝑥/∂𝜔𝑗 = −ℎ𝑥𝑗 𝛿𝑥𝑜 and ∂𝐸𝑥/∂𝜔𝑖,𝑗 = −𝑥𝑖 𝛿𝑥ℎ,𝑗, so the update 𝜔 ← 𝜔 + Δ𝜔 with Δ𝜔 = −𝜂 ∂𝐸/∂𝜔 becomes:

Δ𝜔𝑗 = 𝜂 ℎ𝑥𝑗 𝛿𝑥𝑜

Δ𝜔𝑖,𝑗 = 𝜂 𝑥𝑖 𝛿𝑥ℎ,𝑗
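These error terms and updates translate directly into code. A minimal sketch for one incremental update (illustrative; names like `backprop_step` are mine, and bias terms are omitted to match the slides' equations):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_in, w_out):
    """w_in[j][i]: weight from input i to hidden unit j; w_out[j]: hidden j -> output."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_in]
    o = sigmoid(sum(w * hj for w, hj in zip(w_out, h)))
    return h, o

def backprop_step(x, t, w_in, w_out, eta=0.5):
    """One incremental update: forward pass, error terms, delta-rule updates."""
    h, o = forward(x, w_in, w_out)
    delta_o = (t - o) * o * (1 - o)                                       # output error term
    delta_h = [hj * (1 - hj) * wj * delta_o for hj, wj in zip(h, w_out)]  # hidden error terms
    w_out = [wj + eta * hj * delta_o for wj, hj in zip(w_out, h)]
    w_in = [[w + eta * xi * dj for w, xi in zip(row, x)]
            for row, dj in zip(w_in, delta_h)]
    return w_in, w_out
```

Repeating the step on one instance drives the output toward the target.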

Page 21

Neural Networks

• Two layers of multiple sigmoid units with one output.

[Diagram: same network.]

𝜔 ← 𝜔 + Δ𝜔

Δ𝜔𝑗 = 𝜂 ℎ𝑥𝑗 𝛿𝑥𝑜

Δ𝜔𝑖,𝑗 = 𝜂 𝑥𝑖 𝛿𝑥ℎ,𝑗

ℎ𝑥𝑗 and 𝑥𝑖: the input carried by the edge. 𝛿𝑥𝑜 and 𝛿𝑥ℎ,𝑗: the error terms associated with the units, 𝛿 = −∂𝐸/∂(unit).

General rule, for an edge 𝑎 →(𝜔)→ 𝑏 with input from 𝑎 and error term 𝛿 = −∂𝐸/∂𝑏 on 𝑏:

Δ𝜔 = 𝜂 ⋅ 𝑎 ⋅ 𝛿

Page 22

Neural Networks

• Two layers of multiple sigmoid units with one output.

[Diagram: same network.]

𝜔 ← 𝜔 + Δ𝜔

Δ𝜔𝑗 = 𝜂 ℎ𝑥𝑗 𝛿𝑥𝑜

Δ𝜔𝑖,𝑗 = 𝜂 𝑥𝑖 𝛿𝑥ℎ,𝑗

𝛿𝑥𝑜 = (𝑡𝑥 − 𝑜𝑥) 𝑜𝑥(1 − 𝑜𝑥)

𝛿𝑥ℎ,𝑗 = ℎ𝑥𝑗(1 − ℎ𝑥𝑗) 𝜔𝑗 𝛿𝑥𝑜

Backpropagation Framework: for each 𝑥 in 𝐷:

• Forward: calculate ℎ𝑥𝑗 and 𝑜𝑥.

• Backward: calculate 𝛿𝑥𝑜 and 𝛿𝑥ℎ,𝑗.

• Update 𝜔𝑗 and 𝜔𝑖,𝑗.

Page 23

Neural Networks

• Two layers of multiple sigmoid units with multiple outputs.

[Diagram: inputs 𝑥1, …, 𝑥𝑖 feed hidden sigmoid units ℎ𝑥1, …, ℎ𝑥𝑗, which feed output sigmoid units 𝑜𝑥1, …, 𝑜𝑥𝑘. Example: 𝑥 = today's weather, target 𝑡𝑥 = 𝑓(𝑥) = tomorrow's weather (normalized); the network computes 𝑓(𝑥, 𝜔), where 𝜔 is the weight on each edge.]

Page 24

Neural Networks

• Two layers of multiple sigmoid units with multiple outputs.

[Diagram: same multi-output network, with 𝜔𝑖,𝑗 from input 𝑖 to hidden unit 𝑗 and 𝜔𝑗,𝑘 from hidden unit 𝑗 to output unit 𝑘.]

For each 𝑥 in 𝐷: 𝐸𝑥 = (1/2) ∑𝑘 (𝑡𝑥𝑘 − 𝑜𝑥𝑘)²

Recall the general rule: for an edge 𝑎 →(𝜔)→ 𝑏, Δ𝜔 = 𝜂 ⋅ (input from 𝑎) ⋅ (error term on 𝑏).

Page 25

Neural Networks

• Two layers of multiple sigmoid units with multiple outputs.

[Diagram: same multi-output network.]

For each 𝑥 in 𝐷: 𝐸𝑥 = (1/2) ∑𝑘 (𝑡𝑥𝑘 − 𝑜𝑥𝑘)²

One output: 𝛿𝑥𝑜 = (𝑡𝑥 − 𝑜𝑥) 𝑜𝑥(1 − 𝑜𝑥)

Multiple outputs: 𝛿𝑥𝑜,𝑘 = (𝑡𝑥𝑘 − 𝑜𝑥𝑘) 𝑜𝑥𝑘(1 − 𝑜𝑥𝑘)

Δ𝜔𝑗,𝑘 = 𝜂 ℎ𝑥𝑗 𝛿𝑥𝑜,𝑘

Page 26

Neural Networks

• Two layers of multiple sigmoid units with multiple outputs.

[Diagram: same multi-output network.]

One output: 𝛿𝑥ℎ,𝑗 = ℎ𝑥𝑗(1 − ℎ𝑥𝑗) 𝜔𝑗 𝛿𝑥𝑜

Multiple outputs: 𝛿𝑥ℎ,𝑗 = ℎ𝑥𝑗(1 − ℎ𝑥𝑗) ∑𝑘 𝜔𝑗,𝑘 𝛿𝑥𝑜,𝑘

If not fully connected, sum over the connected units.

Page 27

Neural Networks

• Two layers of multiple sigmoid units with multiple outputs.

[Diagram: same multi-output network.]

For each 𝑥 in 𝐷: 𝐸𝑥 = (1/2) ∑𝑘 (𝑡𝑥𝑘 − 𝑜𝑥𝑘)²

One output: 𝛿𝑥ℎ,𝑗 = ℎ𝑥𝑗(1 − ℎ𝑥𝑗) 𝜔𝑗 𝛿𝑥𝑜

Multiple outputs: 𝛿𝑥ℎ,𝑗 = ℎ𝑥𝑗(1 − ℎ𝑥𝑗) ∑𝑘 𝜔𝑗,𝑘 𝛿𝑥𝑜,𝑘

Δ𝜔𝑖,𝑗 = 𝜂 𝑥𝑖 𝛿𝑥ℎ,𝑗

Page 28

Neural Networks

• Two layers of multiple sigmoid units with multiple outputs.

𝜔 ← 𝜔 + Δ𝜔

Δ𝜔𝑗,𝑘 = 𝜂 ℎ𝑥𝑗 𝛿𝑥𝑜,𝑘

Δ𝜔𝑖,𝑗 = 𝜂 𝑥𝑖 𝛿𝑥ℎ,𝑗

𝛿𝑥𝑜,𝑘 = (𝑡𝑥𝑘 − 𝑜𝑥𝑘) 𝑜𝑥𝑘(1 − 𝑜𝑥𝑘)

𝛿𝑥ℎ,𝑗 = ℎ𝑥𝑗(1 − ℎ𝑥𝑗) ∑𝑘 𝜔𝑗,𝑘 𝛿𝑥𝑜,𝑘

Backpropagation Framework: for each 𝑥 in 𝐷:

• Forward: calculate ℎ𝑥𝑗 and 𝑜𝑥𝑘.

• Backward: calculate 𝛿𝑥𝑜,𝑘 and 𝛿𝑥ℎ,𝑗.

• Update 𝜔𝑗,𝑘 and 𝜔𝑖,𝑗.

[Diagram: same multi-output network.]

Page 29

Neural Networks

• Two layers of multiple sigmoid units with multiple outputs.

• Initialize the weights with random values.

• Do until convergence:

• For each 𝑥 ∈ 𝐷:

• (Forward) Calculate ℎ𝑥𝑗 and 𝑜𝑥𝑘.

• (Backward) Calculate 𝛿𝑥𝑜,𝑘 and 𝛿𝑥ℎ,𝑗.

• Update: 𝜔 ← 𝜔 + Δ𝜔, with Δ𝜔𝑗,𝑘 = 𝜂 ℎ𝑥𝑗 𝛿𝑥𝑜,𝑘 and Δ𝜔𝑖,𝑗 = 𝜂 𝑥𝑖 𝛿𝑥ℎ,𝑗

where 𝛿𝑥𝑜,𝑘 = (𝑡𝑥𝑘 − 𝑜𝑥𝑘) 𝑜𝑥𝑘(1 − 𝑜𝑥𝑘) and 𝛿𝑥ℎ,𝑗 = ℎ𝑥𝑗(1 − ℎ𝑥𝑗) ∑𝑘 𝜔𝑗,𝑘 𝛿𝑥𝑜,𝑘.
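The whole procedure fits in a short pure-Python sketch (illustrative; the function names, fixed epoch count in place of a convergence test, toy data, and omission of bias terms are my choices, following the slides' equations):

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, w_in, w_out):
    """Forward pass: hidden sigmoid layer, then sigmoid output layer."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_in]
    return [sigmoid(sum(w * hj for w, hj in zip(row, h))) for row in w_out]

def train(D, n_in, n_hidden, n_out, eta=0.5, epochs=1000, seed=0):
    """Backpropagation for a two-layer sigmoid network, incremental mode.
    D is a list of (x, t) pairs with x and t given as lists."""
    rng = random.Random(seed)
    # initialize the weights with random values
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    for _ in range(epochs):
        for x, t in D:
            # forward: h_xj and o_xk
            h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_in]
            o = [sigmoid(sum(w * hj for w, hj in zip(row, h))) for row in w_out]
            # backward: error terms
            d_o = [(tk - ok) * ok * (1 - ok) for tk, ok in zip(t, o)]
            d_h = [hj * (1 - hj) * sum(w_out[k][j] * d_o[k] for k in range(n_out))
                   for j, hj in enumerate(h)]
            # update the weights
            for k in range(n_out):
                for j in range(n_hidden):
                    w_out[k][j] += eta * h[j] * d_o[k]
            for j in range(n_hidden):
                for i in range(n_in):
                    w_in[j][i] += eta * x[i] * d_h[j]
    return w_in, w_out
```

For example, training on two one-hot inputs with swapped targets drives the two outputs apart.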

Page 30

Neural Networks

• Train a neural network with:

• One sigmoid unit.

• Two layers of multiple sigmoid units with one output.

• Two layers of multiple sigmoid units with multiple outputs.

• Backpropagation framework: any units, any acyclic graph.

• Update the weights from right (output) to left (input).

Page 31

Neural Networks

• Backpropagation framework

• Good news: using the network is fast.

• Bad news: training the network is slow.

• More bad news: it may converge to local minima.

• More good news: it performs well in practice.

• To avoid local minima

• Add a momentum term:

• Δ𝜔𝑛* = Δ𝜔𝑛 + 𝛼 Δ𝜔𝑛−1, where Δ𝜔𝑛−1 is the update of the previous iteration.

• Train with different initializations.
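The momentum update can be sketched as follows (illustrative; `momentum_step` and the concrete 𝜂, 𝛼 values are mine):

```python
def momentum_step(w, grad, prev_dw, eta=0.1, alpha=0.9):
    """dw*_n = -eta * dE/dw + alpha * dw_{n-1}; then w <- w + dw*_n.
    Returns the new weights and the update, to be reused next step."""
    dw = [-eta * g + alpha * p for g, p in zip(grad, prev_dw)]
    return [wi + d for wi, d in zip(w, dw)], dw
```

The 𝛼-weighted previous update keeps the search moving through flat regions and small local dips.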

Page 32

Neural Networks

• Representational Power of Neural Networks.

• Boolean functions: every Boolean function can be represented exactly by some network with two layers of units.

• Continuous functions: every bounded continuous function can be approximated with arbitrarily small error by a network with two layers of units (sigmoid + linear will do).

• Arbitrary functions: any function can be approximated to arbitrary accuracy by a network with three layers of units (sigmoid + sigmoid + linear will do).

Page 33

• Learn an identity function with eight training examples.


An example (Mitchell)

Page 34

• Learn an identity function with eight training examples.


An example (Mitchell)

Page 35

• Learn an identity function with eight training examples.


An example (Mitchell)

[Table: the eight 3-bit hidden-unit values, one row per training example.]

1 0 0
0 0 1
0 1 0
1 1 1
0 0 0
0 1 1
1 0 1
1 1 0
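A small sanity check of why three hidden units suffice for eight examples (illustrative snippet): there are exactly 2³ = 8 distinct binary hidden codes, one per one-hot input, matching the eight 3-bit rows on this slide.

```python
from itertools import product

# the eight one-hot training examples of the identity task
examples = [[1 if j == i else 0 for j in range(8)] for i in range(8)]

# three hidden units offer exactly 2**3 = 8 distinct rounded activation
# patterns, so each example can receive its own hidden code
codes = sorted(product([0, 1], repeat=3))
```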

Page 36

• Overfitting is everywhere…

• Overfitting may happen when:

• The learned model is too complicated.

• The data is fitted too well when it has noise.

• The training set is small.

• Avoid large parameters.


Overfitting

Page 37

• Overfitting is everywhere…


Overfitting

Page 38

• Overfitting is everywhere…


Overfitting

Page 39

• Overfitting is everywhere…

• Weight Decay:

• Decrease each weight by some small factor during each iteration?

• Penalize large weights:

• 𝐸𝑟𝑟𝑜𝑟′ = 𝐸𝑟𝑟𝑜𝑟 + 𝛾 ∑𝑖 𝜔𝑖²

• Use validation data.
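A gradient step on the penalized error 𝐸 + 𝛾∑𝜔𝑖² adds 2𝛾𝜔𝑖 to each partial derivative, which acts exactly like shrinking each weight by a small factor per iteration, connecting the two bullets above. A minimal sketch (illustrative names):

```python
def weight_decay_step(w, grad, eta=0.1, gamma=0.01):
    """Gradient step on E + gamma * sum(w_i^2): the extra 2*gamma*w_i
    term shrinks each weight by a factor (1 - 2*eta*gamma) per step."""
    return [wi - eta * (gi + 2 * gamma * wi) for wi, gi in zip(w, grad)]
```

With a zero gradient, each step multiplies the weights by (1 − 2𝜂𝛾), so unused weights decay toward zero.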


Overfitting

Page 40

• Overfitting is everywhere…

• Weight Decay:

• Decrease each weight by some small factor during each iteration?

• Penalize large weights:

• 𝐸𝑟𝑟𝑜𝑟′ = 𝐸𝑟𝑟𝑜𝑟 + 𝛾 ∑𝑖 𝜔𝑖²

• Use validation data.


Overfitting

Page 41

• What is an artificial neural network?

• Process the input by the connected units.

• How to train a neural network?

• Error-driven.

• Backpropagation algorithm.

• Two layers of sigmoid units.


Summary