Machine Learning - uwyo.educlan/teach/ai19/ml_a.pdf · What is machine learning? 0.4*δ{lottery} -...

Machine Learning Chao Lan


Background


Can we build a machine that automatically filters spam?


Which words imply spam?


Does this word imply spam?


Does this combination of words imply spam?


Manually designing patterns for spam is hard.


Can we let the machine learn patterns of spam?


What is machine learning?

Computers learn from examples to improve their generalizable (classification) performance, without being explicitly programmed.


0.4*δ{lottery} - 0.7*δ{lottery} + 0.18*δ{account} - 0.32*δ{birth} > 0.5

A hypothetical pattern of spam learned by the machine.
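
The pattern above can be read as a weighted vote over word indicators. The sketch below assumes that δ{w} is 1 when word w occurs in the email and 0 otherwise; the slide does not define δ, so this reading is an assumption. The names `PATTERN`, `score`, and `is_spam` are made up for illustration.

```python
# Terms copied from the slide as (weight, word) pairs.
# "lottery" appears twice in the transcript; both terms are kept as-is.
PATTERN = [(0.4, "lottery"), (-0.7, "lottery"), (0.18, "account"), (-0.32, "birth")]
THRESHOLD = 0.5


def score(email_text, pattern=PATTERN):
    """Sum the weights of the pattern words that occur in the email."""
    # Simple whitespace tokenization; punctuation handling is left out.
    words = set(email_text.lower().split())
    # Assumption: delta{w} is 1 if w occurs in the email, else 0.
    return sum(weight for weight, word in pattern if word in words)


def is_spam(email_text, pattern=PATTERN, threshold=THRESHOLD):
    """Flag the email as spam when its score exceeds the threshold."""
    return score(email_text, pattern) > threshold
```

With the weights exactly as transcribed, the largest reachable score is 0.18 (an email containing only "account"), so no email crosses the 0.5 threshold. The example is meant to show the mechanics of such a pattern, not a working filter.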


Other Examples


Concepts


Revisit: What is machine learning?

Computers learn from examples to improve their generalizable (classification) performance, without being explicitly programmed.


Instance, Label

Example 1: instance x, label y = spam

Example 2: instance x, label y = ham


Model

instance x → model f → predicted label f(x) = ham


Prediction Error (or, Generalization Error)

err(f) = 0.3
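
One common way to read err(f) is as the fraction of instances that f labels incorrectly. The slide only shows the value, so this definition is an assumption. A minimal sketch with made-up labels, chosen so that 3 of 10 predictions are wrong:

```python
def error_rate(predicted, actual):
    """Fraction of instances whose predicted label differs from the true label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    mistakes = sum(1 for p, y in zip(predicted, actual) if p != y)
    return mistakes / len(actual)


# Hypothetical labels for ten emails (not taken from the slides).
actual = ["spam", "ham", "ham", "spam", "ham", "ham", "spam", "ham", "ham", "ham"]
predicted = ["spam", "ham", "spam", "spam", "ham", "ham", "ham", "ham", "spam", "ham"]

err = error_rate(predicted, actual)  # 3 mistakes out of 10 instances
```

Computed on a finite set of emails, this number is only an estimate of the generalization error, which refers to performance on the whole population.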


Training, Training Set

(training) instances → train a model → model


Supervised Learning versus Unsupervised Learning Tasks

(training) instances → train a model → model

Supervised learning: we know the instances and their labels (spam, ham) in the training set.


Unsupervised learning: we know the instances, but not their labels (shown as ?), in the training set.


Testing, Testing Set

(testing) instance → predict → predicted label: ham


Classification versus Regression

(testing) instance → predict → predicted label: ham

Classification: the label is discrete.


(testing) instance → predict → predicted label: minutes for the survey

Regression: the label is continuous.


[E1] Build a model to classify article topic (sports, politics, etc.)

1. What is an instance, and what is the label?

2. What are the model input and output?

3. If we have a set of documents with known topics on sports, politics and academic, is it a supervised or unsupervised learning task?

4. Is it a classification or regression task?


[E2] Build a model to predict student GPA.

1. What is an instance, and what is the label?

2. What are the model input and output?

3. If we have a set of students whose GPAs will be known by the end of this semester, is it a supervised or unsupervised learning task?

4. Is it a classification or regression task?


An instance is often represented as a feature vector x.


x =

feature      value
steal        0
lie,cheat    1
behavior     2
peer rej     1
low ac       2
...          ...


Q: How to represent a text document?


Example

x = [google, lottery, cat, email, transport, panda, million, ...] = [1, 1, 0, 1, 0, 0, 1, ...]
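
The example above marks, for each vocabulary word, whether it occurs in the document. A minimal sketch of that encoding follows. The seven vocabulary words come from the slide; the function name `to_feature_vector` and the sample document are made up for illustration.

```python
VOCABULARY = ["google", "lottery", "cat", "email", "transport", "panda", "million"]


def to_feature_vector(document, vocabulary=VOCABULARY):
    """Return a 0/1 vector: entry i is 1 if vocabulary[i] occurs in the document."""
    # Simple whitespace tokenization; punctuation handling is left out.
    words = set(document.lower().split())
    return [1 if term in words else 0 for term in vocabulary]


# Hypothetical document, written so that it matches the vector on the slide.
doc = "Google lottery email you won one million"
x = to_feature_vector(doc)
```

Word order and word counts are discarded in this representation; only presence or absence of each vocabulary word is kept.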


Q: How to represent an image?


Example

x = [vector shown in the slide figure; not recovered]


Q: How to represent a user in a graph?

[Figure: a graph with nodes A, B, C, D, E, F, G]


Example

x = [A?, B?, C?, D?, E?, F?, G?] = [0, 0, 1, 0, 1, 1, 1]

[Figure: the same graph with nodes A, B, C, D, E, F, G]

Q: Better ways to build the vector? (feature engineering)
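
One plausible reading of the vector above is that the entry "A?" records whether the user is connected to A, and so on for the other nodes. The slide does not state this, so it is an assumption. The sketch below builds such a vector from an edge list. The edges are hypothetical, because the graph figure did not survive extraction; they were chosen so that user D reproduces the vector [0, 0, 1, 0, 1, 1, 1].

```python
NODES = ["A", "B", "C", "D", "E", "F", "G"]

# Hypothetical undirected edges (the slide's actual graph is not recoverable).
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("D", "F"), ("D", "G")]


def neighbor_vector(user, nodes=NODES, edges=EDGES):
    """Entry i is 1 if `user` shares an edge with nodes[i], else 0."""
    neighbors = set()
    for u, v in edges:
        if u == user:
            neighbors.add(v)
        elif v == user:
            neighbors.add(u)
    return [1 if node in neighbors else 0 for node in nodes]


x = neighbor_vector("D")
```

This is the row of the adjacency matrix that belongs to the user. Other features, such as the number of neighbors, could be derived from the same edge list.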


A model is a function governed by unknown parameters.

Example: model f is a linear function of features xi with unknown parameters θi’s.

f(x) = θ1x1 + θ2x2 + … + θpxp

- training f means estimating θ’s from training instances

- once θ’s are fixed, model f is fixed and can be applied
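
The two steps above (estimate the θ's, then apply the fixed model) can be sketched as follows. Gradient descent on the mean squared error is used here as one possible estimator; the slides do not say which method is intended. The training set, the step count, and the learning rate are made-up values.

```python
def predict(theta, x):
    """Linear model f(x) = theta_1*x_1 + ... + theta_p*x_p."""
    return sum(t * xi for t, xi in zip(theta, x))


def train(instances, labels, steps=2000, learning_rate=0.05):
    """Estimate theta by gradient descent on the mean squared error."""
    n = len(instances)
    p = len(instances[0])
    theta = [0.0] * p
    for _ in range(steps):
        gradient = [0.0] * p
        for x, y in zip(instances, labels):
            residual = predict(theta, x) - y
            for j in range(p):
                gradient[j] += 2.0 * residual * x[j] / n
        theta = [t - learning_rate * g for t, g in zip(theta, gradient)]
    return theta


# Hypothetical training set generated from y = 2*x1 - 1*x2 (no noise).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.0, -1.0, 1.0, 3.0]

theta = train(X, y)                      # training: estimate the parameters
prediction = predict(theta, [3.0, 2.0])  # applying the fixed model to a new instance
```

On this noise-free data the estimate approaches θ = (2, -1), so the prediction for the new instance is close to 4.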


A model’s complexity is governed by hyper-parameters.

Example: use a hyper-parameter λ to control the domain of θ’s.

f(x) = θ1x1 + θ2x2 + … + θpxp

- if λ = 10, then θ ∈ [-1,1]: larger domain, f is complex

- if λ = 1, then θ ∈ {0, 1}: smaller domain, f is simple


Q: Which model has higher complexity?

1. f(x) = θ1x1 + θ2x2 + … + θpxp, θ ∈ [0,1]

2. f(x) = θ1x1 + θ2x2 + … + θpxp, θ ∈ {0,1}


A model with a larger parameter domain is often more complex.

Q: What is the hyper-parameter?


Q: Which model has higher complexity?

1. f(x) = θ1x1 + θ2x2 + … + θ10x10, θ ∈ [0,1]

2. f(x) = θ1x1 + θ2x2 + … + θpxp , θ ∈ [0,1]


A model with more parameters is often more complex.

Q: What is the hyper-parameter?


1. f(x) = θ1x1 + θ2x2 + … + θ10x10, θ ∈ [0,1]

2. f(x) = θ1x1 + θ2x2 + … + θpxp , θ ∈ [0,1]

Q: Which model has higher complexity?


A model capturing more complicated relations is often more complex.

Q: What is the hyper-parameter?


Connection: Model Complexity and Achievable Accuracy

A more complex model is more likely to recover the true relation between x and y.

Example: the true relation is y = 0.3*x1 - 0.7*x2

- if λ = 10, then θ ∈ [-1,1]: f is complex and can recover the above relation

- if λ = 1, then θ ∈ {0, 1}: f is simple and cannot recover the above relation

- better recovery of the true relation implies higher model accuracy
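
The example above can be checked numerically. The sketch below labels a few made-up instances with the true relation y = 0.3*x1 - 0.7*x2, then compares the best model whose parameters are restricted to {0, 1} against a model that is allowed the true parameters, which lie inside [-1, 1]. The instances and the use of mean squared error are assumptions for illustration.

```python
from itertools import product


def f(theta, x):
    """Two-feature linear model f(x) = theta_1*x_1 + theta_2*x_2."""
    return theta[0] * x[0] + theta[1] * x[1]


def mean_squared_error(theta, data):
    return sum((f(theta, x) - y) ** 2 for x, y in data) / len(data)


# Hypothetical instances, labelled with the true relation from the slide.
inputs = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
data = [(x, 0.3 * x[0] - 0.7 * x[1]) for x in inputs]

# Simple model (lambda = 1): each theta is restricted to {0, 1},
# so there are only four candidate parameter settings to try.
simple_domain = list(product([0, 1], repeat=2))
best_simple = min(simple_domain, key=lambda theta: mean_squared_error(theta, data))
simple_error = mean_squared_error(best_simple, data)

# Complex model (lambda = 10): theta may take any value in [-1, 1],
# so the true parameters (0.3, -0.7) are inside the domain.
complex_error = mean_squared_error((0.3, -0.7), data)
```

On this data the best setting in {0, 1} is θ = (0, 0), which still leaves a nonzero error, while the larger domain contains parameters that reproduce the labels.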


Q: True or False?

Always build a complex model, since it is more likely to recover the true relation.

- f1(x) = θ1x1 + θ2x2 + … + θ10x10, θ ∈ [0,1]

- f2(x) = θ1x1 + θ2x2 + … + θpxp, θ ∈ [0,1]


Q: Which model estimation has less variance?

- f1(x) = θ1x1 + θ2x2 + … + θ10x10, θ ∈ [0,1]

- f2(x) = θ1x1 + θ2x2 + … + θpxp, θ ∈ [0,1]

Student ID | x1: #hour/day | x2: #hw/week | ... | x10: major | GPA
1          | 3.5           | 0.8          | ... | cs         | 3.7
2          | 2             | 0.4          | ... | cs         | 3.4


A complex model is more demanding on training data volume.


Another way to look at estimation variance.

[Figure: population → sample a training set (Stu ID, x1, x2, ...) → training → model f → apply on new (testing) data]


When the training set is small, many models may work well on the training data, but not every one of them works well on the population.

It is likely that we learn a model that works well on training data, but not so well on new data in the population, especially if the training set is biased.


Overfitting

If testing error >> training error, we say f overfits.
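
The overfitting criterion above can be illustrated with a deliberately over-flexible model: a lookup table that memorizes the training set. This is a minimal sketch, not a model from the slides. The data, the function names, and the numeric gap used to stand in for ">>" are all assumptions.

```python
def error_rate(model, data):
    """Fraction of (instance, label) pairs that the model gets wrong."""
    return sum(1 for x, y in data if model(x) != y) / len(data)


def train_memorizer(training_data, default_label="ham"):
    """Build a lookup table of the training set; unseen instances get a default."""
    table = {x: y for x, y in training_data}
    return lambda x: table.get(x, default_label)


# Hypothetical data: each instance is an email subject line.
training_data = [("win lottery now", "spam"), ("meeting at noon", "ham"),
                 ("claim your prize", "spam"), ("lunch tomorrow", "ham")]
testing_data = [("win a prize today", "spam"), ("project update", "ham"),
                ("lottery winner", "spam"), ("free lottery ticket", "spam")]

f = train_memorizer(training_data)
training_error = error_rate(f, training_data)
testing_error = error_rate(f, testing_data)

# The slide gives no numeric threshold for ">>"; this gap is an arbitrary choice.
GAP = 0.3
overfits = testing_error - training_error > GAP
```

The memorizer makes no mistakes on the training set but labels every unseen subject line as "ham", so its testing error is much larger than its training error.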


Q: Which model (indexed by λ) overfits?

[Figure: models for λ = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


Connection: a more complex model is more likely to overfit.


Q: True or False? Since a more complex model is more likely to overfit, always build a simple model.


Q: How to choose model complexity (λ) in practice?


Model Selection by K-Fold Cross Validation

choose a candidate hyper-parameter λ1
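
The slide's diagram did not survive extraction, so the sketch below follows the standard k-fold procedure: split the data into k folds, train on k-1 of them with the candidate λ, measure the error on the held-out fold, and average over the k rounds. The toy model (a single parameter θ clipped to [-λ, λ]), the data, and all function names are assumptions made for illustration.

```python
def k_fold_splits(n, k):
    """Yield (train_indices, validation_indices) for k roughly equal folds."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, validation


def cross_validation_error(data, lam, k, train_fn, error_fn):
    """Average validation error of hyper-parameter `lam` over k folds."""
    errors = []
    for train_idx, val_idx in k_fold_splits(len(data), k):
        model = train_fn([data[j] for j in train_idx], lam)
        errors.append(error_fn(model, [data[j] for j in val_idx]))
    return sum(errors) / k


# Toy one-feature model (an assumption): f(x) = theta * x, where theta is
# the least-squares estimate clipped to the domain [-lam, lam].
def train_fn(train_set, lam):
    theta = sum(x * y for x, y in train_set) / sum(x * x for x, y in train_set)
    return max(-lam, min(lam, theta))


def error_fn(theta, val_set):
    return sum((theta * x - y) ** 2 for x, y in val_set) / len(val_set)


# Hypothetical data following y = 2 * x exactly.
data = [(float(x), 2.0 * x) for x in range(1, 7)]
candidates = [0.5, 1.0, 2.0, 4.0]
cv_errors = {lam: cross_validation_error(data, lam, 3, train_fn, error_fn)
             for lam in candidates}
best_lam = min(candidates, key=lambda lam: cv_errors[lam])
```

Candidates whose domain is too small to contain θ = 2 get a positive validation error. Ties are broken in favor of the first candidate in the list, which here is the smaller domain.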


Q: How to choose candidate hyper-parameters?


Strategies for choosing multiple candidate hyper-parameters.
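
The strategies themselves are not recoverable from the transcript. Two commonly used ones are an evenly spaced grid and a logarithmic grid; the sketch below shows both as an assumption about what such strategies can look like, with made-up function names and ranges.

```python
def linear_grid(low, high, count):
    """Return `count` evenly spaced candidates from low to high (inclusive)."""
    step = (high - low) / (count - 1)
    return [low + i * step for i in range(count)]


def log_grid(low_exponent, high_exponent, base=10.0):
    """Return base**e for each integer exponent e in the given range."""
    return [base ** e for e in range(low_exponent, high_exponent + 1)]


linear_candidates = linear_grid(1, 10, 10)  # evenly spaced values from 1 to 10
log_candidates = log_grid(-3, 3)            # powers of ten from 10**-3 to 10**3
```

A logarithmic grid is often preferred when the useful range of λ spans several orders of magnitude, since an evenly spaced grid would place almost all candidates at the large end.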


Wrap Up: Introduction

Concepts: instance, label, model, training, testing

Data: feature vector representation (profile, text, image, graph, etc)

Model: parameter, hyper-parameter, model complexity, overfitting

Model Selection: k-fold cross validation