Page 1

Machine Learning ICS 178

Instructor: Max Welling

Supervised Learning

Page 2

Supervised Learning I

Example: Imagine you want to classify monkey images versus human images.

Data: 100 monkey images and 200 human images, with labels indicating which is which.

{x_i, y_i = 0}, i = 1, ..., 100    and    {x_j, y_j = 1}, j = 1, ..., 200

where x represents the greyscale values of the image pixels, and y = 0 means “monkey” while y = 1 means “human”.

Task: Here is a new image: monkey or human?

Page 3

1 nearest neighbor (your first ML algorithm!)

Idea: 1. Find the picture in the database which is closest to your query image.

2. Check its label.

3. Declare the class of your query image to be the same as that of the closest picture.

[Figure: the query image and its closest image in the database]
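A minimal sketch of this procedure in Python using a squared-Euclidean distance (numpy assumed; the array names are illustrative, not from the slides):

```python
import numpy as np

def nearest_neighbor_predict(X_train, y_train, x_query):
    """Classify x_query with the label of its closest training image.

    X_train: (N, D) array of training images (flattened pixel vectors)
    y_train: (N,) array of labels (e.g. 0 = monkey, 1 = human)
    x_query: (D,) array, the query image
    """
    # Squared Euclidean distance from the query to every training image.
    dists = np.sum((X_train - x_query) ** 2, axis=1)
    # Index of the closest training image; return its label.
    nearest = np.argmin(dists)
    return y_train[nearest]
```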

Page 4

1NN Decision Surface

[Figure: the 1NN decision curve separating the two classes]

Page 5

Distance Metric

• How do we measure what it means to be “close”?

• Depending on the problem, we should choose an appropriate “distance” metric (or, more generally, a (dis)similarity measure).

Hamming distance:          D(x_n, x_m) = |x_n - x_m|                    (x discrete);
Scaled Euclidean distance: D(x_n, x_m) = (x_n - x_m)^T A (x_n - x_m)    (x continuous).
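A small Python sketch of these two distances (the scaling matrix A below is an illustrative diagonal choice, not one given in the slides):

```python
import numpy as np

def hamming_distance(xn, xm):
    """Number of entries in which two discrete vectors differ
    (for binary vectors this equals the sum of |x_n - x_m|)."""
    return np.sum(xn != xm)

def scaled_euclidean_distance(xn, xm, A):
    """(x_n - x_m)^T A (x_n - x_m) for continuous vectors, with scaling matrix A."""
    d = xn - xm
    return d @ A @ d

# Example: a diagonal A that weights each feature differently.
A = np.diag([1.0, 0.5, 2.0])
print(scaled_euclidean_distance(np.array([1.0, 2.0, 3.0]),
                                np.array([0.0, 2.0, 1.0]), A))
```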

- Demo: http://www.comp.lancs.ac.uk/~kristof/research/notes/nearb/cluster.html
- Matlab Demo.

Page 6

Remarks on NN methods

• We only need to construct a classifier that works locally for each query. Hence, we don’t need to construct a classifier everywhere in space.

• Classification is done at query time. This can be computationally taxing precisely when you might want to be fast.

• Memory inefficient.

• Curse of dimensionality: imagine many features are irrelevant or noisy; then distances are always large.

• Very flexible, not many prior assumptions.

• k-NN variants are more robust against “bad examples” (see the sketch below).
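A sketch of the k-NN variant with a majority vote over the k closest points (the value of k and the function name are illustrative):

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    """Majority vote over the labels of the k closest training points."""
    dists = np.sum((X_train - x_query) ** 2, axis=1)
    nearest_k = np.argsort(dists)[:k]             # indices of the k closest points
    labels, counts = np.unique(y_train[nearest_k], return_counts=True)
    return labels[np.argmax(counts)]              # most frequent label among them
```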

Page 7

Non-parametric Methods

• Non-parametric methods keep all the data cases/examples in memory.

• A better name is: “instance-based” learning

• As the data-set grows, the complexity of the decision surface grows.

• Sometimes, non-parametric methods have some parameters to tune...

• “Assumption light”

Page 8

Unsupervised learning

• In unsupervised learning we are concerned with finding the structure in the data.

• For instance, we may want to learn the probability distribution from which the data was sampled.

• Data now looks like (no labels): {x_i}, i = 1, ..., N, where x_i = (x_i1, ..., x_iF).

• A nonparametric estimate of the probability distribution can be obtained through Parzen-windowing (or kernel estimation).

Page 9

Parzen Windowing

• First define a “kernel” K(x|x_i). This is some function that “smoothes” a data-point and satisfies:

    ∫ K(x|x_i) dx = 1   for every data-point x_i.

• We estimate the distribution as:

    P(x) = Σ_{i=1..N} w_i K(x|x_i),   with Σ_i w_i = 1 (e.g. w_i = 1/N).

• Example: Gaussian kernel (RBF kernel):

    K(x|x_i) = 1 / (√(2π) σ) · exp( -||x - x_i||² / (2σ²) ).

• When you add these kernels up with weights 1/N you get the estimate P(x).
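Below is a minimal numpy sketch of this Parzen estimate for 1-D data with the Gaussian kernel above (the function name and the default bandwidth are illustrative):

```python
import numpy as np

def parzen_density(x, data, sigma=1.0):
    """Parzen-window estimate P(x) with a Gaussian kernel of width sigma.

    x:     point (or array of points) where the density is evaluated
    data:  1-D array of the N observed data points x_i
    sigma: kernel width (the smoothing parameter to be tuned)
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))[:, None]    # shape (M, 1)
    diffs = x - np.asarray(data, dtype=float)[None, :]        # shape (M, N)
    kernels = np.exp(-0.5 * (diffs / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    return kernels.mean(axis=1)                               # average of the N kernels
```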

Page 10

Example: Distribution of Grades

Page 11

Overfitting

• Fundamental issue with Parzen windowing: how do you choose the width σ of the kernel?

• Too broad: you lose the details. Too small: you see too much detail.

• Imagine you left out part of the data-points when you produced your estimate of the distribution. A good fit is one that assigns high probability to the left-out data-points.

• So you can tune σ on the left-out data!

• This is called (cross) validation.

• This issue is fundamental to machine learning: how much fitting is just right.
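A sketch of tuning σ by held-out log-likelihood, reusing the parzen_density sketch above (the hold-out fraction and candidate widths are illustrative; the data is assumed to be a 1-D numpy array):

```python
import numpy as np

def tune_sigma(data, candidate_sigmas, holdout_fraction=0.2, seed=0):
    """Pick the kernel width that assigns the held-out points the highest log-likelihood."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    n_hold = int(holdout_fraction * len(data))
    hold, train = data[idx[:n_hold]], data[idx[n_hold:]]

    best_sigma, best_ll = None, -np.inf
    for sigma in candidate_sigmas:
        # Held-out log-likelihood under the Parzen estimate built from the training part.
        ll = np.sum(np.log(parzen_density(hold, train, sigma)))
        if ll > best_ll:
            best_sigma, best_ll = sigma, ll
    return best_sigma
```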

Page 12

Overfitting in a Picture

[Figure: error on training data and error on hold-out data, plotted against 1/σ]

Page 13

Generalization

• Consider the following regression problem:
• Predict the real value on the y-axis from the real value on the x-axis.
• You are given 6 examples: {Xi, Yi}.
• What is the y-value for a new query point X*?

[Figure: the six data points and the query location X* on the x-axis]

Page 14

Generalization

Page 15

Generalization

Page 16

Generalization

which curve is best?

Page 17

Generalization

• Ockham’s razor: prefer the simplest hypothesis consistent with data.
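A small numpy illustration of this trade-off (the six points, the polynomial degrees, and the query location are invented for the sketch):

```python
import numpy as np

# Six made-up (x, y) examples for the regression problem above.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.1, 0.9, 2.2, 2.8, 4.1, 4.9])

# A simple hypothesis (a line) versus a very flexible one (degree-5 polynomial).
simple = np.polyfit(x, y, deg=1)     # small training error, smooth between the points
flexible = np.polyfit(x, y, deg=5)   # zero training error, but wiggles between the points

x_query = 2.5
print("simple:  ", np.polyval(simple, x_query))
print("flexible:", np.polyval(flexible, x_query))
```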

Page 18

Your Take-Home Message

Learning is concerned with accurate prediction of future data, not accurate prediction of training data.

(The single most important sentence you will see in the course)

Page 19

Homework

• Read chapter 4, pages 161-165, 168-17.

• Download the Netflix data and plot:
  – ratings as a function of time.
  – variance in ratings as a function of the mean of movie ratings.