Review from last lecture - andrew.cmu.edu… · The algorithm we showed you last time needs to take the derivative of g. Here, g does not have a derivative. For now, we will just…

Page 1

Page 2

Review from last lecture

Page 3

Page 4

Page 5

Page 6

Page 7

Page 8

Back to the standard machine learning problem: two dimensions (two attributes, x1 and x2) and two classes (red and blue). A classifier would label each point as red or blue.

The points are clearly linearly separable: the blue points can be separated from the red ones with a single line.
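A classifier of this kind can be sketched in a couple of lines. The weights below are made up for illustration; they just define some line in the (x1, x2) plane:

```python
# A linear classifier for 2-D points: predict "blue" when
# w0 + w1*x1 + w2*x2 > 0, otherwise "red". The weights here are
# made-up values for illustration, not taken from the slides.
def classify(x1, x2, w0=-1.0, w1=1.0, w2=1.0):
    return "blue" if w0 + w1 * x1 + w2 * x2 > 0 else "red"

print(classify(2.0, 2.0))  # prints: blue  (above the line x1 + x2 = 1)
print(classify(0.0, 0.0))  # prints: red   (below it)
```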

Page 9

Page 10

Page 11

Page 12

This is how the perceptron with the initial weights will classify points: points above the line are classified as 1.

Not very surprising that this isn't very good, since we set the weights randomly.

Page 13

The algorithm we showed you last time needs to take the derivative of g. Here, g does not have a derivative. For now, we will just ignore that.
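The g in question is the perceptron's hard-threshold activation; a minimal sketch of why it has no useful derivative:

```python
# The perceptron activation g is a hard threshold: 1 when the weighted
# sum is positive, else 0. The function is flat everywhere except at 0,
# where it jumps -- its derivative is 0 almost everywhere and undefined
# at the jump, so the gradient-based update from last lecture cannot use it.
def g(z):
    return 1 if z > 0 else 0

print(g(0.3), g(-0.3))  # prints: 1 0
```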

Page 14

So here’s what’s happening in a perceptron with those weights: pick a misclassified point and update the weights.
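The update being applied is the standard perceptron learning rule, w_i ← w_i + alpha·(y − ŷ)·x_i. A sketch with made-up weights, data, and learning rate:

```python
# Standard perceptron rule: for a training point x with true label y
# (0 or 1) and prediction y_hat, each weight moves by
#   w_i += alpha * (y - y_hat) * x_i
# so nothing changes when the point is already classified correctly.
# Weights, data, and alpha below are made up for illustration.
def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

def perceptron_step(w, x, y, alpha=0.1):
    y_hat = predict(w, x)
    return [wi + alpha * (y - y_hat) * xi for wi, xi in zip(w, x)]

w = [0.0, -1.0, 0.0]   # w0 (bias), w1, w2
x = [1.0, 2.0, 1.0]    # leading 1 is the constant input for the bias
w = perceptron_step(w, x, y=1)   # this point is predicted 0 but truly 1
print([round(v, 3) for v in w])  # prints: [0.1, -0.8, 0.1]
```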

Page 15

Pick another misclassified point and update the weights again.

Page 16

And again...

Page 17

Note that each time we pick a blue point, essentially nothing happens: a correctly classified point gives an update of zero.

Page 18

Note: w0 can only ever change by ±alpha per update, since its input is the constant 1. If alpha is too small, learning will move too slowly. If alpha is too big, you might not have the resolution to find the answer.
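Because the bias input is the constant 1, each update moves w0 by exactly +alpha, −alpha, or 0, which falls straight out of the update rule (values below are made up):

```python
# w0 multiplies a constant input of 1, so one perceptron update changes
# it by alpha * (y - y_hat) * 1: exactly +alpha, -alpha, or nothing.
alpha = 0.5
w0 = 0.0
y, y_hat = 1, 0                 # a misclassified point: predicted 0, truly 1
w0 += alpha * (y - y_hat) * 1
print(w0)  # prints: 0.5
```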

Page 19

Page 20

All these lines separate red from blue. But is one better than the other?

Page 21

Page 22

Page 23

Page 24

Pick the direction with the largest margin, and then put the separator in the middle of the margin. There is a pretty efficient algorithm to find this separator (but the algorithm is a bit mathy to present here).

When the data is not linearly separable, there’s a variant called the “soft margin” you can look at.

The points on the dotted lines are considered the “support vectors”.
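The margin of a candidate separator can be computed directly: the distance from a point (x1, x2) to the line w1·x1 + w2·x2 + w0 = 0 is |w1·x1 + w2·x2 + w0| / ||w||, and the margin is the smallest such distance over the data. A sketch of just that geometry, with a made-up dataset and separator (this is not the efficient algorithm itself):

```python
import math

# Margin of the line w1*x1 + w2*x2 + w0 = 0 over a set of points:
# the smallest point-to-line distance. The points achieving that
# minimum are the ones that would sit on the dotted lines -- the
# support vectors. Dataset and separator are made up for illustration.
def margin(w0, w1, w2, points):
    norm = math.hypot(w1, w2)
    return min(abs(w0 + w1 * x1 + w2 * x2) / norm for x1, x2 in points)

points = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0), (5.0, 3.0)]
print(round(margin(-3.0, 1.0, 1.0, points), 3))  # prints: 0.707
```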

Page 25

Page 26

Running a perceptron on this data

Page 27

Page 28

The data isn’t linearly separable. Boo.

How can we make this data linearly separable?

Page 29

But you can make it linearly separable by using a transformation. For example, you can map each point x1 to (x1, x1^2).

By mapping the one-dimensional examples to higher dimensions you can make them linearly separable!
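That claim is easy to check on made-up data: a 1-D dataset whose classes interleave (so no single threshold on x1 separates them) becomes separable after the map x1 → (x1, x1^2):

```python
# Made-up 1-D data whose classes interleave: blues at -2 and 2, reds at
# -1 and 1. No single threshold on x1 separates them, but after mapping
# x1 -> (x1, x1**2), the new second coordinate does: blues have
# x1**2 = 4, reds have x1**2 = 1, so the line x2 = 2.5 separates them.
blues = [-2.0, 2.0]
reds = [-1.0, 1.0]

mapped_blues = [(x1, x1 ** 2) for x1 in blues]
mapped_reds = [(x1, x1 ** 2) for x1 in reds]

print(all(x2 > 2.5 for _, x2 in mapped_blues))  # prints: True
print(all(x2 < 2.5 for _, x2 in mapped_reds))   # prints: True
```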

Page 30

Page 31

Oh hey, check this out: mapping to higher dimensions is actually a pretty good idea. The trick is just choosing how to map it.
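Choosing the map is exactly where kernels come in. A standard example (not from the slides): for the map phi(x) = (x1², √2·x1·x2, x2²), the dot product of two mapped points equals (x·z)², so the high-dimensional dot product can be computed without ever constructing the mapped points:

```python
import math

# For phi(x) = (x1**2, sqrt(2)*x1*x2, x2**2), the dot product of two
# mapped points equals (x . z)**2, so the "high-dimensional" dot product
# is computable straight from the original 2-D points. This is the
# standard degree-2 polynomial kernel example, not one from the slides.
def phi(x):
    x1, x2 = x
    return (x1 ** 2, math.sqrt(2) * x1 * x2, x2 ** 2)

def kernel(x, z):
    return (x[0] * z[0] + x[1] * z[1]) ** 2

x, z = (1.0, 2.0), (3.0, 1.0)
lhs = sum(a * b for a, b in zip(phi(x), phi(z)))
rhs = kernel(x, z)
print(rhs)                    # prints: 25.0
print(abs(lhs - rhs) < 1e-9)  # prints: True (equal up to floating point)
```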

Page 32

This is very pretty, watch it!

Page 33