Review from last lecture - andrew.cmu.edu€¦ · The algorithm we showed you last time needs to...
Transcript of Review from last lecture - andrew.cmu.edu€¦ · The algorithm we showed you last time needs to...
![Page 1: Review from last lecture - andrew.cmu.edu€¦ · The algorithm we showed you last time needs to take the derivative of g. Here, g does not have a derivative. For now, we will just](https://reader033.fdocuments.us/reader033/viewer/2022050513/5f9d4481008bab20397847b8/html5/thumbnails/1.jpg)
1
![Page 2: Review from last lecture - andrew.cmu.edu€¦ · The algorithm we showed you last time needs to take the derivative of g. Here, g does not have a derivative. For now, we will just](https://reader033.fdocuments.us/reader033/viewer/2022050513/5f9d4481008bab20397847b8/html5/thumbnails/2.jpg)
Review from last lecture
2
![Page 3: Review from last lecture - andrew.cmu.edu€¦ · The algorithm we showed you last time needs to take the derivative of g. Here, g does not have a derivative. For now, we will just](https://reader033.fdocuments.us/reader033/viewer/2022050513/5f9d4481008bab20397847b8/html5/thumbnails/3.jpg)
3
![Page 4: Review from last lecture - andrew.cmu.edu€¦ · The algorithm we showed you last time needs to take the derivative of g. Here, g does not have a derivative. For now, we will just](https://reader033.fdocuments.us/reader033/viewer/2022050513/5f9d4481008bab20397847b8/html5/thumbnails/4.jpg)
4
![Page 5: Review from last lecture - andrew.cmu.edu€¦ · The algorithm we showed you last time needs to take the derivative of g. Here, g does not have a derivative. For now, we will just](https://reader033.fdocuments.us/reader033/viewer/2022050513/5f9d4481008bab20397847b8/html5/thumbnails/5.jpg)
5
![Page 6: Review from last lecture - andrew.cmu.edu€¦ · The algorithm we showed you last time needs to take the derivative of g. Here, g does not have a derivative. For now, we will just](https://reader033.fdocuments.us/reader033/viewer/2022050513/5f9d4481008bab20397847b8/html5/thumbnails/6.jpg)
6
![Page 7: Review from last lecture - andrew.cmu.edu€¦ · The algorithm we showed you last time needs to take the derivative of g. Here, g does not have a derivative. For now, we will just](https://reader033.fdocuments.us/reader033/viewer/2022050513/5f9d4481008bab20397847b8/html5/thumbnails/7.jpg)
7
![Page 8: Review from last lecture - andrew.cmu.edu€¦ · The algorithm we showed you last time needs to take the derivative of g. Here, g does not have a derivative. For now, we will just](https://reader033.fdocuments.us/reader033/viewer/2022050513/5f9d4481008bab20397847b8/html5/thumbnails/8.jpg)
Back to standard machine learning problem. 2 dimensions (2 attributes, x1 and x2).
2 classes (red and blue). A classifier would classify each point as red or blue.
The points are clearly linearly separable (blue can be separated from red with a line)
8
![Page 9: Review from last lecture - andrew.cmu.edu€¦ · The algorithm we showed you last time needs to take the derivative of g. Here, g does not have a derivative. For now, we will just](https://reader033.fdocuments.us/reader033/viewer/2022050513/5f9d4481008bab20397847b8/html5/thumbnails/9.jpg)
9
![Page 10: Review from last lecture - andrew.cmu.edu€¦ · The algorithm we showed you last time needs to take the derivative of g. Here, g does not have a derivative. For now, we will just](https://reader033.fdocuments.us/reader033/viewer/2022050513/5f9d4481008bab20397847b8/html5/thumbnails/10.jpg)
10
![Page 11: Review from last lecture - andrew.cmu.edu€¦ · The algorithm we showed you last time needs to take the derivative of g. Here, g does not have a derivative. For now, we will just](https://reader033.fdocuments.us/reader033/viewer/2022050513/5f9d4481008bab20397847b8/html5/thumbnails/11.jpg)
11
![Page 12: Review from last lecture - andrew.cmu.edu€¦ · The algorithm we showed you last time needs to take the derivative of g. Here, g does not have a derivative. For now, we will just](https://reader033.fdocuments.us/reader033/viewer/2022050513/5f9d4481008bab20397847b8/html5/thumbnails/12.jpg)
This is how the perceptron with the initial weights will classify points. Points above
the line are classified as 1.
Not very surprising that this isn't very good, as we randomly set the weights
12
![Page 13: Review from last lecture - andrew.cmu.edu€¦ · The algorithm we showed you last time needs to take the derivative of g. Here, g does not have a derivative. For now, we will just](https://reader033.fdocuments.us/reader033/viewer/2022050513/5f9d4481008bab20397847b8/html5/thumbnails/13.jpg)
The algorithm we showed you last time needs to take the derivative of g. Here, g
does not have a derivative. For now, we will just ignore that.
13
![Page 14: Review from last lecture - andrew.cmu.edu€¦ · The algorithm we showed you last time needs to take the derivative of g. Here, g does not have a derivative. For now, we will just](https://reader033.fdocuments.us/reader033/viewer/2022050513/5f9d4481008bab20397847b8/html5/thumbnails/14.jpg)
So here’s what’s happening in a perceptron with those weights.
Picked a mislabeled point, and update the weights
14
![Page 15: Review from last lecture - andrew.cmu.edu€¦ · The algorithm we showed you last time needs to take the derivative of g. Here, g does not have a derivative. For now, we will just](https://reader033.fdocuments.us/reader033/viewer/2022050513/5f9d4481008bab20397847b8/html5/thumbnails/15.jpg)
Pick another mislabeled point, update weights again
15
![Page 16: Review from last lecture - andrew.cmu.edu€¦ · The algorithm we showed you last time needs to take the derivative of g. Here, g does not have a derivative. For now, we will just](https://reader033.fdocuments.us/reader033/viewer/2022050513/5f9d4481008bab20397847b8/html5/thumbnails/16.jpg)
And again...
16
![Page 17: Review from last lecture - andrew.cmu.edu€¦ · The algorithm we showed you last time needs to take the derivative of g. Here, g does not have a derivative. For now, we will just](https://reader033.fdocuments.us/reader033/viewer/2022050513/5f9d4481008bab20397847b8/html5/thumbnails/17.jpg)
Note that each time we pick a blue point essentially nothing happens.
17
![Page 18: Review from last lecture - andrew.cmu.edu€¦ · The algorithm we showed you last time needs to take the derivative of g. Here, g does not have a derivative. For now, we will just](https://reader033.fdocuments.us/reader033/viewer/2022050513/5f9d4481008bab20397847b8/html5/thumbnails/18.jpg)
Note: w0 could only ever change by alpha. If alpha is too small, it will move too
slow. If alpha is too big, you might not have the resolution to find the answer.
18
![Page 19: Review from last lecture - andrew.cmu.edu€¦ · The algorithm we showed you last time needs to take the derivative of g. Here, g does not have a derivative. For now, we will just](https://reader033.fdocuments.us/reader033/viewer/2022050513/5f9d4481008bab20397847b8/html5/thumbnails/19.jpg)
19
![Page 20: Review from last lecture - andrew.cmu.edu€¦ · The algorithm we showed you last time needs to take the derivative of g. Here, g does not have a derivative. For now, we will just](https://reader033.fdocuments.us/reader033/viewer/2022050513/5f9d4481008bab20397847b8/html5/thumbnails/20.jpg)
All these lines separate red from blue. But is one better than the other?
20
![Page 21: Review from last lecture - andrew.cmu.edu€¦ · The algorithm we showed you last time needs to take the derivative of g. Here, g does not have a derivative. For now, we will just](https://reader033.fdocuments.us/reader033/viewer/2022050513/5f9d4481008bab20397847b8/html5/thumbnails/21.jpg)
21
![Page 22: Review from last lecture - andrew.cmu.edu€¦ · The algorithm we showed you last time needs to take the derivative of g. Here, g does not have a derivative. For now, we will just](https://reader033.fdocuments.us/reader033/viewer/2022050513/5f9d4481008bab20397847b8/html5/thumbnails/22.jpg)
22
![Page 23: Review from last lecture - andrew.cmu.edu€¦ · The algorithm we showed you last time needs to take the derivative of g. Here, g does not have a derivative. For now, we will just](https://reader033.fdocuments.us/reader033/viewer/2022050513/5f9d4481008bab20397847b8/html5/thumbnails/23.jpg)
23
![Page 24: Review from last lecture - andrew.cmu.edu€¦ · The algorithm we showed you last time needs to take the derivative of g. Here, g does not have a derivative. For now, we will just](https://reader033.fdocuments.us/reader033/viewer/2022050513/5f9d4481008bab20397847b8/html5/thumbnails/24.jpg)
Pick the direction with the largest margin, and then put the separator in the middle
of the margin. There is a pretty efficient algorithm to find this separator (but the
algorithm is a bit mathy to present here)
When the data is not linearly separable then there’s a thing called “soft margin” you
can look at.
The points on the dotted line are considered the “support vectors”
24
![Page 25: Review from last lecture - andrew.cmu.edu€¦ · The algorithm we showed you last time needs to take the derivative of g. Here, g does not have a derivative. For now, we will just](https://reader033.fdocuments.us/reader033/viewer/2022050513/5f9d4481008bab20397847b8/html5/thumbnails/25.jpg)
25
![Page 26: Review from last lecture - andrew.cmu.edu€¦ · The algorithm we showed you last time needs to take the derivative of g. Here, g does not have a derivative. For now, we will just](https://reader033.fdocuments.us/reader033/viewer/2022050513/5f9d4481008bab20397847b8/html5/thumbnails/26.jpg)
Running a perceptron on this data
26
![Page 27: Review from last lecture - andrew.cmu.edu€¦ · The algorithm we showed you last time needs to take the derivative of g. Here, g does not have a derivative. For now, we will just](https://reader033.fdocuments.us/reader033/viewer/2022050513/5f9d4481008bab20397847b8/html5/thumbnails/27.jpg)
27
![Page 28: Review from last lecture - andrew.cmu.edu€¦ · The algorithm we showed you last time needs to take the derivative of g. Here, g does not have a derivative. For now, we will just](https://reader033.fdocuments.us/reader033/viewer/2022050513/5f9d4481008bab20397847b8/html5/thumbnails/28.jpg)
The data isn’t linearly separable. Boo.
How can we make this data linearly separable?
28
![Page 29: Review from last lecture - andrew.cmu.edu€¦ · The algorithm we showed you last time needs to take the derivative of g. Here, g does not have a derivative. For now, we will just](https://reader033.fdocuments.us/reader033/viewer/2022050513/5f9d4481008bab20397847b8/html5/thumbnails/29.jpg)
But you can make it linearly separable by using a transformation . For example,
you can make points (x1, x1^2).
By mapping the one dimensional examples to higher dimensions you can make
them linearly separable!
29
![Page 30: Review from last lecture - andrew.cmu.edu€¦ · The algorithm we showed you last time needs to take the derivative of g. Here, g does not have a derivative. For now, we will just](https://reader033.fdocuments.us/reader033/viewer/2022050513/5f9d4481008bab20397847b8/html5/thumbnails/30.jpg)
30
![Page 31: Review from last lecture - andrew.cmu.edu€¦ · The algorithm we showed you last time needs to take the derivative of g. Here, g does not have a derivative. For now, we will just](https://reader033.fdocuments.us/reader033/viewer/2022050513/5f9d4481008bab20397847b8/html5/thumbnails/31.jpg)
Oh hey check this out mapping to higher dimensions is actually a pretty good idea.
The trick is just choosing how to map it.
31
![Page 32: Review from last lecture - andrew.cmu.edu€¦ · The algorithm we showed you last time needs to take the derivative of g. Here, g does not have a derivative. For now, we will just](https://reader033.fdocuments.us/reader033/viewer/2022050513/5f9d4481008bab20397847b8/html5/thumbnails/32.jpg)
This is very pretty, watch it!
32
![Page 33: Review from last lecture - andrew.cmu.edu€¦ · The algorithm we showed you last time needs to take the derivative of g. Here, g does not have a derivative. For now, we will just](https://reader033.fdocuments.us/reader033/viewer/2022050513/5f9d4481008bab20397847b8/html5/thumbnails/33.jpg)
33