Introduction to Machine Learning (10701/slides/18_Learning_Theory_Risk_Min.pdf · 2017-11-08)
Introduction to Machine Learning
Risk Minimization
Barnabás Póczos
What have we seen so far?
Several classification & regression algorithms seem to work fine on training datasets:
• Linear regression
• Gaussian Processes
• Naïve Bayes classifier
• Support Vector Machines
How good are these algorithms on unknown test sets?
How many training samples do we need to achieve small error?
What is the smallest possible error we can achieve?
=> Learning Theory
Outline
• Risk and loss
  – Loss functions
  – Risk
  – Empirical risk vs. true risk
  – Empirical risk minimization
• Underfitting and Overfitting
• Classification
• Regression
Supervised Learning Setup
Generative model of the data (train and test data):
Regression:
Classification:
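The formulas on this slide are embedded in the slide images and missing from the transcript; a standard statement of the setup, consistent with the later slides, is:

```latex
% Training and test data are i.i.d. draws from a fixed, unknown distribution:
(X_1, Y_1), \ldots, (X_n, Y_n) \overset{\text{i.i.d.}}{\sim} P(X, Y)
% Regression: real-valued labels
Y \in \mathbb{R}
% Classification: categorical labels, e.g.
Y \in \{0, 1\}
```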
Loss
Loss function:
It measures how good we are on a particular (x, y) pair.
Loss Examples
Classification loss:
L2 loss for regression:
L1 loss for regression:
Regression example: predict house prices.
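The loss formulas themselves are in the slide images; the standard forms matching these names are:

```latex
% Classification (0-1) loss:
\ell(f(x), y) = \mathbb{1}\{f(x) \neq y\}
% L2 (squared) loss for regression:
\ell(f(x), y) = (f(x) - y)^2
% L1 (absolute) loss for regression:
\ell(f(x), y) = |f(x) - y|
```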
Squared loss, L2 loss
Picture from Alex
L1 loss
Picture from Alex
ε-insensitive loss
Picture from Alex
Huber’s robust loss
Picture from Alex
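The loss curves pictured on these four slides can be reproduced with a short sketch; the function names and the parameter choices ε = 0.5 and δ = 1 below are illustrative, not taken from the slides:

```python
import numpy as np

def l2_loss(r):
    """Squared (L2) loss of the residual r = f(x) - y."""
    return r ** 2

def l1_loss(r):
    """Absolute (L1) loss."""
    return np.abs(r)

def eps_insensitive_loss(r, eps=0.5):
    """Zero inside the eps-tube around the target, linear outside."""
    return np.maximum(np.abs(r) - eps, 0.0)

def huber_loss(r, delta=1.0):
    """Quadratic near zero, linear in the tails (robust to outliers)."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r ** 2, delta * (a - 0.5 * delta))

residuals = np.linspace(-3, 3, 7)
for name, fn in [("L2", l2_loss), ("L1", l1_loss),
                 ("eps-insensitive", eps_insensitive_loss), ("Huber", huber_loss)]:
    print(name, fn(residuals))
```

Plotting these over a grid of residuals reproduces the four pictures: L2 punishes outliers quadratically, L1 linearly, and the last two interpolate between the two behaviors.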
Risk
Risk of a classification/regression function f:
= The expected loss
Why do we care about this?
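The definition on the slide lives in an image; in standard notation it reads:

```latex
% Risk of a classification/regression function f: the expected loss
R(f) = \mathbb{E}_{(X, Y) \sim P}\left[ \ell(f(X), Y) \right]
```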
Why do we care about risk?
Risk of a classification/regression function f:
= The expected loss
Our true goal is to minimize the loss of the test points!
Usually we don't know the test points and their labels in advance… but
(LLN)
That is why our goal is to minimize the risk.
Risk Examples
Risk: The expected loss
Classification loss:
Risk of classification loss:
L2 loss for regression:
Risk of L2 loss:
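The two risk formulas are in the slide images; the standard versions, matching the loss definitions above, are:

```latex
% Risk under the 0-1 classification loss: the probability of error
R(f) = \mathbb{E}\left[\mathbb{1}\{f(X) \neq Y\}\right] = P(f(X) \neq Y)
% Risk under the L2 loss: the mean squared error
R(f) = \mathbb{E}\left[(f(X) - Y)^2\right]
```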
Bayes Risk
The expected loss, where we now consider all possible functions f.
We don't know P, but we have i.i.d. training data sampled from P!
Goal of Learning:
The learning algorithm constructs this function fD from the training data.
Definition: Bayes Risk
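The definition itself is in the slide image; the standard form is:

```latex
% Bayes risk: the best achievable risk over all (measurable) functions
R^* = \inf_{f} R(f)
% Goal of learning: from the training data D, construct f_D whose risk
% approaches R^* as the sample size grows
R(f_D) \approx R^*
```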
Convergence in Probability
Notation:
Definition:
This indeed measures how far the values of Zn(ω) and Z(ω) are from each other.
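The notation and definition are in the slide images; the standard statement is:

```latex
% Notation:
Z_n \xrightarrow{P} Z
% Definition: Z_n converges to Z in probability iff for every eps > 0
\lim_{n \to \infty} P\left( |Z_n - Z| > \varepsilon \right) = 0
```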
![Page 17: Introduction to Machine Learning10701/slides/18_Learning_Theory_Risk_Min.pdf · 2017-11-08 · Stone’s theorem 1977: Many classification, regression algorithms are universally consistent](https://reader036.fdocuments.us/reader036/viewer/2022062505/5e5cc1e4f49b883c4e5e0314/html5/thumbnails/17.jpg)
Consistency of learning methods
Risk is a random variable:
Definition:
Stone's theorem 1977: Many classification and regression algorithms are universally consistent for certain loss functions under certain conditions: kNN, Parzen kernel regression, SVM, …
Yayyy!!! ☺ Wait! This doesn't tell us anything about the rates…
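The definition on this slide is in the image; a standard statement is:

```latex
% R(f_n) is a random variable because f_n depends on the random training set.
% Consistency: the risk of the learned f_n approaches the Bayes risk
R(f_n) \xrightarrow{P} R^* \quad (n \to \infty)
% Universal consistency: the above holds for every distribution P(X, Y)
```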
![Page 18: Introduction to Machine Learning10701/slides/18_Learning_Theory_Risk_Min.pdf · 2017-11-08 · Stone’s theorem 1977: Many classification, regression algorithms are universally consistent](https://reader036.fdocuments.us/reader036/viewer/2022062505/5e5cc1e4f49b883c4e5e0314/html5/thumbnails/18.jpg)
No Free Lunch!
Devroye 1982: For every consistent learning method and for every fixed convergence rate a_n, there exists a distribution P(X,Y) such that the convergence rate of this learning method on P(X,Y)-distributed data is slower than a_n.
What can we do now?
![Page 19: Introduction to Machine Learning10701/slides/18_Learning_Theory_Risk_Min.pdf · 2017-11-08 · Stone’s theorem 1977: Many classification, regression algorithms are universally consistent](https://reader036.fdocuments.us/reader036/viewer/2022062505/5e5cc1e4f49b883c4e5e0314/html5/thumbnails/19.jpg)
Empirical Risk and True Risk
Empirical Risk
Let us use the empirical counterpart:
Shorthand:
True risk of f (deterministic):
Bayes risk:
Empirical risk:
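The three definitions referred to here are in the slide images; in standard notation:

```latex
% True risk of a fixed f (deterministic):
R(f) = \mathbb{E}\left[\ell(f(X), Y)\right]
% Bayes risk:
R^* = \inf_{f} R(f)
% Empirical risk: the average loss on the n training points (random)
\hat{R}_n(f) = \frac{1}{n} \sum_{i=1}^{n} \ell(f(X_i), Y_i)
```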
![Page 21: Introduction to Machine Learning10701/slides/18_Learning_Theory_Risk_Min.pdf · 2017-11-08 · Stone’s theorem 1977: Many classification, regression algorithms are universally consistent](https://reader036.fdocuments.us/reader036/viewer/2022062505/5e5cc1e4f49b883c4e5e0314/html5/thumbnails/21.jpg)
Empirical Risk Minimization
Law of Large Numbers:
For any fixed f, the empirical risk converges to the true risk of f.
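A small simulation of this convergence (the distribution and the 0.1 noise level are illustrative, not from the slides): for a fixed classifier whose true risk is exactly 0.1, the empirical risk approaches 0.1 as n grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n, noise=0.1):
    """X ~ Uniform[0, 1]; Y = 1{X > 0.5}, flipped with probability `noise`."""
    x = rng.uniform(size=n)
    y = (x > 0.5).astype(int)
    flip = rng.uniform(size=n) < noise
    y[flip] = 1 - y[flip]
    return x, y

def empirical_risk(f, x, y):
    """Average 0-1 loss of classifier f on the sample."""
    return np.mean(f(x) != y)

# A fixed classifier; it errs exactly when the label was flipped,
# so its true risk is 0.1.
f = lambda x: (x > 0.5).astype(int)

for n in [100, 10_000, 1_000_000]:
    x, y = sample(n)
    print(n, empirical_risk(f, x, y))
```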
![Page 22: Introduction to Machine Learning10701/slides/18_Learning_Theory_Risk_Min.pdf · 2017-11-08 · Stone’s theorem 1977: Many classification, regression algorithms are universally consistent](https://reader036.fdocuments.us/reader036/viewer/2022062505/5e5cc1e4f49b883c4e5e0314/html5/thumbnails/22.jpg)
Overfitting in Classification with ERM
Generative model:
Bayes classifier:
Bayes risk:
Picture from David Pal
![Page 23: Introduction to Machine Learning10701/slides/18_Learning_Theory_Risk_Min.pdf · 2017-11-08 · Stone’s theorem 1977: Many classification, regression algorithms are universally consistent](https://reader036.fdocuments.us/reader036/viewer/2022062505/5e5cc1e4f49b883c4e5e0314/html5/thumbnails/23.jpg)
Overfitting in Classification with ERM
nth-order thresholded polynomials
Bayes risk:
Empirical risk:
Picture from David Pal
![Page 24: Introduction to Machine Learning10701/slides/18_Learning_Theory_Risk_Min.pdf · 2017-11-08 · Stone’s theorem 1977: Many classification, regression algorithms are universally consistent](https://reader036.fdocuments.us/reader036/viewer/2022062505/5e5cc1e4f49b883c4e5e0314/html5/thumbnails/24.jpg)
Overfitting in Regression with ERM
Is the following predictor a good one?
What is its empirical risk (performance on training data)? Zero!
What about its true risk? Greater than zero.
It will predict very poorly on new random test points: large generalization error!
![Page 25: Introduction to Machine Learning10701/slides/18_Learning_Theory_Risk_Min.pdf · 2017-11-08 · Stone’s theorem 1977: Many classification, regression algorithms are universally consistent](https://reader036.fdocuments.us/reader036/viewer/2022062505/5e5cc1e4f49b883c4e5e0314/html5/thumbnails/25.jpg)
Overfitting in Regression
[Figure: four polynomial fits on [0, 1]: k = 1 (constant), k = 2 (linear), k = 3 (quadratic), k = 7 (6th order).]
If we allow very complicated predictors, we could overfit the training data.
Example: regression with polynomials of degree k − 1 (k parameters).
![Page 26: Introduction to Machine Learning10701/slides/18_Learning_Theory_Risk_Min.pdf · 2017-11-08 · Stone’s theorem 1977: Many classification, regression algorithms are universally consistent](https://reader036.fdocuments.us/reader036/viewer/2022062505/5e5cc1e4f49b883c4e5e0314/html5/thumbnails/26.jpg)
Solutions to Overfitting
Structural Risk Minimization
Notation:
1st issue: Model error (Approximation error)
Solution: Structural Risk Minimization (SRM)
Risk vs. Empirical risk
![Page 28: Introduction to Machine Learning10701/slides/18_Learning_Theory_Risk_Min.pdf · 2017-11-08 · Stone’s theorem 1977: Many classification, regression algorithms are universally consistent](https://reader036.fdocuments.us/reader036/viewer/2022062505/5e5cc1e4f49b883c4e5e0314/html5/thumbnails/28.jpg)
Approximation Error, Estimation Error, and the PAC Framework
Risk of the classifier f = Bayes risk + Approximation error + Estimation error
Probably Approximately Correct (PAC) learning framework
![Page 29: Introduction to Machine Learning10701/slides/18_Learning_Theory_Risk_Min.pdf · 2017-11-08 · Stone’s theorem 1977: Many classification, regression algorithms are universally consistent](https://reader036.fdocuments.us/reader036/viewer/2022062505/5e5cc1e4f49b883c4e5e0314/html5/thumbnails/29.jpg)
Big Picture
Ultimate goal:
Risk of the learned classifier = Bayes risk + Approximation error + Estimation error
![Page 30: Introduction to Machine Learning10701/slides/18_Learning_Theory_Risk_Min.pdf · 2017-11-08 · Stone’s theorem 1977: Many classification, regression algorithms are universally consistent](https://reader036.fdocuments.us/reader036/viewer/2022062505/5e5cc1e4f49b883c4e5e0314/html5/thumbnails/30.jpg)
Solution to Overfitting
2nd issue:
Solution:
Approximation with the Hinge Loss and Quadratic Loss
Picture taken from R. Herbrich
![Page 32: Introduction to Machine Learning10701/slides/18_Learning_Theory_Risk_Min.pdf · 2017-11-08 · Stone’s theorem 1977: Many classification, regression algorithms are universally consistent](https://reader036.fdocuments.us/reader036/viewer/2022062505/5e5cc1e4f49b883c4e5e0314/html5/thumbnails/32.jpg)
Effect of Model Complexity
(fixed # of training data)
If we allow very complicated predictors, we could overfit the training data.
Empirical risk is no longer a good indicator of true risk.
[Figure: prediction error on training data vs. model complexity.]
![Page 33: Introduction to Machine Learning10701/slides/18_Learning_Theory_Risk_Min.pdf · 2017-11-08 · Stone’s theorem 1977: Many classification, regression algorithms are universally consistent](https://reader036.fdocuments.us/reader036/viewer/2022062505/5e5cc1e4f49b883c4e5e0314/html5/thumbnails/33.jpg)
Underfitting
Bayes risk = 0.134
![Page 34: Introduction to Machine Learning10701/slides/18_Learning_Theory_Risk_Min.pdf · 2017-11-08 · Stone’s theorem 1977: Many classification, regression algorithms are universally consistent](https://reader036.fdocuments.us/reader036/viewer/2022062505/5e5cc1e4f49b883c4e5e0314/html5/thumbnails/34.jpg)
Underfitting
Best linear classifier:
The empirical risk of the best linear classifier:
Underfitting
Best quadratic classifier:
Same as the Bayes risk ⇒ good fit!
Classification Using the 0-1 Loss
The Bayes Classifier
Lemma I:
Lemma II:
![Page 38: Introduction to Machine Learning10701/slides/18_Learning_Theory_Risk_Min.pdf · 2017-11-08 · Stone’s theorem 1977: Many classification, regression algorithms are universally consistent](https://reader036.fdocuments.us/reader036/viewer/2022062505/5e5cc1e4f49b883c4e5e0314/html5/thumbnails/38.jpg)
Proofs
Lemma I: Trivial from the definition.
Lemma II: Surprisingly long calculation.
The Bayes Classifier
This is what the learning algorithm produces.
We will need these definitions, please copy them!
The Bayes Classifier
Theorem I: Bound on the Estimation error
The true risk of what the learning algorithm produces
This is what the learning algorithm produces
![Page 41: Introduction to Machine Learning10701/slides/18_Learning_Theory_Risk_Min.pdf · 2017-11-08 · Stone’s theorem 1977: Many classification, regression algorithms are universally consistent](https://reader036.fdocuments.us/reader036/viewer/2022062505/5e5cc1e4f49b883c4e5e0314/html5/thumbnails/41.jpg)
Proof of Theorem I
Theorem I: Bound on the estimation error (the true risk of what the learning algorithm produces)
Proof:
The Bayes Classifier
Theorem II:
This is what the learning algorithm produces.
Proofs
Theorem I: Not so long calculations.
Theorem II: Trivial
Main message:
It’s enough to derive upper bounds for
Corollary:
![Page 44: Introduction to Machine Learning10701/slides/18_Learning_Theory_Risk_Min.pdf · 2017-11-08 · Stone’s theorem 1977: Many classification, regression algorithms are universally consistent](https://reader036.fdocuments.us/reader036/viewer/2022062505/5e5cc1e4f49b883c4e5e0314/html5/thumbnails/44.jpg)
Illustration of the Risks
It's enough to derive upper bounds for
It is a random variable that we need to bound!
We will bound it with tail bounds!
Hoeffding’s inequality (1963)
Special case
Binomial distributions
Our goal is to bound
Bernoulli(p)
Therefore, from Hoeffding we have:
Yuppie!!!
![Page 48: Introduction to Machine Learning10701/slides/18_Learning_Theory_Risk_Min.pdf · 2017-11-08 · Stone’s theorem 1977: Many classification, regression algorithms are universally consistent](https://reader036.fdocuments.us/reader036/viewer/2022062505/5e5cc1e4f49b883c4e5e0314/html5/thumbnails/48.jpg)
Inversion
From Hoeffding we have:
Therefore,
![Page 49: Introduction to Machine Learning10701/slides/18_Learning_Theory_Risk_Min.pdf · 2017-11-08 · Stone’s theorem 1977: Many classification, regression algorithms are universally consistent](https://reader036.fdocuments.us/reader036/viewer/2022062505/5e5cc1e4f49b883c4e5e0314/html5/thumbnails/49.jpg)
Union Bound
Our goal is to bound:
We already know:
Theorem: [tail bound on the ‘deviation’ in the worst case]
Worst case error
Proof:
This is not the worst classifier in terms of classification accuracy!
Worst case means that the empirical risk of classifier f is the furthest from its true risk!
![Page 50: Introduction to Machine Learning10701/slides/18_Learning_Theory_Risk_Min.pdf · 2017-11-08 · Stone’s theorem 1977: Many classification, regression algorithms are universally consistent](https://reader036.fdocuments.us/reader036/viewer/2022062505/5e5cc1e4f49b883c4e5e0314/html5/thumbnails/50.jpg)
Inversion of Union Bound
We already know:
Therefore,
![Page 51: Introduction to Machine Learning10701/slides/18_Learning_Theory_Risk_Min.pdf · 2017-11-08 · Stone’s theorem 1977: Many classification, regression algorithms are universally consistent](https://reader036.fdocuments.us/reader036/viewer/2022062505/5e5cc1e4f49b883c4e5e0314/html5/thumbnails/51.jpg)
Inversion of Union Bound
• The larger the N, the looser the bound.
• This result is distribution-free: it holds for all P(X,Y) distributions.
• It is useless if N is big or infinite (e.g., all possible hyperplanes).
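The inverted bound is in the slide image; setting 2N exp(−2nε²) = δ gives the standard form:

```latex
% With probability at least 1 - \delta,
\max_{f \in F} \left| \hat{R}_n(f) - R(f) \right|
  \le \sqrt{ \frac{\ln N + \ln(2 / \delta)}{2 n} }
% The \ln N term is why a larger N loosens the bound and why the bound
% breaks down for infinite classes.
```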
We will see later how to fix that. (Hint: McDiarmid, VC dimension…)
![Page 52: Introduction to Machine Learning10701/slides/18_Learning_Theory_Risk_Min.pdf · 2017-11-08 · Stone’s theorem 1977: Many classification, regression algorithms are universally consistent](https://reader036.fdocuments.us/reader036/viewer/2022062505/5e5cc1e4f49b883c4e5e0314/html5/thumbnails/52.jpg)
The Expected Error
Our goal is to bound:
Theorem: [Expected ‘deviation’ in the worst case]
Worst case deviation
We already know:
Proof: we already know a tail bound.
(Tail bound, Concentration inequality)
(From that actually we get a bit weaker inequality… oh well)
This is not the worst classifier in terms of classification accuracy!
Worst case means that the empirical risk of classifier f is the furthest from its true risk!
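The theorem is in the slide image; a standard form of the expected worst-case deviation bound for a finite class of size N (e.g., via the maximal inequality for sub-Gaussian variables, which matches the slide's remark that integrating the tail bound gives a slightly weaker constant) is:

```latex
\mathbb{E}\left[ \max_{f \in F} \left| \hat{R}_n(f) - R(f) \right| \right]
  \le \sqrt{ \frac{\ln(2 N)}{2 n} }
```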
![Page 53: Introduction to Machine Learning10701/slides/18_Learning_Theory_Risk_Min.pdf · 2017-11-08 · Stone’s theorem 1977: Many classification, regression algorithms are universally consistent](https://reader036.fdocuments.us/reader036/viewer/2022062505/5e5cc1e4f49b883c4e5e0314/html5/thumbnails/53.jpg)
Thanks for your attention ☺