CSC411: Midterm Review
Xiaohui Zeng
February 7, 2019
Built from slides by James Lucas, Joe Wu and others
Agenda
1. A brief overview
2. Some sample questions
Basic ML Terminology
- Regression
- Overfitting
- Generalization
- Stochastic Gradient Descent (SGD)
- Classification
- Underfitting
- Regularization
- Bayes Optimal
Basic ML Terminology
- Training Data
- Validation Data
- Test Data
- Optimization
- 0-1 Loss
- Linear classifier
- Features
- Model
Some Questions
Question
1. Bagging improves performance by reducing variance.
2. Given discrete random variables $X$ and $Y$, the information gain in $Y$ due to $X$ is
   $$IG(Y \mid X) = H(Y) - H(Y \mid X),$$
   where $H$ is the entropy.
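As a quick sanity check on the formula, here is a minimal numpy sketch (the joint count table and names are hypothetical) that computes $H(Y)$, $H(Y \mid X)$, and the information gain:

```python
import numpy as np

def entropy(p):
    """Entropy in bits of a discrete distribution p (zero entries are skipped)."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical joint counts: rows index values of X, columns index values of Y.
counts = np.array([[30.0, 10.0],
                   [10.0, 30.0]])
joint = counts / counts.sum()   # P(X, Y)
p_x = joint.sum(axis=1)         # P(X)
p_y = joint.sum(axis=0)         # P(Y)

# H(Y | X) = sum_x P(x) * H(Y | X = x)
h_y_given_x = sum(p_x[i] * entropy(joint[i] / p_x[i]) for i in range(len(p_x)))
ig = entropy(p_y) - h_y_given_x  # IG(Y | X) = H(Y) - H(Y | X)
print(f"H(Y) = {entropy(p_y):.3f}, H(Y|X) = {h_y_given_x:.3f}, IG = {ig:.3f}")
```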
Some Questions
Question 2
Take labelled data (X, y).
1. Why should you use a validation set?
2. How do you know if your model is overfitting?
3. How do you know if your model is underfitting?
ML Models
1. Nearest Neighbours
2. Decision Trees
3. Ensembles
4. Linear Regression
5. Logistic Regression
6. SVMs
Nearest Neighbours
1. Decision Boundaries
2. Choice of $k$ vs. Generalization
3. Curse of dimensionality
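To make the role of $k$ concrete, here is a minimal numpy sketch of k-NN classification (the toy data and function names are hypothetical); small $k$ gives jagged decision boundaries, large $k$ smooths them out:

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify x_query by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x_query, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:k]                    # indices of the k closest
    return np.bincount(y_train[nearest]).argmax()      # majority label

# Hypothetical toy data: two classes in 2-D.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.2, 0.1]), k=3))  # -> 0
```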
Decision Trees
1. Entropy: $H(X)$, $H(Y \mid X)$
2. Information Gain
3. Decision Boundaries
Bayes Optimality
Starting with the squared error: $\mathbb{E}[(y - t)^2 \mid x] = \mathbb{E}[y^2 - 2yt + t^2 \mid x]$.

1. Choose a single value $y_*$ based on $p(t \mid x)$; the expected error decomposes as
   $$(y - \mathbb{E}[t \mid x])^2 + \mathrm{Var}(t \mid x)$$
2. Treat $y$ as a random variable; the expected error decomposes into:
   - a bias term
   - a variance term
   - the Bayes error
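For reference, here is the standard decomposition written out (a math sketch; the expectation is over both the training set, through $y$, and the target noise):

```latex
\mathbb{E}\big[(y - t)^2 \mid x\big]
  = \underbrace{\big(\mathbb{E}[y] - \mathbb{E}[t \mid x]\big)^2}_{\text{bias}^2}
  + \underbrace{\operatorname{Var}(y)}_{\text{variance}}
  + \underbrace{\operatorname{Var}(t \mid x)}_{\text{Bayes error}}
```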
Ensembles
1. Bagging / bootstrap aggregation
2. Boosting
   - decision stumps
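A minimal sketch of bagging (all names hypothetical; any base learner exposing fit/predict callables would do), to make "average models trained on bootstrap resamples" concrete:

```python
import numpy as np

def bagged_predictions(X_train, y_train, X_test, fit_fn, predict_fn,
                       n_models=10, rng=None):
    """Train n_models on bootstrap resamples and average their test predictions."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(X_train)
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)            # sample n indices with replacement
        model = fit_fn(X_train[idx], y_train[idx])  # fit base learner on the resample
        preds.append(predict_fn(model, X_test))
    return np.mean(preds, axis=0)                   # averaging is what cuts variance
```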
Linear Regression
1. Loss function
2. Direct solution
3. (Stochastic) Gradient Descent
4. Regularization
   - L1 vs. L2 norm
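As a sketch of the direct solution with L2 regularization on hypothetical toy data (set lam=0 for plain least squares; a fuller version would also handle a bias term):

```python
import numpy as np

def ridge_direct(X, t, lam=0.1):
    """Closed form w = (X^T X + lam*I)^{-1} X^T t for L2-regularized least squares."""
    D = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ t)

# Hypothetical data: t = 2*x0 - x1 plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
t = 2 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=100)
print(ridge_direct(X, t))  # roughly [ 2, -1 ]
```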
Logistic Regression
1. Loss functions
   - 0-1 loss?
   - L2 loss?
   - cross-entropy loss?
2. Binary vs. Multi-class
3. Decision Boundaries
   - $p = \frac{1}{1 + e^{-\theta x}}$
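A sketch of one gradient step on the binary cross-entropy loss (names hypothetical), which makes the convenient $(p - t)$ form of the gradient visible:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy_step(theta, X, t, lr=0.1):
    """One gradient-descent step on L = -sum[t log p + (1 - t) log(1 - p)],
    where p = sigmoid(X @ theta) and t holds 0/1 labels."""
    p = sigmoid(X @ theta)
    grad = X.T @ (p - t) / len(t)  # dL/dtheta, averaged over examples
    return theta - lr * grad
```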
SVMs
1. Hinge loss
2. Margins
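A sketch of the soft-margin objective built from the hinge loss, for labels $t \in \{-1, +1\}$ (the names and the trade-off constant C are hypothetical):

```python
import numpy as np

def svm_objective(w, X, t, C=1.0):
    """0.5 * ||w||^2 + C * sum_i max(0, 1 - t_i * (w^T x_i)): points outside the
    margin contribute zero; violations grow linearly."""
    margins = t * (X @ w)
    return 0.5 * w @ w + C * np.sum(np.maximum(0.0, 1.0 - margins))
```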
Sample Question 1
First, we use a linear regression method to model this data. To test our linear regressor, we choose at random some data records to be a training set, and choose at random some of the remaining records to be a test set.

Now let us increase the training set size gradually. As the training set size increases, what do you expect will happen with the mean training and mean testing errors? (No explanation required.)

- Mean Training Error: A. Increase; B. Decrease
- Mean Testing Error: A. Increase; B. Decrease
Q1 Solution
- The training error tends to increase. As more examples have to be fitted, it becomes harder to "hit", or even come close to, all of them.
- The test error tends to decrease. As we take into account more examples when training, we have more information, and can come up with a model that better resembles the true behavior. More training examples lead to better generalization.
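Both trends are easy to reproduce in a quick simulation sketch (hypothetical data), refitting a linear regressor as the training set grows:

```python
import numpy as np

rng = np.random.default_rng(0)

def gen(n):
    """Hypothetical linear data with noise."""
    X = rng.normal(size=(n, 5))
    t = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.5 * rng.normal(size=n)
    return X, t

X_test, t_test = gen(1000)
for n in [6, 20, 100, 1000]:
    X, t = gen(n)
    w, *_ = np.linalg.lstsq(X, t, rcond=None)
    print(f"n={n:5d}  train MSE={np.mean((X @ w - t) ** 2):.3f}  "
          f"test MSE={np.mean((X_test @ w - t_test) ** 2):.3f}")
```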
Sample Question 2
If variables $X$ and $Y$ are independent, is $I(X \mid Y) = 0$? If yes, prove it. If no, give a counterexample.
Q2 Solution
Recall that
- Two random variables $X$ and $Y$ are independent if for all $x \in \mathrm{Values}(X)$ and all $y \in \mathrm{Values}(Y)$, $P(X = x, Y = y) = P(X = x)\,P(Y = y)$.
- $H(X) = -\sum_x P(x) \log_2 P(x)$
Q2 Solution
$$
\begin{aligned}
I(X \mid Y) &= H(X) - H(X \mid Y) \\
&= -\sum_x P(x) \log_2 P(x) - \Big(-\sum_y \sum_x P(x, y) \log_2 P(x \mid y)\Big) \\
&= -\sum_x P(x) \log_2 P(x) - \Big(-\sum_y P(y) \sum_x P(x) \log_2 P(x)\Big) \\
&= -\sum_x P(x) \log_2 P(x) - \Big(-\sum_x P(x) \log_2 P(x)\Big) \\
&= 0
\end{aligned}
$$

The second step uses independence, $P(x, y) = P(y)P(x)$ and $P(x \mid y) = P(x)$; the third uses $\sum_y P(y) = 1$.
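A numeric sanity check of the result (the marginals are hypothetical): building an independent joint via an outer product, $H(X) - H(X \mid Y)$ comes out to zero:

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p_x = np.array([0.3, 0.7])
p_y = np.array([0.6, 0.4])
joint = np.outer(p_x, p_y)  # independence: P(x, y) = P(x) * P(y)

# H(X | Y) = sum_y P(y) * H(X | Y = y)
h_x_given_y = sum(p_y[j] * entropy(joint[:, j] / p_y[j]) for j in range(len(p_y)))
print(entropy(p_x) - h_x_given_y)  # ~0.0 up to floating point
```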
Sample Question 3
Given input $x \in \mathbb{R}^D$ and target $t \in \mathbb{R}$, consider a linear model of the form $y(x, w) = \sum_{i=1}^D w_i x_i$. Now suppose a noisy perturbation $\varepsilon_i$ is added independently to each of the input variables $x_i$, i.e., $\tilde{x}_i = x_i + \varepsilon_i$. Assume:

- $\mathbb{E}[\varepsilon_i] = 0$
- for $i \neq j$: $\mathbb{E}[\varepsilon_i \varepsilon_j] = 0$
- $\mathbb{E}[\varepsilon_i^2] = \lambda$

We define the following objective that tries to be robust to noise:

$$w^* = \arg\min_w \mathbb{E}_\varepsilon\big[(w^\top \tilde{x} - t)^2\big].$$

Show that it is equivalent to minimizing L2-regularized linear regression, i.e.,

$$w^* = \arg\min_w \big[(w^\top x - t)^2 + \lambda \|w\|^2\big].$$
Q3 Solution
Let
$$\tilde{y} = \sum_{i=1}^D w_i (x_i + \varepsilon_i) = y + \sum_{i=1}^D w_i \varepsilon_i,$$

where $y = w^\top x = \sum_{i=1}^D w_i x_i$.
Q3 Solution
Then we start with
$$w^* = \arg\min_w \mathbb{E}_\varepsilon\big[(w^\top \tilde{x} - t)^2\big] = \arg\min_w \mathbb{E}_\varepsilon\big[(\tilde{y} - t)^2\big];$$

the inner term is

$$
\begin{aligned}
(\tilde{y} - t)^2 &= \tilde{y}^2 - 2\tilde{y} t + t^2 \\
&= \Big(y + \sum_{i=1}^D w_i \varepsilon_i\Big)^2 - 2t\Big(y + \sum_{i=1}^D w_i \varepsilon_i\Big) + t^2 \\
&= y^2 + 2y \sum_{i=1}^D w_i \varepsilon_i + \Big(\sum_{i=1}^D w_i \varepsilon_i\Big)^2 - 2ty - 2t \sum_{i=1}^D w_i \varepsilon_i + t^2.
\end{aligned}
$$

Then we take the expectation under the distribution of $\varepsilon$.
Q3 Solution
That is,
$$
\mathbb{E}_\varepsilon\Big[y^2 + 2y \sum_{i=1}^D w_i \varepsilon_i + \Big(\sum_{i=1}^D w_i \varepsilon_i\Big)^2 - 2ty - 2t \sum_{i=1}^D w_i \varepsilon_i + t^2\Big].
$$

The second and fifth terms are zero since $\mathbb{E}[\varepsilon_i] = 0$, while the third term becomes $\mathbb{E}_\varepsilon\big[(\sum_{i=1}^D w_i \varepsilon_i)^2\big] = \sum_{i,j} w_i w_j \mathbb{E}[\varepsilon_i \varepsilon_j] = \lambda \sum_{i=1}^D w_i^2$, because the cross terms ($i \neq j$) vanish. Finally, we see

$$
y^2 + \lambda \sum_{i=1}^D w_i^2 - 2ty + t^2 = (y - t)^2 + \lambda \sum_{i=1}^D w_i^2 = (y - t)^2 + \lambda \|w\|^2.
$$
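A Monte Carlo sanity check of the equivalence (all numbers hypothetical): averaging the noisy squared error over many draws of $\varepsilon$ should match $(w^\top x - t)^2 + \lambda \|w\|^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
D, lam = 4, 0.25
w = rng.normal(size=D)
x = rng.normal(size=D)
t = 1.5

# Draw eps with E[eps_i] = 0, E[eps_i eps_j] = 0 for i != j, E[eps_i^2] = lam.
eps = rng.normal(scale=np.sqrt(lam), size=(200_000, D))
noisy = np.mean(((x + eps) @ w - t) ** 2)

closed_form = (w @ x - t) ** 2 + lam * np.sum(w ** 2)
print(noisy, closed_form)  # should agree to a few decimal places
```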