Regression & Regularization
Danna GurariUniversity of Texas at Austin
Spring 2020
https://www.ischool.utexas.edu/~dannag/Courses/IntroToMachineLearning/CourseContent.html
Review
• Last week:
  • Machine learning today
  • History of machine learning
  • How does a machine learn?
• Assignments (Canvas)
  • Problem Set 1 due yesterday
  • Problem Set 2 due next week
  • Lab Assignment 1 due in two weeks
• Questions?
Today’s Topics
• Regression applications
• Evaluating regression models
• Background: notation
• Linear regression
• Polynomial regression
• Regularization (Ridge regression and Lasso regression)
• Lab
Today’s Focus: Regression
Predict continuous value
Predict Life Expectancy

Predict Perceived “Hot”-ness

Predict Price to Charge for Your Home

Predict Future Value of a House You Buy

Predict Future Stock Price

Predict Credit Score for Loan Lenders
https://emerj.com/ai-sector-overviews/artificial-intelligence-applications-lending-loan-management/
Demo: https://www.youtube.com/watch?time_continue=6&v=0bEJO4Twgu4&feature=emb_logo
What Else to Predict?
Insurance Cost · Popularity of Social Media Posts · Public Opinion
Political Party Preference · Factory Analysis · Call Center Complaints
Class Ratings · Weather · Animal Behavior
Today’s Topics
• Regression applications
• Evaluating regression models
• Background: notation
• Linear regression
• Polynomial regression
• Regularization (Ridge regression and Lasso regression)
• Lab
Goal: Design Models that Generalize Well to New, Previously Unseen Examples
Example: house photos with known costs, e.g., $450,900, $725,000, $1,045,864, $918,000
Goal: Design Models that Generalize Well to New, Previously Unseen Examples
1. Split data into a “training set” and “test set”
(Example: house photos with costs $450,900, $1,045,864, and $918,000 go to Training Data; $725,000 goes to Test Data)
Goal: Design Models that Generalize Well to New, Previously Unseen Examples
2. Train model on “training set” to try to minimize prediction error on it
(Training Data: house photos with costs $450,900, $1,045,864, $918,000)
Goal: Design Models that Generalize Well to New, Previously Unseen Examples
3. Apply trained model on “test set” to measure generalization error
(Test Data: house photo with true cost $725,000 → Prediction Model → Predicted Cost: ?)
Goal: Design Models that Generalize Well to New, Previously Unseen Examples
3. Apply trained model on “test set” to measure generalization error
(Test Data: house photo with true cost $725,000 → Prediction Model → Predicted Cost: $615,000)
Regression Evaluation Metrics
• Mean absolute error
  • What is the range of possible values?
  • Are larger values better or worse?
Regression Evaluation Metrics
• Mean absolute error
• Mean squared error
  • Why square the errors?
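As a sketch of these two metrics with hypothetical true and predicted house costs (the numbers are illustrative only): MAE averages absolute errors, MSE averages squared errors, so MSE punishes large misses much more.

```python
import numpy as np

# Hypothetical true and predicted house costs (illustrative numbers only)
y_true = np.array([450_900.0, 725_000.0, 918_000.0])
y_pred = np.array([500_000.0, 700_000.0, 900_000.0])

mae = np.mean(np.abs(y_true - y_pred))   # mean absolute error
mse = np.mean((y_true - y_pred) ** 2)    # mean squared error

print(mae)  # 30700.0
print(mse)
```

Both metrics range over [0, ∞), and smaller values are better.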
Today’s Topics
• Regression applications
• Evaluating regression models
• Background: notation
• Linear regression
• Polynomial regression
• Regularization (Ridge regression and Lasso regression)
• Lab
Matrices and Vectors
• X: each feature is in its own column and each sample is in its own row
• y: each row is the target value for the sample
(Figure: matrix X with rows Sample 1 … Sample N and columns Feature 1 … Feature M, alongside label vector y)
Matrices and Vectors
• X: each feature is in its own column and each sample is in its own row
• y: each row is the target value for the sample
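This layout maps directly onto NumPy arrays; a minimal sketch with hypothetical feature values:

```python
import numpy as np

# X: one sample per row, one feature per column (hypothetical values)
X = np.array([[0.5, 121.0, 0.3],
              [0.7, 100.0, 0.81]])
# y: one target value per sample
y = np.array([0.8, 0.1])

assert X.shape == (2, 3)  # N=2 samples, M=3 features
assert y.shape == (2,)    # one label per sample
```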
Vector-Vector Product
e.g.,
Excellent Review: http://www.cs.cmu.edu/~zkolter/course/15-884/linalg-review.pdf
[1 2 3] · [4 5 6]ᵀ = (1×4 + 2×5 + 3×6) = 32

In general: [w1 w2 w3] · [x1 x2 x3]ᵀ = w1·x1 + w2·x2 + w3·x3
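The same inner product as NumPy code:

```python
import numpy as np

w = np.array([1, 2, 3])
x = np.array([4, 5, 6])

# Inner product: w[0]*x[0] + w[1]*x[1] + w[2]*x[2]
result = w @ x
print(result)  # 1*4 + 2*5 + 3*6 = 32
```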
Class Task: Predict Your Salary If You Become a Machine Learning Engineer
• What features would be predictive of your salary?
• Where can you find data for model training and evaluation (features + true values)?
• What would introduce noise to your data?
• Create a matrix/vector representation of three examples.
Today’s Topics
• Regression applications
• Evaluating regression models
• Background: notation
• Linear regression
• Polynomial regression
• Regularization (Ridge regression and Lasso regression)
• Lab
Linear Regression: Historical Context
1613: Human “Computers”
Early 1800s: Learning Linear Regression Models with Least Squares
1945: First programmable machine
1956: Turing Test & AI
1959: Machine Learning
1974–1980: 1st AI Winter
1987–1993: 2nd AI Winter
Legendre (1805) and Gauss (1809): https://en.wikipedia.org/wiki/Linear_regression
Linear Regression Model
• General formula: ŷ = w[0]·x[0] + w[1]·x[1] + … + w[p]·x[p] + b
• How many features are there? p+1
• How many parameters are there? p+2
• Feature vector: x = x[0], x[1], …, x[p]
• Parameter vector to learn: w = w[0], w[1], …, w[p], plus intercept b
• Predicted value: ŷ
“Simple” Linear Regression Model
• Formula: ŷ = w[0]·x[0] + b
• How many features are there? 1
• How many parameters are there? 2
(Figure: one-feature data with fitted line; axes: Feature x vs. Target)
Figure Credit: http://sli.ics.uci.edu/Classes/2015W-273a?action=download&upname=04-linRegress.pdf
“Multiple” Linear Regression Model
• Formula: ŷ = w[0]·x[0] + w[1]·x[1] + b
• How many features are there? 2
• How many parameters are there? 3
(Figure: two-feature data with fitted plane)
Figure Credit: http://sli.ics.uci.edu/Classes/2015W-273a?action=download&upname=04-linRegress.pdf
Linear Regression Model: How to Learn?
• Weight coefficients: indicate how much the predicted value will vary when that feature varies while holding all the other features constant
Linear Regression Model: Learning Parameters
• Split data into a “training set” and “test set”
(Figure: samples 1 … N with features and labels, each marked Yes/No for training-set membership)
Linear Regression Model: Learning Parameters
• Least squares: minimize total squared error (“residual”) on “training set”
• Why square the error?
Figure Credit: http://sli.ics.uci.edu/Classes/2015W-273a?action=download&upname=04-linRegress.pdf
• Least squares: minimize total squared error (“residual”) on “training set”
Linear Regression Model: Learning Parameters
Figure Source: https://web.stanford.edu/~hastie/Papers/ESLII.pdf
Linear Regression Model: Learning Parameters
• Least squares: minimize total squared error (“residual”) on “training set”
• Take derivatives, set to zero, and solve for parameters
Great tutorial: http://www.cns.nyu.edu/~eero/NOTES/leastSquares.pdf
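A minimal NumPy sketch of that closed-form solution (the “normal equations”, w = (XᵀX)⁻¹Xᵀy), on synthetic noise-free data so the hand-picked parameters are recovered exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))            # 100 samples, 2 features
X1 = np.hstack([np.ones((100, 1)), X])   # prepend a 1s column for the intercept b

true_params = np.array([1.0, 2.0, -3.0]) # [b, w1, w2], chosen for illustration
y = X1 @ true_params                     # noise-free targets

# Setting the derivative of the squared error to zero gives X1^T X1 w = X1^T y
w = np.linalg.solve(X1.T @ X1, X1.T @ y)
print(np.round(w, 6))                    # recovers [1, 2, -3]
```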
Linear Regression Model: Learning Parameters
• Least squares: minimize total squared error (“residual”) on “training set”
• What would be the impact of outliers in the training data?
Figure Credit: http://sli.ics.uci.edu/Classes/2015W-273a?action=download&upname=04-linRegress.pdf
Linear Regression: Predict Salary of ML Engineer
• How would you write the linear model equation?
• How is the weight of different predictive cues learned?
(Solution is a hyperplane)
Today’s Topics
• Regression applications
• Evaluating regression models
• Background: notation
• Linear regression
• Polynomial regression
• Regularization (Ridge regression and Lasso regression)
• Lab
Linear Regression: Historical Context
1613: Human “Computers”
Early 1800s: Linear regression
1815: Learning Polynomial Regression Models with Least Squares
1945: First programmable machine
1956: Turing Test & AI
1959: Machine Learning
1974–1980: 1st AI Winter
1987–1993: 2nd AI Winter

Gergonne, J. D. (November 1974) [1815]. “The application of the method of least squares to the interpolation of sequences”. Historia Mathematica (translated by Ralph St. John and S. M. Stigler from the 1815 French ed.). 1 (4): 439–447.
Linear Models: When They Are Not Good Enough, Increase Representational Capacity
linear equations (lowest capacity)
polynomial equations (higher capacity)
polynomial equations (highest capacity)
Polynomial Regression: Transform Features to Model Non-Linear Relationships
• e.g., (Recall) Formula: ŷ = w[0]·x[0] + b
• e.g., New Formula: ŷ = w[0]·x[0] + w[1]·x[0]² + w[2]·x[0]³ + b
• Still a linear model!
• But can now model more complex relationships!!
Polynomial Regression: Transform Features to Model Non-Linear Relationships
• e.g., feature conversion for polynomial degree up to 3: x → (x, x², x³)
  • Example 1: x = 2 → (2, 4, 8)
  • Example 2: x = 4 → (4, 16, 64)
  • Example 3: x = 3 → (3, 9, 27)
Polynomial Regression: Transform Features to Model Non-Linear Relationships
• General idea: project data into a higher dimension so that more complicated relationships can be captured with a linear fit
• How to project data into a higher dimension?
Polynomial Regression Model: Learning Parameters
• M-th order polynomial function: ŷ = w[0] + w[1]·x + w[2]·x² + … + w[M]·x^M
• Still a linear model, so it can be learned with the same approach as linear regression
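A NumPy sketch of that idea: build the polynomial design matrix, then reuse ordinary least squares (the degree-2 target function here is made up for illustration):

```python
import numpy as np

x = np.linspace(-2, 2, 20)
y = 2 + 3 * x - x ** 2          # hypothetical degree-2 ground truth

# Transform the single feature x into columns (1, x, x^2)
Phi = np.vander(x, N=3, increasing=True)

# The model is still linear in its parameters, so plain least squares applies
w, residuals, rank, sv = np.linalg.lstsq(Phi, y, rcond=None)
print(np.round(w, 6))           # recovers [2, 3, -1]
```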
Polynomial Regression Model: What Feature Transformation to Use?
• Example plot of error on training and test sets:
• What happens to training data error with larger polynomial order?
  • Shrinks
• What happens to test data error with larger polynomial order?
  • Error shrinks and then grows
  • Higher order can model noise!
  • Higher order is therefore more likely to “overfit” to the training data and so not generalize to new, unobserved test data
How to Avoid Overfitting?
• Use a lower-degree polynomial
• Risk: may be underfitting again
How to Avoid Overfitting?
• Add more training data
• What are the challenges/costs with collecting more training data?
How to Avoid Overfitting?
• Or regularize the model…
Today’s Topics
• Regression applications
• Evaluating regression models
• Background: notation
• Linear regression
• Polynomial regression
• Regularization (Ridge regression and Lasso regression)
• Lab
Linear Regression: Historical Context
1613: Human “Computers”
1815: Linear & polynomial regression
1945: First programmable machine
1956: Turing Test & AI
1959: Machine Learning
1970: Ridge Regression
1974–1980: 1st AI Winter
1986: Lasso Regression
1987–1993: 2nd AI Winter

Hoerl, Arthur E.; Kennard, Robert W. (1970). “Ridge regression: Biased estimation for nonorthogonal problems”. Technometrics.
Santosa, Fadil; Symes, William W. (1986). “Linear inversion of band-limited reflection seismograms”. SIAM Journal on Scientific and Statistical Computing. 7 (4): 1307–1330.
Tibshirani, Robert (1996). “Regression Shrinkage and Selection via the lasso”. Journal of the Royal Statistical Society, Series B (methodological). 58 (1): 267–288.
Problem: Overfitting
Figure Source: https://en.wikipedia.org/wiki/Overfitting
Problem: Overfitting
• e.g., weights learned for fitting a model to a sine wave function (polynomial degrees 0, 1, …, 9)
• Sign of overfitting: weights blow up and cancel each other out to fit the training data
Solution: Regularization
• Regularize model (add constraints)
• Idea: add constraint to minimize presence of large weights in models!
Regularization
• Idea: add constraint to minimize presence of large weights in models
• Recall: we previously learned models by minimizing the sum of squared errors (SSE) over all n training examples: SSE = Σ (yᵢ − ŷᵢ)², for i = 1, …, n
Regularization
• Idea: add constraint to minimize presence of large weights in models
• Recall: we previously learned models by minimizing the sum of squared errors (SSE) over all n training examples
• Ridge Regression: add constraint to penalize squared weight values: minimize SSE + α · Σ w[j]²
• Lasso Regression: add constraint to penalize absolute weight values: minimize SSE + α · Σ |w[j]|
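Ridge has a closed form much like least squares: w = (XᵀX + αI)⁻¹Xᵀy. A minimal sketch on synthetic data (as a simplification it penalizes every weight uniformly and omits the intercept; all names and values are illustrative):

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form minimizer of SSE + alpha * (sum of squared weights)."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([5.0, 0.0, -5.0]) + rng.normal(scale=0.1, size=50)

w_ols   = ridge_fit(X, y, alpha=0.0)    # alpha=0 reduces to least squares
w_ridge = ridge_fit(X, y, alpha=100.0)  # large alpha shrinks weights toward 0

# Regularization shrinks the weight vector's norm
assert np.linalg.norm(w_ridge) < np.linalg.norm(w_ols)
```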
Regularization: How to Set Alpha?
• Ridge Regression: add constraint to penalize squared weight values
• Lasso Regression: add constraint to penalize absolute weight values
• What happens when you set alpha to a small value?
• What happens when you set alpha to a large value?
Regularization: How to Set Alpha?
• Split training data into “train” and “validation” datasets
• Algorithm: brute-force, exhaustive search that evaluates every candidate alpha value to find the best-performing hyperparameter
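The two steps above can be sketched as a grid search over candidate alpha values, here using a closed-form ridge solver as the model (all names, candidate alphas, and split sizes are illustrative):

```python
import numpy as np

def ridge_fit(X, y, alpha):
    # Closed-form ridge: w = (X^T X + alpha*I)^{-1} X^T y
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(2)
X = rng.normal(size=(120, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.5, size=120)

# 1. Split the training data into "train" and "validation" sets
X_tr, y_tr = X[:80], y[:80]
X_va, y_va = X[80:], y[80:]

# 2. Evaluate every candidate alpha; keep the one with lowest validation MSE
best_alpha, best_mse = None, np.inf
for alpha in [0.01, 0.1, 1.0, 10.0, 100.0]:
    w = ridge_fit(X_tr, y_tr, alpha)
    mse = np.mean((X_va @ w - y_va) ** 2)
    if mse < best_mse:
        best_alpha, best_mse = alpha, mse

print(best_alpha, best_mse)
```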
Why Choose Lasso Instead of Ridge Regression?
• Ridge Regression: add constraint to penalize squared weight values
• Lasso Regression: add constraint to penalize absolute weight values
• Lasso typically creates sparse weight vectors (sets weights to 0)
  • Good to use when there are MANY features and few are believed to be relevant
  • Increases interpretability
Why Choose Lasso Instead of Ridge Regression? (2D feature example)
https://web.stanford.edu/~hastie/Papers/ESLII.pdf
(Figure: contours of the least-squares error function together with the contours of the regularization constraint functions for Lasso and Ridge; marked points show where the SSE cost alone, the constraint penalty alone, and cost + penalty are each minimized)
Today’s Topics
• Regression applications
• Evaluating regression models
• Background: notation
• Linear regression
• Polynomial regression
• Regularization (Ridge regression and Lasso regression)
• Lab
Resources Used for Today’s Slides
• Deep Learning by Goodfellow et al.
  • pgs. 29–38 for background on linear algebra (e.g., matrices, norms)
• http://www.cs.utoronto.ca/~fidler/teaching/2015/slides/CSC411/
• http://www.cs.cmu.edu/~epxing/Class/10701/lecture.html
• http://web.cs.ucla.edu/~sriram/courses/cs188.winter-2017/html/index.html
• https://people.eecs.berkeley.edu/~jrs/189/
• http://alex.smola.org/teaching/cmu2013-10-701/
• http://sli.ics.uci.edu/Classes/2015W-273a