Lecture 15 Regularized Linear Regression


Nakul Gopalan

Georgia Tech

Regularized Linear Regression

Machine Learning CS 4641

These slides are adapted from slides by Andrew Zisserman, Jonathan Taylor, Chao Zhang, Mahdi Roozbahani, and Yaser Abu-Mostafa.

Recap

• Linear regression: y = xθ

• Loss: mean squared error (MSE)
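Spelled out as equations (a brief restatement of the recap; the per-example index i is my notation, consistent with the y = zθ convention used later in the deck):

\hat{y}_i = x_i\,\theta, \qquad \mathrm{MSE}(\theta) \;=\; \frac{1}{N}\sum_{i=1}^{N}\bigl(y_i - x_i\,\theta\bigr)^{2}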

Polynomial regression

Polynomial regression when the order is not known

Outline

• Overfitting and regularized learning

• Ridge regression

• Lasso regression

• Determining regularization strength


Regression: Recap

z = [1, x, x², …, x^d] ∈ ℝ^(d+1)

y = zθ
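As a concrete illustration of this feature map, here is a minimal NumPy sketch (my own example; the toy data and helper names are not from the slides) that builds z and fits θ by ordinary least squares:

import numpy as np

def polynomial_features(x, d):
    # Map each scalar x_i to the row z_i = [1, x_i, x_i^2, ..., x_i^d].
    return np.vander(x, N=d + 1, increasing=True)

# Toy data: a noisy sinusoid (illustrative values only).
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.shape)

z = polynomial_features(x, d=3)                 # N x (d+1) design matrix
theta, *_ = np.linalg.lstsq(z, y, rcond=None)   # least-squares fit of y = z @ theta
y_hat = z @ theta                               # predictions on the training inputs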

Which One is Better?

[Figure: polynomial fits of degree D = 0, 1, 3, and 9]

No, this can lead to overfitting!

The Overfitting Problem

• The training error is very low, but the error on the test set is large.

• The model captures not only the underlying patterns but also the noise in the training data.

[Figure: degree D = 9 fit]

The Overfitting Problem

• In regression, overfitting is often associated with large weights (severe oscillation).

• How can we address overfitting?

[Figure: degree D = 9 fit]
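A small follow-up sketch (reusing the toy data and the hypothetical polynomial_features helper from the earlier sketch) that makes the "large weights" symptom visible by printing the largest coefficient magnitude at each degree:

for d in (1, 3, 9):
    z = polynomial_features(x, d)
    theta, *_ = np.linalg.lstsq(z, y, rcond=None)
    # With only 10 points, the degree-9 fit chases the noise and its weights
    # are typically orders of magnitude larger than those of the low-degree fits.
    print(d, np.abs(theta).max())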

Regularization (a smart way to cure the overfitting disease)

Fit a line to a sinusoid using just two data points.

Put a brake on the fitting.

Who is the winner?

ḡ(x): the average over all fitted lines

[Figure: without the brake, bias = 0.21 and var = 1.69; with the brake, bias = 0.23 and var = 0.33]

Minimize E(θ) + (λ/N) θᵀθ

Regularized Learning

Why does this term lead to regularization of the parameters?

Ẽ(θ) = (1/N) Σᵢ₌₁ᴺ (yᵢ − zᵢθ)² + (λ/N) θᵀθ

Polynomial Model

y = θ₀ + θ₁x + θ₂x² + ⋯ + θ_d x^d + ε

Let's rewrite it as:

y = θ₀ + θ₁z₁ + θ₂z₂ + ⋯ + θ_d z_d + ε = zθ

Regularizing is just constraining the weights (θ).

For example, let's apply a hard constraint:

y = θ₀ + θ₁z₁ + θ₂z₂ + ⋯ + θ_d z_d

subject to θ_j = 0 for j > 2

y = θ₀ + θ₁z₁ + θ₂z₂ + 0 + ⋯ + 0

Let's not penalize θ in such a harsh way; let's cut the weights some slack.

θ̂ = argmin_θ E(θ),   E(θ) = (1/N) Σᵢ₌₁ᴺ (yᵢ − zᵢθ)²

Minimize (1/N) (zθ − y)ᵀ(zθ − y)

subject to θᵀθ ≤ C

For simplicity, call θ_lin the weight solution of the unconstrained problem and θ the solution of the constrained one.

E(θ) = (1/N) (zθ − y)ᵀ(zθ − y)

[Figure: surface of E(θ) over different values of θ₀ and θ₁ for the given observation data; 3D view and top view]

Gradient of θᵀθ

θ = [θ₀, θ₁]ᵀ  ⇒  θᵀθ = θ₀² + θ₁²

If you imagine standing at a point (θ₀, θ₁), ∇(θᵀθ) tells you which direction you should travel to increase the value of θᵀθ most rapidly.

∇(θᵀθ) = [∂(θᵀθ)/∂θ₀, ∂(θᵀθ)/∂θ₁]ᵀ = [2θ₀, 2θ₁]ᵀ ∝ [θ₀, θ₁]ᵀ

∇(θᵀθ) is a vector field: it points radially outward, along the line through the center of the circle and the current point.
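As a quick worked instance (my own numbers, not from the slides):

\theta = \begin{bmatrix} 1 \\ 2 \end{bmatrix}
\;\Rightarrow\;
\nabla(\theta^{T}\theta) = \begin{bmatrix} 2 \\ 4 \end{bmatrix} = 2\,\theta ,

which indeed points from the origin through the point itself, i.e. radially outward from the center of the circle.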

Plotting the regularization term θᵀθ

θ = [θ₀, θ₁]ᵀ  ⇒  θᵀθ = θ₀² + θ₁²

[Figure: surface of θᵀθ; 3D view and top view]

E(θ) = (1/N) (zθ − y)ᵀ(zθ − y)

θ_lin is the unconstrained solution (the absolute minimum).

E(θ) is constant on the surface of each ellipse (contour).

Find the θ that minimizes E(θ) subject to θᵀθ ≤ C.

[Figure: contours of E(θ) around θ_lin in the (θ₀, θ₁) plane]

Constraint and Loss

[Figure: contours of E(θ) around θ_lin together with the constraint region in the (θ₀, θ₁) plane]

Considering the E(θ) and C below, what is a candidate θ here?

[Figure: contours of E(θ) around θ_lin, the constraint boundary θᵀθ = C, and the gradient ∇E(θ)]

∇E(θ): the gradient of the objective function. It is orthogonal to the ellipse (the contour of constant error), so changes in the error happen in that orthogonal direction.

What is the orthogonal direction on the other surface, the constraint θᵀθ = C? It is just θ itself: ∇(θᵀθ) points along the line through the center of the circle.

Applying the constraint θᵀθ ≤ C, where does the best solution occur? On the boundary of the circle, since that is the closest we can get to the unconstrained minimum θ_lin.

Considering the E(θ) and C below, what is the new θ solution here?

[Figure: at the constrained solution θ, the gradients ∇E(θ) and ∇(θᵀθ) point in opposite directions]

At that point, ∇E(θ) ∝ −θ. Writing the proportionality constant as 2λ/N:

∇E(θ) = −(2λ/N) θ  ⇒  ∇E(θ) + (2λ/N) θ = 0

Let's integrate: this is exactly the stationarity condition of

Minimize E(θ) + (λ/N) θᵀθ

A looser constraint corresponds to a weaker penalty: C ↑ ⇔ λ ↓
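The same equivalence can be stated compactly (a standard Lagrangian formulation; the slides reach it geometrically):

\min_{\theta} \; E(\theta) \;\; \text{subject to} \;\; \theta^{T}\theta \le C
\quad\Longleftrightarrow\quad
\min_{\theta} \; E(\theta) + \frac{\lambda}{N}\,\theta^{T}\theta \;\;\; \text{for some } \lambda \ge 0,

with a looser constraint (larger C) corresponding to a smaller penalty λ.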

Outline

• Overfitting and regularized learning

• Ridge regression

• Lasso regression

• Determining regularization strength


Ridge Regression

f(x, θ) = θ₀ + θ₁z₁ + θ₂z₂ + ⋯ + θ_d z_d + ε = zθ

Ẽ(θ) = (1/N) Σᵢ₌₁ᴺ (yᵢ − zᵢθ)² + (λ/N) ‖θ‖²

Solving for the Weights θ

zθ = [z(x₁)θ, z(x₂)θ, …, z(xₙ)θ]ᵀ, where

z = [[1, z₁(x₁), …, z_d(x₁)],
     [1, z₁(x₂), …, z_d(x₂)],
     ⋮
     [1, z₁(xₙ), …, z_d(xₙ)]]   (the N × (d+1) design matrix)

θ = [θ₀, θ₁, …, θ_d]ᵀ

Ẽ(θ) = (1/N) Σᵢ₌₁ᴺ (yᵢ − zᵢθ)² + (λ/N) ‖θ‖² = (1/N) (y − zθ)ᵀ(y − zθ) + (λ/N) ‖θ‖²

dẼ(θ)/dθ ∝ −zᵀ(y − zθ) + λθ = 0

(zᵀz + λI) θ = zᵀy

θ = (zᵀz + λI)⁻¹ zᵀy

Dimensions in θ = (zᵀz + λI)⁻¹ zᵀ y (for N > D, where D is the number of features): θ is D×1, (zᵀz + λI)⁻¹ is D×D, zᵀ is D×N, and y is N×1.

With λ = 0 (and zᵀz invertible) this reduces to the ordinary least-squares solution

θ = (zᵀz)⁻¹ zᵀy = z⁺y,

where z⁺ = (zᵀz)⁻¹ zᵀ is the pseudoinverse of z.
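A minimal NumPy sketch of this closed-form solution (my own illustration; names are not from the slides):

import numpy as np

def ridge_fit(z, y, lam):
    # Closed-form ridge solution: theta = (z^T z + lam * I)^{-1} z^T y.
    d = z.shape[1]
    return np.linalg.solve(z.T @ z + lam * np.eye(d), z.T @ y)

# With lam = 0 (and z^T z invertible) this reduces to ordinary least squares,
# i.e. the pseudoinverse solution theta = np.linalg.pinv(z) @ y.

For example, theta = ridge_fit(polynomial_features(x, 9), y, lam=1e-3) regularizes the degree-9 fit from the earlier sketches.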

Ridge Regression Example

f(x, θ) = zθ, with the feature map z: x → [1, x, x², …, x^D]   (D+1 features)

Ẽ(θ) = (1/N) Σᵢ₌₁ᴺ (f(xᵢ, θ) − yᵢ)² + (λ/N) ‖θ‖²

[Figures: regularized polynomial fits for D = 3, D = 5, and D = 7]
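One way to reproduce this kind of experiment is with scikit-learn (an assumed tool choice, not something the lecture specifies); the degrees follow the slide and the λ value is a placeholder:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge

# Reuses the toy x, y from the earlier sketch; Ridge's alpha plays the role of lambda.
for degree in (3, 5, 7):
    model = make_pipeline(PolynomialFeatures(degree), Ridge(alpha=1e-3))
    model.fit(x.reshape(-1, 1), y)
    print(degree, model.score(x.reshape(-1, 1), y))   # R^2 on the training data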

Outline

• Overfitting and regularized learning

• Ridge regression

• Lasso regression

• Determining regularization strength


Regularized Regression

Now let's look at another regularization choice.

The Lasso Regularization (norm one)

Ẽ(θ) = (1/N) Σᵢ₌₁ᴺ (yᵢ − zᵢθ)² + (λ/N) ‖θ‖₁   (where ‖θ‖₁ = Σⱼ |θⱼ|)

Let's say we have two parameters (θ₀ and θ₁).

Minimize E(θ) = (1/N) (zθ − y)ᵀ(zθ − y)

subject to ‖θ‖₁ ≤ C

E(θ) is constant on the surface of each ellipse around the unconstrained solution θ_lin.

[Figure: in the (θ₀, θ₁) plane the L1 constraint region is a diamond with corners at (C, 0), (0, C), (−C, 0), and (0, −C); the contours of E(θ) typically first touch it at one of these sharp edges]

At a sharp edge the solution has the form θ = [θ₀, 0]ᵀ, i.e. some weights are exactly zero.

This is an interesting way to do feature selection.
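A small sketch of this sparsity effect (my own synthetic example; scikit-learn and the alpha values are assumptions, not from the slides). Ridge shrinks all weights but keeps them nonzero, while lasso drives many of them exactly to zero:

import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
true_theta = np.zeros(10)
true_theta[:3] = (2.0, -1.0, 0.5)            # only 3 of the 10 features matter
y = X @ true_theta + 0.1 * rng.standard_normal(100)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

print(np.round(ridge.coef_, 3))   # all 10 weights shrunk but nonzero
print(np.round(lasso.coef_, 3))   # many weights exactly zero -> feature selection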

Outline

• Overfitting and regularized learning

• Ridge regression

• Lasso regression

• Determining regularization strength


Leave-One-Out Cross Validation


K-Fold Cross Validation


Choosing λ Using a Validation Dataset

Pick the λ with the lowest mean RMSE computed by cross-validation.
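A minimal sketch of this selection procedure (my own illustration, reusing the hypothetical ridge_fit and polynomial_features helpers from the earlier sketches): for each candidate λ, average the validation RMSE over k folds and keep the λ with the lowest mean:

import numpy as np

def kfold_rmse(z, y, lam, k=5):
    # Mean validation RMSE of the closed-form ridge fit over k folds.
    n = len(y)
    folds = np.array_split(np.random.permutation(n), k)
    errors = []
    for val_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), val_idx)
        theta = ridge_fit(z[train_idx], y[train_idx], lam)
        residual = y[val_idx] - z[val_idx] @ theta
        errors.append(np.sqrt(np.mean(residual ** 2)))
    return np.mean(errors)

lambdas = np.logspace(-6, 2, 20)
z = polynomial_features(x, 9)
best_lam = min(lambdas, key=lambda lam: kfold_rmse(z, y, lam))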

Take-Home Messages

• What is overfitting

• What is regularization

• How does Ridge regression work

• Sparsity properties of Lasso regression

• How to choose the regularization coefficient λ
