TMI04.2 Linear Regression
Transcript of TMI04.2 Linear Regression
-
Institut für Informatik
Linear regression with gradient descent
Ingmar Schuster
Patrick Jähnichen
using slides by Andrew Ng
-
Linear regression w. gradient descent 2
This lecture covers
Linear Regression
Hypothesis formulation, hypothesis space
Optimizing Cost with Gradient Descent
Using multiple input features with Linear Regression
Feature Scaling
Nonlinear Regression
Optimizing Cost using derivatives
-
Linear Regression
-
Price for buying a flat in Berlin
Supervised learning problem
Expected answer available for each example in data
Regression Problem
Prediction of continuous output
-
Training data of flat prices
Notation
m: number of training examples
x: input (predictor) variable ("features" in ML-speak)
y: output (response) variable

Square meters    Price in 1000
73               174
146              367
38               69
124              257
...              ...
-
Learning procedure
Hypothesis (linear regression, one input variable, i.e. univariate):
h_θ(x) = θ0 + θ1 · x
θ0 and θ1 are the hypothesis parameters. How to choose them?
Training data -> Learning Algorithm -> hypothesis h (mapping between input and output)
Size of flat -> h -> Estimated price
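As an illustration, the univariate hypothesis is a one-line function. This is a sketch; the parameter values below are made up, not fitted:

```python
# Univariate linear-regression hypothesis: h(x) = theta0 + theta1 * x.
def hypothesis(theta0, theta1, x):
    """Predict the output (e.g. estimated flat price) from one input feature."""
    return theta0 + theta1 * x

# Illustrative parameters: base price 0, plus 2.5 (thousand) per square meter.
print(hypothesis(0.0, 2.5, 73))  # 182.5
```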
-
Optimization objective
Purpose of the learning algorithm is expressed in an optimization objective and a cost function (often called J)
Possible objectives:
Fit data well
Few false negatives
Few false positives
...
-
Fitting data well: least squares cost function
In regression we almost always want to fit the data well:
smallest average distance to the points in the training data (h(x) close to y for (x, y) in training data)
Cost function, often named J (m: number of training instances):
J(θ0, θ1) = 1/(2m) · Σ_{i=1..m} (h_θ(x^(i)) - y^(i))^2
Squaring means:
the penalty for positive and negative deviations is the same
the penalty for large deviations is stronger
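The least-squares cost translates directly into code; a minimal sketch (function and variable names are mine):

```python
def cost(theta0, theta1, xs, ys):
    """J(theta0, theta1) = 1/(2m) * sum over training data of (h(x) - y)^2."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# A hypothesis that hits every training point exactly has zero cost:
print(cost(0.0, 2.0, [1, 2, 3], [2, 4, 6]))  # 0.0
```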
-
Optimizing Cost with Gradient Descent
-
Gradient Descent Outline
Want to minimize J(θ0, θ1)
Start with random θ0, θ1
Keep changing θ0, θ1 to reduce J(θ0, θ1) until we end up at a minimum
-
3D plots and contour plots
[Plot by Andrew Ng: 3D surface and contour plots of the cost function, showing the stepwise descent towards the minimum. Annotation: solving via derivatives works only for few parameters.]
-
Gradient descent
Repeat until convergence, updating θ0 and θ1 simultaneously:
θ_j := θ_j - α · ∂/∂θ_j J(θ0, θ1)   (α: learning rate; ∂/∂θ_j: partial derivative)
Beware: an incremental update (using the new θ0 when computing the gradient for θ1) is incorrect!
Steps become smaller without changing the learning rate, because the gradient shrinks near the minimum.
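One gradient-descent step for the univariate case, with the simultaneous update the slide warns about (a sketch; names are mine):

```python
def gradient_step(theta0, theta1, xs, ys, alpha):
    """One step of gradient descent on the least-squares cost.

    Both gradients are computed from the OLD parameters before either
    parameter is updated -- an incremental update would be incorrect.
    """
    m = len(xs)
    errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    grad0 = sum(errors) / m                              # dJ/dtheta0
    grad1 = sum(e * x for e, x in zip(errors, xs)) / m   # dJ/dtheta1
    return theta0 - alpha * grad0, theta1 - alpha * grad1
```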
-
Learning Rate considerations
A small learning rate leads to slow convergence
An overly large learning rate may prevent convergence or even cause divergence
Often
-
Checking convergence
Gradient descent works correctly if J decreases with every step
Possible convergence criterion: converged if J decreases by less than some small constant
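The criterion can be sketched as a check on the history of cost values (the threshold name `epsilon` and its default are my choices):

```python
def has_converged(costs, epsilon=1e-6):
    """True once the last gradient-descent step decreased the cost J
    by less than the constant epsilon."""
    if len(costs) < 2:
        return False
    return costs[-2] - costs[-1] < epsilon
```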
-
Local Minima
Gradient descent can get stuck at local minima (e.g. when J is not the squared error, even for regression with only one variable)
Remedy: random restart with different initial parameter(s)
-
Variants of Gradient Descent
Using multiple input features
-
Multiple features
x1: Square meters, x2: Bedrooms, x3: Floors, x4: Age of building (years), y: Price in 1000

x1    x2   x3   x4   y
200   5    1    45   460
131   3    2    40   232
142   3    2    30   315
756   2    1    36   178
Notation
-
Hypothesis representation
h_θ(x) = θ0 + θ1·x1 + θ2·x2 + ... + θn·xn
More compact: h_θ(x) = θ^T x, with the definition x0 = 1
-
Gradient descent for multiple variables
Generalized cost function: J(θ) = 1/(2m) · Σ_{i=1..m} (h_θ(x^(i)) - y^(i))^2
Generalized gradient descent: θ_j := θ_j - α · ∂/∂θ_j J(θ), simultaneously for all j = 0, ..., n
-
Partial derivative of cost function for multiple variables
Calculating the partial derivative yields:
∂/∂θ_j J(θ) = 1/m · Σ_{i=1..m} (h_θ(x^(i)) - y^(i)) · x_j^(i)
-
Gradient descent for multiple variables
Simplified gradient descent: repeat, simultaneously for all j:
θ_j := θ_j - α · 1/m · Σ_{i=1..m} (h_θ(x^(i)) - y^(i)) · x_j^(i)
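Vectorized with NumPy, the simultaneous update of all θ_j becomes one line. A sketch, assuming the design matrix X already carries the x0 = 1 column:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, iterations=1000):
    """Batch gradient descent on the least-squares cost.
    X: (m, n+1) design matrix whose first column is all ones (x0 = 1)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        errors = X @ theta - y                       # h(x^(i)) - y^(i) for all i
        theta = theta - alpha * (X.T @ errors) / m   # simultaneous update of all theta_j
    return theta
```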
-
Convergence considerations for multiple variables
With multiple variables, the features' variances are no longer comparable (scales can vary strongly):
Square meters: 30 - 400
Bedrooms: 1 - 10
Price: 80 000 - 2 000 000
Gradient descent converges faster for features on a similar scale
-
Feature Scaling
-
Feature scaling
Different approaches exist for converting features to a comparable scale.
Min-max scaling makes all data fall into the range [0, 1]; for a single data point x_j^(i) of feature j:
x_j^(i) := (x_j^(i) - min_j) / (max_j - min_j)
Z-score conversion
-
Z-Score conversion
Center data on 0
Scale data so the majority falls into the range [-1, 1]
Z-score conversion of a single data point x_j^(i) of feature j:
x_j^(i) := (x_j^(i) - μ_j) / σ_j
where μ_j is the mean (empirical expected value) and σ_j is the empirical standard deviation of feature j
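Both conversions in code (a sketch; note that `np.std` computes the population standard deviation by default):

```python
import numpy as np

def min_max_scale(x):
    """Map a feature column into the range [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())

def z_score(x):
    """Center a feature column on 0 (subtract the mean mu) and divide by its
    empirical standard deviation sigma, so most values fall into [-1, 1]."""
    return (x - x.mean()) / x.std()
```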
-
Visualizing standard deviation
-
Nonlinear Regression (by cheap trickery)
-
Nonlinear Regression Problems
-
Nonlinear Regression Problems (linear approximation)
-
Nonlinear Regression Problems (nonlinear hypothesis)
-
Nonlinear Regression with cheap trickery
Linear Regression can be used for Nonlinear Problems
Choose nonlinear hypothesis space
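The trick in code: expand the single feature into its powers, then run ordinary linear regression on the expanded matrix. A sketch (feature scaling becomes important here, since x, x^2, x^3 live on very different scales):

```python
import numpy as np

def polynomial_features(x, degree):
    """Expand one feature into columns [x, x^2, ..., x^degree] so that a
    LINEAR hypothesis in the new features is a polynomial in the original one."""
    return np.column_stack([x ** d for d in range(1, degree + 1)])
```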
-
Optimizing cost using derivatives
-
Comparison Gradient Descent vs. Setting derivative = 0
Instead of gradient descent, solve
∂/∂θ_i J(θ) = 0 for all i
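For linear least squares this system has the closed-form solution θ = (X^T X)^(-1) X^T y (the normal equation). A sketch using a linear solve rather than an explicit matrix inverse:

```python
import numpy as np

def normal_equation(X, y):
    """Solve dJ/dtheta_i = 0 for all i in closed form:
    theta = (X^T X)^(-1) X^T y, computed via a linear solve for stability."""
    return np.linalg.solve(X.T @ X, X.T @ y)
```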
-
Comparison Gradient Descent vs. Setting derivative = 0
Gradient Descent:
need to choose the learning rate α
needs many iterations, random restarts etc.
works well for many features
Setting derivative = 0:
no need to choose α
no iterations
slow for many features
-
This lecture covers
Linear Regression
Hypothesis formulation, hypothesis space
Optimizing Cost with Gradient Descent
Using multiple input features with Linear Regression
Feature Scaling
Nonlinear Regression
Optimizing Cost using derivatives
-
Pictures
Some public domain plots from en.wikipedia.org and de.wikipedia.org