Linear Regression - Sameer Singh (sameersingh.org/courses/gml/fa17/slides/04-lin...)
Linear Regression
PROF. SAMEER SINGH
FALL 2017
CS 273A: Machine Learning
based on slides by Alex Ihler
Machine Learning
Bayes Error Wrapup
Gaussian Bayes Models
Linear Regression: Definition and Cost
Gradient Descent Algorithms
Measuring errors
Confusion matrix
Can extend to more classes
True positive rate: #(y=1 , ŷ=1) / #(y=1) -- “sensitivity”
False negative rate: #(y=1 , ŷ=0) / #(y=1)
False positive rate: #(y=0 , ŷ=1) / #(y=0)
True negative rate: #(y=0 , ŷ=0) / #(y=0) -- “specificity”
        Predict 0   Predict 1
Y=0     380         5
Y=1     338         3
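As a rough illustration of the four rates defined above, the sketch below computes them from the confusion matrix counts on this slide. The function name `rates` and the row/column layout `C[y, yhat]` are assumptions made for this example, not part of the course code.

```python
import numpy as np

# Confusion matrix from the slide: rows are the true class Y,
# columns are the predicted class (assumed layout C[y, yhat]).
#              Predict 0  Predict 1
C = np.array([[380, 5],    # Y=0
              [338, 3]])   # Y=1

def rates(C):
    """Return (TPR, FNR, FPR, TNR) for a 2x2 confusion matrix C[y, yhat]."""
    tn, fp = C[0]
    fn, tp = C[1]
    tpr = tp / (tp + fn)   # sensitivity: #(y=1, yhat=1) / #(y=1)
    fnr = fn / (tp + fn)   # #(y=1, yhat=0) / #(y=1)
    fpr = fp / (fp + tn)   # #(y=0, yhat=1) / #(y=0)
    tnr = tn / (fp + tn)   # specificity: #(y=0, yhat=0) / #(y=0)
    return tpr, fnr, fpr, tnr

tpr, fnr, fpr, tnr = rates(C)
accuracy = np.trace(C) / C.sum()
print(f"TPR={tpr:.4f} FNR={fnr:.4f} FPR={fpr:.4f} TNR={tnr:.4f} acc={accuracy:.4f}")
```

With these counts the classifier almost always predicts 0, so specificity is high while sensitivity is close to zero; accuracy alone would hide that.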
Decision Surfaces
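Decision rule: predict the class with the larger joint probability, i.e. ŷ = 0 where p(x, y=0) > p(x, y=1) and ŷ = 1 where p(x, y=0) < p(x, y=1)
Add multiplier alpha: scale one side of the comparison by α to shift the decision boundary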
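A minimal sketch of such a decision rule, assuming one-dimensional Gaussian class-conditional densities with equal priors. The means, variance, priors, and the choice to place alpha on the y=0 side are all assumptions made for illustration; the slide does not specify them.

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    """Density of a 1D Gaussian N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def joint(x, y, priors=(0.5, 0.5), mus=(0.0, 2.0), sigma=1.0):
    """Joint p(x, y) = p(x | y) p(y); all parameter values are hypothetical."""
    return gauss_pdf(x, mus[y], sigma) * priors[y]

def decide(x, alpha=1.0):
    """Predict 1 where p(x, y=1) > alpha * p(x, y=0), else 0.

    Putting alpha on the y=0 side is an assumption of this sketch.
    """
    x = np.asarray(x, dtype=float)
    return (joint(x, 1) > alpha * joint(x, 0)).astype(int)

x = np.array([-1.0, 0.5, 1.5, 3.0])
print(decide(x))               # alpha = 1: boundary halfway between the means
print(decide(x, alpha=10.0))   # larger alpha: fewer points predicted as class 1
```

Raising alpha makes the classifier more reluctant to predict class 1, which trades false positives for false negatives.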
ROC Curves
Characterize performance as we vary the decision threshold?
[Figure: ROC plot. x-axis: false positive rate = 1 - specificity; y-axis: true positive rate = sensitivity. Labeled: "Guess all 0", "Guess all 1", "Guess at random, proportion alpha", "Bayes classifier, multiplier alpha".]
Probabilistic vs. Discriminative learning
“Probabilistic” learning
◦ Conditional models just explain y: p(y|x)
◦ Generative models also explain x: p(x,y)
◦ Often a component of unsupervised or semi-supervised learning
◦ Bayes and Naïve Bayes classifiers are generative models
“Discriminative” learning: Output prediction ŷ(x)
“Probabilistic” learning: Output probability p(y|x) (expresses confidence in outcomes)
Probabilistic vs. Discriminative learning
Can use ROC curves for discriminative models also:
◦ Some notion of confidence, but doesn’t correspond to a probability
◦ In our code: “predictSoft” (vs. hard prediction, “predict”)
>> learner = gaussianBayesClassify(X,Y); % build a classifier
>> Ysoft = predictSoft(learner, X); % M x C matrix of confidences
>> plotSoftClassify2D(learner,X,Y); % shaded confidence plot
ROC Curves
Characterize performance as we vary our confidence threshold?
[Figure: ROC plot comparing Classifier A and Classifier B. x-axis: false positive rate = 1 - specificity; y-axis: true positive rate = sensitivity. Also labeled: "Guess all 0", "Guess all 1", "Guess at random, proportion alpha".]
Reduce performance to one number?
AUC = “area under the ROC curve”
0.5 < AUC < 1
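One way to see how the ROC curve and AUC arise from soft confidences is the sketch below. It sweeps a threshold down through the sorted scores and integrates with the trapezoid rule. The function names and the toy scores are hypothetical, and the sketch assumes no tied scores and at least one example of each class.

```python
import numpy as np

def roc_curve(scores, y):
    """Return (fpr, tpr) arrays obtained by lowering the threshold one example at a time.

    Assumes binary labels in {0, 1}, no tied scores, and both classes present.
    """
    order = np.argsort(-scores)          # highest confidence first
    y_sorted = y[order]
    tp = np.cumsum(y_sorted == 1)        # true positives as threshold drops
    fp = np.cumsum(y_sorted == 0)        # false positives as threshold drops
    tpr = np.concatenate(([0.0], tp / tp[-1]))
    fpr = np.concatenate(([0.0], fp / fp[-1]))
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoid rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

y = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])   # hypothetical soft confidences for class 1
fpr, tpr = roc_curve(scores, y)
print(auc(fpr, tpr))
```

The curve starts at "guess all 0" (0, 0) and ends at "guess all 1" (1, 1); a scorer that ranks every positive above every negative reaches an area of 1.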
Gaussian models
“Bayes optimal” decision
◦ Choose most likely class
Decision boundary
◦ Places where probabilities equal
What shape is the boundary?
Gaussian models
Bayes optimal decision boundary
◦ p(y=0 | x) = p(y=1 | x)
◦ Transition point between p(y=0|x) >/< p(y=1|x)
Assume Gaussian models with equal covariances
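The shape of the boundary under this assumption can be worked out directly. The derivation below is a sketch using notation that is not on the slide: class means $\mu_0, \mu_1$, shared covariance $\Sigma$, and priors $p(y=0), p(y=1)$.

```latex
\begin{aligned}
\log\frac{p(x, y=0)}{p(x, y=1)}
 &= \log\frac{p(y=0)}{p(y=1)}
    - \tfrac12 (x-\mu_0)^T \Sigma^{-1} (x-\mu_0)
    + \tfrac12 (x-\mu_1)^T \Sigma^{-1} (x-\mu_1) \\
 &= (\mu_0-\mu_1)^T \Sigma^{-1} x
    - \tfrac12 \mu_0^T \Sigma^{-1} \mu_0
    + \tfrac12 \mu_1^T \Sigma^{-1} \mu_1
    + \log\frac{p(y=0)}{p(y=1)} \\
 &= a^T x + b,
 \qquad a = \Sigma^{-1}(\mu_0-\mu_1),
 \quad b = -\tfrac12 \mu_0^T \Sigma^{-1} \mu_0 + \tfrac12 \mu_1^T \Sigma^{-1} \mu_1 + \log\frac{p(y=0)}{p(y=1)}.
\end{aligned}
```

The quadratic terms $x^T \Sigma^{-1} x$ cancel because both classes share $\Sigma$, so the boundary $a^T x + b = 0$ is a hyperplane (a line in two dimensions).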
Gaussian example
Spherical covariance: Σ = σ² I
Decision rule
[Figure: 2D plot; both axes range from -2 to 5.]
Non-spherical Gaussian distributions
Equal covariances => still linear decision rule
◦ May be “modulated” by variance direction
◦ Scales; rotates (if correlated)
Example: Variance
[ 3   0   ]
[ 0   .25 ]
[Figure: 2D plot; x-axis from -10 to 10, y-axis from -2 to 4.]
Class posterior probabilities
Consider comparing two classes
◦ p(x | y=0) * p(y=0) vs p(x | y=1) * p(y=1)
◦ Write probability of each class as
◦ p(y=0 | x) = p(y=0, x) / p(x)
◦ = p(y=0, x) / ( p(y=0,x) + p(y=1,x) )
◦ = 1 / (1 + exp( - f ) )   (the logistic function, or logistic sigmoid)
◦ where f = log [ p(y=0, x) / p(y=1, x) ]
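The identity between the class posterior and the logistic function of the log ratio can be checked numerically. The sketch below uses made-up joint probability values at three points; only the algebra comes from the slide.

```python
import numpy as np

def logistic(f):
    """Logistic sigmoid 1 / (1 + exp(-f))."""
    return 1.0 / (1.0 + np.exp(-f))

# Hypothetical joint probabilities p(y=0, x) and p(y=1, x) at three values of x.
p0 = np.array([0.30, 0.05, 0.12])
p1 = np.array([0.10, 0.05, 0.48])

posterior = p0 / (p0 + p1)     # p(y=0 | x) computed directly
f = np.log(p0 / p1)            # log ratio of the joints
print(posterior)               # direct computation
print(logistic(f))             # same values via the logistic function
```

Where the two joints are equal, f = 0 and the posterior is exactly 0.5, which is the decision boundary.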
Gaussian models
Return to Gaussian models with equal covariances
Now we also know that the probability of each class is given by:
p(y=0 | x) = Logistic( f ) = Logistic( a^T x + b )
We’ll see this form again soon…
Supervised learning
Notation
◦ Features x
◦ Targets y
◦ Predictions ŷ
◦ Parameters θ
[Diagram: Training data (examples) supplies Features to the Program (“Learner”), which is characterized by some “parameters” θ and is a procedure (using θ) that outputs a prediction. Feedback / Target values are used to score performance (“cost function”). The Learning algorithm changes θ to improve performance.]
Linear regression
Define form of function f(x) explicitly
Find a good f(x) within that family
[Figure: scatter plot of data with a fitted line. x-axis: Feature x (0 to 20); y-axis: Target y (0 to 40).]
“Predictor”:
Evaluate line: r = θ0 + θ1 x
return r
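The predictor above amounts to evaluating a line. A minimal sketch, assuming a single feature and parameters stored as `theta = [theta0, theta1]` (the function name and the example values are hypothetical):

```python
import numpy as np

def predict(x, theta):
    """Evaluate the line r = theta[0] + theta[1] * x for each feature value in x."""
    x = np.asarray(x, dtype=float)
    r = theta[0] + theta[1] * x
    return r

theta = np.array([1.0, 2.0])            # hypothetical intercept and slope
print(predict([0.0, 10.0, 20.0], theta))
```

Choosing a different theta selects a different member of the same family of functions; learning then means searching over theta.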
Notation
Measuring error
Mean squared error
How can we quantify the error?
Could choose something else, of course…
◦ Computationally convenient (more later)
◦ Measures the variance of the residuals
◦ Corresponds to likelihood under Gaussian model of “noise”
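For reference, the mean squared error and its link to a Gaussian noise model can be written out as follows. This is a sketch in notation assumed here: $m$ training examples $(x^{(i)}, y^{(i)})$, predictions $\hat y(x^{(i)})$, and noise variance $\sigma^2$.

```latex
J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \bigl( y^{(i)} - \hat y(x^{(i)}) \bigr)^2

% If y^{(i)} = \hat y(x^{(i)}) + \epsilon^{(i)} with \epsilon^{(i)} \sim \mathcal{N}(0, \sigma^2) independently, then
-\log p(y \mid x; \theta)
  = \frac{m}{2} \log(2\pi\sigma^2) + \frac{1}{2\sigma^2} \sum_{i=1}^{m} \bigl( y^{(i)} - \hat y(x^{(i)}) \bigr)^2
  = \text{const} + \frac{m}{2\sigma^2} \, J(\theta)
```

Under that noise assumption, minimizing $J(\theta)$ gives the same $\theta$ as maximizing the likelihood.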
MSE cost function
# Python / NumPy (X is m x n, Y is m x 1, theta is 1 x n)
e = Y - X.dot(theta.T)   # residuals, m x 1
J = e.T.dot(e) / m       # = np.mean(e ** 2)
Visualizing the cost function
[Figure: plots of the cost J(θ); axes labeled θ0 and θ1.]
Finding good parameters
Want to find parameters which minimize our error…
Think of a cost “surface”: error residual for that θ…
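To make the cost "surface" concrete, the sketch below evaluates the MSE on a grid of (θ0, θ1) values for a tiny synthetic data set and picks the grid point with the lowest cost. The data, grid ranges, and function name are assumptions for illustration; here theta is a 1-D vector, so the prediction is `X.dot(theta)`.

```python
import numpy as np

def mse(theta, X, Y):
    """Mean squared error of the linear predictor X.dot(theta)."""
    e = Y - X.dot(theta)
    return float(np.mean(e ** 2))

# Synthetic, noise-free data from y = 1 + 2x (hypothetical example).
x = np.array([0.0, 1.0, 2.0, 3.0])
Y = 1.0 + 2.0 * x
X = np.column_stack([np.ones_like(x), x])   # constant feature for theta0

theta0_grid = np.linspace(-1.0, 3.0, 41)
theta1_grid = np.linspace(0.0, 4.0, 41)

# J[i, j] is the cost at (theta0_grid[i], theta1_grid[j]).
J = np.array([[mse(np.array([t0, t1]), X, Y) for t1 in theta1_grid]
              for t0 in theta0_grid])

i, j = np.unravel_index(np.argmin(J), J.shape)
best = (theta0_grid[i], theta1_grid[j])
print(best, J[i, j])
```

Grid search like this only works for a handful of parameters, which is one motivation for following the gradient instead.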
Gradient descent
• How to change θ to improve J(θ)?
• Choose a direction in which J(θ) is decreasing
• Derivative
  • Positive => increasing
  • Negative => decreasing
Gradient descent in >2 dimensions
• Gradient vector
Indicates direction of steepest ascent (negative = steepest descent)
Gradient descent
Initialization
Step size
◦ Can change as a function of iteration
Gradient direction
Stopping condition
Initialize θ
Do{
θ ← θ - α∇θJ(θ)
} while (α||∇θJ|| > ε)
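The loop above might be sketched in NumPy as follows, using the MSE of a linear predictor as the cost. The fixed step size, zero initialization, iteration cap, and synthetic data are assumptions of this example, and the stopping test is applied to the step that was just taken.

```python
import numpy as np

def mse_grad(theta, X, Y):
    """Gradient of J(theta) = mean((Y - X.dot(theta))**2) with respect to theta."""
    m = X.shape[0]
    e = Y - X.dot(theta)
    return -2.0 / m * X.T.dot(e)

def gradient_descent(X, Y, alpha=0.1, eps=1e-8, max_iter=10000):
    """Repeat theta <- theta - alpha * grad until the step norm is at most eps."""
    theta = np.zeros(X.shape[1])            # initialization (assumed: all zeros)
    for _ in range(max_iter):               # cap added as a safeguard
        step = alpha * mse_grad(theta, X, Y)
        theta = theta - step                # move against the gradient
        if np.linalg.norm(step) <= eps:     # stopping condition: alpha * ||grad|| <= eps
            break
    return theta

# Synthetic, noise-free data from y = 1 + 2x (hypothetical example).
x = np.array([0.0, 1.0, 2.0, 3.0])
Y = 1.0 + 2.0 * x
X = np.column_stack([np.ones_like(x), x])

theta = gradient_descent(X, Y)
print(theta)
```

If alpha is too large for the data the iterates diverge, and if it is too small convergence is slow, which is why the step size is often changed as a function of iteration.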
Gradient for the MSE
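The equations for this slide did not survive extraction. As a sketch, the gradient follows from the chain rule applied to the MSE of a linear predictor, written here in assumed notation: $\hat y^{(i)} = \sum_j \theta_j x_j^{(i)}$, error vector $e = Y - X\theta^T$ of size $m \times 1$, and $\theta$ a $1 \times n$ row vector as in the NumPy MSE snippet.

```latex
J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \Bigl( y^{(i)} - \sum_{j} \theta_j x_j^{(i)} \Bigr)^2

\frac{\partial J}{\partial \theta_j}
  = -\frac{2}{m} \sum_{i=1}^{m} \bigl( y^{(i)} - \hat y^{(i)} \bigr) \, x_j^{(i)}

\nabla_\theta J(\theta) = -\frac{2}{m} \, e^T X
```

Each example contributes its error times its feature vector, so examples with large errors pull on θ the most.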
Upcoming…
Misc.
• Lot of activity on Piazza
• You have been added to Gradescope
Homework
• Homework 1 due tonight
• Homework 2 released tonight
• HW2 Due: October 19, 2017