Linear Regression - Sameer Singhsameersingh.org/courses/gml/fa17/slides/04-lin...Linear regression...

34
Linear Regression PROF. SAMEER SINGH FALL 2017 CS 273A: Machine Learning based on slides by Alex Ihler

Transcript of Linear Regression - Sameer Singhsameersingh.org/courses/gml/fa17/slides/04-lin...Linear regression...

Page 1: Linear Regression - Sameer Singhsameersingh.org/courses/gml/fa17/slides/04-lin...Linear regression Define form of function f(x) explicitly Find a good f(x) within that family 0 10

Linear Regression

PROF. SAMEER SINGH

FALL 2017

CS 273A: Machine Learning

based on slides by Alex Ihler

Page 2: Linear Regression - Sameer Singhsameersingh.org/courses/gml/fa17/slides/04-lin...Linear regression Define form of function f(x) explicitly Find a good f(x) within that family 0 10

Machine Learning

Bayes Error Wrapup

Gaussian Bayes Models

Linear Regression: Definition and Cost

Gradient Descent Algorithms

Page 3: Linear Regression - Sameer Singhsameersingh.org/courses/gml/fa17/slides/04-lin...Linear regression Define form of function f(x) explicitly Find a good f(x) within that family 0 10

Measuring errorsConfusion matrix

Can extend to more classes

True positive rate: #(y=1 , ŷ=1) / #(y=1) -- “sensitivity”

False negative rate: #(y=1 , ŷ=0) / #(y=1)

False positive rate: #(y=0 , ŷ=1) / #(y=0)

True negative rate: #(y=0 , ŷ=0) / #(y=0) -- “specificity”

Predict 0 Predict 1

Y=0 380 5

Y=1 338 3

Page 4: Linear Regression - Sameer Singhsameersingh.org/courses/gml/fa17/slides/04-lin...Linear regression Define form of function f(x) explicitly Find a good f(x) within that family 0 10

Decision Surfaces

p(x , y=1 )p(x , y=0 )

<

>

<

>

Add multiplier alpha:

Page 5: Linear Regression - Sameer Singhsameersingh.org/courses/gml/fa17/slides/04-lin...Linear regression Define form of function f(x) explicitly Find a good f(x) within that family 0 10

ROC CurvesCharacterize performance as we vary the decision threshold?

8

False positive rate

= 1 - specificity

Tru

e p

ositiv

e r

ate

= s

ensitiv

ity

Guess all 0

Guess all 1

Guess at random,

proportion alpha

Bayes classifier,

multiplier alpha

<

>

Page 6: Linear Regression - Sameer Singhsameersingh.org/courses/gml/fa17/slides/04-lin...Linear regression Define form of function f(x) explicitly Find a good f(x) within that family 0 10

Probabilistic vs. Discriminative learning

“Probabilistic” learning◦ Conditional models just explain y: p(y|x)◦ Generative models also explain x: p(x,y)

◦ Often a component of unsupervised or semi-supervised learning

◦ Bayes and Naïve Bayes classifiers are generative models

“Discriminative” learning:Output prediction ŷ(x)

“Probabilistic” learning:Output probability p(y|x)

(expresses confidence in outcomes)

Page 7: Linear Regression - Sameer Singhsameersingh.org/courses/gml/fa17/slides/04-lin...Linear regression Define form of function f(x) explicitly Find a good f(x) within that family 0 10

Probabilistic vs. Discriminative learning

Can use ROC curves for discriminative models also:◦ Some notion of confidence, but doesn’t correspond to a probability

◦ In our code: “predictSoft” (vs. hard prediction, “predict”)

>> learner = gaussianBayesClassify(X,Y); % build a classifier>> Ysoft = predictSoft(learner, X); % M x C matrix of confidences>> plotSoftClassify2D(learner,X,Y); % shaded confidence plot

“Discriminative” learning:Output prediction ŷ(x)

“Probabilistic” learning:Output probability p(y|x)

(expresses confidence in outcomes)

Page 8: Linear Regression - Sameer Singhsameersingh.org/courses/gml/fa17/slides/04-lin...Linear regression Define form of function f(x) explicitly Find a good f(x) within that family 0 10

ROC CurvesCharacterize performance as we vary our confidence threshold?

False positive rate

= 1 - specificity

Tru

e p

ositiv

e r

ate

= s

ensitiv

ity

Guess all 0

Guess all 1

Guess at random, proportion alphaClassifier A

Classifier B

Reduce performance to one number?AUC = “area under the ROC curve”

0.5 < AUC < 1

Page 9: Linear Regression - Sameer Singhsameersingh.org/courses/gml/fa17/slides/04-lin...Linear regression Define form of function f(x) explicitly Find a good f(x) within that family 0 10

Machine Learning

Bayes Error Wrapup

Gaussian Bayes Models

Linear Regression: Definition and Cost

Gradient Descent Algorithms

Page 10: Linear Regression - Sameer Singhsameersingh.org/courses/gml/fa17/slides/04-lin...Linear regression Define form of function f(x) explicitly Find a good f(x) within that family 0 10

Gaussian models“Bayes optimal” decision

◦ Choose most likely class

Decision boundary◦ Places where probabilities equal

What shape is the boundary?

Page 11: Linear Regression - Sameer Singhsameersingh.org/courses/gml/fa17/slides/04-lin...Linear regression Define form of function f(x) explicitly Find a good f(x) within that family 0 10

Gaussian modelsBayes optimal decision boundary

◦ p(y=0 | x) = p(y=1 | x)

◦ Transition point between p(y=0|x) >/< p(y=1|x)

Assume Gaussian models with equal covariances

Page 12: Linear Regression - Sameer Singhsameersingh.org/courses/gml/fa17/slides/04-lin...Linear regression Define form of function f(x) explicitly Find a good f(x) within that family 0 10

Gaussian exampleSpherical covariance: Σ = σ2 I

Decision rule

-2 -1 0 1 2 3 4 5-2

-1

0

1

2

3

4

5

Page 13: Linear Regression - Sameer Singhsameersingh.org/courses/gml/fa17/slides/04-lin...Linear regression Define form of function f(x) explicitly Find a good f(x) within that family 0 10

Non-spherical Gaussian distributionsEqual covariances => still linear decision rule

◦ May be “modulated” by variance direction

◦ Scales; rotates (if correlated)

-10 -8 -6 -4 -2 0 2 4 6 8 10-2

-1

0

1

2

3

4

Example:Variance[ 3 0 ][ 0 .25 ]

Page 14: Linear Regression - Sameer Singhsameersingh.org/courses/gml/fa17/slides/04-lin...Linear regression Define form of function f(x) explicitly Find a good f(x) within that family 0 10

Class posterior probabilitiesConsider comparing two classes

◦ p(x | y=0) * p(y=0) vs p(x | y=1) * p(y=1)

◦ Write probability of each class as

◦ p(y=0 | x) = p(y=0, x) / p(x)

◦ = p(y=0, x) / ( p(y=0,x) + p(y=1,x) )

◦ = 1 / (1 + exp( - f ) )

◦ f = log [ p(y=0, x) / p(y=1, x) ] the logistic function, or logistic sigmoid

Page 15: Linear Regression - Sameer Singhsameersingh.org/courses/gml/fa17/slides/04-lin...Linear regression Define form of function f(x) explicitly Find a good f(x) within that family 0 10

Gaussian modelsReturn to Gaussian models with equal covariances

Now we also know that the probability of each class is given by:p(y=0 | x) = Logistic( f ) = Logistic( aT x + b )

We’ll see this form again soon…

f

Page 16: Linear Regression - Sameer Singhsameersingh.org/courses/gml/fa17/slides/04-lin...Linear regression Define form of function f(x) explicitly Find a good f(x) within that family 0 10

Machine Learning

Bayes Error Wrapup

Gaussian Bayes Models

Linear Regression: Definition and Cost

Gradient Descent Algorithms

Page 17: Linear Regression - Sameer Singhsameersingh.org/courses/gml/fa17/slides/04-lin...Linear regression Define form of function f(x) explicitly Find a good f(x) within that family 0 10

Supervised learningNotation

◦ Features x

◦ Targets y

◦ Predictions ŷ

◦ Parameters θProgram (“Learner”)

Characterized by some “parameters” θ

Procedure (using θ) that outputs a prediction

Training data (examples)

Features

Learning algorithm

Change θImprove performance

Feedback / Target values Score performance

(“cost function”)

Page 18: Linear Regression - Sameer Singhsameersingh.org/courses/gml/fa17/slides/04-lin...Linear regression Define form of function f(x) explicitly Find a good f(x) within that family 0 10

Supervised learningNotation

◦ Features x

◦ Targets y

◦ Predictions ŷ

◦ Parameters θProgram (“Learner”)

Characterized by some “parameters” θ

Procedure (using θ) that outputs a prediction

Training data (examples)

Features

Learning algorithm

Change θImprove performance

Feedback / Target values Score performance

(“cost function”)

Page 19: Linear Regression - Sameer Singhsameersingh.org/courses/gml/fa17/slides/04-lin...Linear regression Define form of function f(x) explicitly Find a good f(x) within that family 0 10

Linear regression

Define form of function f(x) explicitly

Find a good f(x) within that family

0 10 200

20

40

Target

y

Feature x

“Predictor”:

Evaluate line:

return r

Page 20: Linear Regression - Sameer Singhsameersingh.org/courses/gml/fa17/slides/04-lin...Linear regression Define form of function f(x) explicitly Find a good f(x) within that family 0 10

Notation

Page 21: Linear Regression - Sameer Singhsameersingh.org/courses/gml/fa17/slides/04-lin...Linear regression Define form of function f(x) explicitly Find a good f(x) within that family 0 10

Supervised learningNotation

◦ Features x

◦ Targets y

◦ Predictions ŷ

◦ Parameters θProgram (“Learner”)

Characterized by some “parameters” θ

Procedure (using θ) that outputs a prediction

Training data (examples)

Features

Learning algorithm

Change θImprove performance

Feedback / Target values Score performance

(“cost function”)

Page 22: Linear Regression - Sameer Singhsameersingh.org/courses/gml/fa17/slides/04-lin...Linear regression Define form of function f(x) explicitly Find a good f(x) within that family 0 10

Measuring error

0 200

Page 23: Linear Regression - Sameer Singhsameersingh.org/courses/gml/fa17/slides/04-lin...Linear regression Define form of function f(x) explicitly Find a good f(x) within that family 0 10

Mean squared errorHow can we quantify the error?

Could choose something else, of course…◦ Computationally convenient (more later)

◦ Measures the variance of the residuals

◦ Corresponds to likelihood under Gaussian model of “noise”

Page 24: Linear Regression - Sameer Singhsameersingh.org/courses/gml/fa17/slides/04-lin...Linear regression Define form of function f(x) explicitly Find a good f(x) within that family 0 10

MSE cost function

# Python / NumPy:e = Y – X.dot( theta.T );J = e.T.dot( e ) / m # = np.mean( e ** 2 )

Page 25: Linear Regression - Sameer Singhsameersingh.org/courses/gml/fa17/slides/04-lin...Linear regression Define form of function f(x) explicitly Find a good f(x) within that family 0 10

Supervised learningNotation

◦ Features x

◦ Targets y

◦ Predictions ŷ

◦ Parameters θProgram (“Learner”)

Characterized by some “parameters” θ

Procedure (using θ) that outputs a prediction

Training data (examples)

Features

Learning algorithm

Change θImprove performance

Feedback /

Target values Score performance(“cost function”)

Page 26: Linear Regression - Sameer Singhsameersingh.org/courses/gml/fa17/slides/04-lin...Linear regression Define form of function f(x) explicitly Find a good f(x) within that family 0 10

Visualizing the cost function

-1 -0.5 0 0.5 1 1.5 2 2.5 3-40

-30

-20

-10

0

10

20

30

40

-1 -0.5 0 0.5 1 1.5 2 2.5 3-40

-30

-20

-10

0

10

20

30

40

θ1

J(µ

)!

θ1

θ1

θ0

θ1

θ0

Page 27: Linear Regression - Sameer Singhsameersingh.org/courses/gml/fa17/slides/04-lin...Linear regression Define form of function f(x) explicitly Find a good f(x) within that family 0 10

Finding good parametersWant to find parameters which minimize our error…

Think of a cost “surface”: error residual for that θ…

Page 28: Linear Regression - Sameer Singhsameersingh.org/courses/gml/fa17/slides/04-lin...Linear regression Define form of function f(x) explicitly Find a good f(x) within that family 0 10

Machine Learning

Bayes Error Wrapup

Gaussian Bayes Models

Linear Regression: Definition and Cost

Gradient Descent Algorithms

Page 29: Linear Regression - Sameer Singhsameersingh.org/courses/gml/fa17/slides/04-lin...Linear regression Define form of function f(x) explicitly Find a good f(x) within that family 0 10

Gradient descent

?

• How to change θ to

improve J(θ)?

• Choose a direction in which J(θ) is decreasing

Page 30: Linear Regression - Sameer Singhsameersingh.org/courses/gml/fa17/slides/04-lin...Linear regression Define form of function f(x) explicitly Find a good f(x) within that family 0 10

• How to change θ to

improve J(θ)?

• Choose a direction in which J(θ) is decreasing

• Derivative

• Positive => increasing

• Negative => decreasing

Gradient descent

Page 31: Linear Regression - Sameer Singhsameersingh.org/courses/gml/fa17/slides/04-lin...Linear regression Define form of function f(x) explicitly Find a good f(x) within that family 0 10

Gradient descent in >2 dimensions

• Gradient vector

Indicates direction of steepest ascent(negative = steepest descent)

Page 32: Linear Regression - Sameer Singhsameersingh.org/courses/gml/fa17/slides/04-lin...Linear regression Define form of function f(x) explicitly Find a good f(x) within that family 0 10

Gradient descentInitialization

Step size◦ Can change as a function of iteration

Gradient direction

Stopping condition

Initialize θ

Do{

θ ← θ - α∇θJ(θ)

} while (α||∇θJ|| > ε)

Page 33: Linear Regression - Sameer Singhsameersingh.org/courses/gml/fa17/slides/04-lin...Linear regression Define form of function f(x) explicitly Find a good f(x) within that family 0 10

Gradient for the MSE

Page 34: Linear Regression - Sameer Singhsameersingh.org/courses/gml/fa17/slides/04-lin...Linear regression Define form of function f(x) explicitly Find a good f(x) within that family 0 10

Upcoming…

• Lot of activity on Piazza• You have been added to Gradescope

Misc.

• Homework 1 due tonight• Homework 2 released tonight• HW2 Due: October 19, 2017

Homework