Download - Linear Regression - Penn Engineeringcis519/fall2016/lectures/04_LinearRegression.pdfBased on slide by Christopher Bishop (PRML) Linear Basis FuncIon Models • Sigmoidal basis funcIons:

Transcript
Page 1: Linear Regression - Penn Engineeringcis519/fall2016/lectures/04_LinearRegression.pdfBased on slide by Christopher Bishop (PRML) Linear Basis FuncIon Models • Sigmoidal basis funcIons:

Linear Regression

Robot Image Credit: Viktoriya Sukhanova © 123RF.com

These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution. Please send comments and corrections to Eric.

Page 2: Regression

Given:
–  Data $X = \{x^{(1)}, \ldots, x^{(n)}\}$ where $x^{(i)} \in \mathbb{R}^d$
–  Corresponding labels $y = \{y^{(1)}, \ldots, y^{(n)}\}$ where $y^{(i)} \in \mathbb{R}$

[Figure: September Arctic Sea Ice Extent (1,000,000 sq km) versus Year, 1975 to 2015, with Linear Regression and Quadratic Regression fits]

Data from G. Witt. Journal of Statistics Education, Volume 21, Number 1 (2013)

Page 3: Prostate Cancer Dataset

•  97 samples, partitioned into 67 train / 30 test
•  Eight predictors (features):
–  6 continuous (4 log transforms), 1 binary, 1 ordinal
•  Continuous outcome variable:
–  lpsa: log(prostate specific antigen level)

Based on slide by Jeff Howbert

Page 4: Linear Regression

•  Hypothesis:

$$y = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \ldots + \theta_d x_d = \sum_{j=0}^{d} \theta_j x_j$$

Assume $x_0 = 1$

•  Fit model by minimizing sum of squared errors

Figures are courtesy of Greg Shakhnarovich

Page 5: Least Squares Linear Regression

•  Cost Function

$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2$$

•  Fit by solving $\min_\theta J(\theta)$
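The cost function above can be sketched in a few lines of NumPy. This is only an illustration: the function name `cost` and the toy data are assumptions, and the first column of `X` is taken to hold $x_0 = 1$.

```python
import numpy as np

def cost(theta, X, y):
    """Least squares cost J(theta) = 1/(2n) * sum_i (h_theta(x_i) - y_i)^2."""
    n = X.shape[0]
    residuals = X @ theta - y  # h_theta(x^(i)) - y^(i) for every instance
    return float(residuals @ residuals) / (2 * n)

# Hypothetical toy data; the column of ones plays the role of x_0 = 1.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])

perfect = cost(np.array([0.0, 1.0]), X, y)   # the line y = x fits exactly
worse = cost(np.array([0.0, 0.5]), X, y)     # a flatter line leaves residuals
```

A lower value of `cost` means a better fit on the given data, which is what the minimization over $\theta$ exploits.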

Page 6: Intuition Behind Cost Function

$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2$$

For insight on J(), let's assume $x \in \mathbb{R}$ so $\theta = [\theta_0, \theta_1]$

Based on example by Andrew Ng

Page 7: Intuition Behind Cost Function

$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2$$

For insight on J(), let's assume $x \in \mathbb{R}$ so $\theta = [\theta_0, \theta_1]$

[Figure: left panel plots y versus x (for fixed $\theta$, this is a function of x); right panel plots $J$ (function of the parameter)]

Based on example by Andrew Ng

Page 8: Intuition Behind Cost Function

$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2$$

For insight on J(), let's assume $x \in \mathbb{R}$ so $\theta = [\theta_0, \theta_1]$

[Figure: left panel plots y versus x (for fixed $\theta$, this is a function of x); right panel plots $J$ (function of the parameter)]

$$J([0, 0.5]) = \frac{1}{2 \times 3} \left[ (0.5 - 1)^2 + (1 - 2)^2 + (1.5 - 3)^2 \right] \approx 0.58$$

Based on example by Andrew Ng

Page 9: Intuition Behind Cost Function

$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2$$

For insight on J(), let's assume $x \in \mathbb{R}$ so $\theta = [\theta_0, \theta_1]$

[Figure: left panel plots y versus x (for fixed $\theta$, this is a function of x); right panel plots $J$ (function of the parameter)]

$$J([0, 0]) \approx 2.333$$

J() is convex

Based on example by Andrew Ng
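The two values of J quoted on these slides can be reproduced with plain Python. The data points (1, 1), (2, 2), (3, 3) are not stated in the transcript; they are inferred from the arithmetic shown for J([0, 0.5]), so treat them as an assumption.

```python
# Data points inferred from the slide arithmetic (an assumption): (1, 1), (2, 2), (3, 3).
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

def J(theta0, theta1):
    """Cost of the line h(x) = theta0 + theta1 * x on the three points."""
    n = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * n)

j_half = J(0.0, 0.5)   # (0.25 + 1 + 2.25) / 6
j_zero = J(0.0, 0.0)   # (1 + 4 + 9) / 6

# Sweep theta1 from -0.5 to 2.5 with theta0 fixed at 0 to trace the bowl-shaped curve.
sweep = [(t / 4, J(0.0, t / 4)) for t in range(-2, 11)]
best_theta1 = min(sweep, key=lambda pair: pair[1])[0]
```

On this sweep the cost falls until $\theta_1 = 1$ and rises afterwards, which is the single-minimum shape the right-hand panel illustrates.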

Page 10: Intuition Behind Cost Function

Slide by Andrew Ng

Pages 11 to 14: Intuition Behind Cost Function

[Figures: on each page, the left panel shows the hypothesis (for fixed $\theta$, this is a function of x) and the right panel shows $J$ (function of the parameters)]

Slide by Andrew Ng

Pages 15 to 17: Basic Search Procedure

•  Choose initial value for $\theta$
•  Until we reach a minimum:
–  Choose a new value for $\theta$ to reduce $J(\theta)$

[Figure: surface plot of $J(\theta_0, \theta_1)$ over $\theta_0$ and $\theta_1$]

Figure by Andrew Ng

Since the least squares objective function is convex, we don't need to worry about local minima

Page 18: Gradient Descent

•  Initialize $\theta$
•  Repeat until convergence

$$\theta_j \leftarrow \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$ simultaneous update for j = 0 ... d

learning rate (small), e.g., α = 0.05

[Figure: plot of $J(\theta)$]

Pages 19 to 22: Gradient Descent

•  Initialize $\theta$
•  Repeat until convergence

$$\theta_j \leftarrow \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$ simultaneous update for j = 0 ... d

For Linear Regression:

$$\begin{aligned}
\frac{\partial}{\partial \theta_j} J(\theta) &= \frac{\partial}{\partial \theta_j} \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2 \\
&= \frac{\partial}{\partial \theta_j} \frac{1}{2n} \sum_{i=1}^{n} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right)^2 \\
&= \frac{1}{n} \sum_{i=1}^{n} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right) \times \frac{\partial}{\partial \theta_j} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right) \\
&= \frac{1}{n} \sum_{i=1}^{n} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right) x_j^{(i)} \\
&= \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}
\end{aligned}$$
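One way to sanity-check the derived gradient is to compare it with a finite-difference estimate. The sketch below does that on random data; the helper names `cost` and `gradient`, the random seed, and the step size `eps` are all illustrative assumptions.

```python
import numpy as np

def cost(theta, X, y):
    r = X @ theta - y
    return float(r @ r) / (2 * len(y))

def gradient(theta, X, y):
    # (1/n) * sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i), computed for all j at once
    return X.T @ (X @ theta - y) / len(y)

rng = np.random.default_rng(0)
X = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 3))])  # x_0 = 1 plus 3 features
y = rng.normal(size=20)
theta = rng.normal(size=4)

# Central differences along each coordinate direction.
eps = 1e-6
numeric = np.array([
    (cost(theta + eps * e, X, y) - cost(theta - eps * e, X, y)) / (2 * eps)
    for e in np.eye(4)
])
max_gap = float(np.max(np.abs(numeric - gradient(theta, X, y))))
```

A tiny `max_gap` indicates that the analytic expression and the numerical estimate agree.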

Page 23: Gradient Descent for Linear Regression

•  Initialize $\theta$
•  Repeat until convergence

$$\theta_j \leftarrow \theta_j - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}$$ simultaneous update for j = 0 ... d

•  To achieve simultaneous update
–  At the start of each GD iteration, compute $h_\theta\left(x^{(i)}\right)$
–  Use this stored value in the update step loop
•  Assume convergence when $\|\theta_{new} - \theta_{old}\|_2 < \epsilon$

L2 norm: $\|v\|_2 = \sqrt{\sum_i v_i^2} = \sqrt{v_1^2 + v_2^2 + \ldots + v_{|v|}^2}$
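The procedure above can be sketched as follows. This is a minimal illustration rather than a reference implementation: the function name, the default `alpha` and `eps`, the iteration cap, and the toy data are all assumptions.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.05, eps=1e-8, max_iters=100_000):
    """Batch gradient descent for least squares; X must include the x_0 = 1 column."""
    n, num_params = X.shape
    theta = np.zeros(num_params)
    for _ in range(max_iters):
        predictions = X @ theta                 # h_theta(x^(i)), computed once per iteration
        new_theta = theta - alpha * (X.T @ (predictions - y)) / n  # every theta_j together
        if np.linalg.norm(new_theta - theta) < eps:   # ||theta_new - theta_old||_2 < eps
            return new_theta
        theta = new_theta
    return theta

# Hypothetical toy data lying on the line y = x.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])
theta_hat = gradient_descent(X, y)
```

Computing `predictions` before touching `theta` is what makes the update simultaneous: every component is updated from the same stored hypothesis values.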

Page 24: Gradient Descent

[Figure: left panel shows the hypothesis h(x) = -900 - 0.1x (for fixed $\theta$, this is a function of x); right panel shows $J$ (function of the parameters)]

Slide by Andrew Ng

Pages 25 to 32: Gradient Descent

[Figures: on each page, the left panel shows the hypothesis (for fixed $\theta$, this is a function of x) and the right panel shows $J$ (function of the parameters)]

Slide by Andrew Ng

Page 33: Choosing α

α too small
•  Slow convergence

α too large
•  Increasing value for $J(\theta)$
•  May overshoot the minimum
•  May fail to converge
•  May even diverge

To see if gradient descent is working, print out $J(\theta)$ each iteration
•  The value should decrease at each iteration
•  If it doesn't, adjust α
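The effect of the learning rate can be seen by recording $J(\theta)$ at every iteration for a few values of α. The data and the three α values below are hypothetical choices made only to show the three regimes.

```python
import numpy as np

# Hypothetical toy data lying on y = x, with x_0 = 1 in the first column.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])

def cost_history(alpha, iters=50):
    """Run gradient descent from theta = 0 and record J(theta) before each update."""
    theta = np.zeros(2)
    history = []
    for _ in range(iters):
        residuals = X @ theta - y
        history.append(float(residuals @ residuals) / (2 * len(y)))
        theta = theta - alpha * (X.T @ residuals) / len(y)
    return history

slow = cost_history(alpha=0.001)   # decreases, but only slightly per iteration
good = cost_history(alpha=0.1)     # decreases quickly
bad = cost_history(alpha=0.5)      # step too large for this data: J grows
```

Printing or plotting these histories is the check the slide recommends: a history that rises, as `bad` does, is the signal to reduce α.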

Page 34: Extending Linear Regression to More Complex Models

•  The inputs X for linear regression can be:
–  Original quantitative inputs
–  Transformation of quantitative inputs
•  e.g. log, exp, square root, square, etc.
–  Polynomial transformation
•  example: $y = \theta_0 + \theta_1 \cdot x + \theta_2 \cdot x^2 + \theta_3 \cdot x^3$
–  Basis expansions
–  Dummy coding of categorical inputs
–  Interactions between variables
•  example: $x_3 = x_1 \cdot x_2$

This allows use of linear regression techniques to fit non-linear datasets.

Page 35: Linear Basis Function Models

•  Generally,

$$h_\theta(x) = \sum_{j=0}^{d} \theta_j \phi_j(x)$$ where $\phi_j(x)$ is a basis function

•  Typically, $\phi_0(x) = 1$ so that $\theta_0$ acts as a bias
•  In the simplest case, we use linear basis functions: $\phi_j(x) = x_j$

Based on slide by Christopher Bishop (PRML)

Page 36: Linear Basis Function Models

•  Polynomial basis functions: $\phi_j(x) = x^j$
–  These are global; a small change in x affects all basis functions
•  Gaussian basis functions: $\phi_j(x) = \exp\left\{ -\frac{(x - \mu_j)^2}{2 s^2} \right\}$
–  These are local; a small change in x only affects nearby basis functions. $\mu_j$ and $s$ control location and scale (width).

Based on slide by Christopher Bishop (PRML)

Page 37: Linear Basis Function Models

•  Sigmoidal basis functions: $\phi_j(x) = \sigma\left( \frac{x - \mu_j}{s} \right)$ where $\sigma(a) = \frac{1}{1 + \exp(-a)}$
–  These are also local; a small change in x only affects nearby basis functions. $\mu_j$ and $s$ control location and scale (slope).

Based on slide by Christopher Bishop (PRML)
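A short sketch of the three basis function families may help make them concrete. It assumes the standard forms from Bishop's PRML, namely $x^j$, $\exp\{-(x-\mu_j)^2 / (2s^2)\}$ and the logistic sigmoid of $(x-\mu_j)/s$; the function names, centers and width are illustrative choices.

```python
import numpy as np

def polynomial_basis(x, j):
    return x ** j

def gaussian_basis(x, mu, s):
    return np.exp(-((x - mu) ** 2) / (2 * s ** 2))

def sigmoidal_basis(x, mu, s):
    return 1.0 / (1.0 + np.exp(-(x - mu) / s))

x = np.linspace(-1.0, 1.0, 5)
centers = np.array([-0.5, 0.0, 0.5])   # hypothetical mu_j values

# Design matrix: a bias column phi_0(x) = 1 followed by one Gaussian bump per center.
Phi = np.column_stack(
    [np.ones_like(x)] + [gaussian_basis(x, mu, 0.25) for mu in centers]
)
```

Once `Phi` is built, it takes the place of the raw inputs, and fitting $\theta$ proceeds exactly as in ordinary linear regression.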

Page 38: Example of Fitting a Polynomial Curve with a Linear Model

$$y = \theta_0 + \theta_1 x + \theta_2 x^2 + \ldots + \theta_p x^p = \sum_{j=0}^{p} \theta_j x^j$$
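As a rough illustration, a polynomial can be fit with an ordinary least squares solver once the powers of x are placed in separate columns. The cubic target and the degree `p` below are hypothetical, and `np.linalg.lstsq` stands in for whichever fitting method is preferred.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 20)
y = 1.0 - 2.0 * x + 0.5 * x ** 3      # hypothetical noise-free cubic target

p = 3
# Column j holds x**j, so the model is still linear in theta.
X_poly = np.vander(x, p + 1, increasing=True)
theta, *_ = np.linalg.lstsq(X_poly, y, rcond=None)
```

The recovered `theta` lists the coefficients from the constant term up to the $x^p$ term.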

Page 39: Linear Basis Function Models

•  Basic Linear Model: $h_\theta(x) = \sum_{j=0}^{d} \theta_j x_j$
•  Generalized Linear Model: $h_\theta(x) = \sum_{j=0}^{d} \theta_j \phi_j(x)$
•  Once we have replaced the data by the outputs of the basis functions, fitting the generalized model is exactly the same problem as fitting the basic model
–  Unless we use the kernel trick (more on that when we cover support vector machines)
–  Therefore, there is no point in cluttering the math with basis functions

Based on slide by Geoff Hinton

Page 40: Linear Algebra Concepts

•  Vector in $\mathbb{R}^d$ is an ordered set of d real numbers
–  e.g., v = [1,6,3,4] is in $\mathbb{R}^4$
–  "[1,6,3,4]" is a column vector: $\begin{pmatrix} 1 \\ 6 \\ 3 \\ 4 \end{pmatrix}$
–  as opposed to a row vector: $\begin{pmatrix} 1 & 6 & 3 & 4 \end{pmatrix}$
•  An m-by-n matrix is an object with m rows and n columns, where each entry is a real number: $\begin{pmatrix} 1 & 2 & 8 \\ 4 & 78 & 6 \\ 9 & 3 & 2 \end{pmatrix}$

Based on slides by Joseph Bradley

Page 41: Linear Algebra Concepts

•  Transpose: reflect vector/matrix on line:

$$\begin{pmatrix} a \\ b \end{pmatrix}^T = \begin{pmatrix} a & b \end{pmatrix} \qquad \begin{pmatrix} a & b \\ c & d \end{pmatrix}^T = \begin{pmatrix} a & c \\ b & d \end{pmatrix}$$

–  Note: $(Ax)^T = x^T A^T$ (We'll define multiplication soon…)
•  Vector norms:
–  $L_p$ norm of $v = (v_1, \ldots, v_k)$ is $\left( \sum_i |v_i|^p \right)^{1/p}$
–  Common norms: $L_1$, $L_2$
–  $L_\infty = \max_i |v_i|$
•  Length of a vector v is $L_2(v)$

Based on slides by Joseph Bradley

Page 42: Linear Algebra Concepts

•  Vector dot product:

$$u \bullet v = \begin{pmatrix} u_1 & u_2 \end{pmatrix} \bullet \begin{pmatrix} v_1 & v_2 \end{pmatrix} = u_1 v_1 + u_2 v_2$$

–  Note: dot product of u with itself = length(u)$^2$ = $\|u\|_2^2$
•  Matrix product:

$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \quad B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}$$

$$AB = \begin{pmatrix} a_{11} b_{11} + a_{12} b_{21} & a_{11} b_{12} + a_{12} b_{22} \\ a_{21} b_{11} + a_{22} b_{21} & a_{21} b_{12} + a_{22} b_{22} \end{pmatrix}$$

Based on slides by Joseph Bradley

Page 43: Linear Algebra Concepts

•  Vector products:
–  Dot product:

$$u \bullet v = u^T v = \begin{pmatrix} u_1 & u_2 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = u_1 v_1 + u_2 v_2$$

–  Outer product:

$$u v^T = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} \begin{pmatrix} v_1 & v_2 \end{pmatrix} = \begin{pmatrix} u_1 v_1 & u_1 v_2 \\ u_2 v_1 & u_2 v_2 \end{pmatrix}$$

Based on slides by Joseph Bradley
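The linear algebra operations from these review slides map directly onto NumPy calls. The vectors and matrix below are arbitrary illustrative values, not taken from the slides.

```python
import numpy as np

u = np.array([1.0, 2.0])
v = np.array([3.0, 4.0])

dot = u @ v                        # u1*v1 + u2*v2
outer = np.outer(u, v)             # 2x2 matrix with entries u_i * v_j
length_sq = u @ u                  # dot product of u with itself, i.e. ||u||_2 squared

l1 = np.linalg.norm(v, 1)          # sum of absolute values
l2 = np.linalg.norm(v, 2)          # Euclidean length
linf = np.linalg.norm(v, np.inf)   # largest absolute entry

A = np.array([[1.0, 2.0], [3.0, 4.0]])
x = np.array([5.0, 6.0])
Ax = A @ x                         # matrix-vector product
```

Here `u @ v` and `np.outer(u, v)` correspond to $u^T v$ and $u v^T$ respectively.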

Page 44: Vectorization

•  Benefits of vectorization
–  More compact equations
–  Faster code (using optimized matrix libraries)
•  Consider our model: $h(x) = \sum_{j=0}^{d} \theta_j x_j$
•  Let $\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_d \end{bmatrix}$ and $x^T = \begin{bmatrix} 1 & x_1 & \ldots & x_d \end{bmatrix}$
•  Can write the model in vectorized form as $h(x) = \theta^T x$

Page 45: Vectorization

•  Consider our model for n instances: $h\left(x^{(i)}\right) = \sum_{j=0}^{d} \theta_j x_j^{(i)}$
•  Let

$$\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_d \end{bmatrix} \in \mathbb{R}^{(d+1) \times 1} \qquad X = \begin{bmatrix} 1 & x_1^{(1)} & \ldots & x_d^{(1)} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_1^{(i)} & \ldots & x_d^{(i)} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_1^{(n)} & \ldots & x_d^{(n)} \end{bmatrix} \in \mathbb{R}^{n \times (d+1)}$$

•  Can write the model in vectorized form as $h_\theta(x) = X\theta$

Page 46: Vectorization

•  For the linear regression cost function:

$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2 = \frac{1}{2n} \sum_{i=1}^{n} \left( \theta^T x^{(i)} - y^{(i)} \right)^2 = \frac{1}{2n} (X\theta - y)^T (X\theta - y)$$

Here $X \in \mathbb{R}^{n \times (d+1)}$ and $\theta \in \mathbb{R}^{(d+1) \times 1}$, so $(X\theta - y)^T \in \mathbb{R}^{1 \times n}$ and $(X\theta - y) \in \mathbb{R}^{n \times 1}$.

Let: $y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(n)} \end{bmatrix}$
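To illustrate that the summation and matrix forms of the cost agree, the sketch below evaluates both on random data. The sizes, seed and variable names are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 3
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, d))])  # n x (d+1), first column is x_0 = 1
theta = rng.normal(size=d + 1)
y = rng.normal(size=n)

# Summation form: loop over the n instances.
J_loop = sum((theta @ X[i] - y[i]) ** 2 for i in range(n)) / (2 * n)

# Vectorized form: (X theta - y)^T (X theta - y) / (2n).
r = X @ theta - y
J_vec = (r @ r) / (2 * n)
```

Both expressions compute the same number; the vectorized one delegates the loop to the matrix library.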

Page 47: Closed Form Solution

•  Instead of using GD, solve for optimal $\theta$ analytically
–  Notice that the solution is when $\frac{\partial}{\partial \theta} J(\theta) = 0$
•  Derivation:

$$\begin{aligned}
J(\theta) &= \frac{1}{2n} (X\theta - y)^T (X\theta - y) \\
&\propto \theta^T X^T X \theta - y^T X \theta - \theta^T X^T y + y^T y \\
&\propto \theta^T X^T X \theta - 2 \theta^T X^T y + y^T y
\end{aligned}$$

(each term is $1 \times 1$, so $y^T X \theta = \theta^T X^T y$)

Take derivative and set equal to 0, then solve for $\theta$:

$$\frac{\partial}{\partial \theta} \left( \theta^T X^T X \theta - 2 \theta^T X^T y + y^T y \right) = 0$$

$$(X^T X)\theta - X^T y = 0$$

$$(X^T X)\theta = X^T y$$

Closed Form Solution: $\theta = (X^T X)^{-1} X^T y$

Page 48: Closed Form Solution

•  Can obtain $\theta$ by simply plugging X and $y$ into

$$\theta = (X^T X)^{-1} X^T y$$

$$X = \begin{bmatrix} 1 & x_1^{(1)} & \ldots & x_d^{(1)} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_1^{(i)} & \ldots & x_d^{(i)} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_1^{(n)} & \ldots & x_d^{(n)} \end{bmatrix} \qquad y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(n)} \end{bmatrix}$$

•  If $X^T X$ is not invertible (i.e., singular), may need to:
–  Use pseudo-inverse instead of the inverse
•  In python, numpy.linalg.pinv(a)
–  Remove redundant (not linearly independent) features
–  Remove extra features to ensure that d ≤ n
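A minimal sketch of the closed form solution, including the pseudo-inverse fallback for a singular $X^T X$, might look like this. The toy data, the duplicated feature, and the explicit `rcond` cutoff are assumptions made for the example.

```python
import numpy as np

# Hypothetical data lying exactly on y = 1 + 2x; the first column is x_0 = 1.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

# Normal equations: solve (X^T X) theta = X^T y instead of forming the inverse explicitly.
theta_solve = np.linalg.solve(X.T @ X, X.T @ y)

# Pseudo-inverse in place of the inverse.
theta_pinv = np.linalg.pinv(X.T @ X) @ X.T @ y

# Duplicating a feature makes X^T X singular, so the plain inverse no longer exists,
# but the pseudo-inverse still yields a least squares solution.
X_dup = np.hstack([X, X[:, [1]]])
theta_dup = np.linalg.pinv(X_dup.T @ X_dup, rcond=1e-10) @ X_dup.T @ y
```

Using `np.linalg.solve` rather than an explicit inverse is a common numerical preference, not something the slide prescribes.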

Page 49: Gradient Descent vs Closed Form

Gradient Descent
•  Requires multiple iterations
•  Need to choose α
•  Works well when n is large
•  Can support incremental learning

Closed Form Solution
•  Non-iterative
•  No need for α
•  Slow if d is large
–  Computing $(X^T X)^{-1}$ is roughly $O(d^3)$, since $X^T X$ is $(d+1) \times (d+1)$

Page 50: Improving Learning: Feature Scaling

•  Idea: Ensure that features have similar scales
•  Makes gradient descent converge much faster

[Figure: two panels over $\theta_1$ and $\theta_2$, titled "Before Feature Scaling" and "After Feature Scaling"]

Page 51: Feature Standardization

•  Rescales features to have zero mean and unit variance
–  Let $\mu_j$ be the mean of feature j: $\mu_j = \frac{1}{n} \sum_{i=1}^{n} x_j^{(i)}$
–  Replace each value with: $x_j^{(i)} \leftarrow \frac{x_j^{(i)} - \mu_j}{s_j}$ for j = 1 ... d (not $x_0$!)
•  $s_j$ is the standard deviation of feature j
•  Could also use the range of feature j ($\max_j - \min_j$) for $s_j$
•  Must apply the same transformation to instances for both training and prediction
•  Outliers can cause problems
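The standardization steps above can be sketched as follows. The helper names and the housing-style numbers are hypothetical; the point is that $\mu_j$ and $s_j$ are computed on the training data only and then reused for new instances.

```python
import numpy as np

def fit_standardizer(X_train):
    """Return per-feature mean and standard deviation of the training data."""
    mu = X_train.mean(axis=0)
    s = X_train.std(axis=0)
    return mu, s

def standardize(X, mu, s):
    return (X - mu) / s

# Hypothetical raw features without the x_0 column: size in square feet, number of rooms.
X_train = np.array([[2100.0, 3.0], [1600.0, 2.0], [2400.0, 4.0], [1400.0, 3.0]])
X_new = np.array([[1800.0, 3.0]])

mu, s = fit_standardizer(X_train)
Z_train = standardize(X_train, mu, s)
Z_new = standardize(X_new, mu, s)   # reuse the training mu and s at prediction time

# Add x_0 = 1 only after scaling, so the bias column is left untouched.
Z_train_with_bias = np.hstack([np.ones((len(Z_train), 1)), Z_train])
```

Swapping `X_train.std(axis=0)` for the per-feature range would give the range-based variant mentioned on the slide.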

Page 52: Quality of Fit

[Figure: three panels of Price versus Size, titled "Underfitting (high bias)", "Correct fit", and "Overfitting (high variance)"]

Overfitting:
•  The learned hypothesis may fit the training set very well ($J(\theta) \approx 0$)
•  ...but fails to generalize to new examples

Based on example by Andrew Ng

Page 53: Regularization

•  A method for automatically controlling the complexity of the learned hypothesis
•  Idea: penalize for large values of $\theta_j$
–  Can incorporate into the cost function
–  Works well when we have a lot of features, each that contributes a bit to predicting the label
•  Can also address overfitting by eliminating features (either manually or via model selection)

Page 54: Regularization

•  Linear regression objective function

$$J(\theta) = \underbrace{\frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2}_{\text{model fit to data}} + \underbrace{\frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2}_{\text{regularization}}$$

–  $\lambda$ is the regularization parameter ($\lambda \geq 0$)
–  No regularization on $\theta_0$!

Page 55: Understanding Regularization

$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2$$

•  Note that $\sum_{j=1}^{d} \theta_j^2 = \|\theta_{1:d}\|_2^2$
–  This is the squared magnitude of the feature coefficient vector!
•  We can also think of this as: $\sum_{j=1}^{d} (\theta_j - 0)^2 = \|\theta_{1:d} - \vec{0}\|_2^2$
•  L2 regularization pulls coefficients toward 0

Pages 56 to 57: Understanding Regularization

$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2$$

•  What happens if we set $\lambda$ to be huge (e.g., $10^{10}$)?

[Figure: Price versus Size; on page 57 the feature coefficients are annotated with 0]

Based on example by Andrew Ng

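Page 58: Regularized Linear Regression

•  Cost Function

$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2$$

•  Fit by solving $\min_\theta J(\theta)$
•  Gradient update:

$$\theta_0 \leftarrow \theta_0 - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)$$

$$\theta_j \leftarrow \theta_j - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)} - \alpha \lambda \theta_j$$

The sums come from $\frac{\partial}{\partial \theta_0} J(\theta)$ and $\frac{\partial}{\partial \theta_j} J(\theta)$; the final term $-\alpha \lambda \theta_j$ is the regularization.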

Page 59: Regularized Linear Regression

$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2$$

$$\theta_0 \leftarrow \theta_0 - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)$$

$$\theta_j \leftarrow \theta_j - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)} - \alpha \lambda \theta_j$$

•  We can rewrite the gradient step as:

$$\theta_j \leftarrow \theta_j (1 - \alpha \lambda) - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}$$
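The rewritten update, in which each $\theta_j$ is first shrunk by $(1 - \alpha\lambda)$ and then moved along the data-fit gradient, can be sketched as below. The function names, toy data, step size and iteration count are illustrative assumptions.

```python
import numpy as np

def regularized_step(theta, X, y, alpha, lam):
    """One simultaneous update: shrink theta_1..theta_d, then take the data-fit step."""
    n = len(y)
    grad_fit = X.T @ (X @ theta - y) / n     # data-fit part of the gradient, all j at once
    shrink = np.full_like(theta, 1.0 - alpha * lam)
    shrink[0] = 1.0                          # theta_0 is not regularized
    return theta * shrink - alpha * grad_fit

# Hypothetical toy data lying on y = x, with x_0 = 1 in the first column.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])

def fit(lam, alpha=0.05, iters=20_000):
    theta = np.zeros(2)
    for _ in range(iters):
        theta = regularized_step(theta, X, y, alpha, lam)
    return theta

theta_plain = fit(lam=0.0)   # no regularization
theta_reg = fit(lam=1.0)     # slope is pulled toward 0, intercept compensates
```

Comparing `theta_plain` and `theta_reg` shows the slope coefficient shrinking while the unregularized intercept is free to adjust.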

Pages 60 to 61: Regularized Linear Regression

•  To incorporate regularization into the closed form solution:

$$\theta = \left( X^T X + \lambda \begin{bmatrix} 0 & 0 & 0 & \ldots & 0 \\ 0 & 1 & 0 & \ldots & 0 \\ 0 & 0 & 1 & \ldots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \ldots & 1 \end{bmatrix} \right)^{-1} X^T y$$

•  Can derive this the same way, by solving $\frac{\partial}{\partial \theta} J(\theta) = 0$
•  Can prove that for λ > 0, inverse exists in the equation above
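The regularized closed form can be sketched directly from the matrix expression on the slide. The function name `ridge_closed_form` and the data are assumptions; the duplicated feature column is chosen deliberately so that $X^T X$ alone would be singular.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Solve (X^T X + lam * D) theta = X^T y, where D is the identity with D[0, 0] = 0."""
    num_params = X.shape[1]
    D = np.eye(num_params)
    D[0, 0] = 0.0                      # leave theta_0 unregularized
    return np.linalg.solve(X.T @ X + lam * D, X.T @ y)

# Hypothetical data with a duplicated feature column: X^T X by itself is singular here.
X = np.array([[1.0, 1.0, 1.0], [1.0, 2.0, 2.0], [1.0, 3.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])

theta_small = ridge_closed_form(X, y, lam=0.1)
theta_large = ridge_closed_form(X, y, lam=1e10)   # a huge lambda flattens the fit
```

With a very large λ the feature coefficients are driven toward 0 and $\theta_0$ approaches the mean of the labels, which matches the flat-line picture from the "huge λ" slides.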