Linear Regression

Robot image credit: Viktoriya Sukhanova © 123RF.com

These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution. Please send comments and corrections to Eric.
Regression

Given:
– Data $X = \{x^{(1)}, \dots, x^{(n)}\}$ where $x^{(i)} \in \mathbb{R}^d$
– Corresponding labels $y = \{y^{(1)}, \dots, y^{(n)}\}$ where $y^{(i)} \in \mathbb{R}$

[Figure: September Arctic sea ice extent (1,000,000 sq km) vs. year, 1975–2015, with linear and quadratic regression fits. Data from G. Witt, Journal of Statistics Education, Volume 21, Number 1 (2013)]
Prostate Cancer Dataset

• 97 samples, partitioned into 67 train / 30 test
• Eight predictors (features):
– 6 continuous (4 log transforms), 1 binary, 1 ordinal
• Continuous outcome variable:
– lpsa: log(prostate specific antigen level)

Based on slide by Jeff Howbert
Linear Regression

• Hypothesis:
$$y = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_d x_d = \sum_{j=0}^{d} \theta_j x_j \qquad \text{(assume } x_0 = 1\text{)}$$
• Fit model by minimizing sum of squared errors

[Figures: 1D datasets with fitted least squares lines]

Figures are courtesy of Greg Shakhnarovich
Least Squares Linear Regression

• Cost function:
$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2$$
• Fit by solving $\min_\theta J(\theta)$
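As a concrete illustration (a minimal sketch of our own, not from the original slides), this cost can be computed in a few lines of NumPy; the names `J`, `X`, `y`, and `theta` are ours, and `X` is assumed to already contain the $x_0 = 1$ column:

```python
import numpy as np

def J(theta, X, y):
    """Least squares cost: (1/2n) * sum_i (h_theta(x^(i)) - y^(i))^2."""
    n = len(y)
    residuals = X @ theta - y          # h_theta(x^(i)) - y^(i), all i at once
    return (residuals @ residuals) / (2 * n)
```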
Intuition Behind Cost Function

For insight on $J(\theta)$, let's assume $x \in \mathbb{R}$, so $\theta = [\theta_0, \theta_1]$:
$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2$$

Based on example by Andrew Ng
Intuition Behind Cost Function

[Figures: for fixed $\theta$, $h_\theta(x)$ plotted as a function of $x$ (left), alongside $J$ plotted as a function of the parameter $\theta_1$ (right)]

For the three training points $(1, 1)$, $(2, 2)$, $(3, 3)$, evaluating the cost at sample parameter values:
$$J([0, 0.5]) = \frac{1}{2 \times 3} \left[ (0.5 - 1)^2 + (1 - 2)^2 + (1.5 - 3)^2 \right] \approx 0.58$$
$$J([0, 0]) \approx 2.333$$

$J(\theta)$ is convex.

Based on example by Andrew Ng
Intuition Behind Cost Function

[Series of figures by Andrew Ng: for fixed $\theta$, $h_\theta(x)$ as a function of $x$ (left), alongside the contour plot of $J$ as a function of the parameters $\theta_0, \theta_1$ (right); each point on the contour plot corresponds to a different candidate line]
Basic Search Procedure

• Choose an initial value for $\theta$
• Until we reach a minimum:
– Choose a new value for $\theta$ to reduce $J(\theta)$

[Figures by Andrew Ng: surface plot of $J(\theta_0, \theta_1)$; the search repeatedly steps downhill, and different initializations can descend into different minima]
Since the least squares objective function is convex, we don't need to worry about local minima.
Gradient Descent

• Initialize $\theta$
• Repeat until convergence:
$$\theta_j \leftarrow \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta) \qquad \text{simultaneous update for } j = 0 \dots d$$
where $\alpha$ is the learning rate (small), e.g., $\alpha = 0.05$

[Figure: $J(\theta)$ plotted against $\theta$, with successive gradient steps moving downhill toward the minimum]
Gradient Descent

• Initialize $\theta$
• Repeat until convergence:
$$\theta_j \leftarrow \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta) \qquad \text{simultaneous update for } j = 0 \dots d$$

For linear regression:
$$\begin{aligned}
\frac{\partial}{\partial \theta_j} J(\theta) &= \frac{\partial}{\partial \theta_j} \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2 \\
&= \frac{\partial}{\partial \theta_j} \frac{1}{2n} \sum_{i=1}^{n} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right)^2 \\
&= \frac{1}{n} \sum_{i=1}^{n} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right) \times \frac{\partial}{\partial \theta_j} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right) \\
&= \frac{1}{n} \sum_{i=1}^{n} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right) x_j^{(i)} \\
&= \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}
\end{aligned}$$
Gradient Descent for Linear Regression

• Initialize $\theta$
• Repeat until convergence:
$$\theta_j \leftarrow \theta_j - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)} \qquad \text{simultaneous update for } j = 0 \dots d$$
• To achieve the simultaneous update, at the start of each GD iteration compute $h_\theta\!\left(x^{(i)}\right)$ and use this stored value in the update step loop
• Assume convergence when $\left\| \theta_{\text{new}} - \theta_{\text{old}} \right\|_2 < \epsilon$, where the L2 norm is
$$\|v\|_2 = \sqrt{\sum_i v_i^2} = \sqrt{v_1^2 + v_2^2 + \dots + v_{|v|}^2}$$
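Putting the pieces together, here is a minimal sketch of this loop in NumPy (our own illustration, not from the slides); it assumes `X` already includes the leading column of ones, and `alpha` and `eps` are user-chosen:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.05, eps=1e-6, max_iters=10000):
    """Batch gradient descent for least squares linear regression."""
    n, d1 = X.shape                       # d1 = d + 1 (includes x_0 = 1 column)
    theta = np.zeros(d1)
    for _ in range(max_iters):
        h = X @ theta                     # store h_theta(x^(i)) once per iteration
        grad = (X.T @ (h - y)) / n        # (1/n) sum_i (h - y) * x_j^(i), all j
        theta_new = theta - alpha * grad  # simultaneous update for j = 0..d
        if np.linalg.norm(theta_new - theta) < eps:  # ||new - old||_2 < eps
            return theta_new
        theta = theta_new
    return theta
```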
Gradient Descent

[Series of figures by Andrew Ng: for fixed $\theta$, $h_\theta(x)$ as a function of $x$ (left), alongside the contour plot of $J$ as a function of the parameters (right); starting from the initial hypothesis $h(x) = -900 - 0.1x$, each gradient descent step moves the parameters toward the minimum and the fitted line closer to the data]
Choosing α

• α too small: slow convergence
• α too large: increasing value of $J(\theta)$
– May overshoot the minimum
– May fail to converge
– May even diverge

To see if gradient descent is working, print out $J(\theta)$ each iteration (see the sketch below)
• The value should decrease at each iteration
• If it doesn't, adjust α
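One way to do this diagnostic (a sketch of our own; the function name and defaults are ours):

```python
import numpy as np

def gd_with_history(X, y, alpha, iters=100):
    """Run gradient descent and record J(theta) at every iteration."""
    theta = np.zeros(X.shape[1])
    history = []
    for _ in range(iters):
        h = X @ theta
        history.append(((h - y) @ (h - y)) / (2 * len(y)))  # J(theta)
        theta -= alpha * (X.T @ (h - y)) / len(y)
    return theta, history

# If history is increasing rather than steadily decreasing, alpha is too large.
```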
Extending Linear Regression to More Complex Models

• The inputs X for linear regression can be:
– Original quantitative inputs
– Transformations of quantitative inputs
• e.g., log, exp, square root, square, etc.
– Polynomial transformations
• example: $y = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3$
– Basis expansions
– Dummy coding of categorical inputs
– Interactions between variables
• example: $x_3 = x_1 \times x_2$

This allows use of linear regression techniques to fit non-linear datasets.
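As a small illustration of this idea (our own sketch, not part of the original slides), polynomial features can be built explicitly before fitting an ordinary linear model:

```python
import numpy as np

def poly_features(x, degree=3):
    """Map 1D input x to columns [1, x, x^2, ..., x^degree], so that a linear
    fit in the expanded features is a polynomial fit in x."""
    return np.column_stack([x**j for j in range(degree + 1)])

x = np.array([0.0, 1.0, 2.0, 3.0])
X = poly_features(x, degree=3)   # columns: 1, x, x^2, x^3
```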
Linear Basis Function Models

• Generally,
$$h_\theta(x) = \sum_{j=0}^{d} \theta_j \phi_j(x)$$
where each $\phi_j(x)$ is a basis function
• Typically, $\phi_0(x) = 1$, so that $\theta_0$ acts as a bias
• In the simplest case, we use linear basis functions: $\phi_j(x) = x_j$

Based on slide by Christopher Bishop (PRML)
Linear Basis Function Models

• Polynomial basis functions (e.g., $\phi_j(x) = x^j$):
– These are global; a small change in x affects all basis functions
• Gaussian basis functions (e.g., $\phi_j(x) = \exp\!\left(-\frac{(x - \mu_j)^2}{2s^2}\right)$):
– These are local; a small change in x only affects nearby basis functions. $\mu_j$ and $s$ control location and scale (width).

Based on slide by Christopher Bishop (PRML)
Linear Basis Function Models

• Sigmoidal basis functions (e.g., $\phi_j(x) = \sigma\!\left(\frac{x - \mu_j}{s}\right)$, where $\sigma(a) = \frac{1}{1 + \exp(-a)}$):
– These are also local; a small change in x only affects nearby basis functions. $\mu_j$ and $s$ control location and scale (slope).

Based on slide by Christopher Bishop (PRML)
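These expansions are easy to sketch in code (our own illustration, using the standard forms given above; `centers` and `s` are user-chosen):

```python
import numpy as np

def gaussian_basis(x, centers, s):
    """phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2)), one column per center mu_j."""
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * s**2))

def sigmoid_basis(x, centers, s):
    """phi_j(x) = sigma((x - mu_j) / s), with sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-(x[:, None] - centers[None, :]) / s))
```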
Example of Fitting a Polynomial Curve with a Linear Model

$$y = \theta_0 + \theta_1 x + \theta_2 x^2 + \dots + \theta_p x^p = \sum_{j=0}^{p} \theta_j x^j$$
Linear Basis Function Models

• Basic linear model:
$$h_\theta(x) = \sum_{j=0}^{d} \theta_j x_j$$
• Generalized linear model:
$$h_\theta(x) = \sum_{j=0}^{d} \theta_j \phi_j(x)$$
• Once we have replaced the data by the outputs of the basis functions, fitting the generalized model is exactly the same problem as fitting the basic model
– Unless we use the kernel trick – more on that when we cover support vector machines
– Therefore, there is no point in cluttering the math with basis functions

Based on slide by Geoff Hinton
Linear Algebra Concepts

• A vector in $\mathbb{R}^d$ is an ordered set of d real numbers
– e.g., $v = [1, 6, 3, 4]$ is in $\mathbb{R}^4$
– "$[1, 6, 3, 4]$" denotes a column vector, as opposed to a row vector $(1\ 6\ 3\ 4)$
• An m-by-n matrix is an object with m rows and n columns, where each entry is a real number

Based on slides by Joseph Bradley
Linear Algebra Concepts

• Transpose: reflect a vector/matrix across its main diagonal:
$$\begin{pmatrix} a \\ b \end{pmatrix}^{\!T} = \begin{pmatrix} a & b \end{pmatrix} \qquad \begin{pmatrix} a & b \\ c & d \end{pmatrix}^{\!T} = \begin{pmatrix} a & c \\ b & d \end{pmatrix}$$
– Note: $(Ax)^T = x^T A^T$ (we'll define multiplication soon...)
• Vector norms:
– The Lp norm of $v = (v_1, \dots, v_k)$ is $\left( \sum_i |v_i|^p \right)^{1/p}$
– Common norms: L1, L2
– L-infinity: $\max_i |v_i|$
• The length of a vector $v$ is its L2 norm

Based on slides by Joseph Bradley
Linear Algebra Concepts

• Vector dot product:
$$u \cdot v = \begin{pmatrix} u_1 & u_2 \end{pmatrix} \cdot \begin{pmatrix} v_1 & v_2 \end{pmatrix} = u_1 v_1 + u_2 v_2$$
– Note: the dot product of u with itself is $\text{length}(u)^2 = \|u\|_2^2$
• Matrix product:
$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \quad B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}$$
$$AB = \begin{pmatrix} a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22} \\ a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{12} + a_{22}b_{22} \end{pmatrix}$$

Based on slides by Joseph Bradley
Linear Algebra Concepts

• Vector products:
– Dot product:
$$u \cdot v = u^T v = \begin{pmatrix} u_1 & u_2 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = u_1 v_1 + u_2 v_2$$
– Outer product:
$$u v^T = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} \begin{pmatrix} v_1 & v_2 \end{pmatrix} = \begin{pmatrix} u_1 v_1 & u_1 v_2 \\ u_2 v_1 & u_2 v_2 \end{pmatrix}$$

Based on slides by Joseph Bradley
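These operations map directly onto NumPy; a small reference sketch of our own:

```python
import numpy as np

u = np.array([1.0, 2.0])
v = np.array([3.0, 4.0])

u @ v               # dot product: u1*v1 + u2*v2 = 11.0
np.outer(u, v)      # outer product: [[3., 4.], [6., 8.]]
np.linalg.norm(u)   # L2 norm (length); norm(u)**2 equals u @ u

A = np.array([[1.0, 2.0], [3.0, 4.0]])
A.T                 # transpose
A @ A               # matrix product
```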
Vectorization

• Benefits of vectorization:
– More compact equations
– Faster code (using optimized matrix libraries)
• Consider our model:
$$h(x) = \sum_{j=0}^{d} \theta_j x_j$$
• Let
$$\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_d \end{bmatrix} \qquad x^T = \begin{bmatrix} 1 & x_1 & \dots & x_d \end{bmatrix}$$
• We can write the model in vectorized form as $h(x) = \theta^T x$
Vectorization

• Consider our model for n instances:
$$h\!\left(x^{(i)}\right) = \sum_{j=0}^{d} \theta_j x_j^{(i)}$$
• Let
$$\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_d \end{bmatrix} \in \mathbb{R}^{(d+1) \times 1} \qquad X = \begin{bmatrix} 1 & x_1^{(1)} & \dots & x_d^{(1)} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_1^{(i)} & \dots & x_d^{(i)} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_1^{(n)} & \dots & x_d^{(n)} \end{bmatrix} \in \mathbb{R}^{n \times (d+1)}$$
• We can write the model in vectorized form as $h_\theta(x) = X\theta$
Vectorization

• For the linear regression cost function:
$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2 = \frac{1}{2n} \sum_{i=1}^{n} \left( \theta^T x^{(i)} - y^{(i)} \right)^2$$
• Let:
$$y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(n)} \end{bmatrix} \in \mathbb{R}^{n \times 1}$$
• Then, with $X \in \mathbb{R}^{n \times (d+1)}$ and $\theta \in \mathbb{R}^{(d+1) \times 1}$:
$$J(\theta) = \frac{1}{2n} (X\theta - y)^T (X\theta - y)$$
where $(X\theta - y)^T \in \mathbb{R}^{1 \times n}$ and $(X\theta - y) \in \mathbb{R}^{n \times 1}$
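A quick numerical check (our own sketch) that the vectorized form matches the per-example sum:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(5), rng.normal(size=(5, 2))])  # n=5, d=2
y = rng.normal(size=5)
theta = rng.normal(size=3)

J_loop = sum((X[i] @ theta - y[i])**2 for i in range(5)) / (2 * 5)
r = X @ theta - y
J_vec = (r @ r) / (2 * 5)   # (X theta - y)^T (X theta - y) / (2n)
assert np.isclose(J_loop, J_vec)
```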
Closed Form Solution

• Instead of using GD, solve for the optimal $\theta$ analytically
– Notice that the solution is where $\frac{\partial}{\partial \theta} J(\theta) = 0$
• Derivation:
$$J(\theta) = \frac{1}{2n} (X\theta - y)^T (X\theta - y) \propto \theta^T X^T X \theta - y^T X \theta - \theta^T X^T y + y^T y \propto \theta^T X^T X \theta - 2\theta^T X^T y + y^T y$$
(since $y^T X \theta$ is $1 \times 1$, it equals its transpose $\theta^T X^T y$, so the two middle terms combine)
Take the derivative and set it equal to 0, then solve for $\theta$:
$$\frac{\partial}{\partial \theta} \left( \theta^T X^T X \theta - 2\theta^T X^T y + y^T y \right) = 0$$
$$(X^T X)\theta - X^T y = 0$$
$$(X^T X)\theta = X^T y$$
$$\theta = (X^T X)^{-1} X^T y$$
Closed Form Solution

• Can obtain $\theta$ by simply plugging X and y into
$$\theta = (X^T X)^{-1} X^T y \qquad X = \begin{bmatrix} 1 & x_1^{(1)} & \dots & x_d^{(1)} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_1^{(n)} & \dots & x_d^{(n)} \end{bmatrix} \qquad y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(n)} \end{bmatrix}$$
• If $X^T X$ is not invertible (i.e., singular), we may need to:
– Use the pseudo-inverse instead of the inverse
• In Python: numpy.linalg.pinv(a)
– Remove redundant (not linearly independent) features
– Remove extra features to ensure that d ≤ n
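A minimal sketch (ours) of this closed-form fit; numpy.linalg.pinv covers the singular case noted above, and np.linalg.lstsq is a more numerically stable alternative in practice:

```python
import numpy as np

def fit_closed_form(X, y):
    """theta = (X^T X)^{-1} X^T y, using the pseudo-inverse for robustness."""
    return np.linalg.pinv(X.T @ X) @ (X.T @ y)

# More numerically stable alternative:
# theta, *_ = np.linalg.lstsq(X, y, rcond=None)
```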
Gradient Descent vs. Closed Form Solution

Gradient Descent:
• Requires multiple iterations
• Need to choose α
• Works well even when d is large
• Can support incremental learning

Closed Form Solution:
• Non-iterative
• No need for α
• Slow if d is large
– Computing $(X^T X)^{-1}$ is roughly $O(d^3)$
Improving Learning: Feature Scaling

• Idea: ensure that features have similar scales
• Makes gradient descent converge much faster

[Figures: contour plots of J over $(\theta_1, \theta_2)$ before feature scaling and after feature scaling]
Feature Standardization

• Rescales features to have zero mean and unit variance
– Let $\mu_j$ be the mean of feature j:
$$\mu_j = \frac{1}{n} \sum_{i=1}^{n} x_j^{(i)}$$
– Replace each value with:
$$x_j^{(i)} \leftarrow \frac{x_j^{(i)} - \mu_j}{s_j} \qquad \text{for } j = 1 \dots d \text{ (not } x_0\text{!)}$$
• $s_j$ is the standard deviation of feature j
• Could also use the range of feature j ($\max_j - \min_j$) for $s_j$
• Must apply the same transformation to instances for both training and prediction
• Outliers can cause problems
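A small sketch (ours) of this transformation; it stores μ and s from the training set so the identical transformation can be reapplied at prediction time:

```python
import numpy as np

def standardize_fit(X):
    """Per-feature mean and std from training data, skipping the x_0 column."""
    return X[:, 1:].mean(axis=0), X[:, 1:].std(axis=0)

def standardize_apply(X, mu, s):
    """Replace each x_j with (x_j - mu_j) / s_j, leaving the bias column alone."""
    Xs = X.copy()
    Xs[:, 1:] = (Xs[:, 1:] - mu) / s
    return Xs
```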
Quality of Fit

Overfitting:
• The learned hypothesis may fit the training set very well ($J(\theta) \approx 0$)
• ...but fails to generalize to new examples

[Figures: price vs. size with three fits – underfitting (high bias), correct fit, and overfitting (high variance)]

Based on example by Andrew Ng
Regularization

• A method for automatically controlling the complexity of the learned hypothesis
• Idea: penalize large values of $\theta_j$
– Can incorporate the penalty into the cost function
– Works well when we have a lot of features, each of which contributes a bit to predicting the label
• Can also address overfitting by eliminating features (either manually or via model selection)
Regularization

• Linear regression objective function:
$$J(\theta) = \underbrace{\frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2}_{\text{model fit to data}} + \underbrace{\frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2}_{\text{regularization}}$$
– $\lambda$ is the regularization parameter ($\lambda \geq 0$)
– No regularization on $\theta_0$!
Understanding Regularization

$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2$$

• Note that
$$\sum_{j=1}^{d} \theta_j^2 = \|\theta_{1:d}\|_2^2$$
– This is the squared magnitude of the feature coefficient vector!
• We can also think of this as:
$$\sum_{j=1}^{d} (\theta_j - 0)^2 = \|\theta_{1:d} - \vec{0}\|_2^2$$
• L2 regularization pulls the coefficients toward 0
Understanding Regularization

$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2$$

• What happens if we set $\lambda$ to be huge (e.g., $10^{10}$)?
– The penalty drives $\theta_1, \dots, \theta_d$ to approximately 0, leaving only the constant hypothesis $h_\theta(x) \approx \theta_0$

[Figures: price vs. size; with huge λ the fitted curve flattens toward a horizontal line]

Based on example by Andrew Ng
Regularized Linear Regression

• Cost function:
$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2$$
• Fit by solving $\min_\theta J(\theta)$
• Gradient update:
$$\theta_0 \leftarrow \theta_0 - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) \qquad \text{(uses } \tfrac{\partial}{\partial \theta_0} J(\theta)\text{)}$$
$$\theta_j \leftarrow \theta_j - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)} - \alpha\lambda\theta_j \qquad \text{(uses } \tfrac{\partial}{\partial \theta_j} J(\theta)\text{; the } -\alpha\lambda\theta_j \text{ term is the regularization)}$$
Regularized Linear Regression

$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2$$

• We can rewrite the gradient step as:
$$\theta_j \leftarrow \theta_j - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)} - \alpha\lambda\theta_j$$
$$\theta_j \leftarrow \theta_j (1 - \alpha\lambda) - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}$$
$$\theta_0 \leftarrow \theta_0 - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)$$
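This shrink-then-step form is easy to see in code (a sketch of our own, extending the earlier gradient descent loop; the function name is ours):

```python
import numpy as np

def ridge_gd_step(theta, X, y, alpha, lam):
    """One L2-regularized gradient step; theta_0 is not shrunk."""
    n = len(y)
    grad = (X.T @ (X @ theta - y)) / n
    new_theta = theta * (1 - alpha * lam) - alpha * grad  # shrink, then step
    new_theta[0] = theta[0] - alpha * grad[0]             # no shrink on theta_0
    return new_theta
```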
Regularized Linear Regression

• To incorporate regularization into the closed form solution:
$$\theta = \left( X^T X + \lambda \begin{bmatrix} 0 & 0 & 0 & \dots & 0 \\ 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 1 \end{bmatrix} \right)^{-1} X^T y$$
• Can derive this the same way, by solving $\frac{\partial}{\partial \theta} J(\theta) = 0$
• Can prove that for λ > 0, the inverse in the equation above always exists
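A closing sketch (ours) of this regularized closed form; `lam` stands in for λ:

```python
import numpy as np

def fit_ridge_closed_form(X, y, lam):
    """theta = (X^T X + lam * D)^{-1} X^T y, where D is the identity with
    D[0, 0] = 0 so that the bias theta_0 is not regularized."""
    d1 = X.shape[1]
    D = np.eye(d1)
    D[0, 0] = 0.0                        # no regularization on theta_0
    return np.linalg.solve(X.T @ X + lam * D, X.T @ y)
```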