CPSC 340: Machine Learning and Data Miningfwood/CS340/lectures/L12.pdf · 2020. 9. 6. · Machine...
CPSC 340: Machine Learning and Data Mining
Least Squares
Fall 2020
Admin
• Assignment 3 is up:
  – Start early, this is usually the longest assignment.
• We’re going to start using calculus and linear algebra a lot.
  – You should start reviewing these ASAP if you are rusty.
  – A review of relevant calculus concepts is here.
  – A review of relevant linear algebra concepts is here.
Supervised Learning Round 2: Regression
• We’re going to revisit supervised learning:
• Previously, we considered classification:
  – We assumed $y_i$ was discrete: $y_i$ = ‘spam’ or $y_i$ = ‘not spam’.
• Now we’re going to consider regression:
  – We allow $y_i$ to be numerical: $y_i$ = 10.34 cm.
Example: Dependent vs. Explanatory Variables
• We want to discover relationships between numerical variables:
  – Does the number of lung cancer deaths change with the number of cigarettes?
  – Does the number of skin cancer deaths change with latitude?
http://www.cvgs.k12.va.us:81/digstats/main/inferant/d_regrs.html
https://onlinecourses.science.psu.edu/stat501/node/11
Example: Dependent vs. Explanatory Variables
• We want to discover relationships between numerical variables:
  – Do people in big cities walk faster?
  – Is the universe expanding or shrinking or staying the same size?
http://hosting.astro.cornell.edu/academics/courses/astro201/hubbles_law.htm
https://www.nature.com/articles/259557a0.pdf
Example: Dependent vs. Explanatory Variables
• We want to discover relationships between numerical variables:
  – Does the number of gun deaths change with gun ownership?
  – Does the number of violent crimes change with violent video games?
http://www.vox.com/2015/10/3/9444417/gun-violence-united-states-america
https://www.soundandvision.com/content/violence-and-video-games
Example: Dependent vs. Explanatory Variables
• We want to discover relationships between numerical variables:
  – Does a higher gender equality index lead to more women STEM grads?
• Note that we’re doing supervised learning:
  – Trying to predict the value of 1 variable (the ‘$y_i$’ values), instead of measuring correlation between 2.
• Supervised learning does not give causality:
  – OK: “Higher index is correlated with lower grad %”.
  – OK: “Higher index helps predict lower grad %”.
  – BAD: “Higher index leads to lower grad %”.
• People/media get these confused all the time, be careful!
• There are lots of potential reasons for this correlation.
https://www.weforum.org/agenda/2018/02/does-gender-equality-result-in-fewer-female-stem-grads/
Handling Numerical Labels
• One way to handle numerical $y_i$: discretize.
  – E.g., for ‘age’ we could use {‘age ≤ 20’, ‘20 < age ≤ 30’, ‘age > 30’}.
  – Now we can apply methods for classification to do regression.
  – But coarse discretization loses resolution.
  – And fine discretization requires lots of data.
• There exist regression versions of classification methods:
  – Regression trees, probabilistic models, non-parametric models.
• Today: one of the oldest, but still most popular/important methods:
  – Linear regression based on squared error.
  – Interpretable and the building block for more-complex methods.
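The discretization idea above can be sketched in a few lines. This is only an illustration: the function name `discretize_age` and the sample ages are hypothetical, and the bins are the three listed in the example.

```python
def discretize_age(age):
    """Map a numerical age to one of the three bins from the example."""
    if age <= 20:
        return "age <= 20"
    elif age <= 30:
        return "20 < age <= 30"
    return "age > 30"

# Hypothetical ages: 31 and 90 fall into the same bin, so a classifier
# trained on these labels cannot tell them apart (lost resolution).
ages = [18, 25, 31, 90]
labels = [discretize_age(a) for a in ages]
print(labels)
```

Adding more bins would separate 31 from 90, but each bin would then contain fewer training examples.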
Linear Regression in 1 Dimension
• Assume we only have 1 feature (d = 1):
  – E.g., $x_i$ is number of cigarettes and $y_i$ is number of lung cancer deaths.
• Linear regression makes predictions $\hat{y}_i$ using a linear function of $x_i$: $\hat{y}_i = w x_i$.
• The parameter ‘w’ is the weight or regression coefficient of $x_i$.
  – We’re temporarily ignoring the y-intercept.
• As $x_i$ changes, slope ‘w’ affects the rate that $\hat{y}_i$ increases/decreases:
  – Positive ‘w’: $\hat{y}_i$ increases as $x_i$ increases.
  – Negative ‘w’: $\hat{y}_i$ decreases as $x_i$ increases.
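A minimal sketch of the 1-dimensional prediction rule, assuming the model with no y-intercept described above. The function name `predict` and the numbers are hypothetical.

```python
def predict(w, x):
    """Prediction of a 1-feature linear model with no y-intercept."""
    return w * x

xs = [0.0, 1.0, 2.0]                    # hypothetical feature values
print([predict(2.0, x) for x in xs])    # positive slope: predictions rise with x
print([predict(-0.5, x) for x in xs])   # negative slope: predictions fall with x
```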
Aside: terminology woes
• Different fields use different terminology and symbols.
  – Data points = objects = examples = rows = observations.
  – Inputs = predictors = features = explanatory variables = regressors = independent variables = covariates = columns.
  – Outputs = outcomes = targets = response variables = dependent variables (also called a “label” if it’s categorical).
  – Regression coefficients = weights = parameters = betas.
• With linear regression, the symbols are inconsistent too:
  – In ML, the data is X and y, and the weights are w.
  – In statistics, the data is X and y, and the weights are β.
  – In optimization, the data is A and b, and the weights are x.
Least Squares Objective
• Our linear model is given by: $\hat{y}_i = w x_i$.
• So we make predictions for a new example with feature value $\tilde{x}$ by using: $\hat{y} = w \tilde{x}$.
• But we can’t use the same error as before:
  – We are unlikely to find a line where $\hat{y}_i = y_i$ exactly for many points.
    • Due to noise, the relationship not being quite linear, or just floating-point issues.
  – The “best” model may have $|\hat{y}_i - y_i|$ small but not exactly 0.
Least Squares Objective
• Instead of “exact $y_i$”, we evaluate the “size” of the error in prediction.
• The classic way is setting slope ‘w’ to minimize the sum of squared errors: $f(w) = \sum_{i=1}^n (w x_i - y_i)^2$.
• There are some justifications for this choice.
  – A probabilistic interpretation is coming later in the course.
• But usually, it is done because it is easy to minimize.
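As a rough illustration of the objective, the sketch below evaluates the sum of squared errors for a few candidate slopes. The data and the function name `sum_squared_errors` are hypothetical, and the objective is assumed to be the unscaled sum $\sum_i (w x_i - y_i)^2$.

```python
def sum_squared_errors(w, xs, ys):
    """Sum over examples of (prediction - target)^2 for the model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys))

# Hypothetical data that is roughly, but not exactly, on the line y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.5]

for w in (1.0, 2.0, 3.0):
    print(w, sum_squared_errors(w, xs, ys))
```

No slope makes the error exactly zero on this data, but slopes near 2 give a much smaller error than slopes of 1 or 3.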
Minimizing a Differentiable Function
• Math 101 approach to minimizing a differentiable function ‘f’:
  1. Take the derivative of ‘f’.
  2. Find points ‘w’ where the derivative f’(w) is equal to 0.
  3. Choose the one with the smallest value f(w) (and check that f’’(w) is positive).
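The three steps above can be traced on a small example. The quartic below is a hypothetical function chosen because its derivative factors by hand as $w(w-2)(w+1)$; it is not from the lecture. Because this quartic grows without bound in both directions, its global minimum must be at one of the critical points.

```python
def f(w):
    return w**4 / 4 - w**3 / 3 - w**2

def f_prime(w):
    # Step 1: derivative of f; it factors as w * (w - 2) * (w + 1).
    return w**3 - w**2 - 2 * w

def f_double_prime(w):
    return 3 * w**2 - 2 * w - 2

# Step 2: points where f'(w) = 0, read off from the factorization.
critical_points = [-1.0, 0.0, 2.0]

# Step 3: keep points with positive second derivative, then take the
# one with the smallest function value.
candidates = [w for w in critical_points if f_double_prime(w) > 0]
best = min(candidates, key=f)
print(candidates, best, f(best))
```

The point w = 0 is discarded because the second derivative is negative there (a local maximum).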
Digression: Multiplying by a Positive Constant
• Note that this problem:
• Has the same set of minimizers as this problem:
• And these also have the same minimizers:
• I can multiply ‘f’ by any positive constant and not change the solution.
  – Derivative will still be zero at the same locations.
  – We’ll use this trick a lot!
(Quora trolling on ethics of this)
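A quick numerical check of this claim, as a sketch: the data, the grid of candidate slopes, and the scaling constants 1/2 and 1/n are all assumptions chosen for illustration. A grid search is used only to keep the example short.

```python
def f(w, xs, ys):
    """Unscaled sum of squared errors for the model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys))

xs = [1.0, 2.0, 3.0]   # hypothetical data
ys = [2.0, 4.0, 6.5]
n = len(xs)

# Candidate slopes 0.00, 0.01, ..., 4.00.
grid = [i / 100 for i in range(401)]

argmin_f = min(grid, key=lambda w: f(w, xs, ys))
argmin_half = min(grid, key=lambda w: 0.5 * f(w, xs, ys))
argmin_mean = min(grid, key=lambda w: f(w, xs, ys) / n)
print(argmin_f, argmin_half, argmin_mean)
```

All three searches select the same slope, since scaling by a positive constant changes the objective values but not their ordering.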
Finding Least Squares Solution
• Finding ‘w’ that minimizes sum of squared errors:
• Let’s check that this is a minimizer by checking the second derivative:
  – Since (anything)² is non-negative and (anything non-zero)² > 0, if we have one non-zero feature then f’’(w) > 0 and this is a minimizer.
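A minimal sketch of the resulting closed-form solution, assuming the objective $f(w) = \frac{1}{2}\sum_{i=1}^n (w x_i - y_i)^2$. Under that assumption, $f'(w) = \sum_i (w x_i - y_i) x_i$, setting it to zero gives $w = \sum_i x_i y_i / \sum_i x_i^2$, and $f''(w) = \sum_i x_i^2$. The function name and data are hypothetical.

```python
def least_squares_slope(xs, ys):
    """Slope minimizing the sum of squared errors for y_hat = w * x (no intercept)."""
    sxx = sum(x * x for x in xs)   # also the second derivative f''(w)
    if sxx == 0:
        # With no non-zero feature value, f''(w) = 0 and every w is equally good.
        raise ValueError("need at least one non-zero feature value")
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / sxx

xs = [1.0, 2.0, 3.0]   # hypothetical data
ys = [2.0, 4.0, 6.5]
w = least_squares_slope(xs, ys)

# The derivative at the solution should be (numerically) zero.
derivative_at_w = sum((w * x - y) * x for x, y in zip(xs, ys))
print(w, derivative_at_w)
```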
Least Squares Objective/Solution (Another View)
• Least squares minimizes a quadratic that is a sum of quadratics:
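One way to see this, assuming the unscaled objective $f(w) = \sum_{i=1}^n (w x_i - y_i)^2$, is to expand each squared term (itself a quadratic in $w$) and collect powers of $w$:

```latex
\begin{aligned}
f(w) &= \sum_{i=1}^n (w x_i - y_i)^2
      = \sum_{i=1}^n \left( x_i^2 w^2 - 2 x_i y_i w + y_i^2 \right) \\
     &= \underbrace{\Big(\sum_{i=1}^n x_i^2\Big)}_{a} w^2
        - 2 \underbrace{\Big(\sum_{i=1}^n x_i y_i\Big)}_{b} w
        + \underbrace{\sum_{i=1}^n y_i^2}_{c}.
\end{aligned}
```

The labels $a$, $b$, $c$ are introduced here only for illustration. When $a > 0$, the single quadratic $a w^2 - 2 b w + c$ opens upward and its minimum is at $w = b/a$.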
Motivation: Combining Explanatory Variables
• Smoking is not the only contributor to lung cancer.
  – For example, there are environmental factors like exposure to asbestos.
• How can we model the combined effect of smoking and asbestos?
• A simple way is with a 2-dimensional linear function: $\hat{y}_i = w_1 x_{i1} + w_2 x_{i2}$.
• We have a weight $w_1$ for feature ‘1’ and $w_2$ for feature ‘2’:
Least Squares in 2-Dimensions
• Linear model: $\hat{y}_i = w_1 x_{i1} + w_2 x_{i2}$.
• This defines a two-dimensional plane.
• Not just a line!
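Different Notations for Least Squares
• If we have ‘d’ features, the d-dimensional linear model is: $\hat{y}_i = w_1 x_{i1} + w_2 x_{i2} + \cdots + w_d x_{id}$.
  – In words, our model is that the output is a weighted sum of the inputs.
• We can re-write this in summation notation: $\hat{y}_i = \sum_{j=1}^d w_j x_{ij}$.
• We can also re-write this in vector notation: $\hat{y}_i = w^T x_i$.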
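The weighted-sum model can be sketched directly from the summation notation. The function name `predict`, the weights, and the feature values are hypothetical; the two features are only loosely modelled on the smoking and asbestos example.

```python
def predict(w, x_i):
    """Weighted sum of the inputs: sum_j w_j * x_ij, i.e. the inner product w^T x_i."""
    if len(w) != len(x_i):
        raise ValueError("w and x_i must have the same number of entries")
    return sum(w_j * x_ij for w_j, x_ij in zip(w, x_i))

w = [0.5, 2.0]       # hypothetical weights for features 1 and 2
x_i = [20.0, 1.0]    # hypothetical example: feature 1 = 20, feature 2 = 1
print(predict(w, x_i))
```

The same function works unchanged for any number of features ‘d’, which is the point of the summation and vector notations.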
Notation Alert (again)
• In this course, all vectors are assumed to be column-vectors:
• So $w^T x_i$ is a scalar:
• So rows of ‘X’ are actually the transposes of the column-vectors $x_i$:
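A small sketch of this convention, using plain Python lists in place of a matrix library. Each row of `X` stores one example $x_i^T$, so taking the inner product of every row with `w` gives all ‘n’ predictions at once (the matrix-vector product $Xw$). The numbers and helper names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors; the result is a scalar."""
    return sum(a * b for a, b in zip(u, v))

def predict_all(X, w):
    """Entry i is w^T x_i, where row i of X holds x_i^T."""
    return [dot(x_i, w) for x_i in X]

# n = 3 examples (rows), d = 2 features (columns).
X = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]
w = [0.5, -1.0]

print(predict_all(X, w))
```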
Least Squares in d-Dimensions
• The linear least squares model in d-dimensions minimizes:
• Dates back to 1801: Gauss used it to predict location of Ceres.
• How do we find the best vector ‘w’ in ‘d’ dimensions?
  – Can we set the “partial derivative” of each variable to 0?
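A sketch of the d-dimensional objective, assuming it is the unscaled sum of squared errors $\sum_{i=1}^n (w^T x_i - y_i)^2$ (a factor such as 1/2 is often included and does not change the minimizer). The data and function name are hypothetical.

```python
def objective(w, X, y):
    """Sum over examples of (w^T x_i - y_i)^2."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        prediction = sum(w_j * x_ij for w_j, x_ij in zip(w, x_i))
        total += (prediction - y_i) ** 2
    return total

# Hypothetical data generated exactly by the weights [0.5, -1.0].
X = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]
y = [-1.5, -2.5, -3.5]

print(objective([0.5, -1.0], X, y))   # weights that fit the data exactly
print(objective([0.0, 0.0], X, y))    # all-zero weights
```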
Partial Derivatives
http://msemac.redwoods.edu/~darnold/math50c/matlab/pderiv/index.xhtml
Least Squares Partial Derivatives (1 Example)
• The linear least squares model in d-dimensions for 1 example:
• Computing the partial derivative for variable ‘1’:
Least Squares Partial Derivatives (‘n’ Examples)
• Linear least squares partial derivative for variable 1 on example ‘i’:
• For a generic variable ‘j’ we would have:
• And if ‘f’ is summed over all ‘n’ examples we would have:
• Unfortunately, the partial derivative for $w_j$ depends on all {$w_1$, $w_2$, …, $w_d$}.
  – I can’t just “set equal to 0 and solve for $w_j$”.
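The sketch below computes these partial derivatives under the assumption that the objective is $f(w) = \frac{1}{2}\sum_{i=1}^n (w^T x_i - y_i)^2$, for which the chain rule gives $\partial f / \partial w_j = \sum_{i=1}^n (w^T x_i - y_i)\, x_{ij}$. It compares that formula against a central finite-difference estimate. All names and numbers are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def objective(w, X, y):
    # Assumed scaling: f(w) = 1/2 * sum_i (w^T x_i - y_i)^2
    return 0.5 * sum((dot(w, x_i) - y_i) ** 2 for x_i, y_i in zip(X, y))

def partial_derivative(w, X, y, j):
    # d f / d w_j = sum_i (w^T x_i - y_i) * x_ij
    return sum((dot(w, x_i) - y_i) * x_i[j] for x_i, y_i in zip(X, y))

def numeric_partial(w, X, y, j, eps=1e-6):
    # Central difference: perturb only coordinate j of w.
    w_plus, w_minus = list(w), list(w)
    w_plus[j] += eps
    w_minus[j] -= eps
    return (objective(w_plus, X, y) - objective(w_minus, X, y)) / (2 * eps)

X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # hypothetical data
y = [1.0, 2.0, 2.5]
w = [0.3, -0.2]

for j in range(len(w)):
    print(j, partial_derivative(w, X, y, j), numeric_partial(w, X, y, j))
```

Changing `w[1]` changes the value of the partial derivative for `w[0]`, because every residual $w^T x_i - y_i$ involves all of the weights. This coupling is why each coordinate cannot be solved for on its own.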
Gradient and Critical Points in d-Dimensions
• Generalizing “set the derivative to 0 and solve” in d-dimensions:
  – Find ‘w’ where the gradient vector equals the zero vector.
• Gradient is vector with partial derivative ‘j’ in position ‘j’:
http://msemac.redwoods.edu/~darnold/math50c/matlab/pderiv/index.xhtml
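As a sketch of the gradient for least squares, assume the objective $f(w) = \frac{1}{2}\sum_{i=1}^n (w^T x_i - y_i)^2$, so that position ‘j’ of the gradient is $\sum_{i=1}^n (w^T x_i - y_i)\, x_{ij}$. The data below is hypothetical and was generated exactly by the weights `[0.5, -1.0]`, so those weights should be a critical point.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gradient(w, X, y):
    """Vector whose j-th entry is the partial derivative of f with respect to w_j."""
    residuals = [dot(w, x_i) - y_i for x_i, y_i in zip(X, y)]
    return [sum(r * x_i[j] for r, x_i in zip(residuals, X))
            for j in range(len(w))]

X = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]
y = [-1.5, -2.5, -3.5]   # equals X times [0.5, -1.0]

print(gradient([0.5, -1.0], X, y))   # zero vector: a critical point
print(gradient([0.0, 0.0], X, y))    # non-zero: not a critical point
```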
Summary
• Regression considers the case of a numerical $y_i$.
• Least squares is a classic method for fitting linear models.
  – With 1 feature, it has a simple closed-form solution.
  – Can be generalized to ‘d’ features.
• Gradient is vector containing partial derivatives of all variables.
• Next time: