3.2: Least Squares Regressions


Transcript of 3.2: Least Squares Regressions

Page 1: 3.2: Least Squares Regressions

3.2: Least Squares Regressions

Page 2: 3.2: Least Squares Regressions

Section 3.2: Least-Squares Regression

After this section, you should be able to…

✓ INTERPRET a regression line
✓ CALCULATE the equation of the least-squares regression line
✓ CALCULATE residuals
✓ CONSTRUCT and INTERPRET residual plots
✓ DETERMINE how well a line fits observed data
✓ INTERPRET computer regression output

Page 3: 3.2: Least Squares Regressions

Regression Lines

A regression line summarizes the relationship between two variables, but only in settings where one of the variables helps explain or predict the other.

A regression line is a line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x.

Page 4: 3.2: Least Squares Regressions

Regression Lines

Regression lines are used to conduct analysis.
• Colleges use students' SAT scores and GPAs to predict college success
• Professional sports teams use players' vital stats (40-yard dash, height, weight) to predict success
• Macy's uses shipping, sales, and inventory data to predict future sales
• MDCPS uses student data to evaluate teachers using the VAM model

Page 5: 3.2: Least Squares Regressions

Regression Line Equation

Suppose that y is a response variable (plotted on the vertical axis) and x is an explanatory variable (plotted on the horizontal axis). A regression line relating y to x has an equation of the form:

ŷ = ax + b

In this equation,
• ŷ (read "y hat") is the predicted value of the response variable y for a given value of the explanatory variable x.
• a is the slope, the amount by which y is predicted to change when x increases by one unit.
• b is the y intercept, the predicted value of y when x = 0.
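As a quick illustration of using such an equation for prediction, here is a minimal Python sketch; the slope and intercept are the backpack-weight values used on the following slides, and the 120-pound input is a hypothetical example.

```python
# Minimal sketch: evaluating a regression line y-hat = a*x + b for prediction.
# Slope and intercept come from the backpack example on the next slides
# (y-hat = 0.0908x + 16.3); the 120-pound student is a hypothetical input.

def predict(x, slope=0.0908, intercept=16.3):
    """Return the predicted response y-hat for an explanatory value x."""
    return slope * x + intercept

print(predict(120))  # predicted backpack weight for a 120-lb student: 27.196 lb
```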

Page 6: 3.2: Least Squares Regressions

Regression Line Equation

ŷ = 0.0908x + 16.3

Page 7: 3.2: Least Squares Regressions

Format of Regression Lines

Format 1: ŷ = 0.0908x + 16.3
ŷ = predicted backpack weight
x = student's weight

Format 2: Predicted backpack weight = 16.3 + 0.0908(student's weight)

Page 8: 3.2: Least Squares Regressions

Interpreting Linear Regression

• Y-intercept: A student weighing zero pounds is predicted to have a backpack weight of 16.3 pounds (no practical interpretation).

• Slope: For each additional pound that the student weighs, their backpack is predicted to weigh an additional 0.0908 pounds, on average.

Page 9: 3.2: Least Squares Regressions

Interpreting Linear Regression

Interpret the y-intercept and slope values in context. Is there any practical interpretation?

ŷ = 37x + 270
x = Hours Studied for the SAT
ŷ = Predicted SAT Math Score

Page 10: 3.2: Least Squares Regressions

Interpreting Linear Regression: ŷ = 37x + 270

Slope: For each additional hour the student studies, his/her score is predicted to increase by 37 points, on average. This makes sense

OR this does not make sense; it is unreasonable for scores to increase by 37 points for JUST one hour of studying.

Page 11: 3.2: Least Squares Regressions

Interpreting Linear Regression: ŷ = 37x + 270

Y-intercept: If a student studies for zero hours, then the student's predicted SAT score is 270 points. This makes sense

OR this does not make sense because an SAT score of 270 is very low regardless of studying.

Page 12: 3.2: Least Squares Regressions

Predicted Value

What is the predicted SAT Math score for a student who studies 12 hours?

ŷ = 37x + 270
Hours Studied for the SAT (x); Predicted SAT Math Score (ŷ)

Page 13: 3.2: Least Squares Regressions

Predicted Value

What is the predicted SAT Math score for a student who studies 12 hours?

ŷ = 37x + 270
Hours Studied for the SAT (x); Predicted SAT Math Score (ŷ)

ŷ = 37(12) + 270
Predicted Score: 714 points

Page 14: 3.2: Least Squares Regressions

Self Check Quiz!

Page 15: 3.2: Least Squares Regressions

Self Check Quiz: Calculate the Regression Equation

A crazy professor believes that a child with IQ 100 should have a reading test score of 50, and that reading score should increase by 1 point for every additional point of IQ. What is the equation of the professor's regression line for predicting reading score from IQ? Be sure to identify all variables used.

Page 16: 3.2: Least Squares Regressions

Self Check Quiz: Calculate the Regression Equation

A crazy professor believes that a child with IQ 100 should have a reading test score of 50, and that reading score should increase by 1 point for every additional point of IQ. What is the equation of the professor's regression line for predicting reading score from IQ? Be sure to identify all variables used.

Answer: ŷ = 50 + x
ŷ = predicted reading score
x = number of IQ points above 100

Page 17: 3.2: Least Squares Regressions

Self Check Quiz: Interpreting Regression Lines & Predicted Value

Data on the IQ test scores and reading test scores for a group of fifth-grade children resulted in the following regression line: predicted reading score = −33.4 + 0.882(IQ score)

(a) What's the slope of this line? Interpret this value in context.
(b) What's the y-intercept? Explain why the value of the intercept is not statistically meaningful.
(c) Find the predicted reading scores for two children with IQ scores of 90 and 130, respectively.

Page 18: 3.2: Least Squares Regressions

predicted reading score = −33.4 + 0.882(IQ score)

(a) Slope = 0.882. For each 1-point increase in IQ score, the reading score is predicted to increase by 0.882 points, on average.

(b) Y-intercept = −33.4. If the student has an IQ of zero, which is essentially impossible (the student would not be able to hold a pencil to take the exam), the predicted score would be −33.4. This has no practical interpretation.

(c) Predicted values:
IQ 90: −33.4 + 0.882(90) = 45.98 points
IQ 130: −33.4 + 0.882(130) = 81.26 points

Page 19: 3.2: Least Squares Regressions

Least-Squares Regression Line

Different regression lines produce different residuals. The regression line we use in AP Stats is least-squares regression. The least-squares regression line of y on x is the line that makes the sum of the squared residuals as small as possible.
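A short sketch (with toy data invented purely for illustration) of what "smallest possible sum of squared residuals" means: the least-squares fit beats any nearby alternative line.

```python
import numpy as np

# Toy data invented for illustration only.
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least-squares fit of a degree-1 polynomial: returns [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)

def sum_sq_resid(a, b):
    """Sum of squared residuals for the candidate line y-hat = a*x + b."""
    return float(np.sum((y - (a * x + b)) ** 2))

print(sum_sq_resid(slope, intercept))        # the least-squares line: smallest value
print(sum_sq_resid(slope + 0.3, intercept))  # any other line gives a larger sum
```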

Page 20: 3.2: Least Squares Regressions

Residuals

A residual is the difference between an observed value of the response variable and the value predicted by the regression line. That is,

residual = actual y − predicted y (remember: A − P)

residual = y − ŷ

Positive residuals: points above the line
Negative residuals: points below the line

Page 21: 3.2: Least Squares Regressions

How to Calculate the Residual

1. Calculate the predicted value by plugging x into the LSRE.
2. Determine the observed/actual value.
3. Subtract. (A short sketch of this calculation follows.)
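A minimal sketch of these three steps, using the backpack equation ŷ = 0.0908x + 16.3 from earlier in the section:

```python
# Residual = observed y minus predicted y (A - P), using the backpack line.

def residual(x, y_observed, slope=0.0908, intercept=16.3):
    y_predicted = slope * x + intercept   # step 1: plug x into the LSRE
    return y_observed - y_predicted       # steps 2-3: observed minus predicted

print(residual(170, 35))   # 3.264: backpack is 3.264 lb heavier than predicted
print(residual(105, 24))   # -1.834: backpack is 1.834 lb lighter than predicted
```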

Page 22: 3.2: Least Squares Regressions

Calculate the Residual

1. If a student weighs 170 pounds and their backpack weighs 35 pounds, what is the value of the residual?

2. If a student weighs 105 pounds and their backpack weighs 24 pounds, what is the value of the residual?

Page 23: 3.2: Least Squares Regressions

Calculate the Residual

1. If a student weighs 170 pounds and their backpack weighs 35 pounds, what is the value of the residual?

Predicted: ŷ = 16.3 + 0.0908(170) = 31.736
Observed: 35
Residual: 35 − 31.736 = 3.264 pounds
The student's backpack weighs 3.264 pounds more than predicted.

Page 24: 3.2: Least Squares Regressions

Calculate the Residual

2. If a student weighs 105 pounds and their backpack weighs 24 pounds, what is the value of the residual?

Predicted: ŷ = 16.3 + 0.0908(105) = 25.834
Observed: 24
Residual: 24 − 25.834 = −1.834 pounds
The student's backpack weighs 1.834 pounds less than predicted.

Page 25: 3.2: Least Squares Regressions

Check Your Understanding

Some data were collected on the weight of a male white laboratory rat for the first 25 weeks after its birth. A scatterplot of y = weight (in grams) and x = time since birth (in weeks) shows a fairly strong, positive linear relationship. The regression equation ŷ = 100 + 40x models the data well.

A. Predict the rat's weight at 16 weeks old.

B. Calculate and interpret the residual if the rat weighed 700 grams at 16 weeks old.

C. Should you use this line to predict the rat's weight at 2 years old?

Page 26: 3.2: Least Squares Regressions

Residual Plots

A residual plot is a scatterplot of the residuals against the explanatory variable. Residual plots help us assess how well a regression line fits the data.

Page 27: 3.2: Least Squares Regressions

TI-Nspire: Residual Plots
1. Press MENU, 4: Analyze
2. Option 6: Residuals, Option 2: Show Residual Plot

Page 28: 3.2: Least Squares Regressions

Interpreting Residual Plots

A residual plot magnifies the deviations of the points from the line, making it easier to see unusual observations and patterns.

1) The residual plot should show no obvious patterns.
2) The residuals should be relatively small in size.

A valid residual plot should look like the "night sky," with approximately equal amounts of positive and negative residuals.

Pattern in residuals → linear model not appropriate.
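A brief sketch of how a residual plot can be constructed with matplotlib (the data here are invented for illustration); a patternless "night sky" scatter around the zero line suggests the linear model is appropriate.

```python
import numpy as np
import matplotlib.pyplot as plt

# Invented example data for illustration only.
x = np.array([60, 75, 80, 95, 100, 110, 125, 130, 140, 150], dtype=float)
y = np.array([22, 24, 23, 26, 25, 27, 28, 27, 30, 29], dtype=float)

slope, intercept = np.polyfit(x, y, 1)       # least-squares fit
residuals = y - (slope * x + intercept)      # residual = observed - predicted

plt.scatter(x, residuals)
plt.axhline(0, color="red")                  # reference line at residual = 0
plt.xlabel("Explanatory variable (x)")
plt.ylabel("Residual")
plt.title("Residual plot")
plt.show()
```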

Page 29: 3.2: Least Squares Regressions

Should You Use the LSRL?

(Two example residual plots, 1 and 2, shown for evaluation.)

Page 30: 3.2: Least Squares Regressions

Interpreting Computer Regression Output

Be sure you can locate the slope and the y intercept, and determine the equation of the LSRL.

ŷ = −0.0034415x + 3.5051
ŷ = predicted…
x = explanatory variable
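Output like this can also be produced in software. Below is a minimal sketch using scipy's linregress; the data values are hypothetical stand-ins, not the actual study data behind the printout. It reports the slope, intercept, and r² you would otherwise read off the output.

```python
from scipy import stats

# Hypothetical NEA-change (calories) and fat-gain (kg) values, for illustration
# only; these are not the study data behind the printout on the slide.
nea_change = [-94, -57, 29, 135, 143, 151, 245, 355]
fat_gain = [4.2, 3.0, 3.7, 2.7, 3.2, 3.6, 2.4, 1.3]

result = stats.linregress(nea_change, fat_gain)
print("slope:", result.slope)          # coefficient on the explanatory variable
print("intercept:", result.intercept)  # the constant term
print("r-squared:", result.rvalue**2)  # the R-Sq value, as a proportion
```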

Page 31: 3.2: Least Squares Regressions

Determine the equation of the LSRL.

Page 32: 3.2: Least Squares Regressions

Determine the equation of the LSRL.

ŷ = 174.40x + 72.95
x = customers in line
ŷ = predicted seconds it takes to check out

Page 33: 3.2: Least Squares Regressions

r²: Coefficient of Determination

r² tells us how much better the LSRL does at predicting values of y than simply guessing the mean ȳ for each value in the data set.

In this example, r² equals 60.6%.

60.6% of the variation in pack weight is explained by the linear relationship with body weight.

(Insert r²)% of the variation in y is explained by the linear relationship with x.

Page 34: 3.2: Least Squares Regressions

Interpret r²

Interpret in a sentence (how much variation is accounted for?)

1. r² = 0.875, x = hours studied, y = SAT score
2. r² = 0.523, x = hours slept, y = alertness score

Page 35: 3.2: Least Squares Regressions

Interpret r²

Answers:
1. 87.5% of the variation in SAT score is explained by the linear relationship with the number of hours studied.
2. 52.3% of the variation in alertness score is explained by the linear relationship with the number of hours slept.

Page 36: 3.2: Least Squares Regressions

s: Standard Deviation of the Residuals

If we use a least-squares regression line to predict the values of a response variable y from an explanatory variable x, the standard deviation of the residuals (s) is given by

s = √( Σ residuals² / (n − 2) ) = √( Σ (yᵢ − ŷᵢ)² / (n − 2) )

s represents the typical or average ERROR (residual).

Positive residual = the line UNDERpredicts; negative residual = the line OVERpredicts.
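A minimal sketch of this formula (the residuals are invented for illustration):

```python
import math

# Toy residuals (y - y-hat) invented for illustration only.
residuals = [1.2, -0.8, 0.5, -1.5, 0.9, -0.3]
n = len(residuals)

# s = sqrt( sum(residual^2) / (n - 2) )
s = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
print(s)  # typical size of a prediction error, in the units of y
```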

Page 37: 3.2: Least Squares Regressions

s: Standard Deviation of the Residuals

1. Identify and interpret the standard deviation of the residuals.

Page 38: 3.2: Least Squares Regressions

s: Standard Deviation of the Residuals

Answer: s = 0.740

Interpretation: When using the least-squares regression line, the actual fat gain is typically about 0.740 kilograms away from the predicted fat gain.

Page 39: 3.2: Least Squares Regressions

Self Check Quiz!

The data is a random sample of 10 trains comparing number of cars on the train and fuel consumption in pounds of coal.
• What is the regression equation? Be sure to define all variables.
• What is r² telling you?
• Define and interpret the slope in context. Does it have a practical interpretation?
• Define and interpret the y-intercept in context.
• What is s telling you?

Page 40: 3.2: Least Squares Regressions

1. ŷ = 2.1495x + 10.667
ŷ = predicted fuel consumption in pounds of coal
x = number of railcars

2. 96.7% of the variation in fuel consumption is explained by the linear relationship with the number of railcars.

3. Slope = 2.1495. With each additional car, the predicted fuel consumption increases by 2.1495 pounds of coal, on average. This makes practical sense.

4. Y-intercept = 10.667. When there are no cars attached to the train, the predicted fuel consumption is 10.667 pounds of coal. This has no practical interpretation because there is always at least one car, the engine.

5. s = 4.361. When using the least-squares regression line, the actual fuel consumption is typically about 4.361 pounds of coal away from the predicted fuel consumption.

Page 41: 3.2: Least Squares Regressions

Extrapolation

We can use a regression line to predict the response ŷ for a specific value of the explanatory variable x. The accuracy of the prediction depends on how much the data scatter about the line. Exercise caution in making predictions outside the observed values of x.

Extrapolation is the use of a regression line for prediction far outside the interval of values of the explanatory variable x used to obtain the line. Such predictions are often not accurate.

Page 42: 3.2: Least Squares Regressions

Outliers and Influential Points

• An outlier is an observation that lies outside the overall pattern of the other observations.

• An observation is influential for a statistical calculation if removing it would markedly change the result of the calculation.

• Points that are outliers in the x direction of a scatterplot are often influential for the least-squares regression line.

• Note: Not all influential points are outliers, nor are all outliers influential points.

Page 43: 3.2: Least Squares Regressions

Outliers and Influential Points

The left graph is perfectly linear. In the right graph, the last value was changed from (5, 5) to (8, 5)… clearly influential, because it changed the fitted line significantly. However, the residual for that point is very small.

Page 44: 3.2: Least Squares Regressions

Identify the Outlier…

Page 45: 3.2: Least Squares Regressions

Identify the Outlier…

Page 46: 3.2: Least Squares Regressions

Check Your Understanding

The scatterplot shows the payroll (in millions of dollars) and number of wins for Major League Baseball teams in 2016, along with the least-squares regression line. The points highlighted in red represent the Los Angeles Dodgers (far right) and the Cleveland Indians (upper left).

Page 47: 3.2: Least Squares Regressions

Check Your Understanding

A. Describe what influence the point representing the Los Angeles Dodgers has on the equation of the least-squares regression line. Explain your reasoning.

Page 48: 3.2: Least Squares Regressions

Check Your Understanding

B. Describe what influence the point representing the Cleveland Indians has on the standard deviation of the residuals and r². Explain your reasoning.

Page 49: 3.2: Least Squares Regressions

Correlation and Regression Limitations

The distinction between explanatory and response variables is important in regression.

Page 50: 3.2: Least Squares Regressions

Correlation and Regression Limitations

Correlation and regression lines describe only linear relationships.

NO!!!

Page 51: 3.2: Least Squares Regressions

Correlation and Regression Limitations

Correlation and least-squares regression lines are not resistant.

Page 52: 3.2: Least Squares Regressions
Page 53: 3.2: Least Squares Regressions

Correlation and Regression Wisdom

An association between an explanatory variable x and a response variable y, even if it is very strong, is not by itself good evidence that changes in x actually cause changes in y.

Association Does Not Imply Causation

A serious study once found that people with two cars live longer than people who only own one car. Owning three cars is even better, and so on. There is a substantial positive correlation between number of cars x and length of life y. Why?

Page 54: 3.2: Least Squares Regressions

FRQ 2018 #1

Page 55: 3.2: Least Squares Regressions

Additional Calculations & Proofs

Page 56: 3.2: Least Squares Regressions

Least-Squares Regression Line

We can use technology to find the equation of the least-squares regression line. We can also write it in terms of the means and standard deviations of the two variables and their correlation.

Equation of the least-squares regression line: We have data on an explanatory variable x and a response variable y for n individuals. From the data, calculate the means and standard deviations of the two variables and their correlation. The least-squares regression line is the line ŷ = a + bx with

slope b = r(sy / sx)

and y intercept a = ȳ − b·x̄

Page 57: 3.2: Least Squares Regressions

Calculate the Least-Squares Regression Line

Some people think that the behavior of the stock market in January predicts its behavior for the rest of the year. Take the explanatory variable x to be the percent change in a stock market index in January and the response variable y to be the change in the index for the entire year. We expect a positive correlation between x and y because the change during January contributes to the full year's change. Calculation from data for an 18-year period gives:

Mean x = 1.75%, sx = 5.36%, Mean y = 9.07%, sy = 15.35%, r = 0.596

Find the equation of the least-squares line for predicting full-year change from January change. Show your work.
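A minimal sketch applying the slope and intercept formulas from the previous slide to the summary statistics above, as one way to check your work:

```python
# Summary statistics from the January stock-market example above.
x_bar, s_x = 1.75, 5.36    # mean and SD of January change (%)
y_bar, s_y = 9.07, 15.35   # mean and SD of full-year change (%)
r = 0.596

b = r * s_y / s_x          # slope:     b = r * (s_y / s_x)
a = y_bar - b * x_bar      # intercept: a = y-bar - b * x-bar

print(f"y-hat = {a:.2f} + {b:.3f}x")  # approximately y-hat = 6.08 + 1.707x
```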

Page 58: 3.2: Least Squares Regressions

The Role of r² in Regression

The standard deviation of the residuals gives us a numerical estimate of the average size of our prediction errors.

The coefficient of determination r² is the fraction of the variation in the values of y that is accounted for by the least-squares regression line of y on x. We can calculate r² using the following formula:

r² = 1 − SSE/SST

where SSE = Σ residual² and SST = Σ (yᵢ − ȳ)²

In practice, just square the correlation r.

Page 59: 3.2: Least Squares Regressions

Accounted-for Error

If we use the LSRL to make our predictions, the sum of the squared residuals is 30.90. SSE = 30.90

1 − SSE/SST = 1 − 30.90/83.87
r² = 0.632
63.2% of the variation in backpack weight is accounted for by the linear model relating pack weight to body weight.

Page 60: 3.2: Least Squares Regressions

Unaccounted-for Error

If we use the mean backpack weight as our prediction, the sum of the squared residuals is 83.87. SST = 83.87

SSE/SST = 30.90/83.87 = 0.368

Therefore, 36.8% of the variation in pack weight is unaccounted for by the least-squares regression line.
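A quick sketch confirming the arithmetic on these two slides:

```python
# Sums of squares from the backpack example above.
SSE = 30.90   # sum of squared residuals using the LSRL
SST = 83.87   # sum of squared deviations from the mean backpack weight

print(f"unaccounted for: {SSE / SST:.3f}")        # about 0.368, i.e., 36.8%
print(f"r^2 = 1 - SSE/SST: {1 - SSE / SST:.3f}")  # about 0.632, i.e., 63.2%
```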

Page 61: 3.2: Least Squares Regressions

Interpreting a Regression Line

Consider the regression line from the example (pg. 164) "Does Fidgeting Keep You Slim?" Identify the slope and y-intercept and interpret each value in context.

The y-intercept a = 3.505 kg is the fat gain estimated by this model if NEA does not change when a person overeats.

The slope b = -0.00344 tells us that the amount of fat gained is predicted to go down by 0.00344 kg for each added calorie of NEA.

fat gain = 3.505 − 0.00344(NEA change)