Regression Tutorial 201 With NumXL
7/30/2019 Regression Tutorial 201 With NumXL
Regression 201 Tutorial, Spider Financial Corp, 2013
Tutorial: Regression 201
This is the third entry in our regression analysis and modeling series. In this tutorial, we continue the
analysis discussion we started earlier by leveraging a more advanced technique, influential data
analysis, to help us improve the model and, as a result, the reliability of the forecast.
Again, we will use a sample data set gathered from 20 different salespersons. The regression model
attempts to explain and predict the weekly sales for each person (dependent variable) using two
explanatory variables: intelligence (IQ) and extroversion.
Data Preparation
Similar to what we did in our earlier tutorial, we organize our sample data by placing the values of each
variable in a separate column and each observation in a separate row.
Next, we introduce the mask. The mask is a Boolean array (0, 1) that chooses which variables are
included in (or excluded from) the analysis.
Initially, at the top of the table, let's insert the mask cells array, each cell with a value of 1 (i.e. included).
The array is shown highlighted below:
In this example, we have 20 observations and two independent (explanatory) variables. The response or
dependent variable is the weekly sales.
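Outside the spreadsheet, the mask idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `apply_mask` and the small data array are hypothetical, the numbers are made up and are not the tutorial's sample, and nothing here reflects how NumXL implements the mask internally.

```python
import numpy as np

# Hypothetical data: 5 observations (rows), 2 explanatory variables (columns).
# These values are invented for illustration only.
X = np.array([
    [110.0, 15.0],
    [95.0, 22.0],
    [120.0, 18.0],
    [105.0, 30.0],
    [98.0, 25.0],
])

def apply_mask(X, mask):
    """Keep only the columns of X whose mask entry is 1 (included)."""
    mask = np.asarray(mask, dtype=bool)
    if mask.shape[0] != X.shape[1]:
        raise ValueError("mask needs one entry per variable (column)")
    return X[:, mask]

both = apply_mask(X, [1, 1])         # both variables included
second_only = apply_mask(X, [0, 1])  # first variable excluded
print(both.shape, second_only.shape)
```

Setting a mask entry to 0 removes the variable from the analysis without deleting its column from the data table, which is what makes it convenient for trying out different variable sets.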
Process
Now we are ready to conduct our regression analysis. First, select an empty cell in your worksheet
where you wish the output to be generated, then locate and click on the regression icon in the NumXL
tab (or toolbar).
Now the Regression Wizard will appear.
Select the cells range for the response/dependent variable values (i.e. weekly sales). Select the cells
range for the explanatory (independent) variables values. For Variables (X) Mask, select the cells at the
top of the data table (Boolean array).
Notes:
1. The cells range includes (optionally) the heading (label) cell, which would be used in the output
tables where it references those variables.
2. The explanatory variables (i.e. X) are already grouped by columns (each column represents a variable), so we don't need to change that.
3. By default, the output cells range is set to the currently selected cell in your worksheet.
Please note that, once we select the X and Y cells range, the Options, Forecast and Missing Values
tabs become available (enabled).
Next, select the Options tab.
Initially, the tab is set to the following values:
The regression intercept/constant is left blank. This indicates that the regression intercept will be estimated by the regression. To set the intercept to a fixed value (e.g. zero (0)), enter it
there.
The significance level (aka. α) is set to 5%.
In the output section, the most common regression analysis is selected.
For auto modeling, check this option.
Now, click the Missing Values tab.
In this tab, you can select an approach to handle missing values in the data set (X and Y). By default, any
missing value found in X or in Y in any observation would exclude the observation from the analysis.
This treatment is a good approach for our analysis, so let's leave it unchanged.
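The default treatment described above (drop any observation with a missing value in X or Y) can be sketched in Python as follows. The function name `drop_incomplete_rows` and the sample values are hypothetical, and NaN stands in for the worksheet's #N/A; this is a sketch of the idea, not NumXL's code.

```python
import numpy as np

def drop_incomplete_rows(X, y):
    """Remove every observation that has a missing (NaN) value in X or y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # A row is complete when none of its X values and its y value are NaN.
    complete = ~np.isnan(X).any(axis=1) & ~np.isnan(y)
    return X[complete], y[complete], complete

# Made-up example: the second row has a missing X value, the fourth a missing y.
X_raw = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, 5.0], [6.0, 7.0]])
y_raw = np.array([10.0, 11.0, 12.0, np.nan])
X_clean, y_clean, kept = drop_incomplete_rows(X_raw, y_raw)
print(kept)
```

Returning the Boolean `kept` vector alongside the cleaned arrays makes it easy to see which observations were excluded.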
Now, click OK to generate the output tables.
To assess the influence that each observation exerts on our model, we calculate a couple of statistical
measures: leverage and Cook's distance.
Select the cell next to the response variable. In the formula bar, type in the MLR_FITTED function, then click the fx button.
The Function Wizard pops up. Select the input cells range, mask, and a Return type of 4 for the leverage statistics. Click OK.
MLR_FITTED returns an array of values, but you will initially only see the 1st value.
To display the full array, select all the cells below (to the end of the sample). Press F2, then press CTRL+SHIFT+ENTER to copy the array formula.
Now, to calculate the Cook's distance, select the cell next to Leverage and repeat the same steps, but with the return type = 5.
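For readers who want to see what these two statistics are, here is a minimal Python sketch using the textbook definitions for an ordinary least squares fit with an intercept. The function name `influence_measures` and the randomly generated data are assumptions for illustration; the sketch does not claim to reproduce the exact output or scaling of MLR_FITTED.

```python
import numpy as np

def influence_measures(X, y):
    """Return (leverage, Cook's distance) for an OLS fit with an intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])  # design matrix with intercept column
    p = A.shape[1]                        # number of estimated coefficients
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    # Leverage h_i is the i-th diagonal entry of the hat matrix H = Q Q^T,
    # where Q comes from the reduced QR factorization of A.
    Q, _ = np.linalg.qr(A)
    h = np.sum(Q ** 2, axis=1)
    mse = resid @ resid / (n - p)         # residual variance estimate
    cooks = (resid ** 2 / (p * mse)) * h / (1.0 - h) ** 2
    return h, cooks

# Synthetic data (not the tutorial's sample): 20 observations, 2 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(size=20)
h, cooks = influence_measures(X, y)
print(np.round(h, 3))
print(np.round(cooks, 3))
```

Leverage depends only on the explanatory variables, while Cook's distance combines leverage with the size of the residual, which is why the two measures can point at different observations.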
Analysis
Now that we have the leverage and Cook's distance statistics, let's interpret their findings.
1. Leverage Statistics (H)
Leverage statistics measure the distance of an observation from the center of the data. In our example,
the intelligence and extroversion values for Salesman 11 are furthest from the average. Does this
mean Salesman 11 is an outlier? Does this mean he exerts influence on the calculation of the
regression coefficient?
To examine this assumption, let's remove Salesman 11 from our input data and examine the resulting
regression. To do so, just insert an #N/A value in any input variable of this observation.
(Full data set) (Omitting Salesman #11)
Dropping observation 11 left things, at best, the same as before, so we opted to restore this observation
to the sample.
In sum, the leverage statistics do not necessarily imply an outlier, but merely a distant observation
with few neighbors.
[Figure: Leverage (H), in percent, for observations 1 to 20]

2. Cook's Distance (D)
The Cook's distance corrects for weakness in the leverage statistics, and is thus more
indicative of influential data. Furthermore, there are a few heuristics for the threshold values of Cook's
distance to detect an influential datum. For our analysis, we often use 4/N as a threshold (which
translates to 20% for the 20 observations in our data set).

[Figure: Cook's Distance (D), in percent, for observations 1 to 20]
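The 4/N rule of thumb is simple to apply programmatically. The sketch below assumes the Cook's distance values are already available as an array; the function name `flag_influential` and the example values are hypothetical and invented for illustration, with NaN standing in for an observation voided with #N/A.

```python
import numpy as np

def flag_influential(cooks_d):
    """Return (threshold, 1-based observation numbers with D above 4/N).

    N counts only the observations that actually have a Cook's distance,
    so voided (NaN) observations are ignored.
    """
    d = np.asarray(cooks_d, dtype=float)
    valid = ~np.isnan(d)
    n = int(np.count_nonzero(valid))
    threshold = 4.0 / n
    above = valid & (np.where(valid, d, 0.0) > threshold)
    return threshold, np.flatnonzero(above) + 1

# Made-up values: 20 observations, one of them clearly above the threshold.
d_full = np.full(20, 0.03)
d_full[15] = 0.62
t20, flagged20 = flag_influential(d_full)

# Same values with that observation voided: N drops to 19.
d_void = d_full.copy()
d_void[15] = np.nan
t19, flagged19 = flag_influential(d_void)
print(t20, flagged20, t19, flagged19)
```

Because the threshold depends on N, it moves slightly (from 4/20 to 4/19) each time an observation is dropped.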
Using the threshold or just looking at the earlier plot, we detect that Salesman 16 exerts the highest
influence on our regression, so let's void this observation (by setting #N/A in one of the input variables).
Note that the leverage statistics and Cook's distance return #N/A for this missing value.
Let's now examine the regression statistics before and after we dropped the sixteenth observation.
(Full data set) (Without Salesman #16)
As you may already have noticed, the regression improved significantly on every dimension (e.g.
R-square, std error, etc.). Salesman #16 seems to be an influential outlier, so we'll drop him.
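The before/after comparison can be mimicked with a short Python sketch. Everything in it is an assumption for illustration: the helper `fit_summary` is hypothetical, and the data are synthetic with one deliberately contaminated observation, so the printed numbers have nothing to do with the tutorial's output tables.

```python
import numpy as np

def fit_summary(X, y):
    """OLS with intercept; return (R-squared, standard error of the regression)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sse = resid @ resid                   # residual sum of squares
    sst = np.sum((y - y.mean()) ** 2)     # total sum of squares
    r2 = 1.0 - sse / sst
    std_err = np.sqrt(sse / (n - A.shape[1]))
    return r2, std_err

# Synthetic data: a clean linear relation plus one contaminated observation.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 10.0, size=(20, 2))
y = 5.0 + 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(0.0, 1.0, size=20)
y[15] += 40.0                             # observation 16 is pushed far off

keep = np.arange(20) != 15                # drop observation 16
r2_full, se_full = fit_summary(X, y)
r2_drop, se_drop = fit_summary(X[keep], y[keep])
print(f"full:    R2={r2_full:.3f}  std err={se_full:.2f}")
print(f"dropped: R2={r2_drop:.3f}  std err={se_drop:.2f}")
```

A higher R-squared and a lower standard error after removal are the kind of improvement the output tables above are compared on.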
To help explain what makes an observation influential, let's examine the extroversion vs. weekly sales
graph below:
We draw the linear trend as a proxy for our regression model. The black (circle) data point represents
Salesman 16. Its location (extroversion and weekly sales value) is pulling the regression (dashed) line
toward it, affecting the value of the regression slope and intercept.
Dropping this observation releases the regression line, adjusting it to better fit the remaining points.
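The pulling effect can be reproduced with a one-variable trend line. In this sketch the x and y values are invented (an exact line plus one displaced point at the edge of the x range), so it only illustrates the mechanism and says nothing about the actual extroversion or sales figures.

```python
import numpy as np

# Invented data: 20 points lying exactly on y = 2000 + 50 x ...
x = np.arange(1.0, 21.0)
y = 2000.0 + 50.0 * x
# ... except the last point, which sits far below the line.
y[-1] = 1500.0

# np.polyfit with degree 1 returns [slope, intercept].
slope_full, intercept_full = np.polyfit(x, y, 1)
slope_drop, intercept_drop = np.polyfit(x[:-1], y[:-1], 1)

print(f"with the point:    slope={slope_full:.2f}, intercept={intercept_full:.2f}")
print(f"without the point: slope={slope_drop:.2f}, intercept={intercept_drop:.2f}")
```

With the displaced point included, the fitted slope is dragged below 50; once it is removed, the line snaps back to the trend shared by the remaining points.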
Let's take another look at the Cook's distance plot (without Salesman 16, and with a threshold of
4/19 ≈ 21%).
The Cook's distance values for the different data points are distributed somewhat uniformly, and we may stop
there.
Note: Bear in mind that our threshold rule is merely a heuristic (rule of thumb), and should not be taken
rigidly, but rather as a guideline.
Conclusion
In this tutorial, we have shown that excluding observation #16 is beneficial to our modeling efforts, as it
exerts significant influence on our coefficient calculation.
Next, using the remaining 19 observations, let's recalculate (SHIFT+F9) the regression statistics, ANOVA,
residuals diagnosis, stepwise regression, etc.
The optimal set of the input variables is the same as earlier. Let's drop the intelligence variable (by
setting its value to 0 in the mask), and recalculate.
The regression error is $307 (vs. $332 before we removed Salesman #16).
The final question we may ask ourselves: is the regression stable over the sample data set? We take this up in the next issue.