Quantitative Analysis, Pt. 2
ESP 178 Applied Research Methods (Environmental Science & Policy)
Calvin Thigpen, 2/28/17
Adapted from Prof. Susan Handy


Review from last week

• Descriptive statistics
  • What’s the point? What are ways to examine them?
• Key concepts:
  • Measures of central tendency?
    • Mean
    • Median
    • Mode
  • Measures of variation?
    • Standard deviation
    • Variance
    • Percentiles
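As a quick refresher, every measure listed above is available in Python's standard `statistics` module. This is a minimal sketch on a small made-up sample (the values are hypothetical, chosen only for illustration):

```python
import statistics

# Hypothetical sample: days biked in the past week for 8 respondents
days = [0, 1, 1, 2, 3, 3, 3, 5]

# Measures of central tendency
mean = statistics.mean(days)      # arithmetic average
median = statistics.median(days)  # middle value (average of the two middle values here)
mode = statistics.mode(days)      # most frequent value

# Measures of variation (sample versions, dividing by n - 1)
variance = statistics.variance(days)
sd = statistics.stdev(days)                  # square root of the variance
quartiles = statistics.quantiles(days, n=4)  # 25th, 50th, 75th percentiles

print(mean, median, mode)
print(round(variance, 3), round(sd, 3), quartiles)
```

Note that the median and the 50th percentile are the same quantity, and the standard deviation is just the square root of the variance, expressed in the DV's own units.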

Review from last week

• Bivariate (two-variable) relationships
  • How can you examine these?
• Multivariable relationships
  • How to analyze?
  • What causality criterion is addressed by including multiple variables in a regression model?
  • What is an important assumption of linear regression?

Important assumption of linear regression
• Outcome is a normally distributed, continuous ratio variable
  • (after accounting for predictor/independent variables, i.e., the residuals are normally distributed)
• This assumption works for (most) ratio DVs

[Figure: distributions of height for men and women, with 5’4” and 5’10” marked]

Coefficient interpretation

• How do you interpret the coefficient(s) when the independent variable is:
  • Ratio
  • Nominal binary
  • Nominal/ordered with multiple categories

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.04084    0.02206  -1.852   0.0644 .
IndependentV  0.73379    0.02148  34.156   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.6976 on 998 degrees of freedom
Multiple R-squared: 0.539, Adjusted R-squared: 0.5385
F-statistic: 1167 on 1 and 998 DF, p-value: < 2.2e-16
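For a ratio predictor, the estimate is a slope: the expected change in the DV for a one-unit increase in the IV. A minimal sketch using the two estimates printed above; `predict` is a hypothetical helper name, not part of the original analysis:

```python
# Estimates copied from the R output above (linear regression, ratio IV)
INTERCEPT = -0.04084
SLOPE = 0.73379  # estimate for IndependentV

def predict(x: float) -> float:
    """Predicted outcome from the fitted line: y-hat = intercept + slope * x."""
    return INTERCEPT + SLOPE * x

# A one-unit increase in the IV changes the predicted DV by the slope,
# no matter where on the x-axis you start.
for x0 in (0.0, 1.0, 5.0):
    print(x0, round(predict(x0 + 1) - predict(x0), 5))
```

With a single predictor, the F-statistic is the square of that predictor's t value (34.156² ≈ 1167), so the two tests agree.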

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.44421    0.09048  49.120  < 2e-16 ***
culdesac 1   1.46650    0.20871   7.026 3.93e-12 ***
culdesac 2   1.70579    0.32747   5.209 2.31e-07 ***
culdesac 3   3.36348    0.13810  24.356  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.99 on 996 degrees of freedom
Multiple R-squared: 0.3735, Adjusted R-squared: 0.3716
F-statistic: 198 on 3 and 996 DF, p-value: < 2.2e-16
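With a categorical predictor, each dummy coefficient is the difference between that category's predicted mean and the reference category's. A minimal sketch, assuming (the slide does not say) that the omitted reference level is coded 0:

```python
# Estimates copied from the R output above; reference level assumed to be culdesac 0
intercept = 4.44421
dummy_coefs = {1: 1.46650, 2: 1.70579, 3: 3.36348}

# Predicted mean for each category = intercept + that category's dummy coefficient
predicted_means = {0: intercept}
for level, coef in dummy_coefs.items():
    predicted_means[level] = intercept + coef

for level, mean in sorted(predicted_means.items()):
    print(f"culdesac {level}: predicted mean = {mean:.5f}")
```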

$$y_i = \alpha + \beta x_i + \varepsilon_i, \qquad \varepsilon_i \sim \mathrm{Normal}(0, \sigma)$$
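The equation says each observation is the linear model plus a normally distributed error with mean 0 and standard deviation σ. A minimal simulation sketch: generate data from that model with made-up parameter values (hypothetical, loosely resembling the first output above), then recover α, β, and σ with the closed-form least-squares formulas for one predictor:

```python
import random
import statistics

random.seed(178)

# Hypothetical "true" parameters for y_i = alpha + beta * x_i + eps_i
ALPHA, BETA, SIGMA = -0.04, 0.73, 0.70
N = 5000

xs = [random.gauss(0, 1) for _ in range(N)]
# eps_i ~ Normal(0, sigma): the normality assumption lives in the error term
ys = [ALPHA + BETA * x + random.gauss(0, SIGMA) for x in xs]

# Ordinary least squares for a single predictor
x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
beta_hat = sxy / sxx
alpha_hat = y_bar - beta_hat * x_bar

# Residual standard error estimates sigma (n - 2 degrees of freedom)
resid = [y - (alpha_hat + beta_hat * x) for x, y in zip(xs, ys)]
sigma_hat = (sum(r * r for r in resid) / (N - 2)) ** 0.5

print(round(alpha_hat, 3), round(beta_hat, 3), round(sigma_hat, 3))
```

The recovered "Residual standard error" plays the same role as the line of that name in R's output: it estimates σ.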

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)      4.44421    0.09445   47.06   <2e-16 ***
culdesac.binary  2.82323    0.13148   21.47   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.078 on 998 degrees of freedom
Multiple R-squared: 0.316, Adjusted R-squared: 0.3153
F-statistic: 461.1 on 1 and 998 DF, p-value: < 2.2e-16
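For a binary (0/1) predictor, least squares reproduces the group means exactly: the intercept is the mean of the 0 group and the slope is the difference between the two group means. A minimal sketch on a tiny hypothetical dataset (not the course data):

```python
import statistics

# Hypothetical toy data: outcome for households off (0) and on (1) a cul-de-sac
group = [0, 0, 0, 0, 1, 1, 1]
y = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0]

# OLS slope for a single predictor: sum of cross-products / sum of squares of x
x_bar, y_bar = statistics.fmean(group), statistics.fmean(y)
slope = sum((g - x_bar) * (v - y_bar) for g, v in zip(group, y)) / sum(
    (g - x_bar) ** 2 for g in group
)
intercept = y_bar - slope * x_bar

mean_off = statistics.fmean([v for g, v in zip(group, y) if g == 0])
mean_on = statistics.fmean([v for g, v in zip(group, y) if g == 1])

# The intercept is the mean of the 0 group; the slope is the difference in means
print(intercept, mean_off)
print(slope, mean_on - mean_off)
```

Applied to the output above, the predicted mean is 4.44421 for the 0 group and 4.44421 + 2.82323 = 7.26744 for the 1 group.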

“But what if my DV isn’t a continuous ratio variable?”
• Continuous ratio variable (linear regression works well):
  [Figure: histogram of height]
• Bounded or integer ratio variable (linear regression doesn’t work as well):
  [Figure: bar chart of Days Biked in the Past Week, 0 to 7]

“But what if my DV isn’t a continuous ratio variable?”
• Nominal/ordinal (linear regression really doesn’t work well):
  [Figure: examples of ordinal and nominal variables]

“But what if my DV isn’t a continuous ratio variable?”
• Don’t force things!
• You don’t necessarily have to change your survey question to produce a continuous ratio outcome (though you might consider it)
• Today: a conceptual overview of two alternative approaches, both part of the generalized linear model family

“But what if my DV isn’t a continuous ratio variable?”
• Let’s pick the right tool for the (statistical) problem.
• We’ll find the right tool by thinking critically about how to fit a statistical model for the DV:

Binomial distribution

• Describes the probability of a certain # of events (y) occurring, given a set number of “trials” (n) and an underlying probability (p)
• Important assumptions:
  • The probability is consistent across trials
  • Only two things can happen:
    • yes/no
    • heads/tails
    • event/no event
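The binomial probability of exactly y events in n trials can be written out in a few lines. A minimal sketch using only the standard library; `binomial_pmf` is a hypothetical helper name, and the 20 fair coin flips are an illustrative choice:

```python
from math import comb

def binomial_pmf(y: int, n: int, p: float) -> float:
    """P(Y = y) for Y ~ Binomial(n, p): y events in n independent trials,
    each with the same probability p of the event."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)

# Example: 20 fair coin flips
n, p = 20, 0.5
probs = [binomial_pmf(y, n, p) for y in range(n + 1)]
print(round(binomial_pmf(10, n, p), 4))  # most likely single outcome: 10 heads
print(round(sum(probs), 10))             # probabilities over 0..n sum to 1
```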

Two flavors

• “Bernoulli”
  • Looking at a single trial (n = 1)
• Aggregate
  • Looking at the results of multiple trials (n = # of trials)
  • Bernoulli, summed up to # of trials
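The relationship between the two flavors can be shown by simulation: a Bernoulli draw is a single 0/1 trial, and summing n of them gives one aggregate (binomial) count. A minimal sketch; `bernoulli` and `aggregate` are hypothetical helper names, and the seed and repetition count are arbitrary:

```python
import random

random.seed(2017)

def bernoulli(p: float) -> int:
    """A single trial (n = 1): returns 1 (event) with probability p, else 0."""
    return 1 if random.random() < p else 0

def aggregate(n: int, p: float) -> int:
    """Aggregate flavor: the number of events across n Bernoulli trials."""
    return sum(bernoulli(p) for _ in range(n))

single = bernoulli(0.5)                               # e.g., one coin flip
counts = [aggregate(20, 0.5) for _ in range(10_000)]  # 10,000 sets of 20 flips

# The average count should land close to n * p = 10
print(single, sum(counts) / len(counts))
```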

Photo credit: Pauli Antero

R demonstration

Poisson distribution

• Describes the probability of a certain # of events occurring, based on an average rate of occurrence (lambda, λ)
• Important assumptions:
  • Dependent variable must be a non-negative integer
    • 0, 1, 2, 3, 4, 5, …
  • Events occur independently
    • a.k.a. they don’t influence each other
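The Poisson probability of y events depends only on λ. A minimal sketch with a hypothetical rate of 3 events per period; `poisson_pmf` is an illustrative helper name:

```python
from math import exp, factorial

def poisson_pmf(y: int, lam: float) -> float:
    """P(Y = y) for Y ~ Poisson(lambda): y is a non-negative integer count,
    lambda is the average rate of occurrence."""
    return lam**y * exp(-lam) / factorial(y)

lam = 3.0  # hypothetical average rate, e.g., 3 events per day
probs = [poisson_pmf(y, lam) for y in range(30)]

print(round(poisson_pmf(0, lam), 4))  # chance of zero events
# The mean of the distribution equals lambda (truncating at 29 loses a negligible tail)
print(round(sum(y * p for y, p in enumerate(probs)), 4))
```

Zero is a perfectly valid count here, which is why the DV must be a non-negative (not strictly positive) integer.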

Why does this matter?

I don’t study coin flipping!

Binomial distribution examples

• Transportation:
  • Mode choice (bike vs. not bike)
• Water use:
  • Installation of water-efficient toilet
• Electricity:
  • Installation of solar panel
• Natural resources:
  • Whether trout were illegally caught or not

Poisson distribution examples

• Transportation:
  • # of crashes at a given intersection in a day
• Water use:
  • # of times a toilet is flushed in a day
• Electricity:
  • # of times a light is turned on in an hour
• Natural resources:
  • # of trout illegally caught in a day

Translating these distributions into regression models
• Linear regression
  • Dependent variable and linear model are on the same, unbounded scale.
• Not so with other distributions!
  • We aren’t explaining the outcome itself; we are explaining parameters of the binomial/Poisson/other distributions
• Solution: use a “link function” to address the difference in scales between the parameter and the linear model

$$y_i = \alpha + \beta x_i + \varepsilon_i, \qquad \varepsilon_i \sim \mathrm{Normal}(0, \sigma)$$

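Link function

• Translates between the unbounded linear model and the more restricted scale of the distribution parameter (the link maps the parameter onto the linear model’s scale; its inverse maps the linear model back onto the parameter’s scale)
• Canonical (i.e., “typical”) ways to do so:
  • “Logit” link for binomial
  • Log link for Poisson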
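The two canonical links and their inverses are one-liners. A minimal sketch; the function names are hypothetical labels for the math, not a library API:

```python
from math import exp, log

# Binomial: the logit link maps a probability p in (0, 1) onto the whole real line
def logit(p: float) -> float:
    return log(p / (1 - p))

# ...and its inverse maps any linear-model value back into (0, 1)
def inv_logit(eta: float) -> float:
    return 1 / (1 + exp(-eta))

# Poisson: the log link maps a rate lambda > 0 onto the real line; exp maps it back
def log_link(lam: float) -> float:
    return log(lam)

def inv_log_link(eta: float) -> float:
    return exp(eta)

# Any value of the linear model, however extreme, lands on a valid parameter scale
for eta in (-10.0, 0.0, 10.0):
    print(eta, inv_logit(eta), inv_log_link(eta))
```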

Walk-through example

$$y_i \sim \mathrm{Binomial}(n, p_i)$$

y is a bounded, integer ratio variable out of 20 trials (e.g., 20 coin flips), so n = 20

x is a continuous ratio variable that varies from −5 to 2 (e.g., amount of weight you place on heads (vs. tails) to bias the coin)

$$\mathrm{logit}(p_i) = 0.3 + 1.2\, x_i$$
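The walk-through can be traced numerically: compute the linear model, push it through the inverse logit to get a probability, then multiply by n = 20 to get the expected count (the binomial mean, n·p). A minimal sketch using the example's coefficients 0.3 and 1.2; the integer x values are an illustrative grid:

```python
from math import exp

N_TRIALS = 20           # y is a count out of 20 trials
ALPHA, BETA = 0.3, 1.2  # from logit(p_i) = 0.3 + 1.2 * x_i

def inv_logit(eta: float) -> float:
    """Inverse of the logit link: maps the linear model onto (0, 1)."""
    return 1 / (1 + exp(-eta))

rows = []
for x in range(-5, 3):            # x from -5 to 2, as in the example
    eta = ALPHA + BETA * x        # step 1: linear model (unbounded scale)
    p = inv_logit(eta)            # step 2: probability scale
    expected = N_TRIALS * p       # step 3: expected count out of 20
    rows.append((x, eta, p, expected))
    print(f"x={x:+d}  eta={eta:+.2f}  p={p:.3f}  expected count={expected:.2f}")
```

The linear model runs from about −5.7 to 2.7, but every expected count stays strictly between 0 and 20.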

Walk-through example

[Figure panels, each plotted against x from −5 to 2: (1) Linear equation; (2) Transformed linear equation onto probability scale (0 to 1); (3) Predicted counts out of 20; (4) Predicted counts out of 20, annotated “Linear model” and “Link function”]

WHEW!

Now let’s bring it back to the cul-de-sac example…

Living on a cul-de-sac

Outdoor play

Rethinking the DV: “# of times played outside in the last week”
• What statistical model would you choose to analyze this dependent variable?

Poisson model of # of times played outside last week

How to interpret?
• Similarities?
• Differences?

                  Estimate Std. Error z value Pr(>|z|)
(Intercept)        1.55504    0.02035   76.42   <2e-16 ***
d$culdesac.binary  0.44063    0.02630   16.76   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

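What do the coefficients really mean, though?
• To get a better sense, translate back onto the scale of the dependent variable!

$$\log(\lambda) = \alpha + \beta x$$
$$\lambda = e^{\alpha + \beta x}$$
$$\lambda = e^{1.55} \approx 4.71 \quad (x = 0\text{, not on a cul-de-sac})$$
$$\lambda = e^{1.55 + 0.44} \approx 7.32 \quad (x = 1\text{, on a cul-de-sac})$$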
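The same back-transformation can be done with the unrounded estimates, along with the rate ratio e^β, which is the multiplicative effect of the binary predictor. A minimal sketch, assuming culdesac.binary = 1 means living on a cul-de-sac:

```python
from math import exp

# Estimates from the Poisson model output above
b0 = 1.55504  # (Intercept): log expected count when culdesac.binary = 0
b1 = 0.44063  # culdesac.binary

lam_off = exp(b0)       # expected times played outside, not on a cul-de-sac
lam_on = exp(b0 + b1)   # expected times played outside, on a cul-de-sac
rate_ratio = exp(b1)    # multiplicative effect of living on a cul-de-sac

print(round(lam_off, 2), round(lam_on, 2), round(rate_ratio, 3))
# The rounded version: e^1.55 and e^(1.55 + 0.44)
print(round(exp(1.55), 2), round(exp(1.55 + 0.44), 2))
```

Unrounded, the expected counts are about 4.74 and 7.36; the 4.71 and 7.32 above come from rounding the coefficients to two decimals first. Either way the rate ratio is about 1.55, i.e., roughly 55% more times played outside for the cul-de-sac group.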

Poisson model of # of times played outside last week


[Figure: predicted Poisson probability distributions for the two groups; x-axis: Count (0 to 20), y-axis: Probability]

How to interpret?

Again, but with multiple cul-de-sac response options

            Estimate Std. Error z value Pr(>|z|)
(Intercept)  1.55504    0.02035  76.419  < 2e-16 ***
culdesac 1   0.24756    0.04421   5.599 2.15e-08 ***
culdesac 2   0.39418    0.06104   6.458 1.06e-10 ***
culdesac 3   0.49983    0.02812  17.774  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
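With several dummies, exponentiating each coefficient gives a rate ratio relative to the reference category. A minimal sketch, assuming (not stated on the slide) that the omitted reference level is culdesac 0:

```python
from math import exp

intercept = 1.55504
# Dummy estimates copied from the output above; reference level assumed to be culdesac 0
coefs = {"culdesac 1": 0.24756, "culdesac 2": 0.39418, "culdesac 3": 0.49983}

baseline = exp(intercept)  # expected count in the reference category
expected = {}
for name, b in coefs.items():
    rate_ratio = exp(b)  # multiplicative change relative to the reference category
    expected[name] = baseline * rate_ratio
    print(f"{name}: rate ratio {rate_ratio:.3f}, expected count {expected[name]:.2f}")
```

The rate ratios are about 1.28, 1.48, and 1.65, i.e., roughly 28%, 48%, and 65% more times played outside than the reference category.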

[Figure: predicted Poisson probability distributions by cul-de-sac response option; x-axis: Count (0 to 20), y-axis: Probability]

Again, but with age
• How to interpret?

             Estimate Std. Error z value Pr(>|z|)
(Intercept)  3.164247   0.053549   59.09   <2e-16 ***
d$age       -0.158455   0.006259  -25.32   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
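For a ratio predictor in a Poisson model, e^β is the multiplicative change in the expected count per one-unit increase. A minimal sketch using the age estimates above; the ages in the loop are hypothetical, since the slide does not report the sample's age range:

```python
from math import exp

b0, b_age = 3.164247, -0.158455  # from the Poisson output above

def expected_count(age: float) -> float:
    """Expected times played outside last week at a given age."""
    return exp(b0 + b_age * age)

per_year = exp(b_age)  # multiplicative change per additional year of age
print(f"each extra year multiplies the expected count by {per_year:.3f}")

# Hypothetical ages for illustration only
for age in (6, 9, 12):
    print(age, round(expected_count(age), 2))
```

Here e^−0.158 ≈ 0.853, so each additional year is associated with about 15% fewer times played outside, whatever the starting age.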

[Figure: predicted Poisson probability distributions from the age model; x-axis: Count (0 to 20), y-axis: Probability]

Again, with age and cul-de-sac

             Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.825695   0.063812  44.282  < 2e-16 ***
culdesac 1   0.149847   0.044443   3.372 0.000747 ***
culdesac 2   0.217338   0.061547   3.531 0.000414 ***
culdesac 3   0.296327   0.029891   9.914  < 2e-16 ***
age         -0.135682   0.006639  -20.43  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
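With both predictors in the model, a prediction adds the relevant dummy and the age term on the log scale before exponentiating, and each cul-de-sac rate ratio now holds age constant. A minimal sketch, assuming the reference level is culdesac 0 and using a hypothetical age of 8:

```python
from math import exp

# Estimates from the combined Poisson model above (reference level assumed: culdesac 0)
intercept, b_age = 2.825695, -0.135682
b_culdesac = {0: 0.0, 1: 0.149847, 2: 0.217338, 3: 0.296327}

def expected_count(culdesac: int, age: float) -> float:
    """Expected times played outside, given cul-de-sac category and age."""
    return exp(intercept + b_culdesac[culdesac] + b_age * age)

# Holding age fixed (hypothetical age 8), compare cul-de-sac categories
for level in range(4):
    print(level, round(expected_count(level, 8), 2))

# Age-adjusted rate ratio for category 3 vs. 0: identical at every age
print(round(exp(b_culdesac[3]), 3))
```

Compared with the single-predictor models, every cul-de-sac coefficient is smaller here (e.g., 0.49983 becomes 0.296327 for category 3) and the age coefficient shrinks in magnitude too, which suggests the two predictors were partly picking up each other's association when modeled alone.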

Closing: other probability distributions
• Gamma, exponential, ordered logit, etc.

McElreath 2015, Statistical Rethinking

Recap

Dependent variable level                              Typically appropriate model
Nominal (binary)                                      Binomial logistic regression
Nominal (multiple categories)                         Multinomial logistic regression
Ordinal                                               Ordinal logistic regression
Ratio (count, 0 or greater)                           Poisson regression
Ratio (count out of n trials, two outcomes per trial) Binomial logistic regression
Ratio (unbounded, continuous)                         Linear regression

To do

For you:
• Bring 3 copies of your draft survey/data collection instrument to class on Thursday!
• Meet in 75 Hutchison for section on Friday!!!
• Assignment 4 due next Thursday (NOT next Tuesday!)

For Dillon and me to do:
• Grade the midterm and assignment 3