Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis...

70
Hypothesis Testing I: Introduction and Comparing means BIO5312 FALL2017 STEPHANIE J. SPIELMAN, PHD

Transcript of Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis...

Page 1: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

HypothesisTestingI:IntroductionandComparingmeansBIO5312FALL2017

STEPHANIE J. SPIELMAN,PHD

Page 2: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

StatisticalinferencePopulation Sample

Random sampling

Statistical inferencePopulation parameters !, "

Sample estimates x, s

Page 3: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

RecallfromlastweekFor𝑋~𝑁(0,1),whatistheprobabilityP(X≤0.47)?

0.0

0.1

0.2

0.3

0.4

−5 −4 −3 −2 −1 0 1 2 3 4 5Z

Probability

0.47

# CDF: P(X <= 0.47)> pnorm(0.47)

[1] 0.6808225

Page 4: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

ComputingprobabilitiesforapopulationClownfishlengthsarenormallydistributedasN(11,1.9).

Whatistheprobabilitythataclownfishislessthan13cm?

𝑍 = ,-./= 01-00

0.03� = 1.83> pnorm(1.83) [1] 0.966375

Page 5: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

ClownfishlengthsarenormallydistributedasN(11,1.9).

Whatistheprobabilitythatasampleof100clownfishhasameanlessthan11.5?

𝑍 = 𝑥 − 𝜇𝑆𝐸

Computingprobabilitiesforasample

𝑍 = 𝑥 − 𝜇𝜎

𝑆𝐸 = /;�= 0.3

0<<� = 0.19

= 00.=-00<.03

= 2.63 > pnorm(2.63) [1] 0.9957308

Page 6: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

AreturntosamplingdistributionsThesamplingdistribution isthedistributionofallpossiblevaluesanestimateforaparametercantakebasedonrandomsampling◦ Samplingdistributionofthemeans=allthemeansonecouldmeasureacrosssamples

Forclownfish,samplingdistributionofthemean,forN=100,hasastandarddeviationof1.1◦ SE=standarddeviationofthesamplingdistributionofthemean=1.1

Andifwedidn’tknowthepopulation𝜎 tocomputeSE?◦ WecouldapproximateasSE= >

;�,wheres isstandarddeviationofasample

Page 7: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

HypothesistestingComparedata(randomsample)totheexpectationofaspecific nullhypothesis

Hypothesistestingusesprobabilitytoanswerwhetheranobservedeffectoccurredbychance

Page 8: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

ExamplescenariotousehypothesistestingThepoliovaccinewasfirsttestedin1954.~400,000studentsweredividedintotworandomgroups:◦ Halfreceivedthevaccine,halfreceivedplacebo◦ Invaccinegroup,0.016%developedpolio.◦ Inplacebogroup,0.057%developedpolio.

Wecanusehypothesistestingtoaskifthevaccinelikelyworked,orwhetherrandomchancelikelycausedresults.

Page 9: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

HypothesistestinghasnullandalternativehypothesesThenullhypothesisH0 makesaclaimabouttheunderlyingpopulationparameter◦ "Nothinginterestingisgoingon"◦ Specifictestshavespecificnullhypotheses

ThealternativehypothesisHA iswhatwewouldliketo"knowifit'strue"◦ "Somethingwecareaboutisgoingon"

Page 10: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

Parametricvs.nonparametrichypothesistestsParametrictestsassumethatthedatafollowaparticularknowndistribution◦ Distributionssuchasnormal,binomial,chi-squared,etc.

Nonparametric testsmakenoassumptionaboutthedataandare"distribution-free"

Page 11: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

ThenulldistributionrepresentsH0Thenulldistribution isthesamplingdistributionofoutcomesforateststatisticassumingthenullistrue

Hypothesistestingissometimesreferredtoasnullhypothesissignificancetesting

Hypothesistestsask:Towhatextentaremydataexpectedunderthenullhypothesis?

Page 12: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

0.00

0.05

0.10

0.15

−4 −3 −2 −1 0 1 2 3 4test statistic

Prob

abilit

y de

nsity

Samplingdistributionofthenullhypothesis

Xvaluesinthetails areunlikely

Xvaluesinmiddleofthedistributionarefairlylikely

Page 13: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

0.00

0.05

0.10

0.15

−4 −3 −2 −1 0 1 2 3 4test statistic

Prob

abilit

y de

nsity

Samplingdistributionofthenullhypothesis

ShadedareasaretheCriticalregions

criticalvalue criticalvalue

Thetotalareaofthecriticalregionsisdefinedas𝛂

Page 14: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

CriticalvalueswhereN(0,1)isthenull

Page 15: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

TheP-valueistheareaunderthecurveforyourteststatisticResultofhypothesistestissignificant ifteststatisticfallsinthecriticalregion

0.00

0.05

0.10

0.15

−4 −3 −2 −1 0 1 2 3 4test statistic

Prob

abilit

y de

nsity

Forteststatistic=2.25,thesumofareaislessthan𝛂

0.00

0.05

0.10

0.15

−4 −3 −2 −1 0 1 2 3 4test statistic

Prob

abilit

y de

nsity

0.00

0.05

0.10

0.15

−4−3−2−101234test statistic

Probability density

-2.25 2.25

NOTE:Itwillnotalwaysbesymmetric

Page 16: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

FormingconclusionsBasedonyourpre-chosenα:

1. P-value<=α◦ Significantresultsallowustorejectthenullhypothesisandconclude

evidenceinfavorofthealternativehypothesis

2. P-value> α◦ Wedonothavesignificantresults.Wefailtorejectthenullhypothesis,

andwehavenoevidenceinfavorofthealternativehypothesis.

Thechoiceofαistotallyarbitrary,butusuallyyouwillsee0.05or0.01

Page 17: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

Errorratesα setstheoverallfalsepositiverate forourtestprocedure◦ Ifthenullistrue,wefalselyrejectthenull5%ofthetimeforα=0.05

Truthaboutpopulation(generallyunknown)

Conclusion

Nullistrue Alternativeistrue

Rejectnull(P<= α) TypeIerror(Falsepositive)

Truepositive

Failtorejectnull(P> α) True negative TypeIIError(Falsenegative)

Page 18: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

Whattypeoferrorisit(orisit?)Anewarthritisdrugdoesnothaveaneffectinclinicaltrials,eventhoughitactuallydoesreducearthritispain.ApersonwithHIVreceivesapositivetestresultforHIV.Apersonusingillegalperformingenhancingdrugspassesatestclearingthemofdruguse.Astudyfoundasignificantrelationshipbetweenneckstrainandjogging,whenrealitythereisnorelationship.Anhealthyindividualgetsapositivecancerbiopsyresult.

FN(typeII)

Noerror

FN(typeII)

FP(typeI)

FP(typeI)

Page 19: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

One-sidedvs.Two-sided(or–tailed)One-sidedtestsaredirectional◦ Aremydatalarger/smallerthannull?

Two-sidedtestsarenon-directional◦ Domydatadifferfromnull?

Total area=𝛂

Doesdatadifferfromnull?

Isdata<null?

Isdata>null?

Page 20: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

One-sidedvs.Two-sidedtestsOne-sidedtestshavemorepowerthantwo-sidedtests◦ Power=theabilitytodetectatrueeffect◦ Alsoknownastruepositiverate

One-sidedtestsaremorelimitedinscopeandcangetyouintroubleifyouchoosethewrongdirection

Youmustchooseonlyonetestbeforeyoulookatyourdata

Page 21: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

Approachtohypothesistesting1. Decidewhatquestionyouareinterestedinanswering2. Determinetheappropriatehypothesistesttouse3. Checkthatyourdatameettheassumptions*ofthetest4. Computetheteststatisticforyourhypothesistestand

thecorrespondingP-value5. Drawconclusionsusinganapriorispecified𝛂 (P-value

threshold)*Parametriconly

Page 22: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

HypothesisteststocomparemeansTestthenullhypothesis:◦ Onesampletest:Themeanofmysampleequalsanullvalue◦ Twosampletest:Themeanofmytwosamplesareequal(differenceinmeansis0)

Twooptions:◦ Z-test,wherethenulldistributionisthestandardnormalN(0,1)◦ TeststatisticisZ

◦ t-test,wherethenulldistributionistheStudent'stdistribution◦ Teststatisticist

Page 23: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

Z-testsvst-testsNormalpopulationshavethesamplingdistribution𝑋~𝑁(𝜇, /

@

;�)

◦ Itsteststatisticis𝑍 = ,-.AB whereSE=𝜎/ 𝑛�

◦ Usedforz-tests

Because𝜎 israrelyknown,wecanapproximateSE = >;�

𝑡 = ,̅-.ABHI

,where𝑆𝐸,̅ = >;�

Page 24: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

StandardnormalandStudent'st

0.0

0.1

0.2

0.3

0.4

−4 −2 0 2 4Z or t

Prob

abilit

y de

nsity

0.0

0.1

0.2

0.3

0.4

−4 −2 0 2 4Z or t

Prob

abilit

y de

nsity

0.0

0.1

0.2

0.3

0.4

−4 −2 0 2 4Z or t

Prob

abilit

y de

nsity

0.0

0.1

0.2

0.3

0.4

−4 −2 0 2 4Z or t

Prob

abilit

y de

nsity

StandardnormalN(0,1)Addingt distributionswithincreasingdegreesoffreedom

df =2 df =5 df =15

Page 25: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

Student'stwheredf =100

0.0

0.1

0.2

0.3

0.4

−4 −2 0 2 4Z or t

Prob

abilit

y de

nsity

Page 26: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

Propertiesofthet distributionAsdf approaches∞,thet distributionapproachesN(0,1)◦ Usually,df =n– 1,wherenissamplesize

Likethenormaldistribution…◦ t distributionissymmetric◦ Mean=median=mode

Unlikethenormaldistribution…◦ t has*muchfattertails*

0.0

0.1

0.2

0.3

0.4

−4 −2 0 2 4Z or t

Prob

abilit

y de

nsity

Page 27: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

TestAssumptionsSampleisarandomsamplefromitspopulation

Sampledataisnormallydistributed

Page 28: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

Diverightin:Onesamplet-testNullhypothesisH0 :𝒙I = 𝝁

One-sidedtestHA :𝒙I > 𝝁HA :𝒙I < 𝝁

Directional

Two-sidedtestHA :𝒙I ≠ 𝝁

Non-directional

Page 29: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

Performingaone-samplet-testIwanttoknowifthedisease,BadDisease,influenceshumanbodytemperature.Weknowthatstandardhumanbodytemperatureis98.6degreesF.Imeasuredthetemperaturesforarandomsampleof15individualswithBadDisease.Onaverage,theyhaveatempof99.59 degreesF.

DoesBadDiseaseraisebodytemperature?

H0 :BadDiseasedoesnotraisebodytemperature.𝒙I = 𝟗𝟖. 𝟔HA :BadDiseaseraisesbodytemperature.𝒙I > 𝟗𝟖. 𝟔

> head(bad.disease.temp)temp

1 98.174202 97.621373 99.609204 100.441585 99.754836 100.28846

> mean(bad.disease.temp$temp)[1] 99.594

Page 30: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

CheckingassumptionsofthetestThet-testassumesthatoursampledataisnormallydistributed

Hardtotellifnormalfromhistogram!

0

1

2

3

98 99 100 101 102 103Bad disease temperature for n=15

coun

t

Page 31: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

AssessingnormalitywithaQ-QplotQuantile-Quantileplotsgraphicallyshowiftwodatasetscomefromthesamedistribution◦ Ifthepointsfollowthe"expectation"line,datasetsaresimilarlydistributed

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

−2 −1 0 1 2

01

23

45

Normal Q−Q Plot

Theoretical Quantiles

Sam

ple

Qua

ntile

s●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

−2 −1 0 1 2

4.4

4.6

4.8

5.0

5.2

5.4

5.6

Normal Q−Q Plot

Theoretical Quantiles

Sam

ple

Qua

ntile

s

●●

−2 −1 0 1 2

020

4060

80

Normal Q−Q Plot

Theoretical Quantiles

Sam

ple

Qua

ntile

s

Idealscenario.Dataisnormal Dataisnotnormal Closeenough,let'ssaynormal

Page 32: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

Granulardataisalsonormal!

! !

!"#$%&#"'()*+#%,-.*/)*"0%$.'$1*.0-,*$0(*'$.'+#(-*.-2'#('0$*3"04*$0"4#&'()

$0"4#&

Page 33: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

MakingaNormalQ-Qplot> head(bad.disease.temp)

temp1 98.174202 97.621373 99.609204 100.441585 99.754836 100.28846

## Uses base R, not ggplot ##> qqnorm(bad.disease.temp$temp, pch=20)> qqline(bad.disease.temp$temp, col="red")

DataapproximatelyfollowtheQQline.Therefore,assumptionshavebeenmetandwecanrunthetest.

−1 0 1

9899

100

102

Normal Q−Q Plot

Theoretical Quantiles

Sam

ple

Qua

ntile

s

Page 34: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

Performingthet-test1. Calculatetheteststatistic

2. Seewhereteststatisticfallsonitsdistribution

3. ComputeP-valueasareaunderthecurve pastthisstatistic

o TheP-valueistheprobabilityofobtainingateststatisticaslargeorlargerthanthatrecovered

Page 35: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

Computetheteststatistic,tH0 :BadDiseasedoesnotraisebodytemperature.𝒙I = 𝟗𝟖. 𝟔HA :BadDiseaseraisesbodytemperature.𝒙I > 𝟗𝟖. 𝟔

𝑡 = ,̅-.RS�= 33.=3-3T.UV.WW

VX�= 2.66

> mean(bad.disease.temp$temp)[1] 99.594

> sd(bad.disease.temp$temp) [1] 1.438273

> nrow(bad.disease.temp)[1] 15

thisisourteststatistic,t2.66

Moreprecisely,t2.66,df=14

Page 36: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

FindwherethestatisticfallsindistributionOurnulldistributionisat distributionwithdf =14(15-1)

0.0

0.1

0.2

0.3

0.4

−4 −2 0 2 4t

Prob

abilit

y D

ensi

ty

2.66

ThisareaisourP-valuebecausewetestif 𝒙I > 𝟗𝟖. 𝟔

## P(X >= 2.66) ##> 1 – pt(2.66, 14)

0.009

Page 37: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

Formingconclusions

OurP=0.009,whichislessthanα=0.05.ThereforewerejectthenullhypothesisandwehaveevidencethatBadDiseaseraisesbodytemperature.

H0 :BadDiseasedoesnotraisebodytemperature.𝒙I = 𝟗𝟖. 𝟔HA :BadDiseaseraisesbodytemperature.𝒙I > 𝟗𝟖. 𝟔

Page 38: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

Approachtohypothesistesting1. Decidewhatquestionyouareinterestedinanswering

Isthemeanofmydataequalto98.6?

2. DeterminetheappropriatehypothesistesttouseUseaone-samplet-test

3. CheckthatyourdatameettheassumptionsofthetestWeconfirmedthedataisnormallydistributed

4. ComputetheteststatisticforyourhypothesistestandthecorrespondingP-value

Wefoundt=2.66andP=0.009

5. DrawconclusionsusingaspecifiedP-valuethresholdAt𝛂 =0.05,werejectthenullhypothesisandfindevidencethatBadDiseaseraisesbody

temp.

Page 39: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

EffectsizeTheeffectweobservedhereis99.59– 98.6=0.99

Statisticalsignificanceisnotthesameasbiologicalsignificance

0

1

2

3

4

1.0 1.5 2.0 2.5 3.0x

y

Comparisonbetweenredandbluesampleswillshowasignificantdifferencebetweenmeans.Butdoesitmatter?

Page 40: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

ConfidenceintervalsRangeofvaluessurroundingthesampleestimatethatislikelytocontainthepopulationparameter

Generallywecalculatethe95%confidenceinterval(goeswith𝛂 =0.05)

In95%ofrandomsamples,thiswillbetrue:

𝒙I − 𝒕𝟎.𝟎𝟐𝟓 ∗ 𝑺𝑬𝒙I < 𝝁 < 𝒙I + (𝒕𝟎.𝟎𝟐𝟓 ∗ 𝑺𝑬𝒙I)

Page 41: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

95%ConfidenceInterval

Page 42: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

ConceptualizingtheCI

True population mean

10 random samples, in no particular order

sample effect

95% CI

9/10(~95%)oftheserandomsampleshasa95%confidenceintervalthatoverlapsthetruemean

Page 43: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

Doesthisfigurerepresent95%CIs?

True population mean

10 random samples, in no particular order

NO.Ifanything,theseare40%CIs

Page 44: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

CalculatingtheCIConstructupperandlowerlimitsof95%CI◦ Lower:�̅� − (𝑡<.<c= ∗ 𝑆𝐸,̅)◦ Upper:�̅� + (𝑡<.<c= ∗ 𝑆𝐸,̅)

### Calculate t_0.025> qt(0.025, 14)[1] -2.144787

### Calculate t_0.025*SE> t <- abs(qt(0.025, 14)) > se <- sd(bad.disease.temp$temp)/sqrt(15)> t * se

[1] 0.3713605

95%CI=𝒙I ± 𝒕𝟎.𝟎𝟐𝟓 ∗ 𝑺𝑬𝒙I

= 𝟗𝟗. 𝟓𝟗 ± 0.371

Thetruepopulationmeanis95%likelytobeintherange99.22– 99.96

Page 45: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

BringitalltogetherWehaveasamplemeanof99.59withastandarderrorof0.19.

Ourteststatistictdf=14 =2.66,givingaP-value=0.009.Werejectthenullhypothesisat𝛂 =0.05 andhaveevidencethatBadDiseaseraisestemperature.

Oureffectsizeis0.99.

Wefounda95%CIof99.59±0.371,givingthelikelyrangeforthetruepopulationparameter.Notethatthenullof98.6isnotintheCI.

Page 46: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

Reportingnon-significantresultsLet'ssaywefoundtdf=14 =1.05à P=0.15

At𝛂 =0.05,wefailtorejectthenullhypothesis.WehavenoevidencethatBadDiseaseraisesbodytemperature.◦ DoesnotmeanthatBadDiseasedoesn'traisethetemperature– oursamplejusthadnoevidenceforthiseffect.

### Calculate P-value> 1 - pt(1.05, 14)[1] 0.1557531

Page 47: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

Whatis aP-value?TheP-valueisanareaunderthecurveofthenulldistribution◦ Itisthereforetheprobabilityofobservingthiseffectorlargerassumingthenullhypothesisistrue

◦ P-value=P(effectormoreobserved|H0 istrue)◦ P-value=0.009:IfH0 istrue,Iwouldobtainthiseffectorlarger(t>=2.66)in0.9%ofsuchstudiesduetorandomsamplingerror

Page 48: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

Whatis aP-value?AlowP-valuemeansthatthedataareunlikelyunderthenull◦ Wethereforemakeaneducatedguessthatthereisprobablysomethingelsegoingon,suchasthealternativehypothesis

◦ Wecannever ruleoutthepossibilitythatresultswerefullyconsistentwithnull,justunlikely

Page 49: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

P-valuesarenotmagicP-valuescannot evaluatewhetherH0 orHA istrue◦ LargeP-valuesdonot provethenullistrue◦ SmallP-valuesdonot provethealternativeistrue.Theymerelysuggestthenulllikelyisn't.

P-valuesdonot givetheprobabilitythatyoumadetherightconclusion

TwostudieswiththesameP-valuedonotprovidethesameweightofevidence

Page 50: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

DistributionofP-valuesIperform1000t-testsonrandomsamplesfromN(3.5,1)andcomparetonullmean=3.5.Therefore,nullistrue.

Histogram of ps

Pvalues

Count

0.0 0.2 0.4 0.6 0.8 1.0

020

4060

80

P<=0.05à Falsepositives

Page 51: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

DistributionofP-valuesIperform1000t-testsonrandomsamplesfromN(3.5,1)andcomparetonullmean=2.Therefore,nullisfalse.

P−values

Count

0.0 0.2 0.4 0.6 0.8 1.0

0200

400

600

P<=0.05à TruepositivesOnly~67%ofdistribution!

Page 52: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

P-valuesandeffectsize

cally important. This is often untrue. First, the difference maybe too small to be clinically important. The P value carries noinformation about the magnitude of an effect, which is cap-tured by the effect estimate and confidence interval. Second,the end point may itself not be clinically important, as canoccur with some surrogate outcomes: response rates versussurvival, CD4 counts versus clinical disease, change in a mea-surement scale versus improved functioning, and so on.25-27

Misconception #4: Studies with P values on opposite sides of.05 are conflicting. Studies can have differing degrees of sig-nificance even when the estimates of treatment benefit areidentical, by changing only the precision of the estimate, typ-ically through the sample size (Figure 2A). Studies statisti-cally conflict only when the difference between their results isunlikely to have occurred by chance, corresponding to whentheir confidence intervals show little or no overlap, formallyassessed with a test of heterogeneity.

Misconception #5: Studies with the same P value provide thesame evidence against the null hypothesis. Dramatically differentobserved effects can have the same P value. Figure 2B showsthe results of two trials, one with a treatment effect of 3%(confidence interval [CI], 0% to 6%), and the other with aneffect of 19% (CI, 0% to 38%). These both have a P value of.05, but the fact that these mean different things is easilydemonstrated. If we felt that a 10% benefit was necessary tooffset the adverse effects of this therapy, we might well adopta therapy on the basis of the study showing the large effectand strongly reject that therapy based on the study showingthe small effect, which rules out a 10% benefit. It is of coursealso possible to have the same P value even if the lower CI isnot close to zero.

This seeming incongruity occurs because the P value de-fines “evidence” relative to only one hypothesis—the null.There is no notion of positive evidence—if data with a P !.05 are evidence against the null, what are they evidence for?In this example, the strongest evidence for a benefit is for 3%in one study and 19% in the other. If we quantified evidencein a relative way, and asked which experiment provided

greater evidence for a 10% or higher effect (versus the null),we would find that the evidence was far greater in the trialshowing a 19% benefit.13,18,28

Misconception #6: P ! .05 means that we have observeddata that would occur only 5% of the time under the null hypoth-esis. That this is not the case is seen immediately from the Pvalue’s definition, the probability of the observed data, plusmore extreme data, under the null hypothesis. The result withthe P value of exactly .05 (or any other value) is the mostprobable of all the other possible results included in the “tailarea” that defines the P value. The probability of any individ-ual result is actually quite small, and Fisher said he threw inthe rest of the tail area “as an approximation.” As we will seelater in this chapter, the inclusion of these rarer outcomesposes serious logical and quantitative problems for the Pvalue, and using comparative rather than single probabilitiesto measure evidence eliminates the need to include outcomesother than what was observed.

This is the error made in the published survey of medicalresidents cited in the Introduction,3 where the following fouranswers were offered as possible interpretations of P ".05:

a. The chances are greater than 1 in 20 that a differencewould be found again if the study were repeated.

b. The probability is less than 1 in 20 that a difference thislarge could occur by chance alone.

c. The probability is greater than 1 in 20 that a differencethis large could occur by chance alone.

d. The chance is 95% that the study is correct.

The correct answer was identified as “c”, whereas the ac-tual correct answer should have read, “The probability isgreater than 1 in 20 that a difference this large or larger couldoccur by chance alone.”

These “more extreme” values included in the P-value def-inition actually introduce an operational difficulty in calcu-lating P values, as more extreme data are by definition unob-served data. What “could” have been observed depends onwhat experiment we imagine repeating. This means that twoexperiments with identical data on identical patients couldgenerate different P values if the imagined “long run” weredifferent. This can occur when one study uses a stoppingrule, and the other does not, or if one employs multiplecomparisons and the other does not.29,30

Misconception #7: P ! .05 and P !.05 mean the samething. This misconception shows how diabolically difficult itis to either explain or understand P values. There is a bigdifference between these results in terms of weight of evi-dence, but because the same number (5%) is associated witheach, that difference is literally impossible to communicate. Itcan be calculated and seen clearly only using a Bayesian evi-dence metric.16

Misconception #8: P values are properly written as inequal-ities (eg, “P !.02” when P ! .015). Expressing all P values asinequalities is a confusion that comes from the combinationof hypothesis tests and P values. In a hypothesis test, a pre-set“rejection” threshold is established. It is typically set at P !.05, corresponding to a type I error rate (or “alpha”) of 5%. Insuch a test, the only relevant information is whether the

-10

-5

0

5

10

15

20

25

30

35

40

Perc

ent b

enef

it

P=0.30 P=0.002

P=0.05 P=0.05

Same effect, different P-value

Different effect, same P-value

A BFigure 2 Figure showing how the P values of very different signifi-cance can arise from trials showing the identical effect with differentprecision (A, Misconception #4), or how same P value can be de-rived from profoundly different results (B, Misconception #5).

Twelve P-value misconceptions 137

Page 53: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

P-valuesarestronglyinfluencedbysamplesize

𝑡 = ,̅-.RS�= 33.=3-3T.UV.WW

𝟏𝟓�= 2.66à P=0.009

𝑡 = ,̅-.RS�= 33.=3-3T.UV.WW

𝟏𝟎𝟎�= 6.88à P=2.81e-10

Increasingsamplingsizeincreasespower

Poweristheprobabilityyoudetectatrueeffect,i.e.truepositiverate

Page 54: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

P-valuesarekindofanaccidentPersonally,thewriterpreferstosetalowstandardofsignificanceatthe5percentpoint....Ascientificfactshouldberegardedasexperimentallyestablishedonlyifaproperlydesignedexperimentrarelyfailstogivethislevelofsignificance.

- R.A.Fisher

Page 55: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

TorecapWehaveperformedaone-sidedt-testtotestthealternativehypothesisthat𝒙I > 𝟗𝟖. 𝟔

Wecanalsoperformatwo-sidedt-testtotestthenon-directional alternativehypothesisthat 𝒙I ≠ 𝟗𝟖. 𝟔

Page 56: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

Computetheteststatistic,tH0 :BadDiseasedoesnotaffectbodytemperature.𝒙I = 𝟗𝟖. 𝟔HA :BadDiseaseaffects bodytemperature.𝒙I ≠ 𝟗𝟖. 𝟔

𝒙I =99.59𝜇 =98.6n=15

𝑡 = ,̅-.RS�= 33.=3-3T.UV.WW

VX�= 2.66

> head(bad.disease.temp)temp

1 98.174202 97.621373 99.609204 100.441585 99.754836 100.28846

> mean(bad.disease.temp$temp)[1] 99.594

> sd(bad.disease.temp$temp) [1] 1.438273

thisisourteststatistic,t2.66

Moreprecisely,t2.66,df=14

Page 57: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

ComputingtheP-valueForatwo-sidedtest,weconsiderbothextremes

## P(X >= 2.66) or P(X<=-2.66)> 2 * (1 - pt(2.66, 14))

0.01806

0.0

0.1

0.2

0.3

0.4

−4 −2 0 2 4t

Prob

abilit

y D

ensi

ty

2.66

ThiscombinedareaisourP-value=0.018

-2.66

Page 58: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

Whenrunningaone-sidedt-testgoeswrongForsampleofn=20,Iwanttotest�̅� < 𝜇 where𝜇 =5.8> head(example.sample)

values1 5.5113072 5.0126123 6.1784214 7.5875685 6.8921656 5.197049

> mean(example.sample$values)[1] 6.325366

> sd(example.sample$values)[1] 0.8295474

𝑡 = �̅� − 𝜇𝑠𝑛�

= 6.32 − 5.80.8320�

= 2.801

Page 59: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

Theareabelow thestatisticisourP-valuefor�̅� < 𝜇

0.0

0.1

0.2

0.3

0.4

−4 −2 0 2 4t

Prob

abilit

y D

ensi

ty2.801

> pt(2.801, 19)[1] 0.9943006

Page 60: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

Two-samplet-testsComparemeansoftwosamples(�̅�0and�̅�c )toeachother.Aretheunderlyingpopulationmeans𝜇0and𝜇c thesame?

NullhypothesisH0 :𝝁𝟏 =𝝁𝟐

H0 :𝝁𝟏 - 𝝁𝟐 = 𝟎

One-sidedtestHA :𝝁𝟏 > 𝝁𝟐HA :𝝁𝟏 < 𝝁𝟐

HA :𝝁𝟏 −𝝁𝟐 > 𝟎HA :𝝁𝟏 −𝝁𝟐 < 𝟎

Two-sidedtestHA :𝝁𝟏 ≠ 𝝁𝟐

HA :𝝁𝟏 − 𝝁𝟐 ≠ 𝟎

Page 61: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

Formulasfortwo-samplet-test

𝒕 = 𝒙I𝟏 − 𝒙I𝟐𝑺𝑬𝒙I𝟏-𝒙I𝟐

𝑆𝐸,̅V-,̅@ = 𝑠jc 1𝑛0+

1𝑛c

𝑠jc =𝑑𝑓0𝑠0c + 𝑑𝑓c𝑠cc

𝑑𝑓0 + 𝑑𝑓c

𝑑𝑓 = 𝑑𝑓0 + 𝑑𝑓c = 𝑛0 + 𝑛c − 2

Standarderrorfortwosamples

Pooledsamplevariance

Degreesoffreedomfort distribution

Page 62: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

Two-samplet-testassumptionsBothsamplesmustbedistributednormally

Samplesshouldhaveequalvariances◦ F-test cancomparevariancesofsamplestocheckassumption,butitishighlysensitiveandwill"toooften"rejectthenull.

◦ Levene's test willtestforhomogeneityofvariancesaswell◦ CanuseWelch'st-test whenvarianceassumptionisnotmet

Forthisclass,wefocusonnormalassumption

Page 63: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

Two-samplevs.pairedt-testPairedt-testisaspecialcasewherethetwosamplesbeingcomparedhaveanaturalpairing◦ Effectivelyaone-samplet-test wherewetesttheifdifference betweentwosamples=0

Pairedt-testmustcheckassumptionthatdifferencebetweenmeansisnormalYoucanalwaysperformatwo-sampleinsteadofpaired,butpairedwillhavemorepower

Page 64: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

PairedscenariosMakingtwomeasurementsoneachsubject

Makingrepeatmeasurementsonthesamesubjectattwotimepoints◦ Beforeandaftertreatment

Matchingsubjectswithsimilarage,sex,etc.

Placingsubjectandcontrolincloseproximity

Page 65: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

Isitpairedorindependent?Triglyceridelevelsofagroupofsubjectsiscomparedbeforeandaftertakingavitaminsupplement.Foraclinicaltrial,onegroupisgivenavitaminandtheotheraplacebo.Theirtriglyceridelevelsarecompared.Foraclinicaltrial,twogroupswithindividualsmatchedforage,sex,andhealthhistoryaregivenvitaminandplacebo,respectively.Triglyceridelevelsarecompared.Imeasurefreeenergies,betweeninactiveandactiveconformations,for30enzymes.Aclinicaltrialtestsanewdrugon20setsoftwins,giving,foreachtwinpair,oneaplaceboandonethedrug.Comparisonsbetweenplaceboanddruggroupsaremade.Aclinicaltrialtestsanewdrugon20setsoftwins,randomizingallindividualsintoaplacebogroupandoradruggroup.Comparisonsbetweengroupsaremade.

Paired

Independent

PairedPaired

Paired

Independent

Page 66: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

Isitpairedorindependent?Paired:CanIdrawalinebetweenindividualsingroups?◦ MustbesameNpergroup,bydefinition

Independent:Nonaturalwaytodrawthelines.

5

10

15

before afterPaired groups

Mea

sure

d va

lue

Page 67: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

Troubleshooting:Failuretomeetnormalassumption1.Ifsamplesizeislargeenough(>~30),CentralLimitTheoremkicksinandassumptionsareeffectivelymet

2.Ifsampleissizeissmall(<~30)wecaneither:◦ Transform thedatatobenormal◦ Useanonparametrictest

Page 68: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

Non-normaldataright skew

coun

t

0 5 10 15 20 25 30

010

25

left skew

coun

t

−20 −10 0 5 10

020

40

long tails

coun

t

−20 0 10 20 30

010

30

●●

●●

● ●

●●

●●

●●●●●

●●●

●●

●●

●●

●●● ●

●●●

●●

●●●●●

●●●

●●

●●

●● ●●●

●●

● ●● ●●

●●●●

●●●●●

●●

●●

●●

●●●

●●

●●

●●

●●●

●●

−2 −1 0 1 2

010

2030

Normal Q−Q Plot

Theoretical Quantiles

Sam

ple

Qua

ntile

s

●● ●●

●● ●

● ●●

● ●●

●●●●

●● ●

●●

●●

●●

●●

●●●●

●●●

●●

●●●

●●

●●

●● ●

● ●

●●

●●●

●●

●●

●●

●●●●

●●

● ●●

●●

●●

●●

●●●

●●

−2 −1 0 1 2

−20

−55

Normal Q−Q Plot

Theoretical Quantiles

Sam

ple

Qua

ntile

s

●●● ●●

●●

●●

●●●

● ● ● ●● ●

●●●●●●

●● ●●●

●● ●●●●

●● ●● ●

●●

●●●●●

●●●● ●

●● ●

●●●

●●

●●●●

●●

● ●

●●

●●

●●

● ●●

●●●

●●●●●

● ●●

●●

●●

●●

●●●

●●

●●●

●●

●●●

●●

●●

●●

●●

●●

●●

−2 −1 0 1 2

−20

020

Normal Q−Q Plot

Theoretical Quantiles

Sam

ple

Qua

ntile

s

Page 69: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

DatatransformationsLogofdata:xà log(x)

Squarerootofdata:xà sqrt(x)

Inverseofdata:xà 1/x

Page 70: Hypothesis Testing I: Introduction and Comparing means · Example scenario to use hypothesis testing The polio vaccine was first tested in 1954. ~400,000 students were divided into

Datatransforms:CautionYourtestwillnowrunontransformeddata◦ Assumelogtransformperformedandresulthaseffectsize1.5◦ Actualeffectsizeisexp(1.5)=4.48

Becarefulof0'sindata◦ 1/0andlog(0)areundefined◦ Hack:Replaceall0'swithtinynumberlike1e-8

InstructorEditorializing:

MuchlikeZ-tables,datatransformsaremostlyhistorical