
Hypothesis Testing I: Introduction and Comparing means
BIO5312 FALL 2017

STEPHANIE J. SPIELMAN, PHD

Statistical inference
[Diagram: a sample is drawn from the population by random sampling; statistical inference goes from the sample back to the population.]

Population parameters: μ, σ

Sample estimates: x̄, s

Recall from last week
For X ~ N(0,1), what is the probability P(X ≤ 0.47)?

[Figure: standard normal density (x-axis: Z; y-axis: Probability) with 0.47 marked.]

# CDF: P(X <= 0.47)
> pnorm(0.47)
[1] 0.6808225

Computing probabilities for a population
Clownfish lengths are normally distributed as N(11, 1.9).

What is the probability that a clownfish is less than 13 cm?

$Z = \frac{x - \mu}{\sigma} = \frac{13 - 11}{1.9} = 1.05$

> pnorm(1.05)
[1] 0.8531409

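Computing probabilities for a sample
Clownfish lengths are normally distributed as N(11, 1.9).

What is the probability that a sample of 100 clownfish has a mean less than 11.5?

For a single observation, $Z = \frac{x - \mu}{\sigma}$. For a sample mean, the standard error takes the place of $\sigma$:

$SE = \frac{\sigma}{\sqrt{n}} = \frac{1.9}{\sqrt{100}} = 0.19$

$Z = \frac{\bar{x} - \mu}{SE} = \frac{11.5 - 11}{0.19} = 2.63$

> pnorm(2.63)
[1] 0.9957308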
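The sample-mean calculation can be sketched in R as follows. This is a minimal illustration; the variable names (`mu`, `sigma`, `n`, `xbar`) are chosen here and do not come from the slides.

```r
# Probability that the mean of n clownfish is below xbar,
# assuming lengths ~ N(mean = 11, sd = 1.9) as stated on the slide.
mu    <- 11     # population mean
sigma <- 1.9    # population standard deviation
n     <- 100    # sample size
xbar  <- 11.5   # sample mean of interest

se <- sigma / sqrt(n)     # standard error of the mean
z  <- (xbar - mu) / se    # Z-score of the sample mean
p  <- pnorm(z)            # P(sample mean <= xbar)

print(c(SE = se, Z = z, P = p))
```

The slide rounds Z to 2.63 before calling `pnorm()`, so the unrounded result here can differ slightly in the later decimals.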

A return to sampling distributions
The sampling distribution is the distribution of all possible values an estimate for a parameter can take based on random sampling
◦ Sampling distribution of the means = all the means one could measure across samples

For clownfish, the sampling distribution of the mean, for N = 100, has a standard deviation of 0.19
◦ SE = standard deviation of the sampling distribution of the mean = 1.9/√100 = 0.19

And if we didn't know the population σ to compute SE?
◦ We could approximate as $SE = \frac{s}{\sqrt{n}}$, where s is the standard deviation of a sample
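A small simulation can make the idea of a sampling distribution concrete. The sketch below assumes the clownfish population is N(mean = 11, sd = 1.9); the seed and the number of replicates are arbitrary choices, not values from the slides.

```r
# Simulate the sampling distribution of the mean for samples of n = 100.
set.seed(1)
n_reps <- 5000   # number of simulated samples (arbitrary)
n      <- 100    # size of each sample

sample_means <- replicate(n_reps, mean(rnorm(n, mean = 11, sd = 1.9)))

sd_of_means    <- sd(sample_means)   # empirical standard error
theoretical_se <- 1.9 / sqrt(n)      # sigma / sqrt(n)

# SE estimated from a single sample, using s in place of sigma
one_sample   <- rnorm(n, mean = 11, sd = 1.9)
estimated_se <- sd(one_sample) / sqrt(n)

print(c(empirical = sd_of_means,
        theoretical = theoretical_se,
        from_s = estimated_se))
```

All three numbers should land close to each other, which is why s/√n is a usable stand-in when σ is unknown.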

Hypothesis testing
Compare data (random sample) to the expectation of a specific null hypothesis

Hypothesis testing uses probability to answer whether an observed effect occurred by chance

Example scenario to use hypothesis testing
The polio vaccine was first tested in 1954. ~400,000 students were divided into two random groups:
◦ Half received the vaccine, half received placebo
◦ In vaccine group, 0.016% developed polio.
◦ In placebo group, 0.057% developed polio.

We can use hypothesis testing to ask if the vaccine likely worked, or whether random chance likely caused results.

Hypothesis testing has null and alternative hypotheses
The null hypothesis H0 makes a claim about the underlying population parameter
◦ "Nothing interesting is going on"
◦ Specific tests have specific null hypotheses

The alternative hypothesis HA is what we would like to "know if it's true"
◦ "Something we care about is going on"

Parametric vs. nonparametric hypothesis tests
Parametric tests assume that the data follow a particular known distribution
◦ Distributions such as normal, binomial, chi-squared, etc.

Nonparametric tests make no assumption about the distribution of the data and are "distribution-free"

The null distribution represents H0
The null distribution is the sampling distribution of outcomes for a test statistic assuming the null is true

Hypothesis testing is sometimes referred to as null hypothesis significance testing

Hypothesis tests ask: To what extent are my data expected under the null hypothesis?

[Figure: Sampling distribution of the null hypothesis (x-axis: test statistic; y-axis: probability density). X values in the middle of the distribution are fairly likely; X values in the tails are unlikely.]

[Figure: Sampling distribution of the null hypothesis. Shaded areas are the critical regions, each bounded by a critical value.]

The total area of the critical regions is defined as α

Critical values where N(0,1) is the null

The P-value is the area under the curve for your test statistic
Result of hypothesis test is significant if test statistic falls in the critical region

[Figure: null distribution (x-axis: test statistic; y-axis: probability density) with -2.25 and 2.25 marked. For test statistic = 2.25, the sum of area is less than α.]

NOTE: It will not always be symmetric
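As a rough sketch of how critical values and a P-value relate when the null distribution is N(0,1), the following uses α = 0.05 (an assumed threshold for illustration) and the test statistic of 2.25 from the figure.

```r
# Critical values and a two-sided P-value when the null distribution is N(0, 1).
alpha <- 0.05   # assumed significance threshold

# Two-sided test: alpha is split equally between the two tails
critical_values <- qnorm(c(alpha / 2, 1 - alpha / 2))

# P-value for an observed test statistic of 2.25: area in both tails
test_stat <- 2.25
p_value   <- 2 * pnorm(-abs(test_stat))

# Significant if the P-value is at or below alpha,
# i.e. the statistic falls in a critical region
significant <- p_value <= alpha

print(critical_values)
print(p_value)
print(significant)
```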

Forming conclusions
Based on your pre-chosen α:

1. P-value <= α
◦ Significant results allow us to reject the null hypothesis and conclude evidence in favor of the alternative hypothesis

2. P-value > α
◦ We do not have significant results. We fail to reject the null hypothesis, and we have no evidence in favor of the alternative hypothesis.

The choice of α is totally arbitrary, but usually you will see 0.05 or 0.01

Error rates
α sets the overall false positive rate for our test procedure
◦ If the null is true, we falsely reject the null 5% of the time for α = 0.05

                               Truth about population (generally unknown)
Conclusion                     Null is true                     Alternative is true
Reject null (P <= α)           Type I error (False positive)    True positive
Fail to reject null (P > α)    True negative                    Type II error (False negative)

What type of error is it (or is it?)
◦ A new arthritis drug does not have an effect in clinical trials, even though it actually does reduce arthritis pain. → FN (type II)
◦ A person with HIV receives a positive test result for HIV. → No error
◦ A person using illegal performance-enhancing drugs passes a test clearing them of drug use. → FN (type II)
◦ A study found a significant relationship between neck strain and jogging, when in reality there is no relationship. → FP (type I)
◦ A healthy individual gets a positive cancer biopsy result. → FP (type I)

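One-sided vs. Two-sided (or -tailed)
One-sided tests are directional
◦ Are my data larger/smaller than null?

Two-sided tests are non-directional
◦ Do my data differ from null?

[Figure: three null distributions, each with critical regions of total area = α. Both tails: does data differ from null? Lower tail: is data < null? Upper tail: is data > null?]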

One-sided vs. Two-sided tests
One-sided tests have more power than two-sided tests
◦ Power = the ability to detect a true effect
◦ Also known as true positive rate

One-sided tests are more limited in scope and can get you in trouble if you choose the wrong direction

You must choose only one test before you look at your data
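The power difference can be illustrated with `power.t.test()` from base R's stats package. The sample size, effect (`delta`), and standard deviation below are hypothetical values picked only for the comparison.

```r
# Power of one-sided vs. two-sided one-sample t-tests at the same alpha.
# n, delta, and sd are hypothetical values chosen for illustration.
power_one_sided <- power.t.test(n = 15, delta = 1, sd = 1.5, sig.level = 0.05,
                                type = "one.sample",
                                alternative = "one.sided")$power

power_two_sided <- power.t.test(n = 15, delta = 1, sd = 1.5, sig.level = 0.05,
                                type = "one.sample",
                                alternative = "two.sided")$power

print(c(one_sided = power_one_sided, two_sided = power_two_sided))
```

The one-sided advantage only holds when the true effect lies in the direction that was chosen in advance.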

Approach to hypothesis testing
1. Decide what question you are interested in answering
2. Determine the appropriate hypothesis test to use
3. Check that your data meet the assumptions* of the test
4. Compute the test statistic for your hypothesis test and the corresponding P-value
5. Draw conclusions using an a priori specified α (P-value threshold)

*Parametric only

Hypothesis tests to compare means
Test the null hypothesis:
◦ One sample test: The mean of my sample equals a null value
◦ Two sample test: The means of my two samples are equal (difference in means is 0)

Two options:
◦ Z-test, where the null distribution is the standard normal N(0,1)
  ◦ Test statistic is Z
◦ t-test, where the null distribution is the Student's t distribution
  ◦ Test statistic is t

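Z-tests vs t-tests
Normal populations have the sampling distribution $\bar{X} \sim N(\mu, \sigma/\sqrt{n})$ (mean μ, standard deviation σ/√n)
◦ Its test statistic is $Z = \frac{\bar{x} - \mu}{SE}$, where $SE = \sigma/\sqrt{n}$
◦ Used for z-tests

Because σ is rarely known, we can approximate $SE = \frac{s}{\sqrt{n}}$

$t = \frac{\bar{x} - \mu}{SE_{\bar{x}}}$, where $SE_{\bar{x}} = \frac{s}{\sqrt{n}}$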

Standard normal and Student's t
[Figure: standard normal N(0,1) density (x-axis: Z or t; y-axis: probability density), adding t distributions with increasing degrees of freedom across panels: df = 2, df = 5, df = 15, and finally Student's t where df = 100.]

Properties of the t distribution
As df approaches ∞, the t distribution approaches N(0,1)
◦ Usually, df = n - 1, where n is sample size

Like the normal distribution…
◦ t distribution is symmetric
◦ Mean = median = mode

Unlike the normal distribution…
◦ t has *much fatter tails*

[Figure: standard normal and t densities overlaid (x-axis: Z or t; y-axis: probability density).]
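The fatter tails can be checked numerically. This sketch compares the lower-tail area below -2 for the degrees of freedom shown in the figure against the standard normal; the cutoff of -2 is an arbitrary choice for illustration.

```r
# Tail area P(T <= -2) for t distributions vs. the standard normal
dfs     <- c(2, 5, 15, 100)
t_tails <- pt(-2, df = dfs)
z_tail  <- pnorm(-2)

print(data.frame(df = dfs, t_tail = t_tails))
print(z_tail)

# Two-sided critical values at alpha = 0.05 shrink toward the normal value
# as df grows
t_crit <- qt(0.975, df = dfs)
z_crit <- qnorm(0.975)
print(data.frame(df = dfs, t_crit = t_crit))
print(z_crit)
```

Every t tail area is larger than the normal one, and the gap narrows as df increases.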

Test Assumptions
Sample is a random sample from its population

Sample data is normally distributed

Dive right in: One sample t-test
Null hypothesis
H0: x̄ = μ

One-sided test (directional)
HA: x̄ > μ
HA: x̄ < μ

Two-sided test (non-directional)
HA: x̄ ≠ μ

Performing a one-sample t-test
I want to know if the disease, Bad Disease, influences human body temperature. We know that standard human body temperature is 98.6 degrees F. I measured the temperatures for a random sample of 15 individuals with Bad Disease. On average, they have a temp of 99.59 degrees F.

Does Bad Disease raise body temperature?

H0: Bad Disease does not raise body temperature. x̄ = 98.6
HA: Bad Disease raises body temperature. x̄ > 98.6

> head(bad.disease.temp)
       temp
1  98.17420
2  97.62137
3  99.60920
4 100.44158
5  99.75483
6 100.28846

> mean(bad.disease.temp$temp)
[1] 99.594

Checking assumptions of the test
The t-test assumes that our sample data is normally distributed

Hard to tell if normal from histogram!

[Figure: histogram of Bad Disease temperature for n = 15 (y-axis: count).]

Assessing normality with a Q-Q plot
Quantile-Quantile plots graphically show if two datasets come from the same distribution
◦ If the points follow the "expectation" line, datasets are similarly distributed

[Figure: three Normal Q-Q plots (x-axis: Theoretical Quantiles; y-axis: Sample Quantiles), captioned: "Ideal scenario. Data is normal", "Data is not normal", and "Close enough, let's say normal".]

Granular data is also normal!

Making a Normal Q-Q plot
> head(bad.disease.temp)
       temp
1  98.17420
2  97.62137
3  99.60920
4 100.44158
5  99.75483
6 100.28846

## Uses base R, not ggplot ##
> qqnorm(bad.disease.temp$temp, pch=20)
> qqline(bad.disease.temp$temp, col="red")

Data approximately follow the QQ line. Therefore, assumptions have been met and we can run the test.

[Figure: Normal Q-Q plot of the 15 Bad Disease temperatures (y-axis: Sample Quantiles).]

Performing the t-test
1. Calculate the test statistic

2. See where test statistic falls on its distribution

3. Compute P-value as area under the curve past this statistic
◦ The P-value is the probability of obtaining a test statistic as large or larger than the one observed, assuming the null hypothesis is true

Compute the test statistic, t
H0: Bad Disease does not raise body temperature. x̄ = 98.6
HA: Bad Disease raises body temperature. x̄ > 98.6

$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{99.59 - 98.6}{1.44/\sqrt{15}} = 2.66$

> sd(bad.disease.temp$temp)
[1] 1.438273

> nrow(bad.disease.temp)
[1] 15

This is our test statistic, t = 2.66

More precisely, t = 2.66, df = 14

Find where the statistic falls in distribution
Our null distribution is a t distribution with df = 14 (15 - 1)

[Figure: t distribution with df = 14 (x-axis: t; y-axis: probability density) with t = 2.66 marked. The area beyond 2.66 is our P-value because we test if x̄ > 98.6.]

## P(X >= 2.66) ##
> 1 - pt(2.66, 14)

0.009
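The whole one-sided calculation can be sketched from the summary statistics alone. The values below are the rounded ones used on the slide; the variable names, and the vector name `temps` in the closing comment, are hypothetical.

```r
# One-sample, one-sided t-test from summary statistics (rounded as on the slide)
xbar <- 99.59   # sample mean
mu0  <- 98.6    # null value
s    <- 1.44    # sample standard deviation
n    <- 15      # sample size

se      <- s / sqrt(n)          # standard error of the mean
t_stat  <- (xbar - mu0) / se    # test statistic
df      <- n - 1                # degrees of freedom
p_value <- pt(t_stat, df, lower.tail = FALSE)   # upper-tail area, for HA: mean > mu0

print(c(t = t_stat, df = df, P = p_value))

# With the raw measurements in a numeric vector `temps` (hypothetical name),
# the same test is: t.test(temps, mu = 98.6, alternative = "greater")
```

Using `lower.tail = FALSE` is equivalent to `1 - pt(...)` and avoids typing the subtraction by hand.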

Forming conclusions

Our P = 0.009, which is less than α = 0.05. Therefore we reject the null hypothesis and we have evidence that Bad Disease raises body temperature.

H0: Bad Disease does not raise body temperature. x̄ = 98.6
HA: Bad Disease raises body temperature. x̄ > 98.6

Approach to hypothesis testing
1. Decide what question you are interested in answering
◦ Is the mean of my data equal to 98.6?

2. Determine the appropriate hypothesis test to use
◦ Use a one-sample t-test

3. Check that your data meet the assumptions of the test
◦ We confirmed the data is normally distributed

4. Compute the test statistic for your hypothesis test and the corresponding P-value
◦ We found t = 2.66 and P = 0.009

5. Draw conclusions using a specified P-value threshold
◦ At α = 0.05, we reject the null hypothesis and find evidence that Bad Disease raises body temp.

Effect size
The effect we observed here is 99.59 - 98.6 = 0.99

Statistical significance is not the same as biological significance

[Figure: distributions of a red sample and a blue sample (x-axis: x; y-axis: y).]

Comparison between red and blue samples will show a significant difference between means. But does it matter?

Confidence intervals
Range of values surrounding the sample estimate that is likely to contain the population parameter

Generally we calculate the 95% confidence interval (goes with α = 0.05)

In 95% of random samples, this will be true:

$\bar{x} - (t_{0.025} \cdot SE_{\bar{x}}) < \mu < \bar{x} + (t_{0.025} \cdot SE_{\bar{x}})$ (the 95% confidence interval)

Conceptualizing the CI
[Figure: sample effects with their 95% CIs for 10 random samples, in no particular order, plotted against the true population mean.]

9/10 (~95%) of these random samples have a 95% confidence interval that overlaps the true mean

Does this figure represent 95% CIs?
[Figure: intervals for 10 random samples, in no particular order, plotted against the true population mean.]

NO. If anything, these are 40% CIs

Calculating the CI
Construct upper and lower limits of 95% CI
◦ Lower: $\bar{x} - (t_{0.025} \cdot SE_{\bar{x}})$
◦ Upper: $\bar{x} + (t_{0.025} \cdot SE_{\bar{x}})$

### Calculate t_0.025
> qt(0.025, 14)
[1] -2.144787

### Calculate SE
> se <- sd(bad.disease.temp$temp)/sqrt(15)
> se
[1] 0.3713605

### Calculate t_0.025*SE (2.144787 * 0.3713605, approximately 0.796)
> t <- abs(qt(0.025, 14))
> t * se

95% CI = $\bar{x} \pm t_{0.025} \cdot SE_{\bar{x}}$ = 99.59 ± 0.796

We are 95% confident that the true population mean lies in the range 98.8 - 100.4
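As a sketch, the interval can be rebuilt from the summary statistics printed earlier (mean 99.594, sd 1.438273, n = 15). The standard error and the margin of error are printed separately so the two quantities are not confused; variable names are illustrative.

```r
# 95% confidence interval for the mean from summary statistics
xbar <- 99.594     # sample mean
s    <- 1.438273   # sample standard deviation
n    <- 15         # sample size

se     <- s / sqrt(n)              # standard error of the mean
t_crit <- qt(0.975, df = n - 1)    # same magnitude as abs(qt(0.025, 14))
margin <- t_crit * se              # half-width of the interval

ci <- c(lower = xbar - margin, upper = xbar + margin)

print(c(SE = se, t_crit = t_crit, margin = margin))
print(ci)
```

The null value of 98.6 falls below the lower limit, which agrees with the two-sided test rejecting at α = 0.05.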

Bring it all together
We have a sample mean of 99.59 with a standard error of 0.37.

Our test statistic t(df = 14) = 2.66, giving a P-value = 0.009. We reject the null hypothesis at α = 0.05 and have evidence that Bad Disease raises temperature.

Our effect size is 0.99.

We found a 95% CI of 99.59 ± 0.796, giving the likely range for the true population parameter. Note that the null of 98.6 is not in the CI.

Reporting non-significant results
Let's say we found t(df = 14) = 1.05 → P = 0.15

At α = 0.05, we fail to reject the null hypothesis. We have no evidence that Bad Disease raises body temperature.
◦ Does not mean that Bad Disease doesn't raise the temperature: our sample just had no evidence for this effect.

### Calculate P-value
> 1 - pt(1.05, 14)
[1] 0.1557531

What is a P-value?
The P-value is an area under the curve of the null distribution
◦ It is therefore the probability of observing this effect or larger assuming the null hypothesis is true
◦ P-value = P(effect or more observed | H0 is true)
◦ P-value = 0.009: If H0 is true, I would obtain this effect or larger (t >= 2.66) in 0.9% of such studies due to random sampling error

What is a P-value?
A low P-value means that the data are unlikely under the null
◦ We therefore make an educated guess that there is probably something else going on, such as the alternative hypothesis
◦ We can never rule out the possibility that results were fully consistent with null, just unlikely

P-values are not magic
P-values cannot evaluate whether H0 or HA is true
◦ Large P-values do not prove the null is true
◦ Small P-values do not prove the alternative is true. They merely suggest the null likely isn't.

P-values do not give the probability that you made the right conclusion

Two studies with the same P-value do not provide the same weight of evidence

Distribution of P-values
I perform 1000 t-tests on random samples from N(3.5,1) and compare to null mean = 3.5. Therefore, null is true.

[Figure: histogram of P-values (x-axis: P values, 0 to 1; y-axis: count). P <= 0.05 → False positives]

Distribution of P-values
I perform 1000 t-tests on random samples from N(3.5,1) and compare to null mean = 2. Therefore, null is false.

[Figure: histogram of P-values (x-axis: P-values, 0 to 1; y-axis: count). P <= 0.05 → True positives. Only ~67% of distribution!]
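Both simulations can be sketched as below. The slides do not state the sample size per test, so `n = 5` is an assumption made here, as are the seed and the use of the default two-sided `t.test()`; the resulting rates will not match the slide's figures exactly.

```r
# Distribution of P-values from repeated one-sample t-tests.
# n = 5 per sample is an assumption; the slides do not give the sample size.
set.seed(42)
n_tests <- 1000
n       <- 5

simulate_p <- function(null_mean) {
  replicate(n_tests,
            t.test(rnorm(n, mean = 3.5, sd = 1), mu = null_mean)$p.value)
}

p_null_true  <- simulate_p(3.5)   # null is true: P-values roughly uniform
p_null_false <- simulate_p(2)     # null is false: P-values pile up near 0

false_positive_rate <- mean(p_null_true <= 0.05)    # should sit near alpha
true_positive_rate  <- mean(p_null_false <= 0.05)   # empirical power

print(c(false_positive_rate = false_positive_rate,
        true_positive_rate = true_positive_rate))
```

Calling `hist(p_null_true)` and `hist(p_null_false)` would draw histograms comparable to the two figures.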

P-values and effect size

[...]cally important. This is often untrue. First, the difference may be too small to be clinically important. The P value carries no information about the magnitude of an effect, which is captured by the effect estimate and confidence interval. Second, the end point may itself not be clinically important, as can occur with some surrogate outcomes: response rates versus survival, CD4 counts versus clinical disease, change in a measurement scale versus improved functioning, and so on.[25-27]

Misconception #4: Studies with P values on opposite sides of .05 are conflicting. Studies can have differing degrees of significance even when the estimates of treatment benefit are identical, by changing only the precision of the estimate, typically through the sample size (Figure 2A). Studies statistically conflict only when the difference between their results is unlikely to have occurred by chance, corresponding to when their confidence intervals show little or no overlap, formally assessed with a test of heterogeneity.

Misconception #5: Studies with the same P value provide the same evidence against the null hypothesis. Dramatically different observed effects can have the same P value. Figure 2B shows the results of two trials, one with a treatment effect of 3% (confidence interval [CI], 0% to 6%), and the other with an effect of 19% (CI, 0% to 38%). These both have a P value of .05, but the fact that these mean different things is easily demonstrated. If we felt that a 10% benefit was necessary to offset the adverse effects of this therapy, we might well adopt a therapy on the basis of the study showing the large effect and strongly reject that therapy based on the study showing the small effect, which rules out a 10% benefit. It is of course also possible to have the same P value even if the lower CI is not close to zero.

This seeming incongruity occurs because the P value defines “evidence” relative to only one hypothesis: the null. There is no notion of positive evidence; if data with a P = .05 are evidence against the null, what are they evidence for? In this example, the strongest evidence for a benefit is for 3% in one study and 19% in the other. If we quantified evidence in a relative way, and asked which experiment provided greater evidence for a 10% or higher effect (versus the null), we would find that the evidence was far greater in the trial showing a 19% benefit.[13,18,28]

Misconception #6: P = .05 means that we have observed data that would occur only 5% of the time under the null hypothesis. That this is not the case is seen immediately from the P value’s definition, the probability of the observed data, plus more extreme data, under the null hypothesis. The result with the P value of exactly .05 (or any other value) is the most probable of all the other possible results included in the “tail area” that defines the P value. The probability of any individual result is actually quite small, and Fisher said he threw in the rest of the tail area “as an approximation.” As we will see later in this chapter, the inclusion of these rarer outcomes poses serious logical and quantitative problems for the P value, and using comparative rather than single probabilities to measure evidence eliminates the need to include outcomes other than what was observed.

This is the error made in the published survey of medical residents cited in the Introduction,[3] where the following four answers were offered as possible interpretations of P > .05:

a. The chances are greater than 1 in 20 that a difference would be found again if the study were repeated.

b. The probability is less than 1 in 20 that a difference this large could occur by chance alone.

c. The probability is greater than 1 in 20 that a difference this large could occur by chance alone.

d. The chance is 95% that the study is correct.

The correct answer was identified as “c”, whereas the actual correct answer should have read, “The probability is greater than 1 in 20 that a difference this large or larger could occur by chance alone.”

These “more extreme” values included in the P-value definition actually introduce an operational difficulty in calculating P values, as more extreme data are by definition unobserved data. What “could” have been observed depends on what experiment we imagine repeating. This means that two experiments with identical data on identical patients could generate different P values if the imagined “long run” were different. This can occur when one study uses a stopping rule, and the other does not, or if one employs multiple comparisons and the other does not.[29,30]

Misconception #7: P = .05 and P ≤ .05 mean the same thing. This misconception shows how diabolically difficult it is to either explain or understand P values. There is a big difference between these results in terms of weight of evidence, but because the same number (5%) is associated with each, that difference is literally impossible to communicate. It can be calculated and seen clearly only using a Bayesian evidence metric.[16]

Misconception #8: P values are properly written as inequalities (eg, “P ≤ .02” when P = .015). Expressing all P values as inequalities is a confusion that comes from the combination of hypothesis tests and P values. In a hypothesis test, a pre-set “rejection” threshold is established. It is typically set at P = .05, corresponding to a type I error rate (or “alpha”) of 5%. In such a test, the only relevant information is whether the [...]

[Figure 2 (y-axis: Percent benefit). Panel A: same effect, different P-value (P = 0.30 and P = 0.002). Panel B: different effect, same P-value (P = 0.05 and P = 0.05).]

Figure 2. Figure showing how the P values of very different significance can arise from trials showing the identical effect with different precision (A, Misconception #4), or how same P value can be derived from profoundly different results (B, Misconception #5).

(Excerpt from "Twelve P-value misconceptions", p. 137)

P-values are strongly influenced by sample size

$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{99.59 - 98.6}{1.44/\sqrt{15}} = 2.66$ → P = 0.009

$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{99.59 - 98.6}{1.44/\sqrt{100}} = 6.88$ → P = 2.81e-10

Increasing sample size increases power

Power is the probability you detect a true effect, i.e. true positive rate
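The dependence on sample size can be sketched by holding the observed effect and standard deviation fixed and varying only n. The helper name `t_for_n` is made up for this illustration, and the P-values are one-sided to match the slide's test.

```r
# Same observed effect and spread, different sample sizes
t_for_n <- function(n, xbar = 99.59, mu0 = 98.6, s = 1.44) {
  (xbar - mu0) / (s / sqrt(n))
}

n_values <- c(15, 100)
t_values <- sapply(n_values, t_for_n)

# One-sided (upper-tail) P-values with df = n - 1
p_values <- pt(t_values, df = n_values - 1, lower.tail = FALSE)

print(data.frame(n = n_values, t = t_values, P = p_values))
```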

P-values are kind of an accident
Personally, the writer prefers to set a low standard of significance at the 5 percent point.... A scientific fact should be regarded as experimentally established only if a properly designed experiment rarely fails to give this level of significance.

- R.A. Fisher

To recap
We have performed a one-sided t-test to test the alternative hypothesis that x̄ > 98.6

We can also perform a two-sided t-test to test the non-directional alternative hypothesis that x̄ ≠ 98.6

Compute the test statistic, t
H0: Bad Disease does not affect body temperature. x̄ = 98.6
HA: Bad Disease affects body temperature. x̄ ≠ 98.6

x̄ = 99.59, μ = 98.6, n = 15

$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{99.59 - 98.6}{1.44/\sqrt{15}} = 2.66$

> head(bad.disease.temp)
       temp
1  98.17420
2  97.62137
3  99.60920
4 100.44158
5  99.75483
6 100.28846

> sd(bad.disease.temp$temp)
[1] 1.438273

This is our test statistic, t = 2.66

More precisely, t = 2.66, df = 14

Computing the P-value
For a two-sided test, we consider both extremes

## P(X >= 2.66) or P(X <= -2.66)
> 2 * (1 - pt(2.66, 14))

0.01806

[Figure: t distribution with df = 14 (x-axis: t; y-axis: probability density) with -2.66 and 2.66 marked. This combined area is our P-value = 0.018.]
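A short sketch of the one-sided and two-sided P-values side by side, using the slide's t = 2.66 and df = 14. Taking `abs()` of the statistic is a choice made here so the same line also works when t is negative.

```r
# Two-sided P-value: area beyond |t| in both tails of the t distribution
t_stat <- 2.66
df     <- 14

p_one_sided <- pt(t_stat, df, lower.tail = FALSE)            # P(T >= 2.66)
p_two_sided <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)   # both tails

print(c(one_sided = p_one_sided, two_sided = p_two_sided))
```

Because the t distribution is symmetric, the two-sided value is exactly twice the one-sided one.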

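When running a one-sided t-test goes wrong
For a sample of n = 20, I want to test x̄ < μ where μ = 5.8

> head(example.sample)
    values
1 5.511307
2 5.012612
3 6.178421
4 7.587568
5 6.892165
6 5.197049

> mean(example.sample$values)
[1] 6.325366

> sd(example.sample$values)
[1] 0.8295474

$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{6.32 - 5.8}{0.83/\sqrt{20}} = 2.801$

The area below the statistic is our P-value for x̄ < μ

[Figure: t distribution with df = 19 (x-axis: t; y-axis: probability density) with t = 2.801 marked.]

> pt(2.801, 19)
[1] 0.9943006

Because the sample mean (6.32) lies above μ, nearly the whole distribution sits below t = 2.801. With P = 0.994 we fail to reject the null, even though the data differ clearly from 5.8 in the other direction.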
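To see how much the choice of direction matters, the sketch below computes both one-sided P-values for the same statistic. It is for illustration only: the direction must still be fixed before looking at the data.

```r
# One-sided test in the "wrong" direction: HA is mean < 5.8,
# but the sample mean lies above 5.8, so t is positive.
t_stat <- 2.801
df     <- 19

p_less    <- pt(t_stat, df)                       # P-value for HA: mean < mu (the test chosen)
p_greater <- pt(t_stat, df, lower.tail = FALSE)   # what HA: mean > mu would have given

print(c(less = p_less, greater = p_greater))
```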

Two-sample t-tests
Compare means of two samples (x̄1 and x̄2) to each other. Are the underlying population means μ1 and μ2 the same?

Null hypothesis
H0: μ1 = μ2, equivalently H0: μ1 - μ2 = 0

One-sided test
HA: μ1 > μ2, equivalently HA: μ1 - μ2 > 0
HA: μ1 < μ2, equivalently HA: μ1 - μ2 < 0

Two-sided test
HA: μ1 ≠ μ2, equivalently HA: μ1 - μ2 ≠ 0

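Formulas for two-sample t-test

$t = \frac{\bar{x}_1 - \bar{x}_2}{SE_{\bar{x}_1 - \bar{x}_2}}$

Standard error for two samples:
$SE_{\bar{x}_1 - \bar{x}_2} = \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$

Pooled sample variance:
$s_p^2 = \frac{df_1 s_1^2 + df_2 s_2^2}{df_1 + df_2}$

Degrees of freedom for t distribution:
$df = df_1 + df_2 = n_1 + n_2 - 2$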
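The formulas above can be sketched by hand and checked against base R's `t.test()`. The two samples are simulated, hypothetical data; `var.equal = TRUE` is what asks `t.test()` for the pooled-variance version rather than its default Welch test.

```r
# Pooled two-sample t-test computed by hand and checked against t.test().
# The two samples are simulated, hypothetical data.
set.seed(7)
x1 <- rnorm(12, mean = 10, sd = 2)
x2 <- rnorm(15, mean = 12, sd = 2)

df1 <- length(x1) - 1
df2 <- length(x2) - 1

pooled_var <- (df1 * var(x1) + df2 * var(x2)) / (df1 + df2)        # s_p^2
se_diff    <- sqrt(pooled_var * (1 / length(x1) + 1 / length(x2)))  # SE of the difference
t_manual   <- (mean(x1) - mean(x2)) / se_diff
df_total   <- df1 + df2                                             # n1 + n2 - 2
p_manual   <- 2 * pt(abs(t_manual), df_total, lower.tail = FALSE)   # two-sided

builtin <- t.test(x1, x2, var.equal = TRUE)

print(c(t_manual = t_manual, t_builtin = unname(builtin$statistic)))
print(c(p_manual = p_manual, p_builtin = builtin$p.value))
```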

Two-sample t-test assumptions
Both samples must be distributed normally

Samples should have equal variances
◦ F-test can compare variances of samples to check assumption, but it is highly sensitive and will "too often" reject the null.
◦ Levene's test will test for homogeneity of variances as well
◦ Can use Welch's t-test when variance assumption is not met

For this class, we focus on normal assumption

Two-sample vs. paired t-test
Paired t-test is a special case where the two samples being compared have a natural pairing
◦ Effectively a one-sample t-test where we test if the difference between two samples = 0

Paired t-test must check the assumption that the differences between paired values are normal

You can always perform a two-sample instead of paired, but paired will have more power
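The equivalence between a paired t-test and a one-sample t-test on the differences can be sketched as follows. The before/after values are simulated, hypothetical data.

```r
# A paired t-test is a one-sample t-test on the within-pair differences.
# Before/after values are simulated, hypothetical data.
set.seed(3)
before <- rnorm(20, mean = 10, sd = 2)
after  <- before + rnorm(20, mean = 1, sd = 1)

paired_test     <- t.test(after, before, paired = TRUE)
difference_test <- t.test(after - before, mu = 0)

# Ignoring the pairing: an ordinary two-sample test on the same numbers
unpaired_test <- t.test(after, before, var.equal = TRUE)

print(c(paired = paired_test$p.value,
        on_differences = difference_test$p.value,
        unpaired = unpaired_test$p.value))
```

The first two P-values are identical; the unpaired one differs because it treats subject-to-subject variation as noise.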

Paired scenarios
Making two measurements on each subject

Making repeat measurements on the same subject at two time points
◦ Before and after treatment

Matching subjects with similar age, sex, etc.

Placing subject and control in close proximity

Is it paired or independent?
◦ Triglyceride levels of a group of subjects are compared before and after taking a vitamin supplement. → Paired
◦ For a clinical trial, one group is given a vitamin and the other a placebo. Their triglyceride levels are compared. → Independent
◦ For a clinical trial, two groups with individuals matched for age, sex, and health history are given vitamin and placebo, respectively. Triglyceride levels are compared. → Paired
◦ I measure free energies, between inactive and active conformations, for 30 enzymes. → Paired
◦ A clinical trial tests a new drug on 20 sets of twins, giving, for each twin pair, one a placebo and one the drug. Comparisons between placebo and drug groups are made. → Paired
◦ A clinical trial tests a new drug on 20 sets of twins, randomizing all individuals into a placebo group or a drug group. Comparisons between groups are made. → Independent

Is it paired or independent?
Paired: Can I draw a line between individuals in groups?
◦ Must be same N per group, by definition

Independent: No natural way to draw the lines.

[Figure: paired groups, before and after (y-axis: measured value).]

Troubleshooting: Failure to meet normal assumption
1. If sample size is large enough (> ~30), the Central Limit Theorem kicks in and assumptions are effectively met

2. If sample size is small (< ~30) we can either:
◦ Transform the data to be normal
◦ Use a nonparametric test

Non-normal data
[Figure: histograms (y-axis: count) of right-skewed, left-skewed, and long-tailed data, each with its Normal Q-Q plot (y-axis: Sample Quantiles).]

Data transformations
Log of data: x → log(x)

Square root of data: x → sqrt(x)

Inverse of data: x → 1/x

Data transforms: Caution
Your test will now run on transformed data
◦ Assume log transform performed and result has effect size 1.5
◦ Actual effect size is exp(1.5) = 4.48

Be careful of 0's in data
◦ 1/0 and log(0) are undefined
◦ Hack: Replace all 0's with tiny number like 1e-8
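A brief sketch of both cautions in R, assuming a natural-log transform. The vector `x` is hypothetical data. R does not raise an error for `log(0)` or `1/0`; it returns `-Inf` and `Inf`, which then break downstream calculations.

```r
# Back-transforming an effect estimated on the natural-log scale
effect_log_scale <- 1.5
effect_original  <- exp(effect_log_scale)   # a multiplicative (ratio) effect
print(round(effect_original, 2))

# Zeros break log and inverse transforms: R returns -Inf and Inf
x <- c(0, 0.5, 2, 10)   # hypothetical data containing a zero
print(log(x))
print(1 / x)

# The slide's workaround: replace zeros with a tiny number before transforming
x_patched <- ifelse(x == 0, 1e-8, x)
log_x     <- log(x_patched)
print(all(is.finite(log_x)))
```

The patched zero becomes a very large negative value on the log scale, so it can still act as an extreme point in the transformed data.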

Instructor Editorializing:

Much like Z-tables, data transforms are mostly historical