HypothesisTestingI:IntroductionandComparingmeansBIO5312FALL2017
STEPHANIE J. SPIELMAN,PHD
StatisticalinferencePopulation Sample
Random sampling
Statistical inferencePopulation parameters !, "
Sample estimates x, s
RecallfromlastweekFor𝑋~𝑁(0,1),whatistheprobabilityP(X≤0.47)?
0.0
0.1
0.2
0.3
0.4
−5 −4 −3 −2 −1 0 1 2 3 4 5Z
Probability
0.47
# CDF: P(X <= 0.47)> pnorm(0.47)
[1] 0.6808225
ComputingprobabilitiesforapopulationClownfishlengthsarenormallydistributedasN(11,1.9).
Whatistheprobabilitythataclownfishislessthan13cm?
𝑍 = ,-./= 01-00
0.03� = 1.83> pnorm(1.83) [1] 0.966375
ClownfishlengthsarenormallydistributedasN(11,1.9).
Whatistheprobabilitythatasampleof100clownfishhasameanlessthan11.5?
𝑍 = 𝑥 − 𝜇𝑆𝐸
Computingprobabilitiesforasample
𝑍 = 𝑥 − 𝜇𝜎
𝑆𝐸 = /;�= 0.3
0<<� = 0.19
= 00.=-00<.03
= 2.63 > pnorm(2.63) [1] 0.9957308
AreturntosamplingdistributionsThesamplingdistribution isthedistributionofallpossiblevaluesanestimateforaparametercantakebasedonrandomsampling◦ Samplingdistributionofthemeans=allthemeansonecouldmeasureacrosssamples
Forclownfish,samplingdistributionofthemean,forN=100,hasastandarddeviationof1.1◦ SE=standarddeviationofthesamplingdistributionofthemean=1.1
Andifwedidn’tknowthepopulation𝜎 tocomputeSE?◦ WecouldapproximateasSE= >
;�,wheres isstandarddeviationofasample
HypothesistestingComparedata(randomsample)totheexpectationofaspecific nullhypothesis
Hypothesistestingusesprobabilitytoanswerwhetheranobservedeffectoccurredbychance
ExamplescenariotousehypothesistestingThepoliovaccinewasfirsttestedin1954.~400,000studentsweredividedintotworandomgroups:◦ Halfreceivedthevaccine,halfreceivedplacebo◦ Invaccinegroup,0.016%developedpolio.◦ Inplacebogroup,0.057%developedpolio.
Wecanusehypothesistestingtoaskifthevaccinelikelyworked,orwhetherrandomchancelikelycausedresults.
HypothesistestinghasnullandalternativehypothesesThenullhypothesisH0 makesaclaimabouttheunderlyingpopulationparameter◦ "Nothinginterestingisgoingon"◦ Specifictestshavespecificnullhypotheses
ThealternativehypothesisHA iswhatwewouldliketo"knowifit'strue"◦ "Somethingwecareaboutisgoingon"
Parametricvs.nonparametrichypothesistestsParametrictestsassumethatthedatafollowaparticularknowndistribution◦ Distributionssuchasnormal,binomial,chi-squared,etc.
Nonparametric testsmakenoassumptionaboutthedataandare"distribution-free"
ThenulldistributionrepresentsH0Thenulldistribution isthesamplingdistributionofoutcomesforateststatisticassumingthenullistrue
Hypothesistestingissometimesreferredtoasnullhypothesissignificancetesting
Hypothesistestsask:Towhatextentaremydataexpectedunderthenullhypothesis?
0.00
0.05
0.10
0.15
−4 −3 −2 −1 0 1 2 3 4test statistic
Prob
abilit
y de
nsity
Samplingdistributionofthenullhypothesis
Xvaluesinthetails areunlikely
Xvaluesinmiddleofthedistributionarefairlylikely
0.00
0.05
0.10
0.15
−4 −3 −2 −1 0 1 2 3 4test statistic
Prob
abilit
y de
nsity
Samplingdistributionofthenullhypothesis
ShadedareasaretheCriticalregions
criticalvalue criticalvalue
Thetotalareaofthecriticalregionsisdefinedas𝛂
CriticalvalueswhereN(0,1)isthenull
TheP-valueistheareaunderthecurveforyourteststatisticResultofhypothesistestissignificant ifteststatisticfallsinthecriticalregion
0.00
0.05
0.10
0.15
−4 −3 −2 −1 0 1 2 3 4test statistic
Prob
abilit
y de
nsity
Forteststatistic=2.25,thesumofareaislessthan𝛂
0.00
0.05
0.10
0.15
−4 −3 −2 −1 0 1 2 3 4test statistic
Prob
abilit
y de
nsity
0.00
0.05
0.10
0.15
−4−3−2−101234test statistic
Probability density
-2.25 2.25
NOTE:Itwillnotalwaysbesymmetric
FormingconclusionsBasedonyourpre-chosenα:
1. P-value<=α◦ Significantresultsallowustorejectthenullhypothesisandconclude
evidenceinfavorofthealternativehypothesis
2. P-value> α◦ Wedonothavesignificantresults.Wefailtorejectthenullhypothesis,
andwehavenoevidenceinfavorofthealternativehypothesis.
Thechoiceofαistotallyarbitrary,butusuallyyouwillsee0.05or0.01
Errorratesα setstheoverallfalsepositiverate forourtestprocedure◦ Ifthenullistrue,wefalselyrejectthenull5%ofthetimeforα=0.05
Truthaboutpopulation(generallyunknown)
Conclusion
Nullistrue Alternativeistrue
Rejectnull(P<= α) TypeIerror(Falsepositive)
Truepositive
Failtorejectnull(P> α) True negative TypeIIError(Falsenegative)
Whattypeoferrorisit(orisit?)Anewarthritisdrugdoesnothaveaneffectinclinicaltrials,eventhoughitactuallydoesreducearthritispain.ApersonwithHIVreceivesapositivetestresultforHIV.Apersonusingillegalperformingenhancingdrugspassesatestclearingthemofdruguse.Astudyfoundasignificantrelationshipbetweenneckstrainandjogging,whenrealitythereisnorelationship.Anhealthyindividualgetsapositivecancerbiopsyresult.
FN(typeII)
Noerror
FN(typeII)
FP(typeI)
FP(typeI)
One-sidedvs.Two-sided(or–tailed)One-sidedtestsaredirectional◦ Aremydatalarger/smallerthannull?
Two-sidedtestsarenon-directional◦ Domydatadifferfromnull?
Total area=𝛂
Doesdatadifferfromnull?
Isdata<null?
Isdata>null?
One-sidedvs.Two-sidedtestsOne-sidedtestshavemorepowerthantwo-sidedtests◦ Power=theabilitytodetectatrueeffect◦ Alsoknownastruepositiverate
One-sidedtestsaremorelimitedinscopeandcangetyouintroubleifyouchoosethewrongdirection
Youmustchooseonlyonetestbeforeyoulookatyourdata
Approachtohypothesistesting1. Decidewhatquestionyouareinterestedinanswering2. Determinetheappropriatehypothesistesttouse3. Checkthatyourdatameettheassumptions*ofthetest4. Computetheteststatisticforyourhypothesistestand
thecorrespondingP-value5. Drawconclusionsusinganapriorispecified𝛂 (P-value
threshold)*Parametriconly
HypothesisteststocomparemeansTestthenullhypothesis:◦ Onesampletest:Themeanofmysampleequalsanullvalue◦ Twosampletest:Themeanofmytwosamplesareequal(differenceinmeansis0)
Twooptions:◦ Z-test,wherethenulldistributionisthestandardnormalN(0,1)◦ TeststatisticisZ
◦ t-test,wherethenulldistributionistheStudent'stdistribution◦ Teststatisticist
Z-testsvst-testsNormalpopulationshavethesamplingdistribution𝑋~𝑁(𝜇, /
@
;�)
◦ Itsteststatisticis𝑍 = ,-.AB whereSE=𝜎/ 𝑛�
◦ Usedforz-tests
Because𝜎 israrelyknown,wecanapproximateSE = >;�
𝑡 = ,̅-.ABHI
,where𝑆𝐸,̅ = >;�
StandardnormalandStudent'st
0.0
0.1
0.2
0.3
0.4
−4 −2 0 2 4Z or t
Prob
abilit
y de
nsity
0.0
0.1
0.2
0.3
0.4
−4 −2 0 2 4Z or t
Prob
abilit
y de
nsity
0.0
0.1
0.2
0.3
0.4
−4 −2 0 2 4Z or t
Prob
abilit
y de
nsity
0.0
0.1
0.2
0.3
0.4
−4 −2 0 2 4Z or t
Prob
abilit
y de
nsity
StandardnormalN(0,1)Addingt distributionswithincreasingdegreesoffreedom
df =2 df =5 df =15
Student'stwheredf =100
0.0
0.1
0.2
0.3
0.4
−4 −2 0 2 4Z or t
Prob
abilit
y de
nsity
Propertiesofthet distributionAsdf approaches∞,thet distributionapproachesN(0,1)◦ Usually,df =n– 1,wherenissamplesize
Likethenormaldistribution…◦ t distributionissymmetric◦ Mean=median=mode
Unlikethenormaldistribution…◦ t has*muchfattertails*
0.0
0.1
0.2
0.3
0.4
−4 −2 0 2 4Z or t
Prob
abilit
y de
nsity
TestAssumptionsSampleisarandomsamplefromitspopulation
Sampledataisnormallydistributed
Diverightin:Onesamplet-testNullhypothesisH0 :𝒙I = 𝝁
One-sidedtestHA :𝒙I > 𝝁HA :𝒙I < 𝝁
Directional
Two-sidedtestHA :𝒙I ≠ 𝝁
Non-directional
Performingaone-samplet-testIwanttoknowifthedisease,BadDisease,influenceshumanbodytemperature.Weknowthatstandardhumanbodytemperatureis98.6degreesF.Imeasuredthetemperaturesforarandomsampleof15individualswithBadDisease.Onaverage,theyhaveatempof99.59 degreesF.
DoesBadDiseaseraisebodytemperature?
H0 :BadDiseasedoesnotraisebodytemperature.𝒙I = 𝟗𝟖. 𝟔HA :BadDiseaseraisesbodytemperature.𝒙I > 𝟗𝟖. 𝟔
> head(bad.disease.temp)temp
1 98.174202 97.621373 99.609204 100.441585 99.754836 100.28846
> mean(bad.disease.temp$temp)[1] 99.594
CheckingassumptionsofthetestThet-testassumesthatoursampledataisnormallydistributed
Hardtotellifnormalfromhistogram!
0
1
2
3
98 99 100 101 102 103Bad disease temperature for n=15
coun
t
AssessingnormalitywithaQ-QplotQuantile-Quantileplotsgraphicallyshowiftwodatasetscomefromthesamedistribution◦ Ifthepointsfollowthe"expectation"line,datasetsaresimilarlydistributed
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●●
●
●●
●
●
−2 −1 0 1 2
01
23
45
Normal Q−Q Plot
Theoretical Quantiles
Sam
ple
Qua
ntile
s●
●
●
●●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
−2 −1 0 1 2
4.4
4.6
4.8
5.0
5.2
5.4
5.6
Normal Q−Q Plot
Theoretical Quantiles
Sam
ple
Qua
ntile
s
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
−2 −1 0 1 2
020
4060
80
Normal Q−Q Plot
Theoretical Quantiles
Sam
ple
Qua
ntile
s
Idealscenario.Dataisnormal Dataisnotnormal Closeenough,let'ssaynormal
Granulardataisalsonormal!
! !
!"#$%&#"'()*+#%,-.*/)*"0%$.'$1*.0-,*$0(*'$.'+#(-*.-2'#('0$*3"04*$0"4#&'()
$0"4#&
MakingaNormalQ-Qplot> head(bad.disease.temp)
temp1 98.174202 97.621373 99.609204 100.441585 99.754836 100.28846
## Uses base R, not ggplot ##> qqnorm(bad.disease.temp$temp, pch=20)> qqline(bad.disease.temp$temp, col="red")
DataapproximatelyfollowtheQQline.Therefore,assumptionshavebeenmetandwecanrunthetest.
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
−1 0 1
9899
100
102
Normal Q−Q Plot
Theoretical Quantiles
Sam
ple
Qua
ntile
s
Performingthet-test1. Calculatetheteststatistic
2. Seewhereteststatisticfallsonitsdistribution
3. ComputeP-valueasareaunderthecurve pastthisstatistic
o TheP-valueistheprobabilityofobtainingateststatisticaslargeorlargerthanthatrecovered
Computetheteststatistic,tH0 :BadDiseasedoesnotraisebodytemperature.𝒙I = 𝟗𝟖. 𝟔HA :BadDiseaseraisesbodytemperature.𝒙I > 𝟗𝟖. 𝟔
𝑡 = ,̅-.RS�= 33.=3-3T.UV.WW
VX�= 2.66
> mean(bad.disease.temp$temp)[1] 99.594
> sd(bad.disease.temp$temp) [1] 1.438273
> nrow(bad.disease.temp)[1] 15
thisisourteststatistic,t2.66
Moreprecisely,t2.66,df=14
FindwherethestatisticfallsindistributionOurnulldistributionisat distributionwithdf =14(15-1)
0.0
0.1
0.2
0.3
0.4
−4 −2 0 2 4t
Prob
abilit
y D
ensi
ty
2.66
ThisareaisourP-valuebecausewetestif 𝒙I > 𝟗𝟖. 𝟔
## P(X >= 2.66) ##> 1 – pt(2.66, 14)
0.009
Formingconclusions
OurP=0.009,whichislessthanα=0.05.ThereforewerejectthenullhypothesisandwehaveevidencethatBadDiseaseraisesbodytemperature.
H0 :BadDiseasedoesnotraisebodytemperature.𝒙I = 𝟗𝟖. 𝟔HA :BadDiseaseraisesbodytemperature.𝒙I > 𝟗𝟖. 𝟔
Approachtohypothesistesting1. Decidewhatquestionyouareinterestedinanswering
Isthemeanofmydataequalto98.6?
2. DeterminetheappropriatehypothesistesttouseUseaone-samplet-test
3. CheckthatyourdatameettheassumptionsofthetestWeconfirmedthedataisnormallydistributed
4. ComputetheteststatisticforyourhypothesistestandthecorrespondingP-value
Wefoundt=2.66andP=0.009
5. DrawconclusionsusingaspecifiedP-valuethresholdAt𝛂 =0.05,werejectthenullhypothesisandfindevidencethatBadDiseaseraisesbody
temp.
EffectsizeTheeffectweobservedhereis99.59– 98.6=0.99
Statisticalsignificanceisnotthesameasbiologicalsignificance
0
1
2
3
4
1.0 1.5 2.0 2.5 3.0x
y
Comparisonbetweenredandbluesampleswillshowasignificantdifferencebetweenmeans.Butdoesitmatter?
ConfidenceintervalsRangeofvaluessurroundingthesampleestimatethatislikelytocontainthepopulationparameter
Generallywecalculatethe95%confidenceinterval(goeswith𝛂 =0.05)
In95%ofrandomsamples,thiswillbetrue:
𝒙I − 𝒕𝟎.𝟎𝟐𝟓 ∗ 𝑺𝑬𝒙I < 𝝁 < 𝒙I + (𝒕𝟎.𝟎𝟐𝟓 ∗ 𝑺𝑬𝒙I)
95%ConfidenceInterval
ConceptualizingtheCI
True population mean
10 random samples, in no particular order
sample effect
95% CI
9/10(~95%)oftheserandomsampleshasa95%confidenceintervalthatoverlapsthetruemean
Doesthisfigurerepresent95%CIs?
True population mean
10 random samples, in no particular order
NO.Ifanything,theseare40%CIs
CalculatingtheCIConstructupperandlowerlimitsof95%CI◦ Lower:�̅� − (𝑡<.<c= ∗ 𝑆𝐸,̅)◦ Upper:�̅� + (𝑡<.<c= ∗ 𝑆𝐸,̅)
### Calculate t_0.025> qt(0.025, 14)[1] -2.144787
### Calculate t_0.025*SE> t <- abs(qt(0.025, 14)) > se <- sd(bad.disease.temp$temp)/sqrt(15)> t * se
[1] 0.3713605
95%CI=𝒙I ± 𝒕𝟎.𝟎𝟐𝟓 ∗ 𝑺𝑬𝒙I
= 𝟗𝟗. 𝟓𝟗 ± 0.371
Thetruepopulationmeanis95%likelytobeintherange99.22– 99.96
BringitalltogetherWehaveasamplemeanof99.59withastandarderrorof0.19.
Ourteststatistictdf=14 =2.66,givingaP-value=0.009.Werejectthenullhypothesisat𝛂 =0.05 andhaveevidencethatBadDiseaseraisestemperature.
Oureffectsizeis0.99.
Wefounda95%CIof99.59±0.371,givingthelikelyrangeforthetruepopulationparameter.Notethatthenullof98.6isnotintheCI.
Reportingnon-significantresultsLet'ssaywefoundtdf=14 =1.05à P=0.15
At𝛂 =0.05,wefailtorejectthenullhypothesis.WehavenoevidencethatBadDiseaseraisesbodytemperature.◦ DoesnotmeanthatBadDiseasedoesn'traisethetemperature– oursamplejusthadnoevidenceforthiseffect.
### Calculate P-value> 1 - pt(1.05, 14)[1] 0.1557531
Whatis aP-value?TheP-valueisanareaunderthecurveofthenulldistribution◦ Itisthereforetheprobabilityofobservingthiseffectorlargerassumingthenullhypothesisistrue
◦ P-value=P(effectormoreobserved|H0 istrue)◦ P-value=0.009:IfH0 istrue,Iwouldobtainthiseffectorlarger(t>=2.66)in0.9%ofsuchstudiesduetorandomsamplingerror
Whatis aP-value?AlowP-valuemeansthatthedataareunlikelyunderthenull◦ Wethereforemakeaneducatedguessthatthereisprobablysomethingelsegoingon,suchasthealternativehypothesis
◦ Wecannever ruleoutthepossibilitythatresultswerefullyconsistentwithnull,justunlikely
P-valuesarenotmagicP-valuescannot evaluatewhetherH0 orHA istrue◦ LargeP-valuesdonot provethenullistrue◦ SmallP-valuesdonot provethealternativeistrue.Theymerelysuggestthenulllikelyisn't.
P-valuesdonot givetheprobabilitythatyoumadetherightconclusion
TwostudieswiththesameP-valuedonotprovidethesameweightofevidence
DistributionofP-valuesIperform1000t-testsonrandomsamplesfromN(3.5,1)andcomparetonullmean=3.5.Therefore,nullistrue.
Histogram of ps
Pvalues
Count
0.0 0.2 0.4 0.6 0.8 1.0
020
4060
80
P<=0.05à Falsepositives
DistributionofP-valuesIperform1000t-testsonrandomsamplesfromN(3.5,1)andcomparetonullmean=2.Therefore,nullisfalse.
P−values
Count
0.0 0.2 0.4 0.6 0.8 1.0
0200
400
600
P<=0.05à TruepositivesOnly~67%ofdistribution!
P-valuesandeffectsize
cally important. This is often untrue. First, the difference maybe too small to be clinically important. The P value carries noinformation about the magnitude of an effect, which is cap-tured by the effect estimate and confidence interval. Second,the end point may itself not be clinically important, as canoccur with some surrogate outcomes: response rates versussurvival, CD4 counts versus clinical disease, change in a mea-surement scale versus improved functioning, and so on.25-27
Misconception #4: Studies with P values on opposite sides of.05 are conflicting. Studies can have differing degrees of sig-nificance even when the estimates of treatment benefit areidentical, by changing only the precision of the estimate, typ-ically through the sample size (Figure 2A). Studies statisti-cally conflict only when the difference between their results isunlikely to have occurred by chance, corresponding to whentheir confidence intervals show little or no overlap, formallyassessed with a test of heterogeneity.
Misconception #5: Studies with the same P value provide thesame evidence against the null hypothesis. Dramatically differentobserved effects can have the same P value. Figure 2B showsthe results of two trials, one with a treatment effect of 3%(confidence interval [CI], 0% to 6%), and the other with aneffect of 19% (CI, 0% to 38%). These both have a P value of.05, but the fact that these mean different things is easilydemonstrated. If we felt that a 10% benefit was necessary tooffset the adverse effects of this therapy, we might well adopta therapy on the basis of the study showing the large effectand strongly reject that therapy based on the study showingthe small effect, which rules out a 10% benefit. It is of coursealso possible to have the same P value even if the lower CI isnot close to zero.
This seeming incongruity occurs because the P value de-fines “evidence” relative to only one hypothesis—the null.There is no notion of positive evidence—if data with a P !.05 are evidence against the null, what are they evidence for?In this example, the strongest evidence for a benefit is for 3%in one study and 19% in the other. If we quantified evidencein a relative way, and asked which experiment provided
greater evidence for a 10% or higher effect (versus the null),we would find that the evidence was far greater in the trialshowing a 19% benefit.13,18,28
Misconception #6: P ! .05 means that we have observeddata that would occur only 5% of the time under the null hypoth-esis. That this is not the case is seen immediately from the Pvalue’s definition, the probability of the observed data, plusmore extreme data, under the null hypothesis. The result withthe P value of exactly .05 (or any other value) is the mostprobable of all the other possible results included in the “tailarea” that defines the P value. The probability of any individ-ual result is actually quite small, and Fisher said he threw inthe rest of the tail area “as an approximation.” As we will seelater in this chapter, the inclusion of these rarer outcomesposes serious logical and quantitative problems for the Pvalue, and using comparative rather than single probabilitiesto measure evidence eliminates the need to include outcomesother than what was observed.
This is the error made in the published survey of medicalresidents cited in the Introduction,3 where the following fouranswers were offered as possible interpretations of P ".05:
a. The chances are greater than 1 in 20 that a differencewould be found again if the study were repeated.
b. The probability is less than 1 in 20 that a difference thislarge could occur by chance alone.
c. The probability is greater than 1 in 20 that a differencethis large could occur by chance alone.
d. The chance is 95% that the study is correct.
The correct answer was identified as “c”, whereas the ac-tual correct answer should have read, “The probability isgreater than 1 in 20 that a difference this large or larger couldoccur by chance alone.”
These “more extreme” values included in the P-value def-inition actually introduce an operational difficulty in calcu-lating P values, as more extreme data are by definition unob-served data. What “could” have been observed depends onwhat experiment we imagine repeating. This means that twoexperiments with identical data on identical patients couldgenerate different P values if the imagined “long run” weredifferent. This can occur when one study uses a stoppingrule, and the other does not, or if one employs multiplecomparisons and the other does not.29,30
Misconception #7: P ! .05 and P !.05 mean the samething. This misconception shows how diabolically difficult itis to either explain or understand P values. There is a bigdifference between these results in terms of weight of evi-dence, but because the same number (5%) is associated witheach, that difference is literally impossible to communicate. Itcan be calculated and seen clearly only using a Bayesian evi-dence metric.16
Misconception #8: P values are properly written as inequal-ities (eg, “P !.02” when P ! .015). Expressing all P values asinequalities is a confusion that comes from the combinationof hypothesis tests and P values. In a hypothesis test, a pre-set“rejection” threshold is established. It is typically set at P !.05, corresponding to a type I error rate (or “alpha”) of 5%. Insuch a test, the only relevant information is whether the
-10
-5
0
5
10
15
20
25
30
35
40
Perc
ent b
enef
it
P=0.30 P=0.002
P=0.05 P=0.05
Same effect, different P-value
Different effect, same P-value
A BFigure 2 Figure showing how the P values of very different signifi-cance can arise from trials showing the identical effect with differentprecision (A, Misconception #4), or how same P value can be de-rived from profoundly different results (B, Misconception #5).
Twelve P-value misconceptions 137
P-valuesarestronglyinfluencedbysamplesize
𝑡 = ,̅-.RS�= 33.=3-3T.UV.WW
𝟏𝟓�= 2.66à P=0.009
𝑡 = ,̅-.RS�= 33.=3-3T.UV.WW
𝟏𝟎𝟎�= 6.88à P=2.81e-10
Increasingsamplingsizeincreasespower
Poweristheprobabilityyoudetectatrueeffect,i.e.truepositiverate
P-valuesarekindofanaccidentPersonally,thewriterpreferstosetalowstandardofsignificanceatthe5percentpoint....Ascientificfactshouldberegardedasexperimentallyestablishedonlyifaproperlydesignedexperimentrarelyfailstogivethislevelofsignificance.
- R.A.Fisher
TorecapWehaveperformedaone-sidedt-testtotestthealternativehypothesisthat𝒙I > 𝟗𝟖. 𝟔
Wecanalsoperformatwo-sidedt-testtotestthenon-directional alternativehypothesisthat 𝒙I ≠ 𝟗𝟖. 𝟔
Computetheteststatistic,tH0 :BadDiseasedoesnotaffectbodytemperature.𝒙I = 𝟗𝟖. 𝟔HA :BadDiseaseaffects bodytemperature.𝒙I ≠ 𝟗𝟖. 𝟔
𝒙I =99.59𝜇 =98.6n=15
𝑡 = ,̅-.RS�= 33.=3-3T.UV.WW
VX�= 2.66
> head(bad.disease.temp)temp
1 98.174202 97.621373 99.609204 100.441585 99.754836 100.28846
> mean(bad.disease.temp$temp)[1] 99.594
> sd(bad.disease.temp$temp) [1] 1.438273
thisisourteststatistic,t2.66
Moreprecisely,t2.66,df=14
ComputingtheP-valueForatwo-sidedtest,weconsiderbothextremes
## P(X >= 2.66) or P(X<=-2.66)> 2 * (1 - pt(2.66, 14))
0.01806
0.0
0.1
0.2
0.3
0.4
−4 −2 0 2 4t
Prob
abilit
y D
ensi
ty
2.66
ThiscombinedareaisourP-value=0.018
-2.66
Whenrunningaone-sidedt-testgoeswrongForsampleofn=20,Iwanttotest�̅� < 𝜇 where𝜇 =5.8> head(example.sample)
values1 5.5113072 5.0126123 6.1784214 7.5875685 6.8921656 5.197049
> mean(example.sample$values)[1] 6.325366
> sd(example.sample$values)[1] 0.8295474
𝑡 = �̅� − 𝜇𝑠𝑛�
= 6.32 − 5.80.8320�
= 2.801
Theareabelow thestatisticisourP-valuefor�̅� < 𝜇
0.0
0.1
0.2
0.3
0.4
−4 −2 0 2 4t
Prob
abilit
y D
ensi
ty2.801
> pt(2.801, 19)[1] 0.9943006
Two-samplet-testsComparemeansoftwosamples(�̅�0and�̅�c )toeachother.Aretheunderlyingpopulationmeans𝜇0and𝜇c thesame?
NullhypothesisH0 :𝝁𝟏 =𝝁𝟐
H0 :𝝁𝟏 - 𝝁𝟐 = 𝟎
One-sidedtestHA :𝝁𝟏 > 𝝁𝟐HA :𝝁𝟏 < 𝝁𝟐
HA :𝝁𝟏 −𝝁𝟐 > 𝟎HA :𝝁𝟏 −𝝁𝟐 < 𝟎
Two-sidedtestHA :𝝁𝟏 ≠ 𝝁𝟐
HA :𝝁𝟏 − 𝝁𝟐 ≠ 𝟎
Formulasfortwo-samplet-test
𝒕 = 𝒙I𝟏 − 𝒙I𝟐𝑺𝑬𝒙I𝟏-𝒙I𝟐
𝑆𝐸,̅V-,̅@ = 𝑠jc 1𝑛0+
1𝑛c
�
𝑠jc =𝑑𝑓0𝑠0c + 𝑑𝑓c𝑠cc
𝑑𝑓0 + 𝑑𝑓c
𝑑𝑓 = 𝑑𝑓0 + 𝑑𝑓c = 𝑛0 + 𝑛c − 2
Standarderrorfortwosamples
Pooledsamplevariance
Degreesoffreedomfort distribution
Two-samplet-testassumptionsBothsamplesmustbedistributednormally
Samplesshouldhaveequalvariances◦ F-test cancomparevariancesofsamplestocheckassumption,butitishighlysensitiveandwill"toooften"rejectthenull.
◦ Levene's test willtestforhomogeneityofvariancesaswell◦ CanuseWelch'st-test whenvarianceassumptionisnotmet
Forthisclass,wefocusonnormalassumption
Two-samplevs.pairedt-testPairedt-testisaspecialcasewherethetwosamplesbeingcomparedhaveanaturalpairing◦ Effectivelyaone-samplet-test wherewetesttheifdifference betweentwosamples=0
Pairedt-testmustcheckassumptionthatdifferencebetweenmeansisnormalYoucanalwaysperformatwo-sampleinsteadofpaired,butpairedwillhavemorepower
PairedscenariosMakingtwomeasurementsoneachsubject
Makingrepeatmeasurementsonthesamesubjectattwotimepoints◦ Beforeandaftertreatment
Matchingsubjectswithsimilarage,sex,etc.
Placingsubjectandcontrolincloseproximity
Isitpairedorindependent?Triglyceridelevelsofagroupofsubjectsiscomparedbeforeandaftertakingavitaminsupplement.Foraclinicaltrial,onegroupisgivenavitaminandtheotheraplacebo.Theirtriglyceridelevelsarecompared.Foraclinicaltrial,twogroupswithindividualsmatchedforage,sex,andhealthhistoryaregivenvitaminandplacebo,respectively.Triglyceridelevelsarecompared.Imeasurefreeenergies,betweeninactiveandactiveconformations,for30enzymes.Aclinicaltrialtestsanewdrugon20setsoftwins,giving,foreachtwinpair,oneaplaceboandonethedrug.Comparisonsbetweenplaceboanddruggroupsaremade.Aclinicaltrialtestsanewdrugon20setsoftwins,randomizingallindividualsintoaplacebogroupandoradruggroup.Comparisonsbetweengroupsaremade.
Paired
Independent
PairedPaired
Paired
Independent
Isitpairedorindependent?Paired:CanIdrawalinebetweenindividualsingroups?◦ MustbesameNpergroup,bydefinition
Independent:Nonaturalwaytodrawthelines.
5
10
15
before afterPaired groups
Mea
sure
d va
lue
Troubleshooting:Failuretomeetnormalassumption1.Ifsamplesizeislargeenough(>~30),CentralLimitTheoremkicksinandassumptionsareeffectivelymet
2.Ifsampleissizeissmall(<~30)wecaneither:◦ Transform thedatatobenormal◦ Useanonparametrictest
Non-normaldataright skew
coun
t
0 5 10 15 20 25 30
010
25
left skew
coun
t
−20 −10 0 5 10
020
40
long tails
coun
t
−20 0 10 20 30
010
30
●
●●
●
●●
●
●
● ●
●
●●
●●
●●●●●
●●●
●●
●●
●
●
●
●
●●
●●● ●
●●●
●●
●
●●●●●
●●●
●
●
●●
●●
●
●
●● ●●●
●●
●
●
●
●
● ●● ●●
●●●●
●●●●●
●●
●
●
●●
●●
●●●
●
●●
●●
●
●
●
●
●
●
●
●
●●
●●●
●
●
●
●
●
●
●●
●
●
●
●
−2 −1 0 1 2
010
2030
Normal Q−Q Plot
Theoretical Quantiles
Sam
ple
Qua
ntile
s
●● ●●
●● ●
● ●●
● ●●
●●●●
●
●● ●
●●
●
●
●●
●●
●
●●
●
●
●●●●
●●●
●
●
●●
●
●●●
●●
●
●
●
●
●
●●
●
●● ●
● ●
●
●
●
●
●
●●
●●●
●
●●
●
●
●●
●
●●
●●●●
●●
●
● ●●
●●
●●
●●
●
●
●
●
●
●●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
−2 −1 0 1 2
−20
−55
Normal Q−Q Plot
Theoretical Quantiles
Sam
ple
Qua
ntile
s
●
●
●●● ●●
●●
●●
●
●●●
● ● ● ●● ●
●●●●●●
●● ●●●
●● ●●●●
●● ●● ●
●
●●
●●●●●
●
●●●● ●
●● ●
●
●
●
●●●
●●
●
●
●●●●
●●
● ●
●●
●
●●
●
●●
● ●●
●●●
●●●●●
● ●●
●
●●
●
●
●
●
●●
●
●
●
●●
●●●
●●
●●●
●
●●
●●●
●
●●
●
●●
●
●
●
●
●●
●
●●
●●
●●
●
●
●
−2 −1 0 1 2
−20
020
Normal Q−Q Plot
Theoretical Quantiles
Sam
ple
Qua
ntile
s
DatatransformationsLogofdata:xà log(x)
Squarerootofdata:xà sqrt(x)
Inverseofdata:xà 1/x
Datatransforms:CautionYourtestwillnowrunontransformeddata◦ Assumelogtransformperformedandresulthaseffectsize1.5◦ Actualeffectsizeisexp(1.5)=4.48
Becarefulof0'sindata◦ 1/0andlog(0)areundefined◦ Hack:Replaceall0'swithtinynumberlike1e-8
InstructorEditorializing:
MuchlikeZ-tables,datatransformsaremostlyhistorical
Top Related