
P-values and statistical tests
7. Multiple test corrections

Hand-outs available at http://is.gd/statlec

Marek Gierliński, Division of Computational Biology


Let's perform a test 𝑚 times

                 H0 true    H0 false    Total
Significant      FP         TP          D
Not significant  TN         FN          m − D
Total            m₀         m₁          m

FP = false positives, TP = true positives, TN = true negatives, FN = false negatives;
m = number of tests, D = number of discoveries.


Family-wise error rate

𝐹𝑊𝐸𝑅 = Pr(𝐹𝑃 ≥ 1)

Probabilities of independent events multiply

Toss two coins, each landing heads with probability P(H) = 1/2.

P(H and H) = P(H) × P(H) = 1/4

The probability of at least one of two independent events, each with probability p, is 1 − (1 − p)².

Toss two coins again. What is P(H or H), the probability of at least one head?

P(T) = 1 − P(H)
P(T and T) = P(T) × P(T) = (1 − P(H))² = 1/4
P(H or H) = 1 − P(T and T)
P(H or H) = 1 − (1 − P(H))² = 1 − (1 − 1/2)² = 3/4
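As a quick check of these two rules in R (a minimal sketch, nothing more):

# two tosses of a fair coin
p_head <- 0.5
p_head * p_head       # P(H and H) = 0.25
1 - (1 - p_head)^2    # P(H or H)  = 0.75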

False positive probability

H0: no effect. Set α = 0.05.

One test
Probability of having a false positive: P₁ = α

Two independent tests
Probability of having at least one false positive in either test: P₂ = 1 − (1 − α)²

m independent tests
Probability of having at least one false positive in any test: P_m = 1 − (1 − α)^m

Family-wise error rate (FWER)

Probability of having at least one false positive among m tests; α = 0.05:

P_m = 1 − (1 − α)^m

[Plot: FWER as a function of the number of tests m, for α = 0.05]

Jelly beans test, m = 20, α = 0.05:

P₂₀ = 1 − (1 − 0.05)²⁰ = 0.64
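The jelly-beans number is easy to verify directly in R (a minimal sketch; the function name fwer is mine, not from the lecture):

# probability of at least one false positive among m independent tests
fwer <- function(m, alpha = 0.05) 1 - (1 - alpha)^m

fwer(1)     # 0.05
fwer(20)    # 0.64 - the jelly-beans example
fwer(100)   # 0.994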

Bonferroni limit – to control FWER

Controlling FWER

We want to make sure that

FWER ≤ α′.

Then the FWER is controlled at level α′.

Bonferroni limit

α′ = α / m

P_m = 1 − (1 − α/m)^m ≈ α

[Plot: FWER, P_m = 1 − (1 − α)^m, as a function of the number of tests m, with and without the Bonferroni limit; α = 0.05]
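A quick numerical check of the approximation above (a sketch; the variable names are mine):

# with the Bonferroni limit alpha' = alpha/m the FWER stays at, or just below, alpha
m <- 20
alpha <- 0.05
alpha_prime <- alpha / m      # 0.0025
1 - (1 - alpha_prime)^m       # about 0.049, i.e. approximately alpha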

Test data (1000 independent experiments)

Negatives: μ₀ = 20 g, m₀ = 970

Positives: μ₁ = 40 g, m₁ = 30

Random samples, size n = 5, from two normal distributions

One sample t-test, H0: 𝜇 = 20 g

970 negatives, 30 positives

No correction:

                 H0 true    H0 false    Total
Significant      56         30          86
Not significant  914        0           914
Total            970        30          1000
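Data of this kind can be generated and tested along the following lines. This is only a sketch, not the script used for the slides: the standard deviation of the distributions is not stated, so σ = 5 g below is my assumption, and the exact counts will differ between random runs.

set.seed(1)
m0 <- 970; m1 <- 30          # negatives and positives
n  <- 5                      # sample size
mu0 <- 20; mu1 <- 40         # population means (g)
sigma <- 5                   # assumed standard deviation (not given in the slides)

# one experiment per element of mu: first the m0 null samples, then the m1 alternative samples
mu <- c(rep(mu0, m0), rep(mu1, m1))
p <- sapply(mu, function(m) t.test(rnorm(n, mean = m, sd = sigma), mu = mu0)$p.value)

# confusion counts at alpha = 0.05, no correction
sig <- p < 0.05
table(truth = rep(c("H0 true", "H0 false"), c(m0, m1)), significant = sig)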

One sample t-test, H0: 𝜇 = 20 g

No correction:

                 H0 true     H0 false    Total
Significant      FP = 56     TP = 30     86
Not significant  TN = 914    FN = 0      914
Total            970         30          1000

Family-wise error rate: FWER = Pr(FP ≥ 1)

False positive rate: FPR = FP / m₀ = FP / (FP + TN) = 56 / (56 + 914) = 0.058

False negative rate: FNR = FN / m₁ = FN / (FN + TP) = 0 / (0 + 30) = 0
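The same arithmetic in R, for completeness:

# rates from the "no correction" table above
FP <- 56; TN <- 914; FN <- 0; TP <- 30
FP / (FP + TN)    # FPR = 0.058
FN / (FN + TP)    # FNR = 0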

Bonferroni limit

             No correction    Bonferroni
α            0.05             5×10⁻⁵
FPR          0.058            0
FNR          0                0.87

No correction:

                 H0 true    H0 false    Total
Significant      56         30          86
Not significant  914        0           914
Total            970        30          1000

Bonferroni:

                 H0 true    H0 false    Total
Significant      0          4           4
Not significant  970        26          996
Total            970        30          1000

Holm-Bonferroni method

Sort the p-values:

p(1), p(2), …, p(m)

Reject (1) if p(1) ≤ α/m
Reject (2) if p(2) ≤ α/(m−1)
Reject (3) if p(3) ≤ α/(m−2)
...
Stop when p(k) > α/(m−k+1)

k    p       α      α/m     α/(m−k+1)
1    0.003   0.05   0.01    0.01
2    0.005   0.05   0.01    0.0125
3    0.012   0.05   0.01    0.017
4    0.04    0.05   0.01    0.025
5    0.058   0.05   0.01    0.05

The Holm-Bonferroni method controls FWER.
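On the five p-values in this table the step-down rule rejects the first three hypotheses (0.012 ≤ 0.017, but 0.04 > 0.025 stops it). A minimal sketch of the procedure; in practice p.adjust(p, "holm") does this for you, and holm_reject is my own helper name:

# Holm-Bonferroni: compare the k-th smallest p-value with alpha / (m - k + 1)
holm_reject <- function(p, alpha = 0.05) {
  m <- length(p)
  o <- order(p)
  thresh <- alpha / (m - seq_len(m) + 1)             # alpha/m, alpha/(m-1), ..., alpha
  below <- p[o] <= thresh
  k <- if (all(below)) m else which(!below)[1] - 1   # stop at the first p-value over its limit
  rejected <- rep(FALSE, m)
  if (k > 0) rejected[o[seq_len(k)]] <- TRUE
  rejected
}

p <- c(0.003, 0.005, 0.012, 0.04, 0.058)
holm_reject(p)                 # TRUE TRUE TRUE FALSE FALSE
p.adjust(p, "holm") <= 0.05    # the built-in equivalent gives the same rejections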

Holm-Bonferroni method

             No correction    Bonferroni    Holm-Bonferroni
α            0.05             5×10⁻⁵        5×10⁻⁵
FPR          0.058            0             0
FNR          0                0.87          0.87

Holm-Bonferroni:

                 H0 true    H0 false    Total
Significant      0          4           4
Not significant  970        26          996
Total            970        30          1000

[Plot: the 30 smallest p-values (out of 1000) against the Bonferroni and Holm-Bonferroni limits]

False discovery rate

FDR = FP / D

False discovery rate

False positive rate

FPR = FP / m₀ = FP / (FP + TN)

The fraction of truly non-significant events we falsely marked as significant:

FPR = 56 / 970 = 0.058

False discovery rate

FDR = FP / D = FP / (FP + TP)

The fraction of discoveries that are false:

FDR = 56 / 86 = 0.65

No correction:

                 H0 true     H0 false    Total
Significant      FP = 56     TP = 30     86
Not significant  TN = 914    FN = 0      914
Total            970         30          1000

Benjamini-Hochberg method

Sort the p-values:

p(1), p(2), …, p(m)

Find the largest k such that

p(k) ≤ (k/m) α

Reject all null hypotheses for i = 1, …, k.

k    p       α      α/m     α/(m−k+1)   (k/m)α
1    0.003   0.05   0.01    0.01        0.01
2    0.005   0.05   0.01    0.0125      0.02
3    0.012   0.05   0.01    0.017       0.03
4    0.038   0.05   0.01    0.025       0.04
5    0.058   0.05   0.01    0.05        0.05

The Benjamini-Hochberg method controls FDR.
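Here the largest k with p(k) ≤ (k/m)α is k = 4, so the first four hypotheses are rejected. A sketch of the rule (bh_reject is my own name; p.adjust(p, "BH") is the standard way to do this):

# Benjamini-Hochberg: find the largest k with p(k) <= (k/m) * alpha, reject 1..k
bh_reject <- function(p, alpha = 0.05) {
  m <- length(p)
  o <- order(p)
  ok <- p[o] <= seq_len(m) / m * alpha      # compare p(k) with (k/m) * alpha
  k <- if (any(ok)) max(which(ok)) else 0
  rejected <- rep(FALSE, m)
  if (k > 0) rejected[o[seq_len(k)]] <- TRUE
  rejected
}

p <- c(0.003, 0.005, 0.012, 0.038, 0.058)
bh_reject(p)                  # TRUE TRUE TRUE TRUE FALSE
p.adjust(p, "BH") <= 0.05     # the same four rejections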

Benjamini-Hochberg method

             No correction    Bonferroni    Holm-Bonferroni    Benjamini-Hochberg
α            0.05             5×10⁻⁵        3.7×10⁻⁵           0.0011
FPR          0.058            0             0                  0.0021
FNR          0                0.87          0.87               0.30
FDR          0.65             0             0                  0.087

Benjamini-Hochberg:

                 H0 true    H0 false    Total
Significant      2          21          23
Not significant  968        9           977
Total            970        30          1000

[Plot: the smallest p-values against the Bonferroni, Holm-Bonferroni and Benjamini-Hochberg limits]

Controlling FWER and FDR

Benjamini-Hochberg controls FDR

FDR = FP / (FP + TP)

FDR is a random variable.

Controlling FDR guarantees

E[FDR] ≤ α

Holm-Bonferroni controls FWER

FWER = Pr(FP ≥ 1)

Controlling FWER guarantees

FWER ≤ α′

Benjamini-Hochberg procedure controls FDR

Controlling FDR

E[FDR] ≤ α

E[FDR] can be approximated by the mean over many experiments.

Bootstrap: generate the test data 10,000 times, perform 1000 t-tests for each set and find the FDR for the BH procedure.
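A scaled-down version of that check (a sketch: far fewer repeats than the 10,000 in the slide, and the same assumed σ = 5 g as in the earlier sketch, so the number is only indicative):

set.seed(1)
alpha <- 0.05; n <- 5; sigma <- 5          # sigma is my assumption, as before
m0 <- 970; m1 <- 30

fdr_one_run <- function() {
  mu <- c(rep(20, m0), rep(40, m1))
  p  <- sapply(mu, function(m) t.test(rnorm(n, mean = m, sd = sigma), mu = 20)$p.value)
  sig <- p.adjust(p, "BH") <= alpha
  fp <- sum(sig[1:m0]); d <- sum(sig)
  if (d == 0) 0 else fp / d                # FDR of a single experiment (0 if no discoveries)
}

mean(replicate(200, fdr_one_run()))        # approximates E[FDR]; should come out at or below 0.05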

Adjusted p-values

P-values can be “adjusted” so that they are compared directly with α rather than with (k/m)α.

Problem: an adjusted p-value does not express any probability.

Despite their popularity, I recommend against using adjusted p-values.

[Plot: raw p-values against the (k/m)α limit and adjusted p-values against the α limit]

How to do this in R

# Read generated data
> d = read.table("http://tiny.cc/two_hypotheses", header=TRUE)
> p = d$p

# Holm-Bonferroni procedure
> p.adj = p.adjust(p, "holm")
> p[which(p.adj < 0.05)]
[1] 1.476263e-05 2.662440e-05 3.029839e-05

# Benjamini-Hochberg procedure
> p.adj = p.adjust(p, "BH")
> p[which(p.adj < 0.05)]
 [1] 1.038835e-03 6.670798e-04 1.050547e-03 1.476263e-05 5.271367e-04
 [6] 3.503370e-04 9.664789e-04 1.068863e-03 7.995860e-04 5.404476e-04
[11] 9.681321e-04 1.580069e-04 1.732747e-04 3.159954e-04 2.662440e-05
[16] 4.709732e-04 1.517964e-04 2.873971e-04 3.258726e-04 4.087615e-04
[21] 3.029839e-05 9.320438e-04 1.713309e-04 2.863402e-04 4.082322e-04


Estimating false discovery rate

Control and estimate

Controlling FDR

1. Fix an acceptable FDR limit, α, beforehand.
2. Find a thresholding rule, so that

E[FDR] ≤ α

Estimating FDR

For each p-value, pᵢ, form a point estimate of the FDR,

F̂DR(pᵢ)

P-value distribution

[Histograms of p-values: 100% null vs. data set 2: 80% null, 20% alternative]

P-value distribution

[Examples of p-value histograms: “Good” and “Bad!”]

Definition of π₀

80% null, 20% alternative. π₀ is the proportion of null tests:

π₀ = (# non-significant tests) / (# all tests)

[P-value histogram with the H0 and H1 components and the level π₀ marked]

Storey method

Storey, J.D., 2002, J. R. Statist. Soc. B, 64, 479

Estimate π₀

Use the histogram for p > λ to estimate π̂₀(λ).

Do it for all 0 < λ < 1 and then find the best π̂₀.

[Plot: π̂₀(λ) as a function of λ]
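The estimate behind this step is π̂₀(λ) = #{pᵢ > λ} / (m(1 − λ)). A sketch on a toy mixture of p-values (the qvalue() function used below automates this, including the choice of λ):

# Storey's estimate of the proportion of null tests for a given lambda
pi0_hat <- function(p, lambda = 0.5) sum(p > lambda) / (length(p) * (1 - lambda))

# toy mixture: 80% null (uniform p-values), 20% alternative (p-values piled up near zero)
set.seed(1)
p <- c(runif(800), rbeta(200, 1, 50))
sapply(c(0.25, 0.5, 0.75), function(l) pi0_hat(p, l))   # each close to the true 0.8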

Point estimate of FDR

80% null, 20% alternative. Point estimate, F̂DR(t).

Arbitrary limit t: every pᵢ < t is significant. The number of significant tests is

R(t) = #{pᵢ < t}

The number of false positives is

FP(t) = t π₀ m

Hence,

F̂DR(t) = t π₀ m / R(t)

[P-value histogram with the limit t: true positives and false positives below t, and the null level π₀ marked]
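With π̂₀ in hand, the point estimate is a one-liner. A sketch on the same toy mixture as above (function and variable names are mine):

# point estimate of the FDR at threshold t:  FDR(t) = t * pi0 * m / R(t)
fdr_hat <- function(t, p, pi0) t * pi0 * length(p) / sum(p < t)

set.seed(1)
p <- c(runif(800), rbeta(200, 1, 50))   # 80% null, 20% alternative
fdr_hat(0.01, p, pi0 = 0.8)             # estimated FDR when every p < 0.01 is called significant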

Storey method

Point estimate of FDR. This is the so-called q-value:

q(pᵢ) = min_{t ≥ pᵢ} F̂DR(t)

If F̂DR is monotonic,

q(pᵢ) = F̂DR(pᵢ)

How to do this in R

> library(qvalue)

# Read data set 1
> pvalues = read.table('http://tiny.cc/multi_FDR', header=TRUE)
> p = pvalues$p

# Benjamini-Hochberg limit
> p.adj = p.adjust(p, method='BH')
> sum(p.adj <= 0.05)
[1] 216

# q-values
> qobj = qvalue(p)
> q = qobj$qv
> summary(qobj)

pi0: 0.8189884

Cumulative number of significant calls:
           <1e-04  <0.001  <0.01  <0.025  <0.05   <0.1     <1
p-value        40     202    611     955   1373   2138  10000
q-value         0       1      1      96    276    483  10000
local FDR       0       1      3      50    141    278   5915

> plot(qobj)
> hist(qobj)


Interpretation of q-value


No. ID p-value q-value

... ... ... ...

100 9249 0.000328 0.0266

101 8157 0.000328 0.0266

102 8228 0.000335 0.0269

103 8291 0.000338 0.0269

104 8254 0.000347 0.0272

105 8875 0.000348 0.0272

106 8055 0.000353 0.0273

107 8235 0.000375 0.0284

108 8148 0.000376 0.0284

109 8236 0.000381 0.0284

110 8040 0.000382 0.0284

... ... ... ...

There are 106 tests with q ≤ 0.0273.

Expect 2.73% false positives among these tests.

Expect ~3 false positives if you set a limit of q ≤ 0.0273 or p ≤ 0.000353.

The q-value tells you how many false positives you should expect after choosing a significance limit.
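The arithmetic behind the “~3 false positives”, as a one-line sketch:

# expected number of false positives among the tests accepted at a q-value cut-off
q_cut <- 0.0273
n_selected <- 106
q_cut * n_selected    # about 2.9, i.e. roughly 3 expected false positives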

Q-values vs Benjamini-Hochberg

[Plot: q-value vs. BH-adjusted p-value for different values of π̂₀]

When π̂₀ = 1, both methods give the same result.

For the same FDR, Storey’s method provides more significant p-values.

Hence, it is more powerful, especially for small π̂₀.

But this depends on how good the estimate of π̂₀ is.

π̂₀ is the estimate of the proportion of null (non-significant) tests.

Which multiple-test correction should I use?

False positive
“Discover” an effect where there is no effect
Can be tested in follow-up experiments
Not hugely important in small samples
Impossible to manage in large samples

False negative
Missed discovery
Once you’ve missed it, it’s gone

[Diagram: the false-positive vs. false-negative trade-off for no correction, Bonferroni, Benjamini-Hochberg and Storey]

Multiple test procedures: summary

Method: No correction
Controls: FPR
Advantages: false negatives not inflated
Disadvantages: can result in FP ≫ TP
Recommendation: small samples, when the cost of FN is high

Method: Bonferroni
Controls: FWER
Advantages: none
Disadvantages: lots of false negatives
Recommendation: do not use

Method: Holm-Bonferroni
Controls: FWER
Advantages: slightly better than Bonferroni
Disadvantages: lots of false negatives
Recommendation: appropriate only when you want to guard against any false positives

Method: Benjamini-Hochberg
Controls: FDR
Advantages: good trade-off between false positives and negatives
Disadvantages: on average, α of your positives will be false
Recommendation: better in large samples

Method: Storey
Controls: -
Advantages: more powerful than BH, in particular for small π̂₀
Disadvantages: depends on a good estimate of π̂₀
Recommendation: the best method, gives more insight into FDR

Hand-outs available at http://tiny.cc/statlec

Benjamini-Hochberg method

Theoretical null distribution: uniform p-values,

p(k) = k/m

[Plot: sorted p-values from the generated null data against the theoretical uniform expectation]

Benjamini-Hochberg method

[Plots: sorted p-values for the null data (1000 H0) and the test data (970 H0, 20 H1), with the k/m line (100% FDR) and the (k/m)α line (5% FDR)]

Benjamini-Hochberg method

[Plots: the same comparison on the p-value scale for the null data (1000 H0) and the test data (970 H0, 20 H1), with the k/m and (k/m)α lines]