
P-values and statistical tests
7. Multiple test corrections

Hand-outs available at http://is.gd/statlec

Marek Gierliński, Division of Computational Biology


Let's perform a test 𝑚 times

                 H0 true    H0 false    Total
Significant      FP         TP          D
Not significant  TN         FN          m − D
Total            m₀         m₁          m

FP = false positives, TP = true positives, TN = true negatives, FN = false negatives;
m = number of tests, D = number of discoveries.


Family-wise error rate

𝐹𝑊𝐸𝑅 = Pr(𝐹𝑃 ≥ 1)

Probabilities of independent events multiply

Toss two coins, each landing heads with probability P(H) = 1/2.

P(H and H) = P(H) × P(H) = 1/4

The probability of at least one of two independent events, each with probability p, is 1 − (1 − p)².

Toss two coins again. What is P(H or H), the probability of at least one head?

P(T) = 1 − P(H)
P(T and T) = P(T) × P(T) = (1 − P(H))² = 1/4
P(H or H) = 1 − P(T and T)
P(H or H) = 1 − (1 − P(H))² = 1 − (1 − 1/2)² = 3/4
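As a quick check of these two rules in R (a minimal sketch, nothing more):

# two tosses of a fair coin
p_head <- 0.5
p_head * p_head       # P(H and H) = 0.25
1 - (1 - p_head)^2    # P(H or H)  = 0.75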

False positive probability

H0: no effect. Set α = 0.05.

One test
Probability of having a false positive: P₁ = α

Two independent tests
Probability of having at least one false positive in either test: P₂ = 1 − (1 − α)²

m independent tests
Probability of having at least one false positive in any test: P_m = 1 − (1 − α)^m

Family-wise error rate (FWER)

Probability of having at least one false positive among m tests; α = 0.05:

P_m = 1 − (1 − α)^m

[Plot: FWER as a function of the number of tests m, for α = 0.05]

Jelly beans test, m = 20, α = 0.05:

P₂₀ = 1 − (1 − 0.05)²⁰ = 0.64
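The jelly-beans number is easy to verify directly in R (a minimal sketch; the function name fwer is mine, not from the lecture):

# probability of at least one false positive among m independent tests
fwer <- function(m, alpha = 0.05) 1 - (1 - alpha)^m

fwer(1)     # 0.05
fwer(20)    # 0.64 - the jelly-beans example
fwer(100)   # 0.994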

Bonferroni limit – to control FWER

Controlling FWER

We want to make sure that

FWER ≤ α′.

Then the FWER is controlled at level α′.

Bonferroni limit

α′ = α / m

P_m = 1 − (1 − α/m)^m ≈ α

[Plot: FWER, P_m = 1 − (1 − α)^m, as a function of the number of tests m, with and without the Bonferroni limit; α = 0.05]
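A quick numerical check of the approximation above (a sketch; the variable names are mine):

# with the Bonferroni limit alpha' = alpha/m the FWER stays at, or just below, alpha
m <- 20
alpha <- 0.05
alpha_prime <- alpha / m      # 0.0025
1 - (1 - alpha_prime)^m       # about 0.049, i.e. approximately alpha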

Test data (1000 independent experiments)

Negatives: μ₀ = 20 g, m₀ = 970

Positives: μ₁ = 40 g, m₁ = 30

Random samples, size n = 5, from two normal distributions

One sample t-test, H0: 𝜇 = 20 g

970 negatives, 30 positives

No correction:

                 H0 true    H0 false    Total
Significant      56         30          86
Not significant  914        0           914
Total            970        30          1000
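Data of this kind can be generated and tested along the following lines. This is only a sketch, not the script used for the slides: the standard deviation of the distributions is not stated, so σ = 5 g below is my assumption, and the exact counts will differ between random runs.

set.seed(1)
m0 <- 970; m1 <- 30          # negatives and positives
n  <- 5                      # sample size
mu0 <- 20; mu1 <- 40         # population means (g)
sigma <- 5                   # assumed standard deviation (not given in the slides)

# one experiment per element of mu: first the m0 null samples, then the m1 alternative samples
mu <- c(rep(mu0, m0), rep(mu1, m1))
p <- sapply(mu, function(m) t.test(rnorm(n, mean = m, sd = sigma), mu = mu0)$p.value)

# confusion counts at alpha = 0.05, no correction
sig <- p < 0.05
table(truth = rep(c("H0 true", "H0 false"), c(m0, m1)), significant = sig)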

One sample t-test, H0: 𝜇 = 20 g

No correction:

                 H0 true     H0 false    Total
Significant      FP = 56     TP = 30     86
Not significant  TN = 914    FN = 0      914
Total            970         30          1000

Family-wise error rate: FWER = Pr(FP ≥ 1)

False positive rate: FPR = FP / m₀ = FP / (FP + TN) = 56 / (56 + 914) = 0.058

False negative rate: FNR = FN / m₁ = FN / (FN + TP) = 0 / (0 + 30) = 0
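The same arithmetic in R, for completeness:

# rates from the "no correction" table above
FP <- 56; TN <- 914; FN <- 0; TP <- 30
FP / (FP + TN)    # FPR = 0.058
FN / (FN + TP)    # FNR = 0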

Bonferroni limit

             No correction    Bonferroni
α            0.05             5×10⁻⁵
FPR          0.058            0
FNR          0                0.87

No correction:

                 H0 true    H0 false    Total
Significant      56         30          86
Not significant  914        0           914
Total            970        30          1000

Bonferroni:

                 H0 true    H0 false    Total
Significant      0          4           4
Not significant  970        26          996
Total            970        30          1000

Holm-Bonferroni method

Sort the p-values:

p(1), p(2), …, p(m)

Reject (1) if p(1) ≤ α/m
Reject (2) if p(2) ≤ α/(m−1)
Reject (3) if p(3) ≤ α/(m−2)
...
Stop when p(k) > α/(m−k+1)

k    p       α      α/m     α/(m−k+1)
1    0.003   0.05   0.01    0.01
2    0.005   0.05   0.01    0.0125
3    0.012   0.05   0.01    0.017
4    0.04    0.05   0.01    0.025
5    0.058   0.05   0.01    0.05

The Holm-Bonferroni method controls FWER.
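On the five p-values in this table the step-down rule rejects the first three hypotheses (0.012 ≤ 0.017, but 0.04 > 0.025 stops it). A minimal sketch of the procedure; in practice p.adjust(p, "holm") does this for you, and holm_reject is my own helper name:

# Holm-Bonferroni: compare the k-th smallest p-value with alpha / (m - k + 1)
holm_reject <- function(p, alpha = 0.05) {
  m <- length(p)
  o <- order(p)
  thresh <- alpha / (m - seq_len(m) + 1)             # alpha/m, alpha/(m-1), ..., alpha
  below <- p[o] <= thresh
  k <- if (all(below)) m else which(!below)[1] - 1   # stop at the first p-value over its limit
  rejected <- rep(FALSE, m)
  if (k > 0) rejected[o[seq_len(k)]] <- TRUE
  rejected
}

p <- c(0.003, 0.005, 0.012, 0.04, 0.058)
holm_reject(p)                 # TRUE TRUE TRUE FALSE FALSE
p.adjust(p, "holm") <= 0.05    # the built-in equivalent gives the same rejections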

Holm-Bonferroni method

             No correction    Bonferroni    Holm-Bonferroni
α            0.05             5×10⁻⁵        5×10⁻⁵
FPR          0.058            0             0
FNR          0                0.87          0.87

Holm-Bonferroni:

                 H0 true    H0 false    Total
Significant      0          4           4
Not significant  970        26          996
Total            970        30          1000

[Plot: the 30 smallest p-values (out of 1000) against the Bonferroni and Holm-Bonferroni limits]

False discovery rate

FDR = FP / D

False discovery rate

False positive rate

FPR = FP / m₀ = FP / (FP + TN)

The fraction of truly non-significant events we falsely marked as significant:

FPR = 56 / 970 = 0.058

False discovery rate

FDR = FP / D = FP / (FP + TP)

The fraction of discoveries that are false:

FDR = 56 / 86 = 0.65

No correction:

                 H0 true     H0 false    Total
Significant      FP = 56     TP = 30     86
Not significant  TN = 914    FN = 0      914
Total            970         30          1000

Benjamini-Hochberg method

Sort the p-values:

p(1), p(2), …, p(m)

Find the largest k such that

p(k) ≤ (k/m) α

Reject all null hypotheses for i = 1, …, k.

k    p       α      α/m     α/(m−k+1)   (k/m)α
1    0.003   0.05   0.01    0.01        0.01
2    0.005   0.05   0.01    0.0125      0.02
3    0.012   0.05   0.01    0.017       0.03
4    0.038   0.05   0.01    0.025       0.04
5    0.058   0.05   0.01    0.05        0.05

The Benjamini-Hochberg method controls FDR.
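Here the largest k with p(k) ≤ (k/m)α is k = 4, so the first four hypotheses are rejected. A sketch of the rule (bh_reject is my own name; p.adjust(p, "BH") is the standard way to do this):

# Benjamini-Hochberg: find the largest k with p(k) <= (k/m) * alpha, reject 1..k
bh_reject <- function(p, alpha = 0.05) {
  m <- length(p)
  o <- order(p)
  ok <- p[o] <= seq_len(m) / m * alpha      # compare p(k) with (k/m) * alpha
  k <- if (any(ok)) max(which(ok)) else 0
  rejected <- rep(FALSE, m)
  if (k > 0) rejected[o[seq_len(k)]] <- TRUE
  rejected
}

p <- c(0.003, 0.005, 0.012, 0.038, 0.058)
bh_reject(p)                  # TRUE TRUE TRUE TRUE FALSE
p.adjust(p, "BH") <= 0.05     # the same four rejections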

Benjamini-Hochberg method

             No correction    Bonferroni    Holm-Bonferroni    Benjamini-Hochberg
α            0.05             5×10⁻⁵        3.7×10⁻⁵           0.0011
FPR          0.058            0             0                  0.0021
FNR          0                0.87          0.87               0.30
FDR          0.65             0             0                  0.087

Benjamini-Hochberg:

                 H0 true    H0 false    Total
Significant      2          21          23
Not significant  968        9           977
Total            970        30          1000

[Plot: the smallest p-values against the Bonferroni, Holm-Bonferroni and Benjamini-Hochberg limits]

Controlling FWER and FDR

Benjamini-Hochberg controls FDR

FDR = FP / (FP + TP)

FDR is a random variable.

Controlling FDR guarantees

E[FDR] ≤ α

Holm-Bonferroni controls FWER

FWER = Pr(FP ≥ 1)

Controlling FWER guarantees

FWER ≤ α′

Benjamini-Hochberg procedure controls FDR

Controlling FDR

E[FDR] ≤ α

E[FDR] can be approximated by the mean over many experiments.

Bootstrap: generate the test data 10,000 times, perform 1000 t-tests for each set and find the FDR for the BH procedure.
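A scaled-down version of that check (a sketch: far fewer repeats than the 10,000 in the slide, and the same assumed σ = 5 g as in the earlier sketch, so the number is only indicative):

set.seed(1)
alpha <- 0.05; n <- 5; sigma <- 5          # sigma is my assumption, as before
m0 <- 970; m1 <- 30

fdr_one_run <- function() {
  mu <- c(rep(20, m0), rep(40, m1))
  p  <- sapply(mu, function(m) t.test(rnorm(n, mean = m, sd = sigma), mu = 20)$p.value)
  sig <- p.adjust(p, "BH") <= alpha
  fp <- sum(sig[1:m0]); d <- sum(sig)
  if (d == 0) 0 else fp / d                # FDR of a single experiment (0 if no discoveries)
}

mean(replicate(200, fdr_one_run()))        # approximates E[FDR]; should come out at or below 0.05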

Adjusted p-values

P-values can be “adjusted” so that they are compared directly with α rather than with (k/m)α.

Problem: an adjusted p-value does not express any probability.

Despite their popularity, I recommend against using adjusted p-values.

[Plot: raw p-values against the (k/m)α limit and adjusted p-values against the α limit]

How to do this in R

# Read generated data
> d = read.table("http://tiny.cc/two_hypotheses", header=TRUE)
> p = d$p

# Holm-Bonferroni procedure
> p.adj = p.adjust(p, "holm")
> p[which(p.adj < 0.05)]
[1] 1.476263e-05 2.662440e-05 3.029839e-05

# Benjamini-Hochberg procedure
> p.adj = p.adjust(p, "BH")
> p[which(p.adj < 0.05)]
 [1] 1.038835e-03 6.670798e-04 1.050547e-03 1.476263e-05 5.271367e-04
 [6] 3.503370e-04 9.664789e-04 1.068863e-03 7.995860e-04 5.404476e-04
[11] 9.681321e-04 1.580069e-04 1.732747e-04 3.159954e-04 2.662440e-05
[16] 4.709732e-04 1.517964e-04 2.873971e-04 3.258726e-04 4.087615e-04
[21] 3.029839e-05 9.320438e-04 1.713309e-04 2.863402e-04 4.082322e-04


Estimating false discovery rate

Control and estimate

Controlling FDR

1. Fix an acceptable FDR limit, α, beforehand.
2. Find a thresholding rule, so that

E[FDR] ≤ α

Estimating FDR

For each p-value, pᵢ, form a point estimate of the FDR,

F̂DR(pᵢ)

P-value distribution

[Histograms of p-values: 100% null vs. data set 2: 80% null, 20% alternative]

P-value distribution

[Examples of p-value histograms: “Good” and “Bad!”]

Definition of π₀

80% null, 20% alternative. π₀ is the proportion of null tests:

π₀ = (# non-significant tests) / (# all tests)

[P-value histogram with the H0 and H1 components and the level π₀ marked]

Storey method

Storey, J.D., 2002, J. R. Statist. Soc. B, 64, 479

Estimate π₀

Use the histogram for p > λ to estimate π̂₀(λ).

Do it for all 0 < λ < 1 and then find the best π̂₀.

[Plot: π̂₀(λ) as a function of λ]
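The estimate behind this step is π̂₀(λ) = #{pᵢ > λ} / (m(1 − λ)). A sketch on a toy mixture of p-values (the qvalue() function used below automates this, including the choice of λ):

# Storey's estimate of the proportion of null tests for a given lambda
pi0_hat <- function(p, lambda = 0.5) sum(p > lambda) / (length(p) * (1 - lambda))

# toy mixture: 80% null (uniform p-values), 20% alternative (p-values piled up near zero)
set.seed(1)
p <- c(runif(800), rbeta(200, 1, 50))
sapply(c(0.25, 0.5, 0.75), function(l) pi0_hat(p, l))   # each close to the true 0.8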

Point estimate of FDR

80% null, 20% alternative. Point estimate, F̂DR(t).

Arbitrary limit t: every pᵢ < t is significant. The number of significant tests is

R(t) = #{pᵢ < t}

The number of false positives is

FP(t) = t π₀ m

Hence,

F̂DR(t) = t π₀ m / R(t)

[P-value histogram with the limit t: true positives and false positives below t, and the null level π₀ marked]
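With π̂₀ in hand, the point estimate is a one-liner. A sketch on the same toy mixture as above (function and variable names are mine):

# point estimate of the FDR at threshold t:  FDR(t) = t * pi0 * m / R(t)
fdr_hat <- function(t, p, pi0) t * pi0 * length(p) / sum(p < t)

set.seed(1)
p <- c(runif(800), rbeta(200, 1, 50))   # 80% null, 20% alternative
fdr_hat(0.01, p, pi0 = 0.8)             # estimated FDR when every p < 0.01 is called significant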

Storey method

Point estimate of FDR. This is the so-called q-value:

q(pᵢ) = min_{t ≥ pᵢ} F̂DR(t)

If F̂DR is monotonic,

q(pᵢ) = F̂DR(pᵢ)

How to do this in R

> library(qvalue)

# Read data set 1
> pvalues = read.table('http://tiny.cc/multi_FDR', header=TRUE)
> p = pvalues$p

# Benjamini-Hochberg limit
> p.adj = p.adjust(p, method='BH')
> sum(p.adj <= 0.05)
[1] 216

# q-values
> qobj = qvalue(p)
> q = qobj$qv
> summary(qobj)

pi0: 0.8189884

Cumulative number of significant calls:
           <1e-04  <0.001  <0.01  <0.025  <0.05   <0.1     <1
p-value        40     202    611     955   1373   2138  10000
q-value         0       1      1      96    276    483  10000
local FDR       0       1      3      50    141    278   5915

> plot(qobj)
> hist(qobj)


Interpretation of q-value


No. ID p-value q-value

... ... ... ...

100 9249 0.000328 0.0266

101 8157 0.000328 0.0266

102 8228 0.000335 0.0269

103 8291 0.000338 0.0269

104 8254 0.000347 0.0272

105 8875 0.000348 0.0272

106 8055 0.000353 0.0273

107 8235 0.000375 0.0284

108 8148 0.000376 0.0284

109 8236 0.000381 0.0284

110 8040 0.000382 0.0284

... ... ... ...

There are 106 tests with q ≤ 0.0273.

Expect 2.73% false positives among these tests.

Expect ~3 false positives if you set a limit of q ≤ 0.0273 or p ≤ 0.000353.

The q-value tells you how many false positives you should expect after choosing a significance limit.
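The arithmetic behind the “~3 false positives”, as a one-line sketch:

# expected number of false positives among the tests accepted at a q-value cut-off
q_cut <- 0.0273
n_selected <- 106
q_cut * n_selected    # about 2.9, i.e. roughly 3 expected false positives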

Q-values vs Benjamini-Hochberg

[Plot: q-value vs. BH-adjusted p-value for different values of π̂₀]

When π̂₀ = 1, both methods give the same result.

For the same FDR, Storey’s method provides more significant p-values.

Hence, it is more powerful, especially for small π̂₀.

But this depends on how good the estimate of π̂₀ is.

π̂₀ is the estimate of the proportion of null (non-significant) tests.

Which multiple-test correction should I use?

False positive
“Discover” an effect where there is no effect
Can be tested in follow-up experiments
Not hugely important in small samples
Impossible to manage in large samples

False negative
Missed discovery
Once you’ve missed it, it’s gone

[Diagram: the false-positive vs. false-negative trade-off for no correction, Bonferroni, Benjamini-Hochberg and Storey]

Multiple test procedures: summary

Method: No correction
Controls: FPR
Advantages: false negatives not inflated
Disadvantages: can result in FP ≫ TP
Recommendation: small samples, when the cost of FN is high

Method: Bonferroni
Controls: FWER
Advantages: none
Disadvantages: lots of false negatives
Recommendation: do not use

Method: Holm-Bonferroni
Controls: FWER
Advantages: slightly better than Bonferroni
Disadvantages: lots of false negatives
Recommendation: appropriate only when you want to guard against any false positives

Method: Benjamini-Hochberg
Controls: FDR
Advantages: good trade-off between false positives and negatives
Disadvantages: on average, α of your positives will be false
Recommendation: better in large samples

Method: Storey
Controls: -
Advantages: more powerful than BH, in particular for small π̂₀
Disadvantages: depends on a good estimate of π̂₀
Recommendation: the best method, gives more insight into FDR

Hand-outs available at http://tiny.cc/statlec

Benjamini-Hochberg method

Theoretical null distribution: uniform p-values,

p(k) = k/m

[Plot: sorted p-values from the generated null data against the theoretical uniform expectation]

Benjamini-Hochberg method

[Plots: sorted p-values for the null data (1000 H0) and the test data (970 H0, 20 H1), with the k/m line (100% FDR) and the (k/m)α line (5% FDR)]

Benjamini-Hochberg method

[Plots: the same comparison on the p-value scale for the null data (1000 H0) and the test data (970 H0, 20 H1), with the k/m and (k/m)α lines]