Why multiple tests are a problem? Rafael A. Irizarry.

Why multiple tests are a problem?

Rafael A. Irizarry

Other names

• Multiple comparisons

• Data snooping

• Others?

References

• H. Scheffe (1953), “A method for judging all contrasts in the analysis of variance”, Biometrika 40:87-104

• D.B. Duncan (1965), “A Bayesian Approach to multiple comparisons” Technometrics 7:171-222.

• J.W. Tukey (1953), “The problem of multiple comparisons”, reprinted in CWJWT Vol. VIII (1994)

• R.G. Miller, Simultaneous Statistical Inference, 2nd ed. (Springer 1981)

Thanks to Yoav Benjamini

Benjamini and Hochberg (1995), “Controlling the false discovery rate: a practical and powerful approach to multiple testing”, J. R. Stat. Soc. Ser. B

Example

E. Giovannucci, A. Ascherio, E. Rimm, M. Stampfer, G. Colditz, W. Willett: ‘‘Intake of Carotenoids and Retinol in Relation to Risk of Prostate Cancer’’, Journal of the National Cancer Institute 87(23):1767--1776 (6 Dec 1995).

‘‘Using responses to a validated, semiquantitative food frequency questionnaire mailed to participants in the Health Professionals Follow-up Study in 1986, we assessed dietary intake for a 1-year period for a cohort of 47,894 eligible subjects initially free of diagnosed cancer.... We calculate the relative risk (RR) for each of the upper categories of intake of a specific food or nutrient by dividing the incidence of prostate cancer among men in each of these categories by the rate among men in the lowest intake level....

‘‘Of 46 vegetables and fruits or related products, four were significantly associated with lower prostate cancer risk; of the four --- tomato sauce (P for trend = 0.001), tomatoes (P for trend = 0.03), and pizza (P for trend = 0.05), but not strawberries --- were primary sources of lycopene.’’

BUT the Methods section one page later states:

‘‘For each of 131 food and beverage items listed ...’’ And the (presumably strongest) carotenoids and p-values are listed in Table 2 (p. 1770):

Tomato sauce    Tomatoes    Tomato juice    Pizza
0.001           0.03        0.67            0.05

‘‘Our findings ... suggest that tomato-based foods may be especially beneficial regarding prostate cancer risk.’’

What is a p-value again?

When no food or nutrient actually protects (every null hypothesis is true), we expect about

131 x 0.05 ≈ 7

foods/nutrients to have p-values < 0.05
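The arithmetic above can be checked with a short Python sketch. The function names are hypothetical, and the second function additionally assumes the 131 tests are independent, which the slides do not claim; it is only there to show how quickly "at least one false positive" becomes almost certain.

```python
def expected_false_positives(m, alpha):
    """Expected number of p-values below alpha when all m nulls are true."""
    return m * alpha


def prob_at_least_one(m, alpha):
    """Chance of one or more false positives, ASSUMING independent tests."""
    return 1.0 - (1.0 - alpha) ** m


# 131 foods/nutrients tested at the 0.05 level
print(f"expected false positives: {expected_false_positives(131, 0.05):.2f}")
print(f"P(at least one), if independent: {prob_at_least_one(131, 0.05):.4f}")
```

Under these assumptions, three or four "significant" foods out of 131 is fewer than chance alone would be expected to produce.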

Microarrays

When no genes are changing between two groups we expect

20,000 x 0.01 = 200

genes to have p-value < 0.01

However, false positives are not as bad as in other fields
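A small simulation can illustrate the 20,000 x 0.01 = 200 figure. This is a sketch that assumes each null p-value is an independent draw from the uniform distribution on [0, 1]; real microarray p-values are correlated across genes, so the spread around 200 would differ in practice. The function name is made up for the example.

```python
import random


def count_null_hits(m, alpha, seed=0):
    """Count how many of m uniform (null) p-values fall below alpha."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return sum(1 for _ in range(m) if rng.random() < alpha)


hits = count_null_hits(20_000, 0.01)
print(f"genes with p < 0.01 although nothing is changing: {hits}")
```

The count varies from seed to seed but stays close to 200.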

What can we do?

• p-values no longer mean what they used to… no argument

• A histogram of the p-values is a useful plot

• What can we do… lots of argument
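To see why the histogram is useful, here is a minimal text-only sketch. It assumes null p-values are uniform on [0, 1] and, purely for illustration, draws the non-null p-values from a Beta(0.3, 4) distribution so that they pile up near zero; the 9,000/1,000 split and the Beta parameters are invented for the example, not taken from any data set.

```python
import random


def pvalue_histogram(pvalues, n_bins=10):
    """Count p-values in n_bins equal-width bins covering [0, 1]."""
    counts = [0] * n_bins
    for p in pvalues:
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        counts[idx] += 1
    return counts


rng = random.Random(1)
null_p = [rng.random() for _ in range(9000)]              # nulls: flat
alt_p = [rng.betavariate(0.3, 4.0) for _ in range(1000)]  # assumed shape, skewed to 0
counts = pvalue_histogram(null_p + alt_p)

for i, c in enumerate(counts):
    print(f"{i / 10:.1f}-{(i + 1) / 10:.1f} {'#' * (c // 50)} {c}")
```

A flat histogram suggests mostly true nulls; a spike near zero on top of a flat background suggests some real effects mixed in.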

Multiple Hypothesis Testing

                Called Significant    Not Called Significant    Total

Null True       V                     m0 - V                    m0

Altern. True    S                     m1 - S                    m1

Total           R                     m - R                     m

Null = Equivalent Expression; Alternative = Differential Expression
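The cells of the table can be tallied from two lists of booleans, as in the sketch below. It assumes the truth is known, which only happens in simulations; with real data V and S are unobserved and only R and m are seen. The function name and the five-gene toy input are hypothetical.

```python
def tally(is_null, called):
    """Fill in the table cells from per-hypothesis truth and decisions."""
    V = sum(1 for n, c in zip(is_null, called) if n and c)      # false positives
    S = sum(1 for n, c in zip(is_null, called) if not n and c)  # true positives
    m = len(is_null)
    m0 = sum(1 for n in is_null if n)
    return {"V": V, "S": S, "R": V + S, "m0": m0, "m1": m - m0, "m": m}


# Toy input: three null genes, two differentially expressed genes
is_null = [True, True, True, False, False]
called = [True, False, False, True, False]
table = tally(is_null, called)
print(table)
```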

Error Rates

• Per comparison error rate (PCER): the expected value of the number of Type I errors over the number of hypotheses
PCER = E(V)/m

• Per family error rate (PFER): the expected number of Type I errors
PFER = E(V)

• Family-wise error rate (FWER): the probability of at least one Type I error
FWER = Pr(V ≥ 1)

• False discovery rate (FDR): the rate at which false discoveries occur
FDR = E(V/R; R>0) = E(V/R | R>0) Pr(R>0)

• Positive false discovery rate (pFDR): the rate at which discoveries are false
pFDR = E(V/R | R>0)

• Many others
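The definitions can be compared numerically with a Monte Carlo sketch. Everything about the setup is an assumption made for illustration: 90 true nulls with uniform p-values, 10 alternatives with p-values drawn from Beta(0.3, 4), independent tests, and a fixed cutoff of 0.05 with no multiplicity correction. The function name is hypothetical.

```python
import random


def simulate_error_rates(m0=90, m1=10, alpha=0.05, n_rep=2000, seed=0):
    """Estimate PCER, PFER, FWER, FDR and pFDR for a fixed p-value cutoff."""
    rng = random.Random(seed)
    m = m0 + m1
    sum_v = 0        # running total of V, for E(V)
    n_any_v = 0      # replicates with V >= 1
    sum_ratio = 0.0  # running total of V/R over replicates with R > 0
    n_r_pos = 0      # replicates with R > 0
    for _rep in range(n_rep):
        v = sum(1 for _i in range(m0) if rng.random() < alpha)
        s = sum(1 for _i in range(m1) if rng.betavariate(0.3, 4.0) < alpha)
        r = v + s
        sum_v += v
        if v >= 1:
            n_any_v += 1
        if r > 0:
            sum_ratio += v / r
            n_r_pos += 1
    pfer = sum_v / n_rep
    return {
        "PCER": pfer / m,
        "PFER": pfer,
        "FWER": n_any_v / n_rep,
        "FDR": sum_ratio / n_rep,  # V/R counted as 0 when R == 0
        "pFDR": sum_ratio / n_r_pos if n_r_pos else float("nan"),
    }


rates = simulate_error_rates()
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

In this setup the PCER stays near the nominal level while the FWER is close to 1, which is one way to see why the choice of error rate matters.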

Conclusions

• Let's do a multiple comparison of the different beers sold by the IF
