Why multiple tests are a problem? Rafael A. Irizarry.


Page 1:

Why multiple tests are a problem?

Rafael A. Irizarry

Page 2:

Other names

• Multiple comparisons

• Data snooping

• Others?

Page 3:

References

• H. Scheffé (1953), “A method for judging all contrasts in the analysis of variance”, Biometrika 40:87-104.

• D.B. Duncan (1965), “A Bayesian approach to multiple comparisons”, Technometrics 7:171-222.

• J.W. Tukey (1953), “The problem of multiple comparisons”, reprinted in CWJWT Vol. VIII (1994).

• R.G. Miller, Simultaneous Statistical Inference, 2nd ed. (Springer 1981).

Page 4:

Thanks to Yoav Benjamini

Benjamini and Hochberg (1995), “Controlling the false discovery rate: a practical and powerful approach to multiple testing”, J. R. Stat. Soc. Ser. B.

Page 5:

Example

E. Giovannucci, A. Ascherio, E. Rimm, M. Stampfer, G. Colditz, W. Willett: “Intake of Carotenoids and Retinol in Relation to Risk of Prostate Cancer”, Journal of the National Cancer Institute 87(23):1767-1776 (6 Dec 1995).

Page 6:

“Using responses to a validated, semiquantitative food frequency questionnaire mailed to participants in the Health Professionals Follow-up Study in 1986, we assessed dietary intake for a 1-year period for a cohort of 47,894 eligible subjects initially free of diagnosed cancer.... We calculate the relative risk (RR) for each of the upper categories of intake of a specific food or nutrient by dividing the incidence of prostate cancer among men in each of these categories by the rate among men in the lowest intake level....”

Page 7:

“Of 46 vegetables and fruits or related products, four were significantly associated with lower prostate cancer risk; of the four --- tomato sauce (P for trend = 0.001), tomatoes (P for trend = 0.03), and pizza (P for trend = 0.05), but not strawberries --- were primary sources of lycopene.”

Page 8:

BUT the Methods section one page later states:

“For each of 131 food and beverage items listed ...”

And the (presumably strongest) carotenoids and p-values are listed in Table 2 (p. 1770):

Tomato sauce   Tomatoes   Tomato juice   Pizza
0.001          0.03       0.67           0.05

“Our findings ... suggest that tomato-based foods may be especially beneficial regarding prostate cancer risk.”

Page 9:

What is a p-value again?

When no food or nutrient protects (every null hypothesis is true), we expect

131 x 0.05 ≈ 7

foods/nutrients to have p-values < 0.05
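
The arithmetic behind this count can be checked with a short sketch. The first quantity is just the product from the slide. The second, the chance of seeing at least one p-value below the threshold, additionally assumes the 131 tests are independent, which the slides do not claim; the variable names are illustrative.

```python
# Expected number of false positives when every null hypothesis is true.
m = 131        # food and beverage items tested (from the Methods section)
alpha = 0.05   # significance threshold

expected_false_positives = m * alpha

# Assumption (not in the slides): the tests are independent, so the chance
# that none of them falls below alpha is (1 - alpha) ** m.
prob_at_least_one = 1 - (1 - alpha) ** m

print(f"expected p-values < {alpha}: {expected_false_positives:.2f}")
print(f"P(at least one p-value < {alpha}): {prob_at_least_one:.4f}")
```

The product is 6.55, which the slide rounds to 7.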

Page 10:

Microarrays

When no genes are changing between two groups we expect

20,000 x 0.01 = 200

genes to have p-value < 0.01

However, false positives are not as bad as in other fields
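
The expected count of 200 can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the slides: five samples per group, unit-variance normal data, and a two-sided z-test with known variance standing in for whatever test would be used on real arrays.

```python
import random
from statistics import NormalDist

random.seed(0)  # fixed seed so the run is repeatable

n_genes = 20_000     # number of genes, as on the slide
n_per_group = 5      # hypothetical group size, not from the slides
threshold = 0.01

std_normal = NormalDist()
# Standard error of the difference of two means of n unit-variance values.
se = (2 / n_per_group) ** 0.5

n_called = 0
for _ in range(n_genes):
    # Both groups come from the same N(0, 1): no gene is changing.
    a = [random.gauss(0, 1) for _ in range(n_per_group)]
    b = [random.gauss(0, 1) for _ in range(n_per_group)]
    z = (sum(a) / n_per_group - sum(b) / n_per_group) / se
    p = 2 * (1 - std_normal.cdf(abs(z)))  # two-sided z-test p-value
    if p < threshold:
        n_called += 1

print(f"genes with p < {threshold}: {n_called} "
      f"(expected about {n_genes * threshold:.0f})")
```

Every gene called significant in this run is a false positive by construction, since both groups are drawn from the same distribution.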

Page 11:

What can we do?

• p-values no longer mean what they used to… no argument

• A histogram of the p-values is a useful plot

• What can we do… lots of argument
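
One way to see why the histogram of p-values helps: p-values from true nulls are uniformly distributed, so their histogram is flat, while real effects pile up near zero. The sketch below prints a text histogram for a hypothetical mix of 18,000 null tests and 2,000 tests with a mean shift of 3; those numbers and the z-test are assumptions made for illustration, not values from the slides.

```python
import random
from statistics import NormalDist

random.seed(1)
std_normal = NormalDist()

def two_sided_p(z):
    """Two-sided p-value for a z statistic."""
    return 2 * (1 - std_normal.cdf(abs(z)))

# Hypothetical mix: 18,000 null tests and 2,000 tests with a shifted mean.
null_p = [two_sided_p(random.gauss(0, 1)) for _ in range(18_000)]
alt_p = [two_sided_p(random.gauss(3, 1)) for _ in range(2_000)]

def histogram(p_values, n_bins=10):
    """Count p-values in equal-width bins covering [0, 1]."""
    counts = [0] * n_bins
    for p in p_values:
        # min() keeps p == 1.0 inside the last bin.
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return counts

counts = histogram(null_p + alt_p)
for i, c in enumerate(counts):
    # One '#' per 100 p-values in the bin.
    print(f"{i / 10:.1f}-{(i + 1) / 10:.1f} {'#' * (c // 100)} {c}")
```

The bins away from zero stay roughly level, and the excess in the first bin over that level hints at how many tests are not null.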

Page 12:

Multiple Hypothesis Testing

               Called Significant   Not Called Significant   Total
Null True      V                    m0 - V                   m0
Altern. True   S                    m1 - S                   m1
Total          R                    m - R                    m

Null = Equivalent Expression; Alternative = Differential Expression

Page 13:

Error Rates

• Per comparison error rate (PCER): the expected value of the number of Type I errors over the number of hypotheses

PCER = E(V)/m

• Per family error rate (PFER): the expected number of Type I errors

PFER = E(V)

• Family-wise error rate (FWER): the probability of at least one Type I error

FWER = Pr(V ≥ 1)

• False discovery rate (FDR): the rate at which false discoveries occur

FDR = E(V/R; R>0) = E(V/R | R>0) Pr(R>0)

• Positive false discovery rate (pFDR): the rate at which discoveries are false

pFDR = E(V/R | R>0)

• Many others
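
The definitions above can be made concrete by estimating each rate from simulated experiments. This is a minimal sketch, not a method from the slides: the number of tests, the share of true nulls, the effect size, and the uncorrected threshold of 0.05 are all hypothetical choices, and V, S, R follow the table on the previous slide.

```python
import random
from statistics import NormalDist

random.seed(2)
std_normal = NormalDist()

m, m0 = 100, 90    # hypothetical: 100 tests, 90 of them true nulls
alpha = 0.05       # per-test threshold, no multiplicity correction
effect = 3.0       # hypothetical mean shift for the true alternatives
n_reps = 2_000     # number of simulated experiments

def rejects(mean):
    """Draw one z statistic; report whether its two-sided p-value < alpha."""
    z = random.gauss(mean, 1)
    return 2 * (1 - std_normal.cdf(abs(z))) < alpha

sum_v = any_v = any_r = 0
sum_ratio = 0.0
for _ in range(n_reps):
    v = sum(rejects(0.0) for _ in range(m0))         # false positives V
    s = sum(rejects(effect) for _ in range(m - m0))  # true positives S
    r = v + s                                        # called significant R
    sum_v += v
    any_v += v >= 1
    if r > 0:
        any_r += 1
        sum_ratio += v / r

pcer = sum_v / (n_reps * m)   # E(V)/m
pfer = sum_v / n_reps         # E(V)
fwer = any_v / n_reps         # Pr(V >= 1)
fdr = sum_ratio / n_reps      # E(V/R; R>0): V/R counts as 0 when R = 0
pfdr = sum_ratio / any_r      # E(V/R | R>0)

print(f"PCER={pcer:.4f} PFER={pfer:.2f} FWER={fwer:.3f} "
      f"FDR={fdr:.3f} pFDR={pfdr:.3f}")
```

With these settings almost every simulated experiment has R > 0, so FDR and pFDR nearly coincide; they separate when it is common for nothing to be called significant.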

Page 14:

Conclusions

• Let's do a multiple comparison of the different beers sold by the IF