Adaptive FDR Estimation for High Dimensional Discrete Data
Naomi Altman1 & Isaac Dialsingh2
ADNAT 2012 - Hyderabad, February 5, 2013
1. The Pennsylvania State University  2. The University of the West Indies
[email protected]  [email protected]
Altman & Dialsingh (Penn State) Discrete FDR February 5, 2013 1 / 34
False Discovery Rate
Controlling error rates is essential for high dimensional “omics” data.
Benjamini & Hochberg (1995) realized that when testing 1000s of hypotheses, a few errors can be tolerated.
The False Discovery Rate (FDR) is the expected percentage of true null hypotheses among the statistically significant tests.

Table: Outcomes of m tests.

             Not Significant   Significant   Total
True Null           U               V          m0
False Null          T               S          m1
Total               W               R          m

FDR = E(V/R | R > 0) P(R > 0)

R: number of rejections; V: number of false rejections.
Adaptive FDR
Table: Outcomes of m tests.

             Not Significant   Significant   Total
True Null           U               V          m0
False Null          T               S          m1
Total               W               R          m

π0 = m0/m

m: number of tests; m0: number of null tests.

Since we are trying to control false discoveries, we do not need to control for the truly non-null tests. Adaptive FDR methods use an estimate of π0 to improve the power of the multiple comparisons adjustments.
Implementing FDR procedures
Compute a test statistic for each hypothesis. We might use the p-value as the test statistic.
Order the hypotheses from most to least significant, so that H0k has the kth most significant test statistic.
Estimate FDR(k) if we reject H01, . . . , H0k.
Either:
Pick a level q and reject H01, . . . , H0k if FDR(k) < q, OR
pick a p-value threshold α and reject H0i if its p-value is less than α; then estimate the FDR.
[Figure: “BH Heuristic” — sorted p-values plotted against sorted hypothesis number, with thresholds for BH at q=0.05, adaptive BH at q=0.05 with π0=0.6, and rejection at p<0.02 giving q=0.133.]
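The step-up rule just described can be sketched in a few lines of Python. This is a hypothetical illustration of the BH procedure with made-up p-values, not code from the talk.

```python
# Sketch of the Benjamini-Hochberg step-up procedure (illustration only):
# sort the p-values, find the largest rank k with p_(k) <= k*q/m,
# and reject the k most significant hypotheses.
def bh_reject(pvals, q=0.05):
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose p-value clears its threshold
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k = rank
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject

# Three small p-values among ten: all three are rejected at q = 0.05
p = [0.001, 0.002, 0.003, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
print(sum(bh_reject(p, q=0.05)))  # -> 3
```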
Implementing FDR procedures
Compute a test statistic for each hypothesis.We might use the p-value as the test statistic.
Order the hypotheses from most to least significant, so that H0k hasthe k th significant test statistic.Estimate FDR(k) if we reject H01 · · ·H0k .
EitherPick a level q and reject H01 · · ·H0k if FDR(k)< q ORPick a p-value α and reject H0i if its p-value is less than α. Thenestimate FDR.
0 2000 4000 6000 8000 10000
0.00
0.02
0.04
0.06
0.08
0.10
BH Heuristic
Sorted Hypothesis Number
p−va
lue
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
BH q=0.05Adaptive BH, q=0.05 pi0=.6reject at p<.02, q=0.133
Altman & Dialsingh (Penn State) Discrete FDR February 5, 2013 4 / 34
Implementing FDR procedures
Compute a test statistic for each hypothesis.We might use the p-value as the test statistic.
Order the hypotheses from most to least significant, so that H0k hasthe k th significant test statistic.Estimate FDR(k) if we reject H01 · · ·H0k .Either
Pick a level q and reject H01 · · ·H0k if FDR(k)< q ORPick a p-value α and reject H0i if its p-value is less than α. Thenestimate FDR.
0 2000 4000 6000 8000 10000
0.00
0.02
0.04
0.06
0.08
0.10
BH Heuristic
Sorted Hypothesis Number
p−va
lue
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
BH q=0.05Adaptive BH, q=0.05 pi0=.6reject at p<.02, q=0.133
Altman & Dialsingh (Penn State) Discrete FDR February 5, 2013 4 / 34
Estimating FDR
For discrete test statistics we want to:
Estimate π0.
Estimate FDR.

Discrete test statistics arise from binary and count data such as:
read counts in RNA-seq and ChIP-seq
SNP studies
thresholding (above/below)
multiple 2-way tables (e.g. surveys)
Why does discreteness matter?
[Figure: histograms of null and non-null p-values. Left: continuous p-values, π0 = 0.8. Right: discrete p-values, π0 = 0.8.]
The histogram on the left represents 10000 t-tests. The histogram on the right represents p-values from 10000 Fisher exact tests.
Why does discreteness matter?

[Figure: two p-value histograms. Left, “edgeR 3 samples/trt”: p-values from an RNA-seq study of 2 maize genotypes with 3 biological replicates. Right, “LIMMA”: p-values from a microarray study in poppy tissues with 4 biological replicates.]
Why does discreteness matter?
We will use the same heuristics for discrete and continuous tests. BUT:

Table: Distribution of p-values.

                                              Continuous   Discrete
null p-value distribution                     uniform      depends on an ancillary
Prob(p = 1)                                   0            > 0
percent of support points with                0%           100%
positive probability
minimum achievable p-value                    0            > 0
Estimating π0
Estimating π0 from continuous p-values
Storey (2002) estimates the height of the flat part of the histogram.
Nettleton et al. (2006) estimate the heights of the bins in excess of what is expected given π̂0.
Pounds and Cheng (2004) assume all true non-nulls have p = 0, so 2p̄ ≈ π0.

[Figure: histogram of the LIMMA p-values.]
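Storey's estimator mentioned above can be sketched as follows. This is an illustration under the continuous, uniform-under-the-null assumption, with a tuning point λ = 0.5 chosen here for the example; it is not the authors' implementation.

```python
# Sketch of Storey's (2002) pi0 estimator (illustration only): under the
# null the p-values are uniform, so the fraction of p-values above a
# cutoff lambda estimates pi0 * (1 - lambda).
def storey_pi0(pvals, lam=0.5):
    m = len(pvals)
    tail = sum(1 for p in pvals if p > lam)  # mostly null p-values
    return min(1.0, tail / ((1.0 - lam) * m))

# A flat histogram of 100 p-values gives an estimate near 1
p = [i / 100 for i in range(100)]
print(storey_pi0(p))  # -> 0.98
```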
Estimating π0
Estimating π0 from discrete p-values
These methods seem less plausible, since low-power non-null tests may have p-values far from 0.
As well, both null and non-null tests have p-values with mass at 1, leading to a peak at p = 1.
We add 3 new methods.

[Figure: histogram of the edgeR p-values (3 samples/trt).]
Estimating π0
Mixture distribution of p-values
We use the mixture distribution

f(p) = π0 f0(p) + (1 − π0) fA(p)

where
f is the distribution of the p-values,
f0 is the distribution of p-values for the hypotheses that are truly null, and
fA is the distribution of p-values for the hypotheses that are truly not null.
Estimating π0
Estimating π0 from discrete p-values
There is often an ancillary statistic which determines the distribution of the test statistic, e.g. row totals.
If the ancillary statistic is known for each test, the distribution of p-values under the null, f0(p), is known.
If there are many tests with the same value of the ancillary, the empirical distribution of the p-values, f̂(p), can be estimated by the observed frequencies.
Estimating π0 using f(p) = π0 f0(p) + (1 − π0) fA(p)
Regression Method
Useful when we have many tests with the same ancillary statistic.
Regression method: regress the empirical frequencies of the p-values against the expected frequencies under H0.
The slope is approximately π0.
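The regression method above can be sketched like this. It is an illustration only; the cutoff that restricts the fit to larger p-values, where fA is assumed negligible, is our assumption, not a detail from the talk.

```python
# Sketch of the regression method for pi0 (illustration only): regress
# observed p-value frequencies on the frequencies expected under H0,
# using only larger support points where fA(p) is assumed ~ 0; the
# no-intercept least-squares slope approximates pi0.
def regression_pi0(observed, expected, p_support, cutoff=0.5):
    pts = [(o, e) for o, e, p in zip(observed, expected, p_support) if p > cutoff]
    sxy = sum(o * e for o, e in pts)
    sxx = sum(e * e for _, e in pts)
    return min(1.0, sxy / sxx)

# pi0 = 0.8, with the non-null mass concentrated at the smallest p-value
p_support = [0.01, 0.3, 0.6, 1.0]
expected = [0.25, 0.25, 0.25, 0.25]   # null frequencies per support point
observed = [0.40, 0.20, 0.20, 0.20]   # 0.8*expected plus 0.2 at p = 0.01
print(regression_pi0(observed, expected, p_support))  # -> 0.8
```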
Estimating π0 using the histogram of p-values
[Figure: observed and null-expected p-value densities.]

Histogram method
Using the ancillary, we compute the expected frequency of the p-values under the null.
We use the area A between the observed histogram and the histogram expected under the null:

A ≤ 2(1 − π0)

π̂0 = 1 − A/2 has expectation at least as big as π0.
The method is sensitive to the choice of bin boundaries.
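The area calculation above can be sketched directly; the densities and bin width below are hypothetical, chosen so the answer is easy to check by hand.

```python
# Sketch of the histogram method (illustration only): A is the total
# area between the observed and null-expected histograms, and the
# estimate is pi0-hat = 1 - A/2.
def histogram_pi0(observed, expected, bin_width):
    # observed, expected: density heights per bin
    A = sum(abs(o - e) * bin_width for o, e in zip(observed, expected))
    return 1.0 - A / 2.0

# Four bins of width 0.25; when pi0 = 0.8, 0.2 of the mass moves from
# the last bin (null-depleted) to the first (non-null spike)
expected = [1.0, 1.0, 1.0, 1.0]   # uniform null density
observed = [1.8, 1.0, 1.0, 0.2]
print(histogram_pi0(observed, expected, 0.25))  # -> 0.8
```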
Estimating π0 by removing zero-power tests
Minimum achievable p-value
For a given test statistic, there is some set ψ1 < · · · < ψk = 1 of achievable p-values.
ψ1 is the minimal achievable p-value for the test.
Select a level α, the maximum p-value at which to reject the null hypothesis.
If ψ1 > α, then the test has zero power.
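As a concrete, hypothetical illustration of the zero-power idea, consider a one-sided binomial sign test with n observations in place of Fisher's exact test: under H0 the most extreme outcome has p-value 0.5^n, which is ψ1 for that test.

```python
# Sketch of the zero-power check (illustration only), using a one-sided
# binomial sign test under H0: p = 1/2. The most extreme outcome (all
# successes) has p-value 0.5**n, the minimum achievable p-value psi_1;
# if psi_1 > alpha the test can never reject.
def has_power(n, alpha=0.01):
    psi1 = 0.5 ** n  # minimum achievable p-value
    return psi1 <= alpha

print(has_power(5))  # min p = 1/32  > 0.01  -> False (zero power)
print(has_power(7))  # min p = 1/128 <= 0.01 -> True
```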
Estimating π0 by removing zero-power tests
Tarone (1990) noted that we need not consider tests with zero power.
We call a method a “T” method if we remove the zero-power tests and then proceed with a method for continuous data.
“T” methods remove some of the excess mass at p = 1 and make the histogram of p-values more uniform.
In this talk, we use the Storey-T method: we remove the tests with zero power at α = 0.01 and then use Storey’s method on the remaining tests (right plot).

[Figure: histograms of p-values with π0 = 0.9. Left: all tests. Right: only tests with power > 0.]
Simulated RNA-seq data
Data
We simulated RNA-seq data assuming:
m (number of tests) = 1000 or 10,000.
π0 = 0.1, 0.2, . . . , 0.8, 0.9, 0.95, 1.0.
Two different discretized log-Normal distributions for total reads/feature, estimated from real data.
Features are independent within sample.
We used 2 treatments with no replication.
The test statistic was Fisher’s Exact Test.

[Figure: densities of the two log-Normal read-count distributions, with parameters (3,2) and (4,2).]

Configuration   % 0 or 1 total
1               0.9%
2               3.2%
Estimated π0 with m=10,000
[Figure: estimated π0 versus true π0 for m = 10,000. Left, Scenario 1: logNormal(3,2), few small counts. Right, Scenario 2: logNormal(4,2), many small counts. Estimators compared: Histogram, Regression, Nettleton, Pounds, Storey, and Storey-T.]
Estimating FDR
Benjamini and Hochberg (1995) suggest an algorithm for controlling FDR at level q:

Find the maximal i such that p(i) ≤ iq/m, and reject the i most significant hypotheses.

It has been shown that when the test statistics are continuous and independent, this algorithm controls the FDR at level π0q.
The BH method is known to be conservative with discrete tests.
Estimating FDR
Adaptive FDR methods control the FDR at approximately level q using an estimate of π0.
For example, the adaptive Benjamini and Hochberg method is:

Find the maximal i such that p(i) ≤ iq/(mπ̂0).

When the test statistics are continuous and independent, this algorithm controls the FDR at approximately level q.
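The adaptive rule differs from plain BH only in the threshold; a sketch with made-up p-values (illustration only, not the authors' code):

```python
# Sketch of the adaptive BH procedure (illustration only): the step-up
# threshold i*q/m is inflated to i*q/(m*pi0hat), giving more rejections
# when pi0hat < 1.
def adaptive_bh_reject(pvals, q=0.05, pi0hat=1.0):
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / (m * pi0hat):
            k = rank
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject

p = [0.001, 0.002, 0.016, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
print(sum(adaptive_bh_reject(p, pi0hat=1.0)))  # plain BH rejects 2
print(sum(adaptive_bh_reject(p, pi0hat=0.6)))  # adaptive BH rejects 3
```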
Estimating FDR
Gilbert (2005) uses Tarone’s idea of removing tests which have zero power to achieve significance at level α.
Gilbert filters out the zero-power tests, then applies the BH method to the remaining mF tests.
We suggest an adaptive Gilbert method that uses an estimate of π0 with Gilbert’s method.
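Gilbert's filter-then-BH recipe can be sketched as follows. This is an illustration only, again using a sign test's minimum achievable p-value 0.5^n as a stand-in for Fisher's exact test.

```python
# Sketch of Gilbert's method (illustration only): drop tests whose
# minimum achievable p-value exceeds alpha (zero power), then run the
# BH step-up on the remaining mF tests.
def gilbert_reject(pvals, ns, q=0.05, alpha=0.01):
    keep = [i for i, n in enumerate(ns) if 0.5 ** n <= alpha]
    mF = len(keep)
    order = sorted(keep, key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / mF:
            k = rank
    reject = [False] * len(pvals)
    for i in order[:k]:
        reject[i] = True
    return reject

# Five informative tests (n = 10) and five zero-power tests (n = 5):
# filtering shrinks m from 10 to 5, so p = 0.02 now clears its
# threshold 2*0.05/5 = 0.02, where plain BH would have missed it.
pvals = [0.001, 0.02, 0.3, 0.5, 0.9, 1.0, 1.0, 1.0, 1.0, 1.0]
ns = [10, 10, 10, 10, 10, 5, 5, 5, 5, 5]
print(sum(gilbert_reject(pvals, ns)))  # -> 2
```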
Simulation Results
Using the same simulation scenario as before, we implemented:
Benjamini and Hochberg’s 1995 method
Gilbert’s (2005) method using α = 0.01
Adaptive versions of BH and Gilbert using
the true π0
π0 estimated using the Storey-T method

We considered error rates for:
false detection
false nondetection (including nondetection due to zero power)
total errors
Results m=10,000 few small margins
[Figure: Scenario 1, m = 10,000 — four panels against π0: mean total rejections (with the non-null count for reference), mean false rejections, E(V)/E(R), and mean total errors, for BH, BH-True, BH-T, Gilbert, Gilbert-True, and Gilbert-T.]
Results m=10,000 many small margins

[Figure: Scenario 2, m = 10,000 — the same four panels against π0: mean total rejections, mean false rejections, E(V)/E(R), and mean total errors, for BH, BH-True, BH-T, Gilbert, Gilbert-True, and Gilbert-T.]
Does it matter?
[Figure: mean difference in total errors against π0 for Scenario 1 and Scenario 2 (m = 10,000), for BH-True, BH-T, Gilbert, Gilbert-True, and Gilbert-T, with boxplots of total errors at π0 = 0.7 for each method.]
Blekhman Primate Liver Data
Blekhman et al. (2010) used RNA-seq to interrogate liver samples in male and female human, chimpanzee, and rhesus monkey.
There were 20689 features, but 2803 had no reads and a further 907 had only 1 read across the 18 samples.
These 3710 features were removed, leaving 16979 features.
There were 3 biological samples for each species by gender combination.
Each sample was divided into 2 sequencing lanes.
The 2 lanes were combined to obtain total reads for each feature for each biological sample.
Blekhman Primate Liver Data
We look at 2 comparisons:

Comparison                       Test
2 lanes, same human              Fisher’s exact test
male human versus chimpanzee     moderated Negative Binomial test

It is difficult to compute expected counts for the moderated Negative Binomial test, so we use the T-method.
We use Fisher’s exact test to estimate the minimal achievable p-value; it is conservative.
Data are normalized using the TMM method.
Analysis is done using edgeR in Bioconductor.
Human Male 1
We compared the two lanes of sequencing data for Human Male 1.
13553 features were detected by at least 1 read.
10359 features were detected by at least 7 reads (giving minimum achievable p-value ≤ 0.01).
We do not expect any differences between the two lanes.

[Figure: “Lanes 1 and 2 of Human Male 1” — log2(Lane 1 + 0.5) versus log2(Lane 2 + 0.5) read counts per feature.]
Human Male 1
[Figure: histograms "HS Male 1 p-values" (all p-values) and "HS Male 1 Filtered p-values" (filtered p-values).]
π̂0 = 1.0 using both Storey's method and the Storey-T method.

Method                  Number Significant (FDR < 0.05)
Benjamini & Hochberg    3
Gilbert                 5
Human Males versus Chimpanzee Males
The data were normalized, and the dispersion shrinkage factors were computed.
There are 3 biological replicates of each.
16375 features were detected with at least 1 read.
13809 features were detected with at least 7 reads.
[Figure: histograms "HS Male Vs Chimp Male", all p-values and filtered p-values.]
Human Males versus Chimpanzee Males
π̂0 = 1.0 using Storey’s method and 0.87 using the Storey-T method.
Method                  Number Significant    Number Significant
                        (Non-adaptive)        (Adaptive)
Benjamini & Hochberg    1166                  1239
Gilbert                 1251                  1325

Note: 1166/16375 = 7.1% of features are significant, so π0 = 1 is not reasonable.
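Storey's π0 estimator, which the slide's estimates are based on, has a simple form: true-null p-values are uniform, so the count above a cutoff λ estimates π0·m·(1−λ). A minimal sketch (the Storey-T variant for discrete tests is not reproduced here):

```python
def storey_pi0(pvals, lam=0.5):
    # Storey's estimator: p-values above lambda are (mostly) from true
    # nulls, which are uniform, so #{p > lambda} ~ pi0 * m * (1 - lambda).
    # Capped at 1.0, since pi0 is a proportion.
    m = len(pvals)
    return min(1.0, sum(p > lam for p in pvals) / (m * (1.0 - lam)))

# Five strong signals plus five roughly uniform nulls:
print(storey_pi0([0.001] * 5 + [0.3, 0.6, 0.7, 0.8, 0.9]))  # 0.8
```

Adaptive BH then simply runs the BH procedure at the inflated level q/π̂0, which is how an estimate below 1 translates into more rejections.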
Summary:
What did we learn? For tabular count data (e.g. RNA-seq, SNP):
It is important to have an estimate of π0.
When most counts are big, remove features with small margins and use methods for continuous data.
When many counts are small, use the regression method.

Adaptive methods:
do not help much for π0 > 0.9;
can significantly reduce total errors when π0 < 0.5.

Gilbert's method is preferable to vanilla BH.
Happily, Gilbert's method is equivalent to removing features with small margins and then using BH.

Summary: For RNA-seq and SNP data, remove features with small margins and proceed as if p-values were continuous.
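The recommended filter-then-BH pipeline can be sketched as follows (an illustrative sketch of the idea, assuming each feature's minimum achievable p-value has already been computed from its margins; not the talk's implementation):

```python
def filtered_bh(pvals, min_achievable, alpha=0.05):
    # Gilbert-equivalent sketch: drop tests whose minimum achievable
    # p-value exceeds alpha (they can never be significant), then run
    # plain step-up BH on the survivors. Returns rejected indices.
    keep = [i for i, mp in enumerate(min_achievable) if mp <= alpha]
    m = len(keep)
    order = sorted(keep, key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k_max = rank
    return sorted(order[:k_max])

print(filtered_bh([0.001, 0.2, 0.04, 0.5], [0.001, 0.3, 0.01, 0.5]))
# [0, 2]
```

In this toy example, filtering reduces m from 4 to 2, loosening the BH thresholds enough that the 0.04 p-value is also rejected; plain BH on all four tests would reject only the first.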
Many thanks
Thanks for your attention
Thanks to NSF:
NSF DMS 1007801 (Altman, PI)
NSF IOS 0820729 (Altman, subcontract from McSteen, PI)

Main Reference:
Dialsingh, I. (2011). False Discovery Rates when the Statistics are Discrete. PhD Dissertation, Dept. of Statistics, Penn State University.
References
Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B, 57, 289-300.

Benjamini, Y. and Hochberg, Y. (2000). On the adaptive control of the false discovery rate in multiple testing with independent statistics. Journal of Educational and Behavioral Statistics, 25, 60-83.

Gilbert, P.B. (2005). A modified false discovery rate multiple-comparisons procedure for discrete data, applied to human immunodeficiency virus genetics. Journal of the Royal Statistical Society Series C (Applied Statistics), 54, 143-158.

Nettleton, D., Hwang, J.T.G., Caldo, R.A. and Wise, R.P. (2006). Estimating the number of true null hypotheses from a histogram of p-values. Journal of Agricultural, Biological, and Environmental Statistics, 11, 337-356.

Pounds, S. and Cheng, C. (2004). Improving false discovery rate estimation. Bioinformatics, 20, 1737-1745.

Storey, J.D. (2003). The positive false discovery rate: A Bayesian interpretation and the q-value. Annals of Statistics, 31, 2013-2035.

Tarone, R.E. (1990). A modified Bonferroni method for discrete data. Biometrics, 46, 515-522.