![Page 1: Reproducibility and Ranks of True Positives in Large Scale Genomics Experiments Russ Wolfinger 1, Dmitri Zaykin 2, Lev Zhivotovsky 3, Wendy Czika 1, Susan.](https://reader031.fdocuments.us/reader031/viewer/2022013101/56649f2e5503460f94c480a0/html5/thumbnails/1.jpg)
Reproducibility and Ranks of True Positives in Large Scale
Genomics Experiments
Russ Wolfinger¹, Dmitri Zaykin², Lev Zhivotovsky³,
Wendy Czika¹, Susan Shao¹
¹SAS Institute, Inc., ²National Institute of Environmental Health Sciences, ³Vavilov Institute of General Genetics
MCP Vienna, July 11, 2007
![Page 2: Reproducibility and Ranks of True Positives in Large Scale Genomics Experiments Russ Wolfinger 1, Dmitri Zaykin 2, Lev Zhivotovsky 3, Wendy Czika 1, Susan.](https://reader031.fdocuments.us/reader031/viewer/2022013101/56649f2e5503460f94c480a0/html5/thumbnails/2.jpg)
Criticism of Statistical Methods in Genomics
1. Two labs run the same microarray experiment, and resulting lists of significant genes barely overlap.
2. Significant SNPs from a genetic study are not validated in subsequent follow-up studies.
Conclusions from scientific community:
Statistical results are not reproducible.
Genomics technology is not reliable.
![Page 3: Reproducibility and Ranks of True Positives in Large Scale Genomics Experiments Russ Wolfinger 1, Dmitri Zaykin 2, Lev Zhivotovsky 3, Wendy Czika 1, Susan.](https://reader031.fdocuments.us/reader031/viewer/2022013101/56649f2e5503460f94c480a0/html5/thumbnails/3.jpg)
“P vs FC” Controversy
• Occurred recently within the FDA-driven Microarray Quality Control Consortium (MAQC)
• Biologists, chemists, and regulators are concerned about the lack of reproducibility of significant gene lists, and have observed that lists based on fold change (FC) are more consistent than those based on p-values (P)
• Statisticians usually seek an optimal tradeoff between specificity (Type 1 error) and sensitivity (Type 2 error, power), often portrayed in a Receiver Operating Characteristic (ROC) plot
![Page 4: Reproducibility and Ranks of True Positives in Large Scale Genomics Experiments Russ Wolfinger 1, Dmitri Zaykin 2, Lev Zhivotovsky 3, Wendy Czika 1, Susan.](https://reader031.fdocuments.us/reader031/viewer/2022013101/56649f2e5503460f94c480a0/html5/thumbnails/4.jpg)
Outline
1. Reproducibility versus specificity and sensitivity
2. Rank distribution of a single true positive
3. P-value combination methods for multiple true positives
All results are based on simulation.
![Page 5: Reproducibility and Ranks of True Positives in Large Scale Genomics Experiments Russ Wolfinger 1, Dmitri Zaykin 2, Lev Zhivotovsky 3, Wendy Czika 1, Susan.](https://reader031.fdocuments.us/reader031/viewer/2022013101/56649f2e5503460f94c480a0/html5/thumbnails/5.jpg)
Questions
• Should statisticians concern themselves with reproducibility, the hallmark of science? YES!
• How to define reproducibility?
• How does it relate to specificity and sensitivity?
• Is it possible to dialectically reconcile conflicting perspectives, or at least provide an explanatory (and hence mollifying) framework?
![Page 6: Reproducibility and Ranks of True Positives in Large Scale Genomics Experiments Russ Wolfinger 1, Dmitri Zaykin 2, Lev Zhivotovsky 3, Wendy Czika 1, Susan.](https://reader031.fdocuments.us/reader031/viewer/2022013101/56649f2e5503460f94c480a0/html5/thumbnails/6.jpg)
Simulation Study 1: Based on MAQC Phase 1 Experiment
• Initially designed and implemented by Wendell Jones, Expression Analysis Inc.
• Two treatment groups, n=5 in each
• 15,000 genes, 1000 truly changed with varying degrees of expression that mimic real data
• Coefficient of variation (CV) on original data scale set to varying percentages: (2, 10, 30, 100)
![Page 7: Reproducibility and Ranks of True Positives in Large Scale Genomics Experiments Russ Wolfinger 1, Dmitri Zaykin 2, Lev Zhivotovsky 3, Wendy Czika 1, Susan.](https://reader031.fdocuments.us/reader031/viewer/2022013101/56649f2e5503460f94c480a0/html5/thumbnails/7.jpg)
Simulation Study 1 (continued)
• For the sake of simplicity, we focus only on gene-selection rules based on fold change (FC, same as effect size) or simple t-test p-values
• Note that gene lists can be constructed in many other ways; e.g. shrunken t-statistics
• Use Proportion of Overlapping Genes (POG) as a measure of reproducibility, based on a simple Venn diagram
• Compute POG on simulated pairs of gene lists; list sizes range from 10 to 15000
• Require direction of FC to match
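A minimal sketch of the POG computation described above, assuming POG is the number of genes shared by two equally sized lists (with matching direction of change) divided by the list size. The function names `top_genes` and `pog` and the toy fold-change values are illustrative only and are not taken from the slides.

```python
def top_genes(fold_changes, scores, size):
    """Return {gene: direction} for the `size` genes with the largest scores.

    fold_changes: {gene: signed log fold change}
    scores: {gene: ranking score, larger = more significant}
    """
    ranked = sorted(scores, key=scores.get, reverse=True)[:size]
    return {g: (1 if fold_changes[g] > 0 else -1) for g in ranked}


def pog(list_a, list_b):
    """Proportion of overlapping genes whose direction of change also matches."""
    if not list_a or len(list_a) != len(list_b):
        raise ValueError("lists must be non-empty and of equal size")
    shared = sum(1 for gene, sign in list_a.items() if list_b.get(gene) == sign)
    return shared / len(list_a)


# Toy example: two labs, FC ranking (score = |fold change|), list size 2.
lab1 = {"g1": 2.0, "g2": -1.5, "g3": 0.2, "g4": 1.1}
lab2 = {"g1": 1.8, "g2": 1.4, "g3": 0.1, "g4": 1.2}
list1 = top_genes(lab1, {g: abs(v) for g, v in lab1.items()}, size=2)
list2 = top_genes(lab2, {g: abs(v) for g, v in lab2.items()}, size=2)
print(pog(list1, list2))  # g1 agrees; g2 is shared but flips direction
```

Swapping the score dictionary for one based on p-values (for example, -log10(p)) gives the P-value ranking under the same overlap measure.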
![Page 8: Reproducibility and Ranks of True Positives in Large Scale Genomics Experiments Russ Wolfinger 1, Dmitri Zaykin 2, Lev Zhivotovsky 3, Wendy Czika 1, Susan.](https://reader031.fdocuments.us/reader031/viewer/2022013101/56649f2e5503460f94c480a0/html5/thumbnails/8.jpg)
Simulated POG vs. Gene List Size
[Figure: two panels, FC Ranking and P-Value Ranking. Each plots POG (y-axis, 0 to 1) against gene list size (x-axis, log scale, 10 to 10000), with one curve per CV setting: POG_CV_002, POG_CV_010, POG_CV_030, POG_CV_100.]
![Page 9: Reproducibility and Ranks of True Positives in Large Scale Genomics Experiments Russ Wolfinger 1, Dmitri Zaykin 2, Lev Zhivotovsky 3, Wendy Czika 1, Susan.](https://reader031.fdocuments.us/reader031/viewer/2022013101/56649f2e5503460f94c480a0/html5/thumbnails/9.jpg)
Three Dimensions CV=2%
[Figure: two three-dimensional panels, FC Ranking and P-Value Ranking, with axes sensitivity, one_minus_specificity, and pog (each 0 to 1); log10size (1 to 5) is also indicated.]
![Page 10: Reproducibility and Ranks of True Positives in Large Scale Genomics Experiments Russ Wolfinger 1, Dmitri Zaykin 2, Lev Zhivotovsky 3, Wendy Czika 1, Susan.](https://reader031.fdocuments.us/reader031/viewer/2022013101/56649f2e5503460f94c480a0/html5/thumbnails/10.jpg)
Discussion 1
• Reproducibility is not monotonically related to specificity and sensitivity.
• There appear to be tradeoffs in all three dimensions: specificity, sensitivity, and reproducibility.
• The weight attached to each dimension depends on the objectives of the study.
• Simple rules based on both FC and P-value cutoffs appear viable as a starting compromise.
• Challenge you to …
![Page 11: Reproducibility and Ranks of True Positives in Large Scale Genomics Experiments Russ Wolfinger 1, Dmitri Zaykin 2, Lev Zhivotovsky 3, Wendy Czika 1, Susan.](https://reader031.fdocuments.us/reader031/viewer/2022013101/56649f2e5503460f94c480a0/html5/thumbnails/11.jpg)
Enter the Third Dimension
Specificity – Sensitivity – Reproducibility
![Page 12: Reproducibility and Ranks of True Positives in Large Scale Genomics Experiments Russ Wolfinger 1, Dmitri Zaykin 2, Lev Zhivotovsky 3, Wendy Czika 1, Susan.](https://reader031.fdocuments.us/reader031/viewer/2022013101/56649f2e5503460f94c480a0/html5/thumbnails/12.jpg)
Volcano Plots Help Visualize Ranking Rules
“Dormant” Volcano from Two-Sample T-Test (df=4) on 10,000 Genes
[Figure: volcano plot of -log10(p) (y-axis, 0 to 4) against diff (x-axis, -4 to 5), annotated with the ranking rules |d|, d*d, and sqrt(d).]
![Page 14: Reproducibility and Ranks of True Positives in Large Scale Genomics Experiments Russ Wolfinger 1, Dmitri Zaykin 2, Lev Zhivotovsky 3, Wendy Czika 1, Susan.](https://reader031.fdocuments.us/reader031/viewer/2022013101/56649f2e5503460f94c480a0/html5/thumbnails/14.jpg)
Simulation Study 2A: Number of Best T-Test Results Required to Cover a Single True Positive
• Compare different ranking rules based on P, FC, or a functional combination
• Two treatment groups, n=100 in each
• 38,500 t-tests (4 df), only 1 truly changed
• Power for the one true positive set to 80, 90, 95, or 99% at alpha=5%, or to 80% at the Šidák-adjusted alpha (80-Šidák)
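The Šidák-adjusted level used for the last power setting follows the formula given with the results, a* = 1-(1-0.05)^(1/38500). A small sketch of that arithmetic is below; the function name `sidak_alpha` is an illustrative choice, not something from the slides.

```python
import math


def sidak_alpha(family_alpha, n_tests):
    """Per-test level a* = 1 - (1 - family_alpha) ** (1 / n_tests).

    expm1/log1p are used so that precision is kept when a* is tiny.
    """
    return -math.expm1(math.log1p(-family_alpha) / n_tests)


print(sidak_alpha(0.05, 38_500))   # about 1.33e-06 (t-test study)
print(sidak_alpha(0.05, 200_000))  # about 2.56e-07 (chi-square study)
```

For such small levels the result is very close to the Bonferroni value 0.05 / n_tests.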
![Page 15: Reproducibility and Ranks of True Positives in Large Scale Genomics Experiments Russ Wolfinger 1, Dmitri Zaykin 2, Lev Zhivotovsky 3, Wendy Czika 1, Susan.](https://reader031.fdocuments.us/reader031/viewer/2022013101/56649f2e5503460f94c480a0/html5/thumbnails/15.jpg)
Simulation Study 2A Results
Number of best t-test (df=4) results out of 38,500 required to cover a single true positive with 95% probability

Ranking by:
Power        p-value (p)   log(p)·|d|^(1/2)   log(p)·|d|   log(p)·d^2   |d|
80% at 5%    7255          6727               6544         6410         6374
90% at 5%    2067          1868               1863         1937         2322
95% at 5%    467           422                455          531          856
99% at 5%    11            11                 16           26           101
80% at a*    1             1                  1            2            12

p: p-value; d: effect size; a*: 1-(1-0.05)^(1/38500)
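The five ranking rules in the table can be sketched as score functions. This is an illustration under stated assumptions: scores are written as -log10(p) times a power of |d| so that larger means "ranked nearer the top" (the logarithm base does not change the ordering), and the names `ranking_scores` and `rank_of` are hypothetical.

```python
import math


def ranking_scores(p, d):
    """Scores for the five rules; a larger score means a better rank.

    p must be strictly positive.
    """
    neg_log_p = -math.log10(p)
    return {
        "p": neg_log_p,
        "log(p)*|d|^(1/2)": neg_log_p * math.sqrt(abs(d)),
        "log(p)*|d|": neg_log_p * abs(d),
        "log(p)*d^2": neg_log_p * d * d,
        "|d|": abs(d),
    }


def rank_of(target, results, rule):
    """1-based rank of test `target` among results = [(p, d), ...] under `rule`."""
    scores = [ranking_scores(p, d)[rule] for p, d in results]
    return 1 + sum(s > scores[target] for s in scores)


# Toy data: test 0 plays the role of the single true positive.
results = [(0.001, 0.5), (0.0005, 0.2), (0.2, 1.5)]
for rule in ranking_scores(0.5, 1.0):
    print(rule, rank_of(0, results, rule))
```

In a simulation like the one summarized above, the tabulated quantity would correspond to the 95th percentile of this rank over many simulated replicates.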
![Page 16: Reproducibility and Ranks of True Positives in Large Scale Genomics Experiments Russ Wolfinger 1, Dmitri Zaykin 2, Lev Zhivotovsky 3, Wendy Czika 1, Susan.](https://reader031.fdocuments.us/reader031/viewer/2022013101/56649f2e5503460f94c480a0/html5/thumbnails/16.jpg)
Simulation Study 2B: Number of Best Chi-Square Test Results Required to Cover a Single True Positive
• Again compare different ranking rules based on p-value, effect size, or a functional combination
• Two binomial proportions, n=500 in each group
• 200,000 chi-square 1-df tests, only 1 true association
• Genetic allele frequency for true negatives simulated to be uniform [0.05,0.95]
• Genetic allele frequency for true positive control group set to 0.1 or 0.5. Frequency for case group set higher to achieve power of 80, 90, 95, or 99% at alpha=5%, or 80% at the Šidák-adjusted alpha (80-Šidák)
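A minimal sketch of a 1-df chi-square test for two binomial proportions, of the kind the study setup describes. It assumes the Pearson statistic without continuity correction and takes the effect size d to be the difference in observed frequencies; both choices, and the function name, are assumptions rather than details given in the slides.

```python
import math


def two_proportion_chisq(x_case, n_case, x_ctrl, n_ctrl):
    """Pearson chi-square (1 df) test comparing two binomial proportions.

    Returns (statistic, p_value, d) where d = case frequency - control frequency.
    """
    p_case = x_case / n_case
    p_ctrl = x_ctrl / n_ctrl
    pooled = (x_case + x_ctrl) / (n_case + n_ctrl)
    variance = pooled * (1 - pooled) * (1 / n_case + 1 / n_ctrl)
    d = p_case - p_ctrl
    if variance == 0:  # all successes or all failures: no evidence either way
        return 0.0, 1.0, d
    stat = d * d / variance
    # Upper tail of a chi-square with 1 df: P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value, d


stat, p_value, d = two_proportion_chisq(75, 500, 50, 500)
print(stat, p_value, d)
```

The returned pair (p_value, d) is what a ranking rule based on p-value, effect size, or a combination of the two would consume.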
![Page 17: Reproducibility and Ranks of True Positives in Large Scale Genomics Experiments Russ Wolfinger 1, Dmitri Zaykin 2, Lev Zhivotovsky 3, Wendy Czika 1, Susan.](https://reader031.fdocuments.us/reader031/viewer/2022013101/56649f2e5503460f94c480a0/html5/thumbnails/17.jpg)
Simulation Study 2B Results
Number of best chi-square (1 df) test results out of 200,000 required to cover a single true positive with 95% probability
TP case frequency 0.1

Ranking by:
Power        p-value (p)   log(p)·|d|^(1/2)   log(p)·|d|   log(p)·d^2   |d|
80% at 5%    38776         43559              46292        49332        58689
90% at 5%    12159         15075              16895        19675        27466
95% at 5%    2753          3764               4667         5900         10102
99% at 5%    55            101                157          261          869
80% at a*    1             1                  1            2            7

p: p-value; d: effect size; a*: 1-(1-0.05)^(1/200,000)
![Page 18: Reproducibility and Ranks of True Positives in Large Scale Genomics Experiments Russ Wolfinger 1, Dmitri Zaykin 2, Lev Zhivotovsky 3, Wendy Czika 1, Susan.](https://reader031.fdocuments.us/reader031/viewer/2022013101/56649f2e5503460f94c480a0/html5/thumbnails/18.jpg)
Simulation Study 2B Results
Number of best chi-square (1 df) test results out of 200,000 required to cover a single true positive with 95% probability
TP case frequency 0.5

Ranking by:
Power        p-value (p)   log(p)·|d|^(1/2)   log(p)·|d|   log(p)·d^2   |d|
80% at 5%    39940         35887              33784        31678        28451
90% at 5%    11107         9293               8451         7682         6685
95% at 5%    2962          2338               2078         1856         1582
99% at 5%    51            36                 31           27           23
80% at a*    1             1                  1            1            1

p: p-value; d: effect size; a*: 1-(1-0.05)^(1/200,000)
![Page 19: Reproducibility and Ranks of True Positives in Large Scale Genomics Experiments Russ Wolfinger 1, Dmitri Zaykin 2, Lev Zhivotovsky 3, Wendy Czika 1, Susan.](https://reader031.fdocuments.us/reader031/viewer/2022013101/56649f2e5503460f94c480a0/html5/thumbnails/19.jpg)
Discussion 2
• Incorporating effect size into ranking rules can improve ranking performance, particularly when the variance of true positives is larger than that of true negatives
• Possible Empirical Bayes effect
![Page 21: Reproducibility and Ranks of True Positives in Large Scale Genomics Experiments Russ Wolfinger 1, Dmitri Zaykin 2, Lev Zhivotovsky 3, Wendy Czika 1, Susan.](https://reader031.fdocuments.us/reader031/viewer/2022013101/56649f2e5503460f94c480a0/html5/thumbnails/21.jpg)
Simulation Study 3: Compare Power of P-Value Combination Methods with Multiple True Positives
• 5,000 chi-square (1 df) tests
• Number of true associations ranges from 10 to 200 with various powers
• Compare Šidák, Simes, Fisher Combination, and three more modern methods:
– Gamma Method (GM)
– Truncated Product Method (TPM)
– Rank Truncated Product (RTP)
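For reference, the three classical global tests named above can be sketched with the standard library alone, assuming independent, strictly positive p-values. The function names are illustrative; Fisher's tail probability uses the closed form for a chi-square with an even number of degrees of freedom, evaluated in log space so that it stays finite for thousands of tests.

```python
import math


def sidak_combined(pvals):
    """Global p-value from the smallest p-value: 1 - (1 - p_min)^L."""
    return 1.0 - (1.0 - min(pvals)) ** len(pvals)


def simes_combined(pvals):
    """Simes global p-value: min over i of L * p_(i) / i."""
    n = len(pvals)
    return min(1.0, min(n * p / i for i, p in enumerate(sorted(pvals), start=1)))


def fisher_combined(pvals):
    """Fisher combination: -2 * sum(log p) referred to chi-square with 2L df."""
    n = len(pvals)
    half_stat = -sum(math.log(p) for p in pvals)  # half of Fisher's statistic
    if half_stat == 0.0:
        return 1.0
    # Tail of chi-square(2n) at x: exp(-x/2) * sum_{k<n} (x/2)^k / k!
    log_x = math.log(half_stat)
    log_terms = [k * log_x - math.lgamma(k + 1) - half_stat for k in range(n)]
    peak = max(log_terms)
    return min(1.0, math.exp(peak) * sum(math.exp(t - peak) for t in log_terms))


pvals = [0.01, 0.04, 0.5]
print(sidak_combined(pvals), simes_combined(pvals), fisher_combined(pvals))
```

Šidák reacts only to the single smallest p-value, while Fisher pools evidence across all tests, which is the contrast the simulation study explores.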
![Page 22: Reproducibility and Ranks of True Positives in Large Scale Genomics Experiments Russ Wolfinger 1, Dmitri Zaykin 2, Lev Zhivotovsky 3, Wendy Czika 1, Susan.](https://reader031.fdocuments.us/reader031/viewer/2022013101/56649f2e5503460f94c480a0/html5/thumbnails/22.jpg)
Gamma Method (GM)
• Generalization of Fisher and Stouffer
• Sum inverse Gamma-transformed 1 - p_i
• Tune using Soft Truncation Threshold, accommodates effect heterogeneity
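One way to write the statistic described above, assuming L independent p-values p_1, ..., p_L and writing G_a for the CDF of a Gamma distribution with shape a and unit scale (this notation is introduced here for illustration and is not taken from the slides):

```latex
T_a \;=\; \sum_{i=1}^{L} G_a^{-1}\!\left(1 - p_i\right),
\qquad
T_a \sim \mathrm{Gamma}(L a,\ 1) \quad \text{under the global null.}
```

With a = 1, G_1^{-1}(1 - p) = -ln p, so T_1 is half of Fisher's statistic -2 Σ ln p_i and the test coincides with Fisher's combination.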
![Page 23: Reproducibility and Ranks of True Positives in Large Scale Genomics Experiments Russ Wolfinger 1, Dmitri Zaykin 2, Lev Zhivotovsky 3, Wendy Czika 1, Susan.](https://reader031.fdocuments.us/reader031/viewer/2022013101/56649f2e5503460f94c480a0/html5/thumbnails/23.jpg)
Truncated Product Method (TPM)
• Combine only the subset of p-values less than some threshold
• Assess significance by evaluating product distribution via Monte Carlo on uniforms.
• Upon rejecting the null, can claim true positives are in the subset
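A minimal Monte Carlo sketch of the truncated product idea, assuming independent, strictly positive p-values. The product of the p-values at or below the threshold is compared with the same statistic computed on simulated uniforms. The names `tpm_log_stat` and `tpm_pvalue`, the default threshold, the number of simulations, and the add-one correction are illustrative choices.

```python
import math
import random


def tpm_log_stat(pvals, tau):
    """Log of the product of all p-values <= tau (0.0 if none qualify)."""
    return sum(math.log(p) for p in pvals if p <= tau)


def tpm_pvalue(pvals, tau=0.05, n_sim=2000, seed=0):
    """Monte Carlo p-value: share of null replicates at least as extreme."""
    rng = random.Random(seed)
    observed = tpm_log_stat(pvals, tau)
    hits = 0
    for _ in range(n_sim):
        # 1 - random() lies in (0, 1], which avoids log(0).
        null = [1.0 - rng.random() for _ in range(len(pvals))]
        if tpm_log_stat(null, tau) <= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)


signal = [1e-6] * 5 + [0.5] * 45   # five strong effects among 50 tests
print(tpm_pvalue(signal))
```

Working on the log scale avoids underflow when many small p-values are multiplied together.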
![Page 24: Reproducibility and Ranks of True Positives in Large Scale Genomics Experiments Russ Wolfinger 1, Dmitri Zaykin 2, Lev Zhivotovsky 3, Wendy Czika 1, Susan.](https://reader031.fdocuments.us/reader031/viewer/2022013101/56649f2e5503460f94c480a0/html5/thumbnails/24.jpg)
Rank Truncated Product (RTP)
• Combine the K smallest p-values
• Assess significance by evaluating product distribution with Monte Carlo
• K=1 same as Šidák, K=max same as Fisher
• On rejecting the null, cannot claim true positives are in the subset
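The rank truncated product can be sketched in the same Monte Carlo style, assuming independent, strictly positive p-values. The function names, the number of simulations, and the add-one correction are illustrative assumptions.

```python
import heapq
import math
import random


def rtp_log_stat(pvals, k):
    """Log of the product of the k smallest p-values."""
    return sum(math.log(p) for p in heapq.nsmallest(k, pvals))


def rtp_pvalue(pvals, k, n_sim=4000, seed=0):
    """Monte Carlo p-value for the product of the k smallest p-values."""
    if not 1 <= k <= len(pvals):
        raise ValueError("k must be between 1 and the number of p-values")
    rng = random.Random(seed)
    observed = rtp_log_stat(pvals, k)
    hits = 0
    for _ in range(n_sim):
        # 1 - random() lies in (0, 1], which avoids log(0).
        null = [1.0 - rng.random() for _ in range(len(pvals))]
        if rtp_log_stat(null, k) <= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)


pvals = [0.01] + [0.5] * 19
print(rtp_pvalue(pvals, k=1))   # should be close to 1 - 0.99**20
print(rtp_pvalue(pvals, k=5))
```

With k=1 the statistic is just the smallest p-value, so the Monte Carlo estimate approximates the Šidák value, in line with the K=1 remark above.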
![Page 25: Reproducibility and Ranks of True Positives in Large Scale Genomics Experiments Russ Wolfinger 1, Dmitri Zaykin 2, Lev Zhivotovsky 3, Wendy Czika 1, Susan.](https://reader031.fdocuments.us/reader031/viewer/2022013101/56649f2e5503460f94c480a0/html5/thumbnails/25.jpg)
Simulation Study 3 Results
Power of different p-value combination methods from 5,000 chi-square (1 df) tests

#TA   TA Power   Šidák   Simes   Fisher   GM0.05   GM0.1   TPM0.05   TPM0.01   TPM0.005   TPM0.001   RTP10   RTP50   RTP100   RTP200
10    0.90       0.899   0.756   0.225    0.791    0.650   0.279     0.455     0.550      0.752      0.879   0.814   0.739    0.625
50    0.50       0.498   0.351   0.525    0.799    0.789   0.595     0.650     0.656      0.601      0.636   0.751   0.769    0.764
50    0.60       0.592   0.553   0.693    0.961    0.950   0.788     0.876     0.888      0.864      0.875   0.947   0.951    0.942
100   0.30       0.297   0.181   0.598    0.644    0.697   0.595     0.543     0.495      0.378      0.377   0.544   0.607    0.649
100   0.40       0.401   0.339   0.831    0.926    0.944   0.861     0.853     0.825      0.715      0.703   0.874   0.907    0.926
200   0.20       0.202   0.143   0.756    0.653    0.746   0.696     0.563     0.490      0.332      0.314   0.511   0.605    0.682
200   0.25       0.255   0.216   0.920    0.883    0.936   0.895     0.814     0.742      0.545      0.509   0.765   0.847    0.904
200   0.30       0.297   0.300   0.981    0.978    0.992   0.980     0.949     0.915      0.764      0.715   0.932   0.967    0.984

#TA: number of true associations
![Page 26: Reproducibility and Ranks of True Positives in Large Scale Genomics Experiments Russ Wolfinger 1, Dmitri Zaykin 2, Lev Zhivotovsky 3, Wendy Czika 1, Susan.](https://reader031.fdocuments.us/reader031/viewer/2022013101/56649f2e5503460f94c480a0/html5/thumbnails/26.jpg)
Discussion 3
• Gamma Method competitive as a global test
• Truncated Product Method enables more specific inference.