Putting lots of things in order: R-VALUES

Examples

[Figure: Sports statistics, 2013-14 NBA. 461 players, 58,029 free throw attempts (source: espn.go.com). x-axis: # free throw attempts (1 to 1000).]

[Figure: Gene set analysis. x-axis: set size N (10 to 1000).]

[Figure: Genome-wide association studies. x-axis: standard error (0.01 to 0.10).]

Intuitively, the r-value for unit i is the smallest α such that when ranking units by the posterior probability of being in the top α fraction of the system, unit i remains in the top α fraction of the ranked list.

• setting: large scale, non-sparse inference
• examples: genomics/sports/evaluations/...
• task: rank order the units
• challenge: differential uncertainty

• lots of units: $\{i\}$
• data: $\{D_i\}$
• parameters (of interest): $\{\theta_i\}$
• model: $p(D_i \mid \theta_i)$
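The "differential uncertainty" challenge can be made concrete with two rows of Table 2. The sketch below computes the free-throw percentage and a plug-in binomial standard error for each player; the helper name `ftp_and_se` and the use of the plug-in standard error are illustrative assumptions, not something the poster specifies.

```python
from math import sqrt

# (made, attempted) pairs copied from Table 2
players = {
    "Steve Nash": (22, 24),
    "Kevin Durant": (703, 805),
}

def ftp_and_se(made, attempts):
    """Return the MLE y/n and its plug-in binomial standard error (hypothetical helper)."""
    p = made / attempts
    return p, sqrt(p * (1 - p) / attempts)

for name, (made, attempts) in players.items():
    p, se = ftp_and_se(made, attempts)
    print(f"{name}: FTP = {p:.3f}, SE = {se:.3f}")
```

Nash has the higher percentage, but his estimate rests on 24 attempts and is far noisier than Durant's, which rests on 805. Ranking by the MLE alone ignores that difference.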


[Figure: posterior exceedance probability for two examples. Axis: free throw ability (0.2 to 1.0). Legend: $P(\theta_i \ge \theta_\alpha \mid D_i)$ for two examples; empirical quantile $\hat{\lambda}_\alpha$; r-value.]

Two examples (made/attempted):

$D_{\text{Ray.Allen}} = 105/116$

$D_{\text{LeBron.James}} = 439/585$


Table 2. Leading free-throw shooters, 2013-2014 regular season of the National Basketball Association. From 461 players who attempted at least one free throw, shown are the top 25 players as inferred by r-value. Data $D_i$ on player $i$ include the number of made free throws $y_i$ and the number of attempts $n_i$. Other columns indicate free-throw percentage FTP $= y_i/n_i$, which is the maximum likelihood estimate (MLE) of the underlying ability $\theta_i$; posterior mean (PM) $E(\theta_i \mid D_i)$; r-value (RV) $\inf\{\alpha : P(\theta_i \ge \theta_\alpha \mid D_i) \ge \lambda_\alpha\}$; qualified rank, Q.R, which is the rank of FTP amongst players for whom $y_i \ge 125$ (blank otherwise); and ranks associated with the MLE, posterior mean, and r-value (MLE.R, PM.R, RV.R).

| player i | y_i | n_i | FTP | PM | RV | Q.R | MLE.R | PM.R | RV.R |
|---|---|---|---|---|---|---|---|---|---|
| Brian Roberts | 125 | 133 | 0.940 | 0.913 | 0.002 | 1 | 17 | 1 | 1 |
| Ryan Anderson | 59 | 62 | 0.952 | 0.898 | 0.003 | | 15 | 2 | 2 |
| Danny Granger | 63 | 67 | 0.940 | 0.893 | 0.005 | | 16 | 3 | 3 |
| Kyle Korver | 87 | 94 | 0.926 | 0.892 | 0.008 | | 19 | 4 | 4 |
| Mike Harris | 26 | 27 | 0.963 | 0.866 | 0.010 | | 14 | 15 | 5 |
| J.J. Redick | 97 | 106 | 0.915 | 0.886 | 0.011 | | 22 | 6 | 6 |
| Ray Allen | 105 | 116 | 0.905 | 0.880 | 0.016 | | 25 | 8 | 7 |
| Mike Muscala | 14 | 14 | 1.000 | 0.844 | 0.017 | | 7 | 34 | 8 |
| Dirk Nowitzki | 338 | 376 | 0.899 | 0.891 | 0.018 | 2 | 30 | 5 | 9 |
| Trey Burke | 102 | 113 | 0.903 | 0.877 | 0.018 | | 28 | 9 | 10 |
| Reggie Jackson | 158 | 177 | 0.893 | 0.877 | 0.024 | 3 | 32 | 11 | 11 |
| Kevin Martin | 303 | 340 | 0.891 | 0.882 | 0.025 | 4 | 33 | 7 | 12 |
| Gary Neal | 94 | 105 | 0.895 | 0.869 | 0.025 | | 31 | 14 | 13 |
| D.J. Augustin | 201 | 227 | 0.885 | 0.873 | 0.031 | 5 | 38 | 12 | 14 |
| Stephen Curry | 308 | 348 | 0.885 | 0.877 | 0.031 | 6 | 39 | 10 | 15 |
| Patty Mills | 73 | 82 | 0.890 | 0.860 | 0.032 | | 34 | 19 | 16 |
| Courtney Lee | 99 | 112 | 0.884 | 0.861 | 0.035 | | 40 | 18 | 17 |
| Steve Nash | 22 | 24 | 0.917 | 0.834 | 0.039 | | 20.5 | 44 | 18 |
| Greivis Vasquez | 95 | 108 | 0.880 | 0.857 | 0.040 | | 41 | 22 | 19 |
| Robbie Hummel | 15 | 16 | 0.938 | 0.825 | 0.043 | | 18 | 55 | 20 |
| Mo Williams | 78 | 89 | 0.876 | 0.850 | 0.046 | | 42 | 24 | 21 |
| Kevin Durant | 703 | 805 | 0.873 | 0.870 | 0.048 | 7 | 45 | 13 | 22 |
| Aaron Brooks | 83 | 95 | 0.874 | 0.850 | 0.049 | | 44 | 26 | 23 |
| Damian Lillard | 371 | 426 | 0.871 | 0.865 | 0.050 | 8 | 47 | 16 | 24 |
| Nando de Colo | 31 | 35 | 0.886 | 0.831 | 0.057 | | 37 | 48 | 25 |

Construction of ranking variables, NBA example

[Figure: construction panels for the NBA example. x-axis: free throw ability (0.2 to 1.0), with the threshold $\theta_\alpha$ marked.]

[Figure: rankings by r-value, posterior mean, and MLE. x-axis: t = rank from top (5 to 25).]

1. local posteriors: $p(\theta_i \mid D_i)$ (binomial likelihood, beta prior, beta posteriors)
2. estimated marginal: $p(\theta_i)$ (beta, parametric case)
3. posterior exceedance probability: $P(\theta_i \ge \theta_\alpha \mid D_i)$
4. r-value: $r(D_i) = \inf\{\alpha : P(\theta_i \ge \theta_\alpha \mid D_i) \ge \lambda_\alpha\}$, where $\lambda_\alpha$ satisfies $P\{P(\theta_i \ge \theta_\alpha \mid D_i) \ge \lambda_\alpha\} = \alpha$
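The four construction steps can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the `rvalues` package's implementation: the beta prior parameters `a0` and `b0` are taken as known positive integers (the poster estimates the marginal from data), the threshold $\lambda_\alpha$ is replaced by an empirical quantile across units, and $\alpha$ is searched only on the coarse grid $k/n$. All function names are hypothetical.

```python
from math import comb

def beta_sf(x, a, b):
    """P(Beta(a, b) >= x) for positive integers a and b.

    Uses the identity P(Beta(a, b) >= x) = P(Binomial(a + b - 1, x) <= a - 1).
    Intended for modest counts; very large counts can overflow float conversion.
    """
    m = a + b - 1
    return sum(comb(m, j) * x**j * (1 - x)**(m - j) for j in range(a))

def prior_upper_quantile(alpha, a0, b0, iters=60):
    """theta_alpha: the value exceeded with prior probability alpha (bisection)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if beta_sf(mid, a0, b0) > alpha:
            lo = mid  # too much prior mass above mid, so move right
        else:
            hi = mid
    return (lo + hi) / 2

def r_values(data, a0=1, b0=1):
    """data: list of (made, attempts). Returns one r-value per unit on the grid k/n."""
    n_units = len(data)
    r = [1.0] * n_units  # every unit is trivially in the top 100%
    done = [False] * n_units
    for k in range(1, n_units):
        alpha = k / n_units
        theta_alpha = prior_upper_quantile(alpha, a0, b0)
        # Step 3: exceedance probability under the Beta(a0 + y, b0 + n - y) posterior
        v = [beta_sf(theta_alpha, a0 + y, b0 + n - y) for y, n in data]
        lam = sorted(v, reverse=True)[k - 1]  # empirical threshold: k-th largest value
        for i in range(n_units):
            if not done[i] and v[i] >= lam:
                r[i], done[i] = alpha, True  # smallest alpha at which unit i makes the cut
    return r
```

When every unit has the same number of attempts, the r-value ordering reduces to ordering by makes: `r_values([(9, 10), (5, 10), (7, 10), (2, 10)])` returns `[0.25, 0.75, 0.5, 1.0]`. The orderings can differ from the MLE ranking only when attempt counts differ.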

r-values rank players better from mid-season data, as validated on complete season data.

Ray Allen: r = 0.016

R-value ranking maximizes agreement between the true top α fraction and the reported top α fraction, for all α.
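One way to measure that agreement, when the true parameters are known (as in a simulation), is the fraction of all units that fall in both the true top α fraction and the reported top α fraction. The helper below is a hypothetical sketch of that measure; the function name and the tie-breaking by index order are assumptions.

```python
from math import ceil

def top_alpha_agreement(true_values, scores, alpha, lower_score_is_better=True):
    """Fraction of all units in both the true and the reported top-alpha sets.

    true_values: true parameters (larger is better).
    scores: ranking variable per unit; r-values are 'lower is better'.
    """
    n = len(true_values)
    k = ceil(alpha * n)  # size of the top-alpha set
    by_truth = sorted(range(n), key=lambda i: true_values[i], reverse=True)
    by_score = sorted(range(n), key=lambda i: scores[i],
                      reverse=not lower_score_is_better)
    return len(set(by_truth[:k]) & set(by_score[:k])) / n

# Four units: the reported top half shares one unit with the true top half.
print(top_alpha_agreement([0.9, 0.8, 0.1, 0.2], [0.25, 0.75, 0.5, 1.0], alpha=0.5))
```

The same function can score an MLE or posterior-mean ranking by passing `lower_score_is_better=False`.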

R package: rvalues
http://www.stat.wisc.edu/~newton/

Henderson, N.C. and Newton, M.A. (2013). Making the cut: improved ranking and selection for large-scale inference. arXiv:1312.5776

M.A. Newton and N.C. Henderson
University of Wisconsin, Madison, USA

Measurement model:

$D_i = (X_i, \sigma_i^2)$

$E(X_i \mid \theta_i, \sigma_i^2) = \theta_i$

$\mathrm{var}(X_i \mid \theta_i, \sigma_i^2) = \sigma_i^2$

$V_\alpha(x, \sigma^2) = P(\theta_i \ge \theta_\alpha \mid X_i = x, \sigma_i^2 = \sigma^2)$

Assume $V_\alpha(x, \sigma^2)$ is right-continuous and non-decreasing in $x$ for every $\alpha$ and $\sigma^2$.

Theorem: $P\{r(D_i) \le \alpha,\ \theta_i \ge \theta_\alpha\} \ge P\{T(D_i) \le \alpha,\ \theta_i \ge \theta_\alpha\}$ for any other ranking variable $T(D_i)$ with $P[T(D_i) \le \alpha] \le \alpha$.
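To see what the monotonicity assumption looks like in a concrete case, the sketch below evaluates $V_\alpha(x, \sigma^2)$ in closed form for one specific model. Both the normal sampling distribution $X_i \mid \theta_i \sim N(\theta_i, \sigma_i^2)$ and the normal prior $\theta_i \sim N(0, \tau^2)$ are assumptions made for illustration; the poster only fixes the mean and variance of $X_i$. The names `exceedance`, `is_non_decreasing_in_x`, and `tau2` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1

def exceedance(x, sigma2, alpha, tau2=1.0):
    """V_alpha(x, sigma^2) = P(theta_i >= theta_alpha | X_i = x, sigma_i^2 = sigma2).

    Assumes theta_i ~ N(0, tau2) and X_i | theta_i ~ N(theta_i, sigma2).
    """
    theta_alpha = sqrt(tau2) * STD_NORMAL.inv_cdf(1.0 - alpha)  # upper-alpha prior quantile
    shrink = tau2 / (tau2 + sigma2)   # posterior mean is shrink * x
    post_sd = sqrt(shrink * sigma2)   # posterior variance is shrink * sigma2
    return 1.0 - STD_NORMAL.cdf((theta_alpha - shrink * x) / post_sd)

def is_non_decreasing_in_x(sigma2, alpha, xs):
    """Numerically check the monotonicity assumption on a set of x values."""
    vals = [exceedance(x, sigma2, alpha) for x in sorted(xs)]
    return all(a <= b for a, b in zip(vals, vals[1:]))

print(is_non_decreasing_in_x(0.5, 0.1, [-2, -1, 0, 1, 2, 3]))
# A precise, moderately large observation versus a noisy, larger one
print(exceedance(1.5, 0.1, alpha=0.1), exceedance(2.0, 4.0, alpha=0.1))
```

Under these assumptions the precise observation has the larger exceedance probability even though its value of $x$ is smaller, which is the differential-uncertainty effect that ranking by $x$ alone would miss.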

[Figure: x-axis: standard error (0.01 to 0.10).]

Fig. 1. Type-2 diabetes example: From the full complement of 127,903 SNPs used in the second stage meta-analysis from Morris et al. 2012, we filtered to a reduced set of 25,558 SNPs that are probably associated with T2D, and plot 3371 of those having highest observed association (log odds exceeding 0.05). These estimates are based on genotype data from 22,669 T2D cases and 58,119 control subjects.


Fig. 2. RNAi example: From a recent version of Gene Ontology, 5719 terms (gene sets) annotate between 10 and 1000 human genes. Shown is a summary of the integration of these terms with the list of 984 genes detected by RNAi as being involved in influenza virus replication (from Hao et al. 2013). The x-axis shows set size and the y-axis shows the proportion of the set that was detected by RNAi. The plot is restricted to 3626 sets for which the observed proportion exceeds 5%.

Hao, L., Q. He, Z. Wang, M. Craven, M. A. Newton, and P. Ahlquist (2013). Limited agreement of independent RNAi screens for virus-required host genes owes more to false-negative than false-positive factors. PLoS Computational Biology 9 (9), e1003235.

Jost, J. and X. Li-Jost (1998). Calculus of variations, Volume 64. Cambridge University Press.

Kass, R. E. and A. E. Raftery (1995). Bayes factors. Journal of the American Statistical Association 90 (430), 773–795.

Kendziorski, C., M. Newton, H. Lan, and M. Gould (2003). On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles. Statistics in Medicine 22 (24), 3899–3914.

Laird, N. M. and T. A. Louis (1989). Empirical Bayes ranking methods. Journal of Educational and Behavioral Statistics 14 (1), 29–46.

Lehmann, E. (1986). Testing statistical hypotheses (2nd ed.). Wiley series in probability and mathematical statistics: Probability and mathematical statistics. Wiley.

Leng, N., J. A. Dawson, J. A. Thomson, V. Ruotti, A. I. Rissman, B. M. Smits, J. D. Haag, M. N. Gould, R. M. Stewart, and C. Kendziorski (2013). EBSeq: an empirical Bayes hierarchical model for inference in RNA-seq experiments. Bioinformatics 29 (8), 1035–1043.

Lin, R., T. A. Louis, S. M. Paddock, and G. Ridgeway (2006). Loss function based ranking in two-stage, hierarchical models. Bayesian Analysis 1 (4), 915–946.

McCarthy, D. J. and G. K. Smyth (2009). Testing significance relative to a fold-change threshold is a TREAT. Bioinformatics 25 (6), 765–771.

Morris, A. P., B. F. Voight, T. M. Teslovich, T. Ferreira, A. V. Segre, V. Steinthorsdottir, R. J. Strawbridge, H. Khan, H. Grallert, A. Mahajan, et al. (2012). Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nature Genetics 44 (9), 981–990.

Niemi, J. (2010). Evaluating individual player contributions in basketball. In JSM Proceedings, Statistical Computing Section, Alexandria, VA, pp. 4914–4923. American Statistical Association.

Noma, H., S. Matsui, T. Omori, and T. Sato (2010). Bayesian ranking and selection methods using hierarchical mixture models in microarray studies. Biostatistics 11 (2), 281–289.

Normand, S.-L. T., M. E. Glickman, and C. A. Gatsonis (1997). Statistical methods for profiling providers of medical care: issues and applications. Journal of the American Statistical Association 92 (439), 803–814.

Paddock, S. M. and T. A. Louis (2011). Percentile-based empirical distribution function estimates for performance evaluation of healthcare providers. Journal of the Royal Statistical Society: Series C (Applied Statistics) 60 (4), 575–589.