A Bayesian Method for Rank Agreggation
description
Transcript of A Bayesian Method for Rank Agreggation
![Page 1: A Bayesian Method for Rank Agreggation](https://reader035.fdocuments.us/reader035/viewer/2022070503/568161d9550346895dd1e1be/html5/thumbnails/1.jpg)
A Bayesian Method for Rank Agreggation
Xuxin Liu, Jiong Du, Ke Deng, and Jun S LiuDepartment of StatisticsHarvard University
![Page 2: A Bayesian Method for Rank Agreggation](https://reader035.fdocuments.us/reader035/viewer/2022070503/568161d9550346895dd1e1be/html5/thumbnails/2.jpg)
Outline of the talkMotivationsMethods Review
◦Classics: SumRank, Fisher, InvZ◦State transition method: MC4, MCT
Bayesian model for the ranks ◦power laws◦MCMC algorithm
Simulation results
![Page 3: A Bayesian Method for Rank Agreggation](https://reader035.fdocuments.us/reader035/viewer/2022070503/568161d9550346895dd1e1be/html5/thumbnails/3.jpg)
MotivationsGoal: to combine rank lists from
multiple experiments to obtain a “most reliable” list of candidates.
Examples:◦Combine ranking results from different
judges◦Combine biological evidences to rank the
relevance of genes to a certain disease◦Combine different genomic experiment
results for a similar biological setup
![Page 4: A Bayesian Method for Rank Agreggation](https://reader035.fdocuments.us/reader035/viewer/2022070503/568161d9550346895dd1e1be/html5/thumbnails/4.jpg)
Data – the rank matrixThe ranks of N “genes” in M experiments.
Questions of interest:◦ How many genes are “true” targets (e.g., truly
differentially expressed, or truly involved in a certain biological function)
◦ Who are they?
![Page 5: A Bayesian Method for Rank Agreggation](https://reader035.fdocuments.us/reader035/viewer/2022070503/568161d9550346895dd1e1be/html5/thumbnails/5.jpg)
ChallengesFull rank list versus partial rank
list◦Sometimes we can only “reliably”
rank the top k candidates (genes)The quality of the ranking results
can vary greatly from experiment (judge) to experiment (judge)
There are also “spam” experiments that give high ranks to certain candidates because of some other reasons (bribes)
![Page 6: A Bayesian Method for Rank Agreggation](https://reader035.fdocuments.us/reader035/viewer/2022070503/568161d9550346895dd1e1be/html5/thumbnails/6.jpg)
Some available methodsRelated to the methods for
combining multiple p-values:
![Page 7: A Bayesian Method for Rank Agreggation](https://reader035.fdocuments.us/reader035/viewer/2022070503/568161d9550346895dd1e1be/html5/thumbnails/7.jpg)
Corresponding methods for ranksUnder H0, each candidate’s rank is
uniformly distributed in {1,…,N}. Hence, the p-value for a gene ranked at the kth place has a p-value k/N.
Hence the previous 3 methods correspond to
![Page 8: A Bayesian Method for Rank Agreggation](https://reader035.fdocuments.us/reader035/viewer/2022070503/568161d9550346895dd1e1be/html5/thumbnails/8.jpg)
Problems with these methodsExperiments are treated equally,
which is sometimes undesirable
![Page 9: A Bayesian Method for Rank Agreggation](https://reader035.fdocuments.us/reader035/viewer/2022070503/568161d9550346895dd1e1be/html5/thumbnails/9.jpg)
Transition matrix method (google inspired?)
Treat each gene as a node. P(i,j) is the transition probabilities from i to j.
The stationary distribution is given by
The importance of each candidate is ranked by
P
![Page 10: A Bayesian Method for Rank Agreggation](https://reader035.fdocuments.us/reader035/viewer/2022070503/568161d9550346895dd1e1be/html5/thumbnails/10.jpg)
MC4 algorithmThe method is usually applied to
rank the top K candidates, so P is KK matrix◦Let U be the list of genes that hare
ranked as top K at least once in some experiment
◦For each pair of genes in U, let if for a majority of experiments i is ranked above j.
◦Define◦Make P ergodic by mixing:
, 1i jm
, , / | |, for i j i jP m U i j
*, ,(1 ) / | | / | |, for i j i jP m U U i j
![Page 11: A Bayesian Method for Rank Agreggation](https://reader035.fdocuments.us/reader035/viewer/2022070503/568161d9550346895dd1e1be/html5/thumbnails/11.jpg)
CommentsThe method can be viewed as a
variation of the simple majority vote
As long as spam experiments do not dominate the truth, MC4 can filter them out.
Ad-hoc, no clear principles behind the method.
![Page 12: A Bayesian Method for Rank Agreggation](https://reader035.fdocuments.us/reader035/viewer/2022070503/568161d9550346895dd1e1be/html5/thumbnails/12.jpg)
MCT algorithmInstead of using 0, or 1 for ,
it defines
where is the number of times i ranked before j.
,i jm, , /i j i jm r m
,i jr
![Page 13: A Bayesian Method for Rank Agreggation](https://reader035.fdocuments.us/reader035/viewer/2022070503/568161d9550346895dd1e1be/html5/thumbnails/13.jpg)
A Bayesian modelWe introduce D as an indicator
vector indicating which of the candidates are among the true targets: ◦ if the ith gene is one of the
targets, 0 otherwise. Prior ◦The joint probability:
1iD
( , ) ( ) ( | ) ( ) ( | )jj
P R D P D P R D P D P R D
where is the rank list from the jth experiment
1, ,( , , )j j N jR R R
![Page 14: A Bayesian Method for Rank Agreggation](https://reader035.fdocuments.us/reader035/viewer/2022070503/568161d9550346895dd1e1be/html5/thumbnails/14.jpg)
Given D, we decompose Rj into 3 parts:◦ , the relative ranks within the
“null genes”, i.e., with◦ , the relative ranks within
“targets”, i.e., with ◦ , the relative ranks of each target
among the null genes.
0jR
0iD 1jR
1iD 1|0jR
![Page 15: A Bayesian Method for Rank Agreggation](https://reader035.fdocuments.us/reader035/viewer/2022070503/568161d9550346895dd1e1be/html5/thumbnails/15.jpg)
Example
![Page 16: A Bayesian Method for Rank Agreggation](https://reader035.fdocuments.us/reader035/viewer/2022070503/568161d9550346895dd1e1be/html5/thumbnails/16.jpg)
Decompose the likelihood
![Page 17: A Bayesian Method for Rank Agreggation](https://reader035.fdocuments.us/reader035/viewer/2022070503/568161d9550346895dd1e1be/html5/thumbnails/17.jpg)
![Page 18: A Bayesian Method for Rank Agreggation](https://reader035.fdocuments.us/reader035/viewer/2022070503/568161d9550346895dd1e1be/html5/thumbnails/18.jpg)
Power law?
![Page 19: A Bayesian Method for Rank Agreggation](https://reader035.fdocuments.us/reader035/viewer/2022070503/568161d9550346895dd1e1be/html5/thumbnails/19.jpg)
![Page 20: A Bayesian Method for Rank Agreggation](https://reader035.fdocuments.us/reader035/viewer/2022070503/568161d9550346895dd1e1be/html5/thumbnails/20.jpg)
An MCMC algorithm
![Page 21: A Bayesian Method for Rank Agreggation](https://reader035.fdocuments.us/reader035/viewer/2022070503/568161d9550346895dd1e1be/html5/thumbnails/21.jpg)
Simulation Study 1
![Page 22: A Bayesian Method for Rank Agreggation](https://reader035.fdocuments.us/reader035/viewer/2022070503/568161d9550346895dd1e1be/html5/thumbnails/22.jpg)
True positives in top 20:
![Page 23: A Bayesian Method for Rank Agreggation](https://reader035.fdocuments.us/reader035/viewer/2022070503/568161d9550346895dd1e1be/html5/thumbnails/23.jpg)
Inferred qualities of the experiments
![Page 24: A Bayesian Method for Rank Agreggation](https://reader035.fdocuments.us/reader035/viewer/2022070503/568161d9550346895dd1e1be/html5/thumbnails/24.jpg)
![Page 25: A Bayesian Method for Rank Agreggation](https://reader035.fdocuments.us/reader035/viewer/2022070503/568161d9550346895dd1e1be/html5/thumbnails/25.jpg)
![Page 26: A Bayesian Method for Rank Agreggation](https://reader035.fdocuments.us/reader035/viewer/2022070503/568161d9550346895dd1e1be/html5/thumbnails/26.jpg)
![Page 27: A Bayesian Method for Rank Agreggation](https://reader035.fdocuments.us/reader035/viewer/2022070503/568161d9550346895dd1e1be/html5/thumbnails/27.jpg)
![Page 28: A Bayesian Method for Rank Agreggation](https://reader035.fdocuments.us/reader035/viewer/2022070503/568161d9550346895dd1e1be/html5/thumbnails/28.jpg)
Simulation 2: power of the spam filtering experiments
![Page 29: A Bayesian Method for Rank Agreggation](https://reader035.fdocuments.us/reader035/viewer/2022070503/568161d9550346895dd1e1be/html5/thumbnails/29.jpg)
![Page 30: A Bayesian Method for Rank Agreggation](https://reader035.fdocuments.us/reader035/viewer/2022070503/568161d9550346895dd1e1be/html5/thumbnails/30.jpg)
![Page 31: A Bayesian Method for Rank Agreggation](https://reader035.fdocuments.us/reader035/viewer/2022070503/568161d9550346895dd1e1be/html5/thumbnails/31.jpg)
SummaryThe Bayesian method is robust, and
performs especially well when the data is noisy and experiments have varying qualities
The Fisher’s works quite well in most cases, seems rather robust to noisy experiments
The MC-based methods worked surprisingly badly, with no exception