PAGERANK PARAMETERS - University of Chicagolekheng/meetings/matho...40 60 80 100 120 40 60 80 mm...
Transcript of PAGERANK PARAMETERS - University of Chicagolekheng/meetings/matho...40 60 80 100 120 40 60 80 mm...
40 60 80 100 120
40
60
80
mm
PAGERANK PARAMETERS
David F. GleichAmy N. Langville
American Institute of MathematicsWorkshop on Ranking
Palo Alto, CAAugust 17th, 2010
Gleich & Langville AIM 1 / 21
40 60 80 100 120
40
60
80
mm
The most important page on the web
Gleich & Langville Recap AIM 2 / 21
40 60 80 100 120
40
60
80
mm
The most important page on the web
Gleich & Langville Recap AIM 2 / 21
40 60 80 100 120
40
60
80
mm
PageRank details
1
2
3
4
5
6
→
1/6 1/2 0 0 0 01/6 0 0 1/3 0 01/6 1/2 0 1/3 0 01/6 0 1/2 0 0 01/6 0 1/2 1/3 0 11/6 0 0 0 1 0
︸ ︷︷ ︸
P
Pj≥0eTP=eT
“jump” → v = [ 1n ... 1n ]T ≥0
eTv=1
Markov chain�
αP+ (1− α)veT�
x = xunique x ⇒ j ≥ 0, eTx = 1.
Linear system (− αP)x = (1− α)vIgnored dangling nodes patched back to v
algorithms laterGleich & Langville Recap AIM 3 / 21
40 60 80 100 120
40
60
80
mm
Other uses for PageRankWhat else people use PageRank to do
GeneRankMorrison et al. GeneRank, 2005
10 20 30 40 50 60 70
NM_003748NM_003862Contig32125_RCU82987AB037863NM_020974Contig55377_RCNM_003882NM_000849Contig48328_RCContig46223_RCNM_006117NM_003239NM_018401AF257175AF201951NM_001282Contig63102_RCNM_000286Contig34634_RCNM_000320AB033007AL355708NM_000017NM_006763AF148505Contig57595NM_001280AJ224741U45975Contig49670_RCContig753_RCContig25055_RCContig53646_RCContig42421_RCContig51749_RCAL137514NM_004911NM_000224NM_013262Contig41887_RCNM_004163AB020689NM_015416Contig43747_RCNM_012429AB033043AL133619NM_016569NM_004480NM_004798Contig37063_RCNM_000507AB037745Contig50802_RCNM_001007Contig53742_RCNM_018104Contig51963Contig53268_RCNM_012261NM_020244Contig55813_RCContig27312_RCContig44064_RCNM_002570NM_002900AL050090NM_015417Contig47405_RCNM_016337Contig55829_RCContig37598Contig45347_RCNM_020675NM_003234AL080110AL137295Contig17359_RCNM_013296NM_019013AF052159Contig55313_RCNM_002358NM_004358Contig50106_RCNM_005342NM_014754U58033Contig64688NM_001827Contig3902_RCContig41413_RCNM_015434NM_014078NM_018120NM_001124L27560Contig45816_RCAL050021NM_006115NM_001333NM_005496Contig51519_RCContig1778_RCNM_014363NM_001905NM_018454NM_002811NM_004603AB032973NM_006096D25328Contig46802_RCX94232NM_018004Contig8581_RCContig55188_RCContig50410Contig53226_RCNM_012214NM_006201NM_006372Contig13480_RCAL137502Contig40128_RCNM_003676NM_013437Contig2504_RCAL133603NM_012177R70506_RCNM_003662NM_018136NM_000158NM_018410Contig21812_RCNM_004052Contig4595Contig60864_RCNM_003878U96131NM_005563NM_018455Contig44799_RCNM_003258NM_004456NM_003158NM_014750Contig25343_RCNM_005196Contig57864_RCNM_014109NM_002808Contig58368_RCContig46653_RCNM_004504M21551NM_014875NM_001168NM_003376NM_018098AF161553NM_020166NM_017779NM_018265AF155117NM_004701NM_006281Contig44289_RCNM_004336Contig33814_RCNM_003600NM_006265NM_000291NM_000096NM_001673NM_001216NM_014968NM_018354NM_007036NM_004702Contig2399_RCNM_001809Contig20217_RCNM_003981NM_007203NM_006681AF055033NM_014889NM_020386NM_000599Contig56457_RCNM_005915Contig24252_RCContig55725_RCNM_002916NM_014321NM_006931AL080079Contig51464_RCNM_000788NM_016448X05610NM_014791Contig40831_RCAK000745NM_015984NM_016577Contig32185_RCAF052162AF073519NM_003607NM_006101NM_003875Contig25991Contig35251_RCNM_004994NM_000436NM_002073NM_002019NM_000127NM_020188AL137718Contig28552_RCContig38288_RCAA555029_RCNM_016359Contig46218_RCContig63649_RCAL080059
Use (− αGD−1)x =w tofind “nearby” importantgenes.
ProteinRankObjectRankEventRank
IsoRankClustering
Sports rankingFood websCentrality
Reverse PageRankFutureRank
SocialPageRankBookRank
ArticleRankItemRankSimRank
DiffusionRankTrustRankTweetRank
Note New paper LabRank with a random scientist?
Gleich & Langville Recap AIM 4 / 21
40 60 80 100 120
40
60
80
mm
Ulam NetworksChirikov mapyt+1 = ηyt+k sin(t+θt)t+1 = t + yt+1
Ulam network1. divide phase space into uniform cells2. form P based on trajectories.
log(E [x(A)]) log(Std [x(A)]))/ log(E [x(A)])
A ∼ Bet(2,16)Note White is larger, black is smaller
Google matrix, dynamical attractors, and Ulam networks, Shepelyansky and Zhirov, arXivGleich & Langville Recap AIM 5 / 21
40 60 80 100 120
40
60
80
mm
Choosing alphaSlide 6 of 21
Choosing alpha
Choosing personalization
Related methods
Open issues
40 60 80 100 120
40
60
80
mm
What is alpha? There’s no single answer.Ask yourself, why am I computing PageRank? Then use the bestvalue for your application.
web-search → tune α for the best featurevector
node centrality → understand what randomjumps mean in your graph
find important nodesin a web-graph → use the random surfer inter-
pretation
Author αBrin and Page (1998) 0.85Najork et al. (2007) 0.85Litvak et al. (2006) 0.5Pan el al. (2004) 0.15Algorithms (...) ≥ 0.85Experiment ???
Gleich & Langville Choosing alpha AIM 7 / 21
40 60 80 100 120
40
60
80
mm
The PageRank limit valueSingular? (− αP)x = (1− α)v
P = X�
00 J1
�
X−1
�
− αX�
00 J1
�
X−1�
x = (1− α)v
X
�
− α�
00 J1
��
X−1x = (1− α)v�
− α�
00 J1
��
y = (1− α)z
(1− α)y1 = (1− α)z1(− αJ2)y2 = (1− α)z2
Boldi et al. 2003: PageRank as a function of the damping parameter
Gleich & Langville Choosing alpha AIM 8 / 21
40 60 80 100 120
40
60
80
mm
TotalRank
t =∫ 1
0x(α)dα
Proposed by Boldi et al. (2005) as a parameter free PageRank.
Gleich & Langville Choosing alpha AIM 9 / 21
40 60 80 100 120
40
60
80
mm
Generalized PageRank
PageRank (− αP)x = (1− α)vx =
∑∞=0(1− α)(α
)Pv
Generalized PageRank y =∑∞
=0 ƒ ()Pv
∑
ƒ () <∞
TotalRank ƒ () = 1+1 −
1+2
LinearRank ...HyperRank ...
Baeza-Yates et al. 2006
Gleich & Langville Choosing alpha AIM 10 / 21
40 60 80 100 120
40
60
80
mm
Pick a distributionMultiple surfers should have an impact!
Each person picks α from distribution A
↓x(E [A])
...
↓E [x(A)]
↘ ↙x(E [A]) 6= E [x(A)]
TotalRank : E [x(A)] : A ∼ U[0,1]Constantine & Gleich, Internet Mathematics, in press.
Gleich & Langville Choosing alpha AIM 11 / 21
40 60 80 100 120
40
60
80
mm
From users
Raw α
density
0.0
0.5
1.0
1.5
2.0
0.0 0.2 0.4 0.6 0.8 1.0
Sample mean μ̄ = 0.631.Gleich et al., WWW2010Note 257,664 users from Microsoft toolbar data
Gleich & Langville Choosing alpha AIM 12 / 21
40 60 80 100 120
40
60
80
mm
ChoosingpersonalizationSlide 13 of 21
Choosing alpha
Choosing personalization
Related methods
Open issues
40 60 80 100 120
40
60
80
mm
Personalization choices
Application specific
É GeneRank : v = normalized microarray weightsÉ TopicRank: v = pages on the same topicÉ TrustRank: v = only pages known to be goodÉ BadRank: v = only pages known to be bad (an reverse the
graph)
Super-personalized
É Set v to have only a single non-zero : v = e.
Gleich & Langville Choosing personalization AIM 14 / 21
40 60 80 100 120
40
60
80
mm
Personalized PageRank
B = (1− α)(− αP)−1
Bj = “personalized score of page when jumping to page ”
Gleich & Langville Choosing personalization AIM 15 / 21
40 60 80 100 120
40
60
80
mm
Related methodsSlide 16 of 21
Choosing alpha
Choosing personalization
Related methods
Open issues
40 60 80 100 120
40
60
80
mm
PageRank history
See Vigna 2010: Spectral Ranking andFranceschet 2010: PageRank: Standing on the shoulder of giants.
Let A be the adjacency matrix of a graph.PageRank (− αP)x = (1− α)v (αP+ (1− α)veT )x = x
Seeley 1949 Px = x
Wei 1952 ATx = x
Katz 1953 (− αA)x = e
Hubbell 1965 ATx = x+ v
Gleich & Langville Related methods AIM 17 / 21
40 60 80 100 120
40
60
80
mm
Graph centrality
For a graph G, a score assigned to each vertex ∈ V is acentrality score if larger scores are “more central” vertices andthe score is independent of the labeling on the vertices.
Gleich & Langville Related methods AIM 18 / 21
40 60 80 100 120
40
60
80
mm
Open issuesSlide 19 of 21
Choosing alpha
Choosing personalization
Related methods
Open issues
40 60 80 100 120
40
60
80
mm
From
Vigna, A history of spectral ranking, MMDS2010
40 60 80 100 120
40
60
80
mm
Other issues
Gleich & Langville Open issues AIM 21 / 21
40 60 80 100 120
40
60
80
mm
QUESTIONS?