Universita degli Studi di Milanoboldi.di.unimi.it/download/TRdampingWithCover.pdf · Universita...
Transcript of Universita degli Studi di Milanoboldi.di.unimi.it/download/TRdampingWithCover.pdf · Universita...
Universita degli Studi di Milano
Dipartimento di Scienze dellrsquoInformazione
Rapporto interno N 305-05
amp
$
The Choice of a Damping Function forPropagating Importance in Link-Based
Ranking
Ricardo Baeza-Yates Paolo Boldi Carlos Castillo
The Choice of a Damping Functionfor Propagating Importance in Link-Based Ranking
Ricardo Baeza-YatesICREA Professor
Depto de TecnologıaUniversitat Pompeu Fabra
Spain
Paolo Boldilowast
Dipartimento di ScienzedellrsquoInformazione
Universita degli Studi di MilanoItaly
Carlos CastilloCatedra Telefonica
Depto de TecnologıaUniversitat Pompeu Fabra
Spain
Abstract
This paper studies a family of link-based algorithms that propagate page importance through linksIn these algorithms there is a damping function that decreases with the distance so a direct link impliesmore endorsement than a link through a long path PageRank is the most widely known ranking functionof this family
We focus on three damping functions having linear exponential and hyperbolic decay on the lengthsof the paths The exponential decay corresponds to PageRank and the other functions are new Ouranalysis includes a comparison among them and experiments for studying their behavior under differentparameters
1 Introduction
One of the measures of importance of a scientific paper is the number of citations that the article receivesFollowing this idea several authors proposed to use links for ranking web pages [Mar97 JM98 Li98] how-ever it quickly become clear that just counting the links was not a very reliable measure of authoritativeness(it was not in scientific citations either) because it is very easy to manipulate in the context of the webwhere creating a page costs nearly nothing
The PageRank technique introduced by Pageet al [PBMW98] actually tries to mend this problem bylooking at the importance of a page in a recursive manner In other words ldquoa page with high PageRank is apage referenced by many pages with high PageRankrdquo PageRank not only counts the direct links to a pagebut also includes indirect links The same is valid for scientific citations
PageRank turns out to be a simple robust and reliable way to measure the importance of web pages andit can be computed in a very efficient way For these reasons most of todayrsquos commercial search enginesare believed to use PageRank as an important source of ranking
In this paper we
bull describe general ranking functions that depend on incoming paths of varying lengths
bull show that PageRank belongs to this class of functions
bull show how to compute these rankings
lowastPartially supported by MIUR COFIN Project ldquoLinguaggi formali e automirdquo
1
bull suggest why 085 is a good choice of the parameter for PageRank and
bull compare the ranking orders induced by different ranking functions finding ways of approximatingPageRank up to very high precision
The rest of this paper is organized as followsSection2 introduces the notion of functional rankingSec-tion 3 describes three damping functionsSection4 compares them analytically andSection5 experimentswith different parameters for each function Finallythe last sectionpresents our conclusions
2 Functional Rankings
In this section we introduce the notion offunctional ranking a general family of ranking functions thatincludes PageRank To describe PageRank formally consider a web graph ofN pages LetANtimesN be thelink matrix in this graphai j = 1 iff there is a link from pagei to pagej This link matrix is seldom used asit is mainly for two reasons
Normalization In the Web creating an out-link is free so there is an incentive for web page authors tocreate pages with many out-links this is the reason why a metaphor of ldquovotingrdquo is enforced [Lif00]in which each page has only one ldquovoterdquo that has to be split among its linked pages This is typicallydone in link-based ranking by normalizingA row-wise the normalization process means that everyweb page can only decide how to divide its own score among the pages it leads to but it cannot assignmore score than it has Another way to look at normalization is that the matrix is turned into thetransition matrix of a stochastic process
The normalization does not need to give each out-link the same value as there is evidence that weblinks have different purposes such as navigating in a multi-page set expanding the contents of the cur-rent page pointing to another resource etc [HG98] Also links within the same site can be consideredself-links and as such do not confer as much authority as a link between different sites indeed thereare ranking methods like BHITS [BH98] or Altavistarsquos Eigenrank that treat them differently Othercharacteristics of links such as if they appear at the beginning or the bottom of the page or if theyappear within a certain HTML element can also be used for non-uniform normalization [BYD04]
We will assume uniform normalization so if a page hasd out-links each of those links has a weightof 1d but the results of this paper can be applied to other forms of normalization
Dangling nodesSpecial attention should be paid to the possible presence of nodes with no outgoing arcs(also known as ldquodangling nodesrdquo) in fact dangling nodes fail to produce a row-stochastic matrixbecause the rows of dangling nodes are filled with zeroes Dangling nodes can be dealt with byadding an extra node that is linked to and from all other nodes or by introducing new arcs from eachdangling node to every node in the graph In our analysis we shall assume that all dangling nodeshave been eliminated already in some way so that we do not have to worry about their presence Allthe algorithms we will present can be modified so that dangling nodes can be dealt with explicitly andwith virtually no additional cost
Let P be the row-normalized link matrix of the graph withN nodes PageRankr(α) is defined as thestationary distribution of the matrix
αP+(1minusα)1Tv
whereα isin [01) is a parameter calleddamping factor(sometimes also called a dampening factor) andv isa fixedpreference vectorthat may represent the interests of a particular user or another ranking vector that
2
is used for weighting pages Note that the above matrix is ergodic (at least if every entry ofv is strictlypositive) so it has exactly one stationary distribution Even though most of our results can be easily restatedwith a non-uniform preference vectorv for the sake of clarity we shall only consider the uniform preference1N in the rest of the paper
As observed in [BSV05] the PageRank vectorr(α) can be written as
r(α) = (1minusα)infin
sumt=0
αt 1N
1Pt
Or in matricial form
r(α) = (1minusα)1N
1(I minusαP)minus1 ||αP||lt 1
There is an equivalent and actually very intriguing way of rewriting this formula mentioned in [NSW01]that leads to a conclusion similar to those of [Bri06] given a pathp= 〈x1x2 xk〉 in the graph we defineits branching contributionas follows
branching(p) =1
d1d2 middot middot middotdkminus1
whered j is the outdegree this is the number of outgoing arcs of nodex j Then the ranking of nodei according to PageRank is
r i(α) = sumpisinPath(minusi)
(1minusα)α|p|
Nbranching(p)
where Path(minus i) is the set of all paths into nodei and |p| is the length of pathp this is because(Pt)i j
contains the sum of the branching contributions of all paths of lengtht from i to j as one can easily show byinduction ont This way of expressing the PageRank of a node is interesting because it highlights the factthat the rank of a node is essentially obtained as a weighted sum of contributions coming from every pathentering into the node with weights that decay exponentially in the length of the path
A natural generalization of this idea consists in taking into consideration a rankingR of the generalform
R =infin
sumt=0
damping(t)1N
1middotPt
or equivalently
Ri = sumpisinPath(minusi)
damping(|p|) 1N
branching(p)
where the damping function is a suitable choice of weights We will refer to this form of ranking asfunc-tional ranking because it is parametrized by the damping function As we have seen generic PageRank isa functional ranking where the damping function
damping(t) = (1minusα)αt
decays exponentially fast
3
3 Damping Functions
In this section we show several functional rankings by describing their damping functions First we showwhich class of damping functions generate well-defined functional rankings
As shown in [Bri06 Corollary 24] for every pair of nodesi and j and for every lengtht
sumpisinPath(i j)|p|=t
branching(p)le 1
A more general property holds
Theorem 1 For every node i and every length t
sumpisinPath(iminus)|p|=t
branching(p) = 1
Proof By induction ont For t = 0 there is only one path fromi of length 0 and its branching is 1 Forthe inductive stepsumpisinPath(iminus)|p|=t+1branching(p) can be rewritten by observing that ifi has outdegreedi every path fromi of lengtht +1 is the concatenation ofi with a path of lengtht from an out-neighbor ofi
sumpisinPath(iminus)|p|=t+1
branching(p) = sumjirarr j
1di
sumpisinPath( jminus)|p|=t
branching(p) = sumjirarr j
1di
= 1
As a consequence to guarantee that the functional ranking is well-defined and normalized (ie that rankvalues sum to 1) we need
N
sumi=1
sumpisinPath(minusi)
damping(|p|) 1N
branching(p) = 1
that isinfin
sumt=0
damping(t)1N sum
pisinPath(minusminus)branching(p) = 1
UsingTheorem1 sumpisinPath(minusminus) branching(p) = N so the latter equality is equivalent to
infin
sumt=0
damping(t) = 1
Hence every choice of the damping function such thatsuminfint=0damping(t) = 1 yields a well-defined nor-
malized functional ranking Nonetheless not all choices are equivalent so we have to find out whichfunctions generate better rankings Since a direct link should be more valuable as a source of evidence thana distant link we focus on damping functions that are decreasing ont the length of the paths
31 Linear damping
Letrsquos start by considering a simple damping function such as
damping(t) =
2(Lminust)L(L+1) t lt L
0 t ge L
4
this is a damping function that decreases linearly with distance and reaches zero at distanceL The trivialcaseL = 1 gives a uniform ranking andL = 2 is ranking by indegree as in the latter case all paths of lengthge 2 are not considered
From the definition
R =infin
sumt=0
damping(t)vPt
=L
sumt=0
2(Lminus t)L(L+1)
vPt
=2
L(L+1)v
Lminus1
sumt=0
(Lminus t)Pt
=2
L(L+1)v(L(I minusP)minusP(I minusPL))((I minusP)2)minus1
provided that(I minusP)2 is not singularAn advantage of this type of ranking is that only the first few levels are considered so the computation
is fast and the number of iterations is fixed
Computation For computing this functional ranking we can define the following sequence
R(0) =2
L+1v
R(k+1) =(Lminuskminus1)
(Lminusk)RkP
The functional ranking with linear damping issumLminus1k=0 R(k) An algorithm for computing this ranking
shown inFigure1 arises directly from this summation
32 Exponential damping PageRank
As we already noted PageRank can be seen as a functional ranking where the damping function decaysexponentially
damping(t) = (1minusα)αt
As longer paths have less importance in the calculation of PageRank it could be approximated by usingonly a few levels of links In [CGS04] it is shown that by using only the nodes at distance 1 from a targetnode (equivalent to linear damping withL = 2) PageRank can be approximated with 30 of average errorUsing nodes at distance 2 the average error drops to 20 and at distance 3 to 10 After that there are nosignificant improvements by adding more levels and the cost (the number of nodes to be explored) is muchhigher
Computation Since PageRank is the principal eigenvector of the modified graph matrix it can be easilyapproximated by the iterative Power Method algorithm as suggested by Pageet al in their original pa-per [PBMW98] this iterative algorithm gives good approximations (both in norm and with respect to theinduced node order) in few iterations even though convergence speed and numerical stability decay whenαgets close to 1 [HK03b HK03a] Other methods to compute PageRank have been proposed some of them
5
Require L maximum path length N number of nodesv preference vector1 for i 1 N do Initialization2 S[i] larr R[i] larr 2v[i](L+1)3 end for4 for step 1 Lminus1 do Iteration step5 Auxlarr 06 for i 1 N do Follow links in the graph7 for all j such that there is a link fromi to j do8 Aux[j] larr Aux[j] + R[i]outdegree(i)9 end for
10 end for11 for i 1 N do Add to ranking value12 R[i] larr Aux[i] times (Lminusstep)
(Lminus(stepminus1))13 S[i] larr S[i] + R[i]14 end for15 end for16 return R
Figure 1 Algorithm for computing a functional ranking with linear damping
using techniques for the solution of systems of linear equations some other concentrating on some specificfeatures of the web as a graph that determine forms of locality in the computation of PageRank (see forexample [PBMW98 Hav99 GG04 LGZ04 KHMG03b KHMG03a])
33 Quadratic hyperbolic damping TotalRank
Recently a ranking method called TotalRank [Bol05] has been proposed The method aims at eliminatingthe necessity for an arbitrary parameter by integrating PageRank over the entire range ofα If r(α) is thevector of PageRank then TotalRank is defined as
T =int 1
0r(α)dα
T can be written as
int 1
0r(α)dα =
1N
infin
sumt=0
int 1
0(1minusα)αt1middotPtdα
=1N
infin
sumt=0
1(t +1)(t +2)
1middotPt
By using the definition of the logarithm of a matrix
ln(I minusP) =minusinfin
sumk=1
Pk
k
we can write TotalRank as
T = Pminus1(I +(I minusPminus1) ln(I minusP))
6
provided thatPminus1 is not singular andP 6= I TotalRank is as a weighted sum of the score associated with paths of varying lengths in which the
weights are hyperbolically decreasing on the lengths of the paths In other words TotalRank is a functionalranking with damping function
damping(t) =1
(t +1)(t +2)
and it is well defined assuminfint=0damping(t) = 1
Computation It is known that the cost of calculating TotalRank is the same as the cost of calculatingPageRank via the Power Method [BSV05] even though some more iterations are required to obtain thesame precision
34 General hyperbolic damping HyperRank
TotalRank is part of a more general family of weighting schemes for paths of different lengths that can beapproximated using
s(β) =1
Nζ(β)
infin
sumt=0
1
(t +1)β 1middotPt
Again this way of ranking follows the general scheme with damping function chosen as
damping(t) =1
ζ(β)(t +1)β
Here we are using Riemannrsquos zeta functionζ(β) = suminfint=1
1tβ for normalization and we needβ gt 1 for it
to converge Note that whenβ = 2 we get weights similar to those of TotalRank in which thet-th coefficientis 1(t +1)(t +2) whereas here it is 1ζ(2)(t +1)2
A meaningful choice forβ should be done considering the distribution of paths of different lengths in ascale-free graph A largeα in PageRank or a smallβ in HyperRank means increasing the effect of longerpaths in the score
Computation Figure2 shows an algorithm for approximating HyperRank Let us define a vector se-quenceR(t) as follows
R(0) =1
Nζ(β)
R(k+1) =(
k+1k+2
)βR(k)P
It is easy to see thatsuminfint=0R(k) = s(β) becauseR(k) = 1(N middotζ(β)(k+1)β))1middotPk this observation leads
to the algorithm Note that convergence speed is much slower than ordinary PageRank especially whenβis close to 1 the norm of thek-th summand being bound by 1(1+ 1k)β Interestingly enough thoughconvergence speed is reasonable ifβ is sufficiently large
7
Require N number of nodesv preference vectorβ damping parameter1 for i 1 N do Initialization2 S[i] larr R[i] larr v[i]ζ(β)3 end for4 klarr 05 while not convergeddo Iteration step6 klarr k + 17 Auxlarr 08 for i 1 N do Follow links in the graph9 for all j such that there is a link fromi to j do
10 Aux[j] larr Aux[j] + R[i]outdegree(i)11 end for12 end for13 for i 1 N do Add to ranking value14 R[i] larr Aux[i] times(k(k+1))β
15 S[i] larr S[i] + R[i]16 end for17 end while18 return S
Figure 2 An algorithm to compute general hyperbolic rank
35 An empirical damping
An empirical damping function would consider how much the value of an endorsement decreases by fol-lowing longer paths in the real web graph This cannot be known exactly but we can attempt to measureit indirectly Pages that link to each other are more similar than pages chosen at random [Dav00] evi-dence from topical crawlers [SPM05] shows that when doing breadth-first exploring the topic ldquodriftsrdquo asthe distance increases On the same line of though we propose to use the decrease of text similarity as anapproximation to an ldquoempiricalrdquo damping function
To find out which is the correlation between link-distance and similarity we performed the followingexperiment we considered a web graph corresponding to a partial snapshot of theuk domain and sampled200 nodes at random For each sampled node we followed links backwards to obtain nodes at a minimumdistance of 1 2 3 4 or 5 links Then we sampled 12000 pairs at each minimum distance at randomand computed their similarities with the original nodes Similarity was measured using the normalization ofTFIDF [BYRN99] without stemming nor stopwords removal
The resulting averages are shown inFigure3 with standard deviation error bars Text similarity clearlydecreases with distance and in some applications the empirical distribution of text similarity versus distancecould be used as an ldquoempiricalrdquo damping function Different measures of text similarity can yield differentdistributions for instance [WHAT04] uses the number of repeated words and phrases between pages andobtains a faster decrease in similarity
8
02
03
04
05
06
07
1 2 3 4 5
Ave
rage
text
sim
ilari
ty
Link distance
Figure 3 Link distance vs average text similarity
4 Comparing Damping Functions
A comparison of the damping functions described in the previous section is shown inFigure4 of coursehyperbolic damping functions decay asymptotically more slowly than exponential damping but notice thatfor short paths the latter may dominate the former in many cases
000
005
010
015
020
025
1 2 3 4 5 6 7
Wei
ght
Length of the path (t)
PageRank α=075PageRank α=050
TotalRankHyperRank β=2HyperRank β=3
Linear damping L=10
Figure 4 Weights given by the different damping functions for some values ofαandβ
In this section we aim at analyzing how similar are these functional rankings and how could we useone of the damping functions to approximate another with a suitable choice of parameters
9
41 Approximating TotalRank with PageRank
It has been shown experimentally that the rank correlation (Kendallrsquosτ) between TotalRank and PageRankis maximal whenα asymp 07 [Bol05] the maximum value forτ is over 095 meaning that for that specificchoice ofα PageRank and TotalRank induce almost equivalent ranking orders
We now want to approach the same problem in an analytic fashion more precisely we aim at study-ing the difference between TotalRank and PageRank by calculating the difference between their respectivedamping functions
dampingTotalRank(t) =1
(t +1)(t +2)dampingPageRank(α)(t) = (1minusα)αt
As they are normalized both damping functions have the same summation over the entire range oftOur approach is to consider the summation of their differences up to a maximum length for a path As thetwo functions are decreasing the difference in the first levels makes most of the difference in the rankingsIf ` is the maximum path length we are interested in we aim at minimizing this sum
`
sumt=0
(1
(t +1)(t +2)minus (1minusα)αt
)= α`+1minus 1
`+2
The minimum absolute value is 0 and it is obtained whenα is equal to
αlowast(`) =1
`+1radic
`+2= 1minus log`
`+O
(log2`
`2
)
Figure5 showsαlowast(`) as a function of Recall that for the World-Wide Web graph the average lengthof a path between two nodes when a path exists has been estimated in about 16 [BKM+00] or 19 [AJB99]but clearly today is over 20 Now in the range of path lengths between 15 and 20 the value ofαlowast(`)parameters that minimizes the difference between the exponentially decaying weights of PageRank and thehyperbolically decaying weights of TotalRank is roughly 085
Difference Optimal value
0506
0708
09
α0 5 10 15 20 25
Length
-02
-01
0
01
02
03
04
045
050
055
060
065
070
075
080
085
090
0 5 10 15 20 25
α that
min
imiz
es th
e di
ffer
ence
of
sum
of w
eigh
ts w
ith T
otal
Ran
k
Maximum length of the paths considered
Figure 5 Bestαlowast(`) for minimizing the difference of the sum of weights betweenPageRank and TotalRank
10
42 Approximating HyperRank with PageRank
Now we want to approximate the weights of
s(β) =1
Nζ(β)
infin
sumt=0
1
(t +1)β Pt
using the weights of
r(α) =1minusα
N
infin
sumt=0
αtPt
and we proceed again by considering paths up to a certain length
`
sumt=0
(1
ζ(β)(t +1)β minus (1minusα)αt
)The minimum can be zero and it is attained at
αlowast(`β) =
radic1minus 1
ζ(β)
`
sumt=0
1
(t +1)β
Theα that minimizes the difference of weights for different values ofβ and of the maximum path lengths` is shown inFigure6 In the case ofβ = 2 for instance for path lengths up to 10 to 20 the bestα is between075 and 085
α
15202530β 5 10 15 20 25
Length
03
04
05
06
07
08
09
1
00
01
02
03
04
05
06
07
08
09
10
1 15 2 25 3 35 4
α th
at m
inim
izes
the
diff
eren
ce o
fsu
m o
f wei
ghts
Exponent β
Max path length=25Max path length=20Max path length=15Max path length=10
Max path length=5
Figure 6 Bestα for minimizing the difference of the sum of weights betweenPageRank and HyperRank with exponentβ for various path lengths
11
43 Approximating PageRank with LinearRank
For approximating the damping function of PageRank with the damping function of LinearRank we con-sider the summation of the differences up to a certain path length If`le L
`
sumt=0
((1minusα)αt minus 2(Lminus t)
L(L+1)
)And if ` gt L
Lminus1
sumt=0
((1minusα)αt minus 2(Lminus t)
L(L+1)
)+
`
sumt=L
(1minusα)αt
We will assume thatle L so the evaluation of the difference between the two rankings is done in an areain which both rankings have non-zero values TheL that minimizes the difference for a given combinationof α and` is
Llowast(α `) = `+2`α`+1 +α`+1 +1+
radic(1+α`+1)2 +4`(`+2)α`+1
2(1minusα`+1)
= `+1+O(`α(`+1)2
)and we have plotted it for different values ofα and` in Figure7
L
0405
0607
0809
α5 10 15 20 25
Length
5
10
15
20
25
30
5
10
15
20
25
30
35
40
05 06 07 08 09
L th
at m
inim
izes
the
diff
eren
ce o
fsu
m o
f wei
ghts
Exponent α
Max path length=25Max path length=20Max path length=15Max path length=10Max path length=5
Figure 7 BestL for minimizing the difference of the sum of weights betweenLinearRank and LinearRank for various path lengths
12
5 Parameters for the Damping Functions
For our experiments we used several snapshots from the Web including theuk it andeuint domainsFor comparison we also considered a synthetic scale-free network produced according to the evolving modeldescribed by Kumaret al [KRR+00] (a combination of preferential attachment and random links) with theparameters suggested by Panduranganet al [PRU02] As far as the latter is concerned in the generatedgraph the exponents for the power-law in the center part of the distributions are -21 for in-degree andPageRank and -27 for out-degree we generated a 100000-nodes graph without disconnected nodes
In this section we study the behavior of the ranking functions for varying values of their parameters
51 Characteristic path lengths
In scale-free networks the distances between pairs of nodes follow a Gaussian distribution [AJB99] (theaverage is not given in their paper) Analytic estimations for the average distance of a graph of scale-freenetwork ofn nodes include
bull O(log(n)) [WS98]
bull O(log(n) log(np)) in sparse graphs withp links [CL01]
bull 1+ log(nz1) log(z2z1) wherez1 is the average indegree andz2 is the average number of nodes atdistance 2 [NSW01] and
bull O(log(n) log(log(n))) [BR04]
We did the following experiment starting from a node picked at random we followed the links back-wards and counted the number of nodes at different distancesFigure8 plots the average distances foundwhich appear to be growing (sub)logarithmically with the size of the graphFigure9 shows the distributionobtained in each sample For this experiment we are not counting the pages without in-links
3456789
10111213141516
105 106 107 108
Ave
rage
dis
tanc
e
Number of nodes (N)
Figure 8 Average distances versus number of nodes
If graphs of different sizes show different path lengths what is the effect of this in the ranking calcu-lation Letrsquos suppose that for a graph withN1 nodes it is found by experimental or analytic means that agood parameter for PageRank isαlowast1 Now we would like to have a good parameterαlowast2 for a graph with thesame properties except that the size of the new graph isN2 lt N1
13
it Web graph uk Web graph40 mill nodes 18 mill nodes
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
Avg distance 149 Avg distance 148
euint Web graph Synthetic graph800000 nodes 100000 nodes
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
Avg distance 100 Avg distance 42
Figure 9 Distribution of the average number of nodes at a certain distance froma given node
One possible approach consistent with what we have done so far is to consider that the sum of theweights up to the average path lengths of the graphs (L1 L2) have to be similar for both rankings to behavein a similar way If we take this approach the solution is
1minus (αlowast1)L1+1 = 1minus (αlowast2)
L2+1
αlowast2 = (αlowast1)L1+1L2+1
αlowast2 asymp (αlowast1)log(N1)log(N2)
An example that can be used in practice is the following letrsquos consider a web graph withN1 = 115times109
pages (the size of the full Web estimated by [GS05]) and another graph with onlyN2 = 50times106 pages (thesize of the Web of a large country) the second graph is roughly 3 orders of magnitude smaller
If it is shown empirically thatαlowast1 = 085 is a good value for the PageRank parameter for the whole Webthenαlowast2 = 081 should have a similar behavior in the 50-million page set which is natural as the path lengthsare shorter If the subset of web pages were even smaller for instanceN2 = 106 pages (the size of the webof a large organization) thenαlowast2 = 076
14
52 Damping parameters and in-degree
In this section we are using data from theuk Web graph and a 8500-nodes synthetic graph with the sameindegree and outdegree distribution than the previous synthetic graph We first measured the variance ofthe values from the ranking function as we consider that a high variance is good in a ranking functionas the relative values differ more We also measured the relationship between the ranking function andin-degree for different values of the parameters in terms of covariance correlation coefficient and rankingorders (Kendallrsquosτ) The results are shown inFigure10
Variance Covariance Correlation coefficient Kendallrsquosτ
000e+00200e-13400e-13600e-13800e-13100e-12120e-12140e-12160e-12180e-12200e-12
01 02 03 04 05 06 07 08 09
Var
ianc
e of
Pag
eRan
k
Damping factor α
00 times 100
50 times 10-5
10 times 10-4
15 times 10-4
20 times 10-4
25 times 10-4
30 times 10-4
35 times 10-4
01 02 03 04 05 06 07 08 09
Cov
aria
nce
betw
een
Page
Ran
k an
d in
-deg
ree
Damping factor α
07000710072007300740075007600770078007900800
01 02 03 04 05 06 07 08 09
Corr
elat
ion
coef
ficie
nt w
ith in
-deg
ree
Damping factor α
042
044
046
048
050
052
054
056
01 02 03 04 05 06 07 08 09Ken
dallrsquo
s τ b
etw
een
Page
Rank
and
in-d
egre
e
Damping factor α
005
010
015
020
025
030
035
040
045
050
055
01 02 03 04 05 06 07 08 09
Var
ianc
e of
Pag
eRan
k
Damping factor α
0 times 100
1 times 10-3
2 times 10-3
3 times 10-3
4 times 10-3
5 times 10-3
6 times 10-3
01 02 03 04 05 06 07 08 09
Cov
aria
nce
betw
een
Page
Ran
k an
d in
-deg
ree
Damping factor α
0995
0996
0997
0998
0999
1000
01 02 03 04 05 06 07 08 09
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor α
079
080
081
082
083
084
085
086
087
01 02 03 04 05 06 07 08 09Ken
dallrsquo
s τ b
etw
een
Page
Rank
and
in-d
egre
e
Damping factor α
Figure 10 Variance of PageRank and its relationship with indegree in terms ofcovariance correlation coefficient and Kendallrsquosτ coefficient for varying valuesof the parameterα Top uk web graph bottom synthetic graph
The variance is higher asα increases In regard to the relationship with indegree for company homepages it has been shown that the logarithm of the in-degree is correlated with PageRank [UCH03] Ourresults are consistent with this observation The covariance monotonically increases withα both in the caseof the synthetic graph and in the web graph Not surprisingly using the generative model the correlationsare higher We observe a maximum correlation atα = 07 in the synthetic graph and atα = 05 in the webgraph We also notice that the correlation drops significativelly asα increases because a largeα means thatlonger paths have an effect in the calculation note however that this phenomenon does not significantlyimpact on the correlation coefficient that is still very large
A high correlation between PageRank and in-degree is bad from the point of view of a search enginebecause it makes link-spam easier In particular as the correlation coefficient is higher in theuk web graphnear 05 if we chooseα close to this value we are helping link spammers Note however that a highcorrelation was foreseeable because as shown in[CGS04] even approximating PageRank with just only 1level of links gets 70 of accuracy
The behavior of the Kendallrsquosτ coefficient which measures the similarity between ranking orders isthe opposite than the one observed in the real graph This also happens for HyperRank as inFigure11we made the same measurements for this functional ranking and the results were consistent The graphseems inverted because a low value ofβ has the same effect as a high value ofα longer paths have moreimportance in the calculation
The differences in the behavior of the ranking order in the synthetic graph might be explained by thefact that the generative model we are using does not capture some properties that might be relevant for theranking order under a functional ranking such as the clustering coefficient Also the synthetic graph is
15
Variance Covariance Correlation coefficient Kendallrsquosτ
000e+00
100e-12
200e-12
300e-12
400e-12
500e-12
600e-12
700e-12
800e-12
900e-12
1 15 2 25 3 35 4
Var
ianc
e of
Gen
eral
Hyp
erbo
lic R
ank
Damping factor β
000e+00
100e-04
200e-04
300e-04
400e-04
500e-04
600e-04
700e-04
800e-04
1 15 2 25 3 35 4Cov
aria
nce
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
0730
0740
0750
0760
0770
0780
0790
0800
1 15 2 25 3 35 4
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor β
04600
04700
04800
04900
05000
05100
05200
05300
1 15 2 25 3
Ken
dallrsquo
s τ
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
000e+00
200e-05
400e-05
600e-05
800e-05
100e-04
120e-04
140e-04
160e-04
1 15 2 25 3 35 4
Var
ianc
e of
Gen
eral
Hyp
erbo
lic R
ank
Damping factor β
000e+00
200e-03
400e-03
600e-03
800e-03
100e-02
120e-02
1 15 2 25 3 35 4Cov
aria
nce
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
0997
0997
0998
0998
0999
0999
1 15 2 25 3 35 4
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor β
0815008200082500830008350084000845008500085500860008650
1 15 2 25 3 35 4
Ken
dallrsquo
s τ b
etw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
Figure 11 Variance of general hyperbolic rank for some values ofβ and itsrelation with in-degree Experiments have been performed on theuk graph (top)and synthetic graph (bottom)
assortative (highly linked pages are mostly linked to other highly linked pages) while the real web graph isdisassortative (most of the neighbours of highly linked pages have small indegree)
53 Experimental comparison of ranking orders
In this section we present experimental results about the similarity between the ranking orders induced bysome of the functional rankings discussed in the previous sections To perform the comparison we usedKendallrsquosτ and data from the U K Web graphFigure12shows how PageRank compares with HyperRankfor various pairs ofα andβ In the limit αβrarr 1 both rankings are equivalent and they remain similar in alarge region of the parameter space
α
β
07
08
09
1
τ ge 095
τ
02 03
04 05
06 07
08 09
2 25
3 35
4
01
02
03
04
05
06
07
08
09
15 2 25 3 35 4
α th
at m
axim
izes
Ken
dallrsquo
s τ
Exponent β
Actual optimumPredicted optimum with length=5
Figure 12 Comparison (using Kendallrsquosτ) between PageRank and HyperRankwith various damping parameters in the UK web graph The optimum predictedin the analysis with = 5 is very close to the real one
In the figure we can see that the rankings obtained with HyperRank and PageRank can be almostequivalent (Kendallrsquosτ ge 095) moreover the analysis shown insection42 considering only paths of
16
lengths less than 5 provides a very good approximation for the optimum combination of parameters Thismeans that in fact the difference in the damping functions in the first few levels is crucial
The exponentsβ required for giving a good approximation of PageRank are very small whenα ge 07limiting the practical applicability of HyperRank as its convergence is not faster than the one of PageRank
As far as LinearRank and PageRank are concerned long paths and largeα should be considered toobtain a sufficiently similar ranking as shown inFigure13
05 06
07 08
09
5 10
15 20
25
080085090095100
τ
τ ge 095
αL
5
10
15
20
25
05 06 07 08 09L
that
max
imiz
es K
enda
llrsquos
τExponent α
Actual optimumPredicted optimum with length=5
Figure 13 Comparison (using Kendallrsquosτ) between PageRank and LinearRankin the UK web graph with various damping parameters Again the predictedoptimum with` = 5 is very close to the actual optimum
The predicted optimum given insection43 with ` = 5 this is considering only the summation of thedifferences between both damping functions up to paths of length 5 is very close to what was obtained inpractice Forα = 08 calculating LinearRank withL = 10 (which means the same number of iterations)givesτ ge 098 for α = 09 calculating LinearRank withL = 15 also givesτ ge 098 In both cases theranking order of PageRank is approximated by the ranking order of LinearRank with very high precision
6 Conclusions
In this paper we have defined a broad class of link-based ranking algorithms based on the contribution ofdamping factors along all different paths reaching a page We introduce four particular damping decays lin-ear exponential quadratic hyperbolic and general hyperbolic where exponential is equivalent to PageRankand quadratic hyperbolic to TotalRank
We studied the differences and similarities among these ranking algorithms and we have found that
bull Functional rankings using different damping functions can provide similar orderings if the parametersare chosen carefully
bull LinearRank can be used for calculating a ranking that is as good as PageRank but with a fixed andsmaller number of iterations
bull The parameters for the damping functions depend on the characteristic of path lengths in the graphwhich is known to grow sub-logarithmically on the size of the graph
More work is needed to find other damping functions that compute rankings similar to PageRank butare easier and faster to compute We use global ranking similarity but another measure could be the ranking
17
similarity in the top 20 results of real queries In this setting our results can change so future work willinclude this variation
Because of their high cost link-based ranking methods that involve iterative calculations at query timeare probably not used by large-scale search engines at this moment but the functional ranking with lineardamping we have presented can provide a good approximation with few iterations Also the approach wehave presented could be also applied to multivalued ranking functions such as HITS [Kle99] and topic-sensitive PageRank [Hav02] to obtain for instance a method for approximating the hubs and authorityscores using less iterations and a linear damping function
Our approach also helps to understand how easy or difficult is to collude many pages to modify theranking of a given page Clearly there are many different factors path lengths damping function branchingdegrees and number of colluded pages The graph structure of the collusion will affect those factors and weplan to analyze them
Acknowledgements
We would like to thank Daniel Fogaras for a valuable discussion about TotalRank that motivated part of thisresearch
References
[AJB99] Reka Albert Hawoong Jeong and Albert L Barabasi Diameter of the world wide webNature 401130ndash131 1999
[BH98] Krishna Bharat and Monika R HenzingerImproved algorithms for topic distillation in ahyperlinked environment In Proceedings of the 21st Annual International ACM SIGIR Con-ference on Research and Development in Information Retrieval pages 104ndash111 MelbourneAustralia August 1998 ACM Press New York
[BKM +00] Andrei Broder Ravi Kumar Farzin Maghoul Prabhakar Raghavan Sridhar RajagopalanRaymie Stata Andrew Tomkins and Janet WienerGraph structure in the web Experimentsand models In Proceedings of the Ninth Conference on World Wide Web pages 309ndash320Amsterdam Netherlands May 2000 ACM Press
[Bol05] Paolo BoldiTotalrank ranking without damping In Poster proceedings of the 14th interna-tional conference on World Wide Web pages 898ndash899 Chiba Japan 2005 ACM Press
[BR04] Bela Bollobas and Oliver RiordanThe diameter of a scale-free random graph Combinator-ica 24(1)5ndash34 2004
[Bri06] Michael Brinkmeier PageRank revisited ACM Transaction on Internet Technologies6(3)257ndash279 2006
[BSV05] Paolo Boldi Massimo Santini and Sebastiano VignaPagerank as a function of the dampingfactor In Proceedings of the 14th international conference on World Wide Web pages 557ndash566 Chiba Japan 2005 ACM Press
[BYD04] Ricardo Baeza-Yates and Emilio DavisWeb page ranking using link attributes In Alternatetrack papers amp posters of the 13th international conference on World Wide Web pages 328ndash329 New York NY USA 2004 ACM Press
18
[BYRN99] Ricardo Baeza-Yates and Berthier Ribeiro-NetoModern Information Retrieval AddisonWesley May 1999
[CGS04] Yen-Yu Chen Qingqing Gan and Torsten SuelLocal methods for estimating pagerank val-ues In Proceedings of the thirteenth ACM conference on Information and knowledge man-agement (CIKM) pages 381ndash389 New York NY USA 2004 ACM Press
[CL01] Fan Chung and Linyuan LuThe diameter of random sparse graphs Adv Appl Math26257ndash279 2001
[Dav00] Brian D DavisonTopical locality in the web In Proceedings of the 23rd annual internationalACM SIGIR conference on research and development in information retrieval pages 272ndash279Athens Greece 2000 ACM Press
[GG04] Gene H Golub and Chen GreifArnoldi-type algorithms for computing stationary distributionvectors with application to pagerank Technical Report SCCM-04-15 Stanford University2004
[GS05] Antonio Gulli and Alessio SignoriniThe indexable Web is more than 115 billion pages InPoster proceedings of the 14th international conference on World Wide Web pages 902ndash903Chiba Japan 2005 ACM Press
[Hav99] Taher HaveliwalaEfficient computation of pagerank Technical report Stanford University1999
[Hav02] Taher H HaveliwalaTopic-sensitive pagerank In Proceedings of the Eleventh World WideWeb Conference pages 517ndash526 Honolulu Hawaii USA May 2002 ACM Press
[HG98] Stephanie W Haas and Erika S GramsPage and link classifications connecting diverseresources In DL rsquo98 Proceedings of the third ACM conference on Digital libraries pages99ndash107 New York NY USA 1998 ACM Press
[HK03a] Taher Haveliwala and Sepandar KamvarThe condition number of the pagerank problemTechnical Report 36 Stanford University 2003
[HK03b] Taher Haveliwala and Sepandar KamvarThe second eigenvalue of the google matrix Tech-nical Report 20 Stanford University 2003
[JM98] W-K Joo and S H Myaeng Improving retrieval effectiveness with hyperlink informationIn Proceedings of International Workshop on Information Retrieval with Asian Languages(IRAL) Singapore October 1998
[KHMG03a] S Kamvar T Haveliwala C Manning and G GolubExploiting the block structure of theweb for computing pagerank 2003
[KHMG03b] Sepandar D Kamvar Taher H Haveliwala Christopher D Manning and Gene H GolubExtrapolation methods for accelerating pagerank computations In Proceedings of the twelfthinternational conference on World Wide Web pages 261ndash270 ACM Press 2003
[Kle99] Jon M KleinbergAuthoritative sources in a hyperlinked environment Journal of the ACM46(5)604ndash632 1999
19
[KRR+00] R Kumar P Raghavan S Rajagopalan D Sivakumar A Tomkins and E UpfalStochasticmodels for the web graph In Proceedings of the 41st Annual Symposium on Foundations ofComputer Science (FOCS) pages 57ndash65 Redondo Beach CA USA 2000 IEEE CS Press
[LGZ04] Chris P Lee Gene H Golub and Stefanos A ZeniosA fast two-stage algorithm for com-puting pagerank and its extensions Technical report Stanford University 2004
[Li98] Yanhong LiToward a qualitative search engine IEEE Internet Computing July 1998
[Lif00] Maxim LifantsevVoting model for ranking Web pages In Peter Graham and MuthucumaruMaheswaran editorsProceedings of the International Conference on Internet Computingpages 143ndash148 Las Vegas Nevada USA June 2000 CSREA Press
[Mar97] Massimo MarchioriThe quest for correct information of the Web hyper search engines InProc of the sixth international conference on the Web Santa Clara USA April 1997
[NSW01] M E Newman S H Strogatz and D J WattsRandom graphs with arbitrary degree distri-butions and their applicationsPhys Rev E Stat Nonlin Soft Matter Phys 64(2 Pt 2) August2001
[PBMW98] Lawrence Page Sergey Brin Rajeev Motwani and Terry WinogradThe PageRank citationranking bringing order to the Web Technical report Stanford Digital Library TechnologiesProject 1998
[PRU02] Gopal Pandurangan Prabhakara Raghavan and Eli UpfalUsing Pagerank to characterizeWeb structure In Proceedings of the 8th Annual International Computing and CombinatoricsConference (COCOON) volume 2387 ofLecture Notes in Computer Science pages 330ndash390Singapore August 2002 Springer
[SPM05] Padmini Srinivasan Gautam Pant and Filippo MenczerA general evaluation framework fortopical crawlers Information Retrieval 8(3)417ndash447 2005
[UCH03] Trystan Upstill Nick Craswell and David HawkingPredicting fame and fortune Page-rank or indegree In Proceedings of the Australasian Document Computing SymposiumADCS2003 pages 31ndash40 Canberra Australia December 2003
[WHAT04] Fang Wu Bernardo A Huberman Lada A Adamic and Joshua R TylerInformation flow insocial groups Physica A Statistical and Theoretical Physics 337(1-2)327ndash335 June 2004
[WS98] Duncan J Watts and Steven H StrogatzCollective dynamics of small-world networksNa-ture 393(6684)440ndash442 June 1998
20
- Introduction
- Functional Rankings
- Damping Functions
-
- Linear damping
- Exponential damping PageRank
- Quadratic hyperbolic damping TotalRank
- General hyperbolic damping HyperRank
- An empirical damping
-
- Comparing Damping Functions
-
- Approximating TotalRank with PageRank
- Approximating HyperRank with PageRank
- Approximating PageRank with LinearRank
-
- Parameters for the Damping Functions
-
- Characteristic path lengths
- Damping parameters and in-degree
- Experimental comparison of ranking orders
-
- Conclusions
-
The Choice of a Damping Functionfor Propagating Importance in Link-Based Ranking
Ricardo Baeza-YatesICREA Professor
Depto de TecnologıaUniversitat Pompeu Fabra
Spain
Paolo Boldilowast
Dipartimento di ScienzedellrsquoInformazione
Universita degli Studi di MilanoItaly
Carlos CastilloCatedra Telefonica
Depto de TecnologıaUniversitat Pompeu Fabra
Spain
Abstract
This paper studies a family of link-based algorithms that propagate page importance through linksIn these algorithms there is a damping function that decreases with the distance so a direct link impliesmore endorsement than a link through a long path PageRank is the most widely known ranking functionof this family
We focus on three damping functions having linear exponential and hyperbolic decay on the lengthsof the paths The exponential decay corresponds to PageRank and the other functions are new Ouranalysis includes a comparison among them and experiments for studying their behavior under differentparameters
1 Introduction
One of the measures of importance of a scientific paper is the number of citations that the article receivesFollowing this idea several authors proposed to use links for ranking web pages [Mar97 JM98 Li98] how-ever it quickly become clear that just counting the links was not a very reliable measure of authoritativeness(it was not in scientific citations either) because it is very easy to manipulate in the context of the webwhere creating a page costs nearly nothing
The PageRank technique introduced by Pageet al [PBMW98] actually tries to mend this problem bylooking at the importance of a page in a recursive manner In other words ldquoa page with high PageRank is apage referenced by many pages with high PageRankrdquo PageRank not only counts the direct links to a pagebut also includes indirect links The same is valid for scientific citations
PageRank turns out to be a simple robust and reliable way to measure the importance of web pages andit can be computed in a very efficient way For these reasons most of todayrsquos commercial search enginesare believed to use PageRank as an important source of ranking
In this paper we
bull describe general ranking functions that depend on incoming paths of varying lengths
bull show that PageRank belongs to this class of functions
bull show how to compute these rankings
lowastPartially supported by MIUR COFIN Project ldquoLinguaggi formali e automirdquo
1
bull suggest why 085 is a good choice of the parameter for PageRank and
bull compare the ranking orders induced by different ranking functions finding ways of approximatingPageRank up to very high precision
The rest of this paper is organized as followsSection2 introduces the notion of functional rankingSec-tion 3 describes three damping functionsSection4 compares them analytically andSection5 experimentswith different parameters for each function Finallythe last sectionpresents our conclusions
2 Functional Rankings
In this section we introduce the notion offunctional ranking a general family of ranking functions thatincludes PageRank To describe PageRank formally consider a web graph ofN pages LetANtimesN be thelink matrix in this graphai j = 1 iff there is a link from pagei to pagej This link matrix is seldom used asit is mainly for two reasons
Normalization In the Web creating an out-link is free so there is an incentive for web page authors tocreate pages with many out-links this is the reason why a metaphor of ldquovotingrdquo is enforced [Lif00]in which each page has only one ldquovoterdquo that has to be split among its linked pages This is typicallydone in link-based ranking by normalizingA row-wise the normalization process means that everyweb page can only decide how to divide its own score among the pages it leads to but it cannot assignmore score than it has Another way to look at normalization is that the matrix is turned into thetransition matrix of a stochastic process
The normalization does not need to give each out-link the same value as there is evidence that weblinks have different purposes such as navigating in a multi-page set expanding the contents of the cur-rent page pointing to another resource etc [HG98] Also links within the same site can be consideredself-links and as such do not confer as much authority as a link between different sites indeed thereare ranking methods like BHITS [BH98] or Altavistarsquos Eigenrank that treat them differently Othercharacteristics of links such as if they appear at the beginning or the bottom of the page or if theyappear within a certain HTML element can also be used for non-uniform normalization [BYD04]
We will assume uniform normalization so if a page hasd out-links each of those links has a weightof 1d but the results of this paper can be applied to other forms of normalization
Dangling nodesSpecial attention should be paid to the possible presence of nodes with no outgoing arcs(also known as ldquodangling nodesrdquo) in fact dangling nodes fail to produce a row-stochastic matrixbecause the rows of dangling nodes are filled with zeroes Dangling nodes can be dealt with byadding an extra node that is linked to and from all other nodes or by introducing new arcs from eachdangling node to every node in the graph In our analysis we shall assume that all dangling nodeshave been eliminated already in some way so that we do not have to worry about their presence Allthe algorithms we will present can be modified so that dangling nodes can be dealt with explicitly andwith virtually no additional cost
Let P be the row-normalized link matrix of the graph withN nodes PageRankr(α) is defined as thestationary distribution of the matrix
αP+(1minusα)1Tv
whereα isin [01) is a parameter calleddamping factor(sometimes also called a dampening factor) andv isa fixedpreference vectorthat may represent the interests of a particular user or another ranking vector that
2
is used for weighting pages Note that the above matrix is ergodic (at least if every entry ofv is strictlypositive) so it has exactly one stationary distribution Even though most of our results can be easily restatedwith a non-uniform preference vectorv for the sake of clarity we shall only consider the uniform preference1N in the rest of the paper
As observed in [BSV05] the PageRank vectorr(α) can be written as
r(α) = (1minusα)infin
sumt=0
αt 1N
1Pt
Or in matricial form
r(α) = (1minusα)1N
1(I minusαP)minus1 ||αP||lt 1
There is an equivalent and actually very intriguing way of rewriting this formula mentioned in [NSW01]that leads to a conclusion similar to those of [Bri06] given a pathp= 〈x1x2 xk〉 in the graph we defineits branching contributionas follows
branching(p) =1
d1d2 middot middot middotdkminus1
whered j is the outdegree this is the number of outgoing arcs of nodex j Then the ranking of nodei according to PageRank is
r i(α) = sumpisinPath(minusi)
(1minusα)α|p|
Nbranching(p)
where Path(minus i) is the set of all paths into nodei and |p| is the length of pathp this is because(Pt)i j
contains the sum of the branching contributions of all paths of lengtht from i to j as one can easily show byinduction ont This way of expressing the PageRank of a node is interesting because it highlights the factthat the rank of a node is essentially obtained as a weighted sum of contributions coming from every pathentering into the node with weights that decay exponentially in the length of the path
A natural generalization of this idea consists in taking into consideration a rankingR of the generalform
R =infin
sumt=0
damping(t)1N
1middotPt
or equivalently
Ri = sumpisinPath(minusi)
damping(|p|) 1N
branching(p)
where the damping function is a suitable choice of weights We will refer to this form of ranking asfunc-tional ranking because it is parametrized by the damping function As we have seen generic PageRank isa functional ranking where the damping function
damping(t) = (1minusα)αt
decays exponentially fast
3
3 Damping Functions
In this section we show several functional rankings by describing their damping functions First we showwhich class of damping functions generate well-defined functional rankings
As shown in [Bri06 Corollary 24] for every pair of nodesi and j and for every lengtht
sumpisinPath(i j)|p|=t
branching(p)le 1
A more general property holds
Theorem 1 For every node i and every length t
sumpisinPath(iminus)|p|=t
branching(p) = 1
Proof By induction ont For t = 0 there is only one path fromi of length 0 and its branching is 1 Forthe inductive stepsumpisinPath(iminus)|p|=t+1branching(p) can be rewritten by observing that ifi has outdegreedi every path fromi of lengtht +1 is the concatenation ofi with a path of lengtht from an out-neighbor ofi
sumpisinPath(iminus)|p|=t+1
branching(p) = sumjirarr j
1di
sumpisinPath( jminus)|p|=t
branching(p) = sumjirarr j
1di
= 1
As a consequence to guarantee that the functional ranking is well-defined and normalized (ie that rankvalues sum to 1) we need
N
sumi=1
sumpisinPath(minusi)
damping(|p|) 1N
branching(p) = 1
that isinfin
sumt=0
damping(t)1N sum
pisinPath(minusminus)branching(p) = 1
UsingTheorem1 sumpisinPath(minusminus) branching(p) = N so the latter equality is equivalent to
infin
sumt=0
damping(t) = 1
Hence every choice of the damping function such thatsuminfint=0damping(t) = 1 yields a well-defined nor-
malized functional ranking Nonetheless not all choices are equivalent so we have to find out whichfunctions generate better rankings Since a direct link should be more valuable as a source of evidence thana distant link we focus on damping functions that are decreasing ont the length of the paths
31 Linear damping
Letrsquos start by considering a simple damping function such as
damping(t) =
2(Lminust)L(L+1) t lt L
0 t ge L
4
this is a damping function that decreases linearly with distance and reaches zero at distanceL The trivialcaseL = 1 gives a uniform ranking andL = 2 is ranking by indegree as in the latter case all paths of lengthge 2 are not considered
From the definition
R =infin
sumt=0
damping(t)vPt
=L
sumt=0
2(Lminus t)L(L+1)
vPt
=2
L(L+1)v
Lminus1
sumt=0
(Lminus t)Pt
=2
L(L+1)v(L(I minusP)minusP(I minusPL))((I minusP)2)minus1
provided that(I minusP)2 is not singularAn advantage of this type of ranking is that only the first few levels are considered so the computation
is fast and the number of iterations is fixed
Computation For computing this functional ranking we can define the following sequence
R(0) =2
L+1v
R(k+1) =(Lminuskminus1)
(Lminusk)RkP
The functional ranking with linear damping issumLminus1k=0 R(k) An algorithm for computing this ranking
shown inFigure1 arises directly from this summation
32 Exponential damping PageRank
As we already noted PageRank can be seen as a functional ranking where the damping function decaysexponentially
damping(t) = (1minusα)αt
As longer paths have less importance in the calculation of PageRank it could be approximated by usingonly a few levels of links In [CGS04] it is shown that by using only the nodes at distance 1 from a targetnode (equivalent to linear damping withL = 2) PageRank can be approximated with 30 of average errorUsing nodes at distance 2 the average error drops to 20 and at distance 3 to 10 After that there are nosignificant improvements by adding more levels and the cost (the number of nodes to be explored) is muchhigher
Computation Since PageRank is the principal eigenvector of the modified graph matrix it can be easilyapproximated by the iterative Power Method algorithm as suggested by Pageet al in their original pa-per [PBMW98] this iterative algorithm gives good approximations (both in norm and with respect to theinduced node order) in few iterations even though convergence speed and numerical stability decay whenαgets close to 1 [HK03b HK03a] Other methods to compute PageRank have been proposed some of them
5
Require L maximum path length N number of nodesv preference vector1 for i 1 N do Initialization2 S[i] larr R[i] larr 2v[i](L+1)3 end for4 for step 1 Lminus1 do Iteration step5 Auxlarr 06 for i 1 N do Follow links in the graph7 for all j such that there is a link fromi to j do8 Aux[j] larr Aux[j] + R[i]outdegree(i)9 end for
10 end for11 for i 1 N do Add to ranking value12 R[i] larr Aux[i] times (Lminusstep)
(Lminus(stepminus1))13 S[i] larr S[i] + R[i]14 end for15 end for16 return R
Figure 1 Algorithm for computing a functional ranking with linear damping
using techniques for the solution of systems of linear equations some other concentrating on some specificfeatures of the web as a graph that determine forms of locality in the computation of PageRank (see forexample [PBMW98 Hav99 GG04 LGZ04 KHMG03b KHMG03a])
33 Quadratic hyperbolic damping TotalRank
Recently a ranking method called TotalRank [Bol05] has been proposed The method aims at eliminatingthe necessity for an arbitrary parameter by integrating PageRank over the entire range ofα If r(α) is thevector of PageRank then TotalRank is defined as
T =int 1
0r(α)dα
T can be written as
int 1
0r(α)dα =
1N
infin
sumt=0
int 1
0(1minusα)αt1middotPtdα
=1N
infin
sumt=0
1(t +1)(t +2)
1middotPt
By using the definition of the logarithm of a matrix
ln(I minusP) =minusinfin
sumk=1
Pk
k
we can write TotalRank as
T = Pminus1(I +(I minusPminus1) ln(I minusP))
6
provided thatPminus1 is not singular andP 6= I TotalRank is as a weighted sum of the score associated with paths of varying lengths in which the
weights are hyperbolically decreasing on the lengths of the paths In other words TotalRank is a functionalranking with damping function
damping(t) =1
(t +1)(t +2)
and it is well defined assuminfint=0damping(t) = 1
Computation It is known that the cost of calculating TotalRank is the same as the cost of calculatingPageRank via the Power Method [BSV05] even though some more iterations are required to obtain thesame precision
34 General hyperbolic damping HyperRank
TotalRank is part of a more general family of weighting schemes for paths of different lengths that can beapproximated using
s(β) =1
Nζ(β)
infin
sumt=0
1
(t +1)β 1middotPt
Again this way of ranking follows the general scheme with damping function chosen as
damping(t) =1
ζ(β)(t +1)β
Here we are using Riemannrsquos zeta functionζ(β) = suminfint=1
1tβ for normalization and we needβ gt 1 for it
to converge Note that whenβ = 2 we get weights similar to those of TotalRank in which thet-th coefficientis 1(t +1)(t +2) whereas here it is 1ζ(2)(t +1)2
A meaningful choice forβ should be done considering the distribution of paths of different lengths in ascale-free graph A largeα in PageRank or a smallβ in HyperRank means increasing the effect of longerpaths in the score
Computation Figure2 shows an algorithm for approximating HyperRank Let us define a vector se-quenceR(t) as follows
R(0) =1
Nζ(β)
R(k+1) =(
k+1k+2
)βR(k)P
It is easy to see thatsuminfint=0R(k) = s(β) becauseR(k) = 1(N middotζ(β)(k+1)β))1middotPk this observation leads
to the algorithm Note that convergence speed is much slower than ordinary PageRank especially whenβis close to 1 the norm of thek-th summand being bound by 1(1+ 1k)β Interestingly enough thoughconvergence speed is reasonable ifβ is sufficiently large
7
Require N number of nodesv preference vectorβ damping parameter1 for i 1 N do Initialization2 S[i] larr R[i] larr v[i]ζ(β)3 end for4 klarr 05 while not convergeddo Iteration step6 klarr k + 17 Auxlarr 08 for i 1 N do Follow links in the graph9 for all j such that there is a link fromi to j do
10 Aux[j] larr Aux[j] + R[i]outdegree(i)11 end for12 end for13 for i 1 N do Add to ranking value14 R[i] larr Aux[i] times(k(k+1))β
15 S[i] larr S[i] + R[i]16 end for17 end while18 return S
Figure 2 An algorithm to compute general hyperbolic rank
35 An empirical damping
An empirical damping function would consider how much the value of an endorsement decreases by fol-lowing longer paths in the real web graph This cannot be known exactly but we can attempt to measureit indirectly Pages that link to each other are more similar than pages chosen at random [Dav00] evi-dence from topical crawlers [SPM05] shows that when doing breadth-first exploring the topic ldquodriftsrdquo asthe distance increases On the same line of though we propose to use the decrease of text similarity as anapproximation to an ldquoempiricalrdquo damping function
To find out which is the correlation between link-distance and similarity we performed the followingexperiment we considered a web graph corresponding to a partial snapshot of theuk domain and sampled200 nodes at random For each sampled node we followed links backwards to obtain nodes at a minimumdistance of 1 2 3 4 or 5 links Then we sampled 12000 pairs at each minimum distance at randomand computed their similarities with the original nodes Similarity was measured using the normalization ofTFIDF [BYRN99] without stemming nor stopwords removal
The resulting averages are shown inFigure3 with standard deviation error bars Text similarity clearlydecreases with distance and in some applications the empirical distribution of text similarity versus distancecould be used as an ldquoempiricalrdquo damping function Different measures of text similarity can yield differentdistributions for instance [WHAT04] uses the number of repeated words and phrases between pages andobtains a faster decrease in similarity
8
02
03
04
05
06
07
1 2 3 4 5
Ave
rage
text
sim
ilari
ty
Link distance
Figure 3 Link distance vs average text similarity
4 Comparing Damping Functions
A comparison of the damping functions described in the previous section is shown inFigure4 of coursehyperbolic damping functions decay asymptotically more slowly than exponential damping but notice thatfor short paths the latter may dominate the former in many cases
000
005
010
015
020
025
1 2 3 4 5 6 7
Wei
ght
Length of the path (t)
PageRank α=075PageRank α=050
TotalRankHyperRank β=2HyperRank β=3
Linear damping L=10
Figure 4 Weights given by the different damping functions for some values ofαandβ
In this section we aim at analyzing how similar are these functional rankings and how could we useone of the damping functions to approximate another with a suitable choice of parameters
9
41 Approximating TotalRank with PageRank
It has been shown experimentally that the rank correlation (Kendallrsquosτ) between TotalRank and PageRankis maximal whenα asymp 07 [Bol05] the maximum value forτ is over 095 meaning that for that specificchoice ofα PageRank and TotalRank induce almost equivalent ranking orders
We now want to approach the same problem in an analytic fashion more precisely we aim at study-ing the difference between TotalRank and PageRank by calculating the difference between their respectivedamping functions
dampingTotalRank(t) =1
(t +1)(t +2)dampingPageRank(α)(t) = (1minusα)αt
As they are normalized both damping functions have the same summation over the entire range oftOur approach is to consider the summation of their differences up to a maximum length for a path As thetwo functions are decreasing the difference in the first levels makes most of the difference in the rankingsIf ` is the maximum path length we are interested in we aim at minimizing this sum
`
sumt=0
(1
(t +1)(t +2)minus (1minusα)αt
)= α`+1minus 1
`+2
The minimum absolute value is 0 and it is obtained whenα is equal to
αlowast(`) =1
`+1radic
`+2= 1minus log`
`+O
(log2`
`2
)
Figure5 showsαlowast(`) as a function of Recall that for the World-Wide Web graph the average lengthof a path between two nodes when a path exists has been estimated in about 16 [BKM+00] or 19 [AJB99]but clearly today is over 20 Now in the range of path lengths between 15 and 20 the value ofαlowast(`)parameters that minimizes the difference between the exponentially decaying weights of PageRank and thehyperbolically decaying weights of TotalRank is roughly 085
Difference Optimal value
0506
0708
09
α0 5 10 15 20 25
Length
-02
-01
0
01
02
03
04
045
050
055
060
065
070
075
080
085
090
0 5 10 15 20 25
α that
min
imiz
es th
e di
ffer
ence
of
sum
of w
eigh
ts w
ith T
otal
Ran
k
Maximum length of the paths considered
Figure 5 Bestαlowast(`) for minimizing the difference of the sum of weights betweenPageRank and TotalRank
10
42 Approximating HyperRank with PageRank
Now we want to approximate the weights of
s(β) =1
Nζ(β)
infin
sumt=0
1
(t +1)β Pt
using the weights of
r(α) =1minusα
N
infin
sumt=0
αtPt
and we proceed again by considering paths up to a certain length
`
sumt=0
(1
ζ(β)(t +1)β minus (1minusα)αt
)The minimum can be zero and it is attained at
αlowast(`β) =
radic1minus 1
ζ(β)
`
sumt=0
1
(t +1)β
Theα that minimizes the difference of weights for different values ofβ and of the maximum path lengths` is shown inFigure6 In the case ofβ = 2 for instance for path lengths up to 10 to 20 the bestα is between075 and 085
α
15202530β 5 10 15 20 25
Length
03
04
05
06
07
08
09
1
00
01
02
03
04
05
06
07
08
09
10
1 15 2 25 3 35 4
α th
at m
inim
izes
the
diff
eren
ce o
fsu
m o
f wei
ghts
Exponent β
Max path length=25Max path length=20Max path length=15Max path length=10
Max path length=5
Figure 6 Bestα for minimizing the difference of the sum of weights betweenPageRank and HyperRank with exponentβ for various path lengths
11
43 Approximating PageRank with LinearRank
For approximating the damping function of PageRank with the damping function of LinearRank we con-sider the summation of the differences up to a certain path length If`le L
`
sumt=0
((1minusα)αt minus 2(Lminus t)
L(L+1)
)And if ` gt L
Lminus1
sumt=0
((1minusα)αt minus 2(Lminus t)
L(L+1)
)+
`
sumt=L
(1minusα)αt
We will assume thatle L so the evaluation of the difference between the two rankings is done in an areain which both rankings have non-zero values TheL that minimizes the difference for a given combinationof α and` is
Llowast(α `) = `+2`α`+1 +α`+1 +1+
radic(1+α`+1)2 +4`(`+2)α`+1
2(1minusα`+1)
= `+1+O(`α(`+1)2
)and we have plotted it for different values ofα and` in Figure7
L
0405
0607
0809
α5 10 15 20 25
Length
5
10
15
20
25
30
5
10
15
20
25
30
35
40
05 06 07 08 09
L th
at m
inim
izes
the
diff
eren
ce o
fsu
m o
f wei
ghts
Exponent α
Max path length=25Max path length=20Max path length=15Max path length=10Max path length=5
Figure 7 BestL for minimizing the difference of the sum of weights betweenLinearRank and LinearRank for various path lengths
12
5 Parameters for the Damping Functions
For our experiments we used several snapshots from the Web including theuk it andeuint domainsFor comparison we also considered a synthetic scale-free network produced according to the evolving modeldescribed by Kumaret al [KRR+00] (a combination of preferential attachment and random links) with theparameters suggested by Panduranganet al [PRU02] As far as the latter is concerned in the generatedgraph the exponents for the power-law in the center part of the distributions are -21 for in-degree andPageRank and -27 for out-degree we generated a 100000-nodes graph without disconnected nodes
In this section we study the behavior of the ranking functions for varying values of their parameters
51 Characteristic path lengths
In scale-free networks the distances between pairs of nodes follow a Gaussian distribution [AJB99] (theaverage is not given in their paper) Analytic estimations for the average distance of a graph of scale-freenetwork ofn nodes include
bull O(log(n)) [WS98]
bull O(log(n) log(np)) in sparse graphs withp links [CL01]
bull 1+ log(nz1) log(z2z1) wherez1 is the average indegree andz2 is the average number of nodes atdistance 2 [NSW01] and
bull O(log(n) log(log(n))) [BR04]
We did the following experiment starting from a node picked at random we followed the links back-wards and counted the number of nodes at different distancesFigure8 plots the average distances foundwhich appear to be growing (sub)logarithmically with the size of the graphFigure9 shows the distributionobtained in each sample For this experiment we are not counting the pages without in-links
3456789
10111213141516
105 106 107 108
Ave
rage
dis
tanc
e
Number of nodes (N)
Figure 8 Average distances versus number of nodes
If graphs of different sizes show different path lengths what is the effect of this in the ranking calcu-lation Letrsquos suppose that for a graph withN1 nodes it is found by experimental or analytic means that agood parameter for PageRank isαlowast1 Now we would like to have a good parameterαlowast2 for a graph with thesame properties except that the size of the new graph isN2 lt N1
13
it Web graph uk Web graph40 mill nodes 18 mill nodes
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
Avg distance 149 Avg distance 148
euint Web graph Synthetic graph800000 nodes 100000 nodes
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
Avg distance 100 Avg distance 42
Figure 9 Distribution of the average number of nodes at a certain distance froma given node
One possible approach consistent with what we have done so far is to consider that the sum of theweights up to the average path lengths of the graphs (L1 L2) have to be similar for both rankings to behavein a similar way If we take this approach the solution is
1minus (αlowast1)L1+1 = 1minus (αlowast2)
L2+1
αlowast2 = (αlowast1)L1+1L2+1
αlowast2 asymp (αlowast1)log(N1)log(N2)
An example that can be used in practice is the following letrsquos consider a web graph withN1 = 115times109
pages (the size of the full Web estimated by [GS05]) and another graph with onlyN2 = 50times106 pages (thesize of the Web of a large country) the second graph is roughly 3 orders of magnitude smaller
If it is shown empirically thatαlowast1 = 085 is a good value for the PageRank parameter for the whole Webthenαlowast2 = 081 should have a similar behavior in the 50-million page set which is natural as the path lengthsare shorter If the subset of web pages were even smaller for instanceN2 = 106 pages (the size of the webof a large organization) thenαlowast2 = 076
14
52 Damping parameters and in-degree
In this section we are using data from theuk Web graph and a 8500-nodes synthetic graph with the sameindegree and outdegree distribution than the previous synthetic graph We first measured the variance ofthe values from the ranking function as we consider that a high variance is good in a ranking functionas the relative values differ more We also measured the relationship between the ranking function andin-degree for different values of the parameters in terms of covariance correlation coefficient and rankingorders (Kendallrsquosτ) The results are shown inFigure10
Variance Covariance Correlation coefficient Kendallrsquosτ
000e+00200e-13400e-13600e-13800e-13100e-12120e-12140e-12160e-12180e-12200e-12
01 02 03 04 05 06 07 08 09
Var
ianc
e of
Pag
eRan
k
Damping factor α
00 times 100
50 times 10-5
10 times 10-4
15 times 10-4
20 times 10-4
25 times 10-4
30 times 10-4
35 times 10-4
01 02 03 04 05 06 07 08 09
Cov
aria
nce
betw
een
Page
Ran
k an
d in
-deg
ree
Damping factor α
07000710072007300740075007600770078007900800
01 02 03 04 05 06 07 08 09
Corr
elat
ion
coef
ficie
nt w
ith in
-deg
ree
Damping factor α
042
044
046
048
050
052
054
056
01 02 03 04 05 06 07 08 09Ken
dallrsquo
s τ b
etw
een
Page
Rank
and
in-d
egre
e
Damping factor α
005
010
015
020
025
030
035
040
045
050
055
01 02 03 04 05 06 07 08 09
Var
ianc
e of
Pag
eRan
k
Damping factor α
0 times 100
1 times 10-3
2 times 10-3
3 times 10-3
4 times 10-3
5 times 10-3
6 times 10-3
01 02 03 04 05 06 07 08 09
Cov
aria
nce
betw
een
Page
Ran
k an
d in
-deg
ree
Damping factor α
0995
0996
0997
0998
0999
1000
01 02 03 04 05 06 07 08 09
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor α
079
080
081
082
083
084
085
086
087
01 02 03 04 05 06 07 08 09Ken
dallrsquo
s τ b
etw
een
Page
Rank
and
in-d
egre
e
Damping factor α
Figure 10 Variance of PageRank and its relationship with indegree in terms ofcovariance correlation coefficient and Kendallrsquosτ coefficient for varying valuesof the parameterα Top uk web graph bottom synthetic graph
The variance is higher asα increases In regard to the relationship with indegree for company homepages it has been shown that the logarithm of the in-degree is correlated with PageRank [UCH03] Ourresults are consistent with this observation The covariance monotonically increases withα both in the caseof the synthetic graph and in the web graph Not surprisingly using the generative model the correlationsare higher We observe a maximum correlation atα = 07 in the synthetic graph and atα = 05 in the webgraph We also notice that the correlation drops significativelly asα increases because a largeα means thatlonger paths have an effect in the calculation note however that this phenomenon does not significantlyimpact on the correlation coefficient that is still very large
A high correlation between PageRank and in-degree is bad from the point of view of a search enginebecause it makes link-spam easier In particular as the correlation coefficient is higher in theuk web graphnear 05 if we chooseα close to this value we are helping link spammers Note however that a highcorrelation was foreseeable because as shown in[CGS04] even approximating PageRank with just only 1level of links gets 70 of accuracy
The behavior of the Kendallrsquosτ coefficient which measures the similarity between ranking orders isthe opposite than the one observed in the real graph This also happens for HyperRank as inFigure11we made the same measurements for this functional ranking and the results were consistent The graphseems inverted because a low value ofβ has the same effect as a high value ofα longer paths have moreimportance in the calculation
The differences in the behavior of the ranking order in the synthetic graph might be explained by thefact that the generative model we are using does not capture some properties that might be relevant for theranking order under a functional ranking such as the clustering coefficient Also the synthetic graph is
15
Variance Covariance Correlation coefficient Kendallrsquosτ
000e+00
100e-12
200e-12
300e-12
400e-12
500e-12
600e-12
700e-12
800e-12
900e-12
1 15 2 25 3 35 4
Var
ianc
e of
Gen
eral
Hyp
erbo
lic R
ank
Damping factor β
000e+00
100e-04
200e-04
300e-04
400e-04
500e-04
600e-04
700e-04
800e-04
1 15 2 25 3 35 4Cov
aria
nce
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
0730
0740
0750
0760
0770
0780
0790
0800
1 15 2 25 3 35 4
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor β
04600
04700
04800
04900
05000
05100
05200
05300
1 15 2 25 3
Ken
dallrsquo
s τ
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
000e+00
200e-05
400e-05
600e-05
800e-05
100e-04
120e-04
140e-04
160e-04
1 15 2 25 3 35 4
Var
ianc
e of
Gen
eral
Hyp
erbo
lic R
ank
Damping factor β
000e+00
200e-03
400e-03
600e-03
800e-03
100e-02
120e-02
1 15 2 25 3 35 4Cov
aria
nce
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
0997
0997
0998
0998
0999
0999
1 15 2 25 3 35 4
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor β
0815008200082500830008350084000845008500085500860008650
1 15 2 25 3 35 4
Ken
dallrsquo
s τ b
etw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
Figure 11 Variance of general hyperbolic rank for some values ofβ and itsrelation with in-degree Experiments have been performed on theuk graph (top)and synthetic graph (bottom)
assortative (highly linked pages are mostly linked to other highly linked pages) while the real web graph isdisassortative (most of the neighbours of highly linked pages have small indegree)
53 Experimental comparison of ranking orders
In this section we present experimental results about the similarity between the ranking orders induced bysome of the functional rankings discussed in the previous sections To perform the comparison we usedKendallrsquosτ and data from the U K Web graphFigure12shows how PageRank compares with HyperRankfor various pairs ofα andβ In the limit αβrarr 1 both rankings are equivalent and they remain similar in alarge region of the parameter space
α
β
07
08
09
1
τ ge 095
τ
02 03
04 05
06 07
08 09
2 25
3 35
4
01
02
03
04
05
06
07
08
09
15 2 25 3 35 4
α th
at m
axim
izes
Ken
dallrsquo
s τ
Exponent β
Actual optimumPredicted optimum with length=5
Figure 12 Comparison (using Kendallrsquosτ) between PageRank and HyperRankwith various damping parameters in the UK web graph The optimum predictedin the analysis with = 5 is very close to the real one
In the figure we can see that the rankings obtained with HyperRank and PageRank can be almostequivalent (Kendallrsquosτ ge 095) moreover the analysis shown insection42 considering only paths of
16
lengths less than 5 provides a very good approximation for the optimum combination of parameters Thismeans that in fact the difference in the damping functions in the first few levels is crucial
The exponentsβ required for giving a good approximation of PageRank are very small whenα ge 07limiting the practical applicability of HyperRank as its convergence is not faster than the one of PageRank
As far as LinearRank and PageRank are concerned long paths and largeα should be considered toobtain a sufficiently similar ranking as shown inFigure13
05 06
07 08
09
5 10
15 20
25
080085090095100
τ
τ ge 095
αL
5
10
15
20
25
05 06 07 08 09L
that
max
imiz
es K
enda
llrsquos
τExponent α
Actual optimumPredicted optimum with length=5
Figure 13 Comparison (using Kendallrsquosτ) between PageRank and LinearRankin the UK web graph with various damping parameters Again the predictedoptimum with` = 5 is very close to the actual optimum
The predicted optimum given insection43 with ` = 5 this is considering only the summation of thedifferences between both damping functions up to paths of length 5 is very close to what was obtained inpractice Forα = 08 calculating LinearRank withL = 10 (which means the same number of iterations)givesτ ge 098 for α = 09 calculating LinearRank withL = 15 also givesτ ge 098 In both cases theranking order of PageRank is approximated by the ranking order of LinearRank with very high precision
6 Conclusions
In this paper we have defined a broad class of link-based ranking algorithms based on the contribution ofdamping factors along all different paths reaching a page We introduce four particular damping decays lin-ear exponential quadratic hyperbolic and general hyperbolic where exponential is equivalent to PageRankand quadratic hyperbolic to TotalRank
We studied the differences and similarities among these ranking algorithms and we have found that
bull Functional rankings using different damping functions can provide similar orderings if the parametersare chosen carefully
bull LinearRank can be used for calculating a ranking that is as good as PageRank but with a fixed andsmaller number of iterations
bull The parameters for the damping functions depend on the characteristic of path lengths in the graphwhich is known to grow sub-logarithmically on the size of the graph
More work is needed to find other damping functions that compute rankings similar to PageRank butare easier and faster to compute We use global ranking similarity but another measure could be the ranking
17
similarity in the top 20 results of real queries In this setting our results can change so future work willinclude this variation
Because of their high cost link-based ranking methods that involve iterative calculations at query timeare probably not used by large-scale search engines at this moment but the functional ranking with lineardamping we have presented can provide a good approximation with few iterations Also the approach wehave presented could be also applied to multivalued ranking functions such as HITS [Kle99] and topic-sensitive PageRank [Hav02] to obtain for instance a method for approximating the hubs and authorityscores using less iterations and a linear damping function
Our approach also helps to understand how easy or difficult is to collude many pages to modify theranking of a given page Clearly there are many different factors path lengths damping function branchingdegrees and number of colluded pages The graph structure of the collusion will affect those factors and weplan to analyze them
Acknowledgements
We would like to thank Daniel Fogaras for a valuable discussion about TotalRank that motivated part of thisresearch
References
[AJB99] Reka Albert Hawoong Jeong and Albert L Barabasi Diameter of the world wide webNature 401130ndash131 1999
[BH98] Krishna Bharat and Monika R HenzingerImproved algorithms for topic distillation in ahyperlinked environment In Proceedings of the 21st Annual International ACM SIGIR Con-ference on Research and Development in Information Retrieval pages 104ndash111 MelbourneAustralia August 1998 ACM Press New York
[BKM +00] Andrei Broder Ravi Kumar Farzin Maghoul Prabhakar Raghavan Sridhar RajagopalanRaymie Stata Andrew Tomkins and Janet WienerGraph structure in the web Experimentsand models In Proceedings of the Ninth Conference on World Wide Web pages 309ndash320Amsterdam Netherlands May 2000 ACM Press
[Bol05] Paolo BoldiTotalrank ranking without damping In Poster proceedings of the 14th interna-tional conference on World Wide Web pages 898ndash899 Chiba Japan 2005 ACM Press
[BR04] Bela Bollobas and Oliver RiordanThe diameter of a scale-free random graph Combinator-ica 24(1)5ndash34 2004
[Bri06] Michael Brinkmeier PageRank revisited ACM Transaction on Internet Technologies6(3)257ndash279 2006
[BSV05] Paolo Boldi Massimo Santini and Sebastiano VignaPagerank as a function of the dampingfactor In Proceedings of the 14th international conference on World Wide Web pages 557ndash566 Chiba Japan 2005 ACM Press
[BYD04] Ricardo Baeza-Yates and Emilio DavisWeb page ranking using link attributes In Alternatetrack papers amp posters of the 13th international conference on World Wide Web pages 328ndash329 New York NY USA 2004 ACM Press
18
[BYRN99] Ricardo Baeza-Yates and Berthier Ribeiro-NetoModern Information Retrieval AddisonWesley May 1999
[CGS04] Yen-Yu Chen Qingqing Gan and Torsten SuelLocal methods for estimating pagerank val-ues In Proceedings of the thirteenth ACM conference on Information and knowledge man-agement (CIKM) pages 381ndash389 New York NY USA 2004 ACM Press
[CL01] Fan Chung and Linyuan LuThe diameter of random sparse graphs Adv Appl Math26257ndash279 2001
[Dav00] Brian D DavisonTopical locality in the web In Proceedings of the 23rd annual internationalACM SIGIR conference on research and development in information retrieval pages 272ndash279Athens Greece 2000 ACM Press
[GG04] Gene H Golub and Chen GreifArnoldi-type algorithms for computing stationary distributionvectors with application to pagerank Technical Report SCCM-04-15 Stanford University2004
[GS05] Antonio Gulli and Alessio SignoriniThe indexable Web is more than 115 billion pages InPoster proceedings of the 14th international conference on World Wide Web pages 902ndash903Chiba Japan 2005 ACM Press
[Hav99] Taher HaveliwalaEfficient computation of pagerank Technical report Stanford University1999
[Hav02] Taher H HaveliwalaTopic-sensitive pagerank In Proceedings of the Eleventh World WideWeb Conference pages 517ndash526 Honolulu Hawaii USA May 2002 ACM Press
[HG98] Stephanie W Haas and Erika S GramsPage and link classifications connecting diverseresources In DL rsquo98 Proceedings of the third ACM conference on Digital libraries pages99ndash107 New York NY USA 1998 ACM Press
[HK03a] Taher Haveliwala and Sepandar KamvarThe condition number of the pagerank problemTechnical Report 36 Stanford University 2003
[HK03b] Taher Haveliwala and Sepandar KamvarThe second eigenvalue of the google matrix Tech-nical Report 20 Stanford University 2003
[JM98] W-K Joo and S H Myaeng Improving retrieval effectiveness with hyperlink informationIn Proceedings of International Workshop on Information Retrieval with Asian Languages(IRAL) Singapore October 1998
[KHMG03a] S Kamvar T Haveliwala C Manning and G GolubExploiting the block structure of theweb for computing pagerank 2003
[KHMG03b] Sepandar D Kamvar Taher H Haveliwala Christopher D Manning and Gene H GolubExtrapolation methods for accelerating pagerank computations In Proceedings of the twelfthinternational conference on World Wide Web pages 261ndash270 ACM Press 2003
[Kle99] Jon M KleinbergAuthoritative sources in a hyperlinked environment Journal of the ACM46(5)604ndash632 1999
19
[KRR+00] R Kumar P Raghavan S Rajagopalan D Sivakumar A Tomkins and E UpfalStochasticmodels for the web graph In Proceedings of the 41st Annual Symposium on Foundations ofComputer Science (FOCS) pages 57ndash65 Redondo Beach CA USA 2000 IEEE CS Press
[LGZ04] Chris P Lee Gene H Golub and Stefanos A ZeniosA fast two-stage algorithm for com-puting pagerank and its extensions Technical report Stanford University 2004
[Li98] Yanhong LiToward a qualitative search engine IEEE Internet Computing July 1998
[Lif00] Maxim LifantsevVoting model for ranking Web pages In Peter Graham and MuthucumaruMaheswaran editorsProceedings of the International Conference on Internet Computingpages 143ndash148 Las Vegas Nevada USA June 2000 CSREA Press
[Mar97] Massimo MarchioriThe quest for correct information of the Web hyper search engines InProc of the sixth international conference on the Web Santa Clara USA April 1997
[NSW01] M E Newman S H Strogatz and D J WattsRandom graphs with arbitrary degree distri-butions and their applicationsPhys Rev E Stat Nonlin Soft Matter Phys 64(2 Pt 2) August2001
[PBMW98] Lawrence Page Sergey Brin Rajeev Motwani and Terry WinogradThe PageRank citationranking bringing order to the Web Technical report Stanford Digital Library TechnologiesProject 1998
[PRU02] Gopal Pandurangan Prabhakara Raghavan and Eli UpfalUsing Pagerank to characterizeWeb structure In Proceedings of the 8th Annual International Computing and CombinatoricsConference (COCOON) volume 2387 ofLecture Notes in Computer Science pages 330ndash390Singapore August 2002 Springer
[SPM05] Padmini Srinivasan Gautam Pant and Filippo MenczerA general evaluation framework fortopical crawlers Information Retrieval 8(3)417ndash447 2005
[UCH03] Trystan Upstill Nick Craswell and David HawkingPredicting fame and fortune Page-rank or indegree In Proceedings of the Australasian Document Computing SymposiumADCS2003 pages 31ndash40 Canberra Australia December 2003
[WHAT04] Fang Wu Bernardo A Huberman Lada A Adamic and Joshua R TylerInformation flow insocial groups Physica A Statistical and Theoretical Physics 337(1-2)327ndash335 June 2004
[WS98] Duncan J Watts and Steven H StrogatzCollective dynamics of small-world networksNa-ture 393(6684)440ndash442 June 1998
20
- Introduction
- Functional Rankings
- Damping Functions
-
- Linear damping
- Exponential damping PageRank
- Quadratic hyperbolic damping TotalRank
- General hyperbolic damping HyperRank
- An empirical damping
-
- Comparing Damping Functions
-
- Approximating TotalRank with PageRank
- Approximating HyperRank with PageRank
- Approximating PageRank with LinearRank
-
- Parameters for the Damping Functions
-
- Characteristic path lengths
- Damping parameters and in-degree
- Experimental comparison of ranking orders
-
- Conclusions
-
bull suggest why 085 is a good choice of the parameter for PageRank and
bull compare the ranking orders induced by different ranking functions finding ways of approximatingPageRank up to very high precision
The rest of this paper is organized as followsSection2 introduces the notion of functional rankingSec-tion 3 describes three damping functionsSection4 compares them analytically andSection5 experimentswith different parameters for each function Finallythe last sectionpresents our conclusions
2 Functional Rankings
In this section we introduce the notion offunctional ranking a general family of ranking functions thatincludes PageRank To describe PageRank formally consider a web graph ofN pages LetANtimesN be thelink matrix in this graphai j = 1 iff there is a link from pagei to pagej This link matrix is seldom used asit is mainly for two reasons
Normalization In the Web creating an out-link is free so there is an incentive for web page authors tocreate pages with many out-links this is the reason why a metaphor of ldquovotingrdquo is enforced [Lif00]in which each page has only one ldquovoterdquo that has to be split among its linked pages This is typicallydone in link-based ranking by normalizingA row-wise the normalization process means that everyweb page can only decide how to divide its own score among the pages it leads to but it cannot assignmore score than it has Another way to look at normalization is that the matrix is turned into thetransition matrix of a stochastic process
The normalization does not need to give each out-link the same value as there is evidence that weblinks have different purposes such as navigating in a multi-page set expanding the contents of the cur-rent page pointing to another resource etc [HG98] Also links within the same site can be consideredself-links and as such do not confer as much authority as a link between different sites indeed thereare ranking methods like BHITS [BH98] or Altavistarsquos Eigenrank that treat them differently Othercharacteristics of links such as if they appear at the beginning or the bottom of the page or if theyappear within a certain HTML element can also be used for non-uniform normalization [BYD04]
We will assume uniform normalization so if a page hasd out-links each of those links has a weightof 1d but the results of this paper can be applied to other forms of normalization
Dangling nodesSpecial attention should be paid to the possible presence of nodes with no outgoing arcs(also known as ldquodangling nodesrdquo) in fact dangling nodes fail to produce a row-stochastic matrixbecause the rows of dangling nodes are filled with zeroes Dangling nodes can be dealt with byadding an extra node that is linked to and from all other nodes or by introducing new arcs from eachdangling node to every node in the graph In our analysis we shall assume that all dangling nodeshave been eliminated already in some way so that we do not have to worry about their presence Allthe algorithms we will present can be modified so that dangling nodes can be dealt with explicitly andwith virtually no additional cost
Let P be the row-normalized link matrix of the graph withN nodes PageRankr(α) is defined as thestationary distribution of the matrix
αP+(1minusα)1Tv
whereα isin [01) is a parameter calleddamping factor(sometimes also called a dampening factor) andv isa fixedpreference vectorthat may represent the interests of a particular user or another ranking vector that
2
is used for weighting pages Note that the above matrix is ergodic (at least if every entry ofv is strictlypositive) so it has exactly one stationary distribution Even though most of our results can be easily restatedwith a non-uniform preference vectorv for the sake of clarity we shall only consider the uniform preference1N in the rest of the paper
As observed in [BSV05] the PageRank vectorr(α) can be written as
r(α) = (1minusα)infin
sumt=0
αt 1N
1Pt
Or in matricial form
r(α) = (1minusα)1N
1(I minusαP)minus1 ||αP||lt 1
There is an equivalent and actually very intriguing way of rewriting this formula mentioned in [NSW01]that leads to a conclusion similar to those of [Bri06] given a pathp= 〈x1x2 xk〉 in the graph we defineits branching contributionas follows
branching(p) =1
d1d2 middot middot middotdkminus1
whered j is the outdegree this is the number of outgoing arcs of nodex j Then the ranking of nodei according to PageRank is
r i(α) = sumpisinPath(minusi)
(1minusα)α|p|
Nbranching(p)
where Path(minus i) is the set of all paths into nodei and |p| is the length of pathp this is because(Pt)i j
contains the sum of the branching contributions of all paths of lengtht from i to j as one can easily show byinduction ont This way of expressing the PageRank of a node is interesting because it highlights the factthat the rank of a node is essentially obtained as a weighted sum of contributions coming from every pathentering into the node with weights that decay exponentially in the length of the path
A natural generalization of this idea consists in taking into consideration a rankingR of the generalform
R =infin
sumt=0
damping(t)1N
1middotPt
or equivalently
Ri = sumpisinPath(minusi)
damping(|p|) 1N
branching(p)
where the damping function is a suitable choice of weights We will refer to this form of ranking asfunc-tional ranking because it is parametrized by the damping function As we have seen generic PageRank isa functional ranking where the damping function
damping(t) = (1minusα)αt
decays exponentially fast
3
3 Damping Functions
In this section we show several functional rankings by describing their damping functions First we showwhich class of damping functions generate well-defined functional rankings
As shown in [Bri06 Corollary 24] for every pair of nodesi and j and for every lengtht
sumpisinPath(i j)|p|=t
branching(p)le 1
A more general property holds
Theorem 1 For every node i and every length t
sumpisinPath(iminus)|p|=t
branching(p) = 1
Proof By induction ont For t = 0 there is only one path fromi of length 0 and its branching is 1 Forthe inductive stepsumpisinPath(iminus)|p|=t+1branching(p) can be rewritten by observing that ifi has outdegreedi every path fromi of lengtht +1 is the concatenation ofi with a path of lengtht from an out-neighbor ofi
sumpisinPath(iminus)|p|=t+1
branching(p) = sumjirarr j
1di
sumpisinPath( jminus)|p|=t
branching(p) = sumjirarr j
1di
= 1
As a consequence to guarantee that the functional ranking is well-defined and normalized (ie that rankvalues sum to 1) we need
N
sumi=1
sumpisinPath(minusi)
damping(|p|) 1N
branching(p) = 1
that isinfin
sumt=0
damping(t)1N sum
pisinPath(minusminus)branching(p) = 1
UsingTheorem1 sumpisinPath(minusminus) branching(p) = N so the latter equality is equivalent to
infin
sumt=0
damping(t) = 1
Hence every choice of the damping function such thatsuminfint=0damping(t) = 1 yields a well-defined nor-
malized functional ranking Nonetheless not all choices are equivalent so we have to find out whichfunctions generate better rankings Since a direct link should be more valuable as a source of evidence thana distant link we focus on damping functions that are decreasing ont the length of the paths
31 Linear damping
Letrsquos start by considering a simple damping function such as
damping(t) =
2(Lminust)L(L+1) t lt L
0 t ge L
4
this is a damping function that decreases linearly with distance and reaches zero at distanceL The trivialcaseL = 1 gives a uniform ranking andL = 2 is ranking by indegree as in the latter case all paths of lengthge 2 are not considered
From the definition
R =infin
sumt=0
damping(t)vPt
=L
sumt=0
2(Lminus t)L(L+1)
vPt
=2
L(L+1)v
Lminus1
sumt=0
(Lminus t)Pt
=2
L(L+1)v(L(I minusP)minusP(I minusPL))((I minusP)2)minus1
provided that(I minusP)2 is not singularAn advantage of this type of ranking is that only the first few levels are considered so the computation
is fast and the number of iterations is fixed
Computation For computing this functional ranking we can define the following sequence
R(0) =2
L+1v
R(k+1) =(Lminuskminus1)
(Lminusk)RkP
The functional ranking with linear damping issumLminus1k=0 R(k) An algorithm for computing this ranking
shown inFigure1 arises directly from this summation
32 Exponential damping PageRank
As we already noted PageRank can be seen as a functional ranking where the damping function decaysexponentially
damping(t) = (1minusα)αt
As longer paths have less importance in the calculation of PageRank it could be approximated by usingonly a few levels of links In [CGS04] it is shown that by using only the nodes at distance 1 from a targetnode (equivalent to linear damping withL = 2) PageRank can be approximated with 30 of average errorUsing nodes at distance 2 the average error drops to 20 and at distance 3 to 10 After that there are nosignificant improvements by adding more levels and the cost (the number of nodes to be explored) is muchhigher
Computation Since PageRank is the principal eigenvector of the modified graph matrix it can be easilyapproximated by the iterative Power Method algorithm as suggested by Pageet al in their original pa-per [PBMW98] this iterative algorithm gives good approximations (both in norm and with respect to theinduced node order) in few iterations even though convergence speed and numerical stability decay whenαgets close to 1 [HK03b HK03a] Other methods to compute PageRank have been proposed some of them
5
Require L maximum path length N number of nodesv preference vector1 for i 1 N do Initialization2 S[i] larr R[i] larr 2v[i](L+1)3 end for4 for step 1 Lminus1 do Iteration step5 Auxlarr 06 for i 1 N do Follow links in the graph7 for all j such that there is a link fromi to j do8 Aux[j] larr Aux[j] + R[i]outdegree(i)9 end for
10 end for11 for i 1 N do Add to ranking value12 R[i] larr Aux[i] times (Lminusstep)
(Lminus(stepminus1))13 S[i] larr S[i] + R[i]14 end for15 end for16 return R
Figure 1 Algorithm for computing a functional ranking with linear damping
using techniques for the solution of systems of linear equations some other concentrating on some specificfeatures of the web as a graph that determine forms of locality in the computation of PageRank (see forexample [PBMW98 Hav99 GG04 LGZ04 KHMG03b KHMG03a])
33 Quadratic hyperbolic damping TotalRank
Recently a ranking method called TotalRank [Bol05] has been proposed The method aims at eliminatingthe necessity for an arbitrary parameter by integrating PageRank over the entire range ofα If r(α) is thevector of PageRank then TotalRank is defined as
T =int 1
0r(α)dα
T can be written as
int 1
0r(α)dα =
1N
infin
sumt=0
int 1
0(1minusα)αt1middotPtdα
=1N
infin
sumt=0
1(t +1)(t +2)
1middotPt
By using the definition of the logarithm of a matrix
ln(I minusP) =minusinfin
sumk=1
Pk
k
we can write TotalRank as
T = Pminus1(I +(I minusPminus1) ln(I minusP))
6
provided thatPminus1 is not singular andP 6= I TotalRank is as a weighted sum of the score associated with paths of varying lengths in which the
weights are hyperbolically decreasing on the lengths of the paths In other words TotalRank is a functionalranking with damping function
damping(t) =1
(t +1)(t +2)
and it is well defined assuminfint=0damping(t) = 1
Computation It is known that the cost of calculating TotalRank is the same as the cost of calculatingPageRank via the Power Method [BSV05] even though some more iterations are required to obtain thesame precision
34 General hyperbolic damping HyperRank
TotalRank is part of a more general family of weighting schemes for paths of different lengths that can beapproximated using
s(β) =1
Nζ(β)
infin
sumt=0
1
(t +1)β 1middotPt
Again this way of ranking follows the general scheme with damping function chosen as
damping(t) =1
ζ(β)(t +1)β
Here we are using Riemannrsquos zeta functionζ(β) = suminfint=1
1tβ for normalization and we needβ gt 1 for it
to converge Note that whenβ = 2 we get weights similar to those of TotalRank in which thet-th coefficientis 1(t +1)(t +2) whereas here it is 1ζ(2)(t +1)2
A meaningful choice forβ should be done considering the distribution of paths of different lengths in ascale-free graph A largeα in PageRank or a smallβ in HyperRank means increasing the effect of longerpaths in the score
Computation Figure2 shows an algorithm for approximating HyperRank Let us define a vector se-quenceR(t) as follows
R(0) =1
Nζ(β)
R(k+1) =(
k+1k+2
)βR(k)P
It is easy to see thatsuminfint=0R(k) = s(β) becauseR(k) = 1(N middotζ(β)(k+1)β))1middotPk this observation leads
to the algorithm Note that convergence speed is much slower than ordinary PageRank especially whenβis close to 1 the norm of thek-th summand being bound by 1(1+ 1k)β Interestingly enough thoughconvergence speed is reasonable ifβ is sufficiently large
7
Require N number of nodesv preference vectorβ damping parameter1 for i 1 N do Initialization2 S[i] larr R[i] larr v[i]ζ(β)3 end for4 klarr 05 while not convergeddo Iteration step6 klarr k + 17 Auxlarr 08 for i 1 N do Follow links in the graph9 for all j such that there is a link fromi to j do
10 Aux[j] larr Aux[j] + R[i]outdegree(i)11 end for12 end for13 for i 1 N do Add to ranking value14 R[i] larr Aux[i] times(k(k+1))β
15 S[i] larr S[i] + R[i]16 end for17 end while18 return S
Figure 2 An algorithm to compute general hyperbolic rank
35 An empirical damping
An empirical damping function would consider how much the value of an endorsement decreases by fol-lowing longer paths in the real web graph This cannot be known exactly but we can attempt to measureit indirectly Pages that link to each other are more similar than pages chosen at random [Dav00] evi-dence from topical crawlers [SPM05] shows that when doing breadth-first exploring the topic ldquodriftsrdquo asthe distance increases On the same line of though we propose to use the decrease of text similarity as anapproximation to an ldquoempiricalrdquo damping function
To find out which is the correlation between link-distance and similarity we performed the followingexperiment we considered a web graph corresponding to a partial snapshot of theuk domain and sampled200 nodes at random For each sampled node we followed links backwards to obtain nodes at a minimumdistance of 1 2 3 4 or 5 links Then we sampled 12000 pairs at each minimum distance at randomand computed their similarities with the original nodes Similarity was measured using the normalization ofTFIDF [BYRN99] without stemming nor stopwords removal
The resulting averages are shown inFigure3 with standard deviation error bars Text similarity clearlydecreases with distance and in some applications the empirical distribution of text similarity versus distancecould be used as an ldquoempiricalrdquo damping function Different measures of text similarity can yield differentdistributions for instance [WHAT04] uses the number of repeated words and phrases between pages andobtains a faster decrease in similarity
8
02
03
04
05
06
07
1 2 3 4 5
Ave
rage
text
sim
ilari
ty
Link distance
Figure 3 Link distance vs average text similarity
4 Comparing Damping Functions
A comparison of the damping functions described in the previous section is shown inFigure4 of coursehyperbolic damping functions decay asymptotically more slowly than exponential damping but notice thatfor short paths the latter may dominate the former in many cases
000
005
010
015
020
025
1 2 3 4 5 6 7
Wei
ght
Length of the path (t)
PageRank α=075PageRank α=050
TotalRankHyperRank β=2HyperRank β=3
Linear damping L=10
Figure 4 Weights given by the different damping functions for some values ofαandβ
In this section we aim at analyzing how similar are these functional rankings and how could we useone of the damping functions to approximate another with a suitable choice of parameters
9
41 Approximating TotalRank with PageRank
It has been shown experimentally that the rank correlation (Kendallrsquosτ) between TotalRank and PageRankis maximal whenα asymp 07 [Bol05] the maximum value forτ is over 095 meaning that for that specificchoice ofα PageRank and TotalRank induce almost equivalent ranking orders
We now want to approach the same problem in an analytic fashion more precisely we aim at study-ing the difference between TotalRank and PageRank by calculating the difference between their respectivedamping functions
dampingTotalRank(t) =1
(t +1)(t +2)dampingPageRank(α)(t) = (1minusα)αt
As they are normalized both damping functions have the same summation over the entire range oftOur approach is to consider the summation of their differences up to a maximum length for a path As thetwo functions are decreasing the difference in the first levels makes most of the difference in the rankingsIf ` is the maximum path length we are interested in we aim at minimizing this sum
`
sumt=0
(1
(t +1)(t +2)minus (1minusα)αt
)= α`+1minus 1
`+2
The minimum absolute value is 0 and it is obtained whenα is equal to
αlowast(`) =1
`+1radic
`+2= 1minus log`
`+O
(log2`
`2
)
Figure5 showsαlowast(`) as a function of Recall that for the World-Wide Web graph the average lengthof a path between two nodes when a path exists has been estimated in about 16 [BKM+00] or 19 [AJB99]but clearly today is over 20 Now in the range of path lengths between 15 and 20 the value ofαlowast(`)parameters that minimizes the difference between the exponentially decaying weights of PageRank and thehyperbolically decaying weights of TotalRank is roughly 085
Difference Optimal value
0506
0708
09
α0 5 10 15 20 25
Length
-02
-01
0
01
02
03
04
045
050
055
060
065
070
075
080
085
090
0 5 10 15 20 25
α that
min
imiz
es th
e di
ffer
ence
of
sum
of w
eigh
ts w
ith T
otal
Ran
k
Maximum length of the paths considered
Figure 5 Bestαlowast(`) for minimizing the difference of the sum of weights betweenPageRank and TotalRank
10
42 Approximating HyperRank with PageRank
Now we want to approximate the weights of
s(β) =1
Nζ(β)
infin
sumt=0
1
(t +1)β Pt
using the weights of
r(α) =1minusα
N
infin
sumt=0
αtPt
and we proceed again by considering paths up to a certain length
`
sumt=0
(1
ζ(β)(t +1)β minus (1minusα)αt
)The minimum can be zero and it is attained at
αlowast(`β) =
radic1minus 1
ζ(β)
`
sumt=0
1
(t +1)β
Theα that minimizes the difference of weights for different values ofβ and of the maximum path lengths` is shown inFigure6 In the case ofβ = 2 for instance for path lengths up to 10 to 20 the bestα is between075 and 085
α
15202530β 5 10 15 20 25
Length
03
04
05
06
07
08
09
1
00
01
02
03
04
05
06
07
08
09
10
1 15 2 25 3 35 4
α th
at m
inim
izes
the
diff
eren
ce o
fsu
m o
f wei
ghts
Exponent β
Max path length=25Max path length=20Max path length=15Max path length=10
Max path length=5
Figure 6 Bestα for minimizing the difference of the sum of weights betweenPageRank and HyperRank with exponentβ for various path lengths
11
43 Approximating PageRank with LinearRank
For approximating the damping function of PageRank with the damping function of LinearRank we con-sider the summation of the differences up to a certain path length If`le L
`
sumt=0
((1minusα)αt minus 2(Lminus t)
L(L+1)
)And if ` gt L
Lminus1
sumt=0
((1minusα)αt minus 2(Lminus t)
L(L+1)
)+
`
sumt=L
(1minusα)αt
We will assume thatle L so the evaluation of the difference between the two rankings is done in an areain which both rankings have non-zero values TheL that minimizes the difference for a given combinationof α and` is
Llowast(α `) = `+2`α`+1 +α`+1 +1+
radic(1+α`+1)2 +4`(`+2)α`+1
2(1minusα`+1)
= `+1+O(`α(`+1)2
)and we have plotted it for different values ofα and` in Figure7
L
0405
0607
0809
α5 10 15 20 25
Length
5
10
15
20
25
30
5
10
15
20
25
30
35
40
05 06 07 08 09
L th
at m
inim
izes
the
diff
eren
ce o
fsu
m o
f wei
ghts
Exponent α
Max path length=25Max path length=20Max path length=15Max path length=10Max path length=5
Figure 7 BestL for minimizing the difference of the sum of weights betweenLinearRank and LinearRank for various path lengths
12
5 Parameters for the Damping Functions
For our experiments we used several snapshots from the Web including theuk it andeuint domainsFor comparison we also considered a synthetic scale-free network produced according to the evolving modeldescribed by Kumaret al [KRR+00] (a combination of preferential attachment and random links) with theparameters suggested by Panduranganet al [PRU02] As far as the latter is concerned in the generatedgraph the exponents for the power-law in the center part of the distributions are -21 for in-degree andPageRank and -27 for out-degree we generated a 100000-nodes graph without disconnected nodes
In this section we study the behavior of the ranking functions for varying values of their parameters
51 Characteristic path lengths
In scale-free networks the distances between pairs of nodes follow a Gaussian distribution [AJB99] (theaverage is not given in their paper) Analytic estimations for the average distance of a graph of scale-freenetwork ofn nodes include
bull O(log(n)) [WS98]
bull O(log(n) log(np)) in sparse graphs withp links [CL01]
bull 1+ log(nz1) log(z2z1) wherez1 is the average indegree andz2 is the average number of nodes atdistance 2 [NSW01] and
bull O(log(n) log(log(n))) [BR04]
We did the following experiment starting from a node picked at random we followed the links back-wards and counted the number of nodes at different distancesFigure8 plots the average distances foundwhich appear to be growing (sub)logarithmically with the size of the graphFigure9 shows the distributionobtained in each sample For this experiment we are not counting the pages without in-links
3456789
10111213141516
105 106 107 108
Ave
rage
dis
tanc
e
Number of nodes (N)
Figure 8 Average distances versus number of nodes
If graphs of different sizes show different path lengths what is the effect of this in the ranking calcu-lation Letrsquos suppose that for a graph withN1 nodes it is found by experimental or analytic means that agood parameter for PageRank isαlowast1 Now we would like to have a good parameterαlowast2 for a graph with thesame properties except that the size of the new graph isN2 lt N1
13
it Web graph uk Web graph40 mill nodes 18 mill nodes
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
Avg distance 149 Avg distance 148
euint Web graph Synthetic graph800000 nodes 100000 nodes
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
Avg distance 100 Avg distance 42
Figure 9 Distribution of the average number of nodes at a certain distance froma given node
One possible approach consistent with what we have done so far is to consider that the sum of theweights up to the average path lengths of the graphs (L1 L2) have to be similar for both rankings to behavein a similar way If we take this approach the solution is
1minus (αlowast1)L1+1 = 1minus (αlowast2)
L2+1
αlowast2 = (αlowast1)L1+1L2+1
αlowast2 asymp (αlowast1)log(N1)log(N2)
An example that can be used in practice is the following letrsquos consider a web graph withN1 = 115times109
pages (the size of the full Web estimated by [GS05]) and another graph with onlyN2 = 50times106 pages (thesize of the Web of a large country) the second graph is roughly 3 orders of magnitude smaller
If it is shown empirically thatαlowast1 = 085 is a good value for the PageRank parameter for the whole Webthenαlowast2 = 081 should have a similar behavior in the 50-million page set which is natural as the path lengthsare shorter If the subset of web pages were even smaller for instanceN2 = 106 pages (the size of the webof a large organization) thenαlowast2 = 076
14
52 Damping parameters and in-degree
In this section we are using data from theuk Web graph and a 8500-nodes synthetic graph with the sameindegree and outdegree distribution than the previous synthetic graph We first measured the variance ofthe values from the ranking function as we consider that a high variance is good in a ranking functionas the relative values differ more We also measured the relationship between the ranking function andin-degree for different values of the parameters in terms of covariance correlation coefficient and rankingorders (Kendallrsquosτ) The results are shown inFigure10
Variance Covariance Correlation coefficient Kendallrsquosτ
000e+00200e-13400e-13600e-13800e-13100e-12120e-12140e-12160e-12180e-12200e-12
01 02 03 04 05 06 07 08 09
Var
ianc
e of
Pag
eRan
k
Damping factor α
00 times 100
50 times 10-5
10 times 10-4
15 times 10-4
20 times 10-4
25 times 10-4
30 times 10-4
35 times 10-4
01 02 03 04 05 06 07 08 09
Cov
aria
nce
betw
een
Page
Ran
k an
d in
-deg
ree
Damping factor α
07000710072007300740075007600770078007900800
01 02 03 04 05 06 07 08 09
Corr
elat
ion
coef
ficie
nt w
ith in
-deg
ree
Damping factor α
042
044
046
048
050
052
054
056
01 02 03 04 05 06 07 08 09Ken
dallrsquo
s τ b
etw
een
Page
Rank
and
in-d
egre
e
Damping factor α
005
010
015
020
025
030
035
040
045
050
055
01 02 03 04 05 06 07 08 09
Var
ianc
e of
Pag
eRan
k
Damping factor α
0 times 100
1 times 10-3
2 times 10-3
3 times 10-3
4 times 10-3
5 times 10-3
6 times 10-3
01 02 03 04 05 06 07 08 09
Cov
aria
nce
betw
een
Page
Ran
k an
d in
-deg
ree
Damping factor α
0995
0996
0997
0998
0999
1000
01 02 03 04 05 06 07 08 09
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor α
079
080
081
082
083
084
085
086
087
01 02 03 04 05 06 07 08 09Ken
dallrsquo
s τ b
etw
een
Page
Rank
and
in-d
egre
e
Damping factor α
Figure 10 Variance of PageRank and its relationship with indegree in terms ofcovariance correlation coefficient and Kendallrsquosτ coefficient for varying valuesof the parameterα Top uk web graph bottom synthetic graph
The variance is higher asα increases In regard to the relationship with indegree for company homepages it has been shown that the logarithm of the in-degree is correlated with PageRank [UCH03] Ourresults are consistent with this observation The covariance monotonically increases withα both in the caseof the synthetic graph and in the web graph Not surprisingly using the generative model the correlationsare higher We observe a maximum correlation atα = 07 in the synthetic graph and atα = 05 in the webgraph We also notice that the correlation drops significativelly asα increases because a largeα means thatlonger paths have an effect in the calculation note however that this phenomenon does not significantlyimpact on the correlation coefficient that is still very large
A high correlation between PageRank and in-degree is bad from the point of view of a search enginebecause it makes link-spam easier In particular as the correlation coefficient is higher in theuk web graphnear 05 if we chooseα close to this value we are helping link spammers Note however that a highcorrelation was foreseeable because as shown in[CGS04] even approximating PageRank with just only 1level of links gets 70 of accuracy
The behavior of the Kendallrsquosτ coefficient which measures the similarity between ranking orders isthe opposite than the one observed in the real graph This also happens for HyperRank as inFigure11we made the same measurements for this functional ranking and the results were consistent The graphseems inverted because a low value ofβ has the same effect as a high value ofα longer paths have moreimportance in the calculation
The differences in the behavior of the ranking order in the synthetic graph might be explained by thefact that the generative model we are using does not capture some properties that might be relevant for theranking order under a functional ranking such as the clustering coefficient Also the synthetic graph is
15
Variance Covariance Correlation coefficient Kendallrsquosτ
000e+00
100e-12
200e-12
300e-12
400e-12
500e-12
600e-12
700e-12
800e-12
900e-12
1 15 2 25 3 35 4
Var
ianc
e of
Gen
eral
Hyp
erbo
lic R
ank
Damping factor β
000e+00
100e-04
200e-04
300e-04
400e-04
500e-04
600e-04
700e-04
800e-04
1 15 2 25 3 35 4Cov
aria
nce
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
0730
0740
0750
0760
0770
0780
0790
0800
1 15 2 25 3 35 4
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor β
04600
04700
04800
04900
05000
05100
05200
05300
1 15 2 25 3
Ken
dallrsquo
s τ
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
000e+00
200e-05
400e-05
600e-05
800e-05
100e-04
120e-04
140e-04
160e-04
1 15 2 25 3 35 4
Var
ianc
e of
Gen
eral
Hyp
erbo
lic R
ank
Damping factor β
000e+00
200e-03
400e-03
600e-03
800e-03
100e-02
120e-02
1 15 2 25 3 35 4Cov
aria
nce
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
0997
0997
0998
0998
0999
0999
1 15 2 25 3 35 4
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor β
0815008200082500830008350084000845008500085500860008650
1 15 2 25 3 35 4
Ken
dallrsquo
s τ b
etw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
Figure 11 Variance of general hyperbolic rank for some values ofβ and itsrelation with in-degree Experiments have been performed on theuk graph (top)and synthetic graph (bottom)
assortative (highly linked pages are mostly linked to other highly linked pages) while the real web graph isdisassortative (most of the neighbours of highly linked pages have small indegree)
53 Experimental comparison of ranking orders
In this section we present experimental results about the similarity between the ranking orders induced bysome of the functional rankings discussed in the previous sections To perform the comparison we usedKendallrsquosτ and data from the U K Web graphFigure12shows how PageRank compares with HyperRankfor various pairs ofα andβ In the limit αβrarr 1 both rankings are equivalent and they remain similar in alarge region of the parameter space
α
β
07
08
09
1
τ ge 095
τ
02 03
04 05
06 07
08 09
2 25
3 35
4
01
02
03
04
05
06
07
08
09
15 2 25 3 35 4
α th
at m
axim
izes
Ken
dallrsquo
s τ
Exponent β
Actual optimumPredicted optimum with length=5
Figure 12 Comparison (using Kendallrsquosτ) between PageRank and HyperRankwith various damping parameters in the UK web graph The optimum predictedin the analysis with = 5 is very close to the real one
In the figure we can see that the rankings obtained with HyperRank and PageRank can be almostequivalent (Kendallrsquosτ ge 095) moreover the analysis shown insection42 considering only paths of
16
lengths less than 5 provides a very good approximation for the optimum combination of parameters Thismeans that in fact the difference in the damping functions in the first few levels is crucial
The exponentsβ required for giving a good approximation of PageRank are very small whenα ge 07limiting the practical applicability of HyperRank as its convergence is not faster than the one of PageRank
As far as LinearRank and PageRank are concerned long paths and largeα should be considered toobtain a sufficiently similar ranking as shown inFigure13
05 06
07 08
09
5 10
15 20
25
080085090095100
τ
τ ge 095
αL
5
10
15
20
25
05 06 07 08 09L
that
max
imiz
es K
enda
llrsquos
τExponent α
Actual optimumPredicted optimum with length=5
Figure 13 Comparison (using Kendallrsquosτ) between PageRank and LinearRankin the UK web graph with various damping parameters Again the predictedoptimum with` = 5 is very close to the actual optimum
The predicted optimum given insection43 with ` = 5 this is considering only the summation of thedifferences between both damping functions up to paths of length 5 is very close to what was obtained inpractice Forα = 08 calculating LinearRank withL = 10 (which means the same number of iterations)givesτ ge 098 for α = 09 calculating LinearRank withL = 15 also givesτ ge 098 In both cases theranking order of PageRank is approximated by the ranking order of LinearRank with very high precision
6 Conclusions
In this paper we have defined a broad class of link-based ranking algorithms based on the contribution ofdamping factors along all different paths reaching a page We introduce four particular damping decays lin-ear exponential quadratic hyperbolic and general hyperbolic where exponential is equivalent to PageRankand quadratic hyperbolic to TotalRank
We studied the differences and similarities among these ranking algorithms and we have found that
bull Functional rankings using different damping functions can provide similar orderings if the parametersare chosen carefully
bull LinearRank can be used for calculating a ranking that is as good as PageRank but with a fixed andsmaller number of iterations
bull The parameters for the damping functions depend on the characteristic of path lengths in the graphwhich is known to grow sub-logarithmically on the size of the graph
More work is needed to find other damping functions that compute rankings similar to PageRank butare easier and faster to compute We use global ranking similarity but another measure could be the ranking
17
similarity in the top 20 results of real queries In this setting our results can change so future work willinclude this variation
Because of their high cost link-based ranking methods that involve iterative calculations at query timeare probably not used by large-scale search engines at this moment but the functional ranking with lineardamping we have presented can provide a good approximation with few iterations Also the approach wehave presented could be also applied to multivalued ranking functions such as HITS [Kle99] and topic-sensitive PageRank [Hav02] to obtain for instance a method for approximating the hubs and authorityscores using less iterations and a linear damping function
Our approach also helps to understand how easy or difficult is to collude many pages to modify theranking of a given page Clearly there are many different factors path lengths damping function branchingdegrees and number of colluded pages The graph structure of the collusion will affect those factors and weplan to analyze them
Acknowledgements
We would like to thank Daniel Fogaras for a valuable discussion about TotalRank that motivated part of thisresearch
References
[AJB99] Reka Albert Hawoong Jeong and Albert L Barabasi Diameter of the world wide webNature 401130ndash131 1999
[BH98] Krishna Bharat and Monika R HenzingerImproved algorithms for topic distillation in ahyperlinked environment In Proceedings of the 21st Annual International ACM SIGIR Con-ference on Research and Development in Information Retrieval pages 104ndash111 MelbourneAustralia August 1998 ACM Press New York
[BKM +00] Andrei Broder Ravi Kumar Farzin Maghoul Prabhakar Raghavan Sridhar RajagopalanRaymie Stata Andrew Tomkins and Janet WienerGraph structure in the web Experimentsand models In Proceedings of the Ninth Conference on World Wide Web pages 309ndash320Amsterdam Netherlands May 2000 ACM Press
[Bol05] Paolo BoldiTotalrank ranking without damping In Poster proceedings of the 14th interna-tional conference on World Wide Web pages 898ndash899 Chiba Japan 2005 ACM Press
[BR04] Bela Bollobas and Oliver RiordanThe diameter of a scale-free random graph Combinator-ica 24(1)5ndash34 2004
[Bri06] Michael Brinkmeier PageRank revisited ACM Transaction on Internet Technologies6(3)257ndash279 2006
[BSV05] Paolo Boldi Massimo Santini and Sebastiano VignaPagerank as a function of the dampingfactor In Proceedings of the 14th international conference on World Wide Web pages 557ndash566 Chiba Japan 2005 ACM Press
[BYD04] Ricardo Baeza-Yates and Emilio DavisWeb page ranking using link attributes In Alternatetrack papers amp posters of the 13th international conference on World Wide Web pages 328ndash329 New York NY USA 2004 ACM Press
18
[BYRN99] Ricardo Baeza-Yates and Berthier Ribeiro-NetoModern Information Retrieval AddisonWesley May 1999
[CGS04] Yen-Yu Chen Qingqing Gan and Torsten SuelLocal methods for estimating pagerank val-ues In Proceedings of the thirteenth ACM conference on Information and knowledge man-agement (CIKM) pages 381ndash389 New York NY USA 2004 ACM Press
[CL01] Fan Chung and Linyuan LuThe diameter of random sparse graphs Adv Appl Math26257ndash279 2001
[Dav00] Brian D DavisonTopical locality in the web In Proceedings of the 23rd annual internationalACM SIGIR conference on research and development in information retrieval pages 272ndash279Athens Greece 2000 ACM Press
[GG04] Gene H Golub and Chen GreifArnoldi-type algorithms for computing stationary distributionvectors with application to pagerank Technical Report SCCM-04-15 Stanford University2004
[GS05] Antonio Gulli and Alessio SignoriniThe indexable Web is more than 115 billion pages InPoster proceedings of the 14th international conference on World Wide Web pages 902ndash903Chiba Japan 2005 ACM Press
[Hav99] Taher HaveliwalaEfficient computation of pagerank Technical report Stanford University1999
[Hav02] Taher H HaveliwalaTopic-sensitive pagerank In Proceedings of the Eleventh World WideWeb Conference pages 517ndash526 Honolulu Hawaii USA May 2002 ACM Press
[HG98] Stephanie W Haas and Erika S GramsPage and link classifications connecting diverseresources In DL rsquo98 Proceedings of the third ACM conference on Digital libraries pages99ndash107 New York NY USA 1998 ACM Press
[HK03a] Taher Haveliwala and Sepandar KamvarThe condition number of the pagerank problemTechnical Report 36 Stanford University 2003
[HK03b] Taher Haveliwala and Sepandar KamvarThe second eigenvalue of the google matrix Tech-nical Report 20 Stanford University 2003
[JM98] W-K Joo and S H Myaeng Improving retrieval effectiveness with hyperlink informationIn Proceedings of International Workshop on Information Retrieval with Asian Languages(IRAL) Singapore October 1998
[KHMG03a] S Kamvar T Haveliwala C Manning and G GolubExploiting the block structure of theweb for computing pagerank 2003
[KHMG03b] Sepandar D Kamvar Taher H Haveliwala Christopher D Manning and Gene H GolubExtrapolation methods for accelerating pagerank computations In Proceedings of the twelfthinternational conference on World Wide Web pages 261ndash270 ACM Press 2003
[Kle99] Jon M KleinbergAuthoritative sources in a hyperlinked environment Journal of the ACM46(5)604ndash632 1999
19
[KRR+00] R Kumar P Raghavan S Rajagopalan D Sivakumar A Tomkins and E UpfalStochasticmodels for the web graph In Proceedings of the 41st Annual Symposium on Foundations ofComputer Science (FOCS) pages 57ndash65 Redondo Beach CA USA 2000 IEEE CS Press
[LGZ04] Chris P Lee Gene H Golub and Stefanos A ZeniosA fast two-stage algorithm for com-puting pagerank and its extensions Technical report Stanford University 2004
[Li98] Yanhong LiToward a qualitative search engine IEEE Internet Computing July 1998
[Lif00] Maxim LifantsevVoting model for ranking Web pages In Peter Graham and MuthucumaruMaheswaran editorsProceedings of the International Conference on Internet Computingpages 143ndash148 Las Vegas Nevada USA June 2000 CSREA Press
[Mar97] Massimo MarchioriThe quest for correct information of the Web hyper search engines InProc of the sixth international conference on the Web Santa Clara USA April 1997
[NSW01] M E Newman S H Strogatz and D J WattsRandom graphs with arbitrary degree distri-butions and their applicationsPhys Rev E Stat Nonlin Soft Matter Phys 64(2 Pt 2) August2001
[PBMW98] Lawrence Page Sergey Brin Rajeev Motwani and Terry WinogradThe PageRank citationranking bringing order to the Web Technical report Stanford Digital Library TechnologiesProject 1998
[PRU02] Gopal Pandurangan Prabhakara Raghavan and Eli UpfalUsing Pagerank to characterizeWeb structure In Proceedings of the 8th Annual International Computing and CombinatoricsConference (COCOON) volume 2387 ofLecture Notes in Computer Science pages 330ndash390Singapore August 2002 Springer
[SPM05] Padmini Srinivasan Gautam Pant and Filippo MenczerA general evaluation framework fortopical crawlers Information Retrieval 8(3)417ndash447 2005
[UCH03] Trystan Upstill Nick Craswell and David HawkingPredicting fame and fortune Page-rank or indegree In Proceedings of the Australasian Document Computing SymposiumADCS2003 pages 31ndash40 Canberra Australia December 2003
[WHAT04] Fang Wu Bernardo A Huberman Lada A Adamic and Joshua R TylerInformation flow insocial groups Physica A Statistical and Theoretical Physics 337(1-2)327ndash335 June 2004
[WS98] Duncan J Watts and Steven H StrogatzCollective dynamics of small-world networksNa-ture 393(6684)440ndash442 June 1998
20
- Introduction
- Functional Rankings
- Damping Functions
-
- Linear damping
- Exponential damping PageRank
- Quadratic hyperbolic damping TotalRank
- General hyperbolic damping HyperRank
- An empirical damping
-
- Comparing Damping Functions
-
- Approximating TotalRank with PageRank
- Approximating HyperRank with PageRank
- Approximating PageRank with LinearRank
-
- Parameters for the Damping Functions
-
- Characteristic path lengths
- Damping parameters and in-degree
- Experimental comparison of ranking orders
-
- Conclusions
-
is used for weighting pages Note that the above matrix is ergodic (at least if every entry ofv is strictlypositive) so it has exactly one stationary distribution Even though most of our results can be easily restatedwith a non-uniform preference vectorv for the sake of clarity we shall only consider the uniform preference1N in the rest of the paper
As observed in [BSV05] the PageRank vectorr(α) can be written as
r(α) = (1minusα)infin
sumt=0
αt 1N
1Pt
Or in matricial form
r(α) = (1minusα)1N
1(I minusαP)minus1 ||αP||lt 1
There is an equivalent and actually very intriguing way of rewriting this formula mentioned in [NSW01]that leads to a conclusion similar to those of [Bri06] given a pathp= 〈x1x2 xk〉 in the graph we defineits branching contributionas follows
branching(p) =1
d1d2 middot middot middotdkminus1
whered j is the outdegree this is the number of outgoing arcs of nodex j Then the ranking of nodei according to PageRank is
r i(α) = sumpisinPath(minusi)
(1minusα)α|p|
Nbranching(p)
where Path(minus i) is the set of all paths into nodei and |p| is the length of pathp this is because(Pt)i j
contains the sum of the branching contributions of all paths of lengtht from i to j as one can easily show byinduction ont This way of expressing the PageRank of a node is interesting because it highlights the factthat the rank of a node is essentially obtained as a weighted sum of contributions coming from every pathentering into the node with weights that decay exponentially in the length of the path
A natural generalization of this idea consists in taking into consideration a rankingR of the generalform
R =infin
sumt=0
damping(t)1N
1middotPt
or equivalently
Ri = sumpisinPath(minusi)
damping(|p|) 1N
branching(p)
where the damping function is a suitable choice of weights We will refer to this form of ranking asfunc-tional ranking because it is parametrized by the damping function As we have seen generic PageRank isa functional ranking where the damping function
damping(t) = (1minusα)αt
decays exponentially fast
3
3 Damping Functions
In this section we show several functional rankings by describing their damping functions First we showwhich class of damping functions generate well-defined functional rankings
As shown in [Bri06 Corollary 24] for every pair of nodesi and j and for every lengtht
sumpisinPath(i j)|p|=t
branching(p)le 1
A more general property holds
Theorem 1 For every node i and every length t
sumpisinPath(iminus)|p|=t
branching(p) = 1
Proof By induction ont For t = 0 there is only one path fromi of length 0 and its branching is 1 Forthe inductive stepsumpisinPath(iminus)|p|=t+1branching(p) can be rewritten by observing that ifi has outdegreedi every path fromi of lengtht +1 is the concatenation ofi with a path of lengtht from an out-neighbor ofi
sumpisinPath(iminus)|p|=t+1
branching(p) = sumjirarr j
1di
sumpisinPath( jminus)|p|=t
branching(p) = sumjirarr j
1di
= 1
As a consequence to guarantee that the functional ranking is well-defined and normalized (ie that rankvalues sum to 1) we need
N
sumi=1
sumpisinPath(minusi)
damping(|p|) 1N
branching(p) = 1
that isinfin
sumt=0
damping(t)1N sum
pisinPath(minusminus)branching(p) = 1
UsingTheorem1 sumpisinPath(minusminus) branching(p) = N so the latter equality is equivalent to
infin
sumt=0
damping(t) = 1
Hence every choice of the damping function such thatsuminfint=0damping(t) = 1 yields a well-defined nor-
malized functional ranking Nonetheless not all choices are equivalent so we have to find out whichfunctions generate better rankings Since a direct link should be more valuable as a source of evidence thana distant link we focus on damping functions that are decreasing ont the length of the paths
31 Linear damping
Letrsquos start by considering a simple damping function such as
damping(t) =
2(Lminust)L(L+1) t lt L
0 t ge L
4
this is a damping function that decreases linearly with distance and reaches zero at distanceL The trivialcaseL = 1 gives a uniform ranking andL = 2 is ranking by indegree as in the latter case all paths of lengthge 2 are not considered
From the definition
R =infin
sumt=0
damping(t)vPt
=L
sumt=0
2(Lminus t)L(L+1)
vPt
=2
L(L+1)v
Lminus1
sumt=0
(Lminus t)Pt
=2
L(L+1)v(L(I minusP)minusP(I minusPL))((I minusP)2)minus1
provided that(I minusP)2 is not singularAn advantage of this type of ranking is that only the first few levels are considered so the computation
is fast and the number of iterations is fixed
Computation For computing this functional ranking we can define the following sequence
R(0) =2
L+1v
R(k+1) =(Lminuskminus1)
(Lminusk)RkP
The functional ranking with linear damping issumLminus1k=0 R(k) An algorithm for computing this ranking
shown inFigure1 arises directly from this summation
32 Exponential damping PageRank
As we already noted PageRank can be seen as a functional ranking where the damping function decaysexponentially
damping(t) = (1minusα)αt
As longer paths have less importance in the calculation of PageRank it could be approximated by usingonly a few levels of links In [CGS04] it is shown that by using only the nodes at distance 1 from a targetnode (equivalent to linear damping withL = 2) PageRank can be approximated with 30 of average errorUsing nodes at distance 2 the average error drops to 20 and at distance 3 to 10 After that there are nosignificant improvements by adding more levels and the cost (the number of nodes to be explored) is muchhigher
Computation Since PageRank is the principal eigenvector of the modified graph matrix it can be easilyapproximated by the iterative Power Method algorithm as suggested by Pageet al in their original pa-per [PBMW98] this iterative algorithm gives good approximations (both in norm and with respect to theinduced node order) in few iterations even though convergence speed and numerical stability decay whenαgets close to 1 [HK03b HK03a] Other methods to compute PageRank have been proposed some of them
5
Require L maximum path length N number of nodesv preference vector1 for i 1 N do Initialization2 S[i] larr R[i] larr 2v[i](L+1)3 end for4 for step 1 Lminus1 do Iteration step5 Auxlarr 06 for i 1 N do Follow links in the graph7 for all j such that there is a link fromi to j do8 Aux[j] larr Aux[j] + R[i]outdegree(i)9 end for
10 end for11 for i 1 N do Add to ranking value12 R[i] larr Aux[i] times (Lminusstep)
(Lminus(stepminus1))13 S[i] larr S[i] + R[i]14 end for15 end for16 return R
Figure 1 Algorithm for computing a functional ranking with linear damping
using techniques for the solution of systems of linear equations some other concentrating on some specificfeatures of the web as a graph that determine forms of locality in the computation of PageRank (see forexample [PBMW98 Hav99 GG04 LGZ04 KHMG03b KHMG03a])
33 Quadratic hyperbolic damping TotalRank
Recently a ranking method called TotalRank [Bol05] has been proposed The method aims at eliminatingthe necessity for an arbitrary parameter by integrating PageRank over the entire range ofα If r(α) is thevector of PageRank then TotalRank is defined as
T =int 1
0r(α)dα
T can be written as
int 1
0r(α)dα =
1N
infin
sumt=0
int 1
0(1minusα)αt1middotPtdα
=1N
infin
sumt=0
1(t +1)(t +2)
1middotPt
By using the definition of the logarithm of a matrix
ln(I minusP) =minusinfin
sumk=1
Pk
k
we can write TotalRank as
T = Pminus1(I +(I minusPminus1) ln(I minusP))
6
provided thatPminus1 is not singular andP 6= I TotalRank is as a weighted sum of the score associated with paths of varying lengths in which the
weights are hyperbolically decreasing on the lengths of the paths In other words TotalRank is a functionalranking with damping function
damping(t) =1
(t +1)(t +2)
and it is well defined assuminfint=0damping(t) = 1
Computation It is known that the cost of calculating TotalRank is the same as the cost of calculatingPageRank via the Power Method [BSV05] even though some more iterations are required to obtain thesame precision
34 General hyperbolic damping HyperRank
TotalRank is part of a more general family of weighting schemes for paths of different lengths that can beapproximated using
s(β) =1
Nζ(β)
infin
sumt=0
1
(t +1)β 1middotPt
Again this way of ranking follows the general scheme with damping function chosen as
damping(t) =1
ζ(β)(t +1)β
Here we are using Riemannrsquos zeta functionζ(β) = suminfint=1
1tβ for normalization and we needβ gt 1 for it
to converge Note that whenβ = 2 we get weights similar to those of TotalRank in which thet-th coefficientis 1(t +1)(t +2) whereas here it is 1ζ(2)(t +1)2
A meaningful choice forβ should be done considering the distribution of paths of different lengths in ascale-free graph A largeα in PageRank or a smallβ in HyperRank means increasing the effect of longerpaths in the score
Computation Figure2 shows an algorithm for approximating HyperRank Let us define a vector se-quenceR(t) as follows
R(0) =1
Nζ(β)
R(k+1) =(
k+1k+2
)βR(k)P
It is easy to see thatsuminfint=0R(k) = s(β) becauseR(k) = 1(N middotζ(β)(k+1)β))1middotPk this observation leads
to the algorithm Note that convergence speed is much slower than ordinary PageRank especially whenβis close to 1 the norm of thek-th summand being bound by 1(1+ 1k)β Interestingly enough thoughconvergence speed is reasonable ifβ is sufficiently large
7
Require N number of nodesv preference vectorβ damping parameter1 for i 1 N do Initialization2 S[i] larr R[i] larr v[i]ζ(β)3 end for4 klarr 05 while not convergeddo Iteration step6 klarr k + 17 Auxlarr 08 for i 1 N do Follow links in the graph9 for all j such that there is a link fromi to j do
10 Aux[j] larr Aux[j] + R[i]outdegree(i)11 end for12 end for13 for i 1 N do Add to ranking value14 R[i] larr Aux[i] times(k(k+1))β
15 S[i] larr S[i] + R[i]16 end for17 end while18 return S
Figure 2 An algorithm to compute general hyperbolic rank
35 An empirical damping
An empirical damping function would consider how much the value of an endorsement decreases by fol-lowing longer paths in the real web graph This cannot be known exactly but we can attempt to measureit indirectly Pages that link to each other are more similar than pages chosen at random [Dav00] evi-dence from topical crawlers [SPM05] shows that when doing breadth-first exploring the topic ldquodriftsrdquo asthe distance increases On the same line of though we propose to use the decrease of text similarity as anapproximation to an ldquoempiricalrdquo damping function
To find out which is the correlation between link-distance and similarity we performed the followingexperiment we considered a web graph corresponding to a partial snapshot of theuk domain and sampled200 nodes at random For each sampled node we followed links backwards to obtain nodes at a minimumdistance of 1 2 3 4 or 5 links Then we sampled 12000 pairs at each minimum distance at randomand computed their similarities with the original nodes Similarity was measured using the normalization ofTFIDF [BYRN99] without stemming nor stopwords removal
The resulting averages are shown inFigure3 with standard deviation error bars Text similarity clearlydecreases with distance and in some applications the empirical distribution of text similarity versus distancecould be used as an ldquoempiricalrdquo damping function Different measures of text similarity can yield differentdistributions for instance [WHAT04] uses the number of repeated words and phrases between pages andobtains a faster decrease in similarity
8
02
03
04
05
06
07
1 2 3 4 5
Ave
rage
text
sim
ilari
ty
Link distance
Figure 3 Link distance vs average text similarity
4 Comparing Damping Functions
A comparison of the damping functions described in the previous section is shown inFigure4 of coursehyperbolic damping functions decay asymptotically more slowly than exponential damping but notice thatfor short paths the latter may dominate the former in many cases
000
005
010
015
020
025
1 2 3 4 5 6 7
Wei
ght
Length of the path (t)
PageRank α=075PageRank α=050
TotalRankHyperRank β=2HyperRank β=3
Linear damping L=10
Figure 4 Weights given by the different damping functions for some values ofαandβ
In this section we aim at analyzing how similar are these functional rankings and how could we useone of the damping functions to approximate another with a suitable choice of parameters
9
41 Approximating TotalRank with PageRank
It has been shown experimentally that the rank correlation (Kendallrsquosτ) between TotalRank and PageRankis maximal whenα asymp 07 [Bol05] the maximum value forτ is over 095 meaning that for that specificchoice ofα PageRank and TotalRank induce almost equivalent ranking orders
We now want to approach the same problem in an analytic fashion more precisely we aim at study-ing the difference between TotalRank and PageRank by calculating the difference between their respectivedamping functions
dampingTotalRank(t) =1
(t +1)(t +2)dampingPageRank(α)(t) = (1minusα)αt
As they are normalized both damping functions have the same summation over the entire range oftOur approach is to consider the summation of their differences up to a maximum length for a path As thetwo functions are decreasing the difference in the first levels makes most of the difference in the rankingsIf ` is the maximum path length we are interested in we aim at minimizing this sum
`
sumt=0
(1
(t +1)(t +2)minus (1minusα)αt
)= α`+1minus 1
`+2
The minimum absolute value is 0 and it is obtained whenα is equal to
αlowast(`) =1
`+1radic
`+2= 1minus log`
`+O
(log2`
`2
)
Figure5 showsαlowast(`) as a function of Recall that for the World-Wide Web graph the average lengthof a path between two nodes when a path exists has been estimated in about 16 [BKM+00] or 19 [AJB99]but clearly today is over 20 Now in the range of path lengths between 15 and 20 the value ofαlowast(`)parameters that minimizes the difference between the exponentially decaying weights of PageRank and thehyperbolically decaying weights of TotalRank is roughly 085
Difference Optimal value
0506
0708
09
α0 5 10 15 20 25
Length
-02
-01
0
01
02
03
04
045
050
055
060
065
070
075
080
085
090
0 5 10 15 20 25
α that
min
imiz
es th
e di
ffer
ence
of
sum
of w
eigh
ts w
ith T
otal
Ran
k
Maximum length of the paths considered
Figure 5 Bestαlowast(`) for minimizing the difference of the sum of weights betweenPageRank and TotalRank
10
42 Approximating HyperRank with PageRank
Now we want to approximate the weights of
s(β) =1
Nζ(β)
infin
sumt=0
1
(t +1)β Pt
using the weights of
r(α) =1minusα
N
infin
sumt=0
αtPt
and we proceed again by considering paths up to a certain length
`
sumt=0
(1
ζ(β)(t +1)β minus (1minusα)αt
)The minimum can be zero and it is attained at
αlowast(`β) =
radic1minus 1
ζ(β)
`
sumt=0
1
(t +1)β
Theα that minimizes the difference of weights for different values ofβ and of the maximum path lengths` is shown inFigure6 In the case ofβ = 2 for instance for path lengths up to 10 to 20 the bestα is between075 and 085
α
15202530β 5 10 15 20 25
Length
03
04
05
06
07
08
09
1
00
01
02
03
04
05
06
07
08
09
10
1 15 2 25 3 35 4
α th
at m
inim
izes
the
diff
eren
ce o
fsu
m o
f wei
ghts
Exponent β
Max path length=25Max path length=20Max path length=15Max path length=10
Max path length=5
Figure 6 Bestα for minimizing the difference of the sum of weights betweenPageRank and HyperRank with exponentβ for various path lengths
11
43 Approximating PageRank with LinearRank
For approximating the damping function of PageRank with the damping function of LinearRank we con-sider the summation of the differences up to a certain path length If`le L
`
sumt=0
((1minusα)αt minus 2(Lminus t)
L(L+1)
)And if ` gt L
Lminus1
sumt=0
((1minusα)αt minus 2(Lminus t)
L(L+1)
)+
`
sumt=L
(1minusα)αt
We will assume thatle L so the evaluation of the difference between the two rankings is done in an areain which both rankings have non-zero values TheL that minimizes the difference for a given combinationof α and` is
Llowast(α `) = `+2`α`+1 +α`+1 +1+
radic(1+α`+1)2 +4`(`+2)α`+1
2(1minusα`+1)
= `+1+O(`α(`+1)2
)and we have plotted it for different values ofα and` in Figure7
L
0405
0607
0809
α5 10 15 20 25
Length
5
10
15
20
25
30
5
10
15
20
25
30
35
40
05 06 07 08 09
L th
at m
inim
izes
the
diff
eren
ce o
fsu
m o
f wei
ghts
Exponent α
Max path length=25Max path length=20Max path length=15Max path length=10Max path length=5
Figure 7 BestL for minimizing the difference of the sum of weights betweenLinearRank and LinearRank for various path lengths
12
5 Parameters for the Damping Functions
For our experiments we used several snapshots from the Web including theuk it andeuint domainsFor comparison we also considered a synthetic scale-free network produced according to the evolving modeldescribed by Kumaret al [KRR+00] (a combination of preferential attachment and random links) with theparameters suggested by Panduranganet al [PRU02] As far as the latter is concerned in the generatedgraph the exponents for the power-law in the center part of the distributions are -21 for in-degree andPageRank and -27 for out-degree we generated a 100000-nodes graph without disconnected nodes
In this section we study the behavior of the ranking functions for varying values of their parameters
51 Characteristic path lengths
In scale-free networks the distances between pairs of nodes follow a Gaussian distribution [AJB99] (theaverage is not given in their paper) Analytic estimations for the average distance of a graph of scale-freenetwork ofn nodes include
bull O(log(n)) [WS98]
bull O(log(n) log(np)) in sparse graphs withp links [CL01]
bull 1+ log(nz1) log(z2z1) wherez1 is the average indegree andz2 is the average number of nodes atdistance 2 [NSW01] and
bull O(log(n) log(log(n))) [BR04]
We did the following experiment starting from a node picked at random we followed the links back-wards and counted the number of nodes at different distancesFigure8 plots the average distances foundwhich appear to be growing (sub)logarithmically with the size of the graphFigure9 shows the distributionobtained in each sample For this experiment we are not counting the pages without in-links
3456789
10111213141516
105 106 107 108
Ave
rage
dis
tanc
e
Number of nodes (N)
Figure 8 Average distances versus number of nodes
If graphs of different sizes show different path lengths what is the effect of this in the ranking calcu-lation Letrsquos suppose that for a graph withN1 nodes it is found by experimental or analytic means that agood parameter for PageRank isαlowast1 Now we would like to have a good parameterαlowast2 for a graph with thesame properties except that the size of the new graph isN2 lt N1
13
it Web graph uk Web graph40 mill nodes 18 mill nodes
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
Avg distance 149 Avg distance 148
euint Web graph Synthetic graph800000 nodes 100000 nodes
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
Avg distance 100 Avg distance 42
Figure 9 Distribution of the average number of nodes at a certain distance froma given node
One possible approach consistent with what we have done so far is to consider that the sum of theweights up to the average path lengths of the graphs (L1 L2) have to be similar for both rankings to behavein a similar way If we take this approach the solution is
1minus (αlowast1)L1+1 = 1minus (αlowast2)
L2+1
αlowast2 = (αlowast1)L1+1L2+1
αlowast2 asymp (αlowast1)log(N1)log(N2)
An example that can be used in practice is the following letrsquos consider a web graph withN1 = 115times109
pages (the size of the full Web estimated by [GS05]) and another graph with onlyN2 = 50times106 pages (thesize of the Web of a large country) the second graph is roughly 3 orders of magnitude smaller
If it is shown empirically thatαlowast1 = 085 is a good value for the PageRank parameter for the whole Webthenαlowast2 = 081 should have a similar behavior in the 50-million page set which is natural as the path lengthsare shorter If the subset of web pages were even smaller for instanceN2 = 106 pages (the size of the webof a large organization) thenαlowast2 = 076
14
52 Damping parameters and in-degree
In this section we are using data from theuk Web graph and a 8500-nodes synthetic graph with the sameindegree and outdegree distribution than the previous synthetic graph We first measured the variance ofthe values from the ranking function as we consider that a high variance is good in a ranking functionas the relative values differ more We also measured the relationship between the ranking function andin-degree for different values of the parameters in terms of covariance correlation coefficient and rankingorders (Kendallrsquosτ) The results are shown inFigure10
Variance Covariance Correlation coefficient Kendallrsquosτ
000e+00200e-13400e-13600e-13800e-13100e-12120e-12140e-12160e-12180e-12200e-12
01 02 03 04 05 06 07 08 09
Var
ianc
e of
Pag
eRan
k
Damping factor α
00 times 100
50 times 10-5
10 times 10-4
15 times 10-4
20 times 10-4
25 times 10-4
30 times 10-4
35 times 10-4
01 02 03 04 05 06 07 08 09
Cov
aria
nce
betw
een
Page
Ran
k an
d in
-deg
ree
Damping factor α
07000710072007300740075007600770078007900800
01 02 03 04 05 06 07 08 09
Corr
elat
ion
coef
ficie
nt w
ith in
-deg
ree
Damping factor α
042
044
046
048
050
052
054
056
01 02 03 04 05 06 07 08 09Ken
dallrsquo
s τ b
etw
een
Page
Rank
and
in-d
egre
e
Damping factor α
005
010
015
020
025
030
035
040
045
050
055
01 02 03 04 05 06 07 08 09
Var
ianc
e of
Pag
eRan
k
Damping factor α
0 times 100
1 times 10-3
2 times 10-3
3 times 10-3
4 times 10-3
5 times 10-3
6 times 10-3
01 02 03 04 05 06 07 08 09
Cov
aria
nce
betw
een
Page
Ran
k an
d in
-deg
ree
Damping factor α
0995
0996
0997
0998
0999
1000
01 02 03 04 05 06 07 08 09
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor α
079
080
081
082
083
084
085
086
087
01 02 03 04 05 06 07 08 09Ken
dallrsquo
s τ b
etw
een
Page
Rank
and
in-d
egre
e
Damping factor α
Figure 10 Variance of PageRank and its relationship with indegree in terms ofcovariance correlation coefficient and Kendallrsquosτ coefficient for varying valuesof the parameterα Top uk web graph bottom synthetic graph
The variance is higher asα increases In regard to the relationship with indegree for company homepages it has been shown that the logarithm of the in-degree is correlated with PageRank [UCH03] Ourresults are consistent with this observation The covariance monotonically increases withα both in the caseof the synthetic graph and in the web graph Not surprisingly using the generative model the correlationsare higher We observe a maximum correlation atα = 07 in the synthetic graph and atα = 05 in the webgraph We also notice that the correlation drops significativelly asα increases because a largeα means thatlonger paths have an effect in the calculation note however that this phenomenon does not significantlyimpact on the correlation coefficient that is still very large
A high correlation between PageRank and in-degree is bad from the point of view of a search enginebecause it makes link-spam easier In particular as the correlation coefficient is higher in theuk web graphnear 05 if we chooseα close to this value we are helping link spammers Note however that a highcorrelation was foreseeable because as shown in[CGS04] even approximating PageRank with just only 1level of links gets 70 of accuracy
The behavior of the Kendallrsquosτ coefficient which measures the similarity between ranking orders isthe opposite than the one observed in the real graph This also happens for HyperRank as inFigure11we made the same measurements for this functional ranking and the results were consistent The graphseems inverted because a low value ofβ has the same effect as a high value ofα longer paths have moreimportance in the calculation
The differences in the behavior of the ranking order in the synthetic graph might be explained by thefact that the generative model we are using does not capture some properties that might be relevant for theranking order under a functional ranking such as the clustering coefficient Also the synthetic graph is
15
Variance Covariance Correlation coefficient Kendallrsquosτ
000e+00
100e-12
200e-12
300e-12
400e-12
500e-12
600e-12
700e-12
800e-12
900e-12
1 15 2 25 3 35 4
Var
ianc
e of
Gen
eral
Hyp
erbo
lic R
ank
Damping factor β
000e+00
100e-04
200e-04
300e-04
400e-04
500e-04
600e-04
700e-04
800e-04
1 15 2 25 3 35 4Cov
aria
nce
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
0730
0740
0750
0760
0770
0780
0790
0800
1 15 2 25 3 35 4
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor β
04600
04700
04800
04900
05000
05100
05200
05300
1 15 2 25 3
Ken
dallrsquo
s τ
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
000e+00
200e-05
400e-05
600e-05
800e-05
100e-04
120e-04
140e-04
160e-04
1 15 2 25 3 35 4
Var
ianc
e of
Gen
eral
Hyp
erbo
lic R
ank
Damping factor β
000e+00
200e-03
400e-03
600e-03
800e-03
100e-02
120e-02
1 15 2 25 3 35 4Cov
aria
nce
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
0997
0997
0998
0998
0999
0999
1 15 2 25 3 35 4
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor β
0815008200082500830008350084000845008500085500860008650
1 15 2 25 3 35 4
Ken
dallrsquo
s τ b
etw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
Figure 11 Variance of general hyperbolic rank for some values ofβ and itsrelation with in-degree Experiments have been performed on theuk graph (top)and synthetic graph (bottom)
assortative (highly linked pages are mostly linked to other highly linked pages) while the real web graph isdisassortative (most of the neighbours of highly linked pages have small indegree)
53 Experimental comparison of ranking orders
In this section we present experimental results about the similarity between the ranking orders induced bysome of the functional rankings discussed in the previous sections To perform the comparison we usedKendallrsquosτ and data from the U K Web graphFigure12shows how PageRank compares with HyperRankfor various pairs ofα andβ In the limit αβrarr 1 both rankings are equivalent and they remain similar in alarge region of the parameter space
α
β
07
08
09
1
τ ge 095
τ
02 03
04 05
06 07
08 09
2 25
3 35
4
01
02
03
04
05
06
07
08
09
15 2 25 3 35 4
α th
at m
axim
izes
Ken
dallrsquo
s τ
Exponent β
Actual optimumPredicted optimum with length=5
Figure 12 Comparison (using Kendallrsquosτ) between PageRank and HyperRankwith various damping parameters in the UK web graph The optimum predictedin the analysis with = 5 is very close to the real one
In the figure we can see that the rankings obtained with HyperRank and PageRank can be almostequivalent (Kendallrsquosτ ge 095) moreover the analysis shown insection42 considering only paths of
16
lengths less than 5 provides a very good approximation for the optimum combination of parameters Thismeans that in fact the difference in the damping functions in the first few levels is crucial
The exponentsβ required for giving a good approximation of PageRank are very small whenα ge 07limiting the practical applicability of HyperRank as its convergence is not faster than the one of PageRank
As far as LinearRank and PageRank are concerned long paths and largeα should be considered toobtain a sufficiently similar ranking as shown inFigure13
05 06
07 08
09
5 10
15 20
25
080085090095100
τ
τ ge 095
αL
5
10
15
20
25
05 06 07 08 09L
that
max
imiz
es K
enda
llrsquos
τExponent α
Actual optimumPredicted optimum with length=5
Figure 13 Comparison (using Kendallrsquosτ) between PageRank and LinearRankin the UK web graph with various damping parameters Again the predictedoptimum with` = 5 is very close to the actual optimum
The predicted optimum given insection43 with ` = 5 this is considering only the summation of thedifferences between both damping functions up to paths of length 5 is very close to what was obtained inpractice Forα = 08 calculating LinearRank withL = 10 (which means the same number of iterations)givesτ ge 098 for α = 09 calculating LinearRank withL = 15 also givesτ ge 098 In both cases theranking order of PageRank is approximated by the ranking order of LinearRank with very high precision
6 Conclusions
In this paper we have defined a broad class of link-based ranking algorithms based on the contribution ofdamping factors along all different paths reaching a page We introduce four particular damping decays lin-ear exponential quadratic hyperbolic and general hyperbolic where exponential is equivalent to PageRankand quadratic hyperbolic to TotalRank
We studied the differences and similarities among these ranking algorithms and we have found that
bull Functional rankings using different damping functions can provide similar orderings if the parametersare chosen carefully
bull LinearRank can be used for calculating a ranking that is as good as PageRank but with a fixed andsmaller number of iterations
bull The parameters for the damping functions depend on the characteristic of path lengths in the graphwhich is known to grow sub-logarithmically on the size of the graph
More work is needed to find other damping functions that compute rankings similar to PageRank butare easier and faster to compute We use global ranking similarity but another measure could be the ranking
17
similarity in the top 20 results of real queries In this setting our results can change so future work willinclude this variation
Because of their high cost link-based ranking methods that involve iterative calculations at query timeare probably not used by large-scale search engines at this moment but the functional ranking with lineardamping we have presented can provide a good approximation with few iterations Also the approach wehave presented could be also applied to multivalued ranking functions such as HITS [Kle99] and topic-sensitive PageRank [Hav02] to obtain for instance a method for approximating the hubs and authorityscores using less iterations and a linear damping function
Our approach also helps to understand how easy or difficult is to collude many pages to modify theranking of a given page Clearly there are many different factors path lengths damping function branchingdegrees and number of colluded pages The graph structure of the collusion will affect those factors and weplan to analyze them
Acknowledgements
We would like to thank Daniel Fogaras for a valuable discussion about TotalRank that motivated part of thisresearch
References
[AJB99] Reka Albert Hawoong Jeong and Albert L Barabasi Diameter of the world wide webNature 401130ndash131 1999
[BH98] Krishna Bharat and Monika R HenzingerImproved algorithms for topic distillation in ahyperlinked environment In Proceedings of the 21st Annual International ACM SIGIR Con-ference on Research and Development in Information Retrieval pages 104ndash111 MelbourneAustralia August 1998 ACM Press New York
[BKM +00] Andrei Broder Ravi Kumar Farzin Maghoul Prabhakar Raghavan Sridhar RajagopalanRaymie Stata Andrew Tomkins and Janet WienerGraph structure in the web Experimentsand models In Proceedings of the Ninth Conference on World Wide Web pages 309ndash320Amsterdam Netherlands May 2000 ACM Press
[Bol05] Paolo BoldiTotalrank ranking without damping In Poster proceedings of the 14th interna-tional conference on World Wide Web pages 898ndash899 Chiba Japan 2005 ACM Press
[BR04] Bela Bollobas and Oliver RiordanThe diameter of a scale-free random graph Combinator-ica 24(1)5ndash34 2004
[Bri06] Michael Brinkmeier PageRank revisited ACM Transaction on Internet Technologies6(3)257ndash279 2006
[BSV05] Paolo Boldi Massimo Santini and Sebastiano VignaPagerank as a function of the dampingfactor In Proceedings of the 14th international conference on World Wide Web pages 557ndash566 Chiba Japan 2005 ACM Press
[BYD04] Ricardo Baeza-Yates and Emilio DavisWeb page ranking using link attributes In Alternatetrack papers amp posters of the 13th international conference on World Wide Web pages 328ndash329 New York NY USA 2004 ACM Press
18
[BYRN99] Ricardo Baeza-Yates and Berthier Ribeiro-NetoModern Information Retrieval AddisonWesley May 1999
[CGS04] Yen-Yu Chen Qingqing Gan and Torsten SuelLocal methods for estimating pagerank val-ues In Proceedings of the thirteenth ACM conference on Information and knowledge man-agement (CIKM) pages 381ndash389 New York NY USA 2004 ACM Press
[CL01] Fan Chung and Linyuan LuThe diameter of random sparse graphs Adv Appl Math26257ndash279 2001
[Dav00] Brian D DavisonTopical locality in the web In Proceedings of the 23rd annual internationalACM SIGIR conference on research and development in information retrieval pages 272ndash279Athens Greece 2000 ACM Press
[GG04] Gene H Golub and Chen GreifArnoldi-type algorithms for computing stationary distributionvectors with application to pagerank Technical Report SCCM-04-15 Stanford University2004
[GS05] Antonio Gulli and Alessio SignoriniThe indexable Web is more than 115 billion pages InPoster proceedings of the 14th international conference on World Wide Web pages 902ndash903Chiba Japan 2005 ACM Press
[Hav99] Taher HaveliwalaEfficient computation of pagerank Technical report Stanford University1999
[Hav02] Taher H HaveliwalaTopic-sensitive pagerank In Proceedings of the Eleventh World WideWeb Conference pages 517ndash526 Honolulu Hawaii USA May 2002 ACM Press
[HG98] Stephanie W Haas and Erika S GramsPage and link classifications connecting diverseresources In DL rsquo98 Proceedings of the third ACM conference on Digital libraries pages99ndash107 New York NY USA 1998 ACM Press
[HK03a] Taher Haveliwala and Sepandar KamvarThe condition number of the pagerank problemTechnical Report 36 Stanford University 2003
[HK03b] Taher Haveliwala and Sepandar KamvarThe second eigenvalue of the google matrix Tech-nical Report 20 Stanford University 2003
[JM98] W-K Joo and S H Myaeng Improving retrieval effectiveness with hyperlink informationIn Proceedings of International Workshop on Information Retrieval with Asian Languages(IRAL) Singapore October 1998
[KHMG03a] S Kamvar T Haveliwala C Manning and G GolubExploiting the block structure of theweb for computing pagerank 2003
[KHMG03b] Sepandar D Kamvar Taher H Haveliwala Christopher D Manning and Gene H GolubExtrapolation methods for accelerating pagerank computations In Proceedings of the twelfthinternational conference on World Wide Web pages 261ndash270 ACM Press 2003
[Kle99] Jon M KleinbergAuthoritative sources in a hyperlinked environment Journal of the ACM46(5)604ndash632 1999
19
[KRR+00] R Kumar P Raghavan S Rajagopalan D Sivakumar A Tomkins and E UpfalStochasticmodels for the web graph In Proceedings of the 41st Annual Symposium on Foundations ofComputer Science (FOCS) pages 57ndash65 Redondo Beach CA USA 2000 IEEE CS Press
[LGZ04] Chris P Lee Gene H Golub and Stefanos A ZeniosA fast two-stage algorithm for com-puting pagerank and its extensions Technical report Stanford University 2004
[Li98] Yanhong LiToward a qualitative search engine IEEE Internet Computing July 1998
[Lif00] Maxim LifantsevVoting model for ranking Web pages In Peter Graham and MuthucumaruMaheswaran editorsProceedings of the International Conference on Internet Computingpages 143ndash148 Las Vegas Nevada USA June 2000 CSREA Press
[Mar97] Massimo MarchioriThe quest for correct information of the Web hyper search engines InProc of the sixth international conference on the Web Santa Clara USA April 1997
[NSW01] M E Newman S H Strogatz and D J WattsRandom graphs with arbitrary degree distri-butions and their applicationsPhys Rev E Stat Nonlin Soft Matter Phys 64(2 Pt 2) August2001
[PBMW98] Lawrence Page Sergey Brin Rajeev Motwani and Terry WinogradThe PageRank citationranking bringing order to the Web Technical report Stanford Digital Library TechnologiesProject 1998
[PRU02] Gopal Pandurangan Prabhakara Raghavan and Eli UpfalUsing Pagerank to characterizeWeb structure In Proceedings of the 8th Annual International Computing and CombinatoricsConference (COCOON) volume 2387 ofLecture Notes in Computer Science pages 330ndash390Singapore August 2002 Springer
[SPM05] Padmini Srinivasan Gautam Pant and Filippo MenczerA general evaluation framework fortopical crawlers Information Retrieval 8(3)417ndash447 2005
[UCH03] Trystan Upstill Nick Craswell and David HawkingPredicting fame and fortune Page-rank or indegree In Proceedings of the Australasian Document Computing SymposiumADCS2003 pages 31ndash40 Canberra Australia December 2003
[WHAT04] Fang Wu Bernardo A Huberman Lada A Adamic and Joshua R TylerInformation flow insocial groups Physica A Statistical and Theoretical Physics 337(1-2)327ndash335 June 2004
[WS98] Duncan J Watts and Steven H StrogatzCollective dynamics of small-world networksNa-ture 393(6684)440ndash442 June 1998
20
- Introduction
- Functional Rankings
- Damping Functions
-
- Linear damping
- Exponential damping PageRank
- Quadratic hyperbolic damping TotalRank
- General hyperbolic damping HyperRank
- An empirical damping
-
- Comparing Damping Functions
-
- Approximating TotalRank with PageRank
- Approximating HyperRank with PageRank
- Approximating PageRank with LinearRank
-
- Parameters for the Damping Functions
-
- Characteristic path lengths
- Damping parameters and in-degree
- Experimental comparison of ranking orders
-
- Conclusions
-
3 Damping Functions
In this section we show several functional rankings by describing their damping functions First we showwhich class of damping functions generate well-defined functional rankings
As shown in [Bri06 Corollary 24] for every pair of nodesi and j and for every lengtht
sumpisinPath(i j)|p|=t
branching(p)le 1
A more general property holds
Theorem 1 For every node i and every length t
sumpisinPath(iminus)|p|=t
branching(p) = 1
Proof By induction ont For t = 0 there is only one path fromi of length 0 and its branching is 1 Forthe inductive stepsumpisinPath(iminus)|p|=t+1branching(p) can be rewritten by observing that ifi has outdegreedi every path fromi of lengtht +1 is the concatenation ofi with a path of lengtht from an out-neighbor ofi
sumpisinPath(iminus)|p|=t+1
branching(p) = sumjirarr j
1di
sumpisinPath( jminus)|p|=t
branching(p) = sumjirarr j
1di
= 1
As a consequence to guarantee that the functional ranking is well-defined and normalized (ie that rankvalues sum to 1) we need
N
sumi=1
sumpisinPath(minusi)
damping(|p|) 1N
branching(p) = 1
that isinfin
sumt=0
damping(t)1N sum
pisinPath(minusminus)branching(p) = 1
UsingTheorem1 sumpisinPath(minusminus) branching(p) = N so the latter equality is equivalent to
infin
sumt=0
damping(t) = 1
Hence every choice of the damping function such thatsuminfint=0damping(t) = 1 yields a well-defined nor-
malized functional ranking Nonetheless not all choices are equivalent so we have to find out whichfunctions generate better rankings Since a direct link should be more valuable as a source of evidence thana distant link we focus on damping functions that are decreasing ont the length of the paths
31 Linear damping
Letrsquos start by considering a simple damping function such as
damping(t) =
2(Lminust)L(L+1) t lt L
0 t ge L
4
this is a damping function that decreases linearly with distance and reaches zero at distanceL The trivialcaseL = 1 gives a uniform ranking andL = 2 is ranking by indegree as in the latter case all paths of lengthge 2 are not considered
From the definition
R =infin
sumt=0
damping(t)vPt
=L
sumt=0
2(Lminus t)L(L+1)
vPt
=2
L(L+1)v
Lminus1
sumt=0
(Lminus t)Pt
=2
L(L+1)v(L(I minusP)minusP(I minusPL))((I minusP)2)minus1
provided that(I minusP)2 is not singularAn advantage of this type of ranking is that only the first few levels are considered so the computation
is fast and the number of iterations is fixed
Computation For computing this functional ranking we can define the following sequence
R(0) =2
L+1v
R(k+1) =(Lminuskminus1)
(Lminusk)RkP
The functional ranking with linear damping issumLminus1k=0 R(k) An algorithm for computing this ranking
shown inFigure1 arises directly from this summation
32 Exponential damping PageRank
As we already noted PageRank can be seen as a functional ranking where the damping function decaysexponentially
damping(t) = (1minusα)αt
As longer paths have less importance in the calculation of PageRank it could be approximated by usingonly a few levels of links In [CGS04] it is shown that by using only the nodes at distance 1 from a targetnode (equivalent to linear damping withL = 2) PageRank can be approximated with 30 of average errorUsing nodes at distance 2 the average error drops to 20 and at distance 3 to 10 After that there are nosignificant improvements by adding more levels and the cost (the number of nodes to be explored) is muchhigher
Computation Since PageRank is the principal eigenvector of the modified graph matrix it can be easilyapproximated by the iterative Power Method algorithm as suggested by Pageet al in their original pa-per [PBMW98] this iterative algorithm gives good approximations (both in norm and with respect to theinduced node order) in few iterations even though convergence speed and numerical stability decay whenαgets close to 1 [HK03b HK03a] Other methods to compute PageRank have been proposed some of them
5
Require L maximum path length N number of nodesv preference vector1 for i 1 N do Initialization2 S[i] larr R[i] larr 2v[i](L+1)3 end for4 for step 1 Lminus1 do Iteration step5 Auxlarr 06 for i 1 N do Follow links in the graph7 for all j such that there is a link fromi to j do8 Aux[j] larr Aux[j] + R[i]outdegree(i)9 end for
10 end for11 for i 1 N do Add to ranking value12 R[i] larr Aux[i] times (Lminusstep)
(Lminus(stepminus1))13 S[i] larr S[i] + R[i]14 end for15 end for16 return R
Figure 1 Algorithm for computing a functional ranking with linear damping
using techniques for the solution of systems of linear equations some other concentrating on some specificfeatures of the web as a graph that determine forms of locality in the computation of PageRank (see forexample [PBMW98 Hav99 GG04 LGZ04 KHMG03b KHMG03a])
33 Quadratic hyperbolic damping TotalRank
Recently a ranking method called TotalRank [Bol05] has been proposed The method aims at eliminatingthe necessity for an arbitrary parameter by integrating PageRank over the entire range ofα If r(α) is thevector of PageRank then TotalRank is defined as
T =int 1
0r(α)dα
T can be written as
int 1
0r(α)dα =
1N
infin
sumt=0
int 1
0(1minusα)αt1middotPtdα
=1N
infin
sumt=0
1(t +1)(t +2)
1middotPt
By using the definition of the logarithm of a matrix
ln(I minusP) =minusinfin
sumk=1
Pk
k
we can write TotalRank as
T = Pminus1(I +(I minusPminus1) ln(I minusP))
6
provided thatPminus1 is not singular andP 6= I TotalRank is as a weighted sum of the score associated with paths of varying lengths in which the
weights are hyperbolically decreasing on the lengths of the paths In other words TotalRank is a functionalranking with damping function
damping(t) =1
(t +1)(t +2)
and it is well defined assuminfint=0damping(t) = 1
Computation It is known that the cost of calculating TotalRank is the same as the cost of calculatingPageRank via the Power Method [BSV05] even though some more iterations are required to obtain thesame precision
34 General hyperbolic damping HyperRank
TotalRank is part of a more general family of weighting schemes for paths of different lengths that can beapproximated using
s(β) =1
Nζ(β)
infin
sumt=0
1
(t +1)β 1middotPt
Again this way of ranking follows the general scheme with damping function chosen as
damping(t) =1
ζ(β)(t +1)β
Here we are using Riemannrsquos zeta functionζ(β) = suminfint=1
1tβ for normalization and we needβ gt 1 for it
to converge Note that whenβ = 2 we get weights similar to those of TotalRank in which thet-th coefficientis 1(t +1)(t +2) whereas here it is 1ζ(2)(t +1)2
A meaningful choice forβ should be done considering the distribution of paths of different lengths in ascale-free graph A largeα in PageRank or a smallβ in HyperRank means increasing the effect of longerpaths in the score
Computation Figure2 shows an algorithm for approximating HyperRank Let us define a vector se-quenceR(t) as follows
R(0) =1
Nζ(β)
R(k+1) =(
k+1k+2
)βR(k)P
It is easy to see thatsuminfint=0R(k) = s(β) becauseR(k) = 1(N middotζ(β)(k+1)β))1middotPk this observation leads
to the algorithm Note that convergence speed is much slower than ordinary PageRank especially whenβis close to 1 the norm of thek-th summand being bound by 1(1+ 1k)β Interestingly enough thoughconvergence speed is reasonable ifβ is sufficiently large
7
Require N number of nodesv preference vectorβ damping parameter1 for i 1 N do Initialization2 S[i] larr R[i] larr v[i]ζ(β)3 end for4 klarr 05 while not convergeddo Iteration step6 klarr k + 17 Auxlarr 08 for i 1 N do Follow links in the graph9 for all j such that there is a link fromi to j do
10 Aux[j] larr Aux[j] + R[i]outdegree(i)11 end for12 end for13 for i 1 N do Add to ranking value14 R[i] larr Aux[i] times(k(k+1))β
15 S[i] larr S[i] + R[i]16 end for17 end while18 return S
Figure 2 An algorithm to compute general hyperbolic rank
35 An empirical damping
An empirical damping function would consider how much the value of an endorsement decreases by fol-lowing longer paths in the real web graph This cannot be known exactly but we can attempt to measureit indirectly Pages that link to each other are more similar than pages chosen at random [Dav00] evi-dence from topical crawlers [SPM05] shows that when doing breadth-first exploring the topic ldquodriftsrdquo asthe distance increases On the same line of though we propose to use the decrease of text similarity as anapproximation to an ldquoempiricalrdquo damping function
To find out which is the correlation between link-distance and similarity we performed the followingexperiment we considered a web graph corresponding to a partial snapshot of theuk domain and sampled200 nodes at random For each sampled node we followed links backwards to obtain nodes at a minimumdistance of 1 2 3 4 or 5 links Then we sampled 12000 pairs at each minimum distance at randomand computed their similarities with the original nodes Similarity was measured using the normalization ofTFIDF [BYRN99] without stemming nor stopwords removal
The resulting averages are shown inFigure3 with standard deviation error bars Text similarity clearlydecreases with distance and in some applications the empirical distribution of text similarity versus distancecould be used as an ldquoempiricalrdquo damping function Different measures of text similarity can yield differentdistributions for instance [WHAT04] uses the number of repeated words and phrases between pages andobtains a faster decrease in similarity
8
02
03
04
05
06
07
1 2 3 4 5
Ave
rage
text
sim
ilari
ty
Link distance
Figure 3 Link distance vs average text similarity
4 Comparing Damping Functions
A comparison of the damping functions described in the previous section is shown inFigure4 of coursehyperbolic damping functions decay asymptotically more slowly than exponential damping but notice thatfor short paths the latter may dominate the former in many cases
000
005
010
015
020
025
1 2 3 4 5 6 7
Wei
ght
Length of the path (t)
PageRank α=075PageRank α=050
TotalRankHyperRank β=2HyperRank β=3
Linear damping L=10
Figure 4 Weights given by the different damping functions for some values ofαandβ
In this section we aim at analyzing how similar are these functional rankings and how could we useone of the damping functions to approximate another with a suitable choice of parameters
9
41 Approximating TotalRank with PageRank
It has been shown experimentally that the rank correlation (Kendallrsquosτ) between TotalRank and PageRankis maximal whenα asymp 07 [Bol05] the maximum value forτ is over 095 meaning that for that specificchoice ofα PageRank and TotalRank induce almost equivalent ranking orders
We now want to approach the same problem in an analytic fashion more precisely we aim at study-ing the difference between TotalRank and PageRank by calculating the difference between their respectivedamping functions
dampingTotalRank(t) =1
(t +1)(t +2)dampingPageRank(α)(t) = (1minusα)αt
As they are normalized both damping functions have the same summation over the entire range oftOur approach is to consider the summation of their differences up to a maximum length for a path As thetwo functions are decreasing the difference in the first levels makes most of the difference in the rankingsIf ` is the maximum path length we are interested in we aim at minimizing this sum
`
sumt=0
(1
(t +1)(t +2)minus (1minusα)αt
)= α`+1minus 1
`+2
The minimum absolute value is 0 and it is obtained whenα is equal to
αlowast(`) =1
`+1radic
`+2= 1minus log`
`+O
(log2`
`2
)
Figure5 showsαlowast(`) as a function of Recall that for the World-Wide Web graph the average lengthof a path between two nodes when a path exists has been estimated in about 16 [BKM+00] or 19 [AJB99]but clearly today is over 20 Now in the range of path lengths between 15 and 20 the value ofαlowast(`)parameters that minimizes the difference between the exponentially decaying weights of PageRank and thehyperbolically decaying weights of TotalRank is roughly 085
Difference Optimal value
0506
0708
09
α0 5 10 15 20 25
Length
-02
-01
0
01
02
03
04
045
050
055
060
065
070
075
080
085
090
0 5 10 15 20 25
α that
min
imiz
es th
e di
ffer
ence
of
sum
of w
eigh
ts w
ith T
otal
Ran
k
Maximum length of the paths considered
Figure 5 Bestαlowast(`) for minimizing the difference of the sum of weights betweenPageRank and TotalRank
10
42 Approximating HyperRank with PageRank
Now we want to approximate the weights of
s(β) =1
Nζ(β)
infin
sumt=0
1
(t +1)β Pt
using the weights of
r(α) =1minusα
N
infin
sumt=0
αtPt
and we proceed again by considering paths up to a certain length
`
sumt=0
(1
ζ(β)(t +1)β minus (1minusα)αt
)The minimum can be zero and it is attained at
αlowast(`β) =
radic1minus 1
ζ(β)
`
sumt=0
1
(t +1)β
Theα that minimizes the difference of weights for different values ofβ and of the maximum path lengths` is shown inFigure6 In the case ofβ = 2 for instance for path lengths up to 10 to 20 the bestα is between075 and 085
α
15202530β 5 10 15 20 25
Length
03
04
05
06
07
08
09
1
00
01
02
03
04
05
06
07
08
09
10
1 15 2 25 3 35 4
α th
at m
inim
izes
the
diff
eren
ce o
fsu
m o
f wei
ghts
Exponent β
Max path length=25Max path length=20Max path length=15Max path length=10
Max path length=5
Figure 6 Bestα for minimizing the difference of the sum of weights betweenPageRank and HyperRank with exponentβ for various path lengths
11
43 Approximating PageRank with LinearRank
For approximating the damping function of PageRank with the damping function of LinearRank we con-sider the summation of the differences up to a certain path length If`le L
`
sumt=0
((1minusα)αt minus 2(Lminus t)
L(L+1)
)And if ` gt L
Lminus1
sumt=0
((1minusα)αt minus 2(Lminus t)
L(L+1)
)+
`
sumt=L
(1minusα)αt
We will assume thatle L so the evaluation of the difference between the two rankings is done in an areain which both rankings have non-zero values TheL that minimizes the difference for a given combinationof α and` is
Llowast(α `) = `+2`α`+1 +α`+1 +1+
radic(1+α`+1)2 +4`(`+2)α`+1
2(1minusα`+1)
= `+1+O(`α(`+1)2
)and we have plotted it for different values ofα and` in Figure7
L
0405
0607
0809
α5 10 15 20 25
Length
5
10
15
20
25
30
5
10
15
20
25
30
35
40
05 06 07 08 09
L th
at m
inim
izes
the
diff
eren
ce o
fsu
m o
f wei
ghts
Exponent α
Max path length=25Max path length=20Max path length=15Max path length=10Max path length=5
Figure 7 BestL for minimizing the difference of the sum of weights betweenLinearRank and LinearRank for various path lengths
12
5 Parameters for the Damping Functions
For our experiments we used several snapshots from the Web including theuk it andeuint domainsFor comparison we also considered a synthetic scale-free network produced according to the evolving modeldescribed by Kumaret al [KRR+00] (a combination of preferential attachment and random links) with theparameters suggested by Panduranganet al [PRU02] As far as the latter is concerned in the generatedgraph the exponents for the power-law in the center part of the distributions are -21 for in-degree andPageRank and -27 for out-degree we generated a 100000-nodes graph without disconnected nodes
In this section we study the behavior of the ranking functions for varying values of their parameters
51 Characteristic path lengths
In scale-free networks the distances between pairs of nodes follow a Gaussian distribution [AJB99] (theaverage is not given in their paper) Analytic estimations for the average distance of a graph of scale-freenetwork ofn nodes include
bull O(log(n)) [WS98]
bull O(log(n) log(np)) in sparse graphs withp links [CL01]
bull 1+ log(nz1) log(z2z1) wherez1 is the average indegree andz2 is the average number of nodes atdistance 2 [NSW01] and
bull O(log(n) log(log(n))) [BR04]
We did the following experiment starting from a node picked at random we followed the links back-wards and counted the number of nodes at different distancesFigure8 plots the average distances foundwhich appear to be growing (sub)logarithmically with the size of the graphFigure9 shows the distributionobtained in each sample For this experiment we are not counting the pages without in-links
3456789
10111213141516
105 106 107 108
Ave
rage
dis
tanc
e
Number of nodes (N)
Figure 8 Average distances versus number of nodes
If graphs of different sizes show different path lengths what is the effect of this in the ranking calcu-lation Letrsquos suppose that for a graph withN1 nodes it is found by experimental or analytic means that agood parameter for PageRank isαlowast1 Now we would like to have a good parameterαlowast2 for a graph with thesame properties except that the size of the new graph isN2 lt N1
13
it Web graph uk Web graph40 mill nodes 18 mill nodes
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
Avg distance 149 Avg distance 148
euint Web graph Synthetic graph800000 nodes 100000 nodes
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
Avg distance 100 Avg distance 42
Figure 9 Distribution of the average number of nodes at a certain distance froma given node
One possible approach consistent with what we have done so far is to consider that the sum of theweights up to the average path lengths of the graphs (L1 L2) have to be similar for both rankings to behavein a similar way If we take this approach the solution is
1minus (αlowast1)L1+1 = 1minus (αlowast2)
L2+1
αlowast2 = (αlowast1)L1+1L2+1
αlowast2 asymp (αlowast1)log(N1)log(N2)
An example that can be used in practice is the following letrsquos consider a web graph withN1 = 115times109
pages (the size of the full Web estimated by [GS05]) and another graph with onlyN2 = 50times106 pages (thesize of the Web of a large country) the second graph is roughly 3 orders of magnitude smaller
If it is shown empirically thatαlowast1 = 085 is a good value for the PageRank parameter for the whole Webthenαlowast2 = 081 should have a similar behavior in the 50-million page set which is natural as the path lengthsare shorter If the subset of web pages were even smaller for instanceN2 = 106 pages (the size of the webof a large organization) thenαlowast2 = 076
14
52 Damping parameters and in-degree
In this section we are using data from theuk Web graph and a 8500-nodes synthetic graph with the sameindegree and outdegree distribution than the previous synthetic graph We first measured the variance ofthe values from the ranking function as we consider that a high variance is good in a ranking functionas the relative values differ more We also measured the relationship between the ranking function andin-degree for different values of the parameters in terms of covariance correlation coefficient and rankingorders (Kendallrsquosτ) The results are shown inFigure10
Variance Covariance Correlation coefficient Kendallrsquosτ
000e+00200e-13400e-13600e-13800e-13100e-12120e-12140e-12160e-12180e-12200e-12
01 02 03 04 05 06 07 08 09
Var
ianc
e of
Pag
eRan
k
Damping factor α
00 times 100
50 times 10-5
10 times 10-4
15 times 10-4
20 times 10-4
25 times 10-4
30 times 10-4
35 times 10-4
01 02 03 04 05 06 07 08 09
Cov
aria
nce
betw
een
Page
Ran
k an
d in
-deg
ree
Damping factor α
07000710072007300740075007600770078007900800
01 02 03 04 05 06 07 08 09
Corr
elat
ion
coef
ficie
nt w
ith in
-deg
ree
Damping factor α
042
044
046
048
050
052
054
056
01 02 03 04 05 06 07 08 09Ken
dallrsquo
s τ b
etw
een
Page
Rank
and
in-d
egre
e
Damping factor α
005
010
015
020
025
030
035
040
045
050
055
01 02 03 04 05 06 07 08 09
Var
ianc
e of
Pag
eRan
k
Damping factor α
0 times 100
1 times 10-3
2 times 10-3
3 times 10-3
4 times 10-3
5 times 10-3
6 times 10-3
01 02 03 04 05 06 07 08 09
Cov
aria
nce
betw
een
Page
Ran
k an
d in
-deg
ree
Damping factor α
0995
0996
0997
0998
0999
1000
01 02 03 04 05 06 07 08 09
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor α
079
080
081
082
083
084
085
086
087
01 02 03 04 05 06 07 08 09Ken
dallrsquo
s τ b
etw
een
Page
Rank
and
in-d
egre
e
Damping factor α
Figure 10 Variance of PageRank and its relationship with indegree in terms ofcovariance correlation coefficient and Kendallrsquosτ coefficient for varying valuesof the parameterα Top uk web graph bottom synthetic graph
The variance is higher asα increases In regard to the relationship with indegree for company homepages it has been shown that the logarithm of the in-degree is correlated with PageRank [UCH03] Ourresults are consistent with this observation The covariance monotonically increases withα both in the caseof the synthetic graph and in the web graph Not surprisingly using the generative model the correlationsare higher We observe a maximum correlation atα = 07 in the synthetic graph and atα = 05 in the webgraph We also notice that the correlation drops significativelly asα increases because a largeα means thatlonger paths have an effect in the calculation note however that this phenomenon does not significantlyimpact on the correlation coefficient that is still very large
A high correlation between PageRank and in-degree is bad from the point of view of a search enginebecause it makes link-spam easier In particular as the correlation coefficient is higher in theuk web graphnear 05 if we chooseα close to this value we are helping link spammers Note however that a highcorrelation was foreseeable because as shown in[CGS04] even approximating PageRank with just only 1level of links gets 70 of accuracy
The behavior of the Kendallrsquosτ coefficient which measures the similarity between ranking orders isthe opposite than the one observed in the real graph This also happens for HyperRank as inFigure11we made the same measurements for this functional ranking and the results were consistent The graphseems inverted because a low value ofβ has the same effect as a high value ofα longer paths have moreimportance in the calculation
The differences in the behavior of the ranking order in the synthetic graph might be explained by thefact that the generative model we are using does not capture some properties that might be relevant for theranking order under a functional ranking such as the clustering coefficient Also the synthetic graph is
15
Variance Covariance Correlation coefficient Kendallrsquosτ
000e+00
100e-12
200e-12
300e-12
400e-12
500e-12
600e-12
700e-12
800e-12
900e-12
1 15 2 25 3 35 4
Var
ianc
e of
Gen
eral
Hyp
erbo
lic R
ank
Damping factor β
000e+00
100e-04
200e-04
300e-04
400e-04
500e-04
600e-04
700e-04
800e-04
1 15 2 25 3 35 4Cov
aria
nce
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
0730
0740
0750
0760
0770
0780
0790
0800
1 15 2 25 3 35 4
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor β
04600
04700
04800
04900
05000
05100
05200
05300
1 15 2 25 3
Ken
dallrsquo
s τ
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
000e+00
200e-05
400e-05
600e-05
800e-05
100e-04
120e-04
140e-04
160e-04
1 15 2 25 3 35 4
Var
ianc
e of
Gen
eral
Hyp
erbo
lic R
ank
Damping factor β
000e+00
200e-03
400e-03
600e-03
800e-03
100e-02
120e-02
1 15 2 25 3 35 4Cov
aria
nce
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
0997
0997
0998
0998
0999
0999
1 15 2 25 3 35 4
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor β
0815008200082500830008350084000845008500085500860008650
1 15 2 25 3 35 4
Ken
dallrsquo
s τ b
etw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
Figure 11 Variance of general hyperbolic rank for some values ofβ and itsrelation with in-degree Experiments have been performed on theuk graph (top)and synthetic graph (bottom)
assortative (highly linked pages are mostly linked to other highly linked pages) while the real web graph isdisassortative (most of the neighbours of highly linked pages have small indegree)
53 Experimental comparison of ranking orders
In this section we present experimental results about the similarity between the ranking orders induced bysome of the functional rankings discussed in the previous sections To perform the comparison we usedKendallrsquosτ and data from the U K Web graphFigure12shows how PageRank compares with HyperRankfor various pairs ofα andβ In the limit αβrarr 1 both rankings are equivalent and they remain similar in alarge region of the parameter space
α
β
07
08
09
1
τ ge 095
τ
02 03
04 05
06 07
08 09
2 25
3 35
4
01
02
03
04
05
06
07
08
09
15 2 25 3 35 4
α th
at m
axim
izes
Ken
dallrsquo
s τ
Exponent β
Actual optimumPredicted optimum with length=5
Figure 12 Comparison (using Kendallrsquosτ) between PageRank and HyperRankwith various damping parameters in the UK web graph The optimum predictedin the analysis with = 5 is very close to the real one
In the figure we can see that the rankings obtained with HyperRank and PageRank can be almostequivalent (Kendallrsquosτ ge 095) moreover the analysis shown insection42 considering only paths of
16
lengths less than 5 provides a very good approximation for the optimum combination of parameters Thismeans that in fact the difference in the damping functions in the first few levels is crucial
The exponentsβ required for giving a good approximation of PageRank are very small whenα ge 07limiting the practical applicability of HyperRank as its convergence is not faster than the one of PageRank
As far as LinearRank and PageRank are concerned long paths and largeα should be considered toobtain a sufficiently similar ranking as shown inFigure13
05 06
07 08
09
5 10
15 20
25
080085090095100
τ
τ ge 095
αL
5
10
15
20
25
05 06 07 08 09L
that
max
imiz
es K
enda
llrsquos
τExponent α
Actual optimumPredicted optimum with length=5
Figure 13 Comparison (using Kendallrsquosτ) between PageRank and LinearRankin the UK web graph with various damping parameters Again the predictedoptimum with` = 5 is very close to the actual optimum
The predicted optimum given insection43 with ` = 5 this is considering only the summation of thedifferences between both damping functions up to paths of length 5 is very close to what was obtained inpractice Forα = 08 calculating LinearRank withL = 10 (which means the same number of iterations)givesτ ge 098 for α = 09 calculating LinearRank withL = 15 also givesτ ge 098 In both cases theranking order of PageRank is approximated by the ranking order of LinearRank with very high precision
6 Conclusions
In this paper we have defined a broad class of link-based ranking algorithms based on the contribution ofdamping factors along all different paths reaching a page We introduce four particular damping decays lin-ear exponential quadratic hyperbolic and general hyperbolic where exponential is equivalent to PageRankand quadratic hyperbolic to TotalRank
We studied the differences and similarities among these ranking algorithms and we have found that
bull Functional rankings using different damping functions can provide similar orderings if the parametersare chosen carefully
bull LinearRank can be used for calculating a ranking that is as good as PageRank but with a fixed andsmaller number of iterations
bull The parameters for the damping functions depend on the characteristic of path lengths in the graphwhich is known to grow sub-logarithmically on the size of the graph
More work is needed to find other damping functions that compute rankings similar to PageRank butare easier and faster to compute We use global ranking similarity but another measure could be the ranking
17
similarity in the top 20 results of real queries In this setting our results can change so future work willinclude this variation
Because of their high cost link-based ranking methods that involve iterative calculations at query timeare probably not used by large-scale search engines at this moment but the functional ranking with lineardamping we have presented can provide a good approximation with few iterations Also the approach wehave presented could be also applied to multivalued ranking functions such as HITS [Kle99] and topic-sensitive PageRank [Hav02] to obtain for instance a method for approximating the hubs and authorityscores using less iterations and a linear damping function
Our approach also helps to understand how easy or difficult is to collude many pages to modify theranking of a given page Clearly there are many different factors path lengths damping function branchingdegrees and number of colluded pages The graph structure of the collusion will affect those factors and weplan to analyze them
Acknowledgements
We would like to thank Daniel Fogaras for a valuable discussion about TotalRank that motivated part of thisresearch
References
[AJB99] Reka Albert Hawoong Jeong and Albert L Barabasi Diameter of the world wide webNature 401130ndash131 1999
[BH98] Krishna Bharat and Monika R HenzingerImproved algorithms for topic distillation in ahyperlinked environment In Proceedings of the 21st Annual International ACM SIGIR Con-ference on Research and Development in Information Retrieval pages 104ndash111 MelbourneAustralia August 1998 ACM Press New York
[BKM +00] Andrei Broder Ravi Kumar Farzin Maghoul Prabhakar Raghavan Sridhar RajagopalanRaymie Stata Andrew Tomkins and Janet WienerGraph structure in the web Experimentsand models In Proceedings of the Ninth Conference on World Wide Web pages 309ndash320Amsterdam Netherlands May 2000 ACM Press
[Bol05] Paolo BoldiTotalrank ranking without damping In Poster proceedings of the 14th interna-tional conference on World Wide Web pages 898ndash899 Chiba Japan 2005 ACM Press
[BR04] Bela Bollobas and Oliver RiordanThe diameter of a scale-free random graph Combinator-ica 24(1)5ndash34 2004
[Bri06] Michael Brinkmeier PageRank revisited ACM Transaction on Internet Technologies6(3)257ndash279 2006
[BSV05] Paolo Boldi Massimo Santini and Sebastiano VignaPagerank as a function of the dampingfactor In Proceedings of the 14th international conference on World Wide Web pages 557ndash566 Chiba Japan 2005 ACM Press
[BYD04] Ricardo Baeza-Yates and Emilio DavisWeb page ranking using link attributes In Alternatetrack papers amp posters of the 13th international conference on World Wide Web pages 328ndash329 New York NY USA 2004 ACM Press
18
[BYRN99] Ricardo Baeza-Yates and Berthier Ribeiro-NetoModern Information Retrieval AddisonWesley May 1999
[CGS04] Yen-Yu Chen Qingqing Gan and Torsten SuelLocal methods for estimating pagerank val-ues In Proceedings of the thirteenth ACM conference on Information and knowledge man-agement (CIKM) pages 381ndash389 New York NY USA 2004 ACM Press
[CL01] Fan Chung and Linyuan LuThe diameter of random sparse graphs Adv Appl Math26257ndash279 2001
[Dav00] Brian D DavisonTopical locality in the web In Proceedings of the 23rd annual internationalACM SIGIR conference on research and development in information retrieval pages 272ndash279Athens Greece 2000 ACM Press
[GG04] Gene H Golub and Chen GreifArnoldi-type algorithms for computing stationary distributionvectors with application to pagerank Technical Report SCCM-04-15 Stanford University2004
[GS05] Antonio Gulli and Alessio SignoriniThe indexable Web is more than 115 billion pages InPoster proceedings of the 14th international conference on World Wide Web pages 902ndash903Chiba Japan 2005 ACM Press
[Hav99] Taher HaveliwalaEfficient computation of pagerank Technical report Stanford University1999
[Hav02] Taher H HaveliwalaTopic-sensitive pagerank In Proceedings of the Eleventh World WideWeb Conference pages 517ndash526 Honolulu Hawaii USA May 2002 ACM Press
[HG98] Stephanie W Haas and Erika S GramsPage and link classifications connecting diverseresources In DL rsquo98 Proceedings of the third ACM conference on Digital libraries pages99ndash107 New York NY USA 1998 ACM Press
[HK03a] Taher Haveliwala and Sepandar KamvarThe condition number of the pagerank problemTechnical Report 36 Stanford University 2003
[HK03b] Taher Haveliwala and Sepandar KamvarThe second eigenvalue of the google matrix Tech-nical Report 20 Stanford University 2003
[JM98] W-K Joo and S H Myaeng Improving retrieval effectiveness with hyperlink informationIn Proceedings of International Workshop on Information Retrieval with Asian Languages(IRAL) Singapore October 1998
[KHMG03a] S Kamvar T Haveliwala C Manning and G GolubExploiting the block structure of theweb for computing pagerank 2003
[KHMG03b] Sepandar D Kamvar Taher H Haveliwala Christopher D Manning and Gene H GolubExtrapolation methods for accelerating pagerank computations In Proceedings of the twelfthinternational conference on World Wide Web pages 261ndash270 ACM Press 2003
[Kle99] Jon M KleinbergAuthoritative sources in a hyperlinked environment Journal of the ACM46(5)604ndash632 1999
19
[KRR+00] R Kumar P Raghavan S Rajagopalan D Sivakumar A Tomkins and E UpfalStochasticmodels for the web graph In Proceedings of the 41st Annual Symposium on Foundations ofComputer Science (FOCS) pages 57ndash65 Redondo Beach CA USA 2000 IEEE CS Press
[LGZ04] Chris P Lee Gene H Golub and Stefanos A ZeniosA fast two-stage algorithm for com-puting pagerank and its extensions Technical report Stanford University 2004
[Li98] Yanhong LiToward a qualitative search engine IEEE Internet Computing July 1998
[Lif00] Maxim LifantsevVoting model for ranking Web pages In Peter Graham and MuthucumaruMaheswaran editorsProceedings of the International Conference on Internet Computingpages 143ndash148 Las Vegas Nevada USA June 2000 CSREA Press
[Mar97] Massimo MarchioriThe quest for correct information of the Web hyper search engines InProc of the sixth international conference on the Web Santa Clara USA April 1997
[NSW01] M E Newman S H Strogatz and D J WattsRandom graphs with arbitrary degree distri-butions and their applicationsPhys Rev E Stat Nonlin Soft Matter Phys 64(2 Pt 2) August2001
[PBMW98] Lawrence Page Sergey Brin Rajeev Motwani and Terry WinogradThe PageRank citationranking bringing order to the Web Technical report Stanford Digital Library TechnologiesProject 1998
[PRU02] Gopal Pandurangan Prabhakara Raghavan and Eli UpfalUsing Pagerank to characterizeWeb structure In Proceedings of the 8th Annual International Computing and CombinatoricsConference (COCOON) volume 2387 ofLecture Notes in Computer Science pages 330ndash390Singapore August 2002 Springer
[SPM05] Padmini Srinivasan Gautam Pant and Filippo MenczerA general evaluation framework fortopical crawlers Information Retrieval 8(3)417ndash447 2005
[UCH03] Trystan Upstill Nick Craswell and David HawkingPredicting fame and fortune Page-rank or indegree In Proceedings of the Australasian Document Computing SymposiumADCS2003 pages 31ndash40 Canberra Australia December 2003
[WHAT04] Fang Wu Bernardo A Huberman Lada A Adamic and Joshua R TylerInformation flow insocial groups Physica A Statistical and Theoretical Physics 337(1-2)327ndash335 June 2004
[WS98] Duncan J Watts and Steven H StrogatzCollective dynamics of small-world networksNa-ture 393(6684)440ndash442 June 1998
20
- Introduction
- Functional Rankings
- Damping Functions
-
- Linear damping
- Exponential damping PageRank
- Quadratic hyperbolic damping TotalRank
- General hyperbolic damping HyperRank
- An empirical damping
-
- Comparing Damping Functions
-
- Approximating TotalRank with PageRank
- Approximating HyperRank with PageRank
- Approximating PageRank with LinearRank
-
- Parameters for the Damping Functions
-
- Characteristic path lengths
- Damping parameters and in-degree
- Experimental comparison of ranking orders
-
- Conclusions
-
this is a damping function that decreases linearly with distance and reaches zero at distanceL The trivialcaseL = 1 gives a uniform ranking andL = 2 is ranking by indegree as in the latter case all paths of lengthge 2 are not considered
From the definition
R =infin
sumt=0
damping(t)vPt
=L
sumt=0
2(Lminus t)L(L+1)
vPt
=2
L(L+1)v
Lminus1
sumt=0
(Lminus t)Pt
=2
L(L+1)v(L(I minusP)minusP(I minusPL))((I minusP)2)minus1
provided that(I minusP)2 is not singularAn advantage of this type of ranking is that only the first few levels are considered so the computation
is fast and the number of iterations is fixed
Computation For computing this functional ranking we can define the following sequence
R(0) =2
L+1v
R(k+1) =(Lminuskminus1)
(Lminusk)RkP
The functional ranking with linear damping issumLminus1k=0 R(k) An algorithm for computing this ranking
shown inFigure1 arises directly from this summation
32 Exponential damping PageRank
As we already noted PageRank can be seen as a functional ranking where the damping function decaysexponentially
damping(t) = (1minusα)αt
As longer paths have less importance in the calculation of PageRank it could be approximated by usingonly a few levels of links In [CGS04] it is shown that by using only the nodes at distance 1 from a targetnode (equivalent to linear damping withL = 2) PageRank can be approximated with 30 of average errorUsing nodes at distance 2 the average error drops to 20 and at distance 3 to 10 After that there are nosignificant improvements by adding more levels and the cost (the number of nodes to be explored) is muchhigher
Computation Since PageRank is the principal eigenvector of the modified graph matrix it can be easilyapproximated by the iterative Power Method algorithm as suggested by Pageet al in their original pa-per [PBMW98] this iterative algorithm gives good approximations (both in norm and with respect to theinduced node order) in few iterations even though convergence speed and numerical stability decay whenαgets close to 1 [HK03b HK03a] Other methods to compute PageRank have been proposed some of them
5
Require L maximum path length N number of nodesv preference vector1 for i 1 N do Initialization2 S[i] larr R[i] larr 2v[i](L+1)3 end for4 for step 1 Lminus1 do Iteration step5 Auxlarr 06 for i 1 N do Follow links in the graph7 for all j such that there is a link fromi to j do8 Aux[j] larr Aux[j] + R[i]outdegree(i)9 end for
10 end for11 for i 1 N do Add to ranking value12 R[i] larr Aux[i] times (Lminusstep)
(Lminus(stepminus1))13 S[i] larr S[i] + R[i]14 end for15 end for16 return R
Figure 1 Algorithm for computing a functional ranking with linear damping
using techniques for the solution of systems of linear equations some other concentrating on some specificfeatures of the web as a graph that determine forms of locality in the computation of PageRank (see forexample [PBMW98 Hav99 GG04 LGZ04 KHMG03b KHMG03a])
33 Quadratic hyperbolic damping TotalRank
Recently a ranking method called TotalRank [Bol05] has been proposed The method aims at eliminatingthe necessity for an arbitrary parameter by integrating PageRank over the entire range ofα If r(α) is thevector of PageRank then TotalRank is defined as
T =int 1
0r(α)dα
T can be written as
int 1
0r(α)dα =
1N
infin
sumt=0
int 1
0(1minusα)αt1middotPtdα
=1N
infin
sumt=0
1(t +1)(t +2)
1middotPt
By using the definition of the logarithm of a matrix
ln(I minusP) =minusinfin
sumk=1
Pk
k
we can write TotalRank as
T = Pminus1(I +(I minusPminus1) ln(I minusP))
6
provided thatPminus1 is not singular andP 6= I TotalRank is as a weighted sum of the score associated with paths of varying lengths in which the
weights are hyperbolically decreasing on the lengths of the paths In other words TotalRank is a functionalranking with damping function
damping(t) =1
(t +1)(t +2)
and it is well defined assuminfint=0damping(t) = 1
Computation It is known that the cost of calculating TotalRank is the same as the cost of calculatingPageRank via the Power Method [BSV05] even though some more iterations are required to obtain thesame precision
34 General hyperbolic damping HyperRank
TotalRank is part of a more general family of weighting schemes for paths of different lengths that can beapproximated using
s(β) =1
Nζ(β)
infin
sumt=0
1
(t +1)β 1middotPt
Again this way of ranking follows the general scheme with damping function chosen as
damping(t) =1
ζ(β)(t +1)β
Here we are using Riemannrsquos zeta functionζ(β) = suminfint=1
1tβ for normalization and we needβ gt 1 for it
to converge Note that whenβ = 2 we get weights similar to those of TotalRank in which thet-th coefficientis 1(t +1)(t +2) whereas here it is 1ζ(2)(t +1)2
A meaningful choice forβ should be done considering the distribution of paths of different lengths in ascale-free graph A largeα in PageRank or a smallβ in HyperRank means increasing the effect of longerpaths in the score
Computation Figure2 shows an algorithm for approximating HyperRank Let us define a vector se-quenceR(t) as follows
R(0) =1
Nζ(β)
R(k+1) =(
k+1k+2
)βR(k)P
It is easy to see thatsuminfint=0R(k) = s(β) becauseR(k) = 1(N middotζ(β)(k+1)β))1middotPk this observation leads
to the algorithm Note that convergence speed is much slower than ordinary PageRank especially whenβis close to 1 the norm of thek-th summand being bound by 1(1+ 1k)β Interestingly enough thoughconvergence speed is reasonable ifβ is sufficiently large
7
Require N number of nodesv preference vectorβ damping parameter1 for i 1 N do Initialization2 S[i] larr R[i] larr v[i]ζ(β)3 end for4 klarr 05 while not convergeddo Iteration step6 klarr k + 17 Auxlarr 08 for i 1 N do Follow links in the graph9 for all j such that there is a link fromi to j do
10 Aux[j] larr Aux[j] + R[i]outdegree(i)11 end for12 end for13 for i 1 N do Add to ranking value14 R[i] larr Aux[i] times(k(k+1))β
15 S[i] larr S[i] + R[i]16 end for17 end while18 return S
Figure 2 An algorithm to compute general hyperbolic rank
35 An empirical damping
An empirical damping function would consider how much the value of an endorsement decreases by fol-lowing longer paths in the real web graph This cannot be known exactly but we can attempt to measureit indirectly Pages that link to each other are more similar than pages chosen at random [Dav00] evi-dence from topical crawlers [SPM05] shows that when doing breadth-first exploring the topic ldquodriftsrdquo asthe distance increases On the same line of though we propose to use the decrease of text similarity as anapproximation to an ldquoempiricalrdquo damping function
To find out which is the correlation between link-distance and similarity we performed the followingexperiment we considered a web graph corresponding to a partial snapshot of theuk domain and sampled200 nodes at random For each sampled node we followed links backwards to obtain nodes at a minimumdistance of 1 2 3 4 or 5 links Then we sampled 12000 pairs at each minimum distance at randomand computed their similarities with the original nodes Similarity was measured using the normalization ofTFIDF [BYRN99] without stemming nor stopwords removal
The resulting averages are shown inFigure3 with standard deviation error bars Text similarity clearlydecreases with distance and in some applications the empirical distribution of text similarity versus distancecould be used as an ldquoempiricalrdquo damping function Different measures of text similarity can yield differentdistributions for instance [WHAT04] uses the number of repeated words and phrases between pages andobtains a faster decrease in similarity
8
02
03
04
05
06
07
1 2 3 4 5
Ave
rage
text
sim
ilari
ty
Link distance
Figure 3 Link distance vs average text similarity
4 Comparing Damping Functions
A comparison of the damping functions described in the previous section is shown inFigure4 of coursehyperbolic damping functions decay asymptotically more slowly than exponential damping but notice thatfor short paths the latter may dominate the former in many cases
000
005
010
015
020
025
1 2 3 4 5 6 7
Wei
ght
Length of the path (t)
PageRank α=075PageRank α=050
TotalRankHyperRank β=2HyperRank β=3
Linear damping L=10
Figure 4 Weights given by the different damping functions for some values ofαandβ
In this section we aim at analyzing how similar are these functional rankings and how could we useone of the damping functions to approximate another with a suitable choice of parameters
9
41 Approximating TotalRank with PageRank
It has been shown experimentally that the rank correlation (Kendallrsquosτ) between TotalRank and PageRankis maximal whenα asymp 07 [Bol05] the maximum value forτ is over 095 meaning that for that specificchoice ofα PageRank and TotalRank induce almost equivalent ranking orders
We now want to approach the same problem in an analytic fashion more precisely we aim at study-ing the difference between TotalRank and PageRank by calculating the difference between their respectivedamping functions
dampingTotalRank(t) =1
(t +1)(t +2)dampingPageRank(α)(t) = (1minusα)αt
As they are normalized both damping functions have the same summation over the entire range oftOur approach is to consider the summation of their differences up to a maximum length for a path As thetwo functions are decreasing the difference in the first levels makes most of the difference in the rankingsIf ` is the maximum path length we are interested in we aim at minimizing this sum
`
sumt=0
(1
(t +1)(t +2)minus (1minusα)αt
)= α`+1minus 1
`+2
The minimum absolute value is 0 and it is obtained whenα is equal to
αlowast(`) =1
`+1radic
`+2= 1minus log`
`+O
(log2`
`2
)
Figure5 showsαlowast(`) as a function of Recall that for the World-Wide Web graph the average lengthof a path between two nodes when a path exists has been estimated in about 16 [BKM+00] or 19 [AJB99]but clearly today is over 20 Now in the range of path lengths between 15 and 20 the value ofαlowast(`)parameters that minimizes the difference between the exponentially decaying weights of PageRank and thehyperbolically decaying weights of TotalRank is roughly 085
Difference Optimal value
0506
0708
09
α0 5 10 15 20 25
Length
-02
-01
0
01
02
03
04
045
050
055
060
065
070
075
080
085
090
0 5 10 15 20 25
α that
min
imiz
es th
e di
ffer
ence
of
sum
of w
eigh
ts w
ith T
otal
Ran
k
Maximum length of the paths considered
Figure 5 Bestαlowast(`) for minimizing the difference of the sum of weights betweenPageRank and TotalRank
10
42 Approximating HyperRank with PageRank
Now we want to approximate the weights of
s(β) =1
Nζ(β)
infin
sumt=0
1
(t +1)β Pt
using the weights of
r(α) =1minusα
N
infin
sumt=0
αtPt
and we proceed again by considering paths up to a certain length
`
sumt=0
(1
ζ(β)(t +1)β minus (1minusα)αt
)The minimum can be zero and it is attained at
αlowast(`β) =
radic1minus 1
ζ(β)
`
sumt=0
1
(t +1)β
Theα that minimizes the difference of weights for different values ofβ and of the maximum path lengths` is shown inFigure6 In the case ofβ = 2 for instance for path lengths up to 10 to 20 the bestα is between075 and 085
α
15202530β 5 10 15 20 25
Length
03
04
05
06
07
08
09
1
00
01
02
03
04
05
06
07
08
09
10
1 15 2 25 3 35 4
α th
at m
inim
izes
the
diff
eren
ce o
fsu
m o
f wei
ghts
Exponent β
Max path length=25Max path length=20Max path length=15Max path length=10
Max path length=5
Figure 6 Bestα for minimizing the difference of the sum of weights betweenPageRank and HyperRank with exponentβ for various path lengths
11
43 Approximating PageRank with LinearRank
For approximating the damping function of PageRank with the damping function of LinearRank we con-sider the summation of the differences up to a certain path length If`le L
`
sumt=0
((1minusα)αt minus 2(Lminus t)
L(L+1)
)And if ` gt L
Lminus1
sumt=0
((1minusα)αt minus 2(Lminus t)
L(L+1)
)+
`
sumt=L
(1minusα)αt
We will assume thatle L so the evaluation of the difference between the two rankings is done in an areain which both rankings have non-zero values TheL that minimizes the difference for a given combinationof α and` is
Llowast(α `) = `+2`α`+1 +α`+1 +1+
radic(1+α`+1)2 +4`(`+2)α`+1
2(1minusα`+1)
= `+1+O(`α(`+1)2
)and we have plotted it for different values ofα and` in Figure7
L
0405
0607
0809
α5 10 15 20 25
Length
5
10
15
20
25
30
5
10
15
20
25
30
35
40
05 06 07 08 09
L th
at m
inim
izes
the
diff
eren
ce o
fsu
m o
f wei
ghts
Exponent α
Max path length=25Max path length=20Max path length=15Max path length=10Max path length=5
Figure 7 BestL for minimizing the difference of the sum of weights betweenLinearRank and LinearRank for various path lengths
12
5 Parameters for the Damping Functions
For our experiments we used several snapshots from the Web including theuk it andeuint domainsFor comparison we also considered a synthetic scale-free network produced according to the evolving modeldescribed by Kumaret al [KRR+00] (a combination of preferential attachment and random links) with theparameters suggested by Panduranganet al [PRU02] As far as the latter is concerned in the generatedgraph the exponents for the power-law in the center part of the distributions are -21 for in-degree andPageRank and -27 for out-degree we generated a 100000-nodes graph without disconnected nodes
In this section we study the behavior of the ranking functions for varying values of their parameters
51 Characteristic path lengths
In scale-free networks the distances between pairs of nodes follow a Gaussian distribution [AJB99] (theaverage is not given in their paper) Analytic estimations for the average distance of a graph of scale-freenetwork ofn nodes include
bull O(log(n)) [WS98]
bull O(log(n) log(np)) in sparse graphs withp links [CL01]
bull 1+ log(nz1) log(z2z1) wherez1 is the average indegree andz2 is the average number of nodes atdistance 2 [NSW01] and
bull O(log(n) log(log(n))) [BR04]
We did the following experiment starting from a node picked at random we followed the links back-wards and counted the number of nodes at different distancesFigure8 plots the average distances foundwhich appear to be growing (sub)logarithmically with the size of the graphFigure9 shows the distributionobtained in each sample For this experiment we are not counting the pages without in-links
3456789
10111213141516
105 106 107 108
Ave
rage
dis
tanc
e
Number of nodes (N)
Figure 8 Average distances versus number of nodes
If graphs of different sizes show different path lengths what is the effect of this in the ranking calcu-lation Letrsquos suppose that for a graph withN1 nodes it is found by experimental or analytic means that agood parameter for PageRank isαlowast1 Now we would like to have a good parameterαlowast2 for a graph with thesame properties except that the size of the new graph isN2 lt N1
13
it Web graph uk Web graph40 mill nodes 18 mill nodes
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
Avg distance 149 Avg distance 148
euint Web graph Synthetic graph800000 nodes 100000 nodes
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
Avg distance 100 Avg distance 42
Figure 9 Distribution of the average number of nodes at a certain distance froma given node
One possible approach consistent with what we have done so far is to consider that the sum of theweights up to the average path lengths of the graphs (L1 L2) have to be similar for both rankings to behavein a similar way If we take this approach the solution is
1minus (αlowast1)L1+1 = 1minus (αlowast2)
L2+1
αlowast2 = (αlowast1)L1+1L2+1
αlowast2 asymp (αlowast1)log(N1)log(N2)
An example that can be used in practice is the following letrsquos consider a web graph withN1 = 115times109
pages (the size of the full Web estimated by [GS05]) and another graph with onlyN2 = 50times106 pages (thesize of the Web of a large country) the second graph is roughly 3 orders of magnitude smaller
If it is shown empirically thatαlowast1 = 085 is a good value for the PageRank parameter for the whole Webthenαlowast2 = 081 should have a similar behavior in the 50-million page set which is natural as the path lengthsare shorter If the subset of web pages were even smaller for instanceN2 = 106 pages (the size of the webof a large organization) thenαlowast2 = 076
14
52 Damping parameters and in-degree
In this section we are using data from theuk Web graph and a 8500-nodes synthetic graph with the sameindegree and outdegree distribution than the previous synthetic graph We first measured the variance ofthe values from the ranking function as we consider that a high variance is good in a ranking functionas the relative values differ more We also measured the relationship between the ranking function andin-degree for different values of the parameters in terms of covariance correlation coefficient and rankingorders (Kendallrsquosτ) The results are shown inFigure10
Variance Covariance Correlation coefficient Kendallrsquosτ
000e+00200e-13400e-13600e-13800e-13100e-12120e-12140e-12160e-12180e-12200e-12
01 02 03 04 05 06 07 08 09
Var
ianc
e of
Pag
eRan
k
Damping factor α
00 times 100
50 times 10-5
10 times 10-4
15 times 10-4
20 times 10-4
25 times 10-4
30 times 10-4
35 times 10-4
01 02 03 04 05 06 07 08 09
Cov
aria
nce
betw
een
Page
Ran
k an
d in
-deg
ree
Damping factor α
07000710072007300740075007600770078007900800
01 02 03 04 05 06 07 08 09
Corr
elat
ion
coef
ficie
nt w
ith in
-deg
ree
Damping factor α
042
044
046
048
050
052
054
056
01 02 03 04 05 06 07 08 09Ken
dallrsquo
s τ b
etw
een
Page
Rank
and
in-d
egre
e
Damping factor α
005
010
015
020
025
030
035
040
045
050
055
01 02 03 04 05 06 07 08 09
Var
ianc
e of
Pag
eRan
k
Damping factor α
0 times 100
1 times 10-3
2 times 10-3
3 times 10-3
4 times 10-3
5 times 10-3
6 times 10-3
01 02 03 04 05 06 07 08 09
Cov
aria
nce
betw
een
Page
Ran
k an
d in
-deg
ree
Damping factor α
0995
0996
0997
0998
0999
1000
01 02 03 04 05 06 07 08 09
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor α
079
080
081
082
083
084
085
086
087
01 02 03 04 05 06 07 08 09Ken
dallrsquo
s τ b
etw
een
Page
Rank
and
in-d
egre
e
Damping factor α
Figure 10 Variance of PageRank and its relationship with indegree in terms ofcovariance correlation coefficient and Kendallrsquosτ coefficient for varying valuesof the parameterα Top uk web graph bottom synthetic graph
The variance is higher asα increases In regard to the relationship with indegree for company homepages it has been shown that the logarithm of the in-degree is correlated with PageRank [UCH03] Ourresults are consistent with this observation The covariance monotonically increases withα both in the caseof the synthetic graph and in the web graph Not surprisingly using the generative model the correlationsare higher We observe a maximum correlation atα = 07 in the synthetic graph and atα = 05 in the webgraph We also notice that the correlation drops significativelly asα increases because a largeα means thatlonger paths have an effect in the calculation note however that this phenomenon does not significantlyimpact on the correlation coefficient that is still very large
A high correlation between PageRank and in-degree is bad from the point of view of a search enginebecause it makes link-spam easier In particular as the correlation coefficient is higher in theuk web graphnear 05 if we chooseα close to this value we are helping link spammers Note however that a highcorrelation was foreseeable because as shown in[CGS04] even approximating PageRank with just only 1level of links gets 70 of accuracy
The behavior of the Kendallrsquosτ coefficient which measures the similarity between ranking orders isthe opposite than the one observed in the real graph This also happens for HyperRank as inFigure11we made the same measurements for this functional ranking and the results were consistent The graphseems inverted because a low value ofβ has the same effect as a high value ofα longer paths have moreimportance in the calculation
The differences in the behavior of the ranking order in the synthetic graph might be explained by thefact that the generative model we are using does not capture some properties that might be relevant for theranking order under a functional ranking such as the clustering coefficient Also the synthetic graph is
15
Variance Covariance Correlation coefficient Kendallrsquosτ
000e+00
100e-12
200e-12
300e-12
400e-12
500e-12
600e-12
700e-12
800e-12
900e-12
1 15 2 25 3 35 4
Var
ianc
e of
Gen
eral
Hyp
erbo
lic R
ank
Damping factor β
000e+00
100e-04
200e-04
300e-04
400e-04
500e-04
600e-04
700e-04
800e-04
1 15 2 25 3 35 4Cov
aria
nce
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
0730
0740
0750
0760
0770
0780
0790
0800
1 15 2 25 3 35 4
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor β
04600
04700
04800
04900
05000
05100
05200
05300
1 15 2 25 3
Ken
dallrsquo
s τ
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
000e+00
200e-05
400e-05
600e-05
800e-05
100e-04
120e-04
140e-04
160e-04
1 15 2 25 3 35 4
Var
ianc
e of
Gen
eral
Hyp
erbo
lic R
ank
Damping factor β
000e+00
200e-03
400e-03
600e-03
800e-03
100e-02
120e-02
1 15 2 25 3 35 4Cov
aria
nce
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
0997
0997
0998
0998
0999
0999
1 15 2 25 3 35 4
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor β
0815008200082500830008350084000845008500085500860008650
1 15 2 25 3 35 4
Ken
dallrsquo
s τ b
etw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
Figure 11 Variance of general hyperbolic rank for some values ofβ and itsrelation with in-degree Experiments have been performed on theuk graph (top)and synthetic graph (bottom)
assortative (highly linked pages are mostly linked to other highly linked pages) while the real web graph isdisassortative (most of the neighbours of highly linked pages have small indegree)
53 Experimental comparison of ranking orders
In this section we present experimental results about the similarity between the ranking orders induced bysome of the functional rankings discussed in the previous sections To perform the comparison we usedKendallrsquosτ and data from the U K Web graphFigure12shows how PageRank compares with HyperRankfor various pairs ofα andβ In the limit αβrarr 1 both rankings are equivalent and they remain similar in alarge region of the parameter space
α
β
07
08
09
1
τ ge 095
τ
02 03
04 05
06 07
08 09
2 25
3 35
4
01
02
03
04
05
06
07
08
09
15 2 25 3 35 4
α th
at m
axim
izes
Ken
dallrsquo
s τ
Exponent β
Actual optimumPredicted optimum with length=5
Figure 12 Comparison (using Kendallrsquosτ) between PageRank and HyperRankwith various damping parameters in the UK web graph The optimum predictedin the analysis with = 5 is very close to the real one
In the figure we can see that the rankings obtained with HyperRank and PageRank can be almostequivalent (Kendallrsquosτ ge 095) moreover the analysis shown insection42 considering only paths of
16
lengths less than 5 provides a very good approximation for the optimum combination of parameters Thismeans that in fact the difference in the damping functions in the first few levels is crucial
The exponentsβ required for giving a good approximation of PageRank are very small whenα ge 07limiting the practical applicability of HyperRank as its convergence is not faster than the one of PageRank
As far as LinearRank and PageRank are concerned long paths and largeα should be considered toobtain a sufficiently similar ranking as shown inFigure13
05 06
07 08
09
5 10
15 20
25
080085090095100
τ
τ ge 095
αL
5
10
15
20
25
05 06 07 08 09L
that
max
imiz
es K
enda
llrsquos
τExponent α
Actual optimumPredicted optimum with length=5
Figure 13 Comparison (using Kendallrsquosτ) between PageRank and LinearRankin the UK web graph with various damping parameters Again the predictedoptimum with` = 5 is very close to the actual optimum
The predicted optimum given insection43 with ` = 5 this is considering only the summation of thedifferences between both damping functions up to paths of length 5 is very close to what was obtained inpractice Forα = 08 calculating LinearRank withL = 10 (which means the same number of iterations)givesτ ge 098 for α = 09 calculating LinearRank withL = 15 also givesτ ge 098 In both cases theranking order of PageRank is approximated by the ranking order of LinearRank with very high precision
6 Conclusions
In this paper we have defined a broad class of link-based ranking algorithms based on the contribution ofdamping factors along all different paths reaching a page We introduce four particular damping decays lin-ear exponential quadratic hyperbolic and general hyperbolic where exponential is equivalent to PageRankand quadratic hyperbolic to TotalRank
We studied the differences and similarities among these ranking algorithms and we have found that
bull Functional rankings using different damping functions can provide similar orderings if the parametersare chosen carefully
bull LinearRank can be used for calculating a ranking that is as good as PageRank but with a fixed andsmaller number of iterations
bull The parameters for the damping functions depend on the characteristic of path lengths in the graphwhich is known to grow sub-logarithmically on the size of the graph
More work is needed to find other damping functions that compute rankings similar to PageRank butare easier and faster to compute We use global ranking similarity but another measure could be the ranking
17
similarity in the top 20 results of real queries In this setting our results can change so future work willinclude this variation
Because of their high cost link-based ranking methods that involve iterative calculations at query timeare probably not used by large-scale search engines at this moment but the functional ranking with lineardamping we have presented can provide a good approximation with few iterations Also the approach wehave presented could be also applied to multivalued ranking functions such as HITS [Kle99] and topic-sensitive PageRank [Hav02] to obtain for instance a method for approximating the hubs and authorityscores using less iterations and a linear damping function
Our approach also helps to understand how easy or difficult is to collude many pages to modify theranking of a given page Clearly there are many different factors path lengths damping function branchingdegrees and number of colluded pages The graph structure of the collusion will affect those factors and weplan to analyze them
Acknowledgements
We would like to thank Daniel Fogaras for a valuable discussion about TotalRank that motivated part of thisresearch
References
[AJB99] Reka Albert Hawoong Jeong and Albert L Barabasi Diameter of the world wide webNature 401130ndash131 1999
[BH98] Krishna Bharat and Monika R HenzingerImproved algorithms for topic distillation in ahyperlinked environment In Proceedings of the 21st Annual International ACM SIGIR Con-ference on Research and Development in Information Retrieval pages 104ndash111 MelbourneAustralia August 1998 ACM Press New York
[BKM +00] Andrei Broder Ravi Kumar Farzin Maghoul Prabhakar Raghavan Sridhar RajagopalanRaymie Stata Andrew Tomkins and Janet WienerGraph structure in the web Experimentsand models In Proceedings of the Ninth Conference on World Wide Web pages 309ndash320Amsterdam Netherlands May 2000 ACM Press
[Bol05] Paolo BoldiTotalrank ranking without damping In Poster proceedings of the 14th interna-tional conference on World Wide Web pages 898ndash899 Chiba Japan 2005 ACM Press
[BR04] Bela Bollobas and Oliver RiordanThe diameter of a scale-free random graph Combinator-ica 24(1)5ndash34 2004
[Bri06] Michael Brinkmeier PageRank revisited ACM Transaction on Internet Technologies6(3)257ndash279 2006
[BSV05] Paolo Boldi Massimo Santini and Sebastiano VignaPagerank as a function of the dampingfactor In Proceedings of the 14th international conference on World Wide Web pages 557ndash566 Chiba Japan 2005 ACM Press
[BYD04] Ricardo Baeza-Yates and Emilio DavisWeb page ranking using link attributes In Alternatetrack papers amp posters of the 13th international conference on World Wide Web pages 328ndash329 New York NY USA 2004 ACM Press
18
[BYRN99] Ricardo Baeza-Yates and Berthier Ribeiro-NetoModern Information Retrieval AddisonWesley May 1999
[CGS04] Yen-Yu Chen Qingqing Gan and Torsten SuelLocal methods for estimating pagerank val-ues In Proceedings of the thirteenth ACM conference on Information and knowledge man-agement (CIKM) pages 381ndash389 New York NY USA 2004 ACM Press
[CL01] Fan Chung and Linyuan LuThe diameter of random sparse graphs Adv Appl Math26257ndash279 2001
[Dav00] Brian D DavisonTopical locality in the web In Proceedings of the 23rd annual internationalACM SIGIR conference on research and development in information retrieval pages 272ndash279Athens Greece 2000 ACM Press
[GG04] Gene H Golub and Chen GreifArnoldi-type algorithms for computing stationary distributionvectors with application to pagerank Technical Report SCCM-04-15 Stanford University2004
[GS05] Antonio Gulli and Alessio SignoriniThe indexable Web is more than 115 billion pages InPoster proceedings of the 14th international conference on World Wide Web pages 902ndash903Chiba Japan 2005 ACM Press
[Hav99] Taher HaveliwalaEfficient computation of pagerank Technical report Stanford University1999
[Hav02] Taher H HaveliwalaTopic-sensitive pagerank In Proceedings of the Eleventh World WideWeb Conference pages 517ndash526 Honolulu Hawaii USA May 2002 ACM Press
[HG98] Stephanie W Haas and Erika S GramsPage and link classifications connecting diverseresources In DL rsquo98 Proceedings of the third ACM conference on Digital libraries pages99ndash107 New York NY USA 1998 ACM Press
[HK03a] Taher Haveliwala and Sepandar KamvarThe condition number of the pagerank problemTechnical Report 36 Stanford University 2003
[HK03b] Taher Haveliwala and Sepandar KamvarThe second eigenvalue of the google matrix Tech-nical Report 20 Stanford University 2003
[JM98] W-K Joo and S H Myaeng Improving retrieval effectiveness with hyperlink informationIn Proceedings of International Workshop on Information Retrieval with Asian Languages(IRAL) Singapore October 1998
[KHMG03a] S Kamvar T Haveliwala C Manning and G GolubExploiting the block structure of theweb for computing pagerank 2003
[KHMG03b] Sepandar D Kamvar Taher H Haveliwala Christopher D Manning and Gene H GolubExtrapolation methods for accelerating pagerank computations In Proceedings of the twelfthinternational conference on World Wide Web pages 261ndash270 ACM Press 2003
[Kle99] Jon M KleinbergAuthoritative sources in a hyperlinked environment Journal of the ACM46(5)604ndash632 1999
19
[KRR+00] R Kumar P Raghavan S Rajagopalan D Sivakumar A Tomkins and E UpfalStochasticmodels for the web graph In Proceedings of the 41st Annual Symposium on Foundations ofComputer Science (FOCS) pages 57ndash65 Redondo Beach CA USA 2000 IEEE CS Press
[LGZ04] Chris P Lee Gene H Golub and Stefanos A ZeniosA fast two-stage algorithm for com-puting pagerank and its extensions Technical report Stanford University 2004
[Li98] Yanhong LiToward a qualitative search engine IEEE Internet Computing July 1998
[Lif00] Maxim LifantsevVoting model for ranking Web pages In Peter Graham and MuthucumaruMaheswaran editorsProceedings of the International Conference on Internet Computingpages 143ndash148 Las Vegas Nevada USA June 2000 CSREA Press
[Mar97] Massimo MarchioriThe quest for correct information of the Web hyper search engines InProc of the sixth international conference on the Web Santa Clara USA April 1997
[NSW01] M E Newman S H Strogatz and D J WattsRandom graphs with arbitrary degree distri-butions and their applicationsPhys Rev E Stat Nonlin Soft Matter Phys 64(2 Pt 2) August2001
[PBMW98] Lawrence Page Sergey Brin Rajeev Motwani and Terry WinogradThe PageRank citationranking bringing order to the Web Technical report Stanford Digital Library TechnologiesProject 1998
[PRU02] Gopal Pandurangan Prabhakara Raghavan and Eli UpfalUsing Pagerank to characterizeWeb structure In Proceedings of the 8th Annual International Computing and CombinatoricsConference (COCOON) volume 2387 ofLecture Notes in Computer Science pages 330ndash390Singapore August 2002 Springer
[SPM05] Padmini Srinivasan Gautam Pant and Filippo MenczerA general evaluation framework fortopical crawlers Information Retrieval 8(3)417ndash447 2005
[UCH03] Trystan Upstill Nick Craswell and David HawkingPredicting fame and fortune Page-rank or indegree In Proceedings of the Australasian Document Computing SymposiumADCS2003 pages 31ndash40 Canberra Australia December 2003
[WHAT04] Fang Wu Bernardo A Huberman Lada A Adamic and Joshua R TylerInformation flow insocial groups Physica A Statistical and Theoretical Physics 337(1-2)327ndash335 June 2004
[WS98] Duncan J Watts and Steven H StrogatzCollective dynamics of small-world networksNa-ture 393(6684)440ndash442 June 1998
20
- Introduction
- Functional Rankings
- Damping Functions
-
- Linear damping
- Exponential damping PageRank
- Quadratic hyperbolic damping TotalRank
- General hyperbolic damping HyperRank
- An empirical damping
-
- Comparing Damping Functions
-
- Approximating TotalRank with PageRank
- Approximating HyperRank with PageRank
- Approximating PageRank with LinearRank
-
- Parameters for the Damping Functions
-
- Characteristic path lengths
- Damping parameters and in-degree
- Experimental comparison of ranking orders
-
- Conclusions
-
Require L maximum path length N number of nodesv preference vector1 for i 1 N do Initialization2 S[i] larr R[i] larr 2v[i](L+1)3 end for4 for step 1 Lminus1 do Iteration step5 Auxlarr 06 for i 1 N do Follow links in the graph7 for all j such that there is a link fromi to j do8 Aux[j] larr Aux[j] + R[i]outdegree(i)9 end for
10 end for11 for i 1 N do Add to ranking value12 R[i] larr Aux[i] times (Lminusstep)
(Lminus(stepminus1))13 S[i] larr S[i] + R[i]14 end for15 end for16 return R
Figure 1 Algorithm for computing a functional ranking with linear damping
using techniques for the solution of systems of linear equations some other concentrating on some specificfeatures of the web as a graph that determine forms of locality in the computation of PageRank (see forexample [PBMW98 Hav99 GG04 LGZ04 KHMG03b KHMG03a])
33 Quadratic hyperbolic damping TotalRank
Recently a ranking method called TotalRank [Bol05] has been proposed The method aims at eliminatingthe necessity for an arbitrary parameter by integrating PageRank over the entire range ofα If r(α) is thevector of PageRank then TotalRank is defined as
T =int 1
0r(α)dα
T can be written as
int 1
0r(α)dα =
1N
infin
sumt=0
int 1
0(1minusα)αt1middotPtdα
=1N
infin
sumt=0
1(t +1)(t +2)
1middotPt
By using the definition of the logarithm of a matrix
ln(I minusP) =minusinfin
sumk=1
Pk
k
we can write TotalRank as
T = Pminus1(I +(I minusPminus1) ln(I minusP))
6
provided thatPminus1 is not singular andP 6= I TotalRank is as a weighted sum of the score associated with paths of varying lengths in which the
weights are hyperbolically decreasing on the lengths of the paths In other words TotalRank is a functionalranking with damping function
damping(t) =1
(t +1)(t +2)
and it is well defined assuminfint=0damping(t) = 1
Computation It is known that the cost of calculating TotalRank is the same as the cost of calculatingPageRank via the Power Method [BSV05] even though some more iterations are required to obtain thesame precision
34 General hyperbolic damping HyperRank
TotalRank is part of a more general family of weighting schemes for paths of different lengths that can beapproximated using
s(β) =1
Nζ(β)
infin
sumt=0
1
(t +1)β 1middotPt
Again this way of ranking follows the general scheme with damping function chosen as
damping(t) =1
ζ(β)(t +1)β
Here we are using Riemannrsquos zeta functionζ(β) = suminfint=1
1tβ for normalization and we needβ gt 1 for it
to converge Note that whenβ = 2 we get weights similar to those of TotalRank in which thet-th coefficientis 1(t +1)(t +2) whereas here it is 1ζ(2)(t +1)2
A meaningful choice forβ should be done considering the distribution of paths of different lengths in ascale-free graph A largeα in PageRank or a smallβ in HyperRank means increasing the effect of longerpaths in the score
Computation Figure2 shows an algorithm for approximating HyperRank Let us define a vector se-quenceR(t) as follows
R(0) =1
Nζ(β)
R(k+1) =(
k+1k+2
)βR(k)P
It is easy to see thatsuminfint=0R(k) = s(β) becauseR(k) = 1(N middotζ(β)(k+1)β))1middotPk this observation leads
to the algorithm Note that convergence speed is much slower than ordinary PageRank especially whenβis close to 1 the norm of thek-th summand being bound by 1(1+ 1k)β Interestingly enough thoughconvergence speed is reasonable ifβ is sufficiently large
7
Require N number of nodesv preference vectorβ damping parameter1 for i 1 N do Initialization2 S[i] larr R[i] larr v[i]ζ(β)3 end for4 klarr 05 while not convergeddo Iteration step6 klarr k + 17 Auxlarr 08 for i 1 N do Follow links in the graph9 for all j such that there is a link fromi to j do
10 Aux[j] larr Aux[j] + R[i]outdegree(i)11 end for12 end for13 for i 1 N do Add to ranking value14 R[i] larr Aux[i] times(k(k+1))β
15 S[i] larr S[i] + R[i]16 end for17 end while18 return S
Figure 2 An algorithm to compute general hyperbolic rank
35 An empirical damping
An empirical damping function would consider how much the value of an endorsement decreases by fol-lowing longer paths in the real web graph This cannot be known exactly but we can attempt to measureit indirectly Pages that link to each other are more similar than pages chosen at random [Dav00] evi-dence from topical crawlers [SPM05] shows that when doing breadth-first exploring the topic ldquodriftsrdquo asthe distance increases On the same line of though we propose to use the decrease of text similarity as anapproximation to an ldquoempiricalrdquo damping function
To find out which is the correlation between link-distance and similarity we performed the followingexperiment we considered a web graph corresponding to a partial snapshot of theuk domain and sampled200 nodes at random For each sampled node we followed links backwards to obtain nodes at a minimumdistance of 1 2 3 4 or 5 links Then we sampled 12000 pairs at each minimum distance at randomand computed their similarities with the original nodes Similarity was measured using the normalization ofTFIDF [BYRN99] without stemming nor stopwords removal
The resulting averages are shown inFigure3 with standard deviation error bars Text similarity clearlydecreases with distance and in some applications the empirical distribution of text similarity versus distancecould be used as an ldquoempiricalrdquo damping function Different measures of text similarity can yield differentdistributions for instance [WHAT04] uses the number of repeated words and phrases between pages andobtains a faster decrease in similarity
8
02
03
04
05
06
07
1 2 3 4 5
Ave
rage
text
sim
ilari
ty
Link distance
Figure 3 Link distance vs average text similarity
4 Comparing Damping Functions
A comparison of the damping functions described in the previous section is shown inFigure4 of coursehyperbolic damping functions decay asymptotically more slowly than exponential damping but notice thatfor short paths the latter may dominate the former in many cases
000
005
010
015
020
025
1 2 3 4 5 6 7
Wei
ght
Length of the path (t)
PageRank α=075PageRank α=050
TotalRankHyperRank β=2HyperRank β=3
Linear damping L=10
Figure 4 Weights given by the different damping functions for some values ofαandβ
In this section we aim at analyzing how similar are these functional rankings and how could we useone of the damping functions to approximate another with a suitable choice of parameters
9
41 Approximating TotalRank with PageRank
It has been shown experimentally that the rank correlation (Kendallrsquosτ) between TotalRank and PageRankis maximal whenα asymp 07 [Bol05] the maximum value forτ is over 095 meaning that for that specificchoice ofα PageRank and TotalRank induce almost equivalent ranking orders
We now want to approach the same problem in an analytic fashion more precisely we aim at study-ing the difference between TotalRank and PageRank by calculating the difference between their respectivedamping functions
dampingTotalRank(t) =1
(t +1)(t +2)dampingPageRank(α)(t) = (1minusα)αt
As they are normalized both damping functions have the same summation over the entire range oftOur approach is to consider the summation of their differences up to a maximum length for a path As thetwo functions are decreasing the difference in the first levels makes most of the difference in the rankingsIf ` is the maximum path length we are interested in we aim at minimizing this sum
`
sumt=0
(1
(t +1)(t +2)minus (1minusα)αt
)= α`+1minus 1
`+2
The minimum absolute value is 0 and it is obtained whenα is equal to
αlowast(`) =1
`+1radic
`+2= 1minus log`
`+O
(log2`
`2
)
Figure5 showsαlowast(`) as a function of Recall that for the World-Wide Web graph the average lengthof a path between two nodes when a path exists has been estimated in about 16 [BKM+00] or 19 [AJB99]but clearly today is over 20 Now in the range of path lengths between 15 and 20 the value ofαlowast(`)parameters that minimizes the difference between the exponentially decaying weights of PageRank and thehyperbolically decaying weights of TotalRank is roughly 085
Difference Optimal value
0506
0708
09
α0 5 10 15 20 25
Length
-02
-01
0
01
02
03
04
045
050
055
060
065
070
075
080
085
090
0 5 10 15 20 25
α that
min
imiz
es th
e di
ffer
ence
of
sum
of w
eigh
ts w
ith T
otal
Ran
k
Maximum length of the paths considered
Figure 5 Bestαlowast(`) for minimizing the difference of the sum of weights betweenPageRank and TotalRank
10
42 Approximating HyperRank with PageRank
Now we want to approximate the weights of
s(β) =1
Nζ(β)
infin
sumt=0
1
(t +1)β Pt
using the weights of
r(α) =1minusα
N
infin
sumt=0
αtPt
and we proceed again by considering paths up to a certain length
`
sumt=0
(1
ζ(β)(t +1)β minus (1minusα)αt
)The minimum can be zero and it is attained at
αlowast(`β) =
radic1minus 1
ζ(β)
`
sumt=0
1
(t +1)β
Theα that minimizes the difference of weights for different values ofβ and of the maximum path lengths` is shown inFigure6 In the case ofβ = 2 for instance for path lengths up to 10 to 20 the bestα is between075 and 085
α
15202530β 5 10 15 20 25
Length
03
04
05
06
07
08
09
1
00
01
02
03
04
05
06
07
08
09
10
1 15 2 25 3 35 4
α th
at m
inim
izes
the
diff
eren
ce o
fsu
m o
f wei
ghts
Exponent β
Max path length=25Max path length=20Max path length=15Max path length=10
Max path length=5
Figure 6 Bestα for minimizing the difference of the sum of weights betweenPageRank and HyperRank with exponentβ for various path lengths
11
43 Approximating PageRank with LinearRank
For approximating the damping function of PageRank with the damping function of LinearRank we con-sider the summation of the differences up to a certain path length If`le L
`
sumt=0
((1minusα)αt minus 2(Lminus t)
L(L+1)
)And if ` gt L
Lminus1
sumt=0
((1minusα)αt minus 2(Lminus t)
L(L+1)
)+
`
sumt=L
(1minusα)αt
We will assume thatle L so the evaluation of the difference between the two rankings is done in an areain which both rankings have non-zero values TheL that minimizes the difference for a given combinationof α and` is
Llowast(α `) = `+2`α`+1 +α`+1 +1+
radic(1+α`+1)2 +4`(`+2)α`+1
2(1minusα`+1)
= `+1+O(`α(`+1)2
)and we have plotted it for different values ofα and` in Figure7
L
0405
0607
0809
α5 10 15 20 25
Length
5
10
15
20
25
30
5
10
15
20
25
30
35
40
05 06 07 08 09
L th
at m
inim
izes
the
diff
eren
ce o
fsu
m o
f wei
ghts
Exponent α
Max path length=25Max path length=20Max path length=15Max path length=10Max path length=5
Figure 7 BestL for minimizing the difference of the sum of weights betweenLinearRank and LinearRank for various path lengths
12
5 Parameters for the Damping Functions
For our experiments we used several snapshots from the Web including theuk it andeuint domainsFor comparison we also considered a synthetic scale-free network produced according to the evolving modeldescribed by Kumaret al [KRR+00] (a combination of preferential attachment and random links) with theparameters suggested by Panduranganet al [PRU02] As far as the latter is concerned in the generatedgraph the exponents for the power-law in the center part of the distributions are -21 for in-degree andPageRank and -27 for out-degree we generated a 100000-nodes graph without disconnected nodes
In this section we study the behavior of the ranking functions for varying values of their parameters
51 Characteristic path lengths
In scale-free networks the distances between pairs of nodes follow a Gaussian distribution [AJB99] (theaverage is not given in their paper) Analytic estimations for the average distance of a graph of scale-freenetwork ofn nodes include
bull O(log(n)) [WS98]
bull O(log(n) log(np)) in sparse graphs withp links [CL01]
bull 1+ log(nz1) log(z2z1) wherez1 is the average indegree andz2 is the average number of nodes atdistance 2 [NSW01] and
bull O(log(n) log(log(n))) [BR04]
We did the following experiment starting from a node picked at random we followed the links back-wards and counted the number of nodes at different distancesFigure8 plots the average distances foundwhich appear to be growing (sub)logarithmically with the size of the graphFigure9 shows the distributionobtained in each sample For this experiment we are not counting the pages without in-links
3456789
10111213141516
105 106 107 108
Ave
rage
dis
tanc
e
Number of nodes (N)
Figure 8 Average distances versus number of nodes
If graphs of different sizes show different path lengths what is the effect of this in the ranking calcu-lation Letrsquos suppose that for a graph withN1 nodes it is found by experimental or analytic means that agood parameter for PageRank isαlowast1 Now we would like to have a good parameterαlowast2 for a graph with thesame properties except that the size of the new graph isN2 lt N1
13
it Web graph uk Web graph40 mill nodes 18 mill nodes
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
Avg distance 149 Avg distance 148
euint Web graph Synthetic graph800000 nodes 100000 nodes
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
Avg distance 100 Avg distance 42
Figure 9 Distribution of the average number of nodes at a certain distance froma given node
One possible approach consistent with what we have done so far is to consider that the sum of theweights up to the average path lengths of the graphs (L1 L2) have to be similar for both rankings to behavein a similar way If we take this approach the solution is
1minus (αlowast1)L1+1 = 1minus (αlowast2)
L2+1
αlowast2 = (αlowast1)L1+1L2+1
αlowast2 asymp (αlowast1)log(N1)log(N2)
An example that can be used in practice is the following letrsquos consider a web graph withN1 = 115times109
pages (the size of the full Web estimated by [GS05]) and another graph with onlyN2 = 50times106 pages (thesize of the Web of a large country) the second graph is roughly 3 orders of magnitude smaller
If it is shown empirically thatαlowast1 = 085 is a good value for the PageRank parameter for the whole Webthenαlowast2 = 081 should have a similar behavior in the 50-million page set which is natural as the path lengthsare shorter If the subset of web pages were even smaller for instanceN2 = 106 pages (the size of the webof a large organization) thenαlowast2 = 076
14
52 Damping parameters and in-degree
In this section we are using data from theuk Web graph and a 8500-nodes synthetic graph with the sameindegree and outdegree distribution than the previous synthetic graph We first measured the variance ofthe values from the ranking function as we consider that a high variance is good in a ranking functionas the relative values differ more We also measured the relationship between the ranking function andin-degree for different values of the parameters in terms of covariance correlation coefficient and rankingorders (Kendallrsquosτ) The results are shown inFigure10
Variance Covariance Correlation coefficient Kendallrsquosτ
000e+00200e-13400e-13600e-13800e-13100e-12120e-12140e-12160e-12180e-12200e-12
01 02 03 04 05 06 07 08 09
Var
ianc
e of
Pag
eRan
k
Damping factor α
00 times 100
50 times 10-5
10 times 10-4
15 times 10-4
20 times 10-4
25 times 10-4
30 times 10-4
35 times 10-4
01 02 03 04 05 06 07 08 09
Cov
aria
nce
betw
een
Page
Ran
k an
d in
-deg
ree
Damping factor α
07000710072007300740075007600770078007900800
01 02 03 04 05 06 07 08 09
Corr
elat
ion
coef
ficie
nt w
ith in
-deg
ree
Damping factor α
042
044
046
048
050
052
054
056
01 02 03 04 05 06 07 08 09Ken
dallrsquo
s τ b
etw
een
Page
Rank
and
in-d
egre
e
Damping factor α
005
010
015
020
025
030
035
040
045
050
055
01 02 03 04 05 06 07 08 09
Var
ianc
e of
Pag
eRan
k
Damping factor α
0 times 100
1 times 10-3
2 times 10-3
3 times 10-3
4 times 10-3
5 times 10-3
6 times 10-3
01 02 03 04 05 06 07 08 09
Cov
aria
nce
betw
een
Page
Ran
k an
d in
-deg
ree
Damping factor α
0995
0996
0997
0998
0999
1000
01 02 03 04 05 06 07 08 09
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor α
079
080
081
082
083
084
085
086
087
01 02 03 04 05 06 07 08 09Ken
dallrsquo
s τ b
etw
een
Page
Rank
and
in-d
egre
e
Damping factor α
Figure 10 Variance of PageRank and its relationship with indegree in terms ofcovariance correlation coefficient and Kendallrsquosτ coefficient for varying valuesof the parameterα Top uk web graph bottom synthetic graph
The variance is higher asα increases In regard to the relationship with indegree for company homepages it has been shown that the logarithm of the in-degree is correlated with PageRank [UCH03] Ourresults are consistent with this observation The covariance monotonically increases withα both in the caseof the synthetic graph and in the web graph Not surprisingly using the generative model the correlationsare higher We observe a maximum correlation atα = 07 in the synthetic graph and atα = 05 in the webgraph We also notice that the correlation drops significativelly asα increases because a largeα means thatlonger paths have an effect in the calculation note however that this phenomenon does not significantlyimpact on the correlation coefficient that is still very large
A high correlation between PageRank and in-degree is bad from the point of view of a search enginebecause it makes link-spam easier In particular as the correlation coefficient is higher in theuk web graphnear 05 if we chooseα close to this value we are helping link spammers Note however that a highcorrelation was foreseeable because as shown in[CGS04] even approximating PageRank with just only 1level of links gets 70 of accuracy
The behavior of the Kendallrsquosτ coefficient which measures the similarity between ranking orders isthe opposite than the one observed in the real graph This also happens for HyperRank as inFigure11we made the same measurements for this functional ranking and the results were consistent The graphseems inverted because a low value ofβ has the same effect as a high value ofα longer paths have moreimportance in the calculation
The differences in the behavior of the ranking order in the synthetic graph might be explained by thefact that the generative model we are using does not capture some properties that might be relevant for theranking order under a functional ranking such as the clustering coefficient Also the synthetic graph is
15
Variance Covariance Correlation coefficient Kendallrsquosτ
000e+00
100e-12
200e-12
300e-12
400e-12
500e-12
600e-12
700e-12
800e-12
900e-12
1 15 2 25 3 35 4
Var
ianc
e of
Gen
eral
Hyp
erbo
lic R
ank
Damping factor β
000e+00
100e-04
200e-04
300e-04
400e-04
500e-04
600e-04
700e-04
800e-04
1 15 2 25 3 35 4Cov
aria
nce
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
0730
0740
0750
0760
0770
0780
0790
0800
1 15 2 25 3 35 4
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor β
04600
04700
04800
04900
05000
05100
05200
05300
1 15 2 25 3
Ken
dallrsquo
s τ
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
000e+00
200e-05
400e-05
600e-05
800e-05
100e-04
120e-04
140e-04
160e-04
1 15 2 25 3 35 4
Var
ianc
e of
Gen
eral
Hyp
erbo
lic R
ank
Damping factor β
000e+00
200e-03
400e-03
600e-03
800e-03
100e-02
120e-02
1 15 2 25 3 35 4Cov
aria
nce
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
0997
0997
0998
0998
0999
0999
1 15 2 25 3 35 4
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor β
0815008200082500830008350084000845008500085500860008650
1 15 2 25 3 35 4
Ken
dallrsquo
s τ b
etw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
Figure 11 Variance of general hyperbolic rank for some values ofβ and itsrelation with in-degree Experiments have been performed on theuk graph (top)and synthetic graph (bottom)
assortative (highly linked pages are mostly linked to other highly linked pages) while the real web graph isdisassortative (most of the neighbours of highly linked pages have small indegree)
53 Experimental comparison of ranking orders
In this section we present experimental results about the similarity between the ranking orders induced bysome of the functional rankings discussed in the previous sections To perform the comparison we usedKendallrsquosτ and data from the U K Web graphFigure12shows how PageRank compares with HyperRankfor various pairs ofα andβ In the limit αβrarr 1 both rankings are equivalent and they remain similar in alarge region of the parameter space
α
β
07
08
09
1
τ ge 095
τ
02 03
04 05
06 07
08 09
2 25
3 35
4
01
02
03
04
05
06
07
08
09
15 2 25 3 35 4
α th
at m
axim
izes
Ken
dallrsquo
s τ
Exponent β
Actual optimumPredicted optimum with length=5
Figure 12 Comparison (using Kendallrsquosτ) between PageRank and HyperRankwith various damping parameters in the UK web graph The optimum predictedin the analysis with = 5 is very close to the real one
In the figure we can see that the rankings obtained with HyperRank and PageRank can be almostequivalent (Kendallrsquosτ ge 095) moreover the analysis shown insection42 considering only paths of
16
lengths less than 5 provides a very good approximation for the optimum combination of parameters Thismeans that in fact the difference in the damping functions in the first few levels is crucial
The exponentsβ required for giving a good approximation of PageRank are very small whenα ge 07limiting the practical applicability of HyperRank as its convergence is not faster than the one of PageRank
As far as LinearRank and PageRank are concerned long paths and largeα should be considered toobtain a sufficiently similar ranking as shown inFigure13
05 06
07 08
09
5 10
15 20
25
080085090095100
τ
τ ge 095
αL
5
10
15
20
25
05 06 07 08 09L
that
max
imiz
es K
enda
llrsquos
τExponent α
Actual optimumPredicted optimum with length=5
Figure 13 Comparison (using Kendallrsquosτ) between PageRank and LinearRankin the UK web graph with various damping parameters Again the predictedoptimum with` = 5 is very close to the actual optimum
The predicted optimum given insection43 with ` = 5 this is considering only the summation of thedifferences between both damping functions up to paths of length 5 is very close to what was obtained inpractice Forα = 08 calculating LinearRank withL = 10 (which means the same number of iterations)givesτ ge 098 for α = 09 calculating LinearRank withL = 15 also givesτ ge 098 In both cases theranking order of PageRank is approximated by the ranking order of LinearRank with very high precision
6 Conclusions
In this paper we have defined a broad class of link-based ranking algorithms based on the contribution ofdamping factors along all different paths reaching a page We introduce four particular damping decays lin-ear exponential quadratic hyperbolic and general hyperbolic where exponential is equivalent to PageRankand quadratic hyperbolic to TotalRank
We studied the differences and similarities among these ranking algorithms and we have found that
bull Functional rankings using different damping functions can provide similar orderings if the parametersare chosen carefully
bull LinearRank can be used for calculating a ranking that is as good as PageRank but with a fixed andsmaller number of iterations
bull The parameters for the damping functions depend on the characteristic of path lengths in the graphwhich is known to grow sub-logarithmically on the size of the graph
More work is needed to find other damping functions that compute rankings similar to PageRank butare easier and faster to compute We use global ranking similarity but another measure could be the ranking
17
similarity in the top 20 results of real queries In this setting our results can change so future work willinclude this variation
Because of their high cost link-based ranking methods that involve iterative calculations at query timeare probably not used by large-scale search engines at this moment but the functional ranking with lineardamping we have presented can provide a good approximation with few iterations Also the approach wehave presented could be also applied to multivalued ranking functions such as HITS [Kle99] and topic-sensitive PageRank [Hav02] to obtain for instance a method for approximating the hubs and authorityscores using less iterations and a linear damping function
Our approach also helps to understand how easy or difficult is to collude many pages to modify theranking of a given page Clearly there are many different factors path lengths damping function branchingdegrees and number of colluded pages The graph structure of the collusion will affect those factors and weplan to analyze them
Acknowledgements
We would like to thank Daniel Fogaras for a valuable discussion about TotalRank that motivated part of thisresearch
References
[AJB99] Reka Albert Hawoong Jeong and Albert L Barabasi Diameter of the world wide webNature 401130ndash131 1999
[BH98] Krishna Bharat and Monika R HenzingerImproved algorithms for topic distillation in ahyperlinked environment In Proceedings of the 21st Annual International ACM SIGIR Con-ference on Research and Development in Information Retrieval pages 104ndash111 MelbourneAustralia August 1998 ACM Press New York
[BKM +00] Andrei Broder Ravi Kumar Farzin Maghoul Prabhakar Raghavan Sridhar RajagopalanRaymie Stata Andrew Tomkins and Janet WienerGraph structure in the web Experimentsand models In Proceedings of the Ninth Conference on World Wide Web pages 309ndash320Amsterdam Netherlands May 2000 ACM Press
[Bol05] Paolo BoldiTotalrank ranking without damping In Poster proceedings of the 14th interna-tional conference on World Wide Web pages 898ndash899 Chiba Japan 2005 ACM Press
[BR04] Bela Bollobas and Oliver RiordanThe diameter of a scale-free random graph Combinator-ica 24(1)5ndash34 2004
[Bri06] Michael Brinkmeier PageRank revisited ACM Transaction on Internet Technologies6(3)257ndash279 2006
[BSV05] Paolo Boldi Massimo Santini and Sebastiano VignaPagerank as a function of the dampingfactor In Proceedings of the 14th international conference on World Wide Web pages 557ndash566 Chiba Japan 2005 ACM Press
[BYD04] Ricardo Baeza-Yates and Emilio DavisWeb page ranking using link attributes In Alternatetrack papers amp posters of the 13th international conference on World Wide Web pages 328ndash329 New York NY USA 2004 ACM Press
18
[BYRN99] Ricardo Baeza-Yates and Berthier Ribeiro-NetoModern Information Retrieval AddisonWesley May 1999
[CGS04] Yen-Yu Chen Qingqing Gan and Torsten SuelLocal methods for estimating pagerank val-ues In Proceedings of the thirteenth ACM conference on Information and knowledge man-agement (CIKM) pages 381ndash389 New York NY USA 2004 ACM Press
[CL01] Fan Chung and Linyuan LuThe diameter of random sparse graphs Adv Appl Math26257ndash279 2001
[Dav00] Brian D DavisonTopical locality in the web In Proceedings of the 23rd annual internationalACM SIGIR conference on research and development in information retrieval pages 272ndash279Athens Greece 2000 ACM Press
[GG04] Gene H Golub and Chen GreifArnoldi-type algorithms for computing stationary distributionvectors with application to pagerank Technical Report SCCM-04-15 Stanford University2004
[GS05] Antonio Gulli and Alessio SignoriniThe indexable Web is more than 115 billion pages InPoster proceedings of the 14th international conference on World Wide Web pages 902ndash903Chiba Japan 2005 ACM Press
[Hav99] Taher HaveliwalaEfficient computation of pagerank Technical report Stanford University1999
[Hav02] Taher H HaveliwalaTopic-sensitive pagerank In Proceedings of the Eleventh World WideWeb Conference pages 517ndash526 Honolulu Hawaii USA May 2002 ACM Press
[HG98] Stephanie W Haas and Erika S GramsPage and link classifications connecting diverseresources In DL rsquo98 Proceedings of the third ACM conference on Digital libraries pages99ndash107 New York NY USA 1998 ACM Press
[HK03a] Taher Haveliwala and Sepandar KamvarThe condition number of the pagerank problemTechnical Report 36 Stanford University 2003
[HK03b] Taher Haveliwala and Sepandar KamvarThe second eigenvalue of the google matrix Tech-nical Report 20 Stanford University 2003
[JM98] W-K Joo and S H Myaeng Improving retrieval effectiveness with hyperlink informationIn Proceedings of International Workshop on Information Retrieval with Asian Languages(IRAL) Singapore October 1998
[KHMG03a] S Kamvar T Haveliwala C Manning and G GolubExploiting the block structure of theweb for computing pagerank 2003
[KHMG03b] Sepandar D Kamvar Taher H Haveliwala Christopher D Manning and Gene H GolubExtrapolation methods for accelerating pagerank computations In Proceedings of the twelfthinternational conference on World Wide Web pages 261ndash270 ACM Press 2003
[Kle99] Jon M KleinbergAuthoritative sources in a hyperlinked environment Journal of the ACM46(5)604ndash632 1999
19
[KRR+00] R Kumar P Raghavan S Rajagopalan D Sivakumar A Tomkins and E UpfalStochasticmodels for the web graph In Proceedings of the 41st Annual Symposium on Foundations ofComputer Science (FOCS) pages 57ndash65 Redondo Beach CA USA 2000 IEEE CS Press
[LGZ04] Chris P Lee Gene H Golub and Stefanos A ZeniosA fast two-stage algorithm for com-puting pagerank and its extensions Technical report Stanford University 2004
[Li98] Yanhong LiToward a qualitative search engine IEEE Internet Computing July 1998
[Lif00] Maxim LifantsevVoting model for ranking Web pages In Peter Graham and MuthucumaruMaheswaran editorsProceedings of the International Conference on Internet Computingpages 143ndash148 Las Vegas Nevada USA June 2000 CSREA Press
[Mar97] Massimo MarchioriThe quest for correct information of the Web hyper search engines InProc of the sixth international conference on the Web Santa Clara USA April 1997
[NSW01] M E Newman S H Strogatz and D J WattsRandom graphs with arbitrary degree distri-butions and their applicationsPhys Rev E Stat Nonlin Soft Matter Phys 64(2 Pt 2) August2001
[PBMW98] Lawrence Page Sergey Brin Rajeev Motwani and Terry WinogradThe PageRank citationranking bringing order to the Web Technical report Stanford Digital Library TechnologiesProject 1998
[PRU02] Gopal Pandurangan Prabhakara Raghavan and Eli UpfalUsing Pagerank to characterizeWeb structure In Proceedings of the 8th Annual International Computing and CombinatoricsConference (COCOON) volume 2387 ofLecture Notes in Computer Science pages 330ndash390Singapore August 2002 Springer
[SPM05] Padmini Srinivasan Gautam Pant and Filippo MenczerA general evaluation framework fortopical crawlers Information Retrieval 8(3)417ndash447 2005
[UCH03] Trystan Upstill Nick Craswell and David HawkingPredicting fame and fortune Page-rank or indegree In Proceedings of the Australasian Document Computing SymposiumADCS2003 pages 31ndash40 Canberra Australia December 2003
[WHAT04] Fang Wu Bernardo A Huberman Lada A Adamic and Joshua R TylerInformation flow insocial groups Physica A Statistical and Theoretical Physics 337(1-2)327ndash335 June 2004
[WS98] Duncan J Watts and Steven H StrogatzCollective dynamics of small-world networksNa-ture 393(6684)440ndash442 June 1998
20
- Introduction
- Functional Rankings
- Damping Functions
-
- Linear damping
- Exponential damping PageRank
- Quadratic hyperbolic damping TotalRank
- General hyperbolic damping HyperRank
- An empirical damping
-
- Comparing Damping Functions
-
- Approximating TotalRank with PageRank
- Approximating HyperRank with PageRank
- Approximating PageRank with LinearRank
-
- Parameters for the Damping Functions
-
- Characteristic path lengths
- Damping parameters and in-degree
- Experimental comparison of ranking orders
-
- Conclusions
-
provided thatPminus1 is not singular andP 6= I TotalRank is as a weighted sum of the score associated with paths of varying lengths in which the
weights are hyperbolically decreasing on the lengths of the paths In other words TotalRank is a functionalranking with damping function
damping(t) =1
(t +1)(t +2)
and it is well defined assuminfint=0damping(t) = 1
Computation It is known that the cost of calculating TotalRank is the same as the cost of calculatingPageRank via the Power Method [BSV05] even though some more iterations are required to obtain thesame precision
34 General hyperbolic damping HyperRank
TotalRank is part of a more general family of weighting schemes for paths of different lengths that can beapproximated using
s(β) =1
Nζ(β)
infin
sumt=0
1
(t +1)β 1middotPt
Again this way of ranking follows the general scheme with damping function chosen as
damping(t) =1
ζ(β)(t +1)β
Here we are using Riemannrsquos zeta functionζ(β) = suminfint=1
1tβ for normalization and we needβ gt 1 for it
to converge Note that whenβ = 2 we get weights similar to those of TotalRank in which thet-th coefficientis 1(t +1)(t +2) whereas here it is 1ζ(2)(t +1)2
A meaningful choice forβ should be done considering the distribution of paths of different lengths in ascale-free graph A largeα in PageRank or a smallβ in HyperRank means increasing the effect of longerpaths in the score
Computation Figure2 shows an algorithm for approximating HyperRank Let us define a vector se-quenceR(t) as follows
R(0) =1
Nζ(β)
R(k+1) =(
k+1k+2
)βR(k)P
It is easy to see thatsuminfint=0R(k) = s(β) becauseR(k) = 1(N middotζ(β)(k+1)β))1middotPk this observation leads
to the algorithm Note that convergence speed is much slower than ordinary PageRank especially whenβis close to 1 the norm of thek-th summand being bound by 1(1+ 1k)β Interestingly enough thoughconvergence speed is reasonable ifβ is sufficiently large
7
Require N number of nodesv preference vectorβ damping parameter1 for i 1 N do Initialization2 S[i] larr R[i] larr v[i]ζ(β)3 end for4 klarr 05 while not convergeddo Iteration step6 klarr k + 17 Auxlarr 08 for i 1 N do Follow links in the graph9 for all j such that there is a link fromi to j do
10 Aux[j] larr Aux[j] + R[i]outdegree(i)11 end for12 end for13 for i 1 N do Add to ranking value14 R[i] larr Aux[i] times(k(k+1))β
15 S[i] larr S[i] + R[i]16 end for17 end while18 return S
Figure 2 An algorithm to compute general hyperbolic rank
35 An empirical damping
An empirical damping function would consider how much the value of an endorsement decreases by fol-lowing longer paths in the real web graph This cannot be known exactly but we can attempt to measureit indirectly Pages that link to each other are more similar than pages chosen at random [Dav00] evi-dence from topical crawlers [SPM05] shows that when doing breadth-first exploring the topic ldquodriftsrdquo asthe distance increases On the same line of though we propose to use the decrease of text similarity as anapproximation to an ldquoempiricalrdquo damping function
To find out which is the correlation between link-distance and similarity we performed the followingexperiment we considered a web graph corresponding to a partial snapshot of theuk domain and sampled200 nodes at random For each sampled node we followed links backwards to obtain nodes at a minimumdistance of 1 2 3 4 or 5 links Then we sampled 12000 pairs at each minimum distance at randomand computed their similarities with the original nodes Similarity was measured using the normalization ofTFIDF [BYRN99] without stemming nor stopwords removal
The resulting averages are shown inFigure3 with standard deviation error bars Text similarity clearlydecreases with distance and in some applications the empirical distribution of text similarity versus distancecould be used as an ldquoempiricalrdquo damping function Different measures of text similarity can yield differentdistributions for instance [WHAT04] uses the number of repeated words and phrases between pages andobtains a faster decrease in similarity
8
02
03
04
05
06
07
1 2 3 4 5
Ave
rage
text
sim
ilari
ty
Link distance
Figure 3 Link distance vs average text similarity
4 Comparing Damping Functions
A comparison of the damping functions described in the previous section is shown inFigure4 of coursehyperbolic damping functions decay asymptotically more slowly than exponential damping but notice thatfor short paths the latter may dominate the former in many cases
000
005
010
015
020
025
1 2 3 4 5 6 7
Wei
ght
Length of the path (t)
PageRank α=075PageRank α=050
TotalRankHyperRank β=2HyperRank β=3
Linear damping L=10
Figure 4 Weights given by the different damping functions for some values ofαandβ
In this section we aim at analyzing how similar are these functional rankings and how could we useone of the damping functions to approximate another with a suitable choice of parameters
9
41 Approximating TotalRank with PageRank
It has been shown experimentally that the rank correlation (Kendallrsquosτ) between TotalRank and PageRankis maximal whenα asymp 07 [Bol05] the maximum value forτ is over 095 meaning that for that specificchoice ofα PageRank and TotalRank induce almost equivalent ranking orders
We now want to approach the same problem in an analytic fashion more precisely we aim at study-ing the difference between TotalRank and PageRank by calculating the difference between their respectivedamping functions
dampingTotalRank(t) =1
(t +1)(t +2)dampingPageRank(α)(t) = (1minusα)αt
As they are normalized both damping functions have the same summation over the entire range oftOur approach is to consider the summation of their differences up to a maximum length for a path As thetwo functions are decreasing the difference in the first levels makes most of the difference in the rankingsIf ` is the maximum path length we are interested in we aim at minimizing this sum
`
sumt=0
(1
(t +1)(t +2)minus (1minusα)αt
)= α`+1minus 1
`+2
The minimum absolute value is 0 and it is obtained whenα is equal to
αlowast(`) =1
`+1radic
`+2= 1minus log`
`+O
(log2`
`2
)
Figure5 showsαlowast(`) as a function of Recall that for the World-Wide Web graph the average lengthof a path between two nodes when a path exists has been estimated in about 16 [BKM+00] or 19 [AJB99]but clearly today is over 20 Now in the range of path lengths between 15 and 20 the value ofαlowast(`)parameters that minimizes the difference between the exponentially decaying weights of PageRank and thehyperbolically decaying weights of TotalRank is roughly 085
Difference Optimal value
0506
0708
09
α0 5 10 15 20 25
Length
-02
-01
0
01
02
03
04
045
050
055
060
065
070
075
080
085
090
0 5 10 15 20 25
α that
min
imiz
es th
e di
ffer
ence
of
sum
of w
eigh
ts w
ith T
otal
Ran
k
Maximum length of the paths considered
Figure 5 Bestαlowast(`) for minimizing the difference of the sum of weights betweenPageRank and TotalRank
10
42 Approximating HyperRank with PageRank
Now we want to approximate the weights of
s(β) =1
Nζ(β)
infin
sumt=0
1
(t +1)β Pt
using the weights of
r(α) =1minusα
N
infin
sumt=0
αtPt
and we proceed again by considering paths up to a certain length
`
sumt=0
(1
ζ(β)(t +1)β minus (1minusα)αt
)The minimum can be zero and it is attained at
αlowast(`β) =
radic1minus 1
ζ(β)
`
sumt=0
1
(t +1)β
Theα that minimizes the difference of weights for different values ofβ and of the maximum path lengths` is shown inFigure6 In the case ofβ = 2 for instance for path lengths up to 10 to 20 the bestα is between075 and 085
α
15202530β 5 10 15 20 25
Length
03
04
05
06
07
08
09
1
00
01
02
03
04
05
06
07
08
09
10
1 15 2 25 3 35 4
α th
at m
inim
izes
the
diff
eren
ce o
fsu
m o
f wei
ghts
Exponent β
Max path length=25Max path length=20Max path length=15Max path length=10
Max path length=5
Figure 6 Bestα for minimizing the difference of the sum of weights betweenPageRank and HyperRank with exponentβ for various path lengths
11
43 Approximating PageRank with LinearRank
For approximating the damping function of PageRank with the damping function of LinearRank we con-sider the summation of the differences up to a certain path length If`le L
`
sumt=0
((1minusα)αt minus 2(Lminus t)
L(L+1)
)And if ` gt L
Lminus1
sumt=0
((1minusα)αt minus 2(Lminus t)
L(L+1)
)+
`
sumt=L
(1minusα)αt
We will assume thatle L so the evaluation of the difference between the two rankings is done in an areain which both rankings have non-zero values TheL that minimizes the difference for a given combinationof α and` is
Llowast(α `) = `+2`α`+1 +α`+1 +1+
radic(1+α`+1)2 +4`(`+2)α`+1
2(1minusα`+1)
= `+1+O(`α(`+1)2
)and we have plotted it for different values ofα and` in Figure7
L
0405
0607
0809
α5 10 15 20 25
Length
5
10
15
20
25
30
5
10
15
20
25
30
35
40
05 06 07 08 09
L th
at m
inim
izes
the
diff
eren
ce o
fsu
m o
f wei
ghts
Exponent α
Max path length=25Max path length=20Max path length=15Max path length=10Max path length=5
Figure 7 BestL for minimizing the difference of the sum of weights betweenLinearRank and LinearRank for various path lengths
12
5 Parameters for the Damping Functions
For our experiments we used several snapshots from the Web including theuk it andeuint domainsFor comparison we also considered a synthetic scale-free network produced according to the evolving modeldescribed by Kumaret al [KRR+00] (a combination of preferential attachment and random links) with theparameters suggested by Panduranganet al [PRU02] As far as the latter is concerned in the generatedgraph the exponents for the power-law in the center part of the distributions are -21 for in-degree andPageRank and -27 for out-degree we generated a 100000-nodes graph without disconnected nodes
In this section we study the behavior of the ranking functions for varying values of their parameters
51 Characteristic path lengths
In scale-free networks the distances between pairs of nodes follow a Gaussian distribution [AJB99] (theaverage is not given in their paper) Analytic estimations for the average distance of a graph of scale-freenetwork ofn nodes include
bull O(log(n)) [WS98]
bull O(log(n) log(np)) in sparse graphs withp links [CL01]
bull 1+ log(nz1) log(z2z1) wherez1 is the average indegree andz2 is the average number of nodes atdistance 2 [NSW01] and
bull O(log(n) log(log(n))) [BR04]
We did the following experiment starting from a node picked at random we followed the links back-wards and counted the number of nodes at different distancesFigure8 plots the average distances foundwhich appear to be growing (sub)logarithmically with the size of the graphFigure9 shows the distributionobtained in each sample For this experiment we are not counting the pages without in-links
3456789
10111213141516
105 106 107 108
Ave
rage
dis
tanc
e
Number of nodes (N)
Figure 8 Average distances versus number of nodes
If graphs of different sizes show different path lengths what is the effect of this in the ranking calcu-lation Letrsquos suppose that for a graph withN1 nodes it is found by experimental or analytic means that agood parameter for PageRank isαlowast1 Now we would like to have a good parameterαlowast2 for a graph with thesame properties except that the size of the new graph isN2 lt N1
13
it Web graph uk Web graph40 mill nodes 18 mill nodes
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
Avg distance 149 Avg distance 148
euint Web graph Synthetic graph800000 nodes 100000 nodes
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
Avg distance 100 Avg distance 42
Figure 9 Distribution of the average number of nodes at a certain distance froma given node
One possible approach consistent with what we have done so far is to consider that the sum of theweights up to the average path lengths of the graphs (L1 L2) have to be similar for both rankings to behavein a similar way If we take this approach the solution is
1minus (αlowast1)L1+1 = 1minus (αlowast2)
L2+1
αlowast2 = (αlowast1)L1+1L2+1
αlowast2 asymp (αlowast1)log(N1)log(N2)
An example that can be used in practice is the following letrsquos consider a web graph withN1 = 115times109
pages (the size of the full Web estimated by [GS05]) and another graph with onlyN2 = 50times106 pages (thesize of the Web of a large country) the second graph is roughly 3 orders of magnitude smaller
If it is shown empirically thatαlowast1 = 085 is a good value for the PageRank parameter for the whole Webthenαlowast2 = 081 should have a similar behavior in the 50-million page set which is natural as the path lengthsare shorter If the subset of web pages were even smaller for instanceN2 = 106 pages (the size of the webof a large organization) thenαlowast2 = 076
14
52 Damping parameters and in-degree
In this section we are using data from theuk Web graph and a 8500-nodes synthetic graph with the sameindegree and outdegree distribution than the previous synthetic graph We first measured the variance ofthe values from the ranking function as we consider that a high variance is good in a ranking functionas the relative values differ more We also measured the relationship between the ranking function andin-degree for different values of the parameters in terms of covariance correlation coefficient and rankingorders (Kendallrsquosτ) The results are shown inFigure10
Variance Covariance Correlation coefficient Kendallrsquosτ
000e+00200e-13400e-13600e-13800e-13100e-12120e-12140e-12160e-12180e-12200e-12
01 02 03 04 05 06 07 08 09
Var
ianc
e of
Pag
eRan
k
Damping factor α
00 times 100
50 times 10-5
10 times 10-4
15 times 10-4
20 times 10-4
25 times 10-4
30 times 10-4
35 times 10-4
01 02 03 04 05 06 07 08 09
Cov
aria
nce
betw
een
Page
Ran
k an
d in
-deg
ree
Damping factor α
07000710072007300740075007600770078007900800
01 02 03 04 05 06 07 08 09
Corr
elat
ion
coef
ficie
nt w
ith in
-deg
ree
Damping factor α
042
044
046
048
050
052
054
056
01 02 03 04 05 06 07 08 09Ken
dallrsquo
s τ b
etw
een
Page
Rank
and
in-d
egre
e
Damping factor α
005
010
015
020
025
030
035
040
045
050
055
01 02 03 04 05 06 07 08 09
Var
ianc
e of
Pag
eRan
k
Damping factor α
0 times 100
1 times 10-3
2 times 10-3
3 times 10-3
4 times 10-3
5 times 10-3
6 times 10-3
01 02 03 04 05 06 07 08 09
Cov
aria
nce
betw
een
Page
Ran
k an
d in
-deg
ree
Damping factor α
0995
0996
0997
0998
0999
1000
01 02 03 04 05 06 07 08 09
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor α
079
080
081
082
083
084
085
086
087
01 02 03 04 05 06 07 08 09Ken
dallrsquo
s τ b
etw
een
Page
Rank
and
in-d
egre
e
Damping factor α
Figure 10 Variance of PageRank and its relationship with indegree in terms ofcovariance correlation coefficient and Kendallrsquosτ coefficient for varying valuesof the parameterα Top uk web graph bottom synthetic graph
The variance is higher asα increases In regard to the relationship with indegree for company homepages it has been shown that the logarithm of the in-degree is correlated with PageRank [UCH03] Ourresults are consistent with this observation The covariance monotonically increases withα both in the caseof the synthetic graph and in the web graph Not surprisingly using the generative model the correlationsare higher We observe a maximum correlation atα = 07 in the synthetic graph and atα = 05 in the webgraph We also notice that the correlation drops significativelly asα increases because a largeα means thatlonger paths have an effect in the calculation note however that this phenomenon does not significantlyimpact on the correlation coefficient that is still very large
A high correlation between PageRank and in-degree is bad from the point of view of a search enginebecause it makes link-spam easier In particular as the correlation coefficient is higher in theuk web graphnear 05 if we chooseα close to this value we are helping link spammers Note however that a highcorrelation was foreseeable because as shown in[CGS04] even approximating PageRank with just only 1level of links gets 70 of accuracy
The behavior of the Kendallrsquosτ coefficient which measures the similarity between ranking orders isthe opposite than the one observed in the real graph This also happens for HyperRank as inFigure11we made the same measurements for this functional ranking and the results were consistent The graphseems inverted because a low value ofβ has the same effect as a high value ofα longer paths have moreimportance in the calculation
The differences in the behavior of the ranking order in the synthetic graph might be explained by thefact that the generative model we are using does not capture some properties that might be relevant for theranking order under a functional ranking such as the clustering coefficient Also the synthetic graph is
15
Variance Covariance Correlation coefficient Kendallrsquosτ
000e+00
100e-12
200e-12
300e-12
400e-12
500e-12
600e-12
700e-12
800e-12
900e-12
1 15 2 25 3 35 4
Var
ianc
e of
Gen
eral
Hyp
erbo
lic R
ank
Damping factor β
000e+00
100e-04
200e-04
300e-04
400e-04
500e-04
600e-04
700e-04
800e-04
1 15 2 25 3 35 4Cov
aria
nce
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
0730
0740
0750
0760
0770
0780
0790
0800
1 15 2 25 3 35 4
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor β
04600
04700
04800
04900
05000
05100
05200
05300
1 15 2 25 3
Ken
dallrsquo
s τ
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
000e+00
200e-05
400e-05
600e-05
800e-05
100e-04
120e-04
140e-04
160e-04
1 15 2 25 3 35 4
Var
ianc
e of
Gen
eral
Hyp
erbo
lic R
ank
Damping factor β
000e+00
200e-03
400e-03
600e-03
800e-03
100e-02
120e-02
1 15 2 25 3 35 4Cov
aria
nce
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
0997
0997
0998
0998
0999
0999
1 15 2 25 3 35 4
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor β
0815008200082500830008350084000845008500085500860008650
1 15 2 25 3 35 4
Ken
dallrsquo
s τ b
etw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
Figure 11 Variance of general hyperbolic rank for some values ofβ and itsrelation with in-degree Experiments have been performed on theuk graph (top)and synthetic graph (bottom)
assortative (highly linked pages are mostly linked to other highly linked pages) while the real web graph isdisassortative (most of the neighbours of highly linked pages have small indegree)
53 Experimental comparison of ranking orders
In this section we present experimental results about the similarity between the ranking orders induced bysome of the functional rankings discussed in the previous sections To perform the comparison we usedKendallrsquosτ and data from the U K Web graphFigure12shows how PageRank compares with HyperRankfor various pairs ofα andβ In the limit αβrarr 1 both rankings are equivalent and they remain similar in alarge region of the parameter space
α
β
07
08
09
1
τ ge 095
τ
02 03
04 05
06 07
08 09
2 25
3 35
4
01
02
03
04
05
06
07
08
09
15 2 25 3 35 4
α th
at m
axim
izes
Ken
dallrsquo
s τ
Exponent β
Actual optimumPredicted optimum with length=5
Figure 12 Comparison (using Kendallrsquosτ) between PageRank and HyperRankwith various damping parameters in the UK web graph The optimum predictedin the analysis with = 5 is very close to the real one
In the figure we can see that the rankings obtained with HyperRank and PageRank can be almostequivalent (Kendallrsquosτ ge 095) moreover the analysis shown insection42 considering only paths of
16
lengths less than 5 provides a very good approximation for the optimum combination of parameters Thismeans that in fact the difference in the damping functions in the first few levels is crucial
The exponentsβ required for giving a good approximation of PageRank are very small whenα ge 07limiting the practical applicability of HyperRank as its convergence is not faster than the one of PageRank
As far as LinearRank and PageRank are concerned long paths and largeα should be considered toobtain a sufficiently similar ranking as shown inFigure13
05 06
07 08
09
5 10
15 20
25
080085090095100
τ
τ ge 095
αL
5
10
15
20
25
05 06 07 08 09L
that
max
imiz
es K
enda
llrsquos
τExponent α
Actual optimumPredicted optimum with length=5
Figure 13 Comparison (using Kendallrsquosτ) between PageRank and LinearRankin the UK web graph with various damping parameters Again the predictedoptimum with` = 5 is very close to the actual optimum
The predicted optimum given insection43 with ` = 5 this is considering only the summation of thedifferences between both damping functions up to paths of length 5 is very close to what was obtained inpractice Forα = 08 calculating LinearRank withL = 10 (which means the same number of iterations)givesτ ge 098 for α = 09 calculating LinearRank withL = 15 also givesτ ge 098 In both cases theranking order of PageRank is approximated by the ranking order of LinearRank with very high precision
6 Conclusions
In this paper we have defined a broad class of link-based ranking algorithms based on the contribution ofdamping factors along all different paths reaching a page We introduce four particular damping decays lin-ear exponential quadratic hyperbolic and general hyperbolic where exponential is equivalent to PageRankand quadratic hyperbolic to TotalRank
We studied the differences and similarities among these ranking algorithms and we have found that
bull Functional rankings using different damping functions can provide similar orderings if the parametersare chosen carefully
bull LinearRank can be used for calculating a ranking that is as good as PageRank but with a fixed andsmaller number of iterations
bull The parameters for the damping functions depend on the characteristic of path lengths in the graphwhich is known to grow sub-logarithmically on the size of the graph
More work is needed to find other damping functions that compute rankings similar to PageRank butare easier and faster to compute We use global ranking similarity but another measure could be the ranking
17
similarity in the top 20 results of real queries In this setting our results can change so future work willinclude this variation
Because of their high cost link-based ranking methods that involve iterative calculations at query timeare probably not used by large-scale search engines at this moment but the functional ranking with lineardamping we have presented can provide a good approximation with few iterations Also the approach wehave presented could be also applied to multivalued ranking functions such as HITS [Kle99] and topic-sensitive PageRank [Hav02] to obtain for instance a method for approximating the hubs and authorityscores using less iterations and a linear damping function
Our approach also helps to understand how easy or difficult is to collude many pages to modify theranking of a given page Clearly there are many different factors path lengths damping function branchingdegrees and number of colluded pages The graph structure of the collusion will affect those factors and weplan to analyze them
Acknowledgements
We would like to thank Daniel Fogaras for a valuable discussion about TotalRank that motivated part of thisresearch
References
[AJB99] Reka Albert Hawoong Jeong and Albert L Barabasi Diameter of the world wide webNature 401130ndash131 1999
[BH98] Krishna Bharat and Monika R HenzingerImproved algorithms for topic distillation in ahyperlinked environment In Proceedings of the 21st Annual International ACM SIGIR Con-ference on Research and Development in Information Retrieval pages 104ndash111 MelbourneAustralia August 1998 ACM Press New York
[BKM +00] Andrei Broder Ravi Kumar Farzin Maghoul Prabhakar Raghavan Sridhar RajagopalanRaymie Stata Andrew Tomkins and Janet WienerGraph structure in the web Experimentsand models In Proceedings of the Ninth Conference on World Wide Web pages 309ndash320Amsterdam Netherlands May 2000 ACM Press
[Bol05] Paolo BoldiTotalrank ranking without damping In Poster proceedings of the 14th interna-tional conference on World Wide Web pages 898ndash899 Chiba Japan 2005 ACM Press
[BR04] Bela Bollobas and Oliver RiordanThe diameter of a scale-free random graph Combinator-ica 24(1)5ndash34 2004
[Bri06] Michael Brinkmeier PageRank revisited ACM Transaction on Internet Technologies6(3)257ndash279 2006
[BSV05] Paolo Boldi Massimo Santini and Sebastiano VignaPagerank as a function of the dampingfactor In Proceedings of the 14th international conference on World Wide Web pages 557ndash566 Chiba Japan 2005 ACM Press
[BYD04] Ricardo Baeza-Yates and Emilio DavisWeb page ranking using link attributes In Alternatetrack papers amp posters of the 13th international conference on World Wide Web pages 328ndash329 New York NY USA 2004 ACM Press
18
[BYRN99] Ricardo Baeza-Yates and Berthier Ribeiro-NetoModern Information Retrieval AddisonWesley May 1999
[CGS04] Yen-Yu Chen Qingqing Gan and Torsten SuelLocal methods for estimating pagerank val-ues In Proceedings of the thirteenth ACM conference on Information and knowledge man-agement (CIKM) pages 381ndash389 New York NY USA 2004 ACM Press
[CL01] Fan Chung and Linyuan LuThe diameter of random sparse graphs Adv Appl Math26257ndash279 2001
[Dav00] Brian D DavisonTopical locality in the web In Proceedings of the 23rd annual internationalACM SIGIR conference on research and development in information retrieval pages 272ndash279Athens Greece 2000 ACM Press
[GG04] Gene H Golub and Chen GreifArnoldi-type algorithms for computing stationary distributionvectors with application to pagerank Technical Report SCCM-04-15 Stanford University2004
[GS05] Antonio Gulli and Alessio SignoriniThe indexable Web is more than 115 billion pages InPoster proceedings of the 14th international conference on World Wide Web pages 902ndash903Chiba Japan 2005 ACM Press
[Hav99] Taher HaveliwalaEfficient computation of pagerank Technical report Stanford University1999
[Hav02] Taher H HaveliwalaTopic-sensitive pagerank In Proceedings of the Eleventh World WideWeb Conference pages 517ndash526 Honolulu Hawaii USA May 2002 ACM Press
[HG98] Stephanie W Haas and Erika S GramsPage and link classifications connecting diverseresources In DL rsquo98 Proceedings of the third ACM conference on Digital libraries pages99ndash107 New York NY USA 1998 ACM Press
[HK03a] Taher Haveliwala and Sepandar KamvarThe condition number of the pagerank problemTechnical Report 36 Stanford University 2003
[HK03b] Taher Haveliwala and Sepandar KamvarThe second eigenvalue of the google matrix Tech-nical Report 20 Stanford University 2003
[JM98] W-K Joo and S H Myaeng Improving retrieval effectiveness with hyperlink informationIn Proceedings of International Workshop on Information Retrieval with Asian Languages(IRAL) Singapore October 1998
[KHMG03a] S Kamvar T Haveliwala C Manning and G GolubExploiting the block structure of theweb for computing pagerank 2003
[KHMG03b] Sepandar D Kamvar Taher H Haveliwala Christopher D Manning and Gene H GolubExtrapolation methods for accelerating pagerank computations In Proceedings of the twelfthinternational conference on World Wide Web pages 261ndash270 ACM Press 2003
[Kle99] Jon M KleinbergAuthoritative sources in a hyperlinked environment Journal of the ACM46(5)604ndash632 1999
19
[KRR+00] R Kumar P Raghavan S Rajagopalan D Sivakumar A Tomkins and E UpfalStochasticmodels for the web graph In Proceedings of the 41st Annual Symposium on Foundations ofComputer Science (FOCS) pages 57ndash65 Redondo Beach CA USA 2000 IEEE CS Press
[LGZ04] Chris P Lee Gene H Golub and Stefanos A ZeniosA fast two-stage algorithm for com-puting pagerank and its extensions Technical report Stanford University 2004
[Li98] Yanhong LiToward a qualitative search engine IEEE Internet Computing July 1998
[Lif00] Maxim LifantsevVoting model for ranking Web pages In Peter Graham and MuthucumaruMaheswaran editorsProceedings of the International Conference on Internet Computingpages 143ndash148 Las Vegas Nevada USA June 2000 CSREA Press
[Mar97] Massimo MarchioriThe quest for correct information of the Web hyper search engines InProc of the sixth international conference on the Web Santa Clara USA April 1997
[NSW01] M E Newman S H Strogatz and D J WattsRandom graphs with arbitrary degree distri-butions and their applicationsPhys Rev E Stat Nonlin Soft Matter Phys 64(2 Pt 2) August2001
[PBMW98] Lawrence Page Sergey Brin Rajeev Motwani and Terry WinogradThe PageRank citationranking bringing order to the Web Technical report Stanford Digital Library TechnologiesProject 1998
[PRU02] Gopal Pandurangan Prabhakara Raghavan and Eli UpfalUsing Pagerank to characterizeWeb structure In Proceedings of the 8th Annual International Computing and CombinatoricsConference (COCOON) volume 2387 ofLecture Notes in Computer Science pages 330ndash390Singapore August 2002 Springer
[SPM05] Padmini Srinivasan Gautam Pant and Filippo MenczerA general evaluation framework fortopical crawlers Information Retrieval 8(3)417ndash447 2005
[UCH03] Trystan Upstill Nick Craswell and David HawkingPredicting fame and fortune Page-rank or indegree In Proceedings of the Australasian Document Computing SymposiumADCS2003 pages 31ndash40 Canberra Australia December 2003
[WHAT04] Fang Wu Bernardo A Huberman Lada A Adamic and Joshua R TylerInformation flow insocial groups Physica A Statistical and Theoretical Physics 337(1-2)327ndash335 June 2004
[WS98] Duncan J Watts and Steven H StrogatzCollective dynamics of small-world networksNa-ture 393(6684)440ndash442 June 1998
20
- Introduction
- Functional Rankings
- Damping Functions
-
- Linear damping
- Exponential damping PageRank
- Quadratic hyperbolic damping TotalRank
- General hyperbolic damping HyperRank
- An empirical damping
-
- Comparing Damping Functions
-
- Approximating TotalRank with PageRank
- Approximating HyperRank with PageRank
- Approximating PageRank with LinearRank
-
- Parameters for the Damping Functions
-
- Characteristic path lengths
- Damping parameters and in-degree
- Experimental comparison of ranking orders
-
- Conclusions
-
Require N number of nodesv preference vectorβ damping parameter1 for i 1 N do Initialization2 S[i] larr R[i] larr v[i]ζ(β)3 end for4 klarr 05 while not convergeddo Iteration step6 klarr k + 17 Auxlarr 08 for i 1 N do Follow links in the graph9 for all j such that there is a link fromi to j do
10 Aux[j] larr Aux[j] + R[i]outdegree(i)11 end for12 end for13 for i 1 N do Add to ranking value14 R[i] larr Aux[i] times(k(k+1))β
15 S[i] larr S[i] + R[i]16 end for17 end while18 return S
Figure 2 An algorithm to compute general hyperbolic rank
35 An empirical damping
An empirical damping function would consider how much the value of an endorsement decreases by fol-lowing longer paths in the real web graph This cannot be known exactly but we can attempt to measureit indirectly Pages that link to each other are more similar than pages chosen at random [Dav00] evi-dence from topical crawlers [SPM05] shows that when doing breadth-first exploring the topic ldquodriftsrdquo asthe distance increases On the same line of though we propose to use the decrease of text similarity as anapproximation to an ldquoempiricalrdquo damping function
To find out which is the correlation between link-distance and similarity we performed the followingexperiment we considered a web graph corresponding to a partial snapshot of theuk domain and sampled200 nodes at random For each sampled node we followed links backwards to obtain nodes at a minimumdistance of 1 2 3 4 or 5 links Then we sampled 12000 pairs at each minimum distance at randomand computed their similarities with the original nodes Similarity was measured using the normalization ofTFIDF [BYRN99] without stemming nor stopwords removal
The resulting averages are shown inFigure3 with standard deviation error bars Text similarity clearlydecreases with distance and in some applications the empirical distribution of text similarity versus distancecould be used as an ldquoempiricalrdquo damping function Different measures of text similarity can yield differentdistributions for instance [WHAT04] uses the number of repeated words and phrases between pages andobtains a faster decrease in similarity
8
02
03
04
05
06
07
1 2 3 4 5
Ave
rage
text
sim
ilari
ty
Link distance
Figure 3 Link distance vs average text similarity
4 Comparing Damping Functions
A comparison of the damping functions described in the previous section is shown inFigure4 of coursehyperbolic damping functions decay asymptotically more slowly than exponential damping but notice thatfor short paths the latter may dominate the former in many cases
000
005
010
015
020
025
1 2 3 4 5 6 7
Wei
ght
Length of the path (t)
PageRank α=075PageRank α=050
TotalRankHyperRank β=2HyperRank β=3
Linear damping L=10
Figure 4 Weights given by the different damping functions for some values ofαandβ
In this section we aim at analyzing how similar are these functional rankings and how could we useone of the damping functions to approximate another with a suitable choice of parameters
9
41 Approximating TotalRank with PageRank
It has been shown experimentally that the rank correlation (Kendallrsquosτ) between TotalRank and PageRankis maximal whenα asymp 07 [Bol05] the maximum value forτ is over 095 meaning that for that specificchoice ofα PageRank and TotalRank induce almost equivalent ranking orders
We now want to approach the same problem in an analytic fashion more precisely we aim at study-ing the difference between TotalRank and PageRank by calculating the difference between their respectivedamping functions
dampingTotalRank(t) =1
(t +1)(t +2)dampingPageRank(α)(t) = (1minusα)αt
As they are normalized both damping functions have the same summation over the entire range oftOur approach is to consider the summation of their differences up to a maximum length for a path As thetwo functions are decreasing the difference in the first levels makes most of the difference in the rankingsIf ` is the maximum path length we are interested in we aim at minimizing this sum
`
sumt=0
(1
(t +1)(t +2)minus (1minusα)αt
)= α`+1minus 1
`+2
The minimum absolute value is 0 and it is obtained whenα is equal to
αlowast(`) =1
`+1radic
`+2= 1minus log`
`+O
(log2`
`2
)
Figure5 showsαlowast(`) as a function of Recall that for the World-Wide Web graph the average lengthof a path between two nodes when a path exists has been estimated in about 16 [BKM+00] or 19 [AJB99]but clearly today is over 20 Now in the range of path lengths between 15 and 20 the value ofαlowast(`)parameters that minimizes the difference between the exponentially decaying weights of PageRank and thehyperbolically decaying weights of TotalRank is roughly 085
Difference Optimal value
0506
0708
09
α0 5 10 15 20 25
Length
-02
-01
0
01
02
03
04
045
050
055
060
065
070
075
080
085
090
0 5 10 15 20 25
α that
min
imiz
es th
e di
ffer
ence
of
sum
of w
eigh
ts w
ith T
otal
Ran
k
Maximum length of the paths considered
Figure 5 Bestαlowast(`) for minimizing the difference of the sum of weights betweenPageRank and TotalRank
10
42 Approximating HyperRank with PageRank
Now we want to approximate the weights of
s(β) =1
Nζ(β)
infin
sumt=0
1
(t +1)β Pt
using the weights of
r(α) =1minusα
N
infin
sumt=0
αtPt
and we proceed again by considering paths up to a certain length
`
sumt=0
(1
ζ(β)(t +1)β minus (1minusα)αt
)The minimum can be zero and it is attained at
αlowast(`β) =
radic1minus 1
ζ(β)
`
sumt=0
1
(t +1)β
Theα that minimizes the difference of weights for different values ofβ and of the maximum path lengths` is shown inFigure6 In the case ofβ = 2 for instance for path lengths up to 10 to 20 the bestα is between075 and 085
α
15202530β 5 10 15 20 25
Length
03
04
05
06
07
08
09
1
00
01
02
03
04
05
06
07
08
09
10
1 15 2 25 3 35 4
α th
at m
inim
izes
the
diff
eren
ce o
fsu
m o
f wei
ghts
Exponent β
Max path length=25Max path length=20Max path length=15Max path length=10
Max path length=5
Figure 6 Bestα for minimizing the difference of the sum of weights betweenPageRank and HyperRank with exponentβ for various path lengths
11
43 Approximating PageRank with LinearRank
For approximating the damping function of PageRank with the damping function of LinearRank we con-sider the summation of the differences up to a certain path length If`le L
`
sumt=0
((1minusα)αt minus 2(Lminus t)
L(L+1)
)And if ` gt L
Lminus1
sumt=0
((1minusα)αt minus 2(Lminus t)
L(L+1)
)+
`
sumt=L
(1minusα)αt
We will assume thatle L so the evaluation of the difference between the two rankings is done in an areain which both rankings have non-zero values TheL that minimizes the difference for a given combinationof α and` is
Llowast(α `) = `+2`α`+1 +α`+1 +1+
radic(1+α`+1)2 +4`(`+2)α`+1
2(1minusα`+1)
= `+1+O(`α(`+1)2
)and we have plotted it for different values ofα and` in Figure7
L
0405
0607
0809
α5 10 15 20 25
Length
5
10
15
20
25
30
5
10
15
20
25
30
35
40
05 06 07 08 09
L th
at m
inim
izes
the
diff
eren
ce o
fsu
m o
f wei
ghts
Exponent α
Max path length=25Max path length=20Max path length=15Max path length=10Max path length=5
Figure 7 BestL for minimizing the difference of the sum of weights betweenLinearRank and LinearRank for various path lengths
12
5 Parameters for the Damping Functions
For our experiments we used several snapshots from the Web including theuk it andeuint domainsFor comparison we also considered a synthetic scale-free network produced according to the evolving modeldescribed by Kumaret al [KRR+00] (a combination of preferential attachment and random links) with theparameters suggested by Panduranganet al [PRU02] As far as the latter is concerned in the generatedgraph the exponents for the power-law in the center part of the distributions are -21 for in-degree andPageRank and -27 for out-degree we generated a 100000-nodes graph without disconnected nodes
In this section we study the behavior of the ranking functions for varying values of their parameters
51 Characteristic path lengths
In scale-free networks the distances between pairs of nodes follow a Gaussian distribution [AJB99] (theaverage is not given in their paper) Analytic estimations for the average distance of a graph of scale-freenetwork ofn nodes include
bull O(log(n)) [WS98]
bull O(log(n) log(np)) in sparse graphs withp links [CL01]
bull 1+ log(nz1) log(z2z1) wherez1 is the average indegree andz2 is the average number of nodes atdistance 2 [NSW01] and
bull O(log(n) log(log(n))) [BR04]
We did the following experiment starting from a node picked at random we followed the links back-wards and counted the number of nodes at different distancesFigure8 plots the average distances foundwhich appear to be growing (sub)logarithmically with the size of the graphFigure9 shows the distributionobtained in each sample For this experiment we are not counting the pages without in-links
3456789
10111213141516
105 106 107 108
Ave
rage
dis
tanc
e
Number of nodes (N)
Figure 8 Average distances versus number of nodes
If graphs of different sizes show different path lengths what is the effect of this in the ranking calcu-lation Letrsquos suppose that for a graph withN1 nodes it is found by experimental or analytic means that agood parameter for PageRank isαlowast1 Now we would like to have a good parameterαlowast2 for a graph with thesame properties except that the size of the new graph isN2 lt N1
13
it Web graph uk Web graph40 mill nodes 18 mill nodes
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
Avg distance 149 Avg distance 148
euint Web graph Synthetic graph800000 nodes 100000 nodes
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
Avg distance 100 Avg distance 42
Figure 9 Distribution of the average number of nodes at a certain distance froma given node
One possible approach consistent with what we have done so far is to consider that the sum of theweights up to the average path lengths of the graphs (L1 L2) have to be similar for both rankings to behavein a similar way If we take this approach the solution is
1minus (αlowast1)L1+1 = 1minus (αlowast2)
L2+1
αlowast2 = (αlowast1)L1+1L2+1
αlowast2 asymp (αlowast1)log(N1)log(N2)
An example that can be used in practice is the following letrsquos consider a web graph withN1 = 115times109
pages (the size of the full Web estimated by [GS05]) and another graph with onlyN2 = 50times106 pages (thesize of the Web of a large country) the second graph is roughly 3 orders of magnitude smaller
If it is shown empirically thatαlowast1 = 085 is a good value for the PageRank parameter for the whole Webthenαlowast2 = 081 should have a similar behavior in the 50-million page set which is natural as the path lengthsare shorter If the subset of web pages were even smaller for instanceN2 = 106 pages (the size of the webof a large organization) thenαlowast2 = 076
14
52 Damping parameters and in-degree
In this section we are using data from theuk Web graph and a 8500-nodes synthetic graph with the sameindegree and outdegree distribution than the previous synthetic graph We first measured the variance ofthe values from the ranking function as we consider that a high variance is good in a ranking functionas the relative values differ more We also measured the relationship between the ranking function andin-degree for different values of the parameters in terms of covariance correlation coefficient and rankingorders (Kendallrsquosτ) The results are shown inFigure10
Variance Covariance Correlation coefficient Kendallrsquosτ
000e+00200e-13400e-13600e-13800e-13100e-12120e-12140e-12160e-12180e-12200e-12
01 02 03 04 05 06 07 08 09
Var
ianc
e of
Pag
eRan
k
Damping factor α
00 times 100
50 times 10-5
10 times 10-4
15 times 10-4
20 times 10-4
25 times 10-4
30 times 10-4
35 times 10-4
01 02 03 04 05 06 07 08 09
Cov
aria
nce
betw
een
Page
Ran
k an
d in
-deg
ree
Damping factor α
07000710072007300740075007600770078007900800
01 02 03 04 05 06 07 08 09
Corr
elat
ion
coef
ficie
nt w
ith in
-deg
ree
Damping factor α
042
044
046
048
050
052
054
056
01 02 03 04 05 06 07 08 09Ken
dallrsquo
s τ b
etw
een
Page
Rank
and
in-d
egre
e
Damping factor α
005
010
015
020
025
030
035
040
045
050
055
01 02 03 04 05 06 07 08 09
Var
ianc
e of
Pag
eRan
k
Damping factor α
0 times 100
1 times 10-3
2 times 10-3
3 times 10-3
4 times 10-3
5 times 10-3
6 times 10-3
01 02 03 04 05 06 07 08 09
Cov
aria
nce
betw
een
Page
Ran
k an
d in
-deg
ree
Damping factor α
0995
0996
0997
0998
0999
1000
01 02 03 04 05 06 07 08 09
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor α
079
080
081
082
083
084
085
086
087
01 02 03 04 05 06 07 08 09Ken
dallrsquo
s τ b
etw
een
Page
Rank
and
in-d
egre
e
Damping factor α
Figure 10 Variance of PageRank and its relationship with indegree in terms ofcovariance correlation coefficient and Kendallrsquosτ coefficient for varying valuesof the parameterα Top uk web graph bottom synthetic graph
The variance is higher asα increases In regard to the relationship with indegree for company homepages it has been shown that the logarithm of the in-degree is correlated with PageRank [UCH03] Ourresults are consistent with this observation The covariance monotonically increases withα both in the caseof the synthetic graph and in the web graph Not surprisingly using the generative model the correlationsare higher We observe a maximum correlation atα = 07 in the synthetic graph and atα = 05 in the webgraph We also notice that the correlation drops significativelly asα increases because a largeα means thatlonger paths have an effect in the calculation note however that this phenomenon does not significantlyimpact on the correlation coefficient that is still very large
A high correlation between PageRank and in-degree is bad from the point of view of a search enginebecause it makes link-spam easier In particular as the correlation coefficient is higher in theuk web graphnear 05 if we chooseα close to this value we are helping link spammers Note however that a highcorrelation was foreseeable because as shown in[CGS04] even approximating PageRank with just only 1level of links gets 70 of accuracy
The behavior of the Kendallrsquosτ coefficient which measures the similarity between ranking orders isthe opposite than the one observed in the real graph This also happens for HyperRank as inFigure11we made the same measurements for this functional ranking and the results were consistent The graphseems inverted because a low value ofβ has the same effect as a high value ofα longer paths have moreimportance in the calculation
The differences in the behavior of the ranking order in the synthetic graph might be explained by thefact that the generative model we are using does not capture some properties that might be relevant for theranking order under a functional ranking such as the clustering coefficient Also the synthetic graph is
15
Variance Covariance Correlation coefficient Kendallrsquosτ
000e+00
100e-12
200e-12
300e-12
400e-12
500e-12
600e-12
700e-12
800e-12
900e-12
1 15 2 25 3 35 4
Var
ianc
e of
Gen
eral
Hyp
erbo
lic R
ank
Damping factor β
000e+00
100e-04
200e-04
300e-04
400e-04
500e-04
600e-04
700e-04
800e-04
1 15 2 25 3 35 4Cov
aria
nce
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
0730
0740
0750
0760
0770
0780
0790
0800
1 15 2 25 3 35 4
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor β
04600
04700
04800
04900
05000
05100
05200
05300
1 15 2 25 3
Ken
dallrsquo
s τ
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
000e+00
200e-05
400e-05
600e-05
800e-05
100e-04
120e-04
140e-04
160e-04
1 15 2 25 3 35 4
Var
ianc
e of
Gen
eral
Hyp
erbo
lic R
ank
Damping factor β
000e+00
200e-03
400e-03
600e-03
800e-03
100e-02
120e-02
1 15 2 25 3 35 4Cov
aria
nce
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
0997
0997
0998
0998
0999
0999
1 15 2 25 3 35 4
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor β
0815008200082500830008350084000845008500085500860008650
1 15 2 25 3 35 4
Ken
dallrsquo
s τ b
etw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
Figure 11 Variance of general hyperbolic rank for some values ofβ and itsrelation with in-degree Experiments have been performed on theuk graph (top)and synthetic graph (bottom)
assortative (highly linked pages are mostly linked to other highly linked pages) while the real web graph isdisassortative (most of the neighbours of highly linked pages have small indegree)
53 Experimental comparison of ranking orders
In this section we present experimental results about the similarity between the ranking orders induced bysome of the functional rankings discussed in the previous sections To perform the comparison we usedKendallrsquosτ and data from the U K Web graphFigure12shows how PageRank compares with HyperRankfor various pairs ofα andβ In the limit αβrarr 1 both rankings are equivalent and they remain similar in alarge region of the parameter space
α
β
07
08
09
1
τ ge 095
τ
02 03
04 05
06 07
08 09
2 25
3 35
4
01
02
03
04
05
06
07
08
09
15 2 25 3 35 4
α th
at m
axim
izes
Ken
dallrsquo
s τ
Exponent β
Actual optimumPredicted optimum with length=5
Figure 12 Comparison (using Kendallrsquosτ) between PageRank and HyperRankwith various damping parameters in the UK web graph The optimum predictedin the analysis with = 5 is very close to the real one
In the figure we can see that the rankings obtained with HyperRank and PageRank can be almostequivalent (Kendallrsquosτ ge 095) moreover the analysis shown insection42 considering only paths of
16
lengths less than 5 provides a very good approximation for the optimum combination of parameters Thismeans that in fact the difference in the damping functions in the first few levels is crucial
The exponentsβ required for giving a good approximation of PageRank are very small whenα ge 07limiting the practical applicability of HyperRank as its convergence is not faster than the one of PageRank
As far as LinearRank and PageRank are concerned long paths and largeα should be considered toobtain a sufficiently similar ranking as shown inFigure13
05 06
07 08
09
5 10
15 20
25
080085090095100
τ
τ ge 095
αL
5
10
15
20
25
05 06 07 08 09L
that
max
imiz
es K
enda
llrsquos
τExponent α
Actual optimumPredicted optimum with length=5
Figure 13 Comparison (using Kendallrsquosτ) between PageRank and LinearRankin the UK web graph with various damping parameters Again the predictedoptimum with` = 5 is very close to the actual optimum
The predicted optimum given insection43 with ` = 5 this is considering only the summation of thedifferences between both damping functions up to paths of length 5 is very close to what was obtained inpractice Forα = 08 calculating LinearRank withL = 10 (which means the same number of iterations)givesτ ge 098 for α = 09 calculating LinearRank withL = 15 also givesτ ge 098 In both cases theranking order of PageRank is approximated by the ranking order of LinearRank with very high precision
6 Conclusions
In this paper we have defined a broad class of link-based ranking algorithms based on the contribution ofdamping factors along all different paths reaching a page We introduce four particular damping decays lin-ear exponential quadratic hyperbolic and general hyperbolic where exponential is equivalent to PageRankand quadratic hyperbolic to TotalRank
We studied the differences and similarities among these ranking algorithms and we have found that
bull Functional rankings using different damping functions can provide similar orderings if the parametersare chosen carefully
bull LinearRank can be used for calculating a ranking that is as good as PageRank but with a fixed andsmaller number of iterations
bull The parameters for the damping functions depend on the characteristic of path lengths in the graphwhich is known to grow sub-logarithmically on the size of the graph
More work is needed to find other damping functions that compute rankings similar to PageRank butare easier and faster to compute We use global ranking similarity but another measure could be the ranking
17
similarity in the top 20 results of real queries In this setting our results can change so future work willinclude this variation
Because of their high cost link-based ranking methods that involve iterative calculations at query timeare probably not used by large-scale search engines at this moment but the functional ranking with lineardamping we have presented can provide a good approximation with few iterations Also the approach wehave presented could be also applied to multivalued ranking functions such as HITS [Kle99] and topic-sensitive PageRank [Hav02] to obtain for instance a method for approximating the hubs and authorityscores using less iterations and a linear damping function
Our approach also helps to understand how easy or difficult is to collude many pages to modify theranking of a given page Clearly there are many different factors path lengths damping function branchingdegrees and number of colluded pages The graph structure of the collusion will affect those factors and weplan to analyze them
Acknowledgements
We would like to thank Daniel Fogaras for a valuable discussion about TotalRank that motivated part of thisresearch
References
[AJB99] Reka Albert Hawoong Jeong and Albert L Barabasi Diameter of the world wide webNature 401130ndash131 1999
[BH98] Krishna Bharat and Monika R HenzingerImproved algorithms for topic distillation in ahyperlinked environment In Proceedings of the 21st Annual International ACM SIGIR Con-ference on Research and Development in Information Retrieval pages 104ndash111 MelbourneAustralia August 1998 ACM Press New York
[BKM +00] Andrei Broder Ravi Kumar Farzin Maghoul Prabhakar Raghavan Sridhar RajagopalanRaymie Stata Andrew Tomkins and Janet WienerGraph structure in the web Experimentsand models In Proceedings of the Ninth Conference on World Wide Web pages 309ndash320Amsterdam Netherlands May 2000 ACM Press
[Bol05] Paolo BoldiTotalrank ranking without damping In Poster proceedings of the 14th interna-tional conference on World Wide Web pages 898ndash899 Chiba Japan 2005 ACM Press
[BR04] Bela Bollobas and Oliver RiordanThe diameter of a scale-free random graph Combinator-ica 24(1)5ndash34 2004
[Bri06] Michael Brinkmeier PageRank revisited ACM Transaction on Internet Technologies6(3)257ndash279 2006
[BSV05] Paolo Boldi Massimo Santini and Sebastiano VignaPagerank as a function of the dampingfactor In Proceedings of the 14th international conference on World Wide Web pages 557ndash566 Chiba Japan 2005 ACM Press
[BYD04] Ricardo Baeza-Yates and Emilio DavisWeb page ranking using link attributes In Alternatetrack papers amp posters of the 13th international conference on World Wide Web pages 328ndash329 New York NY USA 2004 ACM Press
18
[BYRN99] Ricardo Baeza-Yates and Berthier Ribeiro-NetoModern Information Retrieval AddisonWesley May 1999
[CGS04] Yen-Yu Chen Qingqing Gan and Torsten SuelLocal methods for estimating pagerank val-ues In Proceedings of the thirteenth ACM conference on Information and knowledge man-agement (CIKM) pages 381ndash389 New York NY USA 2004 ACM Press
[CL01] Fan Chung and Linyuan LuThe diameter of random sparse graphs Adv Appl Math26257ndash279 2001
[Dav00] Brian D DavisonTopical locality in the web In Proceedings of the 23rd annual internationalACM SIGIR conference on research and development in information retrieval pages 272ndash279Athens Greece 2000 ACM Press
[GG04] Gene H Golub and Chen GreifArnoldi-type algorithms for computing stationary distributionvectors with application to pagerank Technical Report SCCM-04-15 Stanford University2004
[GS05] Antonio Gulli and Alessio SignoriniThe indexable Web is more than 115 billion pages InPoster proceedings of the 14th international conference on World Wide Web pages 902ndash903Chiba Japan 2005 ACM Press
[Hav99] Taher HaveliwalaEfficient computation of pagerank Technical report Stanford University1999
[Hav02] Taher H HaveliwalaTopic-sensitive pagerank In Proceedings of the Eleventh World WideWeb Conference pages 517ndash526 Honolulu Hawaii USA May 2002 ACM Press
[HG98] Stephanie W Haas and Erika S GramsPage and link classifications connecting diverseresources In DL rsquo98 Proceedings of the third ACM conference on Digital libraries pages99ndash107 New York NY USA 1998 ACM Press
[HK03a] Taher Haveliwala and Sepandar KamvarThe condition number of the pagerank problemTechnical Report 36 Stanford University 2003
[HK03b] Taher Haveliwala and Sepandar KamvarThe second eigenvalue of the google matrix Tech-nical Report 20 Stanford University 2003
[JM98] W-K Joo and S H Myaeng Improving retrieval effectiveness with hyperlink informationIn Proceedings of International Workshop on Information Retrieval with Asian Languages(IRAL) Singapore October 1998
[KHMG03a] S Kamvar T Haveliwala C Manning and G GolubExploiting the block structure of theweb for computing pagerank 2003
[KHMG03b] Sepandar D Kamvar Taher H Haveliwala Christopher D Manning and Gene H GolubExtrapolation methods for accelerating pagerank computations In Proceedings of the twelfthinternational conference on World Wide Web pages 261ndash270 ACM Press 2003
[Kle99] Jon M KleinbergAuthoritative sources in a hyperlinked environment Journal of the ACM46(5)604ndash632 1999
19
[KRR+00] R Kumar P Raghavan S Rajagopalan D Sivakumar A Tomkins and E UpfalStochasticmodels for the web graph In Proceedings of the 41st Annual Symposium on Foundations ofComputer Science (FOCS) pages 57ndash65 Redondo Beach CA USA 2000 IEEE CS Press
[LGZ04] Chris P Lee Gene H Golub and Stefanos A ZeniosA fast two-stage algorithm for com-puting pagerank and its extensions Technical report Stanford University 2004
[Li98] Yanhong LiToward a qualitative search engine IEEE Internet Computing July 1998
[Lif00] Maxim LifantsevVoting model for ranking Web pages In Peter Graham and MuthucumaruMaheswaran editorsProceedings of the International Conference on Internet Computingpages 143ndash148 Las Vegas Nevada USA June 2000 CSREA Press
[Mar97] Massimo MarchioriThe quest for correct information of the Web hyper search engines InProc of the sixth international conference on the Web Santa Clara USA April 1997
[NSW01] M E Newman S H Strogatz and D J WattsRandom graphs with arbitrary degree distri-butions and their applicationsPhys Rev E Stat Nonlin Soft Matter Phys 64(2 Pt 2) August2001
[PBMW98] Lawrence Page Sergey Brin Rajeev Motwani and Terry WinogradThe PageRank citationranking bringing order to the Web Technical report Stanford Digital Library TechnologiesProject 1998
[PRU02] Gopal Pandurangan Prabhakara Raghavan and Eli UpfalUsing Pagerank to characterizeWeb structure In Proceedings of the 8th Annual International Computing and CombinatoricsConference (COCOON) volume 2387 ofLecture Notes in Computer Science pages 330ndash390Singapore August 2002 Springer
[SPM05] Padmini Srinivasan Gautam Pant and Filippo MenczerA general evaluation framework fortopical crawlers Information Retrieval 8(3)417ndash447 2005
[UCH03] Trystan Upstill Nick Craswell and David HawkingPredicting fame and fortune Page-rank or indegree In Proceedings of the Australasian Document Computing SymposiumADCS2003 pages 31ndash40 Canberra Australia December 2003
[WHAT04] Fang Wu Bernardo A Huberman Lada A Adamic and Joshua R TylerInformation flow insocial groups Physica A Statistical and Theoretical Physics 337(1-2)327ndash335 June 2004
[WS98] Duncan J Watts and Steven H StrogatzCollective dynamics of small-world networksNa-ture 393(6684)440ndash442 June 1998
20
- Introduction
- Functional Rankings
- Damping Functions
-
- Linear damping
- Exponential damping PageRank
- Quadratic hyperbolic damping TotalRank
- General hyperbolic damping HyperRank
- An empirical damping
-
- Comparing Damping Functions
-
- Approximating TotalRank with PageRank
- Approximating HyperRank with PageRank
- Approximating PageRank with LinearRank
-
- Parameters for the Damping Functions
-
- Characteristic path lengths
- Damping parameters and in-degree
- Experimental comparison of ranking orders
-
- Conclusions
-
02
03
04
05
06
07
1 2 3 4 5
Ave
rage
text
sim
ilari
ty
Link distance
Figure 3 Link distance vs average text similarity
4 Comparing Damping Functions
A comparison of the damping functions described in the previous section is shown inFigure4 of coursehyperbolic damping functions decay asymptotically more slowly than exponential damping but notice thatfor short paths the latter may dominate the former in many cases
000
005
010
015
020
025
1 2 3 4 5 6 7
Wei
ght
Length of the path (t)
PageRank α=075PageRank α=050
TotalRankHyperRank β=2HyperRank β=3
Linear damping L=10
Figure 4 Weights given by the different damping functions for some values ofαandβ
In this section we aim at analyzing how similar are these functional rankings and how could we useone of the damping functions to approximate another with a suitable choice of parameters
9
41 Approximating TotalRank with PageRank
It has been shown experimentally that the rank correlation (Kendallrsquosτ) between TotalRank and PageRankis maximal whenα asymp 07 [Bol05] the maximum value forτ is over 095 meaning that for that specificchoice ofα PageRank and TotalRank induce almost equivalent ranking orders
We now want to approach the same problem in an analytic fashion more precisely we aim at study-ing the difference between TotalRank and PageRank by calculating the difference between their respectivedamping functions
dampingTotalRank(t) =1
(t +1)(t +2)dampingPageRank(α)(t) = (1minusα)αt
As they are normalized both damping functions have the same summation over the entire range oftOur approach is to consider the summation of their differences up to a maximum length for a path As thetwo functions are decreasing the difference in the first levels makes most of the difference in the rankingsIf ` is the maximum path length we are interested in we aim at minimizing this sum
`
sumt=0
(1
(t +1)(t +2)minus (1minusα)αt
)= α`+1minus 1
`+2
The minimum absolute value is 0 and it is obtained whenα is equal to
αlowast(`) =1
`+1radic
`+2= 1minus log`
`+O
(log2`
`2
)
Figure5 showsαlowast(`) as a function of Recall that for the World-Wide Web graph the average lengthof a path between two nodes when a path exists has been estimated in about 16 [BKM+00] or 19 [AJB99]but clearly today is over 20 Now in the range of path lengths between 15 and 20 the value ofαlowast(`)parameters that minimizes the difference between the exponentially decaying weights of PageRank and thehyperbolically decaying weights of TotalRank is roughly 085
Difference Optimal value
0506
0708
09
α0 5 10 15 20 25
Length
-02
-01
0
01
02
03
04
045
050
055
060
065
070
075
080
085
090
0 5 10 15 20 25
α that
min
imiz
es th
e di
ffer
ence
of
sum
of w
eigh
ts w
ith T
otal
Ran
k
Maximum length of the paths considered
Figure 5 Bestαlowast(`) for minimizing the difference of the sum of weights betweenPageRank and TotalRank
10
42 Approximating HyperRank with PageRank
Now we want to approximate the weights of
s(β) =1
Nζ(β)
infin
sumt=0
1
(t +1)β Pt
using the weights of
r(α) =1minusα
N
infin
sumt=0
αtPt
and we proceed again by considering paths up to a certain length
`
sumt=0
(1
ζ(β)(t +1)β minus (1minusα)αt
)The minimum can be zero and it is attained at
αlowast(`β) =
radic1minus 1
ζ(β)
`
sumt=0
1
(t +1)β
Theα that minimizes the difference of weights for different values ofβ and of the maximum path lengths` is shown inFigure6 In the case ofβ = 2 for instance for path lengths up to 10 to 20 the bestα is between075 and 085
α
15202530β 5 10 15 20 25
Length
03
04
05
06
07
08
09
1
00
01
02
03
04
05
06
07
08
09
10
1 15 2 25 3 35 4
α th
at m
inim
izes
the
diff
eren
ce o
fsu
m o
f wei
ghts
Exponent β
Max path length=25Max path length=20Max path length=15Max path length=10
Max path length=5
Figure 6 Bestα for minimizing the difference of the sum of weights betweenPageRank and HyperRank with exponentβ for various path lengths
11
43 Approximating PageRank with LinearRank
For approximating the damping function of PageRank with the damping function of LinearRank we con-sider the summation of the differences up to a certain path length If`le L
`
sumt=0
((1minusα)αt minus 2(Lminus t)
L(L+1)
)And if ` gt L
Lminus1
sumt=0
((1minusα)αt minus 2(Lminus t)
L(L+1)
)+
`
sumt=L
(1minusα)αt
We will assume thatle L so the evaluation of the difference between the two rankings is done in an areain which both rankings have non-zero values TheL that minimizes the difference for a given combinationof α and` is
Llowast(α `) = `+2`α`+1 +α`+1 +1+
radic(1+α`+1)2 +4`(`+2)α`+1
2(1minusα`+1)
= `+1+O(`α(`+1)2
)and we have plotted it for different values ofα and` in Figure7
L
0405
0607
0809
α5 10 15 20 25
Length
5
10
15
20
25
30
5
10
15
20
25
30
35
40
05 06 07 08 09
L th
at m
inim
izes
the
diff
eren
ce o
fsu
m o
f wei
ghts
Exponent α
Max path length=25Max path length=20Max path length=15Max path length=10Max path length=5
Figure 7 BestL for minimizing the difference of the sum of weights betweenLinearRank and LinearRank for various path lengths
12
5 Parameters for the Damping Functions
For our experiments we used several snapshots from the Web including theuk it andeuint domainsFor comparison we also considered a synthetic scale-free network produced according to the evolving modeldescribed by Kumaret al [KRR+00] (a combination of preferential attachment and random links) with theparameters suggested by Panduranganet al [PRU02] As far as the latter is concerned in the generatedgraph the exponents for the power-law in the center part of the distributions are -21 for in-degree andPageRank and -27 for out-degree we generated a 100000-nodes graph without disconnected nodes
In this section we study the behavior of the ranking functions for varying values of their parameters
51 Characteristic path lengths
In scale-free networks the distances between pairs of nodes follow a Gaussian distribution [AJB99] (theaverage is not given in their paper) Analytic estimations for the average distance of a graph of scale-freenetwork ofn nodes include
bull O(log(n)) [WS98]
bull O(log(n) log(np)) in sparse graphs withp links [CL01]
bull 1+ log(nz1) log(z2z1) wherez1 is the average indegree andz2 is the average number of nodes atdistance 2 [NSW01] and
bull O(log(n) log(log(n))) [BR04]
We did the following experiment starting from a node picked at random we followed the links back-wards and counted the number of nodes at different distancesFigure8 plots the average distances foundwhich appear to be growing (sub)logarithmically with the size of the graphFigure9 shows the distributionobtained in each sample For this experiment we are not counting the pages without in-links
3456789
10111213141516
105 106 107 108
Ave
rage
dis
tanc
e
Number of nodes (N)
Figure 8 Average distances versus number of nodes
If graphs of different sizes show different path lengths what is the effect of this in the ranking calcu-lation Letrsquos suppose that for a graph withN1 nodes it is found by experimental or analytic means that agood parameter for PageRank isαlowast1 Now we would like to have a good parameterαlowast2 for a graph with thesame properties except that the size of the new graph isN2 lt N1
13
it Web graph uk Web graph40 mill nodes 18 mill nodes
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
Avg distance 149 Avg distance 148
euint Web graph Synthetic graph800000 nodes 100000 nodes
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
Avg distance 100 Avg distance 42
Figure 9 Distribution of the average number of nodes at a certain distance froma given node
One possible approach consistent with what we have done so far is to consider that the sum of theweights up to the average path lengths of the graphs (L1 L2) have to be similar for both rankings to behavein a similar way If we take this approach the solution is
1minus (αlowast1)L1+1 = 1minus (αlowast2)
L2+1
αlowast2 = (αlowast1)L1+1L2+1
αlowast2 asymp (αlowast1)log(N1)log(N2)
An example that can be used in practice is the following letrsquos consider a web graph withN1 = 115times109
pages (the size of the full Web estimated by [GS05]) and another graph with onlyN2 = 50times106 pages (thesize of the Web of a large country) the second graph is roughly 3 orders of magnitude smaller
If it is shown empirically thatαlowast1 = 085 is a good value for the PageRank parameter for the whole Webthenαlowast2 = 081 should have a similar behavior in the 50-million page set which is natural as the path lengthsare shorter If the subset of web pages were even smaller for instanceN2 = 106 pages (the size of the webof a large organization) thenαlowast2 = 076
14
52 Damping parameters and in-degree
In this section we are using data from theuk Web graph and a 8500-nodes synthetic graph with the sameindegree and outdegree distribution than the previous synthetic graph We first measured the variance ofthe values from the ranking function as we consider that a high variance is good in a ranking functionas the relative values differ more We also measured the relationship between the ranking function andin-degree for different values of the parameters in terms of covariance correlation coefficient and rankingorders (Kendallrsquosτ) The results are shown inFigure10
Variance Covariance Correlation coefficient Kendallrsquosτ
000e+00200e-13400e-13600e-13800e-13100e-12120e-12140e-12160e-12180e-12200e-12
01 02 03 04 05 06 07 08 09
Var
ianc
e of
Pag
eRan
k
Damping factor α
00 times 100
50 times 10-5
10 times 10-4
15 times 10-4
20 times 10-4
25 times 10-4
30 times 10-4
35 times 10-4
01 02 03 04 05 06 07 08 09
Cov
aria
nce
betw
een
Page
Ran
k an
d in
-deg
ree
Damping factor α
07000710072007300740075007600770078007900800
01 02 03 04 05 06 07 08 09
Corr
elat
ion
coef
ficie
nt w
ith in
-deg
ree
Damping factor α
042
044
046
048
050
052
054
056
01 02 03 04 05 06 07 08 09Ken
dallrsquo
s τ b
etw
een
Page
Rank
and
in-d
egre
e
Damping factor α
005
010
015
020
025
030
035
040
045
050
055
01 02 03 04 05 06 07 08 09
Var
ianc
e of
Pag
eRan
k
Damping factor α
0 times 100
1 times 10-3
2 times 10-3
3 times 10-3
4 times 10-3
5 times 10-3
6 times 10-3
01 02 03 04 05 06 07 08 09
Cov
aria
nce
betw
een
Page
Ran
k an
d in
-deg
ree
Damping factor α
0995
0996
0997
0998
0999
1000
01 02 03 04 05 06 07 08 09
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor α
079
080
081
082
083
084
085
086
087
01 02 03 04 05 06 07 08 09Ken
dallrsquo
s τ b
etw
een
Page
Rank
and
in-d
egre
e
Damping factor α
Figure 10 Variance of PageRank and its relationship with indegree in terms ofcovariance correlation coefficient and Kendallrsquosτ coefficient for varying valuesof the parameterα Top uk web graph bottom synthetic graph
The variance is higher asα increases In regard to the relationship with indegree for company homepages it has been shown that the logarithm of the in-degree is correlated with PageRank [UCH03] Ourresults are consistent with this observation The covariance monotonically increases withα both in the caseof the synthetic graph and in the web graph Not surprisingly using the generative model the correlationsare higher We observe a maximum correlation atα = 07 in the synthetic graph and atα = 05 in the webgraph We also notice that the correlation drops significativelly asα increases because a largeα means thatlonger paths have an effect in the calculation note however that this phenomenon does not significantlyimpact on the correlation coefficient that is still very large
A high correlation between PageRank and in-degree is bad from the point of view of a search enginebecause it makes link-spam easier In particular as the correlation coefficient is higher in theuk web graphnear 05 if we chooseα close to this value we are helping link spammers Note however that a highcorrelation was foreseeable because as shown in[CGS04] even approximating PageRank with just only 1level of links gets 70 of accuracy
The behavior of the Kendallrsquosτ coefficient which measures the similarity between ranking orders isthe opposite than the one observed in the real graph This also happens for HyperRank as inFigure11we made the same measurements for this functional ranking and the results were consistent The graphseems inverted because a low value ofβ has the same effect as a high value ofα longer paths have moreimportance in the calculation
The differences in the behavior of the ranking order in the synthetic graph might be explained by thefact that the generative model we are using does not capture some properties that might be relevant for theranking order under a functional ranking such as the clustering coefficient Also the synthetic graph is
15
Variance Covariance Correlation coefficient Kendallrsquosτ
000e+00
100e-12
200e-12
300e-12
400e-12
500e-12
600e-12
700e-12
800e-12
900e-12
1 15 2 25 3 35 4
Var
ianc
e of
Gen
eral
Hyp
erbo
lic R
ank
Damping factor β
000e+00
100e-04
200e-04
300e-04
400e-04
500e-04
600e-04
700e-04
800e-04
1 15 2 25 3 35 4Cov
aria
nce
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
0730
0740
0750
0760
0770
0780
0790
0800
1 15 2 25 3 35 4
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor β
04600
04700
04800
04900
05000
05100
05200
05300
1 15 2 25 3
Ken
dallrsquo
s τ
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
000e+00
200e-05
400e-05
600e-05
800e-05
100e-04
120e-04
140e-04
160e-04
1 15 2 25 3 35 4
Var
ianc
e of
Gen
eral
Hyp
erbo
lic R
ank
Damping factor β
000e+00
200e-03
400e-03
600e-03
800e-03
100e-02
120e-02
1 15 2 25 3 35 4Cov
aria
nce
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
0997
0997
0998
0998
0999
0999
1 15 2 25 3 35 4
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor β
0815008200082500830008350084000845008500085500860008650
1 15 2 25 3 35 4
Ken
dallrsquo
s τ b
etw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
Figure 11 Variance of general hyperbolic rank for some values ofβ and itsrelation with in-degree Experiments have been performed on theuk graph (top)and synthetic graph (bottom)
assortative (highly linked pages are mostly linked to other highly linked pages) while the real web graph isdisassortative (most of the neighbours of highly linked pages have small indegree)
53 Experimental comparison of ranking orders
In this section we present experimental results about the similarity between the ranking orders induced bysome of the functional rankings discussed in the previous sections To perform the comparison we usedKendallrsquosτ and data from the U K Web graphFigure12shows how PageRank compares with HyperRankfor various pairs ofα andβ In the limit αβrarr 1 both rankings are equivalent and they remain similar in alarge region of the parameter space
α
β
07
08
09
1
τ ge 095
τ
02 03
04 05
06 07
08 09
2 25
3 35
4
01
02
03
04
05
06
07
08
09
15 2 25 3 35 4
α th
at m
axim
izes
Ken
dallrsquo
s τ
Exponent β
Actual optimumPredicted optimum with length=5
Figure 12 Comparison (using Kendallrsquosτ) between PageRank and HyperRankwith various damping parameters in the UK web graph The optimum predictedin the analysis with = 5 is very close to the real one
In the figure we can see that the rankings obtained with HyperRank and PageRank can be almostequivalent (Kendallrsquosτ ge 095) moreover the analysis shown insection42 considering only paths of
16
lengths less than 5 provides a very good approximation for the optimum combination of parameters Thismeans that in fact the difference in the damping functions in the first few levels is crucial
The exponentsβ required for giving a good approximation of PageRank are very small whenα ge 07limiting the practical applicability of HyperRank as its convergence is not faster than the one of PageRank
As far as LinearRank and PageRank are concerned long paths and largeα should be considered toobtain a sufficiently similar ranking as shown inFigure13
05 06
07 08
09
5 10
15 20
25
080085090095100
τ
τ ge 095
αL
5
10
15
20
25
05 06 07 08 09L
that
max
imiz
es K
enda
llrsquos
τExponent α
Actual optimumPredicted optimum with length=5
Figure 13 Comparison (using Kendallrsquosτ) between PageRank and LinearRankin the UK web graph with various damping parameters Again the predictedoptimum with` = 5 is very close to the actual optimum
The predicted optimum given insection43 with ` = 5 this is considering only the summation of thedifferences between both damping functions up to paths of length 5 is very close to what was obtained inpractice Forα = 08 calculating LinearRank withL = 10 (which means the same number of iterations)givesτ ge 098 for α = 09 calculating LinearRank withL = 15 also givesτ ge 098 In both cases theranking order of PageRank is approximated by the ranking order of LinearRank with very high precision
6 Conclusions
In this paper we have defined a broad class of link-based ranking algorithms based on the contribution ofdamping factors along all different paths reaching a page We introduce four particular damping decays lin-ear exponential quadratic hyperbolic and general hyperbolic where exponential is equivalent to PageRankand quadratic hyperbolic to TotalRank
We studied the differences and similarities among these ranking algorithms and we have found that
bull Functional rankings using different damping functions can provide similar orderings if the parametersare chosen carefully
bull LinearRank can be used for calculating a ranking that is as good as PageRank but with a fixed andsmaller number of iterations
bull The parameters for the damping functions depend on the characteristic of path lengths in the graphwhich is known to grow sub-logarithmically on the size of the graph
More work is needed to find other damping functions that compute rankings similar to PageRank butare easier and faster to compute We use global ranking similarity but another measure could be the ranking
17
similarity in the top 20 results of real queries In this setting our results can change so future work willinclude this variation
Because of their high cost link-based ranking methods that involve iterative calculations at query timeare probably not used by large-scale search engines at this moment but the functional ranking with lineardamping we have presented can provide a good approximation with few iterations Also the approach wehave presented could be also applied to multivalued ranking functions such as HITS [Kle99] and topic-sensitive PageRank [Hav02] to obtain for instance a method for approximating the hubs and authorityscores using less iterations and a linear damping function
Our approach also helps to understand how easy or difficult is to collude many pages to modify theranking of a given page Clearly there are many different factors path lengths damping function branchingdegrees and number of colluded pages The graph structure of the collusion will affect those factors and weplan to analyze them
Acknowledgements
We would like to thank Daniel Fogaras for a valuable discussion about TotalRank that motivated part of thisresearch
References
[AJB99] Reka Albert Hawoong Jeong and Albert L Barabasi Diameter of the world wide webNature 401130ndash131 1999
[BH98] Krishna Bharat and Monika R HenzingerImproved algorithms for topic distillation in ahyperlinked environment In Proceedings of the 21st Annual International ACM SIGIR Con-ference on Research and Development in Information Retrieval pages 104ndash111 MelbourneAustralia August 1998 ACM Press New York
[BKM +00] Andrei Broder Ravi Kumar Farzin Maghoul Prabhakar Raghavan Sridhar RajagopalanRaymie Stata Andrew Tomkins and Janet WienerGraph structure in the web Experimentsand models In Proceedings of the Ninth Conference on World Wide Web pages 309ndash320Amsterdam Netherlands May 2000 ACM Press
[Bol05] Paolo BoldiTotalrank ranking without damping In Poster proceedings of the 14th interna-tional conference on World Wide Web pages 898ndash899 Chiba Japan 2005 ACM Press
[BR04] Bela Bollobas and Oliver RiordanThe diameter of a scale-free random graph Combinator-ica 24(1)5ndash34 2004
[Bri06] Michael Brinkmeier PageRank revisited ACM Transaction on Internet Technologies6(3)257ndash279 2006
[BSV05] Paolo Boldi Massimo Santini and Sebastiano VignaPagerank as a function of the dampingfactor In Proceedings of the 14th international conference on World Wide Web pages 557ndash566 Chiba Japan 2005 ACM Press
[BYD04] Ricardo Baeza-Yates and Emilio DavisWeb page ranking using link attributes In Alternatetrack papers amp posters of the 13th international conference on World Wide Web pages 328ndash329 New York NY USA 2004 ACM Press
18
[BYRN99] Ricardo Baeza-Yates and Berthier Ribeiro-NetoModern Information Retrieval AddisonWesley May 1999
[CGS04] Yen-Yu Chen Qingqing Gan and Torsten SuelLocal methods for estimating pagerank val-ues In Proceedings of the thirteenth ACM conference on Information and knowledge man-agement (CIKM) pages 381ndash389 New York NY USA 2004 ACM Press
[CL01] Fan Chung and Linyuan LuThe diameter of random sparse graphs Adv Appl Math26257ndash279 2001
[Dav00] Brian D DavisonTopical locality in the web In Proceedings of the 23rd annual internationalACM SIGIR conference on research and development in information retrieval pages 272ndash279Athens Greece 2000 ACM Press
[GG04] Gene H Golub and Chen GreifArnoldi-type algorithms for computing stationary distributionvectors with application to pagerank Technical Report SCCM-04-15 Stanford University2004
[GS05] Antonio Gulli and Alessio SignoriniThe indexable Web is more than 115 billion pages InPoster proceedings of the 14th international conference on World Wide Web pages 902ndash903Chiba Japan 2005 ACM Press
[Hav99] Taher HaveliwalaEfficient computation of pagerank Technical report Stanford University1999
[Hav02] Taher H HaveliwalaTopic-sensitive pagerank In Proceedings of the Eleventh World WideWeb Conference pages 517ndash526 Honolulu Hawaii USA May 2002 ACM Press
[HG98] Stephanie W Haas and Erika S GramsPage and link classifications connecting diverseresources In DL rsquo98 Proceedings of the third ACM conference on Digital libraries pages99ndash107 New York NY USA 1998 ACM Press
[HK03a] Taher Haveliwala and Sepandar KamvarThe condition number of the pagerank problemTechnical Report 36 Stanford University 2003
[HK03b] Taher Haveliwala and Sepandar KamvarThe second eigenvalue of the google matrix Tech-nical Report 20 Stanford University 2003
[JM98] W-K Joo and S H Myaeng Improving retrieval effectiveness with hyperlink informationIn Proceedings of International Workshop on Information Retrieval with Asian Languages(IRAL) Singapore October 1998
[KHMG03a] S Kamvar T Haveliwala C Manning and G GolubExploiting the block structure of theweb for computing pagerank 2003
[KHMG03b] Sepandar D Kamvar Taher H Haveliwala Christopher D Manning and Gene H GolubExtrapolation methods for accelerating pagerank computations In Proceedings of the twelfthinternational conference on World Wide Web pages 261ndash270 ACM Press 2003
[Kle99] Jon M KleinbergAuthoritative sources in a hyperlinked environment Journal of the ACM46(5)604ndash632 1999
19
[KRR+00] R Kumar P Raghavan S Rajagopalan D Sivakumar A Tomkins and E UpfalStochasticmodels for the web graph In Proceedings of the 41st Annual Symposium on Foundations ofComputer Science (FOCS) pages 57ndash65 Redondo Beach CA USA 2000 IEEE CS Press
[LGZ04] Chris P Lee Gene H Golub and Stefanos A ZeniosA fast two-stage algorithm for com-puting pagerank and its extensions Technical report Stanford University 2004
[Li98] Yanhong LiToward a qualitative search engine IEEE Internet Computing July 1998
[Lif00] Maxim LifantsevVoting model for ranking Web pages In Peter Graham and MuthucumaruMaheswaran editorsProceedings of the International Conference on Internet Computingpages 143ndash148 Las Vegas Nevada USA June 2000 CSREA Press
[Mar97] Massimo MarchioriThe quest for correct information of the Web hyper search engines InProc of the sixth international conference on the Web Santa Clara USA April 1997
[NSW01] M E Newman S H Strogatz and D J WattsRandom graphs with arbitrary degree distri-butions and their applicationsPhys Rev E Stat Nonlin Soft Matter Phys 64(2 Pt 2) August2001
[PBMW98] Lawrence Page Sergey Brin Rajeev Motwani and Terry WinogradThe PageRank citationranking bringing order to the Web Technical report Stanford Digital Library TechnologiesProject 1998
[PRU02] Gopal Pandurangan Prabhakara Raghavan and Eli UpfalUsing Pagerank to characterizeWeb structure In Proceedings of the 8th Annual International Computing and CombinatoricsConference (COCOON) volume 2387 ofLecture Notes in Computer Science pages 330ndash390Singapore August 2002 Springer
[SPM05] Padmini Srinivasan Gautam Pant and Filippo MenczerA general evaluation framework fortopical crawlers Information Retrieval 8(3)417ndash447 2005
[UCH03] Trystan Upstill Nick Craswell and David HawkingPredicting fame and fortune Page-rank or indegree In Proceedings of the Australasian Document Computing SymposiumADCS2003 pages 31ndash40 Canberra Australia December 2003
[WHAT04] Fang Wu Bernardo A Huberman Lada A Adamic and Joshua R TylerInformation flow insocial groups Physica A Statistical and Theoretical Physics 337(1-2)327ndash335 June 2004
[WS98] Duncan J Watts and Steven H StrogatzCollective dynamics of small-world networksNa-ture 393(6684)440ndash442 June 1998
20
- Introduction
- Functional Rankings
- Damping Functions
-
- Linear damping
- Exponential damping PageRank
- Quadratic hyperbolic damping TotalRank
- General hyperbolic damping HyperRank
- An empirical damping
-
- Comparing Damping Functions
-
- Approximating TotalRank with PageRank
- Approximating HyperRank with PageRank
- Approximating PageRank with LinearRank
-
- Parameters for the Damping Functions
-
- Characteristic path lengths
- Damping parameters and in-degree
- Experimental comparison of ranking orders
-
- Conclusions
-
41 Approximating TotalRank with PageRank
It has been shown experimentally that the rank correlation (Kendallrsquosτ) between TotalRank and PageRankis maximal whenα asymp 07 [Bol05] the maximum value forτ is over 095 meaning that for that specificchoice ofα PageRank and TotalRank induce almost equivalent ranking orders
We now want to approach the same problem in an analytic fashion more precisely we aim at study-ing the difference between TotalRank and PageRank by calculating the difference between their respectivedamping functions
dampingTotalRank(t) =1
(t +1)(t +2)dampingPageRank(α)(t) = (1minusα)αt
As they are normalized both damping functions have the same summation over the entire range oftOur approach is to consider the summation of their differences up to a maximum length for a path As thetwo functions are decreasing the difference in the first levels makes most of the difference in the rankingsIf ` is the maximum path length we are interested in we aim at minimizing this sum
`
sumt=0
(1
(t +1)(t +2)minus (1minusα)αt
)= α`+1minus 1
`+2
The minimum absolute value is 0 and it is obtained whenα is equal to
αlowast(`) =1
`+1radic
`+2= 1minus log`
`+O
(log2`
`2
)
Figure5 showsαlowast(`) as a function of Recall that for the World-Wide Web graph the average lengthof a path between two nodes when a path exists has been estimated in about 16 [BKM+00] or 19 [AJB99]but clearly today is over 20 Now in the range of path lengths between 15 and 20 the value ofαlowast(`)parameters that minimizes the difference between the exponentially decaying weights of PageRank and thehyperbolically decaying weights of TotalRank is roughly 085
Difference Optimal value
0506
0708
09
α0 5 10 15 20 25
Length
-02
-01
0
01
02
03
04
045
050
055
060
065
070
075
080
085
090
0 5 10 15 20 25
α that
min
imiz
es th
e di
ffer
ence
of
sum
of w
eigh
ts w
ith T
otal
Ran
k
Maximum length of the paths considered
Figure 5 Bestαlowast(`) for minimizing the difference of the sum of weights betweenPageRank and TotalRank
10
42 Approximating HyperRank with PageRank
Now we want to approximate the weights of
s(β) =1
Nζ(β)
infin
sumt=0
1
(t +1)β Pt
using the weights of
r(α) =1minusα
N
infin
sumt=0
αtPt
and we proceed again by considering paths up to a certain length
`
sumt=0
(1
ζ(β)(t +1)β minus (1minusα)αt
)The minimum can be zero and it is attained at
αlowast(`β) =
radic1minus 1
ζ(β)
`
sumt=0
1
(t +1)β
Theα that minimizes the difference of weights for different values ofβ and of the maximum path lengths` is shown inFigure6 In the case ofβ = 2 for instance for path lengths up to 10 to 20 the bestα is between075 and 085
α
15202530β 5 10 15 20 25
Length
03
04
05
06
07
08
09
1
00
01
02
03
04
05
06
07
08
09
10
1 15 2 25 3 35 4
α th
at m
inim
izes
the
diff
eren
ce o
fsu
m o
f wei
ghts
Exponent β
Max path length=25Max path length=20Max path length=15Max path length=10
Max path length=5
Figure 6 Bestα for minimizing the difference of the sum of weights betweenPageRank and HyperRank with exponentβ for various path lengths
11
43 Approximating PageRank with LinearRank
For approximating the damping function of PageRank with the damping function of LinearRank we con-sider the summation of the differences up to a certain path length If`le L
`
sumt=0
((1minusα)αt minus 2(Lminus t)
L(L+1)
)And if ` gt L
Lminus1
sumt=0
((1minusα)αt minus 2(Lminus t)
L(L+1)
)+
`
sumt=L
(1minusα)αt
We will assume thatle L so the evaluation of the difference between the two rankings is done in an areain which both rankings have non-zero values TheL that minimizes the difference for a given combinationof α and` is
Llowast(α `) = `+2`α`+1 +α`+1 +1+
radic(1+α`+1)2 +4`(`+2)α`+1
2(1minusα`+1)
= `+1+O(`α(`+1)2
)and we have plotted it for different values ofα and` in Figure7
L
0405
0607
0809
α5 10 15 20 25
Length
5
10
15
20
25
30
5
10
15
20
25
30
35
40
05 06 07 08 09
L th
at m
inim
izes
the
diff
eren
ce o
fsu
m o
f wei
ghts
Exponent α
Max path length=25Max path length=20Max path length=15Max path length=10Max path length=5
Figure 7 BestL for minimizing the difference of the sum of weights betweenLinearRank and LinearRank for various path lengths
12
5 Parameters for the Damping Functions
For our experiments we used several snapshots from the Web including theuk it andeuint domainsFor comparison we also considered a synthetic scale-free network produced according to the evolving modeldescribed by Kumaret al [KRR+00] (a combination of preferential attachment and random links) with theparameters suggested by Panduranganet al [PRU02] As far as the latter is concerned in the generatedgraph the exponents for the power-law in the center part of the distributions are -21 for in-degree andPageRank and -27 for out-degree we generated a 100000-nodes graph without disconnected nodes
In this section we study the behavior of the ranking functions for varying values of their parameters
51 Characteristic path lengths
In scale-free networks the distances between pairs of nodes follow a Gaussian distribution [AJB99] (theaverage is not given in their paper) Analytic estimations for the average distance of a graph of scale-freenetwork ofn nodes include
bull O(log(n)) [WS98]
bull O(log(n) log(np)) in sparse graphs withp links [CL01]
bull 1+ log(nz1) log(z2z1) wherez1 is the average indegree andz2 is the average number of nodes atdistance 2 [NSW01] and
bull O(log(n) log(log(n))) [BR04]
We did the following experiment starting from a node picked at random we followed the links back-wards and counted the number of nodes at different distancesFigure8 plots the average distances foundwhich appear to be growing (sub)logarithmically with the size of the graphFigure9 shows the distributionobtained in each sample For this experiment we are not counting the pages without in-links
3456789
10111213141516
105 106 107 108
Ave
rage
dis
tanc
e
Number of nodes (N)
Figure 8 Average distances versus number of nodes
If graphs of different sizes show different path lengths what is the effect of this in the ranking calcu-lation Letrsquos suppose that for a graph withN1 nodes it is found by experimental or analytic means that agood parameter for PageRank isαlowast1 Now we would like to have a good parameterαlowast2 for a graph with thesame properties except that the size of the new graph isN2 lt N1
13
it Web graph uk Web graph40 mill nodes 18 mill nodes
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
Avg distance 149 Avg distance 148
euint Web graph Synthetic graph800000 nodes 100000 nodes
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
Avg distance 100 Avg distance 42
Figure 9 Distribution of the average number of nodes at a certain distance froma given node
One possible approach consistent with what we have done so far is to consider that the sum of theweights up to the average path lengths of the graphs (L1 L2) have to be similar for both rankings to behavein a similar way If we take this approach the solution is
1minus (αlowast1)L1+1 = 1minus (αlowast2)
L2+1
αlowast2 = (αlowast1)L1+1L2+1
αlowast2 asymp (αlowast1)log(N1)log(N2)
An example that can be used in practice is the following letrsquos consider a web graph withN1 = 115times109
pages (the size of the full Web estimated by [GS05]) and another graph with onlyN2 = 50times106 pages (thesize of the Web of a large country) the second graph is roughly 3 orders of magnitude smaller
If it is shown empirically thatαlowast1 = 085 is a good value for the PageRank parameter for the whole Webthenαlowast2 = 081 should have a similar behavior in the 50-million page set which is natural as the path lengthsare shorter If the subset of web pages were even smaller for instanceN2 = 106 pages (the size of the webof a large organization) thenαlowast2 = 076
14
52 Damping parameters and in-degree
In this section we are using data from theuk Web graph and a 8500-nodes synthetic graph with the sameindegree and outdegree distribution than the previous synthetic graph We first measured the variance ofthe values from the ranking function as we consider that a high variance is good in a ranking functionas the relative values differ more We also measured the relationship between the ranking function andin-degree for different values of the parameters in terms of covariance correlation coefficient and rankingorders (Kendallrsquosτ) The results are shown inFigure10
Variance Covariance Correlation coefficient Kendallrsquosτ
000e+00200e-13400e-13600e-13800e-13100e-12120e-12140e-12160e-12180e-12200e-12
01 02 03 04 05 06 07 08 09
Var
ianc
e of
Pag
eRan
k
Damping factor α
00 times 100
50 times 10-5
10 times 10-4
15 times 10-4
20 times 10-4
25 times 10-4
30 times 10-4
35 times 10-4
01 02 03 04 05 06 07 08 09
Cov
aria
nce
betw
een
Page
Ran
k an
d in
-deg
ree
Damping factor α
07000710072007300740075007600770078007900800
01 02 03 04 05 06 07 08 09
Corr
elat
ion
coef
ficie
nt w
ith in
-deg
ree
Damping factor α
042
044
046
048
050
052
054
056
01 02 03 04 05 06 07 08 09Ken
dallrsquo
s τ b
etw
een
Page
Rank
and
in-d
egre
e
Damping factor α
005
010
015
020
025
030
035
040
045
050
055
01 02 03 04 05 06 07 08 09
Var
ianc
e of
Pag
eRan
k
Damping factor α
0 times 100
1 times 10-3
2 times 10-3
3 times 10-3
4 times 10-3
5 times 10-3
6 times 10-3
01 02 03 04 05 06 07 08 09
Cov
aria
nce
betw
een
Page
Ran
k an
d in
-deg
ree
Damping factor α
0995
0996
0997
0998
0999
1000
01 02 03 04 05 06 07 08 09
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor α
079
080
081
082
083
084
085
086
087
01 02 03 04 05 06 07 08 09Ken
dallrsquo
s τ b
etw
een
Page
Rank
and
in-d
egre
e
Damping factor α
Figure 10 Variance of PageRank and its relationship with indegree in terms ofcovariance correlation coefficient and Kendallrsquosτ coefficient for varying valuesof the parameterα Top uk web graph bottom synthetic graph
The variance is higher asα increases In regard to the relationship with indegree for company homepages it has been shown that the logarithm of the in-degree is correlated with PageRank [UCH03] Ourresults are consistent with this observation The covariance monotonically increases withα both in the caseof the synthetic graph and in the web graph Not surprisingly using the generative model the correlationsare higher We observe a maximum correlation atα = 07 in the synthetic graph and atα = 05 in the webgraph We also notice that the correlation drops significativelly asα increases because a largeα means thatlonger paths have an effect in the calculation note however that this phenomenon does not significantlyimpact on the correlation coefficient that is still very large
A high correlation between PageRank and in-degree is bad from the point of view of a search enginebecause it makes link-spam easier In particular as the correlation coefficient is higher in theuk web graphnear 05 if we chooseα close to this value we are helping link spammers Note however that a highcorrelation was foreseeable because as shown in[CGS04] even approximating PageRank with just only 1level of links gets 70 of accuracy
The behavior of the Kendallrsquosτ coefficient which measures the similarity between ranking orders isthe opposite than the one observed in the real graph This also happens for HyperRank as inFigure11we made the same measurements for this functional ranking and the results were consistent The graphseems inverted because a low value ofβ has the same effect as a high value ofα longer paths have moreimportance in the calculation
The differences in the behavior of the ranking order in the synthetic graph might be explained by thefact that the generative model we are using does not capture some properties that might be relevant for theranking order under a functional ranking such as the clustering coefficient Also the synthetic graph is
15
Variance Covariance Correlation coefficient Kendallrsquosτ
000e+00
100e-12
200e-12
300e-12
400e-12
500e-12
600e-12
700e-12
800e-12
900e-12
1 15 2 25 3 35 4
Var
ianc
e of
Gen
eral
Hyp
erbo
lic R
ank
Damping factor β
000e+00
100e-04
200e-04
300e-04
400e-04
500e-04
600e-04
700e-04
800e-04
1 15 2 25 3 35 4Cov
aria
nce
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
0730
0740
0750
0760
0770
0780
0790
0800
1 15 2 25 3 35 4
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor β
04600
04700
04800
04900
05000
05100
05200
05300
1 15 2 25 3
Ken
dallrsquo
s τ
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
000e+00
200e-05
400e-05
600e-05
800e-05
100e-04
120e-04
140e-04
160e-04
1 15 2 25 3 35 4
Var
ianc
e of
Gen
eral
Hyp
erbo
lic R
ank
Damping factor β
000e+00
200e-03
400e-03
600e-03
800e-03
100e-02
120e-02
1 15 2 25 3 35 4Cov
aria
nce
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
0997
0997
0998
0998
0999
0999
1 15 2 25 3 35 4
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor β
0815008200082500830008350084000845008500085500860008650
1 15 2 25 3 35 4
Ken
dallrsquo
s τ b
etw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
Figure 11 Variance of general hyperbolic rank for some values ofβ and itsrelation with in-degree Experiments have been performed on theuk graph (top)and synthetic graph (bottom)
assortative (highly linked pages are mostly linked to other highly linked pages) while the real web graph isdisassortative (most of the neighbours of highly linked pages have small indegree)
53 Experimental comparison of ranking orders
In this section we present experimental results about the similarity between the ranking orders induced bysome of the functional rankings discussed in the previous sections To perform the comparison we usedKendallrsquosτ and data from the U K Web graphFigure12shows how PageRank compares with HyperRankfor various pairs ofα andβ In the limit αβrarr 1 both rankings are equivalent and they remain similar in alarge region of the parameter space
α
β
07
08
09
1
τ ge 095
τ
02 03
04 05
06 07
08 09
2 25
3 35
4
01
02
03
04
05
06
07
08
09
15 2 25 3 35 4
α th
at m
axim
izes
Ken
dallrsquo
s τ
Exponent β
Actual optimumPredicted optimum with length=5
Figure 12 Comparison (using Kendallrsquosτ) between PageRank and HyperRankwith various damping parameters in the UK web graph The optimum predictedin the analysis with = 5 is very close to the real one
In the figure we can see that the rankings obtained with HyperRank and PageRank can be almostequivalent (Kendallrsquosτ ge 095) moreover the analysis shown insection42 considering only paths of
16
lengths less than 5 provides a very good approximation for the optimum combination of parameters Thismeans that in fact the difference in the damping functions in the first few levels is crucial
The exponentsβ required for giving a good approximation of PageRank are very small whenα ge 07limiting the practical applicability of HyperRank as its convergence is not faster than the one of PageRank
As far as LinearRank and PageRank are concerned long paths and largeα should be considered toobtain a sufficiently similar ranking as shown inFigure13
05 06
07 08
09
5 10
15 20
25
080085090095100
τ
τ ge 095
αL
5
10
15
20
25
05 06 07 08 09L
that
max
imiz
es K
enda
llrsquos
τExponent α
Actual optimumPredicted optimum with length=5
Figure 13 Comparison (using Kendallrsquosτ) between PageRank and LinearRankin the UK web graph with various damping parameters Again the predictedoptimum with` = 5 is very close to the actual optimum
The predicted optimum given insection43 with ` = 5 this is considering only the summation of thedifferences between both damping functions up to paths of length 5 is very close to what was obtained inpractice Forα = 08 calculating LinearRank withL = 10 (which means the same number of iterations)givesτ ge 098 for α = 09 calculating LinearRank withL = 15 also givesτ ge 098 In both cases theranking order of PageRank is approximated by the ranking order of LinearRank with very high precision
6 Conclusions
In this paper we have defined a broad class of link-based ranking algorithms based on the contribution ofdamping factors along all different paths reaching a page We introduce four particular damping decays lin-ear exponential quadratic hyperbolic and general hyperbolic where exponential is equivalent to PageRankand quadratic hyperbolic to TotalRank
We studied the differences and similarities among these ranking algorithms and we have found that
bull Functional rankings using different damping functions can provide similar orderings if the parametersare chosen carefully
bull LinearRank can be used for calculating a ranking that is as good as PageRank but with a fixed andsmaller number of iterations
bull The parameters for the damping functions depend on the characteristic of path lengths in the graphwhich is known to grow sub-logarithmically on the size of the graph
More work is needed to find other damping functions that compute rankings similar to PageRank butare easier and faster to compute We use global ranking similarity but another measure could be the ranking
17
similarity in the top 20 results of real queries In this setting our results can change so future work willinclude this variation
Because of their high cost link-based ranking methods that involve iterative calculations at query timeare probably not used by large-scale search engines at this moment but the functional ranking with lineardamping we have presented can provide a good approximation with few iterations Also the approach wehave presented could be also applied to multivalued ranking functions such as HITS [Kle99] and topic-sensitive PageRank [Hav02] to obtain for instance a method for approximating the hubs and authorityscores using less iterations and a linear damping function
Our approach also helps to understand how easy or difficult is to collude many pages to modify theranking of a given page Clearly there are many different factors path lengths damping function branchingdegrees and number of colluded pages The graph structure of the collusion will affect those factors and weplan to analyze them
Acknowledgements
We would like to thank Daniel Fogaras for a valuable discussion about TotalRank that motivated part of thisresearch
References
[AJB99] Reka Albert Hawoong Jeong and Albert L Barabasi Diameter of the world wide webNature 401130ndash131 1999
[BH98] Krishna Bharat and Monika R HenzingerImproved algorithms for topic distillation in ahyperlinked environment In Proceedings of the 21st Annual International ACM SIGIR Con-ference on Research and Development in Information Retrieval pages 104ndash111 MelbourneAustralia August 1998 ACM Press New York
[BKM +00] Andrei Broder Ravi Kumar Farzin Maghoul Prabhakar Raghavan Sridhar RajagopalanRaymie Stata Andrew Tomkins and Janet WienerGraph structure in the web Experimentsand models In Proceedings of the Ninth Conference on World Wide Web pages 309ndash320Amsterdam Netherlands May 2000 ACM Press
[Bol05] Paolo BoldiTotalrank ranking without damping In Poster proceedings of the 14th interna-tional conference on World Wide Web pages 898ndash899 Chiba Japan 2005 ACM Press
[BR04] Bela Bollobas and Oliver RiordanThe diameter of a scale-free random graph Combinator-ica 24(1)5ndash34 2004
[Bri06] Michael Brinkmeier PageRank revisited ACM Transaction on Internet Technologies6(3)257ndash279 2006
[BSV05] Paolo Boldi Massimo Santini and Sebastiano VignaPagerank as a function of the dampingfactor In Proceedings of the 14th international conference on World Wide Web pages 557ndash566 Chiba Japan 2005 ACM Press
[BYD04] Ricardo Baeza-Yates and Emilio DavisWeb page ranking using link attributes In Alternatetrack papers amp posters of the 13th international conference on World Wide Web pages 328ndash329 New York NY USA 2004 ACM Press
18
[BYRN99] Ricardo Baeza-Yates and Berthier Ribeiro-NetoModern Information Retrieval AddisonWesley May 1999
[CGS04] Yen-Yu Chen Qingqing Gan and Torsten SuelLocal methods for estimating pagerank val-ues In Proceedings of the thirteenth ACM conference on Information and knowledge man-agement (CIKM) pages 381ndash389 New York NY USA 2004 ACM Press
[CL01] Fan Chung and Linyuan LuThe diameter of random sparse graphs Adv Appl Math26257ndash279 2001
[Dav00] Brian D DavisonTopical locality in the web In Proceedings of the 23rd annual internationalACM SIGIR conference on research and development in information retrieval pages 272ndash279Athens Greece 2000 ACM Press
[GG04] Gene H Golub and Chen GreifArnoldi-type algorithms for computing stationary distributionvectors with application to pagerank Technical Report SCCM-04-15 Stanford University2004
[GS05] Antonio Gulli and Alessio SignoriniThe indexable Web is more than 115 billion pages InPoster proceedings of the 14th international conference on World Wide Web pages 902ndash903Chiba Japan 2005 ACM Press
[Hav99] Taher HaveliwalaEfficient computation of pagerank Technical report Stanford University1999
[Hav02] Taher H HaveliwalaTopic-sensitive pagerank In Proceedings of the Eleventh World WideWeb Conference pages 517ndash526 Honolulu Hawaii USA May 2002 ACM Press
[HG98] Stephanie W Haas and Erika S GramsPage and link classifications connecting diverseresources In DL rsquo98 Proceedings of the third ACM conference on Digital libraries pages99ndash107 New York NY USA 1998 ACM Press
[HK03a] Taher Haveliwala and Sepandar KamvarThe condition number of the pagerank problemTechnical Report 36 Stanford University 2003
[HK03b] Taher Haveliwala and Sepandar KamvarThe second eigenvalue of the google matrix Tech-nical Report 20 Stanford University 2003
[JM98] W-K Joo and S H Myaeng Improving retrieval effectiveness with hyperlink informationIn Proceedings of International Workshop on Information Retrieval with Asian Languages(IRAL) Singapore October 1998
[KHMG03a] S Kamvar T Haveliwala C Manning and G GolubExploiting the block structure of theweb for computing pagerank 2003
[KHMG03b] Sepandar D Kamvar Taher H Haveliwala Christopher D Manning and Gene H GolubExtrapolation methods for accelerating pagerank computations In Proceedings of the twelfthinternational conference on World Wide Web pages 261ndash270 ACM Press 2003
[Kle99] Jon M KleinbergAuthoritative sources in a hyperlinked environment Journal of the ACM46(5)604ndash632 1999
19
[KRR+00] R Kumar P Raghavan S Rajagopalan D Sivakumar A Tomkins and E UpfalStochasticmodels for the web graph In Proceedings of the 41st Annual Symposium on Foundations ofComputer Science (FOCS) pages 57ndash65 Redondo Beach CA USA 2000 IEEE CS Press
[LGZ04] Chris P Lee Gene H Golub and Stefanos A ZeniosA fast two-stage algorithm for com-puting pagerank and its extensions Technical report Stanford University 2004
[Li98] Yanhong LiToward a qualitative search engine IEEE Internet Computing July 1998
[Lif00] Maxim LifantsevVoting model for ranking Web pages In Peter Graham and MuthucumaruMaheswaran editorsProceedings of the International Conference on Internet Computingpages 143ndash148 Las Vegas Nevada USA June 2000 CSREA Press
[Mar97] Massimo MarchioriThe quest for correct information of the Web hyper search engines InProc of the sixth international conference on the Web Santa Clara USA April 1997
[NSW01] M E Newman S H Strogatz and D J WattsRandom graphs with arbitrary degree distri-butions and their applicationsPhys Rev E Stat Nonlin Soft Matter Phys 64(2 Pt 2) August2001
[PBMW98] Lawrence Page Sergey Brin Rajeev Motwani and Terry WinogradThe PageRank citationranking bringing order to the Web Technical report Stanford Digital Library TechnologiesProject 1998
[PRU02] Gopal Pandurangan Prabhakara Raghavan and Eli UpfalUsing Pagerank to characterizeWeb structure In Proceedings of the 8th Annual International Computing and CombinatoricsConference (COCOON) volume 2387 ofLecture Notes in Computer Science pages 330ndash390Singapore August 2002 Springer
[SPM05] Padmini Srinivasan Gautam Pant and Filippo MenczerA general evaluation framework fortopical crawlers Information Retrieval 8(3)417ndash447 2005
[UCH03] Trystan Upstill Nick Craswell and David HawkingPredicting fame and fortune Page-rank or indegree In Proceedings of the Australasian Document Computing SymposiumADCS2003 pages 31ndash40 Canberra Australia December 2003
[WHAT04] Fang Wu Bernardo A Huberman Lada A Adamic and Joshua R TylerInformation flow insocial groups Physica A Statistical and Theoretical Physics 337(1-2)327ndash335 June 2004
[WS98] Duncan J Watts and Steven H StrogatzCollective dynamics of small-world networksNa-ture 393(6684)440ndash442 June 1998
20
- Introduction
- Functional Rankings
- Damping Functions
-
- Linear damping
- Exponential damping PageRank
- Quadratic hyperbolic damping TotalRank
- General hyperbolic damping HyperRank
- An empirical damping
-
- Comparing Damping Functions
-
- Approximating TotalRank with PageRank
- Approximating HyperRank with PageRank
- Approximating PageRank with LinearRank
-
- Parameters for the Damping Functions
-
- Characteristic path lengths
- Damping parameters and in-degree
- Experimental comparison of ranking orders
-
- Conclusions
-
42 Approximating HyperRank with PageRank
Now we want to approximate the weights of
s(β) =1
Nζ(β)
infin
sumt=0
1
(t +1)β Pt
using the weights of
r(α) =1minusα
N
infin
sumt=0
αtPt
and we proceed again by considering paths up to a certain length
`
sumt=0
(1
ζ(β)(t +1)β minus (1minusα)αt
)The minimum can be zero and it is attained at
αlowast(`β) =
radic1minus 1
ζ(β)
`
sumt=0
1
(t +1)β
Theα that minimizes the difference of weights for different values ofβ and of the maximum path lengths` is shown inFigure6 In the case ofβ = 2 for instance for path lengths up to 10 to 20 the bestα is between075 and 085
α
15202530β 5 10 15 20 25
Length
03
04
05
06
07
08
09
1
00
01
02
03
04
05
06
07
08
09
10
1 15 2 25 3 35 4
α th
at m
inim
izes
the
diff
eren
ce o
fsu
m o
f wei
ghts
Exponent β
Max path length=25Max path length=20Max path length=15Max path length=10
Max path length=5
Figure 6 Bestα for minimizing the difference of the sum of weights betweenPageRank and HyperRank with exponentβ for various path lengths
11
43 Approximating PageRank with LinearRank
For approximating the damping function of PageRank with the damping function of LinearRank we con-sider the summation of the differences up to a certain path length If`le L
`
sumt=0
((1minusα)αt minus 2(Lminus t)
L(L+1)
)And if ` gt L
Lminus1
sumt=0
((1minusα)αt minus 2(Lminus t)
L(L+1)
)+
`
sumt=L
(1minusα)αt
We will assume thatle L so the evaluation of the difference between the two rankings is done in an areain which both rankings have non-zero values TheL that minimizes the difference for a given combinationof α and` is
Llowast(α `) = `+2`α`+1 +α`+1 +1+
radic(1+α`+1)2 +4`(`+2)α`+1
2(1minusα`+1)
= `+1+O(`α(`+1)2
)and we have plotted it for different values ofα and` in Figure7
L
0405
0607
0809
α5 10 15 20 25
Length
5
10
15
20
25
30
5
10
15
20
25
30
35
40
05 06 07 08 09
L th
at m
inim
izes
the
diff
eren
ce o
fsu
m o
f wei
ghts
Exponent α
Max path length=25Max path length=20Max path length=15Max path length=10Max path length=5
Figure 7 BestL for minimizing the difference of the sum of weights betweenLinearRank and LinearRank for various path lengths
12
5 Parameters for the Damping Functions
For our experiments we used several snapshots from the Web including theuk it andeuint domainsFor comparison we also considered a synthetic scale-free network produced according to the evolving modeldescribed by Kumaret al [KRR+00] (a combination of preferential attachment and random links) with theparameters suggested by Panduranganet al [PRU02] As far as the latter is concerned in the generatedgraph the exponents for the power-law in the center part of the distributions are -21 for in-degree andPageRank and -27 for out-degree we generated a 100000-nodes graph without disconnected nodes
In this section we study the behavior of the ranking functions for varying values of their parameters
51 Characteristic path lengths
In scale-free networks the distances between pairs of nodes follow a Gaussian distribution [AJB99] (theaverage is not given in their paper) Analytic estimations for the average distance of a graph of scale-freenetwork ofn nodes include
bull O(log(n)) [WS98]
bull O(log(n) log(np)) in sparse graphs withp links [CL01]
bull 1+ log(nz1) log(z2z1) wherez1 is the average indegree andz2 is the average number of nodes atdistance 2 [NSW01] and
bull O(log(n) log(log(n))) [BR04]
We did the following experiment starting from a node picked at random we followed the links back-wards and counted the number of nodes at different distancesFigure8 plots the average distances foundwhich appear to be growing (sub)logarithmically with the size of the graphFigure9 shows the distributionobtained in each sample For this experiment we are not counting the pages without in-links
3456789
10111213141516
105 106 107 108
Ave
rage
dis
tanc
e
Number of nodes (N)
Figure 8 Average distances versus number of nodes
If graphs of different sizes show different path lengths what is the effect of this in the ranking calcu-lation Letrsquos suppose that for a graph withN1 nodes it is found by experimental or analytic means that agood parameter for PageRank isαlowast1 Now we would like to have a good parameterαlowast2 for a graph with thesame properties except that the size of the new graph isN2 lt N1
13
it Web graph uk Web graph40 mill nodes 18 mill nodes
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
Avg distance 149 Avg distance 148
euint Web graph Synthetic graph800000 nodes 100000 nodes
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
Avg distance 100 Avg distance 42
Figure 9 Distribution of the average number of nodes at a certain distance froma given node
One possible approach consistent with what we have done so far is to consider that the sum of theweights up to the average path lengths of the graphs (L1 L2) have to be similar for both rankings to behavein a similar way If we take this approach the solution is
1minus (αlowast1)L1+1 = 1minus (αlowast2)
L2+1
αlowast2 = (αlowast1)L1+1L2+1
αlowast2 asymp (αlowast1)log(N1)log(N2)
An example that can be used in practice is the following letrsquos consider a web graph withN1 = 115times109
pages (the size of the full Web estimated by [GS05]) and another graph with onlyN2 = 50times106 pages (thesize of the Web of a large country) the second graph is roughly 3 orders of magnitude smaller
If it is shown empirically thatαlowast1 = 085 is a good value for the PageRank parameter for the whole Webthenαlowast2 = 081 should have a similar behavior in the 50-million page set which is natural as the path lengthsare shorter If the subset of web pages were even smaller for instanceN2 = 106 pages (the size of the webof a large organization) thenαlowast2 = 076
14
52 Damping parameters and in-degree
In this section we are using data from theuk Web graph and a 8500-nodes synthetic graph with the sameindegree and outdegree distribution than the previous synthetic graph We first measured the variance ofthe values from the ranking function as we consider that a high variance is good in a ranking functionas the relative values differ more We also measured the relationship between the ranking function andin-degree for different values of the parameters in terms of covariance correlation coefficient and rankingorders (Kendallrsquosτ) The results are shown inFigure10
Variance Covariance Correlation coefficient Kendallrsquosτ
000e+00200e-13400e-13600e-13800e-13100e-12120e-12140e-12160e-12180e-12200e-12
01 02 03 04 05 06 07 08 09
Var
ianc
e of
Pag
eRan
k
Damping factor α
00 times 100
50 times 10-5
10 times 10-4
15 times 10-4
20 times 10-4
25 times 10-4
30 times 10-4
35 times 10-4
01 02 03 04 05 06 07 08 09
Cov
aria
nce
betw
een
Page
Ran
k an
d in
-deg
ree
Damping factor α
07000710072007300740075007600770078007900800
01 02 03 04 05 06 07 08 09
Corr
elat
ion
coef
ficie
nt w
ith in
-deg
ree
Damping factor α
042
044
046
048
050
052
054
056
01 02 03 04 05 06 07 08 09Ken
dallrsquo
s τ b
etw
een
Page
Rank
and
in-d
egre
e
Damping factor α
005
010
015
020
025
030
035
040
045
050
055
01 02 03 04 05 06 07 08 09
Var
ianc
e of
Pag
eRan
k
Damping factor α
0 times 100
1 times 10-3
2 times 10-3
3 times 10-3
4 times 10-3
5 times 10-3
6 times 10-3
01 02 03 04 05 06 07 08 09
Cov
aria
nce
betw
een
Page
Ran
k an
d in
-deg
ree
Damping factor α
0995
0996
0997
0998
0999
1000
01 02 03 04 05 06 07 08 09
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor α
079
080
081
082
083
084
085
086
087
01 02 03 04 05 06 07 08 09Ken
dallrsquo
s τ b
etw
een
Page
Rank
and
in-d
egre
e
Damping factor α
Figure 10 Variance of PageRank and its relationship with indegree in terms ofcovariance correlation coefficient and Kendallrsquosτ coefficient for varying valuesof the parameterα Top uk web graph bottom synthetic graph
The variance is higher asα increases In regard to the relationship with indegree for company homepages it has been shown that the logarithm of the in-degree is correlated with PageRank [UCH03] Ourresults are consistent with this observation The covariance monotonically increases withα both in the caseof the synthetic graph and in the web graph Not surprisingly using the generative model the correlationsare higher We observe a maximum correlation atα = 07 in the synthetic graph and atα = 05 in the webgraph We also notice that the correlation drops significativelly asα increases because a largeα means thatlonger paths have an effect in the calculation note however that this phenomenon does not significantlyimpact on the correlation coefficient that is still very large
A high correlation between PageRank and in-degree is bad from the point of view of a search enginebecause it makes link-spam easier In particular as the correlation coefficient is higher in theuk web graphnear 05 if we chooseα close to this value we are helping link spammers Note however that a highcorrelation was foreseeable because as shown in[CGS04] even approximating PageRank with just only 1level of links gets 70 of accuracy
The behavior of the Kendallrsquosτ coefficient which measures the similarity between ranking orders isthe opposite than the one observed in the real graph This also happens for HyperRank as inFigure11we made the same measurements for this functional ranking and the results were consistent The graphseems inverted because a low value ofβ has the same effect as a high value ofα longer paths have moreimportance in the calculation
The differences in the behavior of the ranking order in the synthetic graph might be explained by thefact that the generative model we are using does not capture some properties that might be relevant for theranking order under a functional ranking such as the clustering coefficient Also the synthetic graph is
15
Variance Covariance Correlation coefficient Kendallrsquosτ
000e+00
100e-12
200e-12
300e-12
400e-12
500e-12
600e-12
700e-12
800e-12
900e-12
1 15 2 25 3 35 4
Var
ianc
e of
Gen
eral
Hyp
erbo
lic R
ank
Damping factor β
000e+00
100e-04
200e-04
300e-04
400e-04
500e-04
600e-04
700e-04
800e-04
1 15 2 25 3 35 4Cov
aria
nce
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
0730
0740
0750
0760
0770
0780
0790
0800
1 15 2 25 3 35 4
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor β
04600
04700
04800
04900
05000
05100
05200
05300
1 15 2 25 3
Ken
dallrsquo
s τ
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
000e+00
200e-05
400e-05
600e-05
800e-05
100e-04
120e-04
140e-04
160e-04
1 15 2 25 3 35 4
Var
ianc
e of
Gen
eral
Hyp
erbo
lic R
ank
Damping factor β
000e+00
200e-03
400e-03
600e-03
800e-03
100e-02
120e-02
1 15 2 25 3 35 4Cov
aria
nce
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
0997
0997
0998
0998
0999
0999
1 15 2 25 3 35 4
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor β
0815008200082500830008350084000845008500085500860008650
1 15 2 25 3 35 4
Ken
dallrsquo
s τ b
etw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
Figure 11 Variance of general hyperbolic rank for some values ofβ and itsrelation with in-degree Experiments have been performed on theuk graph (top)and synthetic graph (bottom)
assortative (highly linked pages are mostly linked to other highly linked pages) while the real web graph isdisassortative (most of the neighbours of highly linked pages have small indegree)
53 Experimental comparison of ranking orders
In this section we present experimental results about the similarity between the ranking orders induced bysome of the functional rankings discussed in the previous sections To perform the comparison we usedKendallrsquosτ and data from the U K Web graphFigure12shows how PageRank compares with HyperRankfor various pairs ofα andβ In the limit αβrarr 1 both rankings are equivalent and they remain similar in alarge region of the parameter space
α
β
07
08
09
1
τ ge 095
τ
02 03
04 05
06 07
08 09
2 25
3 35
4
01
02
03
04
05
06
07
08
09
15 2 25 3 35 4
α th
at m
axim
izes
Ken
dallrsquo
s τ
Exponent β
Actual optimumPredicted optimum with length=5
Figure 12 Comparison (using Kendallrsquosτ) between PageRank and HyperRankwith various damping parameters in the UK web graph The optimum predictedin the analysis with = 5 is very close to the real one
In the figure we can see that the rankings obtained with HyperRank and PageRank can be almostequivalent (Kendallrsquosτ ge 095) moreover the analysis shown insection42 considering only paths of
16
lengths less than 5 provides a very good approximation for the optimum combination of parameters Thismeans that in fact the difference in the damping functions in the first few levels is crucial
The exponentsβ required for giving a good approximation of PageRank are very small whenα ge 07limiting the practical applicability of HyperRank as its convergence is not faster than the one of PageRank
As far as LinearRank and PageRank are concerned long paths and largeα should be considered toobtain a sufficiently similar ranking as shown inFigure13
05 06
07 08
09
5 10
15 20
25
080085090095100
τ
τ ge 095
αL
5
10
15
20
25
05 06 07 08 09L
that
max
imiz
es K
enda
llrsquos
τExponent α
Actual optimumPredicted optimum with length=5
Figure 13 Comparison (using Kendallrsquosτ) between PageRank and LinearRankin the UK web graph with various damping parameters Again the predictedoptimum with` = 5 is very close to the actual optimum
The predicted optimum given insection43 with ` = 5 this is considering only the summation of thedifferences between both damping functions up to paths of length 5 is very close to what was obtained inpractice Forα = 08 calculating LinearRank withL = 10 (which means the same number of iterations)givesτ ge 098 for α = 09 calculating LinearRank withL = 15 also givesτ ge 098 In both cases theranking order of PageRank is approximated by the ranking order of LinearRank with very high precision
6 Conclusions
In this paper we have defined a broad class of link-based ranking algorithms based on the contribution ofdamping factors along all different paths reaching a page We introduce four particular damping decays lin-ear exponential quadratic hyperbolic and general hyperbolic where exponential is equivalent to PageRankand quadratic hyperbolic to TotalRank
We studied the differences and similarities among these ranking algorithms and we have found that
bull Functional rankings using different damping functions can provide similar orderings if the parametersare chosen carefully
bull LinearRank can be used for calculating a ranking that is as good as PageRank but with a fixed andsmaller number of iterations
bull The parameters for the damping functions depend on the characteristic of path lengths in the graphwhich is known to grow sub-logarithmically on the size of the graph
More work is needed to find other damping functions that compute rankings similar to PageRank butare easier and faster to compute We use global ranking similarity but another measure could be the ranking
17
similarity in the top 20 results of real queries In this setting our results can change so future work willinclude this variation
Because of their high cost link-based ranking methods that involve iterative calculations at query timeare probably not used by large-scale search engines at this moment but the functional ranking with lineardamping we have presented can provide a good approximation with few iterations Also the approach wehave presented could be also applied to multivalued ranking functions such as HITS [Kle99] and topic-sensitive PageRank [Hav02] to obtain for instance a method for approximating the hubs and authorityscores using less iterations and a linear damping function
Our approach also helps to understand how easy or difficult is to collude many pages to modify theranking of a given page Clearly there are many different factors path lengths damping function branchingdegrees and number of colluded pages The graph structure of the collusion will affect those factors and weplan to analyze them
Acknowledgements
We would like to thank Daniel Fogaras for a valuable discussion about TotalRank that motivated part of thisresearch
References
[AJB99] Reka Albert Hawoong Jeong and Albert L Barabasi Diameter of the world wide webNature 401130ndash131 1999
[BH98] Krishna Bharat and Monika R HenzingerImproved algorithms for topic distillation in ahyperlinked environment In Proceedings of the 21st Annual International ACM SIGIR Con-ference on Research and Development in Information Retrieval pages 104ndash111 MelbourneAustralia August 1998 ACM Press New York
[BKM +00] Andrei Broder Ravi Kumar Farzin Maghoul Prabhakar Raghavan Sridhar RajagopalanRaymie Stata Andrew Tomkins and Janet WienerGraph structure in the web Experimentsand models In Proceedings of the Ninth Conference on World Wide Web pages 309ndash320Amsterdam Netherlands May 2000 ACM Press
[Bol05] Paolo BoldiTotalrank ranking without damping In Poster proceedings of the 14th interna-tional conference on World Wide Web pages 898ndash899 Chiba Japan 2005 ACM Press
[BR04] Bela Bollobas and Oliver RiordanThe diameter of a scale-free random graph Combinator-ica 24(1)5ndash34 2004
[Bri06] Michael Brinkmeier PageRank revisited ACM Transaction on Internet Technologies6(3)257ndash279 2006
[BSV05] Paolo Boldi Massimo Santini and Sebastiano VignaPagerank as a function of the dampingfactor In Proceedings of the 14th international conference on World Wide Web pages 557ndash566 Chiba Japan 2005 ACM Press
[BYD04] Ricardo Baeza-Yates and Emilio DavisWeb page ranking using link attributes In Alternatetrack papers amp posters of the 13th international conference on World Wide Web pages 328ndash329 New York NY USA 2004 ACM Press
18
[BYRN99] Ricardo Baeza-Yates and Berthier Ribeiro-NetoModern Information Retrieval AddisonWesley May 1999
[CGS04] Yen-Yu Chen Qingqing Gan and Torsten SuelLocal methods for estimating pagerank val-ues In Proceedings of the thirteenth ACM conference on Information and knowledge man-agement (CIKM) pages 381ndash389 New York NY USA 2004 ACM Press
[CL01] Fan Chung and Linyuan LuThe diameter of random sparse graphs Adv Appl Math26257ndash279 2001
[Dav00] Brian D DavisonTopical locality in the web In Proceedings of the 23rd annual internationalACM SIGIR conference on research and development in information retrieval pages 272ndash279Athens Greece 2000 ACM Press
[GG04] Gene H Golub and Chen GreifArnoldi-type algorithms for computing stationary distributionvectors with application to pagerank Technical Report SCCM-04-15 Stanford University2004
[GS05] Antonio Gulli and Alessio SignoriniThe indexable Web is more than 115 billion pages InPoster proceedings of the 14th international conference on World Wide Web pages 902ndash903Chiba Japan 2005 ACM Press
[Hav99] Taher HaveliwalaEfficient computation of pagerank Technical report Stanford University1999
[Hav02] Taher H HaveliwalaTopic-sensitive pagerank In Proceedings of the Eleventh World WideWeb Conference pages 517ndash526 Honolulu Hawaii USA May 2002 ACM Press
[HG98] Stephanie W Haas and Erika S GramsPage and link classifications connecting diverseresources In DL rsquo98 Proceedings of the third ACM conference on Digital libraries pages99ndash107 New York NY USA 1998 ACM Press
[HK03a] Taher Haveliwala and Sepandar KamvarThe condition number of the pagerank problemTechnical Report 36 Stanford University 2003
[HK03b] Taher Haveliwala and Sepandar KamvarThe second eigenvalue of the google matrix Tech-nical Report 20 Stanford University 2003
[JM98] W-K Joo and S H Myaeng Improving retrieval effectiveness with hyperlink informationIn Proceedings of International Workshop on Information Retrieval with Asian Languages(IRAL) Singapore October 1998
[KHMG03a] S Kamvar T Haveliwala C Manning and G GolubExploiting the block structure of theweb for computing pagerank 2003
[KHMG03b] Sepandar D Kamvar Taher H Haveliwala Christopher D Manning and Gene H GolubExtrapolation methods for accelerating pagerank computations In Proceedings of the twelfthinternational conference on World Wide Web pages 261ndash270 ACM Press 2003
[Kle99] Jon M KleinbergAuthoritative sources in a hyperlinked environment Journal of the ACM46(5)604ndash632 1999
19
[KRR+00] R Kumar P Raghavan S Rajagopalan D Sivakumar A Tomkins and E UpfalStochasticmodels for the web graph In Proceedings of the 41st Annual Symposium on Foundations ofComputer Science (FOCS) pages 57ndash65 Redondo Beach CA USA 2000 IEEE CS Press
[LGZ04] Chris P Lee Gene H Golub and Stefanos A ZeniosA fast two-stage algorithm for com-puting pagerank and its extensions Technical report Stanford University 2004
[Li98] Yanhong LiToward a qualitative search engine IEEE Internet Computing July 1998
[Lif00] Maxim LifantsevVoting model for ranking Web pages In Peter Graham and MuthucumaruMaheswaran editorsProceedings of the International Conference on Internet Computingpages 143ndash148 Las Vegas Nevada USA June 2000 CSREA Press
[Mar97] Massimo MarchioriThe quest for correct information of the Web hyper search engines InProc of the sixth international conference on the Web Santa Clara USA April 1997
[NSW01] M E Newman S H Strogatz and D J WattsRandom graphs with arbitrary degree distri-butions and their applicationsPhys Rev E Stat Nonlin Soft Matter Phys 64(2 Pt 2) August2001
[PBMW98] Lawrence Page Sergey Brin Rajeev Motwani and Terry WinogradThe PageRank citationranking bringing order to the Web Technical report Stanford Digital Library TechnologiesProject 1998
[PRU02] Gopal Pandurangan Prabhakara Raghavan and Eli UpfalUsing Pagerank to characterizeWeb structure In Proceedings of the 8th Annual International Computing and CombinatoricsConference (COCOON) volume 2387 ofLecture Notes in Computer Science pages 330ndash390Singapore August 2002 Springer
[SPM05] Padmini Srinivasan Gautam Pant and Filippo MenczerA general evaluation framework fortopical crawlers Information Retrieval 8(3)417ndash447 2005
[UCH03] Trystan Upstill Nick Craswell and David HawkingPredicting fame and fortune Page-rank or indegree In Proceedings of the Australasian Document Computing SymposiumADCS2003 pages 31ndash40 Canberra Australia December 2003
[WHAT04] Fang Wu Bernardo A Huberman Lada A Adamic and Joshua R TylerInformation flow insocial groups Physica A Statistical and Theoretical Physics 337(1-2)327ndash335 June 2004
[WS98] Duncan J Watts and Steven H StrogatzCollective dynamics of small-world networksNa-ture 393(6684)440ndash442 June 1998
20
- Introduction
- Functional Rankings
- Damping Functions
-
- Linear damping
- Exponential damping PageRank
- Quadratic hyperbolic damping TotalRank
- General hyperbolic damping HyperRank
- An empirical damping
-
- Comparing Damping Functions
-
- Approximating TotalRank with PageRank
- Approximating HyperRank with PageRank
- Approximating PageRank with LinearRank
-
- Parameters for the Damping Functions
-
- Characteristic path lengths
- Damping parameters and in-degree
- Experimental comparison of ranking orders
-
- Conclusions
-
43 Approximating PageRank with LinearRank
For approximating the damping function of PageRank with the damping function of LinearRank we con-sider the summation of the differences up to a certain path length If`le L
`
sumt=0
((1minusα)αt minus 2(Lminus t)
L(L+1)
)And if ` gt L
Lminus1
sumt=0
((1minusα)αt minus 2(Lminus t)
L(L+1)
)+
`
sumt=L
(1minusα)αt
We will assume thatle L so the evaluation of the difference between the two rankings is done in an areain which both rankings have non-zero values TheL that minimizes the difference for a given combinationof α and` is
Llowast(α `) = `+2`α`+1 +α`+1 +1+
radic(1+α`+1)2 +4`(`+2)α`+1
2(1minusα`+1)
= `+1+O(`α(`+1)2
)and we have plotted it for different values ofα and` in Figure7
L
0405
0607
0809
α5 10 15 20 25
Length
5
10
15
20
25
30
5
10
15
20
25
30
35
40
05 06 07 08 09
L th
at m
inim
izes
the
diff
eren
ce o
fsu
m o
f wei
ghts
Exponent α
Max path length=25Max path length=20Max path length=15Max path length=10Max path length=5
Figure 7 BestL for minimizing the difference of the sum of weights betweenLinearRank and LinearRank for various path lengths
12
5 Parameters for the Damping Functions
For our experiments we used several snapshots from the Web including theuk it andeuint domainsFor comparison we also considered a synthetic scale-free network produced according to the evolving modeldescribed by Kumaret al [KRR+00] (a combination of preferential attachment and random links) with theparameters suggested by Panduranganet al [PRU02] As far as the latter is concerned in the generatedgraph the exponents for the power-law in the center part of the distributions are -21 for in-degree andPageRank and -27 for out-degree we generated a 100000-nodes graph without disconnected nodes
In this section we study the behavior of the ranking functions for varying values of their parameters
51 Characteristic path lengths
In scale-free networks the distances between pairs of nodes follow a Gaussian distribution [AJB99] (theaverage is not given in their paper) Analytic estimations for the average distance of a graph of scale-freenetwork ofn nodes include
bull O(log(n)) [WS98]
bull O(log(n) log(np)) in sparse graphs withp links [CL01]
bull 1+ log(nz1) log(z2z1) wherez1 is the average indegree andz2 is the average number of nodes atdistance 2 [NSW01] and
bull O(log(n) log(log(n))) [BR04]
We did the following experiment starting from a node picked at random we followed the links back-wards and counted the number of nodes at different distancesFigure8 plots the average distances foundwhich appear to be growing (sub)logarithmically with the size of the graphFigure9 shows the distributionobtained in each sample For this experiment we are not counting the pages without in-links
3456789
10111213141516
105 106 107 108
Ave
rage
dis
tanc
e
Number of nodes (N)
Figure 8 Average distances versus number of nodes
If graphs of different sizes show different path lengths what is the effect of this in the ranking calcu-lation Letrsquos suppose that for a graph withN1 nodes it is found by experimental or analytic means that agood parameter for PageRank isαlowast1 Now we would like to have a good parameterαlowast2 for a graph with thesame properties except that the size of the new graph isN2 lt N1
13
it Web graph uk Web graph40 mill nodes 18 mill nodes
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
Avg distance 149 Avg distance 148
euint Web graph Synthetic graph800000 nodes 100000 nodes
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
Avg distance 100 Avg distance 42
Figure 9 Distribution of the average number of nodes at a certain distance froma given node
One possible approach consistent with what we have done so far is to consider that the sum of theweights up to the average path lengths of the graphs (L1 L2) have to be similar for both rankings to behavein a similar way If we take this approach the solution is
1minus (αlowast1)L1+1 = 1minus (αlowast2)
L2+1
αlowast2 = (αlowast1)L1+1L2+1
αlowast2 asymp (αlowast1)log(N1)log(N2)
An example that can be used in practice is the following letrsquos consider a web graph withN1 = 115times109
pages (the size of the full Web estimated by [GS05]) and another graph with onlyN2 = 50times106 pages (thesize of the Web of a large country) the second graph is roughly 3 orders of magnitude smaller
If it is shown empirically thatαlowast1 = 085 is a good value for the PageRank parameter for the whole Webthenαlowast2 = 081 should have a similar behavior in the 50-million page set which is natural as the path lengthsare shorter If the subset of web pages were even smaller for instanceN2 = 106 pages (the size of the webof a large organization) thenαlowast2 = 076
14
52 Damping parameters and in-degree
In this section we are using data from theuk Web graph and a 8500-nodes synthetic graph with the sameindegree and outdegree distribution than the previous synthetic graph We first measured the variance ofthe values from the ranking function as we consider that a high variance is good in a ranking functionas the relative values differ more We also measured the relationship between the ranking function andin-degree for different values of the parameters in terms of covariance correlation coefficient and rankingorders (Kendallrsquosτ) The results are shown inFigure10
Variance Covariance Correlation coefficient Kendallrsquosτ
000e+00200e-13400e-13600e-13800e-13100e-12120e-12140e-12160e-12180e-12200e-12
01 02 03 04 05 06 07 08 09
Var
ianc
e of
Pag
eRan
k
Damping factor α
00 times 100
50 times 10-5
10 times 10-4
15 times 10-4
20 times 10-4
25 times 10-4
30 times 10-4
35 times 10-4
01 02 03 04 05 06 07 08 09
Cov
aria
nce
betw
een
Page
Ran
k an
d in
-deg
ree
Damping factor α
07000710072007300740075007600770078007900800
01 02 03 04 05 06 07 08 09
Corr
elat
ion
coef
ficie
nt w
ith in
-deg
ree
Damping factor α
042
044
046
048
050
052
054
056
01 02 03 04 05 06 07 08 09Ken
dallrsquo
s τ b
etw
een
Page
Rank
and
in-d
egre
e
Damping factor α
005
010
015
020
025
030
035
040
045
050
055
01 02 03 04 05 06 07 08 09
Var
ianc
e of
Pag
eRan
k
Damping factor α
0 times 100
1 times 10-3
2 times 10-3
3 times 10-3
4 times 10-3
5 times 10-3
6 times 10-3
01 02 03 04 05 06 07 08 09
Cov
aria
nce
betw
een
Page
Ran
k an
d in
-deg
ree
Damping factor α
0995
0996
0997
0998
0999
1000
01 02 03 04 05 06 07 08 09
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor α
079
080
081
082
083
084
085
086
087
01 02 03 04 05 06 07 08 09Ken
dallrsquo
s τ b
etw
een
Page
Rank
and
in-d
egre
e
Damping factor α
Figure 10 Variance of PageRank and its relationship with indegree in terms ofcovariance correlation coefficient and Kendallrsquosτ coefficient for varying valuesof the parameterα Top uk web graph bottom synthetic graph
The variance is higher asα increases In regard to the relationship with indegree for company homepages it has been shown that the logarithm of the in-degree is correlated with PageRank [UCH03] Ourresults are consistent with this observation The covariance monotonically increases withα both in the caseof the synthetic graph and in the web graph Not surprisingly using the generative model the correlationsare higher We observe a maximum correlation atα = 07 in the synthetic graph and atα = 05 in the webgraph We also notice that the correlation drops significativelly asα increases because a largeα means thatlonger paths have an effect in the calculation note however that this phenomenon does not significantlyimpact on the correlation coefficient that is still very large
A high correlation between PageRank and in-degree is bad from the point of view of a search enginebecause it makes link-spam easier In particular as the correlation coefficient is higher in theuk web graphnear 05 if we chooseα close to this value we are helping link spammers Note however that a highcorrelation was foreseeable because as shown in[CGS04] even approximating PageRank with just only 1level of links gets 70 of accuracy
The behavior of the Kendallrsquosτ coefficient which measures the similarity between ranking orders isthe opposite than the one observed in the real graph This also happens for HyperRank as inFigure11we made the same measurements for this functional ranking and the results were consistent The graphseems inverted because a low value ofβ has the same effect as a high value ofα longer paths have moreimportance in the calculation
The differences in the behavior of the ranking order in the synthetic graph might be explained by thefact that the generative model we are using does not capture some properties that might be relevant for theranking order under a functional ranking such as the clustering coefficient Also the synthetic graph is
15
Variance Covariance Correlation coefficient Kendallrsquosτ
000e+00
100e-12
200e-12
300e-12
400e-12
500e-12
600e-12
700e-12
800e-12
900e-12
1 15 2 25 3 35 4
Var
ianc
e of
Gen
eral
Hyp
erbo
lic R
ank
Damping factor β
000e+00
100e-04
200e-04
300e-04
400e-04
500e-04
600e-04
700e-04
800e-04
1 15 2 25 3 35 4Cov
aria
nce
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
0730
0740
0750
0760
0770
0780
0790
0800
1 15 2 25 3 35 4
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor β
04600
04700
04800
04900
05000
05100
05200
05300
1 15 2 25 3
Ken
dallrsquo
s τ
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
000e+00
200e-05
400e-05
600e-05
800e-05
100e-04
120e-04
140e-04
160e-04
1 15 2 25 3 35 4
Var
ianc
e of
Gen
eral
Hyp
erbo
lic R
ank
Damping factor β
000e+00
200e-03
400e-03
600e-03
800e-03
100e-02
120e-02
1 15 2 25 3 35 4Cov
aria
nce
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
0997
0997
0998
0998
0999
0999
1 15 2 25 3 35 4
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor β
0815008200082500830008350084000845008500085500860008650
1 15 2 25 3 35 4
Ken
dallrsquo
s τ b
etw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
Figure 11 Variance of general hyperbolic rank for some values ofβ and itsrelation with in-degree Experiments have been performed on theuk graph (top)and synthetic graph (bottom)
assortative (highly linked pages are mostly linked to other highly linked pages) while the real web graph isdisassortative (most of the neighbours of highly linked pages have small indegree)
53 Experimental comparison of ranking orders
In this section we present experimental results about the similarity between the ranking orders induced bysome of the functional rankings discussed in the previous sections To perform the comparison we usedKendallrsquosτ and data from the U K Web graphFigure12shows how PageRank compares with HyperRankfor various pairs ofα andβ In the limit αβrarr 1 both rankings are equivalent and they remain similar in alarge region of the parameter space
α
β
07
08
09
1
τ ge 095
τ
02 03
04 05
06 07
08 09
2 25
3 35
4
01
02
03
04
05
06
07
08
09
15 2 25 3 35 4
α th
at m
axim
izes
Ken
dallrsquo
s τ
Exponent β
Actual optimumPredicted optimum with length=5
Figure 12 Comparison (using Kendallrsquosτ) between PageRank and HyperRankwith various damping parameters in the UK web graph The optimum predictedin the analysis with = 5 is very close to the real one
In the figure we can see that the rankings obtained with HyperRank and PageRank can be almostequivalent (Kendallrsquosτ ge 095) moreover the analysis shown insection42 considering only paths of
16
lengths less than 5 provides a very good approximation for the optimum combination of parameters Thismeans that in fact the difference in the damping functions in the first few levels is crucial
The exponentsβ required for giving a good approximation of PageRank are very small whenα ge 07limiting the practical applicability of HyperRank as its convergence is not faster than the one of PageRank
As far as LinearRank and PageRank are concerned long paths and largeα should be considered toobtain a sufficiently similar ranking as shown inFigure13
05 06
07 08
09
5 10
15 20
25
080085090095100
τ
τ ge 095
αL
5
10
15
20
25
05 06 07 08 09L
that
max
imiz
es K
enda
llrsquos
τExponent α
Actual optimumPredicted optimum with length=5
Figure 13 Comparison (using Kendallrsquosτ) between PageRank and LinearRankin the UK web graph with various damping parameters Again the predictedoptimum with` = 5 is very close to the actual optimum
The predicted optimum given insection43 with ` = 5 this is considering only the summation of thedifferences between both damping functions up to paths of length 5 is very close to what was obtained inpractice Forα = 08 calculating LinearRank withL = 10 (which means the same number of iterations)givesτ ge 098 for α = 09 calculating LinearRank withL = 15 also givesτ ge 098 In both cases theranking order of PageRank is approximated by the ranking order of LinearRank with very high precision
6 Conclusions
In this paper we have defined a broad class of link-based ranking algorithms based on the contribution ofdamping factors along all different paths reaching a page We introduce four particular damping decays lin-ear exponential quadratic hyperbolic and general hyperbolic where exponential is equivalent to PageRankand quadratic hyperbolic to TotalRank
We studied the differences and similarities among these ranking algorithms and we have found that
bull Functional rankings using different damping functions can provide similar orderings if the parametersare chosen carefully
bull LinearRank can be used for calculating a ranking that is as good as PageRank but with a fixed andsmaller number of iterations
bull The parameters for the damping functions depend on the characteristic of path lengths in the graphwhich is known to grow sub-logarithmically on the size of the graph
More work is needed to find other damping functions that compute rankings similar to PageRank butare easier and faster to compute We use global ranking similarity but another measure could be the ranking
17
similarity in the top 20 results of real queries In this setting our results can change so future work willinclude this variation
Because of their high cost link-based ranking methods that involve iterative calculations at query timeare probably not used by large-scale search engines at this moment but the functional ranking with lineardamping we have presented can provide a good approximation with few iterations Also the approach wehave presented could be also applied to multivalued ranking functions such as HITS [Kle99] and topic-sensitive PageRank [Hav02] to obtain for instance a method for approximating the hubs and authorityscores using less iterations and a linear damping function
Our approach also helps to understand how easy or difficult is to collude many pages to modify theranking of a given page Clearly there are many different factors path lengths damping function branchingdegrees and number of colluded pages The graph structure of the collusion will affect those factors and weplan to analyze them
Acknowledgements
We would like to thank Daniel Fogaras for a valuable discussion about TotalRank that motivated part of thisresearch
References
[AJB99] Reka Albert Hawoong Jeong and Albert L Barabasi Diameter of the world wide webNature 401130ndash131 1999
[BH98] Krishna Bharat and Monika R HenzingerImproved algorithms for topic distillation in ahyperlinked environment In Proceedings of the 21st Annual International ACM SIGIR Con-ference on Research and Development in Information Retrieval pages 104ndash111 MelbourneAustralia August 1998 ACM Press New York
[BKM +00] Andrei Broder Ravi Kumar Farzin Maghoul Prabhakar Raghavan Sridhar RajagopalanRaymie Stata Andrew Tomkins and Janet WienerGraph structure in the web Experimentsand models In Proceedings of the Ninth Conference on World Wide Web pages 309ndash320Amsterdam Netherlands May 2000 ACM Press
[Bol05] Paolo BoldiTotalrank ranking without damping In Poster proceedings of the 14th interna-tional conference on World Wide Web pages 898ndash899 Chiba Japan 2005 ACM Press
[BR04] Bela Bollobas and Oliver RiordanThe diameter of a scale-free random graph Combinator-ica 24(1)5ndash34 2004
[Bri06] Michael Brinkmeier PageRank revisited ACM Transaction on Internet Technologies6(3)257ndash279 2006
[BSV05] Paolo Boldi Massimo Santini and Sebastiano VignaPagerank as a function of the dampingfactor In Proceedings of the 14th international conference on World Wide Web pages 557ndash566 Chiba Japan 2005 ACM Press
[BYD04] Ricardo Baeza-Yates and Emilio DavisWeb page ranking using link attributes In Alternatetrack papers amp posters of the 13th international conference on World Wide Web pages 328ndash329 New York NY USA 2004 ACM Press
18
[BYRN99] Ricardo Baeza-Yates and Berthier Ribeiro-NetoModern Information Retrieval AddisonWesley May 1999
[CGS04] Yen-Yu Chen Qingqing Gan and Torsten SuelLocal methods for estimating pagerank val-ues In Proceedings of the thirteenth ACM conference on Information and knowledge man-agement (CIKM) pages 381ndash389 New York NY USA 2004 ACM Press
[CL01] Fan Chung and Linyuan LuThe diameter of random sparse graphs Adv Appl Math26257ndash279 2001
[Dav00] Brian D DavisonTopical locality in the web In Proceedings of the 23rd annual internationalACM SIGIR conference on research and development in information retrieval pages 272ndash279Athens Greece 2000 ACM Press
[GG04] Gene H Golub and Chen GreifArnoldi-type algorithms for computing stationary distributionvectors with application to pagerank Technical Report SCCM-04-15 Stanford University2004
[GS05] Antonio Gulli and Alessio SignoriniThe indexable Web is more than 115 billion pages InPoster proceedings of the 14th international conference on World Wide Web pages 902ndash903Chiba Japan 2005 ACM Press
[Hav99] Taher HaveliwalaEfficient computation of pagerank Technical report Stanford University1999
[Hav02] Taher H HaveliwalaTopic-sensitive pagerank In Proceedings of the Eleventh World WideWeb Conference pages 517ndash526 Honolulu Hawaii USA May 2002 ACM Press
[HG98] Stephanie W Haas and Erika S GramsPage and link classifications connecting diverseresources In DL rsquo98 Proceedings of the third ACM conference on Digital libraries pages99ndash107 New York NY USA 1998 ACM Press
[HK03a] Taher Haveliwala and Sepandar KamvarThe condition number of the pagerank problemTechnical Report 36 Stanford University 2003
[HK03b] Taher Haveliwala and Sepandar KamvarThe second eigenvalue of the google matrix Tech-nical Report 20 Stanford University 2003
[JM98] W-K Joo and S H Myaeng Improving retrieval effectiveness with hyperlink informationIn Proceedings of International Workshop on Information Retrieval with Asian Languages(IRAL) Singapore October 1998
[KHMG03a] S Kamvar T Haveliwala C Manning and G GolubExploiting the block structure of theweb for computing pagerank 2003
[KHMG03b] Sepandar D Kamvar Taher H Haveliwala Christopher D Manning and Gene H GolubExtrapolation methods for accelerating pagerank computations In Proceedings of the twelfthinternational conference on World Wide Web pages 261ndash270 ACM Press 2003
[Kle99] Jon M KleinbergAuthoritative sources in a hyperlinked environment Journal of the ACM46(5)604ndash632 1999
19
[KRR+00] R Kumar P Raghavan S Rajagopalan D Sivakumar A Tomkins and E UpfalStochasticmodels for the web graph In Proceedings of the 41st Annual Symposium on Foundations ofComputer Science (FOCS) pages 57ndash65 Redondo Beach CA USA 2000 IEEE CS Press
[LGZ04] Chris P Lee Gene H Golub and Stefanos A ZeniosA fast two-stage algorithm for com-puting pagerank and its extensions Technical report Stanford University 2004
[Li98] Yanhong LiToward a qualitative search engine IEEE Internet Computing July 1998
[Lif00] Maxim LifantsevVoting model for ranking Web pages In Peter Graham and MuthucumaruMaheswaran editorsProceedings of the International Conference on Internet Computingpages 143ndash148 Las Vegas Nevada USA June 2000 CSREA Press
[Mar97] Massimo MarchioriThe quest for correct information of the Web hyper search engines InProc of the sixth international conference on the Web Santa Clara USA April 1997
[NSW01] M E Newman S H Strogatz and D J WattsRandom graphs with arbitrary degree distri-butions and their applicationsPhys Rev E Stat Nonlin Soft Matter Phys 64(2 Pt 2) August2001
[PBMW98] Lawrence Page Sergey Brin Rajeev Motwani and Terry WinogradThe PageRank citationranking bringing order to the Web Technical report Stanford Digital Library TechnologiesProject 1998
[PRU02] Gopal Pandurangan Prabhakara Raghavan and Eli UpfalUsing Pagerank to characterizeWeb structure In Proceedings of the 8th Annual International Computing and CombinatoricsConference (COCOON) volume 2387 ofLecture Notes in Computer Science pages 330ndash390Singapore August 2002 Springer
[SPM05] Padmini Srinivasan Gautam Pant and Filippo MenczerA general evaluation framework fortopical crawlers Information Retrieval 8(3)417ndash447 2005
[UCH03] Trystan Upstill Nick Craswell and David HawkingPredicting fame and fortune Page-rank or indegree In Proceedings of the Australasian Document Computing SymposiumADCS2003 pages 31ndash40 Canberra Australia December 2003
[WHAT04] Fang Wu Bernardo A Huberman Lada A Adamic and Joshua R TylerInformation flow insocial groups Physica A Statistical and Theoretical Physics 337(1-2)327ndash335 June 2004
[WS98] Duncan J Watts and Steven H StrogatzCollective dynamics of small-world networksNa-ture 393(6684)440ndash442 June 1998
20
- Introduction
- Functional Rankings
- Damping Functions
-
- Linear damping
- Exponential damping PageRank
- Quadratic hyperbolic damping TotalRank
- General hyperbolic damping HyperRank
- An empirical damping
-
- Comparing Damping Functions
-
- Approximating TotalRank with PageRank
- Approximating HyperRank with PageRank
- Approximating PageRank with LinearRank
-
- Parameters for the Damping Functions
-
- Characteristic path lengths
- Damping parameters and in-degree
- Experimental comparison of ranking orders
-
- Conclusions
-
5 Parameters for the Damping Functions
For our experiments we used several snapshots from the Web including theuk it andeuint domainsFor comparison we also considered a synthetic scale-free network produced according to the evolving modeldescribed by Kumaret al [KRR+00] (a combination of preferential attachment and random links) with theparameters suggested by Panduranganet al [PRU02] As far as the latter is concerned in the generatedgraph the exponents for the power-law in the center part of the distributions are -21 for in-degree andPageRank and -27 for out-degree we generated a 100000-nodes graph without disconnected nodes
In this section we study the behavior of the ranking functions for varying values of their parameters
51 Characteristic path lengths
In scale-free networks the distances between pairs of nodes follow a Gaussian distribution [AJB99] (theaverage is not given in their paper) Analytic estimations for the average distance of a graph of scale-freenetwork ofn nodes include
bull O(log(n)) [WS98]
bull O(log(n) log(np)) in sparse graphs withp links [CL01]
bull 1+ log(nz1) log(z2z1) wherez1 is the average indegree andz2 is the average number of nodes atdistance 2 [NSW01] and
bull O(log(n) log(log(n))) [BR04]
We did the following experiment starting from a node picked at random we followed the links back-wards and counted the number of nodes at different distancesFigure8 plots the average distances foundwhich appear to be growing (sub)logarithmically with the size of the graphFigure9 shows the distributionobtained in each sample For this experiment we are not counting the pages without in-links
3456789
10111213141516
105 106 107 108
Ave
rage
dis
tanc
e
Number of nodes (N)
Figure 8 Average distances versus number of nodes
If graphs of different sizes show different path lengths what is the effect of this in the ranking calcu-lation Letrsquos suppose that for a graph withN1 nodes it is found by experimental or analytic means that agood parameter for PageRank isαlowast1 Now we would like to have a good parameterαlowast2 for a graph with thesame properties except that the size of the new graph isN2 lt N1
13
it Web graph uk Web graph40 mill nodes 18 mill nodes
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
Avg distance 149 Avg distance 148
euint Web graph Synthetic graph800000 nodes 100000 nodes
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
Avg distance 100 Avg distance 42
Figure 9 Distribution of the average number of nodes at a certain distance froma given node
One possible approach consistent with what we have done so far is to consider that the sum of theweights up to the average path lengths of the graphs (L1 L2) have to be similar for both rankings to behavein a similar way If we take this approach the solution is
1minus (αlowast1)L1+1 = 1minus (αlowast2)
L2+1
αlowast2 = (αlowast1)L1+1L2+1
αlowast2 asymp (αlowast1)log(N1)log(N2)
An example that can be used in practice is the following letrsquos consider a web graph withN1 = 115times109
pages (the size of the full Web estimated by [GS05]) and another graph with onlyN2 = 50times106 pages (thesize of the Web of a large country) the second graph is roughly 3 orders of magnitude smaller
If it is shown empirically thatαlowast1 = 085 is a good value for the PageRank parameter for the whole Webthenαlowast2 = 081 should have a similar behavior in the 50-million page set which is natural as the path lengthsare shorter If the subset of web pages were even smaller for instanceN2 = 106 pages (the size of the webof a large organization) thenαlowast2 = 076
14
52 Damping parameters and in-degree
In this section we are using data from theuk Web graph and a 8500-nodes synthetic graph with the sameindegree and outdegree distribution than the previous synthetic graph We first measured the variance ofthe values from the ranking function as we consider that a high variance is good in a ranking functionas the relative values differ more We also measured the relationship between the ranking function andin-degree for different values of the parameters in terms of covariance correlation coefficient and rankingorders (Kendallrsquosτ) The results are shown inFigure10
Variance Covariance Correlation coefficient Kendallrsquosτ
000e+00200e-13400e-13600e-13800e-13100e-12120e-12140e-12160e-12180e-12200e-12
01 02 03 04 05 06 07 08 09
Var
ianc
e of
Pag
eRan
k
Damping factor α
00 times 100
50 times 10-5
10 times 10-4
15 times 10-4
20 times 10-4
25 times 10-4
30 times 10-4
35 times 10-4
01 02 03 04 05 06 07 08 09
Cov
aria
nce
betw
een
Page
Ran
k an
d in
-deg
ree
Damping factor α
07000710072007300740075007600770078007900800
01 02 03 04 05 06 07 08 09
Corr
elat
ion
coef
ficie
nt w
ith in
-deg
ree
Damping factor α
042
044
046
048
050
052
054
056
01 02 03 04 05 06 07 08 09Ken
dallrsquo
s τ b
etw
een
Page
Rank
and
in-d
egre
e
Damping factor α
005
010
015
020
025
030
035
040
045
050
055
01 02 03 04 05 06 07 08 09
Var
ianc
e of
Pag
eRan
k
Damping factor α
0 times 100
1 times 10-3
2 times 10-3
3 times 10-3
4 times 10-3
5 times 10-3
6 times 10-3
01 02 03 04 05 06 07 08 09
Cov
aria
nce
betw
een
Page
Ran
k an
d in
-deg
ree
Damping factor α
0995
0996
0997
0998
0999
1000
01 02 03 04 05 06 07 08 09
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor α
079
080
081
082
083
084
085
086
087
01 02 03 04 05 06 07 08 09Ken
dallrsquo
s τ b
etw
een
Page
Rank
and
in-d
egre
e
Damping factor α
Figure 10 Variance of PageRank and its relationship with indegree in terms ofcovariance correlation coefficient and Kendallrsquosτ coefficient for varying valuesof the parameterα Top uk web graph bottom synthetic graph
The variance is higher asα increases In regard to the relationship with indegree for company homepages it has been shown that the logarithm of the in-degree is correlated with PageRank [UCH03] Ourresults are consistent with this observation The covariance monotonically increases withα both in the caseof the synthetic graph and in the web graph Not surprisingly using the generative model the correlationsare higher We observe a maximum correlation atα = 07 in the synthetic graph and atα = 05 in the webgraph We also notice that the correlation drops significativelly asα increases because a largeα means thatlonger paths have an effect in the calculation note however that this phenomenon does not significantlyimpact on the correlation coefficient that is still very large
A high correlation between PageRank and in-degree is bad from the point of view of a search enginebecause it makes link-spam easier In particular as the correlation coefficient is higher in theuk web graphnear 05 if we chooseα close to this value we are helping link spammers Note however that a highcorrelation was foreseeable because as shown in[CGS04] even approximating PageRank with just only 1level of links gets 70 of accuracy
The behavior of the Kendallrsquosτ coefficient which measures the similarity between ranking orders isthe opposite than the one observed in the real graph This also happens for HyperRank as inFigure11we made the same measurements for this functional ranking and the results were consistent The graphseems inverted because a low value ofβ has the same effect as a high value ofα longer paths have moreimportance in the calculation
The differences in the behavior of the ranking order in the synthetic graph might be explained by thefact that the generative model we are using does not capture some properties that might be relevant for theranking order under a functional ranking such as the clustering coefficient Also the synthetic graph is
15
Variance Covariance Correlation coefficient Kendallrsquosτ
000e+00
100e-12
200e-12
300e-12
400e-12
500e-12
600e-12
700e-12
800e-12
900e-12
1 15 2 25 3 35 4
Var
ianc
e of
Gen
eral
Hyp
erbo
lic R
ank
Damping factor β
000e+00
100e-04
200e-04
300e-04
400e-04
500e-04
600e-04
700e-04
800e-04
1 15 2 25 3 35 4Cov
aria
nce
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
0730
0740
0750
0760
0770
0780
0790
0800
1 15 2 25 3 35 4
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor β
04600
04700
04800
04900
05000
05100
05200
05300
1 15 2 25 3
Ken
dallrsquo
s τ
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
000e+00
200e-05
400e-05
600e-05
800e-05
100e-04
120e-04
140e-04
160e-04
1 15 2 25 3 35 4
Var
ianc
e of
Gen
eral
Hyp
erbo
lic R
ank
Damping factor β
000e+00
200e-03
400e-03
600e-03
800e-03
100e-02
120e-02
1 15 2 25 3 35 4Cov
aria
nce
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
0997
0997
0998
0998
0999
0999
1 15 2 25 3 35 4
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor β
0815008200082500830008350084000845008500085500860008650
1 15 2 25 3 35 4
Ken
dallrsquo
s τ b
etw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
Figure 11 Variance of general hyperbolic rank for some values ofβ and itsrelation with in-degree Experiments have been performed on theuk graph (top)and synthetic graph (bottom)
assortative (highly linked pages are mostly linked to other highly linked pages) while the real web graph isdisassortative (most of the neighbours of highly linked pages have small indegree)
53 Experimental comparison of ranking orders
In this section we present experimental results about the similarity between the ranking orders induced bysome of the functional rankings discussed in the previous sections To perform the comparison we usedKendallrsquosτ and data from the U K Web graphFigure12shows how PageRank compares with HyperRankfor various pairs ofα andβ In the limit αβrarr 1 both rankings are equivalent and they remain similar in alarge region of the parameter space
α
β
07
08
09
1
τ ge 095
τ
02 03
04 05
06 07
08 09
2 25
3 35
4
01
02
03
04
05
06
07
08
09
15 2 25 3 35 4
α th
at m
axim
izes
Ken
dallrsquo
s τ
Exponent β
Actual optimumPredicted optimum with length=5
Figure 12 Comparison (using Kendallrsquosτ) between PageRank and HyperRankwith various damping parameters in the UK web graph The optimum predictedin the analysis with = 5 is very close to the real one
In the figure we can see that the rankings obtained with HyperRank and PageRank can be almostequivalent (Kendallrsquosτ ge 095) moreover the analysis shown insection42 considering only paths of
16
lengths less than 5 provides a very good approximation for the optimum combination of parameters Thismeans that in fact the difference in the damping functions in the first few levels is crucial
The exponentsβ required for giving a good approximation of PageRank are very small whenα ge 07limiting the practical applicability of HyperRank as its convergence is not faster than the one of PageRank
As far as LinearRank and PageRank are concerned long paths and largeα should be considered toobtain a sufficiently similar ranking as shown inFigure13
05 06
07 08
09
5 10
15 20
25
080085090095100
τ
τ ge 095
αL
5
10
15
20
25
05 06 07 08 09L
that
max
imiz
es K
enda
llrsquos
τExponent α
Actual optimumPredicted optimum with length=5
Figure 13 Comparison (using Kendallrsquosτ) between PageRank and LinearRankin the UK web graph with various damping parameters Again the predictedoptimum with` = 5 is very close to the actual optimum
The predicted optimum given insection43 with ` = 5 this is considering only the summation of thedifferences between both damping functions up to paths of length 5 is very close to what was obtained inpractice Forα = 08 calculating LinearRank withL = 10 (which means the same number of iterations)givesτ ge 098 for α = 09 calculating LinearRank withL = 15 also givesτ ge 098 In both cases theranking order of PageRank is approximated by the ranking order of LinearRank with very high precision
6 Conclusions
In this paper we have defined a broad class of link-based ranking algorithms based on the contribution ofdamping factors along all different paths reaching a page We introduce four particular damping decays lin-ear exponential quadratic hyperbolic and general hyperbolic where exponential is equivalent to PageRankand quadratic hyperbolic to TotalRank
We studied the differences and similarities among these ranking algorithms and we have found that
bull Functional rankings using different damping functions can provide similar orderings if the parametersare chosen carefully
bull LinearRank can be used for calculating a ranking that is as good as PageRank but with a fixed andsmaller number of iterations
bull The parameters for the damping functions depend on the characteristic of path lengths in the graphwhich is known to grow sub-logarithmically on the size of the graph
More work is needed to find other damping functions that compute rankings similar to PageRank butare easier and faster to compute We use global ranking similarity but another measure could be the ranking
17
similarity in the top 20 results of real queries In this setting our results can change so future work willinclude this variation
Because of their high cost link-based ranking methods that involve iterative calculations at query timeare probably not used by large-scale search engines at this moment but the functional ranking with lineardamping we have presented can provide a good approximation with few iterations Also the approach wehave presented could be also applied to multivalued ranking functions such as HITS [Kle99] and topic-sensitive PageRank [Hav02] to obtain for instance a method for approximating the hubs and authorityscores using less iterations and a linear damping function
Our approach also helps to understand how easy or difficult is to collude many pages to modify theranking of a given page Clearly there are many different factors path lengths damping function branchingdegrees and number of colluded pages The graph structure of the collusion will affect those factors and weplan to analyze them
Acknowledgements
We would like to thank Daniel Fogaras for a valuable discussion about TotalRank that motivated part of thisresearch
References
[AJB99] Reka Albert Hawoong Jeong and Albert L Barabasi Diameter of the world wide webNature 401130ndash131 1999
[BH98] Krishna Bharat and Monika R HenzingerImproved algorithms for topic distillation in ahyperlinked environment In Proceedings of the 21st Annual International ACM SIGIR Con-ference on Research and Development in Information Retrieval pages 104ndash111 MelbourneAustralia August 1998 ACM Press New York
[BKM +00] Andrei Broder Ravi Kumar Farzin Maghoul Prabhakar Raghavan Sridhar RajagopalanRaymie Stata Andrew Tomkins and Janet WienerGraph structure in the web Experimentsand models In Proceedings of the Ninth Conference on World Wide Web pages 309ndash320Amsterdam Netherlands May 2000 ACM Press
[Bol05] Paolo BoldiTotalrank ranking without damping In Poster proceedings of the 14th interna-tional conference on World Wide Web pages 898ndash899 Chiba Japan 2005 ACM Press
[BR04] Bela Bollobas and Oliver RiordanThe diameter of a scale-free random graph Combinator-ica 24(1)5ndash34 2004
[Bri06] Michael Brinkmeier PageRank revisited ACM Transaction on Internet Technologies6(3)257ndash279 2006
[BSV05] Paolo Boldi Massimo Santini and Sebastiano VignaPagerank as a function of the dampingfactor In Proceedings of the 14th international conference on World Wide Web pages 557ndash566 Chiba Japan 2005 ACM Press
[BYD04] Ricardo Baeza-Yates and Emilio DavisWeb page ranking using link attributes In Alternatetrack papers amp posters of the 13th international conference on World Wide Web pages 328ndash329 New York NY USA 2004 ACM Press
18
[BYRN99] Ricardo Baeza-Yates and Berthier Ribeiro-NetoModern Information Retrieval AddisonWesley May 1999
[CGS04] Yen-Yu Chen Qingqing Gan and Torsten SuelLocal methods for estimating pagerank val-ues In Proceedings of the thirteenth ACM conference on Information and knowledge man-agement (CIKM) pages 381ndash389 New York NY USA 2004 ACM Press
[CL01] Fan Chung and Linyuan LuThe diameter of random sparse graphs Adv Appl Math26257ndash279 2001
[Dav00] Brian D DavisonTopical locality in the web In Proceedings of the 23rd annual internationalACM SIGIR conference on research and development in information retrieval pages 272ndash279Athens Greece 2000 ACM Press
[GG04] Gene H Golub and Chen GreifArnoldi-type algorithms for computing stationary distributionvectors with application to pagerank Technical Report SCCM-04-15 Stanford University2004
[GS05] Antonio Gulli and Alessio SignoriniThe indexable Web is more than 115 billion pages InPoster proceedings of the 14th international conference on World Wide Web pages 902ndash903Chiba Japan 2005 ACM Press
[Hav99] Taher HaveliwalaEfficient computation of pagerank Technical report Stanford University1999
[Hav02] Taher H HaveliwalaTopic-sensitive pagerank In Proceedings of the Eleventh World WideWeb Conference pages 517ndash526 Honolulu Hawaii USA May 2002 ACM Press
[HG98] Stephanie W Haas and Erika S GramsPage and link classifications connecting diverseresources In DL rsquo98 Proceedings of the third ACM conference on Digital libraries pages99ndash107 New York NY USA 1998 ACM Press
[HK03a] Taher Haveliwala and Sepandar KamvarThe condition number of the pagerank problemTechnical Report 36 Stanford University 2003
[HK03b] Taher Haveliwala and Sepandar KamvarThe second eigenvalue of the google matrix Tech-nical Report 20 Stanford University 2003
[JM98] W-K Joo and S H Myaeng Improving retrieval effectiveness with hyperlink informationIn Proceedings of International Workshop on Information Retrieval with Asian Languages(IRAL) Singapore October 1998
[KHMG03a] S Kamvar T Haveliwala C Manning and G GolubExploiting the block structure of theweb for computing pagerank 2003
[KHMG03b] Sepandar D Kamvar Taher H Haveliwala Christopher D Manning and Gene H GolubExtrapolation methods for accelerating pagerank computations In Proceedings of the twelfthinternational conference on World Wide Web pages 261ndash270 ACM Press 2003
[Kle99] Jon M KleinbergAuthoritative sources in a hyperlinked environment Journal of the ACM46(5)604ndash632 1999
19
[KRR+00] R Kumar P Raghavan S Rajagopalan D Sivakumar A Tomkins and E UpfalStochasticmodels for the web graph In Proceedings of the 41st Annual Symposium on Foundations ofComputer Science (FOCS) pages 57ndash65 Redondo Beach CA USA 2000 IEEE CS Press
[LGZ04] Chris P Lee Gene H Golub and Stefanos A ZeniosA fast two-stage algorithm for com-puting pagerank and its extensions Technical report Stanford University 2004
[Li98] Yanhong LiToward a qualitative search engine IEEE Internet Computing July 1998
[Lif00] Maxim LifantsevVoting model for ranking Web pages In Peter Graham and MuthucumaruMaheswaran editorsProceedings of the International Conference on Internet Computingpages 143ndash148 Las Vegas Nevada USA June 2000 CSREA Press
[Mar97] Massimo MarchioriThe quest for correct information of the Web hyper search engines InProc of the sixth international conference on the Web Santa Clara USA April 1997
[NSW01] M E Newman S H Strogatz and D J WattsRandom graphs with arbitrary degree distri-butions and their applicationsPhys Rev E Stat Nonlin Soft Matter Phys 64(2 Pt 2) August2001
[PBMW98] Lawrence Page Sergey Brin Rajeev Motwani and Terry WinogradThe PageRank citationranking bringing order to the Web Technical report Stanford Digital Library TechnologiesProject 1998
[PRU02] Gopal Pandurangan Prabhakara Raghavan and Eli UpfalUsing Pagerank to characterizeWeb structure In Proceedings of the 8th Annual International Computing and CombinatoricsConference (COCOON) volume 2387 ofLecture Notes in Computer Science pages 330ndash390Singapore August 2002 Springer
[SPM05] Padmini Srinivasan Gautam Pant and Filippo MenczerA general evaluation framework fortopical crawlers Information Retrieval 8(3)417ndash447 2005
[UCH03] Trystan Upstill Nick Craswell and David HawkingPredicting fame and fortune Page-rank or indegree In Proceedings of the Australasian Document Computing SymposiumADCS2003 pages 31ndash40 Canberra Australia December 2003
[WHAT04] Fang Wu Bernardo A Huberman Lada A Adamic and Joshua R TylerInformation flow insocial groups Physica A Statistical and Theoretical Physics 337(1-2)327ndash335 June 2004
[WS98] Duncan J Watts and Steven H StrogatzCollective dynamics of small-world networksNa-ture 393(6684)440ndash442 June 1998
20
- Introduction
- Functional Rankings
- Damping Functions
-
- Linear damping
- Exponential damping PageRank
- Quadratic hyperbolic damping TotalRank
- General hyperbolic damping HyperRank
- An empirical damping
-
- Comparing Damping Functions
-
- Approximating TotalRank with PageRank
- Approximating HyperRank with PageRank
- Approximating PageRank with LinearRank
-
- Parameters for the Damping Functions
-
- Characteristic path lengths
- Damping parameters and in-degree
- Experimental comparison of ranking orders
-
- Conclusions
-
it Web graph uk Web graph40 mill nodes 18 mill nodes
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
Avg distance 149 Avg distance 148
euint Web graph Synthetic graph800000 nodes 100000 nodes
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
000
005
010
015
020
025
030
5 10 15 20 25 30
Freq
uenc
y
Distance
Avg distance 100 Avg distance 42
Figure 9 Distribution of the average number of nodes at a certain distance froma given node
One possible approach consistent with what we have done so far is to consider that the sum of theweights up to the average path lengths of the graphs (L1 L2) have to be similar for both rankings to behavein a similar way If we take this approach the solution is
1minus (αlowast1)L1+1 = 1minus (αlowast2)
L2+1
αlowast2 = (αlowast1)L1+1L2+1
αlowast2 asymp (αlowast1)log(N1)log(N2)
An example that can be used in practice is the following letrsquos consider a web graph withN1 = 115times109
pages (the size of the full Web estimated by [GS05]) and another graph with onlyN2 = 50times106 pages (thesize of the Web of a large country) the second graph is roughly 3 orders of magnitude smaller
If it is shown empirically thatαlowast1 = 085 is a good value for the PageRank parameter for the whole Webthenαlowast2 = 081 should have a similar behavior in the 50-million page set which is natural as the path lengthsare shorter If the subset of web pages were even smaller for instanceN2 = 106 pages (the size of the webof a large organization) thenαlowast2 = 076
14
52 Damping parameters and in-degree
In this section we are using data from theuk Web graph and a 8500-nodes synthetic graph with the sameindegree and outdegree distribution than the previous synthetic graph We first measured the variance ofthe values from the ranking function as we consider that a high variance is good in a ranking functionas the relative values differ more We also measured the relationship between the ranking function andin-degree for different values of the parameters in terms of covariance correlation coefficient and rankingorders (Kendallrsquosτ) The results are shown inFigure10
Variance Covariance Correlation coefficient Kendallrsquosτ
000e+00200e-13400e-13600e-13800e-13100e-12120e-12140e-12160e-12180e-12200e-12
01 02 03 04 05 06 07 08 09
Var
ianc
e of
Pag
eRan
k
Damping factor α
00 times 100
50 times 10-5
10 times 10-4
15 times 10-4
20 times 10-4
25 times 10-4
30 times 10-4
35 times 10-4
01 02 03 04 05 06 07 08 09
Cov
aria
nce
betw
een
Page
Ran
k an
d in
-deg
ree
Damping factor α
07000710072007300740075007600770078007900800
01 02 03 04 05 06 07 08 09
Corr
elat
ion
coef
ficie
nt w
ith in
-deg
ree
Damping factor α
042
044
046
048
050
052
054
056
01 02 03 04 05 06 07 08 09Ken
dallrsquo
s τ b
etw
een
Page
Rank
and
in-d
egre
e
Damping factor α
005
010
015
020
025
030
035
040
045
050
055
01 02 03 04 05 06 07 08 09
Var
ianc
e of
Pag
eRan
k
Damping factor α
0 times 100
1 times 10-3
2 times 10-3
3 times 10-3
4 times 10-3
5 times 10-3
6 times 10-3
01 02 03 04 05 06 07 08 09
Cov
aria
nce
betw
een
Page
Ran
k an
d in
-deg
ree
Damping factor α
0995
0996
0997
0998
0999
1000
01 02 03 04 05 06 07 08 09
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor α
079
080
081
082
083
084
085
086
087
01 02 03 04 05 06 07 08 09Ken
dallrsquo
s τ b
etw
een
Page
Rank
and
in-d
egre
e
Damping factor α
Figure 10 Variance of PageRank and its relationship with indegree in terms ofcovariance correlation coefficient and Kendallrsquosτ coefficient for varying valuesof the parameterα Top uk web graph bottom synthetic graph
The variance is higher asα increases In regard to the relationship with indegree for company homepages it has been shown that the logarithm of the in-degree is correlated with PageRank [UCH03] Ourresults are consistent with this observation The covariance monotonically increases withα both in the caseof the synthetic graph and in the web graph Not surprisingly using the generative model the correlationsare higher We observe a maximum correlation atα = 07 in the synthetic graph and atα = 05 in the webgraph We also notice that the correlation drops significativelly asα increases because a largeα means thatlonger paths have an effect in the calculation note however that this phenomenon does not significantlyimpact on the correlation coefficient that is still very large
A high correlation between PageRank and in-degree is bad from the point of view of a search enginebecause it makes link-spam easier In particular as the correlation coefficient is higher in theuk web graphnear 05 if we chooseα close to this value we are helping link spammers Note however that a highcorrelation was foreseeable because as shown in[CGS04] even approximating PageRank with just only 1level of links gets 70 of accuracy
The behavior of the Kendallrsquosτ coefficient which measures the similarity between ranking orders isthe opposite than the one observed in the real graph This also happens for HyperRank as inFigure11we made the same measurements for this functional ranking and the results were consistent The graphseems inverted because a low value ofβ has the same effect as a high value ofα longer paths have moreimportance in the calculation
The differences in the behavior of the ranking order in the synthetic graph might be explained by thefact that the generative model we are using does not capture some properties that might be relevant for theranking order under a functional ranking such as the clustering coefficient Also the synthetic graph is
15
Variance Covariance Correlation coefficient Kendallrsquosτ
000e+00
100e-12
200e-12
300e-12
400e-12
500e-12
600e-12
700e-12
800e-12
900e-12
1 15 2 25 3 35 4
Var
ianc
e of
Gen
eral
Hyp
erbo
lic R
ank
Damping factor β
000e+00
100e-04
200e-04
300e-04
400e-04
500e-04
600e-04
700e-04
800e-04
1 15 2 25 3 35 4Cov
aria
nce
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
0730
0740
0750
0760
0770
0780
0790
0800
1 15 2 25 3 35 4
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor β
04600
04700
04800
04900
05000
05100
05200
05300
1 15 2 25 3
Ken
dallrsquo
s τ
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
000e+00
200e-05
400e-05
600e-05
800e-05
100e-04
120e-04
140e-04
160e-04
1 15 2 25 3 35 4
Var
ianc
e of
Gen
eral
Hyp
erbo
lic R
ank
Damping factor β
000e+00
200e-03
400e-03
600e-03
800e-03
100e-02
120e-02
1 15 2 25 3 35 4Cov
aria
nce
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
0997
0997
0998
0998
0999
0999
1 15 2 25 3 35 4
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor β
0815008200082500830008350084000845008500085500860008650
1 15 2 25 3 35 4
Ken
dallrsquo
s τ b
etw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
Figure 11 Variance of general hyperbolic rank for some values ofβ and itsrelation with in-degree Experiments have been performed on theuk graph (top)and synthetic graph (bottom)
assortative (highly linked pages are mostly linked to other highly linked pages) while the real web graph isdisassortative (most of the neighbours of highly linked pages have small indegree)
53 Experimental comparison of ranking orders
In this section we present experimental results about the similarity between the ranking orders induced bysome of the functional rankings discussed in the previous sections To perform the comparison we usedKendallrsquosτ and data from the U K Web graphFigure12shows how PageRank compares with HyperRankfor various pairs ofα andβ In the limit αβrarr 1 both rankings are equivalent and they remain similar in alarge region of the parameter space
α
β
07
08
09
1
τ ge 095
τ
02 03
04 05
06 07
08 09
2 25
3 35
4
01
02
03
04
05
06
07
08
09
15 2 25 3 35 4
α th
at m
axim
izes
Ken
dallrsquo
s τ
Exponent β
Actual optimumPredicted optimum with length=5
Figure 12 Comparison (using Kendallrsquosτ) between PageRank and HyperRankwith various damping parameters in the UK web graph The optimum predictedin the analysis with = 5 is very close to the real one
In the figure we can see that the rankings obtained with HyperRank and PageRank can be almostequivalent (Kendallrsquosτ ge 095) moreover the analysis shown insection42 considering only paths of
16
lengths less than 5 provides a very good approximation for the optimum combination of parameters Thismeans that in fact the difference in the damping functions in the first few levels is crucial
The exponentsβ required for giving a good approximation of PageRank are very small whenα ge 07limiting the practical applicability of HyperRank as its convergence is not faster than the one of PageRank
As far as LinearRank and PageRank are concerned long paths and largeα should be considered toobtain a sufficiently similar ranking as shown inFigure13
05 06
07 08
09
5 10
15 20
25
080085090095100
τ
τ ge 095
αL
5
10
15
20
25
05 06 07 08 09L
that
max
imiz
es K
enda
llrsquos
τExponent α
Actual optimumPredicted optimum with length=5
Figure 13 Comparison (using Kendallrsquosτ) between PageRank and LinearRankin the UK web graph with various damping parameters Again the predictedoptimum with` = 5 is very close to the actual optimum
The predicted optimum given insection43 with ` = 5 this is considering only the summation of thedifferences between both damping functions up to paths of length 5 is very close to what was obtained inpractice Forα = 08 calculating LinearRank withL = 10 (which means the same number of iterations)givesτ ge 098 for α = 09 calculating LinearRank withL = 15 also givesτ ge 098 In both cases theranking order of PageRank is approximated by the ranking order of LinearRank with very high precision
6 Conclusions
In this paper we have defined a broad class of link-based ranking algorithms based on the contribution ofdamping factors along all different paths reaching a page We introduce four particular damping decays lin-ear exponential quadratic hyperbolic and general hyperbolic where exponential is equivalent to PageRankand quadratic hyperbolic to TotalRank
We studied the differences and similarities among these ranking algorithms and we have found that
bull Functional rankings using different damping functions can provide similar orderings if the parametersare chosen carefully
bull LinearRank can be used for calculating a ranking that is as good as PageRank but with a fixed andsmaller number of iterations
bull The parameters for the damping functions depend on the characteristic of path lengths in the graphwhich is known to grow sub-logarithmically on the size of the graph
More work is needed to find other damping functions that compute rankings similar to PageRank butare easier and faster to compute We use global ranking similarity but another measure could be the ranking
17
similarity in the top 20 results of real queries In this setting our results can change so future work willinclude this variation
Because of their high cost link-based ranking methods that involve iterative calculations at query timeare probably not used by large-scale search engines at this moment but the functional ranking with lineardamping we have presented can provide a good approximation with few iterations Also the approach wehave presented could be also applied to multivalued ranking functions such as HITS [Kle99] and topic-sensitive PageRank [Hav02] to obtain for instance a method for approximating the hubs and authorityscores using less iterations and a linear damping function
Our approach also helps to understand how easy or difficult is to collude many pages to modify theranking of a given page Clearly there are many different factors path lengths damping function branchingdegrees and number of colluded pages The graph structure of the collusion will affect those factors and weplan to analyze them
Acknowledgements
We would like to thank Daniel Fogaras for a valuable discussion about TotalRank that motivated part of thisresearch
References
[AJB99] Reka Albert Hawoong Jeong and Albert L Barabasi Diameter of the world wide webNature 401130ndash131 1999
[BH98] Krishna Bharat and Monika R HenzingerImproved algorithms for topic distillation in ahyperlinked environment In Proceedings of the 21st Annual International ACM SIGIR Con-ference on Research and Development in Information Retrieval pages 104ndash111 MelbourneAustralia August 1998 ACM Press New York
[BKM +00] Andrei Broder Ravi Kumar Farzin Maghoul Prabhakar Raghavan Sridhar RajagopalanRaymie Stata Andrew Tomkins and Janet WienerGraph structure in the web Experimentsand models In Proceedings of the Ninth Conference on World Wide Web pages 309ndash320Amsterdam Netherlands May 2000 ACM Press
[Bol05] Paolo BoldiTotalrank ranking without damping In Poster proceedings of the 14th interna-tional conference on World Wide Web pages 898ndash899 Chiba Japan 2005 ACM Press
[BR04] Bela Bollobas and Oliver RiordanThe diameter of a scale-free random graph Combinator-ica 24(1)5ndash34 2004
[Bri06] Michael Brinkmeier PageRank revisited ACM Transaction on Internet Technologies6(3)257ndash279 2006
[BSV05] Paolo Boldi Massimo Santini and Sebastiano VignaPagerank as a function of the dampingfactor In Proceedings of the 14th international conference on World Wide Web pages 557ndash566 Chiba Japan 2005 ACM Press
[BYD04] Ricardo Baeza-Yates and Emilio DavisWeb page ranking using link attributes In Alternatetrack papers amp posters of the 13th international conference on World Wide Web pages 328ndash329 New York NY USA 2004 ACM Press
18
[BYRN99] Ricardo Baeza-Yates and Berthier Ribeiro-NetoModern Information Retrieval AddisonWesley May 1999
[CGS04] Yen-Yu Chen Qingqing Gan and Torsten SuelLocal methods for estimating pagerank val-ues In Proceedings of the thirteenth ACM conference on Information and knowledge man-agement (CIKM) pages 381ndash389 New York NY USA 2004 ACM Press
[CL01] Fan Chung and Linyuan LuThe diameter of random sparse graphs Adv Appl Math26257ndash279 2001
[Dav00] Brian D DavisonTopical locality in the web In Proceedings of the 23rd annual internationalACM SIGIR conference on research and development in information retrieval pages 272ndash279Athens Greece 2000 ACM Press
[GG04] Gene H Golub and Chen GreifArnoldi-type algorithms for computing stationary distributionvectors with application to pagerank Technical Report SCCM-04-15 Stanford University2004
[GS05] Antonio Gulli and Alessio SignoriniThe indexable Web is more than 115 billion pages InPoster proceedings of the 14th international conference on World Wide Web pages 902ndash903Chiba Japan 2005 ACM Press
[Hav99] Taher HaveliwalaEfficient computation of pagerank Technical report Stanford University1999
[Hav02] Taher H HaveliwalaTopic-sensitive pagerank In Proceedings of the Eleventh World WideWeb Conference pages 517ndash526 Honolulu Hawaii USA May 2002 ACM Press
[HG98] Stephanie W Haas and Erika S GramsPage and link classifications connecting diverseresources In DL rsquo98 Proceedings of the third ACM conference on Digital libraries pages99ndash107 New York NY USA 1998 ACM Press
[HK03a] Taher Haveliwala and Sepandar KamvarThe condition number of the pagerank problemTechnical Report 36 Stanford University 2003
[HK03b] Taher Haveliwala and Sepandar KamvarThe second eigenvalue of the google matrix Tech-nical Report 20 Stanford University 2003
[JM98] W-K Joo and S H Myaeng Improving retrieval effectiveness with hyperlink informationIn Proceedings of International Workshop on Information Retrieval with Asian Languages(IRAL) Singapore October 1998
[KHMG03a] S Kamvar T Haveliwala C Manning and G GolubExploiting the block structure of theweb for computing pagerank 2003
[KHMG03b] Sepandar D Kamvar Taher H Haveliwala Christopher D Manning and Gene H GolubExtrapolation methods for accelerating pagerank computations In Proceedings of the twelfthinternational conference on World Wide Web pages 261ndash270 ACM Press 2003
[Kle99] Jon M KleinbergAuthoritative sources in a hyperlinked environment Journal of the ACM46(5)604ndash632 1999
19
[KRR+00] R Kumar P Raghavan S Rajagopalan D Sivakumar A Tomkins and E UpfalStochasticmodels for the web graph In Proceedings of the 41st Annual Symposium on Foundations ofComputer Science (FOCS) pages 57ndash65 Redondo Beach CA USA 2000 IEEE CS Press
[LGZ04] Chris P Lee Gene H Golub and Stefanos A ZeniosA fast two-stage algorithm for com-puting pagerank and its extensions Technical report Stanford University 2004
[Li98] Yanhong LiToward a qualitative search engine IEEE Internet Computing July 1998
[Lif00] Maxim LifantsevVoting model for ranking Web pages In Peter Graham and MuthucumaruMaheswaran editorsProceedings of the International Conference on Internet Computingpages 143ndash148 Las Vegas Nevada USA June 2000 CSREA Press
[Mar97] Massimo MarchioriThe quest for correct information of the Web hyper search engines InProc of the sixth international conference on the Web Santa Clara USA April 1997
[NSW01] M E Newman S H Strogatz and D J WattsRandom graphs with arbitrary degree distri-butions and their applicationsPhys Rev E Stat Nonlin Soft Matter Phys 64(2 Pt 2) August2001
[PBMW98] Lawrence Page Sergey Brin Rajeev Motwani and Terry WinogradThe PageRank citationranking bringing order to the Web Technical report Stanford Digital Library TechnologiesProject 1998
[PRU02] Gopal Pandurangan Prabhakara Raghavan and Eli UpfalUsing Pagerank to characterizeWeb structure In Proceedings of the 8th Annual International Computing and CombinatoricsConference (COCOON) volume 2387 ofLecture Notes in Computer Science pages 330ndash390Singapore August 2002 Springer
[SPM05] Padmini Srinivasan Gautam Pant and Filippo MenczerA general evaluation framework fortopical crawlers Information Retrieval 8(3)417ndash447 2005
[UCH03] Trystan Upstill Nick Craswell and David HawkingPredicting fame and fortune Page-rank or indegree In Proceedings of the Australasian Document Computing SymposiumADCS2003 pages 31ndash40 Canberra Australia December 2003
[WHAT04] Fang Wu Bernardo A Huberman Lada A Adamic and Joshua R TylerInformation flow insocial groups Physica A Statistical and Theoretical Physics 337(1-2)327ndash335 June 2004
[WS98] Duncan J Watts and Steven H StrogatzCollective dynamics of small-world networksNa-ture 393(6684)440ndash442 June 1998
20
- Introduction
- Functional Rankings
- Damping Functions
-
- Linear damping
- Exponential damping PageRank
- Quadratic hyperbolic damping TotalRank
- General hyperbolic damping HyperRank
- An empirical damping
-
- Comparing Damping Functions
-
- Approximating TotalRank with PageRank
- Approximating HyperRank with PageRank
- Approximating PageRank with LinearRank
-
- Parameters for the Damping Functions
-
- Characteristic path lengths
- Damping parameters and in-degree
- Experimental comparison of ranking orders
-
- Conclusions
-
52 Damping parameters and in-degree
In this section we are using data from theuk Web graph and a 8500-nodes synthetic graph with the sameindegree and outdegree distribution than the previous synthetic graph We first measured the variance ofthe values from the ranking function as we consider that a high variance is good in a ranking functionas the relative values differ more We also measured the relationship between the ranking function andin-degree for different values of the parameters in terms of covariance correlation coefficient and rankingorders (Kendallrsquosτ) The results are shown inFigure10
Variance Covariance Correlation coefficient Kendallrsquosτ
000e+00200e-13400e-13600e-13800e-13100e-12120e-12140e-12160e-12180e-12200e-12
01 02 03 04 05 06 07 08 09
Var
ianc
e of
Pag
eRan
k
Damping factor α
00 times 100
50 times 10-5
10 times 10-4
15 times 10-4
20 times 10-4
25 times 10-4
30 times 10-4
35 times 10-4
01 02 03 04 05 06 07 08 09
Cov
aria
nce
betw
een
Page
Ran
k an
d in
-deg
ree
Damping factor α
07000710072007300740075007600770078007900800
01 02 03 04 05 06 07 08 09
Corr
elat
ion
coef
ficie
nt w
ith in
-deg
ree
Damping factor α
042
044
046
048
050
052
054
056
01 02 03 04 05 06 07 08 09Ken
dallrsquo
s τ b
etw
een
Page
Rank
and
in-d
egre
e
Damping factor α
005
010
015
020
025
030
035
040
045
050
055
01 02 03 04 05 06 07 08 09
Var
ianc
e of
Pag
eRan
k
Damping factor α
0 times 100
1 times 10-3
2 times 10-3
3 times 10-3
4 times 10-3
5 times 10-3
6 times 10-3
01 02 03 04 05 06 07 08 09
Cov
aria
nce
betw
een
Page
Ran
k an
d in
-deg
ree
Damping factor α
0995
0996
0997
0998
0999
1000
01 02 03 04 05 06 07 08 09
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor α
079
080
081
082
083
084
085
086
087
01 02 03 04 05 06 07 08 09Ken
dallrsquo
s τ b
etw
een
Page
Rank
and
in-d
egre
e
Damping factor α
Figure 10 Variance of PageRank and its relationship with indegree in terms ofcovariance correlation coefficient and Kendallrsquosτ coefficient for varying valuesof the parameterα Top uk web graph bottom synthetic graph
The variance is higher asα increases In regard to the relationship with indegree for company homepages it has been shown that the logarithm of the in-degree is correlated with PageRank [UCH03] Ourresults are consistent with this observation The covariance monotonically increases withα both in the caseof the synthetic graph and in the web graph Not surprisingly using the generative model the correlationsare higher We observe a maximum correlation atα = 07 in the synthetic graph and atα = 05 in the webgraph We also notice that the correlation drops significativelly asα increases because a largeα means thatlonger paths have an effect in the calculation note however that this phenomenon does not significantlyimpact on the correlation coefficient that is still very large
A high correlation between PageRank and in-degree is bad from the point of view of a search enginebecause it makes link-spam easier In particular as the correlation coefficient is higher in theuk web graphnear 05 if we chooseα close to this value we are helping link spammers Note however that a highcorrelation was foreseeable because as shown in[CGS04] even approximating PageRank with just only 1level of links gets 70 of accuracy
The behavior of the Kendallrsquosτ coefficient which measures the similarity between ranking orders isthe opposite than the one observed in the real graph This also happens for HyperRank as inFigure11we made the same measurements for this functional ranking and the results were consistent The graphseems inverted because a low value ofβ has the same effect as a high value ofα longer paths have moreimportance in the calculation
The differences in the behavior of the ranking order in the synthetic graph might be explained by thefact that the generative model we are using does not capture some properties that might be relevant for theranking order under a functional ranking such as the clustering coefficient Also the synthetic graph is
15
Variance Covariance Correlation coefficient Kendallrsquosτ
000e+00
100e-12
200e-12
300e-12
400e-12
500e-12
600e-12
700e-12
800e-12
900e-12
1 15 2 25 3 35 4
Var
ianc
e of
Gen
eral
Hyp
erbo
lic R
ank
Damping factor β
000e+00
100e-04
200e-04
300e-04
400e-04
500e-04
600e-04
700e-04
800e-04
1 15 2 25 3 35 4Cov
aria
nce
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
0730
0740
0750
0760
0770
0780
0790
0800
1 15 2 25 3 35 4
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor β
04600
04700
04800
04900
05000
05100
05200
05300
1 15 2 25 3
Ken
dallrsquo
s τ
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
000e+00
200e-05
400e-05
600e-05
800e-05
100e-04
120e-04
140e-04
160e-04
1 15 2 25 3 35 4
Var
ianc
e of
Gen
eral
Hyp
erbo
lic R
ank
Damping factor β
000e+00
200e-03
400e-03
600e-03
800e-03
100e-02
120e-02
1 15 2 25 3 35 4Cov
aria
nce
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
0997
0997
0998
0998
0999
0999
1 15 2 25 3 35 4
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor β
0815008200082500830008350084000845008500085500860008650
1 15 2 25 3 35 4
Ken
dallrsquo
s τ b
etw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
Figure 11 Variance of general hyperbolic rank for some values ofβ and itsrelation with in-degree Experiments have been performed on theuk graph (top)and synthetic graph (bottom)
assortative (highly linked pages are mostly linked to other highly linked pages) while the real web graph isdisassortative (most of the neighbours of highly linked pages have small indegree)
53 Experimental comparison of ranking orders
In this section we present experimental results about the similarity between the ranking orders induced bysome of the functional rankings discussed in the previous sections To perform the comparison we usedKendallrsquosτ and data from the U K Web graphFigure12shows how PageRank compares with HyperRankfor various pairs ofα andβ In the limit αβrarr 1 both rankings are equivalent and they remain similar in alarge region of the parameter space
α
β
07
08
09
1
τ ge 095
τ
02 03
04 05
06 07
08 09
2 25
3 35
4
01
02
03
04
05
06
07
08
09
15 2 25 3 35 4
α th
at m
axim
izes
Ken
dallrsquo
s τ
Exponent β
Actual optimumPredicted optimum with length=5
Figure 12 Comparison (using Kendallrsquosτ) between PageRank and HyperRankwith various damping parameters in the UK web graph The optimum predictedin the analysis with = 5 is very close to the real one
In the figure we can see that the rankings obtained with HyperRank and PageRank can be almostequivalent (Kendallrsquosτ ge 095) moreover the analysis shown insection42 considering only paths of
16
lengths less than 5 provides a very good approximation for the optimum combination of parameters Thismeans that in fact the difference in the damping functions in the first few levels is crucial
The exponentsβ required for giving a good approximation of PageRank are very small whenα ge 07limiting the practical applicability of HyperRank as its convergence is not faster than the one of PageRank
As far as LinearRank and PageRank are concerned long paths and largeα should be considered toobtain a sufficiently similar ranking as shown inFigure13
05 06
07 08
09
5 10
15 20
25
080085090095100
τ
τ ge 095
αL
5
10
15
20
25
05 06 07 08 09L
that
max
imiz
es K
enda
llrsquos
τExponent α
Actual optimumPredicted optimum with length=5
Figure 13 Comparison (using Kendallrsquosτ) between PageRank and LinearRankin the UK web graph with various damping parameters Again the predictedoptimum with` = 5 is very close to the actual optimum
The predicted optimum given insection43 with ` = 5 this is considering only the summation of thedifferences between both damping functions up to paths of length 5 is very close to what was obtained inpractice Forα = 08 calculating LinearRank withL = 10 (which means the same number of iterations)givesτ ge 098 for α = 09 calculating LinearRank withL = 15 also givesτ ge 098 In both cases theranking order of PageRank is approximated by the ranking order of LinearRank with very high precision
6 Conclusions
In this paper we have defined a broad class of link-based ranking algorithms based on the contribution ofdamping factors along all different paths reaching a page We introduce four particular damping decays lin-ear exponential quadratic hyperbolic and general hyperbolic where exponential is equivalent to PageRankand quadratic hyperbolic to TotalRank
We studied the differences and similarities among these ranking algorithms and we have found that
bull Functional rankings using different damping functions can provide similar orderings if the parametersare chosen carefully
bull LinearRank can be used for calculating a ranking that is as good as PageRank but with a fixed andsmaller number of iterations
bull The parameters for the damping functions depend on the characteristic of path lengths in the graphwhich is known to grow sub-logarithmically on the size of the graph
More work is needed to find other damping functions that compute rankings similar to PageRank butare easier and faster to compute We use global ranking similarity but another measure could be the ranking
17
similarity in the top 20 results of real queries In this setting our results can change so future work willinclude this variation
Because of their high cost link-based ranking methods that involve iterative calculations at query timeare probably not used by large-scale search engines at this moment but the functional ranking with lineardamping we have presented can provide a good approximation with few iterations Also the approach wehave presented could be also applied to multivalued ranking functions such as HITS [Kle99] and topic-sensitive PageRank [Hav02] to obtain for instance a method for approximating the hubs and authorityscores using less iterations and a linear damping function
Our approach also helps to understand how easy or difficult is to collude many pages to modify theranking of a given page Clearly there are many different factors path lengths damping function branchingdegrees and number of colluded pages The graph structure of the collusion will affect those factors and weplan to analyze them
Acknowledgements
We would like to thank Daniel Fogaras for a valuable discussion about TotalRank that motivated part of thisresearch
References
[AJB99] Reka Albert Hawoong Jeong and Albert L Barabasi Diameter of the world wide webNature 401130ndash131 1999
[BH98] Krishna Bharat and Monika R HenzingerImproved algorithms for topic distillation in ahyperlinked environment In Proceedings of the 21st Annual International ACM SIGIR Con-ference on Research and Development in Information Retrieval pages 104ndash111 MelbourneAustralia August 1998 ACM Press New York
[BKM +00] Andrei Broder Ravi Kumar Farzin Maghoul Prabhakar Raghavan Sridhar RajagopalanRaymie Stata Andrew Tomkins and Janet WienerGraph structure in the web Experimentsand models In Proceedings of the Ninth Conference on World Wide Web pages 309ndash320Amsterdam Netherlands May 2000 ACM Press
[Bol05] Paolo BoldiTotalrank ranking without damping In Poster proceedings of the 14th interna-tional conference on World Wide Web pages 898ndash899 Chiba Japan 2005 ACM Press
[BR04] Bela Bollobas and Oliver RiordanThe diameter of a scale-free random graph Combinator-ica 24(1)5ndash34 2004
[Bri06] Michael Brinkmeier PageRank revisited ACM Transaction on Internet Technologies6(3)257ndash279 2006
[BSV05] Paolo Boldi Massimo Santini and Sebastiano VignaPagerank as a function of the dampingfactor In Proceedings of the 14th international conference on World Wide Web pages 557ndash566 Chiba Japan 2005 ACM Press
[BYD04] Ricardo Baeza-Yates and Emilio DavisWeb page ranking using link attributes In Alternatetrack papers amp posters of the 13th international conference on World Wide Web pages 328ndash329 New York NY USA 2004 ACM Press
18
[BYRN99] Ricardo Baeza-Yates and Berthier Ribeiro-NetoModern Information Retrieval AddisonWesley May 1999
[CGS04] Yen-Yu Chen Qingqing Gan and Torsten SuelLocal methods for estimating pagerank val-ues In Proceedings of the thirteenth ACM conference on Information and knowledge man-agement (CIKM) pages 381ndash389 New York NY USA 2004 ACM Press
[CL01] Fan Chung and Linyuan LuThe diameter of random sparse graphs Adv Appl Math26257ndash279 2001
[Dav00] Brian D DavisonTopical locality in the web In Proceedings of the 23rd annual internationalACM SIGIR conference on research and development in information retrieval pages 272ndash279Athens Greece 2000 ACM Press
[GG04] Gene H Golub and Chen GreifArnoldi-type algorithms for computing stationary distributionvectors with application to pagerank Technical Report SCCM-04-15 Stanford University2004
[GS05] Antonio Gulli and Alessio SignoriniThe indexable Web is more than 115 billion pages InPoster proceedings of the 14th international conference on World Wide Web pages 902ndash903Chiba Japan 2005 ACM Press
[Hav99] Taher HaveliwalaEfficient computation of pagerank Technical report Stanford University1999
[Hav02] Taher H HaveliwalaTopic-sensitive pagerank In Proceedings of the Eleventh World WideWeb Conference pages 517ndash526 Honolulu Hawaii USA May 2002 ACM Press
[HG98] Stephanie W Haas and Erika S GramsPage and link classifications connecting diverseresources In DL rsquo98 Proceedings of the third ACM conference on Digital libraries pages99ndash107 New York NY USA 1998 ACM Press
[HK03a] Taher Haveliwala and Sepandar KamvarThe condition number of the pagerank problemTechnical Report 36 Stanford University 2003
[HK03b] Taher Haveliwala and Sepandar KamvarThe second eigenvalue of the google matrix Tech-nical Report 20 Stanford University 2003
[JM98] W-K Joo and S H Myaeng Improving retrieval effectiveness with hyperlink informationIn Proceedings of International Workshop on Information Retrieval with Asian Languages(IRAL) Singapore October 1998
[KHMG03a] S Kamvar T Haveliwala C Manning and G GolubExploiting the block structure of theweb for computing pagerank 2003
[KHMG03b] Sepandar D Kamvar Taher H Haveliwala Christopher D Manning and Gene H GolubExtrapolation methods for accelerating pagerank computations In Proceedings of the twelfthinternational conference on World Wide Web pages 261ndash270 ACM Press 2003
[Kle99] Jon M KleinbergAuthoritative sources in a hyperlinked environment Journal of the ACM46(5)604ndash632 1999
19
[KRR+00] R Kumar P Raghavan S Rajagopalan D Sivakumar A Tomkins and E UpfalStochasticmodels for the web graph In Proceedings of the 41st Annual Symposium on Foundations ofComputer Science (FOCS) pages 57ndash65 Redondo Beach CA USA 2000 IEEE CS Press
[LGZ04] Chris P Lee Gene H Golub and Stefanos A ZeniosA fast two-stage algorithm for com-puting pagerank and its extensions Technical report Stanford University 2004
[Li98] Yanhong LiToward a qualitative search engine IEEE Internet Computing July 1998
[Lif00] Maxim LifantsevVoting model for ranking Web pages In Peter Graham and MuthucumaruMaheswaran editorsProceedings of the International Conference on Internet Computingpages 143ndash148 Las Vegas Nevada USA June 2000 CSREA Press
[Mar97] Massimo MarchioriThe quest for correct information of the Web hyper search engines InProc of the sixth international conference on the Web Santa Clara USA April 1997
[NSW01] M E Newman S H Strogatz and D J WattsRandom graphs with arbitrary degree distri-butions and their applicationsPhys Rev E Stat Nonlin Soft Matter Phys 64(2 Pt 2) August2001
[PBMW98] Lawrence Page Sergey Brin Rajeev Motwani and Terry WinogradThe PageRank citationranking bringing order to the Web Technical report Stanford Digital Library TechnologiesProject 1998
[PRU02] Gopal Pandurangan Prabhakara Raghavan and Eli UpfalUsing Pagerank to characterizeWeb structure In Proceedings of the 8th Annual International Computing and CombinatoricsConference (COCOON) volume 2387 ofLecture Notes in Computer Science pages 330ndash390Singapore August 2002 Springer
[SPM05] Padmini Srinivasan Gautam Pant and Filippo MenczerA general evaluation framework fortopical crawlers Information Retrieval 8(3)417ndash447 2005
[UCH03] Trystan Upstill Nick Craswell and David HawkingPredicting fame and fortune Page-rank or indegree In Proceedings of the Australasian Document Computing SymposiumADCS2003 pages 31ndash40 Canberra Australia December 2003
[WHAT04] Fang Wu Bernardo A Huberman Lada A Adamic and Joshua R TylerInformation flow insocial groups Physica A Statistical and Theoretical Physics 337(1-2)327ndash335 June 2004
[WS98] Duncan J Watts and Steven H StrogatzCollective dynamics of small-world networksNa-ture 393(6684)440ndash442 June 1998
20
- Introduction
- Functional Rankings
- Damping Functions
-
- Linear damping
- Exponential damping PageRank
- Quadratic hyperbolic damping TotalRank
- General hyperbolic damping HyperRank
- An empirical damping
-
- Comparing Damping Functions
-
- Approximating TotalRank with PageRank
- Approximating HyperRank with PageRank
- Approximating PageRank with LinearRank
-
- Parameters for the Damping Functions
-
- Characteristic path lengths
- Damping parameters and in-degree
- Experimental comparison of ranking orders
-
- Conclusions
-
Variance Covariance Correlation coefficient Kendallrsquosτ
000e+00
100e-12
200e-12
300e-12
400e-12
500e-12
600e-12
700e-12
800e-12
900e-12
1 15 2 25 3 35 4
Var
ianc
e of
Gen
eral
Hyp
erbo
lic R
ank
Damping factor β
000e+00
100e-04
200e-04
300e-04
400e-04
500e-04
600e-04
700e-04
800e-04
1 15 2 25 3 35 4Cov
aria
nce
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
0730
0740
0750
0760
0770
0780
0790
0800
1 15 2 25 3 35 4
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor β
04600
04700
04800
04900
05000
05100
05200
05300
1 15 2 25 3
Ken
dallrsquo
s τ
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
000e+00
200e-05
400e-05
600e-05
800e-05
100e-04
120e-04
140e-04
160e-04
1 15 2 25 3 35 4
Var
ianc
e of
Gen
eral
Hyp
erbo
lic R
ank
Damping factor β
000e+00
200e-03
400e-03
600e-03
800e-03
100e-02
120e-02
1 15 2 25 3 35 4Cov
aria
nce
betw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
0997
0997
0998
0998
0999
0999
1 15 2 25 3 35 4
Cor
rela
tion
coef
fici
ent w
ith in
-deg
ree
Damping factor β
0815008200082500830008350084000845008500085500860008650
1 15 2 25 3 35 4
Ken
dallrsquo
s τ b
etw
een
Gen
eral
Hyp
erbo
lic R
ank
and
in-d
egre
e
Damping factor β
Figure 11 Variance of general hyperbolic rank for some values ofβ and itsrelation with in-degree Experiments have been performed on theuk graph (top)and synthetic graph (bottom)
assortative (highly linked pages are mostly linked to other highly linked pages) while the real web graph isdisassortative (most of the neighbours of highly linked pages have small indegree)
53 Experimental comparison of ranking orders
In this section we present experimental results about the similarity between the ranking orders induced bysome of the functional rankings discussed in the previous sections To perform the comparison we usedKendallrsquosτ and data from the U K Web graphFigure12shows how PageRank compares with HyperRankfor various pairs ofα andβ In the limit αβrarr 1 both rankings are equivalent and they remain similar in alarge region of the parameter space
α
β
07
08
09
1
τ ge 095
τ
02 03
04 05
06 07
08 09
2 25
3 35
4
01
02
03
04
05
06
07
08
09
15 2 25 3 35 4
α th
at m
axim
izes
Ken
dallrsquo
s τ
Exponent β
Actual optimumPredicted optimum with length=5
Figure 12 Comparison (using Kendallrsquosτ) between PageRank and HyperRankwith various damping parameters in the UK web graph The optimum predictedin the analysis with = 5 is very close to the real one
In the figure we can see that the rankings obtained with HyperRank and PageRank can be almostequivalent (Kendallrsquosτ ge 095) moreover the analysis shown insection42 considering only paths of
16
lengths less than 5 provides a very good approximation for the optimum combination of parameters Thismeans that in fact the difference in the damping functions in the first few levels is crucial
The exponentsβ required for giving a good approximation of PageRank are very small whenα ge 07limiting the practical applicability of HyperRank as its convergence is not faster than the one of PageRank
As far as LinearRank and PageRank are concerned long paths and largeα should be considered toobtain a sufficiently similar ranking as shown inFigure13
05 06
07 08
09
5 10
15 20
25
080085090095100
τ
τ ge 095
αL
5
10
15
20
25
05 06 07 08 09L
that
max
imiz
es K
enda
llrsquos
τExponent α
Actual optimumPredicted optimum with length=5
Figure 13 Comparison (using Kendallrsquosτ) between PageRank and LinearRankin the UK web graph with various damping parameters Again the predictedoptimum with` = 5 is very close to the actual optimum
The predicted optimum given insection43 with ` = 5 this is considering only the summation of thedifferences between both damping functions up to paths of length 5 is very close to what was obtained inpractice Forα = 08 calculating LinearRank withL = 10 (which means the same number of iterations)givesτ ge 098 for α = 09 calculating LinearRank withL = 15 also givesτ ge 098 In both cases theranking order of PageRank is approximated by the ranking order of LinearRank with very high precision
6 Conclusions
In this paper we have defined a broad class of link-based ranking algorithms based on the contribution ofdamping factors along all different paths reaching a page We introduce four particular damping decays lin-ear exponential quadratic hyperbolic and general hyperbolic where exponential is equivalent to PageRankand quadratic hyperbolic to TotalRank
We studied the differences and similarities among these ranking algorithms and we have found that
bull Functional rankings using different damping functions can provide similar orderings if the parametersare chosen carefully
bull LinearRank can be used for calculating a ranking that is as good as PageRank but with a fixed andsmaller number of iterations
bull The parameters for the damping functions depend on the characteristic of path lengths in the graphwhich is known to grow sub-logarithmically on the size of the graph
More work is needed to find other damping functions that compute rankings similar to PageRank butare easier and faster to compute We use global ranking similarity but another measure could be the ranking
17
similarity in the top 20 results of real queries In this setting our results can change so future work willinclude this variation
Because of their high cost link-based ranking methods that involve iterative calculations at query timeare probably not used by large-scale search engines at this moment but the functional ranking with lineardamping we have presented can provide a good approximation with few iterations Also the approach wehave presented could be also applied to multivalued ranking functions such as HITS [Kle99] and topic-sensitive PageRank [Hav02] to obtain for instance a method for approximating the hubs and authorityscores using less iterations and a linear damping function
Our approach also helps to understand how easy or difficult is to collude many pages to modify theranking of a given page Clearly there are many different factors path lengths damping function branchingdegrees and number of colluded pages The graph structure of the collusion will affect those factors and weplan to analyze them
Acknowledgements
We would like to thank Daniel Fogaras for a valuable discussion about TotalRank that motivated part of thisresearch
References
[AJB99] Reka Albert Hawoong Jeong and Albert L Barabasi Diameter of the world wide webNature 401130ndash131 1999
[BH98] Krishna Bharat and Monika R HenzingerImproved algorithms for topic distillation in ahyperlinked environment In Proceedings of the 21st Annual International ACM SIGIR Con-ference on Research and Development in Information Retrieval pages 104ndash111 MelbourneAustralia August 1998 ACM Press New York
[BKM +00] Andrei Broder Ravi Kumar Farzin Maghoul Prabhakar Raghavan Sridhar RajagopalanRaymie Stata Andrew Tomkins and Janet WienerGraph structure in the web Experimentsand models In Proceedings of the Ninth Conference on World Wide Web pages 309ndash320Amsterdam Netherlands May 2000 ACM Press
[Bol05] Paolo BoldiTotalrank ranking without damping In Poster proceedings of the 14th interna-tional conference on World Wide Web pages 898ndash899 Chiba Japan 2005 ACM Press
[BR04] Bela Bollobas and Oliver RiordanThe diameter of a scale-free random graph Combinator-ica 24(1)5ndash34 2004
[Bri06] Michael Brinkmeier PageRank revisited ACM Transaction on Internet Technologies6(3)257ndash279 2006
[BSV05] Paolo Boldi Massimo Santini and Sebastiano VignaPagerank as a function of the dampingfactor In Proceedings of the 14th international conference on World Wide Web pages 557ndash566 Chiba Japan 2005 ACM Press
[BYD04] Ricardo Baeza-Yates and Emilio DavisWeb page ranking using link attributes In Alternatetrack papers amp posters of the 13th international conference on World Wide Web pages 328ndash329 New York NY USA 2004 ACM Press
18
[BYRN99] Ricardo Baeza-Yates and Berthier Ribeiro-NetoModern Information Retrieval AddisonWesley May 1999
[CGS04] Yen-Yu Chen Qingqing Gan and Torsten SuelLocal methods for estimating pagerank val-ues In Proceedings of the thirteenth ACM conference on Information and knowledge man-agement (CIKM) pages 381ndash389 New York NY USA 2004 ACM Press
[CL01] Fan Chung and Linyuan LuThe diameter of random sparse graphs Adv Appl Math26257ndash279 2001
[Dav00] Brian D DavisonTopical locality in the web In Proceedings of the 23rd annual internationalACM SIGIR conference on research and development in information retrieval pages 272ndash279Athens Greece 2000 ACM Press
[GG04] Gene H Golub and Chen GreifArnoldi-type algorithms for computing stationary distributionvectors with application to pagerank Technical Report SCCM-04-15 Stanford University2004
[GS05] Antonio Gulli and Alessio SignoriniThe indexable Web is more than 115 billion pages InPoster proceedings of the 14th international conference on World Wide Web pages 902ndash903Chiba Japan 2005 ACM Press
[Hav99] Taher HaveliwalaEfficient computation of pagerank Technical report Stanford University1999
[Hav02] Taher H HaveliwalaTopic-sensitive pagerank In Proceedings of the Eleventh World WideWeb Conference pages 517ndash526 Honolulu Hawaii USA May 2002 ACM Press
[HG98] Stephanie W Haas and Erika S GramsPage and link classifications connecting diverseresources In DL rsquo98 Proceedings of the third ACM conference on Digital libraries pages99ndash107 New York NY USA 1998 ACM Press
[HK03a] Taher Haveliwala and Sepandar KamvarThe condition number of the pagerank problemTechnical Report 36 Stanford University 2003
[HK03b] Taher Haveliwala and Sepandar KamvarThe second eigenvalue of the google matrix Tech-nical Report 20 Stanford University 2003
[JM98] W-K Joo and S H Myaeng Improving retrieval effectiveness with hyperlink informationIn Proceedings of International Workshop on Information Retrieval with Asian Languages(IRAL) Singapore October 1998
[KHMG03a] S Kamvar T Haveliwala C Manning and G GolubExploiting the block structure of theweb for computing pagerank 2003
[KHMG03b] Sepandar D Kamvar Taher H Haveliwala Christopher D Manning and Gene H GolubExtrapolation methods for accelerating pagerank computations In Proceedings of the twelfthinternational conference on World Wide Web pages 261ndash270 ACM Press 2003
[Kle99] Jon M KleinbergAuthoritative sources in a hyperlinked environment Journal of the ACM46(5)604ndash632 1999
19
[KRR+00] R Kumar P Raghavan S Rajagopalan D Sivakumar A Tomkins and E UpfalStochasticmodels for the web graph In Proceedings of the 41st Annual Symposium on Foundations ofComputer Science (FOCS) pages 57ndash65 Redondo Beach CA USA 2000 IEEE CS Press
[LGZ04] Chris P Lee Gene H Golub and Stefanos A ZeniosA fast two-stage algorithm for com-puting pagerank and its extensions Technical report Stanford University 2004
[Li98] Yanhong LiToward a qualitative search engine IEEE Internet Computing July 1998
[Lif00] Maxim LifantsevVoting model for ranking Web pages In Peter Graham and MuthucumaruMaheswaran editorsProceedings of the International Conference on Internet Computingpages 143ndash148 Las Vegas Nevada USA June 2000 CSREA Press
[Mar97] Massimo MarchioriThe quest for correct information of the Web hyper search engines InProc of the sixth international conference on the Web Santa Clara USA April 1997
[NSW01] M E Newman S H Strogatz and D J WattsRandom graphs with arbitrary degree distri-butions and their applicationsPhys Rev E Stat Nonlin Soft Matter Phys 64(2 Pt 2) August2001
[PBMW98] Lawrence Page Sergey Brin Rajeev Motwani and Terry WinogradThe PageRank citationranking bringing order to the Web Technical report Stanford Digital Library TechnologiesProject 1998
[PRU02] Gopal Pandurangan Prabhakara Raghavan and Eli UpfalUsing Pagerank to characterizeWeb structure In Proceedings of the 8th Annual International Computing and CombinatoricsConference (COCOON) volume 2387 ofLecture Notes in Computer Science pages 330ndash390Singapore August 2002 Springer
[SPM05] Padmini Srinivasan Gautam Pant and Filippo MenczerA general evaluation framework fortopical crawlers Information Retrieval 8(3)417ndash447 2005
[UCH03] Trystan Upstill Nick Craswell and David HawkingPredicting fame and fortune Page-rank or indegree In Proceedings of the Australasian Document Computing SymposiumADCS2003 pages 31ndash40 Canberra Australia December 2003
[WHAT04] Fang Wu Bernardo A Huberman Lada A Adamic and Joshua R TylerInformation flow insocial groups Physica A Statistical and Theoretical Physics 337(1-2)327ndash335 June 2004
[WS98] Duncan J Watts and Steven H StrogatzCollective dynamics of small-world networksNa-ture 393(6684)440ndash442 June 1998
20
- Introduction
- Functional Rankings
- Damping Functions
-
- Linear damping
- Exponential damping PageRank
- Quadratic hyperbolic damping TotalRank
- General hyperbolic damping HyperRank
- An empirical damping
-
- Comparing Damping Functions
-
- Approximating TotalRank with PageRank
- Approximating HyperRank with PageRank
- Approximating PageRank with LinearRank
-
- Parameters for the Damping Functions
-
- Characteristic path lengths
- Damping parameters and in-degree
- Experimental comparison of ranking orders
-
- Conclusions
-
lengths less than 5 provides a very good approximation for the optimum combination of parameters Thismeans that in fact the difference in the damping functions in the first few levels is crucial
The exponentsβ required for giving a good approximation of PageRank are very small whenα ge 07limiting the practical applicability of HyperRank as its convergence is not faster than the one of PageRank
As far as LinearRank and PageRank are concerned long paths and largeα should be considered toobtain a sufficiently similar ranking as shown inFigure13
05 06
07 08
09
5 10
15 20
25
080085090095100
τ
τ ge 095
αL
5
10
15
20
25
05 06 07 08 09L
that
max
imiz
es K
enda
llrsquos
τExponent α
Actual optimumPredicted optimum with length=5
Figure 13 Comparison (using Kendallrsquosτ) between PageRank and LinearRankin the UK web graph with various damping parameters Again the predictedoptimum with` = 5 is very close to the actual optimum
The predicted optimum given insection43 with ` = 5 this is considering only the summation of thedifferences between both damping functions up to paths of length 5 is very close to what was obtained inpractice Forα = 08 calculating LinearRank withL = 10 (which means the same number of iterations)givesτ ge 098 for α = 09 calculating LinearRank withL = 15 also givesτ ge 098 In both cases theranking order of PageRank is approximated by the ranking order of LinearRank with very high precision
6 Conclusions
In this paper we have defined a broad class of link-based ranking algorithms based on the contribution ofdamping factors along all different paths reaching a page We introduce four particular damping decays lin-ear exponential quadratic hyperbolic and general hyperbolic where exponential is equivalent to PageRankand quadratic hyperbolic to TotalRank
We studied the differences and similarities among these ranking algorithms and we have found that
bull Functional rankings using different damping functions can provide similar orderings if the parametersare chosen carefully
bull LinearRank can be used for calculating a ranking that is as good as PageRank but with a fixed andsmaller number of iterations
bull The parameters for the damping functions depend on the characteristic of path lengths in the graphwhich is known to grow sub-logarithmically on the size of the graph
More work is needed to find other damping functions that compute rankings similar to PageRank butare easier and faster to compute We use global ranking similarity but another measure could be the ranking
17
similarity in the top 20 results of real queries In this setting our results can change so future work willinclude this variation
Because of their high cost link-based ranking methods that involve iterative calculations at query timeare probably not used by large-scale search engines at this moment but the functional ranking with lineardamping we have presented can provide a good approximation with few iterations Also the approach wehave presented could be also applied to multivalued ranking functions such as HITS [Kle99] and topic-sensitive PageRank [Hav02] to obtain for instance a method for approximating the hubs and authorityscores using less iterations and a linear damping function
Our approach also helps to understand how easy or difficult is to collude many pages to modify theranking of a given page Clearly there are many different factors path lengths damping function branchingdegrees and number of colluded pages The graph structure of the collusion will affect those factors and weplan to analyze them
Acknowledgements
We would like to thank Daniel Fogaras for a valuable discussion about TotalRank that motivated part of thisresearch
References
[AJB99] Reka Albert Hawoong Jeong and Albert L Barabasi Diameter of the world wide webNature 401130ndash131 1999
[BH98] Krishna Bharat and Monika R HenzingerImproved algorithms for topic distillation in ahyperlinked environment In Proceedings of the 21st Annual International ACM SIGIR Con-ference on Research and Development in Information Retrieval pages 104ndash111 MelbourneAustralia August 1998 ACM Press New York
[BKM +00] Andrei Broder Ravi Kumar Farzin Maghoul Prabhakar Raghavan Sridhar RajagopalanRaymie Stata Andrew Tomkins and Janet WienerGraph structure in the web Experimentsand models In Proceedings of the Ninth Conference on World Wide Web pages 309ndash320Amsterdam Netherlands May 2000 ACM Press
[Bol05] Paolo BoldiTotalrank ranking without damping In Poster proceedings of the 14th interna-tional conference on World Wide Web pages 898ndash899 Chiba Japan 2005 ACM Press
[BR04] Bela Bollobas and Oliver RiordanThe diameter of a scale-free random graph Combinator-ica 24(1)5ndash34 2004
[Bri06] Michael Brinkmeier PageRank revisited ACM Transaction on Internet Technologies6(3)257ndash279 2006
[BSV05] Paolo Boldi Massimo Santini and Sebastiano VignaPagerank as a function of the dampingfactor In Proceedings of the 14th international conference on World Wide Web pages 557ndash566 Chiba Japan 2005 ACM Press
[BYD04] Ricardo Baeza-Yates and Emilio DavisWeb page ranking using link attributes In Alternatetrack papers amp posters of the 13th international conference on World Wide Web pages 328ndash329 New York NY USA 2004 ACM Press
18
[BYRN99] Ricardo Baeza-Yates and Berthier Ribeiro-NetoModern Information Retrieval AddisonWesley May 1999
[CGS04] Yen-Yu Chen Qingqing Gan and Torsten SuelLocal methods for estimating pagerank val-ues In Proceedings of the thirteenth ACM conference on Information and knowledge man-agement (CIKM) pages 381ndash389 New York NY USA 2004 ACM Press
[CL01] Fan Chung and Linyuan LuThe diameter of random sparse graphs Adv Appl Math26257ndash279 2001
[Dav00] Brian D DavisonTopical locality in the web In Proceedings of the 23rd annual internationalACM SIGIR conference on research and development in information retrieval pages 272ndash279Athens Greece 2000 ACM Press
[GG04] Gene H Golub and Chen GreifArnoldi-type algorithms for computing stationary distributionvectors with application to pagerank Technical Report SCCM-04-15 Stanford University2004
[GS05] Antonio Gulli and Alessio SignoriniThe indexable Web is more than 115 billion pages InPoster proceedings of the 14th international conference on World Wide Web pages 902ndash903Chiba Japan 2005 ACM Press
[Hav99] Taher HaveliwalaEfficient computation of pagerank Technical report Stanford University1999
[Hav02] Taher H HaveliwalaTopic-sensitive pagerank In Proceedings of the Eleventh World WideWeb Conference pages 517ndash526 Honolulu Hawaii USA May 2002 ACM Press
[HG98] Stephanie W Haas and Erika S GramsPage and link classifications connecting diverseresources In DL rsquo98 Proceedings of the third ACM conference on Digital libraries pages99ndash107 New York NY USA 1998 ACM Press
[HK03a] Taher Haveliwala and Sepandar KamvarThe condition number of the pagerank problemTechnical Report 36 Stanford University 2003
[HK03b] Taher Haveliwala and Sepandar KamvarThe second eigenvalue of the google matrix Tech-nical Report 20 Stanford University 2003
[JM98] W-K Joo and S H Myaeng Improving retrieval effectiveness with hyperlink informationIn Proceedings of International Workshop on Information Retrieval with Asian Languages(IRAL) Singapore October 1998
[KHMG03a] S Kamvar T Haveliwala C Manning and G GolubExploiting the block structure of theweb for computing pagerank 2003
[KHMG03b] Sepandar D Kamvar Taher H Haveliwala Christopher D Manning and Gene H GolubExtrapolation methods for accelerating pagerank computations In Proceedings of the twelfthinternational conference on World Wide Web pages 261ndash270 ACM Press 2003
[Kle99] Jon M KleinbergAuthoritative sources in a hyperlinked environment Journal of the ACM46(5)604ndash632 1999
19
[KRR+00] R Kumar P Raghavan S Rajagopalan D Sivakumar A Tomkins and E UpfalStochasticmodels for the web graph In Proceedings of the 41st Annual Symposium on Foundations ofComputer Science (FOCS) pages 57ndash65 Redondo Beach CA USA 2000 IEEE CS Press
[LGZ04] Chris P Lee Gene H Golub and Stefanos A ZeniosA fast two-stage algorithm for com-puting pagerank and its extensions Technical report Stanford University 2004
[Li98] Yanhong LiToward a qualitative search engine IEEE Internet Computing July 1998
[Lif00] Maxim LifantsevVoting model for ranking Web pages In Peter Graham and MuthucumaruMaheswaran editorsProceedings of the International Conference on Internet Computingpages 143ndash148 Las Vegas Nevada USA June 2000 CSREA Press
[Mar97] Massimo MarchioriThe quest for correct information of the Web hyper search engines InProc of the sixth international conference on the Web Santa Clara USA April 1997
[NSW01] M E Newman S H Strogatz and D J WattsRandom graphs with arbitrary degree distri-butions and their applicationsPhys Rev E Stat Nonlin Soft Matter Phys 64(2 Pt 2) August2001
[PBMW98] Lawrence Page Sergey Brin Rajeev Motwani and Terry WinogradThe PageRank citationranking bringing order to the Web Technical report Stanford Digital Library TechnologiesProject 1998
[PRU02] Gopal Pandurangan Prabhakara Raghavan and Eli UpfalUsing Pagerank to characterizeWeb structure In Proceedings of the 8th Annual International Computing and CombinatoricsConference (COCOON) volume 2387 ofLecture Notes in Computer Science pages 330ndash390Singapore August 2002 Springer
[SPM05] Padmini Srinivasan Gautam Pant and Filippo MenczerA general evaluation framework fortopical crawlers Information Retrieval 8(3)417ndash447 2005
[UCH03] Trystan Upstill Nick Craswell and David HawkingPredicting fame and fortune Page-rank or indegree In Proceedings of the Australasian Document Computing SymposiumADCS2003 pages 31ndash40 Canberra Australia December 2003
[WHAT04] Fang Wu Bernardo A Huberman Lada A Adamic and Joshua R TylerInformation flow insocial groups Physica A Statistical and Theoretical Physics 337(1-2)327ndash335 June 2004
[WS98] Duncan J Watts and Steven H StrogatzCollective dynamics of small-world networksNa-ture 393(6684)440ndash442 June 1998
20
- Introduction
- Functional Rankings
- Damping Functions
-
- Linear damping
- Exponential damping PageRank
- Quadratic hyperbolic damping TotalRank
- General hyperbolic damping HyperRank
- An empirical damping
-
- Comparing Damping Functions
-
- Approximating TotalRank with PageRank
- Approximating HyperRank with PageRank
- Approximating PageRank with LinearRank
-
- Parameters for the Damping Functions
-
- Characteristic path lengths
- Damping parameters and in-degree
- Experimental comparison of ranking orders
-
- Conclusions
-
similarity in the top 20 results of real queries In this setting our results can change so future work willinclude this variation
Because of their high cost link-based ranking methods that involve iterative calculations at query timeare probably not used by large-scale search engines at this moment but the functional ranking with lineardamping we have presented can provide a good approximation with few iterations Also the approach wehave presented could be also applied to multivalued ranking functions such as HITS [Kle99] and topic-sensitive PageRank [Hav02] to obtain for instance a method for approximating the hubs and authorityscores using less iterations and a linear damping function
Our approach also helps to understand how easy or difficult is to collude many pages to modify theranking of a given page Clearly there are many different factors path lengths damping function branchingdegrees and number of colluded pages The graph structure of the collusion will affect those factors and weplan to analyze them
Acknowledgements
We would like to thank Daniel Fogaras for a valuable discussion about TotalRank that motivated part of thisresearch
References
[AJB99] Reka Albert Hawoong Jeong and Albert L Barabasi Diameter of the world wide webNature 401130ndash131 1999
[BH98] Krishna Bharat and Monika R HenzingerImproved algorithms for topic distillation in ahyperlinked environment In Proceedings of the 21st Annual International ACM SIGIR Con-ference on Research and Development in Information Retrieval pages 104ndash111 MelbourneAustralia August 1998 ACM Press New York
[BKM +00] Andrei Broder Ravi Kumar Farzin Maghoul Prabhakar Raghavan Sridhar RajagopalanRaymie Stata Andrew Tomkins and Janet WienerGraph structure in the web Experimentsand models In Proceedings of the Ninth Conference on World Wide Web pages 309ndash320Amsterdam Netherlands May 2000 ACM Press
[Bol05] Paolo BoldiTotalrank ranking without damping In Poster proceedings of the 14th interna-tional conference on World Wide Web pages 898ndash899 Chiba Japan 2005 ACM Press
[BR04] Bela Bollobas and Oliver RiordanThe diameter of a scale-free random graph Combinator-ica 24(1)5ndash34 2004
[Bri06] Michael Brinkmeier PageRank revisited ACM Transaction on Internet Technologies6(3)257ndash279 2006
[BSV05] Paolo Boldi Massimo Santini and Sebastiano VignaPagerank as a function of the dampingfactor In Proceedings of the 14th international conference on World Wide Web pages 557ndash566 Chiba Japan 2005 ACM Press
[BYD04] Ricardo Baeza-Yates and Emilio DavisWeb page ranking using link attributes In Alternatetrack papers amp posters of the 13th international conference on World Wide Web pages 328ndash329 New York NY USA 2004 ACM Press
18
[BYRN99] Ricardo Baeza-Yates and Berthier Ribeiro-NetoModern Information Retrieval AddisonWesley May 1999
[CGS04] Yen-Yu Chen Qingqing Gan and Torsten SuelLocal methods for estimating pagerank val-ues In Proceedings of the thirteenth ACM conference on Information and knowledge man-agement (CIKM) pages 381ndash389 New York NY USA 2004 ACM Press
[CL01] Fan Chung and Linyuan LuThe diameter of random sparse graphs Adv Appl Math26257ndash279 2001
[Dav00] Brian D DavisonTopical locality in the web In Proceedings of the 23rd annual internationalACM SIGIR conference on research and development in information retrieval pages 272ndash279Athens Greece 2000 ACM Press
[GG04] Gene H Golub and Chen GreifArnoldi-type algorithms for computing stationary distributionvectors with application to pagerank Technical Report SCCM-04-15 Stanford University2004
[GS05] Antonio Gulli and Alessio SignoriniThe indexable Web is more than 115 billion pages InPoster proceedings of the 14th international conference on World Wide Web pages 902ndash903Chiba Japan 2005 ACM Press
[Hav99] Taher HaveliwalaEfficient computation of pagerank Technical report Stanford University1999
[Hav02] Taher H HaveliwalaTopic-sensitive pagerank In Proceedings of the Eleventh World WideWeb Conference pages 517ndash526 Honolulu Hawaii USA May 2002 ACM Press
[HG98] Stephanie W Haas and Erika S GramsPage and link classifications connecting diverseresources In DL rsquo98 Proceedings of the third ACM conference on Digital libraries pages99ndash107 New York NY USA 1998 ACM Press
[HK03a] Taher Haveliwala and Sepandar KamvarThe condition number of the pagerank problemTechnical Report 36 Stanford University 2003
[HK03b] Taher Haveliwala and Sepandar KamvarThe second eigenvalue of the google matrix Tech-nical Report 20 Stanford University 2003
[JM98] W-K Joo and S H Myaeng Improving retrieval effectiveness with hyperlink informationIn Proceedings of International Workshop on Information Retrieval with Asian Languages(IRAL) Singapore October 1998
[KHMG03a] S Kamvar T Haveliwala C Manning and G GolubExploiting the block structure of theweb for computing pagerank 2003
[KHMG03b] Sepandar D Kamvar Taher H Haveliwala Christopher D Manning and Gene H GolubExtrapolation methods for accelerating pagerank computations In Proceedings of the twelfthinternational conference on World Wide Web pages 261ndash270 ACM Press 2003
[Kle99] Jon M KleinbergAuthoritative sources in a hyperlinked environment Journal of the ACM46(5)604ndash632 1999
19
[KRR+00] R Kumar P Raghavan S Rajagopalan D Sivakumar A Tomkins and E UpfalStochasticmodels for the web graph In Proceedings of the 41st Annual Symposium on Foundations ofComputer Science (FOCS) pages 57ndash65 Redondo Beach CA USA 2000 IEEE CS Press
[LGZ04] Chris P Lee Gene H Golub and Stefanos A ZeniosA fast two-stage algorithm for com-puting pagerank and its extensions Technical report Stanford University 2004
[Li98] Yanhong LiToward a qualitative search engine IEEE Internet Computing July 1998
[Lif00] Maxim LifantsevVoting model for ranking Web pages In Peter Graham and MuthucumaruMaheswaran editorsProceedings of the International Conference on Internet Computingpages 143ndash148 Las Vegas Nevada USA June 2000 CSREA Press
[Mar97] Massimo MarchioriThe quest for correct information of the Web hyper search engines InProc of the sixth international conference on the Web Santa Clara USA April 1997
[NSW01] M E Newman S H Strogatz and D J WattsRandom graphs with arbitrary degree distri-butions and their applicationsPhys Rev E Stat Nonlin Soft Matter Phys 64(2 Pt 2) August2001
[PBMW98] Lawrence Page Sergey Brin Rajeev Motwani and Terry WinogradThe PageRank citationranking bringing order to the Web Technical report Stanford Digital Library TechnologiesProject 1998
[PRU02] Gopal Pandurangan Prabhakara Raghavan and Eli UpfalUsing Pagerank to characterizeWeb structure In Proceedings of the 8th Annual International Computing and CombinatoricsConference (COCOON) volume 2387 ofLecture Notes in Computer Science pages 330ndash390Singapore August 2002 Springer
[SPM05] Padmini Srinivasan Gautam Pant and Filippo MenczerA general evaluation framework fortopical crawlers Information Retrieval 8(3)417ndash447 2005
[UCH03] Trystan Upstill Nick Craswell and David HawkingPredicting fame and fortune Page-rank or indegree In Proceedings of the Australasian Document Computing SymposiumADCS2003 pages 31ndash40 Canberra Australia December 2003
[WHAT04] Fang Wu Bernardo A Huberman Lada A Adamic and Joshua R TylerInformation flow insocial groups Physica A Statistical and Theoretical Physics 337(1-2)327ndash335 June 2004
[WS98] Duncan J Watts and Steven H StrogatzCollective dynamics of small-world networksNa-ture 393(6684)440ndash442 June 1998
20
- Introduction
- Functional Rankings
- Damping Functions
-
- Linear damping
- Exponential damping PageRank
- Quadratic hyperbolic damping TotalRank
- General hyperbolic damping HyperRank
- An empirical damping
-
- Comparing Damping Functions
-
- Approximating TotalRank with PageRank
- Approximating HyperRank with PageRank
- Approximating PageRank with LinearRank
-
- Parameters for the Damping Functions
-
- Characteristic path lengths
- Damping parameters and in-degree
- Experimental comparison of ranking orders
-
- Conclusions
-
[BYRN99] Ricardo Baeza-Yates and Berthier Ribeiro-NetoModern Information Retrieval AddisonWesley May 1999
[CGS04] Yen-Yu Chen Qingqing Gan and Torsten SuelLocal methods for estimating pagerank val-ues In Proceedings of the thirteenth ACM conference on Information and knowledge man-agement (CIKM) pages 381ndash389 New York NY USA 2004 ACM Press
[CL01] Fan Chung and Linyuan LuThe diameter of random sparse graphs Adv Appl Math26257ndash279 2001
[Dav00] Brian D DavisonTopical locality in the web In Proceedings of the 23rd annual internationalACM SIGIR conference on research and development in information retrieval pages 272ndash279Athens Greece 2000 ACM Press
[GG04] Gene H Golub and Chen GreifArnoldi-type algorithms for computing stationary distributionvectors with application to pagerank Technical Report SCCM-04-15 Stanford University2004
[GS05] Antonio Gulli and Alessio SignoriniThe indexable Web is more than 115 billion pages InPoster proceedings of the 14th international conference on World Wide Web pages 902ndash903Chiba Japan 2005 ACM Press
[Hav99] Taher HaveliwalaEfficient computation of pagerank Technical report Stanford University1999
[Hav02] Taher H HaveliwalaTopic-sensitive pagerank In Proceedings of the Eleventh World WideWeb Conference pages 517ndash526 Honolulu Hawaii USA May 2002 ACM Press
[HG98] Stephanie W Haas and Erika S GramsPage and link classifications connecting diverseresources In DL rsquo98 Proceedings of the third ACM conference on Digital libraries pages99ndash107 New York NY USA 1998 ACM Press
[HK03a] Taher Haveliwala and Sepandar KamvarThe condition number of the pagerank problemTechnical Report 36 Stanford University 2003
[HK03b] Taher Haveliwala and Sepandar KamvarThe second eigenvalue of the google matrix Tech-nical Report 20 Stanford University 2003
[JM98] W-K Joo and S H Myaeng Improving retrieval effectiveness with hyperlink informationIn Proceedings of International Workshop on Information Retrieval with Asian Languages(IRAL) Singapore October 1998
[KHMG03a] S Kamvar T Haveliwala C Manning and G GolubExploiting the block structure of theweb for computing pagerank 2003
[KHMG03b] Sepandar D Kamvar Taher H Haveliwala Christopher D Manning and Gene H GolubExtrapolation methods for accelerating pagerank computations In Proceedings of the twelfthinternational conference on World Wide Web pages 261ndash270 ACM Press 2003
[Kle99] Jon M KleinbergAuthoritative sources in a hyperlinked environment Journal of the ACM46(5)604ndash632 1999
19
[KRR+00] R Kumar P Raghavan S Rajagopalan D Sivakumar A Tomkins and E UpfalStochasticmodels for the web graph In Proceedings of the 41st Annual Symposium on Foundations ofComputer Science (FOCS) pages 57ndash65 Redondo Beach CA USA 2000 IEEE CS Press
[LGZ04] Chris P Lee Gene H Golub and Stefanos A ZeniosA fast two-stage algorithm for com-puting pagerank and its extensions Technical report Stanford University 2004
[Li98] Yanhong LiToward a qualitative search engine IEEE Internet Computing July 1998
[Lif00] Maxim LifantsevVoting model for ranking Web pages In Peter Graham and MuthucumaruMaheswaran editorsProceedings of the International Conference on Internet Computingpages 143ndash148 Las Vegas Nevada USA June 2000 CSREA Press
[Mar97] Massimo MarchioriThe quest for correct information of the Web hyper search engines InProc of the sixth international conference on the Web Santa Clara USA April 1997
[NSW01] M E Newman S H Strogatz and D J WattsRandom graphs with arbitrary degree distri-butions and their applicationsPhys Rev E Stat Nonlin Soft Matter Phys 64(2 Pt 2) August2001
[PBMW98] Lawrence Page Sergey Brin Rajeev Motwani and Terry WinogradThe PageRank citationranking bringing order to the Web Technical report Stanford Digital Library TechnologiesProject 1998
[PRU02] Gopal Pandurangan Prabhakara Raghavan and Eli UpfalUsing Pagerank to characterizeWeb structure In Proceedings of the 8th Annual International Computing and CombinatoricsConference (COCOON) volume 2387 ofLecture Notes in Computer Science pages 330ndash390Singapore August 2002 Springer
[SPM05] Padmini Srinivasan Gautam Pant and Filippo MenczerA general evaluation framework fortopical crawlers Information Retrieval 8(3)417ndash447 2005
[UCH03] Trystan Upstill Nick Craswell and David HawkingPredicting fame and fortune Page-rank or indegree In Proceedings of the Australasian Document Computing SymposiumADCS2003 pages 31ndash40 Canberra Australia December 2003
[WHAT04] Fang Wu Bernardo A Huberman Lada A Adamic and Joshua R TylerInformation flow insocial groups Physica A Statistical and Theoretical Physics 337(1-2)327ndash335 June 2004
[WS98] Duncan J Watts and Steven H StrogatzCollective dynamics of small-world networksNa-ture 393(6684)440ndash442 June 1998
20
- Introduction
- Functional Rankings
- Damping Functions
-
- Linear damping
- Exponential damping PageRank
- Quadratic hyperbolic damping TotalRank
- General hyperbolic damping HyperRank
- An empirical damping
-
- Comparing Damping Functions
-
- Approximating TotalRank with PageRank
- Approximating HyperRank with PageRank
- Approximating PageRank with LinearRank
-
- Parameters for the Damping Functions
-
- Characteristic path lengths
- Damping parameters and in-degree
- Experimental comparison of ranking orders
-
- Conclusions
-
[KRR+00] R Kumar P Raghavan S Rajagopalan D Sivakumar A Tomkins and E UpfalStochasticmodels for the web graph In Proceedings of the 41st Annual Symposium on Foundations ofComputer Science (FOCS) pages 57ndash65 Redondo Beach CA USA 2000 IEEE CS Press
[LGZ04] Chris P Lee Gene H Golub and Stefanos A ZeniosA fast two-stage algorithm for com-puting pagerank and its extensions Technical report Stanford University 2004
[Li98] Yanhong LiToward a qualitative search engine IEEE Internet Computing July 1998
[Lif00] Maxim LifantsevVoting model for ranking Web pages In Peter Graham and MuthucumaruMaheswaran editorsProceedings of the International Conference on Internet Computingpages 143ndash148 Las Vegas Nevada USA June 2000 CSREA Press
[Mar97] Massimo MarchioriThe quest for correct information of the Web hyper search engines InProc of the sixth international conference on the Web Santa Clara USA April 1997
[NSW01] M E Newman S H Strogatz and D J WattsRandom graphs with arbitrary degree distri-butions and their applicationsPhys Rev E Stat Nonlin Soft Matter Phys 64(2 Pt 2) August2001
[PBMW98] Lawrence Page Sergey Brin Rajeev Motwani and Terry WinogradThe PageRank citationranking bringing order to the Web Technical report Stanford Digital Library TechnologiesProject 1998
[PRU02] Gopal Pandurangan Prabhakara Raghavan and Eli UpfalUsing Pagerank to characterizeWeb structure In Proceedings of the 8th Annual International Computing and CombinatoricsConference (COCOON) volume 2387 ofLecture Notes in Computer Science pages 330ndash390Singapore August 2002 Springer
[SPM05] Padmini Srinivasan Gautam Pant and Filippo MenczerA general evaluation framework fortopical crawlers Information Retrieval 8(3)417ndash447 2005
[UCH03] Trystan Upstill Nick Craswell and David HawkingPredicting fame and fortune Page-rank or indegree In Proceedings of the Australasian Document Computing SymposiumADCS2003 pages 31ndash40 Canberra Australia December 2003
[WHAT04] Fang Wu Bernardo A Huberman Lada A Adamic and Joshua R TylerInformation flow insocial groups Physica A Statistical and Theoretical Physics 337(1-2)327ndash335 June 2004
[WS98] Duncan J Watts and Steven H StrogatzCollective dynamics of small-world networksNa-ture 393(6684)440ndash442 June 1998
20
- Introduction
- Functional Rankings
- Damping Functions
-
- Linear damping
- Exponential damping PageRank
- Quadratic hyperbolic damping TotalRank
- General hyperbolic damping HyperRank
- An empirical damping
-
- Comparing Damping Functions
-
- Approximating TotalRank with PageRank
- Approximating HyperRank with PageRank
- Approximating PageRank with LinearRank
-
- Parameters for the Damping Functions
-
- Characteristic path lengths
- Damping parameters and in-degree
- Experimental comparison of ranking orders
-
- Conclusions
-