Extrapolation Methods for Accelerating PageRank Computations
Sepandar D. Kamvar
Taher H. Haveliwala
Christopher D. Manning
Gene H. Golub
Stanford University

Search: Giants
Results:
1. The Official Site of the San Francisco Giants

Search: Giants
Results:
1. The Official Site of the New York Giants
Motivation
Problem: Speed up PageRank.
Motivation: Personalization, “Freshness”
Note: PageRank computations don’t get faster as computers do.

Outline
Definition of PageRank
Computation of PageRank
Convergence Properties
Outline of Our Approach
Empirical Results

Link Counts
Two home pages (Sep’s and Taher’s), each with 2 inlinks: one is linked by 2 important pages (Yahoo!, CNN), the other by 2 unimportant pages (DB Pub Server, CS361).
Counting links alone gives both home pages the same score.

Definition of PageRank
The importance of a page is given by the importance of the pages that link to it.
$$x_i = \sum_{j \in B_i} \frac{1}{N_j} x_j$$
$x_i$: importance of page i
$B_i$: pages j that link to page i
$N_j$: number of outlinks from page j
$x_j$: importance of page j
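A minimal Python sketch of the definition, assuming the graph is stored as a dict from each page to the list of pages it links to (`pagerank_update` and `out_links` are illustrative names, not from the slides). Each page j pushes $x_j / N_j$ to every page it links to, which computes the same sum over $B_i$ from the sender's side. The three-page graph (a → b, a → c, b → a, c → b) is hypothetical: the slides do not list the diagram's edges, but this graph reproduces every number in the PageRank diagrams.

```python
def pagerank_update(x, out_links):
    """Apply x_i = sum over j in B_i of x_j / N_j to every page once.

    x:         dict page -> current importance
    out_links: dict page -> list of pages it links to
               (assumes every page has at least one outlink, so N_j > 0)
    """
    new_x = {page: 0.0 for page in x}
    for j, targets in out_links.items():
        n_j = len(targets)            # N_j: number of outlinks from page j
        for i in targets:             # page j is in B_i for every target i
            new_x[i] += x[j] / n_j    # page j passes 1/N_j of its importance
    return new_x


# Hypothetical graph: a -> b, a -> c, b -> a, c -> b
graph = {"a": ["b", "c"], "b": ["a"], "c": ["b"]}
x0 = {page: 1 / 3 for page in graph}
x1 = pagerank_update(x0, graph)
print({page: round(v, 3) for page, v in x1.items()})
# {'a': 0.333, 'b': 0.5, 'c': 0.167}
```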
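
Definition of PageRank
[Figure: worked example. Yahoo!, CNN, and DB Pub Server each have importance 0.1; their links carry weights 1/2, 1/2, 1, and 1; the resulting importances are 0.05 for Taher and 0.25 for Sep.]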

PageRank Diagram
Initialize all nodes to rank $x_i^{(0)} = \frac{1}{n}$.
[Figure: three-node graph; each node starts at rank 0.333.]

PageRank Diagram
Propagate ranks across links (multiplying by link weights).
[Figure: the values carried along the four links are 0.167, 0.167, 0.333, and 0.333.]

PageRank Diagram
$$x_i^{(1)} = \sum_{j \in B_i} \frac{1}{N_j} x_j^{(0)}$$
[Figure: node ranks after one step: 0.333, 0.5, 0.167.]

PageRank Diagram
[Figure: the values carried along the four links are now 0.167, 0.167, 0.5, and 0.167.]

PageRank Diagram
$$x_i^{(2)} = \sum_{j \in B_i} \frac{1}{N_j} x_j^{(1)}$$
[Figure: node ranks after two steps: 0.5, 0.333, 0.167.]

PageRank Diagram
After a while…
$$x_i = \sum_{j \in B_i} \frac{1}{N_j} x_j$$
[Figure: node ranks settle at 0.4, 0.4, 0.2.]
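As a worked check of the converged diagram, assume (hypothetically, since the slides do not list the edges) the three-node graph a → b, a → c, b → a, c → b, which reproduces every number in the diagrams above. Then $N_a = 2$ and $N_b = N_c = 1$, and the ranks 0.4, 0.4, 0.2 satisfy the definition exactly (and sum to 1):

```latex
\begin{aligned}
x_a &= \tfrac{1}{N_b}\, x_b = 0.4 \\
x_b &= \tfrac{1}{N_a}\, x_a + \tfrac{1}{N_c}\, x_c = \tfrac{1}{2}(0.4) + 0.2 = 0.4 \\
x_c &= \tfrac{1}{N_a}\, x_a = \tfrac{1}{2}(0.4) = 0.2
\end{aligned}
```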

Computing PageRank
Initialize: $x_i^{(0)} = \frac{1}{n}$
Repeat until convergence: $x_i^{(k+1)} = \sum_{j \in B_i} \frac{1}{N_j} x_j^{(k)}$
$x_i$: importance of page i
$B_i$: pages j that link to page i
$N_j$: number of outlinks from page j
$x_j$: importance of page j
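A minimal sketch of the loop above, assuming no dangling pages and using "the L1 change between successive iterates is below `tol`" as the convergence test (one reasonable choice; the slides do not specify one). `compute_pagerank`, `tol`, and `max_iter` are illustrative names, and the graph is the same hypothetical three-node example that matches the diagrams.

```python
def compute_pagerank(out_links, tol=1e-10, max_iter=1000):
    """Iterate x_i^(k+1) = sum over j in B_i of x_j^(k) / N_j.

    Assumes every page has at least one outlink. Stops when the L1 change
    between successive iterates drops below tol.
    Returns (ranks, number_of_iterations).
    """
    n = len(out_links)
    x = {page: 1.0 / n for page in out_links}       # x_i^(0) = 1/n
    for k in range(1, max_iter + 1):
        new_x = {page: 0.0 for page in out_links}
        for j, targets in out_links.items():
            for i in targets:
                new_x[i] += x[j] / len(targets)      # contribution x_j / N_j
        change = sum(abs(new_x[p] - x[p]) for p in out_links)
        x = new_x
        if change < tol:                             # "until convergence"
            return x, k
    return x, max_iter


graph = {"a": ["b", "c"], "b": ["a"], "c": ["b"]}  # hypothetical example graph
ranks, iterations = compute_pagerank(graph)
print({p: round(v, 3) for p, v in ranks.items()})
# {'a': 0.4, 'b': 0.4, 'c': 0.2}
```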
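
Matrix Notation
$$x_i = \sum_{j \in B_i} \frac{1}{N_j} x_j \quad\Longleftrightarrow\quad x = P^T x$$
where $P^T$ has entry $1/N_j$ in row i, column j if page j links to page i, and 0 otherwise.
[Figure: numerical example of the matrix-vector product $P^T x$.]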

Matrix Notation
Find $x$ that satisfies: $x = P^T x$
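A small sketch of the matrix form, assuming no dangling pages: it builds $P^T$ densely from the same hypothetical three-node graph and checks that the converged ranks satisfy $x = P^T x$. The helper names `build_p_transpose` and `mat_vec` are illustrative.

```python
def build_p_transpose(out_links, pages):
    """Dense P^T as a list of rows: entry [i][j] is 1/N_j when page j
    links to page i, and 0 otherwise (assumes no dangling pages)."""
    index = {page: k for k, page in enumerate(pages)}
    n = len(pages)
    pt = [[0.0] * n for _ in range(n)]
    for j, targets in out_links.items():
        for i in targets:
            pt[index[i]][index[j]] = 1.0 / len(targets)  # column j spreads x_j
    return pt


def mat_vec(m, v):
    """Plain matrix-vector product m v."""
    return [sum(row[c] * v[c] for c in range(len(v))) for row in m]


pages = ["a", "b", "c"]
graph = {"a": ["b", "c"], "b": ["a"], "c": ["b"]}  # hypothetical example graph
PT = build_p_transpose(graph, pages)
x = [0.4, 0.4, 0.2]
print(PT)              # [[0.0, 1.0, 0.0], [0.5, 0.0, 1.0], [0.5, 0.0, 0.0]]
print(mat_vec(PT, x))  # equals x up to rounding, so x = P^T x
```

Each column of $P^T$ sums to 1 because page j splits its importance evenly over its $N_j$ outlinks. At web scale $P^T$ would be kept in a sparse format rather than as dense rows.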

Power Method
Initialize: $x^{(0)} = \left(\frac{1}{n}, \ldots, \frac{1}{n}\right)^T$
Repeat until convergence: $x^{(k+1)} = P^T x^{(k)}$

A side note
PageRank doesn’t actually use $P^T$. Instead, it uses $A = cP^T + (1-c)E^T$.
So the PageRank problem is really:
Find $x$ that satisfies: $x = Ax$
not:
Find $x$ that satisfies: $x = P^T x$
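The slide does not define c or E. A common choice, assumed here, is a damping factor 0 < c < 1 (0.85 below is only an illustrative value) and a uniform E whose entries are all 1/n, so $(1-c)E^T$ adds $(1-c)/n$ to every entry. This sketch (the function name `google_matrix` is illustrative) builds A densely for the hypothetical three-node graph and checks that every column still sums to 1.

```python
def google_matrix(pt, c=0.85):
    """A = c * P^T + (1 - c) * E^T, assuming E is the uniform n x n
    matrix with every entry 1/n (so E^T = E)."""
    n = len(pt)
    return [[c * pt[i][j] + (1 - c) / n for j in range(n)] for i in range(n)]


# P^T for the hypothetical graph a -> b, a -> c, b -> a, c -> b
# (rows and columns ordered a, b, c).
PT = [[0.0, 1.0, 0.0],
      [0.5, 0.0, 1.0],
      [0.5, 0.0, 0.0]]
A = google_matrix(PT)
column_sums = [sum(A[i][j] for i in range(3)) for j in range(3)]
print([round(s, 12) for s in column_sums])  # [1.0, 1.0, 1.0]
```

Under this assumption every entry of A is positive and each column sums to 1, so by Perron-Frobenius the eigenvalue 1 is simple and dominant, which is what the power method relies on.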

Power Method
And the algorithm is really . . .
Initialize: $x^{(0)} = \left(\frac{1}{n}, \ldots, \frac{1}{n}\right)^T$
Repeat until convergence: $x^{(k+1)} = A x^{(k)}$
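Forming A densely is out of the question at web scale, because $E^T$ is a full matrix. A sketch of the usual workaround, under the same assumptions as before (uniform E, no dangling pages, illustrative c = 0.85): if the entries of $x^{(k)}$ sum to 1, then $E^T x^{(k)}$ is 1/n in every entry, so $x^{(k+1)} = cP^T x^{(k)}$ plus $(1-c)/n$ in each entry, and only the sparse links are touched. `power_method` is an illustrative name.

```python
def power_method(out_links, c=0.85, tol=1e-10, max_iter=1000):
    """x^(k+1) = A x^(k) with A = c P^T + (1 - c) E^T, assuming uniform E
    and no dangling pages. E is never formed: because sum(x) stays 1,
    E^T x is simply 1/n in every entry."""
    pages = list(out_links)
    n = len(pages)
    x = {p: 1.0 / n for p in pages}                # x^(0) = (1/n, ..., 1/n)^T
    for k in range(1, max_iter + 1):
        new_x = {p: (1 - c) / n for p in pages}    # (1 - c) E^T x^(k)
        for j, targets in out_links.items():
            share = c * x[j] / len(targets)        # c P^T x^(k), link by link
            for i in targets:
                new_x[i] += share
        change = sum(abs(new_x[p] - x[p]) for p in pages)
        x = new_x
        if change < tol:
            return x, k
    return x, max_iter


graph = {"a": ["b", "c"], "b": ["a"], "c": ["b"]}  # hypothetical example graph
ranks, iterations = power_method(graph)
print({p: round(v, 4) for p, v in ranks.items()}, iterations)
```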
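
Power Method
Express $x^{(0)}$ in terms of the eigenvectors of A.
[Figure sequence: bar charts of the coefficients of each iterate along $u_1, \ldots, u_5$. For $x^{(0)}$ they are $1, \alpha_2, \ldots, \alpha_5$; for $x^{(1)}$, $1, \alpha_2\lambda_2, \ldots, \alpha_5\lambda_5$; for $x^{(2)}$, $1, \alpha_2\lambda_2^2, \ldots, \alpha_5\lambda_5^2$; for $x^{(k)}$, $1, \alpha_2\lambda_2^k, \ldots, \alpha_5\lambda_5^k$; for $x^{(\infty)}$, only the $u_1$ component remains.]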

Why does it work?
Imagine our n × n matrix A has n linearly independent eigenvectors $u_i$:
$$A u_i = \lambda_i u_i$$
Then you can write any n-dimensional vector as a linear combination of the eigenvectors of A:
$$x^{(0)} = u_1 + \alpha_2 u_2 + \cdots + \alpha_n u_n$$
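
Why does it work?
From the last slide:
$$x^{(0)} = u_1 + \alpha_2 u_2 + \cdots + \alpha_n u_n$$
To get the first iterate, multiply $x^{(0)}$ by A:
$$x^{(1)} = A x^{(0)} = A u_1 + \alpha_2 A u_2 + \cdots + \alpha_n A u_n = \lambda_1 u_1 + \alpha_2 \lambda_2 u_2 + \cdots + \alpha_n \lambda_n u_n$$
The first eigenvalue is 1, and all the others are less than 1 in magnitude:
$$\lambda_1 = 1 > |\lambda_2| \ge |\lambda_3| \ge \cdots \ge |\lambda_n|$$
Therefore:
$$x^{(1)} = u_1 + \alpha_2 \lambda_2 u_2 + \cdots + \alpha_n \lambda_n u_n$$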
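A small numeric check of this argument, using a hypothetical 2 × 2 column-stochastic matrix chosen so its eigenpairs are easy to write down: $A = \begin{pmatrix} 0.9 & 0.2 \\ 0.1 & 0.8 \end{pmatrix}$ has $\lambda_1 = 1$ with $u_1 = (2/3, 1/3)$ and $\lambda_2 = 0.7$ with $u_2 = (1, -1)$. Then $x^{(0)} = (1, 0) = u_1 + \tfrac{1}{3} u_2$, so the expansion predicts $x^{(k)} = u_1 + \tfrac{1}{3}(0.7)^k u_2$. The sketch compares that prediction with repeated multiplication at every step.

```python
def mat_vec(m, v):
    """Plain matrix-vector product m v."""
    return [sum(row[c] * v[c] for c in range(len(v))) for row in m]


# Hypothetical example matrix with known eigenpairs.
A = [[0.9, 0.2],
     [0.1, 0.8]]
u1, lam1 = [2 / 3, 1 / 3], 1.0
u2, lam2 = [1.0, -1.0], 0.7
alpha2 = 1 / 3                     # x^(0) = u1 + alpha2 * u2 = (1, 0)

x = [1.0, 0.0]
for k in range(1, 21):
    x = mat_vec(A, x)              # x^(k) = A x^(k-1)
    # Eigen-expansion: lam1**k * u1 + alpha2 * lam2**k * u2
    predicted = [lam1 ** k * u1[i] + alpha2 * lam2 ** k * u2[i] for i in range(2)]
    assert all(abs(x[i] - predicted[i]) < 1e-12 for i in range(2))

print([round(v, 4) for v in x])    # u1 plus a remainder of size alpha2 * 0.7**20
```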
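
Power Method
$$x^{(0)} = u_1 + \alpha_2 u_2 + \cdots + \alpha_n u_n$$
$$x^{(1)} = u_1 + \alpha_2 \lambda_2 u_2 + \cdots + \alpha_n \lambda_n u_n$$
$$x^{(2)} = u_1 + \alpha_2 \lambda_2^2 u_2 + \cdots + \alpha_n \lambda_n^2 u_n$$
[Figure: bar charts of the coefficients of $x^{(0)}$, $x^{(1)}$, and $x^{(2)}$ along $u_1, \ldots, u_5$.]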

Convergence
$$x^{(k)} = u_1 + \alpha_2 \lambda_2^k u_2 + \cdots + \alpha_n \lambda_n^k u_n$$
The smaller $|\lambda_2|$, the faster the power method converges.
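A back-of-the-envelope sketch of this claim: if the error is dominated by the $\alpha_2 \lambda_2^k u_2$ term, shrinking it by a factor `tol` takes about $k = \lceil \log(\text{tol}) / \log|\lambda_2| \rceil$ iterations. This ignores $\alpha_2$ and the remaining terms, so it is an estimate, not a bound; the $|\lambda_2|$ values and the name `iterations_needed` are illustrative.

```python
import math


def iterations_needed(lam2_abs, tol):
    """Smallest k with |lambda_2|**k <= tol, i.e. ceil(log(tol) / log|lambda_2|).
    Assumes 0 < |lambda_2| < 1 and 0 < tol < 1."""
    return math.ceil(math.log(tol) / math.log(lam2_abs))


for lam2_abs in (0.5, 0.85, 0.99):
    print(lam2_abs, iterations_needed(lam2_abs, 1e-8))
```

With tol = 1e-8 this gives 27, 114, and 1833 iterations for $|\lambda_2|$ = 0.5, 0.85, and 0.99. Since the second-eigenvalue note cited in this deck bounds $|\lambda_2|$ by c, it is one way to see why large c (such as c = 0.99) can make the plain power method slow.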
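
Our Approach
Estimate the components of the current iterate in the directions of the second and third eigenvectors ($u_2$ and $u_3$), and eliminate them.
[Figure: bar chart of the current iterate's components along $u_1, \ldots, u_5$.]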

Why this approach?
For traditional problems: A is smaller and often dense, and $|\lambda_2|$ is often close to $\lambda_1$, making the power method slow.
In our problem, A is huge and sparse. More importantly, $|\lambda_2|$ is small: $|\lambda_2| \le c$ [1].
Therefore, the power method is actually much faster than other methods here.
[1] “The Second Eigenvalue of the Google Matrix,” dbpubs.stanford.edu/pub/2003-20.

Using Successive Iterates
[Figure sequence: the iterates $x^{(0)}$, $x^{(1)}$, $x^{(2)}$ approach $u_1$ step by step; the final frame marks an extrapolated point $x' = u_1$. Each frame also shows bar charts of the components along $u_1, \ldots, u_5$.]

How do we do this?
Assume $x^{(k)}$ can be written as a linear combination of the first three eigenvectors ($u_1$, $u_2$, $u_3$) of A.
Compute an approximation to the components along $\{u_2, u_3\}$, and subtract it from $x^{(k)}$ to get $x^{(k)\prime}$.
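
Assume
Assume that $x^{(k)}$ can be represented by the first 3 eigenvectors of A:
$$x^{(k)} = u_1 + \alpha_2 u_2 + \alpha_3 u_3$$
$$x^{(k+1)} = A x^{(k)} = u_1 + \alpha_2 \lambda_2 u_2 + \alpha_3 \lambda_3 u_3$$
$$x^{(k+2)} = u_1 + \alpha_2 \lambda_2^2 u_2 + \alpha_3 \lambda_3^2 u_3$$
$$x^{(k+3)} = u_1 + \alpha_2 \lambda_2^3 u_2 + \alpha_3 \lambda_3^3 u_3$$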
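
Linear Combination
Let’s take some linear combination of these 3 iterates:
$$\begin{aligned}
\beta_1 x^{(k+1)} + \beta_2 x^{(k+2)} + \beta_3 x^{(k+3)}
  &= \beta_1 (u_1 + \alpha_2 \lambda_2 u_2 + \alpha_3 \lambda_3 u_3) \\
  &\quad + \beta_2 (u_1 + \alpha_2 \lambda_2^2 u_2 + \alpha_3 \lambda_3^2 u_3) \\
  &\quad + \beta_3 (u_1 + \alpha_2 \lambda_2^3 u_2 + \alpha_3 \lambda_3^3 u_3)
\end{aligned}$$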
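
Rearranging Terms
We can rearrange the terms to get:
$$\begin{aligned}
\beta_1 x^{(k+1)} + \beta_2 x^{(k+2)} + \beta_3 x^{(k+3)}
  &= (\beta_1 + \beta_2 + \beta_3)\, u_1 \\
  &\quad + \alpha_2 (\beta_1 \lambda_2 + \beta_2 \lambda_2^2 + \beta_3 \lambda_2^3)\, u_2 \\
  &\quad + \alpha_3 (\beta_1 \lambda_3 + \beta_2 \lambda_3^2 + \beta_3 \lambda_3^3)\, u_3
\end{aligned}$$
Goal: Find $\beta_1, \beta_2, \beta_3$ so that the coefficients of $u_2$ and $u_3$ are 0 and the coefficient of $u_1$ is 1.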
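If $\lambda_2$ and $\lambda_3$ were known, the goal could be met in closed form; in practice they are not, which is why they have to be estimated from the iterates. Writing $q(\lambda) = \beta_1 + \beta_2 \lambda + \beta_3 \lambda^2$ (notation introduced here), the three conditions read $q(1) = 1$, $\alpha_2 \lambda_2\, q(\lambda_2) = 0$, and $\alpha_3 \lambda_3\, q(\lambda_3) = 0$. So it suffices for q to vanish at $\lambda_2$ and $\lambda_3$ (assuming both differ from 1):

```latex
\begin{aligned}
q(\lambda) &= \beta_3 (\lambda - \lambda_2)(\lambda - \lambda_3), \qquad
\beta_3 = \frac{1}{(1 - \lambda_2)(1 - \lambda_3)}, \\
\beta_2 &= -\beta_3 (\lambda_2 + \lambda_3), \qquad
\beta_1 = \beta_3 \lambda_2 \lambda_3, \\
u_1 &= \beta_1 x^{(k+1)} + \beta_2 x^{(k+2)} + \beta_3 x^{(k+3)}
\end{aligned}
```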

Summary
We make an assumption about the current iterate.
Solve for the dominant eigenvector as a linear combination of the next three iterates.
We use a few iterations of the Power Method to “clean it up”.
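A minimal Python sketch of one extrapolation step under the slides' assumption. The slides do not say how the unknown $\lambda_2, \lambda_3$ are estimated; the choice below is an assumption of this sketch. Under the assumption, the differences $d^{(j)} = x^{(j+1)} - x^{(j)}$ contain only $u_2$ and $u_3$ terms, so they satisfy $d^{(k+2)} + a\,d^{(k+1)} + b\,d^{(k)} = 0$, where $\lambda^2 + a\lambda + b$ has roots $\lambda_2, \lambda_3$. Fitting a and b by least squares and making $\beta_1 + \beta_2\lambda + \beta_3\lambda^2$ proportional to $\lambda^2 + a\lambda + b$ with $\beta_1 + \beta_2 + \beta_3 = 1$ meets the goal. `quadratic_extrapolation` and `step` are illustrative names, and the test graph is the same hypothetical three-node example (undamped).

```python
def quadratic_extrapolation(x0, x1, x2, x3):
    """One extrapolation step from four successive iterates x^(k), ..., x^(k+3).

    Assumes x^(k) lies in span{u1, u2, u3} with lambda_1 = 1. The fit of a, b
    by least squares is this sketch's own choice. Returns
    beta1 * x^(k+1) + beta2 * x^(k+2) + beta3 * x^(k+3).
    """
    d0 = [q - p for p, q in zip(x0, x1)]   # d^(k)   = x^(k+1) - x^(k)
    d1 = [q - p for p, q in zip(x1, x2)]   # d^(k+1) = x^(k+2) - x^(k+1)
    d2 = [q - p for p, q in zip(x2, x3)]   # d^(k+2) = x^(k+3) - x^(k+2)

    def dot(u, v):
        return sum(p * q for p, q in zip(u, v))

    # Least squares for d2 + a*d1 + b*d0 ~= 0 via the 2 x 2 normal equations
    # [d1.d1  d1.d0] [a]   [-d1.d2]
    # [d0.d1  d0.d0] [b] = [-d0.d2], solved by Cramer's rule.
    m11, m12, m22 = dot(d1, d1), dot(d1, d0), dot(d0, d0)
    r1, r2 = -dot(d1, d2), -dot(d0, d2)
    det = m11 * m22 - m12 * m12             # zero if d0 and d1 are parallel
    a = (r1 * m22 - m12 * r2) / det
    b = (m11 * r2 - m12 * r1) / det

    # beta1 + beta2*lam + beta3*lam^2 = beta3 * (lam^2 + a*lam + b),
    # scaled so the coefficient of u1 (beta1 + beta2 + beta3) equals 1.
    beta3 = 1.0 / (1.0 + a + b)
    beta2, beta1 = a * beta3, b * beta3
    return [beta1 * p + beta2 * q + beta3 * r for p, q, r in zip(x1, x2, x3)]


def step(x):
    """x^(k+1) = P^T x^(k) for the hypothetical graph a->b, a->c, b->a, c->b."""
    a, b, c = x
    return [b, a / 2 + c, a / 2]


xs = [[1 / 3, 1 / 3, 1 / 3]]
for _ in range(3):
    xs.append(step(xs[-1]))
x_new = quadratic_extrapolation(*xs)
print([round(v, 6) for v in x_new])  # [0.4, 0.4, 0.2]
```

On this 3 × 3 example the assumption holds exactly (there are only three eigenvectors), so one step lands on (0.4, 0.4, 0.2). On a real web graph it holds only approximately, which is why the slides follow extrapolation with a few power-method iterations; the least-squares fit also degenerates (det = 0) once the differences become parallel or negligible.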

Results
Quadratic Extrapolation speeds up convergence. Extrapolation was only used 5 times!

Results
Extrapolation dramatically speeds up convergence for high values of c (c = 0.99).

Take-home message
Speeds up PageRank by a fair amount, but not by enough for true Personalized PageRank.
Ideas are useful for further speedup algorithms.
Quadratic Extrapolation can be used for a whole class of problems.

The End
Paper available at http://dbpubs.stanford.edu/pub/2003-16