Estimating PageRank on Graph Streams
Atish Das Sarma (Georgia Tech), Sreenivas Gollapudi, Rina Panigrahy (Microsoft Research)
PageRank
• PageRank – Determine Ranking of nodes in graphs
• Typically large graphs - WWW, Social Networks
• Run daily by commercial search engines
PageRank computation
[Figure: graph with nodes u, a, b, c]
PageRank Computation
Our Approach: No Matrix-Vector Multiplication!
Our Result
Many Random Walk Samples, Efficiently.
Approximate PageRank
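The slides do not spell out how walk samples turn into PageRank values. A minimal in-memory sketch of one standard estimator follows: start each walk at a uniformly random node, stop it with a reset probability at every step, and count where walks end. The function name `estimate_pagerank`, the reset probability 0.15, and the handling of dangling nodes are assumptions made for illustration, not details taken from the talk.

```python
import random
from collections import Counter

def estimate_pagerank(out_edges, num_walks=20000, reset_prob=0.15, seed=0):
    """Estimate PageRank as the end-point distribution of reset-terminated walks.

    out_edges: dict mapping every node to a list of its out-neighbors.
    """
    rng = random.Random(seed)
    nodes = list(out_edges)
    ends = Counter()
    for _ in range(num_walks):
        node = rng.choice(nodes)              # uniform random start
        while rng.random() > reset_prob:      # continue with prob. 1 - reset_prob
            nbrs = out_edges[node]
            # Assumption: a dangling node jumps to a uniform random node.
            node = rng.choice(nbrs) if nbrs else rng.choice(nodes)
        ends[node] += 1
    return {v: ends[v] / num_walks for v in nodes}

graph = {"a": ["b"], "b": ["a"], "c": ["a"]}
ranks = estimate_pagerank(graph)
print(sorted(ranks, key=ranks.get, reverse=True))
```

No matrix-vector product appears anywhere: the accuracy is governed by the number of walks, which is why producing many walk samples cheaply is the core problem.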
Other results from Random Walks
We can estimate:
• Mixing Time
• Conductance
Using Streams
[Figure: graph G with node u]
Streaming
e1, e2, e3, e4, e5, e6, e7, ….
Input is a “stream”
Small RAM working memory
Few Passes
Frequency moments, quantiles
Graphs: Edges, arbitrary order
Related Work
• Sparsifiers (Benczur-Karger 96, Spielman-Teng 01, Spielman-Srivastava 08)
  – Given an undirected graph, produces a sparse one
  – approximately preserves x’Lx
  – Can be used to compute sparse cuts
• Streaming version of BK96 (Ahn, Guha 09)
  – Sparse cuts in 1 pass and $\tilde{O}(n)$ space.
• Accelerated PageRank (McSherry 08)
  – heuristics
Key Idea
One walk from u, of length l, efficiently
Later extend to many walks
[Figure: walk from u to v_l]
Single Random Walk - Naive Algo.
One step with every pass!
Constant space, l passes
[Figure: node s]
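The naive algorithm can be sketched as follows: each pass over the stream picks one uniformly random out-edge of the current node by size-1 reservoir sampling, so only the current node, one candidate, and a counter are kept besides the walk itself. The name `naive_stream_walk` and the convention that `edge_stream` is a callable returning a fresh iterator of `(u, v)` pairs per pass are assumptions for illustration.

```python
import random

def naive_stream_walk(edge_stream, start, length, seed=0):
    """Walk `length` steps, spending one full pass over the edges per step."""
    rng = random.Random(seed)
    walk = [start]
    for _ in range(length):
        current, chosen, seen = walk[-1], None, 0
        for u, v in edge_stream():            # one pass over the stream
            if u == current:
                seen += 1
                # Size-1 reservoir: the i-th out-edge replaces the choice w.p. 1/i.
                if rng.randrange(seen) == 0:
                    chosen = v
        if chosen is None:                    # no out-edges: the walk cannot continue
            break
        walk.append(chosen)
    return walk

edges = [(0, 1), (1, 2), (2, 0)]
print(naive_stream_walk(lambda: iter(edges), start=0, length=4))
```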
Second Naive Algo
Single pass
Sample sufficient edges!
If [...], then sample 2 out-edges from each node.
(store order)
[Figure: node s]
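A sketch of the single-pass idea, under the assumption that it generalizes to keeping l independent out-edge samples per node for a walk of length l (the slide itself only mentions 2 out-edges under a condition that did not survive in the transcript). The stored order matters: the k-th visit to a node consumes its k-th sample. The function names are hypothetical, and the memory used is proportional to n times l.

```python
import random

def sample_out_edges(edge_stream, length, seed=0):
    """One pass: keep `length` independent uniform out-edge samples per node."""
    rng = random.Random(seed)
    seen = {}       # node -> number of out-edges seen so far
    samples = {}    # node -> ordered list of `length` sampled targets
    for u, v in edge_stream:
        seen[u] = seen.get(u, 0) + 1
        slots = samples.setdefault(u, [None] * length)
        for j in range(length):
            # Each slot is its own size-1 reservoir, so the slots are independent.
            if rng.randrange(seen[u]) == 0:
                slots[j] = v
    return samples

def walk_from_samples(samples, start, length):
    """Replay the walk offline; the k-th visit to a node uses its k-th sample."""
    visits = {}
    walk = [start]
    for _ in range(length):
        node = walk[-1]
        if node not in samples:               # node has no out-edges
            break
        k = visits.get(node, 0)
        visits[node] = k + 1
        walk.append(samples[node][k])
    return walk

edges = [(0, 1), (1, 2), (2, 0)]
print(walk_from_samples(sample_out_edges(iter(edges), 4), start=0, length=4))
```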
Comparison
Naive (single walk):
Our Result:
In fact walks!
[Figure: walk of length l from u]
Automatically:
Insight: Merge Short Walks
Sample fraction of nodes (centers)
passes - length walks
Merge and extend short walks!
Two problems:
• End up at a node a second time
• End up at a non-sampled node
[Figure: node s, sampled centers w, nodes a and b]
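The merging step and its two failure cases can be sketched as below. The sketch assumes the short walks have already been collected and that each center stores exactly one of them; `stitch_short_walks`, the dictionary layout, and the status strings are hypothetical names chosen for illustration. A stored walk is never reused, since replaying it would repeat the same random choices.

```python
def stitch_short_walks(short_walks, start, length):
    """Concatenate stored short walks until `length` steps or until stuck.

    short_walks maps each sampled center to the nodes visited by its one
    stored short walk (the center itself excluded).
    Returns (walk, status) with status in {"done", "reused", "not_sampled"}.
    """
    walk = [start]
    used = set()
    while len(walk) - 1 < length:
        node = walk[-1]
        if node not in short_walks:
            return walk, "not_sampled"        # ended at a non-sampled node
        if node in used:
            return walk, "reused"             # ended at a center a second time
        used.add(node)
        walk.extend(short_walks[node])
    return walk[:length + 1], "done"

centers = {"s": ["a", "w1"], "w1": ["b", "w2"], "w2": ["c", "s"]}
print(stitch_short_walks(centers, "s", 4))
print(stitch_short_walks(centers, "s", 7))
```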
Stuck Nodes
Sample an edge from stuck.
Again. And again...
Slow?
If new nodes, good in passes!
[Figure: node s and sampled centers w]
Stuck nodes
Stuck on the same nodes?
Sample s edges from each
s progress OR new node!
Must add previously seen centers to the set
[Figure: node s and sampled centers w]
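One way to read the "s progress OR new node" guarantee is sketched below: in a single pass, draw a fixed number of out-edge samples for every node in the stuck set, then walk on those samples until the walk either leaves the set or exhausts the samples of some node. Exhausting a node means it was visited that many times, so at least that many steps were made. The name `escape_stuck_set` and its parameters are assumptions; `num_samples` plays the role of s on the slide.

```python
import random

def escape_stuck_set(edge_stream, stuck_set, current, num_samples, seed=0):
    """One pass of sampling, then walk until a new node or `num_samples` steps."""
    rng = random.Random(seed)
    seen = {u: 0 for u in stuck_set}
    samples = {u: [None] * num_samples for u in stuck_set}
    for u, v in edge_stream:                  # single pass over the stream
        if u in seen:
            seen[u] += 1
            for j in range(num_samples):      # independent size-1 reservoirs
                if rng.randrange(seen[u]) == 0:
                    samples[u][j] = v
    used = {u: 0 for u in stuck_set}
    steps = [current]
    # Continue while inside the stuck set and the node still has unused samples.
    while (current in stuck_set and seen[current] > 0
           and used[current] < num_samples):
        nxt = samples[current][used[current]]
        used[current] += 1
        steps.append(nxt)
        current = nxt
    return steps

edges = [(0, 1), (1, 0), (1, 2)]
print(escape_stuck_set(iter(edges), {0, 1}, current=0, num_samples=3))
```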
Summary
[Figure: node s and sampled centers w]
• Perform short walks from sampled centers
• Concatenate walks until stuck
• Sample edges from stuck
• Make local progress until new node
• Local progress = s
• New node : center with prob
• Amortized progress, every pass
Summary
Total number of passes :
Total Space :
Summary
Set
Number of passes =
Space =
Many Walks
Naive Space Bound:
Observation: Many short walks not used in Single RW.
We show: $\tilde{O}(n)$ for $K \le n/l$
Many Random Walks
• $r_i$: probability node w’s short walk used in single RW.
• If $r_i$ known: save a lot of space!
• Perform K random walks
• Total number of short walks required is about [expression in $K$, $r_i$, $l$]
• Don’t know $r_i$. But can estimate.
Estimating $r_i$
• Run K = (log n) walks of length l
• Gives a crude estimate of $r_i$
• Sufficient to double K
• Continue doubling K
• Gives K walks in space [bound in $n$, $K$, $l$]
• Passes
Distributions
samples
Distribution: u
Space
Passes
Mixing Time, Conductance
• Undirected graphs: Compare Distribution with Steady State.
• Estimating difference: samples. [Batu et al. ’01]
  – approximate mixing time.
• Directed, till distribution “stabilizes”: samples.
• Conductance:
• Recall space for walks: $\tilde{O}(n)$ for $K \le n/l$
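To illustrate comparing the walk distribution with the steady state on an undirected graph, the sketch below measures the L1 distance between the empirical end-point distribution of walks of a given length and the degree-proportional stationary distribution. It is a plain in-memory plug-in estimate, not the sample-efficient tester of Batu et al. and not a streaming algorithm; the function name and parameters are assumptions.

```python
import random
from collections import Counter

def l1_distance_to_steady_state(adj, start, length, num_walks=20000, seed=0):
    """L1 distance between the empirical walk end-point distribution and the
    stationary distribution deg(v) / sum of degrees of an undirected graph."""
    rng = random.Random(seed)
    ends = Counter()
    for _ in range(num_walks):
        node = start
        for _step in range(length):
            node = rng.choice(adj[node])      # uniform random neighbor
        ends[node] += 1
    total_degree = sum(len(nbrs) for nbrs in adj.values())
    return sum(abs(ends[v] / num_walks - len(adj[v]) / total_degree)
               for v in adj)

# Complete graph on 4 nodes: mixes within a few steps.
k4 = {i: [j for j in range(4) if j != i] for i in range(4)}
for steps in (0, 1, 2, 10):
    print(steps, round(l1_distance_to_steady_state(k4, 0, steps), 3))
```

The smallest walk length at which this distance drops below a chosen threshold gives a rough estimate of the mixing time from that start node.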
Results recap
• - Mixing Time for Undirected Graphs :
• Quadratic Approximation to Conductance
• PageRank to accuracy
Space: $\tilde{O}(n)$
Open Questions?
• Improve passes for random walks. In particular, sub-linear space and constant passes.
• Graph Cuts and Graph Sparsification for directed graphs
• Better (streaming) algorithms for computing eigenvectors
Thank You!
Summary
• Perform short walks from sampled centers
• Concatenate walks until stuck
• Sample edges from stuck
• Make local progress until new node
• Local progress = s
• New node = nodes gives center
• Amortized, every pass -
Analysis
• Total number of passes :
• Total Space :
• Set
• Number of passes =
• Space =