Estimating PageRank on Graph Streams

Atish Das Sarma (Georgia Tech), Sreenivas Gollapudi, Rina Panigrahy (Microsoft Research)

Transcript of Estimating PageRank on Graph Streams

Page 1: Estimating PageRank on Graph Streams

Estimating PageRank on Graph Streams

Atish Das Sarma (Georgia Tech), Sreenivas Gollapudi, Rina Panigrahy (Microsoft Research)

Page 2: Estimating PageRank on Graph Streams

PageRank

• PageRank – Determine Ranking of nodes in graphs

• Typically large graphs - WWW, Social Networks

• Run daily by commercial search engines

Page 3: Estimating PageRank on Graph Streams

PageRank computation

[Figure: graph with nodes u, a, b, c]

Page 4: Estimating PageRank on Graph Streams

PageRank Computation

Our Approach: No Matrix-Vector Multiplication!

[Figure: graph with nodes u, a, b, c]

Page 5: Estimating PageRank on Graph Streams

Our Result

Many random walk samples, efficiently.

Approximate PageRank

[Figure: node u]
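The link between random walk samples and PageRank can be sketched as follows. This is a generic Monte Carlo estimator on an in-memory graph, not the streaming algorithm of the talk; the function name `monte_carlo_pagerank`, the reset probability 0.15, the number of walks, and the adjacency-list input are all assumptions made for illustration.

```python
import random
from collections import Counter

def monte_carlo_pagerank(out_edges, walks_per_node=2000, reset_prob=0.15, seed=0):
    """Estimate PageRank from random walk endpoints (no matrix-vector products).

    out_edges: dict mapping node -> list of out-neighbours (assumed input format).
    Each walk stops with probability reset_prob before every step; the
    fraction of walks ending at v estimates PageRank(v).
    """
    rng = random.Random(seed)
    nodes = sorted(out_edges)
    ends = Counter()
    for start in nodes:
        for _ in range(walks_per_node):
            v = start
            while rng.random() >= reset_prob:
                nbrs = out_edges[v]
                if not nbrs:
                    break  # dangling node: stop here (a simplification)
                v = rng.choice(nbrs)
            ends[v] += 1
    total = walks_per_node * len(nodes)
    return {v: ends[v] / total for v in nodes}

# Toy graph: u links to a, b, c and each of them links back to u.
graph = {"u": ["a", "b", "c"], "a": ["u"], "b": ["u"], "c": ["u"]}
pagerank = monte_carlo_pagerank(graph)
print(max(pagerank, key=pagerank.get))
```

On this toy graph the hub u collects the largest share of walk endpoints. The accuracy of the estimate improves with the number of walks, which is why producing many walks cheaply matters.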

Page 6: Estimating PageRank on Graph Streams

Other results from Random Walks

We can estimate:

• Mixing Time

• Conductance

Using Streams

[Figure: graph G with node u]

Page 7: Estimating PageRank on Graph Streams

Streaming

e1, e2, e3, e4, e5, e6, e7, ….

Input is a “stream”

Small RAM working memory

Few Passes

Frequency moments, quantiles

Graphs: Edges, arbitrary order

Page 8: Estimating PageRank on Graph Streams

Related Work

• Sparsifiers (Benczur-Karger 96, Spielman-Teng 01, Spielman-Srivastava 08)
  – Given an undirected graph, produces a sparse one
  – Approximately preserves x’Lx
  – Can be used to compute sparse cuts

• Streaming version of BK96 (Ahn, Guha 09)
  – Sparse cuts in 1 pass and $\tilde{O}(n)$ space

• Accelerated PageRank (McSherry 08)
  – heuristics

Page 9: Estimating PageRank on Graph Streams

Key Idea

One walk from u of length l, efficiently

Later extend to many walks

[Figure: walk of length l from u to v]

Page 10: Estimating PageRank on Graph Streams

Single Random Walk - Naive Algo.

One step with every pass!

Constant space, one pass per step of the walk

[Figure: walk starting at node s]
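A minimal sketch of this naive algorithm, assuming the stream can be replayed through a hypothetical `make_stream` callable that yields `(src, dst)` edges. Each step of the walk makes one full pass and picks a uniform out-edge of the current node by reservoir sampling, so only a constant number of values is kept in memory.

```python
import random

def naive_stream_walk(make_stream, start, length, seed=0):
    """Random walk that spends one pass over the edge stream per step.

    make_stream: hypothetical callable returning a fresh iterator over
    (src, dst) edges in arbitrary order. Returns (walk, passes).
    """
    rng = random.Random(seed)
    walk, passes = [start], 0
    current = start
    for _ in range(length):
        chosen, seen = None, 0
        for src, dst in make_stream():        # one full pass
            if src == current:
                seen += 1
                if rng.randrange(seen) == 0:  # reservoir sampling, size 1
                    chosen = dst
        passes += 1
        if chosen is None:
            break                             # current node has no out-edges
        current = chosen
        walk.append(current)
    return walk, passes

edges = [("u", "a"), ("u", "b"), ("u", "c"), ("a", "u"), ("b", "u"), ("c", "u")]
walk, passes = naive_stream_walk(lambda: iter(edges), "u", 5)
print(walk, passes)
```

The pass count grows linearly with the walk length, which is the cost the rest of the talk sets out to reduce.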

Page 11: Estimating PageRank on Graph Streams

Second Naive Algo

Single pass: sample sufficient edges!

If [...], then sample 2 out-edges from each node (store order).

[Figure: walk starting at node s]
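The single-pass idea can be sketched like this, under the assumption that "sufficient" means one independently sampled out-edge per node for every step the walk might take. The function name `single_pass_walk` and the tuple-based stream are illustrative. The samples are stored in order, and the j-th visit to a node consumes its j-th stored sample.

```python
import random

def single_pass_walk(edge_stream, start, length, seed=0):
    """One pass over the stream, then replay the walk from stored samples.

    Keeps `length` independent uniform out-edge samples per node, so the
    memory is proportional to (number of nodes) * length.
    """
    rng = random.Random(seed)
    samples = {}  # node -> list of `length` sampled out-neighbours
    counts = {}   # node -> number of out-edges seen so far
    for src, dst in edge_stream:              # the single pass
        counts[src] = counts.get(src, 0) + 1
        slots = samples.setdefault(src, [None] * length)
        for j in range(length):
            if rng.randrange(counts[src]) == 0:  # independent reservoir per slot
                slots[j] = dst
    visits = {}   # node -> how many stored samples were already consumed
    walk, current = [start], start
    for _ in range(length):
        if current not in samples:
            break                             # no out-edges
        j = visits.get(current, 0)
        visits[current] = j + 1
        current = samples[current][j]
        walk.append(current)
    return walk

edges = [("u", "a"), ("u", "b"), ("u", "c"), ("a", "u"), ("b", "u"), ("c", "u")]
print(single_pass_walk(iter(edges), "u", 4))
```

This trades passes for space: a single pass suffices, but every node carries as many samples as the walk is long.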

Page 12: Estimating PageRank on Graph Streams

Comparison

Naive (single walk):

Our Result:

In fact, [...] walks!

Automatically:

[Figure: walk of length l from u]

Page 13: Estimating PageRank on Graph Streams

Insight: Merge Short Walks

Sample a fraction of nodes (centers)

[...] passes, length-[...] walks

Merge and extend short walks!

Two problems:

• End up at a node a second time

• End up at a non-sampled node

[Figure: walk from s assembled from short walks of length w, with nodes a and b]
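The merging step and the two ways it can get stuck can be sketched as follows. The short walks are given here as an in-memory dict for illustration (in the streaming setting they would be built over several passes), and the name `merge_short_walks` and the returned status strings are assumptions. A short walk is used at most once, since reusing it would repeat the same random choices.

```python
def merge_short_walks(short_walks, start, target_len):
    """Concatenate precomputed short walks into one longer walk.

    short_walks: dict mapping each sampled center to the short walk starting
    there, given as a list of nodes (the center is the first entry).
    Returns (walk, status); status names the reason the merge stopped.
    """
    walk, used = [start], set()
    while len(walk) - 1 < target_len:
        end = walk[-1]
        if end not in short_walks:
            return walk, "non-sampled node"         # problem 2
        if end in used:
            return walk, "short walk already used"  # problem 1
        used.add(end)
        walk.extend(short_walks[end][1:])           # append all but the center
    return walk[: target_len + 1], "done"

# Directed 4-cycle 0 -> 1 -> 2 -> 3 -> 0, centers 0 and 2, short walks of length 2.
short_walks = {0: [0, 1, 2], 2: [2, 3, 0]}
print(merge_short_walks(short_walks, 0, 3))
print(merge_short_walks(short_walks, 0, 10))
print(merge_short_walks({0: [0, 1, 2]}, 0, 10))
```

The second and third calls stop early, one at a center whose walk was already consumed and one at a node that was never sampled as a center.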

Page 14: Estimating PageRank on Graph Streams

Stuck Nodes

Sample an edge from the stuck node.

Again. And again...

Slow?

If new nodes, good in [...] passes!

[Figure: walk from s assembled from short walks of length w]

Page 15: Estimating PageRank on Graph Streams

Stuck nodes

Stuck on the same nodes?

Sample s edges from each

s progress OR new node!

The set must include previously seen centers

[Figure: walk from s assembled from short walks of length w, with s sampled edges at the stuck nodes]

Page 16: Estimating PageRank on Graph Streams

Summary

• Perform short walks from sampled centers

• Concatenate walks until stuck

• Sample edges from stuck nodes

• Make local progress until new node

• Local progress = s

• New node: center with prob. [...]

• Amortized progress, every pass: [...]

Page 17: Estimating PageRank on Graph Streams

Summary

Total number of passes:

Total Space:

Page 18: Estimating PageRank on Graph Streams

Summary

Set

Number of passes =

Space =

Page 19: Estimating PageRank on Graph Streams

Many Walks

Naive space bound:

Observation: many short walks are not used in a single RW.

We show: $\tilde{O}(n)$ for $K$ [...] $n/l$

Page 20: Estimating PageRank on Graph Streams

Many Random Walks

• $r_i$: probability that node $i$’s short walk is used in a single RW.

• If $r_i$ known: save a lot of space!

• Perform K random walks

• Total number of short walks required is about [...]

• Don’t know $r_i$. But can estimate.

Page 21: Estimating PageRank on Graph Streams

Estimating $r_i$

• Run K = (log n) walks of length l

• Gives a crude estimate of $r_i$

• Sufficient to double K

• Continue doubling K

• Gives K walks in space [...]

• Passes [...]

Page 22: Estimating PageRank on Graph Streams

Distributions

samples

Distribution: u

Space

Passes

Page 23: Estimating PageRank on Graph Streams

Mixing Time, Conductance

• Undirected graphs: compare distribution with steady state.

• Estimating difference: [...] samples. [Batu et al. ’01]
  – approximate mixing time.

• Directed, till distribution “stabilizes”: [...] samples.

• Conductance:

• Recall space for walks: $\tilde{O}(n)$ for $K$ [...] $n/l$
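For undirected graphs the steady state is known in closed form (a node's probability is proportional to its degree), so the comparison can be sketched directly from walk samples. The sketch below uses a plain empirical L1 distance on an in-memory graph, not the sample-efficient tester of Batu et al., and the function name `l1_to_stationary` and the sample count are assumptions.

```python
import random
from collections import Counter

def l1_to_stationary(adj, start, length, samples=20000, seed=0):
    """Empirical L1 distance between the length-step walk distribution
    from `start` and the stationary distribution deg(v) / (2m).

    adj: dict mapping node -> list of neighbours of an undirected graph.
    """
    rng = random.Random(seed)
    ends = Counter()
    for _ in range(samples):
        v = start
        for _ in range(length):
            v = rng.choice(adj[v])
        ends[v] += 1
    two_m = sum(len(nbrs) for nbrs in adj.values())  # sum of degrees = 2m
    return sum(abs(ends[v] / samples - len(adj[v]) / two_m) for v in adj)

# Triangle 0-1-2 with a pendant node 3 attached to node 2 (not bipartite).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
for steps in (1, 5, 20):
    print(steps, round(l1_to_stationary(adj, 3, steps), 3))
```

The distance shrinks as the walk length grows; the smallest length at which it falls below a chosen threshold gives an approximation of the mixing time from that start node.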

Page 24: Estimating PageRank on Graph Streams

Results recap

• [...]-Mixing Time for Undirected Graphs:

• Quadratic Approximation to Conductance

• PageRank to accuracy [...]

Space: $\tilde{O}(n)$

Page 25: Estimating PageRank on Graph Streams

Open Questions?

• Improve passes for random walks. In particular, sub-linear space and constant passes.

• Graph Cuts and Graph Sparsification for directed graphs

• Better (streaming) algorithms for computing eigenvectors

Page 26: Estimating PageRank on Graph Streams

Thank You!

Page 27: Estimating PageRank on Graph Streams

Summary

• Perform short walks from sampled centers

• Concatenate walks until stuck

• Sample edges from stuck nodes

• Make local progress until new node

• Local progress = s

• New node = nodes gives center

• Amortized, every pass -

Page 29: Estimating PageRank on Graph Streams

Analysis

• Total number of passes:

• Total Space:

• Set

• Number of passes =

• Space =