Connectivity Structure of Bipartite Graphs via the KNC-Plot


Page 1: Connectivity Structure of Bipartite Graphs via the KNC-Plot


Connectivity Structure of Bipartite Graphs via the KNC-Plot

Erik Vee

joint work with

Ravi Kumar, Andrew Tomkins

Page 2

The fundamental question…

• Given a graph with millions or billions of nodes, how do we understand it?

Page 3

Macroscopic Success Stories

• Given a graph with millions or billions of nodes, how do we understand it?

• Spectral graph analysis

– Eigenvalues give intuition about mixing time and connectivity

• Conductance of a graph

• Degree distribution

Page 4

Macroscopic models of graphs: understanding connectivity

Bow-tie model [Broder et al.]: Web graph

Jellyfish model [Faloutsos et al.]: Internet AS graph

No equivalent model for bipartite graphs

Page 5

Our Goals

• Develop macroscopic tools to analyze social networks

– Massive networks

– What are simple, easy-to-understand properties?

– Today: the KNC-plot for bipartite graphs

• Given an implicit graph representation, do something smarter than explicitly building the graph

– The bipartite representation gives an implicit graph

– Our algorithms never build the actual graph

– Same spirit as work of [Feder, Motwani 95]

Page 6

Outline

• Definition of the KNC-plot

– The k-neighborhood graph

• Analysis of real social networks using the KNC-plot

• Description of the algorithm

Page 7

The k-neighborhood graph, $G_k$

• Given a bipartite graph B with users on the left and interests on the right

• Connect two users if they share at least k interests
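To make the definition concrete, here is a minimal Python sketch that builds $G_k$ by brute force from a bipartite graph stored as a dict that maps each user to a set of interests. The function name `k_neighborhood_edges` and the toy users and interests are hypothetical. This all-pairs check only illustrates the definition. It is not the algorithm described later in the talk.

```python
from itertools import combinations


def k_neighborhood_edges(interests, k):
    """Edges of G_k: unordered pairs of users sharing at least k interests.

    `interests` maps each user (left side of B) to the set of its interests
    (right side of B). Brute force over all pairs of users.
    """
    edges = set()
    for u, v in combinations(sorted(interests), 2):
        if len(interests[u] & interests[v]) >= k:
            edges.add((u, v))
    return edges


# Hypothetical toy bipartite graph B: user -> set of interests
B = {
    "u1": {"A", "B", "C"},
    "u2": {"A", "C", "D"},
    "u3": {"D", "E"},
}
print(sorted(k_neighborhood_edges(B, 1)))  # [('u1', 'u2'), ('u2', 'u3')]
print(sorted(k_neighborhood_edges(B, 2)))  # [('u1', 'u2')]
```

Raising k can only remove edges, so each $G_k$ is a subgraph of $G_{k-1}$.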

Page 8

The k-neighborhood graph, $G_k$

• Given a bipartite graph B with users on the left and interests on the right

• Connect two users if they share at least k interests

[Figure: $G_1$]

Page 9

• Given a bipartite graph B with users on the left and interests on the right

• Connect two users if they share at least k interests

The k-neighborhood graph, $G_k$

[Figure: $G_2$]

Page 10

• Given a bipartite graph B with users on the left and interests on the right

• Connect two users if they share at least k interests

The k-neighborhood graph, $G_k$

[Figure: $G_3$]

Page 11

Illustration k=1

Page 12

Illustration k=2

Page 13

Illustration k=3

Page 14

Illustration k=4

Page 15

Illustration k=5

Page 16

The KNC-plot

• The k-neighbor connectivity plot

– How many connected components does $G_k$ have?

– What is the size of the largest component?

• Answers the question: how many shared interests are meaningful?

– Communities, cuts
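For a small graph, a KNC-plot can be sketched by sweeping k and counting the components of $G_k$ with a union-find structure. This Python sketch assumes users map to Python sets of interests. `knc_plot` and `find` are hypothetical names. Counting isolated users as singleton components is an assumption about how components are tallied. The brute-force pair loop is the kind of cost the talk's algorithms are designed to avoid on large graphs.

```python
from itertools import combinations


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def knc_plot(interests, max_k):
    """Return (k, number of components of G_k, size of largest component) for k = 1..max_k.

    Isolated users count as singleton components. Brute-force pair checks,
    so this is only suitable for small graphs.
    """
    users = sorted(interests)
    rows = []
    for k in range(1, max_k + 1):
        parent = {u: u for u in users}
        for u, v in combinations(users, 2):
            if len(interests[u] & interests[v]) >= k:
                parent[find(parent, u)] = find(parent, v)
        sizes = {}
        for u in users:
            root = find(parent, u)
            sizes[root] = sizes.get(root, 0) + 1
        rows.append((k, len(sizes), max(sizes.values())))
    return rows


# Hypothetical toy bipartite graph: user -> set of interests
B = {"u1": {"A", "B", "C"}, "u2": {"A", "C", "D"}, "u3": {"D", "E"}}
for row in knc_plot(B, 3):
    print(row)
# (1, 1, 3)
# (2, 2, 2)
# (3, 3, 1)
```

Plotting the second and third columns against k gives the two curves of the KNC-plot.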

Page 17

Analysis

• Four graphs:

– LiveJournal

• Blogging site where users can specify interests

– Y! query logs (interests = queries)

• Queries issued to Yahoo! Search (Try it at www.yahoo.com)

– Content match (users = web pages, interests = ads)

• Ads shown on web pages

– Flickr photo tags (users = photos, interests = tags)

• All data anonymized, sanitized, and downsampled

– Graphs have hundreds of thousands to a million users

Page 18

Examples (plot legend: largest component; number of components)

At k = 5, all connected. At k = 6, interesting!

At k = 6, nobody connected

Page 19

Examples (plot legend: largest component; number of components)

At k = 5, all connected. At k = 6, interesting!

At k = 6, nobody connected

Content match: web pages = “users”, ads = “interests”

Flickr: photos = “users”, tags = “interests”

Page 20

Examples (plot legend: largest component; number of components)

Connectivity varies smoothly (“heavy-tailed”)

At k = 14, 10% connected. At k = 36, 1% connected.

Page 21

Examples (plot legend: largest component; number of components)

Connectivity varies smoothly (“heavy-tailed”)

At k = 14, 10% connected. At k = 36, 1% connected.

Y! queries: users = users, queries = “interests”

LiveJournal: users = users, interests = interests

Page 22

Algorithms

• The naïve implementation takes $O(mn)$ time

– Impractical for large graphs

[Plot legend: Naïve, Ours; k = 2]

Page 23

Algorithms

• The naïve implementation takes $O(mn)$ time

– Impractical for large graphs

• Our implementation takes $O(m^{2-1/k})$ time

– Social networks are generally sparse

– Faster for power-law degree distributions (no change to the algorithm)

– Very fast for k = 2; can trim the graph for k = 3, etc.

Space: $O(km)$

[Plot legend: Naïve, Ours; k = 2]

Page 24

Alg-Intersect

• Roughly speaking: for every pair of users, determine whether they have at least k interests in common

• For each node u, record its neighborhood

– For each node v, see whether u’s and v’s neighborhoods intersect in at least k nodes

– If so, connect them; otherwise, don’t

• Takes $O(nm)$ time, since each u scans every neighborhood once ($n$ = number of nodes, $m$ = number of edges)

Space: $O(m)$

Page 25

Alg-Intersect

• Roughly speaking: for every pair of users, determine whether they have at least k interests in common

• For each node $u \in S$, record its neighborhood

– For each node v, see whether u’s and v’s neighborhoods intersect in at least k nodes

– If so, connect them; otherwise, don’t

• Takes $O(nm)$ time, since each u scans every neighborhood once ($n$ = number of nodes, $m$ = number of edges)

• But: we may explore only the nodes in a set S

– Takes $O(|S|m)$ time

Space: $O(m)$
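Alg-Intersect restricted to a set S can be sketched in Python as follows. The sketch assumes users map to Python sets of interests. It tracks connectivity with a union-find dict instead of materializing edges. `alg_intersect`, `find`, and the `parent` argument are hypothetical names. Each user v is checked by scanning its whole interest list against u’s recorded set, which is why one pass over all v costs $O(m)$.

```python
def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def alg_intersect(interests, k, S=None, parent=None):
    """Union every u in S with every other user sharing >= k interests with u.

    `interests` maps user -> set of interests; S defaults to all users.
    For each u, scanning every neighborhood costs O(m) in total,
    so the whole pass is O(|S| m). Edges of G_k are never stored.
    """
    if parent is None:
        parent = {u: u for u in interests}
    for u in (interests if S is None else S):
        mark = interests[u]  # u's recorded neighborhood (a set)
        for v, nbrs in interests.items():
            if v == u:
                continue
            shared = sum(1 for x in nbrs if x in mark)  # O(deg(v)) scan
            if shared >= k:
                parent[find(parent, u)] = find(parent, v)
    return parent


# Hypothetical toy bipartite graph: user -> set of interests
B = {"u1": {"A", "B", "C"}, "u2": {"A", "C", "D"}, "u3": {"D", "E"}}
p = alg_intersect(B, 2, S={"u1"})
print(find(p, "u1") == find(p, "u2"), find(p, "u1") == find(p, "u3"))  # True False
```

Passing a shared `parent` dict allows this pass to feed its unions into a later phase.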

Page 26

Alg-Tuples

• Consider k = 2.

• Suppose user 1 has interests {A, B, C} and user 2 has interests {A, C, D}.

• Create “virtual nodes”

• Connect user 1 to {AB}, {AC}, {BC}

• Connect user 2 to {AC}, {AD}, {CD}

• There is an edge between user 1 and user 2 in $G_k$ iff there is a virtual node that both are connected to (here, {AC}).
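The virtual-node idea for this k = 2 example can be checked directly by enumerating each user’s 2-subsets of interests and intersecting them. This is a minimal Python sketch. `virtual_nodes` is a hypothetical helper. Representing a virtual node as a sorted tuple is an assumption made so that {A, C} and {C, A} map to the same key.

```python
from itertools import combinations


def virtual_nodes(user_interests, k):
    """All k-subsets of one user's interests, as sorted tuples (the virtual nodes)."""
    return set(combinations(sorted(user_interests), k))


user1 = {"A", "B", "C"}
user2 = {"A", "C", "D"}
shared = virtual_nodes(user1, 2) & virtual_nodes(user2, 2)
print(shared)  # {('A', 'C')}: users 1 and 2 are adjacent in G_2
```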

Page 27

Alg-Tuples

• For each node u,

– Create the virtual nodes for u (if not already created)

– Connect u to those virtual nodes

– (Note: u has $\binom{\deg(u)}{k} = O(\deg(u)^k)$ virtual nodes.)

• Determine the connectivity of $G_k$ using the virtual graph

• Runtime: $O(\sum_u \deg(u)^k)$

– Uses a union-find (disjoint-set) structure

– Edges of $G_k$ are never explicitly computed

Space: $O(\sum_u \deg(u)^k)$
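The steps of Alg-Tuples can be sketched together in Python. The sketch keeps one dict entry per virtual node and merges users through a union-find structure, so no edge of $G_k$ is ever listed. It assumes virtual nodes are sorted tuples and that each remembers the first user that created it, stored in a hypothetical `owner` map. `alg_tuples` and `find` are illustrative names. Work and space are proportional to the number of virtual nodes, $\sum_u \binom{\deg(u)}{k}$, times the cost of hashing a k-tuple.

```python
from itertools import combinations


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def alg_tuples(interests, k, users=None, parent=None):
    """Union users that share a virtual node (a k-subset of interests).

    `interests` maps user -> set of interests; `users` restricts which users
    are processed (default: all). Two processed users end up in the same set
    iff they are connected in G_k restricted to the processed users.
    """
    if parent is None:
        parent = {u: u for u in interests}
    owner = {}  # virtual node -> first user connected to it
    for u in (interests if users is None else users):
        for t in combinations(sorted(interests[u]), k):
            if t in owner:
                parent[find(parent, u)] = find(parent, owner[t])
            else:
                owner[t] = u
    return parent


# Hypothetical toy bipartite graph: user -> set of interests
B = {"u1": {"A", "B", "C"}, "u2": {"A", "C", "D"}, "u3": {"D", "E"}}
p = alg_tuples(B, 2)
print(find(p, "u1") == find(p, "u2"), find(p, "u2") == find(p, "u3"))  # True False
```

Storing one representative per virtual node, instead of all users attached to it, is enough here because every later user with that tuple is unioned with the representative.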

Page 28

Combining them

• Run Alg-Intersect for some subset S of nodes

– We know all edges in $G_k$ that go from any $u \in S$ to any node $v$

– Runtime: $O(|S|m)$

[Figure: S (high-degree nodes) and the other nodes]

Page 29

Combining them

• Run Alg-Intersect for some subset S of nodes

– We know all edges in $G_k$ that go from any $u \in S$ to any node $v$

– Runtime: $O(|S|m)$

• Run Alg-Tuples on the rest of the nodes

– We “know” all edges in $G_k$ that go from $u \notin S$ to $v \notin S$

– Runtime: $O(\sum_{u \notin S} \deg(u)^k)$

[Figure: S and the other nodes]
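The two phases can be combined in one union-find structure, sketched below in Python for a given set S. The sketch assumes users map to Python sets and that the caller supplies S. `hybrid_components`, `find`, and `union` are hypothetical names. Phase 1 covers every $G_k$ edge with an endpoint in S. Phase 2 covers every edge with both endpoints outside S. Together they recover exactly the components of $G_k$.

```python
from itertools import combinations


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    parent[find(parent, a)] = find(parent, b)


def hybrid_components(interests, k, S):
    """Components of G_k via Alg-Intersect on S and Alg-Tuples on the rest.

    `interests` maps user -> set of interests. Returns the components as
    sets of users, largest first.
    """
    parent = {u: u for u in interests}
    S = set(S)
    # Phase 1 (Alg-Intersect): every G_k edge with an endpoint u in S, O(|S| m)
    for u in S:
        mark = interests[u]
        for v, nbrs in interests.items():
            if v != u and sum(1 for x in nbrs if x in mark) >= k:
                union(parent, u, v)
    # Phase 2 (Alg-Tuples): every G_k edge with both endpoints outside S
    owner = {}  # virtual node (sorted k-tuple) -> first user outside S that has it
    for u in interests:
        if u in S:
            continue
        for t in combinations(sorted(interests[u]), k):
            if t in owner:
                union(parent, u, owner[t])
            else:
                owner[t] = u
    comps = {}
    for u in interests:
        comps.setdefault(find(parent, u), set()).add(u)
    return sorted(comps.values(), key=len, reverse=True)


# Hypothetical toy bipartite graph: user -> set of interests
B = {"u1": {"A", "B", "C"}, "u2": {"A", "C", "D"}, "u3": {"D", "E"}}
print([sorted(c) for c in hybrid_components(B, 2, {"u1"})])  # [['u1', 'u2'], ['u3']]
```

Any choice of S gives the same components. S only shifts work between the two phases, which is what the choice of S on the next slide exploits.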

Page 30

Finding S

• Order $u_1, u_2, \ldots$ by decreasing $\deg(u_i)$

• Initialize $b = 1$. Increase $b$ until $\sum_{i \ge b} \deg(u_i)^k \le bm$

• Let $S = \{u_1, u_2, \ldots, u_b\}$

• Run Alg-Intersect on nodes in S

• Run Alg-Tuples on nodes not in S

– Connect the two

• Runtime is $O(bm) + O(\sum_{i \ge b} \deg(u_i)^k) = O(2bm) = O(bm)$

[Figure: high-degree nodes]
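The stopping rule for b can be sketched in a few lines of Python. The sketch assumes `degrees` lists each user’s degree in B, so m is their sum, and it uses 1-based indices as on the slide. `choose_b` is a hypothetical name. A suffix-sum array makes each test of the condition O(1).

```python
def choose_b(degrees, k):
    """Smallest b >= 1 with sum_{i >= b} deg(u_i)^k <= b*m, degrees sorted decreasingly.

    Indices follow the slide (1-based). Falls back to b = n if no b qualifies.
    """
    d = sorted(degrees, reverse=True)
    m = sum(d)  # number of edges in B
    n = len(d)
    # tail[j] = sum of d[t]**k for t >= j (0-based), i.e. sum over i >= j+1 (1-based)
    tail = [0] * (n + 1)
    for j in range(n - 1, -1, -1):
        tail[j] = tail[j + 1] + d[j] ** k
    for b in range(1, n + 1):
        if tail[b - 1] <= b * m:
            return b
    return n


degrees = [1, 4, 1, 3, 1]
b = choose_b(degrees, 2)
print(b)  # 2: S = the two highest-degree users (degrees 4 and 3)
```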

Page 31

Combining them

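• Runtime is $O(bm) + O(\sum_{i \ge b} \deg(u_i)^k)$

• But for any graph, $\deg(u_i) \le m/i$ (by Markov: $u_1, \ldots, u_i$ each have degree at least $\deg(u_i)$, so $i \cdot \deg(u_i) \le m$)

– No power-law assumption needed

• Hence, for $k \ge 2$, $bm \approx \sum_{i \ge b} \deg(u_i)^k \le \sum_{i \ge b} m^k / i^k = O(m^k / b^{k-1})$

• So $b = O(m^{1-1/k})$, and the runtime is $O(bm) = O(m^{2-1/k})$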
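The last two steps can be spelled out as follows. This is a sketch under the slide's assumptions: degrees are sorted in decreasing order, $k \ge 2$, and $b \ge 2$ (the case $b = 1$ already gives $O(m)$ time). Because $b$ is the first index that satisfies the stopping rule, the rule fails at $b - 1$:

```latex
% Tail bound for c >= 1 and k >= 2 (compare the sum with an integral):
\sum_{i \ge c} \frac{1}{i^k} \le \frac{1}{c^k} + \int_c^{\infty} \frac{dx}{x^k}
  = \frac{1}{c^k} + \frac{c^{1-k}}{k-1} = O\!\left(c^{1-k}\right)

% The stopping rule fails at b - 1, and deg(u_i) <= m/i:
(b-1)\,m < \sum_{i \ge b-1} \deg(u_i)^k \le \sum_{i \ge b-1} \frac{m^k}{i^k}
  = O\!\left(\frac{m^k}{(b-1)^{k-1}}\right)
\;\Longrightarrow\; (b-1)^k = O\!\left(m^{k-1}\right)
\;\Longrightarrow\; b = O\!\left(m^{1-1/k}\right)

% Total work: the tail at b is at most bm by the stopping rule:
O(bm) + O\Big(\sum_{i \ge b} \deg(u_i)^k\Big) = O(bm) = O\!\left(m^{2-1/k}\right)
```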

Page 32

Extensions

• Power-law degree distributions are provably faster

– $O(m^{1+(1-1/k)/\alpha})$ for a power law with exponent $\alpha$

– The algorithm works exactly the same way

– No need to know ahead of time whether the degrees follow a power law

• When the set of interests is of logarithmic size, quasi-linear-time algorithms are possible

– Different algorithm

– Details in the paper

Page 33

Conclusion

• The KNC-plot is a useful tool

– Exposes how meaningful shared interests are

• The k-neighborhood graph is defined implicitly

– Efficient algorithm for the implicit graph

– Other algorithms for $G_k$, given the bipartite representation

• Find additional social-graph properties that are meaningful and computable

– Describe the macroscopic structure of social networks