
Multigraph Sampling of Online Social Networks

Minas Gjoka, Carter Butts, Maciej Kurant, Athina Markopoulou


Outline

• Multigraph sampling
  – Motivation
  – Sampling method
  – Internet Measurements
  – Conclusion

Multigraph sampling Minas Gjoka


Problem statement

• Obtain a representative sample of OSN users by exploring the social graph.

[Figure: example social graph with users A through I]


Motivation for multiple relations

• Principled methods for graph sampling
  – Metropolis-Hastings Random Walk (MHRW)
  – Re-Weighted Random Walk (RWRW)

“Walking in Facebook: A Case Study of Unbiased Sampling of OSNs,” INFOCOM '10

• However, graph characteristics affect mixing and convergence:
  – fragmented social graph
  – highly clustered areas
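The two methods listed above can be sketched as follows. This is a minimal illustration, assuming an undirected graph stored as a dict of adjacency lists; the names `mhrw`, `simple_walk` and `rwrw_mean` are hypothetical and not taken from the cited paper. MHRW corrects the degree bias during the walk through its acceptance rule, while RWRW walks normally and corrects afterwards by weighting each sample by 1/deg(v).

```python
import random


def mhrw(adj, start, steps, rng):
    """Metropolis-Hastings random walk with a uniform target distribution.

    From v, propose a uniformly chosen neighbor w and accept the move with
    probability min(1, deg(v) / deg(w)); on rejection the walk stays at v.
    """
    v = start
    samples = []
    for _ in range(steps):
        w = rng.choice(adj[v])
        if rng.random() < len(adj[v]) / len(adj[w]):
            v = w
        samples.append(v)
    return samples


def simple_walk(adj, start, steps, rng):
    """Simple random walk: v is visited in proportion to deg(v)."""
    v = start
    samples = []
    for _ in range(steps):
        v = rng.choice(adj[v])
        samples.append(v)
    return samples


def rwrw_mean(samples, adj, f):
    """Re-weighted estimate of the node average of f.

    Each sample is weighted by 1/deg(v) to undo the simple walk's degree bias.
    """
    num = sum(f(v) / len(adj[v]) for v in samples)
    den = sum(1.0 / len(adj[v]) for v in samples)
    return num / den


def is_leaf(v):
    """Node property to estimate: 1 for leaves, 0 for the hub."""
    return 1.0 if v != 0 else 0.0


# Hypothetical star graph: hub 0 plus leaves 1..4, so the true leaf share is 0.8.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
srw = simple_walk(star, 0, 10_000, random.Random(7))
print(sum(map(is_leaf, srw)) / len(srw))  # 0.5 (naive average, biased)
print(rwrw_mean(srw, star, is_leaf))      # 0.8 (re-weighted)

mh = mhrw(star, 0, 20_000, random.Random(1))
print(sum(map(is_leaf, mh)) / len(mh))    # close to 0.8
```

On the star graph the simple walk alternates hub and leaf, so its raw average is off by a wide margin; re-weighting recovers the true value, and MHRW gets close to it without any post-processing.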


Fragmented social graph

[Figure: panels Friendship, Event attendance, Group membership, and their Union; legend: Largest Connected Component, Other Connected Components]
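The figure's point, that each relation alone is fragmented while their union is much better connected, can be checked on toy data. A minimal sketch, assuming a union-find component counter (`count_components` is a hypothetical helper) and six made-up users whose edges are invented for illustration, not Last.fm data:

```python
def count_components(nodes, edges):
    """Count connected components with a union-find (disjoint-set) structure."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, w in edges:
        ru, rw = find(u), find(w)
        if ru != rw:
            parent[ru] = rw
    return len({find(v) for v in nodes})


# Hypothetical toy relations over six users.
users = ["A", "B", "C", "D", "E", "F"]
relations = {
    "Friendship": [("A", "B"), ("C", "D")],
    "Event attendance": [("B", "C"), ("E", "F")],
    "Group membership": [("D", "E")],
}
for name, edges in relations.items():
    print(name, count_components(users, edges))
union_edges = [e for edges in relations.values() for e in edges]
print("Union", count_components(users, union_edges))
```

Here each relation splits the users into 4 or 5 components, while the union is a single component, so a walk on the union can reach users that a walk on any one relation cannot.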

Highly clustered social graph

[Figure: panels Friendship, Event attendance, and their Union]

Proposal

• Graph exploration using multiple user relations
  – perform random walk
  – re-weighting at the end of the walk
  – online convergence diagnostics applicable

• Theoretical benefits
  – faster mixing
  – discovery of isolated components

• Open questions
  – how to combine relations
  – implementation efficiency
  – evaluation of sampling benefits in a realistic scenario
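The proposal only states that online convergence diagnostics apply. As an assumed example, Geweke's diagnostic compares the mean of a node property over the start of the walk with its mean over the end. The sketch below is simplified: it uses plain sample variances and so ignores autocorrelation, whereas Geweke's original statistic uses spectral-density estimates. `geweke_z` is a hypothetical name.

```python
import math
import random
import statistics


def geweke_z(chain, first=0.1, last=0.5):
    """Simplified Geweke z-score for one walk.

    Compares the first `first` fraction of the samples with the last `last`
    fraction. Sample variances are used as-is (autocorrelation ignored).
    """
    n = len(chain)
    a = chain[: int(first * n)]
    b = chain[n - int(last * n):]
    var = statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(var)


# Usage on synthetic sequences standing in for a node property along a walk.
rng = random.Random(3)
stationary = [rng.gauss(0.0, 1.0) for _ in range(5_000)]
drifting = [i / 100 + rng.gauss(0.0, 1.0) for i in range(5_000)]
print(round(geweke_z(stationary), 2))  # typically small in magnitude
print(round(geweke_z(drifting), 2))    # large negative value
```

A large |z| suggests the early part of the walk is not yet representative; because the statistic needs only running sums, it can be tracked online as the walk proceeds.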


[Figure: the same users A through K connected by three relations: Friends, Events, and Groups]

[Figure (continued): users A through K under the Friends, Events, and Groups relations]

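Combination of multiple relations

[Figure: union multigraph G* over users A through K, edges colored by relation]

G* = Friends + Events + Groups
(G* is a union multigraph: one edge for each relation in which two users are connected)

deg(F, red) = 1
deg(F, blue) = 3
deg(F, green) = 4
deg(F, tot) = 1 + 3 + 4 = 8

[Figure: union graph G over the same users A through K]

G = Friends + Events + Groups
(G is a union graph: a single edge if two users are connected in at least one relation)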
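The difference between the two combinations is easiest to see through degrees: the multigraph counts every parallel edge, the union graph counts each neighbor once. A minimal sketch with hypothetical helper names; the neighbor sets are invented only so that the counts match the slide (1 + 4 + 3 = 8 multigraph edges, 5 distinct neighbors), since the actual neighbors of F are not given.

```python
def union_multigraph_degree(neighbors_by_relation):
    """d*(v): sum of per-relation degrees; parallel edges all count."""
    return sum(len(ns) for ns in neighbors_by_relation.values())


def union_graph_degree(neighbors_by_relation):
    """d(v): number of distinct neighbors across all relations."""
    return len(set().union(*neighbors_by_relation.values()))


# Hypothetical neighbor sets for user F (illustration only).
f_neighbors = {
    "Friends": {"x1"},
    "Events": {"x1", "x2", "x3", "x4"},
    "Groups": {"x2", "x3", "x5"},
}
print(union_multigraph_degree(f_neighbors))  # 8
print(union_graph_degree(f_neighbors))       # 5
```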


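Multigraph sampling: Implementation efficiency

Degree information available without enumeration

$d(F) = 5, \quad d^*(F) = 8$

$p(\text{Friends}) = 1/8, \quad p(\text{Events}) = 4/8, \quad p(\text{Groups}) = 3/8$

At F, the walk picks a relation with probability deg(F, relation) / d*(F), so only the neighbor list of the chosen relation has to be fetched.

Take advantage of paging functionality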
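One walk step on the union multigraph can be sketched as below: choose a relation in proportion to its degree using counts alone, then enumerate only that relation's neighbors and pick one uniformly. This is a minimal sketch; `degree_of` and `neighbors_of` are hypothetical stand-ins for reading a total count from the first results page and for fetching the full list, and the toy data are invented to match the slide's counts.

```python
import random


def multigraph_step(v, relations, degree_of, neighbors_of, rng):
    """One random-walk step on the union multigraph G*.

    Relation r is chosen with probability deg(v, r) / d*(v) from degree
    counts only; then one neighbor is drawn uniformly within r. Overall,
    w is chosen with probability (relations linking v and w) / d*(v).
    """
    degs = [degree_of(v, r) for r in relations]
    if sum(degs) == 0:
        raise ValueError(f"{v!r} has no neighbors in any relation")
    r = rng.choices(relations, weights=degs, k=1)[0]
    return rng.choice(neighbors_of(v, r))


# Hypothetical toy data for user F (counts match the slide: 1, 4, 3).
toy = {
    "Friends": {"F": ["x1"]},
    "Events": {"F": ["x1", "x2", "x3", "x4"]},
    "Groups": {"F": ["x2", "x3", "x5"]},
}
relations = ["Friends", "Events", "Groups"]


def degree_of(v, r):
    return len(toy[r].get(v, []))  # stand-in for a count shown on page 1


def neighbors_of(v, r):
    return toy[r].get(v, [])  # stand-in for fetching the whole list


rng = random.Random(0)
draws = [multigraph_step("F", relations, degree_of, neighbors_of, rng)
         for _ in range(40_000)]
for w in ["x1", "x2", "x3", "x4", "x5"]:
    print(w, round(draws.count(w) / len(draws), 3))
# x1, x2, x3 come out near 2/8; x4, x5 near 1/8
```

Since a walk on G* visits v in proportion to d*(v), the re-weighting at the end of the walk would weight each sampled user by 1/d*(v).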


Multigraph sampling: Internet Measurements

• Last.fm, an Internet radio service
  – social networking features
  – multiple relations
  – fragmented graph components and highly clustered users expected

• Last.fm relations used
  – Friends
  – Groups
  – Events
  – Neighbors


Data Collection: Sampled node information

• Crawling using the Last.fm API and HTML scraping
  – userID
  – country
  – age
  – registration time
  – …


Summary of datasets: Last.fm, July 2010

Crawl type                         # Total users   % Unique users
Friends                            5x50K           71%
Events                             5x50K           58%
Groups                             5x50K           74%
Neighbors                          5x50K           53%
Friends-Events-Groups-Neighbors    5x50K           76%
UNI                                500K            99%
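The "% Unique Users" column measures how often a crawl revisits users it has already sampled. A minimal sketch, assuming the column is distinct users divided by all samples pooled over the crawl's walks (for example 5 walks of 50K); `percent_unique` and the toy walks are hypothetical.

```python
def percent_unique(walks):
    """Distinct users as a percentage of all samples pooled across walks."""
    pooled = [u for walk in walks for u in walk]
    return 100.0 * len(set(pooled)) / len(pooled)


# Hypothetical: two short walks that revisit some users (6 samples, 4 distinct).
print(round(percent_unique([[1, 2, 2, 3], [3, 4]]), 1))  # 66.7
```

Under this reading, a lower percentage means more repeated visits, which is expected for a walk that lingers in highly clustered areas.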


Comparison to UNI

[Figure: % of Subscribers]

Last.fm Charts Estimation: Application of sampling


Last.fm Charts Estimation: Artist Charts
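The slides do not say how the charts are computed from the samples. As an assumption for illustration, the sketch below ranks artists by the estimated fraction of users who listen to them, weighting each sampled user by 1/d*(v) to undo the multigraph walk's degree bias. `estimate_chart` and the sample data are hypothetical.

```python
from collections import defaultdict


def estimate_chart(samples, top_k):
    """Rank artists by the re-weighted share of users listening to them.

    samples: list of (d_star, artists) pairs, one per sampled user, where
    d_star is the user's union-multigraph degree.
    """
    score = defaultdict(float)
    total_w = 0.0
    for d_star, artists in samples:
        w = 1.0 / d_star
        total_w += w
        for a in set(artists):
            score[a] += w
    ranked = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(a, s / total_w) for a, s in ranked[:top_k]]


# Hypothetical samples: three users with degrees 1, 4, 2.
samples = [(1, ["A", "B"]), (4, ["B"]), (2, ["C", "A"])]
print([(a, round(s, 3)) for a, s in estimate_chart(samples, 2)])
# [('A', 0.857), ('B', 0.714)]
```

Ties are broken by artist name so the output is deterministic; a playcount-based chart would instead sum weighted playcounts per artist.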


Related Work

• Fastest mixing Markov chain
  – Boyd et al., SIAM Review 2004

• Sampling in fragmented graphs
  – Ribeiro et al., Frontier Sampling, IMC 2010

• Last.fm studies
  – Konstas et al., SIGIR '09
  – Schifanella et al., WSDM '10


Conclusion

• Introduced multigraph sampling
  – simple and efficient
  – discovers isolated components
  – better approximation of distributions and means
  – multigraph dataset planned for public release

• Future work on multigraph sampling
  – selection of relations
  – weighted relations


Thank you

Questions?
