1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej...
-
Upload
shawn-goodwin -
Category
Documents
-
view
213 -
download
0
Transcript of 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej...
![Page 1: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/1.jpg)
1
Walking on a Graph with a Magnifying GlassStratified Sampling via Weighted Random Walks
Maciej KurantMinas Gjoka, Carter T. Butts, Athina Markopoulou
University of California, Irvine
SIGMETRICS 2011, June 11th, San Jose
![Page 2: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/2.jpg)
2(over 15% of world’s population, and over 50% of world’s Internet users !)
Online Social Networks (OSNs)
> 1 billion users October 2010
500 million 2
200 million 9
130 million 12
100 million 43
75 million 10
75 million 29
Size Traffic
![Page 3: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/3.jpg)
Facebook:•500+M users•130 friends each (on average)•8 bytes (64 bits) per user ID
The raw connectivity data, with no attributes:•500 x 130 x 8B = 520 GB
This is neither feasible nor practical. Solution: Sampling!
To get this data, one would have to download:•100+ TB of (uncompressed) HTML data!
3
![Page 4: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/4.jpg)
Sampling
• Topology?What:
4
![Page 5: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/5.jpg)
Sampling
• Topology?• Nodes?
What:• Directly?How:
![Page 6: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/6.jpg)
• Topology?• Nodes?
What:• Directly?• Exploration?
How:
Sampling
6
![Page 7: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/7.jpg)
E.g., Random Walk (RW)
• Topology?• Nodes?
What:• Directly?•
Exploration?
How:
Sampling
7
![Page 8: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/8.jpg)
sampled
real
Random Walk (RW):
Apply the Hansen-Hurwitz estimator:
[1] M. Gjoka, M. Kurant, C. T. Butts and A. Markopoulou, “Walking in Facebook: A Case Study of Unbiased Sampling of OSNs”, INFOCOM 2010.
Real average node degree: 94Observed average node degree: 338
A Random Walk in Facebook
degree of node s
![Page 9: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/9.jpg)
Related Work
RW in online graph sampling: • WWW [Henzinger et at. 2000, Baykan et al. 2009]• P2P [Gkantsidis et al. 2004 , Stutzbach et al. 2006, Rasti et al. 2009]• OSN [Rasti et al. 2008, Krishnamurthy et al, 2008, Gjoka et al. 2010]
RW mixing improvements: • Random jumps [Henzinger et al. 2000, Avrachenkov, et al. 2010]• Fastest Mixing Markov Chain [Boyd et al. 2004]• Multiple dependent walks [Ribeiro et al. 2010]• Multigraph Sampling [Gjoka et al. 2011]
![Page 10: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/10.jpg)
What if the nodes are not equally important in our measurement?
![Page 11: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/11.jpg)
Not all nodes are equal
irrelevant
important(equally) important
Node categories: Stratification under Weighted Independence Sampler (WIS)(node size is proportional to its sampling probability)
11
![Page 12: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/12.jpg)
Not all nodes are equal
12
. need we),ˆVar()ˆVar( minimize To
. need we)),ˆVar(),ˆVar(max( minimize To
. and averages theCalculate :
samples) blue no samples,green and red ofnumber same (the 2
:categoriesgreen and red of sizes relative theCompare :
22
2
red
nn
nn
nnn
greenred
redredgreenred
greenred
redredgreenred
green
greenred
2 Example
1 Example
irrelevant
important(equally) important
Node categories: Stratification under Weighted Independence Sampler (WIS)(node size is proportional to its sampling probability)
![Page 13: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/13.jpg)
Not all nodes are equal
But graph exploration techniques have to follow the links!
Trade-off between • ideal (WIS) sampling weights• fast convergence
Enforcing WIS weights may lead to slow (or no) convergence
13
Assumption: On sampling a node, we learn categories of its neighbors.
irrelevant
important(equally) important
Node categories: Stratification under Weighted Independence Sampler (WIS)(node size is proportional to its sampling probability)
Fastest Mixing Markov Chain [Boyd et al. 2004]
![Page 14: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/14.jpg)
Initialization: Pilot Random Walk
![Page 15: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/15.jpg)
• Use classic Random Walk (RW)
Pilot Random Walk (RW)
![Page 16: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/16.jpg)
• Use classic Random Walk (RW)
• Collect a list of existing relevant and irrelevant categories
Pilot Random Walk (RW)
![Page 17: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/17.jpg)
• Use classic Random Walk (RW)
• Collect a list of existing relevant and irrelevant categories
• Estimate the relative volume of each category Ci :
Pilot Random Walk (RW)
![Page 18: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/18.jpg)
• Use classic Random Walk (RW)
• Collect a list of existing relevant and irrelevant categories
• Estimate the relative volume of each category Ci :
Pilot Random Walk (RW)
)(vol
)(vol
)deg()(vol
vol
V
C
vC
i
i
Cvi
f
i
:volume Relative
:Volume
46
2222)(vol
46
2020)(vol
46
44)(vol
vol
vol
vol
f
f
f
blue
green
red
blue
green
red
![Page 19: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/19.jpg)
• Use classic Random Walk (RW)
• Collect a list of existing relevant and irrelevant categories
• Estimate the relative volume of each category Ci :
19
Pilot Random Walk (RW)
• Efficient!
• No need to visit Ci at all!
• Estimation errors do not bias the ultimate measurement result (but they may increase its variance)
)(vol
)(vol
)deg()(vol
vol
V
C
vC
i
i
Cvi
f
i
:volume Relative
:Volume
46
2222)(vol
46
2020)(vol
46
44)(vol
vol
vol
vol
f
f
f
blue
green
red
blue
green
red
RW-based estimator: # of neighbors of u in Ci :
The size of sample S
![Page 20: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/20.jpg)
Stratified Weighted Random Walk
![Page 21: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/21.jpg)
Measurement objective
E.g., compare the size of red and green categories.
21
![Page 22: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/22.jpg)
Measurement objective
Category weights optimal under WIS
Stratified sampling theory +
Information collected by pilot RW
E.g., compare the size of red and green categories.
22
![Page 23: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/23.jpg)
Problem 2: “Black holes”
Measurement objective
Category weights optimal under WIS
Modified category weights
Problem 1: Poor or no connectivity
Solution: Small weight>0 for irrelevant categories. f* -the fraction of time we plan to spend
in irrelevant nodes (e.g., 1%)
Solution:Limit the weight of tiny relevant categories.Γ - maximal factor by which we can
increase edge weights (e.g., 100 times)
E.g., compare the size of red and green categories.
![Page 24: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/24.jpg)
Measurement objective
Category weights optimal under WIS
Modified category weights
Edge weights in G
E.g., compare the size of red and green categories.
20=
vol(green), from pilot RW *
Target edge weights:
22=
4=
![Page 25: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/25.jpg)
Measurement objective
Category weights optimal under WIS
Modified category weights
Edge weights in G
20=
Target edge weights:
22=
4=
Resolve conflicts: • arithmetic mean, • geometric mean, • max, • …
E.g., compare the size of red and green categories.
![Page 26: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/26.jpg)
Measurement objective
Category weights optimal under WIS
Modified category weights
Edge weights in G
WRW sample
E.g., compare the size of red and green categories.
![Page 27: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/27.jpg)
Measurement objective
Category weights optimal under WIS
Modified category weights
Edge weights in G
WRW sample
Final result
Hansen-Hurwitz estimator
E.g., compare the size of red and green categories.
![Page 28: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/28.jpg)
Stratified Weighted Random Walk
(S-WRW)
Measurement objective
Category weights optimal under WIS
Modified category weights
Edge weights in G
WRW sample
Final result
E.g., compare the size of red and green categories.
![Page 29: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/29.jpg)
Simulation results
![Page 30: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/30.jpg)
Simulation results
![Page 31: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/31.jpg)
Simulation results
weight w
NR
MS
E(s
ize(
red
))
S-WRW
RW
WIS
Optimal under WISTradeoff between fast mixing (~RW) and the weights optimal under Weighted Independence Sampler (WIS)
Uniform
![Page 32: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/32.jpg)
weight w
NR
MS
E(s
ize(
red
))
Simulation results
Optimal under WISThe larger the sample size n, the closer to WIS.
![Page 33: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/33.jpg)
Evaluation on Facebook
![Page 34: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/34.jpg)
Colleges in Facebook
versions of S-WRW
Random Walk (RW)
Samples in colleges: 86% of S-WRW, 9% of RW.
This is because S-WRW avoids irrelevant categories.
The difference is larger (100x) for small colleges. This is due
to S-WRW’s stratification.
RW discovered 5’325 colleges. S-WRW: 8’815 (not shown)
![Page 35: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/35.jpg)
35
College size estimation
RW needs about 14 times more samples to achieve the same error!
versions of S-WRW
Random Walk (RW)
13-15 times
irrelevant categories stratification
14 ~= 9 x 1.5
![Page 36: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/36.jpg)
Thank you!
irrelevant
important(equally) important
Walking on a Graph with a Magnifying Glass
Maciej Kurant, Minas Gjoka, Carter T. Butts and Athina Markopoulou, UC Irvine36
Facebook datasets available from : http://odysseas.calit2.uci.edu/osn
Example application: http://geosocialmap.com
![Page 37: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/37.jpg)
![Page 38: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/38.jpg)
Parametersf* : the fraction of time we plan to spend in irrelevant nodes:• f*=0 iff all nodes relevant, f*>0 otherwise.• f*<<1• Exploit the pilot RW information. E.g., f* higher when relevant categories poorly
interconnected• In Facebook, we used f*=1%
Γ>=1 : maximal resolution of our “graph magnifying glass”:• Let B be the size of the largest relevant category. S-WRW will typically
sample well all categories whose size is at least equal to B / Γ.• Think of the smallest category that is still relevant – this gives Γ. • Set Γ smaller for smaller sample size.• Set Γ smaller in graphs with tight community structure. • In Facebook, we set Γ=1000.
In the paper, we show that S-WRW is quite robust to the choice of these parameters.
![Page 39: 1 Walking on a Graph with a Magnifying Glass Stratified Sampling via Weighted Random Walks Maciej Kurant Minas Gjoka, Carter T. Butts, Athina Markopoulou.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d1f5503460f949f2c3d/html5/thumbnails/39.jpg)
Toy graphs