Size matters: 1) Cluster structure of large networks 2) Searching the world’s social network
Jure Leskovec
Computer Science Department, Cornell University / Stanford University
Joint work with: Eric Horvitz, Michael Mahoney, Kevin Lang, Anirban Dasgupta
Rich data: Networks
Large online computing applications keep detailed records of human activity:
• Online communities: Facebook (120 million)
• Communication: Instant Messenger (~1 billion)
• News and social media: blogging (250 million)
We model the data as a network (an interaction graph).
We can observe and study phenomena at scales that were not possible before.
[Figure: communication network]
Outline
• The Small-world experiment:
  ▪ On a 240 million node communication network of Microsoft Instant Messenger
• Small vs. large networks:
  ▪ Modeling community (cluster) structure of large networks
[Figures: Zachary’s karate club (N=34); tiny part of a large social network]
How well expressed are communities?
How community-like is a set of nodes?
Idea: use approximation algorithms for NP-hard graph partitioning problems as experimental probes of network structure.
Conductance (normalized cut):
Φ(S) = # edges cut / # edges inside
A small Φ(S) corresponds to a more community-like set of nodes.
[Figure: node set S and the rest of the network S’]
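A minimal sketch of the score exactly as this slide defines it (edges cut divided by edges inside S), assuming an undirected graph stored as an edge list. Note that the usual textbook conductance divides the cut by the smaller of the two sides' volumes (sums of degrees); the code follows the simplified slide formula instead. The function name `community_score` is hypothetical.

```python
from typing import Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def community_score(edges: Iterable[Edge], s: Set[Hashable]) -> float:
    """Score from the slide: Phi(S) = (# edges cut) / (# edges inside S).

    Lower is better: few edges leave S relative to the edges inside it.
    """
    cut = inside = 0
    for u, v in edges:
        in_u, in_v = u in s, v in s
        if in_u and in_v:
            inside += 1        # both endpoints inside S
        elif in_u or in_v:
            cut += 1           # exactly one endpoint inside S
    if inside == 0:
        return float("inf")    # no internal edges: treat as the worst score
    return cut / inside


# Toy graph: a 4-clique {0, 1, 2, 3} joined to the rest by two edges.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (2, 5), (4, 5)]
print(community_score(edges, {0, 1, 2, 3}))  # 2 edges cut / 6 inside
```

The clique scores 2/6 ≈ 0.33, while the pair {4, 5} (one internal edge, two cut edges) scores 2.0, so the clique is the more community-like set.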
Community score (quality)
Score: Φ(S) = # edges cut / # edges inside
Bad community: Φ = 5/6 ≈ 0.83
What is the “best” community of 5 nodes?
Community score (quality)
Score: Φ(S) = # edges cut / # edges inside
Bad community: Φ = 5/7 ≈ 0.71
Better community: Φ = 2/5 = 0.4
Best community: Φ = 2/8 = 0.25
What is the “best” community of 5 nodes?
Network Community Profile Plot
We define the network community profile (NCP) plot: plot the score Φ(k) of the best community of size k.
[Figure: log Φ(k) vs. community size log k, with Φ(5) = 0.25 at k = 5 and Φ(7) = 0.18 at k = 7]
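To make the definition concrete, here is a brute-force sketch that computes Φ(k) for a tiny graph by trying every node set of size k and keeping the lowest slide score. This exhaustive search is a stand-in for illustration only: it is exponential in k, whereas the slides describe using approximation algorithms for graph partitioning on real networks. The names `score` and `ncp` are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Edge = Tuple[int, int]


def score(edges: List[Edge], s: FrozenSet[int]) -> float:
    """Slide score: # edges cut / # edges inside (inf if nothing is inside)."""
    cut = inside = 0
    for u, v in edges:
        if u in s and v in s:
            inside += 1
        elif u in s or v in s:
            cut += 1
    return cut / inside if inside else float("inf")


def ncp(edges: List[Edge], max_k: int) -> Dict[int, float]:
    """Phi(k) = lowest score over all node sets of size k, for k = 2..max_k.

    Exhaustive search is exponential in k, so this only works on toy graphs.
    """
    nodes = sorted({x for e in edges for x in e})
    return {
        k: min(score(edges, frozenset(c)) for c in combinations(nodes, k))
        for k in range(2, max_k + 1)
    }


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
for k, phi in ncp(edges, 4).items():
    print(k, round(phi, 3))
```

Each triangle gives Φ(3) = 1/3, the minimum of this small profile; plotting log Φ(k) against log k gives the NCP plot.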
NCP plot: Low-dimensional and random graphs
[Figure panels: d-dimensional meshes; hierarchically nested clusters]
NCP plot: Zachary’s karate club
Zachary’s university karate club social network:
• During the study, the club split into two groups
• The split (squares vs. circles) corresponds to cut B
NCP plot: Network Science
Collaborations between scientists in networks [Newman, 2005]
Present work: Large networks
Previous work mostly focused on community structure of small networks (~100 nodes)
We examined 108 different large networks
Example of a large network
Typical example: General relativity collaboration network (4,158 nodes, 13,422 edges)
More NCP plots of networks
NCP: LiveJournal (N=5M, E=42M)
[Figure: conductance Φ(k) vs. community size k]
• Better and better communities (downward part)
• Best community has ~100 nodes
• Communities get worse and worse (upward part)
Explanation: Downward part
Small clusters on the periphery of the network are responsible for the downward part of the NCP plot.
[Figure: NCP plot, with the best cluster marked]
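The small peripheral clusters are the “whiskers” named later in the talk. Assuming a whisker is a piece that can be cut off from a connected network by removing a single edge (a bridge), a sketch can find bridges with Tarjan's DFS low-link method and keep the smaller side of each bridge. Both the definition and the “smaller side” rule are assumptions; `find_bridges` and `whiskers` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[int, int]


def find_bridges(adj: Dict[int, List[int]]) -> List[Edge]:
    """Tarjan's bridge finding: DFS discovery times and low-links."""
    disc: Dict[int, int] = {}
    low: Dict[int, int] = {}
    bridges: List[Edge] = []

    def dfs(u: int, parent: Optional[int]) -> None:
        disc[u] = low[u] = len(disc)             # discovery order
        for w in adj[u]:
            if w == parent:
                continue
            if w in disc:
                low[u] = min(low[u], disc[w])    # back edge
            else:
                dfs(w, u)
                low[u] = min(low[u], low[w])
                if low[w] > disc[u]:             # nothing below w reaches above u
                    bridges.append((u, w))

    for node in adj:
        if node not in disc:
            dfs(node, None)
    return bridges


def reachable_without(adj: Dict[int, List[int]], start: int, banned: Edge) -> Set[int]:
    """Nodes reachable from start when the edge `banned` is removed."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if {u, w} == set(banned) or w in seen:
                continue
            seen.add(w)
            stack.append(w)
    return seen


def whiskers(edges: List[Edge]) -> List[Set[int]]:
    """Maximal pieces that detach from a connected graph by cutting one edge."""
    adj: Dict[int, List[int]] = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    pieces = []
    for u, v in find_bridges(adj):
        side_u = reachable_without(adj, u, (u, v))
        side_v = set(adj) - side_u                    # assumes a connected graph
        pieces.append(min(side_u, side_v, key=len))   # smaller side = whisker
    # Drop whiskers nested inside a bigger whisker.
    return [p for p in pieces if not any(p < q for q in pieces)]


# Core 4-clique, a path 3-4-5 hanging off node 3, a triangle attached via (0, 6).
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5),
         (0, 6), (6, 7), (7, 8), (6, 8)]
print(sorted(sorted(p) for p in whiskers(edges)))   # [[4, 5], [6, 7, 8]]
```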
Explanation: Upward part
Each additional edge inside the cluster costs more:
[Figure: NCP plot; a cluster growing in a network where each node has twice as many children, with Φ = 1/3 ≈ 0.33, Φ = 2/4 = 0.5, Φ = 8/6 ≈ 1.3, Φ = 64/14 ≈ 4.6]
Suggested network structure
Network structure: core-periphery (jellyfish, octopus)
• Whiskers are responsible for good communities
• Denser and denser core of the network
• Core contains ~60% of nodes and ~80% of edges
What is a good model?
What is a good model that explains such network structure?
[Figure: NCP plots for preferential attachment (flat), small world (down and flat), and geometric preferential attachment (flat and down)]
Forest Fire model works
Forest Fire [LKF05]: connections spread like a fire:
• A new node joins the network
• It selects a seed node
• It connects to some of the seed’s neighbors
• The process continues recursively
Notes:
• Preferential attachment flavor: the second neighbor is not chosen uniformly at random.
• Copying flavor: the new node burns (copies) the seed’s neighbors.
• Hierarchical flavor: the seed acts as a parent.
• “Local” flavor: burning stays “near” the seed vertex, in a diffusion sense.
As a community grows, it blends into the core of the network.
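The steps above can be sketched as a small generator. This is a simplified, undirected variant under stated assumptions: a single hypothetical burning probability `p`, and a geometrically distributed number of neighbors burned at each step (mean p / (1 − p)). The model in [LKF05] is directed and burns forward and backward links with separate probabilities, so treat this as an illustration, not the reference implementation.

```python
import random
from collections import defaultdict
from typing import Dict, Set


def forest_fire(n: int, p: float, seed: int = 0) -> Dict[int, Set[int]]:
    """Grow an undirected graph with a simplified Forest Fire process."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    rng = random.Random(seed)
    adj: Dict[int, Set[int]] = defaultdict(set)
    adj[0] = set()                              # the first node has no links
    for v in range(1, n):
        ambassador = rng.randrange(v)           # new node picks a seed node at random
        burned = {ambassador}
        frontier = [ambassador]
        while frontier:                         # the fire spreads recursively
            w = frontier.pop()
            x = 0                               # geometric count, mean p / (1 - p)
            while rng.random() < p:
                x += 1
            unburned = [u for u in sorted(adj[w]) if u not in burned]
            rng.shuffle(unburned)
            for u in unburned[:x]:              # burn x of w's unburned neighbors
                burned.add(u)
                frontier.append(u)
        for u in burned:                        # link the new node to every burned node
            adj[v].add(u)
            adj[u].add(v)
    return dict(adj)


g = forest_fire(1000, 0.35, seed=42)
print(len(g), sum(len(nbrs) for nbrs in g.values()) // 2)  # nodes, edges
```

Because every new node links to its seed and to nodes burned near it, new nodes tend to land inside existing local neighborhoods, which is the “local” and “copying” flavor described in the notes.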
Forest Fire NCP plot
[Figure: Forest Fire NCP plot, including a rewired network]
Typical cluster size
How does the size of the best cluster scale with the size of the network?
Size of best cluster over time
Cluster size remains constant over time (even if one allows nesting)
[Figure: LinkedIn network over time]
Cluster size vs. network size
Each dot is a different network
Connections
• The Dunbar number: 150 individuals is the maximum community size
• What edges “mean” and community identification
  ▪ Using node and edge types/attributes
• Implications for machine learning
  ▪ No large clusters
  ▪ No/little (assortative) hierarchical structure
  ▪ Can’t be well embedded: no underlying geometry
The small-world of the MSN Instant Messenger
Joint work with Eric Horvitz, Microsoft Research
The Small-world experiment
[Figure: Milgram’s small-world experiment]
The Small-world experiment [Milgram ’67, Dodds-Muhamad-Watts ’03]: people send letters from Nebraska to Boston.
How many steps does it take? 6.2 on average, hence “6 degrees of separation”.
The Small-world experiment shows two things:
1) Short paths exist in a social network
2) People are able to find them (using only partial knowledge of the network)
Local search: forwarding a message
[Figure: source s and target t at distance d(s,t) = h; good nodes: d = h-1; bad nodes: d ≥ h]
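A sketch of how the good/bad split can be computed when the whole graph is known: one breadth-first search from the target t gives d(u, t) for every node, and a neighbor of s is good exactly when its distance is h − 1. The fraction of good neighbors is then the chance that forwarding to a random neighbor makes progress. The toy graph and the names `distances_to` and `split_neighbors` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

Adj = Dict[int, List[int]]


def distances_to(adj: Adj, t: int) -> Dict[int, int]:
    """BFS from the target t: hop distance d(u, t) for every reachable u."""
    dist = {t: 0}
    queue = deque([t])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def split_neighbors(adj: Adj, s: int, t: int) -> Tuple[List[int], List[int]]:
    """Good neighbors of s have d = h - 1; bad ones have d >= h, where h = d(s, t)."""
    dist = distances_to(adj, t)
    h = dist[s]
    good = [u for u in adj[s] if dist[u] == h - 1]
    bad = [u for u in adj[s] if dist[u] >= h]
    return good, bad


adj = {0: [1, 2, 3], 1: [0, 4], 2: [0, 6], 3: [0], 4: [1, 5],
       5: [4, 7], 6: [2, 7], 7: [6, 5]}
good, bad = split_neighbors(adj, s=0, t=5)
print(good, bad)                    # [1] [2, 3]
print(len(good) / len(adj[0]))      # chance a random neighbor is good: 1/3
```

In an undirected graph a neighbor of s is at distance h − 1, h, or h + 1 from t, so “bad” covers the last two cases.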
Our dataset: Instant Messaging
[Figure: contact (buddy) list; messaging window]
MSN communication
We collected the data for June 2006: 4.5 TB of compressed data
• 245 million users logged in
• 180 million users engaged in conversations
• 255 billion exchanged messages
• 1 billion conversations per day
MSN network
The network: 180M nodes, 1.3B undirected edges
MSN: path lengths
[Figure: MSN Messenger network, number of steps between pairs of people]
Avg. path length: 6.6
90% of the people can be reached in < 8 hops

Hops Nodes
0 1
1 10
2 78
3 3,96
4 8,648
5 3,299,252
6 28,395,849
7 79,059,497
8 52,995,778
9 10,321,008
10 1,955,007
11 518,410
12 149,945
13 44,616
14 13,740
15 4,476
16 1,542
17 536
18 167
19 71
20 29
21 16
22 10
23 3
24 2
25 3
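A table of this shape (nodes per hop count) can be produced by breadth-first search. The slide does not say whether the counts come from a single source or from sampled pairs, so this sketch simply assumes single-source BFS on a toy path graph and adds a helper that reports the mean hop count and the fraction reached in fewer than a given number of hops. The names `hop_histogram` and `summarize` are hypothetical.

```python
from collections import Counter, deque
from typing import Dict, List, Tuple


def hop_histogram(adj: Dict[int, List[int]], source: int) -> Dict[int, int]:
    """Number of nodes at each hop distance from source (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dict(sorted(Counter(dist.values()).items()))


def summarize(hist: Dict[int, int], below: int) -> Tuple[float, float]:
    """Mean hop count and fraction reached in fewer than `below` hops (source excluded)."""
    reached = {h: c for h, c in hist.items() if h > 0}
    total = sum(reached.values())
    mean = sum(h * c for h, c in reached.items()) / total
    frac = sum(c for h, c in reached.items() if h < below) / total
    return mean, frac


# Path graph 0 - 1 - 2 - 3 - 4, BFS from node 0.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
hist = hop_histogram(adj, 0)
print(hist)                  # {0: 1, 1: 1, 2: 1, 3: 1, 4: 1}
print(summarize(hist, 3))    # (2.5, 0.5)
```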
Degree distribution:
[Figure annotation: a node that exchanged messages with ~2 million people]
Robustness of shortest paths
Short paths exist and they are robust
[Figure legend: all links; both-way links; randomized network (same degree distribution)]
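The slide does not say how the randomized network was built. One common way to randomize a graph while keeping every node's degree is repeated double-edge swaps, sketched below under that assumption (NetworkX ships a comparable routine, `double_edge_swap`). The function name `degree_preserving_rewire` and the attempt cap are hypothetical choices.

```python
import random
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def degree_preserving_rewire(edges: List[Edge], n_swaps: int, seed: int = 0) -> List[Edge]:
    """Randomize an undirected simple graph while keeping every node's degree.

    Each accepted swap turns edges (a, b), (c, d) into (a, d), (c, b);
    swaps that would create self-loops or duplicate edges are rejected.
    """
    rng = random.Random(seed)
    edges = list(edges)
    present: Set[frozenset] = {frozenset(e) for e in edges}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c                          # choose one of the two possible swaps
        if len({a, b, c, d}) < 4:
            continue                             # shared endpoint: loop or no-op
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue                             # would create a multi-edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


# A 4-cycle with a chord plus a separate 4-cycle.
orig = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (4, 5), (5, 6), (6, 7), (7, 4)]
rewired = degree_preserving_rewire(orig, n_swaps=20, seed=3)
print(rewired)
```

Comparing shortest-path lengths on the original and rewired graphs separates effects of the degree sequence from effects of the actual link structure.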
Learning to search in a network
What decision function makes me forward the message to a particular neighbor on the way to the target?
[Figure: source s and target t at distance d(s,t) = h; good nodes: d = h-1; bad nodes: d ≥ h]
What are the characteristics of shortest paths? How hard is it to find them?
Does geography help?
How hard is it to find a good node?
Probability of success if we forward the message to a random neighbor
Algorithm accuracy at each hop
Use a decision tree to learn a classifier:
• Model: 0.4128
• Random: 0.0207
The learned model
[Figure: the green bar is the probability that a node is good]
Comparing search heuristics
Pick a pair of nodes, start at s, and walk until hitting the target t, where the next node is chosen by:

Search alg.   % found   Mean path length
Random        0.0008    3,709
MinGeoDist    0.0282    778
MaxDeg        0.0158    4,964
Deg/Geo2      0.1446    2,676
Cntry         0.0108    402
Cntry*Deg     0.1313    3,114
Lang          0.0055    1,699
Lang*Deg      0.0496    3,163
Age           0.0012    2,890
Age*Deg       0.0203    5,324

It works! (in a network with 180 million nodes)
• Milgram’s path completion is 29%
• Dodds, Muhamad, Watts: 0.015% completion
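The slide names the heuristics but not their exact formulas or tie-breaking rules. The sketch below implements two of them (MinGeoDist and MaxDeg) on a toy grid whose node positions stand in for geography. The “prefer unvisited neighbors” rule, the random tie-breaking, and the step limit are assumptions, and all function names are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Tuple

Adj = Dict[int, List[int]]
Pos = Dict[int, Tuple[float, float]]
Score = Callable[[int, int], float]


def decentralized_search(adj: Adj, s: int, t: int, score: Score,
                         max_steps: int = 1000, seed: int = 0) -> Optional[int]:
    """Forward a message from s toward t using only local information.

    The message goes to the neighbor with the highest score(neighbor, t),
    breaking ties at random and preferring neighbors not visited yet.
    Returns the number of hops to t, or None if t is not reached.
    """
    if s == t:
        return 0
    rng = random.Random(seed)
    current, visited = s, {s}
    for step in range(1, max_steps + 1):
        neighbors = adj[current]
        if not neighbors:
            return None                          # dead end
        if t in neighbors:
            return step                          # final hop goes straight to t
        candidates = [u for u in neighbors if u not in visited] or neighbors
        best = max(score(u, t) for u in candidates)
        current = rng.choice([u for u in candidates if score(u, t) == best])
        visited.add(current)
    return None


def min_geo_dist(pos: Pos) -> Score:
    """MinGeoDist-style score: geographically closer to t is better."""
    return lambda u, t: -math.dist(pos[u], pos[t])


def max_deg(adj: Adj) -> Score:
    """MaxDeg-style score: higher degree is better (ignores t)."""
    return lambda u, t: len(adj[u])


def grid(n: int) -> Tuple[Adj, Pos]:
    """n x n grid graph whose node positions are its grid coordinates."""
    adj: Adj = {}
    pos: Pos = {}
    for x in range(n):
        for y in range(n):
            pos[x * n + y] = (float(x), float(y))
            adj[x * n + y] = [(x + dx) * n + (y + dy)
                              for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                              if 0 <= x + dx < n and 0 <= y + dy < n]
    return adj, pos


adj, pos = grid(5)
print(decentralized_search(adj, 0, 24, min_geo_dist(pos)))  # 8 hops
print(decentralized_search(adj, 0, 24, max_deg(adj)))
```

On a grid, geographic greedy routing always has a neighbor closer to the target, so it finds a shortest path; degree alone carries no information about the target, which is one way to see why combined scores such as degree with geography can do better on real networks.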
Conclusions and reflections
Why are networks the way they are?
• Only recently have basic properties been observed on a large scale
  ▪ Confirms some social science intuitions; calls others into question
• Benefits of working with large data
  ▪ Observe structures not visible at smaller scales