Post on 18-Dec-2015
Thrasyvoulos Spyropoulos / spyropou@eurecom.fr Eurecom, Sophia-Antipolis
Random Walk on Graph
Random Walk
- Start from a given node at time t = 0
- Choose a neighbor uniformly at random (possibly including the one you came from) and move there
- Repeat until time t = n
Q1. Where does this converge to as n → ∞?
Q2. How fast does it converge?
Q3. What are the implications for different applications?
Random Walks on Graphs
A node with degree k_i moves to any of its neighbors with probability 1/k_i.
This is a Markov chain! Start at node i: p(0) = (0,0,…,1,…,0,0), and after n steps p(n) = p(0)·A^n.
π = π·A, where π = lim_{n→∞} p(n).
Q: what is π for a random walk on a graph?
Adjacency matrix of the 5-node example:

  0 1 1 0 0
  1 0 1 1 0
  1 1 0 0 1
  0 1 0 0 1
  0 0 1 1 0

Transition matrix (row x gives the move probabilities from node x):

A =
  0     1/k1  1/k1  0     0
  1/k2  0     1/k2  1/k2  0
  1/k3  1/k3  0     0     1/k3
  0     1/k4  0     0     1/k4
  0     0     1/k5  1/k5  0

with degrees k1 = 2, k2 = 3, k3 = 3, k4 = 2, k5 = 2.
Random Walks on Undirected Graphs
Stationarity: π(z) = Σ_x π(x)·p(x,z), where p(x,z) = 1/k_x if z is a neighbor of x.
One could try to solve these global balance equations directly. Not easy!
Instead, define N(z) = {neighbors of z} and try the guess π(x) ∝ k_x:
Σ_{x ∈ N(z)} k_x·p(x,z) = Σ_{x ∈ N(z)} k_x·(1/k_x) = Σ_{x ∈ N(z)} 1 = k_z
Normalize by dividing both sides by Σ_x k_x = 2|E| (|E| = m = number of edges):
Σ_{x ∈ N(z)} (k_x/2|E|)·p(x,z) = k_z/2|E|
So π(x) = k_x/2|E| always satisfies the stationarity equation π = π·P: it is the stationary distribution of a random walk on an undirected graph.
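The claim π(x) = k_x/2|E| can be checked numerically by iterating p(n) = p(0)·A^n. A minimal pure-Python sketch (the 5-node adjacency below is an assumption read off the slide's figure):

```python
# Verifying pi(x) = k_x / 2|E| by power iteration on a 5-node example.
adj = {
    0: [1, 2],
    1: [0, 2, 3],
    2: [0, 1, 4],
    3: [1, 4],
    4: [2, 3],
}
n = len(adj)
m = sum(len(nbrs) for nbrs in adj.values()) // 2   # |E|

# p(t+1) = p(t) A, with a_xy = 1/k_x for each neighbor y of x
p = [1.0] + [0.0] * (n - 1)                        # start at node 0
for _ in range(200):
    nxt = [0.0] * n
    for x, nbrs in adj.items():
        for y in nbrs:
            nxt[y] += p[x] / len(nbrs)
    p = nxt

pi = [len(adj[x]) / (2 * m) for x in range(n)]     # predicted stationary distr.
print([round(v, 6) for v in p])
print([round(v, 6) for v in pi])                   # the two lists agree
```

The graph contains a triangle, so the walk is aperiodic and the iteration converges.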
What about Random Walks on Directed Graphs?
Assign each node centrality 1/n (for n nodes)
[Figure: an 8-node directed graph. Each node starts with centrality 1/8; the stationary centralities are 4/13 for one node, 2/13 for two nodes, and 1/13 for the remaining five.]
A Problematic Graph
Q: What is the problem with this graph?
A: All centrality "points" will eventually flow to F and G and stay there.
Solution: when at node i,
1) with probability β, jump to any of the N nodes, chosen uniformly at random;
2) with probability 1-β, jump to a random out-neighbor of i.
Q: Does this remind you of something?
A: The PageRank algorithm! The PageRank of node i is the stationary probability of a random walk on this (modified) directed graph.
The factor β in PageRank avoids the problem above by "leaking" a small amount of centrality from each node to all other nodes.
PageRank Centrality
PageRank as a Random Walk
A (bored) web surfer:
- follows a link from the current page, with probability 1-β, or
- jumps to a random page (e.g. a new search), with probability β.
The probability of ending up at page X after a large enough time = the PageRank of page X!
PageRank can be generalized with per-node jump probabilities β = (β1, β2, …, βn). On an undirected network, removing the random jump (β → 0) reduces PageRank to degree centrality.
x_i = β/N + (1-β) · Σ_j A_ji · x_j / k_j^out

where A_ji = 1 if node j links to node i, and k_j^out is the out-degree of j.
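The formula above can be iterated to a fixed point. A sketch of power-iteration PageRank (the 4-node directed graph is a made-up example; β = 0.15 is a typical jump probability):

```python
# Power-iteration PageRank on a small directed graph.
beta = 0.15
out = {            # out-links: node -> successors (no dangling nodes here)
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
nodes = sorted(out)
N = len(nodes)

x = {v: 1.0 / N for v in nodes}          # start from the uniform 1/N assignment
for _ in range(100):
    nxt = {v: beta / N for v in nodes}   # centrality "leaked" to every node
    for j in nodes:
        share = (1 - beta) * x[j] / len(out[j])   # x_j split over k_j^out links
        for i in out[j]:
            nxt[i] += share
    x = nxt

print({v: round(x[v], 4) for v in nodes})   # sums to 1; "C" ranks highest
```

Since every node here has at least one out-link, the total centrality is conserved at each step.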
Applications of RW: Measuring Large Networks
We are interested in the properties (degree distribution, path lengths, clustering, connectivity, etc.) of many real networks (Internet, Facebook, YouTube, Flickr, etc.), as these contain much important ($$$) information.
E.g., to plot the degree distribution we would need to crawl the whole network and obtain a degree value for each node. But these networks might contain millions of nodes!!
Online Social Networks (OSNs)
(over 15% of world’s population, and over 50% of world’s Internet users !)
[Table: major OSNs, October 2010, totalling over 1 billion users. Sizes: >500 million, 200 million, 130 million, 100 million, 75 million, 75 million users; corresponding traffic ranks: 2, 9, 12, 43, 10, 29. The site names were logos and are not recoverable from the text.]
Measuring Facebook
Facebook: 500+ million users, 130 friends each (on average), 8 bytes (64 bits) per user ID.
The raw connectivity data, with no attributes: 500M × 130 × 8 B = 520 GB.
To get this data, one would have to download 100+ TB of (uncompressed) HTML!
This is neither feasible nor practical. Solution: sampling!
Measuring Large Networks (for the mere mortals)
Obtaining a complete dataset is difficult:
- companies are usually unwilling to share data, for privacy and performance reasons (e.g. Facebook will ban accounts if it detects extensive crawling);
- there is tremendous overhead in measuring everything (~100 TB for Facebook).
Representative samples are desirable, both to study properties and to test algorithms.
Sampling
What: topology? nodes?
How: directly? by exploration?
(1) Breadth-First-Search (BFS)
[Figure: BFS on an 8-node example graph (nodes A-H), with nodes marked as unexplored, explored, or visited.]
Starting from a seed, BFS explores all neighbor nodes; the process continues iteratively, without replacement.
BFS leads to a bias towards high-degree nodes.
Lee et al, “Statistical properties of Sampled Networks”, Phys Review E, 2006
Early measurement studies of OSNs used BFS as their primary sampling technique, e.g. [Mislove et al.], [Ahn et al.], [Wilson et al.].
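The BFS crawl just described can be sketched in a few lines (the toy graph, seed, and budget below are assumptions for illustration):

```python
from collections import deque

def bfs_sample(adj, seed, budget):
    """Crawl up to `budget` nodes by BFS from `seed` (without replacement)."""
    visited = [seed]
    seen = {seed}
    queue = deque([seed])
    while queue and len(visited) < budget:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen and len(visited) < budget:
                seen.add(v)
                visited.append(v)
                queue.append(v)
    return visited

# Toy graph: hub 0 attached to a ring 1-2-3-4 (a made-up example).
adj = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2, 4], 4: [0, 3]}
sample = bfs_sample(adj, seed=1, budget=3)
print(sample)                        # [1, 0, 2]
avg_deg = sum(len(adj[v]) for v in sample) / len(sample)
print(avg_deg)   # 3.0 > graph average 2.8: the hub is picked up immediately
```

Even this tiny example shows the degree bias: a small BFS sample over-represents the high-degree hub.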
(2) Random Walk (RW)
[Figure: a random walk on the same 8-node graph; the current node has 3 neighbors, each a next candidate with probability 1/3.]
The RW explores the graph one node at a time, with replacement. One can restart from different seeds, or run multiple walks from different seeds in parallel.
Does this lead to a good sample?
Random Walk (RW):
[1] M. Gjoka, M. Kurant, C. T. Butts and A. Markopoulou, “Walking in Facebook: A Case Study of Unbiased Sampling of OSNs”, INFOCOM 2010.
Implications for Random Walk Sampling
Say we collect a small part of the Facebook graph using a RW. There is a higher chance to visit high-degree nodes: high-degree nodes are over-represented and low-degree nodes are under-represented.
[Figure: real degree distribution vs. two candidate sampled degree distributions.]
Random Walk Sampling of Facebook
Q: How can we fix this?
A: Intuition: we need to reduce the probability of visiting high-degree nodes and increase that of low-degree ones.
Real average node degree: 94. Observed (RW-sampled) average node degree: 338.
[Figure: sampled vs. real degree distribution of Facebook.]
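The bias is easy to reproduce: on any graph, a simple RW visits node v with frequency k_v/2|E|, so the average degree of the *visited* nodes exceeds the true average. A sketch on a made-up hub-and-ring graph (all parameters here are assumptions):

```python
import random

random.seed(0)

# Toy hub-and-ring graph: hub 0 is linked to all of 1..9, and nodes 1..9
# also form a ring, so k_0 = 9 and k_v = 3 otherwise.
n = 10
adj = {v: set() for v in range(n)}
for v in range(1, n):
    adj[0].add(v); adj[v].add(0)
    w = v % (n - 1) + 1            # ring successor of v
    adj[v].add(w); adj[w].add(v)
adj = {v: sorted(nb) for v, nb in adj.items()}
deg = {v: len(adj[v]) for v in range(n)}

# Simple random walk, counting visits per node.
visits = {v: 0 for v in range(n)}
cur, steps = 1, 200_000
for _ in range(steps):
    cur = random.choice(adj[cur])
    visits[cur] += 1

two_m = sum(deg.values())          # 2|E|
print(visits[0] / steps, deg[0] / two_m)   # hub: ~0.25, as pi = k/2|E| predicts

# Degree of the average visited node vs. the true average degree:
avg_true = sum(deg.values()) / n
avg_seen = sum(deg[v] * visits[v] for v in range(n)) / steps
print(avg_true, avg_seen)          # the walk over-reports the average degree
```

This mirrors the 94-vs-338 gap observed on Facebook, just at toy scale.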
Markov Chain Monte Carlo (MCMC)
Q: How should we modify the random walk?
A: Use Markov Chain Monte Carlo (MCMC) theory.
Original chain: move x→y with probability Q(x,y); stationary distribution π(x).
Desired chain: stationary distribution w(x) (for uniform sampling, w(x) = 1/N).
New transition probabilities:
P(x,y) = Q(x,y)·a(x,y)                     if y ≠ x
P(x,x) = 1 − Σ_{z ≠ x} Q(x,z)·a(x,z)
MCMC (2)
a(x,y): the probability of accepting the proposed move x→y.
Q: How should we choose a(x,y) so as to converge to the desired stationary distribution w(x)?
A: It suffices that w satisfies the local balance (time-reversibility) equations: w(x)P(x,y) = w(y)P(y,x) for all x,y.
Substituting P(x,y) = Q(x,y)·a(x,y) for y ≠ x gives w(x)Q(x,y)a(x,y) = w(y)Q(y,x)a(y,x). Denote this common value b(x,y) = b(y,x). Since a(x,y) ≤ 1 (it is a probability), b(x,y) ≤ w(x)Q(x,y) and b(x,y) = b(y,x) ≤ w(y)Q(y,x). Taking the largest feasible b yields:

a(x,y) = min{ 1, w(y)Q(y,x) / (w(x)Q(x,y)) }
MCMC for Uniform Sampling
For uniform sampling, w(x) = w(y) (= 1/n, though the exact value doesn't matter), and for a simple random walk Q(x,y) = 1/k_x, so Q(y,x)/Q(x,y) = k_x/k_y. Hence:

a(x,y) = min{ 1, w(y)Q(y,x) / (w(x)Q(x,y)) } = min{ 1, k_x/k_y }

Metropolis-Hastings random walk:
- a move to a lower-degree node is always accepted;
- a move to a higher-degree node is rejected with a probability related to the degree ratio.
Metropolis-Hastings (MH) Random Walk
Explores the graph one node at a time, with replacement.
Transition probabilities:

P^MH(x,y) = (1/k_x)·min(1, k_x/k_y)    if y is a neighbor of x
P^MH(x,x) = 1 − Σ_{y ≠ x} P^MH(x,y)

In the stationary distribution, w(x) = 1/|V| for every node x: the sample is uniform.
[Figure: MH walk on the 8-node example graph. E.g., for current node A with k_A = 3 and neighbor C with k_C = 5:]
P^MH(A,C) = (1/3)·min(1, 3/5) = 1/5
P^MH(A,A) = 1 − (1/3 + 1/3 + 1/5) = 2/15
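The MH walk is only a few lines of code. A sketch on a made-up hub-and-ring graph (the graph, seed, and step count are assumptions), showing that the hub is no longer over-sampled:

```python
import random

random.seed(1)

# Metropolis-Hastings random walk for uniform node sampling.
# Toy hub-and-ring graph: hub 0 linked to all of 1..9, plus a ring on 1..9.
n = 10
adj = {v: set() for v in range(n)}
for v in range(1, n):
    adj[0].add(v); adj[v].add(0)
    w = v % (n - 1) + 1            # ring successor of v
    adj[v].add(w); adj[w].add(v)
adj = {v: sorted(nb) for v, nb in adj.items()}

def mh_step(x):
    """One MH move: propose a uniform neighbor y, accept w.p. min(1, k_x/k_y)."""
    y = random.choice(adj[x])                       # Q(x,y) = 1/k_x
    if random.random() < min(1.0, len(adj[x]) / len(adj[y])):
        return y                                    # move accepted
    return x                                        # rejected: self-loop

visits = {v: 0 for v in range(n)}
cur = 3
steps = 200_000
for _ in range(steps):
    cur = mh_step(cur)
    visits[cur] += 1

# Target w(x) = 1/|V|: every node visited ~10% of the time, hub included.
print({v: round(visits[v] / steps, 3) for v in range(n)})
```

Compare with the plain RW on the same graph, which visits the hub about 25% of the time.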
Degree Distribution of FB with MHRW
The sampled degree distribution is almost identical to the real one.
MCMC methods have MANY other applications, e.g. sampling and optimization.
Node Importance: Who is most “central”?
Node Centrality: Depends on the Application
- Influence: which social network nodes should I pick to advertise or spread a video, product, or opinion?
- Resilience: which node(s) should I attack to disconnect the network?
- Malware/virus infection: which nodes should I immunize (e.g. upload a patch to) to stop a given Internet "worm" from spreading quickly?
- Performance: which nodes are the bottleneck of a network?
- Search engines: which nodes contain the most relevant information?
A centrality measure implicitly solves some such optimization problem.
Centrality: Importance Based on Network Position
In each of the following networks, X has higher centrality than Y according to a particular measure: in-degree, out-degree, betweenness, or closeness.
Degree Centrality
"He who has many friends is most important."
When is the number of connections the best centrality measure?
- people who will do favors for you
- people you can talk to (influence set, information access, …)
- influence of an article in terms of citations (using in-degree)
Normalized Degree Centrality
Divide by the maximum possible degree, i.e. N−1: C'_D(i) = k_i/(N−1).
Betweenness Centrality: Definition

C_B(i) = Σ_{j<k} g_jk(i) / g_jk

where g_jk is the number of shortest paths connecting j and k, and g_jk(i) is the number of those paths that node i is on. In words: the betweenness of vertex i sums, over all pairs j,k, the fraction of shortest paths between j and k that pass through i.

Usually normalized by: C'_B(i) = C_B(i) / n²
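The definition above can be computed exactly with Brandes' shortest-path counting (used here instead of naive path enumeration; the two-triangle "bridge" graph is a made-up example):

```python
from collections import deque

def betweenness(adj):
    """Exact betweenness for an unweighted, undirected graph.
    Each unordered pair {j, k} is counted once."""
    C = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and predecessors.
        sigma = {v: 0 for v in adj}; sigma[s] = 1
        dist = {v: -1 for v in adj}; dist[s] = 0
        preds = {v: [] for v in adj}
        order = []
        q = deque([s])
        while q:
            u = q.popleft(); order.append(u)
            for w in adj[u]:
                if dist[w] < 0:
                    dist[w] = dist[u] + 1
                    q.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Back-propagate pair dependencies.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                C[w] += delta[w]
    # Each undirected pair was counted from both endpoints.
    return {v: c / 2 for v, c in C.items()}

# Two triangles joined by the bridge edge 2-3: nodes 2 and 3 carry all
# cross-cluster shortest paths.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(betweenness(adj))   # nodes 2 and 3 get 6.0 each, all others 0.0
```

The bridge endpoints dominate: every shortest path between the two triangles passes through them.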
Betweenness on Toy Networks
[Figure: non-normalized betweenness values on toy networks, including a "bridge" node joining two clusters.]
Betweenness vs. Degree Centrality
[Figure: nodes sized by degree and colored by betweenness.]
Can you spot nodes with high betweenness but relatively low degree? What about high degree but relatively low betweenness?
Thrasyvoulos Spyropoulos / spyropou@eurecom.fr Eurecom, Sophia-Antipolis
Why is Betweenness Centrality Important? Connectivity
a) Remove a random node
b) Remove the highest-degree node
c) Remove the highest-betweenness node
Why is Betweenness Centrality Important?
The network below is a wireless network (e.g. a sensor network). Nodes run on batteries with total energy Emax. Each node picks a destination at random and sends data at a constant rate; every packet going through a node consumes E of that node's energy.
Q: How long until the first node's battery dies?
[Figure: wireless network with source-destination pairs S1→D1 and S2→D2.]
How About in This Network?
Why is Betweenness Centrality Important? Monitoring
a) Where would you place a traffic monitor to track the maximum number of packets (if this were your university's network)?
b) Where would you place traffic cameras if this were a street network?
Why is Betweenness Centrality Important? Traffic Flow
Each link has capacity 1.
Q: What is the maximum throughput between S and D?
A: By the Max-Flow Min-Cut theorem, the max flow equals the minimum number of links whose removal disconnects S from D. Here, the S-D throughput = 1.
[Figure: example network between source S and destination D.]
Spectral Analysis of (Ergodic) Markov Chains
If a Markov chain (defined by transition matrix P) is ergodic (irreducible, aperiodic, and positive recurrent), then P^(n)_ik → π_k, where π = [π1, π2, …, πn].
Q: But how fast does the chain converge? E.g., how many steps until we are "close enough" to π?
A: This depends on the eigenvalues of P. The convergence time is also called the mixing time.
Eigenvalues and Eigenvectors of matrix P
Left eigenvectors: a row vector π is a left eigenvector of P for eigenvalue λ iff π·P = λ·π, i.e. Σ_k π_k·p_ki = λ·π_i.
Right eigenvectors: a column vector v is a right eigenvector of P for eigenvalue λ iff P·v = λ·v, i.e. Σ_k p_ik·v_k = λ·v_i.
Q: What eigenvalues and eigenvectors can we guess already?
A: λ = 1 is an eigenvalue, with left eigenvector π (the stationary distribution) and right eigenvector v = 1 (all ones, since the rows of P sum to 1).
Eigenvalues and Eigenvectors for 2-State Chains
Both eigenvector equations have non-zero solutions iff (P−λI) is singular, i.e. there exists v ≠ 0 such that (P−λI)·v = 0
⇒ determinant |P−λI| = 0
⇒ (p11−λ)(p22−λ) − p12·p21 = 0
⇒ λ1 = 1, λ2 = 1 − p12 − p21 (substitute back and confirm with some algebra), with |λ2| < 1.
(The eigenvectors are normalized so that π(1) is a stationary distribution AND v(i)·π(i) = 1 for all i.)
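The closed form λ2 = 1 − p12 − p21 can be checked numerically: for a 2-state chain, the distance to π shrinks exactly like λ2^n. A sketch with arbitrary example probabilities:

```python
# Convergence of a 2-state chain: |p(n) - pi| decays as |lambda2|**n.
# The transition probabilities p12, p21 are arbitrary example values.
p12, p21 = 0.3, 0.1
P = [[1 - p12, p12],
     [p21, 1 - p21]]
lam2 = 1 - p12 - p21                          # second eigenvalue
pi = [p21 / (p12 + p21), p12 / (p12 + p21)]   # stationary distribution

def step(p):
    """One application of p -> p P."""
    return [p[0] * P[0][0] + p[1] * P[1][0],
            p[0] * P[0][1] + p[1] * P[1][1]]

p = [1.0, 0.0]                                # start deterministically in state 1
for n in range(1, 21):
    p = step(p)
    predicted = abs(lam2) ** n * abs(1.0 - pi[0])
    print(n, abs(p[0] - pi[0]), predicted)    # the two columns coincide
```

For a 2-state chain the agreement is exact, not just asymptotic, because the deviation from π lives entirely in the λ2 eigendirection.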
Diagonalization
⇒ Eigenvalue decomposition: P = U·Λ·U⁻¹, where the columns of U are the right eigenvectors v(i) and the rows of U⁻¹ are the left eigenvectors π(i).
Q: What is P^(n)?
A: P^n = (U·Λ·U⁻¹)^n = U·Λ^n·U⁻¹.
Q: How fast does the chain converge to the stationary distribution?
A: Exponentially fast in n, as (λ2)^n.
For the 2-state chain, writing U = [v(1) v(2)] and U⁻¹ = [π(1); π(2)]:

P = U·Λ·U⁻¹ = [v(1) v(2)] · diag(λ1, λ2) · [π(1); π(2)]

P^n = U·Λ^n·U⁻¹ = [v(1) v(2)] · diag(λ1^n, λ2^n) · [π(1); π(2)]

Since λ1 = 1 and |λ2| < 1, P^n → v(1)·π(1) = 1·π as n → ∞.
Generalization to M-State Markov Chains
We'll assume there are M distinct eigenvalues (see the notes for repeated ones).
Matrix P is stochastic ⇒ all eigenvalues satisfy |λi| ≤ 1.
Q: Why?
A: Every entry of P^n is a probability, hence bounded by 1; if some |λi| were greater than 1, the corresponding term of P^n = U·Λ^n·U⁻¹ would grow without bound.
Q: How fast does an (ergodic) chain converge to its stationary distribution?
A: Exponentially, with rate given by the second-largest eigenvalue (in modulus).
P^n = U·Λ^n·U⁻¹ = Σ_i λi^n · v(i)·π(i)
    = 1·π + λ2^n · v(2)·π(2) + … + λM^n · v(M)·π(M)

so the deviation from stationarity is dominated by the λ2^n term.
Speed of Sampling on this Network?
λ2 (the second-largest eigenvalue) is related to the (balanced) min-cut of the graph. The more "partitioned" a graph is into clusters with few links between them:
- the longer the convergence (mixing) time of the respective Markov chain;
- the slower random-walk search and sampling.
Community Detection - Clustering
Laplacian
L = D − A, where D is the diagonal degree matrix (d_ii = d_i) and A is the adjacency matrix.
[Figure: a 4-node example graph with its Laplacian.]
Weighted Laplacian
[Figure: the same 4-node graph with edge weights (10, 0.3, 2, 4); in the weighted Laplacian, d_ii is the total weight of the edges at node i and the off-diagonal entries are −w_ij.]
Laplacian: Fast Facts
L·1 = 0, so zero is an eigenvalue of L (with the all-ones eigenvector).
If the graph has k connected components, the eigenvalue 0 has multiplicity k.
λ2 is what Fiedler ('73) called the "algebraic connectivity" of a graph: the further it is from 0, the more connected the graph.
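Building L = D − A and checking that the all-ones vector lies in its null space takes only a few lines (the 4-node graph below is a made-up example):

```python
# Build the (unweighted) Laplacian L = D - A and verify L * 1 = 0,
# i.e. that 0 is an eigenvalue with the all-ones eigenvector.
edges = [(0, 1), (1, 2), (2, 3), (1, 3)]
n = 4

A = [[0] * n for _ in range(n)]
for u, v in edges:
    A[u][v] = A[v][u] = 1

D = [[0] * n for _ in range(n)]
for i in range(n):
    D[i][i] = sum(A[i])                  # degree of node i

L = [[D[i][j] - A[i][j] for j in range(n)] for i in range(n)]

ones = [1] * n
Lx = [sum(L[i][j] * ones[j] for j in range(n)) for i in range(n)]
print(L)
print(Lx)    # the zero vector: every row of L sums to 0
```

Each row of L sums to zero by construction (degree on the diagonal, −1 per incident edge), which is exactly the L·1 = 0 fact above.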
Connected Components
[Figure: a 7-node graph G(V,E) with two connected components, its Laplacian L, and eig(L): the number of zero eigenvalues equals the number of components.]
Connected Components (continued)
[Figure: the same 7-node graph with a weak link joining the two components; the smallest non-zero eigenvalue of L is now ≈0.01. A near-zero eigenvalue indicates a "good cut".]
Spectral Image Segmentation (Shi-Malik ‘00)
The second eigenvector
Second Eigenvector’s sparsest cut