Centrality Measures and Link Analysis
Gonzalo Mateos, Dept. of ECE and Goergen Institute for Data Science
University of Rochester
http://www.ece.rochester.edu/~gmateosb/
February 21, 2020
Network Science Analytics Centrality Measures and Link Analysis 1
Centrality measures

- Centrality measures
- Case study: Stability of centrality measures in weighted graphs
- Centrality, link analysis and web search
- A primer on Markov chains
- PageRank as a random walk
- PageRank algorithm leveraging Markov chain structure
Quantifying vertex importance

- In network analysis many questions relate to vertex importance

Example

- Q1: Which actors in a social network hold the 'reins of power'?
- Q2: How authoritative is a WWW page considered by peers?
- Q3: The 'knock-out' of which genes is likely to be lethal?
- Q4: How critical to the daily commute is a subway station?

- Measures of vertex centrality quantify such notions of importance
  ⇒ Degrees are the simplest centrality measures. Let's study others
Closeness centrality

- Rationale: 'central' means a vertex is 'close' to many other vertices
- Def: Distance d(u, v) between vertices u and v is the length of the shortest u-v path. Oftentimes referred to as geodesic distance
- Closeness centrality of vertex v is given by

  $c_{Cl}(v) = \frac{1}{\sum_{u \in V} d(u, v)}$

- Interpret $v^* = \arg\max_v c_{Cl}(v)$ as the most approachable node in G
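As a concrete illustration, the closeness formula can be sketched in a few lines of Python. This is a minimal sketch assuming an unweighted, connected graph, so BFS yields the geodesic distances; the 3-node path graph `adj` is a made-up example, not from the slides.

```python
from collections import deque

def closeness(adj):
    """c_Cl(v) = 1 / sum of shortest-path distances from every node to v."""
    c = {}
    for v in adj:
        dist = {v: 0}
        q = deque([v])
        while q:                      # BFS gives geodesic distances (unit weights)
            u = q.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
        c[v] = 1.0 / sum(dist.values())
    return c

# Toy undirected path graph 1-2-3: the middle node is the most approachable
adj = {1: [2], 2: [1, 3], 3: [2]}
c = closeness(adj)
```

Here node 2 attains the argmax, matching the intuition that it is 'close' to both endpoints.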
Normalization, computation and limitations

- To compare with other centrality measures, often normalize to [0, 1]

  $c_{Cl}(v) = \frac{N_v - 1}{\sum_{u \in V} d(u, v)}$

- Computation: need all pairwise shortest path distances in G
  ⇒ Dijkstra's algorithm in $O(N_v^2 \log N_v + N_v N_e)$ time
- Limitation 1: sensitivity, values tend to span a small dynamic range
  ⇒ Hard to discriminate between central and less central nodes
- Limitation 2: assumes connectivity; if G is disconnected, c_Cl(v) = 0 for all v ∈ V
  ⇒ Compute centrality indices separately in different components
Betweenness centrality

- Rationale: 'central' node is (in the path) 'between' many vertex pairs
- Betweenness centrality of vertex v is given by

  $c_{Be}(v) = \sum_{s \neq t \neq v \in V} \frac{\sigma(s, t \mid v)}{\sigma(s, t)}$

- σ(s, t) is the total number of s-t shortest paths
- σ(s, t|v) is the number of s-t shortest paths through v ∈ V
- Interpret $v^* = \arg\max_v c_{Be}(v)$ as the controller of information flow
Computational considerations

- Notice that an s-t shortest path goes through v if and only if

  $d(s, t) = d(s, v) + d(v, t)$

- Betweenness centralities can be naively computed for all v ∈ V by:
  Step 1: Use Dijkstra to tabulate d(s, t) and σ(s, t) for all s, t
  Step 2: Use the tables to identify σ(s, t|v) for all v
  Step 3: Sum the fractions to obtain c_Be(v) for all v ($O(N_v^3)$ time)
- Cubic complexity can be prohibitive for large networks
- O(N_v N_e)-time algorithm for unweighted graphs in:
  U. Brandes, "A faster algorithm for betweenness centrality," Journal of Mathematical Sociology, vol. 25, no. 2, pp. 163-177, 2001
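The three naive steps above can be sketched directly in Python. This is a sketch for unweighted graphs (so Step 1 uses BFS instead of Dijkstra), exploiting the identity d(s,t) = d(s,v) + d(v,t) together with σ(s,t|v) = σ(s,v)·σ(v,t); the path graph at the end is a made-up example.

```python
from collections import deque
from itertools import permutations

def bfs_counts(adj, s):
    """Step 1: distances d(s,.) and shortest-path counts sigma(s,.) from s."""
    dist, sigma = {s: 0}, {s: 1}
    q = deque([s])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w], sigma[w] = dist[u] + 1, 0
                q.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma

def betweenness(adj):
    d, sig = {}, {}
    for s in adj:
        d[s], sig[s] = bfs_counts(adj, s)
    c = dict.fromkeys(adj, 0.0)
    for s, t in permutations(adj, 2):   # Steps 2-3: identify sigma(s,t|v), sum fractions
        for v in adj:
            # v is on an s-t shortest path iff d(s,t) = d(s,v) + d(v,t),
            # in which case sigma(s,t|v) = sigma(s,v) * sigma(v,t)
            if v not in (s, t) and d[s][t] == d[s][v] + d[v][t]:
                c[v] += sig[s][v] * sig[v][t] / sig[s][t]
    return c

# Undirected path graph 1-2-3: node 2 is between the ordered pairs (1,3) and (3,1)
adj = {1: [2], 2: [1, 3], 3: [2]}
c = betweenness(adj)
```

The triple loop makes the cubic cost explicit; Brandes' algorithm avoids it by accumulating dependencies along the BFS trees.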
Eigenvector centrality

- Rationale: 'central' vertex if 'in-neighbors' are themselves important
  ⇒ Compare with 'importance-agnostic' degree centrality
- Eigenvector centrality of vertex v is implicitly defined as

  $c_{Ei}(v) = \alpha \sum_{(u,v) \in E} c_{Ei}(u)$

[Figure: example digraph on nodes 1-6]

- No one points to 1
- Only 1 points to 2
- Only 2 points to 3, but 2 more important than 1
- 4 as high as 5 with fewer links
- Links to 5 have lower rank
- Same for 6
Eigenvalue problem

- Recall the adjacency matrix A and

  $c_{Ei}(v) = \alpha \sum_{(u,v) \in E} c_{Ei}(u)$

- Vector $c_{Ei} = [c_{Ei}(1), \ldots, c_{Ei}(N_v)]^\top$ solves the eigenvalue problem

  $A c_{Ei} = \alpha^{-1} c_{Ei}$

  ⇒ Typically α^{-1} chosen as the largest eigenvalue of A [Bonacich'87]
- If G is undirected and connected, by Perron's Theorem then
  ⇒ The largest eigenvalue of A is positive and simple
  ⇒ All the entries in the dominant eigenvector c_Ei are positive
- Can compute c_Ei and α^{-1} via $O(N_v^2)$-complexity power iterations

  $c_{Ei}(k+1) = \frac{A c_{Ei}(k)}{\| A c_{Ei}(k) \|}, \quad k = 0, 1, \ldots$
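The power iteration is a one-liner with NumPy. A minimal sketch on a made-up undirected, connected graph (a triangle on nodes 0-2 plus a pendant node 3), chosen non-bipartite so the iteration converges:

```python
import numpy as np

# Adjacency matrix of a triangle 0-1-2 with pendant node 3 attached to node 0
A = np.array([[0., 1., 1., 1.],
              [1., 0., 1., 0.],
              [1., 1., 0., 0.],
              [1., 0., 0., 0.]])

c = np.ones(4) / 2.0               # unit-norm initial guess c_Ei(0)
for _ in range(200):
    c = A @ c
    c = c / np.linalg.norm(c)      # normalize at each power iteration

lam = c @ A @ c                    # Rayleigh quotient estimates 1/alpha
```

Consistent with Perron's Theorem, all entries of `c` come out positive, and the best-connected node 0 gets the largest score.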
Example: Comparing centrality measures

- Q: Which vertices are more central? A: It depends on the context
- Each measure identifies a different vertex as most central
  ⇒ None is 'wrong', they target different notions of importance
Example: Comparing centrality measures

- Q: Which vertices are more central? A: It depends on the context

[Fig. 4.4: Illustration of (b) closeness, (c) betweenness, and (d) eigenvector centrality measures on the graph in (a). Example and figures courtesy of Ulrik Brandes.]

- Small green vertices are arguably more peripheral
  ⇒ Less clear how the yellow, dark blue and red vertices compare
Case study: Stability of centrality measures in weighted graphs
Centrality measures robustness

- Robustness to noise in network data is of practical importance
- Approaches have been mostly empirical
  ⇒ Find average response in random graphs when perturbed
  ⇒ Not generalizable and does not provide explanations
- Characterize behavior in noisy real graphs
  ⇒ Degree and closeness are more reliable than betweenness
- Q: What is really going on?
  ⇒ Framework to study formally the stability of centrality measures
- S. Segarra and A. Ribeiro, "Stability and continuity of centrality measures in weighted graphs," IEEE Trans. Signal Process., 2015
Definitions for weighted digraphs

- Weighted and directed graphs G(V, E, W)
  ⇒ Set V of N_v vertices
  ⇒ Set E ⊆ V × V of edges
  ⇒ Map W : E → R_{++} of weights on each edge

[Figure: weighted digraph on nodes a, b, c with edge weights 5, 4, 2, 3]

- Path P(u, v) is an ordered sequence of nodes from u to v
- When weights represent dissimilarities
  ⇒ Path length is the sum of the dissimilarities encountered
- Shortest path length s_G(u, v) from u to v

  $s_G(u, v) := \min_{P(u,v)} \sum_{i=0}^{\ell - 1} W(u_i, u_{i+1})$
Stability of centrality measures

- Space of graphs G(V,E) with (V, E) as vertex and edge set
- Define the metric $d_{(V,E)}(G, H) : \mathcal{G}(V,E) \times \mathcal{G}(V,E) \to R_+$

  $d_{(V,E)}(G, H) := \sum_{e \in E} | W_G(e) - W_H(e) |$

- Def: A centrality measure c(·) is stable if for any vertex v ∈ V in any two graphs G, H ∈ G(V,E), then

  $| c_G(v) - c_H(v) | \leq K_G \, d_{(V,E)}(G, H)$

- K_G is a constant depending on G only
- Stability is related to Lipschitz continuity in G(V,E)
- Independent of the definition of d_(V,E) (equivalence of norms)
- Node importance should be robust to small perturbations in the graph
Degree centrality

- Sum of the weights of incoming arcs

  $c_{De}(v) := \sum_{u | (u,v) \in E} W(u, v)$

- Applied to graphs where the weights in W represent similarities
- High c_De(v) ⇒ v similar to its large number of neighbors

Proposition 1
For any vertex v ∈ V in any two graphs G, H ∈ G(V,E), we have that

  $| c^G_{De}(v) - c^H_{De}(v) | \leq d_{(V,E)}(G, H)$

i.e., degree centrality c_De is a stable measure

- Can show closeness and eigenvector centralities are also stable
Betweenness centrality

- Look at the shortest paths for every two nodes distinct from v
  ⇒ Sum the proportion that contains node v

  $c_{Be}(v) := \sum_{s \neq v \neq t \in V} \frac{\sigma(s, t \mid v)}{\sigma(s, t)}$

- σ(s, t) is the total number of s-t shortest paths
- σ(s, t|v) is the number of those paths going through v

Proposition 2
The betweenness centrality measure c_Be is not stable
Instability of betweenness centrality

- Compare the value of c_Be(v) in graphs G and H

[Figure: graph G has unit edge weights and c^G_Be(v) = 9; graph H is identical except that two edges incident to v have weight 1 + ε, and c^H_Be(v) = 0]

  ⇒ Centrality value c^H_Be(v) = 0 remains unchanged for any ε > 0

- For small values of ε, graphs G and H become arbitrarily similar

  $9 = | c^G_{Be}(v) - c^H_{Be}(v) | \leq K_G \, d_{(V,E)}(G, H) \to 0$

  ⇒ Inequality cannot hold for any constant K_G
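The same discontinuity can be reproduced numerically. The sketch below is a hypothetical 3-node analogue (a jump of 1 rather than 9, since the slide's exact topology is in the figure): in G the route a-v-b ties with a direct a-b edge, and an arbitrarily small weight perturbation ε removes v from every shortest path.

```python
import heapq
from itertools import permutations

def dijkstra_counts(adj, s, tol=1e-12):
    """Distances and shortest-path counts from s in a weighted graph."""
    dist, sigma, done = {s: 0.0}, {s: 1}, set()
    pq = [(0.0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if u in done:
            continue
        done.add(u)                        # sigma[u] is final once u is popped
        for w, wt in adj[u]:
            nd = d + wt
            if w not in dist or nd < dist[w] - tol:
                dist[w], sigma[w] = nd, sigma[u]
                heapq.heappush(pq, (nd, w))
            elif abs(nd - dist[w]) <= tol:  # tie: another shortest path to w
                sigma[w] += sigma[u]
    return dist, sigma

def betweenness(adj, tol=1e-12):
    d, sig = {}, {}
    for s in adj:
        d[s], sig[s] = dijkstra_counts(adj, s)
    c = dict.fromkeys(adj, 0.0)
    for s, t in permutations(adj, 2):
        for v in adj:
            # v is on an s-t shortest path iff d(s,t) = d(s,v) + d(v,t)
            if v not in (s, t) and abs(d[s][t] - (d[s][v] + d[v][t])) <= tol:
                c[v] += sig[s][v] * sig[v][t] / sig[s][t]
    return c

eps = 1e-6
# G: route a-v-b (length 2) ties with the direct a-b edge (weight 2)
G = {'a': [('v', 1.0), ('b', 2.0)], 'v': [('a', 1.0), ('b', 1.0)],
     'b': [('v', 1.0), ('a', 2.0)]}
# H: identical, except the a-v edge weight grows by eps, breaking the tie
H = {'a': [('v', 1.0 + eps), ('b', 2.0)], 'v': [('a', 1.0 + eps), ('b', 1.0)],
     'b': [('v', 1.0), ('a', 2.0)]}
cG, cH = betweenness(G), betweenness(H)
```

Here d(G,H) = 2ε can be made arbitrarily small while |c_Be^G(v) − c_Be^H(v)| stays at 1, so no constant K_G works.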
Stable betweenness centrality

- Define $G^v = (V^v, E^v, W^v)$ with $V^v = V \setminus \{v\}$, $E^v = E|_{V^v \times V^v}$, $W^v = W|_{E^v}$
  ⇒ G^v obtained by deleting from G node v and the edges connected to v
- Stable betweenness centrality c_SBe(v)

  $c_{SBe}(v) := \sum_{s \neq v \neq t \in V} \left[ s_{G^v}(s, t) - s_G(s, t) \right]$

  ⇒ Captures impact of deleting v on the shortest paths
- If v is (not) in the s-t shortest path, $s_{G^v}(s, t) - s_G(s, t) > (=)\, 0$
  ⇒ Same notion as (traditional) betweenness centrality c_Be

Proposition 3
For any vertex v ∈ V in any two graphs G, H ∈ G(V,E), then

  $| c^G_{SBe}(v) - c^H_{SBe}(v) | \leq 2 N_v^2 \, d_{(V,E)}(G, H)$

i.e., stable betweenness centrality c_SBe is a stable measure
Centrality ranking variation in random graphs

- G_{n,p} graphs with p = 10/n and weights drawn from U(0.5, 1.5)
  ⇒ Vary n from 10 to 200
  ⇒ Perturb multiplying weights by random numbers from U(0.99, 1.01)
- Compare centrality rankings in the original and perturbed graphs

[Plots: mean maximum and mean average change in ranking vs. network size, for degree (CD), closeness (CC), betweenness (CB), eigenvector (CE) and stable betweenness (CSB) centralities]

- Betweenness centrality presents larger maximum and average changes
Centrality ranking variation in random graphs

- Compute probability of observing a ranking change ≥ 5
  ⇒ Plot the histogram giving rise to the empirical probabilities

[Plots: probability of a maximum ranking change greater than 5 vs. network size, and histogram of the maximum change in ranking, for CD, CC, CB, CE and CSB]

- For c_Be some node varies its ranking by 5 positions with high probability
- Long tail in the histogram is evidence of instability
  ⇒ Minor perturbation generates a change of 19 positions
Centrality ranking variation in an airport graph

- Real-world graph based on the air traffic between popular U.S. airports
  ⇒ Nodes are N_v = 25 popular airports
  ⇒ Edge weights are the number of yearly passengers between them

[Plots: probability of a maximum ranking change greater than 1 vs. perturbation size (1000 × δ), and histogram of the maximum change in ranking, for CD, CC, CB, CE and CSB]

- Betweenness centrality still presents the largest variations
Centrality, link analysis and web search
The problem of ranking websites

- Search engines rank pages by looking at the Web itself
  ⇒ Enough information intrinsic to the Web and its structure
- Information retrieval is a historically difficult problem
  ⇒ Keywords vs. complex information needs (synonymy, polysemy)
- Beyond the explosion in scale, unique issues arose with the Web
  - Diversity of authoring styles, people issuing queries
  - Dynamic and constantly changing content
  - Paradigm: from scarcity to abundance
- Finding and indexing documents that are relevant is 'easy'
- Q: Which few of these should the engine recommend?
  ⇒ Key is understanding Web structure, i.e., link analysis
Voting by in-links

Ex: Suppose we issue the query 'newspapers'

- First, use text-only information retrieval to identify relevant pages
- Idea: Links suggest implicit endorsements of other relevant pages
- Count in-links to assess the authority of a page on 'newspapers'
A list-finding technique

- Query also returns pages that compile lists of relevant resources
- These hubs voted for many highly endorsed (authoritative) pages
- Idea: Good lists have a better sense of where the good results are
- Page's hub value is the sum of votes received by its linked pages
Repeated improvement

- Reasonable to weight more the votes of pages scoring well as lists
  ⇒ Recompute votes summing linking page values as lists
- Q: Why stop here? Use also improved votes to refine the list scores
  ⇒ Principle of repeated improvement
Hubs and authorities

- Relevant pages fall in two categories: hubs and authorities
- Authorities are pages with useful, relevant content
  - Newspaper home pages
  - Course home pages
  - Auto manufacturer home pages
- Hubs are 'expert' lists pointing to multiple authorities
  - List of newspapers
  - Course bulletin
  - List of US auto manufacturers
- Rules: Authorities and hubs have a mutual reinforcement relationship
  ⇒ A good hub links to multiple good authorities
  ⇒ A good authority is linked from multiple good hubs
Hubs and authorities ranking algorithm

- Hyperlink-Induced Topic Search (HITS) algorithm [Kleinberg'98]
- Each page v ∈ V has a hub score h_v and authority score a_v
  ⇒ Network-wide vectors $h = [h_1, \ldots, h_{N_v}]^\top$, $a = [a_1, \ldots, a_{N_v}]^\top$

Authority update rule:

  $a_v(k) = \sum_{(u,v) \in E} h_u(k-1)$, for all v ∈ V  ⇔  $a(k) = A^\top h(k-1)$

Hub update rule:

  $h_v(k) = \sum_{(v,u) \in E} a_u(k)$, for all v ∈ V  ⇔  $h(k) = A a(k)$

- Initialize $h(0) = \mathbf{1}/\sqrt{N_v}$, normalize a(k) and h(k) each iteration
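The two update rules translate almost verbatim into NumPy. A minimal sketch on a hypothetical 4-page web (pages 0 and 3 act as hubs, pages 1 and 2 receive the links):

```python
import numpy as np

# Hypothetical directed web graph: edges 0->1, 0->2, 3->1
A = np.zeros((4, 4))
A[0, 1] = A[0, 2] = A[3, 1] = 1.0

h = np.ones(4) / np.sqrt(4)          # h(0) = 1/sqrt(Nv)
for _ in range(100):
    a = A.T @ h                      # authority update: sum hub scores of in-links
    a = a / np.linalg.norm(a)
    h = A @ a                        # hub update: sum authority scores of out-links
    h = h / np.linalg.norm(h)
```

Page 1, with two in-links, emerges as the top authority, and page 0, which points to both authorities, as the top hub; pages with no in-links (or no out-links) keep a zero authority (or hub) score.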
Limiting values

- Define the hub and authority rankings as

  $a := \lim_{k \to \infty} a(k), \quad h := \lim_{k \to \infty} h(k)$

- From the HITS update rules one finds for k = 0, 1, ...

  $a(k+1) = \frac{A^\top A\, a(k)}{\| A^\top A\, a(k) \|}, \quad h(k+1) = \frac{A A^\top h(k)}{\| A A^\top h(k) \|}$

- Power iterations converge to dominant eigenvectors of $A^\top A$ and $A A^\top$

  $A^\top A\, a = \alpha_a^{-1} a, \quad A A^\top h = \alpha_h^{-1} h$

  ⇒ Hub and authority ranks are eigenvector centrality measures
Link analysis beyond the web

Ex: link analysis of citations among US Supreme Court opinions

- Rise and fall of authority of key Fifth Amendment cases [Fowler-Jeon'08]
PageRank

- Node rankings to measure website relevance, social influence
- Key idea: in-links as votes, but 'not all links are created equal'
  ⇒ How many links point to a node (outgoing links irrelevant)
  ⇒ How important are the links that point to a node
- PageRank key to Google's original ranking algorithm [Page-Brin'98]
- Intuition 1: fluid that percolates through the network
  ⇒ Eventually accumulates at the most relevant Web pages
- Intuition 2: random web surfer (more soon)
  ⇒ In the long run, relevant Web pages are visited more often
- PageRank and HITS met with quite different success after 1998
Basic PageRank update rule

- Each page v ∈ V has PageRank r_v, let $r = [r_1, \ldots, r_{N_v}]^\top$
  ⇒ Define $P := (D^{out})^{-1} A$, where $D^{out}$ is the out-degree matrix

PageRank update rule:

  $r_v(k) = \sum_{(u,v) \in E} \frac{r_u(k-1)}{d^{out}_u}$, for all v ∈ V  ⇔  $r(k) = P^\top r(k-1)$

- Split current PageRank evenly among outgoing links and pass it on
  ⇒ New PageRank is the total fluid collected on the incoming links
  ⇒ Initialize $r(0) = \mathbf{1}/N_v$. Flow is conserved, no normalization needed
- Problem: 'spider traps'
  - Accumulate all PageRank
  - Only when not strongly connected
Scaled PageRank update rule

- Apply the basic PageRank rule and scale the result by s ∈ (0, 1)
- Split the leftover (1 − s) evenly among all nodes (evaporation-rain)

Scaled PageRank update rule:

  $r_v(k) = s \times \sum_{(u,v) \in E} \frac{r_u(k-1)}{d^{out}_u} + \frac{1-s}{N_v}$, for all v ∈ V

- Can view as the basic update $r(k) = \bar{P}^\top r(k-1)$ with

  $\bar{P} := s P + (1-s) \frac{\mathbf{1}\mathbf{1}^\top}{N_v}$

  ⇒ Scaling factor s typically chosen between 0.8 and 0.9
  ⇒ Power iteration converges to the dominant eigenvector of $\bar{P}^\top$
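A minimal sketch of the scaled update on a made-up 3-node graph containing a spider trap (node 2 links only to itself), with s = 0.85 as an assumed value in the typical range:

```python
import numpy as np

# Toy directed graph with a spider trap: node 2 only links to itself
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 0., 1.]])
Nv = A.shape[0]
s = 0.85                                   # scaling factor

P = A / A.sum(axis=1, keepdims=True)       # P = (D_out)^{-1} A
P_bar = s * P + (1 - s) * np.ones((Nv, Nv)) / Nv

r = np.ones(Nv) / Nv                       # r(0) = 1/Nv
for _ in range(200):
    r = P_bar.T @ r                        # r(k) = P_bar^T r(k-1)
```

The evaporation-rain term keeps the trap from absorbing everything: node 2 still ranks highest, but the total PageRank remains spread over all nodes.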
A primer on Markov chains
Markov chains

- Consider discrete-time index n = 0, 1, 2, ...
- Time-dependent random state X_n takes values on a countable set
  - In general denote states as i = 0, 1, 2, ..., i.e., here the state space is N
  - If X_n = i we say "the process is in state i at time n"
- Random process is $X_{\mathbb{N}}$; its history up to n is $X^n = [X_n, X_{n-1}, \ldots, X_0]^\top$
- Def: process $X_{\mathbb{N}}$ is a Markov chain (MC) if for all n ≥ 1, i, j ∈ N, x ∈ N^n

  $P\left(X_{n+1} = j \mid X_n = i, X^{n-1} = x\right) = P\left(X_{n+1} = j \mid X_n = i\right) = P_{ij}$

- Future depends only on the current state X_n (memoryless, Markov property)
  ⇒ Future conditionally independent of the past, given the present
Matrix representation

- Group the P_ij in a transition probability "matrix" P

  $P = \begin{pmatrix} P_{00} & P_{01} & P_{02} & \cdots & P_{0j} & \cdots \\ P_{10} & P_{11} & P_{12} & \cdots & P_{1j} & \cdots \\ \vdots & \vdots & \vdots & & \vdots & \\ P_{i0} & P_{i1} & P_{i2} & \cdots & P_{ij} & \cdots \\ \vdots & \vdots & \vdots & & \vdots & \ddots \end{pmatrix}$

  ⇒ Not really a matrix if the number of states is infinite
- Row-wise sums should be equal to one, i.e., $\sum_{j=0}^{\infty} P_{ij} = 1$ for all i
Graph representation

- A graph representation or state transition diagram is also used

[Diagram: chain of states ..., i−1, i, i+1, ... with self-loop probabilities P_{i,i} and transition probabilities P_{i,i+1}, P_{i,i−1} on the arrows]

- Useful when the number of states is infinite; skip arrows if P_ij = 0
- Again, the sum of per-state outgoing arrow weights should be one
Example: Bipolar mood

- I can be happy (X_n = 0) or sad (X_n = 1)
  ⇒ My mood tomorrow is only affected by my mood today
- Model as Markov chain with transition probabilities

  $P = \begin{pmatrix} 0.8 & 0.2 \\ 0.3 & 0.7 \end{pmatrix}$

[Diagram: two-state chain H ↔ S with self-loops 0.8 and 0.7, transitions 0.2 and 0.3]

- Inertia ⇒ happy or sad today, likely to stay happy or sad tomorrow
- But when sad, a little less likely so (P_00 > P_11)
Example: Random (drunkard's) walk

- Step to the right w.p. p, to the left w.p. 1 − p
  ⇒ Not that drunk as to stay in the same place

[Diagram: states ..., i−1, i, i+1, ... with rightward arrows labeled p and leftward arrows labeled 1 − p]

- States are 0, ±1, ±2, ... (state space is Z), infinite number of states
- Transition probabilities are

  $P_{i,i+1} = p, \quad P_{i,i-1} = 1 - p$

- P_ij = 0 for all other transitions
Multiple-step transition probabilities

- Q: What can be said about multiple transitions?
- Probabilities of X_{m+n} given X_m ⇒ n-step transition probabilities

  $P^n_{ij} = P\left(X_{m+n} = j \mid X_m = i\right)$

  ⇒ Define the matrix P(n) with elements P^n_ij

Theorem
The matrix of n-step transition probabilities P(n) is given by the n-th power of the transition probability matrix P, i.e.,

  $P(n) = P^n$

Henceforth we write P^n
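The theorem is easy to check numerically on the two-state mood chain from the earlier example; `matrix_power` computes the n-th matrix power directly:

```python
import numpy as np

P = np.array([[0.8, 0.2],
              [0.3, 0.7]])            # two-state mood chain from the earlier example

P2 = np.linalg.matrix_power(P, 2)     # two-step transition probabilities P(2)
P30 = np.linalg.matrix_power(P, 30)   # 30-step transition probabilities
```

Each power is again row-stochastic, and `P2` agrees with the Chapman-Kolmogorov product P·P.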
Unconditional probabilities

- All probabilities so far are conditional, i.e., $P^n_{ij} = P\left(X_n = j \mid X_0 = i\right)$
  ⇒ May want unconditional probabilities $p_j(n) = P(X_n = j)$
- Requires specification of initial conditions $p_i(0) = P(X_0 = i)$
- Using the law of total probability and the definitions of P^n_ij and p_j(n)

  $p_j(n) = P(X_n = j) = \sum_{i=0}^{\infty} P\left(X_n = j \mid X_0 = i\right) P(X_0 = i) = \sum_{i=0}^{\infty} P^n_{ij}\, p_i(0)$

- In matrix form (define the vector $p(n) = [p_1(n), p_2(n), \ldots]^\top$)

  $p(n) = (P^n)^\top p(0)$
Limiting distributions

- MCs have one-step memory. Eventually they forget the initial state
- Q: What can we say about probabilities for large n?

  $\pi_j := \lim_{n \to \infty} P\left(X_n = j \mid X_0 = i\right) = \lim_{n \to \infty} P^n_{ij}$

  ⇒ Assumed that the limit is independent of the initial state X_0 = i
- We've seen that this problem is related to the matrix power P^n

  $P = \begin{pmatrix} 0.8 & 0.2 \\ 0.3 & 0.7 \end{pmatrix}, \quad P^2 = \begin{pmatrix} 0.7 & 0.3 \\ 0.45 & 0.55 \end{pmatrix}, \quad P^7 = \begin{pmatrix} 0.6031 & 0.3969 \\ 0.5953 & 0.4047 \end{pmatrix}, \quad P^{30} = \begin{pmatrix} 0.6000 & 0.4000 \\ 0.6000 & 0.4000 \end{pmatrix}$

- Matrix product converges ⇒ probs. independent of time (large n)
- All rows are equal ⇒ probs. independent of the initial condition
Limit distribution of ergodic Markov chains

Theorem
For an ergodic (i.e., irreducible, aperiodic, and positive recurrent) MC, $\lim_{n \to \infty} P^n_{ij}$ exists and is independent of the initial state i, i.e.,

  $\pi_j = \lim_{n \to \infty} P^n_{ij}$

Furthermore, the steady-state probabilities π_j ≥ 0 are the unique nonnegative solution of the system of linear equations

  $\pi_j = \sum_{i=0}^{\infty} \pi_i P_{ij}, \quad \sum_{j=0}^{\infty} \pi_j = 1$

- Limit probs. independent of the initial condition exist for an ergodic MC
  ⇒ Simple algebraic equations can be solved to find π_j
Markov chains meet eigenvalue problems

- Define the vector steady-state distribution $\pi := [\pi_0, \pi_1, \ldots, \pi_J]^\top$
- Limit distribution is the unique solution of

  $\pi = P^\top \pi, \quad \pi^\top \mathbf{1} = 1$

- Eigenvector π associated with eigenvalue 1 of $P^\top$
  - Eigenvectors are defined up to a scaling factor
  - Normalize to sum 1
- All other eigenvalues of $P^\top$ have modulus smaller than 1
- Computing π as an eigenvector is computationally efficient
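The eigenvector computation can be sketched with NumPy's dense eigensolver, again using the two-state mood chain from the earlier example:

```python
import numpy as np

P = np.array([[0.8, 0.2],
              [0.3, 0.7]])           # two-state mood chain from the earlier example

# pi is the eigenvector of P^T associated with eigenvalue 1,
# normalized so its entries sum to one
vals, vecs = np.linalg.eig(P.T)
k = np.argmin(np.abs(vals - 1.0))
pi = np.real(vecs[:, k])
pi = pi / pi.sum()
```

The result matches the rows of P^30 computed earlier: the chain spends 60% of the time happy and 40% sad.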
Ergodicity

- Def: The fraction of time $T^{(n)}_i$ spent in the i-th state by time n is

  $T^{(n)}_i := \frac{1}{n} \sum_{m=1}^{n} \mathbb{I}\{X_m = i\}$

- Compute the expected value of $T^{(n)}_i$

  $E\left[T^{(n)}_i\right] = \frac{1}{n} \sum_{m=1}^{n} E\left[\mathbb{I}\{X_m = i\}\right] = \frac{1}{n} \sum_{m=1}^{n} P(X_m = i)$

- As n → ∞, probabilities P(X_m = i) → π_i (ergodic MC). Then

  $\lim_{n \to \infty} E\left[T^{(n)}_i\right] = \lim_{n \to \infty} \frac{1}{n} \sum_{m=1}^{n} P(X_m = i) = \pi_i$

- For ergodic MCs the same is true without the expected value ⇒ Ergodicity

  $\lim_{n \to \infty} T^{(n)}_i = \lim_{n \to \infty} \frac{1}{n} \sum_{m=1}^{n} \mathbb{I}\{X_m = i\} = \pi_i, \quad \text{a.s.}$
Example: Ergodic Markov chain

- Consider an ergodic Markov chain with transition probability matrix

  $P := \begin{pmatrix} 0 & 0.3 & 0.7 \\ 0.1 & 0.5 & 0.4 \\ 0.1 & 0.2 & 0.7 \end{pmatrix}$

[Plots: number of visits to each state $n T^{(n)}_i$ and ergodic averages $T^{(n)}_i$ vs. time, for states 1, 2, 3]

- Ergodic averages slowly converge to $\pi = [0.09, 0.29, 0.61]^\top$
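The convergence of the ergodic averages can be reproduced by simulation. A minimal sketch (the seed and the 50,000-step horizon are arbitrary choices) that compares the time averages against the stationary distribution obtained from the eigenvector of P^T:

```python
import numpy as np

P = np.array([[0.0, 0.3, 0.7],
              [0.1, 0.5, 0.4],
              [0.1, 0.2, 0.7]])      # transition matrix from the example

rng = np.random.default_rng(0)

# Simulate the chain and track the ergodic averages T_i^{(n)}
n_steps = 50_000
x = 0
visits = np.zeros(3)
for _ in range(n_steps):
    x = rng.choice(3, p=P[x])        # draw the next state from row P[x]
    visits[x] += 1
T = visits / n_steps

# Stationary distribution from the eigenvector of P^T with eigenvalue 1
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
pi = pi / pi.sum()
```

With this horizon the empirical fractions land within a couple of percentage points of π ≈ [0.09, 0.30, 0.61].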
PageRank: Random walk formulation
Preliminary definitions

- Graph G = (V, E) ⇒ vertices V = {1, 2, ..., J} and edges E

[Figure: example digraph on nodes 1-6]

- Outgoing neighborhood of i is the set of nodes j to which i points

  $n(i) := \{j : (i, j) \in E\}$

- Incoming neighborhood of i is the set of nodes that point to i:

  $n^{-1}(i) := \{j : (j, i) \in E\}$

- Strongly connected G ⇒ directed path joining any pair of nodes
![Page 50: Centrality Measures and Link Analysis](https://reader030.fdocuments.us/reader030/viewer/2022012619/61a0715770e4ee64db5749fb/html5/thumbnails/50.jpg)
Definition of rank
I Agent A chooses node i , e.g., web page, at random for initial visit
I Next visit randomly chosen between links in the neighborhood n(i)
⇒ All neighbors chosen with equal probability
I If agent A reaches a dead end because node i has no outgoing neighbors

⇒ Choose the next visit uniformly at random among all nodes
I Redefine graph G = (V ,E ) adding edges from dead ends to all nodes
⇒ Restrict attention to connected (modified) graphs
[Figure: modified directed graph with nodes 1, . . . , 6]
I Rank of node i is the average number of visits of agent A to i
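The dead-end modification can be written directly as a transition matrix. A minimal sketch (the function name and the 3-node adjacency matrix are illustrative, not from the slides):

```python
import numpy as np

def transition_matrix(A):
    """Row-stochastic matrix of the equiprobable walk on adjacency A.
    Rows with no outgoing edges (dead ends) are replaced by edges to
    all nodes, i.e., the modified graph G of the slide."""
    A = np.asarray(A, dtype=float)
    out_deg = A.sum(axis=1)
    # dead-end rows become all-ones before row normalization
    P = np.where(out_deg[:, None] > 0, A, 1.0)
    return P / P.sum(axis=1, keepdims=True)

# hypothetical 3-node graph where node 2 is a dead end
A = np.array([[0, 1, 1],
              [0, 0, 1],
              [0, 0, 0]])
P = transition_matrix(A)
```

Node 2's row becomes uniform over all nodes, while the other rows split probability equally among their outgoing neighbors.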
![Page 51: Centrality Measures and Link Analysis](https://reader030.fdocuments.us/reader030/viewer/2022012619/61a0715770e4ee64db5749fb/html5/thumbnails/51.jpg)
Equiprobable random walk
I Formally, let An be the node visited at time n
I Define transition probability Pij from node i into node j
P_ij := P(A_{n+1} = j | A_n = i)

I Next visit equiprobable among i's N_i := |n(i)| neighbors

P_ij = 1/|n(i)| = 1/N_i , for all j ∈ n(i)
[Figure: graph with nodes 1, . . . , 6, edges annotated with their transition probabilities]

I Still have a graph
I But also a MC
![Page 52: Centrality Measures and Link Analysis](https://reader030.fdocuments.us/reader030/viewer/2022012619/61a0715770e4ee64db5749fb/html5/thumbnails/52.jpg)
Formal definition of rank
I Def: Rank r_i of the i-th node is the time average of the number of visits

r_i := lim_{n→∞} (1/n) ∑_{m=1}^n I{A_m = i}

⇒ Define vector of ranks r := [r_1, r_2, . . . , r_J]^T

I Rank r_i can be approximated by the average r_{ni} at time n

r_{ni} := (1/n) ∑_{m=1}^n I{A_m = i}

⇒ Since lim_{n→∞} r_{ni} = r_i , it holds that r_{ni} ≈ r_i for n sufficiently large

⇒ Define vector of approximate ranks r_n := [r_{n1}, r_{n2}, . . . , r_{nJ}]^T
I If modified graph is connected, rank independent of initial visit
![Page 53: Centrality Measures and Link Analysis](https://reader030.fdocuments.us/reader030/viewer/2022012619/61a0715770e4ee64db5749fb/html5/thumbnails/53.jpg)
Ranking algorithm
Output : Vector r(i) with ranking of node i
Input  : Scalar n indicating maximum number of iterations
Input  : Vector N(i) containing number of neighbors of i
Input  : Matrix N(i, j) containing indices j of neighbors of i

m = 1; r = zeros(J,1);                  % Initialize time and ranks
A_0 = random('unid', J);                % Draw first visit uniformly at random
while m < n do
    jump = random('unid', N(A_{m-1}));  % Neighbor uniformly at random
    A_m = N(A_{m-1}, jump);             % Jump to selected neighbor
    r(A_m) = r(A_m) + 1;                % Update ranking for A_m
    m = m + 1;
end
r = r/n;                                % Normalize by number of iterations n
![Page 54: Centrality Measures and Link Analysis](https://reader030.fdocuments.us/reader030/viewer/2022012619/61a0715770e4ee64db5749fb/html5/thumbnails/54.jpg)
Social graph example
I Asked probability students about homework collaboration
I Created (crude) graph of the social network of students in the class
⇒ Used ranking algorithm to understand connectedness
Ex: I want to know how well students are coping with the class
⇒ Best to ask people with higher connectivity ranking
I 2009 data from “UPenn’s ECE440”
![Page 55: Centrality Measures and Link Analysis](https://reader030.fdocuments.us/reader030/viewer/2022012619/61a0715770e4ee64db5749fb/html5/thumbnails/55.jpg)
Ranked class graph
[Figure: ranked graph of the class social network; nodes are labeled with student names]
![Page 56: Centrality Measures and Link Analysis](https://reader030.fdocuments.us/reader030/viewer/2022012619/61a0715770e4ee64db5749fb/html5/thumbnails/56.jpg)
Convergence metrics
I Recall r is the vector of ranks and r_n the vector of rank iterates

I By definition lim_{n→∞} r_n = r . How fast does r_n converge to r (r given)?

I Can measure by the ℓ2 distance between r and r_n

ζ_n := ‖r − r_n‖_2 = ( ∑_{i=1}^J (r_{ni} − r_i)^2 )^{1/2}

I If interest is only in the highest ranked nodes, e.g., in a web search

⇒ Denote by r^(i) the index of the i-th highest ranked node
⇒ Let r_n^(i) be the index of the i-th highest ranked node at time n

I First element wrongly ranked at time n

ξ_n := argmin_i { r^(i) ≠ r_n^(i) }
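Both metrics are straightforward to compute from the two vectors. A sketch (function names are mine, not from the slides):

```python
import numpy as np

def zeta(r, rn):
    """l2 distance between the rank vector r and the iterate rn."""
    return np.linalg.norm(np.asarray(r) - np.asarray(rn))

def xi(r, rn):
    """Smallest i (1-indexed) such that the i-th highest ranked node
    differs between r and rn; returns J + 1 if the orderings agree."""
    order = np.argsort(-np.asarray(r))      # nodes by decreasing true rank
    order_n = np.argsort(-np.asarray(rn))   # nodes by decreasing iterate
    for i, (a, b) in enumerate(zip(order, order_n), start=1):
        if a != b:
            return i
    return len(order) + 1
```

For example, with r = [0.5, 0.3, 0.2] and rn = [0.45, 0.2, 0.35], the top node agrees but the second does not, so ξ = 2.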
![Page 57: Centrality Measures and Link Analysis](https://reader030.fdocuments.us/reader030/viewer/2022012619/61a0715770e4ee64db5749fb/html5/thumbnails/57.jpg)
Evaluation of convergence metrics
[Figure: distance ζ_n (left) and first element wrongly ranked ξ_n (right) versus time n]

I Distance close to 10^-2 in ≈ 5 × 10^3 iterations
I Bad: Two highest ranks in ≈ 4 × 10^3 iterations
I Awful: Six best ranks in ≈ 8 × 10^3 iterations
I (Very) slow convergence
![Page 58: Centrality Measures and Link Analysis](https://reader030.fdocuments.us/reader030/viewer/2022012619/61a0715770e4ee64db5749fb/html5/thumbnails/58.jpg)
When does this algorithm converge?
I Cannot confidently claim convergence until 10^5 iterations

⇒ Beyond this particular case, slow convergence is inherent to the algorithm
[Figure: number of correctly ranked nodes versus time n, up to 2 × 10^5 iterations]
I Example has 40 nodes, want to use in a network with 10^9 nodes!
⇒ Leverage properties of MCs to obtain a faster algorithm
![Page 59: Centrality Measures and Link Analysis](https://reader030.fdocuments.us/reader030/viewer/2022012619/61a0715770e4ee64db5749fb/html5/thumbnails/59.jpg)
PageRank: Fast algorithms
Centrality measures
Case study: Stability of centrality measures in weighted graphs
Centrality, link analysis and web search
A primer on Markov chains
PageRank as a random walk
PageRank algorithm leveraging Markov chain structure
![Page 60: Centrality Measures and Link Analysis](https://reader030.fdocuments.us/reader030/viewer/2022012619/61a0715770e4ee64db5749fb/html5/thumbnails/60.jpg)
Limit probabilities
I Recall the definition of rank ⇒ r_i := lim_{n→∞} (1/n) ∑_{m=1}^n I{A_m = i}

I Rank is the time average of the number of state visits in a MC

⇒ Can as well be obtained from limiting probabilities

I Recall transition probabilities ⇒ P_ij = 1/N_i , for all j ∈ n(i)

I Stationary distribution π = [π_1, π_2, . . . , π_J]^T is the solution of

π_i = ∑_{j∈n^{-1}(i)} P_ji π_j = ∑_{j∈n^{-1}(i)} π_j / N_j   for all i

⇒ Plus the normalization equation ∑_{i=1}^J π_i = 1
I As per ergodicity of MC (strongly connected G ) ⇒ r = π
![Page 61: Centrality Measures and Link Analysis](https://reader030.fdocuments.us/reader030/viewer/2022012619/61a0715770e4ee64db5749fb/html5/thumbnails/61.jpg)
Matrix notation, eigenvalue problem
I As always, can define matrix P with elements P_ij

π_i = ∑_{j∈n^{-1}(i)} P_ji π_j = ∑_{j=1}^J P_ji π_j   for all i

I Right hand side is just the definition of a matrix product, leading to

π = P^T π ,   π^T 1 = 1

⇒ Also added the normalization equation

I Idea: solve a system of linear equations or an eigenvalue problem on P^T

⇒ Requires matrix P available at a central location
⇒ Computationally costly (sparse matrix P with 10^18 entries)
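For graphs small enough to centralize, the eigenvalue problem can be solved directly. A sketch reusing the 3 × 3 chain from the earlier example:

```python
import numpy as np

P = np.array([[0.0, 0.3, 0.7],
              [0.1, 0.5, 0.4],
              [0.1, 0.2, 0.7]])

# pi is the eigenvector of P^T with eigenvalue 1, rescaled so pi^T 1 = 1
vals, vecs = np.linalg.eig(P.T)
k = int(np.argmin(np.abs(vals - 1.0)))   # locate the unit eigenvalue
pi = np.real(vecs[:, k])
pi = pi / pi.sum()                       # enforce the normalization equation
```

Dividing by the sum both normalizes the eigenvector and fixes its sign, since the Perron eigenvector has components of a single sign.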
![Page 62: Centrality Measures and Link Analysis](https://reader030.fdocuments.us/reader030/viewer/2022012619/61a0715770e4ee64db5749fb/html5/thumbnails/62.jpg)
What are limit probabilities?
I Let p_i(n) denote the probability of agent A visiting node i at time n

p_i(n) := P(A_n = i)

I Probabilities at times n + 1 and n can be related

P(A_{n+1} = i) = ∑_{j∈n^{-1}(i)} P(A_{n+1} = i | A_n = j) P(A_n = j)

I Which is, of course, probability propagation in a MC

p_i(n + 1) = ∑_{j∈n^{-1}(i)} P_ji p_j(n)

I By definition the limit probabilities are (let p(n) = [p_1(n), . . . , p_J(n)]^T )

lim_{n→∞} p(n) = π = r
⇒ Compute ranks from limit of propagated probabilities
![Page 63: Centrality Measures and Link Analysis](https://reader030.fdocuments.us/reader030/viewer/2022012619/61a0715770e4ee64db5749fb/html5/thumbnails/63.jpg)
Probability propagation
I Can also write probability propagation in matrix form
p_i(n + 1) = ∑_{j∈n^{-1}(i)} P_ji p_j(n) = ∑_{j=1}^J P_ji p_j(n)   for all i

I Right hand side is just the definition of a matrix product, leading to

p(n + 1) = P^T p(n)

I Idea: can approximate rank by the large-n probability distribution

⇒ r = lim_{n→∞} p(n) ≈ p(n) for n sufficiently large
![Page 64: Centrality Measures and Link Analysis](https://reader030.fdocuments.us/reader030/viewer/2022012619/61a0715770e4ee64db5749fb/html5/thumbnails/64.jpg)
Ranking algorithm
I Algorithm is just a recursive matrix product, a power iteration
Output : Vector r(i) with ranking of node i
Input  : Scalar n indicating maximum number of iterations
Input  : Matrix P containing transition probabilities

m = 1;                  % Initialize time
r = (1/J)*ones(J,1);    % Initial distribution uniform across all nodes
while m < n do
    r = P^T r;          % Probability propagation
    m = m + 1;
end
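The same power iteration in runnable Python (the function name is mine; the 3 × 3 chain from the earlier example is reused for illustration):

```python
import numpy as np

def rank_by_propagation(P, n=100):
    """Probability propagation p(m + 1) = P^T p(m), starting
    from the uniform distribution p(0) = (1/J) 1."""
    J = P.shape[0]
    p = np.full(J, 1.0 / J)
    for _ in range(n):
        p = P.T @ p          # one step of probability propagation
    return p

P = np.array([[0.0, 0.3, 0.7],
              [0.1, 0.5, 0.4],
              [0.1, 0.2, 0.7]])
r = rank_by_propagation(P)
```

Each iteration is one matrix-vector product, so for this small chain the iterates settle on the stationary distribution in far fewer steps than the random-walk simulation.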
![Page 65: Centrality Measures and Link Analysis](https://reader030.fdocuments.us/reader030/viewer/2022012619/61a0715770e4ee64db5749fb/html5/thumbnails/65.jpg)
Interpretation of probability propagation
I Q: Why does the random walk converge so slowly?

I A: Need to register a large number of agent visits to every state

Ex: 40 nodes, say 100 visits to each ⇒ 4 × 10^3 iters.

I Smart idea: Unleash a large number K of agents

r_i = lim_{n→∞} (1/n) ∑_{m=1}^n (1/K) ∑_{k=1}^K I{A_m^k = i}
I Visits are now spread over time and space
⇒ Converges “K times faster”
⇒ But haven’t changed computational cost
![Page 66: Centrality Measures and Link Analysis](https://reader030.fdocuments.us/reader030/viewer/2022012619/61a0715770e4ee64db5749fb/html5/thumbnails/66.jpg)
Interpretation of prob. propagation (continued)
I Q: What happens if we unleash an infinite number of agents K?

r_i = lim_{n→∞} (1/n) ∑_{m=1}^n lim_{K→∞} (1/K) ∑_{k=1}^K I{A_m^k = i}

I Using the law of large numbers and the expected value of the indicator function

r_i = lim_{n→∞} (1/n) ∑_{m=1}^n E[I{A_m = i}] = lim_{n→∞} (1/n) ∑_{m=1}^n P(A_m = i)

I Graph walk is an ergodic MC, so lim_{m→∞} P(A_m = i) exists, and

r_i = lim_{n→∞} (1/n) ∑_{m=1}^n p_i(m) = lim_{n→∞} p_i(n)

⇒ Probability propagation ≈ Unleashing infinitely many agents
![Page 67: Centrality Measures and Link Analysis](https://reader030.fdocuments.us/reader030/viewer/2022012619/61a0715770e4ee64db5749fb/html5/thumbnails/67.jpg)
Distance to rank
I Initialize with uniform probability distribution ⇒ p(0) = (1/J)1
⇒ Plot distance between p(n) and r
[Figure: distance between p(n) and r versus time n, on a logarithmic scale]
I Distance is 10^-2 in ≈ 30 iters., 10^-4 in ≈ 140 iters.
⇒ Convergence two orders of magnitude faster than random walk
![Page 68: Centrality Measures and Link Analysis](https://reader030.fdocuments.us/reader030/viewer/2022012619/61a0715770e4ee64db5749fb/html5/thumbnails/68.jpg)
Number of nodes correctly ranked
I Rank of highest ranked node that is wrongly ranked by time n
[Figure: number of correctly ranked nodes versus time n]
I Not bad: All nodes correctly ranked in 120 iterations
I Good: Ten best ranks in 70 iterations
I Great: Four best ranks in 20 iterations
![Page 69: Centrality Measures and Link Analysis](https://reader030.fdocuments.us/reader030/viewer/2022012619/61a0715770e4ee64db5749fb/html5/thumbnails/69.jpg)
Distributed algorithm to compute ranks
I Nodes want to compute their rank ri
⇒ Can communicate with neighbors only (incoming + outgoing)
⇒ Access to neighborhood information only
I Recall probability update
p_i(n + 1) = ∑_{j∈n^{-1}(i)} P_ji p_j(n) = ∑_{j∈n^{-1}(i)} (1/N_j) p_j(n)

⇒ Uses local information only

I Distributed algorithm. Nodes keep local rank estimates r_i(n)
I Receive rank (probability) estimates r_j(n) from neighbors j ∈ n^{-1}(i)
I Update local rank estimate r_i(n + 1) = ∑_{j∈n^{-1}(i)} r_j(n)/N_j
I Communicate rank estimate r_i(n + 1) to outgoing neighbors j ∈ n(i)
I Only need to know the number of neighbors of my neighbors
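One synchronous round of the distributed update can be simulated centrally to check it. A sketch (the function name and 3-node graph are illustrative, not from the slides):

```python
def distributed_step(r, in_nbrs, out_deg):
    """One round: r_i(n + 1) = sum over incoming neighbors j of r_j(n) / N_j."""
    return [sum(r[j] / out_deg[j] for j in in_nbrs[i])
            for i in range(len(r))]

# edges: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
in_nbrs = [[2], [0], [0, 1]]   # incoming neighborhoods n^{-1}(i)
out_deg = [2, 1, 1]            # N_j = |n(j)|
r = [1/3, 1/3, 1/3]            # uniform initialization
for _ in range(60):
    r = distributed_step(r, in_nbrs, out_deg)
```

Each node only reads the estimates (and out-degrees) of its incoming neighbors, yet the iterates agree with the centralized power iteration; for this toy graph they settle near [0.4, 0.2, 0.4].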
![Page 70: Centrality Measures and Link Analysis](https://reader030.fdocuments.us/reader030/viewer/2022012619/61a0715770e4ee64db5749fb/html5/thumbnails/70.jpg)
Distributed implementation of random walk
I Can communicate with neighbors only (incoming + outgoing)
⇒ But cannot access neighborhood information
⇒ Pass agent (‘hot potato’) around
I Local rank estimates ri (n) and counter with number of visits Vi
I Algorithm run by node i at time n
if Agent received from neighbor then
    V_i = V_i + 1
    Choose random neighbor
    Send agent to chosen neighbor
end
n = n + 1; r_i(n) = V_i/n;
I Speed up convergence by generating many agents to pass around
![Page 71: Centrality Measures and Link Analysis](https://reader030.fdocuments.us/reader030/viewer/2022012619/61a0715770e4ee64db5749fb/html5/thumbnails/71.jpg)
Comparison of different algorithms
I Random walk (RW) implementation
⇒ Most secure. No information shared with other nodes
⇒ Implementation can be distributed
⇒ Convergence exceedingly slow
I System of linear equations
⇒ Least security. Graph in central server
⇒ Distributed implementation not clear
⇒ Convergence not an issue
⇒ But computationally costly to obtain approximate solutions
I Probability propagation
⇒ Somewhat secure. Information shared with neighbors only
⇒ Implementation can be distributed
⇒ Convergence rate acceptable (orders of magnitude faster than RW)
![Page 72: Centrality Measures and Link Analysis](https://reader030.fdocuments.us/reader030/viewer/2022012619/61a0715770e4ee64db5749fb/html5/thumbnails/72.jpg)
Glossary
I Centrality measure
I Closeness centrality
I Dijkstra’s algorithm
I Betweenness centrality
I Information controller
I Eigenvector centrality
I Perron’s Theorem
I Power method
I Information retrieval
I Link analysis
I Repeated improvement
I Hubs and authorities
I HITS algorithm
I PageRank
I Spider traps
I Scaled PageRank updates
I Ergodic Markov chain
I Limiting probabilities
I Random walk on a graph
I Long-run fraction of state visits
I Probability propagation
I Distributed algorithm