Source: cazabetremy.fr/Teaching/catedra/2-IdentifyNE.pdf
DISCOVERING IMPORTANT NODES AND EDGES
MACRO-MICRO
• First course: description of the graph at the macro level
• Second course: micro level
• How to describe each element?
• How to find “exceptional” elements?
NODE
• We can measure the importance of a node using so-called centrality measures.
• Misleading term: in general, it has nothing to do with being central
• Common practice: compute many centralities and check how they relate to the properties/identity of the nodes
NODE DEGREE
• Degree: how many neighbors
• Often enough to find important nodes
‣ Main characters of a series talk with the most people
‣ The largest airports have the most connections
‣ …
• But not always
‣ Facebook users with the most friends are often spam accounts
‣ The web pages/Wikipedia pages with the most links are simple lists of references
‣ …
NODE DEGREE
• In directed networks, degree is split into:
‣ In-degree
‣ Out-degree
• Example: web pages
‣ Highest out-degree: list of references
‣ Highest in-degree: a website that attracts a lot of links, so probably an interesting one
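A minimal sketch of the in-degree/out-degree split, in plain Python. The link list and page names are hypothetical toy data, not taken from the course material.

```python
from collections import Counter

# Hypothetical toy web graph: each pair (source, target) is one hyperlink.
links = [
    ("list", "a"), ("list", "b"), ("list", "hub"),
    ("a", "hub"), ("b", "hub"), ("c", "hub"),
]

# Out-degree counts links leaving a page; in-degree counts links arriving.
out_degree = Counter(src for src, _ in links)
in_degree = Counter(dst for _, dst in links)

top_out = max(out_degree, key=out_degree.get)
top_in = max(in_degree, key=in_degree.get)
print(top_out, out_degree[top_out])  # list 3
print(top_in, in_degree[top_in])     # hub 4
```

The page called `list` plays the role of the list of references (highest out-degree), while `hub` is the page that attracts links (highest in-degree).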
NODE STRENGTH
• Strength: the equivalent of degree in a weighted network
• Sum of the weights of the edges incident to the node
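A small sketch comparing degree and strength on the same graph. The edge list and its weights are hypothetical values chosen for illustration.

```python
from collections import defaultdict

# Hypothetical weighted undirected edges: (node, node, weight).
weighted_edges = [
    ("a", "b", 5.0),
    ("a", "c", 1.0),
    ("b", "c", 2.0),
    ("c", "d", 0.5),
]

degree = defaultdict(int)
strength = defaultdict(float)
for u, v, w in weighted_edges:
    for node in (u, v):
        degree[node] += 1      # one more incident edge
        strength[node] += w    # add the weight of that edge

most_connected = max(degree, key=degree.get)
strongest = max(strength, key=strength.get)
```

Here `c` has the highest degree (3 edges) but `b` has the highest strength (7.0), so the two rankings can disagree.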
NODE CLUSTERING COEFFICIENT
• Clustering coefficient: already seen for global analysis
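• Here: the local (per-node) version
• Tells you how densely the neighbors of the node are connected to each other
• Be careful!
‣ Degree 2: a single pair of neighbors, so the value is 0 or 1
‣ Degree 1000: usually neither 0 nor 1
‣ Ranking nodes of very different degrees by this value is not meaningful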
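The local clustering coefficient can be sketched as the fraction of neighbor pairs that are themselves linked. The function name, the toy graph and the convention of returning 0 for nodes with fewer than two neighbors are assumptions made for this example.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of pairs of neighbors of `node` that are connected to each other."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention: no pair of neighbors, score 0
    linked = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return linked / (k * (k - 1) / 2)

# Hypothetical undirected toy graph, stored as adjacency sets.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d", "e"},
    "d": {"c"},
    "e": {"c"},
}
```

Node `a` has degree 2 and its only pair of neighbors is linked, so it scores 1. Node `c` has degree 4 and only one of its six neighbor pairs is linked, so it scores 1/6, which illustrates why values for low-degree and high-degree nodes are hard to compare.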
NODE CLUSTERING COEFFICIENT
• Can be used as a proxy for community membership:
‣ If a node belongs to a single group: high CC
‣ If a node belongs to several groups: lower CC
NODE BETWEENNESS
• Betweenness centrality:
‣ 1) Compute all shortest paths between all pairs of nodes
‣ 2) Count the fraction of them going through the node
• Idea: if the node is “between” many nodes, then it is important.
• Related to the notion of “flow” of information in the network
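The two steps can be sketched with Brandes' algorithm, a standard way to compute betweenness on unweighted graphs; it is swapped in here for illustration and is not necessarily what the course uses. The function name and the toy graph (two triangles joined through a node `x`) are assumptions, and the scores are left unnormalized.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness for an unweighted, undirected graph (Brandes)."""
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Step 1: BFS from s, counting shortest paths (sigma) and predecessors.
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Step 2: accumulate path fractions from the farthest nodes back to s.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was seen twice, once from each endpoint.
    return {v: c / 2 for v, c in score.items()}

# Hypothetical toy graph: triangles a-b-c and d-e-f joined through x.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "x"},
    "x": {"c", "d"},
    "d": {"x", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
scores = betweenness(adj)
```

On this graph `x` gets the highest score (9, one for each pair with one endpoint in each triangle) although its degree is only 2, which is the "between many nodes" idea of the slide.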
NODE BETWEENNESS
• Betweenness centrality: $C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$
‣ $\sigma_{st}$: number of shortest paths from s to t
‣ $\sigma_{st}(v)$: number of those paths going through v
• Computationally expensive on large graphs (Brandes' algorithm needs O(nm) time on an unweighted graph with n nodes and m edges)
• Common approximation:
‣ Compute shortest paths from only k randomly chosen nodes (e.g. k = 100)
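One reading of this approximation is to run the shortest-path computation from k randomly chosen source nodes only, then rescale by n / k. The sketch below makes that assumption; the function names, the `seed` parameter and the toy path graph are hypothetical. networkx exposes a comparable option through the `k` argument of `betweenness_centrality`.

```python
import random
from collections import deque

def single_source_dependencies(adj, s):
    """One Brandes pass: contribution of the shortest paths that start at s."""
    sigma = dict.fromkeys(adj, 0)
    dist = dict.fromkeys(adj, -1)
    preds = {v: [] for v in adj}
    sigma[s], dist[s] = 1, 0
    order, queue = [], deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if dist[w] < 0:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                preds[w].append(v)
    delta = dict.fromkeys(adj, 0.0)
    for w in reversed(order):
        for v in preds[w]:
            delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
    delta[s] = 0.0  # the source itself is never "between" anything
    return delta

def sampled_betweenness(adj, k, seed=0):
    """Estimate unnormalized betweenness from k random sources, rescaled by n / k."""
    nodes = sorted(adj)
    sources = random.Random(seed).sample(nodes, k)
    score = dict.fromkeys(adj, 0.0)
    for s in sources:
        for v, d in single_source_dependencies(adj, s).items():
            score[v] += d
    scale = len(nodes) / k / 2  # /2: undirected pairs are seen from both ends
    return {v: c * scale for v, c in score.items()}

# Hypothetical toy graph: the path a - b - c - d - e.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
exact = sampled_betweenness(path, k=5)      # all sources: exact values
estimate = sampled_betweenness(path, k=2)   # two sources: rough estimate
```

With k equal to the number of nodes the result is exact; with a smaller k the cost drops proportionally and the scores become estimates whose quality depends on which sources were drawn.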
NODE PAGERANK
• Idea: ranking webpages by relevance
• Problems with in-degree:
‣ Easy to fool
‣ Where the link comes from is ignored
NODE PAGERANK
• Solution: give each node an “authority” score, which in turn determines the score of the nodes it links to
• Interpretation:
‣ Likelihood of reaching a particular page by clicking links at random
• Parameter:
‣ Probability of a random hop to any page, to avoid dead-end biases
• Computation:
‣ Principal eigenvector of the normalized link matrix (including random hops)
‣ Power method: random walks on the graph
NODE PAGERANK
• Interpretation: A node is important if many important nodes are linking to it.
• Often correlated with in-degree
• Helps find the top of hierarchical structures:
‣ Commoners talk to their local deputy, deputies talk to ministers, ministers talk to the president: the president has a low in-degree but a high PageRank.
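A minimal power-iteration sketch of PageRank, applied to the hierarchy example above. Several choices are assumptions made for illustration: the damping factor 0.85 (the slides only say there is a random-hop parameter), the uniform redistribution of the score of dead-end pages, the fixed number of iterations, and the hypothetical hierarchy of 12 commoners, 4 deputies, 2 ministers and 1 president.

```python
from collections import Counter

def pagerank(out_links, damping=0.85, iterations=100):
    """Power iteration on a dict {page: [pages it links to]}."""
    nodes = list(out_links)
    n = len(nodes)
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(iterations):
        # Random hop: every page receives an equal share of (1 - damping).
        new = dict.fromkeys(nodes, (1.0 - damping) / n)
        for page, targets in out_links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dead end (assumed handling): spread its score over all pages.
                for t in nodes:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank

# Hypothetical hierarchy: an edge u -> v means "u talks to v".
out_links = {"president": []}
for m in range(2):
    out_links[f"minister{m}"] = ["president"]
for d in range(4):
    out_links[f"deputy{d}"] = [f"minister{d // 2}"]
for c in range(12):
    out_links[f"commoner{c}"] = [f"deputy{c // 3}"]

in_degree = Counter(t for targets in out_links.values() for t in targets)
rank = pagerank(out_links)
top = max(rank, key=rank.get)
```

In this toy hierarchy each deputy has in-degree 3 while the president has in-degree 2, yet `top` is the president: its two incoming links come from nodes that are themselves highly ranked.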
NODE PAGERANK
• So how does Google rank pages when we run a search?
• Create a subgraph of the documents related to our topic
• Compute PageRank on that subgraph
• (Of course now it is much more complex…)
EIGENVECTOR CENTRALITY
• Score of a node: its entry in the eigenvector associated with the largest eigenvalue of the adjacency matrix
• Crude version of PageRank
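A short power-iteration sketch of eigenvector centrality for an undirected graph. The function name, the iteration count and the star-shaped toy graph are assumptions; iterating with A + I instead of A is a common trick (it keeps the same eigenvectors but avoids oscillation on bipartite graphs such as this star).

```python
def eigenvector_centrality(adj, iterations=200):
    """Leading eigenvector of the adjacency matrix, by power iteration."""
    x = dict.fromkeys(adj, 1.0)
    for _ in range(iterations):
        # One multiplication by (A + I): own score plus the neighbors' scores.
        new = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = sum(val * val for val in new.values()) ** 0.5
        x = {v: val / norm for v, val in new.items()}
    return x

# Hypothetical toy graph: a star with one hub and four leaves.
adj = {
    "hub": {"l0", "l1", "l2", "l3"},
    "l0": {"hub"}, "l1": {"hub"}, "l2": {"hub"}, "l3": {"hub"},
}
x = eigenvector_centrality(adj)
```

For this star the largest eigenvalue of the adjacency matrix is 2, so the hub ends up with exactly twice the score of each leaf.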
KATZ CENTRALITY
• Variant of PageRank and eigenvector centrality
• Katz centrality of node i: $C_{\mathrm{Katz}}(i) = \sum_{k=1}^{\infty} \sum_{j=1}^{n} \alpha^{k} (A^{k})_{ij}$
‣ Sum over k: repeat for all distances, for as long as possible (until convergence)
‣ Sum over j: sum over every node j
‣ $\alpha$ is a parameter; the weight $\alpha^{k}$ decreases at each iteration
‣ $(A^{k})_{ij}$: number of different paths of length k from i to j
• In short: the sum of the paths to all other nodes at each distance, multiplied by a factor that decreases with distance
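The description above can be sketched as a truncated sum: at each distance k, count the paths (walks) of length k leaving each node and add them with weight alpha to the power k. The function name, the default `alpha=0.1` and the cut-off `max_length` are assumptions for this example; the infinite sum only converges when alpha is smaller than the inverse of the largest eigenvalue of the adjacency matrix.

```python
def katz_centrality(adj, alpha=0.1, max_length=50):
    """Truncated Katz sum: sum over k of alpha**k * (walks of length k from i)."""
    walks = dict.fromkeys(adj, 1.0)  # one walk of length 0 from every node
    score = dict.fromkeys(adj, 0.0)
    factor = 1.0
    for _ in range(max_length):
        # A walk of length k from i is an edge i-j followed by a
        # walk of length k-1 from j.
        walks = {i: sum(walks[j] for j in adj[i]) for i in adj}
        factor *= alpha  # the weight decreases with distance
        score = {i: score[i] + factor * walks[i] for i in adj}
    return score

# Hypothetical toy graph: the path a - b - c.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
two_steps = katz_centrality(adj, alpha=0.5, max_length=2)
full = katz_centrality(adj)
```

With `alpha=0.5` and two steps, node `a` gets 0.5 × 1 + 0.25 × 2 = 1.0 and node `b` gets 0.5 × 2 + 0.25 × 2 = 1.5, which can be checked by hand against the formula.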
NODE CLOSENESS
• Farness: sum of the lengths of the shortest paths to all other nodes
• Closeness: inverse of the farness
‣ Highest closeness = most central
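Farness and closeness follow directly from a breadth-first search. This sketch assumes an unweighted, connected graph and uses the plain inverse of the farness, as on the slide (libraries often also normalize by the number of nodes). The function names and the toy path graph are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path length (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def closeness(adj, node):
    """Inverse of the farness (sum of distances to all other nodes)."""
    farness = sum(bfs_distances(adj, node).values())
    return 1.0 / farness if farness else 0.0

# Hypothetical toy graph: the path a - b - c - d - e.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
```

The middle node `c` has farness 1 + 1 + 2 + 2 = 6 and closeness 1/6, while the end node `a` has farness 10 and closeness 0.1.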
HARMONIC CENTRALITY
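• Harmonic centrality: a variant of closeness centrality that sums inverse distances instead of inverting the sum of distances
‣ Closeness: $C(i) = 1 / \sum_{j \neq i} d(i, j)$
‣ Harmonic: $H(i) = \sum_{j \neq i} 1 / d(i, j)$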
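Harmonic centrality is usually defined as the sum of the inverse distances to all other nodes, with unreachable nodes contributing 0, so it stays well defined on disconnected graphs. The sketch below uses that definition; the function names and the toy graph (a path plus a separate pair of nodes) are assumptions.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path length (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def harmonic(adj, node):
    """Sum of 1 / distance over reachable nodes; unreachable nodes add 0."""
    dist = bfs_distances(adj, node)
    return sum(1.0 / d for v, d in dist.items() if v != node)

# Hypothetical disconnected toy graph: path a - b - c, and a separate pair x - y.
adj = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b"},
    "x": {"y"}, "y": {"x"},
}
```

On this graph `b` scores 2.0, `a` scores 1.5 and `x` scores 1.0, whereas a closeness based on the sum of distances would need a special rule for the infinite distances between the two components.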
OTHERS
• Many other centralities have been proposed
• The problem is how to interpret them
• Can be used as a supervised tool:
‣ Compute many centralities on all nodes
‣ Learn how to combine them to find chosen nodes
‣ Discover new similar nodes
‣ (roles in social networks, key elements in an infrastructure, …)
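A rough sketch of the workflow above: build one feature vector of centralities per node, then look for nodes that resemble a chosen example. A plain nearest-neighbor lookup stands in for the learning step here, which is a deliberate simplification; the three features, the function names and the toy graph are all assumptions.

```python
from collections import deque
from itertools import combinations

def features(adj, node):
    """Feature vector (degree, local clustering, closeness) for one node."""
    nbrs = adj[node]
    k = len(nbrs)
    pairs = k * (k - 1) / 2
    linked = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    clustering = linked / pairs if pairs else 0.0
    # Closeness: inverse of the sum of BFS distances.
    dist, queue = {node: 0}, deque([node])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    farness = sum(dist.values())
    return (float(k), clustering, 1.0 / farness if farness else 0.0)

def most_similar(adj, example):
    """Rank the other nodes by Euclidean distance to the example's features."""
    table = {v: features(adj, v) for v in adj}
    ref = table[example]

    def gap(v):
        return sum((a - b) ** 2 for a, b in zip(table[v], ref)) ** 0.5

    return sorted((v for v in adj if v != example), key=gap)

# Hypothetical toy graph: triangles a-b-c and d-e-f joined through x.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "x"},
    "x": {"c", "d"},
    "d": {"x", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
ranking = most_similar(adj, "c")
```

Starting from the example node `c`, the first result is `d`, the node that plays the same structural role in the other triangle. In practice the features would be rescaled and fed to a real classifier, since raw degree dominates the distance here.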
Which is which?
[Figure: panels to match with Harmonic, Closeness, Betweenness, Eigenvector, Katz, Degree]
A: Betweenness, B: Closeness, C: Eigenvector, D: Degree, E: Harmonic, F: Katz
Try again :)
[Figure: panels to match with Degree, Betweenness, Closeness, Eigenvector]
A: Degree, B: Closeness, C: Betweenness, D: Eigenvector
EDGE CENTRALITIES
EDGES
• Most centralities can be computed for edges
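• Methods based on flow are more natural for edges:
‣ Edge betweenness centrality: how many shortest paths go through the edge
‣ $C_B(e) = \sum_{s \neq t} \frac{\sigma_{st}(e)}{\sigma_{st}}$, with $\sigma_{st}$ the number of shortest paths from s to t and $\sigma_{st}(e)$ the number of those going through e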
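Edge betweenness can be sketched with the same breadth-first machinery as node betweenness, crediting each edge instead of each node. The function name and the toy graph (two triangles joined by a single bridge) are assumptions; scores are unnormalized and each unordered pair of nodes is counted once.

```python
from collections import deque

def edge_betweenness(adj):
    """Unnormalized edge betweenness for an unweighted, undirected graph."""
    score = {tuple(sorted((u, v))): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s: count shortest paths and record predecessors.
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, crediting each edge (v, w).
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                credit = sigma[v] / sigma[w] * (1 + delta[w])
                score[tuple(sorted((v, w)))] += credit
                delta[v] += credit
    # Each unordered pair was seen twice, once from each endpoint.
    return {e: c / 2 for e, c in score.items()}

# Hypothetical toy graph: triangles a-b-c and d-e-f joined by the bridge c-d.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
edges = edge_betweenness(adj)
bridge = max(edges, key=edges.get)
```

The bridge `("c", "d")` carries all 9 shortest paths between the two triangles, while an edge inside a triangle such as `("a", "b")` only carries the single path between its endpoints.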
EDGES
Can you guess the edges of highest betweenness in the European rail network?
K-PATH EDGE CENTRALITY
s: source node
k-path: random walk of length k
CURRENT-FLOW BETWEENNESS
Analogy with an electrical circuit
How much current flows through the node when one unit is injected at a random node and collected at another random node (on average)
The current flowing through the i-th vertex is half the sum of the absolute values of the currents flowing along the edges incident to that vertex
Average the current flow over all pairs
CURRENT-FLOW BETWEENNESS
Also called random-walk betweenness
Average probability of going through the edge in a random walk from U to V, over all pairs of nodes (U, V)
COMMUNICABILITY BETWEENNESS CENTRALITY
Number of shortest paths of length < s
Number of shortest paths of length > s
Score for paths going through r / score for all paths
EDGE CENTRALITIES
• And of course, many more
SOME EXAMPLES ON REAL NETWORKS
WIKIPEDIA
• What are the most important pages on Wikipedia?
• Wikipedia network:
‣ Nodes are pages
‣ Links are hypertext links
• English Wikipedia: cultural bias!
• Results from http://wikirank-2018.di.unimi.it
WIKIPEDIA
(2018)
WIKIPEDIA
Movies
WIKIPEDIA
Colombian personalities
WIKIPEDIA
Colombia
WIKIPEDIA
https://www.sixdegreesofwikipedia.com
(side note: shortest paths in Wikipedia)
USAGES OF CENTRALITIES
• Identifying important nodes/edges
‣ Search on the web or in any document base
‣ Recommendation (products…)
‣ Social network analysis (criminal networks…)
• Identifying critical nodes/edges:
‣ High betweenness: a “bridge”, whose disappearance affects the flow
‣ High PageRank: in dependency graphs, many elements depend on that element (supply chain, production line…)
• Visualisation: size of nodes determined by centrality
LIBRARIES FOR GRAPH MANIPULATION
THE FASTEST
• Stanford Network Analysis Project (SNAP) library
• http://snap.stanford.edu
• Built by Jure Leskovec (Prof. at Stanford)
• C++ / Python interface
• Not many built-in functions but all the building-blocks
• Single machine (advantages and drawbacks)
THE MOST STATISTICAL
• graph-tool
• https://graph-tool.skewed.de
• C++ / Python interface
• Richer than SNAP, fast
• Best for statistical inference (Stochastic Block Model, see later)
• Cons: often difficult to install
THE SIMPLEST
• NetworkX
• https://networkx.github.io
• Python
• Very simple syntax
• Many functions already implemented
• Cons: does not scale to large graphs
AND THE OTHERS
• Big Data frameworks
‣ Apache Giraph: http://giraph.apache.org
‣ Spark GraphX: https://spark.apache.org/graphx/
‣ + Efficient in distributed, multi-computer environments
‣ - Few functions, poorly documented
• Java
‣ Jung (http://jung.sourceforge.net)
‣ GraphStream (http://graphstream-project.org) => dynamic graphs
‣ (poorly maintained)
• Other famous one:
‣ igraph (http://igraph.org) (R/C/Python) (2nd best for many things)
‣ …
NETWORKX
• For this class, I propose to use networkx
‣ Already included in Anaconda (standard Python package)
‣ Easiest to use (in my opinion)
• You are free to try another library; they all have strengths and weaknesses
NETWORKX
• Where to start:
• Tutorial on the website
‣ https://networkx.github.io/documentation/stable/tutorial.html
• The “Reference” page lists all methods, organized by category
NETWORKX
• Drawing with networkx: not recommended
• Better to export the graph and load it in Gephi
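A possible export workflow, assuming networkx is installed: store a centrality as a node attribute, write the graph to a GEXF file (a format Gephi can open), and read it back as a quick check. The toy graph, the attribute name `betweenness` and the temporary file location are assumptions for this example.

```python
import os
import tempfile

import networkx as nx

# Hypothetical toy graph standing in for the real data set.
G = nx.Graph()
G.add_edges_from([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")])

# Store a centrality as a node attribute so that Gephi can map it to node size.
nx.set_node_attributes(G, nx.betweenness_centrality(G), "betweenness")

# Write the graph to a temporary .gexf file.
path = os.path.join(tempfile.mkdtemp(), "graph.gexf")
nx.write_gexf(G, path)

# Reading the file back is a quick check that nodes, edges and attributes survived.
H = nx.read_gexf(path)
```

In Gephi, the exported attribute then appears in the node table and can be used in the appearance settings to size or color nodes.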
NETWORKX
• Proposed working environment:
‣ Python 3
‣ Jupyter notebook
- Perfect for experimenting (avoids reloading graphs…)
- Makes work easily reproducible
‣ IDE/text editor for more complex functions
- You can call these functions from your notebook
• Additional libraries
‣ Pandas (handling tabular data: similar to R, spreadsheet logic)
‣ Seaborn (for faster plotting)
‣ Sklearn (data mining)
‣ => They simplify things, they do not add complexity :)
NETWORKX
Demo (for airports: file with coordinates and countries)
1) Compute other centralities/network measures
2) Centrality of airports in Colombia?
3) What if we take only South American countries?
4) Can we compare graphs by continent?
5) The graph of connections between countries?
6) Exporting and plotting the graph…