Scalable overlay Networks - University of Helsinki · 12.02.2018 HELSINGIN YLIOPISTO HELSINGFORS...
Transcript of Scalable overlay Networks - University of Helsinki · 12.02.2018 HELSINGIN YLIOPISTO HELSINGFORS...
12.02.2018
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI
Faculty of SciencesDepartment of Computer Science
1Scalableoverlay networks
Scalable overlay Networks
Dr. Samu Varjonen12.02.2018
12.02.2018
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI
Faculty of SciencesDepartment of Computer Science
2Scalableoverlay networks
Lectures● MO 15.01. C122 Introduction. Exercises. Motivation.
● TH 18.01. DK117 Unstructured networks I
● MO 22.01. C122 Unstructured networks II
● TH 25.01. DK117 Bittorrent and evaluation
● MO 29.01. C122 Privacy (Freenet etc.) and intro to power-law networks.
● TH 01.02. DK117 Consistent hashing. Distributed Hash Tables (DHTs)
● MO 05.02. C122 DHTs continued
● TH 08.02. DK117 Power-law networks
● MO 12.02. C122 Power-law networks and applications.
● TH 15.02. DK117 Applications I
● MO 19.02. C122 Applications I
● TH 22.02. DK117 Advanced topics
● MO 26.02. C122 Conclusions and summary
● TH 01.03. DK117 Reserved
12.02.2018
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI
Faculty of SciencesDepartment of Computer Science
3Scalableoverlay networks
Contents
● Today
– Small recap
– Small world
– Power-law networks
12.02.2018
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI
Faculty of SciencesDepartment of Computer Science
4Scalableoverlay networks
Birds Eye View of Complex Networks
● Milgram's Experiment
– Small diameter(number of hops small)
● Duncan Watts Model
– Random Rewiring of Regular Graph
– Graph with small diameter, high clustering coefficient(many real-world networks have a small average shortest path length, but also a clustering coefficient significantly higher than expected by random chance)
12.02.2018
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI
Faculty of SciencesDepartment of Computer Science
5Scalableoverlay networks
Graph Propertiesthat you will hear in the context of Overlay
Networks
12.02.2018
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI
Faculty of SciencesDepartment of Computer Science
6Scalableoverlay networks
Local Clustering Coefficient
● Degree to which vertices of a graph tend to cluster together
● Quantifies how close neighbors of a vertex are to being a clique (complete graph)
● Computing clustering coefficient of vertex
– : Neighborhood (immediately connected vertices)
– : the number vertices in
– Number of edges in neighborhood of :
– Local Clustering Coefficient =
eij == e
ji
12.02.2018
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI
Faculty of SciencesDepartment of Computer Science
7Scalableoverlay networks
Local Clustering CoefficientExample
12.02.2018
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI
Faculty of SciencesDepartment of Computer Science
8Scalableoverlay networks
Average Clustering Coefficient
12.02.2018
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI
Faculty of SciencesDepartment of Computer Science
9Scalableoverlay networks
Graph
● Distance between two vertices
– Number of vertices in the shortest path between ● Diameter: Greatest distance (eccentricity)
between any pair of vertices
● Radius: Minimum distance (of those eccentricities) between any pair of vertices
12.02.2018
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI
Faculty of SciencesDepartment of Computer Science
10Scalableoverlay networks
Graph
● Larger eccentricities occur at the sides of the graph
● The smallest eccentricity occurs at the central vertex
– This is your radius
– Your center consists of all the vertices which have eccentricity equal to the radius
● The vertices of the center minimizes the maximum distance to any vertex of the graph
12.02.2018
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI
Faculty of SciencesDepartment of Computer Science
11Scalableoverlay networks
Scale-Free networks
● ”A scale-free network is a connected graph or network with the property that the number of links k originating from a given node exhibits a power law distribution”
– At least asymptotically
12.02.2018
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI
Faculty of SciencesDepartment of Computer Science
12Scalableoverlay networks
Power-law
– A power law is a functional relationship between two quantities, where a relative change in one quantity results in a proportional relative change in the other quantity, independent of the initial size of those quantities: one quantity varies as a power of another
12.02.2018
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI
Faculty of SciencesDepartment of Computer Science
13Scalableoverlay networks
Power-law
c : slope (gradient) of log-log plota : constant (uninteresting)
The slope of a log-log plot gives the power of the relationship, and a straight line is an indication that a power relationship exists.
12.02.2018
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI
Faculty of SciencesDepartment of Computer Science
14Scalableoverlay networks
Power-law distribution
Selfsimilar & ScalefreeHeavy tail Small occurrences (x) extremely common Large occurrences (x) extremely rare
Micheal Mitzenmacher. "A brief history of generative models for power law and lognormal distributions." Internet mathematics 1.2 (2004): 226251.
12.02.2018
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI
Faculty of SciencesDepartment of Computer Science
15Scalableoverlay networks
Scale Free Networks
● Probability that a node has k links is
● exponent for the degree distribution
● Why the name Scale-Free?
– Same functional form across all scales ● e.g. = 1 → P(x) = 1/x
12.02.2018
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI
Faculty of SciencesDepartment of Computer Science
16Scalableoverlay networks
Scale Free NetworksWhat is the slope?
12.02.2018
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI
Faculty of SciencesDepartment of Computer Science
17Scalableoverlay networks
Scale-Free Model for AS-Graph
= 2.18
AS Topology of skitter dataset parsed by SNAP team http://snap.stanford.edu/data/asskitter.html
12.02.2018
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI
Faculty of SciencesDepartment of Computer Science
18Scalableoverlay networks
Preferential Attachment (Growth in Scale-Free Network)
● Initial number of nodes m0
● At time 't' add a new vertex j with degree m (< m0)
● The probability that vertex j creates a link with vertex i is the probability π which depends on ki
(the degree of vertex i)
● Probability that a node has k links is
● exponent for the degree distribution
Barabási, AlbertLászló, and Réka Albert. "Emergence of scaling in random networks." Science 286, no. 5439 (1999): 509512.
What does this mean?What does this mean? Those who are already wealthy receive more than those who are not
12.02.2018
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI
Faculty of SciencesDepartment of Computer Science
19Scalableoverlay networks
Preferential Attachment
● We start with a small base graph
● At each time (t) step
● We create a new node (u)
● Draw its edges according to a predetermined distribution
● In particular, node u is joined to an existing node v with probability proportional to deg(v).
12.02.2018
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI
Faculty of SciencesDepartment of Computer Science
20Scalableoverlay networks
Evolving Copying ModelObservations & Assumptions
● Observations of the Web-graph
– The graph is a directed graph
– Average degree of vertex is largely constant over-time
– Disjoint instances of bipartite cliques
– On creation, vertex adds edge(s) to existing vertices● Assumptions made by the model
– The directed graph evolves over discrete timesteps
– Some vertices choose their outgoing edges independently at random, while others replicate existing linkage patterns by copying edges from a random vertex
Ravi Kumar et al. "Stochastic models for the web graph." In Annual Symposium on Foundations of Computer Science, 2000.
12.02.2018
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI
Faculty of SciencesDepartment of Computer Science
21Scalableoverlay networks
Generic Evolving Graph Model
Ravi Kumar et al. "Stochastic models for the web graph." In Annual Symposium on Foundations of Computer Science, 2000.
An evolving graph model can be completely characterized by
returns an integer denoting the number of vertices to be added at time
returns the set of edges to be added at time
12.02.2018
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI
Faculty of SciencesDepartment of Computer Science
22Scalableoverlay networks
Evolving Copying ModelLinear Growth Copying
● Parameters
– Constant out-degree
– Copy factor ● At time instant
– Add a single new vertex
– is a copy of a prototype vertex
– The outlink from is randomly chosen from with a probability and with probability it is the outlink from
Ravi Kumar et al. "Stochastic models for the web graph." In Annual Symposium on Foundations of Computer Science, 2000.
12.02.2018
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI
Faculty of SciencesDepartment of Computer Science
23Scalableoverlay networks
Evolving copying model
● Start with a small base graph
● At each time (t) step
– Create a new node (u)
– Choose an existing node v uniformly at random ● For each edge vw,
– with probability p, add the edge uw.
● Hence, the neighbourhood of the new node u will be a subset of the neighbourhood of the existing node v.
12.02.2018
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI
Faculty of SciencesDepartment of Computer Science
24Scalableoverlay networks
Evolving copying model
v v
t t+1Time
u
12.02.2018
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI
Faculty of SciencesDepartment of Computer Science
25Scalableoverlay networks
Zipf’s Law (example of (discrete) power-law)
● Sort words in a corpus in the decreasing order of their frequency
● The frequency of any word is inversly proportional to its rank in teh frequency tablei.e., the most common word will appear twice as often as the second, which will appear twive as often as...
https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists/PG/2006/04/110000
Ranking Word Frequency
1 the 56271872
5 in 17420636
10 he 8397205
50 when 1980046
100 men 923053
500 paid 162605
1000 names 79366
12.02.2018
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI
Faculty of SciencesDepartment of Computer Science
26Scalableoverlay networks
Examples
http://www.cs.uoi.gr/~tsap/teaching/InformationNetworks/lectures/lecture3.ppt1
● Pareto: income distribution, 1897● Zipf-Auerbach: city sizes,
1913/1940s● Zipf-Estouf: word frequency,
1916/1940s● Lotka: bibliometrics, 1926● Yule: species and genera, 1924● Mandelbrot: economics/information
theory, 1950s● 1990- networking community● 2000- online social networks
”Universe counts in powers rather than linear progression” - S. Weingart
12.02.2018
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI
Faculty of SciencesDepartment of Computer Science
27Scalableoverlay networks
Scale Free for Gnutella?
Gnutella August 2002 dataset parsed by SNAP team http://snap.stanford.edu/data/p2pGnutella31.html
”Gnutella overlay network has the small world characters, but it is not a scale-free network, which has developed over time following a different set of growth processes from those of the BA (Barabdsi-Albert) model.”
Analyzing the Characteristics of Gnutella Overlays, http://ieeexplore.ieee.org/document/4151850/
12.02.2018
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI
Faculty of SciencesDepartment of Computer Science
28Scalableoverlay networks
Scale Free Networks
● Node degree distribution follows Power-law
● Can be created using
– Preferential attachment (undirected graphs)
– Copying Generative Models (directed graphs)
– … ● How realistic are these models?
– Do graphs undergo densification (degree increases with time)?
– Does the diameter decrease with time? Jure Leskovec et al. "Graphs over time: densification laws, shrinking diameters and possible explanations." In ACM SIGKDD, pp. 177187. 2005.
12.02.2018
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI
Faculty of SciencesDepartment of Computer Science
29Scalableoverlay networks
When to use a particular model?
● What is the objective of your study?
● Why do you need the model?
● Which model will you use for
(a) Properties of a snapshot of a graph
(b) Evolution of the graph
(c) …
12.02.2018
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI
Faculty of SciencesDepartment of Computer Science
30Scalableoverlay networks
Random Rewiring Procedure
p=0 p=1(p=0.15)
n: number of verticesk: degree of each node, i.e., neighbours of a vertex (n >> k >> ln(n) >> 1) p: probability of rewiring
n=10, k=4
Large diameterHigh Clustering
Small diameterHigh Clustering
Small diameterLow Clustering
Duncan Watts and Steven H. Strogatz. "Collective dynamics of ‘smallworld’networks." nature 393, no. 6684 (1998): 440442.
12.02.2018
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI
Faculty of SciencesDepartment of Computer Science
31Scalableoverlay networks
Path Length & Clustering Coefficient in Small-World
Networks
n=1000, k=10
Duncan Watts and Steven H. Strogatz. "Collective dynamics of ‘smallworld’networks." nature 393, no. 6684 (1998): 440442.
Regular SmallWorld Random
12.02.2018
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI
Faculty of SciencesDepartment of Computer Science
32Scalableoverlay networks
Decentralized Search in Small World Networks
(Motivation)● How would you find the shortest path if you
have the complete picture of the graph?
– What will be the information required to be present on each node?
● Can you find the shortest path of small world graph if nodes only have local information?
– My coordinates on the graph
– Coordinates of my neighbors
– Which algorithm would you use?
12.02.2018
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI
Faculty of SciencesDepartment of Computer Science
33Scalableoverlay networks
Manhattan distance
● The distance between two points in a grid based on a strictly horizontal and/or vertical path (that is, along the grid lines), as opposed to the diagonal or "as the crow flies" distance.
– Simple sum of the horizontal and vertical components.
– Based on the well-known gridlikestreet geography of the New York borough of Manhattan
12.02.2018
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI
Faculty of SciencesDepartment of Computer Science
34Scalableoverlay networks
Kleinbergs Small World
● Lattice
● Each node has short-range links to its neighbors
● Long range connection to a node selected with probability proportional to . Where is the Manhattan distance between and
● : the clustering exponent, probability of connection as a function of lattice distance
v
a
d
u
b
cr
Jon Kleinberg. "Navigation in a small world."Nature 406, no. 6798 (2000): 845845.
12.02.2018
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI
Faculty of SciencesDepartment of Computer Science
35Scalableoverlay networks
What does Klienbergs Model abstract?
● abstracts the randomness of long links
– = 0: uniform distribution over long-range contacts (similar to rewiring graph)
– As increases, long range contacts become more clustered
● Model can be extended to d-dimensional lattices
12.02.2018
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI
Faculty of SciencesDepartment of Computer Science
36Scalableoverlay networks
Kleinberg's Result● Assumption:
● Each node only has local information (neighbors)● d : lattice dimension n x n x …. (d times)● p : short-range links to vertices in p lattice steps● q : long-range links computed with exponent α
● Greedy Algorithm: Each message holder forwards message across a connection that brings it as close as possible to the target in lattice distance
● Message delivery time T of any decentralized algorithm
– T ≥ cnβ for β = (2 – α)/3 for 0 ≤ α < 2 and β = (α-2)/(α-1) for α > 2
– T bounded by polynomial of logN when α = 2
– c depends on α, p, and q but not n
12.02.2018
HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI
Faculty of SciencesDepartment of Computer Science
37Scalableoverlay networks
Implications of the result
● Decentralized Search requires a large number of steps (much larger than the shortest path)
– Small world captures clustering and small paths but not the ability actually find the shortest paths
– ”When this correlation (between local structure and long-range connections) is near a critical threshold, the structure of the long-range connections forms a type of gradient that allows individuals to guide a message efficiently towards a target. As the correlation drops below this critical value and the social network becomes more homogeneous, these cues begin to disappear; in the limit, when long-range connections are generated uniformly at random, the result is a world in which short chains exist but individuals, faced with a disorienting array of social contacts, are unable to find them.”
The threshold (alpha / clustering coefficient) q = 2