Scalable overlay Networks - University of Helsinki · 12.02.2018 HELSINGIN YLIOPISTO HELSINGFORS...

12.02.2018

HELSINGIN YLIOPISTOHELSINGFORS UNIVERSITETUNIVERSITY OF HELSINKI

Faculty of SciencesDepartment of Computer Science

1Scalableoverlay networks

Scalable overlay Networks

Dr. Samu Varjonen12.02.2018

12.02.2018




Lectures● MO 15.01. C122 Introduction. Exercises. Motivation.

● TH 18.01. DK117 Unstructured networks I

● MO 22.01. C122 Unstructured networks II

● TH 25.01. DK117 Bittorrent and evaluation

● MO 29.01. C122 Privacy (Freenet etc.) and intro to power-law networks.

● TH 01.02. DK117 Consistent hashing. Distributed Hash Tables (DHTs)

● MO 05.02. C122 DHTs continued

● TH 08.02. DK117 Power-law networks

● MO 12.02. C122 Power-law networks and applications.

● TH 15.02. DK117 Applications I

● MO 19.02. C122 Applications I

● TH 22.02. DK117 Advanced topics

● MO 26.02. C122 Conclusions and summary

● TH 01.03. DK117 Reserved

12.02.2018




Contents

● Today

– Small recap

– Small world

– Power-law networks

12.02.2018




Birds Eye View of Complex Networks

● Milgram's Experiment

– Small diameter(number of hops small)

● Duncan Watts Model

– Random Rewiring of Regular Graph

– Graph with small diameter, high clustering coefficient(many real-world networks have a small average shortest path length, but also a clustering coefficient significantly higher than expected by random chance)

12.02.2018




Graph Propertiesthat you will hear in the context of Overlay

Networks

12.02.2018




Local Clustering Coefficient

● Degree to which vertices of a graph tend to cluster together

● Quantifies how close neighbors of a vertex are to being a clique (complete graph)

● Computing clustering coefficient of vertex

– : Neighborhood (immediately connected vertices)

– : the number vertices in

– Number of edges in neighborhood of :

– Local Clustering Coefficient =

eij == e

ji

12.02.2018




Local Clustering CoefficientExample

12.02.2018




Average Clustering Coefficient

12.02.2018




Graph

● Distance between two vertices

– Number of vertices in the shortest path between ● Diameter: Greatest distance (eccentricity)

between any pair of vertices

● Radius: Minimum distance (of those eccentricities) between any pair of vertices

12.02.2018




Graph

● Larger eccentricities occur at the sides of the graph

● The smallest eccentricity occurs at the central vertex

– This is your radius

– Your center consists of all the vertices which have eccentricity equal to the radius

● The vertices of the center minimizes the maximum distance to any vertex of the graph

12.02.2018




Scale-Free networks

● ”A scale-free network is a connected graph or network with the property that the number of links k originating from a given node exhibits a power law distribution”

– At least asymptotically

12.02.2018




Power-law

– A power law is a functional relationship between two quantities, where a relative change in one quantity results in a proportional relative change in the other quantity, independent of the initial size of those quantities: one quantity varies as a power of another

12.02.2018




Power-law

c : slope (gradient) of log-log plota : constant (uninteresting)

The slope of a log-log plot gives the power of the relationship, and a straight line is an indication that a power relationship exists.

12.02.2018




Power-law distribution

Selfsimilar & ScalefreeHeavy tail Small occurrences (x) extremely common Large occurrences (x) extremely rare

Micheal Mitzenmacher. "A brief history of generative models for power law and lognormal distributions." Internet mathematics 1.2 (2004): 226251.

12.02.2018




Scale Free Networks

● Probability that a node has k links is

● exponent for the degree distribution

● Why the name Scale-Free?

– Same functional form across all scales ● e.g. = 1 → P(x) = 1/x

12.02.2018




Scale Free NetworksWhat is the slope?

12.02.2018




Scale-Free Model for AS-Graph

= 2.18

AS Topology of skitter dataset parsed by SNAP team http://snap.stanford.edu/data/asskitter.html

12.02.2018




Preferential Attachment (Growth in Scale-Free Network)

● Initial number of nodes m0

● At time 't' add a new vertex j with degree m (< m0)

● The probability that vertex j creates a link with vertex i is the probability π which depends on ki

(the degree of vertex i)

● Probability that a node has k links is

● exponent for the degree distribution

Barabási, AlbertLászló, and Réka Albert. "Emergence of scaling in random networks." Science 286, no. 5439 (1999): 509512.

What does this mean?What does this mean? Those who are already wealthy receive more than those who are not

12.02.2018




Preferential Attachment

● We start with a small base graph

● At each time (t) step

● We create a new node (u)

● Draw its edges according to a predetermined distribution

● In particular, node u is joined to an existing node v with probability proportional to deg(v).

12.02.2018




Evolving Copying ModelObservations & Assumptions

● Observations of the Web-graph

– The graph is a directed graph

– Average degree of vertex is largely constant over-time

– Disjoint instances of bipartite cliques

– On creation, vertex adds edge(s) to existing vertices● Assumptions made by the model

– The directed graph evolves over discrete timesteps

– Some vertices choose their outgoing edges independently at random, while others replicate existing linkage patterns by copying edges from a random vertex

Ravi Kumar et al. "Stochastic models for the web graph." In Annual Symposium on Foundations of Computer Science, 2000.

12.02.2018




Generic Evolving Graph Model


An evolving graph model can be completely characterized by

returns an integer denoting the number of vertices to be added at time

returns the set of edges to be added at time

12.02.2018




Evolving Copying ModelLinear Growth Copying

● Parameters

– Constant out-degree

– Copy factor ● At time instant

– Add a single new vertex

– is a copy of a prototype vertex

– The outlink from is randomly chosen from with a probability and with probability it is the outlink from


12.02.2018




Evolving copying model

● Start with a small base graph

● At each time (t) step

– Create a new node (u)

– Choose an existing node v uniformly at random ● For each edge vw,

– with probability p, add the edge uw.

● Hence, the neighbourhood of the new node u will be a subset of the neighbourhood of the existing node v.

12.02.2018




Evolving copying model

v v

t t+1Time

u

12.02.2018




Zipf’s Law (example of (discrete) power-law)

● Sort words in a corpus in the decreasing order of their frequency

● The frequency of any word is inversly proportional to its rank in teh frequency tablei.e., the most common word will appear twice as often as the second, which will appear twive as often as...

https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists/PG/2006/04/110000

Ranking Word Frequency

1 the 56271872

5 in 17420636

10 he 8397205

50 when 1980046

100 men 923053

500 paid 162605

1000 names 79366

12.02.2018




Examples

http://www.cs.uoi.gr/~tsap/teaching/InformationNetworks/lectures/lecture3.ppt1

● Pareto: income distribution, 1897● Zipf-Auerbach: city sizes,

1913/1940s● Zipf-Estouf: word frequency,

1916/1940s● Lotka: bibliometrics, 1926● Yule: species and genera, 1924● Mandelbrot: economics/information

theory, 1950s● 1990- networking community● 2000- online social networks

”Universe counts in powers rather than linear progression” - S. Weingart

12.02.2018




Scale Free for Gnutella?

Gnutella August 2002 dataset parsed by SNAP team http://snap.stanford.edu/data/p2pGnutella31.html

”Gnutella overlay network has the small world characters, but it is not a scale-free network, which has developed over time following a different set of growth processes from those of the BA (Barabdsi-Albert) model.”

Analyzing the Characteristics of Gnutella Overlays, http://ieeexplore.ieee.org/document/4151850/

12.02.2018




Scale Free Networks

● Node degree distribution follows Power-law

● Can be created using

– Preferential attachment (undirected graphs)

– Copying Generative Models (directed graphs)

– … ● How realistic are these models?

– Do graphs undergo densification (degree increases with time)?

– Does the diameter decrease with time? Jure Leskovec et al. "Graphs over time: densification laws, shrinking diameters and possible explanations." In ACM SIGKDD, pp. 177187. 2005.

12.02.2018




When to use a particular model?

● What is the objective of your study?

● Why do you need the model?

● Which model will you use for

(a) Properties of a snapshot of a graph

(b) Evolution of the graph

(c) …

12.02.2018




Random Rewiring Procedure

p=0 p=1(p=0.15)

n: number of verticesk: degree of each node, i.e., neighbours of a vertex (n >> k >> ln(n) >> 1) p: probability of rewiring

n=10, k=4

Large diameterHigh Clustering

Small diameterHigh Clustering

Small diameterLow Clustering

Duncan Watts and Steven H. Strogatz. "Collective dynamics of ‘smallworld’networks." nature 393, no. 6684 (1998): 440442.

12.02.2018




Path Length & Clustering Coefficient in Small-World

Networks

n=1000, k=10

Duncan Watts and Steven H. Strogatz. "Collective dynamics of ‘smallworld’networks." nature 393, no. 6684 (1998): 440442.

Regular SmallWorld Random

12.02.2018




Decentralized Search in Small World Networks

(Motivation)● How would you find the shortest path if you

have the complete picture of the graph?

– What will be the information required to be present on each node?

● Can you find the shortest path of small world graph if nodes only have local information?

– My coordinates on the graph

– Coordinates of my neighbors

– Which algorithm would you use?

12.02.2018




Manhattan distance

● The distance between two points in a grid based on a strictly horizontal and/or vertical path (that is, along the grid lines), as opposed to the diagonal or "as the crow flies" distance.

– Simple sum of the horizontal and vertical components.

– Based on the well-known gridlikestreet geography of the New York borough of Manhattan

12.02.2018




Kleinbergs Small World

● Lattice

● Each node has short-range links to its neighbors

● Long range connection to a node selected with probability proportional to . Where is the Manhattan distance between and

● : the clustering exponent, probability of connection as a function of lattice distance

v

a

d

u

b

cr

Jon Kleinberg. "Navigation in a small world."Nature 406, no. 6798 (2000): 845845.

12.02.2018




What does Klienbergs Model abstract?

● abstracts the randomness of long links

– = 0: uniform distribution over long-range contacts (similar to rewiring graph)

– As increases, long range contacts become more clustered

● Model can be extended to d-dimensional lattices

12.02.2018




Kleinberg's Result● Assumption:

● Each node only has local information (neighbors)● d : lattice dimension n x n x …. (d times)● p : short-range links to vertices in p lattice steps● q : long-range links computed with exponent α

● Greedy Algorithm: Each message holder forwards message across a connection that brings it as close as possible to the target in lattice distance

● Message delivery time T of any decentralized algorithm

– T ≥ cnβ for β = (2 – α)/3 for 0 ≤ α < 2 and β = (α-2)/(α-1) for α > 2

– T bounded by polynomial of logN when α = 2

– c depends on α, p, and q but not n

12.02.2018




Implications of the result

● Decentralized Search requires a large number of steps (much larger than the shortest path)

– Small world captures clustering and small paths but not the ability actually find the shortest paths

– ”When this correlation (between local structure and long-range connections) is near a critical threshold, the structure of the long-range connections forms a type of gradient that allows individuals to guide a message efficiently towards a target. As the correlation drops below this critical value and the social network becomes more homogeneous, these cues begin to disappear; in the limit, when long-range connections are generated uniformly at random, the result is a world in which short chains exist but individuals, faced with a disorienting array of social contacts, are unable to find them.”

The threshold (alpha / clustering coefficient) q = 2

Scalable overlay Networks - University of Helsinki · 12.02.2018 HELSINGIN YLIOPISTO HELSINGFORS...

Documents

Transcript of Scalable overlay Networks - University of Helsinki · 12.02.2018 HELSINGIN YLIOPISTO HELSINGFORS...