users.soict.hust.edu.vn · Analysis and Models for Small-World Graphs⁄ Van Nguyen and Chip Martel...

18
Analysis and Models for Small-World Graphs * Van Nguyen and Chip Martel {martel,nguyenvk}@cs.ucdavis.edu Computer Science, UC Davis, CA 95616 Abstract We study variants of Kleinberg’s small-world model where we start with a k-dimensional grid and add a random directed edge from each node. The probability u 0 s random edge is to v is proportional to d(u, v) -r where d(u, v) is the lattice distance and r is a parameter of the model. For a k-dimensional grid, we show that these graphs have poly-log expected diameter when k<r< 2k, but have polynomial expected diameter when r> 2k. This shows an interesting phase-transition between small-world and “large-world” graphs. We also present a general framework to construct classes of small-world graphs with Θ(log n) expected diameter, which includes several existing settings such as Kleinberg’s grid-based and tree-based settings [15]. We also generalize the idea of ‘adding links with probability the inverse distance’ to design small-world graphs. We use semi-metric and metric functions to abstract distance to create a class of random graphs where almost all pairs of nodes are connected by a path of length O(log n), and using only local information we can find paths of poly-log length. 1 Introduction Small-world networks are being used and studied in many disciplines, including the social and nat- ural sciences. These networks possess a striking property, the so called small-world phenomenon, also often spoken of as “six degrees of separation” (between any two people in the United States) 1 . Since many real networks exhibit small-world properties, a number of network models have been proposed as a framework to study this phenomenon. Watts and S. Strogatz [23] introduced a ran- dom graph setting to model certain small-world graphs. This model features two main properties, low average path length and significant clustering. We use small-world graphs to mean graphs with poly-log (expected) diameters, to focus on this property of small separation between nodes. Recently, Kleinberg [16] proposed a family of small-world networks to study another com- pelling aspect of Milgram’s findings: a greedy algorithm using only local information can construct short paths. Kleinberg adds directed long-range random links to an undirected n × n lattice net- work. The long-range links have a non-uniform distribution which favors arcs to close nodes over more distant ones. These graph models have generated considerable interest and recent work. Ap- plications have been found using Kleinberg’s or related small-world models to decentralized search protocols in peer-to-peer systems [21, 24], and gossip protocols for a communication network [14]. Kleinberg’s model starts with a simple base graph and randomly adds new arcs. The base graph models local “contacts”. The additional random links model long-range contacts which can connect * This work was supported by NSF grant CCR-85961. A preliminary version is to appear in ACM-Siam Proc. of Symposium on Discrete Algorithms, 2005. 1 Milgram discovered this in his pioneering work in the 1960’s [22], and recent work by Dodds et al. suggests its still true [9]. 1

Transcript of users.soict.hust.edu.vn · Analysis and Models for Small-World Graphs⁄ Van Nguyen and Chip Martel...

Page 1: users.soict.hust.edu.vn · Analysis and Models for Small-World Graphs⁄ Van Nguyen and Chip Martel fmartel,nguyenvkg@cs.ucdavis.edu Computer Science, UC Davis, CA 95616 Abstract

Analysis and Models for Small-World Graphs∗

Van Nguyen and Chip Martelmartel,[email protected]

Computer Science, UC Davis, CA 95616

Abstract

We study variants of Kleinberg’s small-world model where we start with ak-dimensionalgrid and add a random directed edge from each node. The probabilityu′s random edge is tov is proportional tod(u, v)−r whered(u, v) is the lattice distance andr is a parameter of themodel.

For ak-dimensional grid, we show that these graphs have poly-log expected diameter whenk < r < 2k, but have polynomial expected diameter whenr > 2k. This shows an interestingphase-transition between small-world and “large-world” graphs.

We also present a general framework to construct classes of small-world graphs withΘ(log n)expected diameter, which includes several existing settings such as Kleinberg’s grid-based andtree-based settings [15].

We also generalize the idea of ‘adding links with probability∝ the inverse distance’ todesign small-world graphs. We use semi-metric and metric functions to abstract distance tocreate a class of random graphs where almost all pairs of nodes are connected by a path oflengthO(log n), and using only local information we can find paths of poly-log length.

1 Introduction

Small-world networks are being used and studied in many disciplines, including the social and nat-ural sciences. These networks possess a striking property, the so called small-world phenomenon,also often spoken of as “six degrees of separation” (between any two people in the United States)1.Since many real networks exhibit small-world properties, a number of network models have beenproposed as a framework to study this phenomenon. Watts and S. Strogatz [23] introduced a ran-dom graph setting to model certain small-world graphs. This model features two main properties,low average path length and significant clustering. We usesmall-world graphsto mean graphs withpoly-log (expected) diameters, to focus on this property of small separation between nodes.

Recently, Kleinberg [16] proposed a family of small-world networks to study another com-pelling aspect of Milgram’s findings: a greedy algorithm using only local information can constructshort paths. Kleinberg adds directed long-range random links to an undirectedn × n lattice net-work. The long-range links have a non-uniform distribution which favors arcs to close nodes overmore distant ones. These graph models have generated considerable interest and recent work. Ap-plications have been found using Kleinberg’s or related small-world models to decentralized searchprotocols in peer-to-peer systems [21, 24], and gossip protocols for a communication network [14].

Kleinberg’s model starts with a simplebasegraph and randomly adds new arcs. The base graphmodels local “contacts”. The additional random links model long-range contacts which can connect

∗This work was supported by NSF grant CCR-85961. A preliminary version is to appear in ACM-Siam Proc. ofSymposium on Discrete Algorithms, 2005.

1Milgram discovered this in his pioneering work in the 1960’s [22], and recent work by Dodds et al. suggests its stilltrue [9].

1

Page 2: users.soict.hust.edu.vn · Analysis and Models for Small-World Graphs⁄ Van Nguyen and Chip Martel fmartel,nguyenvkg@cs.ucdavis.edu Computer Science, UC Davis, CA 95616 Abstract

distant components. This greatly shrinks the diameter of the graph. Thus we see a promisingformula: a simple base graph plus some random links can add nice properties (such as Kleinberg’ssetting with expected small diameter and short greedy paths for alls−t pairs). Kleinberg’s setting isa very specific one, so we ask: what are the essential features, underlying the distribution of randomlinks and the grid structure which produce these nice properties? We address this question in twoways. First, we mostly complete the picture of the diameter problem in Kleinberg’s grid-basedsetting by identifying the critical point where the graph changes from expected poly-log to expectedpolynomial diameter, depending on how much we favor links to close nodes. Then we constructa framework, which starts with an arbitrary base graph and some general rules for adding randomarcs. We then refine our model to identify properties which lead to small expected diameter. Furtherrefinement allows us to find short paths using local information only.

Some of our graphs have small expected diameter, yet need not use a distance measure to de-scribe the random link distribution2. Kleinberg’s models (grid-based setting [16], tree-based andgroup-induced settings [15]) and several other well-known small-world graphs fit our abstract mod-els and thus can be analyzed using our general results on diameter and routing. Moreover, weintroduce or generalize several techniques used for bounding a graph’s diameter.

We briefly review Kleinberg’s setting then summarize our results in the next subsection. Klein-berg’s basic model uses a two-dimensional grid as a base with long-range random links added be-tween any two nodesu andv with a probability proportional tod−2(u, v), the inverse square of thelattice distance betweenu andv. In the basic model, each node has an undirectedlocal link to eachof its four grid neighbors and one directedlong-rangerandom link. A straightforward extension ofthis basic model is to have multiple random links from each node and use ak-dimensional grid forany k = 1, 2, 3 . . .; also use an inverserth power distribution (of the random links), for any realconstantr, instead ofr = 2.

In [20], we proved a tightΘ(log n) bound for the expected diameter of Kleinberg’s extendedmodel: for ak-dimensional grid and an inverserth power distribution when0 ≤ r ≤ k, i.e. for0 ≤ r ≤ 2 in the 2-D case. However, the diameter problem forr > k was open before thispaper. Note that the complexity of greedy routing in Kleinberg’s grid-based setting has alreadybeen analyzed. Forr = k it takesΘ(log2 n) expected steps while forr 6= k, greedy routing takesexpected polynomial time[16, 2, 20, 11].

1.1 Our results

First, we mostly complete the analysis of the diameter of Kleinberg’s grid-based setting. For ak-D grid, we show that the model still has poly-log expected diameter whenk < r < 2k, but haspolynomial expected diameter whenr > 2k. However, interestingly enough, the caser = 2k isstill open, though our initial experiments suggest that the model is a large-world. In particular, forKleinberg’s1-D model, for anyr < 2 the expected diameter is upper-bounded by poly-log functions(O(log n) for r ≤ 1), however, forr > 2, the expected diameter can be lower bounded by a (low-degree) polynomial function. This shows a phase-transition between small-world and “large-world”graphs.

We also present a framework to construct several classes of small-world graphs withΘ(log n)expected diameter. These include several existing settings such as Kleinberg’s grid-based and tree-based settings [15]. Our framework starts with a very abstract class of random graphs, then wegradually add in conditions to achieve more refined classes, which are more likely small-worldcandidates.

We also design graphs with poly-log greedy-like paths. Again, we start with a general class,

2Thus, links no longer favor close nodes over distant nodes.

2

Page 3: users.soict.hust.edu.vn · Analysis and Models for Small-World Graphs⁄ Van Nguyen and Chip Martel fmartel,nguyenvkg@cs.ucdavis.edu Computer Science, UC Davis, CA 95616 Abstract

based on an abstract semi-metric function (abstracted from the use of distance), and then add inrefining criteria to construct a hierarchy of classes with interesting properties. As a result, we obtainan abstract class of random graphs such that under some easy conditions, almost all pairs of nodesare connected by a path of lengthO(log n), and using only local information we can find paths ofexpected poly-log length.

1.2 Related work

There has been considerable work on the small-world phenomenon. See [17] for early surveys and[16] for a more recent account on modeling small-world networks. Before Kleinberg’s model, Wattsand Strogatz [23] proposed randomly rewiring the edges of a ring lattice each with a probabilityparameterp. Watts and Strogatz observed that for smallp the model reflects many practical small-world networks with small typical path length and a non-negligible clustering coefficient. Kleinberghas generalized his basic model in several ways in [15] including a generalization that encompassesboth lattice-based and tree-based (“taxonomic” or “hierarchical”) small-world networks.

The diameter of random graphs is a classic problem [5, 6, 7, 10] but most results use uniformlydistributed arcs. Bollobas and Chung [6], study a graph model very similar to Watts and Strogatz in[23] with the nodes of a cycle (or a “ring”) randomly matched to form additional long-range links.The closest diameter work with non-uniform arc probabilities is on long-range percolation graphs(LRPGs) which have been used to study physical properties. As in Kleinberg’s model, a grid with(undirected) local links is augmented by long-range random links whose probability is inverselyrelated to their distance. Note that in contrast to Kleinberg’s model, the added links are undirected,and the degree of a node is not fixed. Thus the analysis techniques for LRPGs are somewhat differentthan those to analyze Kleinberg’s and related models. Benjamini and Berger study the diameter of1-D LRPGs [3] and Coppersmith et al. extend this tok-D grids [8]. Both papers prove diameterresults which show how the expected diameter changes as the arc probability parameters change.Biskup improves these results by proving tighter bounds [4]. These papers show there are criticalpoints where the expected diameter changes from constant, to poly-log and then to polynomial asthe probability parameter changes. We show some similar transitions occur in Kleinberg’s setting.

There have also been several recent papers which analyze greedy routing in other small-worldlike networks [1, 2, 15, 18, 20, 11]. Though our focus is on diameter results, we show how to in-corporate greedy-like routing (to find short paths) into an abstract class which already has expectedO(log n) diameter.

The structure of the paper. We present new diameter results for Kleinberg’s grid settings, whichcomplement previous diameter results. In§3 we start with the most basic setting, i.e. the (one-dimensional) cycle augmented by random links.

We then generalize our approach in§4 (for analyzing Kleinberg’s grid model) and introduceseveral abstract families of random graphs which can be constructors for small-worlds. From theseabstract families, by adding some proper additional conditions, we obtain different classes of small-world graphs with poly-log expected diameter. In§5 we create classes with short paths which canbe found by decentralized algorithms (using local information only), and present a generalizationof §3’s results.

2 Preliminaries

To generalize Kleinberg’s small-world models, we develop an abstract class of random graphs,which includes Kleinberg’s small-world settings (in [16, 15]). We then use this abstract class as aplatform to create a general framework to analyze the diameter (and other related issues) in a varietyof settings.

3

Page 4: users.soict.hust.edu.vn · Analysis and Models for Small-World Graphs⁄ Van Nguyen and Chip Martel fmartel,nguyenvkg@cs.ucdavis.edu Computer Science, UC Davis, CA 95616 Abstract

Consider the following random assignment (or matching) operation: for a given nodeu in agraphG, make a random trial under a specific distribution ruleτ to select another nodev. We write

this asvRτ← u or v = Rτ (u). For example, in Kleinberg’s basic grid setting,τ is defined as having

vRτ← u with probability proportional to the inverse square of the lattice distance betweenu andv,

i.e. Pr[v Rτ← u] ∝ d−2(u, v). We can think of a random graph constructor using this operationwhich forms a family of random graphs. We use a given base graphH and a compatible graph

constructor, where each additional(u, v) link (with vRτ← u) is called a random link. Random links

are generated for a node, not for pairs of nodes as in traditional random graphs3. This operation isimplicitly used in Kleinberg’s small-world models [16, 15].

We restrict the distribution rules (τ ) we use to ones which have the following property: eachRτ call performs an independent trial. MultipleRτ calls on the same input node (u), also are inde-pendent trials. We now define an abstract class of random graphs, which includes all of Kleinberg’ssmall-world settings.

Definition 1. Given a set of undirected base graphsH, a distributionτ and a constant integerq ≥ 1, a Family of Random GraphsFRG(H, τ, q) consists of graphs, each of which is a basegraphH ∈ H plusq out-going random links4 generated under distributionτ for each node.

All the families of random graphs we consider in this paper areFRG families. For example,Kleinberg’s basic grid model ([16]) is aFRG(H, τ, q) family, whereH consists of alln × n grids(n = 1, 2, 3 . . .), q = 1, andτ is the inverse square distribution. Note that there is no restriction onthe set of fixed edgesE in the base graphs. For example, the fixed edges can be the local links inKleinberg’s grid model, a complete graph, or nothing at all as in Kleinberg’s tree-based model.

We now consider some useful basic lemmas. Consider a familyF = FRG(H, τ, q) and a graphG ∈ F , which has base graphH = (V, E).

Lemma 1. For any graphG from a familyFRG(H, τ, q), any two disjoint subset of verticesS andT chosen without any knowledge of the random links fromS, the probability of having a randomlink from some node inS to at least one node inT , is Pr[S → T ] ≥ 1 − e−qε|T ||S| (whereε = ε(S, T ) denotes the minimum value ofPr[Rτ (u) = v] for all u ∈ S andv ∈ T ).

Proof. Given an arbitrary nodeu ∈ S, let p denotePr[u ‘misses’T ], i.e. none of theq randomlinks fromu goes to any node inT , and similarly, letP = Pr[S ‘misses’T ]. A given random linkfrom u goes toT with probability at leastε|T |, therefore it is easy to see that,p ≤ (1−ε|T |)q. Usingthe basic calculus fact1 + x ≤ ex, we havep ≤ e−qε|T |. Now combining all the eventsu ‘misses’T for eachu ∈ S, we haveP ≤ e−qε|T ||S|. Therefore,Pr[S → T ] = 1− P ≥ 1− e−qε|T ||S|

We use lemma 1 where usually the sizes ofS andT are large enough so thatε|T ||S| = Ω(log n)and thus, for someθ > 0, Pr[S → T ] ≥ 1−O(n−θ), which tends to1 whenn goes to the infinity.So, almost surely,T is apart fromS by just one random link.

Lemma 2. If each ofn eventsBini=1 occurs with probability at least1− p, wherep < 1/n, then

the combining event∩ni=1Bi occurs with probability at least1− np

Proof. Using the Union Bound law, we havePr[∪ni=1Bi] ≤ ∪n

i=1Pr[Bi] ≤ np, hencePr[∩ni=1Bi] =

1− Pr[∪ni=1Bi] ≥ 1− np.

Note that lemma 2 applies even if theBi are not independent.3Even when we use undirected random links, we can consider that: each nodeu generates and, so, “owns” certain

random links, while some other random links also incident tou are not owned byu but by some other nodes (whichgenerated these links)

4They are directed by our default assumption.

4

Page 5: users.soict.hust.edu.vn · Analysis and Models for Small-World Graphs⁄ Van Nguyen and Chip Martel fmartel,nguyenvkg@cs.ucdavis.edu Computer Science, UC Davis, CA 95616 Abstract

Figure 1: Iux is ξ-complete with directed random edges crossing between any two subsegments of lengthxξ

3 Diameter transitions in Kleinberg’s model

For simplicity, we first look at the1-D setting and then extend our results to more general settings.Define C(r, n) as the setting where nodes are labeled0, 1, 2, . . . , n − 1 and each nodei has2undirectedlocal links: to (i − 1) mod n and(i + 1) mod n for 0 ≤ i ≤ n − 1. Each nodeialso has one directed random link to some nodej 6= i. The probability its random link is toj, isproportional to|i− j|r, wherer ≥ 0 is a parameter to be specified. For0 ≤ r ≤ 1 this cycle settingis known to have expectedθ(log n) diameter [20]. We now consider the diameter ofC(r, n) whenr > 1.

3.1 The C(r, n) setting with 1 < r < 2.

We present our notation and basic definitions, then a sketch of our basic approach, and finally ourtheorems and proofs in detail.

For r > 1, the normalized coefficientL = 1/(2∑n/2

d=1 d−r) = θ(1); in fact, 12Cr

< L < 1Cr

for n large enough, whereCr =∑∞

i=1 i−r is a constant depending onr only. So,Pr[i → j] =L|i − j|−r = θ(|i − j|−r). Let Il(u) or Iu

l denote a ‘segment’ of lengthl, starting at nodeu, i.e.Iul = u, (u + 1) mod n, . . . , (u + l − 1) mod n.

Consider segmentIux of lengthx for some arbitrary nodeu. Let0 < ξ < 1. Divide Iu

x into x1−ξ

(disjoint) subsegments of lengthxξ. LetDξ(Iux ) = J1, J2, . . . , Jx1−ξ be this set of subsegments,

i.e. Jk = Ixξ(u + (k − 1)xξ) for 1 ≤ k ≤ xξ. For simplicity, we assumexξ, x1−ξ and the like areintegers.

Definition 2. For each nodeu, Iux is ξ-complete if for any ordered pair of segments (Ji, Jk) from

Dξ(Iux ), there is an edge fromJi to Jk

5(see figure 1).

Let δ(Iux ) be the diameter of the subgraph induced by nodes in the segmentIu

x . Here,δ(Iux ) is

a random variable with a value for each instance of our random graph (once the random links areset).E[δ(Iu

x )] is independent of positionu, so we letδx = E[δ(Iux )].

The main idea. In order to upper bound the diameter of our random graph in this1-D setting,we use a probabilistic recurrence approach6. We establish a (probabilistic) relation between thediameter of a segment and that of a smaller one. In particular, we relateδ(Ix) (the diameter of asegment of lengthx) to δ(Iy), wherey = xξ for someξ ∈ (0, 1). Intuitively, with high probability,δ(Ix) is bounded by a constant multiple ofδ(Iy). Thus, we use standard recurrence techniques to

5If we think of a super-graph with theJi’s as it’s nodes then these crossing links make it a complete graph6Although our approach is similar to Karp’s [13], his theorems necessity conditions are not met here.

5

Page 6: users.soict.hust.edu.vn · Analysis and Models for Small-World Graphs⁄ Van Nguyen and Chip Martel fmartel,nguyenvkg@cs.ucdavis.edu Computer Science, UC Davis, CA 95616 Abstract

boundδn (the graph’s expected diameter) based onδx0 for a small initial lengthx0 (soδx0 is upperbounded by a poly-log function ofn).

We use this crucial observation:Ix is almost surelyξ-complete forx andξ < 1 large enough.So,δ(Ix) is almost surely not larger than twice the maximum diameter of any subsegment inDξ(Ix).We formalize the above ideas in the following lemmas and then prove our main theorem. The nexttwo results follow directly.

Lemma 3. If a segmentIux is ξ-complete thenδ(Iu

x ) ≤ 2 maxJ∈Dξ(Ix)

δ(J) + 1.

Corollary 1. If Iux is ξ-complete for eachu = 0..n-1 then max

u=0..n−1δ(Iu

x ) ≤ 2 maxu=0..n−1

δ(Iuxξ) + 1.

Note that for0 < ξ < .5, Iux is notξ-complete for anyu. Sincexξ, the number of random links

from nodes in a subsegmentJi ∈ Dξ(Ix), is smaller thanx1−ξ−1, the number of other subsegmentsJk ∈ Dξ(Ix).

Lemma 4. For r/2 < ξ < 1 (1 < r < 2),Pr[Iu

x is ξ-complete,∀u = 0..n− 1] ≥ 1− n−2

for x ≥ c ln1

2ξ−r n, wherec = (10Cr)1

2ξ−r .

Proof. We need to lower bound the probability of the event that there exists an edge connectingJa andJb for all possible pairs(Ja, Jb). Using lemma 1,Pr[Ja → Jb] ≥ 1− e−qε|Ja||Jb|, whereε = ε(Ja, Jb). Note,|Ja| = |Jb| = xξ, ε(Ja, Jb) ≥ Lx−r > .5Lx−r/Cr andq = 1, so

Pr[Ja → Jb] ≥ 1− e−Lx−r×x2ξ ≥ 1− e−.5x2ξ−r/Cr (1)

Ix is ξ-complete if there exists an arc betweenJa andJb for all possible pairs(Ja, Jb). The numberof such pairs is< x2(1−ξ), hence using lemma 2,

Px = Pr[Ix is ξ-complete] ≥ 1− (e−.5x2ξ−r/Cr × x2−2ξ).Let E be the event thatIu

x is ξ-complete,∀u = 0..n− 1. Again, using lemma 2:Pr[E] ≥ 1− n(1− Px) ≥ 1− (ne−.5x2ξ−r/Cr × x2−2ξ).

Now, for x ≥ (10Cr)1

2ξ−r × ln1

2ξ−r n, clearlyne−.5x2ξ−r/Cr ≤ ne−5 ln n = n−4, hencePr[E] ≥ 1− (n−4 × x2−2ξ) ≥ 1− n−2

sincex2−2ξ < n2.

Theorem 1. For any r such that1 < r < 2, there exists a constantβ such that the expecteddiameter ofC(r, n) is O(logβ n).

Proof. Sincer < 2 we can chooser/2 < ξ < 1. Let φ(x) be a random variable s.t.φ(x) =max

u=0..n−1δ(Iu

x ). φ(x) is determined for each instance of our random graph. IfIux is ξ-complete for

all u = 0..n − 1 then from corollary 1,φ(x) ≤ 2φ(xξ) + 1. Thus from lemma 4, forx ≥ x0 =(10Cr)

12ξ−r log

12ξ−r n,

Pr[φ(x) ≤ 2φ(xξ) + 1] ≥ 1− n−2 (2)

We can use a standard recurrence technique to upper boundφ(n), based onφ(x0) andn only.

Define the sequencexit+1i=0, wherexi+1 = xb

i with b = 1/ξ, x0 = c log1

2ξ−r n, and

t = blogb(logx0n)c = b log( log n

log x0)

log b c = log log nlog b + 0(1)

Thusxt ≤ n < xt+1. Now we look closer at this sequenceφ(xi)ti=0 and use (2) to upper

bound the last term (which differs fromφ(n) by a constant multiple), based on the first term and

6

Page 7: users.soict.hust.edu.vn · Analysis and Models for Small-World Graphs⁄ Van Nguyen and Chip Martel fmartel,nguyenvkg@cs.ucdavis.edu Computer Science, UC Davis, CA 95616 Abstract

t. We claim that each of the eventsEi : “φ(xi) ≤ 2φ(xi−1) + 1”, i = 1, 2, . . . , t andEt+1 :“φ(n) ≤ 2φ(xt) + 1” occurs with probability at least1 − n−2. The firstt events can be justifieddirectly from (2), while we can also easily extend our proof of lemma 3 to justify the last event. LetE be the event thatE1, E2, . . . , Et+1 all occur. Using lemma 2,E occurs with probability at least1− (t + 1)× n−2 ≥ 1−O(n−1).

It is easy to see that eventE impliesφ(xi) ≤ 2iφ(x0) + 2i − 1,∀i = 1..t and thus,φ(n) ≤ 2t+1φ(x0) + 2t+1 − 1 ≤ O((log n)logb 2)× φ(x0).

Note thatφ(x0) ≤ x0 = (10Cr)1

2ξ−r log1

2ξ−r n. That is,Pr[δ(In) ≤ c logβ n)] ≥ 1−O(n−1)

whereβ = log1/ξ 2 + 12ξ−r andc depends onr andξ only. Thus,Pr[δ(In) ≤ O(logβ n)] tends to

1 whenn goes to infinity, and almost surelyδ(In) = O(logβ n).

Note that our bound onβ grows rapidly asr approaches 2.

3.2 The C(r, n) setting with 2 < r

Theorem 2. For r > 2, C(r, n) is a ‘large’ world with expected diameterΩ(nr−2r−1

−o(1)).

Proof. Let 1r−1 < γ < 1. For any nodei, the probability thati’s random contact is at most a

distancenγ from i, isPr[i → j : |i− j| ≥ nγ ] = 1−O(

∑n/2d=nγ d−r) = 1−O(n−γ(r−1)) .

Using lemma 2, the probability that all random links have length at mostnγ , is≥ 1− n×O(n−γ(r−1)) = 1−O(n1−γ(r−1)).

Since 1r−1 < γ, this probability tends to1 whenn goes to infinity. Thus the diameter is at least

nnγ = n1−γ with overwhelming probability (tending to1 whenn goes to infinity). So, the expected

diameter isΩ(nr−2r−1

−o(1))

3.3 Extended settings

It is easy to extend our results for the1-D settings without wraparound . The normalized coefficientfor random links from a nodei depends on the position ofi, i.e. Pr[i → j] = L(i) × |i − j|−r.

Now,∑n

d=1 d−r ≤ L−1(i) ≤ 2∑n/2

d=1 d−r, i.e. L ≤ L(i) ≤ 2L, so, equation (1), and hence the restof our arguments, still apply.

We now consider the generalk-D setting fork = 1, 2, 3 . . .. Let H(k, r, n) denote ak-dimensional hypercubeHn (ann×n× . . .×n hypercube) with undirected edges between adjacentnodes and one random directed link from each node wherePr[u → v] ∝ d−r(u, v). The model isstill a small-world whenr < 2k but a ‘large-world’ whenr > 2k.

Theorem 3. For eachk, r with k < r < 2k, there existsβ > 0 s.t. H(k, r, n) has expecteddiameterO(logβ n). For eachk, r with 2k < r there existsα > 0 s.t. H(k, r, n) has expecteddiameterΩ(logα n).

Proof(sketch).It is not hard to see that the same approach (and techniques) as before still apply, butwe need to modify some details. We focus onk < r < 2k (we omit2k < r which is simpler).

To establish a (probabilistic) recurrence relation, we now usek-D hypercubes (in place of seg-ments in the1-D setting). Consider a hypercubeHx of sizex (in each dimension). As before,for some0 < ξ < 1, we can divideHx into xk(1−ξ) disjoint hypercubes each of sizexξ (in eachdimension). LetDξ(Hx) = J1, J2, . . . , Jxk(1−ξ) denote this set of sub-hypercubes. IfHx isξ-complete, i.e. there is a crossing edge fromJi to Jk for any pair (Ji, Jk), then as before, wehaveδ(Hx) ≤ 2maxδ(J) + 1, J ∈ Dξ(Hx). So, we have the desired recurrence relation and

7

Page 8: users.soict.hust.edu.vn · Analysis and Models for Small-World Graphs⁄ Van Nguyen and Chip Martel fmartel,nguyenvkg@cs.ucdavis.edu Computer Science, UC Davis, CA 95616 Abstract

Figure 2: A path froms to t

can go on as before to justify thatδ(Hn) is almost surely upper bounded by a poly-log function.Therefore, the remaining concern is on theξ-completeness of anyHx (for x large enough); morespecifically, we need the following fact, a new version of lemma 4: there existsξ ∈ (0, 1) andx0 = O(logβ n) for someβ > 0 such that forx > x0, anyHx is almost surelyξ-complete fornlarge enough. Again, we need to considerPr[Ja → Jb] for anyJa, Jb ∈ Dξ(Hx). Using lemma1, Pr[Ja → Jb] ≥ 1 − e−ε|Ja||Jb|, whereε = ε(Ja, Jb). Note that|Ja| = |Jb| = xkξ whileε(Ja, Jb) ≤ θ(x−r). So,Pr[Ja → Jb] ≥ 1 − e−θ(x−r)×x2kξ

= 1 − eθ(−x2kξ−r). Now, by choosinganyξ ∈ ( r

2k , 1), we can go on as before (with lemma 4) to finish proving this fact.

Note that the caser = 2k is open, however initial experiments (for the1-D setting only) suggestthat the setting has polynomial expected diameter.

4 Constructing O(log n) diameter graphs with non-uniform random links

To analyze the shortest path between a source nodes and a destination nodet, we construct twosubset chains, which can be viewed as two trees rooted ats and t, and then show they intersect.Each subset ins’s subset chain contains nodes which can be reached directly from the precedingsubset, and hence, can be reached froms. The subset chain fromt is similar, but contains nodeswith links towardst. To show that the shortests − t path has lengthO(log n), the main idea is toshow that each subset chain grows exponentially in size before they intersect7 (see figure 2).

Exponential growth will be likely if each time we grow a new subset, with high probability morethan one link from each node leaves the current subset. This was true in Kleinberg’s grid setting [20](we called this: “link into or out of a ball” property). We now include this feature to refine our basicclassFRG(H, τ, q). Recall that, a family of random graphsFRG(H, τ, q) consists of graphs, eachof which is a base graphH ∈ H plus at leastq out-going random links generated under distribution

7Alternatively, each subset chain grows exponentially to a threshold, so they intersect with high probability.

8

Page 9: users.soict.hust.edu.vn · Analysis and Models for Small-World Graphs⁄ Van Nguyen and Chip Martel fmartel,nguyenvkg@cs.ucdavis.edu Computer Science, UC Davis, CA 95616 Abstract

τ for each node.

Definition 3. For constantsµ > 0 andξ > 0, familyF = FRG(H, τ, q) meets ‘the (µ,ξ) expansioncriterion’, or F is (µ,ξ)-EXP , if ∀H = (V, E) ∈ H, with n = |V |:

∀u ∈ V,∀C ⊂ V, |C| < nµ : Pr[v Rτ← u : v /∈ C] ≥ ξ (3)

For example, from [19], it is easy to verify that Kleinberg’s grid setting with wrap-around dis-tance is (µ,1 − µ − o(1))-EXP for any fixed positive constantµ < 1. This criterion supportsdiversity and fairness in the distribution of random links:For a random link from any node, nosmall set of vertices (size≤ nµ) can take most of the chance to have this link come into it.

Definition 4 (Type µ-Expansion). For a constantµ > 0, typeµ-Expansion contains all thefamiliesFRG(H, τ, q) which meet (µ,ξ)-EXP for someξ > 1/q.

We defineχ, called an‘expansion function’, as follows. Given anyu ∈ V , this operation willcall operationRτ q times. Also, letχ(u) denote the set of vertices from theseq Rτ calls. Thusthe random links for graphG are formed by performing operationχ on each node. For any setS:χ(S) =

⋃u∈S χ(u).

Consider a familyF of typeµ-Expansion. Let β = qξ (soβ > 1). For any nodeu and setCof size less thannµ − q, which is determined beforeχ(u) is known, the expected number of freshelements generated byχ(u) that do not belong toC is greater thanβ: E[ | χ(u)− C | ] > β > 1.Sinceχ(u) ‘contributes’ more than one expected fresh element outside ofC, χ can be used togenerate a chain of subsets from a small initial subset such that with high probability, the subsetswill quickly grow to sizeΘ(nµ).

4.1 The out-going subset chain

Let F be aµ-Expansion family, andG = (V, E) be an arbitrary graph fromF . Now, from anarbitrary initial setS0 ⊂ V , we construct a chain of subsetsSk, namely theout-going subsetchainwith respect to the initial setS0, s.t. Sk+1 = χ(Sk)−∪k

i=0Si; k = 1, 2, 3, . . . Thus,Si is thenodes at distancei from S0 using random links. The following results forµ-Expansion familiesshow the subset chain grows rapidly ifS0 is large enough.

Lemma 5. ∀C, S ⊂ V s.t. S ⊂ C, |C| ≤ α = θ(nµ): if |S| = Ω(log n), almost surely|χ(S) −C|/|S| > γ for a constantγ > 1. Also,∃γ > 1,∀ θ > 0,∃c > 0:

|S| > c log n ⇒ Pr[ |χ(S)−C||S| > γ] = 1−O(n−θ)

The above lemma (proof in appendix) provides a probabilistic lower boundγ on the growthrate of the subset chain in each early step (by choosingC = ∪k

i=0Si to apply the lemma in eachstep). This growth rate can be maintained as long as the subset sizes are still under a threshold.For anyS0 ∈ V with sizeΩ(log n), the subset chain originating fromS0 will almost surely growexponentially in size until it reaches sizeα = θ(nµ). Also, for anyθ > 0, by choosing a sufficientlylarge constantc s.t. |S0| > c log n, Pr[|Sk| ≥ α] = 1−O(n−θ) for somek = O(log n). Moreover,this can be true for any givenθ > 0 by choosingc large enough.

4.2 The in-coming subset chain

We now construct a subset chain, based on the random links coming to the sets of the chain. Weuse an ‘expansion function’ψ, which is a counterpart ofχ, so we can reuse the formalism used in§4.1 on the out-going subset chain and obtain similar results. Functionψ is not state-less asχ was.For any subset of verticesD and a nodeu ∈ V we defineψ(u,D) to return the set of all nodes

9

Page 10: users.soict.hust.edu.vn · Analysis and Models for Small-World Graphs⁄ Van Nguyen and Chip Martel fmartel,nguyenvkg@cs.ucdavis.edu Computer Science, UC Davis, CA 95616 Abstract

v /∈ D s.t. v has a random link tou. As before,ψ(T,D) =⋃

u∈T ψ(u,D) for any subsetT . Now,from an arbitrary subsetT0 ⊂ V , we can construct a chain of subsetsTk, namely thein-comingsubset chainwith respect to the initial setT0, s.t. Tk+1 = ψ(Tk,D) for k = 1, 2, 3, . . ., whereD = ∪k

i=0Tk. Similar to definition 3, we have:

Definition 5. For constantsµ > 0 and ξ > 0, family F meets ‘the (µ,ξ) incoming expansioncriterion’, or F is (µ,ξ)-IE, if the following is satisfied.

∀D : |D| < nµ, ∀u ∈ D : Pr[∃v /∈ D : Rτ (v) = u] > ξ (4)

Similarly as withµ-Expansion, for a fixedµ > 0, we define typeµ-IncExpansion, which in-cludes all theFRG(H, τ, q) families which meet (µ,ξ)-IE whereξ > 1/q. For aµ-IncExpansionfamily, lemma 5 holds if we replace the use of functionχ by that of functionψ and subsetC by sub-setD (8). There is an interesting implication between these two expansion criteria for a large classof families. We call a family of random graphs, using a distributionτ , δ-symmetric (or just sym-metric if δ = 1) for some constantδ ≥ 1, if Pr[Rτ (v)=u]

Pr[Rτ (u)=v] ≤ δ for all pairs of nodes(u, v). It is easyto see that Kleinberg’s grid settings (using the inverse power distributions) have this property, andthey are symmetric if wrap-around distance is used.

Lemma 6. If familyF is (µ,ξ)-EXP , for 0 < µ, ξ < 1, and isδ-symmetric for someδ ≥ 1 thenFis (µ,1− e−ξ/δ)-IE.

Proof. We need to prove (4) holds. Letp(u, v) = Pr[Rτ (u) = v] and F be the event that∃v /∈ D : Rτ (v) = u. The lemma is shown as

Pr[F ] =∏

v/∈D(1− p(v, u)) ≤

v/∈De−p(v,u) = exp−

v/∈Dp(v, u) ≤ exp−1

δ

v/∈Dp(u, v) ≤ e−

ξδ

Note that∑

v/∈D p(u, v) = Pr[∃v /∈ D : Rτ (u) = v] ≥ ξ. Also, the factex ≥ 1 + x impliese−p(u,v) ≥ 1− p(u, v).

4.3 Abstract classes of small-world graphs

We refine the above families by adding conditions to obtain small-world graphs. If our graph isfrom a family of typeµ1-Expansion andµ2-IncExpansion for some0 < µ1, µ2 < 1 then, givenany sources and destinationt, we can use the following strategy to construct alog n-length pathfrom s to t (see figure 2). First, we want a connected subsetS0 containings andT0 containingt ofΩ(log n) size in the base graphH. We then construct the out-going subset chain fromS0 and thein-coming subset chain fromT0. Our above results show that, with overwhelming probability, thereexist subsetsSk with sizeθ(nµ1) andTl with sizeθ(nµ2) s.t. any node inSk can be reached fromS0 by O(log n) links, andT0 from Tl by O(log n) links. We now consider proper conditions so wecan easily reachTl from Sk.

If ε = ε(τ), the minimum value ofPr[Rτ (u) = v] for all u 6= v, is large enough, then almostsurely there is an arc fromSk to Tl (or they intersect).

Definition 6 (Expansion Family). AFRG(H, τ, q) is anExpansion family if it is (µ1,ξ)-EXPand (µ2,ξ)-IE for some constantsξ > 1/q, µ1, µ2 > 0, and ε(τ) = Ω(n−µ3) for a constantµ3 < µ1 + µ2.

8The constructions of both subset chains share the same formalism

10

Page 11: users.soict.hust.edu.vn · Analysis and Models for Small-World Graphs⁄ Van Nguyen and Chip Martel fmartel,nguyenvkg@cs.ucdavis.edu Computer Science, UC Davis, CA 95616 Abstract

We now show that a graph from anExpansion family almost always has an arc fromSk toTl (or they already intersected). We can assume all the nodes inSk are fresh (we do not know theirrandom links yet9) and hence, using lemma 1,

Pr[Sk → Tl] ≥ 1− e−qε|Tl||Sk| ≥ 1− e−Ω(nµ1+µ2−µ3) ≥ 1−O(n−1)which tends to1 whenn goes to the infinity.

The graphs from anExpansion family 10 are small-worlds, i.e. their expected diameter ispoly-log in n, as long as each node is rich enough in neighbors in the base graph to form largeenough initial subsets (i.e.S0, T0). Without this final condition, however, often these graphs are notconnected. If there are no edges in the base graph (E = ∅) then even with the added random edges,the graphs can be unconnected; an example will be presented in the next subsection.

We now add the notion of neighboring in the base graphs. A nodeu is calledk-neighboredforsomek ∈ N if u belongs to a connected component of sizek in the base graph. A base graphH = (V, E) is calledk-neighbored if all the nodes arek-neighbored. A connected graph isk-neighbored for allk ≤ |V | − 1. For k large enough,k-neighbored graphs allow us to constructlarge enough initial subsets. The next theorem now follows fairly directly11.

Theorem 4. For any two nodess, t in a graph of anExpansion family , if s and t are c log n-neighbored for any constantc > 0 then there almost surely existO(log n)-length paths betweensand t. An Expansion family , using(c log n)-neighbored base graphs wherec > 6qξ

(qξ−1)2, has

expected diameterO(log n).

Thus, a graph from anExpansion family almost always consists of a giant component withdiameterO(log n) and perhaps some small components of sizeO(logn). There are perhaps random(directed) links between the components (but only in one direction between a given pair).

Using super-nodes.We now consider random graphs which uselog n-neighbored base graphs.

Theorem 5. Consider a familyFRG(H, τ, q), which is (µ1,ξ)-EXP and (µ2,ξ)-IE for some con-stantsξ, µ1, µ2 > 0, whereε(τ) = Ω(n−µ3) for some constantµ3 < µ1 + µ2, and all base graphsin H are log n-neighbored. There almost surely exists a path of lengthO(log n) between any twonodes (forn large enough).

Proof. This theorem is a simple corollary of the previous theorem ifq is s.t.ξ > 1/q. However, forq < Q = d1/ξe the theorem still holds. The main idea is to form super-nodes withQ random links.Thelog n-neighbored property assures that we can always partition the graph into super-nodes eachof which is a subgraph of constant diameter and has at leastQ random links. The length of a pathconstructed here differs by only a constant from before (when we haveq ≥ Q).

These abstract classes for (almost) small-world graphs are broad enough to accommodate manydifferent well-known small-world models: Bollobas and Chung’s [6], Watts and Strogatz’s [23],Kleinberg’s grid-based [16], tree-based, and group-induced models [15]. Kleinberg describes hisgroup-induced model with two abstract properties, and it is not hard to see that the second propertyimplies our (µ,ξ)-IE for some0 < µ, ξ < 1. We show that our results apply to Kleinberg’s tree-based model in the following section. It is relatively straight-forward to extend this case for similarresults in the group-induced model. Figure 3 shows how the classes relate.

9We omit a conditioning issue: if we construct thes subset chain (s-SSC) first then the growth of thet subset chain(t-SSC) is conditioned on the existence of s-SSC and vice versa. Thus, we need to add∪k−1

i=0 Si toD (§4.2) or∪l−1i=0Ti to

C (4.1). Therefore, ifµ1 > µ2 then we construct t-SSC first, otherwise s-SSC first.10Note that we can construct similar classes by usingµ-Expansion andδ-symmetric property instead.11In fact, a full proof of it is very similar to that of theorem14 in our previous work [20].

11

Page 12: users.soict.hust.edu.vn · Analysis and Models for Small-World Graphs⁄ Van Nguyen and Chip Martel fmartel,nguyenvkg@cs.ucdavis.edu Computer Science, UC Davis, CA 95616 Abstract

Figure 3: The hierarchy of classes

4.4 The diameter of a tree-based random graph

We now use our framework to analyze the diameter of Kleinberg’s tree-based model [15] and itsvariants. Kleinberg shows that decentralized routing can be applied in more settings (not only thegrid-based [16]), but even when no lattice structure appears at all (say, the network of the Web’shyper-links). Kleinberg also introduces a group-induced model, a generalization of both grid-basedand tree-based models [15]. He shows that using these models, greedy routing takes expectedtime O(log n) if nodes have out-degreeθ(log2 n), andO(log4 n) if the degrees are bounded by aconstant. Note that for the constant-degree model, there is a fair chance that some nodes have noin-coming link at all. Thus, the routing protocol is only required to find a path from a sources toa small neighborhoodof a destinationt (a “cluster” in [15]), say, a small ‘subtree’ which containsthe leaft.

We now show our results with respect to the diameter of the tree-based model (which can beextended to the group-induced model for similar results). Basically, we show thatwhen the degreeof each node is at least3, the setting is anExpansion family , that is by adding sufficient locallinks to make each node rich enough in neighbors, the graph will have diameterO(log n). Let usreview Kleinberg’s tree-based model. Nodes are the leaves of a complete (for simplicity)b-ary treeT , whereb is a constant. Leth(u, v) denote the height of the least common ancestor ofu andv inT . There are no local links in this setting but there are a number of directed random links leavingeach nodeu, under a distributionτ , where a link is tov with probability proportional tob−h(u,v).

If there are exactlyq directed random links leaving each node, the graphs in this tree-basedsetting are very likely unconnected (similar to the case of lacking local links in the grid-based setting[19]), however, the setting can still be anExpansion family by adding proper conditions. From[15], the normalizing coefficient of this link distribution isθ(log−1 n). So,ε(τ) = θ(n−1 log−1 n);

12

Page 13: users.soict.hust.edu.vn · Analysis and Models for Small-World Graphs⁄ Van Nguyen and Chip Martel fmartel,nguyenvkg@cs.ucdavis.edu Computer Science, UC Davis, CA 95616 Abstract

thus, to have anExpansion family we need this setting to meet (µ1,ξ)-EXP and (µ2,ξ)-IE forsomeξ > 1/q andµ1 + µ2 > 1. Consider the following fact which holds even ifq = 1.

Fact 7. For Kleinberg’s tree graphs with anyq ≥ 1, given a positiveθ < 1, a nodeu andC ⊂ Vwith size at mostnθ, the probability that a random link fromu hits a node outside ofC is more than1 − θ − o(1) whenn is large enough. Also, the probability that there is a random link tou fromoutside ofC is more than1− eθ+o(1)−1 (i.e. almost1− eθ−1) whenn is large enough.

See [20] for a proof of a similar fact. It is easy to see that the setting meets (x − o(1),1 − x)-EXP and (y − o(1),1 − ey−1)-IE for any0 < x, y < 1. Therefore, givenq, we need to findx, ys.t.

x + y > 1; q(1− x) > 1; q(1− ey−1) > 1Solving this system of equations, we findq ≥ 3.

Theorem 6. For q ≥ 3, Kleinberg’s tree-based setting is anExpansion family .

We can add in local links to make the base graph connected or make the base graphc log n-neighbored: ring all the nodes in the base graphH or alternately, ring all the subtrees of heightat mostlogb(c log n). With c determined as in theorem 4, this setting will have expected diameterO(log n).

5 Random graphs induced by semi-metric or metric functions

We have abstracted away topological features of Kleinberg’s grid setting with our expansion criteriato create classes where the strongest hasO(log n) expected diameter. We now generalize the use ofa distance measure in the distribution of random links, and this makes greedy-like routing (definedlater) work. We design classes of random graphs using distributions based on semi-metric functions:we define a semi-metric functiond(u, v) and generate random links between any two nodesu andv with probability∝ d−r(u, v).

Consider a pair(G, d): a graphG = (V, E) and a functiond = dG : V 2 → R+ associatedwith G. We defined to be a semi-metric function if for anyu, v ∈ V , d(u, v) = 0 ⇔ u = v; andd(u, v) = d(v, u). We defineNk(u) = v ∈ V |d(u, v) ≤ k, the nodes within ‘distance’k of u.For c1, c2 > 0, graphG is called(c1, c2) linear-expandedwith respect tod if ∀u ∈ V, k = 1, 2 . . . :c1 ≤ |Nk(u)|

k ≤ c2 if Nk−1(u) 6= V , i.e. |Nk(u)| grows nearly-proportionally tok beforeNk(u)becomesV .

Definition 7 (InvDist family). AnInvDist(r) is aFRG(H, τ, q) family where each base graphH ∈ H has an associated metric-functiond and there exists constantsc1, c2 > 0 s.t. H is (c1, c2)linear-expandedw.r.t d, and where12: Pr[Rτ (u) = v] ∝ d−r(u, v).

All Kleinberg’s small-world models (grid-based, tree-based and group-induced) fall intoInvDist(1)for an appropriated. For example, for Kleinberg’s2-D grid model [16], we defined(u, v) as thesquare of the lattice distance betweenu andv; for Kleinberg’s group-induced model [15], we defined(u, v) as the size of the minimum set containing bothu andv 13.

Theorem 7. ∀r : 0 < r < 1, δ > 0, c2 > c1 > 0, ∃q ≥ 1 s.t. anyδ-symmetricInvDist(r) familyspecified byc1, c2 andq (as in definition 7) is anExpansion family 14.

12Note that any family satisfying all these criteria except havingc1 ≤ |Nk(u)|kβ ≤ c2 instead (for some given constant

β > 0) can be normalized by using functiond′(u, v) = dβ(u, v) instead, and hence becomes anInvDist(r′) familywherer′ = r/β.

13It is not hard to see that the second property (of the two abstract properties Kleinberg uses to describe his group-induced model) implies that|Nk(u)| grows nearly-proportionally tok.

14Note that if we only use undirected random links then the condition ofδ-symmetry is not necessary.

13

Page 14: users.soict.hust.edu.vn · Analysis and Models for Small-World Graphs⁄ Van Nguyen and Chip Martel fmartel,nguyenvkg@cs.ucdavis.edu Computer Science, UC Davis, CA 95616 Abstract

As a result, for any graph from aδ-symmetricInvDist(r) family usinglog n-neighbored basegraphs, there almost surely exists anO(log n) length path between any two nodes.

Proof. Here we prove the theorem forr = 1. It is not hard to extend our proof for otherr’s(0 < r < 1), which is actually easier. Consider anInvDist(1) family and a graph from it. We showthat we can chooseq big enough such that this family is an expansion family, i.e., it meets (µ1, ξ)-FE and (µ2, ξ)-SE for someξ > 1/q andµ1, µ2 > 0, and 1

ε(τ) = o(nµ1+µ2) (whereε(τ) is theminimum probability given to a random link byτ ). In order to justify the first expansion property,

we need to estimate this type of probability:Pr[v Rτ← u : a ≤ d(u, v) ≤ b] for 1 ≤ a < b ≤ Ku,whereKu = mink | Nk(u) = V , the maximum distance of any other node fromu. Note that,this probability is related toh(a, b) =

∑bi=a 1/i ≈ ln(b/a) as will be shown later. Also, from

c1 < |Nk|k < c2, clearly,Ku = θ(n).

Let A(a, b) =∑b

k=a(|Nk(u)| − |Nk−1(u)|)× 1k where|Nk(u)| − |Nk−1(u)| is the number of

nodes at distancek from u. Thus,1/A(1,Ku) is the normalized coefficientCu (of the distribution

of the random links atu). So,Pr[v Rτ← u : a ≤ d(u, v) ≤ b] = A(a, b)/A(1,Ku). Now, we have

A(a, b) =b∑

k=a

(|Nk(u)| − |Nk−1(u)|)× 1k

=b−1∑

k=a

|Nk(u)| × (1k− 1

k + 1) +

|Nb|b

− |Na−1|a− 1

=∑b−1

k=a|Nk(u)|k(k+1) + |Nb|

b − |Na−1|a−1 .

Now note thatc1 < |Nk|k < c2, i.e.

∑b−1k=a

c1(k+1) ≤

∑b−1k=a

|Nk(u)|k(k+1) ≤

∑b−1k=a

c2(k+1) , so

c1h(a + 1, b) + c1 − c2 ≤ A(a, b) ≤ c2h(a + 1, b) + c2 − c1 OR A(a, b) = θ(ln(b/a)) (5)

Now we justify the fist expansion property. For anyC ∈ V with sizeO(nµ) and0 < µ < 1, weconsiderPr[Rτ (u) /∈ C]. Definek ∈ N such that|Nk−1(u)| ≤ |C| = O(nµ) < |Nk(u)|. Thus,|Nk(u)| = O(nµ) andk = O(nµ). Again, using the observation that a ball is the best shape forCto minimize the probabilityPr[Rτ (u) /∈ C], it is easy to see that

Pr[Rτ (u) /∈ C] ≥ Pr[Rτ (u) /∈ Nk(u)] = Pr[v Rτ← u : k+1 ≤ d(u, v) ≤ Ku] = A(k+1,Ku)/A(1, Ku).

From (5),

A(k + 1,Ku) ≥ c1h(k + 2,Ku) + c1 − c2 ≥ c1 ln(n

O(nµ)) + O(1) = c1(1− µ) lnn + O(1).

Also,A(1,Ku) ≤ c2 ln n+O(1). Thus,Pr[Rτ (u) /∈ C] ≥ c1(1−µ)c2

, i.e. the family meets (µ, ξ1)-FE

for any 0 < µ < 1 and ξ1 = c1(1−µ)c2

. Using δ-symmetry, the family meets (µ, ξ2)-SE where

ξ2 = 1− e−ξ1/δ > 0. Thus we need to chooseq > ξ2.Now we only need to assure that1ε(τ) = o(nµ1+µ2), i.e. 1

ε(τ) = o(n2µ). Note thatε(τ) =1/Ku

A(1,Ku) = O( 1n ln n). Thus we only need to chooseµ > 0.5 then this condition is met.

Note that, from (5), the normalized coefficient forInvDist(1) isCu = 1/A(1,Ku) = θ(log−1 n),the fact we will use in the next section.

14

Page 15: users.soict.hust.edu.vn · Analysis and Models for Small-World Graphs⁄ Van Nguyen and Chip Martel fmartel,nguyenvkg@cs.ucdavis.edu Computer Science, UC Davis, CA 95616 Abstract

5.1 Greedy-like routing.

This section constructs a new class of graphs where most pairs of nodes have shortest paths oflengthO(log n) andgreedy-like paths(defined below) with expected lengthO(log2 n). Inspired byKleinberg’s idea of greedy routing using only local information [16], we assume that each nodeuknows the random links which leave nodes in a smallneighborhoodnearu (e.g. thelog n nodesclosest tou in the base graph).Greedy-like pathsare paths found by agreedy-like algorithmwhichis defined as follows: if the current node isu, choose the random link(w, v) wherew is in u′sneighborhood andv is the closest such node to the destination. Route tow using local links andthen take link(w, v). Updatev to be the current node. We now present new definitions and then ourtheorems for this routing strategy.

We restrictd(u, v) to be a‘light’ metric by adding the condition thatd(u, v) ≤ α(d(u,w) +d(w, v)) for any nodesu, v, w and for a constantα (so less strict than the triangle inequality). Wedefine classMET R(r) as classInvDist(r) but each functiond is a light metric function instead.All Kleinberg’s small-world models (grid-based, tree-based and group-induced) areMET R(1)families with the functiond(u, v) derived naturally from each model’s context. Except for the1-Dand the tree-based setting, this function is not a metric. For the tree-based setting, letd(u, v) be thenumber of leaves in the smallest subtree containingu andv (this satisfies the triangle inequality).For the group-induced model, we letd(u, v) be the size of the smallest group containing nodesuandv. This generally doesn’t satisfy the triangle inequality, but satisfies ours for a properα.

We now add neighboring conditions so our greedy-like routing strategy can be used. An undi-rected base graphH(V,E) is calledk-strongly neighbored(w.r.t. the functiond(u, v)) if for eachu ∈ V , the sub-graph induced by the set of nodesv such thatd(u, v) ≤ k is connected.

Theorem 8. For any graph from aMET R(1) family usinglog n-strongly neighbored base graphs,a greedy-like algorithm will find paths of expected lengthO(log2 n) between any two nodes.

Proof (Sketch).Let l = log n and assumeq = 1 (worst case). Consider this greedy-like algorithm:each step of routing from the current nodeu finds a nodew ∈ Nl(u) with a random contactv =Rτ (w) closest tot. Thus we route tov (throughw) and update it as the current node15. We nowjust apply Kleinberg’s idea of using phases (each phase roughly halves the remaining distance) toanalyze this routing algorithm.

Supposed(u, t) = 2k for the current nodeu. Assume2k > l (otherwiset ∈ Nl(u) and we canuse local links to go directly tot). ConsiderPr[Rτ (w) ∈ Nk(t)] wherew ∈ Nl(u). For any nodev ∈ Nk(t), note that (see figure 4)

d(w, v) ≤ α(d(w, u) + d(u, v)) ≤ αl + α2(d(u, t) + d(t, v)) ≤ 2kα + 3kα2 = k(2α + 3α2)

Thus,Pr[Rτ (u) ∈ Nk(t)] ≥ Cu|Nk(t)|

k(2α+3α2)= θ(log−1 n) sinceCu = θ(log−1 n) andc1 ≤

|Nk(t)|/k ≤ c2. Since the number of such nodew ∈ Nl(u) is |Nlog n(u)| = θ(log n), the prob-ability that the next nodev (set after each step of routing) is in the inner ballNk(t), can be lowerbounded by a positive constant. Therefore, the expected number of routing steps to complete a phase(of halving the remaining distance) is upper bounded by a constant and thus, just like Kleinberg’sproof of hisO(log2 n) upper bound in [16], we can argue that each phase takes expectedO(log n)links and then the obtained greedy-like path has expected lengthO(log2 n) (afterlog n phases).

Combining theorems 7 and 8, we have:

15Here we assume knowing the local topology of the base graph and thus, we can use ‘local links’ (of the base graph)to traverseNl(u). We can certainly route fromu to w by the shortest path withinNl(u).

15

Page 16: users.soict.hust.edu.vn · Analysis and Models for Small-World Graphs⁄ Van Nguyen and Chip Martel fmartel,nguyenvkg@cs.ucdavis.edu Computer Science, UC Davis, CA 95616 Abstract

Figure 4: A greedy-like routing step may halve the remaining distance to the destination

Theorem 9. For any graph from aMET R(1) δ-symmetric family usinglog n-strongly neighboredbase graphs, there almost surely exists a path of lengthO(log n) and a greedy-like path of expectedO(log2 n) between any two nodes.

This theorem easily applies to Kleinberg’s grid, tree-based and group-induced models withproper local links (to make the base graphslog n-strongly neighbored).

5.2 Diameter ofMET R(r) for 1 < r < 2.

We now present a natural generalization of our results in§3. We consider the diameter ofMET R(r),where1 < r < 2.

Theorem 10. For 1 < r < 2, for aMET R(r) family, there exists a constantc s.t. if the base

graphs arex0-strongly neighbored, wherex0 = c log2

2−r n (n: number of vertices), then almostsurely the expected diameter of this family is upper bounded by a poly-log function.

Proof(sketch).We extend our approach in§3 (using probabilistic recurrence relations) to upperbound the diameter of the graphs in this abstract class. Defineφ(x) = max

u∈Vδ(Bu

x), whereδ(Bux)

is the diameter of the subgraphBux (induced in our graph, withinNx(u)), which is∞ if the graph

is not strongly connected.φ(x) (andδ(Bux)) is determined for each graph (after random links are

chosen). The key idea is to establish a probabilistic recurrence relation, whereφ(x) ≤ 2φ(xξ) + 1with probability≥ 1 − n−2, for some chosen0 < ξ < 1 and for allx greater than some chosenconstantx0 > 0. Then, we can just repeat theorem 1’s work to upper bound the expected diameterof MET R(r).

For simplicity, we assume using metric functions (α = 1). Let r/2 < ξ < 1, x > 0; u, v aretwo arbitrary nodes withd(u, v) = x. ConsiderPr[Bu → Bv], the probability of having a randomlink from Bu = Bxξ(u) to Bv = Bxξ(v). Using lemma 1,Pr[Bu → Bv] ≥ 1 − e−ε|Bu||Bv | ≥1− e−cx2ξ−r

, whereε = ε(Bu, Bv) = Cu(x + 2xξ)−r = θ(x−r) and both|Bu| and|Bv| areθ(xξ).Note that, for1 < r, the normalized coefficientCu is constant-bounded.

Now for any nodeu ∈ V , a ballBx(u) will have its diameter upper bounded by2φ(xξ) + 1if the eventsBv → Bw occur for any two distinct nodesv, w ∈ Bx(u) andBv ∩ Bw = ∅. Thenumber of such events is≤ n2, so using lemma 2,Pr[δ(Bu

x) ≤ 2φ(xξ) + 1] ≥ 1− n2 × e−cx2ξ−r.

Combining the same event for allu ∈ V , Pr[φ(x) ≤ 2φ(xξ) + 1] ≥ 1− n3 × e−cx2ξ−r.

16

Page 17: users.soict.hust.edu.vn · Analysis and Models for Small-World Graphs⁄ Van Nguyen and Chip Martel fmartel,nguyenvkg@cs.ucdavis.edu Computer Science, UC Davis, CA 95616 Abstract

Now we choosex0 = (5/c)1

2ξ−r × ln1

2ξ−r n and hence, forx ≥ x0, Pr[φ(x) ≤ 2φ(xξ) + 1] ≥1 − n−2. Now simply chooseξ = .5 + .25r (so, r < 2ξ = (2 + r)/2 < 2) and continue as inTheorem 1’s proof.

For example, we can modify Kleinberg’s tree-based setting to become a small-world graph withpoly-log expected diameter as follows. We connect all the nodes together (say, order the nodes fromleft to right and connect them with undirected edges), and for any two nodesu andv, we add an arcfrom u to v with probability proportional tob−rh(u,v) instead ofb−h(u,v) with 1 < r < 2. Note that,under the context of this section we defined(u, v) as the number of leaves in the smallest subtree16

which contains bothu andv: bh(u,v).

6 Concluding remarks

We consider a general construction of random graphs: a base graph plus random links added toeach node. By gradually adding properties to the base graphs and/or the distribution of randomlinks, we build a hierarchy of classes of random graphs with the finest ones featuring small-worldproperties (small diameter and greedy-like routing using local information only). Thus, we proposea framework for analyzing and characterizing small-world graphs.

There are still some open questions in our study of ‘adding links with probability∝ the inversedistance’. As noted before, the caser = 2k in thek-D grid setting is still open. We also expectto extend our results in§5 for base graphs with restricted growth rate17, a general class of graphswhich can be used to model many real networks [12]. Thus our work can be useful for a practicaldesign problem, where we want to add in “long links” to a given network to shrink its diameter.

References[1] J. Aspnes, Z. Diamadi, G. Shah, “ Fault-tolerant Routing in Peer-to-peer Systems”, Proc. 22nd ACM Symp. on

Princ. of Dist. Comp., PODC’02, p. 223-232

[2] L. Barriere, P. Fraigniaud, E. Kranakis, D. Krizanc, “Efficient Routing in Networks with Long Range Contacts”,Proc. 15th Intr. Symp. on Dist. Comp., DISC’01, p. 270-284

[3] I. Benjamini and N. Berger, “The Diameter of a Long-Range Percolation Clusters on Finite Cycles” Random Struc-tures and Algorithms, 19,2:102-111 (2001).

[4] M. Biskup, “Graph Diameter in Long-range percolation”, submitted to Electron. Comm. Probab.

[5] B. Bollobas, “The Diameter of Random Graphs”, IEEE Trans. Inform. Theory 36 (1990), no. 2, 285-8.

[6] B. Bollobas, F.R.K. Chung, “The diameter of a cycle plus a random matching”, SIAM J. Discrete Math. 1,328-333(1988)

[7] F. Chung and L. Lu, “The Diameter of Random Sparse Graphs”, Adva. in App. Math. 26 (2001), 257-279.

[8] D. Coppersmith, D. Gamarnik, and M. Sviridenko “The diameter of a long- range percolation graph”, RandomStructures and Algorithms, 21,1:1-13 (2002).

[9] P. Dodds, R. Muhamad, and D. Watts, “An experimental study of search in global social networks”, Science,301:827-9, 2003.

[10] P. Erdos and A. Renyi, “On random graphs”, Publicationes Mathematicas 6, 290-7 (1959).

[11] P. Fraigniaud, C. Gavoille and Christophe Paul, ”Eclecticism Shrinks the World”, Proc. 23rd ACM Symp. on Princ.of Dist. Comp. PODC’04, p. 169-178.

[12] J. Gao and L. Zhang, “Tradeoffs between Stretch Factor and Load Balancing Ratio in Routing on Growth RestrictGraphs”, Proc. 23rd ACM Symp. on Princ. of Dist. Comp. PODC’04, p. 189-196.

[13] R. Karp, “Probabilistic Recurrence Relations”, inJournal of the ACM41(6):1136-1150, Nov. 1994.

[14] D. Kemper, J. Kleinberg, and A. Demers, “Spatial gossip and resource location protocols”, in Proc. ACM Symp.Theory of Computing, 163-172, 2001.

16Or we can use the size of the subtree:≈ bb−1

bh(u,v)

17Where, informally, any ‘ball’ with radiusr around a nodeu has at mostO(rβ) nodes for some fixed constantβ > 0.

17

Page 18: users.soict.hust.edu.vn · Analysis and Models for Small-World Graphs⁄ Van Nguyen and Chip Martel fmartel,nguyenvkg@cs.ucdavis.edu Computer Science, UC Davis, CA 95616 Abstract

[15] J. Kleinberg, “Small-World Phenomena and the Dynamics of Information”, inAdvances in Neural InformationProcessing Systems (NIPS)14, 2001.

[16] J. Kleinberg, “The Small-World Phenomenon: An Algorithmic Perspective”, in Proc. 32nd ACM Symp. Theory ofComp., 163-170, 2000.

[17] M. Kochen, Ed.,The Small-World(Ablex, Norwood, 1989)

[18] G. S. Manku and M. Naor and U. Wieder, “Know Thy Neighbour’s Neighbour: The Role of Lookahead in Ran-domized P2P Networks”, Proc. 36th ACM Symp. on Theory of Comp. STOC’04, p. 54-63.

[19] C. Martel and V. Nguyen, “The complexity of message delivery in Kleinberg’s Small-world Model”, Tech-ReportCSE-2003-23, CS-UCDavis, 2003.

[20] C. Martel and V. Nguyen, “Analyzing Kleinberg’s (and other) Smallworld Models”, Proc. 23rd ACM Symp. onPrinc. of Dist. Comp. PODC’04, pp. 179-188.

[21] D. Malkhi, M. Naor, D. Ratajczak, “Viceroy: A Scalable and Dynamic Emulation of the Butterfly”, Proc. 21st ACMSymp. on Princ. of Dist. Comp., p. 183-192.

[22] S. Milgram, “The small world problem” Psychology Today 1,61 (1967).

[23] D. Watts and S. Strogatz, “Collective Dynamics of small-world networks” Nature 393,p. 440-2 (1998).

[24] H. Zhang, A. Goel and R. Gividan, “Using the Small-World Model to Improve Freenet Performance”, IEEE INFO-COM’02, vol. 21(1), pp. 1228-37.

Appendix

Proof of Lemma 5.Before proving this lemma, we need to introduce some new notions. LetZ(u, C) denote the indicatorrandom variable, which indicates if the resultv = Rτ (u) is outside ofC or not. For a fixedu, E[Z(u, C)] is a functionof C on domain2V . Consider all subsetsC with |C| = nµ and corresponding valuesE[Z(u, C)]. Let Cu denote the onethat hasE[Z(u, C)] being the minimum. LetZu denoteZ(u, Cu) and note thatE[Zu] = minE[Z(u, C)], |C| ≤ nµ.Also note that theZus for all u are independent (though not always identical) Bernoulli random variables and from thedefinition of classµ-Expansion each of them has expectation at leastξ.

Lemma 5 can be proved by using Chernoff’s inequality. LetS = u1, u2, . . . , ul wherel = |S| = Ω(log n),andT denoteχ(S)− C. If we produce an ordered sequence ofdl resultsχ1(u1), χ2(u1), . . . , χq(u1), χ1(u2), χ2(u2),. . . , χq(ul) thenχ(S) is the union of thesedl items. Therefore,T can be seen as the result of accumulating only thefresh ones from thesedl items if they come one-by-one in order. A valueχi(uk) is seen fresh if it is neither inC noralready in the current accumulated setT .

For i = 1..q, k = 1..l, let Xki denote a random variable that takes value1 if χi(uk) is fresh, i.e. it is outside ofSk−1

j=1 χ(uj) ∪Si−1

j=1 χj(uk) ∪ C (which clearly has size less than(q + 1)|C| < nµ for large enoughn). Otherwise,Xki

takes value0. LetX denote the sum of thesedl random variables. Note that eachXki can be lower bounded byZuk ; thus

X can be lower bounded by the sum ofdl independent Bernoulli random variables, each of which has expectation at leastξ = β/q. Therefore, by applying Chernoff’s inequality we havePr[X ≤ (βl)(1 − δ)] ≤ e−δ2(βl)/2 whereE[X] ≥(ld)(β/q) = βl and0 < δ < 1. Sincel = Ω(log n) thene−δ2(βl)/2 = O(n−t) for somet = t(δ) > 0. This is due to asimple fact thatea log n = na for anya. Thus,Pr[X ≤ (βl)(1− δ)] = O(n−t) or Pr[X > lβ(1− δ)] = 1−O(n−t).This assures that when n is large enough,X, and thus|T |, is almost surely greater thanlβ(1− δ). Setγ = β(1− δ).

When we have obtained this fixedδ, given anyθ > 0 we can choosec s.t. whenl > c log n we havePr[X >

lβ(1− δ)] = 1− O(n−θ). This can be done by choosingc s.t. e−δ2(βc log n)/2 = O(n−θ) or δ2βc/2 > θ or c > 2θδ2β

.Put another way, we can start choosing anyγ s.t. 1 < γ < β (thus we have the intermediateδ = 1 − γ/β) and thenchoose anyc s.t. c > 2θβ

(β−γ)2

18