Making Doubling Metrics Geodesic


Algorithmica (2011) 59: 66–80, DOI 10.1007/s00453-010-9397-x


Anupam Gupta · Kunal Talwar

Received: 10 July 2008 / Accepted: 22 February 2010 / Published online: 7 April 2010 / © Springer Science+Business Media, LLC 2010

Abstract The starting point of our research is the following problem: given a doubling metric M = (V, d), can one (efficiently) find an unweighted graph G′ = (V′, E′) with V ⊆ V′ whose shortest-path metric d′ is still doubling, and which agrees with d on V × V? It is simple to show that the answer to this question is negative if distances must be preserved exactly. However, allowing a (1 + ε) distortion between d and d′ enables us to bypass this hurdle and obtain an unweighted graph G′ whose doubling dimension is at most a factor O(log ε^{-1}) times the doubling dimension of M.

More generally, this paper gives algorithms that construct graphs G′ whose convex (or geodesic) closure has doubling dimension close to that of M, and the shortest-path distances in G′ closely approximate those of M when restricted to V × V. Similar results are shown when the metric M is an additive (tree) metric and the graph G′ is restricted to be a tree.

Keywords Metric embeddings · Low-distortion embeddings · Doubling metrics · Geodesic metrics · Convex closure

1 Introduction

The algorithmic study of finite metrics has become a central theme in theoretical computer science in recent years. Of particular interest has been the study of the geometry of metrics: embeddings into Minkowski spaces have been the most obvious example, accompanied by the study of notions of metric dimension, which have allowed us to partially quantify the geometric properties that make metrics tractable for several algorithmic problems.

This research was partly supported by the NSF CAREER award CCF-0448095 and CCF-0729022, and by an Alfred P. Sloan Fellowship.

A. Gupta
Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA

K. Talwar (corresponding author)
Microsoft Research, Silicon Valley Campus, 1065 La Avenida, Mountain View, CA 94043, USA
e-mail: [email protected]

Given these advances in our understanding of the geometric properties of abstract metric spaces, it is worth remarking that our comprehension of the topological properties of metric spaces, and of the relationship between topology and geometry, has lagged behind: we do not yet have a good understanding of how the structure of a graph interacts with the dimensionality of the shortest-path metric it induces. One such example appears in [8], where a fairly simple algorithm is given for low-distortion Euclidean embeddings of unweighted trees whose shortest-path metric is doubling; however, extending the result to embed weighted trees (also with doubling shortest-path metrics) requires significantly more work. This raises the natural question: given a doubling tree metric M = (V, d), is there an unweighted tree G′ = (V′, E′) whose shortest-path metric is also doubling and contains M as a submetric? In fact, we do not know the answer even if we drop the requirement that G′ be a tree and look for any unweighted graph G′.

An immediate obstacle to answering these questions is the observation that subdividing the edges of a weighted tree to convert it into an unweighted tree can increase the dimension unboundedly. For example, take a star K_{1,n} and set the length of the ith edge {v_0, v_i} to be 2^i. It is easy to check that the metric d_G has constant doubling dimension; however, subdividing the ith edge into 2^i parts to make it unit-weighted creates a new graph with n points at unit distance from v_0 (and hence at distance 2 from each other), so its doubling dimension is at least log n, which is unbounded. On the positive side, it is easy to show that this metric can be embedded into the real line with constant distortion (e.g., the map v_0 ↦ 0, v_i ↦ 2^i has distortion 3), and the line can be subdivided without altering the doubling dimension. In this paper, we show that this positive result is not an aberration: any tree metric can be represented, up to (1 + ε) distortion, as a submetric of an unweighted tree metric which has almost the same doubling dimension. We show a similar result for arbitrary graphs as well, and show that our trade-off between distortion and the dimension blowup is asymptotically optimal.
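As a quick sanity check of the star example, the following Python sketch (our own illustration, not code from the paper) computes the distortion of the line map v_0 ↦ 0, v_i ↦ 2^i for the star K_{1,n} with edge lengths 2^i. Distortion is taken here as (maximum expansion) × (maximum contraction) over all pairs, computed with exact rational arithmetic; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def leg(i: int) -> int:
    """Length of the star edge {v_0, v_i}; v_0 is the centre, so its own 'leg' is 0."""
    return 0 if i == 0 else 2 ** i


def star_distance(i: int, j: int) -> int:
    """Shortest-path distance between v_i and v_j in the star K_{1,n}."""
    return 0 if i == j else leg(i) + leg(j)


def line_distance(i: int, j: int) -> int:
    """Distance between the images of v_i and v_j under v_0 -> 0, v_i -> 2**i."""
    return abs(leg(i) - leg(j))


def line_distortion(n: int) -> Fraction:
    """Max expansion times max contraction of the line map on K_{1,n}."""
    expansion = Fraction(1)
    contraction = Fraction(1)
    for i, j in combinations(range(n + 1), 2):
        d_star, d_line = star_distance(i, j), line_distance(i, j)
        expansion = max(expansion, Fraction(d_line, d_star))
        contraction = max(contraction, Fraction(d_star, d_line))
    return expansion * contraction


for n in (1, 2, 5, 20):
    print(n, line_distortion(n))
```

The loop prints 1 for n = 1 and 3 for every larger n shown: the map never expands distances, and the worst contraction occurs on consecutive leaves v_{i−1}, v_i, where 2^{i−1} + 2^i is mapped to 2^{i−1}. The bound does not grow with n, which is the point of the example.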

Formal Definitions To define the problems we study, let us define the convex closure or geodesic closure of a graph, which extends the notion of subdividing edges. Given a graph G = (V, E) with edge lengths ℓ : E → ℝ≥0, assume that the names of the vertices in V belong to some total order (V, ≺). Let V̂_G be the uncountably infinite set of points V ∪ {e[x] | e ∈ E, x ∈ (0, ℓ(e))} obtained by considering each edge as a continuous segment of length ℓ(e); for e = {u, v} with u ≺ v, the point e[x] lies at distance x from u along e.

Definition 1 Given a graph G = (V, E) with shortest-path metric M_G = (V, d_G), define a metric d̂_G on the set V̂_G as follows. If e = {u, v} (with u ≺ v) and e′ = {u′, v′} (with u′ ≺ v′), then

$$\hat d_G(e[x], e'[y]) = \min\bigl\{\, x + d_G(u,u') + y,\;\; x + d_G(u,v') + (\ell(e') - y),\;\; (\ell(e) - x) + d_G(v,u') + y,\;\; (\ell(e) - x) + d_G(v,v') + (\ell(e') - y) \,\bigr\}.$$

Define the convex closure of the graph G to be the metric space conv(G) := M̂_G = (V̂_G, d̂_G).

Note that the metric obtained by subdividing the edges of G is a submetric of the convex closure of G, and hence it suffices to study the doubling dimension of the convex closure conv(G).
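Definition 1 translates directly into code. The Python sketch below is our own illustration with hypothetical names: vertices are integers, the order ≺ is integer order, an edge is a pair (u, v) with u < v, and a point e[x] is a pair (e, x) with x measured from u. Definition 1 as stated handles points on distinct edges; for two points on the same edge we additionally allow the direct segment |x − y|, which is our assumption about the intended same-edge case. A vertex can be passed as e[0] or e[ℓ(e)] for any incident edge.

```python
import heapq
from typing import Dict, List, Tuple

Edge = Tuple[int, int]        # (u, v) with u < v, i.e. u precedes v in the order
Point = Tuple[Edge, float]    # (e, x) represents e[x], at distance x from the smaller endpoint


def all_pairs_shortest_paths(n: int, lengths: Dict[Edge, float]) -> List[List[float]]:
    """d_G via Dijkstra from every vertex of G = ({0, ..., n-1}, lengths.keys())."""
    adj: List[List[Tuple[int, float]]] = [[] for _ in range(n)]
    for (u, v), w in lengths.items():
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = []
    for s in range(n):
        d = [float("inf")] * n
        d[s] = 0.0
        heap = [(0.0, s)]
        while heap:
            du, a = heapq.heappop(heap)
            if du > d[a]:
                continue  # stale heap entry
            for b, w in adj[a]:
                if du + w < d[b]:
                    d[b] = du + w
                    heapq.heappush(heap, (d[b], b))
        dist.append(d)
    return dist


def conv_distance(dist: List[List[float]], lengths: Dict[Edge, float], p: Point, q: Point) -> float:
    """The four-term minimum of Definition 1 for p = e[x] and q = e'[y]."""
    (e, x), (f, y) = p, q
    (u, v), (u2, v2) = e, f
    le, lf = lengths[e], lengths[f]
    candidates = [
        x + dist[u][u2] + y,
        x + dist[u][v2] + (lf - y),
        (le - x) + dist[v][u2] + y,
        (le - x) + dist[v][v2] + (lf - y),
    ]
    if e == f:
        candidates.append(abs(x - y))  # assumption: direct segment for points on the same edge
    return min(candidates)


lengths = {(0, 1): 3.0, (1, 2): 5.0}
dist = all_pairs_shortest_paths(3, lengths)
print(conv_distance(dist, lengths, ((0, 1), 1.0), ((1, 2), 2.0)))  # 2 + 2 = 4.0
```

On the path 0–1–2 with lengths 3 and 5, the point at distance 1 from vertex 0 and the point at distance 2 from vertex 1 are 4 apart, attained by the third term (exit through vertex 1).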

1.1 Our Results

The example of K_{1,n} with exponential edge weights shows that even if the shortest-path metric M_G of a graph G is doubling, its convex closure M̂_G may not be doubling. The goal of this paper is to show that despite this, there is a “close-by” graph G′ whose convex closure M̂_{G′} is indeed doubling. In particular, the main theorem is the following:

Theorem 2 (Main Theorem) Given a graph G = (V, E) with specified edge lengths, we can efficiently find a graph G′ = (V, E′) (also with non-negative edge lengths) such that

• The distances in G and G′ are within a multiplicative factor of (1 + ε) of each other, and

• If dim M_G = k, then dim M_{G′} = O(k) and dim conv(G′) = O(k log ε^{-1}).

The graph G′ is based on a spanner construction of Chan et al. [4]. We show here that the convex closure of G′ is doubling. Since Theorem 2 does not give any guarantees about the topology of the graph G′, we prove an analogous result about tree metrics, with improved guarantees on the dimension:

Theorem 3 Given a tree T = (V, E) with specified edge lengths, we can efficiently find a tree T′ = (V′, E′) with V ⊆ V′ (and with non-negative edge lengths) such that

• For x, y ∈ V, the distances between them in T and T′ are within a multiplicative factor of (1 + ε) of each other, and

• If dim M_T = k, then dim M_{T′} = O(k) and dim conv(T′) = O(k + log log ε^{-1}).

In addition, we show that the trade-off between the distortion and the dimension of the convex closure shown in Theorem 3 is asymptotically optimal:

Theorem 4 For any k ≥ 1, there exists a tree metric T = (V, E) with dim M_T = O(k) such that for any tree metric T′ = (V′, E′) with V ⊆ V′, the following holds. If d_T(u, v) ≤ d_{T′}(u, v) ≤ (1 + ε) d_T(u, v) for all u, v ∈ V, then dim conv(T′) must be Ω(k + log log ε^{-1}).

We do not know whether the bound in Theorem 2 can be improved. However, we can show a stronger lower bound on the trade-off under the restriction that the graph G′ is defined on the same vertex set as G, i.e., we do not use any Steiner points. In fact, this is true even when G is itself an ultrametric. This shows that obtaining a better dependence on ε necessarily requires Steiner points, and hence shows the limitations of the construction in Theorem 2.


Theorem 5 There exists an ultrametric G = (V, E) with dim M_G = O(1) such that for any graph G′ = (V, E′), the following holds. If d_G(u, v) ≤ d_{G′}(u, v) ≤ (1 + ε) d_G(u, v) for all u, v ∈ V, then dim conv(G′) must be Ω(log ε^{-1}).

The proof of Theorem 2 appears in Sect. 4, and that of Theorem 3 in Sect. 5. Both lower bounds (i.e., Theorems 4 and 5) are proved in Sect. 6.

1.2 Related Work

The notion of doubling dimension was introduced by Assouad [1] and used in algorithm design by Clarkson [5]. The properties of doubling metrics and their algorithmic applications have since been studied extensively; a few examples appear in [2, 6, 8–10, 13–15, 20].

Somewhat similar in spirit to our work is the 0-extension problem [3, 7, 12] (cf. the Lipschitz Extendability problem [11, 16, 18]). Given a graph G, the 0-extension problem deals with extending an embedding of some subset of the vertices of the graph into some host space to an embedding of the entire graph (or even of the convex closure of the graph), while approximately preserving the Lipschitz constant of the embedding. Our results can be interpreted as analogues of these results where the goal is to approximately preserve the doubling dimension.

Finally, a result similar in spirit to Theorem 2 can be derived from the work of Semmes [19, Theorem 1.15], who shows how to embed any doubling metric, with constant distortion, into ℝ^n endowed with a specific geodesic metric. The result uses Assouad’s embedding as a black box, and the distortion it requires is at least as large as that of Assouad’s embedding. Since the best distortion currently known for Assouad’s embedding is linear in the doubling dimension [8], this approach does not currently give a geodesic metric with distortion (1 + ε), as our results do.

2 Preliminaries and Notation

Given a metric (X, d), define the (open) ball B(x, r) = {y ∈ X | d(x, y) < r}. Given a graph G = (V, E) with edge lengths ℓ, we denote the shortest-path metric on it by d_G and use B_G(x, r) to denote the ball in the metric (V, d_G). We omit the subscript G when it is obvious from context. There are several ways of defining the doubling constant λ and the doubling dimension dim of a metric space, all of them within a constant factor of each other; here is the one that will be most useful for us.

Definition 6 (Doubling Constant and Doubling Dimension) A metric space M = (X, d) has doubling constant λ ∈ ℤ≥0 if for each x ∈ X and r ≥ 0, there is a set Y ⊆ X of size at most λ such that B(x, 2r) ⊆ ⋃_{y∈Y} B(y, r), and λ is the smallest such integer. The doubling dimension is dim M = log_2 λ.

Given a graph G = (V, E), we let dim G denote the dimension of the metric M_G = (V, d_G). For any metric M with at least two points, λ_M ≥ 2 and dim M ≥ 1; we will henceforth deal only with such non-trivial metrics.


Definition 7 (Packings, Coverings, Nets) Given a metric (X, d), an r-packing is a subset P ⊆ X such that any two points in P are at distance at least r from each other. An r-covering is a subset C ⊆ X such that for each point x ∈ X, there is a point c ∈ C at distance d(x, c) ≤ r. An r-net is a subset N ⊆ X that is both an r-packing and an r-covering.
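For a finite metric, the standard greedy procedure produces an r-net: scan the points and keep a point whenever it is at distance at least r from everything kept so far. Kept points are pairwise at distance at least r (packing), and every discarded point was at distance less than r from some kept point (covering). The Python sketch below is our illustration; `greedy_net` and `is_r_net` are hypothetical names.

```python
from typing import Callable, Hashable, List, Sequence

Dist = Callable[[Hashable, Hashable], float]


def greedy_net(points: Sequence[Hashable], dist: Dist, r: float) -> List[Hashable]:
    """Return an r-net of `points`: an r-packing that is also an r-covering."""
    net: List[Hashable] = []
    for p in points:
        if all(dist(p, q) >= r for q in net):
            net.append(p)
    return net


def is_r_net(net: Sequence[Hashable], points: Sequence[Hashable], dist: Dist, r: float) -> bool:
    """Check both conditions of Definition 7 directly."""
    packing = all(dist(a, b) >= r for i, a in enumerate(net) for b in net[i + 1:])
    covering = all(any(dist(p, c) <= r for c in net) for p in points)
    return packing and covering


def line(a: float, b: float) -> float:
    return abs(a - b)


points = list(range(11))
print(greedy_net(points, line, 3))  # [0, 3, 6, 9]
```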

Fact 8 (“Small” Packings) Let the metric M = (X, d) have doubling dimension at most k. Given any δ-packing P ⊆ X, for any x ∈ X and radius Δ ≥ δ, the set S := B(x, Δ) ∩ P has size at most (4Δ/δ)^k. Hence the doubling dimension satisfies

$$\dim M \;\ge\; \log_{4\Delta/\delta} \bigl\lvert B(x, \Delta) \cap P \bigr\rvert \qquad (1)$$

for every x ∈ X, δ-packing P and Δ ≥ δ.

Proof By Definition 6, there exists Y_1 ⊆ X of size at most 2^k such that B(x, Δ) ⊆ ⋃_{y∈Y_1} B(y, Δ/2). Now applying the definition to each B(y, Δ/2), there is a set Y_2 of size at most 2^{2k} such that B(x, Δ) ⊆ ⋃_{y∈Y_2} B(y, Δ/4); inductively, it follows that for each integer j ≥ 1, there exists a set Y_j ⊆ X with |Y_j| ≤ 2^{jk} such that B(x, Δ) ⊆ ⋃_{y∈Y_j} B(y, Δ/2^j).

Now setting j* = ⌈log_2(2Δ/δ)⌉ ≤ log_2(2Δ/δ) + 1, we get that B(x, Δ) is covered by at most 2^{j*k} ≤ 2^k (2Δ/δ)^k = (4Δ/δ)^k balls of radius Δ/2^{j*} ≤ δ/2. Since each of these (open) balls of radius at most δ/2 can contain at most one point of S because of the δ-packing property of S, it follows that |S| ≤ (4Δ/δ)^k. ∎

Definition 9 (Geodesic Metrics) A metric (X, d) is said to be geodesic if for every u, v ∈ X with u ≠ v, there is a continuous map f_{uv} : [0, d(u, v)] → X such that f_{uv}(0) = u, f_{uv}(d(u, v)) = v, and d(f_{uv}(x), f_{uv}(y)) = |x − y| for all x, y ∈ [0, d(u, v)].

Fact 10 For any graph G = (V, E), the metric conv(G) defined in Definition 1 is geodesic.

3 A Structure Theorem

In this section, we characterize the dimension of the convex closure of a graph H = (V, E) in terms of some parameters of the graph.

Definition 11 (Long Edges) Given a graph H = (V, E) with edge lengths ℓ : E → ℝ≥0, a vertex u ∈ V, and a radius r ≥ 0, an edge e = {v, w} is a long edge with respect to (u, r) if at least one endpoint of e is at distance at most r from u, and ℓ(e) > r.

Let L_u(r) be the set of long edges with respect to vertex u and radius r. The following structure theorem characterizes the doubling dimension of conv(H) in terms of long edges.
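Counting long edges needs only single-source shortest-path distances from u. The Python sketch below (our illustration; the function names are hypothetical) computes L_u(r) as in Definition 11 and applies it to the star K_{1,n} with edge lengths 2^i. With u = v_0, every edge longer than r is long, so |L_{v_0}(1)| = n grows without bound; by part (b) of the structure theorem, this is exactly what prevents the convex closure of that star from having bounded doubling dimension.

```python
import heapq
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def dijkstra(n: int, lengths: Dict[Edge, float], source: int) -> List[float]:
    """Single-source shortest-path distances in the graph given by `lengths`."""
    adj: List[List[Tuple[int, float]]] = [[] for _ in range(n)]
    for (a, b), w in lengths.items():
        adj[a].append((b, w))
        adj[b].append((a, w))
    dist = [float("inf")] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, a = heapq.heappop(heap)
        if d > dist[a]:
            continue  # stale heap entry
        for b, w in adj[a]:
            if d + w < dist[b]:
                dist[b] = d + w
                heapq.heappush(heap, (dist[b], b))
    return dist


def long_edges(n: int, lengths: Dict[Edge, float], u: int, r: float) -> List[Edge]:
    """L_u(r): edges with an endpoint at distance <= r from u and length > r."""
    dist = dijkstra(n, lengths, u)
    return [e for e, w in lengths.items() if w > r and min(dist[e[0]], dist[e[1]]) <= r]


def exponential_star(n: int) -> Dict[Edge, float]:
    """Star K_{1,n}: centre 0, edge {0, i} of length 2**i."""
    return {(0, i): float(2 ** i) for i in range(1, n + 1)}


star = exponential_star(8)
print(len(long_edges(9, star, 0, 1.0)), len(long_edges(9, star, 0, 10.0)))  # 8 5
```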

Theorem 12 (Structure Theorem) For any graph H = (V, E), the following holds:


(a) For any k ≥ dim H, if the number of long edges |L_u(r)| ≤ 2^k for every u ∈ V and every r ≥ 0, then dim conv(H) ≤ 2k + 3 ≤ 5k.

(b) If dim conv(H) ≤ k, then for every u ∈ V and every r ≥ 0, the number of long edges satisfies |L_u(r)| ≤ 2^{4k}.

Proof Suppose the number of long edges |L_u(r)| is at most 2^k for every u, r. We show that for any u ∈ conv(H) and any r > 0, the ball B_{conv(H)}(u, 2r) can be covered by at most 2^{O(k)} balls B_{conv(H)}(y, 3r/2). Since (3/4)^3 < 1/2, applying this fact three times implies that the doubling constant of conv(H) is 2^{O(k)}, i.e., its doubling dimension is O(k).

• Firstly, consider u ∈ V. From the definition of the doubling dimension of H, there is a set Y ⊆ V of size at most 2^{2 dim H} ≤ 2^{2k} such that B_H(u, 2r) ⊆ ⋃_{y∈Y} B_H(y, r/2). Let Y′ = Y ∪ {e[r] | e ∈ L_u(r)} ∪ {e[ℓ(e) − r] | e ∈ L_u(r)}. Note that |Y′| ≤ |Y| + 2|L_u(r)| ≤ 2^{2k} + 2^{k+1} ≤ 2^{2k+1}.

We claim that B_{conv(H)}(u, 2r) ⊆ ⋃_{y∈Y′} B_{conv(H)}(y, 3r/2). Indeed, consider a point e[x] ∈ B_{conv(H)}(u, 2r), where e = {v, w} is such that d(u, e[x]) = d(u, v) + x′ with x′ ∈ {x, ℓ(e) − x}. Assume v ≺ w (the other case is similar), so that x′ = x. If x ≤ r, then consider y ∈ Y such that v ∈ B_H(y, r/2) (such a y exists since d(u, v) ≤ d(u, e[x]) < 2r). By the triangle inequality, d(y, e[x]) ≤ d(y, v) + x < 3r/2, and hence e[x] ∈ B_{conv(H)}(y, 3r/2). On the other hand, if x > r and e[x] ∈ B_{conv(H)}(u, 2r), then d(u, v) = d(u, e[x]) − x < r, so that the edge e is long with respect to (u, r). In this case, e[r] ∈ Y′, and since r < x < 2r, we conclude that e[x] ∈ B_{conv(H)}(e[r], r). Thus for any u ∈ V, B_{conv(H)}(u, 2r) ⊆ ⋃_{y∈Y′} B_{conv(H)}(y, 3r/2).

• Now consider balls around points of conv(H) \ V. For a point e[x] on an edge e = {u, v} ∈ E, we have B_{conv(H)}(e[x], 2r) ⊆ B_{conv(H)}(u, 2r) ∪ B_{conv(H)}(v, 2r) ∪ {e[z] | max(0, x − 2r) ≤ z ≤ min(ℓ(e), x + 2r)}. By the argument above, each of the first two balls can be covered by 2^{2k+1} balls of radius 3r/2. The portion of e in B_{conv(H)}(e[x], 2r) is one-dimensional and can thus be covered by two balls of radius 3r/2. This completes the argument: if the number of long edges in H is small, every ball of radius 2r in conv(H) is covered by at most 2^{2k+2} + 2 ≤ 2^{2k+3} balls of radius 3r/2, and hence conv(H) has doubling constant 2^{O(k)}.

To prove part (b), consider the set of points W = {e[r/2] | e ∈ L_u(r)}, where e[r/2] is measured from an endpoint of e within distance r of u; clearly |W| = |L_u(r)|. By the triangle inequality, W ⊆ B_{conv(H)}(u, 3r/2), but the balls {B_{conv(H)}(w, r/2) | w ∈ W} are disjoint, and hence W is an (r/2)-packing. Applying Fact 8 with Δ = 3r/2 and δ = r/2, we get |L_u(r)| ≤ 12^k ≤ 2^{4k}. ∎

The following simple result follows immediately.

Corollary 13 For any n-point metric (X, d), there exists a geodesic metric (X′, d′) that contains an isometric copy of (X, d) and has doubling dimension at most O(log n).
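One way to see the corollary is to apply Theorem 12(a) to the complete graph on X. The derivation below is our own filling-in of the “follows immediately” step, assuming n ≥ 2; H denotes that complete graph with each edge weighted by d.

```latex
\begin{aligned}
&H := \bigl(X, \tbinom{X}{2}\bigr),\quad \ell(\{x,y\}) := d(x,y)
  &&\Rightarrow\; d_H = d \text{ by the triangle inequality},\\
&\dim H = \dim(X,d) \le \log_2 n
  &&\text{(cover a ball by the radius-$r$ balls around its at most $n$ points)},\\
&|L_u(r)| \le |E(H)| = \tbinom{n}{2} \le 2^{k},
  &&k := 2\log_2 n \;\ge\; \max(1, \dim H),\\
&\dim \operatorname{conv}(H) \le 5k = 10\log_2 n = O(\log n)
  &&\text{(Theorem 12(a))}.
\end{aligned}
```

By Fact 10, conv(H) is geodesic, and on its vertex set it coincides with d_H = d, so (X′, d′) := conv(H) has the required properties.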

The example of the star graph K_{1,n} with exponential edge weights shows that this bound is tight, even when (X, d) itself has constant doubling dimension. In the following sections, we show that the O(log n) bound above can be improved in general if one allows a small distortion.


4 Convex Closures for Graphs: Proof of Main Theorem

In this section, we show how to take a graph G = (V, E) and obtain a graph G′ = (V, E′) on the same vertex set, which has (almost) the same distances as G, but whose doubling dimension does not change by much under taking the convex closure. In particular, we use a bounded-degree spanner construction due to Chan et al. [4]: they give an algorithm that, given a metric (V, d) with dimension dim G and a parameter ε < 1/4, outputs a spanner G′ = (V, E′) such that d(x, y) ≤ d_{G′}(x, y) ≤ (1 + ε) d(x, y) for all pairs x, y ∈ V, and moreover the degree of each vertex x ∈ V is bounded by ε^{-O(dim G)}. We show that the convex closure of this spanner has doubling dimension O(dim G · log ε^{-1}).

4.1 The Spanner Construction

We start with a graph G = (V, E) and carry out a series of transformations to obtain the graph G′. Given ε < 1/4, let τ_ε = 6 + ⌈log(1/ε)⌉. We assume the smallest pairwise distance in G is at least 2^{τ_ε}; otherwise we can scale up the edge lengths in G to satisfy this property. We start with some more definitions.

Definition 14 (Hierarchical Tree) A hierarchical tree for a set V is a pair (T, φ), where T is a rooted tree and φ : T → V is a labeling function that labels each node of T with an element of V, such that the following conditions hold.

1. Every leaf is at the same depth from the root.
2. The function φ restricted to the leaves of T is a bijection onto V.
3. If u is an internal node of T, there exists a child v of u in T such that φ(v) = φ(u).

Hence the nodes mapped by φ to any point x ∈ V form a connected subtree of T.

Definition 15 (Net-Tree) A net-tree for a metric (V, d) is a hierarchical tree (T, φ) for the set V such that the following conditions hold.

1. Let N_i be the set of nodes of T that have height i. (The leaves have height 0.) Let r_0 = 1 and r_{i+1} = 2r_i for i ≥ 0. (Hence, r_i = 2^i.) Then, for i ≥ 0, φ(N_{i+1}) is an r_{i+1}-net for φ(N_i).

2. Let node u ∈ N_i, and let its parent node be p_u. Then d(φ(u), φ(p_u)) ≤ r_{i+1}.

One can construct a net-tree for any metric using a greedy algorithm; the paper of Har-Peled and Mendel shows how to construct a net-tree in near-linear time [9].
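A minimal Python sketch of the greedy construction mentioned above follows. It is a simple quadratic-time illustration, not the near-linear algorithm of [9]. Level 0 is V, level i + 1 is a greedy 2^{i+1}-net of level i, and every node at level i is attached to a nearest point of the next level. The names `net_tree_levels` and `greedy_net` are hypothetical, and the sketch assumes distinct points are at distance at least 1 so that V itself is a valid level 0.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Dist = Callable[[Hashable, Hashable], float]


def greedy_net(points: Sequence[Hashable], dist: Dist, r: float) -> List[Hashable]:
    net: List[Hashable] = []
    for p in points:
        if all(dist(p, q) >= r for q in net):
            net.append(p)
    return net


def net_tree_levels(points: Sequence[Hashable], dist: Dist) -> Tuple[List[List[Hashable]], List[Dict[Hashable, Hashable]]]:
    """Return (levels, parents): levels[i] plays the role of phi(N_i), and
    parents[i][p] labels the level-(i+1) parent of the level-i node labelled p."""
    levels = [list(points)]
    parents: List[Dict[Hashable, Hashable]] = []
    while len(levels[-1]) > 1:
        i = len(levels) - 1
        r_next = 2 ** (i + 1)                    # r_{i+1} = 2^{i+1}
        current = levels[-1]
        net = greedy_net(current, dist, r_next)  # phi(N_{i+1}) is a subset of phi(N_i)
        # A net point is its own nearest net point, so it has a child with the same label.
        parents.append({p: min(net, key=lambda q: dist(p, q)) for p in current})
        levels.append(net)
    return levels, parents


def line(a: float, b: float) -> float:
    return abs(a - b)


levels, parents = net_tree_levels([0, 200, 1200], line)
print([len(level) for level in levels])  # [3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 1]
```

Each parent lies within r_{i+1} of its child because a point is discarded from the greedy net only when some kept point is closer than r_{i+1}, which gives condition 2 of Definition 15.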

To construct their bounded-degree spanner, Chan et al. [4] define the following. Suppose we are given a graph G = (V, E) whose shortest-path metric M = (V, d_G) has doubling dimension dim G. Let ε > 0 and let (T, φ) be a net-tree for M. Set C_ε := (4 + 32/ε). For each integer i > 0, let

$$E_i := \bigl\{\{u, v\} \;\big|\; u, v \in \varphi(N_i),\; d_G(u, v) \le C_\varepsilon \cdot r_i \bigr\} \setminus \bigcup_{j \le i-1} E_j, \qquad (2)$$

and define E_0 to be the empty set. (See Definition 15 for the definitions of N_i and r_i.) Each edge {u, v} is assigned the length d_G(u, v). Note that all edges in E_i have their lengths in (C_ε r_{i−1}, C_ε r_i].
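The edge sets E_i of (2) can be sketched in Python as follows. This is our illustration; it assumes the net-tree is given by its label sets, with `levels[i]` standing for φ(N_i) and r_i = 2^i. The function name and the toy input (the three points 0, 200 and 1200 on a line, whose greedy net-tree has these levels) are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Hashable, Sequence, Set

Dist = Callable[[Hashable, Hashable], float]


def spanner_edge_levels(levels: Sequence[Sequence[Hashable]], dist: Dist, eps: float) -> Dict[int, Set[FrozenSet[Hashable]]]:
    """E_i as in (2): pairs of phi(N_i) within C_eps * r_i, minus edges placed at lower levels."""
    c_eps = 4 + 32 / eps
    placed: Set[FrozenSet[Hashable]] = set()
    edge_levels: Dict[int, Set[FrozenSet[Hashable]]] = {0: set()}  # E_0 is empty
    for i in range(1, len(levels)):
        e_i = {
            frozenset((a, b))
            for a, b in combinations(levels[i], 2)
            if dist(a, b) <= c_eps * 2 ** i and frozenset((a, b)) not in placed
        }
        edge_levels[i] = e_i
        placed |= e_i
    return edge_levels


def line(a: float, b: float) -> float:
    return abs(a - b)


levels = [[0, 200, 1200]] * 8 + [[0, 1200]] * 3 + [[0]]
E = spanner_edge_levels(levels, line, eps=0.5)  # C_eps = 68
print({i: sorted(tuple(sorted(e)) for e in s) for i, s in E.items() if s})
# {2: [(0, 200)], 4: [(200, 1200)], 5: [(0, 1200)]}
```

In the toy run every edge of E_i has length in (C_ε 2^{i−1}, C_ε 2^i], matching the remark after (2).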


While the graph (V, ⋃_i E_i) is a (1 + ε)-spanner for the original metric with a small number of edges, obtaining a bounded-degree spanner requires some modifications to this basic construction. For each v ∈ V, define i*(v) := max{i | v ∈ φ(N_i)}, and for each edge {u, v} ∈ ⋃_i E_i, direct the edge from u to v if i*(u) < i*(v); if i*(u) = i*(v), the edge can be directed arbitrarily. (These directions are merely for the algorithm and the proof.) Chan et al. show that each vertex x ∈ V has out-degree bounded by β = ε^{-O(dim G)}. However, the in-degree could still be large, and hence the following steps are then performed:

• For any x ∈ V, let F_i = F_i(x) be the subset of edges directed into x that belong to E_i.

• Let the non-empty subsets be F_{i_1}, F_{i_2}, ..., F_{i_t}, where i_j < i_{j+1}. We do nothing to the first 7 log ε^{-1} of them, which contribute ε^{-O(dim G)} to the final degree of x.

• Consider some value j > 7 log ε^{-1}: from the set F_{i_{j − 7 log ε^{-1}}} of edges directed into x, we choose an arbitrary one {u, x}. We replace the edges {y, x} ∈ F_{i_j} by new edges {y, u}. These at most ε^{-O(dim G)} edges are referred to as edges donated from x to u.

Note that the length of the edge {u, x} is at most C_ε 2^{i_{j − 7 log ε^{-1}}} ≤ C_ε ε^7 2^{i_j}, whereas the length of any edge {y, x} ∈ F_{i_j} is greater than C_ε 2^{i_j − 1}; hence d_G(u, x) ≤ 2ε^7 d_G(x, y) ≤ ε^6 d_G(x, y), since ε ≤ 1/4. Now by the triangle inequality, d_G(u, y) ∈ (1 ± ε^6) d_G(x, y). Additionally, note that if x donates an edge {x, y} ∈ F_{i_j} to u, then {u, x} ∈ F_{i_{j − 7 log ε^{-1}}}, so that d_G(x, u) is at least C_ε 2^{i_{j − 7 log ε^{-1}} − 1}.
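The donation step at a single vertex x can be sketched in Python as below. This is our illustration of the bookkeeping only, with hypothetical names: it takes log to be base 2, rounds 7 log ε^{-1} up to an integer K (the text does not say how non-integral values are handled), picks the first edge of F_{i_{j−K}} as the “arbitrary” one, and does not track edge lengths.

```python
import math
from typing import Dict, Hashable, List, Tuple


def redistribute_in_edges(
    in_nbrs_by_level: Dict[int, List[Hashable]], eps: float
) -> Tuple[List[Hashable], List[Tuple[Hashable, Hashable]]]:
    """Degree-reduction step at one vertex x.

    in_nbrs_by_level[i] lists the y with {y, x} in F_i(x). Returns (kept, donated):
    `kept` are in-neighbours whose edges stay at x; each donated pair (y, u) is a new edge {y, u}.
    """
    K = math.ceil(7 * math.log2(1 / eps))  # assumption: 7 log(1/eps) rounded up
    levels = sorted(i for i, nbrs in in_nbrs_by_level.items() if nbrs)  # i_1 < i_2 < ... < i_t
    kept: List[Hashable] = []
    donated: List[Tuple[Hashable, Hashable]] = []
    for j, level in enumerate(levels, start=1):
        nbrs = in_nbrs_by_level[level]
        if j <= K:
            kept.extend(nbrs)  # the first K non-empty levels are left alone
        else:
            u = in_nbrs_by_level[levels[j - K - 1]][0]  # some edge {u, x} in F_{i_{j-K}}
            donated.extend((y, u) for y in nbrs)
    return kept, donated


in_nbrs = {i: [f"y{i}"] for i in range(1, 11)}
kept, donated = redistribute_in_edges(in_nbrs, eps=0.5)  # K = 7
print(donated)  # [('y8', 'y1'), ('y9', 'y2'), ('y10', 'y3')]
```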

Again, each new edge {y, u} constructed above is assigned the length d_G(y, u). Let us call the new graph G′ = (V, E′); from the construction, each vertex x ∈ V has the following edges incident to it:

• Type-A edges. These correspond to the ε^{-O(dim G)} edges that were directed away from x.

• Type-B edges. These correspond to the edges directed into x that belong to the smallest 7 log ε^{-1} levels; this gives another ε^{-O(dim G)} edges in total.

• Type-C edges. For each edge e = {x, y} of type A that is incident to x, there are at most ε^{-O(dim G)} edges that have been donated from y to x to maintain y’s degree bound; each such edge e′ = {z, x} corresponds to some original edge of the form {z, y} ∈ E_i, for some i such that y, z ∈ φ(N_i).

This immediately shows that G′ has degree ε^{-O(dim G)}; showing that distances are well maintained by this operation proves the following theorem.

Theorem 16 [4] The spanner G′ has degree ε^{-O(dim G)} and stretch (1 + ε).

4.2 Bounding the Dimension of the Convex Closure

While the distortion bound in Theorem 16 proves that dim G′ ≈ dim G, neither this nor the bounded degree implies much about the dimension of conv(G′); hence we use Theorem 12 to bound it, and thus prove Theorem 2.

Lemma 17 Consider the graph G′, fix a vertex v and a radius R, and let ε < 1/4. The number of long edges |L_v(R)| with respect to vertex v and radius R is at most O(ε^{-O(dim G)}).


Proof Recall that L_v(R) is the set of edges that have one endpoint within the ball B(v, R) and length greater than R. Define ρ ∈ ℤ≥0 such that R ∈ (C_ε 2^{ρ−1}, C_ε 2^ρ]. All balls B(x, r) in this proof are with respect to the graph G′.

By the spanner construction, any long edge of type A or type B must belong to ⋃_{i≥ρ} E_i, and hence must have both endpoints in φ(N_ρ). Moreover, one endpoint of each such long edge must lie in the ball B(v, R) ⊆ B(v, C_ε 2^ρ); since the points in φ(N_ρ) are at distance at least 2^ρ from each other, there can be at most (C_ε)^{O(dim G)} such endpoints within the ball (by Fact 8). Moreover, each of these endpoints has at most ε^{-O(dim G)} type-A or type-B edges; hence the number of long edges that are of type A or type B (with respect to their endpoint that lies within B(v, R)) is bounded by (C_ε)^{O(dim G)} × ε^{-O(dim G)} = ε^{-O(dim G)}, using the fact that C_ε = O(ε^{-1}).

Now consider the long edges in L_v(R) that are of type C with respect to their endpoint within B(v, R). By their construction, a type-C edge {u, y} at u is associated with an edge {x, y} ∈ E (of almost the same length, up to a factor of (1 ± ε^6)) such that x donated the edge to u. Consider one such type-C edge e = {u, y} with u ∈ B(v, R), associated with {x, y} ∈ E_i; hence d_G(x, y) ∈ (C_ε 2^{i−1}, C_ε 2^i], and also x, y ∈ φ(N_i). By the construction of the type-C edges, {u, x} must be a type-A edge at u, and d_G(u, x) ≤ ε^6 · d_G(x, y). Hence the distance from v to x in G′ is at most R + ε^6 · d_G(x, y): the former term since u ∈ B(v, R), and the latter term since the edge {u, x} is a type-A edge present in G′. This implies that x ∈ B(v, R + ε^6 C_ε 2^i). From this, it follows that for any level i ≥ ρ − 1, the number of donor vertices is bounded by the number of points of B(v, C_ε(2^ρ + ε^6 2^i)) that are at least 2^i apart from each other; this can be loosely bounded by ε^{-O(dim G)}. Each such donor vertex could donate ε^{-O(dim G)} edges, which would give us a total of ε^{-O(dim G)} edges for level i.

However, summing this bound over all the levels would give us too weak a bound, so we use it only for levels i ∈ {ρ − 1, ρ, ..., ρ + 6 log ε^{-1}}. Now consider any level i > ρ + 6 log ε^{-1}: any donor vertex for such a level must lie in the ball B(v, C_ε(2^ρ + ε^6 2^i)) ⊆ B(v, C_ε ε^6 2^{i+1}) ⊆ B(v, C_ε ε^5 2^i). A little algebra (using ε < 1/4 in both inequalities) shows that

$$\varepsilon^5 C_\varepsilon = \varepsilon^5\Bigl(4 + \frac{32}{\varepsilon}\Bigr) \le \varepsilon^5 \cdot \frac{33}{\varepsilon} = 33\,\varepsilon^4 \le \varepsilon,$$

and so the donor vertex must be at distance at most ε 2^i from v. However, since the donor vertex must belong to φ(N_i), it must be at distance at least 2^i from any other donor vertex. Now, if there were two donor vertices at distance at most ε 2^i from v, they would be at distance at most 2ε 2^i < 2^i from each other; this implies that there can be at most one donor vertex for any such level i > ρ + 6 log ε^{-1}. Moreover, by the containment property φ(N_{j+1}) ⊆ φ(N_j) for net-trees, it follows that there is at most one donor vertex x that donates at levels i > ρ + 6 log ε^{-1}. So, to complete the argument, it suffices to show that the total number of long edges donated by x to vertices in B(v, R), summed over all the levels, is small.

Let i_1 < i_2 < ··· < i_t be the levels for which x donates long edges to vertices in B(v, R); we claim that t = O(log ε^{-1}). Since the first edge is long, R ≤ C_ε 2^{i_1+1}. Moreover, since x donates this edge to u_1 ∈ B(v, R), we conclude that d_G(x, u_1) ≤ ε^6 C_ε 2^{i_1}, so that d_G(v, x) ≤ R + ε^6 C_ε 2^{i_1} ≤ (2 + ε^6) C_ε 2^{i_1}. For the last edge, suppose that t > 7 log ε^{-1} + 3. Then an edge in F_{i_t} is donated from x to u_t, and we have that d_G(x, u_t) ≥ C_ε 2^{i_4 − 1}. On the other hand, since u_t ∈ B(v, R), by the triangle inequality d_G(x, u_t) ≤ d_G(x, v) + d_G(v, u_t) ≤ (3 + ε^6) C_ε 2^{i_1}. Since i_4 ≥ i_1 + 3, this gives us the desired contradiction. Thus the number of levels is t ≤ O(log ε^{-1}). There are at most ε^{-O(dim G)} edges donated from x to B(v, R) from each of these levels, and the claim follows. ∎

Combining the bound on the number of long edges given by Lemma 17 with Theorem 12(a) shows that dim conv(G′) = O(dim G · log ε^{-1}), which proves Theorem 2.
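Spelled out, the combination can be read as follows. Here c is a hypothetical name for the unspecified constant in the exponent of Lemma 17 (with the leading constant factor absorbed into it), chosen large enough that also k ≥ dim G′, which is possible since dim G′ = O(dim G) and log_2 ε^{-1} ≥ 2 for ε < 1/4.

```latex
|L_v(R)| \le \varepsilon^{-c\,\dim G}
        = 2^{\,c\,\dim G\,\log_2 \varepsilon^{-1}} = 2^{k},
\qquad k := c\,\dim G\,\log_2 \varepsilon^{-1},
\qquad\text{so}\qquad
\dim \operatorname{conv}(G') \le 5k = O(\dim G \cdot \log \varepsilon^{-1}).
```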

5 Convex Closures for Trees

The construction of the previous section showed that given any graph G, we can construct a new graph G′ such that the distances in G and G′ are within (1 + ε) of each other, and conv(G′) has low doubling dimension. However, since the construction starts with the shortest-path metric d_G and completely ignores the topological structure of G itself, it is not suited to proving Theorem 3, which seeks to start with a tree and end with another tree. In this section, we show a different approach that allows us to monitor the graph structure more closely.

5.1 A General Construction

We give a procedure that takes a general graph G = (V, E) and outputs another graph G′ = (V′, E′). While the construction itself does not depend on G being a tree, the analysis given in the next section does. The procedure takes a graph G = (V, E) with edge lengths ℓ : E → ℝ≥0 and constructs a new graph G′ = (V′, E′) with V ⊆ V′ (by way of an intermediate graph G̃) as follows. Define an exponential tail with k edges as a path P = ⟨v_0, v_1, v_2, ..., v_k⟩, where the length of the edge {v_{i−1}, v_i} is 2^i. Without loss of generality, the smallest edge length in G is at least 2^{τ_ε}, where τ_ε = 6 + ⌈log(1/ε)⌉. Recall that C_ε = (4 + 32/ε). We construct the graph G′ in the following way:

• As in Sect. 4.1, we consider a net-tree (T, φ) for the graph G. If N_i is the set of nodes in the net-tree T at height i, then for u ∈ V define i*(u) to be the largest i such that u ∈ φ(N_i). Attach to each u ∈ V an exponential tail with i*(u) edges; refer to the jth vertex on this path as u[j], with u[0] = u. Let G̃ be this intermediate graph consisting of G along with the tails.

• Consider an edge e = {u, v} ∈ E(G), and suppose its length lies in the interval (C_ε 2^{i−1}, C_ε 2^i]. Since there is a bijection between the leaves of T and the nodes in V, some leaf of T must be mapped by φ to the endpoint u: let the level-i ancestor of that leaf be mapped by φ to ū ∈ V; similarly, let v̄ ∈ V be defined for the other endpoint v. (Since ū ∈ φ(N_i), we have i*(ū) ≥ i, so the tail vertex ū[i] exists, and likewise for v̄.) We now construct an edge {ū[i], v̄[i]} of length ℓ(e) in the graph G′.
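The two steps above can be sketched in Python as follows; this is our illustration, not the authors' code. A tail vertex u[j] is represented as the pair (u, j). The net-tree is abstracted into two inputs: the map `i_star` and a function `ancestor(u, i)` returning ū. Both, like the function name and the toy input, are hypothetical. In the toy run, a path 0–200–1200 (all edge lengths at least 2^{τ_ε} = 128 for ε = 1/2), a nearest-label lookup stands in for the true ancestor query, which happens to agree with it here.

```python
import math
from typing import Callable, Dict, Hashable, List, Tuple

Node = Tuple[Hashable, int]          # (u, j) stands for the tail vertex u[j]; (u, 0) is u itself
WeightedEdge = Tuple[Node, Node, float]


def build_tail_graph(
    edges: Dict[Tuple[Hashable, Hashable], float],
    i_star: Dict[Hashable, int],
    ancestor: Callable[[Hashable, int], Hashable],
    eps: float,
) -> List[WeightedEdge]:
    """Edges of G' as in Sect. 5.1: exponential tails plus one rerouted edge per edge of G."""
    c_eps = 4 + 32 / eps
    out: List[WeightedEdge] = []
    for u, top in i_star.items():
        for j in range(1, top + 1):
            out.append(((u, j - 1), (u, j), float(2 ** j)))  # tail edge {u[j-1], u[j]} of length 2^j
    for (u, v), length in edges.items():
        # level i with length in (C_eps 2^{i-1}, C_eps 2^i]; floating point may misplace exact powers of two
        i = max(0, math.ceil(math.log2(length / c_eps)))
        out.append(((ancestor(u, i), i), (ancestor(v, i), i), length))  # edge {u_bar[i], v_bar[i]}
    return out


levels = [[0, 200, 1200]] * 8 + [[0, 1200]] * 3 + [[0]]   # phi(N_i) for the toy metric
i_star = {p: max(i for i, lvl in enumerate(levels) if p in lvl) for p in (0, 200, 1200)}


def ancestor(u: int, i: int) -> int:
    # hypothetical stand-in for the net-tree ancestor query: nearest label at level i
    return min(levels[i], key=lambda q: abs(u - q))


tree_edges = {(0, 200): 200.0, (200, 1200): 1000.0}
g = build_tail_graph(tree_edges, i_star, ancestor, eps=0.5)
print(len(g), [e for e in g if e[0][0] != e[1][0]])
# 30 [((0, 2), (200, 2), 200.0), ((200, 4), (1200, 4), 1000.0)]
```

The result has 31 vertices and 30 edges and is connected, so it is a tree. By hand, the path 0 → 0[2] → 200[2] → 200 has length 6 + 200 + 6 = 212 ≤ (1 + ε) · 200 for ε = 1/2.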

5.2 The Analysis for Trees

Note that if we start off with a tree T instead of a general graph, the general construction in Sect. 5.1 adds exponential tails to the vertices of T to get an intermediate tree T̃, and then replaces edges between the nodes in T̃ by edges between appropriate representatives on the tails to get the final tree T′. The following argument proves that T̃ and T′ are indeed both connected trees.

Proposition 18 (Distance Preservation) Let ε < 1/4. If the input graph is a tree T = (V, E), then the above procedure results in a connected tree T′ = (V′, E′) such that for any x, y ∈ V,

$$(1 + \varepsilon)^{-1} d_T(x, y) \le d_{T'}(x, y) \le (1 + \varepsilon)\, d_T(x, y).$$

Proof Let us consider performing the above-mentioned transformation for edges in increasing order of edge length. Given $j \in \mathbb{Z}_{\ge 0}$, let $T_j$ be the forest formed by deleting all edges of length more than $C_\varepsilon 2^j$ from $T$; also, let $T'_j$ be the forest formed by deleting the corresponding edges in $T'$. We will prove by induction on $j$ that for all $x, y$ that lie in some connected component of $T_j$, their distance in $T'_j$ satisfies the desired stretch bound. The base case is trivial, since all components of $T_0$ are single nodes.

To prove the claim for $j$, we inductively assume it for $j-1$. Now consider some edge $e = \{u, v\}$ of length $\ell(e) \in (C_\varepsilon 2^{j-1}, C_\varepsilon 2^j]$. In this case we find some net points $\bar{u}$ and $\bar{v}$, and add an edge of length $\ell(e)$ between $\bar{u}[j]$ and $\bar{v}[j]$. By the properties of the net tree, $d_T(u, \bar{u}) \le 2^{j+1} - 2$. Since $T_{j-1}$ already contains all edges of length at most $C_\varepsilon 2^{j-1}$, and $C_\varepsilon \ge 4$, the net point $\bar{u}$ lies in the same component as $u$ in $T_{j-1}$ (every edge on the $u$–$\bar{u}$ path in $T$ has length at most $d_T(u, \bar{u}) \le 2^{j+1} - 2 < C_\varepsilon 2^{j-1}$). By the induction hypothesis, $d_{T'_{j-1}}(u, \bar{u}) \le (1+\varepsilon)2^{j+1}$; note that this implicitly proves that $u$ and $\bar{u}$ are in the same component of $T'_{j-1}$. A similar claim holds for $v$ and $\bar{v}$. Hence the distance in $T'_j$ between $u$ and $v$ is at most

$$\begin{aligned}
&d_{T'_j}(u, \bar{u}) + d_{T'_j}(\bar{u}, \bar{u}[j]) + \ell(e) + d_{T'_j}(\bar{v}[j], \bar{v}) + d_{T'_j}(\bar{v}, v) \\
&\quad\le 2(1+\varepsilon)2^{j+1} + 2 \cdot 2^{j+1} + \ell(e) \\
&\quad\le \ell(e)\left(\frac{8(1+\varepsilon)}{C_\varepsilon} + \frac{8}{C_\varepsilon} + 1\right) \\
&\quad\le (1+\varepsilon)\,\ell(e),
\end{aligned}$$

where the third line uses $2^{j+1} = 4 \cdot 2^{j-1} < 4\ell(e)/C_\varepsilon$, and the last line uses the facts that $C_\varepsilon = 4 + 32/\varepsilon$ and $\varepsilon < 1/4$. Since none of the edges of $T$ is stretched by more than $(1+\varepsilon)$ in $T'$, it follows that the stretch for all pairs is bounded by $(1+\varepsilon)$.
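The last two inequalities in the display are pure arithmetic in $\varepsilon$, so they can be spot-checked mechanically. The Python sketch below assumes only the constant $C_\varepsilon = 4 + 32/\varepsilon$ stated in the proof; it evaluates the factor $8(1+\varepsilon)/C_\varepsilon + 8/C_\varepsilon + 1$ with exact rational arithmetic and confirms that it stays below $1+\varepsilon$ on a grid of sample values $\varepsilon < 1/4$. It is a sanity check, not a proof, and the function names are illustrative.

```python
from fractions import Fraction


def c_eps(eps):
    """C_eps = 4 + 32/eps, the constant used in the proof of Proposition 18."""
    return 4 + 32 / eps


def inductive_stretch(eps):
    """Bound on d_{T'_j}(u, v) / l(e) from the inductive step:
    8(1 + eps)/C_eps + 8/C_eps + 1."""
    c = c_eps(eps)
    return 8 * (1 + eps) / c + 8 / c + 1


def inductive_step_closes(eps):
    """True when the bound above is at most the target stretch 1 + eps."""
    return inductive_stretch(eps) <= 1 + eps


if __name__ == "__main__":
    # Sample eps = 1/5, 1/6, ..., 1/1000 (all strictly below 1/4), exactly.
    samples = [Fraction(1, k) for k in range(5, 1001)]
    assert all(inductive_step_closes(eps) for eps in samples)
    print(inductive_stretch(Fraction(1, 8)))  # 277/260, which is below 9/8
```

Algebraically, $(16 + 8\varepsilon)/C_\varepsilon = \varepsilon(16 + 8\varepsilon)/(32 + 4\varepsilon) \le \varepsilon$ whenever $\varepsilon \le 4$, so the check is expected to pass for every $\varepsilon < 1/4$.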

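We also need to show that distances do not shrink too much in $T'$; to show this, we go via $\hat{T}$. (Recall that $\hat{T}$ is the original tree $T$ along with the exponential tails.) First note that for any $u, v \in V$, $d_{\hat{T}}(u, v) = d_T(u, v)$. We show that distances do not shrink by more than a factor of $(1+\varepsilon)$ in going from $\hat{T}$ to $T'$. By the triangle inequality in $\hat{T}$, it suffices to show this for the edges of $T'$. For an edge $e' = (\bar{u}[j], \bar{v}[j])$ that has length $\ell(e) \ge C_\varepsilon 2^{j-1}$, we note that the distance between its endpoints in $\hat{T}$ satisfies

$$\begin{aligned}
d_{\hat{T}}(\bar{u}[j], \bar{v}[j]) &\le d_{\hat{T}}(\bar{u}[j], \bar{u}) + d_{\hat{T}}(\bar{u}, u) + \ell(e) + d_{\hat{T}}(v, \bar{v}) + d_{\hat{T}}(\bar{v}, \bar{v}[j]) \\
&\le 4(2^{j+1} - 2) + \ell(e) \le (1+\varepsilon)\,\ell(e),
\end{aligned}$$

where the last inequality follows from the facts that $\ell(e) \ge C_\varepsilon 2^{j-1}$ and $C_\varepsilon > 32/\varepsilon$. Thus the contraction in going from $\hat{T}$ to $T'$ is at most $(1+\varepsilon)$.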
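The contraction bound can be spot-checked the same way. Because the slack $\varepsilon\,\ell(e) - 4(2^{j+1} - 2)$ grows with $\ell(e)$, it is enough to test the smallest admissible length $\ell(e) = C_\varepsilon 2^{j-1}$. The sketch below (function name illustrative) does this exactly for a range of levels $j$ and sample values of $\varepsilon$.

```python
from fractions import Fraction


def contraction_ok(eps, j):
    """Check 4(2^{j+1} - 2) + l <= (1 + eps) * l at l = C_eps * 2^{j-1}.

    The detour term bounds the four hops u_bar[j] -> u_bar -> u and
    v -> v_bar -> v_bar[j], each of length at most 2^{j+1} - 2."""
    c_eps = 4 + 32 / eps
    ell = c_eps * Fraction(2) ** (j - 1)   # smallest edge length at level j
    detour = 4 * (2 ** (j + 1) - 2)
    return detour + ell <= (1 + eps) * ell


if __name__ == "__main__":
    assert all(contraction_ok(Fraction(1, k), j)
               for k in range(5, 200) for j in range(0, 40))
    print("contraction bound holds on all sampled (eps, j)")
```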


To finish, we note that we have shown that $T'$ is connected (since it has finite distortion with respect to $T$), and that the number of edges in $T'$ is equal to the number of edges in $\hat{T}$, which is a tree on the same vertex set. Thus $T'$ is a tree as well. ∎
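When experimenting with the construction, the conclusion of Proposition 18 can be checked directly on a concrete pair of trees. The sketch below is a hypothetical helper, not code from the paper: trees are given as lists of `(u, v, length)` triples, `is_tree` applies the same criterion as the argument above (connected, with one edge fewer than vertices), and `stretch` reports the worst expansion and contraction of $d_{T'}$ relative to $d_T$ over pairs of original vertices.

```python
from collections import defaultdict


def distances_from(edges, source):
    """Path lengths from `source` along a DFS of the graph given by
    (u, v, length) triples. In a tree the path is unique, so these are the
    shortest-path distances; in other graphs only the keys (the reachable
    vertices) are meaningful."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = {source: 0}
    stack = [source]
    while stack:
        x = stack.pop()
        for y, w in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + w
                stack.append(y)
    return dist


def is_tree(vertices, edges):
    """A graph is a tree iff it is connected and has |V| - 1 edges."""
    vertices = set(vertices)
    if not vertices or len(edges) != len(vertices) - 1:
        return False
    reached = distances_from(edges, next(iter(vertices)))
    return set(reached) == vertices


def stretch(terminals, tree_edges, new_tree_edges):
    """Return (max expansion, max contraction) of d_{T'} w.r.t. d_T on terminals."""
    expansion = contraction = 1.0
    for x in terminals:
        d_t = distances_from(tree_edges, x)
        d_tp = distances_from(new_tree_edges, x)
        for y in terminals:
            if y != x:
                expansion = max(expansion, d_tp[y] / d_t[y])
                contraction = max(contraction, d_t[y] / d_tp[y])
    return expansion, contraction


if __name__ == "__main__":
    # Toy example (hypothetical numbers): T is the path a-b-c; T' reroutes
    # the edge {a, b} through a one-node "tail" a1 hanging off a.
    T = [("a", "b", 10), ("b", "c", 10)]
    T_prime = [("a", "a1", 1), ("a1", "b", 10), ("b", "c", 10)]
    assert is_tree({"a", "a1", "b", "c"}, T_prime)
    print(stretch(["a", "b", "c"], T, T_prime))  # (1.1, 1.0)
```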

5.2.1 Dimension of the Convex Closure

Finally, to show that the doubling dimension of $\mathrm{conv}(T')$ is small, we will again bound the number of long edges and invoke Theorem 12. However, since we have added additional vertices in going from $T$ to $T'$, we first show that $\dim T'$ is $O(\dim T)$; given that distances are well preserved between $\hat{T}$ and $T'$, it suffices to bound the doubling dimension of $\hat{T}$.

Lemma 19 The doubling dimension of $\hat{T}$ (and hence of $T'$) is at most $O(\dim T)$.

Proof Let $u[i] \in V(\hat{T})$ and $R \in (2^{j-1}, 2^j]$ for an integer $j \ge 1$. We claim that $B_{\hat{T}}(u[i], 2R)$ can be covered by a small number of balls of radius $R$. From the definition of $\dim T$, there is a set $Y$ with $|Y| \le 2^{2 \dim T}$ such that $B_T(u, 2R) \subseteq \bigcup_{y \in Y} B_T(y, R/2)$. Note that for any $v \notin \phi(N_{j-2})$, the tail attached to $v$ has length at most $R/2$. Let $Z = B_T(u, 2R) \cap \phi(N_{j-2})$, where for $j = 1$ we take $N_{-1} = N_0$. By Fact 8, it follows that $|Z| \le 2^{O(\dim T)}$. Finally, let $Z' = \{v[j-1] : v \in Z\}$ and $Z'' = \{v[j] : v \in Z\}$. It is easy to verify that $B_{\hat{T}}(u[i], 2R) \subseteq \bigcup_{y \in Y \cup Z \cup Z' \cup Z''} B_{\hat{T}}(y, R)$, and hence the claim follows. ∎
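Lemma 19 is a statement about covering balls $B(x, 2R)$ by balls of radius $R$, and for small examples this covering quantity can be probed empirically. The sketch below uses hypothetical helper names and is not from the paper: it builds a greedy cover of each sampled ball by scanning the ball and opening a new radius-$r$ ball at every point not yet covered. Every greedy cover is a valid cover, so its size upper-bounds the optimum for that ball; because only radii of the form $d(p, q)/2$ are sampled, the returned $\log_2$ of the largest cover is a heuristic estimate of the doubling dimension rather than a certified value.

```python
import math


def greedy_cover_size(points, dist, center, r):
    """Number of radius-r balls a greedy scan uses to cover B(center, 2r).

    A point becomes a new ball centre if it is farther than r from every
    centre chosen so far, so the chosen centres are pairwise more than r
    apart and every point of the ball ends up within r of some centre."""
    ball = [p for p in points if dist(center, p) <= 2 * r]
    centres = []
    for p in ball:
        if all(dist(p, c) > r for c in centres):
            centres.append(p)
    return len(centres)


def doubling_dimension_estimate(points, dist):
    """log2 of the largest greedy cover over all centres and sampled radii."""
    points = list(points)
    radii = {dist(p, q) / 2 for p in points for q in points if p != q}
    worst = 1
    for x in points:
        for r in radii:
            worst = max(worst, greedy_cover_size(points, dist, x, r))
    return math.log2(worst)


if __name__ == "__main__":
    line = range(16)  # integer points on a line with |a - b| as the metric
    est = doubling_dimension_estimate(line, lambda a, b: abs(a - b))
    print(est)
```

For integer points on a line the estimate lands between $\log_2 3$ and $2$: greedy centres inside an interval of length $4r$ are pairwise more than $r$ apart, so there are at most four of them.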

Finally, we bound the number of long edges in $T'$.

Lemma 20 (Few Long Edges) For any vertex $v \in T'$ and every radius $R$, the number of long edges incident on $B_{T'}(v, R)$ is bounded by $2^{O(\dim T)} \log \varepsilon^{-1}$.

Proof First consider some $v \in V$ and $R \ge 0$, and define $\ell \in \mathbb{Z}_{\ge 0}$ such that $R \in (C_\varepsilon 2^{\ell-1}, C_\varepsilon 2^{\ell}]$. Every long edge incident on $B_{T'}(v, R)$ must have length at least $R$. Further, edges longer than $2C_\varepsilon R$ are incident on a tail node further than $R$ from its root, and hence such an edge cannot be incident on $B_{T'}(v, R)$. For each of the length scales $(C_\varepsilon 2^{\ell+j-1}, C_\varepsilon 2^{\ell+j}]$ with $0 \le j \le \log C_\varepsilon$, we will bound the number of long edges in that length scale. Fix one such scale, and let $L(v, R, j) = \{(u_i, w_i) : 1 \le i \le |L(v, R, j)|\}$ be the set of long edges with length in $(C_\varepsilon 2^{\ell+j-1}, C_\varepsilon 2^{\ell+j}]$ such that $d(v, u_i) \le R$. Since each long edge has length more than $R$, there is a path from $v$ to $u_i$ that does not use any of the long edges. Consider the set of nodes $W = \{w_i : 1 \le i \le |L(v, R, j)|\}$. Clearly, for any $w, w' \in W$, $d(w, w')$ is at most $2R + 2C_\varepsilon 2^{\ell+j} \le 4C_\varepsilon 2^{\ell+j}$. Moreover, since $T'$ is a tree, the symmetric difference of the $v$–$w$ and $v$–$w'$ paths gives the shortest path from $w'$ to $w$. Since the long edges incident on $w$ and $w'$ are in this symmetric difference, we conclude that $d(w, w') \ge 2C_\varepsilon 2^{\ell+j-1}$. From the bound in Lemma 19 on the doubling dimension of $T'$, we conclude that $|W| \le 2^{O(\dim T)}$. Summing the contribution over each of the $O(\log \varepsilon^{-1})$ distance scales, we get the desired bound.

We now extend the argument to a vertex $v[i]$ on an exponential tail hanging off $v$. If $i \ge j$, then $B_{T'}(v[i], R) = \{v[i]\}$. All edges incident on $v[i]$ have, up to a factor of two, the same length, and thus their endpoints form a near-uniform submetric. Thus we can bound the degree of $v[i]$ by $2^{O(\dim T)}$, and the claim follows. On the other hand, when $i < j$, $B_{T'}(v[i], R) \subseteq B_{T'}(v, 2R)$, and an argument analogous to the one for the case $v \in V$ above suffices. ∎
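The quantity bounded in Lemma 20 is easy to compute for a concrete weighted tree, which can help when checking the lemma on examples. The sketch below is hypothetical illustration code; as an assumption for the sketch, "long" is taken to mean length at least $R$. It finds the ball $B(v, R)$ by a traversal from $v$, collects the long edges with an endpoint in the ball, and buckets them by the dyadic scale $\lfloor \log_2(\text{length}/R) \rfloor$. The proof's scales are measured in units of $C_\varepsilon$, but any fixed dyadic bucketing illustrates the same counting.

```python
import math
from collections import Counter, defaultdict


def long_edges_by_scale(edges, v, radius):
    """Count edges of length >= radius with an endpoint in B(v, radius),
    bucketed by floor(log2(length / radius)). Requires radius > 0.

    `edges` is a list of (a, b, length) triples describing a tree, so the
    traversal below computes exact tree distances from v."""
    adj = defaultdict(list)
    for a, b, w in edges:
        adj[a].append((b, w))
        adj[b].append((a, w))
    dist = {v: 0}
    stack = [v]
    while stack:
        x = stack.pop()
        for y, w in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + w
                stack.append(y)
    ball = {x for x, d in dist.items() if d <= radius}
    scales = Counter()
    for a, b, w in edges:
        if w >= radius and (a in ball or b in ball):
            scales[math.floor(math.log2(w / radius))] += 1
    return scales


if __name__ == "__main__":
    # A star centred at 0 whose edge lengths double: 1, 2, 4, 8.
    star = [(0, 1, 1), (0, 2, 2), (0, 3, 4), (0, 4, 8)]
    print(dict(long_edges_by_scale(star, 0, 1)))  # {0: 1, 1: 1, 2: 1, 3: 1}
```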

Using the relationship between the doubling dimension of the convex closure and the number of long edges given by Theorem 12, we get that $\dim \mathrm{conv}(T') = O(\dim T + \log\log \varepsilon^{-1})$, which finishes the proof of Theorem 3.

6 Lower Bounds

In this section, we show lower bounds on the trade-off between the dimension blow-up and the distortion.

Let $m > 0$ be an integer and let $k = \log m$. Consider the graph $K_{1,mn}$ with $v_0$ as the center node and $\{v_1, \ldots, v_{mn}\}$ as the set of leaves. Set the length of the edge $\{v_0, v_i\}$ to $2^{\lceil i/m \rceil}$, and let $d$ be the resulting metric on the vertices $V$ of $K_{1,mn}$. In other words, for each $j = 1, \ldots, n$, there are $m$ nodes at distance exactly $2^j$ from $v_0$. It is easy to check that this metric has doubling dimension $O(k)$. We next show that the doubling dimension of any geodesic metric $(X, d')$ containing a $(1+\varepsilon)$-distortion copy of $(V, d)$ is $\Omega(k + \log\log \varepsilon^{-1})$.
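The lower-bound instance is simple to generate. The sketch below (illustrative names) stores, for each leaf index $i$, the edge length $d(v_0, v_i) = 2^{\lceil i/m \rceil}$, and computes distances in the star by routing through the centre; the integer expression `-(-i // m)` equals $\lceil i/m \rceil$ for positive integers.

```python
from collections import Counter


def star_lengths(m, n):
    """Map leaf index i in 1..mn to the length 2^ceil(i/m) of edge {v0, vi}."""
    return {i: 2 ** (-(-i // m)) for i in range(1, m * n + 1)}


def star_distance(lengths, i, j):
    """Shortest-path distance d(vi, vj) in K_{1,mn}; index 0 denotes v0."""
    if i == j:
        return 0
    if i == 0:
        return lengths[j]
    if j == 0:
        return lengths[i]
    return lengths[i] + lengths[j]  # every leaf-to-leaf path passes through v0


if __name__ == "__main__":
    lengths = star_lengths(m=3, n=4)
    # Exactly m = 3 leaves at each distance 2^1, ..., 2^4 from v0.
    print(sorted(Counter(lengths.values()).items()))
    # [(2, 3), (4, 3), (8, 3), (16, 3)]
```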

Lemma 21 Let $(X, d')$ be any geodesic metric such that $V \subseteq X$ and $d(v_i, v_j) \le d'(v_i, v_j) \le (1+\varepsilon)\, d(v_i, v_j)$ for all $v_i, v_j \in V$. Then $\dim(X, d')$ is $\Omega(k + \log\log \varepsilon^{-1})$.

Proof Denote by $uw[x]$ the point on the shortest $u$–$w$ path in $X$ that is at distance $x$ from $u$ (if there is more than one shortest path, pick one arbitrarily). We shall argue that the points $v_0v_i[1]$ for $i \in \{1, \ldots, m \log (2\varepsilon)^{-1}\}$ form a large near-uniform submetric in $X$. Indeed, $d'(v_0v_i[1], v_0v_j[1]) \le d'(v_0v_i[1], v_0) + d'(v_0, v_0v_j[1]) = 2$. On the other hand, by the triangle inequality,

$$\begin{aligned}
d'(v_0v_i[1], v_0v_j[1]) &\ge d'(v_i, v_j) - d'(v_0v_i[1], v_i) - d'(v_0v_j[1], v_j) \\
&= d'(v_i, v_j) - \bigl(d'(v_0, v_i) - 1\bigr) - \bigl(d'(v_0, v_j) - 1\bigr) \\
&\ge 2 + d(v_i, v_j) - (1+\varepsilon)\bigl(d(v_0, v_i) + d(v_0, v_j)\bigr) \\
&= 2 - \varepsilon\bigl(2^{\lceil i/m \rceil} + 2^{\lceil j/m \rceil}\bigr),
\end{aligned}$$

where we have used the bound on the distortion and the definition of $d$ in the last two steps. Since $i, j \le m \log (2\varepsilon)^{-1}$, we conclude that $d'(v_0v_i[1], v_0v_j[1]) \ge 1$. Thus we have $m \log (2\varepsilon)^{-1}$ points in $X$ that lie within $B(v_0, 2)$, no two of which can be covered by a single ball of radius $\frac{1}{2}$. By Fact 8, it follows that the doubling dimension of $X$ is $\Omega(k + \log\log \varepsilon^{-1})$. ∎
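The arithmetic behind the choice of $m \log (2\varepsilon)^{-1}$ points can be checked exactly. The sketch below (illustrative names) takes $\varepsilon = 2^{-(t+1)}$, so that $\log (2\varepsilon)^{-1} = t$ is an integer, and verifies that the lower bound $2 - \varepsilon(2^{\lceil i/m \rceil} + 2^{\lceil j/m \rceil})$ from the display above is at least $1$ for all pairs $i \ne j$ in $\{1, \ldots, mt\}$. It also shows the bound collapsing to $0$ one scale further out, which is why the index range stops at $mt$.

```python
from fractions import Fraction


def ceil_div(i, m):
    """ceil(i / m) for positive integers."""
    return -(-i // m)


def pair_lower_bound(eps, m, i, j):
    """2 - eps (2^ceil(i/m) + 2^ceil(j/m)), the bound on d'(v0vi[1], v0vj[1])."""
    return 2 - eps * (2 ** ceil_div(i, m) + 2 ** ceil_div(j, m))


def near_uniform_ok(m, t):
    """With eps = 2^-(t+1), check the bound is >= 1 for all i != j in 1..mt."""
    eps = Fraction(1, 2 ** (t + 1))
    idx = range(1, m * t + 1)
    return all(pair_lower_bound(eps, m, i, j) >= 1
               for i in idx for j in idx if i != j)


if __name__ == "__main__":
    assert near_uniform_ok(m=4, t=5)  # 20 points, pairwise at least 1 apart
    # One scale further (ceil(i/m) = t + 1 = 6) the bound collapses to 0.
    print(pair_lower_bound(Fraction(1, 64), 4, 24, 23))  # 0
```

With $mt$ points in total, $\log_2(mt) = k + \log_2 t$, matching the two terms of the lower bound.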

Theorem 4 follows.

Note that in the construction in Theorem 2, the graph $G'$ is defined on the same vertex set as $G$. Under such a constraint, we can improve the lower bound even when $G$ is restricted to be an ultrametric. Let $V = \{0,1\}^p$ with $d(x, y) = 2^{p - \mathrm{lcp}(x, y)}$ for distinct $x, y$, where $\mathrm{lcp}(x, y)$ denotes the length of the longest common prefix of the strings $x$ and $y$. Once again, one can easily check that $(V, d)$ has constant doubling dimension. We show that any graph $H = (V, E)$ on $V$ approximating $d$ within distortion $(1+\varepsilon)$ must satisfy $\dim(\mathrm{conv}(H)) \in \Omega(\log \varepsilon^{-1})$.
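For concreteness, the metric $d(x, y) = 2^{p - \mathrm{lcp}(x, y)}$ can be computed directly on bit strings. The sketch below (illustrative names) does so and checks by brute force, for small $p$, the strong triangle inequality $d(x, z) \le \max(d(x, y), d(y, z))$ that makes $(V, d)$ an ultrametric.

```python
from itertools import product


def lcp(x, y):
    """Length of the longest common prefix of strings x and y."""
    n = 0
    for a, b in zip(x, y):
        if a != b:
            break
        n += 1
    return n


def ultra_dist(x, y):
    """d(x, y) = 2^(p - lcp(x, y)) for distinct x, y in {0,1}^p; d(x, x) = 0."""
    if x == y:
        return 0
    return 2 ** (len(x) - lcp(x, y))


def is_ultrametric(points):
    """Brute-force check of d(x, z) <= max(d(x, y), d(y, z)) over all triples."""
    return all(ultra_dist(x, z) <= max(ultra_dist(x, y), ultra_dist(y, z))
               for x in points for y in points for z in points)


if __name__ == "__main__":
    V = ["".join(bits) for bits in product("01", repeat=3)]
    assert is_ultrametric(V)
    print(ultra_dist("000", "001"), ultra_dist("000", "100"))  # 2 8
```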

Lemma 22 Let $H = (V, E)$ be any graph whose shortest-path metric $d'$ satisfies $d(x, y) \le d'(x, y) \le (1+\varepsilon)\, d(x, y)$ for all $x, y \in V$. Then $\dim(\mathrm{conv}(H))$ is $\Omega(\log \varepsilon^{-1})$.

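Proof For $p = \log (2\varepsilon)^{-1}$, we first show that $H$ must contain all edges connecting $V_0 = \{0x : x \in \{0,1\}^{p-1}\}$ and $V_1 = \{1x : x \in \{0,1\}^{p-1}\}$. Indeed, suppose that the edge $(0x, 1y) \notin H$. Then the shortest path in $H$ between $0x$ and $1y$ has length at least $2^p + 2$: it uses at least two edges, one of which joins $V_0$ to $V_1$ and thus has length at least $2^p$, while every other edge has length at least $2$. This, however, violates the distortion constraint, since $(1+\varepsilon)\, d(0x, 1y) = 2^p + \varepsilon 2^p = 2^p + \frac{1}{2}$. Now consider the set of points $A = \{e[2^{p-1}] : e = (0x, 1y),\ x, y \in \{0,1\}^{p-1}\}$. Clearly, for any distinct $a, b \in A$, $d(a, b) \le 3 \cdot 2^{p-1}$ and $d(a, b) \ge 2 \cdot 2^{p-1}$. The claimed bound on the doubling dimension follows by using Fact 8. ∎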
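The first step of the proof, that every edge between $V_0$ and $V_1$ is forced, can be confirmed by brute force for small $p$. With $\varepsilon = 2^{-(p+1)}$ (so that $p = \log (2\varepsilon)^{-1}$), the sketch below computes the cheapest two-hop route $0x \to z \to 1y$ measured in $d$. Since $d' \ge d$ on each hop and a route with more hops cannot cost less than $2^p + 2$, this lower-bounds any path in $H$ that avoids the edge $(0x, 1y)$, and it exceeds the budget $(1+\varepsilon)2^p$. The helper names are illustrative.

```python
from fractions import Fraction
from itertools import product


def lcp(x, y):
    """Length of the longest common prefix of strings x and y."""
    n = 0
    for a, b in zip(x, y):
        if a != b:
            break
        n += 1
    return n


def ultra_dist(x, y):
    """d(x, y) = 2^(p - lcp(x, y)) for x != y, and 0 otherwise."""
    return 0 if x == y else 2 ** (len(x) - lcp(x, y))


def cheapest_two_hop(a, b, points):
    """min over z not in {a, b} of d(a, z) + d(z, b)."""
    return min(ultra_dist(a, z) + ultra_dist(z, b)
               for z in points if z not in (a, b))


def cross_edges_forced(p):
    """True if every pair (0x, 1y) violates distortion 1 + eps without its edge."""
    eps = Fraction(1, 2 ** (p + 1))  # chosen so that 2^p = 1 / (2 eps)
    points = ["".join(bits) for bits in product("01", repeat=p)]
    budget = (1 + eps) * 2 ** p       # equals 2^p + 1/2
    return all(cheapest_two_hop(a, b, points) > budget
               for a in points if a[0] == "0"
               for b in points if b[0] == "1")


if __name__ == "__main__":
    assert cross_edges_forced(4)
    pts = ["".join(bits) for bits in product("01", repeat=4)]
    print(cheapest_two_hop("0000", "1000", pts))  # 18 = 2^4 + 2
```

There are $2^{p-1} \cdot 2^{p-1} = 4^{p-1}$ such forced edges, and their midpoints form the set $A$.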

Theorem 5 follows. We note that by taking $V$ to be $[m]^n$, we can get a metric with doubling dimension $k = \log m$, for which the lower bound improves to $\Omega(k + \log \varepsilon^{-1})$. These results show that the dependence on $\varepsilon$ in the result for general metrics cannot be improved without the addition of Steiner points.

7 Implications and Connections

The constructions of this paper allow us to take weighted graphs $G = (V, E)$ and output unweighted graphs $G' = (V', E')$ such that $V \subseteq V'$, $d_{G'}|_{V \times V} = O(d_G)$, and $\dim G' = O(\dim G)$. As a consequence, one can use these results in conjunction with results for unweighted graphs to deduce results for weighted graphs. We therefore expect the results in this paper to be useful for proving theorems about doubling metrics.

As an example, we can use a relatively simple embedding of unweighted doubling trees into $\ell_p$ given by [8] to obtain a different proof of the embedding of weighted doubling tree metrics into $\ell_p$ spaces.

Corollary 23 [8, 17] Every (weighted) doubling tree metric $M$ embeds into $\ell_p$ with constant distortion and constant dimension.

We would like to note that the result of [17] is quantitatively stronger, since it gives a distortion of $O(\log\log \lambda_M)$, whereas the embedding implied by the construction above has a distortion polynomial in $\lambda_M$; here $\lambda_M$ is the doubling constant of the tree metric $M$.

Acknowledgements We would like to thank Robi Krauthgamer and Ravishankar Krishnaswamy for discussions, James Lee for pointing us to [19], and the anonymous referees for several helpful comments.


References

1. Assouad, P.: Plongements Lipschitziens dans $\mathbb{R}^n$. Bull. Soc. Math. France 111(4), 429–448 (1983)

2. Beygelzimer, A., Kakade, S., Langford, J.: Cover trees for nearest neighbor. In: The 23rd International Conference on Machine Learning (ICML) (2006)

3. Calinescu, G., Karloff, H., Rabani, Y.: Approximation algorithms for the 0-extension problem. SIAM J. Comput. 34(2), 358–372 (2004/2005)

4. Chan, T.-H.H., Gupta, A., Maggs, B.M., Zhou, S.: On hierarchical routing in doubling metrics. In: Proceedings of the 16th ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 762–771 (2005)

5. Clarkson, K.L.: Nearest neighbor queries in metric spaces. Discrete Comput. Geom. 22(1), 63–93 (1999)

6. Cole, R., Gottlieb, L.-A.: Searching dynamic point sets in spaces with bounded doubling dimension. In: The Thirty-Eighth Annual ACM Symposium on Theory of Computing (STOC) (2006)

7. Fakcharoenphol, J., Harrelson, C., Rao, S., Talwar, K.: An improved approximation algorithm for the 0-extension problem. In: Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 257–265. SIAM, Philadelphia (2003)

8. Gupta, A., Krauthgamer, R., Lee, J.R.: Bounded geometries, fractals, and low-distortion embeddings. In: Proceedings of the 44th Symposium on the Foundations of Computer Science (FOCS), pp. 534–543 (2003)

9. Har-Peled, S., Mendel, M.: Fast construction of nets in low-dimensional metrics and their applications. SIAM J. Comput. 35(5), 1148–1184 (2006) (electronic)

10. Indyk, P., Naor, A.: Nearest-neighbor-preserving embeddings. ACM Trans. Algorithms 3(3), 31 (2007), 12 pp.

11. Johnson, W.B., Lindenstrauss, J., Schechtman, G.: Extensions of Lipschitz maps into Banach spaces. Isr. J. Math. 54(2), 129–138 (1986)

12. Karzanov, A.: Minimum 0-extensions of graph metrics. Eur. J. Comb. 19(1), 71–101 (1998)

13. Konjevod, G., Richa, A.W., Xia, D.: Optimal scale-free compact routing schemes in doubling networks. In: Proceedings of the 18th ACM-SIAM Symposium on Discrete Algorithms (SODA) (2007)

14. Krauthgamer, R., Lee, J.R.: Navigating nets: simple algorithms for proximity search. In: Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 798–807. SIAM, Philadelphia (2004)

15. Krauthgamer, R., Lee, J.R.: The black-box complexity of nearest-neighbor search. Theor. Comput. Sci. 348(2–3), 262–276 (2005)

16. Lee, J.R., Naor, A.: Absolute Lipschitz extendability. C. R. Math. Acad. Sci. Paris 338(11), 859–862 (2004)

17. Lee, J.R., Naor, A., Peres, Y.: Trees and Markov convexity. Geom. Funct. Anal. 18(5), 1609–1659 (2009)

18. Matoušek, J.: Extension of Lipschitz mappings on metric trees. Comment. Math. Univ. Carol. 31(1), 99–104 (1990)

19. Semmes, S.: On the nonexistence of bi-Lipschitz parameterizations and geometric problems about $A_\infty$-weights. Rev. Mat. Iberoam. 12(2), 337–410 (1996)

20. Talwar, K.: Bypassing the embedding: algorithms for low-dimensional metrics. In: Proceedings of the 36th ACM Symposium on the Theory of Computing (STOC), pp. 281–290 (2004)