External-Memory MST

External-Memory MST (Arge, Brodal, Toma)


Page 1: External-Memory MST

External-Memory MST

(Arge, Brodal, Toma)

Page 2: External-Memory MST

Minimum-Spanning Tree

• Given a weighted, undirected graph G=(V,E), the minimum-spanning tree (MST) problem is the problem of finding a spanning tree for G of minimum weight.

• Assumptions:

1. G is connected;

2. No two edges in G have the same weight.

Page 3: External-Memory MST

External-Memory Graph Algorithms

• Standard two-level I/O model with a single disk:

• N = V + E

• M = number of vertices/edges that can fit into internal memory.

• B = number of vertices/edges per disk block.

• The graph is given as a list of edges sorted by vertex.
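The slides use sort(·) without spelling it out. As a rough sketch, the standard bounds in this model are scan(N) = Θ(N/B) and sort(N) = Θ((N/B)·log_{M/B}(N/B)). The helper names `scan_ios` and `sort_ios` below are hypothetical, and constant factors are ignored.

```python
import math

def scan_ios(n, B):
    """Number of block transfers needed to read n items sequentially."""
    return math.ceil(n / B)

def sort_ios(n, M, B):
    """Rough sort(n) estimate: (n/B) * log_{M/B}(n/B), constants dropped.

    Assumes M > B so that the logarithm base M/B exceeds 1.
    """
    blocks = n / B
    if blocks <= 1:
        return 1
    passes = max(1.0, math.log(blocks, M / B))
    return math.ceil(blocks * passes)
```

For realistic values of M and B the logarithmic factor is a small constant, which is why O(sort(E)) is far smaller than one I/O per edge.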

Page 4: External-Memory MST

External-Memory Graph Algorithms (2)

• For MST and connected components (CC), randomized algorithms using O(sort(E)) I/Os are known.

Page 5: External-Memory MST

Prim’s Algorithm

[Figure: example graph on vertices a to f with edge weights 1 to 9. Prim’s algorithm, started at b, keeps the vertices in a priority queue and selects the edges (b,a), (a,c), (c,d), (d,e), (a,f).]

Page 6: External-Memory MST

Prim’s Algorithm (2)

• Prim’s algorithm cannot be implemented efficiently in external memory:

• It is not guaranteed that even the priority queue alone fits in memory.

• Thus, we cannot in general get the current vertex priority without using an I/O.

• A direct implementation leads to an Ω(E) I/O algorithm.

Page 7: External-Memory MST

Prim’s Algorithm (3)

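[Figure: the example graph on vertices a to f.]

Modification: store edges in the priority-queue instead of vertices.

Priority queue contents as each vertex is visited (edge weights in parentheses):

b: b,a (1); b,c (5); b,d (6)

a: a,c (3); b,c (5); b,d (6); a,f (7)

c: c,d (2); c,b (5); b,c (5); b,d (6); a,f (7); c,e (8)

d: d,e (4); c,b (5); b,c (5); b,d (6); d,b (6); a,f (7); c,e (8)

e: c,b (5); b,c (5); b,d (6); d,b (6); a,f (7); e,c (8); c,e (8); e,f (9)

then, with the pair c,b / b,c (5) removed: b,d (6); d,b (6); a,f (7); e,c (8); c,e (8); e,f (9)

then, with the pair b,d / d,b (6) removed: a,f (7); e,c (8); c,e (8); e,f (9)

f: e,c (8); c,e (8); e,f (9); f,e (9)

Selected edges: b,a; a,c; c,d; d,e; a,f

Any two edges have distinct weights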

Any two edges have distinct weights

Page 8: External-Memory MST

Modified Prim Algorithm

• The correctness follows directly from the correctness of the original algorithm (“blue rule” still applies).

• Efficiency:

– At least one I/O per vertex in order to read its adjacency list => O(V + E/B) I/Os.

– O(E) operations on an external priority queue can be performed in O(sort(E)) I/Os.

– Thus in total we have O(V + sort(E)) I/Os.
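A minimal in-memory sketch of the edge-based variant, using Python's `heapq` in place of an external priority queue. It relies on distinct edge weights and a simple graph. An edge whose endpoints are both in the tree sits in the queue twice, once per endpoint, so when the two smallest entries have equal weight, both are discarded. The function name `modified_prim` and the adjacency format are assumptions made for illustration.

```python
import heapq

def modified_prim(adj, start):
    """adj: {v: [(weight, neighbor), ...]}, all edge weights distinct.

    Returns the tree edges as (u, v, weight). No 'visited' set is kept:
    membership is inferred from duplicate weights in the queue.
    """
    pq = [(w, start, v) for w, v in adj[start]]
    heapq.heapify(pq)
    tree = []
    while pq:
        w, u, v = heapq.heappop(pq)
        if pq and pq[0][0] == w:
            # Both copies of the edge are present, so both endpoints
            # are already in the tree: drop the second copy and skip.
            heapq.heappop(pq)
            continue
        tree.append((u, v, w))
        # Visit v: insert all its incident edges except the one just used.
        for w2, x in adj[v]:
            if x != u:
                heapq.heappush(pq, (w2, v, x))
    return tree
```

Run on the example graph from b, this goes through the same queue states as the trace on the previous slide.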

Page 9: External-Memory MST

Boruvka’s Algorithm

[Figure: the example graph; each vertex selects its lightest incident edge, giving (b,a), (c,d), (d,e), (a,f).]

(1) Select for each vertex the minimum weight edge adjacent to it.

(2) Contract the graph and return to (1)

Page 10: External-Memory MST

Boruvka’s Algorithm

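[Figure: the contracted graph with supervertices abf and cde, joined by parallel edges of weights 3, 5, 6, 9. Edges selected so far: (b,a), (a,c), (c,d), (d,e), (a,f).]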

(1) Select for each vertex the minimum weight edge adjacent to it.

(2) Contract the graph and return to (1)
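The two steps can be sketched in internal memory as follows. This is only an illustration of the select/contract loop, not the external-memory version: contraction is simulated with a component label per vertex, and the name `boruvka_mst` is hypothetical. Distinct edge weights are assumed, so the selected edges never form a cycle.

```python
def boruvka_mst(vertices, edges):
    """vertices: list of labels; edges: list of (u, v, weight), distinct weights."""
    comp = {v: v for v in vertices}   # current supervertex of each vertex
    mst = []
    while True:
        # (1) For each supervertex, select the lightest edge leaving it.
        best = {}
        for u, v, w in edges:
            cu, cv = comp[u], comp[v]
            if cu == cv:
                continue              # self edge of the contracted graph
            for c in (cu, cv):
                if c not in best or w < best[c][2]:
                    best[c] = (u, v, w)
        if not best:
            break                     # a single supervertex remains
        # (2) Contract along the selected edges.
        for u, v, w in set(best.values()):
            cu, cv = comp[u], comp[v]
            if cu != cv:
                mst.append((u, v, w))
                for x in vertices:
                    if comp[x] == cv:
                        comp[x] = cu
    return mst
```

On the example graph the first round selects the edges of weight 1, 2, 4 and 7, and the second round adds the edge of weight 3.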

Page 11: External-Memory MST

External-Memory Boruvka’s Step

• For each vertex v, let C(v) be the neighbor of v at the other end of the lightest edge incident to v.

• Let G’ be the graph obtained by taking only edges of the form (v, C(v)) for each v.

• Let G’d be the graph obtained by directing each edge (v, C(v)) in G’ from C(v) to v.

• The goal is to contract each connected component in G’ into a single vertex.

Page 12: External-Memory MST

Unique Representatives

In each connected component of G’d:

• Each vertex has indegree 1.

• The weight of the edges along any root-leaf path is increasing.

• There is exactly one cycle, consisting of the minimum-weight edge of the component taken in both directions.

Page 13: External-Memory MST

External-Memory Boruvka’s Step (2)

• The roots can be easily identified, and we can choose them to be the unique representatives of the components in G’.

• We would like to replace each edge (u, v) with an edge (ur, vr), where ur and vr are the unique representatives of the components containing u and v respectively.

• Then, we can remove parallel & self edges, and obtain the contracted graph.
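One way to identify the roots, sketched here in internal memory: collect the lightest edge chosen by each vertex, sort the choices by weight, and look for two adjacent entries with the same weight. With distinct edge weights such a pair can only be the same edge chosen from both ends, that is, the 2-cycle of a component. Which endpoint becomes the root is arbitrary. The rule below (the endpoint that sorts first) is an assumption, as is the function name `find_roots`.

```python
def find_roots(edges):
    """edges: list of (u, v, weight) with distinct weights.

    Returns (roots, parent) where parent[v] = (C(v), weight) for every
    non-root vertex v, i.e. the edges of G'd after breaking the cycles.
    """
    # Lightest incident edge of every vertex: best[v] = (weight, C(v)).
    best = {}
    for u, v, w in edges:
        for x, y in ((u, v), (v, u)):
            if x not in best or w < best[x][0]:
                best[x] = (w, y)
    chosen = sorted((w, v, c) for v, (w, c) in best.items())
    roots, parent = set(), {}
    i = 0
    while i < len(chosen):
        w, v, c = chosen[i]
        if i + 1 < len(chosen) and chosen[i + 1][0] == w:
            # Same edge chosen by both endpoints: a 2-cycle between v and c.
            roots.add(v)              # arbitrary choice of representative
            parent[c] = (v, w)
            i += 2
        else:
            parent[v] = (c, w)
            i += 1
    return roots, parent
```

With this tie-breaking rule the example graph gets roots a and c. The slides' example picks b and c instead, which is equally valid.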

Page 14: External-Memory MST

External-Memory Boruvka’s Step (3)

[Figure: the example graph G, the graph G’ of selected edges, and its directed version G’d.]

L:

(b,a) (1); (a,f) (7)

(c,d) (2); (d,e) (4)

(d,e) (4)

(a,f) (7)

Priority Queue (initialized with each vertex that is an immediate successor of a root vertex):

a (1) [b]; d (2) [c]

d (2) [c]; f (7) [b]

e (4) [c]; f (7) [b]

Output:

b → b, c → c, a → b, d → c, f → b, e → c
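The trace above can be reproduced with a short sketch. It scans L in order of increasing weight while a priority queue carries each representative forward. Because weights increase along every root-leaf path, the next queue minimum always matches the next record of L. The names `propagate_representatives`, `parent` and `roots` are hypothetical, and Python's `heapq` stands in for an external priority queue.

```python
import heapq

def propagate_representatives(parent, roots):
    """parent[v] = (C(v), weight) for each non-root vertex v of G'd.

    Returns rep[v], the root of the component containing v.
    """
    children = {}
    for v, (p, w) in parent.items():
        children.setdefault(p, []).append((w, v))
    # L: one record per edge (C(v), v), sorted by weight, together with
    # the edges that leave v.
    L = sorted((w, v, sorted(children.get(v, []))) for v, (p, w) in parent.items())
    rep = {r: r for r in roots}
    # Queue entries: (weight of entering edge, vertex, representative).
    pq = [(w, c, r) for r in roots for w, c in children.get(r, [])]
    heapq.heapify(pq)
    for w, v, out_edges in L:
        pw, pv, r = heapq.heappop(pq)
        assert (pw, pv) == (w, v)     # queue and list advance in lockstep
        rep[v] = r
        for cw, c in out_edges:
            heapq.heappush(pq, (cw, c, r))
    return rep
```

In the external-memory setting the point of scanning L sequentially is that the outgoing edges of the extracted vertex are available without a random access.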

Page 15: External-Memory MST

External-Memory Boruvka’s Step (4)

To finish the contraction:

1. Sort the output of the previous phase and E by the first component. Then scan the two lists simultaneously, replacing each edge (v, u) in E with (vr, u).

2. Sort the output and E by the second component, and then scan the two lists, replacing each edge (vr, u) in E with (vr, ur).

3. Sort E by both components and by weight, and with a single scan remove self edges and, among parallel edges, all but the lightest.
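The three steps can be sketched with ordinary in-memory sorts and merge-style scans. The names `relabel` and `contract` are hypothetical, `reps` is assumed to hold one (vertex, representative) pair per vertex, and keeping the lightest of each group of parallel edges is the choice an MST computation needs.

```python
def relabel(edges, reps, pos):
    """Replace component `pos` (0 or 1) of every edge by its representative.

    Both lists are sorted by vertex and then scanned simultaneously.
    """
    edges = sorted(edges, key=lambda e: e[pos])
    reps = sorted(reps)
    out, j = [], 0
    for e in edges:
        while reps[j][0] < e[pos]:
            j += 1                    # advance the representative list
        e = list(e)
        e[pos] = reps[j][1]
        out.append(tuple(e))
    return out

def contract(edges, reps):
    """edges: list of (u, v, weight); returns the contracted edge list."""
    step1 = relabel(edges, reps, 0)   # (v, u) -> (vr, u)
    step2 = relabel(step1, reps, 1)   # (vr, u) -> (vr, ur)
    # Sort by both components and weight; drop self edges, then keep
    # only the lightest edge of each parallel group in one scan.
    ordered = sorted((min(u, v), max(u, v), w) for u, v, w in step2 if u != v)
    result = []
    for u, v, w in ordered:
        if result and result[-1][:2] == (u, v):
            continue                  # heavier parallel edge
        result.append((u, v, w))
    return result
```

A full implementation would also carry the original endpoints of each edge along, so that the MST can be reported in terms of the input graph. That is omitted here.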

Page 16: External-Memory MST

Boruvka’s Step - I/O efficiency

1. Lightest incident edges can be collected in O(E/B) I/Os in a simple scan of the edge-list representation of G (we assume it is sorted).

2. Detection of cycles in G’d can be done in O(sort(V)) I/Os:

• sort the collected edges by weight and find duplicates in a single scan.

• remove edges to break cycles and identify unique representatives.

Page 17: External-Memory MST

Boruvka’s Step - I/O efficiency (2)

3. The list L contains each edge in G’d at most twice, and can be constructed in O(sort(V)) I/Os:

• sort one instance of the list of edges by the second component.

• sort another instance by the first component.

• create the structure of L in a single scan and sort it by weight.

4. The PQ can be initialized in a similar way in O(sort(V)) I/Os.

Page 18: External-Memory MST

Boruvka’s Step - I/O efficiency (3)

5. We perform a total of V insertions to PQ, and V extract-min operations. That can be performed in O(sort(V)) I/Os.

6. Replacing the edges of G with the unique representatives is done using a few sorting and scanning operations as described before. Here the entire edge list is sorted, and thus O(sort(E)) I/Os are needed.

Total:

O(E/B + sort(V) + sort(E)) = O(sort(E)) I/Os.

Page 19: External-Memory MST

Results So Far

Modified Prim: O(V + sort(E)) I/Os

Modified Boruvka: O(sort(E) · lg V) I/Os

Combined (1. contract G until V ≤ E/B using Boruvka’s steps; 2. run Prim on the result): O(sort(E) · lg(V·B/E)) I/Os

It is possible to perform lg(V·B/E) Boruvka’s steps using lglg(V·B/E) superphases requiring O(sort(E)) I/Os each.

Page 20: External-Memory MST

Yet a better MST algorithm: Superphase Algorithm

At superphase $i$:

• Let $N_i = 2^{(3/2)^i}$, so that $N_{i+1} = N_i \cdot \sqrt{N_i}$.

• Let $G_i = (V_i, E_i)$ be the graph prior to superphase $i$.

• Let $E_i' \subseteq E_i$ be the set that contains, for each vertex, the $\sqrt{N_i}$ lightest edges incident to it.

• Let the blocking value of a vertex be the weight of the $(\sqrt{N_i}+1)$-th lightest edge incident to it (or infinity if no such edge exists).

• $E_i'$ and the blocking values can be found with $O(\mathrm{sort}(E_i))$ I/Os as described earlier.
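A small in-memory sketch of how the light edge set and the blocking values relate. The parameter `k` plays the role of √Ni and is taken to be an integer here, which is a simplification. The function name and the adjacency format are likewise assumptions.

```python
import math

def light_edges_and_blocking(adj, k):
    """adj: {v: [(weight, neighbor), ...]}; k stands in for sqrt(N_i).

    Returns (kept, blocking): kept[v] holds the k lightest edges incident
    to v, and blocking[v] is the weight of the (k+1)-th lightest edge,
    or infinity if v has at most k incident edges.
    """
    kept, blocking = {}, {}
    for v, incident in adj.items():
        incident = sorted(incident)
        kept[v] = incident[:k]
        blocking[v] = incident[k][0] if len(incident) > k else math.inf
    return kept, blocking
```

The reduced edge set is the union of the `kept` lists over all vertices. In external memory the same result comes from sorting the edge list by vertex and weight and then scanning it once.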

Page 21: External-Memory MST

Superphase Algorithm

• At superphase $i$, perform $\log\sqrt{N_i}$ contraction phases on $G_i'$ (the graph with edge set $E_i'$) as described before, but now select the lightest edge incident to a vertex only if its weight is smaller than the vertex's blocking value.

• After a single contraction, the blocking value of a supervertex is set to the minimum of the blocking values of the vertices contracted into it.

• After that, the remaining edges of $E_i'$ contain all edges of $E_i$ adjacent to a supervertex $v$ with weight smaller than the blocking value of $v$.

• Thus only edges that actually belong to the MST are contracted.

Page 22: External-Memory MST

Superphase Algorithm (2)

But how many vertices remain after each superphase?

• The blocking value might prevent us from selecting an edge for a supervertex $v$. If so, then:

• The blocking value of $v$ is the blocking value of some vertex $u \in V_i$, and $v$ must contain the $\sqrt{N_i}$ edges adjacent to $u$ in $E_i'$.

• Thus $v$ must be the contraction of at least $\sqrt{N_i}$ vertices from $V_i$.

• If no blocking value prevents us from selecting an edge for $v$, then after $\log\sqrt{N_i}$ phases $v$ must be the contraction of at least $2^{\log\sqrt{N_i}} = \sqrt{N_i}$ vertices.

Page 23: External-Memory MST

Superphase Algorithm (3)

• It can be proved by induction on $i$ that $V_i \le 2V / N_i$:

• For $i = 0$: $N_0 = 2$ and $V_0 = V$.

• $V_{i+1} \le V_i / \sqrt{N_i} \le (2V / N_i) / \sqrt{N_i} = 2V / N_{i+1}$

• Conclusion: $E_i' \le V_i \sqrt{N_i} \le 2V / \sqrt{N_i}$

• Thus, in order to reduce the number of vertices by a factor of $\sqrt{N_i}$ we have used so far:

$O(\mathrm{sort}(E_i) + \mathrm{sort}(E_i') \cdot \log\sqrt{N_i}) = O(\mathrm{sort}(E) + \mathrm{sort}(V / \sqrt{N_i}) \cdot \log\sqrt{N_i}) = O(\mathrm{sort}(E))$ I/Os.

Page 24: External-Memory MST

Superphase Algorithm (4)

• In order to finish a superphase, we need to reincorporate the edges of $E_i$ that were not selected into $E_i'$:

• During the contraction phases, maintain a list $C$ of pairs $(v, v_s)$ for $v \in V_i$, where $v_s$ is the supervertex currently containing $v$.

• Use the output of the Boruvka’s step, as described earlier, in order to update $C$:

• Sort $C$ by second component and the output by first component and scan them simultaneously.

• This is done using $O(\mathrm{sort}(V_i))$ I/Os.

• In total, in order to maintain $C$, we use:

$O(\mathrm{sort}(V_i) \cdot \log\sqrt{N_i}) = O(\mathrm{sort}(V / N_i) \cdot \log\sqrt{N_i}) = O(\mathrm{sort}(V))$ I/Os.

Page 25: External-Memory MST

Superphase Algorithm – I/O Efficiency

1. $E_i'$ and blocking values are computed in $O(\mathrm{sort}(E_i))$ I/Os.

2. Each superphase takes up $O(\mathrm{sort}(E))$ I/Os.

3. Maintaining the list $C$ during the superphase is done with $O(\mathrm{sort}(V))$ I/Os.

4. Given $C$, the edges in $E_i \setminus E_i'$ can be reincorporated in $O(\mathrm{sort}(E))$ I/Os, as we did in the single contraction algorithm.

5. Finally, in order to reduce $V$ to $E/B$, $\log_{3/2} \lg(V \cdot B / E)$ superphases are needed.

6. Total: $O(\mathrm{sort}(E) \cdot \lg\lg(V \cdot B / E))$ I/Os.
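To get a feel for how slowly the superphase count grows, the sketch below finds the smallest index i at which the guarantee Vi ≤ 2V/Ni already implies Vi ≤ E/B, using Ni = 2^((3/2)^i). The function name and the sample values of V, E and B are made up for illustration.

```python
def superphases_needed(V, E, B):
    """Smallest i with 2V / N_i <= E / B, where N_i = 2 ** ((3/2) ** i)."""
    target = 2 * V * B / E            # N_i must reach this value
    i = 0
    exponent = 1.0                    # (3/2)^i, so N_i = 2 ** exponent
    while 2 ** exponent < target:
        exponent *= 1.5
        i += 1
    return i
```

For example, with V = 10^6, E = 4·10^6 and B = 1000, six superphases are enough, whereas plain Boruvka steps would need about lg(V·B/E) ≈ 8 halving rounds, each costing O(sort(E)) I/Os.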