Approximating the MST Weight in Sublinear Time
Bernard Chazelle (Princeton)
Ronitt Rubinfeld (NEC)
Luca Trevisan (U.C. Berkeley)
Sublinear Time Algorithms
• Make sense for problems on very large data sets
• Go contrary to common intuition that “an algorithm must be given at least enough time to read all the input”
• Must be probabilistic
• Must be approximate
Approximation
• For decision problems: the output is the correct answer either for the given input, or at least for some other input "close" to it (property testing).
• For optimization problems: the output is a number that is close to the cost of the optimal solution for the given input (there is not enough time to construct a solution).
Previous Examples
• The cost of the max cut in a graph with n nodes and $cn^2$ edges can be approximated to within a factor $1+\varepsilon$ in time $2^{\mathrm{poly}(1/(c\varepsilon))}$ (Goldreich, Goldwasser, Ron).
• Other results for “dense” instances of optimization problems, for low-rank approximation of matrices, . . .
• No results (that we know of) for problems on bounded-degree graphs.
Our Result
• Given a connected weighted graph G with maximum degree d and with weights in the range $\{1, \dots, w\}$,
• we can compute the weight of the minimum spanning tree of G to within a factor of $1+\varepsilon$ in time $O(dw\varepsilon^{-2}\log(w/\varepsilon))$;
• we also prove that it is necessary to look at $\Omega(dw\varepsilon^{-2})$ entries in the representation of G.
(We assume that G is represented using adjacency lists)
Main Intuition
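• Suppose all weights are 1 or 2.
• Then the MST weight is equal to
$n - 2 + (\#\text{ of connected components induced by weight-1 edges})$
(if the weight-1 edges induce k components, the MST uses n - k weight-1 edges and k - 1 weight-2 edges, for a total of n + k - 2).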
[Figure: a graph with weight-1 and weight-2 edges, the connected components induced by the weight-1 edges, and the resulting MST.]
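The identity can be checked on a small example. The sketch below illustrates it and is not code from the talk. It computes the MST weight with Kruskal's algorithm and counts the components induced by the weight-1 edges with a union-find structure. The helper names (`mst_weight`, `count_components`) and the example graph are hypothetical.

```python
def find(parent, x):
    """Root of x in a union-find forest (with path halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    """Merge the sets containing a and b; return True if a merge happened."""
    ra, rb = find(parent, a), find(parent, b)
    if ra == rb:
        return False
    parent[ra] = rb
    return True


def mst_weight(n, edges):
    """Kruskal's algorithm on (u, v, weight) triples; assumes the graph is connected."""
    parent = list(range(n))
    total = 0
    for u, v, wt in sorted(edges, key=lambda e: e[2]):
        if union(parent, u, v):
            total += wt
    return total


def count_components(n, edges, max_weight):
    """Number of connected components induced by the edges of weight <= max_weight."""
    parent = list(range(n))
    comps = n
    for u, v, wt in edges:
        if wt <= max_weight and union(parent, u, v):
            comps -= 1
    return comps


# Hypothetical example: weight-1 edges induce the components {0,1,2}, {3,4}, {5}.
n = 6
edges = [(0, 1, 1), (1, 2, 1), (3, 4, 1), (2, 3, 2), (4, 5, 2), (5, 0, 2), (1, 4, 2)]
c1 = count_components(n, edges, 1)
print(mst_weight(n, edges), n - 2 + c1)  # 7 7
```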
Algorithm
Algorithm for weights in {1,2}
• To approximate the MST weight to within a multiplicative factor $(1+\varepsilon)$, it is enough to approximate $c_1$ to within an additive error $\varepsilon n$
  ($c_1$ := # of connected components induced by weight-1 edges)
• To approximate c1 we use ideas from Goldreich-Ron (property testing of connectivity)
• The algorithm runs in time $O(d\varepsilon^{-2}\log(1/\varepsilon))$
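This is why an additive error of $\varepsilon n$ on $c_1$ suffices, written out as a short derivation filled in here. It assumes G is connected with $n \ge 2$; M denotes the MST weight and $\hat c_1$ the estimate of $c_1$:

```latex
\hat M := n - 2 + \hat c_1
\quad\Longrightarrow\quad
|\hat M - M| = |\hat c_1 - c_1| \le \varepsilon n .
% A spanning tree has n - 1 edges of weight at least 1, so M \ge n - 1 \ge n/2 for n \ge 2:
|\hat M - M| \le \varepsilon n \le 2\varepsilon M .
```

Running the estimator with $\varepsilon/2$ in place of $\varepsilon$ therefore gives a factor $(1 \pm \varepsilon)$.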
Approximating # of connected components
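• Given a graph G of max degree d with n nodes, we want to compute c, the number of connected components of G, up to an additive error $\varepsilon n$.
• For every vertex u, define $n_u := 1/(\text{size of the component of } u)$.
• Then $c = \sum_u n_u$ (a component of size s contributes s terms equal to 1/s).
• And if we call $a_u := \max\{n_u, \varepsilon\}$,
• then $c = \sum_u a_u \pm \varepsilon n$.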
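The bound $c \le \sum_u a_u \le c + \varepsilon n$ holds because $n_u \le a_u \le n_u + \varepsilon$ for every u. The small exact check below computes both sums on a toy graph: a path on 10 vertices plus 5 isolated vertices. It uses `fractions.Fraction` for exact arithmetic, and the helper names `component_sizes` and `component_sums` are hypothetical.

```python
from collections import deque
from fractions import Fraction


def component_sizes(adj):
    """size[u] = number of vertices in the connected component of u (full BFS, exact)."""
    n = len(adj)
    size = [0] * n
    for s in range(n):
        if size[s]:
            continue  # already assigned to a component
        comp, queue, seen = [s], deque([s]), {s}
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    comp.append(y)
                    queue.append(y)
        for x in comp:
            size[x] = len(comp)
    return size


def component_sums(adj, eps):
    """Return (sum of n_u, sum of a_u) with n_u = 1/size and a_u = max(n_u, eps)."""
    size = component_sizes(adj)
    c = sum(Fraction(1, s) for s in size)  # equals the number of components
    a = sum(max(Fraction(1, s), eps) for s in size)
    return c, a


# Toy graph: a path 0-1-...-9 plus isolated vertices 10..14 (6 components, n = 15).
adj = [[] for _ in range(15)]
for i in range(9):
    adj[i].append(i + 1)
    adj[i + 1].append(i)

eps = Fraction(1, 5)
c, a = component_sums(adj, eps)
print(c, a, c + eps * len(adj))  # 6 7 9
```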
Wrapping up the analysis
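• We can estimate $\sum_u a_u$ by sampling.
• Once we pick a vertex u at random, the value $a_u$ can be computed in time $O(d/\varepsilon)$ (a BFS from u that stops after $1/\varepsilon$ vertices).
• We need to pick $O(1/\varepsilon^2)$ vertices, so we get running time $O(d/\varepsilon^3)$.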
Algorithm
CC-APPROX($\varepsilon$)
  Repeat $O(1/\varepsilon^2)$ times:
    pick a random vertex v
    do a BFS from v, stopping after $2/\varepsilon$ steps
    b := 1 / (number of visited vertices)
  return (average of the values b) * n
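Below is a minimal Python sketch of CC-APPROX($\varepsilon$), assuming the graph is given as adjacency lists `adj[u]`. The repetition count `4 / eps**2` is a hypothetical choice for the constant hidden in $O(1/\varepsilon^2)$. "Stopping after $2/\varepsilon$ steps" is read here as "stopping once $2/\varepsilon$ vertices have been visited".

```python
import math
import random
from collections import deque


def truncated_bfs_count(adj, v, limit):
    """BFS from v that stops as soon as `limit` vertices have been visited."""
    seen = {v}
    queue = deque([v])
    while queue and len(seen) < limit:
        x = queue.popleft()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
                if len(seen) >= limit:
                    break
    return len(seen)


def cc_approx(adj, eps, samples=None, rng=random):
    """Estimate the number of connected components of the graph up to about eps * n."""
    n = len(adj)
    if samples is None:
        samples = math.ceil(4 / eps ** 2)  # hypothetical constant in O(1/eps^2)
    limit = math.ceil(2 / eps)
    total = 0.0
    for _ in range(samples):
        v = rng.randrange(n)
        # b = 1/size if the component is small, else 1/limit (about eps/2).
        total += 1.0 / truncated_bfs_count(adj, v, limit)
    return total / samples * n


# Usage: 50 triangles plus 50 isolated vertices (n = 200, exactly 100 components).
adj = [[] for _ in range(200)]
for k in range(50):
    a, b, c = 3 * k, 3 * k + 1, 3 * k + 2
    adj[a] += [b, c]
    adj[b] += [a, c]
    adj[c] += [a, b]
print(cc_approx(adj, 0.1, rng=random.Random(0)))  # close to 100
```

Truncating the BFS makes each sample equal to $\max\{n_v, 1/\lceil 2/\varepsilon\rceil\}$. Large components therefore contribute a small, bounded bias instead of costing time proportional to their size.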
Improved Algorithm
CC-APPROX($\varepsilon$, W)
  Repeat $O(1/\varepsilon^2)$ times:
    pick a random vertex v
    do the first step of a BFS from v
    b := 0; t := 1
    (*) flip a coin
    if heads, and fewer than W nodes have been visited so far:
      t := 2*t
      continue the BFS until it ends or t nodes are visited
      if the BFS ends, b := $2^{\#\text{coin flips}}$ / (number of nodes visited)
      else go to (*)
    else stop, keeping b = 0
  return (average of the values b) * n
• The inner procedure takes $O(d \log W)$ time on average
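The coin-flipping variant can be sketched as follows. This is a hypothetical implementation, not code from the talk. The BFS state is kept between doublings, so each heads resumes the search instead of restarting it. `samples` stands in for the unspecified constant in $O(1/\varepsilon^2)$.

```python
import random
from collections import deque


def coin_flip_sample(adj, v, W, rng):
    """One value of b: E[b] = 1/s when v's component has s < W vertices, and E[b] <= 1/s otherwise."""
    seen = {v}          # first step of the BFS: v itself is visited
    queue = deque([v])
    t, flips = 1, 0
    while True:
        # (*) flip a coin; on tails, or once W nodes are visited, stop with b = 0.
        if rng.random() >= 0.5 or len(seen) >= W:
            return 0.0
        flips += 1
        t *= 2
        # Continue the same BFS until it ends or at least t nodes are visited.
        while queue and len(seen) < t:
            x = queue.popleft()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
        if not queue:   # the BFS ended: the whole component was explored
            return 2 ** flips / len(seen)


def cc_approx_improved(adj, W, samples, rng=random):
    """Average of b over random start vertices, scaled by n."""
    n = len(adj)
    total = sum(coin_flip_sample(adj, rng.randrange(n), W, rng) for _ in range(samples))
    return total / samples * n


# Usage: 50 triangles plus 50 isolated vertices (n = 200, exactly 100 components).
adj = [[] for _ in range(200)]
for k in range(50):
    a, b, c = 3 * k, 3 * k + 1, 3 * k + 2
    adj[a] += [b, c]
    adj[b] += [a, c]
    adj[c] += [a, b]
print(cc_approx_improved(adj, 20, 20000, random.Random(0)))  # close to 100
```

On a long path with W = 20, every sample gives up before the BFS can finish, so the estimate is exactly 0. Components of size at least W are effectively dropped.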
Analysis
• Main idea: if v is in a component of size c<W, then b is zero with prob. ~(1 – 1/c) and ~1 with probability ~1/c. The average of b is 1/c.
• Setting W := $2/\varepsilon$, we get:
  – each time, the average of b is within $\varepsilon/2$ of the average over v of $n_v$ (that is, (# conn. comp.)/n);
  – repeating $O(1/\varepsilon^2)$ times, the probability of deviating by more than another $\varepsilon/2$ is bounded by a small constant;
  – the average running time is $O(d\varepsilon^{-2}\log W)$, that is, $O(d\varepsilon^{-2}\log \varepsilon^{-1})$.
General Weights
• Generalize the argument for weights 1 and 2.
• Let
  $c_i$ = # of connected components induced by edges of weight at most i
• Then the MST weight is
  $n - w + \sum_{i=1}^{w-1} c_i$
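One way to see the formula: Kruskal's algorithm adds exactly $c_{i-1} - c_i$ edges of weight i, with $c_0 = n$ and $c_w = 1$. Summing $i(c_{i-1} - c_i)$ over i telescopes to $n - w + \sum_{i=1}^{w-1} c_i$. The sketch below checks this numerically on random connected graphs. The random-graph generator and all helper names are hypothetical.

```python
import random


def find(parent, x):
    """Root of x in a union-find forest (with path halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    """Merge the sets containing a and b; return True if a merge happened."""
    ra, rb = find(parent, a), find(parent, b)
    if ra == rb:
        return False
    parent[ra] = rb
    return True


def mst_weight(n, edges):
    """Kruskal's algorithm on (u, v, weight) triples of a connected graph."""
    parent = list(range(n))
    total = 0
    for u, v, wt in sorted(edges, key=lambda e: e[2]):
        if union(parent, u, v):
            total += wt
    return total


def components_at_most(n, edges, i):
    """c_i: number of components induced by the edges of weight at most i."""
    parent = list(range(n))
    comps = n
    for u, v, wt in edges:
        if wt <= i and union(parent, u, v):
            comps -= 1
    return comps


def formula(n, edges, w):
    """n - w + sum_{i=1}^{w-1} c_i."""
    return n - w + sum(components_at_most(n, edges, i) for i in range(1, w))


def random_connected_graph(n, w, extra, rng):
    """Random spanning tree plus `extra` random edges, weights uniform in {1, ..., w}."""
    edges = [(k, rng.randrange(k), rng.randint(1, w)) for k in range(1, n)]
    for _ in range(extra):
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v:
            edges.append((u, v, rng.randint(1, w)))
    return edges


rng = random.Random(0)
for trial in range(100):
    n, w = rng.randint(2, 30), rng.randint(1, 6)
    edges = random_connected_graph(n, w, rng.randint(0, 40), rng)
    assert mst_weight(n, edges) == formula(n, edges, w)
print("formula matches Kruskal on 100 random graphs")
```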
Final Algorithm
• For i = 1, ..., w-1, call CC-APPROX($\varepsilon$, $2w/\varepsilon$) on the subgraph of G obtained by removing edges of cost > i
• Get $a_i$, an approximation of $c_i$
• Return $n - w + \sum_{i=1}^{w-1} a_i$
• The average answer is within $\varepsilon n/2$ of the cost of the MST, and the variance is bounded
• Total running time $O(dw\varepsilon^{-2}\log(w/\varepsilon))$
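The final algorithm can be put together in a hypothetical end-to-end sketch. Adjacency lists hold `(neighbor, weight)` pairs. The subgraph for threshold i is never built explicitly: edges of weight > i are simply skipped during the BFS. The number of samples per call is an assumed parameter, not a value from the talk.

```python
import random
from collections import deque


def coin_flip_sample(adj, v, max_w, W, rng):
    """One value of b for the subgraph of edges with weight <= max_w (coin-flipping BFS)."""
    seen = {v}
    queue = deque([v])
    t, flips = 1, 0
    while True:
        # Tails, or W nodes already visited: give up with b = 0.
        if rng.random() >= 0.5 or len(seen) >= W:
            return 0.0
        flips += 1
        t *= 2
        while queue and len(seen) < t:
            x = queue.popleft()
            for y, wt in adj[x]:
                if wt <= max_w and y not in seen:  # ignore edges of cost > max_w
                    seen.add(y)
                    queue.append(y)
        if not queue:  # BFS ended inside the subgraph
            return 2 ** flips / len(seen)


def cc_approx(adj, max_w, W, samples, rng):
    """Estimate c_{max_w}, the number of components induced by edges of weight <= max_w."""
    n = len(adj)
    total = sum(coin_flip_sample(adj, rng.randrange(n), max_w, W, rng) for _ in range(samples))
    return total / samples * n


def mst_approx(adj, w, eps, samples, rng=random):
    """Return n - w + sum of the estimates a_i, using W = 2w/eps."""
    n = len(adj)
    W = 2 * w / eps
    return n - w + sum(cc_approx(adj, i, W, samples, rng) for i in range(1, w))


# Usage: a path on 200 vertices whose edge weights cycle 1, 1, 1, 3.
# Weight-1 edges induce 50 blocks of 4 vertices, so the exact MST weight is 150 * 1 + 49 * 3 = 297.
n = 200
adj = [[] for _ in range(n)]
for k in range(n - 1):
    wt = 3 if k % 4 == 3 else 1
    adj[k].append((k + 1, wt))
    adj[k + 1].append((k, wt))
print(mst_approx(adj, w=3, eps=0.5, samples=20000, rng=random.Random(0)))  # close to 297
```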
Extensions
• Low average degree
• Non-integer weights
Lower Bound
Abstract sampling problem
• Fix p, $\varepsilon$
• Define two binary distributions A, B
• Pr[A=1] = p, Pr[A=0] = 1-p
• Pr[B=1] = $p + \varepsilon p$, Pr[B=0] = $1 - p - \varepsilon p$
• Distinguishing A from B with constant probability requires $\Omega(1/(p\varepsilon^2))$ samples
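Here is a heuristic way to see where $1/(p\varepsilon^2)$ comes from; it is intuition only, not the lower-bound proof. With m samples, the number of 1s has mean $pm$ under A and $(1+\varepsilon)pm$ under B, with standard deviation about $\sqrt{pm}$ in both cases for small p. The two cases become distinguishable only once the gap between the means exceeds this noise:

```latex
\varepsilon p m \gtrsim \sqrt{p m}
\quad\Longleftrightarrow\quad
m \gtrsim \frac{1}{p\,\varepsilon^{2}} .
```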
Reduction
• Fix p = 1/w
• We consider two distributions of weights over a cycle of length n
• In distribution G, for each edge we sample from A; if A=0 the edge gets weight 1, otherwise it gets weight w
• In distribution H, same with B
• H and G are likely to have MST costs that differ by about $\varepsilon n$
• To distinguish them we need to look at $\Omega(w/\varepsilon^2)$ edge weights
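The claimed gap can be checked with a short expected-value computation, filled in here and ignoring lower-order fluctuations. On a cycle, the MST is every edge except one heaviest edge. This assumes at least one weight-w edge is present, which is very likely when n/w is large. With p = 1/w, G has about n/w edges of weight w and H has about $(1+\varepsilon)n/w$:

```latex
\mathrm{MST}(G) \approx n\Bigl(1 - \tfrac{1}{w}\Bigr) + \tfrac{n}{w}\, w - w,
\qquad
\mathrm{MST}(H) \approx n\Bigl(1 - \tfrac{1+\varepsilon}{w}\Bigr) + \tfrac{(1+\varepsilon)n}{w}\, w - w,
\mathrm{MST}(H) - \mathrm{MST}(G) \approx \varepsilon n \Bigl(1 - \tfrac{1}{w}\Bigr).
% Each edge weight read is one sample from A or B, so \Omega(1/(p\varepsilon^2)) = \Omega(w/\varepsilon^2) reads are needed.
```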
Higher Degree
• Sample from G or H as before,
  – also add d-1 forward edges of weight w+1 from each vertex
  – randomly permute the names of the vertices
• Now, on average, reading t edge weights gives us t/d samples from A or B, so $t = \Omega(dw/\varepsilon^2)$
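The last step, spelled out as a short derivation: t reads yield about t/d genuine samples from A or B on average. Applying the sampling lower bound with p = 1/w then gives

```latex
\frac{t}{d} = \Omega\!\left(\frac{1}{p\,\varepsilon^{2}}\right) = \Omega\!\left(\frac{w}{\varepsilon^{2}}\right)
\quad\Longrightarrow\quad
t = \Omega\!\left(\frac{d\,w}{\varepsilon^{2}}\right).
```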
Conclusions
• A plausibility result showing that approximation for a standard graph problem in bounded-degree (and sparse) graphs can be achieved in time independent of the number of vertices
• Use of approximate cost without solution?
• More problems?
  – The trivial Max SAT approximation algorithms can be implemented in constant time, and give (an implicit representation of) a solution
  – Non-trivial Max SAT approximation? (say, 3/4)
  – Something really useful?