Improved approximation for k-median Shi Li Department of Computer Science Princeton University...

Post on 15-Jan-2016


Improved approximation for k-median

Shi Li

Department of Computer Science, Princeton University, Princeton, NJ 08540

04/20/2013

[Figure: example instance with dollar-labeled facilities and links. Objective: minimize maintenance cost + transportation cost]

BALINSKI, M. L. 1966. On finding integer solutions to linear programs. In Proceedings of the IBM Scientific Computing Symposium on Combinatorial Problems. IBM, New York, pp. 225–248.
KUEHN, A. A., AND HAMBURGER, M. J. 1963. A heuristic program for locating warehouses.
STOLLSTEIMER, J. F. 1961. The effect of technical change and output expansion on the optimum number, size and location of pear marketing facilities in a California pear producing region. Ph.D. thesis, Univ. California at Berkeley, Berkeley, Calif.
STOLLSTEIMER, J. F. 1963. A working model for plant numbers and locations. J. Farm Econom. 45, 631–645.

Facility Location Problem

Uncapacitated Facility Location (UFL)

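F : potential facility locations
C : set of clients
f_i , i ∈ F : cost for opening i
d : metric over F ∪ C

find S ⊆ F,
minimize Σ_{i∈S} f_i + Σ_{j∈C} d(j, S)
(facility cost + connection cost, where d(j, S) = min_{i∈S} d(j, i))

[Figure: example instance with facilities and clients; dollar labels $30, $100, $100, $100, $20, $100]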
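As a minimal sketch of the UFL objective, the code below evaluates facility cost plus connection cost for a candidate set S and finds the optimum by exhaustive search. The names `ufl_cost`, `best_ufl`, `open_cost` and `dist`, and all numbers in the toy instance, are illustrative assumptions and do not come from the slides.

```python
from itertools import combinations

def ufl_cost(S, open_cost, dist):
    """Facility cost plus connection cost of opening the set S.

    open_cost[i] plays the role of f_i; dist[j][i] is d(j, i) for
    client j and facility i.
    """
    if not S:
        raise ValueError("S must contain at least one facility")
    facility = sum(open_cost[i] for i in S)
    connection = sum(min(row[i] for i in S) for row in dist)
    return facility + connection

def best_ufl(open_cost, dist):
    """Exhaustive search over all non-empty subsets (toy sizes only)."""
    n = len(open_cost)
    subsets = (set(c) for r in range(1, n + 1)
               for c in combinations(range(n), r))
    return min(subsets, key=lambda S: ufl_cost(S, open_cost, dist))

# Hypothetical instance: 3 facilities, 4 clients.
open_cost = [30, 100, 20]
dist = [[10, 50, 40],
        [20, 30, 60],
        [50, 10, 30],
        [40, 20, 10]]

S = best_ufl(open_cost, dist)
print(S, ufl_cost(S, open_cost, dist))
```

Exhaustive search is exponential in |F|; it is used here only to make the objective concrete.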

Wal-mart Stores in New Jersey

Question : Suppose you have the budget for 50 stores. How will you select 50 locations?

k-median

F : potential facility locations
C : set of clients
d : metric over F ∪ C
k : number of facilities to open (replaces the opening costs f_i of UFL)

find S ⊆ F with |S| = k,
minimize Σ_{j∈C} d(j, S)

[Figure: facilities and clients]
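To make the definition concrete, here is a small sketch that evaluates the k-median objective and finds the best set of exactly k facilities by trying all of them. The function names and the toy distance table are assumptions made for illustration.

```python
from itertools import combinations

def kmedian_cost(S, dist):
    """Each client pays the distance to its nearest open facility."""
    return sum(min(row[i] for i in S) for row in dist)

def best_kmedian(k, dist):
    """Try every set of exactly k facilities (toy sizes only)."""
    n_facilities = len(dist[0])
    return min(combinations(range(n_facilities), k),
               key=lambda S: kmedian_cost(S, dist))

# Hypothetical instance: dist[j][i] = d(client j, facility i).
dist = [[10, 50, 40],
        [20, 30, 60],
        [50, 10, 30],
        [40, 20, 10]]

for k in (1, 2):
    S = best_kmedian(k, dist)
    print(k, S, kmedian_cost(S, dist))
```

Unlike UFL there is no opening cost, so the only trade-off is which k locations to pick.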

k-median clustering

Known Results: UFL

O(log n)-approximation [Hoc82]

constant approximations:
3.16 [STA98], 2.41 [GK99], 3 [JV99], 1.853 [CG99], 1.728 [CG99], 5+ε [Kor00], 1.861 [MMSV01],
1.736 [CS03], 1.61 [JMS02], 1.582 [Svi02], 1.52 [MYZ02], 1.50 [Byr07], 1.488 [Li11]

1.463-hardness of approximation [GK98]

4 Deterministic rounding of linear programs
    4.5 The uncapacitated facility location problem
5 Random sampling and randomized rounding of linear programs
    5.8 The uncapacitated facility location problem
7 The primal-dual method
    7.6 The uncapacitated facility location problem
9 Further uses of greedy and local search algorithms
    9.1 A local search algorithm for the uncapacitated facility location problem
    9.4 A greedy algorithm for the uncapacitated facility location problem
12 Further uses of random sampling and randomized rounding of linear programs
    12.1 The uncapacitated facility location problem

Known Results: k-median

pseudo-approximation
    1-approx. with O(k log n) facilities [Hoc82]
    2(1+ε)-approx. with (1+1/ε)k facilities [LV92]

super-constant approximation
    O(log n loglog n) [Bar96, Bar98]
    O(log k loglog k) [CCGS98]

Known Results: k-median

constant approximation
    LP rounding : 6.667 [CGTS99], 3.25 [CL12]
    Primal-Dual : 6 [JV99], 4 [CG99], 4 [JMS03]
    Local Search : 3+ε [AGK+01]
    1+√3+ε [LS13]

(1+2/e)-hardness of approximation [JMS03]

Lloyd Algorithm [Lloyd82]
k-means clustering : minimize total squared distances

k-means vs k-median
• clustering: k-means is more often used
• Walmart example: k-median is more appropriate
• approximation: k-median is “easier”
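The difference between the two objectives can be seen on a single cluster. The sketch below assumes the simplest possible setting, points on a line with one center that may be any real number (unlike the slides' setting, where centers come from F): the median minimizes total distance, the mean minimizes total squared distance. The data and function names are made up for illustration.

```python
from statistics import mean, median

def total_distance(center, points):
    """k-median style objective for a single center."""
    return sum(abs(p - center) for p in points)

def total_squared_distance(center, points):
    """k-means style objective for a single center."""
    return sum((p - center) ** 2 for p in points)

# Hypothetical 1-D data with one far-away point.
points = [0, 1, 2, 100]
med, avg = median(points), mean(points)

print("median", med, total_distance(med, points), total_squared_distance(med, points))
print("mean  ", avg, total_distance(avg, points), total_squared_distance(avg, points))
```

The far-away point pulls the mean much more than the median, which is one reason the two objectives can produce different clusterings.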

Local Search

Can we improve the solution by p swaps?
No : stop
Yes : swap and repeat

Approximation :
k-median : 3+2/p [AGK+01]
k-means : (3+2/p)² [KMN+02]
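The loop above can be sketched for p = 1 (single swaps). This is an illustrative toy version, not the algorithm of [AGK+01] as analysed there: it accepts any strictly improving swap and starts from an arbitrary set, and the instance is invented.

```python
def kmedian_cost(S, dist):
    """Each client pays the distance to its nearest open facility."""
    return sum(min(row[i] for i in S) for row in dist)

def single_swaps(S, n):
    """All sets obtained from S by closing one facility and opening another."""
    for out in sorted(S):
        for inn in range(n):
            if inn not in S:
                yield (S - {out}) | {inn}

def local_search(k, dist):
    """Repeat improving single swaps until none exists (p = 1)."""
    n = len(dist[0])
    S = set(range(n - k, n))  # arbitrary starting solution (assumption)
    while True:
        current = kmedian_cost(S, dist)
        better = next((T for T in single_swaps(S, n)
                       if kmedian_cost(T, dist) < current), None)
        if better is None:
            return S  # "No : stop"
        S = better    # "Yes : swap and repeat"

# Hypothetical instance: dist[j][i] = d(client j, facility i).
dist = [[10, 50, 40],
        [20, 30, 60],
        [50, 10, 30],
        [40, 20, 10]]

print(local_search(2, dist))
```

The loop terminates because the cost strictly decreases with every accepted swap and there are finitely many sets of size k.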

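LP for k-median

y_i : whether to open i
x_{i,j} : whether to connect j to i

minimize Σ_{i∈F, j∈C} d(i, j) x_{i,j}
subject to
Σ_{i∈F} y_i ≤ k (open at most k facilities)
Σ_{i∈F} x_{i,j} = 1 for every j ∈ C (client j must be connected)
x_{i,j} ≤ y_i for every i ∈ F, j ∈ C (client j can only be connected to an open facility)
x_{i,j}, y_i ≥ 0

integrality gap is at least 2
integrality gap is at most 3 (proof non-constructive)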
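A small check of the three constraints may help. The sketch below verifies a fractional (x, y) against them and compares its LP objective with the best integral solution on a hypothetical star metric (one center facility, k+1 leaf facilities, one client on each leaf). The instance, the fractional solution and all names are assumptions for illustration; the resulting ratio 2k/(k+1) only approaches 2 as k grows.

```python
from fractions import Fraction
from itertools import combinations

def lp_objective(x, y, dist, k):
    """Check the LP constraints and return the objective.

    x[i][j]: fraction of client j served by facility i; y[i]: opening of i;
    dist[j][i]: distance between client j and facility i.
    """
    facilities, clients = range(len(y)), range(len(dist))
    if sum(y) > k:
        raise ValueError("opens more than k facilities")
    for j in clients:
        if sum(x[i][j] for i in facilities) != 1:
            raise ValueError(f"client {j} is not fully connected")
        if any(not 0 <= x[i][j] <= y[i] for i in facilities):
            raise ValueError(f"client {j} uses a facility beyond its opening")
    return sum(dist[j][i] * x[i][j] for i in facilities for j in clients)

def star_instance(k):
    """Facility 0 is the center; facilities 1..k+1 are leaves; client j sits
    on leaf j+1 (distance 0), 1 from the center, 2 from other leaves."""
    leaves = k + 1
    return [[1] + [0 if i == j else 2 for i in range(leaves)]
            for j in range(leaves)]

k = 3
dist = star_instance(k)
y = [Fraction(1, k)] + [Fraction(k - 1, k)] * (k + 1)
x = [[Fraction(1, k)] * (k + 1)]  # center serves 1/k of every client
x += [[Fraction(k - 1, k) if j == i else Fraction(0) for j in range(k + 1)]
      for i in range(k + 1)]      # each leaf serves its own client
lp = lp_objective(x, y, dist, k)
integral = min(sum(min(row[i] for i in S) for row in dist)
               for S in combinations(range(k + 2), k))
print(lp, integral, integral / lp)
```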

(1+√3+ε)-approximation on k-median

k-median and UFL

f = cost of a facility
UFL objective : connection cost + f × #open facilities

Given a black-box α-approximation A for UFL
Naïve try : find an f such that A opens k facilities

Does this give an α-approximation for k-median? No.
Proof : α ≈ 1.488 for UFL, but α > 1.736 for k-median by the (1+2/e)-hardness

k-median and UFL

Naïve try : find an f such that A opens k facilities

2 issues with naïve try :
1. need LMP α-approximation for UFL

α-approximation : facility cost + connection cost ≤ α · opt
LMP α-approximation : α · facility cost + connection cost ≤ α · opt
LMP = Lagrangian Multiplier Preserving

k-median and UFL

2 issues with naïve try :
1. need LMP α-approximation for UFL
2. cannot find f s.t. A opens exactly k facilities

bi-point solution :
S1 : set of k1 < k facilities
S2 : set of k2 > k facilities
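Issue 2 can be seen on a tiny example. In the sketch below the black box A is replaced by exhaustive search over a hypothetical star metric (one center facility, k+1 leaf facilities, one client on each leaf); both the instance and the stand-in for A are assumptions. As the uniform facility cost f grows, the number of open facilities jumps from k+1 straight to 1 and never equals k.

```python
from itertools import combinations

def star_instance(k):
    """Facility 0 is the center; facilities 1..k+1 are leaves; client j sits
    on leaf j+1 (distance 0), 1 from the center, 2 from other leaves."""
    leaves = k + 1
    return [[1] + [0 if i == j else 2 for i in range(leaves)]
            for j in range(leaves)]

def facilities_opened(f, dist):
    """Size of an optimal UFL solution when every facility costs f."""
    n = len(dist[0])
    subsets = (S for r in range(1, n + 1)
               for S in combinations(range(n), r))
    best = min(subsets,
               key=lambda S: f * len(S) + sum(min(row[i] for i in S)
                                              for row in dist))
    return len(best)

k = 3
dist = star_instance(k)
costs = (0.5, 1, 1.3, 1.4, 2, 5)
counts = [facilities_opened(f, dist) for f in costs]
print(list(zip(costs, counts)))
```

The two solutions on either side of the jump are exactly the kind of pair (S1, S2) that forms a bi-point solution.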

                            [JV]    [JMS]    our result
LMP approx. factor           3       2        2
bi-point → integral         × 2     × 2
final ratio for k-median     6       4

LMP factor : do not know how to improve
bi-point → integral : this factor of 2 is tight !!

bi-point solution

k1= |S1| < k ≤ |S2| = k2

a, b : ak1 + bk2 = k, a + b = 1

bi-point solution : aS1+bS2

cost(aS1+bS2) = a cost(S1) + b cost(S2)

gap-2 instance

[Figure: S1 is a single facility, S2 has k+1 facilities; labels 1, 0 and k+1]

k1 = 1, k2 = k+1
cost(S1) = k+1, cost(S2) = 0
cost of integral solution = 2
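The numbers of the gap-2 instance can be plugged into the definition of a and b. The sketch below solves a·k1 + b·k2 = k with a + b = 1 and compares the bi-point cost with the integral cost 2 stated above; the function names are illustrative assumptions.

```python
from fractions import Fraction

def bipoint_weights(k1, k2, k):
    """Solve a*k1 + b*k2 = k with a + b = 1 (requires k1 < k <= k2)."""
    if not k1 < k <= k2:
        raise ValueError("need k1 < k <= k2")
    b = Fraction(k - k1, k2 - k1)
    return 1 - b, b

def bipoint_cost(a, b, cost1, cost2):
    """cost(a*S1 + b*S2) = a*cost(S1) + b*cost(S2)."""
    return a * cost1 + b * cost2

for k in (2, 10, 100):
    a, b = bipoint_weights(1, k + 1, k)      # k1 = 1, k2 = k+1
    frac = bipoint_cost(a, b, k + 1, 0)      # cost(S1) = k+1, cost(S2) = 0
    print(k, a, b, frac, 2 / frac)           # integral cost is 2
```

The bi-point cost is (k+1)/k, so the ratio 2k/(k+1) tends to 2 as k grows.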

k-median and UFL

                              [JV]    [JMS]    our result
LMP approx. factor             3       2        2
bi-point → integral           × 2     × 2      (this factor of 2 is tight !!)
bi-point → pseudo-integral                     × (1+√3+ε)/2
final ratio for k-median       6       4        1+√3+ε

Main Lemma 1 : it suffices to give an α-approximate solution with k+O(1) facilities

Main Lemma 2 : bi-point solution of cost C → solution of cost ((1+√3+ε)/2)·C with k+O(1/ε) facilities

Main Lemma 1

A : black-box α-approximation with k+c open facilities
A' : (α+ε)-approximation with k open facilities
A' calls A n^{O(c/ε)} times.

bad instance:
with k+1 open facilities, cost = 0
with k open facilities, cost huge

Dense Facility

B_i : set of clients in a small ball around i
i is A-dense if the connection cost of B_i in OPT is ≥ A
this instance : i is A-dense for A ≈ opt

[Figure: facility i and its ball B_i]

The reduction component works directly if there are no opt/t-dense facilities, t = O(c/ε)
can reduce to such an instance in n^{O(t)} time

[Awasthi-Blum-Sheffet] : for constants ε, δ > 0,
if OPT_{k-1} ≥ (1+δ)·OPT_k, we can find a (1+ε)-approximation

k-median clustering is easy in practice
reason : there is a “meaningful” clustering

Main Lemma 1 : it suffices to give an α-approximate solution with k+O(1) facilities

Lemma 1 from [ABS]

Algorithm

Apply A to (k-c, F, C, d) → solution with k facilities of cost ≤ α·OPT_{k-c}
Apply [ABS] to each (k-i, F, C, d) for i = 0, 1, 2, …, c-1
Output the best of the c+1 solutions
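The three steps above amount to a "best of c+1 candidates" wrapper. The sketch below shows only that control flow: `pseudo_approx` and `abs_solver` are hypothetical callables standing in for A and for the [ABS] algorithm, and the stand-ins used in the demo (exhaustive search and a fixed choice) are placeholders, not the real algorithms.

```python
from itertools import combinations

def kmedian_cost(S, dist):
    """Each client pays the distance to its nearest open facility."""
    return sum(min(row[i] for i in S) for row in dist)

def best_of_candidates(k, c, dist, pseudo_approx, abs_solver):
    """Return the cheapest of the c + 1 candidate solutions.

    pseudo_approx(k2, dist): stands in for A; may open up to k2 + c facilities.
    abs_solver(k2, dist): stands in for [ABS]; opens k2 facilities.
    """
    if k - c < 1:
        raise ValueError("need k - c >= 1")
    candidates = [pseudo_approx(k - c, dist)]
    candidates += [abs_solver(k - i, dist) for i in range(c)]
    return min(candidates, key=lambda S: kmedian_cost(S, dist))

# Hypothetical instance and placeholder solvers.
dist = [[10, 50, 40],
        [20, 30, 60],
        [50, 10, 30],
        [40, 20, 10]]
n, k, c = len(dist[0]), 2, 1
pseudo = lambda k2, d: min(combinations(range(n), k2 + c),
                           key=lambda S: kmedian_cost(S, d))
fixed = lambda k2, d: tuple(range(n - k2, n))

solution = best_of_candidates(k, c, dist, pseudo, fixed)
print(solution, kmedian_cost(solution, dist))
```

Every candidate opens at most k facilities, which is why taking the cheapest one is safe.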

Proof

If OPT_{k-c} ≤ (1+ε)·OPT_k, then done.
Otherwise, consider the smallest i s.t. OPT_{k-i-1} ≥ (1+ε)^{1/c}·OPT_{k-i}
[ABS] on (k-i, F, C, d) → solution of cost (1+ε)·OPT_{k-i} ≤ (1+ε)²·OPT_k

Recall :
[ABS] : OPT_{k-1} ≥ (1+δ)·OPT_k → (1+ε)-approximation
A : α-approximation algorithm for k-median with k+c medians

Main Lemma 2 : bi-point solution of cost C → solution of cost ((1+√3+ε)/2)·C with k+O(1/ε) facilities

[JV] : bi-point solution of cost C → solution of cost 2C

based on improving [JV] algorithm

JV algorithm

given : bi-point solution aS1+bS2
select S’2 ⊆ S2 , |S’2| = |S1| = k1
with prob. a, open S1
with prob. b, open S’2
randomly open k-k1 facilities in S2 \ S’2

τ_i = nearest facility of i
guarantee : either i is open, or τ_i is open
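One random draw of this rounding can be sketched as follows. The sketch assumes that S’2 consists of the nearest S2-facility of each i in S1, padded with arbitrary S2-facilities up to size k1, and that S1 and S2 are disjoint; the facility positions on a line and all names are invented for the example.

```python
import random

def jv_round(S1, S2, k, d, rng):
    """Open exactly k facilities from the bi-point solution a*S1 + b*S2."""
    k1, k2 = len(S1), len(S2)
    if not k1 < k <= k2:
        raise ValueError("need |S1| < k <= |S2|")
    b = (k - k1) / (k2 - k1)   # then a*k1 + b*k2 = k with a = 1 - b
    # Assumed choice of S'2: nearest S2-facility of each i in S1, then padding.
    S2_prime = {min(S2, key=lambda i2: d(i, i2)) for i in S1}
    for i2 in S2:
        if len(S2_prime) == k1:
            break
        S2_prime.add(i2)
    rest = [i2 for i2 in S2 if i2 not in S2_prime]
    opened = set(S1) if rng.random() < 1 - b else set(S2_prime)
    opened.update(rng.sample(rest, k - k1))
    return opened

# Hypothetical facilities on a line.
pos = {0: 0.0, 1: 10.0, 2: 1.0, 3: 2.0, 4: 9.0, 5: 12.0}
d = lambda u, v: abs(pos[u] - pos[v])
S1, S2, k = [0, 1], [2, 3, 4, 5], 3

rng = random.Random(0)
draws = [jv_round(S1, S2, k, d, rng) for _ in range(100)]
print(draws[:3])
```

Every draw opens exactly k facilities, and for each i in S1 either i or its nearest S2-facility is open.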

Analysis of JV algorithm

client j, with d1 = d(j, i1) and d2 = d(j, i2)
i1 ∈ S1 , i2 ∈ S2 , i3 ∈ S’2
d(i1, i3) ≤ d1+d2, so d(j, i3) ≤ 2d1+d2
either i1 or i3 is open

If i2 is open, connect j to i2
Otherwise, if i1 is open, connect j to i1
Otherwise connect j to i3

E[cost of j] ≤ 2 × [cost of j in aS1+bS2]
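The factor 2 can be checked with a short case analysis. The derivation below is a sketch of the standard argument, under the assumptions that d_1 = d(j, i_1), d_2 = d(j, i_2), d(j, i_3) ≤ 2d_1 + d_2, that the cost of j in aS1+bS2 is a·d_1 + b·d_2, and that each facility of S2 \ S’2 is open with probability b = (k-k1)/(k2-k1), independently of the choice between S1 and S’2.

```latex
\begin{align*}
\text{Case } i_2 \in S'_2:\quad
  \mathbb{E}[\text{cost of } j] &\le b\,d_2 + a\,d_1 . \\[4pt]
\text{Case } i_2 \notin S'_2:\quad
  \mathbb{E}[\text{cost of } j]
    &\le b\,d_2 + a\bigl(a\,d_1 + b\,(2d_1 + d_2)\bigr) \\
    &= (a + 2b)\,a\,d_1 + (1 + a)\,b\,d_2 \\
    &= (1 + b)\,a\,d_1 + (1 + a)\,b\,d_2 \\
    &\le 2\,(a\,d_1 + b\,d_2) .
\end{align*}
```

In both cases the bound is at most twice the cost of j in the bi-point solution.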

Our Algorithm

on average, d1 >> d2
i3 = τ_{i2} ∈ S1 , so d(i2, i3) ≤ d(i2, i1) ≤ d1+d2
d(j, i3) ≤ d2 + (d1+d2) = d1+2d2 (the JV analysis has 2d1+d2)

If i2 is open, connect j to i2
Otherwise, if i1 is open, connect j to i1
Otherwise connect j to i3

E[cost of j] ≤ 2 × [cost of j in aS1+bS2] (the JV factor, which the new bound on d(j, i3) improves)

Our Algorithm

need to guarantee : either i is open, or τ_i is open
for a star, either the center is open, or all leaves are open

first try : open each star independently?
with prob. a, open the center,
with prob. b, open the leaves
problem : cannot bound the number of open facilities

idea :
big stars : always open the center, open each leaf with prob. ≈ b
small stars : group small stars of the same size, dependent rounding for each group, open 3 more facilities than expected
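The "first try" and its problem can be reproduced in a few lines. The sketch assumes that each facility i in S2 is attached to τ_i, its nearest facility in S1, so that stars have centers in S1 and leaves in S2; the positions and names are invented. It rounds every star independently and records how many facilities end up open.

```python
import random

def build_stars(S1, S2, d):
    """Attach each i in S2 to tau_i, its nearest facility in S1 (assumption)."""
    stars = {center: [] for center in S1}
    for i in S2:
        tau_i = min(S1, key=lambda center: d(i, center))
        stars[tau_i].append(i)
    return stars

def round_stars_independently(stars, a, rng):
    """Per star: open the center with prob. a, otherwise open all leaves."""
    opened = set()
    for center, leaves in stars.items():
        if rng.random() < a:
            opened.add(center)
        else:
            opened.update(leaves)
    return opened

# Hypothetical facilities on a line; k = 3 gives a = b = 1/2.
pos = {0: 0.0, 1: 10.0, 2: 1.0, 3: 2.0, 4: 9.0, 5: 12.0}
d = lambda u, v: abs(pos[u] - pos[v])
S1, S2, a = [0, 1], [2, 3, 4, 5], 0.5

stars = build_stars(S1, S2, d)
rng = random.Random(0)
draws = [round_stars_independently(stars, a, rng) for _ in range(200)]
counts = {len(D) for D in draws}
print(stars, sorted(counts))
```

The guarantee "i is open or τ_i is open" holds in every draw, but the number of open facilities ranges from 2 to 4 around the target k = 3, which is the problem the grouping and dependent rounding are meant to fix.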

small stars

small star : star of size ≤ 2/(abε)
M_h : set of stars of size h, m = |M_h|

Roughly,
for am stars, open the center
for bm stars, open the leaves

More accurately,
permute the stars and the facilities
open top centers
open bottom leaves

big stars

size h > 2/(abε)
always open the center
randomly open ≈ bh leaves for a big star

Lemma : we open at most k + 6/(abε) facilities.

for a big star of size h,

FRAC : a+bh

ALG :

for a group of m small stars of size h

FRAC : m(a+bh)

ALG :

there are at most 2/(abε) groups

Summary

                              [JV]    [JMS]    our result
LMP approx. factor             3       2        2
bi-point → integral           × 2     × 2
bi-point → pseudo-integral                     × (1+√3+ε)/2
final ratio for k-median       6       4        1+√3+ε

Main Lemma 1 : it suffices to give an α-approximate solution with k+O(1) facilities

Main Lemma 2 : bi-point solution of cost C → solution of cost ((1+√3+ε)/2)·C with k+O(1/ε) facilities

Open Problems

gap between integral solution with k+1 open facilities and LP value (with k open facilities)?
tight analysis?
algorithm works for k-means?

THANK YOU! Questions?