Improved approximation for k-median Shi Li Department of Computer Science Princeton University Princeton, NJ, 08540 04/20/2013

Page 1:

Improved approximation for k-median

Shi Li

Department of Computer Science, Princeton University, Princeton, NJ, 08540

04/20/2013

Page 2:

minimize maintenance cost + transportation cost

[Figure: example instance with cost labels $100, $130, $10, $20, $50, $30, $30]

Page 3:

Facility Location Problem

BALINSKI, M. L. 1966. On finding integer solutions to linear programs. In Proceedings of the IBM Scientific Computing Symposium on Combinatorial Problems. IBM, New York, pp. 225–248.

KUEHN, A. A., AND HAMBURGER, M. J. 1963. A heuristic program for locating warehouses.

STOLLSTEIMER, J. F. 1961. The effect of technical change and output expansion on the optimum number, size and location of pear marketing facilities in a California pear producing region. Ph.D. thesis, Univ. California at Berkeley, Berkeley, Calif.

STOLLSTEIMER, J. F. 1963. A working model for plant numbers and locations. J. Farm Econom. 45, 631–645.

Page 4:

Uncapacitated Facility Location (UFL)

F : potential facility locations
C : set of clients
f_i, i ∈ F : cost for opening i
d : metric over F ∪ C

find S ⊆ F,
minimize Σ_{i∈S} f_i + Σ_{j∈C} d(j, S)
(facility cost + connection cost), where d(j, S) = min_{i∈S} d(j, i)

[Figure: facilities and clients, with cost labels $30, $100, $100, $100, $20, $100]
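As a minimal sketch of the UFL objective (facility cost plus connection cost), the function below evaluates a candidate set of open facilities. The names `ufl_cost`, `opening_cost` and `dist`, and the toy numbers, are assumptions made for illustration; they are not taken from the slides.

```python
def ufl_cost(opening_cost, dist, open_set):
    """Total UFL cost of opening the facilities in `open_set`.

    opening_cost[i] : cost f_i of opening facility i
    dist[j][i]      : distance d(j, i) from client j to facility i
    """
    if not open_set:
        raise ValueError("at least one facility must be open")
    facility_cost = sum(opening_cost[i] for i in open_set)
    # Each client connects to its nearest open facility.
    connection_cost = sum(min(row[i] for i in open_set) for row in dist)
    return facility_cost + connection_cost


# Hypothetical toy instance: 2 facilities, 3 clients.
opening_cost = [100, 130]
dist = [[10, 50], [20, 30], [50, 30]]
costs = {s: ufl_cost(opening_cost, dist, s) for s in [(0,), (1,), (0, 1)]}
```

On this toy instance opening only facility 0 is cheapest, because the second opening cost outweighs the connection savings.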

Page 5:

Wal-mart Stores in New Jersey

Question : Suppose you have a budget for 50 stores. How will you select the 50 locations?

Page 6:

k-median

F : potential facility locations
C : set of clients
d : metric over F ∪ C
k : number of facilities to open (replaces the opening costs f_i of UFL)

find S ⊆ F, |S| = k,
minimize Σ_{j∈C} d(j, S)

[Figure: facilities and clients]
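To make the k-median objective concrete, here is a small brute-force sketch that tries every set of exactly k facilities. It takes exponential time and is only meant to illustrate the definition; the function names and the four-point line instance are assumptions made for this example.

```python
from itertools import combinations


def kmedian_cost(dist, open_set):
    """Sum over clients of the distance to the nearest open facility."""
    return sum(min(row[i] for i in open_set) for row in dist)


def kmedian_brute_force(dist, k):
    """Return a cheapest set of exactly k facilities (first one found on ties)."""
    n_facilities = len(dist[0])
    return min(combinations(range(n_facilities), k),
               key=lambda s: kmedian_cost(dist, s))


# Hypothetical instance: clients and facilities share four points on a line.
points = [0, 1, 10, 11]
dist = [[abs(p - q) for q in points] for p in points]
best = kmedian_brute_force(dist, 2)
best_cost = kmedian_cost(dist, best)
```

With k = 2 the optimum opens one facility in each of the two tight pairs of points.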

Page 7:

k-median clustering

Page 8:

Known Results: UFL

O(log n)-approximation [Hoc82]

constant approximations
3.16 [STA98], 2.41 [GK99], 3 [JV99], 1.853 [CG99], 1.728 [CG99], 5+ε [Kor00], 1.861 [MMSV01],
1.736 [CS03], 1.61 [JMS02], 1.582 [Svi02], 1.52 [MYZ02], 1.50 [Byr07], 1.488 [Li11]

1.463-hardness of approximation [GK98]

Page 9:

4 Deterministic rounding of linear programs
4.5 The uncapacitated facility location problem

5 Random sampling and randomized rounding of linear programs
5.8 The uncapacitated facility location problem

7 The primal-dual method
7.6 The uncapacitated facility location problem

9 Further uses of greedy and local search algorithms
9.1 A local search algorithm for the uncapacitated facility location problem
9.4 A greedy algorithm for the uncapacitated facility location problem

12 Further uses of random sampling and randomized rounding of linear programs
12.1 The uncapacitated facility location problem

Page 10:

Known Results: k-median

pseudo-approximation
1-approx. with O(k log n) facilities [Hoc82]
2(1+ε)-approx. with (1+1/ε)k facilities [LV92]

super-constant approximation
O(log n loglog n) [Bar96, Bar98]
O(log k loglog k) [CCGS98]

Page 11:

Known Results: k-median

constant approximation
LP rounding : 6.667 [CGTS99], 3.25 [CL12]
Primal-Dual : 6 [JV99], 4 [CG99], 4 [JMS03]
Local Search : 3+ε [AGK+01]

1+√3+ε [LS13]

(1+2/e)-hardness of approximation [JMS03]

Page 12:

Lloyd Algorithm [Lloyd82]

k-means clustering : minimize total squared distances

k-means vs k-median
• clustering: k-means is more often used
• Walmart example: k-median is more appropriate
• approximation: k-median is “easier”

Page 13:

Local Search

Can we improve the solution by p swaps?
No : stop
Yes : swap and repeat

Approximation :
k-median : 3+2/p [AGK+01]
k-means : (3+2/p)² [KMN+02]
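The swap loop can be sketched for the simplest case p = 1. This is a simplified illustration, not the analysed algorithm: it accepts any strictly improving swap, whereas a polynomial running time bound would need each accepted swap to improve the cost by a sufficient factor. All names and the toy instance are assumptions.

```python
def kmedian_cost(dist, open_set):
    """Sum over clients of the distance to the nearest open facility."""
    return sum(min(row[i] for i in open_set) for row in dist)


def local_search(dist, k):
    """Single-swap (p = 1) local search; dist[j][i] = d(client j, facility i)."""
    facilities = range(len(dist[0]))
    current = set(range(k))  # arbitrary starting solution
    while True:
        best = kmedian_cost(dist, current)
        # Look for the first swap (close `out`, open `inn`) that lowers the cost.
        swap = next(
            ((current - {out}) | {inn}
             for out in sorted(current)
             for inn in facilities
             if inn not in current
             and kmedian_cost(dist, (current - {out}) | {inn}) < best),
            None,
        )
        if swap is None:
            return current  # no improving swap: local optimum reached
        current = swap


# Hypothetical instance: clients and facilities share four points on a line.
points = [0, 1, 10, 11]
dist = [[abs(p - q) for q in points] for p in points]
solution = local_search(dist, 2)
```

Starting from the two leftmost points, one swap already reaches a solution with one facility in each pair.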

Page 14:

LP for k-median

y_i : whether to open i
x_{i,j} : whether to connect j to i

minimize Σ_{i∈F, j∈C} d(i, j) x_{i,j}
subject to
Σ_{i∈F} y_i ≤ k (open at most k facilities)
Σ_{i∈F} x_{i,j} = 1 for every j ∈ C (client j must be connected)
x_{i,j} ≤ y_i for every i ∈ F, j ∈ C (client j can only be connected to an open facility)
x_{i,j}, y_i ≥ 0

integrality gap is at least 2
integrality gap is at most 3 (proof non-constructive)

Page 15:

(1+√3+ε)-approximation for k-median

Page 16:

k-median and UFL

f = cost of a facility
f ↑ ⇒ #open facilities ↓

Given a black-box α-approximation A for UFL

Naïve try : find an f such that A opens k facilities

α-approximation for k-median?

No. Proof : α ≈ 1.488 is achievable for UFL, but α > 1.736 is required for k-median

Page 17:

k-median and UFL

Naïve try : find an f such that A opens k facilities

2 issues with naïve try :
1. need LMP α-approximation for UFL

α-approximation : facility cost + connection cost ≤ α · OPT
LMP α-approximation : α · facility cost + connection cost ≤ α · OPT

LMP = Lagrangian Multiplier Preserving

Page 18:

k-median and UFL

Naïve try : find an f such that A opens k facilities

2 issues with naïve try :
1. need LMP α-approximation for UFL
2. cannot find f s.t. A opens exactly k facilities

bi-point solution
S1 : set of k1 < k facilities
S2 : set of k2 > k facilities
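The second issue can be seen on a tiny example. In the sketch below an exact brute-force solver stands in for the black box A (an assumption; the talk's A is an approximation algorithm), every facility costs the same f, and the instance is a hypothetical star with one centre facility and three leaf facilities placed on the three clients.

```python
from itertools import combinations


def solve_ufl_uniform(dist, f):
    """Exact UFL by brute force with uniform opening cost f (stand-in for A)."""
    n_facilities = len(dist[0])
    best = None
    for size in range(1, n_facilities + 1):
        for s in combinations(range(n_facilities), size):
            cost = f * size + sum(min(row[i] for i in s) for row in dist)
            if best is None or cost < best[0]:
                best = (cost, s)
    return best[1]


def opened_counts(dist, f_values):
    """Number of facilities the solver opens for each facility cost f."""
    return [len(solve_ufl_uniform(dist, f)) for f in f_values]


# Facility 0 is the centre (distance 1 to every client); facility j+1 sits on
# client j (distance 0) and is at distance 2 from the other clients.
dist = [[1] + [0 if i == j else 2 for i in range(3)] for j in range(3)]
counts = opened_counts(dist, [0.5, 1.0, 2.0, 3.0])
```

As f grows the solver switches from opening all three leaves to opening only the centre, so for the values tried here it never opens exactly two facilities.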

Page 19:

k-median and UFL

2 issues with naïve try :
1. need LMP α-approximation for UFL
2. cannot find f s.t. A opens exactly k facilities

                            [JV]   [JMS]   our result
LMP approx. factor           3      2       2
bi-point → integral         × 2    × 2
final ratio for k-median     6      4

LMP approx. factor 2 : do not know how to improve
bi-point → integral : this factor of 2 is tight !!

Page 20:

bi-point solution

k1 = |S1| < k ≤ |S2| = k2
a, b : a·k1 + b·k2 = k, a + b = 1
bi-point solution : aS1 + bS2
cost(aS1 + bS2) = a·cost(S1) + b·cost(S2)

[Figure: S1 and S2]
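The two equations for a and b have the unique solution b = (k − k1)/(k2 − k1), a = 1 − b. A short sketch with exact rational arithmetic follows; the function name and the sample numbers are assumptions for illustration.

```python
from fractions import Fraction


def bipoint_weights(k1, k2, k):
    """Solve a*k1 + b*k2 = k, a + b = 1 for the bi-point weights (a, b)."""
    if not k1 < k <= k2:
        raise ValueError("need k1 < k <= k2")
    b = Fraction(k - k1, k2 - k1)
    return 1 - b, b


# Hypothetical numbers: |S1| = 2, |S2| = 6, k = 3, cost(S1) = 40, cost(S2) = 12.
a, b = bipoint_weights(k1=2, k2=6, k=3)
cost = a * 40 + b * 12  # cost(aS1 + bS2) = a*cost(S1) + b*cost(S2)
```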

Page 21:

gap-2 instance

k1 = 1, k2 = k+1
cost(S1) = k+1, cost(S2) = 0
cost of integral solution = 2

[Figure: S1 and S2, with labels 1, 0 and k+1]
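The numbers on this slide can be checked by brute force under one reading of the figure, which is an assumption: S1 is a single centre facility at distance 1 from each of k+1 clients, S2 places one facility on every client, and two different clients are at distance 2 from each other's facility. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def star_instance(k):
    """dist[j][i] for k+1 clients; facility 0 is the centre, facility j+1 sits on client j."""
    return [[1] + [0 if i == j else 2 for i in range(k + 1)] for j in range(k + 1)]


def kmedian_cost(dist, open_set):
    return sum(min(row[i] for i in open_set) for row in dist)


def gap(k):
    """Return (bi-point cost, best integral cost with exactly k facilities)."""
    dist = star_instance(k)
    s1, s2 = (0,), tuple(range(1, k + 2))
    b = Fraction(k - len(s1), len(s2) - len(s1))  # from a*k1 + b*k2 = k
    a = 1 - b
    bipoint = a * kmedian_cost(dist, s1) + b * kmedian_cost(dist, s2)
    integral = min(kmedian_cost(dist, s) for s in combinations(range(k + 2), k))
    return bipoint, integral


bipoint, integral = gap(4)
```

Under this reading the bi-point cost is (k+1)/k while every integral solution costs 2, so the ratio 2k/(k+1) approaches 2 as k grows.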

Page 22:

k-median and UFL

                                 [JV]   [JMS]   our result
LMP approx. factor                3      2       2
bi-point → integral              × 2    × 2
bi-point → pseudo-integral                       × (1+√3+ε)/2
final ratio for k-median          6      4       1+√3+ε

bi-point → integral : this factor of 2 is tight !!

Main Lemma 1 : it suffices to give an α-approximate solution with k+O(1) facilities

Main Lemma 2 : bi-point solution of cost C ⇒ solution of cost ((1+√3+ε)/2)·C with k+O(1/ε) facilities

Page 23:

Main Lemma 1

A : black-box α-approximation with k+c open facilities
A' : (α+ε)-approximation with k open facilities
A' calls A n^{O(c/ε)} times.

bad instance:
with k+1 open facilities, cost = 0
with k open facilities, cost huge

Page 24:

Dense Facility

B_i : set of clients in a small ball around i
i is A-dense if the connection cost of B_i in OPT is ≥ A

this instance : i is A-dense for A ≈ opt

[Figure: facility i and its ball B_i]

Page 25:

Dense Facility

Reduction component works directly if there are no opt/t-dense facilities, t = O(c/ε)
can reduce to such an instance in n^{O(t)} time

[Figure: facility i and its ball B_i]

Page 26:

Lemma 1 from [ABS]

Main Lemma 1 : it suffices to give an α-approximate solution with k+O(1) facilities

[Awasthi-Blum-Sheffet] : ε, δ > 0 constants,
OPT_{k-1} ≥ (1+δ)·OPT_k ⇒ can find (1+ε)-approximation

k-median clustering is easy in practice
reason : there is a “meaningful” clustering

Page 27:

Lemma 1 from [ABS]

[ABS] : OPT_{k-1} ≥ (1+δ)·OPT_k ⇒ (1+ε)-approximation
A : α-approximation algorithm for k-median with k+c medians

Algorithm
Apply A to (k-c, F, C, d) ⇒ solution with k facilities of cost ≤ α·OPT_{k-c}
Apply [ABS] to each (k-i, F, C, d) for i = 0, 1, 2, …, c-1
Output the best of the c+1 solutions

Proof
If OPT_{k-c} ≤ (1+ε)·OPT_k, then done.
Otherwise, consider the smallest i s.t. OPT_{k-i-1} ≥ (1+ε)^{1/c}·OPT_{k-i}
[ABS] on (k-i, F, C, d) ⇒ solution of cost (1+ε)·OPT_{k-i} ≤ (1+ε)²·OPT_k
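The case analysis in the proof can be traced with a small sketch. It takes a table of optimal values as input purely for illustration; a real algorithm does not know these values, which is why it runs all c+1 candidates and keeps the best. The function name and the sample table are hypothetical.

```python
def witness_index(opt, k, c, eps):
    """Return None if OPT_{k-c} <= (1+eps)*OPT_k (the run of A already suffices);
    otherwise the smallest i with OPT_{k-i-1} >= (1+eps)**(1/c) * OPT_{k-i}.

    opt[m] is the optimal k-median cost with m medians.
    """
    if opt[k - c] <= (1 + eps) * opt[k]:
        return None
    step = (1 + eps) ** (1.0 / c)
    for i in range(c):
        if opt[k - i - 1] >= step * opt[k - i]:
            return i
    # If no single step jumped by `step`, c steps jump by less than 1+eps.
    raise AssertionError("unreachable when OPT_{k-c} > (1+eps)*OPT_k")


# Hypothetical optimal costs for 1..5 medians.
opt = {1: 100.0, 2: 50.0, 3: 10.5, 4: 10.2, 5: 10.0}
i = witness_index(opt, k=5, c=3, eps=0.1)
```

Because i is the smallest such index, every earlier step grew by less than (1+ε)^{1/c}, which is what gives OPT_{k-i} ≤ (1+ε)·OPT_k in the last line of the proof.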

Page 28:

Main Lemma 2 : bi-point solution of cost C ⇒ solution of cost ((1+√3+ε)/2)·C with k+O(1/ε) facilities

[JV] : bi-point solution of cost C ⇒ solution of cost 2C

based on improving [JV] algorithm

Page 29:

JV algorithm

given : bi-point solution aS1 + bS2
select S'2 ⊆ S2, |S'2| = |S1| = k1
with prob. a, open S1
with prob. b, open S'2
randomly open k-k1 facilities in S2 \ S'2

τ_i = nearest facility of i
guarantee : either i is open, or τ_i is open

[Figure: S1 and S2, with facility i]
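A minimal sketch of this rounding step follows. It assumes that S1 and S2 are disjoint, that S'2 is built from the nearest S2-facility of each facility in S1 (padded with arbitrary facilities of S2 if several share the same nearest one), and that `dist` holds facility-to-facility distances. These choices and all names are assumptions made for illustration.

```python
import random


def jv_round(S1, S2, k, dist, rng):
    """Open exactly k facilities from the bi-point solution aS1 + bS2."""
    k1, k2 = len(S1), len(S2)
    if not k1 < k <= k2:
        raise ValueError("need |S1| < k <= |S2|")
    a = (k2 - k) / (k2 - k1)  # weight of S1; b = 1 - a
    # tau[i]: nearest S2-facility of each i in S1; these facilities form S'2.
    tau = {i: min(S2, key=lambda i2: dist[i][i2]) for i in S1}
    S2p = set(tau.values())
    for i2 in S2:  # pad S'2 to exactly k1 facilities
        if len(S2p) == k1:
            break
        S2p.add(i2)
    # With probability a open S1, otherwise (probability b) open S'2.
    opened = set(S1) if rng.random() < a else set(S2p)
    # Open k - k1 further facilities chosen at random from S2 \ S'2.
    rest = [i2 for i2 in S2 if i2 not in S2p]
    opened.update(rng.sample(rest, k - k1))
    return opened, tau


# Hypothetical facilities on a line.
pos = {0: 0.0, 1: 10.0, 2: 0.5, 3: 9.0, 4: 20.0, 5: 30.0}
dist = {i: {j: abs(pos[i] - pos[j]) for j in pos} for i in pos}
opened, tau = jv_round([0, 1], [2, 3, 4, 5], 3, dist, random.Random(0))
```

Each facility of S2 \ S'2 is opened with probability (k − k1)/(k2 − k1) = b, so the marginals match the bi-point solution while exactly k facilities are opened.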

Page 30:

Analysis of JV algorithm

i1 ∈ S1, i3 ∈ S'2
either i1 or i3 is open

If i2 is open, connect j to i2
Otherwise, if i1 is open, connect j to i1
Otherwise connect j to i3

E[cost of j] ≤ 2 × [cost of j in aS1+bS2]

[Figure: client j with distances d1 and d2, facilities i1, i2, i3, and the bound ≤ d1+d2 at i3]
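One way to see where the factor 2 comes from is the following worked bound. It is a sketch under assumptions that the extracted slide does not state explicitly: d1 = d(j, i1) and d2 = d(j, i2), where i1 and i2 are the facilities nearest to j in S1 and S2; i3 is the facility of S'2 chosen for i1, so d(i1, i3) ≤ d(i1, i2) ≤ d1 + d2; and a facility of S2 \ S'2 is opened with probability b independently of the choice between S1 and S'2.

```latex
\begin{align*}
d(j,i_3) &\le d(j,i_1) + d(i_1,i_3) \le 2d_1 + d_2,\\[4pt]
\text{if } i_2 \in S_2':\quad
\mathbb{E}[\text{cost of } j] &\le b\,d_2 + a\,d_1,\\[4pt]
\text{if } i_2 \notin S_2':\quad
\mathbb{E}[\text{cost of } j]
  &\le b\,d_2 + a\bigl(a\,d_1 + b\,(2d_1 + d_2)\bigr)\\
  &= a(1+b)\,d_1 + b(1+a)\,d_2\\
  &\le 2\,(a\,d_1 + b\,d_2).
\end{align*}
```

Here a·d1 + b·d2 is the cost of j in aS1 + bS2, and the second equality uses a + b = 1.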

Page 31:

Our Algorithm

on average, d1 >> d2
d(j, i3) ≤ d1+2d2 (JV algorithm : 2d1+d2)

If i2 is open, connect j to i2
Otherwise, if i1 is open, connect j to i1
Otherwise connect j to i3

E[cost of j] ≤ 2 × [cost of j in aS1+bS2]

[Figure: client j with distances d1 and d2, facilities i1, i2, i3, and the bound ≤ d1+d2]

Page 32:

Our Algorithm

need to guarantee : either i is open, or τ_i is open
for a star, either the center is open, or all leaves are open

first try : open each star independently?
with prob. a, open the center,
with prob. b, open the leaves
problem : cannot bound the number of open facilities

idea :
big stars : always open the center, open each leaf with prob. ≈ b
small stars : group small stars of the same size, dependent rounding for each group, open 3 more facilities than expected

[Figure: facility i and τ_i]

Page 33:

small stars

small star : star of size ≤ 2/(abε)
M_h : set of stars of size h, m = |M_h|

Roughly,
for am stars, open the center
for bm stars, open the leaves

More accurately,
permute the stars and the facilities
open top centers
open bottom leaves
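The "roughly" version can be sketched as follows. This is a deliberate simplification and not the rounding from the talk: it opens the centre of ⌈am⌉ randomly chosen stars and all leaves of the remaining stars, so leaves are opened with probability slightly below b, whereas the talk's scheme also permutes the facilities and opens a few facilities more than expected. The names and the sample group are hypothetical.

```python
import math
import random
from fractions import Fraction


def round_small_star_group(stars, a, rng):
    """stars: list of (center, leaves) pairs, all with the same number of leaves.

    Opens the centre of ceil(a*m) randomly chosen stars and every leaf of the
    others, so each star has its centre open or all of its leaves open.
    """
    m = len(stars)
    n_centers = math.ceil(a * m)
    order = list(range(m))
    rng.shuffle(order)  # random permutation of the stars
    opened = set()
    for rank, idx in enumerate(order):
        center, leaves = stars[idx]
        if rank < n_centers:
            opened.add(center)      # "top" stars: open the centre
        else:
            opened.update(leaves)   # "bottom" stars: open all leaves
    return opened


# Hypothetical group: m = 6 stars, each with h = 3 leaves, and a = 1/4.
stars = [(("c", s), [("l", s, t) for t in range(3)]) for s in range(6)]
a = Fraction(1, 4)
opened = round_small_star_group(stars, a, random.Random(1))
```

For stars with at least one leaf this opens at most m(a + bh) facilities, the fractional count for the group, because rounding the number of centres up can only replace leaves by a single centre.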

Page 34:

big stars

size h > 2/(abε)
always open the center
randomly open leaves (≈ bh for a big star)

Page 35:

Lemma : we open at most k + 6/(abε) facilities.

for a big star of size h,

FRAC : a+bh

ALG :

for a group of m small stars of size h

FRAC : m(a+bh)

ALG :

there are at most 2/(abε) groups

Page 36:

Summary

                                 [JV]   [JMS]   our result
LMP approx. factor                3      2       2
bi-point → integral              × 2    × 2
bi-point → pseudo-integral                       × (1+√3+ε)/2
final ratio for k-median          6      4       1+√3+ε

Main Lemma 1 : it suffices to give an α-approximate solution with k+O(1) facilities

Main Lemma 2 : bi-point solution of cost C ⇒ solution of cost ((1+√3+ε)/2)·C with k+O(1/ε) facilities

Page 37:

Open Problems

gap between integral solution with k+1 open facilities and LP value (with k open facilities)?
tight analysis?
algorithm works for k-means?

THANK YOU!
Questions?