
Cardinality-Constrained Clustering and Outlier Detection via Conic Optimization

Bergamo – June 1, 2017

Computational Management Science 2017

Presented by Kilian Schindler, École Polytechnique Fédérale de Lausanne

Joint work with Napat Rujeerapaiboon and Daniel Kuhn (École Polytechnique Fédérale de Lausanne) and Wolfram Wiesemann (Imperial College London)

K-means Clustering

Standard K-means clustering formulation, with data points $\xi_i$, cluster centers $\zeta_k$ and assignment variables $\pi_i^k$:

$$
\begin{array}{ll}
\min & \sum_{k=1}^K \sum_{i=1}^N \pi_i^k \, \|\xi_i - \zeta_k\|^2 \\
\text{s.t.} & \zeta_k \in \mathbb{R}^d,\ \pi_i^k \in \{0,1\},\ \sum_{k=1}^K \pi_i^k = 1 \ \ \forall i.
\end{array}
$$

Sequential K-means clustering approach (Lloyd, 1982)

Step 1: Fix $\{\zeta_k\}$ and solve

$$
\begin{array}{ll}
\min & \sum_{k=1}^K \sum_{i=1}^N \pi_i^k \, \|\xi_i - \zeta_k\|^2 \\
\text{s.t.} & \pi_i^k \in \{0,1\},\ \sum_{k=1}^K \pi_i^k = 1 \ \ \forall i.
\end{array}
$$

(totally unimodular constraints)

Step 2: Fix $\{\pi_i^k\}$ and solve

$$
\begin{array}{ll}
\min & \sum_{k=1}^K \sum_{i=1}^N \pi_i^k \, \|\xi_i - \zeta_k\|^2 \\
\text{s.t.} & \zeta_k \in \mathbb{R}^d.
\end{array}
$$

(optimal $\zeta_k$ is average of cluster $k$)
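The two alternating steps can be sketched in Python as follows. This is a minimal illustration and not the presenters' code: the function name `lloyd_kmeans`, the initialisation with randomly chosen data points, the iteration cap and the toy data are all assumptions made for the example.

```python
import numpy as np

def lloyd_kmeans(X, K, n_iter=100, seed=0):
    """Alternate between the assignment step (Step 1) and the centroid step (Step 2)."""
    rng = np.random.default_rng(seed)
    # Assumed initialisation: K distinct data points serve as starting centers.
    centers = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Step 1: with fixed centers, each point goes to its nearest center.
        sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = sq_dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable, so the centers would not move either
        labels = new_labels
        # Step 2: with fixed assignments, the optimal center is the cluster mean.
        for k in range(K):
            if np.any(labels == k):  # keep the old center if a cluster is empty
                centers[k] = X[labels == k].mean(axis=0)
    objective = ((X - centers[labels]) ** 2).sum()
    return labels, centers, objective

# Toy data (hypothetical): two tight pairs of points.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels, centers, objective = lloyd_kmeans(X, K=2)
```

On this toy input the two nearby pairs end up in separate clusters. In general the procedure only reaches a local optimum that depends on the initial centers.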

Cardinality-Constrained K-means Clustering

Cardinality-constrained K-means clustering formulation, where cluster $k$ must contain exactly $n_k$ points (for example $n_k = 3$):

$$
\begin{array}{ll}
\min & \sum_{k=1}^K \sum_{i=1}^N \pi_i^k \, \|\xi_i - \zeta_k\|^2 \\
\text{s.t.} & \zeta_k \in \mathbb{R}^d,\ \pi_i^k \in \{0,1\}, \\
& \sum_{k=1}^K \pi_i^k = 1 \ \ \forall i, \\
& \sum_{i=1}^N \pi_i^k = n_k \ \ \forall k.
\end{array}
$$

Sequential K-means clustering approach (Bennett et al., 2000)

Step 1: Fix $\{\zeta_k\}$ and solve

$$
\begin{array}{ll}
\min & \sum_{k=1}^K \sum_{i=1}^N \pi_i^k \, \|\xi_i - \zeta_k\|^2 \\
\text{s.t.} & \pi_i^k \in \{0,1\}, \\
& \sum_{k=1}^K \pi_i^k = 1 \ \ \forall i, \\
& \sum_{i=1}^N \pi_i^k = n_k \ \ \forall k.
\end{array}
$$

(totally unimodular constraints)

Step 2: Fix $\{\pi_i^k\}$ and solve

$$
\begin{array}{ll}
\min & \sum_{k=1}^K \sum_{i=1}^N \pi_i^k \, \|\xi_i - \zeta_k\|^2 \\
\text{s.t.} & \zeta_k \in \mathbb{R}^d.
\end{array}
$$

(optimal $\zeta_k$ is average of cluster $k$)
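One way to solve the cardinality-constrained assignment step (Step 1) is to reduce it to a square linear assignment problem by giving cluster $k$ exactly $n_k$ identical "slots". The sketch below does this with SciPy's `linear_sum_assignment`. The reduction, the helper name `constrained_assignment` and the toy data are assumptions chosen for illustration, and the slides do not say which solver was used.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def constrained_assignment(X, centers, sizes):
    """Assign exactly sizes[k] points to center k at minimum total squared distance."""
    assert sum(sizes) == len(X), "the cardinalities must add up to N"
    # Column c of the cost matrix is one of the n_k slots of cluster slot_owner[c].
    slot_owner = np.repeat(np.arange(len(centers)), sizes)
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    cost = sq_dist[:, slot_owner]                 # N x N cost matrix
    rows, cols = linear_sum_assignment(cost)
    labels = np.empty(len(X), dtype=int)
    labels[rows] = slot_owner[cols]
    return labels

# Toy data (hypothetical): four points on a line, two clusters of size two.
X = np.array([[0.0], [1.0], [2.0], [10.0]])
centers = np.array([[0.0], [10.0]])
labels = constrained_assignment(X, centers, sizes=[2, 2])
```

Here the point at 2.0 is forced into the cluster centered at 10.0, although it is closer to the other center, because each cluster must contain exactly two points.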

Motivation for Balanced Clustering

market segmentation

distributed computing

document clustering

Motivation for Outlier Detection

Suppose we wanted to find three (balanced) clusters in the following dataset...

§  Standard k-means (objective = 25.21)
§  Balanced k-means (objective = 54.27)

But if we could also specify a number of outliers to be removed…

§  Balanced k-means and outlier detection (objective = 1.97)

MILP Reformulation, Conic Relaxations

Rounding Algorithm and Recovery Guarantees

Numerical Experiments

Auxiliary Lemma

Lemma (Zha et al., 2002). For vectors $\xi_1, \ldots, \xi_n \in \mathbb{R}^d$, we have that

$$
\sum_{i=1}^n \Big\| \xi_i - \frac{1}{n} \sum_{j=1}^n \xi_j \Big\|^2 = \frac{1}{2n} \sum_{i,j=1}^n \| \xi_i - \xi_j \|^2 .
$$

In other words, the sum of squared distances to the mean $\zeta = \frac{1}{n} \sum_{j=1}^n \xi_j$ can be expressed through pairwise squared distances alone.
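The identity in the lemma is easy to check numerically. The snippet below is only a sanity check on randomly generated points (the sample size, dimension and seed are arbitrary choices), not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)
xi = rng.normal(size=(7, 3))          # n = 7 arbitrary points in R^3
n = len(xi)

# Left-hand side: squared distances of all points to their mean.
lhs = ((xi - xi.mean(axis=0)) ** 2).sum()

# Right-hand side: all pairwise squared distances, scaled by 1/(2n).
diff = xi[:, None, :] - xi[None, :, :]
rhs = (diff ** 2).sum() / (2 * n)
```

Both quantities agree up to floating-point error.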

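MILP Reformulation

The starting point is the cardinality-constrained K-means problem

$$
\begin{array}{ll}
\min & \sum_{k=1}^K \sum_{i=1}^N \pi_i^k \, \|\xi_i - \zeta_k\|^2 \\
\text{s.t.} & \zeta_k \in \mathbb{R}^d,\ \pi_i^k \in \{0,1\}, \\
& \sum_{k=1}^K \pi_i^k = 1 \ \ \forall i, \\
& \sum_{i=1}^N \pi_i^k = n_k \ \ \forall k.
\end{array}
$$

Its objective is rewritten in three steps:

(1) optimal $\zeta_k$ is average of cluster $k$
(2) apply Lemma (Zha et al., 2002)
(3) define $d_{ij} = \|\xi_i - \xi_j\|^2$

$$
\begin{array}{rcl}
\sum_{k=1}^K \sum_{i=1}^N \pi_i^k \, \|\xi_i - \zeta_k\|^2
& \overset{(1)}{=} & \sum_{k=1}^K \sum_{i=1}^N \pi_i^k \, \Big\| \xi_i - \frac{1}{n_k} \sum_{j=1}^N \pi_j^k \xi_j \Big\|^2 \\
& \overset{(2)}{=} & \sum_{k=1}^K \frac{1}{2 n_k} \sum_{i,j=1}^N \pi_i^k \pi_j^k \, \|\xi_i - \xi_j\|^2 \\
& \overset{(3)}{=} & \frac{1}{2} \sum_{k=1}^K \frac{1}{n_k} \sum_{i,j=1}^N \pi_i^k \pi_j^k \, d_{ij}
\end{array}
$$

Introduce epigraphical variables $\eta_{ij}^k$ to linearize the products $\pi_i^k \pi_j^k$:

$$
\begin{array}{ll}
\min & \frac{1}{2} \sum_{k=1}^K \frac{1}{n_k} \sum_{i,j=1}^N \eta_{ij}^k \, d_{ij} \\
\text{s.t.} & \eta_{ij}^k \in \mathbb{R}_+,\ \pi_i^k \in \{0,1\}, \\
& \sum_{k=1}^K \pi_i^k = 1 \ \ \forall i, \\
& \sum_{i=1}^N \pi_i^k = n_k \ \ \forall k, \\
& \eta_{ij}^k \ge \pi_i^k + \pi_j^k - 1 \ \ \forall i, j, k.
\end{array}
\qquad (\mathrm{P})
$$

Since $d_{ij} \ge 0$, it is optimal to choose $\eta_{ij}^k = \max\{0,\ \pi_i^k + \pi_j^k - 1\}$, which equals $\pi_i^k \pi_j^k$ for binary $\pi$.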
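The chain of equalities above can be checked numerically for a fixed assignment: the centroid-based objective should coincide with the pairwise objective of the MILP when each $\eta_{ij}^k$ is set to the smallest value its constraints allow. The helper names, the random data and the label vector below are hypothetical and serve only as an illustration.

```python
import numpy as np

def centroid_objective(X, labels, K):
    """Sum of squared distances of every point to the mean of its cluster."""
    return sum(((X[labels == k] - X[labels == k].mean(axis=0)) ** 2).sum()
               for k in range(K))

def pairwise_objective(X, labels, K):
    """Objective of the MILP for a fixed assignment, with the smallest feasible eta."""
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)    # d_ij
    total = 0.0
    for k in range(K):
        pi = (labels == k).astype(float)                        # pi_i^k
        # Smallest eta satisfying eta >= 0 and eta >= pi_i + pi_j - 1.
        eta = np.maximum(0.0, pi[:, None] + pi[None, :] - 1.0)
        total += (eta * D).sum() / pi.sum()                     # pi.sum() equals n_k
    return 0.5 * total

X = np.random.default_rng(1).normal(size=(6, 2))
labels = np.array([0, 0, 1, 1, 1, 0])
a = centroid_objective(X, labels, K=2)
b = pairwise_objective(X, labels, K=2)
```

The two values `a` and `b` agree up to rounding, for any assignment in which every cluster is non-empty.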

Towards an SDP Relaxation (1/4)

Recall that the objective of the cardinality-constrained K-means problem can be written in terms of pairwise distances $d_{ij} = \|\xi_i - \xi_j\|^2$:

$$
\sum_{k=1}^K \sum_{i=1}^N \pi_i^k \, \|\xi_i - \zeta_k\|^2 = \frac{1}{2} \sum_{k=1}^K \frac{1}{n_k} \sum_{i,j=1}^N \pi_i^k \pi_j^k \, d_{ij} .
$$

Apply the transformation $x_i^k \leftarrow 2\pi_i^k - 1$ to obtain $x_i^k \in \{-1,+1\}$:

$$
\begin{array}{ll}
\min & \frac{1}{8} \sum_{k=1}^K \frac{1}{n_k} \sum_{i,j=1}^N (1 + x_i^k)(1 + x_j^k) \, d_{ij} \\
\text{s.t.} & x_i^k \in \{-1,+1\}, \\
& \sum_{k=1}^K x_i^k = 2 - K \ \ \forall i, \\
& \sum_{i=1}^N x_i^k = 2 n_k - N \ \ \forall k.
\end{array}
$$

Define $m_{ij}^k = x_i^k x_j^k$ and notice that

$$
(1 + x_i^k)(1 + x_j^k) = 1 + x_i^k + x_j^k + m_{ij}^k .
$$

Towards an SDP Relaxation (2/4)

$$
\begin{array}{ll}
\min & \frac{1}{8} \sum_{k=1}^K \frac{1}{n_k} \sum_{i,j=1}^N \left(1 + x_i^k + x_j^k + m_{ij}^k\right) d_{ij} \\
\text{s.t.} & x_i^k \in \{-1,+1\},\ m_{ij}^k \in \mathbb{R}, \\
& m_{ij}^k = x_i^k x_j^k \ \ \forall i, j, k, \\
& \sum_{k=1}^K x_i^k = 2 - K \ \ \forall i, \\
& \sum_{i=1}^N x_i^k = 2 n_k - N \ \ \forall k.
\end{array}
$$

Switch to matrix notation with $[M^k]_{ij} = m_{ij}^k$ and $[x^k]_i = x_i^k$:

$$
\begin{array}{ll}
\min & \frac{1}{8} \left\langle D,\ \sum_{k=1}^K \frac{1}{n_k} \left( M^k + \mathbf{1}\mathbf{1}^\top + x^k \mathbf{1}^\top + \mathbf{1} (x^k)^\top \right) \right\rangle \\
\text{s.t.} & x^k \in \{-1,+1\}^N,\ M^k \in \mathbb{S}^N, \\
& M^k = x^k (x^k)^\top \ \ \forall k, \\
& \sum_{k=1}^K x^k = (2 - K)\mathbf{1}, \\
& \mathbf{1}^\top x^k = 2 n_k - N \ \ \forall k.
\end{array}
$$

Towards an SDP Relaxation (3/4)

Every feasible solution of the exact matrix problem, that is, with $x^k \in \{-1,+1\}^N$, $M^k = x^k (x^k)^\top$ and $\mathbf{1}^\top x^k = 2 n_k - N$, satisfies the following additional constraints (inequalities hold elementwise):

$$
\begin{array}{rcll}
M^k + \mathbf{1}\mathbf{1}^\top + x^k \mathbf{1}^\top + \mathbf{1} (x^k)^\top & = & +(\mathbf{1} + x^k)(\mathbf{1} + x^k)^\top \ \ge 0 & \forall k \\
M^k + \mathbf{1}\mathbf{1}^\top - x^k \mathbf{1}^\top - \mathbf{1} (x^k)^\top & = & +(\mathbf{1} - x^k)(\mathbf{1} - x^k)^\top \ \ge 0 & \forall k \\
M^k - \mathbf{1}\mathbf{1}^\top + x^k \mathbf{1}^\top - \mathbf{1} (x^k)^\top & = & -(\mathbf{1} - x^k)(\mathbf{1} + x^k)^\top \ \le 0 & \forall k \\
M^k - \mathbf{1}\mathbf{1}^\top - x^k \mathbf{1}^\top + \mathbf{1} (x^k)^\top & = & -(\mathbf{1} + x^k)(\mathbf{1} - x^k)^\top \ \le 0 & \forall k
\end{array}
$$

$$
\operatorname{diag}(M^k) = \mathbf{1} \ \ \forall k
$$

$$
M^k \mathbf{1} = x^k (x^k)^\top \mathbf{1} = (2 n_k - N)\, x^k \ \ \forall k
$$
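The implied constraints can be verified on a small example. The following check builds $x^k$ and $M^k = x^k (x^k)^\top$ for a hypothetical cluster of size $n_k = 2$ among $N = 6$ points and confirms the four rank-one identities, their signs, the diagonal condition and the row-sum condition. The instance and all variable names are assumptions made for illustration.

```python
import numpy as np

N, n_k = 6, 2
x = -np.ones(N)
x[:n_k] = 1.0                 # x^k of a cluster that contains the first n_k points
one = np.ones(N)
M = np.outer(x, x)            # M^k = x^k (x^k)^T

E = np.outer(one, one)        # 1 1^T
A = np.outer(x, one)          # x^k 1^T
B = np.outer(one, x)          # 1 (x^k)^T

# Left-hand sides of the four constraints paired with their rank-one factorisations.
products = {
    "++": (M + E + A + B,  np.outer(one + x, one + x)),
    "--": (M + E - A - B,  np.outer(one - x, one - x)),
    "+-": (M - E + A - B, -np.outer(one - x, one + x)),
    "-+": (M - E - A + B, -np.outer(one + x, one - x)),
}
identities_hold = all(np.allclose(lhs, rhs) for lhs, rhs in products.values())
signs_hold = bool(np.all(products["++"][0] >= 0) and np.all(products["--"][0] >= 0)
                  and np.all(products["+-"][0] <= 0) and np.all(products["-+"][0] <= 0))
diag_holds = np.allclose(np.diag(M), 1.0)
row_sum_holds = np.allclose(M @ one, (2 * n_k - N) * x)
```

All four flags evaluate to `True` for this instance. The constraints are redundant in the exact problem and only become informative once $M^k = x^k (x^k)^\top$ is relaxed.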

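Towards an SDP Relaxation (4/4)

Adding the implied constraints to the exact problem gives

$$
\begin{array}{lll}
\min & \frac{1}{8} \left\langle D,\ \sum_{k=1}^K \frac{1}{n_k} \left( M^k + \mathbf{1}\mathbf{1}^\top + x^k \mathbf{1}^\top + \mathbf{1} (x^k)^\top \right) \right\rangle & \\
\text{s.t.} & x^k \in \{-1,+1\}^N,\ M^k \in \mathbb{S}^N, & \\
& M^k = x^k (x^k)^\top & \forall k, \\
& \sum_{k=1}^K x^k = (2 - K)\mathbf{1}, & \\
& \mathbf{1}^\top x^k = 2 n_k - N & \forall k, \\
& M^k \mathbf{1} = (2 n_k - N)\, x^k & \forall k, \\
& \operatorname{diag}(M^k) = \mathbf{1} & \forall k, \\
& M^k + \mathbf{1}\mathbf{1}^\top + x^k \mathbf{1}^\top + \mathbf{1} (x^k)^\top \ge 0 & \forall k, \\
& M^k + \mathbf{1}\mathbf{1}^\top - x^k \mathbf{1}^\top - \mathbf{1} (x^k)^\top \ge 0 & \forall k, \\
& M^k - \mathbf{1}\mathbf{1}^\top + x^k \mathbf{1}^\top - \mathbf{1} (x^k)^\top \le 0 & \forall k, \\
& M^k - \mathbf{1}\mathbf{1}^\top - x^k \mathbf{1}^\top + \mathbf{1} (x^k)^\top \le 0 & \forall k.
\end{array}
$$

SDP relaxation: replace $M^k = x^k (x^k)^\top$ by $M^k \succeq x^k (x^k)^\top$ and $x^k \in \{-1,+1\}^N$ by $x^k \in [-1,+1]^N$.

The additional constraints, which were redundant in the exact problem, may play a role now (Awasthi et al., 2015).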

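SDP Relaxation

$$
\begin{array}{lll}
\min & \frac{1}{8} \left\langle D,\ \sum_{k=1}^K \frac{1}{n_k} \left( M^k + \mathbf{1}\mathbf{1}^\top + x^k \mathbf{1}^\top + \mathbf{1} (x^k)^\top \right) \right\rangle & \\
\text{s.t.} & x^k \in [-1,+1]^N,\ M^k \in \mathbb{S}^N, & \\
& M^k \succeq x^k (x^k)^\top & \forall k, \\
& \sum_{k=1}^K x^k = (2 - K)\mathbf{1}, & \\
& \mathbf{1}^\top x^k = 2 n_k - N & \forall k, \\
& M^k \mathbf{1} = (2 n_k - N)\, x^k & \forall k, \\
& \operatorname{diag}(M^k) = \mathbf{1} & \forall k, \\
& M^k + \mathbf{1}\mathbf{1}^\top + x^k \mathbf{1}^\top + \mathbf{1} (x^k)^\top \ge 0 & \forall k, \\
& M^k + \mathbf{1}\mathbf{1}^\top - x^k \mathbf{1}^\top - \mathbf{1} (x^k)^\top \ge 0 & \forall k, \\
& M^k - \mathbf{1}\mathbf{1}^\top + x^k \mathbf{1}^\top - \mathbf{1} (x^k)^\top \le 0 & \forall k, \\
& M^k - \mathbf{1}\mathbf{1}^\top - x^k \mathbf{1}^\top + \mathbf{1} (x^k)^\top \le 0 & \forall k.
\end{array}
\qquad (\mathrm{R}_{\mathrm{SDP}})
$$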

Page 14: Cardinality-Constrained Clustering and Outlier Detection via Conic … · 2017. 6. 12. · Kilian Schindler (EPFL) | CMS 2017 | Slide 15 Relaxation Theorem Cardinality-Constrained

min. 18

DD,

PKk=1

1nk

�Mk + 11> + x

k1> + 1(xk)>�E

s.t. x

k 2 [�1,+1]N , Mk 2 SN ,

Mk ⌫ x

k(xk)> 8k,PK

k=1 xk = (2�K)1,

1>x

k = 2nk �N 8k,

Mk1 = (2nk �N)xk 8k,

diag(Mk) = 1 8k,

Mk + 11> + x

k1> + 1(xk)> � 0 8k,

Mk + 11> � x

k1> � 1(xk)> � 0 8k,

Mk � 11> + x

k1> � 1(xk)> 0 8k,

Mk � 11> � x

k1> + 1(xk)> 0 8k.

(RLP)

Kilian Schindler (EPFL) | CMS 2017 | Slide 14

LP Relaxation

Cardinality-Constrained Clustering and Outlier Detection via Conic Optimization

Relaxation Theorem

§  We can obtain lower bounds on the objective of the cardinality-constrained k-means clustering problem in polynomial time.

§  Lloyd’s algorithm does not give lower bounds and is not guaranteed to terminate in polynomial time (Arthur and Vassilvitskii, 2006).

§  Can we recover a feasible solution (and thus an upper bound)?

Theorem: $\min \mathrm{R}_{\mathrm{LP}} \le \min \mathrm{R}_{\mathrm{SDP}} \le \min \mathrm{P}$.

MILP Reformulation, Conic Relaxations → lower bound on objective

Rounding Algorithm and Recovery Guarantees

Numerical Experiments

Rounding Algorithm

Step 1: Solve $\mathrm{R}_{\mathrm{SDP}}$ or $\mathrm{R}_{\mathrm{LP}}$ and record the optimal $x^1, \ldots, x^K \in \mathbb{R}^N$.

Step 2: Solve the (totally unimodular) linear assignment problem

$$
\begin{array}{ll}
\max & \sum_{k=1}^K \sum_{i=1}^N \pi_i^k \, x_i^k \\
\text{s.t.} & \pi_i^k \in \{0,1\}, \\
& \sum_{k=1}^K \pi_i^k = 1 \ \ \forall i, \\
& \sum_{i=1}^N \pi_i^k = n_k \ \ \forall k
\end{array}
$$

to obtain an assignment $\{\pi_i^k\}$ that is feasible in P.

Intuition: $x_i^k = +1$ → assign point $i$ to cluster $k$.

Note: All of the above problems can be solved in polynomial time.
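Step 2 of the rounding algorithm can be sketched with SciPy's `linear_sum_assignment` by replicating cluster $k$ into $n_k$ slots and maximising the total score. The function name `round_relaxation` and the fractional matrix `x` below are hypothetical: `x` stands in for the optimal $x^1, \ldots, x^K$ of a relaxation, which this sketch does not compute.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def round_relaxation(x, sizes):
    """Round x (shape K x N, x[k, i] = x_i^k) to an assignment with |cluster k| = sizes[k]."""
    K, N = x.shape
    assert sum(sizes) == N, "the cardinalities must add up to N"
    slot_owner = np.repeat(np.arange(K), sizes)   # cluster index of each of the N slots
    score = x.T[:, slot_owner]                    # N x N reward of placing point i in a slot
    rows, cols = linear_sum_assignment(score, maximize=True)
    labels = np.empty(N, dtype=int)
    labels[rows] = slot_owner[cols]
    return labels

# Hypothetical fractional solution for K = 2 clusters of size 2 among N = 4 points.
x = np.array([[ 0.9,  0.6, -0.8, -0.7],
              [-0.9, -0.6,  0.8,  0.7]])
labels = round_relaxation(x, sizes=[2, 2])
```

Points with $x_i^k$ close to $+1$ are placed in cluster $k$ whenever the cardinalities permit, so the first two points go to cluster 0 and the last two to cluster 1 in this example.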

Perfect Separation for Balanced Clustering

$$
\exists \{I_k\},\ |I_k| = n \ \ \forall k, \qquad
\underbrace{\max_{k} \ \max_{i,j \in I_k} d_{ij}}_{\text{maximum distance within clusters}}
\ < \
\underbrace{\min_{k \neq \ell} \ \min_{i \in I_k,\, j \in I_\ell} d_{ij}}_{\text{minimum distance between clusters}}
$$

This condition is also used in Elhamifar et al., 2012, and Nellore and Ward, 2015.
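For a given candidate clustering, the perfect separation condition is straightforward to test. The helper `is_perfectly_separated` below is a hypothetical name. It checks only the distance inequality, using squared Euclidean distances as $d_{ij}$, and leaves the cardinality requirement $|I_k| = n$ to the caller.

```python
import numpy as np

def is_perfectly_separated(X, labels):
    """True if the largest within-cluster d_ij is below the smallest between-cluster d_ij."""
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)   # d_ij
    same = labels[:, None] == labels[None, :]                # True where i and j share a cluster
    max_within = D[same].max()
    min_between = D[~same].min()                             # requires at least two clusters
    return bool(max_within < min_between)

X = np.array([[0.0], [1.0], [10.0], [11.0]])
good = is_perfectly_separated(X, np.array([0, 0, 1, 1]))
bad = is_perfectly_separated(X, np.array([0, 1, 0, 1]))
```

The first labelling groups neighbouring points and satisfies the condition, while the second mixes the two groups and violates it.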

Recovery Theorem for Balanced Clustering

Theorem: Under perfect separation, $\min \mathrm{R}_{\mathrm{LP}} = \min \mathrm{R}_{\mathrm{SDP}} = \min \mathrm{P}$.

Proof idea:

§  Derive a lower bound on $\min \mathrm{R}_{\mathrm{LP}}$ from its own constraints.

§  Show that under perfect separation this is attainable in P.

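Proof of Recovery Theorem for Balanced Clustering (1/2)

In the balanced case $n_k = n$, the constraints of $\mathrm{R}_{\mathrm{LP}}$ read $x^k \in [-1,+1]^N$, $M^k \in \mathbb{S}^N$ and

$$
\begin{array}{lll}
(1) & \sum_{k=1}^K x^k = (2 - K)\mathbf{1}, & \\
(2) & \mathbf{1}^\top x^k = 2n - N & \forall k, \\
(3) & M^k \mathbf{1} = (2n - N)\, x^k & \forall k, \\
(4) & \operatorname{diag}(M^k) = \mathbf{1} & \forall k, \\
(5) & M^k + \mathbf{1}\mathbf{1}^\top + x^k \mathbf{1}^\top + \mathbf{1} (x^k)^\top \ge 0 & \forall k, \\
& M^k + \mathbf{1}\mathbf{1}^\top - x^k \mathbf{1}^\top - \mathbf{1} (x^k)^\top \ge 0 & \forall k, \\
(6) & M^k - \mathbf{1}\mathbf{1}^\top + x^k \mathbf{1}^\top - \mathbf{1} (x^k)^\top \le 0 & \forall k, \\
(7) & M^k - \mathbf{1}\mathbf{1}^\top - x^k \mathbf{1}^\top + \mathbf{1} (x^k)^\top \le 0 & \forall k.
\end{array}
$$

Define $W^k = M^k + \mathbf{1}\mathbf{1}^\top + x^k \mathbf{1}^\top + \mathbf{1} (x^k)^\top$, with entries $w_{ij}^k = m_{ij}^k + 1 + x_i^k + x_j^k$. Because $d_{ii} = 0$, the objective is a weighted sum of non-negative terms:

$$
\frac{1}{8n} \left\langle D,\ \sum_{k=1}^K \left( M^k + \mathbf{1}\mathbf{1}^\top + x^k \mathbf{1}^\top + \mathbf{1} (x^k)^\top \right) \right\rangle
= \frac{1}{8n} \left\langle D,\ \sum_{k=1}^K W^k \right\rangle
= \frac{1}{8n} \sum_{i \neq j} d_{ij} \left( \sum_{k=1}^K w_{ij}^k \right)
$$

Bounds on individual weights. The lower bound follows from (5). For the upper bound, adding (6) and (7) gives $m_{ij}^k \le 1$, and (1) is used for both $\sum_k x_i^k$ and $\sum_k x_j^k$:

$$
0 \ \le \ \sum_{k=1}^K w_{ij}^k = \sum_{k=1}^K \left( m_{ij}^k + 1 + x_i^k + x_j^k \right) \ \le \ \sum_{k=1}^K (1 + 1) + (2 - K) + (2 - K) = 4
$$

Restriction on total weight, using (2), (3) and (4):

$$
\sum_{i \neq j} \sum_{k=1}^K w_{ij}^k
= \sum_{k=1}^K \sum_{i \neq j} \left( m_{ij}^k + 1 + x_i^k + x_j^k \right)
= \sum_{k=1}^K \left( (2n - N)^2 - N + N(N - 1) + 2(N - 1)(2n - N) \right)
= 4Kn(n - 1)
$$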

Proof of Recovery Theorem for Balanced Clustering (2/2)

§  Bounds on individual weights:

$$
0 \ \le \ \sum_{k=1}^K w_{ij}^k \ \le \ 4
$$

§  Restriction on total weight:

$$
\sum_{i \neq j} \sum_{k=1}^K w_{ij}^k = 4Kn(n - 1)
$$

§  Lower bound on non-negative weighted sum:

$$
\frac{1}{8n} \sum_{i \neq j} d_{ij} \left( \sum_{k=1}^K w_{ij}^k \right) \ \ge \ \frac{1}{2n} \Big\{ \text{sum of the } Kn(n - 1) \text{ smallest } d_{ij} \text{ with } i \neq j \Big\}
$$

§  Under perfect separation, this lower bound is attainable in P: the $Kn(n - 1)$ smallest $d_{ij}$ are exactly the within-cluster distances (illustration: two points $\xi_i$, $\xi_j$ of cluster $k$ have $\pi_i^k = \pi_j^k = 1$, while a point $\xi_\ell$ outside the cluster has $\pi_\ell^k = 0$).
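The last step of the proof can be illustrated numerically: on perfectly separated data, the lower bound built from the $Kn(n-1)$ smallest off-diagonal distances coincides with the objective of P at the true clustering. The helper names and the toy data below are hypothetical and chosen only to make the check concrete.

```python
import numpy as np

def lower_bound(D, K, n):
    """(1 / 2n) times the sum of the K n (n - 1) smallest d_ij over ordered pairs i != j."""
    off_diag = D[~np.eye(len(D), dtype=bool)]
    return np.sort(off_diag)[: K * n * (n - 1)].sum() / (2 * n)

def objective_P(D, labels, K, n):
    """Pairwise form of the balanced clustering objective for a given assignment."""
    return sum(D[np.ix_(labels == k, labels == k)].sum() for k in range(K)) / (2 * n)

# Two well-separated groups of n = 3 points each (hypothetical data).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [10.0, 10.0], [11.0, 10.0], [10.0, 11.0]])
labels = np.array([0, 0, 0, 1, 1, 1])
D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)     # d_ij
bound = lower_bound(D, K=2, n=3)
value = objective_P(D, labels, K=2, n=3)
```

For this instance `bound` and `value` coincide. When the data are not perfectly separated, the bound can be strictly smaller than the best clustering objective.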

Simultaneous Clustering and Outlier Detection

§  Outliers can be dealt with by introducing an additional dummy cluster.

§  This dummy cluster is not penalized in the objective function, but it has to fulfill appropriate constraints.

§  MILP reformulation, SDP/LP relaxations and the recovery guarantee are still available.

A similar approach is taken in Chawla and Gionis, 2013, and Ott et al., 2014.
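The effect of the dummy cluster on the objective can be sketched as follows. The sketch assumes that labels `0..K-1` denote regular clusters and that label `K` marks the dummy (outlier) cluster, whose members contribute nothing to the cost. The function name and the data are hypothetical, and the constraints on the dummy cluster (for instance its prescribed cardinality) are not enforced here.

```python
import numpy as np

def objective_with_outliers(X, labels, K):
    """K-means objective in which points labelled K (the dummy cluster) are not penalised."""
    total = 0.0
    for k in range(K):                      # the dummy cluster K is skipped on purpose
        members = X[labels == k]
        total += ((members - members.mean(axis=0)) ** 2).sum()
    return total

X = np.array([[0.0], [1.0], [100.0]])
with_outlier = objective_with_outliers(X, np.array([0, 0, 1]), K=1)   # 100.0 is flagged
without_outlier = objective_with_outliers(X, np.array([0, 0, 0]), K=1)
```

Flagging the distant point as an outlier reduces the objective from several thousand to 0.5 on this toy input, which mirrors the drop in objective value reported on the motivation slide.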

MILP Reformulation, Conic Relaxations → lower bound on objective

Rounding Algorithm and Recovery Guarantees → feasible clustering that can be optimal

Numerical Experiments

Performance on Real-World Data

Consider classification datasets in the UCI Machine Learning Repository with

§  150-300 datapoints
§  up to 200 attributes
§  no missing values

Perform classification by means of the following approaches

§  Rounded RLP
§  Rounded RSDP
§  Best-of-10 Bennett et al.

                        RLP                 RSDP                Bennett et al.
Dataset                 UB        LB        UB        LB        UB        CV (%)
Iris                    81.4      78.8      81.4      81.4      81.4      0.0
Seeds                   620.7     539.0     605.6     605.6     605.6     0.0
Planning Relax          325.9     297.0     315.7     315.7     315.8     0.3
Connectionist Bench     312.6     259.1     280.6     280.1     280.6     0.3
Urban Land Cover        3.61e9    3.17e9    3.54e9    3.44e9    3.64e9    9.2
Parkinsons              1.36e6    1.36e6    1.36e6    1.36e6    1.36e6    15.1
Glass Identification    469.0     377.2     –         –         438.2     28.4

(UB = upper bound from the rounded solution, LB = lower bound from the relaxation.)

Performance on Synthetic Data (1/3)

§  Generate three clouds with 10, 20 and 70 datapoints, respectively.

§  The datapoints of each cloud are contained within a unit ball.

§  Vary the separation between the clouds.

§  Apply Rounded RLP, Rounded RSDP and Best-of-10 Bennett et al.
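A data generator in the spirit of this experiment might look as follows. The slides fix only the cloud sizes (10, 20 and 70) and the unit-ball containment. The dimension, the uniform sampling within each ball, the placement of the ball centers along the first axis and all function names are assumptions made for this sketch.

```python
import numpy as np

def sample_unit_ball(n, d, rng):
    """Draw n points uniformly from the d-dimensional unit ball."""
    directions = rng.normal(size=(n, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = rng.uniform(size=(n, 1)) ** (1.0 / d)
    return directions * radii

def three_clouds(separation, d=2, sizes=(10, 20, 70), seed=0):
    """Three clouds whose ball centers are `separation` apart along the first axis (assumed layout)."""
    rng = np.random.default_rng(seed)
    points, labels = [], []
    for k, n in enumerate(sizes):
        center = np.zeros(d)
        center[0] = k * separation
        points.append(center + sample_unit_ball(n, d, rng))
        labels += [k] * n
    return np.vstack(points), np.array(labels)

X, labels = three_clouds(separation=3.0)
```

Increasing `separation` moves the clouds apart, which corresponds to varying the separation between the clouds in the experiment.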

Performance on Synthetic Data (2/3)

Performance on Synthetic Data (3/3)

Figure panels: Best-of-10 Bennett et al.; Rounded RLP.

Performance on Outlier Detection

§  Consider the Breast Cancer Wisconsin (Diagnostic) dataset in the UCI Machine Learning Repository.

§  357 benign (considered to be the cluster), and 212 malignant (considered to be the outliers).

§  Vary number of outliers and apply Rounded RLP.

§  Optimality gap always below 3%.

MILP Reformulation, Conic Relaxations → lower bound on objective

Rounding Algorithm and Recovery Guarantees → feasible clustering that can be optimal

Numerical Experiments → proof of concept

Thank you & References

§  Arthur, D., S. Vassilvitskii. 2006. How slow is the k-means method? Symposium on Computational Geometry. 144-153.

§  Awasthi, P., A. Bandeira, M. Charikar, R. Krishnaswamy, S. Villar, R. Ward. 2015. Relax, no need to round: Integrality of clustering formulations. Conference on Innovations in Theoretical Computer Science. 191-200.

§  Bennett, K., P. Bradley, A. Demiriz. 2000. Constrained K-means clustering. Technical Report, Microsoft Research.

§  Chawla, S., A. Gionis. 2013. k-means--: A unified approach to clustering and outlier detection. SIAM International Conference on Data Mining. 189-197.

§  Elhamifar, E., G. Sapiro, R. Vidal. 2012. Finding exemplars from pairwise dissimilarities via simultaneous sparse recovery. Advances in Neural Information Processing Systems 25. 19-27.

§  Lloyd, S. 1982. Least squares quantization in PCM. IEEE Transactions on Information Theory 28(2) 129-137.

§  Nellore, A., R. Ward. 2015. Recovery guarantees for exemplar-based clustering. Information and Computation 245 165-180.

§  Ott, L., L. Pang, F. Ramos, S. Chawla. 2014. On integrated clustering and outlier detection. Advances in Neural Information Processing Systems 27. 1359-1367.

§  Rujeerapaiboon, N., K. Schindler, D. Kuhn, W. Wiesemann. 2017. Size matters: Cardinality-constrained clustering and outlier detection via conic optimization. Optimization Online.

§  Zha, H., X. He, C. Ding, H. Simon, M. Gu. 2002. Spectral relaxation for K-means clustering. Advances in Neural Information Processing Systems 14. 1057-1064.