Cardinality-Constrained Clustering and Outlier Detection via Conic Optimization
Computational Management Science 2017, Bergamo, June 1, 2017

Presented by Kilian Schindler, École Polytechnique Fédérale de Lausanne

Joint work with Napat Rujeerapaiboon and Daniel Kuhn (École Polytechnique Fédérale de Lausanne) and Wolfram Wiesemann (Imperial College London)
K-means Clustering
Standard K-means clustering formulation (data points $\xi_i \in \mathbb{R}^d$, cluster centroids $\zeta_k$, binary assignments $\pi_i^k$):

$$
\begin{array}{ll}
\text{min.} & \sum_{k=1}^{K} \sum_{i=1}^{N} \pi_i^k \, \lVert \xi_i - \zeta_k \rVert^2 \\[2pt]
\text{s.t.} & \zeta_k \in \mathbb{R}^d, \; \pi_i^k \in \{0,1\}, \\
& \sum_{k=1}^{K} \pi_i^k = 1 \quad \forall i.
\end{array}
$$
The sequential approach of Lloyd (1982) alternates between two steps:

Step 1: Fix $\{\zeta_k\}$ and solve

$$
\begin{array}{ll}
\text{min.} & \sum_{k=1}^{K} \sum_{i=1}^{N} \pi_i^k \, \lVert \xi_i - \zeta_k \rVert^2 \\[2pt]
\text{s.t.} & \pi_i^k \in \{0,1\}, \; \sum_{k=1}^{K} \pi_i^k = 1 \quad \forall i.
\end{array}
$$

The assignment constraints are totally unimodular, so the binary restriction can be relaxed without changing the optimum: each point is simply assigned to its nearest centroid.

Step 2: Fix $\{\pi_i^k\}$ and solve

$$
\begin{array}{ll}
\text{min.} & \sum_{k=1}^{K} \sum_{i=1}^{N} \pi_i^k \, \lVert \xi_i - \zeta_k \rVert^2 \\[2pt]
\text{s.t.} & \zeta_k \in \mathbb{R}^d.
\end{array}
$$

The optimal $\zeta_k$ is the average of the points in cluster $k$.
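To make the alternation concrete, here is a minimal numpy sketch of Lloyd's scheme (an illustration under our own naming, not the presenters' code):

```python
import numpy as np

def lloyd_kmeans(xi, K, iters=100, seed=0):
    """Minimal sketch of Lloyd's alternating scheme. xi has shape (N, d)."""
    rng = np.random.default_rng(seed)
    N, d = xi.shape
    # Initialize centroids zeta_k at K randomly chosen data points.
    zeta = xi[rng.choice(N, size=K, replace=False)]
    for _ in range(iters):
        # Step 1: assign each point to its nearest centroid
        # (the relaxed, totally unimodular assignment problem).
        dists = ((xi[:, None, :] - zeta[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Step 2: each centroid becomes the average of its cluster.
        new_zeta = np.array([
            xi[labels == k].mean(axis=0) if np.any(labels == k) else zeta[k]
            for k in range(K)
        ])
        if np.allclose(new_zeta, zeta):
            break
        zeta = new_zeta
    return labels, zeta
```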
Cardinality-Constrained K-means Clustering

Cardinality-constrained K-means clustering formulation, where cluster $k$ must contain exactly $n_k$ points (e.g., $n_k = 3$):

$$
\begin{array}{ll}
\text{min.} & \sum_{k=1}^{K} \sum_{i=1}^{N} \pi_i^k \, \lVert \xi_i - \zeta_k \rVert^2 \\[2pt]
\text{s.t.} & \zeta_k \in \mathbb{R}^d, \; \pi_i^k \in \{0,1\}, \\
& \sum_{k=1}^{K} \pi_i^k = 1 \quad \forall i, \\
& \sum_{i=1}^{N} \pi_i^k = n_k \quad \forall k.
\end{array}
$$

The sequential approach of Bennett et al. (2000) alternates in the same way:

Step 1: Fix $\{\zeta_k\}$ and solve

$$
\begin{array}{ll}
\text{min.} & \sum_{k=1}^{K} \sum_{i=1}^{N} \pi_i^k \, \lVert \xi_i - \zeta_k \rVert^2 \\[2pt]
\text{s.t.} & \pi_i^k \in \{0,1\}, \; \sum_{k=1}^{K} \pi_i^k = 1 \; \forall i, \; \sum_{i=1}^{N} \pi_i^k = n_k \; \forall k.
\end{array}
$$

The constraint matrix remains totally unimodular, so this assignment problem can still be solved as a linear program.

Step 2: Fix $\{\pi_i^k\}$ and solve

$$
\begin{array}{ll}
\text{min.} & \sum_{k=1}^{K} \sum_{i=1}^{N} \pi_i^k \, \lVert \xi_i - \zeta_k \rVert^2 \\[2pt]
\text{s.t.} & \zeta_k \in \mathbb{R}^d.
\end{array}
$$

The optimal $\zeta_k$ is again the average of cluster $k$.
Motivation for Balanced Clustering

Applications in which clusters of prescribed sizes are desirable include market segmentation, distributed computing, and document clustering.
Motivation for Outlier Detection

Suppose we wanted to find three (balanced) clusters in the following dataset...

[Figure: standard k-means (objective = 25.21) vs. balanced k-means (objective = 54.27) on a dataset containing a few aberrant points.]

But if we could also specify a number of outliers to be removed, we would obtain balanced k-means with outlier detection (objective = 1.97).
Outline
§ MILP Reformulation and Conic Relaxations
§ Rounding Algorithm and Recovery Guarantees
§ Numerical Experiments
Auxiliary Lemma

Lemma (Zha et al., 2002). For vectors $\xi_1, \ldots, \xi_n \in \mathbb{R}^d$, we have that

$$
\sum_{i=1}^{n} \Big\lVert \xi_i - \tfrac{1}{n} \sum_{j=1}^{n} \xi_j \Big\rVert^2
= \frac{1}{2n} \sum_{i,j=1}^{n} \lVert \xi_i - \xi_j \rVert^2 .
$$

In words: the sum of squared distances to the centroid $\zeta = \tfrac{1}{n} \sum_{j=1}^{n} \xi_j$ equals the scaled sum of all pairwise squared distances, which lets us eliminate the centroid variables.
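As a quick numerical sanity check of the identity (a minimal numpy sketch; all names are ours):

```python
import numpy as np

rng = np.random.default_rng(42)
xi = rng.normal(size=(7, 3))  # n = 7 points in R^3
n = xi.shape[0]

# Left-hand side: sum of squared distances to the centroid.
centroid = xi.mean(axis=0)
lhs = ((xi - centroid) ** 2).sum()

# Right-hand side: scaled sum of all pairwise squared distances.
diffs = xi[:, None, :] - xi[None, :, :]
rhs = (diffs ** 2).sum() / (2 * n)

assert np.isclose(lhs, rhs)  # the identity of Zha et al. (2002)
```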
MILP Reformulation

Start from the cardinality-constrained formulation:

$$
\begin{array}{ll}
\text{min.} & \sum_{k=1}^{K} \sum_{i=1}^{N} \pi_i^k \, \lVert \xi_i - \zeta_k \rVert^2 \\[2pt]
\text{s.t.} & \zeta_k \in \mathbb{R}^d, \; \pi_i^k \in \{0,1\}, \\
& \sum_{k=1}^{K} \pi_i^k = 1 \; \forall i, \quad \sum_{i=1}^{N} \pi_i^k = n_k \; \forall k.
\end{array}
$$

Since the optimal $\zeta_k$ is the average of cluster $k$, applying the Lemma of Zha et al. (2002) with $d_{ij} = \lVert \xi_i - \xi_j \rVert^2$ rewrites the objective as

$$
\sum_{k=1}^{K} \sum_{i=1}^{N} \pi_i^k \Big\lVert \xi_i - \tfrac{1}{n_k} \sum_{j=1}^{N} \pi_j^k \xi_j \Big\rVert^2
= \sum_{k=1}^{K} \frac{1}{2 n_k} \sum_{i,j=1}^{N} \pi_i^k \pi_j^k \lVert \xi_i - \xi_j \rVert^2
= \frac{1}{2} \sum_{k=1}^{K} \frac{1}{n_k} \sum_{i,j=1}^{N} \pi_i^k \pi_j^k \, d_{ij} .
$$

Introducing epigraphical variables $\eta_{ij}^k$ to linearize the products $\pi_i^k \pi_j^k$ yields the MILP

$$
\begin{array}{ll}
\text{min.} & \frac{1}{2} \sum_{k=1}^{K} \frac{1}{n_k} \sum_{i,j=1}^{N} \eta_{ij}^k \, d_{ij} \\[2pt]
\text{s.t.} & \eta_{ij}^k \in \mathbb{R}_+, \; \pi_i^k \in \{0,1\}, \\
& \sum_{k=1}^{K} \pi_i^k = 1 \; \forall i, \quad \sum_{i=1}^{N} \pi_i^k = n_k \; \forall k, \\
& \eta_{ij}^k \ge \pi_i^k + \pi_j^k - 1 \; \forall i, j, k.
\end{array}
\qquad (\text{P})
$$

Since $d_{ij} \ge 0$ and the objective is minimized, the epigraph constraint forces $\eta_{ij}^k = \pi_i^k \pi_j^k$ at optimality.
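For illustration, the MILP (P) can be encoded with an off-the-shelf modeling library. The following PuLP sketch is our own encoding, not code from the talk:

```python
from itertools import product
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, PULP_CBC_CMD

def card_kmeans_milp(d, n_k):
    """Sketch of the MILP (P). d: N x N matrix of squared pairwise
    distances; n_k: list of prescribed cluster sizes (sums to N)."""
    N, K = len(d), len(n_k)
    prob = LpProblem("card_constrained_kmeans", LpMinimize)
    pi = LpVariable.dicts("pi", (range(K), range(N)), cat="Binary")
    eta = LpVariable.dicts("eta", (range(K), range(N), range(N)), lowBound=0)
    # Objective: (1/2) sum_k (1/n_k) sum_{i,j} eta[k][i][j] * d[i][j]
    prob += 0.5 * lpSum(eta[k][i][j] * d[i][j] / n_k[k]
                        for k, i, j in product(range(K), range(N), range(N)))
    for i in range(N):                      # each point in exactly one cluster
        prob += lpSum(pi[k][i] for k in range(K)) == 1
    for k in range(K):                      # cluster k has exactly n_k points
        prob += lpSum(pi[k][i] for i in range(N)) == n_k[k]
    for k, i, j in product(range(K), range(N), range(N)):
        prob += eta[k][i][j] >= pi[k][i] + pi[k][j] - 1   # epigraph constraint
    prob.solve(PULP_CBC_CMD(msg=False))
    return pi, prob
```

Note that the encoding has on the order of $KN^2$ epigraphical variables, which motivates the cheaper conic relaxations discussed next.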
Towards an SDP Relaxation (1/4)

Apply the transformation $x_i^k = 2\pi_i^k - 1$ to obtain spin variables $x_i^k \in \{-1,+1\}$. Substituting $\pi_i^k = \tfrac{1}{2}(1 + x_i^k)$ into the pairwise form of the objective gives

$$
\begin{array}{ll}
\text{min.} & \frac{1}{8} \sum_{k=1}^{K} \frac{1}{n_k} \sum_{i,j=1}^{N} (1 + x_i^k)(1 + x_j^k) \, d_{ij} \\[2pt]
\text{s.t.} & x_i^k \in \{-1,+1\}, \\
& \sum_{k=1}^{K} x_i^k = 2 - K \; \forall i, \quad \sum_{i=1}^{N} x_i^k = 2 n_k - N \; \forall k.
\end{array}
$$

Define $m_{ij}^k = x_i^k x_j^k$ and notice that

$$
(1 + x_i^k)(1 + x_j^k) = 1 + x_i^k + x_j^k + m_{ij}^k .
$$
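The new $\tfrac{1}{8}$ factor comes from the substitution $\pi_i^k = \tfrac{1}{2}(1 + x_i^k)$ (our own one-line check of the step above):

$$
\pi_i^k \pi_j^k = \tfrac{1}{4} (1 + x_i^k)(1 + x_j^k)
\quad\Longrightarrow\quad
\frac{1}{2} \sum_{k=1}^{K} \frac{1}{n_k} \sum_{i,j=1}^{N} \pi_i^k \pi_j^k \, d_{ij}
= \frac{1}{8} \sum_{k=1}^{K} \frac{1}{n_k} \sum_{i,j=1}^{N} (1 + x_i^k)(1 + x_j^k) \, d_{ij} .
$$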
Towards an SDP Relaxation (2/4)

$$
\begin{array}{ll}
\text{min.} & \frac{1}{8} \sum_{k=1}^{K} \frac{1}{n_k} \sum_{i,j=1}^{N} (1 + x_i^k + x_j^k + m_{ij}^k) \, d_{ij} \\[2pt]
\text{s.t.} & x_i^k \in \{-1,+1\}, \; m_{ij}^k \in \mathbb{R}, \\
& m_{ij}^k = x_i^k x_j^k \; \forall i, j, k, \\
& \sum_{k=1}^{K} x_i^k = 2 - K \; \forall i, \quad \sum_{i=1}^{N} x_i^k = 2 n_k - N \; \forall k.
\end{array}
$$

Switching to matrix notation with $[M^k]_{ij} = m_{ij}^k$ and $[x^k]_i = x_i^k$, this becomes

$$
\begin{array}{ll}
\text{min.} & \frac{1}{8} \Big\langle D, \; \sum_{k=1}^{K} \frac{1}{n_k} \big( M^k + \mathbf{1}\mathbf{1}^\top + x^k \mathbf{1}^\top + \mathbf{1} (x^k)^\top \big) \Big\rangle \\[2pt]
\text{s.t.} & x^k \in \{-1,+1\}^N, \; M^k \in \mathbb{S}^N, \\
& M^k = x^k (x^k)^\top \; \forall k, \\
& \sum_{k=1}^{K} x^k = (2 - K) \mathbf{1}, \quad \mathbf{1}^\top x^k = 2 n_k - N \; \forall k,
\end{array}
$$

where $D = [d_{ij}]$ and $\langle \cdot, \cdot \rangle$ denotes the trace inner product.
Towards an SDP Relaxation (3/4)

The constraints $x^k \in \{-1,+1\}^N$ and $M^k = x^k (x^k)^\top$ imply the following valid (in)equalities, understood entrywise:

$$
\begin{aligned}
M^k + \mathbf{1}\mathbf{1}^\top + x^k \mathbf{1}^\top + \mathbf{1}(x^k)^\top &= (\mathbf{1} + x^k)(\mathbf{1} + x^k)^\top \ge 0 && \forall k \\
M^k + \mathbf{1}\mathbf{1}^\top - x^k \mathbf{1}^\top - \mathbf{1}(x^k)^\top &= (\mathbf{1} - x^k)(\mathbf{1} - x^k)^\top \ge 0 && \forall k \\
M^k - \mathbf{1}\mathbf{1}^\top + x^k \mathbf{1}^\top - \mathbf{1}(x^k)^\top &= -(\mathbf{1} - x^k)(\mathbf{1} + x^k)^\top \le 0 && \forall k \\
M^k - \mathbf{1}\mathbf{1}^\top - x^k \mathbf{1}^\top + \mathbf{1}(x^k)^\top &= -(\mathbf{1} + x^k)(\mathbf{1} - x^k)^\top \le 0 && \forall k \\
\operatorname{diag}(M^k) &= \mathbf{1} && \forall k \\
M^k \mathbf{1} = x^k (x^k)^\top \mathbf{1} &= (2 n_k - N) \, x^k && \forall k
\end{aligned}
$$
Towards an SDP Relaxation (4/4)

Appending these redundant constraints to the exact formulation gives

$$
\begin{array}{ll}
\text{min.} & \frac{1}{8} \Big\langle D, \; \sum_{k=1}^{K} \frac{1}{n_k} \big( M^k + \mathbf{1}\mathbf{1}^\top + x^k \mathbf{1}^\top + \mathbf{1}(x^k)^\top \big) \Big\rangle \\[2pt]
\text{s.t.} & x^k \in \{-1,+1\}^N, \; M^k \in \mathbb{S}^N, \\
& M^k = x^k (x^k)^\top \; \forall k, \\
& \sum_{k=1}^{K} x^k = (2-K)\mathbf{1}, \quad \mathbf{1}^\top x^k = 2 n_k - N \; \forall k, \\
& M^k \mathbf{1} = (2 n_k - N) x^k \; \forall k, \quad \operatorname{diag}(M^k) = \mathbf{1} \; \forall k, \\
& M^k + \mathbf{1}\mathbf{1}^\top + x^k \mathbf{1}^\top + \mathbf{1}(x^k)^\top \ge 0 \; \forall k, \\
& M^k + \mathbf{1}\mathbf{1}^\top - x^k \mathbf{1}^\top - \mathbf{1}(x^k)^\top \ge 0 \; \forall k, \\
& M^k - \mathbf{1}\mathbf{1}^\top + x^k \mathbf{1}^\top - \mathbf{1}(x^k)^\top \le 0 \; \forall k, \\
& M^k - \mathbf{1}\mathbf{1}^\top - x^k \mathbf{1}^\top + \mathbf{1}(x^k)^\top \le 0 \; \forall k.
\end{array}
$$

The SDP relaxation now replaces the nonconvex constraints $M^k = x^k (x^k)^\top$ and $x^k \in \{-1,+1\}^N$ by $M^k \succeq x^k (x^k)^\top$ and $x^k \in [-1,+1]^N$. The additional constraints, while redundant in the exact formulation, may play a role in the relaxation (Awasthi et al., 2015).
SDP Relaxation

$$
\begin{array}{ll}
\text{min.} & \frac{1}{8} \Big\langle D, \; \sum_{k=1}^{K} \frac{1}{n_k} \big( M^k + \mathbf{1}\mathbf{1}^\top + x^k \mathbf{1}^\top + \mathbf{1}(x^k)^\top \big) \Big\rangle \\[2pt]
\text{s.t.} & x^k \in [-1,+1]^N, \; M^k \in \mathbb{S}^N, \\
& M^k \succeq x^k (x^k)^\top \; \forall k, \\
& \sum_{k=1}^{K} x^k = (2-K)\mathbf{1}, \quad \mathbf{1}^\top x^k = 2 n_k - N \; \forall k, \\
& M^k \mathbf{1} = (2 n_k - N) x^k \; \forall k, \quad \operatorname{diag}(M^k) = \mathbf{1} \; \forall k, \\
& M^k + \mathbf{1}\mathbf{1}^\top + x^k \mathbf{1}^\top + \mathbf{1}(x^k)^\top \ge 0 \; \forall k, \\
& M^k + \mathbf{1}\mathbf{1}^\top - x^k \mathbf{1}^\top - \mathbf{1}(x^k)^\top \ge 0 \; \forall k, \\
& M^k - \mathbf{1}\mathbf{1}^\top + x^k \mathbf{1}^\top - \mathbf{1}(x^k)^\top \le 0 \; \forall k, \\
& M^k - \mathbf{1}\mathbf{1}^\top - x^k \mathbf{1}^\top + \mathbf{1}(x^k)^\top \le 0 \; \forall k.
\end{array}
\qquad (\text{RSDP})
$$

LP Relaxation

Dropping the semidefinite constraint $M^k \succeq x^k (x^k)^\top$ from (RSDP) while keeping all linear constraints yields the weaker but cheaper LP relaxation (RLP).
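As a sketch of how (RSDP) might be prototyped, here is one possible CVXPY encoding (our own, not the presenters' code); the constraint $M^k \succeq x^k (x^k)^\top$ is expressed through a Schur-complement lifting:

```python
import cvxpy as cp
import numpy as np

def rsdp_bound(D, n_k):
    """Minimal CVXPY sketch of the SDP relaxation (RSDP).
    D: N x N squared-distance matrix; n_k: list of cluster sizes."""
    N, K = D.shape[0], len(n_k)
    one = np.ones((N, 1))
    J = one @ one.T                                  # the all-ones matrix 11^T
    xs = [cp.Variable((N, 1)) for _ in range(K)]
    Ms = [cp.Variable((N, N), symmetric=True) for _ in range(K)]
    Ws, cons = [], [sum(xs) == (2 - K) * one]
    for k in range(K):
        x, M = xs[k], Ms[k]
        W = M + J + x @ one.T + one @ x.T
        Ws.append(W / n_k[k])
        cons += [
            x >= -1, x <= 1,
            cp.sum(x) == 2 * n_k[k] - N,
            M @ one == (2 * n_k[k] - N) * x,
            cp.diag(M) == 1,
            W >= 0,                                  # entrywise constraints
            M + J - x @ one.T - one @ x.T >= 0,
            M - J + x @ one.T - one @ x.T <= 0,
            M - J - x @ one.T + one @ x.T <= 0,
        ]
        # M >> x x^T via the Schur complement [[M, x], [x^T, 1]] >> 0.
        Z = cp.Variable((N + 1, N + 1), PSD=True)
        cons += [Z[:N, :N] == M, Z[:N, N:] == x, Z[N, N] == 1]
    obj = cp.Minimize(cp.sum(cp.multiply(D, sum(Ws))) / 8)
    prob = cp.Problem(obj, cons)
    prob.solve(solver=cp.SCS)
    return prob.value, [x.value for x in xs]
```

Dropping the Schur-complement block (the lines defining Z) gives the corresponding LP relaxation (RLP).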
Relaxation Theorem

Theorem: $\min \text{RLP} \le \min \text{RSDP} \le \min \text{P}$.

§ We can obtain lower bounds on the objective of the cardinality-constrained K-means clustering problem in polynomial time.
§ Lloyd's algorithm gives no lower bounds and is not guaranteed to terminate in polynomial time (Arthur and Vassilvitskii, 2006).
§ Can we recover a feasible solution (and thus an upper bound)?

Outline recap
§ MILP Reformulation and Conic Relaxations → lower bound on the objective
§ Rounding Algorithm and Recovery Guarantees (next)
§ Numerical Experiments
Rounding Algorithm

Step 1: Solve RSDP or RLP and record the optimal $x^1, \ldots, x^K \in \mathbb{R}^N$.

Step 2: Solve the (totally unimodular) linear assignment problem

$$
\begin{array}{ll}
\text{max.} & \sum_{k=1}^{K} \sum_{i=1}^{N} \pi_i^k \, x_i^k \\[2pt]
\text{s.t.} & \pi_i^k \in \{0,1\}, \; \sum_{k=1}^{K} \pi_i^k = 1 \; \forall i, \; \sum_{i=1}^{N} \pi_i^k = n_k \; \forall k
\end{array}
$$

to obtain an assignment $\{\pi_i^k\}$ that is feasible in P. Intuitively, an entry $x_i^k = +1$ in the relaxation indicates that point $i$ should be assigned to cluster $k$.

Note: All of the above problems can be solved in polynomial time.
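The assignment step can be solved exactly with an off-the-shelf matching routine by expanding each cluster $k$ into $n_k$ interchangeable "slots" (a sketch with names of our choosing):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def round_assignment(x, n_k):
    """Round relaxed scores x (shape K x N) to a feasible assignment.
    Each cluster k is expanded into n_k slots, turning the
    cardinality-constrained problem into a square linear assignment."""
    K, N = x.shape
    assert sum(n_k) == N
    slot_cluster = np.repeat(np.arange(K), n_k)      # slot r belongs to this cluster
    # Maximize total score = minimize negated score over (slot, point) pairs.
    cost = -x[slot_cluster, :]                       # (N slots) x (N points)
    rows, cols = linear_sum_assignment(cost)
    labels = np.empty(N, dtype=int)
    labels[cols] = slot_cluster[rows]
    return labels                                    # labels[i] = cluster of point i
```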
Perfect Separation for Balanced Clustering

Assume all clusters have the same size $n$. Perfect separation requires the maximum distance within clusters to be smaller than the minimum distance between clusters:

$$
\exists \{I_k\}, \; |I_k| = n \; \forall k, \quad
\max_{k} \, \max_{i, j \in I_k} d_{ij} \; < \; \min_{k \ne \ell} \, \min_{i \in I_k, \, j \in I_\ell} d_{ij} .
$$

This condition is also used in Elhamifar et al., 2012, and Nellore and Ward, 2015.
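For a candidate partition, the condition is straightforward to check numerically (a small sketch; names are ours):

```python
import numpy as np

def perfectly_separated(d, clusters):
    """Check the perfect-separation condition for a given partition.
    d: N x N squared-distance matrix; clusters: list of index arrays I_k."""
    max_within = max(d[np.ix_(I, I)].max() for I in clusters)
    min_between = min(d[np.ix_(I, J)].min()
                      for a, I in enumerate(clusters)
                      for b, J in enumerate(clusters) if a != b)
    return max_within < min_between
```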
Recovery Theorem for Balanced Clustering

Theorem: Under perfect separation, $\min \text{RLP} = \min \text{RSDP} = \min \text{P}$.

Proof idea:
§ Derive a lower bound on $\min \text{RLP}$ from its own constraints.
§ Show that under perfect separation this bound is attainable in P.
Proof of Recovery Theorem for Balanced Clustering (1/2)

Consider RLP in the balanced case, where $n_k = n$ for all $k$ and $N = Kn$, with feasible set

$$
\begin{array}{ll}
& x^k \in [-1,+1]^N, \; M^k \in \mathbb{S}^N, \\
& \sum_{k=1}^{K} x^k = (2-K)\mathbf{1}, \quad \mathbf{1}^\top x^k = 2n - N \; \forall k, \\
& M^k \mathbf{1} = (2n - N) x^k \; \forall k, \quad \operatorname{diag}(M^k) = \mathbf{1} \; \forall k, \\
& M^k + \mathbf{1}\mathbf{1}^\top \pm \big( x^k \mathbf{1}^\top + \mathbf{1}(x^k)^\top \big) \ge 0 \; \forall k, \\
& M^k - \mathbf{1}\mathbf{1}^\top \pm \big( x^k \mathbf{1}^\top - \mathbf{1}(x^k)^\top \big) \le 0 \; \forall k.
\end{array}
$$

Define $W^k = M^k + \mathbf{1}\mathbf{1}^\top + x^k \mathbf{1}^\top + \mathbf{1}(x^k)^\top$, with entries $w_{ij}^k = m_{ij}^k + 1 + x_i^k + x_j^k$. Since $d_{ii} = 0$ and $w_{ij}^k \ge 0$ by the entrywise constraints, the RLP objective is a weighted sum of non-negative terms:

$$
\frac{1}{8n} \Big\langle D, \sum_{k=1}^{K} \big( M^k + \mathbf{1}\mathbf{1}^\top + x^k \mathbf{1}^\top + \mathbf{1}(x^k)^\top \big) \Big\rangle
= \frac{1}{8n} \Big\langle D, \sum_{k=1}^{K} W^k \Big\rangle
= \frac{1}{8n} \sum_{i \ne j} d_{ij} \Big( \sum_{k=1}^{K} w_{ij}^k \Big).
$$

Bounds on the individual weights: using $m_{ij}^k \le 1$ (implied by the entrywise $\le 0$ constraints) and $\sum_{k=1}^{K} x_i^k = 2 - K$,

$$
0 \le \sum_{k=1}^{K} w_{ij}^k = \sum_{k=1}^{K} \big( m_{ij}^k + 1 + x_i^k + x_j^k \big)
\le \sum_{k=1}^{K} (1 + 1) + (2 - K) + (2 - K) = 4.
$$

Restriction on the total weight: using $M^k \mathbf{1} = (2n-N) x^k$, $\operatorname{diag}(M^k) = \mathbf{1}$ and $\mathbf{1}^\top x^k = 2n - N$,

$$
\sum_{i \ne j} \sum_{k=1}^{K} w_{ij}^k
= \sum_{k=1}^{K} \sum_{i \ne j} \big( m_{ij}^k + 1 + x_i^k + x_j^k \big)
= \sum_{k=1}^{K} \Big( (2n - N)^2 - N + N(N - 1) + 2(N - 1)(2n - N) \Big)
= 4 K n (n - 1).
$$
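The final arithmetic step, with $N = Kn$, can be checked by completing the square (our own expansion of the slide's computation):

$$
\begin{aligned}
(2n - N)^2 + 2(N-1)(2n - N) + N(N-1) - N
&= \big[ (2n - N) + (N - 1) \big]^2 - (N-1)^2 + N^2 - 2N \\
&= (2n - 1)^2 - 1 \\
&= 4n(n-1),
\end{aligned}
$$

so summing over the $K$ clusters gives $4Kn(n-1)$.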
Proof of Recovery Theorem for Balanced Clustering (2/2)

§ Bounds on individual weights: $0 \le \sum_{k=1}^{K} w_{ij}^k \le 4$.
§ Restriction on total weight: $\sum_{i \ne j} \sum_{k=1}^{K} w_{ij}^k = 4 K n (n - 1)$.
§ Lower bound on the non-negative weighted sum:

$$
\frac{1}{8n} \sum_{i \ne j} d_{ij} \Big( \sum_{k=1}^{K} w_{ij}^k \Big)
\;\ge\; \frac{1}{2n} \Big\{ \text{sum of the } K n (n-1) \text{ smallest } d_{ij} \text{ with } i \ne j \Big\}.
$$

§ Under perfect separation, this lower bound is attainable in P: the $K n (n-1)$ smallest pairwise distances are exactly the within-cluster distances, so the clustering that sets $\pi_i^k = \pi_j^k = 1$ for $i, j \in I_k$ and $\pi_\ell^k = 0$ for $\ell \notin I_k$ achieves it.
Simultaneous Clustering and Outlier Detection

§ Outliers can be dealt with by introducing an additional dummy cluster.
§ This dummy cluster is not penalized in the objective function, but it has to fulfill appropriate cardinality constraints (its size equals the prescribed number of outliers).
§ The MILP reformulation, the SDP/LP relaxations, and the recovery guarantee are still available.

A similar approach is taken in Chawla and Gionis, 2013, and Ott et al., 2014.

Outline recap
§ MILP Reformulation and Conic Relaxations → lower bound on the objective
§ Rounding Algorithm and Recovery Guarantees → feasible clustering that can be optimal
§ Numerical Experiments (next)
Performance on Real-World Data

Consider classification datasets in the UCI Machine Learning Repository with
§ 150-300 datapoints
§ up to 200 attributes
§ no missing values

Perform classification by means of the following approaches:
§ Rounded RLP
§ Rounded RSDP
§ Best-of-10 Bennett et al.

                        Rounded RLP      Rounded RSDP     Bennett et al.
Dataset                 UB      LB       UB      LB       UB      CV (%)
Iris                    81.4    78.8     81.4    81.4     81.4    0.0
Seeds                   620.7   539.0    605.6   605.6    605.6   0.0
Planning Relax          325.9   297.0    315.7   315.7    315.8   0.3
Connectionist Bench     312.6   259.1    280.6   280.1    280.6   0.3
Urban Land Cover        3.61e9  3.17e9   3.54e9  3.44e9   3.64e9  9.2
Parkinsons              1.36e6  1.36e6   1.36e6  1.36e6   1.36e6  15.1
Glass Identification    469.0   377.2    –       –        438.2   28.4
Performance on Synthetic Data

§ Generate three clouds with 10, 20 and 70 datapoints, respectively.
§ The datapoints of each cloud are contained within a unit ball.
§ Vary the separation between the clouds.
§ Apply Rounded RLP, Rounded RSDP and Best-of-10 Bennett et al.

[Figures: clustering results as the separation between the clouds varies, comparing Best-of-10 Bennett et al. against Rounded RLP.]
Performance on Outlier Detection

§ Consider the Breast Cancer Wisconsin (Diagnostic) dataset in the UCI Machine Learning Repository.
§ 357 benign datapoints (considered to be the cluster) and 212 malignant datapoints (considered to be the outliers).
§ Vary the number of outliers and apply Rounded RLP.
§ The optimality gap is always below 3%.

Outline recap
§ MILP Reformulation and Conic Relaxations → lower bound on the objective
§ Rounding Algorithm and Recovery Guarantees → feasible clustering that can be optimal
§ Numerical Experiments → proof of concept
Thank you & References

§ Arthur, D., S. Vassilvitskii. 2006. How slow is the k-means method? Symposium on Computational Geometry. 144-153.
§ Awasthi, P., A. Bandeira, M. Charikar, R. Krishnaswamy, S. Villar, R. Ward. 2015. Relax, no need to round: Integrality of clustering formulations. Conference on Innovations in Theoretical Computer Science. 191-200.
§ Bennett, K., P. Bradley, A. Demiriz. 2000. Constrained K-means clustering. Technical Report, Microsoft Research.
§ Chawla, S., A. Gionis. 2013. k-means--: A unified approach to clustering and outlier detection. SIAM International Conference on Data Mining. 189-197.
§ Elhamifar, E., G. Sapiro, R. Vidal. 2012. Finding exemplars from pairwise dissimilarities via simultaneous sparse recovery. Advances in Neural Information Processing Systems 25. 19-27.
§ Lloyd, S. 1982. Least squares quantization in PCM. IEEE Transactions on Information Theory 28(2). 129-137.
§ Nellore, A., R. Ward. 2015. Recovery guarantees for exemplar-based clustering. Information and Computation 245. 165-180.
§ Ott, L., L. Pang, F. Ramos, S. Chawla. 2014. On integrated clustering and outlier detection. Advances in Neural Information Processing Systems 27. 1359-1367.
§ Rujeerapaiboon, N., K. Schindler, D. Kuhn, W. Wiesemann. 2017. Size matters: Cardinality-constrained clustering and outlier detection via conic optimization. Optimization Online.
§ Zha, H., X. He, C. Ding, H. Simon, M. Gu. 2002. Spectral relaxation for K-means clustering. Advances in Neural Information Processing Systems 14. 1057-1064.