
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 28, NO. 8, AUGUST 2019 3973

Simultaneous Subspace Clustering and Cluster Number Estimating Based on Triplet Relationship

Jie Liang, Jufeng Yang, Ming-Ming Cheng, Paul L. Rosin, and Liang Wang

Abstract— In this paper, we propose a unified framework to discover the number of clusters and group the data points into different clusters using subspace clustering simultaneously. Real data distributed in a high-dimensional space can be disentangled into a union of low-dimensional subspaces, which can benefit various applications. To explore such intrinsic structure, state-of-the-art subspace clustering approaches often optimize a self-representation problem among all samples, to construct a pairwise affinity graph for spectral clustering. However, a graph with pairwise similarities lacks robustness for segmentation, especially for samples which lie on the intersection of two subspaces. To address this problem, we design a hyper-correlation-based data structure termed as the triplet relationship, which reveals high relevance and local compactness among three samples. The triplet relationship can be derived from the self-representation matrix, and be utilized to iteratively assign the data points to clusters. Based on the triplet relationship, we propose a unified optimizing scheme to automatically calculate clustering assignments. Specifically, we optimize a model selection reward and a fusion reward by simultaneously minimizing the similarity of triplets from different clusters and maximizing the correlation of triplets from the same cluster. The proposed algorithm also automatically reveals the number of clusters and fuses groups to avoid over-segmentation. Extensive experimental results on both synthetic and real-world datasets validate the effectiveness and robustness of the proposed method.

Index Terms— Subspace clustering, triplet relationship, estimating the number of clusters, hyper-graph clustering.

I. INTRODUCTION

WITH the ability to disentangle the latent structure of data in an unsupervised manner [2]–[4], subspace clustering is

Manuscript received September 3, 2018; revised February 12, 2019; accepted February 19, 2019. Date of publication March 6, 2019; date of current version June 20, 2019. This work was supported in part by NSFC under Grant 61876094, Grant 61620106008, and Grant 61572264, in part by the Natural Science Foundation of Tianjin, China, under Grant 18JCYBJC15400 and Grant 18ZXZNGX00110, in part by the Open Project Program of the National Laboratory of Pattern Recognition (NLPR), and in part by the Fundamental Research Funds for the Central Universities. This paper was presented at the AAAI [1]. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Peter Tay. (Corresponding author: Jufeng Yang.)

J. Liang, J. Yang, and M.-M. Cheng are with the College of Computer Science, Nankai University, Tianjin 300350, China (e-mail: [email protected]; [email protected]; [email protected]).

P. L. Rosin is with the School of Computer Science and Informatics, Cardiff University, Cardiff CF24 3AA, U.K. (e-mail: [email protected]).

L. Wang is with the National Laboratory of Pattern Recognition, CAS Center for Excellence in Brain Science and Intelligence Technology, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China (e-mail: [email protected]).

Digital Object Identifier 10.1109/TIP.2019.2903294

regarded as an important technique in the data mining community and for various computer vision applications [5]–[8]. Traditional subspace clustering methods approximate a set of high-dimensional data samples by a union of lower-dimensional linear subspaces [9], where each subspace usually contains a subset of the samples.

In the past two decades, spectral-based subspace clustering methods have achieved state-of-the-art performance by taking the following two-step framework. First, by optimizing a self-representation problem [4], [10], a similarity matrix (and hence a similarity graph) is constructed to depict the relationship (or connection) among samples. Second, spectral clustering [11], [12] is employed to calculate the final assignment based on an eigen-decomposition of the affinity graph. Note that in practice, both the number of subspaces and their dimensionalities are always unknown [4], [13]. Hence, the goals of subspace clustering include finding the appropriate number of clusters and grouping the data points into them [14], [15].

Nevertheless, it is challenging to estimate the number of clusters in a unified optimization framework, since the definition of clusters is subjective, especially in a high-dimensional ambient space [14]. In addition, samples that belong to different clusters but lie near the intersection of two subspaces may be even closer to each other than samples from the same cluster. This can lead to a wrong estimate with several redundant clusters, namely the over-segmentation problem. Therefore, most spectral-based subspace clustering algorithms depend on a manually given and fixed number of clusters, which cannot be generalized across applications [4].

Most clustering schemes group similar patterns into the same cluster by jointly minimizing the inter-cluster similarity and the intra-cluster dissimilarity [16]. Considering the complexity of the high-dimensional ambient data space, an effective way to estimate the number of clusters is to first map the raw samples into an intrinsic correlation space, namely the similarity matrix, followed by an iterative optimization according to the local and global similarity relationships derived from the projection. Elhamifar and Vidal [17] propose that the permuted similarity matrix can be block-diagonal, where the number of blocks is identical to the number of clusters. Moreover, Peng et al. [3] verify the intra-subspace projection dominance (IPD) of such a similarity matrix, which can be applied to self-representation optimizations with various kinds of regularizations. The IPD theory states that, given two arbitrary samples from the same subspace and one from another, the generated similarity between the former two is always larger than that between either of them and the latter in a noise-free system.

Accordingly, considering an affinity graph derived from the similarity matrix [18], [19], where the vertices denote data samples and the edge weights denote similarities, an automatic sub-graph segmentation can be greedily conducted via the following two steps inspired by density-based algorithms [20]: 1) constructing a proper number of initialized cluster centers by minimizing the weighted sum of all inter-cluster connections and maximizing the intra-cluster ones; 2) merging the remaining samples into an existing cluster by maximizing the weighted connections between the sample and the cluster.

Yet, there are also difficulties for these greedy iterative schemes. Since the ambient space can be very dense [21], two points which are close in terms of pairwise distance may not belong to the same subspace, especially for samples near the intersection of two subspaces. Consequently, hypergraphs, in which each edge can connect more than two samples [22], [23], have been proposed to overcome this limitation of traditional pairwise graphs. In this paper, we further introduce a novel data structure termed as the triplet relationship, to explore the local geometric structure of the projected space with hyper-correlations. Each triplet consists of three points and their correlations, which are considered as a meta-element for clustering. We require that all correlations are large enough, which indicates that the three points are strongly connected according to the IPD property [3].

In contrast to evaluating similarities using pairwise distances, the proposed triplet relationship demonstrates favorable performance due to the following two reasons. On one hand, it is more robust when partitioning the samples near the intersection of two subspaces, since the mutual relevance among multiple samples can provide complementary information when calculating the local segmentation. On the other hand, the triplet evokes a hyper-similarity by efficiently counting the frequency of intra-triplet samples, which enables a greedy way to calculate the assignments.

Based on the newly-defined triplet relationship, in this paper, we further propose a unified framework termed autoSC to jointly estimate the number of clusters and group the samples by exploring the local density derived from triplet relationships. Specifically, we first calculate the self-representation for each sample via an off-the-shelf optimization scheme, followed by extracting the triplet relationships for all samples. Then, we greedily initialize a proper number of clusters via optimizing a new model selection reward, which is achieved by maximizing the inter-cluster dissimilarity among triplets. Finally, we merge each of the remaining samples into an existing cluster to maximize the intra-cluster similarity by optimizing a new fusion reward. We also fuse groups to avoid over-segmentation.

The main contributions of this paper are summarized as follows:

• First, we define a hyper-dimensional triplet relationship which ensures a high relevance and density among three samples to reflect their local similarity. We also validate the effectiveness of triplets and distinguish them from the standard pairwise relation.

• Second, we design a unified framework, i.e., autoSC, based on the intrinsic geometrical structures depicted by our triplet relationships. The proposed autoSC can be used for simultaneously estimating the number of clusters and performing subspace clustering in a greedy way.

Extensive experiments on benchmark datasets indicate that our autoSC outperforms the state-of-the-art methods in both effectiveness and efficiency.

This paper is an extended version of our earlier conference paper [1], whose contributions we have enriched in the following five aspects: (1) We have added detailed analysis of the proposed algorithm to distinguish it from comparative methods; for example, we have added analysis and experimental validation of the computational complexity. (2) We have provided a visualized illustration of the proposed autoSC for clear presentation. (3) We have designed a relaxation termed the neighboring based autoSC (autoSC-N), which directly calculates the neighborhood relationship from the raw data space and is more efficient than autoSC. (4) We have conducted experiments evaluating the influence of the parameter m (the number of preserved neighbors for each sample). (5) We have experimentally evaluated our method on a real-world application, i.e., motion segmentation, which also demonstrates the benefits of the proposed method.

II. RELATED WORK

Automatically approximating samples in a high-dimensional ambient space by a union of low-dimensional linear subspaces is considered a crucial task in computer vision [24]–[32]. In this section, we review the related contributions from the following three aspects, i.e., self-representation calculation, estimating the number of clusters, and hyper-graph clustering.

A. Calculating Self-Representation

To separate a collection of data samples which are drawn from a high-dimensional space according to the latent low-dimensional structure, traditional self-expressiveness based subspace clustering methods calculate a linear representation for each sample using the remaining samples as a basis set or a dictionary [33], [34]. Subspace clustering assumes that the set of data samples are drawn from a union of multiple subspaces, which can best fit the ambient space [17]. There are numerous real applications satisfying this assumption with varying degrees of exactness [35], e.g., face recognition, motion segmentation, etc.

By solving an optimization problem with a self-representation loss and regularizations, subspace clustering [36], [37] calculates a similarity matrix where each entry indicates the relevance between two samples. Different regularization schemes with various norms of the similarity matrix, e.g., ℓ1 [9], ℓ2 [38], elastic net [39] or nuclear norm [40], can explore different intrinsic properties of the neighborhood space. There are mainly three types of regularization terms, including sparse-oriented, densely-connected and mixed norms.

Algorithms based on sparse-type norms [10], [41], e.g., the ℓ0 and ℓ1 norms, eliminate most of the non-zero values in the similarity matrix to ensure that there are no connections between samples from different clusters. Elhamifar and Vidal [9] propose the sparse representation based on ℓ1 norm optimization. The obtained similarity matrix recovers a sparse subspace representation but may not satisfy graph connectivity if the dimension of the subspace is greater than three [18]. In addition, ℓ0 based subspace clustering methods aim to compute a sparse and subspace-preserving representation for each data sample. Yang et al. [10] present a sparse clustering method with a regularizer based on the ℓ0 norm by using the proximal gradient descent method. Numerous alternative methods have been proposed for ℓ0 minimization while avoiding non-convex problems, e.g., orthogonal matching pursuit [42] and nearest subspace neighbor [43]. The scalable sparse subspace clustering by orthogonal matching pursuit (SSC-OMP) method [33] compares elements in each column of the dot product matrix to determine which positions of the similarity matrix should be non-zero. However, this general pairwise relationship does not reflect the sample correlation well, especially for data pairs in the intersection of two subspaces [44].

In contrast, dense connection based methods, such as smooth representation [38] with the ℓ2 norm and low rank representation with nuclear norm based methods [45], [46], propose to preserve many non-zero values in the similarity matrix to ensure the connectivity among intra-cluster samples [47]–[49]. For these densely connected frameworks [50], [51], the similarity matrix is interpreted as a projected representation of the raw samples. Each column of the matrix is considered as the self-representation of a sample, and should be dense for mapping invariance (also termed the grouping effect [38], [52]). Low-rank clustering methods [53], [54] solve a nuclear norm based optimization problem with the aim of generating a block-diagonal solution with dense connections. However, the nuclear norm does not enforce subset selection well when noise exists, and the self-representation is too dense to be an efficient feature.

Neither a sparse nor a dense similarity matrix reveals a comprehensive correlation structure among samples, due to their conflicting nature [55]–[57]. Consequently, to achieve a trade-off between sparsity and the grouping effect, numerous mixed norms, e.g., trace Lasso [58] and elastic net [39], have been integrated into the optimization function. Nevertheless, the structure of the data correlations depends on the data matrix, and the mixed norm is not effective for structure selection. Therefore, these methods do not perform consistently well across different applications.

Recently, many frameworks that incorporate various constraints into the optimization function have been proposed to detect different intrinsic properties of the subspace [40], [59], [60]. For instance, to handle sequential samples, Guo et al. [61] explore the neighboring relationship by incorporating a new penalty, i.e., a lower triangular matrix with −1 on the diagonal and 1 on the second diagonal, to force consecutive columns in the similarity matrix to be closer. In this paper, based on the intrinsic neighboring relevance and geometrical structures depicted in the similarity matrix, we calculate triplet relationships to form a hyper-correlation constraint for the clustering system. We validate the robustness of the proposed triplet relationship on top of different similarity matrices with various intrinsic properties.

B. Estimating the Number of Clusters

Most real applications in computer vision require estimating the number of clusters according to the latent distribution of data samples [9]. To solve this problem, three main techniques exist: singular-value based Laplacian matrix decomposition, density-based greedy assignment, and hyper-graph based segmentation.

Singular-value based Laplacian matrix decomposition is common in subspace clustering [62] and spectral clustering [63] due to the availability of the similarity matrix. Liu et al. [64] propose a heuristic estimator inspired by the block-diagonal structure of the similarity matrix [48]. Specifically, they estimate the number of clusters by counting the small singular values of a normalized Laplacian matrix which are smaller than a given cut-off threshold. These singular-value based methods [9], [65] depend on a large gap between singular values, which limits them to applications in which the subspaces are sparsely distributed in the ambient space. Meanwhile, the matrix decomposition process is time-consuming when extended to large-scale problems. Recently, Li et al. propose SCAMS [35], [66], which estimates the number of clusters by minimizing the rank of a binary relationship matrix encoding the pairwise relevance among all data samples. Simultaneously, they incorporate a penalty term on the clustering cost by minimizing the Frobenius inner product of the similarity matrix and the binary relationship matrix.

Density based methods [67] greedily discover both the optimal number of clusters and the assignments of data to the clusters according to the local and global densities, which are calculated from the pairwise distances in the ambient space. Rodriguez and Laio [20] automatically cluster samples based on the assumption that each cluster center is characterized by a higher density in the weight space than all its neighbors, while different centers should be far enough apart to avoid redundancy. Specifically, for each sample, its Euclidean-based local density and the distance to any points with higher densities are iteratively calculated and updated. In each iteration, the algorithm finds a trade-off between the density of cluster centers and the inter-cluster distance to update the assignments. Wang and Zhu [13] employ a Bayesian nonparametric method based on a Dirichlet process and propose DP-space, which exploits a trade-off between data fitness and model complexity. DP-space is more tolerant of noise and outliers than the alternative algebraic and geometric solutions. Recently, correlation clustering (CC) [68] first constructs an undirected graph with positive and negative edge weights, followed by minimizing the sum of cut weights during the segmenting process. Subsequently, the clustering assignments can be optimized with a greedy scheme. Nevertheless, most of these density based algorithms are limited to pairwise correlation when evaluating the similarity of data samples, which is not robust for densely distributed subspaces.


C. Hyper-Graph Clustering

To tackle the limitations of pairwise relation based methods, the hyper-graph relation [44], [69], [70] has been proposed, and the related literature follows two different directions. Some works transform the hyper-correlation into a simpler pairwise graph [22], [71], followed by a standard graph clustering method, e.g., normalized cut [12], to calculate the assignments. Other methods [14], [45] explore a generalized way of extending the pairwise graph to a hyper-graph or hyper-dimensional tensor analysis. For instance, Li et al. [14] propose a tensor affinity variant of SCAMS, i.e., SCAMSTA, which exploits a higher order mathematical structure by providing multiple groups of nodes in the binary matrix derived from an outer product operation on multiple indicator vectors. However, estimating the number of clusters from the rank of the affinity matrix only works well in the ideal case, and can hardly be extended to complex applications since noise can have a significant impact on the rank of the affinity matrix.

In this paper, we estimate the number of clusters by initializing the cluster centers with maximum inter-cluster dissimilarities and also maximum local densities. We calculate the initialization according to the local correlations reflected by the proposed triplet relationships, each of which depicts a hyper-dimensional similarity among three samples and easily-evaluated relevances to other triplets. Both the theoretical analysis of triplets and the experimental results demonstrate the effectiveness of the proposed method.

III. METHODOLOGY

A. Preliminary

The main notations in the manuscript and the corresponding descriptions are shown in Table I. Given a set of $N$ data samples $X = \{x_i \in \mathbb{R}^D\}_{i=1}^{N}$ lying in $K$ subspaces $\{S_i\}_{i=1}^{K}$, where $D$ denotes the dimensionality of each sample, spectral-based subspace clustering usually takes a two-step approach to calculate the clustering assignment. First, it learns a self-representation for each sample to disentangle the subspace structure. The algorithm then employs spectral clustering [12] on the learned similarity graph derived from $C$ for the final assignment. Note that in practice, both the number of subspaces $K$ and their dimensions $\{d_j\}_{j=1}^{K}$ are always unknown [4], [13]. Hence, the goals of subspace clustering include finding the appropriate $K$ and assigning data points into $K$ clusters [14], [17].
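For reference, the conventional second step described above can be sketched as follows, assuming a self-representation matrix C has already been learned and the number of clusters K is given. This is the baseline pipeline that autoSC is designed to replace; the symmetrization W = |C| + |C|ᵀ is a common convention rather than a prescription from this paper.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def baseline_spectral_subspace_clustering(C, K):
    """Baseline two-step pipeline: symmetrize the learned
    self-representation matrix C (N x N) into an affinity graph,
    then run spectral clustering with a *given* number of clusters K."""
    W = np.abs(C) + np.abs(C).T              # symmetric affinity graph
    model = SpectralClustering(n_clusters=K, affinity="precomputed",
                               assign_labels="kmeans", random_state=0)
    return model.fit_predict(W)              # one cluster label per sample
```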

In this paper, inspired by the block-diagonal structure of the similarity matrix [72], we propose to simultaneously estimate the number of clusters and assign samples to each cluster in a greedy manner. We design a novel meta-sample that we call a triplet relationship, followed by optimizing both a model selection reward and a fusion reward for clustering.

TABLE I: Summary of main notations in the manuscript and corresponding descriptions.

B. Learning the Self-Representation

To explore the neighboring relationship in the ambient space $X = \{x_i \in \mathbb{R}^D\}_{i=1}^{N}$, typical subspace clustering methods first optimize a linear representation of each data sample using the remaining dataset as a dictionary. Specifically, spectral-based subspace clustering calculates a similarity matrix $C \in \mathbb{R}^{N \times N}$ by solving a self-representation optimization problem as follows:

$$\min_{C} \; \mathcal{L}(\tilde{X}C, \tilde{X}) + \lambda \|C\|_{\xi}, \qquad (1)$$

where $\tilde{X}$ indicates the data matrix composed of the samples in $X$, $\mathcal{L}(\cdot,\cdot): \mathbb{R}^{N \times D} \times \mathbb{R}^{N \times D} \to \mathbb{R}^{+}$ denotes the reconstruction loss, $\lambda$ is the trade-off parameter, and $\|\cdot\|_{\xi}$ denotes the regularization term, where different $\xi$'s lead to various norms [4], [10], e.g., $\|C\|_1$ [9], [17], $\|C\|_*$ [48], [64], $\|C\|_F^2$ [52], or many kinds of mixed norms like trace Lasso [58] or elastic net [39].

The matrix $C$ in (1) can be explained as a new representation of $\tilde{X}$, where each sample $x_i \in \mathbb{R}^D$ is mapped to $c_i \in \mathbb{R}^N$. Furthermore, $C$ is a pairwise distance matrix where each entry $c_{ij}$ reflects the similarity between two samples $x_i$ and $x_j$. Nevertheless, the pairwise distance shows poor discriminative capacity when partitioning samples near the intersection of two subspaces. To handle this problem, in this paper, we explore a higher-dimensional similarity called the triplet relationship, which is based on a greedy combination of pairwise distances reflected by $C$.
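As a concrete illustration of (1), the sketch below solves the self-representation problem with the ℓ2 (Frobenius) regularizer, for which a closed-form solution exists. The paper plugs in off-the-shelf solvers such as LRR, CASS, LSR, SMR and ORGEN, so this LSR-style variant is only one possible instantiation, and zeroing the diagonal afterwards is a simplification of the $c_{jj} = 0$ constraint used later in (2).

```python
import numpy as np

def self_representation_l2(X, lam=0.1):
    """One instantiation of Eq. (1):
        min_C ||X - X C||_F^2 + lam * ||C||_F^2,
    with X a D x N data matrix whose columns are the samples.
    The ridge-type closed form is used, and the diagonal is zeroed
    afterwards as a simplification of the c_jj = 0 constraint."""
    _, N = X.shape
    G = X.T @ X                                   # N x N Gram matrix
    C = np.linalg.solve(G + lam * np.eye(N), G)   # (G + lam I)^{-1} G
    np.fill_diagonal(C, 0.0)
    return C
```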

C. Discovering Triplet Relationships

Given the similarity matrix $C$, where each entry $c_{ij}$ reflects the pairwise relationship between $x_i$ and $x_j$, we propose to find the neighboring structure in a greedy way. For each data sample $x_j \in S$, subspace clustering algorithms calculate a projected adjacent space based on the self-expressive property, i.e., each data sample can be reconstructed by a linear combination of other points in the dataset $X$ [17], [73]. Therefore, $x_j$ is represented as

$$x_j = \tilde{X} c_j, \quad \text{s.t.} \;\; c_{jj} = 0, \qquad (2)$$

where the data matrix $\tilde{X}$ is considered as a self-expression dictionary for representation. In addition, $c_j = [c_{1j}, c_{2j}, \cdots, c_{Nj}]$ records the coefficients of this combination. With the regularization from various well-designed norms on $c_j$, the optimized result of (2) is capable of preserving only linear combinations of samples in $S$ while eliminating others. Inspired by [3] and [43], for each sample $x_j$, we first collect its $m$ nearest neighbors, i.e., those with the top $m$ coefficients in $c_j$. The $m$ nearest neighbors are defined as follows.

Definition 1: (m Nearest Neighbors) Let $\mathcal{N}_m(x_j) \in \mathbb{R}^{1 \times m}$ denote the $m$ nearest neighbors of data point $x_j$:

$$\mathcal{N}_m(x_j) = \arg\max_{\{x_{i_l}\}} \sum_{l=1}^{m} |c_{i_l, j}|, \qquad (3)$$

where $i_l$ denotes the set of indices of the nearest neighbors, and $c_{i_l, j}$ denotes the coefficient between $x_{i_l}$ and $x_j$.

According to Definition 1, we obtain $\mathcal{N}_m(x_j)$ for $x_j$, which contains the $m$ samples with the $m$ largest coefficients in $c_j$. The number of preserved neighbors, i.e., the parameter $m$, reflects the intrinsic dimension of the low-dimensional subspaces [74], [75], which we evaluate empirically in the experiment section. Based on the $m$ nearest neighbors, we define the triplet relationship to explore the local hyper-correlation among samples.
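In code, Definition 1 amounts to a column-wise top-m selection on |C|. The sketch below returns an N x m index array, with the (assumed) convention that a sample is never counted as its own neighbor.

```python
import numpy as np

def m_nearest_neighbors(C, m=8):
    """Definition 1: for each sample x_j, return the indices of the m
    samples with the largest |c_{i,j}| in column j of C.
    Output shape: (N, m), row j holding N_m(x_j)."""
    A = np.abs(C).copy()
    np.fill_diagonal(A, 0.0)          # assumption: exclude self-similarity
    return np.argsort(-A, axis=0)[:m, :].T
```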

Definition 2: (Triplet Relationship) A triplet $\tau$ includes three samples, i.e., $\tau = \{x_i, x_j, x_k\}$, and their relationships, if and only if $x_i$, $x_j$ and $x_k$ satisfy:

$$\mathbb{1}_{x_i \in \mathcal{N}_m(x_j)} \times \mathbb{1}_{x_j \in \mathcal{N}_m(x_k)} \times \mathbb{1}_{x_k \in \mathcal{N}_m(x_i)} = 1, \qquad (4)$$

where $\mathbb{1}_{x \in \mathcal{N}_m}$ denotes the indicator function, which equals 1 if $x \in \mathcal{N}_m$ and 0 otherwise.

Based on Definition 2, we obtain $n$ triplets, where we always have $n > N$, i.e., each sample is included in multiple triplet relationships. For clarity of presentation, we define a triplet matrix $T \in \mathbb{R}^{n \times 3}$ for the data samples $X$, where each row of $T$ records the indices of the samples in a triplet $\tau = \{x_i, x_j, x_k\}$.

Compared with the traditional pairwise relationship evoked from $C$, the triplet incorporates complementary information via the constraint in (4), which makes it more robust for partitioning samples near the intersection of two subspaces. Each triplet depicts a local geometrical structure which enables a better estimate of the density of each sample. Furthermore, the overlapping samples in multiple triplets reflect a global hyper-similarity among each other, which can be measured efficiently. Therefore, based on the triplet relationship, we can jointly estimate the number of subspaces and calculate the clustering assignment in a greedy manner.

D. Modeling Clustering Rewards

Given $X$, we iteratively group the data samples into clusters, i.e., $\{\mathcal{C}_i\}_{i=1}^{\hat{K}}$, where $\hat{K}$ denotes the estimated number of subspaces. According to the greedy strategy, in the $I$-th iteration, the triplet set $T$ is divided into two subsets, i.e., "in-cluster" triplets $T^I_{in}$, which are already assigned to clusters, and "out-of-cluster" triplets $T^I_{out}$, which are still to be assigned in the subsequent iterations. For clear presentation, we reshape the matrices $T^I_{in} \in \mathbb{R}^{p \times 3}$ and $T^I_{out} \in \mathbb{R}^{q \times 3}$ into vectors $X^I_{in} \in \mathbb{R}^{3p}$ and $X^I_{out} \in \mathbb{R}^{3q}$. In each iteration, we propose to optimize two new rewards, i.e., the model selection reward and the fusion reward, to simultaneously estimate the number of clusters and merge samples into their respective clusters.

Definition 3: (Model Selection Reward) Given $X^I_{in}$ and $X^I_{out}$ in the $I$-th iteration, the model selection reward $R_m(\mathcal{C})$ for each initialized cluster $\mathcal{C}$ in $\{\mathcal{C}_i\}_{i=1}^{\hat{K}}$ is defined as:

$$R_m(\mathcal{C}) = \sum_{i} \sigma(\mathcal{C}_i \mid X^I_{out}) - \lambda_m \sum_{i} \sigma(\mathcal{C}_i \mid X^I_{in}), \qquad (5)$$

where $\sigma(\mathcal{C} \mid X)$ is a counting function on the frequency that $x \in X$ for all $x \in \mathcal{C}_i$, and $\lambda_m$ denotes the trade-off parameter.

By maximizing the model selection reward $R_m(\mathcal{C})$, we generate the initialized clusters $\{\mathcal{C}_i\}_{i=1}^{\hat{K}}$, where $\hat{K}$ is the estimated number of clusters, with the following two advantages (see Fig. 1 for a visualization). First, the local density of each sample $x \in \mathcal{C}$ is high, i.e., $x$ has a large number of correlated samples in $X_{out}$, which enables many of them to be merged in the next iteration. Second, each $\mathcal{C}$ has little correlation with the samples in $X_{in}$, which eliminates overlap between clusters. Consequently, we can simultaneously estimate $\hat{K}$ and initialize the clusters by optimizing the model selection reward $R_m$.
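Under our reading of the counting function σ (the number of occurrences of a cluster's members inside a flattened triplet pool), Eq. (5) can be sketched as follows; the variable names and the list-based cluster representation are illustrative assumptions, not the paper's implementation.

```python
def sigma(cluster, pool):
    """Counting function sigma(C | X): occurrences of the cluster's
    members inside a flattened triplet pool X (a 1-D index sequence)."""
    members = set(cluster)
    return sum(1 for x in pool if x in members)

def model_selection_reward(clusters, X_out, X_in, lam_m=1.0):
    """Eq. (5): favor initialized clusters whose members appear often in
    the out-of-cluster triplets and rarely in the in-cluster ones."""
    return (sum(sigma(c, X_out) for c in clusters)
            - lam_m * sum(sigma(c, X_in) for c in clusters))
```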

Definition 4: (Fusion Reward) Given the initialized clusters $\{\mathcal{C}_i\}_{i=1}^{\hat{K}}$, the fusion reward is defined as the probability that $x_j \in X_{out}$ is assigned to $\mathcal{C}_i$:

$$R^i_f(\mathcal{C}_i \mid x_j \in X_{out}) = \sigma(x_j \mid \mathcal{C}_i) + \lambda_f \, \sigma(\mathcal{N}_m(x_j) \mid \mathcal{N}_m(\mathcal{C}_i)), \qquad (6)$$

where $\mathcal{N}_m(x_j)$ denotes the $m$ nearest neighbors of $x_j$, $\mathcal{N}_m(\mathcal{C}_i)$ denotes the set of $m$ nearest neighbors of the samples in $\mathcal{C}_i$, and $\lambda_f$ denotes the trade-off parameter.

In the optimization procedure, we calculate $\hat{K}$ fusion rewards $\{R^i_f\}_{i=1}^{\hat{K}}$ for each $x_j$, which represent the probabilities that $x_j$ is assigned to the clusters $\{\mathcal{C}_i\}_{i=1}^{\hat{K}}$, respectively. We then merge $x_j$ into the cluster with the largest fusion reward, and move $x_j$ from $X_{out}$ to $X_{in}$.
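A sketch of Eq. (6) in the same spirit is given below; how σ(x_j | C_i) is counted here (co-occurrence of x_j with the cluster's members through triplets and neighbor sets) is our interpretation, so treat the exact counting as an assumption rather than the authors' implementation.

```python
def fusion_reward(x_j, cluster, triplets, neighbors, lam_f=1.0):
    """Eq. (6), sketched: pull of sample x_j towards one initialized
    cluster. `cluster` is a list of sample indices, `triplets` an
    (n, 3) index array, `neighbors[i]` the m nearest neighbors of i."""
    members = set(cluster)
    # first term: triplets that contain x_j together with a cluster member
    direct = sum(1 for t in triplets
                 if x_j in set(t) and members & set(t))
    # second term: overlap between N_m(x_j) and the cluster's neighbor set
    cluster_nbrs = set()
    for i in cluster:
        cluster_nbrs.update(neighbors[i])
    overlap = len(set(neighbors[x_j]) & cluster_nbrs)
    return direct + lam_f * overlap
```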

E. Automatic Subspace Clustering Algorithm

The first triplet used to initialize a new cluster is chosen to have maximal local density. The local density $\rho$ is defined as follows.

Definition 5: (Local Density) The local density $\rho$ of a triplet $\tau$ with respect to $X_{out}$ is defined as:

$$\rho(\tau, X_{out}) = \sum_{j=1}^{|n|} \sigma(x_{n_j} \mid X_{out}), \qquad (7)$$

where $x_{n_j}$ denotes a sample in the current triplet $\tau$, $n$ is the set of their indices, and $|n|$ denotes the cardinality of $n$.
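Eq. (7) is again a counting operation; a minimal sketch is shown below. The cluster-initialization step later in this subsection (Eq. (11)) simply takes the triplet in T_out with the largest value of this quantity.

```python
import numpy as np

def local_density(tau, X_out):
    """Eq. (7): local density of a triplet tau = (n_1, n_2, n_3) with
    respect to the flattened out-of-cluster triplet pool X_out."""
    X_out = np.asarray(X_out)
    return int(sum((X_out == x).sum() for x in tau))
```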

Also, to measure the hyper-similarity between samples and determine the optimal triplet to merge into the initialized clusters, we define the connection score $s$ as follows.

Definition 6: (Connection Score) The connection score $s$ between samples $x_i$ and $x_j$ is defined as:

$$s(x_i, x_j) = f\left( x_i \,\Bigg|\, \bigcap_{k=1}^{n'} \left( \mathbb{1}_{x_j \in \tau_k} \times \tau_k \right) \right), \qquad (8)$$

where $\mathbb{1}_{x_j \in \tau_k}$ is equal to 1 when $x_j \in \tau_k$ and 0 otherwise, and $n'$ is the number of all triplets in $T_{out}$.

Fig. 1. Overview of the proposed autoSC. The algorithm is composed of three steps, i.e., calculating the triplet relationships T (blue dashed box), estimating the number of clusters via the model selection reward (black), and finishing the clustering assignment via the fusion reward (green). Given a similarity matrix C derived from self-representation schemes, we illustrate an example of a triplet composed of the three samples shown with magenta frames. These samples induce a high local correlation and should be grouped into the same cluster. By optimizing the fusion reward, sample 'a' is assigned to C1 since C1 has the most connections to the triplet which involves 'a'.

We greedily optimize the proposed model selection reward $R_m$ and fusion reward $R_f$ in autoSC to simultaneously estimate the number of clusters and generate the segmentation of the samples:

$$\max_{G, \hat{K}} \; \sum_{k=1}^{\hat{K}} R_m(G_k) + \lambda \sum_{k=1}^{\hat{K}} R_f(G_k \mid X), \quad \text{s.t.} \;\; G_k \cap G_{k' \neq k} = \emptyset, \;\; \bigcup_{k=1}^{\hat{K}} G_k = [1, \cdots, N], \qquad (9)$$

where $G = \{G_1, \cdots, G_{\hat{K}}\}$ denotes the set of resulting groups, $\hat{K}$ is the estimated number of clusters, and $[1, \cdots, N]$ denotes the universal ordinal set of samples.

We present the proposed autoSC in Fig. 1 and Algorithm 1. Specifically, the optimization includes three steps: 1) generating the triplet relationships $T$ from the similarity matrix $C$; 2) estimating the number of clusters $\hat{K}$ and initializing the clusters $\mathcal{C}$; 3) assigning the samples $x \in X_{out}$ to the proper clusters.

1) Calculating Triplets: The similarity matrix $C$ reflects the correlations among samples [9], where larger values indicate stronger belief in the correlation between samples. For instance, $c_{ij} > c_{ik}$ indicates a larger probability of $x_i$ and $x_j$ being in the same cluster than $x_i$ and $x_k$. Accordingly, we explore the intrinsic local correlations among samples through the proposed triplets derived from $C$.

Many subspace representations guarantee the mapping invariance via a dense similarity matrix $C$. However, the generation of triplets relies only on the strongest connections, to avoid wrong assignments. Therefore, for each column of $C$, i.e., $c_i$, we preserve only the top $m$ values, which are then set to 1 to form a new binary similarity matrix $C^*$.

Algorithm 1 Automatic Subspace Clustering (AutoSC)

Then, we extract each triplet from $C^*$ by the following rule:

$$\tau = \{x_{n_1}, x_{n_2}, x_{n_3}\} \in T, \;\; \text{if and only if} \;\; c^*_{n_1 n_2} \times c^*_{n_2 n_3} \times c^*_{n_3 n_1} = 1, \qquad (10)$$

where $c^*_{xy}$ denotes the $(x, y)$-th entry of $C^*$. Note that each sample $x$ can appear in many triplets. Therefore, we consider each $\tau$ as a meta-element in the clustering, which improves the robustness due to the complementarity constraints.
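The following sketch implements (10) directly: it binarizes C column-wise with the top-m rule and then enumerates triples whose links form a closed cycle. Checking both cyclic orientations of an unordered triple and the brute-force enumeration over all triples are our simplifications; a neighbor-restricted search is cheaper and is consistent with the O(Nm²) cost counted for this step in Section III-G.

```python
import numpy as np
from itertools import combinations

def extract_triplets(C, m=8):
    """Eq. (10): binarize C by keeping the top-m entries of each column
    (C*), then collect every triple whose three links are all preserved."""
    N = C.shape[0]
    A = np.abs(C).copy()
    np.fill_diagonal(A, 0.0)
    Cstar = np.zeros((N, N), dtype=bool)
    top_m = np.argsort(-A, axis=0)[:m, :]      # top-m row indices per column
    Cstar[top_m, np.arange(N)] = True          # Cstar[i, j] <=> x_i in N_m(x_j)
    triplets = []
    for i, j, k in combinations(range(N), 3):
        cycle_a = Cstar[i, j] and Cstar[j, k] and Cstar[k, i]
        cycle_b = Cstar[i, k] and Cstar[k, j] and Cstar[j, i]
        if cycle_a or cycle_b:                 # either cyclic orientation
            triplets.append((i, j, k))
    return np.asarray(triplets)
```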

2) Initializing Clusters: In the $I$-th iteration, we first determine an initial triplet (termed $\tau^I_{ini}$) from $T_{out}$ to initialize the cluster $\mathcal{C}$, followed by merging the samples most correlated with $\tau^I_{ini}$ into $\mathcal{C}$. Following [20], we initialize a new cluster using the $\tau^I_{ini}$ with the highest local density:

$$\tau^I_{ini} = \arg\max_{\tau} \; \rho(\tau, X^I_{out}), \qquad (11)$$

where $\rho$ is the local density defined in Definition 5. A high local density of the triplet reflects the largest number of connections between $\tau^I_{ini}$ and other triplets, which in turn produces the largest number of connections between $x_{n_j}$ and other samples in $X^I_{out}$.

Once the initial triplet $\tau^I_{ini}$ is determined, we iteratively extend the initialized cluster $\mathcal{C}$ by fusing the most confident triplets. For each triplet $\tau_i$ in $T_{out}$, we calculate the sum of the connection scores with respect to the samples in $\mathcal{C}$ to greedily determine whether the samples in $\tau_i$ should be assigned to $\mathcal{C}$ or not:

$$\tau^* = \arg\max_{\tau} \sum_{j=1}^{3} \sum_{\kappa}^{|m|} s_{n_j m_\kappa}, \quad \text{s.t.} \;\; \sum_{j=1}^{3} \sum_{\kappa}^{|m|} s_{n_j m_\kappa} > 1; \;\; \{x_{n_j}\}_{j=1}^{3} \in \tau; \;\; \{x_{m_\kappa}\}_{\kappa=1}^{|m|} \in \mathcal{C}, \qquad (12)$$

where $n$ and $m$ denote the sets of indices of the samples in $\tau$ and $\mathcal{C}$, respectively. We iteratively update the auxiliary sets $T^I_{out}$, $T^I_{in}$, $X^I_{out}$ and $X^I_{in}$ during the iterations.

3) Terminating: We terminate the process of estimating the number of clusters and obtain $\tilde{K}$ clusters if and only if $\tau^{\tilde{K}+1}_{ini}$ satisfies:

$$\rho(\tau^{\tilde{K}+1}_{ini}, X^{\tilde{K}+1}_{out}) \leq \rho(\tau^{\tilde{K}+1}_{ini}, X^{\tilde{K}+1}_{in}). \qquad (13)$$

Specifically, if the samples in $\tau^{\tilde{K}+1}_{ini}$ appear with high frequency in $X^{\tilde{K}+1}_{in}$, i.e., the triplet with the highest local density in $T^{\tilde{K}+1}_{out}$ is already contained in $T^{\tilde{K}+1}_{in}$, we consider the clusters sufficient for modeling the intrinsic subspaces.

4) Avoiding Over-Segmentation: We also introduce an alternative step to check the redundancy among the initialized clusters $\{\mathcal{C}_i\}_{i=1}^{\tilde{K}}$ to avoid over-segmentation. We calculate the connection scores $s$ for small-scale clusters against the others, and merge the highly correlated clusters $\mathcal{C}_i$ and $\mathcal{C}_j$ if

$$s_{ij} > \min(|\mathcal{C}_i|, |\mathcal{C}_j|), \qquad (14)$$

where $|\mathcal{C}|$ denotes the number of samples in $\mathcal{C}$. We then obtain the initialized clusters $\{\mathcal{C}_i\}_{i=1}^{\hat{K}}$, where $\hat{K}$ is the estimated number of clusters and $\hat{K} \leq \tilde{K}$.

5) Assigning the Remaining Samples: Given $\{\mathcal{C}_i\}_{i=1}^{\hat{K}}$, we assign each of the remaining samples to the cluster $\mathcal{C}$ that yields the optimal fusion reward. For $x_j$, we find its optimal cluster $\mathcal{C}^*$ by:

$$\mathcal{C}^* = \arg\max_{\mathcal{C}_i} \; R_f(\mathcal{C}_i \mid x_j), \quad i \in \{1, 2, \cdots, \hat{K}\}, \qquad (15)$$

where $R_f(\mathcal{C} \mid x)$ is the fusion reward defined in (6).
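Putting (15) together with the earlier fusion_reward sketch, the final assignment step reduces to a simple greedy loop; the data structures (a mutable list of out-of-cluster samples and clusters stored as index lists) are illustrative choices.

```python
def assign_remaining(samples_out, clusters, triplets, neighbors, lam_f=1.0):
    """Eq. (15): move every remaining sample to the initialized cluster
    with the largest fusion reward (see the fusion_reward sketch above)."""
    for x_j in list(samples_out):
        scores = [fusion_reward(x_j, c, triplets, neighbors, lam_f)
                  for c in clusters]
        best = max(range(len(clusters)), key=lambda i: scores[i])
        clusters[best].append(x_j)
        samples_out.remove(x_j)
    return clusters
```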

F. An Extension: Neighboring Based AutoSC Algorithm

Algorithm 2 Neighboring Based AutoSC (AutoSC-N)

In Definition 1, we collect the $m$ nearest neighbors according to the magnitude of the similarities between sample $x_j$ and all other samples in $X$. These similarities are depicted in $C$ by optimizing (1), which is composed of a reconstruction loss term and a regularization term. In this subsection, we extend autoSC with an alternative technique that finds $\mathcal{N}_m(x_j)$ for each $x_j$ based on greedy search.

For each data sample $x_j$, let $S^I_j$ be the subspace spanned by $x_j$ and its neighbors in the $I$-th iteration, where the neighbor set $\mathcal{N}_j$ is initialized as $\mathcal{N}^0_j = \emptyset$ and $S^I_j \in \mathbb{R}^{D \times \dim(\mathcal{N}_j)}$. In each iteration, we measure the projected similarity between $x_j$ and the other non-neighbor samples by calculating the orthonormal coordinates in the spanned subspace. For example, to calculate the similarity between $x_j$ and $x_k$ in the $I$-th iteration, we have

$$s_{jk} = \|(S^I_j)^{\top} x_k\|_F^2, \qquad (16)$$

where $\|\cdot\|_F^2$ denotes the squared Frobenius norm and $x_k \in X - \mathcal{N}$. Consequently, for $x_j$ in the $I$-th iteration, we find the closest neighbor and update $S_j$ as follows:

$$S^{I+1}_j \leftarrow S^I_j \cup \arg\max_{x_k \in X - \mathcal{N}} s_{jk}. \qquad (17)$$

Here, we find one neighbor in each iteration and update the spanned subspace accordingly. The newly spanned subspace reflects more of the local structure of the ambient space, which is assumed to cover the current sample. The neighbor set $\mathcal{N}_j$ is also updated by adding the new neighbor found in the $I$-th iteration. Finally, after $m$ iterations for each sample, we obtain an alternative $m$ nearest neighbor set $\mathcal{N}$.
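A sketch of this greedy search is given below. Starting the basis from x_j itself and re-orthonormalizing with a QR step after each addition are implementation assumptions on our part (the text only specifies the projection criterion of (16)-(17)), and the O(N) scan per iteration is kept for clarity rather than speed.

```python
import numpy as np

def greedy_neighbors(X, m=8):
    """autoSC-N neighbor search (Eqs. (16)-(17)): for each sample, grow a
    spanned subspace one neighbor at a time, always adding the
    non-neighbor with the largest projection energy onto the current
    orthonormal basis. X is D x N with samples as columns."""
    D, N = X.shape
    neighbors = np.zeros((N, m), dtype=int)
    for j in range(N):
        chosen = []
        basis = X[:, [j]] / np.linalg.norm(X[:, j])    # assumed initialization
        for _ in range(m):
            rest = [k for k in range(N) if k != j and k not in chosen]
            # Eq. (16): squared norm of the coordinates in the spanned subspace
            scores = [np.linalg.norm(basis.T @ X[:, k]) ** 2 for k in rest]
            best = rest[int(np.argmax(scores))]
            chosen.append(best)
            # Eq. (17): extend the subspace and re-orthonormalize (QR)
            basis, _ = np.linalg.qr(np.column_stack([basis, X[:, best]]))
        neighbors[j] = chosen
    return neighbors
```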

Given the neighbor matrix $\mathcal{N}$, we propose the neighboring based autoSC algorithm (autoSC-N), which directly discovers the triplet relationships among data samples, followed by optimizing both the model selection and fusion rewards for clustering. The main steps of autoSC-N are summarized in Algorithm 2.

G. Computational Complexity Analysis

In a traditional subspace clustering system, the calculation of the self-representation requires solving $N$ convex optimization problems over $D \times N$ constraints [21]. Spectral clustering is based on an eigen-decomposition of the Laplacian matrix followed by K-means on the eigenvectors, both of which are time-consuming, involving a complex algebraic decomposition and an iterative optimization, respectively [73], [76]. The overall computational complexity can be more than $O(N^3)$. For the proposed autoSC, it takes $O(Nm^2)$ to collect the triplet relationships for $N$ samples in the space spanned by the $m$ nearest neighbors. Since $m \ll N$, the complexity of collecting the triplet relationships is $O(N)$. The optimization of both the model selection and fusion rewards takes $O(Nn)$, where the number of triplets $n$ has the same order of magnitude as $N$. Specifically, we have $n \approx N \times \frac{m}{2}$, and thus the complexity of clustering is $O(N^2)$.

For the extension, i.e., autoSC-N, collecting the neighbor matrix takes $O(Nm)$, where the basic operation is a dot product of $D$-dimensional vectors. This avoids solving any convex optimization problem.

IV. EXPERIMENTS

A. Experimental Setup

In the experiments,1 we compare the automatic methods on the benchmark datasets, i.e., the extended Yale B [77] and COIL-20 [78] datasets, followed by verifying the robustness of the proposed method to different $C$ derived from various self-representation schemes, along with combinations of different methods for estimating the number of clusters and segmenting the samples. We design comprehensive evaluation metrics to validate the clustering performance, i.e., the error rate of the number of clusters and of the triplets. For all experiments on subsets, the reported results are the average of 50 trials. We also conduct experiments on a motion segmentation task using the Hopkins 155 dataset.

1) Datasets: The extended Yale B [77] dataset is a widely used face clustering dataset which contains face images of 38 subjects under different illuminations; each subject has 64 images.

The COIL-20 [78] dataset consists of 20 different real subjects, including cups, bottles and so on. For each subject, there are 72 images taken from different camera viewpoints.

The Hopkins 155 dataset [79] consists of 155 video sequences. Each video sequence contains 2 or 3 motions.

2) Comparative Methods: We make comparisons with the following methods: SCAMS [14], [35], a density peak based method (DP) [20], a singular value decomposition based method (SVD) [64], and DP-space [13]. Besides, we utilize the following subspace representation methods to generate different coefficient matrices $C$: LRR [45], CASS [58], LSR [52], SMR [38] and ORGEN [39]. The similarity matrix $C$ is then used to calculate the triplet relationships for autoSC.

3) Evaluation Metrics: To evaluate the performance of the proposed triplets, we define the error rate $A$ as follows:

$$A = \frac{1}{n} \sum_{i=1}^{n} \frac{3 - \sigma(\tau_i \mid g^*_i)}{2}, \qquad (18)$$

where $n$ denotes the number of triplets and $\sigma(\tau \mid g^*)$ is the counting function on the frequency that $x \in g^*$ for all $x \in \tau$. Here the numerator $3 - \sigma(\tau_i \mid g^*_i)$ ranges from 0 to 2. The dynamic set $g^*_i$ consists of the samples in one subspace $S$ according to the ground truth, where $S$ is chosen to contain as many samples of $\tau_i$ as possible.

1Code available at https://github.com/JLiangNKU/autoSC

Fig. 2. Visualization of the clustering assignments for six methods, i.e., (a) SCAMS, (b) DP, (c) SVD, (d) DP-space, (e) autoSC-N and (f) autoSC. The experiments are conducted on the extended Yale B dataset with 8 subjects. SCAMS over-segments the samples, while DP-space assigns the majority to one cluster. The proposed autoSC-N and autoSC do not suffer from these problems.

We introduce the error rate of the number of clusters ($NC_e$) as the primary evaluation metric for the clustering methods which estimate the number of clusters $\hat{K}$ automatically:

$$NC_e = \frac{1}{M} \sum_{i=1}^{M} |\hat{K}_i - K|, \qquad (19)$$

where $K$ is the real number of clusters, $M$ is the number of trials, and $\hat{K}_i$ is the estimated number of clusters in the $i$-th trial. We also use the standard normalized mutual information (NMI) [80] to measure the similarity between two clustering distributions, i.e., the prediction and the ground truth. With respect to NMI, the entropy illustrates the nondeterminacy of one clustering with respect to the other, and the mutual information quantifies the amount of information that one variable obtains from the other.
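For completeness, the two paper-specific metrics (18) and (19) can be computed as below; NMI itself is available in scikit-learn as normalized_mutual_info_score, which we simply wrap here.

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

def triplet_error_rate(triplets, labels):
    """Eq. (18): average over triplets of (3 - sigma)/2, where sigma is
    the largest number of a triplet's samples sharing one ground-truth
    cluster (the best subspace g* for that triplet)."""
    errs = []
    for t in triplets:
        lab = [labels[i] for i in t]
        sigma = max(lab.count(l) for l in set(lab))
        errs.append((3 - sigma) / 2.0)
    return float(np.mean(errs))

def cluster_number_error(estimated_Ks, true_K):
    """Eq. (19): mean absolute deviation of the estimated cluster number
    over M trials."""
    return float(np.mean([abs(k - true_K) for k in estimated_Ks]))

def nmi(labels_true, labels_pred):
    """Standard NMI, as used for the final clustering quality metric."""
    return normalized_mutual_info_score(labels_true, labels_pred)
```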

4) Parameter m: The parameter $m$ in Definition 1, i.e., the number of preserved neighbors for each sample, is related to the intrinsic dimension of the subspaces. We empirically evaluate the influence of $m$ on both the extended Yale B and COIL-20 datasets with 15 subjects, using the subspace representation derived from SMR. The results are shown in Table III. As shown in the table, the proposed method achieves the best performance with $m = 8$ in most cases. In fact, the choice of $m$ is robust, since the performance is stable for $m > 8$.

B. Comparisons among Automatic Clustering

We conduct experiments on the extended Yale B and COIL-20 datasets with different numbers of subjects, and compare four methods with the proposed autoSC and autoSC-N on the metrics of $NC_e$ and NMI. For SCAMS [14], [35], DP [20], SVD [64] and our autoSC, the optimization module of SMR [38] is employed to generate the similarity matrix $C$. The DP-space method simultaneously estimates $\hat{K}$ and finds the subspaces without requiring a similarity matrix. All parameters of the compared methods are tuned to provide the best performance.

TABLE II: Overall comparison among comparative methods on subsets of the extended Yale B and COIL-20 datasets. The similarity matrix C calculated by both SMR and LSR is utilized as the correlation matrix for SCAMS, DP, SVD and the proposed autoSC. The best results are in bold font, while * indicates the second best performance.

TABLE III: Clustering performance of the proposed autoSC with different m (the first row) on both the extended Yale B (EYaleB) and COIL-20 datasets with 15 subjects each. The subspace representation is derived from SMR. Based on the results, we set m = 8 in the rest of the paper.

Fig. 2 and Table II report the performance. As shown in Table II, when combined with SMR, the averaged $NC_e$ of autoSC is smaller than that of the other comparative methods on all experimental configurations, indicating that it gives a close estimate of the number of clusters. For example, the estimated $\hat{K}$ on the extended Yale B with 8 subjects deviates by less than 1, and produces an NMI higher than 0.9. In each of the proposed triplet relationships, three samples are encouraged to be highly correlated and are considered as a meta-element during clustering, which introduces more accurate information for segmenting two densely distributed subspaces than conventional pairwise methods. In addition, the operation for avoiding over-segmentation, as shown in (14), also improves the performance in determining the number of clusters. However, when combined with the similarity matrix derived from LSR [52], autoSC can hardly outperform autoSC-N, which calculates $\mathcal{N}_m(x_j)$ using a greedy search in the data space. Although it still outperforms the other methods without the triplet relationships, this result demonstrates a dependence of the proposed method on the quality of the similarities among samples. The autoSC-N obtains the second best performance on most configurations, which demonstrates the effectiveness of both the triplet relationship and the reward optimization.

In contrast, SVD achieves comparable results on the small-scale configuration of each dataset, but its performance becomes poor when the number of samples increases. This is mainly because the largest gap between pairs of singular values decreases when the number of clusters becomes larger. When combined with SMR, SCAMS performs comparably according to NMI on both datasets; however, as illustrated in Fig. 2 (a) and Table II, it produces a much larger $\hat{K}$ than the ground truth, e.g., $\hat{K} > 100$ when $K = 30$ on the extended Yale B dataset. NMI does not strongly penalize over-segmentation, which makes $NC_e$ the primary evaluation metric for SCAMS. DP-space performs well on $NC_e$, but poorly on NMI. This is because most samples are assigned to one cluster, and the other clusters are small. In addition, when combined with LSR, as shown in Table II, the performance of all methods decreases, while the proposed autoSC still achieves the best performance on most configurations against the other state-of-the-art methods. This demonstrates the generalization ability of our autoSC. The main reason is that the three samples in any triplet relationship are complementary and can, to some extent, compensate for wrong similarities. This is also confirmed by the consistently high accuracy of the triplets shown in Table IV, where potential errors in the similarities derived by LSR are compensated or corrected during the extraction of the triplet relationships.

C. Robustness to Self-Representation Schemes

The methods SCAMS [14], [35], DP [20], SVD [64] and the proposed autoSC require the similarity matrix $C$ as input. In addition, for DP [20], the distances among samples need to be calculated; we calculate the distance between samples $x_i$ and $x_j$ as $d_{ij} = 1 / c_{ij}$ rather than the simple Euclidean distance. To verify the robustness of the proposed autoSC with respect to various subspace representations, we calculate the similarity matrix $C$ using 5 subspace representation modules, followed by combinations with the 4 methods which automatically estimate the number of clusters and segment the samples.

TABLE IV: Error rates of triplets (A) for the proposed autoSC on subsets of the extended Yale B (EYaleB) and COIL-20 datasets. For each row, we utilize the similarity matrix C derived from the self-representation method in the first column. The autoSC achieves consistency in the calculation of the triplets.

Fig. 3. Clustering results using different self-representation schemes on the extended Yale B dataset with 8 subjects. The left figure shows the comparison of four methods using the $NC_e$ metric, while the right one uses NMI. Each point in the curves is derived from the combination of a clustering method and a self-representation scheme. The proposed autoSC achieves consistent performance on the evaluation of $NC_e$.

Table IV shows the evaluation results of $A$ on both datasets for the combinations with the 5 subspace representations, while $NC_e$ and NMI on the extended Yale B dataset with 8 subjects are reported in Fig. 3. Moreover, we visualize the similarity matrices $C$ derived from the 5 subspace representation modules in Fig. 4. We can see from Fig. 3 that the SCAMS, DP and SVD methods are sensitive to the choice of the subspace representation module. For example, DP estimates $\hat{K}$ relatively close to the ground truth when combined with SMR ($NC_e = 3.06$), but generates a totally wrong estimate when combined with LRR ($NC_e = 265.60$). Different subspace representation modules generate coefficient matrices with various intrinsic properties [4]; thus the parameter for the truncation error $\epsilon$ needs to be tuned carefully.

For the proposed autoSC, it is stable across different combinations considering the metrics NCe and A, which demonstrates the complementary ability of the proposed method. For all combinations, the error rate of the triplets obtained from (10) is less than 2%, which guarantees the consistency of the proposed autoSC with different kinds of C. This is also the main reason that the proposed method is robust when combined with different self-representation schemes. Furthermore, it shows better performance when combined with CASS, LSR and SMR than with other combinations on both metrics in Fig. 3. The reason lies in the guarantee of mapping invariance, which is termed the grouping effect [38], [52], [58], together with the filtering of weak connections and the self-constraint among samples within triplets. As shown in Fig. 4 (b), (c) and (d), the coefficient matrices are dense, while the matrix in Fig. 4 (d) additionally shows a block-diagonal structure in which each block corresponds to one cluster. Therefore, the nearest neighbors used to generate the triplets can be chosen precisely. The performance decreases when combined with ORGEN since the similarity matrix C derived from ORGEN is sparse, with fewer locations for constructing effective triplets.
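As a rough illustration of how nearest-neighbor information in C can be turned into triplets, the sketch below keeps a triple (i, j, l) only when each sample appears among the top-k neighbors of the other two. This is a simplification for exposition, not the exact criterion of Eq. (10) (which is defined earlier in the paper); the parameter k and the brute-force enumeration are illustrative choices.

```python
import numpy as np
from itertools import combinations

def mutual_knn_triplets(C, k=5):
    """Simplified triplet extraction: (i, j, l) is kept when every pair among
    the three samples is mutually within each other's top-k neighbors of the
    symmetrized similarity matrix C.  Brute force, O(n^3); for exposition only."""
    A = 0.5 * (np.abs(C) + np.abs(C).T)
    np.fill_diagonal(A, -np.inf)              # ignore self-similarity
    knn = np.argsort(-A, axis=1)[:, :k]       # top-k neighbors of each sample
    neigh = [set(row) for row in knn]
    triplets = []
    for i, j, l in combinations(range(A.shape[0]), 3):
        if (j in neigh[i] and i in neigh[j] and
            l in neigh[i] and i in neigh[l] and
            l in neigh[j] and j in neigh[l]):
            triplets.append((i, j, l))
    return triplets
```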

D. Time Efficiency

Table V shows the run-time of the comparative methods on subsets of the extended Yale B dataset. The experiments are conducted on a machine with a 2.93 GHz CPU and 32 GB RAM. AutoSC-N requires the least run-time among all comparative methods for two reasons. First, autoSC-N explores the neighborhood relationship in the raw data space rather than solving a convex optimization problem. Second, it employs a greedy optimization scheme to estimate the number of clusters and calculate the clustering assignment, rather than a more complex procedure such as computing a singular value decomposition. Note that the proposed autoSC method achieves the second best result among the comparative methods.
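The gap in cost between these two ingredients can be sensed with the rough timing sketch below (the data are random and the sizes arbitrary; this is not the paper's benchmark): nearest-neighbor queries on the raw samples are typically far cheaper than a full singular value decomposition of an n × n matrix.

```python
import time
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 1024))   # 2000 raw samples, e.g. vectorized images

t0 = time.time()
NearestNeighbors(n_neighbors=10).fit(X).kneighbors(X)   # neighborhoods in raw space
t_knn = time.time() - t0

t0 = time.time()
np.linalg.svd(rng.standard_normal((2000, 2000)))         # full SVD of an n x n matrix
t_svd = time.time() - t0

print(f"kNN: {t_knn:.2f}s   SVD: {t_svd:.2f}s")
```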

E. Real World Application: Motion Segmentation

Motion segmentation refers to the task of segmenting multiple video sequences. Each candidate video contains multiple rapidly moving foreground objects, which are required to be clustered into spatiotemporal regions corresponding to specific motions. Following the traditional scheme [17], we consider the Hopkins 155 dataset [79] and solve the motion segmentation problem by first extracting a set of feature points in each frame and then clustering them based on their motions. Table VI reports the comparison against four automatic clustering methods. For SCAMS [14], [35], DP [20] and SVD [64], SMR [38] is first applied to calculate the similarity matrix. As shown in the table, the proposed autoSC achieves the best performance on both metrics, indicating that autoSC is effective at both estimating the number of motions (about 0.18 error rate) and segmenting the feature points (NMI of more than 0.98). In addition, it shows favorable efficiency on the motion segmentation task. The autoSC-N is the most efficient method (0.26 s per sequence), with the second best performance on NCe and NMI.
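For readers unfamiliar with the Hopkins 155 convention, a common way to prepare the data (a generic sketch, not necessarily the authors' exact pipeline) is to stack the tracked point coordinates into a 2F × N trajectory matrix whose columns are then treated as the samples handed to the clustering stage.

```python
import numpy as np

def build_trajectory_matrix(tracks):
    """tracks: array of shape (N, F, 2) holding the (x, y) position of each of
    N feature points in every one of F frames.  Returns the 2F x N data matrix
    whose columns (one trajectory per tracked point) are the samples fed to
    the subspace clustering / cluster-number estimation stage."""
    N, F, _ = tracks.shape
    X = tracks.transpose(1, 2, 0).reshape(2 * F, N)  # stack x and y per frame
    return X
```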



Fig. 4. Visualization of the similarity matrix C derived from different self-representation schemes on the extended Yale B dataset with 3 subjects. The white regions denote the locations with non-zero coefficients. Different methods, i.e., (a) LRR, (b) CASS, (c) LSR, (d) SMR and (e) ORGEN, produce C with different characteristics, e.g., (d) derived from SMR is block-diagonal, while (e) derived from ORGEN is sparse.

TABLE V

COMPARISON OF TIME-CONSUMPTION (IN SECONDS) ON THE EXTENDED YALE B DATASET. THE BEST RESULTS ARE IN BOLD FONT WHILE ∗ INDICATES THE SECOND BEST PERFORMANCE

TABLE VI

MOTION SEGMENTATION PERFORMANCE ON THE HOPKINS 155 DATASET. THE ROW 'TIME' REPORTS THE AVERAGE TIME-CONSUMPTION FOR HANDLING EACH VIDEO SEQUENCE. THE BEST RESULTS ARE IN BOLD FONT WHILE ∗ INDICATES THE SECOND BEST PERFORMANCE. THE PROPOSED AUTOSC OUTPERFORMS OTHER COMPARATIVE METHODS REGARDING BOTH THE ESTIMATION OF K AND CLUSTERING

The SVD method obtains the best result among the other comparative methods, but it consumes much more time (more than 2.5 s per sequence) due to the singular value decomposition process.

V. CONCLUSION

In this paper, we propose to jointly estimate the number of clusters and segment the samples accordingly. We first design a hyper-correlation-oriented meta-element, termed the triplet relationship, based on the similarity matrix derived from state-of-the-art subspace clustering methods. The calculation of triplets is considered as a mapping from pairwise correlations to hyper-correlations, which is more robust than pairwise relationships when partitioning samples near the intersection of two subspaces, due to the complementarity of the mutual restrictions. The proposed autoSC and autoSC-N are verified to be effective mainly due to the robustness and effectiveness of the triplet relationship. Experimental results also show the generalization capacity of the proposed method across various schemes for calculating the similarity matrix. It would be interesting to explore this effective triplet relationship in other topics that rely on modeling the relationships among samples, such as metric learning [81] and zero-shot learning [82]. In addition, it is also a promising direction to combine the triplet relationship with graph neural networks (GNNs) [83], since GNNs are a powerful tool for relational reasoning that learn the relationships between nodes and can also benefit from hyper-correlations.

REFERENCES

[1] J. Yang, J. Liang, K. Wang, Y.-L. Yang, and M.-M. Cheng, “Automatic model selection in subspace clustering via triplet relationships,” in Proc. AAAI, 2018, pp. 4358–4365.

[2] X. Wang, X. Guo, Z. Lei, C. Zhang, and S. Z. Li, “Exclusivity-consistency regularized multi-view subspace clustering,” in Proc. CVPR, Jul. 2017, pp. 923–931.

[3] X. Peng, Z. Yu, Z. Yi, and H. Tang, “Constructing the L2-graph for robust subspace learning and subspace clustering,” IEEE Trans. Cybern., vol. 47, no. 4, pp. 1053–1066, Apr. 2016.

[4] R. Vidal, “Subspace clustering,” IEEE Signal Process. Mag., vol. 28, no. 2, pp. 52–68, Mar. 2011.

[5] H. Jia and Y.-M. Cheung, “Subspace clustering of categorical and numerical data with an unknown number of clusters,” IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 8, pp. 3308–3325, Aug. 2018.

[6] X. Xu, Z. Huang, D. Graves, and W. Pedrycz, “A clustering-based graph Laplacian framework for value function approximation in reinforcement learning,” IEEE Trans. Cybern., vol. 44, no. 12, pp. 2613–2625, Dec. 2014.

[7] P. Zhu, W. Zhu, Q. Hu, C. Zhang, and W. Zuo, “Subspace clustering guided unsupervised feature selection,” Pattern Recognit., vol. 66, pp. 364–374, Jun. 2017.

[8] X. Cao, C. Zhang, C. Zhou, H. Fu, and H. Foroosh, “Constrained multi-view video face clustering,” IEEE Trans. Image Process., vol. 24, no. 11, pp. 4381–4393, Nov. 2015.

[9] E. Elhamifar and R. Vidal, “Sparse subspace clustering,” in Proc. CVPR, Jun. 2009, pp. 2790–2797.

[10] Y. Yang, J. Feng, N. Jojic, J. Yang, and T. S. Huang, “ℓ0-sparse subspace clustering,” in Proc. ECCV, 2016, pp. 731–747.

[11] A. Y. Ng, M. I. Jordan, and Y. Weiss, “On spectral clustering: Analysis and an algorithm,” in Proc. NIPS, 2002, pp. 849–856.

[12] J. Shi and J. Malik, “Normalized cuts and image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 8, pp. 888–905, Aug. 2000.

[13] Y. Wang and J. Zhu, “DP-space: Bayesian nonparametric subspace clustering with small-variance asymptotics,” in Proc. ICML, 2015, pp. 862–870.

[14] Z. Li, S. Yang, L.-F. Cheong, and K.-C. Toh, “Simultaneous clustering and model selection for tensor affinities,” in Proc. CVPR, Jun. 2016, pp. 5347–5355.

[15] S. Javed, A. Mahmood, T. Bouwmans, and S. K. Jung, “Background–foreground modeling based on spatiotemporal sparse subspace clustering,” IEEE Trans. Image Process., vol. 26, no. 12, pp. 5840–5854, Dec. 2017.

[16] D. Kumar, J. C. Bezdek, M. Palaniswami, S. Rajasegarar, C. Leckie, and T. C. Havens, “A hybrid approach to clustering in big data,” IEEE Trans. Cybern., vol. 46, no. 10, pp. 2372–2385, Oct. 2016.

[17] E. Elhamifar and R. Vidal, “Sparse subspace clustering: Algorithm, theory, and applications,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 11, pp. 2765–2781, Nov. 2013.

[18] B. Nasihatkon and R. Hartley, “Graph connectivity in sparse subspace clustering,” in Proc. CVPR, Jun. 2011, pp. 2137–2144.

[19] K. Zhan, C. Zhang, J. Guan, and J. Wang, “Graph learning for multiview clustering,” IEEE Trans. Cybern., vol. 48, no. 10, pp. 2887–2895, Oct. 2018.



[20] A. Rodriguez and A. Laio, “Clustering by fast search and find of density peaks,” Science, vol. 344, no. 6191, pp. 1492–1496, Jun. 2014.

[21] X. Peng, L. Zhang, and Z. Yi, “Scalable sparse subspace clustering,” in Proc. CVPR, Jun. 2013, pp. 430–437.

[22] S. Gao, I. W.-H. Tsang, and L.-T. Chia, “Laplacian sparse coding, hypergraph Laplacian sparse coding, and applications,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 1, pp. 92–104, Jan. 2013.

[23] S. Kim, C. D. Yoo, S. Nowozin, and P. Kohli, “Image segmentation using higher-order correlation clustering,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 9, pp. 1761–1774, Sep. 2014.

[24] F. Schroff, D. Kalenichenko, and J. Philbin, “FaceNet: A unified embedding for face recognition and clustering,” in Proc. CVPR, Jun. 2015, pp. 815–823.

[25] D. Zhang, J. Han, L. Zhao, and D. Meng, “Leveraging prior-knowledge for weakly supervised object detection under a collaborative self-paced curriculum learning framework,” Int. J. Comput. Vis., vol. 127, no. 4, pp. 363–380, 2018.

[26] G. Cheng, J. Han, P. Zhou, and D. Xu, “Learning rotation-invariant and Fisher discriminative convolutional neural networks for object detection,” IEEE Trans. Image Process., vol. 28, no. 1, pp. 265–278, Jan. 2019.

[27] C.-G. Li, C. You, and R. Vidal, “Structured sparse subspace clustering: A joint affinity learning and subspace clustering framework,” IEEE Trans. Image Process., vol. 26, no. 6, pp. 2988–3001, Jun. 2017.

[28] C. Zhang, H. Fu, Q. Hu, P. Zhu, and X. Cao, “Flexible multi-view dimensionality co-reduction,” IEEE Trans. Image Process., vol. 26, no. 2, pp. 648–659, Feb. 2017.

[29] J. Han, G. Cheng, Z. Li, and D. Zhang, “A unified metric learning-based framework for co-saliency detection,” IEEE Trans. Circuits Syst. Video Technol., vol. 28, no. 10, pp. 2473–2483, Oct. 2018.

[30] J. Han, H. Chen, N. Liu, C. Yan, and X. Li, “CNNs-based RGB-D saliency detection via cross-view transfer and multiview fusion,” IEEE Trans. Cybern., vol. 48, no. 11, pp. 3171–3183, Nov. 2017.

[31] J. Liang, J. Yang, H.-Y. Lee, K. Wang, and M.-H. Yang, “Sub-GAN: An unsupervised generative model via subspaces,” in Proc. ECCV, Sep. 2018, pp. 698–714.

[32] J. Yang, J. Liang, H. Shen, K. Wang, P. L. Rosin, and M.-H. Yang, “Dynamic match kernel with deep convolutional features for image retrieval,” IEEE Trans. Image Process., vol. 27, no. 11, pp. 5288–5302, Nov. 2018.

[33] C. You, D. Robinson, and R. Vidal, “Scalable sparse subspace clustering by orthogonal matching pursuit,” in Proc. CVPR, Jun. 2016, pp. 3918–3927.

[34] Y. Cheng, Y. Wang, M. Sznaier, and O. Camps, “Subspace clustering with priors via sparse quadratically constrained quadratic programming,” in Proc. CVPR, Jun. 2016, pp. 5204–5212.

[35] Z. Li, L.-F. Cheong, S. Yang, and K.-C. Toh, “Simultaneous clustering and model selection: Algorithm, theory and applications,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 8, pp. 1964–1978, Aug. 2018.

[36] F. Wu, Y. Hu, J. Gao, Y. Sun, and B. Yin, “Ordered subspace clustering with block-diagonal priors,” IEEE Trans. Cybern., vol. 46, no. 12, pp. 3209–3219, Dec. 2016.

[37] C.-G. Li and R. Vidal, “Structured sparse subspace clustering: A unified optimization framework,” in Proc. CVPR, Jun. 2015, pp. 277–286.

[38] H. Hu, Z. Lin, J. Feng, and J. Zhou, “Smooth representation clustering,” in Proc. CVPR, Jun. 2014, pp. 3834–3841.

[39] C. You, C.-G. Li, D. P. Robinson, and R. Vidal, “Oracle based active set algorithm for scalable elastic net subspace clustering,” in Proc. CVPR, Jun. 2016, pp. 3928–3937.

[40] X. Fang, Y. Xu, X. Li, Z. Lai, and W. K. Wong, “Robust semi-supervised subspace clustering via non-negative low-rank representation,” IEEE Trans. Cybern., vol. 46, no. 8, pp. 1828–1838, Aug. 2016.

[41] M. Rahmani and G. Atia, “Innovation pursuit: A new approach to the subspace clustering problem,” in Proc. ICML, 2017, pp. 2874–2882.

[42] E. L. Dyer, A. C. Sankaranarayanan, and R. G. Baraniuk, “Greedy feature selection for subspace clustering,” J. Mach. Learn. Res., vol. 14, no. 1, pp. 2487–2517, 2013.

[43] D. Park, C. Caramanis, and S. Sanghavi, “Greedy subspace clustering,” in Proc. NIPS, 2014, pp. 2753–2761.

[44] P. Purkait, T.-J. Chin, A. Sadri, and D. Suter, “Clustering with hypergraphs: The case for large hyperedges,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1697–1711, Sep. 2016.

[45] H. Liu, L. Latecki, and S. Yan, “Robust clustering as ensembles of affinity relations,” in Proc. NIPS, 2010, pp. 1414–1422.

[46] C.-G. Li and R. Vidal, “A structured sparse plus structured low-rank framework for subspace clustering and completion,” IEEE Trans. Signal Process., vol. 64, no. 24, pp. 6557–6570, Dec. 2016.

[47] Y. Guo, J. Gao, and F. Li, “Spatial subspace clustering for drill hole spectral data,” J. Appl. Remote Sens., vol. 8, no. 1, 2014, Art. no. 083644.

[48] J. Feng, Z. Lin, H. Xu, and S. Yan, “Robust subspace segmentation with block-diagonal prior,” in Proc. CVPR, Jun. 2014, pp. 3818–3825.

[49] B. Wang, Y. Hu, J. Gao, Y. Sun, and B. Yin, “Product Grassmann manifold representation and its LRR models,” in Proc. AAAI, 2016, pp. 2122–2129.

[50] B. Liu, X.-T. Yuan, Y. Yu, Q. Liu, and D.-N. Metaxas, “Decentralized robust subspace clustering,” in Proc. AAAI, 2016, pp. 3539–3545.

[51] S. Xiao, W. Li, D. Xu, and D. Tao, “FaLRR: A fast low rank representation solver,” in Proc. CVPR, Jun. 2015, pp. 4612–4620.

[52] C. Y. Lu, H. Min, Z.-Q. Zhao, L. Zhu, D.-S. Huang, and S. Yan, “Robust and efficient subspace segmentation via least squares regression,” in Proc. ECCV, 2012, pp. 347–360.

[53] G. Liu, H. Xu, J. Tang, Q. Liu, and S. Yan, “A deterministic analysis for LRR,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 3, pp. 417–430, Mar. 2016.

[54] C. Xu, Z. Lin, and H. Zha, “A unified convex surrogate for the Schatten-p norm,” in Proc. AAAI, 2017, pp. 926–932.

[55] Y.-X. Wang, H. Xu, and C. Leng, “Provable subspace clustering: When LRR meets SSC,” in Proc. NIPS, 2013, pp. 64–72.

[56] H. Lai, Y. Pan, C. Lu, Y. Tang, and S. Yan, “Efficient k-support matrix pursuit,” in Proc. ECCV, 2014, pp. 617–631.

[57] E. Kim, M. Lee, and S. Oh, “Robust elastic-net subspace representation,” IEEE Trans. Image Process., vol. 25, no. 9, pp. 4245–4259, Sep. 2016.

[58] C. Lu, J. Feng, Z. Lin, and S. Yan, “Correlation adaptive subspace segmentation by trace lasso,” in Proc. ICCV, Dec. 2013, pp. 1345–1352.

[59] J. Xu, K. Xu, K. Chen, and J. Ruan, “Reweighted sparse subspace clustering,” Comput. Vis. Image Understand., vol. 138, pp. 25–37, Sep. 2015.

[60] A. A. Abin, “Querying beneficial constraints before clustering using facility location analysis,” IEEE Trans. Cybern., vol. 48, no. 1, pp. 312–323, Jan. 2018.

[61] Y. Guo, J. Gao, and F. Li, “Spatial subspace clustering for hyperspectral data segmentation,” in Proc. Int. Conf. Digit. Inf. Process. Commun., 2013, pp. 180–190.

[62] J. Wang, X. Wang, F. Tian, C. H. Liu, and H. Yu, “Constrained low-rank representation for robust subspace clustering,” IEEE Trans. Cybern., vol. 47, no. 12, pp. 4534–4546, Dec. 2017.

[63] Y. Yang, Z. Ma, Y. Yang, F. Nie, and H. T. Shen, “Multitask spectral clustering by exploring intertask correlation,” IEEE Trans. Cybern., vol. 45, no. 5, pp. 1083–1094, May 2015.

[64] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma, “Robust recovery of subspace structures by low-rank representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 1, pp. 171–184, Jan. 2013.

[65] P. Favaro, R. Vidal, and A. Ravichandran, “A closed form solution to robust subspace estimation and clustering,” in Proc. CVPR, Jun. 2011, pp. 1801–1807.

[66] Z. Li, L.-F. Cheong, and S. Z. Zhou, “SCAMS: Simultaneous clustering and model selection,” in Proc. CVPR, Jun. 2014, pp. 264–271.

[67] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” in Proc. Int. Conf. Knowl. Discovery Data Mining, 1996, pp. 226–231.

[68] T. Beier, F. A. Hamprecht, and J. H. Kappes, “Fusion moves for correlation clustering,” in Proc. CVPR, Jun. 2015, pp. 3507–3516.

[69] C. Lu, J. Feng, Y. Chen, W. Liu, Z. Lin, and S. Yan, “Tensor robust principal component analysis: Exact recovery of corrupted low-rank tensors via convex optimization,” in Proc. CVPR, Jun. 2016, pp. 5249–5257.

[70] X. Li, G. Cui, and Y. Dong, “Graph regularized non-negative low-rank matrix factorization for image clustering,” IEEE Trans. Cybern., vol. 47, no. 11, pp. 3840–3853, May 2017.

[71] D. Zhou, J. Huang, and B. Schölkopf, “Learning with hypergraphs: Clustering, classification, and embedding,” in Proc. NIPS, 2006, pp. 1601–1608.

[72] M. Lee, J. Lee, H. Lee, and N. Kwak, “Membership representation for detecting block-diagonal structure in low-rank or sparse subspace clustering,” in Proc. CVPR, Jun. 2015, pp. 1648–1656.

[73] M. Belkin and P. Niyogi, “Laplacian eigenmaps and spectral techniques for embedding and clustering,” in Proc. NIPS, 2002, pp. 585–591.

[74] E. Elhamifar and R. Vidal, “Sparse manifold clustering and embedding,” in Proc. NIPS, 2011, pp. 55–63.



[75] C.-G. Li, J. Guo, and H.-G. Zhang, “Learning bundle manifold by double neighborhood graphs,” in Proc. ACCV, 2009, pp. 321–330.

[76] U. von Luxburg, “A tutorial on spectral clustering,” Statist. Comput., vol. 17, no. 4, pp. 395–416, 2007.

[77] A. S. Georghiades, P. N. Belhumeur, and D. Kriegman, “From few to many: Illumination cone models for face recognition under variable lighting and pose,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 6, pp. 643–660, Jun. 2001.

[78] S. A. Nene, S. K. Nayar, and H. Murase, “Columbia object image library (COIL-20),” Columbia Univ., New York, NY, USA, Tech. Rep. CUCS-005-96, 1996.

[79] R. Tron and R. Vidal, “A benchmark for the comparison of 3-D motion segmentation algorithms,” in Proc. CVPR, Jun. 2007, pp. 1–8.

[80] R. Cilibrasi and P. M. B. Vitányi, “Clustering by compression,” IEEE Trans. Inf. Theory, vol. 51, no. 4, pp. 1523–1545, Apr. 2005.

[81] E. P. Xing, M. I. Jordan, S. J. Russell, and A. Y. Ng, “Distance metric learning with application to clustering with side-information,” in Proc. NIPS, 2003, pp. 521–528.

[82] M. Palatucci, D. Pomerleau, G. E. Hinton, and T. M. Mitchell, “Zero-shot learning with semantic output codes,” in Proc. NIPS, 2009, pp. 1410–1418.

[83] T. Kawamoto, M. Tsubaki, and T. Obuchi, “Mean-field theory of graph neural networks in graph partitioning,” in Proc. NIPS, 2018, pp. 4366–4376.

Jie Liang is currently pursuing the master's degree with the College of Computer Science, Nankai University. His current research interests include computer vision, machine learning, pattern recognition, and optimization.

Jufeng Yang received the Ph.D. degree from Nankai University in 2009. From 2015 to 2016, he was with the Vision and Learning Lab, University of California at Merced, Merced. He is currently an Associate Professor with the College of Computer Science, Nankai University. His research is in the field of computer vision, machine learning, and multimedia.

Ming-Ming Cheng received the Ph.D. degree from Tsinghua University in 2012. Then, for two years, he was a research fellow with Prof. P. Torr in Oxford. He is currently a Professor with Nankai University, where he is leading the Media Computing Lab. His research interests include computer graphics, computer vision, and image processing. He received research awards, including the ACM China Rising Star Award, the IBM Global SUR Award, and the CCF-Intel Young Faculty Researcher Program.

Paul L. Rosin is currently a Professor with the School of Computer Science and Informatics, Cardiff University. His research interests include the representation, segmentation, and grouping of curves, knowledge-based vision systems, early image representations, low level image processing, machine vision approaches to remote sensing, methods for evaluation of approximation algorithms, medical and biological image analysis, mesh processing, non-photorealistic rendering, and the analysis of shape in art and architecture.

Liang Wang received the Ph.D. degree in pattern recognition and intelligent system from the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences (CAS), China, in 2004. He was a Research Assistant with Imperial College London, U.K., and Monash University, Australia, and a Research Fellow with The University of Melbourne, Australia. He was a Lecturer with the Department of Computer Science, University of Bath, U.K. He is currently a Professor of the Hundred Talents Program of CAS with the Institute of Automation, CAS. His major research interests include machine learning, pattern recognition, computer vision, and data mining.