Concise Papers

Dynamic Dissimilarity Measure for Support-Based Clustering

Daewon Lee and Jaewook Lee

Abstract: Clustering methods utilizing support estimates of a data distribution have recently attracted much attention because of their ability to generate cluster boundaries of arbitrary shape and to deal with outliers efficiently. In this paper, we propose a novel dissimilarity measure based on a dynamical system associated with support estimating functions. Theoretical foundations of the proposed measure are developed and applied to construct a clustering method that can effectively partition the whole data space. Simulation results demonstrate that clustering based on the proposed dissimilarity measure is robust to the choice of kernel parameters and able to control the number of clusters efficiently.

Index Terms: Clustering, kernel methods, dynamical systems, equilibrium vector, support.

    1 INTRODUCTION

Recently, many researchers have successfully applied clustering methods based on the estimated support of a data distribution to solve some difficult and diverse unsupervised learning problems [1], [2], [3], [4], [5]. These methods, inspired by kernel machines such as kernel-based clustering [5], [6], [7] and support vector clustering [1], consist, in general, of two main stages: estimating a support function [8], [9] and clustering data points based on geometric structures of the estimated support function. The latter clustering stage is highly computer-intensive even in middle-scale problems and often shows poor clustering performance. Several researchers have therefore developed various techniques to reduce its computational complexity for real applications, including approximated graph techniques [10], spectral graph partitioning strategies [11], ensemble combination strategies [12], chunking strategies [3], pseudohierarchical techniques [13], and equilibrium-based techniques [4], [14], [15], [16].

Despite their advantages over other clustering methods, the existing support-based clustering algorithms have some drawbacks. First, out-of-sample points outside of the generated cluster boundaries cannot directly be assigned a cluster label. Second, the clustering results are very sensitive to the choice of kernel parameters used for a support estimate, since the boundaries can fluctuate strongly under small changes of the kernel parameters [13]. Finally, it is difficult to control the number of clusters when the algorithms are applied to clustering problems with a priori information on the number of clusters. To obtain K clusters (and the corresponding kernel parameters), for example, they require a computationally intensive parameter tuning process that involves repeated calls of the support estimating step and the cluster labeling step.

To overcome these intrinsic handicaps, in this paper we propose a novel dissimilarity measure that can be applied to support-based clustering. Starting from a support function that estimates the support of a data distribution, we build its associated dynamic process to partition the whole data space into so-called basin cells of equilibrium vectors and to construct a weighted graph consisting of equilibrium vectors. The constructed graph then defines a novel dissimilarity measure among equilibrium vectors with which we can perform inductive clustering, that is, assign cluster labels to out-of-sample points as well as in-sample points. Unlike the traditional SVC, which focuses on the support vectors located on the cluster boundaries, the proposed dissimilarity measure focuses on the equilibrium vectors located inside the generated clusters and can be applied to any kernel-based support or density estimating function that reveals the clusters of a data distribution well. Finally, we perform experiments to show that clustering based on the proposed dissimilarity measure is robust to the choice of kernel parameters and is able to generate a user-specified number of clusters without the parameter tuning process.

2 DYNAMIC DISSIMILARITY MEASURE

2.1 Support of a Data Distribution

A support function (or quantile function) is roughly defined as a positive scalar function f : R^N → R+ such that a level set of f estimates the support of a data distribution.


Clusters then correspond to the connected components of the level set L_f(r) = {x ∈ R^N : f(x) ≤ r} [14]. To be specific, we build the following dynamical system associated with f:

dx/dt = F(x) := -∇f(x).   (4)

The existence of a unique solution (or trajectory) x(·) : R → R^N of (4) through each initial point is guaranteed since F = -∇f is smooth. Moreover, for all sufficiently large r > 0, the region D_M, which contains all the equilibrium vectors of (4), satisfies D_M ⊂ L_f(r).

Finally, let r be sufficiently large. By the invariance property of L_f(r) (i.e., if a point is on a connected component of L_f(r), then its entire positive trajectory lies on the same component), for any point x_0 ∈ L_f(r) \ D_M the trajectory starting at x(0) = x_0 must first hit ∂D_M and then enter the region D_M, since all the equilibrium vectors are inside the region D_M and all the trajectories converge to one of these equilibrium vectors. This implies that the set D_M is a strong deformation retract of the level set L_f(r). Also, from the uniqueness of the trajectories [18], the boundary ∂D_M is homeomorphic to ∂L_f(r), which implies that L_f(r) is connected from the connectedness of D_M. Since the cardinality (i.e., the number of connected components) is the same for the graph G(r) and for L_f(r) [14], the graph G(r) is therefore connected.
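The gradient system (4) is what partitions the data space into basin cells: each point is labeled by the stable equilibrium vector (SEV) that its descent trajectory converges to. Below is a minimal Python sketch of this basin-cell assignment, assuming a simple Gaussian-kernel support estimate f(x) = 1 - (1/N) sum_i exp(-q ||x - x_i||^2) as a stand-in for the support function of (2), which is not reproduced here; the step size, tolerances, and function names are illustrative choices, not values from the paper.

import numpy as np

def support_f(x, X, q):
    """Stand-in support estimate: small inside dense regions, large outside."""
    return 1.0 - np.mean(np.exp(-q * np.sum((X - x) ** 2, axis=1)))

def grad_f(x, X, q):
    """Gradient of the stand-in support estimate at point x."""
    diff = x - X                                   # shape (N, d)
    w = np.exp(-q * np.sum(diff ** 2, axis=1))     # kernel weights, shape (N,)
    return (2.0 * q / len(X)) * (w[:, None] * diff).sum(axis=0)

def find_sev(x0, X, q, step=0.05, max_iter=500, eps=1e-5):
    """Follow dx/dt = -grad f(x) from x0 until a (near-)equilibrium is reached."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iter):
        g = grad_f(x, X, q)
        if np.linalg.norm(g) < eps:                # equilibrium: grad f(x) ~ 0
            break
        x -= step * g                              # explicit Euler step of (4)
    return x

def basin_cells(X, q, tol=1e-2):
    """Assign every data point to the SEV its descent trajectory converges to."""
    sevs, labels = [], []
    for x in X:
        s = find_sev(x, X, q)
        for j, t in enumerate(sevs):               # merge SEVs that coincide
            if np.linalg.norm(s - t) < tol:
                labels.append(j)
                break
        else:
            sevs.append(s)
            labels.append(len(sevs) - 1)
    return np.array(sevs), np.array(labels)

For example, basin_cells(X, q=5.0) would return the located SEVs together with, for each row of X, the index of the SEV it converged to; these basin-cell indices are the in-sample assignments that the amalgamation of Section 3 later merges into K clusters.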

This theorem motivates us to define a dissimilarity measure on a connected graph G = G(r), for sufficiently large r, as follows:

Definition 1 (Dissimilarity measure). Let a connected graph G = (V, E) be given. For a pair of SEVs, s_i and s_j, in V, we can define the distance d_G(s_i, s_j) as

d_G(s_i, s_j) = min{ d_E(s_i, s_j), max_{k=1,...,h} d_E(s_{i_{k-1}}, s_{i_k}) },

where the minimum is taken over path sequences (with no cycle) s_i = s_{i_0}, s_{i_1}, ..., s_{i_{h-1}}, s_{i_h} = s_j such that (s_{i_{k-1}}, s_{i_k}) ∈ E for each k = 1, ..., h. This endows the graph G with a dissimilarity measure. (Here, we assume d_E(s_i, s_j) = ∞ if (s_i, s_j) ∉ E.)

Geometrically, the distance d_G(·, ·) takes the smallest function value that must be reached along some path connecting two SEVs in order to escape from one SEV and move on to the other SEV.
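To make this concrete, here is a small sketch of how d_G could be evaluated on the SEV graph with a bottleneck variant of Dijkstra's algorithm; the edge-list format and names are assumptions for illustration, with each weight standing for the edge dissimilarity d_E (e.g., the support-function value at the transition equilibrium vector between two adjacent SEVs).

import heapq

def minimax_distances(num_sevs, edges, src):
    """d[j] = min over paths from src to j of the maximum edge weight,
    i.e., the dynamic dissimilarity d_G(s_src, s_j) on the SEV graph.

    edges: iterable of (i, j, w) with w = d_E(s_i, s_j); missing edges
    are treated as infinite, as in Definition 1.
    """
    adj = [[] for _ in range(num_sevs)]
    for i, j, w in edges:
        adj[i].append((j, w))
        adj[j].append((i, w))

    d = [float("inf")] * num_sevs
    d[src] = 0.0
    heap = [(0.0, src)]
    while heap:
        cost, u = heapq.heappop(heap)
        if cost > d[u]:
            continue                         # stale heap entry
        for v, w in adj[u]:
            bottleneck = max(cost, w)        # highest value met along the path
            if bottleneck < d[v]:
                d[v] = bottleneck
                heapq.heappush(heap, (bottleneck, v))
    return d

Under this reading, two SEVs are close whenever some path between them never has to rise above a low level of the support function, matching the escape interpretation above.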

3 CLUSTERING BASED ON A DYNAMIC DISSIMILARITY MEASURE

Generally speaking, a support function (e.g., f in (2)) is often very sensitive to the choice of kernel parameters, and so is the clustering structure described by L_f(r). For example, if clusters overlap in some region, it is difficult or even impossible to find a kernel parameter that separates them. Moreover, to control the number of clusters, we have to change the kernel parameters (and hence the support function f) by trial and error, as in [1], where each alteration of the kernel parameters entails repeated calls of a quadratic programming solver and a cluster labeling algorithm, which is computationally intensive.

The derived dynamic dissimilarity measure on the graph G can help us overcome these drawbacks when it is applied to clustering. Specifically, with an input K ≥ 1 denoting the desired number of clusters, we begin with every SEV representing a singleton cluster. Denote these clusters C_1 = {s_1}, ..., C_v = {s_v}.


Fig. 2. Illustration of the main step of Algorithm 1 after the pre-step of Algorithm 1 in Fig. 1. The number of clusters at step l is M - l, where M = 11 is the number of SEVs. The total number of edges is e = 12. In each panel, thin solid lines represent the constructed graph G(r) for varying r, and thick solid lines represent the cluster boundaries generated by G(r). At each step l, the edge with the next least weight does not always join two clusters; e.g., at step l = 6, the edge joining s_4 and s_7 has the next least weight, but it does not create a new cluster since s_4 and s_7 are already in the same cluster. In this algorithm, the next least weight edge is selected to join two clusters only if it does not create a cycle when added to the set of already selected edges.


At each step, the closest two clusters (i.e., two separate clusters containing two adjacent SEVs with the least edge-weight distance) are merged into a single cluster, producing one less cluster at the next higher level. This process terminates when we reach K clusters, starting from v clusters. This procedure, which employs a modified version of Kruskal's algorithm [22] for the minimum-cost spanning tree, is detailed in Algorithm 1 below (see Fig. 2):

Algorithm 1. (Clustering based on a dynamic dissimilarity measure)

(Pre-Step:)
1: Given a support function f and its associated weighted graph G = (V, E), where {s_i}, i = 1, ..., M, is the set of SEVs and {d_k}, k = 1, ..., e, is the set of TEVs (cf. Algorithm 1 in [14])

(Main-Step:)
A.0. // Initialization //
1: Given a number of clusters K
2: Rearrange the index k = 1, ..., e in such a way that f(d_1) < f(d_2) < ... < f(d_e)


Fig. 3. Clusters generated by Algorithm 1 with different kernel parameters, q = 15 for (a) and q = 50 for (b), given the number of clusters K = 3. Irrespective of the considerable change in the kernel parameter q, the proposed method generates very similar cluster boundaries, represented by thick solid lines, with the same cluster labeling, which shows the robustness of the proposed method to a varying kernel parameter q.

TABLE 1. Experimental Results on Benchmark Data Sets

    RIadj denotes the adjusted Rand Index.

Fig. 4. Comparison of the proposed method (b) with the traditional SVC algorithm (a), applied to the iris and crab data sets, which have overlaps between clusters.


3: Start with initial clusters C_1 = {s_1}, ..., C_M = {s_M}. In this initial step, the distance between two clusters is defined as

   d(C_i, C_j) = d_E(s_i, s_j) = f(d_k) if (s_i, s_j) ∈ E, and ∞ otherwise

A.1. // Single-linkage amalgamating //
1: Set l = 1 and k = 1
2: while l ≤ M - K do
3:   Find the SEVs s_i, s_j with edge weight d_E(s_i, s_j) = f(d_k)
4:   if s_i, s_j are not in the same cluster then
5:     C_{M+l} = C_a ∪ C_b, where s_i ∈ C_a and s_j ∈ C_b
6:     d(C_{M+l}, C_u) = min{d(C_a, C_u), d(C_b, C_u)} for all remaining clusters C_u
7:     Add C_{M+l} as a new cluster and remove clusters C_a and C_b
8:     Set l = l + 1, k = k + 1
9:   else
10:    k = k + 1
11:  end if
12: end while
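As a rough sketch of the main step above, the following Python fragment merges singleton SEV clusters along edges of increasing weight until K clusters remain, using a union-find structure in place of the explicit cluster lists C_1, ..., C_M; the function name and edge format are assumptions made for the example.

def amalgamate(num_sevs, edges, K):
    """Single-linkage amalgamation over the SEV graph (Kruskal-style).

    edges: list of (i, j, w) with w = f(d_k), the support value at the
    transition equilibrium vector joining SEVs i and j.
    Returns a cluster label for every SEV (K labels if the graph allows it).
    """
    parent = list(range(num_sevs))

    def find(a):                                   # union-find with path compression
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    clusters = num_sevs
    for i, j, _ in sorted(edges, key=lambda e: e[2]):   # ascending f(d_k)
        if clusters <= K:
            break
        ri, rj = find(i), find(j)
        if ri != rj:                               # edge joins two distinct clusters
            parent[ri] = rj
            clusters -= 1

    roots = sorted({find(i) for i in range(num_sevs)})
    return [roots.index(find(i)) for i in range(num_sevs)]

In-sample and out-of-sample points would then inherit the label of the SEV that their trajectory under (4) converges to, which is what makes the resulting clustering inductive.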

This algorithm possesses a monotonicity property: the dissimilarity between merged clusters is monotone increasing with the level of the merger. Thus, the binary tree, called a dendrogram, can be plotted so that the height of each node is proportional to the value of the intergroup dissimilarity between its two daughters, as is shown in Fig. 3. This property makes the method less sensitive to the choice of kernel parameters, unlike traditional support-based clustering algorithms such as the SVC in [1] (see Fig. 3). Also, Algorithm 1 enables us to control the number of clusters by manipulating the constructed graph without changing the kernel parameters (see Fig. 2).
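As an illustration of this monotonicity, the dendrogram of such a merge sequence can be drawn, for instance, with SciPy's hierarchical clustering utilities, using the same single-linkage rule that Algorithm 1 applies to the SEV graph; the 5-by-5 dissimilarity matrix below is made-up toy data, and the large constant BIG stands in for the infinite dissimilarity of missing edges.

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform

# Illustrative pairwise edge dissimilarities d_E between 5 SEVs.
BIG = 1e6
D = np.array([
    [0.0, 0.2, BIG, BIG, BIG],
    [0.2, 0.0, 0.5, BIG, BIG],
    [BIG, 0.5, 0.0, 0.1, BIG],
    [BIG, BIG, 0.1, 0.0, 0.7],
    [BIG, BIG, BIG, 0.7, 0.0],
])

Z = linkage(squareform(D), method="single")   # single-linkage, as in Algorithm 1
dendrogram(Z, labels=[f"s{i+1}" for i in range(5)])
plt.ylabel("merge dissimilarity")
plt.show()

Because the merge heights never decrease, the tree can be cut at any level to read off a clustering with the desired number of clusters, which Algorithm 1 does directly on the SEV graph without re-estimating the support function.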

The most time-consuming step in Algorithm 1 involves locating the SEVs and TEVs for constructing the weighted graph G = (V, E). If we let m (usually on the order of 5-20) be the average number of iterations for locating SEVs from data points via the steepest descent process, then the time complexities of obtaining all SEVs and all TEVs of system (4) are O(Nm) and O(M^2 d), respectively. Here, M is the number of SEVs and d is the average number of iterations needed to compute a TEV between two SEVs [14].

4 EXPERIMENTAL RESULTS

To demonstrate the performance of the clustering algorithm based on the proposed dynamic dissimilarity measure empirically, we applied it to well-known benchmark classification data sets (see footnote 1) and compared it with other state-of-the-art kernel clustering methods: the kernel clustering method by Camastra et al. [2] (K-SVC) and the spectral clustering method [23] (Spectral) (see Table 1). In the proposed method, we used the support function in (2), and the Gaussian kernel parameter q was chosen randomly for all data sets, without parameter tuning, in order to check the robustness to the kernel parameter. As a measure of clustering accuracy, we used the adjusted Rand Index (RIadj), which is a similarity measure between two data partitions [24]. Here, we calculated RIadj between the predicted cluster labels and the true class labels. The RIadj has a value between 0 and 1, with 0 indicating that the predicted cluster labels and the class labels do not agree on any pair of points and 1 indicating that the two partitions are exactly the same. Table 1 shows that the proposed method has the largest RIadj values, even reaching the value of 1 on all data sets except iris and sunflower, thus outperforming the other methods for all of the data sets when the cluster number K is known a priori.
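For reference, the adjusted Rand Index between predicted cluster labels and true class labels can be computed, for instance, with scikit-learn; the label vectors below are toy values, not the benchmark results of Table 1.

from sklearn.metrics import adjusted_rand_score

# Toy example: true class labels versus predicted cluster labels.
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 1, 2, 2, 2]

print(adjusted_rand_score(y_true, y_pred))  # equals 1.0 only when the two partitions match exactly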

In addition, we applied it to some well-known overlapped clustering problems, iris and crab. Fig. 4 shows the clustering results of the proposed method (2nd and 4th panels) compared with the original SVC of [1] (1st and 3rd panels), for which the parameter set (C, q) was selected after some trial and error. To split the overlapped clusters, the original SVC must introduce many boundary support vectors (BSVs), which may leave even in-sample data points unlabeled. In contrast, the proposed method not only successfully separates these clusters without allowing BSVs, but can also assign a cluster label to both in-sample and out-of-sample data points. The result illustrates how well the proposed method can split overlapped clusters.

Fig. 5 shows the result of the proposed method applied to image segmentation problems, to check its scalability to large data sets. The proposed method easily generates three different segmentation results (with varying cluster sizes) from the same support function without repeated calls of the support estimating step and the cluster labeling step. Hence, the image-pixel clustering process of finding a suitable parameter for a specific cluster size becomes less computationally intensive.


1. The data sets are downloadable from the following Web page: http://sites.google.com/site/daewonweb/file/artificialData.zip.

    Fig. 5. Image segmentation results. From the same support function, three different segmentation results are generated.


    5 CONCLUSIONS

In this paper, we have proposed a dynamic dissimilarity measure for support-based clustering. Through simulations, the clustering based on the derived dynamic dissimilarity measure is shown to be less sensitive to the choice of kernel parameters and able to control the number of clusters efficiently. It also works successfully for various challenging clustering problems. The proposed measure can be derived, with minor modifications, from any support or density estimating function. An application of the proposed measure to other practical problems remains to be investigated further.

    ACKNOWLEDGMENTS

This work was supported partially by the Korea Research Foundation under Grant KRF-2008-314-D00483 and partially by KOSEF under Grant R01-2007-000-20792-0. The work of the first author (Daewon Lee) was partially supported by the Korea Research Foundation under Grant KRF-2008-357-D00231.

REFERENCES

[1] A. Ben-Hur, D. Horn, H.T. Siegelmann, and V. Vapnik, "Support Vector Clustering," J. Machine Learning Research, vol. 2, pp. 125-137, 2001.
[2] F. Camastra and A. Verri, "A Novel Kernel Method for Clustering," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 27, no. 5, pp. 801-805, May 2005.
[3] T. Ban and S. Abe, "Spatially Chunking Support Vector Clustering Algorithm," Proc. Int'l Joint Conf. Neural Networks, pp. 414-418, 2004.
[4] J. Lee and D. Lee, "An Improved Cluster Labeling Method for Support Vector Clustering," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 27, no. 3, pp. 461-464, Mar. 2005.
[5] M. Girolami, "Mercer Kernel-Based Clustering in Feature Space," IEEE Trans. Neural Networks, vol. 13, no. 3, pp. 780-784, May 2002.
[6] S. Chen and D. Zhang, "Robust Image Segmentation Using FCM with Spatial Constraints Based on New Kernel-Induced Distance Metric," IEEE Trans. Systems, Man, and Cybernetics, Part B, vol. 34, no. 4, pp. 1907-1916, Aug. 2004.
[7] D. Zhang and S. Chen, "A Novel Kernelised Fuzzy C-Means Algorithm with Application in Medical Image Segmentation," Artificial Intelligence in Medicine, vol. 32, no. 1, pp. 37-50, 2004.
[8] D.M.J. Tax and R.P.W. Duin, "Support Vector Domain Description," Pattern Recognition Letters, vol. 20, pp. 1191-1199, 1999.
[9] B. Scholkopf, J. Platt, J. Shawe-Taylor, A. Smola, and R. Williamson, "Estimating the Support of a High-Dimensional Distribution," Neural Computation, vol. 13, no. 7, pp. 1443-1471, 2001.
[10] J. Yang, V. Estivill-Castro, and S.K. Chalup, "Support Vector Clustering through Proximity Graph Modelling," Proc. Ninth Int'l Conf. Neural Information Processing (ICONIP '02), pp. 898-903, 2002.
[11] J. Park, X. Ji, H. Zha, and R. Kasturi, "Support Vector Clustering Combined with Spectral Graph Partitioning," Proc. 17th Int'l Conf. Pattern Recognition (ICPR '04), pp. 581-584, 2004.
[12] W.J. Puma-Villanueva, G.B. Bezerra, C.A.M. Lima, and F.J.V. Zuben, "Improving Support Vector Clustering with Ensembles," Proc. Int'l Joint Conf. Neural Networks, 2005.
[13] M.S. Hansen, K. Sjostrand, H. Olafsdottir, H.B. Larsson, M.B. Stegmann, and R. Larsen, "Robust Pseudohierarchical Support Vector Clustering," Proc. Scandinavian Conf. Image Analysis (SCIA '07), pp. 808-817, 2007.
[14] J. Lee and D. Lee, "Dynamic Characterization of Cluster Structures for Robust and Inductive Support Vector Clustering," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 11, pp. 461-464, Nov. 2006.
[15] H.-C. Kim and J. Lee, "Clustering Based on Gaussian Processes," Neural Computation, vol. 19, no. 11, pp. 3088-3107, 2007.
[16] D. Lee and J. Lee, "Equilibrium-Based Support Vector Machine for Semi-Supervised Classification," IEEE Trans. Neural Networks, vol. 18, no. 2, pp. 578-583, Mar. 2007.
[17] D. Lee and J. Lee, "Domain Described Support Vector Classifier for Multi-Classification Problems," Pattern Recognition, vol. 40, pp. 41-51, 2007.
[18] J. Guckenheimer and P. Holmes, Nonlinear Oscillations, Dynamical Systems, and Bifurcations of Vector Fields. Springer, 1986.
[19] H.K. Khalil, Nonlinear Systems. Macmillan, 1992.
[20] J. Lee, "An Optimization-Driven Framework for the Computation of the Controlling UEP in Transient Stability Analysis," IEEE Trans. Automatic Control, vol. 49, no. 1, pp. 115-119, Jan. 2004.
[21] J. Lee, "A Novel Three-Phase Trajectory Informed Search Methodology for Global Optimization," J. Global Optimization, vol. 38, no. 1, pp. 61-77, 2007.
[22] J.B. Kruskal, "On the Shortest Spanning Subtree of a Graph and the Traveling Salesman Problem," Proc. Am. Math. Soc., vol. 7, no. 1, pp. 48-50, 1956.
[23] A.Y. Ng, M.I. Jordan, and Y. Weiss, "On Spectral Clustering: Analysis and an Algorithm," Advances in Neural Information Processing Systems, pp. 849-856, MIT Press, 2001.
[24] L. Hubert and P. Arabie, "Comparing Partitions," J. Classification, vol. 2, pp. 193-218, 1985.
